Off-and-on trying out an account over at @tal@oleo.cafe due to scraping bots bogging down lemmy.today to the point of near-unusability.

  • 71 Posts
  • 3.11K Comments
Joined 2 years ago
Cake day: October 4th, 2023


  • Oh, yeah, it’s not that ollama itself is opening holes (other than adding something listening on a local port), or telling people to do that. I’m not saying that the ollama team is explicitly promoting bad practices. I’m just saying that I’d guess that there are a number of people who are doing things like fully-exposing or port-forwarding to ollama or whatever because they want to be using the parallel compute hardware on their computer remotely. The easiest way to do that is to just expose ollama without setting up some kind of authentication mechanism, so…it’s gonna happen.

    I remember someone on here who had their phone and desktop set up so that they couldn’t reach each other by default. They were fine with that, but they really wanted their phone to be able to access the LLM on their computer, and I was helping walk them through it. It was hard and confusing for them — they didn’t really have a background in the stuff, but badly wanted the functionality. In their case, they just wanted local access, while the phone was on their home WiFi network. But…I can say pretty confidently that there are also people who want that access all the time, from anywhere, not just at home.
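    For anyone wanting to do something similar, the shape of it is roughly this (a hedged sketch; the hostname and model name are placeholders, and ollama’s defaults may differ by version):

    $ # ollama listens on 127.0.0.1:11434 by default; leave it that way and tunnel in
    $ # (setting OLLAMA_HOST=0.0.0.0 is the 'just expose it to everything' shortcut)
    $ ssh -N -L 11434:localhost:11434 user@desktop.lan   # run on the laptop, or a phone SSH client
    $ curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "What is 1+1?"}'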



  • If the chips are just being hoarded to shut out competitors, which is what the OpenAI deal was rumoured to be about, we could see the unused chips getting bought and used, but equally likely we could see the chips (and DIMMs and CPUs) deliberately shredded to prevent them from falling into competitor hands.

    By the time the things are cycled out, they may not be terribly compute-competitive, in which case…shrugs

    Also, a major unknown is where models go. Say that a bunch of people decide that they can’t get access to parallel compute hardware or a lot of memory, and research shifts toward models split up into MoEs or otherwise broken apart. Recent LLMs have been oriented towards MoEs. Llama.cpp, and I assume the other engines capable of running LLMs, has the ability to offload experts that don’t fit in GPU memory to main memory when they aren’t actively being used. Then maybe…having a bank of consumer-level 24GB GPUs or something like that is fine, and having chips with direct access to very large amounts of memory isn’t all that interesting. Then, what becomes essential to being competitive changes.
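    As a concrete sketch of what that looks like today (hedged; the exact flags and the tensor-name regex vary across llama.cpp versions, and the model file is a placeholder):

    $ # keep the always-used layers on the GPU, park the MoE expert tensors in system RAM
    $ llama-server -m some-moe-model.gguf -ngl 99 -ot '.ffn_.*_exps.=CPU' -c 16384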

    EDIT: I also think that it’s safe to say that more memory will probably always help. But I’ll also say that it’s probably very likely that our existing models are staggeringly inefficient with memory. We are still doing early passes at this.

    Let me give you an example. I have a Llama 3-based model currently loaded on my Framework Desktop that’s using 96GB of memory for the model and associated storage.

    Prompt: What is 1+1?

    Response: I can answer that. 1+1 = 2.

    Prompt: How about 37 times 12?

    Response: 37 times 12 is 444.

    Now, those are correct answers. But…in order to make an LLM capable of providing that correct response purely by running a neural net trained on natural language, we had to stick a really inefficient amount of data into memory. That same hardware that I’m running it on has the ability to do billions of integer computations per second. As of today, the software running that model doesn’t provide it access to that hardware, and the model was never trained to use it. But…it could be. And if it were, suddenly a lot of the need to waste edges in some neural net on arithmetic goes away.

    Plus, we could get better results:

    Prompt: What about 783901/76523?

    Response: 783901 divided by 76523 is approximately 10.23.

    That’s not far off — it is approximately 10.23 — but it should have been rounded to 10.24.

    $ maxima -q
    
    (%i1) float(783901/76523);
    
    (%o1)                         10.243992002404506
    (%i2) 
    

    So we could probably get more-useful models that don’t waste a ton of space if we gave the model access to the computational hardware that’s presently sitting idle and trained it to use it. That’s an off-the-cuff example, but I think that it highlights how we’re solving problems inefficiently in terms of memory.

    Same sort of thing with a lot of other problems for which we already have immensely-more-efficient (and probably more accurate) software packages. If you can train the model to use those, and run the software in an isolated sandbox rather than having the model try to do it itself, then we don’t need to blow space in the LLM on those capabilities, and it can shrink.
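    To make that concrete, the shape of it is: the model emits an expression rather than an answer, something outside the model evaluates it in a sandbox, and the result gets fed back in. Trivial sketch with an existing tool:

    $ # expression produced by the model, evaluated by software instead of by weights
    $ echo 'scale=6; 783901/76523' | bc -l
    10.243992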

    If we reduce the memory requirements enough to solve a lot of the problems that people want solved with a much-smaller amount of memory, or with a much-less-densely-connected set of neural networks, the hardware that people care about may radically change. In early 2026, the most-in-demand hardware is hugely-power-hungry parallel processors with immense amounts of memory directly connected to them. But maybe, in 2028, we figure out how to get models to use existing software packages designed for mostly-serial computation, and suddenly, what everyone is falling over themselves to get ahold of is more-traditional computer hardware. Maybe the neural net isn’t even where most of the computation is happening for most workloads.

    Maybe the future is training a model to use a library of software and to write tiny, throwaway programs that run on completely different hardware optimized for this “model scratch computation” purpose, with the model mostly consulting those.

    Lot of unknowns there.


  • I posted in a thread a bit back about this, but I can’t find it right now, annoyingly.

    You can use the memory on GPUs as swap, though on Linux, that’s currently through FUSE — going through userspace — and probably not terribly efficient.

    https://wiki.archlinux.org/title/Swap_on_video_RAM
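    Roughly, the FUSE route looks like this (a hedged sketch based on that wiki page; the size and device names are placeholders, and you need the loop device because swap can’t sit directly on a FUSE filesystem):

    $ vramfs /mnt/vram 8G &                    # expose 8 GiB of VRAM as a FUSE filesystem
    $ dd if=/dev/zero of=/mnt/vram/swapfile bs=1M count=8192
    $ losetup /dev/loop0 /mnt/vram/swapfile    # wrap the file in a loop device
    $ mkswap /dev/loop0
    $ swapon /dev/loop0 -p 10                  # give it higher priority than disk swap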

    Linux apparently can use it via HMM: the memory will show up as system memory.

    https://www.kernel.org/doc/html/latest/mm/hmm.html

    Provide infrastructure and helpers to integrate non-conventional memory (device memory like GPU on board memory) into regular kernel path

    It will have higher latency due to the PCI bus. It sounds like it basically uses main memory as a cache, and all attempts to directly access a page on the device trigger an MMU page fault:

    Note that any CPU access to a device page triggers a page fault and a migration back to main memory. For example, when a page backing a given CPU address A is migrated from a main memory page to a device page, then any CPU access to address A triggers a page fault and initiates a migration back to main memory.

    I don’t know how efficiently Linux deals with this for various workloads; if it can accurately predict the next access, it might be able to pre-request pages and do this pretty quickly. That is, it’s not that the throughput is so bad, but the latency is, so you’d want to mitigate that where possible. There are going to be some workloads for which that’s impossible: an example case would be just allocating a ton of memory, and then accessing random pages. The kernel can’t mitigate the PCI latency in that case.

    There’s someone who wrote a driver to do this for old Nvidia cards, something that starts with a “P”, that I also can’t find at the moment, which I thought was the only place where it worked, but it sounds like it can also be done on newer Nvidia and AMD hardware. Haven’t dug into it, but I’m sure that it’d be possible.

    A second problem with using a card as swap is going to be that a Blackwell card uses extreme amounts of power, enough to overload a typical consumer desktop PSU. That power presumably only gets drawn if you’re using the compute hardware, which you wouldn’t be if you’re just moving memory around. I mean, existing GPUs normally use much less power than they do when crunching numbers. But if you’re running a GPU on a PSU that cannot actually provide enough power for it running at full blast, you have to be sure that you never actually power up that hardware.

    EDIT: For an H200 (141 GB memory):

    https://www.techpowerup.com/gpu-specs/h200-nvl.c4254

    TDP: 600 W

    EDIT2: Just to drive home the power issue:

    https://www.financialcontent.com/article/tokenring-2025-12-30-the-great-chill-how-nvidias-1000w-blackwell-and-rubin-chips-ended-the-era-of-air-cooled-data-centers

    NVIDIA’s Blackwell B200 GPUs, which became the industry standard earlier this year, operate at a TDP of 1,200W, while the GB200 Superchip modules—combining two Blackwell GPUs with a Grace CPU—demand a staggering 2,700W per unit. However, it is the Rubin architecture, slated for broader rollout in 2026 but already being integrated into early-access “AI Factories,” that has truly broken the thermal ceiling. Rubin chips are reaching 1,800W to 2,300W, with the “Ultra” variants projected to hit 3,600W.

    A standard 120V, 15A US household circuit can only handle 1,800W, and that’s with the circuit fully loaded. Even if you get a PSU capable of delivering that and dedicate the entire household circuit to it, anything beyond that means something like multiple PSUs on independent circuits, or 240V service, or something along those lines.
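    Just as rough arithmetic (nameplate figures, ignoring PSU efficiency losses):

    $ # watts = volts * amps; continuous loads are supposed to stay at 80% of that
    $ echo '120*15; 120*15*0.8' | bc
    1800
    1440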

    I have a space heater in my bedroom that can do either 400W or 800W.

    So one question, if one wants to use the card just for its memory, is going to be what ceiling on power usage you can ensure those cards stay under while most of their on-board hardware is idle.
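    On current Nvidia hardware you can at least ask the driver to enforce a cap, though how low it will let you go, and what the board actually draws while its memory is busy but its compute is idle, is the open question:

    $ nvidia-smi -q -d POWER     # current draw plus the min/max enforceable limits
    $ sudo nvidia-smi -pl 300    # ask the driver to cap the board at 300 W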


  • I mean, the article is talking about providing public inbound access, rather than having the software go outbound.

    I suspect that in some cases, people just aren’t aware that they are providing access to the world, and it’s unintentional. Or maybe they just don’t know how to set up a VPN or SSH tunnel or some kind of authenticated reverse proxy or something like that, and so they wind up providing public access for remote use from, say, a phone or laptop or something, which is itself a legit use case.
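    For anyone curious, the VPN route looks roughly like this (a hedged sketch; the keys, addresses, and port are placeholders, and the phone side needs a matching config in the WireGuard app), though I get that it’s still a lot to ask of someone without a networking background:

    $ # on the desktop: expose only UDP 51820 to the world
    $ wg genkey | tee server.key | wg pubkey > server.pub
    $ cat > /etc/wireguard/wg0.conf <<'EOF'
    [Interface]
    Address = 10.8.0.1/24
    ListenPort = 51820
    PrivateKey = <contents of server.key>

    [Peer]
    # the phone
    PublicKey = <the phone's public key>
    AllowedIPs = 10.8.0.2/32
    EOF
    $ wg-quick up wg0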

    ollama targets being easy to set up. I do kinda think that there’s an argument that maybe it should try to facilitate configuring that kind of remote access, even though it expands the scope of what they’re doing, since I figure that there are probably a lot of people setting these up without much, say, networking familiarity, who just want to play with local LLMs.

    EDIT: I do kind of think that there’s a good argument that the consumer router situation plus the personal firewall situation is not great today. Like, “I want to have a computer at my house that I can access remotely via some secure, authenticated mechanism without dicking it up via misconfiguration” is something that people understandably want to do, and it should be more straightforward.

    I mean, we did it with Bluetooth: a consumer-friendly way to establish secure communication over insecure airwaves. We don’t really have that for accessing hardware remotely via the Internet.





  • While that is true in theory, it’s also true that it’s a little more complicated than that.

    My understanding is that in the past, the US tried placing tariffs on steel originating from China — steel being a strategic good, something where there’s a positive externality to having a secure supply — and it wound up effectively being routed through other countries.

    A second issue is that it’s not just a matter of the steel moving through countries directly, but the fact that products can be manufactured in other countries using steel from China, and there isn’t any system for tracking that. Like, say I buy a desktop computer case made of sheet metal from, oh, Taiwan. Where did the Taiwanese manufacturer get the steel from?

    searches

    Here’s something from Brookings (Brookings not being particularly enthusiastic about either Trump or protectionist trade policy):

    https://www.brookings.edu/articles/is-china-circumventing-us-tariffs-via-mexico-and-canada/

    Since 2018, the U.S. has imposed and ratcheted up tariffs on a broad range of Chinese imports. U.S. tariffs on China have created incentives for Chinese products to circumvent these tariffs by entering the U.S. via Canada and Mexico, paying either the United States-Mexico-Canada Agreement (USMCA) tariff rate of zero or the U.S. WTO Most-Favored-Nation (MFN) rate, which has been well below U.S. tariffs on China. Chinese circumvention of U.S. tariffs undermines the U.S. policy of reducing economic integration with China and addressing the impact of China’s imports on U.S. manufacturing. This paper analyzes the extent of Chinese circumvention of U.S. tariffs up until the end of 2024. Since President Donald Trump came into office in 2025, he has raised tariffs further on imports from China and (but to a lesser extent so far) on imports from Canada, Mexico, and other countries as well. At the time of writing, U.S. tariffs were in flux, but the end result will most likely be U.S. tariffs on imports from China that continue to be higher than U.S. tariffs on imports from Canada and Mexico, thereby maintaining the incentive for circumvention.

    This paper analyzes three ways that Chinese products can circumvent U.S. tariffs:

    1. Transshipment, which occurs when an import from China passes through Mexico or Canada on its way to the U.S.
    2. Incorporation of Chinese products into North American supply chains. This includes manufacturing in Mexico and Canada to produce products that are then exported to the U.S.
    3. Chinese foreign direct investment (FDI) into Mexico and Canada to produce goods that are then exported to the U.S.



  • I was about to say that I knew that COVID-19 caused video game sales to surge, and then crash, and there was over-hiring that had happened in response to those sales, but a third seems like an insanely high number.

    Looking at WP, it sounds like the surge was actually that high…but for mobile OS games and console games, not PC, where the surge was much more muted. I also hadn’t realized that mobile OS video game spending had become that much larger than PC spending.

    https://en.wikipedia.org/wiki/2022–2025_video_game_industry_layoffs

    The COVID-19 pandemic led to an increase in interest in gaming globally, and was a period of dramatic expansion in the industry, with many mergers and acquisitions conducted. In many cases companies over-expanded, as this rapid COVID-era growth was unsustainable. The industry began to slow in 2022, and amid spiralling costs and a shift in consumer habits, layoffs began.

    The first few months of the COVID-19 pandemic brought about a sharp increase in revenue for the gaming sector worldwide as people looked for indoor entertainment.[56] According to IDC, in 2020, revenue from mobile games climbed by 32.8% to $99.9 billion, while expenditure on digital PC and Mac games increased by 7.4% to $35.6 billion.[57] The amount spent on home console games increased significantly as well, reaching $42.9 billion, up 33.9%.[58][59]

    In the ensuing years, this growing pattern abruptly stopped.[60] Revenue growth from mobile gaming fell by 15% in 2021, and then fell even further in 2022 and 2023, to -3.3% and -3.1%, respectively. Sales of PC and Mac games saw a brief rise of 8.7% in 2021, a drop of 1.4% in 2022, and a rebound of 2.1% in 2023.[61] Similarly, after a surge in 2020, console game spending plateaued in 2021 with growth at 0.7%, followed by a decline of 3.4% in 2022, before returning to growth at 5.9% in 2023.[59][62]

    EDIT: Based on those numbers, the surge in mobile and console sales combined was basically equivalent in value to the entirety of PC video game sales. It’s like the equivalent of the entire PC video gaming industry materialized, existed for a few years, and then disappeared.
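    Rough back-of-envelope from the quoted IDC figures, taking the increase implied by “up X% to $Y” as Y - Y/(1+X):

    $ # mobile: up 32.8% to $99.9B; console: up 33.9% to $42.9B; PC/Mac total: $35.6B
    $ echo 'scale=2; (99.9 - 99.9/1.328) + (42.9 - 42.9/1.339)' | bc -l
    35.55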


  • I mean, human environments are intrinsically made for humanoids to navigate. Like, okay, we put stairs places, things like that. So in theory, yeah, a humanoid form makes sense if you want to stick robots in a human environment.

    But in practice, I think that there are all kinds of problems to be solved with humans and robots interacting in the same space and getting robots to do human things. Even just basic safety stuff, much less being able to reasonably do general interactions in a human environment. Tesla spent a long time on FSD for its vehicles, and that’s a much-more-limited-scope problem.

    Like, humanoid robots have been a thing in sci-fi for a long time, but I’m not sold that they’re a great near-term solution.

    If you ever look at those Boston Dynamics demos, you’ll note that they do them in a (rather-scuffed-up) lab with safety glass and barriers and all that.

    I’m not saying that it’s not possible to make a viable humanoid robot at some point. But I don’t think that the kind of thing that Musk has claimed it’ll be useful for:

    “It’ll do anything you want,” Musk said. “It can be a teacher, babysit your kids; it can walk your dog, mow your lawn, get the groceries; just be your friend, serve drinks. Whatever you can think of, it will do.”

    …a sort of Rosie The Robot from The Jetsons, is likely going to be at all reasonable for quite some time.





  • An order had been issued on Friday prohibiting British activists from gathering for a planned “stop the boats” protest nicknamed Operation Overlord in the departments of Nord and Pas-de-Calais.

    They named a protest aimed at stopping boats from crossing the English Channel…after what was probably the largest crossing of boats across the English Channel ever?

    https://en.wikipedia.org/wiki/Operation_Overlord

    Operation Overlord was the codename for the Battle of Normandy, the Allied operation that launched the successful liberation of German-occupied Western Europe during World War II. The operation was launched on 6 June 1944 (D-Day) with the Normandy landings (Operation Neptune). A 1,200-plane airborne assault preceded an amphibious assault involving more than 5,000 vessels. Nearly 160,000 troops crossed the English Channel on 6 June, and more than two million Allied troops were in France by the end of August.


  • I just am not sold that there’s enough of a market, not with the current games and current prices.

    There are several different types of HMDs out there. I haven’t seen anyone really break them up into classes, but if I were to take a stab at it:

    • VR gaming goggles. These focus on providing an expansive image that fills the peripheral vision, and cut one off from the world. The Valve Index would be an example.

    • AR goggles. I personally don’t like the term. It’s not that augmented reality isn’t a real thing, but that we don’t really have the software out there to do AR things, and so while theoretically these could be used for augmented reality, that’s not their actual, 2026 use case. But, since the industry uses it, I will. These tend to display an image covering part of one’s visual field which one can see around and maybe through. Xreal’s offerings are an example.

    • HUD glasses. These have a much more limited display, or maybe none at all. These are aimed at letting one record what one is looking at less-obtrusively, maybe throw up notifications from a phone silently, things like the Ray-Ban Meta.

    • Movie-viewers. These things are designed around isolation, but don’t need head-tracking. They may be fine with relatively-low resolution or sharpness. A Royole Moon, for example.

    For me, the most-exciting prospect for HMDs is the idea of a monitor replacement. That is, I’d be most-interested in something that does basically what my existing displays do, but in a lower-power, more-portable, more-private form. If it can also do VR, that’d be frosting on the cake, but I’m really principally interested in something that can be a traditional monitor, but better.

    For me, at least, none of the use cases for the above classes of HMDs are super-compelling.

    For movie-viewing: it just isn’t that often that I feel that I need more isolation than I can already get to watch movies. A computer monitor in a dark room is just fine. I can also put things on a TV screen or a projector that I already have sitting around and generally don’t bother to turn on. If I want to block out outside sound more, I might put on headphones, but I just don’t need more than that. Maybe for someone who is required to be in noisy, bright environments or something, but it just isn’t a real need for me.

    For HUD glasses, I don’t really have a need for more notifications in my field of vision — I don’t need to give my phone a HUD.

    AR could be interesting if the augmented reality software library actually existed, but in 2026, it really doesn’t. Today, AR glasses are mostly used, as best I can tell, as an attempt at a monitor replacement, but the angular pixel density on them is poor compared to conventional displays. Like, in terms of the actual data that I can shove into my eyeballs in the center of my visual field, which is what matters for things like text, I’m better off with conventional monitors in 2026.

    VR gaming could be interesting, but the benefits just aren’t that massive for the games that I play. You get a wider field of view than a traditional display offers, plus the ability to use your head as an input for camera control. There are some genres that I think it works well with today, like flight sims. If you were a really serious flight-simmer, I could see it making sense. But most genres just don’t benefit that much from it. Yeah, okay, you can play Tetris Effect: Connected in VR, but it doesn’t really change the game all that much.

    A lot of the VR-enabled titles out there are not (understandably, given the size of the market) really principally aimed at taking advantage of the goggles. You’re basically getting a port of a game aimed at probably a keyboard and mouse, with some tradeoffs.

    And for VR, one has to deal with more setup time, software and hardware issues, and the cost. I’m not terribly price sensitive on gaming compared to most, but if I’m getting a peripheral for, oh, say, $1k, I have to ask how seriously I’m going to play any of the games that I’m buying this hardware for. I have a HOTAS system with flight pedals; it mostly just gathers dust, because I don’t play many WW2 flight sims these days, and the flight sims out there today are mostly designed around thumbsticks. I don’t need to accumulate more dust-collectors like that. And with VR the hardware ages out pretty quickly. I can buy a conventional monitor today and it’ll still be more-or-less competitive for most uses probably ten or twenty years down the line. VR goggles? Not so much.

    At least for me, the main things that I think I’d actually get some good out of VR goggles on:

    • Vertical-orientation games. My current monitors are landscape aspect ratio and don’t support rotating (though I imagine that someone makes a rotating VESA mount pivot, and I could probably use wlr-randr to make Wayland change the display orientation manually; see the sketch just after this list). Some games in the past in arcades had something like a 3:4 portrait-mode aspect ratio. If you’re playing one of those, you could maybe get some extra vertical space. But unless I need the resolution or portability, I can likely achieve something like that by just moving my monitor closer while playing such a game.

    • Pinball sims, for the same reason.

    • There are a couple of VR-only games that I’d probably like to play (none very new).

    • Flight sims. I’m not really a super-hardcore flight simmer. But, sure, for WW2 flight sims or something like Elite: Dangerous, it’s probably nice.

    • I’d get a little more immersiveness out of some games that are VR-optional.
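    On the monitor-rotation aside above, the sketch I had in mind (the output name is a placeholder; it’ll be whatever wlr-randr lists for your setup):

    $ wlr-randr                                   # list outputs and current modes
    $ wlr-randr --output DP-1 --transform 90      # rotate that output into portrait
    $ wlr-randr --output DP-1 --transform normal  # and back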

    But…that’s just not that overwhelming a set of benefits to me.

    Now, I am not everyone. Maybe other people value other things. And I do think that it’s possible to have a “killer app” for VR, some new game that really takes advantage of VR and is so utterly compelling that people feel that they’d just have to get VR goggles so as to not miss out. Something like what World of Warcraft did for MMO gaming, say. But the VR gaming effort has been going on for something like a decade now, and nothing like that has really turned up.


  • Having a limited attack surface will reduce exposure.

    If the only thing that you’re exposing is, say, a WireGuard VPN, then unless there’s a misconfiguration or a remotely-exploitable bug in WireGuard, you’re fine as far as random people running exploit scanners go.
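    A quick way to sanity-check what you’re actually exposing, before the scanners find it for you (the address is a placeholder, and note that a default nmap run only probes TCP):

    $ # what's listening, and on which addresses (0.0.0.0 or [::] means everyone, firewall permitting)
    $ ss -tulpn
    $ # or probe your public address from outside, e.g. from a phone on mobile data
    $ nmap -Pn your.public.address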

    I’m not too worried about stuff like (vanilla) Apache, OpenSSH, WireGuard, stuff like that, the “big” stuff that has a lot of eyes on it. I’d be a lot more dubious about niche stuff that some guy just threw together.

    To put perspective on this, you gotta remember that most software that people run isn’t run in a sandbox, and it can phone home: games on Steam, your Web browser (with a lot of sites that might attack any bugs in it), plugins for that Web browser, some guy’s open-source project. Those are all potential vectors too. Sure, some random script kiddie running an exploit scanner is a potential risk, but my bet is that if you look at the actual number of compromises via that route, it’s probably rather lower than via plain old malware.

    It’s good to be aware of what you’re doing when you expose something to the Internet, but also to keep perspective. A lot of people out there run services exposed to the Internet every day; they need to do so to make things work.