LLMs consume 5.4x less mobile energy than ad-supported web search
The standard AI energy debate compares server-side LLM inference to a server-side Google query. I think this misses most of what actually happens on a mobile device during a real search session.

I built a parametric model of the full end-to-end mobile search session: 4G/5G radio energy, SoC rendering cost for a 2.5MB page, programmatic advertising RTB auctions running in the background, and network transmission costs for both sides. I then compared it to an equivalent LLM session.

Main finding across 10,000 Monte Carlo draws: on mobile, a standard LLM session uses on average 5.4x less energy than a classic ad-supported web search session. Programmatic advertising alone accounts for up to 41% of device battery drain per session.

Caveats I tried to be explicit about:
- Advantage disappears on fixed Wi-Fi/fiber
- Reverses for reasoning models
- Parametric model, not empirical device measurement. Greenspector has offered to run on-device (terminal) measurements for v2
- Jevons paradox applies

SSRN working paper, not peer-reviewed. Methodology and Monte Carlo distributions are fully documented in the paper. Happy to defend the assumptions.

DOI: 10.2139/ssrn.6287918
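For readers who want a feel for what a parametric Monte Carlo comparison of this kind looks like, here is a minimal sketch. It is not the paper's model: the component names mirror the ones listed above, but every numeric range, the choice of uniform distributions, the independence of components, and the helper names `draw_session` and `simulate_ratio` are assumptions made up for illustration. The numbers it prints say nothing about the 5.4x result.

```python
import random
import statistics

# ILLUSTRATIVE PLACEHOLDERS ONLY: these ranges are invented for the sketch
# and are not taken from the paper. Units: joules per session.
WEB_SEARCH_RANGES = {
    "radio": (4.0, 12.0),      # 4G/5G radio energy
    "rendering": (2.0, 6.0),   # SoC cost of rendering a heavy page
    "ads_rtb": (2.0, 8.0),     # programmatic ad auctions in the background
    "network": (1.0, 4.0),     # network transmission
    "server": (0.1, 0.5),      # server-side query
}
LLM_RANGES = {
    "radio": (0.5, 2.0),
    "rendering": (0.2, 0.8),
    "network": (0.1, 0.5),
    "server": (1.0, 4.0),      # server-side inference
}


def draw_session(ranges, rng):
    """Sample one session's total energy as a sum of independent uniform draws."""
    return sum(rng.uniform(low, high) for low, high in ranges.values())


def simulate_ratio(web_ranges, llm_ranges, draws=10_000, seed=0):
    """Return per-draw ratios of web-search energy to LLM-session energy."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [
        draw_session(web_ranges, rng) / draw_session(llm_ranges, rng)
        for _ in range(draws)
    ]


ratios = simulate_ratio(WEB_SEARCH_RANGES, LLM_RANGES)
summary = {
    "mean": statistics.fmean(ratios),
    "median": statistics.median(ratios),
}
print(summary)
```

Whether "on average" means the mean of per-draw ratios (as here) or the ratio of mean energies is a modelling choice; the sketch picks the former only for simplicity.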
Similar products
Command Center, the AI coding env for people who care about quality
Hi HN! We’re Jimmy and Ray. Jimmy is a Thiel Fellow with a Ph.D. from MIT who has worked on programming tools for 15 years; Ray became VP of Sales at a $2B company when he was 19 and has built side businesses by vibe-coding.

Last year, we set out to answer the question “If AI can write code 100x faster, then why aren’t you shipping 100x faster?” What we learned shocked us: even fairly nontechnical people and solo founders told us they were spending more than half of their development time reading the AI-written code. Much of the rest of the time was spent either de-slopping it or wishing they had done so.

As luck would have it, our last two products were a tool that quickly onboards people to large codebases ( https://x.com/0xjimmyk/status/1873357324229984677 ) and trainings that taught deep concepts of code quality to CEOs, YC founders, and engineers at top companies ( mirdin.com ), so we were extremely well-positioned to solve these problems.

Command Center is an agentic coding environment focused on quality. With a few keypresses, you can start building 3 features at once and soon have 3 diffs ready, each consisting of 2000 changed lines across 50 files... This is normally the point where you think “Crap, what now?” With Command Center, at this point you simply click “Refactor” and watch the vibed slop turn into readable robustness. Then you click “Generate Walkthrough,” and to read a 2000-line diff, instead of scrolling up and down trying to make sense of it, you just press the right arrow key 200 times. See something you don’t like? Click on line 37, type “Do this and all other network fetches in the background,” press Cmd+Enter, and you have a few more agents getting your code into final shape. Click or type “Commit,” “Push,” “Create PR,” and you have just shipped a high-quality, non-slop feature.

We’re striving to be the best at every step of the pipeline, but you can just try Command Center in pieces wherever you feel your current workflow is weakest. We have users who do all their coding in Zed or the Codex app, and then jump over to Command Center for a walkthrough when it finishes running. There’s even a skill that will pop open a Command Center walkthrough from the environment of your choice. Or you can just keep Command Center running while you do your work elsewhere, and if your AI deletes anything, you have Command Center’s snapshots to the rescue.

We launched quietly last year and have been refining since. The quality and usability have kept going up, and Command Center is now ready for a lot more attention. Since our quiet launch, we’ve seen at least a dozen other agentic coding environments appear, approximately all of which have the same feature set, focused on the part that is already easy (generating the first version of the code), with at best a shoddy answer to the hard part (everything that comes after). Command Center’s focus is making the hard parts easy.

Here’s what our users have to say:

“[The refactorings] give your LLM taste. I’ve never seen an LLM write code this good before.”
- Doug Slater, Staff Engineer, Climavision

“With Command Center walkthroughs, I can get through a 400-line diff in less than half the time.”
- Prateek Kumar, Platform Engineer, Sumo Logic

This product is not for everyone. If you’re someone who preaches “the prompt is the source, the code is the compiler output,” then you probably won’t enjoy Command Center. But if you want to uphold traditional engineering discipline while also shipping 20 PRs a day, then this is the environment for you.
A Highly Available Distributed Router for Global Realtime AI
Show HN: A Highly Available Distributed Router for Global Realtime AI
Rayline routes Claude Code subagents to on-device and cheaper models
Hi HN, I’m one of the builders of Rayline. Rayline is a Claude Code compatible LLM gateway. It intercepts and overrides Claude Code’s internal routing and lets you route subagent calls to different models instead. For example, you can run the main agent on Opus, some subagents on cloud-hosted open models, and other subagents on-device.

We’ve seen others implement routing for Claude Code as tools the agent can invoke. In our experience, that doesn’t work well because it requires the main agent to spend tokens thinking about and calling the tools, and LLMs are generally a very inefficient way to make routing decisions. By implementing Rayline as a gateway, we let users configure routing decisions deterministically, and you can optionally use our ML model to make routing decisions.

We built it after noticing that Claude Code sessions contain a lot of subagent calls that don’t all need the same model. Other routers exist, but we built Rayline to let us continue using Claude Code (no separate harness), route tasks at a subagent level, and route across cloud and on-device. The main agent often benefits from Opus. But many delegated calls have narrow scope: search the repo, summarize context, inspect an error, poll for CI updates, etc.

The thing we’re exploring is subagent-level routing. The main cost lever in coding agents is usually cached vs non-cached input. Subagent delegations are a natural point to make routing decisions because you avoid busting the cache. We look at the message-thread context for a delegated call and choose a model for that call. At a task level, Sonnet and Haiku almost always offer less capability per dollar than open models, so the main advantage is better and (much) cheaper subagents (60-90% cheaper in our private beta).

The whole world seems to have started talking about model routing in the past two weeks, so apparently others agree it’s a relevant product area. We’d love to get feedback from the HN community!
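To make "deterministic routing at the gateway" concrete, here is a minimal sketch of a first-match-wins rule table that picks a model per subagent call without asking an LLM. This is not Rayline's configuration format or API, which the post does not show: the `Rule` type, the `route` function, the subagent names, and the model identifiers are all hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Rule:
    pattern: str  # glob matched against a subagent name (hypothetical field)
    model: str    # target model identifier (hypothetical names)


# Hypothetical rules for the narrow-scope tasks mentioned in the post.
RULES = [
    Rule("repo-search*", "local/small-model"),
    Rule("summarize*", "cloud/open-model"),
    Rule("ci-poll*", "local/small-model"),
]
DEFAULT_MODEL = "opus"  # main agent and unmatched subagents stay here


def route(subagent_name, is_main_agent=False):
    """Pick a model for one call; the first matching rule wins."""
    if is_main_agent:
        return DEFAULT_MODEL
    for rule in RULES:
        if fnmatch(subagent_name, rule.pattern):
            return rule.model
    return DEFAULT_MODEL


print(route("repo-search-1"))
print(route("summarize-context"))
print(route("write-migration"))
```

Because the decision is a plain lookup, it costs no tokens and gives the same answer every time for the same call, which is the property the post contrasts with tool-based routing.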
DomainTasker
Show HN: DomainTasker – avoid losing domains and surprise renewals
Uruky (EU-based Kagi alternative) now has Image Search and URL Rewrites
You can get a 2h free trial by solving a proof-of-work captcha when topping up your account for the first time. If you'd like to learn more, an independent interview was posted a couple of weeks ago [1], and the FAQ [2] has a lot of information as well.

For the source code sharing, we've talked with lawyers and are inclined to no longer require the NDA/NCC, because of the privacy concerns shared with us before (signing requires identification). Instead we would use a permissive source-available license that doesn't allow competition, like PolyForm Shield [3] (we still have about 6 months before finalising a decision). This does come with a lot more risk for us (it's harder to track down whether someone publishes the code or uses it against the license), but given we've already passed 100 monthly active accounts, we're feeling more confident it's an acceptable risk.

The plan is to give logged-in accounts (that are 12 months old or more) a way to download a ZIP of the current code base that's on the server. Obviously there's no easy way to prove that the ZIP matches what is actually running, but we're open to ideas/suggestions if someone here has them.

[1]: https://theprivacydad.com/interview-with-the-engineer-of-uru...
[2]: https://uruky.com/faq
[3]: https://polyformproject.org/licenses/shield/1.0.0