CyberWriter
Apple has quietly shipped a pretty complete on-device AI stack into macOS, with these features first getting API access in MacOS 26. There are multiple components in the foundation model, but the skills it shipped with actually make this ~3b parameter model useful. The API to hit the model is super easy, and no one is really wiring them together yet. - Foundation Models (macOS 26) - a ~3B-parameter LLM with an API. Streaming, structured output, tool use. No API key, no cloud call, no per-token cost. - NLContextualEmbedding (Natural Language framework, macOS 14+) -- a BERT-style 512-dim text embedder. Exactly what OpenAI and Cohere sell, sitting in Apple's SDKs since iOS 17. - SFSpeechRecognizer / SpeechAnalyzer - on-device speech-to-text including live dictation. Solid accuracy on Apple Silicon. I built cyberWriter, a Markdown editor, on top of all three, mostly as a test and showcase to see what it can do. I actually integrated local and cloud AI first, and then Apple shipped the foundation model, it stacked on super easy, and now users with no local or API AI knowledge can use it with just a click or two. Well the real reason is because most markdown editors need plugins that run with full system access, and I work on health data and can't have that. Vault chat / semantic search. The app indexes your Markdown folder via NLContextualEmbedding (around 50 seconds for 1000 chunks on an M1). The search bar gets a "Related Ideas" section that matches by meaning - typing "orbital mechanics" surfaces notes about rockets and launch windows even when those exact words never appear. Ask the AI a question and it retrieves the top 5 chunks as context. Plain RAG, but the embedder, retrieval, chat model, and search all run locally. AI Workspace. Command+Shift+A opens a chat panel, Command+J triggers inline quick actions (rewrite, summarize, change tone, fix grammar, continue). Apple Intelligence is the default; Claude, OpenAI, Ollama, and LM Studio all work if you prefer. The same context layer - document selection, attached files, retrieved vault chunks - feeds every provider through the same system-message path. Because the vault context is file and filename aware, it can create backlinks to the referenced file if it writes or edits a doc for you. Voice notes and dictation. Record a voice note directly into your doc, transcribe it with SpeechAnalyzer, or just dictate into the editor while you think. Audio never leaves the Mac. The privacy story is straightforward because the primitives are already private. Vectors live in a `.vault.embeddings.json` file next to your vault, never sent anywhere. If you use Apple Intelligence, even the retrieved text stays on-device. For cloud models there is a clear toggle and an inline warning before any filenames or snippets leave the machine. Honest limitations: - 512-dim embeddings are solid mid-tier. A GPT-4-class embedder catches subtler relationships this will miss. - 256-token chunks can split long paragraphs mid-argument. - Foundation Models caps its context window around 6K characters, so vault context is budgeted to 3K with truncation markers on the rest. - Multilingual support is English-only right now. NLContextualEmbedding has Latin, Cyrillic, and CJK model variants; wiring the language detector across chunks is Phase 2. The developer experience for these APIs is genuinely good. Foundation Models streams cleanly, NLContextualEmbedding downloads assets on demand and gives you mean-poolable token vectors in a handful of lines. Curious what others here are building on this stack - feels like low-hanging fruit that has been sitting there for a while. https://imgur.com/a/HyhHLv2 The Apple AI embedding feature is going live today. I'm honestly surprised it even works out of the box.
AI Analysis
Analysis coming soon.
Similar Products
Command Center, the AI coding env for people who care about quality
Hi HN! We’re Jimmy and Ray. Jimmy is a Thiel Fellow with a Ph. D. from MIT who has worked on programming tools for 15 years; Ray became VP of Sales at a $2B company when he was 19 and has built side-businesses vibe-coding. Last year, we set to answer the question “If AI can write code 100x faster, then why aren’t you shipping 100x faster?” What we learned shocked us — even fairly nontechnical people and solo founders told us they were spending more than half of their development time reading the AI-written code. And much of the rest of the time was spent either de-slop-ping it, or wishing they had done so. As luck turns out, our last two products were a tool that quickly onboards people to large codebases ( https://x.com/0xjimmyk/status/1873357324229984677 ) and trainings that taught deep concepts of code quality to CEOs, YC founders, and engineers at top companies ( mirdin.com ), so we were extremely well-positioned to solve these problems. Command Center is an agentic coding environment focused on quality. With a few keypresses, you can start building 3 features at once and soon have 3 diffs ready, each consisting of 2000 changed lines across 50 files…. This is normally the point where you think “Crap, what now?” With Command Center, at this point you simply click “Refactor,” and watch the vibed slop turn into readable robustness. Then you click “Generate Walkthrough,” and then suddenly, to read a 2000 line diff, instead of scrolling up and down trying to make sense of it, you just press the right arrow key 200 times. See something you don’t like? Click on line 37, type “Do this and all other network fetches in the background Cmd+Enter,” and you have a few more agents getting your code into final shape. Click or type “Commit,” “Push,” “Create PR” — you just shipped a high quality, non-slop feature We’re striving to be the best at every step of the pipeline, but can just try Command Center in pieces wherever you feel your current workflow is weakest. We have users who do all their coding in Zed or the Codex app, and then jump over to Command Center for a walkthrough when it finishes running. There’s even a skill that will pop open a Command Center walkthrough from the environment of your choice. Or you can just keep Command Center running while you do your work elsewhere, and if your AI deletes anything, you have Command Center’s snapshots to the rescue. We launched quietly last year and have been refining since. The quality and usability have kept going up, and Command Center is now ready for a lot more attention. Since our quiet launch, we’ve seen at least a dozen other agentic coding environments appear….approximately all of which have the same feature set focused on the part which is already easy (generating the first version of the code) and with at best a shoddy answer to the hard part (everything that comes after). Command Center’s focus is making the hard parts easy. Here’s what our users have to say: “[The refactorings] give your LLM taste. I’ve never seen an LLM write code this good before.” — Doug Slater, Staff Engineer, Climavision “With Command Center walkthroughs, I can get through a 400-line diff in less than half the time.” — Prateek Kumar, Platfor Engineer, Sumo Logic This product is not for everyone. If you’re someone who preaches “the prompt is the source, the code is the compiler output,” then you probably won’t enjoy Command Center. But if you want to uphold traditional engineering discipline while also shipping 20 PRs a day, then this is the environment for you.
A Highly Available Distributed Router for Global Realtime AI
Show HN: A Highly Available Distributed Router for Global Realtime AI
Rayline routes Claude Code subagents to on-device and cheaper models
Hi HN, I’m one of the builders of Rayline. Rayline is a Claude Code compatible LLM gateway. It intercepts and overrides claude code’s internal routing and lets you route subagent calls to different models instead. For example, you can run the main agent on Opus, some subagents on cloud-hosted open models, and other subagents on-device. We’ve seen others implement routing for claude code as tools the agent can invoke. In our experience, that doesn’t work well because it requires the main agent to use tokens to think about + call the tools, and LLMs are generally a very inefficient way to make routing decisions. By implementing Rayline as a gateway, we let users deterministically configure routing decisions, and you can optionally use our ML model to make routing decisions. We built it after noticing that Claude Code sessions contain a lot of subagent calls that don’t all need the same model. Other routers exist, but we built Rayline to let us continue using claude code (no separate harness), route tasks at a subagent level, and route across cloud and on-device. The main agent often benefits from Opus. But many delegated calls have narrow scope: search the repo, summarize context, inspect an error, poll for CI updates, etc. The thing we’re exploring is subagent-level routing. The main cost lever in coding agents is usually cached vs non-cached input. Subagent delegations are a natural point to make routing decisions because you avoid busting cache. We look at the message-thread context for a delegated call and choose a model for that call. At a task level, Sonnet and Haiku are almost always less capability-per-dollar than open models, so the main advantage is better + (much) cheaper subagents (60-90% in our private beta). The whole world seems to have started talking about model routing in the past two weeks, so apparently others agree it’s a relevant product area. We’d love to get feedback from the HN community!
DomainTasker
Show HN: DomainTasker – avoid losing domains and surprise renewals
Uruky (EU-based Kagi alternative) now has Image Search and URL Rewrites
You can get a 2h free trial by solving a proof-of-work captcha when topping up your account for the first time. If you'd like to learn more, an independent interview was posted a couple of weeks ago [1], and the FAQ [2] has a lot of information as well. For the source code sharing, we've talked with lawyers and are inclined to no longer require the NDA/NCC for privacy reasons shared with us before (signing requires identification), but instead use a source-available permissive license that doesn't allow competition, like PolyForm Shield [3] (we do still have about 6 months before finalising a decision, here). This does come with a lot more risks for us (it's harder to track down if someone publishes the code or uses it against the license), but given we've already passed 100 monthly active accounts, we're feeling more confident it's an acceptable risk. The plan is to give logged in accounts (who are 12 months old or more) a way to download a ZIP of the current code base that's in the server. Obviously there's no easy way to prove that's the case, but we're open to ideas/suggestions if someone here has them. [1]: https://theprivacydad.com/interview-with-the-engineer-of-uru... [2]: https://uruky.com/faq [3]: https://polyformproject.org/licenses/shield/1.0.0