Control your X/Twitter feed using a small on-device LLM
We built a Chrome extension and iOS app that filters Twitter's feed using Qwen3.5-4B for contextual matching. You describe what you don't want in plain language, and it removes posts that match semantically, not by keyword. What surprised us: because Twitter's ranking algorithm adapts to what you engage with, consistent filtering starts reshaping the recommendations over time. You're implicitly signaling preferences to the algorithm. For some of us it "healed" our feed. We currently run inference on our own servers, with an experimental on-device option, and we're working on fully on-device execution to remove that dependency. Latency is acceptable on most hardware but not great on older machines. No data collection: everything except the model call runs locally. It doesn't work perfectly (figurative language trips it up), but it's meaningfully better than muting keywords, and we use it ourselves every day. It's also promising that, as capability density improves, local and open models can start giving us more control over the algorithmic agents in our lives.
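A minimal sketch of the difference between semantic and keyword filtering: each post is checked against each plain-language rule by asking a model, not by substring matching. The post does not describe the extension's prompt or API, so `ask_model` is a hypothetical callable (it could wrap Qwen3.5-4B locally or a server endpoint) and `PROMPT_TEMPLATE` is an assumed prompt, not the product's.

```python
from typing import Callable

# Hypothetical prompt; the extension's actual prompt is not described in the post.
PROMPT_TEMPLATE = (
    "Filter rule (hide posts like this): {rule}\n"
    "Post: {post}\n"
    "Does the post match the rule by meaning, not just by keywords? "
    "Answer YES or NO."
)


def should_hide(post: str, rules: list[str], ask_model: Callable[[str], str]) -> bool:
    """Return True if the model says the post matches any plain-language rule."""
    for rule in rules:
        reply = ask_model(PROMPT_TEMPLATE.format(rule=rule, post=post))
        # Be lenient about casing and surrounding whitespace in the answer.
        if reply.strip().upper().startswith("YES"):
            return True
    return False


def filter_feed(posts: list[str], rules: list[str], ask_model: Callable[[str], str]) -> list[str]:
    """Keep only the posts that no rule asks to hide."""
    return [post for post in posts if not should_hide(post, rules, ask_model)]
```

This makes one model call per (post, rule) pair, which is simple but is also where latency on older hardware would come from; batching several rules into one prompt is an obvious alternative.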
Keyboard-First Email Client
My email clients/inbox really fu*king annoyed me. Tallyman is what happened next: a keyboard-driven email client on top of Gmail and Outlook. Your vim muscle memory works (j/k, gg, relative line numbers, counts, ...), plus 39 rebindable shortcuts, a command palette, email templates, themes, and more. No migration. OAuth only. Verified by Microsoft and live now; Google verification is under review. 30-day free trial, then $9/mo per inbox. Write me an email if you need an extended trial: contact@tallyman.io
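To illustrate what "counts" and relative line numbers mean for an inbox list, here is a hypothetical sketch of vim-style motion handling over a list of messages. It is not Tallyman's code; `apply_motion` and `relative_labels` are illustrative names, and only `j`, `k`, `gg` and `G` with an optional count are supported.

```python
import re

# Hypothetical grammar: an optional count (no leading zero), then one motion.
_KEYSEQ = re.compile(r"^([1-9][0-9]*)?(j|k|gg|G)$")


def apply_motion(keys: str, cursor: int, total: int) -> int:
    """Return the new 0-based cursor after a vim-style key sequence.

    "5j" moves down five rows, "k" up one, "gg" to the top ("3gg" to row 3),
    and "G" to the bottom ("3G" to row 3).
    """
    match = _KEYSEQ.match(keys)
    if match is None:
        raise ValueError(f"unsupported key sequence: {keys!r}")
    count_str, motion = match.groups()
    count = int(count_str) if count_str else None
    last = total - 1
    if motion == "j":
        target = cursor + (count or 1)
    elif motion == "k":
        target = cursor - (count or 1)
    elif motion == "gg":
        target = count - 1 if count else 0
    else:  # "G"
        target = count - 1 if count else last
    # Clamp to the list bounds, as vim does at the edges of a buffer.
    return max(0, min(target, last))


def relative_labels(cursor: int, total: int) -> list[int]:
    """Distance of each row from the cursor, as vim's 'relativenumber' shows it."""
    return [abs(i - cursor) for i in range(total)]
```

The relative labels are what make counts practical: the number next to a message is exactly the count to type before `j` or `k` to reach it.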
boringBar
Hi HN! I recently switched from a Fedora/GNOME laptop to a MacBook Air. My old setup served me well as a portable workstation, but I’ve started traveling more while working remotely and needed something with similar performance but better battery life. The main thing I missed was a simple taskbar that shows the windows in the current workspace instead of a Dock that mixes everything together. I built boringBar so I would not have to use the Dock. It shows only the windows in the current Space, lets you switch Spaces by scrolling on the bar, and adds a desktop switcher so you can jump directly to any Space. You can also hide the system Dock, pin apps, preview windows with thumbnails, and launch apps from a searchable menu (I keep Spotlight disabled because for some reason it uses a lot of system resources on my machine). I’ve been dogfooding it for a few months now, and it finally felt polished enough to share. It’s for people who like macOS but want window management to feel a bit more like GNOME, Windows, or a traditional taskbar. It’s also for people like me who wanted an easier transition to macOS, especially now that Windows feels increasingly user-hostile. I’d love feedback on the UX, bugs, and whether this solves the same Dock/Spaces pain for anyone else. P.S. It might also appeal to people who feel nostalgic for the GNOME 2 desktop of yore. I started my Linux journey with it, and boringBar brings back some of that feeling for me.
ParseBench
Show HN: ParseBench – Document parsing benchmark for AI agents
Kelet
I've spent the past few years building 50+ AI agents in prod (some reached 1M+ sessions/day), and the hardest part was never building them; it was figuring out why they fail. AI agents don't crash. They just quietly give wrong answers. You end up scrolling through traces one by one, trying to find a pattern across hundreds of sessions. Kelet automates that investigation. Here's how it works:
1. You connect your traces and signals (user feedback, edits, clicks, sentiment, LLM-as-a-judge, etc.)
2. Kelet processes those signals and extracts facts about each session
3. It forms hypotheses about what went wrong in each case
4. It clusters similar hypotheses across sessions and investigates them together
5. It surfaces a root cause with a suggested fix you can review and apply
The key insight: individual session failures look random, but when you cluster the hypotheses, failure patterns emerge. The fastest way to integrate is through the Kelet Skill for coding agents: it scans your codebase, discovers where signals should be collected, and sets everything up for you. There are also Python and TypeScript SDKs if you prefer manual setup. It's currently free during beta, no credit card required. Docs: https://kelet.ai/docs/ I'd love feedback on the approach, especially from anyone running agents in prod. Does automating the manual error analysis sound right?
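Step 4 is the crux of the "key insight", so here is a toy sketch of clustering per-session hypotheses so that recurring failures rise to the top. The post doesn't say how Kelet measures similarity; this sketch swaps in word-set Jaccard overlap and greedy single-pass clustering purely for illustration (a real system would more likely compare embeddings). All names and the 0.5 threshold are assumptions.

```python
def tokens(text: str) -> frozenset[str]:
    """Lower-cased word set; a crude stand-in for a real text embedding."""
    return frozenset(text.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Overlap of two word sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_hypotheses(
    hypotheses: list[tuple[str, str]], threshold: float = 0.5
) -> list[list[tuple[str, str]]]:
    """Greedily group (session_id, hypothesis) pairs by word overlap.

    Each hypothesis joins the first cluster whose seed is similar enough,
    otherwise it starts a new cluster. Largest clusters come first, since
    those are the failure patterns that recur across sessions.
    """
    clusters: list[tuple[frozenset[str], list[tuple[str, str]]]] = []
    for session_id, text in hypotheses:
        words = tokens(text)
        for seed, members in clusters:
            if jaccard(seed, words) >= threshold:
                members.append((session_id, text))
                break
        else:
            clusters.append((words, [(session_id, text)]))
    return sorted((members for _, members in clusters), key=len, reverse=True)
```

For example, three sessions hypothesized as "retriever returned outdated pricing doc/page" land in one cluster, while a lone "model ignored user language preference" stays a singleton: one looks like a pattern worth a root-cause investigation, the other like noise.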
Ilha
Show HN: Ilha – a UI library that fits in an AI context window
CyberWriter
Apple has quietly shipped a pretty complete on-device AI stack into macOS, with the foundation model first exposed through an API in macOS 26. There are multiple components, and the capabilities shipped alongside the foundation model are what actually make this ~3B-parameter model useful. The API for calling the model is super easy, and almost no one is wiring these pieces together yet.
- Foundation Models (macOS 26): a ~3B-parameter LLM with an API. Streaming, structured output, tool use. No API key, no cloud call, no per-token cost.
- NLContextualEmbedding (Natural Language framework, macOS 14+): a BERT-style 512-dim text embedder. The same kind of embedding OpenAI and Cohere sell, sitting in Apple's SDKs since iOS 17.
- SFSpeechRecognizer / SpeechAnalyzer: on-device speech-to-text, including live dictation. Solid accuracy on Apple Silicon.
I built cyberWriter, a Markdown editor, on top of all three, mostly as a test and showcase of what they can do. I actually integrated local and cloud AI first; when Apple shipped the foundation model, it stacked on easily, and now users with no local or API AI knowledge can use it with just a click or two. The real reason, though, is that most Markdown editors need plugins that run with full system access, and I work with health data and can't have that.
Vault chat / semantic search. The app indexes your Markdown folder via NLContextualEmbedding (around 50 seconds for 1,000 chunks on an M1). The search bar gets a "Related Ideas" section that matches by meaning: typing "orbital mechanics" surfaces notes about rockets and launch windows even when those exact words never appear. Ask the AI a question and it retrieves the top 5 chunks as context. Plain RAG, but the embedder, retrieval, chat model, and search all run locally.
AI Workspace. Command+Shift+A opens a chat panel, and Command+J triggers inline quick actions (rewrite, summarize, change tone, fix grammar, continue). Apple Intelligence is the default; Claude, OpenAI, Ollama, and LM Studio all work if you prefer. The same context layer (document selection, attached files, retrieved vault chunks) feeds every provider through the same system-message path. Because the vault context is file- and filename-aware, the AI can create backlinks to the referenced file when it writes or edits a doc for you.
Voice notes and dictation. Record a voice note directly into your doc, transcribe it with SpeechAnalyzer, or just dictate into the editor while you think. Audio never leaves the Mac.
The privacy story is straightforward because the primitives are already private. Vectors live in a `.vault.embeddings.json` file next to your vault and are never sent anywhere. If you use Apple Intelligence, even the retrieved text stays on-device. For cloud models, there is a clear toggle and an inline warning before any filenames or snippets leave the machine.
Honest limitations:
- 512-dim embeddings are solid mid-tier. A GPT-4-class embedder catches subtler relationships this will miss.
- 256-token chunks can split long paragraphs mid-argument.
- Foundation Models caps its context window at around 6K characters, so vault context is budgeted to 3K characters, with truncation markers on the rest.
- Language support is English-only right now. NLContextualEmbedding has Latin, Cyrillic, and CJK model variants; wiring the language detector across chunks is Phase 2.
The developer experience for these APIs is genuinely good. Foundation Models streams cleanly, and NLContextualEmbedding downloads assets on demand and gives you mean-poolable token vectors in a handful of lines. Curious what others here are building on this stack; it feels like low-hanging fruit that has been sitting there for a while. https://imgur.com/a/HyhHLv2 The Apple AI embedding feature is going live today. I'm honestly surprised it even works out of the box.
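The local RAG path described above (mean-pool token vectors into one vector per chunk, rank chunks by similarity to the query, keep the top 5, fit them into a ~3K-character budget) can be sketched language-agnostically. This is a hypothetical Python illustration, not cyberWriter's Swift code: the token vectors would come from NLContextualEmbedding, the `chunks` dict stands in for whatever is stored in `.vault.embeddings.json`, and cosine similarity, the separator, and the marker text are assumptions.

```python
import math


def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Average per-token vectors into one chunk vector (mean pooling)."""
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 5) -> list[str]:
    """IDs of the k chunks most similar to the query, best first."""
    ranked = sorted(chunks, key=lambda cid: cosine(query, chunks[cid]), reverse=True)
    return ranked[:k]


def pack_context(texts: list[str], budget: int = 3000, sep: str = "\n\n",
                 marker: str = "[...truncated]") -> str:
    """Join ranked chunk texts up to `budget` characters, marking any cut."""
    if budget < len(marker):
        raise ValueError("budget must leave room for the truncation marker")
    out = ""
    for text in texts:
        candidate = out + sep + text if out else text
        if len(candidate) <= budget:
            out = candidate
            continue
        # Over budget: keep what fits so that text + marker is exactly `budget` chars.
        out = candidate[: budget - len(marker)] + marker
        break
    return out
```

Packing in rank order means that when the budget runs out, it is the least relevant chunks that get cut.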
Daemons
For almost two years, we've been developing Charlie, a coding agent that is autonomous, cloud-based, and focused primarily on TypeScript development. During that time, the explosion in growth and development of LLMs and agents has surpassed even our initially very bullish prognosis. When we started Charlie, we were one of the only teams we knew that relied fully on agents to build all of our code. We all know how that has gone: the world has caught up, but working with agents hasn't been all kittens and rainbows, especially for fast-moving teams. The one thing we've noticed over the last 3 months is that the more you use agents, the more work they create. Dozens of pull requests mean older code gets out of date quickly. Documentation drifts. Dependencies become stale. Developers are so focused on pushing out new code that this crucial work falls through the cracks. That's why we pivoted away from agents and invented what we think is the necessary next step for AI-powered software development. Today, we're introducing Daemons: a new product category built for teams dealing with operational drag from agent-created output. Named after the familiar background processes from Linux, Daemons are set up by adding a .md file to your repo, and they run in a set-it-and-forget-it way that will make your lives easier and accelerate any project. For teams that use Claude, Codex, Cursor, Cline, or any other agent, we think you'll really enjoy what Daemons bring to the table.
AthleteData
I'm a triathlete, and the data for my training is spread across several apps: Garmin, Strava, WHOOP, Intervals.icu, Wahoo, Withings, Apple Health, and sometimes Hevy. Every morning I'd eyeball a few of them and make a call on whether to do the planned session. For the past month I've been building a thing that does this for me, and I've gotten it to the point where I use it myself every day. It connects over OAuth to whatever platforms you link, reconciles the activities (tbh harder than it sounds: the same ride shows up in Strava, Garmin, and Wahoo with different timestamps and rounding), computes daily load and readiness, and proactively messages you over Telegram or WhatsApp when something matters. The stack is straightforward: TypeScript all the way, Postgres, and an agent loop running on Claude (via Bedrock) with tool access to all your data plus my computed metrics: zones, CTL/ATL/TSB, power/pace curves, anomaly detection on HRV and RHR, etc. Two things were harder than expected:
1. Garmin's API only exposes the last 90 days. So for anyone with Garmin as their primary device, you have to backfill from Strava and stitch the two together. Strava has full history but misses some fields (e.g. only HR-based TSS, no power). Wahoo and Intervals.icu fill different gaps. The dedup pipeline is ugly and I'd welcome feedback from anyone who has solved this better.
2. Deciding when to message vs. stay silent is entirely a product problem. Too chatty -> muted. Too quiet -> feels dead.
One honest caveat, though: there's no RCT data, and I'd be skeptical of anyone who claims to have it for AI coaching at this stage. I'm at ~50 paying users, and I personally reach out to every user to shape the next iterations of the product based on their feedback. I've already gotten testimonials from Ironman World Championship finishers and other pro athletes. There's also a $9/mo MCP tier for people who would rather pipe their data into their own Claude/ChatGPT. Happy to go deep on any topic, e.g. the tool-calling architecture or the cost-per-user question (running an agent on every athlete daily is not free, and the margins here are worth discussing).
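For readers unfamiliar with CTL/ATL/TSB: in the common Performance Management Chart formulation, CTL ("fitness") and ATL ("fatigue") are exponentially weighted averages of daily TSS with roughly 42- and 7-day time constants, and TSB ("form") is their difference. The post doesn't say exactly how AthleteData computes them, so this is a sketch of that standard formulation, not necessarily theirs; the function name and starting values are assumptions.

```python
def training_load(daily_tss: list[float], ctl0: float = 0.0, atl0: float = 0.0,
                  ctl_days: float = 42.0, atl_days: float = 7.0) -> list[tuple[float, float, float]]:
    """Return (CTL, ATL, TSB) for each day of a daily TSS series.

    CTL and ATL are exponentially weighted averages of daily TSS with
    ctl_days and atl_days time constants; TSB is CTL - ATL.
    """
    ctl, atl = ctl0, atl0
    rows = []
    for tss in daily_tss:
        # Move each average a fraction of the way toward today's load.
        ctl += (tss - ctl) / ctl_days
        atl += (tss - atl) / atl_days
        # Same-day TSB; some tools report the previous day's CTL - ATL instead.
        rows.append((ctl, atl, ctl - atl))
    return rows
```

Because ATL reacts six times faster than CTL, a hard block drives TSB negative and a taper brings it back up, which is the kind of signal a "do the planned session or not" decision can key off. Days with no activity should be passed in as 0 TSS so that both averages decay.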
I built a toy that plays grandma's stories when my daughter hugs it
I built this project as my daughter's first birthday present. For context, I'm a surgical resident in the UK by background and am currently taking a year out of training to study for a master's in computer science. My daughter just turned one. There are two things she really loves: the first is a particular soft toy that she just can't live without, and the other is a good storybook. Her grandparents live hours away and I didn't want her to forget what they sound like between visits. I wanted her to hear them whenever she missed them. My parents brought my brother and me up with incredible stories and books from all sorts of cultures, many of the stories passed down from their parents before them. I didn't want my daughter to miss out on that. Finally, I was sick of missing storytime with her when I had to leave for night shifts. I wanted her to hear my voice before she slept every night. For all these reasons, I decided to build Storyfriend. It's her favourite soft toy with a custom-made speaker module inside. I combined my surgical skills with the skills I was learning as a CS student. Along the way I dipped my toes into the world of 3D printing, CAD and electronics design. When she hugs the toy, it plays stories read by her grandparents. She can take the toy with her anywhere and hear the stories anytime she wants: it works offline and has internal storage. It meets my wife's strict no-screen rule (which is getting harder to stick to as the days go by). I've recorded some of the stories that we would read together, so that on nights when I'm working she still has me there to read her a bedtime story. The bit I'm most pleased with: grandparents don't need an app. They just call a phone number. The audio routes through my server and pushes to the toy over WiFi.
My own 86-year-old grandmother in a rural village in another country can do it just by making a regular call from her landline, as she has done for many years: no help needed, no apps required, no smartphones involved. The hardware is a BLE/WiFi module with a MAX98357 chip and a custom battery management system, all soldered together and housed in a 3D-printed enclosure that sits in a compartment I stitched into her cuddly toy. The firmware pulls new messages when connected to WiFi and stores them on an SD card. So far I've sold a few hand-made units to parents and grandparents who resonated with the project. Site: https://storyfriend.co.uk Would love feedback on the technical approach, the product itself, or anything else. Happy to answer questions about the build.
Atomic
Show HN: Atomic – Local-first, AI-augmented personal knowledge base
LLMs consume 5.4x less mobile energy than ad-supported web search
The standard AI energy debate compares server-side LLM inference to a server-side Google query. I think this misses most of what actually happens on a mobile device during a real search session. I built a parametric model of the full end-to-end mobile search session: 4G/5G radio energy, SoC rendering cost for a 2.5MB page, programmatic advertising RTB auctions running in the background, and network transmission costs for both sides. I then compared it to an equivalent LLM session. Main finding across 10,000 Monte Carlo draws: on mobile, a standard LLM session uses on average 5.4x less energy than a classic ad-supported web search session. Programmatic advertising alone accounts for up to 41% of device battery drain per session. Caveats I tried to be explicit about:
- The advantage disappears on fixed Wi-Fi/fiber
- It reverses for reasoning models
- It's a parametric model, not an empirical device measurement. Greenspector has offered to run on-device measurements for v2
- Jevons paradox applies
SSRN working paper, not peer-reviewed. Methodology and Monte Carlo distributions are fully documented in the paper. Happy to defend the assumptions. DOI: 10.2139/ssrn.6287918
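To show the shape of the Monte Carlo comparison, here is a minimal sketch: each session's energy is a sum of components (radio, rendering, ad auctions, transmission, inference), each drawn from its own distribution, and the ratio is averaged over many draws. The paper's actual distributions are documented there and not reproduced here. Triangular distributions, the function names, and "mean of per-draw ratios" (rather than ratio of means) are all assumptions of this sketch, and the component values in the test are arbitrary placeholders, not figures from the paper.

```python
import random
import statistics

# Each component maps to (low, mode, high) for a triangular distribution, in joules.
Components = dict[str, tuple[float, float, float]]


def draw_session(rng: random.Random, components: Components) -> float:
    """Sum one random draw from every energy component of a session."""
    return sum(rng.triangular(low, high, mode) for low, mode, high in components.values())


def energy_ratio(search: Components, llm: Components, draws: int = 10_000, seed: int = 0) -> float:
    """Mean over draws of (search-session energy / LLM-session energy)."""
    rng = random.Random(seed)
    ratios = [draw_session(rng, search) / draw_session(rng, llm) for _ in range(draws)]
    return statistics.mean(ratios)
```

Since the mean of ratios and the ratio of means can differ noticeably when the denominator varies a lot, which one a headline multiplier refers to is worth stating alongside the result.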