Roaster
AI Tools
I submitted 316 AI-generated PRs to open source

Show HN: I submitted 316 AI-generated PRs to open source

Revenue: N/A
AI Tools
Voker

Hey HN, we're Alex and Tyler, co-founders of Voker.ai (https://voker.ai/), an agent analytics platform for AI product teams. Voker gives full visibility into what users are asking of your agents, and whether your agents are delivering, without having to dig through logs. Our main product is a lightweight SDK that is LLM-stack-agnostic and purpose-built for agent products (https://app.voker.ai/docs).

Agent engineers and AI product teams don’t have the right level of visibility into agent performance in production, which results in bad user experiences, churn, and hundreds of hours wasted on spot checks to find and debug issues with agent configurations.

Demo: https://www.tella.tv/video/vid_cmoukcsk1000i07jgb4j65u67/vie...

We recently conducted a survey of YC founders, and 90%+ of respondents said that the only way they know their agents are failing users in production is by hearing complaints from customers. They push a prompt change hoping that it fixes the problem and doesn’t break something somewhere else, and the cycle repeats.

We saw tons of observability and evals products popping up to try to address these problems, but we still felt like something was missing in the agent monitoring stack. Observability is good for debugging individual traces but is only accessible to engineers. Evals are good for testing known issues, but don't give insights into trends that teams don’t expect, so engineers are always playing catch-up. Traditional product analytics tools do a good job tracking clicks and pageviews across your product surface but weren’t built from the ground up for agent products. Knowing what users want out of agents, and whether the agent delivered, requires specific conversational intelligence / unstructured data processing techniques.

We came up with the agent analytics primitives of Intents, Corrections, and Resolutions to describe something pretty much all conversational agents have in common: a user will always come to an agent with an intent, the user might have to correct the agent on the way to getting their intent resolved, and hopefully every intent a user has is eventually resolved by the agent.

Voker processes LLM calls by automatically annotating individual conversations and picking out user intents and corrections. Voker takes these and uses LLMs and hierarchical text classification to create dynamic categories that give higher-level insights, so you don’t have to read individual conversations to know the main usage patterns across your users.

The most common substitute solution we’ve seen is uploading obs logs to Claude or ChatGPT and asking for summary insights. There are a few problems with this, mainly that LLMs aren’t good at math or data science, so you don’t get accurate or consistent statistics. It's highly likely that the LLM overfits to some insights and underfits to others. The LLM isn’t programmatically reading and classifying each individual session or interaction. This is why we don’t use LLMs for any of our core data engineering (processing events, calculating statistics), so the analytics we produce are consistent, reproducible, and accurate.

We have a publicly available, lightweight SDK that wraps LLM calls to OpenAI, Anthropic, and Gemini in Python and TypeScript. Voker handles the data engineering to turn raw data into usable analytics primitives and higher-level insights.

Free tier: 2,000 events / mo, requires email signup. Paid plans start at $80/mo with a 30-day free trial.

We'd love to hear how you're currently detecting trends, and if you try Voker, tell us what part of our analysis is valuable and what still feels missing. Thanks for reading, and we’re looking forward to your thoughts in the comments!
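To make the point about keeping statistics out of the LLM concrete, here is a minimal sketch of deterministic aggregation over already-annotated sessions. It is not Voker's SDK or schema: the `Session` record, its field names, and the `summarize` helper are all hypothetical, and the sample data is made up. The only claim is that plain arithmetic over labeled sessions gives the same numbers on every run.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    # Hypothetical annotated record; these field names are illustrative only.
    intent: str        # category label assigned by an upstream annotation step
    corrections: int   # how many times the user had to correct the agent
    resolved: bool     # whether the intent was eventually resolved


def summarize(sessions):
    """Aggregate per-intent session count, resolution rate, and mean corrections."""
    buckets = defaultdict(list)
    for s in sessions:
        buckets[s.intent].append(s)

    report = {}
    for intent, group in sorted(buckets.items()):
        n = len(group)
        report[intent] = {
            "sessions": n,
            "resolution_rate": sum(s.resolved for s in group) / n,
            "avg_corrections": sum(s.corrections for s in group) / n,
        }
    return report


# Made-up sample data for illustration.
sessions = [
    Session("refund", corrections=0, resolved=True),
    Session("refund", corrections=2, resolved=False),
    Session("export_csv", corrections=1, resolved=True),
]
report = summarize(sessions)
```

Because the counting happens in code and not in a prompt, re-running `summarize` on the same sessions cannot drift the way a free-form LLM summary might.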

Revenue: N/A
AI Tools
Safe-install

In light of the ongoing npm supply chain compromises, I built safe-install: https://www.npmjs.com/package/@gkiely/safe-install

It brings a couple of protections I wanted from npm that are not built in. Similar to Bun’s trusted dependencies, it lets you disable install scripts by default and define a list of dependencies that are allowed to run build/install scripts: https://bun.com/docs/guides/install/trusted

It also supports blocking exotic sub-dependencies, similar to pnpm’s `blockExoticSubdeps` setting: https://gajus.com/blog/3-pnpm-settings-to-protect-yourself-f...

I was hoping npm would eventually add something like this, but it does not seem to be happening soon, so I made a small package for it.
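The two protections can be sketched as checks over a resolved dependency list. This is only an illustration of the idea, not safe-install's implementation or config format: the package dict shape, the `audit` helper, and the prefix list used to decide what counts as "exotic" (git, URL, or local-file specs on transitive dependencies) are all assumptions.

```python
# Assumed definition of "exotic": a dependency that does not come from the registry.
EXOTIC_PREFIXES = ("git+", "git://", "github:", "http://", "https://", "file:")
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def is_exotic(spec):
    """True if the version spec points at a git repo, URL, or local file."""
    return spec.startswith(EXOTIC_PREFIXES)


def audit(packages, trusted):
    """Return (blocked_scripts, exotic_subdeps) for a list of package dicts."""
    blocked_scripts = []
    exotic_subdeps = []
    for pkg in packages:
        hooks = [h for h in INSTALL_HOOKS if h in pkg.get("scripts", {})]
        # Install scripts are off by default; only allowlisted names may run them.
        if hooks and pkg["name"] not in trusted:
            blocked_scripts.append((pkg["name"], hooks))
        # Only transitive (non-direct) dependencies are checked for exotic sources.
        if not pkg.get("direct", False) and is_exotic(pkg.get("spec", "")):
            exotic_subdeps.append(pkg["name"])
    return blocked_scripts, exotic_subdeps


# Hypothetical dependency tree, for illustration only.
packages = [
    {"name": "esbuild", "scripts": {"postinstall": "node install.js"},
     "spec": "^0.21.0", "direct": True},
    {"name": "left-pad-fork", "scripts": {"postinstall": "node setup.js"},
     "spec": "git+https://example.com/left-pad-fork.git", "direct": False},
    {"name": "tiny-util", "scripts": {}, "spec": "1.2.3", "direct": False},
]
blocked, exotic = audit(packages, trusted={"esbuild"})
```

Here the allowlisted package keeps its install script, while the transitive git dependency is flagged twice: once for an untrusted script and once for its source.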

Revenue: N/A
AI Tools
Free tool to see how much AI bots are costing your site

Show HN: Free tool to see how much AI bots are costing your site

Revenue: N/A
AI Tools
BattleClaws

Show HN: BattleClaws – A battle arena where AI agents fight autonomously

Revenue: N/A
AI Tools
Furwall

Furwall is a tiny macOS menu bar app. While you're at the keyboard or mouse, the FaceTime camera looks for a human face or upper body. When it doesn't find one, the keyboard stops accepting input. Cat walks across your laptop, nothing happens to your code.

Some notes:

Apple's Vision framework runs locally. Video is processed in memory and never uploaded.

On a block, Furwall saves one local JPEG to ~/.furwall/catpures/. A second Vision pass throws out anything that isn't a cat, so the daily count in the menu only reflects confirmed cats. There is now a folder on my disk that is slowly filling up with photos of Pepper and Beets walking across my keyboard.

The camera turns on only while you're at the computer (typing, mouse motion, app switch, screen wake) and powers down 30 seconds after the last activity. The green camera dot tracks that.

The keystroke drop uses a CGEventTap at .defaultTap. Furwall ships unsandboxed because of this. A .listenOnly tap with Input Monitoring is enough to see keys, but dropping them needs .defaultTap, which needs Accessibility, which the App Sandbox blocks. Watching keystrokes is sandbox-compatible; stopping them is not.

Mouse events are observed (to wake the camera) but never intercepted or dropped, so the menu bar always works.

Three escape hatches: click the icon and quit, mash Escape five times in 1.5 seconds for a 5-minute pause, or revoke Accessibility in System Settings (macOS invalidates the tap). If Vision stalls for any reason, the keyboard fails open after 10 seconds, which is better than soft-bricking the machine.

Furwall never uploads camera frames or keystrokes. Its own network traffic is Sparkle update checks plus the donate sheet's anonymous totals/click counter: one short charity slug per click, no user identifier.

The donate item in the menu opens the donate page of a vetted animal-welfare charity for your system region. Eleven orgs across nine regions: Alley Cat Allies and PetSmart Charities in the US, Cats Protection in the UK, Cat Protection Society NSW in Australia, Toronto Cat Rescue in Canada, NSPCA in Ireland, SPCA in New Zealand, Deutscher Tierschutzbund in Germany, La SPA and Fondation 30 Millions d'Amis in France, Japan SPCA in Japan. Each org is registered or recognized under its local charity or nonprofit regime, and the list gets re-vetted every release. No money flows through the app.

macOS 15+, signed and notarized, MIT. https://olliewagner.com/furwall
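The Escape-mash hatch and the fail-open rule are both small pieces of timing logic, sketched below. This is not Furwall's code (the app is a native macOS program; Python is used here only to show the logic), and the class name, function name, and parameters are hypothetical. The thresholds (five presses, 1.5 seconds, 10 seconds) come from the description above.

```python
from collections import deque


class EscapeMashDetector:
    """Sliding-window counter: fires when `presses` Escapes land within `window` seconds."""

    def __init__(self, presses=5, window=1.5):
        self.presses = presses
        self.window = window
        self._times = deque()

    def on_escape(self, now):
        """Record an Escape press at time `now` (seconds); return True if the hatch fires."""
        self._times.append(now)
        # Drop presses that have fallen out of the time window.
        while self._times and now - self._times[0] > self.window:
            self._times.popleft()
        if len(self._times) >= self.presses:
            self._times.clear()  # start fresh after firing
            return True
        return False


def should_block(face_seen, last_vision_result_at, now, stall_timeout=10.0):
    """Block keys only when no face is seen AND Vision is still reporting.

    If Vision has been silent for `stall_timeout` seconds, fail open (never block).
    """
    if now - last_vision_result_at >= stall_timeout:
        return False
    return not face_seen


fast = EscapeMashDetector()
fast_results = [fast.on_escape(t) for t in (0.0, 0.3, 0.6, 0.9, 1.2)]

slow = EscapeMashDetector()
slow_results = [slow.on_escape(t) for t in (0.0, 1.0, 2.0, 3.0, 4.0)]
```

With these inputs the fast sequence fires on the fifth press and the slow one never does. In the app, a fired hatch would then start the 5-minute pause.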

Revenue: N/A
AI Tools
I built a new word game, Wordtrak

Hi HN! Looking for feedback on this 1v1 and daily word dueling game I've built over the last few months. Play here: https://wordtrak.com/ Or on iOS here: https://apps.apple.com/us/app/wordtrak/id6760442363 (Android version soon!)

Revenue: N/A
AI Tools
Brainio

Show HN: Brainio – Markdown notepad that turns notes into visual mind maps

Revenue: N/A
AI Tools
Bhatti

Bhatti spins up Linux VMs on any box with KVM: Pi 5, Hetzner AX, cloud VM with nested virt.

- Each VM has its own kernel, filesystem, and IP
- Idle VMs pause their CPUs and snapshot themselves to disk; the next request wakes them in 3.7 ms warm or 360 ms cold (p50, Hetzner AX102)
- Publish any port → public URL with auto-wake on first hit
- Pull any OCI/Docker image as a rootfs, or save a running sandbox as one
- Multi-tenant from day one: per-user bridges, encrypted secrets, rate limits
- Single Go binary, Apache 2.0

The decisions page is the most fun read on the site: vsock state after restore, why all snapshots are Full, the systemctl shim, the ARP retransmit trick.

curl -fsSL bhatti.sh/install | sudo bash

(sudo because the daemon needs /dev/kvm and sets up the Firecracker jailer + a bridge; the CLI-only install, piped to plain `bash`, needs no root)

Site: https://bhatti.sh
Repo: https://github.com/sahil-shubham/bhatti
Decisions & learnings: https://bhatti.sh/docs/under-the-hood/decisions/
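The idle/wake behavior described above can be pictured as a small state machine. This is a loose sketch, not Bhatti's design: the post does not say when a VM is paused versus snapshotted, so the two idle thresholds, the `Sandbox` class, and the idea that "warm" means paused in memory while "cold" means restored from disk are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"            # assumed: vCPUs paused, memory resident (warm wake)
    SNAPSHOTTED = "snapshotted"  # assumed: state written to disk (cold wake)


@dataclass
class Sandbox:
    state: State = State.RUNNING
    last_request_at: float = 0.0

    def tick(self, now, pause_after=30.0, snapshot_after=300.0):
        """Advance the idle lifecycle; both thresholds are hypothetical values."""
        idle = now - self.last_request_at
        if self.state is State.RUNNING and idle >= pause_after:
            self.state = State.PAUSED
        if self.state is State.PAUSED and idle >= snapshot_after:
            self.state = State.SNAPSHOTTED

    def handle_request(self, now):
        """Wake the VM if needed and report which wake path was taken."""
        path = {
            State.RUNNING: "none",
            State.PAUSED: "warm",
            State.SNAPSHOTTED: "cold",
        }[self.state]
        self.state = State.RUNNING
        self.last_request_at = now
        return path


vm = Sandbox()
vm.tick(now=40.0)                       # idle long enough to pause
first_wake = vm.handle_request(now=41.0)
vm.tick(now=400.0)                      # idle long enough to snapshot
second_wake = vm.handle_request(now=401.0)
```

The first request hits a paused VM and takes the warm path; the second arrives after a long idle stretch and takes the cold path.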

Revenue: N/A
AI Tools
GhostBox

Show HN: GhostBox – disposable little machines from the Global Free Tier.

Revenue: N/A
AI Tools
My retired dad and I made a daily, somewhat difficult, quiz

My dad makes the questions, I made the site. I think the genre and the level of difficulty are well suited to HN. Hope you enjoy. (I promise no AI-generated questions, they are all handmade!)

Revenue: N/A
AI Tools
A new benchmark for testing LLMs for deterministic outputs

When building workflows that rely on LLMs, we commonly use structured output for programmatic use cases like converting an invoice into rows, meeting transcripts into tickets, or even complex PDFs into database entries. The model may return the schema you want, but with hallucinated values, like `invoice_date` being off by 2 months or the transcript array being in the wrong order. The JSON is valid, but the values are not.

Structured output today is a big part of using LLMs, especially when building deterministic workflows. Current structured output benchmarks (e.g., JSONSchemaBench) only validate the pass rate for JSON schema and types, and not the actual values within the produced JSON.

So we designed the Structured Output Benchmark (SOB), which fixes this by measuring the JSON schema pass rate, the types, and the value accuracy across all three modalities: text, image, and audio. For our test set, every record is paired with a JSON Schema and a ground-truth answer that was verified against the source context manually by a human and cross-checked by an LLM, so a missing or hallucinated value is counted as wrong.

Open source is doing pretty well, with GLM-4.7 coming in at number 2, right after GPT-5.4. We noticed the rankings shift across modalities: GLM-4.7 leads text, Gemma-4-31B leads images, Gemini-2.5-Flash leads audio. For example, GPT-5.4 ranks 3rd on text but 9th on images.

Model size is not a predictor, either: Qwen3.5-35B and GLM-4.7 beat GPT-5 and Claude-Sonnet-4.6 on Value Accuracy. Phi-4 (14B) beats GPT-5 and GPT-5-mini on text.

Structured hallucinations are the hardest bug. Such values are type-correct, schema-valid, and plausible, so they slip through most guardrails. For example, in one audio record, the ground truth is "target_market_age": "15 to 35 years", and a model returns "25 to 35". This is invisible without field-level checks.

Our goal is to be the best general model for deterministic tasks, and a key aspect of determinism is a controllable and consistent output structure. The first step to making structured output better is to measure it and hold ourselves against the best.
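A field-level check of the kind described above can be sketched in a few lines. This is not SOB's scoring code: the `flatten`, `value_accuracy`, and `mismatches` helpers are hypothetical, and exact-match comparison per leaf is an assumption (the benchmark may normalize or score values differently). Only the `target_market_age` pair comes from the post; the other fields in the sample record are made up.

```python
def flatten(value, prefix=""):
    """Map every leaf of a nested JSON value to a path such as 'items[0].price'."""
    if isinstance(value, dict):
        out = {}
        for key, child in value.items():
            out.update(flatten(child, f"{prefix}.{key}" if prefix else key))
        return out
    if isinstance(value, list):
        out = {}
        for i, child in enumerate(value):
            out.update(flatten(child, f"{prefix}[{i}]"))
        return out
    return {prefix: value}


def mismatches(predicted, truth):
    """Paths whose ground-truth value is missing or different in the prediction."""
    pred_leaves = flatten(predicted)
    return [
        path
        for path, expected in flatten(truth).items()
        if path not in pred_leaves or pred_leaves[path] != expected
    ]


def value_accuracy(predicted, truth):
    """Fraction of ground-truth leaves reproduced exactly (assumed exact match)."""
    total = len(flatten(truth))
    return (total - len(mismatches(predicted, truth))) / total


truth = {
    "target_market_age": "15 to 35 years",
    "product": "sneakers",       # made-up field
    "regions": ["EU", "US"],     # made-up field
}
predicted = {
    "target_market_age": "25 to 35",
    "product": "sneakers",
    "regions": ["EU", "US"],
}
wrong_paths = mismatches(predicted, truth)
score = value_accuracy(predicted, truth)
```

Both objects would pass the same JSON Schema, yet the per-leaf comparison surfaces the wrong age range. Note that this sketch ignores extra fields that appear only in the prediction; a fuller scorer would need a policy for those.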

Revenue: N/A