Roaster
EN / RU
Easy to Clone Trending Top Earners New
All AI Tools Analytics Communication Design Developer Tools E-commerce Finance Marketing No-Code Other Productivity SaaS Social Media
Design
Omar

Omar

We were both genuinely impressed by Claude Code after it helped each of us fix nasty CI problems overnight. Doing those fixes manually would have taken days. After that experience, we each found ourselves struggling through Ctrl+Tab through multiple Claude Code windows in our terminals. While we enjoyed having agents working for us in parallel, context switching and cycling through each terminal tab was a real pain. So we thought: Can we design a TUI dashboard that manages a large swarm of agents in one place? Even better, can agents manage agents hierarchically, like how companies work? OMAR (Open Multi-Agent Runtime) is the result of this exploration. We spent months building it, and we think it is now ready to show the world. If you find OMAR interesting, give it a try. We would love to hear from you. :) Check out our blog here for more details: https://omar.tech/blog/introducing-omar/ Thanks! Karim & Shaokai

Revenue N/A
AI Tools
GhostBox

GhostBox

Show HN: GhostBox – disposable little machines from the Global Free Tier.

Revenue N/A
Developer Tools
Pu.sh

Pu.sh

I originally was just messing with pi-autoresearch. Gave it a sample task to build the most portable coding agent. First cut was 6 KB of shell. Great for one-shots, unusable interactively. I was shocked it actually worked. Started building up -- adding features — but with a self-imposed rule: no new dependencies, and sub 500 LOC. This thing had to be truly portable. Just sh, curl, awk. System primitives only. Which means I did some genuinely disgusting things in awk, including JSON parsing and the OpenAI Responses tool loop with reasoning items carried across turns. It's now ~400 lines. In the box: Anthropic + OpenAI, 7 tools (bash, read, write, edit, grep, find, ls), REPL, auto-compaction, checkpoint/resume, pipe mode, 90 no-API tests. Not in the box: TUI, streaming, images, OAuth, Windows, dignity. Two honest things: 1. I stole/modified the system prompt and the architecture. Pi/Claude/Codex wrote the awk. I cannot read most of this code. This wasn't possible for me a year ago. 2. Heavily inspired by Pi (pi.dev) — same 7-tool surface, same exact-text edit model. Credit where it's due. Pi is awesome -- you should probably use them. The agent loop itself is tiny. Almost everything else in a "real" agent CLI is DX and hardening. You can probably build your own harness exactly how you like it. Mario Zechner's AI Engineer talk on taking back control of your tools nudged me here. The name is because it's a .sh file. The other thing it sounds like is, regrettably, also accurate.

Revenue N/A
Developer Tools
Spec27

Spec27

Hi HN! We’re a team of ML validation specialists and we’ve been building /Spec27, a tool for testing whether AI agents still do their job safely and reliably as models, prompts, tools, and surrounding systems change. We started working on this because a lot of current LLM evaluation work seems aimed at scoring general model behavior, while many teams are deploying systems that have a specific mission to fulfill. Many of the tools also assume you have full access to the agent stack and traces so you can place SDKs and Gateways, but a lot of agents are being created on vendor platforms where this isn’t possible. As a result, we approaches it from the outside in: all tests just run to the primary interfaces of an Agent and don’t assume anything about internals. The other important things about the approach is spec-driven. Instead of treating testing as a one-off benchmark or static eval set, we let teams define reusable specifications for the behavior they want from an agent, then generate tests against those specs. With this you can automatically generate adversarial and robustness checks, so you can see what an agent is sensitive to and what kinds of changes cause it to fail. We’ve worked on validation for other AI systems before, including vision and tabular workflows, and /Spec27 is our new product for language-model-based agents. Currently in early access, so we’d love feedback! The current version is strongest for single-turn agent and application validation. We do not fully support multi-turn interactions yet, and better telemetry/tool-call integration is still on our roadmap. We’ve made the product open to try for HN readers, with a sample flow so it’s easy to poke around without much setup. We’d especially love feedback from people deploying internal agents, vendor agents, or other AI systems where reliability matters more than benchmark scores.

Revenue N/A
Developer Tools
The Dominion List

The Dominion List

Show HN: The Dominion List – an open-source db of Canadian founders in the US

Revenue N/A
No-Code
VisuaLeaf

VisuaLeaf

Visualeaf is a MongoDB GUI I’ve been building over the past year. Stack is Electron + Angular + Spring Boot. There’s a live playground on the site if you want to try it without installing or putting in your connection (I provided one). The goal was to combine a visual workflow with the depth needed for real development work. Most existing MongoDB tools tend to optimize for either beginners or power users, but not both in the same interface. Core features: Query builder that supports full MongoDB query expressiveness + being able to drag and drop elements from the collection to the query builder Form based aggregation builder with synchronized JSON view Schema visualization and generation tools GridFS viewer with MP4 streaming support (streaming mp4 was pretty tricky ) IDE style split panels and multiple workspaces Import/export transformations (mask/edit fields during export ) Tree view ( finding a way to expand recursively thousands of nodes was a challenge) Table view (I had to build my own take on AG Grid focusing on optimizing horizontal and virtual scrolling to get it to scroll smoothly on thousands of rows and columns) A lot of the work ended up being performance engineering. It currently loads ~500MB of data into the UI in about 5 seconds on an M1 MacBook. And can even easily display over 20k documents of an average size (12kb) . Here’s a walkthrough of all its features:: https://www.youtube.com/watch?v=WNzvDlbpGTk Happy to answer questions! Thank you so much!

Revenue N/A
Developer Tools
Filling PDF forms with AI using client-side tool calling

Filling PDF forms with AI using client-side tool calling

Hey HN! I built SimplePDF Copilot: an AI assistant that can interact with the PDF editor. It fills fields, answers questions, focuses on a specific field, adds fields, deletes pages, and so on. It's built on top of SimplePDF that I started 7 years ago, pioneering privacy-respecting client-side pdf editing, now used monthly by 200k+ people. As for the privacy model: the PDF itself never leaves the browser. Parsing, rendering, and field detection all run client-side. The text the model needs (and your messages) goes to whatever LLM you point at. By default that's our demo proxy (DeepSeek V4 Flash, rate-capped), but you can BYOK and point it at any cloud provider, or go fully local (I've been testing with LM Studio). Unlike the existing "Chat with PDF" tools that only retrieve the text/OCR layer, Copilot can act on the PDF: filling fields, adding fields (detected client-side using CommonForms by Joe Barrow [1], jbarrow on HN with some post-processing heuristics I added on top), focusing on fields, deleting pages, and so on. I built this because SimplePDF is mostly used by healthcare customers where document privacy is paramount, and I wanted an AI experience that didn't require shipping PII to a third party. Stack is pretty standard: - Tanstack Start - AI SDK from Vercel - Tailwind (I personally prefer CSS modules, I'm old-school but the goal since I open source it, I figured that Tailwind would be a better fit) The more interesting part is the client-side tool calling: events are passed back and forth via iframe postMessage. If you're not familiar with "tool calling" and "client-side tool calling", a quick primer: Tool calling is what LLMs use to take actions. When Claude runs grep or ls, or hits an MCP server, those are tool calls. Client-side tool calling means the intent to call a tool comes from the LLM, but the execution happens in the browser. That matters for: speed, you can't go faster than client-to-client operations and also gives you the ability to limit the data you expose to the LLM. For the demo I do feed the content of the document to the LLM, but that connection could be severed as simply as removing the tool that exposes the content data. The demo is fully open source, available on Github [2] and the demo is the same as the link of this post [3] What's not open source is SimplePDF itself (loaded as the iframe). I could talk on and on about this, let me know if you have any questions, anything goes! [1] https://github.com/jbarrow/commonforms [2] https://github.com/SimplePDF/simplepdf-embed/tree/main/copil... [3] https://copilot.simplepdf.com/?share=a7d00ad073c75a75d493228...

Revenue N/A
AI Tools
My retired dad and I made a daily, somewhat difficult, quiz

My retired dad and I made a daily, somewhat difficult, quiz

My dad makes the questions, I made the site. I think the genre and the level of difficulty is suited for HN. Hope you enjoy. (I promise no AI-generated questions, they are all hand made!).

Revenue N/A
AI Tools
A new benchmark for testing LLMs for deterministic outputs

A new benchmark for testing LLMs for deterministic outputs

When building workflows that rely on LLMs, we commonly use structured output for programmatic use cases like converting an invoice into rows or meeting transcripts into tickets or even complex PDFs into database entries. The model may return the schema you want, but with hallucinated values like `invoice_date` being off by 2 months or the transcript array ordered wrongly. The JSON is valid, but the values are not. Structured output today is a big part of using LLMs, especially when building deterministic workflows. Current structured output benchmarks (e.g., JSONSchemaBench) only validate the pass rate for JSON schema and types, and not the actual values within the produced JSON. So we designed the Structured Output Benchmark (SOB) that fixes this by measuring both the JSON schema pass rate, types, and the value accuracy across all three modalities, text, image, and audio. For our test set, every record is paired with a JSON Schema and a ground-truth answer that was verified against the source context manually by a human and an LLM cross-check, so a missing or hallucinated value will be considered to be wrong. Open source is doing pretty well with GLM 4.7 coming in number 2 right after GPT 5.4. We noticed the rankings shift across modalities: GLM-4.7 leads text, Gemma-4-31B leads images, Gemini-2.5-Flash leads audio. For example, GPT-5.4 ranks 3rd on text but 9th on images. Model size is not a predictor, either: Qwen3.5-35B and GLM-4.7 beat GPT-5 and Claude-Sonnet-4.6 on Value Accuracy. Phi-4 (14B) beats GPT-5 and GPT-5-mini on text. Structured hallucinations are the hardest bug. Such values are type-correct, schema-valid, and plausible, so they slip through most guardrails. For example, in one audio record, the ground truth is "target_market_age": "15 to 35 years", and a model returns "25 to 35". This is invisible without field-level checks. Our goal is to be the best general model for deterministic tasks, and a key aspect of determinism is a controllable and consistent output structure. The first step to making structured output better is to measure it and hold ourselves against the best.

Revenue N/A
Other
Rip.so

Rip.so

Show HN: Rip.so – a graveyard for dead internet things

Revenue N/A
AI Tools
SyncVibe

SyncVibe

Show HN: SyncVibe – Code with friends in the terminal, each with your own AI

Revenue N/A
AI Tools
Figma alternative where AI works with vector primitives, not code

Figma alternative where AI works with vector primitives, not code

Show HN: Figma alternative where AI works with vector primitives, not code

Revenue N/A