Bhatti
Bhatti spins up Linux VMs on any box with KVM — Pi 5, Hetzner AX, cloud VM with nested virt.

- Each VM has its own kernel, filesystem, and IP
- Idle VMs pause their CPUs and snapshot themselves to disk; the next request wakes them in 3.7ms warm or 360ms cold (p50, Hetzner AX102)
- Publish any port → public URL with auto-wake on first hit
- Pull any OCI/Docker image as a rootfs, or save a running sandbox as one
- Multi-tenant from day one — per-user bridges, encrypted secrets, rate limits
- Single Go binary, Apache 2.0

The decisions page is the most fun read on the site: vsock state after restore, why all snapshots are Full, the systemctl shim, the ARP retransmit trick.

curl -fsSL bhatti.sh/install | sudo bash

(sudo because the daemon needs /dev/kvm and sets up the Firecracker jailer + a bridge; the CLI-only install — pipe to plain `bash` — needs no root)

Site: https://bhatti.sh
Repo: https://github.com/sahil-shubham/bhatti
Decisions & learnings: https://bhatti.sh/docs/under-the-hood/decisions/
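Bhatti's installer does its own checks, but the preconditions above are generic Linux facts you can verify by hand before installing: hardware virtualization shows up as /dev/kvm, and on a cloud VM that device only exists if the provider enables nested virtualization. A minimal preflight sketch in Python, using plain devfs/sysfs reads; this is not part of Bhatti's CLI:

```python
import os
from pathlib import Path

def kvm_available() -> bool:
    # /dev/kvm exists iff the kernel sees hardware virtualization;
    # on a cloud VM that requires the provider to enable nested virt.
    # Read/write access is what a VMM ultimately needs (root or the kvm group).
    return os.access("/dev/kvm", os.R_OK | os.W_OK)

def nesting_allowed() -> bool:
    # Separate question: would KVM let *this* host's guests virtualize
    # further? kvm_intel/kvm_amd expose that as a module parameter.
    for mod in ("kvm_intel", "kvm_amd"):
        param = Path(f"/sys/module/{mod}/parameters/nested")
        if param.exists():
            return param.read_text().strip() in ("Y", "y", "1")
    return False

if __name__ == "__main__":
    print("/dev/kvm usable:", kvm_available())
    print("nested guests allowed:", nesting_allowed())
```

On bare metal like a Pi 5 or a Hetzner AX the first check is the one that matters; the `nested` module parameter only says whether this host would let its own guests virtualize further.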
Similar Products
GhostBox
Show HN: GhostBox – disposable little machines from the Global Free Tier.
My retired dad and I made a daily, somewhat difficult, quiz
My dad makes the questions; I made the site. I think the genre and the level of difficulty are suited for HN. Hope you enjoy. (I promise no AI-generated questions; they are all handmade!)
A new benchmark for testing LLMs for deterministic outputs
When building workflows that rely on LLMs, we commonly use structured output for programmatic use cases: converting invoices into rows, meeting transcripts into tickets, or complex PDFs into database entries. The model may return the schema you asked for, but with hallucinated values: an `invoice_date` off by 2 months, or the transcript array in the wrong order. The JSON is valid, but the values are not.

Structured output is now a big part of using LLMs, especially when building deterministic workflows. Current structured output benchmarks (e.g., JSONSchemaBench) only validate the pass rate for JSON schema and types, not the actual values inside the produced JSON. So we designed the Structured Output Benchmark (SOB), which fixes this by measuring schema pass rate, type correctness, and value accuracy across three modalities: text, image, and audio. In our test set, every record is paired with a JSON Schema and a ground-truth answer verified against the source context by a human and an LLM cross-check, so a missing or hallucinated value counts as wrong.

Open source is doing well: GLM-4.7 comes in second, right behind GPT-5.4. Rankings shift across modalities: GLM-4.7 leads text, Gemma-4-31B leads images, Gemini-2.5-Flash leads audio; GPT-5.4, for example, ranks 3rd on text but 9th on images. Model size is not a predictor either: Qwen3.5-35B and GLM-4.7 beat GPT-5 and Claude-Sonnet-4.6 on value accuracy, and Phi-4 (14B) beats GPT-5 and GPT-5-mini on text.

Structured hallucinations are the hardest bug. The values are type-correct, schema-valid, and plausible, so they slip through most guardrails. For example, in one audio record the ground truth is "target_market_age": "15 to 35 years", and a model returns "25 to 35". This is invisible without field-level checks (sketched below).

Our goal is to build the best general model for deterministic tasks, and a key aspect of determinism is a controllable, consistent output structure. The first step to making structured output better is to measure it and hold ourselves against the best.
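To make the distinction concrete, here is a minimal sketch of the two checks, using the post's `invoice_date` and `target_market_age` fields with the standard `jsonschema` library. The schema and the specific dates are made up for illustration; this is not SOB's actual harness:

```python
from jsonschema import validate, ValidationError  # pip install jsonschema

# Illustrative schema and records, not SOB's real test set.
schema = {
    "type": "object",
    "properties": {
        "invoice_date": {"type": "string"},
        "target_market_age": {"type": "string"},
    },
    "required": ["invoice_date", "target_market_age"],
}
ground_truth = {"invoice_date": "2026-01-14", "target_market_age": "15 to 35 years"}
model_output = {"invoice_date": "2026-03-14", "target_market_age": "25 to 35"}

# Check 1: schema pass rate. This is all a JSONSchemaBench-style suite measures.
try:
    validate(instance=model_output, schema=schema)
    print("schema check: pass")
except ValidationError as err:
    print("schema check: fail:", err.message)

# Check 2: field-level value accuracy. Type-correct hallucinations fail here.
wrong = {
    field: (model_output.get(field), expected)
    for field, expected in ground_truth.items()
    if model_output.get(field) != expected
}
print(f"value accuracy: {1 - len(wrong) / len(ground_truth):.0%}")
for field, (got, want) in wrong.items():
    print(f"  {field}: got {got!r}, expected {want!r}")
```

Both hallucinated values sail through the schema check and only fail the value check. Exact string equality is the crudest possible comparator; it is only here to show where schema validation stops.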
SyncVibe
Show HN: SyncVibe – Code with friends in the terminal, each with your own AI
Figma alternative where AI works with vector primitives, not code
Show HN: Figma alternative where AI works with vector primitives, not code