Roaster
EN / RU
Spec27

Spec27

Hi HN! We’re a team of ML validation specialists and we’ve been building /Spec27, a tool for testing whether AI agents still do their job safely and reliably as models, prompts, tools, and surrounding systems change. We started working on this because a lot of current LLM evaluation work seems aimed at scoring general model behavior, while many teams are deploying systems that have a specific mission to fulfill. Many of the tools also assume you have full access to the agent stack and traces so you can place SDKs and Gateways, but a lot of agents are being created on vendor platforms where this isn’t possible. As a result, we approaches it from the outside in: all tests just run to the primary interfaces of an Agent and don’t assume anything about internals. The other important things about the approach is spec-driven. Instead of treating testing as a one-off benchmark or static eval set, we let teams define reusable specifications for the behavior they want from an agent, then generate tests against those specs. With this you can automatically generate adversarial and robustness checks, so you can see what an agent is sensitive to and what kinds of changes cause it to fail. We’ve worked on validation for other AI systems before, including vision and tabular workflows, and /Spec27 is our new product for language-model-based agents. Currently in early access, so we’d love feedback! The current version is strongest for single-turn agent and application validation. We do not fully support multi-turn interactions yet, and better telemetry/tool-call integration is still on our roadmap. We’ve made the product open to try for HN readers, with a sample flow so it’s easy to poke around without much setup. We’d especially love feedback from people deploying internal agents, vendor agents, or other AI systems where reliability matters more than benchmark scores.

Developer Tools B2B · njyx
N/A
Revenue not available

AI Analysis

Analysis coming soon.

Similar Products

Developer Tools
Capgo

Capgo

Instant updates for Capacitor apps. Ship fixes in minutes, not weeks. Push OTA updates to users without app store delays.

$15.2K /mo
Developer Tools Easy to clone
OpenAlternative

OpenAlternative

Open source alternatives to popular software. Over 1 million users replaced their proprietary tools with open source software. Discover the best alternatives and join the movement.

$6.7K /mo
Developer Tools
I built 80 mini-games using Fable before it was shut down

I built 80 mini-games using Fable before it was shut down

Dear Hacker News, I'm kindly asking for your participation in the open beta for my AI-managed mini-games website. Thank you in advance! For a limited time window, I'm setting the all-free feature flag to true. I hope you have a lot of fun exploring the AI's sense for games! Here and there, I tweaked it to help with visual consistency. I would be deeply grateful if you opted into analytics. $2,300 in API tokens... Cheers!

Revenue N/A
Developer Tools
Homebrew 6.0.0

Homebrew 6.0.0

Today, I’m proud to announce Homebrew 6.0.0. The most significant changes since 5.1.0 are a new tap trust security mechanism, the new faster, smaller, default internal Homebrew JSON API, sandboxing on Linux, better defaults informed by our user survey, many brew bundle improvements, improved performance and initial support for macOS 27 (Golden Gate). Happy to discuss any questions here!

Revenue N/A
Developer Tools
Intunedhq

Intunedhq

Hey HN, we're Faisal and Ahmad from Intuned (https://intunedhq.com). We’re building a platform for building, deploying, and maintaining browser automations. Customers primarily use the Intuned AI agent to automate websites that don't expose APIs. Common use-cases include scraping data, pulling reports, and submitting forms. As the website changes, our agent also helps automatically heal the automation. On Intuned, browser automations are created by an AI agent and run as code. Our infra captures the context of every run, allowing our agent to debug and maintain the underlying code - to keep the automations working over time. This way, we’re able to offer the predictability, speed, and cost of code, without the painful parts of writing and maintaining it. Here’s a demo of building a scraper on Intuned: https://youtu.be/ruZP73bK4FU Here’s a demo of using AI to maintain a project: https://youtu.be/e4R4hLdHBro Backstory: we were accepted into YC for a completely different idea. During the batch, because of Faisal's background at UiPath, several batchmates asked us whether RPA tools could fill API gaps in their products by automating websites without APIs. When it was time to pivot, we went back to those founders to dig deeper. (RPA in this context is referring to using UI automation to do complete non-testing tasks) We discovered that the actual hard problem in browser automation is maintenance. Websites change, selectors break, and failures can be painful to reproduce and fix. So in early 2024, we decided to take a crack at this problem with a handful of customers. It needed a fair number of iterations before we landed on our current code-first approach. How it works: Intuned is infra + agent, deeply integrated. On the infrastructure side, Intuned is a managed runtime for browser automation code. Projects are usually Playwright-based TypeScript or Python. Users can write them directly in our online IDE, or hand the work off to the agent. Either way, once deployed, the platform runs each project in its own isolated machine and handles auth/session reuse, scheduling, batch execution, concurrency, observability, and the other plumbing around running browser code. On the agent side, it took us a few iterations to get to the current approach. Our initial attempts were rigid pipelines: collect requirements, inspect the site, generate code, then try to patch whatever broke. It looked reasonable on paper, but real websites are too messy for fixed paths. Late last year, we were planning to ship that version when stronger models landed and harnesses like Claude Code and Codex showed what a more open-ended coding agent could do. We built a prototype on the Claude Agent SDK, it felt much better than what we had, and we scrapped the release and decided to rebuild the agent. The rebuild came down to three pieces around the SDK: an execution environment for running long agent sessions reliably, a CLI that exposes the platform to the agent so it operates Intuned the way engineers do, and a custom plugin (skills + MCP) built around what we've learned building browser automations. The infra-agent integration is where the product gets more interesting. The runtime doesn't just run the automation; it captures the context needed to debug it when it fails: params, results, traces, logs. That enables features like Fix with AI, where you can open a failed run and have the agent investigate and prepare a fix. The same integration powers a feature called self-healing. For configured projects, the platform detects failures, starts an agent session with the relevant context, and either proposes a fix for review or deploys it automatically. Demo: https://youtu.be/IVHIXw0lYMs We recently also packaged the infra and agent as an API called Web Task API, here is a demo: https://youtu.be/1olRn3l95vw We strongly believe that browser automations can and should be faster, cheaper and more predictable. Check us out at https://app.intuned.io/, we have a free tier with trial credits for your first few automations. Excited to hear your thoughts, questions, and feedback!

Revenue N/A

Quick Facts

Category
Developer Tools
Audience
B2B
Founder
njyx
Revenue data
Unknown

Share