I submitted 316 AI-generated PRs to open source
Show HN: I submitted 316 AI-generated PRs to open source
Voker
Hey HN, we're Alex and Tyler, co-founders of Voker.ai (https://voker.ai/), an agent analytics platform for AI product teams. Voker gives full visibility into what users are asking of your agents, and whether your agents are delivering, without having to dig through logs. Our main product is a lightweight SDK that is LLM stack agnostic and purpose-built for agent products. (https://app.voker.ai/docs) Agent Engineers and AI product teams don’t have the right level of visibility into agent performance in production, which results in bad user experiences, churn, and hundreds of hours wasted with spot checks to find and debug issues with agent configurations. Demo: https://www.tella.tv/video/vid_cmoukcsk1000i07jgb4j65u67/vie... We recently conducted a survey of YC Founders and 90%+ of respondents said that the only way they know if their Agents are failing users in production is by hearing complaints from customers. They push a prompt change hoping that it fixes the problem and doesn’t break something somewhere else, and the cycle repeats. We saw tons of observability and evals products popping up to try to address these problems, but we still felt like something was missing in the agent monitoring stack. Obs is good for individual trace debugging but is only accessible to engineers. Evals are good for testing known issues, but don't give insights into trends that teams don’t expect, so engineers are always playing catch up. Traditional product analytics tools do a good job tracking clicks and pageviews across your product surface but weren’t built ground up for agent products. Knowing what users want out of agents, and whether the agent delivered requires specific conversational intelligence / unstructured data processing techniques. We came up with the agent analytics primitives of Intents, Corrections, and Resolutions to describe something pretty much all conversational agents had in common: a user will always come to an agent with an intent, the user might have to correct this agent on the way to getting their intent resolved, and hopefully every intent a user has is eventually resolved by the agent. Voker processes LLM calls by automatically annotating individual conversations and picking out user intent and corrections. Voker takes these and uses LLMs and hierarchical text classification to create dynamic categories that give higher level insights so you don’t have to read individual conversations to know what are the main usage patterns across your users. The most common substitute solution we’ve seen is uploading obs logs to Claude or ChatGPT and asking for summary insights. There are a few problems with this - mainly that LLMs aren’t good at math or data science, so you don’t get accurate or consistent statistics. Its highly likely that the LLM overfits to some insights and underfits to others. The LLM isn’t programmatically reading and classifying each individual session or interaction. This is why we don’t use LLMs for any of our core data engineering (processing events, calculating statistics) so the analytics we produce are consistent, reproducible, and accurate. We have a publicly available, lightweight SDK that wraps LLM calls to OpenAI, Anthropic and Gemini in Python and Typescript. Voker handles the data engineering to turn raw data into usable analytics primitives and higher level insights. Free tier: 2,000 events / mo, requires email signup. Paid plans start at $80/mo with a 30 day free trial. We'd love to hear how you're currently detecting trends, and if you try Voker, tell us what part of our analysis is valuable, and what still feels missing. Thanks for reading, and we’re looking forward to your thoughts in the comments!
Agentic interface for mainframes and COBOL
Hi HN, we’re Sai and Aayush, and we’re building Hypercubic (https://www.hypercubic.ai/), bringing AI tools to the mainframe and COBOL world. (We did a Launch HN last year: https://news.ycombinator.com/item?id=45877517.) Today we’re launching Hopper, an agentic development environment for mainframes. You can download it here: https://www.hypercubic.ai/hopper, and you can also request access and immediately get a mainframe user account to play with. There's also a video runthrough at https://www.youtube.com/watch?v=q81L5DcfBvE. Mainframes still run a surprising amount of critical infrastructure: banking, payments, insurance, airlines, government programs, logistics, and core operations at large institutions. Many of these systems are decades old, but they continue to process enormous transaction volumes because they are reliable, secure, and deeply embedded into business operations. A lot of that software is written in COBOL and runs on IBM z/OS. The development environment looks very different from modern cloud or Unix-style development. Instead of GitHub, shell commands, package managers, and CI pipelines, developers often work through TN3270 terminal sessions, ISPF panels, partitioned datasets, JCL, JES queues, spool output, return codes, VSAM files, CICS transactions, and shop-specific conventions. TN3270 is the terminal interface used to interact with many IBM mainframe systems. ISPF is the menu and panel system developers use inside that terminal to browse datasets, edit source, submit jobs, and inspect output. It is powerful and reliable, but it was designed for expert humans navigating screens, function keys, and fixed-width workflows, not AI agents. A simple COBOL change might require finding the right source member, checking copybooks, locating compile JCL, submitting a job, reading JES/SYSPRINT output, interpreting condition codes, patching fixed-width source, and resubmitting. Much of this work is so well-defined and repetitive that it's a good fit for agentic AI. To get that working, however, a chatbot next to a terminal is not enough. The agent needs to operate inside the mainframe environment. Hopper combines three things: (1) A real TN3270 terminal, (2) Mainframe-aware panels for datasets, members, jobs, and spool output, and (3) An AI agent that can operate across those z/OS surfaces. For example, here is a tiny version of the kind of thing Hopper can help debug: COBOL: IDENTIFICATION DIVISION. PROGRAM-ID. PAYCALC. DATA DIVISION. WORKING-STORAGE SECTION. 01 CUSTOMER-BALANCE PIC 9(7)V99. PROCEDURE DIVISION. ADD 100.00 TO CUSTOMER-BALNCE DISPLAY "UPDATED BALANCE: " CUSTOMER-BALANCE STOP RUN. JCL: //PAYCOMP JOB (ACCT),'COMPILE',CLASS=A,MSGCLASS=X //COBOL EXEC IGYWCL [//COBOL.SYSIN](https://cobol.sysin/) DD DSN=USER1.APP.COBOL(PAYCALC),DISP=SHR [//LKED.SYSLMOD](https://lked.syslmod/) DD DSN=USER1.APP.LOAD(PAYCALC),DISP=SHR A human would submit this job, inspect JES output, open `SYSPRINT`, find the undefined `CUSTOMER-BALNCE`, map it back to the source, patch the member, and resubmit. Hopper is designed to let an agent operate through that same loop autonomously. Hopper is not trying to hide the mainframe behind a generic abstraction, and it's not a chatbot. The design principle is simple: preserve the fidelity of the mainframe environment, but make it accessible to AI agents. Sensitive operations require approval, and the terminal remains visible at all times. Once agents can operate inside the mainframe environment, new workflows become possible: faster job debugging, automated documentation, safer code changes, test generation, migration planning, traffic replay, and modernization verification. We’re curious to hear your thoughts! especially from anyone who has worked with mainframes, COBOL or has done legacy enterprise modernization.
Personal Trainer
Hi HN, I'm an indie dev who built this to settle a long-running argument with my teenage son about who lifts more at the gym. I wanted a workout tracker to kept our data side-by-side on a leaderboard, and let us swap training plans without texting screenshots back and forth. Nothing I tried did all three without a subscription wall, so I wrote my own. Most of the app is free, it covers the full workout experience and progress tracking with 80+ exercises. The one-time purchase unlocks multiple plans, custom movements, and the social/sharing features, mainly to pay for the infra. The mobile app is React Native via Expo with Expo Router, NativeWind for styling, Zustand for state (every store is user-scoped via an AsyncStorage key prefix so account switches do a clean rehydrate, good way to have several using using the same app), Reanimated for animations, and react-i18next across six locales. The core workout loop is fully offline, and a Firebase layer sits on top for optional cross-device sync and the social features. Backend is a small workspace of Firebase Cloud Functions v2 that handle cross-user fan-out (friend requests, shared plans, shared movements) with rate limiting and anti-spam in Firestore triggers. Published a few weeks ago, I can proudly but surprisingly see the daily usage increasing. This is my first mobile app since a while (was doing some 10+y ago), keen for any feedback since it gave me the excitement of building some more. Android only for now, iOS to be published soon.
Safe-install
In light of the ongoing npm supply chain compromises, I built safe-install: https://www.npmjs.com/package/@gkiely/safe-install It brings a couple of protections I wanted from npm but are not built in. Similar to Bun’s trusted dependencies, it lets you disable install scripts by default and define a list of dependencies that are allowed to run build/install scripts: https://bun.com/docs/guides/install/trusted It also supports blocking exotic sub-dependencies, similar to pnpm’s `blockExoticSubdeps` setting: https://gajus.com/blog/3-pnpm-settings-to-protect-yourself-f... I was hoping npm would eventually add something like this, but it does not seem to be happening soon, so I made a small package for it.
Free tool to see how much AI bots are costing your site
Show HN: Free tool to see how much AI bots are costing your site
Open-source 2D IDE for managing agent CLIs
Show HN: Open-source 2D IDE for managing agent CLIs
Codebook of 450k+ unique words and phrases acts as a text compressor
Show HN: Codebook of 450k+ unique words and phrases acts as a text compressor
An index of indie web/blog indexes
I saw a comment here about how there are so many indexes of indie sites, blogs, etc but there wasn't an index of all the indexes. So I built it. It doesn't require a log in, just go browse! I've curated about 30 or so, but there is a submission form if there are ones I am missing. Also happy to take UI improvements because I am not great in that area!
A search engine for deleted YouTube videos (1.5B+ indexed since 2005)
Show HN: A search engine for deleted YouTube videos (1.5B+ indexed since 2005)
Mochi.js: bun-native high-fidelity browser automation library
Hi HN, I’m sharing mochi.js (https://github.com/0xchasercat/mochi), a Bun-native, raw-CDP browser automation framework. It's designed to make programmatic browser use more effective by focusing on consistency and measured parity with regular traffic, purely from the JS layer, against stock Chromium. The most common forms of browser automation focus heavily on client-side line by line probes, which are mostly cosmetic. This makes people feel better but it doesn't have much relevance to actual WAF or anti-automation defences. Mochi.js focuses on what actually matters, allowing you to get past captchas, WAF's and most defence mechanisms. In fact, in some cases it actually outperforms chromium forks simply by virtue of not having to lie. The foundation is built on a probe manifest based on analyzing several WAF's and trying to cover most of the ground that matters, and from there building upwards while ensuring every decision is backed by data. Solves turnstile/interstitial automatically, single digit fpjs suspect score, very good client-side results, though browserscan and a few others are known limitations that are fundamentally conflicting with what WAF's probe for. I'll be here if anyone wants to discuss the details, check out the docs and github. It's completely free and open source, MIT, strictly no relationship to any proprietary products whatsoever. No affiliation to patched chromium forks, or SaaS. But I also want to talk about why I built this, because the current paradigm of "bot detection" is fundamentally broken. Traditionally they would probably try to label my repository a malicious tool, or at best, a grey hat one. Let's take Turnstile for example, If you attach a debugger to see what data they are extracting from your hardware, their script intentionally self-destructs. When they try to extract your data—acting as a guest on your silicon, using your electricity, without asking, the industry calls it "Security." But if you write a script to control exactly what data your own hardware emits, refusing to provide the data they have no right to ask for, you are suddenly labeled a "Malicious Actor" engaged in "Bot Evasion." I find it absurd we let ourselves put up with this, and the stance of the bot-evasion community only makes them feel more able to take a higher moral ground. I have built a library that respects my hardware's reality. If that breaks your security model, that's because your security model relies on trespassing and secrecy. I stopped apologizing. Who's next? Mochi is the exact opposite of WAF opacity. It is a glass box. It is MIT-licensed. The entire DAG, fingerprint manifest schema, harvesting process, is documented. We even commit our live benchmarks to the public record (mochi on a Linux datacenter IP scored a suspect_score: 8 and bot: not_detected against FingerprintJS Pro v4). We don't even lie unnecessarily. We default to host-OS matching. If you run mochi on a Linux server, it uses privacy-sensible fingerprints for Linux, not Windows, because Linux is a real-user signal. It proves that WAFs aren't actually blocking what most people think they are, which begs the question of what they are really doing in that obfuscated payload. The legitimacy argument is exactly how they captured the narrative. And nobody challenged it because the people on the other side were too busy acting like they were doing something wrong. Is this a conspiracy theory? For sure, but only because they allow it to be. Try make a conspiracy theory about the sticky riceball.
CADara
Show HN: CADara – I made an open-source in-browser CAD