We scored 50k PRs with AI

Author: chuboy

I'm a CTO with a ~16-person engineering team. Last year I wanted real data on what was actually shipping, not guesswork or story point theater. So we built GitVelocity.

Every merged PR gets scored 0–100 by Claude across six dimensions: scope (0–20), architecture (0–20), implementation (0–20), risk (0–20), quality (0–15), perf/security (0–5). Six dimensions added up, then scaled by change size: a 10-line fix scores lower than a 500-line refactor even at the same complexity. Full formula at gitvelocity.dev/scoring-guide.

After scoring 50,000+ PRs across TypeScript, Python, Rust, Go, Java, Elixir, and more, some things surprised us:

Big PRs don't automatically score high. An 800-line migration with low complexity scores worse than a 200-line architectural change. Size gets you the full multiplier, but the base score still has to earn it.

You can't score well without tests. The quality dimension (0–15) won't give you points without test coverage. At similar experience levels, this was the clearest separator between engineers.

Juniors started outscoring some seniors. They adopted AI tools faster and took on harder problems. Once they could see their own scores, they aimed higher.

We score AI-generated code the same as human-written code. Code is code. An engineer who uses AI to ship more complex work faster is more productive, and their scores reflect that.

Scoring consistency was the hardest technical problem. Without reference examples anchoring each dimension, Claude's scores drifted 15+ points between runs. With 18 calibrated anchors (three per dimension at low/mid/high), we got it down to 2–4 points on the same PR.

The thing we didn't expect was behavioral. We call it the Fitbit effect: the tool doesn't make you ship better code, but seeing the score does. Engineers started referencing their own scores in 1:1s unprompted, because the numbers matched what they already felt about their work. A junior who shipped a tricky concurrency fix could point to a score that proved it wasn't "just a small PR."

We recently added team benchmarks (gitvelocity.dev/demo/benchmarks). Once you're scoring PRs, you can see how your team compares to others across the dataset, about 1,000 engineers on 60 teams so far. Headline's team ships faster than roughly 95% of them, which was nice to confirm but also made us wonder who the other 5% are. The competitive angle surprised us: teams that were skeptical about individual scores got genuinely curious once they could measure themselves against the field.

Every score is fully visible to the engineer who wrote the PR, with per-dimension breakdowns and reasoning. There's no hidden dashboard that management sees and engineers don't.

Free, BYOK (your Anthropic API key). We default to Sonnet 4.6, which scores nearly as well as Opus 4.6 at a fraction of the cost, but you can switch models if you want. Pennies per PR either way. No source code stored, diffs analyzed and discarded. Works with GitHub, GitLab, and Bitbucket.

Ask me anything about the scoring methodology, how we solved calibration, or what it was actually like rolling this out to a team.
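For readers who want to see the arithmetic, here is a minimal sketch of the scoring shape described in the post: six capped dimensions summed to a 0–100 base, then scaled by change size. The dimension caps come from the post. The size curve is not: `size_multiplier`, its `full_at` and `floor` parameters, and the log ramp are hypothetical stand-ins, since the actual formula lives in the scoring guide. `run_spread` is likewise just one simple way to express the run-to-run drift mentioned above.

```python
import math

# Per-dimension caps as listed in the post; they sum to 100.
DIMENSION_CAPS = {
    "scope": 20,
    "architecture": 20,
    "implementation": 20,
    "risk": 20,
    "quality": 15,
    "perf_security": 5,
}


def base_score(dimensions: dict) -> int:
    """Sum the six dimension scores after checking each against its cap."""
    if set(dimensions) != set(DIMENSION_CAPS):
        raise ValueError("expected exactly the six scoring dimensions")
    for name, value in dimensions.items():
        if not 0 <= value <= DIMENSION_CAPS[name]:
            raise ValueError(f"{name}={value} is outside 0..{DIMENSION_CAPS[name]}")
    return sum(dimensions.values())


def size_multiplier(lines_changed: int, full_at: int = 500, floor: float = 0.5) -> float:
    """HYPOTHETICAL size curve: rises with log(lines) and caps at 1.0.

    full_at and floor are invented for illustration, not GitVelocity's values.
    """
    if lines_changed <= 0:
        return floor
    ramp = math.log1p(lines_changed) / math.log1p(full_at)
    return min(1.0, floor + (1.0 - floor) * ramp)


def final_score(dimensions: dict, lines_changed: int) -> float:
    """Base score scaled by the (assumed) size multiplier."""
    return round(base_score(dimensions) * size_multiplier(lines_changed), 1)


def run_spread(scores: list) -> float:
    """Drift between repeated scorings of the same PR: max minus min."""
    return max(scores) - min(scores)


# A large, simple migration versus a smaller architectural change.
migration = {"scope": 10, "architecture": 4, "implementation": 6,
             "risk": 6, "quality": 3, "perf_security": 1}
arch_change = {"scope": 14, "architecture": 18, "implementation": 15,
               "risk": 12, "quality": 11, "perf_security": 2}
migration_score = final_score(migration, lines_changed=800)
arch_score = final_score(arch_change, lines_changed=200)
```

Under these assumed numbers the 800-line migration gets the full multiplier but still lands below the 200-line architectural change, because its base score is lower.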

Developer Tools B2B · chuboy

Similar products

Developer Tools

Capgo

Instant updates for Capacitor apps. Ship fixes in minutes, not weeks. Push OTA updates to users without App Store delays.

$15.2K /mo

Developer Tools · Easy to clone

OpenAlternative

OpenAlternative is a directory of open-source alternatives to proprietary software. The site lists projects across categories, with information on features, tech stack, and GitHub metrics. The platform monetizes through paid listings and affiliate links.

$6.7K /mo

Developer Tools

Integuru

Hey HN! We’re Alan and Richard from Integuru (YC W24). We generate fast, reliable integrations for platforms lacking official APIs.

About 2 years ago, we released the first agent that reverse-engineers network traffic to build integrations (https://github.com/Integuru-AI/Integuru). Since then, we’ve developed a new approach to reverse-engineer platforms’ source code directly. This solution also includes authentication support. Here’s a demo: https://youtu.be/4l2L8fILC2g?si=nbWbDiFrWZIWRPM7.

Many AI products need to integrate with web apps, but platforms often lack official APIs. So far, there are two main ways to integrate: browser automation and via network requests. We set out to build the original agent because we ourselves suffered from RPA’s latency, reliability, and throughput issues. The original agent solved many of the prior issues, but it wasn’t perfect either.

The original agent did things the obvious way: (1) have a human do the action; (2) the agent observes the network requests and (3) recreates them. That got us far, but it only supported the path the user triggered. In production, we saw all the uncovered cases: different states, missing fields, permission differences, hidden validations, and request changes we could never catch in a single run.

So we started building a new solution from the ground up. Our first step was to add agents that trigger many variations of the same action. To protect the platform’s data integrity, we added a gating layer that blocks outbound requests. This lets us observe the exact request structure, branching behavior, and platform logic without accidentally mutating the live system.

But this still wasn’t enough. Some logic is hard to surface by execution alone. A lot of the business rules live in the frontend bundle. So we set out to analyze the true “answer sheet” for each platform: the source code. After experimenting, we got this working. We built a source-code analysis layer that deobfuscates and traces the code associated with each action. In practical terms, our system can handle most tricky edge cases without triggering all flows.

Together, these two layers result in much better coverage of the production surface area. They support more edge cases, fail less often, and avoid a lot of the brittle one-off fixes that usually come later. Finally, we added auto-healing and API doc generation to improve reliability and the UX. We also offer a 24/7 on-call maintenance team for companies on the production plan.

We now spend most of our time supporting vertical AI companies and helping them connect to their customer systems. We offer a free plan for integrating with one platform and charge for additional platforms, accounts, and overage API calls. For instance, we help healthcare AI companies connect to EHRs and payer portals, and logistics companies connect to TMSs and ERPs. Some companies are now running more than 1M monthly requests per platform. Across our production users, API calls complete in ~3 seconds at 99.9%+ success rate on average. We’re also building a library of APIs that users can use out of the box.

That said, this version still has limitations we want to iterate on. Although we already tackle some anti-bot mechanisms, the agent still struggles to generate integrations with heavily anti-botted platforms. When the agent fails, our on-call team steps in to improve the agent or build the integration manually if the customer requests it. Also, the UX for generating an integration is still quite manual.

Our next step is to build a CLI experience, so people and their agents can create, test, and use integrations in a much more flexible manner. This also prevents humans from having to wait for Integuru to finish its tasks. We want to one day allow developers and agents to integrate with all platforms instantly.

Integuru is an ongoing effort. We’re passionate about automating integrations and would love your feedback!
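To make the gating idea concrete, here is a small sketch of a layer that records the shape of every outbound request and never forwards it, then compares shapes across several variations of the same action. This is an illustration under assumptions, not Integuru's implementation: `RequestGate`, `CapturedRequest`, and `field_coverage` are hypothetical names, and the example URL is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CapturedRequest:
    """Shape of one blocked request: method, URL, and body field names."""
    method: str
    url: str
    fields: frozenset


@dataclass
class RequestGate:
    """Hypothetical gate: records every outbound request, forwards none."""
    captured: list = field(default_factory=list)

    def send(self, method: str, url: str, body: Optional[dict] = None) -> None:
        shape = CapturedRequest(method.upper(), url, frozenset(body or {}))
        self.captured.append(shape)
        # Nothing leaves the process; returning None stands in for "blocked".
        return None


def field_coverage(captured: list) -> dict:
    """Split body fields into those present on every variation and on only some."""
    if not captured:
        return {"always": set(), "sometimes": set()}
    seen = set().union(*(c.fields for c in captured))
    always = set.intersection(*(set(c.fields) for c in captured))
    return {"always": always, "sometimes": seen - always}


# Two variations of the same action, triggered against the gate.
gate = RequestGate()
gate.send("post", "https://example.invalid/orders", {"id": 1, "note": "rush"})
gate.send("POST", "https://example.invalid/orders", {"id": 2})
coverage = field_coverage(gate.captured)
```

Fields that appear in only some variations are the kind of optional or state-dependent input that a single recorded run would miss.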

Revenue: N/A

Developer Tools

LINQ CLI

Hey, my name is Patrick, I’m a co-founder and CTO of Linq. We’re an API for sending and receiving iMessages (it does RCS/SMS too). It can do everything you can do manually in iMessage (typing indicators, reactions, delivery emphasis, FindMy, etc.).

Our main customers are companies building conversational agents, but we want to make it easier for developers to get started for free. To do that we built a CLI that lets you manage up to 20 contacts and gives you full API access for free. I’d love your feedback so we can keep improving it.

Install via npm using: npm install -g @linqapp/cli

Recently, I used the CLI to connect my Claude bot to WeWork & iMessage and haven’t had to use the WeWork app in a few weeks to book rooms.

Github: https://github.com/linq-team/linq-cli
Landing page: https://linqapp.com/cli

Three constraints you should know about:

1. The free tier requires inbound-first (i.e., someone must text you before you text them) and has a limit of 20 contacts. This is to avoid spam.
2. The line is shared. This means a few other people will be using the same phone number as you; none of our paid production lines work this way. If you're testing for enterprise-grade use, our sandbox mirrors production but has a 7-day time limit. The CLI is shared because there is a real infrastructure cost to us and we want to give this away for free.
3. We require an email to sign up, both to avoid spam and because of our infrastructure cost.

To be precise about "open source": it's the CLI. The whole client is in that repo, so you can read exactly what leaves your machine. The backend that delivers messages is closed.
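The first constraint can be read as a small policy check. The sketch below models it in Python purely for illustration; it is not Linq's code or API. `FreeTierPolicy` and its methods are hypothetical, and it assumes the 20-contact limit counts distinct people who have texted in.

```python
class FreeTierPolicy:
    """Hypothetical model of the free-tier rules: inbound-first, 20 contacts."""

    MAX_CONTACTS = 20  # limit stated in the post

    def __init__(self) -> None:
        self.contacts = set()

    def record_inbound(self, sender: str) -> bool:
        """Register a contact on their first inbound message.

        Returns False when the contact list is already full (assumed behavior).
        """
        if sender in self.contacts:
            return True
        if len(self.contacts) >= self.MAX_CONTACTS:
            return False
        self.contacts.add(sender)
        return True

    def can_send(self, recipient: str) -> bool:
        """Outbound is allowed only to someone who has texted first."""
        return recipient in self.contacts


policy = FreeTierPolicy()
blocked_before = policy.can_send("+15550100")   # no inbound yet
policy.record_inbound("+15550100")
allowed_after = policy.can_send("+15550100")    # inbound seen
```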

Revenue: N/A

Developer Tools

Open-Source AI Racing Harness

Hi, I'm Dan from Elodin, making an open source real-time capable flight software simulation. For AI Grand Prix contestants, the wait for the Round 1 virtual qualifier simulation has been grueling. If you’re competing, check out our simulation harness to tide you over, built to match the published competition constraints and message format.

It runs against real Betaflight, which we learned requires at least 1000 sensor samples per second to run correctly in real time. The competition warranted introducing a new feature to generate the camera sensor directly in the simulation loop. Typically people connect to Unreal or a similar game engine to create a camera sensor, which works well but is very heavy. For the simple needs of this challenge, creating samples directly in the loop is very handy and easy to use.

Happy to hear your feedback on this! While it's not fancy looking currently, it uses the Rust Bevy game engine, which should allow us to improve the visual fidelity quickly. We all should easily be able to shift our implementation to the published competition sim once it lands. Hope you enjoy and good luck!
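As a rough picture of what "camera in the loop" means for timing, the sketch below plans a fixed-step loop that produces one sensor sample per step at 1000 Hz and flags the steps on which a camera frame would also be generated. It is a scheduling illustration only, not Elodin's code: `plan_steps` is a hypothetical helper, and the 50 Hz camera rate is an assumed value chosen to divide 1000 evenly.

```python
def plan_steps(duration_s: float, sensor_hz: int = 1000, camera_hz: int = 50) -> list:
    """Return (step_index, time_s, emit_camera) tuples for a fixed-step loop.

    sensor_hz=1000 follows the Betaflight requirement in the post;
    camera_hz=50 is an assumption made for this example.
    """
    if sensor_hz % camera_hz != 0:
        raise ValueError("camera_hz must divide sensor_hz for an even schedule")
    steps_per_frame = sensor_hz // camera_hz
    total_steps = round(duration_s * sensor_hz)
    # Integer step counting avoids accumulating floating-point time drift.
    return [
        (i, i / sensor_hz, i % steps_per_frame == 0)
        for i in range(total_steps)
    ]


plan = plan_steps(1.0)
sensor_samples = len(plan)
camera_frames = sum(1 for _, _, emit_camera in plan if emit_camera)
```

With these assumed rates, one simulated second yields 1000 sensor samples and a camera frame on every 20th step.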

Revenue: N/A

Key facts

Category: Developer Tools
Audience: B2B
Founder: chuboy
Revenue data: Unknown
