Show HN: claude-autopilot, autonomous dev pipeline with multi-model review(github.com)
4 分 | 作者 axledbetter01 16小时前
4 条评论
- axledbetter01 16小时前A few framing notes since the body had to fit under 4,000 chars:
On the volume number. "Several hundred thousand LOC per week sustained, with peaks over a million" is gross churn (insertions + deletions) across feature code, tests, generated types, lockfile updates, and migrations. Net new shippable code is a smaller fraction. The point isn't raw LOC; it's that the pipeline can sustainably operate on a real production codebase at that throughput, not a toy. On stacks supported. The pipeline orchestrates whatever your project uses. Migration adapters cover Rails (Active Record), Alembic, Django, Prisma, Drizzle, golang-migrate, dbmate, flyway, supabase-cli, ecto, and typeorm; falls back to a configurable shell command for anything else. Deploy adapters cover Vercel, Fly, Render, and a generic shell adapter. Validate runs whatever test/lint/typecheck command you configure (npm test, pytest, go test, anything). Monorepo support auto-detects npm/yarn/pnpm workspaces, Turborepo, and Nx. Review engine adapters cover Claude, Gemini, Codex, and any OpenAI-compatible endpoint (Groq, Ollama, Together). Why this vs Devin or Cursor agent mode. Devin is hosted, opaque, per-ACU billed, single-vendor stack. claude-autopilot runs locally, every phase is an editable skill, you bring your own provider keys, MIT-licensed. Cursor agent mode is a single-shot in-IDE loop. claude-autopilot sits one layer higher: spec review, implementation dispatch, validation, PR review, release workflow, retry-loop progress detection. Closest cousins. Aider, OpenHands, SWE-agent. We share the local-CLI plus user's-key philosophy and add the phase pipeline, multi-model role split, risk-tiered review, and the retry-loop sameness detector (halts the pipeline when retries make no progress instead of burning the retry budget on attempts going nowhere). See it work, with numbers: - DEMO.md walks through one autonomous run, 12 minutes wall clock, $2.20 spend, 5 new tests: https://github.com/axledbetter/claude-autopilot/blob/master/DEMO.md - Benchmark: 13/13 production-realistic bugs caught in 38 seconds for $0.21, reproducible: https://github.com/axledbetter/claude-autopilot#benchmark Happy to dig into any of it. - axledbetter01 16小时前claude-autopilot is an MIT-licensed npm package that runs Claude Code through an autonomous pipeline: brainstorm, spec, plan, implement, migrate, validate, PR, review, bugbot. Point it at an idea, walk away, come back to a PR that's review-ready. Merge stays human-gated by default.
Try it in 30 seconds: npm install -g @delegance/claude-autopilot claude-autopilot examples # list 5 starter stacks claude-autopilot examples node > spec.md claude-autopilot autopilot spec.md # ship it Five bundled stack templates (node, python, fastapi, go-cli, rust-cli) so you don't write your first spec from a blank page. The strongest credibility signal I can give you: claude-autopilot built itself. Every version of this project that ever shipped, including v7.10.1 today, went through the pipeline you'll see on GitHub. Spec, plan, implementation subagents, Codex review, bugbot triage, admin-merge, npm publish. Full commit history and review threads preserved on the repo. No marketing, just the receipts. I also use it daily on a production codebase. Several hundred thousand lines of code merged per week sustained, with one week peaking over a million. That's gross churn across feature code, tests, types, and migrations, mostly via the autopilot pipeline. The CLI is solving real problems for me before it ships to anyone else. What's actually distinctive: 1. Multi-model role split, by default. Claude writes code, Codex reviews the plan and the diff, Cursor bugbot triages PR findings. Each model gets the job it's actually best at. Sequential by default. Opt-in parallel council (claude-autopilot council) dispatches the same prompt to Claude + Codex + Gemini and synthesizes consensus. 2. Every phase is an editable markdown skill. Not a black-box pipeline. .claude/skills/autopilot/SKILL.md is plain markdown you can read in 5 minutes, audit, edit, swap any phase. The risk-tiered review policy (1/2/3 Codex passes by spec risk frontmatter, auto-escalated for auth, multi-tenancy, billing, secrets, migrations, RLS, IAM) lives there as plain instructions. Inspectability is the wedge against Devin and Cursor agent mode. 3. Local CLI, your provider keys. Anthropic, OpenAI, Google, Groq, Ollama-local. The orchestration runs on your machine. Prompts go to whichever models you've configured. For pure local-only you need Claude Code itself on a local provider; for most teams the goal is "no hosted orchestration plus existing keys." Benchmark on a Next.js fixture seeded with 13 production-realistic bugs (SQL injection, missing auth, IDOR, SSRF, open redirect, TOCTOU race, console.log in prod, missing input validation, etc): scan caught 13/13 in 38 seconds for $0.21. Fixture and reproduction in the repo. Links: https://www.npmjs.com/package/@delegance/claude-autopilot https://github.com/axledbetter/claude-autopilot I'm Alex, founding eng at Delegance (insurance brokerage platform). Built claude-autopilot for my own internal use, open-sourced when it started shipping itself. - Jinyibruceli 2小时前[flagged]
- Jinyibruceli 13小时前[flagged]