AI Native QA Protocol
Your AI ships code in 20 minutes. Nobody is checking it.
Four phases: reconnaissance, custom agent analysis, recursive test enhancement, permanent automation. The verification infrastructure your AI-native stack needs. Fixed fee. You own everything.
The vibe coding quality crisis — by the numbers
- 4× more findings surfaced than teams knew about before engagement (Academ-ia Protocol data, 2026)
- 92% of AI codebases have at least one critical vulnerability (Sherlock Forensics, Jan–Apr 2026)
- 45% of AI-generated code fails basic security tests (Veracode, 2026)
- 35 CVEs attributed to AI-generated code in March 2026 alone (Georgia Tech Vibe Security Radar, 2026)
The four phases
Not a tool. Not a body. A four-phase methodology that builds verification infrastructure your team runs without us. Each phase compounds on the previous one.
Reconnaissance
Clone, Inventory, Map
Clone the repo. Catalogue every component, service, and integration point. Extract requirements from code, docs, and a 30-minute call. Build a traceability matrix: components mapped to paths mapped to coverage targets. This is the denominator that makes everything measurable.
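As a rough illustration of why the matrix works as a denominator, the sketch below models one possible shape for it. The `MatrixRow` fields, the function names, and the sample components are assumptions made for this example, not the Protocol's actual schema.

```typescript
// Hypothetical shape for one row of a traceability matrix.
interface MatrixRow {
  component: string;      // a service, page, or integration point
  paths: string[];        // user or data paths that run through the component
  coveredPaths: string[]; // subset of `paths` exercised by at least one test
}

// Coverage ratio = covered paths / total paths across the whole matrix.
function coverageRatio(matrix: MatrixRow[]): number {
  let total = 0;
  let covered = 0;
  for (const row of matrix) {
    total += row.paths.length;
    covered += row.paths.filter((p) => row.coveredPaths.indexOf(p) !== -1).length;
  }
  return total === 0 ? 0 : covered / total;
}

// Paths that no test exercises yet, grouped by component.
function uncoveredPaths(matrix: MatrixRow[]): Record<string, string[]> {
  const gaps: Record<string, string[]> = {};
  for (const row of matrix) {
    const missing = row.paths.filter((p) => row.coveredPaths.indexOf(p) === -1);
    if (missing.length > 0) gaps[row.component] = missing;
  }
  return gaps;
}

// Sample data, invented for illustration.
const exampleMatrix: MatrixRow[] = [
  { component: "booking-api", paths: ["create", "reschedule", "cancel"], coveredPaths: ["create"] },
  { component: "payments-webhook", paths: ["confirm"], coveredPaths: ["confirm"] },
];

console.log(coverageRatio(exampleMatrix));  // 0.5
console.log(uncoveredPaths(exampleMatrix)); // { "booking-api": ["reschedule", "cancel"] }
```

With the total path count fixed up front, every later coverage figure is a fraction of a known whole rather than an estimate.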
Custom Agent Analysis
Static Analysis + Security Screening
Custom AI agents built per project context — your stack, your domain, your acceptance criteria. Each agent goes through its own dev lifecycle: designed, tested, validated before it touches client code. SARIF reports, OWASP Top 10 screening, and agentic fix prompts your team feeds to Cursor or Claude Code. Security isn't a separate pass — it's the same analysis.
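For readers unfamiliar with SARIF, here is a minimal sketch of how a report could be consumed. The types cover only a small subset of the SARIF 2.1.0 log structure. The tool name, rule IDs, file paths, and messages in the sample are invented for illustration.

```typescript
// Subset of the SARIF 2.1.0 log shape; the real schema has many more fields.
// SARIF treats `level` as optional (defaulting to "warning"); it is required here for simplicity.
type SarifLevel = "none" | "note" | "warning" | "error";

interface SarifResult {
  ruleId: string;
  level: SarifLevel;
  message: { text: string };
  locations?: {
    physicalLocation: {
      artifactLocation: { uri: string };
      region?: { startLine: number };
    };
  }[];
}

interface SarifLog {
  version: "2.1.0";
  runs: { tool: { driver: { name: string } }; results: SarifResult[] }[];
}

// Count results per SARIF level across all runs in the log.
function countByLevel(log: SarifLog): Record<SarifLevel, number> {
  const counts: Record<SarifLevel, number> = { none: 0, note: 0, warning: 0, error: 0 };
  for (const run of log.runs) {
    for (const result of run.results) counts[result.level] += 1;
  }
  return counts;
}

// Hypothetical sample log: tool name, rule IDs, and paths are made up.
const sampleLog: SarifLog = {
  version: "2.1.0",
  runs: [
    {
      tool: { driver: { name: "example-agent" } },
      results: [
        {
          ruleId: "auth/cross-tenant-access",
          level: "error",
          message: { text: "Row access policy does not check the tenant claim." },
          locations: [
            { physicalLocation: { artifactLocation: { uri: "db/policies.sql" }, region: { startLine: 12 } } },
          ],
        },
        {
          ruleId: "errors/missing-retry",
          level: "warning",
          message: { text: "Webhook handler has no retry logic." },
        },
      ],
    },
  ],
};

console.log(countByLevel(sampleLog)); // { none: 0, note: 0, warning: 1, error: 1 }
```

SARIF itself defines only the four levels above. A Critical / High / Medium / Low / Info scale would have to be carried in additional properties or mapped by the consuming tool.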
Recursive Test Enhancement
The Living Layer
A test layer that moves with the codebase. When code changes — new PR, new feature, refactor — the test layer detects it and expands. Each iteration catches things the previous one didn't because the analysis has more context. Coverage compounds instead of eroding. Measured against the traceability matrix, not assumed.
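One simplified way to picture the detection step: map the files changed in a PR back to components in the inventory, then flag components without tests and files that belong to no known component. This is a sketch under assumptions; `ComponentEntry`, both function names, and the sample paths are hypothetical, and the real analysis is presumably richer than prefix matching.

```typescript
// Hypothetical inventory entry: which source paths a component owns and which tests map to it.
interface ComponentEntry {
  component: string;
  sourcePrefixes: string[]; // directories or files that belong to the component
  testFiles: string[];      // tests currently mapped to the component
}

interface TouchedComponent {
  component: string;
  changed: string[];  // changed files owned by this component
  hasTests: boolean;  // false means the change landed in untested territory
}

// True when `file` sits under any of the entry's source prefixes.
function ownsFile(entry: ComponentEntry, file: string): boolean {
  return entry.sourcePrefixes.some((prefix) => file.indexOf(prefix) === 0);
}

// Components touched by a change set, with a flag for missing tests.
function componentsTouched(entries: ComponentEntry[], changedFiles: string[]): TouchedComponent[] {
  const report: TouchedComponent[] = [];
  for (const entry of entries) {
    const changed = changedFiles.filter((f) => ownsFile(entry, f));
    if (changed.length > 0) {
      report.push({ component: entry.component, changed, hasTests: entry.testFiles.length > 0 });
    }
  }
  return report;
}

// Changed files that map to no known component: candidates for new matrix rows.
function unmappedFiles(entries: ComponentEntry[], changedFiles: string[]): string[] {
  return changedFiles.filter((f) => !entries.some((entry) => ownsFile(entry, f)));
}

// Sample data, invented for illustration.
const inventory: ComponentEntry[] = [
  { component: "booking-api", sourcePrefixes: ["src/booking/"], testFiles: ["tests/booking.spec.ts"] },
  { component: "payments-webhook", sourcePrefixes: ["src/payments/"], testFiles: [] },
];
const changedInPr = ["src/booking/reschedule.ts", "src/payments/confirm.ts", "src/notifications/email.ts"];

console.log(componentsTouched(inventory, changedInPr));
console.log(unmappedFiles(inventory, changedInPr)); // ["src/notifications/email.ts"]
```

Here `payments-webhook` is touched but has no tests, and `src/notifications/email.ts` belongs to no inventoried component, so both would be candidates for the next expansion of the test layer.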
Permanent Automation
Runs Without Us
Playwright for UI flow testing and regression. Cloudflare Workers for endpoint monitoring, data validation, and health checks. CI integration ties it together — static analysis and security on every push, test suites execute automatically. Contract testing prevents cross-service breakage. A production feedback loop means bugs that reach production expand coverage for the next cycle.
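The endpoint-monitoring piece can be pictured as a small classification step over probe results. The sketch below covers only that pure logic. It assumes the probing itself (for example, a Cloudflare Workers scheduled handler calling `fetch` and timing the response) happens elsewhere. The types, the function names, the URLs, and the 4000 ms threshold are illustrative assumptions.

```typescript
// Hypothetical result of probing one endpoint.
interface ProbeResult {
  url: string;
  status: number;    // HTTP status code returned by the endpoint
  latencyMs: number; // round-trip time measured by the caller
}

type Health = "healthy" | "degraded" | "down";

// Non-2xx responses count as down; slow 2xx responses count as degraded.
function classifyProbe(probe: ProbeResult, maxLatencyMs: number): Health {
  if (probe.status < 200 || probe.status >= 300) return "down";
  return probe.latencyMs > maxLatencyMs ? "degraded" : "healthy";
}

// Group endpoint URLs by health state.
function summarizeProbes(probes: ProbeResult[], maxLatencyMs: number): Record<Health, string[]> {
  const summary: Record<Health, string[]> = { healthy: [], degraded: [], down: [] };
  for (const probe of probes) summary[classifyProbe(probe, maxLatencyMs)].push(probe.url);
  return summary;
}

// Sample probes, invented for illustration.
const sampleProbes: ProbeResult[] = [
  { url: "https://example.com/api/health", status: 200, latencyMs: 120 },
  { url: "https://example.com/api/payments/webhook", status: 200, latencyMs: 4300 },
  { url: "https://example.com/api/bookings", status: 503, latencyMs: 95 },
];

const probeSummary = summarizeProbes(sampleProbes, 4000);
console.log(probeSummary);
```

Keeping the classification separate from the network call makes the monitoring rules easy to unit test in CI.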
The boundary
Visible from day one. What the Protocol owns and what your team owns — no ambiguity, no scope creep.
The Protocol owns
- ✓ Component inventory and traceability matrix
- ✓ Static analysis with custom agents (built, tested, validated per project)
- ✓ Security screening (OWASP Top 10 surface pass)
- ✓ Recursive test enhancement (tests evolve with every PR or push)
- ✓ UI automation (Playwright), endpoint automation (Workers), CI integration
- ✓ Data validation (deterministic + AI scoring)
- ✓ Contract and integration testing
- ✓ Agentic fix prompts the team can execute
- ✓ Coverage benchmarks set per project context
- ✓ Dashboards and reporting in the team's existing tool
- ✓ Production feedback loop: issues in production expand coverage next cycle
Your team owns
- → Applying the fixes: the Protocol finds and prescribes; the team or their AI executes
- → Production deployment decisions: the Protocol provides release signals, not release authority
- → Business logic correctness: if the requirements are wrong, the tests pass and the product is still wrong
- → Exploratory testing and edge cases outside the component inventory
- → Native mobile paths not coverable by automation: Bluetooth, NFC, camera, hardware-specific behavior
Four engagement shapes
Transparent pricing. Fixed scope. You own everything we deliver.
Vibe Code Audit
$3,000 – $6,000
We map your codebase and run custom agent analysis. You walk away knowing exactly what's broken, what's insecure, and how to fix it.
What you get
- ✓ Component inventory and traceability matrix
- ✓ Custom AI agents built for your project's stack and context
- ✓ SARIF 2.1.0 static analysis report (Critical / High / Medium / Low / Info)
- ✓ OWASP Top 10 security screening in the same pass
- ✓ Human-readable Markdown report (quality + security unified)
- ✓ Agentic fix prompt for Cursor or Claude Code
- ✓ 30-minute walkthrough with your team
Best for
Founders who suspect their AI-built codebase has quality or security problems but don't know where.
Audit + QA Foundation
$6,000 – $9,000
Everything in the Vibe Code Audit, plus the recursive test layer and infrastructure your team will use going forward.
What you get
- ✓ Full Vibe Code Audit (inventory, agents, analysis, fix prompts)
- ✓ Recursive test enhancement layer: initial test cases that evolve with your codebase
- ✓ Dashboard: severity, triage status, coverage progress against traceability matrix
- ✓ Security findings tracked with OWASP category and remediation priority
- ✓ Quality strategy document (PDF) tailored to your stack and cadence
- ✓ Run-book for handing off to junior QA, or for running the system without us
Best for
Teams ready to professionalize QA without hiring a full-time engineer yet.
Continuous Monitoring
$1,500 – $3,000/mo
The recursive loop keeps running. Every push gets analyzed, test coverage compounds, the boundary expands.
What you get
- ✓ Custom agent analysis + security screening on every push to main (CI integration)
- ✓ Recursive test enhancement: new code changes expand test coverage automatically
- ✓ New findings surfaced in your dashboard with fix prompts
- ✓ Monthly summary: findings, severity trends, coverage boundary changes
- ✓ Support and communication Monday–Friday, 9 AM – 6 PM CST
Best for
Teams that completed an audit and want coverage that compounds without full sprint orchestration.
Sprint Orchestration
$9,000+/mo
The complete system deployed and running. Playwright automation, Workers monitoring, CI integration, contract testing, and a production feedback loop.
What you get
- ✓ All Phase 1–3 deliverables, continuously maintained
- ✓ Playwright UI automation: regression suites, critical path verification
- ✓ Cloudflare Workers: endpoint monitoring, data validation, health checks
- ✓ Contract and integration testing in CI pipeline
- ✓ Production feedback loop: bugs in prod expand Protocol coverage next cycle
- ✓ Monthly dashboard reports for founders and investors
- ✓ Execution, support, and communication Monday–Friday, 9 AM – 6 PM CST with predefined deadlines
Best for
Teams shipping every 1–2 weeks who need a verification system that runs without a full-time QA hire.
What the infrastructure looks like
Live QA dashboard — severity breakdown, triage status, and findings by acceptance area. Coverage progress measured against the traceability matrix.
Total Findings: 26 · Critical: 2 · High: 9 · Medium: 11
[Charts: Findings by Severity; Findings by Area]
| ID | Severity | Finding | Area | Status |
|---|---|---|---|---|
| CR-017 | Critical | Supabase RLS policy on appointments allows row access across tenant boundaries via crafted JWT claim | Data Handling | Fixed |
| CR-009 | High | Race condition in concurrent booking flow: two users can claim the same time slot when requests arrive within a 50 ms window | API Security | Confirmed |
| CR-022 | High | Optimistic UI update on reschedule persists stale datetime in local state when Supabase realtime subscription reconnects | UI State | To Verify |
| CR-004 | Medium | Edge function cold start > 4s on first invocation causes silent timeout in payment confirmation webhook — no retry logic | API Security | Fixed |
| CR-011 | Medium | Client-side date parsing assumes UTC but Supabase returns timestamptz in user locale — off-by-one day for bookings near midnight | Data Handling | Confirmed |
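To make a finding like CR-011 concrete, the sketch below shows one way an off-by-one-day error can arise near midnight: reading the calendar date from the UTC form of an instant instead of the viewer's local form. It is an illustration of the bug class, not the code from the audited project. The function names and the sample booking time are invented. A fixed UTC offset keeps the example dependency-free; production code would more likely resolve an IANA time zone (for example through `Intl.DateTimeFormat` with a `timeZone` option).

```typescript
// Naive: reads the calendar date from the UTC representation of the instant.
function utcDate(isoInstant: string): string {
  return new Date(isoInstant).toISOString().slice(0, 10);
}

// Shifts the instant by the viewer's UTC offset before reading the calendar date.
// `offsetMinutes` is minutes east of UTC (for example, -360 for UTC-06:00).
function localDate(isoInstant: string, offsetMinutes: number): string {
  const shifted = new Date(new Date(isoInstant).getTime() + offsetMinutes * 60000);
  return shifted.toISOString().slice(0, 10);
}

// A booking at 22:30 on 1 March for a viewer at UTC-06:00 (sample value).
const bookingStart = "2026-03-01T22:30:00-06:00";

console.log(utcDate(bookingStart));         // "2026-03-02" (one day late for this viewer)
console.log(localDate(bookingStart, -360)); // "2026-03-01"
```

A regression test that pins a booking close to midnight in a non-UTC zone is a cheap way to keep this class of bug from returning.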
Why existing solutions don't fit
| Category | Examples | What they do | What they don't do |
|---|---|---|---|
| AI Testing SaaS | Mabl, Testsigma, Autonoma, QA Wolf | Generate tests from codebase, self-heal | Orchestrate the full QA function. Founder still runs them. |
| AI Code Review | CodeRabbit, Qodo, Graphite | Pre-merge AI review | Catch what ships through. No post-merge verification. |
| Enterprise QA | Tricentis Tosca, UFT One | Enterprise test management ($100K–$300K/yr) | Fit early-stage. Wrong tool, wrong price, wrong audience. |
| Outsourced QA Agencies | DeviQA, Qxf2, Thaloz | Sell hours offshore at $18–$30/hr | Bring methodology. Hours without orchestration are billable noise. |
| Freelance QA | Upwork (median $15/hr) | Per-project commodity work | Build durable infrastructure. Each engagement starts from zero. |
| In-House QA Hire | Senior QA Engineer | Long-term coverage ($130K/yr + 3 months ramp) | Solve the immediate problem. Not viable pre-Series A. |
Nobody sells AI-native startups a four-phase QA methodology that combines component inventory, custom agent analysis, recursive testing, and permanent automation — all in one engagement.
How it works
Book a 30-minute discovery call. We look at your repo and confirm scope.
Fixed-fee proposal within 48 hours. No runaround.
Engagement starts within 7 days of acceptance.
You receive all artifacts. Documented, transferable, yours to keep.
Frequently asked
A four-phase QA methodology for codebases built with AI tools. It starts with a component inventory and traceability matrix, runs static analysis through custom-built AI agents, builds a recursive test layer that evolves with every PR, and deploys permanent automation (Playwright, Cloudflare Workers, CI) that runs without ongoing involvement. The boundary between what the Protocol covers and what the team owns is defined from day one.
Map every component first: visual, back-end, and integration. Build a traceability matrix. Run scoped static analysis with custom agents engineered per project. Build test cases that evolve with every code change. Automate UI paths, endpoint validation, data quality checks, and contract tests. Measure coverage against context-specific benchmarks.
AI-generated code follows patterns that static analysis catches well: state management bugs, auth gaps, missing error handling, insecure defaults. Custom agents built per project context surface roughly 4x more findings than teams knew about. The agents go through their own development lifecycle before they review any code.
The OWASP categories that show up most: authentication gaps, exposed secrets, injection paths (SQL, XSS, log injection), insecure defaults, missing rate limiting, hardcoded credentials. 45% of AI-generated code fails basic security tests (Veracode, 2026). 92% of AI-generated codebases contain at least one critical vulnerability (Sherlock Forensics, Jan–Apr 2026). The Protocol screens for these in the same pass as quality analysis.
Only if the test layer evolves with the code. Static test suites degrade as AI-assisted teams ship fast. Recursive test enhancement means coverage compounds instead of eroding. Coverage is measured against a traceability matrix, not assumed.
Tools like Mabl, CodeRabbit, or QA Wolf generate tests or review code. The Protocol orchestrates when and how those capabilities run across the full release cycle: component inventory, custom agent analysis, recursive testing, permanent automation, and a production feedback loop. The tool is rarely the problem; what is usually missing is an engineered process to run it in.
The Protocol builds verification infrastructure. The team owns production code, deployment decisions, business logic correctness, and exploratory testing outside the component inventory. Hardware-specific mobile paths (Bluetooth, NFC, camera) are documented as team-owned during the inventory. The boundary is visible and measurable from day one, and it expands with every cycle.
Yes — through a separate development engagement. The QA Protocol finds issues and hands you agentic fix prompts your team or their AI can execute. If you need us to fix them directly, we scope that as a development project. Book a call to discuss.
Start with a Vibe Code Audit
If we find substantial issues, we credit the audit fee against any subsequent engagement.
Book your audit