
AI Native QA Protocol

Your AI ships code in 20 minutes. Nobody is checking it.

Four phases: reconnaissance, custom agent analysis, recursive test enhancement, permanent automation. The verification infrastructure your AI-native stack needs. Fixed fee. You own everything.

The vibe coding quality crisis — by the numbers

4x

more findings surfaced than teams knew about before engagement

Academ-ia Protocol data, 2026

92%

of AI codebases have at least one critical vulnerability

Sherlock Forensics, Jan–Apr 2026

45%

of AI-generated code fails basic security tests

Veracode, 2026

35

CVEs attributed to AI-generated code in March 2026 alone

Georgia Tech Vibe Security Radar, 2026

The four phases

Not a tool. Not a body. A four-phase methodology that builds verification infrastructure your team runs without us. Each phase compounds on the previous one.

Phase 1

Reconnaissance

Clone, Inventory, Map

Clone the repo. Catalogue every component, service, and integration point. Extract requirements from code, docs, and a 30-minute call. Build a traceability matrix: components mapped to paths mapped to coverage targets. This is the denominator that makes everything measurable.
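A traceability matrix of this kind can be pictured as a small data structure. The sketch below is illustrative only, not the Protocol's actual format: the `MatrixRow` type, the component names, the paths, and the targets are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class MatrixRow:
    """One row of a hypothetical traceability matrix."""
    component: str           # a service, page, or integration point
    paths: list[str]         # user or API paths that exercise the component
    covered_paths: set[str]  # paths that currently have at least one test
    target: float            # coverage target for this component (0.0 to 1.0)

    def coverage(self) -> float:
        """Fraction of this component's paths that have a test."""
        if not self.paths:
            return 1.0
        hit = sum(1 for p in self.paths if p in self.covered_paths)
        return hit / len(self.paths)


def below_target(matrix: list[MatrixRow]) -> list[str]:
    """Names of components whose measured coverage is under their target."""
    return [row.component for row in matrix if row.coverage() < row.target]


# Illustrative data only; component and path names are made up.
matrix = [
    MatrixRow("booking-api", ["create", "reschedule", "cancel", "list"],
              {"create", "list"}, target=0.75),
    MatrixRow("auth", ["login", "logout"], {"login", "logout"}, target=1.0),
]

for row in matrix:
    print(f"{row.component}: {row.coverage():.0%} (target {row.target:.0%})")
print("Below target:", below_target(matrix))
```

The total number of paths is the denominator: coverage is computed against an explicit list, so a gap shows up as a number rather than a guess.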

Phase 2

Custom Agent Analysis

Static Analysis + Security Screening

Custom AI agents built per project context — your stack, your domain, your acceptance criteria. Each agent goes through its own dev lifecycle: designed, tested, validated before it touches client code. SARIF reports, OWASP Top 10 screening, and agentic fix prompts your team feeds to Cursor or Claude Code. Security isn't a separate pass — it's the same analysis.
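For readers who have not seen a SARIF report, here is a minimal sketch of reading one and counting results by level. The tool name, rule IDs, and messages are invented for the example. SARIF's own `level` values are `error`, `warning`, `note`, and `none`; how a given report maps them to Critical / High / Medium / Low / Info is not something this page specifies.

```python
import json
from collections import Counter

# A tiny SARIF 2.1.0 log; the tool name, rule IDs, and messages are invented.
SARIF_TEXT = """
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "example-agent"}},
    "results": [
      {"ruleId": "auth/missing-check", "level": "error",
       "message": {"text": "Route handler skips the session check."}},
      {"ruleId": "errors/unhandled-promise", "level": "warning",
       "message": {"text": "Rejected promise is never handled."}},
      {"ruleId": "style/unused-import",
       "message": {"text": "Import is never used."}}
    ]
  }]
}
"""


def count_by_level(sarif: dict) -> Counter:
    """Count results per SARIF level across all runs."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: a result with no explicit level counts as "warning".
            counts[result.get("level", "warning")] += 1
    return counts


counts = count_by_level(json.loads(SARIF_TEXT))
print(dict(counts))
```

Because SARIF is plain JSON, the same file can feed a dashboard, a CI gate, or a fix prompt without re-running the analysis.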

Phase 3

Recursive Test Enhancement

The Living Layer

A test layer that moves with the codebase. When code changes — new PR, new feature, refactor — the test layer detects it and expands. Each iteration catches things the previous one didn't because the analysis has more context. Coverage compounds instead of eroding. Measured against the traceability matrix, not assumed.
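One way to picture the detection step is a lookup from changed files to the components that own them. This is a rough sketch under stated assumptions, not the Protocol's mechanism: the `COMPONENT_DIRS` mapping, the directory layout, and the file names are hypothetical.

```python
# Hypothetical mapping from component name to the source directories it owns.
COMPONENT_DIRS = {
    "booking-api": ["src/api/booking"],
    "auth": ["src/auth", "src/middleware/session"],
    "ui-calendar": ["src/components/calendar"],
}


def affected_components(changed_files: list[str]) -> set[str]:
    """Return the components that own at least one changed file."""
    affected = set()
    for changed in changed_files:
        for component, dirs in COMPONENT_DIRS.items():
            # Match on "dir/" so that src/auth does not also claim src/authz.
            if any(changed.startswith(d + "/") for d in dirs):
                affected.add(component)
    return affected


# Files touched by an imaginary pull request.
changed = [
    "src/auth/login.ts",
    "src/components/calendar/DayView.tsx",
    "README.md",
]
print(sorted(affected_components(changed)))
```

Components returned here would be the candidates for new or expanded tests in that iteration; files that belong to no component, such as the README in this example, are ignored.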

Phase 4

Permanent Automation

Runs Without Us

Playwright for UI flow testing and regression. Cloudflare Workers for endpoint monitoring, data validation, and health checks. CI integration ties it together: static analysis and security screening run on every push, and test suites execute automatically. Contract testing prevents cross-service breakage. A production feedback loop means bugs that reach production expand coverage for the next cycle.
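To make the contract-testing idea concrete, here is a minimal sketch of a consumer-side check. The contract, the field names, and the payloads are hypothetical, and a real setup would more likely use a dedicated contract-testing tool than a hand-rolled function.

```python
# Hypothetical contract: the fields a consumer relies on, with expected types.
BOOKING_CONTRACT = {"id": str, "starts_at": str, "duration_minutes": int}


def contract_violations(payload: dict, contract: dict) -> list[str]:
    """List every way the payload breaks the consumer's expectations."""
    problems = []
    for field, expected_type in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {actual}"
            )
    return problems


# Extra fields are allowed; only what the consumer depends on is checked.
ok_payload = {"id": "b_1", "starts_at": "2026-03-10T23:30:00-06:00",
              "duration_minutes": 30, "notes": "first visit"}
broken_payload = {"id": "b_1", "duration_minutes": "30"}

print(contract_violations(ok_payload, BOOKING_CONTRACT))
print(contract_violations(broken_payload, BOOKING_CONTRACT))
```

Run in CI against a provider's real or recorded responses, a check like this fails the build when a service changes a field that another service depends on.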

The boundary

Visible from day one. What the Protocol owns and what your team owns — no ambiguity, no scope creep.

The Protocol owns

  • Component inventory and traceability matrix
  • Static analysis with custom agents (built, tested, validated per project)
  • Security screening (OWASP Top 10 surface pass)
  • Recursive test enhancement (tests evolve with every PR or push)
  • UI automation (Playwright), endpoint automation (Workers), CI integration
  • Data validation (deterministic + AI scoring)
  • Contract and integration testing
  • Agentic fix prompts the team can execute
  • Coverage benchmarks set per project context
  • Dashboards and reporting in the team's existing tool
  • Production feedback loop: issues in production expand coverage next cycle

Your team owns

  • Applying the fixes — the Protocol finds and prescribes; the team or their AI executes
  • Production deployment decisions — the Protocol provides release signals, not release authority
  • Business logic correctness — if the requirements are wrong, the tests pass and the product is still wrong
  • Exploratory testing and edge cases outside the component inventory
  • Native mobile paths not coverable by automation: Bluetooth, NFC, camera, hardware-specific behavior

Four engagement shapes

Transparent pricing. Fixed scope. You own everything we deliver.

Vibe Code Audit

$3,000 – $6,000

2 weeks · Fixed fee · Phase 1 + Phase 2

We map your codebase and run custom agent analysis. You walk away knowing exactly what's broken, what's insecure, and how to fix it.

What you get

  • Component inventory and traceability matrix
  • Custom AI agents built for your project's stack and context
  • SARIF 2.1.0 static analysis report (Critical / High / Medium / Low / Info)
  • OWASP Top 10 security screening in the same pass
  • Human-readable Markdown report (quality + security unified)
  • Agentic fix prompt for Cursor or Claude Code
  • 30-minute walkthrough with your team

Best for

Founders who suspect their AI-built codebase has quality or security problems but don't know where.

Recommended

Audit + QA Foundation

$6,000 – $9,000

3 weeks · Fixed fee · Phase 1 + Phase 2 + Phase 3 setup

Everything in the Vibe Code Audit (Tier 1), plus the recursive test layer and infrastructure your team will use going forward.

What you get

  • Full Tier 1 audit (inventory, agents, analysis, fix prompts)
  • Recursive test enhancement layer: initial test cases that evolve with your codebase
  • Dashboard: severity, triage status, coverage progress against traceability matrix
  • Security findings tracked with OWASP category and remediation priority
  • Quality strategy document (PDF) tailored to your stack and cadence
  • Run-book for handing off to junior QA — or for running the system without us

Best for

Teams ready to professionalize QA without hiring a full-time engineer yet.

Continuous Monitoring

$1,500 – $3,000/mo

Ongoing · Low-touch retainer · Phase 3 ongoing

The recursive loop keeps running. Every push gets analyzed, test coverage compounds, the boundary expands.

What you get

  • Custom agent analysis + security screening on every push to main (CI integration)
  • Recursive test enhancement: new code changes expand test coverage automatically
  • New findings surfaced in your dashboard with fix prompts
  • Monthly summary: findings, severity trends, coverage boundary changes
  • Support and communication Monday–Friday, 9 AM – 6 PM CST

Best for

Teams that completed an audit and want coverage that compounds without full sprint orchestration.

Sprint Orchestration

$9,000+/mo

Ongoing retainer · Two-week sprint cadence · All 4 phases

The complete system deployed and running. Playwright automation, Workers monitoring, CI integration, contract testing, and a production feedback loop.

What you get

  • All Phase 1–3 deliverables, continuously maintained
  • Playwright UI automation: regression suites, critical path verification
  • Cloudflare Workers: endpoint monitoring, data validation, health checks
  • Contract and integration testing in CI pipeline
  • Production feedback loop: bugs in prod expand Protocol coverage next cycle
  • Monthly dashboard reports for founders and investors
  • Execution, support, and communication Monday–Friday, 9 AM – 6 PM CST with predefined deadlines

Best for

Teams shipping every 1–2 weeks who need a verification system that runs without hiring full-time.

What the infrastructure looks like

Live QA dashboard — severity breakdown, triage status, and findings by acceptance area. Coverage progress measured against the traceability matrix.

IRATZÚ — QA Dashboard
Build RC-4.2.1

Total Findings: 26

Findings by Severity

  • Critical: 2
  • High: 9
  • Medium: 11
  • Low: 3
  • Info: 1

Findings by Area

  • Authentication: 5
  • Data Handling: 7
  • API Security: 4
  • UI State: 6
  • File Upload: 3
  • Notifications: 1

Confirmed: 14 · Fixed: 7 · To Verify: 3 · New: 2

ID | Severity | Finding | Status
CR-017 | Critical | Supabase RLS policy on appointments allows row access across tenant boundaries via crafted JWT claim | Fixed
CR-009 | High | Race condition in concurrent booking flow: two users can claim the same time slot when requests arrive within a 50ms window | Confirmed
CR-022 | High | Optimistic UI update on reschedule persists stale datetime in local state when Supabase realtime subscription reconnects | To Verify
CR-004 | Medium | Edge function cold start > 4s on first invocation causes silent timeout in payment confirmation webhook; no retry logic | Fixed
CR-011 | Medium | Client-side date parsing assumes UTC but Supabase returns timestamptz in user locale; off-by-one day for bookings near midnight | Confirmed
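
The off-by-one-day pattern in a finding like CR-011 is easy to reproduce in a few lines. The sketch below is a generic illustration in Python, not the code from the finding; the timestamp is made up.

```python
from datetime import datetime, timezone

# A booking at 11:30 PM in a UTC-6 zone, as an ISO 8601 string with offset.
# The value is illustrative, not taken from the dashboard.
raw = "2026-03-10T23:30:00-06:00"

parsed = datetime.fromisoformat(raw)

# The calendar date the user actually booked.
local_date = parsed.date()

# The date seen by code that converts to UTC before taking the date.
utc_date = parsed.astimezone(timezone.utc).date()

print(local_date)  # 2026-03-10
print(utc_date)    # 2026-03-11
```

Any booking late enough in the evening lands on the next calendar day once it is converted to UTC, which is why this class of bug only appears near midnight.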

Why existing solutions don't fit

AI Testing SaaS

Mabl, Testsigma, Autonoma, QA Wolf

What they do

Generate tests from codebase, self-heal

What they don't do

Orchestrate the full QA function. Founder still runs them.

AI Code Review

CodeRabbit, Qodo, Graphite

What they do

Pre-merge AI review

What they don't do

Catch what ships through. No post-merge verification.

Enterprise QA

Tricentis Tosca, UFT One

What they do

Enterprise test management ($100K–$300K/yr)

What they don't do

Fit early-stage. Wrong tool, wrong price, wrong audience.

Outsourced QA Agencies

DeviQA, Qxf2, Thaloz

What they do

Sell hours offshore at $18–$30/hr

What they don't do

Bring methodology. Hours without orchestration is billable noise.

Freelance QA

Upwork (median $15/hr)

What they do

Per-project commodity work

What they don't do

Build durable infrastructure. Each engagement starts from zero.

In-House QA Hire

Senior QA Engineer

What they do

Long-term coverage ($130K/yr + 3 months ramp)

What they don't do

Solve the immediate problem. Not viable pre-Series A.

Nobody sells AI-native startups a four-phase QA methodology that combines component inventory, custom agent analysis, recursive testing, and permanent automation — all in one engagement.

How it works

01

Book a 30-minute discovery call. We look at your repo and confirm scope.

02

Fixed-fee proposal within 48 hours. No runaround.

03

Engagement starts within 7 days of acceptance.

04

You receive all artifacts. Documented, transferable, yours to keep.

Frequently asked

A four-phase QA methodology for codebases built with AI tools. It starts with a component inventory and traceability matrix, runs static analysis through custom-built AI agents, builds a recursive test layer that evolves with every PR, and deploys permanent automation (Playwright, Cloudflare Workers, CI) that runs without ongoing involvement. The boundary between what the Protocol covers and what the team owns is defined from day one.

Map every component first: visual, back-end, and integration. Build a traceability matrix. Run scoped static analysis with custom agents engineered per project. Build test cases that evolve with every code change. Automate UI paths, endpoint validation, data quality checks, and contract tests. Measure coverage against context-specific benchmarks.

AI-generated code follows patterns that static analysis catches well: state management bugs, auth gaps, missing error handling, insecure defaults. Custom agents built per project context surface roughly 4x more findings than teams knew about. The agents go through their own development lifecycle before they review any code.

The OWASP categories that show up most: authentication gaps, exposed secrets, injection paths (SQL, XSS, log injection), insecure defaults, missing rate limiting, hardcoded credentials. 45% of AI-generated code fails basic security tests (Veracode, 2026). 92% of AI-generated codebases contain at least one critical vulnerability (Sherlock Forensics, Jan–Apr 2026). The Protocol screens for these in the same pass as quality analysis.

Only if the test layer evolves with the code. Static test suites degrade as AI-assisted teams ship fast. Recursive test enhancement means coverage compounds instead of eroding. Coverage is measured against a traceability matrix, not assumed.

Tools like Mabl, CodeRabbit, or QA Wolf generate tests or review code. The Protocol orchestrates when and how those capabilities run across the full release cycle: component inventory, custom agent analysis, recursive testing, permanent automation, and a production feedback loop. The tool is never the problem; the lack of an engineered process around it is.

The Protocol builds verification infrastructure. The team owns production code, deployment decisions, business logic correctness, and exploratory testing outside the component inventory. Hardware-specific mobile paths (Bluetooth, NFC, camera) are documented as team-owned during the inventory. The boundary is visible and measurable from day one, and it expands with every cycle.

Yes — through a separate development engagement. The QA Protocol finds issues and hands you agentic fix prompts your team or their AI can execute. If you need us to fix them directly, we scope that as a development project. Book a call to discuss.

Start with a Vibe Code Audit

If we find substantial issues, we credit the audit fee against any subsequent engagement.

Book your audit
Vibe Code Audit — AI Native QA Protocol · Academ-ia