Wishdeal Factory · Storefront
Operator interview · $75/hr · Roll Digital's seat

How Caleb would build QA Testing AI.

First-person from one of our chief operators. What he'd ship and how, AI-amplified. Stack, hour estimate, day-by-day plan, the parts that are hard, and the handoff. Synthesized from the agent spec.

How I'd build QA Testing AI

I'd reach for FastAPI on the backend, Postgres for the database, and Next.js for the dashboard, with the Claude API deeply integrated for test generation and Stripe handling the billing-tier logic. The frontend is a single-page app for uploading code and viewing generated tests. I'm estimating 240 hours for a functional, production-ready MVP: that includes auth, the generation pipeline, the dashboard, and basic monitoring. Call it three months of part-time work (about 20 hours a week) or six weeks full-time.

Day-by-day plan

  • Days 1-2: Postgres schema for users, organizations, test runs, and generated test results. FastAPI auth with JWT. Role-based access control for team members.
  • Days 3-4: Stripe integration across three pricing tiers. Webhook handling for subscription events. Seat provisioning and quota logic per tier.
  • Days 5-6: Next.js dashboard with code upload, test generation trigger, results display. Basic UI for browsing generated tests and viewing pass/fail status.
  • Days 7-9: Claude API integration for test generation. Prompt engineering for each supported language (starting with Python and JavaScript). Streaming responses for real-time feedback.
  • Days 10-11: Test result persistence, SQLAlchemy ORM layer, indexing on test runs for fast queries. Caching layer for generated tests (Redis).
  • Days 12-13: GitHub OAuth integration for user signups. Code repo linking so tests can be pushed back as PRs.
  • Days 14-15: Basic onboarding flow, welcome email via Resend, sample project for first-time users.
  • Days 16-17: Security audit, API rate limiting, encryption of code samples in transit and during processing, audit logging for compliance.
  • Days 18-19: Monitoring with Datadog, error tracking with Sentry, performance profiling on the generation endpoint.
  • Day 20: Documentation, API reference, deployment runbook.
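The per-tier quota logic from Days 3-4 and the rate limiting from Days 16-17 could start out as something like the sketch below. It is a minimal, in-process sketch under stated assumptions. The tier names (starter, team, enterprise), their seat and generation limits, and the helpers TokenBucket, check_generation, and can_add_seat are all hypothetical. None of them come from the agent spec or from Stripe's API.

```python
import time
from typing import Optional

# Hypothetical tiers and limits; real values would mirror the three Stripe prices.
TIER_LIMITS = {
    "starter": {"seats": 1, "monthly_generations": 100},
    "team": {"seats": 10, "monthly_generations": 1_000},
    "enterprise": {"seats": 100, "monthly_generations": 20_000},
}


class QuotaExceeded(Exception):
    """Raised when a request would exceed the tier's monthly generation quota."""


class RateLimited(Exception):
    """Raised when a user sends requests faster than the bucket refills."""


class TokenBucket:
    """Token-bucket limiter: holds up to `capacity` tokens, refilled at `rate` per second."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start with a full bucket
        self.updated: Optional[float] = None

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.updated is not None:
            # Refill in proportion to elapsed time, never beyond capacity.
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def check_generation(tier: str, used_this_month: int, bucket: TokenBucket,
                     now: Optional[float] = None) -> None:
    """Raise if this generation request breaks the tier quota or the rate limit."""
    limit = TIER_LIMITS[tier]["monthly_generations"]
    if used_this_month >= limit:
        raise QuotaExceeded(f"{tier} tier allows {limit} generations per month")
    if not bucket.allow(now):
        raise RateLimited("too many requests; retry shortly")


def can_add_seat(tier: str, current_seats: int) -> bool:
    """Seat provisioning check: True if the organization has room for one more member."""
    return current_seats < TIER_LIMITS[tier]["seats"]
```

Because the bucket lives in process memory, it only limits a single worker. With several API workers, the counters would need shared state, for example in the Redis instance already planned for caching.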

What's hard about this build

The core risk is test reliability. Generated tests that pass locally but miss real bugs tank credibility fast: one batch of false negatives and the team abandons the tool. I'd mitigate this by starting with Python (simpler semantics, fewer edge cases), building confidence, then expanding to JavaScript and, after that, Go. The second hard problem is language coverage. Each stack has its own testing idiom (pytest for Python, Jest for JavaScript, Go's built-in testing package, JUnit for Java), and supporting all of them means baking in language-specific heuristics, which eats runway. I'm not trying to solve Rails or Rust in week one. The third risk is privacy: the code samples users send are often proprietary. Storing them, even transiently, is a privacy liability and a sales objection, so I'd hash them immediately after generation and never persist the raw code.

What's fast because of AI

Claude handles the heavy lifting on scaffolding and test enumeration. Instead of hand-coding test templates for each language, I'd have Claude generate them from a user's function signature and docstring. That collapses what would be a week of template writing into two days of prompt iteration. Documentation that might take three days gets written in one, with Claude drafting copy and catching gaps. Debugging generated code is faster too: when a generated test fails to run, Claude can usually diagnose the problem from the error message and suggest a fix. Edge-case enumeration, the boring part where you list null checks, type mismatches, and boundary conditions, takes Claude minutes. I'd use it to populate test suites with variations, then spot-check them. Copywriting for the dashboard, pricing page, and onboarding UI also compresses significantly with the right prompts.
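Generating tests from a signature and docstring starts with turning both into a prompt. The sketch below assumes Python targets and pytest, and it uses only the standard library's inspect module. build_test_prompt and the EDGE_CASES table are hypothetical, and the step that actually sends the prompt to the Claude API is deliberately left out. It folds null-check, type-mismatch, and boundary hints into the prompt so the resulting suites are easier to spot-check.

```python
import inspect
from typing import Callable

# Hypothetical per-type boundary hints appended to the prompt.
EDGE_CASES = {
    int: ["0", "-1", "a very large int"],
    float: ["0.0", "negative values", "float('nan')", "float('inf')"],
    str: ['""', "whitespace only", "non-ASCII text"],
    list: ["[]", "a single element", "duplicates"],
}


def build_test_prompt(func: Callable[..., object]) -> str:
    """Assemble a test-generation prompt from a function's signature and docstring."""
    sig = inspect.signature(func)
    doc = inspect.getdoc(func) or "(no docstring)"
    lines = [
        "Write pytest tests for the function below.",
        f"Signature: {func.__name__}{sig}",
        f"Docstring:\n{doc}",
        "Cover at least these cases:",
    ]
    for name, param in sig.parameters.items():
        # Type-specific boundaries (if the annotation is known), plus a null check
        # and a type mismatch for every parameter.
        cases = EDGE_CASES.get(param.annotation, []) + ["None", "a value of the wrong type"]
        lines.append(f"- {name}: " + ", ".join(cases))
    return "\n".join(lines)
```

Keeping the hints in a plain table, rather than buried in free-form prompt text, makes it easy to see which cases each generated suite was asked to cover.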

How I'd hand it off

I'd ship a Loom walkthrough covering the dashboard, the test generation flow, and the Stripe admin panel. The runbook covers deployment steps, environment variables, database migrations, and the Claude API key rotation schedule. I'd stay on pager duty for 30 days to handle critical issues and production alerts, and I'd transfer all credentials: the GitHub OAuth app, Stripe account access, Datadog and Sentry logins, the Claude API key, and the AWS infrastructure. You'd also get a Linear project with outstanding tasks and a prioritized list of the three languages I'd build next.

Hire Caleb to build this for you.

QA Testing AI is available to own for $200 flat. Or pay $75/hr for a Roll Digital chief operator to build it for you, AI-amplified.
