How Caleb would build Bookkeeper AI

How I'd build Bookkeeper AI

I'd reach for FastAPI on the backend, Next.js for the dashboard, Postgres with a properly thought-out multi-tenant schema, the Claude API for draft generation, and Stripe for billing. I'd wire Resend for outbound email and pull live client and invoice data from the QBO API. Rough estimate: 140 hours. That puts us at $10,500, in line with the investment-to-production ask.

Day-by-day plan

Day 1: Provision auth schema with role-based access control, design the multi-tenant data model, test isolation at the query level.

Day 2: Build QBO OAuth flow, handle refresh tokens, sync client list and AR aging into Postgres daily.
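The refresh-token handling in Day 2 could look something like the sketch below: refresh a little before expiry, and always store the full token set that comes back in case the provider rotates the refresh token. This is a minimal illustration, not the QBO SDK. `TokenSet`, `TokenManager`, the injected `refresh_fn`, and the 5-minute margin are all hypothetical names and values chosen for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TokenSet:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp when the access token expires


class TokenManager:
    """Hands out a valid access token, refreshing shortly before expiry."""

    def __init__(
        self,
        tokens: TokenSet,
        refresh_fn: Callable[[str], TokenSet],  # hypothetical: calls the token endpoint
        margin_seconds: float = 300.0,          # assumed safety margin
        clock: Callable[[], float] = time.time,
    ):
        self.tokens = tokens
        self._refresh_fn = refresh_fn
        self._margin = margin_seconds
        self._clock = clock

    def get_access_token(self) -> str:
        if self._clock() >= self.tokens.expires_at - self._margin:
            # Keep the whole new token set: the refresh token may have changed.
            self.tokens = self._refresh_fn(self.tokens.refresh_token)
        return self.tokens.access_token
```

Injecting the clock and the refresh function keeps the timing logic testable without touching the network.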

Day 3-4: Integrate Claude API, write and test prompts for drafts across three communication types (collections, scope renegotiation, onboarding), build the draft review UI in Next.js.

Day 5: Wire Resend for outbound email, build send-and-log workflow, surface drafts in a simple one-click approve interface.
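The send-and-log workflow in Day 5 can be sketched as a small state machine in which a draft cannot be sent until it has been approved, and every transition leaves a log entry. The class and field names here are hypothetical, and `send_fn` stands in for whatever function would wrap the email provider call.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Callable


class DraftStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    SENT = "sent"


@dataclass
class Draft:
    draft_id: str
    body: str
    status: DraftStatus = DraftStatus.PENDING
    log: list = field(default_factory=list)

    def _record(self, event: str, actor: str) -> None:
        # Append-only log of who did what, and when (UTC).
        self.log.append(
            {"event": event, "actor": actor, "at": datetime.now(timezone.utc).isoformat()}
        )

    def approve(self, actor: str) -> None:
        if self.status is not DraftStatus.PENDING:
            raise ValueError("only pending drafts can be approved")
        self.status = DraftStatus.APPROVED
        self._record("approved", actor)

    def send(self, send_fn: Callable[[str], None], actor: str) -> None:
        if self.status is not DraftStatus.APPROVED:
            raise ValueError("draft must be approved before sending")
        send_fn(self.body)  # hypothetical hook for the outbound email call
        self.status = DraftStatus.SENT
        self._record("sent", actor)
```

Because the status only changes after `send_fn` returns, a failed send leaves the draft approved and retryable.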

Day 6: Build conflict detection (does this draft violate engagement terms or suggest out-of-scope work?), add soft rate limiting to prevent spam sends, implement voice-match testing.
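For the soft rate limiting in Day 6, one possible shape is a per-client sliding window that returns False instead of raising, so the UI can warn the bookkeeper rather than fail. This is a minimal in-memory sketch. `SendRateLimiter` and its parameters are hypothetical, and a real deployment would need shared storage across workers.

```python
from collections import defaultdict, deque


class SendRateLimiter:
    """Allows at most max_sends per client within a sliding time window."""

    def __init__(self, max_sends: int, window_seconds: float):
        self.max_sends = max_sends
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client_id -> timestamps of recent sends

    def allow(self, client_id: str, now: float) -> bool:
        sends = self._history[client_id]
        # Drop timestamps that have aged out of the window.
        while sends and now - sends[0] >= self.window_seconds:
            sends.popleft()
        if len(sends) >= self.max_sends:
            return False  # "soft" limit: the caller decides how to surface this
        sends.append(now)
        return True
```

Passing `now` explicitly keeps the limiter deterministic in tests. In production the caller would pass `time.time()`.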

Day 7: Write integration tests against QBO sandbox, add Sentry, set up database monitoring and alert thresholds.

Day 8: Deploy to staging, run basic security audit, test Stripe webhook handling for subscription changes.
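Part of testing Stripe webhook handling is confirming that unsigned or tampered events are rejected. Stripe signs each webhook with an HMAC-SHA256 over the timestamp, a dot, and the raw payload, and sends the result in the `Stripe-Signature` header as `t=...,v1=...`. The sketch below reimplements that check with the standard library purely for illustration. The function name and the 300-second tolerance are assumptions, and in a real service the official `stripe` library's `Webhook.construct_event` helper would be the safer choice.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(
    payload: bytes,
    sig_header: str,
    secret: str,
    tolerance: int = 300,        # assumed replay window, in seconds
    now: Optional[float] = None,
) -> bool:
    """Returns True if sig_header carries a valid, recent v1 signature for payload."""
    pairs = [item.split("=", 1) for item in sig_header.split(",") if "=" in item]
    timestamp = next((v for k, v in pairs if k.strip() == "t"), None)
    signatures = [v for k, v in pairs if k.strip() == "v1"]
    if timestamp is None or not timestamp.isdigit() or not signatures:
        return False

    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance:
        return False  # too old (or too far in the future): possible replay

    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # Constant-time comparison against each candidate signature.
    return any(hmac.compare_digest(expected.encode(), s.encode()) for s in signatures)
```

The check has to run against the raw request body, before any JSON parsing, since re-serializing the payload would change the bytes being signed.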

Day 9: Write runbook, record Loom walkthrough, review audit logs and audit trails for compliance.

What's hard about this build

Multi-tenant isolation at scale is non-negotiable here. One bookkeeper's client data bleeding into another's is a shutdown event. I'm not just partitioning by tenant ID in queries; I'm building policy-layer enforcement so a missed WHERE clause can't sink us. QBO OAuth is also deceptively fiddly. Token refresh timing, rate limits that change per endpoint, and the API returning different field structures depending on which version the bookkeeper's account uses will all eat debugging time. The conflict-checking logic is the other hidden cost. "This draft suggests a scope change" sounds simple until you're parsing unstructured engagement letters and trying to detect intent. Finally, voice calibration across communication types is harder than it sounds. The tone for a gentle payment reminder is different from that of a formal scope-change request, and the agent needs to hold that distinction. Expect 15-20 hours of iteration on prompts alone.
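To make the "missed WHERE clause" point concrete, here is a minimal sketch of application-side enforcement: call sites never write the tenant filter themselves, because every query goes through a handle that is bound to one tenant. It uses the standard-library `sqlite3` module only so it runs standalone. `TenantScopedDB`, the table names, and the column layout are hypothetical, and on Postgres the same idea would usually be backed by row-level security policies as a second layer.

```python
import sqlite3


class TenantScopedDB:
    """Database handle bound to one tenant; the tenant filter is added here."""

    # Allow-list, so table names are never interpolated from untrusted input.
    _TABLES = {"clients", "invoices"}

    def __init__(self, conn: sqlite3.Connection, tenant_id: int):
        if tenant_id is None:
            raise ValueError("tenant_id is required")
        self._conn = conn
        self._tenant_id = tenant_id

    def select_all(self, table: str) -> list:
        if table not in self._TABLES:
            raise ValueError(f"unknown table: {table}")
        # The caller cannot omit or override this WHERE clause.
        cur = self._conn.execute(
            f"SELECT * FROM {table} WHERE tenant_id = ?", (self._tenant_id,)
        )
        return cur.fetchall()

    def insert_client(self, name: str) -> None:
        # The tenant_id comes from the handle, never from request data.
        self._conn.execute(
            "INSERT INTO clients (tenant_id, name) VALUES (?, ?)",
            (self._tenant_id, name),
        )
```

A request handler would build one `TenantScopedDB` per authenticated request and pass only that handle to the rest of the code, so raw connections never reach business logic.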

What's fast because of AI

Claude compresses prompt iteration cycles from weeks to hours. Instead of shipping a draft, waiting for bookkeeper feedback, and rewriting, I'm testing five variants in an afternoon. Test generation is similarly fast. I'd normally handwrite 40-50 unit tests for the draft generation alone; Claude writes them in 30 minutes, and I spend an hour hardening the ones that matter. Product copywriting for the UI and help text, which usually eats a day, gets written and iterated in two hours. Scaffolding for the QBO sync logic, error-handling trees, and webhook listener boilerplate compresses into single-prompt generation. Edge-case enumeration speeds up too: when I ask "what breaks if QBO returns null for invoice date," Claude lists 15 scenarios I'd have missed.

How I'd hand it off

I'd record a Loom walking through the full bookkeeper workflow: flag a client, generate a draft, approve, send, log. I'd write a runbook covering scaling concerns (cache invalidation, QBO rate limits), incident response (what to do if a draft goes out with corrupted client data), and the monitoring dashboard. I'd stay on call for the first thirty days, with a playbook for each high-risk scenario. I'd transfer Stripe, QBO, and AWS credentials via 1Password. The Linear board stays live for tracking minor UX fixes and new communication types.
