How Caleb would build Marigold

How I'd build Marigold

I'd reach for Next.js on the frontend, FastAPI on the backend with async LLM streaming, Postgres for the core data model, Stripe for subscription management, and Claude's API for the tutor. Deployment is Vercel for the frontend and Cloud Run for the backend. I'm estimating 120-140 hours to production, not including the COPPA legal review, which is a separate blocker I'll flag below.

Day-by-day plan

Day 1: Provision the Postgres schema (users, subscriptions, chat history, sessions), integrate Auth0 for OAuth, and enforce tenant isolation through the auth context.
Day 2: Wire Stripe Billing with webhook handlers for subscription state changes. Set up three tiers and license keys.
Days 3-4: Build the core tutoring chat interface in Next.js. Real-time message streaming from the Claude API via FastAPI. UI polish for mobile (parents will mostly use it on their phones).
Days 5-6: Parent dashboard with session history, learning progress metrics, and usage tracking. Analytics backbone for CAC/LTV measurement.
Day 7: Admin panel (internal use) for user management, refund handling, tier overrides. Email notifications (Resend) for trial signup and churn.
Day 8: LLM prompt optimization and token budgeting. A/B test response length, temperature, system prompts to keep per-session cost under 0.15 USD.
Day 9: Integration testing, error handling for LLM timeouts and API failures. Rate limiting.
Day 10: Deployment runbook, monitoring (basic Sentry), handoff documentation. Soft launch to 10 users for live feedback.
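
For the Day 2 webhook work, here is a minimal sketch of how subscription state changes could be applied idempotently. It assumes the event has already been signature-verified (in a real handler I'd expect to do that with stripe.Webhook.construct_event before this code runs). SubscriptionStore, apply_stripe_event, and is_entitled are hypothetical names, and the in-memory store stands in for the Postgres subscriptions table. The choice of which statuses unlock the tutor is an assumption, not a decided policy.

```python
from dataclasses import dataclass, field

# Assumed policy: only these Stripe subscription statuses unlock the tutor.
ENTITLED_STATUSES = {"trialing", "active"}


@dataclass
class SubscriptionStore:
    """In-memory stand-in for the Postgres subscriptions table (hypothetical)."""
    status_by_customer: dict = field(default_factory=dict)
    seen_event_ids: set = field(default_factory=set)


def apply_stripe_event(store: SubscriptionStore, event: dict) -> bool:
    """Apply one already-verified Stripe event. Returns False if it was ignored."""
    event_id = event["id"]
    if event_id in store.seen_event_ids:
        # Stripe can deliver the same event more than once, so skip repeats.
        return False
    store.seen_event_ids.add(event_id)

    event_type = event["type"]
    obj = event["data"]["object"]
    if event_type in ("customer.subscription.created",
                      "customer.subscription.updated"):
        store.status_by_customer[obj["customer"]] = obj["status"]
        return True
    if event_type == "customer.subscription.deleted":
        store.status_by_customer[obj["customer"]] = "canceled"
        return True
    # Any other event type is outside the scope of this sketch.
    return False


def is_entitled(store: SubscriptionStore, customer_id: str) -> bool:
    """True if the customer's last known status unlocks the tutor."""
    return store.status_by_customer.get(customer_id) in ENTITLED_STATUSES
```

This sketch does not handle out-of-order delivery; a production version would likely compare event timestamps or re-fetch the subscription from Stripe before writing.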

What's hard about this build

COPPA compliance kills fast iteration. Collecting personal information from under-13 users requires verifiable parental consent (the FTC's accepted methods include a credit card transaction or a government-issued ID check), and you inherit liability for mishandling. I'd block this behind a legal review before any parent PII is touched. Token costs are the second gotcha: verbose LLM responses cost 5-10 cents per session. At 20 sessions per user per month, that is 1.00-2.00 USD per user per month at scale, which takes a real bite (roughly 7-13%) out of a 15 USD margin. I'd build token budgeting in from day one with strict response length guards. Parent trust is fragile: incumbents like Khanmigo are well-resourced and backed by strong free offerings. Marketing will be expensive unless we nail a sharp positioning angle before launch.
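
The token-cost arithmetic above can be sketched as a few small functions. The per-million-token prices used in the usage note are placeholders I made up for illustration, not actual Claude pricing, and all function names are hypothetical.

```python
def monthly_llm_cost_usd(cost_per_session_usd: float, sessions_per_month: int) -> float:
    """Per-user monthly LLM spend, e.g. 0.05 USD x 20 sessions."""
    return cost_per_session_usd * sessions_per_month


def session_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_mtok: float,
                     output_price_per_mtok: float) -> float:
    """Cost of one session given token counts and prices per million tokens."""
    total = (input_tokens * input_price_per_mtok
             + output_tokens * output_price_per_mtok)
    return total / 1_000_000


def max_output_tokens(budget_usd: float, input_tokens: int,
                      input_price_per_mtok: float,
                      output_price_per_mtok: float) -> int:
    """Largest output-token cap that keeps a session within budget_usd."""
    remaining = budget_usd - input_tokens * input_price_per_mtok / 1_000_000
    if remaining <= 0:
        return 0  # The input alone already uses up the budget.
    return int(remaining * 1_000_000 / output_price_per_mtok)
```

For example, monthly_llm_cost_usd(0.05, 20) and monthly_llm_cost_usd(0.10, 20) reproduce the 1.00-2.00 USD range, and max_output_tokens(0.15, 10_000, 3.0, 15.0) gives a response length cap for a 0.15 USD session budget under the placeholder prices.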

What's fast because of AI

Claude-assisted prompt drafting and prompt caching compress what would be a week of prompt engineering into two days. I'd use Claude to enumerate edge cases in the auth and billing logic (payment race conditions, refund edge cases, multi-session collisions), then scaffold the test suite. Writing copy for the product UI, onboarding flows, and error messages normally takes a designer and a copywriter; Claude drafts it in hours. Debugging LLM behavior (why is the tutor being terse or verbose, and is the cause the system prompt or the temperature) is normally manual tweaking; structured prompt iteration with Claude's feedback compresses this to a day. Test coverage for async streaming endpoints is tedious; Claude scaffolds it well, which cuts down on hand-editing.
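
As an illustration of the kind of streaming test scaffold I mean, here is a minimal sketch using only asyncio. fake_llm_stream is a hypothetical stand-in for the real streaming client, relay_stream is a hypothetical wrapper the FastAPI endpoint might use, and the fallback message is a placeholder.

```python
import asyncio
from typing import AsyncIterator, Iterable, List

TIMEOUT_MESSAGE = "[tutor timed out, please retry]"  # placeholder copy


async def fake_llm_stream(chunks: Iterable[str],
                          delay_s: float = 0.0) -> AsyncIterator[str]:
    """Hypothetical test double: yields chunks with an optional delay before each."""
    for chunk in chunks:
        await asyncio.sleep(delay_s)
        yield chunk


async def relay_stream(source: AsyncIterator[str],
                       per_chunk_timeout_s: float) -> AsyncIterator[str]:
    """Relay chunks downstream; emit a fallback message if upstream stalls."""
    iterator = source.__aiter__()
    while True:
        try:
            chunk = await asyncio.wait_for(iterator.__anext__(),
                                           per_chunk_timeout_s)
        except StopAsyncIteration:
            return  # Upstream finished normally.
        except asyncio.TimeoutError:
            yield TIMEOUT_MESSAGE
            return
        yield chunk


async def collect(stream: AsyncIterator[str]) -> List[str]:
    """Drain an async stream into a list so tests can assert on it."""
    return [chunk async for chunk in stream]
```

A test then drives both the happy path and the stall path with asyncio.run(collect(relay_stream(...))), without touching the network.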

How I'd hand it off

I'd record a Loom walkthrough of the admin panel and deployment pipeline, and hand off a runbook covering the database schema, Stripe webhook handling, auth token refresh, and LLM cost alerts (if monthly cost exceeds a threshold, page on-call). I'd stay on the pager for the first 30 days of production to catch COPPA issues, billing bugs, or LLM cost explosions. I'd transfer the Stripe keys, Auth0 credentials, and Claude API billing to accounts you own. Database backups run automatically to GCS with a 30-day retention policy.
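
The LLM cost alert in the runbook could look roughly like this. The source rule is simply "page if monthly cost exceeds the threshold"; paging early on a straight-line month-end projection is my own assumption, and the function names are hypothetical.

```python
import calendar
from datetime import date


def projected_month_cost_usd(spend_to_date_usd: float, today: date) -> float:
    """Straight-line projection of month-end spend from month-to-date spend."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date_usd / today.day * days_in_month


def should_page_on_call(spend_to_date_usd: float, today: date,
                        monthly_threshold_usd: float) -> bool:
    """Page if spend already exceeds the threshold or is on pace to (assumed rule)."""
    if spend_to_date_usd > monthly_threshold_usd:
        return True
    return projected_month_cost_usd(spend_to_date_usd, today) > monthly_threshold_usd
```

A daily scheduled job could call should_page_on_call with the month-to-date spend and trigger the pager integration when it returns True.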
