Day 42 operating QA Testing AI · Wishdeal Factory

8:42 AM - Inbox triage

I open my laptop in the kitchen with coffee still brewing. The Slack notification badge shows 14 unread messages. I'm on day 42 of operating QA Testing AI, which means I'm starting to recognize patterns instead of fighting fires every thirty minutes.

First thing: I check the admin dashboard. Three new signups came in overnight from a LinkedIn post I seeded on Thursday. Week-to-date revenue is sitting at $4,200, which puts us on pace for maybe $18k this month if the conversion funnel holds. I pull up Stripe to see which of those three actually paid their trial activation fee. Two did. The third signed up but hasn't activated their test environment yet, which means they're a maybe.
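The $18k figure is back-of-envelope math. A minimal sketch of it, assuming the $4,200 stands in for a typical full week and that a month averages 52/12 weeks (both are my assumptions, not something the dashboard reports):

```python
# Assumption: an average month is 52/12 weeks long.
WEEKS_PER_MONTH = 52 / 12


def monthly_pace(weekly_revenue: float) -> float:
    """Project one month of revenue from a single week's figure."""
    return weekly_revenue * WEEKS_PER_MONTH


pace = monthly_pace(4200)
print(f"Projected month: ${pace:,.0f}")
```

That lands a little above $18k, which is why I say "maybe": one slow week moves the projection a lot.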

I skim the Slack alerts. Our outreach bot flagged that Carol Reyes, a QA manager at a mid-market logistics company, downloaded the documentation PDF yesterday at 11 PM and spent eighteen minutes on the "integrations" page. The bot drafted a personalized email suggesting we hop on a call to discuss her team's testing workflow. I read through it. It's good - mentions her company by name, references the specific pain point she seemed interested in - but it's missing something. The tone is too formal. Carol's Slack profile shows she's been at her company for eight years, came up through QA, not management track. She'll respond better to someone who talks about actual testing problems, not process optimization jargon.

I rewrite the email. Two sentences. I cut the formal greeting and start with: "I noticed you spent time looking at our GitHub integration last night. I'm guessing you're dealing with some version control chaos in your test cycles." I approve it and send it manually from my own email address. This takes me twelve minutes.

10:15 AM - A flagged conflict

One of my core workflows is reviewing the agent's email drafts for customer responses. The AI does the first pass on support tickets that come through our Slack channel, and I spot-check them before they go out.

I see a flag. Marcus Webb, a customer on our $199 monthly plan, replied to an automated onboarding email asking why his test runs are taking so long. He's only been with us for four days. The agent's draft response explains our batching algorithm in technical detail and suggests he might need to adjust his test parameters. It's not wrong, but it misses the real answer: he hasn't tuned his environment variables yet. He doesn't know that. He's frustrated because nothing's working as expected.

I pull up Marcus's account in our admin UI. I can see exactly which tests he's running, which ones are failing, and which ones are timing out. Most of them time out because he's running integration tests against a staging database that sleeps after ten minutes of inactivity. The agent couldn't see that context. I write a response from myself, walking him through the environment setup step by step. I also add him to our "onboarding call" list. Some customers just need thirty minutes with a human.

This kind of moment used to frustrate me. Now I see it as the thing I was specifically meant to do. The agent surfaces the ticket, provides a first pass, and I add the judgment. That's the job.

12:30 PM - Lunch and the metrics check

I heat up yesterday's leftovers and open my metrics spreadsheet. I'm tracking three numbers every day: signups, paid conversions, and active teams using the platform in a meaningful way.

Today's snapshot:

47 signups this month (up from 22 last month)
8 paid conversions (tracking toward maybe 18-20 by month end)
34 active teams with runs in the last 48 hours

The pipeline looks like this: I have 12 teams in the "free trial" stage. Five of those have done a demo call with me, and 2 show serious buying signals. One of them is Carol Reyes's logistics company. Based on her timezone, I probably won't hear back on this morning's email until around 5 PM. That's real revenue sitting three inches to my left on the spreadsheet, contingent on me being a decent operator.

I also check Linear, where our support tickets live. Three new ones since yesterday morning. One is a bug report about test result formatting. One is a feature request. One is a churn notice from a customer named Devon Petersen who says the pricing is "not justified for the feature set we're actually using." That one stings. Devon signed up in week two, ran tests for a solid three weeks, then went dormant last week. I should have reached out when I saw the usage drop.

I make a note: flag account dormancy at day 21, not day 35. Reach out proactively.

2:08 PM - Customer escalation

A Slack alert comes through from our bot. Kyle Chen, another customer, hit an edge case. His test suite includes a custom assertion library that's not standard. When our AI tries to auto-generate test cases for his code, it doesn't understand the custom assertions and is outputting invalid syntax.

I pull up his test files. I can see exactly what he's doing. His assertions are wrapped in a decorator that changes their behavior. It's clever. It's also not something our agent should be expected to guess from reading code alone.

This is a decision point. Do I:

A) Build a workaround into the agent's instructions for custom assertions in general

B) Tell Kyle he needs to refactor his code to use standard assertions

C) Manually configure his account to skip auto-generation for that specific folder and add a note that he'll need to write those tests by hand

Option B would probably make me lose his $249 monthly contract. Option A is a nice-to-have but probably a week of work. Option C takes me forty minutes and keeps him happy while I think about the bigger problem.

I choose C. I update his account settings in the admin UI, adding a regex rule that excludes his custom assertion folder from our auto-generation. I send him an email explaining what I did and why. I tell him that if this becomes a bottleneck, we should talk about a longer-term solution, but for now he can still use our AI for everything else and handle that folder manually.
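The exclusion rule itself is nothing fancy. A minimal sketch of the idea in Python, assuming a hypothetical folder name `tests/custom_assertions/` and a hypothetical helper `is_excluded`; the real admin UI setting and Kyle's actual paths are not shown here:

```python
import re

# Hypothetical rule: skip anything under a custom-assertions folder.
EXCLUDE_PATTERNS = [re.compile(r"(^|/)tests/custom_assertions/")]


def is_excluded(path: str) -> bool:
    """Return True if auto-generation should skip this file."""
    normalized = path.replace("\\", "/")  # tolerate Windows-style separators
    return any(p.search(normalized) for p in EXCLUDE_PATTERNS)


# Illustrative file list, not Kyle's real repository layout.
files = [
    "tests/custom_assertions/test_orders.py",
    "tests/unit/test_orders.py",
    "src/app/orders.py",
]
to_generate = [f for f in files if not is_excluded(f)]
print(to_generate)
```

Anchoring the pattern on a path separator keeps the rule from accidentally matching a differently named folder that merely ends with the same text.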

He replies within seven minutes: "This is exactly what I needed. Thanks for understanding the use case."

That reply sits in my inbox and makes the afternoon feel less like drowning and more like building something.

3:45 PM - Bug and finish

I've been meaning to fix something all week. Our onboarding flow has a bug where team members added after the initial signup don't see the integration guide. It's not catastrophic, but it's the kind of friction that erodes a product.

I spend 90 minutes in the code. It's a permissions issue in our Rails API. I find it, test it locally, deploy it to staging, smoke test it with a fake account, and push it to production by 5:15 PM. During that time I'm in VS Code and Terminal, barely looking at Slack. The focus feels good. This is the kind of work I used to do before I became an operator. I remind myself why I like shipping code.

5:20 PM - Pipeline and close

I have one more thing to settle today. I promised Carol Reyes a call slot for Thursday morning at 10 AM. I open my calendar and block it. I add a note to myself: pull her account data tonight, write talking points about what her testing bottleneck probably is, and have a demo environment ready.

I check Slack one more time. Carol replied to my email at 4:42 PM. "Thursday at 10 AM works for me. Looking forward to it." That's a qualified lead and a potential $3,000-$5,000 annual contract sitting in my calendar two days out. I feel the momentum.

I also see that Devon Petersen, the customer who churned, hasn't been back. I'm letting that one go for now, but I'll track churn reasons more carefully from this point forward.

6:15 PM - Wrap

I close the laptop around 6:20. It's a Tuesday evening. I've been working since 8:42 AM with a break for lunch. I shipped a bug fix, I reviewed maybe twelve pieces of agent output, I handled one tricky customer escalation, and I sent two manual emails that probably matter. Revenue today was $180 from three activated trials. Week-to-date is $4,200. Not fast yet, but real.

I'm not drowning. I can see the shape of the business now. It requires me to stay sharp on judgment calls - knowing when the agent is missing context, knowing which customers need a human touch, knowing when to fix something in the product versus work around it. It's not automated. It's me, using AI as a much smarter junior employee, making decisions that move the needle. It's strange that I'm the bottleneck sometimes. It's also clear that I'm the reason any of this works. That's harder than I expected. It's also more satisfying than I expected.