The autonomous loop produced 85 ship logs over five days. It cost roughly $172 in API spend. It built 244 product pages, 10 playbook essays, 5 transparency surfaces, and 20 OG cards. These are the operator-honest lessons from running it, including the ones we got wrong before we got them right.
The /loop is a self-pacing autonomous mode. You hand it a brief and a queue. Each "iter" is one wakeup: it reads the queue, picks the top items, ships them, audits the work, writes a SHIP-LOG, updates the queue, and schedules the next wakeup. Cadence is the loop's own call.
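As a rough illustration of one iter, here is a minimal sketch of the read, pick, ship, audit, requeue, schedule cycle. Every name in it (`IterResult`, `run_iter`, the `ship` and `audit` callables) is hypothetical and not the loop's real interface, and it leaves out writing the SHIP-LOG.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class IterResult:
    shipped: List[str] = field(default_factory=list)
    audit_findings: List[str] = field(default_factory=list)
    remaining_queue: List[str] = field(default_factory=list)
    next_wakeup_min: int = 30


def run_iter(
    queue: List[str],
    ship: Callable[[str], bool],           # hypothetical: returns True if the item shipped
    audit: Callable[[List[str]], List[str]],  # hypothetical: returns findings for shipped items
    per_iter: int = 3,
    cadence_min: int = 30,
) -> IterResult:
    """One wakeup: pick the top items, ship them, audit the result, requeue the rest."""
    picked, rest = list(queue[:per_iter]), list(queue[per_iter:])
    shipped = [item for item in picked if ship(item)]
    # Items that failed to ship go back to the front of the queue.
    remaining = [item for item in picked if item not in shipped] + rest
    findings = audit(shipped)
    return IterResult(shipped, findings, remaining, cadence_min)
```

In this sketch the cadence is passed in; in the real loop the next wakeup is the loop's own decision.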
We ran 85 of these between 2026-05-09 and 2026-05-14. Most of what follows is durable lessons. Some of it is "here is the failure mode we hit; here is the discipline we added to prevent it." All of it is verifiable by reading the ship logs at /factory/log/ship-logs/.
If you are thinking about running your own autonomous loop on a project, these are the things we would have wanted someone to tell us before iter 1.
We ran three cadences, stepping up twice: 15 min for the early iters, 30 min from iter 30 onward, 45 min from iter 81. Each step happened the same way: the audit-confirms started coming back clean, the marginal value of "one more iter" dropped, and we noticed we were re-checking the same surfaces.
The wrong way to pick cadence is to set it up front. The right way is to watch the audits. When em-dash sweeps return zero for three iters in a row, when health-check is 100 percent passing for the same set of pages, when audit-fakeproof yields nothing new, the loop is overpacing. You can step longer and not lose anything.
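The "watch the audits" rule can be sketched as a small function. This is an illustration under assumptions, not the loop's actual logic: the ladder values come from the cadences named in this essay, and `next_cadence` and its inputs are hypothetical names.

```python
from typing import Sequence

# Cadence ladder in minutes, taken from the steps described in this essay.
CADENCE_LADDER = (15, 30, 45)


def next_cadence(
    current: int,
    new_findings_per_iter: Sequence[int],
    clean_streak: int = 3,
) -> int:
    """Step to the next longer cadence once the last `clean_streak` iters found nothing new."""
    recent = list(new_findings_per_iter)[-clean_streak:]
    is_clean = len(recent) == clean_streak and all(n == 0 for n in recent)
    if not is_clean or current not in CADENCE_LADDER:
        return current
    idx = CADENCE_LADDER.index(current)
    # Stay on the last rung once the ladder is exhausted.
    return CADENCE_LADDER[min(idx + 1, len(CADENCE_LADDER) - 1)]
```

Here `new_findings_per_iter` stands for whatever count the audits report each iter: em-dash hits, failed health checks, or new fakeproof findings.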
The cache-window factor. The model serving the /loop has a 5-minute prompt cache. Sleeping past 5 minutes pays a cache miss on every wakeup. We hit a sweet spot at 45 minutes: long enough that the catalog actually changes (cron jobs run, audits update), short enough that one cache miss buys a full iter of work. The 15-min cadence of the early days was burning cache four times an hour to do five minutes of real work.
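The cache arithmetic is simple enough to write down. A minimal sketch, assuming that any sleep longer than the cache window means a miss on every wakeup and ignoring everything else about how the cache behaves; the function names are ours for illustration.

```python
CACHE_TTL_MIN = 5  # prompt-cache window stated above


def wakeups_per_hour(cadence_min: int) -> float:
    return 60 / cadence_min


def cache_misses_per_hour(cadence_min: int, ttl_min: int = CACHE_TTL_MIN) -> float:
    """Simplifying assumption: every wakeup is a miss once the sleep exceeds the cache window."""
    return wakeups_per_hour(cadence_min) if cadence_min > ttl_min else 0.0


for cadence in (15, 30, 45):
    print(f"{cadence} min cadence: {cache_misses_per_hour(cadence):.2f} misses/hour")
```

The point of the comparison is the ratio of cache misses to useful work per wakeup, not the absolute numbers.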
This is the rule that saved us three or four iters of duplicate work. Before building anything, look at what is already there. We learned it the hard way in iter 75. The queue said "build a Stripe webhook handler." We wrote a Python skeleton. It took 30 minutes. We declared it the iter's primary ship.
Then in iter 84 we were documenting the Factory's API endpoints and found that the Node service running at factory-api.service had been hosting a fully functional Stripe webhook for weeks. It was logging every payload. It validated signatures when the secret was set. It was wired into the dispatch table at server.js:784. We had written a parallel skeleton without ever checking whether the real one existed.
After that we added a rule: before any "build X" task, grep for X. Look at running services. Check the Caddyfile for routes. Read the cron table. If X already exists in any form, the task is "improve" or "document," not "build." This shaved time off iters 79, 83, and 84 directly.
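A minimal sketch of the "grep for X" step, assuming the things worth searching (service code, the Caddyfile, an exported cron table) are readable as text files under a few root paths. `find_existing` and `classify_task` are hypothetical helpers, not scripts from the Factory.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def find_existing(term: str, roots: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Case-insensitive search for `term` in every file under the given roots."""
    needle = term.lower()
    hits: List[Tuple[str, int, str]] = []
    for root in roots:
        base = Path(root)
        if not base.exists():
            continue
        files = [base] if base.is_file() else sorted(p for p in base.rglob("*") if p.is_file())
        for path in files:
            try:
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue  # unreadable file: skip rather than abort the sweep
            for lineno, line in enumerate(text.splitlines(), start=1):
                if needle in line.lower():
                    hits.append((str(path), lineno, line.strip()))
    return hits


def classify_task(term: str, roots: Iterable[str]) -> str:
    """If X already exists in any form, the task is not 'build'."""
    return "improve-or-document" if find_existing(term, roots) else "build"
```

A text search does not see running processes, so checking live services still has to happen separately.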
Iter 56 changed the loop's character. Up to that point we were shipping 1-2 things per iter at 30-min cadence. The instruction came back: "Can you make this loop better and offer more value? Maybe run a bit more often."
The right read was not "run faster." The right read was "ship more per iter." We rewrote the operating rules: 3-4 substantive ships per iter, value-mix discipline (every iter should include at least one new-feature item, not just polish), anti-punt rule (do not save valuable work for later sessions).
Cadence stayed at 30 minutes for the immediate post-pivot iters, then stepped to 45 once the value-per-iter ratio stabilized. The pivot changed throughput by maybe 3x. The lesson: when the human collaborator says "do more," the answer is usually "ship more per turn," not "take more turns."
Most of the durability discipline came from one rule: if it is regenerated by cron, fix the generator, not the file.
The catalog has roughly 127 cron jobs. About 30 of them regenerate HTML pages on a schedule. If you edit a page directly, the next cron run wipes your edit. We had this happen in iter 82: a callout was added to /start-here/, the ship log claimed it was live, and by iter 84 we discovered regen-start-here.py had silently overwritten it. The fix that stuck went into the generator source.
Concrete pattern that emerged: "surface plus source". Make the change visible immediately by editing the rendered HTML (so it is live before the next regen runs), then write the same change into the generator script (so the regen reproduces it instead of wiping it). Either alone fails. Surface-only gets clobbered. Source-only takes an hour to show up.
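One way to check the pattern is to look for the same snippet in both places. This is a sketch under the assumption that the change is a literal string that appears verbatim in the rendered page and in the generator source; `surface_plus_source` is a hypothetical helper.

```python
def surface_plus_source(snippet: str, rendered_html: str, generator_source: str) -> str:
    """Report whether a change exists on the surface, in the source, in both, or in neither."""
    in_surface = snippet in rendered_html
    in_source = snippet in generator_source
    if in_surface and in_source:
        return "durable"
    if in_surface:
        return "surface-only: the next regen will clobber it"
    if in_source:
        return "source-only: not live until the next regen"
    return "missing"
```

Generators that template or escape their output would need a looser match than a plain substring test.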
Why this is a meaningful discipline. "Source-fix" is the difference between "the page looks right today" and "the page looks right every day at 4 AM after every cron sweep." For 30-cron-jobs-deep infrastructure, surface-only changes are technical debt that bills monthly.
This is the loop's own scar tissue. Across four iters we caught and fixed roughly 70 fabricated claims that bulk-generated content had introduced. SOC 2 claims on infrastructure that did not have it. Training-corpus references for models we had never trained. Customer counts that were aspirational, not real.
The pattern of how we found them: we built an audit (audit-fakeproof.py) that swept the catalog for patterns like "trained on", "SOC 2 certified", "N+ customers", etc. The first sweep found 11 hard violations. We fixed those. The next sweep found 23 more in pages we had not regenerated. We patched generators, ran again, found 18 in FAQ subpages. Patched the FAQ generator. Found 12 in pricing pages. Patched. By iter 70 the count was at zero hard and stayed there.
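For readers who want the shape of such a sweep, here is a minimal sketch. It is not the contents of audit-fakeproof.py; the three regexes are our own guesses at how the patterns quoted above might be expressed, and the labels are hypothetical.

```python
import re
from typing import List, Tuple

# Illustrative patterns only, modeled on the phrases quoted above.
HARD_PATTERNS = {
    "training-claim": re.compile(r"\btrained on\b", re.IGNORECASE),
    "soc2-claim": re.compile(r"\bSOC\s?2\s+certified\b", re.IGNORECASE),
    "customer-count": re.compile(r"\b\d[\d,]*\+\s+customers\b", re.IGNORECASE),
}


def audit_text(text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, pattern label, line) for every hard-pattern hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in HARD_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label, line.strip()))
    return findings
```

Because the sweep runs on rendered text, it does not care which generator produced the page.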
What did this teach us? Fabrications come from generative content with no constraint surface. The bulk-gen LLM was given a product brief and asked to produce a marketing page. The LLM filled in the blanks with plausible-sounding but unverifiable claims. The fix was not to make the LLM more careful; it was to add a post-hoc audit that catches the patterns regardless of which generator produced them.
The full version has its own essay (we are now ten essays in) at /factory/playbooks/seventy-fabrications/. The summary lesson: if a generator can fabricate, an audit must catch it. Trust nobody, especially yourself.
Connected to "audit before shipping" but distinct. The audit-discovery pattern is when the act of writing documentation reveals that something is already true that you did not know.
Two examples from iter 84 alone. First: the Stripe webhook was live in the Node service, not just a Python skeleton (as covered above). Second: writing /factory/api-docs/ surfaced the fact that the factory-api service hosts 12 application endpoints, not the 5 we had been documenting. Intent, rate, event, validation, afterhours/{demo,signup} were all live and serving traffic. Nobody knew because nobody had documented them.
The lesson generalizes. Documentation is a form of audit. Writing "here is the contract" forces you to check the actual implementation. You will find divergences. You will find things that have been working but that nobody knew about. You will find things that nobody else can run, because the only person who knew the curl-incantation was you four months ago.
Concrete delivery:
| Category | What shipped |
|---|---|
| Catalog | 244 product pages, 0 broken, 0 hard fabrications, 26 hand-polished |
| Essays | 11 operator-voice essays, ~20,300 words, 11 dedicated OG cards |
| Transparency surfaces | 5: /cron-status/, /quality-report/, /api-docs/, /healthz, /log/ship-logs/ |
| Foundational pages | 8 high-trust pages, fully cross-linked |
| Infrastructure | 16+ source-fixed generators, audit-fakeproof daily cron, 75 monitored endpoints |
| API contract | 17 public endpoints documented at /api-docs/ |
Concrete cost: roughly $172 of API spend, plus the CPU cycles of running the cron and Node services on a $40-a-month VPS. So call it $200 all in for five days.
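The cost arithmetic, as a sketch. The only inputs are the figures stated above; the 30-day month used to prorate the VPS is our assumption, and `run_cost` is a hypothetical helper.

```python
def run_cost(
    api_spend_usd: float,
    vps_usd_per_month: float,
    run_days: int,
    days_per_month: int = 30,  # assumption used to prorate the VPS bill
) -> dict:
    """Prorate the VPS over the run and add it to the API spend."""
    vps_share = vps_usd_per_month * run_days / days_per_month
    total = api_spend_usd + vps_share
    return {
        "vps_share": round(vps_share, 2),
        "total": round(total, 2),
        "per_day": round(total / run_days, 2),
    }


print(run_cost(api_spend_usd=172.0, vps_usd_per_month=40.0, run_days=5))
```

The prorated total comes in under the rounded $200 figure, which leaves some headroom for costs not itemized here.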
What that money bought is not just the pages. It bought the audit infrastructure that catches drift, the source-fix discipline that makes the audit infrastructure durable, the transparency surfaces that prove the audit infrastructure works, and the API contract that lets external readers verify all of it. The catalog is now self-documenting from five angles.
Three things, in order of "regret intensity":
We had a daily audit cron from iter 69. We did not have a stable machine-readable snapshot until iter 85. For 16 iters, the audit ran and wrote a timestamped txt file that nobody read except us. /quality-report/ could have been showing those findings to buyers the whole time. Iter 85 fixed it with a 30-line patch.
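A machine-readable snapshot can be very small. This sketch is not the iter 85 patch; it assumes each finding is a dict with `severity`, `page`, and `pattern` keys, which is our invention for illustration.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, List, Optional


def build_snapshot(findings: List[Dict[str, str]], generated_at: Optional[str] = None) -> dict:
    """Summarize audit findings into a stable structure a page can render."""
    counts = Counter(f["severity"] for f in findings)
    return {
        "generated_at": generated_at or datetime.now(timezone.utc).isoformat(),
        "hard": counts.get("hard", 0),
        "soft": counts.get("soft", 0),
        "findings": sorted(findings, key=lambda f: (f["severity"], f["page"])),
    }


def to_json(snapshot: dict) -> str:
    # sort_keys keeps the file diff-friendly from one run to the next
    return json.dumps(snapshot, indent=2, sort_keys=True)
```

Writing the output to a fixed filename, rather than a timestamped one, is what lets a page like /quality-report/ read it without guessing.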
We had 84 ship logs without a navigable index until iter 85. A skeptical buyer would ask "show me your work" and have to guess the filename pattern. The index page was a one-iter build that should have happened around iter 50.
We absorbed the "grep before you build" rule the hard way in iter 75. It saved time in iters 79-84. We should have written it down in the operating rules at iter 30, when we first noticed we were duplicating effort. The rule is short. It is hard to internalize without a few wasted iters.
The provocative read: an autonomous loop can produce operator-quality output on a real project for under $50 a day, indefinitely, with no human in the iter, as long as the discipline is durable. That part is real. We did it.
The qualifier: "as long as the discipline is durable" is doing a lot of work in that sentence. It covers the audit infrastructure, the source-fix rule, the transparency surfaces, the queue grooming each iter, and the SHIP-LOG narrative discipline. Without those, an autonomous loop drifts into fabrication and stale content within roughly 20 iters. We watched it happen in iters 65-67 before we caught it.
So the honest version: autonomous studios work when the supervision is encoded in the system itself. Not in a human watching screens. Not in approval flows. In audits the loop runs against itself, transparency surfaces it has to live up to, and a queue discipline that prevents punting work to "later."
If you want to run one of these, build the audits first. Then ship features against them. The audits are the only thing that keeps the loop honest after you go to bed.
The Factory's current state: 244 product pages, 11 essays, 5 transparency surfaces, 17 public endpoints. Self-auditing daily. Self-documenting five ways. The remaining gaps are visible in the ship-log queue.
The unfinished work: Stripe email-send wiring (a Wes-blocker on env vars + SMTP creds). Validation conversion (no real paying customers yet; ten or so intent captures sitting in the inbox). Audit refinement (the 9 current soft findings could be cleared with smarter Fermi-math context detection). A bigger essay library (we are at 11; we think 25 is the right number for the catalog's audience).
None of those need a smarter loop. They need either Wes's blocker-clearing or a clearer queue prompt. Both will happen in the next session.
In the meantime, the receipts are at /factory/log/ship-logs/. Every iter. Every ship. Every audit-discovery. The loop writes the receipts itself.