Most credibility posts brag. This one admits. The autonomous studio shipped 70+ silent fabricated claims across the catalog over weeks. We found them, fixed them at source, and built a daily audit cron so it does not happen again. Here is the full count, by category, with what it cost us to fix.
The Wishdeal Factory is an autonomous studio. That means LLMs generate most of the page content, against a strict prompt that says "no fake customer counts, no fake training corpus claims, no fake compliance claims." We thought the prompt was enough. Over the course of weeks, six different silent bugs let fabrications through anyway. We did not notice until we built a specific scanner for them.
The full inventory, ordered by how we found them:
| Found in iter | What was fake | Scope | Source |
|---|---|---|---|
| 65 | "Trained on signals from over 100,000 SaaS accounts" on churn-ai. "Built on patterns from 10,000 analyzed contracts" on contract-negotiation-ai. | 4 pages | LLM bulk-gen output slipped fake-corpus claims past prompt |
| 66 | Same pattern, broader sweep across all 60 bulk-gen products | 2 more pages | Same |
| 67 | "SOC 2 Type II compliant", "78-92% precision across SaaS cohorts", "We've only lost 2 customers to churn in 18 months", "Onboarding specialist included" - all on FAQ subpages | 37 FAQ pages | Deprecated FAQ generator's output; current honest generator skipped pages without its template marker |
| 68 ship 1 | "SOC 2 Type II compliant" + "$200/month for up to 5,000 customer records" on pricing subpages | 28 pricing pages | Same skip-if-exists preservation pattern |
| 68 ship 2 | Fake "NORTHFORD CO.", "Bellweather", etc. customer logos on audit-ai. "78% fewer audit prep hours" pillar. "24x7 with named CSM" support tier. | 1 product, multiple fields | Original sample placeholders never replaced with honest content |
| 68 ship 3 | "Trusted by" header on logo strip implying we have customer logos. "SOC 2 Type II Certified" hardcoded in the procurement card. | 6 enterprise products via template | enterprise.html template had hardcoded fabrications |
| 69 | "SOC 2 Type II certified" / "SOC 2 Type II compliant" in security FAQ sections across service-business and technical products. "Join 10,000+ teams using Email Marketing AI." "Machine learning trained on 50,000+ successful sales calls." | 29 SOC 2 cert claims + 4 other fakes | Pre-iter-58 generator output, never audited |
| 70 | "One pilot customer found a vendor billing 12% over contract" on supplier-ai FEATURES (duplicate of fix in iter 67 hero). "We comply with GDPR, CCPA, and SOC 2 Type II standards" on white-label-linkedin-campaign-analytics-dashboard. | 2 residual fakes | Escaped previous passes |
Total: 70+ confirmed fabricated claims shipped across the catalog, spanning hero decks, FAQ answers, pricing-page feature lists, comparison matrices, security explainers, and template-level hardcoded copy. They had been live for weeks.
The studio's bulk LLM generator has a strict system prompt that explicitly says "no fake customer counts" and "no training-corpus claims." That prompt was authored on day one and has been refined four times. It works most of the time. The fabrications survived for four specific reasons:
Saying "no fake customer counts" stops the LLM from writing "850+ bookkeepers." It does not stop the LLM from writing "Trained on signals from over 100,000 SaaS accounts" because the model interprets that as a TECHNICAL claim about training, not a CUSTOMER count. Iter 66 fixed this by extending the prompt with explicit named examples of disallowed patterns ("'100,000 SaaS accounts' is not allowed").
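A minimal sketch of what "named examples in the prompt" could look like. The function name, the list name, and the wording of the instruction line are assumptions for illustration; the example strings are ones quoted in the inventory above, and the base rule is the prompt text quoted earlier.

```python
# Base rule text as quoted in this essay.
BASE_RULES = (
    "No fake customer counts, no fake training corpus claims, "
    "no fake compliance claims."
)

# Hypothetical list name. Each fabrication caught by an audit gets appended.
DISALLOWED_EXAMPLES = [
    "Trained on signals from over 100,000 SaaS accounts",
    "Built on patterns from 10,000 analyzed contracts",
    "Join 10,000+ teams using Email Marketing AI",
]


def build_system_prompt(base_rules, examples):
    """Append verbatim examples of disallowed phrasing to the base rules."""
    lines = [
        base_rules,
        "",
        "The following phrasings are NOT allowed, nor is anything like them:",
    ]
    lines += [f'- "{example}"' for example in examples]
    return "\n".join(lines)


prompt = build_system_prompt(BASE_RULES, DISALLOWED_EXAMPLES)
```

Keeping the examples in a list rather than inline in the prompt text makes it a one-line change to add the next pattern an audit turns up.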
The main product page bulk-gen had one set of constraints. The FAQ subpage generator had a different set. The pricing-fallback generator had yet another. Each was honest within its own scope. None had been audited together.
The cleanest example: the FAQ template generator runs every 30 minutes and is honest by design. But its main loop has a guard:
    if os.path.exists(out) and TEMPLATE_MARKER not in existing:
        skipped_existing += 1
        continue
Translated: if the FAQ page already exists and does not have the current template's marker, leave it alone (assume it is hand-written). This is correct logic for protecting hand-written content. It also preserves deprecated-generator output forever, because that output also lacks the marker.
The same pattern appears in pricing-fallback and several other subpage generators. The fix is the same in each case: back up the existing file, then force-regenerate it with the current generator.
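The backup-then-regenerate step might look roughly like the sketch below. It is not the Factory's actual code: `force_regenerate`, the `render` callable, the backup naming scheme, and the marker string are all hypothetical, and the sketch assumes the output directory already exists.

```python
import os
import shutil
import time

# Hypothetical marker value; the real generator's marker is not shown here.
TEMPLATE_MARKER = "<!-- faq-template:current -->"


def force_regenerate(out, render, backup_dir):
    """Back up any existing file at `out`, then overwrite it with fresh output.

    `render` is a hypothetical zero-argument callable that returns page
    content from the current generator (including TEMPLATE_MARKER).
    Returns the backup path, or None if there was nothing to back up.
    """
    backup_path = None
    if os.path.exists(out):
        os.makedirs(backup_dir, exist_ok=True)
        stamp = time.strftime("%Y%m%dT%H%M%S")
        backup_name = f"{os.path.basename(out)}.{stamp}.bak"
        backup_path = os.path.join(backup_dir, backup_name)
        shutil.copy2(out, backup_path)  # keep the old page recoverable
    with open(out, "w", encoding="utf-8") as fh:
        fh.write(render())
    return backup_path
```

The backup matters because the skip guard cannot tell hand-written pages from deprecated-generator output; if a forced regeneration overwrites something a human wrote, the old copy is still on disk.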
The enterprise.html template had two hardcoded fabrications: "Trusted by" as the logo-strip header (implies customer-trust relationships we do not have) and "SOC 2 Type II | Certified" in the procurement card (implies an audit we have not done). These shipped on all 6 enterprise-archetype products. Each product's placeholders were honest; the template they rendered into was not.
The triggering question was: could a buyer find a fabrication if they read three of our pages carefully? We bet yes and tested. The first spot-check on the FAQ subpage of one bulk-gen product surfaced four fake-proof claims in under a minute. That made it a category-of-bug problem, not a one-off.
The fix-and-find loop ran like this:
Iter 69 built a reusable script (audit-fakeproof.py) that codifies this loop. It scans the catalog in 14 seconds. It separates hard findings (clear fabrications) from soft ones (Fermi-math benchmark references that are probably fine). It writes a detail file so a human can spot-check each finding.
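For readers who want to build something similar, here is a rough sketch of a hard/soft scanner. It is not the contents of audit-fakeproof.py: the pattern lists, function names, and the `*.html` glob are assumptions, and the soft pattern in particular is a guess at what a "benchmark reference" might look like.

```python
import re
from pathlib import Path

# Illustrative patterns only, drawn from the claims listed in this essay.
HARD_PATTERNS = [
    re.compile(r"SOC 2 Type II", re.I),
    re.compile(r"trained on [^.]*\d[\d,]*\+?", re.I),
    re.compile(r"join [\d,]+\+? teams", re.I),
    re.compile(r"trusted by", re.I),
]
# Hypothetical soft pattern: generic benchmark language worth a human look.
SOFT_PATTERNS = [
    re.compile(r"\bindustry (?:average|benchmark)s?", re.I),
]


def scan_text(text):
    """Return (severity, line_number, matched_text) for every pattern hit."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for severity, patterns in (("hard", HARD_PATTERNS), ("soft", SOFT_PATTERNS)):
            for pattern in patterns:
                match = pattern.search(line)
                if match:
                    findings.append((severity, line_no, match.group(0)))
    return findings


def scan_catalog(root):
    """Scan every .html file under `root`; return {path: findings} for hits."""
    results = {}
    for page in sorted(Path(root).rglob("*.html")):
        hits = scan_text(page.read_text(encoding="utf-8", errors="replace"))
        if hits:
            results[str(page)] = hits
    return results
```

Returning the line number and the matched text is what makes a detail file useful: a reviewer can jump straight to each finding instead of re-reading the page.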
Iter 70 wired the script into a daily 4:30am cron. Any future regeneration that produces fake-proof now surfaces in the audit log within 24 hours.
This essay is not a marketing post. It hurts to publish. The catalog had 70+ fabrications live for weeks. A buyer who unlocked a $5 dossier in that period and trusted the SOC 2 claim was lied to by our automation. The honest read is that we shipped a content farm for a while without noticing.
The argument for publishing it anyway: the only way an autonomous studio earns long-term trust is by being transparent about the failures, not just the wins. The honest page already says "the median idea has zero customers, zero validated demand signal, and zero revenue." This essay extends that admission into the specific bugs the system produced and what we did to clear them.
What a buyer should take from this: the studio shipped fabrications. The studio also caught and removed them, then built a daily audit so it does not happen again. The pattern that matters is not "did they ship a bug" (everyone does); it is "did they find and fix it durably."
If you read this and want to spot-check us, run the audit yourself. Pick any 5 product pages from /factory/catalog/. Search them for "SOC 2", "trained on", "customers in 18 months", "% precision". Email wes@wishdeal.com if you find anything we missed. We will fix it the same day and add the pattern to the audit script.
If you are running an LLM-driven content pipeline at any scale, four practices fall out of the iter 65-70 arc:
One: prompt-level constraints leak, because the LLM finds phrasings that route around them. Add a list of named examples of disallowed patterns to your prompt and refresh it quarterly. Add each new fabrication you catch to that list.
Two: "skip if file exists" is the most dangerous logic in a generator that is supposed to be replacing deprecated content. Add a content-version marker to your generated output and check the marker, not just whether the file exists. Any file without your current marker is suspect.
Three: template-level fabrications affect every product using the template. Audit your templates separately. Hardcoded strings like "Trusted by" or "SOC 2 Type II Certified" can ship across hundreds of pages without anyone noticing because they look like layout, not content.
Four: post-generation auditing should be a default phase of any LLM bulk-gen pipeline. Generate, audit, deploy. Not generate, deploy, hope. The audit step is cheap; the cost of shipping fabrications is your reputation.
The Factory now does all four. The 4:30am audit cron is the structural commitment.
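As an illustration of "generate, audit, deploy", a pre-deploy gate could be as small as the sketch below. This is one possible shape, not a description of the Factory's pipeline; `publish`, `audit`, and `deploy` are hypothetical names, and the audit and deploy steps are passed in as callables so the gate itself stays independent of any particular scanner or host.

```python
def publish(pages, audit, deploy):
    """Deploy only the pages whose audit comes back clean.

    pages:  dict mapping page name -> generated content
    audit:  callable(content) -> list of findings (empty list means clean)
    deploy: callable(name, content) that ships one page
    Returns (deployed_names, blocked), where blocked maps name -> findings.
    """
    deployed = []
    blocked = {}
    for name, content in pages.items():
        findings = audit(content)
        if findings:
            blocked[name] = findings  # hold the page back for human review
            continue
        deploy(name, content)
        deployed.append(name)
    return deployed, blocked
```

A gate like this complements a scheduled audit rather than replacing it: the gate stops new fabrications before they ship, while the scheduled scan catches pages that predate the gate or were written by another generator.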
The current audit covers the patterns we have already caught. The next class of bugs is the one we have not yet looked for. Likely candidates:
Iter 71's plan includes scanning for these classes. The audit script will get new patterns as we identify them. The cron will catch new violations as the generators evolve.
If you spot a class we missed, email it. That is the most useful feedback you can send.