Operator essay #9 · the catalog-honesty story

How we caught 70 fabrications in our own catalog.

Most credibility posts brag. This one admits. Over several weeks, our autonomous studio shipped 70+ fabricated claims across the catalog, and nothing flagged them. We found them, fixed them at the source, and built a daily audit cron so it does not happen again. Here is the full count, by category, with what it cost us to fix.

2026-05-14 · 11 min read · ~1800 words

What we shipped that was not true

The Wishdeal Factory is an autonomous studio. That means LLMs generate most of the page content, against a strict prompt that says "no fake customer counts, no fake training corpus claims, no fake compliance claims." We thought the prompt was enough. Over the course of weeks, six different silent bugs let fabrications through anyway. We did not notice until we built a specific scanner for them.

The full inventory, ordered by how we found them:

Iter 65 · 4 pages
What was fake: "Trained on signals from over 100,000 SaaS accounts" on churn-ai; "Built on patterns from 10,000 analyzed contracts" on contract-negotiation-ai.
Source: LLM bulk-gen output slipped fake-corpus claims past the prompt.

Iter 66 · 2 more pages
What was fake: the same pattern, found in a broader sweep across all 60 bulk-gen products.
Source: same as iter 65.

Iter 67 · 37 FAQ pages
What was fake: "SOC 2 Type II compliant", "78-92% precision across SaaS cohorts", "We've only lost 2 customers to churn in 18 months", "Onboarding specialist included", all on FAQ subpages.
Source: output from a deprecated FAQ generator; the current, honest generator skipped every page without its template marker.

Iter 68, ship 1 · 28 pricing pages
What was fake: "SOC 2 Type II compliant" and "$200/month for up to 5,000 customer records" on pricing subpages.
Source: the same skip-if-exists preservation pattern.

Iter 68, ship 2 · 1 product, multiple fields
What was fake: fake customer logos ("NORTHFORD CO.", "Bellweather", etc.) on audit-ai; a "78% fewer audit prep hours" pillar; a "24x7 with named CSM" support tier.
Source: original sample placeholders were never replaced with honest content.

Iter 68, ship 3 · 6 enterprise products via template
What was fake: a "Trusted by" header on the logo strip, implying we have customer logos; "SOC 2 Type II Certified" hardcoded in the procurement card.
Source: the enterprise.html template itself had hardcoded fabrications.

Iter 69 · 29 SOC 2 certification claims + 4 other fakes
What was fake: "SOC 2 Type II certified" / "SOC 2 Type II compliant" in security FAQ sections across service-business and technical products; "Join 10,000+ teams using Email Marketing AI."; "Machine learning trained on 50,000+ successful sales calls."
Source: pre-iter-58 generator output that was never audited.

Iter 70 · 2 residual fakes
What was fake: "One pilot customer found a vendor billing 12% over contract" in the supplier-ai FEATURES section (a duplicate of a claim already fixed in its hero in iter 67); "We comply with GDPR, CCPA, and SOC 2 Type II standards" on white-label-linkedin-campaign-analytics-dashboard.
Source: escaped previous passes.

Total: 70+ confirmed fabricated claims shipped across the catalog, spanning hero decks, FAQ answers, pricing-page feature lists, comparison matrices, security explainers, and template-level hardcoded copy. They had been live for weeks.

Why the bugs survived the prompt

The studio's bulk LLM generator has a strict system prompt that explicitly says "no fake customer counts" and "no training-corpus claims." That prompt was authored on day one and has been refined four times. It works most of the time. The fabrications survived for four specific reasons:

1. The LLM is creative when the prompt has a gap.

Saying "no fake customer counts" stops the LLM from writing "850+ bookkeepers." It does not stop the LLM from writing "Trained on signals from over 100,000 SaaS accounts" because the model interprets that as a TECHNICAL claim about training, not a CUSTOMER count. Iter 66 fixed this by extending the prompt with explicit named examples of disallowed patterns ("'100,000 SaaS accounts' is not allowed").

2. Different generators with different prompts produce different fabrications.

The main product page bulk-gen had one set of constraints. The FAQ subpage generator had a different set. The pricing-fallback generator had yet another. Each prompt was reasonable within its own scope, but no one had ever audited the three together.

3. "Skip if file exists" preservation hides bugs forever.

The cleanest example: the FAQ template generator runs every 30 minutes and is honest by design. But its main loop has a guard:

if os.path.exists(out) and TEMPLATE_MARKER not in existing: skipped_existing += 1; continue

Translated: if the FAQ page already exists and does not have the current template's marker, leave it alone (assume it is hand-written). This is correct logic for protecting hand-written content. It also preserves deprecated-generator output forever, because that output also lacks the marker.

The same pattern appears in the pricing-fallback generator and several other subpage generators. The fix in each case is the same: back up the existing page, then force-regenerate it with the current generator.
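A minimal Python sketch of both halves of that story, assuming a hypothetical marker string and a `render` callable standing in for the current generator. `should_skip` reproduces the guard and shows that it cannot tell a hand-written page from deprecated output; `force_regenerate` does the backup-then-overwrite fix. The `.bak` naming is also an assumption, not the Factory's convention.

```python
import os
import shutil
import tempfile

# Hypothetical marker; the essay does not show the real TEMPLATE_MARKER value.
TEMPLATE_MARKER = "<!-- faq-template:current -->"


def should_skip(out: str) -> bool:
    """The guard described above: skip a page that exists and lacks the marker.

    Hand-written pages and deprecated-generator pages both lack the marker,
    so both are skipped. That is how the deprecated output survived.
    """
    if not os.path.exists(out):
        return False
    with open(out, encoding="utf-8") as f:
        existing = f.read()
    return TEMPLATE_MARKER not in existing


def force_regenerate(out: str, render) -> str:
    """Back up the current file, then overwrite it with the current generator's output."""
    backup = out + ".bak"  # hypothetical backup naming
    if os.path.exists(out):
        shutil.copy2(out, backup)  # keep the old copy for review or rollback
    html = render()
    if TEMPLATE_MARKER not in html:
        html = TEMPLATE_MARKER + "\n" + html  # stamp it so future runs recognize it
    with open(out, "w", encoding="utf-8") as f:
        f.write(html)
    return backup


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        page = os.path.join(d, "faq.html")
        with open(page, "w", encoding="utf-8") as f:
            f.write("<p>We are SOC 2 Type II compliant.</p>")  # deprecated output
        print(should_skip(page))  # True: the old page would be preserved
        force_regenerate(page, lambda: "<p>No SOC 2 audit has been done.</p>")
        print(should_skip(page))  # False: the page now carries the marker
```

The backup matters because the guard's original purpose, protecting hand-written pages, is still legitimate: a forced regeneration should leave something a human can diff against.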

4. Hardcoded template-level fabrications affect every product using that template.

The enterprise.html template had two hardcoded fabrications: "Trusted by" as the logo-strip header (implies customer-trust relationships we do not have) and "SOC 2 Type II | Certified" in the procurement card (implies an audit we have not done). These shipped on all 6 enterprise-archetype products. Each product's placeholders were honest; the template they rendered into was not.

How we found them

The triggering question was: could a buyer find a fabrication if they read three of our pages carefully? We bet yes and tested. The first spot-check on the FAQ subpage of one bulk-gen product surfaced four fake-proof claims in under a minute. That made it a category of bug, not a one-off.

The fix-and-find loop ran like this:

Iter 69 built a reusable script (audit-fakeproof.py) that codifies this loop. It scans the catalog in 14 seconds, separates hard claims (clear fabrications) from soft claims (Fermi-math benchmark references that are probably fine), and writes a detail file so a human can spot-check each finding.
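The hard/soft split might look roughly like the following. This is a sketch, not audit-fakeproof.py itself: the rule names, regexes, 40-character snippet window, and tab-separated detail format are all assumptions, seeded from the claims in the inventory above.

```python
import re
from dataclasses import dataclass

# Illustrative rules drawn from the claims listed in this essay; the real
# audit-fakeproof.py rule set is not published, so treat these as placeholders.
HARD_PATTERNS = {
    "soc2": re.compile(r"\bSOC\s*2\b", re.I),
    "training_corpus": re.compile(r"\b(?:trained|built) on\b[^.]{0,40}?\b\d{1,3}(?:,\d{3})+\+?", re.I),
    "customer_count": re.compile(r"\bjoin\s+[\d,]+\+?\s+(?:teams|customers|companies|users)\b", re.I),
    "churn_anecdote": re.compile(r"\bonly lost \d+ customers\b", re.I),
    "precision_pct": re.compile(r"\d+(?:-\d+)?% precision", re.I),
}
# Hypothetical "soft" cues for Fermi-style benchmark language a human should glance at.
SOFT_PATTERNS = {
    "benchmark_language": re.compile(r"\b(?:industry benchmarks?|on average|typically)\b", re.I),
}


@dataclass
class Finding:
    page: str
    severity: str  # "hard" or "soft"
    rule: str
    snippet: str  # surrounding text for the human reviewer


def scan_page(page: str, text: str) -> list[Finding]:
    """Run every hard rule, then every soft rule, and collect each match with context."""
    findings = []
    for severity, rules in (("hard", HARD_PATTERNS), ("soft", SOFT_PATTERNS)):
        for rule, pattern in rules.items():
            for m in pattern.finditer(text):
                snippet = text[max(m.start() - 40, 0): m.end() + 40]
                findings.append(Finding(page, severity, rule, snippet))
    return findings


def write_detail_file(findings: list[Finding], path: str) -> None:
    """One tab-separated line per finding, so each can be spot-checked by hand."""
    with open(path, "w", encoding="utf-8") as f:
        for x in findings:
            f.write(f"{x.severity}\t{x.rule}\t{x.page}\t{x.snippet!r}\n")
```

A broad rule like `\bSOC\s*2\b` will also flag an honest sentence such as "we have not done a SOC 2 audit"; that is why the detail file exists. Patterns find candidates, and a human decides.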

Iter 70 wired the script into a daily 4:30am cron job. Any future regeneration that produces fake-proof claims now surfaces in the audit log within 24 hours.
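One way to make "surfaces within 24 hours" concrete is to diff each run against the previous one, so new violations stand out in the log instead of hiding among known ones. The sketch below assumes that diffing approach, a single stand-in pattern, a JSON state file, and placeholder paths passed on the command line; none of these details come from the Factory's actual cron.

```python
"""Sketch of a daily audit job meant to be run by cron.

Hypothetical crontab entry for a 4:30am daily run (paths are placeholders):
    30 4 * * * /usr/bin/python3 audit_daily.py /srv/catalog /srv/audit-state.json /srv/audit.log
"""
import datetime
import json
import pathlib
import re
import sys

# Stand-in for the full rule set; the real script's patterns are not published.
HARD = re.compile(r"\bSOC\s*2\b|\btrained on\b", re.I)


def scan_text(rel_path: str, text: str) -> set[str]:
    """Return 'path:line' keys for every line that matches a hard pattern."""
    return {f"{rel_path}:{n}" for n, line in enumerate(text.splitlines(), 1) if HARD.search(line)}


def report(today: str, previous: set[str], current: set[str]) -> tuple[list[str], list[str]]:
    """Diff today's findings against the last run and format log lines."""
    new = sorted(current - previous)
    lines = [f"{today}\ttotal={len(current)}\tnew={len(new)}"]
    lines += [f"{today}\tNEW\t{key}" for key in new]
    return new, lines


def run(catalog: pathlib.Path, state: pathlib.Path, log: pathlib.Path) -> list[str]:
    current = set()
    for page in sorted(catalog.rglob("*.html")):
        text = page.read_text(encoding="utf-8", errors="replace")
        current |= scan_text(str(page.relative_to(catalog)), text)
    previous = set(json.loads(state.read_text())) if state.exists() else set()
    new, lines = report(datetime.date.today().isoformat(), previous, current)
    with log.open("a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    state.write_text(json.dumps(sorted(current)))  # becomes "previous" for tomorrow
    return new


if __name__ == "__main__" and len(sys.argv) == 4:
    run(*(pathlib.Path(arg) for arg in sys.argv[1:]))
```

Keying findings by page and line is a simplification: an unrelated edit that shifts lines will show an old finding as "new". Keying by page plus matched text would be more stable.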

What this costs us to admit

This essay is not a marketing post. It hurts to publish. The catalog had 70+ fabrications live for weeks. A buyer who unlocked a $5 dossier in that period and trusted the SOC 2 claim was lied to by our automation. The honest read is that we shipped a content farm for a while without noticing.

The argument for publishing it anyway: the only way an autonomous studio earns long-term trust is by being transparent about the failures, not just the wins. The honest page already says "the median idea has zero customers, zero validated demand signal, and zero revenue." This essay extends that admission into the specific bugs the system produced and what we did to clear them.

What a buyer should take from this: the studio shipped fabrications. The studio also caught and removed them, then built a daily audit so it does not happen again. The pattern that matters is not "did they ship a bug" (everyone does); it is "did they find and fix it durably."

If you read this and want to spot-check us, run the audit yourself. Pick any 5 product pages from /factory/catalog/. Search them for "SOC 2", "trained on", "customers in 18 months", "% precision". Email wes@wishdeal.com if you find anything we missed. We will fix it the same day and add the pattern to the audit script.
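If you would rather script the spot-check than use your browser's find, something like this fetches pages and runs a case-insensitive substring search for the four phrases above. It takes whatever product-page URLs you pass; no catalog URL is assumed, and the script name is arbitrary.

```python
import sys
import urllib.request

# The four phrases named above; extend the tuple with anything else you want to check.
TERMS = ("SOC 2", "trained on", "customers in 18 months", "% precision")


def spot_check(text: str) -> list[str]:
    """Return the terms that appear in the page text, ignoring case."""
    lowered = text.lower()
    return [t for t in TERMS if t.lower() in lowered]


def fetch(url: str) -> str:
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    for url in sys.argv[1:]:  # pass the product-page URLs you picked
        hits = spot_check(fetch(url))
        print(url, "->", hits or "no matches")
```

Usage: `python3 spot_check.py URL1 URL2 ...`. Literal matching misses rephrasings. For example, "customers in 18 months" does not match the "lost 2 customers to churn in 18 months" wording from the inventory, so add looser terms such as "in 18 months" if you want wider coverage.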

The general lesson for any autonomous content system

If you are running an LLM-driven content pipeline at any scale, four practices fall out of the iter 67-70 arc:

One: prompt-level constraints leak, because the LLM finds creative phrasings that route around them. Add a list of named examples of disallowed patterns to your prompt and refresh it quarterly. Every new fabrication you catch should be added to that list so the prompt sees it.
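A minimal sketch of that practice, assuming the generator builds its system prompt from a base rule string plus a list of caught examples. The list contents are fabrications named in this essay; the function names and prompt wording are hypothetical.

```python
# Hypothetical seed list, taken from fabrications described in this essay.
DISALLOWED_EXAMPLES = [
    "Trained on signals from over 100,000 SaaS accounts",
    "Built on patterns from 10,000 analyzed contracts",
    "SOC 2 Type II compliant",
    "Join 10,000+ teams using Email Marketing AI.",
]

BASE_RULES = (
    "No fake customer counts. No fake training-corpus claims. "
    "No fake compliance claims."
)


def build_system_prompt(base_rules: str, examples: list[str]) -> str:
    """Append named disallowed examples after the general rules."""
    lines = [base_rules, "", "Never write these sentences or rephrasings of them:"]
    lines += [f'- "{e}" is not allowed.' for e in examples]
    return "\n".join(lines)


def add_caught_fabrication(examples: list[str], caught: str) -> list[str]:
    """Return a new list with a newly caught fabrication appended once."""
    caught = caught.strip()
    if caught and caught not in examples:
        return examples + [caught]
    return examples
```

Keeping the examples as data rather than prose buried in the prompt makes the quarterly refresh a list review, and lets the same list feed the post-generation audit.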

Two: "skip if file exists" is the most dangerous logic in a generator that is supposed to replace deprecated content. Add a content-version marker to your generated output and check for the marker, not just for the file's presence. Any file without your current marker is suspect.
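A sketch of a version-aware check, assuming a hypothetical `<!-- gen:KIND vN -->` comment stamped into every generated page. It separates three cases the plain existence check collapses into one: current output, stale output from an older version of the same generator (safe to back up and regenerate), and unmarked files, which are the suspects to review by hand.

```python
import re

CURRENT_VERSION = 3  # hypothetical: bump whenever the generator's prompt or template changes
MARKER_RE = re.compile(r"<!--\s*gen:(?P<kind>[\w-]+)\s+v(?P<version>\d+)\s*-->")


def marker(kind: str, version: int = CURRENT_VERSION) -> str:
    """The stamp a generator of this kind writes into its output."""
    return f"<!-- gen:{kind} v{version} -->"


def classify(html: str, kind: str, current: int = CURRENT_VERSION) -> str:
    m = MARKER_RE.search(html)
    if m is None or m.group("kind") != kind:
        return "unmarked"  # suspect: hand-written or deprecated output; review it
    if int(m.group("version")) < current:
        return "stale"  # our own older output: back up and regenerate
    return "current"
```

Only "unmarked" needs a human; "stale" can be regenerated automatically because the marker proves this generator wrote it.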

Three: template-level fabrications affect every product using the template. Audit your templates separately. Hardcoded strings like "Trusted by" or "SOC 2 Type II Certified" can ship across hundreds of pages without anyone noticing because they look like layout, not content.
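Auditing a template separately can be as simple as blanking out its placeholders and searching what is left, since anything that survives ships verbatim on every product. This sketch assumes Jinja-style `{{ }}`, `{% %}`, and `{# #}` placeholders and a hypothetical watch list seeded from the enterprise.html findings; it reports line numbers for a human to review rather than deciding on its own.

```python
import re

# Assumes Jinja-style placeholders; adjust for your template engine.
PLACEHOLDER_RE = re.compile(r"\{\{.*?\}\}|\{%.*?%\}|\{#.*?#\}", re.S)
# Hypothetical watch list seeded from the enterprise.html findings above.
RISKY_LITERALS = ("trusted by", "soc 2", "certified", "compliant", "named csm")


def hardcoded_claims(template_src: str) -> list[tuple[int, str]]:
    """Return (line_number, phrase) for risky phrases in the template's literal text.

    Placeholders are blanked out first, so anything reported here ships
    identically on every product that renders this template.
    """
    # Keep each placeholder's newlines so reported line numbers stay accurate.
    literal = PLACEHOLDER_RE.sub(lambda m: " " + "\n" * m.group(0).count("\n"), template_src)
    hits = []
    for n, line in enumerate(literal.splitlines(), 1):
        low = line.lower()
        hits += [(n, phrase) for phrase in RISKY_LITERALS if phrase in low]
    return hits
```

Per-product placeholder values still need the page-level scan; this check covers only the copy that looks like layout.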

Four: post-generation auditing should be a default phase of any LLM bulk-gen pipeline. Generate, audit, deploy. Not generate, deploy, hope. The audit step is cheap; the cost of shipping fabrications is your reputation.
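A sketch of the generate, audit, deploy ordering as a gate, with the three stages passed in as plain callables (hypothetical stand-ins for whatever your pipeline uses). This version holds back only the pages with findings and deploys the rest; a stricter variant would block the whole batch if any page fails.

```python
from typing import Callable

Pages = dict[str, str]  # path -> rendered HTML


def run_pipeline(
    generate: Callable[[], Pages],
    audit: Callable[[str], list[str]],
    deploy: Callable[[Pages], None],
) -> dict[str, list[str]]:
    """Generate, audit, deploy: pages with findings never reach deploy()."""
    pages = generate()
    blocked = {}
    for path, html in pages.items():
        findings = audit(html)
        if findings:
            blocked[path] = findings  # held back for a human to review
    clean = {path: html for path, html in pages.items() if path not in blocked}
    if clean:
        deploy(clean)
    return blocked
```

The returned `blocked` map is the review queue. Logging it each run gives the same paper trail the daily cron provides, but before anything ships.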

The Factory now does all four. The 4:30am audit cron is the structural commitment.

What to look for next

The current audit covers the patterns we have already caught. The next class of bugs is the one we have not yet looked for. Likely candidates:

Iter 71's plan includes scanning for these classes. The audit script will get new patterns as we identify them. The cron will catch new violations as the generators evolve.

If you spot a class we missed, email it. That is the most useful feedback you can send.
