The Blog That Writes Itself: A Full Teardown
No approval queue. No human in the loop. Here's how it holds together.
The Lab runs itself now.
Every day, a pipeline scans the day’s AI sources, groups them into stories, throws out anything it has already covered, writes original posts with a point of view, scores each one against a quality bar, makes its own art, and publishes 8 to 10 posts. No human in the loop. It runs on a schedule and pings me on Telegram when it’s done.
I rebuilt the whole thing from scratch by directing an AI coding agent. This post is the full teardown: the architecture, the models, the calls I made, and the parts that broke.
Where it started
The old version was an n8n workflow. It turned curated YouTube transcripts into auto-published posts. About 3 a day, all in the AI-marketing lane.
It worked. But it was narrow. One source type. One topic area. Each post leaned on a single transcript, so it could only ever be as good as that one video.
The brief for v2
I wanted more range and a real spine:
Broaden the scope from AI-marketing to all of AI. Models, research, builder tools, agents, products, applied workflows, policy, culture.
Go from 3 posts a day to 10.
Pull from more sources, and more kinds of sources, not just YouTube.
Write with an actual point of view, synthesized across multiple sources, in an operator voice.
Stay autonomous. Auto-publish with spot-checks, no approval queue.
I was open to leaving n8n behind. I did. v2 is a TypeScript pipeline that runs on a daily schedule in GitHub Actions.
How it works now
Each daily run moves through ten stages, end to end, with nobody watching:
1. Ingest. Pull the day’s items from every source.
2. Embed. Turn each item into a vector. Basically a string of numbers that captures what the item is about.
3. Cluster. Group items that cover the same story.
4. Dedup. Compare each cluster against a memory of everything already written, and drop the repeats.
5. Rank. Score the surviving stories by signal.
6. Select tiers. Pick the 2 to 3 biggest as flagships, the next several as notes.
7. Synthesize. Write each post from its cluster of sources.
8. Gate. Score the draft against a quality bar. Reject what doesn’t clear it.
9. Images. Generate a hero plus inline visuals.
10. Publish. Commit the posts, trigger the deploy, send a Telegram digest.
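To make the rank and select-tiers stages concrete, here is a minimal sketch of how that split could look. It is an illustration, not the pipeline's actual code: the `Story` shape, the `signal` field, and the default counts (3 flagships, 7 notes) are all assumptions chosen to land near the 8 to 10 posts a day mentioned above.

```typescript
// Hypothetical shapes: the real types and scoring formula are not shown in this post.
interface Story {
  id: string;
  signal: number; // assumed: higher means more newsworthy
}

interface Tiers {
  flagships: Story[];
  notes: Story[];
}

// Sort by signal (descending), then slice the top of the list into tiers.
// flagshipCount and noteCount are assumed defaults, not the pipeline's real values.
function selectTiers(
  stories: Story[],
  flagshipCount = 3,
  noteCount = 7,
): Tiers {
  // Copy before sorting so the caller's array is left untouched.
  const ranked = [...stories].sort((a, b) => b.signal - a.signal);
  return {
    flagships: ranked.slice(0, flagshipCount),
    notes: ranked.slice(flagshipCount, flagshipCount + noteCount),
  };
}
```

Anything that falls below the notes cutoff is simply not written that day, which keeps the expensive synthesis calls bounded.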
The quiet hero is dedup
If I had to point at one stage that makes the whole thing work, it’s step 4.
It uses a Postgres database with pgvector as a memory of everything the blog has ever covered. Plain version: every published story gets turned into that string of numbers and filed away. The next day, anything too close to something already written gets dropped before it ever costs a writing call.
That’s what stops the blog from repeating itself. A site publishing 10 posts a day will circle the same story five times in a week if nothing is watching. The memory watches. It also saves money, because a repeat dies before the expensive synthesis step ever runs.
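The core of that check fits in a few lines. The sketch below does the comparison in memory so it runs on its own; in a Postgres setup the same comparison would more likely happen in a query, since pgvector provides a cosine distance operator (`<=>`). The `Cluster` shape, the `dropRepeats` name, and the 0.85 threshold are assumptions for illustration, not the values the pipeline uses.

```typescript
// Cosine similarity between two equal-length vectors (1 = same direction, 0 = unrelated).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical shape for a clustered story and its embedding.
interface Cluster {
  id: string;
  embedding: number[];
}

// Keep only clusters that are not too close to anything already published.
// The 0.85 threshold is an assumed placeholder; the real cutoff is not given here.
function dropRepeats(
  candidates: Cluster[],
  memory: number[][],
  threshold = 0.85,
): Cluster[] {
  return candidates.filter((c) =>
    memory.every((m) => cosineSimilarity(c.embedding, m) < threshold),
  );
}
```

The threshold is the knob that matters. Set it too high and near-duplicates slip through; set it too low and a genuine follow-up to an old story gets dropped as a repeat.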
Images, and the part I got wrong first
Every post gets at least two images. A hero, which doubles as the social card, and inline visuals. The writer drops a placeholder wherever a diagram or chart would help, and the image stage fills it.
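A rough sketch of the placeholder-filling step, under stated assumptions: the `{{image: ...}}` syntax is made up for this example (the real placeholder format isn't shown here), and `render` is a stand-in for the image stage that returns a file path for a given description.

```typescript
// Hypothetical placeholder syntax: {{image: short description}}.
const PLACEHOLDER = /\{\{image:\s*([^}]+?)\s*\}\}/g;

// Replace each placeholder in the post body with a Markdown image tag.
// `render` is an assumed callback: it receives the description and the
// placeholder's position in the post, and returns the image path.
function fillImagePlaceholders(
  body: string,
  render: (description: string, index: number) => string,
): string {
  let index = 0;
  return body.replace(PLACEHOLDER, (_match, description: string) => {
    const path = render(description, index++);
    return `![${description}](${path})`;
  });
}

// Usage: number the inline images in the order they appear.
const draft = "Intro text.\n{{image: latency chart}}\nMore text.";
const filled = fillImagePlaceholders(draft, (_desc, i) => `/img/inline-${i}.png`);
```

The callback is synchronous to keep the example short. A real image call is asynchronous, so an actual implementation would likely collect the placeholders first, generate the images, then substitute in a second pass.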
The first version made every hero the same abstract, warm-toned shape. Looked fine on a single post. On the index page, every story looked identical. Just rows of the same blob.
So now there are five art directions: editorial illustration, flat vector, isometric risograph, cinematic still, and Bauhaus geometric. Each post gets one, chosen by hashing its slug. A given post always looks the same, but the feed looks varied. Each direction carries its own palette, freed from the site’s color scheme, and each image is grounded in the post’s actual subject.
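Picking a direction by hashing the slug can be sketched like this. The five direction names come from the list above; the hash function (32-bit FNV-1a) and the function names are assumptions, since any stable string hash would give the same property.

```typescript
const ART_DIRECTIONS = [
  "editorial illustration",
  "flat vector",
  "isometric risograph",
  "cinematic still",
  "Bauhaus geometric",
] as const;

type ArtDirection = (typeof ART_DIRECTIONS)[number];

// 32-bit FNV-1a over UTF-16 code units (identical to byte-wise FNV-1a for ASCII slugs).
// An assumed choice of hash, used here only because it is short and stable.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // convert to an unsigned 32-bit integer
}

// Same slug in, same direction out, on every run.
function artDirectionFor(slug: string): ArtDirection {
  return ART_DIRECTIONS[fnv1a(slug) % ART_DIRECTIONS.length];
}
```

Hashing instead of picking at random is what makes a regenerated post keep its look. The trade-off is that nothing stops two adjacent posts in the feed from landing on the same direction.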
One hard rule in the prompt: no text. The image model will cheerfully render any words you hand it, and it renders them badly. Ban the text up front and the problem goes away.
The stack
Site: Astro, Tailwind, MDX, deployed on Vercel
Pipeline: TypeScript on Node, run with tsx
Orchestration: GitHub Actions, daily cron, gated by a repo flag
Memory and dedup: Postgres + pgvector on Supabase
Flagship writing: Claude Opus 4.8
Notes writing: GPT-5.5
Quality gate: GPT-5.5
Embeddings: text-embedding-3-small
Images: gpt-image-1
Digest: Telegram bot
Sources span RSS feeds, YouTube, arXiv, GitHub releases, Hacker News, and Reddit. The point was variety of input, not one big firehose.
What broke
The architecture is the boring part. The interesting part is always where it broke, and there was plenty.
Two I already mentioned belong here. The identical-hero problem was a real one: the system did exactly what I asked and the result looked broken, because “make a nice hero image” and “make a feed that doesn’t look like a copy-paste” are two different jobs. And the image model rendering garbled text taught me that you don’t ask a model not to do something; you remove the option.
What I’d tell another operator
A messy process doesn’t get better because you added AI. It gets faster at being messy. The thing that made v2 work wasn’t a smarter writer. It was the boring infrastructure around the writer: a memory so it doesn’t repeat, and a gate so it doesn’t ship junk.
If you’re building something that’s supposed to run without you, build those two things first. The autonomy isn’t in the model. It’s in the guardrails.
Next up, I’m watching the gate. I want to know how often it rejects a draft that I’d have actually shipped, and how often it passes one I wouldn’t. That’s the number that tells me whether the quality bar is set right or just set.
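Here is a minimal sketch of how those two numbers could be computed from spot-checks. It assumes something not covered above: that rejected drafts are kept around so a human can review them too. The `SpotCheck` shape and the `calibrate` name are made up for the example.

```typescript
// One spot-checked draft: what the gate decided, and what a human would have done.
interface SpotCheck {
  gatePassed: boolean;
  humanWouldShip: boolean;
}

interface GateCalibration {
  falseRejectRate: number; // share of shippable drafts the gate rejected
  falsePassRate: number; // share of unshippable drafts the gate passed
}

function calibrate(checks: SpotCheck[]): GateCalibration {
  const shippable = checks.filter((c) => c.humanWouldShip);
  const unshippable = checks.filter((c) => !c.humanWouldShip);
  const falseRejects = shippable.filter((c) => !c.gatePassed).length;
  const falsePasses = unshippable.filter((c) => c.gatePassed).length;
  return {
    // Guard against division by zero when a group is empty.
    falseRejectRate: shippable.length ? falseRejects / shippable.length : 0,
    falsePassRate: unshippable.length ? falsePasses / unshippable.length : 0,
  };
}
```

A high false-reject rate means the bar is costing good posts; a high false-pass rate means it is letting junk through. Both rates need a decent sample of reviewed drafts before they mean anything.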


