I Had to Rebuild Everything. Here’s Why That’s a Good Thing.
If you’ve been following, you probably noticed things got quiet on the user side for a while.
I wasn't gone. I was deep in the code, rebuilding everything from the ground up. Because the truth is, the original version wasn't broken. It was just shallow.
Raw Transcripts Are Not Intelligence
Like most podcast AI projects out there, I started with just transcripts. Clean enough, sure. You get every word spoken. But that’s it — a giant wall of text.
That’s what most people stop at:
“Hey, look, you can search the transcript!”
But if you’ve ever tried to find a specific moment or idea that changed your thinking… you know how useless raw text can be. You’re still guessing, still skimming, still hoping to stumble across something valuable.
Words ≠ knowledge.
That’s when it clicked:
Raw transcripts aren't enough if we’re serious about LLMs and AI search.
We need enrichment. We need structure. We need context, segmentation, speaker identity, topics, timelines, sentiment, entities… all the invisible threads that make ideas searchable at a higher level.
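To make that concrete, here is one way an enriched chunk could be modeled. This is a hypothetical sketch; the field names and value types are my guesses, not the project's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical shape of one enriched transcript chunk.
# Field names are illustrative, not the project's real schema.
@dataclass
class EnrichedChunk:
    episode_id: str
    text: str                 # the raw transcript span
    start_sec: float          # timeline position in the audio
    end_sec: float
    speaker: str              # speaker identity from diarization
    topics: list[str] = field(default_factory=list)
    entities: list[str] = field(default_factory=list)  # people, companies, concepts
    sentiment: float = 0.0    # e.g. -1.0 (negative) to 1.0 (positive)

chunk = EnrichedChunk(
    episode_id="ep-001",
    text="We hired our first closer before we had product-market fit.",
    start_sec=412.0,
    end_sec=419.5,
    speaker="Alex Hormozi",
    topics=["hiring", "sales"],
    entities=["product-market fit"],
    sentiment=0.2,
)
```

Each of those fields is an "invisible thread": once a chunk carries a speaker, a timestamp, topics, and entities, you can search along any of those axes instead of grepping a wall of text.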
That’s what I’m building now.
I Tore It Down to Build It Right
The old version was a prototype, one messy Python file that dumped transcripts into a single Supabase table. It had no structure, relationships, or context.
So I paused it and started over.
Now, it’s a proper pipeline:
- Modular codebase
- Status tracking and rerun logic
- Robust error handling
- Fully automated deployment
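The status tracking and rerun logic in that list can be sketched roughly like this. The status names and flow are hypothetical, a minimal illustration of the idea rather than the actual implementation:

```python
# Minimal sketch of per-episode status tracking with rerun logic.
# Status names ("pending"/"done"/"failed") are illustrative.
PENDING, DONE, FAILED = "pending", "done", "failed"

def process_pending(episodes: dict[str, str], process) -> dict[str, str]:
    """Run `process` on every episode not yet done; failures stay retryable."""
    for episode_id, status in episodes.items():
        if status == DONE:
            continue  # idempotent: finished episodes are skipped on rerun
        try:
            process(episode_id)
            episodes[episode_id] = DONE
        except Exception:
            episodes[episode_id] = FAILED  # a later rerun picks these up again
    return episodes

statuses = {"ep-001": "done", "ep-002": "pending", "ep-003": "failed"}
statuses = process_pending(statuses, lambda ep: None)  # stand-in processor
```

The key property is idempotence: you can rerun the whole pipeline at any time and only the pending or failed episodes get touched.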
The backend runs locally, enriching Alex Hormozi’s RSS feed with Deepgram and OpenAI, chunking the content into meaningful units, and populating four normalized tables in Supabase that represent how knowledge flows.
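I can illustrate what "four normalized tables that represent how knowledge flows" might look like with an in-memory sketch. The table and column names here are my guesses, not the real Supabase schema:

```python
# Hypothetical four-table layout, modeled as in-memory rows.
# Table and column names are illustrative guesses, not the real schema.
tables: dict[str, list[dict]] = {
    "episodes": [],   # one row per RSS item
    "segments": [],   # speaker turns within an episode
    "chunks": [],     # meaningful units carved from segments
    "entities": [],   # people/companies/concepts linked back to chunks
}

def ingest(episode_id: str, transcript: list[dict]) -> None:
    """Fan one transcribed episode out across the normalized tables."""
    tables["episodes"].append({"id": episode_id})
    for i, turn in enumerate(transcript):
        seg_id = f"{episode_id}-seg{i}"
        tables["segments"].append(
            {"id": seg_id, "episode_id": episode_id, "speaker": turn["speaker"]}
        )
        chunk_id = f"{seg_id}-c0"
        tables["chunks"].append(
            {"id": chunk_id, "segment_id": seg_id, "text": turn["text"]}
        )
        for name in turn.get("entities", []):
            tables["entities"].append({"name": name, "chunk_id": chunk_id})

ingest("ep-001", [
    {"speaker": "Alex Hormozi",
     "text": "Brand is a promise you keep repeatedly.",
     "entities": ["brand"]},
])
```

The point of the normalization is the foreign keys: an entity points at a chunk, a chunk at a segment, a segment at an episode, which is exactly what turns flat transcripts into a graph you can traverse.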
This isn’t a transcript app.
It’s a data engine that creates a knowledge graph from audio.
New… but a Lot More Work Ahead
Here’s the current state:
- The data asset is growing: 28 episodes fully processed, creating 6,608 text chunks and discovering 1,848 unique entities. The factory is running! 🚀
- But the real engine hasn’t been unleashed yet
Why? Because everything has to be reprocessed. The enrichment pipeline is new, and none of the old episodes went through it.
So, for now, it’s a bit of a time capsule. A foundation. A placeholder.
But it works — and what’s coming next is far beyond keyword search.
I'm building the infrastructure for deep, intelligent recall.
The kind where you can ask:
“When did Alex discuss hiring a sales team before product-market fit?”
“How has his view of ‘brand’ evolved?”
“Show me every time he compared paid vs organic acquisition.”
That’s what LLMs should be for.
Not just repeating words, but helping you understand their meaning over time.
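With enriched structure in place, questions like those largely reduce to filters over the tables. A hypothetical sketch for "every time he compared paid vs organic acquisition", with made-up field names and data:

```python
# Hypothetical query over enriched chunks. Data and field names are
# illustrative, not real pipeline output.
chunks = [
    {"episode": "ep-012", "start_sec": 301.0, "topics": ["acquisition"],
     "entities": ["paid ads", "organic content"]},
    {"episode": "ep-019", "start_sec": 95.5, "topics": ["hiring"],
     "entities": ["sales team"]},
]

def find_mentions(chunks: list[dict], required_entities: list[str]) -> list[dict]:
    """Return every chunk that mentions all of the required entities."""
    need = set(required_entities)
    return [c for c in chunks if need <= set(c["entities"])]

hits = find_mentions(chunks, ["paid ads", "organic content"])
```

Because each hit carries a timestamp, the answer isn't a quote in isolation; it's a deep link back into the exact moment of the audio.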
And if you’re building something similar:
Don’t stop at transcripts. Enrich your data. Structure your system.
Because in this new AI era, context is the product.