How I Got to 98.7% Coverage of The Game
Sometimes, a minor cleanup goes a long way.
When I first imported the YAML files from version 1, only 240 episodes made it into the database. That was bad. Why? The YAML files coming from Deepgram didn’t all share the same structure. So I went back and reimported everything. Now? I’ve got 98.7% of The Game podcast indexed and ready to chat.
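The version 1 files aren’t shown here, so this is only a minimal sketch assuming two possible shapes: a flat `transcript` key (hypothetical), or the nested layout of Deepgram’s JSON response (`results.channels[0].alternatives[0].transcript`). The helper names `extract_transcript` and `import_all` are made up for illustration. They work on documents that have already been parsed (for example with PyYAML’s `yaml.safe_load`). Any document that matches neither shape is reported instead of being silently dropped.

```python
from typing import Any, List, Optional, Tuple


def extract_transcript(doc: Any) -> Optional[str]:
    """Return the transcript text from one parsed YAML document, or None.

    Hypothetical helper: handles a flat {"transcript": ...} shape and the
    Deepgram-style nested shape results.channels[0].alternatives[0].transcript.
    """
    if not isinstance(doc, dict):
        return None

    # Shape 1 (assumed): transcript stored at the top level.
    flat = doc.get("transcript")
    if isinstance(flat, str) and flat.strip():
        return flat

    # Shape 2: Deepgram response layout, nested under results/channels/alternatives.
    try:
        text = doc["results"]["channels"][0]["alternatives"][0]["transcript"]
    except (KeyError, IndexError, TypeError):
        return None
    return text if isinstance(text, str) and text.strip() else None


def import_all(docs: List[Any]) -> Tuple[List[str], List[int]]:
    """Split parsed documents into (transcripts, indices that matched no shape)."""
    imported: List[str] = []
    failed: List[int] = []
    for i, doc in enumerate(docs):
        text = extract_transcript(doc)
        if text is None:
            failed.append(i)
        else:
            imported.append(text)
    return imported, failed
```

Logging the `failed` indices on every import is one way a silent shortfall, like only 240 episodes getting in, would show up immediately.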
That last 1.3%, though? It’s 11 episodes that still need transcripts.
Before fixing the transcription pipeline, I had to sort out something else: making sure each episode in the RSS feed matched the right record in the database. It sounds simple, but it wasn’t. I built a better matching strategy using episode numbers and titles pulled from Podnews.
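The exact matching rules aren’t spelled out above, so here is a minimal sketch assuming a two-step strategy: match on episode number first, then fall back to a normalized title. The `Episode` type, its fields, and the normalization rules are all hypothetical. The Podnews data is assumed to be already loaded into plain records.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Episode:
    # Hypothetical record shape shared by the RSS feed and the database.
    id: str
    title: str
    number: Optional[int] = None


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def match_episodes(feed: List[Episode], db: List[Episode]) -> Dict[str, Optional[str]]:
    """Map each feed episode id to a database episode id (None if unmatched)."""
    by_number = {e.number: e.id for e in db if e.number is not None}
    by_title = {normalize_title(e.title): e.id for e in db}

    matches: Dict[str, Optional[str]] = {}
    for ep in feed:
        # 1) Episode number is the strongest signal when both sides have one.
        if ep.number is not None and ep.number in by_number:
            matches[ep.id] = by_number[ep.number]
        else:
            # 2) Fall back to the normalized title.
            matches[ep.id] = by_title.get(normalize_title(ep.title))
    return matches
```

Numbers go first in this sketch because titles tend to drift between sources (punctuation, "Ep." prefixes, accents), while an episode number, when present, is unambiguous.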
With that in place, I can now reliably link episodes from the RSS feed to what’s already in the system. And here’s something I learned: even if you’re prototyping, a clean, straightforward database is way more reliable than trying to duct-tape everything together from local files.
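The database behind the project isn’t named here, so this is only an illustration using Python’s built-in `sqlite3`. The table and column names are hypothetical. A unique constraint on the episode number, combined with an upsert, means a reimport can be rerun as many times as needed without creating duplicates or wiping transcripts that already exist.

```python
import sqlite3

# Hypothetical schema: one row per episode number.
SCHEMA = """
CREATE TABLE IF NOT EXISTS episodes (
    id         INTEGER PRIMARY KEY,
    number     INTEGER UNIQUE,
    title      TEXT NOT NULL,
    transcript TEXT  -- NULL until a transcript exists
);
"""


def upsert_episode(conn: sqlite3.Connection, number: int, title: str, transcript=None) -> None:
    # Insert a new row, or update the existing row with the same episode number.
    # COALESCE keeps an existing transcript when the incoming one is NULL.
    conn.execute(
        """
        INSERT INTO episodes (number, title, transcript)
        VALUES (?, ?, ?)
        ON CONFLICT(number) DO UPDATE SET
            title = excluded.title,
            transcript = COALESCE(excluded.transcript, episodes.transcript)
        """,
        (number, title, transcript),
    )


def coverage(conn: sqlite3.Connection) -> float:
    """Fraction of episodes that have a transcript (COUNT(col) skips NULLs)."""
    total, done = conn.execute(
        "SELECT COUNT(*), COUNT(transcript) FROM episodes"
    ).fetchone()
    return done / total if total else 0.0
```

A query like `coverage` is one simple way to get a number such as 98.7%: episodes with a transcript divided by all episodes.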
But those last 11 episodes? Yeah, there’s no fancy automation there. I’ll copy-paste the transcripts in manually, like some kind of wild animal.
Once that’s done, I’ll finally rebuild the transcription feature correctly.
Also, would you want to be able to download all the transcripts? It could be a nice little feature. Let me know.