The future of hyper-personalized media might start with your inbox, with Jellypod founder Pierson Marks
How text-to-speech has evolved from a novelty to a tool that can power a personalized audio media experience for anyone
Welcome to Day to Data!
Did a friend send you this newsletter? Subscribe to get a dose of musings by a data scientist turned venture investor, breaking down technical topics and themes in data science, artificial intelligence, and more. New post every other Sunday.
Behind all the cool technical products we use every day are the people building them: engineers, data scientists, designers, and founders leading the charge on shipping great products to the masses. It would be remiss not to share their journeys, so this week on Day to Data, we’re talking to the founder of an exciting new consumer product about building a company, his tech stack of choice, and the people who got him to where he is today.
When I was a sophomore at UCLA, debating whether to go all in on the pre-med path or do a 180 to follow my growing interest in computer programming, I was looking for an outlet to learn more about what a future in engineering and tech could look like. I met Pierson and learned about the startup and VC club he was leading on campus, and figured I’d try it out. As college kids, we mused on futures seemingly decades away as founders or investors, working in industries we admired from afar. Having this conversation was quite the full circle moment.
A bit on text-to-speech
We’ve been trying to get our computers to talk to us for decades. At the 1939 New York World’s Fair, the VODER was unveiled. It was a piano-like system developed by Homer Dudley and his team at Bell Labs that electronically synthesized speech by using a keyboard and foot pedals to create a human-like voice. We’ve come a long way since then.
Now, text-to-speech tech is somewhat table stakes. In 2007, Siri started as a research project at SRI International to build a virtual personal assistant. Siri was acquired by Apple and launched to the masses in 2011. “Hey Siri” became shorthand for a new reality for the everyday consumer: our voices could now send emails, search the web, and text our friends.
The real differentiators of the past five years come down to two things: latency (speed) and how realistic the voice sounds. And maybe I’ll add one more: ease of access. Text-to-speech (TTS) is just an API away! Any language, any gender, any phrase, with a few lines of code.
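To make “a few lines of code” concrete, here’s a minimal sketch of what such a request can look like. The field names mirror OpenAI’s `/v1/audio/speech` endpoint; the helper function itself is hypothetical, and other providers use different request shapes.

```python
import json


def build_tts_request(text: str, voice: str = "alloy", model: str = "tts-1") -> str:
    """Build the JSON body for a TTS API call.

    Field names follow OpenAI's /v1/audio/speech endpoint; other
    providers differ, so treat this as an illustrative sketch.
    """
    return json.dumps({"model": model, "voice": voice, "input": text})


# POSTing this body (with an API key header) to the provider's endpoint
# returns an audio stream: a spoken phrase in a few lines of code.
body = build_tts_request("Welcome to Day to Data!")
```

Swap the `model` and `voice` values to change language, gender, or style — that flexibility is exactly what makes TTS feel like table stakes today.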
To dig into the tech:
From Hawking to Siri: the Evolution of Speech Synthesis by Tife Sanusi
Demystifying the Technical Structure of Text-to-Speech Models by Abdulkader Helwan
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality by Microsoft Research
The product at ElevenLabs
Using OpenAI’s Text to Speech (TTS) model
The journey to becoming a founder
As a builder at heart, from Legos to software, Pierson studied computer science at UCLA. During undergrad, Pierson led the aforementioned VC club, Bruin Ventures, through years of learning about starting and growing incredible companies. These interests led him to Amazon, where he spent three years as a software engineer on the “out of box” experience team for Alexa. “Every person that purchased an Alexa was interacting with something we built,” Pierson says. The global scale of an Alexa customer’s experience exposed him to skills across the stack, from hardware to product marketing.
Deciding to build Jellypod
“It was on my walks to the Amazon office, where I came up with the idea”. He’d listen to podcasts or try to catch up on reading emails and newsletters he subscribed to. It seemed like there should have been an option for him to combine the two — turning his emails into a podcast, getting him the personalized information he cared about in a manner that worked for his day to day. He scoured the web for a solution and couldn’t find anything to meet his needs, so he began to think about building it himself. He validated his idea by talking to family, friends, and colleagues, then bought a domain and put an email sign-up box on the site to gauge interest. “Pre-product, people from around the world signed up to learn more, and that was validation enough”.
Behind the scenes: the tech behind Jellypod
When you sign up for Jellypod, the service creates a unique email address that you use to subscribe to your favorite digital publications. Using AWS Lambda schedulers, Jellypod checks your unique inbox for newsletters at a time of your choosing every day. It ingests the emails it finds and begins transforming them into your daily podcast.
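A minimal sketch of what that scheduled check might look like, assuming an EventBridge-style trigger that passes along each user’s preferred delivery hour. The event shape and field names here are my own assumptions for illustration, not Jellypod’s actual schema:

```python
from datetime import datetime, timezone


def users_due(users, now=None):
    """Return the ids of users whose preferred delivery hour (UTC)
    matches the current hour."""
    now = now or datetime.now(timezone.utc)
    return [u["id"] for u in users if u["hour_utc"] == now.hour]


def lambda_handler(event, context=None):
    """Entry point for a scheduled AWS Lambda invocation.

    In a real deployment the user list would come from a database,
    and each due user's inbox would be fetched and handed off to the
    ingestion pipeline; here we just report who is due.
    """
    due = users_due(event.get("users", []))
    return {"due_users": due}
```

Running the check hourly and filtering by each user’s chosen hour keeps one schedule serving every user, rather than one Lambda trigger per person.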
Before bringing in LLMs, the email text needs to be pre-processed. This means cleaning up the text — removing promotional or boilerplate language — so it’s ready for a large language model. Advancements in generative AI for text and speech were a clear “why now” catalyst for Jellypod.
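A sketch of that cleanup step. The patterns below are generic guesses at common newsletter boilerplate, not Jellypod’s actual rules:

```python
import re

# Rough patterns for common newsletter boilerplate; a real pipeline
# would be far more thorough (HTML stripping, tracker removal, etc.).
BOILERPLATE_PATTERNS = [
    r"(?im)^.*unsubscribe.*$",
    r"(?im)^.*view (this email )?in (your )?browser.*$",
    r"(?im)^.*forwarded this email\?.*$",
]


def clean_newsletter(text: str) -> str:
    """Strip boilerplate lines so only the article text reaches the LLM."""
    for pattern in BOILERPLATE_PATTERNS:
        text = re.sub(pattern, "", text)
    # Collapse the blank lines left behind by removed boilerplate.
    return re.sub(r"\n{2,}", "\n\n", text).strip()
```

Trimming this noise matters twice over: it keeps junk out of the summary and it cuts the token count sent to the model.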
“LLMs are very good at summarization,” Pierson says. Using GPT-4, Jellypod transforms numerous emails into summarized text. While Jellypod uses GPT-4 today, Pierson is integrating open-source models into the platform to handle summarization, using OpenPipe to host and fine-tune them, bringing costs down ~80%.
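As a sketch, the summarization step boils down to a chat-completion request like the one below. The prompt wording is my own invention; the message format follows the OpenAI-style chat API that most open-source serving stacks also accept, which is part of what makes swapping models in practical:

```python
def build_summary_messages(newsletters: list[str]) -> list[dict]:
    """Assemble chat messages asking an LLM to turn cleaned newsletter
    text into a podcast script. Prompt wording is illustrative only."""
    joined = "\n\n---\n\n".join(newsletters)
    return [
        {"role": "system",
         "content": "You write concise, conversational podcast scripts."},
        {"role": "user",
         "content": "Summarize these newsletters into a short podcast "
                    "segment:\n\n" + joined},
    ]


# The same message shape can go to GPT-4 or to a hosted fine-tuned
# open-source model (e.g. via OpenPipe) at a lower price point.
messages = build_summary_messages(["Newsletter one...", "Newsletter two..."])
```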
From there, that text is transformed into a podcast using a text-to-speech model from OpenAI. “We tried a variety of text-to-speech models, like ElevenLabs, but the realistic sound of the resulting voice and the price point of OpenAI’s model were the right fit for Jellypod,” says Pierson. The entire product is built on serverless infrastructure that scales with spikes in usage (typically in the morning) and stays low-cost once the day’s podcasts are generated. Now you’ve got a podcast, personalized to you, ready to listen to directly in the app.
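One practical wrinkle with TTS APIs is input length: OpenAI’s speech endpoint caps input at a few thousand characters (4,096 at the time of writing, though treat that figure as an assumption), so a long script has to be split into chunks and the resulting audio segments concatenated. A minimal splitter:

```python
def chunk_script(script: str, max_chars: int = 4096) -> list[str]:
    """Split a podcast script into TTS-sized chunks on sentence
    boundaries where possible. The 4096 default mirrors OpenAI's
    input cap; adjust per provider."""
    chunks, current = [], ""
    for sentence in script.replace("\n", " ").split(". "):
        piece = sentence if sentence.endswith(".") else sentence + "."
        # Start a new chunk when adding this sentence would overflow.
        if current and len(current) + len(piece) + 1 > max_chars:
            chunks.append(current.strip())
            current = ""
        current += piece + " "
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Splitting on sentence boundaries rather than at a hard character offset avoids audible mid-word cuts where the audio segments are stitched back together.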
Jellypod includes a variety of additional features, like multiple daily podcasts organized by category, offline listening, and generate-on-demand episodes. The speed at which Pierson has shipped features and updates as a solo technical founder is truly incredible — thanks in large part to his broad technical skills and access to incredible AI tools.
As for someone who helped shape where Pierson’s career has taken him? Pierson thanks Paul Eggert, a brilliant professor known for his challenging lower-level computer science classes at UCLA. Likely unbeknownst to you, we’ve all got Eggert to thank: he’s the coordinator and an editor of the Time Zone Database at the Internet Assigned Numbers Authority (IANA). The database keeps phones, computers, tablets, and applications on the right time zone, and it powers the creation and modification timestamps on documents and photos.
On Pierson’s learnings from Eggert: “His tests were hard for everyone. When imposter syndrome dissolved, I realized you can do anything you set your mind to”.
The future of Jellypod & generative media
“Hyper-personalized media is where the world is moving. Everything you see is going to be created for you, and you alone,” Pierson says. Jellypod aims to shape that future of generative, personalized media — a future being shaped as we speak. The intersection of media and generative content is still up for debate; it’s likely that the New York Times lawsuit against OpenAI will set a new precedent for what these now-intertwined industries will look like. And Pierson’s excited to be a part of it, shaping the way we interact with ideas, opinions, and perspectives.
If you’re interested in trying out Jellypod, it’s available in the Apple App Store. You can connect with Pierson here. See you next week!