Happy Sunday, friends! It’s a great morning in New York City - grey skies for a good, cloudy fall day. It’s been quite the week. Fall is a busy time for investors all over. And personally, I’ve finally entered my taper for the Chicago Marathon! I’m ready for some much-needed rest and relaxation before the big race in two weeks. It’s an even better day in the marathon world, as Tigst Assefa obliterated the women’s world record earlier today in Berlin, clocking in at 2:11:53. For context, that’s a pace of about 5:02 per mile (3:08 per km). Light work.
For today’s post, we’re going to talk data moats. Data moats have less to do with data science than with defensibility. Moats were supposedly first used by the Egyptians but gained traction in the Middle Ages, when massive castles were built with large bodies of water surrounding them to defend against invaders. Modern-day data moats serve primarily as a defense against competition as well. Data moats have evolved dramatically in the last few years, especially with the hype around generative AI, so let’s dive in.
Building a data moat
A company can build a data moat, and hence an advantage over competitors, by having access to, or generating, data that other companies don’t have. Access to this data creates customer lock-in and, in turn, can make the product stickier.
To be clear: you can build a company with a data moat without generative AI in the picture. Plenty of companies exist in that realm (enter a 2012 article about the early blunders of Apple Maps compared to Google Maps, which had a much better data moat). However, with the onset of generative AI, companies with data moats are going to provide value to users more quickly, as they’re bringing AI into the workflows and data that users already spend time in (see below for more).
Some companies start off building this data moat with a novel approach to data collection, like fitness wearables such as WHOOP that are building a multi-dimensional dataset no other company is generating. Other companies build tools on a commonly used dataset, like Common Crawl, but gather proprietary data on top of it by having the largest number of users providing feedback to fine-tune their models. ChatGPT is a great example of this.
In the age of AI, data moats can be viewed as more valuable than tech moats, as developers continue to emphasize open-source technology. Tech moats, which can include proprietary algorithms, aren’t safe from the innovation of the open-source community. Just because a company has a really novel approach to building something doesn’t make its product totally defensible. However, great tech teams working alongside great product teams have historically built great lock-in from both.
Are data moats always a sound defense?
In 2019, a16z published an article called “The Empty Promise of Data Moats”, arguing that the narrative that data moats ensure safety from competitors doesn’t always hold.
“Treating data as a magical moat can misdirect founders from focusing on what’s really needed to win” - a16z
Technical innovation in AI is changing how data moats play out. Back when we thought models simply got better by feeding them more data, a data moat and millions of users would have been a natural advantage. However, research is finding that models don’t always need to be bigger, and that training on smaller portions of data can still produce incredibly powerful results.
But, data moats are not in vain! As said by the authors, “data moats clearly don’t last (or automatically happen) through data collection alone, carefully thinking about the strategies that map onto the data journey can help you compete with — and more intentionally and proactively keep up with — a data advantage.”
Data moats in the age of AI
As AI startups and internal solutions pop up everywhere, each company hopes to differentiate its AI product by building on top of proprietary data. Every company seems to be tacking “AI” onto the end of an existing product or building an internal chatbot. Some of these tools will succeed; many will never gain traction. But at the end of the day, vendor lock-in keeps customers using many of the same tools they’ve used for years, perhaps because the tool is too embedded in their workflows or the process of switching is just too daunting.
Salesforce’s Einstein: a conversational chatbot built on top of a company’s comprehensive CRM that helps draft emails, prepare for meetings, and track deal status.
Google’s Bard: the rapidly evolving AI model from Google can now search your Gmail, Docs, Drive, and other Google-hosted data sources to give you responses that draw on data no other AI model has access to.
More reading & listening
The defensible startup by Mariela Atanasova — diving into the 8 moats a company can build
The empty promise of data moats by a16z
Build a data moat around your AI initiative with Adam Oliner of Slack