Data-driven VC #9: Patterns of successful startups
Where venture capital and data intersect. Every week.
👋 Hi, I’m Andre and welcome to my weekly newsletter, Data-driven VC. Every Thursday I cover hands-on insights into data-driven innovation in venture capital and connect the dots between the latest research, reviews of novel tools and datasets, deep dives into various VC tech stacks, interviews with experts and the implications for all stakeholders. Follow along to understand how data-driven approaches change the game, why it matters, and what it means for you.
Current subscribers: 2,539, +139 since last week
The last few episodes were centered around a hybrid approach to startup screening where we combine deterministic, rule-based systems with ML-based approaches. While deterministic logic is shaped by human biases and the perceived importance of different selection criteria, the ML-based approach derives the screening criteria weights directly from the data. Combining both approaches is, in short, a two-way street that unifies the benefits of both individual approaches (objective, less biased, more inclusive, efficient, effective etc.) and avoids their respective drawbacks (mirroring the past into the future, limited sample size etc.).
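To make the hybrid idea concrete, here is a minimal sketch, assuming the rule-based part is a hand-weighted average of criterion ratings and the ML part is a predicted success probability. The function names, the weights, the ratings and the blending parameter `alpha` are all hypothetical and chosen for illustration; they are not the screening model described in this series.

```python
def rule_based_score(ratings, weights):
    """Deterministic part: weighted average of 0..1 criterion ratings.

    The weights are set by hand, which is where investor judgment
    (and bias) enters.
    """
    total_weight = sum(weights.values())
    return sum(weights[name] * ratings[name] for name in weights) / total_weight


def hybrid_score(ratings, weights, ml_probability, alpha=0.5):
    """Blend the rule-based score with a model's predicted success probability.

    alpha=1.0 uses only the rules, alpha=0.0 uses only the model.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * rule_based_score(ratings, weights) + (1.0 - alpha) * ml_probability


# Hypothetical inputs, for illustration only.
weights = {"team": 0.4, "market": 0.3, "product": 0.2, "traction": 0.1}
ratings = {"team": 0.8, "market": 0.6, "product": 0.7, "traction": 0.3}
ml_probability = 0.55  # stand-in for the output of a trained classifier

print(round(hybrid_score(ratings, weights, ml_probability), 2))
```

A linear blend is only one of many ways to combine the two signals; it is used here because it keeps the contribution of each side easy to inspect.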
To better understand both concepts and explore the patterns behind successful companies, I will first look at the deterministic approach through a more academic lens (in order to remove my own biases) before then drawing insights from ML-based screening models.
VC selection criteria - What are investors looking for?
VC selection criteria are a well-researched field with several papers examining the different dimensions. The two tables below summarize the most important studies in the field and rank different selection criteria accordingly. Among them are team, problem/market (size, growth, timing pull/push, fragmentation), solution/product (USP, IP, etc), business model, go-to-market motion, traction (product, financial), competitive landscape, defensibility, cap table and round structure.
Clearly, the evaluation criteria and their relative importance differ across studies, but a few dominate throughout: team, product, market and traction (#biasON which are also the most important ones for us at Earlybird #biasOFF).
I supervised a Master’s thesis in 2019 to understand the major VC selection criteria in more detail. For example, when thinking about the team, we aimed to understand both the underlying sub-criteria like educational background, professional background, age, gender or social media presence, and the relative weighting across these sub-criteria. Unfortunately, results were widely dispersed and not statistically significant, showcasing once more the diversity of investor perspectives.
That being said, deterministic approaches have benefits but also clear shortcomings. Instead of asking investors what they look for based on their experience and limited sample size, why not flip it around and analyze the data to understand what successful startups have in common?
Startup success patterns - What does the data tell us?
Let’s have a look at some independent analyses to get a well-rounded picture. First off, I’d like to share selected results from a scoring model we have developed at Earlybird. Due to the overall high relevance of the founder and executive team, I’ll focus the subsequent deep dive on this dimension.
According to our findings, the most important features are the number of executives, whether the CEO is a founder or not, the split of business versus tech roles, the split of male versus female founders, their age and their degrees, among others. You can find an overview in the figure below. For context, SHAP values measure how much an individual feature contributes to the model’s prediction for a given startup. The higher the average absolute SHAP value, the more important the feature.
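For intuition on what a SHAP value is, the sketch below computes exact Shapley values for a tiny toy scoring function in plain Python. It is not the Earlybird model: the function `toy_score`, its coefficients and the three binary feature names are assumptions made up for illustration, and "missing" features are simply replaced by a baseline value. Real models have too many features for this brute-force approach and rely on approximations instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are set to their baseline value.
    The cost grows exponentially with the number of features.
    """
    names = list(instance)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        value = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {
                    f: (instance[f] if f in coalition else baseline[f])
                    for f in names
                }
                with_feature = dict(without)
                with_feature[name] = instance[name]
                # Marginal contribution of `name` to this coalition.
                value += weight * (predict(with_feature) - predict(without))
        phi[name] = value
    return phi


def toy_score(x):
    """Hypothetical score with one interaction term (illustration only)."""
    return (
        0.2
        + 0.3 * x["ceo_is_founder"]
        + 0.1 * x["three_to_six_executives"]
        + 0.2 * x["ceo_is_founder"] * x["masters_or_phd"]
    )


instance = {"ceo_is_founder": 1, "three_to_six_executives": 1, "masters_or_phd": 1}
baseline = {"ceo_is_founder": 0, "three_to_six_executives": 0, "masters_or_phd": 0}

phi = shapley_values(toy_score, instance, baseline)
for feature, contribution in phi.items():
    print(feature, round(contribution, 3))
```

The contributions add up to the difference between the score for this startup and the baseline score, which is the property that makes these values comparable across features. A global importance ranking is typically obtained by averaging the absolute values over many startups.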
Now that we know which team features matter most, let’s look at what “good” (=promoters) and “bad” (=detractors) mean for the respective dimensions, see also the four graphs below.
#Executives: 3-6 is best, single founder is worst
CEO: Important to be founder, external CEO is bad
Age: Older teams are more successful; little difference between 25 and early 40s
Education: Master’s and PhD similarly good, Bachelor’s and no degree both bad
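The promoter/detractor reading of the list above can be written down as a simple deterministic rule set. This is a minimal sketch: the function name and input encoding are hypothetical, the "neutral" label for team sizes the list does not mention is an assumption, and age is left out because the list gives no clear threshold for it.

```python
def team_flags(n_executives, ceo_is_founder, highest_degree):
    """Label each team dimension as promoter, detractor or neutral.

    highest_degree is one of "none", "bachelor", "master", "phd".
    """
    flags = {}

    if 3 <= n_executives <= 6:
        flags["executives"] = "promoter"
    elif n_executives == 1:
        flags["executives"] = "detractor"
    else:
        # Assumption: sizes not mentioned in the list are treated as neutral.
        flags["executives"] = "neutral"

    flags["ceo"] = "promoter" if ceo_is_founder else "detractor"

    if highest_degree.lower() in {"master", "phd"}:
        flags["education"] = "promoter"
    else:
        flags["education"] = "detractor"

    return flags


print(team_flags(n_executives=4, ceo_is_founder=True, highest_degree="phd"))
print(team_flags(n_executives=1, ceo_is_founder=False, highest_degree="bachelor"))
```

Hard-coded rules like these are easy to audit, but they ignore the interactions and shifting importances that show up once the data is segmented.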
Obviously, the insights are rather generic. When controlling for different dimensions like industry or go-to-market strategy, however, the feature importance partially shifts. Happy to discuss this in more detail if you’re interested but for the purpose of this post, I’ll stop here and share some insights from other external analyses.
My friend Francesco and some fellow researchers published an interesting study in 2019. The figure below shows the feature importance of their ML models across successful founder profiles. Age (in months), the number of founders, their country of origin, diversity in the founder team and whether they have a LinkedIn profile or not seemed to have the biggest impact on their scoring. A deep dive and follow-up analysis can be found in another paper from 2021 here. All very much in line with our findings at Earlybird.
A slightly less academic but still very interesting analysis of patterns among European unicorn founder CEOs was recently published by Mosaic Ventures. They look at a sample of 197 individuals to identify patterns among them. Their main findings are:
Prior experience as a founder (~65% were repeat founders)
No previous industry experience in the sector of their unicorn business (~55%)
A Master’s or PhD degree (~55%)
More than 10 years of work experience before founding the company (~35%)
Moreover, they look at the inverse, namely, “what European unicorn founder CEOs tend not to be”:
Very unlikely to have worked for another unicorn (only ~10% had)
Very unlikely to have worked for a FAANG or Microsoft (~5%)
Unlikely to have skipped college, including dropping out (~10%)
Unlikely to have studied at a small set of highly represented universities. The top 5 alma maters accounted for only ~15% of founders in the sample
Unlikely to have a technical background (~35% of founding CEOs are technically oriented)
Again, all very much in line with our findings at Earlybird. Hopefully this brief overview gives you a feeling of what successful startups, and specifically the teams behind them, look like. Clearly, we didn’t stop here but conducted similar analyses for other selection criteria like market, product or competitive landscape. In line with the previous episode, feature engineering is key across the board. The more creative and more granular the feature engineering, the more robust and insightful the resulting models.
This is it for today. In the next episode, I will explore how we can integrate a hybrid screening model into existing VC investment processes and what an Augmented VC can look like in practice.
Thank you for reading. If you liked it, share it with your friends, colleagues and everyone interested in data-driven innovation. Subscribe below and follow me on LinkedIn or Twitter to never miss data-driven VC updates again.
If you have any suggestions, want me to feature an article, research, your tech stack or list a job, hit me up! I would love to include it in my next edition😎
Again, this is fallacious.
Prediction does not imply causation.
And you can’t just condition on an outcome (startup success), because that induces collider bias.
P(tech degree | startup success) ≠ P(startup success | tech degree) ≠ P(startup success | do(tech degree))
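A small numeric illustration of the first inequality, using made-up counts (none of these numbers come from the post or from any dataset). The interventional quantity P(startup success | do(tech degree)) cannot be computed from such counts alone; it would need an experiment or causal assumptions.

```python
# Hypothetical counts, chosen only to illustrate the inequality.
total_startups = 1000
with_tech_degree = 400
successful = 50
successful_with_tech_degree = 30

# Share of successful startups whose founders have a tech degree.
p_tech_given_success = successful_with_tech_degree / successful

# Share of tech-degree startups that become successful.
p_success_given_tech = successful_with_tech_degree / with_tech_degree

# Bayes' rule links the two through the base rates.
p_success = successful / total_startups
p_tech = with_tech_degree / total_startups
p_success_given_tech_bayes = p_tech_given_success * p_success / p_tech

print(p_tech_given_success)
print(p_success_given_tech)
print(p_success_given_tech_bayes)
```

With these counts, most successful startups have a tech-degree founder, yet only a small fraction of tech-degree startups succeed, because success itself is rare.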