Data-driven VC #14: How can VCs learn from each other?
Where venture capital and data intersect. Every week.
👋 Hi, I’m Andre and welcome to my weekly newsletter, Data-driven VC. Every Thursday I cover hands-on insights into data-driven innovation in venture capital and connect the dots between the latest research, reviews of novel tools and datasets, deep dives into various VC tech stacks, interviews with experts and the implications for all stakeholders. Follow along to understand how data-driven approaches change the game, why it matters, and what it means for you.
Current subscribers: 3,120, +103 since last week
Big shout-out to everyone who submitted their input to the “Data-driven VC Landscape 2023”. 100+ nominations and counting. Make sure to submit your perspective and share the link with everyone who could potentially contribute. You will be the first to receive the full landscape and report including insights into different tech stacks, tools used, thought leaders, content to follow etc.
This episode is intended to answer three reader questions that came up so many times that I decided to finally write the answers down 👨🏻‍💻
Why are you talking so transparently about your learnings and ideas?
Surprisingly, I feel that the VC industry is still at day one in terms of digitization and data-driven innovation. Most firms today use a CRM (although nobody seems really happy and teams keep jumping between solutions), one or two commercial database providers like Crunchbase, Dealroom or Pitchbook (which are becoming increasingly redundant and hard to differentiate) and simple productivity tools like Calendly, Mixmax or Superhuman. Their approach to data-driven sourcing, screening or even portfolio value creation, however, is in most cases either non-existent or depends on external solutions like Specter or SourceScrub.
Very few funds actually have the means (=management fee), the commitment from leadership, someone who bridges the gap between the investment and the engineering world, and access to world-class engineering talent to really do something innovative with data. As a result, the majority of readers are not in a position to act upon most of the shared content but rather perceive it as inspiration and a rough idea of what is possible. It eventually informs their decision to become more data-driven and helps them avoid stupid mistakes. For everyone else, those who are in a position to build something in-house, the reaction to most of the content is probably “Hey, that’s exactly what we do!”
In reality, 80% of the stuff is trial and error and, in retrospect, very straightforward. It is not difficult to get right if you really want to get it right and have the right people. These “straightforward insights” are mainly the learnings I share here, but they are far from the secret sauce we have developed at Earlybird ;)
What is the secret sauce in data-driven VC?
Short-term: If you do something, you’re already ahead of the majority.
Mid-term: Creativity with respect to identification sources becomes key. Commercial data providers are becoming increasingly redundant and a “creativity race” has started to be among the first to find novel identification sources. For example, one or two years back it was highly innovative to leverage LinkedIn Sales Navigator to search for and crawl profiles of professionals who changed their title from whatever to “Starting something new” or “Stealth mode”. In many cases, this approach helped us identify new startups even before they got officially registered. It certainly kept us ahead for some time, but it is really no rocket science to replicate once the secret (in this case just a website/source and a search term) is out.
Long-term: This answer is two-fold.
The availability of proprietary data such as pitch decks, financial information, notes from founder interactions etc. will become increasingly important. Thankfully, at Earlybird we have granular data on 25+ years of transactions and founder interactions. This proprietary information can then be leveraged to verify and complement publicly available data (as per episode #3 on startup database benchmarking and episode #4 on how to scrape alternative data sources) to eventually obtain a more balanced and well-rounded picture of every company. In turn, this helps us improve our screening models and identify the best opportunities earlier than others.
As said earlier in episode #6 and episode #7, creativity with respect to feature engineering becomes the ultimate differentiator. While creativity with respect to identification sources is fairly easy to copy, creativity with respect to feature engineering is significantly more complex and requires proper data extraction, data cleaning and novel approaches to feature creation.
Feature engineering is a machine learning technique that leverages raw data and existing variables/features to create new variables/features that aren’t in the training set. The goal of this technique is to speed up data transformations while also enhancing model accuracy. - TowardsDataScience
For example, we collect the social media profiles of founders. This is the raw data. From it, we use a model to predict the Big Five character traits of the respective founders. This is the transformation. We assume that the Big Five character traits of founders do not (or only slightly) change over time. Subsequently, we take a sample of successful companies to identify patterns within their founders’ character traits that can predict the success of the respective companies. This is the training. Similarly, we can take companies whose founder teams ended up in conflict and try to identify patterns among their Big Five character traits. This is another example of training.
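To make the three steps (raw data, transformation, training) a bit more tangible, here is a minimal Python sketch. It is not what we run at Earlybird. Everything in it is an assumption for illustration only: the keyword lists are a crude stand-in for a real personality model, the posts and labels are made-up toy data, and the "training" is a simple nearest-centroid rule rather than a proper screening model.

```python
from statistics import mean

TRAITS = ("openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism")

# Hypothetical keyword lists: a placeholder for a real trait-prediction model.
KEYWORDS = {
    "openness": {"curious", "explore", "idea"},
    "conscientiousness": {"plan", "deadline", "shipped"},
    "extraversion": {"meetup", "party", "team"},
    "agreeableness": {"thanks", "grateful", "help"},
    "neuroticism": {"worried", "stress", "anxious"},
}


def trait_scores(posts):
    """Transformation: raw posts -> one score per trait (share of matching words)."""
    words = [w.strip(".,!?").lower() for post in posts for w in post.split()]
    total = max(len(words), 1)  # avoid division by zero for empty profiles
    return [sum(w in KEYWORDS[t] for w in words) / total for t in TRAITS]


def team_features(founders_posts):
    """Feature creation: average the trait scores across all founders of a team."""
    per_founder = [trait_scores(posts) for posts in founders_posts]
    return [mean(column) for column in zip(*per_founder)]


def fit_centroids(examples):
    """Training: compute the mean feature vector for each outcome label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: [mean(column) for column in zip(*rows)]
            for label, rows in by_label.items()}


def predict(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: distance(centroids[label], features))


# Made-up toy data: one list of posts per founder, one label per company.
training = [
    (team_features([["We shipped the plan before the deadline"],
                    ["Curious to explore a new idea"]]), "successful"),
    (team_features([["So worried and anxious, stress everywhere"]]), "other"),
]
centroids = fit_centroids(training)
new_team = team_features([["Plan shipped"]])
print(predict(centroids, new_team))
```

The point of the sketch is the shape of the pipeline, not the model: the differentiation comes from which raw data you collect and which features you derive from it, while the final training step is often the most replaceable part.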
Where can I learn about data-driven VC?
Well, that’s a difficult one. Very few people are actually in a position to share hands-on learnings, what works well, what doesn’t work and where they get their information from. And those who are often are not willing to share their perspectives because they’re afraid of losing their edge.
As a result, the ecosystem is not only very early but also highly fragmented, with a strong reluctance to share learnings. This is also why most researchers struggle. They lack the hands-on insights from practitioners who are in the trenches. Therefore, a close researcher-VC relationship is key to generating useful studies. This newsletter is here to change this dynamic and provide a platform for like-minded people. So open question to you:
I hear your feedback and the idea is to complement this newsletter in order to bring us closer together. The stack will gradually be commoditized and I encourage you to share your perspectives more openly, to learn from each other and to prevent stupid mistakes like migrating to the wrong CRM or paying too much for a commercial database. These mistakes have been made by your peers already, so there is no need for you to repeat them.
Yes, we should keep our secret sauce, the proprietary data, the ins and outs of feature engineering to differentiate and generate alpha, but there are so many things we can openly share to learn from each other and accelerate our mutual journey to a more efficient, effective and inclusive VC industry.
Hope you enjoyed this short Q&A format and I am always happy to receive your questions. The next episode will be more technical again🤓
PS: Book recommendation for X-mas: “The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution”
Thank you for reading. If you liked it, share it with your friends, colleagues and everyone interested in data-driven innovation. Subscribe below and follow me on LinkedIn or Twitter to never miss data-driven VC updates again.
If you have any suggestions, want me to feature an article, research, your tech stack or list a job, hit me up! I would love to include it in my next edition😎