Data-driven VC #29: Beyond sourcing and screening - how data-driven approaches create alpha in the long run
Where venture capital and data intersect. Every week.
👋 Hi, I’m Andre and welcome to my weekly newsletter, Data-driven VC. Every Thursday I cover hands-on insights into data-driven innovation in venture capital and connect the dots between the latest research, reviews of novel tools and datasets, deep dives into various VC tech stacks, interviews with experts and the implications for all stakeholders. Follow along to understand how data-driven approaches change the game, why it matters, and what it means for you.
Current subscribers: 7,183+, +170 since last week
Brought to you by Affinity - Find, manage, and close more deals with Affinity
Affinity Campfire brings together industry-leading dealmakers to explore what it means to be a part of the leading relationship intelligence ecosystem. Dive deeper into the importance of data-driven sourcing and what dealmaking will look like in 2023 as we navigate a changing global landscape.
“VC is a finding and picking the winners game”
This statement might feel repetitive to the long-term readers among you, as I pointed it out not only in the very first episode but on many other occasions too. Yet, it’s crucial to keep this in mind when critically rethinking the VC investment process. You need to start somewhere and focus is key.
Morten Sorensen (2007, “How smart is smart money”) found in his study that about 2/3 of the VC value is created in the sourcing and screening stages of the investment process.
Following this value-oriented approach, the majority of my newsletter thus far has focused on the sourcing and screening stages:
Difference between human-centric and data-centric sourcing approaches
Make versus buy and the results of my startup database benchmarking
A list of alternative data sources and how to scrape them at scale
Feature engineering, missing data and how to make sense of it all
Getting out of the sourcing and screening rabbit hole (although there’s certainly more to explore here in the future), I want to dedicate today’s post to answering two closely related and frequently asked questions:
“Will data-driven approaches go beyond sourcing and screening?”
“Can data-driven approaches actually create alpha in the long run?”
My short answer to both: For sure! Once a scalable scraping infrastructure, reliable data pipelines, inter- and intra-entity matching, creative feature engineering, deterministic and ML-based screening approaches as well as an intuitive UI/UX are set up, we can take the next step and rethink the due diligence (DD) and portfolio value creation (PVC) stages of the VC investment process.
Due diligence as you know it, just without the pain
Market sizing, competitor benchmarking, extensive data requests, slow responses, back-and-forth clarifications, and endless follow-up questions. You know it, I know it, we all know it: DD is painful and nobody likes it. Nevertheless, it’s an important part of the investment process.
Looking to solve the well-known pain points with data-driven approaches, I don’t want to reinvent the wheel but automate the pain away. Some examples:
Market sizing: While bottom-up calculations tend to follow a deterministic logic and require more assumptions on Price and Quantity (as recently described by a friend of mine in his exceptional “TAM Masterclass”), top-down approaches that consider a variety of market studies are more of a data collection job. Clearly, the latter can be done by LLMs to serve as a starting point for further analyses.
Competitive landscapes: Who has a similar product offering? Are there differences? How much funding did they raise? From which investors? These and many more questions need to be answered as part of competitive benchmarking. Days and weeks of manual data collection work can now be done with a simple prompt and LLMs. Even non-fine-tuned models like GPT-4 provide a great starting point, not to mention our internal experiments with more advanced vector search, fine-tuned models, and tons of structured and unstructured startup data.
Traction and KPI analysis: Excel sheets come in different forms and shapes. Thankfully, today’s parsing algorithms allow us to extract data from every document, feed it into a standardized table and leverage it for large-scale benchmarking. An enterprise company with a top-down sales motion selling infrastructure software with an ARR between 0-20M? Easy. Once the data is stored in a structured form, benchmarks like the a16z growth guide can be codified and automatically leveraged to assess new investment opportunities at scale.
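To make the idea of a "codified benchmark" a bit more tangible, here is a minimal sketch of how a peer-group comparison could look once the KPI data sits in a structured table. Everything in it is an assumption for illustration: the `Company` fields, the `peer_group` and `growth_percentile` helpers, and the sample figures are made up, and the logic is a plain percentile rank, not the a16z growth guide itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    name: str
    customer_segment: str  # e.g. "enterprise"
    sales_motion: str      # e.g. "top-down"
    category: str          # e.g. "infrastructure software"
    arr_musd: float        # annual recurring revenue, in millions
    yoy_growth: float      # year-over-year ARR growth, 1.0 == 100%


def peer_group(target, universe, arr_range=(0.0, 20.0)):
    """Companies sharing the target's segment, motion and category within the ARR band."""
    low, high = arr_range
    return [
        c for c in universe
        if c.name != target.name
        and c.customer_segment == target.customer_segment
        and c.sales_motion == target.sales_motion
        and c.category == target.category
        and low <= c.arr_musd <= high
    ]


def growth_percentile(target, peers):
    """Share of peers growing slower than the target (0.0 to 1.0), or None without peers."""
    if not peers:
        return None
    slower = sum(1 for p in peers if p.yoy_growth < target.yoy_growth)
    return slower / len(peers)


# Made-up sample data, for illustration only.
universe = [
    Company("PeerA", "enterprise", "top-down", "infrastructure software", 4.0, 0.8),
    Company("PeerB", "enterprise", "top-down", "infrastructure software", 12.0, 1.5),
    Company("PeerC", "enterprise", "top-down", "infrastructure software", 18.0, 2.5),
    Company("PeerD", "enterprise", "top-down", "infrastructure software", 35.0, 3.0),  # outside ARR band
    Company("PeerE", "smb", "bottom-up", "productivity software", 5.0, 4.0),           # different segment
]
target = Company("NewDeal", "enterprise", "top-down", "infrastructure software", 6.0, 2.0)

peers = peer_group(target, universe)
percentile = growth_percentile(target, peers)
print(f"{target.name}: growing faster than {percentile:.0%} of {len(peers)} peers")
```

The same pattern extends to other KPIs (net retention, burn multiple, and so on) by swapping the field that is ranked; the hard part in practice is the parsing step that fills the table, not the comparison.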
Clearly, we cannot automate the full DD process. Yet, we can mitigate the major pain points and make it significantly more efficient, for founders and investors alike. The average investor will join the meetings way better prepared and founders need to spend less time talking through the value propositions of their competitors or their traction in too much detail. They can finally focus on strategic questions, the strengths of their team, the why and how, and the actual drivers of their venture.
The power of ten operating partners - Portfolio value creation on steroids
Only recently I saw a well-known growth fund presenting their (exceptionally well-researched) DD findings to one of our portfolio CEOs, just to prove their expertise and potential value add, and put themselves into a better position to win this competitive deal. This was a great reminder that DD findings can serve multiple purposes and present the beginning of a fruitful portfolio value creation journey (PVC).
But what is PVC actually about and how can data-driven approaches (DDVC) improve it? A few examples (beyond capital) below:
Offering strategic advice: VCs often have a wealth of experience in their industry and can offer valuable advice and guidance to their portfolio companies. This unique cross-company perspective can help startups navigate challenges and make strategic decisions that will drive growth and success.
DDVC: Following the DD example above, VCs can continuously share their research findings, competitor benchmarking (of course this does not mean sharing confidential information), and other information to complement the founders’ perspective and highlight strengths, weaknesses, and a clear path for improvement. With generative AI models, these findings can be automatically transformed into text, slides, or whatever format suits the founders best.
Hiring and scaling an organization: VCs typically have a team of experienced professionals who can provide guidance and support to their portfolio companies, specifically with respect to hiring and scaling an organization.
DDVC: As described in the sourcing-related episodes before, we have collected millions of people profiles within startups and scale-ups but also across closely related industries with high talent density such as BigTech, Consulting, IB, PE and others. With these profiles in our database and some simple classification, it becomes easy to support founders with requests like “Looking for a VP Sales with direct enterprise software sales experience and ACVs of 50-100k in Munich” at scale. Think of it like a personal recruiter tailored to your needs without ever adding a filter.
Creating networks and partnerships: VCs often have extensive networks of contacts in the industry and can help their portfolio companies forge strategic partnerships and customer relationships that can help them grow and scale.
DDVC: Few individuals within a VC fund actually know about all of the existing relationships within their firm, not to mention relationship strength. This is why CRM systems like Affinity are so powerful. They clearly show who is connected to whom and which might be the best path for an introduction. An independent off-the-shelf alternative for data-driven relationship workflows is Brdg.app from the one and only Connor Murphy.
Helping with follow-on funding and exits: VCs spend a lot of time with their peers at other funds to understand each other’s perspectives on market segments, exchange thoughts on hot opportunities and position their own portfolio companies for follow-on funding. According to a survey by our friends at Creandum, this is also the most sought-after value add by founders. Moreover, VCs can help their portfolio companies prepare for exit events and provide support throughout the process.
DDVC: The same applies here as for the industry and customer relationships above.
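The "best path for an introduction" mentioned in the networks example can be framed as a graph search. The sketch below is one possible way to do it, not how Affinity or Brdg.app work internally: it assumes each relationship carries a hypothetical strength score between 0 and 1, treats the strength of a path as the product of its links, and uses Dijkstra's algorithm on the negative logarithms to find the strongest chain. All names and scores are invented.

```python
import heapq
import math


def best_intro_path(graph, start, goal):
    """Return (path, strength) for the chain with the highest product of tie strengths.

    graph maps a person to {contact: strength} with strength in (0, 1].
    Maximizing the product equals minimizing the sum of -log(strength),
    so plain Dijkstra applies.
    """
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, person, path = heapq.heappop(queue)
        if person == goal:
            return path, math.exp(-cost)
        if person in settled:
            continue
        settled.add(person)
        for contact, strength in graph.get(person, {}).items():
            if contact not in settled:
                heapq.heappush(queue, (cost - math.log(strength), contact, path + [contact]))
    return None, 0.0


# Invented relationship data; edges are stored in one direction for brevity.
graph = {
    "Founder": {"Partner A": 0.9, "Associate B": 0.9},
    "Partner A": {"Target CRO": 0.2, "Advisor C": 0.8},
    "Associate B": {"Target CRO": 0.5},
    "Advisor C": {"Target CRO": 0.9},
}

path, strength = best_intro_path(graph, "Founder", "Target CRO")
print(" -> ".join(path), f"(strength {strength:.3f})")
```

Note that the strongest route here is not the shortest one: two strong links via an advisor beat a single weak direct link, which is exactly the kind of insight that is hard to get without the data in one place.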
But won’t every VC at some point use the same tools?
Today, in a period where the majority of VCs still classify as “old-school” and “productivity VCs”, it’s fairly easy to differentiate via a combination of innovative human-centric and data-driven sourcing approaches. Being the first to identify a promising founder via her “Stealth” title on LinkedIn will give you a head start, and often enough an advantage to build a relationship and strike a deal before the opportunity appears on the radar of other VCs.
While this certainly helps you to create alpha in the short and mid-term (say 3-5y), it won’t be a lasting differentiator. It really doesn’t require too much creativity to come up with the data sources that reveal the first digital footprints of founders or inflection points of their companies. LinkedIn, public registers, GitHub, Product Hunt, grant databases, Twitter, Reddit, Hacker News - you name it.
Hunting for sustainable differentiation and alpha, I expect the focus of data-driven VCs to quickly shift from sourcing to screening, and more specifically to feature engineering and ML-based scoring. Collecting data on internal decision-making processes, startup failure and success, follow-on funding, exits, their monthly performance and qualitative updates, and a lot more will be key to continuously updating the ML-based scoring models and getting better at flagging the right opportunities at the right point in time.
I expect the screening process to be the mid and long-term differentiator as the continuously collected information and the resulting model weights will be proprietary for every VC. This also indirectly answers another frequently asked question, i.e. “Why would you need this in-house and not just buy signals from an external data provider?”. Well, we all want to create alpha, and if everyone reacts to the same signal, only one investor will create significant alpha whereas all others will be left behind. Let’s not be lemmings.
Therefore, screening is for me the most important part and should in no case be outsourced. It’s OK for all funds to procure data from the same external data providers but the subsequent data processing and “making sense of the data” needs to stay in-house.
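To illustrate what "continuously updating the ML-based scoring models" with proprietary outcomes could mean in its simplest form, here is a toy sketch of a logistic-regression scorer that takes one gradient step per observed outcome. It is an assumption-laden illustration, not our production setup: the `OnlineScorer` class, the two feature names and the four data points are all hypothetical, and a real model would use far more features and proper validation.

```python
import math


class OnlineScorer:
    """Logistic-regression scorer updated one observed outcome at a time (plain SGD)."""

    def __init__(self, feature_names, learning_rate=0.1):
        self.feature_names = list(feature_names)
        self.weights = {name: 0.0 for name in self.feature_names}
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, features):
        """Return a value in (0, 1); higher means the company is flagged more strongly."""
        z = self.bias + sum(self.weights[n] * features.get(n, 0.0) for n in self.feature_names)
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, outcome):
        """One gradient step on the log loss; outcome is 1 (e.g. follow-on round raised) or 0."""
        error = self.score(features) - outcome
        for n in self.feature_names:
            self.weights[n] -= self.learning_rate * error * features.get(n, 0.0)
        self.bias -= self.learning_rate * error


# Hypothetical features and invented outcomes, for illustration only.
scorer = OnlineScorer(["headcount_growth", "repeat_founder"])
history = [
    ({"headcount_growth": 1.0, "repeat_founder": 1.0}, 1),
    ({"headcount_growth": 0.8, "repeat_founder": 0.0}, 1),
    ({"headcount_growth": 0.1, "repeat_founder": 0.0}, 0),
    ({"headcount_growth": 0.2, "repeat_founder": 1.0}, 0),
]

# Replaying the tiny history many times only serves to let the toy example settle;
# in practice each new outcome would trigger an update as it is observed.
for _ in range(200):
    for features, outcome in history:
        scorer.update(features, outcome)

strong = scorer.score({"headcount_growth": 0.9, "repeat_founder": 0.0})
weak = scorer.score({"headcount_growth": 0.1, "repeat_founder": 0.0})
print(f"strong signal: {strong:.2f}, weak signal: {weak:.2f}")
```

The point of the sketch is the feedback loop: the weights end up depending on which outcomes a fund has observed, so two funds buying the same raw data would still arrive at different models.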
DD is hygiene but PVC impacts founder NPS
Sourcing and screening have the greatest impact on performance, whereas portfolio value creation has the strongest impact on founder NPS and thus a VC’s long-term ability to win the most competitive deals.
DD will continue to be a hygiene factor where we can win efficiency but little effectiveness (assuming work has been done in detail in the past too, just with more friction).
Data-driven approaches in the PVC stages will serve mostly as an amplifier (think of an operating partner on steroids) to provide useful insights in the right context at the right point in time. Moreover, they allow us to better activate our networks and eventually increase founder NPS which in turn will increase the chances to win more competitive deals in the future with strong #references.
Thank you for reading. If you liked it, share it with your friends, colleagues and everyone interested in data-driven innovation. Subscribe below and follow me on LinkedIn or Twitter to never miss data-driven VC updates again.
If you have any suggestions, want me to feature an article, research, your tech stack or list a job, hit me up! I would love to include it in my next edition😎
Thanks for the shout out!
A more qualitative way to think about picking an investment partner - how excited are they able to get about your space in a limited amount of time? In their due diligence process, do they come back with insights on your business that you didn't even realize (usually from talking to customers and doing channel checks)? Or are they always on their back foot, needing you to explain everything about the market to them.
I've found the best investment partners (even those who don't invest but do the work to understand your space) will send you their notes on their customer / ecosystem commentary. You can tell a lot about them from their notes. (this part, of course, can't be outsourced to AI... they have to do the hard work of getting on the phone and having a human convo!)
Regarding the value add "Creating networks and partnerships": Do you make your Affinity CRM directly accessible/searchable to founders?
I really like the idea of Alexis Ohanian of 776 ventures to have a searchable app/database. Especially since your main CRM might be too sensitive to share.
I looked at Bridge before, but it looks like it has not much traction in Europe. At least in my network 😄