

3 Data Sources Every Investor Should Track
DDVC #31: Where venture capital and data intersect. Every week.
👋 Hi, I’m Andre and welcome to my weekly newsletter, Data-driven VC. Every Thursday I cover hands-on insights into data-driven innovation in venture capital and connect the dots between the latest research, reviews of novel tools and datasets, deep dives into various VC tech stacks, interviews with experts and the implications for all stakeholders. Follow along to understand how data-driven approaches change the game, why it matters, and what it means for you.
Current subscribers: 7,585+, +200 since last week
There are hundreds, if not thousands, of data sources out there on the web, and most of them are useless for investors.
Investors in private companies seek clear signals of early company formation (identification) or of inflection points along a company’s journey (enrichment). Company traces are spread across the web and almost every website provides at least some relevant records, but the digital footprints of companies tend to follow an 80/20 Pareto distribution: 80% of traces are recorded by 20% of the sources, whereas the remaining 20% of traces are scattered across the other 80% of sources.
Finding the few data sources with the highest signal-to-noise ratio is the primary challenge for investors who want to become more data-driven.
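To make the 80/20 idea concrete, here is a minimal sketch that ranks sources by the number of company traces they record and reports the smallest share of sources needed to reach a target coverage. The source names and counts are invented for illustration only; they are not measurements.

```python
def share_of_sources_for_coverage(trace_counts, target=0.8):
    """Return the fraction of sources needed to cover `target` of all traces."""
    counts = sorted(trace_counts.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        raise ValueError("no traces recorded")
    covered = 0
    for used, count in enumerate(counts, start=1):
        covered += count
        if covered >= target * total:
            return used / len(counts)
    return 1.0


# Hypothetical trace counts per source (illustrative numbers only).
trace_counts = {
    "source_a": 520, "source_b": 300, "source_c": 50, "source_d": 40,
    "source_e": 30, "source_f": 25, "source_g": 15, "source_h": 10,
    "source_i": 6, "source_j": 4,
}

share = share_of_sources_for_coverage(trace_counts)
print(f"{share:.0%} of sources cover at least 80% of traces")
```

Running the same calculation on your own source inventory is a quick way to check which sources belong to the high-signal group.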
I’ve been professionally chasing digital startup footprints for more than 5 years now, and in that time, I’ve evaluated hundreds of data sources. Most of them fall into the 80% of sources that cover 20% of the traces. The noise.
Not the following three. They are different.
LinkedIn, Public Registers and Website Traffic: they are the truffles we’ve been looking for.
The 20% that cover 80% of the traces.
The high signal-to-noise sources.
It sounds obvious, but if you track them the right way, you will see your coverage and alertness jump to the next level.
Why LinkedIn, Public Registers and Website Traffic are so useful for investors
“What do all companies have in common?” This simple question helps us to find the common denominator, the few data sources that cover the majority of digital footprints.
The answer? Well, every company has human founders (at least pre ChatGPT x Plugins :-P), gets registered at some point and, if things go well, will attract attention.
“What” leads to “where”, and here we find LinkedIn, Public Registers and Website Traffic providing all relevant data at scale. So let’s dive into each source in detail and see how it can be leveraged across sourcing (= identification: see every company as early as possible) and screening (= enrichment: track companies over time to recognize an inflection point and be in front of the founders at the right point in time).
How to unlock the power of LinkedIn
Two relevant entity types exist within LinkedIn: People and companies. Investors eventually seek to track both, yet the identification order doesn’t matter too much. We might spot a founder first and follow her traces to find the company or vice versa.
Identification:
People: Title or job changes to “Founder of XYZ”, “Stealth”, “Starting something new”, etc. There are two options: 1) track changes for a pre-defined list of “watchlist people”, or 2) track catch-all companies like “Stealth Startup” (currently 11.5k employees ;)) and compute the diff of their employees on a regular basis to spot newly minted founders.
Companies: Repeat a search with fixed criteria (like geo or industry focus) and compute the diff of the result samples to identify newly formed companies.
Enrichment:
People: Most obviously, the number of followers as a proxy for attention. If you’re well connected yourself, you can also check for mutual connections and see if investor interest increases (this works better for Twitter btw). Screen founder content for relevant events (new hires, milestones, partnerships, etc.), sentiment, and the quality and quantity of interactions.
Companies: Followers, content, headcount and number of job postings split across departments
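The “compute the diff” step from the identification part can be sketched as follows. This assumes you already hold two snapshots of profile titles, obtained by whatever means you have access to (for example a data provider or a manual export); no LinkedIn API is implied. The profile IDs, titles and function names are hypothetical.

```python
import re

# Title patterns taken from the examples above; extend as needed.
FOUNDER_PATTERN = re.compile(
    r"\b(co-?founder|founder|stealth|starting something new)\b",
    re.IGNORECASE,
)


def new_entries(previous, current):
    """Return IDs present in the current snapshot but not in the previous one."""
    return set(current) - set(previous)


def new_founder_signals(previous, current):
    """Return {profile_id: title} for titles that newly match a founder pattern."""
    signals = {}
    for profile_id, title in current.items():
        was_match = bool(FOUNDER_PATTERN.search(previous.get(profile_id, "")))
        is_match = bool(FOUNDER_PATTERN.search(title))
        if is_match and not was_match:
            signals[profile_id] = title
    return signals


# Hypothetical weekly snapshots: profile ID -> current title.
last_week = {
    "p1": "Senior Engineer at BigCo",
    "p2": "Product Manager at ScaleUp",
    "p3": "Founder of XYZ",
}
this_week = {
    "p1": "Founder of Stealth Startup",
    "p2": "Product Manager at ScaleUp",
    "p3": "Founder of XYZ",
    "p4": "Starting something new",
}

signals = new_founder_signals(last_week, this_week)
print(signals)  # p1 changed title, p4 is new; p3 was already a founder
```

The same `new_entries` set difference works for the company case: run the fixed search on a schedule, store the resulting company IDs, and diff consecutive samples.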
LinkedIn is a true goldmine for investors and several businesses have evolved around this value proposition. I will provide a comprehensive list of tools in the upcoming “Data-driven VC Landscape 2023”, make sure to sign up and be the first to receive it!
How to unlock the power of Public Registers
While public registers are similar to LinkedIn in the sense that the same entities of “people” and “companies” exist, the major difference is that public registers are country specific and managed in national silos.
Their non-standardized, oftentimes very old-school way of providing information about new company formations and changes in shareholder structures makes it more challenging to collect and unify the data.
Moreover, public registers cover literally every company, from the kebab shop to the bakery to the hairdresser. The noise is significantly higher than on LinkedIn, yet it is worth cutting through in order to achieve comprehensive coverage and stay on top of funding rounds. The most important aspects:
Identification:
People: Who the founders, managers and key employees with shares are, but also who the investors are, which relationships exist, etc.
Companies: New registrations and details about business focus, headquarters, etc.
Enrichment:
People: Changes in the management
Companies: Changes in business focus, headquarters, shareholder structure, new funding rounds, etc.
Resulting information can be leveraged to spot new company formation or track new funding rounds. Providers like Startup Detector have automated the tedious data collection and cleaning for the German Handelsregister. Similar services exist in other countries too.
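As a minimal sketch of how a funding round might be spotted from register data, the snippet below compares two shareholder snapshots of one company. It assumes the register filings have already been parsed into a simple shareholder-to-share-count mapping, which is the hard part in practice; the record format, names and the heuristic itself are illustrative assumptions.

```python
def shareholder_changes(before, after):
    """Compare two {shareholder: share_count} snapshots of one company."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    total_before = sum(before.values())
    total_after = sum(after.values())
    return {
        "new_shareholders": added,
        "exited_shareholders": removed,
        "new_shares_issued": total_after - total_before,
        # Heuristic: new shareholders plus newly issued shares
        # looks like a capital increase, i.e. a possible funding round.
        "possible_funding_round": bool(added) and total_after > total_before,
    }


# Hypothetical snapshots of one company's shareholder list.
before = {"Founder A": 12500, "Founder B": 12500}
after = {"Founder A": 12500, "Founder B": 12500, "Seed Fund I": 5000}

changes = shareholder_changes(before, after)
print(changes)
```

A pure share transfer between existing holders would not trigger the flag, which keeps secondary transactions out of the funding-round alerts.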
Beyond startup-focused insights, the information can also serve as the foundation for a bottom-up knowledge graph to run social network analyses as described in my post “What social graphs of founders and VCs tell us”. From this graph, we can identify well-connected upstream investors (= investing earlier than us, e.g. angel investors) with high centrality measures and a track record of spotting promising companies early on.
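To illustrate the centrality idea, here is a small sketch that builds a co-investment graph from (investor, company) pairs and computes degree centrality in plain Python. Degree centrality is chosen here only because it is the simplest measure; the referenced post may rely on other measures. All investor and company names are made up.

```python
from collections import defaultdict
from itertools import combinations


def co_investor_centrality(investments):
    """Degree centrality of investors in the co-investment graph.

    `investments` is an iterable of (investor, company) pairs. Two investors
    are linked if they share at least one portfolio company.
    """
    by_company = defaultdict(set)
    for investor, company in investments:
        by_company[company].add(investor)

    neighbors = {}
    for investors in by_company.values():
        for investor in investors:
            # Register every investor, even those without co-investors.
            neighbors.setdefault(investor, set())
        for a, b in combinations(sorted(investors), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    n = len(neighbors)
    if n < 2:
        return {investor: 0.0 for investor in neighbors}
    # Normalize by the maximum possible number of neighbors.
    return {inv: len(peers) / (n - 1) for inv, peers in neighbors.items()}


# Hypothetical shareholder records extracted from a public register.
investments = [
    ("Angel X", "Co1"), ("Angel Y", "Co1"),
    ("Angel X", "Co2"), ("Angel Z", "Co2"),
    ("Angel X", "Co3"), ("Angel W", "Co3"),
    ("Angel V", "Co4"),
]

centrality = co_investor_centrality(investments)
ranking = sorted(centrality, key=centrality.get, reverse=True)
print(ranking[0], centrality[ranking[0]])
```

In this toy data, the investor who co-invests with the most distinct peers ranks first. On real register data you would additionally filter by investment timing to isolate investors who come in earlier than you.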
Tiger Global gained prominence in Europe by adhering to this approach: keeping a close eye on the latest investments of top VCs such as Accel, Index and a few others, and quickly doubling the previous round size and valuation to invest in the same companies.
How to unlock the power of Website Traffic
Website traffic reflects attention. While absolute levels depend a lot on industry, business model and geography, relative changes provide a good indication of potential inflection points.
For example, a German quantum computing company without commercial activities might attract less traffic than a US-based BNPL FinTech. Still, relative comparisons to industry and geo peer groups as well as time-series analyses for an individual company provide useful insights. Therefore, Website Traffic is mostly relevant for enrichment and screening.
Plenty of tools exist to track website traffic, e.g. SimilarWeb or SemRush. It’s most useful to create a sample of watchlist companies and track them regularly to identify relative movements.
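Tracking relative movements for a watchlist can be sketched as below. The monthly visit counts are invented and would in practice come from whichever traffic tool you use; no specific vendor API is assumed, and the 50% threshold is an arbitrary illustration.

```python
def growth_rates(visits):
    """Period-over-period relative change; None where the base period is zero."""
    rates = []
    for prev, curr in zip(visits, visits[1:]):
        rates.append((curr - prev) / prev if prev else None)
    return rates


def flag_movers(watchlist, threshold=0.5):
    """Return companies whose latest relative growth meets the threshold."""
    movers = {}
    for company, visits in watchlist.items():
        rates = growth_rates(visits)
        if rates and rates[-1] is not None and rates[-1] >= threshold:
            movers[company] = rates[-1]
    return movers


# Hypothetical monthly visits per watchlist company, oldest first.
watchlist = {
    "quantum-co.example": [1000, 1100, 1200, 2400],
    "bnpl-co.example": [500000, 520000, 510000, 530000],
}

movers = flag_movers(watchlist)
print(movers)
```

Note that the small quantum company is flagged despite its low absolute traffic, while the much larger FinTech is not, which is exactly why relative changes matter more than absolute levels.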
Moreover, it’s helpful to use other company characteristics such as number of employees, founding date, funding etc. to create peer groups and see what good, average and bad attention looks like.
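One simple way to build such peer groups is to bucket companies by industry and headcount and compare each company to its group median. The bucket boundaries, field names and figures below are assumptions for illustration, not a recommended segmentation.

```python
from collections import defaultdict
from statistics import median


def headcount_bucket(employees):
    """Assign a coarse headcount bucket (boundaries are arbitrary)."""
    if employees <= 10:
        return "1-10"
    if employees <= 50:
        return "11-50"
    return "51+"


def peer_benchmarks(companies):
    """Median monthly visits per (industry, headcount bucket) peer group."""
    groups = defaultdict(list)
    for company in companies:
        key = (company["industry"], headcount_bucket(company["employees"]))
        groups[key].append(company["monthly_visits"])
    return {key: median(values) for key, values in groups.items()}


def relative_attention(company, benchmarks):
    """Ratio of a company's traffic to its peer group median."""
    key = (company["industry"], headcount_bucket(company["employees"]))
    return company["monthly_visits"] / benchmarks[key]


# Hypothetical company records.
companies = [
    {"name": "A", "industry": "fintech", "employees": 8, "monthly_visits": 20000},
    {"name": "B", "industry": "fintech", "employees": 5, "monthly_visits": 10000},
    {"name": "C", "industry": "fintech", "employees": 9, "monthly_visits": 40000},
    {"name": "D", "industry": "quantum", "employees": 7, "monthly_visits": 800},
    {"name": "E", "industry": "quantum", "employees": 10, "monthly_visits": 1600},
    {"name": "F", "industry": "quantum", "employees": 6, "monthly_visits": 400},
]

benchmarks = peer_benchmarks(companies)
ratios = {c["name"]: relative_attention(c, benchmarks) for c in companies}
print(ratios)
```

A ratio above 1 means above-median attention within the peer group. Founding date and funding stage can be added to the grouping key in the same way.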
Conclusion
If you are just starting to become more data-driven, less is more. Focusing on the above three sources will provide a significant uplift in coverage and allow you to focus on the right opportunities at the right point in time.
LinkedIn is a universal weapon for VCs and, for me personally, the single most important data source. It’s powerful for sourcing (= identification) and screening (= enrichment) alike.
Public registers like Handelsregister, Companies House and others are similar, but come with more noise and more complexity in terms of data collection and unification. They are most useful for sourcing and social network analysis. Website Traffic, in contrast, is equally powerful but mostly for enrichment and tracking.
If you want to learn more on how to track these data sources at scale, you might want to check out my article “How to scrape alternative data sources”.
Stay driven,
Andre
Thank you for reading. If you liked it, share it with your friends, colleagues and everyone interested in data-driven innovation. Subscribe below and follow me on LinkedIn or Twitter to never miss data-driven VC updates again.
If you have any suggestions, want me to feature an article, research, your tech stack or list a job, hit me up! I would love to include it in my next edition😎
Do you also view Twitter as a viable source for new company / founder identification?
Lots of founders active there to share + connect with other founders
Andre, have you tried to use the European business portal (https://e-justice.europa.eu/searchBris.do) as a single source for public European company data? It looks like all member states publish public company data there. Would love to hear your thoughts on that..