Data-driven VC #25: A blueprint to map the entrepreneurial footprint of organizations

🔥Inside WHU's data-driven journey

Mar 02, 2023

👋 Hi, I’m Andre and welcome to my weekly newsletter, Data-driven VC. Every Thursday I cover hands-on insights into data-driven innovation in venture capital and connect the dots between the latest research, reviews of novel tools and datasets, deep dives into various VC tech stacks, interviews with experts and the implications for all stakeholders. Follow along to understand how data-driven approaches change the game, why it matters, and what it means for you.

Current subscribers: 6,360+, +250 since last week

Following your great feedback on last week’s guest post covering Hustle Fund’s data-driven journey, I’m happy to take a completely different angle and have Dries Faems contribute today’s episode. Dries is a Professor for Entrepreneurship, Innovation and Technological Transformation at the WHU Otto Beisheim School of Management, one of the leading entrepreneurial universities in Europe, which is lucky to count the founders of Zalando, Rocket Internet, Forto, Flixbus, HelloFresh and many more unicorns among its alumni.

I’m particularly excited about this episode as it perfectly exemplifies how data-driven approaches can be leveraged outside of VC, for example in academic research, M&A or corporate innovation scouting. Thank you, Dries, for sharing your innovative work with us and providing a blueprint in your guest post below 🙏🏻

At the Chair of Entrepreneurship, Innovation and Technological Transformation of WHU, we have started building the WHU Founder Database, a data infrastructure that allows us to address exactly these kinds of research questions. In this guest contribution, I want to provide a blueprint that will allow any data enthusiast to build a similar data infrastructure for their own organization. I will describe the following steps:

(i) Step 1: Identifying founders
(ii) Step 2: Collecting company data
(iii) Step 3: Collecting investor data
(iv) Step 4: Merging founder, company and investor data
(v) Step 5: Developing use cases for your data infrastructure

Step 1: Identifying founders

A valuable data source for collecting founder data is LinkedIn. A search in LinkedIn Sales Navigator or LinkedIn Recruiter for the terms ‘Founder’ and ‘Co-Founder’ in the category ‘Job Title’, combined with your organization in the category ‘Company’ or ‘School’, will give you a good overview of all the founders in your ecosystem.

Some people, however, are quite proactive in claiming a founder role. As an organization, for instance, you might not be interested in people who founded the local synchronized swimming club in their village (yes, this is a real example…). Another issue is that corporate employees might claim ‘founder’ roles for specific activities within their company (e.g., ‘I am the founder of the feminist book club at Google’). This requires careful cleaning to make sure that only relevant founders are identified.
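
A minimal sketch of such a cleaning step could look as follows. It assumes you already hold a list of founder profiles that you compiled in a permitted way; the field names, the keyword list and the sample profiles are made up for illustration and would need tuning for your own ecosystem. A keyword filter like this only pre-sorts the list, so borderline cases still deserve a manual look.

```python
import re

# Hypothetical exclusion keywords; extend or replace them for your own ecosystem.
EXCLUDE_KEYWORDS = ("club", "society", "choir")

# Matches "Founder", "Co-Founder" and "Cofounder" as whole words, in any casing.
FOUNDER_PATTERN = re.compile(r"\b(?:co-?)?founder\b", re.IGNORECASE)


def is_relevant_founder(job_title: str, organization: str) -> bool:
    """True if the title claims a founder role and no exclusion keyword
    appears in either the job title or the organization name."""
    if not FOUNDER_PATTERN.search(job_title):
        return False
    text = f"{job_title} {organization}".lower()
    return not any(keyword in text for keyword in EXCLUDE_KEYWORDS)


# Made-up sample profiles, loosely based on the examples in the text.
profiles = [
    {"name": "A. Example", "title": "Co-Founder & CEO",
     "organization": "Example Logistics GmbH"},
    {"name": "B. Example", "title": "Founder",
     "organization": "Village Synchronized Swimming Club"},
    {"name": "C. Example", "title": "Founder of the Feminist Book Club",
     "organization": "Google"},
    {"name": "D. Example", "title": "Senior Analyst",
     "organization": "Example Bank"},
]

relevant = [p["name"] for p in profiles
            if is_relevant_founder(p["title"], p["organization"])]
print(relevant)
```

Note that a plain substring check is deliberately crude: it would also drop a legitimate startup with ‘club’ in its name, which is one more reason to review the excluded profiles by hand.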

Although LinkedIn is a valuable tool for identifying founders, it must not be used for unauthorized scraping of founder profiles. LinkedIn defines unauthorized scraping as ‘the use of code and automated collection methods to make (up to) thousands of queries per second and evade technical blocks in order to take data without permission.’ Andre has provided more info on the do’s and don’ts of web scraping in this newsletter post.

Step 2: Collecting company data

When you have identified the founders in your ecosystem, the next step is to retrieve more information on the founded companies. Today, quite a few data providers offer access to legal information on formally founded companies. NorthData is one potential data provider that you can consider.

NorthData: This data provider gives access to structured company information, allowing you to collect information such as when the company was founded, who its legal owners are, and when legal changes in the company occurred. The good news is that NorthData has an API service that allows you to automate the search for all the companies that you have identified. The API also provides the opportunity for name matching.

Another option is to go directly to the ultimate data source yourself. In Germany, for instance, you can go to the website of the Unternehmensregister. Here you can collect information on the companies for free. However, this website lives up to the reputation of Germany as a country where the fax machine is still a relevant communication tool😉

I have also seen the first experiments in leveraging ChatGPT to collect company information. My experience, however, is that ChatGPT can very convincingly ‘create’ nonsense information about companies. It is important to realize that Large Language Models predict information instead of retrieving it and that, at their current stage, predictions on company information are quite inaccurate.

Step 3: Collecting Investor Data

Data providers such as NorthData are valuable sources for legal company information but are less relevant for getting investor data. If you want to find out which companies in your entrepreneurial ecosystem have raised funding and from whom they have raised it, you need to consult other types of data providers. Here we enter the world of data providers such as Crunchbase, Pitchbook, Dealroom and many others. Luckily, spiritual leader and data-driven VC guru Andre has used his precious time as a PhD student to produce a great benchmark study, comparing the offerings of different providers.

In our case, we rely on Crunchbase to get investment data from companies in our ecosystem. With a Crunchbase Pro account, you can also upload lists of companies into Crunchbase and it will automatically search for the relevant company information in the database. Subsequently, you can export the collected information by downloading csv files.

Step 4: Merging Founder, Company and Investor Data

After having collected the necessary data from different data sources, a crucial activity will be to accurately merge the data into one single database. At this stage, company name matching becomes an important issue. Matching company names is not a trivial activity, especially in a country where the ‘Umlaut’ is causing headaches for every database administrator… Luckily, a lot of tools and packages exist today to help you with company name matching. The Dutch Central Bank, for instance, has a great Medium post, providing very useful Python code to address this challenge.
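
To illustrate the idea, here is a minimal sketch of company name matching that uses only the Python standard library. It is not the approach from the Medium post mentioned above, but a simpler stand-in: names are normalized (umlauts transliterated, legal forms stripped) and then compared with `difflib.SequenceMatcher`. The list of legal forms, the threshold of 0.85 and the sample names are assumptions chosen for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# German umlauts and sharp s mapped to their usual ASCII transliteration.
UMLAUT_MAP = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

# Illustrative, incomplete set of legal-form tokens to ignore when matching.
LEGAL_FORMS = {"gmbh", "ag", "ug", "kg", "mbh", "haftungsbeschraenkt"}


def normalize_name(name: str) -> str:
    """Lowercase, transliterate umlauts, strip accents, punctuation and legal forms."""
    text = name.lower().translate(UMLAUT_MAP)
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z0-9]+", text)
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def similarity(a: str, b: str) -> float:
    """Similarity between 0.0 and 1.0 of the two normalized names."""
    norm_a, norm_b = normalize_name(a), normalize_name(b)
    if not norm_a or not norm_b:
        return 0.0  # nothing left to compare after normalization
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def best_match(name: str, candidates: list[str], threshold: float = 0.85):
    """Return the most similar candidate, or None if none reaches the threshold."""
    if not candidates:
        return None
    score, candidate = max(
        ((similarity(name, c), c) for c in candidates),
        key=lambda pair: pair[0],
    )
    return candidate if score >= threshold else None


# Made-up names: the first candidate is the same company spelled without umlauts.
print(best_match("Müller Söhne GmbH", ["Mueller Soehne", "Meyer Logistik AG"]))
```

Names that fall below the threshold return `None` and are best routed to manual review rather than forced into a match.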

At this stage, it is important to engage in quality checks to make sure that data have been accurately matched. When you have collected data from different databases, you will have some redundancies in information. For instance, both Crunchbase and NorthData provide information on the headquarters location of the company. Moreover, people who are listed as founders in Crunchbase will typically be mentioned as managers in NorthData. You can leverage these redundancies to automate quality checks. For instance, if the name of the founder, which you identified in LinkedIn, corresponds with the name of one of the founders in Crunchbase and/or with one of the managers in NorthData, you can be reasonably confident that you have established an accurate link.
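
Such a redundancy check could be sketched as follows. The function and label names are hypothetical, and the inputs are assumed to be plain lists of person names that you have already extracted per company from each source; the names in the example are made up.

```python
def normalize_person(name: str) -> str:
    """Lowercase the name, drop dots and collapse whitespace for a tolerant comparison."""
    return " ".join(name.lower().replace(".", " ").split())


def link_confidence(linkedin_founder: str,
                    crunchbase_founders: list[str],
                    northdata_managers: list[str]) -> str:
    """Classify a founder-company link by how many other sources confirm the name."""
    target = normalize_person(linkedin_founder)
    in_crunchbase = target in {normalize_person(n) for n in crunchbase_founders}
    in_northdata = target in {normalize_person(n) for n in northdata_managers}
    if in_crunchbase and in_northdata:
        return "confirmed_by_both"
    if in_crunchbase or in_northdata:
        return "confirmed_by_one"
    return "needs_manual_review"


# Made-up example: the founder appears in both sources with different casing.
print(link_confidence("Jane Example",
                      ["jane example", "John Sample"],
                      ["Jane  Example"]))
```

Links that end up in the last bucket are not necessarily wrong; they are simply the ones where an automated check cannot help and a human should take a look.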


Step 5: Developing Use Cases

When the data infrastructure is established, the fun part can begin! Here are some examples of use cases that can be developed:

Use case #1: Creating dashboards on the footprint of the entrepreneurial ecosystem
We leverage the WHU Founder Database to create dashboards, highlighting the footprint of the WHU ecosystem on the German VC landscape. For an example, check out this LinkedIn post. To do so, we use PowerBI as a convenient tool to generate a wide variety of dashboards. Here is a link to a YouTube tutorial that nicely explains how to start building dashboards in PowerBI.

Use case #2: Identify investment news in the entrepreneurial ecosystem
Every month, we do an update of the investment news in Crunchbase to check for new investments in the ecosystem. Subsequently, the WHU Entrepreneurship Center will communicate and celebrate these investments via Investment update reports and WHU Founder reports.
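
One simple way to support such a monthly update is to compare the latest export with the previous one and keep only the rounds that were not there before. The sketch below assumes each export has been loaded into a list of dictionaries; the keys `company`, `round` and `announced_on` are hypothetical and will differ from the column names in a real export, and the sample rows are made up.

```python
def round_key(row: dict) -> tuple:
    """Identify a funding round by company, round type and announcement date."""
    return (row["company"], row["round"], row["announced_on"])


def new_rounds(previous: list[dict], current: list[dict]) -> list[dict]:
    """Return the rounds in the current export that were absent from the previous one."""
    seen = {round_key(row) for row in previous}
    return [row for row in current if round_key(row) not in seen]


# Made-up exports from two consecutive months.
last_month = [
    {"company": "ExampleCo", "round": "Seed", "announced_on": "2023-01-10"},
]
this_month = [
    {"company": "ExampleCo", "round": "Seed", "announced_on": "2023-01-10"},
    {"company": "ExampleCo", "round": "Series A", "announced_on": "2023-02-20"},
    {"company": "SampleTech", "round": "Seed", "announced_on": "2023-02-05"},
]

for row in new_rounds(last_month, this_month):
    print(row["company"], row["round"], row["announced_on"])
```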

Use case #3: Social network analyses in the entrepreneurial ecosystem
Databases like NorthData and Orbis provide the opportunity to collect information on the shareholders of the companies. In this way, it becomes possible to generate social networks where founders, companies and their shareholders are connected to each other. Andre has written a great newsletter post on this topic, so no need to repeat the basics here. Such network analyses allow us to generate interesting research insights into the complex relationships between different actors. For instance, we can see in our network that teachers living in Ontario (and contributing to the Ontario Teachers’ Pension Plan) will benefit significantly in terms of their pension when WHU startups are successful 😊
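
As a rough illustration of how such a network can be represented, the sketch below builds an undirected graph from founder-company and company-shareholder links and lists every actor within a given number of hops of a founder. It uses only the standard library; the node labels and the edge list are made up, and a dedicated graph library would be the more natural choice for a real analysis.

```python
from collections import defaultdict, deque


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def within_hops(graph: dict[str, set[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search: map every node within max_hops of start to its distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    distances.pop(start)
    return distances


# Made-up links: two founders, one company, a fund, and an investor behind the fund.
edges = [
    ("founder:Alice", "company:ExampleCo"),
    ("founder:Bob", "company:ExampleCo"),
    ("company:ExampleCo", "shareholder:Example Fund"),
    ("shareholder:Example Fund", "shareholder:Example Pension Plan"),
]
graph = build_graph(edges)

for node, hops in sorted(within_hops(graph, "founder:Alice", 3).items()):
    print(hops, node)
```

In this toy network the pension plan sits three hops away from the founder, which mirrors the kind of indirect relationship described above.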

We restrict the use cases of the WHU Founder Database to research-related activities. It is important to realize that different use cases will require different data management approaches to be GDPR compliant. Here are some interesting sources that can give you a better understanding of how to deal with GDPR legislation when collecting and processing personal data from external data providers:

Conclusion

With this guest contribution, I have tried to provide an overview of how organizations can establish a data infrastructure to get a better understanding of their entrepreneurial footprint. I also see some promising signs that multiple actors are active in developing this kind of data intelligence. In Germany, for instance, not only WHU but also other universities such as RWTH Aachen and TUM have started developing this kind of database to better understand the impact of their entrepreneurial ecosystem. I hope that this contribution can stimulate other actors to also start building this kind of data intelligence to strengthen our knowledge of different entrepreneurial ecosystems and their broader societal impact.

This is it for today. I hope you enjoyed this more academic guest post and would love to hear your thoughts. Join me next week for our first “Insights from the data” post revealing some useful findings for VCs, founders, researchers and everyone interested in startup innovation.

Stay driven,
Andre

Thank you for reading. If you liked it, share it with your friends, colleagues and everyone interested in data-driven innovation. Subscribe below and follow me on LinkedIn or Twitter to never miss data-driven VC updates again.


If you have any suggestions, want me to feature an article, research, your tech stack or list a job, hit me up! I would love to include it in my next edition😎