

How to Extract Names From Startup Landscapes & Market Maps At Scale
DDVC #58: Where venture capital and data intersect. Every week.
👋 Hi, I’m Andre and welcome to my weekly newsletter, Data-driven VC. Every Thursday I cover hands-on insights into data-driven innovation in venture capital and connect the dots between the latest research, reviews of novel tools and datasets, deep dives into various VC tech stacks, interviews with experts, and the implications for all stakeholders. Follow along to understand how data-driven approaches change the game, why it matters, and what it means for you.
Current subscribers: 14,285, +150 since last week
Brought to you by VESTBERRY - The Portfolio Intelligence Platform for Data-driven VCs
Watch a short video showcasing our new Traffic Lights System dashboard. It's designed for quick and intuitive evaluation of your portfolio performance based on the KPIs you want to monitor. At a glance, you can spot which of your portfolio companies are overperforming or underperforming against their budgets, helping you stay on top of your investments.
Following two more qualitative deep-dive pieces on the cognitive and data biases involved in the investment selection process, today’s episode covers one of the most frequently asked operational questions: how to automatically extract names from startup landscapes and market maps.
Wondering why you would even want to do this? Read on, I’ll explore two exciting use cases at the end of this article. A year ago, this episode would’ve been several pages long, but the widespread adoption and powerful capabilities of LLMs such as ChatGPT allow me to keep it short. So let’s dive right into this simple step-by-step guide.
1) Sign up for ChatGPT+ for $20/month (includes priority access to new features such as the DALL·E 3 integration)
2) Copy & paste an Image of the Market Map or Startup Landscape Into ChatGPT+
3) Add the Prompt “Extract the startup names from the image above and list them in bullets based on the different categories in the landscape”
I tested and double-checked the results for 20+ landscapes, and the extraction is almost entirely correct. There were a few minor errors, but tbh in some of those cases I couldn’t even recognize the companies by their logos myself 🙄
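If you want to reuse the bulleted answer outside the chat window, a small parser can turn it into (category, name) rows. The sketch below is a minimal illustration, not part of the workflow above. It assumes each category appears on its own line followed by bullets starting with "-", "*" or "•"; the sample text and company names are made up, and real answers may be laid out differently.

```python
import re

# Hypothetical sample of a pasted answer; the real layout may differ.
SAMPLE = """\
Generative AI - Text
- Acme Writer
- ExampleCopy

**Generative AI - Video:**
* SampleClips
"""

# A bullet line: optional indent, one bullet character, whitespace, then the name.
BULLET = re.compile(r"^\s*[-*•]\s+(.*\S)\s*$")


def parse_bullets(text):
    """Return (category, name) pairs from a categorized bullet list."""
    rows = []
    category = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between categories
        match = BULLET.match(line)
        if match:
            rows.append((category, match.group(1)))
        else:
            # Assumption: any other non-empty line is a category header.
            # Strip markdown emphasis, heading marks and a trailing colon.
            category = line.strip().strip("*#").strip().rstrip(":")
    return rows


rows = parse_bullets(SAMPLE)
for category, name in rows:
    print(f"{category}\t{name}")
```

The tab-separated output can be pasted straight into a spreadsheet, with the category in the first column and the company name in the second.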
4) Add the Prompt “Find website for above startups”
Unlike the name extraction in step 3, the identification of the respective websites seems to be less reliable. In my tests, around 10% of the websites were either not found or wrongly identified. Still, adding URLs to the names significantly improves the accuracy of the entity matching required in the use cases below. If you’re interested in diving deeper into entity matching and deduplication, you might want to check out my earlier post “How to Create A Single Source of Truth for Startups”.
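URLs only help with entity matching if the same company always produces the same key, whether the URL arrives as "https://www.example.com/about" or just "Example.com". Here is a minimal sketch of one way to do that by reducing each URL to its bare host. The function name `domain_key` is hypothetical, and the approach is an assumption on my side: it does not handle redirects, country-specific domains, or companies that share a host.

```python
from urllib.parse import urlsplit


def domain_key(url):
    """Reduce a URL to a bare lowercase host that can serve as a match key."""
    url = url.strip()
    if not url:
        return ""
    if "://" not in url:
        # Without a scheme, urlsplit would read the host as a path;
        # a leading "//" makes it parse the first part as the host.
        url = "//" + url
    host = urlsplit(url).hostname or ""  # hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    return host


for raw in ["https://www.Example.com/about", "example.com", "http://example.com:8080/"]:
    print(raw, "->", domain_key(raw))
```

All three sample inputs collapse to the same key, so they would be treated as one company.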
5) Why all of this? Some simple use cases
Spotting new companies: Start by exporting all companies, including their URLs, from your CRM system into an XLS file and mark them in red. Next, copy & paste the ChatGPT+ list of extracted company names and URLs into a second XLS file and mark these in green. Merge both files by pasting the green companies and URLs below the red rows in the same two columns. Then click the “Data” tab in Excel and select “Remove Duplicates”. Because Excel keeps the first occurrence of each entry, the green rows that remain are companies that were not previously in your CRM system. Lastly, copy them into a new XLS file, transfer the column headers from your CRM export, and upload the file with the new companies to your CRM system. Of course, this can be done with a few lines of code, but I wanted to provide this simple no-code option too.
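For those who prefer the code route, here is a rough sketch of the same red/green comparison. It assumes both lists are already loaded as rows with "name" and "url" fields; these field names, the helper names, and the sample companies are all hypothetical. Companies are matched on the website host when a URL is present and on the lowercased name otherwise, which is a simplification of proper entity matching.

```python
from urllib.parse import urlsplit


def match_key(name, url):
    """Prefer the website host as key; fall back to the normalized name."""
    url = (url or "").strip()
    if url:
        if "://" not in url:
            url = "//" + url  # lets urlsplit parse a scheme-less URL
        host = urlsplit(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if host:
            return host
    return " ".join(name.lower().split())


def find_new(crm_rows, extracted_rows):
    """Return extracted rows whose key is not yet present in the CRM export."""
    known = {match_key(r["name"], r.get("url")) for r in crm_rows}
    new_rows = []
    for row in extracted_rows:
        key = match_key(row["name"], row.get("url"))
        if key not in known:
            known.add(key)  # also drops duplicates inside the extracted list
            new_rows.append(row)
    return new_rows


# Hypothetical sample data standing in for the CRM export and the ChatGPT+ list.
CRM_ROWS = [{"name": "Acme Writer", "url": "https://www.acmewriter.example/"}]
EXTRACTED_ROWS = [
    {"name": "Acme Writer Inc.", "url": "acmewriter.example"},
    {"name": "SampleClips", "url": "sampleclips.example"},
    {"name": "SampleClips", "url": "http://sampleclips.example/home"},
]

new_companies = find_new(CRM_ROWS, EXTRACTED_ROWS)
print([row["name"] for row in new_companies])
```

Note the limitation: a company stored in the CRM without a URL will not match an extracted row that has one, so rows without URLs deserve a manual check.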
Mapping industry groups and competitors: Copy & paste the list of extracted company names and URLs from ChatGPT+ into an XLS file. Make sure to add column headers as defined in your CRM system. Add a new column with the header “Industry” and fill it with the category as extracted from the market map (e.g., “Generative AI - Text” or “Generative AI - Video”). Lastly, upload the file into your CRM system. This creates a new field that allows you to filter by industry and find competitors in future analyses.
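The upload file for this second use case can also be generated in a few lines. The sketch below writes (category, name, URL) rows into CSV text with an “Industry” column. The headers "Name" and "Website" are placeholders that you would replace with whatever your CRM expects, and the sample rows are made up.

```python
import csv
import io


def to_crm_csv(rows, headers=("Name", "Website", "Industry")):
    """Build CSV text from (category, name, url) rows for a CRM import."""
    name_col, url_col, industry_col = headers
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(headers), lineterminator="\n")
    writer.writeheader()
    for category, name, url in rows:
        writer.writerow({name_col: name, url_col: url, industry_col: category})
    return buffer.getvalue()


# Hypothetical rows: category from the market map, then name and URL.
SAMPLE_ROWS = [
    ("Generative AI - Text", "Acme Writer", "acmewriter.example"),
    ("Generative AI - Video", "SampleClips", "sampleclips.example"),
]

csv_text = to_crm_csv(SAMPLE_ROWS)
print(csv_text)
```

Saving `csv_text` to a .csv file gives you something Excel and most CRM importers can read directly.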
…
Extracting the knowledge encoded in these market maps can be useful across a wide range of use cases. The two above are just the beginning, and hopefully this simple process saves you as much time as it saved me.
Reversing the workflow: Automatically creating a startup landscape from a list of companies
Having created several startup landscapes myself in the past, I also flipped the overall use case upside down and tried to create a market map from classified lists with the new DALL·E 3 feature of ChatGPT+. Even after trying a range of prompts, the visual result remains more abstract art than anything else, even when including examples of other landscapes and asking it to stick to the same visual structure.
Prompt: “Create a startup landscape similar to the image above with the listed categories below and include the logos of the startups within the respective category boxes”
Well, this turns out to be not as useful. Let me know in case you figure it out.
Stay driven,
Andre
PS: The above extraction workflow works well for parsing information from pitch decks too ;)
Thank you for reading. If you liked it, share it with your friends, colleagues, and everyone interested in data-driven innovation. Subscribe below and follow me on LinkedIn or Twitter to never miss data-driven VC updates again.
If you have any suggestions, want me to feature an article, research, your tech stack or list a job, hit me up! I would love to include it in my next edition😎
As always a fantastic automation guide. Thank you Andre. 🙏
Really helpful, also for accelerator participant lists etc. The thing I struggled a lot with though is how to get ChatGPT+ to process the URL prompt in a batch, works only in 1/10 attempts and in all other cases I get a few URLs and then an error message or ChatGPT+ stops generating. Did you encounter that, too? Any ideas on how to solve this?