Startup Data Analysis

I found a dataset on startups and decided to analyse and visualise it. A snapshot of the data, sorted in descending order of total funding, is shown below.

Columns not displayed include the number of funding rounds, the founding date, and the dates of the first and last funding. The dataset contains 66,368 startups, including those founded up to mid-2015.

Below is a plot of the number of startups founded by year.

Although the number of startups founded generally increases over the years, there are noticeable dips after Y2K (2000) and the 2008-2012 recession.
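The per-year count behind a plot like this can be sketched with the standard library alone. This is only an illustration, not the code used for the plot: the column name "founded_at", the ISO date format, and the sample rows are all assumptions.

```python
from collections import Counter
from datetime import date

# Hypothetical sample rows, shaped like the output of csv.DictReader.
# The "founded_at" column name and its ISO date format are assumptions.
rows = [
    {"name": "A", "founded_at": "1999-03-01"},
    {"name": "B", "founded_at": "2007-06-15"},
    {"name": "C", "founded_at": "2007-11-30"},
    {"name": "D", "founded_at": ""},  # missing founding date
    {"name": "E", "founded_at": "2014-01-20"},
]


def founded_per_year(rows):
    """Return {year: number of startups founded that year}, skipping blanks."""
    counts = Counter()
    for row in rows:
        raw = row.get("founded_at", "")
        if not raw:
            continue  # rows without a founding date cannot be placed on the plot
        counts[date.fromisoformat(raw).year] += 1
    return dict(sorted(counts.items()))


per_year = founded_per_year(rows)
print(per_year)
```

Sorting by year keeps the result in the order a bar or line plot would need.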

Below, I plot the frequency distribution of total funding. I have capped the funding values at USD 25 million, as the plot would otherwise have a very long tail caused by a few startups with exceptionally large funding.
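As a rough sketch of how such a capped frequency distribution could be computed: the USD 25 million cap comes from the text, while the USD 5 million bin width, the choice to exclude values at or above the cap, and the sample figures are assumptions made for illustration.

```python
from collections import Counter

CAP_USD = 25_000_000       # upper limit mentioned in the text
BIN_WIDTH_USD = 5_000_000  # bin width is an assumption for illustration

# Hypothetical total funding amounts in USD.
funding_totals = [
    250_000, 1_200_000, 4_999_999, 5_000_000,
    18_000_000, 24_000_000, 300_000_000,
]


def capped_histogram(values, cap=CAP_USD, width=BIN_WIDTH_USD):
    """Bin values below `cap` by `width`; return (bins, number excluded).

    Each key in `bins` is the lower edge of a bin, so key 5_000_000
    covers the half-open range [5_000_000, 10_000_000).
    """
    bins = Counter()
    excluded = 0
    for value in values:
        if value >= cap:
            excluded += 1  # long-tail startups left out of the plot
            continue
        bins[(value // width) * width] += 1
    return dict(sorted(bins.items())), excluded


bins, excluded = capped_histogram(funding_totals)
print(bins, excluded)
```

Returning the number of excluded values alongside the bins makes it easy to report how much of the tail the cap hides.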

This is the frequency distribution of the statuses of the startups in the dataset.

Considering only the startups that haven’t closed, I calculated the frequency of each category in the ‘category_list’ column, to identify the types of startups that succeed. Here is the frequency count:
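A minimal sketch of this category count follows. The ‘category_list’ column name is from the dataset as described above; the "status" column name, the status values, and the assumption that multiple categories are separated by "|" are mine, as are the sample rows.

```python
from collections import Counter

# Hypothetical rows. "status" and the "|" separator inside
# "category_list" are assumptions about the dataset's layout.
rows = [
    {"status": "operating", "category_list": "Software|Mobile"},
    {"status": "closed", "category_list": "Software"},
    {"status": "acquired", "category_list": "Biotechnology"},
    {"status": "operating", "category_list": "Mobile"},
    {"status": "operating", "category_list": ""},  # no category recorded
]


def category_counts(rows):
    """Count categories across startups whose status is not 'closed'."""
    counts = Counter()
    for row in rows:
        if row["status"] == "closed":
            continue
        for category in row["category_list"].split("|"):
            category = category.strip()
            if category:  # skip empty entries from blank cells
                counts[category] += 1
    return counts


counts = category_counts(rows)
print(counts.most_common())
```

A startup listed under several categories contributes one count to each of them, so the totals can exceed the number of startups.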

These are the most popular regions for startups to be based in. For each region, I list the number of startups based there.

I grouped the data by the region in which a startup is based and created a label column that is set to 0 if the startup has closed and 1 otherwise. For the most popular regions, I calculated the mean of this label. A higher mean indicates that a larger share of the region’s startups have not closed, which I take as a measure of average startup success in that region.

Region and average startup success metric
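The region-level metric described above (label 0 for closed startups, 1 otherwise, averaged over the most popular regions) could be computed along these lines. The column names "region" and "status", the status values, and the sample rows are assumptions; only the labelling rule is taken from the text.

```python
from collections import Counter

# Hypothetical rows; column names and status values are assumptions.
rows = [
    {"region": "SF Bay Area", "status": "operating"},
    {"region": "SF Bay Area", "status": "closed"},
    {"region": "SF Bay Area", "status": "acquired"},
    {"region": "SF Bay Area", "status": "operating"},
    {"region": "London", "status": "operating"},
    {"region": "London", "status": "closed"},
    {"region": "Pune", "status": "operating"},
]


def success_by_region(rows, top_n=2):
    """Mean label (0 if closed, 1 otherwise) for the top_n most common regions."""
    totals = Counter(row["region"] for row in rows)
    not_closed = Counter()
    for row in rows:
        label = 0 if row["status"] == "closed" else 1
        not_closed[row["region"]] += label
    # The mean of a 0/1 label is the sum of labels divided by the row count.
    return {
        region: not_closed[region] / n
        for region, n in totals.most_common(top_n)
    }


rates = success_by_region(rows)
print(rates)
```

Restricting the result to the most common regions avoids reporting a mean of 1.0 for a region with a single startup, such as the hypothetical "Pune" row here.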

Below is a list of the number of funding rounds and the number of startups with that many funding rounds.

2 comments

Rashmi says:

February 7, 2023 at 10:37 pm

Very nice article!!

Alpana Vats says:

February 7, 2023 at 11:32 pm

Good analysis on start ups. Why the startups in some region is more successful then the other regions. Is it because of govt. policy or other factors etc. This can be further investigated.
