Analysing City Population Data

I analysed worldwide city population data [1] with Python to investigate how city populations are distributed within countries. The 2015 data covers 49,980 cities, which together accounted for 32% of the world’s population at the time.

Bucketing populations into bins 1,000 wide helps categorise cities for frequency analysis. Since mega-cities create a long tail in the distribution of city populations, I have restricted the distribution plot below to populations of up to 500,000.

Counts of City Population Bins

The modal city population range is 2,000-3,000 people. Most cities have small populations: 60% of the world’s cities have populations below 15,000, and 92.6% have populations below 100,000.
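The binning step can be sketched in a few lines of standard-library Python. This is only an illustration of the idea, not the notebook's actual code: the function names `bin_counts` and `share_below` are hypothetical, and the sample populations are made up rather than taken from the dataset.

```python
from collections import Counter

def bin_counts(populations, width=1000):
    """Count cities per population bin; each key is the lower edge of a bin."""
    return Counter((p // width) * width for p in populations)

def share_below(populations, threshold):
    """Fraction of cities with a population strictly below the threshold."""
    return sum(p < threshold for p in populations) / len(populations)

# Made-up populations for illustration only (not from the dataset).
sample = [2100, 2500, 2900, 4800, 12000, 45000, 250000, 1200000]

counts = bin_counts(sample)
modal_lower, modal_count = counts.most_common(1)[0]  # most frequent bin
print(f"modal bin: {modal_lower}-{modal_lower + 1000} ({modal_count} cities)")
print(f"share below 15,000: {share_below(sample, 15_000):.3f}")
```

Keying each bin by its lower edge keeps the counts sparse, which matters when a handful of mega-cities would otherwise force millions of empty bins.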

A small number of cities contain a large share of the total city population. The 100 most populous cities contain 21% of the global city population. The top 2.5% of the most populous cities contain over 50% of the worldwide city population, and the top 20% hold 82.9% of it.
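A concentration figure of this kind can be computed by ranking cities by population and summing the top slice. The sketch below assumes the number of cities in the "top n%" is rounded up and is never less than one; the original analysis may round differently. The name `top_share` and the sample values are hypothetical.

```python
import math

def top_share(populations, fraction):
    """Share of the total population held by the most populous `fraction` of cities."""
    ranked = sorted(populations, reverse=True)
    # Assumption: round the city count up, and always include at least one city.
    # The inner round() guards against float noise such as 0.07 * 100 = 7.000000000000001.
    k = max(1, math.ceil(round(fraction * len(ranked), 9)))
    return sum(ranked[:k]) / sum(ranked)

# Made-up populations for illustration only (ten cities, 2,000,000 people in total).
sample = [1_000_000, 400_000, 200_000, 100_000,
          50_000, 50_000, 50_000, 50_000, 50_000, 50_000]

print(f"top 10% of cities: {top_share(sample, 0.10):.1%} of city population")
print(f"top 20% of cities: {top_share(sample, 0.20):.1%} of city population")
```

The same function applies unchanged to a single country's cities, which is how a per-country comparison of top-5% and top-20% shares could be produced.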

For a populous country, the percentage of the nation’s population contained in the most populous n% of its cities gives an indication of how urbanised the country is.

Considering only countries with a total city population of 10mn+, I plot below, on the x-axis, the percentage of each nation’s city population contained in the top 5% of its most populous cities. I then plot the percentage contained in the top 20% of its most populous cities.

Percentage of nation’s city population contained in the top 5% most populous cities
Percentage of nation’s city population contained in the top 20% most populous cities

Below is a plot of the percentage of the total national city population contained in the top 20% of the most populous cities in the 10 most populous countries.

Percentage of national population in the top 20% most populous cities for the 10 most populous countries

Using bins 10,000 wide for frequency analysis comparisons, I created the table below showing the country name, the most frequently occurring interval of city population, and the proportion of all the cities in that interval. For reference, the modal global population interval is 0-10k with a proportion of 0.46.

Ten most populous countries, modal population interval, proportion of total count
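A table like this can be built by grouping cities by country, binning each group in 10k intervals, and taking the most frequent bin. The sketch below uses only the standard library; `modal_interval` is a hypothetical helper, and the country labels and populations are placeholders rather than rows from the dataset.

```python
from collections import Counter, defaultdict

def modal_interval(populations, width=10_000):
    """Return ((lower, upper), proportion) for the most frequent population bin."""
    counts = Counter((p // width) * width for p in populations)
    lower, count = counts.most_common(1)[0]
    return (lower, lower + width), count / len(populations)

# Placeholder (country, city population) rows for illustration only.
rows = [("A", 3_000), ("A", 8_000), ("A", 15_000), ("A", 120_000),
        ("B", 25_000), ("B", 27_000), ("B", 5_000)]

# Group city populations by country.
by_country = defaultdict(list)
for country, population in rows:
    by_country[country].append(population)

table = {country: modal_interval(pops) for country, pops in by_country.items()}
for country, ((lower, upper), proportion) in table.items():
    print(f"{country}: {lower}-{upper}, proportion {proportion:.2f}")
```

If two bins tie for the highest count, `Counter.most_common` returns whichever was encountered first, so a real analysis may want an explicit tie-breaking rule.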

Binning in 10k intervals, I visualise below the distribution of normalised counts (the count in an interval divided by the total count) for the city populations of Australia, Canada, Germany, the UK, and the US. Given the long tail, I limit the population to 300k.

Distribution of city populations (intervals of 10k)

Some countries have a dominant city that contains the majority of the total urban population. Among the populous (>10mn) multi-city countries, Cambodia’s Phnom Penh (79%), Congo’s Brazzaville (76%), and Iraq’s Baghdad (74%) account for the highest percentages of their country’s city population.
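The dominant-city share is the largest city's population divided by the sum over all of that country's cities in the data. A minimal sketch, with a hypothetical `dominant_city` helper and invented city names and figures:

```python
def dominant_city(cities):
    """Given (name, population) pairs for one country, return the largest
    city's name and its share of the country's total city population."""
    name, population = max(cities, key=lambda city: city[1])
    total = sum(p for _, p in cities)
    return name, population / total

# Invented figures for illustration only (not from the dataset).
cities = [("Capital", 790_000), ("Second", 110_000), ("Third", 100_000)]

name, share = dominant_city(cities)
print(f"{name} holds {share:.0%} of the country's city population")
```

Because the denominator only includes cities present in the data, the share is sensitive to how many small cities the dataset covers for each country.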

The data is decent but incomplete, and the analyses are limited by its quality. You can find my code, imported from a local Jupyter Python notebook, here.

Reference:
1. https://www.kaggle.com/datasets/max-mind/world-cities-database?datasetId=2015
