In this project, you will build a complete data workflow: API consumption, DataFrame transformation, visual analysis, and SQL database loading.
You will work with the public World Bank v2 API (no authentication required). The goal is to analyze how 5 countries of your choice evolved socioeconomically and environmentally between 2010 and 2024.
- Choose 5 countries (ISO3) that interest you.
- Choose the indicators you want to analyze.
- Suggested indicators (optional):
  - SP.POP.TOTL: Total population
  - NY.GDP.PCAP.CD: GDP per capita (current USD)
  - EN.ATM.CO2E.PC: CO2 emissions per capita (metric tons)
  - SP.DYN.LE00.IN: Life expectancy at birth (years)
Base API URL: https://api.worldbank.org/v2
You must work in a .ipynb file and use:
- requests
- pandas
- matplotlib and/or seaborn
- sqlalchemy
Install dependencies:
pip install requests pandas matplotlib seaborn sqlalchemy
Create a notebook, for example: src/world_bank_analysis.ipynb.
Review these reference endpoints:
- Countries: https://api.worldbank.org/v2/country
- Indicators: https://api.worldbank.org/v2/indicator
Check the response structure. The API paginates results (50 items per page by default, adjustable with the per_page parameter), so plan a strategy to loop through the pages and store all the information you need.
Example request in Python (template):
import requests
url = "https://api.worldbank.org/v2/country"
params = {
    "format": "json",
    "per_page": 50,
    "page": 1,
}
response = requests.get(url, params=params, timeout=30)
response.raise_for_status()
payload = response.json()
# payload[0] -> pagination metadata
# payload[1] -> current page data
print("Metadata:", payload[0])
print("First item:", payload[1][0])
IMPORTANT: The code above is only a guide. In the link below you can find all the information needed to make API calls:
https://datahelpdesk.worldbank.org/knowledgebase/articles/898581
Download 2010-2024 time series for the countries and indicators you selected.
Goal:
- Consume the API for multiple countries and indicators
- Handle pagination when needed
- Store responses in a temporary structure (list of dictionaries)
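The goals above can be sketched as a small pair of helpers. This is only one possible approach, not a reference solution. The function names (get_json, fetch_all_pages, fetch_indicator) and the fetch parameter are hypothetical. The sketch also assumes that several countries can be requested at once by joining ISO3 codes with semicolons, that the date parameter accepts a 2010:2024 range, and that each data item exposes countryiso3code, date, and value fields. Verify these assumptions against the API documentation linked above before relying on them.

```python
import requests

BASE_URL = "https://api.worldbank.org/v2"


def get_json(url, params):
    """Perform one GET request and return the decoded JSON payload."""
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def fetch_all_pages(url, params, fetch=get_json):
    """Collect the data items from every page of a paginated response."""
    records = []
    page = 1
    while True:
        payload = fetch(url, {**params, "format": "json", "page": page})
        # Assumption: a valid payload is [metadata, data]; anything else is an error.
        if not isinstance(payload, list) or len(payload) < 2:
            raise ValueError(f"Unexpected response: {payload}")
        metadata, items = payload[0], payload[1]
        records.extend(items or [])  # the data element may be null when nothing matches
        if page >= int(metadata.get("pages", 1)):
            break
        page += 1
    return records


def fetch_indicator(countries, indicator, start=2010, end=2024, fetch=get_json):
    """Return a list of dictionaries with one entry per country and year."""
    # Assumption: ISO3 codes joined by ';' select several countries in one call.
    url = f"{BASE_URL}/country/{';'.join(countries)}/indicator/{indicator}"
    params = {"date": f"{start}:{end}", "per_page": 50}
    rows = fetch_all_pages(url, params, fetch=fetch)
    return [
        {
            "country": row["countryiso3code"],
            "indicator": indicator,
            "year": row["date"],
            "value": row["value"],
        }
        for row in rows
    ]
```

In a notebook cell, a call such as fetch_indicator(["BRA", "CHL"], "SP.POP.TOTL") would return the temporary list of dictionaries for that indicator. The fetch argument exists only so the pagination loop can be exercised without network access.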
Create one table (DataFrame) per indicator to make country comparisons easier.
Suggested columns for each table:
- country
- year
- value
Minimum cleaning:
- Remove rows with null value when needed
- Convert year to integer
- Convert value to numeric
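The cleaning steps above could look roughly like the following sketch. The function name build_indicator_table is hypothetical, and the input is assumed to be a list of dictionaries with country, year, and value keys, where year arrives as text and value may be null.

```python
import pandas as pd


def build_indicator_table(records):
    """Turn raw records for ONE indicator into a clean country/year/value table."""
    # Keep only the three suggested columns; extra keys in the records are ignored.
    df = pd.DataFrame(records, columns=["country", "year", "value"])
    # Convert to numeric; anything that cannot be parsed becomes NaN.
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    df["year"] = pd.to_numeric(df["year"], errors="coerce")
    # Drop rows without a usable year or value, then store year as an integer.
    df = df.dropna(subset=["year", "value"]).astype({"year": int})
    return df.sort_values(["country", "year"]).reset_index(drop=True)
```

Calling this once per indicator and keeping the results in a dictionary keyed by table name gives one DataFrame per indicator. Whether dropping null values is appropriate depends on the analysis; for some charts you may prefer to keep the gaps visible.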
Create at least 2 charts and explain findings in Markdown cells.
Examples:
- Line chart: evolution of one indicator by country (2010-2024)
- Scatter plot: relationship between two indicators for a recent year
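Both example charts can be drafted with matplotlib as shown below. This is a minimal sketch, assuming each indicator table has the country, year, and value columns suggested earlier. The function names plot_indicator_lines and plot_indicator_scatter are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_indicator_lines(df, title, ylabel):
    """Line chart: one line per country showing the indicator over time."""
    fig, ax = plt.subplots(figsize=(8, 4.5))
    for country, group in df.groupby("country"):
        group = group.sort_values("year")
        ax.plot(group["year"], group["value"], marker="o", label=country)
    ax.set_title(title)
    ax.set_xlabel("Year")
    ax.set_ylabel(ylabel)
    ax.legend(title="Country")
    return ax


def plot_indicator_scatter(df_x, df_y, year, xlabel, ylabel):
    """Scatter plot: two indicators compared across countries for one year."""
    # Join the two tables on country and year; the value columns get suffixes.
    merged = df_x[df_x["year"] == year].merge(
        df_y[df_y["year"] == year], on=["country", "year"], suffixes=("_x", "_y")
    )
    fig, ax = plt.subplots(figsize=(6, 4.5))
    ax.scatter(merged["value_x"], merged["value_y"])
    # Label each point with its country code.
    for row in merged.itertuples():
        ax.annotate(row.country, (row.value_x, row.value_y),
                    textcoords="offset points", xytext=(5, 5))
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.set_title(f"{ylabel} vs. {xlabel} ({year})")
    return ax
```

Both functions return the Axes object so the chart can be adjusted further in the notebook. Pick a year for the scatter plot in which all of your countries have data for both indicators, since the most recent years are often incomplete.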
Use SQLite with SQLAlchemy to persist data:
- Database: world_bank_analysis.db
- Teaching recommendation: one table per indicator (example: indicator_gdp_per_capita, indicator_life_expectancy, etc.)
Recommended flow:
- Create an engine with SQLAlchemy
- Save each DataFrame with to_sql(..., if_exists="replace")
- Read a sample with pd.read_sql() to validate the load
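The recommended flow can be sketched as follows, assuming the cleaned DataFrames are kept in a dictionary that maps table names to tables. The function names save_tables and read_sample are hypothetical, and the database is created in the current working directory.

```python
import pandas as pd
from sqlalchemy import create_engine, text


def save_tables(tables, db_url="sqlite:///world_bank_analysis.db"):
    """Write each DataFrame to its own SQLite table and return the engine."""
    engine = create_engine(db_url)
    for table_name, df in tables.items():
        # if_exists="replace" drops and recreates the table on every run.
        df.to_sql(table_name, engine, if_exists="replace", index=False)
    return engine


def read_sample(engine, table_name, limit=5):
    """Read a few rows back to validate the load."""
    # The table name comes from our own code, so formatting it into the query
    # is acceptable here; never do this with untrusted input.
    query = text(f"SELECT * FROM {table_name} LIMIT :limit")
    with engine.connect() as conn:
        return pd.read_sql(query, conn, params={"limit": limit})
```

For example, save_tables({"indicator_gdp_per_capita": gdp_df}) followed by read_sample(engine, "indicator_gdp_per_capita") should return the first rows of that table, where gdp_df stands for whichever cleaned table you built. Because of if_exists="replace", rerunning the notebook overwrites the tables instead of appending duplicate rows.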
You already have everything you need to start! Take your time exploring the API documentation and understanding the response structure. If you have any questions during the process, contact your mentors.