Don't you love data?

Data Science Blog

Odd Correlations

December 7th, 2020

Lightbulb image from Pexels

A few years ago I discovered a fun website called Spurious Correlations. When two variables are highly correlated (negatively or positively) but the correlation doesn't seem to make any sense, we call it spurious.
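To make "highly correlated" concrete, here is a minimal sketch that assumes the coefficient in question is Pearson's r (the post doesn't say which one is used). The two series are made-up numbers for two unrelated things that both happen to rise over time, which is usually all it takes to get a coefficient close to 1.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Invented yearly values for two unrelated quantities that both trend upward.
series_a = [10.0, 12.0, 13.5, 15.0, 18.0, 19.5]
series_b = [200.0, 215.0, 240.0, 250.0, 290.0, 300.0]

r = pearson(series_a, series_b)
print(f"r = {r:.3f}")
```

A value near +1 or -1 only says the two series move together over these years; it says nothing about why.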

I started working on this post a few months ago, and quite frankly, collecting data and finding patterns was frustrating. There are a lot more data sources that I'd like to explore, so I'm planning on collecting more data over time and expanding this post. For now, I'll share some exhibits that I found interesting.

For the purpose of this post, I compiled a dataset with over 1,600 columns from a variety of sources including the World Bank, the Federal Reserve Bank of St. Louis, EIA, FBI, Wikipedia, and a few others (the full list is below).

Here are a few interesting spurious relationships I found (in case you don't feel like scrolling down). Dairy consumption in the US seems to correlate with a lot of not-so-good things, including increased CO2 emissions in certain countries. While I can buy this trend (though not the magnitude of the correlation coefficient), dairy consumption is also correlated with different types of crime in the US, with milk, cottage cheese, and regular ice cream being the worst offenders. Butter and cheese consumption, on the other hand, are negatively correlated with various types of crime in the US. Should Americans increase butter and cheese consumption to keep crime at bay? We would need a different kind of analysis to study any potential causal relationship.

Crime Dairy Correlation Matrix

You can find my Jupyter notebook here.

# of Songs by the year in which they were written, first performed, published, recorded, or released vs CO2 Emissions (metric tons per capita) in Dominica

Songs vs CO2 in DMA

China/US Foreign Exchange Rate vs Renewable Electricity Output (as % of Total Electricity Output) in Arab World

China US Exchange Rate vs Renewable Electricity Output in Arab World

Mexico/US Foreign Exchange Rate vs Air Transport, Registered Carrier Departures in Vietnam

Mexico US Exchange Rate vs Air Departures in Vietnam

Consumer Price Index (All Items) in Russia vs # of ATMs per 100,000 adults in Pakistan

CPI Russia vs ATM in Pakistan

Government Gross Debt in Russia (% of GDP, Not Seasonally Adjusted) vs Rural Population in South Korea (% of Total Population)

Government Debt in Russia vs Rural Population in South Korea

Banana Price vs # of ATMs per 100,000 adults in Lebanon

Banana Price vs # of ATMs in Lebanon

US Milk Consumption vs Rural Population (as % of Total Population) in North Korea

Milk Consumption vs Rural Population in North Korea

Cottage Cheese Consumption in the US vs US Forgery Crime

Cottage Cheese Consumption vs Forgery

US Yogurt Consumption vs CO2 in China

US Yogurt Consumption vs CO2 in China

# of Unmanned Space Launches vs # of ATMs per 100,000 adults in Afghanistan

# of Unmanned Space Launches vs # of ATMs per 100,000 adults in Afghanistan

US Motor Vehicle Theft vs China/US Foreign Exchange Rate

US Motor Vehicle Theft vs China/US Foreign Exchange Rate

US Gambling Crimes vs # of ATMs per 100,000 adults in Denmark

US Gambling Crimes vs # of ATMs per 100,000 adults in Denmark

Here you can explore other series (sorry, no support for Internet Explorer). You can view the JS code that powers this chart here.

Charts may not look exactly like the ones above because Matplotlib (Python) and Google Charts (JavaScript) use different default y-axis ranges.

Here is a full list of data sources I used