theeconomist / big-mac-data

Data and methodology for the Big Mac index

Home Page: https://www.economist.com/bigmac

License: MIT License

big-mac-data's Introduction

The Big Mac index

This repository contains the data behind The Economist’s Big Mac index, and code that shows how we calculate it. To download the data, go to the latest release, where you can download the index data as a CSV or Excel file, or the code behind it.

Methodology changes

In July 2022 we updated the Big Mac index to use a McDonald’s-provided price for the United States (previously, we averaged the prices from four major US cities). We also changed how we calculate the GDP-adjusted index. Instead of using the IMF's calculation of purchasing-power parity, we adjust the GDP per person by the difference in each country's Big Mac prices. The full history of the GDP-adjusted series will now be updated whenever the IMF’s historical GDP series are updated, which means the GDP series for a given year may change slightly over time as the IMF refines its measurements. The previously published versions of both indices are available in the releases.

Source data

Our source data come from several places. Big Mac prices are from McDonald’s directly and from reporting around the world; exchange rates are from Thomson Reuters (until January 2022) and Refinitiv Datastream (July 2022 on); GDP and population data used to calculate the euro area averages are from Eurostat; and GDP per person data are from the IMF World Economic Outlook reports.

Output data

The script provides data in three files:

  • big-mac-raw-index.csv contains values for the “raw” index
  • big-mac-adjusted-index.csv contains values for the “adjusted” index
  • big-mac-full-index.csv contains both

Each file also contains the source data used to calculate it.
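As a minimal sketch of reading one of these files, the snippet below uses only Python's standard library. It assumes the CSV has a header row with the column names listed in the codebook below and that missing values appear as blanks or "NA". The `load_index` helper and the sample rows are hypothetical and made up for illustration; they are not real index values.

```python
import csv
import io

# Columns kept as text; every other column is treated as numeric.
TEXT_COLUMNS = {"date", "iso_a3", "currency_code", "name"}


def load_index(stream):
    """Read an index CSV from a text stream into a list of dicts.

    Numeric fields become floats; blank or "NA" values become None.
    """
    rows = []
    for record in csv.DictReader(stream):
        parsed = {}
        for key, value in record.items():
            if key in TEXT_COLUMNS:
                parsed[key] = value
            elif value in ("", "NA"):
                parsed[key] = None
            else:
                parsed[key] = float(value)
        rows.append(parsed)
    return rows


# Illustrative, made-up rows (not real index values).
SAMPLE = """date,iso_a3,currency_code,name,local_price,dollar_ex,dollar_price
2020-01-01,AAA,AAA,Exampleland,10.0,2.0,5.0
2020-01-01,BBB,BBB,Samplestan,30.0,NA,
"""

rows = load_index(io.StringIO(SAMPLE))
```

With a downloaded file, the same helper could be called as `load_index(open("big-mac-raw-index.csv", newline=""))`.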

Codebook

This codebook largely applies to all three files. The exception is the variables suffixed "_raw" or "_adjusted"—these appear (with suffixes) in the "full" file but without suffixes in the respective ("raw" or "adjusted") files.

| variable | definition | source |
| --- | --- | --- |
| date | Date of observation | |
| iso_a3 | Three-character ISO 3166-1 country code | |
| currency_code | Three-character ISO 4217 currency code | |
| name | Country name | |
| local_price | Price of a Big Mac in the local currency | McDonald's; The Economist |
| dollar_ex | Local currency units per dollar | Reuters |
| dollar_price | Price of a Big Mac in dollars | |
| USD_raw | Raw index, relative to the US dollar | |
| EUR_raw | Raw index, relative to the Euro | |
| GBP_raw | Raw index, relative to the British pound | |
| JPY_raw | Raw index, relative to the Japanese yen | |
| CNY_raw | Raw index, relative to the Chinese yuan | |
| GDP_dollar | GDP per person, in dollars | IMF |
| adj_price | GDP-adjusted price of a Big Mac, in dollars | |
| USD_adjusted | Adjusted index, relative to the US dollar | |
| EUR_adjusted | Adjusted index, relative to the Euro | |
| GBP_adjusted | Adjusted index, relative to the British pound | |
| JPY_adjusted | Adjusted index, relative to the Japanese yen | |
| CNY_adjusted | Adjusted index, relative to the Chinese yuan | |
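The relationship between local_price, dollar_ex, dollar_price and the raw index columns can be sketched roughly as follows. This assumes the raw index is a country's dollar price divided by the base currency's dollar price, minus one, expressed as a fraction; the notebook remains the authoritative definition. The function names and the numbers are hypothetical.

```python
def dollar_price(local_price, dollar_ex):
    """Convert a local-currency price to dollars.

    dollar_ex is the number of local currency units per dollar.
    """
    return local_price / dollar_ex


def raw_index(price_usd, base_price_usd):
    """Over- (positive) or under- (negative) valuation against the base, as a fraction."""
    return price_usd / base_price_usd - 1


# Made-up numbers for illustration only.
base = dollar_price(5.0, 1.0)        # base country already priced in dollars
other = dollar_price(12.0, 4.0)      # 12 local units at 4 per dollar is 3 dollars
valuation = raw_index(other, base)   # 3 / 5 - 1, i.e. about 40% undervalued
```

A negative value suggests the local currency is undervalued against the base currency on this measure, and a positive value suggests it is overvalued.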

Calculating the Big Mac index

The code to calculate the index is provided as a Jupyter Notebook. The code itself is written in R, a programming language designed for data manipulation and statistics. You can view the notebook on GitHub.

If you want to run the notebook, you’ll need to set up a few things:

Install Python

You can refer to the installation instructions at the Hitchhiker’s Guide to Python.

On a Mac, you already have Python 2.7 installed, but it does not come with Python’s package manager. We recommend using Python 3. To install it, we recommend using Homebrew. In a terminal, install Homebrew:

$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Then, use Homebrew to install Python 3.x:

$ brew install python3

On Ubuntu Linux you can use apt:

$ sudo apt-get update
$ sudo apt-get install python3.6

On Windows, instructions coming.

Install Jupyter

On Mac or Linux, you should now also have pip installed. pip is a package manager for Python. You can install Jupyter with pip:

$ python3 -m pip install jupyter

You’re all set. (If you are using Python 2, run python -m pip install jupyter.)

On Windows, instructions coming.

Install R

On a Mac, use Homebrew again. At a terminal prompt, run:

$ brew install R

On Ubuntu Linux, we recommend adding a new apt source to install R. First, add the repository's signing key:

$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9

Once you have added the key, add the R repository (called CRAN):

$ sudo add-apt-repository 'deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu xenial/'

Now, you can install R:

$ sudo apt-get update
$ sudo apt-get install r-base

On Windows, instructions coming.

Install IRkernel

IRkernel lets you run R code in Jupyter notebooks. This is the best way to work with R code (this is a truth not yet universally acknowledged). Installation instructions for IRkernel are here. In short:

At a terminal prompt, start R:

$ R
> install.packages(c('repr', 'IRdisplay', 'evaluate', 'crayon', 'pbdZMQ', 'devtools', 'uuid', 'digest'))
> devtools::install_github('IRkernel/IRkernel')
> IRkernel::installspec()

Congratulations, you can run R in Jupyter.

Install tidyverse and data.table

Finally, our R script uses a few R packages you’ll need to install. The tidyverse is a collection of useful packages for data science work in R. data.table is a complicated but extremely useful alternative to R’s standard data frames for storing and manipulating data. At the R prompt from above, run:

> install.packages(c('tidyverse','data.table'))

You’re all set.

Start the notebook

Navigate to the repository on the command line, and run:

$ jupyter notebook

You should see a browser window pop up on http://localhost:8888. Click on “Big Mac data generator” to launch the notebook.

To run the notebook, you can run the code cell by cell by clicking on the first cell and using shift+enter to run each cell in turn. Or you can run the whole thing by clicking on the “Cell” menu and selecting “Run All”.

R script

We also include the calculation as a bare R script (data-generator.R) if you just want to run the code, but this doesn't explain what the code does or walk you through it. To run this, you'll only need to install R, tidyverse, and data.table; once those are installed, you can just run

$ Rscript data-generator-v2.R

to calculate the index files. (The R script may generate numbers that differ in the last decimal place from those produced by the Jupyter notebook; these differences are due to rounding errors and can be safely ignored.)
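One way to confirm that two sets of output differ only by rounding is to compare numeric fields within a tolerance rather than exactly. The sketch below is one possible approach, not part of the repository: the `differing_fields` helper, the tolerance values and the sample rows are all hypothetical.

```python
import math


def differing_fields(row_a, row_b, rel_tol=1e-6, abs_tol=1e-9):
    """Return the names of fields that differ by more than rounding noise.

    Both rows are dicts with the same keys. Values that parse as numbers are
    compared within a tolerance; everything else is compared exactly.
    """
    mismatches = []
    for key in row_a:
        a, b = row_a[key], row_b[key]
        try:
            same = math.isclose(float(a), float(b), rel_tol=rel_tol, abs_tol=abs_tol)
        except (TypeError, ValueError):
            same = a == b
        if not same:
            mismatches.append(key)
    return mismatches


# Made-up rows: the first two differ only in the last decimal place.
script_row = {"iso_a3": "AAA", "dollar_price": "3.1234567", "USD_raw": "-0.40000"}
notebook_row = {"iso_a3": "AAA", "dollar_price": "3.1234568", "USD_raw": "-0.4"}
changed_row = {"iso_a3": "AAA", "dollar_price": "3.2", "USD_raw": "-0.4"}

rounding_only = differing_fields(script_row, notebook_row)
real_change = differing_fields(script_row, changed_row)
```

An empty list means the two rows agree up to the chosen tolerance; any field names returned point to differences larger than rounding.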

Licence

This software is published by The Economist under the MIT licence. The data generated by The Economist are available under the Creative Commons Attribution 4.0 International License.

The licences include only the data and the software authored by The Economist, and do not cover any Economist content or third-party data or content made available using the software. More information about licensing, syndication and the copyright of Economist content can be found here.

big-mac-data's People

Contributors

futuraprime, marie-ella, martgnz, philippotto-hauber, roxwillis

big-mac-data's Issues

I think Lebanon July 2022 is wrong

Hi folks

In the raw data, I think the value of dollar_ex for Lebanon in 2022 is wrong: it is recorded as 1512.2, while in the outward-facing data it's 25600. This gives the figure below without adjusting.

cheers
Mark

image

Calculation for Lebanon is wrong

According to the conversion data for Lebanon, a Big Mac would cost over 30 dollars in Lebanon. I think there's something wrong with the dollar conversion rate as well as the local currency price.
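To see how much the exchange rate alone moves the dollar price, here is a small sketch using the two dollar_ex values quoted in the Lebanon report above (1512.2 and 25600). The local price is a hypothetical placeholder, and the names are illustrative rather than taken from the repository.

```python
RAW_DATA_RATE = 1512.2    # dollar_ex reported in the raw data
PUBLISHED_RATE = 25600.0  # dollar_ex reported in the outward-facing data


def dollar_price(local_price, dollar_ex):
    """Price in dollars, where dollar_ex is local currency units per dollar."""
    return local_price / dollar_ex


# Hypothetical local price, chosen only to make the arithmetic concrete.
local_price = 100_000.0

price_raw = dollar_price(local_price, RAW_DATA_RATE)
price_published = dollar_price(local_price, PUBLISHED_RATE)

# This ratio does not depend on the local price: it equals 25600 / 1512.2.
overstatement_factor = price_raw / price_published
print(f"{price_raw:.2f} vs {price_published:.2f} dollars "
      f"(factor {overstatement_factor:.1f})")
```

Whatever the true local price, a dollar price computed with the lower rate comes out roughly 17 times higher than one computed with the higher rate, which is the scale of error both reports describe.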

git2r missing

On macOS Mojave (18A391), installing the devtools R package fails because libgit2 is missing.

Can be solved with

brew install libgit2

MXN dollar_ex

If I understand the meaning of the fields correctly, the value of dollar_ex for the MXN (Mexico) row is wrong. It has been around 20 pesos per dollar for a long time. Regards, Luis

brew install R

In the "Install R" section, the suggestion for Mac is brew install R. You may consider switching to brew cask install r-app which supports fast download of pre-compiled packages from CRAN, supposedly works better with RStudio, and has some other advantages.

Syntax error in installation instructions

Small typo: I believe this line in the installation instructions contains an error.

install.packages('tidyverse','data.table')

The correct code would use the c() function to create the vector of package names.

install.packages(c('tidyverse','data.table'))

Dockerize this app

I'm a big fan of Docker, because it provides a universal development environment and thus prevents the infamous "but it works on my machine" problem.

Provide a machine-readable data schema using data packages

A data package is a lightweight standard to describe tabular data. In its simplest form, it's a datapackage.json file like:

{
  "name": "big-mac-data",
  "description": "Lorem ipsum",
  "resources": [
    {
      "name": "big-mac-full-index",
      "path": "output-data/big-mac-full-index.csv",
      "schema": {
        "fields": [
          {
            "name": "date",
            "type": "date",
            "constraints": {
              "required": true
            }
          },
          // ...
        ]
      }
    }
  ]
}

You can see a full example of a data package at https://github.com/vitorbaptista/birmingham_schools.

There are multiple libraries that understand this format. For example, https://github.com/frictionlessdata/goodtables-py allows the data to be validated by running goodtables datapackage.json (this can even be run automatically using https://goodtables.io/), and https://github.com/frictionlessdata/datapackage-py allows loading the data in Python (automatically validating and casting the data to their specific types). There are also libraries for R, JavaScript, Ruby and other languages.
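To give a feel for what such a schema enables, here is a hand-rolled sketch that checks the "required" constraint from the example above using only Python's standard library. It does not use the goodtables or datapackage APIs. The `missing_required` helper, the second schema field and the sample rows are hypothetical.

```python
import csv
import io


def missing_required(schema, stream):
    """List (row_number, field_name) pairs where a required field is blank or absent."""
    required = [
        field["name"]
        for field in schema["fields"]
        if field.get("constraints", {}).get("required")
    ]
    problems = []
    for number, row in enumerate(csv.DictReader(stream), start=1):
        for name in required:
            if not row.get(name):
                problems.append((number, name))
    return problems


# Schema fragment in the shape of the datapackage.json example above.
schema = {
    "fields": [
        {"name": "date", "type": "date", "constraints": {"required": True}},
        {"name": "name", "type": "string"},
    ]
}

# Made-up rows: the second one is missing its date.
sample = "date,name\n2020-01-01,Exampleland\n,Samplestan\n"

problems = missing_required(schema, io.StringIO(sample))
```

The dedicated libraries go much further (type casting, format checks, reporting), so this only illustrates the idea of validating data against a declared schema.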

I'd be happy to talk more about it, and/or write a datapackage.json and send a PR.

(cc @serahrono @pwalsh)

Data Update 2023

Hi, when will the data be updated for the July 2023 data release please?

Question about Change in Methodology (US)

The change in methodology has created a significant (downward) price difference in the U.S. Can the names of the "four major US cities" used in the prior methodology be shared?

Data via Quilt

Quilt provides a way to treat data like code packages. It would be good to hook this data into that service. Then someone who wants to use the data can just write:

$ quilt install theeconomist/big_mac_index
$ python
>>> from quilt.data.theeconomist import big_mac_index

Data availability prior to 2000 + Frequency Change

Dear Developers / Maintainers,
How are you today?

I'm writing to check whether data from before 2000 are available. According to Wikipedia, this index was created in 1986, but the data in this repo appear to start in 2000.

Also, there appears to be a change in frequency, from annual in 2000-2005 to semi-annual from 2006 onwards. How often is the data refreshed?

Thank you and have a wonderful day!
All the best,
Kathy Gao

Reconciling IMF GDP PC data and the figures in the dataset

Hi and thanks for the package!

You note that GDP PC ($) figures are taken from the IMF but there's no GDP data for most countries (50-70% of the 56 countries) in the dataset pre-July-2011, and consequently no adj_price, etc. I'd be interested in extending the data back to 2000 if possible. So I'd be curious why this wasn't possible with your original IMF source/dataset, since the data does exist.

I'd also be interested in which source you did use as I downloaded the IMF World Economic Outlook Database data from datahub to try and do so myself. This database covers 55 of the 56 countries in the Big Mac dataset. Though observations are on a yearly basis, comparing the GDP PC ($) from this dataset versus the figures from the Big Mac dataset shows consistent variations:

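[Figure: proportional difference between IMF and Big Mac dataset GDP per person, by country, one panel per year]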

The plot above shows the average proportional difference, (IMF_GDP - big_mac_GDP) / big_mac_GDP, grouped by year and country, where big_mac_GDP is the average of the two GDP figures if there are two in a given year.

Assuming the IMF dataset above is downloaded and saved as values_csv.csv, then the following code reproduces the plot above:

library(tidyverse)

# Big Mac index output; clean_names() lower-cases the columns (GDP_dollar -> gdp_dollar)
big_mac_data <- readr::read_csv("big-mac-full-index.csv") %>% 
  janitor::clean_names()

# IMF World Economic Outlook data, restricted to the Big Mac countries and years
IMF_data <- read_csv("values_csv.csv") %>% 
  filter(
    Indicator == "NGDPDPC", # this is the indicator code for GDP pc in $
    Country %in% unique(big_mac_data$iso_a3), 
    Year %in% 2000:2020
  ) 

# Average the Big Mac GDP figures within each country-year, then join the IMF values
GDP_data <- big_mac_data %>% 
  mutate(year = lubridate::year(date)) %>% 
  group_by(iso_a3, year) %>% 
  summarize(big_mac_GDP = mean(gdp_dollar, na.rm = TRUE), .groups = "drop") %>% 
  inner_join(IMF_data, by = c("year" = "Year", "iso_a3" = "Country")) %>% 
  rename(IMF_GDP = Value) 

# Proportional difference per country-year, plotted with one panel per year
GDP_data %>% 
  filter(!is.na(big_mac_GDP)) %>% 
  mutate(prop = (IMF_GDP - big_mac_GDP) / big_mac_GDP) %>% 
  ggplot(aes(iso_a3, prop)) + 
  geom_bar(stat = "identity") +
  labs(title = "") +
  coord_flip() +
  facet_wrap(~year)

Where did the original data come from?

This repo is great, I might use some of the data to teach a class.

I have a quick question: where did the original data come from?
In other words, how did you obtain the big-mac-source-data.csv file?

Thanks

Turkey January-23

I think there might be a lag in prices for January 2023 in Turkey; right now a single Big Mac costs 88 TL. Below you can find the image.
[image]

EUZ adjusted price enhancement

Hi,

Are Big Mac prices really the same across the entire euro zone? If so, shouldn't there be a difference between a Big Mac in Germany and one in Slovenia or Greece, for example?

I would assume that based on the GDP per capita there should be a difference that could be calculated for EUR countries.

Is this something you could consider for a future update?

Thanks,
Florian

Jan 2023 updated data

Is there a plan and/or timeline to update the git data with the Jan 2023 updated data?

Question about the exchange rate

Hi Developers,

Thanks for creating this amazing dataset!
I have a question about the exchange rate used in the dataset. Do you use the rate at the time the output dataset is generated, or the rate at the time the dataset is pushed to GitHub?

Thanks!

Big Mac Material Shrink/Inflation

It has been asserted that over the years, the Big Mac itself has shrunk, or has gotten bigger[1][2]. I don't know if this is true or not, but:

Does The Economist, having used the BMI for a long time:

  • Have any hard or soft primary source data on this?

    • Interviews with employees?
    • Old recipe manuals?
  • If so, incorporate any of this into the raw or adjusted data?
