
epogrebnyak / weo-reader


Python client to read IMF World Economic Outlook (WEO) dataset as pandas dataframe.

weo economic-data economic-datasets macroeconomics international-economics country-statistics

weo-reader's Introduction

weo-reader


This is a Python client that downloads the IMF World Economic Outlook (WEO) dataset, by release date, as pandas dataframes. You can explore:

  • single country macroeconomic data and forecast,
  • macro variables across countries for a given year,
  • country-year panel for single macro variable.

Testimonials

Thanks for your contribution and am happy to be able to work off of your codes. This is really awesome.

I have been using your WEO API which is very great!

I wanted to express our appreciation for your package, weo-reader. We have used the package extensively in our work. It has been an invaluable tool for efficiently updating our database and conducting research.

Dataset releases (vintages)

Dataset releases (vintages) are available back to 2007. The reported data goes back to 1980, and each release includes a forecast several years ahead (the April 2022 release shown below runs to 2027).

Release             Date
Latest confirmed    April 2024
First               October 2007

A confirmed release has been tested to work with weo. If something breaks in a new release, users usually raise an issue here.


Install

The program runs with Python 3.9 or higher.

To install:

pip install weo

Latest version:

pip install git+https://github.com/epogrebnyak/weo-reader.git

First glance

Get the US inflation forecast from the April 2022 semiannual WEO release.

from weo import download, WEO

path, url = download(2022, 1)
# weo_2022_1.csv 18.8Mb
# Downloaded 2022-Apr WEO dataset

df_cpi = WEO(path).inflation()
print(df_cpi.USA.tail(8))
#         USA
# 2020  1.549
# 2021  7.426
# 2022  5.329
# 2023  2.337
# 2024  2.096
# 2025  1.970
# 2026  1.983
# 2027  2.017

Step 1. Download data

Save data from the IMF web site as a local file. Specify the year and release:

import weo

weo.download(year=2020, release="Oct", filename="weo.csv")
  • You can access WEO releases starting October 2007 with this client.
  • WEO is normally released in April and October; the one exception is September 2011.
  • A release is referenced by:
    • number 1 or 2;
    • month 'Apr' or 'Oct', and 'Sep' in 2011.
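
The two ways of referencing a release can be reconciled with a small lookup. The sketch below is not part of weo: release_month is a hypothetical helper that only encodes the rules listed above, and it does not check that 2007 has a single (October) release.

```python
def release_month(year: int, release) -> str:
    """Return the month abbreviation of a WEO release.

    Hypothetical helper, not part of the weo package. `release` may be
    the number 1 or 2, or a month abbreviation such as 'Apr' or 'Oct'.
    """
    # The second release of 2011 came out in September, not October.
    second = "Sep" if year == 2011 else "Oct"
    by_number = {1: "Apr", 2: second}
    if release in by_number:
        return by_number[release]
    if release in by_number.values():
        return release
    raise ValueError(f"No {release!r} release in {year}")
```

With this helper both release_month(2020, 2) and release_month(2020, "Oct") resolve to "Oct", while "Sep" is accepted only for 2011.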

You can list all years and releases available for download with weo.all_releases(). Combine it with weo.download() to create a local dataset of WEO vintages from 2007 to present:

import pathlib
import weo

# create folder
pathlib.Path("weo_data").mkdir(parents=False, exist_ok=True)

# download all releases
for year, release in weo.all_releases():
    weo.download(year, release, directory="weo_data")
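
When the loop is re-run later, it may be useful to fetch only the vintages that are not on disk yet. The following is a minimal sketch under one assumption: local files are named like weo_2022_1.csv, the name shown in the download output in the First glance section. missing_releases is a hypothetical helper, not a weo function.

```python
from pathlib import Path


def missing_releases(releases, directory):
    """Return the (year, release) pairs that have no local file yet.

    Assumes files are named like 'weo_2022_1.csv' (an assumption based
    on the download output shown earlier); adjust if yours differ.
    """
    folder = Path(directory)
    return [
        (year, release)
        for year, release in releases
        if not (folder / f"weo_{year}_{release}.csv").exists()
    ]


# Possible usage, assuming weo.all_releases() yields (year, number) pairs:
# for year, release in missing_releases(weo.all_releases(), "weo_data"):
#     weo.download(year, release, directory="weo_data")
```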

Step 2. Inspect data

Use the WEO class to view and extract data. WEO is a wrapper around a pandas dataframe that ensures proper data import and makes it easier to access and slice data across the time, country and variable dimensions.

Try the code below:

from weo import WEO

w = WEO("weo.csv")

What variables and measurements are inside?

# variable listing
w.variables()

# units
w.units()
w.units("Gross domestic product, current prices")

# variable codes
w.codes
w.from_code("LUR")

# countries
w.countries("United")      # Dataframe with United Arab Emirates, United Kingdom
                           # and United States
w.iso_code3("Netherlands") # 'NLD'

The dataset is a year-country-variable-value cube; you can fix any dimension to get a table.

w.get("General government gross debt", "Percent of GDP")
w.getc("NGDP_RPCH")
w.country("DEU")
w.fix_year(1994)
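
To make the cube idea concrete without the package, here is a toy sketch in plain Python. The records and their values are made up for illustration, and fix and as_table are hypothetical names that do not mirror the weo implementation.

```python
# Toy records standing in for the cube; the values are made up.
# Each record is (year, country, variable, value).
records = [
    (2019, "DEU", "NGDP_RPCH", 1.0),
    (2019, "USA", "NGDP_RPCH", 2.0),
    (2020, "DEU", "NGDP_RPCH", 3.0),
    (2020, "USA", "NGDP_RPCH", 4.0),
    (2019, "DEU", "LUR", 5.0),
]


def fix(rows, year=None, country=None, variable=None):
    """Keep the rows that match every dimension that is given."""
    return [
        r for r in rows
        if (year is None or r[0] == year)
        and (country is None or r[1] == country)
        and (variable is None or r[2] == variable)
    ]


def as_table(rows):
    """Pivot rows into {year: {country: value}}."""
    table = {}
    for year, country, _variable, value in rows:
        table.setdefault(year, {})[country] = value
    return table


# Fixing the variable leaves a year-by-country table.
growth = as_table(fix(records, variable="NGDP_RPCH"))
```

Fixing the country or the year instead leaves the other two dimensions free, which is the kind of slice that w.country("DEU") and w.fix_year(1994) return.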

Plot a chart with the projected 12 largest economies in 2024 (current prices):

(w.gdp_usd(2024)
  .dropna()
  .sort_values()
  .tail(12)
  .plot
  .barh(title="GDP by country, USD billion (2024)")
)

Get GDP per capita data from 2000 to 2020:

w.gdp_pc_usd(start_year=2000, end_year=2020)

Code documentation

weo package documentation is here.

Alternative data sources

1. If you need the latest data as time series and not the vintages of WEO releases, and you know variables that you are looking for, DBnomics is a good choice:

Example:

from dbnomics import fetch_series_by_api_link
ts1 = fetch_series_by_api_link("https://api.db.nomics.world/v22/"
                               "series/IMF/WEO:latest/DEU.PCPI"
                               "?observations=1")


2. Similar dataset, not updated since 2018, but with earlier years than weo-reader: https://github.com/datasets/imf-weo

Development notes

  • You can download the WEO file from the command line with curl:
    curl -o weo.csv https://www.imf.org/-/media/Files/Publications/WEO/WEO-Database/2020/02/WEOOct2020all.xls
  • WEOOct2020all.xls from the web site is really a CSV file, not an Excel file.
  • There was an update of GDP figures in June 2020, but its file structure is incompatible with regular releases.
  • Prior to 2020 the URL structure was similar to https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls
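
The two URL patterns in these notes can be written down as a small function. This is a sketch, not the package's code: weo_url is a hypothetical name, only the October 2019 and October 2020 URLs above are confirmed, and treating the "02" path segment as the zero-padded release number is an assumption. Releases after October 2020 may follow neither pattern.

```python
def weo_url(year: int, period: int, month: str) -> str:
    """Build a download URL following the two patterns quoted above.

    Hypothetical helper. `period` is 1 or 2, `month` is 'Apr', 'Oct'
    or 'Sep'. Inputs other than October 2019 and October 2020 merely
    extrapolate the pattern and are not confirmed.
    """
    filename = f"WEO{month}{year}all.xls"
    if year < 2020:
        # Older layout under /external/pubs/ft/weo/
        return (
            "https://www.imf.org/external/pubs/ft/weo/"
            f"{year}/{period:02d}/weodata/{filename}"
        )
    # Layout used by the October 2020 release
    return (
        "https://www.imf.org/-/media/Files/Publications/WEO/WEO-Database/"
        f"{year}/{period:02d}/{filename}"
    )
```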

weo-reader's People

Contributors

aneziac, dependabot[bot], epogrebnyak, jm-rivera


weo-reader's Issues

October 2021 vintage cannot be downloaded

When trying to download the October 2021 vintage, a CSV with some gibberish is downloaded. This is because the URL created by the package (through make_url_countries()) is of the old format. Just like for the April 2021 vintage, the URL has changed format.

Update dependencies

Merging. If you can also update the dependencies and bump our package version in poetry, I will accept that as well.

You are welcome to open a separate issue on the SDMX endpoint. Does it provide access to all years? We can do some planning in an issue, then integrate your code into weo-reader if you feel like it.

Originally posted by @epogrebnyak in #36 (comment)

April 2021 release does not load

Hi,

I have been trying to download the April 2021 release, but the downloaded file is empty (0.0Mb), and there is an error when trying to get any data. How can this be fixed?

WEOJune2020update

Base functionality does not support downloading the recent update from the IMF.

EDIT: Seeing now that the June Update dataset is not in the same format or for all countries. Will leave this open in case there is eventually an update to the source dataset.

Perhaps something along the lines of:

def to_month(year: int, period: int) -> str:
    check_period(period)  # validation helper assumed to exist in the package
    month = {1: "Apr", 2: "Oct"}[period]
    # Second 2011 WEO issue was in September, not October
    if year == 2011 and period == 2:
        month = "Sep"
    # Real GDP outlooks were updated in June
    if year == 2020 and period == 1:
        month = "Jun"
    return month

EDIT 2: Closing, as this happens frequently and is not specific to this June update.

make a listing of all available dates

Possible interface:

weo.all_dates()
assert weo.dates(2007) == ["2007-Oct"]
assert weo.dates(2011) == ["2011-Apr", "2011-Sep"]
assert weo.dates(2020) == ["2020-Apr"]

from typing import List

def dates(year: int) -> List[str]:
    raise NotImplementedError

def all_dates() -> List[str]:
    raise NotImplementedError
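
One way the proposed interface could be filled in is sketched below. It is not the package's implementation: the `last` parameter is an addition for illustration, and its default simply reuses the latest confirmed release named in the README (April 2024).

```python
from typing import List

FIRST_YEAR = 2007  # the first available vintage is October 2007


def dates(year: int, last: str = "2024-Apr") -> List[str]:
    """List release labels for a year, e.g. ['2011-Apr', '2011-Sep'].

    `last` is the latest release to include (an illustrative parameter,
    not part of the proposed interface).
    """
    last_year, last_month = last.split("-")
    if year < FIRST_YEAR or year > int(last_year):
        return []
    # The second release of 2011 was in September, not October.
    months = ["Apr", "Sep" if year == 2011 else "Oct"]
    if year == FIRST_YEAR:
        months = months[1:]  # only the October 2007 release exists
    if year == int(last_year) and last_month == "Apr":
        months = months[:1]  # the second release is not out yet
    return [f"{year}-{m}" for m in months]


def all_dates(last: str = "2024-Apr") -> List[str]:
    """List all release labels from October 2007 up to `last`."""
    last_year = int(last.split("-")[0])
    return [d for y in range(FIRST_YEAR, last_year + 1) for d in dates(y, last)]
```

Passing last="2020-Apr" reproduces the third assertion in the proposal, which reflects the releases available when the issue was written.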

Create Github Pages sphinx documentation

Todos:

  • Generate Sphinx documentation in a site folder.
  • Use ghp-import to push the documentation to the gh-pages branch.
  • Create a GitHub Actions workflow to publish the documentation in the pipeline.
  • Use rinohtype to generate a PDF version of the documentation.
  • Add an instruction to the justfile to generate the PDF version of the documentation.

Fails for April 2021 Database

Running the following code

from weo import download

download(year=2021, release="Apr", filename="weo.csv")

results in curl returning the HTML not found page instead of the csv.

The link for the April 2021 database is https://www.imf.org/-/media/Files/Publications/WEO/WEO-Database/2021/WEOApr2021all.xls. So, it seems likely that IMF went back to their original URL format and October 2020 is just an anomaly. Or maybe only October releases will now include the /02/. The code needs to be fixed either way.

doesn't work for 2020-October database

The weo package doesn't seem to work for the October 2020 database.
Running this code:

from weo import download
from weo import WEO

download("2020-Oct", path='weo.csv', overwrite=True)
w = WEO("weo.csv")

gives the following error:

ParserError Traceback (most recent call last)
in
----> 1 w = WEO("weo.csv")

~/.local/lib/python3.7/site-packages/weo/dataframe.py in __init__(self, filename)
111
112 def __init__(self, filename):
--> 113 self.df, _ = read_csv(filename)
114
115 @property

~/.local/lib/python3.7/site-packages/weo/dataframe.py in read_csv(filename)
31
32 def read_csv(filename):
---> 33 df = pd.read_csv(filename, delimiter="\t", encoding="iso-8859-1")
34 ix = df["Country"].isna()
35 return df[~ix], df[ix]

~/.local/lib/python3.7/site-packages/pandas/io/parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
686 )
687
--> 688 return _read(filepath_or_buffer, kwds)
689
690

~/.local/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
458
459 try:
--> 460 data = parser.read(nrows)
461 finally:
462 parser.close()

~/.local/lib/python3.7/site-packages/pandas/io/parsers.py in read(self, nrows)
1196 def read(self, nrows=None):
1197 nrows = _validate_integer("nrows", nrows)
-> 1198 ret = self._engine.read(nrows)
1199
1200 # May alter columns / col_dict

~/.local/lib/python3.7/site-packages/pandas/io/parsers.py in read(self, nrows)
2155 def read(self, nrows=None):
2156 try:
-> 2157 data = self._reader.read(nrows)
2158 except StopIteration:
2159 if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 1 fields in line 48, saw 2
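
An error like this is consistent with the downloaded file not being the expected tab-delimited table (for example an HTML error page), although the report does not confirm the cause. A cheap check before parsing can make such failures easier to read. The sketch below is hypothetical and relies only on details visible in the traceback: the reader uses a tab delimiter with iso-8859-1 encoding and expects a "Country" column.

```python
def looks_like_weo_file(path: str) -> bool:
    """Return True if the first line looks like a WEO table header.

    Hypothetical helper, not part of weo. It checks that the first
    line is tab-delimited and contains a 'Country' column.
    """
    with open(path, encoding="iso-8859-1") as f:
        header = f.readline().rstrip("\r\n").split("\t")
    # An empty file or an HTML page yields a single field here.
    return len(header) > 1 and "Country" in header
```

Calling it right after a download and raising a clear error when it returns False would replace the low-level ParserError with a message about the download itself.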

Incomplete release of April 2020 WEO

The April 2020 WEO file seems to have a number of changes (identifiers, keys, etc.). Not all methods work properly with that file, and some of your tests fail with this new edition of the file.

Bug: WEO download url changed again

The month has been added to the url.

Previously, it would have been: https://www.imf.org/-/media/Files/Publications/WEO/WEO-Database/2024/WEOApr2024all.ashx

Now it is: https://www.imf.org/-/media/Files/Publications/WEO/WEO-Database/2024/April/WEOApr2024all.ashx

Will submit a PR that adds logic to deal with this from the current release onwards.
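
Since the URL format has changed more than once, one option is to try several candidates in order. The sketch below only builds the two URLs quoted in this report. candidate_urls is a hypothetical name, and the full month name "October" in the path is an assumption, as only "April" appears above.

```python
BASE = "https://www.imf.org/-/media/Files/Publications/WEO/WEO-Database"

# Full month names used in the newer URL path. Only 'April' is
# confirmed by the report above; 'October' is an assumption.
FULL_MONTH = {"Apr": "April", "Oct": "October"}


def candidate_urls(year: int, month: str) -> list[str]:
    """Return URLs to try in order: the newer format first, then the older."""
    filename = f"WEO{month}{year}all.ashx"
    return [
        f"{BASE}/{year}/{FULL_MONTH[month]}/{filename}",
        f"{BASE}/{year}/{filename}",
    ]
```

A downloader could loop over the candidates and stop at the first one that returns a valid file.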

MATPLOTLIB dependency

The reader requires a specific version of matplotlib (version 3.2.0), which means downgrading matplotlib, as newer versions are released as part of distro packages (e.g., Anaconda). Can you require a minimum version number instead of an exact match? Thanks for the excellent reader.
