developmentseed / climatescope-data

The data processing scripts for the Global Climatescope

Python 95.68% Shell 1.81% JavaScript 2.51%

climatescope-data's Introduction

This script processes the data for the Global Climatescope and prepares it for use on the website. It works like a makefile: every run rebuilds the full dataset.

Usage

A quick overview of the process:

Provide source data
The source data is stored in the source folder.
Run script
python cs-core.py
Move output to Jekyll site structure

Source data

Core CS data

The script expects the core Climatescope data to be provided in .xlsx format and stored in source/cs-core. The data for each edition should be in separate files that are named after its year. For example: source/cs-core/2013.xlsx and source/cs-core/2014.xlsx.

These files should contain the following sheets:

score
param
ind

Each of these sheets should have the following columns:

column	description
id	Contains the id of the score, param or indicator
iso	The ISO 3166 code of the country, state or province
score	The score

Any extra columns or sheets will be ignored by the script.

Notes:

the header of the different sheets should not have filters enabled
the structure of the files was proposed by BNEF for the first edition of the Global Climatescope
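
As an illustration of the layout described above, here is a minimal sketch that keeps only the expected sheets and columns and fails loudly when one is missing. The function name `select_core_data` and the constants are hypothetical and not taken from cs-core.py. It assumes the workbook has already been read into a dict of DataFrames, for example with `pd.read_excel(path, sheet_name=None)`.

```python
import pandas as pd

EXPECTED_SHEETS = ("score", "param", "ind")
EXPECTED_COLUMNS = ["id", "iso", "score"]


def select_core_data(sheets):
    """Return only the expected sheets, trimmed to the expected columns.

    `sheets` maps sheet name -> DataFrame. Extra sheets and columns are
    dropped; a missing sheet or column raises ValueError.
    """
    cleaned = {}
    for name in EXPECTED_SHEETS:
        if name not in sheets:
            raise ValueError("missing sheet: %s" % name)
        missing = [c for c in EXPECTED_COLUMNS if c not in sheets[name].columns]
        if missing:
            raise ValueError(
                "sheet %s lacks columns: %s" % (name, ", ".join(missing))
            )
        cleaned[name] = sheets[name][EXPECTED_COLUMNS].copy()
    return cleaned
```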

Sources

Shapefiles: Natural Earth
Country and state capitals: Wikipedia

climatescope-data's Issues

Print column headers CSV

The current CSV headers contain ids. For example:

name:en_var,type_var

Replace these with human-readable labels.
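
One way this could look, as a sketch only: a lookup from column id to label, applied when the header row is written. The ids and labels in `LABELS` are placeholders, and `readable_header` is a hypothetical helper, not an existing function in the scripts.

```python
# Placeholder mapping from column id to label; the real labels would
# presumably come from the meta file.
LABELS = {
    "iso": "ISO code",
    "1": "Parameter 1",
}


def readable_header(columns, labels=LABELS):
    """Replace known column ids with labels; unknown ids are kept as-is."""
    return [labels.get(col, col) for col in columns]
```

Keeping unknown ids untouched means a missing label shows up in the CSV instead of silently dropping a column.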

Change folder api

Parameter jsons should be stored in:

[lang]/api/parameters...

Static maps script

Enhancements needed before handing in the script to the CS team:

general cleanup and optimization of code
documentation
check if Natural Earth shapefiles exist and if not, download them
currently runs Tilemill in the standard Ubuntu install folder.

Add regions

This was included in a previous iteration, but removed since:

  # Add region for the countries
  if df_meta_aa.loc[aa, 'type'] == 'country':
    aa_region = {}
    region = df_meta_aa.loc[aa, 'region']
    # Add the id of the region
    aa_region['id'] = region
    # Fetch the name of the region from the meta file
    aa_region['name'] = df_meta_aa.loc[region, 'name:' + lang]
    aa_data['region'] = aa_region

Support unit localization

The country profile script should support localized units.
For example: billion => mil millones
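
A minimal sketch of such a lookup, assuming the translations live in a dict keyed by unit and language. `UNIT_LABELS` and `localize_unit` are hypothetical names; only the billion example comes from this issue.

```python
# Hypothetical translation table; in practice this would likely be
# loaded from a meta file.
UNIT_LABELS = {
    "billion": {"en": "billion", "es": "mil millones"},
}


def localize_unit(unit, lang, labels=UNIT_LABELS):
    """Return the unit translated to `lang`, or the unit itself if unknown."""
    return labels.get(unit, {}).get(lang, unit)
```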

Datapackage

Add datapackage, at least for the CSV files.

http://dataprotocols.org/data-packages/
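
As a rough sketch, a descriptor could be generated alongside the CSVs. The file name and field list used in the usage line are made up for illustration, and `build_datapackage` is a hypothetical helper. The `name`, `resources`, `path`, and `schema.fields` keys follow the Data Package descriptor layout.

```python
import json


def build_datapackage(name, csv_files):
    """Build a Data Package descriptor as a plain dict.

    `csv_files` maps a CSV path to a list of (field_name, field_type) pairs.
    """
    resources = []
    for path, fields in sorted(csv_files.items()):
        resources.append({
            "path": path,
            "schema": {
                "fields": [{"name": n, "type": t} for n, t in fields],
            },
        })
    return {"name": name, "resources": resources}


# Usage with a made-up file name and columns:
descriptor = build_datapackage(
    "climatescope-data",
    {"scores.csv": [("iso", "string"), ("score", "number")]},
)
datapackage_json = json.dumps(descriptor, indent=2)
```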

Inactive indicators should not be ranked

For on-grid countries, some of the indicators are not used to calculate the score. In the app, these will be greyed out.

On the other hand, there are a lot of indicators with a 0 score.

Neither should be ranked.
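
A sketch of what excluding these from the ranking could look like. It assumes scores for one indicator are held in a dict keyed by ISO code and that the set of countries for which the indicator is inactive is known. Tied scores share a rank here, which is a placeholder choice rather than a decided rule. `rank_indicator` is a hypothetical name.

```python
def rank_indicator(scores, inactive=frozenset()):
    """Rank countries by score (1 = highest).

    Countries listed in `inactive`, and countries with a score of 0 or
    None, get a rank of None. Tied scores share the same rank.
    """
    ranks = {iso: None for iso in scores}
    eligible = sorted(
        (iso for iso, score in scores.items() if iso not in inactive and score),
        key=lambda iso: scores[iso],
        reverse=True,
    )
    previous_score, previous_rank = None, 0
    for position, iso in enumerate(eligible, start=1):
        if scores[iso] != previous_score:
            previous_score, previous_rank = scores[iso], position
        ranks[iso] = previous_rank
    return ranks
```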

Add support for missing data for a country

For the cs-core.py:

If there is a country in the meta file with administrative areas, but this country is not present in one of the core data files, pandas will throw an error when selecting an object in the dataframe.

With CD available in the metadata, but not in the core data, this:

df_main.ix['CD',0]

results in:

KeyError: 'CD'

This can be an issue in the future when not all countries have data for all years.
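
One possible guard, sketched under the assumption that the dataframe is indexed by ISO code: check membership in the index before selecting. `score_or_none` is a hypothetical helper, and it selects the column by label rather than by position.

```python
import pandas as pd


def score_or_none(df, iso, column):
    """Return df.loc[iso, column], or None when the country has no row."""
    if iso not in df.index:
        return None
    return df.loc[iso, column]
```

An alternative would be to reindex the dataframe against the list of countries in the meta file, so missing countries show up as rows of NaN.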

Add ranks for param and indicators

Countries:

calculate the global ranks for all parameters and indicators

States / provinces:

calculate the in-country ranks for parameters and indicators.
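
Both cases can be sketched with a single grouped rank. The column names `type` and `parent` (the country a state belongs to) are assumptions about the dataframe layout, and `add_ranks` is a hypothetical helper. Tied scores share the lowest rank via `method="min"`, which is a placeholder choice.

```python
import pandas as pd


def add_ranks(df):
    """Add a 'rank' column (1 = highest score).

    Countries are ranked globally per id; states are ranked per id
    within their parent country.
    """
    out = df.copy()
    # Countries share one global scope; states are scoped to their parent.
    scope = out["parent"].where(out["type"] == "state", "world")
    out["rank"] = out.groupby([scope, "id"])["score"].rank(
        ascending=False, method="min"
    )
    return out
```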

China on static maps

Check what's up with China on the static map. Wrong calc because of Northern hemisphere?

Meta file for country profile indicators

The matrix on the country profile script should be moved to a meta file.

CS Core optimizations

build_main_json
No real use anymore for this function
build_json_aa
Slice the full dataframe in the beginning and do the subsequent .loc only on the id of the parameter/indicator

Add descriptions to indicators

Historic state data on parameters

All the data on the parameter json is historic with the exception of states.
Data for states should also be made historic.

Floating point precision

There is an issue in Pandas with floating point precision: pandas-dev/pandas#2069

Check how we should solve this.

Build CSV

With the refactoring of some pieces, the CSVs are not building anymore.

Selections like:

df_param = df_full.loc[(slice(None),param),:]

return a:

KeyError: '1'

This has to do with differing types.
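
If the cause is that the index level holds integers while `param` is a string (an assumption, not confirmed here), casting that level to strings before selecting would avoid the error. A sketch, with `stringify_index_level` as a hypothetical helper and `iso` and `id` as assumed level names:

```python
import pandas as pd


def stringify_index_level(df, level_name):
    """Return a copy of df whose named index level holds strings.

    The result is sorted so that MultiIndex slicing with .loc works.
    """
    names = list(df.index.names)
    out = df.reset_index()
    out[level_name] = out[level_name].astype(str)
    return out.set_index(names).sort_index()


# Usage: an integer 'id' level is converted, after which '1' matches.
df_full = pd.DataFrame(
    {"score": [1.0, 2.0, 3.0, 4.0]},
    index=pd.MultiIndex.from_tuples(
        [("br", 1), ("br", 2), ("cl", 1), ("cl", 2)], names=["iso", "id"]
    ),
)
df_fixed = stringify_index_level(df_full, "id")
df_param = df_fixed.loc[(slice(None), "1"), :]
```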

Improve rank

Define what to do with ties (Bloomberg will get back with their point of view) and how to deal with 0 values.

Remove state_ranking on country

On countries, state_ranking: none

Ideally, this should be removed completely on countries.

Add ranking

When loading the full dataframe, we can determine the rankings for the administrative regions:

Countries:

global ranking
regional rankings

States:

country ranking

Static maps

Needed for production:

India states not being generated. NE's state data for India doesn't have the correct ISO codes.
Optimize the PNGs

Move bbox calculation function to module

The function that returns a new bbox should be moved to its own module, as it is generic enough to be reused elsewhere.

(https://docs.python.org/2/tutorial/modules.html#packages)

Document well and move somewhere useful.

Sort arrays

Whenever we have an array with data for multiple years, it should be sorted by year in descending order. This will result in a minor performance enhancement since we won't have to do this client-side.

E.g.

{
  "data": [
    {
      "value": 2.5,
      "year": 2015
    },
    {
      "value": 2.2,
      "year": 2014
    }
  ]
}
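
A minimal sketch of the sort, assuming each entry is a dict with a `year` key as in the example above. `sort_by_year_desc` is a hypothetical helper name.

```python
def sort_by_year_desc(entries):
    """Return the entries ordered from most recent year to oldest."""
    return sorted(entries, key=lambda entry: entry["year"], reverse=True)
```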

Lowercase ISO

ISO codes are lower-cased throughout the application. Currently this is not being dealt with in a very pretty way. Improve this, taking into account that BNEF provides data with ISO upper-cased.

Add 'real data'

For some indicators, Bloomberg is now providing actual data to illustrate the scores. For example:

indicator	score	data
Biofuels production	0.031	2.2 billion litres
Average electricity spot prices	0.027	71.4549 USD/MWh

This has to be added to the JSON in the following way:

"data": [
  {
    "overall_ranking": 9,
    "regional_ranking": 5,
    "value": 0.02615,
    "raw": {
      "value": 72.421,
      "unit": "USD / Kwh"
    }
  }
],

The units are stored in the meta file so they can be translated.
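
A sketch of how such an entry could be assembled, with the unit looked up by id and language. The `UNITS` table, its key `usd_kwh`, and the function `build_data_entry` are hypothetical. Only the shape of the resulting dict follows the JSON above.

```python
# Hypothetical excerpt of the unit translations held in the meta file.
UNITS = {
    "usd_kwh": {"en": "USD / Kwh"},
}


def build_data_entry(value, overall_ranking, regional_ranking,
                     raw_value=None, unit_id=None, lang="en", units=UNITS):
    """Build one entry of the "data" array.

    The "raw" object is only added when a raw value is provided.
    """
    entry = {
        "overall_ranking": overall_ranking,
        "regional_ranking": regional_ranking,
        "value": value,
    }
    if raw_value is not None:
        entry["raw"] = {"value": raw_value, "unit": units[unit_id][lang]}
    return entry
```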

5 decimals to values

To fix sorting issues reported in https://github.com/flipside-org/climatescope/issues/127, the values of the scores, parameters and indicators should have 5 decimals.
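
A minimal sketch of the rounding step, assuming it is applied just before the value is written to JSON. `format_score` is a hypothetical helper name.

```python
import json


def format_score(value, decimals=5):
    """Round a score, parameter or indicator value to `decimals` places."""
    return round(float(value), decimals)


# Usage: the rounded float serializes with at most 5 decimals.
serialized = json.dumps({"value": format_score(0.0261499)})
```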

Build list with available years

Needs improvement in the final version:

To check how many years of data we have, we loop over the contents of the source/cs-core folder and parse the filename.

To improve:

check only files, not folders
only accept files of type .xlsx
only store well-formatted years; otherwise, show an error message asking to check the filenames
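
The three checks could be sketched as follows. This assumes the .xlsx extension and the four-digit year naming described in the README. `available_years` is a hypothetical function name.

```python
import os
import re


def available_years(folder):
    """Return the sorted list of years found as <year>.xlsx in `folder`.

    Folders and files with another extension are skipped. An .xlsx file
    whose name is not a four-digit year raises ValueError.
    """
    years, bad_names = [], []
    for name in sorted(os.listdir(folder)):
        if not os.path.isfile(os.path.join(folder, name)):
            continue
        root, ext = os.path.splitext(name)
        if ext.lower() != ".xlsx":
            continue
        if re.fullmatch(r"\d{4}", root):
            years.append(int(root))
        else:
            bad_names.append(name)
    if bad_names:
        raise ValueError(
            "check the filenames in %s: %s" % (folder, ", ".join(bad_names))
        )
    return years
```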

Stats.json

For the sidebar stats, create an extra JSON with stats. First version can include stats by region:

avg, min, max per parameter + score.
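
A sketch of the aggregation, assuming the input is a flat list of (region, id, value) tuples where the id is either a parameter id or "score". The region ids in the usage example are placeholders and `region_stats` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def region_stats(rows):
    """Return {region: {id: {"avg": ..., "min": ..., "max": ...}}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for region, key, value in rows:
        grouped[region][key].append(value)
    return {
        region: {
            key: {"avg": mean(values), "min": min(values), "max": max(values)}
            for key, values in by_key.items()
        }
        for region, by_key in grouped.items()
    }


# Usage with placeholder region ids:
stats = region_stats([
    ("region-a", "score", 1.0),
    ("region-a", "score", 3.0),
    ("region-b", "1", 2.5),
])
```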

Include topojson in make file

The topojson used on the main map should be included in the data wrangling process.

Also -> lower-case the iso codes

@dereklieu We'll take care of this after the launch, but what does the Topojson do / how did you build it?