climatescope-data's Introduction

This script processes the data for the Global Climatescope and prepares it for use on the website. It works like a makefile: every run rebuilds the full dataset.

Usage

A quick overview of the process:

  1. Provide source data
    The source data is stored in the source folder.
  2. Run script
    python cs-core.py
  3. Move output to Jekyll site structure

Source data

Core CS data

The script expects the core Climatescope data to be provided in .xlsx format and stored in source/cs-core. The data for each edition should be in a separate file named after its year. For example: source/cs-core/2013.xlsx and source/cs-core/2014.xlsx.

These files should contain the following sheets:

  • score
  • param
  • ind

Each of these sheets should have the following columns:

column  description
id      Contains the id of the score, param or indicator
iso     The ISO 3166 code of the country, state or province
score   The score

Any extra columns or sheets will be ignored by the script.

Notes:

  • the header of the different sheets should not have filters enabled
  • the structure of the files was proposed by BNEF for the first edition of the Global Climatescope

Sources

  • Shapefiles: Natural Earth
  • Country and state capitals: Wikipedia

climatescope-data's People

Contributors

olafveerman

climatescope-data's Issues

Print column headers CSV

The current CSV headers contain ids. For example:

name:en_var,type_var

Replace these with human-readable labels.
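
A minimal sketch of the substitution, assuming a lookup from header ids to labels is available. The LABELS mapping and its label text are hypothetical; in practice they would presumably come from the meta file.

```python
import csv
import io

# Hypothetical mapping from header ids to human-readable labels.
LABELS = {
    "name:en_var": "Name",
    "type_var": "Type",
}


def relabel_header(header, labels):
    """Return the header row with known ids replaced by their labels.

    Ids without a label are kept unchanged so no column is lost.
    """
    return [labels.get(column, column) for column in header]


def write_csv(rows, header, labels):
    """Write rows to a CSV string, using readable labels in the header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(relabel_header(header, labels))
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping unknown ids as they are makes a missing label visible in the output instead of silently dropping the column.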

Static maps script

Enhancements needed before handing in the script to the CS team:

  • general cleanup and optimization of code
  • documentation
  • check if Natural Earth shapefiles exist and if not, download them
  • the script currently runs TileMill from the standard Ubuntu install folder

Add regions

This was included in a previous iteration, but has since been removed:

  # Add region for the countries
  # (.loc replaces the .ix indexer, which was removed from pandas)
  if df_meta_aa.loc[aa, 'type'] == 'country':
    aa_region = {}
    region = df_meta_aa.loc[aa, 'region']
    # Add the id of the region
    aa_region['id'] = region
    # Fetch the name of the region from the meta file
    aa_region['name'] = df_meta_aa.loc[region, 'name:' + lang]
    aa_data['region'] = aa_region

Support unit localization

The country profile script should support localized units.
For example: billion => mil millones
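
One possible shape for this is a per-language lookup with a fallback to the original unit. This is only a sketch: the UNIT_TRANSLATIONS table and the localize_unit helper are hypothetical names, and the single Spanish entry is the example from this issue.

```python
# Hypothetical translation table; the Spanish entry is the example from this issue.
UNIT_TRANSLATIONS = {
    "es": {"billion": "mil millones"},
}


def localize_unit(unit, lang, translations=UNIT_TRANSLATIONS):
    """Return the unit translated into lang, falling back to the original unit."""
    return translations.get(lang, {}).get(unit, unit)
```

Falling back to the untranslated unit keeps the profile usable for languages or units that have no translation yet.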

Inactive indicators should not be ranked

For on-grid countries, some of the indicators are not used to calculate the score. In the app, these will be greyed out.

On the other hand, there are a lot of indicators with a 0 score.

Neither should be ranked.
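
A rough sketch of a ranking that leaves out both cases. It assumes the scores for one indicator are available as a dict keyed by ISO code and that the set of inactive entries is known; rank_indicators is a hypothetical helper, not part of cs-core.py.

```python
def rank_indicators(scores, inactive=()):
    """Rank ISO codes by score, highest first.

    scores: dict mapping ISO code -> score for one indicator.
    inactive: ISO codes for which the indicator is not used.
    Entries that are inactive or have a 0 score get a rank of None.
    Ties are not handled specially: equal scores get consecutive ranks.
    """
    ranked = sorted(
        (iso for iso, score in scores.items() if iso not in inactive and score != 0),
        key=lambda iso: scores[iso],
        reverse=True,
    )
    ranks = {iso: None for iso in scores}
    for position, iso in enumerate(ranked, start=1):
        ranks[iso] = position
    return ranks
```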

Add support for missing data for a country

For cs-core.py:

If there is a country in the meta file with administrative areas, but this country is not present in one of the core data files, pandas will throw an error when selecting an object in the dataframe.

With CD available in the metadata, but not in the core data, this:

df_main.ix['CD',0]

results in:

KeyError: 'CD'

This can be an issue in the future when not all countries have data for all years.
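
A small sketch of guarding the lookup, assuming the dataframe is indexed by ISO code. The toy df_main and the get_score helper are stand-ins for illustration; the real frame in cs-core.py may be structured differently.

```python
import pandas as pd

# Toy stand-in for one year of core data, indexed by ISO code.
df_main = pd.DataFrame({"score": [1.8, 2.3]}, index=["BR", "CL"])


def get_score(df, iso):
    """Return the score for iso, or None when the country has no data."""
    if iso not in df.index:
        return None
    return df.loc[iso, "score"]
```

An alternative is to call DataFrame.reindex with the ISO codes from the meta file, which adds the missing countries as rows filled with NaN.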

Add ranks for param and indicators

Countries:

  • calculate the global ranks for all parameters and indicators

States / provinces:

  • calculate the in-country ranks for parameters and indicators.

China on static maps

Check what's up with China on the static map. Wrong calc because of Northern hemisphere?

CS Core optimizations

  • build_main_json
    No real use anymore for this function
  • build_json_aa
    Slice the full dataframe in the beginning and do the subsequent .loc only on the id of the parameter/indicator

Build CSV

After refactoring some pieces, the CSVs no longer build.

Selections like:

df_param = df_full.loc[(slice(None),param),:]

raise a:

KeyError: '1'

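This probably has to do with a type mismatch between the lookup key and the index values (for example, a string key against an integer index level).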
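
The sketch below shows one way such a mismatch can be resolved, under the assumption that the id level was parsed as integers while the key is a string. The toy df_full and the with_string_ids helper are illustrative only.

```python
import pandas as pd

# Toy frame where the id level was parsed as integers (an assumption about
# the cause; the real frame may differ).
df_full = pd.DataFrame(
    {"score": [1.5, 2.0, 0.7, 1.1]},
    index=pd.MultiIndex.from_tuples(
        [("br", 1), ("br", 2), ("cl", 1), ("cl", 2)], names=["iso", "id"]
    ),
)


def with_string_ids(df):
    """Return a copy whose 'id' index level holds strings instead of integers."""
    flat = df.reset_index()
    flat["id"] = flat["id"].astype(str)
    # Sorting the index keeps slice-based selections on the MultiIndex valid.
    return flat.set_index(["iso", "id"]).sort_index()


df_fixed = with_string_ids(df_full)
# The string key now matches the values in the id level.
df_param = df_fixed.loc[(slice(None), "1"), :]
```

Converting the key instead (for example int(param)) would work as well; what matters is that both sides use the same type.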

Improve rank

Define what to do with ties (Bloomberg will get back with their point of view) and how to deal with 0 values.
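
To make the options for ties concrete, here are two common conventions side by side. This is an illustration only; which one to use is still the open question above, and both helpers are hypothetical.

```python
def competition_ranks(scores):
    """Rank descending; tied scores share a rank and the next rank is skipped."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(score) + 1 for score in scores]


def dense_ranks(scores):
    """Rank descending; tied scores share a rank and no rank is skipped."""
    distinct = sorted(set(scores), reverse=True)
    return [distinct.index(score) + 1 for score in scores]
```

For the scores 2.5, 2.0, 2.0 and 1.0, the first convention gives the last entry rank 4 and the second gives it rank 3.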

Add ranking

When loading the full dataframe, we can determine the rankings for the administrative regions:

Countries:

  • global ranking
  • regional rankings

States:

  • country ranking
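
The rankings listed above are all the same computation with a different grouping key. A sketch, assuming scores and group membership are available as plain dicts; rank_within_groups, the scores and the region ids are hypothetical and for illustration only.

```python
from collections import defaultdict


def rank_within_groups(scores, groups):
    """Rank ids by score (highest first) within each group.

    scores: dict mapping id -> score.
    groups: dict mapping id -> group key (region for countries, country for states).
    Returns a dict mapping id -> rank inside its group.
    """
    members = defaultdict(list)
    for key in scores:
        members[groups[key]].append(key)

    ranks = {}
    for keys in members.values():
        ordered = sorted(keys, key=lambda k: scores[k], reverse=True)
        for position, key in enumerate(ordered, start=1):
            ranks[key] = position
    return ranks


# Hypothetical scores and region ids, for illustration only.
scores = {"br": 2.1, "cl": 1.9, "ke": 2.4, "ug": 1.7}
regions = {"br": "lac", "cl": "lac", "ke": "africa", "ug": "africa"}

regional = rank_within_groups(scores, regions)
# A global ranking is the same computation with every id in one group.
overall = rank_within_groups(scores, {key: "all" for key in scores})
```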

Static maps

Needed for production:

  • India states not being generated. NE's state data for India doesn't have the correct ISO codes.
  • Optimize the PNGs

Sort arrays

Whenever we have an array with data for multiple years, it should be sorted by year in descending order. This will result in a minor performance enhancement since we won't have to do this client-side.

E.g.:

{
  "data": [
    {
      "value": 2.5,
      "year": 2015
    },
    {
      "value": 2.2,
      "year": 2014
    }
  ]
}
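
A minimal sketch of the sort, assuming each entry is a dict with a "year" key as in the example above; sort_by_year_desc is a hypothetical helper name.

```python
def sort_by_year_desc(entries):
    """Return the yearly entries sorted by year, most recent first."""
    return sorted(entries, key=lambda entry: entry["year"], reverse=True)


data = sort_by_year_desc([
    {"value": 2.2, "year": 2014},
    {"value": 2.5, "year": 2015},
])
```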

Lowercase ISO

ISO codes are lower-cased throughout the application. Currently this is not being dealt with in a very pretty way. Improve this, taking into account that BNEF provides data with ISO upper-cased.
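
One option is to normalize the codes once, at the point where the BNEF data is read, so the rest of the code only ever sees lower-case values. A sketch with a hypothetical helper; stripping surrounding whitespace is an extra assumption about the spreadsheet cells.

```python
def normalize_iso(code):
    """Return an ISO code stripped of surrounding whitespace and lower-cased."""
    return code.strip().lower()
```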

Add 'real data'

For some indicators, Bloomberg is now providing actual data to illustrate the scores. For example:

indicator                        score  data
Biofuels production              0.031  2.2 billion litres
Average electricity spot prices  0.027  71.4549 USD/MWh

This has to be added to the JSON in the following way:

"data": [
  {
    "overall_ranking": 9,
    "regional_ranking": 5,
    "value": 0.02615,
    "raw": {
      "value": 72.421,
      "unit": "USD / Kwh"
    }
  }
],

The units are stored in the meta file so they can be translated.
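
A sketch of building such an entry, where the "raw" block is only added for indicators that have actual data. The build_entry helper and its parameter names are hypothetical; the values are the ones from the example above.

```python
import json


def build_entry(value, overall_ranking, regional_ranking, raw_value=None, unit=None):
    """Build one 'data' entry; the 'raw' block is added only when raw data exists."""
    entry = {
        "overall_ranking": overall_ranking,
        "regional_ranking": regional_ranking,
        "value": value,
    }
    if raw_value is not None:
        entry["raw"] = {"value": raw_value, "unit": unit}
    return entry


entry = build_entry(0.02615, 9, 5, raw_value=72.421, unit="USD / Kwh")
print(json.dumps({"data": [entry]}, indent=2))
```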

Build list with available years

Needs improvement in the final version:

To check how many years of data we have, we loop over the contents of the source/cs-core folder and parse the filename.

To improve:

  • check only for files
  • only consider spreadsheet files (.xlsx)
  • only well-formatted years should be stored; otherwise, show an error message asking to check the filenames
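
The checks above could be sketched as follows. This assumes the .xlsx extension described for the source files and treats a four-digit file name as a well-formatted year; available_years and YEAR_PATTERN are hypothetical names.

```python
import re
from pathlib import Path

# Assumption: a well-formatted year is exactly four digits.
YEAR_PATTERN = re.compile(r"^\d{4}$")


def available_years(folder):
    """Return the sorted list of years found in folder.

    Only regular files with an .xlsx extension are considered. A file whose
    name is not a four-digit year raises a ValueError asking to check the names.
    """
    years = []
    for path in Path(folder).iterdir():
        if not path.is_file() or path.suffix.lower() != ".xlsx":
            continue
        if not YEAR_PATTERN.match(path.stem):
            raise ValueError(
                f"Unexpected file name {path.name!r}: please check the "
                f"file names in {folder}"
            )
        years.append(int(path.stem))
    return sorted(years)
```

Calling available_years("source/cs-core") would then return the editions found in that folder.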

Stats.json

For the sidebar stats, create an extra JSON with stats. First version can include stats by region:

avg, min, max per parameter + score.
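
A sketch of the per-region computation for a single parameter (or the overall score), assuming the values have already been grouped by region. The region_stats helper and the output keys are hypothetical.

```python
from statistics import mean


def region_stats(values_by_region):
    """Compute avg, min and max for each region.

    values_by_region: dict mapping region id -> list of values for one
    parameter (or the overall score).
    """
    return {
        region: {"avg": mean(values), "min": min(values), "max": max(values)}
        for region, values in values_by_region.items()
        if values  # skip regions without data
    }
```

The resulting dict can be passed to json.dump to write the stats file.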

Include topojson in make file

The TopoJSON used on the main map should be included in the data wrangling process.

Also: lower-case the ISO codes.

@dereklieu We'll take care of this after the launch, but what does the Topojson do / how did you build it?
