peterb / vz-coding-challenge

Rails app that provides a rake task for importing a CSV file of CO2 emissions data into a PostgreSQL database. Once imported, the data can be filtered by Sector, Year, and Territory (classified as Country, Region, or Federation).

Ruby 83.82% JavaScript 2.53% CSS 1.52% HTML 12.13%
climate-change geography economic-data

vz-coding-challenge's Introduction

README

Setup

To get the application up and running you'll need to install:

  • Ruby 2.3.2
  • PostgreSQL
  • Bundled gems and their native extensions which compile to binary on your platform.

You'll also have to create a role named after your system username, and a database for the app owned by that user in PostgreSQL via psql. In order to run the tests, you will also need to create the test database.

Running Tests

You can run them together:

rails test

Or separately:

rails test test/integration/interface_to_filterable_charts_test.rb;
rails test test/models/sector_test.rb;
rails test test/models/emission_test.rb;
rails test test/models/period_test.rb;
rails test test/models/territory_test.rb

Fixtures are used and are present for all models.

Loading Data With Rake

To list the available Rake tasks that load the CSV file into the database, run:

rake -T | grep co2

To import the data, first drop, create and run migrations as needed:

rake db:drop
rake db:create
rake db:migrate

Then run the co2data load task:

rake co2data:load

It will log general operation straight to the console, and detailed information to a logfile in:

~/Downloads/co2data_load.log

The task will take a few hours to complete, depending on the hardware it's run on and the size of the CO2 CSV data file to be loaded, which should be placed in:

~/Downloads/emissions.csv

The data loading doesn't use much RAM, about 2MB, but CPU load spikes very quickly when starting the task. Initial benchmarks on a modern laptop are:

  • 2m, row 66 = 1.818s (seconds) per row
  • 4m, row 155 = 1.548s per row
  • 12m, row 500 = 1.44s per row
  • 27m, row 1165 = 1.39s per row

This means the load task would take around 2.1 hours for the sample data set, and around 21 hours for the full data set. Although feasible, this means waiting nearly a full day to load data. Optimizations have been applied to the way in which Emission data points are created and saved to the database:

With a single database transaction, the benchmark is:

2m row 100, 1.2s per row

Or about 1.7 hours for the sample data set, and about 16.7 hours for the whole dataset.
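
The projections in this section are the measured per-row cost multiplied by the number of rows. A minimal sketch of that arithmetic follows; the method names are made up for illustration, the row counts are the dataset sizes quoted in this README (the full dataset is "just under" the figure used), and the timings are two of the benchmarks above.

```ruby
# Dataset sizes quoted in this README; the full dataset is "just under" this figure.
SAMPLE_ROWS = 5_000
FULL_ROWS   = 50_000

# Average cost of one row, given a wall-clock time and the rows loaded in it.
def seconds_per_row(elapsed_seconds, rows_loaded)
  elapsed_seconds.to_f / rows_loaded
end

# Projected duration in hours for loading total_rows at the given per-row cost.
def projected_hours(total_rows, per_row_seconds)
  total_rows * per_row_seconds / 3600.0
end

# [elapsed seconds, rows loaded] pairs taken from the benchmarks above.
BENCHMARKS = {
  "row by row (12m, 500 rows)"        => [720, 500],
  "single transaction (2m, 100 rows)" => [120, 100]
}.freeze

BENCHMARKS.each do |label, (elapsed, rows)|
  per_row = seconds_per_row(elapsed, rows)
  puts format("%-36s %.2fs/row, sample %.1fh, full %.1fh",
              label, per_row,
              projected_hours(SAMPLE_ROWS, per_row),
              projected_hours(FULL_ROWS, per_row))
end
```

These are straight-line extrapolations from short runs, so they are only a rough guide to how long a full load takes.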

With fewer INSERT statements, the benchmark is:

53s row 100 = 0.53s per row

Or about 0.74 hours for the sample data set and about 7.4 hours for the whole dataset, which means that data can be reloaded overnight on a single CPU.
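
One common way to reduce the number of INSERT statements is to send many rows in a single multi-row INSERT. The sketch below is not the code used by the rake task; it only shows the idea in plain Ruby. The method name build_bulk_insert, the table name and the column names are all hypothetical.

```ruby
# Build one parameterised multi-row INSERT statement for PostgreSQL.
# Table and column names are hypothetical and must come from trusted code,
# never from the CSV file, because they are interpolated into the SQL text.
def build_bulk_insert(table, columns, rows)
  raise ArgumentError, "no rows to insert" if rows.empty?

  placeholder = 0
  tuples = rows.map do |row|
    unless row.size == columns.size
      raise ArgumentError, "expected #{columns.size} values, got #{row.size}"
    end
    # Number the placeholders $1, $2, ... across all rows.
    "(" + row.map { "$#{placeholder += 1}" }.join(", ") + ")"
  end

  sql = "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{tuples.join(', ')}"
  [sql, rows.flatten(1)]
end

# Two made-up rows: foreign keys plus a tonnage value.
rows = [[1, 1, 1, 12.5], [1, 1, 2, 7.25]]
sql, binds = build_bulk_insert("emissions",
                               %w[territory_id sector_id period_id tonnage],
                               rows)
puts sql
puts binds.inspect
```

The resulting statement and bind values could then be passed to the database driver in one call, for example to exec_params in the pg gem, so that each batch costs one round trip instead of one per row.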

What this app does and what it will be used for.

The test dataset for this app is 5'000 rows long.

It will be used to import a large dataset (just under 50'000 rows) of carbon dioxide (CO2) emissions by country, sector and region from 1850 until 2014. The data arrives as a CSV file in an as yet unknown format and is imported into Rails models, with tables created via migrations, for further analysis and use in filterable charts.
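
Since the real CSV format is not known yet, the following is only a sketch of the parsing step using Ruby's csv library. The column names (territory, sector, year, tonnes) and the sample values are invented for illustration; the territory codes and sector names are borrowed from the test ideas elsewhere in this README.

```ruby
require "csv"

# Invented sample data; the real file layout is still unknown.
SAMPLE_CSV = <<~CSV
  territory,sector,year,tonnes
  IRL,Mineral Products,1982,1200.5
  EU28,Agriculture,2014,98000
CSV

# Turn CSV text into an array of hashes with typed values.
# Integer() and Float() raise on malformed input instead of silently
# returning 0, which surfaces bad rows early.
def parse_emissions(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    {
      territory: row["territory"],
      sector:    row["sector"],
      year:      Integer(row["year"]),
      tonnes:    Float(row["tonnes"])
    }
  end
end

parse_emissions(SAMPLE_CSV).each { |record| puts record.inspect }
```

For a file with tens of thousands of rows, CSV.foreach reads one row at a time instead of loading the whole file into memory.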

App architecture.

There are four objects in the data to be imported via CSV: Sector, Emission, Territory and Period.

Units used for credits and debits are tonnes. Emissions are a liability, so crediting them increases, and debiting them decreases, any aggregate.

I've modeled them as accounting transactions to reduce ambiguity and to safeguard operations on the negative and positive numbers present in the data.
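
As a rough illustration of the credit/debit idea, the sketch below uses a plain Ruby Struct rather than the app's actual models. The names Entry, signed_tonnes and net_tonnes are hypothetical, and the figures are invented.

```ruby
# Hypothetical ledger entry: a kind (:credit or :debit) and a
# non-negative amount in tonnes of CO2.
Entry = Struct.new(:kind, :tonnes) do
  # Credits increase the emissions aggregate, debits decrease it.
  def signed_tonnes
    case kind
    when :credit then tonnes
    when :debit  then -tonnes
    else raise ArgumentError, "unknown entry kind: #{kind.inspect}"
    end
  end
end

# Net emissions for a collection of entries.
def net_tonnes(entries)
  entries.map(&:signed_tonnes).reduce(0, :+)
end

entries = [
  Entry.new(:credit, 120),
  Entry.new(:credit, 30),
  Entry.new(:debit, 45)
]
puts net_tonnes(entries) # prints 105
```

Keeping the stored amount non-negative and carrying the sign in the entry kind means a negative number in the source data has to be turned into an explicit debit, which is where the ambiguity is removed.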

The floating point features of Ruby and PostgreSQL, depending on their respective versions, and model validations, ought to cover the rest of the data integrity and correctness aspects of storage and computation.

With a default PostgreSQL setup as described above, you can leave the database.yml file as-is.

Other things that will be covered here soon:

  • System dependencies

  • Configuration

  • Database initialization

  • Services (job queues, cache servers, search engines, etc.)

  • Deployment instructions

  • ...

vz-coding-challenge's Issues

Logfile safety.

The log file ought to be closed after each line is written, not left open.
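
One possible approach, sketched in plain Ruby: the block form of File.open closes the file when the block exits, even if the write raises. The helper name log_line is hypothetical, and the demo writes to a temporary directory instead of ~/Downloads.

```ruby
require "tmpdir"

# Append one timestamped line to the log and close the file before returning.
def log_line(path, message)
  File.open(path, "a") do |file|
    file.puts("#{Time.now.utc.strftime('%Y-%m-%dT%H:%M:%SZ')} #{message}")
  end
end

# Demo against a temporary directory rather than ~/Downloads.
Dir.mktmpdir do |dir|
  path = File.join(dir, "co2data_load.log")
  log_line(path, "row 1 imported")
  log_line(path, "row 2 imported")
  puts File.read(path)
end
```

Reopening the file for every line costs a little extra per row, but an interrupted load then never leaves an open handle or unflushed lines behind.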

World CO2 emissions in tonnes for 2014 match a public website.

I.e.

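test "World CO2 emissions in tonnes for 2014 match www.co2.earth/global-co2-emissions." do
  # Expected total for the dataset that is currently imported.
  expected = 3.59e8
  results = Territory.where(code: 'WORLD').years.where(name: '2014').emissions.sum(:retrieve_tonnage)
  assert_in_delta(expected, results, 0.5e8, explain_test_failure(expected))
end

# Builds the failure message from the same value the assertion uses,
# so the two cannot drift apart.
def explain_test_failure(expected)
  "Expected total_co2_emissions for 2014 to be close to #{expected}."
end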

Tests will have to be changed to reflect the larger dataset when it's imported. The expected value is currently out by a factor of 10,
i.e. 3.59e8 will have to be changed to 3.59e9.

Filter Carbon Emissions (CO2) by country, sector, and region, for use in charts.

Data from 1850 until 2014 will be used. Some ideas along these lines:

test "Emissions can be filtered by sector only" do
  results = Sector.where(name: 'Agriculture').emissions.sum(:retrieve_tonnage)
  assert_not_nil(results)
end

test "Emissions can be filtered by territory, sector, and year." do
  results = Territory.where(code: 'IRL').years.where(name: '1982').sectors.where(name: 'Mineral Products').emissions.sum(:retrieve_tonnage)
  assert_not_nil(results)
end

test "Emissions can be filtered by territory and sector." do
  results = Territory.where(code: 'IRL').sectors.where(name: 'Mineral Products').emissions.sum(:retrieve_tonnage)
  assert_not_nil(results)
end

test "Emissions can be filtered by territories such as countries." do
  results = Territory.where(code: 'IRL').years.where(name: '2014').emissions.sum(:retrieve_tonnage)
  assert_not_nil(results)
end

test "Emissions can be filtered by territories such as federations of states." do
  results = Territory.where(code: 'EU28').years.where(name: '2014').emissions.sum(:retrieve_tonnage)
  assert_not_nil(results)
end

We also need to consider how carbon dioxide emissions in the data will be computed, given
the debit/credit focus of the app.

For example, emission1 + emission2 would yield the net carbon dioxide. Likewise, Period.where(year: 2014).emissions.sum would yield net CO2 in 2014.

Depending on the Ruby version, addition could be implemented on the Emission class in order to use
the sum method of Enumerable; a mixin may come into play here.
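
A minimal sketch of that idea, using a plain Ruby class as a stand-in for the real Emission model. The tonnes attribute and the figures are assumptions for illustration; reduce is used here because it works regardless of Ruby version.

```ruby
# Plain Ruby stand-in for the Emission model, holding a signed tonnage.
class Emission
  attr_reader :tonnes

  def initialize(tonnes)
    @tonnes = tonnes
  end

  # Adding two emissions yields a new Emission carrying the net tonnage.
  def +(other)
    Emission.new(tonnes + other.tonnes)
  end
end

emissions = [Emission.new(10), Emission.new(-4), Emission.new(2.5)]

# Start from a zero Emission so an empty list still returns an Emission.
net = emissions.reduce(Emission.new(0), :+)
puts net.tonnes # prints 8.5
```

On Ruby 2.4 and later, emissions.sum(Emission.new(0)) gives the same result, because sum with an initial value relies only on the + method.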
