This project forked from datagovuk/dpkg-uk25k

UK departmental spending above 25000 GBP

Home Page: http://data.gov.uk/openspending

Python 100.00%

UK departmental spending over GBP 25000

This repository contains scripts to acquire, clean and process the spending information released by the UK central government.

ETL stages

The scripts have several stages that need to be run in order:

  • build_index: finds all related metadata (tagged spend-transactions) on data.gov.uk
  • retrieve: tries to fetch all the files
  • extract: attempts to parse CSV/XLS/... and load the contents into a DB
  • combine: maps column names and stores the values in one central table
  • cleanup
  • validate
  • report: creates the report HTML

Setup

First clone this repo:

git clone https://github.com/okfn/dpkg-uk25k.git

You need to install the dependencies (best in a python virtual environment):

virtualenv pyenv-dpkg-uk25k
pyenv-dpkg-uk25k/bin/pip install -r requirements.txt

The default configuration is in default.ini. If you want to change the configuration, copy it to config.ini and edit the copy.
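
One way to read such a layout is sketched below in Python, under the assumption that config.ini, when present, replaces default.ini entirely. The function name load_config is hypothetical, and the scripts in this repository may load their settings differently.

```python
import configparser
from pathlib import Path


def load_config(directory="."):
    """Read config.ini if it exists, otherwise fall back to default.ini.

    Hypothetical helper: returns the parsed settings and the file name used.
    """
    directory = Path(directory)
    custom = directory / "config.ini"
    chosen = custom if custom.exists() else directory / "default.ini"
    parser = configparser.ConfigParser()
    # read_file() raises if the file is missing; read() would skip it silently.
    with open(chosen) as handle:
        parser.read_file(handle)
    return parser, chosen.name
```

Calling load_config() from the repository directory returns the settings together with the name of the file they came from, which makes it easy to confirm that an edited config.ini is actually being picked up.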

Before you can run the scripts you need to prepare a database:

sudo -u postgres createdb uk25k

Now create a postgres user for your unix user name:

sudo -u postgres createuser -D -R -S $USER

And allow access to the database by editing /etc/postgresql/9.1/main/pg_hba.conf and adding this line:

local uk25k all trust
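
The fields of that line are, in order, the connection type, the database, the user and the authentication method. The short Python sketch below simply names them; parse_local_rule is a hypothetical helper for illustration and is not part of PostgreSQL or of this repository.

```python
from collections import namedtuple

HbaRule = namedtuple("HbaRule", "conn_type database user method")


def parse_local_rule(line):
    """Split a `local` pg_hba.conf record into its four leading fields."""
    fields = line.split()
    if len(fields) < 4 or fields[0] != "local":
        raise ValueError("expected: local DATABASE USER METHOD")
    return HbaRule(*fields[:4])


rule = parse_local_rule("local uk25k all trust")
print(rule.database, rule.user, rule.method)
```

Here local covers Unix-socket connections, all matches any database user, and trust skips password checks for the uk25k database, so this setting is best kept to a development machine.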

Now restart postgres:

sudo service postgresql restart

Running the scripts

Run the scripts like this:

. pyenv-dpkg-uk25k/bin/activate
cd dpkg-uk25k
python build_index.py
python retrieve.py
python extract.py
python combine.py
python cleanup.py
python validate.py
python report.py reports

Or do the whole lot together:

python build_index.py && python retrieve.py && python extract.py && python combine.py && python cleanup.py && python validate.py && python report.py reports
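
The same chain can be driven from Python. The sketch below mimics && by stopping at the first script that exits with a non-zero status. The names run_pipeline and run are hypothetical, and the code assumes it is started from the repository directory with the virtualenv active.

```python
import subprocess
import sys

# Script order taken from the commands above.
COMMANDS = [
    ["build_index.py"],
    ["retrieve.py"],
    ["extract.py"],
    ["combine.py"],
    ["cleanup.py"],
    ["validate.py"],
    ["report.py", "reports"],
]


def run_pipeline(commands=COMMANDS, run=subprocess.call):
    """Run each script in turn; return the first failing command, or None."""
    for args in commands:
        # Use the current interpreter so the active virtualenv is respected.
        status = run([sys.executable] + args)
        if status != 0:
            return args
    return None
```

run_pipeline() returns None when every stage succeeds, and otherwise the arguments of the stage that failed, so a wrapper can report where the run stopped. The run parameter exists only so the function can be exercised without launching the real scripts.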

Before running the scripts again, be sure to clear out old data, either from the issues table alone or from all tables. To clear all tables, drop and recreate the database:

sudo -u postgres dropdb uk25k
sudo -u postgres createdb uk25k

To limit the analysis to one publisher, specify the name as a parameter to build_index:

python build_index.py wales-office
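
As a rough illustration of how the optional publisher name changes the command sequence, here is a Python sketch. build_commands is a hypothetical helper, and it assumes that only build_index takes the publisher name, as described above.

```python
def build_commands(publisher=None, report_dir="reports"):
    """Return the pipeline commands, optionally limited to one publisher."""
    index = ["python", "build_index.py"]
    if publisher:
        # Only the first stage receives the publisher name.
        index.append(publisher)
    later = ["retrieve", "extract", "combine", "cleanup", "validate"]
    commands = [index] + [["python", name + ".py"] for name in later]
    commands.append(["python", "report.py", report_dir])
    return commands


for command in build_commands("wales-office"):
    print(" ".join(command))
```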

Open Issues

?

Punted

  • PDFs
  • Zip files containing a bunch of CSVs (potentially for a number of publishers)

Contributors

asuffield, pudo, rossjones
