
datasets / publicbodies


A database of public bodies such as government departments, ministries etc.

Home Page: http://publicbodies.org

License: MIT License


publicbodies's Introduction

Data

A database of public bodies (or organizations):

Government-run or government-controlled organizations or entities, which may or may not have a distinct corporate existence.

Examples are:

  • Government Ministries or Departments
  • State-run health organizations
  • Police and fire departments

Visit the site: https://publicbodies.org/

Data

Data is stored as CSV files in the data folder, partitioned by country or region (e.g. EU). Each file is named after the two-letter ISO code of its country or region.
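A minimal Python sketch of reading that layout, assuming the files live in data/ relative to the repository root, are UTF-8 encoded and start with a header row. The function name load_bodies is hypothetical.

```python
import csv
from pathlib import Path


def load_bodies(data_dir="data"):
    """Return a dict mapping each file's code (its stem, e.g. 'br') to its rows."""
    bodies = {}
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            # DictReader uses the header row as the keys of each row dict
            bodies[path.stem] = list(csv.DictReader(f))
    return bodies
```

Calling load_bodies() from the repository root gives per-country row lists, e.g. len(load_bodies()["br"]) for the Brazilian file.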

Contribute data

Please just add a CSV file and submit a pull request or open an issue.

The fields required in the CSV file are listed in public-body-schema.json. You can also look at the existing data in data/ for hints. To learn more about Data Packages, visit https://specs.frictionlessdata.io/.
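A sketch of a header check to run before opening a pull request. It assumes public-body-schema.json follows the Frictionless Table Schema layout (a "fields" list whose entries may carry "constraints": {"required": true}); that layout is an assumption here, so check the actual file.

```python
import csv
import io
import json


def required_fields(schema):
    """Names of fields marked required in a Table Schema-style dict (assumed layout)."""
    return [
        field["name"]
        for field in schema.get("fields", [])
        if field.get("constraints", {}).get("required", False)
    ]


def missing_columns(schema, csv_text):
    """Return the required field names that are absent from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [name for name in required_fields(schema) if name not in header]


def check_file(schema_path, csv_path):
    """Convenience wrapper that reads both files from disk."""
    with open(schema_path, encoding="utf-8") as f:
        schema = json.load(f)
    with open(csv_path, newline="", encoding="utf-8") as f:
        return missing_columns(schema, f.read())
```

An empty list from check_file("public-body-schema.json", "data/xx.csv") means every required column is present (xx.csv stands for your new file).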

If you can, developing a bot to automatically and periodically collect the data is even better.

For developers of the website

The website is a Jekyll site. To get it running locally:

  1. Install Docker.

  2. Get the code

    git clone https://github.com/okfn/publicbodies
    cd publicbodies

  3. Run Jekyll

    cd website
    export JEKYLL_VERSION=4.2.0
    docker run --rm --volume="$PWD:/srv/jekyll" -it jekyll/minimal:$JEKYLL_VERSION jekyll build --baseurl $PWD/_site/ --watch

    The built website will appear in the website/_site folder.

The list of outstanding issues is at: https://github.com/okfn/publicbodies/issues

For developers of data collector bots

Data is kept up to date automatically by bots that collect and update it once a week. The scripts live in the scripts/import directory, in a subdirectory named after the two-letter country code (e.g. br for Brazil, it for Italy).

The script MUST be runnable from a command line interface. It should display the available options if run with the --help parameter, and output data to the file chosen by the --output parameter. For example:

python3 scripts/import/br/import_br.py --help
usage: import_br.py [-h] [--output file_name]

Imports Brazilian public body data from the official source and complements it
with data from several auxiliary sources. Official source: [SIORG's open data
API](https://dados.gov.br/dataset/siorg)

optional arguments:
  -h, --help          show this help message and exit
  --output file_name  filename for the data output as CSV
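A skeleton for a new bot that follows these conventions with the standard argparse module. collect_rows is a hypothetical placeholder for the real scraping code, and writing to stdout when --output is omitted is an assumption, not a documented requirement. The exact help heading ("optional arguments:" or "options:") depends on the Python version.

```python
import argparse
import csv
import sys


def build_parser():
    parser = argparse.ArgumentParser(
        description="Imports public body data for one country (placeholder text)."
    )
    # metavar makes the help line read "--output file_name", as in the example above
    parser.add_argument(
        "--output",
        metavar="file_name",
        help="filename for the data output as CSV",
    )
    return parser


def collect_rows():
    """Hypothetical stand-in for the real data collection logic."""
    return [{"id": "xx/example", "name": "Example Ministry"}]


def write_csv(rows, stream):
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


def main(argv=None):
    args = build_parser().parse_args(argv)
    rows = collect_rows()
    if args.output:
        with open(args.output, "w", newline="", encoding="utf-8") as f:
            write_csv(rows, f)
    else:
        write_csv(rows, sys.stdout)  # assumed fallback


if __name__ == "__main__":
    main()
```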

When making requests, bots MUST use the Public Bodies Bot user agent string to identify themselves to servers:

PublicBodiesBot (https://github.com/okfn/publicbodies)
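A sketch of sending that header with the standard library's urllib.request. If scripts/requirements.txt already includes requests, passing headers={"User-Agent": USER_AGENT} to requests.get does the same and better fits the guideline on reusing the existing dependencies.

```python
import urllib.request

USER_AGENT = "PublicBodiesBot (https://github.com/okfn/publicbodies)"


def make_request(url):
    """Build a request that identifies the bot with the project's user agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url, timeout=30):
    """Download a URL, always sending the bot's user agent."""
    with urllib.request.urlopen(make_request(url), timeout=timeout) as response:
        return response.read()
```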

If using Python, use the libraries already defined in scripts/requirements.txt to keep the project dependencies tidy, and only add new ones if strictly necessary.

After creating a new bot, make sure to add it to the update data workflow so that it runs regularly and keeps the data up-to-date.


Original preparation

Details of the automated data extraction to build the original database.

Data sources:

publicbodies's People

Contributors

andylolz, augusto-herrmann, danfowler, dependabot[bot], github-actions[bot], hannesgassert, ljoelle, nikeshbalami, okfngr, pudo, roll, rufuspollock, todrobbins, traversaro, wombleton


publicbodies's Issues

Specify source code license

Is the license for the source code of this project (not the data, as that is a separate issue) specified anywhere? I couldn't find it. Please add a (preferably) open source license or, if there is one already, make it more visible (e.g. mention it in the README and/or include a COPYING.txt file).

Note: it may be necessary to:

  1. Propose a license here; and
  2. Obtain consent from each contributor of source code to this project to license their work under the proposed license.

Lowercase country code leads to dead page

Search results link to pages like publicbodies.org/gb, which is empty, whereas the front page links to publicbodies.org/GB. These need to be harmonised.
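One way to harmonise them is to treat one casing as canonical and redirect the other. This sketch assumes upper case is canonical (as on the front page) and that jurisdiction pages sit at a two-letter top-level path; the function name is hypothetical.

```python
def canonical_jurisdiction_path(path):
    """Return the canonical path for a jurisdiction page, e.g. '/gb' -> '/GB'.

    Paths that are not a bare two-letter code are returned unchanged.
    """
    code = path.strip("/")
    if len(code) == 2 and code.isalpha():
        return "/" + code.upper()
    return path
```

A server or static host would then answer a request for /gb with a permanent redirect to the returned path whenever the two differ.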

Search support

Options

  • Client-side JS search (lunr.js etc.)
  • Separate Solr server
  • Google Custom Search (requires us to build a sitemap or list everything on the front page)
  • No search
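For the client-side option, lunr.js essentially ships an inverted index to the browser. This Python sketch only illustrates the idea (token to ids, with all query terms required); it is not lunr's API, and it assumes each body has id and name fields.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_index(bodies):
    """Map each token to the set of body ids whose name contains it."""
    index = defaultdict(set)
    for body in bodies:
        for token in tokenize(body["name"]):
            index[token].add(body["id"])
    return index


def search(index, query):
    """Return the ids matching every query token (AND semantics), sorted."""
    tokens = tokenize(query)
    if not tokens:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in tokens)))
```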

Implement hierarchy browser

For countries for which we have a good tree structure being able to browse that tree in the UI would be very helpful.

Requirements:

  • Go from a public body to its parent body (done)
  • See a list of child bodies per public body
  • Present overview per jurisdiction in tree / forest form
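A sketch of the child lookup and the forest view, assuming each record has id, name and an optional parent_id (the name proposed in the schema rework). Bodies whose parent_id points at a missing record are not rendered, and a parent cycle would recurse forever, so real data would need validating first.

```python
from collections import defaultdict


def children_by_parent(bodies):
    """Map each parent_id to its child ids; top-level bodies go under None."""
    children = defaultdict(list)
    for body in bodies:
        children[body.get("parent_id") or None].append(body["id"])
    return children


def render_forest(bodies):
    """Return indented lines showing every tree in the forest, depth first."""
    names = {b["id"]: b["name"] for b in bodies}
    children = children_by_parent(bodies)
    lines = []

    def walk(body_id, depth):
        lines.append("  " * depth + names[body_id])
        for child_id in sorted(children.get(body_id, [])):
            walk(child_id, depth + 1)

    for root_id in sorted(children.get(None, [])):
        walk(root_id, 0)
    return lines
```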

List Bodies On Per-Country Pages

The index page is quite long, and at the moment ~75% of it is probably not relevant to any given user. I spent ~15 minutes splitting the bodies out into per-country pages. Should I continue? Thoughts?

[Screenshot: 2013-05-30, 9:27:47 AM]
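Splitting the index could start from a simple grouping by jurisdiction. This sketch assumes a jurisdiction_code field and upper-cases it so that gb and GB land on the same page; each group would then be rendered to its own page.

```python
from collections import defaultdict


def group_by_jurisdiction(bodies):
    """Return {jurisdiction code: bodies sorted by name}, with codes in sorted order."""
    groups = defaultdict(list)
    for body in bodies:
        groups[body["jurisdiction_code"].upper()].append(body)
    return {
        code: sorted(items, key=lambda b: b["name"])
        for code, items in sorted(groups.items())
    }
```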

I'm slightly confused by the website tagline

The Public Bodies tagline is "A URL for every part of government"

yet clearly non-governmental entities, e.g. ASDA, pop up on the UK list.

It would make for a less catchy tagline, but perhaps "A URL for every FoI-able public sector organisation" would be more accurate and less confusing?

Add i18n support

The web application should support internationalization.

I also suggest we create a project to localize it in Transifex.

That should help users of other languages to browse for public bodies in their native language.

Licence for whatdotheyknow data

Has the licence for the whatdotheyknow list of public bodies been established? We asked a few months ago and they didn't have one, although no doubt with a good nudge they would be happy to.

Organisation identifiers (for discussion)

This is an idea that I've been thinking about for a while. I discussed it with @rgrp a couple of weeks ago and wanted to share it with the list to see what everyone thinks.

The short version: could public bodies be used to generate usable organisation identifiers?

Background

The IATI Standard is an XML-based format for sharing detailed information about aid projects. Fundamentally, the model shows resource flows from one organisation to another, with various classifications in between and many financial transactions as part of each project. For example:

activity (DFID -> World Health Organisation)
  - transaction (GBP 500 disbursed on 2013-05-01)
  - transaction (GBP 500 disbursed on 2013-07-05)

For the private sector and NGOs, the methodology for uniquely identifying organisations is:

Jurisdiction-National registration body-Number
e.g. for Oxfam GB, registered at the Charity Commission, with reg number 202918:
GB-CHC-202918

For governments, the following methodology is used:
Jurisdiction-OECD/DAC Agency code
e.g. for the UK's Department for International Development:
GB-1

For multilaterals, we use the following methodology:
OECD/DAC Channel code
e.g. for the World Bank's International Development Association (IDA):
44002
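The three schemes reduce to simple string composition. The helper names in this sketch are hypothetical; the test values are the examples quoted above.

```python
def registered_org_id(jurisdiction, registry, number):
    """Private sector / NGOs: Jurisdiction-National registration body-Number."""
    return f"{jurisdiction}-{registry}-{number}"


def government_org_id(jurisdiction, agency_code):
    """Governments: Jurisdiction-OECD/DAC agency code."""
    return f"{jurisdiction}-{agency_code}"


def multilateral_org_id(channel_code):
    """Multilaterals: the OECD/DAC channel code on its own."""
    return str(channel_code)
```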

Problems

Agency codes

  • Agency codes only include donor agencies. So the Ministry of Finance in Botswana, for example, does not have a code.
  • Agency codes don't even include all donor agencies: for example, parts of the European Commission or the United States, even though they give aid, don't have their own identifier - they're categorised under Miscellaneous.
  • The process for adding new agency codes is slow (even if it took a day, that might be too long)

Channel codes

  • Channel codes only contain a subset of all of the multilateral / international / intergovernmental organisations in the world, and many of them are not listed in a very usable way. For example, the World Health Organisation has two codes:
    a) World Health Organisation - core voluntary contributions account
    b) World Health Organisation - assessed contributions
    --> but there isn't one for just "World Health Organisation", for example if you're contracting them to deliver a project.

Many organisations publishing IATI data will therefore struggle to provide unique organisation identifiers for many of the public sector / international organisations that they are working with.

Rationale

  • Official lists of organisations should be used if possible.
  • Official lists of organisations don't exist in most cases.
  • The exact identifier assigned to an organisation is not fundamentally important (whether it's BW-1 or BW-21, the Botswana Ministry of Finance just needs a code).
  • Organisation identifiers should be cross-mapped to other codes / identifiers for those organisations so that the data is easily interoperable.

Proposal

Fuzzy reconciliation / text matching of organisations, with an API that assigns an existing identifier where one is available and creates a new one where it is not:

  1. Organisations (initially, preferably those with a large amount of data) send four key pieces of data to the API:
    • organisation name (text) - e.g. MINISTRY OF FINANCE
    • organisation country (code) - e.g. BW (for Botswana)
    • language (code) - e.g. en
    • last recorded transaction with this organisation (date) - e.g. 2013-07-05
  2. The API responds with one of the following (possibly using HTTP status codes?):
    a) Organisation found => use code BW-1
    b) Organisation not found => created code BW-21

    It also stores the date of the last recorded transaction, so that other people know the organisation may have existed on that date.
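A toy, in-memory sketch of that request/response cycle. It swaps in Python's difflib.SequenceMatcher as the fuzzy matcher and a per-country counter for new codes; the class, the 0.85 threshold and the COUNTRY-N identifier format are all assumptions, and real matching would need far more care (see question 2).

```python
import difflib


class Reconciler:
    """Hypothetical reconciliation service; not an existing API."""

    def __init__(self, threshold=0.85):
        self.threshold = threshold
        self.orgs = {}      # identifier -> stored details
        self.counters = {}  # country code -> last number issued

    @staticmethod
    def _normalise(name):
        return " ".join(name.lower().split())

    def reconcile(self, name, country, language, last_transaction):
        """Return (identifier, created) for the best match, creating one if needed.

        last_transaction is an ISO date string, so max() compares it correctly.
        language is stored but not used for matching in this sketch.
        """
        target = self._normalise(name)
        best_id, best_score = None, 0.0
        for org_id, org in self.orgs.items():
            if org["country"] != country:
                continue
            score = difflib.SequenceMatcher(None, target, org["name"]).ratio()
            if score > best_score:
                best_id, best_score = org_id, score
        if best_id is not None and best_score >= self.threshold:
            org = self.orgs[best_id]
            org["last_seen"] = max(org["last_seen"], last_transaction)
            return best_id, False
        number = self.counters.get(country, 0) + 1
        self.counters[country] = number
        new_id = f"{country}-{number}"
        self.orgs[new_id] = {
            "name": target,
            "country": country,
            "language": language,
            "last_seen": last_transaction,
        }
        return new_id, True
```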

Other sources could be charts of accounts, existing lists (like those already on PB), budget documents, and structured spending data, e.g. from OpenSpending.

Dealing with duplicates

This will probably lead to some duplicates being created, which could be reconciled manually. Organisations could have a primary identifier and several secondary identifiers that were used by duplicate organisations.
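A sketch of the primary / secondary identifier idea, in which every identifier ever issued resolves to exactly one primary identifier. The secondary_ids field is hypothetical.

```python
def build_alias_map(orgs):
    """Map every identifier (primary or secondary) to its primary identifier."""
    alias = {}
    for org in orgs:
        primary = org["id"]
        alias[primary] = primary
        for secondary in org.get("secondary_ids", []):
            # A secondary id must not point at two different primaries
            if secondary in alias and alias[secondary] != primary:
                raise ValueError(f"{secondary} is already an alias of {alias[secondary]}")
            alias[secondary] = primary
    return alias
```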

Dealing with changing organisations

Organisations can be created / deleted / merged in the real world. This should probably lead to:
a) created - a new identifier gets created;
b) merged - a new identifier gets created for the new organisation; and (manually) the old organisations are linked / related to the new organisation;
c) deleted - the identifier continues to exist, because old (and possibly future) data will still refer to it. However, it should be (manually) marked as no longer existing, pointing to a successor organisation if one exists (with some flag to explain whether it's a wholly .

Questions

  1. Does this sound sensible? Is it a good idea? Is there a better alternative?
  2. Will the fuzzy matching be accurate enough to be useful? Is it likely to assign organisations an incorrect code?
  3. How should identifiers created by Public Bodies be marked as such - just a prefix like PB-?

OECD-DAC codelists:

Lifecycle issues

Public bodies change frequently and it would be good to agree how to deal with this. I think having a sense of permanence for URLs is useful, so I suggest:

  • URLs for a body must never change
  • Title should not change. If a body changes its name then it should be handled as if it died and a new one was created.
  • When a body dies it should be marked as inactive.
  • If a body takes over the main role of a previous body, then the old body should have a 'redirect' to the new body stored with it.
  • If a body's abbreviation or other property changes then that is ok (e.g. DBIS -> BIS)
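Following the stored 'redirect' from an inactive body to its current successor could look like this sketch. The dictionary layout (active, redirect) is an assumption, and the loop guards against redirect cycles.

```python
def resolve_current(bodies, body_id):
    """Follow 'redirect' links until reaching a body that has none.

    bodies maps id -> {"active": bool, "redirect": id or None} (assumed layout).
    """
    seen = set()
    current = body_id
    while True:
        if current in seen:
            raise ValueError(f"redirect cycle at {current}")
        seen.add(current)
        target = bodies[current].get("redirect")
        if not target:
            return current
        current = target
```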

Home page issue with Firefox

It looks like in Firefox the two main DIVs, the one with the jurisdictions and the sidebar on the right, overlap slightly. In both Chrome and Safari they are positioned correctly.

Instructions for data contributors

This should probably go on the wiki once finished.

Fields

  • key names:
    • should be url suitable: alphanumeric + '-' only
    • use - rather than _
    • use abbreviations where appropriate
  • use iso formatted date / times
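A sketch of these rules as checks. Restricting keys to lower-case ASCII is an assumption (the rule only says alphanumeric plus '-'); titles in non-Latin scripts, such as Greek, would need transliterating first, because this slugify helper simply drops those characters.

```python
import re
from datetime import datetime

KEY_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def slugify(title):
    """Turn a title into a URL-suitable key: lower-case alphanumerics joined by '-'."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def is_valid_key(key):
    return KEY_PATTERN.fullmatch(key) is not None


def is_iso_datetime(value):
    """True if datetime.fromisoformat accepts the value (e.g. 2013-05-30)."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False
```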

To discuss

  • Do we need last modified and created?
  • Do we want both parent and parent_key?

What Public Bodies

  • National or local departments or agencies
  • (Probably) Not every school or fire station in existence.

Asides

  • Write up a description of the columns

Change key to use slug

Let's get rid of the randomly generated UUID parts of keys and use slugs instead.

  • Check that slugs are unique per jurisdiction
  • implement the change
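A sketch of the uniqueness check, assuming each record carries jurisdiction_code and slug fields. Building ids as jurisdiction/slug is inferred from ids such as gr/dpa that appear on this page, not from a documented rule.

```python
from collections import Counter


def duplicate_slugs(bodies):
    """Return the (jurisdiction_code, slug) pairs that occur more than once."""
    counts = Counter((b["jurisdiction_code"], b["slug"]) for b in bodies)
    return sorted(pair for pair, n in counts.items() if n > 1)


def make_id(jurisdiction_code, slug):
    """Hypothetical id layout: lower-case jurisdiction, '/', slug."""
    return f"{jurisdiction_code.lower()}/{slug}"
```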

Also:

  • What about rename key => id?

Ensure id present for all Greek public bodies

@okfngr, I just noticed that in PR #43 a lot of public bodies were missing a key field (now called id).

Would it be possible to generate and add an id to every record? The id field is required, and the frontend needs it to work.

We also seem to be missing jurisdiction codes (which are required) for:

gr/dpa
gr/adae
gr/asep
gr/esr
gr/synigoros
gr/minedu
gr/neagenia
gr/gsae
gr/culture
gr/gss
gr/gsrt
gr/minedu
gr/minedu
gr/minedu
gr/gak
gr/gak
gr/iky
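A sketch for finding such gaps in a CSV before merging. The required fields are the two named in this issue (id and jurisdiction_code); the reported row numbers assume no field contains embedded line breaks.

```python
import csv
import io

REQUIRED = ("id", "jurisdiction_code")


def missing_required(csv_text, required=REQUIRED):
    """Return a list of (row_number, field) pairs for required fields left empty."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # line 1 is the header
        for field in required:
            # A missing column yields None, which is treated like an empty value
            if not (row.get(field) or "").strip():
                problems.append((row_number, field))
    return problems
```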

Support for sending corrections / additions

Several options:

  • Fork and pull (good for bulk corrections and submissions)
  • We could load the CSVs into Google Docs and have people edit them, then re-merge
    • perhaps we can / should have them permanently there
  • Submission of individual corrections (feedback-form style) - suggest the Google Forms hack approach (we just submit entries into Google Forms via JS ...) - cf. http://github.com/okfn/opendatacensus, which uses this technique for city submissions

Switch to simple web app with templating

e.g. nodejs + nunjucks + deploy on heroku

Note that we would still just load the raw CSVs when the app loads - Heroku's 512 MB limit should be fine given the amount of data we have so far ...

Build to flat files and deploy to s3

  • Build
  • Deploy

Build

Let's use nunjucks

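const nunjucks = require('nunjucks');
// Templates are loaded from a local "templates" directory (directory name assumed)
const env = new nunjucks.Environment(new nunjucks.FileSystemLoader('templates'));
const tmpl = env.getTemplate('test.html');
console.log(tmpl.render({ username: "james" }));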

Field for local identifier

There should be a field in the CSV schema to store the local identifier code for a public body, if one is available. This would make it possible to later create a global identifier (as per #41) and also to keep track of bodies that change names.

Rework schema (list of headers) and document

@rossjones suggested: "Would it make sense for publicbodies.org to follow the popolo spec at http://popoloproject.com/data.html" (that link is now broken)

Correct link is: http://popoloproject.com/specs/organization.html

Seems a great idea!

Current fields

Current fields and suggested changes (e.g. to be in line with popolo as much as possible). Note the list of changes is in progress and incomplete.

  • title => name (in org name)
  • abbr => abbreviation
  • key => id (?)
  • category => classification
  • parent => DELETE (just have parent_id)
  • parent_key => parent_id
  • description
  • url
  • jurisdiction => DELETE (just have jurisdiction code)
  • jurisdiction_code = ISO two-letter code where one exists. Otherwise we coin one.
  • source => DELETE in favour of source URL (??)
  • source_url => keep
    • make clear there is no point pointing at exactly the same API endpoint - much more useful to point at a specific location
    • (??) DELETE entirely and just credit in contributor notes (we already have a bunch of different sources for data and as people add the problem will get worse)
    • Could have multiple sources per entry (??)
  • address
  • contact => What's the difference from address?
  • email
  • tags => keep
    • at the moment several of the files use tags (though not necessarily consistently)
  • created_at => DELETE (little value ...)
  • updated_at => DELETE (ditto)
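A sketch of applying the proposed renames and deletions to existing rows. Because the list is explicitly in progress, the RENAMES and DROPPED tables mirror only the current proposals, including the tentative ones marked (?) and (??).

```python
# Old column -> new column, following the proposals above
RENAMES = {
    "title": "name",
    "abbr": "abbreviation",
    "key": "id",
    "category": "classification",
    "parent_key": "parent_id",
}
# Columns proposed for deletion
DROPPED = {"parent", "jurisdiction", "source", "created_at", "updated_at"}


def migrate_row(row):
    """Return a copy of a CSV row dict with renamed columns and dropped ones removed."""
    return {RENAMES.get(k, k): v for k, v in row.items() if k not in DROPPED}
```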

Add:

  • other_names: semi-colon separated list of alternate names
  • founding_date: ISO 8601
  • dissolution_date: ISO 8601
  • image
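A sketch of validating the new columns. date.fromisoformat accepts only full calendar dates (strictly YYYY-MM-DD before Python 3.11), so reduced-precision values such as a bare year, which ISO 8601 allows, would be reported; whether to permit those is a schema decision.

```python
from datetime import date


def parse_other_names(value):
    """Split the semicolon-separated other_names column, dropping blanks."""
    return [name.strip() for name in value.split(";") if name.strip()]


def check_dates(row):
    """Return a list of problems with founding_date / dissolution_date."""
    problems = []
    parsed = {}
    for field in ("founding_date", "dissolution_date"):
        value = (row.get(field) or "").strip()
        if not value:
            continue  # both dates are optional
        try:
            parsed[field] = date.fromisoformat(value)
        except ValueError:
            problems.append(f"{field} is not a full ISO 8601 date: {value!r}")
    if len(parsed) == 2 and parsed["dissolution_date"] < parsed["founding_date"]:
        problems.append("dissolution_date is before founding_date")
    return problems
```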

Consider switch to JSON from CSV

Pros / Cons

  • (+) Greater flexibility, ability to directly match org spec
    • In particular can handle multiple values, multiple identifiers
  • (-) Much bigger and less compact. Harder for people to work with (e.g. CSV usable in spreadsheets etc)
  • (-) More complexity (but perhaps necessary)
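To make the trade-off concrete, this sketch converts CSV rows to JSON and turns the semicolon-separated other_names column into a real list. Treating other_names as the only multi-valued column is an assumption.

```python
import csv
import io
import json

MULTI_VALUED = ("other_names",)  # columns assumed to hold ';'-separated lists


def csv_to_json(csv_text):
    """Convert CSV rows to a JSON array, turning multi-valued columns into lists."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = dict(row)
        for field in MULTI_VALUED:
            if field in record:
                raw = record[field] or ""
                record[field] = [v.strip() for v in raw.split(";") if v.strip()]
        records.append(record)
    return json.dumps(records, indent=2, ensure_ascii=False)
```

The indented JSON output is noticeably longer than the CSV line it came from, which is the "(-) much bigger" point in practice.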
