tmtmtmtm / csv_to_popolo
License: MIT License
Currently an area's type is hard-coded to be constituency. For Uganda we want some areas to have a type of district, to handle the Women's Representatives, who are assigned at the district rather than constituency level.
People can use a StringIO if they need to.
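For example, CSV text that is already in memory can be wrapped in a `StringIO` and read anywhere an IO object is expected. This sketch uses only Ruby's standard `csv` and `stringio` libraries; how that IO is then handed to csv_to_popolo itself is not shown here.

```ruby
require 'csv'
require 'stringio'

# CSV data held in a string rather than in a file on disk.
csv_text = "id,name,party\n1,Alice,Green\n2,Bob,Red\n"

# StringIO gives the string a file-like interface.
io = StringIO.new(csv_text)

# Read it exactly as if it were a file, with symbolised headers.
rows = CSV.new(io, headers: true, header_converters: :symbol).map(&:to_h)
rows.each { |r| puts "#{r[:id]}: #{r[:name]} (#{r[:party]})" }
```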
Currently honorific_prefix only takes a single value, but in cases such as Samoa a member can have multiple prefixes.
The Samoan PM has the full title: 'Susuga Hon AUELUA FATIALOFA LUPESOLIAI TUILAEPA Dr Sailele Malielegaoi'
(http://www.palemene.ws/new/members-of-parliament/members-of-the-xvi-parliament/)
The names in uppercase are tribal names, and each is a distinct part in its own right, since they can be listed in any order.
As one person may hold a number of different matai names from different branches of their genealogy, the new names are also added before their Christian name, with no set order in terms of general usage.
(https://en.wikipedia.org/wiki/Fa%27amatai#Naming_convention)
In the above example we also have a professional title (Dr).
Make csv_to_popolo accept multiple prefixes separated by a semi-colon.
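A minimal sketch of the parsing side, assuming the column holds prefixes separated by `;` with optional surrounding spaces. The helper name `honorific_prefixes` is hypothetical, and whether the resulting list is stored as-is or rejoined into a single string for the Popolo output is left open.

```ruby
# Hypothetical parser: split a semicolon-separated honorific_prefix
# value into its individual prefixes, trimming spaces and dropping blanks.
def honorific_prefixes(value)
  value.to_s.split(';').map(&:strip).reject(&:empty?)
end

p honorific_prefixes('Susuga; Hon; Dr')
p honorific_prefixes('Hon')
```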
If someone passes plain numeric IDs for people, parties, etc., these will clash.
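One possible workaround, sketched here rather than taken from csv_to_popolo, is to namespace every ID by the kind of object it identifies, so that person 1 and party 1 become distinct strings. The `kind/raw` format and the `namespaced_id` helper are assumptions; as a side effect, the IDs also come out as Strings.

```ruby
# Hypothetical helper: namespace a raw ID by object type so that plain
# numeric IDs from different columns can no longer collide.
def namespaced_id(kind, raw)
  "#{kind}/#{raw}"
end

person_id = namespaced_id('person', 1)
party_id  = namespaced_id('party', 1)
puts person_id, party_id
```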
Links to Commons are coming through as https://commons.wikipedia.org/wiki/Category:Jenniffer_González, which does resolve to the right place (commons.wikimedia.org), but it would be better to give the proper URL directly.
Links to Wikiquote, Wikinews, etc. are completely wrong, e.g. https://enwikiquote.wikipedia.org/wiki/Tony_Abbott rather than https://en.wikiquote.org/wiki/Tony_Abbott.
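A sketch of building the URL from the project as well as the language, assuming the link arrives as a Wikidata-style site key (such as `enwiki`, `enwikiquote` or `commonswiki`) plus a page title; that input shape, the `wikimedia_url` helper and the project table are all assumptions about how the data could be held.

```ruby
# Hypothetical table of Wikimedia projects whose site keys take the form
# "<lang><suffix>", e.g. "enwikiquote" -> en.wikiquote.org.
# "wiki" (Wikipedia) is last so the more specific suffixes match first.
PROJECT_HOSTS = {
  'wikiquote'  => 'wikiquote.org',
  'wikinews'   => 'wikinews.org',
  'wikisource' => 'wikisource.org',
  'wikivoyage' => 'wikivoyage.org',
  'wiki'       => 'wikipedia.org',
}.freeze

# Hypothetical helper: turn a site key and page title into a page URL.
def wikimedia_url(site, title)
  path = title.tr(' ', '_')
  # Commons is not language-specific and lives on wikimedia.org.
  return "https://commons.wikimedia.org/wiki/#{path}" if site == 'commonswiki'

  suffix, host = PROJECT_HOSTS.find { |sfx, _| site.end_with?(sfx) }
  raise ArgumentError, "unknown site #{site}" unless suffix

  # Site keys use underscores where hostnames use hyphens.
  lang = site.delete_suffix(suffix).tr('_', '-')
  raise ArgumentError, "no language in #{site}" if lang.empty?

  "https://#{lang}.#{host}/wiki/#{path}"
end

puts wikimedia_url('enwikiquote', 'Tony Abbott')
puts wikimedia_url('commonswiki', 'Category:Jenniffer González')
```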
Currently skips these columns:
"mapit_id",
"mapit_url",
"gss_code",
"party_ppc_page_url",
"facebook_personal_url",
"parlparse_id",
"theyworkforyou_url",
"elected"
PopIt gets very upset by a Relationship to an Org that doesn't exist.
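A pre-flight check could catch this before anything is sent to PopIt: collect the IDs of the Organizations that actually exist and report every Membership pointing elsewhere. This is a sketch assuming plain Hashes with Popolo-style `id` and `organization_id` keys; `dangling_memberships` is a hypothetical name.

```ruby
require 'set'

# Hypothetical check: return Memberships whose organization_id does not
# match any Organization in the output.
def dangling_memberships(organizations, memberships)
  known = organizations.map { |o| o[:id] }.to_set
  memberships.reject { |m| known.include?(m[:organization_id]) }
end

orgs = [{ id: 'party/green', name: 'Green' }]
mems = [
  { person_id: 'person/1', organization_id: 'party/green' },
  { person_id: 'person/2', organization_id: 'party/red' },
]

dangling_memberships(orgs, mems).each do |m|
  warn "Membership for #{m[:person_id]} points at missing #{m[:organization_id]}"
end
```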
We should be able to note that a column has a multivalue_separator to signify that it holds a list of values, rather than just a single value.
For example:
email: {
  type: 'asis',
  multivalue_separator: ';',
},
And then the source could have two addresses separated by a semicolon as a single email value.
Then, when extracting the data from any such field, we should split on the separator and build up a list of values for it, rather than assuming it will only have one (similar to what we've currently hard-coded for alternate_names).
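A minimal sketch of the extraction side, reusing the configuration shape proposed in this issue. The `FIELDS` table and the `extract` helper are hypothetical; columns without a `multivalue_separator` keep returning a single value as they do now.

```ruby
# Hypothetical column configuration in the style proposed above.
FIELDS = {
  email: { type: 'asis', multivalue_separator: ';' },
  name:  { type: 'asis' },
}.freeze

# Return a list of values for columns that declare a separator,
# or the raw single value for everything else.
def extract(field, raw)
  sep = FIELDS.dig(field, :multivalue_separator)
  return raw unless sep

  raw.to_s.split(sep).map(&:strip).reject(&:empty?)
end

p extract(:email, 'a@example.org; b@example.org')
p extract(:name, 'Alice')
```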
The Popolo spec requires ID fields to be Strings.
These are attached to the wrong Memberships.
e.g. date of birth
phone, fax, cell, etc.
Rather than requiring people to produce CSV whose column names map exactly onto Popolo fields, allow them to specify which columns should map to which Popolo concepts.
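A sketch of what a user-supplied remapping could look like: a Hash from the CSV's own headers to the names csv_to_popolo expects, applied to each row before any further processing. `COLUMN_MAP`, its entries and `remap` are hypothetical; unmapped headers are passed through unchanged.

```ruby
# Hypothetical user-supplied mapping from this CSV's own headers to the
# column names csv_to_popolo expects.
COLUMN_MAP = {
  'Full Name' => :name,
  'Party'     => :group,
  'Seat'      => :area,
}.freeze

# Rename the keys of one row; headers without a mapping are kept as-is.
def remap(row, map = COLUMN_MAP)
  row.to_h { |header, value| [map.fetch(header, header), value] }
end

p remap({ 'Full Name' => 'Alice', 'Party' => 'Green', 'Notes' => 'x' })
```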
This would also be a good time to take care of #33
Find generic role names for being an MP and an executive member that read well in PopIt's sentence descriptions.
In #106 we changed the field separator for the URL field so that it no longer splits on semicolons that are within URLs.
However, this is now too strict, as it fails to split URLs like
https://www.gov.uk/government/organisations/wales-office ; http://www.aluncairns.co.uk
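A possible middle ground is to split only on a semicolon that is followed, after optional whitespace, by the start of another URL, so that semicolons inside a URL's path or query are left alone. This sketch assumes every URL begins with `http://` or `https://`; the `URL_SEPARATOR` regex and `split_urls` helper are hypothetical, not the separator introduced in #106.

```ruby
# Split on ";" (with optional surrounding spaces) only when a new
# http(s) URL starts immediately afterwards.
URL_SEPARATOR = %r{\s*;\s*(?=https?://)}

def split_urls(value)
  value.to_s.split(URL_SEPARATOR).map(&:strip).reject(&:empty?)
end

p split_urls('https://www.gov.uk/government/organisations/wales-office ; http://www.aluncairns.co.uk')
p split_urls('http://example.com/page;jsessionid=abc')
```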
Make sure that all JSON we're generating validates against http://popoloproject.com/schemas/
Currently we set an Area name to r[:area] || 'unknown', which still lets an empty string through as the name, since '' is truthy in Ruby.
But this should be more like the Group name, where we also handle empty strings: r[:group] = 'unknown' if r[:group].to_s.empty?
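Applying the Group pattern to the Area gives something like the following sketch; `normalise_area!` is a hypothetical wrapper around the one-line check quoted above, so nil, a missing key and an empty string all end up as 'unknown'.

```ruby
# Hypothetical wrapper: treat a missing or empty area the same way the
# Group name is already treated.
def normalise_area!(r)
  r[:area] = 'unknown' if r[:area].to_s.empty?
  r
end

p normalise_area!({ area: '' })
p normalise_area!({ area: 'Kampala' })
```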
In cases where legislators can also be members of the executive, allow those roles to also be specified.
e.g. accept any of party, bloc, faction, etc. as group.
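A minimal sketch, assuming rows are Hashes with symbol keys: check a list of synonym columns in order and copy the first non-empty one into `:group`. The `GROUP_SYNONYMS` list and its ordering are assumptions, as is the `resolve_group` name.

```ruby
# Hypothetical synonyms accepted for :group, checked in this order.
GROUP_SYNONYMS = %i[group party bloc faction].freeze

# Copy the first non-empty synonym into :group; leave the row alone
# if none is present.
def resolve_group(row)
  key = GROUP_SYNONYMS.find { |k| !row[k].to_s.empty? }
  key ? row.merge(group: row[key]) : row
end

p resolve_group({ name: 'Alice', faction: 'Liberal' })
```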
If given optional files of Terms, Parties, or Areas, combine those with the main data file.
Provide a list of the columns we didn't map to Popolo fields. (And maybe the ones we did, too.)
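A sketch of such a report, partitioning the CSV headers against the set of columns that have a Popolo mapping. `MAPPED_COLUMNS` here is a hypothetical, abbreviated stand-in for whatever csv_to_popolo actually recognises.

```ruby
# Hypothetical (abbreviated) list of columns that have a Popolo mapping.
MAPPED_COLUMNS = %i[id name group area email start_date end_date].freeze

# Split the headers into those we mapped and those we ignored,
# preserving their original order.
def column_report(headers)
  mapped, unmapped = headers.partition { |h| MAPPED_COLUMNS.include?(h) }
  { mapped: mapped, unmapped: unmapped }
end

report = column_report(%i[id name mapit_id gss_code])
puts "Unmapped columns: #{report[:unmapped].join(', ')}"
```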
Rather than just nesting Areas on Memberships, promote them to first class citizens in their own right.
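A sketch of the transformation, assuming each Membership currently carries a nested `:area` Hash with its own `:id`: the Areas are pulled out into a de-duplicated top-level list and each Membership keeps only an `area_id` reference. The data shape and the `promote_areas` name are assumptions.

```ruby
# Hypothetical transformation: move Areas nested in Memberships into a
# top-level list, leaving an area_id reference on each Membership.
def promote_areas(memberships)
  areas = {}
  flat = memberships.map do |m|
    area = m[:area]
    next m unless area

    # Keep the first copy of each Area, keyed by its id.
    areas[area[:id]] ||= area
    m.reject { |k, _| k == :area }.merge(area_id: area[:id])
  end
  { areas: areas.values, memberships: flat }
end

ms = [
  { person_id: 'p1', area: { id: 'area/1', name: 'Kampala' } },
  { person_id: 'p2', area: { id: 'area/1', name: 'Kampala' } },
  { person_id: 'p3' },
]
p promote_areas(ms)
```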
We don't have a separate field for TTY numbers. One example of a site listing such numbers is Puerto Rico: http://www.tucamarapr.org/dnncamara/web/ComposiciondelaCamara/biografia.aspx?rep=39
It would be good to be able to store these numbers.
Simply setting a :date converter should mean that it will automagically convert anything that Date.parse() can understand.
e.g. from https://morph.io/tmtmtmtm/congo-assemblee
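A sketch of what such a converter could look like, using Ruby's `Date.parse` and normalising to ISO 8601 (YYYY-MM-DD). The `convert_date` name is hypothetical, and returning nil for blank values or values `Date.parse` rejects is an assumed policy. Note that `Date.parse` guesses at ambiguous formats, so its output is worth spot-checking on real data.

```ruby
require 'date'

# Hypothetical :date converter: accept anything Date.parse understands
# and normalise it to an ISO 8601 date string.
def convert_date(value)
  return nil if value.to_s.strip.empty?

  Date.parse(value.to_s).iso8601
rescue ArgumentError
  # Date.parse raises Date::Error (an ArgumentError) on invalid dates.
  nil
end

p convert_date('3 February 2015')
p convert_date('2015-02-30')
```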
(or split them, as in Australia)
should be an array of Name objects.
Currently each Person gets a new Organization