Serenata de Amor
Jarbas, a tool for Serenata de Amor
Jarbas is part of Serenata de Amor: we fight corruption with data science.
Jarbas is in charge of making data from CEAP more accessible. In the near future Jarbas will show what Rosie thinks of each reimbursement made for our congresspeople.
JSON API endpoints
Reimbursement
Each Reimbursement object is a reimbursement claim made by a congressperson. Each reimbursement is identified by a unique combination of year, applicant_id and document_id.
Retrieving a specific reimbursement
GET /api/reimbursement/<year>/<applicant_id>/<document_id>/
Details from a specific reimbursement. If receipt_url wasn't fetched yet, the server won't try to fetch it.
GET /api/reimbursement/<year>/<applicant_id>/<document_id>/receipt/
URL of the digitized version of the receipt of this specific reimbursement.
If receipt_url wasn't fetched yet, the server will try to fetch it.
If you append the parameter force (i.e. GET /api/reimbursement/<year>/<applicant_id>/<document_id>/receipt/?force) the server will re-fetch the receipt URL.
Not all receipts are available, so this URL can be null.
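As a rough illustration of the two endpoints above, the following sketch builds their URLs in Python. The host https://jarbas.example.org and the helper name reimbursement_url are assumptions made for this example; they are not part of Jarbas.

```python
# Hypothetical host, used only for illustration.
BASE_URL = "https://jarbas.example.org"


def reimbursement_url(year, applicant_id, document_id, receipt=False, force=False):
    """Build the URL of a reimbursement, or of its receipt endpoint."""
    path = "/api/reimbursement/{}/{}/{}/".format(year, applicant_id, document_id)
    if receipt:
        path += "receipt/"
        if force:
            # Appending ?force asks the server to re-fetch the receipt URL.
            path += "?force"
    return BASE_URL + path
```

Since the receipt URL can be null, a client consuming the receipt endpoint should be prepared to handle a missing value.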
Listing reimbursements
GET /api/reimbursement/
Lists all reimbursements.
GET /api/reimbursement/<year>/
Lists all reimbursements from a specific year.
GET /api/reimbursement/<year>/<applicant_id>/
Lists all reimbursements from a specific year and applicant_id.
Filtering
All these endpoints accept any combination of the following parameters:
- applicant_id
- cnpj_cpf
- document_id
- issue_date_start (inclusive)
- issue_date_end (exclusive)
- month
- subquota_id
- year
- order_by: issue_date (default) or probability (both descending)
For example:
GET /api/reimbursement/2016/?cnpj_cpf=11111111111111&subquota_id=42&order_by=probability
This request will list:
- all 2016 reimbursements
- paid to the supplier with the CNPJ 11.111.111/1111-11
- claimed under the subquota with the ID 42
- sorted by probability, highest first
You can also pass more than one value per field (e.g. document_id=111111,222222).
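The filtering rules above can be sketched as a small URL builder. This is only an illustration: the host https://jarbas.example.org and the function name reimbursement_list_url are hypothetical, and the parameters are sorted alphabetically simply to make the output predictable.

```python
from urllib.parse import urlencode

# Hypothetical host, used only for illustration.
BASE_URL = "https://jarbas.example.org"


def reimbursement_list_url(year=None, **filters):
    """Build a listing URL, joining multiple values of a field with commas."""
    path = "/api/reimbursement/"
    if year is not None:
        path += "{}/".format(year)
    params = []
    for name, value in sorted(filters.items()):
        if isinstance(value, (list, tuple)):
            # More than one value per field is sent as a comma-separated list.
            value = ",".join(str(item) for item in value)
        params.append((name, value))
    # Keep commas unescaped so the query reads as in the examples above.
    query = urlencode(params, safe=",")
    return BASE_URL + path + ("?" + query if query else "")
```

For instance, reimbursement_list_url(2016, cnpj_cpf="11111111111111", subquota_id=42, order_by="probability") produces the same filters as the example request above, with the parameters in alphabetical order.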
Subquota
Subquotas are categories of expenses for which congresspeople can be reimbursed.
Listing subquotas
GET /api/subquota/
Lists all subquota names and IDs.
Filtering
Accepts a case-insensitive LIKE filter as the q URL parameter (e.g. GET /api/subquota/?q=meal lists all subquotas that have meal in their names).
Applicant
An applicant is the person (congressperson or the leadership of a party or government) who claimed the reimbursement.
Listing applicants
GET /api/applicant/
Lists all names of applicants together with their IDs.
Filtering
Accepts a case-insensitive LIKE filter as the q URL parameter (e.g. GET /api/applicant/?q=lideranca lists all applicants that have lideranca in their names).
Company
A company is a Brazilian company where congresspeople have made expenses for which they claimed reimbursement.
Retrieving a specific company
GET /api/company/<cnpj>/
This endpoint gets the info we have for a specific company. The endpoint expects a cnpj (i.e. the CNPJ of a Company object, digits only). It returns 404 if the company is not found.
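Because the endpoint expects digits only, a formatted CNPJ such as 11.111.111/1111-11 has to be stripped of punctuation first. A minimal sketch, assuming a hypothetical host https://jarbas.example.org and a hypothetical helper name company_url:

```python
import re

# Hypothetical host, used only for illustration.
BASE_URL = "https://jarbas.example.org"


def company_url(cnpj):
    """Build the company endpoint URL from a CNPJ, keeping digits only."""
    digits = re.sub(r"\D", "", cnpj)
    return "{}/api/company/{}/".format(BASE_URL, digits)
```

A client calling this URL should treat a 404 response as "company not found" rather than as a server failure.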
Tapioca Jarbas
There is also a tapioca-wrapper for the API. The tapioca-jarbas can be installed with pip install tapioca-jarbas and can be used to access the API in any Python script.
Installing
Using Docker
If you have Docker (with Docker Compose) and make, just run:
$ docker-compose up -d --build
$ docker-compose run --rm jarbas python manage.py migrate
$ docker-compose run --rm jarbas python manage.py ceapdatasets
You can access it at localhost:80. However, your database starts empty and you still have to collect your static files:
$ docker-compose run --rm jarbas python manage.py collectstatic --no-input
$ docker-compose run --rm jarbas python manage.py reimbursements <path to reimbursements.xz>
$ docker-compose run --rm jarbas python manage.py irregularities <path to irregularities.xz file>
$ docker-compose run --rm jarbas python manage.py companies <path to companies.xz>
You can get the datasets by running Rosie or directly with the toolbox.
There are also some clever shortcuts in the Makefile, if you prefer them.
If you run into issues with settings, the Settings section may be helpful.
Local install
Requirements
Jarbas requires Python 3.5, Node.js 6 with Yarn, and PostgreSQL 9.4+.
Once you have pip and yarn available, install the dependencies:
$ yarn install
$ python -m pip install -r requirements.txt
Python's lzma module
In some Linux distros lzma is not installed by default. You can check whether you have it or not with $ python -m lzma. In Debian based systems you can fix that with $ apt-get install liblzma-dev or in macOS with $ brew install xz, but you might have to re-compile your Python.
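The same check can be done from inside Python, which is handy before trying to load the .xz datasets. This is a minimal sketch; the function name has_lzma is made up for this example.

```python
def has_lzma():
    """Return True if the lzma module can be imported and actually works."""
    try:
        import lzma
    except ImportError:
        # Python was built without liblzma support.
        return False
    # Round-trip a small payload to confirm compression works end to end.
    payload = b"Jarbas"
    return lzma.decompress(lzma.compress(payload)) == payload


if __name__ == "__main__":
    print("lzma available:", has_lzma())
```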
Settings
Copy contrib/.env.sample as .env in the project's root folder and adjust your settings. These are the main variables:
Django settings
- DEBUG (bool) enable or disable Django debug mode
- SECRET_KEY (str) Django's secret key
- ALLOWED_HOSTS (str) Django's allowed hosts
- USE_X_FORWARDED_HOST (bool) Whether to use the X-Forwarded-Host header
- CACHE_BACKEND (str) Cache backend (e.g. django.core.cache.backends.memcached.MemcachedCache)
- CACHE_LOCATION (str) Cache location (e.g. localhost:11211)
- SECURE_PROXY_SSL_HEADER (str) Django secure proxy SSL header (e.g. HTTP_X_FORWARDED_PROTO,https is transformed into the tuple ('HTTP_X_FORWARDED_PROTO', 'https'))
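To make the SECURE_PROXY_SSL_HEADER conversion concrete, here is a minimal sketch of how a comma-separated string can become the tuple Django expects. It is not necessarily how Jarbas parses its settings; the function name parse_proxy_ssl_header is hypothetical.

```python
def parse_proxy_ssl_header(value):
    """Turn 'HEADER_NAME,value' into the 2-tuple ('HEADER_NAME', 'value')."""
    parts = tuple(part.strip() for part in value.split(","))
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected exactly two comma-separated, non-empty values")
    return parts
```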
Database
- DATABASE_URL (str) Database URL; it must be PostgreSQL, since Jarbas uses JSONField.
Amazon S3 settings
- AMAZON_S3_BUCKET (str) Name of the Amazon S3 bucket to look for datasets (e.g. serenata-de-amor-data)
- AMAZON_S3_REGION (str) Region of the Amazon S3 (e.g. s3-sa-east-1)
- AMAZON_S3_CEAPTRANSLATION_DATE (str) File name prefix for dataset guide (e.g. 2016-08-08 for 2016-08-08-ceap-datasets.md)
Google settings
- GOOGLE_ANALYTICS (str) Google Analytics tracking code (e.g. UA-123456-7)
- GOOGLE_STREET_VIEW_API_KEY (str) Google Street View Image API key
Migrations
Once you're done with requirements, dependencies and settings, create the basic database structure:
$ python manage.py migrate
Load data
Now you can load the data from our datasets and get some other data as static files:
$ python manage.py reimbursements <path to reimbursements.xz>
$ python manage.py irregularities <path to irregularities.xz file>
$ python manage.py companies <path to companies.xz>
$ python manage.py ceapdatasets
You can get the datasets by running Rosie or directly with the toolbox.
Generate static files
We generate assets through Node.js, so build them before Django collects the static files:
$ yarn assets
$ python manage.py collectstatic
Ready?
Not sure? Test it!
$ python manage.py check
$ python manage.py test
$ yarn test
Ready!
Run the server with $ python manage.py runserver and load localhost:8000 in your favorite browser.