nfraprado / querido-diario

This project is a fork of okfn-brasil/querido-diario.

📰 Brazilian government gazettes, accessible to everyone.

License: MIT

querido-diario's Introduction

Diário Oficial

Diário Oficial is the Brazilian government gazette and one of the best places to learn about the latest actions of the public administration, with distinct publications at the federal, state, and municipal levels.

Despite recurrent efforts to enforce Freedom of Information legislation across the country, official communication remains, in most territories, available only as PDFs.

The goal of this project is to bring Diário Oficial into the digital age, centralizing information that is currently available only from separate sources.

When this project was initially released, it had two distinct goals: creating crawlers for government gazettes and parsing bidding exemptions from them. Going forward, it is limited to the first objective.

Table of Contents

  • Development environment
  • Run Gazette Crawler
  • Troubleshooting
  • Contributing
  • Acknowledgments

Development environment

The best way to understand how Querido Diário works is to get the source code and run it locally. All crawlers are developed using the Scrapy framework, which provides a tutorial you can follow to learn how to use it.

If you are on a Windows computer, you will need Microsoft Visual Build Tools before running the steps below (download here). When you start the installation, select 'C++ build tools' on the Workloads tab, plus 'Windows 10 SDK' and 'MSVC v142 - VS 2019 C++ x64/x86 build tools' on the Individual Components tab.

If you are in a Linux-like environment, the following commands will create a new virtual environment (keeping everything isolated from your system), activate it, and install all the libraries needed to start running and developing new spiders.

$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip install -r data_collection/requirements.txt
$ pre-commit install

On a Windows computer, you can use the same commands; just replace source .venv/bin/activate with .venv/Scripts/activate.bat. The rest is the same as on Linux.
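For reference, the full Windows sequence might look like the sketch below (assuming the cmd.exe shell, with python pointing at a Python 3 interpreter; backslashes are the native path separators there):

> python -m venv .venv
> .venv\Scripts\activate.bat
> pip install -r data_collection/requirements.txt
> pre-commit install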

Run Gazette Crawler

After configuring your environment, you will be able to execute and develop new spiders. The Scrapy project is in the data_collection directory, so you must enter it to execute the spiders and the scrapy commands:

$ cd data_collection

Below are some helpful commands.

Get the list of all available spiders:

$ scrapy list

Execute the spider named spider_name:

$ scrapy crawl spider_name

You can limit which gazettes are downloaded by passing a start_date argument in YYYY-MM-DD format. The following command will download only gazettes whose date is later than September 1, 2020:

$ scrapy crawl sc_florianopolis -a start_date=2020-09-01
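To give an idea of what a spider looks like, here is a minimal, generic Scrapy spider sketch. The class name, spider name, URL, and CSS selectors are purely illustrative; the project's real spiders live under data_collection and follow the repository's own base classes and item definitions.

import scrapy


class ExampleGazetteSpider(scrapy.Spider):
    # Illustrative spider: "example_gazette" and the URL below are hypothetical.
    name = "example_gazette"  # the name used with `scrapy crawl`
    start_urls = ["https://example.gov.br/gazettes"]

    def parse(self, response):
        # Extract a date and a file link for each gazette row listed on the page.
        for row in response.css("table tr"):
            yield {
                "date": row.css("td.date::text").get(),
                "file_url": row.css("a::attr(href)").get(),
            }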

Troubleshooting

Python.h missing

While running the pip install command, you may get an error like the one below:

module.c:1:10: fatal error: Python.h: No such file or directory
     #include <Python.h>
              ^~~~~~~~~~
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Try installing python3-dev, e.g. via apt install python3-dev if you are using a Debian-like distro, or use your distro's package manager. Make sure you install the version that matches your Python interpreter (e.g. python3.6-dev or python3.7-dev). You can check your version via python3 --version.
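For example, if python3 --version reports Python 3.8.x on a Debian-like system, the matching package would be installed with (the minor version here is illustrative):

$ sudo apt install python3.8-dev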

Contributing

If you are interested in fixing issues and contributing directly to the code base, please see the document CONTRIBUTING.md.

Acknowledgments

This project is maintained by Open Knowledge Foundation Brasil, thanks to the support of Digital Ocean and hundreds of others.
