A PHP implementation of the Regulations.gov API that will eventually attempt to do some smart mirroring.
Displaying the following notice is a requirement of using the API, so let's put it front and center:
“Regulations.gov and the Federal government cannot verify and are not responsible for the accuracy or authenticity of the data or analyses derived from the data after the data has been retrieved from Regulations.gov.” “This product uses the Regulations.gov Data API but is neither endorsed nor certified by Regulations.gov.”
This should run out of the box with PHP 7.
First, install Composer if you have not already.
From the command prompt:
$ composer install
This is a simple project. It is composed of several PHP functions that understand how to work with documents and dockets from Regulations.gov.
To run the code, you must apply for a Regulations.gov API key here: https://regulationsgov.github.io/developers/ The resulting API key needs to be stored in regulations.gov.api.yaml. The file regulations.gov.api.yaml.template exists to show you how to build that file.
This code also expects that you will store the underlying data on Google Cloud Storage (Google's competitor to Amazon S3) in a way that allows the public download of your files. For the time being, the scripts will not work without Google Cloud credentials to store the resulting data. Google likes to distribute its authentication credentials in a JSON file, which this project expects to be named google_keyfile.json.
So you need to set up the following two configuration files properly:
- regulations.gov.api.yaml - for your Regulations.gov API credentials
- google_keyfile.json - for your Google Cloud Storage credentials
There are entries in .gitignore to prevent you from committing these files. If you are contributing code, this should keep you from accidentally publishing your API credentials.
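A quick pre-flight check can catch a missing configuration file before any script talks to the API. The following is a minimal sketch, not code from this project: the function name `missing_config_files` is hypothetical, and it assumes both files live in the same directory as the script that calls it.

```php
<?php
// Hypothetical helper (not part of this project): report which of the
// two expected configuration files are absent or unreadable in $dir.
function missing_config_files(
    string $dir,
    array $required = ['regulations.gov.api.yaml', 'google_keyfile.json']
): array {
    $missing = [];
    foreach ($required as $name) {
        $path = rtrim($dir, '/') . '/' . $name;
        if (!is_file($path) || !is_readable($path)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

// Usage: warn on STDERR if anything is missing next to this script.
$missing = missing_config_files(__DIR__);
if ($missing !== []) {
    fwrite(STDERR, 'Missing configuration file(s): ' . implode(', ', $missing) . PHP_EOL);
}
```

The check only confirms that the files exist and are readable. It does not validate the API key or the Google credentials themselves.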
- recurse_and_mirror_docket.function.php understands how to accept a docket_id and download all of the corresponding documents in bulk
- recurse_and_mirror_documents.function.php downloads documents. Documents are usually JSON files only, but they can also have PDF or HTML content, and this function downloads all of it.
- ./data/ is where the files you temporarily scrape live on the local machine.
- mine_docket_data.php loops over the JSON files found in ./data/ looking for mentions of document_ids whose content was not included in the bulk JSON download, and downloads them using recurse_and_mirror_documents.function.php
- search_for_dockets.php accepts a command line argument for a topic to search Regulations.gov for, and downloads the resulting search results. By default this process excludes comments and other citizen-uploaded content, so that you are only searching for docket_ids for the topics that are being discussed by the regulatory process directly.
- mine_searches_data.php loops over the search json files and then uses recurse_and_mirror_docket.function.php to download all of the documents in the docket.
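To illustrate the kind of loop the mining scripts perform, here is a minimal sketch that scans a directory of JSON files and collects document IDs. It is not the project's actual code: the function names are hypothetical, and the key name `documentId` is an assumption about the JSON layout, so adjust it to match the files you actually have in ./data/.

```php
<?php
// Hypothetical sketch: walk decoded JSON recursively and collect every
// string value stored under $key (assumed here to be 'documentId').
function collect_ids($node, string $key, array &$found)
{
    if (!is_array($node)) {
        return;
    }
    foreach ($node as $k => $v) {
        if ($k === $key && is_string($v)) {
            $found[] = $v;
        } else {
            collect_ids($v, $key, $found);
        }
    }
}

// Return the unique IDs mentioned in all *.json files directly in $dir.
function document_ids_in_dir(string $dir, string $key = 'documentId'): array
{
    $found = [];
    foreach (glob(rtrim($dir, '/') . '/*.json') ?: [] as $file) {
        $decoded = json_decode(file_get_contents($file), true);
        if (json_last_error() !== JSON_ERROR_NONE) {
            continue; // skip files that are not valid JSON
        }
        collect_ids($decoded, $key, $found);
    }
    return array_values(array_unique($found));
}
```

Calling `document_ids_in_dir('./data')` returns a flat list of IDs, which could then be handed one at a time to a download function.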
So basically, the workflow goes:
- use search_for_dockets.php to search for the type of content that you want
- use mine_searches_data.php to get the dockets from the search results
- use mine_docket_data.php to ensure that you have all of the PDFs and HTML that are associated with your documents.
Eventually we will have some code in here to respect the API rate limiting.
php search_for_dockets.php 'your search term'
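One simple way to stay under a rate limit is to space requests evenly. The class below is a hypothetical sketch of that idea and is not part of this project. The figure of 1000 requests per hour in the usage note is only a placeholder, so check the limit that applies to your own API key.

```php
<?php
// Hypothetical throttle: enforces a minimum interval between calls so
// that no more than $maxRequestsPerHour requests are sent in an hour.
final class Throttle
{
    private $minInterval;     // seconds required between two calls
    private $lastCall = null; // timestamp of the previous call, if any

    public function __construct(int $maxRequestsPerHour)
    {
        $this->minInterval = 3600.0 / $maxRequestsPerHour;
    }

    // How long a caller would have to wait if it asked at time $now.
    public function secondsToWait(float $now): float
    {
        if ($this->lastCall === null) {
            return 0.0;
        }
        return max(0.0, $this->lastCall + $this->minInterval - $now);
    }

    // Record that a request was sent at time $now.
    public function markCalled(float $now)
    {
        $this->lastCall = $now;
    }

    // Sleep if needed, then record the call. Use before each request.
    public function wait()
    {
        $delay = $this->secondsToWait(microtime(true));
        if ($delay > 0) {
            usleep((int) round($delay * 1000000));
        }
        $this->markCalled(microtime(true));
    }
}
```

A script would create one instance, for example `$throttle = new Throttle(1000);`, and call `$throttle->wait();` immediately before every API request. Even spacing is conservative: it never bursts, at the cost of being slower than strictly necessary for short runs.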
php mine_searches_data.php
php mine_docket_data.php
These are all command-line programs. If you are new to the command line, see http://linuxcommand.org/
Take a look at the Data Directory README
The spiritual predecessor to the Mirrulations Project