isabella232 / pricehistoryapi

This project forked from 18f/pricehistoryapi

Backend and web service for searching a database imported from CSV files (someday generally, but Prices Paid for now.)

License: Other

Python 100.00%

pricehistoryapi's Introduction

PricesPaidAPI

This is part of the PriceHistory (P3) Project

The PriceHistory (P3) project is a market research tool that allows searching of purchase transactions. It is modularized into 5 GitHub repositories:

  1. PriceHistoryInstall,
  2. PriceHistoryGUI,
  3. PriceHistoryAPI,
  4. MorrisDataDecorator,
  5. PriceHistoryAuth.

To learn how to install the system, please refer to the PriceHistoryInstall project, which contains a Vagrant install script. That repo is under active development in preparation for the Houston Hackathon.

The name "PriceHistory" is descriptive: this project shows you prices that have been paid for things. However, the name is applied to many projects, so "P3" is the specific name of this project.

INTRODUCTION

This is a project written by Robert L. Read, Martin Ringlein, Aaron Snow, and Gregory Godbout, who are all Presidential Innovation Fellows (Round 2). The purpose is to save the Federal Government money by providing a research tool that shows prices actually paid for commodities and perhaps services. This is completely analogous to market research that any buyer would do, although of course other avenues of market research should be used as well.

STATUS

Robert L. Read is the main engineer on this project, which was begun as a Presidential Innovation Fellowship project in July of 2013. At the time of this writing it is probably not documented well enough to use easily, though of course I will respond ([email protected]) to any questions.

The code is currently in use within the Federal government in a Beta mode, with a plan to roll out to many federal procurement officers by the summer of 2014. That website, however, will have data that the government considers sensitive, and will be available only to federal employees.

However, this code is in the public domain within the United States. I would love to see it used by somebody else, for example a city or state that wanted to present a simple research tool for price transactions, or to give its citizens transparency into its purchases.

WHY YOU MIGHT CARE

You might find some value in this code if you want an example of using Python code to load a SOLR index.

You might find some value if you want to see an approach to adapting multiple file formats for loading into a single harmonized schema.

Today, this project is specialized, but we would like to factor out the "Prices Paid" part of this to make it a more general tool: a "Simple Heterogeneous Data Visualizer". You can help with that!

You might use this code as a starting point if you want to provide an API to a SOLR index but don't want to allow direct access against SOLR for security or other reasons.
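As a rough illustration of that last point, the sketch below shows one way a thin API layer could restrict what reaches SOLR: it escapes query-syntax characters in the user's search string and caps the page size before building a request for SOLR's standard select handler. This is not the code used in this project; the function names, the MAX_ROWS limit, and the choice of parameters are assumptions made for the example, and it is written for Python 3.

```python
from urllib.parse import urlencode

MAX_ROWS = 100  # assumed cap on page size; pick whatever suits your site

# Characters with special meaning in SOLR's standard query syntax.
SOLR_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_solr_term(text):
    """Backslash-escape query-syntax characters in user-supplied text."""
    return "".join("\\" + ch if ch in SOLR_SPECIAL else ch for ch in text)


def build_solr_query(search_string, rows=10, start=0):
    """Build a select query string from a small, fixed set of parameters.

    The caller never passes raw SOLR parameters through; only the search
    text and paging values are accepted, and the paging values are clamped.
    """
    safe_rows = max(0, min(int(rows), MAX_ROWS))
    safe_start = max(0, int(start))
    params = {
        "q": escape_solr_term(search_string),
        "rows": safe_rows,
        "start": safe_start,
        "wt": "json",
    }
    return "/select?" + urlencode(params)


query = build_solr_query("stapler", rows=5000)
```

The point of the design is that the public API accepts only a search string and paging values, so a client cannot reach SOLR's update or admin handlers or request an unbounded result set.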

WHAT IT DOES

This project loads a SOLR index from CSV files (SolrLodr.py) and then presents a very simple API (which is really a security restriction on the SOLR web service). The main thing it does is harmonize existing formats (2 at present) into a single database that can be rendered. It is thus an approach to using simple Python programs to harmonize data. Eventually, we hope it will be used on many data sets.
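To make the harmonization idea concrete, here is a minimal sketch of two adapters that map differently shaped CSV rows onto one set of field names. It does not reproduce SolrLodr.py or the real adapters: the column names, the harmonized field names, and both input formats are invented for the example, and the step that posts the records to SOLR is left out. It is written for Python 3.

```python
import csv
import io

# Hypothetical harmonized field names; the real schema is the one in
# docs/example.SOLR.schema.xml and will differ.
HARMONIZED_FIELDS = ("description", "unit_price", "units", "vendor")


def adapt_format_a(row):
    """Map a row from an invented 'format A' CSV to the harmonized fields."""
    return {
        "description": row["Item Description"].strip(),
        "unit_price": float(row["Unit Price"]),
        "units": int(row["Quantity"]),
        "vendor": row["Vendor Name"].strip(),
    }


def adapt_format_b(row):
    """Map an invented 'format B' row, which has no quantity column.

    The adapter has to construct the missing field, so it assumes 1 unit.
    """
    return {
        "description": row["desc"].strip(),
        "unit_price": float(row["total_dollars"]),
        "units": 1,
        "vendor": row["seller"].strip(),
    }


def harmonize(csv_text, adapter):
    """Parse CSV text and return a list of harmonized records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [adapter(row) for row in reader]


format_a = "Item Description,Unit Price,Quantity,Vendor Name\nStapler,12.50,4,Acme\n"
format_b = "desc,total_dollars,seller\nToner,80.00,Globex\n"

records = harmonize(format_a, adapt_format_a) + harmonize(format_b, adapt_format_b)
```

Each new source format then costs one small adapter function, and everything downstream of the adapters sees a single record shape.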

In docs/example.SOLR.schema.xml is an example of the schema.xml file that I use. This schema is still evolving.

THE SISTER PROJECT

Although we are presenting an API which (when hosted) will let any programmer do as they please in querying the harmonized databases, most users will use the GUI.

The GUI is in a project called PriceHistoryGUI. That project has the best installation instructions, although this project is completely independent of that one. PriceHistoryGUI uses PriceHistoryAPI, but PriceHistoryAPI depends only on the module P3Auth (also one of my GitHub repos) and on open-source software which I did not write, mentioned at PriceHistoryGUI.

The easiest way to understand what PriceHistoryAPI does and play with it is to install PriceHistoryGUI, but that is not strictly necessary.

THE PHILOSOPHY

"Embrace Chaos." "Data Standardization is a Siren." "Transparency trumps Standardization." "Buyers are smart enough to understand when the data is problematic." "Anathematize Aggregation."

But seriously, folks, the idea is to make it work as much like Google as possible.

HOW YOU CAN HELP

  • I'm a Python neophyte; this code can probably be improved.
  • We need to modularize all the actual named fields so that this code could be generalized for some other purpose.
  • We need to write a push API so that new data can be pushed rather than delivered through a CSV file as today.

USING THE PROJECT

Note that an example file exists in the "cookedData" directory. In that directory you will find the file FY14TX-pppifver-USASpending-5-0-0-0-1.csv. This file is simply a renamed (and unchanged) export from the site USASpending.gov, in this case for fiscal year 2014 and the state of Texas. It contains 23K records of completely public data for testing this code. Note that the "Units" field is constructed by this adapter, because USASpending.gov does not actually contain "number of units purchased" data. The basic approach of PriceHistoryAPI at present is to read .csv files like that one. Note that the name of the file follows a strict convention that defines which adapter to use. If you would like to use this project for something else, create your own adapter, possibly using the same filename/versioning convention.
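The convention itself is not spelled out here, so the following is only a sketch of how an adapter could be selected from a filename. It assumes, based on nothing more than the single example name above, that the name has the shape label, then the literal "-pppifver-", then an adapter name, then a dash-separated version. The function name, the ADAPTERS table, and the placeholder adapter are hypothetical; the real loader may parse names differently.

```python
import os

# Separator copied from the example filename; its meaning is assumed.
MARKER = "-pppifver-"


def parse_data_filename(path):
    """Split a data filename into (label, adapter_name, version).

    Assumed shape: <label>-pppifver-<adapter>-<version>.csv
    """
    stem, ext = os.path.splitext(os.path.basename(path))
    if ext.lower() != ".csv" or MARKER not in stem:
        raise ValueError("unrecognized data filename: %r" % path)
    label, rest = stem.split(MARKER, 1)
    adapter_name, _, version = rest.partition("-")
    return label, adapter_name, version


# Hypothetical registry; a real adapter would build harmonized records.
ADAPTERS = {
    "USASpending": lambda row: row,  # placeholder that passes rows through
}

label, adapter_name, version = parse_data_filename(
    "cookedData/FY14TX-pppifver-USASpending-5-0-0-0-1.csv"
)
adapter = ADAPTERS[adapter_name]
```

Keeping the adapter name and version in the filename means a directory of mixed exports can be loaded in one pass, with each file routed to the code that understands its layout.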

The basic idea of this project is to have many such .csv files using many different adapters. I would love for someone to donate an additional public data file to this project. The Federal government has many such files but considers them confidential.

Note that the cookedData directory in this project is NOT in the place it is configured to be in ppcofig.example.py. I have added it here only as an example.

This is a small data file, but when placed in the correct (not example) cookedData directory, it allows "python SolrLodr.py" to load 23K records of completely public data for testing your own site.

Public domain

This project is in the worldwide public domain. As stated in CONTRIBUTING:

This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication.

All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.

pricehistoryapi's People

Contributors

robertlread, adp04c, konklone
