civic-json-worker

Flask app for tracking opengov projects in Chicago.

NOTE: civic-json-worker is being expanded to work for any city. Development of this project has moved to Code for America's civic-json-worker fork.

Objective

Keep track of all the civic tech projects worked on at the Chicago open gov hack night. Eventually this could track all civic projects in Chicago, or even nationally.

The plan

Curate a simple list of Github URLs for civic tech projects and leave filling in the rest to the Github API.

Other civic tech listing projects like this one have gone stale, and the real sticking point has been maintaining and curating the list of projects. If we can make the maintenance part as simple as possible, we have a greater chance that this thing will live on.

So humans will be responsible for one thing: deciding what gets tracked.

[
    "https://github.com/dssg/census-communities-usa",
    "https://github.com/open-city/open-gov-hack-night",
    ...
]

The rest is up to computers. When the /update-projects/ path is hit on this app, it loops over the projects in the list and captures something like this:

[
    {
        "contributors": [
            {
                "avatar_url": "https://0.gravatar.com/avatar/5e5eb188a0e4d3a7c8f38ee0fc3a6cbd?d=https%3A%2F%2Fidenticons.github.com%2Fd8c3ef3ed05a213a7225bf5e6e46101a.png", 
                "contributions": 51, 
                "html_url": "https://github.com/derekeder", 
                "login": "derekeder"
            }, 
            {
                "avatar_url": "https://2.gravatar.com/avatar/813d23c289052af417387a9270d0da31?d=https%3A%2F%2Fidenticons.github.com%2Ffa9357bb22fd993fc9795619c7e1d4f7.png", 
                "contributions": 46, 
                "html_url": "https://github.com/fgregg", 
                "login": "fgregg"
            }, 
            {
                "avatar_url": "https://2.gravatar.com/avatar/1d0c5faee140af87d7d6967bc946ecc6?d=https%3A%2F%2Fidenticons.github.com%2F44e80db9ed8527f429c969e804432b0f.png", 
                "contributions": 9, 
                "html_url": "https://github.com/evz", 
                "login": "evz"
            }
        ], 
        "contributors_url": "https://api.github.com/repos/datamade/csvdedupe/contributors", 
        "created_at": "2013-07-11T14:23:33Z", 
        "description": "Command line tool for deduplicating CSV files", 
        "forks_count": 2, 
        "homepage": null, 
        "html_url": "https://github.com/datamade/csvdedupe", 
        "id": 11343900, 
        "language": "Python", 
        "name": "csvdedupe", 
        "open_issues": 8, 
        "owner": {
            "avatar_url": "https://2.gravatar.com/avatar/0a89207d38feff1dcd938bdc1e4a9b5e?d=https%3A%2F%2Fidenticons.github.com%2F3424042f8cb2b04950903794ad9c8daf.png", 
            "html_url": "https://github.com/datamade", 
            "login": "datamade"
        }, 
        "updated_at": "2013-09-20T06:32:39Z", 
        "watchers_count": 26
    },
    ...
]
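As a rough sketch of the per-project step (not the app's actual code), the curated Github URL can be mapped to the matching Github API repository URL, and the response can be trimmed down to the fields shown in the sample above. The names repo_api_url, trim_repo, REPO_FIELDS and OWNER_FIELDS are made up for illustration, and the HTTP request itself is left out.

```python
from urllib.parse import urlparse

# Fields kept from the Github repository response, matching the sample above.
# These tuples are illustrative; the real app may keep a different set.
REPO_FIELDS = (
    "contributors_url", "created_at", "description", "forks_count",
    "homepage", "html_url", "id", "language", "name", "open_issues",
    "updated_at", "watchers_count",
)
OWNER_FIELDS = ("avatar_url", "html_url", "login")


def repo_api_url(project_url):
    """Turn https://github.com/<owner>/<repo> into the Github API URL."""
    parts = [p for p in urlparse(project_url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("not a repository URL: %r" % project_url)
    owner, repo = parts[0], parts[1]
    return "https://api.github.com/repos/%s/%s" % (owner, repo)


def trim_repo(details):
    """Keep only the fields that end up in the published JSON."""
    trimmed = {key: details.get(key) for key in REPO_FIELDS}
    owner = details.get("owner") or {}
    trimmed["owner"] = {key: owner.get(key) for key in OWNER_FIELDS}
    return trimmed
```

Fetching repo_api_url(url) with an authenticated request and passing the decoded JSON through trim_repo would give one entry of the list above, minus the contributors.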

This data is hosted as JSON on a publicly available endpoint, with a CORS configuration that allows it to be loaded via an Ajax call for listing, sorting and searching projects on this site. Bonus: anyone can use this JSON file for their own purposes. Details on setting up a CORS configuration for nginx can be found here.

Benefits

By pushing everything on to Github, we will have very little to maintain, content-wise, as administrators. Simultaneously, we will encourage people to:

  • sign up for Github if they aren't already
  • keep their projects open source (we can't crawl private repos)
  • make sure their description and website urls are up to date
  • use the issue tracker

Set up this app

Propping this sucker up for oneself is pretty simple. However, there are some basic requirements, which can be installed in the standard Python fashion (assuming you are working in a virtualenv):

$ pip install -r requirements.txt

Besides that, there are a few environment variables that you'll need to set:

$ export FLASK_KEY=[whatever you want] # This is a string that you'll check to make sure that only trusted people are deleting things
$ export GITHUB_TOKEN=[Github API token] # Read about setting that up here: http://developer.github.com/v3/oauth/
$ export S3_BUCKET=[Name of the bucket] # This is the bucket where you'll store the JSON files 
$ export AWS_ACCESS_KEY=[Amazon Web Services Key] # This will need access to the bucket above
$ export AWS_SECRET_KEY=[Amazon Web Services Secret] # This will need access to the bucket above

These are probably easiest to place in the .bashrc (or the like) of the user that the app runs as, rather than setting them manually each time, but you get the idea...
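On the Python side, a start-up check along these lines could catch a missing variable early. This is only a sketch: load_settings and REQUIRED_VARS are hypothetical names, and the app may read its configuration differently.

```python
import os

# The five variables listed above.
REQUIRED_VARS = (
    "FLASK_KEY", "GITHUB_TOKEN", "S3_BUCKET",
    "AWS_ACCESS_KEY", "AWS_SECRET_KEY",
)


def load_settings(environ=os.environ):
    """Return the required settings, or fail loudly if any are unset."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Passing a plain dict instead of os.environ makes the check easy to try out without touching the real environment.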

Want to help? Have ideas to make this better?

The issue tracker is actively watched and pull requests are welcome!

civic-json-worker's People

Contributors

derekeder, jpvelez

civic-json-worker's Issues

Create dummy user to backup data

As fun as it is to have my github commit history augmented by a robot committing on my behalf every four hours, I am constantly reminded that this is happening because in one hour out of four, my github user account is over its API limit. (This will also affect https://github.com/open-city/chicago-river-sewage because my account is being used to push the scraped data to github every so often.)

Not only is this slightly annoying for me, it also means that the actual work that this worker is trying to accomplish is almost certainly not completing. There may be a more efficient way to leverage the API that doesn't require that many calls so maybe that's really what we ought to do. In the meantime, making a dummy user would be appreciated.

@derekeder Any thoughts?
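One way to at least notice when the limit has been hit: Github API responses carry X-RateLimit-Remaining and X-RateLimit-Reset headers (the latter in epoch seconds). A small helper like the following, which is a sketch and not part of the worker (seconds_to_wait is a made-up name), could tell the update loop how long to back off instead of failing quietly.

```python
def seconds_to_wait(headers, now):
    """How long to pause before the next Github API call.

    headers is a plain dict of response headers (keys as Github sends
    them); now is the current time in epoch seconds.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0  # still within the limit, no need to wait
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    return max(0, reset_at - int(now))
```

This does not reduce the number of calls; it only makes the failure mode visible.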

Owner contributions to repos might not be counting

Just added a repo for which I am the owner and sole contributor (real community effort, I know but I was testing the add project endpoint, OK?). Even after running the update manually, it didn't update my user totals on the people page. This might mean that the owner counts are not getting added in there...

useful endpoints

add a project

$.post( 'http://civic-json-app.herokuapp.com/add-project/', {'project_url': "https://github.com/my-name/my-project/"}, function(resp){ console.log(resp) });

delete a project

$.post( 'http://civic-json-app.herokuapp.com/delete-project/', {'project_url': "https://github.com/my-name/my-project/"}, function(resp){ console.log(resp) });

add contributors to project_details.json

will require making another API call to contributors_url

"contributors": [
{
    "login": "hunterowens",
    "avatar_url": "https://2.gravatar.com/avatar/ff16180e6d9715d5ae17526cb83d7cfb?d=https%3A%2F%2Fidenticons.github.com%2Fc3663e10b721e5d713a55aebf8621901.png",
    "html_url": "https://github.com/hunterowens",
    "contributions": 74
  },
  {
    "login": "evz",
    "avatar_url": "https://1.gravatar.com/avatar/1d0c5faee140af87d7d6967bc946ecc6?d=https%3A%2F%2Fidenticons.github.com%2F44e80db9ed8527f429c969e804432b0f.png",
    "html_url": "https://github.com/evz",
    "contributions": 55
  },
  {
    "login": "fishmanadam",
    "avatar_url": "https://2.gravatar.com/avatar/d1529c1f159f49be15cf5fde0366b599?d=https%3A%2F%2Fidenticons.github.com%2Fdb534c76db466f7764d7e745900be88d.png",
    "html_url": "https://github.com/fishmanadam",
    "contributions": 7
  },
  {
    "login": "jtwalsh0",
    "avatar_url": "https://0.gravatar.com/avatar/05e1d117da7a62657f617bd553f4ed83?d=https%3A%2F%2Fidenticons.github.com%2F31d750398f38453e376cf435deb5fea9.png",
    "html_url": "https://github.com/jtwalsh0",
    "contributions": 2
  },
  {
    "login": "matthewgee",
    "avatar_url": "https://1.gravatar.com/avatar/1dbb7a239d59f1effc696f6424050a5a?d=https%3A%2F%2Fidenticons.github.com%2F9bf88927e24ab526ed285da976feecf0.png",
    "html_url": "https://github.com/matthewgee",
    "contributions": 1
}
]
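A minimal sketch of that extra step, assuming the project record already has its contributors_url. Here fetch_json stands in for whatever function makes the authenticated API call and returns decoded JSON; it and add_contributors are hypothetical names, not part of the app.

```python
# The four fields shown in the example above.
CONTRIBUTOR_FIELDS = ("login", "avatar_url", "html_url", "contributions")


def add_contributors(project, fetch_json):
    """Return a copy of project with a trimmed "contributors" list."""
    raw = fetch_json(project["contributors_url"]) or []
    enriched = dict(project)
    enriched["contributors"] = [
        {key: person.get(key) for key in CONTRIBUTOR_FIELDS}
        for person in raw
    ]
    return enriched
```

Keeping the fetch function as a parameter means the trimming logic can be checked with canned data, without spending API calls.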

people.json

Let's create a JSON file that shows a list of people with total number of contributions of the projects we track.
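Something like the following could produce that file from the project details, assuming each project carries a "contributors" list in the shape shown earlier. The function name build_people and the extra "repositories" key are illustrative choices, not a settled format.

```python
def build_people(projects):
    """Sum each person's contributions across every tracked project."""
    people = {}
    for project in projects:
        for contributor in project.get("contributors", []):
            login = contributor["login"]
            person = people.setdefault(login, {
                "login": login,
                "html_url": contributor.get("html_url"),
                "avatar_url": contributor.get("avatar_url"),
                "contributions": 0,
                "repositories": [],
            })
            person["contributions"] += contributor.get("contributions", 0)
            person["repositories"].append(project.get("name"))
    # Most active people first; ties broken alphabetically by login.
    return sorted(people.values(),
                  key=lambda p: (-p["contributions"], p["login"]))
```

The result can be serialized with json.dump and published next to the project details.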

Includes invalid-email-address Fake User in Project Contributors List

This was originally reported in the open-gov-hack-night issue tracker as issue 14: searching for "invalid email address" in the People search of open-gov-hack-night links to a real Github user:

Invalid Email Address in People search

Invalid Email Address user page

It's an account provided by Github to indicate when there is an invalid author email address for a commit. In this case, it was found on 93 commits of look-at-cook; the email address in question was devnull@localhost.

Since this tracks a publicly curated list of projects, it's out of our control to make sure all future project contributors are also valid. Instead, we should filter out this user account from the list of contributors.
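The filter could be as small as this sketch. It assumes the placeholder account's login is invalid-email-address, as in the issue title; IGNORED_LOGINS and real_contributors are hypothetical names.

```python
# Logins to drop from every contributor list (assumed from the issue title).
IGNORED_LOGINS = {"invalid-email-address"}


def real_contributors(contributors):
    """Return the contributors whose login is not on the ignore list."""
    return [c for c in contributors
            if c.get("login") not in IGNORED_LOGINS]
```

Using a set leaves room for other placeholder accounts if any turn up later.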

Want to set a projects page up for Oakland, CA

Hey guys, I've cloned this and want to hook it up for Oakland but I'm a bit of a newbie. For the civic worker app, it seems like I need to get an Amazon EC2 account/space where I can store the Flask app and the output JSON as a resource.
Sound like the right approach?

Updating projects seems to have mixed results

As of right now, hitting /update-projects/ is emptying out the detail listing. I think this may have to do with github responding with a non-200 status code when we attempt to fetch the info about repos. I have a theory (which I will test in the morning) that this might be hitting a rate limit (they tend to be rather restrictive about non-authenticated requests).

A better approach here would be to not dump out the file before rebuilding it (duh). Another thing to do would be to actually authenticate the request so that we aren't rate limited.
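The first idea might look roughly like this: build the new list in memory, and only replace the published file once every fetch has succeeded. This is a sketch against a local file (the real app stores its JSON in S3); rebuild_details is a made-up name, and fetch stands in for a function that returns a project record, or None when Github answers with a non-200 status.

```python
import json
import os
import tempfile


def rebuild_details(project_urls, fetch, out_path):
    """Rewrite out_path only if every project was fetched successfully."""
    details = []
    for url in project_urls:
        record = fetch(url)
        if record is None:
            return False  # keep the existing file untouched
        details.append(record)

    # Write to a temp file in the same directory, then swap it in.
    directory = os.path.dirname(os.path.abspath(out_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(details, handle, indent=4)
    os.replace(tmp_path, out_path)
    return True
```

With this shape, a rate-limited run leaves the previous listing in place instead of emptying it.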

Anyways, research is needed

XSS error when trying to delete project

Trying to delete deprecated app:

$.post( 'http://civic-json-app.herokuapp.com/delete-project/', {'project_url': "https://github.com/smartchicago/chicago-health-atlas"}, function(resp){ console.log(resp) });

Getting an XSS error

Modify civic-json-worker API to work for any city / group

This civic tech project tracking API is generically useful infrastructure. Instead of having people deploy their own versions of this API, which has proven difficult, we should just modify this service to work for multiple civic hacking groups and cities.

Our API is already up and running. Civic hacking groups would just need to submit their repo urls to the endpoint from their websites. The system would hit the Github API and get additional repo details, just like it's doing for Chicago. Brigades could then request a list of their projects to display on their own project pages.

Here's what would need to happen, as far as I can tell. (@evz, please jump in):

  • The add project endpoint would need to require 'city' and 'group' params for url submissions, and store these along with the url.
  • Add city and group fields to the complete project_details.json file. Project pages could then pull only the projects for their own cities / groups.
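To make the second point concrete, here is a sketch of the filtering a project page (or the API itself) could do once each record has "city" and "group" keys. Those key names follow the proposal above, and projects_for is a hypothetical helper.

```python
def projects_for(projects, city=None, group=None):
    """Return the projects matching the given city and/or group."""
    def matches(project):
        if city is not None and project.get("city") != city:
            return False
        if group is not None and project.get("group") != group:
            return False
        return True

    return [project for project in projects if matches(project)]
```

Leaving both arguments as None returns everything, which keeps the current Chicago-only behaviour as the default.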

organizations.json

Another JSON file for tracking organizations and total number of repositories (based on what we're tracking).
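One possible shape for this, assuming the owner login of each tracked repo is what counts as the organization (owners can also be individual users, so that is a simplification). build_organizations and the "repositories" key are illustrative names.

```python
from collections import Counter


def build_organizations(projects):
    """Count tracked repositories per owner login."""
    counts = Counter(project["owner"]["login"] for project in projects)
    # Largest owners first; ties broken alphabetically by login.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"login": login, "repositories": total}
            for login, total in ordered]
```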
