
propublica / django-collaborative

94.0 26.0 18.0 9.06 MB

ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.

License: MIT License

Python 73.82% HTML 22.78% JavaScript 1.32% Shell 1.49% Dockerfile 0.52% Procfile 0.07%
django csv-import screendoor google-sheets journalism

django-collaborative's Introduction

Collaborate


This is a web application for managing and building stories based on tips solicited from the public. The project is meant to be easy for non-programmers to set up, intuitive to use and highly extensible.

Here are a few use cases:

  • Collecting data from various sources (Google Forms via Google Sheets, Screendoor, private Google Sheets)
  • An easy-to-set-up data entry system
  • Organizing data from multiple sources and allowing many users to view and annotate it

The project is broken up into several components:

  • A system for transforming CSV files into managed database records
  • A default and automatic Django admin panel built for rapid and easy editing, managing and browsing of data
  • Customizable fields for tagging, querying, annotating and tracking tips
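As a rough illustration of the first component, here is a minimal Python sketch of turning a CSV export into field names plus one dictionary per row. It is not Collaborate's actual importer: the helpers `column_to_field_name` and `csv_to_records` are hypothetical, and a real import also has to handle duplicate headers, type inference and model creation, which are left out here.

```python
import csv
import io
import re
from typing import Dict, List, Tuple


def column_to_field_name(header: str) -> str:
    """Hypothetical helper: turn a header like 'Your Name' into 'your_name'."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^0-9A-Za-z]+", "_", header.strip()).strip("_").lower()
    # Python identifiers (and so Django field names) can't start with a digit.
    if not name or name[0].isdigit():
        name = "field_" + name
    return name


def csv_to_records(text: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Hypothetical helper: return (field names, one dict per data row)."""
    reader = csv.reader(io.StringIO(text))
    headers = next(reader)
    fields = [column_to_field_name(h) for h in headers]
    records = [dict(zip(fields, row)) for row in reader]
    return fields, records


fields, records = csv_to_records("Your Name,Tip Text\nAda,Check the 2019 filings\n")
print(fields)   # ['your_name', 'tip_text']
print(records)  # [{'your_name': 'Ada', 'tip_text': 'Check the 2019 filings'}]
```

Normalizing headers up front keeps arbitrary spreadsheet column titles usable as database column names.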

This is a project of ProPublica, supported by the Google News Initiative.

Documentation

We have a GitBook with a full user guide that covers running Collaborate, importing and refining data, and setting up Google services. You can read the documentation here.

Deploy it

Collaborate has built-in support for one-click installs on both Google Cloud and Heroku. During setup on either platform, make sure to fill in the email, username and password fields so you can log in.

Heroku


The Heroku deploy button will create a small, "free-tier" Collaborate system. This consists of a small web server and a database that supports between 10,000 and 10 million records (depending on data size), and it automatically configures scheduled data re-importing.

Google Cloud


The Google Cloud Run button launches Collaborate into the Google Cloud environment. This deploy requires you to set up a Google Cloud project, enable Google Cloud billing and enable the Cloud Run API. Full setup instructions are here.

This deploy does not automatically configure scheduled re-importing, but you can add it via Cloud Scheduler by following these instructions.

Once you've deployed your Cloud Run instance, you can manage your running instance from the Google Developer's Console.

Getting Started (Local Testing/Development)

Getting the system set up and running locally begins with cloning this repository and installing the Python dependencies. Python 3.6 or 3.7 and Django 2.2 are assumed here.

# a virtual environment is recommended (mkvirtualenv is provided by virtualenvwrapper)
mkvirtualenv -p /path/to/python3.7 collaborative
# install python dependencies
pip install -r requirements.txt

Assuming everything worked, let's bootstrap and then start the local server:

# get the database ready
python manage.py migrate

# create a default admin account
python manage.py createsuperuser

# gather up django and collaborate assets
python manage.py collectstatic --noinput

# start the local application
python manage.py runserver

You can then access the application at http://localhost:8000 and log in with the credentials you chose in the createsuperuser step above. Logging in brings you to a configuration wizard where you can add your first Google Sheet and import its contents.
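For a sheet that is shared by link or published to the web, Google serves a CSV export at a URL of the form `https://docs.google.com/spreadsheets/d/<sheet id>/export?format=csv&gid=<tab id>`. The sketch below derives that URL from a sheet's browser address, which can help when checking what an import will see. It is an illustration only, not necessarily how Collaborate fetches sheets; `sheet_csv_url` is a hypothetical name, and private sheets still require Google OAuth credentials.

```python
import re
from urllib.parse import parse_qs, urlsplit


def sheet_csv_url(share_url: str) -> str:
    """Hypothetical helper: map a Google Sheets browser URL to its CSV export URL."""
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", share_url)
    if match is None:
        raise ValueError("not a Google Sheets URL: " + share_url)
    parts = urlsplit(share_url)
    # The tab id ("gid") may appear in the fragment (#gid=123) or the query (?gid=123).
    # Fall back to "0", which is often, but not always, the first tab.
    gid = (parse_qs(parts.fragment).get("gid")
           or parse_qs(parts.query).get("gid")
           or ["0"])[0]
    return ("https://docs.google.com/spreadsheets/d/%s/export?format=csv&gid=%s"
            % (match.group(1), gid))


print(sheet_csv_url("https://docs.google.com/spreadsheets/d/abc_123-XYZ/edit#gid=42"))
# https://docs.google.com/spreadsheets/d/abc_123-XYZ/export?format=csv&gid=42
```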

Production Deploy (Nginx/Docker)

If you want to deploy this to a production environment, we've included configuration templates and scripts for Docker and Nginx.

A Collaborate Dockerfile (the same one used by the Google Cloud Run deploy) can be found here:

deploy/google-cloud/Dockerfile

This creates a basic production environment with nginx and gunicorn. By default, it uses SQLite3, but you can configure the database by adding a DATABASE_URL environment variable. You can read more about the format for this variable here.
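To make the `DATABASE_URL` format concrete, here is a minimal standard-library sketch of how such a URL maps onto Django's `DATABASES` setting. In practice the `dj-database-url` package is the usual way to do this; `parse_database_url` below is a hypothetical stand-in that only handles the PostgreSQL, MySQL and SQLite schemes and is not necessarily what Collaborate's settings do.

```python
from urllib.parse import unquote, urlparse

# Hypothetical scheme-to-backend table covering a few common databases.
ENGINES = {
    "postgres": "django.db.backends.postgresql",
    "postgresql": "django.db.backends.postgresql",
    "mysql": "django.db.backends.mysql",
    "sqlite": "django.db.backends.sqlite3",
}


def parse_database_url(url: str) -> dict:
    """Build a Django DATABASES["default"] dict from a DATABASE_URL string."""
    parsed = urlparse(url)
    engine = ENGINES[parsed.scheme]  # KeyError for unsupported schemes
    # Drop the single leading "/": sqlite:////tmp/db -> "/tmp/db", sqlite:///db -> "db".
    name = unquote(parsed.path[1:])
    if parsed.scheme == "sqlite":
        return {"ENGINE": engine, "NAME": name}
    return {
        "ENGINE": engine,
        "NAME": name,
        "USER": unquote(parsed.username or ""),
        "PASSWORD": unquote(parsed.password or ""),  # e.g. %40 -> "@"
        "HOST": parsed.hostname or "",
        "PORT": str(parsed.port) if parsed.port else "",
    }


print(parse_database_url("postgres://collab:s3cr%40t@db.example.com:5432/collaborate"))
```

In a settings module this might be used as `DATABASES = {"default": parse_database_url(os.environ.get("DATABASE_URL", "sqlite:///db.sqlite3"))}`, which falls back to SQLite when the variable is unset.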

We also included an Nginx configuration file for plain Nginx deploys here:

deploy/google-cloud/django_nginx.conf

This can be copied to your main Nginx sites configuration directory (e.g., /etc/nginx/sites-available/).

In order to get auto-updating data sources, make sure to add a cron job that runs the following manage.py command:

manage.py refresh_data_sources

There's an example cron file that, when added to your /etc/crontab, will update data every 15 minutes:

./deploy/cron/refresh_data_sources
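In crontab syntax, "every 15 minutes" is conventionally written as `*/15` in the minute field, which fires at minutes 0, 15, 30 and 45 of every hour rather than 15 minutes after whenever the job was installed. The sketch below makes that concrete by listing the next matching minutes; `next_runs` is a hypothetical helper for illustration, and the example cron file itself may express its schedule differently.

```python
from datetime import datetime, timedelta
from typing import List


def next_runs(start: datetime, step: int = 15, count: int = 4) -> List[datetime]:
    """Hypothetical helper: the next `count` minutes matched by a '*/step' minute field."""
    runs = []
    # cron works at minute granularity, so begin at the first whole minute after `start`.
    t = start.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while len(runs) < count:
        if t.minute % step == 0:
            runs.append(t)
        t += timedelta(minutes=1)
    return runs


print([r.strftime("%H:%M") for r in next_runs(datetime(2019, 7, 8, 17, 7, 30))])
# ['17:15', '17:30', '17:45', '18:00']
```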

Note that if you use the above example, you will probably want to set up logrotate for the log file that the cron job writes. You can find the logrotate script here (add it to /etc/logrotate.d/refresh_data_sources):

./deploy/logrotate/refresh_data_sources

django-collaborative's People

Contributors

brandonrobertz, dependabot[bot], mtigas, rachelgli, schwanksta


django-collaborative's Issues

Store/fetch Google OAuth credentials

Either prompt the user to paste in the keys, or grab them via API call during VM onboarding. Currently, Google OAuth only works via a settings.py configuration variable.

No matching distribution found for pkg-resources==0.0.0

👋 Hello!

Was trying to get this cool thing running locally and hit a small snag: pkg-resources==0.0.0 appears in requirements.txt but doesn't appear to be installable. I tried running pip install pkg-resources separately and that didn't work either.
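The `pkg-resources==0.0.0` line is a known artifact: `pip freeze` run inside a virtualenv on Debian or Ubuntu lists it, but no such release exists on PyPI, so the usual fix is simply to delete that line from `requirements.txt`. A minimal Python sketch that does so while leaving the other pins untouched (the helper names are hypothetical):

```python
from pathlib import Path
from typing import Iterable, List

# Pins that Debian/Ubuntu's `pip freeze` can emit but that don't exist on PyPI.
BOGUS_PINS = {"pkg-resources==0.0.0", "pkg_resources==0.0.0"}


def strip_bogus_pins(lines: Iterable[str]) -> List[str]:
    """Return the requirement lines with uninstallable pins removed."""
    return [line for line in lines if line.strip().lower() not in BOGUS_PINS]


def clean_requirements(path: str = "requirements.txt") -> None:
    """Rewrite a requirements file in place without the bogus pins."""
    req = Path(path)
    kept = strip_bogus_pins(req.read_text().splitlines())
    req.write_text("\n".join(kept) + "\n")
```

Filtering rather than regenerating the file keeps every other pinned version exactly as committed.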

Change header name

When you open a given data set, we need to get rid of the "metadata" in the header. It should just be "Status" and "Partner."

Documentation

We need to flesh out not just the README, but write actual documentation about how the system works and how people can use it. Assigning this to all of us.

Tags

Tags that can be assigned and filtered, especially using the actions menu

Implement delete/update data source models

Delete currently works, but you have to restart the process to get it to reflect the changes. The DB is also left dirty right now.

Update, currently, only works by re-importing the file with the same name.

Both of these are required to get to a stable deployable system.

Add error notification to auto-import

Auto-importing (via a cron job) works, but there are a few issues with it. Also, we need to handle errors gracefully and alert the user to any issues. Currently, if there's an error, the automatic re-import will just silently fail.
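One way to avoid silent failures is to wrap the per-source re-import so that each error is logged, the remaining sources still run, and someone is alerted once at the end. The sketch below assumes nothing about Collaborate's internals: `refresh_all`, and the `refresh` and `notify` callables it takes, are hypothetical placeholders for the real import and alerting code.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger("refresh_data_sources")

Failure = Tuple[str, str]  # (source name, error message)


def refresh_all(
    sources: Iterable[str],
    refresh: Callable[[str], None],
    notify: Callable[[List[Failure]], None],
) -> List[Failure]:
    """Re-import every source, collecting failures instead of stopping or hiding them."""
    failures = []  # type: List[Failure]
    for name in sources:
        try:
            refresh(name)
        except Exception as exc:  # one broken sheet shouldn't block the others
            logger.exception("Refreshing %s failed", name)
            failures.append((name, str(exc)))
    if failures:
        # Alert once per run, listing every failure.
        notify(failures)
    return failures
```

In a Django project, `notify` could, for example, join the failures into a message and pass it to `django.core.mail.mail_admins`; a cron wrapper could also exit with a non-zero status when the returned list is non-empty.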

Google OAuth2 throws an error with http per docs

Hi, was just setting up Collaborate @themarshallproject and hit one hiccup following the directions on the setup-credentials page. I think on this line: https://github.com/propublica/django-collaborative/blob/master/templates/setup-credentials.html#L108

When I changed it from http to https, my users were able to authenticate through Google. Following the directions on the page, my users got this msg:

Hope this helps. Thanks for open sourcing this!
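Google generally accepts plain `http` redirect URIs only for `localhost`, so a deployed instance needs an `https` redirect URI. If the setup page builds that URI from the incoming request, one plausible cause on Heroku-style hosts is that TLS ends at the proxy, so Django sees plain http unless `SECURE_PROXY_SSL_HEADER` is configured. The sketch below shows the rewrite the reporter did by hand; `https_redirect_uri` and the example path are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts for which Google generally accepts plain-http redirect URIs during development.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def https_redirect_uri(uri: str) -> str:
    """Hypothetical helper: force https on an OAuth redirect URI, except for loopback hosts."""
    parts = urlsplit(uri)
    if parts.hostname in LOOPBACK_HOSTS:
        return uri
    return urlunsplit(parts._replace(scheme="https"))


print(https_redirect_uri("http://collab.example.org/oauth/callback/"))
# https://collab.example.org/oauth/callback/
```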

No Google authentication on main page for fresh install

Per @rachelgli, I'm filing this ticket:

I've got a fairly fresh install of Collaborate running on Heroku. So fresh it has no data, which may be part of the problem. But the base page at / does not have a way to authenticate with Google:

Clicking on the Collaborate icon leads to /admin/, which redirects to /admin/login/?next=/admin/, where I do get Google authentication:

If this is your biggest problem you're in really good shape.

Inline editing of metadata in admin list view

We need users to be able to edit data source metadata from the list view, avoiding the annoyance of opening the detail view and clicking 'Save' for every change. This is complicated by Django's quite restrictive list_editable functionality, though, and will most likely require some custom widgets.

This SO post goes over the problem and a (complex) solution used in another library: https://stackoverflow.com/questions/8398797/django-admin-foreign-key-field-in-list-editable

Screendoor project wouldn't load


When I gave it the name DocHateSignUps it wouldn't upload, and I got this error. When I gave it an all lowercase name, it loaded correctly.

Column sort doesn't recognize lowercase words

When you sort a column A-Z, it doesn't alphabetize words that begin with a lowercase letter. So all of the lowercase words are put at the end of the list, and not alphabetized with the full list.
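This is the usual symptom of a plain code-point sort, where every uppercase letter ("A" to "Z") orders before every lowercase one. A case-insensitive sort key fixes it; the sketch below uses `str.casefold` and keeps the raw string as a tie-breaker so the order is deterministic. `sort_case_insensitive` is a hypothetical helper, not Collaborate code.

```python
from typing import Iterable, List


def sort_case_insensitive(values: Iterable[str]) -> List[str]:
    """Sort A-Z ignoring case; the raw string breaks ties so 'Apple'/'apple' order predictably."""
    return sorted(values, key=lambda v: (v.casefold(), v))


words = ["apple", "Zebra", "mango", "Banana"]
print(sorted(words))                 # ['Banana', 'Zebra', 'apple', 'mango']
print(sort_case_insensitive(words))  # ['apple', 'Banana', 'mango', 'Zebra']
```

When the sorting happens in the database rather than in Python, Django's `Lower` function can serve the same purpose, e.g. `queryset.order_by(Lower("title"))`, where `title` is a hypothetical field name.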

Screendoor data import failure

For a different Screendoor form, I got through all the steps until the final import step, when I got the same error message for every row.

Customizable metadata/editable fields

We want people to be able to set their own fields for the metadata -- that way we can help reporters who want to, for example, do data entry work to create clean data. Since the metadata is already a dynamic model, can we create some workflow/admin/widget to let people design their own metadata fields?
