
lahoffm / aclu-bail-reform


Webscraping, ETL and visualization of Georgia county jail statistics for ACLU bail reform project

License: MIT License

Python 73.92% Batchfile 0.84% Jupyter Notebook 25.23%

aclu-bail-reform's Introduction


ACLU bail reform project

Webscraping, ETL and visualization of Georgia county jail statistics.

Update

This project is no longer under active development. Email the organizer to access our raw data for research or activism.

Project Results

We created several Tableau worksheets (scroll down to "Metadata") and wrote the ACLU bail reform report in Jan 2018.

Project motivation - PDF

Like many aspects of the criminal justice system, the current approach to money bail is extremely problematic from a policy perspective, a legal perspective, and a moral perspective. In 2013, there were 609,464 jail admissions in Georgia. During an inmate count on August 3, 2017, 25,022 people incarcerated statewide were awaiting trial, representing 65% of the total statewide jail population. Many are stuck in jail because they can’t afford to pay money bail to secure their release. Pretrial incarceration may last days, months, or even years, with devastating consequences for people and their families. After arrest, individuals who cannot afford bail face an impossible choice: sit in jail as their case moves through the system, or plead guilty and give up their rights. This choice is faced regardless of an individual’s guilt or innocence. This is why, controlling for all other factors, being jailed pretrial due to unaffordable bail is the single greatest predictor of a conviction. A study shows that the non-felony conviction rate jumps from 50% for people released pretrial to 92% for those jailed pretrial. The system traps people of low and moderate income, tears families apart, leads to lost jobs, housing, and caregiver assistance, and hurts communities. Meanwhile, people who have money can make bail and get out. This is an unfair, discriminatory, and unconstitutional system.

Project goal

The goal of the coding project is to understand who is in jail, where, and for how long.
We have 15 priority areas around the state. We would like to have a monthly breakdown of who is in jail by race, charge, severity (misdemeanor or felony), and length of time in jail. We will use this data to help advocate and push for bail reform through community organizing and the legislature.

The 15 priority areas are:

  • Athens-Clarke
  • Bibb
  • Chatham
  • Cobb
  • Columbia/Richmond
  • DeKalb
  • Dougherty
  • Glynn
  • Gwinnett
  • Hall
  • Henry
  • Fulton
  • Lowndes
  • Muscogee
  • Whitfield

How to run

  • Requires Python 3.6.3

aclu-bail-reform's People

Contributors

ableonard, jttew, kevinglover, lahoffm, motorizedwandoffury, rimjieun, tactical-foresight, zandergordan


aclu-bail-reform's Issues

Bibb county webscraper

Make webscraper that spits out a CSV.

Please make a subfolder under src/webscraper that is the county name and contains all your code (Python 3.6) so no merge conflicts.

http://bibbsheriff.us/isearch/
There is a “View today’s arrests” button (but it shows “no records found”; I wonder when they update it?), and there is no daily inmate roster or release info.

Henry county webscraper

Make webscraper that spits out a CSV.

Please make a subfolder under src/webscraper that is the county name and contains all your code (Python 3.6) so no merge conflicts.

http://www.henrycountysheriff.net/InmateInformation
Race and charge: yes, but no release dates. Click "Search inmates" without entering anything in the text boxes to view all inmates.

Brainstorm what initial analyses we should do

What analyses should we do first, and why?

I'd recommend we start with a few low-hanging-fruit analyses that will tell a compelling story. Good candidates are:

  • easy to write code, given the webscraper CSV outputs (or, whatever Postgres database format we decide from #20)
  • results will be useful for ACLU based on project description. Either to help ACLU figure out what's going on in the jails, or to help ACLU lobby state legislators.

Put your suggestions below, for example:

  • bar graph of total misdemeanor & felony bookings in each county in a given month. Would help ACLU target bail reform efforts toward counties with greatest total misdemeanor bookings, because that's where the most people could stand to benefit.

We can make them into separate Issues soon.
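As a starting point for the bar-graph suggestion above, here is a minimal sketch of the counting step. It assumes the scraper CSVs use the `county_name`, `booking_timestamp`, and `severity` columns, that timestamps start with `YYYY-MM`, and that severity values are the words "felony" and "misdemeanor" separated by semicolons. The function name `count_bookings` and the rule "any felony charge makes it a felony booking" are illustrative assumptions, not project decisions.

```python
import csv
import io
from collections import Counter


def count_bookings(csv_file):
    """Count bookings per (county, month, severity) from one scraper CSV."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Assumption: timestamps look like 'YYYY-MM-DD ...', so the first
        # seven characters identify the month.
        month = row["booking_timestamp"][:7]
        severities = {s.strip().lower() for s in row["severity"].split(";")}
        # Assumption: a booking with any felony charge counts as a felony.
        if "felony" in severities:
            label = "felony"
        elif "misdemeanor" in severities:
            label = "misdemeanor"
        else:
            label = "unknown"
        counts[(row["county_name"], month, label)] += 1
    return counts


# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "county_name,booking_timestamp,severity\n"
    "Cobb,2017-11-02 10:15:00,misdemeanor\n"
    "Cobb,2017-11-03 08:00:00,felony;misdemeanor\n"
    "Bibb,2017-11-05 21:30:00,misdemeanor\n"
)
for key, n in sorted(count_bookings(sample).items()):
    print(key, n)
```

The resulting counts could then be fed to whatever plotting tool we settle on.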

De-identify data so we can post it publicly

De-identify the names of people in the CSV files, and have a key such that only one or two people have the names. Then we can post the data online for other people to analyze however they want.

One option is a Python script that copies the CSV files to a new folder, then replaces the identifying information in the fields inmate_firstname, inmate_lastname, inmate_middlename, and inmate_address with a randomly generated key assigned to each unique value. So "John Alex Smith, 100 Anyroad GA" becomes "xoihje dskds sdkfj, weofiu", and anyone else with the first name "John", in ANY CSV file, would also become "xoihje". The script would also write a key-value text file so we can translate the keys back if necessary. Alternatively, we could make one key for each unique combination of those four fields and substitute it in, or use something like Python's hashlib library.

Or it could be part of the ETL.
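A minimal sketch of the hashlib idea, for discussion. It uses a keyed hash (HMAC-SHA256 from the standard library) instead of random keys, so the same name maps to the same token in every CSV file as long as the same secret is used. The function names `pseudonym` and `deidentify`, the 12-character token length, and the secret value are all assumptions made for illustration.

```python
import csv
import hashlib
import hmac
import io

ID_FIELDS = ["inmate_firstname", "inmate_lastname",
             "inmate_middlename", "inmate_address"]


def pseudonym(value, secret):
    """Map a value to a stable token; the same input gives the same token."""
    if not value:
        return ""  # leave blank fields blank
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]  # shortened for readability (assumption)


def deidentify(in_file, out_file, secret, key_map):
    """Copy CSV rows, replacing identifying fields with tokens.

    key_map is filled with token -> original value so that the one or two
    people holding it can translate tokens back.
    """
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for field in ID_FIELDS:
            if field in row:
                token = pseudonym(row[field], secret)
                if token:
                    key_map[token] = row[field]
                row[field] = token
        writer.writerow(row)


# Made-up sample data, for illustration only.
src = io.StringIO(
    "inmate_firstname,inmate_lastname,charges\n"
    "John,Smith,theft\n"
    "John,Doe,trespass\n"
)
dst = io.StringIO()
keys = {}
deidentify(src, dst, b"keep-this-secret", keys)
print(dst.getvalue())
```

Both the secret and the key map would need to stay private, since either one can be used to link tokens back to names.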

Cobb county webscraper

Make webscraper that spits out a CSV.

Please make a subfolder under src/webscraper that is the county name and contains all your code (Python 3.6) so no merge conflicts.

http://inmate-search.cobbsheriff.org/enter_name.shtm
Through the “Admissions” link you can get booking data over 3-day intervals (going back 30 days) that includes race, charge, and release date (but only if the person was booked AND released during the last 30 days).
They have disclaimer that it’s not necessarily accurate.

Lowndes county webscraper

Make webscraper that spits out a CSV.

Please make a subfolder under src/webscraper that is the county name and contains all your code (Python 3.6) so no merge conflicts.

http://www.lowndessheriff.com/default.asp?P=current_inmates
Current roster with race and charge: yes. But this is the only county that does NOT have the arrest date (and it also doesn’t say whether people are in jail pre-trial or not).
Release dates: no, but maybe we could infer them from when people disappear from the roster (though that could mean they were sentenced, not released).
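The "disappears from the roster" idea could be sketched as below. This is only an illustration: it assumes each daily scrape yields a set of stable inmate identifiers (whether the Lowndes roster provides one is not confirmed), and the function name `last_seen_dates` is hypothetical.

```python
from datetime import date


def last_seen_dates(rosters):
    """Return {inmate: last date seen} for inmates missing from the latest roster.

    rosters maps a scrape date to the set of inmate identifiers seen that day.
    """
    latest = max(rosters)
    last_seen = {}
    for day in sorted(rosters):
        for inmate in rosters[day]:
            last_seen[inmate] = day  # later days overwrite earlier ones
    return {i: d for i, d in last_seen.items() if i not in rosters[latest]}


# Made-up identifiers, for illustration only.
rosters = {
    date(2017, 11, 1): {"A100", "B200"},
    date(2017, 11, 2): {"A100"},
}
print(last_seen_dates(rosters))
```

A last-seen date is at best an upper bound on time in the roster, and as noted above, leaving the roster does not necessarily mean release.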

Script to run scrapers daily

We need a script or online service to run our scrapers daily to make CSVs. If it's not an online job, it should be runnable on Windows because project leader @lahoffm has a Windows laptop (unless I can get a commitment from another team member to run it on their Linux/Mac machine daily for several months). Maybe Windows Task Scheduler or one of the "cron for Windows" tools? Once we finish #20 we can expand the script to run the "dump to Postgres database" step too.

The script should also:

  • print the start/stop time plus how much time each scraper took
  • offer an option for X retries if a scraper hits connection errors
  • print out whether any scraper failed to finish
  • run scrapers simultaneously in different threads (only if we find that each scraper takes a long time to finish)
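One possible shape for such a runner, as a sketch only. It assumes each scraper can be exposed as a Python callable that raises an `OSError` subclass (for example the built-in `ConnectionError` or `TimeoutError`, or urllib's `URLError`) on network trouble. The names `run_with_retries` and `run_all`, and the two stand-in scrapers, are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_retries(name, scraper, retries=3, wait_seconds=0):
    """Run one scraper, retrying on network errors; return (name, ok, seconds)."""
    start = time.monotonic()
    ok = False
    for attempt in range(1, retries + 1):
        try:
            scraper()
            ok = True
            break
        except OSError as err:  # covers ConnectionError, TimeoutError, URLError
            print(f"{name}: attempt {attempt}/{retries} failed: {err}")
            time.sleep(wait_seconds)
    return name, ok, time.monotonic() - start


def run_all(scrapers, retries=3, max_workers=4):
    """Run every scraper in its own thread; return the names that failed."""
    print("Started:", time.strftime("%Y-%m-%d %H:%M:%S"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_with_retries, name, func, retries)
                   for name, func in scrapers.items()]
        results = [f.result() for f in futures]
    for name, ok, seconds in results:
        print(f"{name}: {'ok' if ok else 'FAILED'} in {seconds:.1f}s")
    print("Finished:", time.strftime("%Y-%m-%d %H:%M:%S"))
    return [name for name, ok, _ in results if not ok]


# Stand-in scrapers, for illustration only.
def scrape_ok():
    pass


def scrape_down():
    raise ConnectionError("site unreachable")


failed = run_all({"cobb": scrape_ok, "bibb": scrape_down}, retries=2)
print("Failed scrapers:", failed)
```

On Windows, a script like this could be launched once a day by Task Scheduler. Setting `max_workers=1` would run the scrapers one at a time if threading turns out to be unnecessary.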

Richmond county webscraper

Make webscraper that spits out a CSV.

Please make a subfolder under src/webscraper that is the county name and contains all your code (Python 3.6) so no merge conflicts.

http://appweb2.augustaga.gov/InmateInquiry/AltInmatesOnline.aspx
Plain HTML data probably has the current roster (but I’m confused about what “sentenced/released/blank” means). It includes race and charge but not release date: they say “sentenced/released” but not when. There can be multiple charges.

Columbia county webscraper

Make webscraper that spits out a CSV.

Please make a subfolder under src/webscraper that is the county name and contains all your code (Python 3.6) so no merge conflicts.

http://p2c.columbiacountyso.org/jailinmates.aspx
They probably have the current roster, but I'm not sure. “Awaiting trial” or “sentenced” probably indicates whether someone is in the current roster, but “awaiting trial” might also mean released on bail, and the release date they specify could be the release date after sentencing.

Columbia/Dougherty/Whitfield combo webscraper

@imanioliver Replaces #5, #7, #16. All three counties show data in a similar format using the "P2C" system, so we can write a single program to scrape each one. Be sure to account for minor variations between counties, especially where on the main page the tables are placed. We still want three output CSV files. The webscraper folder should be called 'columbia-dougherty-whitfield'.

Athens-Clarke county webscraper

Make webscraper that spits out a CSV.

Please make a subfolder under src/webscraper that is the county name and contains all your code (Python 3.6) so no merge conflicts.

enigma.athensclarkecounty.com/photo/jailcurrent.asp
https://www.athensclarkecounty.com/1299/Current-List-of-Inmates-Clarke-County-Ja
Inmate roster and arrests in last 7 days in plain HTML but they warn it’s “unofficial” and I should submit an open records request.
Data includes race and charge but not release date.

Postgres database table format

Start thinking about how to set up database tables, based on the info to be collected in webscraper CSVs. Can discuss this on Slack too. https://codeforatlanta.slack.com/ #aclu-bail-reform

You could put suggested table specifications in a README or other document or just write your general ideas.

Fulton county webscraper

Make webscraper that spits out a CSV.

Please make a subfolder under src/webscraper that is the county name and contains all your code (Python 3.6) so no merge conflicts.

http://justice.fultoncountyga.gov/PAJailManager/JailingSearch.aspx?ID=400
Inmate name search only.

https://performance.fultoncountyga.gov/stories/s/Jail-Report/ts5s-mgr9/
Jail report already shows how many misdemeanors/felonies plus length of time in jail. But not race. Maybe dataset owner could provide this if we emailed?

Write ETL code to transfer CSV files to database

We have a bunch of webscraped CSV files in a standardized format. Before we can make cool visualizations for ACLU, we need to put data into a database. The database will make it easier to query whatever data we need to make plots.

Here is the database specification. If you have ideas for improvement, please comment.

This is a great Issue to practice SQL programming! If you don't know much SQL, you can learn as you go. Also a great Issue for text processing.

We will start with SQLite (despite early thoughts of using Postgres) so you can get started writing code without installing a bunch of software.

If you want to help, post below or hit me up on our Slack channel. We'll discuss specific tasks you can do, depending on what we already finished.
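To show how little setup SQLite needs, here is a minimal loading sketch using only the standard library. It does not follow the project's database specification: the table name `bookings`, the all-TEXT columns taken straight from the CSV header, and the function name `load_csv` are placeholder assumptions.

```python
import csv
import io
import sqlite3


def load_csv(conn, csv_file, table="bookings"):
    """Load one scraper CSV into a SQLite table; return rows inserted."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Placeholder schema: one TEXT column per CSV header name.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # Rows with the wrong number of columns are skipped in this sketch.
    rows = [row for row in reader if len(row) == len(header)]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Made-up sample data and an in-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
sample = io.StringIO(
    "county_name,inmate_race,severity\n"
    "Cobb,white,misdemeanor\n"
    "Bibb,black,felony\n"
)
print(load_csv(conn, sample), "rows loaded")
query = "SELECT county_name, COUNT(*) FROM bookings GROUP BY county_name"
for county, n in conn.execute(query):
    print(county, n)
```

A real ETL step would follow the agreed specification, convert types (ages, timestamps), and report skipped rows instead of dropping them silently.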

Glynn county webscraper

Make webscraper that spits out a CSV.

Please make a subfolder under src/webscraper that is the county name and contains all your code (Python 3.6) so no merge conflicts.

http://www.glynncountysheriff.org/
Daily PDF of jail population report, includes # of days in jail and race/charge. But no release dates.

Write unittest for webscraper CSVs

Write python script to check webscraper output is in standard CSV format as described in CONTRIBUTING.md. Should go into the "test" folder.

Things to test

  • column header names are identical to, and in the same order as, the specification
  • commas within fields are escaped appropriately as described in the specification, and the number of columns in each row matches the number of column headers
  • all characters in the CSV are characters we can type on a keyboard. This is a partial check for encoding errors. It is also important when we use the SQLite database, because SQLite's built-in case-insensitive string comparisons only handle ASCII characters by default. So it's best if all strings in the CSV are verified ASCII.
  • test each field to make sure the standard (or semi-standard) format is obeyed and we can parse it as expected. Examples: inmate_age is an int; url is a valid URL; there are no semicolons except in fields that are allowed to have multiple entries (a stray semicolon indicates semicolons weren't converted to colons); charges/severity/current_status have the same number of semicolons (unless a field is blank)
    • county_name
    • timestamp
    • url
    • inmate_id
    • inmate_lastname
    • inmate_firstname
    • inmate_middlename
    • inmate_sex
    • inmate_race
    • inmate_age
    • inmate_dob
    • inmate_address
    • booking_timestamp
    • release_timestamp
    • processing_numbers
    • agency
    • facility
    • charges
    • severity
    • bond_amount
    • current_status
    • court_dates
    • days_jailed
    • other
    • notes
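A few of the checks above could be sketched as follows. The header order is copied from the field list in this issue and is an assumption; CONTRIBUTING.md remains the authoritative specification. The function name `check_csv` is hypothetical, and only the header, column count, ASCII, inmate_age, and semicolon-count checks are shown.

```python
import csv
import io

# Assumption: column order taken from the field list in this issue.
EXPECTED_HEADER = [
    "county_name", "timestamp", "url", "inmate_id", "inmate_lastname",
    "inmate_firstname", "inmate_middlename", "inmate_sex", "inmate_race",
    "inmate_age", "inmate_dob", "inmate_address", "booking_timestamp",
    "release_timestamp", "processing_numbers", "agency", "facility",
    "charges", "severity", "bond_amount", "current_status", "court_dates",
    "days_jailed", "other", "notes",
]
MULTI_FIELDS = ["charges", "severity", "current_status"]


def check_csv(csv_file):
    """Return a list of problems found in one scraper CSV (empty if none)."""
    problems = []
    reader = csv.reader(csv_file)
    header = next(reader, [])
    if header != EXPECTED_HEADER:
        return ["header does not match the specification"]
    for row_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(f"row {row_no}: expected {len(header)} columns, "
                            f"got {len(row)}")
            continue
        record = dict(zip(header, row))
        # Partial encoding check: every character must be plain ASCII.
        if any(ord(ch) > 127 for field in row for ch in field):
            problems.append(f"row {row_no}: non-ASCII character found")
        age = record["inmate_age"]
        if age and not age.isdigit():
            problems.append(f"row {row_no}: inmate_age is not an int")
        # Non-blank multi-entry fields must have matching semicolon counts.
        counts = {record[f].count(";") for f in MULTI_FIELDS if record[f]}
        if len(counts) > 1:
            problems.append(f"row {row_no}: semicolon counts differ in "
                            f"{'/'.join(MULTI_FIELDS)}")
    return problems


# A header-only file passes; a file with the wrong header does not.
print(check_csv(io.StringIO(",".join(EXPECTED_HEADER) + "\n")))
print(check_csv(io.StringIO("name,age\n")))
```

Each scraper's output file could be passed through a function like this from a unittest test case in the "test" folder.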
