
cityofaustin / atd-data-publishing


Python scripts for Austin Transportation's ETL tasks

Languages: Python 99.64%, Dockerfile 0.36%
Topics: python, docker, open-data, intelligent-transportation-systems, bcycle, transportation, mobility

atd-data-publishing's Introduction

transportation-data-publishing

This repo houses ETL scripts for Austin Transportation's data integration projects. They're written in Python.

Quick Start

  1. Clone this repository to your host: git clone https://github.com/cityofaustin/transportation-data-publishing

  2. Create your secrets.py and drop it into transportation-data-publishing/config, following the template in fake_secrets.py (see the sketch after this list)

  3. If setting up ESB integration, add certificates to transportation-data-publishing/config/esb

  4. Run scripts as needed, or deploy to a Docker host with transportation-data-deploy
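For orientation, secrets.py is just a Python module of credential values. The key names below are hypothetical placeholders; copy the real structure from fake_secrets.py:

```python
# config/secrets.py -- hypothetical sketch; mirror the actual structure in fake_secrets.py
SOCRATA_CREDENTIALS = {
    "app_token": "REPLACE_ME",
    "user": "REPLACE_ME",
    "password": "REPLACE_ME",
}

AGOL_CREDENTIALS = {
    "user": "REPLACE_ME",
    "password": "REPLACE_ME",
}
```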

About the Repo Structure

These scripts load B-Cycle trip data from an Austin B-Cycle Dropbox folder to data.austintexas.gov.
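As a rough illustration of that flow (the dropbox and sodapy libraries, folder path, and dataset ID below are assumptions, not necessarily what the scripts actually use):

```python
# Hypothetical sketch: download a trips CSV from Dropbox and upsert it to the open data portal.
import csv
import io

import dropbox
from sodapy import Socrata

dbx = dropbox.Dropbox("DROPBOX_ACCESS_TOKEN")
_, response = dbx.files_download("/bcycle/trips.csv")  # hypothetical folder path
rows = list(csv.DictReader(io.StringIO(response.content.decode("utf-8"))))

client = Socrata("data.austintexas.gov", "APP_TOKEN", username="USER", password="PASSWORD")
client.upsert("xxxx-xxxx", rows)  # hypothetical dataset ID
```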

The config directory holds configuration files needed for the various scripts. secrets.py belongs here -- see fake_secrets.py as a reference.

These scripts modify data in our Data Tracker application, and support its integration with other applications.

These scripts publish transportation data to data.austintexas.gov and the City's ArcGIS Online organization site.

These are the dedicated files for publishing traffic study data, as described in the wiki.

Contributing

Public contributions are welcome! Assign pull requests to @johnclary.

Deployment Pipeline

We currently use CircleCI to automatically rebuild the container image on every update to this repo and push that image to Docker Hub. When working in the production or master branches, keep these considerations in mind:

  • Changes to the production or master branches require a review from someone else.
  • The production branch is tagged as latest on Docker Hub, which makes it the default image and "source of truth".
  • Ideally, merge to master first; CircleCI will create a master tag on Docker Hub that can be used for additional testing or troubleshooting.
  • Any code merged to production should already be production-ready. Test thoroughly in master or any other branch before merging additional code.

Development

Feel free to create a new branch and commit/push as many times as you need. The pipeline will create a new Docker image (if one does not exist) or update the existing image on Docker Hub. Your branch name is used as the identifying tag on Docker Hub.

For example:

Say you create a branch named 123-atd-updatedcode. As soon as the branch is created, CircleCI will build the Docker image, tag it as atddocker/atd-data-publishing:123-atd-updatedcode, and upload it to Docker Hub. If an image with that tag already exists, it will simply be updated.

License

As a work of the City of Austin, this project is in the public domain within the United States.

Additionally, we waive copyright and related rights in the work worldwide through the CC0 1.0 Universal public domain dedication.

atd-data-publishing's People

Contributors

joeyl6, johnclary, mddilley, sergiogcx, tillyw


atd-data-publishing's Issues

Street Segment Updater is Failing to Match Known Segments

There are a number of segments in the Data Tracker that have not been updated by street_seg_updater.py. Hence, they are missing street name attributes from the AGOL layer:
2006688, 2010328, 2012287, 2012828, 2012833, 2012841, 2022803, 2023254, 2031897, 2037765, 2040196, 2041973, 2042707, 5280279, 5418521

I spot checked a few and they do exist in the AGOL layer, so there's something else going on here.
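For reference, that spot check can be scripted against the AGOL REST endpoint; the service URL and field names below are hypothetical stand-ins for the layer street_seg_updater.py actually queries:

```python
# Hypothetical spot check: does a given segment ID exist in the AGOL street segment layer?
import requests

LAYER_URL = "https://services.arcgis.com/ORG_ID/arcgis/rest/services/STREET_SEGMENTS/FeatureServer/0/query"

params = {
    "where": "SEGMENT_ID = 2006688",
    "outFields": "SEGMENT_ID,FULL_STREET_NAME",
    "f": "json",
}
features = requests.get(LAYER_URL, params=params).json().get("features", [])
print(f"matched {len(features)} feature(s)")  # >= 1 means the segment exists in the layer
```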

Inconsistent code structure under if __name__ == "__main__"

The function-calling and argument-processing code under if __name__ == "__main__" differs from one script to another. This block should only take care of calling existing functions and catching exceptions. As a result, it would be ideal for all scripts to follow the same structure.
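One possible convention, sketched below (the argument name is a hypothetical example):

```python
# A minimal, consistent pattern for the code under `if __name__ == "__main__"`:
# parse arguments, call the existing functions, and catch/log exceptions -- nothing else.
import argparse
import logging
import sys

def main(device_type):
    ...  # the script's real logic lives in functions defined above

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("device_type", help="hypothetical example argument")
    args = parser.parse_args()

    try:
        main(args.device_type)
    except Exception:
        logging.exception("script failed")
        sys.exit(1)
```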

ranking issues

  • clear old value from recent nulls
  • only rank 'completed'--?
  • verify in database that scores will reset to 0 if values are cleared from the evaluation

Add logging for Socrata upsert functions

Right now it looks like the UpsertData function fails silently in certain situations. In particular, there was one such case when Socrata returned the following error message for a manual upsert command for a single row:

{'message': 'Cannot update a synced dataset manually'}

but a script using UpsertData continued to run without giving any indication that the data wasn't being added to the data set.
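A minimal sketch of the kind of logging check that could wrap the upsert call; the response keys are assumptions based on the error above, and the repo's UpsertData helper may behave differently:

```python
# Hypothetical wrapper: log the Socrata upsert response and fail loudly on an error payload
# such as {'message': 'Cannot update a synced dataset manually'}.
import logging

def checked_upsert(upsert_func, resource_id, records):
    response = upsert_func(resource_id, records)
    if isinstance(response, dict) and "message" in response:
        logging.error("Socrata upsert to %s failed: %s", resource_id, response["message"])
        raise RuntimeError(response["message"])
    logging.info("Socrata upsert to %s response: %s", resource_id, response)
    return response
```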

Auto-Assign Asset Records to 311 SRs

When an SR includes a lat/lon and an asset type, automatically fetch and assign the nearest asset record. TMC users and on-call technicians will still need to review the asset attachment, but this should significantly reduce the time spent on this task.
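A minimal sketch of the matching step, assuming asset records carry hypothetical lat/lon keys:

```python
# Illustrative only: pick the asset record nearest to an SR's lat/lon using a haversine distance.
from math import asin, cos, radians, sin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3956 * 2 * asin(sqrt(a))

def nearest_asset(sr_lat, sr_lon, assets):
    # assets: iterable of dicts with hypothetical "lat" and "lon" keys
    return min(assets, key=lambda a: haversine_miles(sr_lat, sr_lon, a["lat"], a["lon"]))
```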

GRIDSMART Comm Status Checks Are Failing

We are currently failing to check comm status for GRIDSMART due to some apparent data quality issues. Reached out to Joey and Joshil:

I see a few issues. First, we have duplicate choices/typos in the Gridsmart Status menu:

Adv Replacement
Bad
Cage
Good
Needs Investigating
Needs Investigation
No
RMA
Repairs
Unknown
Unkonwn

Second, what do these statuses mean? And what do they mean in relation to the detector statuses:

OK
BROKEN
UNKNOWN
REMOVED

Lastly, we need every Gridsmart detector to have Detector Status populated. There are ~20 that do not have Detector Status. Please review and add a status, and/or let's discuss a new status if needed. I'm going to make it required now in the Data Tracker.

Also, we have 14 units with a status of OK that do not have port numbers. This is an issue because we cannot check comm status without a port number.
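As an interim cleanup while the menu gets fixed, the check could collapse the duplicate choices before evaluating status; the mapping below is a hypothetical illustration:

```python
# Hypothetical normalization of the duplicate/typo'd Gridsmart Status choices listed above.
STATUS_FIXES = {
    "Unkonwn": "Unknown",
    "Needs Investigating": "Needs Investigation",
}

def normalize_status(raw_status):
    return STATUS_FIXES.get(raw_status, raw_status)
```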

Migrated to atd-data-tech #1688

github caching = fail

github cdn seems to cache files for 5-10 minutes. what this means is that, although data may be published at very small intervals (e.g. 1 min) to the open data portal, only about 1 in 10 of these events will be logged on github, which means the 'last updated' date in our application will not be accurate if it is pointing to the github logs.

options:

  1. publish logs on socrata rather than github
  2. do a complete refresh of dataset on socrata instead of an upsert (see the sketch after this list)
  3. leave as is and understand that the 'last updated' date in our application is +/- 10 minutes
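For option 2, the difference is roughly replace vs. upsert on the Socrata client. This sketch assumes sodapy, which may not be what the scripts actually use:

```python
# Hypothetical comparison of option 2 (full refresh) vs. the current upsert behavior.
from sodapy import Socrata

records = [{"id": 1, "value": "example"}]  # placeholder payload

client = Socrata("data.austintexas.gov", "APP_TOKEN", username="USER", password="PASSWORD")
client.replace("xxxx-xxxx", records)  # full refresh: rewrites the whole dataset each run
client.upsert("xxxx-xxxx", records)   # current behavior: writes only new/changed rows
```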

socrata change detection issues

if a field in the source db is changed to Null, data_helpers.DetectChanges will not detect a change vis a vis the destination db!
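A minimal sketch of a field-level diff that does treat a change to None as a change; data_helpers.DetectChanges evidently compares differently, so this is illustrative only:

```python
# Hypothetical field-level diff: a value going from something to None still counts as a change.
def detect_changes(source_row, dest_row, fields):
    changed = {}
    for field in fields:
        src_val = source_row.get(field)   # may legitimately be None
        if src_val != dest_row.get(field):
            changed[field] = src_val
    return changed
```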

Socrata Pub Fails with "Illegal field name" Error

We recently started receiving an unfamiliar error message from the publisher API. It is a 400 error with message "Illegal field name sent: [field name]".

We occasionally POST records to the publisher endpoint whose payload may include fields that do not exist in the destination dataset. In the past these fields were ignored, but it seems there may be new validation rules in place.
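One possible guard, sketched under the assumption that the destination dataset's column names are available before the POST:

```python
# Hypothetical fix: drop payload keys that are not columns in the destination dataset
# so the publisher never sends an "Illegal field name".
def filter_to_schema(records, dataset_fieldnames):
    allowed = set(dataset_fieldnames)
    return [{k: v for k, v in row.items() if k in allowed} for row in records]
```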

Affected datasets:

B-Cycle Trip Data is Failing

B-Cycle Trip publisher is failing with a TypeError and has not been updated since November 2018.

After investigation, it appears that new columns were added to the November data, and furthermore there are no current trip records. Will reach out to B-Cycle about this.

rename branch?

transportation-data-pushers
transportation-data-publishers
transportation-data-publishing

backup data: needs a home

nightly backup is dumping to the KITS app server.

we can move the data:

  • to a network drive
  • to another access web app database (hmm)
  • github (hmm)
