cstaff18 / pipeline-data-exploration

Exploratory Analysis of Pipeline Accidents

Pipeline-Data-Exploration

Data

The dataset, provided by the Department of Transportation's Pipeline and Hazardous Materials Safety Administration via Kaggle.com, contains records of pipeline accidents, including the date, time and location of each accident and other fields such as cause of accident, injuries, oil/gas spilled, etc.

Location Mapping of Pipeline Accidents

Using Python's folium library, I created a heatmap of all recorded accidents. You can view the interactive maps along with the source code through NBviewer with the links below.
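A minimal sketch of how such a heatmap could be built, not the notebook's actual code. The column names `Accident Latitude` and `Accident Longitude`, the map center, and the output file name are assumptions made for illustration; `to_heat_points` and `build_heatmap` are hypothetical helpers.

```python
def to_heat_points(rows, lat_key="Accident Latitude", lon_key="Accident Longitude"):
    """Convert record dicts to [lat, lon] pairs.

    Rows with missing or non-numeric coordinates are skipped.
    The default column names are assumptions about the dataset.
    """
    points = []
    for row in rows:
        try:
            points.append([float(row[lat_key]), float(row[lon_key])])
        except (KeyError, TypeError, ValueError):
            continue
    return points


def build_heatmap(points, out_path="accident_heatmap.html"):
    """Render the points as a folium heatmap and save it as an HTML file."""
    # Imported here so the data preparation above works without folium installed.
    import folium
    from folium.plugins import HeatMap

    # Assumed starting view: roughly the center of the contiguous US.
    accident_map = folium.Map(location=[39.8, -98.6], zoom_start=4)
    HeatMap(points).add_to(accident_map)
    accident_map.save(out_path)
    return accident_map


# Example usage with rows read via csv.DictReader (file name is hypothetical):
#   with open("accidents.csv", newline="") as f:
#       build_heatmap(to_heat_points(csv.DictReader(f)))
```

Keeping the coordinate cleanup separate from the rendering makes it easy to reuse the same points for a per-month map by filtering the rows first.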

All Accidents from Jan 2010 through Jan 2017

Monthly Accidents from Jan 2010 to Jan 2017

These maps show the most common locations for spills, such as Houston, New York and Oklahoma, and also how those locations change over time.

Time Analysis of Pipeline Accidents

It appeared that accidents were becoming more frequent over time, so I wanted to dig deeper and figure out what was going on. The chart below shows the yearly breakdown of accident counts.

Yearly Accident Count

Accidents were certainly increasing, but I wanted to know if the increase indicated a significant change in accident rate. I performed a t-test to check whether the accident rate in 2016 was statistically different from the rate in 2010, at the 95% confidence level (a significance level of 0.05).

Results:

T-Statistic    P-Value    Significant Change?
-1.085         0.030      Yes
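For readers who want to see the shape of such a comparison, here is a sketch of a two-sample t-statistic using only the standard library. The source does not say which samples or which t-test variant were used, so Welch's unequal-variance form, the 2010-minus-2016 ordering, and the monthly counts below are all assumptions; the numbers are invented and do not reproduce the table above.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Invented monthly accident counts, for illustration only.
counts_2010 = [25, 30, 28, 27, 31, 26, 29, 33, 24, 28, 30, 27]
counts_2016 = [32, 35, 30, 36, 34, 29, 37, 33, 31, 38, 35, 34]

t_stat, dof = welch_t(counts_2010, counts_2016)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

The two-sided p-value is then read from a t distribution with `df` degrees of freedom; with SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the statistic and p-value directly. The change is called significant when the p-value is below 0.05.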

Switch Point Analysis

Let's take a look at the daily accident count over the whole range of our data set.

Daily Accident Count

Now let's assume that the increase in accident rate happened on a single day. With the PyMC3 library we can use a Bayesian switch point analysis to estimate the accident rates (lambda) before and after the switch point, as well as the day on which the switch occurred (tau).
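To make the model concrete, the sketch below finds a single switch point by maximum likelihood instead of PyMC3 sampling: for every candidate tau it fits one Poisson rate to the days before and one to the days from tau onward, and keeps the tau with the highest likelihood. This is a deterministic stand-in for the Bayesian analysis, not the notebook's code, and it yields point estimates only, with no posterior distributions. The function names and the toy series are hypothetical.

```python
import math


def segment_loglik(total, n_days):
    """Poisson log-likelihood of a segment at its best-fit rate.

    The log(k!) terms are dropped because they are identical for
    every candidate switch point.
    """
    if total == 0:
        return 0.0  # best-fit rate is 0, which explains all-zero counts exactly
    rate = total / n_days
    return total * math.log(rate) - n_days * rate


def best_switch_point(counts):
    """Return (tau, lambda_1, lambda_2) maximizing the two-rate likelihood.

    Days with index < tau use lambda_1; days with index >= tau use lambda_2.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two days of counts")
    grand_total = sum(counts)
    left_total = 0
    best = None
    for tau in range(1, n):
        left_total += counts[tau - 1]
        right_total = grand_total - left_total
        score = segment_loglik(left_total, tau) + segment_loglik(right_total, n - tau)
        if best is None or score > best[0]:
            best = (score, tau, left_total / tau, right_total / (n - tau))
    return best[1:]


# Toy series: 50 days at 1 accident/day, then 50 days at 3 accidents/day.
toy_counts = [1] * 50 + [3] * 50
print(best_switch_point(toy_counts))  # (50, 1.0, 3.0)
```

The Bayesian version places priors on lambda and tau and returns a full posterior for each, which is what reveals more than one cluster of plausible switch dates.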

Single Switch Point

The results indicate that there are two distinct accident rates, around 0.95 accidents per day and 1.2 accidents per day. The switch point falls near the end of 2012, with the most likely day, day 1092, corresponding to December 27th, 2012. There is another cluster of likely switch dates around the end of 2013, which could indicate that there are two switch dates. Let's see how our model reacts if we add another lambda and tau term to model two step changes in the frequency of accidents.
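As a quick check of the day-to-date conversion quoted above, the following sketch maps a day number back to a calendar date. It assumes the series starts on January 1st, 2010 and that days are counted from 1; both are assumptions, and `day_to_date` is a hypothetical helper.

```python
from datetime import date, timedelta

START = date(2010, 1, 1)  # assumed first day of the daily series


def day_to_date(day_number, start=START):
    """Map a 1-indexed day number to a calendar date."""
    return start + timedelta(days=day_number - 1)


print(day_to_date(1092))  # 2012-12-27
```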

Two Switch Point

Acknowledgements

Kaggle and DOT for data.

Cam Davidson-Pilon for the excellent reference on PyMC3 and switch point analysis.
