
This project forked from datarttu/sujuikodb


Database backend for transit network & observation data.



Sujuiko database

Title picture: general idea.

The purpose of this tool is to enable analysis of past public transport service at a more detailed level than transit stops and, at the same time, at the level of the whole transit network. This is done by aggregating historical high-frequency positioning (HFP) data points projected onto a transit network consisting of links and nodes. Currently, this highly experimental tool is developed using HSL bus and tram data only, together with the related OpenStreetMap network and GTFS data. Development is based on an HFP raw data set from November 2019, but in the future the tool should support analyzing much longer periods of time.

By using the tool, one should be able to answer the following questions, for example:

  • A given transit line always seems to get delayed between stops A and B. What is happening along the network route between A and B? Where do the vehicles tend to stop, and for how long at each location?
  • What is the average speed, and its standard deviation, of transit vehicles that traversed a given link, or travelled from point A to point B along the same path on the network?
  • How many seconds, on average, do transit vehicles remain stopped at a given intersection?
  • How do these measures vary in time, e.g., between two different weeks, working days vs. weekends, or peak hours vs. off-peak?
  • What are the "worst" links on the network causing delays, in relation to a weight measure such as number of scheduled trips per link?

The general idea is that we do not store every single HFP observation as a point geometry; instead, we store a reference to the network link used by a set of successive observations, together with an array of relative time and location values for those observations along the link. As a result, we can inspect time-space profiles along sequences of links like these ...
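As a rough illustration, a table following this idea might look like the sketch below. Note that the table and column names here are hypothetical, chosen for this example only, and do not necessarily match the project's actual schema in db/ddl:

```sql
-- Hypothetical sketch: one row per (journey, link) traversal,
-- with arrays of relative offsets instead of one row per HFP point.
CREATE TABLE observed_link_traversal (
    journey_id   bigint      NOT NULL,  -- the HFP journey this traversal belongs to
    link_id      bigint      NOT NULL,  -- the network link being traversed
    enter_time   timestamptz NOT NULL,  -- when the vehicle entered the link
    -- Relative values for successive observations along the link:
    rel_seconds  real[]      NOT NULL,  -- seconds elapsed since enter_time
    rel_location real[]      NOT NULL,  -- 0..1 fraction of the link length
    PRIMARY KEY (journey_id, link_id, enter_time)
);
```

Storing the per-observation values as parallel arrays keeps the row count proportional to link traversals rather than raw GPS points, which is what makes link-level aggregation cheap later on.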

Example of a driving time profile

Example of a speed profile

... and, finally, aggregate that data by link, route or other common attributes.
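A link-level aggregation could then be a plain GROUP BY query. The sketch below assumes a hypothetical table observed_link_traversal(link_id, rel_seconds real[], ...) where the last element of rel_seconds is the total driving time through the link; the names are illustrative, not the project's actual schema:

```sql
-- Hypothetical aggregate: driving time statistics per link.
SELECT link_id,
       avg(rel_seconds[array_upper(rel_seconds, 1)])    AS avg_drive_time_s,
       stddev(rel_seconds[array_upper(rel_seconds, 1)]) AS sd_drive_time_s,
       count(*)                                         AS n_traversals
FROM observed_link_traversal
GROUP BY link_id
ORDER BY avg_drive_time_s DESC;
```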

To reliably project the HFP data onto the network links, we also need a reasonable schedule and trip path model. This enables finding a "planned entity" for each HFP journey by route, direction and start time. The schedule model in turn enables analyzing planned-vs-operated time metrics such as schedule adherence, in addition to metrics that come purely from the HFP and network data (such as average operating time through a set of links).
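Finding the "planned entity" for a journey could, in principle, be a join on those three attributes. The table and column names below are hypothetical, meant only to illustrate the matching logic:

```sql
-- Hypothetical match of observed HFP journeys to planned trips
-- by route, direction and scheduled start time.
SELECT j.journey_id,
       t.trip_id,
       j.actual_start - t.planned_start AS start_delay
FROM hfp_journey  j
JOIN planned_trip t
  ON  t.route_id      = j.route_id
  AND t.direction     = j.direction
  AND t.planned_start = j.planned_start;
```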

Requirements

The tool is being developed on an Ubuntu 18.04 LTS server with 2 TB of disk space, 8 GB RAM and 2 CPU cores. I have not tested anything on Windows.

You will need the following, either installed on the machine or by using Docker:

  • PostgreSQL 13. This is the core of the tool. The majority of the data transformation logic is written in PL/pgSQL.
  • PostGIS 3. Core of the geometries and spatial operations.
  • pgRouting 3. Core of the routable network model.
  • TimescaleDB 2. Supports partitioning and managing large amounts of the HFP data.
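Once these are installed, the extensions are enabled per database. The extension names below are the standard ones; exact versions and any server configuration (e.g. TimescaleDB's shared_preload_libraries setting) depend on your installation:

```sql
-- Run once in the target database:
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pgrouting;
CREATE EXTENSION IF NOT EXISTS timescaledb;
```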

Unfortunately, there is no automated deployment process yet.

Deployment

After cloning the git repo:

  • Create a .env file in the project root. See .example_env.
  • Create a data/ directory in the project root. Populate it with the raw data for the database.
  • ... TODO
  • Run docker-compose up; after this you should have a database instance up and running.
  • Connect to the database, e.g. with psql, from your machine, and run the DDL scripts from db/ddl.

Data model & data import and transformation

The tool uses a logic called "extract-load-transform": I try to avoid additional tools and libraries for data wrangling beforehand, and instead I have included as much of the application logic as possible inside the Postgres database. You can read more about this philosophy, e.g., in The Art of PostgreSQL by Dimitri Fontaine. This is obviously not the best possible approach, since PL/pgSQL has turned out to be powerful yet very inflexible in many ways, but I decided to give it a try and we are on that track now.

Data sources

  • HFP from Digitransit - this is real-time data but you can collect it yourself e.g. with this hfplogger tool (a bit messy). I am using a data dump from HSL.
  • HSL GTFS dump as it was on 1 November 2019. This provides us with the transit schedules as well as stop point locations.
  • OpenStreetMap subset: the current tool needs a dataset containing the ways in the Helsinki region that are used by any bus route in OSM (bus relations; far from perfect, but better than downloading all possible highways, 99 % of which could be, but are not actually, used by any bus route), as well as railway=tram ways (surprisingly good data, though some intersections have not been modeled properly).

Note that the current tool does not support incremental changes to the transit schedule (GTFS) and network (OSM) model. It assumes that these data remain static, which is a relatively realistic assumption when we analyze data from a single month only. For longer term analyses, it should of course be possible to import new GTFS data on top of older data, and to account for network changes as time goes by (e.g. building sites, new or moved stops, etc.).

The database model is described in more detail in the db README.

Author

Arttu Kosonen, @datarttu, HSL / Aalto University, 2019-2020. Developing this tool is essentially part of my master's thesis in Spatial Planning and Transportation Engineering.

