
Developed to power Conveyal's web-based interface for scenario planning and land-use/transport accessibility analysis, R5 is our routing engine for multimodal (transit/bike/walk/car) networks, with a particular focus on public transit.

Home Page: https://conveyal.com/learn

License: MIT License

Java 97.37% Python 0.15% HTML 0.20% CSS 0.17% JavaScript 2.07% Dockerfile 0.02% Shell 0.03%
transportation transit planning gtfs modeling accessibility


Conveyal R5 Routing Engine

R5: Rapid Realistic Routing on Real-world and Reimagined networks

R5 is the routing engine for Conveyal, a web-based system that allows users to create transportation scenarios and evaluate them in terms of cumulative opportunities accessibility indicators. See the Conveyal user manual for more information.

We refer to the routing method as "realistic" because it works by planning door-to-door trips at many different departure times in a time window, which better reflects how people use transportation systems than planning a single trip at an exact departure time. R5 handles both scheduled public transit and headway-based lines, using novel methods to characterize variation and uncertainty in travel times. It is designed for one-to-many and many-to-many travel-time calculations used in access indicators, offering substantially better performance than repeated calls to older tools that provide one-to-one routing results. For a comparison with OpenTripPlanner, see this background.

We say "Real-world and Reimagined" networks because R5's networks are built from widely available open OSM and GTFS data describing baseline transportation systems, but R5 includes a system for applying light-weight patches to those networks for immediate, interactive scenario comparison.

Please note that the Conveyal team does not provide technical support for third-party deployments. R5 is a component of a specialized commercial system, and we align development efforts with our roadmap and the needs of subscribers to our hosted service. This service is designed to facilitate secure online collaboration, user-friendly data management and scenario editing through a web interface, and complex calculations performed hundreds of times faster using a compute cluster. These design goals may not align well with other use cases. This project is open source primarily to ensure transparency and reproducibility in public planning and decision making processes, and in hopes that it may help researchers, students, and potential collaborators to understand and build upon our methodology.

While the Conveyal team provides ongoing support and compatibility to subscribers, third-party projects using R5 as a library may not work with future releases. R5 does not currently expose a stable programming interface ("API" or "SDK"). As we release new features, previous functions and data types may change. The practical effect is that third-party wrappers or language bindings (e.g., for R or Python) may need to continue using an older release of R5 for feature compatibility (though not necessarily result compatibility, as the methods used in R5 are now relatively mature).

Methodology

For details on the core methods implemented in Conveyal Analysis and R5, see:

Citations

The Conveyal team is always eager to see cutting-edge uses of our software, so feel free to send us a copy of any thesis, report, or paper produced using this software. We also ask that any academic or research publications using this software cite the papers above, where relevant and appropriate.

Configuration

It is possible to run a Conveyal Analysis UI and backend locally (e.g. on your laptop), which should produce results identical to those from our hosted platform. However, the computations for more complex analyses may take quite a long time. Extension points in the source code allow the system to be tailored to cloud computing environments to enable faster parallel computation.

Running Locally

To get started, copy the template configuration (analysis.properties.tmp) to analysis.properties. To run locally, use the default values in the template configuration file. offline=true will create a local instance that avoids cloud-based storage, database, or authentication services. By default, analysis-backend will use the analysis database in a local MongoDB instance, so you'll also need to install and start a MongoDB instance.

Database configuration variables include:

  • database-uri: URI to your MongoDB cluster
  • database-name: name of the database to use in your MongoDB cluster
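Putting these together, a minimal local configuration might look like the following sketch (the values are illustrative; consult the template analysis.properties.tmp for the authoritative keys and defaults):

```properties
# Run locally, avoiding cloud-based storage, database, and authentication services
offline=true
# Local MongoDB instance and the database name to use in it
database-uri=mongodb://localhost:27017
database-name=analysis
```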

Building and running

Once you have configured analysis.properties and started MongoDB locally, you can build and run the analysis backend with gradle runBackend. If you have checked out a commit (such as a release tag) where you are sure all tests will pass, you can skip the tests with gradle -x test runBackend.

You can build a single self-contained JAR file containing all the dependencies with gradle shadowJar and start it with java -Xmx2g -cp build/libs/r5-vX.Y.Z-all.jar com.conveyal.analysis.BackendMain.

Once you have this backend running, follow the instructions to start the analysis-ui frontend. Once the UI is running, you should be able to log in without authentication (using the frontend URL, e.g. http://localhost:3000).

Creating a development environment

In order to do development on the frontend or backend, you'll need to set up a local development environment. We use IntelliJ IDEA. The free/community edition is sufficient for working on R5. Import R5 into IntelliJ as a new project from existing sources. You can then create a run configuration for com.conveyal.analysis.BackendMain, which is the main class. You will need to configure the JVM options and properties file mentioned above.

By default, IntelliJ will follow common Gradle practice and build R5 using the "Gradle wrapper" approach, in which operating-system-specific scripts download and install a specific version of Gradle in the project directory. We have encountered problems with this approach, where IntelliJ seems to have insufficient control over the build/run/debug cycle. IntelliJ has its own internal implementation of the Gradle build process, and in our experience this works quite smoothly and is better integrated with the debug cycle. To switch to this approach, in the Gradle section of the IntelliJ settings, choose "Build and run using IntelliJ IDEA" and "Run tests using IntelliJ IDEA". Below that, you may also want to choose "Use Gradle from specified location" to use your local system-wide copy.

Structured Commit Messages

We use structured commit messages to help generate changelogs.

The first line of these messages is in the following format: <type>(<scope>): <summary>

The (<scope>) is optional and is often a class name. The <summary> should be in the present tense. The type should be one of the following:

  • feat: A new feature from the user point of view, not a new feature for the build.
  • fix: A bug fix from the user point of view, not a fix to the build.
  • docs: Changes to the user documentation, or to code comments.
  • style: Formatting, semicolons, brackets, indentation, line breaks. No change to program logic.
  • refactor: Changes to code which do not change behavior, e.g. renaming a variable.
  • test: Adding tests, refactoring tests. No changes to user code.
  • build: Updating build process, scripts, etc. No changes to user code.
  • devops: Changes to code that only affect deployment, logging, etc. No changes to user code.
  • chore: Any other changes causing no changes to user code.

The body of the commit message (if any) should begin after one blank line.
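For example, a commit message following this format might look like the following (the scope and content are invented for illustration):

```
fix(TransitLayer): handle stops that are not linked to the street network

The body of the commit message begins here, after one blank line,
and can explain the motivation and approach in more detail.
```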

From 2018 to 2020, we used major/minor/patch release numbering as suggested by https://www.conventionalcommits.org. Starting in 2021, we switched to major/minor release numbering, incrementing the minor version with regular feature releases and the major version only when there are substantial changes to the cluster computing components of our system. Because there is no public API at this time, the conventional definition of breaking changes under semantic versioning does not apply.

r5's People

Contributors

abyrd, ansoncfit, bmander, bobkarreman, buma, csolem, evansiroky, hannesj, jkoelewijn, jordenverwer, kodiakhq[bot], kpwebb, landonreed, laurentg, mattwigway, michaz, semantic-release-bot, t2gran, trevorgerhardt


r5's Issues

Respect pattern bundles (common trunks) after frequency conversions

When we frequency-convert a common trunk, in our current system we lose the ability to represent the synchronization on that trunk (i.e. the intentional interleaving of vehicles serving different trip patterns to provide reliably lower headways on that common trunk).

If we group the trips by trip patterns (rather than whole routes possibly containing many different patterns) when performing the frequency conversion, our Monte Carlo simulation will capture this effect to some degree, in that it will generate some schedules with excellent interleaving of the common trunk, and others with very poor interleaving (bunching of vehicles on the common trunk due to simultaneous arrival of vehicles serving different patterns). The trick is knowing whether the existing system is consciously interleaved, or whether interleaving depends on chance, and whether the future system would be consciously interleaved. If we don't know this for sure, it's best to count on the Monte Carlo randomization of schedules, though that yields a wider variance in accessibility results. (Which in this case is more truthful, since we don't know what specific schedule, if any, will be used in practice.)

If we do know for a fact that the common trunk is consciously interleaved and will continue to be so under the future scenarios, it would be possible to group the relevant patterns into a bundle and randomize their schedules together when performing the Monte Carlo simulation, adding some constant offset to each constituent branch. However, we may not want to apply perfect interleaving, because existing real-world scheduled operation does not necessarily provide perfect interleaving. We'd be assuming the system will be run better than it is today, causing artificial improvements under the scenarios.

But say we want to replicate current imperfections in interleaving to minimize erroneous, artificial improvements due only to frequency conversion. Analyzing the existing bundle to derive those offsets is not always obvious (at which stop are they measured?) and when only some (not all) of the constituent frequencies are modified by a scenario, it's not obvious how to scale those empirical offsets. When the frequency of the entire bundle is modified in a scenario the adjustment process is more straightforward, so one solution is that when a trunk is identified, we only allow adjusting the combined frequency of the whole trunk in scenarios (all constituent patterns adjusted together).

If that kind of certainty is not available, then we should just say “no idea what the schedule will be” and use Monte Carlo approach applied independently at the trip pattern level, with uncertainty appearing as variance in the results.

So to clarify, I propose only two different options: A) the scenario identifies an interleaved bundle of trip patterns (which might just be a GTFS route), along with the specific stop at which their interleaving is to be measured (or specifies perfect interleaving at that specific stop) in which case frequency modifications are only applied to the whole bundle; or B) each trip pattern is frequency-converted independently, and Monte Carlo randomization takes care of the interleaving possibilities, in which case each pattern can have its frequency adjusted independently.
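The cost of option B can be quantified with a toy Monte Carlo sketch (not R5 code; the headway and two-pattern setup are invented for illustration): with two patterns on a common trunk, perfect interleaving halves the effective headway, while independent random offsets yield a longer average wait and wider variance.

```java
import java.util.Random;

// Average wait at a common-trunk stop served by two patterns, each with a
// 20-minute headway, under option A (perfect interleaving: offsets half a
// headway apart) versus option B (independent uniformly random offsets).
public class TrunkInterleaving {
    static final int HEADWAY = 20 * 60; // seconds

    // Average wait over all arrival seconds in one headway cycle, given the
    // two patterns' phase offsets within the cycle.
    static double averageWait(int offsetA, int offsetB) {
        long total = 0;
        for (int t = 0; t < HEADWAY; t++) {
            int waitA = Math.floorMod(offsetA - t, HEADWAY);
            int waitB = Math.floorMod(offsetB - t, HEADWAY);
            total += Math.min(waitA, waitB);
        }
        return total / (double) HEADWAY;
    }

    public static void main(String[] args) {
        // Option A: consciously interleaved, vehicles evenly spaced on the trunk.
        double interleaved = averageWait(0, HEADWAY / 2);
        // Option B: Monte Carlo over independent random offsets.
        Random rng = new Random(42);
        double sum = 0;
        int draws = 1000;
        for (int i = 0; i < draws; i++) {
            sum += averageWait(rng.nextInt(HEADWAY), rng.nextInt(HEADWAY));
        }
        System.out.printf("interleaved avg wait: %.0f s, randomized avg wait: %.0f s%n",
                interleaved, sum / draws);
    }
}
```

The interleaved case averages about 300 s (a quarter headway); independent random offsets average about a third of a headway, with individual draws ranging from near-perfect interleaving to full bunching, which is exactly the wider variance described above.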

Reverse optimization

We don't need it for Analyst, but for static site generation/modeify use we need to be able to reverse optimize trips. We can do this at the target stop level for now because the rest will be handled in the client.

Deal better with unlinked stops

Once a TransportNetwork is built, we convert it into Transitive format. The last stage is to write out the stops, but layer.streetVertexForStop.get(sidx) can return -1 if the stop is not linked to the street.
This leads to the following exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at gnu.trove.list.array.TIntArrayList.get(TIntArrayList.java:283)
    at com.conveyal.r5.streets.VertexStore$Vertex.getLat(VertexStore.java:81)
    at com.conveyal.r5.transitive.TransitiveNetwork.<init>(TransitiveNetwork.java:69)
    at com.conveyal.r5.publish.StaticServer.main(StaticServer.java:43)
    at com.conveyal.r5.R5Main.main(R5Main.java:29)

Filtering trips and patterns unexpectedly changes results

Given a search day, com.conveyal.r5.analyst.scenario.InactiveTripsFilter removes all trips and patterns that are not running on that day during the search time window. It should have no effect on the analysis results, but we see a small change in areas farther away from the origin point: travel time increases by 2-5 minutes when the filter is used.

This implies that the basic search is making use of some routes that the filter is removing. Since none of these routes should be active, this implies an oversight in the basic search process.

One consideration is that searches may continue up to two hours past the end of the departure time window, but that is taken into account when filtering the trips. Even when deactivating the time-window filtering and full-pattern filtering (filtering only individual trips based on their service ID), the discrepancy is still visible.

We do not need Joda time library

I looked into the code and saw that Joda's LocalDate is used only in JodaLocalDateSerializer/Deserializer; elsewhere it is just imported and not used. In OTP it made sense to use Joda, but since Java 8 has its own LocalDate class, I would like to know whether we specifically need Joda's LocalDate or whether java.time.LocalDate is enough. It would also mean one less library.
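For reference, java.time.LocalDate should cover the same use. A minimal sketch (the GTFS-style yyyyMMdd parsing shown here is illustrative, not code from R5):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Sketch of replacing Joda-Time's LocalDate with java.time.LocalDate (Java 8+).
public class LocalDateExample {
    // GTFS service dates are formatted as yyyyMMdd.
    static final DateTimeFormatter GTFS_DATE = DateTimeFormatter.ofPattern("yyyyMMdd");

    static LocalDate parseGtfsDate(String s) {
        return LocalDate.parse(s, GTFS_DATE);
    }

    public static void main(String[] args) {
        LocalDate d = parseGtfsDate("20240115");
        System.out.println(d);                // 2024-01-15
        System.out.println(d.getDayOfWeek()); // MONDAY
    }
}
```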

Park and Ride support

We need to detect park and rides in OSM data and connect them to nearby stations. Rather than spawning new "incomparable layers" of Dijkstra search as in OTP, we should probably just add explicit edges from the parking lots to nearby stops.

This is a priority feature for @chourmo for Marseille.

Paths include a lot of weird rides

image

(note that it rides first away from the destination then comes back, rather than just waiting a while and catching the next bus towards the destination).

After discussion with @abyrd we figured out that this is due to not keeping a separate state after each RAPTOR round. Suppose lines are arranged in the graph in order 1...n, and line 2 goes towards the destination and 1 away from it. The RAPTOR search will first encounter line 1 and ride it away from the destination. Then it will board line 2 (in the same round) and ride it towards the destination. If frequencies are low enough, it will catch the same trip it would otherwise have caught by boarding at the origin, so will not re-board at the origin, hence the crazy path (which is still optimal in the earliest-arrival sense). This is also why some paths have a huge tangle of routes, because the router is finding different ways to kill time somewhere along the route before catching the bus that will take you to the destination.

Single precision grid files

Currently we're using double-precision floats to store opportunity density. The files shipped over the network are zipped and reasonably small (a few hundred kB), but the grids can take up hundreds of MB when uncompressed. This is not excessive for a modern web app, but still not necessary and contributes to high resource consumption.

Single-precision floating point numbers are sufficiently accurate and would cut memory consumption in half. Single-precision can exactly represent all integers with six or fewer significant digits, and can store values up to the 10^38 range, which is probably sufficient to count the individual atoms in our analysis grid cells.
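The precision claim is easy to verify: a float carries a 24-bit significand, so every integer up to 2^24 = 16,777,216 is represented exactly, which covers all six-digit counts. A quick standalone check (run with java -ea to enable assertions):

```java
// Verifies that single-precision floats are exact for all integers up to 2^24,
// and start rounding immediately beyond that.
public class FloatPrecision {
    public static void main(String[] args) {
        int limit = 1 << 24; // 16_777_216
        assert (float) limit == limit;       // exactly representable
        assert (float) (limit + 1) == limit; // rounds back down: precision lost
        assert (float) 999_999 == 999_999;   // six significant digits are always exact
        System.out.println("float is exact up to " + limit);
    }
}
```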

Park and ride support

This is conceptually similar to #43 and I think it can be implemented in the same way: we first do a driving search, then a multi-origin walking search from the reached park-and-rides.

Optimizing saving of EnumSets (edge flags)

Currently flags are saved as an ArrayList of EnumSets. EnumSets are great for this, since all information is stored internally as a single long (basically a bitset) when there are fewer than 64 enum constants (as there currently are). The problem is that they aren't serialized that way: even default Java serialization of an EnumSet writes the enum type plus an array of enum values.

In FST (which we use for serialization), an EnumSet is saved as a list of the ordinal numbers of each enum, together with class metadata for each enum in the EnumSet.
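The bitset packing being proposed can be sketched as follows (the enum and method names are invented for illustration; this is not R5's actual serializer):

```java
import java.util.EnumSet;

// Sketch: pack an EnumSet whose enum has 32 or fewer constants into a single
// int bitmask (one bit per ordinal), and unpack it again.
public class FlagPacking {
    enum EdgeFlag { ALLOWS_PEDESTRIAN, ALLOWS_BIKE, ALLOWS_CAR, UNUSED }

    static int pack(EnumSet<EdgeFlag> flags) {
        int bits = 0;
        for (EdgeFlag f : flags) bits |= 1 << f.ordinal();
        return bits;
    }

    static EnumSet<EdgeFlag> unpack(int bits) {
        EnumSet<EdgeFlag> flags = EnumSet.noneOf(EdgeFlag.class);
        for (EdgeFlag f : EdgeFlag.values()) {
            if ((bits & (1 << f.ordinal())) != 0) flags.add(f);
        }
        return flags;
    }

    public static void main(String[] args) {
        EnumSet<EdgeFlag> flags = EnumSet.of(EdgeFlag.ALLOWS_PEDESTRIAN, EdgeFlag.ALLOWS_CAR);
        int bits = pack(flags); // ordinals 0 and 2 set: 0b101 = 5
        System.out.println(bits + " -> " + unpack(bits));
    }
}
```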

I compared this with saving the flags as bitsets in an int/long:

  • enumSetFST is the current way of saving the flags.
  • intFlags is the previous approach (as a Trove int list); longFlags (as a Trove long list) would be needed if we ever have more than 32 flags.
  • FST is fast-serialization; Ser is Java serialization.

It can be seen that the enum flags alone are 26% of the whole file size, since all the *.dat files except network.dat contain only the flags.

First I wrote a new FST serializer which saves an EnumSet as an int flag; the file size dropped from 8.6 MB to 5.1 MB. Then I wrote a custom ArrayList serializer which saves an ArrayList of EnumSets as a primitive int array, each int being the bitset of one EnumSet. This makes the file even smaller than the previous Trove int list (2.3 MB vs 2.8 MB). The problem is that it uses twice as much memory for reading and writing, since the flags must be held in memory twice: once as an ArrayList of EnumSets and once as an int array of bitsets.

What do you think? It probably needs some more testing and error handling. See the gist with the serializers.

Usage (you need one serializer or the other):

FSTConfiguration conf = FSTConfiguration.createDefaultConfiguration();
conf.registerSerializer(EnumSet.class, new MyEnumSetSerializer(), true);
conf.registerSerializer(ArrayList.class, new MyEnumSetListSerializer(), false);
FSTObjectOutput out = conf.getObjectOutput(outputStream);

My branch has tests: start it with com.conveyal.r5.R5Main point --test <path to graph directory> and it will load the saved graphs and save them in the different formats.

Sizes:

-rw-r--r-- 1 mabu users 8,6M nov 25 18:05 enumSetFST.dat
-rw-r--r-- 1 mabu users  23M nov 25 18:05 enumSetSer.dat
-rw-r--r-- 1 mabu users 2,8M nov 25 18:05 intFlagsFST.dat
-rw-r--r-- 1 mabu users 2,3M nov 25 18:05 intFlagsSer.dat
-rw-r--r-- 1 mabu users 4,1M nov 25 18:05 longFlagsFST.dat
-rw-r--r-- 1 mabu users 4,5M nov 25 18:05 longFlagsSer.dat
-rw-r--r-- 1 mabu users  32M nov 24 18:06 network.dat
-rw-r--r-- 1 mabu users  47M nov 25 11:01 network_ser.dat
-rw-r--r-- 1 mabu users  58M sep 14 07:31 slovenia-latest.osm.pbf

Custom FST enumSet serializer:

-rw-r--r-- 1 mabu users 5,1M nov 25 18:09 enumSetFST.dat

Custom FST ArrayList EnumSet serializer

-rw-r--r-- 1 mabu users 2,3M nov 25 18:10 enumSetFST.dat

How to handle different modes in ProfileRequest?

Currently profileRequest has:

  • accessModes
  • egressModes
  • directModes
  • transitModes

Response has:

  • modes in streetSegment
  • modes in streetEdgeInfo
  • modes for transit

Currently all those modes are the same enum type, Mode (WALK, BICYCLE, CAR, TRANSIT).

In GraphQL I changed this to more specific modes: access, egress, StreetSegment, and direct modes use LegModes (WALK, BICYCLE, CAR, BICYCLE_RENT, CAR_PARK); transitModes uses TransitModes (BUS, RAIL, ...); and StreetEdgeInfo uses NonTransitModes (WALK, BICYCLE, CAR).

Should I change ProfileRequest to support those different Enums or not?

Modeify-style profile routing

R5 needs to work for customer facing trip planning, including Modeify-style profile routing (probably a brute-force approach by exhaustive combinatorial banning of trips). This is probably the only way we want to do trip planning going forward.

Ride transit in circles to get past walk limit

Here is a case where a particular line has only a single stop in a particular area, and that stop is within walking distance of both the origin and the destination, however the destination is not within walking distance of the origin:

image

So r5 rides the 5A to the end of the line (Dulles Airport) then takes it back to the stop it started from so that it can egress from it.

At least it's comforting that the rules preventing egress from the board stop are working.

GraphQL API

We need a stable trip planning API in order to replace OTP with R5 in Modeify.

Separate frequency / schedule trips

We’ve currently got one type (TripSchedule) for both schedule trips and frequency trips. These two kinds of trips are all mixed together in a TripPattern. This means code must check whether it’s seeing a freq or schedule trip each time, and there are three null fields (headways, start and end times) in every scheduled trip object.

We could just make separate classes for ScheduleTrips and FrequencyTrips, and store them in separate lists.

@mattwigway comments: This seems more readable and compact. In fact it would probably make sense to only store one type of trip in each pattern. That gets rid of a lot more checks.

We have to think about whether there’s a problem potentially having two patterns with the same stop pattern, one freq and one sched. But it seems harmless. We’d already do it if they had different route ids.

Stop trees get built twice when running StaticMain

Building the stop trees takes long enough already, we don't need to do it twice:

23:23:19.870 [main] INFO  com.conveyal.r5.publish.StaticMain - Computing metadata
23:23:19.879 [main] INFO  c.c.r.a.WebMercatorGridPointSet - Creating web mercator pointset for transport network with extents Env[3.2442475 : 5.7251103, 47.5333508 : 50.2076948]
23:23:19.895 [Thread-1] WARN  c.a.services.s3.AmazonS3Client - No content length specified for stream data.  Stream contents will be buffered in memory and could result in out of memory errors.
23:23:19.955 [main] INFO  c.conveyal.r5.streets.LinkedPointSet - Linking pointset to street network...
23:23:19.955 [Thread-2] WARN  c.a.services.s3.AmazonS3Client - No content length specified for stream data.  Stream contents will be buffered in memory and could result in out of memory errors.
23:24:54.891 [main] INFO  c.conveyal.r5.streets.LinkedPointSet - Creating travel distance trees from each transit stop...
23:33:36.268 [main] INFO  c.conveyal.r5.streets.LinkedPointSet - Done creating travel distance trees.
23:33:36.269 [main] INFO  c.conveyal.r5.streets.LinkedPointSet - Done linking pointset to street network. 869155 features unlinked.
23:33:36.269 [main] INFO  c.conveyal.r5.streets.LinkedPointSet - Creating travel distance trees from each transit stop...

How to handle date/time in profile request?

Currently we have the date as a Joda LocalDate in an unknown timezone, and the time (fromTime and toTime) as seconds since midnight. We have two options for interpreting this:

  • As date and time in graph timezone
  • As date and time in UTC

This matters for routing, since some turn restrictions are based on time. If we have graphs covering only one timezone, the first option is OK; but if we have a graph on the border of timezones, the second option is probably better, since then all of the times will be converted to UTC.

Another question is whether we need to specify the time as a number of seconds since midnight. IMHO it would be nicer to specify it as a time (HH:MM:SS) plus a timezone, or even as a combined date-time.
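A sketch of the first option, interpreting the request's date and seconds-since-midnight in the network's timezone and converting to an absolute instant (the names here are illustrative, not R5's actual API; the example zone is invented):

```java
import java.time.*;

// Interpret a LocalDate plus seconds-since-midnight in the graph's timezone,
// producing an absolute Instant. Going through atStartOfDay(zone) and
// plusSeconds() keeps DST transition days correct.
public class RequestTime {
    static Instant toInstant(LocalDate date, int secondsSinceMidnight, ZoneId graphZone) {
        return date.atStartOfDay(graphZone).plusSeconds(secondsSinceMidnight).toInstant();
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2016, 6, 1);
        int fromTime = 7 * 3600; // 07:00:00
        Instant inGraphZone = toInstant(date, fromTime, ZoneId.of("Europe/Ljubljana"));
        Instant inUtc = toInstant(date, fromTime, ZoneOffset.UTC);
        // CEST is UTC+2 in June, so the two interpretations differ by two hours.
        System.out.println(Duration.between(inGraphZone, inUtc)); // PT2H
    }
}
```

The two-hour gap in the output is exactly the ambiguity described above: the same request means different absolute times under the two interpretations.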

Crazy tangles of routes in pure frequency network

We've gotten to a point where generating paths from scheduled networks works pretty well. One really cool side effect of the range-RAPTOR optimization is that transfer compression (what we formerly attempted to do with "reverse optimization") is no longer necessary, for the following reason. The proper way to do transfer compression is to run a forward search, find the earliest possible arrival time at the destination, then run a reverse search (not just on the same routes, but a bona fide reverse search) to find options that arrive at the destination at the same time. @abyrd says that you then need to run one more forward search, I suppose to address the case of someone who rides an infrequent feeder bus to a frequent rail line and then gets on another infrequent bus; the reverse search will have the user wait at the first transfer and take the last possible train to just catch the second feeder bus, but that's not what anyone would do (unless they were optimizing on something else, e.g. pleasantness of the waiting area; I used to catch BART in San Francisco, ride to San Francisco International Airport, wait there in the indoor station as long as possible, and then catch the last possible BART train that would connect to the commuter rail, but I digress).

In any case, range RAPTOR takes care of all of that for us, at least if you ignore edge effects. The end result of the complicated transfer compression procedure described above is that you run a forward search some minutes in the future from the original departure time. In range RAPTOR, we first run a search at the end of the time window, and step back by one minute at a time, using the same output tables. We've written the domination rules so that trips found at earlier departure minutes only dominate trips previously found at later departure minutes (recall that the algorithm is stepping backwards through departure minutes), so un-optimized trips would not dominate the trip that would be found were we to do transfer compression, which we have already found by running the search at the later departure time.

And, as LeVar Burton would say, don't take my word for it—it actually works. Here are some nicely compressed trips:

image

(I might have done a dance when I figured all of this out.)

The big caveat is the edge effect, which I said earlier to ignore. For trips at the end of the time window, we may not have run enough searches later in the window to produce an optimized trip. The simple solution is to run the search a few minutes past the end of the desired time window, then discard the results for those last few minutes. I believe the theoretical bound on how far you have to extend the search to guarantee optimal paths is (max headway) * (number of rounds - 1), as that is the most the reverse search could move the departure time. That is quite a large number; however, acceptable results can probably be attained with something like an additional hour at the end of the window. Keep in mind that even if the paths are non-optimal, the arrival time is still optimal (we just find trips that leave too soon but arrive at the same time).

This is all well and good when you have a scheduled network where range RAPTOR works. When you have a frequency-based network, range RAPTOR stops working, because we use a different Monte Carlo randomized schedule at each minute. The obvious and naïve solution is to just run a full multiple-hour departure window search for each Monte Carlo draw. This will likely be too slow because it's basically squaring the number of searches we currently do (one search per Monte Carlo draw, with range-RAPTOR disabled). @abyrd points out that we should never say something will or will not be too slow because who knows what properties the JVM and CPU prefetch logic will find to exploit, but I strongly suspect it will be slow.

Without the transfer compression, we get some pretty wild hyperpaths. The following images show the hyperpaths displayed on a map using Transitive, a Marey plot at the bottom which is basically useless with this many options, and a schematic plot at the left which shows riding lines in blue and transferring between them in red, with time increasing as you go down.

image

image
(something is wrong with the visualization here, the graph on the left shows 7 unique paths, while the callout on the Marey plot says there are only 4, see conveyal/browsochrones#15)

image

There are two obvious problems here. The first, which @abyrd and I hypothesized theoretically this morning, is that there are many possible transfer stops between two lines, and one or another may be optimal depending on the phasing of the lines (recall that we transfer greedily, so will transfer at the first place we can to get on the correct target vehicle; as the phasing changes the "first place we can" changes). This is most evident in the third image. This can be solved by (a) clustering stops and (b) filtering paths based on whether they use the same sequence of patterns, and picking a single representative path for each sequence of patterns.

The other problem, and the one that's more of a bear to solve, is what I'll term the "many-short-rides" problem; it's most evident in the second image. There may be many ways to ride just a few blocks to reach the line you want to take most of the way to the destination. Of course, maybe this is not a problem in the algorithm at all but rather a problem in the transit system - it may be the case that, given uncertain phasing, all of those short bus legs could be optimal. It might make sense to increase the walk distance to access transit, though; I suspect many short legs would be eliminated if we did that.

Walk all the way across the graph

I think that we have an off-by-one error mapping the pixels, and the far-right pixels are being seen as being on the far left:

image

The segment that goes from Dulles to Annapolis is a walk leg, so I think the pixel in Annapolis (which is probably the easternmost extent of the graph) is being seen as being on the west edge (which would be the 5A stop at Dulles). This could also be a bug in conveyal/browsochrones.

Frequency support is broken

After running a search on a frequency-based network I get paths that have 30+ components. Obviously there's something wrong here.

Multiple board stops

We frequently have many possible places to board a single line:
image

The reason for this is pretty clear. Suppose that you have a 20-minute walk limit to transit and can reach 3 stops on a given line:

image

Obviously stop 2 is the closest, but most of the time the router will board at stop 1 because it's earlier on the trip and thus is reached first, and usually it doesn't matter which stop you go to as you'll wait there a while anyhow and catch the same vehicle. However, occasionally walking to stop 1 means you just miss a vehicle, so the router will board the earlier vehicle at stop 2. So both paths are present in the output.

Depending on the differences in path lengths and the relative speed of the vehicles, it might occasionally make sense to walk to stop 3 so you can just catch a vehicle you would otherwise have missed at stop 2.

There's no easy heuristic solution; you don't always want to board at the first stop you reach, nor at the last (in fact, the situation here is common enough, with a vehicle running in a straight line, that frequently you will want to board somewhere in the middle).

I think the right solution is to have another array in RaptorState that has the walk distance/time. We don't need to add it to the Pareto front (that would make computation much slower), we just need to use it as a tiebreaker when optimizing on travel time.
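A minimal sketch of that tiebreaker idea, with hypothetical array and method names (the real RaptorState differs): walk time is stored alongside the best arrival time and consulted only when arrival times tie, so the Pareto front stays one-dimensional.

```java
// Hypothetical sketch: per-stop arrays as in RaptorState, with walk time
// used only to break ties between equal arrival times, not as a separate
// dimension of the Pareto front.
public class TieBreakSketch {
    int[] bestTimes;  // best arrival time at each stop, in seconds
    int[] walkTimes;  // walk (access/transfer) time of the path achieving bestTimes

    TieBreakSketch(int nStops) {
        bestTimes = new int[nStops];
        walkTimes = new int[nStops];
        java.util.Arrays.fill(bestTimes, Integer.MAX_VALUE);
        java.util.Arrays.fill(walkTimes, Integer.MAX_VALUE);
    }

    /** Accept a new path to a stop if it arrives earlier, or arrives at the
     *  same time with less walking. Returns true if the state was updated. */
    boolean update(int stop, int arrivalTime, int walkTime) {
        if (arrivalTime < bestTimes[stop]
                || (arrivalTime == bestTimes[stop] && walkTime < walkTimes[stop])) {
            bestTimes[stop] = arrivalTime;
            walkTimes[stop] = walkTime;
            return true;
        }
        return false;
    }
}
```

Because the walk time never rejects a strictly earlier arrival, the search cost stays the same as pure travel-time optimization.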

Paths with no components

We're getting some paths to stops in the static site output that have no components (i.e. the path does not board transit). This appears to happen when a stop is near the origin and is among the initial stops. However, we don't want to allow the planner to walk to a stop, not board transit, and then egress from that stop (that would just let the planner bypass the walk limit in a non-optimal way). This is probably a confusion between bestTimes and bestNonTransferTimes.

Find suboptimal paths

In order to replace the routing engine in Modeify, we need to be able to find suboptimal paths. The approach @abyrd and I had discussed was to find paths through repeated banning. The problem with this is that it becomes a combinatorial problem - suppose you have a path A -> B -> C, you then have to ban A, B, C, AB, AC, BC and then repeat. We have not conclusively shown that this is infeasible. I suspect though that we'll need a heuristic to choose what to ban.
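To illustrate the combinatorics, a small sketch (hypothetical, not Modeify code) that enumerates every candidate ban for one path: each non-empty subset of the path's legs, 2^n - 1 of them for n legs, before any repetition on the newly found paths.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of why repeated banning is combinatorial: every non-empty
// subset of a path's legs is a candidate ban, and each ban yields a new
// path whose legs must be banned in turn.
public class BanSubsets {
    static List<List<String>> banCandidates(List<String> legs) {
        List<List<String>> subsets = new ArrayList<>();
        int n = legs.size();
        // Each bitmask from 1 to 2^n - 1 selects one non-empty subset of legs.
        for (int mask = 1; mask < (1 << n); mask++) {
            List<String> subset = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) subset.add(legs.get(i));
            }
            subsets.add(subset);
        }
        return subsets;
    }
}
```

For the A -> B -> C example this yields the seven bans A, B, C, AB, AC, BC, ABC, and the count multiplies with every round of re-searching, which is why a heuristic for choosing bans seems necessary.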

Stop tree files are way too big

2.8 MB (zoom 10) in Indianapolis, 3 MB (zoom 9) in Champagne-Ardenne. That's much too large. We're already delta-coding and using minutes instead of seconds. A few additional possibilities:

  • Use zoom 9 instead of zoom 10. This might be fine if we use conveyal/jsolines for isochrones; otherwise isochrones will be too blocky. (Even at zoom 9, the files are still too big in Champagne-Ardenne.)
  • Use a filter/prediction algorithm, à la PNG. We're currently not exploiting the fractal nature of the stop trees: the stop tree for pixel (x, y) is very similar to the one for (x + 1, y). We might also be able to compress a lot better if we stored the reachable pixels from each stop instead of the other way around, and then inverted the mapping on the client. We already have to scan over the whole file anyhow, because we have to index it.

Frequency support

We need to support frequencies (including Monte Carlo simulation &c.) in r5.

Bike rental support

We need to support bike rental. There are two ways to do this:

  1. The way it was historically done in OTP, where there are incomparable states and we do the whole search at once. It complicates the search code somewhat as we have to store multiple states at each vertex.
  2. Instead of running the walking and biking searches at once, we could first run a walking search, then a multi-origin cycling search from all reached bikeshare stations, then a multi-origin walking search from the reached stations, using the same output tree as the original search.

I kind of prefer the second option (and not just because I thought of it) because it neatly removes the need for multiple states per vertex, which can be a debug nightmare.
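A toy sketch of option 2's composition step, with invented names and plain maps standing in for the real search outputs: the three single-mode results are chained by minimizing over (access walk + bike leg + egress walk), with no multi-state vertices anywhere.

```java
import java.util.Map;

// Hypothetical sketch of the staged bike-rental approach: three separate
// single-mode searches, combined afterward. Maps stand in for the trees
// the real searches would produce; all times are in seconds.
public class StagedBikeShare {
    static int bestTime(Map<Integer, Integer> walkToStation,
                        Map<Integer, Map<Integer, Integer>> bikeBetween,
                        Map<Integer, Integer> walkFromStation) {
        int best = Integer.MAX_VALUE;
        for (Map.Entry<Integer, Integer> access : walkToStation.entrySet()) {
            Map<Integer, Integer> rides = bikeBetween.get(access.getKey());
            if (rides == null) continue;
            for (Map.Entry<Integer, Integer> ride : rides.entrySet()) {
                Integer egress = walkFromStation.get(ride.getKey());
                if (egress == null) continue;
                // Chain access walk + bike leg + egress walk.
                best = Math.min(best, access.getValue() + ride.getValue() + egress);
            }
        }
        return best;
    }
}
```

Each stage is an ordinary single-state search; only this cheap merge step sees the combination, which is the debugging advantage of the staged approach.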

Do not delta-code grids

We've decided to uniformly use floating point numbers for our opportunity grids. The values of successive pixels are delta-coded, an approach inspired by PBF and PNG formats which pre-filter data to facilitate gzip's job. This approach is suited to integers but probably is not suited to floating point numbers, which have multiple equivalent representations. I'm also concerned that error will accumulate and the decoded value will not return to exactly zero in empty parts of the map.

On the other hand such pre-filtering doesn't seem to be necessary, since gzip is remarkably effective on the raw data, yielding files that are only a few hundred kB.

Empirical tests confirm that delta coding is not necessary: when I don't delta code, the gzipped grid files are smaller than when I do delta code. So we can get simpler and smaller both.
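A small demonstration of the floating-point concern, using made-up pixel values: delta-coding floats and prefix-summing them back does not round-trip exactly, and a cell that should decode to zero can come back nonzero.

```java
// Demonstration with hypothetical pixel values: float deltas lose precision
// when the magnitudes differ widely, so the decoded grid does not return
// to exactly zero in empty cells.
public class FloatDeltaDemo {
    static float[] encode(float[] values) {
        float[] deltas = new float[values.length];
        float prev = 0f;
        for (int i = 0; i < values.length; i++) {
            deltas[i] = values[i] - prev; // rounding happens here for floats
            prev = values[i];
        }
        return deltas;
    }

    static float[] decode(float[] deltas) {
        float[] values = new float[deltas.length];
        float acc = 0f;
        for (int i = 0; i < deltas.length; i++) {
            acc += deltas[i]; // rounding error accumulates here
            values[i] = acc;
        }
        return values;
    }
}
```

For values {1e8f, 1f, 0f}, the delta 1 - 1e8 rounds to -1e8 (float spacing near 1e8 is 8), so decoding yields {1e8, 0, -1}: the trailing "empty" cell comes back as -1 rather than 0.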

Store paths

We need a basic version of this for #4, since we have a need there to show the subnetwork that was used to reach a particular point. We also need this for #7. In theory this is simple: we just store a previous pattern/state pointer each time we board transit.

One really simple way to do this is to simply store for each transit stop (a) what pattern was used to get there and (b) what stop was the previous stop.

There's a pretty big caveat here. Suppose there is a faster one-transfer way to get to a stop and a slower one-seat ride, and from that stop you then board another vehicle that you could easily have reached by taking the slower route. Most people would take the one-seat ride. In some ways this just reflects the assumption of always greedily optimizing travel time. I'm not sure how much of a problem this actually is; there are probably relatively few places where a one-transfer ride is both faster than and not objectively better than the zero-transfer ride.
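A minimal sketch of the back-pointer idea with hypothetical names (the real state would likely be per round): each stop records which pattern was ridden to reach it and where that pattern was boarded, and a path is reconstructed by walking the pointers back from the destination.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: per-stop back-pointers recording (a) the pattern
// ridden to reach the stop and (b) the stop where it was boarded.
public class PathPointers {
    int[] inPattern; // pattern used to arrive at each stop, -1 if none
    int[] boardStop; // stop where that pattern was boarded, -1 if none

    PathPointers(int nStops) {
        inPattern = new int[nStops];
        boardStop = new int[nStops];
        java.util.Arrays.fill(inPattern, -1);
        java.util.Arrays.fill(boardStop, -1);
    }

    /** Reconstruct the legs (pattern, boardStop, alightStop) ending at a stop,
     *  in travel order, by following the back-pointers. */
    List<int[]> pathTo(int stop) {
        List<int[]> legs = new ArrayList<>();
        while (inPattern[stop] != -1) {
            legs.add(0, new int[] { inPattern[stop], boardStop[stop], stop });
            stop = boardStop[stop];
        }
        return legs;
    }
}
```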

Exact itineraries

We need to be able to produce exact itineraries for options. I think this would be best done by just interleaving the schedules after the fact, because it may be that a particular option is available every ten minutes but is only optimal every 30.

This requires bundling of patterns; for instance, we want a commute from Van Ness-UDC to Shaw-Howard in DC to show the schedule for the Red Line transferring to either the Green or Yellow Line.

Workers react very slowly, accumulating priority tasks

The first time a new worker receives some single-point analysis work it reacts very slowly. It seems to be accumulating requests from the UI (which is regularly retrying the analysis). Once the TransportNetwork and the linked point set are ready, the worker then repeats the work as many times as it has accumulated requests, stating each time that the task no longer exists. The transcript below has been edited to remove long-polling chatter and edge length errors for clarity.

11:59:09.525 [pool-1-thread-1] INFO  c.c.r5.transit.TransportNetwork - Writing transport network...
11:59:11.712 [pool-1-thread-1] INFO  c.c.r5.transit.TransportNetwork - Done writing.
11:59:15.346 [Thread-1] INFO  c.c.r5.analyst.cluster.AnalystWorker - Waiting tasks: high priority: 3, batch: 0
11:59:17.825 [main] INFO  c.c.r5.analyst.cluster.AnalystWorker - Waiting tasks: high priority: 4, batch: 0
11:59:34.097 [main] INFO  c.c.r5.analyst.cluster.AnalystWorker - Waiting tasks: high priority: 5, batch: 0
11:59:49.963 [main] INFO  c.c.r5.analyst.cluster.AnalystWorker - Waiting tasks: high priority: 6, batch: 0
12:00:05.920 [main] INFO  c.c.r5.analyst.cluster.AnalystWorker - Waiting tasks: high priority: 7, batch: 0
12:00:22.174 [main] INFO  c.c.r5.analyst.cluster.AnalystWorker - Waiting tasks: high priority: 8, batch: 0
12:00:54.104 [main] INFO  c.c.r5.analyst.cluster.AnalystWorker - Waiting tasks: high priority: 10, batch: 0
12:01:10.043 [main] INFO  c.c.r5.analyst.cluster.AnalystWorker - Waiting tasks: high priority: 11, batch: 0
12:01:26.739 [main] INFO  c.c.r5.analyst.cluster.AnalystWorker - Waiting tasks: high priority: 12, batch: 0
12:01:35.813 [main] INFO  c.c.r5.analyst.cluster.AnalystWorker - Waiting tasks: high priority: 13, batch: 0
12:01:39.946 [pool-1-thread-1] INFO  c.conveyal.r5.streets.LinkedPointSet - Linking pointset to street network...
12:01:40.077 [pool-1-thread-1] ERROR com.conveyal.r5.streets.Split - Length of first street segment was greater than the whole edge (9311 > 9299).
12:01:42.308 [main] INFO  c.c.r5.analyst.cluster.AnalystWorker - Waiting tasks: high priority: 14, batch: 0
12:01:45.471 [Thread-1] INFO  c.c.r5.analyst.cluster.AnalystWorker - Waiting tasks: high priority: 14, batch: 0
12:01:49.297 [pool-1-thread-1] INFO  c.conveyal.r5.streets.LinkedPointSet - Creating travel distance trees from each transit stop...
12:01:52.763 [pool-1-thread-1] INFO  c.conveyal.r5.streets.LinkedPointSet - Done creating travel distance trees.
12:01:52.763 [pool-1-thread-1] INFO  c.conveyal.r5.streets.LinkedPointSet - Done linking pointset to street network. 9842 features unlinked.
12:01:52.766 [pool-1-thread-1] INFO  c.c.r.p.RepeatedRaptorProfileRouter - Beginning repeated RAPTOR profile request.
12:01:52.785 [pool-1-thread-1] INFO  c.c.r.p.RepeatedRaptorProfileRouter - Found 430 transit stops near origin
12:01:52.788 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - minute 0
12:01:53.192 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - minute 15
12:01:53.315 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - minute 30
12:01:53.419 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - minute 45
12:01:53.526 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - minute 60
12:01:53.629 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - minute 75
12:01:53.738 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - minute 90
12:01:53.841 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - minute 105
12:01:53.931 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - calc time 1.146sec
12:01:53.931 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker -   propagation 0.07sec
12:01:53.931 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker -   raptor 1.076sec
12:01:53.931 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - 5 rounds
12:01:54.024 [pool-1-thread-1] INFO  c.c.r.p.RepeatedRaptorProfileRouter - Profile request finished in 1.257 seconds
12:01:54.390 [pool-1-thread-1] INFO  c.c.r5.analyst.cluster.AnalystWorker - Task 0 was not marked as completed because it doesn't exist.
12:01:54.390 [pool-1-thread-1] WARN  c.c.r5.analyst.cluster.AnalystWorker - Handling single point request via normal channel, side channel should open shortly.
12:01:54.390 [pool-1-thread-1] INFO  c.c.r5.analyst.cluster.AnalystWorker - Handling message com.conveyal.r5.analyst.cluster.AnalystClusterRequest@5ccb3a57
12:01:54.390 [pool-1-thread-1] INFO  c.c.r5.transit.TransportNetworkCache - Finding or building a TransportNetwork for ID fd6ab21a6ca3f48ccf3961e74467b358
12:01:54.390 [pool-1-thread-1] INFO  c.c.r5.transit.TransportNetworkCache - Network ID has not changed. Reusing the last one that was built.
12:01:54.390 [pool-1-thread-1] INFO  c.c.r5.analyst.cluster.AnalystWorker - Applying scenario...
12:01:54.390 [pool-1-thread-1] INFO  c.c.r5.analyst.cluster.AnalystWorker - Done aplying scenario.
12:01:54.390 [pool-1-thread-1] INFO  c.c.r.p.RepeatedRaptorProfileRouter - Beginning repeated RAPTOR profile request.
12:01:54.402 [pool-1-thread-1] INFO  c.c.r.p.RepeatedRaptorProfileRouter - Found 430 transit stops near origin
12:01:54.405 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - minute 0
12:01:54.549 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - minute 15
12:01:54.655 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - minute 30
12:01:54.762 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - minute 45
12:01:54.870 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - minute 60
12:01:54.973 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - minute 75
12:01:55.081 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - minute 90
12:01:55.183 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - minute 105
12:01:55.274 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - calc time 0.872sec
12:01:55.274 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker -   propagation 0.046sec
12:01:55.274 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker -   raptor 0.826sec
12:01:55.274 [pool-1-thread-1] INFO  com.conveyal.r5.profile.RaptorWorker - 5 rounds
12:01:55.354 [pool-1-thread-1] INFO  c.c.r.p.RepeatedRaptorProfileRouter - Profile request finished in 0.964 seconds

[RAPTOR search logs repeat 14 times]

12:04:32.442 [pool-1-thread-1] INFO  c.c.r5.analyst.cluster.AnalystWorker - Task 14 was not marked as completed because it doesn't exist.

Note that between saving the TransportNetwork at 11:59:11 and beginning to link the pointset to the network at 12:01:39 there is no sign of progress. It's not clear what the worker is doing for well over two minutes.

Do we need multiple BIKE_PATH flags?

Currently we have a BIKE_PATH flag for bike paths in R5, and I wrote a labeler which adds flags to edges.
But there are three types of bike paths:

  • Road with bike lanes
  • Road with bike tracks (basically a road with cycleways next to it, but tagged as one OSM way instead of three)
  • The bike track itself (a separate cycleway)

I currently add the flag to all three, so I'm wondering whether we should have three different flags for these cases.

Removing irrelevant trips causes changes in travel time

When working on #3 I made a filter that removes all trips whose service IDs are not active, as well as trips outside the search time window. Removing these trips, which should theoretically have zero effect on the search, caused small (1-10 minutes) but definite changes in travel time to areas far from the origin point.

U-turns at splitter vertices

We need to support turn restrictions. This seems simple enough but it gets a little bit complex when you realize that it would require multiple states per vertex. For example, consider the following situation:

image

The intersection in the middle has two incomparable states. One solution is to allow multiple states per vertex, but as outlined in #43 that is undesirable. There's another approach, which used to be in OTP: use an edge-based graph of the street network, where every street segment becomes a node and every turn possibility becomes an edge, as described in this paper (incidentally also the source of the above image). @abyrd remarks in no uncertain terms that he never wants to do this for the whole graph again. However, we don't have to do it for the whole graph, just for intersections where there are turn restrictions (of which there are only about 500,000 in the entire world). Those particular intersections would (topologically) look like this (of course all the vertices depicted below would be coincident):

image

Turn restrictions would then be encoded by changing permissions on the turn edges. Note that there is no vertex in the center, so the situation depicted above would not pass through the same vertex twice (it would traverse both straight-through edges as well as the approach vertices to each, but would not visit any vertex twice).

This approach is nice because it can be extended to more complex turn restrictions without huge messes of path dependence (à la path parsers). For example, consider the following (from the OSM wiki):

image

In this location, you cannot turn right (slight right) from the main freeway (b) if you have come from the left entrance (a) (because they don't want people cutting quickly across many lanes). This is very difficult to represent with the standard graph structure as it basically requires a state machine. However, it can be simply represented with the edge based structure, by treating the entire intersection as a single unit and just connecting up all the possible entrances and exits:

image

So the main roadway becomes two edges because there are two possible states on it, the state where you have come from the main roadway and that where you have come from the left entrance. I think the code would just take more complex restrictions, find all of the edges that enter and exit the restriction, and create a node for each, then connect up edges based on what is legal.

It gets messy, though, when we have to make a split in a complex intersection like this, with duplicated edges. (For the simple case it doesn't matter, because the edges are infinitely short; we can just omit them from the spatial index and they won't be split. But these more complex intersections may take up significant area.)
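A sketch of the expansion step described above, with invented types (not R5 code): every legal (entry edge, exit edge) pair at a restricted intersection becomes a turn edge, and a restriction is expressed simply by omitting that pair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the edge-based expansion: at a restricted
// intersection, create one turn edge per legal (entry, exit) pair; a turn
// restriction is encoded by leaving out (or forbidding) the pair.
public class TurnExpansion {
    record Turn(int fromEdge, int toEdge) {}

    static List<Turn> buildTurnEdges(int[] entryEdges, int[] exitEdges,
                                     Set<Turn> restricted) {
        List<Turn> turns = new ArrayList<>();
        for (int in : entryEdges) {
            for (int out : exitEdges) {
                Turn t = new Turn(in, out);
                if (!restricted.contains(t)) turns.add(t);
            }
        }
        return turns;
    }
}
```

Because there is no central vertex, a search can use both straight-through movements at the same intersection without ever visiting the same vertex twice.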

More logging on analyst worker

The first time you do an analysis via the analyst web UI, it reports "processing query" for a long time. Even when all components are running locally, there is no sign from any component that any long-running operation is in progress, or even that it has started. This leaves you wondering if it's actually doing anything or just stalled. We should log the beginning and end of any potentially slow one-off operation.

All non-walkable streets are removed when islands are removed

I was debugging missing motorways and found that, because removeDisconnectedSubgraph uses streetRouter, which defaults to WALK mode, non-walkable streets (mostly motorways) have been removed ever since permissions were added. When I build the graph without removing islands, the motorways are back.

When I previously added permission support in TN, the same thing happened. I added a switchMode parameter to the request, which allowed routing with any available mode (commit). If that would be OK, I can do the same here.

In OTP, if I understand the code correctly, edges that are supposed to be pruned are first checked for traversable permissions and are removed only if they can be neither walked nor cycled.

How to add pedestrian permissions on streets with sidewalks?

I'm improving permission support and I would like to know how we want to handle sidewalks.
For example, a street can have four types of sidewalk tags:

  • sidewalk:left
  • sidewalk:right
  • sidewalk:both
  • sidewalk:none

Should the street get pedestrian permissions in both directions in the first three cases, since pedestrian permissions usually don't obey oneways? Or should we add pedestrian permissions only in the direction the sidewalk is given (right for the forward edge, left for the backward edge, both for both)?
Bicycle permissions do obey direction (cycleway:left=lane, cycleway:right=track, etc.) because cycling also obeys oneways.
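A sketch of the directional interpretation, with invented names (not R5's actual labeler API): sidewalk:right grants pedestrian permission on the forward edge only, sidewalk:left on the backward edge only, and sidewalk:both on both.

```java
// Hypothetical sketch of the directional option for sidewalk tags.
// Names are illustrative, not R5's labeler API.
public class SidewalkPermissions {
    /** Returns {forwardAllowsPedestrian, backwardAllowsPedestrian}
     *  for an OSM sidewalk tag value. */
    static boolean[] pedestrianPermissions(String sidewalkTag) {
        switch (sidewalkTag) {
            case "both":  return new boolean[] { true, true };
            case "right": return new boolean[] { true, false };
            case "left":  return new boolean[] { false, true };
            default:      return new boolean[] { false, false }; // "none" or absent
        }
    }
}
```

Under the non-directional interpretation, the first three cases would instead all map to { true, true }.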

Per-edge Elevation Profiles

[As of late 2021 this is a development priority.]

[Original description from 2015]: This is a low priority, but we eventually want to import elevation data for use in routing and Modeify.
