GTFS Data format

In this section of the manual we discuss the GTFS (General Transit Feed Specification) format and how PLANit supports parsing it.

This data format only prescribes public transport services and lacks an underlying network. Therefore, the PLANit GTFS reader requires an existing PLANit network (with or without existing transit stops in the form of transfer zones) to overlay the GTFS services on.

Data Format | Type | Status    | Network | (Public Transport) Services | Zoning        | Demands | Python | Java
GTFS        | GTFS | Supported | Read    | Read                        | Read (No ODs) | N/A     | YES    | YES

GTFS and PLANit

Any existing PLANit network/zoning can be supplemented with a service network and routed services constructed from a GTFS file. This network may already have information on existing stop locations. In that case, the GTFS reader will attempt to match the GTFS stops to these existing stops if they are deemed compatible. Otherwise, it will create new stops (transfer zones).

Examples on how to supplement a PLANit network/zoning with GTFS services are provided in the respective PLANit-Python and PLANit-Java reference sections of this manual.

Outline

GTFS is an open format for which a detailed description is available on its own website:

We refer the interested reader to the above links for in-depth information on the data format itself. With respect to PLANit, this page details what our GTFS reader does and does not support with respect to the data available in a typical GTFS file in the outline below:

Mode support

GTFS files have standardised modes for services. When parsing a GTFS file this information is present as a route_type field in the routes.txt file. These modes are either in the basic format (route types 0-12) or in the extended format. When parsing a GTFS file the user configures which of the two is used. Also, the mapping from a GTFS mode to a predefined PLANit mode can be altered from the default mapping.

The mapping from GTFS to PLANit predefined mode is logged, so if unsure consider running with the default mapping first and then use the logs to manually override the mapping to your liking.
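To illustrate the idea of a default mapping with user overrides, the following is a minimal Python sketch and not PLANit's actual implementation. The route_type codes and their meanings follow the basic GTFS route types; the mode labels on the right-hand side, the function name map_route_modes, and the sample routes are assumptions made up for this example.

```python
import csv
import io
import logging

LOGGER = logging.getLogger(__name__)

# Basic GTFS route_type codes (meaning per the GTFS specification in comments).
# The mode labels are hypothetical placeholders, not PLANit's predefined names.
DEFAULT_MODE_MAPPING = {
    0: "tram",    # tram, streetcar, light rail
    1: "subway",  # subway, metro
    2: "train",   # rail
    3: "bus",     # bus
    4: "ferry",   # ferry
    11: "bus",    # trolleybus
    12: "train",  # monorail
    # 5 (cable tram), 6 (aerial lift), 7 (funicular) are left unmapped here
}


def map_route_modes(routes_txt, overrides=None):
    """Return {route_id: mode label or None} for the rows of a routes.txt string."""
    mapping = {**DEFAULT_MODE_MAPPING, **(overrides or {})}
    result = {}
    for row in csv.DictReader(io.StringIO(routes_txt)):
        route_type = int(row["route_type"])
        mode = mapping.get(route_type)  # None when the type is not mapped
        LOGGER.info("route %s: route_type %d -> %s", row["route_id"], route_type, mode)
        result[row["route_id"]] = mode
    return result


# Made-up sample content in the shape of a routes.txt file.
SAMPLE_ROUTES = (
    "route_id,route_short_name,route_type\n"
    "R1,86,0\n"
    "R2,901,3\n"
    "R3,Gondola,6\n"
)

default_modes = map_route_modes(SAMPLE_ROUTES)
custom_modes = map_route_modes(SAMPLE_ROUTES, overrides={0: "lightrail"})
```

Running with the defaults first and inspecting the logged mapping, then passing overrides for the entries that need changing, mirrors the workflow suggested above.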

Stop mapping

If the underlying PLANit network/zoning has no public transport stops yet (transfer zones), then the GTFS reader will create a new transfer zone (stop) for each GTFS stop available in the dataset. However, if there already exist stops it will first try to match these. It does so as follows:

  1. Consider existing stops nearby in a user configurable search radius
  2. Remove all possible matches that do not support the mode of the GTFS stop, i.e., avoiding matching a tram stop to a bus stop
  3. Only consider existing stops on the correct side of the road (only relevant for bus)
  4. Exclude stops that were manually excluded based on user settings
  5. Try to match to the most likely stop based on
    1. platform/pole name
    2. matching ideal ‘access link’, i.e., the road/rail segment the transit vehicle would stop on to service the platform/pole
    3. match on closeness and angle of their respective ideal ‘access link’ if 2) did not yield a match.
  6. If 5) did not yield an acceptable answer, we revert to creating a new transfer zone
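The filtering part of the steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the PLANit algorithm: it covers only the search radius (step 1), mode compatibility (step 2), manual exclusions (step 4), and name matching with a nearest-candidate fallback, and it omits the side-of-road and access link logic. The ExistingStop class, the match_stop function, the 50 metre default radius, and the sample coordinates are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExistingStop:
    """Hypothetical stand-in for an existing stop (transfer zone)."""
    stop_id: str
    name: str
    lat: float
    lon: float
    modes: frozenset


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def match_stop(name, lat, lon, mode, existing, radius_m=50.0, excluded=frozenset()):
    """Return the best matching existing stop, or None to signal 'create a new one'."""
    candidates = []
    for stop in existing:
        distance = haversine_m(lat, lon, stop.lat, stop.lon)
        if distance > radius_m:            # step 1: outside search radius
            continue
        if mode not in stop.modes:         # step 2: incompatible mode
            continue
        if stop.stop_id in excluded:       # step 4: manually excluded
            continue
        candidates.append((distance, stop))
    if not candidates:
        return None                        # step 6: no acceptable match
    # step 5 (simplified): prefer candidates with the same name, then the nearest
    same_name = [c for c in candidates if c[1].name.casefold() == name.casefold()]
    pool = same_name or candidates
    return min(pool, key=lambda pair: pair[0])[1]


# Made-up example data.
stops = [
    ExistingStop("tz1", "Stop 12", -37.8100, 144.9600, frozenset({"tram"})),
    ExistingStop("tz2", "Stop 12", -37.8101, 144.9601, frozenset({"bus"})),
    ExistingStop("tz3", "Stop 13", -37.8200, 144.9600, frozenset({"tram"})),
]
matched = match_stop("Stop 12", -37.81005, 144.96005, "tram", stops)
unmatched = match_stop("Stop 12", -37.81005, 144.96005, "tram", stops,
                       excluded=frozenset({"tz1"}))
```

In this example the tram stop tz1 is matched because it is nearby and supports the mode, while excluding it leaves no compatible candidate within the radius, which corresponds to creating a new transfer zone.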

Note that when a match is found that is deemed possibly wrong, this is logged and the user is encouraged to verify its correctness. Similarly, if possible matches were rejected but the algorithm could not reject them with much certainty, a message is logged prompting the user to verify that the rejection was valid.

GTFS misalignment

GTFS stops carry a location but lack any relation to a network. As a result, there is the possibility of misalignment. This is especially relevant for bus stops, which are expected to reside on a particular side of a road. If the GTFS stop location and/or the underlying network link location is off, a GTFS stop may get mapped to the wrong existing stop, cause a new PLANit stop to be created when it should have been matched, or get matched to the wrong network link.

The GTFS reader has been designed to identify such situations and make sure the user gets notified when in doubt for manual verification. This is done by:

  1. Identifying the closest link to each GTFS stop at all times
  2. Identifying the closest compatible link to each GTFS stop at all times
  3. Comparing the closest link and the closest compatible link, and logging when they differ as this might indicate misalignment
  4. Comparing the road types of the candidate links and notifying the user when the algorithm was forced to map to a minor road instead of a major road due to mode incompatibility
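The comparison between the closest link and the closest compatible link can be sketched as below. This is a minimal Python illustration assuming planar coordinates in metres and straight-line links; the Link class, the check_alignment function, and the sample geometry are hypothetical and do not reflect PLANit's internal data structures.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical straight link between two planar points (metres)."""
    link_id: str
    start: tuple
    end: tuple
    modes: frozenset


def point_to_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:                   # degenerate segment
        return math.hypot(px - ax, py - ay)
    # projection parameter, clamped so the foot point stays on the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def check_alignment(stop_xy, stop_mode, links):
    """Return (closest, closest_compatible, suspicious) for a stop location."""
    def distance(link):
        return point_to_segment_distance(stop_xy, link.start, link.end)

    closest = min(links, key=distance)
    compatible = [link for link in links if stop_mode in link.modes]
    if not compatible:
        return closest, None, True
    closest_compatible = min(compatible, key=distance)
    # differing answers hint at a misaligned stop or network
    return closest, closest_compatible, closest.link_id != closest_compatible.link_id


# Made-up example: a rail link and a parallel road 20 metres apart.
links = [
    Link("rail", (0.0, 0.0), (100.0, 0.0), frozenset({"train"})),
    Link("road", (0.0, 20.0), (100.0, 20.0), frozenset({"bus", "car"})),
]
near_rail = check_alignment((50.0, 5.0), "bus", links)
near_road = check_alignment((50.0, 18.0), "bus", links)
```

A bus stop located 5 metres from the rail link is still mapped to the road, but the third tuple element is True, so the case would be reported for manual verification.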

When a user finds that the automated solution is suboptimal, various options are available to correct for this during parsing. It is possible to override a GTFS stop location, to force a mapping to a particular PLANit link, and to either force a mapping to an existing PLANit stop or, conversely, force the creation of a new stop. Lastly, it is possible to force extensive logging on how a particular GTFS stop gets mapped to the network, or to log every created/matched GTFS stop in more detail.

Bounding Box

The GTFS parser will attempt to parse as much from the GTFS services and stops as possible. However, if stops fall outside the underlying network’s bounding box they cannot be parsed. In that case, we only ingest the GTFS stops that can be mapped to the network.

Services that run partly inside and partly outside the bounding box are split: the service is considered to terminate at the last stop within the bounding box, and a new transit service is created when it (re-)enters the network.
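The splitting behaviour can be illustrated with a short Python sketch. It assumes a trip is an ordered list of (stop id, x, y) tuples and that the bounding box is axis-aligned; the function name split_by_bounding_box and the sample trip are hypothetical, and how PLANit treats segments consisting of a single stop is not covered here.

```python
def split_by_bounding_box(stops, bbox):
    """Split an ordered stop sequence into runs of stops inside the bounding box.

    stops: list of (stop_id, x, y); bbox: (min_x, min_y, max_x, max_y).
    """
    min_x, min_y, max_x, max_y = bbox
    segments, current = [], []
    for stop_id, x, y in stops:
        if min_x <= x <= max_x and min_y <= y <= max_y:
            current.append(stop_id)
        elif current:
            # the service leaves the box: terminate the current segment
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


# Made-up trip that leaves the box at stop C and re-enters at stop D.
trip = [("A", 1.0, 1.0), ("B", 2.0, 2.0), ("C", 12.0, 2.0), ("D", 3.0, 3.0), ("E", 4.0, 4.0)]
segments = split_by_bounding_box(trip, (0.0, 0.0, 10.0, 10.0))
```

Here the original service yields two segments, one ending at B and a new one starting at D.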

Logging support and identifying/correcting tagging errors

As mentioned, the GTFS parser will log information, warnings, and other useful findings during parsing in a way that allows the user to sanity check and/or correct for potential errors in the GTFS itself, misalignment between GTFS and network, or other issues that were found. For the GTFS reader this logging is particularly extensive because of the need to map to an existing network that by definition is sourced from elsewhere. It is very likely that some manual overrides based on the initial logging are required for any GTFS parsing exercise.

The GTFS parser has a number of salvage algorithms built-in to attempt to address the most common situations. For each of the issues found PLANit will inform the user whether the GTFS entity has been:

  • Discarded, e.g., warning DISCARD: <message>
  • Salvaged, e.g., info SALVAGED: <message>
  • Affected by a possible bug in the parser or unexpected behaviour, e.g., severe <message>

A DISCARD message indicates the parser was unable to convert the GTFS entity into a valid PLANit entity even though it was expected to. A SALVAGED message indicates the GTFS entity could not be converted the way it was expected to, but the parser was able to derive a likely solution based on context. In both cases it is recommended to verify the result by checking the GTFS entities referred to in the DISCARD and SALVAGED messages.
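When a log is long, it can help to group the DISCARD and SALVAGED messages before verifying them one by one. The Python sketch below assumes only that each relevant line contains the literal marker DISCARD: or SALVAGED: followed by the message; the exact layout of PLANit log lines may differ, and the sample lines are invented placeholders rather than real parser output.

```python
import re
from collections import defaultdict

# Matches the marker and captures the message text that follows it.
MARKER = re.compile(r"\b(DISCARD|SALVAGED):\s*(.*)")


def triage(log_lines):
    """Group message texts by marker, ignoring lines without a marker."""
    found = defaultdict(list)
    for line in log_lines:
        match = MARKER.search(line)
        if match:
            found[match.group(1)].append(match.group(2).strip())
    return dict(found)


# Invented placeholder lines; the message bodies are not real PLANit output.
sample_log = [
    "info placeholder line without marker",
    "warning DISCARD: placeholder message A",
    "info SALVAGED: placeholder message B",
    "warning DISCARD: placeholder message C",
]
grouped = triage(sample_log)
```

The resulting dictionary lists the messages per marker, which gives a quick overview of how many entities need manual checking.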

You can often quickly look up a GTFS stop location based on its id in online trip planners. For example, in Melbourne (Victoria, Australia) one simply pastes the stop id into the departure field to find where the stop is located, see https://www.ptv.vic.gov.au/journey. If the underlying network comes from OSM, then the external id of a link can be pasted into the OSM website to find information on that road, e.g., https://www.openstreetmap.org/way/485842787 (using some way id).