Open Street Map Data format (XML/PBF)
In this section of the manual we discuss the Open Street Map (OSM) XML and PBF formats. The OSM parser requires a local file (XML or PBF) as input. You can, for example, use Osmosis to extract such a file for a predetermined bounding box.
Data Format | Type | Status | Network | (Public Transport) Services | Zoning | Demands | Python | Java |
---|---|---|---|---|---|---|---|---|
Open Street Map | XML/PBF | Supported | Read | Read (platforms/poles only) | Read (No ODs) | N/A | YES | YES |
OSM (GTFS) and PLANit
Any OSM XML or PBF file can be converted into a PLANit memory model using the available readers. The resulting PLANit memory model can then be used to conduct an assignment (Java) or be persisted to another data format, e.g., PLANit XML, MATSim, etc. (Java, Python).
The user has the option to either extract just a network, or to supplement it with the public transport infrastructure (poles, platforms, stations) available in the OSM data. Since OSM data generally does not contain good-quality public transport service information, parsing services is not supported. Instead, we advise ingesting public transport services with a GTFS reader, which can be run after the OSM parser and is capable of supplementing the network and stops with routed pt services and the underlying service network.
Examples of how to perform a network conversion from OSM to other data formats (PLANit, MATSim, etc.) are provided in the respective PLANit-Python and PLANit-Java reference sections of this manual.
Outline
OSM provides a detailed description of its XML and PBF formats on its own Wiki:
We refer the interested reader to the above links for an in-depth discussion of the data formats themselves. With respect to PLANit, this page details what our OSM reader does and does not support regarding the entities that reside in an OSM file, following the outline below:
- OSM basics
- OSM road network support
- OSM rail network support
- OSM public transport support
- OSM mode support
- Bounding box edges
- Logging support
OSM Basics
OSM has only three types of entities, which together cover all of its categories, ranging from buildings to trees, seas, and roads:
OSM Nodes
These are point locations with additional tags that identify what the point represents. For example, a node can represent a tree, an intersection, a landmark, or a train station.
OSM Ways
These are either lines (open) or polygons (closed) with additional tags that identify what the way represents. For example, a way can represent a road, a building, a train platform, or a border.
OSM Relations
These are groups of OSM entities, where each member can be an OSM node, an OSM way, or another OSM relation. For example, a relation can represent a station with multiple platforms, or a group of buildings that belong together.
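To make the three entity types concrete, the following minimal Python sketch reads a small, hand-written OSM XML fragment with the standard library's xml.etree.ElementTree. It collects the nodes (with their tags), the ways (as ordered node references), and the relations (as typed, role-annotated members). It only illustrates the data model and is not how PLANit's reader is implemented. The function names are hypothetical, and PBF files, being a binary encoding, would need a dedicated library instead.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative OSM XML fragment (not a real extract).
OSM_XML = """<osm version="0.6">
  <node id="1" lat="51.0000" lon="4.0000"/>
  <node id="2" lat="51.0010" lon="4.0001">
    <tag k="highway" v="bus_stop"/>
  </node>
  <node id="3" lat="51.0020" lon="4.0000"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="3"/>
    <tag k="highway" v="residential"/>
  </way>
  <relation id="100">
    <member type="node" ref="2" role="platform"/>
    <member type="way" ref="10" role=""/>
    <tag k="public_transport" v="stop_area"/>
  </relation>
</osm>"""


def tags_of(element):
    """Collect the <tag k=... v=...> children of an OSM element into a dict."""
    return {t.get("k"): t.get("v") for t in element.findall("tag")}


def summarise(xml_text):
    """Return (nodes, ways, relations) dictionaries keyed by OSM id."""
    root = ET.fromstring(xml_text)
    nodes = {n.get("id"): tags_of(n) for n in root.findall("node")}
    # A way is an ordered list of node references (open line or closed polygon).
    ways = {w.get("id"): [nd.get("ref") for nd in w.findall("nd")]
            for w in root.findall("way")}
    # A relation groups members of any type, each with an optional role.
    relations = {r.get("id"): [(m.get("type"), m.get("ref"), m.get("role"))
                               for m in r.findall("member")]
                 for r in root.findall("relation")}
    return nodes, ways, relations


nodes, ways, relations = summarise(OSM_XML)
print(len(nodes), ways["10"], relations["100"][0])  # 3 ['1', '3'] ('node', '2', 'platform')
```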
OSM Road Network Support
In OSM the road network is represented by nodes and ways, where each way is tagged with highway=<type>.
A way consists of two or more nodes. These nodes can represent either the shape of the road between intersections or the intersections themselves.
PLANit supports all meaningful highway types and allows the user to configure which types to parse and which to ignore. This way you can decide to parse a bicycle network, a pedestrian network, a bus network, a main arterial private car network, or a combination of these.
If you activate highway=footway but have the pedestrian mode deactivated, footways will still not appear in the final result, unless you override the default that highway types without any activated modes are not parsed. Make sure that the combination of activated modes and way types makes sense for your desired result.
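The interplay between activated highway types and activated modes can be sketched as a simple filter. In this minimal Python sketch, the DEFAULT_MODES table and the mode names are hypothetical placeholders. PLANit's actual defaults are country specific and configurable. The sketch only illustrates the rule that a way is kept when its type is activated and at least one of the modes it allows is activated.

```python
# Hypothetical default mode access per highway type (illustrative only; the
# real PLANit defaults are country specific and configurable).
DEFAULT_MODES = {
    "primary": {"motorcar", "bus", "bicycle"},
    "residential": {"motorcar", "bus", "bicycle", "foot"},
    "cycleway": {"bicycle"},
    "footway": {"foot"},
}


def is_parsed(highway_type, activated_types, activated_modes):
    """Keep a way only if its highway type is activated AND at least one of
    the modes allowed on that type is activated."""
    if highway_type not in activated_types:
        return False
    allowed = DEFAULT_MODES.get(highway_type, set())
    return bool(allowed & set(activated_modes))


# footway activated, but only cars activated: footways are not parsed
print(is_parsed("footway", {"footway", "primary"}, {"motorcar"}))  # False
print(is_parsed("footway", {"footway", "primary"}, {"foot"}))      # True
```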
More information on the highway= tag can be found at https://wiki.openstreetmap.org/wiki/Key:highway
Topological vs Non-topological
OSM networks are not topological in the sense that OSM ways may intersect but at the point of intersection they either have no node, or they do have a node but the same OSM way continues beyond the intersection point. PLANit requires topological networks where each discontinuity in the road infrastructure leads to a node (intersection) and a new road (link) starts beyond the discontinuity.
Therefore, the OSM parser automatically splits OSM ways into separate links at all identified locations where they intersect. Each link gets its own unique PLANit id but retains the OSM way id as its external id, so it can still be identified after the fact if needed.
If two OSM ways intersect but do not share a node, they are assumed to be non-level crossings (tunnels, bridges, etc.) and PLANit will not split the links.
The PLANit parser is quite sophisticated in that it can handle complex OSM way intersections, to the point that it correctly disentangles self-intersecting OSM ways. These occur occasionally in general and are quite common for footways.
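The splitting rule can be sketched as follows, assuming each way is given as an ordered list of node ids. A way is cut at every internal node that is either shared with another way or visited more than once by the way itself (the self-intersection case). Ways that cross without a shared node, such as bridges and tunnels, are never cut because there is no common node. This is a minimal Python illustration of the idea with hypothetical function names, not PLANit's implementation.

```python
from collections import Counter


def split_indices(ways):
    """For each way, return the indices of internal nodes where it must be cut."""
    # Number of distinct ways each node belongs to.
    usage = Counter()
    for node_ids in ways.values():
        usage.update(set(node_ids))
    cuts = {}
    for way_id, node_ids in ways.items():
        own = Counter(node_ids)  # a count > 1 means the way revisits the node
        cuts[way_id] = [i for i in range(1, len(node_ids) - 1)
                        if usage[node_ids[i]] > 1 or own[node_ids[i]] > 1]
    return cuts


def split_way(node_ids, cut_indices):
    """Split an ordered node list into links that share their cut nodes."""
    links, start = [], 0
    for i in cut_indices:
        links.append(node_ids[start:i + 1])
        start = i
    links.append(node_ids[start:])
    return links


ways = {
    "a": ["1", "2", "3"],                 # shares node 2 with way b
    "b": ["4", "2", "5"],
    "c": ["6", "7", "8", "9", "7", "10"],  # self-intersects at node 7
}
cuts = split_indices(ways)
print(split_way(ways["a"], cuts["a"]))  # [['1', '2'], ['2', '3']]
print(split_way(ways["c"], cuts["c"]))  # [['6', '7'], ['7', '8', '9', '7'], ['7', '10']]
```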
OSM Rail Network Support
In OSM the rail network is represented by nodes and ways, just like the road network, where each way is tagged with railway=<type>.
A way consists of two or more nodes. These nodes can represent either the shape of the railway between intersections/switches or the intersections/switches themselves.
PLANit supports all meaningful railway types and allows the user to configure which types to parse and which to ignore. This way you can decide to parse a tram network, a light rail network, heavy rail, or even miniature railways, or a combination of these.
Similar to road networks, the parser will split an OSM railway whenever it intersects with another OSM way, see topological vs non-topological.
More information on the railway= tag can be found at https://wiki.openstreetmap.org/wiki/Key:railway
OSM Public transport support
The OSM parser currently supports parsing of public transport infrastructure, but not public transport services and/or lines. This means that stations, platforms, bus stops, stop locations, and stop areas are all parsed, but no information on the lines utilising this infrastructure or their schedules/frequencies is available.
Support for lines/services is available through the separate GTFS parser.
In OSM there exist various iterations of tagging schemes to identify public transport (pt) infrastructure. There are two dominant schemes, both of which are supported by this parser, see also https://wiki.openstreetmap.org/wiki/Public_transport :
- Public transport scheme v1 (Ptv1), a.k.a. original public transport schema
- Public transport scheme v2 (Ptv2), a.k.a. new public transport schema
Ptv2 is backwards compatible with Ptv1. The OSM parser attempts to parse pt infrastructure as Ptv2 where possible, since it is more comprehensive and less ambiguous. If insufficient tagging is available, it falls back on Ptv1 parsing. Alternatively, if incomplete Ptv2 features are identified but additional Ptv1 tags provide context, the parser will attempt to complete the parsing using both sources of information.
The main difference between Ptv1 and Ptv2 is that Ptv1 largely identifies pt by value tags for existing keys, e.g., highway=bus_stop or railway=station, whereas Ptv2 has its own dedicated key, public_transport=.
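A minimal Python sketch of Ptv2-first classification with a Ptv1 fallback follows. The Ptv1 tag list is an illustrative subset of real OSM tags, not the parser's actual list. The sketch deliberately omits the case where incomplete Ptv2 tagging is combined with Ptv1 context.

```python
# Illustrative subset of Ptv1 value tags on existing keys (not exhaustive).
PTV1_TAGS = {
    ("highway", "bus_stop"): "bus_stop",
    ("highway", "platform"): "platform",
    ("railway", "station"): "station",
    ("railway", "halt"): "halt",
    ("railway", "platform"): "platform",
    ("railway", "tram_stop"): "tram_stop",
}


def classify_pt(tags):
    """Return ('ptv2', value), ('ptv1', kind) or None for an entity's tags."""
    # Ptv2 uses its own dedicated key, so prefer it when present.
    if "public_transport" in tags:
        return ("ptv2", tags["public_transport"])
    # Otherwise fall back on Ptv1 value tags of existing keys.
    for (key, value), kind in PTV1_TAGS.items():
        if tags.get(key) == value:
            return ("ptv1", kind)
    return None


print(classify_pt({"public_transport": "platform", "highway": "bus_stop"}))  # ('ptv2', 'platform')
print(classify_pt({"railway": "station", "name": "Central"}))                # ('ptv1', 'station')
```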
Transfer Zones And Connectoids
The OSM parser supports all common OSM public transport infrastructure entities that directly relate to the transfer from one mode to another, such as platforms, platform_edges, bus_stops, and stations.
To enable this functionality the user must use an INTERMODAL reader (representing the ability to transfer between modes within a trip), rather than the simpler NETWORK reader.
All parsed pt infrastructure is converted into transfer zones (geographic locations where intermodal transfers occur, e.g., platforms, stop poles) and connectoids (the locations where a mode on a physical network can access the transfer zone, e.g., vehicle stop locations).
Public transport infrastructure such as bus_stops, platforms, and stations generally resides next to, rather than on, the network. There are exceptions and frequent tagging errors, which the parser supports/identifies. Otherwise, the connection to the road network is typically made via the nearest node, or via explicit references between a road network node (stop_location) and the waiting area (platform, pole, etc.). The OSM parser uses explicit references where present and falls back on the nearest node strategy otherwise. The user can also override the mapping between waiting area and stop_location manually if desired.
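The priority between user overrides, explicit references, and the nearest node strategy can be sketched as below. The ordering (user override first), the 25 m threshold, and all names are assumptions made for illustration. Distances use the haversine formula on WGS84 coordinates.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 coordinates."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def stop_location_for(waiting_area, network_nodes, explicit=None, override=None,
                      max_dist_m=25.0):
    """Pick the network node serving a waiting area (hypothetical helper).

    Priority: user override, then explicit OSM reference, then the nearest
    network node within max_dist_m (None if nothing is close enough)."""
    wa_id, lat, lon = waiting_area
    if override and wa_id in override:
        return override[wa_id]
    if explicit and wa_id in explicit:
        return explicit[wa_id]
    best, best_d = None, max_dist_m
    for node_id, (nlat, nlon) in network_nodes.items():
        d = haversine_m(lat, lon, nlat, nlon)
        if d <= best_d:
            best, best_d = node_id, d
    return best


nodes = {"n1": (51.0, 4.0), "n2": (51.0001, 4.0)}
print(stop_location_for(("p1", 51.00012, 4.0), nodes))  # n2
```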
OSM stop_location to PLANit LinkSegment
A relation between a waiting area and a stop_location is registered in PLANit on the connectoid. Each connectoid (stop_location) can provide access to one or more transfer zones (waiting areas). For each such relation it holds a dedicated list of modes allowed for this transfer, all parsed or derived from the available OSM tagging. The more detailed the tagging, the more precise this information is. Finally, each connectoid is related to the physical network by referring to a particular link segment in PLANit, based on the OSM way the stop_location resides on. However, a stop_location in OSM is a node, whereas a link segment in PLANit is not (it is a directional part of a link). The parser therefore infers which link segment upstream of the stop_location node is the most likely choice. If something unusual is found, it is logged so the user can make the right decision via the configuration.
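Selecting the link segment upstream of a stop_location can be sketched as a filter over directed link segments. The filter keeps segments that end at the stop_location node and allow the mode, accepts a unique candidate, and otherwise logs a warning for the user to resolve. The LinkSegment tuple, the logger name, and the function are hypothetical stand-ins, not PLANit classes.

```python
import logging
from typing import NamedTuple, Optional

logger = logging.getLogger("osm_stop_sketch")


class LinkSegment(NamedTuple):
    """Hypothetical directed link segment: travels from upstream to downstream."""
    seg_id: str
    upstream: str
    downstream: str
    modes: frozenset


def upstream_segment(stop_node: str, mode: str, segments) -> Optional[LinkSegment]:
    """Return the unique segment arriving at stop_node that allows mode.

    With zero or several candidates, log a warning and return None so the
    user can resolve the mapping via configuration."""
    candidates = [s for s in segments
                  if s.downstream == stop_node and mode in s.modes]
    if len(candidates) == 1:
        return candidates[0]
    logger.warning("stop_location %s: %d candidate link segments for mode %s",
                   stop_node, len(candidates), mode)
    return None


segments = [
    LinkSegment("ab", "a", "b", frozenset({"bus", "motorcar"})),
    LinkSegment("ba", "b", "a", frozenset({"bus", "motorcar"})),
    LinkSegment("cb", "c", "b", frozenset({"motorcar"})),
]
print(upstream_segment("b", "bus", segments).seg_id)  # ab
```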
Check the logs carefully for warnings and other information to determine whether manual override action is required. Public transport tagging in particular contains many user errors due to the complexity of the tagging scheme, so some action on the user's part is likely required to achieve the desired result.
OSM station to PLANit
OSM nodes tagged with railway=station are complex to parse because they can represent:
- Just the name of a station, where the station is defined in Ptv2 format separately in the same stop_area
- Just the name of a station, where the station is defined in Ptv2 format separately in a nearby stop_area
- A stand-alone train station, where the actual platforms are not available yet, due to lack of detail
In the first case, the OSM parser simply identifies the station name, applies it to all entities in the stop_area that do not have a name yet, and does nothing else. In the second case, the OSM parser searches for nearby platforms/stop_areas that the station likely needs to be matched to. If it finds one, it proceeds as in the first case; if not, it proceeds as in the third case. In the third case, we assume there is a station but no detail on where its platforms are. Therefore, the OSM parser looks for nearby train tracks of supported rail modes. If it finds any, it attempts to create (virtual) platforms on the nearest node (within a threshold) if that seems feasible, creating at most two platforms this way on nearby parallel train lines. If it cannot find any, it logs a warning that the station could not be parsed.
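The three cases above can be summarised as a small decision function. This Python sketch assumes the expensive parts have already been computed and are passed in: stop_area membership and the spatial search for nearby stop_areas and parallel tracks within a threshold. The returned labels are hypothetical names for the three actions.

```python
def station_action(in_stop_area, nearby_stop_area, nearby_tracks):
    """Decide how to treat an OSM railway=station node (simplified sketch).

    in_stop_area:     the station is a member of a Ptv2 stop_area
    nearby_stop_area: a matching stop_area/platform was found within a threshold
    nearby_tracks:    ids of nearby parallel tracks of activated rail modes,
                      ordered by increasing distance
    """
    if in_stop_area or nearby_stop_area:
        # Cases one and two: only propagate the station name.
        return ("apply_name", [])
    if nearby_tracks:
        # Case three: at most two virtual platforms on the nearest tracks.
        return ("create_platforms", nearby_tracks[:2])
    return ("warn_unparsed", [])


print(station_action(False, False, ["t1", "t2", "t3"]))  # ('create_platforms', ['t1', 't2'])
```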
OSM Bus_stop location (on wrong side of the road)
Non-rail-based modes are assumed to give passengers access on one side of the vehicle (doors) only. This means that in countries that drive on the left (left-hand traffic), all such stops, like bus_stop, should be located on the left-hand side of the road, whereas in countries that drive on the right they should reside on the right-hand side. The OSM parser verifies for each bus_stop whether this is the case. This is a common source of tagging errors, and the OSM parser generates a warning when a bus_stop is found on the alleged wrong side of the road. To address such tagging errors, explore the location of the stop in OSM and then override the mapping explicitly if the stop should be kept. Sometimes the stop is invalid; sometimes, due to its location, it is mapped to the wrong waiting area, e.g., the waiting area for the other direction.
Carefully check all warnings related to this issue to avoid mappings between the wrong stop_location and waiting area. There is little the parser can do about this, because these are often tagging errors in OSM.
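The side-of-road check can be sketched with the sign of a 2D cross product. For a road segment travelled from a to b, a positive value means the stop lies to the left of the travel direction, and a negative value means it lies to the right. The sketch treats (lon, lat) as planar coordinates. This approximation is generally adequate over the few metres between a stop and its road. The function names are hypothetical.

```python
def side_of_road(a, b, p):
    """Return 'left', 'right' or 'on' for point p relative to travel a->b.

    Points are (lon, lat) pairs treated as planar (x east, y north)."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if cross == 0:
        return "on"
    return "left" if cross > 0 else "right"


def on_expected_side(a, b, stop, left_hand_traffic):
    """True if the stop lies on the kerb side for vehicles travelling a->b."""
    expected = "left" if left_hand_traffic else "right"
    return side_of_road(a, b, stop) == expected


a, b = (4.0, 51.0), (4.0, 51.001)  # road segment heading north
stop = (4.0001, 51.0005)           # stop east of the road
print(side_of_road(a, b, stop))                              # right
print(on_expected_side(a, b, stop, left_hand_traffic=True))  # False
```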
Parsing OSM Stop_areas
The OSM parser also supports stop_area relations. A stop_area relation groups a number of public transport entities that logically belong together, e.g. train station platforms and their bus_stops. In PLANit a stop_area is converted into a PLANit transfer zone group. All the OSM waiting areas are converted into transfer zones, and registered on the group.
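A minimal Python sketch of this conversion follows. It assumes Ptv2 member roles, with platform for waiting areas and stop for stop_locations. The returned dictionary is a hypothetical stand-in for a PLANit transfer zone group.

```python
import xml.etree.ElementTree as ET

# Illustrative stop_area relation (hand-written, not a real OSM relation).
STOP_AREA = """<relation id="500">
  <member type="way" ref="21" role="platform"/>
  <member type="node" ref="31" role="stop"/>
  <member type="node" ref="41" role="platform"/>
  <member type="node" ref="51" role=""/>
  <tag k="public_transport" v="stop_area"/>
  <tag k="name" v="Central"/>
</relation>"""


def stop_area_to_group(relation_xml):
    """Convert a stop_area relation into a hypothetical transfer zone group."""
    rel = ET.fromstring(relation_xml)
    tags = {t.get("k"): t.get("v") for t in rel.findall("tag")}
    if tags.get("public_transport") != "stop_area":
        return None
    members = rel.findall("member")
    return {
        "group_id": rel.get("id"),
        "name": tags.get("name"),
        # Waiting areas become transfer zones registered on the group.
        "transfer_zones": [(m.get("type"), m.get("ref"))
                           for m in members if m.get("role") == "platform"],
        "stop_locations": [m.get("ref") for m in members if m.get("role") == "stop"],
    }


print(stop_area_to_group(STOP_AREA)["transfer_zones"])  # [('way', '21'), ('node', '41')]
```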
Parsing OSM Stations
The parser does not (yet) support:
- Subway_entrances to stations, i.e., only the station and vehicle stop_locations are parsed
- Pedestrian access to stations, platforms, etc., i.e., only interaction between motorised modes is supported via stop_locations
- Parsing of public transport services and/or lines
- Groups of stop_areas, these are simply ignored
OSM Mode Support
OSM supports a number of de facto standard modes, as listed under https://wiki.openstreetmap.org/wiki/Key:access. These modes are mapped to their respective PLANit counterparts. Currently, PLANit supports all (arguably meaningful) modes listed; the complete list is in the JavaDoc. By default, some OSM modes are activated and others are deactivated. The user can override these defaults to, for example, only consider pedestrians, bicycles, or cars, or a combination of these.
Link Access Restrictions
Each type of OSM way has defaults regarding which modes it supports, and these defaults differ per country, as per https://wiki.openstreetmap.org/wiki/OSM_tags_for_routing/Access_restrictions. PLANit supports these country-specific defaults. It provides out-of-the-box versions for a number of countries as well as a global default, all of which can be altered by the user if desired.
On top of the defaults, each OSM way might have additional tags that indicate further restrictions specific to that way. PLANit supports a number of these additional tags and parses them accordingly. For each unique combination of mode access restrictions, a PLANit (link segment) type is created in memory. Currently, PLANit supports the following access restriction schemes, either fully or partially:
- Mode direct, e.g., bicycle=yes bus=yes etc.
- Oneway restrictions, e.g., oneway=yes bus=yes etc.
- Busway scheme basic (a.k.a. oneway special cases), e.g., oneway=yes busway=opposite_lane etc.
- Busway scheme location, e.g., busway:left=lane etc.
- Cycleway scheme, e.g., oneway=yes cycleway:left=lane etc.
- Lanes:mode scheme (a.k.a. lane count information), e.g., lanes:psv:forward=2, lanes:psv=4 etc.
- Mode:lanes scheme (a.k.a. access information per lane), e.g., psv:lanes:forward=no|designated, psv:lanes=no|no|designated etc.
While the mode:lanes scheme potentially provides per-lane information on access for a mode, PLANit currently only identifies whether any lane is accessible to the mode (implicitly assumed if this tag is available for the mode) and, if so, allows the mode on the road. The same holds for the lanes:mode scheme, because this tag is only present when at least one lane is available to that mode; hence, the mode can access the road. For the oneway restrictions and busway schemes, the parser does interpret the values. These provide additional information on mode access, either on the direction of travel or on modes other than the one the key describes.
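These rules can be sketched as follows. The sketch starts from the (country-specific) defaults for the highway type, applies mode direct tags, and treats the mere presence of a lanes:mode or mode:lanes key as access for that mode. It then lets oneway and busway=opposite_lane determine the per-direction result. Each unique (forward, backward) combination corresponds to one type. The defaults table, the mode list, and the function names are assumptions for illustration. This is not PLANit's implementation and covers only a subset of the supported schemes.

```python
# Hypothetical country defaults: modes allowed per highway type.
DEFAULTS = {
    "primary": {"motorcar", "bus"},
    "residential": {"motorcar", "bus", "bicycle", "foot"},
}
MODES = ("foot", "bicycle", "motorcar", "bus", "psv")


def allowed_modes(tags, defaults):
    """Return (forward, backward) frozensets of allowed modes for one way."""
    modes = set(defaults.get(tags.get("highway"), ()))
    for mode in MODES:
        # Mode direct scheme, e.g. bicycle=yes or bus=no.
        if tags.get(mode) in ("yes", "designated"):
            modes.add(mode)
        elif tags.get(mode) == "no":
            modes.discard(mode)
        # lanes:<mode>... or <mode>:lanes... present: at least one lane allows it.
        if any(k.startswith(("lanes:" + mode, mode + ":lanes")) for k in tags):
            modes.add(mode)
    forward, backward = set(modes), set(modes)
    if tags.get("oneway") == "yes":
        backward.clear()
        # Busway basic scheme: a contraflow bus lane reopens the other direction.
        if tags.get("busway") == "opposite_lane":
            backward.add("bus")
    return frozenset(forward), frozenset(backward)


def link_segment_types(ways, defaults):
    """Group way ids by their unique (forward, backward) access combination."""
    types = {}
    for way_id, tags in ways.items():
        types.setdefault(allowed_modes(tags, defaults), []).append(way_id)
    return types


ways = {
    "w1": {"highway": "primary", "oneway": "yes", "busway": "opposite_lane"},
    "w2": {"highway": "primary", "oneway": "yes", "busway": "opposite_lane"},
    "w3": {"highway": "primary", "lanes:psv:forward": "1"},
}
types = link_segment_types(ways, DEFAULTS)
print(len(types))  # 2
```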
See also:
- https://wiki.openstreetmap.org/wiki/Key:access (routing-restrictions section)
- https://wiki.openstreetmap.org/wiki/Bus_lanes (example on busway scheme, lanes:mode scheme and mode:lanes scheme)
PLANit does not yet support temporal/conditional mode access restrictions; see https://wiki.openstreetmap.org/wiki/Conditional_restrictions
Bounding Box Edges
The OSM parser attempts to parse as much of the OSM infrastructure as possible. However, near the bounding box edges of the input file, roads/tracks/platforms/stations are likely to be incomplete because they partly fall outside the bounding box. If so, the parser attempts to salvage the OSM entity and reports whether this was successful.
We recommend checking the logs to determine whether the result is satisfactory and, if required, excluding the compromised entity from parsing.
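One simple salvage strategy is sketched below. It assumes that an entity cut off by the bounding box shows up as a way referencing nodes absent from the file. The strategy keeps the longest contiguous run of nodes that are present and discards the way if fewer than two remain. This is an illustrative technique with a hypothetical function name; PLANit's actual salvage logic may differ.

```python
def salvage_way(node_refs, available_nodes):
    """Return the longest contiguous run of node refs present in the file
    (at least two nodes), or None if the way cannot be salvaged."""
    best, run = [], []
    for ref in node_refs:
        if ref in available_nodes:
            run.append(ref)
            if len(run) > len(best):
                best = list(run)
        else:
            run = []  # a missing node breaks the run
    return best if len(best) >= 2 else None


# Node "1" lies outside the bounding box, so the way is truncated.
print(salvage_way(["1", "2", "3", "4"], {"2", "3", "4"}))  # ['2', '3', '4']
```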
Logging support and identifying/correcting tagging errors
The OSM parser logs information, warnings, and other useful messages during parsing. For the OSM reader this logging is particularly extensive, to inform the user of any issues. Due to tagging mistakes, it is very likely that some OSM features cannot be parsed as expected.
The OSM parser has a number of built-in salvage algorithms that attempt to address the most common tagging errors, especially with respect to public transport. For each issue found, PLANit informs the user whether the OSM entity has been:
- Discarded, e.g., warning DISCARD: <message>
- Salvaged, e.g., info SALVAGED: <message>
- Affected by a possible bug in the parser or unexpected behaviour, e.g., severe <message>
A DISCARD message indicates the parser was unable to convert the OSM entity into a valid PLANit entity even though it was expected to. A SALVAGED message indicates the OSM entity could not be converted the way it was expected to, but the parser was able to derive a likely solution based on context. In both cases it is recommended to verify the result by checking the OSM entities referred to in the DISCARD and SALVAGED messages.
You can quickly look up an OSM way via https://www.openstreetmap.org/way/485842787 (replace the example id with the way id of interest).
You can quickly look up an OSM node via https://www.openstreetmap.org/node/5397680278 (replace the example id with the node id of interest).
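To speed up the review of DISCARD and SALVAGED messages, a small script can turn log lines into OpenStreetMap links like the ones above. The exact log text is not specified here. The regular expression below assumes, hypothetically, that messages mention entities as e.g. "way 485842787" or "node id: 5397680278"; adjust it to the actual messages. The sample log lines are made up for illustration.

```python
import re

OSM_BASE = "https://www.openstreetmap.org"
# Hypothetical pattern: an entity type followed, within a few non-digit
# characters, by its numeric id. Adjust to the actual log text.
ENTITY = re.compile(r"\b(way|node|relation)\b[^\d\n]{0,10}(\d+)", re.IGNORECASE)


def osm_url(entity_type, osm_id):
    """Build the openstreetmap.org URL for a way, node, or relation id."""
    return f"{OSM_BASE}/{entity_type.lower()}/{osm_id}"


def review_links(log_lines):
    """Collect (status, url) pairs for DISCARD/SALVAGED log lines."""
    links = []
    for line in log_lines:
        for status in ("DISCARD", "SALVAGED"):
            if f"{status}:" in line:
                for etype, eid in ENTITY.findall(line):
                    links.append((status, osm_url(etype, eid)))
    return links


log = [
    "WARNING DISCARD: platform way 485842787 has no stop_location",
    "INFO SALVAGED: bus_stop node id: 5397680278 mapped to nearest node",
    "INFO parsed 10 ways",
]
for status, url in review_links(log):
    print(status, url)
```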