Tutorial: Using OpenStreetMap data

OpenStreetMap is an open, freely editable repository of geodata. This data can be useful for research purposes. This page aims to give an overview of what data is available, how to get it, and what to do with it. We will especially focus on topics that are relevant while implementing algorithms that run on OpenStreetMap data, and on how to use OpenStreetMap data for visualizing data. For more general information about OpenStreetMap, please consult the OpenStreetMap Wiki.

Note: This tutorial is a work in progress. Please check back later!

Data format

Nodes and ways

Data in OpenStreetMap is stored in a simple data structure that consists of nodes and ways. A node represents one single point on the map; a node carries its geographical location (latitude and longitude) and a unique identifier number. A way represents a polyline or (closed) polygon on the map. Ways do not store their own location; instead they carry an ordered list of node identifiers.
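As a minimal sketch of this data structure in Python: the class names, field names and coordinates below are our own illustration and are not part of any OpenStreetMap library. Because a way stores only node identifiers, its coordinates have to be looked up through the nodes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    id: int      # unique identifier number
    lat: float   # latitude in degrees
    lon: float   # longitude in degrees


@dataclass
class Way:
    id: int
    node_ids: list  # ordered list of node identifiers


def way_coordinates(way, nodes_by_id):
    """Return the (lat, lon) pairs of a way by looking up each of its nodes."""
    return [(nodes_by_id[i].lat, nodes_by_id[i].lon) for i in way.node_ids]


# Made-up example data: three nodes and one way connecting them in order.
nodes = {
    1: Node(1, 51.4480, 5.4900),
    2: Node(2, 51.4485, 5.4910),
    3: Node(3, 51.4490, 5.4905),
}
way = Way(10, [1, 2, 3])
coords = way_coordinates(way, nodes)
```

Storing the nodes in a dictionary keyed by identifier keeps the lookup for each way cheap, which matters once an extract contains many nodes.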

Nodes therefore serve two purposes. Firstly, they can denote pointlike entities, such as points of interest (shops, lamp posts, highway exit numbers, ...). In this case the node carries some additional information about what it represents (see the section Tags). Secondly, they can simply be placed as part of a way, to encode its shape. A node can serve both purposes at the same time; for example, a node that is part of a way may simultaneously represent a speed bump at that position.

Ways are used to represent non-pointlike entities, namely polylines and polygons. Despite their name, ways are not just used for roads: they are used for any polyline or polygon on the map. Examples of polyline features that ways are used for are roads, railways, river centerlines, powerlines and administrative borders. Examples of polygon features are forests, water bodies, residential areas and building outlines. In the data structure there is no fundamental difference between a polyline and a polygon; a polygon is just a way whose first and last node are the same. It is even possible to make ways that use the same node several times, although this is usually discouraged.
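The distinction between polylines and polygons can be sketched as a check on the node list of a way. The function names and the example identifiers are hypothetical; this is an illustration under the description above, not an official rule set.

```python
def is_closed(node_ids):
    """A way is closed when it starts and ends at the same node."""
    return len(node_ids) > 1 and node_ids[0] == node_ids[-1]


def repeated_nodes(node_ids):
    """Return the node ids used more than once in a way.

    The closing node of a closed way is not counted as a repetition.
    """
    ids = node_ids[:-1] if is_closed(node_ids) else node_ids
    seen, repeated = set(), set()
    for i in ids:
        if i in seen:
            repeated.add(i)
        seen.add(i)
    return repeated


road = [1, 2, 3, 4]                    # polyline
building = [5, 6, 7, 8, 5]             # closed way, can represent a polygon
figure_eight = [1, 2, 3, 1, 4, 5, 1]   # node 1 is used several times
```

A check like `repeated_nodes` can be useful before running algorithms that assume simple polylines or simple polygons.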


The map on the left can be represented by nodes (red circles) and ways (black polylines) as shown on the right.

Tags

Nodes and ways need to describe the type of feature they represent. For this it is possible to attach tags to nodes and ways. A tag consists of a key and a value; we usually write a tag as key=value. An example is the name tag (more precisely: the tag with the key name), which defines the name of the object. For instance, the way representing the Auditorium building on the TU/e campus has the tag name=Auditorium. There are also many tags that do not naturally need a value. Those tags usually take the value yes; for example building=yes means that something is a building.

Note that both the key and the value can be any string; blahblah=blah would be a valid tag in principle. However, to avoid total chaos, there are conventions on how to tag objects.
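Since a tag is a key with a value, the tags of one object map naturally onto a dictionary of strings. The following is a small sketch; the helper name `has_tag` is our own, and only the tags name=Auditorium and building=yes are taken from the text above.

```python
# Tags of one object, stored as a key -> value mapping of strings.
tags = {"name": "Auditorium", "building": "yes"}


def has_tag(tags, key, value=None):
    """True if the key is present and, when a value is given, matches it."""
    if key not in tags:
        return False
    return value is None or tags[key] == value


is_building = has_tag(tags, "building")                  # any value counts
is_auditorium = has_tag(tags, "name", "Auditorium")      # exact value match
is_restaurant = has_tag(tags, "amenity", "restaurant")   # key is absent
```

Note that this simple check accepts any value when none is given, so real code may want to handle specific values explicitly.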

Just to give a taste of what tags look like, we give some examples of tags. This overview is intentionally very incomplete, because the number of available tags is enormous. For a complete overview of accepted tagging practices, see this page.

Here are some tags that should be used on nodes:

amenity=restaurant / school / library / parking / hospital
Useful facilities
highway=motorway_junction
Motorway exits
place=city / suburb / town / village
Inhabited places
shop=supermarket / bakery / bicycle / pet / chemist
Types of shops
tourism=museum / hotel / attraction
Places that are of interest to tourists

Here are some tags that should be used on non-closed ways:

highway=motorway / trunk / primary / secondary / tertiary / unclassified / residential
Roads, from motorways to residential streets
waterway=river / stream
Waterways, like rivers and streams
railway=rail
Full-size railway tracks (passenger or freight)
bridge=yes
Bridges (used together with one of the other tags)
tunnel=yes
Tunnels (used together with one of the other tags)

Here are some tags that should be used on closed ways:

landuse=forest / residential / grass
Types of land use
natural=wood / water
Natural features
building=yes
Buildings

On the main OpenStreetMap website, it is possible to see which tags a node or way has. To do this, zoom in on the entity, click the question mark button on the right, and click the entity. In the list on the left, click the object you want to inspect. A list of tags is shown. For example, the tags on the Auditorium can be seen here.

Connectivity

OpenStreetMap data is used not only for rendering images of maps, but also for routing purposes. This means that it is important that the road network has the proper graph structure. If two roads intersect in reality, the corresponding ways must share a node there. If two roads cross without intersecting (for instance, at a bridge), so that it is not possible to get from one road to the other at that location, the corresponding ways must not share a node there. Hence, to run geometric algorithms on the road network, it is generally not necessary to preprocess the network to fix connectivity issues. Note, however, that occasionally someone inadvertently breaks the connectivity somewhere; such errors are usually fixed quickly.
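To illustrate how shared nodes define the graph structure, here is a sketch that builds an adjacency map from ways given as lists of node identifiers. The function names and the example data are made up for this illustration.

```python
from collections import defaultdict


def build_graph(ways):
    """Build an undirected adjacency map from ways (lists of node ids).

    Consecutive nodes of a way become an edge; two ways are connected
    exactly where they share a node id.
    """
    adjacency = defaultdict(set)
    for node_ids in ways:
        for a, b in zip(node_ids, node_ids[1:]):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def junctions(ways):
    """Return the node ids that occur in more than one way."""
    count = defaultdict(int)
    for node_ids in ways:
        for i in set(node_ids):
            count[i] += 1
    return {i for i, c in count.items() if c > 1}


# Made-up data: the first two ways intersect at node 2; the third way
# crosses them on a bridge and therefore shares no node with either.
ways = [[1, 2, 3], [4, 2, 5], [6, 7, 8]]
graph = build_graph(ways)
shared = junctions(ways)
```

In this example node 2 becomes a junction with four neighbours, while the bridge forms a separate component even if it crosses the other roads geometrically.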

A related rule is that multi-lane roads are drawn as one way if and only if there is no physical separation between the lanes. A typical motorway consisting of four lanes (two for each direction) would hence be drawn as two ways: one way for each direction, where both ways represent two lanes.


Representation using nodes and ways of a typical motorway exit (simplified).

Directionality

Ways are stored as an ordered list of nodes, so in graph terminology, the database consists of directed edges. However, usually the direction of a way does not matter. For example, an ordinary residential road between two nodes $A$ and $B$ is drawn either as a way from $A$ to $B$ or as a way from $B$ to $A$, not both. This means that when implementing routing algorithms, one needs to take care to allow routing in the opposite direction as well.

However, there are some exceptions where the direction does matter. The most important example is one-way streets: those are mapped like ordinary roads, but with the additional tag oneway=yes. In this case, the direction of the way indicates the allowed driving direction (which should be taken into account for routing, of course). (Additional note: there are some tags, like highway=motorway, that imply oneway=yes, so handling this properly is not entirely trivial. See here for details.) Another example is rivers: those are always drawn in the flow direction, that is, from upstream to downstream.
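The following sketch shows one way to turn a way into directed routing edges. It is deliberately simplified: it honours an explicit oneway=yes and treats highway=motorway as implying it, as described above, and it does not handle any other cases. The function name is our own.

```python
def directed_edges(node_ids, tags):
    """Return the (from, to) pairs along which routing is allowed.

    Simplifying assumption: a way is one-way if it has oneway=yes, or if
    it is a motorway without an explicit oneway tag. Everything else is
    treated as usable in both directions.
    """
    one_way = tags.get("oneway") == "yes" or (
        tags.get("highway") == "motorway" and "oneway" not in tags
    )
    edges = []
    for a, b in zip(node_ids, node_ids[1:]):
        edges.append((a, b))        # direction in which the way is drawn
        if not one_way:
            edges.append((b, a))    # reverse direction is also allowed
    return edges


two_way = directed_edges([1, 2, 3], {"highway": "residential"})
one_way = directed_edges([1, 2, 3], {"highway": "residential", "oneway": "yes"})
motorway = directed_edges([1, 2, 3], {"highway": "motorway"})
```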

Relations

Besides nodes and ways, there is an additional element that can be recorded in the OpenStreetMap database, called a relation. A relation is simply an ordered list of other objects (nodes, ways or other relations). Relations can carry tags on their own, and are used to denote things like public transport routes (being a list of ways).

Since relations are much less frequently used than nodes and ways, we will not explain them further.

Getting the data

One of the problems of working with OpenStreetMap data is that it is huge: the entire world is something like 600 GB (uncompressed) or 30 GB (compressed into a highly optimized binary format). Handling such a large data set is painful. Luckily, it is possible to get extracts of the data, so you can download and process only the part of the world you are interested in.

The planet file

The planet file is a dump of the entire database. If you'd like to try something with the 30 GB of compressed data, you can find more information here.

The OpenStreetMap API

Using the official OpenStreetMap API we can obtain an excerpt of the data. This API always returns live (non-cached) data, but is (intentionally) very limited in scope: it can only handle small rectangular areas. Besides, it cannot do filtering, so you always get all of the data in the query area, even if you only wanted the highways. This API is meant to be used primarily by map editors, who download an excerpt, make edits locally, and upload their changes. It is unsuitable to download large areas of the map for analysis purposes.
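As a sketch of what such a request looks like, the code below builds the URL for the bounding-box call of API version 0.6. The coordinates are an arbitrary small example box, the function names are our own, and the download function is defined but not called here because it performs a network request.

```python
from urllib.request import urlopen

API_URL = "https://api.openstreetmap.org/api/0.6/map"


def map_url(west, south, east, north):
    """Build the URL for a bounding box: min lon, min lat, max lon, max lat."""
    return f"{API_URL}?bbox={west},{south},{east},{north}"


def download_map(west, south, east, north):
    """Fetch the OSM XML for a small bounding box (network request)."""
    with urlopen(map_url(west, south, east, north)) as response:
        return response.read()


# Arbitrary example box; keep it small, since the API rejects large areas.
url = map_url(5.485, 51.445, 5.495, 51.450)
```

The response is an XML document containing all nodes, ways and relations in the box, which matches the lack of filtering described above.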

The Overpass API

A much better alternative, for our purposes, is the Overpass API (so called because it provides a much faster alternative to the official API). The Overpass API consists of a mirror of the main OpenStreetMap database that is updated about every minute. Hence it is less suitable for editing the map: it may happen that you do not get the latest map updates, which may result in data loss when several people edit the same area. However, for our purposes, it does not matter that the data lags behind a few minutes, so we can use the larger capacity and filtering capabilities of the Overpass API.

The Overpass API implements a specialized query language. The website Overpass Turbo provides a front-end for the Overpass API that enables users to write queries and obtain the results.
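To give an impression of the query language, the sketch below builds a query for all ways that carry a highway tag inside a bounding box, together with the nodes they refer to. The endpoint shown is one public Overpass instance, and the function name and coordinates are assumptions made for this example; the same query text can also be pasted into Overpass Turbo.

```python
from urllib.parse import urlencode

# One public Overpass instance (an assumption; other instances exist).
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def highway_query(south, west, north, east):
    """Overpass QL query: ways with a highway tag in a bounding box,
    plus the nodes those ways refer to."""
    bbox = f"{south},{west},{north},{east}"
    return (
        "[out:json][timeout:25];"     # output format and time limit
        f'way["highway"]({bbox});'    # filter on tag key and bounding box
        "(._;>;);"                    # add the nodes used by these ways
        "out body;"
    )


query = highway_query(51.445, 5.485, 51.450, 5.495)
request_url = OVERPASS_URL + "?" + urlencode({"data": query})
```

Note that Overpass orders the bounding box as south, west, north, east, which differs from the order used by the official API.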

Editing the data

This is not a tutorial on editing OpenStreetMap; the OpenStreetMap documentation is much better at explaining that. However, it can still be useful to edit an OpenStreetMap data file, for example to remove unwanted parts of the data.
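As a sketch of such an edit, the code below removes everything except the road network from a small OSM XML document, using only the Python standard library. The XML sample is hand-written for illustration and the function name is our own. Note the caveats: this approach also drops standalone nodes such as points of interest, and it ignores relations.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written OSM XML document, for illustration only.
OSM_XML = """<osm version="0.6">
  <node id="1" lat="51.0" lon="5.0"/>
  <node id="2" lat="51.1" lon="5.1"/>
  <node id="3" lat="51.2" lon="5.2"/>
  <way id="10">
    <nd ref="1"/><nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
  <way id="11">
    <nd ref="2"/><nd ref="3"/>
    <tag k="waterway" v="stream"/>
  </way>
</osm>"""


def keep_only_highways(root):
    """Remove ways without a highway tag, then nodes no way refers to."""
    for way in root.findall("way"):
        keys = {tag.get("k") for tag in way.findall("tag")}
        if "highway" not in keys:
            root.remove(way)
    used = {nd.get("ref") for way in root.findall("way")
            for nd in way.findall("nd")}
    for node in root.findall("node"):
        if node.get("id") not in used:
            root.remove(node)
    return root


root = keep_only_highways(ET.fromstring(OSM_XML))
remaining_ways = [w.get("id") for w in root.findall("way")]
remaining_nodes = [n.get("id") for n in root.findall("node")]
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`, and `ET.ElementTree(root).write(path)` would save the result.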

Visualization

Leaflet