OpenStreetMap

Nominatim and Postcodes

Posted by lonvia on 16 January 2018 in English.

Nominatim (the search engine that powers the search box on the OpenStreetMap website) has recently made significant changes to the way it handles postcodes. This post tries to give a bit of background on what has changed and why.

When you search for a place on osm.org, Nominatim presents not only the name of the place in the result but a complete address. This address helps to distinguish between different places and is also used to narrow down your search. It is not a postal address as you would put on a postcard. It is more a textual description of where the place is located: in which suburb, city, state, country etc. This information is fairly easy to compute from OSM data because all these administrative units are mapped as areas. So Nominatim just needs to check which areas a place lies inside and order them appropriately, and there is the address.

Postcodes, however, are different. In most countries there is no such thing as postcode areas. Postcodes are simply assigned to a place (a house or POI) in whatever fashion is deemed most practical for the local postal service. Often the postcodes follow delivery routes. It might be possible to draw an area around houses with the same postcode but this would be an artificial distinction and there is no guarantee that the resulting areas don’t overlap.

For that reason, there are very few boundaries in OSM that describe postcode areas. Postcodes are mostly found on house numbers and POIs in the addr:postcode tag. But even here coverage is rather sparse. So when computing the address of a place, Nominatim has to take a different approach to determine the most likely postcode for a place where no addr:postcode tag exists.

With the new version, Nominatim tries two different methods to infer the postcode of the place: an address lookup and an area-based lookup.

The address lookup comes first. Nominatim assembles all other parts of the address and then checks if any part of the address carries an addr:postcode tag that might apply. It does that going from the most specific part of the address, the street, up to the most generic one, the country. As soon as it finds an appropriate tag, it stops and uses the postcode. This means that when tagging postcodes you can start by assigning an approximate postcode to a larger area, like a complete village or suburb, and then later come back and add addr:postcode tags to the handful of houses that are the exception to the rule (or even add complete postcode coverage for the whole village and then delete the postcode tag on the village again).
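The walk from the most specific to the most generic address part can be sketched in a few lines of Python. This is only an illustration of the idea, not Nominatim's actual code: the `AddressPart` type, the `postcode_from_address` function and the sample place names are all made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AddressPart:
    """One element of a computed address (hypothetical structure)."""
    kind: str                                   # e.g. "street", "village", "country"
    name: str
    tags: Dict[str, str] = field(default_factory=dict)


def postcode_from_address(parts: List[AddressPart]) -> Optional[str]:
    """Return the first addr:postcode found in the address.

    `parts` must already be ordered from the most specific part
    (the street) to the most generic one (the country).
    """
    for part in parts:
        postcode = part.tags.get("addr:postcode")
        if postcode:
            return postcode      # stop at the first part that carries a postcode
    return None                  # nothing found: fall back to the area method


# Invented sample data: only the village carries a postcode tag.
address = [
    AddressPart("street", "Hauptstrasse"),
    AddressPart("village", "Exampleville", {"addr:postcode": "01234"}),
    AddressPart("country", "Exampleland"),
]
found = postcode_from_address(address)
print(found)
```

In this sketch the street has no postcode, so the one tagged on the village is used. A postcode tagged on the street would win over it.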

If there is no postcode to be found in the address, Nominatim tries the area method. Ideally, it would look for the closest object with an addr:postcode tag within a certain area and use that postcode as a guess. This is unfortunately a bit expensive, so Nominatim implements a simplified version. For each postcode, it looks for all the points in OSM that are tagged with that postcode and computes one central point, the postcode centroid. When guessing the postcode of an object with the area method, the closest postcode centroid is used. This is not quite as accurate but considerably faster. The postcode centroids are also used when you search for a postcode. If OSM has no postcode area, then an artificial point is returned with the same location as the centroid.
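The centroid idea can be illustrated with a small Python sketch. It assumes the centroid is a plain average of latitudes and longitudes and uses a rough flat-earth distance. Both are simplifications chosen for this example (averaging breaks near the antimeridian, for instance) and are not necessarily what Nominatim does. The function names and the sample coordinates are invented.

```python
import math
from collections import defaultdict


def compute_centroids(points):
    """points: iterable of (postcode, lat, lon) tuples.

    Returns {postcode: (lat, lon)} with one averaged point per postcode.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])    # lat sum, lon sum, count
    for postcode, lat, lon in points:
        entry = sums[postcode]
        entry[0] += lat
        entry[1] += lon
        entry[2] += 1
    return {pc: (s[0] / s[2], s[1] / s[2]) for pc, s in sums.items()}


def nearest_postcode(lat, lon, centroids):
    """Return the postcode whose centroid is closest to (lat, lon)."""
    if not centroids:
        return None

    def squared_distance(item):
        c_lat, c_lon = item[1]
        # Scale longitude by cos(latitude) so that both axes are roughly comparable.
        dx = (c_lon - lon) * math.cos(math.radians(lat))
        dy = c_lat - lat
        return dx * dx + dy * dy

    return min(centroids.items(), key=squared_distance)[0]


# Invented sample data: two tagged points for one postcode, one for another.
tagged_points = [
    ("01234", 51.00, 13.00),
    ("01234", 51.02, 13.02),
    ("05678", 51.50, 13.50),
]
centroids = compute_centroids(tagged_points)
guess = nearest_postcode(51.05, 13.05, centroids)
print(guess)
```

Recomputing the centroids only requires rerunning `compute_centroids` over the current set of tagged points, so the step can be repeated whenever the data changes.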

Postcode centroids have been a feature of Nominatim for a long time. However, they have always been static and only computed once when the database was initially imported. Starting with the next release, postcodes become their own entity in Nominatim and can be regularly recomputed and updated. On nominatim.osm.org this is already done once per day now.

Finally, there is also a change in the way postcodes are handled in your search query. Formerly, if you added a postcode to your search, you had to use the one that Nominatim had guessed for the place or you would get no result at all. That was particularly annoying when Nominatim had guessed wrong and the search had the right postcode. With the new version, Nominatim is able to detect postcodes in the query and ignore them if necessary. So if a place has a wrong postcode in Nominatim, it can nonetheless be found by its correct address. There is one catch though: Nominatim needs to understand that a part of your query is indeed a postcode. At the moment it takes this information from OSM itself. That means it can really only detect (and ignore) postcodes that have previously been mapped somewhere in OSM. At some point, it will learn to detect postcodes by their format but that is a project for a future version of Nominatim.
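A rough Python sketch of this behaviour follows. It is a toy model, not Nominatim's query parser. The functions `split_postcodes` and `search`, the place records and the name matching are all hypothetical, and ranking places with a matching postcode first is an assumption made for the example. It also handles only single-token postcodes.

```python
def split_postcodes(query, known_postcodes):
    """Split a query into (postcode tokens, remaining search terms).

    A token only counts as a postcode if it appears in `known_postcodes`,
    i.e. if that postcode has already been mapped somewhere.
    """
    postcodes, terms = [], []
    for token in query.replace(",", " ").split():
        if token.upper() in known_postcodes:
            postcodes.append(token.upper())
        else:
            terms.append(token)
    return postcodes, terms


def search(query, places, known_postcodes):
    """Find places by name; a recognised postcode never excludes a result."""
    postcodes, terms = split_postcodes(query, known_postcodes)
    wanted = {term.lower() for term in terms}
    matches = [p for p in places if wanted <= set(p["name"].lower().split())]
    # Assumption: places whose stored postcode agrees with the query come first.
    matches.sort(key=lambda p: p["postcode"] not in postcodes)
    return matches


# Invented sample data: the stored postcode for this street is a wrong guess.
known = {"01234", "05678"}
places = [{"name": "Hauptstrasse Exampleville", "postcode": "05678"}]

with_known = search("Hauptstrasse Exampleville 01234", places, known)
with_unknown = search("Hauptstrasse Exampleville 99999", places, known)
print(len(with_known), len(with_unknown))
```

The second query illustrates the catch: "99999" is not a known postcode, so it is treated as an ordinary search term and the place is not found.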

Discussion

Comment from stevea on 17 January 2018 at 10:12

Thank you for your outstanding and clear description on Nominatim’s latest (and future) postcode improvements!

Comment from gileri on 17 January 2018 at 22:52

Thank you for these insights in Nominatim !

Comment from escada on 18 January 2018 at 10:02

Read with great interest and bookmarked for future reference. Thanks for your hard work on Nominatim

Comment from Glenn Plas on 18 January 2018 at 10:58

Hi this is quite interesting but after reading this I do wonder if postal code boundaries are still used with this system?

Comment from lonvia on 18 January 2018 at 12:54

Glenn, yes they are still used. If there is a postcode boundary, it will be treated exactly like an administrative boundary and it will be preferred over any guessed postcode.

Comment from Glenn Plas on 18 January 2018 at 13:29

Tx for elaborating on that. That is great news in fact.

Comment from Alexander-II on 22 January 2018 at 09:56

That’s great!
I noticed only addr:postcode within the text. And now I’m curious does nominatim use postal_code (wiki) tags on highways for the postal code guessing?

Comment from lonvia on 22 January 2018 at 20:42

Nominatim recognises any of the tags ‘addr:postcode’, ‘postcode’ and ‘postal_code’. And in the US it is also able to process the ‘tiger:zip_left’ and ‘tiger:zip_right’ tags.

Comment from marc__marc on 29 January 2018 at 23:06

Thank you for these improvements.

What is the first level taken into account? One sentence suggests that it is the road (“the most specific part of the address, the street”), another suggests that it is the building (“add addr:postcode tags to the handful of houses”).
