About main keys and values
Posted by kumakyoo on 18 April 2025 in English. Last updated on 25 April 2025.
This blog post is part of a series of blog posts about the new OSM file format “OMA”. This is the seventh post. At the end of the article you’ll find links to the other blog entries.
To create and use blocks and slices, you need to know the type of an element - is it a highway, a landuse or a barrier? And if it is a highway, is it a primary, a secondary or a footway?
Keys
Apart from relations where the type key stores it, this information is not available within OSM data. Again, it has to be guessed, and there is much discussion on the forums and elsewhere about the “main keys”, as they are sometimes called. Several apps, such as most OSM editors, contain such a list, but the lists are not always the same.
The approach I took was this: I counted all the keys of all the tags of all the elements and took the one that appeared most often in this list. I considered this the first main key and removed all elements with this key from the set of elements. Then I repeated this process.
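The counting procedure described above can be sketched in a few lines of Python. This is only an illustration, not the code used for Oma: the function name `rank_main_keys`, the `max_keys` parameter and the tiny sample data are all made up, and elements are simply modelled as dictionaries of tags.

```python
from collections import Counter


def rank_main_keys(elements, max_keys=10):
    """Greedy ranking: repeatedly pick the most frequent key among the
    remaining elements, then drop every element that carries this key.

    `elements` is a list of tag dictionaries (hypothetical data model).
    Returns the ranking and the elements left over at the end.
    """
    remaining = list(elements)
    ranking = []
    while remaining and len(ranking) < max_keys:
        counts = Counter(key for tags in remaining for key in tags)
        if not counts:  # only untagged elements are left
            break
        key, count = counts.most_common(1)[0]
        ranking.append((key, count))
        remaining = [tags for tags in remaining if key not in tags]
    return ranking, remaining


# Made-up sample data, just to show the mechanics.
elements = [
    {"highway": "primary", "name": "A1"},
    {"highway": "footway"},
    {"highway": "residential"},
    {"building": "yes", "name": "Town Hall"},
    {"building": "yes"},
    {"natural": "tree"},
]
ranking, leftover = rank_main_keys(elements, max_keys=2)
print(ranking)   # [('highway', 3), ('building', 2)]
print(leftover)  # [{'natural': 'tree'}]
```

Note that the counts are recomputed in every round, so a key like name loses weight as soon as the elements that also carry an earlier main key have been removed.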
After a while some “dirt” remains: elements that do not show a clear “main key”. I think if you start digging around in that dirt, you might find some hidden gems, that is, main keys that aren’t used very often, because the items they represent do not exist very often in the real world. But I didn’t.
I repeated this process twice,[1] once for nodes and once for ways. You have to keep an open mind, because sometimes the top key is clearly not a “main key”; for example, name pops up after a while. Addresses needed some special treatment, because there is no address key. Addresses are stored as a collection of addr:* keys. Fortunately, the addr:housenumber key is almost always present, so I used that as the key for addresses.[2]
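One way to picture the manual corrections is as a filter that is applied before counting. The sketch below is an assumption about how this could be automated, not a description of what was actually done (the decisions were made by hand): the names `NOT_MAIN`, `is_candidate` and `next_candidate` are hypothetical, and the contents of `NOT_MAIN` are just an example.

```python
from collections import Counter

# Hypothetical veto list: keys that are frequent but describe no type.
NOT_MAIN = {"name"}


def is_candidate(key):
    """Reject vetoed keys and every addr:* key except addr:housenumber,
    which stands in for addresses as a whole."""
    if key in NOT_MAIN:
        return False
    if key.startswith("addr:") and key != "addr:housenumber":
        return False
    return True


def next_candidate(elements):
    """Most frequent key that passes the filter, or None."""
    counts = Counter(k for tags in elements for k in tags if is_candidate(k))
    return counts.most_common(1)[0][0] if counts else None


# Made-up sample data.
elements = [
    {"name": "Bakery", "addr:housenumber": "12", "addr:street": "Main St"},
    {"addr:housenumber": "14", "addr:street": "Main St", "addr:city": "Ulm"},
    {"name": "Fountain"},
    {"addr:street": "Main St"},
]
print(next_candidate(elements))  # addr:housenumber
```

Without the filter, addr:street would come out on top in this sample, although it is no more a type than name is.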
Values
Once you’ve got these “main keys”, the next step is to identify the most common values of these keys. This is straightforward, but there are a lot of values and you have to draw a line somewhere. This is important because otherwise the number of slices would increase enormously, resulting in larger files without much benefit.
I didn’t use an absolute value for the cut, but rather my common sense. Often there are large gaps in the numbers. In this case I used the gap to stop adding values, but sometimes I had to make a decision without a good reason, and the line could just as well have been drawn somewhere else.
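The “large gap” idea can be turned into a simple heuristic, for example by cutting where the ratio between two neighbouring counts is largest. This is a sketch of that idea only; the actual cuts were made by eye, and the function name `cut_at_largest_gap`, the `min_keep` parameter and the numbers below are invented for illustration.

```python
def cut_at_largest_gap(value_counts, min_keep=1):
    """Keep the most common values up to the largest relative drop.

    `value_counts` maps a value (e.g. 'residential') to how often it
    occurs. At least `min_keep` values are always kept.
    """
    ranked = sorted(value_counts.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) <= min_keep:
        return [value for value, _ in ranked]

    best_index, best_ratio = min_keep, 0.0
    for i in range(min_keep, len(ranked)):
        previous_count = ranked[i - 1][1]
        current_count = max(ranked[i][1], 1)  # guard against zero counts
        ratio = previous_count / current_count
        if ratio > best_ratio:
            best_index, best_ratio = i, ratio
    return [value for value, _ in ranked[:best_index]]


# Invented counts, not real OSM statistics.
highway_values = {
    "residential": 5000,
    "service": 4200,
    "footway": 3900,
    "track": 3500,
    "raceway": 40,
    "bus_guideway": 12,
    "escape": 9,
}
print(cut_at_largest_gap(highway_values))
# ['residential', 'service', 'footway', 'track']
```

A heuristic like this only helps where a clear gap exists; where the counts fall off smoothly, the cut remains a judgement call.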
If you are curious about the result: It can be found in the repository of the converter. This file contains some more information about the separation of ways from areas, as well as a section about lifecycle prefixes.
This data is also included in the header of each Oma file. Queries can be sped up with this information, but a wrong assumption could result in incomplete data.
See also
- A New File Format for OSM Data
- Using the Oma Library
- Getting Files in OMA File Format
- The OMA File Format
- Dealing with Relations in Oma Files
- Sorting into Chunks
- Summary, Outlook and a Real-Life Example
[1] I’ve used the excerpt from Germany for this, because the whole planet file is still too big for my computer to process, and I’m personally mainly interested in data from Germany anyway. Creating a worldwide default.type file is something I cannot do on my own. Help with this would be appreciated.
[2] TMC data is even more cumbersome. There are a lot of tags with a lot of colons in the keys, but there is no clear main key and the data is often attached to other things. Also: It’s sometimes written in upper case and sometimes in lower case. In my opinion, this is a big mess that needs some sorting out before it can be considered for a main key in OMA files. I considered this data to be part of the “dirt” mentioned above.
Discussion
Comment from SomeoneElse on 19 April 2025 at 11:30
I notice you also have “area” information in there too - How do you cope with things like man_made=pier which (if linear) will not be an area feature but (if closed) will be?
Also - what about things like leisure=track and highway=raceway which can be either areas or linear, depending on whether an area=no or area=yes tag is present (and can be unclear without, although being a multipolygon relation nudges towards area)?
Comment from kumakyoo on 20 April 2025 at 08:48
@SomeoneElse: Here’s what the algorithm does:
- If an area tag exists and its value is yes or no, this value is used.
So a man_made=pier will always be treated as a linear feature, except if it has area=yes (or it’s a multipolygon).
Neither leisure=track nor highway=raceway is listed in the type file. leisures are always areas, highways are always linear features. Again, this can be overridden by the area tag.
Unfortunately, OSM does not provide a clear way to distinguish between ways and areas. It’s always some guesswork - that cannot be helped.
Comment from SomeoneElse on 20 April 2025 at 11:50
@kumakyoo yes - that mostly makes sense, and assuming that leisure is an area feature and highway a linear one unless there’s an explicit area tag is certainly an approach, but you will find some values for which this isn’t true - highway=pedestrian is one example, most closed highway=pedestrian ways without an area tag are actually areas.
Comment from kumakyoo on 20 April 2025 at 14:46
@SomeoneElse Yes, there is certainly room for improvement in the type file. In your case, you can add pedestrian after highway - EXCEPTION. In this case, closed paths with highway=pedestrian will be treated as areas.
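The rules mentioned in this thread can be summarised in a rough Python sketch. It is a reading of the comments above, not the converter’s actual code and not the syntax of the type file: the names `DEFAULT_IS_AREA`, `EXCEPTIONS` and `is_area` are hypothetical, the order of the checks is a guess, and treating unclosed ways and unlisted keys as linear is an assumption.

```python
# Per-key defaults as stated in the discussion (only these two are known).
DEFAULT_IS_AREA = {"leisure": True, "highway": False}

# Values that flip the default of their key, e.g. after adding
# pedestrian as an exception for highway.
EXCEPTIONS = {"highway": {"pedestrian"}}


def is_area(tags, main_key, closed, multipolygon=False):
    """Guess whether an element is an area or a linear feature."""
    explicit = tags.get("area")
    if explicit in ("yes", "no"):       # an explicit area tag always wins
        return explicit == "yes"
    if multipolygon:                    # multipolygons are areas
        return True
    if not closed:                      # assumption: open ways are linear
        return False
    default = DEFAULT_IS_AREA.get(main_key, False)  # assumption: linear
    if tags.get(main_key) in EXCEPTIONS.get(main_key, set()):
        return not default
    return default


print(is_area({"highway": "pedestrian"}, "highway", closed=True))   # True
print(is_area({"highway": "residential"}, "highway", closed=True))  # False
print(is_area({"leisure": "park"}, "leisure", closed=True))         # True
print(is_area({"man_made": "pier", "area": "yes"}, "man_made", closed=True))  # True
```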