
Using the Oma Library

Posted by kumakyoo on 14 March 2025 in English. Last updated on 25 April 2025.

Please note: This blog post is part of a series of blog posts about the new OSM file format “OMA”. This is the second post. At the end of the article you’ll find links to the other blog entries.

 

This time I will give you an example of how to query Oma files. I wrote a prototype of a library for working with Oma files. I called it OmaLibJava.

To explain how to use this library, let’s say we want to create an overview map of all the power facilities in a certain town.

 

The Classical Approach

The classical approach with OSM files would be to first reduce the size of the file by creating a smaller file containing only the data of interest. This is typically done in two steps: remove everything that is not a power facility, and remove everything that is not in the town. The order of these two steps does not affect the result, but it can have a huge impact on how long the process takes.

Although it is not necessary – or even counterproductive – this can easily be done with Oma files too. For example, the following small Java program extracts all power facilities of Germany:

import java.io.*;
import de.kumakyoo.omalibjava.*;

public class ExtractPower
{
    public static void main(String[] args) throws IOException
    {
        Extractor e = new Extractor("germany.oma");
        e.addExtract(new BlockFilter("power"),"power.oma");
        e.run();
    }
}

Running this program on germany.oma on my computer takes 8.5 seconds. It creates the file power.oma, which is about 22 MB in size and contains all power facilities of Germany.[1]

Let’s have a closer look at the program: the library contains a class called Extractor, which reads an Oma file (here germany.oma) and writes several extracts simultaneously (here only one, called power.oma).

An important part is the filter BlockFilter("power"). This tells the extractor to keep only elements with the key power. I’ll tell you more about filters in a moment. But first I want to show you why you don’t need this step.

 

The Oma Way

Here is a rough sketch of a program that draws power facilities of the town Trossingen:

import java.io.*;
import de.kumakyoo.omalibjava.*;

public class DrawPower
{
    public static void main(String[] args) throws IOException
    {
        OmaReader r = new OmaReader("germany.oma");

        Filter f1 = new BlockFilter("power");
        Filter f2 = new LifecycleFilter();
        Filter power = new AndFilter(f1,f2);

        Filter f3 = new TagFilter("name","Trossingen");
        Filter f4 = new BlockSliceFilter("boundary","administrative");
        Filter f5 = new AndFilter(f3,f4);
        Filter town = new PolygonFilter(new Polygon(r,f5));

        r.setFilter(new AndFilter(power, town));

        while (true)
        {
            Element e = r.next();
            if (e==null) break;

            // do something with e
        }
    }
}

This time we will use a class called OmaReader. This class provides access to the elements in an Oma file. As we are not interested in all elements, we provide a Filter with r.setFilter(...). After setting up the filter, the elements that pass it can be queried in a simple loop.

The filter combines two other filters: One (called power) for the power facilities and one (called town) for the town (here Trossingen).

These two filters are very efficient: for the most part, only those parts of the Oma file that contain the data we are interested in need to be read. On my computer, the above program takes only 1.4 seconds. That’s why we do not need an extract – the program wouldn’t be any faster if we used an extract containing only the power facilities of Trossingen.

 

Filters

Let’s take a closer look at the filters: The power filter is a combination of two other filters: the BlockFilter, which we already know from above, and a LifecycleFilter.

A short remark on this second one: In OSM data, the key of any tag can carry a lifecycle prefix. If such a prefix exists on the key of an element, it is removed in Oma files and added as a separate tag with the key lifecycle. The advantage of this is faster access. The disadvantage is that all elements now pass a BlockFilter, even those with a lifecycle prefix. That’s where the LifecycleFilter comes in: It prevents elements with a lifecycle prefix from being passed, because we do not want to show demolished or planned power facilities.
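To make the effect of the prefix handling concrete, here is a small toy model that is independent of OmaLibJava. It treats an element as a plain tag map, moves a prefix such as disused: from the key into a separate lifecycle tag, and then checks for "has the key power and has no lifecycle tag". The class name, the method names and the list of prefixes are assumptions made for this illustration; they are not taken from the library or from the Oma format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustration only: a toy model of the lifecycle handling described above.
// The prefix list and all names are assumptions, not part of OmaLibJava.
class LifecycleSketch
{
    static final Set<String> PREFIXES = Set.of(
        "proposed", "planned", "construction", "disused", "abandoned", "demolished");

    // Moves a lifecycle prefix found on `key` into a separate "lifecycle" tag.
    // If several prefixed variants exist, an arbitrary one of them is used.
    static Map<String,String> normalize(Map<String,String> tags, String key)
    {
        Map<String,String> result = new LinkedHashMap<>(tags);
        for (String prefix : PREFIXES)
        {
            String prefixed = prefix + ":" + key;
            if (tags.containsKey(prefixed))
            {
                result.put(key, result.remove(prefixed));
                result.put("lifecycle", prefix);
                break;
            }
        }
        return result;
    }

    // In the spirit of AndFilter(BlockFilter("power"), LifecycleFilter()):
    // the key must be present and no lifecycle tag may be set.
    static boolean isActivePower(Map<String,String> tags)
    {
        return tags.containsKey("power") && !tags.containsKey("lifecycle");
    }

    public static void main(String[] args)
    {
        Map<String,String> disused = normalize(Map.of("disused:power", "tower"), "power");
        Map<String,String> active = normalize(Map.of("power", "pole"), "power");

        System.out.println(isActivePower(disused)); // false
        System.out.println(isActivePower(active));  // true
    }
}
```

After normalizing, both elements have the key power, so a check on the key alone would keep the disused tower as well. Only the additional lifecycle check removes it.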

The town filter is more complicated: First, two filters are combined: One that looks for elements with the name “Trossingen” and one that looks for administrative boundaries. Combined, this results in the boundaries of Trossingen.[2]

This filter is used to create a Polygon. Here, something very important happens under the hood: This polygon is not created from a poly file (that would also be possible) or provided in another way. No, it’s extracted directly from germany.oma using the same OmaReader that we’re going to use to extract the power facilities.

Please note that this would not have been possible if you had extracted power.oma in a first step using the classical approach, because the boundaries would have been lost in this case.

The result is a Polygon[3], which is passed to a PolygonFilter. This PolygonFilter provides access to all the elements that are inside the polygon.

Altogether, this results in the following graph:[4]
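For readers who have never implemented an "inside" query: the textbook way to test whether a point lies inside a single ring is the even-odd (ray casting) rule, sketched below. This is only meant to show the general idea. It is not how OmaLibJava implements its Polygon, which according to the footnote is a collection of areas optimized for fast inside queries. The class and method names are made up for this sketch.

```java
// Illustration only: the classic even-odd (ray casting) test for one ring.
// This is a generic technique, not the implementation used by OmaLibJava.
class PointInPolygonSketch
{
    // xs/ys hold the vertices of the ring; the last vertex is implicitly
    // connected back to the first one.
    static boolean contains(double[] xs, double[] ys, double px, double py)
    {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++)
        {
            // Does the edge from vertex j to vertex i cross the horizontal
            // line through the point?
            boolean straddles = (ys[i] > py) != (ys[j] > py);
            if (straddles)
            {
                // x coordinate at which the edge crosses that line
                double xCross = xs[j] + (py - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j]);

                // Each crossing to the right of the point flips the state.
                if (px < xCross) inside = !inside;
            }
        }
        return inside;
    }

    public static void main(String[] args)
    {
        double[] xs = {0, 4, 4, 0};
        double[] ys = {0, 0, 4, 4};

        System.out.println(contains(xs, ys, 2, 2)); // true
        System.out.println(contains(xs, ys, 5, 2)); // false
    }
}
```

Holes fit the same rule: if the edges of the inner rings are included in the loop, a point inside a hole crosses an even number of edges and is reported as outside.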

A graph of all used filters

 

Do some Drawing

I cannot go into all the details of drawing a map. For a first impression, I will just draw a small circle for each node, a line for each way and an area for, well, each area.

For a map, you often draw some background areas first. The second step is to add the ways and finally some icons for the nodes.

One might be tempted to read in all the power elements first, sort them and draw them in whatever order seems appropriate. Unfortunately this could take up a lot of memory.

Another approach would be to query the elements in the order you need them. This is shown in the following sketch of a program:

Filter power_of_town = new AndFilter(power, town);

r.setFilter(new AndFilter(power_of_town, new TypeFilter("A")));
while (true)
{
    Area a = (Area)r.next();
    if (a==null) break;

    // draw area
}

r.setFilter(new AndFilter(power_of_town, new TypeFilter("W")));
while (true)
{
    Way w = (Way)r.next();
    if (w==null) break;

    // draw way
}

r.setFilter(new AndFilter(power_of_town, new TypeFilter("N")));
while (true)
{
    Node n = (Node)r.next();
    if (n==null) break;

    // draw node
}

First, all power elements of type area are queried and painted. Then the ways and finally the nodes. This takes about the same time as querying them all together.
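The three loops above all have the same shape: call next() until it returns null. If you prefer not to repeat that pattern, it can be factored into a small helper. The following sketch is independent of OmaLibJava; the names DrainSketch, Source, Action and drain are hypothetical, and the only assumption is a source whose next() returns null when it is exhausted and may throw an IOException.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustration only: a helper for sources with a null-terminated next().
// All names are made up for this sketch; nothing here is part of OmaLibJava.
class DrainSketch
{
    // Anything that hands out items one by one and returns null at the end.
    interface Source<T>
    {
        T next() throws IOException;
    }

    // What to do with each item, e.g. draw it.
    interface Action<T>
    {
        void accept(T item) throws IOException;
    }

    // Applies the action to every item and returns the number of items seen.
    static <T> int drain(Source<T> source, Action<T> action) throws IOException
    {
        int count = 0;
        for (T item = source.next(); item != null; item = source.next())
        {
            action.accept(item);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException
    {
        // A list stands in for a real reader here.
        Iterator<String> it = List.of("area", "way", "node").iterator();
        List<String> drawn = new ArrayList<>();

        int n = DrainSketch.<String>drain(() -> it.hasNext() ? it.next() : null, drawn::add);
        System.out.println(n + " " + drawn); // 3 [area, way, node]
    }
}
```

With the reader from the article, a call would presumably look like drain(r::next, e -> { /* draw e */ }) after each r.setFilter(...). I have not tested that line against the library.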

And this is what the result might look like (took 1.4 seconds to generate):

A map of all power facilities in Trossingen

I was asked how this scales. If we were to draw all the power facilities of the whole of Germany, how long would it take? To find out, replace “Trossingen” with “Deutschland” in the program above. It took 5.7 seconds on my computer and this is the result:

A map of all power facilities in Germany

See also


  1. For comparison: osmium needs 2:24 minutes for this job (on a pbf file) and osmfilter needs 1:47 minutes (on an o5m file). All three results differ: First of all, osmfilter adds more than 40,000 wrong elements. Oma adds power elements with lifecycle prefixes, while the other two omit them. Omitting them with Oma would be easy (see later in the main text), but adding them with the other two is more difficult. And finally, there are a lot of differences in how Oma files handle areas and relations. 

  2. You should make sure that there is only one “Trossingen” in your file. If there are several such areas, you will have to take other measures to specify which “Trossingen” you are interested in. Fortunately, there is only one “Trossingen” in Germany. 

  3. It’s actually a collection of areas (with holes), which is optimized for fast inside queries and useless for almost everything else. 

  4. The graph contains a TypeFilter that is implicitly added in the constructor of Polygon, restricting the result of the query to areas. 

Discussion

Comment from cello on 17 March 2025 at 19:02

Very nice, and very impressive from a performance point of view! I think it looks very promising and I love that there is already a Java API/Library for this!

As you requested feedback in your first blog post, here are some of my thoughts as a software developer (only after looking at your posts, I have not yet tested the library):

TypeFilter

r.setFilter(new AndFilter(power_of_town, new TypeFilter("A")));
while (true) { … }

In the TypeFilter, what are possible parameter values? A (area), W (way), N (node). Are there others? What happens if I pass a, Q or # or ?? Instead of taking a char as argument, you probably could use an enum with a fixed set of options

Multiple Queries with same reader

In your example, you seem to re-use your reader r 3 times, each time setting a filter and then just calling r.next(). For me, this is a bit confusing. Traditionally in Java, I have a query method or a method returning an iterator, and then I can iterate over the found values (e.g. with iterator.hasNext(); iterator.next()).

In your code, the reader has a next() method that seems to automatically reset when you set a filter. But this “reset” is not really visible in the code.

Maybe something like the following would be a nicer API design?

OmaIterator iter = r.query(new AndFilter(power_of_town, new TypeFilter("A")));
while (true) {
  Object o = iter.next();
  if (o == null) break;
 ....
}

Although then people might try to run two queries in parallel on the same reader, and I don’t know if that is supported or not.

I think what confuses me is that I don’t see where/when the query happens: I set a filter, and suddenly I can access next to get results. This does not seem intuitive to me (but others might have a different opinion on that)

Comment from kumakyoo on 18 March 2025 at 16:53

In the TypeFilter, what are possible parameter values? A (area), W (way), N (node).

And C (collection) - I’ll go into more detail on collections in one of my next blog posts (about how Oma files handle relations). Parameters can be combined, so you can also use “WA” if you are interested in ways and areas. Have a look at the API for more details.

Instead of taking a char as argument, you probably could use an enum with a fixed set of options

Yes. Would probably be more Java like. It’s only a prototype of a library; mainly intended to show what is possible. For a “real” library there needs to be a lot of refinement, I think.

Traditionally in Java, I have a query method or a method returning an iterator, and then I can iterate over the found values (e.g. with iterator.hasNext(); iterator.next()).

In the past, I have run into problems when trying to write Java iterators myself, so I have shied away from this approach. I’m probably missing something fundamental here. The design was probably inspired by Python.

In your code, the reader has a next() method that seems to automatically reset when you set a filter. But this “reset” is not really visible in the code.

Originally you had to call reset() manually every time you set a filter. This was error-prone, so I decided to have it called automatically. (Normally I don’t like automatisms that cannot be overruled by a human, but in this case I can’t see a use case, because after setting a filter, the OmaReader is in an undefined state and the only way to get back to a defined state is to call reset() anyway.)

Although then people might try to run two queries in parallel on the same reader, and I don’t know if that is supported or not.

It’s not supported yet. You might create two OmaReaders in two threads and make them read the same file in parallel. That should work, but might slow everything down; depends on how file access works under the hood.

I think what confuses me is that I don’t see where/when the query happens: I set a filter, and suddenly I can access next to get results. This does not seem intuitive to me (but others might have a different opinion on that)

When an OmaReader is created, the file is immediately opened and some basic data (needed for all queries) is retrieved. Querying starts with the first call to next(). It scans the file until it finds an item that fits the filter (skipping larger parts if possible) and returns this item. With the next call, it continues the search at the place where it stopped before.
