Using the Oma Library
Posted by kumakyoo on 14 March 2025 in English. Last updated on 25 April 2025.
Please note: This blog post is part of a series of blog posts about the new OSM file format “OMA”. This is the second post. At the end of the article you’ll find links to the other blog entries.
This time I will give you an example of how to query Oma files. I wrote a prototype of a library for working with Oma files. I called it OmaLibJava.
To explain how to use this library, let’s say we want to create an overview map of all the power facilities in a certain town.
The Classical Approach
The classical approach with OSM files would be to first reduce the size of the file by creating a smaller file containing only the data of interest. This is typically done in two steps: remove everything that is not a power facility, and remove everything that is not in the town. The order of these two steps does not affect the result, but it can have a huge impact on the duration of the process.
Although it is not necessary – or even counterproductive – this can easily be done with Oma files too. For example, the following small Java program extracts all power facilities of Germany:
import java.io.*;
import de.kumakyoo.omalibjava.*;

public class ExtractPower
{
    public static void main(String[] args) throws IOException
    {
        // Read germany.oma and write one extract with all elements that have the key "power".
        Extractor e = new Extractor("germany.oma");
        e.addExtract(new BlockFilter("power"),"power.oma");
        e.run();
    }
}
Running this program on germany.oma on my computer takes 8.5 seconds. It creates the file power.oma, which is about 22MB in size and contains all power facilities of Germany.[1]
Let’s have a closer look at the program: the library contains a class called Extractor, which reads an Oma file (here germany.oma) and writes several extracts simultaneously (here only one, called power.oma).
An important part is the filter BlockFilter("power"). This tells the extractor to keep only elements with the key power. I’ll tell you more on filters in a moment. But first I want to show you why you don’t need this step.
The Oma Way
Here is a rough sketch of a program that draws power facilities of the town Trossingen:
import java.io.*;
import de.kumakyoo.omalibjava.*;

public class DrawPower
{
    public static void main(String[] args) throws IOException
    {
        OmaReader r = new OmaReader("germany.oma");

        // Power facilities, excluding elements with a lifecycle prefix.
        Filter f1 = new BlockFilter("power");
        Filter f2 = new LifecycleFilter();
        Filter power = new AndFilter(f1,f2);

        // The administrative boundary named "Trossingen", turned into a polygon.
        Filter f3 = new TagFilter("name","Trossingen");
        Filter f4 = new BlockSliceFilter("boundary","administrative");
        Filter f5 = new AndFilter(f3,f4);
        Filter town = new PolygonFilter(new Polygon(r,f5));

        // Only elements that pass both filters are returned.
        r.setFilter(new AndFilter(power, town));
        while (true)
        {
            Element e = r.next();
            if (e==null) break;
            // do something with e
        }
    }
}
This time we will use a class called OmaReader. This class provides access to the elements in an Oma file. As we are not interested in all elements, we provide a Filter with r.setFilter(...). After setting up the filter, the elements that pass it can be queried in a simple loop.
The filter combines two other filters: One (called power) for the power facilities and one (called town) for the town (here Trossingen).
These two filters are very efficient: for the most part, only those parts of the Oma file that contain the data we are interested in need to be read. On my computer, the above program takes only 1.4 seconds. That’s why we do not need an extract: the program wouldn’t be any faster if we used an extract containing only the power facilities of Trossingen.
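To picture how an AndFilter combines its parts, here is a minimal, self-contained sketch that models a filter as a plain predicate over an element's tags. Everything in it (FilterSketch, hasKey, hasTag, and) is hypothetical and not part of OmaLibJava. It also tests every element one by one, so it says nothing about how the library manages to skip whole parts of the file.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model, not the OmaLibJava API:
// an element is reduced to its tag map and a filter is a predicate over that map.
final class FilterSketch {
    // Passes elements that carry the given key, whatever its value.
    static Predicate<Map<String, String>> hasKey(String key) {
        return tags -> tags.containsKey(key);
    }

    // Passes elements whose tag `key` has exactly the given value.
    static Predicate<Map<String, String>> hasTag(String key, String value) {
        return tags -> value.equals(tags.get(key));
    }

    // An "and" filter passes an element only if both of its parts pass it.
    static <T> Predicate<T> and(Predicate<T> a, Predicate<T> b) {
        return a.and(b);
    }

    public static void main(String[] args) {
        Predicate<Map<String, String>> f =
            and(hasKey("boundary"), hasTag("name", "Trossingen"));
        System.out.println(f.test(Map.of("boundary", "administrative", "name", "Trossingen")));
        System.out.println(f.test(Map.of("power", "tower")));
    }
}
```

Because the combined filter is itself just a filter, it can be combined again, which is the same nesting the program above uses for power and town.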
Filters
Let’s take a closer look at the filters: The power filter is a combination of two other filters: The BlockFilter, which we already know from above, and a LifecycleFilter.
A short remark on this second one: OSM data can contain a lifecycle prefix on every tag. If such a prefix exists on the key of an element, it is removed in Oma files and added as a separate tag with the key lifecycle. The advantage of this is faster access. The disadvantage is that all elements now pass a BlockFilter, even those with a lifecycle prefix. That’s where the LifecycleFilter comes in: It prevents elements with a lifecycle prefix from being passed, because we do not want to show demolished or planned power facilities.
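As an illustration of the idea, splitting a lifecycle prefix off a key could look roughly like the following. This is only a sketch under assumptions: the class LifecycleSplit, its method split and the list of prefixes are made up for this example and are not taken from the Oma converter, which may recognise a different set of prefixes.

```java
import java.util.Set;

// Hypothetical helper, not part of OmaLibJava.
final class LifecycleSplit {
    // Illustrative subset of lifecycle prefixes; the set actually used by Oma is an assumption here.
    static final Set<String> PREFIXES = Set.of(
        "proposed", "planned", "construction", "disused", "abandoned", "demolished", "removed");

    // Returns {lifecycle, key}; lifecycle is null when the key carries no known prefix.
    static String[] split(String key) {
        int colon = key.indexOf(':');
        if (colon > 0) {
            String prefix = key.substring(0, colon);
            if (PREFIXES.contains(prefix)) {
                return new String[] { prefix, key.substring(colon + 1) };
            }
        }
        return new String[] { null, key };
    }

    public static void main(String[] args) {
        String[] parts = split("demolished:power");
        System.out.println("lifecycle=" + parts[0] + ", key=" + parts[1]);
    }
}
```

With a split like this, a demolished power tower ends up under the plain key power, which is why a filter on the key alone lets it through and a second filter on the lifecycle tag is needed.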
The town filter is more complicated: First, two filters are combined: One that looks for elements with the name “Trossingen” and one that looks for administrative boundaries. Combined, this results in the boundaries of Trossingen.[2]
This filter is used to create a Polygon. Here, something very important happens under the hood: This polygon is not created from a poly file (that would also be possible) or provided in another way. No, it’s extracted directly from germany.oma using the same OmaReader that we’re going to use to extract the power facilities.
Please note that this would not have been possible if you had extracted power.oma in a first step using the classical approach, because the boundaries would have been lost in this case.
The result is a Polygon[3], which is passed to a PolygonFilter. This PolygonFilter provides access to all the elements that are inside the polygon.
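To give an idea of what an inside query involves, here is the textbook even-odd ray casting test for a single ring. It is a generic sketch and not the method used by the library's Polygon class, which uses a structure optimized for such queries; the class PointInPolygon and its method contains are names invented for this example.

```java
// Generic point-in-polygon test; not taken from OmaLibJava.
final class PointInPolygon {
    // Even-odd ray casting: count how often a horizontal ray starting at (x, y)
    // and running towards larger x crosses an edge of the ring.
    // xs/ys hold the ring's vertices; the last vertex is implicitly joined to the first.
    static boolean contains(double[] xs, double[] ys, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            // Only edges whose end points lie on different sides of the ray's height can be crossed.
            boolean straddles = (ys[i] > y) != (ys[j] > y);
            if (straddles) {
                // x coordinate at which the edge reaches the height y
                double xCross = xs[j] + (y - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j]);
                if (x < xCross) inside = !inside;
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        double[] xs = { 0, 4, 4, 0 };
        double[] ys = { 0, 0, 4, 4 };
        System.out.println(contains(xs, ys, 2, 2));
        System.out.println(contains(xs, ys, 5, 2));
    }
}
```

A test like this walks over every edge for every point, which is fine for a sketch but would be slow for a detailed boundary and many elements.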
Altogether, this results in the following graph:[4]
Do some Drawing
I cannot go into all the details of drawing a map. For a first impression, I will just draw a small circle for each node, a line for each way and an area for, well, each area.
For a map, you often draw some background areas first. The second step is to add the ways and finally some icons for the nodes.
One might be tempted to read in all the power elements first, sort them and draw them in whatever order seems appropriate. Unfortunately this could take up a lot of memory.
Another approach would be to query the elements in the order you need them. This is shown in the following sketch of a program:
Filter power_of_town = new AndFilter(power, town);

r.setFilter(new AndFilter(power_of_town, new TypeFilter("A")));
while (true)
{
    Area a = (Area)r.next();
    if (a==null) break;
    // draw area
}

r.setFilter(new AndFilter(power_of_town, new TypeFilter("W")));
while (true)
{
    Way w = (Way)r.next();
    if (w==null) break;
    // draw way
}

r.setFilter(new AndFilter(power_of_town, new TypeFilter("N")));
while (true)
{
    Node n = (Node)r.next();
    if (n==null) break;
    // draw node
}
First, all power elements of type area are queried and painted. Then the ways and finally the nodes. This takes about the same time as querying them all together.
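The three loops above all follow the same pattern: call next() until it returns null. If you prefer standard Java iteration, such a loop can be wrapped in an Iterator. The following is a minimal sketch under assumptions: NullTerminatedSource is a hypothetical stand-in for a reader, not the OmaReader class itself, and it ignores the IOException that the real next() may throw.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a reader whose next() returns null when it is exhausted.
// This is NOT the OmaLibJava API.
interface NullTerminatedSource<T> {
    T next();
}

// Adapts a null-terminated source to java.util.Iterator by always looking one item ahead.
final class SourceIterator<T> implements Iterator<T> {
    private final NullTerminatedSource<T> source;
    private T lookahead;

    SourceIterator(NullTerminatedSource<T> source) {
        this.source = source;
        this.lookahead = source.next();   // prefetch the first item
    }

    @Override
    public boolean hasNext() {
        return lookahead != null;
    }

    @Override
    public T next() {
        if (lookahead == null) throw new NoSuchElementException();
        T current = lookahead;
        lookahead = source.next();        // prefetch the following item
        return current;
    }
}

final class IteratorDemo {
    // Builds a toy source over a fixed list, standing in for a filtered reader.
    static NullTerminatedSource<String> sourceOf(List<String> items) {
        Iterator<String> it = items.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        Iterator<String> it = new SourceIterator<>(sourceOf(List.of("tower", "line", "substation")));
        while (it.hasNext()) System.out.println(it.next());
    }
}
```

The adapter keeps only a single element in memory at a time, so it does not bring back the memory problem of reading everything in first.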
The resulting map might look like this (took 1.4 seconds to generate):
I was asked how this scales. If we were to draw all the power facilities of the whole of Germany, how long would it take? To find out, replace “Trossingen” with “Deutschland” in the program above. It took 5.7 seconds on my computer and this is the result:
See also
- A New File Format for OSM Data
- Getting Files in OMA File Format
- The OMA File Format
- Dealing with Relations in Oma Files
- Sorting into Chunks
- About main keys and values
- Summary, Outlook and a Real-Life Example
[1] For comparison: osmium needs 2:24 minutes for this job (on a pbf file) and osmfilter uses 1:47 minutes (on an o5m file). All three results differ: First of all, osmfilter adds more than 40,000 wrong elements. Oma adds power elements with lifecycle prefixes, while the other two omit them. Omitting them with Oma would be easy (see later in the main text), but adding them with the other two is more difficult. And finally, there are a lot of differences in how Oma files handle areas and relations.
[2] You should make sure that there is only one “Trossingen” in your file. If there are several such areas, you will have to take some other measures to specify which “Trossingen” you are interested in. Fortunately, there is only one “Trossingen” in Germany.
[3] It’s actually a collection of areas (with holes), which is optimized for fast inside queries and useless for almost everything else.
[4] The graph contains a TypeFilter that is implicitly added in the constructor of Polygon, restricting the result of the query to areas.
Discussion
Comment from cello on 17 March 2025 at 19:02
Very nice, and very impressive from a performance point of view! I think it looks very promising and I love that there is already a Java API/Library for this!
As you requested feedback in your first blog post, here are some of my thoughts as a software developer (only after looking at your posts, I have not yet tested the library):
TypeFilter
In the TypeFilter, what are possible parameter values? A (area), W (way), N (node). Are there others? What happens if I pass a, Q or # or ?? Instead of taking a char as argument, you probably could use an enum with a fixed set of options.
Multiple Queries with same reader
In your example, you seem to re-use your reader r 3 times, each time setting a filter and then just calling r.next(). For me, this is a bit confusing. Traditionally in Java, I have a query method or a method returning an iterator, and then I can iterate over the found values (e.g. with iterator.hasNext(); iterator.next()).
In your code, the reader has a next() method that seems to automatically reset when you set a filter. But this “reset” is not really visible in the code.
Maybe something like the following would be a nicer API design?
Although then people might try to run two queries in parallel on the same reader, and I don’t know if that is supported or not.
I think what confuses me is that I don’t see where/when the query happens: I set a filter, and suddenly I can access next to get results. This does not seem intuitive to me (but others might have a different opinion on that).
Comment from kumakyoo on 18 March 2025 at 16:53
And C (collection) - I’ll go into more detail on collections in one of my next blog posts (about how Oma files handle relations). Parameters can be combined, so you can also use “WA” if you are interested in ways and areas. Have a look at the API for more details.
Yes. Would probably be more Java like. It’s only a prototype of a library; mainly intended to show what is possible. For a “real” library there needs to be a lot of refinement, I think.
In the past, I have run into problems, when trying to write Java iterators myself, so I have shied away from this approach. I’m probably missing something fundamental here. The design was probably inspired by Python.
Originally you had to call reset() manually every time you set a filter. This was error-prone and thus I decided to have it called automatically. (Normally I don’t like automatisms that cannot be overruled by a human, but in this case I can’t see a use case, because after setting a filter, the OmaReader is in an undefined state and the only way to get back to a defined state is to call reset() anyway.)
It’s not supported yet. You might create two OmaReaders in two threads and make them read the same file in parallel. That should work, but might slow everything down; depends on how file access works under the hood.
When an OmaReader is created, the file is immediately opened and some basic data (needed for all queries) is retrieved. Querying starts with the first call to next(). It scans the file until it finds an item that fits the filter (skipping larger parts if possible) and returns this item. With the next call it continues the search at the place where it stopped before.