
This post is part of a series about the new OSM file format “OMA”; it is the last one. At the end of the article you’ll find links to the other blog entries.

 

I will end this series of blog posts about the OMA file format with a short summary of all the feedback I have received and a brief outlook. But first I’d like to give you a real-life example that uses OMA files.

 

A Real-Life Example

Over a year ago I wrote a renderer suitable for displaying micromapped areas. I used it to generate tiles at zoom level 20 for the restricted area of my micromapping project.

Now I have adapted this program to use OMA files and recomputed the tiles for the area of my micromapping project. The program can be found on GitHub, and there’s a slippy map showing the tiles.

For example, this is what the station forecourt of Hilden looks like:

station forecourt of Hilden

 

Summary

First of all, I have received a lot of feedback from you - thank you very much for that. For the sake of brevity, I’ll address only three things from the feedback here:

There were concerns about dropping topological information when resolving the node references. In my opinion, this happens only in rare situations: there is usually only one node at a given location, so the topology can be recomputed, although this may be more difficult.1 I personally haven’t run into any problems with this yet, and since resolving the nodes is one of the main ideas that led to the Oma file format, I decided to stick with the dropped topology.
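As a minimal sketch of how shared nodes could be rediscovered from resolved coordinates: hash each coordinate pair and assign one id per distinct location. This assumes fixed-point coordinates; all names here are mine for illustration, not the Oma library’s.

```java
import java.util.HashMap;
import java.util.Map;

public class TopologyRecovery {
    // Fixed-point coordinate pair; equal coordinates hash equally.
    record Coord(int lat, int lon) {}

    // Assigns one id per distinct location. Each lookup is near
    // constant time, at the cost of some additional memory for the map.
    static int[] assignNodeIds(Coord[] coords) {
        Map<Coord, Integer> seen = new HashMap<>();
        int[] ids = new int[coords.length];
        for (int i = 0; i < coords.length; i++) {
            Integer id = seen.get(coords[i]);
            if (id == null) {
                id = seen.size();       // next unused id
                seen.put(coords[i], id);
            }
            ids[i] = id;
        }
        return ids;
    }
}
```

Two ways that meet at the same location then receive the same node id and the shared topology is restored.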

Second, user cello suggested adding the ability to use different compression methods: Modern compression methods are faster and might even compress the data better.

That’s obviously true. I was so pleased at how easy it was to get the deflate algorithm working that I didn’t bother looking for alternatives. And the deflate algorithm has a big advantage: it’s available in the libraries of almost every programming language.
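As a sketch of that portability: a complete deflate round trip needs nothing beyond the JDK’s `java.util.zip`. The class and method names here are mine, and the payload is made up; Oma’s actual chunk layout is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateDemo {
    // Compress a byte block with deflate.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress a deflate-compressed block.
    static byte[] inflate(byte[] input) {
        try {
            Inflater inflater = new Inflater();
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                out.write(buf, 0, inflater.inflate(buf));
            }
            inflater.end();
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new RuntimeException(e); // corrupt input
        }
    }
}
```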

On the other hand, fast access and small file sizes are two main design goals of the OMA format, so a better compression algorithm sounds like something worth adding. From what I have read in the last few weeks, I think that the Zstandard algorithm might be a good choice.

To use such an algorithm, I would have to implement it (or integrate an existing implementation) and test it thoroughly. I’d like to do that, but I can’t afford the time at the moment. So I will just reserve the last unused bit in the features byte for adding two more compression algorithms, without actually specifying those algorithms now. This should make it easy to extend the format when the time comes.
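To illustrate why one reserved bit is enough for two more algorithms: combined with the existing compression flag it forms a 2-bit selector with four states. The bit positions below are invented for illustration only; the real layout is whatever the Oma specification defines.

```java
public class Features {
    static final int BIT_DEFLATE  = 0; // assumed position of the existing compression flag
    static final int BIT_RESERVED = 7; // assumed position of the newly reserved bit

    // Combine the two flag bits into a 2-bit method id:
    // 0 = uncompressed, 1 = deflate, 2 and 3 = reserved for future algorithms.
    static int compressionMethod(int features) {
        int lo = (features >> BIT_DEFLATE) & 1;
        int hi = (features >> BIT_RESERVED) & 1;
        return (hi << 1) | lo;
    }
}
```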

User cello also suggested adding more bits for more compression algorithms. I’m shying away from that. This idea reminds me of the zip file format. This format accepts about a dozen different compression methods, which is quite a burden for any program that uses it. This is something I want to spare the programmers of software that uses OMA files.

User -karlos- pointed out some other mechanisms that could be used instead of sorting the data into chunks. For example, storing the data using a Z curve would allow you to look only at the elements in an arbitrary bounding box (with little overhead). This sounds amazing, but I cannot see how it works well together with compression and elements of unequal length. The faster lookup would probably be paid for by larger files.
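For readers unfamiliar with Z curves: the standard way to map two coordinates onto one is Morton-code bit interleaving, sketched below. Nearby points tend to get nearby codes, which is what makes bounding-box lookups cheap. This is a generic illustration, not anything from the Oma code base.

```java
public class Morton {
    // Spread the 32 bits of v so they occupy the even bit positions
    // of a 64-bit value (the classic "part1by1" bit trick).
    static long spread(long v) {
        v &= 0xFFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8))  & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2))  & 0x3333333333333333L;
        v = (v | (v << 1))  & 0x5555555555555555L;
        return v;
    }

    // Interleave x (even bits) and y (odd bits) into one Z-curve index.
    static long morton(long x, long y) {
        return spread(x) | (spread(y) << 1);
    }
}
```

Sorting elements by this index places spatial neighbours close together in the file - but, as noted above, it is unclear how well this combines with compression and variable-length elements.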

Anyway, implementing such a change would take quite a lot of time - probably more than I’m willing to spend. So I decided to stick with my chunk-based implementation and hope to improve the bounding boxes of the chunks.

 

Outlook

What’s next? There is only one small change (the order of the bits in the features byte) that I plan to apply to the format, before it can be finalised. But before that, I want to take care of some issues that have arisen while writing these blog posts. This will take some time, and I expect the file format to be finalised this summer.

After that, the library could become a “real” library. I’m not sure if I really know what a “real” library looks like. But at the moment it feels a bit provisional. Probably tests need to be added, as suggested by user rayKiddy, and maybe the user interface needs to be rethought.

Also, the converter needs some polishing. I tried converting europe.osm this week and discovered an integer overflow (which has since been fixed) and an infinite recursion caused by a certain (not yet identified) multipolygon. It would also be nice to try the converter on planet.osm, but I’m not sure if I have the infrastructure to do that.

I’ve been using the Oma tools at work since the beginning of the year. They have simplified a lot of things. I often don’t even notice that I’m querying germany.oma in the background when doing something. The data is just there when I need it.

But every now and then I’ve got to do a one-off query. For example, I recently needed to find a pedestrian area with exactly two holes and only a small number of nodes defining that area. I wrote a Java program using the library that discovered five such areas. This was fine, but writing a Java program for this always feels like overkill.

I would have preferred to write the program in PHP, because such one-off programs are much faster to write in PHP, and I don’t mind if they run a little slower. So I’m looking forward to creating a PHP version of the library. I could also make good use of a Python version when working with QGIS.

Finally, I hope that other people will join in using the OMA file format. I will be happy when someone tells me that they have successfully used it for one of their projects. :-)

See also


  1. I usually use a two-dimensional hashmap to search for nodes. This results in a lookup in almost constant time, but needs some additional memory. 
