kumakyoo's Comments
Post | When | Comment |
---|---|---|
Sorting into Chunks | Thanks for pointing out those two things. I wasn’t aware of the concept of Caesium, so I didn’t consider it. As far as I understand it, it’s something like a k-d-tree approach. Regarding the Z-curve: If anything, this could be used within slices to sort the data once more. It would be nice, but you need indexed access to the elements, which is not available: The data is compressed and must be retrieved linearly. And even if it were not compressed, the size of the elements is not fixed, so indexing still would not work. So I’m not sure if such a technique could be used. Anyway, it would be nice to be able to read only part of a slice if only part is needed. |
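For readers unfamiliar with the Z-curve mentioned above, here is a minimal Java sketch of how a Z-curve (Morton) key is computed and used for sorting. It is only an illustration: the class `ZCurve`, its methods, and the sample points are hypothetical and are not part of the Oma tools. It assumes coordinates that have already been shifted to be non-negative integers. Note that the sort step works on an in-memory array with random access, which is exactly what the comment says is missing inside a compressed slice.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the Oma tools: computes Z-curve (Morton) keys.
final class ZCurve {
    // Spread the 32 bits of v so that they occupy the even bit positions of a long.
    static long spread(int v) {
        long x = v & 0xFFFFFFFFL;
        x = (x | (x << 16)) & 0x0000FFFF0000FFFFL;
        x = (x | (x << 8))  & 0x00FF00FF00FF00FFL;
        x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FL;
        x = (x | (x << 2))  & 0x3333333333333333L;
        x = (x | (x << 1))  & 0x5555555555555555L;
        return x;
    }

    // Interleave the bits: x goes to the even positions, y to the odd positions.
    static long encode(int x, int y) {
        return spread(x) | (spread(y) << 1);
    }

    public static void main(String[] args) {
        // Assumed sample points (x, y), already non-negative.
        int[][] points = { {5, 3}, {0, 1}, {2, 0}, {1, 1} };

        // Sort by Morton key; keys are compared as unsigned 64-bit values.
        Arrays.sort(points, (a, b) ->
                Long.compareUnsigned(encode(a[0], a[1]), encode(b[0], b[1])));

        for (int[] p : points) {
            System.out.println(p[0] + "," + p[1] + " -> " + encode(p[0], p[1]));
        }
    }
}
```

Points that are close together in two dimensions tend to get nearby keys, which is why such an ordering is attractive for reading only part of a slice.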
|
About main keys and values | @SomeoneElse Yes, there is certainly room for improvement in the type file. In your case, you can add |
|
About main keys and values | @SomeoneElse: Here’s what the algorithm does:
So a Neither Unfortunately, OSM does not provide a clear way to distinguish between ways and areas. It’s always some guesswork - that cannot be helped. |
|
The OMA File Format | I fear I wasn’t as clear about this as I wanted to be: I’m not using single-precision floats, precisely because they lose precision. I’m using the same trick that pbf files (and many others) use: I multiply the numbers by 10,000,000, which yields integers without any loss of precision. That gives an accuracy of about 1 cm. In my opinion that’s enough. |
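The scaling trick described above can be sketched in a few lines of Java. This is a minimal illustration, not code from the Oma converter: the class `FixedPointCoords` and its method names are hypothetical. It assumes coordinates in degrees within ±180, so the scaled value (at most 1,800,000,000 in magnitude) fits into a 32-bit `int`.

```java
// Hypothetical helper, not taken from the Oma converter.
final class FixedPointCoords {
    // Scale factor from the comment: 10,000,000 units per degree.
    static final double SCALE = 1e7;

    // Convert degrees to a scaled integer; rounds to the nearest unit.
    // Math.toIntExact throws if the value does not fit into an int.
    static int toFixed(double degrees) {
        return Math.toIntExact(Math.round(degrees * SCALE));
    }

    // Convert a scaled integer back to degrees.
    static double toDegrees(int fixed) {
        return fixed / SCALE;
    }

    public static void main(String[] args) {
        int lon = toFixed(13.404954);   // sample longitude, assumed for illustration
        System.out.println(lon);
        System.out.println(toDegrees(lon));
    }
}
```

One unit corresponds to 0.0000001 degrees, which is roughly 1.1 cm along the equator, matching the accuracy of about 1 cm mentioned in the comment.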
|
A New File Format for OSM Data |
I’m not sure which wiki you are referring to. I plan to add a page in the OSM-Wiki when the format is finalized. I don’t want to do it in advance, because then I would have to change the entry every time I change the format. But maybe you have got something else in mind.
Of course not. I have spent a year on this. I’m not going to give up just because there is something similar out there. But I like the idea of comparing it to my approach. It will help me get a clearer picture of the strengths and weaknesses of my format. And it might bring up some new ideas that I may have overlooked.
I think so. It’s an all-purpose format that contains almost all the information available from OSM. It might be better suited for vector tiles though, because, I think, Oma files could be used directly, without any additional preparation.
No, there is currently no automated testing. The main goal so far has been to create a new file format. In my opinion this cannot be tested automatically, because after every change I would have to rewrite all the tests, and then I would have to test the tests, just to run them once… The converter and the library are a kind of add-on, a prototype to show what is possible. When the format is fixed and a “real” converter/library is created, it should definitely be accompanied by automated tests. Having said that, I did a lot of testing during the development of the two tools mentioned. It just wasn’t automated. :-) Regarding a round trip: That is not possible. You can’t convert Oma files back to OSM files. Some information is lost in the conversion process, for example the IDs and other meta information of the nodes that make up a way, but also how multipolygons have been pieced together and a few other things. There is only one round trip I know of: From Oma to Opa and back. The resulting Oma file must be identical to the original.
Of course it would be nice to have the library in several languages. But first, the file format needs to be finalized. Python and PHP are languages for which I’ll probably write the library myself (but I don’t mind if someone else volunteers) when the time comes. For other languages, other people will have to do the job. Concerning the converter: I doubt that Python is fast enough for this job. And memory management may also be an issue. Java is (despite its reputation) one of the fastest languages available (but memory is an issue here too - I’ll cover that in my next post in this series), and thus a rewrite in another language may be required sooner or later. |
|
Using the Oma Library |
And C (collection) - I’ll go into more detail on collections in one of my next blog posts (about how Oma files handle relations). Parameters can be combined, so you can also use “WA” if you are interested in ways and areas. Have a look at the API for more details.
Yes. That would probably be more Java-like. It’s only a prototype of a library, mainly intended to show what is possible. For a “real” library there needs to be a lot of refinement, I think.
In the past, I have run into problems when trying to write Java iterators myself, so I have shied away from this approach. I’m probably missing something fundamental here. The design was probably inspired by Python.
Originally you had to call
It’s not supported yet. You might create two OmaReaders in two threads and make them read the same file in parallel. That should work, but might slow everything down; it depends on how file access works under the hood.
When an |
|
A New File Format for OSM Data |
Sounds like a good idea. I didn’t know these two compression algorithms and I didn’t look for alternatives to deflate. Many thanks for pointing this out. I’ll have a look soon. I’ll also have a look at GeoParquet and GeoDesk when I find the time. They might contain additional ideas I overlooked. Many thanks too. |
|
A New File Format for OSM Data |
This is very difficult to answer (which is the reason why I didn’t give any numbers above). First of all, the idea behind overpass is similar to the idea behind Oma files - do the problematic stuff once. Instead of a file, overpass uses a database. Databases have some advantages over files: For example, they contain indexes to speed up searching. Since I don’t have an instance of overpass on my computer I can only guess, but I’d say that it would be faster. The drawback is that databases are not so easy to share. The dumps tend to get big. And the initialisation may take some time to create the indexes. All in all, I think these two approaches are not easily comparable. Querying a pbf file with tools like After that, you can use
|
|
A New File Format for OSM Data | I hope that at least the library can be used by non-specialists too (only some basic knowledge of Java programming is needed). :-) I plan to give a short introduction on how to use the library in my next post. |