OpenStreetMap

Google Summer of Code 2022

Posted by tareqpi on 24 May 2022 in English.

Hi everyone, my name is Tareq Al-Ahdal. I am a computer science undergraduate student at Universiti Teknologi Malaysia. I was recently accepted into Google Summer of Code 2022 as an open source contributor with OpenStreetMap. This summer I will work on enhancing Nominatim, OpenStreetMap's geocoding software, which finds locations from their names or addresses and, conversely, finds the address of a given location (reverse geocoding).

Nominatim currently ranks search results using a computed importance value that reflects each location's perceived importance. This value is derived from the popularity of the location's Wikipedia article. However, not every location on Earth has its own Wikipedia article. Locations without one therefore get no importance value, and search results involving them can be ranked inaccurately. OpenStreetMap also has data on how many times users have accessed each area of the map, which is a good indicator of how popular a place is. The aim of my work is to integrate this data into Nominatim's computation of the importance value, so that search results become more accurate and users can find the places they are looking for in less time.
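To make the idea above more concrete, here is a minimal Python sketch of one possible fallback. It rests on several assumptions: that the tile log files contain one `z/x/y count` entry per line, that a place's popularity can be approximated by the view count of the tile containing it at a single fixed zoom level, and that a log-scaled count normalised to [0, 1] is a reasonable stand-in when no Wikipedia-based importance exists. The function names (`parse_tile_log`, `tile_importance`, `importance`) are hypothetical, and this is not how Nominatim currently computes importance. Only `latlon_to_tile` follows a well-known formula (the standard Web Mercator tile numbering).

```python
import math
from typing import Dict, Iterable, Optional, Tuple

Tile = Tuple[int, int, int]  # (zoom, x, y)


def latlon_to_tile(lat: float, lon: float, zoom: int) -> Tuple[int, int]:
    """Standard Web Mercator ("slippy map") tile containing a coordinate."""
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 or latitudes near the poles stay on the tile grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def parse_tile_log(lines: Iterable[str]) -> Dict[Tile, int]:
    """Parse lines ASSUMED to look like 'z/x/y count' into a count per tile."""
    counts: Dict[Tile, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tile, count = line.split()
        z, x, y = (int(part) for part in tile.split("/"))
        counts[(z, x, y)] = counts.get((z, x, y), 0) + int(count)
    return counts


def tile_importance(lat: float, lon: float, counts: Dict[Tile, int],
                    zoom: int, max_views: int) -> float:
    """Hypothetical score: log-scaled views of the tile, divided by the max."""
    if max_views <= 0:
        return 0.0
    x, y = latlon_to_tile(lat, lon, zoom)
    views = counts.get((zoom, x, y), 0)
    return math.log1p(views) / math.log1p(max_views)


def importance(wikipedia_importance: Optional[float], lat: float, lon: float,
               counts: Dict[Tile, int], zoom: int) -> float:
    """Hypothetical blend: keep the Wikipedia value, else fall back to tiles."""
    if wikipedia_importance is not None:
        return wikipedia_importance
    max_views = max((c for (z, _, _), c in counts.items() if z == zoom),
                    default=0)
    return tile_importance(lat, lon, counts, zoom, max_views)
```

Log scaling is used here because tile view counts tend to be heavily skewed toward a few very popular areas; with raw counts divided by the maximum, almost every place outside major cities would end up with an importance close to zero.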

I will use this diary to keep you updated about my work. Please feel free to reach out if you have any questions regarding my work or anything else you have in mind.

Location: Taman Tun Dr Ismail, TTDI, Kuala Lumpur, 60000, Malaysia

Discussion

Comment from bryceco on 24 May 2022 at 18:01

If anyone else is curious about the map access statistics referenced here, it’s the number of times each map tile has been served by the tile server: https://planet.openstreetmap.org/tile_logs/

Comment from tareqpi on 24 May 2022 at 21:46

Yes, thank you @bryceco 🙌

Comment from mmd on 25 May 2022 at 15:11

Tiles are mostly served by Fastly CDN these days. IIRC, tile_logs only includes tiles which haven't been cached by the CDN and need to be re-rendered. This might result in quite skewed data. I'd highly recommend getting in touch with @pnorman to discuss those topics before starting actual implementation work.

Comment from tareqpi on 25 May 2022 at 22:57

Ok, I will reach out to him. Thank you @mmd

Comment from pnorman on 27 May 2022 at 05:10

No, the tile logs are from the CDN and are an accurate count of successful requests for tiles. Prior to 2021-04-13 the logs were only from the second layer of the old CDN.

The logs include successful requests on tiles where there were at least 10 requests, and the requests came from at least 3 distinct IPs. Most of them represent real views, but there are some artifacts.
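The inclusion rule described above can be sketched in Python as follows. This is only an illustration: the input (an iterable of `(tile, client_ip)` pairs for successful requests) and the function name `aggregate_tile_requests` are hypothetical, the output is assumed to match the `z/x/y count` line format, and the real logs are produced from the CDN's own data. One practical consequence for importance work is that a tile missing from a daily log does not necessarily have zero views; it may simply have fallen below one of the two thresholds.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Tile = Tuple[int, int, int]  # (zoom, x, y)

MIN_REQUESTS = 10      # "at least 10 requests", as stated above
MIN_DISTINCT_IPS = 3   # "at least 3 distinct IPs", as stated above


def aggregate_tile_requests(requests: Iterable[Tuple[Tile, str]]) -> List[str]:
    """Count successful requests per tile and keep only tiles that pass
    both thresholds, emitting hypothetical 'z/x/y count' lines."""
    counts: Dict[Tile, int] = defaultdict(int)
    ips: Dict[Tile, Set[str]] = defaultdict(set)
    for tile, ip in requests:
        counts[tile] += 1
        ips[tile].add(ip)

    lines = []
    for tile in sorted(counts):
        if counts[tile] >= MIN_REQUESTS and len(ips[tile]) >= MIN_DISTINCT_IPS:
            z, x, y = tile
            lines.append(f"{z}/{x}/{y} {counts[tile]}")
    return lines
```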
