OpenStreetMap

ChangesetMD

Posted by SomeoneElse on 6 August 2023 in English.

There’s lots of software around to work with OpenStreetMap data - querying it, creating maps from it, using it for navigation. One area that there has historically been less support is analysing OSM changeset data. As an example, here is one of mine.

However, many years ago Toby Murray wrote ChangesetMD. That can be used to maintain a database of changeset information (tags, discussions, etc.). Martijn van Exel updated that to a recent version of Python and I made a couple more changes to support very recent Python versions and made the replication more scriptable.

Downloading changeset data

Over on the planet.osm.org site, there’s a download for all the changeets generated at the same time as the weekly “planet” file is created, and also one for all the changeets and discussions as well. The changesets-only one is just over 6GB (bzipped), and the one with discussions as well is only about 250MB more, because most changesets don’t have discussions.

If you’re not interested in changeset data dating back to 2012 you don’t need to download this file; you can just start replicating from whatever point in the past you are intereted in starting from.

Installing the software

If you’re on Debian or Ubuntu, you’ll likely want to follow these instructions from a non-root user account of your choice (substitute that account for “youruseraccount” in the instructions).

Running it for the first time

From the virtual account created by the installation, which you can reenter by running

cd /home/${youruseraccount}/src/ChangesetMD
source .venv/bin/activate

Run something like this

python changesetmd.py -d changesets -c -g

The “-g” is optional and initialises the database to story geometric data. “changesets” in the line above is just the database name.

Loading a changeset dump file

To load the database with changesets and discussions:

python changesetmd.py -d changesets -g  -f ~/data/discussions-latest.osm.bz2

Adjust this to wherever you’ve downloaded discussions-latest.osm.bz2; if you are not interested in changeset discussions you can use changesets-latest.osm.bz2 instead.

Setting up replication

This section in the readme explains how to find out the date of the latest data you’ve imported from, and how to find out which replication file corresponds to that date.

Alternatively, if you haven’t loaded a dump file you might want to “just load all the changesets from the last year” - do do that find the state file in the replication directories that matches “about a year ago” and initialise the replication from there. To actually initialise the replication, do this:

cd /home/${youruseraccount}/src/ChangesetMD
source .venv/bin/activate
psql changesets
update osm_changeset_state set last_sequence = 5607928;

Again, “changesets” here is the database name. “5607928” is the replication number which will vary depending on where you want to start replication from. 5607928 corresponds to https://planet.osm.org/replication/changesets/005/607/928.state.txt , which is 18:43 UTC on 3rd August, 2023.

To run replication once, run this command

python changesetmd.py -d changesets -r -g

Again, “changesets” is just the database name and “-g” means “process geometry as well”. This will run though all pending replication files, which may take some time. You can monitor the process from psql by running “select * from osm_changeset_state;”. The “last_sequence” value is updated as it progresses but “last_timestamp” is only updated when replication is complete, so look at the former, not the latter.

When in progress the state table might look like:

changesets=> select * from osm_changeset_state ;
 last_sequence |       last_timestamp       | update_in_progress 
---------------+----------------------------+--------------------
       5607985 | 2023-08-02 00:59:46.096662 |                  1
(1 row)

when replication has completed, “update_in_progress” will be set to “0”.

Running replication from cron

A shall script “call_changesetmd_replication.sh” has been written to check that changeset replication is not already running, and if it is not to run it.

That is designed to be run from a cron job like this:

*/5 *   *   *   *    /usr/local/sbin/call_changesetmd_replication.sh >> /var/log/tiles/ChangesetMD.log

That runs every 5 minutes; the location /var/log/tiles/ was chosed for the log because if rendering database replication is already set up that’ll probably go there too.

Clearing replication flag at restart

A script “reset_changesetmd_replication.sh” has been written to do this. The version distributed in the script is written to sudo to the database owner from root, so that it can be run in a startup script somewhere.

What can it do

Counting changesets by a user

changesets=> select count(*) from osm_changeset where user_name = 'SomeoneElseSC';
 count 
-------
   2175
 (1 row)

Displaying details of changesets

changesets=> select id, num_changes, tags -> 'comment'  from osm_changeset where user_name = 'SomeoneElseSC' and created_at > '2023-08-01';
    id     | num_changes |                     ?column?                     
-----------+-------------+--------------------------------------------------
 139486735 |           2 | Determine roads lane count
 139486744 |           5 | Specify road surfaces
 139486795 |           2 | Specify whether public transport stops have bins
 139486799 |           2 | Add whether public transport stops are lit
 139486859 |           1 | Specify crossing types
 139486892 |           1 | Specify whether crosswalks have tactile paving
 139486897 |           1 | Specify cycle barrier types
 139486904 |           1 | Specify cycle barrier installation
(8 rows)

Changesets within an area

The “ST_MakeEnvelope” below is an area around “null island”:

changesets=> select id, user_name, tags -> 'comment' FROM osm_changeset c, (SELECT ST_SetSRID(ST_MakeEnvelope(-0.1,-0.1,0.1,0.1),4326) AS geom) s WHERE ST_CoveredBy(c.geom, s.geom) and created_at > '2023-06-01';
    id     |     user_name      |              ?column?               
-----------+--------------------+-------------------------------------
 137125242 | TV Sand in Taufers | 
 137125690 | TV Sand in Taufers | 
 137233374 | ivanbranco         | deleted null island drinking waters
(3 rows)

With different imagery sources

changesets=> select count(*) from osm_changeset where created_at > '2023-05-01 00:00:00' and created_at < '2023-06-01 00:00:00' and tags -> 'imagery_used' like '%Maxar%';
 count 
-------
 80407
(1 row)

changesets=> select count(*) from osm_changeset where created_at > '2023-05-01 00:00:00' and created_at < '2023-06-01 00:00:00' and tags -> 'imagery_used' like '%Bing%';
 count  
--------
 274372
(1 row)

With certain changeset comments

changesets=> select id, user_name, tags -> 'comment' from osm_changeset where tags -> 'comment' like '%muppetry%';
    id    |     user_name      |                                               ?column?                                                
----------+--------------------+-------------------------------------------------------------------------------------------------------
 27073588 | SomeoneElse        | Added a couple of hedges in the vicinity to try and prevent further armchair muppetry.
 67493564 | SomeoneElse_Revert | Revert more muppetry in USA.  See https://www.openstreetmap.org/user_blocks/2545 and previous blocks.
 45959206 | SomeoneElse        | Undoing more road merging muppetry
(3 rows)

(oops)

Changeset comments

changesets=> select * from osm_changeset_comment where comment_changeset_id = 126313717;
 comment_changeset_id | comment_user_id | comment_user_name |    comment_date     |                                                      comment_text                                                      
----------------------+-----------------+-------------------+---------------------+------------------------------------------------------------------------------------------------------------------------
            126313717 |            5359 | user_5359         | 2022-09-22 19:30:10 | Hello! Please have a look on http://overpass-turbo.eu/s/1m76. What is the meaning of the tag do = View from path;Bing?
            126313717 |           61942 | SomeoneElse       | 2022-09-22 19:39:35 | Thanks - just a typo, now fixed.
(2 rows)
Location: Charlesworth, Higher Chisworth, High Peak, Derbyshire, England, United Kingdom

Discussion

Log in to leave a comment