Cogito ergo sum

How to create your own reverse geocoding API using OpenStreetMap, Docker, Osmium and Nominatim

During my master’s at the University of Twente, for the course Managing Big Data, I worked with three other students on a very interesting big data project. We measured changes in global internet speed and latency between 2019 and 2022 (the COVID-19 period). The dataset we used for the project was “The Internet Speed Dataset” by Ookla, which consists of internet connection measurements for mobile and fixed connections.

The dataset consisted of multiple fields. The tile field was the most important one, since it specifies where (geographically) a speed test was performed. It describes a tile at zoom level 16, which is approximately 610.8 × 610.8 meters at the equator (18-arcsecond blocks). The coordinates are in WGS 84 (EPSG:4326) and are encoded as Well-Known Text (WKT). WKT supports several geometry types, such as Point, LineString, Polygon, and MultiPolygon; in our case, the field contained only Polygon objects. Each Polygon is a closed ring of five points, written as longitude latitude pairs, where the last point repeats the first. For example, the following Polygon, whose points run clockwise from the tile’s north-west corner, corresponds to a location in Nanchang County, Jiangxi, China:
POLYGON((116.098022460938 28.6713109158808, 116.103515625 28.6713109158808, 116.103515625 28.6664911769866, 116.098022460938 28.6664911769866, 116.098022460938 28.6713109158808)).
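Before a tile can be reverse geocoded, it has to be reduced to a single point. The sketch below is a minimal Python illustration, not our project’s code: it assumes every tile is a single-ring POLYGON like the one above, and the function names are hypothetical. It parses the WKT, uses the shoelace formula to check the ring’s orientation, takes the bounding-box center as the query point, and, assuming the tiles follow the standard Web Mercator (slippy map) grid, confirms that the polygon is a zoom-16 tile.

```python
import math
import re

# The example tile from above: (lon lat) pairs in WGS 84.
TILE_WKT = (
    "POLYGON((116.098022460938 28.6713109158808, 116.103515625 28.6713109158808, "
    "116.103515625 28.6664911769866, 116.098022460938 28.6664911769866, "
    "116.098022460938 28.6713109158808))"
)


def parse_polygon(wkt: str) -> list[tuple[float, float]]:
    """Return the outer ring of a single-ring WKT POLYGON as (lon, lat) tuples."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(([^()]+)\)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"expected a single-ring POLYGON, got {wkt!r}")
    ring = []
    for pair in match.group(1).split(","):
        lon, lat = pair.split()
        ring.append((float(lon), float(lat)))
    return ring


def signed_area(ring: list[tuple[float, float]]) -> float:
    """Shoelace formula (lon = x, lat = y): positive = counterclockwise, negative = clockwise."""
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))


def center(ring: list[tuple[float, float]]) -> tuple[float, float]:
    """Bounding-box center, returned as (lat, lon) to match Nominatim's lat/lon parameters."""
    lons = [lon for lon, _ in ring]
    lats = [lat for _, lat in ring]
    return (min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Web Mercator (slippy map) tile indices of the tile containing the point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y


def tile_bounds(x: int, y: int, zoom: int) -> tuple[float, float, float, float]:
    """(west, south, east, north) of a Web Mercator tile, in degrees."""
    n = 2 ** zoom

    def lat_edge(row: int) -> float:
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return x / n * 360.0 - 180.0, lat_edge(y + 1), (x + 1) / n * 360.0 - 180.0, lat_edge(y)


ring = parse_polygon(TILE_WKT)
lat, lon = center(ring)
x, y = lonlat_to_tile(lon, lat, zoom=16)
print("clockwise" if signed_area(ring) < 0 else "counterclockwise")
print(f"query point: lat={lat:.6f}, lon={lon:.6f}; zoom-16 tile x={x}, y={y}")
print("tile bounds:", tile_bounds(x, y, 16))
```

If the recomputed tile bounds match the polygon’s corners, the center point is a safe input for the reverse endpoint, since it lies well inside the tile.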

As you may have guessed, we had to find a way to convert those Polygon objects into human-understandable data: cities, countries, and ordinary addresses. I won’t go into the implementation details of the project, but one of the approaches we initially experimented with was setting up our own reverse geocoding API (RGA). Running it locally was the cheapest option: although there is a wide variety of online RGAs, they become expensive for large volumes of data, which was the case in our project. We had millions of rows of WKT data that needed to be converted into a human-readable format.
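Since cost grows with the number of requests, it pays to geocode each distinct tile only once; the same tile can plausibly appear in many rows, for example for different time periods or for both mobile and fixed connections. A minimal sketch, where reverse_geocode is a hypothetical function you supply (for instance one that sends the tile’s center point to your own API) and the WKT strings in the usage example are dummy placeholders:

```python
from collections.abc import Callable, Iterable


def geocode_rows(
    tiles: Iterable[str],
    reverse_geocode: Callable[[str], dict],
) -> list[dict]:
    """Return one result per row, calling reverse_geocode once per distinct tile."""
    cache: dict[str, dict] = {}
    results = []
    for wkt in tiles:
        if wkt not in cache:  # only unseen tiles cost an API call
            cache[wkt] = reverse_geocode(wkt)
        results.append(cache[wkt])
    return results


# Usage with a stand-in geocoder that only records how often it is called.
calls: list[str] = []


def fake_geocoder(wkt: str) -> dict:
    calls.append(wkt)
    return {"tile": wkt}


rows = ["POLYGON((a))", "POLYGON((b))", "POLYGON((a))", "POLYGON((a))"]
demo = geocode_rows(rows, fake_geocoder)
print(f"{len(rows)} rows, {len(calls)} geocoder calls")
```

A plain dict is used as the cache so nothing is evicted; with millions of rows you may prefer to deduplicate the tiles in your data-processing framework first and join the results back afterwards.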

I wasn’t aware that you can set up your own RGA using OpenStreetMap (OSM). Once I found out and had tinkered with it myself, I thought it would be worth writing about. In this post, I will explain step by step how to create your own RGA using OSM, Docker, Osmium, and Nominatim. Let me first briefly describe what each tool is or does.

  1. OpenStreetMap: a project that creates and distributes free geographic data for the world.
  2. Docker: a software platform that allows you to build, test, and deploy applications quickly. Docker packages software into standardized units called containers, which bundle everything the software needs to run: libraries, system tools, code, and runtime. With Docker, you can quickly deploy and scale applications in any environment and be confident that your code will run.
  3. Osmium: a fast and flexible C++ library for working with OpenStreetMap data. Osmium Tool is a command-line tool based on the Osmium library for working with OSM files and map extracts; among other things, it can merge multiple OSM map extracts into one file. In this tutorial, I won’t use it, because our use case is a single, small map extract, so there is nothing to merge.
  4. Nominatim: a tool to search OSM data by name and address (geocoding) and to generate synthetic addresses of OSM points (reverse geocoding).
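We don’t need Osmium for Iceland, but the Benelux case mentioned below is easy to script. A sketch, assuming Osmium Tool is installed and on your PATH and that Geofabrik names the files <country>-latest.osm.pbf under /europe (the same pattern as the Iceland URL used later in this post); the helper names are hypothetical and the country slugs are assumptions:

```python
import subprocess
import urllib.request
from pathlib import Path

GEOFABRIK_EUROPE = "https://download.geofabrik.de/europe"


def extract_url(country: str) -> str:
    """Geofabrik download URL, assuming the <country>-latest.osm.pbf naming pattern."""
    return f"{GEOFABRIK_EUROPE}/{country}-latest.osm.pbf"


def merge_command(inputs: list[Path], output: Path) -> list[str]:
    """Command line for Osmium Tool's merge subcommand."""
    return ["osmium", "merge", *(str(p) for p in inputs), "-o", str(output)]


def build_merged_extract(countries: list[str], output: Path) -> None:
    """Download each country's extract, then merge them (needs network access and osmium)."""
    files = []
    for country in countries:
        target = Path(f"{country}-latest.osm.pbf")
        urllib.request.urlretrieve(extract_url(country), target)
        files.append(target)
    subprocess.run(merge_command(files, output), check=True)


# For the Benelux example (not needed for Iceland):
# build_merged_extract(["netherlands", "belgium", "luxembourg"], Path("benelux.osm.pbf"))
```

The final call is left commented out because it downloads large files; the merged .osm.pbf can then be mounted and passed to the container through PBF_PATH, just like the Iceland file.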

For this tutorial, I’m using Ubuntu 22.04, so all commands are run from the Ubuntu Linux command line. I use Iceland in this tutorial because its map extract from Geofabrik is relatively small (55 MB) and doesn’t take long to build. The bigger the file (that is, the larger the geographical area your RGA covers), the longer the build takes.

Setting up the reverse geocoding API (RGA)

  1. Download the OSM data extract from the Geofabrik website. You can download either a whole continent, like Europe, or an individual country. You can also download several individual countries and merge them into one file, which is useful in specific cases, for example if you want to set up an API for the Benelux, the group of three countries made up of the Netherlands, Belgium, and Luxembourg. Here we download the OSM data extract of Iceland (iceland-latest.osm.pbf) from https://download.geofabrik.de/europe/iceland.html. Nominatim and Docker need this file to build the API.
    You can set up the API using either docker run or docker-compose up.
  2. Using docker run:
    docker run -it --rm --shm-size=80g \
      -e PBF_PATH=/nominatim/flatnode/iceland-latest.osm.pbf \
      -e THREADS=48 \
      -e IMPORT_STYLE=address \
      -e POSTGRES_SHARED_BUFFERS=4GB \
      -e POSTGRES_WORK_MEM=1GB \
      -e POSTGRES_EFFECTIVE_CACHE_SIZE=20GB \
      -e POSTGRES_MAX_WAL_SIZE=2GB \
      -e POSTGRES_CHECKPOINT_TIMEOUT=50min \
      -p 8080:8080 \
      -v ~/Downloads:/nominatim/flatnode \
      --name nominatim \
      mediagis/nominatim:4.0

    The environment variables used to run the Docker container are all listed on the GitHub page of nominatim-docker: https://github.com/mediagis/nominatim-docker/tree/master/4.0#docker-compose. Note that IMPORT_STYLE (set to address in the command above) has a huge impact on the time needed to set up the API. If you choose full, the build takes ages and you need a powerful machine to host the API. The Nominatim documentation explains IMPORT_STYLE and its impact well. For example, building an API for the whole planet (IMPORT_STYLE=full) from OSM data on a machine with 64 GB RAM and 4 CPUs would take 54 hours and around 700 GB of disk space. You can read more about it here: https://nominatim.org/release-docs/latest/admin/Import/#filtering-imported-data.
    Running the command above took barely 5 minutes. For a country like Belgium, the same build could easily take 90 minutes. As I said earlier, the bigger the file and the more data the chosen IMPORT_STYLE imports, the longer the build takes.

  3. Using docker-compose up with a docker-compose.yml file:
    version: "3"
    
    services:
        nominatim:
            container_name: nominatim
            image: mediagis/nominatim:4.0
            restart: always
            ports:
                - "8080:8080"
            environment:
                # see https://github.com/mediagis/nominatim-docker/tree/master/4.0#configuration for more options
                PBF_URL: https://download.geofabrik.de/europe/iceland-latest.osm.pbf
                REPLICATION_URL: https://download.geofabrik.de/europe/iceland-updates/
                NOMINATIM_PASSWORD: very_secure_password
            volumes:
                - nominatim-data:/var/lib/postgresql/12/main
            shm_size: 1gb
    
    volumes:
        nominatim-data:
            driver: local
            driver_opts:
                type: 'none'
                o: "bind"
                device: "/home/peshmerge/osm-maps/files/"
    

    In this case, you don’t have to download the map extract file yourself, because docker-compose up downloads it automatically from the PBF_URL.
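Because the import runs inside the container and can take from minutes to hours, a script that sends queries should wait until the server is ready. A sketch, assuming the API is reachable at http://localhost:8080 as configured above and that your Nominatim version exposes a /status endpoint which, with format=json, reports status 0 when everything is OK; the function names are hypothetical and fetch can be swapped out for testing:

```python
import json
import time
import urllib.request
from collections.abc import Callable


def http_get(url: str) -> str:
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def is_ready(body: str) -> bool:
    """True if a /status?format=json body reports status 0 (OK)."""
    try:
        return json.loads(body).get("status") == 0
    except (ValueError, AttributeError):  # not JSON, or not a JSON object
        return False


def wait_until_ready(
    base_url: str = "http://localhost:8080",
    timeout: float = 3600.0,
    interval: float = 30.0,
    fetch: Callable[[str], str] = http_get,
) -> bool:
    """Poll the status endpoint until it reports OK or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if is_ready(fetch(f"{base_url}/status?format=json")):
                return True
        except OSError:  # connection refused, HTTP errors, etc. while still importing
            pass
        time.sleep(interval)
    return False


# Usage: if wait_until_ready(): start sending /search and /reverse queries.
```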

Testing the API

Using Postman or even a web browser, you can query information about specific locations in Iceland (since our API is built for Iceland only). As a test, we will use Smáralind, which appears to be a shopping mall.

Using Postman, you can query the following URL: http://localhost:8080/search?street=1 Smáralind. Notice that we put the house number first and then the street name. For more information about the endpoints, see https://nominatim.org/release-docs/latest/api/Search/.
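The URL contains a space and a non-ASCII character (á). A browser will usually percent-encode them for you, but code that calls the API must do it explicitly. A minimal sketch with Python’s standard urllib.parse.urlencode; the helper name is hypothetical, and the optional format=jsonv2 parameter is an assumption based on the output formats described in the Nominatim search documentation:

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080"


def search_url(street: str, base_url: str = BASE_URL, **params: str) -> str:
    """Build a structured /search URL with the query string percent-encoded."""
    return f"{base_url}/search?{urlencode({'street': street, **params})}"


print(search_url("1 Smáralind"))
# http://localhost:8080/search?street=1+Sm%C3%A1ralind
print(search_url("1 Smáralind", format="jsonv2"))
# http://localhost:8080/search?street=1+Sm%C3%A1ralind&format=jsonv2
```

urlencode encodes the text as UTF-8 and turns the space into +, which is what a web server expects in a query string.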


GET: http://localhost:8080/search?street=1 Smáralind
[
    {
        "place_id": 143365,
        "licence": "Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright",
        "osm_type": "way",
        "osm_id": 45705386,
        "boundingbox": [
            "64.1004213",
            "64.1016091",
            "-21.886456",
            "-21.8805034"
        ],
        "lat": "64.10098825",
        "lon": "-21.883596409321832",
        "display_name": "Smáralind, 1, Hagasmári, Smárinn, Kópavogsbær, Höfuðborgarsvæðið, 201, Ísland",
        "place_rank": 30,
        "category": "shop",
        "type": "mall",
        "importance": 0.22010000000000002
    }
]

If we take the lat and lon values from this response and pass them to the /reverse endpoint, we should get the same information back.
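The same round trip can be scripted. A sketch that reads a trimmed copy of the search response above, takes the first result, and builds the reverse URL; the function name is hypothetical, and the lat and lon strings are passed through unchanged so no precision is lost:

```python
import json
from urllib.parse import urlencode

# Trimmed copy of the /search response shown above.
SEARCH_RESPONSE = """
[{"place_id": 143365, "osm_type": "way", "osm_id": 45705386,
  "lat": "64.10098825", "lon": "-21.883596409321832",
  "display_name": "Smáralind, 1, Hagasmári, Smárinn, Kópavogsbær, Höfuðborgarsvæðið, 201, Ísland"}]
"""


def reverse_url_from_search(body: str, base_url: str = "http://localhost:8080") -> str:
    """Build a /reverse URL from the first result of a /search response."""
    results = json.loads(body)
    if not results:
        raise LookupError("search returned no results")
    best = results[0]
    # Nominatim returns coordinates as strings; reuse them as-is.
    query = urlencode({"lat": best["lat"], "lon": best["lon"], "format": "json"})
    return f"{base_url}/reverse?{query}"


print(reverse_url_from_search(SEARCH_RESPONSE))
# http://localhost:8080/reverse?lat=64.10098825&lon=-21.883596409321832&format=json
```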


GET: http://localhost:8080/reverse?lat=64.10098825&lon=-21.883596409321832&format=json
{
    "place_id": 143365,
    "licence": "Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright",
    "osm_type": "way",
    "osm_id": 45705386,
    "lat": "64.10098825",
    "lon": "-21.883596409321832",
    "display_name": "Smáralind, 1, Hagasmári, Smárinn, Kópavogsbær, Höfuðborgarsvæðið, 201, Ísland",
    "address": {
        "shop": "Smáralind",
        "house_number": "1",
        "road": "Hagasmári",
        "suburb": "Smárinn",
        "town": "Kópavogsbær",
        "state_district": "Höfuðborgarsvæðið",
        "ISO3166-2-lvl5": "IS-1",
        "postcode": "201",
        "country": "Ísland",
        "country_code": "is"
    },
    "boundingbox": [
        "64.1004213",
        "64.1016091",
        "-21.886456",
        "-21.8805034"
    ]
}

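As you can see, the response is exactly what we expected: the same place (place_id 143365, osm_id 45705386) that the search returned, now with a structured address.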
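For our project, what mattered was the city and the country rather than the full street address. A sketch of extracting them from a trimmed copy of the reverse response above, assuming the settlement name can appear under any of the city, town, village, or hamlet keys (here it is town); the key list and function name are illustrative choices, not a complete list of Nominatim’s address keys:

```python
import json

# Trimmed copy of the /reverse response shown above.
REVERSE_RESPONSE = """{"address": {"shop": "Smáralind", "house_number": "1", "road": "Hagasmári",
  "suburb": "Smárinn", "town": "Kópavogsbær", "state_district": "Höfuðborgarsvæðið",
  "postcode": "201", "country": "Ísland", "country_code": "is"}}"""

# Assumed order of preference for the settlement name.
SETTLEMENT_KEYS = ("city", "town", "village", "hamlet")


def summarize(body: str) -> dict:
    """Reduce a reverse-geocoding response to settlement and country."""
    address = json.loads(body).get("address", {})
    settlement = next((address[key] for key in SETTLEMENT_KEYS if key in address), None)
    return {
        "settlement": settlement,
        "country": address.get("country"),
        "country_code": address.get("country_code"),
    }


print(summarize(REVERSE_RESPONSE))
```

Note that the country name is returned in the local language (Ísland); the Nominatim API’s accept-language parameter can be used to request another language.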

Don’t forget to leave a comment if you have any questions regarding this tutorial! Happy reverse geocoding 🙂

 

About the author

Peshmerge Morad

Data Science student and a software engineer whose interests span multiple fields.
