peter-evans / nominatim-k8s

Nominatim for Kubernetes on Google Container Engine (GKE).

Home Page: https://hub.docker.com/r/peterevans/nominatim-k8s/

License: MIT License

Languages: Shell 52.23%, PHP 2.61%, Dockerfile 45.16%
Topics: google-storage, kubernetes, nominatim, gke, docker-image, pbf, canary-deployment, google-cloud

nominatim-k8s's Introduction

Nominatim for Kubernetes

Nominatim for Kubernetes on Google Container Engine (GKE).

This Docker image and sample Kubernetes configuration files are one solution to persisting Nominatim data and providing immutable deployments.

Usage

The Docker image can be run standalone without Kubernetes:

docker run -d -p 8080:8080 \
-e NOMINATIM_PBF_URL='http://download.geofabrik.de/asia/maldives-latest.osm.pbf' \
--name nominatim peterevans/nominatim-k8s:latest

Tail the logs to verify the database has been built and Apache is serving requests:

docker logs -f <CONTAINER ID>

Then point your web browser to http://localhost:8080/
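Even a small extract takes a while to import, so the wait can be scripted by polling until Apache answers. This is a minimal sketch: the helper names `wait_until` and `nominatim_ready` are hypothetical, and the `/status.php` endpoint is assumed from the probe paths in the sample manifest in the issues below; adjust it if your image serves something else.

```shell
#!/usr/bin/env bash

# Run a command repeatedly until it succeeds or the attempt budget is spent.
# Usage: wait_until <attempts> <interval-seconds> <command> [args...]
wait_until() {
  local attempts=$1 interval=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
  done
  return 1
}

# Hypothetical readiness check: succeeds once Apache returns a 2xx response.
nominatim_ready() {
  curl -fsS -o /dev/null "http://localhost:8080/status.php"
}
```

For example, `wait_until 360 10 nominatim_ready && echo "Nominatim is up"` keeps polling for roughly an hour before giving up.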

Kubernetes Deployment

Nominatim's data import from the PBF file into PostgreSQL can take over an hour for a single country. If a pod in a deployment fails, waiting over an hour for a new pod to start could lead to loss of service.

The sample Kubernetes files provide a means of persisting a single database archive in storage, from which every pod in the deployment restores its own copy. Giving each pod its own database is desirable because it avoids a single point of failure. The alternative to this solution is to maintain a highly available (HA) PostgreSQL cluster.

PostgreSQL's data directory is archived in storage and restored on new pods. While this may be a crude method of copying the database, it is much faster than pg_dump/pg_restore and reduces pod startup time.
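The archive-and-restore idea can be sketched with standard tar and split. This is a minimal sketch, not the image's actual entrypoint: the function names and the default 1G part size are assumptions, while the `<label>.tgz_aa`, `<label>.tgz_ab`, ... naming mirrors the file names that appear in the pod logs further down. PostgreSQL must be stopped before archiving so that the files on disk are consistent.

```shell
#!/usr/bin/env bash

# Archive a PostgreSQL data directory into parts named <label>.tgz_aa, _ab, ...
# Usage: archive_pg_data <data-dir> <out-dir> <label> [part-size]
archive_pg_data() {
  local data_dir=$1 out_dir=$2 label=$3 part_size=${4:-1G}
  mkdir -p "$out_dir" || return 1
  # tar stores absolute paths without the leading "/" (and prints a notice).
  tar czf - "$data_dir" | split -b "$part_size" - "$out_dir/$label.tgz_"
  # Fail if either side of the pipeline failed.
  local -a status=("${PIPESTATUS[@]}")
  [ "${status[0]}" -eq 0 ] && [ "${status[1]}" -eq 0 ]
}

# Reassemble the parts in glob (alphabetical) order and unpack them under root.
# Usage: restore_pg_data <in-dir> <label> [root]
restore_pg_data() {
  local in_dir=$1 label=$2 root=${3:-/}
  cat "$in_dir/$label".tgz_* | tar xzf - -C "$root"
}
```

Because the parts sort alphabetically (`_aa`, `_ab`, ...), a plain glob concatenates them in the order split wrote them.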

Explanation

Initial deployment flow:

  1. Create a secret that contains the JSON key of a Google Cloud IAM service account that has read/write permissions to Google Storage.
  2. Deploy the canary deployment.
  3. Wait for the database to be created and its archive uploaded to Google Storage.
  4. Delete the canary deployment.
  5. Deploy the stable track deployment.

To update the live deployment with new PBF data:

  1. Deploy the canary deployment alongside the stable track deployment.
  2. Wait for the database to be created and its archive uploaded to Google Storage.
  3. Delete the canary deployment.
  4. Perform a rolling update on the stable track deployment to create pods using the new database.
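The update steps above could be scripted roughly as follows. This is a minimal sketch under several assumptions: the names are hypothetical (a canary manifest `nominatim-canary.yaml` that creates a deployment `nominatim-canary` and already carries the new `NOMINATIM_DATA_LABEL`, and a stable deployment `nominatim`), and the canary is assumed to have a readiness probe that only passes once Apache is serving, which the pod logs below show happens after the archive upload. Changing `NOMINATIM_DATA_LABEL` on the stable deployment is one way to trigger the rolling update, since any change to the pod template starts a rollout. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash

# Hypothetical names; adjust them to match your manifests.
CANARY_MANIFEST=${CANARY_MANIFEST:-nominatim-canary.yaml}
CANARY_DEPLOYMENT=${CANARY_DEPLOYMENT:-nominatim-canary}
STABLE_DEPLOYMENT=${STABLE_DEPLOYMENT:-nominatim}

# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build new data on a canary, then roll the stable deployment onto it.
# Usage: update_nominatim_data <new-data-label>
update_nominatim_data() {
  local label=$1
  run kubectl apply -f "$CANARY_MANIFEST" || return 1
  # Assumes the canary only becomes Available after its archive is uploaded.
  run kubectl wait --for=condition=Available --timeout=24h \
    "deployment/$CANARY_DEPLOYMENT" || return 1
  run kubectl delete -f "$CANARY_MANIFEST" || return 1
  # Any pod template change (here an env var) triggers a rolling update.
  run kubectl set env "deployment/$STABLE_DEPLOYMENT" \
    "NOMINATIM_DATA_LABEL=$label" || return 1
  run kubectl rollout status "deployment/$STABLE_DEPLOYMENT"
}
```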

Creating the secret

# Google Cloud project ID and service account details
PROJECT_ID=my-project
SA_NAME=my-service-account
SA_DISPLAY_NAME="My Service Account"
SA_EMAIL=$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com
KEY_FILE=service-account-key.json

# Create a new GCP IAM service account
gcloud iam service-accounts create $SA_NAME --display-name "$SA_DISPLAY_NAME"

# Create and download a new key for the service account
gcloud iam service-accounts keys create $KEY_FILE --iam-account $SA_EMAIL

# Give the service account the "Storage Object Viewer" and "Storage Object Creator" IAM roles
gcloud projects add-iam-policy-binding $PROJECT_ID --member serviceAccount:$SA_EMAIL --role roles/storage.objectViewer
gcloud projects add-iam-policy-binding $PROJECT_ID --member serviceAccount:$SA_EMAIL --role roles/storage.objectCreator

# Create a secret containing the service account key file
kubectl create secret generic nominatim-storage-secret --from-file=$KEY_FILE

Deployment configuration

Before deploying, edit the env section of both the canary deployment and stable track deployment.

  • NOMINATIM_MODE - CREATE from PBF data, or RESTORE from Google Storage.
  • NOMINATIM_PBF_URL - URL to PBF data file. (Optional when NOMINATIM_MODE=RESTORE)
  • NOMINATIM_DATA_LABEL - A meaningful and unique label for the data. e.g. maldives-20161213
  • NOMINATIM_SA_KEY_PATH - Path to the JSON service account key. This needs to match the mountPath of the volume mounted secret.
  • NOMINATIM_PROJECT_ID - Google Cloud project ID.
  • NOMINATIM_GS_BUCKET - Google Storage bucket.
  • NOMINATIM_PG_THREADS - Number of threads available for PostgreSQL. Defaults to 2.
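Some of these values can be derived with small helpers. The helpers below are hypothetical (not part of the image) and assume Geofabrik-style `<region>-latest.osm.pbf` URLs. The default of 2 threads comes from the list above. The key path follows from how Kubernetes mounts a secret: each key becomes a file under the volume's `mountPath`, named after the file passed to `--from-file`.

```shell
#!/usr/bin/env bash

# Build a label such as "maldives-20161213" from a Geofabrik-style PBF URL.
# Usage: make_data_label <pbf-url> [yyyymmdd]   (date defaults to today)
make_data_label() {
  local url=$1 day=${2:-$(date +%Y%m%d)}
  local name=${url##*/}   # maldives-latest.osm.pbf
  name=${name%.osm.pbf}   # maldives-latest
  name=${name%-latest}    # maldives
  printf '%s-%s\n' "$name" "$day"
}

# Resolve NOMINATIM_PG_THREADS, falling back to the documented default of 2.
pg_threads() {
  local threads=${NOMINATIM_PG_THREADS:-2}
  case $threads in
    *[!0-9]* | 0*)
      echo "invalid NOMINATIM_PG_THREADS: $threads" >&2
      return 1
      ;;
  esac
  echo "$threads"
}

# Path of the key file inside the container, for NOMINATIM_SA_KEY_PATH.
# Usage: sa_key_path <mountPath> <key-file passed to --from-file>
sa_key_path() {
  local mount_path=${1%/} key_file=$2
  printf '%s/%s\n' "$mount_path" "${key_file##*/}"
}
```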

License

MIT License - see the LICENSE file for details

nominatim-k8s's People

Contributors

auro, joshuajackson-jobvite, peter-evans

nominatim-k8s's Issues

PostgreSQL Threads

Hi Peter!

I would like to contribute a feature to your code: the ability to specify the number of PostgreSQL threads as an environment variable.

I think it could be useful.

Could you give me permission to create a branch so you can evaluate the changes?

Thank you.

Auro

K8s container initialization design

Hi!
This configuration takes too long to start under k8s.
Maybe use the init container pattern?

The pod config would have an init container that downloads and unpacks the tar archive.
This would work better with Kubernetes health checking.

This requires small changes to the entrypoint and moving the copy logic into init.sh.

Example deployment config:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nominatim
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nominatim
  template:
    metadata:
      labels:
        app: nominatim
    spec:
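      initContainers:
      # Container names must be unique within a pod, so the init container
      # cannot reuse the name "nominatim".
      - name: nominatim-init
        image: peterevans/nominatim-k8s:latest
        command: ["init.sh"]
        # The restore step needs the same settings and credentials as the main container.
        envFrom:
        - configMapRef:
            name: nominatim-envs-config
        volumeMounts:
          - name: firebase-credentials-volume
            mountPath: /etc/credentials
            readOnly: true
          - name: pg-data
            mountPath: /var/lib/postgresql/9.5/main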
      containers:
      - name: nominatim
        image: peterevans/nominatim-k8s:latest
        resources:
          limits:
            memory: "500Mi"
            cpu: "0.2"
          requests:
            memory: "150Mi"
            cpu: "0.06"
        ports:
        - containerPort: 8080
          name: web
        livenessProbe:
          httpGet:
            path: /status.php
            port: web
          failureThreshold: 1
          periodSeconds: 10
        startupProbe:
          httpGet:
            path: /status.php
            port: web
          failureThreshold: 30
          periodSeconds: 10
          initialDelaySeconds: 250
        volumeMounts:
        - name: firebase-credentials-volume
          mountPath: /etc/credentials
          readOnly: true
        - name: pg-data
          mountPath: /var/lib/postgresql/9.5/main
        envFrom:
        - configMapRef:
            name: nominatim-envs-config
      volumes:
      - name: firebase-credentials-volume
        secret:
          secretName: firebase-credentials-secret
      - name: pg-data
        emptyDir: {}
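A hypothetical init.sh for this pattern might look like the sketch below. It is not part of the image. It assumes that the README's NOMINATIM_* variables reach the init container (for example through the same ConfigMap), that NOMINATIM_GS_BUCKET includes the gs:// prefix, and that the archive parts live under `<bucket>/<label>/`, as in the restore log in a later issue. The two gcloud calls produce the "Activated service account credentials" and "Updated property [core/project]" lines seen in those logs.

```shell
#!/usr/bin/env bash

# Report every listed variable that is unset or empty; return 1 if any are.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Download the archive parts and unpack them into the mounted data volume.
init_restore() {
  local work_dir=${NOMINATIM_DATA_PATH:-/srv/nominatim/data}
  require_env NOMINATIM_SA_KEY_PATH NOMINATIM_PROJECT_ID \
    NOMINATIM_GS_BUCKET NOMINATIM_DATA_LABEL || return 1
  mkdir -p "$work_dir" || return 1
  gcloud auth activate-service-account --key-file="$NOMINATIM_SA_KEY_PATH" || return 1
  gcloud config set project "$NOMINATIM_PROJECT_ID" || return 1
  gsutil cp "$NOMINATIM_GS_BUCKET/$NOMINATIM_DATA_LABEL/*.tgz*" "$work_dir/" || return 1
  # Parts are concatenated in alphabetical order (tgz_aa, tgz_ab, ...).
  cat "$work_dir/$NOMINATIM_DATA_LABEL".tgz_* | tar xzf - -C / || return 1
  rm -f "$work_dir/$NOMINATIM_DATA_LABEL".tgz_*
}
```

In init.sh itself, the last line would simply call `init_restore`, so that a non-zero exit keeps the pod from starting with an empty data volume.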

README instructions create service account that does not work

Great project. I find this extremely valuable.

One problem I have run into, however, is that the README instructions for creating a service account to dump the data to the GCS bucket do not work.

I have repeatedly tried this, and each time it creates a service account that does not have access to the bucket I have created. Whether this account is used in k8s or separately by me testing on the command line, I get these errors: service account does not have storage.objects.list access to (bucket name)

I've verified in the GCS console that the permissions should work, but they don't. Any ideas on what I could be doing wrong here?

Canary deployment not able to process bigger OSM files

I am trying to deploy Nominatim for Europe, starting with just Germany's OSM file, but after roughly 48 hours the import is still running.
For the Maldives file (the one given in the example) it works fine, but as soon as I move to bigger files it takes far too long. Below is my canary deployment file (only the spec.template.spec portion):

spec:
      volumes:
        - name: nominatim-secret-volume
          secret:
            secretName: osm-storage
      containers:
        - name: nominatim-k8s
          image: peterevans/nominatim-k8s:2.5.4
          env:
            - name: NOMINATIM_MODE
              value: CREATE
            - name: NOMINATIM_PBF_URL
              value: "https://download.geofabrik.de/europe/germany-latest.osm.pbf"
            - name: NOMINATIM_DATA_LABEL
              value: germany-2020
            - name: NOMINATIM_SA_KEY_PATH
              value: "/etc/nominatim-secret-volume/KeyFile.json"
            - name: NOMINATIM_PROJECT_ID
              value: my-project-id
            - name: NOMINATIM_GS_BUCKET
              value: "gs://my-bucket-path"
          volumeMounts:
            - name: nominatim-secret-volume
              readOnly: true
              mountPath: /etc/nominatim-secret-volume
          ports:
            - containerPort: 8080
          resources:
            limits:
              memory: 26Gi
            requests:
              memory: 24Gi
          readinessProbe:
            httpGet:
              path: /search
              port: 8080
            initialDelaySeconds: 30
            timeoutSeconds: 1

The Maldives file is just 3.1 MB, but the Germany file is 3.2 GB. Does a 3.2 GB file need more resources than I have allocated?

No gzip file created by nominatim-k8s canary deployment

I have been trying to run the nominatim-k8s deployment, but the stable deployment keeps failing the health check. After modifying the deployment YAML, I found the following in the pod's logs:

Activated service account credentials for: [[email protected]]
Updated property [core/project].
CommandException: No URLs matched: gs://MY-GCP-PROJECT-nominatim/maldives-20161213/*.tgz*
CommandException: 1 file/object could not be transferred.
cat: '/srv/nominatim/data/maldives-20161213.tgz_*': No such file or directory

I then checked the canary deployment, and it seems to work without any issues:

2020-10-22 10:38:38 == Setup finished.
 * Stopping PostgreSQL 9.5 database server
   ...done.
tar: Removing leading `/' from member names
Activated service account credentials for: [[email protected]]
Updated property [core/project].
Copying file:///srv/nominatim/data/maldives-20161213.tgz_aa [Content-Type=application/octet-stream]...
- [1/1 files][136.2 MiB/136.2 MiB] 100% Done
Operation completed over 1 objects/136.2 MiB.
 * Starting PostgreSQL 9.5 database server
   ...done.
==> /var/log/apache2/access.log <==

I checked Cloud Storage and a file was present, but without any file extension: gs://MY-GCP-PROJECT-nominatim/maldives-20161213. It is missing the .tgz_ suffix that the stable deployment expects.
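A likely cause, offered as a hypothesis rather than a confirmed diagnosis: when `gsutil cp` copies a single file to a destination URL that does not end in `/` and does not already exist as a folder, it names the object after the destination itself. The canary log shows that only one part (`maldives-20161213.tgz_aa`) was uploaded, which would produce exactly the extension-less `gs://MY-GCP-PROJECT-nominatim/maldives-20161213` object. Below is a minimal sketch of an upload that always targets a folder. The function names are hypothetical, and the bucket URL is assumed to include the gs:// prefix.

```shell
#!/usr/bin/env bash

# Build "<bucket>/<label>/" with exactly one trailing slash, so gsutil treats
# the destination as a folder even when only one file is copied.
gs_folder_url() {
  local bucket=${1%/} label=$2
  printf '%s/%s/\n' "$bucket" "$label"
}

# Upload every archive part for a label into its own folder.
# Usage: upload_parts <data-dir> <bucket-url> <label>
upload_parts() {
  local data_dir=$1 bucket=$2 label=$3
  gsutil cp "$data_dir/$label".tgz_* "$(gs_folder_url "$bucket" "$label")"
}
```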

Remote postgres?

I think Nominatim added support for this recently. The issue I'm having is that loading the OSM planet file into PostgreSQL with the canary takes forever (I realized the nodes need lots of memory and disk space). I would then need pod anti-affinity in Kubernetes to limit each node to basically one container. Are you planning to add support for connecting to a remote PostgreSQL database at some point?

Reverse api

Does this server offer the reverse API as well?

Wrong postgresql version

Hey, on the Kubernetes deployment the canary starts fine, but when it tries to do the copy it fails. Reading a little about the error, it shows:

tar: Removing leading `/' from member names
tar: /var/lib/postgresql/9.3/main: Cannot stat: No such file or directory

But the image uses 9.5, so looking at the code in docker-entrypoint.sh, line 12 needs a fix to point to the right version.
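Rather than hard-coding 9.3 or 9.5, an entrypoint could look up the installed version. This is a minimal sketch. It assumes the Debian/Ubuntu layout `/var/lib/postgresql/<version>/main` that both paths in this thread follow. The function name is hypothetical, it picks the highest version if several clusters are present, and `sort -V` requires GNU coreutils.

```shell
#!/usr/bin/env bash

# Print the data directory of the newest PostgreSQL cluster under base,
# e.g. /var/lib/postgresql/9.5/main. Returns 1 if none is found.
find_pg_data_dir() {
  local base=${1:-/var/lib/postgresql}
  local dir latest
  local -a versions=()
  for dir in "$base"/*/main; do
    [ -d "$dir" ] && versions+=("$(basename "$(dirname "$dir")")")
  done
  [ "${#versions[@]}" -gt 0 ] || return 1
  # Version sort so that 10 ranks above 9.5.
  latest=$(printf '%s\n' "${versions[@]}" | sort -V | tail -n 1)
  printf '%s/%s/main\n' "$base" "$latest"
}
```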

Persist database in docker image

Hey there

What would be the drawbacks of processing the desired PBF file and loading it into PostgreSQL during the image build?
Then docker-entrypoint.sh would not need to distinguish between a "CREATE" and "RESTORE" mode; it would just start PostgreSQL. The original approach of restoring data on GKE would become unnecessary.

Obviously the image size could be enormous.

I tried the approach successfully with https://download.geofabrik.de/europe/liechtenstein-latest.osm.pbf (2.2MB).
docker-entrypoint.sh was reduced to basically:

#!/bin/bash
# Start PostgreSQL
service postgresql start

# Tail Apache logs
# tail -f /var/log/apache2/* &

# Run Apache in the foreground
/usr/sbin/apache2ctl -D FOREGROUND

The Dockerfile was extended along these lines:

# Download the PBF file and import it into the database
ARG NOMINATIM_PBF_URL

# Each "check || create" pair is wrapped in { ...; } so that a failure earlier in the
# && chain stops the build immediately instead of falling through to createuser.
RUN NOMINATIM_DATA_PATH=${NOMINATIM_DATA_PATH:="/srv/nominatim/data"} \
    && NOMINATIM_DATA_LABEL=${NOMINATIM_DATA_LABEL:="data"} \
    && NOMINATIM_PBF_URL=${NOMINATIM_PBF_URL:="http://download.geofabrik.de/europe/switzerland-latest.osm.pbf"} \
    && NOMINATIM_POSTGRESQL_DATA_PATH=${NOMINATIM_POSTGRESQL_DATA_PATH:="/var/lib/postgresql/9.3/main"} \
    && curl -fL "$NOMINATIM_PBF_URL" --create-dirs -o "$NOMINATIM_DATA_PATH/$NOMINATIM_DATA_LABEL.osm.pbf" \
    && chmod 755 "$NOMINATIM_DATA_PATH" \
    && service postgresql start \
    && { sudo -u postgres psql postgres -tAc "SELECT 1 FROM pg_roles WHERE rolname='nominatim'" | grep -q 1 || sudo -u postgres createuser -s nominatim; } \
    && { sudo -u postgres psql postgres -tAc "SELECT 1 FROM pg_roles WHERE rolname='www-data'" | grep -q 1 || sudo -u postgres createuser -SDR www-data; } \
    && sudo -u postgres psql postgres -c "DROP DATABASE IF EXISTS nominatim" \
    && useradd -m -p password1234 nominatim \
    && sudo -u nominatim /srv/nominatim/build/utils/setup.php --osm-file "$NOMINATIM_DATA_PATH/$NOMINATIM_DATA_LABEL.osm.pbf" --all --threads 2

For a second approach I used https://download.geofabrik.de/europe/switzerland-latest.osm.pbf (295MB).
The image build took around 90 minutes, but I ran into the following problem: moby/moby#22610.
It seems the host's disk filled up during container startup until Kubernetes killed the pod. I will provide more information about the cause in the next few days. It is not caused by data inside the container. Maybe, as that issue points out, it is accumulated container logs.

Thank you for your thoughts.
