hashicorp / nomad-driver-podman

A nomad task driver plugin for sandboxing workloads in podman containers

Home Page: https://developer.hashicorp.com/nomad/plugins/drivers/podman

License: Mozilla Public License 2.0

podman nomad-podman-driver nomad containers dockerless sandbox

nomad-driver-podman's Introduction

Nomad podman Driver

Many thanks to @towe75 and Pascom for contributing this plugin to Nomad!

Features

  • Use the job's driver config to define the image for your container
  • Start/stop containers with a default or custom entrypoint and arguments
  • Nomad runtime environment is populated
  • Use Nomad alloc data in the container.
  • Bind mount custom volumes into the container
  • Publish ports
  • Monitor the memory consumption
  • Monitor CPU usage
  • Task config cpu value is used to populate podman CpuShares
  • Task config cores value is used to populate podman Cpuset
  • Container log is forwarded to Nomad logger
  • Utilize Podman's --init feature
  • Set username or UID used for the specified command within the container (podman --user option).
  • Fine-tune memory usage: the standard Nomad memory resource plus additional driver-specific swap, swappiness and reservation parameters, and OOM handling
  • Supports rootless containers with cgroup V2
  • Set DNS servers, search list and options via Nomad dns parameters
  • Support for Nomad shared network namespaces and Consul Connect
  • Flexible network configuration that makes it simple to build pod-like structures within a Nomad group

Redis Example job

Here is a simple Redis "hello world" example:

job "redis" {
  datacenters = ["dc1"]
  type        = "service"

  group "redis" {
    network {
      port "redis" { to = 6379 }
    }

    task "redis" {
      driver = "podman"

      config {
        image = "docker://redis"
        ports = ["redis"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
nomad run redis.nomad

==> Monitoring evaluation "9fc25b88"
    Evaluation triggered by job "redis"
    Allocation "60fdc69b" created: node "f6bccd6d", group "redis"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "9fc25b88" finished with status "complete"

podman ps

CONTAINER ID  IMAGE                           COMMAND               CREATED         STATUS             PORTS  NAMES
6d2d700cbce6  docker.io/library/redis:latest  docker-entrypoint...  16 seconds ago  Up 16 seconds ago         redis-60fdc69b-65cb-8ece-8554-df49321b3462

Building The Driver from source

This project has a go.mod definition, so you can clone it to whatever directory you want; it is not necessary to set up a GOPATH. Ensure that you use Go 1.17 or newer.

git clone git@github.com:hashicorp/nomad-driver-podman
cd nomad-driver-podman
make dev

The compiled binary will be located at ./build/nomad-driver-podman.

Runtime dependencies

  • Nomad 0.12.9+
  • Linux host with podman installed
  • For rootless containers you need a system supporting cgroup V2 and a few other things, follow this tutorial

You need a 3.0.x podman binary and a system socket activation unit, see https://www.redhat.com/sysadmin/podmans-new-rest-api

The Nomad agent, nomad-driver-podman and podman will reside on the same host, so you do not have to worry about the SSH aspects of the Podman API.
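
For example, on a systemd host the API socket can typically be enabled like this (a sketch for a root setup; rootless setups use systemctl --user instead):

sudo systemctl enable --now podman.socket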

Ensure that Nomad can find the plugin, see plugin_dir
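
A minimal agent configuration sketch, assuming the plugin binary was copied to /opt/nomad/plugins (the path is only an example):

plugin_dir = "/opt/nomad/plugins"

plugin "nomad-driver-podman" {
  config {}
}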

Driver Configuration

  • volumes stanza:

    • enabled - Defaults to true. Allows tasks to bind host paths (volumes) inside their container.
    • selinuxlabel - Allows the operator to set a SELinux label to the allocation and task local bind-mounts to containers. If used with volumes.enabled set to false, the labels will still be applied to the standard binds in the container.
plugin "nomad-driver-podman" {
  config {
    volumes {
      enabled      = true
      selinuxlabel = "z"
    }
  }
}
  • gc stanza:

    • container - Defaults to true. If set to false, Nomad will not remove the container when the task exits.
plugin "nomad-driver-podman" {
  config {
    gc {
      container = false
    }
  }
}
  • recover_stopped (bool) Defaults to false. Allows the driver to start and reuse a previously stopped container after a Nomad client restart. Consider a simple single node system and a complete reboot. All previously managed containers will be reused instead of disposed and recreated.

    WARNING: use of recover_stopped may prevent the Nomad agent from starting on system restarts. This setting has been left in place for compatibility.

plugin "nomad-driver-podman" {
  config {
    recover_stopped = true
  }
}
  • socket_path (string) Defaults to "unix:///run/podman/podman.sock" when running as root or a cgroup V1 system, and "unix:///run/user/<USER_ID>/podman/podman.sock" for rootless cgroup V2 systems
plugin "nomad-driver-podman" {
  config {
    socket_path = "unix:///run/podman/podman.sock"
  }
}
  • disable_log_collection (bool) Defaults to false. Setting this to true will disable Nomad's log collection for Podman tasks. If you don't rely on Nomad's log capabilities and exclusively use host-based log aggregation, you can use this option to avoid the log collection overhead. Be aware that you also lose automatic log rotation.
plugin "nomad-driver-podman" {
  config {
    disable_log_collection = false
  }
}
  • extra_labels ([]string) Defaults to []. Setting this will automatically append Nomad-related labels to Podman tasks. Supports glob matching such as task*. Possible values are:
job_name
job_id
task_group_name
task_name
namespace
node_name
node_id
plugin "nomad-driver-podman" {
  config {
    extra_labels = ["job_name", "job_id", "task_group_name", "task_name", "namespace", "node_name", "node_id"]
  }
}
  • logging stanza:

    • type - Defaults to "nomad". See the task configuration for details.
    • options - Defaults to {}. See the task configuration for details.
  • client_http_timeout (string) Defaults to 60s. The timeout used by http.Client requests.

plugin "nomad-driver-podman" {
  config {
    client_http_timeout = "60s"
  }
}

Task Configuration

  • image - The image to run. Accepted transports are docker (default if missing), oci-archive and docker-archive. Image references using short-names will be treated according to user-configured preferences.
config {
  image = "docker://redis"
}
  • auth - (Optional) Authenticate to the image registry using a static credential. tls_verify can be disabled for insecure registries.
config {
  image = "your.registry.tld/some/image"
  auth {
    username   = "someuser"
    password   = "sup3rs3creT"
    tls_verify = true
  }
}
  • entrypoint - (Optional) A string list overriding the image's entrypoint. Defaults to the entrypoint set in the image.
config {
  entrypoint = [
    "/bin/bash",
    "-c"
  ]
}
  • command - (Optional) The command to run when starting the container.
config {
  command = "some-command"
}
  • args - (Optional) A list of arguments to the optional command. If no command is specified, the arguments are passed directly to the container.
config {
  args = [
    "arg1",
    "arg2",
  ]
}
  • working_dir - (Optional) The working directory for the container. Defaults to the default set in the image.
config {
  working_dir = "/data"
}
  • volumes - (Optional) A list of host_path:container_path:options strings to bind host paths to container paths. Named volumes are not supported.
config {
  volumes = [
    "/some/host/data:/container/data:ro,noexec"
  ]
}
  • tmpfs - (Optional) A list of /container_path strings for tmpfs mount points. See podman run --tmpfs options for details.
config {
  tmpfs = [
    "/var"
  ]
}
  • devices - (Optional) A list of host-device[:container-device][:permissions] definitions. Each entry adds a host device to the container. Optional permissions can be used to specify device permissions; it is a combination of r for read, w for write, and m for mknod(2). See the podman documentation for more details.
config {
  devices = [
    "/dev/net/tun"
  ]
}
  • hostname - (Optional) The hostname to assign to the container. When launching more than one instance of a task (using count) with this option set, every container the task starts will have the same hostname.
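A minimal example (the hostname value is illustrative):

config {
  hostname = "my-container"
}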

  • Forwarding and Exposing Ports - (Optional) See Docker Driver Configuration for details.

  • init - Run an init inside the container that forwards signals and reaps processes.

config {
  init = true
}
  • init_path - Path to the container-init binary.
config {
  init = true
  init_path = "/usr/libexec/podman/catatonit"
}
  • user - Run the command as a specific user/uid within the container. See Task configuration
user = "nobody"

config {
}
  • logging - Configure logging. See also plugin option disable_log_collection

driver = "nomad" (default) Podman redirects its combined stdout/stderr logstream directly to a Nomad fifo. Benefits of this mode are: zero overhead, don't have to worry about log rotation at system or Podman level. Downside: you cannot easily ship the logstream to a log aggregator plus stdout/stderr is multiplexed into a single stream..

config {
  logging = {
    driver = "nomad"
  }
}

driver = "journald" The container log is forwarded from Podman to the journald on your host. Next, it's pulled by the Podman API back from the journal into the Nomad fifo (controllable by disable_log_collection) Benefits: all containers can log into the host journal, you can ship a structured stream incl. metadata to your log aggregator. No log rotation at Podman level. You can add additional tags to the journal. Drawbacks: a bit more overhead, depends on Journal (will not work on WSL2). You should configure some rotation policy for your Journal. Ensure you're running Podman 3.1.0 or higher because of bugs in older versions.

config {
  logging = {
    driver = "journald"
    options = {
      "tag" = "redis"
    }
  }
}
  • memory_reservation - Memory soft limit (unit = b (bytes), k (kilobytes), m (megabytes), or g (gigabytes))

After setting memory reservation, when the system detects memory contention or low memory, containers are forced to restrict their consumption to their reservation. So you should always set the value below --memory, otherwise the hard limit will take precedence. By default, memory reservation will be the same as memory limit.

config {
  memory_reservation = "100m"
}
  • memory_swap - A limit value equal to memory plus swap. The swap LIMIT should always be larger than the memory value.

Unit can be b (bytes), k (kilobytes), m (megabytes), or g (gigabytes). If you don't specify a unit, b is used. Set LIMIT to -1 to enable unlimited swap.

config {
  memory_swap = "180m"
}
  • memory_swappiness - Tune a container's memory swappiness behavior. Accepts an integer between 0 and 100.
config {
  memory_swappiness = 60
}

  • network_mode - Set the network mode for the container. By default the task uses the network stack defined in the task group; see the network stanza. If the group's network behavior is also undefined, it will fall back to bridge in rootful mode or slirp4netns for rootless containers. Possible values are:

  • bridge: create a network stack on the default podman bridge.
  • none: no networking
  • host: use the Podman host network stack. Note: the host mode gives the container full access to local system services such as D-Bus and is therefore considered insecure
  • slirp4netns: use slirp4netns to create a user network stack. This is the default for rootless containers. Podman currently does not support it for root containers (see issue).
  • container:id: reuse another Podman container's network stack
  • task:name-of-other-task: join the network of another task in the same allocation.
config {
  network_mode = "bridge"
}
  • cap_add - (Optional) A list of Linux capabilities as strings to pass to --cap-add.
config {
  cap_add = [
    "SYS_TIME"
  ]
}
  • cap_drop - (Optional) A list of Linux capabilities as strings to pass to --cap-drop.
config {
  cap_drop = [
    "MKNOD"
  ]
}
  • selinux_opts - (Optional) A list of process labels the container will use.
config {
  selinux_opts = [
    "type:my_container.process"
  ]
}
  • sysctl - (Optional) A key-value map of sysctl configurations to apply to the container on start.
config {
  sysctl = {
    "net.core.somaxconn" = "16384"
  }
}
  • privileged - (Optional) true or false (default). A privileged container turns off the security features that isolate the container from the host. Dropped Capabilities, limited devices, read-only mount points, Apparmor/SELinux separation, and Seccomp filters are all disabled.
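
For example, to run the task privileged:

config {
  privileged = true
}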

  • tty - (Optional) true or false (default). Allocate a pseudo-TTY for the container.
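
For example, to allocate a pseudo-TTY:

config {
  tty = true
}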

  • labels - (Optional) Set labels on the container.

config {
  labels = {
    "nomad" = "job"
  }
}
  • apparmor_profile - (Optional) Name of an AppArmor profile to be used instead of the default profile. The special value unconfined disables AppArmor for this container:
config {
  apparmor_profile = "your-profile"
}
  • force_pull - (Optional) true or false (default). Always pull the latest image on container start.
config {
  force_pull = true
}
  • readonly_rootfs - (Optional) true or false (default). Mount the rootfs as read-only.
config {
  readonly_rootfs = true
}
  • ulimit - (Optional) A key-value map of ulimit configurations to apply to the container on start.
config {
  ulimit {
    nproc = "4242"
    nofile = "2048:4096"
  }
}
  • userns - (Optional) Set the user namespace mode for the container.
config {
  userns = "keep-id:uid=200,gid=210"
}
  • pids_limit - (Optional) An integer value that specifies the pid limit for the container.
config {
  pids_limit = 64
}
  • image_pull_timeout - (Optional) Time duration for the image pull timeout (defaults to 5m).
config {
  image_pull_timeout = "5m"
}

Network Configuration

Nomad lifecycle hooks combined with the driver's network_mode allow very flexible network namespace definitions. This feature does not build upon the native podman pod structure but simply reuses the networking namespace of one container for other tasks in the same group.

A typical example is a network server and a metric exporter or log-shipping sidecar. The metric exporter needs access to, e.g., a private monitoring port which should not be exposed to the network and is thus usually bound to localhost.

The repository includes three different example jobs for such a setup. All of them will start a NATS server and a prometheus-nats-exporter using different approaches.

You can use curl to verify that the job is working correctly and that you can get Prometheus metrics:

curl http://your-machine:7777/metrics

2 Task setup, server defines the network

See examples/jobs/nats_simple_pod.nomad

Here, the server task is started as the main workload and the exporter runs as a poststart sidecar. Because of that, Nomad guarantees that the server is started first and thus the exporter can easily join the server's network namespace via network_mode = "task:server".

Note that the server configuration file binds the http_port to localhost.

Be aware that ports must be defined in the parent network namespace, here server.
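
A condensed sketch of this layout (image names, ports and the exact stanzas are illustrative, not the contents of the example file):

group "nats" {
  network {
    port "metrics" { to = 7777 }
  }

  task "server" {
    driver = "podman"
    config {
      image = "docker.io/library/nats:latest"
      ports = ["metrics"]
    }
  }

  task "exporter" {
    driver = "podman"
    lifecycle {
      hook    = "poststart"
      sidecar = true
    }
    config {
      image        = "docker.io/natsio/prometheus-nats-exporter:latest"
      network_mode = "task:server"
    }
  }
}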

3 Task setup, a pause container defines the network

See examples/jobs/nats_pod.nomad

A slightly different setup is demonstrated in this job. It more closely resembles the idea of a pod by starting a pause task, named pod, via a prestart/sidecar hook.

Next, the main workload, server, is started and joins the network namespace by using the network_mode = "task:pod" stanza. Finally, Nomad starts the poststart/sidecar exporter, which also joins the network.

Note that all ports must be defined on the pod level.
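
A rough sketch of the pause-container pattern (image names and details are illustrative, not the contents of the example file):

group "nats-pod" {
  network {
    port "metrics" { to = 7777 }
  }

  task "pod" {
    driver = "podman"
    lifecycle {
      hook    = "prestart"
      sidecar = true
    }
    config {
      # a minimal "pause"-style image that only sleeps; it holds the network namespace
      image = "k8s.gcr.io/pause:3.1"
      ports = ["metrics"]
    }
  }

  task "server" {
    driver = "podman"
    config {
      image        = "docker.io/library/nats:latest"
      network_mode = "task:pod"
    }
  }

  task "exporter" {
    driver = "podman"
    lifecycle {
      hook    = "poststart"
      sidecar = true
    }
    config {
      image        = "docker.io/natsio/prometheus-nats-exporter:latest"
      network_mode = "task:pod"
    }
  }
}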

2 Task setup, shared Nomad network namespace

See examples/jobs/nats_group.nomad

This example is very different. Both server and exporter join a network namespace which is created and managed by Nomad itself. See the Nomad network stanza to get started with this generic approach.
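
In this variant no driver-level network_mode is needed; a rough sketch (image names are illustrative):

group "nats" {
  network {
    mode = "bridge"
    port "metrics" { to = 7777 }
  }

  task "server" {
    driver = "podman"
    config {
      image = "docker.io/library/nats:latest"
    }
  }

  task "exporter" {
    driver = "podman"
    config {
      image = "docker.io/natsio/prometheus-nats-exporter:latest"
    }
  }
}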

Rootless on Ubuntu

Edit /etc/default/grub to enable cgroup v2:

GRUB_CMDLINE_LINUX_DEFAULT="quiet cgroup_enable=memory swapaccount=1 systemd.unified_cgroup_hierarchy=1"

sudo update-grub

Ensure that the Podman socket is running:

$ systemctl --user status podman.socket
* podman.socket - Podman API Socket
     Loaded: loaded (/usr/lib/systemd/user/podman.socket; disabled; vendor preset: disabled)
     Active: active (listening) since Sat 2020-10-31 19:21:29 CET; 22h ago
   Triggers: * podman.service
       Docs: man:podman-system-service(1)
     Listen: /run/user/1000/podman/podman.sock (Stream)
     CGroup: /user.slice/user-1000.slice/user@1000.service/podman.socket

Ensure that you have a recent version of crun:

$ crun -V
crun version 0.13.227-d38b
commit: d38b8c28fc50a14978a27fa6afc69a55bfdd2c11
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL

nomad job run example.nomad

job "example" {
  datacenters = ["dc1"]
  type        = "service"

  group "cache" {
    count = 1
    restart {
      attempts = 2
      interval = "30m"
      delay    = "15s"
      mode     = "fail"
    }
    network {
      port "redis" { to = 6379 }
    }
    task "redis" {
      driver = "podman"

      config {
        image = "redis"
        ports = ["redis"]
      }

      resources {
        cpu    = 500 # 500 MHz
        memory = 256 # 256MB
      }
    }
  }
}

Verify with podman ps:

$ podman ps
CONTAINER ID  IMAGE                           COMMAND       CREATED        STATUS            PORTS                                                 NAMES
2423ae3efa21  docker.io/library/redis:latest  redis-server  7 seconds ago  Up 6 seconds ago  127.0.0.1:21510->6379/tcp, 127.0.0.1:21510->6379/udp  redis-b640480f-4b93-65fd-7bba-c15722886395

Local Development

Requirements

  • Vagrant >= 2.2
  • VirtualBox >= v6.0

Vagrant Environment Setup

# create the vm
vagrant up

# ssh into the vm
vagrant ssh

Running a Nomad dev agent with the Podman plugin:

# Build the task driver plugin
make dev

# Copy the built nomad-driver-podman executable to examples/plugins/
cp ./build/nomad-driver-podman examples/plugins/

# Start Nomad
nomad agent -config=examples/nomad/server.hcl > server.log 2>&1 &

# Run the client as sudo
sudo nomad agent -config=examples/nomad/client.hcl > client.log 2>&1 &

# Run a job
nomad job run examples/jobs/redis_ports.nomad

# Verify
nomad job status redis

sudo podman ps

Running the tests:

# Start the Podman server
systemctl --user start podman.socket

# Run the tests
CI=1 ./build/bin/gotestsum --junitfile ./build/test/result.xml -- -timeout=15m . ./api

nomad-driver-podman's People

Contributors

acornies, apollo13, dependabot[bot], drewbailey, gjpin, hashicorp-ci, hashicorp-copywrite[bot], hashicorp-tsccr[bot], hc-github-team-nomad-ecosystem, jdoss, jrasell, lgfa29, modrake, ncode, optiz0r, pabloyoyoista, rbarry82, rjthomas013, sarahethompson, schmichael, seriv, shishir-a412ed, shoenig, sushain97, tgross, towe75, ttys3, unexist, weeezes, zyclonite


nomad-driver-podman's Issues

nginx image fails to start with podman driver

When I use the podman driver I get the error below.

rpc error: code = Unknown desc = failed to start task, could not start container: io.podman.ErrorOccurred(Reason: container_linux.go:349: starting container process caused "exec: \"nginx\": executable file not found in $PATH": OCI runtime command not found error)

The same works with the docker driver. At the same time, using podman directly to start the nginx image works. I'm also able to run Nomad's redis example with the podman driver.

I'm using Arch Linux with pacman's podman package. I used the commands below to set up podman.

systemctl start podman
systemctl start io.podman

Example nomad job:

job "nginx" {
  datacenters = ["dc1"]

  group "nginx" {
    reschedule {
      attempts  = 0
      unlimited = false
    }

    task "web" {
      driver = "podman"

      config {
        image = "nginx"

        port_map {
          http = 80
        }
      }

      resources {
        cpu    = 500 # 500 MHz
        memory = 256 # 256MB

        network {
          mbits = 10
          port  "http"  {}
        }
      }

      service {
        name = "nginx"
        port = "http"

        check {
          name     = "alive"
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}

Clean-up containers on exit?

After execution of a job, podman ps -a shows stopped containers.
I think it is undesirable to leave them behind by default, but regardless of the defaults it would be great to have a configurable option to pass --rm when the container is started.

Thanks.

Support to run pods from yaml definitions

First, I have to say that I am very much a novice regarding both Nomad and Podman.

Exploring podman, I have found that it supports pods the same way as Kubernetes does; I guess that's the reason for the name.

This means that a yaml file in the same format as the Kubernetes pod definition files can both be generated and played, launching one or multiple containers as specified: https://github.com/containers/libpod/blob/master/docs/source/markdown/podman-play-kube.1.md

The central point of this is that all containers in a pod share the same namespaces. This is essential to the pod concept, allowing composition of multiple ready-made, unchanged docker images into a useful unit which is always managed as a whole.

According to the following issue, it isn't foreseeable that Nomad tasks will be able to share the same namespace due to architectural constraints: hashicorp/nomad#3622

Therefore, this seems to be something that should be supported by the podman driver / task itself as an alternative operation mode to the "single container" approach.

Basically, a way should be provided to be able to run an entire pod from a pod definition file.

Does this seem like a sensible approach? I am not sure if and how this fits into the architecture of Nomad, as I don't really know it, but in theory I would assume that a pod is just another type of workload, not that much different from a single container.

Environment leak between jobs

The driver leaks environment variables; it basically accumulates all env variables over time.

Steps to reproduce:

  1. run a job and set env "foo1"="bar"
  2. inspect container, "foo1" is bar, like expected
  3. run another job and set env "foo2" = "bar"
  4. inspect second container. It will show both "foo1" and "foo2" variables.

no STDOUT/STDERR

STDOUT and STDERR are not visible in Nomad.

Something like the following config demonstrates that the output is going nowhere and no log appears in the alloc directory:

config {
    image = "docker.io/library/debian:stable"
    command = "/bin/echo"
    args = [ "4444" ]
}

Driver API/Container stability issues

Our e2e test suite occasionally hits an error trying to register a podman job; once rescheduled it usually doesn't fail again. I think we likely need to retry when inspecting the container.

Task "redis" is "dead"
Task Resources
CPU      Memory   Disk     Addresses
500 MHz  256 MiB  300 MiB  redis: 172.31.84.120:30133

Task Events:
Started At     = N/A
Finished At    = 2020-08-14T12:56:05Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type             Description
2020-08-14T08:56:06-04:00  Killing          Sent interrupt. Waiting 5s before force killing
2020-08-14T08:56:05-04:00  Alloc Unhealthy  Unhealthy because of failed task
2020-08-14T08:56:05-04:00  Not Restarting   Error was unrecoverable
2020-08-14T08:56:05-04:00  Driver Failure   rpc error: code = Unknown desc = failed to start task, could not inspect container : unexpected EOF
2020-08-14T08:56:04-04:00  Task Setup       Building Task Directory
2020-08-14T08:56:04-04:00  Received         Task received by client

Can I just use the binaries? What do you recommend?

Hi,

I was wondering if I could just use the nomad and nomad-driver-podman binaries; what do you recommend in terms of utilization and setup? I saw the README and it seems that it is geared toward building from source. I was thinking about skipping the build and just using the binaries.

Will the binary work in Ubuntu, Fedora?

It seems that the Nomad setups are more Fedora oriented. I'm fine with Fedora; it's just that I wanted to understand whether I could just use the binaries, how I should use them, and where is best.

Can we add --sysctl in podman driver?

Hi,

Sometimes you need to set kernel values like net.core.somaxconn with sysctl.
It is available in the docker driver; can we also have it for the podman driver?

Thanks!

Driver is not reading correct rootless info

Hi guys, recently I have been playing with the driver quite a bit and found some information displayed incorrectly in the Nomad UI, namely the rootless attribute.

Per the current implementation, it's reading from info.Host.Rootless:

attrs["driver.podman.rootless"] = pstructs.NewBoolAttribute(info.Host.Rootless)

However I think it should be host.security.rootless per Podman's documentation: https://docs.podman.io/en/latest/_static/api.html#operation/libpodGetInfo

Please let me know if I got things wrong, and thank you for this awesome driver!

container init caused \"setenv: invalid argument\"

Hi, I'm unable to run the redis example; podman refuses to run the container:

CentOS Linux release 8.1.1911 (Core)
Nomad v0.10.2 (5adbc10ffcc2ae18524a857e70df22842eaf5d95+CHANGES)
nomad-driver-podman: 0.0.3
podman-1.4.2-5.module_el8.1.0+237+63e26edc

No difference when disabling SELinux (= setenforce 0, nomad runs confined, but permissive).

Jan 26 23:16:44 hypercube.lan nomad[57464]: ==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
Jan 26 23:16:44 hypercube.lan nomad[57464]: ==> Loaded configuration from /etc/nomad.d/consul.hcl, /etc/nomad.d/nomad.json
Jan 26 23:16:44 hypercube.lan nomad[57464]: ==> Starting Nomad agent...
Jan 26 23:16:48 hypercube.lan nomad[57464]: ==> Nomad agent configuration:
Jan 26 23:16:48 hypercube.lan nomad[57464]:        Advertise Addrs: HTTP: 192.168.1.4:4646; RPC: 192.168.1.4:4647; Serf: 192.168.1.4:4648
Jan 26 23:16:48 hypercube.lan nomad[57464]:             Bind Addrs: HTTP: 0.0.0.0:4646; RPC: 0.0.0.0:4647; Serf: 0.0.0.0:4648
Jan 26 23:16:48 hypercube.lan nomad[57464]:                 Client: true
Jan 26 23:16:48 hypercube.lan nomad[57464]:              Log Level: INFO
Jan 26 23:16:48 hypercube.lan nomad[57464]:                 Region: global (DC: home)
Jan 26 23:16:48 hypercube.lan nomad[57464]:                 Server: true
Jan 26 23:16:48 hypercube.lan nomad[57464]:                Version: 0.10.2
Jan 26 23:16:48 hypercube.lan nomad[57464]: ==> Nomad agent started! Log data will stream in below:
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.577+0100 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.577+0100 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.577+0100 [INFO]  agent: detected plugin: name=rkt type=driver plugin_version=0.1.0
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.577+0100 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.577+0100 [INFO]  agent: detected plugin: name=podman type=driver plugin_version=0.0.1-dev
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.577+0100 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.577+0100 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.577+0100 [INFO]  agent: detected plugin: name=nvidia-gpu type=device plugin_version=0.1.0
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.623+0100 [INFO]  nomad: raft: Initial configuration (index=1): [{Suffrage:Voter ID:192.168.1.4:4647 Address:192.168.1.4:4647}]
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.623+0100 [INFO]  nomad: raft: Node at 192.168.1.4:4647 [Follower] entering Follower state (Leader: "")
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.624+0100 [INFO]  nomad: serf: EventMemberJoin: hypercube.lan.global 192.168.1.4
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.624+0100 [INFO]  nomad: starting scheduling worker(s): num_workers=8 schedulers=[service, batch, system, _core]
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.624+0100 [WARN]  nomad: serf: Failed to re-join any previously known node
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.624+0100 [INFO]  nomad: adding server: server="hypercube.lan.global (Addr: 192.168.1.4:4647) (DC: home)"
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.624+0100 [INFO]  client: using state directory: state_dir=/var/lib/nomad/client
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.624+0100 [INFO]  client: using alloc directory: alloc_dir=/var/lib/nomad/alloc
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.626+0100 [INFO]  client.fingerprint_mgr.cgroup: cgroups are available
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.628+0100 [INFO]  client.fingerprint_mgr.consul: consul agent is available
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.632+0100 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=bond0
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:44.634+0100 [INFO]  client.fingerprint_mgr.vault: Vault is available
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:45.759+0100 [WARN]  nomad: raft: Heartbeat timeout from "" reached, starting election
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:45.759+0100 [INFO]  nomad: raft: Node at 192.168.1.4:4647 [Candidate] entering Candidate state in term 8
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:45.829+0100 [INFO]  nomad: raft: Election won. Tally: 1
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:45.829+0100 [INFO]  nomad: raft: Node at 192.168.1.4:4647 [Leader] entering Leader state
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:45.830+0100 [INFO]  nomad: cluster leadership acquired
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:48.635+0100 [INFO]  client.plugin: starting plugin manager: plugin-type=driver
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:48.635+0100 [INFO]  client.plugin: starting plugin manager: plugin-type=device
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:48.718+0100 [INFO]  client: started client: node_id=89a0d532-c081-7bb2-5c24-c99e3ae9a5ee
Jan 26 23:16:48 hypercube.lan nomad[57464]:     2020-01-26T23:16:48.778+0100 [INFO]  client: node registration complete
Jan 26 23:16:57 hypercube.lan nomad[57464]:     2020-01-26T23:16:57.949+0100 [INFO]  client: node registration complete
Jan 26 23:17:16 hypercube.lan nomad[57464]:     2020-01-26T23:17:16.212+0100 [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=ef62b5e6-92e8-cdcc-1874-43e0ca0ccc10 task=redis @module=logmon path=/var/lib/nomad/alloc/ef62b5e6-92e8-cdcc-1874-43e0ca0ccc10/alloc/logs/.redis.stdout.fifo timestamp=2020-01-26T23:17:16.212+0100
Jan 26 23:17:16 hypercube.lan nomad[57464]:     2020-01-26T23:17:16.213+0100 [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=ef62b5e6-92e8-cdcc-1874-43e0ca0ccc10 task=redis @module=logmon path=/var/lib/nomad/alloc/ef62b5e6-92e8-cdcc-1874-43e0ca0ccc10/alloc/logs/.redis.stderr.fifo timestamp=2020-01-26T23:17:16.213+0100
Jan 26 23:17:17 hypercube.lan nomad[57464]:     2020-01-26T23:17:17.456+0100 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=ef62b5e6-92e8-cdcc-1874-43e0ca0ccc10 task=redis error="rpc error: code = Unknown desc = failed to start task, could not start container: io.podman.ErrorOccurred(Reason: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"setenv: invalid argument\""
Jan 26 23:17:17 hypercube.lan nomad[57464]: : OCI runtime error)"
Jan 26 23:17:17 hypercube.lan nomad[57464]:     2020-01-26T23:17:17.456+0100 [INFO]  client.alloc_runner.task_runner: not restarting task: alloc_id=ef62b5e6-92e8-cdcc-1874-43e0ca0ccc10 task=redis reason="Error was unrecoverable"
Jan 26 23:17:17 hypercube.lan nomad[57464]:     2020-01-26T23:17:17.506+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=ef62b5e6-92e8-cdcc-1874-43e0ca0ccc10
Jan 26 23:17:21 hypercube.lan nomad[57464]:     2020-01-26T23:17:21.507+0100 [WARN]  client.alloc_runner.task_runner.task_hook.logmon.nomad: timed out waiting for read-side of process output pipe to close: alloc_id=ef62b5e6-92e8-cdcc-1874-43e0ca0ccc10 task=redis @module=logmon timestamp=2020-01-26T23:17:21.506+0100

Podman doesn't pull images on nomad job schedule

Hi,

I have a fresh podman installation on fedora coreos, with no local images. plugin is correctly configured and I try to run the example.nomad job (which is generated by nomad job init) with minor modifications:

...
task "redis" {
    driver = "podman"
    config {
        image = "docker.io/library/redis:4.0"
        # no ports
    }
...
}

When running nomad job run example.nomad, the job is scheduled but fails to start. But it starts when I run podman pull docker.io/library/redis:4.0 on the nomad host. So apparently the image is not automatically pulled if it doesn't exist.

Is this a misconfiguration on my side or a bug?

Release prebuilt binaries

In order to automate installation of the plugin, it would be useful to release prebuilt binaries on the GitHub releases page.

nomad getting EOF from podman

I am getting this in the monitoring output of my server:

2020-12-09T17:49:53.476Z [ERROR] client.driver_mgr.nomad-driver-podman: Could not get podman info: driver=podman @module=podman err="unexpected EOF" timestamp=2020-12-09T17:49:53.475Z
2020-12-09T17:49:55.454Z [TRACE] client: next heartbeat: period=10.164087418s
2020-12-09T17:50:05.621Z [TRACE] client: next heartbeat: period=18.108048407s
2020-12-09T17:50:17.847Z [DEBUG] client.driver_mgr.nomad-driver-podman: Get podman info: driver=podman @module=podmanClient timestamp=2020-12-09T17:50:17.846Z
2020-12-09T17:50:23.474Z [ERROR] client.driver_mgr.nomad-driver-podman: Could not get podman info: driver=podman @module=podman err="unexpected EOF" timestamp=2020-12-09T17:50:23.473Z
2020-12-09T17:50:23.731Z [TRACE] client: next heartbeat: period=17.728592082s

Is the latest podman driver compatible with Nomad v1.0?

Support for bridge networking

Hello,

does this driver support bridge networking? Nomad fails to schedule any jobs with:

task "whatever" {
  ...
  group "whatever" {
    network {
      mode = "bridge"
    }
    ...
  }
}

Saying: Unable to add allocation due to error: failed to configure network manager: task whatever does not support "bridge" networking mode.

I am not sure if I should be looking for the culprit in nomad, podman or this driver.

The same job definition works when I use docker driver (with CNI installed).

Thanks for the info.

Nomad seems to be publishing wrong port to Consul

I'm running an nginx container that listens internally on port 80. Here's my task:

    task "nginx" {
      driver = "podman"
      config {
        image = ".../telescope-frontend:3559d7c273ca"
        port_map = {
          http = 80
        }
      }

      env { }

      resources {
        cpu    = 1000  # MHz
        memory = 512  # MB

        network {
          port "http" {}
        }
      }

It works as expected. The container spawns and the service is published. But the published port is wrong: Consul has it registered as port 80, while Nomad has it running on 22352.

[root@nyc3-nomad1 ~]# dig @127.0.0.1 -p 8600 _website-telescope-frontend-primary-nginx._tcp.service.consul SRV
...
;; ANSWER SECTION:
_website-telescope-frontend-primary-nginx._tcp.service.consul. 0 IN SRV	1 1 80 0a580006.addr.nyc3.consul.
...

This is an issue for obvious reasons. The process is running on the right port on the host and is accessible.

[root@nyc3-nomad1 ~]# curl 10.234.4.9:22352
<!DOCTYPE html><html lang="en"><head>

I'm using driver v0.1.0 under CentOS 8

handling oom-killer?

When an allocation is OOM-killed the error is very ambiguous. After the incident Nomad repeatedly logs "Unknown allocation" complaints...

Would it be possible to detect whether an allocation failed due to the OOM killer?

See also hashicorp/nomad#2203

Upgrade to V2 API

Podman varlink API is no longer supported.

The V2 HTTP API is where all future effort and bugfixes will be implemented.

Need to decide if we are going to consume their swagger API and generate a swagger client or just create our own API client. Inclined to do the latter because the generated swagger API doesn't provide a lot of control and requires us to navigate through some interesting generated code.

Issues that are fixed in v2 api:
containers/podman#4560

varlink: empty network field does not fallback to default

If a nil string is passed to io.varlink, podman seems not to use the default value.

The following should use slirp4netns since no -network value is passed in, but instead slirp4netns is not running

→ varlink call unix:/run/user/1000/podman/io.podman/io.podman.CreateContainer '{ "create": { "args": ["docker://registry.hub.docker.com/library/redis:6.0.3-buster", "-m 20", "-network"], "detach": true }}'
{
  "container": "6875ca9ae683a6b5336fdae57990b2ef33e141d70d694e322d4935e613de32d3"
}

λ drew [work/nomad-dev/podman] at  master !?
→ varlink call unix:/run/user/1000/podman/io.podman/io.podman.StartContainer '{ "name": "6875ca9ae683a6b5336fdae57990b2ef33e141d70d694e322d4935e613de32d3" }'
{
  "container": "6875ca9ae683a6b5336fdae57990b2ef33e141d70d694e322d4935e613de32d3"
}

ps aux | grep slirp -> nothing

→ podman run -m=40m --network= --cpu-shares 200 redis
→ ps aux | grep slirp
sable-host-loopback --mtu 65520 --enable-sandbox --enable-seccomp -c -e 3 -r 4 --netns-type=path /run/user/1000/netns/cni-fc123a4f-c8c2-ecbb-581d-21f5636d2329 tap0

Networking broken without deprecated task level config

As of Nomad 0.12, task level networks and port_map are both deprecated and have been replaced with group level network syntax. The podman driver does not work with the new syntax and requires the (largely undocumented) old syntax and config locations. It appears as if something similar to hashicorp/nomad#8623 needs to be implemented.

#63 seems to be the same issue, but it hasn't gotten much attention due to its title and wording.
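
For reference, the group-level syntax the issue refers to looks roughly like this (names, ports and image are illustrative):

group "web" {
  network {
    mode = "bridge"
    port "http" { to = 80 }
  }

  task "nginx" {
    driver = "podman"
    config {
      image = "docker.io/library/nginx:latest"
    }
  }
}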

FTBFS with Nomad 0.10.0, Podman 1.6.1

src/github.com/pascomnet/nomad-driver-podman/driver.go:208:40: not enough arguments in call to iopodman.GetInfo().Call
        have (*varlink.Connection)
        want (context.Context, *varlink.Connection)
src/github.com/pascomnet/nomad-driver-podman/driver.go:254:41: not enough arguments in call to iopodman.Ps().Call
        have (*varlink.Connection, iopodman.PsOpts)
        want (context.Context, *varlink.Connection, iopodman.PsOpts)
src/github.com/pascomnet/nomad-driver-podman/driver.go:324:53: not enough arguments in call to iopodman.CreateContainer().Call
        have (*varlink.Connection, iopodman.Create)
        want (context.Context, *varlink.Connection, iopodman.Create)
src/github.com/pascomnet/nomad-driver-podman/driver.go:332:47: not enough arguments in call to iopodman.RemoveContainer().Call
        have (*varlink.Connection, string, bool, bool)
        want (context.Context, *varlink.Connection, string, bool, bool)
src/github.com/pascomnet/nomad-driver-podman/driver.go:337:41: not enough arguments in call to iopodman.StartContainer().Call
        have (*varlink.Connection, string)
        want (context.Context, *varlink.Connection, string)
src/github.com/pascomnet/nomad-driver-podman/driver.go:347:54: not enough arguments in call to iopodman.InspectContainer().Call
        have (*varlink.Connection, string)
        want (context.Context, *varlink.Connection, string)
src/github.com/pascomnet/nomad-driver-podman/driver.go:509:49: not enough arguments in call to varlink.NewConnection
        have (string)
        want (context.Context, string)
src/github.com/pascomnet/nomad-driver-podman/handle.go:127:59: not enough arguments in call to iopodman.GetContainerStats().Call
        have (*varlink.Connection, string)
        want (context.Context, *varlink.Connection, string)
src/github.com/pascomnet/nomad-driver-podman/handle.go:185:44: not enough arguments in call to iopodman.StopContainer().Call
        have (*varlink.Connection, string, int64)
        want (context.Context, *varlink.Connection, string, int64)
src/github.com/pascomnet/nomad-driver-podman/handle.go:187:45: not enough arguments in call to iopodman.KillContainer().Call
        have (*varlink.Connection, string, number)
        want (context.Context, *varlink.Connection, string, int64)
src/github.com/pascomnet/nomad-driver-podman/handle.go:187:45: too many errors

Error in launching example (Redis) container using nomad-driver-podman

Hi,

I am trying to set up nomad-driver-podman and launch the example redis container; however, it's failing with the following error message:

2020-08-18T00:22:46.454Z [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=054c6dbb-7ebe-37ce-91de-3129599c3ae3 task=redis error="rpc error: code = Unknown desc = failed to start task, could not create container: org.varlink.service.InvalidParameter"

I followed the instructions from here

This is my setup.

  1. Vagrant VM (fedora 28)
[root@localhost nomad-driver-podman]# cat /etc/os-release
NAME=Fedora
VERSION="28 (Twenty Eight)"
ID=fedora
VERSION_ID=28
VERSION_CODENAME=""
PLATFORM_ID="platform:f28"
PRETTY_NAME="Fedora 28 (Twenty Eight)"
  2. Nomad version
[vagrant@localhost nomad-driver-podman]$ nomad version
Nomad v0.11.2 (807cfebe90d56f9e5beec3e72936ebe86acc8ce3)
  3. Golang version
[vagrant@localhost nomad-driver-podman]$ go version
go version go1.14.3 linux/amd64
  4. Install and set up podman without the SSH steps.
[vagrant@localhost ~]$ podman version
Version:            1.1.2
RemoteAPI Version:  1
Go Version:         go1.10.8
Git Commit:         ff1bf1d7cb7f6f4e948e5419cca9bb48d38eddd1
Built:              Tue Mar  5 18:12:40 2019
OS/Arch:            linux/amd64
  5. Clone the repository: git@github.com:hashicorp/nomad-driver-podman.git

  6. ./build.sh

  7. mkdir -p /tmp/podman-driver

  8. mv nomad-driver-podman /tmp/podman-driver

  9. mkdir example

  10. cd example and create agent.hcl

plugin "nomad-driver-podman" {
  config {
    gc {
      container = false
    }
  }
}
  11. Start nomad ( + nomad-driver-podman)
$ sudo nomad agent -dev -config=/home/vagrant/nomad-driver-podman/example/agent.hcl -plugin-dir=/tmp/podman-driver/
  12. nomad node status <node_id>|grep "Driver Status"
[root@localhost nomad-driver-podman]# nomad node status 56cf067d|grep "Driver Status"
Driver Status   = exec,podman,raw_exec

podman driver is registered.

  13. Create a job.
$ nomad init -short (Change driver and image)
[root@localhost example]# cat example.nomad
job "example" {
  datacenters = ["dc1"]

  group "cache" {
    task "redis" {
      driver = "podman"

      config {
        image = "docker://redis:3.2"

        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 500
        memory = 256

        network {
          mbits = 10
          port  "db"  {}
        }
      }
    }
  }
}
  14. Launch the job
$ nomad job run example.nomad

Step (14) fails with the error message:

2020-08-18T00:42:40Z  Driver Failure   rpc error: code = Unknown desc = failed to start task, could not create container: org.varlink.service.InvalidParameter

Run rootless containers while running nomad as root

Hi all,

I'd like to be able to run a nomad job using "rootless" style containers while nomad itself is still running as root. That is to say, I'd like to be able to run the container with user namespacing enabled, and uid 0 within the container mapped to a chosen non-root uid on the host. Nomad itself might be called upon to run "rootless" containers on behalf of multiple other uids (which removes running nomad as a non-root uid from consideration).

This Podman driver implements the --user flag the same way as docker does: all processes within the container are squashed to run as the chosen uid, and the "rootless" functionality is lost (having multiple users within the container, uid 0 having sufficient privilege to switch user, change file owners etc). This is how we currently run docker containers under nomad, but it's problematic in that containers that users are developing locally using non-root podman will not behave the same way when launched under nomad, and containers from the docker hub that require processes to run as specific uids (or use multiple uids internally) fail with filesystem permission errors and similar.

Do you think it would be possible to get the "rootless" behaviour using the podman v2 http api, either by having nomad itself drop privileges to the desired user before making the api request, or by passing additional options via the api to setup the user namespacing in a way equivalent to running podman run as the target uid? I see that some user namespacing options exist in the code but are currently commented out (https://github.com/hashicorp/nomad-driver-podman/blob/master/api/structs.go#L297).

Thanks,
Ben

Be prepared for upgrade of Consul

FYI in newer releases of Consul github.com/hashicorp/consul/lib/freeport has been moved to github.com/hashicorp/consul/sdk/freeport.

Patch is trivial:

--- a/driver_test.go
+++ b/driver_test.go
@@ -27,9 +27,9 @@
        "testing"
        "time"
        "os"

-       "github.com/hashicorp/consul/lib/freeport"
+       "github.com/hashicorp/consul/sdk/freeport"
        "github.com/hashicorp/nomad/client/taskenv"
        ctestutil "github.com/hashicorp/nomad/client/testutil"
        "github.com/hashicorp/nomad/helper/testlog"
        "github.com/hashicorp/nomad/helper/uuid"

See also hashicorp/nomad#6785

sometimes fails to start: Reattachment process not found

dispensing initial plugin failed: driver=podman error="failed to start plugin: Reattachment process not found"

This is happening on restart of Nomad, probably when some allocations are still running when Nomad is stopped...

0.1.0 [regression]: invalid reference format oci-archive:app.oci

After upgrading from 0.0.3 my containers don't start any more:

Couldn't create image: image oci-archive:/mnt/apps/app.oci couldn't be inspected: io.podman.ImageNotFound(Id: oci-archive:/mnt/apps/app.oci, Reason: invalid reference format)

Downgrading back to 0.0.3 fixed the problem...

Update build scripts/ ci process

Upgrading the go version to 1.14.5 was quite difficult. I think GitHub Actions works best if user & sudo calls are minimal.

  • Move scripts into centralized makefile
  • Only invoke sudo per script step when necessary

how to control swap?

One of my containers uses swap heavily...
Is there a way to pass --memory-swap and/or --memory-swappiness to Podman in order to restrict swap usage?

SELinux labeled volumes on shared filesystem

OS version: CentOS Linux release 8.2.2004 (Core)
nomad-driver-podman version: git commit 49d9894
nomad version: 0.12.3
podman version: 1.6.4, shipped by CentOS 8
gluster version: 7.7

I got 2 nomad nodes on CentOS 8 set up to mount a gluster volume:

gluster.service.local-consul:/gv0 /shared 	glusterfs context=system_u:object_r:container_file_t:s0,selinux 0 0

ls shows the correct context is applied:

# ls /mnt/bar
drwxr-xr-x.  1 root root system_u:object_r:container_file_t:s0    4.0K Sep  7 16:15 bar

Running chcon on that filesystem produces an error:

# chcon -u system_u -r system_r -t container_file_t -l s0 /mnt/bar
chcon: failed to change context of '/mnt/bar' to ‘system_u:system_r:container_file_t:s0’: Operation not supported

Setting selinuxlabel = "z" won't let me start any container that has volumes on that share, as podman forces relabeling, triggering this error:

rpc error: code = Unknown desc = failed to start task, could not start container: io.podman.ErrorOccurred(Reason: relabel failed "/mnt/foo": operation not supported)

According to the nomad-driver-podman's docs, selinuxlabel is supposed to apply only to the local, secrets and alloc mounts:

Allows the operator to set a SELinux label to the allocation and task local bind-mounts to containers

If the job in question is placed on a filesystem that supports SELinux labels it works fine.

Expected results:

podman not being forced to relabel on a volume that's properly prepared.

Task failed: 'cannot unmarshal array into Go struct field InspectContainerHostConfig.Tmpfs of type map[string]string'

Hello, thanks for creating this driver.

I am having an issue running the example redis task in the description on CentOS 8 hosts. When I try to run the task, I encounter this error:

Jan 27 20:33:50 4128528c26eb nomad[6738]:     2020-01-27T20:33:50.200+1100 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=8a8d7eb3-0d56-e2a3-2e7e-3efc01648873 task=redis error="rpc error: code = Unknown desc = failed to start task, could not inspect container : json: cannot unmarshal array into Go struct field InspectContainerHostConfig.Tmpfs of type map[string]string"

I have tried with and without SELinux enabled. System and version information (both CentOS 8 hosts have the same config):

[root@4128528c26eb ~]# cat /etc/centos-release
CentOS Linux release 8.1.1911 (Core)

[root@4128528c26eb ~]# nomad version
Nomad v0.10.2 (0d2d6e3dc5a171c21f8f31fa117c8a765eb4fc02)

[root@4128528c26eb ~]# podman version
Version:            1.4.2-stable2
RemoteAPI Version:  1 
Go Version:         go1.12.8
OS/Arch:            linux/amd64

[root@4128528c26eb ~]# cat /etc/nomad.d/podman.nomad.hcl
plugin "nomad-driver-podman" {
  config {
    volumes {
      enabled      = true
      selinuxlabel = "z"
    }
  }
}

[root@4128528c26eb ~]# dnf list installed | grep varlink
libvarlink.x86_64                     18-3.el8                                          @anaconda 
libvarlink-util.x86_64                18-3.el8                                          @BaseOS

Let me know if you require more information or if you would like me to try anything, I am more than happy to help resolve this issue. Thanks!

Support network bridge mode

The driver currently supports bridge network mode via the task config but not from a driver and task group perspective.

Support the connect demo

job "countdash" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {}
      }
    }

    task "web" {
      driver = "podman"

      config {
        image = "hashicorpnomad/counter-api:v1"
      }
    }
  }

  group "dashboard" {
    network {
      mode = "bridge"

      port "http" {
        static = 9002
        to     = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8081
            }
          }
        }
      }
    }

    task "dashboard" {
      driver = "podman"

      env {
        COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
      }

      config {
        image = "hashicorpnomad/counter-dashboard:v1"
      }
    }
  }
}

alloc logs stop after an alloc restart

nomad alloc logs -f <alloc> then in a separate terminal/process nomad alloc restart <alloc>

nomad alloc status shows running, nomad alloc logs -f <alloc> produces no more logs.

I am documenting this for now but plan to dig into it.

Improve test environment and test code quality

Podman version

We're using a dated podman version in our unit tests. It's installed from the projectatomic ppa but the recommended repo is now hosted at opensuse.org.

Code quality

driver_test has a lot of repetitive boilerplate code; we should refactor some often-used patterns into helper functions. Also there is an unnecessary, hardcoded 2-second timeout for some tests, see #45 (review).

Driver crash when using an entrypoint

I'm trying to spawn a docker container. My task is as follows:

    task "frontend" {
      driver = "podman"
      config {
        image = "docker-hub/prod/telescope-frontend:8d2059f411a4"
        entrypoint = "/entrypoint.sh"
        port_map = {
          http = 80
        }
    }

I tried to specify the entrypoint because I was getting a stderr F env: can't execute 'sh': No such file or directory type of error when I spawned it. Adding it caused the plugin to crash with the following error:

 2020-08-18T20:56:32.721Z [DEBUG] client.driver_mgr: waiting for RPC address: driver=podman path=/opt/nomad/plugins/nomad-driver-podman
 2020-08-18T20:56:32.731Z [DEBUG] client.driver_mgr: using plugin: driver=podman version=2
 2020-08-18T20:56:32.731Z [DEBUG] client.driver_mgr.nomad-driver-podman: plugin address: driver=podman address=/tmp/plugin581382622 network=unix timestamp=2020-08-18T20:56:32.730Z
 2020-08-18T20:56:32.738Z [DEBUG] client.driver_mgr.nomad-driver-podman: Get podman info: driver=podman @module=podmanClient timestamp=2020-08-18T20:56:32.738Z
 2020-08-18T20:56:32.742Z [DEBUG] client.driver_mgr.nomad-driver-podman: Inspect image: driver=podman @module=podmanClient image=docker-hub/prod/telescope-frontend:8d2059f411a4 timestamp=2020-08-18T20:56:32.739Z
 2020-08-18T20:56:32.742Z [DEBUG] client.driver_mgr.nomad-driver-podman: Image created: driver=podman img_id=8d2059f411a41384ee67194c46cb2570fcb749e7c02fd99c052ef8fd1cbbf3de @module=podman config="map[Cmd:[sh /entrypoint.sh] Entrypoint:[/docker-entrypoint.sh] Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/u>
 2020-08-18T20:56:32.743Z [DEBUG] client.driver_mgr.nomad-driver-podman: created/pulled image: driver=podman @module=podman img_id=8d2059f411a41384ee67194c46cb2570fcb749e7c02fd99c052ef8fd1cbbf3de timestamp=2020-08-18T20:56:32.742Z
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman: panic: runtime error: invalid memory address or nil pointer dereference: driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xb57e32]: driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman: : driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman: goroutine 34 [running]:: driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman: main.(*Driver).StartTask(0xc0002d8230, 0xc000192200, 0xc0001c2e00, 0x644, 0x644, 0xccd8c0): driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman:         /home/colum/workspace/terraform/puppet/tmp/nomad-driver-podman-0.1.0/driver.go:372 +0x3c2: driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman: github.com/hashicorp/nomad/plugins/drivers.(*driverPluginServer).StartTask(0xc00028c860, 0xeb1780, 0xc0001827b0, 0xc0001827e0, 0xc00028c860, 0xc0001827b0, 0xc000061ba0): driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman:         /home/colum/workspace/terraform/puppet/tmp/nomad-driver-podman-0.1.0/build/pkg/mod/github.com/hashicorp/[email protected]/plugins/drivers/server.go:105 +0x60: driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman: github.com/hashicorp/nomad/plugins/drivers/proto._Driver_StartTask_Handler(0xd0aa20, 0xc00028c860, 0xeb1780, 0xc0001827b0, 0xc0001805a0, 0x0, 0xeb1780, 0xc0001827b0, 0xc0001c2e00, 0x644): driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman:         /home/colum/workspace/terraform/puppet/tmp/nomad-driver-podman-0.1.0/build/pkg/mod/github.com/hashicorp/[email protected]/plugins/drivers/proto/driver.pb.go:4260 +0x217: driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman: google.golang.org/grpc.(*Server).processUnaryRPC(0xc0002d64e0, 0xebf940, 0xc000001980, 0xc000192100, 0xc0002bdb60, 0x1457668, 0x0, 0x0, 0x0): driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman:         /home/colum/workspace/terraform/puppet/tmp/nomad-driver-podman-0.1.0/build/pkg/mod/google.golang.org/[email protected]/server.go:1082 +0x50a: driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman: google.golang.org/grpc.(*Server).handleStream(0xc0002d64e0, 0xebf940, 0xc000001980, 0xc000192100, 0x0): driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman:         /home/colum/workspace/terraform/puppet/tmp/nomad-driver-podman-0.1.0/build/pkg/mod/google.golang.org/[email protected]/server.go:1405 +0xccb: driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman: google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc000036040, 0xc0002d64e0, 0xebf940, 0xc000001980, 0xc000192100): driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman:         /home/colum/workspace/terraform/puppet/tmp/nomad-driver-podman-0.1.0/build/pkg/mod/google.golang.org/[email protected]/server.go:746 +0xa1: driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman: created by google.golang.org/grpc.(*Server).serveStreams.func1: driver=podman
 2020-08-18T20:56:32.745Z [DEBUG] client.driver_mgr.nomad-driver-podman:         /home/colum/workspace/terraform/puppet/tmp/nomad-driver-podman-0.1.0/build/pkg/mod/google.golang.org/[email protected]/server.go:744 +0xa1: driver=podman
 2020-08-18T20:56:32.746Z [DEBUG] client.driver_mgr: plugin process exited: driver=podman path=/opt/nomad/plugins/nomad-driver-podman pid=92308 error="exit status 2"
 2020-08-18T20:56:32.746Z [WARN]  client.driver_mgr: failed to reattach to plugin, starting new instance: driver=podman err="singleton plugin exited"
 2020-08-18T20:56:32.746Z [DEBUG] client.driver_mgr: starting plugin: driver=podman path=/opt/nomad/plugins/nomad-driver-podman args=[/opt/nomad/plugins/nomad-driver-podman]
 2020-08-18T20:56:32.746Z [WARN]  client.driver_mgr: received fingerprint error from driver: driver=podman error="plugin is shut down"

APIVersion field in Version struct not compatible with podman 2.0.5

Hello, thank you for this awesome plugin!

First, I know why it's happening and fixing it only involves changing two characters in the README.

The problem

When running Nomad with the podman driver, the driver is not activated and I get the following error via journalctl --unit=nomad.service:

client.driver_mgr.nomad-driver-podman: Could not get podman info: driver=podman err="json: cannot unmarshal number into Go struct field Version.version.APIVersion of type string"

Environment:

  • nomad version: 1.0.1
  • driver version: 0.2.0
  • podman version: 2.0.5
  • redhat linux 8

It seems Nomad will nevertheless detect "docker support" when the podman v2 REST socket is activated.

The reason

Looking at the podman source, commit containers/podman@c4b49afad, which is part of the v2.1.0 release, changed the APIVersion field from an int64 to a string. The Version data type in api/structs.go appears to be the corresponding struct in the driver. Since the structs were incompatible before 2.1.0 and I'm using 2.0.5, this is the source of the error.
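To make the mismatch concrete, here is a minimal, self-contained Go sketch; the Version struct below is an assumption modeled on the error message, not the driver's actual api/structs.go:

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // Assumed stand-in for the driver's struct: APIVersion is a string,
    // matching what podman >= 2.1.0 returns.
    type Version struct {
        APIVersion string `json:"APIVersion"`
    }

    func main() {
        // podman 2.0.5 still reports APIVersion as a number.
        old := []byte(`{"APIVersion": 2}`)

        var v Version
        if err := json.Unmarshal(old, &v); err != nil {
            // Prints: json: cannot unmarshal number into Go struct field
            // Version.APIVersion of type string
            fmt.Println(err)
        }
    }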

The solution

I don't expect the driver to work with every version of podman, but it would be nice if the README specified that the minimal working version is 2.1.0 rather than 2.x, as it currently says. It might be possible to work around the incompatibility by defining the decoders differently, but I don't know how, and it's not that important. So I recommend updating the doc.
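For completeness, here is a hedged sketch of what "defining the decoders differently" could look like: a custom UnmarshalJSON that accepts both the numeric form (podman < 2.1.0) and the string form (podman >= 2.1.0). The flexibleVersion type is an assumption for illustration, not the driver's implementation:

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // flexibleVersion is a hypothetical field type that tolerates both JSON
    // numbers (podman < 2.1.0) and JSON strings (podman >= 2.1.0).
    type flexibleVersion string

    func (f *flexibleVersion) UnmarshalJSON(data []byte) error {
        // Try the string form first.
        var s string
        if err := json.Unmarshal(data, &s); err == nil {
            *f = flexibleVersion(s)
            return nil
        }
        // Fall back to the numeric form.
        var n json.Number
        if err := json.Unmarshal(data, &n); err != nil {
            return fmt.Errorf("APIVersion is neither string nor number: %w", err)
        }
        *f = flexibleVersion(n.String())
        return nil
    }

    type Version struct {
        APIVersion flexibleVersion `json:"APIVersion"`
    }

    func main() {
        for _, raw := range []string{`{"APIVersion": 2}`, `{"APIVersion": "3.0.0"}`} {
            var v Version
            if err := json.Unmarshal([]byte(raw), &v); err != nil {
                fmt.Println("error:", err)
                continue
            }
            fmt.Println("parsed APIVersion:", string(v.APIVersion))
        }
    }

With a decoder along these lines, a driver could accept both response shapes instead of requiring podman 2.1.0 or newer.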

Thank you for your work!

job HCL at https://www.nomadproject.io/docs/drivers/podman doesn't work

Running the job at https://www.nomadproject.io/docs/drivers/podman, I get the following output:

#  nomad job status example
ID            = example
Name          = example
Submit Date   = 2020-12-31T11:01:12-06:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = pending
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         0        7       1         0

Future Rescheduling Attempts
Task Group  Eval ID   Eval Time
cache       b85161ee  8s from now

Latest Deployment
ID          = 98f44b15
Status      = running
Description = Deployment is running

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
cache       1        2       0        2          2020-12-31T11:11:12-06:00

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
31a8f33b  4072e015  cache       3        run      failed    52s ago    48s ago
 
# nomad alloc status 9302e0ec
ID                   = 9302e0ec-787d-81a8-ee8e-34a9a837d8dd
Eval ID              = 0526f33d
Name                 = example.cache[0]
Node ID              = 4072e015
Node Name            = dc1
Job ID               = example
Job Version          = 5
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = stop
Desired Description  = alloc was rescheduled because it failed
Created              = 1m29s ago
Modified             = 6s ago
Deployment ID        = fc5bf8f0
Deployment Health    = unhealthy
Replacement Alloc ID = 31e7dd56

Task "redis" is "dead"
Task Resources
CPU        Memory       Disk     Addresses
0/500 MHz  0 B/256 MiB  300 MiB  db: 172.19.64.3:26028

Task Events:
Started At     = 2020-12-31T17:04:15Z
Finished At    = 2020-12-31T17:04:50Z
Total Restarts = 1
Last Restart   = 2020-12-31T11:04:21-06:00

Recent Events:
Time                       Type             Description
2020-12-31T11:04:52-06:00  Killing          Sent interrupt. Waiting 5s before force killing
2020-12-31T11:04:51-06:00  Alloc Unhealthy  Unhealthy because of failed task
2020-12-31T11:04:51-06:00  Killing          Sent interrupt. Waiting 5s before force killing
2020-12-31T11:04:50-06:00  Not Restarting   Error was unrecoverable
2020-12-31T11:04:50-06:00  Driver Failure   rpc error: code = Unknown desc = failed to start task, could not inspect container : unexpected EOF
2020-12-31T11:04:21-06:00  Restarting       Task restarting in 18.502085747s
2020-12-31T11:04:21-06:00  Terminated       Exit Code: 1
2020-12-31T11:04:15-06:00  Started          Task started by client
2020-12-31T11:04:07-06:00  Driver           Image downloaded: Storing signatures
2020-12-31T11:03:58-06:00  Driver           Downloading image

OS: CentOS Stream release 8
nomad -version: Nomad v1.0.1 (c9c68aa55a7275f22d2338f2df53e67ebfcb9238)
podman: version 2.0.5
nomad-driver-podman version: nomad-driver-podman_0.2.0_linux_amd64
