containers / conmon-rs

An OCI container runtime monitor written in Rust

License: Apache License 2.0

Rust 62.19% Makefile 1.49% Shell 2.47% Cap'n Proto 2.13% Go 30.96% Nix 0.77%
containers kubernetes rust

conmon-rs's People

Contributors

bitoku, cgwalters, dependabot[bot], haircommander, lsm5, martinpitt, mgjm, mrunalp, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, rhatdan, rphillips, saschagrunert, utam0k, wasup-yash

conmon-rs's Issues

Kubernetes e2e test `Kubectl Port forwarding With a server listening on localhost that expects a client request should support a client that connects, sends DATA, and disconnects` fails

What happened?

The test fails in cri-o/cri-o#6220 together with other cases, but I assume that the root cause is the same.

Interestingly, on my local machine the cancellation of the execsync token seems to kill the main container PID.

What did you expect to happen?

That the test succeeds.

How can we reproduce it (as minimally and precisely as possible)?

Running the test

> k8s-test-run --ginkgo.focus "Kubectl Port forwarding With a server listening on localhost that expects a client request should support a client that connects, sends DATA, and disconnects"
…
[BeforeEach] [sig-cli] Kubectl Port forwarding
  set up framework | framework.go:158
STEP: Creating a kubernetes client 09/16/22 11:08:56.88
Sep 16 11:08:56.880: INFO: >>> kubeConfig: /var/run/kubernetes/admin.kubeconfig
STEP: Building a namespace api object, basename port-forwarding 09/16/22 11:08:56.881
STEP: Waiting for a default service account to be provisioned in namespace 09/16/22 11:08:56.885
STEP: Waiting for kube-root-ca.crt to be provisioned in namespace 09/16/22 11:08:56.887
[It] should support a client that connects, sends DATA, and disconnects
  test/e2e/kubectl/portforward.go:481
STEP: Creating the target pod 09/16/22 11:08:56.888
Sep 16 11:08:56.892: INFO: Waiting up to 5m0s for pod "pfpod" in namespace "port-forwarding-8233" to be "running and ready"
Sep 16 11:08:56.893: INFO: Pod "pfpod": Phase="Pending", Reason="", readiness=false. Elapsed: 1.353931ms
Sep 16 11:08:56.893: INFO: The phase of Pod pfpod is Pending, waiting for it to be Running (with Ready = true)
Sep 16 11:08:58.896: INFO: Pod "pfpod": Phase="Running", Reason="", readiness=false. Elapsed: 2.003699716s
Sep 16 11:08:58.896: INFO: The phase of Pod pfpod is Running (Ready = false)
Sep 16 11:09:00.896: INFO: Pod "pfpod": Phase="Running", Reason="", readiness=false. Elapsed: 4.003966771s
Sep 16 11:09:00.896: INFO: The phase of Pod pfpod is Running (Ready = false)
Sep 16 11:09:02.895: INFO: Pod "pfpod": Phase="Running", Reason="", readiness=true. Elapsed: 6.003036101s
Sep 16 11:09:02.895: INFO: The phase of Pod pfpod is Running (Ready = true)
Sep 16 11:09:02.895: INFO: Pod "pfpod" satisfied condition "running and ready"
STEP: Running 'kubectl port-forward' 09/16/22 11:09:02.895
Sep 16 11:09:02.895: INFO: starting port-forward command and streaming output
Sep 16 11:09:02.895: INFO: Asynchronously running '/home/sascha/go/src/k8s.io/kubernetes/_output/local/bin/linux/amd64/kubectl kubectl --server=https://localhost:6443/ --kubeconfig=/var/run/kubernetes/admin.kubeconfig --namespace=port-forwarding-8233 port-forward --namespace=port-forwarding-8233 pfpod :80'
Sep 16 11:09:02.895: INFO: reading from `kubectl port-forward` command's stdout
STEP: Dialing the local port 09/16/22 11:09:02.951
STEP: Sending the expected data to the local port 09/16/22 11:09:02.952
STEP: Reading data from the local port 09/16/22 11:09:02.952
STEP: Closing the write half of the client's connection 09/16/22 11:09:04.859
STEP: Waiting for the target pod to stop running 09/16/22 11:09:04.859
Sep 16 11:09:04.859: INFO: Waiting up to 5m0s for pod "pfpod" in namespace "port-forwarding-8233" to be "container terminated"
Sep 16 11:09:04.860: INFO: Pod "pfpod": Phase="Running", Reason="", readiness=false. Elapsed: 1.185975ms
Sep 16 11:09:04.860: INFO: Pod "pfpod" satisfied condition "container terminated"
STEP: Verifying logs 09/16/22 11:09:04.86
…
[ hangs for 120s ]

We can see that the port-forwarder is dead within the container:

> kubectl get pods -A
NAMESPACE              NAME                      READY   STATUS    RESTARTS   AGE
kube-system            coredns-567b6dd84-bt6pf   1/1     Running   0          2m50s
port-forwarding-5552   pfpod                     0/2     Error     0          12s
> ps aux | rg agnhost
root     1000821  0.0  0.0 740048 24788 ?        Ssl  11:10   0:00 /agnhost netexec

And the conmonrs logs also indicate that it got terminated:

> sudo journalctl -f _COMM=conmonrs --since=now
conmonrs[1028242]: Using systemd/journald logger
conmonrs[1028242]: Set log level to: trace
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: registering event source with poller: token=Token(1), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(2), interests=READABLE | WRITABLE
conmonrs[1028243]: Got a version request
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: registering event source with poller: token=Token(16777218), interests=READABLE | WRITABLE
conmonrs[1028243]: Got a version request
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: registering event source with poller: token=Token(33554434), interests=READABLE | WRITABLE
conmonrs[1028243]: Got a create container request
conmonrs[1028243]: Creating new IO streams
conmonrs[1028243]: PID file is /run/containers/storage/overlay-containers/1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7/userdata/pidfile
conmonrs[1028243]: Runtime args "--root=/run/runc create --bundle /run/containers/storage/overlay-containers/1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7/userdata --pid-file /run/containers/storage/overlay-containers/1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7/userdata/pidfile 1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7"
conmonrs[1028243]: Initializing CRI logger in path /var/log/pods/port-forwarding-2789_pfpod_39410d99-a33c-4a12-bd3b-0a47e1fb0d9c/readiness/0.log
conmonrs[1028243]: registering event source with poller: token=Token(3), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(4), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(5), interests=READABLE | WRITABLE
conmonrs[1028243]: Start reading from IO streams
conmonrs[1028243]: Running task
conmonrs[1028243]: Using cgroup path: /proc/1028268/cgroup
conmonrs[1028243]: Waiting for exit code
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Setup cgroup v2 handling
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: Read 150 bytes
conmonrs[1028243]: Wrote log line of length 120
conmonrs[1028243]: Wrote log line of length 120
conmonrs[1028243]: registering event source with poller: token=Token(50331650), interests=READABLE | WRITABLE
conmonrs[1028243]: Got a create container request
conmonrs[1028243]: Creating new IO streams
conmonrs[1028243]: PID file is /run/containers/storage/overlay-containers/a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d/userdata/pidfile
conmonrs[1028243]: Runtime args "--root=/run/runc create --bundle /run/containers/storage/overlay-containers/a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d/userdata --pid-file /run/containers/storage/overlay-containers/a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d/userdata/pidfile a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d"
conmonrs[1028243]: Initializing CRI logger in path /var/log/pods/port-forwarding-2789_pfpod_39410d99-a33c-4a12-bd3b-0a47e1fb0d9c/portforwardtester/0.log
conmonrs[1028243]: registering event source with poller: token=Token(6), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(7), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(8), interests=READABLE | WRITABLE
conmonrs[1028243]: Start reading from IO streams
conmonrs[1028243]: Running task
conmonrs[1028243]: Using cgroup path: /proc/1028312/cgroup
conmonrs[1028243]: Waiting for exit code
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Setup cgroup v2 handling
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: registering event source with poller: token=Token(67108866), interests=READABLE | WRITABLE
conmonrs[1028243]: Got exec sync container request with timeout 60
conmonrs[1028243]: Creating new IO streams
conmonrs[1028243]: Exec args "--root=/run/runc exec -d --pid-file=/run/containers/storage/overlay-containers/f5a4cb30d7219327b1fb12d111a14137ba65b5ebbfaaa50e4ba980070c4ad7bf/userdata/exec_syncAZSUifMpid 1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7 sh -c netstat -na | grep LISTEN | grep -v 8080 | grep 80"
conmonrs[1028243]: registering event source with poller: token=Token(9), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(10), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(11), interests=READABLE | WRITABLE
conmonrs[1028243]: Start reading from IO streams
conmonrs[1028243]: Running task
conmonrs[1028243]: Waiting for exit code
conmonrs[1028243]: Using cgroup path: /proc/1028958/cgroup
conmonrs[1028243]: Setup cgroup v2 handling
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: Read 81 bytes
conmonrs[1028243]: Exited 0
conmonrs[1028243]: TOKEN: CancellationToken { is_cancelled: false }, PID: 1028958
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Loop cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Exiting because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Done watching for ooms
conmonrs[1028243]: Write to exit paths: 
conmonrs[1028243]: Sending exit struct to channel: ExitChannelData { exit_code: 0, oomed: false, timed_out: false }
conmonrs[1028243]: Task done
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: registering event source with poller: token=Token(83886082), interests=READABLE | WRITABLE
conmonrs[1028243]: Got exec sync container request with timeout 60
conmonrs[1028243]: Creating new IO streams
conmonrs[1028243]: Exec args "--root=/run/runc exec -d --pid-file=/run/containers/storage/overlay-containers/f5a4cb30d7219327b1fb12d111a14137ba65b5ebbfaaa50e4ba980070c4ad7bf/userdata/exec_synczA8mDmspid 1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7 sh -c netstat -na | grep LISTEN | grep -v 8080 | grep 80"
conmonrs[1028243]: registering event source with poller: token=Token(16777225), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(16777227), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(16777226), interests=READABLE | WRITABLE
conmonrs[1028243]: Start reading from IO streams
conmonrs[1028243]: Running task
conmonrs[1028243]: Waiting for exit code
conmonrs[1028243]: Using cgroup path: /proc/1029067/cgroup
conmonrs[1028243]: Setup cgroup v2 handling
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: Read 81 bytes
conmonrs[1028243]: Exited 0
conmonrs[1028243]: TOKEN: CancellationToken { is_cancelled: false }, PID: 1029067
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: Loop cancelled
conmonrs[1028243]: Exiting because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Done watching for ooms
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Write to exit paths: 
conmonrs[1028243]: Sending exit struct to channel: ExitChannelData { exit_code: 0, oomed: false, timed_out: false }
conmonrs[1028243]: Task done
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Read 57 bytes
conmonrs[1028243]: Wrote log line of length 72
conmonrs[1028243]: Wrote log line of length 75
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Stdout read loop failure: send data message: channel closed
conmonrs[1028243]: registering event source with poller: token=Token(16777223), interests=READABLE | WRITABLE
conmonrs[1028243]: Got exec sync container request with timeout 60
conmonrs[1028243]: Creating new IO streams
conmonrs[1028243]: Exec args "--root=/run/runc exec -d --pid-file=/run/containers/storage/overlay-containers/f5a4cb30d7219327b1fb12d111a14137ba65b5ebbfaaa50e4ba980070c4ad7bf/userdata/exec_sync34TJjjHpid 1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7 sh -c netstat -na | grep LISTEN | grep -v 8080 | grep 80"
conmonrs[1028243]: registering event source with poller: token=Token(100663298), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(33554442), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(33554441), interests=READABLE | WRITABLE
conmonrs[1028243]: Start reading from IO streams
conmonrs[1028243]: Running task
conmonrs[1028243]: Waiting for exit code
conmonrs[1028243]: Using cgroup path: /proc/1029207/cgroup
conmonrs[1028243]: Setup cgroup v2 handling
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: Exited 0
conmonrs[1028243]: Read 81 bytes
conmonrs[1028243]: TOKEN: CancellationToken { is_cancelled: false }, PID: 1029207
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Loop cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Exiting because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Done watching for ooms
conmonrs[1028243]: Write to exit paths: 
conmonrs[1028243]: Sending exit struct to channel: ExitChannelData { exit_code: 0, oomed: false, timed_out: false }
conmonrs[1028243]: Task done
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Exited 2
conmonrs[1028243]: TOKEN: CancellationToken { is_cancelled: false }, PID: 1028312
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: Loop cancelled
conmonrs[1028243]: Exiting because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Stderr read loop failure: send done message: channel closed
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Done watching for ooms
conmonrs[1028243]: Write to exit paths: /var/run/crio/exits/a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d, /var/lib/containers/storage/overlay-containers/a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d/userdata/exit
conmonrs[1028243]: Creating exit file
conmonrs[1028243]: Creating exit file
conmonrs[1028243]: Writing exit code to file
conmonrs[1028243]: Writing exit code to file
conmonrs[1028243]: Flushing file
conmonrs[1028243]: Flushing file
conmonrs[1028243]: Done writing exit file
conmonrs[1028243]: Done writing exit file
conmonrs[1028243]: Sending exit struct to channel: ExitChannelData { exit_code: 2, oomed: false, timed_out: false }

Anything else we need to know?

Interestingly, when running the workload directly from the YAML, it works:

apiVersion: v1
kind: Pod
metadata:
  labels:
    name: pfpod
  name: pfpod
spec:
  containers:
  - args:
    - netexec
    image: registry.k8s.io/e2e-test-images/agnhost:2.40
    imagePullPolicy: IfNotPresent
    name: readiness
    readinessProbe:
      exec:
        command:
        - sh
        - -c
        - netstat -na | grep LISTEN | grep -v 8080 | grep 80
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 1
      successThreshold: 1
      timeoutSeconds: 60
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-q2nfb
      readOnly: true
  - args:
    - port-forward-tester
    env:
    - name: BIND_PORT
      value: "80"
    - name: EXPECTED_CLIENT_DATA
      value: abc
    - name: CHUNKS
      value: "10"
    - name: CHUNK_SIZE
      value: "10"
    - name: CHUNK_INTERVAL
      value: "100"
    - name: BIND_ADDRESS
      value: localhost
    image: registry.k8s.io/e2e-test-images/agnhost:2.40
    imagePullPolicy: IfNotPresent
    name: portforwardtester
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-q2nfb
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: 127.0.0.1
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-q2nfb
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
> k get pods
NAME    READY   STATUS    RESTARTS   AGE
pfpod   2/2     Running   0          24s

Removing the readiness probe in the test makes it pass on my local machine:
https://github.com/kubernetes/kubernetes/blob/02ac8ac4181e179c2f030a9f8b1abef0d9a0b512/test/e2e/kubectl/portforward.go#L77-L87

So it looks like the first execsync request kills the portforwardtester container.

conmon-rs version

$ conmonrs --version
version: 0.2.0
tag: none
commit: 51eb41deca7402edb0eb8b75b965283790dbf299
build: 2022-09-15 07:32:52 +00:00
target: x86_64-unknown-linux-gnu
rustc 1.63.0 (4b91a6ea7 2022-08-08)
cargo 1.63.0 (fd9c4297c 2022-07-01)

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

Additional environment details (AWS, VirtualBox, physical, etc.)

Increase unit test coverage

We should actively work on increasing the unit test coverage by using the mockall framework. This should be done in the same way as we did it for the server init:

#[cfg(test)]
use mockall::{automock, predicate::*};

#[cfg(test)]
mod tests {
    use super::*;
    use std::{ptr, str};
    use tempfile::tempfile;

    fn new_sut(mock: MockInitImpl) -> Init<MockInitImpl> {
        Init::<MockInitImpl> { imp: mock }
    }

    #[test]
    fn unset_locale() -> Result<()> {
        let mut mock = MockInitImpl::new();
        mock.expect_setlocale()
            .withf(|x, _| *x == LC_ALL)
            .returning(|_, _| ptr::null_mut());
        let sut = new_sut(mock);
        sut.unset_locale()
    }
}
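For context, this pattern assumes a small trait wrapping the external calls so that automock can generate a MockInitImpl; the following is a minimal sketch of such a trait and the generic struct it plugs into (the exact signatures are assumptions derived from the test above, not the actual source):

use libc::{c_char, c_int};
#[cfg(test)]
use mockall::automock;

/// Thin abstraction over the libc/gettext calls so tests can stub them out.
#[cfg_attr(test, automock)]
pub trait InitImpl {
    fn setlocale(&self, category: c_int, locale: *const c_char) -> *mut c_char;
}

/// The init logic is generic over the implementation, so tests construct it
/// with the generated MockInitImpl (see new_sut above).
pub struct Init<T: InitImpl> {
    pub imp: T,
}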

Memory usage after first tonic request

I was adding an RPC request to return RSS memory and noticed that the server's RSS usage goes up significantly after the first gRPC request.

Terminal 1:

# target/release/conmon-server --runtime /bin/ls

Terminal 2:

# cat /proc/$(pidof conmon-server)/status|grep RSS
VmRSS:      1084 kB
# target/release/conmon-client
Version: Response { metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Wed, 10 Nov 2021 21:55:01 GMT", "grpc-status": "0"} }, message: VersionResponse { version: "0.1.0" }, extensions: Extensions }
# cat /proc/$(pidof conmon-server)/status|grep RSS
VmRSS:      3464 kB
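For reference, such an RSS lookup can be implemented by parsing /proc/self/status; the following is a minimal sketch (the helper name is hypothetical, not the actual RPC):

use std::fs;

/// Hypothetical helper: read the VmRSS value (in kB) from /proc/self/status.
fn vm_rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|kb| kb.parse().ok())
}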

RPM Packaging

What happened?

I received some tips for packaging conmon-rs RPMs from @cgwalters. Generally, bootupd looks like a good example: the bootupd GitHub releases contain a source tarball and a vendor tarball, and bootupd.spec uses both tarballs to generate the resulting RPM.

Starting points could be:

  • On each conmon-rs tag, create a release with a vendor tarball and a source tarball using a GitHub Action

What did you expect to happen?

None

How can we reproduce it (as minimally and precisely as possible)?

None

Anything else we need to know?

No response

conmon-rs version

NONE

$ conmonrs --version
# paste output here

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

Additional environment details (AWS, VirtualBox, physical, etc.)

go.mod: do not depend on github.com/containers/podman

Given that Podman likely wants to import this Go client library as well (containers/podman#14804), this library should not depend on any Podman code. While it can work, the circular dependency is just asking for trouble when Podman tries to make breaking changes to the code.

I propose that we move the required podman code to c/common and then podman and conmon-rs can use it without trouble.

Catch container OOM

The current implementation of conmon listens to the container's cgroup.events file to check whether a container OOMed after it died. We should do a similar thing in conmon-rs.
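On cgroup v2 an oom_kill counter is exposed in the cgroup's memory.events file; the following is a minimal sketch of such a check (the helper name and approach are assumptions, not the conmon implementation):

use std::{fs, path::Path};

/// Hypothetical check: did the cgroup report any OOM kills?
fn container_oomed(cgroup_path: &Path) -> bool {
    let events = match fs::read_to_string(cgroup_path.join("memory.events")) {
        Ok(contents) => contents,
        Err(_) => return false,
    };
    events.lines().any(|line| {
        // Lines look like "oom_kill 1"; a non-zero counter means an OOM kill happened.
        let mut parts = line.split_whitespace();
        parts.next() == Some("oom_kill")
            && parts.next().and_then(|n| n.parse::<u64>().ok()).unwrap_or(0) > 0
    })
}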

Make stdout log driver with non terminal containers work

The issue is that we disconnect stdout and stderr for non-terminal containers in streams.rs. This causes the stdout logger to no longer work, since the log crate relies on stdio. We should investigate how to improve that.

Attach detaches from container too early

What happened?

I am currently trying to integrate conmon-rs into Podman. I am trying to get the attach functionality working. When I run the command sudo bin/podman run --log-driver=k8s-file --name=demo -d alpine top, the container successfully starts and the logs display that there are bytes being read:

2022-08-11T18:14:42.317436Z DEBUG backend:create_container{container_id="a186f94a8b5cd7953e3fe191a0bbf2dfbb8e4199c2dd3c1dd3a05c6ac44674f5" uuid="1c4c62f4-c230-485f-895e-aee634fe43a6"}:promise:stdout: conmonrs::container_io: 238: Read 304 bytes
2022-08-11T18:14:47.322829Z DEBUG backend:create_container{container_id="a186f94a8b5cd7953e3fe191a0bbf2dfbb8e4199c2dd3c1dd3a05c6ac44674f5" uuid="1c4c62f4-c230-485f-895e-aee634fe43a6"}:promise:stdout: conmonrs::container_io: 238: Read 304 bytes
2022-08-11T18:14:52.327820Z DEBUG backend:create_container{container_id="a186f94a8b5cd7953e3fe191a0bbf2dfbb8e4199c2dd3c1dd3a05c6ac44674f5" uuid="1c4c62f4-c230-485f-895e-aee634fe43a6"}:promise:stdout: conmonrs::container_io: 238: Read 304 bytes

When I try to attach to the container with sudo bin/podman attach demo, it attaches and briefly displays the output of top:

Mem: 15176388K used, 50390388K free, 1378696K shrd, 4192K buff, 9559136K cached
CPU:   1% usr   0% sys   0% nic  98% idle   0% io   0% irq   0% sirq
Load average: 0.26 0.45 0.47 1/1736 1
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    1     0 root     R     1604   0%  10   0% top

However, after this output displays, the container immediately detaches even though the logs show that conmon-rs is still reading bytes.

Log output when attaching:

2022-08-11T18:16:01.775594Z DEBUG backend: conmonrs::rpc: 50: Got a version request
2022-08-11T18:16:01.775814Z DEBUG backend:attach_container{container_id="a186f94a8b5cd7953e3fe191a0bbf2dfbb8e4199c2dd3c1dd3a05c6ac44674f5" uuid="fe2068d0-b224-47ac-9a90-a321b23225ef"}: conmonrs::rpc: 238: Got a attach container request
2022-08-11T18:16:01.775826Z DEBUG backend:attach_container{container_id="a186f94a8b5cd7953e3fe191a0bbf2dfbb8e4199c2dd3c1dd3a05c6ac44674f5" uuid="fe2068d0-b224-47ac-9a90-a321b23225ef"}:promise: conmonrs::attach: 116: Creating attach socket: /var/lib/containers/storage/overlay-containers/a186f94a8b5cd7953e3fe191a0bbf2dfbb8e4199c2dd3c1dd3a05c6ac44674f5/userdata/attach
2022-08-11T18:16:01.775897Z DEBUG backend:attach_container{container_id="a186f94a8b5cd7953e3fe191a0bbf2dfbb8e4199c2dd3c1dd3a05c6ac44674f5" uuid="fe2068d0-b224-47ac-9a90-a321b23225ef"}:promise:attach: conmonrs::attach: 158: Start listening on attach socket
2022-08-11T18:16:01.775968Z DEBUG backend:attach_container{container_id="a186f94a8b5cd7953e3fe191a0bbf2dfbb8e4199c2dd3c1dd3a05c6ac44674f5" uuid="fe2068d0-b224-47ac-9a90-a321b23225ef"}:promise:attach: conmonrs::attach: 163: Got new attach stream connection
2022-08-11T18:16:02.402098Z DEBUG backend:create_container{container_id="a186f94a8b5cd7953e3fe191a0bbf2dfbb8e4199c2dd3c1dd3a05c6ac44674f5" uuid="1c4c62f4-c230-485f-895e-aee634fe43a6"}:promise:stdout: conmonrs::container_io: 238: Read 304 bytes
2022-08-11T18:16:02.402554Z DEBUG backend:attach_container{container_id="a186f94a8b5cd7953e3fe191a0bbf2dfbb8e4199c2dd3c1dd3a05c6ac44674f5" uuid="fe2068d0-b224-47ac-9a90-a321b23225ef"}:promise:attach:write_loop: conmonrs::attach: 254: Wrote stdout packet 0/1 to client
2022-08-11T18:16:02.402594Z DEBUG backend:attach_container{container_id="a186f94a8b5cd7953e3fe191a0bbf2dfbb8e4199c2dd3c1dd3a05c6ac44674f5" uuid="fe2068d0-b224-47ac-9a90-a321b23225ef"}:promise:attach:write_loop: conmonrs::attach: 254: Wrote stdout packet 1/1 to client
2022-08-11T18:16:02.402956Z DEBUG backend:attach_container{container_id="a186f94a8b5cd7953e3fe191a0bbf2dfbb8e4199c2dd3c1dd3a05c6ac44674f5" uuid="fe2068d0-b224-47ac-9a90-a321b23225ef"}:promise:attach:read_loop: conmonrs::attach: 203: Stopping read loop because no more data to read
2022-08-11T18:16:07.407616Z DEBUG backend:create_container{container_id="a186f94a8b5cd7953e3fe191a0bbf2dfbb8e4199c2dd3c1dd3a05c6ac44674f5" uuid="1c4c62f4-c230-485f-895e-aee634fe43a6"}:promise:stdout: conmonrs::container_io: 238: Read 304 bytes
2022-08-11T18:16:12.413399Z DEBUG backend:create_container{container_id="a186f94a8b5cd7953e3fe191a0bbf2dfbb8e4199c2dd3c1dd3a05c6ac44674f5" uuid="1c4c62f4-c230-485f-895e-aee634fe43a6"}:promise:stdout: conmonrs::container_io: 238: Read 304 bytes

Matt Heon and I believe there is a synchronization issue somewhere on the server side.

What did you expect to happen?

I expected the container to remain attached and to continue outputting the top data.

How can we reproduce it (as minimally and precisely as possible)?

The code I've been writing to integrate conmon-rs into Podman can be found in this PR: containers/podman#14930

Anything else we need to know?

This has been tried with prior versions of conmon-rs and didn't work either.

conmon-rs version

$ conmonrs --version
version: 0.1.0
tag: none
commit: 55285cb357adbbdf4ebd8d54076ca8db6da69682
build: 2022-08-11 18:26:23 +00:00
rustc 1.62.0 (a8314ef7d 2022-06-27)

OS version

# On Linux:
$ cat /etc/os-release
NAME="Fedora Linux"
VERSION="36 (Workstation Edition)"
ID=fedora
VERSION_ID=36
VERSION_CODENAME=""
PLATFORM_ID="platform:f36"
PRETTY_NAME="Fedora Linux 36 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:36"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f36/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=36
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=36
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Workstation Edition"
VARIANT_ID=workstation
$ uname -a
Linux fedora 5.18.16-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Aug 3 15:44:49 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Additional environment details (AWS, VirtualBox, physical, etc.)

$ podman version

Client: Podman Engine
Version: 4.2.0-dev
API Version: 4.2.0-dev
Go Version: go1.18.2
Git Commit: 1ada01a038fceaa9e94beb2de6e4593df03be7fa-dirty
Built: Tue Jun 7 12:48:55 2022
OS/Arch: linux/amd64

$ go version
go version go1.18.4 linux/amd64

$ rustup --version
rustup 1.24.3 (ce5817a94 2021-05-31)

$ rustc --version
rustc 1.62.0 (a8314ef7d 2022-06-27)

Add support for OpenTelemetry tracing

We're already using the tracing crate, which has support for Open Telemetry (OTEL) as well: https://github.com/tokio-rs/tracing/tree/master/tracing-opentelemetry

Using the crate would probably mean adding those, too:

Independently of how we want to use it from a configuration perspective, having OTEL support would increase the observability of the runtime stack (together with CRI-O) by providing much more insight into what is going on in conmon-rs.

Experiment moving pinns functionality into conmonrs

pinns is a utility called by CRI-O to create pod-level namespaces. It would be cool if conmon-rs could create those namespaces for the pod. Most notably, if it could manage a pause process to hold open the pod's PID namespace, then CRI-O could get rid of the infra container.

Rework README.md

We should give the README.md some love and reflect the current state of the project.

critest attach test fails

The critest conformance test `runtime should support attach [Conformance]` fails by timing out:

https://github.com/kubernetes-sigs/cri-tools/blob/1a0b272405a694d7d82606c5a7eddd7905321c4b/pkg/validate/streaming.go#L109-L122

Right now the test got excluded by CRI-O: https://github.com/cri-o/cri-o/blob/db76322414a7888c7642b707e07dbb9905aa969f/.github/workflows/integration.yml#L57

We either have to fix that on the CRI-O or the conmon-rs side. From the logs I see that conmonrs gets the data from the test:

May 11 10:41:06 nixos conmonrs[504862]: Got a attach container request
May 11 10:41:06 nixos conmonrs[504862]: Creating attach socket: /var/lib/containers/storage/overlay-containers/c16e58ee6177513959550004b949d71472b0cca52ca603bd3764195d22590b59/userdata/ee14eb08f450ffda35dd689411663b8b19d3d7947e695844e49ecca5ce32dc0d/attach
May 11 10:41:06 nixos conmonrs[504862]: Start listening on attach socket
May 11 10:41:06 nixos conmonrs[504862]: Got new attach stream connection
May 11 10:41:07 nixos conmonrs[504862]: Read 14 stdin bytes from client
May 11 10:41:07 nixos conmonrs[504862]: fd:23:read 5 bytes
May 11 10:41:08 nixos conmonrs[504862]: Wrote stdout packet to client

s390x RPM package builds are failing

What happened?

Minor issue on s390x packaging builds.

   Compiling conmonrs v0.1.0 (/builddir/build/BUILD/conmon-rs/conmon-rs/server)
error[E0308]: mismatched types
  --> conmon-rs/server/src/oom_watcher.rs:19:48
   |
19 | pub const CGROUP2_SUPER_MAGIC: FsType = FsType(libc::CGROUP2_SUPER_MAGIC as i64);
   |                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `u32`, found `i64`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `conmonrs` due to previous error
make: *** [Makefile:17: release] Error 101
error: Bad exit status from /var/tmp/rpm-tmp.iFW7as (%build)

What did you expect to happen?

I expect s390x packages to build correctly.

How can we reproduce it (as minimally and precisely as possible)?

N/A

Anything else we need to know?

No response

conmon-rs version

None

$ conmonrs --version
# paste output here

OS version

None

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

Additional environment details (AWS, VirtualBox, physical, etc.)

None

gettext-sys compile times

Is there a way to get rid of gettext-sys? We use it for setting the locale, but the compile times with it seem to be quite long.
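If the only reason for the dependency is the setlocale call, one alternative (a sketch, assuming direct libc usage is acceptable; the helper name is illustrative) would be:

use libc::{setlocale, LC_ALL};
use std::ffi::CString;

/// Hypothetical replacement for the gettext-sys based locale setup.
fn set_default_locale() {
    // An empty locale string selects the locale from the environment (LANG/LC_*).
    let empty = CString::new("").expect("valid C string");
    unsafe {
        setlocale(LC_ALL, empty.as_ptr());
    }
}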

release-static failing

Which jobs are failing?

release-static

Which tests are failing?

N/A

Since when has it been failing?

Not sure, just noticed this morning

Reason for failure (if possible)

  = note: "cc" "-m64" "/home/runner/work/conmon-rs/conmon-rs/target/x86_64-unknown-linux-gnu/release/deps/conmonrs-988be6b2ab5ecfbe.conmonrs-4c05c21a52340583.3ioxwzlr70pbvyba.rcgu.o.rcgu.o" "-Wl,--as-needed" "-L" "/home/runner/work/conmon-rs/conmon-rs/target/x86_64-unknown-linux-gnu/release/deps" "-L" "/home/runner/work/conmon-rs/conmon-rs/target/release/deps" "-L" "src/backend/linux_raw/arch/outline/release" "-L" "/home/runner/work/conmon-rs/conmon-rs/target/x86_64-unknown-linux-gnu/release/build/libgit2-sys-adb848e1e8fe704a/out/build" "-L" "/home/runner/.rustup/toolchains/1.58.1-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-Wl,-Bstatic" "/tmp/rustcOeSN7b/liblibgit2_sys-9c0b940abda9ef93.rlib" "/tmp/rustcOeSN7b/librustix-371ab8aa59bbc384.rlib" "-Wl,--start-group" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-lgcc_eh" "-lgcc" "-Wl,--end-group" "/home/runner/.rustup/toolchains/1.58.1-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-5667a4a7e2c48d47.rlib" "-Wl,-Bdynamic" "-lc" "-lz" "-Wl,--eh-frame-hdr" "-Wl,-znoexecstack" "-L" "/home/runner/.rustup/toolchains/1.58.1-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-o" "/home/runner/work/conmon-rs/conmon-rs/target/x86_64-unknown-linux-gnu/release/deps/conmonrs-988be6b2ab5ecfbe" "-Wl,--gc-sections" "-static" "-no-pie" "-Wl,-zrelro,-znow" "-nodefaultlibs"
  = note: /usr/bin/ld: /home/runner/work/conmon-rs/conmon-rs/target/x86_64-unknown-linux-gnu/release/deps/conmonrs-988be6b2ab5ecfbe.conmonrs-4c05c21a52340583.3ioxwzlr70pbvyba.rcgu.o.rcgu.o: in function `<std::sys_common::net::LookupHost as core::convert::TryFrom<(&str,u16)>>::try_from':
          /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b//library/std/src/sys_common/net.rs:191: warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
          /usr/bin/ld: attempted static link of dynamic object `/lib/x86_64-linux-gnu/libc.so.6'
          /usr/bin/ld: attempted static link of dynamic object `/lib64/ld-linux-x86-64.so.2'
          /usr/bin/ld: attempted static link of dynamic object `/usr/lib/gcc/x86_64-linux-gnu/11/../../../x86_64-linux-gnu/libz.so'
          collect2: error: ld returned 1 exit status
          

Anything else we need to know?

ref https://github.com/containers/conmon-rs/actions/runs/3675616095/jobs/6218022389

Unable to re-attach after a successful attach

What happened?

We cannot reattach to a container because the attach path already exists:

FATA[0000] attaching running container failed: Internal error occurred: error attaching to container: create result: conmon-rs/common/proto/conmon.capnp:Conmon.attachContainer: create attach endpoint: Attach socket path already exists: /run/containers/storage/overlay-containers/1ac6dc4e679069ba32bfbd4307435da7551de15ad81e5fff6bc54757badafed1/userdata/2469c120ea14e5d8112e3886a514e1b86262841dac76a4011b008a0a79d55072/attach
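One conceivable workaround (a sketch only, not the project's actual fix; the helper is hypothetical) would be to remove a stale socket file before binding the attach endpoint:

use std::{fs, io, os::unix::net::UnixListener, path::Path};

/// Hypothetical helper: bind the attach socket, unlinking a leftover path first.
fn bind_attach_socket(path: &Path) -> io::Result<UnixListener> {
    if path.exists() {
        // A previous attach session may have left the socket file behind.
        fs::remove_file(path)?;
    }
    UnixListener::bind(path)
}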

What did you expect to happen?

We should be able to re-attach containers.

How can we reproduce it (as minimally and precisely as possible)?

Run a workload via CRI-O and crictl and try to attach twice.

Anything else we need to know?

No response

conmon-rs version

869e839

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

Additional environment details (AWS, VirtualBox, physical, etc.)

Attach leaks a goroutine

What happened?

We print the active goroutines after each integration test. It looks like util.CopyDetachable still leaves a running goroutine open:

2022-07-20T14:16:05.7893221Z 2 @ 0x44c076 0x45d112 0x4b882e 0x4b92e5 0x654416 0x6dc739 0x47dfc1
2022-07-20T14:16:05.7893648Z #	0x4b882d	io.(*pipe).read+0x14d										/opt/hostedtoolcache/go/1.18.4/x64/src/io/pipe.go:57
2022-07-20T14:16:05.7894140Z #	0x4b92e4	io.(*PipeReader).Read+0x64									/opt/hostedtoolcache/go/1.18.4/x64/src/io/pipe.go:136
2022-07-20T14:16:05.7895004Z #	0x654415	github.com/containers/common/pkg/util.CopyDetachable+0xb5					/home/runner/go/pkg/mod/github.com/containers/[email protected]/pkg/util/copy.go:16
2022-07-20T14:16:05.7895944Z #	0x6dc738	github.com/containers/conmon-rs/pkg/client.(*ConmonClient).setupStdioChannels.func2+0xb8	/home/runner/work/conmon-rs/conmon-rs/pkg/client/attach.go:201

go func() {
    var err error
    if cfg.Streams.Stdin != nil {
        _, err = util.CopyDetachable(conn, cfg.Streams.Stdin, cfg.DetachKeys)
    }
    stdinDone <- err
}()

We should find a way to clean up the goroutine on server shutdown as well.

What did you expect to happen?

No running goroutines for attach after the integration tests.

How can we reproduce it (as minimally and precisely as possible)?

Running the integration tests reveals the issue by printing the goroutines at the end.

Anything else we need to know?

No

conmon-rs version

$ conmonrs --version
version: 0.1.0-dev
tag: none
commit: 68252d0373b8e262bdf7ff8780fb2b5ab6f66c29
build: 2022-07-20 14:13:13 +00:00
rustc 1.62.1 (e092d0b6b 2022-07-16)

OS version

Not relevant

Additional environment details (AWS, VirtualBox, physical, etc.)

No

capnproto alpha 5 needs a manual bump

What happened?

#608 is failing due to a possible API change. We will need a manual bump of the package to fix it.

What did you expect to happen?

capnproto alpha 5 to work.

How can we reproduce it (as minimally and precisely as possible)?

N/A

Anything else we need to know?

No response

conmon-rs version

$ conmonrs --version
# paste output here

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

Additional environment details (AWS, VirtualBox, physical, etc.)

Add support to forward containers output to splunk via hec connector

We currently use Docker on Debian 11 and let the containers log to stdout through the Splunk logging driver (https://docs.docker.com/config/containers/logging/splunk/) via HEC (https://docs.splunk.com/Documentation/Splunk/latest/Data/UsetheHTTPEventCollector) into Splunk.

In terms of docker-compose, this gives us the possibility to configure Splunk logging on a per-deployment basis, which is very convenient as well as independent of any central configuration.

We would like to move to Podman on RHEL 8 servers. Everything works fine so far, but we didn't find a way to log stdout of the containers via the HEC interface into Splunk, as the currently available Podman version 4 does not provide such a Splunk logging driver.

Is there a solution on the part of Podman to log the output of the containers into Splunk?

If not, is it possible to commission a corresponding development as paid work?

I opened a feature request in containers/conmon#340 and was told to open the request in this repo.

Logging to journald/systemd does not add scope

When logging to stdout, we usually add a full tracing context to each RPC:

2022-05-04T12:22:35.208960Z DEBUG create_container{container_id="326d99c7850ba440f0faafeb30f875ca448eac74706ee9a03fbf45c33e86b5cc" uuid="b5ce0501-c4b3-4fbb-8453-deb02f133ff5"}: conmon::rpc: 72: Got a create container request

This information seems to be lost when using the systemd logger:

May 11 10:41:06 nixos conmonrs[504862]: Got a create container request

We should investigate whether we're misusing the tracing API or whether it's related to tokio-rs/tracing#2051.

Choose RPC framework

There are tons of RPC frameworks to choose from. We've flirted with the idea of gRPC, ttrpc, and Cap'n Proto. We should probably just pick one. Any other ideas?

Container logs related tests fail

Which jobs are failing?

CRI-O GitHub actions e2e tests: https://github.com/cri-o/cri-o/actions/runs/3089802460/jobs/4998634317

Which tests are failing?

A bunch of tests asserting container logs:

Summarizing 6 Failures:
  [FAIL] [sig-node] Variable Expansion [It] should allow composing env vars into new env vars [NodeConformance] [Conformance]
  test/e2e/framework/util.go:773
  [FAIL] [sig-node] Downward API [It] should provide pod name, namespace and IP address as env vars [NodeConformance] [Conformance]
  test/e2e/framework/util.go:773
  [FAIL] [sig-storage] EmptyDir volumes when FSGroup is specified [LinuxOnly] [NodeFeature:FSGroup] [It] new files should be created with FSGroup ownership when container is non-root
  test/e2e/framework/util.go:773
  [FAIL] [sig-node] Sysctls [LinuxOnly] [NodeConformance] [It] should support sysctls [MinimumKubeletVersion:1.21] [Conformance]
  test/e2e/common/node/sysctl.go:114
  [FAIL] [sig-node] Pods [It] should contain environment variables for services [NodeConformance] [Conformance]
  test/e2e/common/node/pods.go:524
  [FAIL] [sig-storage] Projected combined [It] should project all components that make up the projection API [Projection][NodeConformance] [Conformance]

Since when has it been failing?

I’m unsure, they passed a while before we merged the attach fixes.

Reason for failure (if possible)

I assume that we have not written all logs on container termination and that there is a race between the CRI logger and the child reaper. We probably need something to wait for the logs before exiting conmon-rs.
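One possible direction (a sketch under the assumption that the log forwarding runs as a Tokio task whose JoinHandle is available; not the current implementation) is to await that task, with a timeout, before writing the exit files:

use std::time::Duration;
use tokio::{task::JoinHandle, time::timeout};

/// Hypothetical helper: give the CRI logger a chance to drain before conmon-rs exits.
async fn wait_for_logs(log_task: JoinHandle<()>) {
    // Wait up to five seconds for the forwarding task to finish flushing.
    if timeout(Duration::from_secs(5), log_task).await.is_err() {
        eprintln!("timed out waiting for the log task to finish");
    }
}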

Anything else we need to know?

No response

Propagate Open Telemetry context from CRI-O to conmon-rs

We can use the Open Telemetry context propagation to link spans between CRI-O and conmon-rs. This would mean that we have to use the global Text Map Propagator on the client side as well as on the conmon-rs server side to inject/extract the spans.

Since capnproto natively supports neither serializing hash maps nor carrying headers, we have to change each RPC to additionally contain the metadata:

struct VersionRequest {
    metadata @0 :Data;
    verbose @1 :Bool;
}

I assume that we introduce a new metadata type like:

struct MetadataMap<'a>(HashMap<&'a str, &'a str>);

impl<'a> Extractor for MetadataMap<'a> {
    fn get(&self, key: &str) -> Option<&str> {
        self.0.get(key).map(|x| x.clone())
    }

    /// Collect all the keys from the MetadataMap.
    fn keys(&self) -> Vec<&str> {
        self.0.keys().map(|k| k.clone()).collect::<Vec<_>>()
    }
}

And then (for each RPC) restore the spans:

let metadata = pry!(req.get_metadata());
let hashmap: HashMap<&str, &str> = pry_err!(serde_json::from_slice(metadata));
let metadata_map = MetadataMap(hashmap);
let parent_cx = global::get_text_map_propagator(|prop| prop.extract(&metadata_map));
self.tracer().start_with_context("version", &parent_cx);
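The injecting side would be the mirror image; the following is a minimal sketch, assuming the same serde_json encoding of the metadata field (the MetadataMapMut type and inject_metadata helper are illustrative, not existing code):

use opentelemetry::{global, propagation::Injector, Context};
use std::collections::HashMap;

/// Hypothetical owning counterpart of MetadataMap, used for injection.
struct MetadataMapMut(HashMap<String, String>);

impl Injector for MetadataMapMut {
    fn set(&mut self, key: &str, value: String) {
        self.0.insert(key.to_owned(), value);
    }
}

/// Serialize the current span context into the bytes carried by the metadata field.
fn inject_metadata() -> Vec<u8> {
    let mut map = MetadataMapMut(HashMap::new());
    global::get_text_map_propagator(|prop| prop.inject_context(&Context::current(), &mut map));
    serde_json::to_vec(&map.0).expect("serialize metadata")
}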

Examples:

Allow CRI-O logging to stdout

Right now we use the default logger in CRI-O, which logs to systemd. It would be good to have at least a runtime monitor configuration to also log to stdout. This would allow us to collect conmon and CRI-O logs in parallel.

Q: Do we want to make stdout logging the default? Which benefits do we gain when logging to systemd?

attach metadata to logger

The Rust logger allows us to attach metadata. It would be useful to track different messages based on container ID, or PID if that's not possible, so we know which container did what.
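A minimal sketch of what this could look like with the tracing crate (assuming tracing is the logging facade in use; the function and field names are illustrative):

use tracing::{debug, info_span};

fn handle_create_container(container_id: &str, pid: u32) {
    // Every message emitted while the guard is alive carries container_id and pid.
    let span = info_span!("create_container", container_id = %container_id, pid);
    let _guard = span.enter();

    debug!("Got a create container request");
}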

Test `CreateContainer should kill created children if being killed` flakes

The test case flakes from time to time with the following error:

ConmonClient CreateContainer
  should kill created children if being killed
  /home/runner/work/conmon-rs/conmon-rs/pkg/client/client_test.go:67
2022-04-20T09:52:35.865Z INFO  [conmon::server] Received SIGINT
2022-04-20T09:52:35.866Z INFO  [conmon::server] Using stdout logger
2022-04-20T09:52:35.866Z INFO  [conmon::server] Set log level to: DEBUG
2022-04-20T09:52:35.866Z INFO  [conmon::init] Missing sufficient privileges to adjust OOM score
2022-04-20T09:52:35.867Z DEBUG [conmon::rpc] Got a version request
2022-04-20T09:52:35.869Z DEBUG [conmon::rpc] Got a create container request for id 5e1b73e9b08c3fd32c51213b8a51e9070e872589be2c3071cd766b9a76e72043
2022-04-20T09:52:35.869Z DEBUG [conmon::streams] Creating new IO streams
2022-04-20T09:52:35.869Z DEBUG [conmon::rpc] PID file is /tmp/conmon-client1596343358/pidfile
2022-04-20T09:52:35.869Z DEBUG [conmon::server] Runtime args "--root=/tmp/conmon-client1596343358/root create --bundle /tmp/conmon-client1596343358 --pid-file /tmp/conmon-client1596343358/pidfile 5e1b73e9b08c3fd32c51213b8a51e9070e872589be2c3071cd766b9a76e72043"
2022-04-20T09:52:35.870Z DEBUG [conmon::cri_logger] Initializing CRI logger in path /tmp/conmon-client1596343358/log
2022-04-20T09:52:35.873Z DEBUG [conmon::streams] Start reading from IO streams
2022-04-20T09:52:35.959Z INFO  [conmon::server] Received SIGINT
------------------------------
• [FAILED] [10.210 seconds]
ConmonClient
/home/runner/work/conmon-rs/conmon-rs/pkg/client/client_test.go:16
  CreateContainer
  /home/runner/work/conmon-rs/conmon-rs/pkg/client/client_test.go:48
    [It] should kill created children if being killed
    /home/runner/work/conmon-rs/conmon-rs/pkg/client/client_test.go:67

  Timed out after 10.005s.
  Expected
      <*errors.errorString | 0xc00045f310>: {
          s: "Expected stopped to be a substr of ID                                                                 PID         STATUS      BUNDLE                         CREATED                          OWNER\n5e1b73e9b08c3fd32c51213b8a51e9070e872589be2c3071cd766b9a76e72043   10637       created     /tmp/conmon-client15[963](https://github.com/containers/conmon-rs/runs/6092649279?check_suite_focus=true#step:7:963)43358   2022-04-20T09:52:35.916020963Z   runner\n",
      }
  to be nil
  In [It] at: /home/runner/work/conmon-rs/conmon-rs/pkg/client/client_test.go:78

It(testName("should kill created children if being killed", terminal), func() {
    tr = newTestRunner()
    tr.createRuntimeConfig(terminal)
    sut = tr.configGivenEnv()
    tr.createContainer(sut, terminal)
    Expect(sut.Shutdown()).To(BeNil())
    sut = nil
    Eventually(func() error {
        return tr.rr.RunCommandCheckOutput("stopped", "list")
    }, time.Second*10).Should(BeNil())
})

use mio or other event based io for stream

Currently, there's a bug where running with multiple exec syncs or containers causes conmonrs to leak threads. This comes from the reader.read call in streams::read_loop_single_stream: we're doing a normal blocking read and waiting on I/O that will never come (nor be canceled).

We need to either not block on the I/O, use an event-based system, or be able to cancel the read when we know there are no more messages.
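A minimal sketch of the cancellable, non-blocking variant using Tokio and a CancellationToken (assuming the tokio and tokio-util crates; the function is illustrative, not the current streams.rs code):

use tokio::io::{AsyncRead, AsyncReadExt};
use tokio_util::sync::CancellationToken;

/// Hypothetical read loop: stops on EOF or when the token gets cancelled.
async fn read_loop(
    mut reader: impl AsyncRead + Unpin,
    token: CancellationToken,
) -> std::io::Result<()> {
    let mut buf = vec![0u8; 8192];
    loop {
        tokio::select! {
            // Cancellation requested, e.g. because there are no more messages expected.
            _ = token.cancelled() => return Ok(()),
            // Async read that can be raced against the cancellation above.
            res = reader.read(&mut buf) => {
                let n = res?;
                if n == 0 {
                    return Ok(()); // EOF: the container closed the stream
                }
                // forward &buf[..n] to the CRI logger / attach clients here
            }
        }
    }
}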

Idea: Would it be possible to create the cgroup in advance?

This is just an idea. I'd like to discuss this with the conmon-rs team.

As you may know, creating a cgroup has a cost; in fact, it is one of the most time-consuming tasks in a container runtime following the OCI runtime spec.
Youki previously considered creating the cgroup asynchronously with io_uring, but this did not yield very good results.
However, for a daemon like a server there should be enough time to create the cgroup in advance. The container runtime could then skip the cgroup creation step by creating the process with clone3, which can place the child directly into a pre-created cgroup via CLONE_INTO_CGROUP. Wdyt?

This idea is inspired by:

Thus reducing the amount of exec calls that must happen in the container engine, and reducing the amount of memory it uses.
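A tiny sketch of the pre-creation half of that idea (purely illustrative; the path, cgroup name, and helper are assumptions, and the clone3(CLONE_INTO_CGROUP) call itself is omitted):

use std::{fs, os::unix::io::AsRawFd, path::Path};

// Hypothetical pre-creation step: create the pod cgroup ahead of time and keep a
// directory handle that a later clone3(CLONE_INTO_CGROUP) call could consume.
fn precreate_cgroup(name: &str) -> std::io::Result<fs::File> {
    let path = Path::new("/sys/fs/cgroup").join(name);
    fs::create_dir_all(&path)?;
    let dir = fs::File::open(&path)?;
    println!("pre-created cgroup, fd for clone3: {}", dir.as_raw_fd());
    Ok(dir)
}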

Fedora 35 i686 build fails in copr

What happened?

The build has been failing for the last couple of commits with the error message:

https://download.copr.fedorainfracloud.org/results/rhcontainerbot/podman-next/fedora-35-i386/04836395-conmon-rs/root.log.gz

DEBUG util.py:445:  No match for argument: golang-1.18.6-1.fc35.i386
DEBUG util.py:443:  Error: Unable to find a match: golang-1.18.6-1.fc35.i386

What did you expect to happen?

The build should work like the other ones.

How can we reproduce it (as minimally and precisely as possible)?

On mostly every recent commit on https://copr.fedorainfracloud.org/coprs/rhcontainerbot/podman-next/package/conmon-rs/

Anything else we need to know?

No response

conmon-rs version

$ conmonrs --version
# paste output here

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

Additional environment details (AWS, VirtualBox, physical, etc.)

Implement container attach

To satisfy the CRI-O runtime method AttachContainer, a method should be created that requests an attach socket be created for a container or exec session.

`[sig-cli] Kubectl client Simple pod should support inline execution and attach` fails

Which jobs are failing?

CRI-O conmonrs e2e feature tests.

Which tests are failing?

Mentioned in the title, ran via: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/cri-o_cri-o/6220/pull-ci-cri-o-cri-o-main-ci-e2e-conmonrs/1572188830188965888

Since when has it been failing?

It has been failing since the beginning; reproducible with 8e4cec9.

Reason for failure (if possible)

I guess it’s something in the attach stdin code.

Anything else we need to know?

No response

Can't run conmon-rs with cargo run

What happened?

Hi, when I want to run this project with cargo run, it fails, even if I specify --bin as below:

cargo run --bin=/home/zk/go/src/github.com/containers/conmon-rs/target/debug

It gets this error:
error: no bin target named /home/zk/go/src/github.com/containers/conmon-rs/target/debug/conmonrs

What did you expect to happen?

Run successfully

How can we reproduce it (as minimally and precisely as possible)?

cd containers/conmon-rs
cargo run --bin=/home/zk/go/src/github.com/containers/conmon-rs/target/debug

Anything else we need to know?

No response

conmon-rs version

$ conmonrs --version
# paste output here

OS version

# On Linux:
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

$ uname -a
Linux ubuntu 5.13.0-41-generic #46~20.04.1-Ubuntu SMP Wed Apr 20 13:16:21 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Additional environment details (AWS, VirtualBox, physical, etc.)
