Comments (13)
Agree 👍
from agent.
Hey!
I have been thinking about this, and I had not realized we needed a subreaper for all the children, grandchildren, and further descendants that could be spawned from the container or exec processes.
I have done some testing with our agent from Clear Containers (which also uses libcontainer), and here is how we should handle this:
- We set the agent as the subreaper of all the processes.
- We start a goroutine loop from the agent, waiting for ALL processes.
- We add to the sandbox structure a map of Go channels, keyed by process PID.
- From the PID waiting loop, if the PID of the reaped process matches one of the processes in the map, we send the exit code of this process through its channel.
- In the meantime, WaitProcess() will be called and will start listening on this channel from the map, waiting for the exit code. Once it gets the exit code, it calls process.Wait() (a libcontainer function), so that everything is cleaned up by libcontainer itself.
Here is a comment from the runc implementation to illustrate this:
call Wait() on the process even though we already have the exit status because we must ensure that any of the go specific process fun such as flushing pipes are complete before we return.
This comes from signals.go in runc, and here we can see they don't bother checking the error returned:
	case unix.SIGCHLD:
		exits, err := h.reap()
		if err != nil {
			logrus.Error(err)
		}
		for _, e := range exits {
			logrus.WithFields(logrus.Fields{
				"pid":    e.pid,
				"status": e.status,
			}).Debug("process exited")
			if e.pid == pid1 {
				// call Wait() on the process even though we already have the exit
				// status because we must ensure that any of the go specific process
				// fun such as flushing pipes are complete before we return.
				process.Wait()
				if h.notifySocket != nil {
					h.notifySocket.Close()
				}
				return e.status, nil
			}
		}
This approach prevents the errors that arise when the same PID is waited on from two different goroutines at the same time, since we cannot control which of them receives the exit code. The subreaper fixes the reaping of children and grandchildren, but we didn't initially need it for the container or exec processes, since we know we have to wait for them. This approach centralizes all the wait() calls in a single goroutine.
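To make the idea concrete, here is a minimal Go sketch of the PID-to-channel map and the single reaping loop. It is not the agent's actual code: the names exitNotifier, register, deliver and reapLoop are hypothetical, and main only simulates two reaped PIDs instead of spawning real processes.

```go
package main

import (
	"fmt"
	"os"
	"sync"
	"syscall"
)

// exitNotifier is a hypothetical stand-in for the map of channels that
// would live in the sandbox structure, keyed by PID.
type exitNotifier struct {
	mu      sync.Mutex
	waiters map[int]chan int
}

func newExitNotifier() *exitNotifier {
	return &exitNotifier{waiters: make(map[int]chan int)}
}

// register records interest in a PID and returns the channel on which
// its exit status will be delivered.
func (n *exitNotifier) register(pid int) <-chan int {
	n.mu.Lock()
	defer n.mu.Unlock()
	ch := make(chan int, 1) // buffered so the reaping loop never blocks
	n.waiters[pid] = ch
	return ch
}

// deliver forwards the status of a reaped PID to its waiter, if any.
// It returns false for PIDs nobody registered (orphaned descendants).
func (n *exitNotifier) deliver(pid, status int) bool {
	n.mu.Lock()
	ch, ok := n.waiters[pid]
	delete(n.waiters, pid)
	n.mu.Unlock()
	if !ok {
		return false
	}
	ch <- status
	return true
}

// reapLoop is the only place that calls wait4. On every SIGCHLD it
// drains all exited children. The caller is assumed to have set up
// sigs with signal.Notify(sigs, syscall.SIGCHLD).
func reapLoop(n *exitNotifier, sigs <-chan os.Signal) {
	for range sigs {
		for {
			var ws syscall.WaitStatus
			pid, err := syscall.Wait4(-1, &ws, syscall.WNOHANG, nil)
			if err != nil || pid <= 0 {
				break // nothing left to reap for now
			}
			// ExitStatus() is -1 when the process was killed by a signal.
			n.deliver(pid, ws.ExitStatus())
		}
	}
}

func main() {
	n := newExitNotifier()
	ch := n.register(1234) // what WaitProcess() would do for its PID

	// Simulate what reapLoop would deliver after wait4 returns.
	n.deliver(4321, 0)  // unknown descendant: reaped, nobody is waiting
	n.deliver(1234, 42) // the container process

	fmt.Println("exit status:", <-ch)
}
```

In the real flow, WaitProcess() would still call process.Wait() after reading from the channel, as runc does, and would ignore the resulting error since the process has already been reaped.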
I have discussed this with @sameo and he agrees with it.
@laijs @bergwolf @gnawux @WeiZhang555 Does this make sense?
original reply is deleted
@sboeuf I'm sorry, I completely misread your words: I thought you said "we don't need it to be subreaper", but you were actually saying "we didn't initially need it". 😢
Your approach definitely looks good to me!
I've been playing around with this a bit today and I don't think the current proposal (use a pause program to own the shared pid namespace and let the agent handle all SIGCHLD, #28 #29) can work.
The main reason is that even though prctl(PR_SET_CHILD_SUBREAPER) lets a process fulfill the role of init(1) for all of its descendant processes, this no longer holds if a descendant process joins a different pid namespace, in which case SIGCHLD is sent to pid 1 of the new namespace rather than to the original subreaper process that forked the child.
I have a test program showing the result:
https://github.com/bergwolf/linux-namespace-tests#linux-namespace-playground
Therefore, SIGCHLD handling and pid namespace sharing should indeed be considered together. As I commented in #28, we should combine shared pidns support with the zombie children reaper so that either:
- the agent is pid 1 of the shared pidns and handles SIGCHLD, or
- the pause process is pid 1 of the shared pidns and handles SIGCHLD
I was wrong. If a new process is forked from outside of the pidns, it will be reparented to its subreaper ancestor, which is exactly the agent use case.
@bergwolf Your test case does not match the agent or libcontainer use case.
If processA enters pidnsB, none of the descendants of processA is a descendant of pidnsB.
@bergwolf I need to do some testing too. But if what you're saying is true (a subreaper doesn't receive SIGCHLD across PID namespaces), then how do we handle the case of a non-shared PID namespace?
I'm changing the tests to match the agent/libcontainer use case. Will update here later.
Update: I was wrong. If a new process is forked from outside of the pidns, it will be reparented to its subreaper ancestor, which is exactly the agent use case.
@bergwolf what about the children and grandchildren? The case we're trying to cover here is:
- We have the agent set as the subreaper.
- The agent spawns the container process; libcontainer takes care of that, making this process enter a new PID ns.
- The container process spawns a bunch of new children and grandchildren.
The question is: do we (the agent) get SIGCHLD for all the processes I have mentioned here?
@sboeuf in my test, yes. All SIGCHLD for the agent's children, grandchildren, etc. are sent to the subreaper agent no matter which pidns they enter.
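For reference, a minimal, Linux-only sketch of how a process such as the agent might mark itself as a child subreaper and read the flag back. The function names setSubreaper and isSubreaper are hypothetical; the two constants are copied from linux/prctl.h and defined locally so the sketch only needs the standard library (golang.org/x/sys/unix also exports unix.PR_SET_CHILD_SUBREAPER).

```go
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// Values from <linux/prctl.h>, defined locally for this sketch.
const (
	prSetChildSubreaper = 36
	prGetChildSubreaper = 37
)

// setSubreaper marks the calling process as the subreaper of its
// descendants: orphans are reparented to it instead of to init.
func setSubreaper() error {
	_, _, errno := syscall.Syscall(syscall.SYS_PRCTL, prSetChildSubreaper, 1, 0)
	if errno != 0 {
		return errno
	}
	return nil
}

// isSubreaper reads the flag back; the kernel writes it into the int
// pointed to by the second prctl argument.
func isSubreaper() (bool, error) {
	var flag int32
	_, _, errno := syscall.Syscall(syscall.SYS_PRCTL, prGetChildSubreaper,
		uintptr(unsafe.Pointer(&flag)), 0)
	if errno != 0 {
		return false, errno
	}
	return flag != 0, nil
}

func main() {
	if err := setSubreaper(); err != nil {
		fmt.Println("prctl(PR_SET_CHILD_SUBREAPER) failed:", err)
		return
	}
	on, err := isSubreaper()
	fmt.Println("subreaper:", on, "err:", err)
}
```

This only sets the flag; the process still has to handle SIGCHLD and call wait on the reparented descendants, otherwise they stay around as zombies.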
Phew ! I am relieved :)
@bergwolf then you're okay with #28 ?
@sboeuf yes, I think it is the right way to go.
@bergwolf glad to hear that :)