Comments (7)
Also tested with the latest 5.0.2 patch, with no success.
from ipyparallel.
A hunch is to set:
ENV OMPI_ALLOW_RUN_AS_ROOT=1
ENV OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
in the Dockerfile. At least it works when Open MPI is installed through apt:
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y openmpi-bin libopenmpi-dev
from ipyparallel.
It did not work with:
FROM ubuntu:22.04
ARG OPENMPI_SERIES=5.0
ARG OPENMPI_PATCH=1
RUN DEBIAN_FRONTEND=noninteractive apt-get update && \
    apt-get install -y wget g++ cmake python3-dev
# RUN DEBIAN_FRONTEND=noninteractive apt-get install -y openmpi-bin libopenmpi-dev
RUN wget https://download.open-mpi.org/release/open-mpi/v${OPENMPI_SERIES}/openmpi-${OPENMPI_SERIES}.${OPENMPI_PATCH}.tar.gz && \
    tar xfz openmpi-${OPENMPI_SERIES}.${OPENMPI_PATCH}.tar.gz && \
    cd openmpi-${OPENMPI_SERIES}.${OPENMPI_PATCH} && \
    ./configure && \
    make -j${BUILD_NP} install && \
    ldconfig
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y python3-pip
RUN python3 -m pip install mpi4py ipyparallel
ENV OMPI_ALLOW_RUN_AS_ROOT=1
ENV OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
RUN python3 -c "import ipyparallel as ipp; cluster = ipp.Cluster(engines='mpi', n=3); rc = cluster.start_and_connect_sync();cluster.stop_cluster_sync()"
> [6/6] RUN python3 -c "import ipyparallel as ipp; cluster = ipp.Cluster(engines='mpi', n=3); rc = cluster.start_and_connect_sync();cluster.stop_cluster_sync()":
1.894 mpiexec error output:
1.894 --------------------------------------------------------------------------
1.894 mpiexec has detected an attempt to run as root.
1.894
1.894 Running as root is *strongly* discouraged as any mistake (e.g., in
1.894 defining TMPDIR) or bug can result in catastrophic damage to the OS
1.894 file system, leaving your system in an unusable state.
1.894
1.894 We strongly suggest that you run mpiexec as a non-root user.
1.894
1.894 You can override this protection by adding the --allow-run-as-root option
1.894 to the cmd line or by setting two environment variables in the following way:
1.894 the variable OMPI_ALLOW_RUN_AS_ROOT=1 to indicate the desire to override this
1.894 protection, and OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 to confirm the choice and
1.894 add one more layer of certainty that you want to do so.
1.894 We reiterate our advice against doing so - please proceed at your own risk.
1.894 --------------------------------------------------------------------------
1.894
1.894 engine set stopped 1707377642: {'exit_code': 1, 'pid': 39, 'identifier': 'ipengine-1707377641-o120-1707377642-7'}
1.895 Traceback (most recent call last):
1.895 File "<string>", line 1, in <module>
1.895 File "/usr/local/lib/python3.10/dist-packages/ipyparallel/_async.py", line 72, in _synchronize
1.895 return _asyncio_run(async_f(*args, **kwargs))
1.895 File "/usr/local/lib/python3.10/dist-packages/ipyparallel/_async.py", line 18, in _asyncio_run
1.895 return loop.run_sync(lambda: asyncio.ensure_future(coro))
1.895 File "/usr/local/lib/python3.10/dist-packages/tornado/ioloop.py", line 539, in run_sync
1.895 return future_cell[0].result()
1.895 File "/usr/local/lib/python3.10/dist-packages/ipyparallel/cluster/cluster.py", line 757, in start_and_connect
1.895 await asyncio.wrap_future(
1.895 ipyparallel.error.EngineError: Engine set stopped: {'exit_code': 1, 'pid': 39, 'identifier': 'ipengine-1707377641-o120-1707377642-7'}
1.896 Stopping cluster <Cluster(cluster_id='1707377641-o120', profile='default', controller=<running>, engine_sets=['1707377642'])>
------
Dockerfile:17
--------------------
15 | RUN python3 -m pip install mpi4py ipyparallel
16 |
17 | >>> RUN python3 -c "import ipyparallel as ipp; cluster = ipp.Cluster(engines='mpi', n=3); rc = cluster.start_and_connect_sync();cluster.stop_cluster_sync()"
--------------------
ERROR: failed to solve: process "/bin/sh -c python3 -c \"import ipyparallel as ipp; cluster = ipp.Cluster(engines='mpi', n=3); rc = cluster.start_and_connect_sync();cluster.stop_cluster_sync()\"" did not complete successfully: exit code: 1
from ipyparallel.
Yeah, OMPI refuses to run as root unless you tell it you're really super sure that's what you want. I think the goal here is perhaps to figure out why you're not getting error output in the first case, because I'd call the second one "working as intended" since you get OMPI's actionable error message. Maybe a race/buffering issue.
from ipyparallel.
It didn't run with 4.1.2 built from the tarball fetched with wget, which should be the same version as the one on apt.
from ipyparallel.
I think the issue in IPP is in how it tries to parse output from mpiexec in order to log errors. The structure changed, so IPP doesn't extract messages from OMPI anymore, and doesn't report what it does find, which is either:
prterun has detected an attempt to run as root.
Running as root is *strongly* discouraged as any mistake (e.g., in
defining TMPDIR) or bug can result in catastrophic damage to the OS
file system, leaving your system in an unusable state.
We strongly suggest that you run prterun as a non-root user.
You can override this protection by adding the --allow-run-as-root
option to your command line. However, we reiterate our strong advice
against doing so - please do so at your own risk.
--------------------------------------------------------------------------
if you are missing allow-run-as-root, and then
--------------------------------------------------------------------------
It looks like "prte_init()" failed for some reason. There are many
reasons that can cause PRRTE to fail during "prte_init()", some of
which are due to configuration or environment problems. This failure
appears to be an internal failure — here's some additional information
(which may only be relevant to a PRRTE developer):
prte_plm_base_select failed
--> Returned value (-46) instead of PRTE_SUCCESS
--------------------------------------------------------------------------
if you are missing OMPI_MCA_plm_ssh_agent=false
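To make the parsing problem concrete, here is a minimal sketch of pulling help-message blocks out of launcher output by splitting on the dashed banner lines. This is not ipyparallel's actual implementation: `extract_banner_messages` is a hypothetical helper, and it assumes banners are lines consisting only of ten or more dashes, as in the output quoted above.

```python
import re

# Assumption: Open MPI / PRRTE help messages are delimited by lines made up
# only of dashes, as in the output quoted in this thread.
BANNER = re.compile(r"^-{10,}\s*$")


def extract_banner_messages(output: str) -> list[str]:
    """Return the text blocks found between pairs of dashed banner lines.

    A trailing block that has an opening banner but no closing one is still
    returned, so a truncated message is not silently dropped.
    """
    messages = []
    current = None  # None means "not currently inside a banner block"
    for line in output.splitlines():
        if BANNER.match(line):
            if current is None:
                current = []  # opening banner
            else:
                text = "\n".join(current).strip()
                if text:
                    messages.append(text)
                current = None  # closing banner
        elif current is not None:
            current.append(line)
    # Keep an unterminated block rather than losing it.
    if current:
        text = "\n".join(current).strip()
        if text:
            messages.append(text)
    return messages
```

Logging every extracted block, rather than matching on specific wording, might keep the error visible even when the launcher's messages change (for example `mpiexec` versus `prterun`).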
Ultimately, I think you need to set these environment variables:
export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
export OMPI_MCA_plm=ssh
export OMPI_MCA_plm_ssh_agent=false
In OMPI 4, OMPI_MCA_plm=isolated is simpler, but PRRTE, which was adopted in OMPI 5, lacks an isolated plm, so you need to use ssh with ssh_agent=false.
The issues seem to stem from OMPI migrating from ORTE to PRRTE, which renamed a bunch of things, and doesn't seem to produce particularly informative errors.
from ipyparallel.
Working settings for OMPI 4:
export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
export OMPI_MCA_plm=isolated
Working settings for OMPI 5.0.x:
export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
export OMPI_MCA_plm=ssh
export OMPI_MCA_plm_ssh_agent=false
These settings work for the minimal test case.
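The two sets of settings can also be applied from Python before the cluster is started. The following is a sketch under assumptions: `ompi_root_env` and `start_mpi_cluster` are hypothetical names, the caller supplies the Open MPI major version, the values are exactly the ones listed in this thread, and the launcher subprocess is assumed to inherit the parent process environment (as it does with the `ENV` lines in the Dockerfile).

```python
import os

# Settings reported in this thread, keyed by Open MPI major version.
_ROOT_SETTINGS = {
    4: {"OMPI_MCA_plm": "isolated"},
    # PRRTE (Open MPI 5) has no "isolated" plm, so use ssh with the agent disabled.
    5: {"OMPI_MCA_plm": "ssh", "OMPI_MCA_plm_ssh_agent": "false"},
}


def ompi_root_env(major_version: int) -> dict[str, str]:
    """Return the environment variables reported to work when running as root."""
    if major_version not in _ROOT_SETTINGS:
        # Only versions 4 and 5 were discussed; do not guess for others.
        raise ValueError(f"no known settings for Open MPI {major_version}")
    env = {
        "OMPI_ALLOW_RUN_AS_ROOT": "1",
        "OMPI_ALLOW_RUN_AS_ROOT_CONFIRM": "1",
    }
    env.update(_ROOT_SETTINGS[major_version])
    return env


def start_mpi_cluster(major_version: int, n: int = 3):
    """Apply the settings, then start an MPI cluster as in the minimal test case."""
    import ipyparallel as ipp  # imported lazily so ompi_root_env has no dependencies

    os.environ.update(ompi_root_env(major_version))
    cluster = ipp.Cluster(engines="mpi", n=n)
    rc = cluster.start_and_connect_sync()
    return cluster, rc
```

Calling `start_mpi_cluster(5)` would mirror the one-liner from the Dockerfile; the caller is still responsible for `cluster.stop_cluster_sync()` afterwards.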
from ipyparallel.