Comments (7)

jorgensd avatar jorgensd commented on June 25, 2024

Also tested with the latest 5.0.2 patch, with no success.

from ipyparallel.

jorgensd avatar jorgensd commented on June 25, 2024

A hunch is to set:

ENV OMPI_ALLOW_RUN_AS_ROOT=1 
ENV OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 

in the Dockerfile. At least it works when Open MPI is installed through apt:

RUN DEBIAN_FRONTEND=noninteractive apt-get install -y openmpi-bin libopenmpi-dev

jorgensd avatar jorgensd commented on June 25, 2024

It did not work with:

FROM ubuntu:22.04
ARG OPENMPI_SERIES=5.0
ARG OPENMPI_PATCH=1
RUN DEBIAN_FRONTEND=noninteractive  apt-get update && \
    apt-get install -y wget g++ cmake  python3-dev
# RUN DEBIAN_FRONTEND=noninteractive apt-get install -y openmpi-bin libopenmpi-dev
RUN wget https://download.open-mpi.org/release/open-mpi/v${OPENMPI_SERIES}/openmpi-${OPENMPI_SERIES}.${OPENMPI_PATCH}.tar.gz && \
    tar xfz openmpi-${OPENMPI_SERIES}.${OPENMPI_PATCH}.tar.gz  && \
    cd openmpi-${OPENMPI_SERIES}.${OPENMPI_PATCH} && \
    ./configure  && \
    make -j${BUILD_NP} install && \
    ldconfig

RUN DEBIAN_FRONTEND=noninteractive apt-get install -y python3-pip
RUN python3 -m pip install mpi4py ipyparallel
ENV OMPI_ALLOW_RUN_AS_ROOT=1 
ENV OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 
RUN python3 -c "import ipyparallel as ipp; cluster = ipp.Cluster(engines='mpi', n=3); rc = cluster.start_and_connect_sync();cluster.stop_cluster_sync()"
> [6/6] RUN python3 -c "import ipyparallel as ipp; cluster = ipp.Cluster(engines='mpi', n=3); rc = cluster.start_and_connect_sync();cluster.stop_cluster_sync()":
1.894 mpiexec error output:
1.894 --------------------------------------------------------------------------
1.894 mpiexec has detected an attempt to run as root.
1.894
1.894 Running as root is *strongly* discouraged as any mistake (e.g., in
1.894 defining TMPDIR) or bug can result in catastrophic damage to the OS
1.894 file system, leaving your system in an unusable state.
1.894 
1.894 We strongly suggest that you run mpiexec as a non-root user.
1.894 
1.894 You can override this protection by adding the --allow-run-as-root option
1.894 to the cmd line or by setting two environment variables in the following way:
1.894 the variable OMPI_ALLOW_RUN_AS_ROOT=1 to indicate the desire to override this
1.894 protection, and OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 to confirm the choice and
1.894 add one more layer of certainty that you want to do so.
1.894 We reiterate our advice against doing so - please proceed at your own risk.
1.894 --------------------------------------------------------------------------
1.894 
1.894 engine set stopped 1707377642: {'exit_code': 1, 'pid': 39, 'identifier': 'ipengine-1707377641-o120-1707377642-7'}
1.895 Traceback (most recent call last):
1.895   File "<string>", line 1, in <module>
1.895   File "/usr/local/lib/python3.10/dist-packages/ipyparallel/_async.py", line 72, in _synchronize
1.895     return _asyncio_run(async_f(*args, **kwargs))
1.895   File "/usr/local/lib/python3.10/dist-packages/ipyparallel/_async.py", line 18, in _asyncio_run
1.895     return loop.run_sync(lambda: asyncio.ensure_future(coro))
1.895   File "/usr/local/lib/python3.10/dist-packages/tornado/ioloop.py", line 539, in run_sync
1.895     return future_cell[0].result()
1.895   File "/usr/local/lib/python3.10/dist-packages/ipyparallel/cluster/cluster.py", line 757, in start_and_connect
1.895     await asyncio.wrap_future(
1.895 ipyparallel.error.EngineError: Engine set stopped: {'exit_code': 1, 'pid': 39, 'identifier': 'ipengine-1707377641-o120-1707377642-7'}
1.896 Stopping cluster <Cluster(cluster_id='1707377641-o120', profile='default', controller=<running>, engine_sets=['1707377642'])>
------
Dockerfile:17
--------------------
  15 |     RUN python3 -m pip install mpi4py ipyparallel
  16 |     
  17 | >>> RUN python3 -c "import ipyparallel as ipp; cluster = ipp.Cluster(engines='mpi', n=3); rc = cluster.start_and_connect_sync();cluster.stop_cluster_sync()"
--------------------
ERROR: failed to solve: process "/bin/sh -c python3 -c \"import ipyparallel as ipp; cluster = ipp.Cluster(engines='mpi', n=3); rc = cluster.start_and_connect_sync();cluster.stop_cluster_sync()\"" did not complete successfully: exit code: 1

minrk avatar minrk commented on June 25, 2024

Yeah, OMPI refuses to run as root unless you tell it you're really super sure that's what you want. I think the goal here is perhaps to figure out why you're not getting error output in the first case, because I'd call the second one "working as intended" since you get OMPI's actionable error message. Maybe a race/buffering issue.

jorgensd avatar jorgensd commented on June 25, 2024

Yeah, OMPI refuses to run as root unless you tell it you're really super sure that's what you want. I think the goal here is perhaps to figure out why you're not getting error output in the first case, because I'd call the second one "working as intended" since you get OMPI's actionable error message. Maybe a race/buffering issue.

It also didn't run with Open MPI 4.1.2 built from the wget tarball, which should be the same version as the one on apt.

minrk avatar minrk commented on June 25, 2024

I think the issue in IPP is in how it tries to parse output from mpiexec in order to log errors. The structure changed, so IPP doesn't extract messages from OMPI anymore, and doesn't report what it does find, which is either:

prterun has detected an attempt to run as root.

Running as root is *strongly* discouraged as any mistake (e.g., in
defining TMPDIR) or bug can result in catastrophic damage to the OS
file system, leaving your system in an unusable state.

We strongly suggest that you run prterun as a non-root user.

You can override this protection by adding the --allow-run-as-root
option to your command line.  However, we reiterate our strong advice
against doing so - please do so at your own risk.
--------------------------------------------------------------------------

if you are missing the allow-run-as-root settings, and then:

--------------------------------------------------------------------------
It looks like "prte_init()" failed for some reason. There are many
reasons that can cause PRRTE to fail during "prte_init()", some of
which are due to configuration or environment problems.  This failure
appears to be an internal failure — here's some additional information
(which may only be relevant to a PRRTE developer):

   prte_plm_base_select failed
   --> Returned value  (-46) instead of PRTE_SUCCESS
--------------------------------------------------------------------------

if you are missing OMPI_MCA_plm_ssh_agent=false
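To illustrate the parsing problem, here is a minimal sketch of pulling help messages out of mpiexec output. It is not IPP's actual implementation: the function name `extract_messages`, the 20-dash threshold, and the shortened sample text are assumptions made for this example. It treats any all-dash line as a delimiter and tolerates a missing opening or closing line, as in the first message above.

```python
import re

# Assumption: a line consisting only of dashes delimits a help message.
DASH_LINE = re.compile(r"^-{20,}\s*$")


def extract_messages(output):
    """Split mpiexec output into message blocks separated by dashed lines."""
    blocks = []
    current = []
    for line in output.splitlines():
        if DASH_LINE.match(line):
            # Delimiter reached: store whatever text was collected so far.
            text = "\n".join(current).strip()
            if text:
                blocks.append(text)
            current = []
        else:
            current.append(line.rstrip())
    # Keep trailing text that was never closed by a dashed line.
    text = "\n".join(current).strip()
    if text:
        blocks.append(text)
    return blocks


# Shortened sample based on the two messages quoted in this comment.
sample = """prterun has detected an attempt to run as root.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like "prte_init()" failed for some reason.
--------------------------------------------------------------------------
"""

for message in extract_messages(sample):
    print(message.splitlines()[0])
```

Because the split does not depend on the launcher name, the same sketch would handle both the `mpiexec has detected` and the `prterun has detected` wording.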

Ultimately, I think you need to set these environment variables:

export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
export OMPI_MCA_plm=ssh
export OMPI_MCA_plm_ssh_agent=false

In OMPI 4, OMPI_MCA_plm=isolated is simpler, but PRRTE, which was adopted in OMPI 5, lacks an isolated plm, so you need to use OMPI_MCA_plm=ssh with OMPI_MCA_plm_ssh_agent=false.

The issues seem to stem from OMPI migrating from ORTE to PRRTE, which renamed a bunch of things, and doesn't seem to produce particularly informative errors.

jorgensd avatar jorgensd commented on June 25, 2024

Working settings for OMPI 4:

export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
export OMPI_MCA_plm=isolated

Working settings for OMPI 5.0.x:

export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
export OMPI_MCA_plm=ssh
export OMPI_MCA_plm_ssh_agent=false

for the minimal test case.
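The two sets of settings can also be applied from Python before the cluster starts. The following is a rough sketch under stated assumptions: the helper names `ompi_settings` and `run_minimal_test` are made up for this example, the `>= 5` cutoff extends the 5.0.x result to later releases without having been tested there, and it assumes the engine launcher inherits the environment of the calling process. The ipyparallel calls are the ones from the one-line test in the Dockerfile above.

```python
import os


def ompi_settings(major_version):
    """Return the environment variables reported to work in this thread."""
    settings = {
        "OMPI_ALLOW_RUN_AS_ROOT": "1",
        "OMPI_ALLOW_RUN_AS_ROOT_CONFIRM": "1",
    }
    if major_version >= 5:
        # PRRTE has no isolated plm, so use ssh with the agent disabled.
        settings["OMPI_MCA_plm"] = "ssh"
        settings["OMPI_MCA_plm_ssh_agent"] = "false"
    else:
        settings["OMPI_MCA_plm"] = "isolated"
    return settings


def run_minimal_test(major_version, n=3):
    """Start an MPI engine set, return the engine count, then stop it."""
    # Set the variables first so that the launched mpiexec can see them.
    os.environ.update(ompi_settings(major_version))
    # Imported here so the settings helper works without ipyparallel.
    import ipyparallel as ipp

    cluster = ipp.Cluster(engines="mpi", n=n)
    rc = cluster.start_and_connect_sync()
    try:
        return len(rc.ids)
    finally:
        # Always stop the cluster, even if the check above fails.
        cluster.stop_cluster_sync()
```

Inside the container this would be called as `run_minimal_test(5)` for the source build or `run_minimal_test(4)` for the apt package.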
