
tmpi's People

Contributors

azrael3000, fabioluporini, glwagner, jczhang07, phil-blain, philipvinc


tmpi's Issues

fail to use tmpi

Dear Azrael3000,

I am failing to use tmpi. I have a toy MPI program that runs OK with:
mpirun -np 2 xterm -e gdb /tmp/fort
But if I switch to:
tmpi 2 gdb /tmp/fort
I see a tmux session appear and then disappear, with [exited] printed on screen.

Is there a way to debug tmpi to see what is going wrong?

Thanks,
Nicolas
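
Edit: one thing I plan to try is running the script under bash's trace mode to see where it exits; sketching it here in case it is useful to others (the log file name is just my choice):

bash -x "$(command -v tmpi)" 2 gdb /tmp/fort 2> tmpi-trace.log
tail tmpi-trace.log    # last commands the script executed before dying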

Support intelmpi?

intelmpi implements the MPICH specification (according to https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/mpi-library.html#gs.9dy1lr), but does not seem to work with tmpi.

I tried changing

tmpi/tmpi, line 46 at commit ae369dc:

mpich=$(mpirun --version |grep -i HYDRA)

to

mpich=$(mpirun --version |grep -i -e HYDRA -e Intel)

but I get strange module errors:

manpath: warning: $MANPATH set, ignoring /etc/man_db.conf
[ERR] The following modulefiles are not provided by module profile 'profile/base':
[ERR]   + intel/pe-xe-2020--binary
[ERR]   + intelmpi/2020--binary
[ERR]   + mkl/2020--binary
[ERR] Suggestion: 'module unload intel/pe-xe-2020--binary intelmpi/2020--binary mkl/2020--binary'

Loading profile/base
  ERROR: can't read "errorCode": no such variable

These don't appear when using the openmpi module. Unfortunately this cluster doesn't have mpich.
Edit: I think similar errors do actually appear when using openmpi, but execution continues and they are covered up by the panes being launched soon afterward.

I don't know what else to try. I'm happy to do some more debugging if anyone has suggestions.
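
Edit: for reference, the fuller change I experimented with. This is only a sketch: besides widening the grep, Intel's mpirun reportedly rejects the -pmi-port flag, so it would have to be gated on genuine MPICH/Hydra (the module errors above appeared either way):

# treat Intel MPI like MPICH for detection purposes
mpich=$(mpirun --version | grep -i -e HYDRA -e Intel)
# but only pass -pmi-port to real Hydra, since Intel's mpirun rejects it
intel=$(mpirun --version | grep -i Intel)
if [ -n "${mpich}" ] && [ -z "${intel}" ]; then
    pmi_arg="-pmi-port"
fi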

`tmpi` just opens one shell/pane with no errors

Thank you @Azrael3000 for maintaining this awesome script! It's been a game-changer for debugging MPI.

I'm having issues setting it up on a new server though.

Running e.g. tmpi 2 gdb (or any other command with any number of ranks) just opens one pane with a bash shell and nothing else. I'm having a hard time figuring out the problem without any error output.

I'm using OpenMPI:

[wdmc@tartarus ~]$ mpiexec --version
mpiexec (OpenRTE) 4.0.3

Report bugs to http://www.open-mpi.org/community/help/

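Edit: if anyone wants to help narrow this down, these are the environment checks I can think of running, since tmpi needs both tmux and an MPI launcher on the PATH:

tmux -V                       # tmpi drives tmux, so its version may matter
command -v mpirun mpiexec     # confirm a launcher is actually on the PATH
mpirun --version | head -n 2  # and which implementation it is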

tmpi spawning N different sessions of size 1

Hello,

First of all, let me say that tmpi is great :) or rather, it could be!
I bumped into this issue: instead of getting, say, 4 panes with each pane running one rank of an mpirun -n 4 ... program, I get 4 independent processes running in the 4 panes.

I don't do anything special: I run it normally as tmpi 4 ... and that's what I get, with any MPI program, including a classic hello world.

Is there anything off the top of your head that could help me out, or that I could look at?
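
Edit: a quick check I can offer for reproduction: print the per-rank environment variables in each pane. This assumes OpenMPI, which sets OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE for every rank; if each pane reports size 1 (or the variables are unset), the panes really are separate launches:

tmpi 4 bash -c 'echo "rank=$OMPI_COMM_WORLD_RANK size=$OMPI_COMM_WORLD_SIZE"; exec bash'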

use with slurm?

Is it possible to use tmpi with Slurm? I tried a few quick experiments, setting the mpirun environment variable to srun and launching jobs, but I didn't get it working (tmpi exits immediately). I don't usually use tmux, so I didn't dig deep or experiment much, but if it were possible to have

tmpi 4 gdb --args /my/job/to/debug

expand out to do the equivalent of

srun -n 4 blah blah blah

and do the right thing, it'd be very useful indeed. On some machines I'm not able to use xterm, so I can't run a simple

srun -n 4 xterm -e gdb --args /my/job/to/debug

which is one way of debugging under slurm.
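
Edit: the closest I've gotten to a recipe, untested beyond the quick attempts above: grab an allocation first and let mpirun discover it, since an OpenMPI built with Slurm support launches through srun inside an existing allocation:

salloc -n 4                          # open a shell inside a 4-task allocation
tmpi 4 gdb --args /my/job/to/debug   # mpirun under tmpi should then use it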

MPI_INIT fails on all ranks but rank 0 when using mpi4py

Starting a Python shell with tmpi 2 python and running from mpi4py import MPI results in a failed MPI_INIT on all ranks but the first (for any number of ranks afaict) with the following message:

Python 3.11.5 (main, Sep 11 2023, 08:31:25) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from mpi4py import MPI
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_mpi_instance_init failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
[dyn3168-24:00000] *** An error occurred in MPI_Init_thread
[dyn3168-24:00000] *** reported by process [2283732993,1]
[dyn3168-24:00000] *** on a NULL communicator
[dyn3168-24:00000] *** Unknown error
[dyn3168-24:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[dyn3168-24:00000] ***    and MPI will try to terminate your MPI job as well)

Pane is dead (status 14, Mon Feb 19 14:17:33 2024)

This is on macOS with OpenMPI 5.0.2 installed using Homebrew. A script containing this import works fine when run with mpiexec -n 2 --oversubscribe python script.py.

Any idea why this might be happening? The behaviour seems to be tmpi-specific. It worked initially after install, but then started throwing this error, and reinstalling both OpenMPI and tmpi hasn't fixed the issue afaict.
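
Edit: one experiment I still want to try, in case the PML selection is the culprit: pinning Open MPI to its basic ob1/tcp stack through standard OMPI_MCA_* environment overrides before launching. A guess at a workaround, not a diagnosis:

export OMPI_MCA_pml=ob1        # force the basic point-to-point messaging layer
export OMPI_MCA_btl=self,tcp   # plain TCP transport plus loopback
tmpi 2 python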

Hide all panels but keep synchronization

Hello,

Sorry if this is not the best place to ask a question, but...
I just started using your tool for debugging and would like to know if it is possible to hide all panels except one but keep the synchronization?
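
Edit: to make the question concrete, tmux's pane zoom looks close to what I'm after. Zooming hides the other panes without killing them, and as far as I understand synchronize-panes still broadcasts keystrokes to every pane in the window, though I have not verified this under tmpi:

tmux resize-pane -Z    # toggle zoom on the active pane (default binding: prefix + z)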

Problem with intel mpi

Thank you for maintaining this project, it helps me a lot.
I tried to modify the source file to enable Intel MPI, but failed.
I changed
mpich=$(mpirun --version |grep -i HYDRA)
to
mpich=$(mpirun --version |grep -i -e HYDRA -e Intel)
and commented out the following (otherwise mpirun ends with the error "unrecognized argument pmi-port"):

if [ -n "${mpich}" ]; then
    pmi_arg="-pmi-port"
fi

I am using Quantum ESPRESSO, a Fortran code. When I enter tmux and type
run < input_script
to run the program, it ends with:

[cli_1]: write_line error; fd=10 buf=:cmd=init pmi_version=1 pmi_subversion=1
:
system msg for write_line failure : Bad file descriptor
[cli_1]: Unable to write to PMI_fd
[cli_1]: write_line error; fd=10 buf=:cmd=get_appnum
:
system msg for write_line failure : Bad file descriptor
Abort(1091087) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(136):
MPID_Init(709).......:
MPIR_pmi_init(105)...: PMI_Get_appnum returned -1
[cli_1]: write_line error; fd=10 buf=:cmd=abort exitcode=1091087
:
system msg for write_line failure : Bad file descriptor
*** error in Message Passing (mp) module ***
*** error code: 8001
Attempting to use an MPI routine before initializing MPICH
[Inferior 1 (process 10868) exited with code 01]

Does anyone know how to resolve this problem?
Thank you very much.
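
Edit: the "Bad file descriptor" writes to PMI_fd make me suspect a stale PMI_* environment leaking into the tmux pane, so the process tries to talk over a PMI socket that no longer exists. A check I intend to run inside the pane before retrying (guesswork, not a confirmed fix):

env | grep '^PMI'               # see which PMI variables reached the pane
unset PMI_FD PMI_RANK PMI_SIZE  # then retry the run without them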
