azrael3000 / tmpi
Run a parallel command inside a split tmux window
License: GNU General Public License v2.0
Dear Azrael3000,
I am failing to use tmpi. I have a toy MPI program that runs OK with:
mpirun -np 2 xterm -e gdb /tmp/fort
But if I switch to:
tmpi 2 gdb /tmp/fort
I see a tmux window appear and then disappear, with [exited] printed on screen.
Is there a way to debug tmpi to see what is going wrong?
Thanks,
Nicolas
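One generic way to see what a shell script does before it exits is to run it under `bash -x` and keep the trace. The sketch below assumes tmpi is a bash script reachable on PATH; `trace_script` and the log path are hypothetical names used only for illustration. The trace covers the top-level invocation only, not commands started later inside the tmux panes.

```shell
#!/usr/bin/env bash
# trace_script LOGFILE COMMAND [ARGS...]
# Runs a shell script under `bash -x` and writes the trace (stderr) to LOGFILE.
# Hypothetical helper, not part of tmpi.
trace_script() {
    local log=$1 script
    shift
    # Resolve the script on PATH, or accept an explicit executable path.
    script=$(command -v "$1") || { echo "not found: $1" >&2; return 127; }
    shift
    bash -x "$script" "$@" 2>"$log"
}

# Example (assumed log location):
#   trace_script /tmp/tmpi-trace.log tmpi 2 gdb /tmp/fort
```

The last lines of the log show which command ran just before the window closed.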
intelmpi implements the MPICH specification (according to https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/mpi-library.html#gs.9dy1lr), but does not seem to work with tmpi.
I tried changing the detection line to
mpich=$(mpirun --version |grep -i -e HYDRA -e Intel)
but I get strange module errors:
manpath: warning: $MANPATH set, ignoring /etc/man_db.conf
[ERR] The following modulefiles are not provided by module profile 'profile/base':
[ERR] + intel/pe-xe-2020--binary
[ERR] + intelmpi/2020--binary
[ERR] + mkl/2020--binary
[ERR] Suggestion: 'module unload intel/pe-xe-2020--binary intelmpi/2020--binary mkl/2020--binary'
Loading profile/base
ERROR: can't read "errorCode": no such variable
These don't appear when using the openmpi module. Unfortunately this cluster doesn't have mpich.
Edit: I think similar errors do actually appear when using openmpi, but execution continues and they are covered up by panes being launched soon afterward.
I don't know what else to try. I am happy to try some more debugging if anyone has suggestions.
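As a first debugging step, the modified pattern can be checked in isolation against the text that `mpirun --version` prints. This is only a sketch: `check_mpich_pattern` is a hypothetical helper, and the HYDRA and Intel sample strings in the comments are illustrative assumptions rather than output captured from this cluster. The Open MPI string is the one quoted in another report on this page.

```shell
#!/usr/bin/env bash
# check_mpich_pattern: reads version text on stdin and reports whether the
# modified pattern (grep -i -e HYDRA -e Intel) matches it.
# Hypothetical helper, not part of tmpi.
check_mpich_pattern() {
    if grep -q -i -e HYDRA -e Intel; then
        echo "match"
    else
        echo "no match"
    fi
}

# Illustrative inputs (the first two are assumed sample strings):
#   printf 'HYDRA build details:\n'    | check_mpich_pattern
#   printf 'Intel(R) MPI Library\n'    | check_mpich_pattern
#   printf 'mpiexec (OpenRTE) 4.0.3\n' | check_mpich_pattern
# Real use:
#   mpirun --version 2>&1 | check_mpich_pattern
```

If the real `mpirun --version` output also contains the module warnings shown above, running the pipeline by hand makes it easier to tell the module errors apart from the detection itself.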
I'm on Fedora with MPICH. See for example what happens with tmpi 2 python (spoiler: it reports size 1).
https://asciinema.org/a/HbRzLtMfjNoYaTh0zad4MOoeN
Using mpirun instead, I get:
$ mpirun -np 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_size())"
2
2
Do you have any suggestion on what might be happening? Maybe some environment variables are slipping through?
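One way to test the environment-variable theory is to list the launcher-related variables inside a tmpi pane and compare them with what a plain mpirun process sees. The sketch below is generic shell; `list_mpi_env` is a hypothetical helper, and the prefix list (PMI_, PMIX_, OMPI_, HYDRA_, I_MPI_) is an assumption about which variables are relevant, not something taken from tmpi.

```shell
#!/usr/bin/env bash
# list_mpi_env: prints environment variables whose names start with a
# prefix commonly used by MPI launchers, sorted for easy diffing.
# The prefix list is an assumption; extend it as needed.
list_mpi_env() {
    env | grep -E '^(PMI_|PMIX_|OMPI_|HYDRA_|I_MPI_)' | sort
}

# Example: save the output from each context and compare the two files.
#   list_mpi_env > /tmp/env-in-pane.txt
```

If a pane shows no rank or size variable at all, the process in that pane was probably started without the launcher's environment, which would be consistent with a reported size of 1.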
Thank you @Azrael3000 for maintaining this awesome script! It's been a game-changer for debugging MPI.
I'm having issues setting it up on a new server though.
Running e.g. tmpi 2 gdb (or any other command with any number of ranks) just opens one pane with a bash shell and nothing else. I'm having a hard time figuring out the problem without any error messages.
I'm using OpenMPI:
[wdmc@tartarus ~]$ mpiexec --version
mpiexec (OpenRTE) 4.0.3
Report bugs to http://www.open-mpi.org/community/help/
Hello,
first of all, let me say that tmpi is great :) Or better, it could be!
I bumped into this issue: instead of getting, say, 4 panes each running one rank of an mpirun -n 4 ... program, I get 4 independent processes running in the 4 panes.
I don't do anything particularly unusual. I run it normally as tmpi 4 ... and that's what I get, with any MPI program, including a classic hello world.
Is there anything off the top of your head that could help me out, or that I could look at?
Is it possible to use tmpi with Slurm? I tried a few quick experiments setting the mpirun env var to srun and launching jobs, but I didn't get it working (tmpi exits immediately). I don't usually use tmux, so I didn't dig deep or experiment much, but if it were possible to have
tmpi 4 gdb --args /my/job/to/debug
expand out to do the equivalent of
srun -n 4 blah blah blah
and do the right thing, it'd be very useful indeed. On some machines I'm not able to use xterm, so can't run a simple
srun -n 4 xterm -e gdb --args /my/job/to/debug
which is one way of debugging under slurm.
Starting a Python shell with tmpi 2 python and running from mpi4py import MPI results in a failed MPI_INIT on all ranks but the first (for any number of ranks, as far as I can tell), with the following message:
Python 3.11.5 (main, Sep 11 2023, 08:31:25) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from mpi4py import MPI
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
PML add procs failed
--> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_mpi_instance_init failed
--> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
[dyn3168-24:00000] *** An error occurred in MPI_Init_thread
[dyn3168-24:00000] *** reported by process [2283732993,1]
[dyn3168-24:00000] *** on a NULL communicator
[dyn3168-24:00000] *** Unknown error
[dyn3168-24:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[dyn3168-24:00000] *** and MPI will try to terminate your MPI job as well)
Pane is dead (status 14, Mon Feb 19 14:17:33 2024)
This is on macOS with OpenMPI 5.0.2 installed using Homebrew. A script containing this import works fine when run with mpiexec -n 2 --oversubscribe python script.py.
Any idea why this might be happening? The behaviour seems to be tmpi-specific. It worked initially after install, but then started throwing this error, and reinstalling both OpenMPI and tmpi hasn't fixed the issue as far as I can tell.
Hello,
Sorry if this is not the best place to ask a question, but...
I just started using your tool for debugging and would like to know if it is possible to hide all panes except one but keep the synchronization?
Thank you for maintaining this project, it helps me a lot.
I tried to modify the source file to enable Intel MPI, but failed.
I changed
mpich=$(mpirun --version |grep -i HYDRA)
to
mpich=$(mpirun --version |grep -i -e HYDRA -e Intel)
and commented out the following (otherwise mpirun ends with the error: unrecognized argument pmi-port):
if [ -n "${mpich}" ]; then
pmi_arg="-pmi-port"
fi
I am using Quantum ESPRESSO, a Fortran code. When I enter tmux and type in
run < input_script
to run the program, it ends with:
[cli_1]: write_line error; fd=10 buf=:cmd=init pmi_version=1 pmi_subversion=1
:
system msg for write_line failure : Bad file descriptor
[cli_1]: Unable to write to PMI_fd
[cli_1]: write_line error; fd=10 buf=:cmd=get_appnum
:
system msg for write_line failure : Bad file descriptor
Abort(1091087) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(136):
MPID_Init(709).......:
MPIR_pmi_init(105)...: PMI_Get_appnum returned -1
[cli_1]: write_line error; fd=10 buf=:cmd=abort exitcode=1091087
:
system msg for write_line failure : Bad file descriptor
*** error in Message Passing (mp) module ***
*** error code: 8001
Attempting to use an MPI routine before initializing MPICH
[Inferior 1 (process 10868) exited with code 01]
Does anyone know how to resolve this problem?
Thank you very much.
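The "Bad file descriptor" messages above say that fd 10 is not usable in the process that tried to write to it. A quick check, run from a shell in the same pane, is whether the descriptor handed over by the launcher is actually open there. The sketch below is generic bash; `fd_is_open` is a hypothetical helper, and reading the descriptor number from a `PMI_FD` environment variable is an assumption about how the launcher passes it.

```shell
#!/usr/bin/env bash
# fd_is_open N: returns 0 if file descriptor N is open in this shell,
# 1 if it is not, and 2 if N is not a plain non-negative integer.
# Hypothetical helper, not part of tmpi or of any MPI library.
fd_is_open() {
    case $1 in
        ''|*[!0-9]*) return 2 ;;
    esac
    # Duplicating a closed descriptor fails; the subshell keeps the
    # failed redirection from affecting the calling shell.
    ( : >&"$1" ) 2>/dev/null
}

# Example (PMI_FD is an assumed variable name):
#   if fd_is_open "${PMI_FD:-}"; then echo "open"; else echo "not open"; fi
```

A "not open" result would suggest that the descriptor was not inherited by the shell in the pane, which would match the write_line errors.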