Comments (2)
@GoodKairos ,
We don't support Open MPI right now. We have a new implementation here:
https://github.com/mpickpt/mana
It has been tested on MPICH for CentOS. We do intend to support Open MPI in the future, but maybe not in the near term unless there is enough demand.
Linked to #910
Hi @gc00 , thank you for the answer.
However, I was led to believe that DMTCP officially supports Open MPI, because this is what the documentation says:
https://dmtcp.sourceforge.io/FAQ.html#mpiCkpt
Does DMTCP support checkpointing of MPI programs?
Yes. And to restart on different hosts, edit the 'ssh' lines in dmtcp_restart_script.sh. DMTCP operates by checkpointing the sockets (or InfiniBand connections, if using --infiniband) created by the MPI library. Hence, it is transparent to MPI and doesn't require any particular MPI configuration or hooks. In principle, DMTCP should run on any MPI over TCP/IP. We usually test on [Open MPI](http://www.open-mpi.org/), [MVAPICH-2](http://mvapich.cse.ohio-state.edu/) and [MPICH-2](http://www.mcs.anl.gov/research/projects/mpich2/). If you find an MPI that we don't support, this is a bug in DMTCP. We would be appreciative if you can file a bug report. For further details on using MPI, see [QUICK-START.md](http://github.com/dmtcp/dmtcp/tree/master/QUICK-START.md).
As of DMTCP-2.4, DMTCP should offer robust support for popular implementations of MPI, along with support for the SLURM batch queue. (See the [example SLURM scripts for using DMTCP](https://github.com/dmtcp/dmtcp/tree/master/plugin/batch-queue/job_examples).)
I can also see example Slurm batch scripts that launch DMTCP with Open MPI, so I thought it would work. I spent quite a lot of time trying, only to find out that I should focus on MANA instead. Could someone fix the official documentation? Thank you.
Related Issues (20)
- port dmtcp for windows
- Support for RISC-V
- Create checkpoint from a dump file or a running process
- `make check` fails on an ARMv8 machine
- Segmentation fault at restart
- dmtcp in docker on apple silicon
- dmtcp CI is broken for master branch - root cause: python3.8 vs python3.10 pty module
- add soversion
- dmtcp build failed on ppc64le, aarch64 and s390x architecture
- The last few checkpoints of the dmtcp save are particularly slow
- INSTALL.md mentions non-existing command line option --no-coordinator for dmtcp_launch
- Release 3.0, Windows Subsystem for Linux with Ubuntu 22.04 LTS: all checks fail
- Release 3.0, Windows Subsystem for Linux with Ubuntu 22.04 LTS: restart doesn't work
- DMTCP build is broken with recent PR 1061
- Duplicating(forking) a checkpointed process?
- Segmentation fault on dmtcp (2.6.0) using MPICH (4.2.0)
- Segfault when I set 2 ckpts in a program using share memory
- Using '--enable-logging' leads to hang in a simplest case
- Segfault when using gethostbyname() after dmtcp_restart
- "dmtcp_command -kc" does not kill the node after checkpoint