Comments (6)
Hi,
a queue system is not mandatory.
AbiPy also works in the simplest case, in which the calculation is executed in a shell subprocess.
For further details see here.
In this case, you need to specify the maximum number of cores that can be used by the scheduler
and the maximum number of jobs that are allowed to run. See also abinit/abitutorials#3.
One can use the pre_run option in the manager.yml file to specify a list of commands to be executed before running the code, e.g.:

```yaml
pre_run:
  - ulimit -s unlimited
  - ...
```
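For context, a minimal shell-based manager.yml with a pre_run section might look like the following sketch (the time limit, core counts, and memory are placeholder values to adapt to your machine):

```yaml
qadapters:
  - priority: 1
    queue:
      qtype: shell
      qname: localhost
    job:
      mpi_runner: mpirun
      # Commands executed before launching abinit.
      pre_run:
        - ulimit -s unlimited
        - source ~/env.sh
    limits:
      timelimit: 1:00:00
      max_cores: 2
    hardware:
      num_nodes: 1
      sockets_per_node: 1
      cores_per_socket: 2
      mem_per_node: 4 Gb
```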
Could you provide an example of the script you use to run Abinit in parallel on your cluster?
from abipy.
Thanks for the prompt reply.
I have been running successfully on a single node using abipy, but am interested in running on various hosts simultaneously. I normally run ABINIT in the following manner:
```shell
mpirun -np 72 -hosts node0,node1,node2 abinit < some.files
```
where node[0-2] are the hosts over which ABINIT will be parallelized. I could also create a file with the hostnames, and execute with
```shell
mpirun -np 72 --hostfile hosts.file abinit < some.files
```
where hosts.file would contain
```shell
$ cat hosts.file
node0
node1
node2
```
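For what it's worth, Open MPI's hostfile format also accepts an optional `slots=N` field per host; here is a small standalone sketch (illustrative, not AbiPy code) of parsing such a file:

```python
# Minimal sketch: parse an MPI hostfile like the one shown above.
# Lines may be bare hostnames or "host slots=N" (Open MPI syntax);
# blank lines and "#" comments are ignored.
def parse_hostfile(text, default_slots=1):
    """Return a list of (hostname, slots) tuples."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        host = parts[0]
        slots = default_slots
        for token in parts[1:]:
            if token.startswith("slots="):
                slots = int(token.split("=", 1)[1])
        hosts.append((host, slots))
    return hosts

content = """\
node0
node1
node2
"""
print(parse_hostfile(content))
# -> [('node0', 1), ('node1', 1), ('node2', 1)]
```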
Thanks a lot for the help.
Ok, I see the problem.
I didn't consider the hostfile syntax, but I think it's possible to support it.
The problem is that, by default, AbiPy tries to find an "optimal" parallel configuration for a given input file, where parallel configuration means:

- the number of MPI processes
- the input variables governing the parallel algorithm (e.g. npfft, npband, ... if paral_kgb == 1)

So AbiPy will select the "optimal" configuration and write the associated submission script at runtime, thus delegating the allocation of the CPUs to the resource manager.
It's possible to disable this feature, and one can also enforce a particular number of CPUs with:

```python
my_manager = manager.new_with_fixed_mpi_omp(mpi_procs, omp_threads)
```
See the examples in the abipy/benchmarks directory.
This approach, however, assumes pre-generated input files whose parallel variables (npfft, npband, npkpt) are compatible with the number of MPI ranks requested by the user. This is very important, especially when paral_kgb == 1 is used; otherwise the code will stop immediately.
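As a toy illustration of that constraint (assuming the usual paral_kgb == 1 rule that the product of the parallelization variables must equal the number of MPI ranks):

```python
# Toy check (illustrative, not AbiPy code): with paral_kgb == 1,
# ABINIT expects npkpt * npband * npfft * npspinor == number of MPI ranks,
# otherwise the run aborts at startup.
def is_compatible(mpi_procs, npkpt=1, npband=1, npfft=1, npspinor=1):
    return npkpt * npband * npfft * npspinor == mpi_procs

# 6 * 4 * 3 = 72 ranks: consistent.
assert is_compatible(72, npkpt=6, npband=4, npfft=3)
# 6 * 4 * 2 = 48 != 72: the code would stop immediately.
assert not is_compatible(72, npkpt=6, npband=4, npfft=2)
```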
I can add support for hostfiles once I know the total number of CPUs and the number of processes per node (this value is reported in manager.yml).
The challenge is how to optimize the resources when multiple calculations are executed concurrently: one can have two calculations requiring 72 processes each, and these calculations should run on node[0-2] and node[3-5] to avoid overloading the nodes. This means that the AbiPy scheduler would have to keep a record of the nodes that have already been booked and select those that are free. This job is usually done by the resource manager, and adding support for it at the level of the AbiPy scheduler requires some coding that will surely not be as efficient as the logic already implemented by a real resource manager.
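To make the bookkeeping concrete, here is a toy sketch (not AbiPy code) of the node-tracking logic such a scheduler would need, ignoring everything a real resource manager handles (priorities, failures, partial nodes):

```python
# Toy sketch of node bookkeeping (illustrative, not AbiPy code).
class NodePool:
    def __init__(self, nodes):
        self.free = set(nodes)   # nodes currently available
        self.booked = {}         # job_id -> set of reserved nodes

    def book(self, job_id, num_nodes):
        """Reserve num_nodes free nodes for job_id, or return None."""
        if len(self.free) < num_nodes:
            return None
        chosen = {self.free.pop() for _ in range(num_nodes)}
        self.booked[job_id] = chosen
        return chosen

    def release(self, job_id):
        """Return the nodes held by job_id to the free pool."""
        self.free |= self.booked.pop(job_id)

pool = NodePool(["node%d" % i for i in range(6)])
job1 = pool.book("calc1", 3)          # e.g. three of node0-node5
job2 = pool.book("calc2", 3)          # the remaining three nodes
assert pool.book("calc3", 1) is None  # pool exhausted, job must wait
pool.release("calc1")                 # nodes become available again
```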
Could you give more details about your typical workflows so that I can get a better view of the problem?
Sure thing,
We specialize in first-principles calculations of optical properties. Typically, we use ABINIT to compute the electronic density and energies, which we then use to calculate various susceptibilities for a given material. We typically work with DFT-LDA, but we also have some experience with GW and BSE calculations, particularly in conjunction with the DP/EXC code.
I am currently evaluating abipy to see if it fits nicely into our workflow. In particular, I would like to execute a G0W0 calculation and study convergence over different parameters automatically -- in essence, starting the program and having it perform the convergence study without any user interaction. My group has used ABINIT for many years in the standard way: editing text files and reviewing results. It seems that abipy offers a very sophisticated mechanism for improving on this method, but as you are very likely aware, academics are often hesitant about integrating new methods into production.
Anyway, that's the gist of it.
As for this particular issue of using a specific set of nodes: it occurs to me that the user could add the hosts to the manager.yml file, and that this would be parsed along with the number of processes, etc. Then, abipy would use that as a replacement for the allocation done by the real resource manager, in essence handling the automatic parallelization but only across the specified nodes.
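Purely as a hypothetical illustration of that proposal (this is not an existing AbiPy option, and the key names are invented), such a section might look like:

```yaml
# Hypothetical syntax, not currently supported by AbiPy:
hosts:
  - node0
  - node1
  - node2
procs_per_host: 24
```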
> As for this particular issue of using a specific set of nodes: it occurs to me that the user could add the hosts to the manager.yml file, and that this would be parsed along with the number of processes, etc. Then, abipy would use that as a replacement for the allocation done by the real resource manager, in essence handling the automatic parallelization but only across the specified nodes.
manager.yml already provides an option (mpi_runner_options) to pass options to mpirun. See also the output of:

```shell
abidoc.py manager
```
If I use the following manager.yml:

```yaml
qadapters:
  # List of qadapter objects
  - priority: 1
    queue:
      qtype: shell
      qname: localhost
    job:
      mpi_runner: mpirun
      mpi_runner_options: "--hostfile ${HOME}/my_hosts"
      # source a script to set up the environment.
      #pre_run: "source ~/env.sh"
    limits:
      timelimit: 1:00:00
      max_cores: 2
    hardware:
      num_nodes: 1
      sockets_per_node: 1
      cores_per_socket: 2
      mem_per_node: 4 Gb
```
and I run one of the examples in abipy/examples/flows (e.g. run_si_ebands.py), I get the following shell script:
```shell
#!/bin/bash
cd /Users/gmatteo/git_repos/abipy/abipy/examples/flows/flow_si_ebands/w0/t0
# OpenMp Environment
export OMP_NUM_THREADS=1
mpirun --hostfile ${HOME}/my_hosts -n 1 abinit < /Users/gmatteo/git_repos/abipy/abipy/examples/flows/flow_si_ebands/w0/t0/run.files > /Users/gmatteo/git_repos/abipy/abipy/examples/flows/flow_si_ebands/w0/t0/run.log 2> /Users/gmatteo/git_repos/abipy/abipy/examples/flows/flow_si_ebands/w0/t0/run.err
```
This solution should work and does not require any change in the present implementation. The number of MPI ranks will be determined at runtime by calling abinit in autoparal mode with max_cores: 2.
Remember to set:

```yaml
# Limit on the number of jobs that can be present in the queue. (DEFAULT: 200)
max_njobs_inqueue: 2
# Maximum number of cores that can be used by the scheduler.
max_ncores_used: 4
```

in your scheduler.yml to avoid oversubscribing the nodes.
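For reference, a minimal scheduler.yml combining these limits with a polling interval might look like this sketch (the 30-second interval is a placeholder to tune):

```yaml
# How often the scheduler polls the flow.
seconds: 30
# Limit on the number of jobs that can be present in the queue.
max_njobs_inqueue: 2
# Maximum number of cores that can be used by the scheduler.
max_ncores_used: 4
```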
I think this solution will work great for me, and I will try it out during the week.
Thanks!