numba-mpi / numba-mpi

Numba @njittable wrappers for MPI C API tested on Linux, macOS and Windows

Home Page: https://numba-mpi.github.io/numba-mpi/

License: GNU General Public License v3.0

Python 100.00%
mpi mpi4py numba hpc python pypi-package conda-forge

numba-mpi's Introduction

numba-mpi

Overview

numba-mpi provides Python wrappers to the C MPI API callable from within Numba JIT-compiled code (@njit mode).

Support is provided for a subset of MPI routines covering: size/rank, send/recv, allreduce, bcast, scatter/gather and allgather, barrier, wtime, basic asynchronous communication with isend/irecv (only for contiguous arrays), and request handling with wait/waitall/waitany and test/testall/testany.

The API uses NumPy and supports both numeric and character datatypes (e.g., broadcast). Auto-generated docstring-based API docs are published on the web: https://numba-mpi.github.io/numba-mpi

Packages can be obtained from PyPI, Conda Forge, Arch Linux or by invoking pip install git+https://github.com/numba-mpi/numba-mpi.git.

numba-mpi is a pure-Python package. The codebase includes a test suite used through the GitHub Actions workflows (thanks to mpi4py's setup-mpi!) for automated testing on: Linux (MPICH, OpenMPI & Intel MPI), macOS (MPICH & OpenMPI) and Windows (MS MPI).

Features that are not implemented yet include (help welcome!):

  • support for non-default communicators
  • support for MPI_IN_PLACE in [all]gather/scatter and allreduce
  • support for MPI_Type_create_struct (NumPy structured arrays)
  • ...

Hello world send/recv example:

import numba, numba_mpi, numpy

@numba.njit()
def hello():
    src = numpy.array([1., 2., 3., 4., 5.])
    dst_tst = numpy.empty_like(src)

    if numba_mpi.rank() == 0:
        numba_mpi.send(src, dest=1, tag=11)
    elif numba_mpi.rank() == 1:
        numba_mpi.recv(dst_tst, source=0, tag=11)

hello()

Example comparing numba-mpi vs. mpi4py performance:

The example below compares the performance of Numba + mpi4py with that of Numba + numba-mpi. The sample code estimates $\pi$ by integrating $4/(1+x^2)$ between 0 and 1, dividing the workload into n_intervals that are distributed among the MPI processes, and then obtaining the sum using allreduce. The computation is carried out in a JIT-compiled function and is repeated N_TIMES. For mpi4py, the repetitions and the MPI-handled reduction are done outside of the JIT-compiled block; for numba-mpi, they are done inside it. Timing is repeated N_REPEAT times and the minimum time is reported. The generated plot shown below depicts the speedup obtained by replacing mpi4py with numba_mpi as a function of n_intervals: the more often communication is needed (smaller n_intervals), the larger the expected speedup.

import timeit, mpi4py, numba, numpy as np, numba_mpi

N_TIMES = 10000
N_REPEAT = 10
RTOL = 1e-3

@numba.njit
def get_pi_part(out, n_intervals, rank, size):
    h = 1 / n_intervals
    partial_sum = 0.0
    for i in range(rank + 1, n_intervals, size):
        x = h * (i - 0.5)
        partial_sum += 4 / (1 + x**2)
    out[0] = h * partial_sum

@numba.njit
def pi_numba_mpi(n_intervals):
    pi = np.array([0.])
    part = np.empty_like(pi)
    for _ in range(N_TIMES):
        get_pi_part(part, n_intervals, numba_mpi.rank(), numba_mpi.size())
        numba_mpi.allreduce(part, pi, numba_mpi.Operator.SUM)
        assert abs(pi[0] - np.pi) / np.pi < RTOL

def pi_mpi4py(n_intervals):
    pi = np.array([0.])
    part = np.empty_like(pi)
    for _ in range(N_TIMES):
        get_pi_part(part, n_intervals, mpi4py.MPI.COMM_WORLD.rank, mpi4py.MPI.COMM_WORLD.size)
        mpi4py.MPI.COMM_WORLD.Allreduce(part, (pi, mpi4py.MPI.DOUBLE), op=mpi4py.MPI.SUM)
        assert abs(pi[0] - np.pi) / np.pi < RTOL

plot_x = [1000 * k for k in range(1, 11)]
plot_y = {'numba_mpi': [], 'mpi4py': []}
for n_intervals in plot_x:
    for impl in plot_y:
        plot_y[impl].append(min(timeit.repeat(
            f"pi_{impl}({n_intervals})",
            globals=locals(),
            number=1,
            repeat=N_REPEAT
        )))

if numba_mpi.rank() == 0:
    from matplotlib import pyplot
    pyplot.figure(figsize=(8.3, 3.5), tight_layout=True)
    pyplot.plot(plot_x, np.array(plot_y['mpi4py'])/np.array(plot_y['numba_mpi']), marker='o')
    pyplot.xlabel('n_intervals (workload in between communication)')
    pyplot.ylabel('wall time ratio (mpi4py / numba_mpi)')
    pyplot.title(f'mpiexec -np {numba_mpi.size()}')
    pyplot.grid()
    pyplot.savefig('readme_plot.png')

[plot: wall time ratio (mpi4py / numba_mpi) as a function of n_intervals]
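
The decomposition used in the benchmark can be checked without MPI. The following is a minimal sketch in plain Python, assuming a serial loop over ranks as a stand-in for allreduce with SUM; the names pi_part and pi_estimate are hypothetical and not part of numba-mpi. Note that this sketch uses n_intervals + 1 as the upper loop bound so that the last interval is included, which differs slightly from the listing above.

```python
import math


def pi_part(n_intervals, rank, size):
    """Midpoint-rule contribution of one rank to the integral of 4/(1+x^2) on [0, 1]."""
    h = 1.0 / n_intervals
    partial_sum = 0.0
    # Rank r handles intervals r+1, r+1+size, ... (1-based), including the last one.
    for i in range(rank + 1, n_intervals + 1, size):
        x = h * (i - 0.5)  # midpoint of the i-th interval
        partial_sum += 4.0 / (1.0 + x * x)
    return h * partial_sum


def pi_estimate(n_intervals, size):
    """Serial stand-in for allreduce(SUM): add up the parts of all ranks."""
    return sum(pi_part(n_intervals, rank, size) for rank in range(size))


if __name__ == "__main__":
    print(pi_estimate(1000, 4), math.pi)
```

The result should not depend on the number of ranks beyond floating-point rounding, since every interval is assigned to exactly one rank.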

MPI resources on the web:

Acknowledgements:

Development of numba-mpi has been supported by the Polish National Science Centre (grant no. 2020/39/D/ST10/01220).

numba-mpi's People

Contributors

abulenok, david-zwicker, delcior, slayoo, xann16


numba-mpi's Issues

Does not support Intel MPI

I tried to import numba_mpi, but got the following error:

File "<stdin>", line 1, in <module>
 File "...\lib\site-packages\numba_mpi\__init__.py", line 4, in <module>
   from .mpi import Operator, allreduce, initialized, rank, recv, send, size
 File "...\lib\site-packages\numba_mpi\mpi.py", line 34, in <module>
   libmpi = ctypes.CDLL(LIB)
 File "...\lib\ctypes\__init__.py", line 364, in __init__
   self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found

I see that in mpi.py you try to load msmpi.dll. However, this is not the only possible library for MPI. For example, Intel makes an MPI library available (see the Intel Conda channel, for example), which is what I have installed.
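
One possible direction, sketched here under stated assumptions, is to probe several candidate library names instead of a single hard-coded one. The candidate names below (including "impi" for Intel MPI) and the helper find_first_library are illustrative assumptions, not the lookup logic of numba-mpi; only ctypes.util.find_library is a standard-library call.

```python
import ctypes.util

# Candidate names are assumptions for illustration; actual names depend on the
# platform and on the MPI distribution that is installed.
CANDIDATES = ("msmpi", "impi", "mpi", "mpich")


def find_first_library(candidates=CANDIDATES, finder=ctypes.util.find_library):
    """Return the first library location reported by `finder`, or None if none is found."""
    for name in candidates:
        path = finder(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    # Prints a library name/path, or None when no candidate is installed.
    print(find_first_library())
```

The finder argument is injectable so that the lookup order can be exercised without any MPI library being present.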

refactor `_jit` methods in tests not to mislead that `py_func` points to impl

E.g., in test_barrier.py:

@numba.njit()
def jit_barrier():
    return numba_mpi.barrier()


@pytest.mark.parametrize("barrier", (jit_barrier.py_func, jit_barrier))
def test_barrier(barrier):
    status = barrier()

    assert status == MPI_SUCCESS

We intend to test the non-jitted version by taking jit_barrier.py_func, but in fact, due to the presence of the jit_barrier wrapper, we are not testing it: py_func points to the wrapper rather than to the implementation. The intention here was to ensure that all API elements are njittable (i.e., that we have not forgotten Numba decorators in the implementations).
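
The effect can be illustrated without Numba or MPI. The sketch below uses a hypothetical FakeDispatcher class as a stand-in for a Numba dispatcher (it only mimics the py_func attribute and counts calls that go through the dispatcher); it is not how Numba is implemented.

```python
class FakeDispatcher:
    """Hypothetical stand-in for a Numba dispatcher; counts 'compiled-path' calls."""

    def __init__(self, py_func):
        self.py_func = py_func  # the undecorated Python function
        self.compiled_calls = 0

    def __call__(self, *args):
        self.compiled_calls += 1
        return self.py_func(*args)


def fake_njit(func):
    return FakeDispatcher(func)


@fake_njit
def barrier():  # plays the role of numba_mpi.barrier
    return 0


@fake_njit
def jit_barrier():  # plays the role of the wrapper defined in the test
    return barrier()


# py_func bypasses only the wrapper's dispatcher; the inner call still goes
# through the dispatcher of `barrier`.
status = jit_barrier.py_func()
```

After this call the wrapper's counter is still zero while the counter of barrier is one, which mirrors the situation described above: the undecorated implementation is reached only through barrier.py_func.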

MPICH stopped working on CI for Python <3.10 (works for 3.10!)

example: https://github.com/atmos-cloud-sim-uj/numba-mpi/actions/runs/3540947727

error:

A process has executed an operation involving a call
to the fork() system call to create a child process.

As a result, the libfabric EFA provider is operating in
a condition that could result in memory corruption or
other system errors.

For the libfabric EFA provider to work safely when fork()
is called, you will need to set the following environment
variable:
          RDMAV_FORK_SAFE

However, setting this environment variable can result in
signficant performance impact to your application due to
increased cost of memory registration.

You may want to check with your application vendor to see
if an application-level alternative (of not using fork)
exists.

Your job will now abort.

The CI job then times out after 6h.

numba_mpi.allreduce can't handle scalar type as recvobj

I have a small project (~10000 lines) in Python and Numba. High-level code is easier to write in pure Python, and low-level code in Numba. For pure Python I use mpi4py, and for Numba I am trying to use numba-mpi. It would be nice if the code were similar in both cases, although the differences are easily overcome.
For example, numba_mpi.allreduce can't handle a scalar type as recvobj, whereas mpi4py provides Comm-class methods with an upper-case first letter for communicating buffer-like objects such as NumPy arrays, and with a lower-case first letter for communicating generic data objects. https://info.gwdg.de/wiki/doku.php?id=wiki:hpc:mpi4py
It would be nice to support such a convention; otherwise one has to convert generic Numba types to and from NumPy arrays.

Code example in which I have to create a one-element NumPy array to receive the sum of the lengths of self.energy_list from all processes:

import numba as nb
import numba_mpi as nb_mpi
import numpy as np

energy_list_len = np.empty(1, dtype=np.int64)
nb_mpi.allreduce(len(self.energy_list), energy_list_len)

Best, Vladimir.

Originally posted by @Konjkov in #83 (comment)
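
The workaround described in this issue can be wrapped in a small helper. The following is a minimal plain-Python sketch, assuming an allreduce callable that takes (send, recv) arrays and returns a status code; allreduce_scalar and single_process_allreduce are hypothetical names, and the latter is only a stand-in that behaves like a SUM reduction over a single process.

```python
import numpy as np


def allreduce_scalar(value, allreduce, dtype=np.int64):
    """Reduce a scalar by wrapping it in one-element send/receive buffers."""
    send = np.array([value], dtype=dtype)
    recv = np.empty(1, dtype=dtype)
    status = allreduce(send, recv)
    return status, recv[0]


def single_process_allreduce(send, recv):
    """Stand-in for an MPI allreduce(SUM) with one process: the sum is the input."""
    recv[...] = send
    return 0  # 0 plays the role of MPI_SUCCESS


if __name__ == "__main__":
    status, total = allreduce_scalar(7, single_process_allreduce)
    print(status, total)
```

Whether such a helper can be compiled with @njit when a numba-mpi function is passed in has not been verified here; inside JIT-compiled code the same wrapping may need to be written inline.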

getting `numba_mpi.scatter` to work (type of first argument, ...)

I want to scatter a NumPy array from root = 0 to all processes using numba_mpi.scatter(). Here is the code:

import numba_mpi
import numpy as np
from numba import jit

@jit
def roll_array_scalar():
    if numba_mpi.rank() == 0:
        A = np.empty((2, 4))
        A[0] = np.random.uniform(1, 1, (4))
        A[1] = np.random.uniform(2, 2, (4))
    else:
        A = None
    B = np.empty((1, 4))
    numba_mpi.scatter(A, B, count=B.size, root=0)
    print(numba_mpi.rank(), B)

roll_array_scalar()

Here is the error:

numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Unknown attribute 'flags' of type none

File "../../../anaconda3/lib/python3.9/site-packages/numba_mpi/api/scatter_gather.py", line 35:
def scatter(send_data, recv_data, count, root):
    """wrapper for MPI_Scatter(). Returns integer status code (0 == MPI_SUCCESS)"""
    assert send_data.flags.c_contiguous  # TODO #60
    ^

During: typing of get attribute at /home/nandita/anaconda3/lib/python3.9/site-packages/numba_mpi/api/scatter_gather.py (35)

However, if I define the array A on all processes and then scatter it to B, then it works:

@jit
def roll_array_scalar():
    A = np.empty((2, 4))
    A[0] = np.random.uniform(1, 1, (4))
    A[1] = np.random.uniform(2, 2, (4))
    B = np.empty((1, 4))
    numba_mpi.scatter(A, B, count=B.size, root=numba_mpi.rank())
    print(numba_mpi.rank(), B)

roll_array_scalar()

Output:
0 [[1. 1. 1. 1.]]
1 [[2. 2. 2. 2.]]

Please resolve the issue so that I can scatter an array defined on rank 0 to all processes.
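
For reference, the result expected from a scatter with root = 0 can be written down without MPI: rank r receives elements r*count to (r+1)*count of the root's flattened send buffer. The sketch below simulates this in plain Python; simulate_scatter is a hypothetical helper and not part of numba-mpi. The traceback above reports that send_data had type none where an array was expected; allocating a same-typed placeholder array on non-root ranks instead of None might avoid this, but that is an untested assumption.

```python
def simulate_scatter(send_data, count, size):
    """Return the chunk that each of `size` ranks would receive from a flat root buffer."""
    if len(send_data) < count * size:
        raise ValueError("root buffer is too small for count * size elements")
    return [send_data[r * count:(r + 1) * count] for r in range(size)]


# Flattened version of the 2x4 array A from the report above.
root_buffer = [1.0] * 4 + [2.0] * 4
chunks = simulate_scatter(root_buffer, count=4, size=2)

for rank, chunk in enumerate(chunks):
    print(rank, chunk)
```

With two ranks, rank 0 is expected to receive the four ones and rank 1 the four twos.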

publish `numba_mpi` on conda

Would it be possible to publish the package on conda as well as on pypi? I would like to use the package in my py-pde project and conda is a convenient way of installing many packages together, e.g., in an HPC environment.

`test_isend_irecv` works for 2 workers (as checked on CI), but not for one or three...

with one worker, for example:

_____________________ test_recv_default_source[send-recv0] _____________________

snd = CPUDispatcher(<function send at 0x7c8bb8ac84c0>)
rcv = CPUDispatcher(<function recv at 0x7c8bb8a46e50>)

    @pytest.mark.parametrize(
        "snd, rcv", [(mpi.send, mpi.recv), (mpi.send.py_func, mpi.recv.py_func)]
    )
    def test_recv_default_source(snd, rcv):
        src = get_random_array(())
        dst_tst = np.empty_like(src)
    
        if mpi.rank() == 0:
            status = snd(src, dest=1, tag=44)
>           assert status == MPI_SUCCESS
E           assert 6 == 0

tests/api/test_send_recv.py:120: AssertionError

with three workers:

tests/api/test_isend_irecv.py .......................................... [ 27%]
..--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
..........................................--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 0 on node devel exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
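
The test shown above sends to the hard-coded rank 1, which does not exist when only one worker is running. One way to make such tests independent of the number of workers, sketched here as an assumption rather than as the project's fix, is to pair neighbouring ranks and let an unpaired rank sit out. The helper partner is hypothetical.

```python
def partner(rank, size):
    """Pair ranks as (0, 1), (2, 3), ...; return None for a rank without a partner."""
    other = rank ^ 1  # flip the lowest bit: 0 <-> 1, 2 <-> 3, ...
    return other if other < size else None


if __name__ == "__main__":
    for size in (1, 2, 3):
        print(size, [partner(rank, size) for rank in range(size)])
```

A test could then skip its send/recv body whenever partner returns None, so that a single worker or the last rank of an odd-sized run does not address a non-existent peer.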
