nv-legate / legate.core Goto Github PK

View Code? Open in Web Editor NEW

184.0 184.0 61.0 9.08 MB

The Foundation for All Legate Libraries

Home Page: https://docs.nvidia.com/legate/24.06/

License: Apache License 2.0

Python 66.59% C++ 25.95% C 0.62% Shell 2.29% Cuda 0.48% CMake 3.47% Cython 0.60%

legate.core's People

Contributors

Stargazers

Watchers

legate.core's Issues

Automated CI job for releasing on merge to `main`

PR #44 had stated the following...

All merges to main trigger an automated CI job that will produce a new release and tag incrementing the patch version off of the previous highest tag in the repo. Once the tag is set, the automated CI build for conda packages creates and pushes new packages for users. This includes the version change enabling known good builds and the ability to rollback.

However, I don't believe we have this level of automation in our CI at this time. We don't have packages, yet. I'm removing this from the PR and opening this issue for tracking.

cc @marcinz

Clean up incompat overrides for ManualTask

ref: #189 (comment)

Several properties on ManualTask override base class property types in an unhappy way, requiring a # type: ignore [assignment] for now.

Interface for sharded I/O of existing data

We need an interface for ingesting sharded data from individual processes and then getting it back out again when users are done with it.

Any C++ example to use legate.core

Hi,
I know that the legate.core is designed for numpy/python application. Would you mind to provide some guidance on how to use the library with C++ program? I tried to run the hello program as follows, but in vain. Can you give some examples to call the function from C++ frontend?

int main(int argc, char **argv)
{
Runtime::initialize(&argc, &argv);
legate_hello_perform_registration();
return Runtime::start(argc, argv);
}

Interpreter hitting assertion when shutting down for an uncaught exception

On Python 3.10 I hit assert tlock.locked() when the script ran unsuccessfully due to an uncaught exception. I believe this is caused by the workaround I added to Realm to address the shutdown hang.

Flag --summarize causing error: "unrecognized arguments: -lg:summarize"

Version info

legate.core: commit 18cb8fd
legate.numpy: 026061b

Steps to reproduce

Go to legate.numpy/example
Execute legate --cpus 1 --summarize ./jacobi.py

Expected and actual output

The expected result is the normal outputs from jacobi.py plus whatever the summarize should output.

The actual output is an error message: jacobi.py: error: unrecognized arguments: -lg:summarize

Composition between delinearizing functor and others

When the solver chooses to use a 1D launch domain and delinearizing functors for store arguments, it assumes that the stores don't have any store transformations, which is what this assertion basically entails. In the future, we do need to allow such stores, and to support them, we need to compose the delinearizing functor with those derived from the store transformations.

FileNotFoundError: [Errno 2] No such file or directory: 'python' (Python3 is there)

I do not use Python2, but I am using Ubuntu 20 and Python3 is python3. I did alias python=python3 but that didn't help. I tried with Python2 (as python) but that failed too (see second section).

How do I install with Python3? Thanks.

Python3

Installation complete
Traceback (most recent call last):
  File "./setup.py", line 73, in <module>
    exec(code)
  File "install.py", line 981, in <module>
    driver()
  File "install.py", line 977, in driver
    install(unknown=unknown, **vars(args))
  File "install.py", line 684, in install
    build_legion(
  File "install.py", line 363, in build_legion
    verbose_check_call(
  File "install.py", line 68, in verbose_check_call
    subprocess.check_call(*args, **kwargs)
  File "/usr/lib/python3.8/subprocess.py", line 359, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python3.8/subprocess.py", line 340, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'python'

Python2

Installation complete
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/legate
copying legate/__init__.py -> build/lib.linux-x86_64-2.7/legate
creating build/lib.linux-x86_64-2.7/legate/core
copying legate/core/legion.py -> build/lib.linux-x86_64-2.7/legate/core
copying legate/core/legate.py -> build/lib.linux-x86_64-2.7/legate/core
copying legate/core/install_info.py -> build/lib.linux-x86_64-2.7/legate/core
copying legate/core/__init__.py -> build/lib.linux-x86_64-2.7/legate/core
creating build/lib.linux-x86_64-2.7/legate/timing
copying legate/timing/timing.py -> build/lib.linux-x86_64-2.7/legate/timing
copying legate/timing/__init__.py -> build/lib.linux-x86_64-2.7/legate/timing
running install_lib
copying build/lib.linux-x86_64-2.7/legate/core/install_info.py -> /home/jhammond/LEGATE/lib/python2.7/site-packages/legate/core
byte-compiling /home/jhammond/LEGATE/lib/python2.7/site-packages/legate/core/install_info.py to install_info.pyc
byte-compiling /home/jhammond/LEGATE/lib/python2.7/site-packages/legate/timing/timing.py to timing.pyc
  File "/home/jhammond/LEGATE/lib/python2.7/site-packages/legate/timing/timing.py", line 84
    raise ValueError(f"Invalid store count: {len(stores)}")
                                                         ^
SyntaxError: invalid syntax

running install_egg_info
Removing /home/jhammond/LEGATE/lib/python2.7/site-packages/legate.core-0.1-py2.7.egg-info
Writing /home/jhammond/LEGATE/lib/python2.7/site-packages/legate.core-0.1-py2.7.egg-info

Linearize can produce non-dense domains, slice_task complains

This testcase https://github.com/manopapad/cunumeric/blob/ingest/tests/ingest.py#L97 produces a situation where the LinearizingShardingFunctor produces non-dense slice domains, and BaseMapper::slice_task is unhappy about that.

$ LEGATE_TEST=1 $DEV/quickstart/run.sh 2 tests/ingest.py -cunumeric:test
...
Command: /gpfs/fs1/mpapadakis/legate.core/install/bin/legate --launcher mpirun --numamem 200000 --omps 2 --ompthreads 18 --cpus 1 --sysmem 256 --gpus 8 --fbmem 14500 --verbose --logdir /gpfs/fs1/mpapadakis/2021/10/29/132439 --nodes 2 --ranks-per-node 1 tests/ingest.py -cunumeric:test -logfile /gpfs/fs1/mpapadakis/2021/10/29/132439/%.log
Running: mpirun -n 2 --npernode 1 --bind-to none --mca mpi_warn_on_fork 0 -x LD_LIBRARY_PATH -x UCX_TLS -x LEGATE_DIR -x UCX_MEMTYPE_CACHE -x LEGATE_TEST -x PYTHONDONTWRITEBYTECODE -x PYTHONPATH -x NCCL_LAUNCH_MODE -x LEGATE_NEED_CUDA -x LEGATE_NEED_OPENMP -x LEGATE_NEED_GASNET -x LEGATE_MAX_DIM -x LEGATE_MAX_FIELDS -x GASNET_PHYSMEM_MAX -x REALM_BACKTRACE /gpfs/fs1/mpapadakis/legate.core/install/bin/legion_python -ll:py 1 -lg:local 0 -ll:gpu 8 -cuda:skipbusy -ll:ocpu 2 -ll:othr 18 -ll:onuma 1 -ll:util 2 -ll:bgwork 2 -ll:csize 256 -ll:nsize 200000 -ll:fsize 14500 -ll:zsize 32 -level openmp=5,gpu=5 -lg:eager_alloc_percentage 50 tests/ingest.py -cunumeric:test -logfile /gpfs/fs1/mpapadakis/2021/10/29/132439/%.log
legion_python: core/mapping/base_mapper.cc:235: virtual void legate::mapping::BaseMapper::slice_task(Legion::Mapping::MapperContext, const LegionTask&, const Legion::Mapping::Mapper::SliceTaskInput&, Legion::Mapping::Mapper::SliceTaskOutput&): Assertion `input.domain.dense()' failed.

In this example I am using colorspace = (5,3), 2 shards, and setting up a Tiling manually.

You need the following branches to run this test: https://github.com/manopapad/legate.core/tree/ingest & https://github.com/manopapad/cunumeric/tree/ingest.

Here is some debugging output:

register_legate_core_sharding_functors: register proj_id 0 to shard_id 1073741826
register_legate_core_sharding_functors: register proj_id 1073741826 to shard_id 1073741826
register_legate_core_sharding_functors: register proj_id 0 to shard_id 1073741826
register_legate_core_sharding_functors: register proj_id 1073741826 to shard_id 1073741826
picked sharding functor 1073741826 based on region req 0
picked sharding functor 1073741826 based on region req 0
LinearizingFunctor::shard: p = (0,0) launch_space = <0,0>..<4,2> -> 0
LinearizingFunctor::shard: p = (0,1) launch_space = <0,0>..<4,2> -> 0
LinearizingFunctor::shard: p = (0,2) launch_space = <0,0>..<4,2> -> 0
LinearizingFunctor::shard: p = (1,0) launch_space = <0,0>..<4,2> -> 0
LinearizingFunctor::shard: p = (1,1) launch_space = <0,0>..<4,2> -> 0
LinearizingFunctor::shard: p = (1,2) launch_space = <0,0>..<4,2> -> 0
LinearizingFunctor::shard: p = (2,0) launch_space = <0,0>..<4,2> -> 0
LinearizingFunctor::shard: p = (2,1) launch_space = <0,0>..<4,2> -> 0
LinearizingFunctor::shard: p = (2,2) launch_space = <0,0>..<4,2> -> 1
LinearizingFunctor::shard: p = (3,0) launch_space = <0,0>..<4,2> -> 1
LinearizingFunctor::shard: p = (3,1) launch_space = <0,0>..<4,2> -> 1
LinearizingFunctor::shard: p = (3,2) launch_space = <0,0>..<4,2> -> 1
LinearizingFunctor::shard: p = (4,0) launch_space = <0,0>..<4,2> -> 1
LinearizingFunctor::shard: p = (4,1) launch_space = <0,0>..<4,2> -> 1
LinearizingFunctor::shard: p = (4,2) launch_space = <0,0>..<4,2> -> 1
slice_task: input.domain =<0,0>..<1,2>+<2,0>..<2,1>
slice_task: input.domain =<3,0>..<4,2>+<2,2>..<2,2>

Installing Legate.Core __habs undefined (Legion Runtime)

Problem

I am compiling the Legate.core and I am getting the following error inside of the legion/runtime directory.

/home/cosmicbox/Documents/legate.core/legion/runtime/mathtypes/half.h(364): error: identifier "__habs" is undefined

1 error detected in the compilation of "/tmp/tmpxft_0000417d_00000000-6_legion_redop.cpp1.ii".
make: *** [/home/cosmicbox/Documents/legate.core/legion/runtime/runtime.mk:1333: /home/cosmicbox/Documents/legate.core/legion/runtime/legion/legion_redop.cu.o] Error 1

Steps to reproduce

./install.py --cuda --with-cuda /usr/local/cuda-10.1 --python-lib /usr/local/lib/libpython3.7m.a

Versions

OS = CENTOS8
CUDA = 10.1 
g++ = 8.4.1 
PYTHON = 3.7.11
PyArrow = 1.0.1

I was able to clone and run tests from the legion runtime.
I am unsure where to go next to tackle this undefined function.
Any help is appreciated. Just let me know if I should post this with the legion runtime folks.

Scalar reduction stores can be uninitialized

The semantics of reduction stores admits partial updates, including the "uninitialized" case where the task didn't touch the store at all. However, as reported by this comment, the current implementation requires an accessor be created to make that case work. This shouldn't be a requirement and the postamble should be able to handle this automatically for reduction stores. Note that the same isn't true for write stores, as they must receive updates from the task and reporting the uninitialized case early is actually useful.

legate.py steals command-line arguments from target applications

Problem

Lunching jacobi.py using legate with the flag --num does not change the size of the matrix A in the Jacobi example.

Steps to reproduce

Install both legate.core and legate.numpy
Go to the example folder of legate.numpy
Run the jacobi.py code with legate with legate --cpus 1 --omps 1 --ompthreads 1 ./jacobi.py --num 1234

Expected and actual results

The log message should say "Generating 1234x1234 system...", However, the actual message says "Generating 100x100 system..."

Diagnosis

When running the example code of the Jacobi solver, the jacobi.py allows a command-line argument called --num. However, if launching jacobi.py using legate, legate steals the value of --num, saves it to numamem, and never actually passes it to jacobi.py.

This is because, by default, an ArgumentParser allows implicit abbreviations for long arguments. When executing legate <some flags> ./jacobi.py --num 1234, the argument parser of legate.py (line 549-807 here) sees --num as a abbreviation of the flag --numamem.

Solution

A quick solution is to disable the implicit abbreviation by changing line 549 in legate.py from

parser = argparse.ArgumentParser(description="Legate Driver.")

parser = argparse.ArgumentParser(description="Legate Driver.", allow_abbrev=False)

But I believe eventually there should be a better solution to separate the arguments of legate.py from a target application. Otherwise, if a target application has some overlapped arguments with legate.py (i.e., the same argument names), this conflict can not be resolved by simply disabling allow_abbrev.

Adopt NEP 29 for Python version support

I propose legate adopts NEP 29, which suggests supporting Python 3.7+ until Dec 26, 2021 and then 3.8+. The RAPIDS project general follows this support model.

Any objections?

Building leagte.core with Anaconda

I'm following the directions to the legate.core. The command line I'm using is,

$ sudo ./install.py --cuda --with-cuda $CUDA_PATH --arch volta --install-dir /usr/local/legate

I get the error message:

/usr/bin/env: ‘python’: No such file or directory

The system is Ubuntu 20.04 with Anaconda 4.10.1. I downloaded legate.core early in the morning of 4/27/2021 (before 7:00 am EDT).

BTW - I'm using $CUDA_PATH to point to CUDA. It resolves to "/home/laytonjb/anaconda3", my home directory where I have Anaconda installed.

Any help with what I'm doing wrong is greatly appreciated.

Jeff

How to correctly use OpenMP backend?

My test machine has only one CPU socket. This CPU has 6 physical cores and a total of 12 logical cores. Using the script legate (i.e., legate.py), how to correctly specify flags for OpenMP, especially the flags --cpus, --omps, and --ompthreads (and probability also --utility)? Or is there documentation about how to use these flags? I read Legion's documentation but still unclear how to use OpenMP.

No matter how I change the flags, the example code always runs (and I assume the results are correct). However, sometimes I get warnings like ... {4}{threads}: reservation (blahblahblah) cannot be satisfied ... and have no clue how to fix it. For example, when I ran the jacobi.py example with

$ legate --cpus 1 --omps 1 --ompthreads 4 --utility 0 jacobi.py

I got [0 - 7fb341387f00] 0.000120 {4}{threads}: reservation ('dedicated worker (generic) #2') cannot be satisfied. This is confusing, as the system should not be oversubscribed by just 4 OMP threads. I checked the resource reservation by running the application directly with legion_python jacobi.py -ll:py 1 -ll:cpu 1 -ll:ocpu 1 -ll:othr 4 -ll:util 0 -ll:show_rsvr and got

core map {
  domain 0 {
    core 0 { ids=<0> alu=<6> fpu=<6> ldst=<6> }
    core 1 { ids=<1> alu=<7> fpu=<7> ldst=<7> }
    core 2 { ids=<2> alu=<8> fpu=<8> ldst=<8> }
    core 3 { ids=<3> alu=<9> fpu=<9> ldst=<9> }
    core 4 { ids=<4> alu=<10> fpu=<10> ldst=<10> }
    core 5 { ids=<5> alu=<11> fpu=<11> ldst=<11> }
    core 6 { ids=<6> alu=<0> fpu=<0> ldst=<0> }
    core 7 { ids=<7> alu=<1> fpu=<1> ldst=<1> }
    core 8 { ids=<8> alu=<2> fpu=<2> ldst=<2> }
    core 9 { ids=<9> alu=<3> fpu=<3> ldst=<3> }
    core 10 { ids=<10> alu=<4> fpu=<4> ldst=<4> }
    core 11 { ids=<11> alu=<5> fpu=<5> ldst=<5> }
  }
}
OMP0 proc 1d00000000000001 (master): allocated <>
OMP0 proc 1d00000000000001 (worker 1): allocated <>
CPU proc 1d00000000000000: allocated <>
dedicated worker (generic) #1: allocated <>
dedicated worker (generic) #2: allocated <>
OMP0 proc 1d00000000000001 (worker 3): allocated <>
Python-1 proc 1d00000000000002: allocated <>
OMP0 proc 1d00000000000001 (worker 2): allocated <>

And when I did legion_python jacobi.py -ll:py 1 -ll:cpu 1 -ll:ocpu 1 -ll:othr 3 -ll:util 0 -ll:show_rsrv, I got no warning and this

blah blah blah

OMP0 proc 1d00000000000001 (master): allocated <0>
OMP0 proc 1d00000000000001 (worker 1): allocated <1>
CPU proc 1d00000000000000: allocated <3>
dedicated worker (generic) #1: allocated <5,11>
dedicated worker (generic) #2: allocated <5,11>
Python-1 proc 1d00000000000002: allocated <4>
OMP0 proc 1d00000000000001 (worker 2): allocated <2>

It seems except for the 3 or 4 threads requested by OMP, there are also many other processes using resources (CPU proc, dedicated worker 1 & 2, Python-1, and utility proc if enabled). And each process has to be bound to a physical core? Is this the reason why 4 OMP threads cause the resources oversubscribed? If so, how to maximize the performance and resource usage when using OMP? (e.g., dedicating all physical cores to OMP?)

Thanks in advance, and I apologize for so many question :P

Slicing an Nd view of a 1d array on dimensions other than the first

Transformations required for cases like the following:

import legate.numpy as lg
x = lg.arange(25).reshape((5,5))
x[0:2,0:2] = x[2:4,2:4]
x[:,1] = x[:,2]
x[1,3:5] = x[2,3:5]
x[3:5,1] = x[3:5,2]

are not currently supported:

ValueError: Unsupported partition: Tiling(tile:Shape((2, 2)), color:Shape((1, 1)), offset:Shape((2, 2)))
ValueError: Unsupported partition: Tiling(tile:Shape((5, 1)), color:Shape((1, 1)), offset:Shape((0, 1)))
ValueError: Unsupported partition: Tiling(tile:Shape((1, 2)), color:Shape((1, 1)), offset:Shape((1, 3)))
ValueError: Unsupported partition: Tiling(tile:Shape((2, 1)), color:Shape((1, 1)), offset:Shape((3, 1)))

The reason is that, if we wanted to make a tight instance for such subregions we would end up with sparse instances. We could create over-approximate partitions (using a bounding box). This would still allow us to make affine accessors, and the mapper is going to create over-approximate dense instances anyway.

Remove leftover python2 code

There are some statements in the codebase meant to handle Python 2, e.g.:

legate.core/legate.py

Lines 31 to 34 in 0bcd6f1

 try: 

 _input = raw_input # Python 2.x: 

 except NameError: 

 _input = input # Python 3.x:

Since our minimum Python version is 3.7, we can remove this logic (from the core and other libraries).

Can't install legate.core with CUDA

I tried to install legate.core with CUDA in workstations with gaming GPUs (RTX 3090 and GTX1080) and did not succeed. I am getting errors like:

Already on 'control_replication'
/home/beka/opt/legate.core/legion/runtime/mathtypes/complex.h(126): error: more than one conversion function from "__half" to a built-in type applies:
function "__half::operator float() const"
/usr/local/cuda//include/cuda_fp16.hpp(204): here
function "__half::operator short() const"

I suspect these are due to the wrong GPU architecture is given to the compiler.
When I indicate --arch pascal, arch=sm_60 appears in the compiler options.

For my GPUs I need arch=sm_61 or arch=sm_86
How can I get it right?

Record important compile-time settings

We should have a system similar to Legion's legion_defines.h for recording the values of certain compile-time flags that are important to know when compiling or running code against legate.core.

Legion uses this system, for example, to record the maximum number of dimensions that the runtime supports, which is important for the regent compiler to know when generating code that will run against that runtime.

For legate.core this is important to do for the value of TYPE_SAFE_LEGATE, which controls whether the argument deserializer will expect to find a type tag before each argument. This setting needs to be communicated to the legate application at runtime, so that BufferBuilder python objects know to add the extra type tag.

Start Python Before Legion

Today, the Legate driver script will always run legion_python to execute Legate programs using Legion. Alternatively, we could start the Python interpreter first in each process and then start Legion when the legate.core module is loaded. This would allow Legate programs to be run like normal Python programs on a single node, or in the case of multi-node execution, to be run like mpirun -n N python script.py. That might be preferable for some users, but also will require some work to make it happen and comes with some caveats.

Machine configuration parameters: today users pass machine configuration parameters on the command line. The Legate driver script converts those into Realm machine command line options. When using legion_python, we ensure that those command line parameters are stripped out before the script starts and becomes visible through sys.argv in Python. That prevents users from having to know how to ignore our arguments. In the case of starting Python first, machine configuration flags would still be visible on sys.argv at least until we start Legion.
Some Python operations may need to turn around and launch sub-tasks which also are going to need to run on Python processors, such as numpy.vectorize which will need to execute a user-defined function. To handle such cases, we need to have support in Realm for drafting external resources as custom processors. Specifically in this case, we need to be able to tell Realm that it can treat the main thread with the Python interpreter as an instance of a Python processor on which Python sub-tasks can be executed. There are some tricky open questions here, such as how to know when it's safe to execute sub-tasks on the drafted processor, e.g. by noticing that the implicit task has paused. Legion can provide some of that context, but it's not a general Realm solution. The concept of an implicit-task currently only exists in Legion. More details: StanfordLegion/legion#716
Allowing Python to start before Legion will pollute the global address space with all of the Python interpreter's icky global variables. Once that occurs, it's impossible for the work on subprocess in Legion and Realm to provide support for multiple Python interpreters per node as the pollution of the global address space from the Python interpreter will just flow down into any subprocesses. More details: StanfordLegion/legion#627

Add Python 3.9 testing to CI

To support NEP 29 as mentioned in #21

Bump numpy version

ref: #189 (comment)

Suggested above to bump numpy version to a later version that supports better type annotations.

This issue is to coordinate the version bump, any mypy-related updates that can also occur as a result, and any documentation updates.

Show full execution context in backtraces from failed task executions

Currently if an error occurs inside a task launched from legate we only have access to the state of execution within the runtime, and thus can only print (and inspect inside a debugger) the stack up to the point where the task starts. For debugging it would be useful to also know the state of the program when the task was launched.

The execution of a task is normally disconnected from its launch, so to do this properly we would need support in the runtime for precise exceptions. Once such functionality is available, Legate would need to package stack traces from failing tasks in their future results and return them up the stack to the python interpreter, at the point where the faulty task was launched.

Then the launching code would have a full picture of the execution state, and could print a more useful stacktrace (we would likely want to print the frames within Legion as C-level stacktraces, and the python frames at the python level, using a module like traceback).

error in install.py

greetings, I'm trying to install legate.core on jetson nano (CUDA arch-5.3 and version-10.2.3). I have also pre-installed numpy_1.19, cffi, pyarrow and it's got g++.
I tried to install from source -and with gpu and cuda support- using file : install.py
but I'm getting this error :
sudo python3 install.py --cuda --with-cuda /usr/local/cuda-10 --arch maxwell --install-dir /usr/local/legate/

Using python lib and version: /usr/lib/python3.6/config-3.6m-aarch64-linux-gnu/libpython3.6m.so, 3.6.9
error: pathspec 'control_replication' did not match any file(s) known to git.

Traceback (most recent call last):
File "install.py", line 946, in
driver()
File "install.py", line 942, in driver
install(unknown=unknown, **vars(args))
File "install.py", line 651, in install
update_legion(legion_src_dir, branch=legion_branch)
File "install.py", line 216, in update_legion
git_update(legion_src_dir, branch=branch)
File "install.py", line 150, in git_update
verbose_check_call(["git", "checkout", branch], cwd=repo_dir)
File "install.py", line 85, in verbose_check_call
subprocess.check_call(*args, **kwargs)
File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['git', 'checkout', 'control_replication']' returned non-zero exit status 1

Add RAFT to CUDA libraries managed by legate.core

Since 0.16 a lot of helper code from various RAPIDS projects has been collected into a single header-only library, RAFT.

The "handles" from multiple libraries (e.g. cuML, cuGraph) have been unified into a single RAFT handle; if we want to use functions from these libraries we need to feed them an initialized RAFT handle.

There should be one RAFT handle per device, and each of those holds handles to various CUDA math libraries (cuBLAS, cuSolver, cuSparse), that are initialized on-demand if the code asks for the corresponding handle.

Since RAFT is header-only I think it makes sense to include it in the legate.core build and manage its handle in cudalibs.h. We should also piggyback off its internal handles for our cuBLAS needs instead of maintaining a second copy of the library.

Support for building with clang

We can only compile with gcc compilers and nvcc right now. We can use clang to build CPU-only versions of Legate, but there are things that clang accepts which nvcc does not and vice-versa.

Protocol/ABC for launcher types

refs: #189 (comment), #189 (comment), #189 (comment)

Clean up this union type:

 Sequence[Union[ScalarArg, RegionFieldArg, FutureStoreArg]]

Update python installation method

Legion and legate are using easy_install to install the python packages to an arbitrary location. This mode is deprecated and will soon be removed. We should switch to following standard conventions (e.g. pip-based), which install directly into the currently active virtual environment (e.g. the one managed by conda).

Here are some relevant messages we get during installation:

DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
DeprecationWarning: The distutils.sysconfig module is deprecated, use sysconfig instead
SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
SetuptoolsDeprecationWarning: Custom 'build_py' does not implement 'get_data_files_without_manifest'.
Please extend command classes from setuptools instead of distutils.

One thing I would like to know is how easy it would be to "uninstall" our packages in this mode, in case we want to do a clean build.

Copying from my discussion with @bryevdv on alternatives:

The issue probably comes down to whether "build cmd + package/install cmd" is acceptable or whether you want some one-shot to do everything. One option might be to have cmake build the binary artifacts and install them in the source tree (or somewhere), and then a pip install . just does a fairly normal "python package install". Maybe the cmake process even does the pip install . for you as the last step. If you want things "driven" from python then you still have a setup.py that invokes the C++ build. Since we can't call setup.py as a script anymore (that is deprecated) we'd have to rely on only env vars to control any conditional logic inside it when you call pip install . or build. There are evidently other tools that support custom steps, like the hatch tool mentioned in use by Jupyter.

CC @trxcllnt for input

Undefined symbol in liblgcore.so

Hi, everyone!
I'm trying to use cunumeric through legate core but I'm running into an "undefined symbol" error. I leave the trace below.

OSError: cannot load library '/home/francesco/anaconda3/envs/legate-v1/lib/libcunumeric.so': /home/francesco/anaconda3/envs/legate-v1/lib/./liblgcore.so: undefined symbol: _ZN6Legion7Runtime29perform_registration_callbackEPFvN5Realm7MachineEPS0_RKSt3setINS1_9ProcessorESt4lessIS5_ESaIS5_EEEb

I installed legate in anaconda with Python 3.8 and Cuda support. My system runs on Ubuntu 20.04.
Did anyone have the same error? Or can you point me in the direction where to solve my problem?

Many thanks,
Francesco

The default value of `--logdir` is not really the current directory

This is a minor issue. The help message of legate says the default value of --logdir is the current directory. However, the actual default value of --logdir is where the script legate is installed. See

legate.core/legate.py

Line 661 in 9e327b7

default=os.path.dirname(os.path.realpath(__file__)),

This is a bit confusing. At least for me, I thought the current directory means the directory where I launch the legate, rather than where it is installed.

If the intention is indeed to use the current directory, I believe os.getcwd() should work fine.

Add NCCL to CUDA libraries managed by legate.core

The implementation would probably follow this recipe for initializing one NCCL rank per GPU in Legion: https://gitlab.com/StanfordLegion/legion/-/blob/master/examples/nccl/nccl_legion_demo.cu.

Libraries like legate.pandas that are currently initializing NCCL internally would want to switch to using this.

If we incorporate RAFT in legate.core, we can pass the NCCL communicator that we initialize to be used for their communication methods: https://github.com/rapidsai/raft/blob/branch-0.18/cpp/include/raft/comms/comms.hpp.

Conda build fails with c++17 related fix

When using conda-build:

/opt/conda/conda-bld/legate-core_1651611301955/_build_env/x86_64-conda-linux-gnu/include/c++/11.2.0/type_traits:71:52: error: redefinition of 'constexpr const _Tp std::integral_constant<_Tp, __v>::value'
   71 |   template<typename _Tp, _Tp __v>
      |                                                    ^                           
/opt/conda/conda-bld/legate-core_1651611301955/_build_env/x86_64-conda-linux-gnu/include/c++/11.2.0/type_traits:59:29: note: 'constexpr const _Tp value' previously declared here
   59 |       static constexpr _Tp                  value = __v;
      |

This is fixed with forcing c++17.

According to this thread this is probably related to the combination of CUDA and GCC. It's not clear whether this is a valid error or a bug. It seems that there should not be a problem since this is a standard header.

Cross-operation partitioning constraints

Operations like prefix sum or sorting typically involve two sets of tasks in distributed setting, and currently there's no way of constraining their partitions to be consistent. We need to extend the constraint language to be able to express such constraints.

Interpreter check

As part of our preparation for the release, we need to add a check that detects if we're not running on a legion python interpreter and if so, prints out a nice error message that also tells the user how to run Legate programs correctly.

The use of legate.cunumeric and legate.pandas in jupyter notebook

Hi guys, I wonder if it is possible to use legate.cunumeric and legate.pandas in jupyter notebook? If there are any docs on this, could you point me at those?

Run Legion Spy and Prof only for successful runs

The launcher seems to be running post-processing scripts for Legion Spy and Prof even when the program terminated with an error. I think we should check the exit code and skip that part when it was not 0.

Investigate turning off CI in forked repos

In forked repos, the appropriate runners are not available, and the CI always fails. This may be confusing to forking users.

SHARD_VOLUME heuristic should consider broadcasted dimensions

This heuristic decides how big the pieces should be before legate will bother partitioning. However, in the presence of partial broadcasts like the following example:

task.add_output(arr)
task.add_broadcast(arr, axes=tuple(range(1, arr.ndim)))

the heuristic will not consider the broadcasted dimensions. In this example we end up only considering the first dimension, so an array like arr.shape == (1000,1000,1000), which should definitely be split, fails the heuristic, because it doesn't have enough elements on the first dimension alone.

CC @fduguet-nv

Guide and information on using legate

Legate got suggested as a library I should look at during a couple of sessions I attended at GTC2021. I just wondered if there was somewhere or someone I could get some information on how to use the library? My use case is with volume data, which is array data. From the slides from GTC 2020, it appears that the data is split a bit like a quad tree implementation. It would be greatto get some information how to use.

Check last CUDART error in task postamble

As discussed on nv-legate/cunumeric#281.

I suggest checking the value of cudaPeekAtLastError/cudaGetLastError in the task postamble, if running on a GPU processor, on both debug and release modes.

I assume that, no matter what method we use to encapsulate a task's CUDA effects within the envelope of realm-managed effects (e.g. using the realm CUDART hijack, or doing an explicit context sync), a driver-level error check will always happen before task completion. However, errors like "too many threads per block" (i.e. when threadsPerBlock specified on the kernel launch site is greater than maxThreadsPerBlock specified on __launch_bounds__) signal immediately at kernel launch, and don't even enter the driver, thus are not caught by the existing exit checks. The extra exit check I am suggesting would be a lightweight way (causes no extra blocking) to catch such runtime-only errors, so they don't cause silent failures.

Conda install fails on osx

conda install -c nvidia -c conda-forge -c legate legate-core

PackagesNotFoundError: The following packages are not available from current channels:

legate-core

Current channels:

Institute CI using Github actions

Port CI to github actions.

Conda package is compiled with `-march=native`

I'm not sure if my understanding is correct, but it seems all code by default is compiled with -march=native. See:

legate.core/install.py

Lines 902 to 908 in cd4ca09

 parser.add_argument( 

 "--march", 

 dest="march", 

 required=False, 

 default="native", 

 help="Specify the target CPU architecture.", 

 )

If so, this is fine for building packages for local use. However, in the newly merged conda package recipe, it seems the code is still compiled with -march=native. See

legate.core/conda/conda-build/build.sh

Line 3 in cd4ca09

 $PYTHON install.py --cuda --openmp --with-cuda $PREFIX --with-nccl $PREFIX --arch 70,75,80 --install-dir $PREFIX -v 

The recipe's still using the default value for march.

If my understanding is correct, this breaks the portability of the conda package.

Runtime warning when reusing arrays in pytest modules

When "reusing" a cunumeric array in a pytest module, all test can pass, but a runtime warning is issued at the end:

Exception ignored in: <function RegionField.__del__ at 0x7f30c0594320>
Traceback (most recent call last):
  File "/home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/store.py", line 124, in __del__
  File "/home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/store.py", line 251, in detach_external_allocation
  File "/home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/runtime.py", line 447, in detach_external_allocation
  File "/home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/runtime.py", line 436, in _remove_allocation
  File "/home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/runtime.py", line 430, in _remove_attachment
TypeError: argument of type 'NoneType' is not iterable
Exception ignored in: <function RegionField.__del__ at 0x7f30c0594320>
Traceback (most recent call last):
  File "/home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/store.py", line 124, in __del__
  File "/home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/store.py", line 251, in detach_external_allocation
  File "/home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/runtime.py", line 447, in detach_external_allocation
  File "/home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/runtime.py", line 436, in _remove_allocation
  File "/home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/runtime.py", line 430, in _remove_attachment
TypeError: argument of type 'NoneType' is not iterable

The warning only appears when -cunumeric:test is specified.

To reproduce, run the code below with legate mre.py -cunumeric:test

# mre.py

import sys
import pytest
import cunumeric as cnp

x = cnp.array([1, 2, 3])

def test_lt():
    y = x < 2

def test_le():
    y = x <= 2

pytest.main(sys.argv)

Common testing infrastructure for Legate libraries

Currently, each Legate library has its own test.py script for testing, but this makes it hard to propagate improvements in one script to the others. For example, Legate Pandas' script can run GPU tests in parallel without exceeding the resource limit (which reduces testing time a lot). We should build a common testing driver in the core so that the individual test drivers can inherit improvements like this more easily.

Exposing the Legate C++ API through Cython

I'm not sure whether this is related to issue #11 but it would be good to provide an easier interface to handle existing C-level data in python, in particular for regions, but possibly other objects, too.

In particular, if we have a handle to a region in C, it's difficult to create a corresponding legate.core.legion.Region object:

I believe there is no way right now to get the fields of a field space (this is simply lacking from Legion's C interface AFAICT)
The legate.core.legion.FieldSpace class would have to support being created with a handle, like the Region and IndexSpace classes atm

Without this, I'm not sure if/how you can use the __legate_data_interface__, unless you create the underlying C objects on the python side.

Per-dimension alignments

Currently, Legate Core only allows tasks to specify alignments between whole Legate stores, which assumes those stores have the same dimensions. This requires dimensions to be added purely for the purpose of alignments, which often make the stores exceed the supported maximum dimension, limiting the set of admissible stores even to a smaller set. We want to be able to specify alignment constraints on a per-dimension basis. This requires both the interface extension and changes in the constraint solver to make it resolve partial alignments.

Update README.md

With a new NCCL dependency, now is a time to revise our README.md to make it up to date with the dependencies, minimum supported hardware level, etc..

Building on Windows

We currently only provide conda packages for Linux. For other platforms users will need to build from source. We have not tried to do this on Windows platforms.

As @HarryES95 notes on #106, some of our scripts do not expect to be called on a Windows environment. However, the larger problem is that Legion (the distributed runtime that Legate is built on) does not support Windows at the moment (see StanfordLegion/legion#1017).

Opening this issue to track work on this.

Test suite additional configurations

Collecting some ideas on additional ways to run our existing tests:

	try:
	_input = raw_input # Python 2.x:
	except NameError:
	_input = input # Python 3.x:

	parser.add_argument(
	"--march",
	dest="march",
	required=False,
	default="native",
	help="Specify the target CPU architecture.",
	)

nv-legate / legate.core Goto Github PK

legate.core's People

Contributors

Stargazers

Watchers

Forkers

legate.core's Issues

Version info

Steps to reproduce

Expected and actual output

Python3

Python2

Problem

Steps to reproduce

Versions

Problem

Steps to reproduce

Expected and actual results

Diagnosis

Solution

Recommend Projects

Recommend Topics

Recommend Org

Jobs