nv-legate / legate.core
The Foundation for All Legate Libraries
Home Page: https://docs.nvidia.com/legate/24.06/
License: Apache License 2.0
PR #44 stated the following:
All merges to main trigger an automated CI job that will produce a new release and tag, incrementing the patch version off of the previous highest tag in the repo. Once the tag is set, the automated CI build for conda packages creates and pushes new packages for users. This includes the version change, enabling known-good builds and the ability to roll back.
However, I don't believe we have this level of automation in our CI at this time. We don't have packages, yet. I'm removing this from the PR and opening this issue for tracking.
cc @marcinz
ref: #189 (comment)
Several properties on ManualTask override base class property types in an unhappy way, requiring a # type: ignore[assignment] for now.
We need an interface for ingesting sharded data from individual processes and then getting it back out again when users are done with it.
Hi,
I know that legate.core is designed for numpy/Python applications. Would you mind providing some guidance on how to use the library from a C++ program? I tried to run the hello program as follows, but in vain. Can you give some examples of calling the function from a C++ frontend?
int main(int argc, char **argv)
{
Runtime::initialize(&argc, &argv);
legate_hello_perform_registration();
return Runtime::start(argc, argv);
}
On Python 3.10 I hit assert tlock.locked() when the script ran unsuccessfully due to an uncaught exception. I believe this is caused by the workaround I added to Realm to address the shutdown hang.
legate.core: commit 18cb8fd
legate.numpy: 026061b
legate --cpus 1 --summarize ./jacobi.py
The expected result is the normal output from jacobi.py plus whatever the --summarize flag should output.
The actual output is an error message: jacobi.py: error: unrecognized arguments: -lg:summarize
When the solver chooses to use a 1D launch domain and delinearizing functors for store arguments, it assumes that the stores don't have any store transformations, which is what this assertion basically entails. In the future, we do need to allow such stores, and to support them, we need to compose the delinearizing functor with those derived from the store transformations.
I do not use Python 2, but I am using Ubuntu 20, where Python 3 is python3. I did alias python=python3 but that didn't help. I tried with Python 2 (as python) but that failed too (see second section). How do I install with Python 3? Thanks.
Installation complete
Traceback (most recent call last):
File "./setup.py", line 73, in <module>
exec(code)
File "install.py", line 981, in <module>
driver()
File "install.py", line 977, in driver
install(unknown=unknown, **vars(args))
File "install.py", line 684, in install
build_legion(
File "install.py", line 363, in build_legion
verbose_check_call(
File "install.py", line 68, in verbose_check_call
subprocess.check_call(*args, **kwargs)
File "/usr/lib/python3.8/subprocess.py", line 359, in check_call
retcode = call(*popenargs, **kwargs)
File "/usr/lib/python3.8/subprocess.py", line 340, in call
with Popen(*popenargs, **kwargs) as p:
File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'python'
Installation complete
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/legate
copying legate/__init__.py -> build/lib.linux-x86_64-2.7/legate
creating build/lib.linux-x86_64-2.7/legate/core
copying legate/core/legion.py -> build/lib.linux-x86_64-2.7/legate/core
copying legate/core/legate.py -> build/lib.linux-x86_64-2.7/legate/core
copying legate/core/install_info.py -> build/lib.linux-x86_64-2.7/legate/core
copying legate/core/__init__.py -> build/lib.linux-x86_64-2.7/legate/core
creating build/lib.linux-x86_64-2.7/legate/timing
copying legate/timing/timing.py -> build/lib.linux-x86_64-2.7/legate/timing
copying legate/timing/__init__.py -> build/lib.linux-x86_64-2.7/legate/timing
running install_lib
copying build/lib.linux-x86_64-2.7/legate/core/install_info.py -> /home/jhammond/LEGATE/lib/python2.7/site-packages/legate/core
byte-compiling /home/jhammond/LEGATE/lib/python2.7/site-packages/legate/core/install_info.py to install_info.pyc
byte-compiling /home/jhammond/LEGATE/lib/python2.7/site-packages/legate/timing/timing.py to timing.pyc
File "/home/jhammond/LEGATE/lib/python2.7/site-packages/legate/timing/timing.py", line 84
raise ValueError(f"Invalid store count: {len(stores)}")
^
SyntaxError: invalid syntax
running install_egg_info
Removing /home/jhammond/LEGATE/lib/python2.7/site-packages/legate.core-0.1-py2.7.egg-info
Writing /home/jhammond/LEGATE/lib/python2.7/site-packages/legate.core-0.1-py2.7.egg-info
This testcase https://github.com/manopapad/cunumeric/blob/ingest/tests/ingest.py#L97 produces a situation where the LinearizingShardingFunctor produces non-dense slice domains, and BaseMapper::slice_task is unhappy about that.
$ LEGATE_TEST=1 $DEV/quickstart/run.sh 2 tests/ingest.py -cunumeric:test
...
Command: /gpfs/fs1/mpapadakis/legate.core/install/bin/legate --launcher mpirun --numamem 200000 --omps 2 --ompthreads 18 --cpus 1 --sysmem 256 --gpus 8 --fbmem 14500 --verbose --logdir /gpfs/fs1/mpapadakis/2021/10/29/132439 --nodes 2 --ranks-per-node 1 tests/ingest.py -cunumeric:test -logfile /gpfs/fs1/mpapadakis/2021/10/29/132439/%.log
Running: mpirun -n 2 --npernode 1 --bind-to none --mca mpi_warn_on_fork 0 -x LD_LIBRARY_PATH -x UCX_TLS -x LEGATE_DIR -x UCX_MEMTYPE_CACHE -x LEGATE_TEST -x PYTHONDONTWRITEBYTECODE -x PYTHONPATH -x NCCL_LAUNCH_MODE -x LEGATE_NEED_CUDA -x LEGATE_NEED_OPENMP -x LEGATE_NEED_GASNET -x LEGATE_MAX_DIM -x LEGATE_MAX_FIELDS -x GASNET_PHYSMEM_MAX -x REALM_BACKTRACE /gpfs/fs1/mpapadakis/legate.core/install/bin/legion_python -ll:py 1 -lg:local 0 -ll:gpu 8 -cuda:skipbusy -ll:ocpu 2 -ll:othr 18 -ll:onuma 1 -ll:util 2 -ll:bgwork 2 -ll:csize 256 -ll:nsize 200000 -ll:fsize 14500 -ll:zsize 32 -level openmp=5,gpu=5 -lg:eager_alloc_percentage 50 tests/ingest.py -cunumeric:test -logfile /gpfs/fs1/mpapadakis/2021/10/29/132439/%.log
legion_python: core/mapping/base_mapper.cc:235: virtual void legate::mapping::BaseMapper::slice_task(Legion::Mapping::MapperContext, const LegionTask&, const Legion::Mapping::Mapper::SliceTaskInput&, Legion::Mapping::Mapper::SliceTaskOutput&): Assertion `input.domain.dense()' failed.
In this example I am using colorspace = (5,3), 2 shards, and setting up a Tiling manually.
You need the following branches to run this test: https://github.com/manopapad/legate.core/tree/ingest & https://github.com/manopapad/cunumeric/tree/ingest.
Here is some debugging output:
register_legate_core_sharding_functors: register proj_id 0 to shard_id 1073741826
register_legate_core_sharding_functors: register proj_id 1073741826 to shard_id 1073741826
register_legate_core_sharding_functors: register proj_id 0 to shard_id 1073741826
register_legate_core_sharding_functors: register proj_id 1073741826 to shard_id 1073741826
picked sharding functor 1073741826 based on region req 0
picked sharding functor 1073741826 based on region req 0
LinearizingFunctor::shard: p = (0,0) launch_space = <0,0>..<4,2> -> 0
LinearizingFunctor::shard: p = (0,1) launch_space = <0,0>..<4,2> -> 0
LinearizingFunctor::shard: p = (0,2) launch_space = <0,0>..<4,2> -> 0
LinearizingFunctor::shard: p = (1,0) launch_space = <0,0>..<4,2> -> 0
LinearizingFunctor::shard: p = (1,1) launch_space = <0,0>..<4,2> -> 0
LinearizingFunctor::shard: p = (1,2) launch_space = <0,0>..<4,2> -> 0
LinearizingFunctor::shard: p = (2,0) launch_space = <0,0>..<4,2> -> 0
LinearizingFunctor::shard: p = (2,1) launch_space = <0,0>..<4,2> -> 0
LinearizingFunctor::shard: p = (2,2) launch_space = <0,0>..<4,2> -> 1
LinearizingFunctor::shard: p = (3,0) launch_space = <0,0>..<4,2> -> 1
LinearizingFunctor::shard: p = (3,1) launch_space = <0,0>..<4,2> -> 1
LinearizingFunctor::shard: p = (3,2) launch_space = <0,0>..<4,2> -> 1
LinearizingFunctor::shard: p = (4,0) launch_space = <0,0>..<4,2> -> 1
LinearizingFunctor::shard: p = (4,1) launch_space = <0,0>..<4,2> -> 1
LinearizingFunctor::shard: p = (4,2) launch_space = <0,0>..<4,2> -> 1
slice_task: input.domain =<0,0>..<1,2>+<2,0>..<2,1>
slice_task: input.domain =<3,0>..<4,2>+<2,2>..<2,2>
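The shard assignments in the debug output are consistent with a simple linearize-then-divide scheme. Below is a minimal Python sketch (a hypothetical reconstruction, not the actual C++ functor) that reproduces the observed mapping for the <0,0>..<4,2> launch space with 2 shards, and shows why the resulting per-shard point sets are non-rectangular:

```python
def linearizing_shard(point, extents, num_shards):
    """Linearize a 2D point in row-major order, then split the linear
    range evenly across shards (hypothetical reconstruction)."""
    linear = point[0] * extents[1] + point[1]
    total = extents[0] * extents[1]
    return linear * num_shards // total

# Shard 0 receives rows 0-1 plus part of row 2: a non-rectangular set,
# which is why slice_task sees a non-dense input.domain.
for i in range(5):
    for j in range(3):
        print((i, j), "->", linearizing_shard((i, j), (5, 3), 2))
```

Note how the boundary between shards falls mid-row at (2,1)/(2,2), matching the <0,0>..<1,2>+<2,0>..<2,1> domain in the log.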
I am compiling legate.core and getting the following error inside the legion/runtime directory.
/home/cosmicbox/Documents/legate.core/legion/runtime/mathtypes/half.h(364): error: identifier "__habs" is undefined
1 error detected in the compilation of "/tmp/tmpxft_0000417d_00000000-6_legion_redop.cpp1.ii".
make: *** [/home/cosmicbox/Documents/legate.core/legion/runtime/runtime.mk:1333: /home/cosmicbox/Documents/legate.core/legion/runtime/legion/legion_redop.cu.o] Error 1
./install.py --cuda --with-cuda /usr/local/cuda-10.1 --python-lib /usr/local/lib/libpython3.7m.a
OS = CENTOS8
CUDA = 10.1
g++ = 8.4.1
PYTHON = 3.7.11
PyArrow = 1.0.1
I was able to clone and run tests from the legion runtime.
I am unsure where to go next to tackle this undefined function.
Any help is appreciated. Just let me know if I should post this with the legion runtime folks.
The semantics of reduction stores admit partial updates, including the "uninitialized" case where the task didn't touch the store at all. However, as reported in this comment, the current implementation requires an accessor to be created to make that case work. This shouldn't be a requirement; the postamble should be able to handle this automatically for reduction stores. Note that the same isn't true for write stores, as they must receive updates from the task, and reporting the uninitialized case early is actually useful.
Launching jacobi.py using legate with the flag --num does not change the size of the matrix A in the Jacobi example.
Affected libraries: legate.core and legate.numpy.
To reproduce, run the jacobi.py code with legate --cpus 1 --omps 1 --ompthreads 1 ./jacobi.py --num 1234.
The log message should say "Generating 1234x1234 system...". However, the actual message says "Generating 100x100 system..."
When running the example code of the Jacobi solver, jacobi.py allows a command-line argument called --num. However, if launching jacobi.py using legate, legate steals the value of --num, saves it to numamem, and never actually passes it to jacobi.py.
This is because, by default, an ArgumentParser allows implicit abbreviations for long arguments. When executing legate <some flags> ./jacobi.py --num 1234, the argument parser of legate.py (lines 549-807 here) sees --num as an abbreviation of the flag --numamem.
A quick solution is to disable the implicit abbreviation by changing line 549 in legate.py
from
parser = argparse.ArgumentParser(description="Legate Driver.")
to
parser = argparse.ArgumentParser(description="Legate Driver.", allow_abbrev=False)
But I believe eventually there should be a better solution to separate the arguments of legate.py from those of a target application. Otherwise, if a target application has arguments that overlap with legate.py's (i.e., the same argument names), the conflict cannot be resolved by simply disabling allow_abbrev.
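The abbreviation behavior is easy to demonstrate with a standalone argparse sketch (the parser below only mimics the driver's --numamem flag; it is not the actual legate.py code):

```python
import argparse

# Default parser: implicit abbreviation is enabled, so --num is
# silently treated as an abbreviation of --numamem.
abbrev = argparse.ArgumentParser()
abbrev.add_argument("--numamem", type=int, default=0)
args, rest = abbrev.parse_known_args(["./jacobi.py", "--num", "1234"])
print(args.numamem, rest)  # --num was swallowed by the driver parser

# With allow_abbrev=False, --num is no longer recognized and stays
# in the leftover arguments, so it can be forwarded to jacobi.py.
strict = argparse.ArgumentParser(allow_abbrev=False)
strict.add_argument("--numamem", type=int, default=0)
args, rest = strict.parse_known_args(["./jacobi.py", "--num", "1234"])
print(args.numamem, rest)
```

As noted above, allow_abbrev=False only fixes abbreviations; an application flag literally named --numamem would still collide with the driver's.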
I propose legate adopts NEP 29, which suggests supporting Python 3.7+ until Dec 26, 2021 and then 3.8+. The RAPIDS project generally follows this support model.
Any objections?
I'm following the directions to the legate.core. The command line I'm using is,
$ sudo ./install.py --cuda --with-cuda $CUDA_PATH --arch volta --install-dir /usr/local/legate
I get the error message:
/usr/bin/env: 'python': No such file or directory
The system is Ubuntu 20.04 with Anaconda 4.10.1. I downloaded legate.core early in the morning of 4/27/2021 (before 7:00 am EDT).
BTW - I'm using $CUDA_PATH to point to CUDA. It resolves to "/home/laytonjb/anaconda3", my home directory where I have Anaconda installed.
Any help with what I'm doing wrong is greatly appreciated.
Jeff
My test machine has only one CPU socket. This CPU has 6 physical cores and a total of 12 logical cores. Using the script legate (i.e., legate.py), how do I correctly specify the flags for OpenMP, especially --cpus, --omps, and --ompthreads (and probably also --utility)? Or is there documentation about how to use these flags? I read Legion's documentation but am still unclear on how to use OpenMP.
No matter how I change the flags, the example code always runs (and I assume the results are correct). However, sometimes I get warnings like ... {4}{threads}: reservation (blahblahblah) cannot be satisfied ... and have no clue how to fix them. For example, when I ran the jacobi.py example with
$ legate --cpus 1 --omps 1 --ompthreads 4 --utility 0 jacobi.py
I got [0 - 7fb341387f00] 0.000120 {4}{threads}: reservation ('dedicated worker (generic) #2') cannot be satisfied. This is confusing, as the system should not be oversubscribed by just 4 OMP threads. I checked the resource reservation by running the application directly with legion_python jacobi.py -ll:py 1 -ll:cpu 1 -ll:ocpu 1 -ll:othr 4 -ll:util 0 -ll:show_rsrv and got
core map {
domain 0 {
core 0 { ids=<0> alu=<6> fpu=<6> ldst=<6> }
core 1 { ids=<1> alu=<7> fpu=<7> ldst=<7> }
core 2 { ids=<2> alu=<8> fpu=<8> ldst=<8> }
core 3 { ids=<3> alu=<9> fpu=<9> ldst=<9> }
core 4 { ids=<4> alu=<10> fpu=<10> ldst=<10> }
core 5 { ids=<5> alu=<11> fpu=<11> ldst=<11> }
core 6 { ids=<6> alu=<0> fpu=<0> ldst=<0> }
core 7 { ids=<7> alu=<1> fpu=<1> ldst=<1> }
core 8 { ids=<8> alu=<2> fpu=<2> ldst=<2> }
core 9 { ids=<9> alu=<3> fpu=<3> ldst=<3> }
core 10 { ids=<10> alu=<4> fpu=<4> ldst=<4> }
core 11 { ids=<11> alu=<5> fpu=<5> ldst=<5> }
}
}
OMP0 proc 1d00000000000001 (master): allocated <>
OMP0 proc 1d00000000000001 (worker 1): allocated <>
CPU proc 1d00000000000000: allocated <>
dedicated worker (generic) #1: allocated <>
dedicated worker (generic) #2: allocated <>
OMP0 proc 1d00000000000001 (worker 3): allocated <>
Python-1 proc 1d00000000000002: allocated <>
OMP0 proc 1d00000000000001 (worker 2): allocated <>
And when I ran legion_python jacobi.py -ll:py 1 -ll:cpu 1 -ll:ocpu 1 -ll:othr 3 -ll:util 0 -ll:show_rsrv, I got no warning and this:
blah blah blah
OMP0 proc 1d00000000000001 (master): allocated <0>
OMP0 proc 1d00000000000001 (worker 1): allocated <1>
CPU proc 1d00000000000000: allocated <3>
dedicated worker (generic) #1: allocated <5,11>
dedicated worker (generic) #2: allocated <5,11>
Python-1 proc 1d00000000000002: allocated <4>
OMP0 proc 1d00000000000001 (worker 2): allocated <2>
It seems that, besides the 3 or 4 threads requested by OpenMP, there are also many other processes using resources (the CPU proc, dedicated workers #1 and #2, Python-1, and the utility proc if enabled). And each process has to be bound to a physical core? Is this the reason why 4 OMP threads oversubscribe the resources? If so, how do I maximize performance and resource usage when using OpenMP (e.g., by dedicating all physical cores to OMP)?
Thanks in advance, and I apologize for so many questions :P
Transformations required for cases like the following:
import legate.numpy as lg
x = lg.arange(25).reshape((5,5))
x[0:2,0:2] = x[2:4,2:4]
x[:,1] = x[:,2]
x[1,3:5] = x[2,3:5]
x[3:5,1] = x[3:5,2]
are not currently supported:
ValueError: Unsupported partition: Tiling(tile:Shape((2, 2)), color:Shape((1, 1)), offset:Shape((2, 2)))
ValueError: Unsupported partition: Tiling(tile:Shape((5, 1)), color:Shape((1, 1)), offset:Shape((0, 1)))
ValueError: Unsupported partition: Tiling(tile:Shape((1, 2)), color:Shape((1, 1)), offset:Shape((1, 3)))
ValueError: Unsupported partition: Tiling(tile:Shape((2, 1)), color:Shape((1, 1)), offset:Shape((3, 1)))
The reason is that, if we wanted to make a tight instance for such subregions we would end up with sparse instances. We could create over-approximate partitions (using a bounding box). This would still allow us to make affine accessors, and the mapper is going to create over-approximate dense instances anyway.
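The bounding-box idea can be sketched with a small helper (hypothetical code, not part of legate.core) that over-approximates a sparse point set with the smallest covering rectangle:

```python
def bbox(points):
    """Smallest dense rectangle covering a (possibly sparse) set of
    nD points, as a (lo, hi) pair of inclusive corners."""
    dims = len(points[0])
    lo = tuple(min(p[d] for p in points) for d in range(dims))
    hi = tuple(max(p[d] for p in points) for d in range(dims))
    return lo, hi

# A strided column slice like x[::2, 1] on a 5x5 array touches only
# rows 0, 2, 4, but its bounding box <0,1>..<4,1> is dense, so an
# affine accessor over the over-approximated instance still works.
print(bbox([(0, 1), (2, 1), (4, 1)]))
```

The over-approximation wastes the untouched rows inside the box, but as noted above the mapper creates over-approximate dense instances anyway.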
There are some statements in the codebase meant to handle Python 2, e.g.:
Lines 31 to 34 in 0bcd6f1
I tried to install legate.core with CUDA in workstations with gaming GPUs (RTX 3090 and GTX1080) and did not succeed. I am getting errors like:
Already on 'control_replication'
/home/beka/opt/legate.core/legion/runtime/mathtypes/complex.h(126): error: more than one conversion function from "__half" to a built-in type applies:
function "__half::operator float() const"
/usr/local/cuda//include/cuda_fp16.hpp(204): here
function "__half::operator short() const"
I suspect these are due to the wrong GPU architecture being given to the compiler. When I indicate --arch pascal, arch=sm_60 appears in the compiler options, but for my GPUs I need arch=sm_61 or arch=sm_86. How can I get this right?
We should have a system similar to Legion's legion_defines.h for recording the values of certain compile-time flags that are important to know when compiling or running code against legate.core.
Legion uses this system, for example, to record the maximum number of dimensions that the runtime supports, which is important for the regent compiler to know when generating code that will run against that runtime.
For legate.core this is important to do for the value of TYPE_SAFE_LEGATE, which controls whether the argument deserializer will expect to find a type tag before each argument. This setting needs to be communicated to the legate application at runtime, so that BufferBuilder python objects know to add the extra type tag.
Today, the Legate driver script will always run legion_python to execute Legate programs using Legion. Alternatively, we could start the Python interpreter first in each process and then start Legion when the legate.core module is loaded. This would allow Legate programs to be run like normal Python programs on a single node, or in the case of multi-node execution, to be run like mpirun -n N python script.py. That might be preferable for some users, but it will also require some work to make happen and comes with some caveats.
When running through legion_python, we ensure that the driver's command-line parameters are stripped out before the script starts and becomes visible through sys.argv in Python. That prevents users from having to know how to ignore our arguments. In the case of starting Python first, machine configuration flags would still be visible on sys.argv, at least until we start Legion.
There are also cases like numpy.vectorize, which will need to execute a user-defined function. To handle such cases, we need support in Realm for drafting external resources as custom processors. Specifically in this case, we need to be able to tell Realm that it can treat the main thread with the Python interpreter as an instance of a Python processor on which Python sub-tasks can be executed. There are some tricky open questions here, such as how to know when it's safe to execute sub-tasks on the drafted processor, e.g. by noticing that the implicit task has paused. Legion can provide some of that context, but it's not a general Realm solution; the concept of an implicit task currently only exists in Legion. More details: StanfordLegion/legion#716
To support NEP 29 as mentioned in #21
ref: #189 (comment)
It was suggested above to bump the numpy version to a later one that supports better type annotations.
This issue is to coordinate the version bump, any mypy-related updates that can also occur as a result, and any documentation updates.
Currently if an error occurs inside a task launched from legate we only have access to the state of execution within the runtime, and thus can only print (and inspect inside a debugger) the stack up to the point where the task starts. For debugging it would be useful to also know the state of the program when the task was launched.
The execution of a task is normally disconnected from its launch, so to do this properly we would need support in the runtime for precise exceptions. Once such functionality is available, Legate would need to package stack traces from failing tasks in their future results and return them up the stack to the python interpreter, at the point where the faulty task was launched.
Then the launching code would have a full picture of the execution state, and could print a more useful stacktrace (we would likely want to print the frames within Legion as C-level stacktraces, and the python frames at the python level, using a module like traceback).
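The python side of this could lean on the standard traceback module. A rough sketch of capturing the launch-site frames and later combining them with a failure message returned from the runtime (function names and the error plumbing are hypothetical):

```python
import traceback

def capture_launch_site():
    # Record the python frames at the point where a task is launched,
    # dropping the frame for this helper itself.
    return traceback.extract_stack()[:-1]

def report_task_failure(launch_stack, error_msg):
    # On failure, prepend the launch-site frames to the error message
    # packaged in the task's future result, so the user sees where the
    # faulty task was launched, not just where it ran.
    lines = traceback.format_list(launch_stack)
    return "".join(lines) + f"Task failed: {error_msg}\n"

stack = capture_launch_site()
print(report_task_failure(stack, "assertion failed in leaf task"))
```

The real solution would also need the C-level Legion frames mentioned above; this sketch only covers the python portion.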
Greetings, I'm trying to install legate.core on a Jetson Nano (CUDA arch 5.3, version 10.2.3). I have also pre-installed numpy 1.19, cffi, and pyarrow, and g++ is available.
I tried to install from source, with GPU and CUDA support, using install.py, but I'm getting this error:
sudo python3 install.py --cuda --with-cuda /usr/local/cuda-10 --arch maxwell --install-dir /usr/local/legate/
Using python lib and version: /usr/lib/python3.6/config-3.6m-aarch64-linux-gnu/libpython3.6m.so, 3.6.9
error: pathspec 'control_replication' did not match any file(s) known to git.
Traceback (most recent call last):
File "install.py", line 946, in
driver()
File "install.py", line 942, in driver
install(unknown=unknown, **vars(args))
File "install.py", line 651, in install
update_legion(legion_src_dir, branch=legion_branch)
File "install.py", line 216, in update_legion
git_update(legion_src_dir, branch=branch)
File "install.py", line 150, in git_update
verbose_check_call(["git", "checkout", branch], cwd=repo_dir)
File "install.py", line 85, in verbose_check_call
subprocess.check_call(*args, **kwargs)
File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['git', 'checkout', 'control_replication']' returned non-zero exit status 1
Since 0.16 a lot of helper code from various RAPIDS projects has been collected into a single header-only library, RAFT.
The "handles" from multiple libraries (e.g. cuML, cuGraph) have been unified into a single RAFT handle; if we want to use functions from these libraries we need to feed them an initialized RAFT handle.
There should be one RAFT handle per device, and each of those holds handles to various CUDA math libraries (cuBLAS, cuSolver, cuSparse), that are initialized on-demand if the code asks for the corresponding handle.
Since RAFT is header-only I think it makes sense to include it in the legate.core build and manage its handle in cudalibs.h. We should also piggyback off its internal handles for our cuBLAS needs instead of maintaining a second copy of the library.
We can only compile with gcc compilers and nvcc right now. We can use clang to build CPU-only versions of Legate, but there are things that clang accepts which nvcc does not and vice-versa.
refs: #189 (comment), #189 (comment), #189 (comment)
Clean up this union type:
Sequence[Union[ScalarArg, RegionFieldArg, FutureStoreArg]]
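One cleanup direction (a hypothetical sketch; the real argument classes and their methods may differ) is to replace the anonymous union with a structural Protocol that all three argument types satisfy, so call sites take Sequence[Packable] instead:

```python
from typing import Protocol, Sequence

class Packable(Protocol):
    # Assumed common operation shared by ScalarArg, RegionFieldArg,
    # and FutureStoreArg; purely illustrative.
    def pack(self, buf: bytearray) -> None: ...

class ScalarArg:
    """Toy stand-in for one of the three argument types."""
    def __init__(self, value: int) -> None:
        self.value = value
    def pack(self, buf: bytearray) -> None:
        buf.extend(self.value.to_bytes(4, "little"))

def pack_all(args: Sequence[Packable]) -> bytearray:
    buf = bytearray()
    for arg in args:
        arg.pack(buf)
    return buf

print(pack_all([ScalarArg(1), ScalarArg(2)]).hex())
```

A named alias (TaskArg = Union[...]) is a smaller, purely cosmetic alternative if the three types have no usable common interface.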
Legion and legate are using easy_install to install the python packages to an arbitrary location. This mode is deprecated and will soon be removed. We should switch to following standard conventions (e.g. pip-based), which install directly into the currently active virtual environment (e.g. the one managed by conda).
Here are some relevant messages we get during installation:
DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
DeprecationWarning: The distutils.sysconfig module is deprecated, use sysconfig instead
SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
SetuptoolsDeprecationWarning: Custom 'build_py' does not implement 'get_data_files_without_manifest'.
Please extend command classes from setuptools instead of distutils.
One thing I would like to know is how easy it would be to "uninstall" our packages in this mode, in case we want to do a clean build.
Copying from my discussion with @bryevdv on alternatives:
The issue probably comes down to whether "build cmd + package/install cmd" is acceptable or whether you want a one-shot command to do everything. One option might be to have cmake build the binary artifacts and install them in the source tree (or somewhere), and then a pip install . just does a fairly normal "python package install". Maybe the cmake process even does the pip install . for you as the last step. If you want things "driven" from python then you still have a setup.py that invokes the C++ build. Since we can't call setup.py as a script anymore (that is deprecated), we'd have to rely on env vars alone to control any conditional logic inside it when you call pip install . or build. There are evidently other tools that support custom steps, like the hatch tool mentioned as in use by Jupyter.
CC @trxcllnt for input
Hi, everyone!
I'm trying to use cunumeric through legate core but I'm running into an "undefined symbol" error. I leave the trace below.
OSError: cannot load library '/home/francesco/anaconda3/envs/legate-v1/lib/libcunumeric.so': /home/francesco/anaconda3/envs/legate-v1/lib/./liblgcore.so: undefined symbol: _ZN6Legion7Runtime29perform_registration_callbackEPFvN5Realm7MachineEPS0_RKSt3setINS1_9ProcessorESt4lessIS5_ESaIS5_EEEb
I installed legate in anaconda with Python 3.8 and Cuda support. My system runs on Ubuntu 20.04.
Did anyone have the same error? Or can you point me in the direction where to solve my problem?
Many thanks,
Francesco
This is a minor issue. The help message of legate says the default value of --logdir is the current directory. However, the actual default value of --logdir is the directory where the script legate is installed. See
Line 661 in 9e327b7
This is a bit confusing. At least for me, the current directory means the directory from which I launch legate, rather than where it is installed. If the intention is indeed to use the current directory, I believe os.getcwd() should work fine.
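If the current directory is indeed the intent, the fix is a one-line default change, sketched below (the actual --logdir definition in legate.py may differ in its other parameters):

```python
import argparse
import os

parser = argparse.ArgumentParser()
# Default to the directory the user launched from,
# not the directory where the driver script is installed.
parser.add_argument(
    "--logdir",
    default=os.getcwd(),
    help="directory for log files (default: current directory)",
)
args = parser.parse_args([])
print(args.logdir)
```

Note that os.getcwd() is evaluated once at parser construction, which matches the "directory where legate was launched" expectation.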
The implementation would probably follow this recipe for initializing one NCCL rank per GPU in Legion: https://gitlab.com/StanfordLegion/legion/-/blob/master/examples/nccl/nccl_legion_demo.cu.
Libraries like legate.pandas that are currently initializing NCCL internally would want to switch to using this.
If we incorporate RAFT in legate.core, we can pass the NCCL communicator that we initialize to be used for their communication methods: https://github.com/rapidsai/raft/blob/branch-0.18/cpp/include/raft/comms/comms.hpp.
When using conda-build:
/opt/conda/conda-bld/legate-core_1651611301955/_build_env/x86_64-conda-linux-gnu/include/c++/11.2.0/type_traits:71:52: error: redefinition of 'constexpr const _Tp std::integral_constant<_Tp, __v>::value'
71 | template<typename _Tp, _Tp __v>
| ^
/opt/conda/conda-bld/legate-core_1651611301955/_build_env/x86_64-conda-linux-gnu/include/c++/11.2.0/type_traits:59:29: note: 'constexpr const _Tp value' previously declared here
59 | static constexpr _Tp value = __v;
|
This is fixed by forcing C++17.
According to this thread this is probably related to the combination of CUDA and GCC. It's not clear whether this is a valid error or a bug. It seems that there should not be a problem since this is a standard header.
Operations like prefix sum or sorting typically involve two sets of tasks in distributed setting, and currently there's no way of constraining their partitions to be consistent. We need to extend the constraint language to be able to express such constraints.
As part of our preparation for the release, we need to add a check that detects if we're not running on a legion python interpreter and if so, prints out a nice error message that also tells the user how to run Legate programs correctly.
Hi guys, I wonder if it is possible to use legate.cunumeric and legate.pandas in a Jupyter notebook? If there are any docs on this, could you point me to them?
The launcher seems to be running post-processing scripts for Legion Spy and Prof even when the program terminated with an error. I think we should check the exit code and skip that part when it was not 0.
In forked repos, the appropriate runners are not available, and the CI always fails. This may be confusing to forking users.
This heuristic decides how big the pieces should be before legate will bother partitioning. However, in the presence of partial broadcasts like the following example:
task.add_output(arr)
task.add_broadcast(arr, axes=tuple(range(1, arr.ndim)))
the heuristic will not consider the broadcasted dimensions. In this example we end up only considering the first dimension, so an array with arr.shape == (1000,1000,1000), which should definitely be split, fails the heuristic, because it doesn't have enough elements on the first dimension alone.
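The effect can be illustrated with a toy version of the heuristic (the threshold value and function names are hypothetical, not legate.core internals):

```python
MIN_ELEMS = 1_000_000  # hypothetical partitioning threshold

def should_partition(shape, broadcast_axes):
    """Toy model of the current heuristic: only extents along
    non-broadcast dimensions count toward the threshold."""
    volume = 1
    for d, extent in enumerate(shape):
        if d not in broadcast_axes:
            volume *= extent
    return volume >= MIN_ELEMS

# A (1000, 1000, 1000) array broadcast on axes 1 and 2 is judged by
# its first dimension alone (1000 elements), so it is not partitioned
# even though it holds a billion elements overall.
print(should_partition((1000, 1000, 1000), broadcast_axes=(1, 2)))
```

A fix would presumably weigh the full volume (or the per-piece memory footprint) rather than only the partitionable dimensions.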
CC @fduguet-nv
Legate was suggested as a library I should look at during a couple of sessions I attended at GTC 2021. I just wondered if there was somewhere (or someone) I could get some information from on how to use the library? My use case is volume data, which is array data. From the slides from GTC 2020, it appears that the data is split a bit like a quad-tree implementation. It would be great to get some information on how to use it.
As discussed on nv-legate/cunumeric#281.
I suggest checking the value of cudaPeekAtLastError/cudaGetLastError in the task postamble, if running on a GPU processor, in both debug and release modes.
I assume that, no matter what method we use to encapsulate a task's CUDA effects within the envelope of realm-managed effects (e.g. using the realm CUDART hijack, or doing an explicit context sync), a driver-level error check will always happen before task completion. However, errors like "too many threads per block" (i.e. when the threadsPerBlock specified at the kernel launch site is greater than the maxThreadsPerBlock specified in __launch_bounds__) signal immediately at kernel launch and don't even enter the driver, thus are not caught by the existing exit checks. The extra exit check I am suggesting would be a lightweight way (causing no extra blocking) to catch such runtime-only errors, so they don't cause silent failures.
conda install -c nvidia -c conda-forge -c legate legate-core
PackagesNotFoundError: The following packages are not available from current channels:
Current channels:
Port CI to github actions.
I'm not sure if my understanding is correct, but it seems all code is compiled with -march=native by default. See:
Lines 902 to 908 in cd4ca09
If so, this is fine for building packages for local use. However, in the newly merged conda package recipe, it seems the code is still compiled with -march=native: the recipe is still using the default value for march.
If my understanding is correct, this breaks the portability of the conda package.
When "reusing" a cunumeric array in a pytest module, all tests can pass, but a runtime warning is issued at the end:
Exception ignored in: <function RegionField.__del__ at 0x7f30c0594320>
Traceback (most recent call last):
File "/home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/store.py", line 124, in __del__
File "/home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/store.py", line 251, in detach_external_allocation
File "/home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/runtime.py", line 447, in detach_external_allocation
File "/home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/runtime.py", line 436, in _remove_allocation
File "/home/bryan/work/legate.core/install37/lib/python3.7/site-packages/legate/core/runtime.py", line 430, in _remove_attachment
TypeError: argument of type 'NoneType' is not iterable
The warning only appears when `-cunumeric:test` is specified.
To reproduce, run the code below with `legate mre.py -cunumeric:test`:
# mre.py
import sys
import pytest
import cunumeric as cnp

x = cnp.array([1, 2, 3])

def test_lt():
    y = x < 2

def test_le():
    y = x <= 2

pytest.main(sys.argv)
Currently, each Legate library has its own `test.py` script for testing, but this makes it hard to propagate improvements from one script to the others. For example, Legate Pandas' script can run GPU tests in parallel without exceeding the resource limit, which reduces testing time considerably. We should build a common testing driver in the core so that the individual test drivers can inherit improvements like this more easily.
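A shared driver could factor the resource-aware scheduling out of the per-library scripts. Below is a minimal Python sketch of that idea; the names `make_gpu_slots`, `run_all`, and `run_one` are illustrative, not the actual Legate test driver. Tests run in parallel, but each one borrows a disjoint slot of GPUs, so concurrent tests never exceed the machine's limit.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

def make_gpu_slots(total_gpus, gpus_per_test):
    # Partition the available GPUs into disjoint slots, one per concurrent test.
    workers = total_gpus // gpus_per_test
    return [list(range(w * gpus_per_test, (w + 1) * gpus_per_test))
            for w in range(workers)]

def run_all(tests, total_gpus, gpus_per_test, run_one):
    # Run `tests` in parallel without ever using more than `total_gpus` GPUs.
    # `run_one(test, gpus)` runs a single test pinned to `gpus`, e.g. by
    # setting CUDA_VISIBLE_DEVICES before invoking the legate launcher.
    slots = Queue()
    for slot in make_gpu_slots(total_gpus, gpus_per_test):
        slots.put(slot)

    def worker(test):
        gpus = slots.get()          # block until a GPU slot frees up
        try:
            return run_one(test, gpus)
        finally:
            slots.put(gpus)         # return the slot for the next test

    n_workers = total_gpus // gpus_per_test
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker, tests))
```

The slot queue is what caps GPU usage: a worker blocks on `slots.get()` until a previous test releases its GPUs, so the driver never oversubscribes the machine no matter how many tests are queued.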
I'm not sure whether this is related to issue #11, but it would be good to provide an easier interface for handling existing C-level data in Python, in particular for regions, but possibly for other objects too.
In particular, if we have a handle to a region in C, it's difficult to create a corresponding `legate.core.legion.Region` object: the `legate.core.legion.FieldSpace` class would have to support being created from a handle, like the `Region` and `IndexSpace` classes do at the moment. Without this, I'm not sure if/how you can use the `__legate_data_interface__`, unless you create the underlying C objects on the Python side.
Currently, Legate Core only allows tasks to specify alignments between whole Legate stores, which assumes those stores have the same dimensions. This forces dimensions to be added purely for the purpose of alignment, which often makes the stores exceed the supported maximum dimension, shrinking the set of admissible stores even further. We want to be able to specify alignment constraints on a per-dimension basis. This requires both an interface extension and changes to the constraint solver so it can resolve partial alignments.
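To make the proposal concrete, here is a hedged Python sketch of what a per-dimension alignment constraint could look like. `DimAlignment`, `align_dim`, and `align_whole` are hypothetical names, not the existing Legate API, and plain strings stand in for store handles. The point is that the constraint ties one dimension of each store together, so the two stores no longer need equal ranks.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DimAlignment:
    # Hypothetical constraint: dimension `dim_a` of store `a` must be
    # partitioned identically to dimension `dim_b` of store `b`.
    a: str       # stand-in for a Legate store handle
    dim_a: int
    b: str
    dim_b: int

def align_dim(a, dim_a, b, dim_b):
    # One partial alignment between a single pair of dimensions.
    return DimAlignment(a, dim_a, b, dim_b)

def align_whole(a, b, ndim):
    # Today's whole-store alignment of two n-D stores is then just the
    # conjunction of n per-dimension alignments.
    return [align_dim(a, d, b, d) for d in range(ndim)]
```

Expressing the whole-store case as a conjunction of per-dimension constraints suggests why the solver change is needed: it must propagate each dimension's partitioning independently instead of treating a store's partition as one indivisible decision.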
With the new NCCL dependency, now is a good time to revise our README.md so it is up to date with the dependencies, minimum supported hardware level, etc.
We currently only provide conda packages for Linux. For other platforms users will need to build from source. We have not tried to do this on Windows platforms.
As @HarryES95 notes in #106, some of our scripts do not expect to be run in a Windows environment. However, the larger problem is that Legion (the distributed runtime that Legate is built on) does not support Windows at the moment (see StanfordLegion/legion#1017).
Opening this issue to track work on this.
Collecting some ideas on additional ways to run our existing tests:
- Run with `-lg:safe_ctrlrepl 1` on at least 2 ranks, to check for control replication violations. Possibly add some tools that can help pinpoint where the violation comes from.
- Run with `-lg:partcheck`.