nvidia / spark-rapids-jni Goto Github PK

View Code? Open in Web Editor NEW

31.0 21.0 57.0 2.41 MB

RAPIDS Accelerator JNI For Apache Spark

License: Apache License 2.0

Shell 1.43% CMake 1.61% C++ 31.00% Cuda 42.01% Java 22.97% Dockerfile 0.22% PowerShell 0.04% Groovy 0.72%

spark-rapids-jni's Introduction

RAPIDS Accelerator JNI For Apache Spark

This repository contains native support code for the RAPIDS Accelerator for Apache Spark.

Building From Source

See the build instructions in the contributing guide.

spark-rapids-jni's People

Contributors

Stargazers

Watchers

spark-rapids-jni's Issues

Add formatting tools and enforcement of C++ style

It would be nice to have tools to format and precommit checks to enforce a C++ style as was done in the RAPIDS cudf repository with clang-format.

[FEA] ninja-build for cudf submodule

Is your feature request related to a problem? Please describe.
follow-up of #2

Leverage ninja-build to speed up cudf build

Improve string performance in row to columner and columnar to row transitions

When row conversions for strings was added the goal was implementation speed over operational speed. Now that there is a working version, some investigation into the performance of the kernel is warranted. Investiagtions:

shared memory isn't being used for these transitions, but may make sense
strings are handled on a per-warp basis. This works, but could be dominated by a single very-long string in incoming data. It was decided in the initial implementation that spending the time searching for source and destination pointers to divide the work evenly per thread was complicated and of unknown benefit. This should be investigated further.

[FEA] Harden parquet footer parsing

The parquet footer parser feature is experimental and we need to harden it.

This means we need to

fail when we see ambiguity in the names of columns when ignore_case is enabled.
Look at having better unicode to lower case conversion.
Support the non-standard ways of encoding lists and maps.

That last one could be rather complicated as it will require some interface changes to the parser where we need to pass down the schema of the data, not just the names of the columns.

[FEA] Use CI friendly versions revision, sha1, changeList

Is your feature request related to a problem? Please describe.
When testing local builds of spark-rapids-jni with Spark, we currently have to carefully examine that the spar-rapids build consumed the local build output of spark-rapids-jni instead of a downloaded dependency from a Maven Central repo.

Describe the solution you'd like
This proposes to use the mechanism described in Maven CI Friendly Versions, which should be straightforward in spark-rapids-jni since it's a single-module build.

If the version is defined to the tune of from the doc

<version>${revision}${sha1}${changelist}</version>
...
<properties>
    <revision>22.08.0</revision>
    <changelist>-SNAPSHOT</changelist>
    <sha1/>
 </properties>

the user can produce a sufficiently unique local artifact in order not to worry about collisions with the SNAPSHOTs from Central
by adding -Dsha1="-$(git rev-parse --short HEAD)"

Then when building spark-rapids repo the user will specify -Dspark-rapids-jni.version=22.08.0-6453047ef-SNAPSHOT

Describe alternatives you've considered
Continue watching build info output at run time carefully

Additional context
Numerous confusions due to version inconsistencies

[BUG] CMake Error at .../thrust-config-version.cmake

Describe the bug
spark-rapids-jni_nightly-dev ID 184 failed at,

16:28:05  [INFO]      [exec] -- Found JNI: /usr/lib/jvm/java/jre/lib/amd64/libjawt.so  
16:28:05  [INFO]      [exec] -- JDK with JNI in /usr/lib/jvm/java/include;/usr/lib/jvm/java/include/linux;/usr/lib/jvm/java/include
16:28:05  [INFO]      [exec] -- Found nvcomp: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib/cmake/nvcomp/nvcomp-config.cmake (found version "2.3.3") 
16:28:05  [INFO]      [exec] -- Found Thrust: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/include/libcudf/Thrust/thrust/cmake/thrust-config.cmake (found version "1.15.0.0") 
16:28:05  [INFO]      [exec] CMake Error at /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/thrust/thrust-config-version.cmake:10 (file):
16:28:05  [INFO]      [exec]   file failed to open for reading (No such file or directory):
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec]     /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/src/main/cpp/_THRUST_VERSION_INCLUDE_DIR-NOTFOUND/thrust/version.h
16:28:05  [INFO]      [exec] Call Stack (most recent call first):
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-dependencies.cmake:24 (find_package)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake:75 (include)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-dependencies.cmake:30 (find_package)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake:91 (include)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:114 (find_package)
16:28:05  [INFO]      [exec]   CMakeLists.txt:104 (rapids_find_package)
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec] CMake Error at /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/thrust/thrust-config-version.cmake:17 (math):
16:28:05  [INFO]      [exec]   math cannot parse the expression: " / 100000": syntax error, unexpected
16:28:05  [INFO]      [exec]   exp_DIVIDE (2).
16:28:05  [INFO]      [exec] Call Stack (most recent call first):
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-dependencies.cmake:24 (find_package)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake:75 (include)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-dependencies.cmake:30 (find_package)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake:91 (include)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:114 (find_package)
16:28:05  [INFO]      [exec]   CMakeLists.txt:104 (rapids_find_package)
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec] CMake Error at /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/thrust/thrust-config-version.cmake:18 (math):
16:28:05  [INFO]      [exec]   math cannot parse the expression: "( / 100) % 1000": syntax error,
16:28:05  [INFO]      [exec]   unexpected exp_DIVIDE (3).
16:28:05  [INFO]      [exec] Call Stack (most recent call first):
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-dependencies.cmake:24 (find_package)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake:75 (include)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-dependencies.cmake:30 (find_package)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake:91 (include)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:114 (find_package)
16:28:05  [INFO]      [exec]   CMakeLists.txt:104 (rapids_find_package)
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec] CMake Error at /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/thrust/thrust-config-version.cmake:19 (math):
16:28:05  [INFO]      [exec]   math cannot parse the expression: " % 100": syntax error, unexpected
16:28:05  [INFO]      [exec]   exp_MOD (2).
16:28:05  [INFO]      [exec] Call Stack (most recent call first):
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-dependencies.cmake:24 (find_package)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake:75 (include)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-dependencies.cmake:30 (find_package)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake:91 (include)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:114 (find_package)
16:28:05  [INFO]      [exec]   CMakeLists.txt:104 (rapids_find_package)
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec] CMake Error at /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/thrust/thrust-config-version.cmake:10 (file):
16:28:05  [INFO]      [exec]   file failed to open for reading (No such file or directory):
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec]     /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/src/main/cpp/_THRUST_VERSION_INCLUDE_DIR-NOTFOUND/thrust/version.h
16:28:05  [INFO]      [exec] Call Stack (most recent call first):
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-dependencies.cmake:24 (find_package)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake:75 (include)
16:28:05  [INFO]      [exec]   /usr/local/cmake-3.22.3-linux-x86_64/share/cmake-3.22/Modules/CMakeFindDependencyMacro.cmake:47 (find_package)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-dependencies.cmake:31 (find_dependency)
16:28:05  [INFO]      [exec] -- Found rmm: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake (found version "22.10.0")   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake:91 (include)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:114 (find_package)
16:28:05  [INFO]      [exec]   CMakeLists.txt:104 (rapids_find_package)
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec] CMake Error at /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/thrust/thrust-config-version.cmake:17 (math):
16:28:05  [INFO]      [exec]   math cannot parse the expression: " / 100000": syntax error, unexpected
16:28:05  [INFO]      [exec]   exp_DIVIDE (2).
16:28:05  [INFO]      [exec] Call Stack (most recent call first):
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-dependencies.cmake:24 (find_package)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake:75 (include)
16:28:05  [INFO]      [exec]   /usr/local/cmake-3.22.3-linux-x86_64/share/cmake-3.22/Modules/CMakeFindDependencyMacro.cmake:47 (find_package)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-dependencies.cmake:31 (find_dependency)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake:91 (include)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:114 (find_package)
16:28:05  [INFO]      [exec]   CMakeLists.txt:104 (rapids_find_package)
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec] CMake Error at /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/thrust/thrust-config-version.cmake:18 (math):
16:28:05  [INFO]      [exec]   math cannot parse the expression: "( / 100) % 1000": syntax error,
16:28:05  [INFO]      [exec]   unexpected exp_DIVIDE (3).
16:28:05  [INFO]      [exec] Call Stack (most recent call first):
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-dependencies.cmake:24 (find_package)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake:75 (include)
16:28:05  [INFO]      [exec]   /usr/local/cmake-3.22.3-linux-x86_64/share/cmake-3.22/Modules/CMakeFindDependencyMacro.cmake:47 (find_package)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-dependencies.cmake:31 (find_dependency)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake:91 (include)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:114 (find_package)
16:28:05  [INFO]      [exec]   CMakeLists.txt:104 (rapids_find_package)
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec] CMake Error at /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/thrust/thrust-config-version.cmake:19 (math):
16:28:05  [INFO]      [exec]   math cannot parse the expression: " % 100": syntax error, unexpected
16:28:05  [INFO]      [exec]   exp_MOD (2).
16:28:05  [INFO]      [exec] Call Stack (most recent call first):
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-dependencies.cmake:24 (find_package)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/rmm/rmm-config.cmake:75 (include)
16:28:05  [INFO]      [exec]   /usr/local/cmake-3.22.3-linux-x86_64/share/cmake-3.22/Modules/CMakeFindDependencyMacro.cmake:47 (find_package)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-dependencies.cmake:31 (find_dependency)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake:91 (include)
16:28:05  [INFO]      [exec]   /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/_deps/rapids-cmake-src/rapids-cmake/find/package.cmake:114 (find_package)
16:28:05  [INFO]      [exec]   CMakeLists.txt:104 (rapids_find_package)
16:28:05  [INFO]      [exec] 
16:28:05  [INFO]      [exec] 
16:28:06  [INFO]      [exec] -- Check if compiler accepts -pthread
16:28:06  [INFO]      [exec] -- Check if compiler accepts -pthread - yes
16:28:06  [INFO]      [exec] -- Found libcudacxx: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/libcudacxx/libcudacxx-config.cmake (found version "1.7.0") 
16:28:06  [INFO]      [exec] -- Found cuco: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cuco/cuco-config.cmake (found version "0.0.1") 
16:28:06  [INFO]      [exec] -- Found cuFile: /usr/local/cuda/lib64/libcufile.so  
16:28:06  [INFO]      [exec] -- Found KvikIO: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/kvikio/kvikio-config.cmake (found version "22.10.0") 
16:28:06  [INFO]      [exec] -- Found ZLIB: /usr/lib64/libz.so (found version "1.2.7") 
16:28:06  [INFO]      [exec] -- Found cudf: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/cudf/cudf-config.cmake (found version "22.10.0") 
16:28:06  [INFO]      [exec] -- Found GTest: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/libcudf-install/lib64/cmake/GTest/GTestConfig.cmake (found version "1.10.0")  
16:28:06  [INFO]      [exec] -- Found Boost: /usr/local/lib/cmake/Boost-1.79.0/BoostConfig.cmake (found version "1.79.0")  
16:28:06  [INFO]      [exec] -- Configuring incomplete, errors occurred!
16:28:06  [INFO]      [exec] See also "/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/CMakeFiles/CMakeOutput.log".
16:28:06  [INFO]      [exec] See also "/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build/CMakeFiles/CMakeError.log".
16:28:06  [INFO] ------------------------------------------------------------------------
16:28:06  [INFO] BUILD FAILURE
16:28:06  [INFO] ------------------------------------------------------------------------
16:28:06  [INFO] Total time: 1:26:13.167s
16:28:06  [INFO] Finished at: Sat Aug 13 08:28:06 UTC 2022
16:28:07  [INFO] Final Memory: 19M/654M
16:28:07  [INFO] ------------------------------------------------------------------------
16:28:07  [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:3.0.0:run (build-sparkrapidsjni) on project spark-rapids-jni: An Ant BuildException has occured: exec returned: 1
16:28:07  [ERROR] around Ant part ...<exec failonerror="true" dir="/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/cmake-build" executable="cmake">... @ 5:152 in /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-184-cuda11/target/antrun/build-main.xml
16:28:07  [ERROR] -> [Help 1]
16:28:07  [ERROR] 
16:28:07  [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
16:28:07  [ERROR] Re-run Maven using the -X switch to enable full debug logging.
16:28:07  [ERROR] 
16:28:07  [ERROR] For more information about the errors and possible solutions, please read the following articles:
16:28:07  [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

[FEA] Benchmark compilation should not require building cuDF benchmarks

Is your feature request related to a problem? Please describe.
Building benchmarks in spark-rapids-jni requires building of the cudf benchmarks in order to get the libcudf_datagen library. The cudf team isn't interested in making this library public or decoupling it from the cudf benchmarks, so a copy of this could should be brought over to spark-rapids-jni. The data generation code doesn't change often and seems preferable to building all the cudf benchmarks, which currently takes a non-trivial amount of time that will surely get longer over time.

This was discussed in #331 and an attempt was made at that time to bring this files internal to spark-rapids-jni, but compilation issue arose that put it outside of the scope of that work.

Investigate concurrent kernel execution for columnar to row conversions

It was brought up in reviews for 10871 that the some of the kernels running for the row to column conversion could be run concurrently. This could provide a performance boost and should be investigated. The row to column string kernel is using information computed by the fixed-width copy kernel, but others have no dependencies.

[BUG] Executables built inside the docker container are unable to find libraries

Is your feature request related to a problem? Please describe.
The executables built with the build-in-docker script probably have a different environment than the host. This results in them being unable to find libraries that are required to run.

$ target/cmake-build/gtests/ROW_CONVERSION 
target/cmake-build/gtests/ROW_CONVERSION: error while loading shared libraries: libcudf.so: cannot open shared object file: No such file or directory
$ ldd !$
ldd target/cmake-build/gtests/ROW_CONVERSION
	linux-vdso.so.1 (0x00007ffe2abf3000)
	libcudf.so => not found
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fe277626000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fe27760a000)
	libnvcomp.so => not found
	libnvcomp_gdeflate.so => not found
	libnvcomp_bitcomp.so => not found
	libcudart.so.11.0 => not found
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe277602000)
	libcuda.so.1 => /lib/x86_64-linux-gnu/libcuda.so.1 (0x00007fe275e38000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe275e15000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fe275c33000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe275ae2000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe275ac7000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe2758d5000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fe277646000)

Describe the solution you'd like
If the docker image used the same path structure as the host, the host would find the runpath information in the elf to match up and be able to find the libraries needed.

Describe alternatives you've considered
It is also possible to just force running everything inside of the docker environment via scripts

[BUG] Failed to register cuFile handle: internal error running CuFileTest

Describe the bug
Running CuFileTest results in

ai.rapids.cudf.CudfException: cuDF failure at: /rapids/spark-rapids-jni/thirdparty/cudf/java/src/main/native/src/CuFileJni.cpp:162: Failed to register cuFile handle: internal error

Steps/Code to reproduce bug

PARALLEL_LEVEL=6 ./build/build-in-docker clean install -DGPU_ARGCHS=native -Dtest=ai.rapids.cudf.CuFileTest#testCopyToFile

Expected behavior
should pass.

Environment details (please complete the following information)

local dev

Additional context
Probably need a cudf issue to have exceptions include the path on the filesystem.

Port CUDA fatal test pom changes from cudf

After rapidsai/cudf#10884 CudaFatalTest is generating a fatal CUDA error which causes subsequent tests using CUDA APIs to fail. We need to port the pom changes from the cudf PR into the spark-rapids-jni pom.

Update build to use CUDF_USE_PER_THREAD_DEFAULT_STREAM

rapidsai/cudf#10877 is changing PER_THREAD_DEFAULT_STREAM to CUDF_USE_PER_THREAD_DEFAULT_STREAM. We need to update the spark-rapids-jni build accordingly and should switch to the new flag name.

[FEA] Support predicate push down in the native parquet footer parser.

Another big area where for processing the footer is predicate push down. It would be great if we could push down the predicates and filter out row groups that do not match before sending data back to java. We could also drop all of the column chunk statistics after this because they are not going to be needed and it would save both time and memory to serialize and de-serialize them again.

To be clear this does not include work for bloom filters or the dictionary predicate checks. The dictionary checks are something that we will keep in java. Bloom filters is something that we need to investigate more.

[FEA] String to float kernel should support hexadecimal encoded strings

Is your feature request related to a problem? Please describe.
Spark supports hexadecimal for strings being cast to floats. Strings like 0x1p0. The values is HEX VALUE * 2 ^ EXP as defined in this PR comment Note that the parsing code needs to support intermixed decimal and hexadecimal strings.

Describe the solution you'd like
The float parsing kernel should be augmented to support these string values.

Additional context
https://www.exploringbinary.com/hexadecimal-floating-point-constants/

[BUG] jni pre-merge cannot upload failure logs correctly

Describe the bug
https://github.com/NVIDIA/spark-rapids-jni/runs/5877737893?check_suite_focus=true

******** JOB LOGS with sensitive data redacted **************

[FEA] Allow building and running all tests in a single command

Now that c++ tests are being added to spark-rapids-jni, it would be nice to have a single script to run that would build the library, tests, and run the tests. This could be leveraged in CI.

Update cudf submodule ref to branch-22.12

Is your feature request related to a problem? Please describe.
After cudf CI of 22.12 is available, we should create a PR,

update https://github.com/NVIDIA/spark-rapids-jni/blob/branch-22.12/.gitmodules#L4 to refer to branch-22.12
do a

git submodule update --remote --merge

[BUG] ant exec errors logged as info in the Maven log, not included in the error

Describe the bug
Standard implementation in antrun plugin does not propagate error output to Maven. Therefore error output is logged as info.

The actual ERROR log from Maven only includes the return code 1 without explanation.

Steps/Code to reproduce bug

Run build when cudf submodule is stale

[INFO] ------------------------------------------------------------------------
[INFO] Building RAPIDS Accelerator JNI for Apache Spark 22.08.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-antrun-plugin:3.0.0:run (submodule check) @ spark-rapids-jni ---
[INFO] Executing tasks
[INFO]      [exec] ERROR: submodules out of date: +dba4eea4a5db9e1b3ceb8ceb8f2762cf86b91170 thirdparty/cudf (v0.12.0-16695-gdba4eea4a5). To fix: git submodule update --init --recursive
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.741s
[INFO] Finished at: Mon Jul 18 19:06:16 UTC 2022
[INFO] Final Memory: 15M/477M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:3.0.0:run (submodule check) on project spark-rapids-jni: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part ...<exec failonerror="true" dir="/home/gshegalov/gits/NVIDIA/spark-rapids-jni" executable="/home/gshegalov/gits/NVIDIA/spark-rapids-jni/build/submodule-check"></exec>... @ 4:161 in /home/gshegalov/gits/NVIDIA/spark-rapids-jni/target/antrun/build-main.xml
[ERROR] -> [Help 1]

Expected behavior

ERROR output should include diagnostics beyond the result code.

This can be achieved by turning off failonerror, capturing result code and error in properties of exec and having an additional check

        <fail message="$Exit code: {exitCode}, Error message: ${errorMsg}">
            <condition>
                <not>
                    <equals arg1="${exitCode}" arg2="0"/>
                </not>
            </condition>
        </fail>

Environment details (please complete the following information)

local build

Additional context
N/A

Look into dead code elimination to reduce static native lib size in the jar

nvcc: Eliminating unused kernels https://developer.nvidia.com/blog/reducing-application-build-times-using-cuda-c-compilation-aids/

-rdc=true

gcc: https://gcc.gnu.org/onlinedocs/gnat_ugn/Compilation-options.html

add compiler flags --ffunction-sections and --fdata-sections
add to linker flags --gc-sections

Use "mvn --batch-mode" in CI builds

We have a lot of noise in the build logs we upload and link to GH workflow runs

https://github.com/NVIDIA/spark-rapids-jni/runs/8214034914?check_suite_focus=true#step:2:20347

...
239/283 KB   60/60 KB   350/350 KB   454/1473 KB   127/127 KB   
239/283 KB   60/60 KB   350/350 KB   458/1473 KB   127/127 KB   
243/283 KB   60/60 KB   350/350 KB   458/1473 KB   127/127 KB   
243/283 KB   60/60 KB   350/350 KB   462/1473 KB   127/127 KB   
243/283 KB   60/60 KB   350/350 KB   466/1473 KB   127/127 KB   
...

[FEA] Auto cleanup temp branches created by bot

Is your feature request related to a problem? Please describe.
Intermediate branches for auto-merged should be auto deleted by the action itself.

But if the PR was closed by other like manual conflict fix, we should have a way to auto clean up them

[FEA] Use ccache in build-in-docker

Is your feature request related to a problem? Please describe.
I wish RAPIDS Accelerator JNI for Apache Spark would be built with ccache support introduced in rapidsai/cudf#10790

Describe the solution you'd like
Provide an option or environment variable to easily turn on/off ccache via build-in-docker

Describe alternatives you've considered
A one-off personal script

Additional context
Sometimes one needs to remove the build directory to fix build issues. Always being able to clean build without taking a productivity hit prevents issues from occurring to begin with.

[FEA] Random return code choice out of CSV specified in substituteReturnCode

Is your feature request related to a problem? Please describe.
I wish the Fault Injection Tool allowed comma-separated values for substitute Return Code such that I don't need a verbose list of separate rules for each one individually

Describe the solution you'd like

       "cuLaunchKernel_ptsz": {
           "percent": 1,
           "injectionType": 2,
           "substituteReturnCode": "2,3,999" ,
           "interceptionCount": 1000
       }

with the semantic: if the rule is matched pick one of the values randomly

Describe alternatives you've considered
replicate the rule per return code

[FEA] Statically link the CUDA runtime

This requests something similar to what was implemented in rapidsai/cudf#9873 for cudf. Ideally spark-rapids-jni should provide a single shared library that statically links the CUDA runtime.

[FEA] Custom Hive Text File Parser

Is your feature request related to a problem? Please describe.
in several discussions with CUDF we have come to the conclusion that the CSV parser is not likely to get a lot of love/fixes any time soon unless we do those fixes ourselves. We have some goals to support the Hive Text format in the next release 23.02, but with the complexity in CUDF parser I think it is going to be simpler for us to write a custom parser ourselves in the short term, and target it directly at the Hive Text file format, specifically the default settings for the HiveTextFile format. We can discuss other settings that might be common with the HiveTextFile format.

Describe the solution you'd like
I would like to have an API that takes a String column as input (we already have split each of the rows), and list of columns to keep. It would then return a table of string columns that we would then parse further into smaller parts. The main goal would be to split on the record deliminator, and handle quotes and escapes correctly.

Describe alternatives you've considered
We fix all of the bugs and new features in CUDF that are needed to do this.

[FEA] Organize Java and JNI source code by functionalities similar to cudf

Currently, the Java binding and JNI in the cudf repository expand significantly. Over years of development, Java source files and JNI cpp files contain thousands LOC in each file. Almost all the functionalities of column/table/etc. are put into the same file. With the increasing number of LOC, things become more and more unorganized. Nowadays, it is very difficult to check for coverage of JNI/Java binding of the same category (such as string functions) because doing so would require scanning through several thousand LOC in several files.

Having a fresh repository, I believe that we can do much better by implementing things from scratch. I suggest that we organize the new Java binding functions and JNI cpp functions by categories, similar to what cudf is doing. For example: string functions would be put in one source file, struct functions would be put in another source file, list functions would be put in different file again, and so on. The way we organize functions can closely follow cudf so we can easily trace back the binding coverage for cudf.

The solution for such organization is very simple:

Organizing JNI cpp functions is trivial: just separate the functions. Nothing is required to do.
Organizing Java binding functions is also very simple. Instead of having all the functions as class members or static members of one Java class, we just leave a few essential member functions (such as getNativeHandle) in the old classes. We then create additional Java classes like CudfStrings, CudfStructs etc. which reflect corresponding cudf's namespaces. The cudf functions (like strings::split) will be bound as static member functions of these new classes (so we will have CudfStrings::split) which will operate on free input columns as function arguments.

The solution above is breaking changes but it is very simple to implement and I believe that it can be a significant improvement for the codebase organization. For long-term development, we should make such breaking changes ASAP so the cost of building things from scratch can be minimal.

[BUG] String to float kernel needs to match Spark's handling of numbers near max value

Describe the bug
Spark CPU handling of string to float conversions has some odd behavior around the max values of floats and doubles. The GPU kernel needs to match these behaviors.

Steps/Code to reproduce bug

val df = Seq("1.7976931348623158E308", "1.79769313486231581E308", "1.7976931348623157E308", "1.7976931348623159E308", "-1.7976931348623158E308", "-1.79769313486231581E308", "-1.7976931348623157E308", "-1.7976931348623159E308", "1.7976931348623158E-308", "1.79769313486231581E-308", "1.7976931348623157E-308", "1.7976931348623159E-308").toDF
df.coalesce(1).selectExpr("*", "CAST(value as double)").show(truncate = false)
+------------------------+-----------------------+
|value                   |value                  |
+------------------------+-----------------------+
|1.7976931348623158E308  |1.7976931348623157E308 |
|1.79769313486231581E308 |Infinity               |
|1.7976931348623157E308  |1.7976931348623157E308 |
|1.7976931348623159E308  |Infinity               |
|-1.7976931348623158E308 |-1.7976931348623157E308|
|-1.79769313486231581E308|-Infinity              |
|-1.7976931348623157E308 |-1.7976931348623157E308|
|-1.7976931348623159E308 |-Infinity              |
|1.7976931348623158E-308 |1.797693134862316E-308 |
|1.79769313486231581E-308|1.797693134862316E-308 |
|1.7976931348623157E-308 |1.7976931348623155E-308|
|1.7976931348623159E-308 |1.797693134862316E-308 |
+------------------------+-----------------------+
spark.conf.set("spark.rapids.sql.enabled", "true")
df.coalesce(1).selectExpr("*", "CAST(value as double)").show(truncate = false)
+------------------------+----------------+
|value                   |value           |
+------------------------+----------------+
|1.7976931348623158E308  |Infinity        |
|1.79769313486231581E308 |1.797693135E308 |
|1.7976931348623157E308  |1.797693135E308 |
|1.7976931348623159E308  |Infinity        |
|-1.7976931348623158E308 |-Infinity       |
|-1.79769313486231581E308|-1.797693135E308|
|-1.7976931348623157E308 |-1.797693135E308|
|-1.7976931348623159E308 |-Infinity       |
|1.7976931348623158E-308 |1.797693135e-308|
|1.79769313486231581E-308|1.797693135e-308|
|1.7976931348623157E-308 |1.797693135e-308|
|1.7976931348623159E-308 |1.797693135e-308|
+------------------------+----------------+

Of note, 1.79769313486231580E308 results in 1.7976931348623157E308, but 1.7976931348623158E308 results in Infinity, which is interesting. This implies that there are some special cases in the code around the edges.

Expected behavior
CPU and GPU conversions should match

[BUG] string to float kernel fails parsing string with only exponent

Describe the bug
The string to float code fails when parsing the string "E15" resulting in a value of 0 instead of a null value.

[FEA] allow dynamic link manner for libcupti on arm64 instance

When building spark-rapids-jni jar on an arm64 instance, the following error is thrown:

[INFO]      [exec] CMake Error at faultinj/CMakeLists.txt:39 (target_link_libraries):
[INFO]      [exec]   Target "cufaultinj" links to:
[INFO]      [exec]
[INFO]      [exec]     CUDA::cupti_static
[INFO]      [exec]
[INFO]      [exec]   but the target was not found.  Possible reasons include:
[INFO]      [exec]
[INFO]      [exec]     * There is a typo in the target name.
[INFO]      [exec]     * A find_package call is missing for an IMPORTED target.
[INFO]      [exec]     * An ALIAS target is missing.

This is due to the lack of static libcupti in nvidia/cuda arm64 docker images(e.g. nvidia/cuda:11.5.2-devel-ubuntu18.04) and only a static link way is provided in the cmake file.

We can add a conditional link for arm64 architectures.

[FEA] Make logger sink configurable

Is your feature request related to a problem? Please describe.
I wish CUDA Fault Injector allowed configurable non-console logger sink such as a file.

Describe the solution you'd like
Add some config key "logSink" with values such as

stderr
stdout
otherfillename.log

stderr can be the default
Describe alternatives you've considered
N/A. Logging to the console are sometimes convenient for a demo. but never for production.

Additional context
N/A

[FEA] Initiate CICD setup

The CICD should include:

signoff check
pre-merge check
nightly build and deploy
periodical submodule sync
auto-merge from pre-release branch to dev branch

[FEA] Delayed, temporary, etc attachment and rule activation

Is your feature request related to a problem? Please describe.
For better supporting non-interactive automated use, I wish the Fault Injection Tool allowed

Starting in unattached state, and CUPTI event subscription happening at a later point, e.g. measured in milliseconds after cuInit
Delayed rule activation measured in milliseconds since cuInit, and most recent CUPTI attach
interception duration measured in real time milliseconds not just interception counts
Peridic activation useful ?

[BUG] use `r` form of multibyte conversion

Describe the bug
In the native parquet footer we try to convert UTF-8 characters to lower case. The only way to do this I found was to convert the data to wide characters, do the lower case conversion, and convert them back to multi-byte (UTF-8).

Recently when looking at the code and docs again I found the following...

https://en.cppreference.com/w/cpp/string/multibyte/mbstowcs

In most implementations, this function updates a global static object of type std::mbstate_t as it processes through the string, and cannot be called simultaneously by two threads, std::mbsrtowcs should be used in such cases.

We are using the non r versions to do the conversions. This is only for a small amount of metadta, but we should switch to the r versions of these functions to be safe.

[DOC] Provide a doc to explain how to build on ARM

We're going to first release a spark-rapids jar for Arm in 22.12.
But we don't have time to adjust the modules in spark-rapids-jni.
For 22.12, we won't release a arm-based jar of spark-rapids-jni, but only release the arm64 jar of rapids-4-spark jar, which packages everything together.
So, we should provide a doc to explain how to build on ARM for the case that a customer wants to build Arm-based jar.

We'll plan to adjust the modules in spark-rapids-jni in 23.02.

[FEA] spark-rapids-jni release workflow

Setup an automation release workflow for jni

[FEA] Add a kernel for integer casting

Is your feature request related to a problem? Please describe.
The main issue is NVIDIA/spark-rapids#5639 and it lays out kernels we would like to create to improve performance of casting in spark-rapids. This issue is for the integer kernel.

Describe the solution you'd like
A kernel should be created to convert strings to integers for reading CSV and JSON.

[FEA] Update maven build with new cudf option

New cudf PR (rapidsai/cudf#12002) has a new option DCUDF_JNI_ENABLE_PROFILING. That option has also been added into cudf Java build config: https://github.com/rapidsai/cudf/pull/12002/files#diff-d8225ebcfc11b480a6e4f54e183b67c3ead51635a167c106d928c2abf1f9ef66R459. We should also update it here too.

Dockerfile should derive from cudf Java Dockerfile

Since this project builds libcudf and libcudfjni, ideally the Dockerfile used for this project should derive from the Dockerfile used for the nightly cudf Java jar builds. Doing so would require publishing the cudf Java Docker image so it can be referenced in this repository's Dockerfile.

[BUG] string to float cast kernel incorrectly casts "-2.21363921575273728E17"

Describe the bug
CPU code is able to convert the string "-2.21363921575273728E17" without issue, but the kernel produces a null value.

[BUG] Warning in `cast_string.cu`

This is not quite a bug, instead something needs to be improved as we have compiler warnings:

cast_string.cu(121): warning #186-D: pointless comparison of unsigned integer with zero
[INFO]      [exec]           detected during:
[INFO]      [exec]             instantiation of "void spark_rapids_jni::detail::string_to_integer_kernel(T *, cudf::bitmask_type *, const char *, const cudf::offset_type *, const cudf::bitmask_type *, cudf::size_type, __nv_bool) [with T=uint8_t]"

[FEA] Map.contains should support a vector parameter

Is your feature request related to a problem? Please describe.
There should be a way to determine if a key is not present in a map column.

Describe the solution you'd like
Create a method similar to contains(scalar) that takes in a vector. This is necessary when we want to handle the case when a key isn't present and want to show the user an error.

[FEA] Add a checksum when downloading boost

As a part of our build we download boost source code. Ideally we should have a checksum with it too so we can verify that nothing changed.

[FEA] Support cast string to float

Is your feature request related to a problem? Please describe.
The main issue is NVIDIA/spark-rapids#5639 and it lays out kernels we would like to create to improve performance of casting in spark-rapids. This issue is for the float/double kernel.

Describe the solution you'd like
A kernel should be created to convert strings to floats

[FEA] Publish the spark-rapids-jni artifact

In order for the RAPIDS Accelerator to start depending on the spark-rapids-jni artifact instead of cudf, we need to publish it so it can be downloaded during the RAPIDS Accelerator builds. Nightly builds should be setup to publish the spark-rapids-jni snapshot jar as we have done for cudf.

[FEA] Allow seed for the random number generator in config

Is your feature request related to a problem? Please describe.
I wish the Fault Injector would allow specifying the seed for the random number generator.
And it should log the default time(0) value. If incorrect failure handling scenario is discovered during an automated run, the developer will be able to reproduce the sequence of interceptions if the CUDA app is otherwise deterministic.

Describe the solution you'd like
top level config "seed"

Describe alternatives you've considered
Hope that the failure is not rare enough as to depend on the exact sequence of faults

Additional context
N/A

[faultinj] Improve C++ constructs

use std::thread instead of pthreads
use RAII-style locking
Class with methods instead of a struct like globalControl
Use C++ paradigms for initialization of the globalControl singleton

h/t @mythrocks for suggestions in #399

[DOC] Document how to port custom changes from cudf into this project

One question that came up during review is how does one who's used to working in cudf apply/test their custom changes to cudf against the RAPIDS Accelerator when it's no longer using cudf directly but instead using the spark-rapids-jni artifact. There should be documentation (and ideally scripts if the steps are complicated or hard to remember) on how one can port changes against the cudf repo, either changes to a local cudf repository or a pending cudf PR, into this repo.

[BUG] 22.06 testCudaAsyncMemoryResourceSize failed w/ latest cudf commit

Describe the bug
nightly failed UT w/ 9e593b3

10:50:06  [ERROR] testCudaAsyncMemoryResourceSize  Time elapsed: 0.008 s  <<< ERROR!
10:50:06  ai.rapids.cudf.CudfException: CUDA error at: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-pre_release-4-cuda11/thirdparty/cudf/cpp/build/_deps/rmm-src/include/rmm/detail/dynamic_load_runtime.hpp:139: cudaErrorInvalidValue invalid argument
10:50:06  	at ai.rapids.cudf.Rmm.initializeInternal(Native Method)
10:50:06  	at ai.rapids.cudf.Rmm.initialize(Rmm.java:119)
10:50:06  	at ai.rapids.cudf.RmmTest.testCudaAsyncMemoryResourceSize(RmmTest.java:392)
10:50:06  	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
10:50:06  	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
10:50:06  	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
10:50:06  	at java.lang.reflect.Method.invoke(Method.java:498)
10:50:06  	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
10:50:06  	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
10:50:06  	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
10:50:06  	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
10:50:06  	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
10:50:06  	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
10:50:06  	at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
10:50:06  	at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
10:50:06  	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
10:50:06  	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
10:50:06  	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
10:50:06  	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
10:50:06  	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
10:50:06  	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
10:50:06  	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
10:50:06  	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
10:50:06  	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210)
10:50:06  	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135)
10:50:06  	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:66)
10:50:06  	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
10:50:06  	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
10:50:06  	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
10:50:06  	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
10:50:06  	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
10:50:06  	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
10:50:06  	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
10:50:06  	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
10:50:06  	at java.util.ArrayList.forEach(ArrayList.java:1259)
10:50:06  	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
10:50:06  	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
10:50:06  	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
10:50:06  	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
10:50:06  	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
10:50:06  	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
10:50:06  	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
10:50:06  	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
10:50:06  	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
10:50:06  	at java.util.ArrayList.forEach(ArrayList.java:1259)
10:50:06  	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
10:50:06  	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
10:50:06  	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
10:50:06  	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
10:50:06  	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
10:50:06  	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
10:50:06  	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
10:50:06  	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
10:50:06  	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
10:50:06  	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35)
10:50:06  	at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
10:50:06  	at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:54)
10:50:06  	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:220)
10:50:06  	at org.junit.platform.launcher.core.DefaultLauncher.lambda$execute$6(DefaultLauncher.java:188)
10:50:06  	at org.junit.platform.launcher.core.DefaultLauncher.withInterceptedStreams(DefaultLauncher.java:202)
10:50:06  	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:181)
10:50:06  	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:128)
10:50:06  	at org.junit.platform.surefire.provider.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:155)
10:50:06  	at org.junit.platform.surefire.provider.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:134)
10:50:06  	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:383)
10:50:06  	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:344)
10:50:06  	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
10:50:06  	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:417)
10:50:06

[FEA] faultinj CICD

Is your feature request related to a problem? Please describe.
create a ticket to discuss the CICD requirement for faultinj tooling.

There is still some question for the tooling,
A. The artifact is a .so file, where should we deploy it to?
internal only or external artifactory store? Or we ask developers to build it whenever they want the tool?

B. What is the plan for this tooling? do we have plan to release it?
do we have some roadmap for it? like what we are trying to achieve in next release

C. We have several scenarios in design doc, but there is still no specific test specs (SW&HW) and expectation to make sure we have deterministic regular runs nightly. It would be nice to have some tables to clarify the details to help define the scenarios instead of simply giving a command.
e.g.
spark test w/ some specific configs
some faultinj specific configs
driver 450.xx
ubuntu 18.04
GPU w/ 12Gi mem
should return error count X. Then if using driver 465.yy/centos7/24Gi-mem gpu, it should return error count Y/Z/A
Or explicit saying that like cuda/OS/GPU types do not matter here, or we do not care about error count, or if test error out then all the setup meets our expectations. Then we could have a regular run for it

Thanks

[FEA] pre-merge CI setup

Is your feature request related to a problem? Please describe.

work w/ blossom team to get pre-merge blossom CI work

nvidia / spark-rapids-jni Goto Github PK

spark-rapids-jni's Introduction

RAPIDS Accelerator JNI For Apache Spark

Building From Source

spark-rapids-jni's People

Contributors

Stargazers

Watchers

Forkers

spark-rapids-jni's Issues

Recommend Projects

Recommend Topics

Recommend Org

Jobs