rapids-examples's People
Forkers
jdye64 davidwendt vibhujawa shaneding raydouglass travishester akaanirban ssayyah vinaybagade charlesbluca mayankanand007 mbendris katebsaber ertkonuk python-repository-hub cxz beattherush shashankgaur3 shwina fmcetin abjt11 steviedrew67 jjacobelli
rapids-examples's Issues
cuBERTopic: Can't install rapids-21.12 so I changed to 22.06 instead; I've also changed NVCC PATH; then, I get import errors and then I get 'Duplicate columns' error.
Hi, I'm trying to use cuBERTopic.
I tried to install using the YAML file or the conda command provided by the repository. Neither worked, since they can't find version 21.12. So I decided to install version 22.06 instead, with some adaptations for a VM on Google Cloud Platform using CUDA 11.0:
conda create -n rapids-22.06 -c rapidsai-nightly -c nvidia -c conda-forge \
rapids=22.06 python=3.8 cudatoolkit=11.0
conda activate rapids-22.06
But then I got an NVCC path warning while importing cuBERTopic. So I changed the beginning of the cuBERTopic.py file to point to my current CUDA path:
if "NVCC" not in os.environ:
    os.environ["NVCC"] = "/usr/local/cuda-11.0/bin/nvcc"
    warnings.warn(
        "NVCC Path not found, set to : /usr/local/cuda-11.0/bin/nvcc . \nPlease set NVCC as appropitate to your environment"
    )
Then, when I try to import the libraries in the order used by the example notebook, I get an AttributeError: 'NoneType' object has no attribute 'split' (raised at --> 324 cmd = _nvcc.split()). But I'm able to import if I change the order of the imports:
from cuBERTopic import gpu_BERTopic
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
from transformers import AutoTokenizer, AutoModel
import torch
from cuBERTopic import gpu_BERTopic
import rmm
import os
os.environ["TOKENIZERS_PARALLELISM"] = "true"
rmm.reinitialize(pool_allocator=True,initial_pool_size=5e+9)
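For reference, a minimal sketch of making sure NVCC is set before any GPU library is imported, instead of relying on import order. The helper name ensure_nvcc is hypothetical, and the fallback is simply the CUDA 11.0 path mentioned above. Whether this avoids the _nvcc.split() error depends on which library reads the variable and when, which is not confirmed here.

```python
import os
import shutil


def ensure_nvcc(fallback="/usr/local/cuda-11.0/bin/nvcc", environ=None):
    """Set NVCC if it is missing and return the value in effect.

    Hypothetical helper: an existing NVCC value wins, then an nvcc found
    on PATH, then the fallback path (an assumption; adjust to your system).
    """
    env = os.environ if environ is None else environ
    if not env.get("NVCC"):
        env["NVCC"] = shutil.which("nvcc") or fallback
    return env["NVCC"]


# Call this before importing cuBERTopic or other libraries that may read NVCC.
nvcc_path = ensure_nvcc()
print("Using NVCC:", nvcc_path)
```

With this at the top of the notebook, the remaining imports can stay in their original order.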
Then everything works, and I can see that the GPU is being used while running the example notebook. But at the end, I get the following error during the tf-idf step: ValueError: Duplicate column names are not allowed. The full log is:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 topics, probs = topic_model.fit_transform(docs)
File ~/rapids-examples/cuBERT_topic_modelling/cuBERTopic.py:220, in gpu_BERTopic.fit_transform(self, data)
217 del umap_embeddings
219 # Topic representation
--> 220 tf_idf, count, labels = self.create_topics(documents)
221 top_n_words, name_repr = self.extract_top_n_words_per_topic(
222 tf_idf, count, labels, n=30
223 )
225 self.topic_sizes_df["Name"] = self.topic_sizes_df["Topic"].map(name_repr)
File ~/rapids-examples/cuBERT_topic_modelling/cuBERTopic.py:129, in gpu_BERTopic.create_topics(self, docs_df)
117 """Extract topics from the clusters using a class-based TF-IDF
118 Arguments:
119 docs_df: DataFrame containing documents and other information
(...)
125 topic_labels: A list of unique topic labels
126 """
127 topic_labels = docs_df["Topic"].unique()
--> 129 tf_idf, vectorizer = self.new_c_tf_idf(docs_df, len(docs_df))
130 return tf_idf, vectorizer, topic_labels
File ~/rapids-examples/cuBERT_topic_modelling/cuBERTopic.py:107, in gpu_BERTopic.new_c_tf_idf(self, document_df, m, ngram_range)
90 """Calculate a class-based TF-IDF where m is the number of total documents.
91
92 Arguments:
(...)
104 count: object of class CountVecWrapper
105 """
106 count = CountVecWrapper(ngram_range=ngram_range)
--> 107 X = count.fit_transform(document_df)
108 multiplier = None
110 transformer = ClassTFIDF().fit(X, n_samples=m, multiplier=multiplier)
File ~/rapids-examples/cuBERT_topic_modelling/vectorizer/vectorizer.py:54, in CountVecWrapper.fit_transform(self, docs_df)
50 tokenized_df = self._create_tokenized_df(docs)
51 self.vocabulary_ = tokenized_df["token"].unique()
53 merged_count_df = (
---> 54 cudf.merge(tokenized_df, topic_df, how="left")
55 .sort_values("Topic_ID")
56 .rename({"Topic_ID": "doc_id"}, axis=1)
57 )
59 count_df = self._count_vocab(merged_count_df)
61 # TODO: handle empty docids case later
File /opt/conda/envs/rapids-22.06/lib/python3.8/contextlib.py:75, in ContextDecorator.__call__.<locals>.inner(*args, **kwds)
72 @wraps(func)
73 def inner(*args, **kwds):
74 with self._recreate_cm():
---> 75 return func(*args, **kwds)
File /opt/conda/envs/rapids-22.06/lib/python3.8/site-packages/cudf/core/dataframe.py:2893, in DataFrame.rename(self, mapper, index, columns, axis, copy, inplace, level, errors)
2890 out = DataFrame(index=self.index)
2892 if columns:
-> 2893 out._data = self._data.rename_levels(mapper=columns, level=level)
2894 else:
2895 out._data = self._data.copy(deep=copy)
File /opt/conda/envs/rapids-22.06/lib/python3.8/site-packages/cudf/core/column_accessor.py:552, in ColumnAccessor.rename_levels(self, mapper, level)
549 new_col_names = [mapper(col_name) for col_name in self.keys()]
551 if len(new_col_names) != len(set(new_col_names)):
--> 552 raise ValueError("Duplicate column names are not allowed")
554 ca = ColumnAccessor(
555 dict(zip(new_col_names, self.values())),
556 level_names=self.level_names,
557 multiindex=self.multiindex,
558 )
560 return self.__class__(ca)
ValueError: Duplicate column names are not allowed
I know I've made a bunch of critical changes here, one on top of another. But maybe you can help me make it work properly? :)
Wish you the best! Thank you for implementing BERTopic with RAPIDS!
Update shareable-dataframe example for cudf 0.20 release
With the RAPIDS 0.20 release there were several CMake updates that make linking against libcudf easier. We should take advantage of them in the shareable-dataframe example.
Hello-world example on how to use cuDF/cuML with Triton
We should add a hello-world example on how to use cuDF/cuML with Triton with appropriate Dockerization.
CC: @jdye64 , @randerzander
Add HTTP client example for Triton example
Add HTTP client example for Triton+RAPIDS example
Duplicate column names are not allowed
Running the code below raises a "Duplicate column names are not allowed" exception:
cudf.merge(tokenized_df, topic_df, how="left")
    .sort_values("Topic_ID")
    .rename({"Topic_ID": "doc_id"}, axis=1)
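In the cuDF traceback reported for 22.06, the error is raised when the column names are no longer unique after renaming. Below is a plain-Python sketch of that uniqueness check, which can show which target names collide before rename is called. The helper find_rename_collisions and the example column list are hypothetical; the actual columns of the merged frame are not shown in the report.

```python
from collections import Counter


def find_rename_collisions(columns, mapper):
    """Return the names that would appear more than once after renaming.

    Hypothetical helper mirroring the uniqueness check in the traceback:
    each column is mapped through `mapper` (unmapped names are kept), and
    any resulting name that occurs more than once is reported.
    """
    new_names = [mapper.get(name, name) for name in columns]
    counts = Counter(new_names)
    return [name for name in counts if counts[name] > 1]


# Assumed column layout, for illustration only.
merged_columns = ["token", "doc_id", "Topic_ID"]
collisions = find_rename_collisions(merged_columns, {"Topic_ID": "doc_id"})
print(collisions)
```

If a collision is reported, one option is to drop or rename the existing column first (for example with DataFrame.drop(columns=[...])). Whether that is the right fix for cuBERTopic is not confirmed here.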
Explore pycuda as an option to write custom cuda code with cudf
With a recent PR, PyCUDA gained __cuda_array_interface__ support, allowing interoperability with Numba, CuPy, etc. In theory, PyCUDA gives us access to the whole CUDA driver API. It might be worth exploring how we can use it with cuDF.
Having examples that do operations like https://github.com/rapidsai/rapids-examples/tree/main/shareable-dataframes but with PyCUDA might make access to CUDA kernels easier without too much glue code.
One example to create would be something along the lines of what @jdye64 did, but with PyCUDA.
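For context, __cuda_array_interface__ is a dictionary attribute that describes a device buffer (shape, type string, device pointer, version). Below is a minimal sketch of reading it from any object that exposes it. FakeDeviceArray is a stand-in with a placeholder pointer so the snippet runs without a GPU; it is not a PyCUDA or cuDF class.

```python
def describe_cuda_array(obj):
    """Summarize the __cuda_array_interface__ of an object.

    Works with any object exposing the attribute; only the mandatory
    fields of the interface are read here.
    """
    cai = obj.__cuda_array_interface__
    pointer, read_only = cai["data"]
    return {
        "shape": tuple(cai["shape"]),
        "typestr": cai["typestr"],
        "pointer": pointer,
        "read_only": read_only,
        "version": cai["version"],
    }


class FakeDeviceArray:
    """Stand-in for a GPU array (hypothetical; the pointer is a placeholder)."""

    @property
    def __cuda_array_interface__(self):
        return {
            "shape": (3,),
            "typestr": "<f4",  # little-endian float32
            "data": (0, False),  # (device pointer, read-only flag); 0 is a dummy value
            "version": 3,
        }


info = describe_cuda_array(FakeDeviceArray())
print(info["shape"], info["typestr"])
```

Any library that produces or consumes this attribute can exchange device memory without copying, which is what would make a PyCUDA example interoperable with the rest of the stack.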
Helpful links:
- https://documen.tician.de/pycuda/tutorial.html#executing-a-kernel
- https://documen.tician.de/pycuda/tutorial.html?highlight=numba#interoperability-with-other-libraries-using-the-cuda-array-interface
CC: @beckernick , @randerzander , @jdye64 .
'python-kernel-wrapper' link broken
The RAPIDS Examples section of the README has a broken link: 'python-kernel-wrapper'.
Where can I get the download_model.sh when I import model-repository?
E0303 01:18:13.177689 1 model_repository_manager.cc:1186] failed to load 'sentiment_model_pytorch' version 1: Internal: FileNotFoundError: [Errno 2] No such file or directory: '/opt/tritonserver/models/sentiment_model_pytorch/1/model.pt'
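The log shows the server expecting a model.pt file under the model's version directory. Below is a minimal sketch for checking that the file is present before starting the server. The helper missing_model_files is hypothetical, and the file name and version default are taken from the path in the error message.

```python
from pathlib import Path


def missing_model_files(repo_root, model_names, version="1", filename="model.pt"):
    """Return the expected model files that do not exist under repo_root.

    Hypothetical helper: assumes the <repo>/<model>/<version>/<filename>
    layout seen in the error message.
    """
    root = Path(repo_root)
    expected = [root / name / version / filename for name in model_names]
    return [path for path in expected if not path.is_file()]


missing = missing_model_files("/opt/tritonserver/models", ["sentiment_model_pytorch"])
for path in missing:
    print("Missing:", path)
```

This only reports what is absent; it does not say where download_model.sh or the model file can be obtained.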
CMakeLists.txt for strings_udf should not require building libcudf
The strings_udf/CMakeLists.txt should not require building libcudf from source. This example only needs libcudf installed (headers and libcudf.so) to build and link with. I believe the cudf build generates appropriate Find-CUDF cmake files.
Shared-dataframe unknown file type '.pyx' on 22.08
Hello guys, I tried the Shared-dataframe example on RAPIDS 22.08 but got an error related to Cython. The example works well on 22.02. I have installed cython-0.29.30 with conda. I'd appreciate it if anyone has an idea about this.
(rapids-22.08) root@17c65b4b56bb:~/c_python/rapids-examples/test_new# bash compile.sh
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- The CUDA compiler identification is NVIDIA 11.6.112
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-11.6/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Downloading CPM.cmake
-- CUDA_VERSION_MAJOR: 11
-- CUDA_VERSION_MINOR: 6
-- CUDA_VERSION: 11.6
-- Found CUDAToolkit: /usr/local/cuda-11.6/include (found version "11.6.112")
-- Found Threads: TRUE
-- CPM: using local package [email protected]
-- CPM: using local package [email protected]
-- Found Thrust: /conda/envs/rapids-22.08/include/libcudf/Thrust/thrust/cmake/thrust-config.cmake (found version "1.15.0.0")
-- Found rmm: /conda/envs/rapids-22.08/lib/cmake/rmm/rmm-config.cmake (found version "22.8.0")
-- Found libcudacxx: /conda/envs/rapids-22.08/lib/cmake/libcudacxx/libcudacxx-config.cmake (found version "1.7.0")
-- Found cuco: /conda/envs/rapids-22.08/lib/cmake/cuco/cuco-config.cmake (found version "0.0.1")
-- Found raft: /conda/envs/rapids-22.08/lib/cmake/raft/raft-config.cmake (found version "22.8.0") found components: nn distance
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:
BUILD_BENCHMARKS
BUILD_TESTS
-- Build files have been written to: /root/c_python/rapids-examples/test_new/cpp/build
[ 50%] Building CUDA object CMakeFiles/shareable_dataframe.dir/src/kernel_wrapper.cu.o
[100%] Linking CUDA shared library libshareable_dataframe.so
[100%] Built target shareable_dataframe
-- Install configuration: ""
-- Installing: /usr/local/lib/libshareable_dataframe.so
-- Set runtime path of "/usr/local/lib/libshareable_dataframe.so" to ""
running build
running build_ext
building 'cudfkernel' extension
error: unknown file type '.pyx' (from 'kernel.pyx')
Traceback (most recent call last):
File "python/python_kernel_wrapper.py", line 4, in <module>
import cudfkernel # Cython bindings to execute existing CUDA Kernels
ModuleNotFoundError: No module named 'cudfkernel'
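One common cause of "unknown file type '.pyx'" is that the extension sources are not passed through Cython before build_ext runs. Whether that is what changed between 22.02 and 22.08 here is not confirmed. Below is a minimal sketch, assuming a setup script that builds a list of extension objects. The helpers split_sources and prepare_extensions are hypothetical; Cython.Build.cythonize is Cython's own function.

```python
from pathlib import Path


def split_sources(sources):
    """Separate Cython sources (.pyx) from all other source files."""
    pyx = [s for s in sources if Path(s).suffix == ".pyx"]
    other = [s for s in sources if Path(s).suffix != ".pyx"]
    return pyx, other


def prepare_extensions(extensions):
    """Pass extensions through Cython only if any of them lists a .pyx source.

    Hypothetical helper: each extension is expected to have a `sources`
    attribute, as setuptools.Extension objects do.
    """
    needs_cython = any(split_sources(ext.sources)[0] for ext in extensions)
    if not needs_cython:
        return extensions
    # Imported lazily so the script still works when nothing needs Cython.
    from Cython.Build import cythonize

    return cythonize(extensions)
```

In a setup script this would be used as setup(ext_modules=prepare_extensions(extensions)), so that kernel.pyx is translated to C++ before the compiler sees it.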
Add pycuda cudf integration example to the readme.
We should add a PyCUDA/cuDF integration example to the README here: https://github.com/rapidsai/rapids-examples#readme
Cuber Topic Installation Google Colab
!nvidia-smi
This gets the RAPIDS-Colab install files and checks your GPU. Run this and the next cell only.
Please read the output of this cell. If your Colab Instance is not RAPIDS compatible, it will warn you and give you remediation steps.
!pip install pynvml
!pip install bertopic
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/env-check.py
This will update the Colab environment and restart the kernel. Don't run the next cell until you see the session crash.
!bash rapidsai-csp-utils/colab/update_gcc.sh
import os
os._exit(00)
This will install CondaColab. This will restart your kernel one last time. Run this cell by itself and only run the next cell once you see the session crash.
import condacolab
condacolab.install()
You can now run the rest of the cells as normal.
import condacolab
condacolab.check()
Installing RAPIDS is now 'python rapidsai-csp-utils/colab/install_rapids.py '
The options are 'stable' and 'nightly'. Leaving it blank or adding any other words will default to stable.
!python rapidsai-csp-utils/colab/install_rapids.py stable
import os
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
os.environ['CONDA_PREFIX'] = '/usr/local'
Explore the newly added support of Dlpack with Rapids example
Triton recently added support for DLPack (see PR). We should look into how this integrates with RAPIDS.
numeric::decimal128 not supported in device code
Hello,
I am integrating C++ and Python code. I tried building shareable-dataframes in my container with RAPIDS 22.02 but got an error. I'd appreciate it if anyone has an idea about this.
conda/envs/rapids/include/cudf/utilities/type_dispatcher.hpp(505): error: "numeric::decimal128" contains a 128-bit integer, which is not supported in device code
(rapids) root@test:~/rapids-examples/shareable-dataframes# conda list
WARNING conda.models.version:get_matcher(537): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 12.*, but conda is ignoring the .* and treating it as 12
# packages in environment at /conda/envs/rapids:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
abseil-cpp 20210324.2 h9c3ff4c_0 conda-forge
aiohttp 3.8.1 py38h497a2fe_0 conda-forge
aiosignal 1.2.0 pyhd8ed1ab_0 conda-forge
alsa-lib 1.2.3 h516909a_0 conda-forge
anyio 3.4.0 py38h578d9bd_0 conda-forge
appdirs 1.4.4 pyh9f0ad1d_0 conda-forge
argon2-cffi 21.1.0 py38h497a2fe_2 conda-forge
arrow-cpp 5.0.0 py38h4dc56cc_20_cuda conda-forge
arrow-cpp-proc 3.0.0 cuda conda-forge
asgiref 3.4.1 pyhd8ed1ab_0 conda-forge
async-timeout 4.0.1 pyhd8ed1ab_0 conda-forge
async_generator 1.10 py_0 conda-forge
attrs 21.2.0 pyhd8ed1ab_0 conda-forge
aws-c-auth 0.6.8 hadad3cd_1 conda-forge
aws-c-cal 0.5.12 h70efedd_7 conda-forge
aws-c-common 0.6.17 h7f98852_0 conda-forge
aws-c-compression 0.2.14 h7c7754b_7 conda-forge
aws-c-event-stream 0.2.7 hd2be095_32 conda-forge
aws-c-http 0.6.10 h416565a_3 conda-forge
aws-c-io 0.10.14 he836878_0 conda-forge
aws-c-mqtt 0.7.10 h885097b_0 conda-forge
aws-c-s3 0.1.29 h8d70ed6_0 conda-forge
aws-c-sdkutils 0.1.1 h7c7754b_4 conda-forge
aws-checksums 0.1.12 h7c7754b_6 conda-forge
aws-crt-cpp 0.17.10 h6ab17b9_5 conda-forge
aws-sdk-cpp 1.9.160 h36ff4c5_0 conda-forge
babel 2.9.1 pyh44b312d_0 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
backports.zoneinfo 0.2.1 py38h497a2fe_4 conda-forge
bleach 4.1.0 pyhd8ed1ab_0 conda-forge
blinker 1.4 py_1 conda-forge
blosc 1.21.0 h9c3ff4c_0 conda-forge
bokeh 2.4.0 py38h578d9bd_0 conda-forge
boost 1.74.0 py38h2b96118_4 conda-forge
boost-cpp 1.74.0 h312852a_4 conda-forge
brotli 1.0.9 h7f98852_6 conda-forge
brotli-bin 1.0.9 h7f98852_6 conda-forge
brotlipy 0.7.0 py38h497a2fe_1003 conda-forge
brunsli 0.1 h9c3ff4c_0 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.18.1 h7f98852_0 conda-forge
c-blosc2 2.0.4 h5f21a17_1 conda-forge
ca-certificates 2021.10.8 ha878542_0 conda-forge
cachetools 4.2.4 pyhd8ed1ab_0 conda-forge
cairo 1.16.0 h6cf1ce9_1008 conda-forge
certifi 2021.10.8 py38h578d9bd_1 conda-forge
cffi 1.15.0 py38h3931269_0 conda-forge
cfitsio 3.470 hb418390_7 conda-forge
charls 2.2.0 h9c3ff4c_0 conda-forge
charset-normalizer 2.0.9 pyhd8ed1ab_0 conda-forge
click 8.0.3 py38h578d9bd_1 conda-forge
click-plugins 1.1.1 py_0 conda-forge
cligj 0.7.2 pyhd8ed1ab_1 conda-forge
cloudpickle 2.0.0 pyhd8ed1ab_0 conda-forge
colorama 0.4.4 pyh9f0ad1d_0 conda-forge
colorcet 3.0.0 pyhd8ed1ab_0 conda-forge
cryptography 36.0.1 py38h3e25421_0 conda-forge
cucim 22.02.00a211220 cuda_11_py38_gab8e6a4_31 rapidsai-nightly
cuda-python 11.5.0 py38h3fd9d12_0 nvidia
cudatoolkit 11.2.72 h2bc3f7f_0 nvidia
cudf 22.02.00a211220 cuda_11_py38_ga4dc42d4c6_206 rapidsai-nightly
cudf_kafka 22.02.00a211220 py38_ga4dc42d4c6_206 rapidsai-nightly
cugraph 22.02.00a211220 cuda11_py38_gf80313ec_58 rapidsai-nightly
cuml 22.02.00a211220 cuda11_py38_g03132e811_83 rapidsai-nightly
cupy 9.6.0 py38h177b0fd_0 conda-forge
curl 7.80.0 h2574ce0_0 conda-forge
cusignal 22.02.00a211220 py37_g6a02566_9 rapidsai-nightly
cuspatial 22.02.00a211220 py38_gae17e55_13 rapidsai-nightly
custreamz 22.02.00a211220 py38_ga4dc42d4c6_206 rapidsai-nightly
cuxfilter 22.02.00a211220 py38_g1b76aa8_6 rapidsai-nightly
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
cyrus-sasl 2.1.27 h230043b_5 conda-forge
cytoolz 0.11.2 py38h497a2fe_1 conda-forge
dask 2021.11.2 pyhd8ed1ab_0 conda-forge
dask-core 2021.11.2 pyhd8ed1ab_0 conda-forge
dask-cuda 22.02.00a211220 py38_43 rapidsai-nightly
dask-cudf 22.02.00a211220 cuda_11_py38_ga4dc42d4c6_206 rapidsai-nightly
dask-sql 2021.12.0 py38h578d9bd_0 conda-forge
datashader 0.11.1 pyh9f0ad1d_0 conda-forge
datashape 0.5.4 py_1 conda-forge
debugpy 1.5.1 py38h709712a_0 conda-forge
decorator 5.1.0 pyhd8ed1ab_0 conda-forge
defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge
distributed 2021.11.2 py38h578d9bd_0 conda-forge
dlpack 0.5 h9c3ff4c_0 conda-forge
entrypoints 0.3 pyhd8ed1ab_1003 conda-forge
expat 2.4.1 h9c3ff4c_0 conda-forge
faiss-proc 1.0.0 cuda conda-forge
fastapi 0.70.1 pyhd8ed1ab_0 conda-forge
fastavro 1.4.7 py38h497a2fe_1 conda-forge
fastrlock 0.8 py38h709712a_1 conda-forge
fiona 1.8.20 py38hbb147eb_2 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
fontconfig 2.13.1 hba837de_1005 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.28.5 py38h497a2fe_0 conda-forge
freetype 2.10.4 h0708190_1 conda-forge
freexl 1.0.6 h7f98852_0 conda-forge
frozenlist 1.2.0 py38h497a2fe_1 conda-forge
fsspec 2021.11.1 pyhd8ed1ab_0 conda-forge
gcsfs 2021.11.1 pyhd8ed1ab_0 conda-forge
gdal 3.3.2 py38h81a01a0_3 conda-forge
geopandas 0.9.0 pyhd8ed1ab_1 conda-forge
geopandas-base 0.9.0 pyhd8ed1ab_1 conda-forge
geos 3.9.1 h9c3ff4c_2 conda-forge
geotiff 1.7.0 h08e826d_2 conda-forge
gettext 0.19.8.1 h73d1719_1008 conda-forge
gflags 2.2.2 he1b5a44_1004 conda-forge
giflib 5.2.1 h36c2ea0_2 conda-forge
glog 0.5.0 h48cff8f_0 conda-forge
google-api-core 2.3.2 pyhd8ed1ab_0 conda-forge
google-auth 2.3.3 pyh6c4a22f_0 conda-forge
google-auth-oauthlib 0.4.6 pyhd8ed1ab_0 conda-forge
google-cloud-core 2.2.1 pyh6c4a22f_0 conda-forge
google-cloud-storage 1.43.0 pyh6c4a22f_0 conda-forge
google-crc32c 1.1.2 py38h8838a9a_2 conda-forge
google-resumable-media 2.1.0 pyh6c4a22f_0 conda-forge
googleapis-common-protos 1.53.0 py38h578d9bd_1 conda-forge
graphite2 1.3.13 h58526e2_1001 conda-forge
grpc-cpp 1.42.0 ha1441d3_1 conda-forge
grpcio 1.43.0 py38hdd6454d_0 conda-forge
h11 0.12.0 pyhd8ed1ab_0 conda-forge
harfbuzz 2.9.1 h83ec7ef_1 conda-forge
hdf4 4.2.15 h10796ff_3 conda-forge
hdf5 1.12.1 nompi_h2750804_103 conda-forge
heapdict 1.0.1 py_0 conda-forge
icu 68.2 h9c3ff4c_0 conda-forge
idna 3.1 pyhd3deb0d_0 conda-forge
imagecodecs 2021.8.26 py38hb5ce8f7_1 conda-forge
imageio 2.13.3 pyh239f2a4_0 conda-forge
importlib-metadata 4.10.0 py38h578d9bd_0 conda-forge
importlib_metadata 4.10.0 hd8ed1ab_0 conda-forge
importlib_resources 5.4.0 pyhd8ed1ab_0 conda-forge
ipykernel 6.6.0 py38he5a9106_0 conda-forge
ipython 7.30.1 py38h578d9bd_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
ipywidgets 7.6.5 pyhd8ed1ab_0 conda-forge
jbig 2.1 h7f98852_2003 conda-forge
jedi 0.18.1 py38h578d9bd_0 conda-forge
jinja2 3.0.3 pyhd8ed1ab_0 conda-forge
joblib 1.1.0 pyhd8ed1ab_0 conda-forge
jpeg 9d h36c2ea0_0 conda-forge
jpype1 1.3.0 py38h1fd1430_2 conda-forge
json-c 0.15 h98cffda_0 conda-forge
json5 0.9.5 pyh9f0ad1d_0 conda-forge
jsonschema 4.3.1 pyhd8ed1ab_0 conda-forge
jupyter-server-proxy 3.2.0 pyhd8ed1ab_0 conda-forge
jupyter_client 7.1.0 pyhd8ed1ab_0 conda-forge
jupyter_core 4.9.1 py38h578d9bd_1 conda-forge
jupyter_server 1.13.1 pyhd8ed1ab_0 conda-forge
jupyterlab 3.3.1 pyhd8ed1ab_0 conda-forge
jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
jupyterlab_server 2.10.3 pyhd8ed1ab_0 conda-forge
jupyterlab_widgets 1.0.2 pyhd8ed1ab_0 conda-forge
jxrlib 1.1 h7f98852_2 conda-forge
kealib 1.4.14 h87e4c3c_3 conda-forge
kiwisolver 1.3.2 py38h1fd1430_1 conda-forge
krb5 1.19.2 hcc1bbae_3 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
lerc 3.0 h9c3ff4c_0 conda-forge
libaec 1.0.6 h9c3ff4c_0 conda-forge
libblas 3.9.0 12_linux64_openblas conda-forge
libbrotlicommon 1.0.9 h7f98852_6 conda-forge
libbrotlidec 1.0.9 h7f98852_6 conda-forge
libbrotlienc 1.0.9 h7f98852_6 conda-forge
libcblas 3.9.0 12_linux64_openblas conda-forge
libcrc32c 1.1.2 h9c3ff4c_0 conda-forge
libcucim 22.02.00a211220 cuda11_gab8e6a4_31 rapidsai-nightly
libcudf 22.02.00a211220 cuda11_ga4dc42d4c6_206 rapidsai-nightly
libcudf_kafka 22.02.00a211220 ga4dc42d4c6_206 rapidsai-nightly
libcugraph 22.02.00a211220 cuda11_gf80313ec_58 rapidsai-nightly
libcuml 22.02.00a211220 cuda11_g03132e811_83 rapidsai-nightly
libcumlprims 22.02.00a211213 cuda11_g2dcab39_12 rapidsai-nightly
libcurl 7.80.0 h2574ce0_0 conda-forge
libcusolver 11.3.2.107 hc875929_0 nvidia
libcuspatial 22.02.00a211220 cuda11_gae17e55_13 rapidsai-nightly
libdap4 3.20.6 hd7c4107_2 conda-forge
libdeflate 1.8 h7f98852_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libevent 2.1.10 h9b69904_4 conda-forge
libfaiss 1.7.0 cuda112h5bea7ad_8_cuda conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 11.2.0 h1d223b6_11 conda-forge
libgcrypt 1.9.4 h7f98852_0 conda-forge
libgdal 3.3.2 h6acdded_3 conda-forge
libgfortran-ng 11.2.0 h69a702a_11 conda-forge
libgfortran5 11.2.0 h5c6108e_11 conda-forge
libglib 2.70.2 h174f98d_0 conda-forge
libgomp 11.2.0 h1d223b6_11 conda-forge
libgpg-error 1.42 h9c3ff4c_0 conda-forge
libgsasl 1.10.0 h5b4c23d_0 conda-forge
libhwloc 2.3.0 h5e5b7d1_1 conda-forge
libiconv 1.16 h516909a_0 conda-forge
libkml 1.3.0 h238a007_1014 conda-forge
liblapack 3.9.0 12_linux64_openblas conda-forge
libllvm11 11.1.0 hf817b99_2 conda-forge
libnetcdf 4.8.1 nompi_hb3fd0d9_101 conda-forge
libnghttp2 1.43.0 h812cca2_1 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libntlm 1.4 h7f98852_1002 conda-forge
libopenblas 0.3.18 pthreads_h8fe5266_0 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libpq 13.5 hd57d9b9_1 conda-forge
libprotobuf 3.19.1 h780b84a_0 conda-forge
librdkafka 1.6.1 hc49e61c_1 conda-forge
librmm 22.02.00a211220 cuda11_g846d638_23 rapidsai-nightly
librttopo 1.1.0 h1185371_6 conda-forge
libsodium 1.0.18 h36c2ea0_1 conda-forge
libspatialindex 1.9.3 h9c3ff4c_4 conda-forge
libspatialite 5.0.1 h5cf074c_8 conda-forge
libssh2 1.10.0 ha56f1ee_2 conda-forge
libstdcxx-ng 11.2.0 he4da1e4_11 conda-forge
libthrift 0.15.0 he6d91bd_1 conda-forge
libtiff 4.3.0 h6f004c6_2 conda-forge
libutf8proc 2.7.0 h7f98852_0 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libuv 1.42.0 h7f98852_0 conda-forge
libwebp 1.2.1 h3452ae3_0 conda-forge
libwebp-base 1.2.1 h7f98852_0 conda-forge
libxcb 1.13 h7f98852_1004 conda-forge
libxgboost 1.5.0dev.rapidsai22.02 cuda11.2_0 rapidsai-nightly
libxml2 2.9.12 h72842e0_0 conda-forge
libzip 1.8.0 h4de3113_1 conda-forge
libzlib 1.2.11 h36c2ea0_1013 conda-forge
libzopfli 1.0.3 h9c3ff4c_0 conda-forge
llvmlite 0.37.0 py38h4630a5e_1 conda-forge
locket 0.2.0 py_2 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
mapclassify 2.4.3 pyhd8ed1ab_0 conda-forge
markdown 3.3.6 pyhd8ed1ab_0 conda-forge
markupsafe 2.0.1 py38h497a2fe_1 conda-forge
matplotlib-base 3.5.1 py38hf4fb855_0 conda-forge
matplotlib-inline 0.1.3 pyhd8ed1ab_0 conda-forge
mistune 0.8.4 py38h497a2fe_1005 conda-forge
msgpack-python 1.0.3 py38h1fd1430_0 conda-forge
multidict 5.2.0 py38h497a2fe_1 conda-forge
multipledispatch 0.6.0 py_0 conda-forge
munch 2.5.0 py_0 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
nbclassic 0.3.6 pyhd8ed1ab_0 conda-forge
nbclient 0.5.9 pyhd8ed1ab_0 conda-forge
nbconvert 6.3.0 py38h578d9bd_1 conda-forge
nbformat 5.1.3 pyhd8ed1ab_0 conda-forge
nccl 2.11.4.1 hdc17891_0 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
nest-asyncio 1.5.4 pyhd8ed1ab_0 conda-forge
networkx 2.6.3 pyhd8ed1ab_1 conda-forge
nodejs 14.17.4 h92b4a50_0 conda-forge
notebook 6.4.6 pyha770c72_0 conda-forge
notebook-shim 0.1.0 pyhd8ed1ab_0 conda-forge
nspr 4.32 h9c3ff4c_1 conda-forge
nss 3.73 hb5efdd6_0 conda-forge
numba 0.54.1 py38h4bf6c61_0 conda-forge
numpy 1.20.3 py38h9894fe3_1 conda-forge
nvtx 0.2.3 py38h497a2fe_1 conda-forge
oauthlib 3.1.1 pyhd8ed1ab_0 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openjdk 11.0.9.1 h5cc2fde_1 conda-forge
openjpeg 2.4.0 hb52868f_1 conda-forge
openssl 1.1.1l h7f98852_0 conda-forge
orc 1.7.1 h1be678f_1 conda-forge
packaging 21.3 pyhd8ed1ab_0 conda-forge
pandas 1.3.5 py38h43a58ef_0 conda-forge
pandoc 2.16.2 h7f98852_0 conda-forge
pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge
panel 0.12.4 pyhd8ed1ab_0 conda-forge
param 1.12.0 pyh6c4a22f_0 conda-forge
parquet-cpp 1.5.1 2 conda-forge
parso 0.8.3 pyhd8ed1ab_0 conda-forge
partd 1.2.0 pyhd8ed1ab_0 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 8.4.0 py38h8e6f84c_0 conda-forge
pip 21.3.1 pyhd8ed1ab_0 conda-forge
pixman 0.40.0 h36c2ea0_0 conda-forge
pooch 1.5.2 pyhd8ed1ab_0 conda-forge
poppler 21.09.0 ha39eefc_3 conda-forge
poppler-data 0.4.11 hd8ed1ab_0 conda-forge
postgresql 13.5 h2510834_1 conda-forge
proj 8.1.0 h277dcde_1 conda-forge
prometheus_client 0.12.0 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.24 pyha770c72_0 conda-forge
protobuf 3.19.1 py38h709712a_1 conda-forge
psutil 5.8.0 py38h497a2fe_2 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
ptxcompiler 0.2.0 py38hb739d79_0 rapidsai-nightly
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
py-xgboost 1.5.0dev.rapidsai22.02 cuda11.2py38_0 rapidsai-nightly
pyarrow 5.0.0 py38ha746e9d_20_cuda conda-forge
pyasn1 0.4.8 py_0 conda-forge
pyasn1-modules 0.2.7 py_0 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pyct 0.4.6 py_0 conda-forge
pyct-core 0.4.6 py_0 conda-forge
pydantic 1.8.2 py38h497a2fe_2 conda-forge
pydeck 0.5.0 pyh9f0ad1d_0 conda-forge
pyee 8.1.0 pyh9f0ad1d_0 conda-forge
pygments 2.10.0 pyhd8ed1ab_0 conda-forge
pyjwt 2.3.0 pyhd8ed1ab_1 conda-forge
pynvml 11.4.1 pyhd8ed1ab_0 conda-forge
pyopenssl 21.0.0 pyhd8ed1ab_0 conda-forge
pyparsing 3.0.6 pyhd8ed1ab_0 conda-forge
pyppeteer 0.2.6 pyhd8ed1ab_0 conda-forge
pyproj 3.1.0 py38h3701b11_4 conda-forge
pyrsistent 0.18.0 py38h497a2fe_0 conda-forge
pysocks 1.7.1 py38h578d9bd_4 conda-forge
python 3.8.12 hb7a2778_2_cpython conda-forge
python-confluent-kafka 1.6.0 py38h497a2fe_1 conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-tzdata 2021.5 pyhd8ed1ab_0 conda-forge
python_abi 3.8 2_cp38 conda-forge
pytz 2021.3 pyhd8ed1ab_0 conda-forge
pytz-deprecation-shim 0.1.0.post0 py38h578d9bd_1 conda-forge
pyu2f 0.1.5 pyhd8ed1ab_0 conda-forge
pyviz_comms 2.1.0 pyhd8ed1ab_0 conda-forge
pywavelets 1.2.0 py38h6c62de6_1 conda-forge
pyyaml 6.0 py38h497a2fe_3 conda-forge
pyzmq 22.3.0 py38h2035c66_1 conda-forge
rapids 22.02.00a211220 cuda11.2_py38_gae421a4_92 rapidsai-nightly
rapids-xgboost 22.02.00a211220 cuda11.2_py38_gae421a4_92 rapidsai-nightly
re2 2021.11.01 h9c3ff4c_0 conda-forge
readline 8.1 h46c0cb4_0 conda-forge
requests 2.26.0 pyhd8ed1ab_1 conda-forge
requests-oauthlib 1.3.0 pyh9f0ad1d_0 conda-forge
rmm 22.02.00a211219 cuda11_py38_g846d638_23_has_cma rapidsai-nightly
rsa 4.8 pyhd8ed1ab_0 conda-forge
rtree 0.9.7 py38h02d302b_3 conda-forge
s2n 1.3.0 h9b69904_0 conda-forge
scikit-image 0.18.1 py38h51da96c_0 conda-forge
scikit-learn 1.0.1 py38h1561384_3 conda-forge
scipy 1.7.3 py38h56a6a73_0 conda-forge
send2trash 1.8.0 pyhd8ed1ab_0 conda-forge
setuptools 59.6.0 py38h578d9bd_0 conda-forge
shapely 1.8.0 py38hb7fe4a8_0 conda-forge
simpervisor 0.4 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
snappy 1.1.8 he1b5a44_3 conda-forge
sniffio 1.2.0 py38h578d9bd_2 conda-forge
sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge
spdlog 1.8.5 h4bd325d_0 conda-forge
sqlite 3.37.0 h9cd32fc_0 conda-forge
starlette 0.16.0 pyhd8ed1ab_0 conda-forge
streamz 0.6.3 pyh6c4a22f_0 conda-forge
tabulate 0.8.9 pyhd8ed1ab_0 conda-forge
tblib 1.7.0 pyhd8ed1ab_0 conda-forge
terminado 0.12.1 py38h578d9bd_1 conda-forge
testpath 0.5.0 pyhd8ed1ab_0 conda-forge
threadpoolctl 3.0.0 pyh8a188c0_0 conda-forge
tifffile 2021.11.2 pyhd8ed1ab_0 conda-forge
tiledb 2.3.4 he87e0bf_0 conda-forge
tk 8.6.11 h27826a3_1 conda-forge
toolz 0.11.2 pyhd8ed1ab_0 conda-forge
tornado 6.1 py38h497a2fe_2 conda-forge
tqdm 4.62.3 pyhd8ed1ab_0 conda-forge
traitlets 5.1.1 pyhd8ed1ab_0 conda-forge
treelite 2.1.0 py38hdd725b4_0 conda-forge
treelite-runtime 2.1.0 pypi_0 pypi
typing-extensions 4.0.1 hd8ed1ab_0 conda-forge
typing_extensions 4.0.1 pyha770c72_0 conda-forge
tzcode 2021e h7f98852_0 conda-forge
tzdata 2021e he74cb21_0 conda-forge
tzlocal 4.1 py38h578d9bd_1 conda-forge
ucx 1.11.2+gef2bbcf cuda11.2_0 rapidsai-nightly
ucx-proc 1.0.0 gpu rapidsai-nightly
ucx-py 0.24.0a211220 py38_gef2bbcf_18 rapidsai-nightly
unicodedata2 13.0.0.post2 py38h497a2fe_4 conda-forge
urllib3 1.26.7 pyhd8ed1ab_0 conda-forge
uvicorn 0.16.0 py38h578d9bd_0 conda-forge
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
websocket-client 1.2.3 pyhd8ed1ab_0 conda-forge
websockets 9.1 py38h497a2fe_0 conda-forge
wheel 0.37.0 pyhd8ed1ab_1 conda-forge
widgetsnbextension 3.5.2 py38h578d9bd_1 conda-forge
xarray 0.20.2 pyhd8ed1ab_0 conda-forge
xerces-c 3.2.3 h9d8b166_3 conda-forge
xgboost 1.5.0dev.rapidsai22.02 cuda11.2py38_0 rapidsai-nightly
xorg-fixesproto 5.0 h7f98852_1002 conda-forge
xorg-inputproto 2.3.2 h7f98852_1002 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.0.10 h7f98852_0 conda-forge
xorg-libsm 1.2.3 hd9c2040_1000 conda-forge
xorg-libx11 1.7.2 h7f98852_0 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h7f98852_1 conda-forge
xorg-libxfixes 5.0.3 h7f98852_1004 conda-forge
xorg-libxi 1.7.10 h7f98852_0 conda-forge
xorg-libxrender 0.9.10 h7f98852_1003 conda-forge
xorg-libxtst 1.2.3 h7f98852_1002 conda-forge
xorg-recordproto 1.14.2 h7f98852_1002 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h7f98852_1002 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xz 5.2.5 h516909a_1 conda-forge
yaml 0.2.5 h516909a_0 conda-forge
yarl 1.7.2 py38h497a2fe_1 conda-forge
zeromq 4.3.4 h9c3ff4c_1 conda-forge
zfp 0.5.5 h9c3ff4c_8 conda-forge
zict 2.0.0 py_0 conda-forge
zipp 3.6.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.11 h36c2ea0_1013 conda-forge
zstd 1.5.0 ha95c52a_0 conda-forge
cuBERTtopic error: cuDF failure at: [...] Could not open vocab/voc_hash.txt
I've installed Rapids using
mamba create -n rapids-22.04 -c rapidsai -c nvidia -c conda-forge rapids=22.04 python=3.9 cudatoolkit=11.3 dask-sql --no-channel-priority
and then mamba install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
I've tried to follow the example at:
https://github.com/rapidsai/rapids-examples/blob/main/cuBERT_topic_modelling/berttopic_example.ipynb
After cloning and running pip install -e . for cuBERTopic, I ran:
from cuBERTopic import gpu_BERTopic
gpu_topic = gpu_BERTopic()
topics_gpu, probs_gpu = gpu_topic.fit_transform(docs)
The last line fails with:
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_1502/660932292.py in
----> 1 topics_gpu, probs_gpu = gpu_topic.fit_transform(docs)
~/rapids-examples/cuBERT_topic_modelling/cuBERTopic.py in fit_transform(self, data)
204
205 # Extract embeddings
--> 206 embeddings = create_embeddings(
207 documents.Document, self.embedding_model, self.vocab_file
208 )
~/rapids-examples/cuBERT_topic_modelling/embedding_extraction.py in create_embeddings(sentences, embedding_model, vocab_file)
71 """
72
---> 73 cudf_tokenizer = SubwordTokenizer(vocab_file, do_lower_case=True)
74 batch_size = 256
75 pooling_output_ls = []
/opt/conda/envs/rapids-22.04/lib/python3.9/site-packages/cudf/core/subword_tokenizer.py in __init__(self, hash_file, do_lower_case)
53
54 self.do_lower_case = do_lower_case
---> 55 self.vocab_file = cpp_hashed_vocabulary(hash_file)
56
57 def __call__(
cudf/_lib/nvtext/subword_tokenize.pyx in cudf._lib.nvtext.subword_tokenize.Hashed_Vocabulary.__cinit__()
RuntimeError: cuDF failure at: /workspace/.conda-bld/work/cpp/src/text/subword/load_hash_file.cu:183: Could not open vocab/voc_hash.txt
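The path in the error, vocab/voc_hash.txt, is relative, so it is presumably resolved against the current working directory. Below is a minimal sketch of resolving it against a known base directory and failing early with a clearer message. The helper resolve_vocab_file is hypothetical, and the base directory is an assumption (for example, the cloned cuBERT_topic_modelling folder).

```python
from pathlib import Path


def resolve_vocab_file(relative_path="vocab/voc_hash.txt", base_dir="."):
    """Return an absolute path to the vocab hash file, or raise if missing.

    Hypothetical helper: base_dir is wherever the vocab folder actually
    lives on your machine.
    """
    base = Path(base_dir).resolve()
    candidate = base / relative_path
    if not candidate.is_file():
        raise FileNotFoundError(
            f"{relative_path} not found under {base}; "
            "check the working directory or pass the right base_dir"
        )
    return str(candidate)
```

Starting the notebook from the directory that contains the vocab folder would be the other way to make the relative path resolve.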
string_udf example cleanup
Let's make the string_udf example simpler with Python bindings that work with cudf Python. Something like:
import cudf
df = cudf.DataFrame({'id': [0, 1, 2], 'val': ['abc', 'def', 'ghi']})
df['result'] = cpp_string_udf(df['val'])
instead of a CLI binary
Moving from `pynvml` to `nvidia-ml-py`
Currently this is making use of pynvml in a few places:
rapids-examples/dask-metrics/README.md
Line 189 in bedd00f
However, we would like to move to nvidia-ml-py in the future. Raising this issue to track this work.