rapidsai / rapids-examples

Dockerfile 0.65% CMake 1.45% C++ 3.51% Cuda 4.80% Python 9.77% Shell 0.10% Jupyter Notebook 79.48% Cython 0.25% Batchfile 0.01%

rapids-examples's People

Contributors

ajschmidt8, akaanirban, davidwendt, jdye64, jjacobelli, mayankanand007, randerzander, raydouglass, shaneding, shwina, travishester, vibhujawa

rapids-examples's Issues

cuBERTopic: Can't install rapids-21.12 so I changed to 22.06 instead; I've also changed NVCC PATH; then, I get import errors and then I get 'Duplicate columns' error.

Hi, I'm trying to use cuBERTopic.

I tried to install using the YAML file and the conda command provided by the repository. Neither worked, since they can't find version 21.12. So I decided to install version 22.06 instead, with some adaptations for a VM on Google Cloud Platform using CUDA-11.0:

conda create -n rapids-22.06 -c rapidsai-nightly -c nvidia -c conda-forge \
    rapids=22.06 python=3.8 cudatoolkit=11.0
conda activate rapids-22.06

But then I got an NVCC PATH warning while importing cuBERTopic. So I changed the beginning of the cuBERTopic.py file to point to my current CUDA path:

if "NVCC" not in os.environ:
    os.environ["NVCC"] = "/usr/local/cuda-11.0/bin/nvcc"
    warnings.warn(
        "NVCC Path not found, set to  : /usr/local/cuda-11.0/bin/nvcc . \nPlease set NVCC as appropitate to your environment"
    )

Then, when I try to import the libraries in the order used in the example notebook, I get an error: AttributeError: 'NoneType' object has no attribute 'split' (at --> 324 cmd = _nvcc.split()). But I'm able to import everything if I change the order of the imports:

from cuBERTopic import gpu_BERTopic
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
from transformers import AutoTokenizer, AutoModel
import torch
import rmm
import os
os.environ["TOKENIZERS_PARALLELISM"] = "true"
rmm.reinitialize(pool_allocator=True,initial_pool_size=5e+9)
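One way to depend less on import order might be to set NVCC before any GPU library is imported. The following is only a minimal sketch: `resolve_nvcc` is a hypothetical helper (not part of cuBERTopic), the default path is the one from this report, and whether this avoids the AttributeError above has not been verified.

```python
import os
import shutil

# Path taken from this report; adjust it for your machine.
DEFAULT_NVCC = "/usr/local/cuda-11.0/bin/nvcc"


def resolve_nvcc(environ=None, default=DEFAULT_NVCC):
    """Pick an nvcc path: existing NVCC variable, then PATH lookup, then default."""
    environ = os.environ if environ is None else environ
    return environ.get("NVCC") or shutil.which("nvcc") or default


# Set the variable before importing cuBERTopic or anything else that may read it.
os.environ["NVCC"] = resolve_nvcc()
```

With this at the very top of the notebook, the remaining imports can follow in any order, assuming the missing variable is what triggers the error.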

Then everything works, and I can see that it's using the GPU while running the example notebook. But at the end, I get the following error in the tf-idf step: ValueError: Duplicate column names are not allowed. The full log is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 topics, probs = topic_model.fit_transform(docs)

File ~/rapids-examples/cuBERT_topic_modelling/cuBERTopic.py:220, in gpu_BERTopic.fit_transform(self, data)
    217 del umap_embeddings
    219 # Topic representation
--> 220 tf_idf, count, labels = self.create_topics(documents)
    221 top_n_words, name_repr = self.extract_top_n_words_per_topic(
    222     tf_idf, count, labels, n=30
    223 )
    225 self.topic_sizes_df["Name"] = self.topic_sizes_df["Topic"].map(name_repr)

File ~/rapids-examples/cuBERT_topic_modelling/cuBERTopic.py:129, in gpu_BERTopic.create_topics(self, docs_df)
    117 """Extract topics from the clusters using a class-based TF-IDF
    118 Arguments:
    119     docs_df: DataFrame containing documents and other information
   (...)
    125     topic_labels: A list of unique topic labels
    126 """
    127 topic_labels = docs_df["Topic"].unique()
--> 129 tf_idf, vectorizer = self.new_c_tf_idf(docs_df, len(docs_df))
    130 return tf_idf, vectorizer, topic_labels

File ~/rapids-examples/cuBERT_topic_modelling/cuBERTopic.py:107, in gpu_BERTopic.new_c_tf_idf(self, document_df, m, ngram_range)
     90 """Calculate a class-based TF-IDF where m is the number of total documents.
     91 
     92 Arguments:
   (...)
    104     count: object of class CountVecWrapper
    105 """
    106 count = CountVecWrapper(ngram_range=ngram_range)
--> 107 X = count.fit_transform(document_df)
    108 multiplier = None
    110 transformer = ClassTFIDF().fit(X, n_samples=m, multiplier=multiplier)

File ~/rapids-examples/cuBERT_topic_modelling/vectorizer/vectorizer.py:54, in CountVecWrapper.fit_transform(self, docs_df)
     50 tokenized_df = self._create_tokenized_df(docs)
     51 self.vocabulary_ = tokenized_df["token"].unique()
     53 merged_count_df = (
---> 54     cudf.merge(tokenized_df, topic_df, how="left")
     55     .sort_values("Topic_ID")
     56     .rename({"Topic_ID": "doc_id"}, axis=1)
     57 )
     59 count_df = self._count_vocab(merged_count_df)
     61 # TODO: handle empty docids case later

File /opt/conda/envs/rapids-22.06/lib/python3.8/contextlib.py:75, in ContextDecorator.__call__.<locals>.inner(*args, **kwds)
     72 @wraps(func)
     73 def inner(*args, **kwds):
     74     with self._recreate_cm():
---> 75         return func(*args, **kwds)

File /opt/conda/envs/rapids-22.06/lib/python3.8/site-packages/cudf/core/dataframe.py:2893, in DataFrame.rename(self, mapper, index, columns, axis, copy, inplace, level, errors)
   2890     out = DataFrame(index=self.index)
   2892 if columns:
-> 2893     out._data = self._data.rename_levels(mapper=columns, level=level)
   2894 else:
   2895     out._data = self._data.copy(deep=copy)

File /opt/conda/envs/rapids-22.06/lib/python3.8/site-packages/cudf/core/column_accessor.py:552, in ColumnAccessor.rename_levels(self, mapper, level)
    549         new_col_names = [mapper(col_name) for col_name in self.keys()]
    551     if len(new_col_names) != len(set(new_col_names)):
--> 552         raise ValueError("Duplicate column names are not allowed")
    554     ca = ColumnAccessor(
    555         dict(zip(new_col_names, self.values())),
    556         level_names=self.level_names,
    557         multiindex=self.multiindex,
    558     )
    560 return self.__class__(ca)

ValueError: Duplicate column names are not allowed

I know I've made a bunch of critical changes here, one on top of another. But maybe you can help me make it work properly? :)

Wish you the best! Thank you for implementing BERTopic with RAPIDS!

Duplicate column names are not allowed

The following raises a "Duplicate column names are not allowed" exception:
cudf.merge(tokenized_df, topic_df, how="left")
    .sort_values("Topic_ID")
    .rename({"Topic_ID": "doc_id"}, axis=1)
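The traceback in the earlier report shows the check that fires: after renaming, the list of column names must contain no repeats. A plausible cause, which is an assumption and not confirmed against the repository, is that the merged frame already has a doc_id column, so renaming Topic_ID to doc_id produces two columns with the same name. The plain-Python sketch below mimics that check on lists of names; both function names are hypothetical and no cudf call is involved.

```python
def rename_columns(columns, mapper):
    """Apply a rename mapping and enforce the uniqueness check from the traceback."""
    new_names = [mapper.get(name, name) for name in columns]
    if len(new_names) != len(set(new_names)):
        raise ValueError("Duplicate column names are not allowed")
    return new_names


def rename_dropping_collisions(columns, mapper):
    """Drop any existing column whose name is a rename target, then rename."""
    targets = set(mapper.values())
    kept = [name for name in columns if name not in targets]
    return rename_columns(kept, mapper)


# Assumed column layout of the merged frame (not taken from the repository).
merged_columns = ["token", "doc_id", "Topic_ID"]
print(rename_dropping_collisions(merged_columns, {"Topic_ID": "doc_id"}))
```

In cudf terms this would correspond to dropping the existing doc_id column before the rename. Whether discarding that column is what the vectorizer intends would need to be checked against the code.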

Explore pycuda as an option to write custom cuda code with cudf

With a recent PR, pycuda got __cuda_array_interface__ support, allowing interoperability with numba, cupy, etc. In theory, pycuda gives us access to the whole CUDA driver API. It might be worth exploring how we can use it with cudf.
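For context, __cuda_array_interface__ is a dictionary that an array object exposes, with the required keys shape, typestr, data (a pointer and a read-only flag) and version. The sketch below only inspects that dictionary, so it runs without a GPU; `describe_cuda_array` and `FakeDeviceArray` are hypothetical names for illustration, and the pointer value is a placeholder rather than real device memory.

```python
REQUIRED_KEYS = ("shape", "typestr", "data", "version")


def describe_cuda_array(obj):
    """Summarize the object's __cuda_array_interface__ or raise if it is unusable."""
    iface = getattr(obj, "__cuda_array_interface__", None)
    if iface is None:
        raise TypeError("object does not expose __cuda_array_interface__")
    missing = [key for key in REQUIRED_KEYS if key not in iface]
    if missing:
        raise ValueError(f"interface is missing required keys: {missing}")
    pointer, read_only = iface["data"]
    return {
        "shape": tuple(iface["shape"]),
        "typestr": iface["typestr"],
        "pointer": pointer,
        "read_only": read_only,
    }


class FakeDeviceArray:
    """Stand-in for a GPU array; the pointer is a placeholder, not real memory."""

    __cuda_array_interface__ = {
        "shape": (4,),
        "typestr": "<f4",
        "data": (0, False),
        "version": 3,
    }


print(describe_cuda_array(FakeDeviceArray()))
```

Any library that consumes this dictionary can wrap the same device memory without copying it, which is what would let a pycuda array and a cudf column share data.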

Having examples that do operations like https://github.com/rapidsai/rapids-examples/tree/main/shareable-dataframes but with Pycuda might make access to CUDA kernels easier without too much glue code.

An example to create would be something along the lines of what @jdye64 did, but with pycuda.

Helpful links:

  1. https://documen.tician.de/pycuda/tutorial.html#executing-a-kernel
  2. https://documen.tician.de/pycuda/tutorial.html?highlight=numba#interoperability-with-other-libraries-using-the-cuda-array-interface

CC: @beckernick , @randerzander , @jdye64 .

Shared-dataframe unknown file type '.pyx' on 22.08

Hello guys, I tried the Shared-dataframe example on RAPIDS 22.08 but got an error about Cython. The example works well on 22.02. I have installed cython-0.29.30 with conda. I'd appreciate it if anyone has an idea about it.

(rapids-22.08) root@17c65b4b56bb:~/c_python/rapids-examples/test_new# bash compile.sh
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- The CUDA compiler identification is NVIDIA 11.6.112
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-11.6/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Downloading CPM.cmake
-- CUDA_VERSION_MAJOR: 11
-- CUDA_VERSION_MINOR: 6
-- CUDA_VERSION: 11.6
-- Found CUDAToolkit: /usr/local/cuda-11.6/include (found version "11.6.112") 
-- Found Threads: TRUE  
-- CPM: using local package [email protected]
-- CPM: using local package [email protected]
-- Found Thrust: /conda/envs/rapids-22.08/include/libcudf/Thrust/thrust/cmake/thrust-config.cmake (found version "1.15.0.0") 
-- Found rmm: /conda/envs/rapids-22.08/lib/cmake/rmm/rmm-config.cmake (found version "22.8.0") 
-- Found libcudacxx: /conda/envs/rapids-22.08/lib/cmake/libcudacxx/libcudacxx-config.cmake (found version "1.7.0") 
-- Found cuco: /conda/envs/rapids-22.08/lib/cmake/cuco/cuco-config.cmake (found version "0.0.1") 
-- Found raft: /conda/envs/rapids-22.08/lib/cmake/raft/raft-config.cmake (found version "22.8.0") found components: nn distance 
-- Configuring done
-- Generating done
CMake Warning:
  Manually-specified variables were not used by the project:

    BUILD_BENCHMARKS
    BUILD_TESTS


-- Build files have been written to: /root/c_python/rapids-examples/test_new/cpp/build
[ 50%] Building CUDA object CMakeFiles/shareable_dataframe.dir/src/kernel_wrapper.cu.o
[100%] Linking CUDA shared library libshareable_dataframe.so
[100%] Built target shareable_dataframe
-- Install configuration: ""
-- Installing: /usr/local/lib/libshareable_dataframe.so
-- Set runtime path of "/usr/local/lib/libshareable_dataframe.so" to ""
running build
running build_ext
building 'cudfkernel' extension
error: unknown file type '.pyx' (from 'kernel.pyx')
Traceback (most recent call last):
  File "python/python_kernel_wrapper.py", line 4, in <module>
    import cudfkernel  # Cython bindings to execute existing CUDA Kernels
ModuleNotFoundError: No module named 'cudfkernel'
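The "unknown file type '.pyx'" message typically comes from the build step receiving a .pyx source that was never handed to Cython. A common remedy is to pass the extension through cythonize in setup.py. The sketch below assumes that is the cause here, which is not verified; `split_sources` and `make_ext_modules` are hypothetical helpers, and the include directories and libraries the real example needs are omitted.

```python
from pathlib import Path


def split_sources(sources):
    """Split a source list into Cython (.pyx) files and everything else."""
    pyx = [s for s in sources if Path(s).suffix == ".pyx"]
    other = [s for s in sources if Path(s).suffix != ".pyx"]
    return pyx, other


def make_ext_modules(name, sources):
    """Build the ext_modules list for setup(), cythonizing only when needed."""
    # Imported lazily so split_sources stays usable without setuptools.
    from setuptools import Extension

    extension = Extension(name, sources=list(sources))
    pyx, _ = split_sources(sources)
    if not pyx:
        return [extension]
    # Raises ImportError if Cython is missing from the build environment.
    from Cython.Build import cythonize

    return cythonize([extension], compiler_directives={"language_level": "3"})
```

In a setup.py this would be used as setup(ext_modules=make_ext_modules("cudfkernel", ["kernel.pyx"])), with both names taken from the log above. An ImportError at that point would indicate that Cython is not visible to the Python running the build.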



cuBERTopic installation on Google Colab

!nvidia-smi

This gets the RAPIDS-Colab install files and checks your GPU. Run this and the next cell only.

Please read the output of this cell. If your Colab Instance is not RAPIDS compatible, it will warn you and give you remediation steps.

!pip install pynvml
!pip install bertopic
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/env-check.py

This will update the Colab environment and restart the kernel. Don't run the next cell until you see the session crash.

!bash rapidsai-csp-utils/colab/update_gcc.sh
import os
os._exit(00)

This will install CondaColab. This will restart your kernel one last time. Run this cell by itself and only run the next cell once you see the session crash.

import condacolab
condacolab.install()

You can now run the rest of the cells as normal.

import condacolab
condacolab.check()

RAPIDS is now installed by running 'python rapidsai-csp-utils/colab/install_rapids.py' followed by an option.

The options are 'stable' and 'nightly'. Leaving it blank or adding any other words will default to stable.

!python rapidsai-csp-utils/colab/install_rapids.py stable
import os
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
os.environ['CONDA_PREFIX'] = '/usr/local'

numeric::decimal128 not supported in device code

Hello,

I am integrating C++ and Python code. I tried building shareable-dataframes in my container with RAPIDS 22.02 but got an error. I'd appreciate it if anyone has an idea about this.

conda/envs/rapids/include/cudf/utilities/type_dispatcher.hpp(505): error: "numeric::decimal128" contains a 128-bit integer, which is not supported in device code
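This message usually points at the compiler: 128-bit integers in device code need a recent enough nvcc. The sketch below assumes that support first arrived in CUDA 11.5 (worth confirming in the CUDA release notes) and checks the "release X.Y" field that nvcc --version prints. The function names are hypothetical. Note that the conda list in this report shows cudatoolkit 11.2.72, although the nvcc actually used for the build may be a different one.

```python
import re

# Assumption: first CUDA release that accepts __int128 in device code.
MIN_INT128_DEVICE = (11, 5)


def parse_nvcc_release(version_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def supports_int128_device_code(version_output, minimum=MIN_INT128_DEVICE):
    """Return True when the reported nvcc release is at least `minimum`."""
    return parse_nvcc_release(version_output) >= minimum
```

The input can be obtained with subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout. If the check returns False, building against a newer CUDA toolkit would be the first thing to try.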
(rapids) root@test:~/rapids-examples/shareable-dataframes# conda list 
WARNING conda.models.version:get_matcher(537): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 12.*, but conda is ignoring the .* and treating it as 12
# packages in environment at /conda/envs/rapids:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
abseil-cpp                20210324.2           h9c3ff4c_0    conda-forge
aiohttp                   3.8.1            py38h497a2fe_0    conda-forge
aiosignal                 1.2.0              pyhd8ed1ab_0    conda-forge
alsa-lib                  1.2.3                h516909a_0    conda-forge
anyio                     3.4.0            py38h578d9bd_0    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
argon2-cffi               21.1.0           py38h497a2fe_2    conda-forge
arrow-cpp                 5.0.0           py38h4dc56cc_20_cuda    conda-forge
arrow-cpp-proc            3.0.0                      cuda    conda-forge
asgiref                   3.4.1              pyhd8ed1ab_0    conda-forge
async-timeout             4.0.1              pyhd8ed1ab_0    conda-forge
async_generator           1.10                       py_0    conda-forge
attrs                     21.2.0             pyhd8ed1ab_0    conda-forge
aws-c-auth                0.6.8                hadad3cd_1    conda-forge
aws-c-cal                 0.5.12               h70efedd_7    conda-forge
aws-c-common              0.6.17               h7f98852_0    conda-forge
aws-c-compression         0.2.14               h7c7754b_7    conda-forge
aws-c-event-stream        0.2.7               hd2be095_32    conda-forge
aws-c-http                0.6.10               h416565a_3    conda-forge
aws-c-io                  0.10.14              he836878_0    conda-forge
aws-c-mqtt                0.7.10               h885097b_0    conda-forge
aws-c-s3                  0.1.29               h8d70ed6_0    conda-forge
aws-c-sdkutils            0.1.1                h7c7754b_4    conda-forge
aws-checksums             0.1.12               h7c7754b_6    conda-forge
aws-crt-cpp               0.17.10              h6ab17b9_5    conda-forge
aws-sdk-cpp               1.9.160              h36ff4c5_0    conda-forge
babel                     2.9.1              pyh44b312d_0    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
backports.zoneinfo        0.2.1            py38h497a2fe_4    conda-forge
bleach                    4.1.0              pyhd8ed1ab_0    conda-forge
blinker                   1.4                        py_1    conda-forge
blosc                     1.21.0               h9c3ff4c_0    conda-forge
bokeh                     2.4.0            py38h578d9bd_0    conda-forge
boost                     1.74.0           py38h2b96118_4    conda-forge
boost-cpp                 1.74.0               h312852a_4    conda-forge
brotli                    1.0.9                h7f98852_6    conda-forge
brotli-bin                1.0.9                h7f98852_6    conda-forge
brotlipy                  0.7.0           py38h497a2fe_1003    conda-forge
brunsli                   0.1                  h9c3ff4c_0    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
c-blosc2                  2.0.4                h5f21a17_1    conda-forge
ca-certificates           2021.10.8            ha878542_0    conda-forge
cachetools                4.2.4              pyhd8ed1ab_0    conda-forge
cairo                     1.16.0            h6cf1ce9_1008    conda-forge
certifi                   2021.10.8        py38h578d9bd_1    conda-forge
cffi                      1.15.0           py38h3931269_0    conda-forge
cfitsio                   3.470                hb418390_7    conda-forge
charls                    2.2.0                h9c3ff4c_0    conda-forge
charset-normalizer        2.0.9              pyhd8ed1ab_0    conda-forge
click                     8.0.3            py38h578d9bd_1    conda-forge
click-plugins             1.1.1                      py_0    conda-forge
cligj                     0.7.2              pyhd8ed1ab_1    conda-forge
cloudpickle               2.0.0              pyhd8ed1ab_0    conda-forge
colorama                  0.4.4              pyh9f0ad1d_0    conda-forge
colorcet                  3.0.0              pyhd8ed1ab_0    conda-forge
cryptography              36.0.1           py38h3e25421_0    conda-forge
cucim                     22.02.00a211220 cuda_11_py38_gab8e6a4_31    rapidsai-nightly
cuda-python               11.5.0           py38h3fd9d12_0    nvidia
cudatoolkit               11.2.72              h2bc3f7f_0    nvidia
cudf                      22.02.00a211220 cuda_11_py38_ga4dc42d4c6_206    rapidsai-nightly
cudf_kafka                22.02.00a211220 py38_ga4dc42d4c6_206    rapidsai-nightly
cugraph                   22.02.00a211220 cuda11_py38_gf80313ec_58    rapidsai-nightly
cuml                      22.02.00a211220 cuda11_py38_g03132e811_83    rapidsai-nightly
cupy                      9.6.0            py38h177b0fd_0    conda-forge
curl                      7.80.0               h2574ce0_0    conda-forge
cusignal                  22.02.00a211220 py37_g6a02566_9    rapidsai-nightly
cuspatial                 22.02.00a211220 py38_gae17e55_13    rapidsai-nightly
custreamz                 22.02.00a211220 py38_ga4dc42d4c6_206    rapidsai-nightly
cuxfilter                 22.02.00a211220 py38_g1b76aa8_6    rapidsai-nightly
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
cyrus-sasl                2.1.27               h230043b_5    conda-forge
cytoolz                   0.11.2           py38h497a2fe_1    conda-forge
dask                      2021.11.2          pyhd8ed1ab_0    conda-forge
dask-core                 2021.11.2          pyhd8ed1ab_0    conda-forge
dask-cuda                 22.02.00a211220         py38_43    rapidsai-nightly
dask-cudf                 22.02.00a211220 cuda_11_py38_ga4dc42d4c6_206    rapidsai-nightly
dask-sql                  2021.12.0        py38h578d9bd_0    conda-forge
datashader                0.11.1             pyh9f0ad1d_0    conda-forge
datashape                 0.5.4                      py_1    conda-forge
debugpy                   1.5.1            py38h709712a_0    conda-forge
decorator                 5.1.0              pyhd8ed1ab_0    conda-forge
defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
distributed               2021.11.2        py38h578d9bd_0    conda-forge
dlpack                    0.5                  h9c3ff4c_0    conda-forge
entrypoints               0.3             pyhd8ed1ab_1003    conda-forge
expat                     2.4.1                h9c3ff4c_0    conda-forge
faiss-proc                1.0.0                      cuda    conda-forge
fastapi                   0.70.1             pyhd8ed1ab_0    conda-forge
fastavro                  1.4.7            py38h497a2fe_1    conda-forge
fastrlock                 0.8              py38h709712a_1    conda-forge
fiona                     1.8.20           py38hbb147eb_2    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.13.1            hba837de_1005    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.28.5           py38h497a2fe_0    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
freexl                    1.0.6                h7f98852_0    conda-forge
frozenlist                1.2.0            py38h497a2fe_1    conda-forge
fsspec                    2021.11.1          pyhd8ed1ab_0    conda-forge
gcsfs                     2021.11.1          pyhd8ed1ab_0    conda-forge
gdal                      3.3.2            py38h81a01a0_3    conda-forge
geopandas                 0.9.0              pyhd8ed1ab_1    conda-forge
geopandas-base            0.9.0              pyhd8ed1ab_1    conda-forge
geos                      3.9.1                h9c3ff4c_2    conda-forge
geotiff                   1.7.0                h08e826d_2    conda-forge
gettext                   0.19.8.1          h73d1719_1008    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
giflib                    5.2.1                h36c2ea0_2    conda-forge
glog                      0.5.0                h48cff8f_0    conda-forge
google-api-core           2.3.2              pyhd8ed1ab_0    conda-forge
google-auth               2.3.3              pyh6c4a22f_0    conda-forge
google-auth-oauthlib      0.4.6              pyhd8ed1ab_0    conda-forge
google-cloud-core         2.2.1              pyh6c4a22f_0    conda-forge
google-cloud-storage      1.43.0             pyh6c4a22f_0    conda-forge
google-crc32c             1.1.2            py38h8838a9a_2    conda-forge
google-resumable-media    2.1.0              pyh6c4a22f_0    conda-forge
googleapis-common-protos  1.53.0           py38h578d9bd_1    conda-forge
graphite2                 1.3.13            h58526e2_1001    conda-forge
grpc-cpp                  1.42.0               ha1441d3_1    conda-forge
grpcio                    1.43.0           py38hdd6454d_0    conda-forge
h11                       0.12.0             pyhd8ed1ab_0    conda-forge
harfbuzz                  2.9.1                h83ec7ef_1    conda-forge
hdf4                      4.2.15               h10796ff_3    conda-forge
hdf5                      1.12.1          nompi_h2750804_103    conda-forge
heapdict                  1.0.1                      py_0    conda-forge
icu                       68.2                 h9c3ff4c_0    conda-forge
idna                      3.1                pyhd3deb0d_0    conda-forge
imagecodecs               2021.8.26        py38hb5ce8f7_1    conda-forge
imageio                   2.13.3             pyh239f2a4_0    conda-forge
importlib-metadata        4.10.0           py38h578d9bd_0    conda-forge
importlib_metadata        4.10.0               hd8ed1ab_0    conda-forge
importlib_resources       5.4.0              pyhd8ed1ab_0    conda-forge
ipykernel                 6.6.0            py38he5a9106_0    conda-forge
ipython                   7.30.1           py38h578d9bd_0    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                7.6.5              pyhd8ed1ab_0    conda-forge
jbig                      2.1               h7f98852_2003    conda-forge
jedi                      0.18.1           py38h578d9bd_0    conda-forge
jinja2                    3.0.3              pyhd8ed1ab_0    conda-forge
joblib                    1.1.0              pyhd8ed1ab_0    conda-forge
jpeg                      9d                   h36c2ea0_0    conda-forge
jpype1                    1.3.0            py38h1fd1430_2    conda-forge
json-c                    0.15                 h98cffda_0    conda-forge
json5                     0.9.5              pyh9f0ad1d_0    conda-forge
jsonschema                4.3.1              pyhd8ed1ab_0    conda-forge
jupyter-server-proxy      3.2.0              pyhd8ed1ab_0    conda-forge
jupyter_client            7.1.0              pyhd8ed1ab_0    conda-forge
jupyter_core              4.9.1            py38h578d9bd_1    conda-forge
jupyter_server            1.13.1             pyhd8ed1ab_0    conda-forge
jupyterlab                3.3.1              pyhd8ed1ab_0    conda-forge
jupyterlab_pygments       0.1.2              pyh9f0ad1d_0    conda-forge
jupyterlab_server         2.10.3             pyhd8ed1ab_0    conda-forge
jupyterlab_widgets        1.0.2              pyhd8ed1ab_0    conda-forge
jxrlib                    1.1                  h7f98852_2    conda-forge
kealib                    1.4.14               h87e4c3c_3    conda-forge
kiwisolver                1.3.2            py38h1fd1430_1    conda-forge
krb5                      1.19.2               hcc1bbae_3    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
lerc                      3.0                  h9c3ff4c_0    conda-forge
libaec                    1.0.6                h9c3ff4c_0    conda-forge
libblas                   3.9.0           12_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h7f98852_6    conda-forge
libbrotlidec              1.0.9                h7f98852_6    conda-forge
libbrotlienc              1.0.9                h7f98852_6    conda-forge
libcblas                  3.9.0           12_linux64_openblas    conda-forge
libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
libcucim                  22.02.00a211220 cuda11_gab8e6a4_31    rapidsai-nightly
libcudf                   22.02.00a211220 cuda11_ga4dc42d4c6_206    rapidsai-nightly
libcudf_kafka             22.02.00a211220 ga4dc42d4c6_206    rapidsai-nightly
libcugraph                22.02.00a211220 cuda11_gf80313ec_58    rapidsai-nightly
libcuml                   22.02.00a211220 cuda11_g03132e811_83    rapidsai-nightly
libcumlprims              22.02.00a211213 cuda11_g2dcab39_12    rapidsai-nightly
libcurl                   7.80.0               h2574ce0_0    conda-forge
libcusolver               11.3.2.107           hc875929_0    nvidia
libcuspatial              22.02.00a211220 cuda11_gae17e55_13    rapidsai-nightly
libdap4                   3.20.6               hd7c4107_2    conda-forge
libdeflate                1.8                  h7f98852_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               h9b69904_4    conda-forge
libfaiss                  1.7.0           cuda112h5bea7ad_8_cuda    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 11.2.0              h1d223b6_11    conda-forge
libgcrypt                 1.9.4                h7f98852_0    conda-forge
libgdal                   3.3.2                h6acdded_3    conda-forge
libgfortran-ng            11.2.0              h69a702a_11    conda-forge
libgfortran5              11.2.0              h5c6108e_11    conda-forge
libglib                   2.70.2               h174f98d_0    conda-forge
libgomp                   11.2.0              h1d223b6_11    conda-forge
libgpg-error              1.42                 h9c3ff4c_0    conda-forge
libgsasl                  1.10.0               h5b4c23d_0    conda-forge
libhwloc                  2.3.0                h5e5b7d1_1    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
libkml                    1.3.0             h238a007_1014    conda-forge
liblapack                 3.9.0           12_linux64_openblas    conda-forge
libllvm11                 11.1.0               hf817b99_2    conda-forge
libnetcdf                 4.8.1           nompi_hb3fd0d9_101    conda-forge
libnghttp2                1.43.0               h812cca2_1    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libntlm                   1.4               h7f98852_1002    conda-forge
libopenblas               0.3.18          pthreads_h8fe5266_0    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libpq                     13.5                 hd57d9b9_1    conda-forge
libprotobuf               3.19.1               h780b84a_0    conda-forge
librdkafka                1.6.1                hc49e61c_1    conda-forge
librmm                    22.02.00a211220 cuda11_g846d638_23    rapidsai-nightly
librttopo                 1.1.0                h1185371_6    conda-forge
libsodium                 1.0.18               h36c2ea0_1    conda-forge
libspatialindex           1.9.3                h9c3ff4c_4    conda-forge
libspatialite             5.0.1                h5cf074c_8    conda-forge
libssh2                   1.10.0               ha56f1ee_2    conda-forge
libstdcxx-ng              11.2.0              he4da1e4_11    conda-forge
libthrift                 0.15.0               he6d91bd_1    conda-forge
libtiff                   4.3.0                h6f004c6_2    conda-forge
libutf8proc               2.7.0                h7f98852_0    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libuv                     1.42.0               h7f98852_0    conda-forge
libwebp                   1.2.1                h3452ae3_0    conda-forge
libwebp-base              1.2.1                h7f98852_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxgboost                1.5.0dev.rapidsai22.02      cuda11.2_0    rapidsai-nightly
libxml2                   2.9.12               h72842e0_0    conda-forge
libzip                    1.8.0                h4de3113_1    conda-forge
libzlib                   1.2.11            h36c2ea0_1013    conda-forge
libzopfli                 1.0.3                h9c3ff4c_0    conda-forge
llvmlite                  0.37.0           py38h4630a5e_1    conda-forge
locket                    0.2.0                      py_2    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
mapclassify               2.4.3              pyhd8ed1ab_0    conda-forge
markdown                  3.3.6              pyhd8ed1ab_0    conda-forge
markupsafe                2.0.1            py38h497a2fe_1    conda-forge
matplotlib-base           3.5.1            py38hf4fb855_0    conda-forge
matplotlib-inline         0.1.3              pyhd8ed1ab_0    conda-forge
mistune                   0.8.4           py38h497a2fe_1005    conda-forge
msgpack-python            1.0.3            py38h1fd1430_0    conda-forge
multidict                 5.2.0            py38h497a2fe_1    conda-forge
multipledispatch          0.6.0                      py_0    conda-forge
munch                     2.5.0                      py_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
nbclassic                 0.3.6              pyhd8ed1ab_0    conda-forge
nbclient                  0.5.9              pyhd8ed1ab_0    conda-forge
nbconvert                 6.3.0            py38h578d9bd_1    conda-forge
nbformat                  5.1.3              pyhd8ed1ab_0    conda-forge
nccl                      2.11.4.1             hdc17891_0    conda-forge
ncurses                   6.2                  h58526e2_4    conda-forge
nest-asyncio              1.5.4              pyhd8ed1ab_0    conda-forge
networkx                  2.6.3              pyhd8ed1ab_1    conda-forge
nodejs                    14.17.4              h92b4a50_0    conda-forge
notebook                  6.4.6              pyha770c72_0    conda-forge
notebook-shim             0.1.0              pyhd8ed1ab_0    conda-forge
nspr                      4.32                 h9c3ff4c_1    conda-forge
nss                       3.73                 hb5efdd6_0    conda-forge
numba                     0.54.1           py38h4bf6c61_0    conda-forge
numpy                     1.20.3           py38h9894fe3_1    conda-forge
nvtx                      0.2.3            py38h497a2fe_1    conda-forge
oauthlib                  3.1.1              pyhd8ed1ab_0    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openjdk                   11.0.9.1             h5cc2fde_1    conda-forge
openjpeg                  2.4.0                hb52868f_1    conda-forge
openssl                   1.1.1l               h7f98852_0    conda-forge
orc                       1.7.1                h1be678f_1    conda-forge
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.3.5            py38h43a58ef_0    conda-forge
pandoc                    2.16.2               h7f98852_0    conda-forge
pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
panel                     0.12.4             pyhd8ed1ab_0    conda-forge
param                     1.12.0             pyh6c4a22f_0    conda-forge
parquet-cpp               1.5.1                         2    conda-forge
parso                     0.8.3              pyhd8ed1ab_0    conda-forge
partd                     1.2.0              pyhd8ed1ab_0    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    8.4.0            py38h8e6f84c_0    conda-forge
pip                       21.3.1             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
pooch                     1.5.2              pyhd8ed1ab_0    conda-forge
poppler                   21.09.0              ha39eefc_3    conda-forge
poppler-data              0.4.11               hd8ed1ab_0    conda-forge
postgresql                13.5                 h2510834_1    conda-forge
proj                      8.1.0                h277dcde_1    conda-forge
prometheus_client         0.12.0             pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.24             pyha770c72_0    conda-forge
protobuf                  3.19.1           py38h709712a_1    conda-forge
psutil                    5.8.0            py38h497a2fe_2    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
ptxcompiler               0.2.0            py38hb739d79_0    rapidsai-nightly
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
py-xgboost                1.5.0dev.rapidsai22.02  cuda11.2py38_0    rapidsai-nightly
pyarrow                   5.0.0           py38ha746e9d_20_cuda    conda-forge
pyasn1                    0.4.8                      py_0    conda-forge
pyasn1-modules            0.2.7                      py_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyct                      0.4.6                      py_0    conda-forge
pyct-core                 0.4.6                      py_0    conda-forge
pydantic                  1.8.2            py38h497a2fe_2    conda-forge
pydeck                    0.5.0              pyh9f0ad1d_0    conda-forge
pyee                      8.1.0              pyh9f0ad1d_0    conda-forge
pygments                  2.10.0             pyhd8ed1ab_0    conda-forge
pyjwt                     2.3.0              pyhd8ed1ab_1    conda-forge
pynvml                    11.4.1             pyhd8ed1ab_0    conda-forge
pyopenssl                 21.0.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.6              pyhd8ed1ab_0    conda-forge
pyppeteer                 0.2.6              pyhd8ed1ab_0    conda-forge
pyproj                    3.1.0            py38h3701b11_4    conda-forge
pyrsistent                0.18.0           py38h497a2fe_0    conda-forge
pysocks                   1.7.1            py38h578d9bd_4    conda-forge
python                    3.8.12          hb7a2778_2_cpython    conda-forge
python-confluent-kafka    1.6.0            py38h497a2fe_1    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-tzdata             2021.5             pyhd8ed1ab_0    conda-forge
python_abi                3.8                      2_cp38    conda-forge
pytz                      2021.3             pyhd8ed1ab_0    conda-forge
pytz-deprecation-shim     0.1.0.post0      py38h578d9bd_1    conda-forge
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
pyviz_comms               2.1.0              pyhd8ed1ab_0    conda-forge
pywavelets                1.2.0            py38h6c62de6_1    conda-forge
pyyaml                    6.0              py38h497a2fe_3    conda-forge
pyzmq                     22.3.0           py38h2035c66_1    conda-forge
rapids                    22.02.00a211220 cuda11.2_py38_gae421a4_92    rapidsai-nightly
rapids-xgboost            22.02.00a211220 cuda11.2_py38_gae421a4_92    rapidsai-nightly
re2                       2021.11.01           h9c3ff4c_0    conda-forge
readline                  8.1                  h46c0cb4_0    conda-forge
requests                  2.26.0             pyhd8ed1ab_1    conda-forge
requests-oauthlib         1.3.0              pyh9f0ad1d_0    conda-forge
rmm                       22.02.00a211219 cuda11_py38_g846d638_23_has_cma    rapidsai-nightly
rsa                       4.8                pyhd8ed1ab_0    conda-forge
rtree                     0.9.7            py38h02d302b_3    conda-forge
s2n                       1.3.0                h9b69904_0    conda-forge
scikit-image              0.18.1           py38h51da96c_0    conda-forge
scikit-learn              1.0.1            py38h1561384_3    conda-forge
scipy                     1.7.3            py38h56a6a73_0    conda-forge
send2trash                1.8.0              pyhd8ed1ab_0    conda-forge
setuptools                59.6.0           py38h578d9bd_0    conda-forge
shapely                   1.8.0            py38hb7fe4a8_0    conda-forge
simpervisor               0.4                pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.8                he1b5a44_3    conda-forge
sniffio                   1.2.0            py38h578d9bd_2    conda-forge
sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
spdlog                    1.8.5                h4bd325d_0    conda-forge
sqlite                    3.37.0               h9cd32fc_0    conda-forge
starlette                 0.16.0             pyhd8ed1ab_0    conda-forge
streamz                   0.6.3              pyh6c4a22f_0    conda-forge
tabulate                  0.8.9              pyhd8ed1ab_0    conda-forge
tblib                     1.7.0              pyhd8ed1ab_0    conda-forge
terminado                 0.12.1           py38h578d9bd_1    conda-forge
testpath                  0.5.0              pyhd8ed1ab_0    conda-forge
threadpoolctl             3.0.0              pyh8a188c0_0    conda-forge
tifffile                  2021.11.2          pyhd8ed1ab_0    conda-forge
tiledb                    2.3.4                he87e0bf_0    conda-forge
tk                        8.6.11               h27826a3_1    conda-forge
toolz                     0.11.2             pyhd8ed1ab_0    conda-forge
tornado                   6.1              py38h497a2fe_2    conda-forge
tqdm                      4.62.3             pyhd8ed1ab_0    conda-forge
traitlets                 5.1.1              pyhd8ed1ab_0    conda-forge
treelite                  2.1.0            py38hdd725b4_0    conda-forge
treelite-runtime          2.1.0                    pypi_0    pypi
typing-extensions         4.0.1                hd8ed1ab_0    conda-forge
typing_extensions         4.0.1              pyha770c72_0    conda-forge
tzcode                    2021e                h7f98852_0    conda-forge
tzdata                    2021e                he74cb21_0    conda-forge
tzlocal                   4.1              py38h578d9bd_1    conda-forge
ucx                       1.11.2+gef2bbcf      cuda11.2_0    rapidsai-nightly
ucx-proc                  1.0.0                       gpu    rapidsai-nightly
ucx-py                    0.24.0a211220   py38_gef2bbcf_18    rapidsai-nightly
unicodedata2              13.0.0.post2     py38h497a2fe_4    conda-forge
urllib3                   1.26.7             pyhd8ed1ab_0    conda-forge
uvicorn                   0.16.0           py38h578d9bd_0    conda-forge
wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
webencodings              0.5.1                      py_1    conda-forge
websocket-client          1.2.3              pyhd8ed1ab_0    conda-forge
websockets                9.1              py38h497a2fe_0    conda-forge
wheel                     0.37.0             pyhd8ed1ab_1    conda-forge
widgetsnbextension        3.5.2            py38h578d9bd_1    conda-forge
xarray                    0.20.2             pyhd8ed1ab_0    conda-forge
xerces-c                  3.2.3                h9d8b166_3    conda-forge
xgboost                   1.5.0dev.rapidsai22.02  cuda11.2py38_0    rapidsai-nightly
xorg-fixesproto           5.0               h7f98852_1002    conda-forge
xorg-inputproto           2.3.2             h7f98852_1002    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.7.2                h7f98852_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h7f98852_1    conda-forge
xorg-libxfixes            5.0.3             h7f98852_1004    conda-forge
xorg-libxi                1.7.10               h7f98852_0    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-libxtst              1.2.3             h7f98852_1002    conda-forge
xorg-recordproto          1.14.2            h7f98852_1002    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
yaml                      0.2.5                h516909a_0    conda-forge
yarl                      1.7.2            py38h497a2fe_1    conda-forge
zeromq                    4.3.4                h9c3ff4c_1    conda-forge
zfp                       0.5.5                h9c3ff4c_8    conda-forge
zict                      2.0.0                      py_0    conda-forge
zipp                      3.6.0              pyhd8ed1ab_0    conda-forge
zlib                      1.2.11            h36c2ea0_1013    conda-forge
zstd                      1.5.0                ha95c52a_0    conda-forge

cuBERTopic error: cuDF failure at: [...] Could not open vocab/voc_hash.txt

I installed RAPIDS using:
mamba create -n rapids-22.04 -c rapidsai -c nvidia -c conda-forge rapids=22.04 python=3.9 cudatoolkit=11.3 dask-sql --no-channel-priority

and then mamba install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

I've tried to follow the example at:
https://github.com/rapidsai/rapids-examples/blob/main/cuBERT_topic_modelling/berttopic_example.ipynb

After cloning the repository and running pip install -e . to install cuBERTopic, I ran:

from cuBERTopic import gpu_BERTopic
gpu_topic = gpu_BERTopic()
topics_gpu, probs_gpu = gpu_topic.fit_transform(docs)

The last line fails with:

RuntimeError Traceback (most recent call last)
/tmp/ipykernel_1502/660932292.py in <module>
----> 1 topics_gpu, probs_gpu = gpu_topic.fit_transform(docs)

~/rapids-examples/cuBERT_topic_modelling/cuBERTopic.py in fit_transform(self, data)
204
205 # Extract embeddings
--> 206 embeddings = create_embeddings(
207 documents.Document, self.embedding_model, self.vocab_file
208 )

~/rapids-examples/cuBERT_topic_modelling/embedding_extraction.py in create_embeddings(sentences, embedding_model, vocab_file)
71 """
72
---> 73 cudf_tokenizer = SubwordTokenizer(vocab_file, do_lower_case=True)
74 batch_size = 256
75 pooling_output_ls = []

/opt/conda/envs/rapids-22.04/lib/python3.9/site-packages/cudf/core/subword_tokenizer.py in __init__(self, hash_file, do_lower_case)
53
54 self.do_lower_case = do_lower_case
---> 55 self.vocab_file = cpp_hashed_vocabulary(hash_file)
56
57 def __call__(

cudf/_lib/nvtext/subword_tokenize.pyx in cudf._lib.nvtext.subword_tokenize.Hashed_Vocabulary.__cinit__()

RuntimeError: cuDF failure at: /workspace/.conda-bld/work/cpp/src/text/subword/load_hash_file.cu:183: Could not open vocab/voc_hash.txt
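The path in the error message, vocab/voc_hash.txt, is relative, so it is looked up against the current working directory of the notebook kernel. One possible cause (an assumption, not confirmed in this report) is that the notebook was started from a directory other than cuBERT_topic_modelling. A minimal sketch for checking this before any tokenizer is built is shown below; resolve_vocab_file is a hypothetical helper, not part of cuBERTopic or cuDF.

```python
from pathlib import Path


def resolve_vocab_file(vocab_file, base_dir):
    """Return an absolute path to vocab_file, raising early if it is missing.

    Hypothetical helper for illustration: relative paths are joined onto
    base_dir (for example the cuBERT_topic_modelling directory of the clone)
    instead of being left to depend on the current working directory.
    """
    path = Path(vocab_file)
    if not path.is_absolute():
        path = Path(base_dir) / path
    path = path.resolve()
    if not path.is_file():
        # Fail with a clear Python error instead of a cuDF RuntimeError later.
        raise FileNotFoundError(f"Vocabulary hash file not found: {path}")
    return str(path)
```

If the helper raises FileNotFoundError, the hash file is not where the relative path expects it. If it returns a path, that absolute path can be handed to whatever code constructs the SubwordTokenizer, or the kernel's working directory can be changed to the cuBERT_topic_modelling directory before calling fit_transform. Whether gpu_BERTopic accepts the vocabulary path as a constructor argument is not shown in this report, so check its signature first.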

string_udf example cleanup

Let's make the string_udf example simpler with Python bindings that work with cudf Python. Something like:

import cudf
df = cudf.DataFrame({'id': [0, 1, 2], 'val': ['abc', 'def', 'ghi']})

df['result'] = cpp_string_udf(df['val'])

instead of a CLI binary
