Comments (5)
@beckernick I am not quite sure if it works with spectral initialization. Could you try using init="random"?
from cuml.
The UMAP docstring indicates that random_state can't provide exact determinism but should provide consistency up to about 3 digits of precision.
@dantegd, is it possible we have a bug, or is the documentation wrong?
import cuml
from sklearn.datasets import make_blobs
N = 1000
# Generate the data once so every repetition fits the same X
X, y = make_blobs(
    n_samples=N
)
NREP = 3
for i in range(NREP):
    # Fresh estimator each time, same seed, default init
    reducer = cuml.manifold.umap.UMAP(
        random_state=12
    )
    X_t = reducer.fit_transform(X)
    print(reducer.random_state)
    print(X_t[:5])
    print()
662124363
[[-2.5505848 -0.63661003]
[-5.3669243 -0.07881355]
[-4.428316 1.4433041 ]
[-0.9989338 10.929661 ]
[ 6.8667793 -9.262173 ]]
662124363
[[ -1.9667425 -2.6903896 ]
[ -3.396501 -0.25006104]
[ -1.6785622 0.13145828]
[ 3.3643045 11.314904 ]
[ -2.0715647 -11.898888 ]]
662124363
[[ 0.3823166 2.5653324 ]
[ 0.5335636 -0.0426445 ]
[ 2.2950068 0.81112003]
[ -7.4286957 10.400803 ]
[ 8.3242235 -10.5068655 ]]
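To put a number on how far these runs are from "about 3 digits", here is a minimal sketch that compares the first two rows of the first and second runs printed above. The helper name `max_abs_diff` and the tolerance `1e-3` are my own assumptions for illustration, not anything from cuML.

```python
import numpy as np

# First two rows copied from the first and second runs printed above
# (same random_state, init left at its default).
run_1 = np.array([[-2.5505848, -0.63661003],
                  [-5.3669243, -0.07881355]])
run_2 = np.array([[-1.9667425, -2.6903896],
                  [-3.396501, -0.25006104]])


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two embeddings."""
    return float(np.max(np.abs(np.asarray(a) - np.asarray(b))))


# Assumed tolerance standing in for "about 3 digits of precision".
TOL = 1e-3

diff = max_abs_diff(run_1, run_2)
print(f"max abs diff = {diff:.4f}, within tolerance: {diff <= TOL}")
```

The largest difference is on the order of 2, so these runs disagree well before the third digit.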
from cuml.
Dear cuml team,
Another cuml-related issue has just popped up:
I need the topic distribution of each document, so I followed the BERTopic instructions for approximate_distribution, but it returns an ndarray containing nothing but 0s.
I have just realized that this issue may be due to cuml.
approximate_distribution generates a topic distribution if I use
from umap import UMAP
from hdbscan import HDBSCAN
But approximate_distribution returns only 0s if I use
from cuml.cluster import HDBSCAN
from cuml.manifold import UMAP
Any help or advice is much appreciated!
from cuml.
That looks like a bug to me. Oddly, we also have Python tests for reproducibility, and those appear to be passing...
Victor's got a good point: it's very possible the spectral embedding is not honoring the random state, and that's why we are using random init in the pytests.
from cuml.
Looks like that's the bug:
import cuml
from sklearn.datasets import make_blobs
N = 1000
# Generate the data once so every repetition fits the same X
X, y = make_blobs(
    n_samples=N
)
NREP = 3
for i in range(NREP):
    # Same seed as before, but with random instead of spectral init
    reducer = cuml.manifold.umap.UMAP(
        random_state=12,
        init="random"
    )
    X_t = reducer.fit_transform(X)
    print(reducer.random_state)
    print(X_t[:5])
    print()
662124363
[[ -4.766629 8.464443 ]
[ 8.891461 1.2006083]
[ -7.211566 -7.8680773]
[ -5.811491 -12.208349 ]
[ -6.8120937 7.2288113]]
662124363
[[ -4.766629 8.464443 ]
[ 8.891461 1.2006083]
[ -7.211566 -7.8680773]
[ -5.811491 -12.208349 ]
[ -6.8120937 7.2288113]]
662124363
[[ -4.766629 8.464443 ]
[ 8.891461 1.2006083]
[ -7.211566 -7.8680773]
[ -5.811491 -12.208349 ]
[ -6.8120937 7.2288113]]
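The eyeball comparison above could be turned into a small check that works for either init setting. This is only a sketch: `is_reproducible`, `make_reducer`, and the `atol=1e-3` default are hypothetical names and values I picked, and it assumes `fit_transform` returns something `np.asarray` can convert (which should hold when X is a NumPy array, as with make_blobs here).

```python
import numpy as np


def is_reproducible(make_reducer, X, n_rep=3, atol=1e-3):
    """Fit a fresh reducer n_rep times and compare each embedding to the first.

    make_reducer is a zero-argument callable returning an unfitted estimator
    with a fit_transform method. Returns True only if every run matches the
    first run to within atol (absolute tolerance, no relative tolerance).
    """
    embeddings = [np.asarray(make_reducer().fit_transform(X)) for _ in range(n_rep)]
    first = embeddings[0]
    return all(
        np.allclose(first, other, rtol=0.0, atol=atol) for other in embeddings[1:]
    )


# Intended usage with the reproducer above (needs a GPU and cuml installed):
#   is_reproducible(lambda: cuml.manifold.umap.UMAP(random_state=12, init="random"), X)
#   is_reproducible(lambda: cuml.manifold.umap.UMAP(random_state=12), X)
```

Based on the outputs posted in this thread, the first call would be expected to pass and the second to fail, but I have not run it in this form.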
from cuml.