Comments (6)
The UMAP algorithm doesn't really have the concept of a cluster the way you are talking about here. You can read more here in the Python docs; notice what color is being mapped to (the label, not any output from UMAP). You need to cluster on top of the UMAP results, like with HDBSCAN in that example.
If you would like to get out cluster assignments from k-means, take a look at this article (you'll want to augment() your original data points).
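As a rough illustration of what "clustering on top of" an embedding means, the sketch below runs a plain k-means (Lloyd's algorithm) over 2-D coordinates and attaches the resulting labels back to the original rows, which is roughly the role augment() plays for a fitted k-means in R. It is written in plain Python so it runs without any packages. The `embedding` values are made-up stand-ins for UMAP output, and `kmeans`, `original_rows`, and the `.cluster` key are hypothetical names used for illustration, not the embed or broom API.

```python
import math

def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Made-up 2-D coordinates standing in for UMAP output (one row per observation).
embedding = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
original_rows = [{"id": i} for i in range(len(embedding))]

# Initial centroids are picked by hand to keep the sketch deterministic.
labels, centroids = kmeans(embedding, centroids=[embedding[0], embedding[3]])

# Attach the cluster label to each original row.
augmented = [{**row, ".cluster": lab} for row, lab in zip(original_rows, labels)]
```

The key point is that the cluster labels come from the clustering step, not from UMAP itself; UMAP only supplies the coordinates being clustered.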
Thanks, really nice to know about augment().
Do you happen to know an alternative that's not distance based like KNN? With a larger sample size, KNN would not really work.
Have you taken a look at something like mclust? Or this Stack Overflow answer outlines some nice options.
UMAP is sort of distance based, just on a complex manifold.
I think that the main problem is defining the membership function. With classical clustering methods, we would look at the distance to each class centroid. For UMAP the notions of distance and centroid are not well defined.
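To make the "distance to each class centroid" idea concrete, here is a small sketch of a classical membership function: compute the Euclidean distance from a point to every centroid and assign the point to the nearest one. The centroid values and the function name `centroid_membership` are hypothetical. This is only the classical picture; as noted above, UMAP does not come with a well-defined distance or centroid, so nothing like this falls out of UMAP directly.

```python
import math

def centroid_membership(point, centroids):
    """Return (nearest_label, distances) for one point.

    `centroids` maps a cluster label to its centroid coordinates.
    """
    distances = {label: math.dist(point, c) for label, c in centroids.items()}
    nearest = min(distances, key=distances.get)
    return nearest, distances

# Hypothetical centroids for two clusters.
centroids = {"a": (0.0, 0.0), "b": (3.0, 4.0)}
label, distances = centroid_membership((3.0, 0.0), centroids)
```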
It also might be worth looking at https://tidyclust.tidymodels.org/, which is our package for dealing with clustering problems.
I'm going to close this issue for now. If you have any further problems/questions/praise, feel free to open another issue!
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.