FR: For each of the UMAP clusters, information/ID on values (from which columns) assigned to which UMAP clusters would be nice (embed, CLOSED)

exsell-jc commented on July 19, 2024
FR: For each of the UMAP clusters, information/ID on values (from which columns) assigned to which UMAP clusters would be nice

from embed.

Comments (6)

juliasilge commented on July 19, 2024

The UMAP algorithm doesn't really have the concept of a cluster the way you are talking about here. You can read more here in the Python docs; notice what color is being mapped to (the label, not any output from UMAP). You need to cluster on top of the UMAP results, like with HDBSCAN in that example.

If you would like to get out cluster assignments from k-means, take a look at this article (you'll want to augment() your original data points).
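
A rough sketch of that workflow, for illustration only: cluster the low-dimensional coordinates, then attach the resulting labels back to the original rows, which is roughly what augment() does for you in R. It is written in plain Python with no dependencies. The `embedding` values are made-up stand-ins for UMAP output, and `rows`, `kmeans`, and the evenly spaced centroid initialization are hypothetical simplifications, not part of embed or umap-learn.

```python
def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means on (x, y) tuples; returns (labels, centroids)."""
    # Simplified deterministic start: take k evenly spaced points as centroids.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [None] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                              + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids


# Hypothetical original data and a made-up 2-D embedding of the same rows.
rows = [{"id": name} for name in ["a", "b", "c", "d", "e", "f"]]
embedding = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1),
             (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]

labels, centroids = kmeans(embedding, k=2)

# Attach the cluster label to each original row.
augmented = [dict(row, cluster=lab) for row, lab in zip(rows, labels)]
```

The cluster column comes entirely from the k-means step; the embedding only supplies the coordinates that get clustered.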

exsell-jc commented on July 19, 2024

Thanks, really nice to know about augment().

Do you happen to know of an alternative that's not distance-based like KNN? With a larger sample size, KNN would not really work.

juliasilge commented on July 19, 2024

Have you taken a look at something like mclust? Or this Stack Overflow answer outlines some nice options.

topepo commented on July 19, 2024

UMAP is sort of distance based, just on a complex manifold.

I think that the main problem is defining the membership function. With classical clustering methods, we would look at the distance to each class centroid. For UMAP the notions of distance and centroid are not well defined.
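
To make the classical case concrete, here is a minimal sketch of a centroid-based membership function. Inverse-distance weighting is just one simple choice picked for illustration, and `soft_membership` is a hypothetical helper, not something from embed or any clustering package. It only works because a Euclidean distance and a centroid are both meaningful in this toy space.

```python
import math


def soft_membership(point, centroids):
    """Turn distances to each centroid into weights that sum to 1."""
    dists = [math.dist(point, c) for c in centroids]
    # A point sitting exactly on a centroid gets full membership there.
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    # Otherwise, closer centroids receive proportionally larger weights.
    inverse = [1.0 / d for d in dists]
    total = sum(inverse)
    return [w / total for w in inverse]


centroids = [(0.0, 0.0), (4.0, 0.0)]
weights = soft_membership((1.0, 0.0), centroids)  # roughly [0.75, 0.25]
```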

EmilHvitfeldt commented on July 19, 2024

It might also be worth looking at https://tidyclust.tidymodels.org/, which is our package for dealing with clustering problems.

I'm going to close this issue for now. If you have any further problems/questions/praise, feel free to open another issue!

github-actions commented on July 19, 2024

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
