Comments (3)
Thanks, it seems promising. I will look into that.
In the meantime, I have found a workaround:
I cluster all the points together as usual. Then, for each short sentence, I compute the average distance to each cluster (excluding the short sentences from the clusters) and reassign the sentence if required.
This seems to solve the problem on the current dataset.
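The reassignment step described above can be sketched as follows. This is a minimal sketch under several assumptions. `reassign_short_points` is a hypothetical helper and not part of hdbscan. The labels are assumed to come from an earlier clustering run, with -1 meaning noise. The short sentences are assumed to be flagged by a boolean mask. Only the Euclidean and cosine metrics are handled. Every short sentence, including one labelled as noise, is moved to the cluster whose non-short members are closest on average.

```python
import numpy as np


def reassign_short_points(X, labels, is_short, metric="euclidean"):
    """Move each short point to the cluster whose non-short members are
    closest on average.

    Hypothetical helper (not part of hdbscan). Assumes:
      X        -- (n, d) array of sentence embeddings (non-zero rows for cosine),
      labels   -- labels from an earlier clustering run, -1 meaning noise,
      is_short -- boolean mask marking the short sentences.
    Returns a new label array; the input labels are left untouched.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).copy()
    is_short = np.asarray(is_short, dtype=bool)

    if metric == "cosine":
        # Unit-normalise rows so that 1 - dot product equals cosine distance.
        X = X / np.linalg.norm(X, axis=1, keepdims=True)

        def dist(point, members):
            return 1.0 - members @ point
    elif metric == "euclidean":
        def dist(point, members):
            return np.linalg.norm(members - point, axis=1)
    else:
        raise ValueError(f"unsupported metric: {metric!r}")

    # Clusters are described only by their non-short members; noise is ignored.
    cluster_ids = [c for c in np.unique(labels[~is_short]) if c != -1]
    if not cluster_ids:
        return labels
    members = {c: X[(labels == c) & ~is_short] for c in cluster_ids}

    for i in np.flatnonzero(is_short):
        # Average distance from this short point to each cluster's long members.
        mean_dist = [dist(X[i], members[c]).mean() for c in cluster_ids]
        labels[i] = cluster_ids[int(np.argmin(mean_dist))]
    return labels


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [9.5, 10.5]])
    labels = np.array([0, 0, 1, 1, 0])  # the short point (index 4) landed in cluster 0
    is_short = np.array([False, False, False, False, True])
    print(reassign_short_points(X, labels, is_short).tolist())  # → [0, 0, 1, 1, 1]
```

The clusters are described only by their non-short members. As a result, short sentences that were badly placed cannot drag a cluster towards themselves. Averaging over all members, rather than comparing against a single centroid, follows the "average distance" wording of the workaround.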
In case you are interested, HDBSCAN works wonderfully for clustering speakers in a diarisation project:
https://github.com/mirix/approaches-to-diarisation
I am really impressed. The challenge now would be to come up with some heuristics or ML to guess the optimal parameters automatically.
Related Issues (20)
- Unable to install BERTopic due to fail in building HDBSCAN on Kaggle Notebooks.
- While doing BERTopic modeling in hdbscan clustering step i am getting error as numpy.float64 cannot be interpreted as an integer
- Is there a bug for "label up to the root" ?
- Validation questions
- pypi version throws ValueError
- TypeError encountered
- Getting Error while using HDBSCAN
- Clustering struggles with mix of noise levels
- HDBSCAN version 0.8.33 not able to install with python version 3.10.13
- Tests failed with: No module named 'hdbscan._hdbscan_linkage'
- Request for Adding `__version__` Attribute
- Request for `verbose` setting
- max_cluster_size parameter does not work
- ip
- Question regarding sparse matrices
- Crash when points are equal
- Way to obtain the lambda value
- requirements prevent cython>=3
- How to set cluster_selection_epsilon when using cosine distances?
- Outlier scores - possible bug in GLOSH computation