Comments (4)
As the doc says:
If metric is "precomputed", X is assumed to be a distance matrix and must be square.
A distance matrix has 0 on the diagonal, since every point is at distance 0 from itself. Your matrix has 1e10 on the diagonal, so it looks more like a similarity matrix, and I am guessing this is why you find points with no neighbors.
I am going to close the issue, since at this point I feel this is more likely to be a scikit-learn usage question rather than a bug in scikit-learn.
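To illustrate the point about the diagonal, here is a minimal sketch on a made-up 3-point matrix (the values, `eps`, and variable names are assumptions for illustration, not taken from the issue). It compares DBSCAN on a matrix with a large diagonal against the same matrix with the diagonal reset to 0 via `np.fill_diagonal`.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical example: points 0 and 1 are close, point 2 is far from both.
# The 1e10 diagonal mimics the matrix posted in this issue.
bad = np.array([
    [1e10, 1.0, 10.0],
    [1.0, 1e10, 10.0],
    [10.0, 10.0, 1e10],
])

# A valid distance matrix has zeros on the diagonal.
fixed = bad.copy()
np.fill_diagonal(fixed, 0.0)

model = DBSCAN(eps=2.0, min_samples=2, metric="precomputed")
labels_bad = model.fit(bad).labels_.copy()
labels_fixed = model.fit(fixed).labels_.copy()

print("large diagonal:", labels_bad)
print("zero diagonal: ", labels_fixed)
```

With the zero diagonal, points 0 and 1 each see themselves plus one neighbor, so they reach `min_samples=2` and form a cluster, while point 2 is noise. With the large diagonal, no point counts itself, so every point is labeled as noise.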
from scikit-learn.
Thanks for opening an issue! I think the documentation is right but I have to admit, I am certainly not a DBSCAN expert.
I took min_samples=2 and the points with one neighbor did not become the core
Have you tried playing with eps? If you can provide a snippet of code showing this issue, that would be great so that a maintainer can have a closer look.
@lesteve
I use a matrix of distances as input data:
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN
DBSCAN_model = DBSCAN(eps=100, min_samples=2, metric='precomputed', algorithm='brute')
similarity_matrix = [
[1e10, 2500.966630, 572.004568, 2571.116203, 2637.008209, 2378.405924, 244.336929, 288.477526, 339.468194],
[2500.966630, 1e10, 2437.781596, 70.149573, 1024.025578, 765.423293, 2256.629701, 2262.386342, 2205.245221],
[572.004568, 2437.781596, 1e10, 2507.931168, 2573.823175, 2315.220890, 327.667640, 333.424280, 232.536374],
[2571.116203, 70.149573, 2507.931168, 1e10, 1094.175151, 835.572866, 2326.779274, 2332.535914, 2275.394794],
[2637.008209, 1024.025578, 2573.823175, 1094.175151, 1e10, 258.602285, 2392.671280, 2398.427921, 2341.286801],
[2378.405924, 765.423293, 2315.220890, 835.572866, 258.602285, 1e10, 2134.068995, 2139.825636, 2082.684515],
[244.336929, 2256.629701, 327.667640, 2326.779274, 2392.671280, 2134.068995, 1e10, 44.140597, 95.131265],
[288.477526, 2262.386342, 333.424280, 2332.535914, 2398.427921, 2139.825636, 44.140597, 1e10, 100.887906],
[339.468194, 2205.245221, 232.536374, 2275.394794, 2341.286801, 2082.684515, 95.131265, 100.887906, 1e10]
]
# Create DataFrame
df_similarity = pd.DataFrame(similarity_matrix, columns=range(1, 10), index=range(1, 10))
df_similarity_numpy = df_similarity.to_numpy()
neighbors_model = NearestNeighbors(
    radius=DBSCAN_model.eps,
    algorithm=DBSCAN_model.algorithm,
    leaf_size=DBSCAN_model.leaf_size,
    metric=DBSCAN_model.metric,
    metric_params=DBSCAN_model.metric_params,
    p=DBSCAN_model.p,
    n_jobs=DBSCAN_model.n_jobs,
)
neighbors_model.fit(df_similarity_numpy)
# This has worst case O(n^2) memory complexity
neighborhoods = neighbors_model.radius_neighbors(df_similarity_numpy, return_distance=False)
This code returns the neighborhoods as an array. In the output we see:
array([array([], dtype=int64), array([3]), array([], dtype=int64),
array([1]), array([], dtype=int64), array([], dtype=int64),
array([7, 8]), array([6]), array([6])], dtype=object)
Only one point has more than 1 neighbor (array([7, 8])). As a reminder, we specified 2 neighbors "including the point in question" for a point to become a core point.
Then we run this:
n_neighbors = np.array([len(neighbors) for neighbors in neighborhoods])
core_samples = np.asarray(n_neighbors >= 2, dtype=np.uint8)
core_samples
Output:
array([0, 0, 0, 0, 0, 0, 1, 0, 0], dtype=uint8)
Only the one point that has 2 neighbors besides itself becomes a core point.
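The effect of the diagonal on these counts can be checked with a small sketch. The 3-point matrix, the radius, and the helper name `neighbor_counts` are assumptions made for illustration; the counting follows the same `radius_neighbors` approach as the snippet above.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def neighbor_counts(dist, radius):
    """Count how many points lie within `radius` of each point (hypothetical helper)."""
    nn = NearestNeighbors(radius=radius, metric="precomputed").fit(dist)
    neighborhoods = nn.radius_neighbors(dist, return_distance=False)
    return np.array([len(n) for n in neighborhoods])


# Made-up distances: points 0 and 1 are close, point 2 is far from both.
large_diag = np.array([
    [1e10, 1.0, 10.0],
    [1.0, 1e10, 10.0],
    [10.0, 10.0, 1e10],
])
zero_diag = large_diag.copy()
np.fill_diagonal(zero_diag, 0.0)

counts_large = neighbor_counts(large_diag, radius=2.0)
counts_zero = neighbor_counts(zero_diag, radius=2.0)

print("large diagonal:", counts_large, "core:", counts_large >= 2)
print("zero diagonal: ", counts_zero, "core:", counts_zero >= 2)
```

With a zero diagonal each point is found inside its own radius, so every count is one higher than with the large diagonal. That extra neighbor is what lets a point with a single close neighbor reach `min_samples=2`.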
@lesteve Thank you very much for the prompt response, now I understand.