behrouz-babaki / minsizekmeans
A Python implementation of KMeans clustering with minimum cluster size constraint (Bradley et al., 2000)
License: GNU General Public License v3.0
Hi,
How should I run this on a CSV file with multiple columns?
For example, I want to cluster data from a CSV that contains:
Thanks!
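A minimal sketch of one way to get a multi-column CSV into memory, assuming every column is numeric and the first row holds column names. The helper name load_numeric_csv and the sample values are hypothetical, and whether the repository's script accepts rows in exactly this shape is an assumption; the minsize_kmeans(data, k, min_size, max_size) call quoted in another issue on this page suggests a list of numeric rows is a reasonable starting point.

```python
import csv
import io

def load_numeric_csv(handle, has_header=True):
    """Read a CSV with several numeric columns into a list of float rows."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the column names
    # Blank lines come back as empty lists and are skipped.
    return [[float(value) for value in row] for row in reader if row]

# Hypothetical three-column file, used only to make the sketch runnable.
sample = io.StringIO("x,y,z\n1.0,2.0,3.0\n4.0,5.0,6.0\n")
data = load_numeric_csv(sample)
print(data)
```

For a file on disk, open it with open(path, newline='') and pass the handle instead of the in-memory sample.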
Hi,
I implemented the algorithm as a scikit classifier module so that it can be placed in a scikit pipeline. Should I make a commit?
Hi all,
I cloned this repo, made a fresh install of Anaconda (ver. 4.7.12), and created a new environment using the following command:
conda create -n py35 python=3.5 anaconda
I then activated py35:
conda activate py35
After creating this environment, I ran the code, which threw the following error:
Requested Python version (3.5) is not installed
This happens even though I confirmed that the Python version is correct:
python --version
Python 3.5.5
It won't run. Any advice would be great. I am currently running on Windows 7, and this is my full conda info list:
active environment : py35
active env location : C:\Users\Doug\Anaconda3\envs\py35
shell level : 2
user config file : C:\Users\Doug\.condarc
populated config files : C:\Users\Doug\.condarc
conda version : 4.7.12
conda-build version : 3.18.9
python version : 3.7.4.final.0
virtual packages :
base environment : C:\Users\Doug\Anaconda3 (writable)
channel URLs : https://conda.anaconda.org/conda-forge/win-64
    https://conda.anaconda.org/conda-forge/noarch
    https://repo.anaconda.com/pkgs/main/win-64
    https://repo.anaconda.com/pkgs/main/noarch
    https://repo.anaconda.com/pkgs/r/win-64
    https://repo.anaconda.com/pkgs/r/noarch
    https://repo.anaconda.com/pkgs/msys2/win-64
    https://repo.anaconda.com/pkgs/msys2/noarch
package cache : C:\Users\Doug\Anaconda3\pkgs
    C:\Users\Doug\.conda\pkgs
    C:\Users\Doug\AppData\Local\conda\conda\pkgs
envs directories : C:\Users\Doug\Anaconda3\envs
    C:\Users\Doug\.conda\envs
    C:\Users\Doug\AppData\Local\conda\conda\envs
platform : win-64
user-agent : conda/4.7.12 requests/2.22.0 CPython/3.7.4 Windows/7 Windows/6.1.7601
administrator : False
netrc file : None
offline mode : False
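The listing above reports Python 3.7.4 for the base install while the py35 environment reports 3.5.5, so it may help to check which interpreter actually executes a script. The following is only a sketch of that check; the function name interpreter_report is hypothetical, and it does not by itself explain the error message.

```python
import sys

def interpreter_report():
    """Return the path and (major, minor) version of the running interpreter."""
    return sys.executable, tuple(sys.version_info[:2])

path, version = interpreter_report()
print("interpreter:", path)
print("version: %d.%d" % version)
```

Running this from the same prompt, and in the same way, as the failing command shows whether the environment's interpreter or a different one is being picked up.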
best = None
best_clusters = None
for i in range(args.NUM_ITER):
    clusters, centers = minsize_kmeans(data, args.k, args.min_size, args.max_size)
    if clusters:
        quality = compute_quality(data, clusters)
        if not best or (quality < best):
            best = quality
            best_clusters = clusters

if best:
    if args.OUTFILE:
        with open(args.OUTFILE, 'w') as f:
            print('\n'.join(str(i) for i in clusters), file=f)
    else:
        print('cluster assignments:')
        for i in range(len(clusters)):
            print('%d: %d'%(i, clusters[i]))
    print('sum of squared distances: %.4f'%(best))
else:
    print('no clustering found')
Just a question, and maybe I am wrong: if a best cluster configuration was found, shouldn't the variable best_clusters be written to the file instead of the last cluster configuration?
Regards,
Lukas
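A minimal sketch of the change the question points at: keep the assignment from the best-scoring run and report that one, not whatever the final iteration produced. The helper pick_best and the stand-in runs and scores are hypothetical and only make the sketch runnable; in the quoted code their roles are played by minsize_kmeans and compute_quality. The sketch also compares against None, which is a separate assumption about intent, so that a score of 0 is not mistaken for "no result yet".

```python
def pick_best(run_once, score, num_iter):
    """Return (best_score, best_clusters) over num_iter runs, or (None, None)."""
    best = None
    best_clusters = None
    for _ in range(num_iter):
        clusters = run_once()
        if not clusters:
            continue  # this run found no feasible clustering
        quality = score(clusters)
        # `best is None` avoids treating a score of 0 as "no result yet".
        if best is None or quality < best:
            best = quality
            best_clusters = clusters
    return best, best_clusters

# Hypothetical stand-ins for minsize_kmeans and compute_quality.
runs = iter([[0, 0, 1], [0, 1, 1], [1, 0, 1]])
scores = {(0, 0, 1): 5.0, (0, 1, 1): 2.0, (1, 0, 1): 7.0}
best, best_clusters = pick_best(lambda: next(runs), lambda c: scores[tuple(c)], 3)

# Write out best_clusters, not the clusters of the last run.
print('\n'.join(str(i) for i in best_clusters))
```

In the stand-in data the last run scores worst, so printing the last run's assignment would report a different clustering than the one whose score is shown.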
I have a dataset consisting of 1.3 million 256-dimensional vectors. Should I give it a try or is it a total waste of time?
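A rough back-of-the-envelope sketch for sizing such a run, under stated assumptions: values are stored as 8-byte floats, and the constrained assignment step of Bradley et al. uses one variable per point-cluster pair. The number of clusters k is not given in the question, so k=10 below is purely hypothetical, and the function name estimate is made up for this sketch. It says nothing about actual running time.

```python
def estimate(n_points, n_dims, k, bytes_per_value=8):
    """Return (data size in GiB, number of point-to-cluster assignment variables)."""
    data_gib = n_points * n_dims * bytes_per_value / float(2 ** 30)
    assignment_vars = n_points * k
    return data_gib, assignment_vars

# 1.3 million 256-dimensional vectors; k=10 is a hypothetical choice.
data_gib, n_vars = estimate(1300000, 256, k=10)
print('data: %.2f GiB, assignment variables per iteration: %d' % (data_gib, n_vars))
```

Trying the same estimate on a small random subsample first gives a cheap way to see how the solver scales before committing to the full dataset.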