fcomitani / simpsom
Python library for Self-Organizing Maps
License: GNU General Public License v3.0
I tried to clone the simpsom repository, but I got errors because the filenames in the tests/ground_truth/ folder are too long.
I'm working with an encrypted home folder, for which the maximum filename length is 143 characters.
As a workaround, I forked the simpsom repository, cloned it to an unencrypted folder, deleted the ground_truth files, and pushed the result back to my fork. Then I was able to clone the repository to an encrypted folder.
Here are the errors when cloning:
me@comp:~/tmp$ git clone https://github.com/fcomitani/simpsom.git
Cloning into 'simpsom'...
remote: Enumerating objects: 1414, done.
remote: Counting objects: 100% (217/217), done.
remote: Compressing objects: 100% (85/85), done.
remote: Total 1414 (delta 154), reused 165 (delta 130), pack-reused 1197
Receiving objects: 100% (1414/1414), 34.45 MiB | 15.38 MiB/s, done.
Resolving deltas: 100% (733/733), done.
error: unable to create file tests/ground_truth/som_clusters_29793102587501614832761962817496381948754187354139510906297685347436709277089543135748392401212717654282462446534086383947023366030253119052733995242201873785323240739807453510.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_clusters_342008044776077809952382379377725389002657731782690276593972840565627274789710108763094526877518194116991217137317838936203404537258310.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_clusters_342008044776173808293820246848172717185750701891546008523394046430801193894526052825594273532638348990588789776175654503162559421243718.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_clusters_6990909199294242919332708782990481955635203273555594435047935561554494072389185531750095764725676280630374101219996805343174917845436987668653086546229359531757587021810123956550.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_clusters_80251701864081417831371931036709008182515405432296624554309851309220089166275703141148945731290811019842586449396221097154566387051290950.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_clusters_80251701864081513829713368904179455510698498402405480286239272515085263085380519085211445477945931174716184022035078912721525541935276358.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_projected_29793102587501614832761962817496381948754187354139510906297685347436709277089543135748392401212717654282462446534086383947023366030253119052733995242201873785323240739807453510.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_projected_342008044776077809952382379377725389002657731782690276593972840565627274789710108763094526877518194116991217137317838936203404537258310.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_projected_342008044776173808293820246848172717185750701891546008523394046430801193894526052825594273532638348990588789776175654503162559421243718.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_projected_6990909199294242919332708782990481955635203273555594435047935561554494072389185531750095764725676280630374101219996805343174917845436987668653086546229359531757587021810123956550.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_projected_80251701864081417831371931036709008182515405432296624554309851309220089166275703141148945731290811019842586449396221097154566387051290950.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_projected_80251701864081513829713368904179455510698498402405480286239272515085263085380519085211445477945931174716184022035078912721525541935276358.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_1224543790650657313213549472991040535904219846983620675164315761268632417486417738025024652371253533972999132316224697404960185610836.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_1224543790650657313213549472991040535904316229697346149481683466457230054787165196947898727118335156616273920126621428309896238752070.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_1224543790650657313213549472991040535904316229697346149481683466457230054787165196947898727118335234824610301771193443025591640809798.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_1335968924906787723998553143006053779321110122674337069737184616588407112905780651649079739801479596597171496942181683254327081984326.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_1335968924906787723998553143006053779321110122674337069737184616588407112905780651649079739801479674805507878586753697970022484042054.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_1346400136541297592615788464317873357020832176068851985856552095957119374614286985032559191109006013108952827969596804999741666077473546408059206.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_18685055399332539569298545425278328489749448348749094774846126728342169455610559414236305619044411867761030363303200024846688582.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_18685055399332539569298545425278328489749448348749094774846126728342169456274684723282236516895166603137453243205118281426690374.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_18685055399332539569298545425278328489750919032247099059558511844534599738236183038460742443149308498859269490750211552227123526.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_20385268019207576354958391464325771779568238790711341781361650770101963224748689817883955002660751240165972540340344160735556180.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_20544435677204858285763373635056853135622468717937993960553728025241249173361883988934768232047196198044068854409077774513251429182268531014.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_22413839222448957511672109767691453563286597887893850675787809544380889229156669641337366995873220311703611193242375012091195018934785696070.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_22413839222448957511672147934829851706974254468168255641494287828126447755230133213742654659999254827175997758192823410038977598027450835270.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_29793102587501614832761962817496381948754187354139510906297685347436709277089543135748392401212717654282462446534086383947023366030253119052733995242201873785323240739807453510.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_313483210406568272182668665085706377191480280827806892842064834884769898876522940934406311007040904697087777872953522535669807482429766.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_313483210406568272182668665085706377191480280827806892842064834884769898876522940934406311007040904775296114254598094550385502884487494.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_342008044776077809952382379377725389002657731782690276593972840565627274789710108763094526877518194116991217137317838936203404537258310.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_342008044776173808293820246848172717185750701891546008523394046430801193894526052825594273532638348990588789776175654503162559421243718.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_4783374182229130129740427628871252093376235272255258396412826040848554901512364050577729402805996705532320000494614954335532315220.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_5218628612917139546869348214867397575473086416696629178660877408548465284788205670504217733599529674207701159930397200212215296596.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_5218628612917139546869348214867397575569469130422103496028582597146102585535664593378292480681152317482488970327128105148268437830.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_5218628612917139546869348214867397575569469130422103496028582597146102585535664593378292480681230525818870614899142820843670495558.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_5259375533364443721155423650574554402718938031188775047500175620866119703749582709011776511528302778938456220334561858942689831440974428791110.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_5259375533364443721155423688741692800862625687769049452465882099149865262275656172584181799192428813453928606899512307340637614020067093930310.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_5737942840946933122988069871316442036985409143826399469508816258754238097057918953726770108704046100504160029419017113508415153431397775728966.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_6990909199294242919332708782990481955635203273555594435047935561554494072389185531750095764725676280630374101219996805343174917845436987668653086546229359531757587021810123956550.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_80251701864081417831371931036709008182515405432296624554309851309220089166275703141148945731290811019842586449396221097154566387051290950.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_80251701864081513829713368904179455510698498402405480286239272515085263085380519085211445477945931174716184022035078912721525541935276358.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_87554059462691240279969178780044740483205306606426785458463137265706929141469772643990323717393234658316474950798540409350505658676175174.npy: Der Dateiname ist zu lang
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
me@comp:~/tmp$
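To see which files would hit such a limit before checking out, a small check along these lines may help. This is only a sketch: the helper name find_overlong_names is made up for illustration, and the 143-byte default is simply the limit reported above for the encrypted folder.

```python
import os


def find_overlong_names(paths, limit=143):
    """Return the paths whose final component is longer than `limit` bytes.

    `limit` defaults to the 143-character limit reported in this issue;
    other filesystems have different limits.
    """
    too_long = []
    for path in paths:
        name = os.path.basename(path)
        # Filesystem limits are usually expressed in bytes, not characters.
        if len(name.encode("utf-8")) > limit:
            too_long.append(path)
    return too_long


if __name__ == "__main__":
    # Hypothetical example paths, not the real repository contents.
    example_paths = [
        "tests/ground_truth/short_name.npy",
        "tests/ground_truth/som_clusters_" + "1" * 150 + ".npy",
    ]
    for path in find_overlong_names(example_paths):
        print(path)
```

The list of paths could come, for example, from the output of git ls-files in a clone made on an unencrypted folder.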
How can I get the value of each data element in each hexagon of the SOM? There is no function that shows these values. How can I do this? I need the values.
Hi! I just started experimenting with your package for the analysis of some big datasets and have run into problems with memory allocation. For example, the MNIST tutorial is interrupted by the following error:
MemoryError: Unable to allocate 876. GiB for an array with shape (60000, 2500, 784) and data type float64
Creating a small ad hoc dataset, such as 2000 np.array of length 200, also gives me a similar error. I'm running on 8 GB of RAM with an Intel(R) Core i5-6300U CPU.
Thanks in advance!
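The size in the error message follows directly from the shape it reports: one float64 (8 bytes) for every combination of sample, node, and feature. A quick way to check this arithmetic (the helper name array_size_gib is only illustrative):

```python
def array_size_gib(shape, itemsize=8):
    """Return the memory needed for a dense array of `shape`, in GiB.

    `itemsize` is the size of one element in bytes (8 for float64).
    """
    n_items = 1
    for dim in shape:
        n_items *= dim
    return n_items * itemsize / 2**30


# Shape taken from the MemoryError above.
print(f"{array_size_gib((60000, 2500, 784)):.1f} GiB")  # prints "876.2 GiB"
```

Any approach that builds the full samples x nodes x features array at once will need this much memory, regardless of the machine, so the computation has to be split into smaller pieces to fit in 8 GB.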
Hi!
I just realized that the version I installed from pip (pip install SimpSOM) doesn't include the colnames parameter, although it is in the repo code. I made the modifications to my local file, but I wanted to let you know that this happens!
Thanks for your amazing work!
Hello, first of all, thank you for sharing this code; it has helped me a lot.
def update_weights(self, inputVec, sigma, lrate, bmu):
    """Update the node Weights.

    Args:
        inputVec (np.array): A weights vector whose distance drives the direction of the update.
        sigma (float): The updated gaussian sigma.
        lrate (float): The updated learning rate.
        bmu (somNode): The best matching unit.
    """
    dist=self.get_nodeDistance(bmu)
    gauss=np.exp(-dist*dist/(2*sigma*sigma)) # I think gauss will always > 0
    if gauss>0:
        for i in range(len(self.weights)):
            self.weights[i] = self.weights[i] - gauss*lrate*(self.weights[i]-inputVec[i])
In somNode::update_weights(), the expression 'gauss > 0' will always be true.
So the weights of all nodes are changed throughout the training process.
According to the literature I have read, only the node weights near the BMU should be updated, and this neighborhood gradually shrinks until it contains only the BMU itself.
So I think we should change 'gauss > 0' to 'gauss > x' (0 < x < 1).
Thank you again for sharing and looking forward to your reply.
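To make the proposal concrete, here is a minimal sketch of a Gaussian neighbourhood with an optional cutoff. It is not the library's code: the function name gaussian_neighbourhood and the cutoff parameter are assumptions made for illustration. Mathematically the Gaussian is positive for every finite distance; in floating point it only reaches exactly 0.0 by underflow when the distance is very large relative to sigma.

```python
import math


def gaussian_neighbourhood(dist, sigma, cutoff=0.0):
    """Gaussian neighbourhood weight, set to 0.0 at or below `cutoff`.

    With cutoff=0.0 this mirrors the `gauss > 0` test quoted above;
    a cutoff between 0 and 1 restricts updates to nodes near the BMU.
    """
    gauss = math.exp(-dist * dist / (2.0 * sigma * sigma))
    return gauss if gauss > cutoff else 0.0


if __name__ == "__main__":
    for dist in (0.0, 1.0, 3.0):
        print(dist,
              gaussian_neighbourhood(dist, sigma=1.0),
              gaussian_neighbourhood(dist, sigma=1.0, cutoff=0.05))
```

Note that even without a cutoff, the neighbourhood effectively shrinks as sigma decreases during training, because the weight of distant nodes becomes negligibly small; a cutoff mainly saves the work of updating those nodes.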
Hey @fcomitani team,
I mainly use your library for clustering, but there is one function that takes a very long time. My code looks like this:
x_train=df_kmeans.drop(columns=['LTV','sqrt_LTV'])
net = sps.somNet(30, 30, x_train.values, PBC=True)
net.train(0.1, 20000)
prj=np.array(net.project(x_train.values))
This line (prj=np.array(net.project(x_train.values))) takes around 6-7 hours for around 7 million rows. Can you help me speed it up? My current system has 32 GB of RAM and a 4-core CPU on AWS.
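One general way to speed up this kind of projection is to find the best matching unit (BMU) for many rows at once with NumPy, in chunks small enough to fit in memory. The sketch below is not SimpSOM's implementation: it assumes a Euclidean metric and that the trained node weights have been collected into a 2D array of shape (n_nodes, n_features), and the name find_bmus_chunked is hypothetical.

```python
import numpy as np


def find_bmus_chunked(data, weights, chunk_size=10000):
    """Return, for each row of `data`, the index of the closest row of `weights`.

    Uses ||x - w||^2 = ||x||^2 - 2 x.w + ||w||^2; the ||x||^2 term is the
    same for every node, so it can be dropped when taking the argmin.
    """
    bmus = np.empty(data.shape[0], dtype=np.int64)
    w_sq = np.einsum("ij,ij->i", weights, weights)  # ||w||^2 for each node
    for start in range(0, data.shape[0], chunk_size):
        chunk = data[start:start + chunk_size]
        # Shape (len(chunk), n_nodes): at most chunk_size * n_nodes floats in memory.
        scores = w_sq[None, :] - 2.0 * (chunk @ weights.T)
        bmus[start:start + chunk.shape[0]] = np.argmin(scores, axis=1)
    return bmus


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.random((1000, 5))
    weights = rng.random((30 * 30, 5))  # e.g. a flattened 30 x 30 map
    print(find_bmus_chunked(data, weights, chunk_size=256)[:10])
```

The returned indices would still have to be mapped to map coordinates, which depends on how the nodes are ordered in the network.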
Hi @fcomitani
I just want to let you know that the PyPI release does not include the latest changes, which contain the printout typo fix.
Near line 284, p.dists.iteritems should be changed to p.dists.items. Thank you very much for your code; it has helped me a lot.
Hi there!
I am trying to reproduce a simple example, but I am having some issues initializing SOMNet.
To reproduce
import simpsom as sps
import numpy as np
data = np.random.rand(20, 20)
net = sps.SOMNet(20, 20, data, topology='hexagonal',
init='PCA', metric='cosine',
neighborhood_fun='gaussian', PBC=True,
random_seed=32, GPU=False, CUML=False,
output_path="./")
and here is the error I get
Traceback (most recent call last):
  File "/home/lucas/mba/python/simpsom.py", line 1, in <module>
    import simpsom as sps
  File "/home/lucas/mba/python/simpsom.py", line 5, in <module>
    net = sps.SOMNet(20, 20, data, topology='hexagonal',
AttributeError: partially initialized module 'simpsom' has no attribute 'SOMNet' (most likely due to a circular import)
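Both frames of the traceback point to the same file, /home/lucas/mba/python/simpsom.py, so the script is importing itself rather than the installed package: a script named simpsom.py in the working directory takes precedence over the library when "import simpsom" runs. Renaming the script should resolve it. A small sketch of the check (the helper name script_shadows_module is made up for illustration):

```python
from pathlib import Path


def script_shadows_module(script_path, module_name):
    """Return True if a script's file name would be imported as `module_name`."""
    return Path(script_path).stem == module_name


if __name__ == "__main__":
    # Path taken from the traceback above.
    print(script_shadows_module("/home/lucas/mba/python/simpsom.py", "simpsom"))
    # A differently named script (hypothetical name) does not shadow the package.
    print(script_shadows_module("/home/lucas/mba/python/run_som.py", "simpsom"))
```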
Should _update_learning_rate() in network.py be self.learning_rate = self.start_learning_rate * self.xp.exp(-n_iter / self.epochs) instead? Thanks!
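For reference, the schedule proposed in this question is a plain exponential decay. The standalone sketch below uses math.exp in place of self.xp.exp and a free function instead of the method; it only illustrates the proposed formula and says nothing about what network.py currently does.

```python
import math


def exponential_learning_rate(start_learning_rate, n_iter, epochs):
    """Learning rate after `n_iter` iterations under the proposed schedule."""
    return start_learning_rate * math.exp(-n_iter / epochs)


if __name__ == "__main__":
    for n_iter in (0, 250, 500, 1000):
        print(n_iter, exponential_learning_rate(0.1, n_iter, epochs=1000))
```

With this form the rate starts at start_learning_rate and has dropped by a factor of e once n_iter reaches epochs.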
Figures 2 and 3 of .run_colorsExample() do not appear. I also tried the .project function with another dataset and got the same result. I am using Python 3.6.5 (64-bit), numpy=1.14.3, matplotlib=2.2.2, and sklearn=0.20.1, and no errors appear either. Does anyone know what to do?
Once the SOM has been trained, is it possible to get the cell to which a data sample belongs? Or for each cell to get a list of samples from a dataset that belong to that cell? I haven't found anything like this in the example.
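Independently of what the library exposes, both mappings can be built from the node weights alone: each sample belongs to the cell whose weight vector is closest to it. The following is a minimal pure-Python sketch, assuming a Euclidean metric and weights given as a flat list of vectors; the function names are hypothetical and not part of SimpSOM.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_samples_to_cells(samples, cell_weights):
    """Return (sample -> cell index list, cell index -> sample indices dict)."""
    sample_to_cell = []
    cell_to_samples = defaultdict(list)
    for sample_index, sample in enumerate(samples):
        # The cell whose weights are closest to the sample is its best match.
        best_cell = min(range(len(cell_weights)),
                        key=lambda c: squared_distance(sample, cell_weights[c]))
        sample_to_cell.append(best_cell)
        cell_to_samples[best_cell].append(sample_index)
    return sample_to_cell, dict(cell_to_samples)


if __name__ == "__main__":
    samples = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]]
    cell_weights = [[0.0, 0.0], [1.0, 1.0]]
    print(assign_samples_to_cells(samples, cell_weights))
```

Cells that attract no samples simply do not appear in the second mapping.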
Hi, I tried training a model and I got the following error.
UnboundLocalError                         Traceback (most recent call last)
in ()
----> 1 net.train(0.2, 1000)

~\Anaconda3\lib\site-packages\simpsom-1.3.3-py3.7.egg\SimpSOM\__init__.py in train(self, startLearnRate, epochs)
    213 inputVec = self.data[np.random.randint(0, self.data.shape[0]), :].reshape(np.array([self.data.shape[1]]))
    214
--> 215 bmu=self.find_bmu(inputVec)
    216
    217 for node in self.nodeList:

~\Anaconda3\lib\site-packages\simpsom-1.3.3-py3.7.egg\SimpSOM\__init__.py in find_bmu(self, vec)
    173 minVal=dist
    174 bmu=node
--> 175 return bmu
    176
    177

UnboundLocalError: local variable 'bmu' referenced before assignment
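The traceback shows that bmu is only assigned next to minVal=dist, presumably inside a comparison that never succeeded. One plausible cause, which is an assumption and not confirmed by the report, is NaN values in the data: every comparison with NaN is False, so no node is ever selected. The sketch below reproduces that pattern on a plain list of distances and fails with a clearer message; find_bmu_index is a hypothetical stand-in, not the library's find_bmu.

```python
import math


def find_bmu_index(distances):
    """Return the index of the smallest distance, failing loudly on bad input."""
    best_index = None
    best_distance = math.inf
    for index, dist in enumerate(distances):
        # `nan < x` is always False, so NaN distances are never selected.
        if dist < best_distance:
            best_distance = dist
            best_index = index
    if best_index is None:
        raise ValueError(
            "no valid distance found (empty input, NaN, or infinite values)")
    return best_index


if __name__ == "__main__":
    print(find_bmu_index([3.0, 1.0, 2.0]))
    try:
        find_bmu_index([math.nan, math.nan])
    except ValueError as error:
        print("error:", error)
```

Checking the input with np.isnan(data).any() before training would confirm or rule out this explanation.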
Hi, first of all, thanks for the code. I'm using it at work for clustering some data.
I'm interested in parallelizing the code, and I see that you have a "TODO" comment above the line "Parallel(n_jobs=self.n_jobs)(delayed(my_func)(c, K, N) for c in inputs)" in the train function. I suppose you have tried this or have some ideas about it, so I wanted to know which function "my_func" is a placeholder for and what the parameters c, K, N are, in order to get a better general idea of how to proceed with the parallelization.
Again thanks for the repository!
I've installed the latest SimpSOM using pip and tried following both the code presented on GitHub and the code presented in the API docs (readthedocs). Unfortunately, raw_data isn't present after import SimpSOM, and I can't find any API documentation (e.g., descriptions, return values, argument values with datatypes) for each method/function. If raw_data is only a placeholder, then what is it: a list, dictionary, array, etc.? I would really like to learn more about this package. Is there documentation elsewhere?
I have a question about time complexity. I ran some experiments and got O(n log n) when using DBSCAN for clustering and O(n^2) when using the Quality Threshold algorithm. I tried the first method on Jain, Flame, Compound, and t4.8k, and the second method on MNIST. Can you explain the O(n log n) complexity? I think the time complexity of SOM is O(n^2). Thanks
Hi Federico,
I looked into the code and the only difference between the U-matrix and your implementation of diff_graph() seems to be that the former takes an average of the distances with its neighbours while diff_graph() just takes the sum. Is diff_graph() an implementation of the U-matrix?
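The difference described here can be sketched on a small rectangular grid. The code below is not diff_graph() itself: it uses 4-connected neighbours without periodic boundaries and hypothetical function names, purely to compare a summed and an averaged neighbour distance (math.dist requires Python 3.8 or later).

```python
import math


def neighbour_distances(grid, row, col):
    """Distances between node (row, col) and its 4-connected neighbours."""
    distances = []
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
            distances.append(math.dist(grid[row][col], grid[r][c]))
    return distances


def summed_difference(grid, row, col):
    """Sum of the distances to the neighbours."""
    return sum(neighbour_distances(grid, row, col))


def averaged_difference(grid, row, col):
    """Average of the distances to the neighbours (U-matrix style)."""
    distances = neighbour_distances(grid, row, col)
    return sum(distances) / len(distances) if distances else 0.0


if __name__ == "__main__":
    # A 1 x 3 grid of 2-feature weight vectors.
    grid = [[[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]]
    for col in range(3):
        print(col, summed_difference(grid, 0, col), averaged_difference(grid, 0, col))
```

When every node has the same number of neighbours, as with periodic boundary conditions, the sum is just the average multiplied by a constant, so the two maps differ only in scale. Without periodic boundaries, edge nodes have fewer neighbours and the sum makes them look artificially similar to their surroundings.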
Hi @fcomitani:
I am having trouble determining a node's coordinates in the two-dimensional map. If I take the trained net and want to see the centroid of a node, all I need to do is:
#SOM 4X5
net45 = sps.somNet(4, 5, data, PBC=True)
#Train the network for 10000 epochs and with initial learning rate of 0.01.
s45 = net45.train(0.01, 10000)
#Visualize the centroids of each features into the node 0
s45[0]
But if I only want the two coordinates of where this node is located in the map, I can't do this. Can you help me find where to get these two values? I reviewed your code and couldn't find them.
Thanks so much,
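If the nodes are stored in a flat list, the grid position can in principle be recovered from the list index. The sketch below assumes row-major ordering (all nodes of the first row, then the second, and so on); this ordering is an assumption that would need to be checked against how the library builds its node list, and node_index_to_grid is a hypothetical helper.

```python
def node_index_to_grid(index, net_height, net_width):
    """Convert a flat node index to (row, col), ASSUMING row-major ordering."""
    if not 0 <= index < net_height * net_width:
        raise IndexError(f"node index {index} is outside a "
                         f"{net_height} x {net_width} map")
    row, col = divmod(index, net_width)
    return row, col


if __name__ == "__main__":
    # A map with 4 rows and 5 columns, as an example.
    for index in (0, 7, 19):
        print(index, node_index_to_grid(index, net_height=4, net_width=5))
```

For a hexagonal layout these are grid indices only; the plotted position of a node would additionally depend on the hexagonal offset used for drawing.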
Hi,
First of all, thank you for sharing the code. Could you share any code for the nodes difference plot of MNIST? I am interested in the representation by class, like the last image in your examples. Another question: could you compute the centroids of the different regions and then calculate the distance between each data point and its centroid? Finally, is it possible to use SimpSOM for prediction? It would be very nice to see where a new point would land. Thanks for everything!
Hello! Thank you for your work. I tried importing the latest version of simpsom (v2.0.0) and got the error: ModuleNotFoundError: No module named 'simpsom.cluster'. There seems to be a missing __init__.py file in "./simpsom/cluster/". I tried adding it locally and it did the trick for me.
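A quick way to spot this kind of problem is to scan the package tree for directories that contain Python modules but no __init__.py. One common reason such a directory goes missing from an installed package is that setuptools' find_packages() only picks up directories with an __init__.py; whether that is what happened here is an assumption. The helper name below is hypothetical.

```python
from pathlib import Path


def find_dirs_missing_init(package_root):
    """List subdirectories that hold .py files but have no __init__.py.

    Paths are returned relative to `package_root`, using forward slashes.
    """
    root = Path(package_root)
    missing = []
    for directory in sorted(p for p in root.rglob("*") if p.is_dir()):
        has_modules = any(directory.glob("*.py"))
        if has_modules and not (directory / "__init__.py").exists():
            missing.append(directory.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    # Run from a checkout of the repository; "simpsom" is the package folder.
    print(find_dirs_missing_init("simpsom"))
```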
Hi, this code is very useful for quickly building a SOM on a preliminary dataset. I like it a lot, as the package can also be easily modified to change the graphics for my particular use. I wanted to ask how your code should be properly cited in a paper. Is there a publication of yours that I should cite, and if not, what should I add to the acknowledgements section of my paper? Thanks!