
fcomitani / simpsom


Python library for Self-Organizing Maps

License: GNU General Public License v3.0

Python 100.00%
clustering dimensionality-reduction kohonen python self-organizing-map

simpsom's People

Contributors

akshgpt7, andrea-tango, fcomitani, florentf9, joaogabrielm, rajatjain1997, richardscottoz


simpsom's Issues

Cloning of repository fails due to too long filenames.

I tried to clone the simpsom repository, but got errors because of overly long filenames in the tests/ground_truth/ folder.
I'm working with an encrypted home folder, for which the maximum filename length is 143 characters.

As a workaround, I forked the simpsom repository, cloned it to an unencrypted folder, deleted the ground_truth files, and pushed it back to my fork. Then I was able to clone the repository to an encrypted folder.

Here are the errors when cloning (git's German message "Der Dateiname ist zu lang" means "File name too long"):

me@comp:~/tmp$ git clone https://github.com/fcomitani/simpsom.git
Cloning into 'simpsom'...
remote: Enumerating objects: 1414, done.
remote: Counting objects: 100% (217/217), done.
remote: Compressing objects: 100% (85/85), done.
remote: Total 1414 (delta 154), reused 165 (delta 130), pack-reused 1197
Receiving objects: 100% (1414/1414), 34.45 MiB | 15.38 MiB/s, done.
Resolving deltas: 100% (733/733), done.
error: unable to create file tests/ground_truth/som_clusters_29793102587501614832761962817496381948754187354139510906297685347436709277089543135748392401212717654282462446534086383947023366030253119052733995242201873785323240739807453510.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_clusters_342008044776077809952382379377725389002657731782690276593972840565627274789710108763094526877518194116991217137317838936203404537258310.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_clusters_342008044776173808293820246848172717185750701891546008523394046430801193894526052825594273532638348990588789776175654503162559421243718.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_clusters_6990909199294242919332708782990481955635203273555594435047935561554494072389185531750095764725676280630374101219996805343174917845436987668653086546229359531757587021810123956550.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_clusters_80251701864081417831371931036709008182515405432296624554309851309220089166275703141148945731290811019842586449396221097154566387051290950.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_clusters_80251701864081513829713368904179455510698498402405480286239272515085263085380519085211445477945931174716184022035078912721525541935276358.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_projected_29793102587501614832761962817496381948754187354139510906297685347436709277089543135748392401212717654282462446534086383947023366030253119052733995242201873785323240739807453510.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_projected_342008044776077809952382379377725389002657731782690276593972840565627274789710108763094526877518194116991217137317838936203404537258310.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_projected_342008044776173808293820246848172717185750701891546008523394046430801193894526052825594273532638348990588789776175654503162559421243718.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_projected_6990909199294242919332708782990481955635203273555594435047935561554494072389185531750095764725676280630374101219996805343174917845436987668653086546229359531757587021810123956550.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_projected_80251701864081417831371931036709008182515405432296624554309851309220089166275703141148945731290811019842586449396221097154566387051290950.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/som_projected_80251701864081513829713368904179455510698498402405480286239272515085263085380519085211445477945931174716184022035078912721525541935276358.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_1224543790650657313213549472991040535904219846983620675164315761268632417486417738025024652371253533972999132316224697404960185610836.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_1224543790650657313213549472991040535904316229697346149481683466457230054787165196947898727118335156616273920126621428309896238752070.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_1224543790650657313213549472991040535904316229697346149481683466457230054787165196947898727118335234824610301771193443025591640809798.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_1335968924906787723998553143006053779321110122674337069737184616588407112905780651649079739801479596597171496942181683254327081984326.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_1335968924906787723998553143006053779321110122674337069737184616588407112905780651649079739801479674805507878586753697970022484042054.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_1346400136541297592615788464317873357020832176068851985856552095957119374614286985032559191109006013108952827969596804999741666077473546408059206.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_18685055399332539569298545425278328489749448348749094774846126728342169455610559414236305619044411867761030363303200024846688582.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_18685055399332539569298545425278328489749448348749094774846126728342169456274684723282236516895166603137453243205118281426690374.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_18685055399332539569298545425278328489750919032247099059558511844534599738236183038460742443149308498859269490750211552227123526.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_20385268019207576354958391464325771779568238790711341781361650770101963224748689817883955002660751240165972540340344160735556180.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_20544435677204858285763373635056853135622468717937993960553728025241249173361883988934768232047196198044068854409077774513251429182268531014.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_22413839222448957511672109767691453563286597887893850675787809544380889229156669641337366995873220311703611193242375012091195018934785696070.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_22413839222448957511672147934829851706974254468168255641494287828126447755230133213742654659999254827175997758192823410038977598027450835270.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_29793102587501614832761962817496381948754187354139510906297685347436709277089543135748392401212717654282462446534086383947023366030253119052733995242201873785323240739807453510.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_313483210406568272182668665085706377191480280827806892842064834884769898876522940934406311007040904697087777872953522535669807482429766.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_313483210406568272182668665085706377191480280827806892842064834884769898876522940934406311007040904775296114254598094550385502884487494.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_342008044776077809952382379377725389002657731782690276593972840565627274789710108763094526877518194116991217137317838936203404537258310.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_342008044776173808293820246848172717185750701891546008523394046430801193894526052825594273532638348990588789776175654503162559421243718.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_4783374182229130129740427628871252093376235272255258396412826040848554901512364050577729402805996705532320000494614954335532315220.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_5218628612917139546869348214867397575473086416696629178660877408548465284788205670504217733599529674207701159930397200212215296596.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_5218628612917139546869348214867397575569469130422103496028582597146102585535664593378292480681152317482488970327128105148268437830.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_5218628612917139546869348214867397575569469130422103496028582597146102585535664593378292480681230525818870614899142820843670495558.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_5259375533364443721155423650574554402718938031188775047500175620866119703749582709011776511528302778938456220334561858942689831440974428791110.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_5259375533364443721155423688741692800862625687769049452465882099149865262275656172584181799192428813453928606899512307340637614020067093930310.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_5737942840946933122988069871316442036985409143826399469508816258754238097057918953726770108704046100504160029419017113508415153431397775728966.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_6990909199294242919332708782990481955635203273555594435047935561554494072389185531750095764725676280630374101219996805343174917845436987668653086546229359531757587021810123956550.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_80251701864081417831371931036709008182515405432296624554309851309220089166275703141148945731290811019842586449396221097154566387051290950.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_80251701864081513829713368904179455510698498402405480286239272515085263085380519085211445477945931174716184022035078912721525541935276358.npy: Der Dateiname ist zu lang
error: unable to create file tests/ground_truth/trained_som_87554059462691240279969178780044740483205306606426785458463137265706929141469772643990323717393234658316474950798540409350505658676175174.npy: Der Dateiname ist zu lang
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

me@comp:~/tmp$ 
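
As a rough illustration of the limit described in this issue, the sketch below flags paths whose final component is longer than a given number of bytes. The 143-character limit comes from the report above; the helper name `overlong_names` and the sample paths are made up for the example and are not part of simpsom.

```python
import posixpath

def overlong_names(paths, limit=143):
    """Return the paths whose file name (last component) exceeds `limit` bytes."""
    return [
        p for p in paths
        if len(posixpath.basename(p).encode("utf-8")) > limit
    ]

# Hypothetical paths: one artificially long name and one short name.
paths = [
    "tests/ground_truth/som_clusters_" + "1" * 150 + ".npy",
    "tests/ground_truth/short.npy",
]
flagged = overlong_names(paths)
print(len(flagged), "file name(s) exceed the limit")
```

A check like this could be run over a file listing before cloning into a filesystem with a shorter name limit.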

Labels

How can I find the value of each data element in a hexagon of the SOM? There doesn't seem to be a function that shows these values. How can I get them? I need the values.

MemoryError

Hi! I just started experimenting with your package for the analysis of some big datasets and have run into problems with memory allocation. For example, the MNIST tutorial is interrupted by the following error:

MemoryError: Unable to allocate 876. GiB for an array with shape (60000, 2500, 784) and data type float64

Creating a small ad hoc dataset, such as 2000 np.array vectors of length 200, gives me a similar error. I'm running on 8 GB of RAM with an Intel(R) Core i5-6300U CPU.

Thanks in advance!
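
The size in the error message can be checked by hand: the array has one float64 (8 bytes) for every combination of sample, node and feature. The sketch below reproduces that arithmetic and shows how much smaller the allocation would be if the samples were processed in batches. The helper `array_gib` and the batch size of 100 are illustrative choices, not anything taken from the library.

```python
def array_gib(shape, itemsize=8):
    """Size in GiB of a dense array with the given shape and bytes per element."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * itemsize / 2**30

# Shape from the error message: 60000 samples x 2500 nodes x 784 features.
full = array_gib((60000, 2500, 784))

# Same computation restricted to a hypothetical batch of 100 samples.
batch = array_gib((100, 2500, 784))

print(f"full array: {full:.0f} GiB, batch of 100: {batch:.2f} GiB")
```

The full array comes out at roughly 876 GiB, matching the error, while a batch of 100 samples needs under 2 GiB.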

pip version doesn't have colnames parameter in nodes_graph

Hi!

I just realized that the version I installed from pip (pip install SimpSOM) doesn't include the colnames parameter, although it is present in the repo code. I made the modification in my local file, but I wanted to let you know.

Thanks for your amazing work!

I think that when updating weights, should not target all nodes.

Hello, first of all, thank you for sharing; it has helped me a lot.

def update_weights(self, inputVec, sigma, lrate, bmu):

	"""Update the node Weights.

	Args:
		inputVec (np.array): A weights vector whose distance drives the direction of the update.
		sigma (float): The updated gaussian sigma.
		lrate (float): The updated learning rate.
		bmu (somNode): The best matching unit.
	"""

	dist=self.get_nodeDistance(bmu)
	gauss=np.exp(-dist*dist/(2*sigma*sigma))  # I think gauss will always > 0
	if gauss>0:
		for i in range(len(self.weights)):
			self.weights[i] = self.weights[i] - gauss*lrate*(self.weights[i]-inputVec[i])

In somNode::update_weights(), the expression 'gauss > 0' will always be true, so the weights of all nodes are changed throughout the training process.

I read some literature, which says that only the node weights near the BMU should be updated, and that this neighborhood gradually shrinks until it contains only the BMU itself.
So I think we should change 'gauss > 0' to 'gauss > x' (0 < x < 1).

Thank you again for sharing and looking forward to your reply.
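
The proposal above can be sketched as a neighbourhood factor with an explicit cutoff. This is only an illustration of the suggested change, not the library's code: the function name `neighbourhood_factor` and the `cutoff` parameter are hypothetical. Note that the Gaussian is strictly positive mathematically, but in floating point it underflows to exactly 0.0 for nodes that are very far from the BMU relative to sigma.

```python
import math

def neighbourhood_factor(dist, sigma, cutoff=0.0):
    """Gaussian neighbourhood weight, set to 0 when it does not exceed `cutoff`."""
    gauss = math.exp(-dist * dist / (2 * sigma * sigma))
    return gauss if gauss > cutoff else 0.0

# With cutoff=0 (the 'gauss > 0' test) a node 3 sigma away is still updated.
print(neighbourhood_factor(3.0, 1.0))

# With a hypothetical cutoff of 0.05 the same node is left untouched.
print(neighbourhood_factor(3.0, 1.0, cutoff=0.05))

# Very distant nodes underflow to 0.0 even without a cutoff.
print(neighbourhood_factor(100.0, 1.0))
```

Multiplying the weight update by this factor leaves nodes outside the cutoff unchanged, so the effective neighbourhood shrinks as sigma decreases.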

net.project() function is slow

Hey @fcomitani team,

I mainly use your library for clustering, but one function takes a very long time. My code is below:

x_train=df_kmeans.drop(columns=['LTV','sqrt_LTV']) 
net = sps.somNet(30, 30, x_train.values, PBC=True)
net.train(0.1, 20000)
prj=np.array(net.project(x_train.values))

This line (prj=np.array(net.project(x_train.values))) takes around 6-7 hours for around 7 million rows. Can you help me speed it up? My current system is an AWS instance with 32 GB RAM and a 4-core CPU.
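
One possible way to speed up a projection like this is to find the best matching unit for many rows at once with NumPy instead of looping over rows in Python. The sketch below is not simpsom's API: `bmu_indices` is a hypothetical helper, it assumes the trained node weights are available as a 2-D array of shape (n_nodes, n_features), and it assumes Euclidean distance in data space.

```python
import numpy as np

def bmu_indices(data, weights, chunk_size=10000):
    """Index of the nearest weight vector for each row of `data` (Euclidean)."""
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    out = np.empty(len(data), dtype=np.int64)
    w_sq = (weights ** 2).sum(axis=1)  # |w|^2 for every node
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # |x - w|^2 = |x|^2 - 2 x.w + |w|^2; |x|^2 is constant per row,
        # so it can be dropped without changing the argmin.
        scores = w_sq[None, :] - 2.0 * chunk @ weights.T
        out[start:start + chunk_size] = scores.argmin(axis=1)
    return out

# Toy example with three 2-D weight vectors.
weights = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
data = np.array([[0.1, 0.0], [4.0, 4.5], [1.2, 0.9]])
print(bmu_indices(data, weights, chunk_size=2))
```

Processing in chunks keeps the intermediate score matrix small: for a 30x30 map, a chunk of 10000 rows needs about 72 MB of float64 scores.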

problem due cyclical error?

Hi there!

I am trying to reproduce a simple example, but I am having trouble initializing SOMNet.

To reproduce

import simpsom as sps
import numpy as np

data = np.random.rand(20, 20)
net = sps.SOMNet(20, 20, data, topology='hexagonal',
                init='PCA', metric='cosine',
                neighborhood_fun='gaussian', PBC=True,
                random_seed=32, GPU=False, CUML=False,
                output_path="./")

and here is the error I get

Traceback (most recent call last):
  File "/home/lucas/mba/python/simpsom.py", line 1, in <module>
    import simpsom as sps
  File "/home/lucas/mba/python/simpsom.py", line 5, in <module>
    net = sps.SOMNet(20, 20, data, topology='hexagonal',
AttributeError: partially initialized module 'simpsom' has no attribute 'SOMNet' (most likely due to a circular import)
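
Both frames of the traceback point at the same file, /home/lucas/mba/python/simpsom.py, which means the script itself is named simpsom.py and `import simpsom` resolves to the script rather than to the installed package. A quick check for this kind of name clash can be sketched as follows; the helper `shadows_module` is made up for illustration.

```python
from pathlib import PurePosixPath

def shadows_module(script_path, module_name):
    """True if a script's file name would shadow `import module_name`."""
    return PurePosixPath(script_path).stem == module_name

# Path taken from the traceback above.
print(shadows_module("/home/lucas/mba/python/simpsom.py", "simpsom"))

# A hypothetical renamed script no longer clashes with the package name.
print(shadows_module("/home/lucas/mba/python/som_example.py", "simpsom"))
```

Renaming the script to something other than the package name is the usual way out of this error.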

Bug in learning rate?

Should the _update_learning_rate() in network.py be self.learning_rate = self.start_learning_rate * self.xp.exp(-n_iter / self.epochs) instead? Thanks!
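
For reference, the formula proposed in this question is a plain exponential decay of the learning rate. The standalone sketch below evaluates it outside the library; the function name `exp_decay` is illustrative, and the parameter names simply mirror the attributes quoted in the question.

```python
import math

def exp_decay(start_learning_rate, n_iter, epochs):
    """Learning rate after `n_iter` iterations under exponential decay."""
    return start_learning_rate * math.exp(-n_iter / epochs)

# The rate starts at the initial value and falls by a factor of e
# once n_iter reaches the total number of epochs.
for n_iter in (0, 500, 1000):
    print(n_iter, exp_decay(0.1, n_iter, 1000))
```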

run_colorsExample

Figures 2 and 3 of .run_colorsExample() do not appear. I also tried the .project function with another dataset and the result was the same. I am using Python 3.6.5 (64-bit), numpy=1.14.3, matplotlib=2.2.2, and sklearn=0.20.1, and no errors appear either. Does anyone know what to do?

Predicting winning cell for data?

Once the SOM has been trained, is it possible to get the cell to which a data sample belongs? Or for each cell to get a list of samples from a dataset that belong to that cell? I haven't found anything like this in the example.
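
The second half of this question, listing the samples that fall into each cell, can be sketched independently of the library once a winning cell index is known for every sample. The helper `samples_per_cell` and the example indices below are hypothetical; how the winning indices are obtained from a trained map is left open here.

```python
from collections import defaultdict

def samples_per_cell(winning_cells):
    """Group sample indices by the cell index each sample was assigned to."""
    cells = defaultdict(list)
    for sample_index, cell in enumerate(winning_cells):
        cells[cell].append(sample_index)
    return dict(cells)

# Hypothetical winning cell for each of four samples.
winning_cells = [2, 0, 2, 1]
print(samples_per_cell(winning_cells))
```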

Unbound Local Error While training

Hi, I tried training a model and I got the following error.

Training SOM... 0%

UnboundLocalError Traceback (most recent call last)
in ()
----> 1 net.train(0.2, 1000)

~\Anaconda3\lib\site-packages\simpsom-1.3.3-py3.7.egg\SimpSOM\__init__.py in train(self, startLearnRate, epochs)
213 inputVec = self.data[np.random.randint(0, self.data.shape[0]), :].reshape(np.array([self.data.shape[1]]))
214
--> 215 bmu=self.find_bmu(inputVec)
216
217 for node in self.nodeList:

~\Anaconda3\lib\site-packages\simpsom-1.3.3-py3.7.egg\SimpSOM\__init__.py in find_bmu(self, vec)
173 minVal=dist
174 bmu=node
--> 175 return bmu
176
177

UnboundLocalError: local variable 'bmu' referenced before assignment
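
The traceback shows bmu being assigned next to minVal=dist, which suggests the assignment sits inside a comparison on the distance. Under that assumption (the surrounding library code is not shown here), one way such a loop can finish without ever assigning bmu is when every distance is NaN, since any comparison with NaN is false. The sketch below is a simplified stand-in, not the library's find_bmu, and raises a clearer error in that case.

```python
def find_bmu(nodes, vec):
    """Index of the node closest to `vec`; `nodes` is a list of weight lists."""
    min_val = float("inf")
    bmu = None
    for index, weights in enumerate(nodes):
        dist = sum((w - v) ** 2 for w, v in zip(weights, vec)) ** 0.5
        # If dist is NaN this comparison is False and bmu stays unset.
        if dist < min_val:
            min_val, bmu = dist, index
    if bmu is None:
        raise ValueError("no valid distance found; check the input for NaN values")
    return bmu

print(find_bmu([[0.0, 0.0], [1.0, 1.0]], [0.9, 1.0]))
```

If this is what happens, checking the training data for NaN values before calling train would be a reasonable first step.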

Parallelization of the code

Hi, first of all, thanks for the code; I'm using it at work for clustering some data.
I'm interested in parallelizing the code, and I see that you have a "TODO" comment above the line "Parallel(n_jobs=self.n_jobs)(delayed(my_func)(c, K, N) for c in inputs)" in the train function. I suppose you have tried it or have some ideas about it. I would like to know which function "my_func" is a placeholder for, and what the parameters c, K, N are, in order to get a better general idea of how to proceed with the parallelization.

Again thanks for the repository!

Cannot locate raw_data or any detailed API.

I've installed the latest SimpSOM using pip and tried following the code presented on GitHub and in the API docs (readthedocs). Unfortunately, raw_data isn't present after import SimpSOM, and I can't find any API documentation (e.g., descriptions, return values, argument values with datatypes) for each method/function. If raw_data is only a placeholder, then what is it: a list, dictionary, array, etc.? I would really like to learn more about this package; is there documentation elsewhere?

Time complexity

I have a question about time complexity. I ran some experiments and got O(n log n) when using DBSCAN for clustering and O(n^2) when using the Quality Threshold algorithm. I tried the first method on Jain, Flame, Compound and t4.8k, and the second method on MNIST. Can you explain the O(n log n) complexity? I think the time complexity for SOM is O(n^2). Thanks

Is diff_graph() an implementation of the U-matrix?

Hi Federico,

I looked into the code and the only difference between the U-matrix and your implementation of diff_graph() seems to be that the former takes an average of the distances with its neighbours while diff_graph() just takes the sum. Is diff_graph() an implementation of the U-matrix?
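
The distinction raised here, summing versus averaging the distances to neighbouring nodes, can be made concrete on a small rectangular grid. The sketch below is not simpsom's diff_graph(): the function `neighbour_distances`, the 4-neighbour rectangular topology and the lack of periodic boundaries are all simplifying assumptions for illustration.

```python
import numpy as np

def neighbour_distances(weights):
    """Per-node sum and mean of distances to the 4 adjacent nodes.

    `weights` has shape (rows, cols, n_features); the grid needs at
    least two nodes so that every node has a neighbour.
    """
    rows, cols, _ = weights.shape
    total = np.zeros((rows, cols))
    count = np.zeros((rows, cols))
    for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
        for r in range(rows):
            for c in range(cols):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    total[r, c] += np.linalg.norm(weights[r, c] - weights[rr, cc])
                    count[r, c] += 1
    return total, total / count

# 2x2 grid with one feature per node: [[0, 1], [2, 3]].
weights = np.arange(4, dtype=float).reshape(2, 2, 1)
summed, averaged = neighbour_distances(weights)
print(summed)
print(averaged)
```

On a grid without periodic boundaries the two maps differ by more than a constant factor, because edge and corner nodes have fewer neighbours than interior ones.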

Node's coordinates in the SOM

Hi @fcomitani:

I am having trouble determining a node's coordinates in the two-dimensional map. If I take the trained net and want to see the centroids of a node, all I need to do is:

#SOM 4X5
net45 = sps.somNet(4, 5, data, PBC=True)
#Train the network for 10000 epochs and with initial learning rate of 0.01.
s45 = net45.train(0.01, 10000)

#Visualize the centroids of each features into the node 0
s45[0]

s45

But if I want only the two coordinates of where this node is located in the map, I can't do it. Can you help me find where to get these two values? I reviewed your code and couldn't find it.

Thanks so much,
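
If the nodes are stored in a flat list, a grid position can be recovered from the list index once the creation order is known. The sketch below assumes the nodes were created with the outer loop over the width (x) and the inner loop over the height (y); this ordering is an assumption that would need to be checked against the library code, and `grid_position` is a made-up helper.

```python
def grid_position(index, height):
    """Map a flat node index to (x, y), assuming x-major creation order."""
    return divmod(index, height)

# For a 4x5 map (width 4, height 5) under the assumed ordering,
# node 0 is at (0, 0) and node 7 is at (1, 2).
print(grid_position(0, 5), grid_position(7, 5))
```

For a hexagonal topology the plotted positions are usually offset on alternating rows, so these integer grid coordinates would only identify the cell, not its exact drawing position.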

Nodes difference MNIST

Hi,

First of all, thank you for sharing the code. Could you share any code for the nodes difference of MNIST? I am interested in the per-class representation, like the last image in your examples. Another question: could you place centroids for the different regions and then calculate the distance between each data point and its centroid? Finally, is it possible to use SimpSOM for prediction? It would be very nice to see where a new point would land. Thanks for everything!

Module Not Found error during import

Hello! Thank you for your work. I tried importing the latest version of simpsom (v2.0.0) and the following error occurred: ModuleNotFoundError: No module named 'simpsom.cluster'. There seems to be a missing __init__.py file in "./simpsom/cluster/". I tried adding it locally and it did the trick for me.

How to reference this package

Hi, this code is pretty useful for quickly building a SOM on a preliminary dataset. I like it a lot, as the package can also be easily modified to change the graphics for my particular use. I wanted to ask how your code should be properly cited in a paper. Is there a publication of yours that I should cite, and if not, what should I add to the acknowledgements section of my paper? Thanks!
