How can I know the value of each element of the data in each hexagon of the SOM?


Labels about simpsom HOT 9 CLOSED

fcomitani commented on May 14, 2024

Labels

from simpsom.

Comments (9)

fcomitani commented on May 14, 2024

Hello @JessitaMS,

at the moment there is no straightforward way to extract the weight values of each node, but I can easily add it to the next version.

In the meantime you can iterate over all the node objects after training the net:

for node in net.nodeList:
    print(node.weights)

node.weights contains the array of weight values of each node, while node.pos contains its position.

Alternatively you could save the weights with net.save and then read them from the .npy file that it writes.
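As a minimal sketch of the loop above, the weights can be gathered into a single NumPy array together with the node positions. Only net.nodeList, node.weights and node.pos come from this thread; the helper name collect_node_weights and the stand-in network built with SimpleNamespace are hypothetical and only there so the snippet runs without a trained SOM.

```python
from types import SimpleNamespace

import numpy as np


def collect_node_weights(net):
    """Stack node.pos and node.weights of every node into two arrays."""
    positions = np.array([node.pos for node in net.nodeList], dtype=float)
    weights = np.array([node.weights for node in net.nodeList], dtype=float)
    return positions, weights


# Hypothetical stand-in for a trained network: 2 nodes, 3 features each.
demo_net = SimpleNamespace(nodeList=[
    SimpleNamespace(pos=[0.0, 0.0], weights=[0.1, 0.2, 0.3]),
    SimpleNamespace(pos=[1.0, 0.0], weights=[0.4, 0.5, 0.6]),
])

positions, weights = collect_node_weights(demo_net)
print(weights.shape)  # (2, 3): one row per node, one column per feature
```

With a real trained net, row i of weights would then correspond to the node at positions[i].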

Hope this helps!

JessitaMS commented on May 14, 2024

Hello @fcomitani,

I have already tried what you recommended, but with this I only get the centroids.
I have reviewed your code and I do not see where exactly the data contained in each cluster is stored. This is really what I need now, since I have already solved the previous problem.
Thank you for your previous response; I hope you can give me an answer to this soon as well.

fcomitani commented on May 14, 2024

Hi,

I'm sorry I didn't understand your question.

If you are looking for the position of each of your points in the final 2D map, these are given as output by net.project.
So, for example, you can run

proj_coor=net.project(raw_data)

and the 2d coordinates of each data point will be stored in proj_coor.
Btw, you can deactivate the plotting function with the flag printout=False if you don't need it.

Then, if you want to know which points belong to each cluster, run

clus=net.cluster(raw_data, type='qthresh')

The output format will be a nested list of clusters. Each sublist contains the indices of the samples (according to your original raw_data order) assigned to that specific cluster.
This information is also saved to a txt file by default (savefile=True; set savefile=False to skip it).
It's a bit counterintuitive at the moment, I guess I could make it a dictionary.
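Until such a dictionary exists, the nested list can be converted by hand. The following is only a sketch: the helper name clusters_to_labels is made up for illustration, and the clus and raw_data values are demo data rather than real net.cluster output.

```python
def clusters_to_labels(clusters):
    """Map each sample index to the id of the cluster (sublist) holding it."""
    labels = {}
    for cluster_id, members in enumerate(clusters):
        for sample_index in members:
            labels[int(sample_index)] = cluster_id
    return labels


# Hypothetical output of net.cluster for six samples in two clusters.
clus = [[0, 2, 5], [1, 3, 4]]
labels = clusters_to_labels(clus)
print(labels[3])  # 1: sample 3 sits in the second sublist

# Recover the raw rows of one cluster (raw_data here is made-up demo data).
raw_data = [[0.0], [1.0], [0.1], [0.9], [1.1], [0.2]]
cluster_0_rows = [raw_data[i] for i in clus[0]]
```

The dictionary answers "which cluster is sample i in?", while indexing raw_data with a sublist answers "which data does cluster k contain?".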

Does this answer your question?

JessitaMS commented on May 14, 2024

Hi,

First, thank you for being so kind as to answer all my questions.

About net.cluster: I was in fact hoping it would give me what I wanted, but it did not. What your function returns is the indices of all the data in a single list.

I thought it would be a list of lists, where each sublist would be one cluster with the points that belong to it ...
I don't know, for example something like this, which is what I did with KMeans.

Hopefully there is a solution within the existing code. Thank you! ;)

fcomitani commented on May 14, 2024

Hi @JessitaMS,

no problem at all!

I think I see what's going on here.

There's a chance that quality threshold is actually failing to separate the different clusters.
Can you check the output plot (ending in _clusters.png) of net.cluster? That may be the case if you see that the points are all plotted with the same color.

I suggest you try different cutoff values for qthresh by adding the cutoff flag to net.cluster.

If you still have problems, and you are not using periodic boundary conditions, you can try other clustering methods (qthresh is quick and easy but not very accurate).
You can choose between type='MeanShift', 'DBSCAN' or 'KMeans'.
Density peak ('dpeak') should also work with PBC.
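A quick way to follow the cutoff suggestion is to loop over a few values and count how many clusters come back. This is a hedged sketch: it relies only on the type, cutoff and savefile flags mentioned in this thread, the helper name count_clusters_per_cutoff is hypothetical, and suitable cutoff values depend on the data.

```python
def count_clusters_per_cutoff(net, data, cutoffs):
    """Run quality-threshold clustering once per cutoff and count clusters."""
    counts = {}
    for cutoff in cutoffs:
        # savefile=False skips the txt output while experimenting.
        clusters = net.cluster(data, type='qthresh', cutoff=cutoff, savefile=False)
        counts[cutoff] = len(clusters)
    return counts
```

For instance, count_clusters_per_cutoff(net, raw_data, [2, 5, 10]) with the trained net (the three values are arbitrary). If every cutoff yields a single cluster, that supports switching to one of the other methods.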

Let me know if that works.

laecheln commented on May 14, 2024

Hello @fcomitani, thanks for SimpSOM.
I'm pretty new to the SOM world and am currently facing the same issue as above. I've done what you suggested, but I get just 1000 values, no matter how big the raw_data dataset is. The network used is 20x20. Have I misunderstood something? Is it not possible to know the correspondence between the raw_data rows and the network layer, and from this also the corresponding cluster?

Thank you in advance.

fcomitani commented on May 14, 2024

Hi @laecheln,

the clustering step is separate from the network training and is applied directly to the data, so there is no direct link between clusters and nodes at the moment. One could in principle assign a cluster back to a node given the position of the data points and possibly infer cluster boundaries on those nodes that have no data point assigned.
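The idea of assigning clusters back to nodes can be sketched outside the library. The example below assumes that each sample belongs to the node whose weight vector is closest in plain Euclidean distance; SimpSOM itself may use a different metric or periodic boundaries, so treat this as an illustration only. All function names and the demo arrays are hypothetical, and node_weights stands for an array built from node.weights.

```python
from collections import Counter, defaultdict

import numpy as np


def best_matching_nodes(data, node_weights):
    """Return, for each data row, the index of the closest node (Euclidean)."""
    data = np.asarray(data, dtype=float)
    node_weights = np.asarray(node_weights, dtype=float)
    # Pairwise distances, shape (n_samples, n_nodes).
    distances = np.linalg.norm(data[:, None, :] - node_weights[None, :, :], axis=2)
    return distances.argmin(axis=1)


def rows_per_node(bmu_indices):
    """Group sample indices by the node they were assigned to."""
    groups = defaultdict(list)
    for sample_index, node_index in enumerate(bmu_indices):
        groups[int(node_index)].append(sample_index)
    return dict(groups)


def majority_cluster_per_node(groups, sample_labels):
    """Label each occupied node with the most common cluster of its samples."""
    return {
        node: Counter(sample_labels[i] for i in members).most_common(1)[0][0]
        for node, members in groups.items()
    }


# Made-up demo: 3 nodes and 4 samples with 2 features each.
node_weights = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
data = [[0.1, 0.0], [0.9, 1.2], [0.0, 0.2], [1.1, 0.8]]
sample_labels = {0: 0, 1: 1, 2: 0, 3: 1}  # sample index -> cluster id

groups = rows_per_node(best_matching_nodes(data, node_weights))
print(groups)  # {0: [0, 2], 1: [1, 3]}; node 2 has no samples
print(majority_cluster_per_node(groups, sample_labels))  # {0: 0, 1: 1}
```

Nodes without any assigned sample (node 2 here) stay unlabeled, which is where the boundary inference mentioned above would come in. The pairwise distance array grows as samples times nodes times features, so a large dataset may need to be processed in chunks.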

I'm not sure, however, that I correctly understand your issue. Are you getting only 1000 samples out of your total dataset in the output txt cluster assignment file? That's surprising. I can look at the code to see if there's a bug hidden in the clustering/saving functions. Would you be able to provide a subsample of your dataset? I would like to try and reproduce your error.

Could you also share the resulting scatter plot with clusters?

Thank you.

laecheln commented on May 14, 2024

Hi @fcomitani,

Thanks for the answer and for clarifying my doubt.

Regarding the issue: exactly. No matter how big the dataset was, I always got at most 1000 samples out. Unfortunately I'm not able to provide a subsample, because the data are not mine. I'm sorry.

I can create the diff_graph with the weight differences between nodes, but for the scatter plot, which I checked just now, I got this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-ba935a4d5212> in <module>
----> 1 prj=np.array(net.project(trained.values))
      2 plt.scatter(prj.T[0],prj.T[1])
      3 plt.show()

AttributeError: 'NoneType' object has no attribute 'values'

Thank you

fcomitani commented on May 14, 2024

Hi @laecheln,

no worries regarding the data, I understand.

I tried a couple of dummy datasets, but I can't seem to reproduce your issue.

Would you mind sharing the steps you are running (from the network set up to the clustering)? I'm wondering if maybe the clustering failed at a certain point and maybe just saves what it can. Did you get any output messages?

What is the shape of your data (specifically how many features do you have)?

The net.project error is telling you that your trained variable is None, so there's definitely something wrong upstream. What is trained in this instance? You are supposed to pass your input data.
