Ah yes, you are right; it will be hard to do, but maybe there is some linear algebra trick.
from pygenstability.
Really cool! Yes, implementing this is on the table, but we need to find the time to have a proper try at it. Hopefully soon!
Hello!
First, the positive answer: you can use weighted!
Second, the negative answer: we cannot use such big networks with Markov stability, because it requires computing a matrix exponential, which is costly in both CPU time and memory.
But @d-schindler just implemented (#68) another variant that computes the matrix exponential with a spectral decomposition. As currently implemented, it will not work either, but we could try something with sparse matrices and compute only the largest eigenvalues to approximate the matrix exponential.
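A minimal sketch of that idea, assuming an undirected graph without isolated nodes. It works with the symmetric normalized Laplacian L_sym = I - D^{-1/2} A D^{-1/2} (the random-walk Laplacian I - D^{-1} A is similar to it, so exp(-t L_rw) = D^{-1/2} exp(-t L_sym) D^{1/2}), keeps only the k slowest modes found with `scipy.sparse.linalg.eigsh`, and stores exp(-t L_sym) in factored n x k form. The function names are hypothetical and not part of PyGenStability.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh


def truncated_heat_kernel(adjacency, t, k):
    """Rank-k factorisation of exp(-t L_sym), with L_sym = I - D^{-1/2} A D^{-1/2}.

    Returns (vecs, weights) with exp(-t L_sym) ~= vecs @ diag(weights) @ vecs.T.
    Assumes an undirected graph (symmetric A) without isolated nodes.
    """
    adjacency = sp.csr_matrix(adjacency, dtype=float)
    degrees = np.asarray(adjacency.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(degrees))
    # Normalized adjacency S: each eigenvalue mu of S gives lambda = 1 - mu of L_sym.
    s = d_inv_sqrt @ adjacency @ d_inv_sqrt
    # Largest mu <=> smallest lambda <=> slowest-decaying modes of exp(-t L_sym).
    mu, vecs = eigsh(s, k=k, which="LA")
    weights = np.exp(-t * (1.0 - mu))
    return vecs, weights


def apply_heat_kernel(vecs, weights, x):
    """Apply the rank-k approximation of exp(-t L_sym) to x in O(n k) work."""
    return vecs @ (weights * (vecs.T @ x))
```

The dropped modes contribute an error of exp(-t lambda_{k+1}) in the 2-norm, where lambda_{k+1} is the smallest discarded Laplacian eigenvalue, so the truncation is most accurate at large Markov times. It only covers the symmetric case, consistent with the caveat about the directed constructor.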
But since you want to use the directed constructor, we are not yet sure how to make the spectral computation work with it.
Do you really want to use matrix exponentials for such a large graph? Did you try 'simple' Louvain or Leiden with modularity? It will not be easy to scale the code to such large networks, so we need to see whether it is worth it before jumping into coding this.
Hello @arnaudon,
Thanks a lot for your answer.
I see the problem with matrix exponentials.
I could try the simplest version of the model. Basically you are suggesting to use the default run option:
pygenstability.pygenstability.run(graph=None, constructor='linearized', min_scale=-2.0, max_scale=0.5, n_scale=20, log_scale=True, scales=None, n_tries=100, with_NVI=True, n_NVI=20, with_postprocessing=True, with_ttprime=True, with_spectral_gap=False, result_file='results.pkl', n_workers=4, tqdm_disable=False, with_optimal_scales=True, optimal_scales_kwargs=None, method='louvain')
which uses the linearized constructor and the Louvain method.
My question, however, is:
Can I still consider directed and weighted networks?
I am basically trying to run the following command now:
all_results = run(adjacency, min_scale=-1, max_scale=1, n_scale=20, n_workers=6)
which I believe corresponds to your suggestion.
I'm not an expert in directed networks, so I'm not sure how modularity works in that case. For weights, it's perfectly fine. There are probably some extensions/adaptations of modularity for directed networks; we can implement one if you want to use it. Also, constructor='linearized' corresponds to modularity with a scaling factor in the null model.
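The "scaling factor in the null model" can be illustrated with the textbook resolution-parameter form of modularity for undirected, possibly weighted graphs, Q = (1/2m) sum_ij [A_ij - gamma k_i k_j / 2m] delta(c_i, c_j). This is a sketch of that standard formula, not PyGenStability's own quality function; the function name is hypothetical, and how gamma maps onto the Markov scale in the linearized constructor is not spelled out here.

```python
import numpy as np
import scipy.sparse as sp


def generalized_modularity(adjacency, labels, gamma=1.0):
    """Q = (1/2m) * sum_ij [A_ij - gamma * k_i k_j / 2m] * delta(c_i, c_j).

    Undirected, possibly weighted graph; labels are integers 0..n_comm-1.
    Uses sparse products, so no dense n x n matrix is built.
    """
    adjacency = sp.csr_matrix(adjacency, dtype=float)
    labels = np.asarray(labels)
    n_nodes = labels.size
    # One-hot community indicator matrix H (n_nodes x n_comm).
    indicator = sp.csr_matrix(
        (np.ones(n_nodes), (np.arange(n_nodes), labels)),
        shape=(n_nodes, labels.max() + 1),
    )
    two_m = adjacency.sum()
    degrees = np.asarray(adjacency.sum(axis=1)).ravel()
    # Edge weight inside each community (each undirected edge counted twice).
    inside = (indicator.T @ adjacency @ indicator).diagonal()
    # Total degree per community; the null-model term is its square over 2m.
    degree_sum = indicator.T @ degrees
    return float((inside.sum() - gamma * np.sum(degree_sum ** 2) / two_m) / two_m)
```

With gamma = 1 this is ordinary modularity; increasing gamma penalises large communities more strongly, which is what lets a scan over the scaling factor move from coarse to fine partitions.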
I understand.
I can set the constructor to linearized.
Also, setting the method to leiden speeds up the process enormously.
You use generalized modularity, so the only question is whether it also works for directed networks or needs some modifications.
Hello @MatteoSerafino, our current implementation of linearized Markov Stability is aimed at undirected graphs. It can be extended to directed graphs, though, and we will try to implement a "linearized directed" constructor ASAP.
Hello @d-schindler,
Thanks for your answer.
That would be great.
@MatteoSerafino, could you try the code Dominik just implemented, to see if it does something reasonable for you?
Hello @arnaudon,
I tested the code as follows:
all_results = run(adjacency, min_scale=-1, max_scale=1, n_scale=30, method='leiden', n_workers=6, constructor='linearized_directed')
As test graphs, I generated modular directed graphs with N nodes and m modules, where p_in is the probability that two nodes in the same module are connected and p_out is the probability of a connection between nodes in different modules.
I ran tests with different combinations of N, m, p_in and p_out, and the results seem reasonable.
These were just naive tests, and deeper checks should be made.
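The exact generator used for these tests isn't given. One plausible reading is a directed planted-partition model, sketched below to reproduce comparable test graphs; the function name, the (near-)equal module sizes and the dense sampling are assumptions made for illustration.

```python
import numpy as np
import scipy.sparse as sp


def directed_planted_partition(n_nodes, n_modules, p_in, p_out, seed=None):
    """Random directed graph with n_modules (near-)equal-sized modules.

    Each ordered pair (i, j), i != j, gets an edge i -> j with probability
    p_in if i and j share a module and p_out otherwise. Sampling is dense,
    so this is only meant for test graphs of moderate size.
    """
    rng = np.random.default_rng(seed)
    labels = np.arange(n_nodes) * n_modules // n_nodes
    same_module = labels[:, None] == labels[None, :]
    probs = np.where(same_module, p_in, p_out)
    edges = rng.random((n_nodes, n_nodes)) < probs
    np.fill_diagonal(edges, False)  # no self-loops
    return sp.csr_matrix(edges.astype(float)), labels
```

Because i -> j and j -> i are drawn independently, the graph is genuinely directed; the returned labels give the planted partition to compare against the detected communities.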
However, for large networks it still gives a memory error, even with the linearized directed version.
The problem is here:
np.ones((n_nodes, n_nodes)) / n_nodes
Ok, I see. If we use Google teleportation, we necessarily obtain a dense matrix and this leads to memory issues in large graphs. So probably we should turn off teleportation (i.e. set alpha=1) for linearized_directed.
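A back-of-the-envelope sketch of why: with teleportation every entry of the transition matrix receives a nonzero share, so a term like np.ones((n_nodes, n_nodes)) / n_nodes must be stored densely at 8 bytes per float64 entry, whereas a CSR matrix without teleportation grows with the number of edges. The helper names are hypothetical, and the CSR estimate assumes 32-bit indices.

```python
import numpy as np


def dense_float64_bytes(n_nodes):
    """Bytes for an n x n float64 array such as np.ones((n, n)) / n."""
    return n_nodes * n_nodes * np.dtype(np.float64).itemsize


def csr_float64_bytes(n_nodes, n_edges, index_bytes=4):
    """Approximate CSR size: float64 values + column indices + row pointers."""
    return n_edges * (np.dtype(np.float64).itemsize + index_bytes) + (n_nodes + 1) * index_bytes


# Node count reported later in this thread for the real network.
n = 179682
print(f"dense: {dense_float64_bytes(n) / 1e9:.1f} GB")  # dense: 258.3 GB
```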
Could you try again now? I set alpha=1 as default. It means that the constructor only works for strongly connected graphs.
I got the latest version, where you also fixed the following:
out_degrees = self.graph.toarray().sum(axis=1).flatten()
Which was causing a memory error.
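The problem with that line is the `toarray()` call, which densifies the whole adjacency matrix just to sum its rows. A minimal sketch of the same row sum and of the alpha=1 transition matrix, kept sparse end to end; these helper names are hypothetical and may differ from the actual patch.

```python
import numpy as np
import scipy.sparse as sp


def out_degrees(graph):
    """Weighted out-degrees (row sums) of a sparse adjacency, without densifying."""
    return np.asarray(graph.sum(axis=1)).ravel()


def transition_matrix(graph):
    """Sparse random-walk matrix D_out^{-1} A (alpha = 1, i.e. no teleportation).

    Rows of nodes with zero out-degree stay zero; such dangling nodes are one
    reason the graph needs to be strongly connected when teleportation is off.
    """
    graph = sp.csr_matrix(graph, dtype=float)
    degrees = out_degrees(graph)
    inv = np.zeros_like(degrees)
    np.divide(1.0, degrees, out=inv, where=degrees > 0)
    return sp.diags(inv) @ graph
```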
It seems to be working fine. I will let it run as follows:
all_results = run(adjacency, min_scale=min_scale_, max_scale=max_scale_, n_scale=n_scale, method='leiden', result_file=pah, n_workers=6, constructor='linearized_directed')
and see if it reaches the end.
Hello @MatteoSerafino,
great to hear it's running now. Yes, I had to use scipy.sparse consistently to make it work.
It would be great if you could let us know whether it works fine, once the run is complete.
Hello @d-schindler,
The simulation went through without problems.
However, given the following parameters:
Is directed: True
Is weighted: True
N° nodes: 179682
n_scale: 30, max_scale: 5, min_scale: -1
I got four optimal partitions, and all of them have at least 49749 communities.
I believe this is because my network is only weakly connected, not strongly connected.
So it seems that if we input a weakly connected graph, the algorithm fails.
What do you think?
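Since the alpha=1 constructor is stated to work only for strongly connected graphs, a first check is whether the input is strongly connected, and if not, one option is to restrict the analysis to the largest strongly connected component. A sketch using `scipy.sparse.csgraph.connected_components`; the helper name is hypothetical, and whether dropping the remaining nodes is acceptable is a modelling choice for the user.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import connected_components


def largest_strongly_connected_component(adjacency):
    """Return (sub_adjacency, node_indices) for the largest SCC of a directed graph."""
    adjacency = sp.csr_matrix(adjacency)
    n_comp, labels = connected_components(adjacency, directed=True, connection="strong")
    sizes = np.bincount(labels, minlength=n_comp)
    nodes = np.flatnonzero(labels == np.argmax(sizes))
    # Keep only rows and columns of nodes in the largest SCC.
    return adjacency[nodes][:, nodes], nodes
```

If `n_comp` is 1, the graph is already strongly connected; a large gap between the SCC size and the total node count is a quick sign that the alpha=1 assumption does not hold for the full network.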
Great to hear! If this is an issue with the optimal scale selection, please try to change the parameters for scale selection, e.g. change kernel size or window size.
# select optimal scales
all_results = identify_optimal_scales(all_results, kernel_size=5, window_size=5, basin_radius=2)
_ = plotting.plot_scan(all_results)
Also, you will probably need to increase the resolution of the MS analysis to get decent results, i.e. increase n_scale. If your partitions are too large, you need to decrease min_scale. It's a bit of trial and error.
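To see how n_scale sets the resolution of the scan: assuming log_scale=True places the scales evenly in log10 between 10**min_scale and 10**max_scale (as `np.logspace` does; check the package source to confirm), consecutive scales differ by a constant factor of 10**((max_scale - min_scale) / (n_scale - 1)).

```python
import numpy as np

# Values from the run reported above; log spacing is an assumption.
min_scale, max_scale, n_scale = -1, 5, 30
scales = np.logspace(min_scale, max_scale, n_scale)
# Constant ratio between neighbouring scales on a log grid.
step = 10 ** ((max_scale - min_scale) / (n_scale - 1))
print(f"{scales[0]:g} .. {scales[-1]:g}, ratio between neighbours: {step:.3f}")
```

Doubling n_scale over the same range roughly halves the log-spacing, which gives the optimal-scale selection more points per plateau to work with.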
Hello @d-schindler,
Thanks a lot.
You mentioned that alpha=1 forces the algorithm onto the strongly connected component. Do you think it would then work properly on a graph that is only weakly connected?
With
We will merge #70 soon. Do you think this suffices to close this issue, @MatteoSerafino?
@d-schindler do you think we can try to implement sparse with alpha<1?
@d-schindler, yes, I think it does.
I would state in the new documentation that for large networks one should use alpha=1, and also specify clearly that this corresponds to focusing on the strongly connected component. As far as I can see, the current version does not work properly if the graph is only weakly connected.
I believe that if you could implement the linearized_directed constructor with alpha<1 for big networks, it would give your package more visibility.
Yes, let's try to make it work with sparse matrices all the way; I'll have a look at it tonight.
@arnaudon , I actually don't think it is theoretically possible to use
One solution might be to apply teleportation only to those nodes that are outside of the LSCC, but this has never been tested before I think.
And @MatteoSerafino ,
Perhaps I am missing something here, but I don't think there should be a problem with having a "directed linearized" version that works for large graphs.
In terms of the constructor:
Teleportation is basically a low-rank correction to a sparse matrix. So you can implement the transition matrix, including teleportation, via a LinearOperator (which defines the matrix-vector products without ever constructing the full matrix).
https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.LinearOperator.html
Such a linear operator can be used to compute eigenvectors as well, and thus you can get the stationary distribution / null model of the Markov stability.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.eigs.html#scipy.sparse.linalg.eigs
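A minimal sketch of this suggestion, assuming the standard PageRank convention: uniform teleportation with probability 1 - alpha, and dangling nodes (no out-links) teleporting uniformly. This is not PyGenStability's linearized_directed code, and the function names are hypothetical. The operator only applies M^T to a vector, which is all `eigs` needs to find the stationary distribution pi.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, eigs


def teleportation_operator(adjacency, alpha=0.85):
    """LinearOperator for M^T, with M = alpha * P + uniform teleportation.

    P = D_out^{-1} A stays sparse; dangling rows teleport uniformly. The
    dense rank-one part is applied on the fly, so no n x n array is allocated.
    """
    adjacency = sp.csr_matrix(adjacency, dtype=float)
    n = adjacency.shape[0]
    out_deg = np.asarray(adjacency.sum(axis=1)).ravel()
    dangling = out_deg == 0
    inv = np.zeros(n)
    inv[~dangling] = 1.0 / out_deg[~dangling]
    p_transpose = (sp.diags(inv) @ adjacency).T.tocsr()

    def matvec(x):
        x = np.ravel(x)
        # Mass spread uniformly: all of it from dangling nodes, 1 - alpha from the rest.
        teleported = (alpha * x[dangling].sum() + (1.0 - alpha) * x.sum()) / n
        return alpha * (p_transpose @ x) + teleported

    return LinearOperator((n, n), matvec=matvec, dtype=float)


def stationary_distribution(adjacency, alpha=0.85):
    """Leading eigenvector of M^T (eigenvalue 1), normalised to sum to 1."""
    _, vecs = eigs(teleportation_operator(adjacency, alpha), k=1, which="LM")
    pi = np.real(vecs[:, 0])
    return pi / pi.sum()
```

Each matrix-vector product costs O(nnz + n), so memory stays linear in the number of edges even though M itself is dense.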
The second aspect is that Louvain/Leiden might not like to be given a LinearOperator as input; however as teleportation allows one to connect to any neighbor, this information does not really have to be encoded as a graph, but can also be passed on in other terms. Due to the low-rank structure some parts of this can probably also be pulled into the null model term (though I have not done the calculations here), as the dominant updates of community structure should follow the "actual links" most of the time.
Ah, here is the linear algebra trick, thanks a lot, Michael! This definitely sounds like a possibility. I was also wondering why we cannot do it, as this is basically pagerank, that works on pretty large graphs. Let's give it a try asap!
Hi all, I have been following this discussion because I was the one who suggested PyGenStability to @MatteoSerafino, and I am very interested in this code.
I had to deal with a similar issue in my code for flow stability, and I did what @michaelschaub mentioned. My issue was not caused by PageRank teleportation but by covariance matrices that have a sparse part and a dense low-rank part (the null model). As @michaelschaub explained, the covariance for the PageRank teleportation can probably also be written like that. Maybe it is useful for you to have a look. I created a new class for covariance matrices that stores things internally in sparse matrices, and implemented the methods I needed in Cython (for the Louvain algorithm).
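The same trick in miniature, as a sketch: a quality matrix split into a sparse part S and a dense low-rank part U V^T (for example a rank-one pi pi^T null-model term) can be evaluated on a partition without materialising U V^T, using tr(H^T U V^T H) = sum((H^T U) * (H^T V)). The function is hypothetical and is not taken from the flow stability code or from PyGenStability.

```python
import numpy as np
import scipy.sparse as sp


def partition_quality(sparse_part, u, v, labels):
    """tr(H^T (S - U V^T) H) for the one-hot partition matrix H of labels.

    S is sparse (n x n); U and V are dense n x r factors (a 1-D array is
    treated as r = 1). The dense n x n matrix U V^T is never formed.
    """
    labels = np.asarray(labels)
    n = labels.size
    h = sp.csr_matrix((np.ones(n), (np.arange(n), labels)), shape=(n, labels.max() + 1))
    u = np.asarray(u).reshape(n, -1)
    v = np.asarray(v).reshape(n, -1)
    # Sparse part: within-community sums of S.
    sparse_term = (h.T @ sp.csr_matrix(sparse_part) @ h).diagonal().sum()
    # Low-rank part: community-aggregated factors, O(n r) work.
    low_rank_term = np.sum((h.T @ u) * (h.T @ v))
    return float(sparse_term - low_rank_term)
```

Louvain-style moves only need such aggregated quantities per community, which is why the low-rank term can be handled through community sums of U and V rather than as explicit edges.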
Anyway, I'm glad that there is an effort to implement Markov Stability and all its variants in Python.