kfoynt / localgraphclustering
License: MIT License
For the Julia setup in the README,
Pkg.add(PyCall) should be Pkg.add("PyCall")
In reading through the notebooks, I noticed that https://github.com/kfoynt/LocalGraphClustering/blob/master/notebooks/examples.ipynb has examples of ACL where the output of approximate personalized PageRank is passed directly into sweep_cut, while the paper (http://www.leonidzhukov.net/hse/2015/networks/papers/andersen06localgraph.pdf, Section 2.4, and others) suggests using degree-normalized PageRank. That seemed like a discrepancy to me, unless I am missing something. Hoping to get your comment/feedback on this.
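To make the question concrete, here is a minimal pure-Python sketch of a sweep cut that can order nodes either by raw score or by score divided by degree. It is not the library's sweep_cut; the function name, the dict-based graph, and the toy scores are assumptions made only for illustration.

```python
def sweep_cut(adj, scores, normalize_by_degree=True):
    """Return (best_prefix, best_conductance) over prefixes of the sorted order.

    adj:    dict node -> set of neighbours (undirected, unweighted graph).
    scores: dict node -> PageRank-style value; nodes that are absent count as 0.
    """
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    total_volume = sum(degree.values())

    def key(u):
        return scores[u] / degree[u] if normalize_by_degree else scores[u]

    # Only nodes with a positive score (and at least one edge) are swept.
    order = sorted(
        (u for u in scores if scores[u] > 0 and degree[u] > 0),
        key=key,
        reverse=True,
    )

    in_set, volume, cut = set(), 0, 0
    best_prefix, best_cond = [], float("inf")
    for i, u in enumerate(order):
        inside = sum(1 for v in adj[u] if v in in_set)
        cut += degree[u] - 2 * inside  # new boundary edges minus absorbed ones
        volume += degree[u]
        in_set.add(u)
        denom = min(volume, total_volume - volume)
        if denom == 0:
            break
        cond = cut / denom
        if cond < best_cond:
            best_cond, best_prefix = cond, order[: i + 1]
    return best_prefix, best_cond


# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
scores = {0: 0.25, 1: 0.25, 2: 0.30, 3: 0.28}  # made-up values, not real PageRank

normalized = sweep_cut(adj, scores, normalize_by_degree=True)
raw = sweep_cut(adj, scores, normalize_by_degree=False)
print(normalized)
print(raw)
```

On this toy input the two orderings pick different sets, which is the kind of difference the question is about. Whether the library normalizes internally before sweeping is something only the maintainers can confirm.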
@dgleich For some reason, the notebook tests are not included in the coverage, even though I have made some changes to test the notebooks locally. How can we fix that?
When I ran this using some updated libraries, I got the warning:
/homes/dgleich/.local/lib/python3.5/site-packages/localgraphclustering/ncpplots.py:18: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
xs = df[group].as_matrix().copy()
We should probably fix that to use .values instead.
When trying our code on a new Mac,
pip3 install pytest
gave me version 5.4.2, which doesn't work with our tests. We need version 4.3.1 (which works in our build script...)
We should probably try and update this.
It looks like the random number generator in proxl1PRrand can only generate at most 32767 numbers, which is too small for large graphs
Hello --
I'm interested in trying to apply the L1-regularized PPR on a directed graph ... do you know if there's an implementation of that somewhere? Or - if not - is there a modification to the undirected algorithm that I could try to implement? I tried playing around w/ the math a bit myself ... but didn't make much progress.
Thanks!
When I install LocalGraphClustering in Mojave I get the following:
In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/wait.h:110:
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:196:2: error: unknown type name 'uint8_t'
uint8_t ri_uuid[16];
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:197:2: error: unknown type name 'uint64_t'
uint64_t ri_user_time;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:198:2: error: unknown type name 'uint64_t'
uint64_t ri_system_time;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:199:2: error: unknown type name 'uint64_t'
uint64_t ri_pkg_idle_wkups;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:200:2: error: unknown type name 'uint64_t'
uint64_t ri_interrupt_wkups;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:201:2: error: unknown type name 'uint64_t'
uint64_t ri_pageins;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:202:2: error: unknown type name 'uint64_t'
uint64_t ri_wired_size;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:203:2: error: unknown type name 'uint64_t'
uint64_t ri_resident_size;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:204:2: error: unknown type name 'uint64_t'
uint64_t ri_phys_footprint;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:205:2: error: unknown type name 'uint64_t'
uint64_t ri_proc_start_abstime;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:206:2: error: unknown type name 'uint64_t'
uint64_t ri_proc_exit_abstime;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:210:2: error: unknown type name 'uint8_t'
uint8_t ri_uuid[16];
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:211:2: error: unknown type name 'uint64_t'
uint64_t ri_user_time;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:212:2: error: unknown type name 'uint64_t'
uint64_t ri_system_time;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:213:2: error: unknown type name 'uint64_t'
uint64_t ri_pkg_idle_wkups;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:214:2: error: unknown type name 'uint64_t'
uint64_t ri_interrupt_wkups;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:215:2: error: unknown type name 'uint64_t'
uint64_t ri_pageins;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:216:2: error: unknown type name 'uint64_t'
uint64_t ri_wired_size;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:217:2: error: unknown type name 'uint64_t'
uint64_t ri_resident_size;
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make: *** [../sweepcut.o] Error 1
However, the notebooks still run normally.
A student of mine is working on the SimpleLocal code. He mentioned to me the following:
"There is a bug in STAGEFLOW method in SimpleLocal.cpp. When the graph is being constructed at each stage, 4 edges are added between vertices instead of two (i.e., two edges are added from u to v and two edges are added from v to u)."
You might want to have a look at this.
Hello,
AttributeError Traceback (most recent call last)
in ()
1 # Read graph. This also supports gml format.
----> 2 g = lgc.GraphLocal('datasets/JohnsHopkins.graphml','graphml')
3
4 # To get a quick look at the list of methods and attributes for the graph object 'g' you can type 'g.' + tab
5 # and scroll up or down.
AttributeError: module 'localgraphclustering' has no attribute 'GraphLocal'
Issue #96 may describe a similar problem to mine, but it contains little information, so I will provide some details here.
I am running python3.6 in jupyter notebook, with macOS 10.14.5. The error message is shown below:
AttributeError Traceback (most recent call last)
<ipython-input-3-8da92a9ab2a6> in <module>
----> 1 import localgraphclustering as lgc
/anaconda3/lib/python3.6/site-packages/localgraphclustering/__init__.py in <module>
----> 1 from .GraphLocal import GraphLocal
2 from .GraphDrawing import GraphDrawing
3 from .fiedler import fiedler, fiedler_local
4 from .approximate_PageRank import approximate_PageRank
5 from .approximate_PageRank_weighted import approximate_PageRank_weighted
/anaconda3/lib/python3.6/site-packages/localgraphclustering/GraphLocal.py in <module>
8 import warnings
9 import collections as cole
---> 10 from .cpp import *
11 import random
12
/anaconda3/lib/python3.6/site-packages/localgraphclustering/cpp/__init__.py in <module>
41 from .MQI_cpp import *
42 from .proxl1PRaccel import *
---> 43 from .proxl1PRrand_cpp import *
44 from .SimpleLocal_cpp import *
45 from .sweepcut_cpp import *
/anaconda3/lib/python3.6/site-packages/localgraphclustering/cpp/proxl1PRrand_cpp.py in <module>
50
51 _graphlib_funs_proxl1PRrand64 = _setup_proxl1PRrand_args(
---> 52 'int64','int64', _graphlib.proxl1PRrand64)
53 _graphlib_funs_proxl1PRrand32 = _setup_proxl1PRrand_args(
54 'uint32','uint32', _graphlib.proxl1PRrand32)
/anaconda3/lib/python3.6/ctypes/__init__.py in __getattr__(self, name)
359 if name.startswith('__') and name.endswith('__'):
360 raise AttributeError(name)
--> 361 func = self.__getitem__(name)
362 setattr(self, name, func)
363 return func
/anaconda3/lib/python3.6/ctypes/__init__.py in __getitem__(self, name_or_ordinal)
364
365 def __getitem__(self, name_or_ordinal):
--> 366 func = self._FuncPtr((name_or_ordinal, self))
367 if not isinstance(name_or_ordinal, int):
368 func.__name__ = name_or_ordinal
AttributeError: dlsym(0x7fec89d9ade0, proxl1PRrand64): symbol not found
It seems to me that something is going wrong with the C library, but I'm not familiar with this. Could anyone help? Thanks in advance.
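One way to narrow down a "symbol not found" error like the one above is to load the shared library yourself and check which exported names resolve. The sketch below uses only the standard ctypes module; the helper name missing_symbols is hypothetical and is not part of localgraphclustering.

```python
import ctypes


def missing_symbols(lib, names):
    """Return the names from `names` that cannot be resolved in `lib`."""
    missing = []
    for name in names:
        try:
            # ctypes looks the symbol up lazily on attribute access and
            # raises AttributeError when the lookup fails.
            getattr(lib, name)
        except AttributeError:
            missing.append(name)
    return missing
```

For example, after `lib = ctypes.CDLL(path_to_the_compiled_library)`, calling `missing_symbols(lib, ["proxl1PRrand64", "proxl1PRrand32"])` lists whichever of those two names (taken from the traceback) the compiled library does not export. A non-empty result would suggest that the installed binary was built from a different version of the sources than the Python wrappers, though that is only one possible explanation.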
@MengLiuPurdue We should have a routine to compute conductance and conductance for weighted graphs in C++. These will be much faster than our current ones :)
Probably the best thing to do is to write: cut_value and weighted_cut value in C++ that take in an array of vertex ids.
Then we can change set_scores in GraphLocal to use the C++ code to compute the cut, and everything should be faster. At the moment, computing these conductance scores takes a substantial fraction of the total time of an NCP (around 50%).
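As a reference for what such a routine would compute, here is a small Python sketch of cut, volume, and conductance for a set of vertex ids on a weighted graph in CSR form. The function name, the CSR argument layout, and the convention of returning 1.0 when the set or its complement has zero volume are assumptions, not the library's actual interface.

```python
def cut_and_conductance(indptr, indices, weights, vertex_ids):
    """Cut weight, volume, and conductance of a vertex set.

    The graph is undirected and stored in CSR form, with every edge
    present in both directions: the neighbours of u are
    indices[indptr[u]:indptr[u + 1]], with matching entries in weights.
    """
    in_set = set(vertex_ids)
    total_volume = float(sum(weights))
    volume = 0.0
    cut = 0.0
    for u in in_set:
        for k in range(indptr[u], indptr[u + 1]):
            volume += weights[k]
            if indices[k] not in in_set:
                cut += weights[k]  # edge leaves the set
    denom = min(volume, total_volume - volume)
    # Assumed convention: degenerate sets get conductance 1.0.
    conductance = cut / denom if denom > 0 else 1.0
    return cut, volume, conductance


# Weighted path 0 -1- 1 -2- 2 -1- 3 (edge weights 1, 2, 1).
indptr = [0, 1, 3, 5, 6]
indices = [1, 0, 2, 1, 3, 2]
weights = [1.0, 1.0, 2.0, 2.0, 1.0, 1.0]

result = cut_and_conductance(indptr, indices, weights, [0, 1])
print(result)
```

The loop touches only the edges incident to the set, which is the property that would make a compiled version cheap for the small sets produced by local methods.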
I am using spectral_clustering and the best conductance values I'm getting are all negative. Am I doing something wrong? Using nibble btw
We have a number of common visualization patterns that we'd like to do. I've been using NetworkX for this, but this seems like overkill as there is a big translation between their ids and our ids, which makes the process slightly tedious and error-prone.
G = GraphLocal()
Here xy and xyz are arrays with one row per vertex, holding 2 or 3 coordinates respectively.
G.draw(xy)
G.draw(xyz)
G.draw(xy, nodemarkersize=0)
G.draw(xy, set=S)
G.draw(xy, values=f)
G.draw(xyz, groups=g) # this is a partition.
Parameters
G.draw(coords, ...)
coords: a n-by-2 or n-by-3 array with coordinates for each node of the graph.
Optional parameters:
alpha: [0, 1] the overall alpha scaling of the plot
nodealpha: [0, 1]
edgealpha:
setalpha:
nodecolor:
edgecolor:
setcolor:
nodesize:
linewidth:
ax=None (default) will create a new figure, or this will plot in ax if not None.
Return a dictionary with:
fig, ax, nodes, edges, setnodes, setedges, groupnodes, groupedges
these are the handles to the actual plot elements, so that you could change
values after the fact.
"""
For the following example, I get the entire set of nodes of the graph back from CRD. This seems like a bug, in that we shouldn't ever get the entire graph back (all but one node, sure, but not the entire graph).
S = lgc.flow_clustering(helper.lgc_data("ASTRAL"), [398], method="crd", U=3, h=10, w=2)
g.set_scores(S[0])
{'cond': 1,
'cut': 0.0,
'edgeseff': 0.0,
'edgestrue': 314428.0,
'isop': 1,
'sizeeff': 0,
'sizetrue': 1049,
'voleff': 0.0,
'voltrue': 314428.0}
Is there a reason to believe that the following two approaches would yield significantly different results (though I recognize that approach 2 would be slower)? I seem to get quite different results, but perhaps it's a bug on my side.
Approach 1:
Use ACL + sweep cut from local graph clustering library
Approach 2:
personalized pagerank using say, networkx
Normalize pagerank values by degree
run sweep_cut
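For comparison purposes, Approach 2 can be sketched without any dependencies. The code below uses plain power iteration instead of networkx, assumes a non-lazy random walk, and treats alpha as the teleportation probability. Those conventions differ between implementations (in networkx, the `alpha` argument of `pagerank` is the damping factor, that is, one minus the teleportation probability), so mismatched parameters are one thing worth ruling out before suspecting a bug. All names here are hypothetical.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iterations=200):
    """Power iteration for p = alpha * s + (1 - alpha) * p * D^-1 A.

    adj:  dict node -> set of neighbours; every node is assumed to have
          at least one neighbour (no dangling nodes).
    seed: the single node carrying all of the teleportation mass.
    """
    p = {u: 0.0 for u in adj}
    p[seed] = 1.0
    for _ in range(iterations):
        nxt = {u: 0.0 for u in adj}
        nxt[seed] += alpha
        for u, nbrs in adj.items():
            share = (1.0 - alpha) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        p = nxt
    return p


def degree_normalized(p, adj):
    """Divide each PageRank value by the degree of its node."""
    return {u: p[u] / len(adj[u]) for u in p}


# Toy graph (hypothetical): two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
p = personalized_pagerank(adj, seed=0)
q = degree_normalized(p, adj)
ranking = sorted(q, key=q.get, reverse=True)  # order a sweep would follow
print(ranking)
```

Besides parameter conventions, the approximation tolerance in ACL leaves many nodes with a value of exactly zero, so its sweep only ever considers a small neighbourhood of the seed, whereas an exact vector ranks every node.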
I am using approximate_PageRank_weighted for clustering weighted graphs and was wondering whether the parameter 'epsilon' can be changed in the code. (The unweighted version, approximate_PageRank, has this parameter in its settings.)
Is there any way to pick clusters based on local minimum conductance?
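If the conductance of every sweep prefix is available as a list, local minima can be picked out with a few lines of Python. This is only a sketch under that assumption; the function name and the sample profile are made up, and nothing here claims the library exposes the sweep profile in this form.

```python
def local_minima(values):
    """Indices whose value is strictly lower than each existing neighbour."""
    minima = []
    n = len(values)
    for i, v in enumerate(values):
        left_ok = i == 0 or v < values[i - 1]
        right_ok = i == n - 1 or v < values[i + 1]
        if left_ok and right_ok:
            minima.append(i)
    return minima


# Hypothetical conductance of the prefixes of size 1, 2, ..., 7.
profile = [1.0, 0.5, 0.14, 0.5, 0.4, 0.3, 0.6]
minima = local_minima(profile)
print(minima)
```

Each returned index i corresponds to the cluster made of the first i + 1 nodes in the sweep order, so several candidate clusters can be inspected instead of only the global minimum.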
from localgraphclustering import *
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\gerra\Anaconda3\lib\site-packages\localgraphclustering\__init__.py", line 1, in <module>
from .GraphLocal import GraphLocal
File "C:\Users\gerra\Anaconda3\lib\site-packages\localgraphclustering\GraphLocal.py", line 10, in <module>
from .cpp import *
File "C:\Users\gerra\Anaconda3\lib\site-packages\localgraphclustering\cpp\__init__.py", line 33, in <module>
graphlib = load_library()
File "C:\Users\gerra\Anaconda3\lib\site-packages\localgraphclustering\cpp\__init__.py", line 24, in load_library
lib=ctypes.cdll.LoadLibrary(find_path())
File "C:\Users\gerra\Anaconda3\lib\ctypes\__init__.py", line 434, in LoadLibrary
return self._dlltype(name)
File "C:\Users\gerra\Anaconda3\lib\ctypes\__init__.py", line 356, in __init__
self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found.
I could not find libgraph.dll in graph_lib_test. Is it missing?
I am using Python 3.7 on macOS.
AttributeError Traceback (most recent call last)
in
----> 1 from localgraphclustering import *
~/anaconda3/lib/python3.7/site-packages/localgraphclustering/__init__.py in <module>
----> 1 from .GraphLocal import GraphLocal
2 from .GraphDrawing import GraphDrawing
3 from .fiedler import fiedler, fiedler_local
4 from .approximate_PageRank import approximate_PageRank
5 from .approximate_PageRank_weighted import approximate_PageRank_weighted
~/anaconda3/lib/python3.7/site-packages/localgraphclustering/GraphLocal.py in <module>
8 import warnings
9 import collections as cole
---> 10 from .cpp import *
11 import random
12
~/anaconda3/lib/python3.7/site-packages/localgraphclustering/cpp/__init__.py in <module>
41 from .MQI_cpp import *
42 from .proxl1PRaccel import *
---> 43 from .proxl1PRrand_cpp import *
44 from .SimpleLocal_cpp import *
45 from .sweepcut_cpp import *
~/anaconda3/lib/python3.7/site-packages/localgraphclustering/cpp/proxl1PRrand_cpp.py in <module>
50
51 _graphlib_funs_proxl1PRrand64 = _setup_proxl1PRrand_args(
---> 52 'int64','int64', _graphlib.proxl1PRrand64)
53 _graphlib_funs_proxl1PRrand32 = _setup_proxl1PRrand_args(
54 'uint32','uint32', _graphlib.proxl1PRrand32)
~/anaconda3/lib/python3.7/ctypes/__init__.py in __getattr__(self, name)
367 if name.startswith('__') and name.endswith('__'):
368 raise AttributeError(name)
--> 369 func = self.__getitem__(name)
370 setattr(self, name, func)
371 return func
~/anaconda3/lib/python3.7/ctypes/__init__.py in __getitem__(self, name_or_ordinal)
372
373 def __getitem__(self, name_or_ordinal):
--> 374 func = self._FuncPtr((name_or_ordinal, self))
375 if not isinstance(name_or_ordinal, int):
376 func.__name__ = name_or_ordinal
Does this package work on Windows 10? I installed it on Python 3.7.4 (with pip3). When I tried to import it in Jupyter, I got this error:
OSError Traceback (most recent call last)
...
~\Anaconda3\envs\newenvt\lib\ctypes\__init__.py in LoadLibrary(self, name)
440
441 def LoadLibrary(self, name):
--> 442 return self._dlltype(name)
443
444 cdll = LibraryLoader(CDLL)
~\Anaconda3\envs\newenvt\lib\ctypes\__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
362
363 if handle is None:
--> 364 self._handle = _dlopen(self._name, mode)
365 else:
366 self._handle = handle
OSError: [WinError 126] The specified module could not be found
Can we get CRD to return the labels and levels of each node so that we can (if we want) post-process the result with a sweep-cut to make it more like the spectral algorithms?
The error "name 'sys' is not defined" pops up in many notebooks, e.g. image_segmentation_using_local_graph_clustering_and_gPb.ipynb.
When I import sys, the following error pops up: dlsym(0x7fdd03323660, MQI_weighted64): symbol not found
Are you able to explain what the check_symmetry method in graph_class_local is doing? It looks like it's checking to see if reciprocal edges exist for the first 200 edges in the graph -- is that right?
If so, is there any reason why you're only checking the first 200 edges and not all edges? It seems like it'd be better to do the check via something like:
set(edges) == set([(e[1], e[0]) for e in edges])
EDIT: Also, it looks like graph_class_local assumes all graphs are undirected? Is that right?
~ Ben
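The full check suggested above can be written as a short, runnable helper. This is a sketch of the proposal in the post, not the library's check_symmetry; the function name is hypothetical, and it compares structure only (edge weights are ignored).

```python
def is_symmetric(edges):
    """True if every directed edge (u, v) has its reverse (v, u) in the list."""
    edge_set = set(edges)
    return all((v, u) in edge_set for (u, v) in edge_set)


symmetric_edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
one_way_edges = [(0, 1), (1, 0), (1, 2)]
print(is_symmetric(symmetric_edges), is_symmetric(one_way_edges))
```

Building the set costs time and memory proportional to the number of edges, which may be why a sampled check was chosen for large graphs, but that is a guess rather than something the source states.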