kfoynt / localgraphclustering

License: MIT License

Shell 0.01% Jupyter Notebook 98.18% Python 0.21% C++ 0.18% MATLAB 0.01% C 0.01% Makefile 0.01% HTML 1.42%
graph-algorithms jupyter-notebook graph visualization python julia

localgraphclustering's People

Contributors

bkj, chesterhu, dgleich, kfoynt, mengliupurdue

localgraphclustering's Issues

ACL: Degree normalized pagerank for sweepcut

In reading through the notebooks, I noticed that https://github.com/kfoynt/LocalGraphClustering/blob/master/notebooks/examples.ipynb has examples of ACL where the output of approximate personalized PageRank is passed directly into sweep_cut, while the paper (http://www.leonidzhukov.net/hse/2015/networks/papers/andersen06localgraph.pdf, Section 2.4, and others) suggests using degree-normalized PageRank. That seems like a discrepancy to me, unless I am missing something. I would appreciate your comments on this.
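To make the question concrete, here is a minimal pure-Python sketch of a sweep cut that orders vertices by p[u] / d(u) rather than by p[u]. It is not the library's sweep_cut; the function names, the adjacency-list input, and the `degree_normalize` flag are assumptions made for illustration, and vertices in the support are assumed to have nonzero degree.

```python
def conductance(adj, nodes):
    """Conductance of a vertex set in an unweighted, undirected graph.

    adj is a list of neighbor lists (an illustrative input format).
    """
    in_set = set(nodes)
    vol_set = sum(len(adj[u]) for u in in_set)
    vol_total = sum(len(nbrs) for nbrs in adj)
    cut = sum(1 for u in in_set for v in adj[u] if v not in in_set)
    denom = min(vol_set, vol_total - vol_set)
    return cut / denom if denom > 0 else 1.0


def sweep_cut(adj, scores, degree_normalize=True):
    """Scan prefixes of the sorted support of scores and keep the best one."""
    support = [u for u, p in enumerate(scores) if p > 0]
    if degree_normalize:
        key = lambda u: scores[u] / len(adj[u])  # degree-normalized ordering
    else:
        key = lambda u: scores[u]                # raw ordering
    order = sorted(support, key=key, reverse=True)
    best_set, best_cond = [], float("inf")
    for k in range(1, len(order) + 1):
        phi = conductance(adj, order[:k])
        if phi < best_cond:
            best_set, best_cond = order[:k], phi
    return best_set, best_cond


# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
scores = [0.3, 0.3, 0.3, 0.1, 0.0, 0.0]  # made-up scores, not real PPR output
best_set, best_cond = sweep_cut(adj, scores)
# best_set is the left triangle, with conductance 1/7
```

Calling `sweep_cut(adj, scores, degree_normalize=False)` on the same input shows whether the ordering, and therefore the returned set, changes for a given score vector.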

Small code update for as_matrix() to .values

When I ran this using some updated libraries, I got the warning:

/homes/dgleich/.local/lib/python3.5/site-packages/localgraphclustering/ncpplots.py:18: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  xs = df[group].as_matrix().copy()

We should probably fix that to use .values !

bash run_tests.sh fails on pytest-5.4.2

When trying our code on a new Mac,

pip3 install pytest

gave me version 5.4.2, which doesn't work with our tests. We need version 4.3.1 (which works in our build script...)

We should probably try and update this.

L1-regularized PPR on directed graphs?

Hello --

I'm interested in trying to apply the L1-regularized PPR on a directed graph ... do you know if there's an implementation of that somewhere? Or - if not - is there a modification to the undirected algorithm that I could try to implement? I tried playing around w/ the math a bit myself ... but didn't make much progress.

Thanks!

Error: unknown type name "uint64_t"

When I install LocalGraphClustering on macOS Mojave, I get the following:

In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/wait.h:110:
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:196:2: error: unknown type name 'uint8_t'
uint8_t ri_uuid[16];
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:197:2: error: unknown type name 'uint64_t'
uint64_t ri_user_time;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:198:2: error: unknown type name 'uint64_t'
uint64_t ri_system_time;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:199:2: error: unknown type name 'uint64_t'
uint64_t ri_pkg_idle_wkups;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:200:2: error: unknown type name 'uint64_t'
uint64_t ri_interrupt_wkups;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:201:2: error: unknown type name 'uint64_t'
uint64_t ri_pageins;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:202:2: error: unknown type name 'uint64_t'
uint64_t ri_wired_size;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:203:2: error: unknown type name 'uint64_t'
uint64_t ri_resident_size;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:204:2: error: unknown type name 'uint64_t'
uint64_t ri_phys_footprint;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:205:2: error: unknown type name 'uint64_t'
uint64_t ri_proc_start_abstime;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:206:2: error: unknown type name 'uint64_t'
uint64_t ri_proc_exit_abstime;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:210:2: error: unknown type name 'uint8_t'
uint8_t ri_uuid[16];
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:211:2: error: unknown type name 'uint64_t'
uint64_t ri_user_time;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:212:2: error: unknown type name 'uint64_t'
uint64_t ri_system_time;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:213:2: error: unknown type name 'uint64_t'
uint64_t ri_pkg_idle_wkups;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:214:2: error: unknown type name 'uint64_t'
uint64_t ri_interrupt_wkups;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:215:2: error: unknown type name 'uint64_t'
uint64_t ri_pageins;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:216:2: error: unknown type name 'uint64_t'
uint64_t ri_wired_size;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/resource.h:217:2: error: unknown type name 'uint64_t'
uint64_t ri_resident_size;
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make: *** [../sweepcut.o] Error 1

However, the notebooks still run normally.

Possible bug in SimpleLocal

A student of mine is working on the SimpleLocal code. He mentioned to me the following:

"There is a bug in STAGEFLOW method in SimpleLocal.cpp. When the graph is being constructed at each stage, 4 edges are added between vertices instead of two (i.e., two edges are added from u to v and two edges are added from v to u)."

You might want to have a look at this.
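I have not checked this report against SimpleLocal.cpp. As an illustration only, the sketch below shows one common way such doubling can arise and be avoided: when an undirected graph is stored as a symmetric CSR structure, every edge appears twice, so adding both arcs for every stored entry yields four arcs per edge. All names here are hypothetical and are not taken from the library.

```python
def undirected_edges_once(indptr, indices):
    """Yield each undirected edge (u, v) with u < v exactly once.

    indptr/indices describe a symmetric CSR adjacency structure, in which
    every undirected edge is stored as both (u, v) and (v, u).
    """
    for u in range(len(indptr) - 1):
        for k in range(indptr[u], indptr[u + 1]):
            v = indices[k]
            if u < v:  # skip the mirrored copy of the edge
                yield (u, v)


def build_flow_arcs(indptr, indices):
    """Return two arcs per undirected edge: one in each direction."""
    arcs = []
    for u, v in undirected_edges_once(indptr, indices):
        arcs.append((u, v))
        arcs.append((v, u))
    return arcs


# Triangle on vertices 0, 1, 2 in symmetric CSR form.
indptr = [0, 2, 4, 6]
indices = [1, 2, 0, 2, 0, 1]
arcs = build_flow_arcs(indptr, indices)
```

Without the `u < v` guard, the same loop would emit 12 arcs for the triangle instead of 6.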

module 'localgraphclustering' has no attribute 'GraphLocal'

Hello,

I am trying to run the examples.ipynb in Jupyter, however I receive this error after executing the second cell

AttributeError Traceback (most recent call last)
in ()
1 # Read graph. This also supports gml format.
----> 2 g = lgc.GraphLocal('datasets/JohnsHopkins.graphml','graphml')
3
4 # To get a quick look at the list of methods and attributes for the graph object 'g' you can type 'g.' + tab
5 # and scroll up or down.

AttributeError: module 'localgraphclustering' has no attribute 'GraphLocal'

AttributeError error when importing localgraphclustering

Issue #96 may describe the same problem I am seeing, but it contains little information, so I will provide some details here.

I am running python3.6 in jupyter notebook, with macOS 10.14.5. The error message is shown below:

AttributeError                            Traceback (most recent call last)
<ipython-input-3-8da92a9ab2a6> in <module>
----> 1 import localgraphclustering as lgc

/anaconda3/lib/python3.6/site-packages/localgraphclustering/__init__.py in <module>
----> 1 from .GraphLocal import GraphLocal
      2 from .GraphDrawing import GraphDrawing
      3 from .fiedler import fiedler, fiedler_local
      4 from .approximate_PageRank import approximate_PageRank
      5 from .approximate_PageRank_weighted import approximate_PageRank_weighted

/anaconda3/lib/python3.6/site-packages/localgraphclustering/GraphLocal.py in <module>
      8 import warnings
      9 import collections as cole
---> 10 from .cpp import *
     11 import random
     12 

/anaconda3/lib/python3.6/site-packages/localgraphclustering/cpp/__init__.py in <module>
     41 from .MQI_cpp import *
     42 from .proxl1PRaccel import *
---> 43 from .proxl1PRrand_cpp import *
     44 from .SimpleLocal_cpp import *
     45 from .sweepcut_cpp import *

/anaconda3/lib/python3.6/site-packages/localgraphclustering/cpp/proxl1PRrand_cpp.py in <module>
     50 
     51 _graphlib_funs_proxl1PRrand64 = _setup_proxl1PRrand_args(
---> 52     'int64','int64', _graphlib.proxl1PRrand64)
     53 _graphlib_funs_proxl1PRrand32 = _setup_proxl1PRrand_args(
     54     'uint32','uint32', _graphlib.proxl1PRrand32)

/anaconda3/lib/python3.6/ctypes/__init__.py in __getattr__(self, name)
    359         if name.startswith('__') and name.endswith('__'):
    360             raise AttributeError(name)
--> 361         func = self.__getitem__(name)
    362         setattr(self, name, func)
    363         return func

/anaconda3/lib/python3.6/ctypes/__init__.py in __getitem__(self, name_or_ordinal)
    364 
    365     def __getitem__(self, name_or_ordinal):
--> 366         func = self._FuncPtr((name_or_ordinal, self))
    367         if not isinstance(name_or_ordinal, int):
    368             func.__name__ = name_or_ordinal

AttributeError: dlsym(0x7fec89d9ade0, proxl1PRrand64): symbol not found

It seems to me that something is wrong with the compiled C library, but I'm not familiar with this. Could anyone help? Thanks in advance.
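One way to narrow this down is to load the shared library directly with ctypes and list which expected symbols it does not export. The helper below is a small sketch; the function names are hypothetical, and the library path in the usage note is a placeholder, not the package's actual file name.

```python
import ctypes


def missing_symbols(lib, names):
    """Return the names that cannot be resolved on a loaded library object.

    ctypes raises AttributeError for an unknown symbol, so hasattr() is False.
    """
    return [name for name in names if not hasattr(lib, name)]


def check_library(path, names):
    """Load the shared library at `path` and report unresolved symbols."""
    lib = ctypes.CDLL(path)
    return missing_symbols(lib, names)
```

For example, `check_library("/path/to/compiled/library", ["proxl1PRrand64", "proxl1PRrand32"])` would return the subset of those names that the binary does not provide. A non-empty result suggests the installed binary is stale or was built from different sources than the Python wrappers.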

Compute conductance in C++

@MengLiuPurdue We should have a routine to compute conductance and conductance for weighted graphs in C++. These will be much faster than our current ones :)

Probably the best thing to do is to write: cut_value and weighted_cut value in C++ that take in an array of vertex ids.

Then we can just change set_scores in GraphLocal to use the C++ code to compute the cut and everything should be faster. At the moment, computing these conductance scores is taking a substantial fraction of the total time of an NCP. (e.g. like 50%...)
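As a reference for what such a routine would compute, here is a minimal Python sketch of the weighted cut, volume, and conductance of a vertex set over CSR arrays. It is not the proposed C++ code; the function names and argument names are illustrative, and the convention of returning 1.0 when the smaller side has zero volume is an assumption.

```python
def cut_and_volume(indptr, indices, weights, nodes):
    """Weighted cut and volume of the vertex set `nodes`.

    indptr, indices, weights are the CSR arrays of a symmetric weighted
    adjacency matrix; for an unweighted graph pass weights of 1.0.
    """
    in_set = set(nodes)
    cut = 0.0
    vol = 0.0
    for u in in_set:
        for k in range(indptr[u], indptr[u + 1]):
            w = weights[k]
            vol += w                      # every incident edge adds to the volume
            if indices[k] not in in_set:
                cut += w                  # only edges leaving the set add to the cut
    return cut, vol


def conductance(indptr, indices, weights, nodes):
    """cut / min(vol(S), vol(V) - vol(S)), or 1.0 if that minimum is zero."""
    cut, vol = cut_and_volume(indptr, indices, weights, nodes)
    total = sum(weights)
    denom = min(vol, total - vol)
    return cut / denom if denom > 0 else 1.0


# Path graph 0-1-2-3 with unit weights.
indptr = [0, 1, 3, 5, 6]
indices = [1, 0, 2, 1, 3, 2]
weights = [1.0] * 6
```

The work is proportional to the sum of the degrees of the vertices in the set, so a compiled version of the same loop only touches the neighborhoods of the given vertex ids.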

Graph drawing tools

We have a number of common visualization patterns that we'd like to do. I've been using NetworkX for this, but this seems like overkill as there is a big translation between their ids and our ids, which makes the process slightly tedious and error-prone.

  1. standard graph drawing given xy or xyz coordinates for each vertex.
  2. a standard graph drawing (xy or xyz coords) for each vertex and a subset of nodes highlighted.
  3. a standard graph drawing (xy or xyz coords) for each vertex and a vector of data highlighted (e.g. a float value for each node).

G = GraphLocal()

Here xy, xyz are arrays with a row for each vertex with 2 or 3 coordinates.

G.draw(xy)
G.draw(xyz)
G.draw(xy, nodemarkersize=0) 
G.draw(xy, set=S)
G.draw(xy, set=S)
G.draw(xy, values=f)
G.draw(xyz, groups=g) # this is a partition. 

Parameters

G.draw(coords, ...) 

coords: an n-by-2 or n-by-3 array with coordinates for each node of the graph.

Optional parameters:

alpha: [0, 1] the overall alpha scaling of the plot
nodealpha: [0, 1]
edgealpha:
setalpha: 

nodecolor:
edgecolor:
setcolor:

nodesize:
linewidth:

ax=None (default) will create a new figure, or this will plot in ax if not None.

Return a dictionary with: 
fig, ax, nodes, edges, setnodes, setedges, groupnodes, groupedges
these are the handles to the actual plot elements, so that you could change 
values after the fact.

CRD is returning the whole graph

For the following example, I get the entire set of nodes of the graph back from CRD. This seems like a bug, in that we shouldn't ever get the entire graph back (all but one node, sure, but not the entire graph).

S = lgc.flow_clustering(helper.lgc_data("ASTRAL"),[398],method="crd", U = 3,h = 10, w = 2)
g.set_scores(S[0])

{'cond': 1,
 'cut': 0.0,
 'edgeseff': 0.0,
 'edgestrue': 314428.0,
 'isop': 1,
 'sizeeff': 0,
 'sizetrue': 1049,
 'voleff': 0.0,
 'voltrue': 314428.0}

Comparing clusters found by regular pagerank (+degree normalization) and sweep_cut to ACL

Is there a reason to believe that the two approaches would yield significantly different results (though I recognize that approach 2 would be slower)? I seem to get quite different results, but perhaps it's a bug on my side.

Approach 1:
Use ACL + sweep cut from local graph clustering library

Approach 2:
personalized pagerank using say, networkx
Normalize pagerank values by degree
run sweep_cut
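For reference, Approach 2 can be sketched without any dependencies as follows. This is an illustration under stated assumptions, not the library's code or networkx's: it uses the non-lazy random walk with teleportation probability `alpha` and assumes no isolated vertices. Two conventions are worth checking when comparing against Approach 1: networkx's `alpha` is the damping factor (the probability of following an edge, not of teleporting), and the ACL paper analyses a lazy walk and returns an approximate vector, so the numerical values from the two approaches need not match even when the parameter names do.

```python
def personalized_pagerank(adj, seed, alpha=0.15, tol=1e-12, max_iter=10000):
    """Power iteration for p = alpha * e_seed + (1 - alpha) * p * D^{-1} A.

    adj is a list of neighbor lists of an undirected graph with no isolated
    vertices; alpha is the teleportation probability.
    """
    n = len(adj)
    p = [0.0] * n
    p[seed] = 1.0
    for _ in range(max_iter):
        nxt = [0.0] * n
        nxt[seed] += alpha
        for u, nbrs in enumerate(adj):
            if p[u] == 0.0:
                continue
            share = (1.0 - alpha) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        diff = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if diff < tol:
            break
    return p


def degree_normalized_order(adj, p):
    """Vertices with positive score, sorted by p[u] / degree(u), descending."""
    support = [u for u in range(len(adj)) if p[u] > 0]
    return sorted(support, key=lambda u: p[u] / len(adj[u]), reverse=True)


# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3, seeded at vertex 0.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
p = personalized_pagerank(adj, seed=0)
order = degree_normalized_order(adj, p)
```

The resulting `order` is what a sweep procedure would scan prefix by prefix.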

PR setting

I am using approximate_PageRank_weighted for clustering weighted graphs and was wondering whether the parameter 'epsilon' could be changed in the code? (The unweighted version, approximate_PageRank, has this parameter in its settings.)

error: [WinError 126]

from localgraphclustering import *
Traceback (most recent call last):
File "", line 1, in <module>
File "C:\Users\gerra\Anaconda3\lib\site-packages\localgraphclustering\__init__.py", line 1, in <module>
from .GraphLocal import GraphLocal
File "C:\Users\gerra\Anaconda3\lib\site-packages\localgraphclustering\GraphLocal.py", line 10, in <module>
from .cpp import *
File "C:\Users\gerra\Anaconda3\lib\site-packages\localgraphclustering\cpp\__init__.py", line 33, in <module>
graphlib = load_library()
File "C:\Users\gerra\Anaconda3\lib\site-packages\localgraphclustering\cpp\__init__.py", line 24, in load_library
lib=ctypes.cdll.LoadLibrary(find_path())
File "C:\Users\gerra\Anaconda3\lib\ctypes\__init__.py", line 434, in LoadLibrary
return self._dlltype(name)
File "C:\Users\gerra\Anaconda3\lib\ctypes\__init__.py", line 356, in __init__
self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found.

I found that there is no libgraph.dll in graph_lib_test?

import error

I am using Python 3.7 on macOS.

AttributeError                            Traceback (most recent call last)
in <module>
----> 1 from localgraphclustering import *

~/anaconda3/lib/python3.7/site-packages/localgraphclustering/__init__.py in <module>
----> 1 from .GraphLocal import GraphLocal
      2 from .GraphDrawing import GraphDrawing
      3 from .fiedler import fiedler, fiedler_local
      4 from .approximate_PageRank import approximate_PageRank
      5 from .approximate_PageRank_weighted import approximate_PageRank_weighted

~/anaconda3/lib/python3.7/site-packages/localgraphclustering/GraphLocal.py in <module>
      8 import warnings
      9 import collections as cole
---> 10 from .cpp import *
     11 import random
     12

~/anaconda3/lib/python3.7/site-packages/localgraphclustering/cpp/__init__.py in <module>
     41 from .MQI_cpp import *
     42 from .proxl1PRaccel import *
---> 43 from .proxl1PRrand_cpp import *
     44 from .SimpleLocal_cpp import *
     45 from .sweepcut_cpp import *

~/anaconda3/lib/python3.7/site-packages/localgraphclustering/cpp/proxl1PRrand_cpp.py in <module>
     50
     51 _graphlib_funs_proxl1PRrand64 = _setup_proxl1PRrand_args(
---> 52     'int64','int64', _graphlib.proxl1PRrand64)
     53 _graphlib_funs_proxl1PRrand32 = _setup_proxl1PRrand_args(
     54     'uint32','uint32', _graphlib.proxl1PRrand32)

~/anaconda3/lib/python3.7/ctypes/__init__.py in __getattr__(self, name)
    367         if name.startswith('__') and name.endswith('__'):
    368             raise AttributeError(name)
--> 369         func = self.__getitem__(name)
    370         setattr(self, name, func)
    371         return func

~/anaconda3/lib/python3.7/ctypes/__init__.py in __getitem__(self, name_or_ordinal)
    372
    373     def __getitem__(self, name_or_ordinal):
--> 374         func = self._FuncPtr((name_or_ordinal, self))
    375         if not isinstance(name_or_ordinal, int):
    376             func.__name__ = name_or_ordinal

OSError: [WinError 126]

Does this package work on Windows 10? I installed it on Python 3.7.4 (with pip3). When I tried to import it in Jupyter, I got this error:

OSError Traceback (most recent call last)
...

~\Anaconda3\envs\newenvt\lib\ctypes\__init__.py in LoadLibrary(self, name)
440
441 def LoadLibrary(self, name):
--> 442 return self._dlltype(name)
443
444 cdll = LibraryLoader(CDLL)

~\Anaconda3\envs\newenvt\lib\ctypes\__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
362
363 if handle is None:
--> 364 self._handle = _dlopen(self._name, mode)
365 else:
366 self._handle = handle

OSError: [WinError 126] The specified module could not be found

Error when implementing notebook

The error "name 'sys' is not defined" pops up in many notebooks, e.g. image_segmentation_using_local_graph_clustering_and_gPb.ipynb.

When I import sys, the following error pops up: dlsym(0x7fdd03323660, MQI_weighted64): symbol not found

`graph_class_local.check_symmetry`

Are you able to explain what the check_symmetry method in graph_class_local is doing? It looks like it's checking to see if reciprocal edges exist for the first 200 edges in the graph -- is that right?

If so, any reason why you're only checking the first 200 edges and not all edges? Seems like it'd be better to do check via something like:

set(edges) == set([(e[1], e[0]) for e in edges])
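A slightly expanded version of that check, which also reports which edges lack a reciprocal, might look like the sketch below. The function names are hypothetical, and `edges` is assumed to be an iterable of (u, v) pairs; this is not the library's check_symmetry.

```python
def missing_reciprocal_edges(edges):
    """Return the sorted list of edges (u, v) for which (v, u) is absent."""
    edge_set = set((u, v) for u, v in edges)
    return sorted((u, v) for (u, v) in edge_set if (v, u) not in edge_set)


def is_symmetric(edges):
    """True if every edge has its reverse, checked over all edges."""
    return not missing_reciprocal_edges(edges)
```

Unlike a check over only the first few hundred edges, this touches every edge once, so its cost grows linearly with the number of edges.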

EDIT: Also, it looks like graph_class_local assumes all graphs are undirected? Is that right?

~ Ben
