python-graphblas / graphblas-algorithms
Graph algorithms written in GraphBLAS
License: Apache License 2.0
Hi all! I'm still getting used to the GraphBLAS bindings and to writing algorithms efficient enough to contribute effectively, but I thought I'd put up a placeholder issue in case someone else has already made progress on this.
I don't see any of the GraphBLAS Python bindings implementing algebraic Prim's algorithm from ch. 5.2 of the original Graph Algorithms in the Language of Linear Algebra book. MST is really quite useful to me, and more generally as an approximation to the Steiner tree for a given set of nodes and their metric closure.
I'm fairly certain the text states that we cannot take advantage of the priority queue/heap speedup in the linear algebra method, but perhaps someone has an idea (since array-based Prim's is theoretically O(V^2), i.e. linear in the number of edges, for sufficiently dense graphs such as complete graphs of the metric closure! yay!)
Let me know if there's other info desired here for the feature request. I'm excited for an alternative to the old scipy minimum_spanning_tree method, since it spends a lot of time on nested graph validation that can't be opted out of.
Thanks!
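To make the request concrete, here is a minimal pure-Python sketch of array-based Prim's algorithm on a dense weight matrix. It is not taken from the book or from any GraphBLAS binding; the function name prim_dense and the list-of-lists input are assumptions for illustration. The two inner steps (an argmin over a vector and an element-wise min against one matrix row) are the parts a linear algebra formulation would presumably express as vector operations.

```python
import math

def prim_dense(weights):
    """Return (total_weight, edges) of an MST of a connected, undirected graph.

    `weights` is a symmetric n x n list of lists; math.inf marks a missing edge.
    Hypothetical helper for illustration, not part of graphblas-algorithms.
    """
    n = len(weights)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight linking each node to the tree
    parent = [-1] * n       # tree endpoint of that cheapest edge
    best[0] = 0.0           # start growing the tree from node 0
    total = 0.0
    edges = []
    for _ in range(n):
        # argmin over a vector: closest node that is not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        if best[u] == math.inf:
            raise ValueError("graph is not connected")
        in_tree[u] = True
        total += best[u]
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # element-wise min: relax distances using row u of the matrix
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    return total, edges

INF = math.inf
W = [
    [INF, 1, 4, 3],
    [1, INF, 2, 5],
    [4, 2, INF, 6],
    [3, 5, 6, INF],
]
total, edges = prim_dense(W)
print(total, edges)
```

Each of the n rounds scans a length-n vector, so the sketch does O(V^2) work without any heap, which is the regime that matters for complete graphs such as a metric closure.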
Now that we have a logo for python-graphblas (python-graphblas/python-graphblas#506), I suppose the logical next step is to make a logo for graphblas-algorithms. It would probably make sense to use a variant of the python-graphblas logo.
For now, the GraphBLAS Algorithms community call is the same as the python-graphblas community call: python-graphblas/python-graphblas#247
If you're a new user or contributor looking for guidance (or just want to say hi!), please join. We're friendly :)
Currently, graphblas_algorithms.Graph objects only have a single edge attribute, and "weight" parameters are ignored when an algorithm is passed a Graph object from our library (they are used when it is passed a networkx graph). We should strive to match networkx, so we should allow multi-property graphs and handle weight parameters.
To do this, it probably makes the most sense to use multiple graphblas matrices to hold edge data. One matrix should be boolean of all True to indicate the structure, and then we should have a new matrix for each edge property (which may have fewer elements than the structural matrix).
We should also consider how this will affect caching of properties (which probably needs to be reconsidered anyway). Perhaps we should create a pseudo-syntax (one that doesn't use eval) such as A.select(tril) for structural and A[myweight].select(tril) for a property.
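A rough sketch of the layout described above, with plain dicts standing in for sparse GraphBLAS matrices so that it runs without any dependencies. The class MultiPropertyGraph, its methods, and the tril helper are all hypothetical names for illustration; none of them exist in graphblas-algorithms.

```python
class MultiPropertyGraph:
    """Hypothetical container: one structural edge set plus one sparse map per property."""

    def __init__(self):
        self.structure = {}    # (row, col) -> True, stand-in for the boolean matrix
        self.properties = {}   # property name -> {(row, col): value}

    def add_edge(self, u, v, **attrs):
        self.structure[(u, v)] = True
        for name, value in attrs.items():
            # A property map may hold fewer entries than the structure
            self.properties.setdefault(name, {})[(u, v)] = value

    def matrix(self, weight=None):
        """Return the structural entries, or the entries of one named property."""
        if weight is None:
            return dict(self.structure)
        return dict(self.properties.get(weight, {}))


def tril(entries):
    """Keep entries on or below the diagonal, loosely mirroring A.select(tril)."""
    return {(i, j): x for (i, j), x in entries.items() if j <= i}


g = MultiPropertyGraph()
g.add_edge(0, 1, myweight=2.5)
g.add_edge(1, 0, myweight=2.5)
g.add_edge(2, 1)                       # structural edge with no weight stored
structural_lower = tril(g.matrix())
weighted_lower = tril(g.matrix("myweight"))
print(structural_lower, weighted_lower)
```

Keeping structure and properties separate means an algorithm that ignores weights never touches the property maps, and a cache key could be built from the property name plus the operation.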
Package is now available on conda-forge. Update the README to indicate that.
Calling an algorithm should accept two types of inputs: a raw Matrix or a wrapped Graph.
This will make the interface more uniform when interacting with nxapi, as Graphs will always be passed in, so a wrapper instance will be returned. But when using graphblas-algorithms without nxapi, the need for wrapping can be controlled by the user in how they call algorithms.
We compute and cache many properties such as has_self_edges, degrees- (w/o diagonals), degrees+ (with diagonals), etc., which is actually very handy. However, the code is a little complicated (cough cough, my bad) and may be error-prone (see: #82 (comment)). It would be great if we could refactor this to be clearer, more maintainable, and safer. I imagine such a solution would be more declarative and less procedural.
Inspired by this paper, https://arxiv.org/pdf/2007.11111.pdf, we can compute square counts for all nodes via:
Q(~degrees.diag().S) << plus_pair(A @ A) # Use mask to ignore diagonal
all_squares = (Q * (Q - 1)).reduce_rowwise() // 2
This is probably better than counting squares one node at a time.
Now, can we come up with a better way to compute the denominator?
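As a sanity check on the counting idea, here is a dependency-free sketch of the same computation. It assumes an undirected graph without self-edges given as a 0/1 adjacency matrix; the function name squares_per_node is made up for illustration. Q[i][j] is the number of neighbors shared by i and j, which is what plus_pair(A @ A) yields off the diagonal, and each pair of shared neighbors closes one square with i and j as opposite corners.

```python
def squares_per_node(adj):
    """Count the squares (4-cycles) that pass through each node.

    `adj` is a symmetric 0/1 list of lists with a zero diagonal.
    Hypothetical helper, not part of graphblas-algorithms.
    """
    n = len(adj)
    neighbors = [{j for j in range(n) if adj[i][j]} for i in range(n)]
    counts = []
    for i in range(n):
        total = 0
        for j in range(n):
            if i == j:
                continue  # same role as the mask that ignores the diagonal
            q = len(neighbors[i] & neighbors[j])  # Q[i][j]
            total += q * (q - 1) // 2             # choose 2 of the shared neighbors
        counts.append(total)
    return counts


cycle4 = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
complete4 = [[int(i != j) for j in range(4)] for i in range(4)]
print(squares_per_node(cycle4), squares_per_node(complete4))
```

A 4-cycle has one square through every node, and the complete graph on four nodes has three, which gives two small cases to compare against the matrix version.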
A few days ago, a user posted a question on Stack Overflow describing calculations from NetworkX that do not match the graphblas-algorithms implementation. I thought I'd pass the details along, as another kind gent has described where he believes the issue may be, and I'm unable to verify whether they're on the right track.
Original post: https://stackoverflow.com/questions/78383991/why-does-graphblas-return-different-results-as-networkx/78402677#78402677
import networkx as nx
import graphblas_algorithms as ga
G = nx.DiGraph([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)])
nx.in_degree_centrality(G)
{0: 0.0, 1: 0.3333333333333333, 2: 0.6666666666666666, 3: 0.6666666666666666}
GG = ga.Graph.from_networkx(G)
ga.in_degree_centrality(GG)
0 1 2 3
1.0 0.666667
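For reference, the expected values can be reproduced by hand. The sketch below computes both in-degree and out-degree centrality for the same edge list in plain Python, using the degree divided by (n - 1) definition that matches the NetworkX output shown above; the function name degree_centralities is an assumption for illustration.

```python
def degree_centralities(edges):
    """Return (in_centrality, out_centrality) dicts for a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    scale = 1.0 / (len(nodes) - 1)
    in_deg = dict.fromkeys(nodes, 0)
    out_deg = dict.fromkeys(nodes, 0)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    in_c = {node: deg * scale for node, deg in in_deg.items()}
    out_c = {node: deg * scale for node, deg in out_deg.items()}
    return in_c, out_c


edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
in_c, out_c = degree_centralities(edges)
print(in_c)
print(out_c)
```

The two values in the graphblas-algorithms output, 1.0 and 0.666667, coincide with the out-degree centralities of nodes 0 and 1 in this sketch (nodes 2 and 3 have no outgoing edges). That would be consistent with in- and out-degree being swapped somewhere, but this is only a guess and has not been checked against the library.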
From #82, showing <graphblas_algorithms.classes.nodemap.NodeMap object at 0x00000206BC590550> is not the most helpful even though a NodeMap is a MutableMapping.
So, the repr of NodeMap should be updated to look more dict-like. Same for VectorMap, VectorNodeMap, NodeNodeMap, and NodeSet (these could probably use basic docstrings too!).
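One possible shape for a dict-like repr, sketched on a stand-in class. DictReprMap is hypothetical and backed by a plain dict; the real NodeMap stores its data differently, so only the __repr__ method is the point here.

```python
from collections.abc import MutableMapping


class DictReprMap(MutableMapping):
    """Hypothetical stand-in for NodeMap, used only to illustrate the repr."""

    def __init__(self, data=None):
        self._data = dict(data or {})

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __repr__(self):
        # Build the repr from the mapping protocol so it works for any backing store
        return f"{type(self).__name__}({dict(self)!r})"


m = DictReprMap({0: 0.5, 1: 1.0})
print(repr(m))
```

Because the repr goes through dict(self), the same method could in principle be shared by any class that implements the mapping protocol. For very large maps a truncated form may be preferable.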
In ga.Graph.__init__ we can specify key_to_id. I can't see any reason why this must be a stdlib dict and not any other class that implements the typing.Mapping protocol (__getitem__, __len__, __iter__). I have a use case where I have a more memory-efficient mapping than a dict and would like to use it.
However, whenever the id_to_key property is used, this will create a full inverse mapping, undoing any memory efficiency. It would be great to be able to optionally pass an inverse mapping (id_to_key) to avoid this calculation if possible. Currently I'm doing:
G = ga.Graph(..., key_to_id=...)
G._id_to_key = ...
but obviously this relies on setting the 'private' _id_to_key member, which has no API stability guarantees. I'd be happy to implement this change!
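To illustrate the kind of mapping meant here, the sketch below implements collections.abc.Mapping for a contiguous block of integer keys without storing any entries, together with a cheap inverse. ShiftMapping is a made-up example class; whether ga.Graph accepts such an object today is exactly the open question of this issue.

```python
from collections.abc import Mapping


class ShiftMapping(Mapping):
    """Maps each integer key in [first, first + size) to key + shift, using O(1) memory."""

    def __init__(self, first, size, shift):
        self._first = first
        self._size = size
        self._shift = shift

    def __getitem__(self, key):
        if isinstance(key, int) and self._first <= key < self._first + self._size:
            return key + self._shift
        raise KeyError(key)

    def __iter__(self):
        return iter(range(self._first, self._first + self._size))

    def __len__(self):
        return self._size

    def inverse(self):
        """Return the reverse mapping, also without materializing any entries."""
        return ShiftMapping(self._first + self._shift, self._size, -self._shift)


# Node keys 1000..1004 map to ids 0..4
key_to_id = ShiftMapping(first=1000, size=5, shift=-1000)
id_to_key = key_to_id.inverse()
print(key_to_id[1003], id_to_key[3], len(key_to_id))
```

With an optional id_to_key argument, a caller could pass both objects and avoid ever building a full dict in either direction.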
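import networkx as nx
import graphblas_algorithms as ga

# digraphTCC is assumed to be an existing networkx graph
digraphTCC2 = ga.Graph.from_networkx(digraphTCC)
clus = nx.square_clustering(digraphTCC2)
print(clus)
for i in clus:
    print(i)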
I've just found graphblas-algorithms as a way to speed up computations on a networkx graph, and I've integrated it into my work for finding the square clustering coefficient of a graph. The speedup is immense, but I'm not sure how to interpret the result.
Based on the code snippet above (digraphTCC is a valid networkx graph), the result of the function, clus, prints out as <graphblas_algorithms.classes.nodemap.NodeMap object at 0x00000206BC590550>.
I don't know how to interact with a NodeMap object, but when I iterate through it, the values seem to just be node labels. How should I interpret this result, and transform it into the dictionary of {node: clustering} that you would expect from the slow networkx version?
Thank you for your time!
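Iterating over any Python mapping yields its keys, which would explain seeing only node labels. The sketch below shows this on a stand-in class, since the real NodeMap internals are not shown in this thread: ValuesByNode is hypothetical and simply looks values up through a key_to_id dict. If NodeMap follows the standard mapping protocol, dict(clus) or clus[node] should presumably behave the same way, though that has not been verified here.

```python
from collections.abc import Mapping


class ValuesByNode(Mapping):
    """Hypothetical stand-in for a NodeMap: values stored by integer id."""

    def __init__(self, values, key_to_id):
        self._values = values
        self._key_to_id = key_to_id

    def __getitem__(self, key):
        return self._values[self._key_to_id[key]]

    def __iter__(self):
        # Iteration yields node labels (keys), like a dict
        return iter(self._key_to_id)

    def __len__(self):
        return len(self._key_to_id)


clus = ValuesByNode([0.0, 0.5, 1.0], {"a": 0, "b": 1, "c": 2})
labels = list(clus)    # only the node labels
as_dict = dict(clus)   # {node: clustering}
print(labels, as_dict, clus["b"])
```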
We would like to be able to compare performance to graph-tool: https://graph-tool.skewed.de
Some algorithms may be different, which can make direct comparisons difficult to do well (for example, stopping criteria for PageRank could be different).
It will be an ongoing effort to make it easier to contribute to graphblas-algorithms, so let's gather ideas and prioritize in this thread. Heh, right now, I encourage potential contributors to be very patient. Things will get better/easier, I promise!
python-graphblas
python-graphblas
and graphblas-algorithms
python-graphblas
, but maybe not one that is fully tested or complete
I'm the most familiar with both python-graphblas and graphblas-algorithms, so I trust my judgement the least regarding how to improve onboarding and enable contributions. @MridulS, @z3y50n, @ParticularMiner, what do you think would be the most helpful? Also, thanks for your patience :)
We would like to be able to compare performance to igraph: https://github.com/igraph/python-igraph
Some algorithms may be different, which can make direct comparisons difficult to do well (for example, stopping criteria for PageRank could be different).
It would be super-duper handy to be able to automatically generate input graphs and node selection lists (i.e., masks) to run with both NetworkX and graphblas-algorithms and compare results.
For example, it would be nice to cover:
Perhaps we could leverage hypothesis to help generate random inputs. I would be delighted if we began with a very small subset of the above.
CC @jim22k who has done similar work in the past. Having this functionality would be incredibly useful in making sure we match NetworkX. For one thing, it would help us determine what to do about self-edges, which may be poorly defined at times for NetworkX (I really don't know if it is or not), but should be well-defined for us: do what NetworkX does.
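A very small starting point could look like the sketch below. It uses the stdlib random module with a fixed seed rather than hypothesis, and two toy out-degree functions stand in for the NetworkX and graphblas-algorithms implementations. All names here (random_case, compare, the (n, edges, mask) calling convention) are assumptions for illustration, not an existing test harness.

```python
import random


def random_case(rng, max_nodes=8, edge_prob=0.3, allow_self_edges=True):
    """Generate (n, edges, mask): a random directed graph and a node selection list."""
    n = rng.randint(1, max_nodes)
    edges = [
        (u, v)
        for u in range(n)
        for v in range(n)
        if (allow_self_edges or u != v) and rng.random() < edge_prob
    ]
    mask = [u for u in range(n) if rng.random() < 0.5]
    return n, edges, mask


def compare(reference, candidate, cases=200, seed=0):
    """Return the first mismatching case, or None if all generated cases agree."""
    rng = random.Random(seed)
    for _ in range(cases):
        n, edges, mask = random_case(rng)
        expected = reference(n, edges, mask)
        actual = candidate(n, edges, mask)
        if expected != actual:
            return n, edges, mask, expected, actual
    return None


def out_degree_reference(n, edges, mask):
    return {u: sum(1 for src, _ in edges if src == u) for u in mask}


def out_degree_candidate(n, edges, mask):
    counts = [0] * n
    for src, _ in edges:
        counts[src] += 1
    return {u: counts[u] for u in mask}


def out_degree_no_self_edges(n, edges, mask):
    # Deliberately different treatment of self-edges, to show a mismatch being caught
    return {u: sum(1 for src, dst in edges if src == u and dst != u) for u in mask}


agree = compare(out_degree_reference, out_degree_candidate)
mismatch = compare(out_degree_reference, out_degree_no_self_edges)
print(agree, mismatch is not None)
```

The third function shows how a self-edge discrepancy would surface as a concrete counterexample, which is the kind of case the comparison is meant to find.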