python-graphblas / graphblas-algorithms

Graph algorithms written in GraphBLAS

License: Apache License 2.0

Python 99.91% Shell 0.09%

complex-networks graph-algorithms graph-analysis graph-analytics graph-datastructures graph-library graph-theory graph-theory-algorithms graphblas pydata python

graphblas-algorithms's Issues

Algorithm Request: MST (5.6 Algebraic Prim's)

Hi all! Still getting used to the graphblas bindings and writing efficient enough algorithms to contribute effectively, but I thought I'd put a placeholder issue up in case someone else already has progress on this.

I don't see any of the graphblas python bindings implementing Algebraic Prim's from ch. 5.2 in the original Graph Algorithms in the Language of Linear Algebra book. MST is really quite useful to me, mostly as an approximation to the Steiner tree for a given set of nodes and their metric closure.

I'm fairly certain the text states we cannot take advantage of the priority queue/heap speedup in the linalg method, but perhaps someone has an idea (since Prim's is theoretically linear in the number of edges for sufficiently dense graphs, i.e. complete graphs of the metric closure! yay!)

Let me know if there's other info desired here for the feature request. I'm excited for an alternative to the old scipy minimum_spanning_tree method, since it's spending a lot of time on nested graph validation that isn't opt-out.

Thanks!
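For reference, the dense variant of Prim's algorithm needs no heap: it keeps a vector of the cheapest known edge from the tree to each node and, at every step, takes an argmin over that vector followed by an elementwise minimum against one row of the weight matrix. The sketch below shows that idea in pure Python. It assumes a symmetric weight matrix with `math.inf` marking missing edges; `prim_mst_dense` is a hypothetical name, and this is neither the book's listing nor a python-graphblas implementation.

```python
import math

def prim_mst_dense(weights, source=0):
    """Return (total_weight, edges) of an MST of a connected undirected graph.

    `weights` is a symmetric list-of-lists matrix; math.inf marks a missing edge.
    """
    n = len(weights)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known edge weight from the tree to each node
    parent = [None] * n        # tree endpoint of that cheapest edge
    best[source] = 0
    total, edges = 0, []
    for _ in range(n):
        # "argmin" over the nodes that are not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        if best[u] == math.inf:
            raise ValueError("graph is not connected")
        in_tree[u] = True
        total += best[u]
        if parent[u] is not None:
            edges.append((parent[u], u))
        # Elementwise-min update of `best` against row u of the weight matrix
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    return total, edges

INF = math.inf
W = [
    [INF, 1, 4, INF],
    [1, INF, 2, 6],
    [4, 2, INF, 3],
    [INF, 6, 3, INF],
]
total, edges = prim_mst_dense(W)
```

Each of the n steps does O(n) work, so the run is O(n²) overall, which matches the number of edges when the graph is complete.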

Logo for `graphblas-algorithms`

Now that we have a logo for python-graphblas (python-graphblas/python-graphblas#506), I suppose the logical next step is to make a logo for graphblas-algorithms. It would probably make sense to use a variant of the python-graphblas logo.

Here's one idea:

COMMUNITY: Regular GraphBLAS Algorithms community conference calls (Open to all)

For now, the GraphBLAS Algorithms community call is the same as the Python-graphblas community call:

python-graphblas/python-graphblas#247

If you're a new user or contributor looking for guidance (or just want to say hi!), please join. We're friendly :)

multi-property graphs and handling of "weight" parameter

Currently, graphblas_algorithms.Graph objects only have a single edge attribute, and the "weight" parameters are ignored when passed a Graph object from our library (it's used when passed a networkx graph). We should strive to match networkx, so we should allow multi-property graphs and handle weight parameters.

To do this, it probably makes the most sense to use multiple graphblas matrices to hold edge data. One matrix should be boolean of all True to indicate the structure, and then we should have a new matrix for each edge property (which may have fewer elements than the structural matrix).

We should also consider how this will affect caching of properties (which probably needs to be reconsidered anyway). Perhaps we should create a pseudo-syntax (one that doesn't use eval) such as A.select(tril) for structural and A[myweight].select(tril) for a property.
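As a rough illustration of the layout described above, here is a pure-Python sketch with one structural container plus one sparse container per edge property. Everything in it is hypothetical: `MultiPropertyGraph`, `add_edge`, and `edge_values` are stand-in names, plain sets and dicts play the role of graphblas matrices, and the fallback value of 1 for edges that lack the requested property is an assumption modeled on how networkx usually treats a missing weight.

```python
class MultiPropertyGraph:
    """Hypothetical stand-in: one structural edge set plus one sparse map per property."""

    def __init__(self):
        self.structure = set()   # plays the role of the all-True boolean matrix
        self.properties = {}     # property name -> {(u, v): value}

    def add_edge(self, u, v, **attrs):
        self.structure.add((u, v))
        for name, value in attrs.items():
            self.properties.setdefault(name, {})[(u, v)] = value

    def edge_values(self, weight=None, default=1):
        """Return {(u, v): value}; edges missing the property fall back to `default`."""
        if weight is None:
            return {edge: default for edge in self.structure}
        values = self.properties.get(weight, {})
        return {edge: values.get(edge, default) for edge in self.structure}

g = MultiPropertyGraph()
g.add_edge(0, 1, weight=2.5, capacity=10)
g.add_edge(1, 2, capacity=4)   # no "weight" on this edge
g.add_edge(2, 0)               # purely structural edge
```

Here `g.properties["weight"]` holds a single entry while the structure holds three, mirroring the point that a property matrix may have fewer elements than the structural one.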

Update README

Package is now available on conda-forge. Update README to indicate that.

Raw vs Wrapped objects

Calling an algorithm should accept two types of inputs: a raw Matrix or a wrapped Graph.

When given a Matrix, the return type should be raw (i.e. a Matrix or Vector).
When given a Graph, the return type should be wrapped (i.e. a Graph or NodeMap).

This will make the interface more uniform when interacting with nxapi as Graphs will always be passed in, so a wrapper instance will be returned. But when using graphblas-algorithms without nxapi, the need for wrapping can be controlled by the user in how they call algorithms.
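One way to picture the rule is a decorator that always runs the algorithm on the raw object and wraps the result only when the input was wrapped. The sketch below uses pure-Python stand-ins: the `Matrix`, `Graph`, and `NodeMap` classes here are hypothetical placeholders, not the library's classes, and `accepts_raw_or_wrapped` is an invented name.

```python
import functools

class Matrix(list):
    """Stand-in for a raw matrix (hypothetical; not graphblas.Matrix)."""

class Graph:
    """Stand-in for a wrapped graph: a raw matrix plus node labels."""
    def __init__(self, matrix, key_to_id):
        self.matrix = matrix
        self.key_to_id = key_to_id

class NodeMap(dict):
    """Stand-in for a wrapped per-node result keyed by node label."""

def accepts_raw_or_wrapped(algorithm):
    """Run `algorithm` on the raw matrix; wrap the result only if the input was wrapped."""
    @functools.wraps(algorithm)
    def wrapper(obj, *args, **kwargs):
        if isinstance(obj, Graph):
            raw = algorithm(obj.matrix, *args, **kwargs)
            return NodeMap({key: raw[i] for key, i in obj.key_to_id.items()})
        return algorithm(obj, *args, **kwargs)
    return wrapper

@accepts_raw_or_wrapped
def out_degrees(matrix):
    # Raw in, raw out: count the nonzero entries in each row
    return [sum(1 for x in row if x) for row in matrix]

A = Matrix([[0, 1, 1], [0, 0, 1], [0, 0, 0]])
raw_result = out_degrees(A)                                       # plain list
wrapped_result = out_degrees(Graph(A, {"a": 0, "b": 1, "c": 2}))  # NodeMap
```

With this shape the algorithm body only ever deals with raw objects, and the caller chooses the return type by choosing what to pass in.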

Redo/refactor calculations and caching of properties such as degrees

We compute and cache many properties such as has_self_edges, degrees- (w/o diagonals), degrees+ (with diagonals), etc., which is actually very handy. However, the code is a little complicated (cough cough, my bad) and may be error prone (see: #82 (comment)). It would be great if we could refactor this to be more clear, maintainable, and safe. I imagine such a solution would be more declarative and less procedural.

Alternative for square counting (for square clustering)

Inspired by this paper, https://arxiv.org/pdf/2007.11111.pdf, we can compute square counting for all nodes via:

    Q(~degrees.diag().S) << plus_pair(A @ A)  # Use mask to ignore diagonal
    all_squares = (Q * (Q - 1)).reduce_rowwise() // 2

This is probably better than counting squares one node at a time.

Now, can we come up with a better way to compute the denominator?
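To make the formula concrete, here is a small pure-Python check of the same arithmetic on dense 0/1 matrices. It is only a sketch for intuition, not the python-graphblas code: `squares_per_node` is a hypothetical helper, and it assumes a simple undirected graph with no self-edges. For each pair of distinct nodes it counts common neighbors (the `plus_pair` product) and then sums q(q - 1)/2 along each row.

```python
def squares_per_node(adj):
    """Count squares (4-cycles) through each node of a simple undirected graph.

    `adj` is a symmetric 0/1 list-of-lists adjacency matrix with a zero diagonal.
    """
    n = len(adj)
    counts = []
    for i in range(n):
        total = 0
        for j in range(n):
            if i == j:
                continue  # same role as the mask that ignores the diagonal
            # plus_pair(A @ A)[i, j]: number of common neighbors of i and j
            q = sum(1 for k in range(n) if adj[i][k] and adj[k][j])
            total += q * (q - 1)
        counts.append(total // 2)
    return counts

# A 4-cycle 0-1-2-3-0 and the complete graph on 4 nodes
cycle4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
k4 = [[int(i != j) for j in range(4)] for i in range(4)]

cycle4_squares = squares_per_node(cycle4)  # one square through every node
k4_squares = squares_per_node(k4)          # three squares through every node
```

The count works because every square through node i has exactly one node j opposite to i, and the two remaining corners are a pair of common neighbors of i and j.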

Degree Centrality Calculation Differs From Networkx

A few days ago a user posted a question on Stack Overflow describing calculations from NetworkX that do not match the GraphBLAS implementation. I thought I'd pass the details along, as another kind gent has described where he believes the issue may be and I'm unable to verify whether he's on the right track.

Original post: https://stackoverflow.com/questions/78383991/why-does-graphblas-return-different-results-as-networkx/78402677#78402677

    import networkx as nx
    import graphblas_algorithms as ga

    G = nx.DiGraph([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)])
    nx.in_degree_centrality(G)
    # {0: 0.0, 1: 0.3333333333333333, 2: 0.6666666666666666, 3: 0.6666666666666666}

    GG = ga.Graph.from_networkx(G)
    ga.in_degree_centrality(GG)
    # 0   1     2 3
    # 1.0 0.666667
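The expected values can be checked by hand from the edge list. The sketch below recomputes both in-degree and out-degree centrality in pure Python, using the n - 1 normalization that networkx applies. It does not call either library. One observation, offered as arithmetic rather than a confirmed diagnosis: the two values visible in the graphblas-algorithms output (1.0 and 0.666667) coincide with the out-degree centrality of nodes 0 and 1.

```python
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
nodes = sorted({n for edge in edges for n in edge})
scale = 1 / (len(nodes) - 1)  # degree centrality divides by n - 1

in_degree = {n: 0 for n in nodes}
out_degree = {n: 0 for n in nodes}
for u, v in edges:
    out_degree[u] += 1   # edge leaves u
    in_degree[v] += 1    # edge enters v

in_centrality = {n: d * scale for n, d in in_degree.items()}
out_centrality = {n: d * scale for n, d in out_degree.items()}
```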

`NodeMap` and other results need better reprs

From #82, showing <graphblas_algorithms.classes.nodemap.NodeMap object at 0x00000206BC590550> is not the most helpful even though a NodeMap is a MutableMapping.

So, the repr of NodeMap should be updated to look more dict-like. Same for VectorMap, VectorNodeMap, NodeNodeMap, and NodeSet (these could probably use basic docstrings too!).
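A dict-like repr could look roughly like the following. This is a minimal sketch on a stand-in class: `DictLikeNodeMap` is hypothetical, stores its values in a plain dict rather than a graphblas Vector, and is only meant to show the `__repr__` shape, not how NodeMap is implemented.

```python
from collections.abc import MutableMapping

class DictLikeNodeMap(MutableMapping):
    """Hypothetical stand-in for NodeMap: values stored by integer id, exposed by node key."""

    def __init__(self, values_by_id, key_to_id):
        self._values = dict(values_by_id)   # id -> value
        self._key_to_id = dict(key_to_id)   # node key -> id

    def __getitem__(self, key):
        return self._values[self._key_to_id[key]]

    def __setitem__(self, key, value):
        self._values[self._key_to_id[key]] = value

    def __delitem__(self, key):
        del self._values[self._key_to_id[key]]

    def __iter__(self):
        # Only yield keys whose id actually has a stored value
        return (k for k, i in self._key_to_id.items() if i in self._values)

    def __len__(self):
        return len(self._values)

    def __repr__(self):
        items = ", ".join(f"{k!r}: {v!r}" for k, v in self.items())
        return f"{type(self).__name__}({{{items}}})"

nm = DictLikeNodeMap({0: 0.5, 1: 0.25}, {"a": 0, "b": 1})
text = repr(nm)
```

For large results a real implementation would probably want to truncate the output, which this sketch does not attempt.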

add `id_to_key` as argument to graph constructor

In ga.Graph.__init__ we can specify key_to_id. I can't see any reason why this must be a stdlib dict and not any other class that implements the typing.Mapping protocol (__getitem__, __len__, __iter__). I have a use case where I have a more memory-efficient mapping than a dict and would like to use it.

However, whenever the id_to_key property is used, this will create a full inverse mapping, undoing any memory efficiency. It would be great to be able to pass an inverse mapping optionally (id_to_key) to avoid this calculation if possible. Currently I'm doing:

    G = ga.Graph(..., key_to_id=...)
    G._id_to_key = ...

but obviously this relies on setting the 'private' _id_to_key member, which has no API stability guarantees. I'd be happy to implement this change!
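The requested behavior could be sketched as follows. `LabeledGraph` and `OffsetMapping` are hypothetical names used only for illustration; this is not the actual ga.Graph constructor. The point is that the inverse is built lazily and only when the caller did not supply one.

```python
from collections.abc import Mapping

class LabeledGraph:
    """Hypothetical sketch; not the actual ga.Graph constructor."""

    def __init__(self, key_to_id, id_to_key=None):
        self._key_to_id = key_to_id   # any Mapping, not necessarily a dict
        self._id_to_key = id_to_key   # optional caller-supplied inverse

    @property
    def id_to_key(self):
        # Build the full inverse only if the caller did not supply one
        if self._id_to_key is None:
            self._id_to_key = {i: k for k, i in self._key_to_id.items()}
        return self._id_to_key

class OffsetMapping(Mapping):
    """Memory-light mapping: key k in range(start, start + n) <-> id k - start."""

    def __init__(self, start, n, inverse=False):
        self.start, self.n, self.inverse = start, n, inverse

    def __getitem__(self, key):
        lo = 0 if self.inverse else self.start
        if not isinstance(key, int) or not lo <= key < lo + self.n:
            raise KeyError(key)
        return key + self.start if self.inverse else key - self.start

    def __iter__(self):
        lo = 0 if self.inverse else self.start
        return iter(range(lo, lo + self.n))

    def __len__(self):
        return self.n

# Both directions supplied: no dict of size n is ever materialized
G = LabeledGraph(OffsetMapping(100, 5), id_to_key=OffsetMapping(100, 5, inverse=True))
# Only key_to_id supplied: the inverse dict is built on first access
H = LabeledGraph({"a": 0, "b": 1})
```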

square_clustering output using networkx api

    import networkx as nx
    import graphblas_algorithms as ga

    # digraphTCC is an existing networkx graph
    digraphTCC2 = ga.Graph.from_networkx(digraphTCC)
    clus = nx.square_clustering(digraphTCC2)
    print(clus)

    for i in clus:
        print(i)

I've just found graphblas-algorithms as a way to speed up computations on a networkx graph, and I've integrated it into my work for finding the square clustering coefficient of a graph. The speedup is immense, but I'm not sure how to interpret the result.

Based on the code snippet above (digraphTCC is a valid networkx graph), the result of the function, clus, prints out as
<graphblas_algorithms.classes.nodemap.NodeMap object at 0x00000206BC590550>.

I'm unaware how to interact with a NodeMap object, but when I iterate through it, the values seem to just be node labels. How should I interpret this result, and transform it into the dictionary of {node:clustering} that you would expect from the slow networkx version?

Thank you for your time!
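Another issue in this list notes that a NodeMap is a MutableMapping. If the installed version behaves like a standard mapping, then iterating it yields keys (which would explain seeing only node labels), and `dict(...)` or `.items()` should recover the `{node: clustering}` pairs. The sketch below demonstrates that behavior on a hypothetical stand-in class, `ResultMap`, with made-up values; it does not call graphblas-algorithms.

```python
from collections.abc import Mapping

class ResultMap(Mapping):
    """Hypothetical stand-in for a mapping-style result such as NodeMap."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

clus = ResultMap({"a": 0.0, "b": 0.5, "c": 1.0})

labels = list(clus)          # iterating a mapping yields its keys (node labels)
as_dict = dict(clus)         # {node: clustering}
pairs = list(clus.items())   # (node, clustering) pairs
```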

Add `graph-tool` backend to `scripts/bench.py`

We would like to be able to compare performance to graph-tool: https://graph-tool.skewed.de

Some algorithms may be different, which can make direct comparisons difficult to do well (for example, stopping criteria for e.g. PageRank could be different).
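To illustrate why stopping criteria matter for a benchmark, here is a small pure-Python PageRank that can stop on either the summed change (L1) or the largest single change between iterations. It is a sketch under simplifying assumptions (every node has at least one out-link, uniform teleportation) and is not the implementation used by graph-tool, networkx, or graphblas-algorithms; `pagerank` and its `norm` parameter are hypothetical.

```python
def pagerank(out_links, damping=0.85, tol=1e-6, norm="l1", max_iter=1000):
    """Power iteration on a graph where every node has at least one out-link.

    Returns (ranks, iterations). `norm` selects the convergence test:
    "l1" sums the absolute changes, "max" uses the largest single change.
    """
    n = len(out_links)
    rank = [1 / n] * n
    for iteration in range(1, max_iter + 1):
        new = [(1 - damping) / n] * n
        for u, targets in enumerate(out_links):
            share = damping * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        diffs = [abs(a - b) for a, b in zip(new, rank)]
        err = sum(diffs) if norm == "l1" else max(diffs)
        rank = new
        if err < tol:
            return rank, iteration
    raise RuntimeError("did not converge")

# out_links[u] lists the nodes that u points to
links = [[1, 2], [2], [0], [0, 2]]
ranks_l1, iters_l1 = pagerank(links, norm="l1")
ranks_max, iters_max = pagerank(links, norm="max")
```

With the same tolerance, the max-change test can never need more iterations than the L1 test, so two backends that differ only in this detail may do different amounts of work for nearly identical answers.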

Improving onboarding and enabling contributions

It will be an ongoing effort to make it easier to contribute to graphblas-algorithms, so let's gather ideas and prioritize in this thread. Heh, right now, I encourage potential contributors to be very patient. Things will get better/easier, I promise!

- Improve the overall documentation of python-graphblas
  - Also, point to "best" resources for Python users to learn GraphBLAS in general (i.e., choose a couple from https://graphblas.org/GraphBLAS-Pointers/)
- Add contributing documentation to python-graphblas and graphblas-algorithms
  - How to set up local dev environment, etc. Should borrow liberally from contributing docs from other libraries.
- Create issues for "starter algorithms"
  - Identify a networkx algorithm to implement
  - Link to existing GraphBLAS implementations (and papers, etc) if possible
  - Perhaps sketch an implementation in python-graphblas, but maybe not one that is fully tested or complete
    - Even with this, there is still plenty of work needed to add an algorithm
  - This may be an effective use of my (@eriknw) or @jim22k's time
  - See https://github.com/python-graphblas/graphblas-algorithms/wiki/Where-to-find-algorithms
- Create "algorithm template" notebook that helps guide development
  - Create example data, load datasets, guidelines for benchmarking, run networkx tests, etc.
- Create notebooks with exercises as e.g. self-guided tutorials
  - Example starter exercises:
    - Find neighbors of node X
    - Find "friends of friends"
    - Find predecessors (or successors) of directed graphs
    - etc.; there are lots of "simple" algorithms we can build up to
  - I think some notebooks should encourage hands-on experimentation, not just reading
- I think dev notebooks could be a nice side-effect of adding algorithms
  - For example, for triangle counting, there were lots of possibilities to choose from, and I decided based on benchmarks
  - These notebooks can help demonstrate how algorithm development is done, and help revise algorithms later if necessary
  - These may be messy. Not intended as tutorial notebooks. I have some notebooks around that probably belong somewhere.
  - I don't want to require a notebook for each algorithm, but it's a nice-to-have.
- Don't forget about contributions other than algorithms!

I'm the most familiar with both python-graphblas and graphblas-algorithms, so I trust my judgement the least regarding how to improve onboarding and enabling contributions. @MridulS, @z3y50n, @ParticularMiner, what do you think would be the most helpful? Also, thanks for your patience :)

Add `igraph` backend to `scripts/bench.py`

We would like to be able to compare performance to igraph: https://github.com/igraph/python-igraph

Some algorithms may be different, which can make direct comparisons difficult to do well (for example, stopping criteria for e.g. PageRank could be different).

Automated testing to compare with networkx

It would be super-duper handy to be able to automatically generate input graphs and node selection lists (i.e., masks) to run with both NetworkX and graphblas-algorithms and compare results.

For example, it would be nice to cover:

- Graph and DiGraph
- With and w/o self-edges
- Symmetric and asymmetric DiGraph
- Purely structural (all edges 1)
- Edge values contain all combinations of {positive, 0, negative}, and with ints or floats
- Different edge densities, including full graph (w/ and w/o self-edges)
- Empty graph, only self-edges
- Some rows and/or columns are empty (i.e., no in-edges or out-edges or both)
- Perhaps some graphs with specific shapes: ring, tree, DAG, etc
- Bipartite graphs
- More than one connected component (by construction)
- etc.

Perhaps we could leverage hypothesis to help generate random inputs. I would be delighted if we began with a very small subset of the above.

CC @jim22k who has done similar work in the past. Having this functionality would be incredibly useful in making sure we match NetworkX. For one thing, it would help us determine what to do about self-edges, which may be poorly defined at times for NetworkX (I really don't know if it is or not), but should be well-defined for us: do what NetworkX does.
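A very small starting point might look like the sketch below: a generator that walks a grid of graph shapes (directed or not, with or without self-edges, several densities including empty and full) and a harness that reports the cases where two implementations disagree. It uses only the standard library instead of hypothesis, and every name in it (`generate_cases`, `compare`, `reference_impl`, `buggy_impl`) is hypothetical. The two toy functions stand in for a networkx function and its graphblas-algorithms counterpart.

```python
import itertools
import random

def generate_cases(n=6, seed=0):
    """Yield (description, edges) over a small grid of graph shapes."""
    rng = random.Random(seed)
    for directed, self_edges, density in itertools.product(
        (False, True), (False, True), (0.0, 0.3, 1.0)
    ):
        edges = set()
        for u, v in itertools.product(range(n), repeat=2):
            if u == v and not self_edges:
                continue
            if not directed and u > v:
                continue  # undirected: store each pair once as (min, max)
            if rng.random() < density:
                edges.add((u, v))
        yield {"directed": directed, "self_edges": self_edges, "density": density}, edges

def compare(reference, candidate, cases):
    """Return the descriptions of cases where the two implementations disagree."""
    return [desc for desc, edges in cases if reference(edges) != candidate(edges)]

# Toy stand-ins for a reference function and a candidate reimplementation
def reference_impl(edges):
    return sum(1 for u, v in edges if u == v)   # number of self-edges

def buggy_impl(edges):
    return 0   # ignores self-edges entirely

cases = list(generate_cases())
mismatches = compare(reference_impl, buggy_impl, cases)
```

Because each mismatch carries its description, a failure immediately shows which combination (for example, self-edges on a full graph) exposes the difference.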
