Comments (14)
Thanks for the issue and question @danielverd, and great to hear "the speedup is immense"!
Take one: NodeMap is a MutableMapping, so you can use it like a dict, such as clus[key] and for key, val in clus.items(). You can also call dict(clus) to convert it to a dict.
Remark: NodeMap also has a .vector attribute that is the underlying GraphBLAS vector, and a .key_to_id attribute that may be a dict mapping node labels to vector indices.
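The dict-like behavior comes from the collections.abc.MutableMapping interface. Once __getitem__, __setitem__, __delitem__, __iter__, and __len__ are defined, methods like items(), membership tests, and dict(...) come for free. The sketch below shows that pattern over a "vector" plus a label-to-index dict. ToyNodeMap is a hypothetical toy backed by a plain Python list, not the real NodeMap or a GraphBLAS vector.

```python
from collections.abc import MutableMapping


class ToyNodeMap(MutableMapping):
    """Hypothetical stand-in for NodeMap (illustration only).

    Values live in a dense list playing the role of the vector, where None
    means "no entry", and key_to_id translates node labels to list indices.
    """

    def __init__(self, vector, key_to_id):
        self.vector = vector          # index -> value (None = missing entry)
        self.key_to_id = key_to_id    # node label -> index

    def __getitem__(self, key):
        value = self.vector[self.key_to_id[key]]  # unknown label -> KeyError
        if value is None:
            raise KeyError(key)
        return value

    def __setitem__(self, key, value):
        # Only labels already present in key_to_id can be assigned here.
        self.vector[self.key_to_id[key]] = value

    def __delitem__(self, key):
        idx = self.key_to_id[key]
        if self.vector[idx] is None:
            raise KeyError(key)
        self.vector[idx] = None

    def __iter__(self):
        # Yield labels only for indices that currently hold a value.
        for key, idx in self.key_to_id.items():
            if self.vector[idx] is not None:
                yield key

    def __len__(self):
        return sum(1 for _ in self)
```

For example, with clus = ToyNodeMap([0.5, None, 1.0], {"a": 0, "b": 1, "c": 2}), clus["a"] is 0.5, "b" in clus is False, and dict(clus) is {"a": 0.5, "c": 1.0}.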
Take two: it looks like we ought to improve the repr of NodeMap so the keys and values (and dict-like behavior) are more obvious.
Take three: more generally, I guess we ought to improve our documentation including providing examples.
Finally, I'm working very actively on improving dispatching in NetworkX and implementing backends, so please don't be shy about reporting issues, recommending algorithms, and making general user-experience suggestions :)
from graphblas-algorithms.
One final question: are you certain that the outputs of square_clustering are identical to networkx's? I've run into a situation where the output becomes negative, which shouldn't be possible when averaging positive numbers (but that could be an issue outside of square_clustering).
This library uses the networkx test suite to run its tests, so the outputs should be identical. If you do end up finding a corner case, please do open an issue.
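One way to hunt for such a corner case is to run both implementations on the same graph and diff the outputs. The sketch below assumes both results are mappings from node to float; a NodeMap can be converted with dict(clus). compare_results is a hypothetical helper, and the tolerances are arbitrary choices, not values used by either library.

```python
import math


def compare_results(expected, actual, rel_tol=1e-9, abs_tol=1e-12):
    """Return the set of nodes whose values differ between two result mappings.

    Nodes present in only one mapping are reported as well. Floats are compared
    with math.isclose because different backends may sum in a different order.
    """
    mismatched = set(expected.keys() ^ actual.keys())
    for node in expected.keys() & actual.keys():
        if not math.isclose(expected[node], actual[node], rel_tol=rel_tol, abs_tol=abs_tol):
            mismatched.add(node)
    return mismatched
```

A possible usage, assuming NetworkX dispatching is set up: compare_results(nx.square_clustering(G), dict(nx.square_clustering(G, backend="graphblas"))). An empty set means the two agree within tolerance.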
I'll report anything I can find.
I'm currently working on getting a publication finalized, but once that's over I could help out with this project since it's helped save me so much compute time lol.
Yes, a reproducible example would help.
Sounds great @danielverd! Good luck on the publication.
Does your graph have self-edges? As MridulS said, we do run against (and pass) the networkx tests, but I don't think networkx tests square clustering with self-edges, and that is the only way I could think of to get negative results (or maybe integer overflow?). How large is the graph, and what's the largest degree of a node?
Yeah, I've been messing around with it. I suspected the issue might be that any zeros come back as integers, but that doesn't seem to be it.
I'll keep checking myself because I'm not entirely sure I can share the data I used that led to this issue. Sorry about that, but I'll see what I can find.
260,587 edges, 132,922 nodes, and the highest degree is 42,047.
Perfect, thanks.
I'm looking at the implementation more closely, and I think we should probably specify the dtype when using e.g. the plus_pair semiring in algorithms. @jim22k was smart enough to do this in k_truss. Otherwise, the return dtype of plus_pair is determined by the inputs, which may be e.g. INT8, which can easily overflow. Gonna try this out, and I'll push a bugfix release if necessary.
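The following sketch is pure Python, not python-graphblas code. It simulates a fixed-width int8 accumulator to show how a plus_pair-style count of common neighbors comes out negative once it passes 127. wrap_int8 and count_common_neighbors are hypothetical helpers for illustration. In python-graphblas itself, one way to pin the result type is a typed semiring such as semiring.plus_pair["INT64"].

```python
def wrap_int8(x: int) -> int:
    """Reinterpret an integer as a two's-complement signed 8-bit value."""
    return (x + 128) % 256 - 128


def count_common_neighbors(adj, u, v, wrap=None):
    """Count neighbors shared by u and v, i.e. one entry of A @ A under plus_pair.

    The "pair" multiply yields 1 wherever both rows have an entry, and "plus"
    sums those ones. If `wrap` is given, it is applied after every addition to
    mimic an accumulator with a fixed-width dtype.
    """
    total = 0
    for _ in adj[u] & adj[v]:
        total += 1
        if wrap is not None:
            total = wrap(total)
    return total


# Nodes 0 and 1 share 200 neighbors (nodes 2..201).
adj = {0: set(range(2, 202)), 1: set(range(2, 202))}

print(count_common_neighbors(adj, 0, 1))                  # 200
print(count_common_neighbors(adj, 0, 1, wrap=wrap_int8))  # -56: wrapped past 127
```

Because the wraparound is modular, the int8 result is simply 200 - 256. Any downstream formula that uses such a count can therefore go negative.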
Fantastic.
Are there any workarounds you suppose I could try in the meantime?
Sure. How are you using graphblas-algorithms? Do you start with a networkx Graph or a graphblas Matrix, and are you calling from networkx?
- networkx graph G, calling from networkx:
  nx.square_clustering(G, backend="graphblas")  # no workaround needed (I hope)
- graphblas Matrix A, calling from networkx:
  nx.square_clustering(A.dup(int))  # A.dup(int) makes an INT64 copy of A
So, I have been able to reproduce negative results using graphblas Matrix inputs with an int8 dtype, so we do need to be more careful about dtypes during calculations. My bad! I got swamped today, but I will try to fix this tomorrow.
Also, I think we get the same results as networkx in the presence of self-loops (or matrix diagonals), so that probably isn't an issue.
I've been using the Dispatch Example from the README, which converts from networkx to a ga.Graph and back after calling the function from networkx.
I just tried the first suggestion with the networkx graph and got this error from algorithms.square_clustering(G, chunk_ids)
A, degrees = G.get_properties("A degrees+")
KeyError: 'degrees+'
Cool, thanks. Does the networkx graph have weights under the "weight" key, and are they by chance boolean?
Oh my! That shouldn't happen. Can you share the full stacktrace please (and a minimal reproducible example if it's not too much hassle)?
Hey @danielverd, I think I have a fix for this in python-graphblas/python-graphblas#524.
Specifically, these two lines are needed:
https://github.com/python-graphblas/python-graphblas/pull/524/files#diff-b4b616572292444660e5f89b1a07a69d0f2d049cecd40c4c41b3dd82638bce53R79-R80
I'd like to release python-graphblas tomorrow (Wednesday) with this fix.
If you still see negative results after this, then we may need to investigate integer overflow further, in case int64 is too small. If this happens, perhaps we can change the calculation to prevent overflow, or figure out when it may occur and use float64 instead. I may play around with some synthetic graphs around the same size as yours.
Note to observers: we would still like to improve the repr of NodeMap and other returned objects ;)
@danielverd, python-graphblas version 2023.12.0 was released and is available from PyPI and conda-forge. I believe updating to this version should fix the issue with negative results for square clustering.
In my testing, square_clustering can safely handle graphs with a largest node degree of 250,000 (even if every node has this degree, which is quite large: more than 62 billion edges!). I don't know the upper limit. If the graph has no self-edges (i.e., the adjacency matrix has no values on the diagonal), then the results from square clustering should be within [0, 1], and if they are not, then overflow must have occurred.
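That reasoning suggests a quick sanity check after computing square clustering. First confirm the graph has no self-edges, then look for values outside [0, 1]. The sketch below assumes the results are a node-to-float mapping and the graph is an iterable of (u, v) edge pairs, such as G.edges() for a networkx graph. out_of_range_nodes is a hypothetical helper, not part of either library.

```python
def out_of_range_nodes(clustering, edges):
    """Return the set of nodes whose square clustering lies outside [0, 1].

    The [0, 1] bound only holds when the graph has no self-edges, so this
    refuses to check graphs that contain any (u == v) edge. NaN values are
    also reported, since NaN fails the 0 <= value <= 1 comparison.
    """
    self_loops = [(u, v) for u, v in edges if u == v]
    if self_loops:
        raise ValueError(
            f"graph has {len(self_loops)} self-edge(s); the [0, 1] bound does not apply"
        )
    return {node for node, value in clustering.items() if not 0 <= value <= 1}
```

If the graph has no self-edges and the returned set is non-empty, overflow (or some other bug) is the likely culprit.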
I created issues #85 and #86 to capture other tasks from this issue.
Closing, b/c I think all questions have been answered and issues have been fixed or captured. @danielverd, good luck! Please don't be shy about asking for help. Have you found any other fast square clustering algorithms around?