eriknw commented on August 10, 2024

Thanks for the issue and question @danielverd, and great to hear "the speedup is immense"!

Take one: NodeMap is a MutableMapping, so you can use it like a dict, e.g. clus[key] and for key, val in clus.items(). You can also call dict(clus) to convert it to a dict.

Remark: NodeMap also has a .vector attribute that is the underlying GraphBLAS vector, and a .key_to_id attribute that may be a dict mapping node labels (keys) to vector indices (ids).
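To make the dict-like behavior concrete without installing anything, here is a minimal pure-Python sketch of a mapping built the same way: values stored in a vector-like container, plus a key_to_id dict from node labels to positions. MiniNodeMap is a hypothetical stand-in, not the real graphblas_algorithms NodeMap (which wraps a sparse GraphBLAS vector); it only shows why clus[key], clus.items(), and dict(clus) all work once the MutableMapping methods are defined.

```python
from collections.abc import MutableMapping


class MiniNodeMap(MutableMapping):
    """Hypothetical stand-in for NodeMap (NOT the real class).

    A plain list plays the role of the GraphBLAS vector, and key_to_id
    maps each node label to its position in that list.
    """

    def __init__(self, values, key_to_id):
        self.vector = list(values)        # stand-in for the underlying vector
        self.key_to_id = dict(key_to_id)  # node label -> vector index

    def __getitem__(self, key):
        return self.vector[self.key_to_id[key]]

    def __setitem__(self, key, value):
        self.vector[self.key_to_id[key]] = value

    def __delitem__(self, key):
        # This sketch keeps a dense list, so deleting entries is not modeled.
        raise NotImplementedError("deletion is not modeled in this sketch")

    def __iter__(self):
        return iter(self.key_to_id)

    def __len__(self):
        return len(self.key_to_id)


clus = MiniNodeMap([0.5, 0.0, 1.0], {"a": 0, "b": 1, "c": 2})
print(clus["a"])             # 0.5
for node, value in clus.items():
    print(node, value)
print(dict(clus))            # {'a': 0.5, 'b': 0.0, 'c': 1.0}
```

MutableMapping supplies items(), keys(), get() and friends from the five methods above, which is why the dict-style usage comes for free.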

Take two: it looks like we ought to improve the repr of NodeMap so the keys and values (and dict-like behavior) are more obvious.

Take three: more generally, I guess we ought to improve our documentation including providing examples.

Finally, I'm working very actively on improving dispatching in NetworkX and implementing backends, so please don't be shy about reporting issues, recommending algorithms, and general user-experience suggestions :)

from graphblas-algorithms.

MridulS commented on August 10, 2024

> One final question, do you have certainty that the outputs of square_clustering are identical to networkx? I've gotten a situation where the output becomes negative which shouldn't be possible when averaging out positive numbers (but that could be an issue outside of square_clustering).

This library runs the networkx test suite, so the results should be identical. If you do end up finding a corner case, please do open an issue.

danielverd commented on August 10, 2024

I'll report anything I can find.

I'm currently working on getting a publication finalized, but once that's over I could help out with this project since it's helped save me so much compute time lol.

One final question, do you have certainty that the outputs of square_clustering are identical to networkx? I've gotten a situation where the output becomes negative which shouldn't be possible when averaging out positive numbers (but that could be an issue outside of square_clustering).

SultanOrazbayev commented on August 10, 2024

Yes, a reproducible example would help.

eriknw commented on August 10, 2024

Sounds great @danielverd! Good luck on the publication.

Does your graph have self-edges? As MridulS said, we do run against (and pass) the networkx tests, but I don't think networkx tests square clustering with self-edges, and this is the only way I could think of to get negative results (or maybe integer overflow?). How large is the graph, and what's the largest degree of a node?

danielverd commented on August 10, 2024

Yeah, I've been messing around with it. I thought the issue might be that any zeroes come back as integers, but that doesn't seem to be the issue.

I'll keep checking myself because I'm not entirely sure I can share the data I used that led to this issue. Sorry about that, but I'll see what I can find.

danielverd commented on August 10, 2024

> How large is the graph, and what's the largest degree of a node?

260,587 edges, 132,922 nodes, and the highest degree is 42,047.

eriknw commented on August 10, 2024

Perfect, thanks.

I'm looking at the implementation more closely, and I think we should probably specify the dtype when using semirings such as plus_pair in algorithms. @jim22k was smart enough to do this in k_truss. Otherwise, the return dtype of plus_pair is determined by the inputs, which may be e.g. INT8, and an INT8 count overflows as soon as it exceeds 127. Gonna try this out, and I'll push a bugfix release if necessary.
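A quick way to see why an INT8 result can turn negative: plus_pair applied to the adjacency matrix counts the common neighbors of each (i, j) pair, and that count is stored in the result dtype. The pure-Python sketch below simulates the accumulation at a fixed width, assuming two's-complement wraparound on overflow (what typically happens in practice, although C leaves signed overflow undefined). The function names are hypothetical; this is not the library's code.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Map an integer into the range of a signed two's-complement type of `bits` bits."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


def plus_pair_entry(neighbors_i: set, neighbors_j: set, bits: int) -> int:
    """Count common neighbors of i and j (what plus_pair yields for one (i, j)
    entry of A @ A), accumulating in a signed integer that is `bits` bits wide."""
    total = 0
    for k in neighbors_i:
        if k in neighbors_j:  # pair(a_ik, a_kj) contributes 1 when both entries exist
            total = wrap_signed(total + 1, bits)  # plus, then wrap like a fixed-width int
    return total


shared = set(range(200))  # two nodes that share 200 neighbors
print(plus_pair_entry(shared, shared, bits=8))   # -56: INT8 wrapped around
print(plus_pair_entry(shared, shared, bits=64))  # 200: INT64 has plenty of room
```

With a highest degree in the tens of thousands, counts far above 127 are routine, so requesting a wider dtype for the semiring is what removes the wraparound.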

danielverd commented on August 10, 2024

Fantastic.
Are there any workarounds you suppose I could try in the meantime?

eriknw commented on August 10, 2024

> Are there any workarounds you suppose I could try in the meantime?

Sure. How are you using graphblas-algorithms? Do you start with a networkx Graph or a graphblas Matrix, and are you calling from networkx?

networkx graph G, calling from networkx:

```python
import networkx as nx
nx.square_clustering(G, backend="graphblas")  # no workaround needed (I hope)
```

graphblas Matrix A, calling from networkx:

```python
nx.square_clustering(A.dup(int))  # copy A with a 64-bit integer dtype first
```

So, I have been able to reproduce negative results using graphblas Matrix objects with dtype of int8 as inputs, so we do need to be more careful about dtypes during calculations. My bad! I got swamped today, but I will try to fix this tomorrow.

Also, I think we get the same results as networkx in the presence of self-loops (or matrix diagonals), so that probably isn't the issue.

danielverd commented on August 10, 2024

> Sure. How are you using graphblas-algorithms? Do you start with a networkx Graph or a graphblas Matrix, and are you calling from networkx?

I've been using the Dispatch Example from the README, which converts from networkx to a ga.Graph and back after calling the function from networkx.

I just tried the first suggestion with the networkx graph and got this error from algorithms.square_clustering(G, chunk_ids):

```
A, degrees = G.get_properties("A degrees+")
KeyError: 'degrees+'
```

eriknw commented on August 10, 2024

> I've been using the Dispatch Example from the README, which converts from networkx to a ga.Graph and back after calling the function from networkx.

Cool, thanks. Does the networkx graph have weights under a "weight" key, and are they by chance boolean?

> I just tried the first suggestion with the networkx graph and got this error [...]

Oh my! That shouldn't happen. Can you share the full stacktrace please (and a minimal reproducible example if it's not too much hassle)?

eriknw commented on August 10, 2024

Hey @danielverd, I think I have a fix for this in python-graphblas/python-graphblas#524

Specifically, these two lines are needed:
https://github.com/python-graphblas/python-graphblas/pull/524/files#diff-b4b616572292444660e5f89b1a07a69d0f2d049cecd40c4c41b3dd82638bce53R79-R80

I'd like to release python-graphblas tomorrow (Wednesday) with this fix.

If you still see negative results after this, then we may need to investigate further into integer overflow when int64 is too small. If this happens, perhaps we can change the calculation to prevent overflow or figure out when it may occur and use float64. I may play around with some synthetic graphs around the same size as yours.
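As a rough illustration of "figure out when it may occur and use float64", the hypothetical helper below picks an accumulator type from a caller-supplied worst-case bound on the largest intermediate count. Deriving that bound for square clustering is not attempted here. The type names mirror python-graphblas dtype names, but the helper itself is an assumption, not part of any library. The trade-off it encodes: float64 never wraps around, but it only represents integers exactly up to 2**53.

```python
INT64_MAX = 2**63 - 1        # largest value a signed 64-bit integer can hold
FLOAT64_EXACT_LIMIT = 2**53  # float64 represents every integer up to this exactly


def pick_accumulator_dtype(worst_case: int) -> str:
    """Hypothetical helper: choose an accumulator type from an upper bound on the
    largest intermediate count. The bound itself must come from the caller."""
    if worst_case <= INT64_MAX:
        return "INT64"  # exact integer arithmetic, provided the bound really holds
    return "FP64"       # cannot wrap around, but integers above 2**53 may be rounded


print(pick_accumulator_dtype(2**62))  # INT64
print(pick_accumulator_dtype(2**64))  # FP64
# The precision cost of falling back to float64:
print(float(FLOAT64_EXACT_LIMIT + 1) == float(FLOAT64_EXACT_LIMIT))  # True
```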

Note to observers: we would still like to improve the repr of NodeMap and other returned objects ;)

eriknw commented on August 10, 2024

@danielverd, python-graphblas version 2023.12.0 was released and is available from PyPI and conda-forge. I believe updating to this version should fix the issue with negative results for square clustering.

In my testing, square_clustering can safely handle graphs whose largest node degree is 250,000 (even if every node has this degree, which is quite large: more than 62 billion nonzeros in the adjacency matrix, i.e. over 31 billion undirected edges!). I don't know the upper limit. If the graph has no self-edges (i.e., the adjacency matrix has no values on the diagonal), then the results from square clustering should be within [0, 1]; if they are not, then overflow must have occurred.
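Given that guarantee, a cheap sanity check is to scan the returned mapping for values outside [0, 1]. This is a minimal sketch (out_of_range is a hypothetical helper name, and the input values are made up); it works on anything with an .items() method, including a NodeMap or the dict you get from dict(clus). It assumes the graph has no self-loops, which you can confirm beforehand with networkx's nx.number_of_selfloops(G) == 0.

```python
def out_of_range(clustering):
    """Return (node, value) pairs whose square-clustering value is outside [0, 1].

    Assumes the graph has no self-loops. NaN values are flagged as well,
    because every comparison involving NaN is False.
    """
    return [
        (node, value)
        for node, value in clustering.items()
        if not 0.0 <= value <= 1.0
    ]


results = {"a": 0.5, "b": -0.25, "c": 1.0}  # made-up values for illustration
print(out_of_range(results))  # [('b', -0.25)]
```

An empty list does not prove the numbers are right, but any entry in it means something overflowed.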

I created issues #85 and #86 to capture other tasks from this issue.

Closing, because I think all questions have been answered and issues have been fixed or captured. @danielverd, good luck! Please don't be shy about asking for help. Have you found any other fast square clustering algorithms around?
