
Comments (21)

DrTimothyAldenDavis commented on September 17, 2024

Oops -- Yes, I left out the GB_PUBLIC. Adding it now for another beta release.

Yes, I handle the cases where the new nrows or the new ncols equals 1 specially, so they should be pretty fast.

from graphblas.

DrTimothyAldenDavis commented on September 17, 2024

Added to v7.2.0.

DrTimothyAldenDavis commented on September 17, 2024

Yes, it's on my TODO list. See my MATLAB implementation of GrB.reshape:
https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/stable/GraphBLAS/%40GrB/reshape.m
which is very slow.

See also line 12:

% FUTURE: this would be faster as a built-in GxB_reshape function.

DrTimothyAldenDavis commented on September 17, 2024

The GxB_Matrix_reshape method would need to know if it is working by row or by column. MATLAB assumes everything is stored by column, so my GrB.reshape MATLAB script is based on that assumption. If the matrix is held by row but a reshape wants to work by column, or vice versa, then GxB_Matrix_reshape would have to move the data around (a transpose I guess; I'm not sure about the details).

If the matrix is sparse, there's no way to do this except to rearrange the matrix. But in the bitmap or full case, the reshape could be done without moving data, if it happens to be stored the right way.
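To make the index arithmetic concrete, here is a minimal Python sketch of a reshape applied to (row, column) tuples. It is an illustration only, not the SuiteSparse:GraphBLAS implementation; the function name `reshape_tuples` and its parameters are hypothetical. Each entry's position is linearized in the chosen direction and then split against the new dimensions, so the values themselves are never touched.

```python
def reshape_tuples(rows, cols, nrows, ncols, new_nrows, new_ncols, by_col=True):
    """Return new (rows, cols) index lists for a reshaped sparse matrix.

    Hypothetical helper for illustration; values stay attached to their
    entries and are not needed to compute the new positions.
    """
    if nrows * ncols != new_nrows * new_ncols:
        raise ValueError("reshape must preserve the total number of positions")
    new_rows, new_cols = [], []
    for i, j in zip(rows, cols):
        if by_col:
            # Column-wise: position k counts down each column in turn.
            k = i + j * nrows
            new_j, new_i = divmod(k, new_nrows)
        else:
            # Row-wise: position k counts along each row in turn.
            k = i * ncols + j
            new_i, new_j = divmod(k, new_ncols)
        new_rows.append(new_i)
        new_cols.append(new_j)
    return new_rows, new_cols


# A 2-by-3 matrix with entries at (0, 0), (1, 1), and (0, 2), reshaped to 3-by-2.
rows, cols = [0, 1, 0], [0, 1, 2]
by_col_result = reshape_tuples(rows, cols, 2, 3, 3, 2, by_col=True)
by_row_result = reshape_tuples(rows, cols, 2, 3, 3, 2, by_col=False)
```

The two directions give different results for the same input, which is why the direction has to be stated somewhere rather than inferred.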

eriknw commented on September 17, 2024

Great to hear it's on your TODO list!

I think it would be best to specify in the function call whether it should work by row or by column (i.e., don't choose the direction based on the internal format of the matrix).

For now, I can export/calculate/import to get this functionality.

DrTimothyAldenDavis commented on September 17, 2024

Yes, that's what I meant. The function would have to take in a parameter or descriptor or something, to tell it how to reshape, by row or by column. It may or may not match the current matrix format.

eriknw commented on September 17, 2024

Sounds good!

A more general operation could effectively allow us to "move axes" around; for example, to change (the effectively contiguous layout) from a 2x3x5 array to a 3x2x5. I think this is similar to permute in MATLAB. It is like moveaxis in NumPy. I'll illustrate this below.

Given a matrix where the linearized (either row-wise or column-wise) array is

>>> import numpy as np
>>> A = np.arange(2*3*5)
>>> A
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

we can permute it and reshape it in many different ways.

This is equivalent to reshape (because the "order" argument is [0, 1, 2]), assuming row-wise:

>>> regroup(A, [2, 3, 5], [0, 1, 2], output_shape=(2*3, 5))

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])

>>> regroup(A, [2, 3, 5], [0, 1, 2], output_shape=(2, 3*5))

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

The other permutations are:

>>> regroup(A, [2, 3, 5], [0, 2, 1], output_shape=(2*5, 3))

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14],
       [15, 20, 25],
       [16, 21, 26],
       [17, 22, 27],
       [18, 23, 28],
       [19, 24, 29]])

>>> regroup(A, [2, 3, 5], [0, 2, 1], output_shape=(2, 5*3))

array([[ 0,  5, 10,  1,  6, 11,  2,  7, 12,  3,  8, 13,  4,  9, 14],
       [15, 20, 25, 16, 21, 26, 17, 22, 27, 18, 23, 28, 19, 24, 29]])

>>> regroup(A, [2, 3, 5], [1, 0, 2], output_shape=(3*2, 5))

array([[ 0,  1,  2,  3,  4],
       [15, 16, 17, 18, 19],
       [ 5,  6,  7,  8,  9],
       [20, 21, 22, 23, 24],
       [10, 11, 12, 13, 14],
       [25, 26, 27, 28, 29]])

>>> regroup(A, [2, 3, 5], [1, 0, 2], output_shape=(3, 2*5))

array([[ 0,  1,  2,  3,  4, 15, 16, 17, 18, 19],
       [ 5,  6,  7,  8,  9, 20, 21, 22, 23, 24],
       [10, 11, 12, 13, 14, 25, 26, 27, 28, 29]])

>>> regroup(A, [2, 3, 5], [1, 2, 0], output_shape=(5*2, 3))

array([[ 0,  5, 10],
       [15, 20, 25],
       [ 1,  6, 11],
       [16, 21, 26],
       [ 2,  7, 12],
       [17, 22, 27],
       [ 3,  8, 13],
       [18, 23, 28],
       [ 4,  9, 14],
       [19, 24, 29]])

>>> regroup(A, [2, 3, 5], [1, 2, 0], output_shape=(5, 2*3))

array([[ 0,  5, 10, 15, 20, 25],
       [ 1,  6, 11, 16, 21, 26],
       [ 2,  7, 12, 17, 22, 27],
       [ 3,  8, 13, 18, 23, 28],
       [ 4,  9, 14, 19, 24, 29]])

>>> regroup(A, [2, 3, 5], [2, 0, 1], output_shape=(3*5, 2))

array([[ 0, 15],
       [ 1, 16],
       [ 2, 17],
       [ 3, 18],
       [ 4, 19],
       [ 5, 20],
       [ 6, 21],
       [ 7, 22],
       [ 8, 23],
       [ 9, 24],
       [10, 25],
       [11, 26],
       [12, 27],
       [13, 28],
       [14, 29]])

>>> regroup(A, [2, 3, 5], [2, 0, 1], output_shape=(3, 5*2))

array([[ 0, 15,  1, 16,  2, 17,  3, 18,  4, 19],
       [ 5, 20,  6, 21,  7, 22,  8, 23,  9, 24],
       [10, 25, 11, 26, 12, 27, 13, 28, 14, 29]])

>>> regroup(A, [2, 3, 5], [2, 1, 0], output_shape=(5*3, 2))

array([[ 0, 15],
       [ 5, 20],
       [10, 25],
       [ 1, 16],
       [ 6, 21],
       [11, 26],
       [ 2, 17],
       [ 7, 22],
       [12, 27],
       [ 3, 18],
       [ 8, 23],
       [13, 28],
       [ 4, 19],
       [ 9, 24],
       [14, 29]])

>>> regroup(A, [2, 3, 5], [2, 1, 0], output_shape=(5, 3*2))

array([[ 0, 15,  5, 20, 10, 25],
       [ 1, 16,  6, 21, 11, 26],
       [ 2, 17,  7, 22, 12, 27],
       [ 3, 18,  8, 23, 13, 28],
       [ 4, 19,  9, 24, 14, 29]])
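The outputs above can be reproduced with a short pure-Python sketch. This is not an existing library function; `regroup` is the hypothetical name used in this comment, and the meaning of the "order" argument is inferred from the examples: order[i] is taken to be the destination position of source axis i, the same convention as numpy.moveaxis.

```python
from itertools import product


def regroup(flat, shape, order, output_shape):
    """Permute the logical axes of a row-wise flat list, then reshape to 2-D.

    Assumption inferred from the examples above: order[i] is the destination
    position of source axis i (the convention of numpy.moveaxis).
    """
    ndim = len(shape)
    # Shape after moving each source axis to its destination position.
    new_shape = [0] * ndim
    for axis, dest in enumerate(order):
        new_shape[dest] = shape[axis]
    out = [None] * len(flat)
    for k, idx in enumerate(product(*(range(n) for n in shape))):
        # product() yields multi-indices in row-wise order, so idx is the
        # multi-index of flat[k].
        new_idx = [0] * ndim
        for axis, dest in enumerate(order):
            new_idx[dest] = idx[axis]
        # Row-wise linear position in the permuted array.
        new_k = 0
        for n, i in zip(new_shape, new_idx):
            new_k = new_k * n + i
        out[new_k] = flat[k]
    nrows, ncols = output_shape
    return [out[r * ncols:(r + 1) * ncols] for r in range(nrows)]


A = list(range(2 * 3 * 5))
result = regroup(A, [2, 3, 5], [1, 2, 0], output_shape=(5, 2 * 3))
```

The sketch returns nested lists rather than a NumPy array, and it works on dense data only; a sparse version would apply the same index mapping to each stored entry.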

There are other ways to spell this of course. Instead of giving the "current" shape, one could give the "target" shape, and also one could provide the source or target destination for the "reordering" argument. It may be useful to see how other libraries and languages spell similar operations.

If you're interested, I can provide a C prototype for regroup as above. Obviously, reshape is much simpler and easier to use than regroup, but regroup is more capable, and it's structured enough that we can be efficient when sorting of indices is necessary.

As with reshape, for now I can export/calculate/import for regroup as well.

DrTimothyAldenDavis commented on September 17, 2024

I only support 2-dimensional matrices, though. In that case, isn't the only possible permutation [0 1] to [1 0], which is the same as a transpose?

eriknw commented on September 17, 2024

Here, there can be any number of logical "group sizes" such as [2, 3, 5], but the data is stored as a 2d matrix. It's similar to how a 1-dim array can be interpreted as an N-dim array with the proper metadata. But, unlike with dense arrays where updating metadata is usually sufficient, doing a regroup operation on sparse will likely require computing new indices.
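As a rough sketch of what "computing new indices" could look like for a single stored entry, the following Python function maps a (row, col) position in a row-wise 2-D layout to its position after a regroup. The name `regroup_entry` and all of its parameters are hypothetical, and order[i] is again assumed to be the destination position of logical axis i.

```python
def regroup_entry(row, col, in_ncols, shape, order, out_ncols):
    """Map one entry (row, col) of a row-wise 2-D layout to its new position.

    Hypothetical helper: shape holds the logical group sizes, and order[i]
    is assumed to be the destination position of logical axis i.
    """
    # Linear position in the row-wise input layout.
    k = row * in_ncols + col
    # Unravel k into one index per logical axis (last axis varies fastest).
    idx = []
    for n in reversed(shape):
        k, i = divmod(k, n)
        idx.append(i)
    idx.reverse()
    # Permute both the sizes and the indices.
    new_shape = [0] * len(shape)
    new_idx = [0] * len(shape)
    for axis, dest in enumerate(order):
        new_shape[dest] = shape[axis]
        new_idx[dest] = idx[axis]
    # Ravel into the new linear position, then split into (row, col).
    new_k = 0
    for n, i in zip(new_shape, new_idx):
        new_k = new_k * n + i
    return divmod(new_k, out_ncols)


# The entry holding value 16 in the 6-by-5 row-wise layout sits at (3, 1).
new_position = regroup_entry(3, 1, in_ncols=5, shape=[2, 3, 5],
                             order=[1, 0, 2], out_ncols=5)
```

Applied to every stored entry, this gives the new index lists; whether the result is still sorted depends on the permutation, which is where a library could save work.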

The 12 results shown above are valid representations of 3 dimensional data in a 2d structure. It is useful to perform reordering gymnastics to perform different tensor dot products and reductions.

DrTimothyAldenDavis commented on September 17, 2024

eriknw commented on September 17, 2024

Understood. Thanks. An advantage I see for pushing this into a library such as SuiteSparse:GraphBLAS is you can do the right thing for your different internal formats, which means others don't need to worry about the fine details. Full and bitmap should be able to be done very fast. Some operations are able to preserve sorted-ness and may be faster than using extractTuples.

Thanks for entertaining the idea! I'm still +1 for adding a reshape function.

DrTimothyAldenDavis commented on September 17, 2024

ParticularMiner commented on September 17, 2024

Hi all, may I chip in with a related request?

I would like to be able to convert back and forth between a Vector and a one-row (or one-column) Matrix. That is, to insert or remove a dimension/axis.

The only way I can think of right now to do this without too much memory overhead is to export the GraphBLAS Vector/Matrix and import it back again with the required axis-metadata.

But having this feature built-in could help in implementing a more efficient parallel matrix-multiplication in dask which employs such features.

DrTimothyAldenDavis commented on September 17, 2024

Good question.

Internally, I cheat. I can just typecast an n-by-1 GrB_Matrix into a GrB_Vector, so long as the matrix is held in the right format (by column, not hypersparse). And I can typecast any GrB_Vector into an n-by-1 matrix, for any vector. That's cheating in the sense that it depends on my internal opaque formats; I don't allow the user to do this.

That typecast takes zero time, or O(1) if you like because it's a single pointer assignment at most, as in:

// given an n-by-1 matrix A that is held by column, and not in hypersparse form then this is OK:
GrB_Vector v = (GrB_Vector) A ;

The safest way to get this effect would be the import/export, or better yet the GxB pack/unpack methods. You can unpack an n-by-1 matrix A into its 3 components (assuming CSC format): Ap, Ai, and Ax. Then "pack" the Ai and Ax components into a GrB_Vector v. That also takes O(1) time, and if you reuse the A and v objects, no memory gets allocated at all.
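The idea can be modelled in a few lines of Python. This is a toy model under stated assumptions, not the GraphBLAS pack/unpack API: the classes `CSCMatrix` and `SparseVector` and the function `column_matrix_to_vector` are hypothetical stand-ins that only show why no data needs to be copied.

```python
from dataclasses import dataclass, field


@dataclass
class CSCMatrix:
    """Toy stand-in for a sparse matrix held by column (not the GraphBLAS type)."""
    nrows: int
    ncols: int
    Ap: list = field(default_factory=list)  # column pointers, length ncols + 1
    Ai: list = field(default_factory=list)  # row indices of the entries
    Ax: list = field(default_factory=list)  # values of the entries


@dataclass
class SparseVector:
    """Toy stand-in for a sparse vector: indices plus values."""
    size: int
    indices: list = field(default_factory=list)
    values: list = field(default_factory=list)


def column_matrix_to_vector(A):
    """Move the arrays of an n-by-1 CSC matrix into a vector without copying."""
    if A.ncols != 1:
        raise ValueError("expected an n-by-1 matrix")
    # "Unpack": take ownership of the arrays and leave A empty.
    Ai, Ax = A.Ai, A.Ax
    A.Ap, A.Ai, A.Ax = [0, 0], [], []
    # "Pack": hand the same list objects to the vector; Ap is not needed.
    return SparseVector(A.nrows, Ai, Ax)


A = CSCMatrix(nrows=5, ncols=1, Ap=[0, 2], Ai=[1, 4], Ax=[3.5, -1.0])
original_values = A.Ax
v = column_matrix_to_vector(A)
```

After the call, `v.values` is the very same list object that A used to hold, which mirrors the O(1) hand-off described above.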

ParticularMiner commented on September 17, 2024

Thank you @DrTimothyAldenDavis . Interesting internal typecasting abilities.

I was also unaware of the pack/unpack methods — I’d been reading an old User Guide.

DrTimothyAldenDavis commented on September 17, 2024

It would be awkward to have GxB_Matrix_reshape accept a parameter that states how the resulting matrix should be stored (by row or by column). That would never be added to the spec, since GxB_Matrix_Option_set (A, GxB_FORMAT, GxB_BY_ROW) (or GxB_BY_COL) is very specific to SuiteSparse:GraphBLAS.

The best solution would be to just return the matrix in the same format: if the input is by row then the result is by row, and if the input is by column then the result is held by column.

Another question: should this method modify the matrix in-place? As in:

GxB_Matrix_reshape (A, new_nrows, new_ncols) ;

or:

GxB_Matrix_reshape (&C, A, new_nrows, new_ncols) ;

Or perhaps the latter could be used, and C is a new matrix unless you do:

GxB_Matrix_reshape (&A, A, new_nrows, new_ncols) ;

Then you could do both: in-place to modify A, and out-of-place to do C=reshape(A, ...). In-place is faster since it doesn't require the numerical values to be modified or copied at all. The numerical values would not be touched, regardless of the format (sparse, hypersparse, bitmap, or full). That would make the in-place reshape very fast.
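The difference between the two calling styles can be sketched with a toy dense matrix in Python. This models only the semantics, not the performance or the real implementation; `FullMatrix`, `reshape_in_place`, and `reshape_dup` are hypothetical names, and the sketch assumes the reshape direction matches the storage direction so that the flat value list needs no reordering.

```python
class FullMatrix:
    """Toy dense matrix: a flat value list plus dimensions (illustration only)."""

    def __init__(self, nrows, ncols, values):
        if len(values) != nrows * ncols:
            raise ValueError("values must have nrows * ncols entries")
        self.nrows, self.ncols, self.values = nrows, ncols, values


def reshape_in_place(A, new_nrows, new_ncols):
    """Reshape A itself: only the dimensions change, the values stay put."""
    if new_nrows * new_ncols != A.nrows * A.ncols:
        raise ValueError("reshape must preserve the number of entries")
    A.nrows, A.ncols = new_nrows, new_ncols


def reshape_dup(A, new_nrows, new_ncols):
    """Return a reshaped copy and leave A unchanged; the values are copied."""
    C = FullMatrix(A.nrows, A.ncols, list(A.values))
    reshape_in_place(C, new_nrows, new_ncols)
    return C


A = FullMatrix(2, 3, [1, 2, 3, 4, 5, 6])
values_before = A.values
C = reshape_dup(A, 3, 2)   # out-of-place: A is still 2-by-3
reshape_in_place(A, 6, 1)  # in-place: same value list, new dimensions
```

In the in-place case the value list is the same object before and after, while the out-of-place case has to allocate and fill a second list.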

DrTimothyAldenDavis commented on September 17, 2024

I'm working on this now and have a working draft in the master branch for v7.2.0. I've decided it's safest to write two user-callable reshape methods: one in-place like GrB_Matrix_resize, the other makes a copy:

GxB_Matrix_reshape (C, by_col, nrows_new, ncols_new, descriptor)

by_col is bool, and it tells C to be reshaped column-wise (if true) or row-wise (if false). The descriptor is used just to control the # of threads to use. C is modified in-place.

GxB_Matrix_reshapeDup (&C, A, by_col, nrows_new, ncols_new, descriptor)

constructs a new matrix C, and leaves A unchanged. I wouldn't expect the C API to add both of these. GrB_Matrix_resize doesn't have a descriptor, either, so I suppose the future C API might be just

GrB_Matrix_reshape (C, by_col, nrows_new, ncols_new)

Every GrB method should have a descriptor, or some kind of Context, to control the # threads used, what GPUs to use, etc, but that's beyond the scope of this method.

eriknw commented on September 17, 2024

Hooray 🎉

I'll think about these function signatures, because, yeah, we don't need/want type coercion, masks, accumulators, etc.

Instead of GxB_Matrix_reshapeDup, what's the downside of doing dup then GxB_Matrix_reshape?

And, yes, I hear you about descriptors. This percolates up to wrappers, because we want to expose your descriptors.

DrTimothyAldenDavis commented on September 17, 2024

Doing a dup and then a reshape will be slower. I need the reshapeDup for the computation C = reshape (A, mnew, nnew) in my MATLAB interface, for example, where A is an input matrix that won't change, and C is the output matrix.

v7.2.0 is now fully tested and documented, and ready to release, so if you have any comments on the API, please let me know as soon as you can. I will post it as a beta.

eriknw commented on September 17, 2024

Did you forget to make these public via GB_PUBLIC?

eriknw commented on September 17, 2024

I think the API is good; it's awkward to fit both functionalities into a single function.

I'm curious: do you handle the cases nrows == 1 or ncols == 1 specifically? Will these cases be as fast as can reasonably be expected?

I'm actually pretty excited to get this in and start playing with it :)
