Comments (21)
Oops -- Yes, I left out the GB_PUBLIC. Adding it now for another beta release.
Yes, I did handle the case where the new nrows or new ncols equal 1 specially, so they should be pretty fast.
from graphblas.
Added to v7.2.0.
Yes, it's on my TODO list. See my MATLAB implementation of GrB.reshape:
https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/stable/GraphBLAS/%40GrB/reshape.m
which is very slow.
See also line 12:
% FUTURE: this would be faster as a built-in GxB_reshape function.
The GxB_Matrix_reshape method would need to know whether it is working by row or by column. MATLAB assumes everything is stored by column, so my GrB.reshape MATLAB script is based on that assumption. If the matrix is held by row but a reshape wants to work by column, or vice versa, then GxB_Matrix_reshape would have to move the data around (a transpose, I guess; I'm not sure about the details).
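The by-row/by-column distinction can be seen in NumPy, which exposes the same choice through the `order` argument of `reshape` (MATLAB always behaves like `order='F'`); this is only a dense analogy, not the GraphBLAS API:

```python
import numpy as np

# The same six values reshaped two ways; `order` plays the role of the
# by-row / by-column choice discussed above.
A = np.arange(6).reshape(2, 3)              # [[0, 1, 2], [3, 4, 5]]
by_row = np.reshape(A, (3, 2), order='C')   # row-wise linearization
by_col = np.reshape(A, (3, 2), order='F')   # column-wise (MATLAB-style)
```

Here `by_row` is [[0, 1], [2, 3], [4, 5]] while `by_col` is [[0, 4], [3, 2], [1, 5]]; the two disagree, which is why the method must commit to one convention.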
If the matrix is sparse, there's no way to do this except to rearrange the matrix. But in the bitmap or full case, the reshape could be done without moving data, if it happens to be stored the right way.
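For the sparse case, the rearrangement amounts to recomputing each entry's (row, column) index from its linear position; a minimal NumPy sketch of a column-wise reshape over COO-style index arrays (an illustration of the idea, not the library's implementation; note the values themselves never move):

```python
import numpy as np

def reshape_coords(rows, cols, nrows_old, nrows_new):
    # Column-wise linear index of each stored entry in the old shape...
    k = rows + cols * nrows_old
    # ...split against the new number of rows.
    return k % nrows_new, k // nrows_new
```

For example, reshaping a 4-by-3 matrix to 6-by-2 column-wise moves an entry at (0, 1) to (4, 0), because its linear position is 4.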
Great to hear it's on your TODO list!
I think it would be best to specify in the function call whether it should work by row or by column (i.e., don't choose the direction based on the internal format of the matrix).
For now, I can export/calculate/import to get this functionality.
Yes, that's what I meant. The function would have to take in a parameter or descriptor or something, to tell it how to reshape, by row or by column. It may or may not match the current matrix format.
Sounds good!
A more general operation could effectively allow us to "move axes" around; for example, to change (the effectively contiguous layout) from a 2x3x5 array to a 3x2x5 array. I think this is similar to permute in MATLAB and to moveaxis in NumPy. I'll illustrate this below.
Given a matrix where the linearized (either row-wise or column-wise) array is
>>> A = np.arange(2*3*5)
>>> A
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
we can permute it and reshape it in many different ways.
This is equivalent to reshape (because the "order" argument is [0, 1, 2]), assuming row-wise:
>>> regroup(A, [2, 3, 5], [0, 1, 2], output_shape=(2*3, 5))
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]])
>>> regroup(A, [2, 3, 5], [0, 1, 2], output_shape=(2, 3*5))
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
The other permutations are:
>>> regroup(A, [2, 3, 5], [0, 2, 1], output_shape=(2*5, 3))
array([[ 0, 5, 10],
[ 1, 6, 11],
[ 2, 7, 12],
[ 3, 8, 13],
[ 4, 9, 14],
[15, 20, 25],
[16, 21, 26],
[17, 22, 27],
[18, 23, 28],
[19, 24, 29]])
>>> regroup(A, [2, 3, 5], [0, 2, 1], output_shape=(2, 5*3))
array([[ 0, 5, 10, 1, 6, 11, 2, 7, 12, 3, 8, 13, 4, 9, 14],
[15, 20, 25, 16, 21, 26, 17, 22, 27, 18, 23, 28, 19, 24, 29]])
>>> regroup(A, [2, 3, 5], [1, 0, 2], output_shape=(3*2, 5))
array([[ 0, 1, 2, 3, 4],
[15, 16, 17, 18, 19],
[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[10, 11, 12, 13, 14],
[25, 26, 27, 28, 29]])
>>> regroup(A, [2, 3, 5], [1, 0, 2], output_shape=(3, 2*5))
array([[ 0, 1, 2, 3, 4, 15, 16, 17, 18, 19],
[ 5, 6, 7, 8, 9, 20, 21, 22, 23, 24],
[10, 11, 12, 13, 14, 25, 26, 27, 28, 29]])
>>> regroup(A, [2, 3, 5], [1, 2, 0], output_shape=(5*2, 3))
array([[ 0, 5, 10],
[15, 20, 25],
[ 1, 6, 11],
[16, 21, 26],
[ 2, 7, 12],
[17, 22, 27],
[ 3, 8, 13],
[18, 23, 28],
[ 4, 9, 14],
[19, 24, 29]])
>>> regroup(A, [2, 3, 5], [1, 2, 0], output_shape=(5, 2*3))
array([[ 0, 5, 10, 15, 20, 25],
[ 1, 6, 11, 16, 21, 26],
[ 2, 7, 12, 17, 22, 27],
[ 3, 8, 13, 18, 23, 28],
[ 4, 9, 14, 19, 24, 29]])
>>> regroup(A, [2, 3, 5], [2, 0, 1], output_shape=(3*5, 2))
array([[ 0, 15],
[ 1, 16],
[ 2, 17],
[ 3, 18],
[ 4, 19],
[ 5, 20],
[ 6, 21],
[ 7, 22],
[ 8, 23],
[ 9, 24],
[10, 25],
[11, 26],
[12, 27],
[13, 28],
[14, 29]])
>>> regroup(A, [2, 3, 5], [2, 0, 1], output_shape=(3, 5*2))
array([[ 0, 15, 1, 16, 2, 17, 3, 18, 4, 19],
[ 5, 20, 6, 21, 7, 22, 8, 23, 9, 24],
[10, 25, 11, 26, 12, 27, 13, 28, 14, 29]])
>>> regroup(A, [2, 3, 5], [2, 1, 0], output_shape=(5*3, 2))
array([[ 0, 15],
[ 5, 20],
[10, 25],
[ 1, 16],
[ 6, 21],
[11, 26],
[ 2, 17],
[ 7, 22],
[12, 27],
[ 3, 18],
[ 8, 23],
[13, 28],
[ 4, 19],
[ 9, 24],
[14, 29]])
>>> regroup(A, [2, 3, 5], [2, 1, 0], output_shape=(5, 3*2))
array([[ 0, 15, 5, 20, 10, 25],
[ 1, 16, 6, 21, 11, 26],
[ 2, 17, 7, 22, 12, 27],
[ 3, 18, 8, 23, 13, 28],
[ 4, 19, 9, 24, 14, 29]])
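The regroup calls above can be reproduced with a small NumPy sketch (regroup is a hypothetical helper, not an existing NumPy or GraphBLAS function; it assumes the row-wise linearization used in these examples):

```python
import numpy as np

def regroup(a, group, order, output_shape):
    # View the linear (row-wise) data with the logical group sizes,
    # permute the axes, then flatten to the 2-d output shape.
    return a.reshape(group).transpose(order).reshape(output_shape)

A = np.arange(2 * 3 * 5)
regroup(A, [2, 3, 5], [1, 0, 2], output_shape=(3 * 2, 5))
```

The last call reproduces the [1, 0, 2] example above: its first two rows are [0..4] and [15..19].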
There are other ways to spell this, of course. Instead of giving the "current" shape, one could give the "target" shape, and one could also provide the source or target destination for the "reordering" argument. It may be useful to see how other libraries and languages spell similar operations.
If you're interested, I can provide a C prototype for regroup as above. Obviously, resize is much simpler and easier to use than regroup, but regroup is more capable, and it's structured enough that we can be efficient when sorting of indices is necessary.
As with reshape, for now I can export/calculate/import for regroup as well.
I only support 2-dimensional matrices, though. For that case, isn't the only possible permutation [0 1] to [1 0], which is the same as a transpose?
Here, there can be any number of logical "group sizes" such as [2, 3, 5], but the data is stored as a 2d matrix. It's similar to how a 1-dim array can be interpreted as an N-dim array with the proper metadata. But, unlike with dense arrays, where updating metadata is usually sufficient, doing a regroup operation on a sparse matrix will likely require computing new indices.
The 12 results shown above are valid representations of 3-dimensional data in a 2d structure. Such reordering gymnastics are useful for performing different tensor dot products and reductions.
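The new-index computation for a sparse regroup can be sketched over COO-style index arrays (a hypothetical helper for illustration; it assumes row-wise linearization, as in the examples above):

```python
import numpy as np

def regroup_indices(rows, cols, ncols_old, group, order, output_shape):
    # Row-wise linear index of each stored entry in the old 2-d matrix.
    k = rows * ncols_old + cols
    # Interpret each linear index as N-d coordinates over the group sizes,
    # permute the axes, and re-linearize against the permuted sizes.
    coords = np.unravel_index(k, group)
    k_new = np.ravel_multi_index(tuple(coords[a] for a in order),
                                 tuple(group[a] for a in order))
    # Split the new linear index against the 2-d output shape.
    return np.unravel_index(k_new, output_shape)
```

For instance, with group [2, 3, 5], order [0, 2, 1], and output shape (10, 3), an entry stored at (1, 0) of the 6-by-5 matrix moves to (0, 1), matching the dense example above (the value 5 appears at row 0, column 1).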
Understood. Thanks. An advantage I see for pushing this into a library such as SuiteSparse:GraphBLAS is that you can do the right thing for your different internal formats, which means others don't need to worry about the fine details. Full and bitmap should be very fast. Some operations can preserve sortedness and may be faster than using extractTuples.
Thanks for entertaining the idea! I'm still +1 for adding a reshape function.
Hi all, may I chip in with a related request?
I would like to be able to convert back and forth between a Vector and a one-row (or one-column) Matrix. That is, to insert or remove a dimension/axis.
The only way I can think of right now to do this without too much memory overhead is to export the GraphBLAS Vector/Matrix and import it back again with the required axis-metadata.
But having this feature built in could help in implementing a more efficient parallel matrix multiplication in dask, which employs such features.
Good question.
Internally, I cheat. I can just typecast an n-by-1 GrB_Matrix into a GrB_Vector, so long as the matrix is held in the right format (by column, not hypersparse). And I can typecast any GrB_Vector into an n-by-1 matrix, for any vector. That's cheating in the sense that it depends on my internal opaque formats; I don't allow the user to do this.
That typecast takes zero time, or O(1) if you like, because it's a single pointer assignment at most, as in:
// given an n-by-1 matrix A that is held by column and not in hypersparse form, this is OK:
GrB_Vector v = (GrB_Vector) A ;
The safest way to get this effect would be the import/export, or better yet the GxB pack/unpack methods. You can unpack an n-by-1 matrix A into its 3 components (assuming CSC format): Ap, Ai, and Ax. Then "pack" the Ai and Ax components into a GrB_Vector v. That also takes O(1) time, and if you reuse the A and v objects, no memory gets allocated at all.
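The unpack/pack round-trip can be sketched with plain arrays standing in for the CSC components (the Ap/Ai/Ax names follow the comment above; this illustrates the idea only, not the GxB pack/unpack API itself):

```python
import numpy as np

# CSC components of a sparse 8-by-1 matrix A with A[2,0] = 4.0, A[5,0] = 7.0:
Ap = np.array([0, 2])        # column pointers: one column holding 2 entries
Ai = np.array([2, 5])        # row indices of the entries
Ax = np.array([4.0, 7.0])    # values of the entries
# "Packing" Ai and Ax as a sparse vector v of length 8 copies nothing:
# the index and value arrays are simply reinterpreted.
v_indices, v_values = Ai, Ax
```

This is why the round-trip is O(1): an n-by-1 CSC matrix and a sparse vector already share the same index/value layout.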
Thank you @DrTimothyAldenDavis. Interesting internal typecasting abilities.
I was also unaware of the pack/unpack methods — I’d been reading an old User Guide.
It would be awkward for GxB_Matrix_reshape to accept a parameter that states how the resulting matrix should be stored (by row or by column). There's no way that would be added to the spec, since GxB_Matrix_Option_set (GxB_FORMAT, GxB_BY_ROW) or ... by col ... is very specific to SuiteSparse:GraphBLAS.
The best solution would be to just return the matrix in the same format: if the input is by row then the result is by row, and if the input is by column then the result is held by column.
Another question: should this method modify the matrix in-place? As in:
GxB_Matrix_reshape (A, new_nrows, new_ncols) ;
or:
GxB_Matrix_reshape (&C, A, new_nrows, new_ncols) ;
Or perhaps the latter could be used, and C is a new matrix unless you do:
GxB_Matrix_reshape (&A, A, new_nrows, new_ncols) ;
Then you could do both: in-place to modify A, and out-of-place to do C=reshape(A, ...). In-place is faster since it doesn't require the numerical values to be modified or copied at all. The numerical values would not be touched, regardless of the format (sparse, hypersparse, bitmap, or full). That would make the in-place reshape very fast.
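As a dense analogy of why in-place reshape can skip the values entirely (NumPy here, not the GraphBLAS internals): reshaping a contiguous array returns a view over the same buffer, so no values are copied:

```python
import numpy as np

A = np.arange(12.0).reshape(3, 4)
C = A.reshape(6, 2)              # a view: the 12 values are not copied
assert np.shares_memory(A, C)    # both names refer to the same buffer
```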
I'm working on this now and have a working draft in the master branch for v7.2.0. I've decided it's safest to write two user-callable reshape methods: one in-place, like GrB_Matrix_resize; the other makes a copy:
GxB_Matrix_reshape (C, by_col, nrows_new, ncols_new, descriptor)
by_col is a bool; it tells whether C is reshaped column-wise (if true) or row-wise (if false). The descriptor is used just to control the # of threads to use. C is modified in-place.
GxB_Matrix_reshapeDup (&C, A, by_col, nrows_new, ncols_new, descriptor)
constructs a new matrix C, and leaves A unchanged. I wouldn't expect the C API to add both of these. GrB_Matrix_resize doesn't have a descriptor, either, so I suppose the future C API might be just
GrB_Matrix_reshape (C, by_col, nrows_new, ncols_new)
Every GrB method should have a descriptor, or some kind of Context, to control the # threads used, what GPUs to use, etc, but that's beyond the scope of this method.
Hooray
I'll think about these function signatures, because, yeah, we don't need/want type coercion, masks, accumulators, etc.
Instead of GxB_Matrix_reshapeDup, what's the downside of doing a dup and then GxB_Matrix_reshape?
And, yes, I hear you about descriptors. This percolates up to wrappers, because we want to expose your descriptors.
Doing a dup and then a reshape will be slower. I need the reshapeDup for the computation C = reshape (A, mnew, nnew) in my MATLAB interface, for example, where A is an input matrix that won't change, and C is the output matrix.
v7.2.0 is now fully tested and documented, and ready to release, so if you have any comments on the API please let me know as soon as you can. I will post it as a beta.
Did you forget to make these public via GB_PUBLIC?
I think the API is good; it's awkward to fit both functionalities into a single function.
I'm curious: do you handle the cases nrows == 1 or ncols == 1 specifically? Will these cases be as fast as can reasonably be expected?
I'm actually pretty excited to get this in and start playing with it :)