Comments (21)
Oops -- Yes, I left out the GB_PUBLIC. Adding it now for another beta release.
Yes, I did handle the case where the new nrows or new ncols equal 1 specially, so they should be pretty fast.
from graphblas.
Added to v7.2.0.
Yes, it's on my TODO list. See my MATLAB implementation of GrB.reshape:
https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/stable/GraphBLAS/%40GrB/reshape.m
which is very slow.
See also line 12:
% FUTURE: this would be faster as a built-in GxB_reshape function.
The GxB_Matrix_reshape method would need to know whether it is working by row or by column. MATLAB assumes everything is stored by column, so my GrB.reshape MATLAB script is based on that assumption. If the matrix is held by row but a reshape wants to work by column, or vice versa, then GxB_Matrix_reshape would have to move the data around (a transpose, I guess; I'm not sure about the details).
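The by-row/by-column distinction can be seen in NumPy, which exposes the same choice through the `order` argument of `reshape` (MATLAB always behaves like `order='F'`); this is only a dense analogy, not the GraphBLAS API:

```python
import numpy as np

# The same six values reshaped two ways; `order` plays the role of the
# by-row / by-column choice discussed above.
A = np.arange(6).reshape(2, 3)              # [[0, 1, 2], [3, 4, 5]]
by_row = np.reshape(A, (3, 2), order='C')   # row-wise linearization
by_col = np.reshape(A, (3, 2), order='F')   # column-wise (MATLAB-style)
```

Here `by_row` is [[0, 1], [2, 3], [4, 5]] while `by_col` is [[0, 4], [3, 2], [1, 5]]; the two disagree, which is why the method must commit to one convention.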
If the matrix is sparse, there's no way to do this except to rearrange the matrix. But in the bitmap or full case, the reshape could be done without moving data, if it happens to be stored the right way.
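For the sparse case, the rearrangement amounts to recomputing each entry's (row, column) index from its linear position; a minimal NumPy sketch of a column-wise reshape over COO-style index arrays (an illustration of the idea, not the library's implementation; note the values themselves never move):

```python
import numpy as np

def reshape_coords(rows, cols, nrows_old, nrows_new):
    # Column-wise linear index of each stored entry in the old shape...
    k = rows + cols * nrows_old
    # ...split against the new number of rows.
    return k % nrows_new, k // nrows_new
```

For example, reshaping a 4-by-3 matrix to 6-by-2 column-wise moves an entry at (0, 1) to (4, 0), because its linear position is 4.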
Great to hear it's on your TODO list!
I think it would be best to specify in the function call whether it should work by row or by column (i.e., don't choose the direction based on the internal format of the matrix).
For now, I can export/calculate/import to get this functionality.
Yes, that's what I meant. The function would have to take in a parameter or descriptor or something, to tell it how to reshape, by row or by column. It may or may not match the current matrix format.
Sounds good!
A more general operation could effectively allow us to "move axes" around; for example, to change (the effectively contiguous layout) from a 2x3x5 array to a 3x2x5 array. I think this is similar to permute in MATLAB and to moveaxis in NumPy. I'll illustrate this below.
Given a matrix where the linearized (either row-wise or column-wise) array is
>>> A = np.arange(2*3*5)
>>> A
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
we can permute it and reshape it in many different ways.
This is equivalent to reshape (because the "order" argument is [0, 1, 2]), assuming row-wise:
>>> regroup(A, [2, 3, 5], [0, 1, 2], output_shape=(2*3, 5))
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]])
>>> regroup(A, [2, 3, 5], [0, 1, 2], output_shape=(2, 3*5))
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
The other permutations are:
>>> regroup(A, [2, 3, 5], [0, 2, 1], output_shape=(2*5, 3))
array([[ 0, 5, 10],
[ 1, 6, 11],
[ 2, 7, 12],
[ 3, 8, 13],
[ 4, 9, 14],
[15, 20, 25],
[16, 21, 26],
[17, 22, 27],
[18, 23, 28],
[19, 24, 29]])
>>> regroup(A, [2, 3, 5], [0, 2, 1], output_shape=(2, 5*3))
array([[ 0, 5, 10, 1, 6, 11, 2, 7, 12, 3, 8, 13, 4, 9, 14],
[15, 20, 25, 16, 21, 26, 17, 22, 27, 18, 23, 28, 19, 24, 29]])
>>> regroup(A, [2, 3, 5], [1, 0, 2], output_shape=(3*2, 5))
array([[ 0, 1, 2, 3, 4],
[15, 16, 17, 18, 19],
[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[10, 11, 12, 13, 14],
[25, 26, 27, 28, 29]])
>>> regroup(A, [2, 3, 5], [1, 0, 2], output_shape=(3, 2*5))
array([[ 0, 1, 2, 3, 4, 15, 16, 17, 18, 19],
[ 5, 6, 7, 8, 9, 20, 21, 22, 23, 24],
[10, 11, 12, 13, 14, 25, 26, 27, 28, 29]])
>>> regroup(A, [2, 3, 5], [1, 2, 0], output_shape=(5*2, 3))
array([[ 0, 5, 10],
[15, 20, 25],
[ 1, 6, 11],
[16, 21, 26],
[ 2, 7, 12],
[17, 22, 27],
[ 3, 8, 13],
[18, 23, 28],
[ 4, 9, 14],
[19, 24, 29]])
>>> regroup(A, [2, 3, 5], [1, 2, 0], output_shape=(5, 2*3))
array([[ 0, 5, 10, 15, 20, 25],
[ 1, 6, 11, 16, 21, 26],
[ 2, 7, 12, 17, 22, 27],
[ 3, 8, 13, 18, 23, 28],
[ 4, 9, 14, 19, 24, 29]])
>>> regroup(A, [2, 3, 5], [2, 0, 1], output_shape=(3*5, 2))
array([[ 0, 15],
[ 1, 16],
[ 2, 17],
[ 3, 18],
[ 4, 19],
[ 5, 20],
[ 6, 21],
[ 7, 22],
[ 8, 23],
[ 9, 24],
[10, 25],
[11, 26],
[12, 27],
[13, 28],
[14, 29]])
>>> regroup(A, [2, 3, 5], [2, 0, 1], output_shape=(3, 5*2))
array([[ 0, 15, 1, 16, 2, 17, 3, 18, 4, 19],
[ 5, 20, 6, 21, 7, 22, 8, 23, 9, 24],
[10, 25, 11, 26, 12, 27, 13, 28, 14, 29]])
>>> regroup(A, [2, 3, 5], [2, 1, 0], output_shape=(5*3, 2))
array([[ 0, 15],
[ 5, 20],
[10, 25],
[ 1, 16],
[ 6, 21],
[11, 26],
[ 2, 17],
[ 7, 22],
[12, 27],
[ 3, 18],
[ 8, 23],
[13, 28],
[ 4, 19],
[ 9, 24],
[14, 29]])
>>> regroup(A, [2, 3, 5], [2, 1, 0], output_shape=(5, 3*2))
array([[ 0, 15, 5, 20, 10, 25],
[ 1, 16, 6, 21, 11, 26],
[ 2, 17, 7, 22, 12, 27],
[ 3, 18, 8, 23, 13, 28],
[ 4, 19, 9, 24, 14, 29]])
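The regroup calls above can be reproduced with a small NumPy sketch (regroup is a hypothetical helper, not an existing NumPy or GraphBLAS function; it assumes the row-wise linearization used in these examples):

```python
import numpy as np

def regroup(a, group, order, output_shape):
    # View the linear (row-wise) data with the logical group sizes,
    # permute the axes, then flatten to the 2-d output shape.
    return a.reshape(group).transpose(order).reshape(output_shape)

A = np.arange(2 * 3 * 5)
regroup(A, [2, 3, 5], [1, 0, 2], output_shape=(3 * 2, 5))
```

The last call reproduces the [1, 0, 2] example above: its first two rows are [0..4] and [15..19].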
There are other ways to spell this, of course. Instead of giving the "current" shape, one could give the "target" shape, and one could also provide the source or target destination for the "reordering" argument. It may be useful to see how other libraries and languages spell similar operations.
If you're interested, I can provide a C prototype for regroup as above. Obviously, resize is much simpler and easier to use than regroup, but regroup is more capable, and it's structured enough that we can be efficient when sorting of indices is necessary.
As with reshape, for now I can export/calculate/import for regroup as well.
I only support 2-dimensional matrices, though. For that case, isn't the only possible permutation [0 1] to [1 0], which is the same as a transpose?
Here, there can be any number of logical "group sizes" such as [2, 3, 5], but the data is stored as a 2d matrix. It's similar to how a 1-dim array can be interpreted as an N-dim array with the proper metadata. But, unlike with dense arrays, where updating metadata is usually sufficient, doing a regroup operation on a sparse matrix will likely require computing new indices.
The 12 results shown above are valid representations of 3-dimensional data in a 2d structure. Such reordering gymnastics are useful for performing different tensor dot products and reductions.
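The new-index computation for a sparse regroup can be sketched over COO-style index arrays (a hypothetical helper for illustration; it assumes row-wise linearization, as in the examples above):

```python
import numpy as np

def regroup_indices(rows, cols, ncols_old, group, order, output_shape):
    # Row-wise linear index of each stored entry in the old 2-d matrix.
    k = rows * ncols_old + cols
    # Interpret each linear index as N-d coordinates over the group sizes,
    # permute the axes, and re-linearize against the permuted sizes.
    coords = np.unravel_index(k, group)
    k_new = np.ravel_multi_index(tuple(coords[a] for a in order),
                                 tuple(group[a] for a in order))
    # Split the new linear index against the 2-d output shape.
    return np.unravel_index(k_new, output_shape)
```

For instance, with group [2, 3, 5], order [0, 2, 1], and output shape (10, 3), an entry stored at (1, 0) of the 6-by-5 matrix moves to (0, 1), matching the dense example above (the value 5 appears at row 0, column 1).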
Understood. Thanks. An advantage I see for pushing this into a library such as SuiteSparse:GraphBLAS is that you can do the right thing for your different internal formats, which means others don't need to worry about the fine details. Full and bitmap should be very fast. Some operations can preserve sortedness and may be faster than using extractTuples.
Thanks for entertaining the idea! I'm still +1 for adding a reshape function.
Hi all, may I chip in with a related request?
I would like to be able to convert back and forth between a Vector and a one-row (or one-column) Matrix. That is, to insert or remove a dimension/axis.
The only way I can think of right now to do this without too much memory overhead is to export the GraphBLAS Vector/Matrix and import it back again with the required axis-metadata.
But having this feature built in could help in implementing a more efficient parallel matrix multiplication in dask, which employs such features.
Good question.
Internally, I cheat. I can just typecast an n-by-1 GrB_Matrix into a GrB_Vector, so long as the matrix is held in the right format (by column, not hypersparse). And I can typecast any GrB_Vector into an n-by-1 matrix, for any vector. That's cheating in the sense that it depends on my internal opaque formats; I don't allow the user to do this.
That typecast takes zero time, or O(1) if you like, because it's a single pointer assignment at most, as in:
// given an n-by-1 matrix A that is held by column and not in hypersparse form, this is OK:
GrB_Vector v = (GrB_Vector) A ;
The safest way to get this effect would be the import/export, or better yet the GxB pack/unpack methods. You can unpack an n-by-1 matrix A into its 3 components (assuming CSC format): Ap, Ai, and Ax. Then "pack" the Ai and Ax components into a GrB_Vector v. That also takes O(1) time, and if you reuse the A and v objects, no memory gets allocated at all.
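The unpack/pack round-trip can be sketched with plain arrays standing in for the CSC components (the Ap/Ai/Ax names follow the comment above; this illustrates the idea only, not the GxB pack/unpack API itself):

```python
import numpy as np

# CSC components of a sparse 8-by-1 matrix A with A[2,0] = 4.0, A[5,0] = 7.0:
Ap = np.array([0, 2])        # column pointers: one column holding 2 entries
Ai = np.array([2, 5])        # row indices of the entries
Ax = np.array([4.0, 7.0])    # values of the entries
# "Packing" Ai and Ax as a sparse vector v of length 8 copies nothing:
# the index and value arrays are simply reinterpreted.
v_indices, v_values = Ai, Ax
```

This is why the round-trip is O(1): an n-by-1 CSC matrix and a sparse vector already share the same index/value layout.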
Thank you @DrTimothyAldenDavis. Interesting internal typecasting abilities.
I was also unaware of the pack/unpack methods — I’d been reading an old User Guide.
It would be awkward for GxB_Matrix_reshape to accept a parameter that states how the resulting matrix should be stored (by row or by column). There's no way that would be added to the spec, since GxB_Matrix_Option_set (GxB_FORMAT, GxB_BY_ROW) or ... by col ... is very specific to SuiteSparse:GraphBLAS.
The best solution would be to just return the matrix in the same format: if the input is by row then the result is by row, and if the input is by column then the result is held by column.
Another question: should this method modify the matrix in-place? As in:
GxB_Matrix_reshape (A, new_nrows, new_ncols) ;
or:
GxB_Matrix_reshape (&C, A, new_nrows, new_ncols) ;
Or perhaps the latter could be used, and C is a new matrix unless you do:
GxB_Matrix_reshape (&A, A, new_nrows, new_ncols) ;
Then you could do both: in-place to modify A, and out-of-place to do C=reshape(A, ...). In-place is faster since it doesn't require the numerical values to be modified or copied at all. The numerical values would not be touched, regardless of the format (sparse, hypersparse, bitmap, or full). That would make the in-place reshape very fast.
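As a dense analogy of why in-place reshape can skip the values entirely (NumPy here, not the GraphBLAS internals): reshaping a contiguous array returns a view over the same buffer, so no values are copied:

```python
import numpy as np

A = np.arange(12.0).reshape(3, 4)
C = A.reshape(6, 2)              # a view: the 12 values are not copied
assert np.shares_memory(A, C)    # both names refer to the same buffer
```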
I'm working on this now and have a working draft in the master branch for v7.2.0. I've decided it's safest to write two user-callable reshape methods: one in-place, like GrB_Matrix_resize; the other makes a copy:
GxB_Matrix_reshape (C, by_col, nrows_new, ncols_new, descriptor)
by_col is a bool; it tells whether C is reshaped column-wise (if true) or row-wise (if false). The descriptor is used just to control the # of threads to use. C is modified in-place.
GxB_Matrix_reshapeDup (&C, A, by_col, nrows_new, ncols_new, descriptor)
constructs a new matrix C, and leaves A unchanged. I wouldn't expect the C API to add both of these. GrB_Matrix_resize doesn't have a descriptor, either, so I suppose the future C API might be just
GrB_Matrix_reshape (C, by_col, nrows_new, ncols_new)
Every GrB method should have a descriptor, or some kind of Context, to control the # threads used, what GPUs to use, etc, but that's beyond the scope of this method.
Hooray
I'll think about these function signatures, because, yeah, we don't need/want type coercion, masks, accumulators, etc.
Instead of GxB_Matrix_reshapeDup, what's the downside of doing a dup and then GxB_Matrix_reshape?
And, yes, I hear you about descriptors. This percolates up to wrappers, because we want to expose your descriptors.
Doing a dup and then a reshape will be slower. I need the reshapeDup for the computation C = reshape (A, mnew, nnew) in my MATLAB interface, for example, where A is an input matrix that won't change, and C is the output matrix.
v7.2.0 is now fully tested and documented, and ready to release, so if you have any comments on the API please let me know as soon as you can. I will post it as a beta.
Did you forget to make these public via GB_PUBLIC?
I think the API is good; it's awkward to fit both functionalities into a single function.
I'm curious: do you handle the cases nrows == 1 or ncols == 1 specifically? Will these cases be as fast as can reasonably be expected?
I'm actually pretty excited to get this in and start playing with it :)