Comments (11)
Are your blocks constant size? The use of views in `setindex!` was just a quick hack to get it working, and it would definitely be better to redesign.
from blockbandedmatrices.jl.
No, and they're not even square in general. The blocks in our case represent the mapping from configuration derivative to velocity for a single joint, so the blocks can be of varying sizes and can be non-square for joints with a different number of configurations than velocities.
Are they like in your example, `cols = rows = 1:N`? (This is actually my biggest use case.)

If not, that is, if they are arbitrarily sized, I don't believe there is any way around the fact that `setindex!` will be slow, as it will have to look up the right blocks in memory. In my use case the bands are rational, so the eventual fast code would not call `setindex!`, but would instead look something like:

```julia
view(A, Block(N, M))[band(1)] .= (1:N).^(-1)
```

In this case, the code could be designed to do the lookups once. (And could even be done on a GPU with no CPU memory access.) Are you sure that your code needs to call `setindex!`?
Also, look in BlockArrays.jl at `BlockArray` vs `PseudoBlockArray`. Here I'm following the `PseudoBlockArray` style, where the non-zero entries of the matrix are stored consecutively in memory by column. (This will allow the use of LAPACK to do block-wise QR.) An alternative would be to follow `BlockArray` and store each block as a separate `Matrix{T}`, that is, wrap a `BandedMatrix{Matrix{T}, Matrix{Matrix{T}}}`. Which style fits your application better?
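The two storage styles can be pictured in plain Base Julia. This is a rough, package-free illustration only; the variable names and offsets are made up here, not BlockArrays.jl internals:

```julia
# "BlockArray" style: each block is its own Matrix, reached via a pointer chase.
blockarray_style = Matrix{Float64}[zeros(2, 3), zeros(1, 1), zeros(3, 2)]

# "PseudoBlockArray" style: one contiguous column-major buffer; each block is
# a range into it, which is what lets LAPACK operate on the raw memory.
buffer = zeros(2*3 + 1*1 + 3*2)
offsets = cumsum([0, 6, 1, 6])    # start offsets of each block in `buffer`
block2 = reshape(view(buffer, offsets[2]+1:offsets[3]), 1, 1)
block2[1, 1] = 7.0                # writes through to the shared buffer
```

In the second style a single block update touches the contiguous parent directly, so whole-matrix operations never have to gather scattered allocations.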
No, that's just pulled from the readme. We don't know anything about the sizes of the blocks ahead of time, but we know that the resulting matrix will have block lower and upper bandwidths of 0 (i.e. there is only a single, contiguous diagonal band of blocks), but each block can be of varying size.
The matrices we're dealing with are also small enough (less than 50x50) that sparsity isn't a huge concern. My current solution is a struct which stores the parent as a dense matrix and then stores a vector of views into that parent. We don't have to do any fancy lookup to find the right block because there's just the one diagonal band of blocks, so I can just store the views in diagonal-order and look them up linearly.
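That setup can be sketched in a few lines. This is a hypothetical illustration of the struct described above, with made-up names, not code from any package:

```julia
# A dense parent matrix plus one precomputed view per diagonal block,
# so looking up block i is just indexing into `blocks`.
struct BlockDiagonalViews{T, V <: AbstractMatrix{T}}
    parent::Matrix{T}
    blocks::Vector{V}
end

function blockdiagonalviews(blocksizes::Vector{Tuple{Int, Int}})
    m = sum(first, blocksizes)
    n = sum(last, blocksizes)
    parent = zeros(m, n)
    rowstop = cumsum(first.(blocksizes))
    colstop = cumsum(last.(blocksizes))
    blocks = [view(parent,
                   (rowstop[k] - blocksizes[k][1] + 1):rowstop[k],
                   (colstop[k] - blocksizes[k][2] + 1):colstop[k])
              for k in eachindex(blocksizes)]
    BlockDiagonalViews(parent, blocks)
end
```

Updating block `i` in place is then a broadcast into `J.blocks[i]`, and matmul just uses `J.parent` as an ordinary dense matrix.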
The only two operations that I currently perform on the block-diagonal matrix are:
- Updating each block in place with something like `update!(blocks(J)[i], data[i])`, where `i` is the index of the block along the diagonal
- Matmul using the full `J` matrix

It looks like `BandedMatrix{Matrix{T}, Matrix{Matrix{T}}}` might be a good alternative to what I'm using now.
There’s still a lookup as you have to access the view, whose location is stored in memory. So I don’t think there’s anything about your setup that makes it inherently faster, though storing the views will indeed avoid allocation.
I think the fastest option is to look up the starting index and stride and access the data directly, without creating any views. This is actually a really easy change.
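The arithmetic involved is just column-major offset math. A toy, package-free illustration (all names and values here are made up for the example):

```julia
data = collect(1.0:20.0)    # stand-in for a matrix's flat column-major storage
blockstart = 5              # assumed: this block's data begins at index 5
colstride = 4               # assumed: consecutive block columns are 4 apart

# Entry (a1, a2) of the block, located directly in `data` with no view object.
blockentry(a1, a2) = data[blockstart + (a1 - 1) + (a2 - 1) * colstride]
```

Once `blockstart` and `colstride` are known, every element access is a single multiply-add, with nothing allocated.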
Sorry, yes, I totally agree. My approach will still be slower than directly accessing a plain Matrix because of the cost of looking up the existing view. In my particular case that's OK, as I can amortize a single lookup over all of the indices which are being set within the given view.
And yes, I think looking up the index and stride directly would be faster (and non-allocating) compared to constructing a new view each time.
Try this:
```julia
@inline function foosetindex!(A::BlockBandedMatrix{T}, v, i::Int, j::Int) where T
    @boundscheck checkbounds(A, i, j)
    bi = BlockBandedMatrices.global2blockindex(A.block_sizes, (i, j))
    ind = A.block_sizes.block_starts[bi.I...]
    st = A.block_sizes.block_strides[bi.I[2]]
    @inbounds V.data[ind + bi.α[1]-1 + (bi.α[2]-1)*st] = convert(T, v)::T
    return v
end
```

I get

```julia
julia> @btime foosetindex!(m, 2, 5, 6)
  2.755 μs (4 allocations: 160 bytes)
2
```
That doesn't look quite right. You're accessing a non-existent variable `V` (it should be `A`), and the time is 2 microseconds, which is about 1000x too slow. Changing that `V` to `A`, I get `53.211 ns (2 allocations: 128 bytes)`, which is much better but still allocating.

It looks like there's something strange going on with `global2blockindex`, which is where the allocations (and most of the time) are coming from:

```julia
julia> s = m.block_sizes;

julia> @btime BlockBandedMatrices.global2blockindex($s, $(5, 6))
  42.521 ns (2 allocations: 128 bytes)
```
Ah, I got it. There's a subtle bug that occurs in several places in BlockArrays.jl. In many of the `@generated` functions, you have lines like this: https://github.com/JuliaArrays/BlockArrays.jl/blob/d5710bb15357b26c942df8c983a8f14992a8d922/src/blockindices.jl#L73 where you do `$Expr(:meta, :inline)`. But that interpolation isn't quite right: you're actually interpolating only the symbol `Expr`, not the whole expression. The result is that the generated output, rather than containing the inlining hint, actually constructs an `Expr` at run-time. That is, the resulting generated function's body looks like:

```julia
function foo(x)
    (BlockArrays.Expr)(:meta, :inline)
    _do_stuff(x)
end
```

Changing that to `$(Expr(:meta, :inline))` everywhere, I get `global2blockindex` down from ~60 ns with 2 allocations to ~10 ns with 0 allocations.
Awesome! Though by "you" you mean @kristofferc, who wrote BlockArrays.jl 😛 I've since inherited it and now maintain it.

Can you make a PR?
Haha, sorry, didn't mean to accuse 😉. And yup, working on a PR now.