Comments (11)

dlfivefifty avatar dlfivefifty commented on June 27, 2024

Are your blocks constant size?

The use of views in setindex! was just a quick hack to get it working, and it would definitely be better to redesign.

from blockbandedmatrices.jl.

rdeits avatar rdeits commented on June 27, 2024

No, and they're not even square in general. The blocks in our case represent the mapping from configuration derivative to velocity for a single joint, so the blocks can be of varying sizes and can be non-square for joints with a different number of configurations than velocities.

dlfivefifty avatar dlfivefifty commented on June 27, 2024

Are they like in your example cols = rows = 1:N? (This is actually my biggest use case.)

If not, that is, if they are arbitrarily sized, I don't believe there is any way around the fact that setindex! will be slow, as it has to look up the right block in memory. In my use case the bands are rational, so the eventual fast code would not call setindex!, but would instead look something like:

view(A, Block(N,M))[band(1)] .= (1:N).^(-1)

in this case, the code could be designed to do the lookups once. (And could even be done on a GPU with no CPU memory access.)

Are you sure that your code needs to call setindex!?

Also, look in BlockArrays.jl at BlockArray vs PseudoBlockArray. Here I'm following the PseudoBlockArray style, where the non-zero entries of the matrix are stored consecutively in memory by column. (This will allow the use of LAPACK to do block-wise QR.) An alternative would be to follow BlockArray and store each block as a separate Matrix{T}, that is, wrap a BandedMatrix{Matrix{T}, Matrix{Matrix{T}}}. Which style fits your application better?
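
The two storage styles can be sketched in plain Julia without BlockArrays.jl itself (the variable names below are illustrative, not the package's API):

```julia
# BlockArray style: each block is its own Matrix{T}, held in a
# Matrix{Matrix{T}} — every block is a separate allocation.
blocks = Matrix{Matrix{Float64}}(undef, 2, 2)
blocks[1, 1] = ones(2, 2)
blocks[2, 1] = zeros(1, 2)
blocks[1, 2] = zeros(2, 3)
blocks[2, 2] = ones(1, 3)

# PseudoBlockArray style: one contiguous parent with entries stored
# consecutively by column (which is what makes handing column panels
# to LAPACK possible); a "block" is just a strided view into it.
parent = zeros(3, 5)
b11 = view(parent, 1:2, 1:2)
b11 .= 1.0
```

Writing through `b11` mutates `parent` directly, so there is a single backing buffer in the second style.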

rdeits avatar rdeits commented on June 27, 2024

No, that's just pulled from the readme. We don't know anything about the sizes of the blocks ahead of time, but we know that the resulting matrix will have block lower and upper bandwidths of 0 (i.e. there is only a single, contiguous diagonal band of blocks), but each block can be of varying size.

The matrices we're dealing with are also small enough (less than 50x50) that sparsity isn't a huge concern. My current solution is a struct which stores the parent as a dense matrix and then stores a vector of views into that parent. We don't have to do any fancy lookup to find the right block because there's just the one diagonal band of blocks, so I can just store the views in diagonal-order and look them up linearly.

The only two operations that I currently perform on the block-diagonal matrix are:

  1. Updating each block in place with something like update!(blocks(J)[i], data[i]) where i is the index of the block along the diagonal
  2. Matmul using the full J matrix
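
A minimal sketch of that arrangement in plain Julia (the name BlockDiagonalViews and both helper functions are hypothetical, not from either package):

```julia
# Sketch of the setup described above: a dense parent matrix holding one
# diagonal band of (possibly non-square) blocks, with views stored in
# diagonal order so finding block i is just indexing a vector.
struct BlockDiagonalViews{T,V<:AbstractMatrix{T}}
    parent::Matrix{T}
    blocks::Vector{V}
end

function BlockDiagonalViews(rowsizes::Vector{Int}, colsizes::Vector{Int})
    @assert length(rowsizes) == length(colsizes)
    parent = zeros(sum(rowsizes), sum(colsizes))
    r = c = 1
    blocks = map(zip(rowsizes, colsizes)) do (m, n)
        b = view(parent, r:r+m-1, c:c+n-1)
        r += m; c += n
        b
    end
    BlockDiagonalViews(parent, blocks)
end

# Operation 1: update block i in place — one view lookup amortized over
# all the entries being written.
update!(J::BlockDiagonalViews, i::Int, data) = copyto!(J.blocks[i], data)

# Operation 2: matmul simply uses the dense parent.
matvec(J::BlockDiagonalViews, x::AbstractVector) = J.parent * x
```

Because each view aliases `parent`, `update!` writes land directly in the dense matrix that `matvec` multiplies.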

It looks like BandedMatrix{Matrix{T}, Matrix{Matrix{T}}} might be a good alternative to what I'm using now.

dlfivefifty avatar dlfivefifty commented on June 27, 2024

There’s still a lookup as you have to access the view, whose location is stored in memory. So I don’t think there’s anything about your setup that makes it inherently faster, though storing the views will indeed avoid allocation.

I think the fastest option is to look up the starting index and stride and access the data directly, without creating any views. This is actually a really easy change.

rdeits avatar rdeits commented on June 27, 2024

Sorry, yes, I totally agree. My approach will still be slower than directly accessing a plain Matrix because of the cost of looking up the existing view. In my particular case that's OK, as I can amortize a single lookup over all of the indices which are being set within the given view.

And yes, I think looking up the index and stride directly would be faster (and non-allocating) compared to constructing a new view each time.

dlfivefifty avatar dlfivefifty commented on June 27, 2024

Try this:

@inline function foosetindex!(A::BlockBandedMatrix{T}, v, i::Int, j::Int) where T
    @boundscheck checkbounds(A, i, j)
    bi = BlockBandedMatrices.global2blockindex(A.block_sizes, (i, j))
    ind = A.block_sizes.block_starts[bi.I...]
    st = A.block_sizes.block_strides[bi.I[2]]
    @inbounds V.data[ind + bi.α[1]-1 + (bi.α[2]-1)*st] = convert(T, v)::T
    return v
end

I get

julia> @btime foosetindex!(m,2, 5,6)
  2.755 μs (4 allocations: 160 bytes)
2

rdeits avatar rdeits commented on June 27, 2024

That doesn't look quite right. You're accessing a non-existent variable V (should be A), and the time is 2 microseconds, which is about 1000x too slow. Changing that V to A, I get 53.211 ns (2 allocations: 128 bytes), which is much better but still allocating.

It looks like there's something strange going on with global2blockindex which is where the allocations (and most of the time) are coming from:

julia> s = m.block_sizes;

julia> @btime BlockBandedMatrices.global2blockindex($s, $(5, 6))
42.521 ns (2 allocations: 128 bytes)

rdeits avatar rdeits commented on June 27, 2024

Ah, I got it. There's a subtle bug that occurs in several places in BlockArrays.jl. In many of the @generated functions, you have lines like this: https://github.com/JuliaArrays/BlockArrays.jl/blob/d5710bb15357b26c942df8c983a8f14992a8d922/src/blockindices.jl#L73

where you do $Expr(:meta, :inline). But that interpolation isn't quite right: you're actually interpolating only the symbol Expr, not the actual expression. The result is that the generated output, rather than containing the inlining hint, actually constructs an Expr at run-time. That is, the resulting generated function's body looks like:

function foo(x)
  (BlockArrays.Expr)(:meta, :inline)
  _do_stuff(x)
end

Changing that to $(Expr(:meta, :inline)) everywhere, I get global2blockindex down from ~60 ns with 2 allocations to ~10 ns with 0 allocations.
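
The difference is easy to demonstrate outside a @generated function (`bad` and `good` are illustrative names):

```julia
# $Expr interpolates only the binding `Expr` into the quoted expression,
# so the result is a :call node that would construct the Expr at run time
# instead of placing the :meta hint in the body.
bad = :( $Expr(:meta, :inline) )

# $(Expr(:meta, :inline)) evaluates the constructor first and splices the
# finished :meta node into the quoted body, which is what the compiler
# actually looks for.
good = :( $(Expr(:meta, :inline)) )

bad.head, good.head  # (:call, :meta)
```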

dlfivefifty avatar dlfivefifty commented on June 27, 2024

Awesome! Though by "you" you mean @kristofferc, who wrote BlockArrays.jl 😛 I've since inherited it and now maintain it.

Can you make a PR?

rdeits avatar rdeits commented on June 27, 2024

Haha, sorry, didn't mean to accuse 😉 . And yup, working on a PR now.
