Comments (11)

dlfivefifty avatar dlfivefifty commented on June 27, 2024

Are your blocks constant size?

The use of views in setindex! was just a quick hack to get it working, and it would definitely be better to redesign.

from blockbandedmatrices.jl.

rdeits avatar rdeits commented on June 27, 2024

No, and they're not even square in general. The blocks in our case represent the mapping from configuration derivative to velocity for a single joint, so the blocks can be of varying sizes and can be non-square for joints with a different number of configurations than velocities.

dlfivefifty avatar dlfivefifty commented on June 27, 2024

Are they like in your example cols = rows = 1:N? (This is actually my biggest use case.)

If not, that is, if they are arbitrarily sized, I don't believe there is any way around the fact that setindex! will be slow, as it has to look up the right block in memory. In my use case the bands are rational, so the eventual fast code would not call setindex!, but would instead look something like:

view(A, Block(N,M))[band(1)] .= (1:N).^(-1)

in this case, the code could be designed to do the lookups once. (And could even be done on a GPU with no CPU memory access.)

Are you sure that your code needs to call setindex!?

Also, look in BlockArrays.jl at BlockArray vs PseudoBlockArray. Here I'm following the PseudoBlockArray style, where the non-zero entries of the matrix are stored consecutively in memory by column. (This will allow the use of LAPACK to do block-wise QR.) An alternative would be to follow BlockArray and store each block as a separate Matrix{T}, that is, wrap a BandedMatrix{Matrix{T}, Matrix{Matrix{T}}}. Which style fits your application better?
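
The two storage styles can be sketched in plain Julia without BlockArrays.jl itself (the variable names below are illustrative, not the package's API):

```julia
# BlockArray style: each block is its own Matrix{T}, held in a
# Matrix{Matrix{T}} — every block is a separate allocation.
blocks = Matrix{Matrix{Float64}}(undef, 2, 2)
blocks[1, 1] = ones(2, 2)
blocks[2, 1] = zeros(1, 2)
blocks[1, 2] = zeros(2, 3)
blocks[2, 2] = ones(1, 3)

# PseudoBlockArray style: one contiguous parent with entries stored
# consecutively by column (which is what makes handing column panels
# to LAPACK possible); a "block" is just a strided view into it.
parent = zeros(3, 5)
b11 = view(parent, 1:2, 1:2)
b11 .= 1.0
```

Writing through `b11` mutates `parent` directly, so there is a single backing buffer in the second style.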

rdeits avatar rdeits commented on June 27, 2024

No, that's just pulled from the readme. We don't know anything about the sizes of the blocks ahead of time, but we know that the resulting matrix will have block lower and upper bandwidths of 0 (i.e. there is only a single, contiguous diagonal band of blocks), but each block can be of varying size.

The matrices we're dealing with are also small enough (less than 50x50) that sparsity isn't a huge concern. My current solution is a struct which stores the parent as a dense matrix and then stores a vector of views into that parent. We don't have to do any fancy lookup to find the right block because there's just the one diagonal band of blocks, so I can just store the views in diagonal-order and look them up linearly.

The only two operations that I currently perform on the block-diagonal matrix are:

  1. Updating each block in place with something like update!(blocks(J)[i], data[i]) where i is the index of the block along the diagonal
  2. Matmul using the full J matrix
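
A minimal sketch of that arrangement in plain Julia (the name BlockDiagonalViews and both helper functions are hypothetical, not from either package):

```julia
# Sketch of the setup described above: a dense parent matrix holding one
# diagonal band of (possibly non-square) blocks, with views stored in
# diagonal order so finding block i is just indexing a vector.
struct BlockDiagonalViews{T,V<:AbstractMatrix{T}}
    parent::Matrix{T}
    blocks::Vector{V}
end

function BlockDiagonalViews(rowsizes::Vector{Int}, colsizes::Vector{Int})
    @assert length(rowsizes) == length(colsizes)
    parent = zeros(sum(rowsizes), sum(colsizes))
    r = c = 1
    blocks = map(zip(rowsizes, colsizes)) do (m, n)
        b = view(parent, r:r+m-1, c:c+n-1)
        r += m; c += n
        b
    end
    BlockDiagonalViews(parent, blocks)
end

# Operation 1: update block i in place — one view lookup amortized over
# all the entries being written.
update!(J::BlockDiagonalViews, i::Int, data) = copyto!(J.blocks[i], data)

# Operation 2: matmul simply uses the dense parent.
matvec(J::BlockDiagonalViews, x::AbstractVector) = J.parent * x
```

Because each view aliases `parent`, `update!` writes land directly in the dense matrix that `matvec` multiplies.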

It looks like BandedMatrix{Matrix{T}, Matrix{Matrix{T}}} might be a good alternative to what I'm using now.

dlfivefifty avatar dlfivefifty commented on June 27, 2024

There’s still a lookup as you have to access the view, whose location is stored in memory. So I don’t think there’s anything about your setup that makes it inherently faster, though storing the views will indeed avoid allocation.

I think the fastest option is to look up the starting index and stride and access the data directly, without creating any views. This is actually a really easy change.

rdeits avatar rdeits commented on June 27, 2024

Sorry, yes, I totally agree. My approach will still be slower than directly accessing a plain Matrix because of the cost of looking up the existing view. In my particular case that's OK, as I can amortize a single lookup over all of the indices which are being set within the given view.

And yes, I think looking up the index and stride directly would be faster (and non-allocating) compared to constructing a new view each time.

dlfivefifty avatar dlfivefifty commented on June 27, 2024

Try this:

@inline function foosetindex!(A::BlockBandedMatrix{T}, v, i::Int, j::Int) where T
    @boundscheck checkbounds(A, i, j)
    bi = BlockBandedMatrices.global2blockindex(A.block_sizes, (i, j))
    ind = A.block_sizes.block_starts[bi.I...]
    st = A.block_sizes.block_strides[bi.I[2]]
    @inbounds V.data[ind + bi.α[1]-1 + (bi.α[2]-1)*st] = convert(T, v)::T
    return v
end

I get

julia> @btime foosetindex!(m,2, 5,6)
  2.755 μs (4 allocations: 160 bytes)
2

rdeits avatar rdeits commented on June 27, 2024

That doesn't look quite right. You're accessing a non-existent variable V (should be A), and the time is 2 microseconds, which is about 1000x too slow. Changing that V to A, I get 53.211 ns (2 allocations: 128 bytes), which is much better but still allocating.

It looks like there's something strange going on with global2blockindex which is where the allocations (and most of the time) are coming from:

julia> s = m.block_sizes;

julia> @btime BlockBandedMatrices.global2blockindex($s, $(5, 6))
42.521 ns (2 allocations: 128 bytes)

rdeits avatar rdeits commented on June 27, 2024

Ah, I got it. There's a subtle bug that occurs in several places in BlockArrays.jl. In many of the @generated functions, you have lines like this: https://github.com/JuliaArrays/BlockArrays.jl/blob/d5710bb15357b26c942df8c983a8f14992a8d922/src/blockindices.jl#L73

where you do $Expr(:meta, :inline). But that interpolation isn't quite right: you're actually interpolating only the symbol Expr, not the actual expression. The result is that the generated output, rather than containing the inlining hint, actually constructs an Expr at run-time. That is, the resulting generated function's body looks like:

function foo(x)
  (BlockArrays.Expr)(:meta, :inline)
  _do_stuff(x)
end

Changing that to $(Expr(:meta, :inline)) everywhere, I get global2blockindex down from ~60 ns with 2 allocations to ~10 ns with 0 allocations.
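
The difference is easy to demonstrate outside a @generated function (`bad` and `good` are illustrative names):

```julia
# $Expr interpolates only the binding `Expr` into the quoted expression,
# so the result is a :call node that would construct the Expr at run time
# instead of placing the :meta hint in the body.
bad = :( $Expr(:meta, :inline) )

# $(Expr(:meta, :inline)) evaluates the constructor first and splices the
# finished :meta node into the quoted body, which is what the compiler
# actually looks for.
good = :( $(Expr(:meta, :inline)) )

bad.head, good.head  # (:call, :meta)
```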

dlfivefifty avatar dlfivefifty commented on June 27, 2024

Awesome! Though by "you" you mean @kristofferc, who wrote BlockArrays.jl 😛 I've since inherited it and now maintain it.

Can you make a PR?

rdeits avatar rdeits commented on June 27, 2024

Haha, sorry, didn't mean to accuse 😉 . And yup, working on a PR now.
