fverdugo / partitionedarrays.jl Goto Github PK

View Code? Open in Web Editor NEW

108.0 6.0 12.0 4.59 MB

Large-scale, distributed, sparse linear algebra in Julia.

License: MIT License

Julia 100.00%

julia parallel-algorithms parallel-data mpi linear-algebra hpc

partitionedarrays.jl's People

Contributors

Stargazers

Watchers

Forkers

isgasho ranocha oriolcg fredrikekre alvaroborras termi-official giordano jordimanyer carlodev yssamtu rasimers hrvanelderen

partitionedarrays.jl's Issues

discover_snd_parts warning

Hi @fverdugo,

we know that the current implementation of discover_snd_parts is not scalable. However, the user might not know that. While we dont have an scalable implementation I would warn the user on screen with a message. Something like:

"[PartitionedArrays.jl] Warning: Using a non-scalable implementation to discover reciprocal parts in sparse communication kernel among nearest neighbours. This might cause trouble when running the code at medium/large scales. You can avoid this using the Exchanger constructor with bla bla bla"

I guess that we can do it on the master task using the @warn macro right at the beginning of a call to discover_parts_snd.

What do you think? Any other ideas while we do not have the scalable solution?

Missing: Array of Parrays to Parray of Arrays

Hi @fverdugo, I think there is a functionality missing that could be useful in some cases. Again, I might be missing something but I haven't found an elegant way of dealing with it.

It involves converting an Array of PArrays to a PArray of Arrays. Consider the following example:

using PartitionedArrays

ranks = with_debug() do distribute
  distribute(LinearIndices((4,)))
end

a = map(r -> r, ranks)
b = map(r -> -r, ranks)

# Array of PArrays
v = [a,b];

# PArray of Arrays
map(ranks) do r
  map(v) do v_i
    # Used to be: PartitionedArrays.get_part(v_i,r)
    # But now is limited to:
    PartitionedArrays.getany(v_i)
  end
end

The current code works fine for MPIArrays, but not for DebugArrays (since getany always returns the first element). The old solution (commented) does not work anymore since there is no way to access a particular element in a DebugArray/MPIArray without getting an error/warning (even if it is safe to do so).

I think the solution to this involves a new function (basically an analog to tuple_of_arrays) that specializes for DebugArray and MPIArray. Would that be something you would like to have in PartitionedArrays?

Rosenbrock/TRBDF2 with Preconditioned Newton-Krylov Distributed Demo: PArray Construction

I'm trying to make a version of the DifferentialEquations.jl Solving Large Stiff Equations PDE time stepping tutorial that is distributed, and am running into some issues figuring out how to construct PArrays. MWE with the method errors are below:

using OrdinaryDiffEq, LinearAlgebra, SparseArrays, BenchmarkTools, LinearSolve

const N = 32
const xyd_brusselator = range(0,stop=1,length=N)
brusselator_f(x, y, t) = (((x-0.3)^2 + (y-0.6)^2) <= 0.1^2) * (t >= 1.1) * 5.
limit(a, N) = a == N+1 ? 1 : a == 0 ? N : a
kernel_u! = let N=N, xyd=xyd_brusselator, dx=step(xyd_brusselator)
  @inline function (du, u, A, B, α, II, I, t)
    i, j = Tuple(I)
    x = xyd[I[1]]
    y = xyd[I[2]]
    ip1 = limit(i+1, N); im1 = limit(i-1, N)
    jp1 = limit(j+1, N); jm1 = limit(j-1, N)
    du[II[i,j,1]] = α*(u[II[im1,j,1]] + u[II[ip1,j,1]] + u[II[i,jp1,1]] + u[II[i,jm1,1]] - 4u[II[i,j,1]]) +
    B + u[II[i,j,1]]^2*u[II[i,j,2]] - (A + 1)*u[II[i,j,1]] + brusselator_f(x, y, t)
  end
end
kernel_v! = let N=N, xyd=xyd_brusselator, dx=step(xyd_brusselator)
  @inline function (du, u, A, B, α, II, I, t)
    i, j = Tuple(I)
    ip1 = limit(i+1, N)
    im1 = limit(i-1, N)
    jp1 = limit(j+1, N)
    jm1 = limit(j-1, N)
    du[II[i,j,2]] = α*(u[II[im1,j,2]] + u[II[ip1,j,2]] + u[II[i,jp1,2]] + u[II[i,jm1,2]] - 4u[II[i,j,2]]) +
    A*u[II[i,j,1]] - u[II[i,j,1]]^2*u[II[i,j,2]]
  end
end
brusselator_2d = let N=N, xyd=xyd_brusselator, dx=step(xyd_brusselator)
  function (du, u, p, t)
    @inbounds begin
      ii1 = N^2
      ii2 = ii1+N^2
      ii3 = ii2+2(N^2)
      A = p[1]
      B = p[2]
      α = p[3]/dx^2
      II = LinearIndices((N, N, 2))
      kernel_u!.(Ref(du), Ref(u), A, B, α, Ref(II), CartesianIndices((N, N)), t)
      kernel_v!.(Ref(du), Ref(u), A, B, α, Ref(II), CartesianIndices((N, N)), t)
      return nothing
    end
  end
end
p = (3.4, 1., 10., step(xyd_brusselator))

function init_brusselator_2d(xyd)
  N = length(xyd)
  u = zeros(N, N, 2)
  for I in CartesianIndices((N, N))
    x = xyd[I[1]]
    y = xyd[I[2]]
    u[I,1] = 22*(y*(1-y))^(3/2)
    u[I,2] = 27*(x*(1-x))^(3/2)
  end
  u
end
u0 = init_brusselator_2d(xyd_brusselator)
prob_ode_brusselator_2d = ODEProblem(brusselator_2d,u0,(0.,11.5),p)

du = similar(u0)
brusselator_2d(du, u0, p, 0.0)
du[34] # 802.9807693762164
du[1058] # 985.3120721709204
du[2000] # -403.5817880634729
du[end] # 1431.1460373522068
du[521] # -323.1677459142322

du2 = similar(u0)
brusselator_2d(du2, u0, p, 1.3)
du2[34] # 802.9807693762164
du2[1058] # 985.3120721709204
du2[2000] # -403.5817880634729
du2[end] # 1431.1460373522068
du2[521] # -318.1677459142322

using Symbolics, PartitionedArrays, SparseDiffTools
du0 = copy(u0)
jac_sparsity = float.(Symbolics.jacobian_sparsity((du,u)->brusselator_2d(du,u,p,0.0),du0,u0))
colorvec = matrix_colors(jac_sparsity)

# From https://github.com/fverdugo/PartitionedArrays.jl/blob/v0.2.8/test/test_fdm.jl#L93
# A = PSparseMatrix(I,J,V,rows,cols;ids=:local)

II,J,V = findnz(jac_sparsity)
jac_sparsity_distributed = PartitionedArrays.PSparseMatrix(II,J,V,jac_sparsity.rowval,jac_sparsity.colptr)
# MethodError: no method matching PSparseMatrix(::Vector{Int64}, ::Vector{Int64}, ::Vector{Float64}, ::Vector{Int64}, ::Vector{Int64})

f = ODEFunction(brusselator_2d;jac_prototype=jac_sparsity,colorvec = colorvec)
parf = ODEFunction(brusselator_2d;jac_prototype=jac_sparsity_distributed,colorvec = colorvec)

prob_ode_brusselator_2d = ODEProblem(brusselator_2d,u0,(0.0,11.5),p,tstops=[1.1])
prob_ode_brusselator_2d_sparse = ODEProblem(f,u0,(0.0,11.5),p,tstops=[1.1])

nparts = 4
pu0 = PartitionedArrays.PVector(u0,nparts) # MethodError: no method matching PVector(::Array{Float64, 3}, ::Int64)

prob_ode_brusselator_2d_parallel = ODEProblem(brusselator_2d,pu0,(0.0,11.5),p,tstops=[1.1])
prob_ode_brusselator_2d_parallelsparse = ODEProblem(parf,pu0,(0.0,11.5),p,tstops=[1.1])

Then the solving code is:

@time solve(prob_ode_brusselator_2d_parallel,Rosenbrock23(),save_everystep=false);
@time solve(prob_ode_brusselator_2d_parallelsparse,Rosenbrock23(),save_everystep=false);

using AlgebraicMultigrid
function algebraicmultigrid(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = aspreconditioner(ruge_stuben(convert(AbstractMatrix,W)))
  else
    Pl = Plprev
  end
  Pl,nothing
end

# Required due to a bug in Krylov.jl: https://github.com/JuliaSmoothOptimizers/Krylov.jl/pull/477
Base.eltype(::AlgebraicMultigrid.Preconditioner) = Float64

@time solve(prob_ode_brusselator_2d_parallelsparse,Rosenbrock23(linsolve=KrylovJL_GMRES()),save_everystep=false);
@time solve(prob_ode_brusselator_2d_parallelsparse,Rosenbrock23(linsolve=KrylovJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);

But since the construction doesn't run right now those will all fail of course.

Deactivate codecov checks

CI shows as failed even when test are passing if coverage is reduced. Even slightly. Very annoying!

Diagnostic context missing in some error messages

Some of the error messages could be more helpful if the provide some context about what exactly has been detected.

For example "The sparsity pattern of the ghost layer is inconsistent" could dump some information about where the detected inconsistency is. E.g. something like "The sparsity pattern of the ghost layer is inconsistent on rank $rank at local index ($i,$j)". I understand that it is impossible to reconstruct the full context of the error, especially in an distributed environment, but having at least some information can be immensely helpful when dealing with errors in non-trivial corner cases.

Welcome guide for developers

It would be very useful to create a welcome guide to people willing to contribute to the library or using it at a more advanced level (e.g. in a bsc/msc project).

This guide should include:

Julia-related

PartitionedArrays-related

General usage: https://www.francescverdugo.com/PartitionedArrays.jl/dev/usage/
Working with PVectors: https://www.francescverdugo.com/PartitionedArrays.jl/dev/jacobi_tutorial/
Assembling parallel sparse matrices. Missing tutorial (e.g., assemble a Laplace Matrix)
Make clearer the difference between mesh partition and matrix partition
Working with PSparseMatrix Missing tutorial on how to implement SpMV
Working with PSparseMatrix (advanced) Missing tutorial on how to implement SpMM

(To be continued)

Sparse-matrix vector product A*x for general ghosts in x

functio spmv(A,x;reuse)
      cols = partition(axes(A,2))
      x_with_ghost, cachex = consistent(x,cols;reuse=true) |> fetch
      b = A*x_with_ghost
      cache = (x_with_ghost,cachex)
      b, cache
end

function spmv!(b,A,x,cache)
     consistent!(x_with_ghost,x,cachex) |> wait
     mul!(b,A,x_with_ghost)
end

Explore if cache can be hidden in the matrix.

Partitioned multidimensional arrays (generalize PVector into PArray)

For parallel stencil computations (e.g. finite-difference methods), it would be nice to have partitioned multidimensional arrays, with support for arbitrary thickness "ghost" overlap regions (so that you could loop over just the interior of each partition and the ghost regions would handle the stencil boundaries).

Not sure whether that is in scope for this package, or if it is something that should be implemented in another package on top of PVector?

It would be nice to at least broaden PVector to PArray (with PVector as an alias for PArray{1}) — same machinery, just multidimensional arrays as storage and either arrays of linear indices or arrays of CartesianIndex for the index_partition. Matrix-free stencil support could then be implemented in an add-on package.

Performance and scalability benchmark of gather-scatter versus ibarrier-based find_rcv_neighbors

Improving Timer with (optional) barrier right before Tic!

See PR #47

Implement the AbstractLocalIndices interface for OwnIndices

Implement the AbstractLocalIndices interface for OwnIndices and make union_ghost work as well.

propertynames(::PVector) is ambiguous

julia> propertynames(vec)
ERROR: MethodError: propertynames(::PVector{Float64, MPIData{Vector{Float64}, 2}, PRange{MPIData{IndexRange, 2}, Exchanger{MPIData{Vector{Int32}, 2}, MPIData{Table{Int32}, 2}}, MPIData{PartitionedArrays.LinearGidToPart, 2}}}, ::Bool) is ambiguous. Candidates:
  propertynames(x, private::Bool) in Base at reflection.jl:1581
  propertynames(x::PVector, private) in PartitionedArrays at /home/amartin/git-repos/PartitionedArrays.jl/src/Interfaces.jl:1438
Possible fix, define
  propertynames(::PVector, ::Bool)
Stacktrace:
 [1] propertynames(x::PVector{Float64, MPIData{Vector{Float64}, 2}, PRange{MPIData{IndexRange, 2}, Exchanger{MPIData{Vector{Int32}, 2}, MPIData{Table{Int32}, 2}}, MPIData{PartitionedArrays.LinearGidToPart, 2}}})
   @ PartitionedArrays ~/git-repos/PartitionedArrays.jl/src/Interfaces.jl:1439
 [2] top-level scope
   @ REPL[6]:1

More catchy package name

DistributedDataDraft.jl is just a temporary dummy name that needs to be replaced. Any ideas are wellcome!

GSoC 2024 Task #1: Extend Notebook to PartitionedArrays.jl Tutorial

cc @fverdugo @amartinhuertas @oriolcg

In the meeting on 2024-05-27, the following steps were discussed to get started with the GSoC project:

Familiarize with PartitionedArrays.jl package by working with Jacobi tutorial notebook
add explanatory text and instructions to the notebook to create a tutorial
create first issue and post link to it on social media platform

eltype, length and size are inconsistent with iterate

In partituclar, it means that we cannot use the Julia machinery to work with iterable objects.

Example

This works:

using PartitionedArrays
part = get_part_ids(SequentialBackend(),4)
a = map_parts(part->(part,2*part),part)
x,y = a

but this does not:

collect(a)

Compatibility broken for PartitionedArrays 0.2.12

I've noticed that MPI.jl 0.20 is now supported, which is good. However, this means we cannot have compatibility with older versions of MPI.jl, since the MPI.API module was introduced in 0.20 and is explicitly used in some PartitionedArrays functions, for instance PartitionedArrays.jl/MPIBackend.jl/_get_part_ids_body(...).

I noticed this while trying to use the new version of PartitionedArrays.jl in GridapDistributed.jl and GridapP4est.jl.

I then believe the compat line in Project.toml should read

MPI = "0.20"

Let me know what you think.

Bug: PermutedLocalIndices

@fverdugo I think I found some issues for PermutedLocalIndices. See the MWE:

using PartitionedArrays

ranks = with_debug() do distribute
  distribute(LinearIndices((2,)))
end

indices = map(ranks) do r
  n_global = 4
  if r == 1
    local_to_global = [1,2,3]
    local_to_owner  = [1,1,2]
  else
    local_to_global = [2,3,4]
    local_to_owner  = [1,2,2]
  end
  LocalIndices(n_global,r,local_to_global,local_to_owner)
end

perm = map(ranks) do r
  if r == 1
    [2,1,3]
  else
    [1,3,2]
  end
end

perm_indices = map(permute_indices,indices,perm) # Error 1
v = pfill(0.0,perm_indices) # Error 2

Trace for error 1:

ERROR: MethodError: no method matching PartitionedArrays.LocalToGlobal(::SubArray{Int64, 1, Vector{Int64}, Tuple{Vector{Int32}}, false}, ::SubArray{Int64, 1, Vector{Int64}, Tuple{Vector{Int32}}, false}, ::Vector{Int32})

Closest candidates are:
  PartitionedArrays.LocalToGlobal(::A, ::Vector{Int64}, ::C) where {A, C}
   @ PartitionedArrays ~/.julia/packages/PartitionedArrays/py6uo/src/p_range.jl:955
  PartitionedArrays.LocalToGlobal(::PartitionedArrays.BlockPartitionOwnToGlobal, ::Vector{Int64}, ::Any)
   @ PartitionedArrays ~/.julia/packages/PartitionedArrays/py6uo/src/p_range.jl:1428

Trace for error 2:

ERROR: MethodError: no method matching PartitionedArrays.LocalToOwner(::PartitionedArrays.OwnToOwner, ::SubArray{Int32, 1, Vector{Int32}, Tuple{Vector{Int32}}, false}, ::Vector{Int32})

Closest candidates are:
  PartitionedArrays.LocalToOwner(::PartitionedArrays.OwnToOwner, ::Vector{Int32}, ::C) where C
   @ PartitionedArrays ~/.julia/packages/PartitionedArrays/py6uo/src/p_range.jl:972

It seems the LocalToGlobal and LocalToOwner structs are enforcing datatypes which PermutedLocalIndices does not support, when created from a LocalIndices object.

I would also think that this is related to the fact that LocalToGlobal assumes we have our indices ordered as [own...,ghost...] like for OwnAndGhostIndices (which is the only type for which permutation is supported). Indeed, if we solve the issue with something like

function local_to_global(a::PermutedLocalIndices)
  LocalToGlobal(own_to_global(a),collect(ghost_to_global(a)),a.perm)
end

we can create the object, but the result is wrong:

2-element DebugArray{PartitionedArrays.LocalToGlobal{SubArray{Int64, 1, Vector{Int64}, Tuple{Vector{Int32}}, false}, Vector{Int32}}, 1}:
[1] = [2, 1, 3]
[2] = [3, 4, 2] # This should be still [2,3,4], but it gets reordered as owned then ghost.

All in all, I think something along these lines should be considered to fix this issue while preserving performance and the current API:

Rename PermutedLocalIndices to PermutedOnwAndGhostIndices.
Create a new struct PermutedLocalIndices that works for an arbitrary LocalIndices structure. Since LocalIndices already contains the local_to_global and local_to_owner arrays, it is significantly simpler to reorder the indices.
Dispatch on permute_indices based on the indices type.

Alternatively, we could avoid a new structure by just dispatching to

function permute_indices(indices::LocalIndices,perm)
  id = part_id(indices)
  n_glob = global_length(indices)
  l2g = view(local_to_global(indices),perm)
  l2o = view(local_to_owner(indices),perm)
  return LocalIndices(n_glob,id,l2g,l2o)
end

Also, and just as a comment, I think it would be nice to add to the documentation in which direction the permutation is going, i.e from the old local indices to the new or vice-versa. I believe it;s meant to be a map new -> old.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Bug: `global_length` for `PRange`

I believe there is a typo/mistake here.

I think it should be

global_length(pr::PRange) = map(global_length,partition(pr))

Enhance and optimize `discover_parts_snd`

This function is now implemented in a rather brute force way and should be implemented as detailed in https://dl.acm.org/doi/10.1145/1837853.1693476

This can also further optimized for Cartesian topologies.

Rename `prun` for `with_backend` and deprecate `prun`

IMO,

  with_backend(MPIBackend(),np) do parts
  end

is more intention revealing than the current

  prun(MPIBackend(),np) do parts
  end

psystem returns different ghost rows for the matrix and the vector when assemble=false

ERROR: MethodError: no method matching psparse(::Vector{…}, ::Vector{…}, ::Vector{…}, ::Vector{…}, ::Vector{…})

using PartitionedArrays
using IterativeSolvers
using LinearAlgebra

np = 3
n = 5

ranks = LinearIndices((np,))
map(ranks) do rank
println("Hello, world! I am proc $rank of $np.")
end;

row_partition = uniform_partition(ranks, n)
IV = map(row_partition) do row_indices
I,V = Int[], Float64[]
for global_row in local_to_global(row_indices)
if global_row == 1
v = 1.0
elseif global_row == n
v = -1.0
else
continue
end
push!(I,global_row)
push!(V,v)
end
I,V
end

II,VV = tuple_of_arrays(IV);

b = PVector(II, VV, row_partition) |> fetch;

IJV = map(row_partition) do row_indices
I,J,V = Int[], Int[], Float64[]
for global_row in local_to_global(row_indices)
if global_row in (1,n)
push!(I,global_row)
push!(J,global_row)
push!(V,1.0)
else
push!(I,global_row)
push!(J,global_row-1)
push!(V,-1.0)
push!(I,global_row)
push!(J,global_row)
push!(V,2.0)
push!(I,global_row)
push!(J,global_row+1)
push!(V,-1.0)
end
end
I,J,V
end
I,J,V = tuple_of_arrays(IJV);
col_partition = row_partition;
A = psparse(I,J,V,row_partition,col_partition);

x = similar(b,axes(A,2))
x .= b
IterativeSolvers.cg!(x,A,b)
r = A*x - b
norm(r)

I run above code, which is an example of "Distributed sparse linear solve", but got errors below.

Missing feature: `find_owner` for `LocalIndices`

I believe the method find_owner is missing for the structure LocalIndices.

Is this correct, or am I missing something?

Implement a nice show method for PVector and PSparseMatrix

This basic feature is not yet implemented. Now, display(v) for a PVector or a PSparseMatrix leads to errors (since scalar indexing is not implemented see issue #53)

A consistent way to show PVector or a PSparseMatrix is to gather the values in the main part and print only here.

Do not use `MPI.COMM_WORLD` in `get_part_ids(b::MPIBackend,...`

We should think where to duplicate MPI.COMM_WORLD. Using MPI.COMM_WORLD all the way through is not safe.

Implement matrix-matrix and matrix-matrix-matrix product

This would be needed for multigrid preconditioners.

Implement scalar indexing and warning error message

Scalar indexing is not implemented for PVector and PSparseMatrix. Mainly since this is very inefficient and should be avoided . However, in some situations, it would be useful (e.g., for debugging). The functions implementing scalar indexing should print a big warning (similar as in CUDA.jl which also provides a mechanism to de-activate the warning)

Error when generting copies of PSParseMatrix (as needed by NLSolve)

Solving a nonlinear problem with PartitionedArrays+GridapDistributed+Gridap with NLSolve requires a copy of the Jacobian (a PSparseMatrix) at this point: https://github.com/JuliaNLSolvers/NLSolversBase.jl/blob/78af38393b14992ea996899c6486d971b5bfa612/src/objective_types/oncedifferentiable.jl#L231

The issue can be reproduced using the distributed linear system in the examples, i.e. https://www.francescverdugo.com/PartitionedArrays.jl/dev/examples/#Distributed-sparse-linear-solve
and simply copy(A).

This copy generates an error in the implementation of similar at

PartitionedArrays.jl/src/p_sparse_matrix.jl

Line 400 in dc1a3b9

PSparseMatrix(values,row_partition,col_partition)

After fix it, I get

julia> copy(A)
ERROR: Scalar indexing on PSparseMatrix is not allowed for performance reasons.

I guess the solution is to overwrite Base.copy with a map over ranks and local copy. Is it @fverdugo ?

Docs need some examples

First and foremost, thanks for providing the package. It looks like it can be very useful for aiding in the field and solution management on a large battery of methods for PDE discretizations. However, I the package has quite a high barrier of entry, as no minimal examples are provided. I tried to reconstruct how to use the package from the linked Poisson example (i.e. PartitionedPoisson.jl) and GridapDistributed.jl, but made just minor progress over the last few days.

Trying to construct some very minimalistic example with two processes (distributed discretization of a Poisson problem in 1d into 5 elements on 2 processes, where process 1 owns the first 3 dofs and process two the last 2) I ran into the issue that I have some trouble solving the problem with a cg.

For reproduction some code (don't mind the variable names, this is a quickly stripped down version of some other code)

using MPI, PartitionedArrays, SparseArrays, IterativeSolvers, LinearAlgebra
const PArrays = PartitionedArrays

# Get MPI information
MPI.Init()
comm    = MPI.COMM_WORLD
my_rank = MPI.Comm_rank(comm)
np      = MPI.Comm_size(comm)

# Partition matrix row by row
ngdofs = 5
if my_rank == 0
    neighbors = MPIData(Int32[2], comm, (np,))
    dof_partition = MPIData(PArrays.IndexSet(my_rank+1, [1,2,3], Int32[1,1,1]), comm, (np,))
    allcols = MPIData(PArrays.IndexSet(my_rank+1, [1,2,3,4], Int32[1,1,1,2]), comm, (np,))
else
    neighbors = MPIData(Int32[1], comm, (np,))
    dof_partition = MPIData(PArrays.IndexSet(my_rank+1, [3,4,5], Int32[1,2,2]), comm, (np,))
    allcols = MPIData(PArrays.IndexSet(my_rank+1, [3,4,5], Int32[1,2,2]), comm, (np,))
end
dof_exchanger = Exchanger(dof_partition,neighbors)
rows = PRange(ngdofs,dof_partition,dof_exchanger)
cols = PRange(ngdofs,allcols,Exchanger(allcols,neighbors))

# Fill COO triplets
if my_rank == 0
    I_ = MPIData([ 1, 1, 2, 2, 2, 3, 3, 3], comm, (np,))
    J_ = MPIData([ 1, 2, 1, 2, 3, 2, 3, 4], comm, (np,))
    V_ = MPIData(Float64[1, 0, 0,-2, 1, 1,-1, 0], comm, (np,))
else
    I_ = MPIData([ 1, 1, 2, 2, 2, 3, 3], comm, (np,))
    J_ = MPIData([ 1, 2, 1, 2, 3, 2, 3], comm, (np,))
    V_ = MPIData(Float64[-1, 1, 1,-2, 1, 1,-1], comm, (np,))
end

# Construct the actual sparse matrix
#
#    P1  P1  P1  P2  P2
# P1  1   0   0   0   0
# P1  0  -2   1   0   0   
# P1  0   1  -2   1   0
# P2  0   0   1  -2   1
# P2  0   0   0   1  -1
#
#               =
#
#    P1  P1  P1  P2  P2
# P1  1   0   0   0   0
# P1  0  -2   1   0   0
# P1  0   1  -1   0   0
# P2  x   x   x   x   x
# P2  x   x   x   x   x
#
#              + 
# 
#    P1  P1  P1  P2  P2
# P1  x   x   x   x   x
# P1  x   x   x   x   x   
# P1  0   0  -1   1   0
# P2  0   0   1  -2   1
# P2  0   0   0   1  -1

K = PArrays.PSparseMatrix(I_, J_, V_, rows, cols, ids=:local)

# Trigger sync
PArrays.assemble!(K)

# Construct a constant vector 
x = PVector{Float64}(undef,K.cols)
fill!(x, 1.0)
assemble!(x)

# SPMV works
y = K*x

# Provided direct solver works
x_solved = K\y
r = K*x_solved - y
@assert norm(x_solved - x) < 1e-8

# Solve problem with cg seem to work now
x_solved = IterativeSolvers.cg(K,y)
r = K*x_solved - y
@assert norm(x_solved - x) < 1e-8

the last norm fails. ~~My first guess is that the cg fails due to improper synchronization of the SPMV results.~~ Edit: cg needs a symmetric matrix. :) However, with the symmetric matrix now the cg works ~~but the direct solver fails.~~. Also works when typing the values.

+- constructing the MPIData objects by hand, is this how to construct the PSparseMatrix, PRange and Exchanger?

Also, are there some debug methods aiding to inspect the resulting distributed matrix?

Improved version of get_part_ids + prun for mpi backend

Just to not forget ...

in this PR https://github.com/gridap/GridapP4est.jl/pull/19/files we (@principejavier and me) wrote a more general version of get_part_ids and prun which decouples the number of MPI tasks in the world communicator and the number of parts. I wonder whether these variants would also be useful for PartitionedArrays.jl.

Performance benchmark and comparison against PETSc

It would be nice to do some scaling tests and compare against PETSc and provide the results in the README.md

GSoC Task #2: Implement aggregation method for vector-valued problems

Meeting notes 2024-06-13:
@oriolcg
Attendees: @fverdugo @amartinhuertas @GeliezaK
Notes:

Implementation of matrix-to-graph-transformation here: muelue, pyAMG, PetSci
AMG for vector-valued problems

Action Points:

perform spell check on tutorial
blog post on LinkedIn with link to tutorial PR
add consistency tests to test_amg.jl
start implementing matrix-to-graph transformation for CSC sparse matrix and PSparse matrix

Bug: `copy` returns matrix with wrong communication pattern

Hi @fverdugo,

There is an issue with the current copy function. I detected it for the old implemetation of PSparseMatrix, but I believe it should affect the new one as well.

Here is a MWE:

using PartitionedArrays

np = 2
ranks = with_debug() do distribute
  distribute(LinearIndices((np,)))
end

n = 4
rows = uniform_partition(ranks,n)
I,J,V = map(ranks) do r
    if r == 1
        I = [1,2,3]
        J = [1,2,3]
    elseif r == 2
        I = [3,4]
        J = [3,4]
    end
    I,J,fill(Float64(r),length(J))
end |> tuple_of_arrays
A = old_psparse!(I,J,V,rows,rows) |> fetch

B = copy(A)

A_caches = A.cache.items[1].cache
B_caches = B.cache.items[1].cache

which yields:

PartitionedArrays.VectorAssemblyCache{Float64}(Int32[2], Int32[], JaggedArray{Int32,Int32}(Vector{Int32}[[3]]), JaggedArray{Int32,Int32}(Vector{Int32}[]), JaggedArray{Float64,Int32}([[0.0]]), JaggedArray{Float64,Int32}(Vector{Float64}[]))

PartitionedArrays.VectorAssemblyCache{Float64}(Int32[2], Int32[], JaggedArray{Int32,Int32}(Vector{Int32}[[]]), JaggedArray{Int32,Int32}(Vector{Int32}[]), JaggedArray{Float64,Int32}([Float64[]]), JaggedArray{Float64,Int32}(Vector{Float64}[]))

The issue here is the following: We are not implementing copy, but rather copyto! and similar. Additionally, similar depends on the type of the local matrices, in our case SparseArrays.
I believe the issue is that SparseArrays's similar returns a sparse matrix without non-zero elements. This makes the construction of the assembly caches to be wrong.

A possible solution (without changing the behaviour of SparseArrays is to also implement copy in the following way:

function Base.copy(a::PSparseMatrix)
  mats = map(copy,partition(a))
  cache = map(copy_cache,a.cache)
  return PSparseMatrix(mats,partition(axes(a,1)),partition(axes(a,2)),cache)
end

where copying the cache is not necessary, but is more efficient (I believe).

Implement some backed-independent timing mechanism

We need this to perform scaling tests and compare against PETSc.

fverdugo / partitionedarrays.jl Goto Github PK

partitionedarrays.jl's People

Contributors

Stargazers

Watchers

Forkers

partitionedarrays.jl's Issues

Example

Recommend Projects

Recommend Topics

Recommend Org

Jobs