ornl-cees / mfmg
MFMG is an open-source library implementing matrix-free multigrid methods.
Home Page: https://mfmg.readthedocs.io
License: BSD 3-Clause "New" or "Revised" License
`grid_complexity` and `operator_complexity` are missing since the merge of the refactored code.
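For reference, the two metrics have the standard AMG definitions: grid complexity is the number of unknowns summed over all levels divided by the number of unknowns on the finest level, and operator complexity is the analogous ratio of matrix nonzeros. A minimal sketch, assuming per-level sizes are available (the `LevelInfo` struct is hypothetical, not the mfmg interface):

```cpp
#include <vector>

struct LevelInfo // hypothetical per-level summary
{
  unsigned long long n_rows;
  unsigned long long n_nonzeros;
};

// Sum of unknowns over all levels divided by unknowns on the finest level.
double grid_complexity(std::vector<LevelInfo> const &levels)
{
  unsigned long long total = 0;
  for (auto const &level : levels)
    total += level.n_rows;
  return static_cast<double>(total) / levels.front().n_rows;
}

// Sum of nonzeros over all levels divided by nonzeros on the finest level.
double operator_complexity(std::vector<LevelInfo> const &levels)
{
  unsigned long long total = 0;
  for (auto const &level : levels)
    total += level.n_nonzeros;
  return static_cast<double>(total) / levels.front().n_nonzeros;
}
```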
`Mesh`, `Operator`, and `MultiVector`. However, it is unclear how `Operator` and `Mesh` should be separated. Should it be templated on the device too, or is it a part of `Operator`?

- `Level` specialization for dealii host
- `Level` specialization for dealii device

In order to get an MPI+TBB version of spectral AMGe (2 grids), we need to do the following steps:
- We need to add tests that use `laplace_matrix_free`.
- Right now, setting the FE degree has a drawback: one has to remember to change it in two places, the `Laplace` constructor and the `MeshEvaluator`. Instead, it should be done through a parameter list that is passed to both (see the sketch after this list).
- We need to remove the assumption that we have access to a matrix on the GPU.
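A minimal sketch of the parameter-list idea from the second item above, assuming a `boost::property_tree::ptree` as the parameter mechanism; the `Laplace` and `MeshEvaluator` constructors shown here are simplified stand-ins, not the actual mfmg signatures:

```cpp
#include <boost/property_tree/ptree.hpp>

struct Laplace
{
  explicit Laplace(boost::property_tree::ptree const &params)
      : fe_degree(params.get<int>("fe_degree", 1)) // single source of truth
  {}
  int fe_degree;
};

struct MeshEvaluator
{
  explicit MeshEvaluator(boost::property_tree::ptree const &params)
      : fe_degree(params.get<int>("fe_degree", 1)) // same entry, no duplication
  {}
  int fe_degree;
};

int main()
{
  boost::property_tree::ptree params;
  params.put("fe_degree", 2); // change the degree in one place only
  Laplace laplace(params);
  MeshEvaluator evaluator(params);
}
```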
We should have a look at `const`-correctness throughout the code, see e.g. #211 (comment).
The function `Smoother::apply` can be optimized by implementing a proper `alpha * A * x + beta * y` apply call for the device matrix.
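A sketch of what such an apply could look like with the cuSPARSE generic API (assuming a recent cuSPARSE; the raw CSR device arrays are taken as given and the surrounding `Smoother`/device-matrix classes are not shown):

```cpp
#include <cuda_runtime.h>
#include <cusparse.h>

// Compute y = alpha * A * x + beta * y for a CSR matrix stored on the device.
void csr_apply(cusparseHandle_t handle, int n_rows, int n_cols, int nnz,
               int const *d_row_offsets, int const *d_col_indices,
               double const *d_values, double const *d_x, double *d_y,
               double alpha, double beta)
{
  cusparseSpMatDescr_t mat;
  cusparseCreateCsr(&mat, n_rows, n_cols, nnz,
                    const_cast<int *>(d_row_offsets),
                    const_cast<int *>(d_col_indices),
                    const_cast<double *>(d_values), CUSPARSE_INDEX_32I,
                    CUSPARSE_INDEX_32I, CUSPARSE_INDEX_BASE_ZERO, CUDA_R_64F);

  cusparseDnVecDescr_t x, y;
  cusparseCreateDnVec(&x, n_cols, const_cast<double *>(d_x), CUDA_R_64F);
  cusparseCreateDnVec(&y, n_rows, d_y, CUDA_R_64F);

  // Workspace query followed by the actual SpMV.
  size_t buffer_size = 0;
  cusparseSpMV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha,
                          mat, x, &beta, y, CUDA_R_64F,
                          CUSPARSE_SPMV_ALG_DEFAULT, &buffer_size);
  void *buffer = nullptr;
  cudaMalloc(&buffer, buffer_size);
  cusparseSpMV(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, mat, x, &beta,
               y, CUDA_R_64F, CUSPARSE_SPMV_ALG_DEFAULT, buffer);
  cudaFree(buffer);

  cusparseDestroyDnVec(x);
  cusparseDestroyDnVec(y);
  cusparseDestroySpMat(mat);
}
```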
Right now we only instantiate the vector with `ScalarType = double`. We should also instantiate `ScalarType = float`.
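A minimal sketch of what the additional instantiation would look like (`VectorDevice` is a stand-in name, not the actual mfmg class):

```cpp
template <typename ScalarType>
class VectorDevice
{
public:
  ScalarType *values = nullptr;
  unsigned int size = 0;
};

template class VectorDevice<double>; // existing instantiation
template class VectorDevice<float>;  // additional instantiation to add
```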
Convergence on hyperball is degraded on the testing machine. We need to investigate why. A good first step would be to reproduce the results with Docker.
In particular, we consider the Poisson equation discretized with bilinear functions
on a 32 × 32 square grid of square elements. Homogeneous Dirichlet boundary conditions
are assumed, and they are eliminated from the matrix during discretization.
Standard 2 × 2 agglomeration is forced on each level.
Unstaggered elements
metric | value |
---|---|
number of levels | 2 |
convergence rate | 0.04 |
grid complexity | 1.47 |
operator complexity | 1.9 |
Here is a list of things that we can do to improve the code:

- Replace `class` by `typename` in template parameters (right now we are not using anything consistently; for example, `VectorType` is sometimes a `class` and sometimes a `typename`).
- Put `const` on the right.
- Rename `adapters_dealii` to `dealii_adapters` (consistency with `dealii_operator`).
- Order access specifiers as `public`, `protected`, and `private`.
- `evaluate` for the `GlobalOperator` requires copying the system sparse matrix. This is a pretty expensive operation. Instead, we should just ask for a shared pointer (see the sketch below).

Sick and tired of recompiling things. Need to create an issue to keep track of these.

Originally posted by @dalg24 in #183 (comment)
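A small illustration of a few of the conventions listed above (the class and member names are hypothetical, not the actual mfmg interfaces):

```cpp
#include <memory>

struct SparseMatrixType // stand-in for the actual sparse matrix class
{
};

template <typename VectorType> // 'typename' rather than 'class'
class GlobalOperator
{
public:
  // 'const' on the right; a shared pointer is handed out instead of copying
  // the system sparse matrix.
  std::shared_ptr<SparseMatrixType const> get_matrix() const { return _matrix; }

private:
  std::shared_ptr<SparseMatrixType const> _matrix;
};
```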
We only test the code with CUDA support. We need to check that it also works without CUDA.
Currently, we have a number of `dealii::MultithreadInfo::set_thread_limit(1);` calls in the tests. We should get rid of them and make sure that everything works with multiple threads.
<snip>
/usr/local/cuda-8.0/lib64/libcusolver.so: undefined reference to `GOMP_parallel_start'
/usr/local/cuda-8.0/lib64/libcusolver.so: undefined reference to `GOMP_critical_end'
<snip>
The `-fopenmp` flag is present in the link line, and appears after `cusolver`. deal.II was installed through Spack. This does not happen when deal.II is installed independently.
Enabling CUDA in MFMG leads to successful linking.
All classes currently derived from `mfmg::MeshEvaluator` know about `dim` at compile time. We should consider making access `constexpr`.
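A minimal sketch of what compile-time access could look like (the classes here are stand-ins, not the mfmg hierarchy):

```cpp
template <int dim>
class MeshEvaluatorBase
{
public:
  static constexpr int dimension = dim; // constexpr access to dim
};

template <int dim>
class MyEvaluator : public MeshEvaluatorBase<dim>
{
};

static_assert(MyEvaluator<3>::dimension == 3,
              "dim is available at compile time");
```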
After updating deal.II, i.e. after dealii/dealii#7687, the LAPACK tests in `test_hierarchy` were not working (#138). One of the reasons is that the agglomerates might not have boundary conditions, and hence the corresponding matrix has a non-trivial kernel; shifting the diagonal entries slightly at least allows inverting the matrix (using UMFPACK). I observed this even before updating, i.e. with the version in the image. After that, the LAPACK tests are still failing due to the restriction matrix not being full rank (possibly due to the spurious eigenvalues).
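For reference, a minimal sketch of the kind of diagonal shift mentioned above, for a deal.II sparse matrix on an agglomerate (the shift value is only illustrative):

```cpp
#include <deal.II/lac/sparse_matrix.h>

// Add a small shift to every diagonal entry so that the agglomerate matrix
// can be factorized even when it has a non-trivial kernel.
void shift_diagonal(dealii::SparseMatrix<double> &agglomerate_matrix,
                    double const shift = 1e-12)
{
  for (dealii::SparseMatrix<double>::size_type i = 0;
       i < agglomerate_matrix.m(); ++i)
    agglomerate_matrix.diag_element(i) += shift;
}
```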
Just switching from LAPACK to ARPACK gives similar results as with Lanczos. Copying the reference values from the Lanczos tests to the ARPACK tests gives:
1: /tmp/mfmg/tests/test_hierarchy.cc(241): error: in "hierarchy_3d/_16": check conv_rate == ref_solution[std::make_tuple(mesh, distort_random_str, reordering, eigensolver, matrix_free_str)] has failed [0.11466298397904537 != 0.1148067738]. Relative difference exceeds tolerance [0.00125402 > 1e-06]
1: Failure occurred in a following context:
1: mesh = hyper_ball; distort_random = false; reordering = None; mesh_evaluator_type = DealIIMeshEvaluator; eigensolver = arpack;
1: FE degree: 1
1: Convergence rate: 0.11
1: /tmp/mfmg/tests/test_hierarchy.cc(241): error: in "hierarchy_3d/_17": check conv_rate == ref_solution[std::make_tuple(mesh, distort_random_str, reordering, eigensolver, matrix_free_str)] has failed [0.11469510272606935 != 0.1148067738]. Relative difference exceeds tolerance [0.000973634 > 1e-06]
1: Failure occurred in a following context:
1: mesh = hyper_ball; distort_random = false; reordering = None; mesh_evaluator_type = DealIIMeshEvaluator; eigensolver = lanczos;
1: skip
1: FE degree: 1
1: Convergence rate: 0.30
1: /tmp/mfmg/tests/test_hierarchy.cc(241): error: in "hierarchy_3d/_19": check conv_rate == ref_solution[std::make_tuple(mesh, distort_random_str, reordering, eigensolver, matrix_free_str)] has failed [0.2995474407510747 != 0.3012330587]. Relative difference exceeds tolerance [0.00562722 > 1e-06]
1: Failure occurred in a following context:
1: mesh = hyper_ball; distort_random = false; reordering = None; mesh_evaluator_type = DealIIMatrixFreeMeshEvaluator; eigensolver = lanczos;
1: FE degree: 1
1: Convergence rate: 0.11
1: /tmp/mfmg/tests/test_hierarchy.cc(241): error: in "hierarchy_3d/_20": check conv_rate == ref_solution[std::make_tuple(mesh, distort_random_str, reordering, eigensolver, matrix_free_str)] has failed [0.11466298397904537 != 0.1148067738]. Relative difference exceeds tolerance [0.00125402 > 1e-06]
1: Failure occurred in a following context:
1: mesh = hyper_ball; distort_random = false; reordering = Reverse Cuthill_McKee; mesh_evaluator_type = DealIIMeshEvaluator; eigensolver = arpack;
1: FE degree: 1
1: Convergence rate: 0.11
1: /tmp/mfmg/tests/test_hierarchy.cc(241): error: in "hierarchy_3d/_21": check conv_rate == ref_solution[std::make_tuple(mesh, distort_random_str, reordering, eigensolver, matrix_free_str)] has failed [0.11469510272606935 != 0.1148067738]. Relative difference exceeds tolerance [0.000973634 > 1e-06]
1: Failure occurred in a following context:
1: mesh = hyper_ball; distort_random = false; reordering = Reverse Cuthill_McKee; mesh_evaluator_type = DealIIMeshEvaluator; eigensolver = lanczos;
1: skip
1: FE degree: 1
1: Convergence rate: 0.30
1: /tmp/mfmg/tests/test_hierarchy.cc(241): error: in "hierarchy_3d/_23": check conv_rate == ref_solution[std::make_tuple(mesh, distort_random_str, reordering, eigensolver, matrix_free_str)] has failed [0.2995474407510747 != 0.3012330587]. Relative difference exceeds tolerance [0.00562722 > 1e-06]
1: Failure occurred in a following context:
1: mesh = hyper_ball; distort_random = false; reordering = Reverse Cuthill_McKee; mesh_evaluator_type = DealIIMatrixFreeMeshEvaluator; eigensolver = lanczos;
1: FE degree: 1
1: Convergence rate: 0.10
1: /tmp/mfmg/tests/test_hierarchy.cc(241): error: in "hierarchy_3d/_24": check conv_rate == ref_solution[std::make_tuple(mesh, distort_random_str, reordering, eigensolver, matrix_free_str)] has failed [0.10243805241099262 != 0.10245448259999999]. Relative difference exceeds tolerance [0.000160391 > 1e-06]
1: Failure occurred in a following context:
1: mesh = hyper_ball; distort_random = true; reordering = None; mesh_evaluator_type = DealIIMeshEvaluator; eigensolver = arpack;
1: FE degree: 1
1: Convergence rate: 0.10
1: /tmp/mfmg/tests/test_hierarchy.cc(241): error: in "hierarchy_3d/_25": check conv_rate == ref_solution[std::make_tuple(mesh, distort_random_str, reordering, eigensolver, matrix_free_str)] has failed [0.10245562595409574 != 0.10245448259999999]. Relative difference exceeds tolerance [1.11596e-05 > 1e-06]
1: Failure occurred in a following context:
1: mesh = hyper_ball; distort_random = true; reordering = None; mesh_evaluator_type = DealIIMeshEvaluator; eigensolver = lanczos;
1: skip
1: FE degree: 1
1: Convergence rate: 0.30
1: /tmp/mfmg/tests/test_hierarchy.cc(241): error: in "hierarchy_3d/_27": check conv_rate == ref_solution[std::make_tuple(mesh, distort_random_str, reordering, eigensolver, matrix_free_str)] has failed [0.29778963080796284 != 0.29779748690000002]. Relative difference exceeds tolerance [2.63813e-05 > 1e-06]
1: Failure occurred in a following context:
1: mesh = hyper_ball; distort_random = true; reordering = None; mesh_evaluator_type = DealIIMatrixFreeMeshEvaluator; eigensolver = lanczos;
1: FE degree: 1
1: Convergence rate: 0.10
1: /tmp/mfmg/tests/test_hierarchy.cc(241): error: in "hierarchy_3d/_28": check conv_rate == ref_solution[std::make_tuple(mesh, distort_random_str, reordering, eigensolver, matrix_free_str)] has failed [0.10243805241099262 != 0.10245448259999999]. Relative difference exceeds tolerance [0.000160391 > 1e-06]
1: Failure occurred in a following context:
1: mesh = hyper_ball; distort_random = true; reordering = Reverse Cuthill_McKee; mesh_evaluator_type = DealIIMeshEvaluator; eigensolver = arpack;
1: FE degree: 1
1: Convergence rate: 0.10
1: /tmp/mfmg/tests/test_hierarchy.cc(241): error: in "hierarchy_3d/_29": check conv_rate == ref_solution[std::make_tuple(mesh, distort_random_str, reordering, eigensolver, matrix_free_str)] has failed [0.10245562595409574 != 0.10245448259999999]. Relative difference exceeds tolerance [1.11596e-05 > 1e-06]
1: Failure occurred in a following context:
1: mesh = hyper_ball; distort_random = true; reordering = Reverse Cuthill_McKee; mesh_evaluator_type = DealIIMeshEvaluator; eigensolver = lanczos;
The Lanczos results are also slightly different due to the minor modification of the agglomerates.
A semi-easy approach that may not have the best numerical properties, but will have many of the components that will be required in the long term. It is general enough that it may not need too many adaptations in the future.
Specific steps:
- … `apply`.
- Not creating a `DoFHandler` for a local aggregate mesh, and instead assembling an aggregate matrix out of local matrices. In the long term, this will be almost a non-step: the agglomerate operator need not be a matrix and will just need an `apply`. An important thing here: this operator may encompass additional degrees of freedom that are not part of an agglomerate. This is due to using $A_i = R A_{\tau_i} P$, which will make it go beyond agglomerate borders.

The MPI functionality is somewhat orthogonal here. I think it could be properly addressed by keeping around some map. It will rear its ugly head when trying to deal with indices. But let me forget it for a bit.
The main code thrust here is:
This should work with a Poisson equation. However, I think this does not have great robustness as it does not creep.
Currently, the solver doesn't converge for `discontinuous`, but gives similar results for `constant`, `linear`, and `linear_x`. We should make sure that all options actually work.
`test_hierarchy_device` is reporting:
!!! detected some memory leaks in the code: trying to free non-empty temporary device pool !!!
34: ptr: 0x7f21e8006000 size: 3117056
34: ptr: 0x7f21e82ff000 size: 32768
34: ptr: 0x7f21e8307000 size: 32768
34: ptr: 0x7f21e830f000 size: 1560576
34: ptr: 0x7f21e848c000 size: 36864
34: ptr: 0x7f21e8495000 size: 36864
If we cache variables, we can see (and modify) them using `ccmake`.
It may make sense to develop a prototype in a high-level language to quickly play around with ideas. The candidates are Julia and Python. We need to figure out which would be the easiest and fastest, as we don't have a lot of time to spend on this.
Currently, we are using LAPACKE interfaces. Thus, it is not sufficient that deal.II is built with LAPACK support and that we hack the appropriate LAPACKE include path into the Docker image configuration. We also need to detect LAPACKE, add the include path, link to the library, and make sure that it is compatible with the LAPACK version used when building deal.II.
When using MPI, the convergence in 3D is degraded. We need to understand why. It could be due to the way the agglomerates are formed in 3D.
At the moment, the only eigensolver that we have on the GPU requires the matrix. We need an eigensolver that only requires the action of an operator.
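To illustrate the kind of interface this implies, here is a minimal sketch of an eigensolver that only needs the action of the operator: a plain power iteration for the eigenvalue of largest magnitude, with the operator passed as an apply callback. This is only an illustration of the operator-only interface, not the solver we actually need:

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

double largest_eigenvalue_estimate(
    std::function<void(std::vector<double> const &, std::vector<double> &)> const
        &apply,
    std::size_t const n, unsigned int const n_iterations = 100)
{
  std::vector<double> x(n, 1.0), y(n, 0.0);
  double lambda = 0.0;
  for (unsigned int iter = 0; iter < n_iterations; ++iter)
  {
    apply(x, y); // only the action A * x is required, never the matrix itself
    double norm = 0.0;
    for (double const v : y)
      norm += v * v;
    norm = std::sqrt(norm);
    lambda = norm; // since x is normalized, ||A x|| tends to |lambda_max|
    for (std::size_t i = 0; i < n; ++i)
      x[i] = y[i] / norm;
  }
  return lambda;
}
```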
We should look at/try to install SAAMGE, since it is the only MPI implementation of spectral AMGe. Hypre contains a serial implementation of spectral AMGe.
Running the default `hierarchy_input.info` but in 3D throws with:
An error occurred in line <1365> of file </home/xap/local/opt/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/dealii-develop-yamtwkdku77c3ccvocmevzb2747psl2n/include/deal.II/lac/la_parallel_vector.h> in function
Number dealii::LinearAlgebra::distributed::Vector<Number>::operator()(dealii::LinearAlgebra::distributed::Vector<Number>::size_type) const [with Number = double; dealii::LinearAlgebra::distributed::Vector<Number>::size_type = unsigned int]
The violated condition was:
partitioner->in_local_range (global_index) || partitioner->ghost_indices().is_element(global_index)
Additional information:
You tried to access element 3433648137 of a distributed vector, but this element is not stored on the current processor. Note: The range of locally owned elements is 0 to 35937, and there
are 0 ghost elements that this vector can access.
The problem is with the following fragment:
```cpp
dealii::TrilinosWrappers::SparseMatrix weight_matrix(sp);
pos = 0;
for (unsigned int i = 0; i < n_agglomerates; ++i)
{
  unsigned int const n_elem = eigenvectors[pos].size();
  for (unsigned int j = 0; j < n_elem; ++j)
  {
    dealii::types::global_dof_index const global_pos = dof_indices_maps[i][j];
    double const value =
        diag_elements[i][j] / locally_relevant_global_diag[global_pos];
    weight_matrix.add(global_pos, global_pos, value);
  }
  ++pos;
}
```
In the debugger, `i = 1696` and `j = 32` with `n_elem = 36`. However, `dof_indices_maps[i]` is only of size 27.
We need to be able to run with different stretch ratios among dimensions. This poses a problem for classical AMG, but AMGe should be robust wrt that. We need to confirm that.
We need to add Wayne's eigensolver to this repository and compare the results with ARPACK/LAPACK.
We need to implement `MatrixFreeHierarchyHelpers` to use the matrix-free implementation. The easiest will probably be to derive it (for now) from `DealIIHierarchyHelpers`.
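A minimal sketch of the proposed derivation (the template parameters and the base-class interface are assumptions, not the actual mfmg declarations):

```cpp
template <int dim, typename VectorType>
class DealIIHierarchyHelpers
{
public:
  virtual ~DealIIHierarchyHelpers() = default;
  // ... existing matrix-based implementation ...
};

template <int dim, typename VectorType>
class MatrixFreeHierarchyHelpers : public DealIIHierarchyHelpers<dim, VectorType>
{
public:
  // Override only what needs to become matrix-free and reuse the rest for now.
};
```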
gpusys$ cat /proc/meminfo | head -n1
MemTotal: 3859908 kB
cd /usr/local/src
git clone https://github.com/spack/spack.git
chmod -R a+rX spack
export SPACK_ROOT=/usr/local/src/spack
. $SPACK_ROOT/share/spack/setup-env.sh
spack install gcc
spack compiler add $(spack location -i gcc@8.2.0)
spack install dealii@develop %gcc@8.2.0
GCC_ROOT_=$(spack location --install-dir gcc)
export LD_LIBRARY_PATH="${GCC_ROOT_}/lib:${GCC_ROOT_}/lib64"
PATH="${GCC_ROOT_}/bin:${PATH}"
MPI_ROOT_=$(spack location --install-dir mpi)
PATH="${MPI_ROOT_}/bin:${PATH}"
CMAKE_ROOT_=$(spack location --install-dir cmake)
PATH="${CMAKE_ROOT_}/bin:${PATH}"
DEAL_II_DIR=$(spack location --install-dir dealii)
BOOST_ROOT=$(spack location --install-dir boost)
cmake \
  -D CMAKE_BUILD_TYPE=Debug \
  -D MFMG_ENABLE_TESTS=ON \
  -D MFMG_ENABLE_CUDA=OFF \
  -D BOOST_ROOT=${BOOST_ROOT} \
  -D DEAL_II_DIR=${DEAL_II_DIR} \
  ../mfmg
make
env DEAL_II_NUM_THREADS=1 make test ARGS=-V
7: Test command: /usr/local/src/spack/opt/spack/linux-rhel7-x86_64/gcc-8.2.0/openmpi-3.1.3-ib5tya3erlk4gxgepkmge7ugk6ea6uip/bin/mpiexec "-n" "1" "./test_hierarchy"
7: Test timeout computed to be: 1500
7: Running 23 test cases...
7: At line 51 of file /tmp/root/spack-stage/spack-stage-4kar8p/arpack-ng-3.6.3/UTIL/dvout.f
7: Fortran runtime error: Unit number is negative and unit was not already opened with OPEN(NEWUNIT=...)
7: --------------------------------------------------------------------------
7: Primary job terminated normally, but 1 process returned
7: a non-zero exit code. Per user-direction, the job has been aborted.
7: --------------------------------------------------------------------------
7: --------------------------------------------------------------------------
7: mpiexec detected that one or more processes exited with non-zero status, thus causing
7: the job to be terminated. The first process to do so was:
7:
7: Process name: [[55908,1],0]
7: Exit code: 2
7: --------------------------------------------------------------------------
7/20 Test #7: test_hierarchy_1 .................***Failed 4.07 sec
test 8
Start 8: test_hierarchy_2
8: Test command: /usr/local/src/spack/opt/spack/linux-rhel7-x86_64/gcc-8.2.0/openmpi-3.1.3-ib5tya3erlk4gxgepkmge7ugk6ea6uip/bin/mpiexec "-n" "2" "./test_hierarchy"
8: Test timeout computed to be: 1500
8: Running 23 test cases...
8: Running 23 test cases...
8: At line 51 of file /tmp/root/spack-stage/spack-stage-4kar8p/arpack-ng-3.6.3/UTIL/dvout.f
8: Fortran runtime error: Unit number is negative and unit was not already opened with OPEN(NEWUNIT=...)
8: --------------------------------------------------------------------------
8: Primary job terminated normally, but 1 process returned
8: a non-zero exit code. Per user-direction, the job has been aborted.
8: --------------------------------------------------------------------------
8: unknown location(0): fatal error: in "benchmark<mfmg__DealIIMeshEvaluator<2>>": dealii::SparseDirectUMFPACK::ExcUMFPACKError:
8: --------------------------------------------------------
8: An error occurred in line <291> of file </usr/local/src/spack/var/spack/stage/dealii-develop-c34vncl5qn7fkr4afiohu5cqe5i4kd5x/dealii/source/lac/sparse_direct.cc> in function
8: void dealii::SparseDirectUMFPACK::factorize(const Matrix&) [with Matrix = dealii::SparseMatrix]
8: The violated condition was:
8: status == UMFPACK_OK
8: Additional information:
8: UMFPACK routine umfpack_dl_numeric returned error status 1.
8:
8: A complete list of error codes can be found in the file <bundled/umfpack/UMFPACK/Include/umfpack.h>.
8:
8: That said, the two most common errors that can happen are that your matrix cannot be factorized because it is rank deficient, and that UMFPACK runs out of memory because your problem is too large.
8:
8: The first of these cases most often happens if you forget terms in your bilinear form necessary to ensure that the matrix has full rank, or if your equation has a spatially variable coefficient (or nonlinearity) that is supposed to be strictly positive but, for whatever reasons, is negative or zero. In either case, you probably want to check your assembly procedure. Similarly, a matrix can be rank deficient if you forgot to apply the appropriate boundary conditions. For example, the Laplace equation without boundary conditions has a single zero eigenvalue and its rank is therefore deficient by one.
8:
8: The other common situation is that you run out of memory.On a typical laptop or desktop, it should easily be possible to solve problems with 100,000 unknowns in 2d. If you are solving problems with many more unknowns than that, in particular if you are in 3d, then you may be running out of memory and you will need to consider iterative solvers instead of the direct solver employed by UMFPACK.
8: --------------------------------------------------------
8:
8: /home/wjd/mfmg_project/mfmg/tests/test_hierarchy.cc(114): last checkpoint: "benchmark" entry.
8: --------------------------------------------------------------------------
8: mpiexec detected that one or more processes exited with non-zero status, thus causing
8: the job to be terminated. The first process to do so was:
8:
8: Process name: [[55924,1],0]
8: Exit code: 2
8: --------------------------------------------------------------------------
8/20 Test #8: test_hierarchy_2 .................***Failed 2.91 sec
The goal here is to at least have a baseline and a mixed full multilevel hierarchy. One of the things that would be interesting to see is how large the agglomerates can be while still getting reasonable convergence. In addition, we should compare the convergence and performance with a fully assembled fine-level matrix and with ML.
In #110, we had to relax the randomization of one mesh for the algorithm to converge. This is because in that case the restrictor is not full rank. This led us to the discovery that we don't know how to compute the eigenvectors of the operators on an agglomerate when we have boundary conditions. When using LAPACK, we compute all the eigenvalues, including the ones corresponding to the boundary conditions; however, these eigenvalues are spurious. ARPACK does not have this problem because we "eliminate" these eigenvalues by setting to zero the dofs corresponding to the Dirichlet BC in the initial guess. However, running ARPACK twice in a row gives different results, which may indicate a stability problem. Since we plan to change the eigensolver, we need to re-investigate this problem once the new eigensolver has been integrated in the code.
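For reference, a minimal sketch of zeroing the Dirichlet-constrained dofs in the initial guess, assuming they are available as a `dealii::IndexSet` (the function and its arguments are illustrative, not the actual mfmg code):

```cpp
#include <deal.II/base/index_set.h>
#include <deal.II/lac/vector.h>

// Make the initial guess orthogonal to the modes associated with the
// Dirichlet-constrained dofs so that ARPACK does not pick them up.
void zero_constrained_dofs(dealii::Vector<double> &initial_guess,
                           dealii::IndexSet const &dirichlet_dofs)
{
  for (auto const dof : dirichlet_dofs)
    initial_guess(dof) = 0.;
}
```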
CMake Warning at CMakeLists.txt:3 (PROJECT):
VERSION keyword not followed by a value or was followed by a value that
expanded to nothing.
The problem seems to be that `SetupMFMG` is included after `MFMG_VERSION` is used.