The x3div from xcompact3d

Split modules into submodules

This would help keep associated functionality together. For example, "tools" is something of a grab bag: a "restart" submodule could be factored out, with the interfaces defined in the tools module and implemented in the restart submodule. This keeps changes isolated and also benefits users, as a small code change in a submodule does not trigger a full recompile.
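A minimal sketch of that split, assuming Fortran 2008 submodules are available in all target compilers. The procedure name write_restart and its arguments are hypothetical and only illustrate the pattern; they are not taken from the x3div sources. Whether a change to the submodule really avoids recompiling dependents also depends on how the build system tracks module files.

```fortran
! Hypothetical sketch: names are illustrative, not from x3div.
module tools
  implicit none
  private
  public :: write_restart

  interface
    ! Only the interface lives in the parent module.
    module subroutine write_restart(step, ok)
      integer, intent(in)  :: step
      logical, intent(out) :: ok
    end subroutine write_restart
  end interface
end module tools

submodule (tools) restart
  implicit none
contains
  ! The body can change without altering the interface above, so
  ! code that only uses "tools" does not depend on this file.
  module subroutine write_restart(step, ok)
    integer, intent(in)  :: step
    logical, intent(out) :: ok
    ok = (step >= 0)   ! placeholder for the real checkpoint logic
  end subroutine write_restart
end submodule restart
```

Callers would simply "use tools" and call write_restart; they never reference the restart submodule directly.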

V2 project (hack_DC_TGV branch): benchmarking on CPUs/GPUs with various compilers (GCC, NVIDIA, Intel) for the TGV case

Compilation of data for the TGV case on CPUs/GPUs for each version of x3div (when TGV capability is available). The aim is to produce strong and weak scaling plots for various spatial resolutions (from 65^3 to 2049^3) with as many compute nodes as possible.

Data should be collected in a way that makes the results reproducible (compiler version, MPI version, GitHub version, etc.)
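One possible way to capture part of this information from inside the code is sketched below. It uses the standard Fortran 2008 inquiry functions compiler_version() and compiler_options() from iso_fortran_env. The module and subroutine names are hypothetical. The MPI version and the GitHub version are not covered here and would have to be recorded separately (for example via MPI_Get_library_version and a string injected at build time).

```fortran
! Hypothetical helper: writes compiler metadata next to benchmark output.
module build_info
  use, intrinsic :: iso_fortran_env, only: compiler_version, compiler_options
  implicit none
  private
  public :: print_build_info
contains
  subroutine print_build_info(unit)
    integer, intent(in) :: unit   ! already-open output unit
    write(unit, '(2a)') 'compiler: ', compiler_version()
    write(unit, '(2a)') 'options : ', compiler_options()
  end subroutine print_build_info
end module build_info
```

Calling print_build_info at start-up, with the unit of the run log, would store the compiler and flags alongside the timings.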

Data should first be collected for ARCHER2.

One issue is to agree on a way to measure the performance (current MPI tools might not be the correct way of doing this).

To be discussed: how and where to store the data on GitHub

fix cuFFT

V2 project (hack_DC_TGV branch): use of 2DECOMP&FFT as a sub-project

The idea here is to have the 2DECOMP&FFT files in a separate directory from the Xcompact3d files. At the moment, there is a 2decomp directory inside the x3div project.

The aim is to have a dedicated 2DECOMP&FFT project, under the Xcompact3d umbrella.

It will allow users to raise dedicated queries/issues for the 2DECOMP&FFT library and will allow its use outside Xcompact3d.

The library should be compiled independently of Xcompact3d and used as an external library; the Makefile and cmake scripts in Xcompact3d will have to be modified accordingly.

IMPORTANT: the 2decomp files in x3div are not up to date so we will need to start with the 2decomp files from incompact3d

-X3div working with nz=1 (Cedric)

-functions for MPI (brainstorming)

Use of generic FFT and cuFFT library on CPUs/GPUs in x3div

The goal is to make sure that x3div can use the generic FFT (no need for external libraries) on CPU/GPU, and cuFFT on NVIDIA GPUs. The default setting should be the generic FFT option.

It should be noted that the generic FFT currently needs explicit OpenACC code to run on NVIDIA GPUs. We need an update on this from NVIDIA. See the other issue raised for this problem with OpenACC (we want to avoid OpenACC and rely only on do concurrent).

Discussion about FFTW: do we need to have this option available?

Prepare "do concurrent" for Xcompact3d:

-replace the existing do loops with do concurrent
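As an illustration of the kind of change involved, the sketch below shows the same update written with nested do loops and with a single do concurrent construct. The module and routine names are hypothetical and the loop body is a generic example, not code from Xcompact3d. The conversion is only valid when the iterations are independent of each other, as they are here.

```fortran
! Hypothetical example: y = y + a*x on a 3D array, written two ways.
module dc_demo
  implicit none
contains
  ! Classic nested loops.
  subroutine axpy_loops(a, x, y)
    real, intent(in)    :: a, x(:,:,:)
    real, intent(inout) :: y(:,:,:)
    integer :: i, j, k
    do k = 1, size(y, 3)
      do j = 1, size(y, 2)
        do i = 1, size(y, 1)
          y(i,j,k) = y(i,j,k) + a*x(i,j,k)
        end do
      end do
    end do
  end subroutine axpy_loops

  ! Same update with do concurrent: every (i,j,k) is independent,
  ! so the compiler is free to run iterations in any order.
  subroutine axpy_dc(a, x, y)
    real, intent(in)    :: a, x(:,:,:)
    real, intent(inout) :: y(:,:,:)
    integer :: i, j, k
    do concurrent (k = 1:size(y, 3), j = 1:size(y, 2), i = 1:size(y, 1))
      y(i,j,k) = y(i,j,k) + a*x(i,j,k)
    end do
  end subroutine axpy_dc
end module dc_demo
```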

TARGET: next hackathon

V2 project (hack_DC_TGV branch): cmake capabilities for CPUs/GPUs for various compilers (GCC, NVIDIA, Intel)

Continue to develop a CMake framework for x3div, with the possibility to use Fortran compilers on CPUs and GPUs. To start with, the framework should work with the GCC, NVIDIA and Intel compilers, as well as the compilers available on ARCHER2.

Implement do concurrent in fft_generic.f90

Could the FFT algorithm in fft_generic use a do concurrent construct here?

x3div/decomp2d/fft_generic.f90

Lines 81 to 91 in 2421f3c

 do k=1,decomp%xsz(3)
    do j=1,decomp%xsz(2)
       do i=1,decomp%xsz(1)
          buf(i) = inout(i,j,k)
       end do
       call spcfft(buf,decomp%xsz(1),isign,scratch)
       do i=1,decomp%xsz(1)
          inout(i,j,k) = buf(i)
       end do
    end do
 end do
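A possible shape for the do concurrent version is sketched below, under two assumptions that would need checking against the real code. First, buf (and scratch) are shared between iterations in the loop above, so each iteration needs its own workspace; here this is done with a block-local array. Second, any procedure called inside do concurrent must be pure, so spcfft would have to be made pure. The routine line_transform is a hypothetical stand-in for spcfft that just reverses a line, so that the sketch is self-contained.

```fortran
module fft_dc_sketch
  implicit none
contains
  ! Hypothetical stand-in for spcfft: any PURE in-place 1D transform.
  ! It only reverses the line so the result is easy to verify.
  pure subroutine line_transform(buf, n)
    integer, intent(in)    :: n
    complex, intent(inout) :: buf(n)
    buf = buf(n:1:-1)
  end subroutine line_transform

  ! Apply the 1D transform along the first index of every (j,k) line.
  subroutine transform_x(inout)
    complex, intent(inout) :: inout(:,:,:)
    integer :: j, k, n
    n = size(inout, 1)
    do concurrent (k = 1:size(inout, 3), j = 1:size(inout, 2))
      block
        complex :: buf(n)            ! workspace private to this iteration
        buf = inout(:, j, k)
        call line_transform(buf, n)
        inout(:, j, k) = buf
      end block
    end do
  end subroutine transform_x
end module fft_dc_sketch
```

Fortran 2018 locality specifiers (local(buf)) would be an alternative to the block construct, but compiler support for them varies.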

Benchmarking:

CPU only GCC (Cedric/Sylvain)
CPU only Intel (Cedric/Sylvain)
CPU only NVIDIA (Stefano)
GPU only NVIDIA (Stefano)
testing on ARCHER2/GNU (CPU only, Paul)
testing on ARCHER2/Cray (CPU only, Paul)
testing on ARCHER2/AOCC? (CPU only, Paul)

-->Testing with TGV case with 128^3, 256^3, 512^3 and 1024^3 mesh nodes.

--> Properly document the benchmarking for reproducibility (machine, compiler, option, github version, etc.)

TARGET: End-Feb

Finalised "do concurrent" version of x3div:

Poisson solver (Stefano)
Check MPI_ALLTOALL (Stefano)
CMake/Makefile for NVIDIA flags (Stefano)
Update Readme on how to use CMake (Stefano)
Choose best Thomas algo option (Cedric)

TARGET: 04/02/2022

Clean compilation scripts and document compilation procedure

Currently, compilation of the code can be performed using the provided

Makefile
cmake scripts
autotools scripts

On the one hand, keeping the Makefile is probably a good idea so that one (not very familiar with the cmake syntax) can quickly test the code on new architectures. On the other hand, keeping both the cmake and autotools is probably a bad idea, and the autotools should be removed later on (v3).

It could be interesting to have a Makefile with explicit file dependencies so that parallel make would work with both Makefile and cmake.

The compilation using cmake should be clearly documented. Some clusters have support for cmake but no support for ccmake or cmake-gui. Thus, it is important to document clearly all the cmake options one can use from the command-line.

2DECOMP&FFT work:

(we need 2DECOMP&FFT to work as an independent library so it can be maintained properly)
-Github sub-project (Paul), if bad idea then go for external library option

TARGET: Mid-FEB

FFT/generic : Replace explicit openacc directives with do concurrent

Following commit ef6dfb9, explicit OpenACC directives allow the generic FFT to run on GPU. Do concurrent should be used instead of explicit OpenACC directives.

Minimal CI workflow using GitHub actions

To make a start on using GitHub Actions, we could have a simple workflow that builds x3div and runs the default benchmark.
This would be run upon pushing new commits to the repo.

V2 project (hack_DC_TGV branch): implementation of Unit Tests

The aim is to be able to test automatically derivatives/interpolations/filters regularly (especially after an update of the code)

-UT for each first derivative subroutine in x y and z directions
-UT for each second derivative subroutine in x y and z directions
-UT for each interpolation subroutine in x y and z directions
-UT for each filter subroutine in x y and z directions
-UT for the Poisson solver

Each UT will have to work for all possible sets of boundary conditions. The idea is to define a cos or a sin function to be differentiated/interpolated, and compare the results with the analytical solution. An error threshold would have to be defined for automatic testing.
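The idea can be sketched as follows for a first derivative with periodic boundary conditions. The scheme used here is a plain second-order central difference, chosen only to keep the example short; it is a stand-in, not one of the x3div derivative subroutines, and all names are hypothetical. The threshold mentioned in the usage note is likewise an assumption to be tuned per scheme and resolution.

```fortran
module deriv_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Stand-in scheme: 2nd-order periodic central difference
  ! (NOT the compact schemes used in x3div).
  pure function ddx_periodic(f, dx) result(df)
    real(dp), intent(in) :: f(:), dx
    real(dp) :: df(size(f))
    df = (cshift(f, 1) - cshift(f, -1)) / (2.0_dp*dx)
  end function ddx_periodic

  ! Differentiate sin(x) on [0, 2*pi) with n points and return the
  ! maximum error against the analytical derivative cos(x).
  function max_error_sin(n) result(err)
    integer, intent(in) :: n
    real(dp) :: err, dx, x(n)
    real(dp), parameter :: twopi = 8.0_dp*atan(1.0_dp)
    integer :: i
    dx = twopi / n
    x = [(i*dx, i = 0, n - 1)]
    err = maxval(abs(ddx_periodic(sin(x), dx) - cos(x)))
  end function max_error_sin
end module deriv_check
```

A test would then assert that max_error_sin(n) stays below the chosen threshold, and could also check that the error drops at the expected rate when n is doubled.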

See other issue raised by Thibault for more information.

finalise v2 (branch hack_DC_TGV)

GPU version with TGV case implemented

V2 project (hack_DC_TGV branch): implementation of the I/O tools from Xcompact3d for the TGV case

Simply copy the MPI I/O tools from xcompact3d to x3div to allow for post-processing of the TGV case.

I/O tools would have to become "do concurrent" friendly.

It would be good to have ADIOS2 capabilities as well.

Discussion to have with Paul about the new Py4incompact3d, which allows for parallel post-processing (only ux, uy, uz and pressure need to be saved in production runs). This could potentially simplify the post-processing tools used when running simulations.

finalise v1 (reference case CPU)

fix the time advancement part of x3div

Setup unit testing with CMake and pFUnit

A minimum setup could be

Basic CMakeLists.txt for configuration and particularly interfacing with pFUnit.
A couple of placeholder unit tests to "test the testing".
Build and cache pFUnit in the GH actions workflow, build the test executable and run the tests.

Currently (17.02.2022) some tests are implemented in the hack_doconcurrent branch using CTest and pFUnit, so I'm assuming some of you know about this testing framework. pFUnit looks like a popular option for Fortran. It's maybe not as well documented and feature-rich as frameworks in other languages, but it seems to be used by many projects out there. Alternatives are Funit, FRUIT and Ftunit, but pFUnit is the only one that obviously looks actively maintained?

I've played around with it; it's straightforward to install and use, at least when following the basic examples available at

https://github.com/Goddard-Fortran-Ecosystem/pFUnit_demos

It also has support for MPI. I don't know if this is common among Fortran frameworks, but it sounds relevant to me.

Tests are written in free-form Fortran, e.g.

@test
subroutine test_square()
   use Square_mod
   use funit

   @assertEqual(9., square(3.), 'square(3)')
   
end subroutine test_square

that is meant to be parsed and compiled. But a CMake module is provided that takes care of everything, so that

add_pfunit_ctest (my_tests
  TEST_SOURCES ${test_srcs}
  LINK_LIBRARIES sut # your application library
  )

should be enough to generate test executable my_tests. See Using pFUnit.
