GithubHelp home page GithubHelp logo

litaosysu / minismc Goto Github PK

View Code? Open in Web Editor NEW

This project forked from weiqunzhang/minismc

0.0 1.0 0.0 288 KB

DNS code solving compressible Navier-Stokes equations for viscous multi-component reacting flows

Makefile 2.27% Perl 2.61% C 15.03% Fortran 80.08%

minismc's Introduction

This is a minimalist version of SMC, a combustion code that solves
compressible Navier-Stokes equations for viscous multi-component
reacting flows.  SMC is developed at the Center for Computational
Sciences and Engineering (CCSE) at Lawrence Berkeley National
Laboratory.  It uses the BoxLib library
(https://ccse.lbl.gov/BoxLib/), also developed at CCSE.  This
minimalist version of SMC includes a stripped down version of BoxLib.
Therefore, the full version of BoxLib is not needed.

The code is mainly written in Fortran 90, with some parts in C.
Parallelization is implemented with hybrid MPI/OpenMP.  However, SMC
can also be built with pure MPI, or pure OpenMP, or neither.

GNU make is used to build the code.  The following options can be
specified in the GNUmakefile:

* MPI :=  
  This determines whether we are using the Message Passing
  Interface (MPI) library.  Leaving this option empty will disable
  MPI.  If MPI is enabled, you need to specify how to link to the MPI
  library.  You can set USE_MPI_WRAPPERS := t, then mpif90 or mpiifort
  will be used.  Or you can set "mpi_include_dir", "mpi_lib_dir", and
  "mpi_libraries" to appropriate values in GNUmakefile.

* OMP := t
  This determines whether we are using OpenMP.  Leaving this option
  empty disables OpenMP.

* NDEBUG := t
  This option determines whether we compile with support for debugging
  (usually also enabling some runtime checks). Setting NDEBUG := t
  turns off debugging.

* COMP := Intel
  Set this to your compiler of choice (e.g., GNU).  Specific options
  for a certain compiler are stored in the comps/$(COMP).mak file.   

* MIC :=
  Set this if compiling for Intel Xeon Phi.

* K_USE_AUTOMATIC := t
  This determines whether some arrays in kernels.F90 will be automatic
  or allocatable.  Automatic arrays are usually on the stack.
  When OpenMP is used, allocating memory on the stack is usually
  faster than on the heap.  However, one must make sure there is
  adequate space on the stack; otherwise a segmentation fault might
  occur.  Note that the size of the stack for threads can be adjusted
  by the OMP_STACKSIZE environment variable and the shell might also
  impose a limit on stack size.

* MKVERBOSE := t
  This determines the verbosity of the building process.

To build an executable, type "make" in the SMC directory.  Note that
the above options can also be specified in a command line such as
"make MPI=t MIC=t NDEBUG= ".  The executable will have a name like
main.GNU.debug.mpi.exe depending on the compiler and some other
options specified in the GNUmakefile or command line.  There are some
other options provided by the GNUmakefile.  You can type "make TAGS"
or "make tags" to generate tag file for Emacs or vi.  You can type
"make clean" or "make realclean" to remove files generated during the
make process.

Runtime parameters can be specified in the inputs_SMC file, which must
be placed in the directory where the executable is run.  Below are a
list of selected runtime parameters and default values.

* n_cellx = -1
* n_celly = -1
* n_cellz = -1

  Number of cells in the x, y, and z-directions.  They must be greater than 4.

* max_grid_size = 64 

  This determines the largest grid size of a box.  If max_grid_size is
  too big, there might be fewer boxes than MPI tasks, and then some
  processors will be idle.  However, if it is too small, each MPI task
  may have too many small boxes, and the performance will be affected.
  Thus, max_grid_size must be chosen according to the number of cells
  and the number of MPI tasks.  Ideally, you would like to have one
  box for each MPI task to minimize the communication cost.

* tb_split_dim = 2

  This determines how domain decomposition in some subroutines is done
  for threads.  When OpenMP is used, each box is "virtually" divided
  into "nthreads" boxes and each OpenMP thread works on one
  thread-box.  This parameter can be either 2 or 3. When it is 2, the
  domain decomposition is in y and z-directions; when it is 3, the
  domain decomposition is in 3D.

* tb_blocksize_x = -1
* tb_blocksize_y = 16
* tb_blocksize_z = 16

  SMC uses a blocking strategy.  These parameters determines blocking
  size in x, y and z-directions.  The value should be either greater
  than or equal to 4, or less than 0.  If it is less than 0, no
  blocking is used in that direction.

* verbose = 0

  This determines verbosity.

* stop_time = -1.d0

  Simulation stop time.

* max_step = 1

  Maximum number of timesteps in the simulation.

Note that there are a number of OMP COLLAPSE clauses in the code.
These tend to trigger compiler bugs.  If the code crashes, try to run
it without "COLLAPSE". 

More information about SMC can be found in the paper "High-Order
Algorithms for Compressible Reacting Flow with Complex Chemistry" by
M. Emmett, W. Zhang, and J.B. Bell (http://arxiv.org/abs/1309.7327),
and in "Optimizing for Navier-Stokes Equations" by Antonio Valles and
Weiqun Zhang (Chapter 4 of "High Performance Parallelism Pearls:
Multicore and Many-core Programming Approaches", James Reinders and
Jim Jeffers, published by Morgan Kaufmann, 2014).  If you have any
questions, please contact Weiqun Zhang ([email protected]).

**********************************************************************

As noted in the book, the best performance shown on Figure 4.19 for
the Intel Xeon Phi coprocessor is 12.99 seconds, and we found close to
publication time that a few additional Fortran compiler options allow
us to do better and achieve 9.11 seconds for the coprocessor (1.43x
improvement vs . Figure 4.19) while keeping the processor performance
the same. The compiler options for the improved performance are:
-fno-alias -noprec-div -no-prec-sqrt -fimf-precision=low
-fimf-domain-exclusion=15 -align array64byte -opt-assumesafe-padding
-opt-streaming-stores always -opt-streaming-cache-evict=0.

These new best results on coprocessor use -1, 16, 16 for the thread
blocking specified in inputs_SMC file and SIMD directives are no
longer needed in kernels.F90 file when using -fno-alias compiler
option.  The chapter discusses Figure 4.19 as best results for
coprocessor.

The files here reflect all these changes for higher performance.

minismc's People

Contributors

weiqunzhang avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.