llnl / quicksilver

A proxy app for the Monte Carlo Transport Code, Mercury. LLNL-CODE-684037

License: Other

C++ 96.93% Makefile 3.07%

proxy-application monte-carlo cpp

quicksilver's Introduction

Quicksilver

Introduction

Quicksilver is a proxy application that represents some elements of the Mercury workload by solving a simplified dynamic Monte Carlo particle transport problem. Quicksilver attempts to replicate the memory access patterns, communication patterns, and the branching or divergence of Mercury for problems using multigroup cross sections. OpenMP and MPI are used for parallelization. A GPU version is available. Unified memory is assumed.

Performance of Quicksilver is likely to be dominated by latency-bound table lookups, a highly branchy/divergent code path, and poor vectorization potential.

For more information, visit the LLNL co-design pages.

Building Quicksilver

Instructions to build Quicksilver can be found in the Makefile. Quicksilver is relatively easy to build and has no external dependencies (except MPI and OpenMP). You should be able to build Quicksilver on nearly any system by customizing the values of only four variables in the Makefile:

CXX: The name of the C++ compiler (with path if necessary). Quicksilver uses C++11 features, so a C++11-compliant compiler should be used.
CXXFLAGS: Command line switches to pass to the C++ compiler when compiling objects and when linking the executable.
CPPFLAGS: Command line switches to pass to the compiler only when compiling objects.
LDFLAGS: Command line switches to pass to the compiler only when linking the executable.

Sample definitions for a number of common systems are provided.

Quicksilver recognizes a number of pre-processor macros that enable or disable various code features such as MPI, OpenMP, etc. These are described in the Makefile.
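
As a rough illustration of how such feature macros typically gate code, the sketch below reports which features a translation unit was compiled with. HAVE_CUDA appears in an nvcc command line later on this page; HAVE_MPI and HAVE_OPENMP are assumed names here, and enabledFeatures is a hypothetical helper that is not part of Quicksilver. The Makefile remains the authoritative list of macros.

```cpp
#include <string>

// Hypothetical helper, not part of Quicksilver: lists the optional
// features selected at compile time via -D switches.
inline std::string enabledFeatures()
{
    std::string features;
#ifdef HAVE_MPI      // assumed macro name
    features += "MPI ";
#endif
#ifdef HAVE_OPENMP   // assumed macro name
    features += "OpenMP ";
#endif
#ifdef HAVE_CUDA     // seen as -DHAVE_CUDA in an nvcc command line
    features += "CUDA ";
#endif
    // With none of the macros defined, the build is plain serial C++.
    return features.empty() ? "serial" : features;
}
```

Compiled without any of the three macros, the helper returns "serial"; adding a -D switch for one of them changes the result without touching the source.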

Running Quicksilver

Quicksilver’s behavior is controlled by a combination of command line options and an input file. All of the parameters that can be set on the command line can also be set in the input file. The input file values will override the command line. Run $ qs -h to see documentation on the available command line switches. Documentation of the input file parameters is in preparation.

Quicksilver also has the property that the output of every run is a valid input file. Hence you can repeat any run for which you have the output file by using that output as an input file.

License and Distribution Information

Quicksilver is available on GitHub.

Quicksilver is open source software with a BSD license. See LICENSE.md.

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

LLNL-CODE-684037

quicksilver's Issues

Application Output Explanation

Is there any documentation that explains the output of this application? I am trying to understand how to interpret the data. From searching the code, I can see that the first column refers to the parts of the code that were timed.

Timer                       Cumulative   Cumulative   Cumulative   Cumulative   Cumulative   Cumulative
Name                            number    microSecs    microSecs    microSecs    microSecs   Efficiency
                              of calls          min          avg          max       stddev       Rating
main                                 1    1.529e+06    1.531e+06    1.533e+06    1.402e+03        99.90
cycleInit                           10    1.988e+05    2.750e+05    3.871e+05    7.559e+04        71.04
cycleTracking                       10    1.137e+06    1.251e+06    1.329e+06    7.726e+04        94.15
cycleTracking_Kernel            669680    5.383e+05    7.237e+05    8.906e+05    1.507e+05        81.27
cycleTracking_MPI               736648    7.518e+04    1.453e+05    2.153e+05    6.100e+04        67.49
cycleTracking_Test_Done          66978    2.151e+03    2.499e+03    3.023e+03    3.242e+02        82.67
cycleFinalize                       20    4.660e+02    9.270e+02    1.362e+03    3.434e+02        68.06
Figure Of Merit              1.938e+07 [Num Segments / Cycle Tracking Time]

Initialize random number generator

Hello,

We are currently trying to run several Proxy apps with SMPI/SimGrid (https://github.com/simgrid/SMPI-proxy-apps/ and https://github.com/simgrid/simgrid).

We had a small issue with Quicksilver, as the mesh is initialized with random centers, but the random number generator used in initMC.cc does not seem to be initialized explicitly. This is fine when all processes run separately, as they all return the same sequence, but for SimGrid we "fold" all of the processes into a single one. As a result, the processes generate different centers, which leads to issues later.

Adding a call to srand48(params.simulationParams.seed); in initMesh (initMC.cc:257) fixes the issue for us, ensuring that all processes will be initialized to the same seed before running the calls to drand48.
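
A minimal sketch of why the explicit seed matters, assuming a POSIX system where srand48 and drand48 are available from <stdlib.h>. randomCenters is a hypothetical stand-in for the mesh-center generation, not the code in initMC.cc, and any seed value passed to it is arbitrary. Because drand48 keeps a single generator state per process, ranks folded into one process continue each other's sequence unless each one reseeds first.

```cpp
#include <stdlib.h>   // srand48, drand48 (POSIX)
#include <vector>

// Hypothetical stand-in for generating random mesh centers.
// Reseeding on entry makes every caller see the same sequence, even
// when several simulated ranks share one process and one drand48 state.
inline std::vector<double> randomCenters(long seed, int count)
{
    srand48(seed);
    std::vector<double> centers;
    centers.reserve(count);
    for (int i = 0; i < count; ++i)
        centers.push_back(drand48());   // uniform in [0, 1)
    return centers;
}
```

Calling randomCenters twice with the same seed yields identical vectors, which is the property the folded processes need.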

ptxas fatal : Unresolved extern function '_Z17getGlobalThreadIDv'

I googled this but couldn't figure it out. I don't know whether it's a QS issue, a CUDA issue, or some C++ issue. Any suggestions?

$ make
/swtools/cuda/cuda-9.2/bin/nvcc -x cu  -DHAVE_CUDA -std=c++11 -O2  -g  --gpu-architecture=sm_61 --compiler-bindir=/swtools/gcc/7.5.0/bin/gcc -c main.cc -o main.o
ptxas fatal   : Unresolved extern function '_Z17getGlobalThreadIDv'
make: *** [main.o] Error 255

Dead link in README.md

The link pointing to "LLNL co-design pages" in the Introduction section of README.md is a dead link. Perhaps it can be refreshed to this one instead? https://asc.llnl.gov/codes/proxy-apps/quicksilver

(Also, I don't know who in ECP I should contact for this one, but the same dead link is listed in https://proxyapps.exascaleproject.org/app/quicksilver/ )

Message from cuda-memcheck

I built the application with the options below and ran on a P100 GPU.

-O2 -DHAVE_CUDA -x cu -dc

cuda-memcheck shows the following message. If a similar message can be reproduced, would you please update the kernels? If I am not building or running the program correctly, please let me know.

Thanks

cuda-memcheck ./qs

========= Invalid __global__ read of size 8
=========     at 0x000002e8 in CycleTrackingGuts(MonteCarlo*, int, ParticleVault*, ParticleVault*)
=========     by thread (127,0,0) in block (194,0,0)
=========     Address 0x7f153e0002e8 is out of bounds
=========     Device Frame:CycleTrackingKernel(MonteCarlo*, int, ParticleVault*, ParticleVault*) (CycleTrackingKernel(MonteCarlo*, int, ParticleVault*, ParticleVault*) : 0xa0)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/lib64/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:./qs [0x36feb]
=========     Host Frame:./qs [0x78d31]
=========     Host Frame:./qs [0x31358]
=========     Host Frame:./qs [0x31aaf]
=========     Host Frame:./qs [0x65dc]
=========     Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x22555]
=========     Host Frame:./qs [0x693e]
...
...

Function declarations have HOST_DEVICE_CUDA but definitions do not

In MC_Particle.hh:
Function declaration of Move_Particle uses HOST_DEVICE_CUDA:

   HOST_DEVICE_CUDA
   void Move_Particle(const DirectionCosine & direction_cosine, const double distance);

but function definition does not:
inline void MC_Particle::Move_Particle( const DirectionCosine &my_direction_cosine, …

In file ParticleVault.hh:
putParticle is declared as HOST_DEVICE_CUDA, but the function definition does not use HOST_DEVICE_CUDA.
invalidateParticle is declared as HOST_DEVICE_CUDA, but the function definition does not use HOST_DEVICE_CUDA.
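
A sketch of the consistent form, with the same qualifier macro on both the in-class declaration and the out-of-class inline definition. SKETCH_HOST_DEVICE, Vec3Sketch and ParticleSketch are hypothetical stand-ins rather than Quicksilver's HOST_DEVICE_CUDA, DirectionCosine and MC_Particle. The macro is assumed to expand to `__host__ __device__` under a CUDA compiler and to nothing otherwise.

```cpp
// Assumed expansion: CUDA qualifiers when a CUDA compiler is in use,
// nothing in a plain C++ build.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

struct Vec3Sketch { double x, y, z; };

class ParticleSketch
{
public:
    Vec3Sketch position{0.0, 0.0, 0.0};

    // The declaration carries the qualifier macro...
    SKETCH_HOST_DEVICE
    void moveParticle(const Vec3Sketch &direction, double distance);
};

// ...and the definition repeats it, so both agree on where the
// function may run.
SKETCH_HOST_DEVICE
inline void ParticleSketch::moveParticle(const Vec3Sketch &direction,
                                         double distance)
{
    position.x += direction.x * distance;
    position.y += direction.y * distance;
    position.z += direction.z * distance;
}
```

Keeping the qualifiers identical in both places avoids depending on how a particular compiler reconciles a declaration and a definition that disagree.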

ATOMIC_UPDATE called from host function

In file MC_SourceNow.cc:

The ATOMIC_CAPTURE and ATOMIC_UPDATE macros are used inside a host function: void MC_SourceNow(MonteCarlo *monteCarlo). When the macro (defined in AtomicMacro.hh) is evaluated under a CUDA compiler (e.g., nvcc or clang), the code checks whether __CUDA_ARCH__ is defined and replaces the macro with either the CUDA version or the serial version.

Under a CUDA compiler, because __CUDA_ARCH__ is always defined, it will try to replace the macro with the device version, but this is incorrect because the function is a host function.
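
The selection pattern described above can be sketched as follows. The macro and function names are hypothetical and the bodies are simplified; this is not the code in AtomicMacro.hh or MC_SourceNow.cc. The point is only that the choice is made by the preprocessor from `__CUDA_ARCH__`, with no knowledge of whether the enclosing function is a host or a device function.

```cpp
// Hypothetical macro mirroring the described selection logic.
#if defined(__CUDA_ARCH__)
// Device branch: CUDA's atomicAdd intrinsic.
#define SKETCH_ATOMIC_INCREMENT(x) atomicAdd(&(x), 1ULL)
#else
// Serial branch: plain, non-atomic update.
#define SKETCH_ATOMIC_INCREMENT(x) ((x) += 1ULL)
#endif

// Hypothetical host-only function that uses the macro, standing in for
// a routine such as a source-particle loop.
inline unsigned long long countSourcedParticles(int numParticles)
{
    unsigned long long tally = 0;
    for (int i = 0; i < numParticles; ++i)
        SKETCH_ATOMIC_INCREMENT(tally);
    return tally;
}
```

In a plain C++ build only the serial branch is compiled. One possible way around the mismatch, under the same assumptions, is to give host-only functions a macro that never selects the device branch.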