enzo-project / enzo-e

A version of Enzo designed for exascale and built on charm++.

License: Other

Makefile 0.04% Python 5.49% Shell 0.18% C++ 77.81% Fortran 9.02% C 3.67% Lex 0.12% Yacc 0.49% CSS 0.06% Awk 0.33% CMake 2.54% Jupyter Notebook 0.25%

enzo-e's People

Contributors

aemerick, boldenth, brittonsmith, buketbenek, bwoshea, clairekope, drreynolds, fearmayo, forrestglines, gregbryan, jakereinheimer, jobordner, jwise77, killerkaninchen, mabruzzo, matthewturk, peeples, pgrete, saoirseward, stefanarridge, trevormccrary, tumlinson, vanichan, willhicks96, wolfram-schmidt

enzo-e's Issues

Broken documentation links

On the front documentation page, some of the links (e.g. "Getting started", "Parameter file", "Parameter reference") within the text are broken.

Modify the CelloArray dimension naming

Currently CelloArray follows a convention similar to numpy arrays (assuming that the underlying array has C ordering): we enumerate the dimensions in order of increasing access speed.

  • For example, if we have a 3D array, arr.shape(0) gives the size of the dimension along which element access is slowest and arr.shape(2) gives the size of the dimension along which element access is fastest.
  • To put it another way, if we initialize an array called arr with shape (a,b,c,d) then:
    • arr.shape(0) == a
    • arr.shape(1) == b
    • arr.shape(2) == c
    • arr.shape(3) == d

In enzo-e, the x-position of cells always varies along the fastest access-order dimension while the z-position of cells always varies along the slowest access-order dimension.

This combination of factors frequently leads to this recurring pattern:

for (int iz = 0; iz < arr.shape(0); iz++) {
  for (int iy = 0; iy < arr.shape(1); iy++) {
    for (int ix = 0; ix < arr.shape(2); ix++) {
      // perform operations involving arr ...
    }
  }
}

In other words, the number of elements along the x-axis is given by arr.shape(2) while the number of elements along the z-axis is given by arr.shape(0).

This is counter-intuitive, especially because there are numerous places throughout the VL+CT solver where the dimension of an operation is expected as an argument to a function, with the following mapping:

  • x <--> 0
  • y <--> 1
  • z <--> 2
    A similar mapping is also used in some Cello machinery, but is less explicit.

As discussed in PR #62, to resolve this tension, it would probably be best to reverse this dimension numbering. The new ordering would be similar to AthenaArray from Athena++ (note that in Athena++ the lowest dimension starts at 1, whereas here it would start at 0).
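
To illustrate the proposal, here is a small self-contained mock-up (my own sketch, not the actual CelloArray API) of an array class using the reversed convention, where shape(0) is the x-extent and shape(2) is the z-extent while x remains the fastest-varying axis in memory:

// Sketch only (not CelloArray): the dimension index passed to shape()
// matches the x<->0, y<->1, z<->2 mapping used elsewhere.
#include <cstdio>
#include <vector>

class MockArray {
public:
  MockArray(int nx, int ny, int nz)
    : shape_{nx, ny, nz},
      data_(static_cast<size_t>(nx) * ny * nz, 0.0) {}
  int shape(int dim) const { return shape_[dim]; }   // 0 -> x, 1 -> y, 2 -> z
  // x is still the fastest-varying axis in memory:
  double& operator()(int ix, int iy, int iz) {
    return data_[ix + shape_[0] * (iy + static_cast<size_t>(shape_[1]) * iz)];
  }
private:
  int shape_[3];
  std::vector<double> data_;
};

int main() {
  MockArray arr(4, 3, 2);
  // With the reversed numbering, the loop bounds line up with the axis names:
  for (int iz = 0; iz < arr.shape(2); iz++)
    for (int iy = 0; iy < arr.shape(1); iy++)
      for (int ix = 0; ix < arr.shape(0); ix++)
        arr(ix, iy, iz) = 1.0;
  std::printf("x-extent = %d, z-extent = %d\n", arr.shape(0), arr.shape(2));
  return 0;
}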

Build without grackle fails

I tried to build the DD branch without Grackle and followed the docs, setting use_grackle = 0 in the top-level SConstruct.
The build fails with

./build.sh bin/enzo-p
Remove bin/enzo-p
2020-08-25 15:02:13 BEGIN
BEGIN Enzo-P/Cello ./build.sh
arch=linux_gnu
prec=double
target=bin/enzo-p
2020-08-25 15:02:13 compiling...                 
('    CELLO_ARCH scons arch=', 'linux_gnu')
('    CELLO_PREC scons prec=', 'double')


scons: warning: Two different environments were specified for target main_enzo.o,
	but they appear to have the same action: $CXX -o $TARGET -c $CXXFLAGS $CCFLAGS $_CCCOMCOM $SOURCES
File "/home/pgrete/src/enzo-e/build/Cello/SConscript", line 210, in <module>
/home/pgrete/src/charm/bin/charmc -language charm++ -o build/Enzo/enzo_EnzoMethodGrackle.o -c -Wall -O3 -g -ffast-math -funroll-loops -fPIC -balancer CommonLBs -DCONFIG_PRECISION_DOUBLE -DSMALL_INTS -DCONFIG_NODE_SIZE=64 -DCONFIG_NODE_SIZE_3=192 -DNO_FREETYPE -DCONFIG_USE_PERFORMANCE -DCONFIG_USE_MEMORY -DCONFIG_NEW_CHARM -DCONFIG_HAVE_VERSION_CONTROL -Iinclude -I/usr/include -I/usr/include/boost -I/lib/x86_64-linux-gnu/include build/Enzo/enzo_EnzoMethodGrackle.cpp
In file included from include/_data.hpp:86,
                 from build/Enzo/enzo.hpp:37,
                 from build/Enzo/enzo_EnzoMethodGrackle.cpp:10:
include/data_FluxData.hpp: In member function ‘int FluxData::index_field(int) const’:
include/data_FluxData.hpp:103:29: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<int>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  103 |     return (0 <= i_f && i_f < field_list_.size()) ?
      |                         ~~~~^~~~~~~~~~~~~~~~~~~~
include/data_FluxData.hpp: In member function ‘const FaceFluxes* FluxData::get_block_fluxes_(int) const’:
include/data_FluxData.hpp:184:25: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<FaceFluxes*>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  184 |   { return (0 <= i && i < block_fluxes_.size()) ?
      |                       ~~^~~~~~~~~~~~~~~~~~~~~~
include/data_FluxData.hpp: In member function ‘FaceFluxes* FluxData::get_block_fluxes_(int)’:
include/data_FluxData.hpp:187:25: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<FaceFluxes*>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  187 |   { return (0 <= i && i < block_fluxes_.size()) ?
      |                       ~~^~~~~~~~~~~~~~~~~~~~~~
include/data_FluxData.hpp: In member function ‘const FaceFluxes* FluxData::get_neighbor_fluxes_(int) const’:
include/data_FluxData.hpp:192:25: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<FaceFluxes*>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  192 |   { return (0 <= i && i < neighbor_fluxes_.size()) ?
      |                       ~~^~~~~~~~~~~~~~~~~~~~~~~~~
include/data_FluxData.hpp: In member function ‘FaceFluxes* FluxData::get_neighbor_fluxes_(int)’:
include/data_FluxData.hpp:195:25: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<FaceFluxes*>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  195 |   { return (0 <= i && i < neighbor_fluxes_.size()) ?
      |                       ~~^~~~~~~~~~~~~~~~~~~~~~~~~
In file included from include/_data.hpp:90,
                 from build/Enzo/enzo.hpp:37,
                 from build/Enzo/enzo_EnzoMethodGrackle.cpp:10:
include/data_DataMsg.hpp: In member function ‘void DataMsg::set_num_face_fluxes(int)’:
include/data_DataMsg.hpp:146:11: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<FaceFluxes*>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  146 |     if (i > face_fluxes_list_.size()) {
      |         ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
build/Enzo/enzo_EnzoMethodGrackle.cpp: In constructor ‘EnzoMethodGrackle::EnzoMethodGrackle(double, double)’:
build/Enzo/enzo_EnzoMethodGrackle.cpp:21:5: error: class ‘EnzoMethodGrackle’ does not have any field named ‘grackle_units_’
   21 |     grackle_units_(),
      |     ^~~~~~~~~~~~~~
build/Enzo/enzo_EnzoMethodGrackle.cpp:22:5: error: class ‘EnzoMethodGrackle’ does not have any field named ‘grackle_rates_’
   22 |     grackle_rates_(),
      |     ^~~~~~~~~~~~~~
build/Enzo/enzo_EnzoMethodGrackle.cpp:23:5: error: class ‘EnzoMethodGrackle’ does not have any field named ‘time_grackle_data_initialized_’
   23 |     time_grackle_data_initialized_(ENZO_FLOAT_UNDEFINED)
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
build/Enzo/enzo_EnzoMethodGrackle.cpp: In member function ‘virtual double EnzoMethodGrackle::timestep(Block*) const’:
build/Enzo/enzo_EnzoMethodGrackle.cpp:502:22: warning: unused variable ‘config’ [-Wunused-variable]
  502 |   const EnzoConfig * config = enzo::config();
      |                      ^~~~~~
Fatal Error by charmc in directory /home/pgrete/src/enzo-e
   Command g++ -DCMK_GFORTRAN -I/home/pgrete/src/charm/bin/../include -D__CHARMC__=1 -DCONFIG_PRECISION_DOUBLE -DSMALL_INTS -DCONFIG_NODE_SIZE=64 -DCONFIG_NODE_SIZE_3=192 -DNO_FREETYPE -DCONFIG_USE_PERFORMANCE -DCONFIG_USE_MEMORY -DCONFIG_NEW_CHARM -DCONFIG_HAVE_VERSION_CONTROL -Iinclude -I/usr/include -I/usr/include/boost -I/lib/x86_64-linux-gnu/include -Wall -O3 -g -ffast-math -funroll-loops -fPIC -U_FORTIFY_SOURCE -fno-stack-protector -fno-lifetime-dse -c build/Enzo/enzo_EnzoMethodGrackle.cpp -o build/Enzo/enzo_EnzoMethodGrackle.o returned error code 1
charmc exiting...
scons: *** [build/Enzo/enzo_EnzoMethodGrackle.o] Error 1
FAIL
done
END   Enzo-P/Cello ./build.sh: arch = linux_gnu  prec = double  target = bin/enzo-p time = .15 min
make: *** [Makefile:5: bin/enzo-p] Error 1

Am I missing something?
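
Speculative note for anyone hitting the same thing: "does not have any field named ..." errors like the ones above typically appear when class members are declared behind a build-time guard (assumed here to be a macro such as CONFIG_USE_GRACKLE) while the constructor's initializer list is compiled unconditionally. A minimal, self-contained illustration of that pattern, not the actual EnzoMethodGrackle code:

// Hypothetical sketch: the initializer list must be guarded the same way
// as the member declarations, otherwise the compiler reports
// "class ... does not have any field named ..." when the macro is undefined.
class ExampleGrackleMethod {
public:
  ExampleGrackleMethod()
#ifdef CONFIG_USE_GRACKLE
    : grackle_units_(0)          // guard matches the member declaration below
#endif
  { }

private:
#ifdef CONFIG_USE_GRACKLE
  int grackle_units_;            // only declared when Grackle support is on
#endif
};

int main() { ExampleGrackleMethod m; (void)m; return 0; }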

HDF5 Error

I was running a large simulation on Frontera (4096 PEs) and the program weirdly crashed while writing an HDF5 file. This error seems to be non-deterministic and rare:

  • I had already run 2 simulations of the same size, each of which wrote about 300 HDF5 files to disk
  • The simulation in which the error arose had already written 130 outputs; plus, after restarting, the same error did not happen again.

I'm not sure how much can be done about this, but I wanted to report it.

Here is the traceback written to stdout:

[1963] Stack Traceback:
[1963:0] enzo-p 0xb679a5 t_()
[1963:1] enzo-p 0xb24594 FileHdf5::file_close()
[1963:2] enzo-p 0xb16bb8 OutputData::close()
[1963:3] enzo-p 0xae73f1 Problem::output_write(Simulation*, int, char*)
[1963:4] enzo-p 0xae727d Problem::output_wait(Simulation*)
[1963:5] enzo-p 0xae7014 Simulation::r_write(CkReductionMsg*)
[1963:6] enzo-p 0xae6ef8 Block::p_output_write(int, int)
[1963:7] enzo-p 0xa42252 CkIndex_Block::_call_p_output_write_marshall13(void*, void*)
[1963:8] enzo-p 0x61b30f CkDeliverMessageReadonly
[1963:9] enzo-p 0x64c844 CkLocRec::invokeEntry(CkMigratable*, void*, int, bool)
[1963:10] enzo-p 0x6b4ed4 CkIndex_CkArray::_call_recvBroadcast_CkMessage(void*, void*)
[1963:11] enzo-p 0x620a1b
[1963:12] enzo-p 0x61f7ec _processHandler(void*, CkCoreState*)
[1963:13] enzo-p 0x7ec0e4 CsdScheduler
[1963:14] enzo-p 0x7e9265 ConverseInit
[1963:15] enzo-p 0x6111e2 charm_main
[1963:16] libc.so.6 0x2ad9b62de555 __libc_start_main
[1963:17] enzo-p 0x56c7f9
[1963] Stack Traceback:
[1963:0] libhdf5.so.103 0x2ad9b14bdc8b H5F__close_cb
[1963:1] libhdf5.so.103 0x2ad9b1545a9e
[1963:2] libhdf5.so.103 0x2ad9b16241d0 H5SL_try_free_safe
[1963:3] libhdf5.so.103 0x2ad9b1545999 H5I_clear_type
[1963:4] libhdf5.so.103 0x2ad9b14acd9e H5F_term_package
[1963:5] libhdf5.so.103 0x2ad9b13f499a H5_term_library
[1963:6] libc.so.6 0x2ad9b62f5ce9
[1963:7] libc.so.6 0x2ad9b62f5d37
[1963:8] libmpi.so.12 0x2ad9b599a406 MPL_exit
[1963:9] libmpi.so.12 0x2ad9b5518229
[1963:10] libmpi.so.12 0x2ad9b504d153
[1963:11] libmpi.so.12 0x2ad9b4f582fd MPI_Abort
[1963:12] enzo-p 0x7e8457 CmiAbort
[1963:13] enzo-p 0xb679a5 t_()
[1963:14] enzo-p 0xb24594 FileHdf5::file_close()
[1963:15] enzo-p 0xb16bb8 OutputData::close()
[1963:16] enzo-p 0xae73f1 Problem::output_write(Simulation*, int, char*)
[1963:17] enzo-p 0xae727d Problem::output_wait(Simulation*)
[1963:18] enzo-p 0xae7014 Simulation::r_write(CkReductionMsg*)
[1963:19] enzo-p 0xae6ef8 Block::p_output_write(int, int)
[1963:20] enzo-p 0xa42252 CkIndex_Block::_call_p_output_write_marshall13(void*, void*)
[1963:21] enzo-p 0x61b30f CkDeliverMessageReadonly
[1963:22] enzo-p 0x64c844 CkLocRec::invokeEntry(CkMigratable*, void*, int, bool)
[1963:23] enzo-p 0x6b4ed4 CkIndex_CkArray::_call_recvBroadcast_CkMessage(void*, void*)
[1963:24] enzo-p 0x620a1b
[1963:25] enzo-p 0x61f7ec _processHandler(void*, CkCoreState*)
[1963:26] enzo-p 0x7ec0e4 CsdScheduler
[1963:27] enzo-p 0x7e9265 ConverseInit
[1963:28] enzo-p 0x6111e2 charm_main
[1963:29] libc.so.6 0x2ad9b62de555 __libc_start_main
[1963:30] enzo-p 0x56c7f9
[1595807695.210587] [c111-094:179807:0] ib_md.c:1063 UCX WARN IB: ibv_fork_init() was disabled or failed, yet a fork() has been issued.
[1595807695.210601] [c111-094:179807:0] ib_md.c:1064 UCX WARN IB: data corruption might occur when using registered memory.

Here is the error message written to stderr:

HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 0:
#​000: H5C.c line 5457 in H5C_flush_invalidate_ring(): dirty entry flush destroy failed
major: Object cache
minor: Unable to flush data from cache
#​001: H5C.c line 6133 in H5C__flush_single_entry(): Can't write image to file
major: Object cache
minor: Unable to flush data from cache
#​002: H5Fio.c line 165 in H5F_block_write(): write through page buffer failed
major: Low-level I/O
minor: Write failed
#​003: H5PB.c line 1028 in H5PB_write(): write through metadata accumulator failed
major: Page Buffering
minor: Write failed
#​004: H5Faccum.c line 645 in H5F__accum_write(): file write failed
major: Low-level I/O
minor: Write failed
#​005: H5FDint.c line 258 in H5FD_write(): driver write request failed
major: Virtual File Layer
minor: Write failed
#006: H5FDsec2.c line 811 in H5FD_sec2_write(): file write failed: time = Sun Jul 26 18:54:00 2020, filename = './cloud_08.2500/cloud-data-1963.h5', file descriptor = 49, errno = 5, error message = 'Input/output error', buf = 0x1ac7ab8, total write size = 1224, bytes this sub-write = 1224, bytes actually written = 18446744073709551615, offset = 73417080
major: Low-level I/O
minor: Write failed
#​007: H5Fint.c line 1130 in H5F__dest(): unable to flush cached data (phase 2)
major: File accessibilty
minor: Unable to flush data from cache
#​008: H5Fint.c line 1912 in H5F__flush_phase2(): unable to flush metadata accumulator
major: Low-level I/O
minor: Unable to flush data from cache
#​009: H5Faccum.c line 1033 in H5F__accum_flush(): file write failed
major: Low-level I/O
minor: Write failed
#​010: H5FDint.c line 258 in H5FD_write(): driver write request failed
major: Virtual File Layer
minor: Write failed
#011: H5FDsec2.c line 811 in H5FD_sec2_write(): file write failed: time = Sun Jul 26 18:53:04 2020, filename = './cloud_08.2500/cloud-data-1963.h5', file descriptor = 49, errno = 5, error message = 'Input/output error', buf = 0x1ac7ab8, total write size = 1224, bytes this sub-write = 1224, bytes actually written = 18446744073709551615, offset = 73417080
major: Low-level I/O
minor: Write failed
#​012: H5Fint.c line 1901 in H5F__flush_phase2(): unable to flush metadata cache
major: Object cache
minor: Unable to flush data from cache
#​013: H5AC.c line 732 in H5AC_flush(): Can't flush cache
major: Object cache
minor: Unable to flush data from cache
#​014: H5C.c line 1150 in H5C_flush_cache(): flush ring failed
major: Object cache
minor: Unable to flush data from cache
#​015: H5C.c line 5872 in H5C__flush_ring(): Can't flush entry
major: Object cache
minor: Unable to flush data from cache
#​016: H5C.c line 6133 in H5C__flush_single_entry(): Can't write image to file
major: Object cache
minor: Unable to flush data from cache
#​017: H5Fio.c line 165 in H5F_block_write(): write through page buffer failed
major: Low-level I/O
minor: Write failed
#​018: H5PB.c line 1028 in H5PB_write(): write through metadata accumulator failed
major: Page Buffering
minor: Write failed
#​019: H5Faccum.c line 645 in H5F__accum_write(): file write failed
major: Low-level I/O
minor: Write failed
#​020: H5FDint.c line 258 in H5FD_write(): driver write request failed
major: Virtual File Layer
minor: Write failed
#021: H5FDsec2.c line 811 in H5FD_sec2_write(): file write failed: time = Sun Jul 26 18:52:09 2020, filename = './cloud_08.2500/cloud-data-1963.h5', file descriptor = 49, errno = 5, error message = 'Input/output error', buf = 0x1ac7ab8, total write size = 1224, bytes this sub-write = 1224, bytes actually written = 18446744073709551615, offset = 73417080
major: Low-level I/O
minor: Write failed
#​022: H5Fint.c line 1881 in H5F__flush_phase2(): unable to flush metadata cache
major: Object cache
minor: Unable to flush data from cache
#​023: H5AC.c line 732 in H5AC_flush(): Can't flush cache
major: Object cache
minor: Unable to flush data from cache
#​024: H5C.c line 1150 in H5C_flush_cache(): flush ring failed
major: Object cache
minor: Unable to flush data from cache
#​025: H5C.c line 5872 in H5C__flush_ring(): Can't flush entry
major: Object cache
minor: Unable to flush data from cache
#​026: H5C.c line 6133 in H5C__flush_single_entry(): Can't write image to file
major: Object cache
minor: Unable to flush data from cache
#​027: H5Fio.c line 165 in H5F_block_write(): write through page buffer failed
major: Low-level I/O
minor: Write failed
#​028: H5PB.c line 1028 in H5PB_write(): write through metadata accumulator failed
major: Page Buffering
minor: Write failed
#​029: H5Faccum.c line 645 in H5F__accum_write(): file write failed
major: Low-level I/O
minor: Write failed
#​030: H5FDint.c line 258 in H5FD_write(): driver write request failed
major: Virtual File Layer
minor: Write failed
#031: H5FDsec2.c line 811 in H5FD_sec2_write(): file write failed: time = Sun Jul 26 18:51:14 2020, filename = './cloud_08.2500/cloud-data-1963.h5', file descriptor = 49, errno = 5, error message = 'Input/output error', buf = 0x1ac7ab8, total write size = 1224, bytes this sub-write = 1224, bytes actually written = 18446744073709551615, offset = 73417080
major: Low-level I/O
minor: Write failed
1963 119516.09 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
1963 119516.09 ERROR disk_FileHdf5.cpp:143
1963 119516.09 ERROR FileHdf5::file_close
1963 119516.09 ERROR Return value -1 closing file ./cloud_08.2500/cloud-data-1963.h5
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p(_Z2t_v+0x1b) [0xb6794b]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p(_ZN8FileHdf510file_closeEv+0x1c4) [0xb24594]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p(_ZN10OutputData5closeEv+0x18) [0xb16bb8]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p(_ZN7Problem12output_writeEP10SimulationiPc+0x81) [0xae73f1]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p(_ZN7Problem11output_waitEP10Simulation+0x20d) [0xae727d]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p(_ZN10Simulation7r_writeEP14CkReductionMsg+0x74) [0xae7014]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p(_ZN5Block14p_output_writeEii+0xc8) [0xae6ef8]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p(_ZN13CkIndex_Block31_call_p_output_write_marshall13EPvS0_+0x92) [0xa42252]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p(CkDeliverMessageReadonly+0x3f) [0x61b30f]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p(_ZN8CkLocRec11invokeEntryEP12CkMigratablePvib+0xb4) [0x64c844]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p(_ZN15CkIndex_CkArray29_call_recvBroadcast_CkMessageEPvS0_+0x374) [0x6b4ed4]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p() [0x620a1b]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p(_Z15_processHandlerPvP11CkCoreState+0x112c) [0x61f7ec]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p(CsdScheduler+0x444) [0x7ec0e4]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p(ConverseInit+0xaf5) [0x7e9265]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p(charm_main+0x22) [0x6111e2]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /lib64/libc.so.6(__libc_start_main+0xf5) [0x2ad9b62de555]
1963 119516.12 EXIT :0
1963 119516.12 EXIT /home1/05274/tg845732/enzo-e/bin/enzo-p() [0x56c7f9]

Autospecify required fields for MHD solver

Now that PR #49 has been merged in, the MHD solver needs to be updated so that it can auto-specify its required fields. This is going to look a little different from the other methods since the MHD solver requires face-centered fields.

Hopefully I'll be able to address this in the next week (I don't think it will take much time at all since we can reuse some existing machinery).

Specify new file-sets on Restart

It would be extremely useful if we could tell Enzo-E to write new output file-sets on a restart (both for debugging purposes and for science purposes).

I made a lot of progress towards introducing this feature in PR #88, but ultimately encountered some weird bugs when using a version of charm++ built with MPI mode (note: it's possible that the parameter file I was using was written incorrectly). I closed that PR because it wasn't an immediate priority for me anymore, and I'm not quite sure whether PR #97 will require this to be overhauled.

Representation of particle mass versus density

I'm opening up an issue to have a discussion on the particle masses. This was brought up in #49. The particles in the group has_mass will be included in the gravity solve. Currently, the mass field is defined to be density. This is how Enzo handled the particle masses to ease their mass deposition into grids for the gravity solve.

Do we keep this convention from Enzo? Or do we rename the field to particle_mass_density or the like? Then we could have some derived field that contains their masses. I could be convinced either way.

SCons Warnings

This is a fairly well-known issue, but I don't think an issue has been created to document it. Whenever a build is initiated, we get the following warnings:

    CELLO_ARCH scons arch= linux_gnu
    CELLO_PREC scons prec= double


scons: warning: Calling missing SConscript without error is deprecated.
Transition by adding must_exist=0 to SConscript calls.
Missing SConscript '/home/mabruzzo/local/SConscript'
File "/home/mabruzzo/enzo-e/build-main/External/SConscript", line 8, in <module>

scons: warning: Two different environments were specified for target main_enzo.o,
	but they appear to have the same action: $CXX -o $TARGET -c $CXXFLAGS $CCFLAGS $_CCCOMCOM $SOURCES

I'm most concerned about the warning that states: "Two different environments ...". This has been around since I was a new user and it definitely tripped me up a bunch when I was getting started. If at all possible, it would be nice to resolve this warning.

The other warning is much newer; I think it only started to appear after PR #46 was merged into master. I'm a little doubtful that anything can be done about this one (or whether anything should be done).

Saving Files Based on Scheduled Simulation times

I've noticed that there's a bug with saving output when specifying an output schedule that depends on the simulation time. Regardless of whether the schedule is based on some interval scheme or some predetermined list, no outputs get saved unless a specified output time exactly matches one of the times that the simulation naturally reaches.

For example, if we indicate that output should be saved when the simulation time is 0.5, but the closest the simulation time gets is 0.501, then no output will be saved. (Note: if we specify that output should be saved at the initial time or the stopping time, then output is saved.) A modified version of the “Hello World” (the Hi.in variation) input file is provided here to illustrate the problem.

The reference guide seems to suggest that the simulation's time step may be reduced so that the output is saved at exactly the specified time, but as far as I can tell, this never happens. (The only case where anything like this happens is when a Stopping time is specified.)
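
For reference, the behaviour the reference guide seems to describe would amount to clamping the time step so the simulation lands exactly on the next scheduled output time. A minimal sketch of that idea (my own illustration, not Cello's actual scheduling code):

#include <algorithm>
#include <cstdio>

// Sketch only: shorten dt so the next step lands exactly on a scheduled
// output time instead of stepping past it.
double clamp_dt_to_schedule(double t, double dt, double t_next_output) {
  return (t_next_output > t) ? std::min(dt, t_next_output - t) : dt;
}

int main() {
  // With a scheduled output at t = 0.5, a step that would reach 0.501 is
  // shortened so the output time is hit exactly.
  double t = 0.496, dt = 0.005, t_out = 0.5;
  std::printf("dt = %g\n", clamp_dt_to_schedule(t, dt, t_out));  // prints 0.004
  return 0;
}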

Enforce Ghost Depth Constraints on Block Size

Currently, Cello/Enzo-E does not enforce any constraints on the block size based on the ghost depth. This can cause the simulation to silently break. An explicit check should be added to cause the simulation to fail at startup with an informative error message.

This should be easy to address and would be a great first issue!

In case this is somebody's first issue, I wanted to briefly outline how I would approach this, but feel free to do something different; a rough sketch follows the notes below. The easiest way to accomplish this is to add ~2 ASSERT or ERROR statements to Config::read_mesh_ in src/Cello/parameters_Config.cpp. Here are a couple of notes:

  • The dimensions of the active zone on each block need to be computed. They are given by mesh_root_size[i]/mesh_root_blocks[i]. Note that i=0, i=1, and i=2 correspond to the x, y, and z axes, respectively.
  • When AMR is being used (mesh_max_level > 0), you need to confirm that the dimensions of the active zone on each block are at least double the size of the ghost depth.
  • For unigrid simulations (mesh_max_level = 0), the dimensions of the active zone on each block must be at least as large as the ghost depth.
  • Some care needs to be taken to properly handle 1D and 2D simulations (the dimensionality is stored in mesh_root_rank). In these cases, the number of cells along the extra dimensions is set to 1.

There are a couple of optional, additional checks that might also be nice to have. These are not necessary (other error handling will catch these problems eventually), but it might be nice to provide more explicit error messages for new users.

  • the dimensions of the active zone on each block should each be even when AMR is being used (they can be odd for unigrid simulations). Again, care needs to be taken for handling 1D/2D simulations
  • The parameter "Adapt:min_level" (stored in the mesh_min_level variable) should be less than or equal to zero.

Passing particles from parent to child during ICs

Currently, there is a bug that prevents passing particles from parent blocks (i.e. the root grid) to child blocks during initialization. This effectively rules out ICs with both a large number of particles and a large number of highly refined blocks but only a reasonable number of root-grid blocks. Although reading these particles into each child block works, this can take an incredibly long time for large particle lists / block counts.

More generally there may be a need to do something creative to read in VERY large particle ICs for VERY large root-grids, but that is longer term and solutions may be problem-type specific.

`MethodFluxCorrect` problems with Required-Field machinery

Currently, MethodFluxCorrect registers the fields in the "conserved" group (or other user-specified group) for refresh when it's initially constructed. I'm pretty confident that this will produce problems if fields are added to the "conserved" group after MethodFluxCorrect has been constructed.

For example, the user might specify the following set of Methods:
list = ["ppm", "flux_correct", "grackle"];
and rely upon the required-field machinery in EnzoMethodGrackle to automatically initialize the necessary color fields.

  • Currently, the EnzoMethodGrackle doesn't add the "color" fields to the "conserved" group (this should be addressed).
  • Even if EnzoMethodGrackle were to add the new "color" fields to the "conserved" field-group, MethodFluxCorrect's refresh objects won't be updated to reflect this change.

Since this may not be easy to fix, a reasonable short-term solution might be to raise an error in MethodFluxCorrect::compute if the number of fields in the "conserved" group has changed; a toy illustration of that check is sketched below.
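
A toy, self-contained illustration of that short-term check (hypothetical class and members, not the real MethodFluxCorrect; it just assumes the size of the "conserved" group is cached at construction time and re-queried in compute):

#include <stdexcept>

// Hypothetical sketch: error out if the "conserved" group grew after
// construction, since the cached refresh list would then be stale.
class FluxCorrectGuardExample {
public:
  explicit FluxCorrectGuardExample(int n_conserved_at_construction)
    : n_conserved_(n_conserved_at_construction) {}

  void compute(int n_conserved_now) const {
    if (n_conserved_now != n_conserved_) {
      throw std::runtime_error(
        "fields were added to the \"conserved\" group after "
        "MethodFluxCorrect was constructed; its refresh list is stale");
    }
    // ... flux correction would proceed here ...
  }

private:
  int n_conserved_;  // size of the "conserved" group when constructed
};

int main() {
  FluxCorrectGuardExample guard(5);
  guard.compute(5);   // OK: group unchanged
  // guard.compute(7) would throw, mimicking the proposed ERROR call
  return 0;
}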

Enzo-P crashes on 64 nodes of Frontera when run on top of MPICH/OpenMPI (built over UCX)

I ran into this issue while attempting to run Enzo-P/Cello on Frontera on top of the MPI machine layer (mpi-linux-x86_64 charm builds), built over MPICH (itself built over UCX) and over OpenMPI (also built over UCX).

The output showed the following stack trace:

0 00038.05 Solver dd_root  iter 0000  err 0 [0 0]
[c188-184:204986:0:204986] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x21)
==== backtrace (tid: 204986) ====
 0 0x0000000000036340 killpg()  ???:0
 1 0x00000000007524b1 std::__shared_ptr_access<Matrix, (__gnu_cxx::_Lock_policy)2, false, false>::_M_get()  /opt/apps/gcc/9.1.0/include/c++/9.1.0/bits/shared_ptr_base.h:1021
 2 0x0000000000755242 EnzoSolverMg0::begin_solve()  /scratch1/03808/nbhat4/enzo-e/build/Enzo/enzo_EnzoSolverMg0.cpp:501
 3 0x0000000000755363 EnzoBlock::r_solver_mg0_begin_solve()  /scratch1/03808/nbhat4/enzo-e/build/Enzo/enzo_EnzoSolverMg0.cpp:481
 4 0x0000000000755363 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct_aux<char const*>()  /opt/apps/gcc/9.1.0/include/c++/9.1.0/bits/basic_string.h:247
 5 0x0000000000755363 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>()  /opt/apps/gcc/9.1.0/include/c++/9.1.0/bits/basic_string.h:266
 6 0x0000000000755363 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string()  /opt/apps/gcc/9.1.0/include/c++/9.1.0/bits/basic_string.h:527
 7 0x0000000000755363 EnzoBlock::r_solver_mg0_begin_solve()  /scratch1/03808/nbhat4/enzo-e/build/Enzo/enzo_EnzoSolverMg0.cpp:483
 8 0x00000000005880ba CkDeliverMessageReadonly()  ???:0
 9 0x00000000005b08d5 CkLocRec::invokeEntry()  ???:0
10 0x00000000005d674b CkArrayBroadcaster::deliver()  ???:0
11 0x00000000005e5976 CkArray::recvBroadcast()  ???:0
12 0x00000000005896a0 CkDeliverMessageFree()  ???:0
13 0x0000000000590e66 _processHandler()  ???:0
14 0x0000000000669ed4 CsdScheduleForever()  ???:0
15 0x000000000066a175 CsdScheduler()  ???:0
16 0x0000000000668e7a ConverseInit()  ???:0
17 0x000000000057eb1c charm_main()  ???:0
18 0x0000000000022495 __libc_start_main()  ???:0
19 0x000000000053c17c _start()  ???:0
=================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 204986 RUNNING AT c188-184
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

The following are the detailed output logs from the 64 node runs over the MPI machine layer on top of MPICH and OpenMPI respectively.

enzo-output-64nodes-mpich-672138.out.txt

enzo-output-64nodes-ompi-672140.out.txt

Interestingly, running over Intel MPI didn't cause this failure.

Conservation of mass and energy

In the process of testing the DD branch I ran the test problem (Hi.in) following the "Getting Started" instructions (on a "standard" linux system with GNU compilers and CELLO_PREC=double).

The output I get includes information like

0 00009.18 Method Field density sum 1.4601588822392469e+03 conserved to 4.74825 digits of 16
 pass  0/4 build/Cello/problem_MethodFluxCorrect.cpp 205  MethodFluxCorrect precision (density)
0 00009.18 Method Field internal_energy sum 0.0000000000000000e+00 conserved to inf digits of 16
0 00009.18 Method Field total_energy sum 4.3561676005276215e+03 conserved to 4.49216 digits of 16
0 00009.18 Method Field velocity_x sum -6.8978075245716434e-03 conserved to -3.34049 digits of 16
0 00009.18 Method Field velocity_y sum 1.0612441232024589e-03 conserved to -13.1133 digits of 16

If I read this correctly, mass and total energy are only conserved to fewer than 5 digits after just a few timesteps.
Is the output incorrect, is the code not conserving mass and energy to what I'd expect for double precision calculations, or am I missing something else?
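
(For reference, the "conserved to N digits" figure presumably measures the relative drift of the field sum S over the run, roughly N ≈ -log10( |S(t) - S(t0)| / |S(t0)| ); if that reading is right, it would also explain why fields whose sum is close to zero, such as velocity_x, can report negative digits.)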

IsolatedGalaxy: Total loss of disk mass

When I was testing #49 I had to adjust several of the input parameters to produce the correct setup. However, the disk immediately loses its mass (density = density_floor) after the first cycle.

Cycle 0

ISOLATED_GALAXY_000_Slice_x_internal_energy
ISOLATED_GALAXY_000_Slice_x_pressure
ISOLATED_GALAXY_000_Slice_x_density

Cycle 1

ISOLATED_GALAXY_001_Slice_x_internal_energy
ISOLATED_GALAXY_001_Slice_x_pressure
ISOLATED_GALAXY_001_Slice_x_density

Conservation of mass and momentum in Shu Collapse problem

centre_collapse_initial_density
centre_collapse_final_density
mass_momentum_conservation_centre_collapse
off_centre_collapse_initial_density
off_centre_collapse_final_density
mass_momentum_conservation_off_centre_collapse
shu_collapse_test.tar.gz

I ran the Shu Collapse problem (unstable isothermal sphere) with periodic boundary conditions, once with the centre of collapse at the centre of the domain, at an intersection between blocks (which I call the centre-collapse problem), and once with the centre of collapse at the centre of a block, offset from the centre of the domain (which I call the off-centre-collapse problem). The attached images show the density field in the initial conditions and after the problem has run for 1000 cycles, for each problem. The cross shape (I am fairly sure) is due to the periodic boundary conditions.

I have attached a couple of plots showing the mass and momentum conservation for the two cases. In the off-centre-collapse problem, mass is conserved exactly, which is not the case for the centre-collapse problem. Momentum is not conserved in either problem, but the centre-collapse problem shows a much larger change in the total momentum.

While we may expect some numerical errors which mean mass and momentum are not perfectly conserved, it is curious that there is a difference due to the translation of the system.

To reproduce this, you will need to check out the smartstars branch from my fork here: https://github.com/stefanarridge/enzo-e/, and compile with CELLO_ARCH=linux_gnu and CELLO_PREC=double. The attached tarball contains all the necessary files for running the problem and making the figures... you will simply need to edit the environment variables in run.sh and run it.

Let me know if you need some more clarification or help with reproducing these results.

Data corruption after restart

It appears that data dumps generated after a checkpoint restart are corrupted in some fashion, or at least corrupted in a way that prevents yt from opening the output. I was able to re-create this problem with the enzo-e master branch using the HelloWorld.in example (input pasted here).

In some cases non-utf8 characters popped up in the resulting .parameter and .libconfig files, but there seem to be other issues with the data output itself that I am not sure about. For the input example above, here is the yt error output when trying to interact with the data set (yt.load returns without error).

Below are the commands I used to run and then restart the HelloWorld example:

./charmrun +p2 ./enzo-e HelloWorld.in

and

./charmrun +p2 ./enzo-e +restart checkpoint-0000

Scons TypeError

While building enzo-e on stampede2, I encountered a python error related to scons at the end of the build process. Everything proceeds as normal until the very end: right after the newly-built executable is copied to bin/enzo-p, the traceback for a python TypeError is output. The build script then indicates that the installation succeeded anyway.

This occurs for both choices of compilers. As far as I can tell, this doesn't actually cause any issues and the executable works as it is supposed to. However, I wanted to document this issue.

An excerpt of the output illustrating this issue is provided below. The full log from the same build process can be found here.

Install file: "build-stampede_build_update/Cello/libcello.a" as "lib/libcello.a"
/home1/05274/tg845732/charm/bin/charmc -language charm++ -o build-stampede_build_update/Enzo/enzo-p -Ofast -g -Wall -rdynamic -module CommonLBs build-stampede_build_update/Enzo/enzo-p.o build-stampede_build_update/Cello/main_enzo.o -Llib -L/opt/apps/intel18/hdf5/1.10.4/x86_64/lib -L/opt/apps/intel18/boost/1.68/lib -L/home1/05274/tg845732/local/lib -L/usr/lib64/lib -lenzo -lcharm -lsimulation -ldata -lproblem -lcompute -lcontrol -lmesh -lio -ldisk -lmemory -lparameters -lerror -lmonitor -lparallel -lperformance -ltest -lcello -lexternal -lhdf5 -lz -ldl -lpng -lirc -limf -lifcore -lifport -lstdc++ -lintlc -lsvml -lboost_filesystem -lboost_system -lgrackle
Install file: "build-stampede_build_update/Enzo/enzo-p" as "bin/enzo-p"
TypeError: signal handler must be signal.SIG_IGN, signal.SIG_DFL, or a callable object:
File "/home1/05274/tg845732/enzo-e/scons-local-2.2.0/SCons/Script/Main.py", line 1343:
_exec_main(parser, values)
File "/home1/05274/tg845732/enzo-e/scons-local-2.2.0/SCons/Script/Main.py", line 1307:
_main(parser)
File "/home1/05274/tg845732/enzo-e/scons-local-2.2.0/SCons/Script/Main.py", line 1071:
nodes = _build_targets(fs, options, targets, target_top)
File "/home1/05274/tg845732/enzo-e/scons-local-2.2.0/SCons/Script/Main.py", line 1265:
jobs.run(postfunc = jobs_postfunc)
File "/home1/05274/tg845732/enzo-e/scons-local-2.2.0/SCons/Job.py", line 114:
self._reset_sig_handler()
File "/home1/05274/tg845732/enzo-e/scons-local-2.2.0/SCons/Job.py", line 160:
signal.signal(signal.SIGTERM, self.old_sigterm)
Success
done
END Enzo-P/Cello ./build.sh: arch = stampede_intel prec = double target = bin/enzo-p time = 2.83 min
Done.

Further test infrastructure improvements required

Just a quick note that we still need to improve the test infrastructure somewhat.

  • Improved documentation (there is some but could be improved)
  • The tests still spit out erroneous errors when moving files (*.png, *.hdf5) around. This just needs to be cleaned up.

Refreshing (and Saving) face-centered fields

There appears to be an issue with refreshing face-centered fields. I believe that the wrong iteration limits are being used for the refresh. To help illustrate which iteration limits I think should be used for a field centered on the x-faces, I have included these two images:

UPDATE_RIGHT_GHOST

UPDATE_LEFT_GHOST

In the following table, I summarize what the iteration limits should be for an x-face centered field.
The limits assume that nx represents the number of values of the field in the active zone.

                         Ghost Zone Limits                  Active Zone Limits
Update Left Ghost Zone   istart = 0,      istop = gx        istart = nx-1,   istop = nx+gx-1
Update Right Ghost Zone  istart = nx+gx,  istop = nx+2*gx   istart = gx+1,   istop = 2*gx+1

For comparison, I have included the iteration limits for a cell-centered field (in this table, nx is smaller by one):

                         Ghost Zone Limits                  Active Zone Limits
Update Left Ghost Zone   istart = 0,      istop = gx        istart = nx,     istop = nx+gx
Update Right Ghost Zone  istart = nx+gx,  istop = nx+2*gx   istart = gx,     istop = 2*gx
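
To make the tables concrete, here is a small self-contained snippet (my own illustration, not Cello's refresh code) that just prints the limit pairs from the tables for a given nx and gx, treating istop as exclusive:

#include <cstdio>

// Prints the refresh iteration limits from the tables above.
// cell_centered == false corresponds to an x-face centered field.
struct Limits { int istart, istop; };

void print_limits(int nx, int gx, bool cell_centered) {
  const int shift = cell_centered ? 0 : 1;  // face-centered limits shift by 1
  Limits left_ghost   = {0, gx};
  Limits left_active  = {nx - shift, nx + gx - shift};
  Limits right_ghost  = {nx + gx, nx + 2 * gx};
  Limits right_active = {gx + shift, 2 * gx + shift};
  std::printf("%s field (nx=%d, gx=%d):\n",
              cell_centered ? "cell-centered" : "x-face centered", nx, gx);
  std::printf("  update left ghost : ghost [%d,%d)  <- active [%d,%d)\n",
              left_ghost.istart, left_ghost.istop,
              left_active.istart, left_active.istop);
  std::printf("  update right ghost: ghost [%d,%d)  <- active [%d,%d)\n",
              right_ghost.istart, right_ghost.istop,
              right_active.istart, right_active.istop);
}

int main() {
  print_limits(/*nx=*/5, /*gx=*/2, /*cell_centered=*/false);
  print_limits(/*nx=*/4, /*gx=*/2, /*cell_centered=*/true);
  return 0;
}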

This issue can be replicated with a simplified example using the input here. Unfortunately, the initialization of face-centered fields with functions is fairly buggy. As a result, I can't initialize the field with a periodic function (this example is not particularly helpful for understanding the actual issue).

Before running the example, an issue with saving the face-centered fields needs to be addressed. Currently, in the master branch, face-centered arrays get clipped when they are saved to HDF5 files. This arises because the method IoFieldData::field_array, which retrieves the data and its iteration limits for copying the data to the HDF5 file, determines the iteration limits by calling FieldData::size and never checks the field's centering (effectively, all fields are assumed to be cell-centered). I have supplied a bug fix here.

After applying the above fix, the example should work properly. The example initializes the following field centered on the x-faces:

[[  0.   0. -18. -17. -16. -15. -14. -13. -12.]
 [-11. -10.  -9.  -8.  -7.  -6.  -5.  -4.  -3.]
 [ -2.  -1.   0.   1.   2.   3.   4.   5.   6.]
 [  7.   8.   9.  10.  11.  12.  13.  14.  15.]
 [ 16.  17.  18.  19.  20.  21.  22.  23.  24.]
 [ 25.  26.  27.  28.  29.  30.  31.  32.  33.]
 [ 34.  35.  36.  37.  38.  39.  40.  41.  42.]
 [ 43.  44.  45.   0.   0.   0.   0.   0.   0.]]

After refreshing the ghost zone (using a periodic boundary condition) the resulting field is:

[[21. 22. 18. 19. 20. 21. 22. 18. 19.]
 [30. 31. 27. 28. 29. 30. 31. 27. 28.]
 [ 3.  4.  0.  1.  2.  3.  4.  0.  1.]
 [12. 13.  9. 10. 11. 12. 13.  9. 10.]
 [21. 22. 18. 19. 20. 21. 22. 18. 19.]
 [30. 31. 27. 28. 29. 30. 31. 27. 28.]
 [ 3.  4.  0.  1.  2.  3.  4.  0.  1.]
 [12. 13.  9. 10. 11. 12. 13.  9. 10.]]

In reality, the final field should be:

[[20. 21. 18. 19. 20. 21. 22. 19. 20.]
 [29. 30. 27. 28. 29. 30. 31. 28. 29.]
 [ 2.  3.  0.  1.  2.  3.  4.  1.  2.]
 [11. 12.  9. 10. 11. 12. 13. 10. 11.]
 [20. 21. 18. 19. 20. 21. 22. 19. 20.]
 [29. 30. 27. 28. 29. 30. 31. 28. 29.]
 [ 2.  3.  0.  1.  2.  3.  4.  1.  2.]
 [11. 12.  9. 10. 11. 12. 13. 10. 11.]]

Issue with simulation timing after restart

There appears to be an issue with the global simulation timing following checkpoint-restarts. Basically, the number becomes huge after the restart. This occurred using an older version of the code, so it may no longer be relevant.

I know @jobordner has been working on the performance counters and he may have already addressed this (or were you just addressing memory allocation tracking?).

Anyway, here is the output just before the restart. I have put the disagreeing values in bold:

Checkpoint to disk finished in 21.848492s, sending out the cb...
creating symlink ./ckpt-09.7500 -> Checkpoint
0 172381.53 Output writing data file ./cloud_09.7500/cloud-data-0000.h5
0 172385.58 -------------------------------------
0 172385.58 Simulation cycle 30330
0 172385.58 Simulation time-sim 9.750000000000e+00
0 172385.58 Simulation dt 2.756530990524e-04
0 172389.88 Performance counter num-msg-coarsen 0
0 172389.88 Performance counter num-msg-refine 4095
0 172389.88 Performance counter num-msg-refresh 0
0 172389.88 Performance counter num-data-msg 0
0 172389.88 Performance counter num-field-face 0
0 172389.88 Performance counter num-particle-data 4096
0 172389.88 Performance simulation num-particles total 0
0 172389.88 performance simulation num-blocks-0 4096
0 172389.88 Performance simulation num-leaf-blocks 4096
0 172389.88 Performance simulation num-total-blocks 4096
0 172389.88 Performance simulation time-usec 706082880935855
0 172389.88 Performance cycle time-usec 705983702632898
0 172389.88 Performance cycle bytes-curr 1121654867955
0 172389.88 Performance cycle bytes-high 4628707403161
0 172389.88 Performance cycle bytes-highest 4628707403161
0 172389.88 Performance cycle bytes-available 0
0 172389.88 Performance initial time-usec -1171434555
0 172389.88 Performance adapt_apply time-usec 20479153
0 172389.88 Performance adapt_apply_sync time-usec 3082800
0 172389.88 Performance adapt_update time-usec 16036783688
0 172389.88 Performance adapt_update_sync time-usec 114510906154
0 172389.88 Performance adapt_notify time-usec 2729825452
0 172389.88 Performance adapt_notify_sync time-usec 969340540
0 172389.88 Performance adapt_end time-usec 65885677
0 172389.88 Performance adapt_end_sync time-usec 71455292124
0 172389.88 Performance refresh_store time-usec 3777640034769
0 172389.88 Performance refresh_child time-usec 0
0 172389.88 Performance refresh_exit time-usec 1570735025053
0 172389.88 Performance refresh_store_sync time-usec 1041918012919
0 172389.88 Performance refresh_child_sync time-usec 0
0 172389.88 Performance refresh_exit_sync time-usec 11940266161476
0 172389.88 Performance control time-usec 276013661
0 172389.88 Performance compute time-usec 599363082978407
0 172389.88 Performance output time-usec 5429074918201
0 172389.88 Performance stopping time-usec 7360126856353
0 172389.88 Performance block time-usec 1104304619
0 172389.88 Performance exit time-usec 0
0 172389.88 Performance grackle time-usec 108939071776165

Here is the very first logging immediately after the restart:

0 00025.48 -------------------------------------
0 00025.48 Simulation cycle 30330
0 00025.48 Simulation time-sim 9.750000000000e+00
0 00025.48 Simulation dt 2.756530990524e-04
0 00028.99 Performance counter num-msg-coarsen 0
0 00028.99 Performance counter num-msg-refine 262080
0 00028.99 Performance counter num-msg-refresh 0
0 00028.99 Performance counter num-data-msg 0
0 00028.99 Performance counter num-field-face 0
0 00028.99 Performance counter num-particle-data 65
0 00028.99 Performance simulation num-particles total 0
0 00028.99 performance simulation num-blocks-0 4096
0 00028.99 Performance simulation num-leaf-blocks 4096
0 00028.99 Performance simulation num-total-blocks 4096
0 00028.99 Performance simulation time-usec 6528550902331704639
0 00028.99 Performance cycle time-usec 6528550803153401813
0 00028.99 Performance cycle bytes-curr 1115892465817
0 00028.99 Performance cycle bytes-high 1115892465817
0 00028.99 Performance cycle bytes-highest 1115892465817
0 00028.99 Performance cycle bytes-available 0
0 00028.99 Performance initial time-usec -1171434555
0 00028.99 Performance adapt_apply time-usec 20479153
0 00028.99 Performance adapt_apply_sync time-usec 3082800
0 00028.99 Performance adapt_update time-usec 16036783688
0 00028.99 Performance adapt_update_sync time-usec 114510906154
0 00028.99 Performance adapt_notify time-usec 2729825452
0 00028.99 Performance adapt_notify_sync time-usec 969340540
0 00028.99 Performance adapt_end time-usec 65885677
0 00028.99 Performance adapt_end_sync time-usec 71455292124
0 00028.99 Performance refresh_store time-usec 3777640034769
0 00028.99 Performance refresh_child time-usec 0
0 00028.99 Performance refresh_exit time-usec 1570735025053
0 00028.99 Performance refresh_store_sync time-usec 1041918012919
0 00028.99 Performance refresh_child_sync time-usec 0
0 00028.99 Performance refresh_exit_sync time-usec 11940266161476
0 00028.99 Performance control time-usec 276013661
0 00028.99 Performance compute time-usec 599363082978407
0 00028.99 Performance output time-usec 5427301748834
0 00028.99 Performance stopping time-usec 7360126856353
0 00028.99 Performance block time-usec 1105656497
0 00028.99 Performance exit time-usec 0
0 00028.99 Performance grackle time-usec 108939071776165

Unfortunately, I don't currently have a minimal example to reproduce this. I do have some generalized machinery in mind to help streamline the testing of checkpoint-restart (to address Issue #72); when I write that up, it may be helpful for reproducing this problem.

Gravity time step.

The gravity time step condition is calculated in Lines 244-287 of src/Enzo/enzo_EnzoMethodGravity.cpp:

if (cosmology) {
    const int rank = cello::rank();
    enzo_float cosmo_a = 1.0;
    enzo_float cosmo_dadt = 0.0;
    double dt   = block->dt();
    double time = block->time();
    cosmology-> compute_expansion_factor (&cosmo_a,&cosmo_dadt,time+0.5*dt);
    if (rank >= 1) hx*=cosmo_a;
    if (rank >= 2) hy*=cosmo_a;
    if (rank >= 3) hz*=cosmo_a;
  }

  if (ax) {
    for (int iz=gz; iz<mz-gz; iz++) {
      for (int iy=gy; iy<my-gy; iy++) {
	for (int ix=gx; ix<mx-gx; ix++) {
	  int i=ix + mx*(iy + iz*my);
	  dt = std::min(enzo_float(dt),enzo_float(sqrt(hx/(fabs(ax[i]+1e-20)))));
	}
      }
    }
  }
  if (ay) {
    for (int iz=gz; iz<mz-gz; iz++) {
      for (int iy=gy; iy<my-gy; iy++) {
	for (int ix=gx; ix<mx-gx; ix++) {
	  int i=ix + mx*(iy + iz*my);
	  dt = std::min(enzo_float(dt),enzo_float(sqrt(hy/(fabs(ay[i]+1e-20)))));
	}
      }
    }
  }
  if (az) {
    for (int iz=gz; iz<mz-gz; iz++) {
      for (int iy=gy; iy<my-gy; iy++) {
	for (int ix=gx; ix<mx-gx; ix++) {
	  int i=ix + mx*(iy + iz*my);
	  dt = std::min(enzo_float(dt),enzo_float(sqrt(hz/(fabs(az[i]+1e-20)))));
	}
      }
    }
  }

  return 0.5*dt;
}

There are a few issues with this:

  • The cell widths are multiplied by 'cosmo_a' to convert from comoving to proper coordinates. But there is no equivalent done for the accelerations. It is my understanding that when cosmology is turned on, all field values are comoving quantities, but I may be wrong about this. In general I think the treatment of units and comoving vs physical quantities needs to be clarified and perhaps needs a separate github issue.
  • On the first cycle, the acceleration field is zero everywhere, which means that dt will be sqrt(hx)*1e10, independent of the units used. This means that the gravity time step on the first cycle will be very large, and not physically motivated. In a dark matter only simulation, this will be the time step used, unless the maximum time step is set to something smaller. At the very least, the '1e-20' should be replaced by an input parameter which can be less arbitrary (see the sketch after this list).
  • In general, the timestep depends on the acceleration field from the previous cycle, which is likely not what we want.
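
As a minimal sketch of the parameter-based alternative suggested in the second bullet (illustrative only; accel_floor is a made-up parameter, not an existing Enzo-E one):

#include <algorithm>
#include <cmath>
#include <cstdio>

// Sketch: replace the hard-coded 1e-20 floor on |a| with a configurable
// minimum acceleration, so the first-cycle timestep is bounded by a
// user-chosen value instead of sqrt(h)*1e10.
double gravity_timestep_1d(const double* ax, int n, double hx,
                           double dt_max, double accel_floor) {
  double dt = dt_max;
  for (int i = 0; i < n; i++) {
    const double a = std::max(std::fabs(ax[i]), accel_floor);
    dt = std::min(dt, std::sqrt(hx / a));
  }
  return 0.5 * dt;
}

int main() {
  const double ax[4] = {0.0, 0.0, 0.0, 0.0};   // first cycle: zero acceleration
  // With accel_floor = 1e-20 (the current behaviour) dt blows up; with a
  // problem-appropriate floor it stays bounded.
  std::printf("dt (1e-20 floor): %g\n",
              gravity_timestep_1d(ax, 4, 0.01, 1e300, 1e-20));
  std::printf("dt (1e-4  floor): %g\n",
              gravity_timestep_1d(ax, 4, 0.01, 1e300, 1e-4));
  return 0;
}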

SConstruct complains about not finding VERSION file when CHARM_HOME has arch directory

Since the VERSION file is present only in the charm base directory and not in the architecture directory, setting CHARM_HOME to include the arch directory, like /work/03808/nbhat4/frontera/charm/mpi-linux-x86_64-impi-prod, makes the SConstruct script complain about not finding a VERSION file.

login1.frontera(1245)$ export CHARM_HOME=/work/03808/nbhat4/frontera/charm/mpi-linux-x86_64-impi-prod
login1.frontera(1246)$ make
./build.sh bin/enzo-p
Remove bin/enzo-p
2020-03-09 16:29:51 BEGIN
BEGIN Enzo-P/Cello ./build.sh
arch=frontera_gcc
prec=single
target=bin/enzo-p
2020-03-09 16:29:51 compiling...
    CELLO_ARCH scons arch= frontera_gcc
    CELLO_PREC scons prec= single

cat: /work/03808/nbhat4/frontera/charm/mpi-linux-x86_64-impi-prod/VERSION: No such file or directory
CalledProcessError: Command '['cat', '/work/03808/nbhat4/frontera/charm/mpi-linux-x86_64-impi-prod/VERSION']' returned non-zero exit status 1:
  File "/scratch1/03808/nbhat4/enzo-e/SConstruct", line 559:
    charm_version =  subprocess.check_output (["cat", charm_path + "/VERSION"]).rstrip();
  File "/usr/lib64/python2.7/subprocess.py", line 575:
    raise CalledProcessError(retcode, cmd, output=output)
FAIL
done
END   Enzo-P/Cello ./build.sh: arch = frontera_gcc  prec = single  target = bin/enzo-p time = 0 min
make: *** [bin/enzo-p] Error 1

The current resolution to bypass this error is to copy the VERSION file from the charm base directory into the arch directory or set CHARM_HOME to just the charm base directory. However, it is common to use different builds (and have different arch directories) with the same charm base directory.

For this reason, it would be good to find a better solution to this minor problem.

  1. Enzo could read a variable called CHARM_ARCH, where CHARM_ARCH specifies the specific build name and the final charm path is $CHARM_HOME with $CHARM_ARCH appended. This is the approach used by NAMD.

  2. Enzo could extract the VERSION from CHARM_HOME and if not found, it could try to find a VERSION file in the parent directory of CHARM_HOME. However, this solution isn't as concrete as option 1.

Building Enzo-E with CMake

In preparation for incorporating Kokkos into Enzo-E, I need to get Enzo-E compiling and running with a build system that is compatible with Kokkos - either cmake or plain Makefile. cmake should be more flexible.

I've started a branch here on my fork of Enzo-E for building Enzo-E with cmake. However, it's currently not working, even with a charm++ build that works fine with the SCons-built master Enzo-E repo.

From a high level, my cmake setup does these things to compile Enzo-E successfully (although the resulting executable is broken):

  1. Compile a libexternal_library.a from all the source and header files in src/External using g++
  2. Generate *.decl.h and *.def.h headers from all the *.ci files in src/ using charmc
  3. Compile a libcello_library.a from most of the source and header files in src/Cello using g++/gcc/gfortran
  4. Compile object files from most of the source and header files in src/Enzo using g++/gcc/gfortran
  5. Link libexternal_library.a, libcello_library.a, and the source files from src/Enzo together into an executable using charmc, also linking with dl, pthread, z, hdf5, png, gfortran, boost

However, the resulting executable still doesn't work:

$CHARM_DIR/bin/charmrun +p4 src/enzo_e_exe input/test_cosmo-bcg.in
Running on 4 processors:  src/enzo_e_exe input/test_cosmo-bcg.in 
charmrun>  /usr/bin/setarch x86_64 -R  mpirun -np 4  src/enzo_e_exe input/test_cosmo-bcg.in 
Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: MPI_THREAD_SINGLE (desired: MPI_THREAD_SINGLE)
Charm++> Running in non-SMP mode: 4 processes (PEs)
Converse/Charm++ Commit ID: v6.10.2-0-g7bf00fad3
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (1 sockets x 6 cores x 2 PUs = 12-way SMP)
Charm++> cpu topology info is gathered in 0.000 seconds.
0 00000.00 Parameters  Adapt:list = [ "mass_baryon" "mass_dark" ]
0 00000.00 Parameters  Adapt:mass_baryon:field_list = [ "density" ]
0 00000.00 Parameters  Adapt:mass_baryon:mass_type = "baryon"
0 00000.00 Parameters  Adapt:mass_baryon:max_coarsen = 0.1250000000000000
0 00000.00 Parameters  Adapt:mass_baryon:min_refine = 8.000000000000000
0 00000.00 Parameters  Adapt:mass_baryon:type = "mass"
0 00000.00 Parameters  Adapt:mass_dark:field_list = [ "density_particle_accumulate" ]
0 00000.00 Parameters  Adapt:mass_dark:mass_type = "dark"
0 00000.00 Parameters  Adapt:mass_dark:max_coarsen = 0.1250000000000000
0 00000.00 Parameters  Adapt:mass_dark:min_refine = 8.000000000000000
0 00000.00 Parameters  Adapt:mass_dark:type = "mass"
0 00000.00 Parameters  Adapt:max_initial_level = 0
0 00000.00 Parameters  Adapt:max_level = 4
0 00000.00 Parameters  Adapt:min_level = 0
0 00000.00 Parameters  Balance:start = 10
0 00000.00 Parameters  Balance:value = 20
0 00000.00 Parameters  Balance:var = "cycle"
0 00000.00 Parameters  Boundary:type = "periodic"
0 00000.00 Parameters  Domain:lower = [ 0.000000000000000 0.000000000000000 0.000000000000000 ]
0 00000.00 Parameters  Domain:upper = [ 1.000000000000000 1.000000000000000 1.000000000000000 ]
0 00000.00 Parameters  Field:alignment = 8
0 00000.00 Parameters  Field:gamma = 1.666700000000000
0 00000.00 Parameters  Field:ghost_depth = 4
0 00000.00 Parameters  Field:history = 1
0 00000.00 Parameters  Field:list = [ "density" "velocity_x" "velocity_y" "velocity_z" "acceleration_x" "acceleration_y" "acceleration_z" "total_energy" "internal_energy" "pressure" "density_total" "density_particle" "density_particle_accumulate" "potential" "density_gas" "X_copy" "B" "R0" "DD_po_1" "DD_B" "Y1_bcg" "Y2_bcg" "P1_bcg" "Q2_bcg" "V1_bcg" "R1_bcg" "B_COPY" "potential_copy" ]
0 00000.00 Parameters  Field:padding = 0
0 00000.00 Parameters  Initial:list = [ "music" "cosmology" ]
0 00000.00 Parameters  Initial:music:FD:coords = "tzyx"
0 00000.00 Parameters  Initial:music:FD:dataset = "GridDensity"
0 00000.00 Parameters  Initial:music:FD:file = "input/cosmo_grid_density.h5"
0 00000.00 Parameters  Initial:music:FD:name = "density"
0 00000.00 Parameters  Initial:music:FD:type = "field"
0 00000.00 Parameters  Initial:music:FVX:coords = "tzyx"
0 00000.00 Parameters  Initial:music:FVX:dataset = "GridVelocities_x"
0 00000.00 Parameters  Initial:music:FVX:file = "input/cosmo_grid_velocities_x.h5"
0 00000.00 Parameters  Initial:music:FVX:name = "velocity_x"
0 00000.00 Parameters  Initial:music:FVX:type = "field"
0 00000.00 Parameters  Initial:music:FVY:coords = "tzyx"
0 00000.00 Parameters  Initial:music:FVY:dataset = "GridVelocities_y"
0 00000.00 Parameters  Initial:music:FVY:file = "input/cosmo_grid_velocities_y.h5"
0 00000.00 Parameters  Initial:music:FVY:name = "velocity_y"
0 00000.00 Parameters  Initial:music:FVY:type = "field"
0 00000.00 Parameters  Initial:music:FVZ:coords = "tzyx"
0 00000.00 Parameters  Initial:music:FVZ:dataset = "GridVelocities_z"
0 00000.00 Parameters  Initial:music:FVZ:file = "input/cosmo_grid_velocities_z.h5"
0 00000.00 Parameters  Initial:music:FVZ:name = "velocity_z"
0 00000.00 Parameters  Initial:music:FVZ:type = "field"
0 00000.00 Parameters  Initial:music:PVX:attribute = "vx"
0 00000.00 Parameters  Initial:music:PVX:coords = "tzyx"
0 00000.00 Parameters  Initial:music:PVX:dataset = "ParticleVelocities_x"
0 00000.00 Parameters  Initial:music:PVX:file = "input/cosmo_particle_velocities_x.h5"
0 00000.00 Parameters  Initial:music:PVX:name = "dark"
0 00000.00 Parameters  Initial:music:PVX:type = "particle"
0 00000.00 Parameters  Initial:music:PVY:attribute = "vy"
0 00000.00 Parameters  Initial:music:PVY:coords = "tzyx"
0 00000.00 Parameters  Initial:music:PVY:dataset = "ParticleVelocities_y"
0 00000.00 Parameters  Initial:music:PVY:file = "input/cosmo_particle_velocities_y.h5"
0 00000.00 Parameters  Initial:music:PVY:name = "dark"
0 00000.00 Parameters  Initial:music:PVY:type = "particle"
0 00000.00 Parameters  Initial:music:PVZ:attribute = "vz"
0 00000.00 Parameters  Initial:music:PVZ:coords = "tzyx"
0 00000.00 Parameters  Initial:music:PVZ:dataset = "ParticleVelocities_z"
0 00000.00 Parameters  Initial:music:PVZ:file = "input/cosmo_particle_velocities_z.h5"
0 00000.00 Parameters  Initial:music:PVZ:name = "dark"
0 00000.00 Parameters  Initial:music:PVZ:type = "particle"
0 00000.00 Parameters  Initial:music:PX:attribute = "x"
0 00000.00 Parameters  Initial:music:PX:coords = "tzyx"
0 00000.00 Parameters  Initial:music:PX:dataset = "ParticleDisplacements_x"
0 00000.00 Parameters  Initial:music:PX:file = "input/cosmo_particle_displacements_x.h5"
0 00000.00 Parameters  Initial:music:PX:name = "dark"
0 00000.00 Parameters  Initial:music:PX:type = "particle"
0 00000.00 Parameters  Initial:music:PY:attribute = "y"
0 00000.00 Parameters  Initial:music:PY:coords = "tzyx"
0 00000.00 Parameters  Initial:music:PY:dataset = "ParticleDisplacements_y"
0 00000.00 Parameters  Initial:music:PY:file = "input/cosmo_particle_displacements_y.h5"
0 00000.00 Parameters  Initial:music:PY:name = "dark"
0 00000.00 Parameters  Initial:music:PY:type = "particle"
0 00000.00 Parameters  Initial:music:PZ:attribute = "z"
0 00000.00 Parameters  Initial:music:PZ:coords = "tzyx"
0 00000.00 Parameters  Initial:music:PZ:dataset = "ParticleDisplacements_z"
0 00000.00 Parameters  Initial:music:PZ:file = "input/cosmo_particle_displacements_z.h5"
0 00000.00 Parameters  Initial:music:PZ:name = "dark"
0 00000.00 Parameters  Initial:music:PZ:type = "particle"
0 00000.00 Parameters  Initial:music:file_list = [ "FD" "FVX" "FVY" "FVZ" "PX" "PY" "PZ" "PVX" "PVY" "PVZ" ]
0 00000.00 Parameters  Initial:music:throttle_group_size = 64
0 00000.00 Parameters  Initial:music:throttle_internode = false
0 00000.00 Parameters  Initial:music:throttle_intranode = false
0 00000.00 Parameters  Initial:music:throttle_node_files = false
0 00000.00 Parameters  Initial:music:throttle_seconds_delay = 0.01000000000000000
0 00000.00 Parameters  Initial:music:throttle_seconds_stagger = 0.1000000000000000
0 00000.00 Parameters  Memory:limit_gb = 1.000000000000000
0 00000.00 Parameters  Mesh:root_blocks = [ 4 4 4 ]
0 00000.00 Parameters  Mesh:root_rank = 3
0 00000.00 Parameters  Mesh:root_size = [ 32 32 32 ]
0 00000.00 Parameters  Method:gravity:accumulate = true
0 00000.00 Parameters  Method:gravity:grav_const = 1.000000000000000
0 00000.00 Parameters  Method:gravity:order = 2
0 00000.00 Parameters  Method:gravity:solver = "bcg"
0 00000.00 Parameters  Method:list = [ "pm_deposit" "gravity" "pm_update" "comoving_expansion" ]
0 00000.00 Parameters  Method:ppm:courant = 0.5000000000000000
0 00000.00 Parameters  Method:ppm:diffusion = false
0 00000.00 Parameters  Method:ppm:dual_energy = true
0 00000.00 Parameters  Output:ax:dir = [ "Dir_COSMO_BCG_%04d" "cycle" ]
0 00000.00 Parameters  Output:ax:field_list = [ "acceleration_x" ]
0 00000.00 Parameters  Output:ax:image_ghost = false
0 00000.00 Parameters  Output:ax:image_size = [ 512 512 ]
0 00000.00 Parameters  Output:ax:image_type = "data"
0 00000.00 Parameters  Output:ax:name = [ "ax-%02d.png" "count" ]
0 00000.00 Parameters  Output:ax:schedule:step = 20
0 00000.00 Parameters  Output:ax:schedule:var = "cycle"
0 00000.00 Parameters  Output:ax:type = "image"
0 00000.00 Parameters  Output:ay:dir = [ "Dir_COSMO_BCG_%04d" "cycle" ]
0 00000.00 Parameters  Output:ay:field_list = [ "acceleration_y" ]
0 00000.00 Parameters  Output:ay:image_size = [ 512 512 ]
0 00000.00 Parameters  Output:ay:image_type = "data"
0 00000.00 Parameters  Output:ay:name = [ "ay-%02d.png" "count" ]
0 00000.00 Parameters  Output:ay:schedule:step = 20
0 00000.00 Parameters  Output:ay:schedule:var = "cycle"
0 00000.00 Parameters  Output:ay:type = "image"
0 00000.00 Parameters  Output:az:dir = [ "Dir_COSMO_BCG_%04d" "cycle" ]
0 00000.00 Parameters  Output:az:field_list = [ "acceleration_z" ]
0 00000.00 Parameters  Output:az:image_size = [ 512 512 ]
0 00000.00 Parameters  Output:az:image_type = "data"
0 00000.00 Parameters  Output:az:name = [ "az-%02d.png" "count" ]
0 00000.00 Parameters  Output:az:schedule:step = 20
0 00000.00 Parameters  Output:az:schedule:var = "cycle"
0 00000.00 Parameters  Output:az:type = "image"
0 00000.00 Parameters  Output:check:dir = [ "Dir_COSMO_BCG_%04d-checkpoint" "count" ]
0 00000.00 Parameters  Output:check:schedule:start = 100
0 00000.00 Parameters  Output:check:schedule:step = 200
0 00000.00 Parameters  Output:check:schedule:var = "cycle"
0 00000.00 Parameters  Output:check:type = "checkpoint"
0 00000.00 Parameters  Output:dark:colormap = [ 0.000000000000000 0.000000000000000 0.000000000000000 1.000000000000000 0.000000000000000 0.000000000000000 1.000000000000000 1.000000000000000 0.000000000000000 1.000000000000000 1.000000000000000 1.000000000000000 ]
0 00000.00 Parameters  Output:dark:dir = [ "Dir_COSMO_BCG_%04d" "cycle" ]
0 00000.00 Parameters  Output:dark:image_ghost = false
0 00000.00 Parameters  Output:dark:image_size = [ 512 512 ]
0 00000.00 Parameters  Output:dark:image_type = "data"
0 00000.00 Parameters  Output:dark:name = [ "dark-%02d.png" "count" ]
0 00000.00 Parameters  Output:dark:particle_list = [ "dark" ]
0 00000.00 Parameters  Output:dark:schedule:step = 20
0 00000.00 Parameters  Output:dark:schedule:var = "cycle"
0 00000.00 Parameters  Output:dark:type = "image"
0 00000.00 Parameters  Output:de:dir = [ "Dir_COSMO_BCG_%04d" "cycle" ]
0 00000.00 Parameters  Output:de:field_list = [ "density" ]
0 00000.00 Parameters  Output:de:image_ghost = false
0 00000.00 Parameters  Output:de:image_size = [ 512 512 ]
0 00000.00 Parameters  Output:de:image_type = "data"
0 00000.00 Parameters  Output:de:name = [ "de-%02d.png" "count" ]
0 00000.00 Parameters  Output:de:schedule:step = 20
0 00000.00 Parameters  Output:de:schedule:var = "cycle"
0 00000.00 Parameters  Output:de:type = "image"
0 00000.00 Parameters  Output:dep:dir = [ "Dir_COSMO_BCG_%04d" "cycle" ]
0 00000.00 Parameters  Output:dep:field_list = [ "density_particle" ]
0 00000.00 Parameters  Output:dep:image_ghost = false
0 00000.00 Parameters  Output:dep:image_size = [ 512 512 ]
0 00000.00 Parameters  Output:dep:image_type = "data"
0 00000.00 Parameters  Output:dep:name = [ "dep-%02d.png" "count" ]
0 00000.00 Parameters  Output:dep:schedule:step = 20
0 00000.00 Parameters  Output:dep:schedule:var = "cycle"
0 00000.00 Parameters  Output:dep:type = "image"
0 00000.00 Parameters  Output:depa:dir = [ "Dir_COSMO_BCG_%04d" "cycle" ]
0 00000.00 Parameters  Output:depa:field_list = [ "density_particle_accumulate" ]
0 00000.00 Parameters  Output:depa:image_ghost = false
0 00000.00 Parameters  Output:depa:image_size = [ 512 512 ]
0 00000.00 Parameters  Output:depa:image_type = "data"
0 00000.00 Parameters  Output:depa:name = [ "depa-%02d.png" "count" ]
0 00000.00 Parameters  Output:depa:schedule:step = 20
0 00000.00 Parameters  Output:depa:schedule:var = "cycle"
0 00000.00 Parameters  Output:depa:type = "image"
0 00000.00 Parameters  Output:hdf5:dir = [ "Dir_COSMO_BCG_%04d" "cycle" ]
0 00000.00 Parameters  Output:hdf5:field_list = [ "density" "velocity_x" "velocity_y" "velocity_z" "acceleration_x" "acceleration_y" "acceleration_z" "total_energy" "internal_energy" "pressure" ]
0 00000.00 Parameters  Output:hdf5:name = [ "data-%02d-%02d.h5" "count" "proc" ]
0 00000.00 Parameters  Output:hdf5:particle_list = [ "dark" ]
0 00000.00 Parameters  Output:hdf5:schedule:step = 20
0 00000.00 Parameters  Output:hdf5:schedule:var = "cycle"
0 00000.00 Parameters  Output:hdf5:type = "data"
0 00000.00 Parameters  Output:list = [ "de" "depa" "ax" "dark" "mesh" "po" "check" ]
0 00000.00 Parameters  Output:mesh:colormap = [ 0.000000000000000 0.000000000000000 0.000000000000000 0.000000000000000 0.000000000000000 1.000000000000000 0.000000000000000 1.000000000000000 1.000000000000000 0.000000000000000 1.000000000000000 0.000000000000000 1.000000000000000 1.000000000000000 0.000000000000000 1.000000000000000 0.000000000000000 0.000000000000000 ]
0 00000.00 Parameters  Output:mesh:dir = [ "Dir_COSMO_BCG_%04d" "cycle" ]
0 00000.00 Parameters  Output:mesh:image_reduce_type = "max"
0 00000.00 Parameters  Output:mesh:image_size = [ 513 513 ]
0 00000.00 Parameters  Output:mesh:image_type = "mesh"
0 00000.00 Parameters  Output:mesh:name = [ "mesh-%02d.png" "count" ]
0 00000.00 Parameters  Output:mesh:schedule:step = 20
0 00000.00 Parameters  Output:mesh:schedule:var = "cycle"
0 00000.00 Parameters  Output:mesh:type = "image"
0 00000.00 Parameters  Output:po:dir = [ "Dir_COSMO_BCG_%04d" "cycle" ]
0 00000.00 Parameters  Output:po:field_list = [ "potential_copy" ]
0 00000.00 Parameters  Output:po:image_size = [ 512 512 ]
0 00000.00 Parameters  Output:po:image_type = "data"
0 00000.00 Parameters  Output:po:name = [ "po-%02d.png" "count" ]
0 00000.00 Parameters  Output:po:schedule:step = 20
0 00000.00 Parameters  Output:po:schedule:var = "cycle"
0 00000.00 Parameters  Output:po:type = "image"
0 00000.00 Parameters  Particle:dark:attributes = [ "x" "default" "y" "default" "z" "default" "vx" "default" "vy" "default" "vz" "default" "ax" "default" "ay" "default" "az" "default" ]
0 00000.00 Parameters  Particle:dark:constants = [ "mass" "default" 0.8666666666667000 ]
0 00000.00 Parameters  Particle:dark:position = [ "x" "y" "z" ]
0 00000.00 Parameters  Particle:dark:velocity = [ "vx" "vy" "vz" ]
0 00000.00 Parameters  Particle:list = [ "dark" ]
0 00000.00 Parameters  Physics:cosmology:comoving_box_size = 3.000000000000000
0 00000.00 Parameters  Physics:cosmology:final_redshift = 3.000000000000000
0 00000.00 Parameters  Physics:cosmology:hubble_constant_now = 0.7000000000000000
0 00000.00 Parameters  Physics:cosmology:initial_redshift = 99.00000000000000
0 00000.00 Parameters  Physics:cosmology:max_expansion_rate = 0.01500000000000000
0 00000.00 Parameters  Physics:cosmology:omega_baryon_now = 0.04000000000000000
0 00000.00 Parameters  Physics:cosmology:omega_cdm_now = 0.2600000000000000
0 00000.00 Parameters  Physics:cosmology:omega_lambda_now = 0.7000000000000000
0 00000.00 Parameters  Physics:cosmology:omega_matter_now = 0.3000000000000000
0 00000.00 Parameters  Physics:list = [ "cosmology" ]
0 00000.00 Parameters  Solver:bcg:iter_max = 100
0 00000.00 Parameters  Solver:bcg:monitor_iter = 10
0 00000.00 Parameters  Solver:bcg:res_tol = 0.01000000000000000
0 00000.00 Parameters  Solver:bcg:type = "bicgstab"
0 00000.00 Parameters  Solver:list = [ "bcg" ]
0 00000.00 Parameters  Stopping:cycle = 160
0 00000.05 WARNING parameters_Config.cpp:927
0 00000.05 WARNING Config::read()
0 00000.05 WARNING output_axis[0] set to z
0 00000.05 WARNING parameters_Config.cpp:927
0 00000.05 WARNING Config::read()
0 00000.05 WARNING output_axis[1] set to z
0 00000.05 WARNING parameters_Config.cpp:927
0 00000.05 WARNING Config::read()
0 00000.05 WARNING output_axis[2] set to z
0 00000.05 WARNING parameters_Config.cpp:927
0 00000.05 WARNING Config::read()
0 00000.05 WARNING output_axis[3] set to z
0 00000.05 WARNING parameters_Config.cpp:927
0 00000.05 WARNING Config::read()
0 00000.05 WARNING output_axis[4] set to z
0 00000.05 WARNING parameters_Config.cpp:927
0 00000.05 WARNING Config::read()
0 00000.05 WARNING output_axis[5] set to z
UNIT TEST BEGIN
0 00000.05  ==============================================
0 00000.05   
0 00000.05    .oooooo.             oooo  oooo            
0 00000.05   d8P'  `Y8b            `888  `888            
0 00000.05  888           .ooooo.   888   888   .ooooo.  
0 00000.05  888          d88' `88b  888   888  d88' `88b 
0 00000.05  888          888ooo888  888   888  888   888 
0 00000.05  `88b    ooo  888    .o  888   888  888   888 
0 00000.05   `Y8bood8P'  `Y8bod8P' o888o o888o `Y8bod8P' 
0 00000.05   
0 00000.05  A Parallel Adaptive Mesh Refinement Framework
0 00000.05   
0 00000.05    Laboratory for Computational Astrophysics
0 00000.05          San Diego Supercomputer Center
0 00000.05       University of California, San Diego
0 00000.05   
0 00000.05  See 'LICENSE_CELLO' for software license information
0 00000.05   
0 00000.05  BEGIN CELLO: Feb 08 15:30:12
0 00000.05 Define Simulation processors 4
0 00000.05 CHARM CkNumPes()           4
0 00000.05 CHARM CkNumNodes()         4
0 00000.05  BEGIN ENZO-P
0 00000.05 Memory bytes 272858 bytes_high 272912
Charm++: late entry method registration happened after init
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: Did you forget to import a module or instantiate a templated entry method in a .ci file?

Entry point: EnzoSimulation(const char *filename, int n), addr: 0x555555808370
[0] Stack Traceback:
  [0:0] enzo_e_exe 0x555555b1c280 CmiAbortHelper(char const*, char const*, char const*, int, int)
  [0:1] enzo_e_exe 0x555555b1c3a4 
  [0:2] enzo_e_exe 0x555555a2f2f8 
  [0:3] enzo_e_exe 0x55555580a7cd CkIndex_EnzoSimulation::reg_EnzoSimulation_marshall1()
  [0:4] enzo_e_exe 0x55555580acd1 CProxy_EnzoSimulation::ckNew(char const*, int, CkEntryOptions const*)
  [0:5] enzo_e_exe 0x5555558185d0 Main::Main(CkArgMsg*)
  [0:6] enzo_e_exe 0x555555a2c3f6 _initCharm(int, char**)
  [0:7] enzo_e_exe 0x555555b1ea1e ConverseInit
  [0:8] enzo_e_exe 0x555555a2b2cc charm_main
  [0:9] libc.so.6 0x7ffff717eb25 __libc_start_main
  [0:10] enzo_e_exe 0x5555556fb2ce _start
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[1] Stack Traceback:
  [1:0] enzo_e_exe 0x555555b1eb67 
  [1:1] libpthread.so.0 0x7ffff7f85960 
  [1:2] enzo_e_exe 0x555555b1cec3 
  [1:3] enzo_e_exe 0x555555b1da99 LrtsAdvanceCommunication(int)
  [1:4] enzo_e_exe 0x555555b1db47 CmiGetNonLocal
  [1:5] enzo_e_exe 0x555555b1fa1d CsdNextMessage
  [1:6] enzo_e_exe 0x555555b1fad3 CsdScheduleForever
  [1:7] enzo_e_exe 0x555555b1fd85 CsdScheduler
  [1:8] enzo_e_exe 0x555555b1ea9a ConverseInit
  [1:9] enzo_e_exe 0x555555a2b2cc charm_main
  [1:10] libc.so.6 0x7ffff717eb25 __libc_start_main
  [1:11] enzo_e_exe 0x5555556fb2ce _start
[2] Stack Traceback:
  [2:0] enzo_e_exe 0x555555b1eb67 
  [2:1] libpthread.so.0 0x7ffff7f85960 
  [2:2] libmpi.so.40 0x7ffff75e33dd 
  [2:3] mca_pml_ob1.so 0x7ffff4ec2281 mca_pml_ob1_iprobe
  [2:4] libmpi.so.40 0x7ffff7596510 MPI_Iprobe
  [2:5] enzo_e_exe 0x555555b1d195 
  [2:6] enzo_e_exe 0x555555b1dabe 
  [2:7] enzo_e_exe 0x555555b2385e CcdRaiseCondition
  [2:8] enzo_e_exe 0x555555b1fb65 CsdScheduleForever
  [2:9] enzo_e_exe 0x555555b1fd85 CsdScheduler
  [2:10] enzo_e_exe 0x555555b1ea9a ConverseInit
  [2:11] enzo_e_exe 0x555555a2b2cc charm_main
  [2:12] libc.so.6 0x7ffff717eb25 __libc_start_main
  [2:13] enzo_e_exe 0x5555556fb2ce _start
------------- Processor 1 Exiting: Caught Signal ------------
Reason: Terminated
------------- Processor 2 Exiting: Caught Signal ------------
Reason: Terminated
------------- Processor 3 Exiting: Caught Signal ------------
Reason: Terminated
[3] Stack Traceback:
  [3:0] enzo_e_exe 0x555555b1eb67 
  [3:1] libpthread.so.0 0x7ffff7f85960 
  [3:2] enzo_e_exe 0x555555b1cdf0 
  [3:3] enzo_e_exe 0x555555b1da99 LrtsAdvanceCommunication(int)
  [3:4] enzo_e_exe 0x555555b1db47 CmiGetNonLocal
  [3:5] enzo_e_exe 0x555555b1fa1d CsdNextMessage
  [3:6] enzo_e_exe 0x555555b1fad3 CsdScheduleForever
  [3:7] enzo_e_exe 0x555555b1fd85 CsdScheduler
  [3:8] enzo_e_exe 0x555555b1ea9a ConverseInit
  [3:9] enzo_e_exe 0x555555a2b2cc charm_main
  [3:10] libc.so.6 0x7ffff717eb25 __libc_start_main
  [3:11] enzo_e_exe 0x5555556fb2ce _start
[forrest-razer:758669] 3 more processes have sent help message help-mpi-api.txt / mpi-abort
[forrest-razer:758669] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Any suggestions for the cmake build process? You can look at my repo and cmake build instructions at https://github.com/forrestglines/enzo-e/tree/cmake

Restarts fail when trying to refine blocks [solver-dd branch]

Error messages vary, but originate from Charm++ and are related to inserting chare elements into a chare array
test_adapt-L5.in with checkpointing enabled:

------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: Attempting to create too many chare elements!"

test_cosmo-bcg.unit with checkpointing enabled:

------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: Cannot insert array element twice!

Hanging adapt-L5-P1 test on Circleci

After the final time that @jobordner merged the master branch into PR #45, the adapt-L5-P1 test hung when CircleCi compiled the code both in single and double precision mode. He noted that

  • this was weird since none of the changes introduced in the PR should have affected that test
  • when he sshed into the CircleCi server, the test did seem to run a little slowly
  • CircleCi had some warning about violating concurrency limits. Maybe it was somehow using more cores than requested at some point during the testing

`NumberOfBaryonFields` ptr versus int logic

After a system update (which came with an updated g++), Enzo-E now fails to compile with

build-master/Enzo/enzo_EnzoMethodPpml.cpp: In member function ‘virtual double EnzoMethodPpml::timestep(Block*) const’:
build-master/Enzo/enzo_EnzoMethodPpml.cpp:96:39: error: ordered comparison of pointer with integer zero (‘int*’ and ‘int’)
   96 |   if (EnzoBlock::NumberOfBaryonFields > 0) {
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
Fatal Error by charmc in directory /home/pgrete/src/enzo-e
   Command g++ -DCMK_GFORTRAN -I/home/pgrete/src/charm/bin/../include -D__CHARMC__=1 -DCONFIG_PRECISION_DOUBLE -DSMALL_INTS -DCONFIG_NODE_SIZE=64 -DCONFIG_NODE_SIZE_3=192 -DNO_FREETYPE -DCONFIG_USE_PERFORMANCE -DCONFIG_USE_MEMORY -DCONFIG_HAVE_VERSION_CONTROL -Iinclude -I/usr/include -I/usr/include/boost -I/lib/x86_64-linux-gnu/include -O3 -g -ffast-math -funroll-loops -fPIC -pedantic -U_FORTIFY_SOURCE -fno-stack-protector -fno-lifetime-dse -c build-master/Enzo/enzo_EnzoMethodPpml.cpp -o build-master/Enzo/enzo_EnzoMethodPpml.o returned error code 1

This also applies to other files, e.g., build-master/Enzo/enzo_SetMinimumSupport.cpp or build-master/Enzo/enzo_SolveMHDEquations.cpp.

As far as I can tell int EnzoBlock::NumberOfBaryonFields[CONFIG_NODE_SIZE]; from enzo_EnzoBlock will always be != nullptr because CONFIG_NODE_SIZE is a compile time constant.
Thus, the conditional as it stands will always be true (if it compiles).

I assume that the idea behind the conditional is that those code pieces should only be called if there are actually baryon fields.
If that's true, then the check should be updated to something like EnzoBlock::NumberOfBaryonFields[my_processing_element] > 0.
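
For concreteness, here is a toy sketch of that check (EnzoBlockToy stands in for the real EnzoBlock, and my_processing_element is just the hypothetical per-PE index used above):

    #define CONFIG_NODE_SIZE 64

    struct EnzoBlockToy {
      static int NumberOfBaryonFields[CONFIG_NODE_SIZE];
    };
    int EnzoBlockToy::NumberOfBaryonFields[CONFIG_NODE_SIZE] = {0};

    bool has_baryon_fields(int my_processing_element)
    {
      // Old check: (EnzoBlockToy::NumberOfBaryonFields > 0) compares the array,
      // decayed to a non-null pointer, against zero and is therefore always true.
      // Indexing the per-PE entry gives the intended behavior:
      return EnzoBlockToy::NumberOfBaryonFields[my_processing_element] > 0;
    }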

More generally, I wonder if NumberOfBaryonFields actually needs to be an array (similar to the many other variables of EnzoBlock such as the block dimensions) as those numbers are (AFAIK) constant across all PEs for a given simulation.

@jobordner it'd be great to get your input on this as I'm probably missing something (since I'm not too familiar with the low-level design).

Possible issue with time integration of particles

I have found some mismatches between what is written in the Enzo paper (https://arxiv.org/abs/1307.2265), and what is currently implemented in Enzo-E.

Firstly, the paper describes the following procedure for depositing particles onto the grid:

" During the CIC interpolation, particle positions are (temporarily) advanced by 0.5vn∆t so that we generate an estimate of the time-centered density field.
"

However, lines 183-195 in src/Enzo/enzo_EnzoMethodPmDeposit.cpp are:


            double x = xa[ip*dp] + vxa[ip*dv]*dt;

            double tx = nx*(x - xm) / (xp - xm) - 0.5;

            int ix0 = gx + floor(tx);

            int ix1 = ix0 + 1;

            double x0 = 1.0 - (tx - floor(tx));
            double x1 = 1.0 - x0;

            de_p[ix0] += dens*x0;
            de_p[ix1] += dens*x1;

i.e. particles are drifted by the whole timestep before being deposited onto the grid.

EDIT:
It appears that dt is defined on Line 152 as follows:

const double dt = alpha_ * block->dt() / cosmo_a;

where alpha_ is set to 0.5 by default. So if the default value is used, there is agreement with the Enzo paper, but perhaps this should be hard-coded? Is there any reason why we would want to change it?

When it comes to updating the particles' positions and velocities, the paper describes a 'Drift-Kick-Drift' integration scheme (Equations 62).
What is implemented in Enzo-E currently is a bit different. Firstly, the particles' accelerations are calculated by interpolating from the acceleration field to the particles' drifted positions, in lines 91-106 in src/Enzo/enzo_EnzoMethodPmUpdate.cpp:

 double dt_shift = 0.5*block->dt()/cosmo_a;
    //    double dt_shift = 0.0;
    if (rank >= 1) {
      EnzoComputeCicInterp interp_x ("acceleration_x", "dark", "ax", dt_shift);
      interp_x.compute(block);
    }

    if (rank >= 2) {
      EnzoComputeCicInterp interp_y ("acceleration_y", "dark", "ay", dt_shift);
      interp_y.compute(block);
    }

    if (rank >= 3) {
      EnzoComputeCicInterp interp_z ("acceleration_z", "dark", "az", dt_shift);
      interp_z.compute(block);
    }

EDIT: Also note that this is only done for "dark" particles at the moment. Extending this to other types of particles should be straightforward though.

Then, in lines 189-191 the particles' positions and velocities are updated with something that looks like a Kick-Drift-Kick integration scheme:

	  vx[ipdv] = cvv*vx[ipdv] + cva*ax[ipda];
	  x [ipdp] += cp*vx[ipdv];
	  vx[ipdv] = cvv*vx[ipdv] + cva*ax[ipda];

with cp, cvv and cva defined in lines 145-148:

    const double cp = dt/cosmo_a;
    const double coef = 0.25*cosmo_dadt/cosmo_a*dt;
    const double cvv = (1.0 - coef) / (1.0 + coef);
    const double cva = 0.5*dt / (1.0 + coef);
    

Even if we did decide to use a Kick-Drift-Kick scheme, I'm not sure this is the right way to do it. The accelerations in the two kick steps should be different, and should be calculated using the particles' positions before and after the drift, respectively.
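
For reference, the textbook kick-drift-kick update (written here without the comoving a(t) factors, which the cvv/cva coefficients above fold in) uses two different accelerations:

    v^(n+1/2) = v^n       + (dt/2) * a(x^n)
    x^(n+1)   = x^n       + dt     * v^(n+1/2)
    v^(n+1)   = v^(n+1/2) + (dt/2) * a(x^(n+1))

whereas the snippet above applies the same interpolated ax[ipda] in both half-kicks.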

Is this change from what's described in the Enzo paper intentional? I tried looking at the Enzo source code to figure out what is actually implemented in Enzo but I couldn't find the relevant lines of code.

Circleci cosmo-bcg Timeout Issue in VLCT PR

This documents a Circleci problem that initially manifested for PR #9, after merging in changes from PR #12. The issue first appeared in commit 43341c9, which was the first commit pushed to GitHub following the merge.

To briefly summarize the problem, the cosmo-bcg test, added by PR #12, failed due to a timeout when it was run on Circleci. However, neither @jobordner nor I were able to replicate the issue; the test passed when run locally on each of our machines. @jobordner recently found a work-around for this issue with commit 86b7693.

This issue is largely being made to archive the discussion that @jobordner and I had about this problem over Slack.

Periodic Boundaries Mixed with other Boundary Types

Currently there is a bug that causes an incorrect update of ghost zones when a periodic boundary is used alongside another boundary type and the domain is split into multiple blocks. It's easier to explain this with the following example (using this input file):

Basically we create an active zone of (6,6) cells where each row has increasing values. I have defined a ghost depth of 2 and split the domain into 4 blocks (2 per axis). The initial conditions look like the following:
[image: block_ts0]

I have specified that at x values below x=0 and above x=xmax there is an "outflow" boundary condition, while below y=0 and above y=ymax there is a "periodic" boundary condition. After the update, the blocks look like:
[image: marked_block_ts1]

The outlined red regions highlight the regions that have not been properly updated. In short, the y values exterior to the active zone and x values interior to the active zone are not updated at all. The script to generate the grids can be found here. Analogous versions of the bug exist if we swap the periodic and outflow boundaries. There is no bug if this is all done on a single block (which makes sense).

I'd be happy to try to take a stab at this, but I am struggling to understand where and how it is determined which data should be sent to neighboring blocks.

A side note to anyone who plays around with this - there is a separate, unrelated bug related to specifying two or more periodic boundaries individually in the input file. For example, if you set the type keyword for the x boundary in the provided input file to "periodic", only one of the boundaries will be updated (this bug is not present if you simultaneously specify all boundaries as periodic). This bug also appears when you specify that 2+ boundaries are "periodic" for the 3D grid. There is a proposed fix to this bug in Pull Request #19 (alongside some fixes for refreshing and initializing face-centered fields).

More consistent checking for negative densities and masses in PM solver

Instead of inv_vol, which is always positive, it might be more informative to report the negative density (de_p[ix0+mx*iy0]) and/or mass (pmass[ip*dm]). This applies to all instances below, too.

From the previous version, I also see that this error check only happens in 2D. It should be consistently checked (or not) regardless of the rank. It doesn't need to be fixed in this PR, but I'll make an issue.

Originally posted by @jwise77 in #89 (comment)

Deprecated Code introduced by VL+CT PR

Some deprecated code was introduced by the VL+CT PR (#62)

  • One of those spots was in EnzoInitialCloud
  • Another spot was in EnzoFieldArrayFactory

These need to be removed

VL+CT testing updates

The VL+CT integrator tests should be ported to be more in line with the new testing framework introduced in PR #53. However, full integration needs to wait until PR #46 gets approved (most of the VL+CT tests use python3). There are some ambiguities with how exactly the gold standard values should be treated.

Additionally, tests should be introduced to check the code when compiled in single precision mode (currently, the tests only run when compiled with double precision)

Some additional tests were discussed discussed in the VL+CT PR (#62):

  • It might be useful to modify the linear wave test (and other tests) so that we can easily change mesh sizes (currently, the sizes are hardcoded). This might be helpful for checking convergence rates
  • We should move our gold standard acceptance values into external files (like a csv). Currently, these values are hardcoded into the test files. This may also help with testing the code in single precision mode
  • Cleanup the machinery for computing L1 Error Norms. There's a lot of cruft in the current implementation that accrued because the design of the testing framework was not obvious at the time of writing.
  • Refactor the shock tube test so that we can specify the initial left and right state in an input file
    • I feel like the best way to handle this is to introduce 2 new parameter groups to hold the left and right state (e.g. 'Initial:shock_tube:left_state' and 'Initial:shock_tube:right_state'). The parameters in each group could easily be read into a vector of std::pairs or a mapping, and then passed directly to the initializer (see the sketch after this list).
  • Add some L_inf norm tests for the linear waves
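
As a rough illustration of the left/right-state idea above, here is a sketch of how the two states could be carried around once read (the group and field names are just the suggestion from that sub-item, and the read step is deliberately left abstract rather than tied to Cello's Parameters API):

    #include <map>
    #include <string>

    // one shock tube state, e.g. {"density", 1.0}, {"pressure", 1.0}, {"velocity_x", 0.0}
    using ShockTubeState = std::map<std::string, double>;

    struct ShockTubeSetup {
      ShockTubeState left;   // filled from Initial:shock_tube:left_state
      ShockTubeState right;  // filled from Initial:shock_tube:right_state
    };

    // the initializer would then receive both states directly
    void initialize_shock_tube(const ShockTubeSetup& setup);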

Metal density field errors

As I found about a month ago in my version of Enzo-E (https://github.com/aemerick/enzo-e/tree/isolated-galaxy), the metal density field kept showing errors that grew with the number of cycles and appeared in grid-like patterns. But this only occurred when running in parallel, and showed no issues on a single core. I also tried this on the solver-dd branch (since mine has quite a few changes), and this still occurred.

I came up with a minimal example that shows this problem in my branch and the master branch. In my prior tests of this the issue only arose in the metal density field (and really any colour field I had) but not elsewhere. The attached test seems to show issues in other fields as well.

Interestingly, the behavior changes each time this is run (at least when using the master branch..... this is not something I found originally). Sometimes running in parallel works OK, sometimes it doesn't. I was testing with 2 and 4 cores.

You do have to add metal_density as a field to be refreshed in the PPM method (enzo_EnzoMethodPpm.cpp) to run this.

ppm_metal.txt

Compiler warnings

While building I noticed that there are many warnings thrown by the compiler.
I think it'd be great to aim for a warning-free compilation (for a subset of compilers everyone agrees upon).
If this is already the case (and I'm not using one of the compilers tested [here g++ (GCC) 10.1.0]), feel free to close the issue.
Otherwise, it may be worth keeping it open to track and discuss at some point.

DD solver performance in DM-only cosmology simulations

I ran a few DM-only cosmology simulations with the DD gravity solver in Enzo-E side-by-side with Enzo using identical parameters and initial conditions with data output turned on. These simulations were run on Frontera. The timings between the two codes in this application are notably different for both AMR and unigrid.

AMR run: 256^3 root-grid mesh with 4 layers of refinement. The simulations were initialized at z=99, and were run down to z=7 (at which point the Enzo-E run failed due to a memory error). The wall time for the two runs to progress to redshift 7 are:

  • Enzo-E run on 8 nodes (56 cores/node): 6 hr, 15 min.

  • Enzo run on 4 nodes: 1 min, 17 sec.

Unigrid run: 256^3 root-grid mesh. The simulations were initialized at z=30, and were run all the way down to z=0. The wall times to get to redshift 0 are:

  • Enzo-E run on 1 node: 23 min, 19 sec.

  • Enzo run on 1 node: 1 min, 14 sec.

Attached are a series of parameter files for both Enzo-E and Enzo, as well as the output files from the runs.
parameter_files.zip
output_files.zip

PPM / Hydro Differences with Enzo Dev

I decided to poke around the hydro routines as ported to Enzo-E to double check that everything is up to date from Enzo-dev. This looks to generally be the case, but I'm listing some exceptions here to solicit comments from those that know much more than me about these routines to see if these are things that need to be changed / updated or left alone before we issue a public release.

  1. ppm_de and the Euler sweep functions have been ported to C in Enzo dev. Enzo-E uses the Fortran versions. I'm assuming we should leave this alone until we get around to porting everything to C++.

  2. The initial calls (before the Riemann solver calls) to the pgas2d_dual and pgas2d Fortran routines in the Euler sweep functions (Grid_xEulerSweep.C, etc.) are commented out in Enzo dev, but are still called in Enzo-E (xEulerSweep.F, etc.). This is because Enzo dev computes pressure using Grid_ComputePressure prior to calling the underlying Fortran hydro routines. From what I can tell, Grid_ComputePressure and pgas2d_dual / pgas2d compute the pressure differently only if a non-default EOS is used (e.g. polytropic). So this difference is fine in Enzo-E because there is currently no option to change the EOS. Both codes still use pgas2d_dual and pgas2d to compute pressure later in the Euler sweeps to correct ge and e when using the dual energy formalism. However, this means that in both cases there is an inconsistency in how pressure is computed within the code when H_2 chemistry is used. In both Enzo and Enzo-E, calls to the C-based pressure routines apply corrections for multi-species (directly in Grid_ComputePressure in Enzo dev, and via calls to Grackle's compute_pressure in enzo_EnzoComputePressure in Enzo-E) but the Fortran routines do not. I'm assuming the only reasonable fix here (unfortunately) is to wait until everything is ported to C to have a single, consistent pressure computation throughout the code.

  3. Riemann Fallback is currently hard-coded to be turned on. I'm assuming this is likely O.K. behavior, but an option to disable it could easily be added, even if it is a low priority feature.

  4. The Riemann solver is currently hard-coded to be two-shock, with fallback to HLL. There is no option to just use HLL from the start, and HLLC has not been ported.

  5. ConservativeReconstruction and PositiveReconstruction are both hard-coded to be off in Enzo-E.

  6. The default value for the PPM flattening parameter is different in Enzo-E vs. Enzo dev: 3 and 0, respectively.

Unstated Root Block Array Shape Requirements

This is a long-standing issue that I have known about for a while.

There appears to be an unstated requirement that the number of blocks along each dimension of the array of root blocks must be a power of 2. In other words, each value in the list assigned to root_blocks in the parameter file must be a power of 2. When this invariant is violated, the program hangs indefinitely.

To replicate this behavior, you can use this parameter file (in this example, root_blocks = [2, 6, 1];). For comparison, this parameter file has all of the same conditions, except now root_blocks = [2, 4, 1];. When using the latter parameter file, you get the expected result (These parameter files were derived from the input/test_null.in file).

This constraint is not mentioned anywhere in the documentation (although I seem to recall that this is an underlying assumption baked into the yt frontend). Is this behavior intended?

If this is an intended requirement I'll make a pull request that includes:

  • an update to the documentation that mentions this requirement
  • a small check that raises an informative error when an invalid array shape is assigned to root_blocks (see the sketch after this list)
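
The check itself would be tiny; here is a self-contained sketch (the error-reporting style is purely illustrative and not tied to Cello's error macros):

    #include <cstdio>
    #include <cstdlib>

    inline bool is_power_of_two(int n) { return (n > 0) && ((n & (n - 1)) == 0); }

    void check_root_blocks(const int root_blocks[3], int rank)
    {
      for (int i = 0; i < rank; i++) {
        if (!is_power_of_two(root_blocks[i])) {
          std::fprintf(stderr, "Mesh:root_blocks[%d] = %d is not a power of 2\n",
                       i, root_blocks[i]);
          std::exit(EXIT_FAILURE);
        }
      }
    }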

Clarify interaction of different boundary conditions

It's not obvious how different types of boundaries interact at the edges and corners of the domain.

For example, imagine specifying an "inflow" boundary on the lower x boundary, and "outflow", "periodic", or "reflecting" boundaries on the other boundaries. How are the values decided for the cells lying in the x-y or x-z ghost zones? I would assume this is related to the order of the values in the Boundary:list parameter.

It would be useful if this could be investigated (either empirically or by examining the source code) and the documentation could be updated to reflect the code's behavior.

CT Asymmetry Bug

One of the MHD VL+CT tests fails, in which the L1 error norms of an inclined left-propagating and right-propagating fast magnetosonic wave are compared. The norms of the L1 error vector should be identical, but there is a slight discrepancy (the values are 3.30e-8 & 3.28e-8). When this test currently fails, the user is notified of the failure and of the fact that the test has never passed; the return code of the test script is not affected by failure of this test.

I'm fairly certain that this is caused by a bug in the Constrained Transport implementation.

Change documentation references to "value" boundaries to refer to "inflow" boundaries

Currently the Reference Guide on the website for the Enzo-E/Cello Parameters talks about a "value" type of boundary condition. These references to "value" boundaries should all be changed so that they instead refer to the "inflow" type of boundary condition (to reflect the fact that Enzo-E/Cello recognizes the "inflow" type and not the "value" type).

I believe that this confusion relates to the fact that the "inflow" boundary is implemented with the BoundaryValue class.

EnzoMethodGrackle SMP-mode Error

I encountered an error while trying to run one of my branches in SMP-mode: the code crashed when EnzoMethodGrackle::initialize_grackle_chemistry_data called initialize_chemistry_data. Note that I have not attempted to replicate this on the master branch, but I believe that this is a general problem. I have attached the error message here.

I intend to address this problem, but I could use a little input.

@jobordner, could you confirm that there is a synchronization step between when the parameter files get read and when the method objects get initialized? If that is the case, then I am pretty sure that multiple PEs call initialize_chemistry_data at once and each try to initialize the same object in the global shared namespace.

I can think of 2 main approaches for solving this problem:

  1. Introduce a global variable (or maybe a static member variable of each EnzoMethodGrackle) that tracks the most recent time at which the global chemistry data instance has been initialized and use some kind of locking mechanism to make sure only 1 PE per local node calls initialize_chemistry_data(&grackle_units_) per simulation timestep. The main downside to this approach is that I'm not sure what kind of locking mechanism to use. I've looked through the charm++ documentation a little and haven't come up with an obvious mechanism. @jobordner, are you aware of any approaches? (I don't suppose something like std::lock would work.) A minimal sketch of this option appears at the end of this issue.
  2. We could move away from using global data structures and track local versions of Grackle's chemistry data structures within each instance of EnzoMethodGrackle. To avoid needing to write a pup method for each data structure, we could just avoid pupping the data structures (other than chemistry_data) altogether and simply reconstruct them from scratch while unpacking.

@aemerick do you see any issues with either of these approaches or have any suggestions?
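
For discussion, here is a minimal sketch of option 1 using only standard C++ (maybe_initialize_grackle and grackle_init_cycle are hypothetical names, and whether std::mutex interacts safely with Charm++'s SMP threads is exactly the open question above):

    #include <mutex>

    static std::mutex grackle_init_mutex;
    static int grackle_init_cycle = -1;  // last simulation cycle for which initialization ran

    void maybe_initialize_grackle(int current_cycle)
    {
      // only one thread at a time gets past this point
      std::lock_guard<std::mutex> guard(grackle_init_mutex);
      if (grackle_init_cycle != current_cycle) {
        // initialize_chemistry_data(&grackle_units_);  // the actual Grackle call would go here
        grackle_init_cycle = current_cycle;
      }
    }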

Tweak CSlice interface

As discussed in PR #62, the current implementation of CSlice is lacking.

Under the current implementation, nullptr can be used to denote that a slice should extend from the start of an axis and/or that it should extend through the end of the axis. For example:

  • CSlice(1,nullptr) represents a slice extending from the second element through the last element along a given dimension.
  • CSlice(nullptr, 3) represents a slice that includes the first 3 elements. It's equivalent to CSlice(0, 3)
  • CSlice(nullptr, nullptr) and CSlice(0, nullptr) both represent slices over an entire dimension.

Both @forrestglines and @pgrete suggested that we move away from using nullptr and instead introduce new types. For example, we could introduce something like sl::ALL, sl::BEG, and sl::END.
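
A toy sketch of what the tag-based interface could look like (the tag names follow the suggestion above; ExampleSlice is deliberately not the existing CSlice class, and using a negative stop as the "through the end" sentinel is just one possibility):

    namespace sl {
      struct all_t {};  struct beg_t {};  struct end_t {};
      constexpr all_t ALL {};
      constexpr beg_t BEG {};
      constexpr end_t END {};
    }

    struct ExampleSlice {
      int start;
      int stop;  // a negative stop is interpreted as "through the last element"
      ExampleSlice(int a, int b)          : start(a), stop(b)  {}
      ExampleSlice(sl::beg_t, int b)      : start(0), stop(b)  {}
      ExampleSlice(int a, sl::end_t)      : start(a), stop(-1) {}
      ExampleSlice(sl::beg_t, sl::end_t)  : start(0), stop(-1) {}
      explicit ExampleSlice(sl::all_t)    : start(0), stop(-1) {}
    };

    // e.g. ExampleSlice(1, sl::END)  would replace CSlice(1, nullptr)
    //      ExampleSlice(sl::BEG, 3)  would replace CSlice(nullptr, 3)
    //      ExampleSlice(sl::ALL)     would replace CSlice(nullptr, nullptr)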

Additional Checkpoint Restart Tests

We need additional checkpoint restart tests for more method classes. I know for a fact that these tests don't exist for VL+CT or Grackle. I don't think these tests exist for the gravity solvers. In my experience, this will get broken if it isn't explicitly tested (I've personally broken it more than once).

Ambiguous meaning of "mass"

[This issue is in response to a review comment by @aemerick in #12.]

In EnzoMethodPmDeposit, the "dark" particle's mass is interpreted differently depending on whether cosmology is enabled:

// Required for Cosmology ("mass" is mass)
// Not for Collapse ("mass" is density)
if (cosmology) {
  dens *= std::pow(2.0,rank*level);
}

This confusion needs to be clarified. Also, "dark" particles should not be hard-coded into the method.
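
For what it's worth, my reading of the cosmology branch (an interpretation, not a statement from the method's author): each refinement level halves the cell width along each of the rank dimensions, so the cell volume shrinks by a factor of 2^rank per level, and converting a fixed particle mass into a cell density therefore picks up the factor 2^(rank * level) relative to the root level. In the collapse case "mass" already stores a density, so no rescaling is applied.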

Star formation and Isolated Galaxy Documentation

Currently, none of the new parameters in PR #49 are documented. These consist almost entirely of parameters that have regular Enzo equivalents, so documentation should be somewhat straightforward, but this has not yet been done.

Problems with scalar-expressions and logical-expressions in SMP mode

Overview

There are some issues with using scalar-expressions and logical-expressions in the parameter file when Enzo-E is compiled with a build of Charm++ that uses SMP mode. I think I replicated this problem about a year ago, but I first heard about this issue from @jobordner.

As I recall, the issue is related to thread-safety/reentrancy problems with either the parser or the ExprEval library (which actually evaluates the expressions).

Consequences

As far as I can tell, this affects 5 separate pieces of functionality:

  1. The "value" initial conditions (used to help initialize test problems using algebraic expressions)
    • In the documentation there is a warning that this initializer "does not work reliably for multi-node problems." It's not clear to me whether this is documenting the same issue being raised here or if there are separate problems that affect this initializer even when using charm++ without SMP-mode.
  2. The "vlct_bfield" initial conditions (assists with initializing face-centered magnetic fields for use with the VL+CT integrator)
    • Specifically, issues can arise when scalar-expressions are used to specify components of the magnetic vector potential (which are then used to automatically initialize divergence-free magnetic fields)
    • This initializer also supports some separate additional functionality that is unaffected by SMP mode.
  3. The "mask" type of refinement criterion. This is specified in the Adapt section of the parameter file and implemented with the RefineMask class.
  4. The "inflow" type of boundary condition. This is specified in the Boundary section of the parameter file and implemented with the BoundaryValue class.
  5. The general masks that can be specified in the Boundary section of the parameter file, which facilitate specifying multiple boundary objects that act on separate areas of a particular boundary.

Because evaluation of scalar-expressions and logical-expressions is relatively slow, the recommended best practice is to use specialized problem initializers for large production science simulations (instead of the "value" and "vlct_bfield" initializers). So in a sense, the issue with these initializers doesn't seem too problematic. At the same time, these initializers are used in a large fraction (if not the majority) of test problems. Thus, this issue significantly reduces test coverage in SMP mode (and potentially hides other existing issues).

The issues for the other functionality might cause problems in SMP mode for:

  • simulations using Static Mesh Refinement
  • wind tunnel simulations

Outstanding Questions

It might be useful to assess the extent of the problem.

  • Does this affect masks that are specified with a .png file? (I'd assume the answer is "no")
  • Are there any issues when a parameter that could accept a scalar-expression is just passed a single floating point value? (I'm pretty fuzzy on this, but I think I did some basic tests and found that the answer was "no").

If the problem is not easily addressable, maybe we should modify Enzo-E/Cello to exit with an error message if the user tries to use this functionality when the code was compiled in SMP mode.
