yt-project / libyt

In-situ analysis with yt

Home Page: https://libyt.readthedocs.io/

License: BSD 3-Clause "New" or "Revised" License

Languages: Shell 0.01%, C++ 89.26%, Python 0.28%, C 7.68%, Makefile 1.60%, CMake 1.16%

libyt's People

Contributors: cindytsai, hyschive, matthewturk, pre-commit-ci[bot]

libyt's Issues

Memory Consumption Check

Memory Consumption/Leakage Check

  • Check the memory consumption of each libyt API and see if there is any memory leakage.
  • Memory Consumption Check:
    • Run each API inside large for loops.
    • Use the valgrind tool massif.
    • Check with MPI.
    • Execute the bash script:
      # run with "sh run.sh <log_folder_name>"
      # naming
      LOG_FOLDER=$1
      EXE=API-test.out
      # execute
      export LD_PRELOAD=/home/calab912/software/valgrind-3.19.0/lib/valgrind/libmpiwrap-amd64-linux.so
      export PYTHONMALLOC=malloc
      mpirun -np 4 --output-filename $LOG_FOLDER /home/calab912/software/valgrind-3.19.0/bin/valgrind -v --tool=massif --time-unit=B --detailed-freq=1 ./$EXE 
    • Makefile:
      # path
      MPI_PATH   := /home/calab912/software/openmpi/3.1.5-gnu
      LIBYT_PATH := /home/calab912/Documents/GitHub/libyt
      
      # output name
      BIN  := API-test.out
      FILE := main.cpp
      COMPILER := $(MPI_PATH)/bin/mpic++
      
      # command
      $(BIN): $(FILE)
          $(COMPILER) -o $(BIN) $(FILE) -I$(LIBYT_PATH)/include -L$(LIBYT_PATH)/lib -lyt
      
      clean:
          rm -f $(BIN)
          rm -rf log
          rm -rf __pycache__
          rm -f *.png *.gif RecordTime_* massif.*
  • Python Memory Leakage Check:
    • Count the reference count of every libyt data member using sys.getrefcount (see the sketch below).
    • sys.getrefcount reports one more than the actual count, since the call itself holds a temporary reference to the value. (link)
      • Every value should have reference count 2.
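
A minimal sketch of the check, assuming it runs inside the embedded interpreter where the libyt module is importable; the loop below inspects only libyt.param_yt as an example:

import sys
import libyt

# sys.getrefcount(obj) reports one more than the actual count, because the
# call itself holds a temporary reference to obj while counting.
for key, value in libyt.param_yt.items():
    print(key, sys.getrefcount(value))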

Memory Consumption Caused By Each API

Other yt Operations

Covering Grid

Python API check

  • yt_init()
    • PyInit_libyt (init_libyt_module.cpp)
      • libyt, grid_data, hierarchy, param_yt, param_user
    • libyt_field_derived_func
      • Return 3d NumPy array owned by Python.
    • libyt_particle_get_attr
      • Return 1d NumPy array owned by Python.
    • libyt_field_get_field_remote
      • Return data[gid]["field"][3d-NumPyArray]
    • libyt_particle_get_attr_remote
      • Return data[gid]["ptype"]["attr"][1d-NumPyArray]
  • yt_set_parameter()
    • add_dict_string (add_dict.cpp)
      • param_yt
        • frontend, fig_basename
    • add_dict_scalar (add_dict.cpp)
      • param_yt
        • current_time, current_redshift, omega_lambda, omega_matter, hubble_constant, length_unit, mass_unit, time_unit, magnetic_unit, cosmological_simulation, dimensionality, refine_by, num_grids, domain_left_edge, domain_right_edge, periodicity, domain_dimensions
    • allocate_hierarchy() (allocate_hierarchy.cpp)
      • hierarchy
        • grid_left_edge, grid_right_edge, grid_dimensions, grid_particle_count, grid_parent_id, grid_levels, proc_num
  • yt_add_user_parameter_*
    • add_dict_string
    • add_dict_scalar
    • add_dict_vector3
  • yt_commit_grids()
    • add_dict_field_list (add_dict.cpp)
      • Dictionary param_yt["field_list"] (see the sketch after this list)
        • Key-Value: field name-Dictionary
          • "attribute": [ <field_unit>, [<field_name_alias>,], <field_display_name>]
          • "field_define_type": <field_define_type>
          • "swap_axes": true/false
          • "ghost_cell": [ ]
    • add_dict_particle_list (add_dict.cpp)
      • Dictionary param_yt["particle_list"]
        • Key-Value: ptype-Dictionary
          • "attribute": Dictionary
            • <attr_name1> : [ <attr_unit>, [<attr_name_alias>], <attr_display_name> ]
          • "particle_coor_label": [ <coor_x>, <coor_y>, <coor_z> ]
    • append_grid (append_grid.cpp)
      • grid_data
        • grid_data[gid]["field"]
  • yt_free()
  • yt_finalize()
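
A hedged sketch of the dictionary layout built by add_dict_field_list and add_dict_particle_list; the field name "Dens", particle type "io", and attribute "ParMass" are hypothetical examples:

param_yt = {
    "field_list": {
        "Dens": {
            "attribute": ["code_mass/code_length**3", ["density"], None],
            "field_define_type": "derived_field",   # one possible value
            "swap_axes": True,
            "ghost_cell": [0, 0, 0, 0, 0, 0],
        },
    },
    "particle_list": {
        "io": {
            "attribute": {
                "ParMass": ["code_mass", ["particle_mass"], None],
            },
            "particle_coor_label": ["ParPosX", "ParPosY", "ParPosZ"],
        },
    },
}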

Python Reference Count Check

Under the structure below, sys.getrefcount of every stored value is 2. The counts for libyt and its members are:

  • libyt: 9
    • Dictionary
      • grid_data: 3
      • hierarchy: 3
      • param_yt: 3
      • param_user: 3
    • Method
      • derived_func: 3
      • get_attr: 3
      • get_field_remote: 3
      • get_attr_remote: 3

Support Derived Function and Particle Get Attribute Function Prepare Multiple Data Chunks At A Time

Support Derived Function and Get Attribute Function Prepare Multiple Data Chunks At A Time

Problem

  • Currently, derived_func and derived_func_with_name prepare one data grid at a time.
  • This may cause poor performance for codes that rely heavily on hybrid OpenMP/MPI (e.g., gamer) to generate derived fields.
  • It also constrains the derived function's style: one must write the derived function in the form libyt expects.
  • That said, this problem does not actually affect libyt's overall performance for now.

Solution

libyt

Data Structure

  • #56
  • Update data members derived_func_chunk and derived_func_with_name_chunk in yt_field struct.
    • void (*derived_func_chunk) ( int list_length, long *list_gid, yt_array *data )
    • void (*derived_func_with_name_chunk) ( int list_length, long *list_gid, char *field, yt_array *data )
  • Update data member get_attr in yt_particle struct.
    • void (*get_attr) ( int list_length, long *list_gid, char *attr_name, yt_array *data_array)

Derived Function/Get Attribute Function C++ Extended Python Method

  • Use a Python dictionary to wrap the results and store them as NumPy arrays (see the sketch below).
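
One possible shape of that wrapper, assuming float64 grids with hypothetical ids and dimensions:

import numpy as np

# Each requested grid id maps to the NumPy array that wraps the buffer
# the chunked derived function filled in.
list_gid = [0, 1, 2]
chunk_result = {gid: np.empty((16, 16, 16), dtype=np.float64) for gid in list_gid}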

RMA

  • Remote memory access is only needed when MPI size > 1. Do we really need to prepare many grids at a time if the work is already parallelized across MPI ranks?
  • Update yt_rma_field.cpp when preparing grid data for remote ranks.
  • Update the init_libyt_module.cpp function libyt_field_get_field_remote so that it requests many grids at a time.

yt

  • Update io.py, find places to store grids.
  • Redesign _get_field_from_libyt, _read_chunk_data, and _read_fluid_selection.

Polishment

Tasks

  • Validate hierarchy: similar to _validate_parent_children_relasionship in yt/frontends/gamer/data_structures.py, but make it more general
    • Probably embed it in yt_check_grid
    • The grid id should be in the range 0 ~ num_grids - 1
    • The root level starts at 0.
    • If there is no parent grid, the parent grid id should be set to -1.
  • Communicate between the libyt frontend and simulation frontends
    • Already supports loading fields from a simulation frontend
    • Chosen option: load every key-value pair in the libyt.param_user dictionary into the libytDataset class (see the sketch after this list).
  • Support dimensionality < 3
  • MPI can't transfer arrays longer than int can hold, so libyt aborts when the long num_grids exceeds INT_MAX
  • Update the edge check under periodic boundary conditions: periodic[3], grid_left_edge[3], and grid_right_edge[3] ( #5 )
  • yt inline python script
    • The file name is fixed; should we make it changeable in every round?
    • #45
  • Rename the confusing functions yt_get_gridsPtr and yt_add_grids; change yt_add_grids to yt_commit_grids
  • Determine the API procedure
    • Option 1 (now): yt_get_gridsPtr -> yt_commit_grids() -> yt_inline -> yt_free_grids()
    • Option 2: call yt_add_grid() num_grids_local times -> yt_inline
  • Let the user specify which Python function to call in every round of inline analysis
  • Users must input a different fig_basename each round, otherwise figures are overwritten. Solution: append the number of calls to the inline script at the end of fig_basename.
  • Support inline analysis beyond yt_inline_ProjectionPlot(). Allow passing input parameters to functions, e.g., yt_inline_ProjectionPlot(a, b, c), etc.
  • Let yt_add_user_parameter_* add types other than a scalar or a 3-dim vector
  • Check the Python reference counting
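
A minimal sketch of the chosen option, assuming it runs inside the embedded interpreter; load_user_parameters is a hypothetical helper, not an actual libyt function:

import libyt

def load_user_parameters(ds):
    # Attach every libyt.param_user key-value pair to a libytDataset
    # instance, mimicking loading attributes from an on-disk parameter file.
    for key, value in libyt.param_user.items():
        setattr(ds, key, value)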

Polishment and Optimization

  • Set the MPI root rank; for now, it is fixed to rank 0.
  • Error message format
  • Names of input parameters; some might be a bit confusing
    • Change data_dim to data_dimensions.
  • Name of the libyt API

Update `libyt` Milestone

Update libyt Milestone

  • Support getting non-local grids
  • Support ghost zone
  • Minor changes and restrictions to field names.
  • A workaround for MPI send counts that exceed int.
  • Supported yt functionalities.

Extend yt support

Tasks

  • Support the following yt functionalities
    • OffAxisProjectionPlot
    • SlicePlot
    • OffAxisSlicePlot
    • Halo Analysis
    • Isocontours
    • volume_render (only works if the MPI size is even)
    • ParticlePlot
    • ParticleProjectionPlot
    • LinePlot
  • Distinguish which yt operations should go inside the if suite:
    if yt.is_root():
    • I think the core of the parallelism is data access, so probably every operation that has nothing to do with accessing data should be put inside this clause. (But this is my guess and should be checked further.)
    • For volume rendering, saving the rendered figure should NOT be inside the if yt.is_root(): clause. #26
    • For some of the annotations, saving the figure should NOT be inside if yt.is_root(). #35

Notes

  • Better to work with Matt on this.
  • Some of the above functionalities have not been parallelized with grid decomposition in yt, so they will request grids that don't exist on the local rank.
  • Halo Analysis and Isocontours have not been tested yet.
  • Enzo embedded python analysis may not support particles?
  • Related issue #14

Support Particle Functionalities

Support Particle Functionalities

Initially, particle plots may generate a false figure if the MPI size is too large. After testing on different machines, this does not seem to be an issue in libyt; it is more or less related to how much memory the machine has. Since we still cannot find where the issue actually lies, we moved it to another issue.

ℹ️ After testing on calab912 and eureka, it doesn't seem to be an issue. Instead, it is more or less related to the memory on the machine. Though we still don't know why, or where the bug is.

✔️ Particle plots (ParticlePlot, ParticleProjectionPlot) can successfully run in parallel in inline analysis, though some memory-related issues may occur.


Test Run on My Laptop with 16 GB RAM

All of the images are results from inline analysis. MPI=1 gives the correct and expected outcome.

  • Test Problem: gamer Plummer
  • Machine: My Laptop
ParticlePlot
import yt
yt.enable_parallelism()
def yt_inline():
    ds = yt.frontends.libyt.libytDataset()
    par = yt.ParticlePlot( ds, 'particle_position_x', 'particle_position_y', "particle_mass", center='c' )
    if yt.is_root():
        par.save()
  • MPI=1
    Fig000000000_Particle_z_particle_mass
  • MPI=2
    Fig000000000_Particle_z_particle_mass
ParticleProjectionPlot
import yt
yt.enable_parallelism()
def yt_inline():
    ds = yt.frontends.libyt.libytDataset()
    par = yt.ParticleProjectionPlot( ds, "z")
    if yt.is_root():
        par.save()
  • MPI=1
    Fig000000000_Particle_z_particle_ones
  • MPI=2
    Fig000000000_Particle_z_particle_ones
  • The upper-right cluster is different.

Fortran API

Tasks

  • Add a Fortran API, which is necessary for FLASH

Extend to ParaView

Extend libyt to ParaView

If yt can connect to ParaView (link), then maybe libyt can bypass Catalyst, a tool for simulation codes to do inline (in situ) analysis in ParaView.

There are, however, things worth noting for real-time volume rendering:

  • If we wish to do volume rendering in ParaView, we need NVIDIA IndeX. But it only supports serial processing, and it charges additional fees when run on a multi-node system. (link)
  • This means that if libyt really wants to support ParaView real-time volume rendering, only one node can be in charge of the inline analysis, and libyt is not designed for this kind of workflow yet.

Plot Modifications / Annotations Test

Plot Modifications Test

Cookbook Example

  • annotate_marker
  • annotate_text
  • annotate_arrow
  • annotate_clumps (fails at odd MPI sizes, succeeds at even MPI sizes ❌ )
  • annotate_contour
  • annotate_quiver
  • annotate_cquiver
  • annotate_grids
  • annotate_cell_edges
  • annotate_streamlines
  • annotate_velocity
  • annotate_scale
  • annotate_timestamp
  • annotate_ray
  • annotate_line
  • annotate_particles (figures are slightly different between MPI = 1 and MPI > 1 ❌ )
  • annotate_halo
  • annotate_magnetic_field
  • annotate_sphere
  • annotate_line_integral_convolution
  • annotate_title

Should Not Put save() and the Operation Inside the yt.is_root() Clause

  • annotate_streamlines
  • annotate_velocity
  • annotate_line_integral_convolution
  • annotate_particles
  • annotate_quiver
  • annotate_cquiver
  • annotate_magnetic_field

Error Message

Failed Somewhere Other than IOHandlerlibyt

  • annotate_clumps (fails at MPI = 3, succeeds at MPI = 1 and 4)

Failed at Opening the Saved Figure

  • This happens randomly.
  • Not sure if it is caused by moving save() outside of the if yt.is_root() clause.
  • #38

Miscellaneous issues

Tasks

  • libyt unknown message [DEBUG]
  • Code units and CGS units are mixed when plotting derived fields

Load Particle Data to `yt` through Wrapping the Existing Array

Load Particle Data to yt through Wrapping the Existing Array

If the particle data is stored in a contiguous memory block, we can wrap it directly and pass it to yt, so that we don't have to copy it again (see the sketch below).
This new API should coexist with the original one (where the user inputs get_attr, which returns the particle attribute array).
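
A NumPy sketch of the zero-copy idea, using a plain buffer to stand in for the simulation-owned particle array:

import numpy as np

# A contiguous buffer owned by "the simulation" (here just a bytearray).
raw = bytearray(8 * 1000)                 # room for 1000 float64 values

# np.frombuffer creates a view, not a copy: the ndarray shares the
# underlying memory, which is the behavior the proposed API needs.
par_mass = np.frombuffer(raw, dtype=np.float64)
assert par_mass.base is not None          # confirms no copy was made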

Notes

In order to merge enzo, I introduced the API for wrapping particle data arrays in libyt (#79). That PR doesn't support particle arrays just yet; I am still fixing some bugs.

Support more codes

Tasks

  • FLASH
  • Athena++
  • Enzo-e
  • ...

Notes

  • For FLASH, we need to support Fortran first (#6)

Support Time Series Figure

Support Time Series Figure

Plot time series datasets. This can already be done in the inline script, but the script becomes a little messy.

Timing libyt in GAMER

Tasks

  • Add a timer in GAMER for YT_Inline()
  • Measure its performance in real applications (e.g., cluster merger and isolated FDM halo)

Related Tasks

Group Different MPI Size or Nodes for In-Situ Analysis

Group Different MPI Size or Nodes for In-Situ Analysis

Currently, running a simulation with MPI size N makes the yt in-situ analysis also run with MPI size N.
We want to make this more flexible by letting users choose which MPI ranks run the simulation and which ranks run the in-situ analysis.

Feature

  • Use different number of MPI processes for simulations and in-situ analysis.

Demonstrate derived field with EoS

Tasks

  • Add a temperature derived field in GAMER for libyt by calling Hydro_Con2Temp()
  • Compute gas temperature in the ClusterMerger test problem and compare with the post-processing script gamer/example/test_problem/Hydro/ClusterMerger/yt_script/plot_slice-z.py
  • If possible, compute temperature/pressure/entropy in the CCSN simulations. (We tested entropy instead)

Check on libyt derived function functionality

  • Compare the temperature calculated through the EoS libyt derived function with the post-processing temperature output directly by gamer.

Related tasks

`libyt` Document

Document

This document is more of a user guide, plus pointers on how to reach the developer guide. It will be put inside README.md and the libyt/doc folder.

  • User Guide
    • Welcome message
      • Pointers for reaching further information.
    • Supported yt functionality
      • Which functionalities will fail. (Even though we already support getting non-local grids and particle data using libyt; see #26.)
    • User Guide table of contents in the main README.
    • Example code.

Code Optimize

  • Make searching in yt_rma_field and yt_rma_particle faster.

Support Ghost Zone

Support Ghost Zone

Definitions and Terms

  • Ghost cells are defined inside the yt_field struct.
    • We assume that different fields can have different numbers of ghost cells in each dimension of the data array.
    • A field must have the same number of ghost cells in every grid.
    • short field_ghost_cell[6] is defined as the number of cells to ignore at the beginning and the end of the data in each dimension: field_ghost_cell[0] is the number of cells to ignore at the beginning of the 0th dimension, and field_ghost_cell[1] the number to ignore at its end (see the sketch after this list).
  • We don't pass ghost cells in the hierarchy.
  • We load them to Python along with the field_list dictionary.
  • grid_dimensions and data_dim:
    • Ghost cells are not included in grid_dimensions; those are just the dimensions read by yt.
    • data_dim defined in yt_data is the actual dimension of the data_ptr to be wrapped; it includes ghost cells.
    • API:
      • grid_dimensions: [x][y][z] dimensions read by yt. (yt_getGridInfo_Dimensions API)
      • data_dim: the actual data dimensions of the pointer. (yt_getGridInfo_FieldData API)
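
A NumPy sketch of the relation between data_dim and grid_dimensions, using hypothetical ghost-cell counts:

import numpy as np

gc = [1, 1, 2, 2, 0, 0]          # field_ghost_cell[6]: [beg0, end0, beg1, end1, beg2, end2]
data = np.zeros((10, 12, 8))     # data_dim: the wrapped data_ptr, ghost cells included

# Trimming the ghost cells on both ends of each axis recovers what yt reads:
view = data[gc[0]:data.shape[0] - gc[1],
            gc[2]:data.shape[1] - gc[3],
            gc[4]:data.shape[2] - gc[5]]
print(view.shape)                # (8, 8, 8) == grid_dimensions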

TODOs

  • Define ghost cells inside yt_field.
    • Each side can have a different number of ghost zones.
    • Update yt_type_field.h.
  • Remove redundant assignments, since the constructor is now called on initialization.
  • Wrap the data array correctly. (append_grid.cpp)
  • Pass in ghost cells via field_list in the libyt.param_yt dictionary.
  • derived_func should generate grids without ghost zones only.
    • No need to update yt_getGridInfo, since we define grid_dimensions and data_dim in yt_data separately.
      • grid_dimensions: [x][y][z] dimensions read by yt. (yt_getGridInfo_Dimensions API)
      • data_dim: the actual data dimensions of the pointer. (yt_getGridInfo_FieldData API)
    • Check the gamer derived function. (No need to change, but still ...)
  • yt_rma_field transfers the full grid, including ghost zones.
    • yt_rma_grid_info should change as well.
    • Be aware of yt_rma_field::prepare_data; the data dimensions should include ghost cells.
  • Add MPI_PATH to the Makefile.
  • Update the libyt yt frontend.

Test Run

  • libyt/example
    • ProjectionPlot
    • SlicePlot

Related Issue

Naming in `libyt` Fields

Naming in libyt Fields

Frontend Native Fields: fields defined in XXXFieldInfo in yt frontends. They can be fields derived from other existing fields via functions defined inside XXXFieldInfo.

Things Should be Aware of

A field defined in XXXFieldInfo might not have the same name used inside libyt. For example, MagX and CCMagX in gamer: the first is used in inline analysis, while the second is used in post-processing, even though both represent the same field.
The function defined inside XXXFieldInfo that derives these added fields doesn't know they are the same, and it should fetch MagX instead of CCMagX in inline analysis, so it ends up raising an error.

Enhancement

We should add another data member in yt_field that matches a field to an already existing definition inside XXXFieldInfo.

Support python3, parallelization, derived fields

Support new features:

  • Work with Python 3.8
  • Simulation codes can use libyt to run parallel inline analysis with yt.
  • Let users input their own derived functions
    • For example, convert face-centered magnetic fields to cell-centered data only when yt needs them. This can save memory (see the sketch below).
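
A NumPy sketch of that example: averaging the two faces adjacent to each cell turns a face-centered field into a cell-centered one. Shapes here are hypothetical:

import numpy as np

bx_face = np.random.rand(17, 16, 16)    # face-centered Bx: one extra sample along x
bx_cc = 0.5 * (bx_face[:-1, :, :] + bx_face[1:, :, :])
print(bx_cc.shape)                      # (16, 16, 16), cell-centered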

Improve Memory Usage Efficiency and Other Miscellaneous

Improve Memory Usage Efficiency and Other Miscellaneous

Improve Memory Usage Efficiency

  • Change grid_levels from NPY_LONG to NPY_INT. (allocate_hierarchy.cpp, append_grid.cpp)

Look up grid info API using NumPy API

  • Always wrap the data passed in, so we can look up data info using the NumPy API instead of searching libyt's yt_grid array (see the sketch after this list).
  • Make yt_getGridInfo_* look up data in the libyt Python module.
    • I should use this; otherwise we are always searching for gid in the local grids array.
  • Return the data buffer from the libyt Python module in yt_getGridInfo_FieldData.
  • Dealing with particle_count_list: load each ptype separately, and then sum them up in the libyt frontend.
    • Loading particles. (allocate_hierarchy.cpp, append_grid.cpp)
    • Create grid_particle_count through the frontend. (data_structures.py, yt_commit.cpp)
    • Look-up API.
    • Get each ptype count through the ptype count array, and determine whether the RMA process is needed.
  • Add prototype to libyt.h.
  • Make sure every search uses this look up API.
  • Check RMA process.
  • Clean up: delete unused resource.
  • Check setting data dimension in append_grid.cpp. Do I need this?
  • Remember to free grids_local under g_param_yt, once yt_commit is done.
  • Update paper section 2.6.
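
A sketch of the look-up idea from the Python side, reading dimensions and dtype off the wrapped array itself; the grid id 0 and field name "Dens" are hypothetical:

import libyt

# Because the field buffer is already wrapped as an ndarray, its info can be
# read directly instead of searching the yt_grid array.
arr = libyt.grid_data[0]["Dens"]
data_dim, data_dtype = arr.shape, arr.dtype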

A Better Way to Gather and Pass Hierarchy to Python

  • Always wrap the data passed in
  • When passing data to Python
    • Do not create a key-value pair if there is no field data pointer to wrap when creating libyt.grid_data.
    • Check the RMA process; don't create a key-value pair if some data was not found on some rank.
      • It looks like I did this already...
  • When passing hierarchy to Python
    • Make Python stores one copy of full hierarchy only.
      • Assign libyt's hierarchy directly in yt.
        • Override _initialize_grid_arrays
        • Rewrite abstract _parse_index
      • Reexamine libyt's newly allocated hierarchy buffer.
      • Reexamine particle count data type.
        • Stay with long, since we now assign libyt allocated array in yt frontend.

Other Miscellaneous

  • Make MPI_Datatype yt_hierarchy_mpi_type initialize once only. (yt_commit_grids.cpp)
  • Make MPI_Datatype yt_rma_grid_info_mpi_type initialize once only. (yt_rma_field.cpp)
  • Make MPI_Datatype yt_rma_particle_info_mpi_type initialize once only. (yt_rma_particle.cpp)
  • Make the GitHub Action generate data through code instead of storing data directly in a txt file.
  • Simplify the example.
  • Print INFO logs only on the root rank.
  • Mark in-memory field data read-only when wrapping the data array in the API.
  • #62
  • Change user guide overview to a list of libyt API.
  • YT_ERROR: if you want to implement your new yt_dtype, you should modify both the yt_dtype enum and the get_npy_dtype function.
    • Caused by get_dtype_property printing an error log when it cannot get the NumPy data type.
    • libyt looks through data_dtype first and then field_dtype; it is OK if data_dtype is not set.
  • Make general info like MyRank and MySize global.
  • Use function templates in big_mpi.cpp (not feasible; we would have to declare it every time we use it).
  • Avoid copying in append_grid.cpp.

RMA

Bug

  • libyt is unable to finalize successfully on twnia3.

Add Timer for Performance Test

Add Timer for Performance Test

Section

  • Loading parameters: adding Python variables from the Python C/C++ API.
  • Wrapping grids into NumPy arrays.
  • Executing the inline script.
  • Getting data from derived fields.
  • Data transfer in RMA operations (one-sided MPI).
  • Clean up.

Links

  • A really ugly timer at cindytsai/yt branch libyt-timer: link
  • To use timer in libyt, see: link

Test GAMER

Tasks

  • Test libyt with GAMER
    • Various yt functionalities
      • Check table
      • When selecting data. (For example, sphere objects)
    • Derived fields
    • MHD
    • Particles
    • Check-data test under periodic boundary conditions. (Remember to turn on check_data in yt_param_libyt.)
      • The checks do not consider periodic conditions; test whether it works.
    • MPI parallelization
    • OpenMP parallelization
      • gamer itself supports OpenMP, but not libyt.
    • Performance
    • Memory consumption and deallocation

Test on Gradient Functionality

Test on Gradient Functionality

  • We expect it to fail, since this functionality requires nearby grids, so yt will definitely ask for non-local grids.
  • Related Issue: #26

TODO

  • Note in the documentation that this would fail. (Moved to #23.)

Code release

Tasks

  • Official repo
  • Documentation
  • License
  • Release Tag (?)

Update Minor Issues

Update libyt

  • MPI_Gatherv does not support send counts > INT_MAX.
  • Simplify choosing the data dimensions and data type between data_dim and grid_dimensions, and data_dtype and field_dtype.
  • If data_ptr == NULL, libyt shouldn't abort when data_dtype or field_dtype is not set and check_data == false: we don't need to wrap this array, hence there is no need to set the data type.
    • This is done inside append_grid.cpp.
  • Support more particle data types, e.g., YT_LONG.
    • Enum loop.

Invoke libyt in GAMER substeps

Tasks

  • Call YT_Inline() in the sub-step updates (i.e., in EvolveLevel())
  • May need to perform temporal interpolation on lower levels
    • Need to allocate additional arrays, perform the interpolation, and then pass these arrays (instead of amr->patch->fluid[]) to libyt
    • Add a runtime option for it
  • Add new criteria for determining when to call YT_Inline()

Production runs

Tasks

  • Test libyt in scientific production runs
    • Cluster merger
    • MW-sized FDM halo
    • FDM soliton random walk
    • CCSN?
  • What metrics should be collected?
  • Enzo too?

Add New Prototype For Derived Field Function

Add New Prototype For Derived Field Function in Struct yt_field

  • Under the yt_field struct, add a data member derived_func_with_name
    • void (*derived_func_with_name) (long, char *, double *);
    • This stores a universal derived field function, so that one can call func(gid, "Dens", data) and func(gid, "MomX", data) to get different derived fields by passing different field names (see the Python analogue after this list).
    • If one sets field_define_type == "derived_field", libyt will try the derived functions in this order:
      • derived_func
      • derived_func_with_name
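
A Python analogue of the name-dispatch idea behind the C prototype above; the field names and array shapes are hypothetical:

import numpy as np

def derived_func_with_name(gid, field_name):
    # One function serves every derived field by dispatching on the name.
    generators = {
        "Dens": lambda g: np.full((16, 16, 16), 1.0),
        "MomX": lambda g: np.zeros((16, 16, 16)),
    }
    return generators[field_name](gid)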

Check `yt` `save()` Function

Check yt Save Function

Description

In the inline script, we need save() outside the if yt.is_root() clause, because annotate_cquiver (and other annotations) makes save() do data I/O. When doing data I/O in libyt (using the functions inside io.py), each MPI rank must call the same method. See:

import yt
yt.enable_parallelism()
def yt_inline():
    ds = yt.frontends.libyt.libytDataset()
    slc = yt.OffAxisSlicePlot(ds, [1, 1, 0], [("gas", "density")], center="c")
    slc.annotate_cquiver(("gas", "cutting_plane_velocity_x"), ("gas", "cutting_plane_velocity_y"), factor=10, plot_args={"color":"orange"}, )
    slc.save()

Sometimes there will be a missing figure in the output series of figures. This may happen when each rank writes and creates a file with an identical name. (link)

Reload/Refresh Inline Script During Runtime

Reload/Refresh Inline Script During Runtime

We might want to analyze the data dynamically and get responses from the inline analysis directly, just like using IPython at runtime.

Features

  • Get error messages from Python and inform users, instead of terminating the whole process.
  • Load the script and interact with it dynamically during the code's runtime.
  • Changes persist throughout the rest of the process.
  • Export the current functions.
  • Determine and see which functions will run in the following steps.

Enhancement

- [ ] Colorful python prompt terminal
- [ ] Indent

  • Parsing traceback errors
  • string or char *? Change to string.

Working Procedure

When to enter and exit interactive mode

  1. Run user script in try --> stop if it goes wrong
  2. Detect LIBYT_STOP file --> stop if detected

In interactive mode

  • Each inline function execute results/status:
    * Inline function execute status:
      * yt_inline() ...... finished!
      * yt_inline_arg() .. failed
        * Traceback message ...
    
  • Enter interactive mode >>>:
    >>> if a == "something":
    ...     print("run something")
    

TODOs

Problems

  • Scope problem: originally, every inline function runs in the inline script's namespace. When we dynamically load and update these functions, we are in the global namespace, but we need to be in the module's namespace (see the sketch after this list).
    • Possible solution:
      • Use the inline_script namespace for every input from the user.
      • Use exec and pass in sys.modules["inline_script"].__dict__
  • yt.enable_parallelism() might run multiple times if reloading the script is not set up correctly.
    • It's OK to run it multiple times, but it wastes time.
    • Possible solution:
  • Indentation: everything must have the same indentation as the inline script.
    • Possible solution:
      • If we test each line of code to see whether it compiles, then we don't have this problem.
  • How to tell if an input is valid?
    • Possible solution:
      • Test whether these lines of code compile.
      • IPython?
  • Cannot change previously entered lines. (Put this last to solve.)
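
A minimal sketch of the exec-based solution; "inline_script" is the module name assumed above, and it must already be imported for sys.modules to contain it:

import sys

# Execute user input inside the inline script's module namespace, so that a
# redefined function replaces the one libyt will call in the next round.
code = 'def yt_inline():\n    print("updated")'
exec(code, sys.modules["inline_script"].__dict__)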

Makefile

  • Add compile option INTERACTIVE_MODE
    - [ ] Cleaner Makefile.

yt Finalize

  • Python objects (freed by Py_Finalize())
  • Function Status Vector (Don't need to.)

Try, Except, Finally

  • try: execute the inline script
  • except: only the root rank prints the full traceback; the other ranks print no error message.
    • Store the error message somewhere else, so that we can print it in yt_interactive_mode.
    • Parse the traceback message to make it more readable for users.
  • finally: sync status?
  • Use %libyt exit to exit. (temp)

libyt command

  • Should start with %libyt, like %libyt ...
  • %libyt exit: exit interactive mode, and enter next iteration of simulation.

Tests

  • Errors raised by the libyt module
    • C
    • Python
  • Errors raised by yt
  • Test on a cluster

Particle functionalities may generate false figure if memory is almost full

Particle Functionalities May Generate False Figure if Memory is Almost Full

Related Issue

Description

Initially, particle plots may generate a false figure if the MPI size is too large. After testing on different machines, this does not seem to be an issue in libyt; it is more or less related to how much memory the machine has. But we still cannot find where the issue actually lies.

(I haven't reproduced the issue.)

Paper

Tasks

  • Release the code (#8)
  • Work with Matt
  • Where to submit (e.g., ApJS)?

Performance

Performance

  • #43
    • Scaling is bad with OpenMP and OpenMPI in volume rendering.

Support Dask

Support Dask

Dask is a flexible library for parallel computing in Python, and it is growing in popularity in the Python ecosystem. Because libyt does in-situ analysis by running a Python script, it is important to support this as well.

Current libyt structure

Each MPI rank initializes a Python interpreter, and they work together through mpi4py.

| MPI 0 ~ (N-1)       |
| ------------------- |
| Python              |
| libyt Python Module |
| libyt C/C++ library |
| Simulation          |

How should dask be set up inside embedded Python?

We can dedicate two additional ranks to the scheduler and the client (not necessarily MPI 0 and 1), and use the rest of the MPI ranks as workers. The simulation also runs inside the workers. By following how dask-mpi's initialize() sets up the scheduler, client, and workers, it should be possible to wrap this inside libyt (see the sketch after the table below).

| MPI 0               | MPI 1               | MPI 2               | ...                 | MPI (N-1)           |
| ------------------- | ------------------- | ------------------- | ------------------- | ------------------- |
| Scheduler           | Client              | Worker              | Worker              | Worker              |
| libyt Python Module | libyt Python Module | libyt Python Module | libyt Python Module | libyt Python Module |
| libyt C/C++ library | libyt C/C++ library | libyt C/C++ library | libyt C/C++ library | libyt C/C++ library |
| Empty               | Empty               | Simulation          | Simulation          | Simulation          |
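
A sketch of the dask-mpi pattern referenced above; whether libyt can drive this from the embedded interpreter is exactly what this issue explores:

from dask_mpi import initialize
from dask.distributed import Client

initialize()        # rank 0 -> scheduler, rank 1 -> client code, the rest -> workers
client = Client()   # connects to the scheduler that initialize() started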

Solve data exchange problem

Because we use remote memory access (one-sided MPI) with settings that require every rank to participate in the procedure (#26), libyt suffers during the data exchange process between MPI nodes: every time yt reads data, all ranks must wait for each other and synchronize.
However, if we move this data exchange process from the C/C++ side to the Python side, it becomes possible to exchange data more flexibly and asynchronously using dask. By encoding what each MPI rank should get into a Dask graph, asking workers to prepare their local grid data, and exchanging data between workers, it will be much easier.
(At least much easier than using C/C++. 😅 )

Update milestones

Tasks

  • Update Milestones
    • Contents
    • Overview
      • Basic Idea
      • Support Python Versions
      • yt Supported Operations
      • Procedure and libyt API
      • Inline Python Script
      • Example
    • Embedded Python in C/C++ Application
      • Create libyt python module
      • Initialize Embedded Python and Load User Script
      • Load Data Through Python API
      • Load Data Through NumPy API
      • Free Resource in Python and Finalize Embedded Python
      • Connect libyt to yt by libyt frontend
    • Parallelization
      • Install mpi4py
      • Enable Parallelism and Inline-Analysis in yt
      • Parallel Process in yt
      • Record MPI Rank in libytGrid
    • Support In-Memory Fields and Collect Hierarchy
      • Load Hierarchy Information and Field Data at Local Rank
      • Collect Hierarchy in Each Rank
      • Connect In-Memory Field Data and Information to Python
    • Support Derived Fields
      • Set Field Information for Derived Fields
      • Connect Derived Field Data to Python
    • Support Particles
      • Set Particle Information
      • Connect Particle Data to Python
    • Code Structure
      • Configuration
        • Check Data or Not
      • libyt API (future, if we have time; I think the comments in the source code are clear enough.)
      • libyt Data Type
      • libyt python module
  • Examine all links
  • Review each section.
  • Book mode
  • GitHub wiki

Links

https://hackmd.io/@Viukb0eMS-aeoZQudVyJ2w/ryCYwu0xF

Support more `yt_set_UserParameter*`.

Support more yt_set_UserParameter

For other yt code frontends to use their own field definitions in in-situ analysis, we need to create APIs that can input all kinds of parameters. But I'm not sure what to implement yet.

Support Enzo

Tasks

  • Support libyt in Enzo
  • Test on yt.ParticlePlot with different ptype particles.
  • Test ghost zone.
  • Test on face-centered data.

Notes

  • Work with Matt on this

Annotate Particles Generate False Figure

Annotate Particles Generate False Figure

Originally, we thought this issue was related to the particle functionalities. Since it's not, we moved it here.

  • Annotations in a plot (annotate_particle)
    • Subclass of PlotCallback
    • Create plot via this function.
  • Particle functionalities (ParticlePlot, ParticleProjectionPlot)
    • Different class hierarchy from PlotCallback

Particles

Tasks

  • Small module: call a C function in the Python script to collect particle data
    • libyt should provide a function pointer for this function
    • Simulation codes should fill in this function
  • Additionally, check how yt operates on particle objects
    • Parent grids might not have their children's particle data; we need to get it from other MPI ranks.
    • When not at the highest AMR level, how does yt get the particles at that level?
      • Try covering_grid for example.
  • Support multiple particle types
    • libyt
      • Read: GAMER particle in HDF5
      • Bridge _read_particle_fields in io.py with simulation code using libyt
        • Add read particle function when initializing libyt python module
        • Determine what the user should input to libyt
        • Update yt_type_grid.h, so that it stores particle count of different ptype.
        • Update example in libyt.
        • Check procedure
    • yt
      • Read: _read_particle_fields in io.py
      • Connect libyt frontend with this function
  • Test with GAMER
  • Test particle filter (e.g., see this example)

Annotate Clumps Not Working in Odd MPI Size

Annotate Clumps Not Working in Odd MPI Size

  • Test Problem: gamer MHD Vortex
  • Inline Script:
    import numpy as np
    import yt
    from yt.data_objects.level_sets.api import Clump, find_clumps
    yt.enable_parallelism()
    def yt_inline():
        ds = yt.frontends.libyt.libytDataset()
        data_source = ds.all_data()
    
        c_min = 10 ** np.floor(np.log10(data_source[("gas", "density")]).min())
        c_max = 10 ** np.floor(np.log10(data_source[("gas", "density")]).max() + 1)
    
        master_clump = Clump(data_source, ("gas", "density"))
        master_clump.add_validator("min_cells", 20)
    
        find_clumps(master_clump, c_min, c_max, 2.0)
        leaf_clumps = master_clump.leaves
    
        prj = yt.ProjectionPlot(ds, "z", ("gamer","Dens"), center="c")
        prj.annotate_clumps(leaf_clumps)
    
        # With or without this if clause, it still fails when MPI = 3.
        if yt.is_root():
            prj.save()
  • Description:
    • FAILED at MPI = 3; it gets stuck somewhere other than class IOHandlerlibyt.
    • SUCCEEDED at MPI = 1 and MPI = 4.
      • The result looks just like the projection plot, because I didn't set a range that actually grabs a clump.
        Fig000000001_Projection_z_Dens

Polishment and Enhancement

Polishment and Enhancement

These are some TODOs from ( #11 ) that we found unnecessary to accomplish.

Support dimensionality < 3

  • It only supports dimensionality = 3.

Make Inline Python Script Changeable

  • The file name is fixed throughout the whole runtime and cannot be altered.

Make yt_add_user_parameter_* Support More Input Types

  • Currently supports only scalars and 3-dim vectors.
  • This API is used when adding frontend-specific XXXDataset attributes in yt.

Set MPI Root Rank

  • We assume root rank is 0.
  • Since in most cases all nodes should have similar performance, setting root rank is unnecessary.
