grainlearning / grainlearning

A Bayesian uncertainty quantification toolbox for discrete and continuum numerical models of granular materials, developed by various projects of the University of Twente (NL), the Netherlands eScience Center (NL), University of Newcastle (AU), and Hiroshima University (JP).

Home Page: https://grainlearning.readthedocs.io/

License: GNU General Public License v2.0

Python 15.35% Shell 0.05% PureBasic 1.07% Jupyter Notebook 83.53%
bayesian-inference low-discrepancy-sequences mixture-models parameter-identification sequential-monte-carlo uncertainty-quantification

grainlearning's Introduction

Welcome to GrainLearning!

(Badges: GitHub repo, license, Research Software Directory, PyPI, citation DOI, CII best practices checklist, fair-software.eu, documentation status, coverage and quality gate, JOSS paper DOI)

Bayesian uncertainty quantification for discrete and continuum numerical models of granular materials, developed by various projects of the University of Twente (NL), the Netherlands eScience Center (NL), University of Newcastle (AU), and Hiroshima University (JP). Browse to the GrainLearning documentation to get started.

Features

Installation

Install using poetry

  1. Install poetry following these instructions.
  2. Clone the repository: git clone https://github.com/GrainLearning/grainLearning.git
  3. Go to the source code directory: cd grainLearning
  4. Activate the virtual environment: poetry shell
  5. Install GrainLearning and its dependencies: poetry install

Install using pip

  1. Clone the repository: git clone https://github.com/GrainLearning/grainLearning.git
  2. Go to the source code directory: cd grainLearning
  3. Create and activate a virtual environment: conda create --name grainlearning python=3.11 && conda activate grainlearning
  4. Install GrainLearning and its dependencies: pip install .

Developers, please refer to README.dev.md.

To install GrainLearning including the RNN module capabilities, check grainlearning/rnn/README.md.

For Windows users

  • Installation using Windows Subsystem for Linux (WSL)
    • Enable WSL1 or WSL2 according to the instructions here
    • Install GrainLearning using poetry or pip
  • Installation using Anaconda (if WSL is not available on your Windows system)
    • Open Anaconda Prompt and install GrainLearning following the pip instructions above; the conda create step creates a virtual environment named grainlearning.
    • Choose that environment in Anaconda Navigator: click Environments and select grainlearning from the drop-down menu

One command installation

Stable versions of GrainLearning can be installed via pip install grainlearning. However, you still need to clone the GrainLearning repository to run the tutorials.
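As a quick smoke test of a pip installation, the import below should succeed. Printing the version assumes the package exposes __version__; if it does not, the bare import is still a useful check:

    import grainlearning
    print(grainlearning.__version__)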

Tutorials

  1. Linear regression with the run_sim callback function of the BayesianCalibration class, in python_linear_regression_solve.py (see the sketch after this list)

  2. Nonlinear, multivariate regression

  3. Interact with the numerical model of your choice via run_sim, in linear_regression_solve.py

  4. Load existing DEM simulation data and run GrainLearning for one iteration, in oedo_load_and_resample.py

  5. Example of GrainLearning integration into YADE

  6. Data-driven module tutorials:
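As a taste of tutorial 1, here is a minimal sketch of a run_sim callback for linear regression. The names (BayesianCalibration.from_dict, calib.system.param_data, set_sim_data, and the keys of the configuration dict) follow the tutorial at the time of writing but should be treated as assumptions; python_linear_regression_solve.py is the authoritative version.

    import numpy as np
    from grainlearning import BayesianCalibration

    x_obs = np.arange(100)
    y_obs = 0.2 * x_obs + 5.0  # synthetic "observation" with true a = 0.2, b = 5.0

    def run_sim(calib):
        """Evaluate y = a*x + b for every parameter sample and hand the results back."""
        sim_data = []
        for params in calib.system.param_data:
            y_sim = params[0] * calib.system.ctrl_data + params[1]
            sim_data.append(np.array(y_sim, ndmin=2))
        calib.system.set_sim_data(sim_data)

    calibration = BayesianCalibration.from_dict({
        "num_iter": 10,
        "system": {
            "sim_name": "linear",
            "param_names": ["a", "b"],
            "param_min": [0.001, 0.001],
            "param_max": [1.0, 10.0],
            "num_samples": 20,
            "obs_data": y_obs,
            "ctrl_data": x_obs,
            "callback": run_sim,
        },
        "calibration": {
            "inference": {"ess_target": 0.3},
            "sampling": {"max_num_components": 1},
        },
    })
    calibration.run()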

Citing GrainLearning

Please choose from the following:

  • A DOI for citing the software (see the DOI badge).
  • The software paper: Cheng et al., (2024). GrainLearning: A Bayesian uncertainty quantification toolbox for discrete and continuum numerical models of granular materials. Journal of Open Source Software, 9(97), 6338, 10.21105/joss.06338
  • H. Cheng, T. Shuku, K. Thoeni, P. Tempone, S. Luding, V. Magnanimo. An iterative Bayesian filtering framework for fast and automated calibration of DEM models. Comput. Methods Appl. Mech. Eng., 350 (2019), pp. 268-294, 10.1016/j.cma.2019.01.027

Software using GrainLearning

Community

The original development of GrainLearning was done by Hongyang Cheng, in collaboration with Klaus Thoeni, Philipp Hartmann, and Takayuki Shuku. The software is currently maintained by Hongyang Cheng and Stefan Luding with the help of Luisa Orozco and Retief Lubbe. The GrainLearning project receives contributions from students and collaborators.

Help and Support

For assistance with the GrainLearning software, please create an issue on the GitHub Issues page.

Credits

This package was created with Cookiecutter and the NLeSC/python-template.

grainlearning's People

Contributors: apjansen, chyalexcheng, klausthoeni, luisaforozco, pabrod, retiefasuarus


grainlearning's Issues

Make predictions using only input and param data

It seems that the function predict_batch (formerly predict_macroscopic) takes a TensorFlow dataset, from which only the first element ({'inputs': self.input_data, 'params': param_data}) is used. The "inputs" entry of that dictionary contains num_samples copies of the same input sequence, which could be optimized away.
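A hedged sketch of one possible optimization: call the model on a plain dict of tensors instead of building a tf.data.Dataset that materializes num_samples copies of the input sequence. Here model, input_data, and param_data are stand-ins for the real objects, not the current API:

    import numpy as np
    import tensorflow as tf

    def predict_batch(model, input_data, param_data):
        # broadcast the single input sequence across the sample axis once,
        # instead of duplicating it inside a Dataset per sample
        num_samples = param_data.shape[0]
        inputs = np.broadcast_to(input_data, (num_samples,) + input_data.shape)
        return model({"inputs": tf.constant(inputs), "params": tf.constant(param_data)})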

Maintenance of poetry.toml

The project's dependencies in pyproject.toml require maintenance:

  • Isolate the core code from the developer, RNN, plotting, and interactive dependencies

See the poetry documentation on optional groups.

  • Update dependencies with poetry update
  • packages = [ { include = "grainlearning"} ] should be under [tool.poetry]

Feedback on GrainLearning from Balazs

  • The need for deterministic parameter sampling
    • after each iteration
    • in an independent GL run
  • The need for constraints during parameter sampling
    • situations where parameters have constraints, e.g. v0 < v1 < v2
    • the possibility for users to define their own resample function (see the sketch after this list)
  • A common interface, as in other codes
    • e.g. scipy or MATLAB:
      def run_sim(batch_parameters, ...):
          # loop over the parameter samples within the code
          return batch_of_samples
      
    • or at least have it well documented and explained
  • Overall, improve the documentation; more documentation, etc.
  • A way to access the current iteration (system linked with the calibration module?)
  • On loading a previous proposal: an example of how it can be done in run_sim
  • Add a link to the new GL repository on the old one
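A hypothetical sketch of a user-defined resample function that enforces an ordering constraint such as v0 < v1 < v2 by rejection; none of these names are GrainLearning API:

    import numpy as np

    def constrained_resample(draw, num_samples,
                             constraint=lambda v: bool(np.all(np.diff(v) > 0))):
        """draw(n) returns an (n, dim) array of candidate parameter samples;
        keep drawing until num_samples candidates satisfy the constraint."""
        accepted = []
        while len(accepted) < num_samples:
            accepted.extend(v for v in draw(num_samples) if constraint(v))
        return np.array(accepted[:num_samples])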

Callbacks to DEM software missing in new GL

TODOs

  • Names of the parameters
  • Callback functions to YADE, MercuryDPM, LIGGGHTS, etc.
  • I/O
  • A callback to use GL as a postprocessing-only tool
  • Directory structure to ensure backward compatibility

JOSS Review: Paper Comments

Hi! I’m reviewing your paper for JOSS and opening small issues as I come across them. This issue corresponds to my comments on the manuscript.

The paper hits all of the desired points by JOSS, but I personally found it to be very terse and not informative about the software. I think substantially more detail should be added to the paper such that, when cited by researchers, a reader can briefly look at the paper and understand the functionality and design of the software. I feel comfortable requesting this change as you have very extensive documentation and I feel that using some of that content to formally present the software in the paper is not too arduous of a task.

Below are my comments that need to be addressed in the revised paper:

  1. While the summary does a good job at setting the context of the field, I find the summary to be lacking in describing what GrainLearning does. Could you please add another two sentences or so highlighting the broad strokes of what GrainLearning does, beyond “emulating granular behavior” and “learning the uncertainties?”
  2. The ‘State of the Field’ section reads more like what I would expect for the ‘Statement of Need’. I think you should merge these and shorten.
  3. I don’t believe lines 32 - 40 should be in the statement of need as they describe the functionality of the software.
  4. The “functionality” section of the paper is highly tailored to a narrow scientific community, and I think obscures the very cool capabilities of the model. I would greatly appreciate if, at a minimum, you could provide a software architecture diagram that highlights the GrainLearning data model. This will allow a quick reader to glance at the paper and understand what GrainLearning takes in and what it puts out.
  5. It is not clear to me what the authors mean by “calibration” in their functionality statement. This may be because I am not in-field, but a definition would be useful. My reading of this statement does not jibe with the type of model calibration I am used to in Bayesian model evaluation.
  6. I also would appreciate a statement of contribution of the authors. I’m very happy that all authors are listed as co-first authors, but I think with this many, a statement of who-did-what would be useful. If all authors equally contributed to writing/coding/testing/deploying/etc, great! Having that written down would only make it more clear that this was a highly collaborative effort.

Error handling in invalid simulations

I'd like to open a discussion on possible techniques for handling errors when simulation results do not make sense.

For example: a DEM simulation with a certain combination of parameters might become unstable, and its output may be considered invalid (NaN, Inf, etc.). What are some techniques in GrainLearning to handle this situation without restarting the whole calibration process?

Proposed solutions:

  • We could resample parameters from the same distribution with an error-handling callback function
  • Let SMC identify invalid simulations and reject them in the calculation of the posterior distribution (see the sketch below)
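A minimal sketch of the second proposal, assuming the inference step exposes per-sample likelihoods as a numpy array; the function name and array shapes are hypothetical:

    import numpy as np

    def mask_invalid(sim_data: np.ndarray, likelihoods: np.ndarray) -> np.ndarray:
        """Zero out the likelihood of any sample whose output contains NaN/Inf,
        then renormalize so the importance weights still sum to one."""
        flat = sim_data.reshape(sim_data.shape[0], -1)
        invalid = ~np.isfinite(flat).all(axis=1)
        masked = np.where(invalid, 0.0, likelihoods)
        total = masked.sum()
        return masked / total if total > 0 else masked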

Clean up RNN branch

  • Remove unnecessary files
  • Remove/merge duplicate boilerplate files
  • Change file structure to merge with grainlearning
  • Update requirements
  • etc

Generalization of RNN module

The RNN module is specific to triaxial compression: the data needs to be grouped as pressure - experiment_type, and the values of those subgroups are also added to the inputs of the model.
Further evaluation of the amount of work and the specific tasks to be tackled will follow.

Installation with IDEs

We saw at the TUSAIL school that quite a few students use IDEs like Visual Studio and PyCharm.
We may want to include instructions on how to install within these IDEs.

Handle paths using Pathlib and/or os.path.join

Since GL is a package that will be used on different platforms, i.e. Windows, Linux, and macOS, it is necessary that its paths be platform independent. Currently GL works on macOS and Linux but would not on Windows, where paths are written with single or double \ instead of /.

To solve this I propose:

use pathlib:

from pathlib import Path
data_folder = Path("source_data/text_files/")
file_to_open = data_folder / "raw_data.txt"

# It also works when building a path from strings on the fly:
>>> a_string = "path/to/something"
>>> a_path = Path(a_string + "/new/directory/type")
>>> a_path
PosixPath('path/to/something/new/directory/type')

# and when appending a str to a Path:
>>> a_path / "new/string/path"
PosixPath('path/to/something/new/directory/type/new/string/path')

and/or also use os.path.join

import os
from pathlib import Path

data_folder = os.path.join("source_data", "text_files")
file_to_open = os.path.join(data_folder, "raw_data.txt")

# This also works when joining a pathlib Path and a string:
>>> ab = Path("test/one")
>>> os.path.join(ab, "other/test/")
'test/one/other/test/'

Add possibility to have multiple control variables

In some cases, the control variables (system.ctrl_data) are not one-dimensional. They could be treated in the same way as the observations (system.obs_data), i.e. as a list instead of a single value.

Also in this same issue: should we update the documentation to make it clear that for both system.ctrl_data and system.obs_data the last dimension is time?

RNN integration to GL

The RNN module is made for the triaxial dataset. It should be generalized to

  • take DynamicSystem.ctrl_data (input), DynamicSystem.param_data (input), and DynamicSystem.sim_data (output) coming out of GL directly
  • #69
  • #71
  • allow a mixed use of machine-learning and model-based simulations

Improving callback functions

I'd like to open a discussion on how we can improve callback functions:

  1. The way simulation data is set is non-standard, i.e. different from other optimization libraries (scipy, etc.). A suggestion is to take parameters as input and return simulation data as output (see the sketch after this list).
  2. In connection to 1, if more functionality is required, should additional callback functions (error handling, pre/post callbacks, etc.) be introduced?
  3. Being located in the DynamicSystems class, the callback cannot access additional information from other classes (like inference).
  4. If the user requires special functionality, documentation on how it can be done, along with common cases, should be provided, for example writing a new BayesianCalibration.run() function.
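A sketch of suggestion 1, a scipy-style batch interface. Here simulate is a toy stand-in for an actual model run, and nothing below is current GrainLearning API:

    import numpy as np

    def simulate(params: np.ndarray) -> np.ndarray:
        # toy model standing in for a real (e.g. DEM) simulation
        x = np.linspace(0.0, 1.0, 100)
        return params[0] * x + params[1]

    def run_sim(param_data: np.ndarray) -> np.ndarray:
        """Batch interface: (num_samples, num_params) in, (num_samples, num_steps) out."""
        return np.stack([simulate(p) for p in param_data])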

On Windows, wandb doesn't generate latest-run

When trying to merge the RNN module into GrainLearning, the CI/CD showed that all tests were passing on Linux and macOS but not on Windows.
I debugged it on a Windows machine and found that the issue comes from wandb (see the reported issue).
The error occurs specifically in the unit test test_rnn_model.py/test_train at:

assert Path("wandb/latest-run/files/model-best.h5").exists()

Indeed, on Windows the symlink to latest-run is not automatically created by wandb.

Other options provided by wandb to access the files of the latest run include:

  • syncing the runs to wandb and then retrieving the run with the closest creation date;
  • creating a docker container to have dry runs locally on your machine.

Both options are overkill for unit tests.

A dirty option is to manually search for the latest folder (see the sketch below), but this seems hard to generalize across platforms (unix and win32). Including a platform variable might be an option, but it comes at a cost: the code becomes more complex and its maintenance more error-prone.
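A sketch of that dirty option, assuming wandb's default layout of run-* directories under a wandb/ folder; it picks the newest directory by modification time instead of relying on the latest-run symlink:

    from pathlib import Path

    def latest_run_dir(wandb_root: str = "wandb") -> Path:
        """Return the most recently modified run-* directory under wandb_root."""
        run_dirs = [p for p in Path(wandb_root).glob("run-*") if p.is_dir()]
        return max(run_dirs, key=lambda p: p.stat().st_mtime)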

GL packaging

@Retiefasuarus

  • Python wheel (making sure it works for different Python versions)
  • Write a simple tutorial in a Jupyter notebook to be used on Colab

Unwanted persistence in interactive notebooks

Variables defined as class attributes belong to the class, not to the object: Python class attributes are shared between all instances of the class. Therefore, re-initializing an object will not reset its class attributes. This is unwanted behavior for interactive use.

Proposed solution:
Move the class attributes of all classes within GrainLearning to instance attributes set in __init__ (see the illustration below).
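A minimal illustration of the pitfall (generic Python, not GrainLearning code):

    class Bad:
        history = []              # class attribute: one list shared by all instances

        def record(self, x):
            self.history.append(x)

    class Good:
        def __init__(self):
            self.history = []     # instance attribute: a fresh list per object

    a = Bad()
    a.record(1)
    b = Bad()                     # a "new" object in a notebook re-run...
    assert b.history == [1]       # ...still sees the old data
    assert Good().history == []   # the instance-attribute version resets as expected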

Bias introduced by GMM

GMM introduces bias into the sampler. Two ways to remedy this (a sketch of the first follows below):

  1. Slice sampling:
  • train the GMM
  • define the lowest score from the trained GMM on the expanded samples as the threshold
  • generate samples from a low-discrepancy sequence (which one has the property of not being affected by cutting or truncation?)
  • include those scoring higher than the threshold
  2. Restore the uniform prior before the data assimilation loop.
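A sketch of remedy 1, assuming scikit-learn's GaussianMixture and scipy's Sobol sequence (whether Sobol tolerates this kind of truncation is exactly the open question above):

    import numpy as np
    from scipy.stats import qmc
    from sklearn.mixture import GaussianMixture

    def threshold_resample(gmm: GaussianMixture, expanded_samples: np.ndarray,
                           num_new: int, lower: np.ndarray, upper: np.ndarray) -> np.ndarray:
        """Keep low-discrepancy samples whose GMM score exceeds the lowest
        score of the expanded samples."""
        threshold = gmm.score_samples(expanded_samples).min()
        sobol = qmc.Sobol(d=len(lower), scramble=True)
        accepted = []
        while len(accepted) < num_new:
            candidates = qmc.scale(sobol.random(num_new), lower, upper)
            accepted.extend(candidates[gmm.score_samples(candidates) >= threshold])
        return np.array(accepted[:num_new])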

Remove python version 3.8 from workflows and released wheels

Hello all,
as many projects have now done, maybe we should consider removing Python 3.8 from the tested Python versions in the CI/CD.
We also see some workflows failing because some dependencies no longer generate binaries (Python wheels) for that Python version.

If the maintainers agree with this change I can propose making the changes myself.
I tag @chyalexcheng @Retiefasuarus here 😃

IODynamicClass redundant init variables

Are obs_data and sim_data necessary for the IODynamic class if it reads files?
@chyalexcheng

    def __init__(
        self,
        sim_name: str,
        sim_data_dir: str,
        sim_data_file_ext: str,
        obs_data_file: str,
        obs_names: List[str],
        ctrl_name: str,
        num_samples: int,
        param_min: List[float],
        param_max: List[float],
        obs_data: np.ndarray = None,
        ctrl_data: np.ndarray = None,
        inv_obs_weight: List[float] = None,
        sim_data: np.ndarray = None,
        callback: Callable = None,
        curr_iter: int = 0,
        param_data_file: str = '',
        param_data: np.ndarray = None,
        param_names: List[str] = None,
    ):

Automatically run a hyperparameter tuning

Running tutorials/data_driven/LSTM/hyperbola_calibration_mixed_hypertuning.py produces the following error.

400 response executing GraphQL.
{"errors":[{"message":"400 Bad Request: The browser (or proxy) sent a request that this server could not understand.","path":["upsertSweep"]}],"data":{"upsertSweep":null}}
wandb: ERROR Error while calling W&B API: 400 Bad Request: The browser (or proxy) sent a request that this server could not understand. (<Response [400]>)

Functions relevant to this are my_training_function, hyper_train(), and line 112.

Correlation metrics among other data analytics tools

We should add some functionality to analyze the data, essentially the (parameter) samples and the weights on them; a sketch of a possible starting point follows below.
@luisaforozco, I remember you had some correlation matrix plots when working on the CNN. Do you think we can reuse some of them? Feel free to bring in your ideas as well.
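One possible starting point: a weighted correlation matrix over the parameter samples using only numpy. Here param_data (num_samples, num_params) and weights (num_samples,) are assumed to come out of a calibration run:

    import numpy as np

    def weighted_corr(param_data: np.ndarray, weights: np.ndarray) -> np.ndarray:
        """Correlation matrix of the parameters under the importance weights."""
        cov = np.cov(param_data, rowvar=False, aweights=weights / weights.sum())
        std = np.sqrt(np.diag(cov))
        return cov / np.outer(std, std)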
