grainlearning / grainlearning

A Bayesian uncertainty quantification toolbox for discrete and continuum numerical models of granular materials, developed by various projects of the University of Twente (NL), the Netherlands eScience Center (NL), University of Newcastle (AU), and Hiroshima University (JP).

Home Page: https://grainlearning.readthedocs.io/

License: GNU General Public License v2.0

Python 15.35% Shell 0.05% PureBasic 1.07% Jupyter Notebook 83.53%
bayesian-inference low-discrepancy-sequences mixture-models parameter-identification sequential-monte-carlo uncertainty-quantification

grainlearning's Introduction

Welcome to GrainLearning!

(Badges: GitHub repo, license, Research Software Directory, PyPI, citation DOI, CII best practices checklist, fair-software.eu, documentation status, coverage and quality gate, JOSS paper DOI)

Bayesian uncertainty quantification for discrete and continuum numerical models of granular materials, developed by various projects of the University of Twente (NL), the Netherlands eScience Center (NL), University of Newcastle (AU), and Hiroshima University (JP). Browse to the GrainLearning documentation to get started.

Features

Installation

Install using poetry

  1. Install poetry following these instructions.
  2. Clone the repository: git clone https://github.com/GrainLearning/grainLearning.git
  3. Go to the source code directory: cd grainLearning
  4. Activate the virtual environment: poetry shell
  5. Install GrainLearning and its dependencies: poetry install

Install using pip

  1. Clone the repository: git clone https://github.com/GrainLearning/grainLearning.git
  2. Go to the source code directory: cd grainLearning
  3. Create and activate a virtual environment: conda create --name grainlearning python=3.11 && conda activate grainlearning
  4. Install GrainLearning and its dependencies: pip install .

Developers, please refer to README.dev.md.

To install GrainLearning including the RNN module capabilities, check grainlearning/rnn/README.md.

For Windows users

  • Installation using Windows Subsystem for Linux (WSL)
    • Enable WSL1 or WSL2 according to the instructions here
    • Install GrainLearning using poetry or pip
  • Installation using Anaconda (if WSL is not available on your Windows system)
    • Open Anaconda Prompt and install GrainLearning following the pip instructions above; the conda create step creates a virtual environment named grainlearning.
    • Choose that environment in Anaconda Navigator: click Environments and select grainlearning from the drop-down menu

One command installation

Stable versions of GrainLearning can be installed via pip install grainlearning. However, you still need to clone the GrainLearning repository to run the tutorials.
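As a quick smoke test of a pip installation, the import below should succeed. Printing the version assumes the package exposes __version__; if it does not, the bare import is still a useful check:

    import grainlearning
    print(grainlearning.__version__)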

Tutorials

  1. Linear regression with the run_sim callback function of the BayesianCalibration class, in python_linear_regression_solve.py (see the sketch after this list)

  2. Nonlinear, multivariate regression

  3. Interact with the numerical model of your choice via run_sim, in linear_regression_solve.py

  4. Load existing DEM simulation data and run GrainLearning for one iteration, in oedo_load_and_resample.py

  5. Example of GrainLearning integration into YADE

  6. Data-driven module tutorials:
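As a taste of tutorial 1, here is a minimal sketch of a run_sim callback for linear regression. The names (BayesianCalibration.from_dict, calib.system.param_data, set_sim_data, and the keys of the configuration dict) follow the tutorial at the time of writing but should be treated as assumptions; python_linear_regression_solve.py is the authoritative version.

    import numpy as np
    from grainlearning import BayesianCalibration

    x_obs = np.arange(100)
    y_obs = 0.2 * x_obs + 5.0  # synthetic "observation" with true a = 0.2, b = 5.0

    def run_sim(calib):
        """Evaluate y = a*x + b for every parameter sample and hand the results back."""
        sim_data = []
        for params in calib.system.param_data:
            y_sim = params[0] * calib.system.ctrl_data + params[1]
            sim_data.append(np.array(y_sim, ndmin=2))
        calib.system.set_sim_data(sim_data)

    calibration = BayesianCalibration.from_dict({
        "num_iter": 10,
        "system": {
            "sim_name": "linear",
            "param_names": ["a", "b"],
            "param_min": [0.001, 0.001],
            "param_max": [1.0, 10.0],
            "num_samples": 20,
            "obs_data": y_obs,
            "ctrl_data": x_obs,
            "callback": run_sim,
        },
        "calibration": {
            "inference": {"ess_target": 0.3},
            "sampling": {"max_num_components": 1},
        },
    })
    calibration.run()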

Citing GrainLearning

Please choose from the following:

  • A DOI for citing the software (see the DOI badge).
  • The software paper: Cheng et al., (2024). GrainLearning: A Bayesian uncertainty quantification toolbox for discrete and continuum numerical models of granular materials. Journal of Open Source Software, 9(97), 6338, 10.21105/joss.06338
  • H. Cheng, T. Shuku, K. Thoeni, P. Tempone, S. Luding, V. Magnanimo. An iterative Bayesian filtering framework for fast and automated calibration of DEM models. Comput. Methods Appl. Mech. Eng., 350 (2019), pp. 268-294, 10.1016/j.cma.2019.01.027

Software using GrainLearning

Community

The original development of GrainLearning was done by Hongyang Cheng, in collaboration with Klaus Thoeni, Philipp Hartmann, and Takayuki Shuku. The software is currently maintained by Hongyang Cheng and Stefan Luding with the help of Luisa Orozco and Retief Lubbe. The GrainLearning project receives contributions from students and collaborators.

Help and Support

For assistance with the GrainLearning software, please create an issue on the GitHub Issues page.

Credits

This package was created with Cookiecutter and the NLeSC/python-template.

grainlearning's People

Contributors: apjansen, chyalexcheng, klausthoeni, luisaforozco, pabrod, retiefasuarus


grainlearning's Issues

Make predictions using only input and param data

It seems that the function predict_batch (formerly predict_macroscopic) takes a TensorFlow dataset, from which only the first element ({'inputs': self.input_data, 'params': param_data}) is used. The "inputs" entry of that dictionary contains num_samples copies of the same input sequence, which could be optimized away.
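A hedged sketch of one possible optimization: call the model on a plain dict of tensors instead of building a tf.data.Dataset that materializes num_samples copies of the input sequence. Here model, input_data, and param_data are stand-ins for the real objects, not the current API:

    import numpy as np
    import tensorflow as tf

    def predict_batch(model, input_data, param_data):
        # broadcast the single input sequence across the sample axis once,
        # instead of duplicating it inside a Dataset per sample
        num_samples = param_data.shape[0]
        inputs = np.broadcast_to(input_data, (num_samples,) + input_data.shape)
        return model({"inputs": tf.constant(inputs), "params": tf.constant(param_data)})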

Maintenance of poetry.toml

The project's dependencies in pyproject.toml require maintenance:

  • Isolate the core code from the developer, RNN, plotting, and interactive dependencies

See the poetry documentation on optional groups.

  • Update dependencies with poetry update
  • packages = [ { include = "grainlearning"} ] should be under [tool.poetry]

Feedback on GrainLearning from Balazs

  • The need for deterministic parameter sampling
    • after each iteration
    • in an independent GL run
  • The need for constraints during parameter sampling
    • situations where parameters have constraints, e.g. v0 < v1 < v2
    • the possibility for users to define their own resample function (see the sketch after this list)
  • A common interface, as in other codes
    • e.g. scipy or MATLAB:
      def run_sim(batch_parameters, ...):
          # loop over the parameter samples within the code
          return batch_of_samples
      
    • or at least have it well documented and explained
  • Overall, improve the documentation; more documentation, etc.
  • A way to access the current iteration (system linked with the calibration module?)
  • On loading a previous proposal: an example of how it can be done in run_sim
  • Add a link to the new GL repository on the old one
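A hypothetical sketch of a user-defined resample function that enforces an ordering constraint such as v0 < v1 < v2 by rejection; none of these names are GrainLearning API:

    import numpy as np

    def constrained_resample(draw, num_samples,
                             constraint=lambda v: bool(np.all(np.diff(v) > 0))):
        """draw(n) returns an (n, dim) array of candidate parameter samples;
        keep drawing until num_samples candidates satisfy the constraint."""
        accepted = []
        while len(accepted) < num_samples:
            accepted.extend(v for v in draw(num_samples) if constraint(v))
        return np.array(accepted[:num_samples])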

Callbacks to DEM software missing in new GL

TODOs

  • Names of the parameters
  • Callback functions to YADE, MercuryDPM, LIGGGHTS, etc.
  • I/O
  • A callback to use GL as a postprocessing-only tool
  • Directory structure to ensure backward compatibility

JOSS Review: Paper Comments

Hi! I’m reviewing your paper for JOSS and opening small issues as I come across them. This issue corresponds to my comments on the manuscript.

The paper hits all of the desired points by JOSS, but I personally found it to be very terse and not informative about the software. I think substantially more detail should be added to the paper such that, when cited by researchers, a reader can briefly look at the paper and understand the functionality and design of the software. I feel comfortable requesting this change as you have very extensive documentation and I feel that using some of that content to formally present the software in the paper is not too arduous of a task.

Below are my comments that need to be addressed in the revised paper:

  1. While the summary does a good job at setting the context of the field, I find the summary to be lacking in describing what GrainLearning does. Could you please add another two sentences or so highlighting the broad strokes of what GrainLearning does, beyond “emulating granular behavior” and “learning the uncertainties?”
  2. The ‘State of the Field’ section reads more like what I would expect for the ‘Statement of Need’. I think you should merge these and shorten.
  3. I don’t believe lines 32 - 40 should be in the statement of need as they describe the functionality of the software.
  4. The “functionality” section of the paper is highly tailored to a narrow scientific community, and I think obscures the very cool capabilities of the model. I would greatly appreciate if, at a minimum, you could provide a software architecture diagram that highlights the GrainLearning data model. This will allow a quick reader to glance at the paper and understand what GrainLearning takes in and what it puts out.
  5. It is not clear to me what the authors mean by “calibration” in their functionality statement. This may be because I am not in-field, but a definition would be useful. My reading of this statement does not jibe with the type of model calibration I am used to in Bayesian model evaluation.
  6. I also would appreciate a statement of contribution of the authors. I’m very happy that all authors are listed as co-first authors, but I think with this many, a statement of who-did-what would be useful. If all authors equally contributed to writing/coding/testing/deploying/etc, great! Having that written down would only make it more clear that this was a highly collaborative effort.

Error handling in invalid simulations

I'd like to open a discussion on possible techniques for handling errors when simulation results do not make sense.

For example: a DEM simulation with a certain combination of parameters might become unstable, and its output may be considered invalid (NaN, Inf, etc.). What are some techniques in GrainLearning to handle this situation without restarting the whole calibration process?

Proposed solutions:

  • We could resample parameters from the same distribution with an error-handling callback function
  • Let SMC identify invalid simulations and reject them in the calculation of the posterior distribution (see the sketch below)
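A minimal sketch of the second proposal, assuming the inference step exposes per-sample likelihoods as a numpy array; the function name and array shapes are hypothetical:

    import numpy as np

    def mask_invalid(sim_data: np.ndarray, likelihoods: np.ndarray) -> np.ndarray:
        """Zero out the likelihood of any sample whose output contains NaN/Inf,
        then renormalize so the importance weights still sum to one."""
        flat = sim_data.reshape(sim_data.shape[0], -1)
        invalid = ~np.isfinite(flat).all(axis=1)
        masked = np.where(invalid, 0.0, likelihoods)
        total = masked.sum()
        return masked / total if total > 0 else masked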

Clean up RNN branch

  • Remove unnecessary files
  • Remove/merge duplicate boilerplate files
  • Change file structure to merge with grainlearning
  • Update requirements
  • etc

Generalization of RNN module

The RNN module is specific to triaxial compression: the data needs to be grouped as pressure - experiment_type, and the values of those subgroups are also added to the inputs of the model.
Further evaluation of the amount of work and the specific tasks to be tackled will follow.

Installation with IDEs

We saw at the TUSAIL school that quite a few students use IDEs like Visual Studio and PyCharm.
We may want to include instructions on how to install within these IDEs.

Handle paths using Pathlib and/or os.path.join

Since GL is a package that will be used on different platforms, i.e. Windows, Linux, and macOS, it is necessary that its paths be platform independent. Currently GL works on macOS and Linux but would not on Windows, where paths are written with single or double \ instead of /.

To solve this I propose:

use pathlib:

from pathlib import Path
data_folder = Path("source_data/text_files/")
file_to_open = data_folder / "raw_data.txt"

# It also works when building a path from strings on the fly:
>>> a_string = "path/to/something"
>>> a_path = Path(a_string + "/new/directory/type")
>>> a_path
PosixPath('path/to/something/new/directory/type')

# and when appending a str to a Path:
>>> a_path / "new/string/path"
PosixPath('path/to/something/new/directory/type/new/string/path')

and/or also use os.path.join

import os
from pathlib import Path

data_folder = os.path.join("source_data", "text_files")
file_to_open = os.path.join(data_folder, "raw_data.txt")

# This also works when joining a pathlib Path and a string:
>>> ab = Path("test/one")
>>> os.path.join(ab, "other/test/")
'test/one/other/test/'

Add possibility to have multiple control variables

In some cases, the control variables (system.ctrl_data) are not one-dimensional. They could be treated in the same way as the observations (system.obs_data), i.e. as a list instead of a single value.

Also in this same issue: should we update the documentation to make it clear that for both system.ctrl_data and system.obs_data the last dimension is time?

RNN integration to GL

The RNN module is made for the triaxial dataset. It should be generalized to

  • take DynamicSystem.ctrl_data (input), DynamicSystem.param_data (input), and DynamicSystem.sim_data (output) coming out of GL directly
  • #69
  • #71
  • allow a mixed use of machine-learning and model-based simulations

Improving callback functions

I'd like to open a discussion on how we can improve callback functions:

  1. The way simulation data is set is non-standard, i.e. different from other optimization libraries (scipy, etc.). A suggestion is to take parameters as input and return simulation data as output (see the sketch after this list).
  2. In connection to 1, if more functionality is required, should additional callback functions (error handling, pre/post callbacks, etc.) be introduced?
  3. Being located in the DynamicSystems class, the callback cannot access additional information from other classes (like inference).
  4. If the user requires special functionality, documentation on how it can be done, along with common cases, should be provided, for example writing a new BayesianCalibration.run() function.
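A sketch of suggestion 1, a scipy-style batch interface. Here simulate is a toy stand-in for an actual model run, and nothing below is current GrainLearning API:

    import numpy as np

    def simulate(params: np.ndarray) -> np.ndarray:
        # toy model standing in for a real (e.g. DEM) simulation
        x = np.linspace(0.0, 1.0, 100)
        return params[0] * x + params[1]

    def run_sim(param_data: np.ndarray) -> np.ndarray:
        """Batch interface: (num_samples, num_params) in, (num_samples, num_steps) out."""
        return np.stack([simulate(p) for p in param_data])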

On Windows, wandb doesn't generate latest-run

When trying to merge the RNN module into GrainLearning, the CI/CD showed that all tests were passing on Linux and macOS but not on Windows.
I debugged it on a Windows machine and found that the issue comes from wandb (see the reported issue).
The error occurs specifically in the unit test test_rnn_model.py/test_train at:

assert Path("wandb/latest-run/files/model-best.h5").exists()

Indeed, on Windows the symlink to latest-run is not automatically created by wandb.

Other options provided by wandb to access the files of the latest run include:

  • syncing the runs to wandb and then retrieving the run with the closest creation date;
  • creating a docker container to have dry runs locally on your machine.

Both options are overkill for unit tests.

A dirty option is to manually search for the latest folder (see the sketch below), but this seems hard to generalize across platforms (unix and win32). Including a platform variable might be an option, but it comes at a cost: the code becomes more complex and its maintenance more error-prone.
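A sketch of that dirty option, assuming wandb's default layout of run-* directories under a wandb/ folder; it picks the newest directory by modification time instead of relying on the latest-run symlink:

    from pathlib import Path

    def latest_run_dir(wandb_root: str = "wandb") -> Path:
        """Return the most recently modified run-* directory under wandb_root."""
        run_dirs = [p for p in Path(wandb_root).glob("run-*") if p.is_dir()]
        return max(run_dirs, key=lambda p: p.stat().st_mtime)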

GL packaging

@Retiefasuarus

  • Python wheel (making sure it works for different Python versions)
  • Write a simple tutorial in a Jupyter notebook to be used on Colab

Unwanted persistence in interactive notebooks

Variables defined as class attributes belong to the class, not to the object: Python class attributes are shared between all instances of the class. Therefore, re-initializing an object will not reset its class attributes. This is unwanted behavior for interactive use.

Proposed solution:
Move the class attributes of all classes within GrainLearning to instance attributes set in __init__ (see the illustration below).
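A minimal illustration of the pitfall (generic Python, not GrainLearning code):

    class Bad:
        history = []              # class attribute: one list shared by all instances

        def record(self, x):
            self.history.append(x)

    class Good:
        def __init__(self):
            self.history = []     # instance attribute: a fresh list per object

    a = Bad()
    a.record(1)
    b = Bad()                     # a "new" object in a notebook re-run...
    assert b.history == [1]       # ...still sees the old data
    assert Good().history == []   # the instance-attribute version resets as expected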

Bias introduced by GMM

GMM introduces bias into the sampler. Two ways to remedy this (a sketch of the first follows below):

  1. Slice sampling:
  • train the GMM
  • define the lowest score from the trained GMM on the expanded samples as the threshold
  • generate samples from a low-discrepancy sequence (which one has the property of not being affected by cutting or truncation?)
  • include those scoring higher than the threshold
  2. Restore the uniform prior before the data assimilation loop.
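A sketch of remedy 1, assuming scikit-learn's GaussianMixture and scipy's Sobol sequence (whether Sobol tolerates this kind of truncation is exactly the open question above):

    import numpy as np
    from scipy.stats import qmc
    from sklearn.mixture import GaussianMixture

    def threshold_resample(gmm: GaussianMixture, expanded_samples: np.ndarray,
                           num_new: int, lower: np.ndarray, upper: np.ndarray) -> np.ndarray:
        """Keep low-discrepancy samples whose GMM score exceeds the lowest
        score of the expanded samples."""
        threshold = gmm.score_samples(expanded_samples).min()
        sobol = qmc.Sobol(d=len(lower), scramble=True)
        accepted = []
        while len(accepted) < num_new:
            candidates = qmc.scale(sobol.random(num_new), lower, upper)
            accepted.extend(candidates[gmm.score_samples(candidates) >= threshold])
        return np.array(accepted[:num_new])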

Remove python version 3.8 from workflows and released wheels

Hello all,
as many projects have now done, maybe we should consider removing Python 3.8 from the tested Python versions in the CI/CD.
We also see some workflows failing because some dependencies no longer generate binaries (Python wheels) for that Python version.

If the maintainers agree with this change I can propose making the changes myself.
I tag @chyalexcheng @Retiefasuarus here 😃

IODynamicClass redundant init variables

Are obs_data and sim_data necessary for the IODynamic class if it reads files?
@chyalexcheng

    def __init__(
        self,
        sim_name: str,
        sim_data_dir: str,
        sim_data_file_ext: str,
        obs_data_file: str,
        obs_names: List[str],
        ctrl_name: str,
        num_samples: int,
        param_min: List[float],
        param_max: List[float],
        obs_data: np.ndarray = None,
        ctrl_data: np.ndarray = None,
        inv_obs_weight: List[float] = None,
        sim_data: np.ndarray = None,
        callback: Callable = None,
        curr_iter: int = 0,
        param_data_file: str = '',
        param_data: np.ndarray = None,
        param_names: List[str] = None,
    ):

Automatically run a hyperparameter tuning

Running tutorials/data_driven/LSTM/hyperbola_calibration_mixed_hypertuning.py produces the following error.

400 response executing GraphQL.
{"errors":[{"message":"400 Bad Request: The browser (or proxy) sent a request that this server could not understand.","path":["upsertSweep"]}],"data":{"upsertSweep":null}}
wandb: ERROR Error while calling W&B API: 400 Bad Request: The browser (or proxy) sent a request that this server could not understand. (<Response [400]>)

Functions relevant to this are my_training_function, hyper_train(), and line 112.

Correlation metrics among other data analytics tools

We should add some functionality to analyze the data, essentially the (parameter) samples and the weights on them; a sketch of a possible starting point follows below.
@luisaforozco, I remember you had some correlation matrix plots when working on the CNN. Do you think we can reuse some of them? Feel free to bring in your ideas as well.
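One possible starting point: a weighted correlation matrix over the parameter samples using only numpy. Here param_data (num_samples, num_params) and weights (num_samples,) are assumed to come out of a calibration run:

    import numpy as np

    def weighted_corr(param_data: np.ndarray, weights: np.ndarray) -> np.ndarray:
        """Correlation matrix of the parameters under the importance weights."""
        cov = np.cov(param_data, rowvar=False, aweights=weights / weights.sum())
        std = np.sqrt(np.diag(cov))
        return cov / np.outer(std, std)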
