modeci / mdf

This repository contains the source for the MDF specification and Python API

Home Page: https://mdf.readthedocs.io

License: Apache License 2.0

Languages: Python 99.01%, Shell 0.99%
Topics: machine-learning, neuroscience, onnx

mdf's Introduction

mdf logo


ModECI Model Description Format (MDF)

Click here for the full MDF documentation

Note: MDF is still in development! See the open issues related to the specification or go here to get in contact regarding MDF. The MDF format was first proposed following a meeting organised at Princeton in July 2019 by Russ Poldrack of the Center for Reproducible Neuroscience (CRN) at Stanford and the Brain Imaging Data Structure (BIDS) initiative. For more on the previous work in this area, see here.

Overview

MDF is an open source, community-supported standard and associated library of tools for expressing computational models in a form that allows them to be exchanged between diverse programming languages and execution environments. The overarching aim is to provide a common format for models across computational neuroscience, cognitive science and machine learning.

It consists of a specification for expressing models in serialized formats (currently JSON, YAML and BSON representations are supported, though others such as HDF5 are planned) and a set of Python tools for implementing a model described using MDF. The serialized formats can be used when importing a model into a supported target environment to execute it; and, conversely, when exporting a model built in a supported environment so that it can be re-used in other environments.

The MDF Python API can be used to create or load an MDF model for inspection and validation. It also includes a basic execution engine for simulating models in the format. However, this is not intended to provide an efficient, general-purpose simulation environment, nor is MDF intended as a programming language. Rather, the primary purpose of the Python API is to facilitate and validate the exchange of models between existing environments that serve different communities. Accordingly, these Python tools include bi-directional support for importing to and exporting from widely-used programming environments in a range of disciplines, and for easily extending these to other environments.
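
As a minimal sketch of what this looks like in practice (the class and method names below follow the examples in this repo; treat the exact signatures as indicative rather than definitive):

from modeci_mdf.mdf import Model, Graph, Node, Parameter, OutputPort

# Build a one-node model
mod = Model(id="Simple")
mod_graph = Graph(id="simple_example")
mod.graphs.append(mod_graph)

node = Node(id="input_node")
node.parameters.append(Parameter(id="input_level", value=0.5))
node.output_ports.append(OutputPort(id="out_port", value="input_level"))
mod_graph.nodes.append(node)

# Serialize to the supported text formats
mod.to_json_file("simple.json")
mod.to_yaml_file("simple.yaml")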

Development

The implementation and dissemination of the MDF language and associated tools is being carried out by the Model Exchange and Convergence Initiative (ModECI), which has been supported by the NSF Convergence Accelerator Program (Track D: AI-Driven Innovation via Data and Model Sharing), as a publicly accessible open-source project. The initial design has been informed by a series of workshops involving developers of key software environments and other stakeholders in machine learning, cognitive science and neuroscience. Future workshops will address broadening of support to other domains in basic and applied science and technology development (e.g., population biology, medical informatics, structural and environmental monitoring, and complex systems control). Environments for which support is currently being developed include PyTorch, ONNX, WebGME, NeuroML, PsyNeuLink, and ACT-R.

mdf interactions
Fig 1: Some of the current and planned formats which MDF will interact with. Click on the image for more information.

Successful interfacing of MDF to existing disciplinary standards (such as ONNX in machine learning, and NeuroML in neuroscience) as well as general-purpose simulation environments (such as WebGME) will permit bridging between these environments, and translation to the broader set of environments supported by those standards (such as TensorFlow & Keras in the case of ONNX, and The Virtual Brain and SONATA in the case of NeuroML). Initial investigations have also taken place, in collaboration with projects in the NSF Accelerator Track C (Quantum Technology), to use MDF for facilitating the implementation of computational models on quantum hardware.

The core elements of the MDF standard

Models The highest level construct in MDF is a model that consists of one or more graphs and model attributes. The former describe the operational features of the model (its structure and execution), while the latter provide additional information (metadata) useful for executing, evaluating, testing or visualizing it.

Graphs A graph specifies the structure and process flow of a model. The most fundamental element of a graph is a node, which specifies some unit of computation in terms of its parameters and functions. Nodes are connected to other nodes via directed edges, which, in the absence of additional conditions, define the computational flow of the model.

Nodes These define the core elements of computation in a graph; they receive and transmit information via their input and output ports. In general, ports represent points of contact between a node and the edges that connect it to other nodes.

Output Ports An output port is the starting point of the data transmission process. After processing the information in a node, an output port is used to begin the transmission of information to the next node through edges.

Edges These transmit information from the output port of one node to the input port of another, collectively defining a graph's topology. Edges may contain weights that can operate on the information they carry.

Input Ports An input port is the endpoint of the data transmission process. It receives the information transmitted through an edge and inputs it to the next node for further processing.

Conditions These are a core and distinctive element of the MDF specification that complements other computational graph-based formats by providing a high-level set of descriptors for specifying conditional execution of nodes. This allows models with relatively complex execution requirements (e.g., containing cycles, branches, and/or temporal dependencies) to be expressed as graphs in a sufficiently abstract form that facilitates exchange among high-level modeling environments, without requiring that they be "lowered" to and then recovered from more elaborate procedural descriptions.
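
For example, conditional execution can be attached to a graph roughly as follows (a sketch based on the abc_conditions example in this repo, reusing mod_graph from the sketch above; treat the ConditionSet/Condition argument names as indicative):

from modeci_mdf.mdf import Condition, ConditionSet

# Node "B" runs only after node "A" has been executed twice
mod_graph.conditions = ConditionSet(
    node_specific={
        "A": Condition(type="Always"),
        "B": Condition(type="EveryNCalls", dependencies="A", n=2),
    },
)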

Parameters Attributes that determine the configuration and operation of nodes and edges are defined in MDF using parameters. For parameters specifying large data structures (e.g., weight matrices), arrays in widely used formats (e.g., numpy arrays, TensorFlow tensors) can be used, and serialisation in portable binary formats (e.g., BSON) is supported. Parameters can either be fixed values, which don't change when the node is executed, or can change over time (stateful parameters).

Functions A function computes a single value from the values on a node's input ports and from other functions and parameters. The key distinction from parameters is that a function is always stateless.

Model metadata "Metadata" can be added to the model, its graphs, nodes, and many of their sub-elements to provide additional information about that element. While metadata should not be essential to the mathematical description of the element's behavior/structure, it can be useful for human interpretability of its function/purpose, or be used when the element is mapped to a specific application for simulation/visualization. Metadata added at the top level of a model can specify contact information, citations, acknowledgements, pointers to sample data and benchmark results, the environment in which the model was originally implemented, and any environments that have been validated to support its execution.
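
For example, model-level metadata might look like this in the Python API (a sketch: the metadata field is assumed to accept a free-form dictionary, and the keys shown here are illustrative rather than prescribed by the spec):

from modeci_mdf.mdf import Model

mod = Model(
    id="Simple",
    metadata={
        "contact": "modeci@example.org",          # illustrative placeholder, not a prescribed key
        "citation": "see project documentation",  # pointer to an associated publication
        "source_environment": "PsyNeuLink",
        "validated_environments": ["MDF execution engine"],
    },
)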


Fig 2: A simple graph with 3 nodes and 2 edges expressed in MDF.


Fig 3: This graph illustrates the ability to specify behavior that extends beyond the directed flow through the graph. Here, Node 1 generates a random number and transmits that number to Node 2. Node 2 will only run if the number it receives from Node 1 is greater than 10.

Installation

Requirements

Requires Python >= 3.7

Quick start

pip install modeci-mdf

For more detailed installation instructions see here.

For guidelines on contributing to the development of MDF, see here.

Examples

To get started, follow the simple example in a Jupyter notebook here

Multiple examples of serialized MDF files, the Python scripts used to generate them, as well as mappings to target environments can be found here.

mdf's People

Contributors

29riyasaxena, davidt0x, esraa-abdelmaksoud, fatimaarshad-ds, ivy8127, jdcpni, jeremyrl7, kmantel, kusanele, megha-bose, monsurat-onabajo, mqnifestkelvin, mraunak, onoyiza, parikshit14, patrickstock, pgleeson, rimjhimittal, rpradyumna, sakshikaushik717, sanjayankur31, shanka123, shivani6320, singular-value, somyagr


mdf's Issues

Rename modeci_mdf to modeci

Rename the top-level package from modeci_mdf to modeci; mdf is already a sub-module within it. (It is actually called MDF now, but will be lowercased if we conform to PEP 8.) With this I can set up a modeci PyPI package as well, and set up automatic publishing on tagged releases.

Add stateful nodes

Currently, nodes are stateless: they just calculate outputs and intermediate function values from the current inputs. There should be a way of assigning a variable inside a node as a state (which can be a scalar, string or array), which can change over "time" as the graph is evaluated.

Initial work on this here: https://github.com/ModECI/MDF/tree/states

The value of a state can change by:

  1. specifying a value for the state, which depends on inputs and functions as well as the previous value of the state variable (see the sketch below)
  2. changing the state (discontinuously) when an internal condition is met (e.g. an input above a certain value). External conditions might be supported, but for now, assume sufficient information is coming in via edges to the node to make the decision.
  3. specifying a TimeDerivative on the node which specifies how the state changes with time (example in LEMS)

Obviously this is related to other open issues such as #30
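
For reference, the stateful-parameter approach that later issues in this list converged on can be sketched with the Python API roughly as follows; a parameter whose value expression refers to its own id carries state between evaluations (assuming the semantics of the parameter_state_merge branch):

from modeci_mdf.mdf import Node, Parameter, OutputPort

# A counter node: 'count' refers to its own previous value, so it is stateful
counter = Node(id="counter_node")
counter.parameters.append(Parameter(id="increment", value=1))
counter.parameters.append(Parameter(id="count", value="count + increment"))
counter.output_ports.append(OutputPort(id="out_port", value="count"))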

Turn on branch protection

We should setup branch protection for branches development and main.

For now at least enable Require status checks to pass before merging on both development and main. I don't seem to have rights to do this @pgleeson on the repo, could you handle it.

JSON serialization fails on bool and NoneType

See failing tests on: cf826e2.

NoneType and bool values are causing JSON serialization errors. I think this bug is somewhere in neuromllite @pgleeson. Python's own json serialization supports these types; are we not using it?

import json
json.dumps({'test1': None, 'test2': False, 'test3': True})
# '{"test1": null, "test2": false, "test3": true}'

Typing

Decide whether parameters will be typed and develop a specification for typing.

JSON vs. YAML

Decide whether to use JSON or YAML for MDF files.

Convert State to Stateful Parameter

Overview

Bring in the concept of stateful parameters instead of State. The Node object will have Functions and Parameters, and each Parameter is further categorized as either a stateful or a fixed parameter. A stateful parameter stores a value that changes with time.

Step1: Create Stateful Parameter

  • Convert the time derivative into a change in the parameter's value, and output the updated value
  • Rename the existing scheduler to execution_engine, which performs the calculations
  • Link with the standalone scheduler repo for scheduling the next execution
  • Open a pull request against the development branch

Step2: Write the Translator

  • Convert dv/dt to an incremental format (sketched below)
  • Define a translator that converts a states file into a stateful-parameters file
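
The "convert dv/dt to incremental format" step amounts to a forward-Euler rewrite: a parameter with time derivative f becomes the stateful update value = value + dt * f, with dt added as a parameter. A sketch of that translation (the helper below is hypothetical, not part of the API):

# Hypothetical translator step: rewrite a time_derivative expression as an
# incremental (forward Euler) stateful update expression.
def to_incremental(param_id: str, time_derivative: str, dt: str = "dt") -> str:
    return f"{param_id} + {dt} * ({time_derivative})"

print(to_incremental("level", "6.283185 * rate / period"))
# level + dt * (6.283185 * rate / period)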

Include validation/testing as a core part of the spec?

Per discussion with the core team, one problem that's repeatedly come up is what to do about discrepancies in output. It's a foregone conclusion that different simulators/environments will produce different outputs at some level of granularity (if only because of floating point arithmetic differences), so the goal can't realistically be perfect reproducibility across all environments. BUT we do want to have some way of expressing what the intended/expected behavior is. E.g., if you build a model in NEURON, it might not give you exactly the same result if executed in PsyNeuLink, but the person who wrote the model (or someone else) should be able to say "we consider the result valid if, given inputs like this, the outputs look roughly like this, and the fitted/learned model parameters are in this range."

The suggestion here is to have some basic testing/validation annotations be a formal part of the specification. There are many ways to implement this, but the following seems like a reasonable set of core features we may want to include (or at least, discuss here):

  • This should be entirely optional (i.e., not including a validation specification does not render a model document invalid), but strongly encouraged.
  • The validation specification should not be tied to any particular environment. I.e., it shouldn't say "you need to run the following routine in NEURON with these arguments in order to determine whether the results are valid." We'll probably want to have a separate subsection where users can indicate what environment the model was developed/run in, but that's a separate concern; the idea here is to have an environment-agnostic specification that formalizes what conditions a user deems to constitute a success.
  • Have a simple way to define discrete test cases. I.e., the approach could be analogous to standard software testing procedures, where you define example inputs, and then state what conditions have to hold on the output and learned model parameters.
  • Make reference to named parameters to the degree possible. For any output variables or internal model parameters that are named, it becomes easy (at least in principle) to specify simple conditions. E.g., "50 < node65.spikeRate < 100".
  • The set of available assertions would (at least initially) be very limited; it's probably not reasonable to let people, e.g., specify that the K-L divergence of some parameter has to be within X of a normal distribution. Limiting to basic logical/comparison operators, and maybe some really basic descriptive statistics (mean, variance, etc.) seems reasonable.

Putting those ideas together, we might have something like this (this is just intended to convey the idea, not to imply any commitment to this kind of structure/syntax):

{
    "tests": [
        {
            "inputs": {
                "data1": [
                    4,
                    3,
                    1,
                    5
                ],
                "data2": "my_critical_data.txt",
                "epochs": 300
            },
            "assertions": {
                "outputs": [
                    "all(output1) >= 0",
                    "8.3 <= mean(output1) <= 8.6"
                ],
                "parameters": [
                    "50 <= node1.spikeRate <= 100"
                ]
            }
        },
        {
            ...
        }
    ]
}
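
To make this concrete, a hypothetical validator could evaluate each assertion string in a restricted namespace containing only the named outputs/parameters plus a small whitelist of statistics; the sketch below is illustrative only, and no such component exists in the spec yet:

import statistics

# Hypothetical assertion checker: evaluate each assertion string with access
# only to the named results and a few whitelisted helper functions.
def check_assertions(assertions, values):
    namespace = {"mean": statistics.mean, "min": min, "max": max, **values}
    return {a: bool(eval(a, {"__builtins__": {}}, namespace)) for a in assertions}

results = check_assertions(
    ["8.3 <= mean(output1) <= 8.6", "min(output1) >= 0"],
    {"output1": [8.2, 8.5, 8.7]},
)
print(results)  # both assertions evaluate to True

A real implementation would want a proper expression parser rather than eval, but the principle (named results plus a limited operator/statistic vocabulary) is the same.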

Update main README

The main README file should be updated with a more concise description of the main elements of the language, and pointers to the main (autogenerated) technical specification for the precise definitions of elements and links to working examples. The text worked on by @patrickstock will provide a starting point.

Add a license file

Should mainly cover spec documentation, but also any code in this repo


Change args to arguments?

In the definition of functions, both "args" and "arguments" are mentioned in the spec document. Other names are not abbreviated, so perhaps it is best to just use "arguments" for this element name?

Black formatting

We can set up Black formatting as a pre-commit hook. This can be configured to format code automatically.

Black Code Style

@pgleeson, I know you had some issues with this being automated. Could you take a look and tell me whether this sounds acceptable? If not, we can use flake8 (pycodestyle), but then people will have to fix the flake8 formatting issues manually.

Conform to PEP8 module name conventions

As per PEP 8, can we convert Python files/modules to all lower case, with optional separation of words by underscores? I can handle this refactor in a separate pull request if desired.

Change top level element from graphs to unique id of model and add version info

In line with #18, I propose to make the top level entry of the specification the id of the model, referencing 1) the graph(s) it contains and 2) the version of the language used:

{
    "Simple": {
        "format": "ModECI MDF v0.1",
        "graphs": {
            "simple_example": {
                ...
            }
        }
    }
}

This is in keeping with the proposal in #18 that everything with an id is referenced in the same way, and the fact that models can in theory contain multiple graphs, and so one overall id should be used for the "model" as opposed to its constituent graph(s).

Sphinx Documentation

Overview

Sphinx is a documentation generator that converts text files (reStructuredText, Markdown) to HTML, PDF, and other formats:

reStructuredText, Markdown -> Sphinx -> HTML

  • cd docs
  • pip install -r requirements.txt
  • Output format: HTML files

Python API modules covered: mdf.py, execution_engine.py, standard_functions.py, onnx_functions, interfaces.pytorch.exporter

The README files in the repo are also included.

Features

Google-style Python docstrings are used for documenting the Python files; this requires the Sphinx napoleon extension.
Napoleon is a pre-processor that parses Google-style docstrings and converts them to reStructuredText before Sphinx attempts to parse them.

HTML theme: sphinx_rtd_theme (the "Read the Docs" theme)

File extensions currently supported (source_suffix):

".rst": "restructuredtext",
".txt": "restructuredtext",
".md": "markdown"

intersphinx_mapping: creates automatic links to the documentation of modules and objects in other standard projects such as numpy and python:

'python': ('https://docs.python.org/', None)
'numpy': ('https://numpy.org/doc/stable', None)
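
Put together, the corresponding docs/conf.py settings would look roughly like this (a sketch consistent with the notes above; the Markdown parser extension and the exact values in the repo may differ):

# docs/conf.py (sketch)
extensions = [
    "sphinx.ext.napoleon",     # parse Google-style docstrings
    "sphinx.ext.intersphinx",  # link to external documentation
]

source_suffix = {
    ".rst": "restructuredtext",
    ".txt": "restructuredtext",
    ".md": "markdown",         # requires a Markdown parser extension to be enabled
}

html_theme = "sphinx_rtd_theme"

intersphinx_mapping = {
    "python": ("https://docs.python.org/", None),
    "numpy": ("https://numpy.org/doc/stable", None),
}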

Working On

  • sphinxcontrib-versioning: build Sphinx docs for all versions of the project
  • Incorporating all README files from the different modules
  • Automatic generation of links to the standalone scheduler repo
  • Graphviz support in Sphinx

Reference

https://www.sphinx-doc.org/en/master/
https://sphinxcontrib-versioning.readthedocs.io/en/latest/tutorial.html

A single way to handle ids of elements in dictionaries

Currently there is a mixture of ways of handling how lists of sub-elements with unique ids are defined in the specification document.

The set of graphs is currently specified as a list, i.e. [], of entries each with a name, but args is a dict, i.e. {}, with the ids of the args as keys and dicts of the argument properties as values.

I propose to use the latter format for all entries which have multiple children with ids/names, e.g.

{
    "Simple": {
        "graphs": {
              "simple_example": {
                    "nodes": {
                        "input_node": {
                            "parameters": {
                                "input_level": 0.5
                            },
                            "output_ports": {
                                "out_port": {
                                    "value": "input_level"
                                }
                            }
                        },
                        "processing_node": {
                            "parameters": {
                               ...
                            },
                            "input_ports": {
                               ...
                            },
                            "functions": {
                                "linear_1": {
                                   ...
                                },
                                "logistic_1": {
                                    ...
                                }
                            }
                        }
                    },
                    "edges": {
                        "input_edge": {
                            "sender": "input_node",
                            "receiver": "processing_node",
                            "sender_port": "out_port",
                            "receiver_port": "input_port1"
                        }
                    }
                }
...

This has the following advantages:

  • a single way to access any element by its id (illustrated below)
  • removes the mix of lists/dicts in the spec
  • simplifies serialisation to YAML etc.
  • it is already what the Python API writes/reads...
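
For instance, once everything is keyed by id, any element in a parsed document is reachable by one uniform chain of lookups:

import json

with open("simple.json") as f:
    doc = json.load(f)

# Every level is keyed by id, so lookups follow a single pattern
value = doc["Simple"]["graphs"]["simple_example"]["nodes"]["input_node"]["output_ports"]["out_port"]["value"]
print(value)  # input_level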

Error while loading pnl json file using mdf_load method

load_mdf('PNL.json') --> error

For example: load_mdf('SimpleBranching-conditional.json')

Errors arise while parsing the attributes of objects (Graph, Node, InputPort, OutputPort, State/Stateful_Parameter).

The reasons for the error:

  • In PNL, some attributes are defined for an object that are not included in the allowed fields/children in mdf.py for that object.
    E.g. in PNL, Graph has 'controller', but Graph in MDF doesn't have a controller; similarly, in PNL, Edge has the attribute 'functions', but in MDF, Edge has parameters, sender, receiver, sender_port and receiver_port as allowed fields, not functions.

  • The methods used for parsing are _parse_elements and _parse_attributes; parsing an object requires a dictionary format, but InputPort and OutputPort in the PNL json are lists.

  • A key error occurs when an attribute for an object is included in the PNL json but is not part of the allowed_fields/children for that object in mdf.

Either mdf.py and the parsing methods can be altered, or changes can be made in the PNL json.

Update Graphviz export to show conditions

@mraunak I've tested the graphviz export and it does actually manage to load and convert simple examples with conditions and generate a graph, see here: https://github.com/ModECI/MDF/tree/main/examples/MDF#conditions-example

Note it doesn't actually say what the conditions on the nodes and graph are; they can be seen in the yaml, for example: https://github.com/ModECI/MDF/blob/main/examples/MDF/abc_conditions.yaml#L7

https://github.com/ModECI/MDF/blob/main/src/modeci_mdf/interfaces/graphviz/importer.py should be updated to add this info, nicely formatted. The image can be regenerated easily with the -graph option in this file: https://github.com/ModECI/MDF/blob/main/examples/MDF/abc_conditions.py#L104

Can you look into this @mraunak? Also, it would be good to test something similar with the MDF examples generated from @kmantel's PsyNeuLink examples, e.g. https://github.com/ModECI/MDF/blob/main/examples/PsyNeuLink/SimpleBranching-conditional.json, to be able to put these on the PNL readme.

Refactor API for use as Python package

Currently the demo code for writing MDF is all at the top level of this repo. Ideally it should be made into an installable Python package for use elsewhere, e.g. in the code @davidt0x has created for mapping to/from ONNX: https://github.com/ModECI/MDFTests/tree/main/ONNX/onnx-mdf

In the longer term the Python API could have its own repo, but for simplicity as the language is in development, the core spec/examples/api can all stay here.

The simplest name of the package would be "mdf". However... there is a python package already called MDF: https://pypi.org/project/mdf/ with a scope not a million miles from this...

So... call it "modeci-mdf"?

MDF scheduling wrappers

One thing that would be nice though is if there was a wrapper class around the PNL-specific classes/fields, so that all the terms used (trial/timescale/sequence) are MDF's own and can be changed/updated easily on our side without impacting PNL. Also, the "innards" of that wrapper class can eventually be substituted with an MDF-specific implementation (or a standalone scheduler) without too many additional changes...

Thinking a bit more about this, I feel this file (modeci_mdf/scheduler.py) is the proper place for the "wrapper" class/fields (i.e. the MDFScheduler) which hides the specifics of PNL in an MDF-specific API, and the EvaluableGraph could be extracted out & just use that API...

Originally posted by @pgleeson in #60 (comment), #60 (comment)

Required (and optional) dicts not appearing in json/yaml

A basic example of
mod_graph0 = Graph(id="Test")

produces the json

{
    "Test": {}
}

and yaml

Test: {}

which are missing the entries "nodes": {} and "edges": {}. These are listed as required in the README spec. "parameters": {} and "conditions": {} are also missing, but these are listed as optional. Similarly, parameters is missing from Node if unspecified, although this isn't explicitly mentioned in the spec as required or not.

We should either update the to_json and to_yaml methods for Graph (or the underlying neuromllite classes) to always generate nodes and edges entries, or update these in the spec as optional. If we do the former, I think we should also consider including empty entries for parameters and conditions as well.
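
If we take the first route, one low-risk way to do it (a hypothetical sketch, independent of the neuromllite internals) would be to fill in the required keys as a post-processing step during serialization:

# Hypothetical helper: guarantee the entries the spec lists as required
def ensure_required_entries(graph_dict):
    for key in ("nodes", "edges"):
        graph_dict.setdefault(key, {})
    return graph_dict

print(ensure_required_entries({}))  # {'nodes': {}, 'edges': {}}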

Move package to src/

It is common to move the root of a package to src/; that is, modeci_mdf should be src/modeci_mdf. This helps make sure that testing is always done against installed versions of the package and not the local copy. Here is a good description of why this is a good idea. As per convention, developers can install local editable installs with:

pip install -e .[dev]

This ensures that modifications made to the local directory are reflected immediately in the installed package without having to reinstall. Without the -e argument, pip will copy the src to site-packages, and import modeci_mdf will reference the site-packages copy, not your local copy. Developers should install with -e; users should install with a simple pip install. The [dev] denotes that extra developer requirements (typically at least pytest) should be installed.

If there are no objections then I will make this change in a standalone pull request.

Add metadata element to graph, node, edge, etc. for non essential information

Building on the need for simulator specific information embedded in the mdf files (e.g. framework_specific here: #97 (comment)), a more general solution would be to add "metadata" to the core (eventually all) elements, which can contain a dict of information in any form.

  • A crucial point is that any parser/simulator should be able to ignore this info and still fully interpret/simulate the model correctly
  • Some of this might be top-level general-purpose info (e.g. 'color' for a suggested color to display nodes) or simulator-specific info, e.g. 'PNL', 'NeuroML'

In progress here: https://github.com/ModECI/MDF/tree/feature/metadata

Merging parameter/statefulparameter/functions/derivedvariable to one entity

To resolve the state/stateful_parameter discussions, one option would be to merge all of the different features of these into one entity (probably best called parameter...) which can either:

  • be a constant (like the old parameters),
  • be evaluated by an inbuilt function (like "function"/derived variable),
  • be evaluated by a freeform expression (as was proposed for "function"/derived variable),
  • have a simple state-like update rule (like state/stateful parameter), or
  • have a time derivative (like state)

The conversion from MDF to MDFzero would involve finding the ones with time derivatives, adding dt as a parameter and converting them to simple stateful parameters

Working on this here: https://github.com/ModECI/MDF/tree/parameter_state_merge, e.g.

graphs:
        state_example:
            nodes:
                counter_node:
                    parameters:
                        increment:
                            value: '1'
                        count:
                            value: count + increment
                    output_ports:
                        out_port:
                            value: count
                sine_node:
                    parameters:
                        amp:
                            value: '3'
                        period:
                            value: '0.4'
                        level:
                            default_initial_value: '0'
                            time_derivative: 6.283185 * rate / period
                        rate:
                            default_initial_value: '1'
                            time_derivative: -1 * 6.283185 * level / period
                    output_ports:
                        out_port:
                            value: amp * level
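
In the Python API on that branch, the sine node above would look roughly like this (a sketch assuming Parameter exposes default_initial_value and time_derivative fields matching the YAML):

from modeci_mdf.mdf import Node, Parameter, OutputPort

# Sine generator via two coupled ODEs, as in the YAML above
sine = Node(id="sine_node")
sine.parameters.append(Parameter(id="amp", value=3))
sine.parameters.append(Parameter(id="period", value=0.4))
sine.parameters.append(Parameter(id="level", default_initial_value=0,
                                 time_derivative="6.283185 * rate / period"))
sine.parameters.append(Parameter(id="rate", default_initial_value=1,
                                 time_derivative="-1 * 6.283185 * level / period"))
sine.output_ports.append(OutputPort(id="out_port", value="amp * level"))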
