modeci / mdf
This repository contains the source for the MDF specification and Python API.
Home Page: https://mdf.readthedocs.io
License: Apache License 2.0
Move from https://github.com/ModECI/MDFTests/tree/main/mdf_gme to the WebGME dir when #64 is merged.
Makes documentation simpler & is more logical
We can set up black formatting as a pre-commit hook. This can be configured to format code automatically.
@pgleeson, I know you had some issues with this being automated. Could you take a look and tell me whether this sounds acceptable? If not, we can use flake8 (pycodestyle), but then people will have to fix the flake8 formatting issues manually.
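For reference, black can be wired in via a .pre-commit-config.yaml along these lines (the rev shown is just an example pin, not a recommendation):

```yaml
repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black
```

Developers would then run pre-commit install once, after which black reformats staged files automatically on each commit.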
@jdcpni Ok with this change? Need to check with Russ?
As per PEP 8, can we convert Python files/modules to all lower case, with words optionally separated by underscores? I can handle this refactor in a separate pull request if desired.
As discussed today. I will make first stab at this...
Decide how to handle potentially large amounts of data in models (such as weights).
This issue with the current version of MDF JSON is down to NeuroMLlite and is described here: NeuroML/NeuroMLlite#11
Expand the node specification to allow for multiple functions in a single node.
Decide if a node's list of input/output ports and their names specifies the node's connections to other nodes, or if the list of edges specifies connectivity between nodes.
The readme at https://github.com/ModECI/MDF/tree/main/examples/Quantum is looking pretty empty.
@rpradyumna is there any (preliminary) info you could add here?
Currently the demo code for writing MDF is all at top level in this repo. Ideally it should be made into an installable python package for use elsewhere, e.g. in the code @davidt0x has created for mapping to/from ONNX: https://github.com/ModECI/MDFTests/tree/main/ONNX/onnx-mdf
In the longer term the Python API could have its own repo, but for simplicity as the language is in development, the core spec/examples/api can all stay here.
The simplest name of the package would be "mdf". However... there is a python package already called MDF: https://pypi.org/project/mdf/ with a scope not a million miles from this...
So... call it "modeci-mdf"?
Decide whether to use JSON or YAML for MDF files.
Develop a specification for functions in the MDF.
Decide whether parameters will be typed and develop a specification for typing.
load_mdf('PNL.json') --> error
For example: load_mdf('SimpleBranching-conditional.json')
Error while parsing the attributes of objects (Graph, Node, InputPort, OutputPort, State/Stateful_Parameter).
The reasons for the error:
In PNL, some attributes are defined for an object that are not included in the allowed fields/children in mdf.py for that object.
E.g. in PNL, Graph has 'controller', but Graph in mdf doesn't have a controller; similarly, in PNL, Edge has the attribute 'functions', but in mdf, Edge has parameters, sender, receiver, sender_port and receiver_port as allowed fields, not functions.
The methods used for parsing are _parse_elements and _parse_attributes; for parsing an object they require a dictionary format, but for InputPort and OutputPort in PNL it is a list.
A KeyError is raised when an attribute for an object is included in the PNL JSON but is not part of the allowed_fields/children for that object in mdf.
Either mdf.py and the parsing methods can be altered, or changes can be made to the PNL JSON.
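To make the failure mode concrete, here is a minimal sketch of the kind of check that raises the KeyError (the field set and function name are illustrative, not the actual mdf.py code):

```python
# Illustrative sketch of allowed-field checking; not the actual mdf.py implementation.
EDGE_ALLOWED_FIELDS = {"parameters", "sender", "receiver", "sender_port", "receiver_port"}

def parse_edge_attributes(edge_dict):
    for key in edge_dict:
        if key not in EDGE_ALLOWED_FIELDS:
            # This is where the PNL JSON trips up: 'functions' is not allowed on Edge.
            raise KeyError(f"'{key}' is not an allowed field for Edge")
    return edge_dict

parse_edge_attributes({"sender": "A", "receiver": "B"})  # fine
# parse_edge_attributes({"functions": []})               # would raise KeyError
```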
@jeremyrl7 Can you add details here?
Currently there is a mixture of ways of handling how lists of subelements with unique ids are defined in the format in the specification document.
The set of graphs is currently specified as a list, i.e. [], of entries each with a name, but args is a dict, i.e. {} with ids of args as keys for dict values for the argument properties.
I propose to use the latter format for all entries which have multiple children with ids/names, e.g.
{
"Simple": {
"graphs": {
"simple_example": {
"nodes": {
"input_node": {
"parameters": {
"input_level": 0.5
},
"output_ports": {
"out_port": {
"value": "input_level"
}
}
},
"processing_node": {
"parameters": {
...
},
"input_ports": {
...
},
"functions": {
"linear_1": {
...
},
"logistic_1": {
...
}
},
}
},
"edges": {
"input_edge": {
"sender": "input_node",
"receiver": "processing_node",
"sender_port": "out_port",
"receiver_port": "input_port1"
}
}
}
...
This has the advantages:
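One concrete advantage of the id-keyed dict form is direct lookup of any child by id, without scanning a list for a matching name; a minimal sketch assuming the structure proposed above:

```python
import json

# Minimal document in the proposed id-keyed form (structure assumed from the proposal above).
spec_text = '''
{
  "Simple": {
    "graphs": {
      "simple_example": {
        "nodes": {
          "input_node": {"parameters": {"input_level": 0.5}}
        }
      }
    }
  }
}
'''
doc = json.loads(spec_text)
model_id = next(iter(doc))  # "Simple"
node = doc[model_id]["graphs"]["simple_example"]["nodes"]["input_node"]
print(node["parameters"]["input_level"])  # 0.5
```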
Should mainly cover spec documentation, but also any code in this repo
Standardization:
Examples of exchange
- PyTorch -> MDF @davidt0x @patrickstock: #23
- MDF -> Pytorch @patrickstock: simple working available: #24
- MDF -> ONNX @rpradyumna: #25
- ONNX -> MDF @davidt0x #26
- MDF <-> NeuroML @pgleeson #27
- MDF <-> PsyNeuLink @kmantel #28
- MDF <-> ACT-R @jeremyrl7 #29
Process Control
Function library/ontology/mappings
Vetting by users
Overview
Sphinx is a documentation generator that converts text files (reStructuredText, Markdown) to HTML, PDF and other formats:
reStructuredText, Markdown -> Sphinx -> HTML
Python API
mdf.py, execution_engine.py, standard_functions.py, onnx_functions, interfaces.pytorch.exporter
Included readme files
Features
Google-style Python docstrings are used for documenting the Python files. This requires the Sphinx Napoleon extension.
Napoleon is a pre-processor that parses Google-style docstrings and converts them to reStructuredText before Sphinx attempts to parse them.
HTML theme: sphinx_rtd_theme (the Read the Docs theme)
file extensions supported currently:
".rst": "restructuredtext",
".txt": "restructuredtext",
".md": "markdown"
intersphinx_mapping: creates automatic links to the documentation of modules and objects in other projects such as Python and NumPy:
'python': ('https://docs.python.org/', None)
'numpy': ('https://numpy.org/doc/stable', None)
Working On
Reference
https://www.sphinx-doc.org/en/master/
https://sphinxcontrib-versioning.readthedocs.io/en/latest/tutorial.html
Currently Nodes are stateless, they just calculate outputs and intermediate function values from the current inputs. There should be a way of assigning a variable inside a node as a state (can be a scalar, string or array), which can change over "time" as the graph is evaluated.
Initial work on this here: https://github.com/ModECI/MDF/tree/states
The value of a state can change by:
Obviously this is related to other open issues such as #30
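As a minimal illustration of the idea (the class and method names here are assumptions, not the draft API in the states branch):

```python
# Hypothetical sketch: a node-local state that persists across graph evaluations.
class CounterNode:
    def __init__(self, initial=0):
        self.count = initial     # the state variable (could be scalar, string or array)

    def evaluate(self, increment=1):
        self.count += increment  # state update on each evaluation ("time" step)
        return self.count        # exposed via an output port

node = CounterNode()
print([node.evaluate() for _ in range(3)])  # [1, 2, 3]
```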
See failing tests on: cf826e2.
NoneType and bool values are causing JSON serialization errors. I think this bug is somewhere in neuromllite @pgleeson. Python's JSON serialization supports these, are we not using it?
import json
json.dumps({'test1': None, 'test2': False, 'test3': True})
# '{"test1": null, "test2": false, "test3": true}'
In the definition of functions, "args" and "arguments" are mentioned in the spec document. Other names are not abbreviated, so perhaps best to just use "arguments" for this element name?
We should set up branch protection for the development and main branches. For now, at least enable "Require status checks to pass before merging" on both development and main. I don't seem to have rights to do this on the repo @pgleeson, could you handle it?
In the same way as MDF can losslessly be serialised as JSON and YAML, it should be possible to serialise MDF as HDF5 using datasets for arrays, groups for other fields and perhaps attributes for dict entries. Would need to be added in NeuroMLlite, e.g. https://github.com/NeuroML/NeuroMLlite/blob/master/neuromllite/utils.py#L68
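To illustrate the mapping without depending on h5py, here is a sketch that just plans which HDF5 construct each entry would become (the classification rules are assumptions matching the suggestion above; a real implementation would live in NeuroMLlite):

```python
# Sketch: plan how MDF dict entries would map onto HDF5 constructs
# (nested dicts -> groups, arrays -> datasets, scalars -> attributes).
# Illustrative only; not the actual NeuroMLlite serialisation code.

def plan_hdf5(d, path=""):
    plan = {}
    for key, value in d.items():
        full = f"{path}/{key}"
        if isinstance(value, dict):
            plan[full] = "group"
            plan.update(plan_hdf5(value, full))
        elif isinstance(value, list):
            plan[full] = "dataset"
        else:
            plan[full] = "attribute"
    return plan

model = {"Test": {"graphs": {}, "weights": [0.1, 0.2], "format": "ModECI MDF v0.1"}}
print(plan_hdf5(model))
```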
In development here: https://github.com/ModECI/MDF/blob/main/modeci_mdf/export/NeuroML.py
https://github.com/NeuroML/NeuroMLlite/blob/master/neuromllite/MDFHandler.py
A basic example of
mod_graph0 = Graph(id="Test")
produces the JSON
{
"Test": {}
}
and YAML
Test: {}
which are missing the entries "nodes": {} and "edges": {}. These are listed as required in the README spec. "parameters": {} and "conditions": {} are also missing, but these are listed as optional. Similarly, parameters is missing from Node if unspecified, although this isn't explicitly mentioned in the spec as required or not.
We should either update the to_json and to_yaml methods for Graph (or the underlying neuromllite classes) to always generate nodes and edges entries, or update these in the spec as optional. If we do the former, I think we should also consider including empty entries for parameters and conditions as well.
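If we go the former route, the fix amounts to always emitting the required keys; a hypothetical sketch (not the actual neuromllite serialisation code):

```python
import json

# Hypothetical sketch: serialise a Graph so required keys are always present.
def graph_to_dict(graph_id, nodes=None, edges=None):
    return {graph_id: {"nodes": nodes or {}, "edges": edges or {}}}

print(json.dumps(graph_to_dict("Test")))
# {"Test": {"nodes": {}, "edges": {}}}
```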
Overview
Bring in the concept of Stateful Parameters instead of State. The Node object will have Functions and Parameters, and Parameters are further categorized as stateful and fixed. A Stateful Parameter stores a value that changes with time.
Step1: Create Stateful Parameter
Step2: Write the Translator
It is common to move the root of a package to src/; that is, modeci_mdf should be src/modeci_mdf. This helps make sure that testing is always done against installed versions of the package and not the local copy. Here is a good description of why this is a good idea. As per convention, developers can install a local editable install with:
pip install -e .[dev]
This ensures that modifications made to the local directory are reflected immediately in the installed package without having to reinstall. Without the -e argument, pip will copy the src to site-packages, and import modeci_mdf will reference the site-packages copy, not your local copy. Developers should install with -e; users should install with a simple pip install. The [dev] denotes that extra developer requirements (typically at least pytest) should be installed.
If there are no objections, then I will make this change in a standalone pull request.
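For reference, a src/ layout typically needs the package directory declared in the packaging config; a hedged sketch of a setup.cfg fragment (the exact extras are assumptions):

```ini
[options]
package_dir =
    = src
packages = find:

[options.packages.find]
where = src

[options.extras_require]
dev =
    pytest
```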
Update
The SimpleScheduler, which can execute simple MDF models does not currently support conditions: https://github.com/ModECI/MDF#conditions.
The definitions of the python MDF should be updated to support conditions and a basic interpretation/handling of these should be added to the scheduler.
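A hedged sketch of how a simple scheduler could consult a condition before executing a node (EveryNCalls here is modelled loosely on PsyNeuLink-style conditions; the class and method names are assumptions, not the actual API):

```python
# Hypothetical sketch: gate node execution on a per-node condition.
class EveryNCalls:
    """Node may run only after its dependency's call count is a positive multiple of n."""
    def __init__(self, dependency, n):
        self.dependency, self.n = dependency, n

    def is_satisfied(self, call_counts):
        c = call_counts.get(self.dependency, 0)
        return c > 0 and c % self.n == 0

call_counts = {"A": 0, "B": 0}
cond_b = EveryNCalls("A", 2)

for step in range(4):
    call_counts["A"] += 1            # A runs on every pass
    if cond_b.is_satisfied(call_counts):
        call_counts["B"] += 1        # B runs only when A's count hits a multiple of 2

print(call_counts)  # {'A': 4, 'B': 2}
```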
@mraunak I've tested the graphviz export and it does actually manage to load and convert simple examples with conditions and generate a graph, see here: https://github.com/ModECI/MDF/tree/main/examples/MDF#conditions-example
Note it doesn't actually say what the conditions on the nodes and graph actually are, they can be seen in the yaml for example though: https://github.com/ModECI/MDF/blob/main/examples/MDF/abc_conditions.yaml#L7
https://github.com/ModECI/MDF/blob/main/src/modeci_mdf/interfaces/graphviz/importer.py should be updated to add this info, nicely formatted. The image can be regenerated easily with the -graph
option in this file: https://github.com/ModECI/MDF/blob/main/examples/MDF/abc_conditions.py#L104
Can you look into this @mraunak? Also, it would be good to test something similar with the MDF examples generated from @kmantel's PsyNeuLink examples, e.g. https://github.com/ModECI/MDF/blob/main/examples/PsyNeuLink/SimpleBranching-conditional.json, to be able to put these on the PNL readme.
The main README file should be updated with a more concise description of the main elements of the language, and pointers to the main (autogenerated) technical specification for the precise definitions of elements and links to working examples. The text worked on by @patrickstock will provide a starting point.
@rpradyumna Can you add details on status?
@patrickstock Can you add details on status?
@davidt0x I disabled the test on inception.py on development, as it currently fails with an out of memory error on ubuntu: https://github.com/ModECI/MDF/actions/runs/1155523696. Could you have a look when you get a chance?
Rename the top-level package from modeci_mdf to modeci. mdf is already a sub-module within (actually called MDF now, but it will be lowercased if we conform to PEP 8). With this I can set up a modeci PyPI package as well and set up automatic publishing on tagged releases.
Building on the need for simulator specific information embedded in the mdf files (e.g. framework_specific here: #97 (comment)), a more general solution would be to add "metadata" to the core (eventually all) elements, which can contain a dict of information in any form.
In progress here: https://github.com/ModECI/MDF/tree/feature/metadata
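For illustration, metadata on a node could look like the following (the keys and values here are hypothetical, not a finalized schema):

```json
{
  "input_node": {
    "parameters": {"input_level": 0.5},
    "metadata": {
      "color": "0.8 0.2 0.2",
      "framework_specific": {"pnl": {"execution_mode": "trial"}}
    }
  }
}
```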
One thing that would be nice though is if there was a wrapper class around the PNL-specific classes/fields so that all the terms used (trial/timescale/sequence) are MDF's own and can be changed/updated easily on our side without impacting PNL. Also, the "innards" of that wrapper class can eventually be substituted with an MDF-specific implementation (or a standalone scheduler) without too many additional changes...
Thinking a bit more about this, I feel this file (modeci_mdf/scheduler.py) is the proper place for the "wrapper" class/fields (i.e. the MDFScheduler) which hides the specifics of PNL in an MDF-specific API, and the EvaluableGraph could be extracted out and just use that API...
Originally posted by @pgleeson in #60 (comment), #60 (comment)
@davidt0x Can you add details?
@pgleeson Yeah, sure, I will go through the README and check it. Will let you know if I face any issues.
Originally posted by @mraunak in #66 (comment)
In line with #18, I propose to make the top level entry of the specification the id of the model, referencing 1) the graph(s) it contains and 2) the version of the language used:
{
"Simple": {
"format": "ModECI MDF v0.1",
"graphs": {
"simple_example": {
...
}
}
}
This is in keeping with the proposal in #18 that everything with an id is referenced in the same way, and the fact that models can in theory contain multiple graphs, and so one overall id should be used for the "model" as opposed to its constituent graph(s).
As an option to resolve the state/stateful_parameter discussions, one option would be to merge all of the different features in these to one entity (probably best called parameter...) which can either be:
The conversion from MDF to MDFzero would involve finding the ones with time derivatives, adding dt as a parameter, and converting them to simple stateful parameters.
Working on this here: https://github.com/ModECI/MDF/tree/parameter_state_merge, e.g.
graphs:
state_example:
nodes:
counter_node:
parameters:
increment:
value: '1'
count:
value: count + increment
output_ports:
out_port:
value: count
sine_node:
parameters:
amp:
value: '3'
period:
value: '0.4'
level:
default_initial_value: '0'
time_derivative: 6.283185 * rate / period
rate:
default_initial_value: '1'
time_derivative: -1 * 6.283185 * level / period
output_ports:
out_port:
value: amp * level
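For the sine_node above, a time_derivative implies some integration scheme during evaluation; a minimal forward-Euler sketch (dt and the step count are assumptions; 6.283185 is 2*pi as in the YAML):

```python
# Forward-Euler sketch of evaluating the time_derivative parameters above.
dt = 0.01
period, amp = 0.4, 3.0
level, rate = 0.0, 1.0                        # default_initial_value for each

for _ in range(10):
    d_level = 6.283185 * rate / period        # time_derivative of level
    d_rate = -1 * 6.283185 * level / period   # time_derivative of rate
    level += dt * d_level
    rate += dt * d_rate

out_port = amp * level                        # output_ports -> out_port value
print(out_port)
```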
Per discussion with the core team, one problem that's repeatedly come up is what to do about discrepancies in output. It's a foregone conclusion that different simulators/environments will produce different outputs at some level of granularity (if only because of floating point arithmetic differences), so the goal can't realistically be perfect reproducibility across all environments. BUT we do want to have some way of expressing what the intended/expected behavior is. E.g., if you build a model in NEURON, it might not give you exactly the same result if executed in PsyNeuLink, but the person who wrote the model (or someone else) should be able to say "we consider the result valid if, given inputs like this, the outputs look roughly like this, and the fitted/learned model parameters are in this range."
The suggestion here is to have some basic testing/validation annotations be a formal part of the specification. There are many ways to implement this, but the following seems like a reasonable set of core features we may want to include (or at least, discuss here):
"50 < node65.spikeRate < 100".
Putting those ideas together, we might have something like this (this is just intended to convey the idea, not to imply any commitment to this kind of structure/syntax):
{
"tests": [
{
"inputs": {
"data1": [
4,
3,
1,
5
],
"data2": "my_critical_data.txt",
"epochs": 300
},
"assertions": {
"outputs": [
"all(output1) >= 0",
"8.3 <= mean(output1) <= 8.6"
],
"parameters": [
"50 <= node1.spikeRate <= 100"
]
}
},
{
...
}
]
}
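Assertion strings like the ones above could be checked by evaluating them in a restricted namespace; a minimal sketch (eval is used purely for illustration, a real implementation would want a safe expression parser):

```python
from statistics import mean

def check_assertion(expr, env):
    """Evaluate an assertion string against a dict of outputs/parameters. Illustrative only."""
    allowed = {"__builtins__": {}, "mean": mean, "min": min, "max": max, "all": all}
    return bool(eval(expr, allowed, dict(env)))

outputs = {"output1": [8.2, 8.5, 8.7]}
print(check_assertion("8.3 <= mean(output1) <= 8.6", outputs))  # True
```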
@davidt0x @patrickstock Can you add details on status?