modeci / mdf

This repository contains the source for the MDF specification and Python API

Home Page: https://mdf.readthedocs.io

License: Apache License 2.0

Python 99.01% Shell 0.99%
machine-learning neuroscience onnx

mdf's Issues

Black formatting

We can set up Black formatting as a pre-commit hook. This can be configured to format code automatically.

Black Code Style

@pgleeson, I know you had some issues with this being automated. Could you take a look and tell me whether this sounds acceptable? If not, we can use flake8 (pycodestyle) but then people will have to fix the flake8 formatting issues manually.
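For reference, a minimal pre-commit configuration for Black could look like this (a sketch; the pinned revision is illustrative and should be updated to a current release):

```yaml
# .pre-commit-config.yaml (illustrative revision pin)
repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black
```

After `pip install pre-commit` and `pre-commit install`, Black reformats staged files automatically before every commit.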

Conform to PEP8 module name conventions

As per PEP 8, can we convert Python files/modules to all lower case, with optional separation of words by underscores? I can handle this refactor in a separate pull request if desired.

Refactor API for use as Python package

Currently the demo code for writing MDF is all at the top level of this repo. Ideally it should be made into an installable Python package for use elsewhere, e.g. in the code @davidt0x has created for mapping to/from ONNX: https://github.com/ModECI/MDFTests/tree/main/ONNX/onnx-mdf

In the longer term the Python API could have its own repo, but for simplicity as the language is in development, the core spec/examples/api can all stay here.

The simplest name of the package would be "mdf". However... there is a python package already called MDF: https://pypi.org/project/mdf/ with a scope not a million miles from this...

So... call it "modeci-mdf"?

JSON vs. YAML

Decide whether to use JSON or YAML for MDF files.

Typing

Decide whether parameters will be typed and develop a specification for typing.

Error while loading PNL JSON file using load_mdf method

load_mdf('PNL.json') --> error

For example: load_mdf('SimpleBranching-conditional.json')

Error while parsing the attributes of objects (Graph, Node, InputPort, OutputPort, State/Stateful_Parameter)

The reasons for the error:

  • In PNL, some attributes are defined for an object that are not included in the allowed fields/children in mdf.py for that object.
    E.g. in PNL, Graph has 'controller', but Graph in mdf doesn't have controller; similarly, in PNL, Edge has the attribute 'functions', but in mdf Edge has parameters, sender, receiver, sender_port, receiver_port as allowed fields, not functions.

  • The methods used for parsing are _parse_elements and _parse_attributes. Parsing an object requires a dictionary format, but for InputPort and OutputPort the PNL JSON uses a list.

  • A KeyError occurs when an attribute for an object is included in the PNL JSON but is not part of allowed_fields/children for that object in mdf.

Either mdf.py and the parsing methods can be altered, or changes can be made to the PNL JSON.
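To make the failure mode concrete, here is a standalone sketch (not the actual mdf.py code; the function and field names are illustrative) of how an attribute outside the allowed fields leads to a KeyError:

```python
# Illustrative sketch only -- not the actual mdf.py parsing code.
# An attribute present in the PNL JSON but absent from the allowed
# fields causes a KeyError during parsing.
ALLOWED_EDGE_FIELDS = {"parameters", "sender", "receiver", "sender_port", "receiver_port"}

def parse_edge(edge_dict):
    for key in edge_dict:
        if key not in ALLOWED_EDGE_FIELDS:
            raise KeyError(f"'{key}' is not an allowed field for Edge")
    return edge_dict

parse_edge({"sender": "input_node", "receiver": "processing_node"})  # parses fine
try:
    parse_edge({"sender": "input_node", "functions": []})  # PNL-style attribute
except KeyError as err:
    print(err)
```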

A single way to handle ids of elements in dictionaries

Currently there is a mixture of ways of handling how lists of subelements with unique ids are defined in the format in the specification document.

The set of graphs is currently specified as a list, i.e. [], of entries each with a name, but args is a dict, i.e. {} with ids of args as keys for dict values for the argument properties.

I propose to use the latter format for all entries which have multiple children with ids/names, e.g.

{
    "Simple": {
        "graphs": {
              "simple_example": {
                    "nodes": {
                        "input_node": {
                            "parameters": {
                                "input_level": 0.5
                            },
                            "output_ports": {
                                "out_port": {
                                    "value": "input_level"
                                }
                            }
                        },
                        "processing_node": {
                            "parameters": {
                               ...
                            },
                            "input_ports": {
                               ...
                            },
                            "functions": {
                                "linear_1": {
                                   ...
                                },
                                "logistic_1": {
                                    ...
                                }
                            }
                        }
                    },
                    "edges": {
                        "input_edge": {
                            "sender": "input_node",
                            "receiver": "processing_node",
                            "sender_port": "out_port",
                            "receiver_port": "input_port1"
                        }
                    }
                }
...

This has the advantages:

  • a single way to access any element by its id
  • removes the mix of lists/dicts in the spec
  • simplifies serialisation to YAML etc.
  • already what the python API writes/reads...
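With everything keyed by id, any element can be reached by a single chain of dict lookups. A sketch against a fragment of the example above:

```python
# Navigate the proposed dict-of-dicts structure by id
# (a fragment of the example above).
model = {
    "Simple": {
        "graphs": {
            "simple_example": {
                "nodes": {
                    "input_node": {
                        "parameters": {"input_level": 0.5},
                        "output_ports": {"out_port": {"value": "input_level"}},
                    }
                }
            }
        }
    }
}

node = model["Simple"]["graphs"]["simple_example"]["nodes"]["input_node"]
print(node["parameters"]["input_level"])  # -> 0.5
```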

Add a license file

This should mainly cover the spec documentation, but also any code in this repo.

Sphinx Documentation

Overview

Sphinx is a documentation generator that converts reStructuredText and Markdown files to HTML, PDF, and other formats:

reStructuredText, Markdown -> Sphinx -> HTML

  • cd docs
  • pip install -r requirements.txt
  • Output format: HTML

Python API
mdf.py, execution_engine.py, standard_functions.py, onnx_functions, interfaces.pytorch.exporter

Included README files

Features

Google-style Python docstrings are used for documenting the Python files. This requires the Sphinx napoleon extension.
Napoleon is a pre-processor that parses Google-style docstrings and converts them to reStructuredText before Sphinx attempts to parse them.

HTML theme: sphinx_rtd_theme (the Read the Docs theme)

File extensions currently supported:

".rst": "restructuredtext",
".txt": "restructuredtext",
".md": "markdown"

intersphinx_mapping: creates automatic links to the documentation of modules and objects in other standard projects such as NumPy and Python:

'python': ('https://docs.python.org/', None)
'numpy': ('https://numpy.org/doc/stable', None)
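Put together, these settings correspond to a docs/conf.py fragment along these lines (a sketch; the repo's actual conf.py may differ, and the Markdown parser extension named here is an assumption):

```python
# Sketch of a docs/conf.py fragment matching the settings described above.
# The myst_parser entry is an assumed choice for Markdown support.
extensions = [
    "sphinx.ext.napoleon",     # Google-style docstrings -> reStructuredText
    "sphinx.ext.intersphinx",  # links into other projects' docs
    "myst_parser",             # Markdown sources (assumed)
]

html_theme = "sphinx_rtd_theme"  # the Read the Docs theme

source_suffix = {
    ".rst": "restructuredtext",
    ".txt": "restructuredtext",
    ".md": "markdown",
}

intersphinx_mapping = {
    "python": ("https://docs.python.org/", None),
    "numpy": ("https://numpy.org/doc/stable", None),
}
```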

Working On

  • sphinxcontrib-versioning: build Sphinx docs for all versions of the project
  • Incorporating all README files from the different modules
  • Automatic generation of links to the standalone scheduler repo
  • Graphviz in Sphinx

Reference

https://www.sphinx-doc.org/en/master/
https://sphinxcontrib-versioning.readthedocs.io/en/latest/tutorial.html

Add stateful nodes

Currently Nodes are stateless; they just calculate outputs and intermediate function values from the current inputs. There should be a way of assigning a variable inside a node as a state (a scalar, string or array), which can change over "time" as the graph is evaluated.

Initial work on this here: https://github.com/ModECI/MDF/tree/states

The value of a state can change by:

  1. specifying a value for the state, which depends on inputs, functions as well as the previous value of the state variable
  2. when an internal condition is met (e.g. input above a certain value) the state is (discontinuously) changed. External conditions might be supported, but for now, assume sufficient information is coming in via edges to the node to make the decision.
  3. specifying a TimeDerivative on the node which specifies how a state changes with time (example in LEMS)
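A minimal sketch of mechanisms 1 and 2, outside any MDF API (the accumulator, input values, and threshold are all illustrative):

```python
# Illustrative only -- not the MDF API. A node-internal state that
# accumulates its input (mechanism 1) and jumps discontinuously when
# an internal condition is met (mechanism 2).
state = 0.0
inputs = [0.25, 0.5, 0.5, 0.25]   # values arriving via edges
history = []
for x in inputs:
    state = state + x             # 1: depends on input and previous state
    if state > 1.0:               # 2: internal condition triggers a reset
        state = 0.0
    history.append(state)
print(history)  # [0.25, 0.75, 0.0, 0.25]
```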

Obviously this is related to other open issues such as #30

JSON serialization fails on bool and NoneType

See failing tests on: cf826e2.

NoneType and bool values are causing JSON serialization errors. I think this bug is somewhere in neuromllite @pgleeson. Python's json serialization supports these; are we not using it?

import json
json.dumps({'test1': None, 'test2': False, 'test3': True})
Out[41]: '{"test1": null, "test2": false, "test3": true}'

Change args to arguments?

In the definition of functions, "args" and "arguments" are mentioned in the spec document. Other names are not abbreviated, so perhaps best to just use "arguments" for this element name?

Turn on branch protection

We should set up branch protection for the development and main branches.

For now, at least enable "Require status checks to pass before merging" on both development and main. I don't seem to have rights to do this on the repo @pgleeson, could you handle it?

Required (and optional) dicts not appearing in json/yaml

A basic example of
mod_graph0 = Graph(id="Test")

produces the json

{
    "Test": {}
}

and yaml

Test: {}

which are missing the entries "nodes": {} and "edges": {}. These are listed as required in the README spec. "parameters": {} and "conditions": {} are also missing, but these are listed as optional. Similarly, parameters is missing from Node if unspecified, although this isn't explicitly mentioned in the spec as required or not.

We should either update the to_json and to_yaml methods for Graph (or the underlying neuromllite classes) to always generate nodes and edges entries, or update these in the spec as optional. If we do the former, I think we should also consider including empty entries for parameters and conditions as well.
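One possible shape for always generating the required entries, sketched outside neuromllite (fill_required and REQUIRED are illustrative names, not existing API):

```python
import json

REQUIRED = ("nodes", "edges")          # required per the README spec

def fill_required(graph_body):
    """Add empty dicts for required entries missing from a graph body."""
    for key in REQUIRED:
        graph_body.setdefault(key, {})
    return graph_body

doc = {"Test": {}}                     # what Graph(id="Test") serializes to today
doc["Test"] = fill_required(doc["Test"])
print(json.dumps(doc))                 # {"Test": {"nodes": {}, "edges": {}}}
```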

Convert State to Stateful Parameter

Overview

Bring in the concept of Stateful Parameters instead of State. The Node object will have Function and Parameter, and Parameter is further categorized as Stateful Parameter and Fixed Parameter. A Stateful Parameter stores its value, which changes with time.

Step 1: Create Stateful Parameter

  • Convert the time derivative to a change in the parameter's value and output the updated value
  • Rename the existing scheduler to execution_engine to perform the calculation
  • Link with the standalone scheduler repo for scheduling the next execution
  • Pull to development branch

Step 2: Write the Translator

  • Convert dv/dt to incremental format
  • Define a translator that converts a states file into a stateful-parameters file

Move package to src/

It is common to move the root of a package to src/; that is, modeci_mdf should become src/modeci_mdf. This helps make sure that testing is always done against installed versions of the package and not the local copy. Here is a good description of why this is a good idea. As per convention, developers can install local editable installs with:

pip install -e .[dev]

This ensures that modifications made to the local directory are reflected immediately in the installed package without having to reinstall. Without the -e argument, pip will copy the src to site-packages, and import modeci_mdf will reference the site-packages copy, not your local copy. Developers should install with -e; users should install with a simple pip install. The [dev] denotes that extra developer requirements (typically at least pytest) should be installed.

If there are no objections then I will make this change in a standalone pull request.

Update Graphviz export to show conditions

@mraunak I've tested the graphviz export and it does actually manage to load and convert simple examples with conditions and generate a graph, see here: https://github.com/ModECI/MDF/tree/main/examples/MDF#conditions-example

Note it doesn't say what the conditions on the nodes and graph actually are; they can be seen in the yaml, for example: https://github.com/ModECI/MDF/blob/main/examples/MDF/abc_conditions.yaml#L7

https://github.com/ModECI/MDF/blob/main/src/modeci_mdf/interfaces/graphviz/importer.py should be updated to add this info, nicely formatted. The image can be regenerated easily with the -graph option in this file: https://github.com/ModECI/MDF/blob/main/examples/MDF/abc_conditions.py#L104

Can you look into this @mraunak? Also, it would be good to test something similar with the MDF examples generated from @kmantel's PsyNeuLink examples, e.g. https://github.com/ModECI/MDF/blob/main/examples/PsyNeuLink/SimpleBranching-conditional.json, to be able to put these on the PNL readme.

Update main README

The main README file should be updated with a more concise description of the main elements of the language, and pointers to the main (autogenerated) technical specification for the precise definitions of elements and links to working examples. The text worked on by @patrickstock will provide a starting point.

Rename modeci_mdf to modeci

Rename the top-level package from modeci_mdf to modeci. mdf is already a sub-module within it (actually called MDF now, but it will be lowercased if we conform to PEP 8). With this I can set up a modeci PyPI package as well and set up automatic publishing on tagged releases.

Add metadata element to graph, node, edge, etc. for non-essential information

Building on the need for simulator-specific information embedded in the MDF files (e.g. framework_specific here: #97 (comment)), a more general solution would be to add "metadata" to the core (eventually all) elements, which can contain a dict of information in any form.

  • A crucial point is that any parser/simulator should be able to ignore this info and still fully interpret/simulate the model correctly
  • Some of this might be top-level general-purpose info (e.g. 'color' for a suggested color to display nodes) or simulator-specific info, e.g. 'PNL', 'NeuroML'

In progress here: https://github.com/ModECI/MDF/tree/feature/metadata
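For illustration, a node carrying such metadata might look like this (the structure and keys shown are hypothetical, not part of any agreed spec):

```json
{
    "input_node": {
        "parameters": {"input_level": 0.5},
        "metadata": {
            "color": "#0000ff",
            "PNL": {"framework_specific_setting": true}
        }
    }
}
```

A parser that doesn't understand "color" or "PNL" can simply skip the metadata dict and still evaluate the node.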

MDF scheduling wrappers

One thing that would be nice, though, is a wrapper class around the PNL-specific classes/fields so that all the terms used (trial/timescale/sequence) are MDF's own and can be changed/updated easily on our side without impacting PNL. Also, the "innards" of that wrapper class could eventually be substituted with an MDF-specific implementation (or a standalone scheduler) without too many additional changes...

Thinking a bit more about this, I feel this file (modeci_mdf/scheduler.py) is the proper place for the "wrapper" class/fields (i.e. the MDFScheduler) which hides the specifics of PNL behind an MDF-specific API, and the EvaluableGraph could be extracted out and just use that API...

Originally posted by @pgleeson in #60 (comment), #60 (comment)

Change top level element from graphs to unique id of model and add version info

In line with #18, I propose to make the top level entry of the specification the id of the model, referencing 1) the graph(s) it contains and 2) the version of the language used:

{
    "Simple": {
        "format": "ModECI MDF v0.1",
        "graphs": {
            "simple_example": {
                ...
            }
        }
    }
}

This is in keeping with the proposal in #18 that everything with an id is referenced in the same way, and the fact that models can in theory contain multiple graphs, and so one overall id should be used for the "model" as opposed to its constituent graph(s).

Merging parameter/statefulparameter/functions/derivedvariable to one entity

To resolve the state/stateful_parameter discussions, one option would be to merge all of the different features of these into one entity (probably best called parameter...), which can either be:

  • a constant (like the old parameters),
  • evaluated by an inbuilt function (like "function"/derived variable),
  • evaluated by a freeform expression (as was proposed for "function"/derived variable),
  • updated by a simple state-like update rule (like state/stateful parameter), or
  • given a time derivative (like state).

The conversion from MDF to MDFzero would involve finding the ones with time derivatives, adding dt as a parameter, and converting them to simple stateful parameters.

Working on this here: https://github.com/ModECI/MDF/tree/parameter_state_merge, e.g.

graphs:
        state_example:
            nodes:
                counter_node:
                    parameters:
                        increment:
                            value: '1'
                        count:
                            value: count + increment
                    output_ports:
                        out_port:
                            value: count
                sine_node:
                    parameters:
                        amp:
                            value: '3'
                        period:
                            value: '0.4'
                        level:
                            default_initial_value: '0'
                            time_derivative: 6.283185 * rate / period
                        rate:
                            default_initial_value: '1'
                            time_derivative: -1 * 6.283185 * level / period
                    output_ports:
                        out_port:
                            value: amp * level
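The MDF-to-MDFzero conversion mentioned above (adding dt and replacing time derivatives with incremental updates) can be sketched against the sine_node parameters, assuming a simple Euler step (dt and the integration loop are illustrative, not part of the format):

```python
# Euler conversion of the sine_node time derivatives above.
# dt is the assumed added parameter; all other values come from the YAML.
amp, period = 3.0, 0.4
level, rate = 0.0, 1.0            # default_initial_value entries
dt = 0.0001

for _ in range(1000):             # integrate 0.1 time units = period / 4
    d_level = 6.283185 * rate / period
    d_rate = -1 * 6.283185 * level / period
    level, rate = level + dt * d_level, rate + dt * d_rate

out_port = amp * level            # level ~ sin(2*pi*t/period), near 1 at period/4
```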

Include validation/testing as a core part of the spec?

Per discussion with the core team, one problem that's repeatedly come up is what to do about discrepancies in output. It's a foregone conclusion that different simulators/environments will produce different outputs at some level of granularity (if only because of floating point arithmetic differences), so the goal can't realistically be perfect reproducibility across all environments. BUT we do want to have some way of expressing what the intended/expected behavior is. E.g., if you build a model in NEURON, it might not give you exactly the same result if executed in PsyNeuLink, but the person who wrote the model (or someone else) should be able to say "we consider the result valid if, given inputs like this, the outputs look roughly like this, and the fitted/learned model parameters are in this range."

The suggestion here is to have some basic testing/validation annotations be a formal part of the specification. There are many ways to implement this, but the following seems like a reasonable set of core features we may want to include (or at least, discuss here):

  • This should be entirely optional (i.e., including a validation specification does not render a model document invalid), but strongly encouraged.
  • The validation specification should not be tied to any particular environment. I.e., it shouldn't say "you need to run the following routine in NEURON with these arguments in order to determine whether the results are valid." We'll probably want to have a separate subsection where users can indicate what environment the model was developed/run in, but that's a separate concern. The idea here is to have an environment-agnostic specification that formalizes what conditions a user deems to constitute a success.
  • Have a simple way to define discrete test cases. I.e., the approach could be analogous to standard software testing procedures, where you define example inputs, and then state what conditions have to hold on the output and learned model parameters.
  • Make reference to named parameters to the degree possible. For any output variables or internal model parameters that have names, it becomes easy (at least in principle) to specify simple conditions, e.g. "50 < node65.spikeRate < 100".
  • The set of available assertions would (at least initially) be very limited; it's probably not reasonable to let people, e.g., specify that the K-L divergence of some parameter has to be within X of a normal distribution. Limiting to basic logical/comparison operators, and maybe some really basic descriptive statistics (mean, variance, etc.) seems reasonable.

Putting those ideas together, we might have something like this (this is just intended to convey the idea, not to imply any commitment to this kind of structure/syntax):

{
    "tests": [
        {
            "inputs": {
                "data1": [
                    4,
                    3,
                    1,
                    5
                ],
                "data2": "my_critical_data.txt",
                "epochs": 300
            },
            "assertions": {
                "outputs": [
                    "all(output1) >= 0",
                    "8.3 <= mean(output1) <= 8.6"
                ],
                "parameters": [
                    "50 <= node1.spikeRate <= 100"
                ]
            }
        },
        {
            ...
        }
    ]
}
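To make the idea concrete, a checker for such assertion strings could look like this (the expression vocabulary, the check function, and the whitelisting scheme are all illustrative, not part of any proposed spec):

```python
from statistics import mean

# Illustrative checker: evaluate an assertion string against named
# outputs/parameters, exposing only a small whitelisted vocabulary
# of functions rather than full builtins.
def check(assertion, context):
    safe = {"__builtins__": {}, "mean": mean, "min": min, "max": max, "all": all}
    return bool(eval(assertion, safe, dict(context)))

outputs = {"output1": [8.3, 8.5, 8.6]}
print(check("min(output1) >= 0", outputs))            # True
print(check("8.3 <= mean(output1) <= 8.6", outputs))  # True
```

Dotted names like node1.spikeRate would need a small resolver on top of this, but the basic logical/comparison operators and descriptive statistics mentioned above fit naturally.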
