modeci / mdf
This repository contains the source for the MDF specification and Python API.
Home Page: https://mdf.readthedocs.io
License: Apache License 2.0
Move from https://github.com/ModECI/MDFTests/tree/main/mdf_gme to the WebGME dir when #64 is merged.
Makes documentation simpler & is more logical
We can set up black formatting as a pre-commit hook. This can be configured to format code automatically.
@pgleeson, I know you had some issues with this being automated. Could you take a look and tell me whether this sounds acceptable? If not, we can use flake8 (pycodestyle), but then people will have to fix the flake8 formatting issues manually.
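For reference, black can be wired in via a .pre-commit-config.yaml along these lines (the rev shown is just an example pin, not a recommendation):

```yaml
repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black
```

Developers would then run pre-commit install once, after which black reformats staged files automatically on each commit.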
@jdcpni Ok with this change? Need to check with Russ?
As per PEP 8, can we convert Python files/modules to all lower case, with words optionally separated by underscores? I can handle this refactor in a separate pull request if desired.
As discussed today. I will make first stab at this...
Decide how to handle potentially large amounts of data in models (such as weights).
This issue with the current version of MDF JSON is down to NeuroMLlite and is described here: NeuroML/NeuroMLlite#11
Expand the node specification to allow for multiple functions in a single node.
Decide if a node's list of input/output ports and their names specifies the node's connections to other nodes, or if the list of edges specifies connectivity between nodes.
The readme at https://github.com/ModECI/MDF/tree/main/examples/Quantum is looking pretty empty.
@rpradyumna is there any (preliminary) info you could add here?
Currently the demo code for writing MDF is all at top level in this repo. Ideally it should be made into an installable python package for use elsewhere, e.g. in the code @davidt0x has created for mapping to/from ONNX: https://github.com/ModECI/MDFTests/tree/main/ONNX/onnx-mdf
In the longer term the Python API could have its own repo, but for simplicity as the language is in development, the core spec/examples/api can all stay here.
The simplest name of the package would be "mdf". However... there is a python package already called MDF: https://pypi.org/project/mdf/ with a scope not a million miles from this...
So... call it "modeci-mdf"?
Decide whether to use JSON or YAML for MDF files.
Develop a specification for functions in the MDF.
Decide whether parameters will be typed and develop a specification for typing.
load_mdf('PNL.json') --> error
For example: load_mdf('SimpleBranching-conditional.json')
Error while parsing the attributes of objects (Graph, Node, InputPort, OutputPort, State/Stateful_Parameter).
The reasons for the error:
In PNL, some attributes are defined for an object that are not included in the allowed fields/children in mdf.py for that object.
E.g. in PNL, Graph has 'controller', but Graph in mdf doesn't have a controller; similarly, in PNL, Edge has the attribute 'functions', but in mdf, Edge has parameters, sender, receiver, sender_port and receiver_port as allowed fields, not functions.
The methods used for parsing are _parse_elements and _parse_attributes; for parsing an object they require a dictionary format, but for InputPort and OutputPort in PNL it is a list.
A KeyError is raised when an attribute for an object is included in the PNL JSON but is not part of the allowed_fields/children for that object in mdf.
Either mdf.py and the parsing methods can be altered, or changes can be made to the PNL JSON.
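To make the failure mode concrete, here is a minimal sketch of the kind of check that raises the KeyError (the field set and function name are illustrative, not the actual mdf.py code):

```python
# Illustrative sketch of allowed-field checking; not the actual mdf.py implementation.
EDGE_ALLOWED_FIELDS = {"parameters", "sender", "receiver", "sender_port", "receiver_port"}

def parse_edge_attributes(edge_dict):
    for key in edge_dict:
        if key not in EDGE_ALLOWED_FIELDS:
            # This is where the PNL JSON trips up: 'functions' is not allowed on Edge.
            raise KeyError(f"'{key}' is not an allowed field for Edge")
    return edge_dict

parse_edge_attributes({"sender": "A", "receiver": "B"})  # fine
# parse_edge_attributes({"functions": []})               # would raise KeyError
```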
@jeremyrl7 Can you add details here?
Currently there is a mixture of ways of handling how lists of subelements with unique ids are defined in the format in the specification document.
The set of graphs is currently specified as a list, i.e. [], of entries each with a name, but args is a dict, i.e. {} with ids of args as keys for dict values for the argument properties.
I propose to use the latter format for all entries which have multiple children with ids/names, e.g.
{
"Simple": {
"graphs": {
"simple_example": {
"nodes": {
"input_node": {
"parameters": {
"input_level": 0.5
},
"output_ports": {
"out_port": {
"value": "input_level"
}
}
},
"processing_node": {
"parameters": {
...
},
"input_ports": {
...
},
"functions": {
"linear_1": {
...
},
"logistic_1": {
...
}
},
}
},
"edges": {
"input_edge": {
"sender": "input_node",
"receiver": "processing_node",
"sender_port": "out_port",
"receiver_port": "input_port1"
}
}
}
...
This has the advantages:
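One concrete advantage of the id-keyed dict form is direct lookup of any child by id, without scanning a list for a matching name; a minimal sketch assuming the structure proposed above:

```python
import json

# Minimal document in the proposed id-keyed form (structure assumed from the proposal above).
spec_text = '''
{
  "Simple": {
    "graphs": {
      "simple_example": {
        "nodes": {
          "input_node": {"parameters": {"input_level": 0.5}}
        }
      }
    }
  }
}
'''
doc = json.loads(spec_text)
model_id = next(iter(doc))  # "Simple"
node = doc[model_id]["graphs"]["simple_example"]["nodes"]["input_node"]
print(node["parameters"]["input_level"])  # 0.5
```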
Should mainly cover spec documentation, but also any code in this repo
Standardization:
Examples of exchange
- PyTorch -> MDF @davidt0x @patrickstock: #23
- MDF -> Pytorch @patrickstock: simple working available: #24
- MDF -> ONNX @rpradyumna: #25
- ONNX -> MDF @davidt0x #26
- MDF <-> NeuroML @pgleeson #27
- MDF <-> PsyNeuLink @kmantel #28
- MDF <-> ACT-R @jeremyrl7 #29
Process Control
Function library/ontology/mappings
Vetting by users
Overview
Sphinx is a documentation generator that converts text files (reStructuredText, Markdown) to HTML, PDF and other formats:
reStructuredText, Markdown -> Sphinx -> HTML
Python API
mdf.py, execution_engine.py, standard_functions.py, onnx_functions, interfaces.pytorch.exporter
Included readme files
Features
Google-style Python docstrings are used for documenting the Python files. This requires the Sphinx Napoleon extension.
Napoleon is a pre-processor that parses Google-style docstrings and converts them to reStructuredText before Sphinx attempts to parse them.
HTML theme: sphinx_rtd_theme (the Read the Docs theme)
file extensions supported currently:
".rst": "restructuredtext",
".txt": "restructuredtext",
".md": "markdown"
intersphinx_mapping: creates automatic links to the documentation of modules and objects in other projects such as Python and NumPy:
'python': ('https://docs.python.org/', None)
'numpy': ('https://numpy.org/doc/stable', None)
Working On
Reference
https://www.sphinx-doc.org/en/master/
https://sphinxcontrib-versioning.readthedocs.io/en/latest/tutorial.html
Currently Nodes are stateless, they just calculate outputs and intermediate function values from the current inputs. There should be a way of assigning a variable inside a node as a state (can be a scalar, string or array), which can change over "time" as the graph is evaluated.
Initial work on this here: https://github.com/ModECI/MDF/tree/states
The value of a state can change by:
Obviously this is related to other open issues such as #30
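As a minimal illustration of the idea (the class and method names here are assumptions, not the draft API in the states branch):

```python
# Hypothetical sketch: a node-local state that persists across graph evaluations.
class CounterNode:
    def __init__(self, initial=0):
        self.count = initial     # the state variable (could be scalar, string or array)

    def evaluate(self, increment=1):
        self.count += increment  # state update on each evaluation ("time" step)
        return self.count        # exposed via an output port

node = CounterNode()
print([node.evaluate() for _ in range(3)])  # [1, 2, 3]
```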
See failing tests on: cf826e2.
NoneType and bool values are causing JSON serialization errors. I think this bug is somewhere in neuromllite @pgleeson. Python's JSON serialization supports these, are we not using it?
import json
json.dumps({'test1': None, 'test2': False, 'test3': True})
# '{"test1": null, "test2": false, "test3": true}'
In the definition of functions, "args" and "arguments" are mentioned in the spec document. Other names are not abbreviated, so perhaps best to just use "arguments" for this element name?
We should set up branch protection for the development and main branches. For now, at least enable "Require status checks to pass before merging" on both development and main. I don't seem to have rights to do this on the repo @pgleeson, could you handle it?
In the same way as MDF can losslessly be serialised as JSON and YAML, it should be possible to serialise MDF as HDF5 using datasets for arrays, groups for other fields and perhaps attributes for dict entries. Would need to be added in NeuroMLlite, e.g. https://github.com/NeuroML/NeuroMLlite/blob/master/neuromllite/utils.py#L68
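To illustrate the mapping without depending on h5py, here is a sketch that just plans which HDF5 construct each entry would become (the classification rules are assumptions matching the suggestion above; a real implementation would live in NeuroMLlite):

```python
# Sketch: plan how MDF dict entries would map onto HDF5 constructs
# (nested dicts -> groups, arrays -> datasets, scalars -> attributes).
# Illustrative only; not the actual NeuroMLlite serialisation code.

def plan_hdf5(d, path=""):
    plan = {}
    for key, value in d.items():
        full = f"{path}/{key}"
        if isinstance(value, dict):
            plan[full] = "group"
            plan.update(plan_hdf5(value, full))
        elif isinstance(value, list):
            plan[full] = "dataset"
        else:
            plan[full] = "attribute"
    return plan

model = {"Test": {"graphs": {}, "weights": [0.1, 0.2], "format": "ModECI MDF v0.1"}}
print(plan_hdf5(model))
```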
In development here: https://github.com/ModECI/MDF/blob/main/modeci_mdf/export/NeuroML.py
https://github.com/NeuroML/NeuroMLlite/blob/master/neuromllite/MDFHandler.py
A basic example of
mod_graph0 = Graph(id="Test")
produces the JSON
{
"Test": {}
}
and YAML
Test: {}
which are missing the entries "nodes": {} and "edges": {}. These are listed as required in the README spec. "parameters": {} and "conditions": {} are also missing, but these are listed as optional. Similarly, parameters is missing from Node if unspecified, although this isn't explicitly mentioned in the spec as required or not.
We should either update the to_json and to_yaml methods for Graph (or the underlying neuromllite classes) to always generate nodes and edges entries, or update these in the spec as optional. If we do the former, I think we should also consider including empty entries for parameters and conditions as well.
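If we go the former route, the fix amounts to always emitting the required keys; a hypothetical sketch (not the actual neuromllite serialisation code):

```python
import json

# Hypothetical sketch: serialise a Graph so required keys are always present.
def graph_to_dict(graph_id, nodes=None, edges=None):
    return {graph_id: {"nodes": nodes or {}, "edges": edges or {}}}

print(json.dumps(graph_to_dict("Test")))
# {"Test": {"nodes": {}, "edges": {}}}
```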
Overview
Bring in the concept of Stateful Parameters instead of State. The Node object will have Functions and Parameters, and Parameters are further categorized as stateful and fixed. A Stateful Parameter stores a value that changes with time.
Step1: Create Stateful Parameter
Step2: Write the Translator
It is common to move the root of a package to src/; that is, modeci_mdf should be src/modeci_mdf. This helps make sure that testing is always done against installed versions of the package and not the local copy. Here is a good description of why this is a good idea. As per convention, developers can install a local editable install with:
pip install -e .[dev]
This ensures that modifications made to the local directory are reflected immediately in the installed package without having to reinstall. Without the -e argument, pip will copy the src to site-packages, and import modeci_mdf will reference the site-packages copy, not your local copy. Developers should install with -e; users should install with a simple pip install. The [dev] denotes that extra developer requirements (typically at least pytest) should be installed.
If there are no objections, then I will make this change in a standalone pull request.
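For reference, a src/ layout typically needs the package directory declared in the packaging config; a hedged sketch of a setup.cfg fragment (the exact extras are assumptions):

```ini
[options]
package_dir =
    = src
packages = find:

[options.packages.find]
where = src

[options.extras_require]
dev =
    pytest
```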
Update
The SimpleScheduler, which can execute simple MDF models does not currently support conditions: https://github.com/ModECI/MDF#conditions.
The definitions of the python MDF should be updated to support conditions and a basic interpretation/handling of these should be added to the scheduler.
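A hedged sketch of how a simple scheduler could consult a condition before executing a node (EveryNCalls here is modelled loosely on PsyNeuLink-style conditions; the class and method names are assumptions, not the actual API):

```python
# Hypothetical sketch: gate node execution on a per-node condition.
class EveryNCalls:
    """Node may run only after its dependency's call count is a positive multiple of n."""
    def __init__(self, dependency, n):
        self.dependency, self.n = dependency, n

    def is_satisfied(self, call_counts):
        c = call_counts.get(self.dependency, 0)
        return c > 0 and c % self.n == 0

call_counts = {"A": 0, "B": 0}
cond_b = EveryNCalls("A", 2)

for step in range(4):
    call_counts["A"] += 1            # A runs on every pass
    if cond_b.is_satisfied(call_counts):
        call_counts["B"] += 1        # B runs only when A's count hits a multiple of 2

print(call_counts)  # {'A': 4, 'B': 2}
```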
@mraunak I've tested the graphviz export and it does actually manage to load and convert simple examples with conditions and generate a graph, see here: https://github.com/ModECI/MDF/tree/main/examples/MDF#conditions-example
Note it doesn't actually say what the conditions on the nodes and graph actually are, they can be seen in the yaml for example though: https://github.com/ModECI/MDF/blob/main/examples/MDF/abc_conditions.yaml#L7
https://github.com/ModECI/MDF/blob/main/src/modeci_mdf/interfaces/graphviz/importer.py should be updated to add this info, nicely formatted. The image can be regenerated easily with the -graph
option in this file: https://github.com/ModECI/MDF/blob/main/examples/MDF/abc_conditions.py#L104
Can you look into this @mraunak? Also, it would be good to test something similar with the MDF examples generated from @kmantel's PsyNeuLink examples, e.g. https://github.com/ModECI/MDF/blob/main/examples/PsyNeuLink/SimpleBranching-conditional.json, to be able to put these on the PNL readme.
The main README file should be updated with a more concise description of the main elements of the language, and pointers to the main (autogenerated) technical specification for the precise definitions of elements and links to working examples. The text worked on by @patrickstock will provide a starting point.
@rpradyumna Can you add details on status?
@patrickstock Can you add details on status?
@davidt0x I disabled the test on inception.py on development, as it currently fails with an out of memory error on ubuntu: https://github.com/ModECI/MDF/actions/runs/1155523696. Could you have a look when you get a chance?
Rename the top-level package from modeci_mdf to modeci. mdf is already a sub-module within (actually called MDF now, but it will be lowercased if we conform to PEP 8). With this I can set up a modeci PyPI package as well and set up automatic publishing on tagged releases.
Building on the need for simulator specific information embedded in the mdf files (e.g. framework_specific here: #97 (comment)), a more general solution would be to add "metadata" to the core (eventually all) elements, which can contain a dict of information in any form.
In progress here: https://github.com/ModECI/MDF/tree/feature/metadata
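For illustration, metadata on a node could look like the following (the keys and values here are hypothetical, not a finalized schema):

```json
{
  "input_node": {
    "parameters": {"input_level": 0.5},
    "metadata": {
      "color": "0.8 0.2 0.2",
      "framework_specific": {"pnl": {"execution_mode": "trial"}}
    }
  }
}
```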
One thing that would be nice though is if there was a wrapper class around the PNL-specific classes/fields so that all the terms used (trial/timescale/sequence) are MDF's own and can be changed/updated easily on our side without impacting PNL. Also, the "innards" of that wrapper class can eventually be substituted with an MDF-specific implementation (or a standalone scheduler) without too many additional changes...
Thinking a bit more about this, I feel this file (modeci_mdf/scheduler.py) is the proper place for the "wrapper" class/fields (i.e. the MDFScheduler) which hides the specifics of PNL in an MDF-specific API, and the EvaluableGraph could be extracted out and just use that API...
Originally posted by @pgleeson in #60 (comment), #60 (comment)
@davidt0x Can you add details?
@pgleeson Yeah, sure, I will go through the README and check it. Will let you know if I face any issues.
Originally posted by @mraunak in #66 (comment)
In line with #18, I propose to make the top level entry of the specification the id of the model, referencing 1) the graph(s) it contains and 2) the version of the language used:
{
"Simple": {
"format": "ModECI MDF v0.1",
"graphs": {
"simple_example": {
...
}
}
}
This is in keeping with the proposal in #18 that everything with an id is referenced in the same way, and the fact that models can in theory contain multiple graphs, and so one overall id should be used for the "model" as opposed to its constituent graph(s).
As an option to resolve the state/stateful_parameter discussions, one option would be to merge all of the different features in these to one entity (probably best called parameter...) which can either be:
The conversion from MDF to MDFzero would involve finding the ones with time derivatives, adding dt as a parameter, and converting them to simple stateful parameters.
Working on this here: https://github.com/ModECI/MDF/tree/parameter_state_merge, e.g.
graphs:
state_example:
nodes:
counter_node:
parameters:
increment:
value: '1'
count:
value: count + increment
output_ports:
out_port:
value: count
sine_node:
parameters:
amp:
value: '3'
period:
value: '0.4'
level:
default_initial_value: '0'
time_derivative: 6.283185 * rate / period
rate:
default_initial_value: '1'
time_derivative: -1 * 6.283185 * level / period
output_ports:
out_port:
value: amp * level
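For the sine_node above, a time_derivative implies some integration scheme during evaluation; a minimal forward-Euler sketch (dt and the step count are assumptions; 6.283185 is 2*pi as in the YAML):

```python
# Forward-Euler sketch of evaluating the time_derivative parameters above.
dt = 0.01
period, amp = 0.4, 3.0
level, rate = 0.0, 1.0                        # default_initial_value for each

for _ in range(10):
    d_level = 6.283185 * rate / period        # time_derivative of level
    d_rate = -1 * 6.283185 * level / period   # time_derivative of rate
    level += dt * d_level
    rate += dt * d_rate

out_port = amp * level                        # output_ports -> out_port value
print(out_port)
```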
Per discussion with the core team, one problem that's repeatedly come up is what to do about discrepancies in output. It's a foregone conclusion that different simulators/environments will produce different outputs at some level of granularity (if only because of floating point arithmetic differences), so the goal can't realistically be perfect reproducibility across all environments. BUT we do want to have some way of expressing what the intended/expected behavior is. E.g., if you build a model in NEURON, it might not give you exactly the same result if executed in PsyNeuLink, but the person who wrote the model (or someone else) should be able to say "we consider the result valid if, given inputs like this, the outputs look roughly like this, and the fitted/learned model parameters are in this range."
The suggestion here is to have some basic testing/validation annotations be a formal part of the specification. There are many ways to implement this, but the following seems like a reasonable set of core features we may want to include (or at least, discuss here):
"50 < node65.spikeRate < 100".
Putting those ideas together, we might have something like this (this is just intended to convey the idea, not to imply any commitment to this kind of structure/syntax):
{
"tests": [
{
"inputs": {
"data1": [
4,
3,
1,
5
],
"data2": "my_critical_data.txt",
"epochs": 300
},
"assertions": {
"outputs": [
"all(output1) >= 0",
"8.3 <= mean(output1) <= 8.6"
],
"parameters": [
"50 <= node1.spikeRate <= 100"
]
}
},
{
...
}
]
}
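Assertion strings like the ones above could be checked by evaluating them in a restricted namespace; a minimal sketch (eval is used purely for illustration, a real implementation would want a safe expression parser):

```python
from statistics import mean

def check_assertion(expr, env):
    """Evaluate an assertion string against a dict of outputs/parameters. Illustrative only."""
    allowed = {"__builtins__": {}, "mean": mean, "min": min, "max": max, "all": all}
    return bool(eval(expr, allowed, dict(env)))

outputs = {"output1": [8.2, 8.5, 8.7]}
print(check_assertion("8.3 <= mean(output1) <= 8.6", outputs))  # True
```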
@davidt0x @patrickstock Can you add details on status?