kassonlab / gmxapi Goto Github PK

(outdated) fork of https://gitlab.com/gromacs/gromacs

License: Other

CMake 3.21% C++ 86.25% Python 3.34% Shell 0.27% Dockerfile 0.04% Cuda 2.17% Jupyter Notebook 0.06% Perl 0.09% Scilab 0.01% C 4.29% XSLT 0.14% Yacc 0.10% Lex 0.02%

gmxapi's People

Contributors

Stargazers

Watchers

Forkers

eirrgang gmxapi spiwokv silcsbio wapsyed qshao zlmturnout wittler-github fdommert bbyun28 rnaimehaom foursealee meldaw84

gmxapi's Issues

GMXAPI hook for simulation stop condition

Provide a gmxapi hook to let client code issue a data event that causes the framework to set the stop condition.

Documentation entry points

We need clear and concise documentation entry points.

The README can be separated into a project description / front page, and an installation doc. The Sphinx-based user documentation should have a simple set of user docs and more concise / consolidated developer docs.

Importantly, we need a tutorial sort of document / Jupyter notebook with a walk-through of a work-flow utilizing the ensemble restraint demonstrated in the sample plugin. We can incorporate the sample restraint into this distribution so long as we continue to be mindful to minimize dependencies when testing.

Use Travis-CI with docker images

Use Travis-CI to build, push, and pull docker images so we can automatically have gromacs-gmxapi builds available.

We can reduce the CI testing time by reusing docker images that haven't changed for gromacs-gmxapi and gmxapi. Building the docker image for sample_restraint is a test in itself. Travis could publish these images, perhaps. We may also be able to use Travis-CI artifacts to speed up readthedocs builds.

Remove boilerplate for plugin instantiation

The Builder stuff in the sample_restraint doesn't belong there. It needs to be removed to a gmxapi feature (provided headers) and greatly improved. It will become a configurable Director to construct the part of the Session corresponding to the plugin. Client code will configure it with available inputs, outputs, and parameters with pybind-like syntax. Relates to #44 but does not need to wait for resolution to that issue to simplify sample_restraint. Relates to #47

The long term solution to all things surrounding the plugin environment is basically this:

plugin code uses Session resources for input, output, supporting operations, and state data
plugin developer just writes functions (not classes) and defines all of the input, output, and state through descriptive templating mechanisms (inspired by pybind and boost)

In addition to (hopefully) simplifying the development environment a bit, we can simultaneously address aspects of other issues, like #36, #40, #69.

We can implement this one part at a time, starting with state data. By storing state data in an object received from the Session, we can eliminate private data members and automatically checkpoint everything about a plugin. There are several optimizations that will be easier to make, but one potential performance challenge is the dichotomy of type safety and custom data structures.

We can consider less type safety, run-time type information, or building more templated code in the plugin.

Regardless, I think the plugin developer interface is the same, so hopefully if we choose wrong we only have to ask developers to update headers and recompile.

Unique work spec identification

The Session launcher needs to be able to be able to construct a reasonably unique working directory name and to be able to identify whether an existing directory represents the intended work specification. In the long term, we should include meta-data on specific aspects of data graph state, but the immediate need is met by a filesystem-friendly hash or string generation based on the work specification text and appropriate handling by the Context.

Stop condition hook

By default, a simulation has stop conditions based on the preconfigured nsteps, as well as the OS SIGINT and other implementation details.

Workflow level control logic needs to be able to tell a simulation to stop, say, after convergence of some property is detected.

The current plan is to make the types of interaction between MD elements and other elements more explicit than a mere dependency relationship, such as by naming elements in keyword parameters provided to the MD element, using keywords that signify the type of connection (e.g. restraint, stop). These relationships will result in distinct binding operations as the execution graph is instantiated. For immediate test functionality, though, prototype implementation will follow the previous automatic binding model, which temporarily bloats the Restraint interface.

allow GROMACS integrator to receive an external resource influencing its detection of stop condition instead of managing internally to the integrator_t call. #80
allow GROMACS client code to set the stop condition during the integrator execution. #81
provide a gmxapi hook to let client code issue a data event that causes the framework to set the stop condition. #82
express the gmxapi feature in terms of a boolean data flow for an MD element subscribed to a data provider. #83

Further discussion is needed regarding the interaction of the predefined nsteps and additional stop conditions. A big consideration is to try to make sure the behavior is what a user would intuitively expect. Should the nsteps stop condition be explicitly represented? Always? Or only if additional stop conditions are supplied? E.g. What if a simulation reaches nsteps without reaching the convergence condition? (Note, a convergence condition does not have to be a stop condition to trigger some data event if the user wants the simulation to run nsteps.)

Provide API hook to send stop condition to runner

allow GROMACS client code to set the stop condition during the integrator execution.

workflow-level integration with simulation steps

At least for short-term flexibility, we need to be able to run for a given number of steps or to update the final steps count and relaunch. More generally, we can handle this with periodic operations and flexible stop conditions.

Session resources

Generalize the resources object in the sample_restraint and make it a gmxapi feature.

Allow client management of MPI environment for GROMACS

In support of #50

When built with MPI, libgromacs requires MPI to be initialized very early and no more GROMACS calls should be made after it is finalized. GROMACS manages MPI initialization and finalization in gmx::init() and gmx::finalize(), but it should be amenable to MPI initialization and finalization in calling code. Since we do not want the GROMACS library to do the initialization and finalization, the gmxapi client should perform exactly one initialization and finalization per job. A script executed with mpiexec python -m mpi4py takes care of this, but we should still call gmx::init() and gmx::finalize() (per libgromacs API protocol).

Note, though, that gmx::init() initializes MPI even if the client is not started in an MPI environment if it is built with MPI.

The most straightforward way to handle this is for the local client Context to detect an MPI build of libgromacs. Upon gmx module load, the local Context instance can be created and stored as a module singleton. If MPI was not initialized before module load, the local Context instance can store a C++ RAII sentry to initialize and deinitialize MPI. The local context also stores a C++ RAII sentry to call gmx::init() and gmx::finalize(). Alternatively (or in the short term), because Python types are themselves Python objects, we can attach additional code to the creation and destruction of the gmx.core.Context type itself. ref http://pybind11.readthedocs.io/en/stable/advanced/misc.html?highlight=weakref#module-destructors

Question: if the client manages MPI initialization and finalization, is it okay/preferable to call gmx::init() / gmx::finalize() once per session?

We are introducing a more substantial zero-level Context with a necessary C++ component that we can reuse for C++-only clients, but the C++ testing has an extra effort. Previously, we used the GROMACS CMake macros to generate unit tests, but this framework includes management of gmx::init() and gmx::finalize() that we specifically want to avoid. CMake 3.9 includes more googletest resources that may make migration simpler, but we also probably want a temporary working directory facility.

Session and client need access to trajectory step

The last few versions of gmxapi haven't had a way to restrict or extend the number of steps to run in the simulation. The high-level design as well as the library-level interface for this functionality is still evolving, but in the short term, we need this feature back, so we can probably just enable it as an MD element parameter and either provide a hook for updating it or have the user run a series of simulations that append to the previous trajectory.

clarify plugin development with mixed Python/C++

Add examples and work towards standardization for plugin packages that require more than just C++ bindings, mixing or wrapping with pure Python parts.

Tag artifacts

Artifacts like trajectory output files should be tagged so that it is clear to the Context what they are the result of. Ideally this includes the nature of and state of all upstream graph elements.

Tutorial / working Jupyter notebook examples

Task for #27

We need a quick-start sample workflow to help new users understand what gmxapi does and to experience it hands-on. The current restrained ensemble system requires far too much CPU time to be an effective introduction.

We need a sufficiently small biological molecule that we can sample the conformation of, set a target distribution of some conformation data, and then apply the ensemble restraint, demonstrating some amount of approach to convergence within a few minutes. @peterkasson suggests a molecule such as the funnel web spider toxin 1OMB.PDB used in the Kerrigan GROMACS 4.6 tutorial.

Ideally, the example / tutorial can be run interactively in a Jupyter notebook.

enable MPI domain decomposition with plugins

Thread-MPI domain decomposition is working, but we should allow MPI domain decomposition with plugins. This requires some more fiddling with the MPI initialization in and out of GROMACS and sharing of MPI communicators.

A first step is to build the full stack with MPI compilers and MPI enabled in Gromacs, but gmxapi and sample_restraint shouldn't need any MPI dependencies.

Enabling MPI GROMACS must not break existing ensemble features in gmxapi, which currently rely on mpi4py in a Python-based Context implementation. Either mpi4py must coexist with the GROMACS MPI usage or the parallel Context must receive and use the communicator obtained when GROMACS initializes MPI. In the latter case, we must make sure that we no longer require the communicator after GROMACS finalizes the MPI environment.

gromacs-gmxapi testing matrix includes MPI and Thread-MPI builds
libgmxapi test client initializes MPI and GROMACS reuses the environment: #57
gmxapi Context can determine ensemble member rank and perform ensemble reduce operations
restraint plugin has access to call-back framework that can make one call per ensemble member
restraint plugin can be registered and initialized once per tMPI particle-pair force calculating thread
restraint plugin can be registered and initialized once per MPI particle-pair force calculating rank

Session abstraction in Python module

launch() is currently a method of gmx.core.MDSystem and returns a gmxapi::Session.
MDSystem objects are obtained from gmx.core.from_tpr(). They also provide add_potential().
gmxapi::Session objects are exposed as gmx.core.MDSession and provide run() and close() methods.

The simplest demonstration of the need for more abstraction is to represent a session with no work to perform, such as an unallocated MPI rank when the size of the trajectory ensemble is smaller than the MPI context.

Sample restraint documentation

Documentation for the sample restraint is poor, missing, or inaccurate, and is desperately in need of revision. It should also be incorporated into the gmxapi documentation (or at least linked) and/or moved to the gmxapi developer documentation.

Internal documentation is also a big issue, but is sort of part of the tasks of migrating boiler plate and improving the templating.

Avoid stdlib and templates at API boundaries

Some mismatched compiler problems might be avoided by hiding use of stdlib classes in objects that pass between the plugin, core.so, and libgmxapi or libgromacs.

port to GROMACS 2018

I should port our gromacs fork changes to GROMACS 2018 and begin tracking gromacs master more closely. This will involve some development and debugging time as well as some revisions to my workflow to stay up-to-date with master. We should release our software initially based on the official GROMACS 2018-1 (or later) release and be prepared to both track master as well as issue updates to our code in sanely-versioned ways, TBD.

The way to do this might be to use GROMACS master as a stream of updates to our development branch and to periodically issue commits to gmxapi master for tagged gmxapi releases, but it may be hard to retain compatibility with a specific GROMACS release if we have to rely on behavior changes for a gmxapi release. It may simply not be possible while our release schedule is faster than that of upstream GROMACS.

Revisit MDHolder

This class should probably be further simplified and maybe refactored in conjunction with an update to the Context concept on the libgromacs side. Note there are also some possibly confusing semantics and naming with the gmxapi::Workflow (not in the public interface, resembling the Python gmx.workflow.WorkSpec) and gmxapi::MDWorkSpec, which is in the public API but differs in important ways from the Python WorkSpec.

gmxapi::MDWorkSpec should probably be renamed to something more specific to its use in MDHolder and undergo some combination of generalization, minimalization, and obfuscation.

clean up input parameter specification for plugins

Allow easy ways for C++ plugin developers to have a clean and uniform Python interface for input parameters automatically generated and available.

For each input parameter, we need to specify

Python key word string
convertible data type
C++ symbol name

There is not a clear way to do this in one place unless we use some C preprocessor macros or suggest that frequently accessed parameters should be explicitly copied from a map or tuple to a simple data member. There may be some trick to getting the compiler to optimize out statically mapped strings at compile time, but I don't think I'd want to bet on that.

This issue is closely related to cleaning up the Builder definition and hiding the boiler plate from the plugin developer.

WorkSpec parsing to determine Context requirements

We need to clarify our notion of how Context implementations are selected and checked for compatibility. We currently do some checking, but we don't do any conditional dispatching or configuration. There are various open questions that should be enumerated and discussed...

Provide GROMACS integrator with external stop condition

Move the ownership of the simulation signals to the runner.

Allow Context to inspect GROMACS checkpoint files to determine timestep.

GROMACS fatal errors are not handled gracefully

In particular, GROMACS conventions like GMX_RELEASE_ASSERT do not produce an exception we can catch. Also, the GMX_RELEASE_ASSERT attempts to gather information that includes a call to gmx_node_num(), which tries to call MPI_Comm_size(MPI_COMM_WORLD, &i), which causes an even earlier exit when the problem is with gmx::init() and there is no MPI error handler.

So maybe one thing we should do is to make sure there is a non-default MPI error handler in place.

Relates to #57

Safe management of session working files

The context needs to correctly determine whether or not the working directories for a session exist, not overwriting previous work. It should also be able to determine whether the work is finished or ready to be restarted, but that will require additional features. This task is about file management.

There are not complete and comprehensive GROMACS tools to deal with this
situation, so I probably need to write them, but with enough of the
original files we should be able to generate a new input file for the
forked run. Note that we should confirm that the checkpoint file used
matches the step number that we think it should.

Several related issues to consider:

GNU filesystem utilities indicate that the process's current working directory is used to resolve paths to produce a file descriptor for fopen(), but it is unclear whether the semantics are universally well-defined for what happens if the current working directory is changed while a file descriptor is held for a file opened by a relative path.

We should specify all input and output files rather than rely on libgromacs default behavior.

We should avoid ambiguity by making sure that we pass absolute paths to libgromacs.

We should cease the practice of changing working directory during Session launch (with the possible exception of dispatching to another Context, which should be done in a separate process).

The Context implementation should handle shuffling of filesystem artifacts for such use cases as forking trajectories.

Independently of further discussions about input and output paradigms, we can achieve predictable behavior in the short term by distinguishing between a trajectory that is a continuation and a trajectory that is initialized as a fork of another trajectory.

As a further simplification of the last point, we can accept that our trajectory forking operation will be a freshly initialized simulation whose zeroth MD microstate is not exactly equal to that from which it was forked. It's time to write some utilities to extract / manipulate input components because right now I don't think there is a way to get a topology that grompp can use back out of a TPR file. For a proof-of-concept trajectory forking, we could wrap the following.

 gmx dump -s old.tpr -om temp.mdp
 gmx grompp -f temp.mdp -p topol.top -c state.cpt -o new.tpr

where old.tpr is available from the original load_tpr operation, temp.mdp is a temporary file managed by the Context implementation, topol.top can be provided as a parameter to the fork_trajectory() operation, state.cpt is already managed by the C++ Session, and new.tpr is an output that becomes the input for the forked md operation. But this is already convoluted enough that I should just make proper API tools.

specify pair restraints with selections instead of atom indices

a facility provided by the framework to allow plugins to have automatically friendly user interface

Session execution granularity

"granularity" here refers to how many discrete API states exist during execution of a graph and how much work is performed during a single execution of a graph.

We need to discuss paradigm for a sweep through the graph and repeated sweeps, but we need to fully enable the build, launch, run until done scheme. Our current sequential execution of nodes probably works for now, but to enable more parallelism may require nodes to report a ready-to-run state when they have the inputs they need, since optimizations need to allow nodes to communicate directly in some cases. For the moment, we will have some data-event driven execution, such as plugin force calculations and logical processing of MD stop conditions.

Also, we need to be able to defer parts of the session to sub-contexts. We could represent a sub-context as a fused operation node.

Allow CMake-only builds

Python setuptools has been more trouble than it is worth and does not succeed at the fundamental task of dependency resolution at build time. After consultation with experts and stakeholders, we need to at least be able to build all three repos with CMake install instructions on the path towards consistency and simplicity.

resolve protocol for API operation map

A Context implementation needs to be able to read an element from a work specification that tells it the name of a function to call, the parameters to pass to that function (or to set on the resulting object), and the library that implements the function.

There are several possible protocols, each with downsides. Before the map, non-built-in operations caused a module import followed by a non-method function call (with either no arguments or arguments from element.params). This works fine for calling a constructor.

I'm currently trying to build a map of functions when the context loads the work specification. There are built-in operations specified by the API and implemented by the Context, and there are non-built-in operations that are specified by work elements that use a non-built-in namespace. At run time, the Context can look up functions to run in its map: self->operations->namespace->operation_name.

If operations implemented by the Context are provided as member functions, then the signature of functions has the Context as the first argument. This is probably useful, and may be worth formalizing whether or not operations are intended to map to Context member functions.

In another prototype, the WorkElement object was passed to the function implementing the operation for maximal flexibility in dispatching, but just passing the params ought to be sufficient as per the original design.

For even more flexibility, I tried calling the mapped function first with the params as an argument list, catching an exception if it failed, and trying instead with no arguments, then calling set_params(*args) as a member function of the returned object. This allowed me to let mapped functions be class names that could be used to construct objects.

For the moment, the simplest thing is probably to just make the mapped functions work with either zero arguments or arguments from the params list. If we want to use member functions or something, we can store lambdas or some other closure in the map to hide the extra arguments.

Move to CMake-driven build and install

Pip is not giving us more ease-of-installation than it is costing, and pip 10 broke our builds. We are already wrapping CMake with setuptools, so maintaining pip / setup.py as the primary install method is neither sensible nor worthwhile. We may still want a thinner setup.py wrapper for readthedocs builds or something, but we should not use it for dependency resolution, compiler selection, etc. All that is really required is to use the setuptools Python package in a CMake script to locate the preferred installation directory for a given python environment.

WorkSpec feature: element deletion

More registration between WorkSpec objects and WorkElement objects will be necessary to ensure that stale references don't linger and that consistency is maintained between WorkElement objects and WorkSpec records. There may be additional considerations about the uniqueness of objects, serializability, portability, and even additions to the schema, so it is appropriate to defer this goal for a little while.

checkpoint interval

Allow a configurable interval of number of data events between checkpoints of data objects. This will probably only make sense to manage from the Session code, which will interact with GROMACS.

more execution graph node types

For flexible workflow configuration, we need to move forward with abstractions for more input and output nodes. Coupled to this, we need more concrete design for distinguishing types of dependency relationships / edge types. This is in part to clarify the binding process both to a high-level user and for implementation purposes.

The current sense is that workflow elements should have each of their interactions explicit to avoid unexpected behavior. This can be done by requiring the elements to be named in keyword parameters that the context can resolve to API-specified connection types during graph translation. For instance, a plugin that can provide both a force calculation interface and a stop condition to an MD operation would be listed twice in the MD parameters, such as (restraint=myplugin, stop=myplugin). During translation to the executable graph, the translator for myplugin will participate in two binding protocols, one for each interaction type.

In the long run, there is an important distinction between interactive edge types and data flow edge types that will need to be worked out. In the above case, a stop condition can ultimately be a data event, but the restraint force calculation interface is a tightly bound interaction with data flow in both directions during a single time step and is dependent on the MD engine implementation. However, this just means that in reality the MD engine is represented by several nodes corresponding to different phases of the MD loop iteration. Right now, this is implicit, but maybe we should make it explicit for consistency and be clear that those several nodes of MD engine and MD plugin are fused from the perspective of the workflow-level scheduler and deferred to the simulation-level scheduler.

In addition, we need to figure out where to put the protocol for declaring that no interaction needs to take place for x number of steps, which is a lower-level optimization for infrequent call-backs.

MD input

structure/configuration
topology
integrator state
simulation parameters (such as nsteps and other MDP options)?
stop condition (an edge type) #62

MD output

structure/configuration
checkpoint information(?)

Data operations

simulation operations
- add
- mean (client of operation should not need to know the nature of the domain decomposition)
ensemble operations
- add
- mean (client of operation should not need to know the size of the ensemble)
logical operations: may be necessary to implement workflow logic not already available, such as to produce a stop condition for a simulation that has converged or run long enough.

Data source

Define and initialize a data structure (array, scalar, or key-value block) that can be initialized, checkpointed, and updated while passing through other operations on a sweep of the graph in a TensorFlow-like manner.

effective generation of TPR files from MDP data

Sooner or later there needs to be something bundled with gmxpy to allow direct specification of MDP key-value pairs. The Beckstein gromacs-wrappers are okay, but not great. I can wrap the command line if I have to, but there is broad buy-in to do this. I will probably have to just borrow grompp and add Python bindings or CLI wrapper since it doesn't look like the migration of inputrec to gmx::Options will be complete soon. However, I could target gmx::Options and just migrate the parts of inputrec that are interesting to our use cases.

Eigen

We should use Eigen for data exchange sooner than later.

Formalize the external WorkElement interface for plugins

We are currently wrapping plugins in WorkElements in an ad hoc way. It will be easier to see how best to update the standard plugin interface after ensemble plugins with shared data are tested.

Formally specify operation Directors for Session builder.

The interface and conventions are still being refined, as well as the intended generality. I think we can probably use a DAG at the Python level and just an informal set of bindings between objects at the C++ level, but we do need an interface that Python and C++ can share. It is also not clear to me that we need as many steps as we have unless we make more use of the DAG between build() and launch(). This may become more obvious as we implement workflow checkpointing and/or asynchronous launching.

In the end, we may also prefer to eliminate the networkx dependency. We do not use many of its features, but I haven't checked if we can bundle it (license-wise), we might ultimately want something available at the C++ API, and it is unclear whether the external dependency is acceptable.

Docker does not access current gmxpy version

A shell command had unexpected behavior and the Docker build uses an old version of the gmxapi source.

The /home/jovyan/gmxpy directory should not have existed in the gromacs-gmxapi image.

Restore from checkpoint

Initialize nodes and replay the required data events. This is probably connected to discussions on the nature of the execution graph and edges. I think the entire state of the graph consists of the edge state and the initialization values of source nodes.

Add test coverage testing with codecov

Use Travis-CI output for codecov.io to give us a percentage coverage badge.

Generic data structures for graph edges and state

Provide data structures as Session resources that the Context implementation can checkpoint. External code can use these data structures to maintain state.

We can discuss whether and how we could use the TensorFlow concept of variables in addition to graph edges, but right now we should clarify that our check-pointable data is as edges. Then, we have data source nodes that initiate data flow on a stream and implicitly (magically) receive the data at the end of a pass through the graph to provide the updated stream on the next iteration. What we could do is to let edges essentially push data events, while "variables" are accessible in a token-passing sort of way, where the graph director for an operation just has to make sure that there are nodes in the right topology for the object code to have the token at the right time. The token is in the form of the Session Resources object that is available to a node when it is its turn to run.

This would all be so much easier if we could tell GROMACS to run to a specified step and then surrender control to the API. The alternative, I think we have to say that any cluster of nodes associated with a simulation is deferred to the libgromacs Context for handling. We've already inserted an object into do_md to manage the stop condition, so we can use it to synchronize the gmxapi Session by adding an additional hook that calls out to the Session each time the timestep is incremented, preferably after the PP coordinates data is in place...

Allow Context to automatically manipulate simulation input files to restore MD operation to a known state.

The TPR file reading/writing and simulation input preparation (#190, #195, etc) need to understand and use checkpoint (.cpt) files.

Get a consistent set of checkpoints

Presumably with some sort of checkpoint-participant interface in GROMACS, get a signal from GROMACS when a checkpoint is made and don't allow the workflow to proceed until a consistent set of checkpoints are made throughout. We may prefer something more abstract, but I don't know what that is right now.

Formal AllReduce operation

We need an abstraction to smartly handle reduce operations in different computing environments. Three choices are:

make it an API feature of data elements
make it an operation that interacts with certain types of API objects.
make it a standalone node with arbitrary input and output

The Context can provide one or more simple ensemble reduction operations to take care of data sharing needs. The operation would be used as a dependent element on an element from which it receives numpy or Eigen array data, which could be specified or wrapped in an adapter that is specified by the API to assert optimal data exchange.

schema 0_2

This issue is to track the proposed schema updates for the gmxapi_workspec_0_2.

The major changes discussed so far are to specify named inputs and outputs explicitly in the work elements. The hierarchy of the params and depends fields probably warrant more discussion.

Also, we may want to add the hash key we use to uniquely identify the spec to the spec itself. This would make it easier for a human to inspect the mapping of workspec to working directory as well as to provide a mechanism for built-in validation checks.

Though not part of the schema definition, we can include discussion of how API interaction with the specification could/should work. E.g. elements should be views into the spec object rather than independent objects; the spec object should contain additional members to maintain contact with the (chain of) Contexts associated with the work spec (and should itself maybe just be a view into the Context, with spec objects requiring a Context to exist at all)

Integration testing for sample restraint

Currently, integration testing for the sample_restraint code is done manually. We should add sample_restraint testing to the gmxapi Travis-CI builds.

checkpoint for data nodes

checkpoint for data nodes (used by plugin for parameters) needs to know or
be able to associate with the checkpoint for the simulation, which has been
appending. But we don't need to force the user to care about what timestep
they are on. It shouldn't be a parameter that we're required to mess with.
how/whether this integrates with trajectory timestep or workflow iteration
is a potential topic of discussion, but I'm leaning towards workflow iteration
with enough metadata to determine the associated simulation time on a need-to-know basis.

reusable output node

We need to be able to append trajectories sensibly and robustly along a single simulation pipeline.

Most basically, some workflows will included several stages of simulation that should produce a single continuous trajectory. Whether the lower level implementation involves multiple GROMACS program launches or a single launch (that runs for a bit, changes parameters, and runs more) is not relevant at the higher-level interface. So this issue has a few parts:

What does the workflow graph look like for output in multi-stage / adaptive simulations?
What is the sensible implementation in the short and long term?
What does that look like in the execution graph?

One way is to have a single output node represented in the workflow. Multiple simulation nodes in the workflow graph could run as a pipeline. In order for the output node to be responsible for writing the entire trajectory, the intermediate nodes would "pass through" trajectory frames for time steps before the simulation node is active.

Two obvious alternatives are

Each simulation node has an output node to perform the operation of writing trajectory data out (filesystem I/O is not a native workflow data stream type). We could try to handle appending to the same trajectory automagically or allow input parameters for the output nodes to specify accumulating frames by appending a single trajectory file.
More fully embrace the idea of data as graph edges. When the stream is initialized, specify that it is a file-backed stream and carry the necessary metadata to properly maintain the trajectory file as the graph is executed.

I like the latter and it seems more TensorFlow-ish, but I have thought about it less and it implies introducing more formalism into one or both of our graph schema: the workflow specification graph (specified in the high-level API) and the execution / data flow DAG (currently evolving fluidly).

multiple MD elements in a single workflow

Iterative and adaptive workflows will often be expressed well with a workflow containing multiple MD nodes.