time-steward's Introduction

TimeSteward (under construction)

A game/simulation backend that automatically supports:

  • lockstep multiplayer with reliable synchronization
  • lag hiding
  • non-blocking autosaves*
  • parallel computation*
  • and other features.

The main catch is that you have to write the physics within the TimeSteward model. Any physics can be expressed in this model, but it may not be easy to convert an existing simulation to use it.

Short overview

TimeSteward has one core trick: It can change an event in the past, then cheaply update the present. It doesn't need to redo any computations that weren't dependent on that change.

Every time any event occurs, the TimeSteward records what data that event examined and modified. Thus, it can maintain a dependency graph between events. Ideally, all events only access or modify data within a small neighborhood, and dependencies don't propagate very fast. If these conditions are met, making a change to the recent past is very cheap.

This naturally supports lockstep multiplayer. Each client simply runs the simulation in real-time, handling local inputs immediately. When it receives input from other clients, it inserts that input into history at the time the input was sent. Thus, all clients ultimately end up with the same input history.

Individual clients can also speculatively simulate into the future, which lets them smooth over moments when very costly computations occur for only a short time.

Because TimeSteward retains old data, you can also cheaply take out a handle to a snapshot of the simulation state. You can trust that the snapshot won't change as the simulation continues. This allows, for instance, saving the snapshot to disk in a different thread*, without freezing the game while the user waits for the save to finish.
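
For illustration, a minimal sketch of that pattern, assuming a snapshot method along the lines of the snapshot_before() discussed later and a hypothetical save_to_disk() helper:

let snapshot = steward.snapshot_before(&current_time);
std::thread::spawn(move || {
  // The snapshot is immutable, so serializing it here cannot race with the
  // simulation, which keeps running on the main thread.
  save_to_disk(&snapshot); // hypothetical serialization function
});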

Gotchas

In order to remain synchronized, all code executing within the simulation must be deterministic. (This doesn't apply to inputs which are manually shared between the clients.) Being deterministic means it can only depend on data from within the simulation. It cannot depend on other things, such as:

  • The local system time
  • System random number generation
  • The floating-point implementation of the local processor
  • The endianness of the local processor
  • Mutable global (static/thread_local) variables
  • Whether data has been reloaded from a serialized version

In particular, your physics cannot depend on f32 or f64 arithmetic, the iteration order of std::collections::HashMap**, or the capacity of Vec.

TimeSteward provides some features to work around these limitations. It has a built-in deterministic PRNG. It will eventually also provide a deterministic alternative to HashMap and a deterministic plug-and-play replacement for f32/f64. (However, using floats may still be undesirable because floating-point emulation is much slower.)
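
Until those replacements exist, one interim option for deterministic iteration (plain standard-library Rust, not a TimeSteward feature) is an ordered container such as BTreeMap, whose iteration order depends only on its contents:

use std::collections::BTreeMap;

// BTreeMap iterates in key order, which is the same on every platform and in
// every run, unlike HashMap's seed-dependent iteration order.
let mut scores: BTreeMap<u64, i64> = BTreeMap::new();
scores.insert(3, 10);
scores.insert(1, -5);
for (id, score) in &scores {
  // Always visits key 1 before key 3.
  println!("{}: {}", id, score);
}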

TimeSteward also provides a convenient system for running test simulations synchronized over one or more computers. If synchronization fails, the system can report the exact event where the first failure occurred.

Detailed design

DataTimelines

All data that can change over time is stored in implementors of trait DataTimeline.

A DataTimeline is a retroactive data structure. You can insert operations in the present or past, and query it at any time in the present or past. The results of queries must depend only on the operations that exist at times earlier than the query – and not, for instance, on the order the operations were inserted. (You don't need to implement retroactive data structures yourself – TimeSteward has built-in types for common use cases. And if you do build your own, TimeSteward has features for testing that they behave correctly.)

A DataTimeline can also report all data existing at a specific time, as a snapshot. A snapshot taken at a specific time can be used to compute an exactly identical simulation, provided the same user inputs are supplied after that point.

DataTimelines can only change at discrete moments, and it is good to make those changes infrequent. If you want to represent, say, a moving object, the inner data should not just be the location of the object, but a representation of its trajectory over time:

struct Ball {
  // location at the last time the ball was modified
  location: [i64; 3],
  // velocity at the last time the ball was modified
  velocity: [i64; 3],
  // current constant acceleration – for instance, due to gravity or other forces
  acceleration: [i64; 3],
}

Thus, the data only needs to change when the forces on the ball change, such as when it runs into an object.
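
For example, a sketch of how such a trajectory could be evaluated at a later moment (this helper is illustrative, not part of the crate):

impl Ball {
  // Location dt time units after the last modification, following the
  // constant-acceleration trajectory stored in the struct above.
  fn location_after(&self, dt: i64) -> [i64; 3] {
    let mut result = [0; 3];
    for dimension in 0..3 {
      result[dimension] = self.location[dimension]
        + self.velocity[dimension] * dt
        + self.acceleration[dimension] * dt * dt / 2;
    }
    result
  }
}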

(In practice, the TimeSteward library provides implementations of a few trajectory types, so you may not have to implement this yourself. We will continue expanding the support libraries as development continues.)

Predictions and Events

If the data doesn't normally change over time, how do we know when to make things happen?

An Event is sort of like an object of type Fn(simulation state) -> results. When called, it can query and modify DataTimelines at a specific time. It is also allowed to create and destroy Predictions. A Prediction is an Event combined with a time at which it's expected to happen. If the Prediction isn't destroyed by that time, the event happens.

Imagine that one Event makes a ball move towards a wall. From the current trajectory of the ball and the location of the wall, the event computes the time when the ball will hit the wall, then creates a Prediction of a collision at that time.

let ball: Ball = accessor.query (...);
// Examine various fields and compute the time when the ball hits the wall.
let time = ...;
let prediction = accessor.create_prediction (time, BallHitsWallEvent {...});
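
A sketch of the elided computation (a hypothetical helper; it assumes the ball moves along the x axis toward a flat wall, with no acceleration along that axis):

// Time until the ball's x coordinate reaches the wall, or None if the ball
// is not moving toward it. Real code would also handle acceleration and
// rounding more carefully.
fn time_until_ball_hits_wall(ball: &Ball, wall_x: i64) -> Option<i64> {
  let distance = wall_x - ball.location[0];
  if ball.velocity[0] == 0 || distance.signum() != ball.velocity[0].signum() {
    return None;
  }
  Some(distance / ball.velocity[0])
}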

If a later Event changes the motion of the ball, it should then destroy the original Prediction and create a new one based on the new trajectory.

As shown above, Events interact with the simulation through "accessor" objects. These objects are the way we track what queries and operations were made. Generally, it is an error for an Event to get information by any means other than the accessor (and self).

This system – Events automatically creating Predictions, Predictions automatically running Events – can implement a complete ongoing physics. The only thing missing is the way to add user input.

FiatEvents

Events are the only thing that can change field data, but there are two ways Events can be created. One is to be predicted in a Prediction. The other is to be inserted from the outside by fiat. We call these FiatEvents. They usually represent user input, but they can also be based on the local time, instructions from a server, or other things. To keep simulations synchronized over a network, all FiatEvents, and only the FiatEvents, need to be shared between all clients.

Ordering and DeterministicRandomIds

If two Events are scheduled to happen at the same time, one of them technically has to happen before the other. For the simulation to be deterministic, the order has to be deterministic as well.

We accomplish this by using a cryptographic hash function. Each Event is given a DeterministicRandomId – a unique 128-bit ID. Events happen in order by ID. For both Predictions and FiatEvents, the caller has to provide a unique random id. A DeterministicRandomId can easily be generated from any type that implements Serialize:

for time in 0..50 {
  // Pseudocode condition – replace with your own input handling.
  if user_is_holding_down_red_button() {
    steward.insert_fiat_event(
      time,
      DeterministicRandomId::new(&time),
      UserContinuesHoldingdownRedButtonEvent::new());
  }
}

A typical choice for FiatEvents would be to hash together a tuple of (time, ID of user who gave the input, enum indicating the type of input). A typical choice for Predictions would be to hash together the time-id of the creating event with something unique to the prediction, like a unique id of an object being predicted about, or coordinates of a cell in a grid that's being predicted about.
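
For example (the user id, input tag, and event type here are hypothetical):

// Hash together the time, the id of the user who gave the input, and a tag
// for the type of input, as described above.
let id = DeterministicRandomId::new(&(time, user_id, "jump"));
steward.insert_fiat_event(time, id, PlayerJumpsEvent::new());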

ExtendedTime

There's a special case where an Event creates a Prediction at the same time as the Event itself. For instance, imagine that a ball is going to collide with two walls at the same time, as in a corner. One of the events happens first, and the ball is deflected away from one wall. That event then predicts when the ball will collide with the other wall – which is zero time from now. It might generate a time ID that comes before the ID of the first Event!

To deal with this, we still make the second Event happen at the same numerical time, but in a later iteration. This gives rise to the concept of an ExtendedTime, which is defined approximately as follows:

struct ExtendedTime {
  base: Time,
  iteration: u32,
  id: TimeId,
}

ExtendedTimes are lexicographically ordered by the fields listed above. TimeSteward users usually don't need to be aware of ExtendedTimes (just implement your Events in terms of regular time, and they will likely turn out fine). However, it is possible for TimeSteward users to examine ExtendedTimes, which can be useful for debugging and loop detection.

PersistentTypeIds and serialization

All the physics-related data of a TimeSteward forms a network of DataTimelines, Events, and Predictions. These objects can contain handles to each other. DataTimeline changes are a form of interior mutability, which allows the formation of cyclic data structures. This makes them tricky to serialize, but TimeSteward does provide a way to serialize and deserialize the collection as a whole.

For various reasons, it's sometimes convenient to allow the handles to not know the concrete type of the objects they're pointing to, like trait objects. This complicates serialization, because we need to store some record of what type the objects are. Rust exposes TypeIds, but they're explicitly nondeterministic over multiple builds.

So, we require some of the types used in the simulation state to have hard-coded IDs. These IDs are simply one random u64 for each type. (Because there are fewer of them, they don't need as many bits to stay unique. By the birthday problem, it would take around 700 million types to reach a >1% chance of a collision; I don't think we need to worry about that. 128-bit IDs are necessary for events, because computers can generate billions of them easily, but this isn't the same situation.) The documentation provides a convenient way to generate these IDs. We could theoretically have these IDs be automatically generated from the type name and module path, which would make them unique, but hard-coding them helps keep serialization consistent from version to version of your program. (You wouldn't want savefiles to be incompatible just because you reorganized some modules.)

Invalidation, invariants, and auditing

A TimeSteward may run events out of order. Let's define the canonical state to be the exact history that results if you run all of the events in order. We want the history to eventually reach the canonical state regardless of what order the events are run. In particular, if a later event runs first, then an earlier event may change some data that the later event queried, making the later event invalid. If that happens, the later event must be rerun with the new inputs.

Each Event must:

  • implement a way for the event to be undone.
  • whenever the event modifies a DataTimeline, inform the TimeSteward of all future events whose queries to that DataTimeline would return a different result. (False-positives are okay, but false-negatives are not.)

TimeSteward could automatically track which events queried which DataTimelines, but that would be inefficient – in most cases, it would store many redundant dependencies. For instance, a certain type of event might be centered on a certain tile of a grid, and query the 25 tiles closest to it. An automatic system would store 25 handles to the event. But it would be possible to only store one, in the center tile. The implementor would simply know that when you modify one tile, you need to invalidate the events in surrounding tiles as well.

So instead of automatically tracking everything during normal simulations, TimeSteward provides features to audit that your more efficient behavior was correct after-the-fact.

To guarantee determinism, we require this invariant: "The history prior to the first invalid event is always in the canonical state." Note that for this definition, a Prediction that exists but hasn't been run yet is also considered "invalid", because the "valid" state is the one where it has been run.

This is sufficient to uphold determinism (we can simply keep running/undoing/rerunning invalid events until we reach the canonical state), but hard to audit in a helpful way. On one hand, whenever we run the first invalid event, we know that it receives canonical inputs, and we can audit that it leaves the history in the canonical state up through the next invalid event. On the other hand, imagine that, near the beginning of the simulation, an unimportant event E was invalidated, but not undone. Then the simulation continued for a while, executing hundreds of events unrelated to E. Only after that, E was rerun – and TimeSteward detected that the history was now inconsistent with the canonical state! That isn't very useful. We have no idea which of the hundreds of events was responsible for the inconsistency.

Invalid events raise a problem for tightening this condition. We'd like to be able to say, "The history depends solely on the executed-events (events that have been executed but not undone)." But if you had a timeline like this:

Time 0 – Undone event E – Invalidated, but not undone, event F – Valid event G

Then the state after F could legitimately depend on whether F's execution happened before or after E was undone. So we need a weaker invariant.

Luckily, we can check the canonicity of individual DataTimelines. So specifically, "For any DataTimeline D, D's history prior to the first noncanonical event that modifies D is always in the canonical state." Again, we need to clarify the definition of noncanonical event – this includes both events that have been executed with noncanonical inputs and modified D, and also events that canonically modify D, but have not been executed canonically. (TODO: Can we prove that this is stricter than the first invariant? What do we need to know in order to know that there can't be any noncanonical events prior to the first invalid event?)

Example

Coming soon...

Optimizing TimeSteward simulations

Coming later...

Keywords

TimeSteward uses incremental processing to be a retroactive data structure. It's also a partially persistent data structure, in the sense that snapshots don't change even if you make retroactive modifications after taking them. I didn't need these terms for the explanation, but I want them to appear in this document to attract people who are doing web searches for "incremental processing game physics" or similar.

License

MIT

Footnotes

*Not yet, but it is in the works.

**Even if you use a deterministic hasher, Hash implementations are endian-unsafe, which makes the ordering of the elements nondeterministic across systems. Also, the default Serialize and Deserialize impls for HashMap do not record the current capacity, which makes ordering of the elements nondeterministic under serialization.

time-steward's People

Contributors

elidupree, idupree, ireneknapp

time-steward's Issues

Provide a deterministic alternative to HashMap

We could implement a deterministic HashMap type (i.e. one where the iteration order depends only on the elements contained). For instance, it could use linear hashing and have each bucket contain a sorted vector (or B-tree) of elements.

We might want to make groups more inherent to the TimeSteward. Without special features, storing even a deterministic HashMap in the TimeSteward costs O(n) operations per event that makes a single insertion or deletion. Reducing that back to O(1) would be desirable. On the other hand, storing large amounts of data in a single field is discouraged, so we might not want to spend extra effort to support doing that. And if we do support it, it may also still be useful to create a deterministic HashMap type for use DURING single events.

Eliminate large single-operation costs

One advantage of the TimeSteward is that you will be able to take snapshots (such as for a save file or certain networking things) asynchronously. That is, you will be able to copy all fields incrementally, without stalling the simulation for the user.

This purpose is kind of defeated by the fact that we currently use HashMap to store all the fields. HashMap must synchronously move all current field data when reallocating the table. We should use a map type that can resize incrementally.

When deciding what data structure to use, we should also consider how it might enable other potential long-term goals, like persistence or concurrency.

Rethink valid_since() and forget_before() API

When I created the valid_since() concept, forget_before() didn't exist, and "when can snapshots be taken?" was the same as "when can fiat events be modified?". Now, simple_flat defaults to retaining enough data to take old snapshots, but can't insert old fiat events.

forget_before() is designed to allow memory to be freed – after you call forget_before(), you can't do anything before that time (except refer to snapshots you already took), but all TimeSteward implementors retain the ability to take new snapshots in all cases EXCEPT where you call forget_before(). Currently, valid_since() only determines when you're allowed to create fiat events, but its name isn't quite right for that.

It seems inconsistent that TimeSteward implementors are required to report valid_since() but aren't required to report the most recent time they have forgotten-before.

Make a BTreeMap-like data structure with heavily optimized deque operations

This is currently needed by SimpleTimeline. New tracked-queries are almost always inserted at the end of the structure, but it needs to be possible to insert them in the middle in less than O(n) time. So I currently use BTreeMap, but that isn't efficient because it takes O(log n) time to insert at the end (or remove from the beginning, as forget_before() requires). This is a significant chunk of the current CPU overhead.

There is no other already written code that would use this, but a lot of possible TimeSteward algorithms would benefit from it.

I have an idea for a modified B-tree, where the structure has pointers to the first and last leaf, and instead of just the root being allowed to have only 2 children, that relaxed condition would apply to every node on the left and right spine. That way, insertions at the end would be able to fill each node efficiently from empty to full without doing lots of operations further up the tree.

Optimize by not rehashing DeterministicRandomIds

DeterministicRandomId can be used as a hash table key without applying another hash function to it. Its cousin, FieldId, can also be used just by XORing the ColumnId with part of the RowId. Since TimeId's are supposed to be unique, they can probably be used the same way, although that would mean committing to not generating "beginning of moment" ExtendedTimes. (That is currently not implemented, but is not yet forbidden either.)

This could be implemented as a custom Hasher with std::collections::HashMap. However, we may be writing a custom hash map type anyway.
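
A sketch of what such a pass-through hasher might look like (names and details are assumptions, not existing TimeSteward code):

use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Uses (part of) the id's already-random bytes directly as the table hash,
// instead of running SipHash over them a second time.
#[derive(Default)]
struct PassThroughHasher(u64);

impl Hasher for PassThroughHasher {
  fn write(&mut self, bytes: &[u8]) {
    let mut chunk = [0u8; 8];
    let len = bytes.len().min(8);
    chunk[..len].copy_from_slice(&bytes[..len]);
    self.0 ^= u64::from_le_bytes(chunk);
  }
  fn finish(&self) -> u64 { self.0 }
}

type IdMap<V> = HashMap<DeterministicRandomId, V, BuildHasherDefault<PassThroughHasher>>;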

Caching features

Caching can be important for efficiency, but it is inherently dangerous to determinism. Maybe we should provide caching types that are impossible to use unsafely and/or can have runtime checks enabled to detect whether they cause nondeterminism.

Constant caches can be included in a Basics::Constants. They should probably serialize to nothing by default. If we write consistency checks, they have to be done differently than field consistency checks (since caches are only required to be consistent between the simulations, but are allowed to be different from each other).

Maybe we can use adapton for this? It might be appropriate. It is not currently easy to learn (few examples), but if it turns out to be appropriate for this use, using it and writing our own examples would probably be better than implementing our own caching system from scratch.

Decide on a license

I'm considering a bunch of license options, from the least restrictive (MIT) to the most restrictive (AGPL).

There are 2 main problem scenarios I want to avoid:

  1. A random developer considers using the time steward for their project, but doesn't do it because the license is incompatible (or because they don't know that it is).
  2. Somebody makes a proprietary version of the TimeSteward, and the Free version gets abandoned.

1 is much more likely than 2, obviously, so I should consider leaning towards less restrictive licenses, but that doesn't necessarily give me an easy answer.

There's a third scenario that might be nice to optimize for, but might be too difficult: 3. A big game studio makes a commercial game using the time steward and doesn't pay me any $$$ >:-(

I mean, I'd like to be paid for my labor, but it might be too impractical to do that – the easiest way would be if I hold onto the copyright for all code in the time steward, but it would be nice to be able to receive contributions from other Free Software developers without doing weird copyright negotiations.

Automated testing of TimeSteward internals

Our main tool for testing TimeSteward behavior should be cross verified time stewards. If two time stewards receive the same valid input and give different results, there is an internal error. However, this testing should be cautious not to give false-positives when the caller gives invalid input.

I began implementing something like this, but ran into trouble with Rust polymorphism limitations. In the short term, working around them may require a whole pile of macros.

We can also make a wrapper class that tests whether a TimeSteward obeys the valid_since() rules.

SimpleTimeline interface polishing

  • Should it really be able to report the time/event that set the data to its current value? This makes serialized snapshots bigger, and is often unnecessary, or misleading (as I found in simple_diffusion when I modified a SimpleTimeline just to change a field that was different from the one that was based on the last change time). If it DOES report, it should presumably report the EventHandle instead of just the ExtendedTime. Originally, I only included this feature because it happened to be easy to provide, but it adds some annoyances, and storing the time as a 64-bit object inside VaryingData is only a small amount of memory overhead, and this is SimpleTimeline, not an especially optimized timeline.
  • Theoretically, it no longer needs to force wrapping its data in an Option. You could construct it with whatever initial value you wanted. However, this is still awkward if it's required to report the event, because the initial value won't have an associated event. Discarding the initial value in forget_before() would also make the code more complicated.
  • What if timelines can only be created in events, and are forbidden from being queried before the creation time? Then we wouldn't need a separate "initial value", and could always return just data (or data + EventHandle).
  • In the absence of query-by-reference, is it worth optimizing by making a query implementation that has a fn(&VaryingData)->Value generic parameter and returns a Value? Or is this something that should be handled on the TimeSteward-API end?
  • Is there a nice way to make several variants of the SimpleTimeline concept? Say, ones that do or don't report EventHandles, ones that do or don't have the query-tracking tree...

Better debug output for simply_synchronized

Currently, simply_synchronized has a few weaknesses:

  • Every error is a panic
  • There is no way to test the first moment at which a Predictor gives different results (note: this is because time stewards are NOT required to run the same predictors at the same times, so it's a little harder to define how to sync them)
  • The error messages don't contain all of the information that could be useful (e.g. a full log of the queries made by the first inconsistent event; a snapshot of the state immediately before the problem, so that you can rerun it)
  • Doesn't completely distinguish between TimeSteward internal errors and client errors (test_lots() helps with this, but see #24)

Provide convenient floating-point emulation

Ordinary floating-point numbers are nondeterministic. However, users will certainly want to use them. If we don't provide convenient emulation, they will be tempted to try to circumvent the rules or implement their own questionably-safe alternatives.

MPFR may be suitable for this?

Properly deal with the #[derive] bounds issue

Because of rust-lang/rust#26925, Basics currently requires a bunch of unnecessary supertraits, just to make it possible to use #[derive] in situations where you should normally be able to use it. Eventually, we should have a better approach. Possible solutions:

  • Implement the traits manually. Probably a bad idea.
  • Solve the problem in the Rust compiler. This is likely too difficult for me to do myself, especially because of the backwards-compatibility issues, as discussed in the issue.
  • Use rust-derivative to derive with custom bounds. A decent compromise, except that rust-derivative doesn't support PartialOrd, Ord, Serialize, or Deserialize. The first two are planned (mcarton/rust-derivative#3), but it's not clear what the schedule is. I might someday consider submitting pull requests to rust-derivative (and/or serde?), if no one else resolves it first.

Refine TimeSteward macros

  • All macros that accept struct definitions should permit trailing commas. Allowing trailing commas in other contexts is also desirable.
  • Also, where clauses.
  • Consider whether it's possible to remove the [] requirement from generic parameters and where clauses.

There's a bit of the trade-off between usability and maintainability here, so it's not necessarily good to allow more things just because we can (if the implementation is too complicated).

Split off a different crate for shared implementation details?

I made a bunch of implementation details public in case someone wants to implement trait TimeSteward in a different crate. However, I then crammed them all into one submodule so that they wouldn't clog up the documentation for TimeSteward users. Worse, for the ones that were macros, I labeled them #[doc (hidden)].

A logical thing to do instead would be to move the implementation support into a separate crate. Then it wouldn't appear in the TimeSteward USER documentation at all, but COULD be properly documented for the sake of TimeSteward implementors.

This is probably a long way off, due to various inconveniences. It will become more important if people start wanting to implement TimeSteward, or if the data structures in the implementation details become good enough and stable enough that I should provide them as separate libraries.

Fix endianness issues

Currently, we rely on SipHasher, which is definitely not endian-safe because the default implementations of Hasher functions use mem::transmute(). We MIGHT be able to work around this by having SiphashIdGenerator use only write(), and implementing the rest of the Hasher functions in an endian-safe way.

However, we also need to be on the lookout for Hash implementations that are not endian safe. If #[derive (Hash)] doesn't always produce endian-safe code, we will have to avoid Hash and Hasher entirely.

Whatever solution we use, we should create #[test] functions that check the output of a few known inputs to make sure the generation is behaving consistently for every build of time_steward.
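
A sketch of such a test (the expected string is a placeholder to be replaced with the actual known-good output, not a real value):

#[test]
fn deterministic_random_id_is_stable() {
  // Every build, on every platform, must produce byte-for-byte the same id
  // for the same input.
  let id = DeterministicRandomId::new(&(0u64, "known input"));
  assert_eq!(format!("{:?}", id), "<paste the known-good Debug output here>");
}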

EventRng may also be a concern. The Rng functions that generate floats use mem::transmute(), which probably isn't safe. Since floats are forbidden anyway, we can override those defaults with a simple panic. fill_bytes() does not use mem::transmute(), but it has a comment implying that doing so might be reasonable under some circumstances, so we need to beware that the rand crate might change implementations in a way that causes trouble for us.

Optimization features

One possibility: user can provide a function FieldId->[(PredictorId, RowId)] that lists predictors you KNOW will be invalidated by a change to that field, then have that predictor run its get() calls with an input called "promise_inferred" or something so that we don't spend time and memory recording the dependency. (Can we also do something like this for events? It would be at least a little more complicated.)

Another: a predictor might have a costly computation to find the exact time of a future event, which it won't need to do if it gets invalidated long before that time comes. For that, we can provide a defer_until(time) method. (This is probably premature optimization until/unless we actually develop a simulation where it would help.)

Provide an easy way to generate large batches of ColumnId, etc.

We currently place upon the user the unchecked requirement to use secure random data for ColumnId, PredictorId, and EventId construction. Lazy users typing in nonrandom numbers would be awful, so it is critical that we make it as easy as possible to do the right thing.

This presumably needs to be cross-platform (not just a shell script that you can run on Linux).

We should also do more automatic checks to try to guarantee randomness (for instance, ban 0, and have more user-friendly checks for when you accidentally use multiples of the same id).

Review all panic messages

Some of my panic messages are good. Others are not. Others shouldn't be panics at all, but Results instead.

Support upper time limits for step()

IncrementalTimeSteward::step() currently has a bit of a problem: if you're using a flat TimeSteward, you might have some free time to take a bunch of steps, but you can't afford to step beyond time X, the time of the next frame. The problem is that if IncrementalTimeSteward::updated_until_before() is lower than X now, you don't know whether it will be lower than X after one step.

One approach would be to add an IncrementalTimeSteward::next_step_time() -> ExtendedTime method. However, this isn't forwards-compatible with concurrent TimeStewards that do a bunch of concurrent operations during step().

So, I propose giving step() a second argument, making it fn step (&mut self, limit: Option <ExtendedTime>)->bool. It would be guaranteed not to advance the settled time (see #38) as far as limit. It would also return false if there was no more work to do.
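
Usage from a client's frame loop might look something like this (frame_deadline and out_of_frame_budget() are hypothetical, and ExtendedTime is assumed to be cloneable):

while !out_of_frame_budget() {
  // Per the proposal, step() never advances the settled time as far as the
  // limit, and returns false when there is no more work to do before it.
  if !steward.step(Some(frame_deadline.clone())) {
    break;
  }
}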

Use new polymorphism features when they arrive in Rust

TimeStewardLifetimedMethods and TimeStewardStaticMethods are hacks to work around the current limitations of Rust polymorphism. In the future, Rust will hopefully provide features that allow us to do these things with only one trait, TimeSteward.

This will also make it much easier to write code that is generic in the TimeSteward type. Our libraries, such as the collision detection, will no longer need to use awkward macros.

This will break compatibility with older versions of the TimeSteward, but some breaking changes are inevitable for this.

EventHandle should not implement Ord

Ordering EventHandles by time forces their Eq implementation to be "equality by ExtendedTime", but this means that 2 different event handles compare as "equal" even if one of them is an obsolete prediction that has been destroyed and replaced by a new prediction. This is confusing.

Event handles should probably implement Eq by object identity.

Simple_flat and simple_full currently depend on this, and it seems generally desirable to be ABLE to put event handles in sorted data structures, so we should implement a simple wrapper that implements Ord.

Figure out what I really mean by StewardData

Currently, various parts of the code require trait StewardData, but I'm not sure they have the same actual requirements, or if the requirements I've chosen are exactly the correct ones.

Also, if/when StewardData is actually the correct concept, if it's just a collection of supertraits, I probably want to make a blanket impl so that you don't have to implement it yourself all over the place.

Standardize using ExtendedTime rather than base time in all API functions

I've been steadily exposing ExtendedTime more and more, and at this point, there's no reason not to go the rest of the way.

This applies to snapshot_before(), valid_since(), updated_until_before(), and forget_before().

This is mostly just an elegance thing, but it may be useful to allow snapshots of unusual ExtendedTimes for debug-examining stuff.

insert_fiat_event() could still use base time + id, because it's similar to the interface for creating predictions, and it supports automatically protecting the user from colliding fiat event ids with prediction ids.

The main downside of this change is that it would obligate the user to call beginning_of() themselves, but that seems tolerable.

Take advantage of GPU computations?

The TimeSteward model assumes that a very large number of things do individual, relatively small computations according to the same rules. Theoretically, this is ideal for massively parallel programming.

This is a far-future goal, because the state of GPU programming support (across target platforms) is not very good currently, and there may be incompatibilities between our current implementations and GPU abilities (for instance, function pointers are not necessarily compatible with GPU control flow limitations).

Initial documentation

Without documentation, the TimeSteward is essentially useless to anyone but me, and not optimal even for me. I need to do a serious pass at documenting all of the important features.

Currently, a few things have documentation comments, but it is haphazard.

This may have to wait on more API stabilization.

Garbage collection?

TimeSteward is theoretically ideal for incremental garbage collection: it is already obligated to represent its data as small, separable chunks with links between them, and retain immutable copies of old state.

(Here follows some not-perfectly-explained thoughts; maybe I will rewrite them when I'm in a clearer mental state.)

The basic idea is to record when each object is created, and incrementally trace through all objects based on their links, starting at the TimeSteward globals. When a trace is finished, it then iterates through all objects that have been allocated, and drops the ones that were created before the trace started but not reached by the trace.

In practice, implementing this will be very complicated. Some things to consider:

  • In what ways does the garbage collection need to support concurrency?
  • Garbage collection depends on user code to implement related traits (such as tracing). But it should remain memory-safe even if the user implements them wrong.
  • A neat way to do it would be to trace the data at a particular ExtendedTime – specifically, a time when forget_before() has been called, meaning that the data can no longer change before that. As the forgotten time goes forward, eventually everything that's inaccessible will be dropped. However, what about the case where lots of computation is done WITHOUT the forgotten time moving forward? If you ran out of memory while doing that, and the garbage collector wasn't able to free that memory, that would be bad. If we required client code to be able to retain snapshots of the full history instead of just a single moment, this can be handled well, and that would be a nice feature in general… But we might need fancier data structures for that, since it would be too inefficient to simply clone whole DataTimelines.
  • If we're doing this much memory management, maybe it would be possible to avoid using malloc() as well? There's some possible ideas about using memory pools and moving old objects around, leaving behind pointers to where they moved to. Of course, there's a trade-off between saving the work of malloc() and doing extra work moving objects and each time you follow a link (to check whether it has moved).

Fiat events need to be serializable

To make a standard way of synchronizing a simulation over the network, we need a way to transmit the fiat events, which means that they need to be serializable.

It's not obvious what the API for this should be. Is the user obligated to make a struct and implement Fn for it? Should we create our own trait for events (and maybe for predictors as well)? Can we provide macros that make this easier, to make up for losing the convenience of plain closures?

Automated testing of TimeSteward callers

There are various ways that TimeSteward callers can misbehave, which we should find ways to audit for.

  • Using nondeterministic code in predictors or events.
  • Using field types that have lossy serialization or don't have an exact match between Eq and serialization equality.
  • Using unsafe_now() improperly.
  • Using (column, event, predictor) types that are not included in Basics::IncludedTypes.
  • Using nonrandom data in (column, event, predictor) ids.

Automated profiling of TimeSteward simulations

There are a lot of statistics that would be useful for developing/optimizing TimeSteward simulations. Many of them aren't trivial to compute using client-side code. We should include features for getting some of the statistics, such as:

  • A visualization of how event dependencies propagate throughout the simulation
  • Distribution of (number of dependent events) over time (think "how far back in time can I go before one event will explode to the whole simulation")
  • Distribution of sizes of fields (which we can approximate through the Serialize trait)
  • Stats about loops; in most simulations, we hope that almost everything happens on the first iteration, so detecting the frequency of iterations beyond the first is useful.

API stabilization

Most of the current API functions are okay.

The most important immediate issue is related to serialization. time_steward should provide features that make it easy to:

  • Serialize a snapshot
  • Deserialize a snapshot
  • Construct a new TimeSteward from a deserialized snapshot + predictors (+ constants?)

Is unsafe_now() the best way to serve its purpose?

Should rng(), random_id(), and constants() remain as they are? Right now, they are trait methods, which means that they could be implemented in different ways by each trait implementor. But they have simple, fixed ways they are supposed to behave. This leads to duplicate code and potential bugs.

ValidSince should have a method indicating whether it includes a particular base time. Perhaps it should implement Ord for itself as well.

insert_fiat_event() and erase_fiat_event() should probably return Result <(), FiatEventOperationError>.

Put the "rowless" code at the top level, since it's almost as complete as the old code

What to do with the old code? Delete it? Put it in a subdirectory?

Deleting seems appropriate, considering that we do have the git history. If we want to move it into a subdirectory AND have it continue compiling successfully, it would require rewriting a lot of module paths in use statements. And it certainly seems desirable not to spend time compiling it when we only want to use the new stuff.

To be able to conveniently reference the old code, maybe I should just make a branch at the last commit where the old code still exists in the repo.

This change isn't trivial:

  • At the top level, api.rs and api_macros.rs will be deleted, but deterministic_random_id.rs is shared with the new code in its current form, and lib.rs will need edits.
  • src/support/collision_detection/ is old, but the other things in src/support/ are compatible.
  • src/implementation_support/common.rs has a few reused functions, which should be merged into the current src/rowless/implementation_support/common.rs, but is mostly old. src/implementation_support/insert_only.rs is shared. src/implementation_support/data_structures.rs isn't actually used by the new code at all, but isn't dependent on the old code, so it shouldn't be deleted (but it's not technically proper for it to stay where it is if it doesn't help implement time stewards?) src/implementation_support/list_of_types.rs is old.
  • Everything in src/stewards/ is old.
  • A bunch of the examples are old.
  • The new-API examples will need to be updated to remove rowless:: from all their use statements, but otherwise should work. We also need to update the links in the HTML files after removing "rowless" from some example names.
  • Most of the code in src/rowless/ deliberately uses relative paths to ease this change.

Write a TimeSteward that takes advantage of parallelism

If N random events – near the same time, but at different locations – have very low chance of interfering with each other, then we can theoretically make use of N processors at almost 100% efficiency.

Naturally, parallelism raises some practical challenges. However, this is an important goal.

Query by reference?

An event might want to look around in a medium-sized DataTimeline, in a way that would be more efficient using references than by first copying all the data it might be going to use.

This is tricky because it involves making the query API much more complex, and probably returning guards rather than plain references.

There might be other approaches that could accomplish the same thing.

Use a faster hash algorithm?

Time spent hashing is currently a minority of the overhead, but not insignificant (10%-ish).

Apparently, siphash128 is no longer "experimental" (what's the hard rationale for this?).

HighwayHash is also worth considering (apparently it's much faster with SIMD? Although that improvement would be dependent on platform support).

Code cleanup

So far, I have been focusing on assembling code quickly so that I can have a working prototype. This has left the code in a somewhat messy state. Structs and impl's aren't in a consistent order. Vestigial glue code is still being used in some places. Many unnecessary warnings have not been fixed.

I intend to go through the code and do a cleanup pass, at some point as I approach an MVP.

It may be convenient for this to wait on more API stabilization.

When generic associated types become available…

  • The API can finally all be defined in the actual api module, rather than in a macro.
  • Accessor can have an associated read-guard type so that snapshots don't have to keep RefCells when they would be happier just returning regular references.

Is QueryOffset::Before worth it?

I wanted to include this for auditing, so that the auditing code can query immediately before and after an event. Then I exposed it to the Accessor query interface because it was easy to do so. But is it good?

  • It adds significant boilerplate to every query, and I've never actually used the Before variant in the few examples I've made so far.
  • A DataTimeline that wanted to provide Before queries would be able to implement that as a separate query type of its own, in the same way that you could make a DataTimeline that answers queries about X seconds in the past.
  • It creates weirdness if you create a DataTimeline in an event and then query it using Before in the same event. (This is related to the question of whether DataTimelineCells should need a creation time.)

Make better manual Debug impls

With our cyclic data structures, the default Debug impl overflows the stack instead of displaying something reasonable. I should fix this by making manual impls that are somehow restrained in their recursion.

Tracking issue for current very-disruptive API changes

Currently, I hesitate to make too many more test cases (and even support libraries) because I'm going to have to update all of them in loads of places when I make API changes.

I'm hoping to settle these ones in particular:

Checklist for "deciding what to do":

  • #32, garbage collection (I don't necessarily need to implement garbage collection, just figure out how it will affect the DataHandle API)
  • #34, SimpleTimeline interface
  • #35, query by reference
  • #36, QueryOffset::Before
  • #46, StewardData

Checklist for actually implementing it:

  • #32, garbage collection
  • #34, SimpleTimeline interface
  • #35, query by reference
  • #36, QueryOffset::Before
  • #46, StewardData

Networking support

The TimeSteward is designed with networking in mind – especially for the case of keeping a simulation synchronized on 2 or more computers. Any full TimeSteward implementation is inherently suitable for networking, but we should go beyond this. The time_steward crate should provide a default networking system to do this, so that developers can easily build a networked simulation without having to write very much of their own networking code.

Modification protocol

Currently, queries have a structured protocol, but for modifications, you just pass in a closure that takes an &mut DataTimeline.

I haven't found any technical reason why modifications would benefit from a stricter protocol, but there might be one, and the current arrangement seems strange.

It could technically help audit that the canonical behavior is always the same.
