elidupree / time-steward
Write games and simulations in Rust, using reactive programming for smoothness and replicability.
License: MIT License
DeterministicRandomId can be used as a hash table key without applying another hash function to it. Its cousin, FieldId, can also be used that way, just by XORing the ColumnId with part of the RowId. Since TimeIds are supposed to be unique, they can probably be used the same way, although that would mean committing to not generating "beginning of moment" ExtendedTimes. (That is currently not implemented, but is not yet forbidden either.)
This could be implemented as a custom Hasher with std::collections::HashMap. However, we may be writing a custom hash map type anyway.
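As a sketch of that idea (the type names here are stand-ins, not the crate's actual API), a pass-through Hasher simply reuses the id's already-random bytes as the hash value:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Pass-through hasher: the key is already uniformly random, so we just fold
/// its bytes into the state instead of running a real hash function.
#[derive(Default)]
struct PassThroughHasher(u64);

impl Hasher for PassThroughHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        // For an id that is already uniformly random, the first 8 bytes
        // alone would suffice; folding all bytes keeps this fully general.
        for &b in bytes {
            self.0 = self.0.rotate_left(8) ^ u64::from(b);
        }
    }
}

fn main() {
    let mut map: HashMap<u64, &str, BuildHasherDefault<PassThroughHasher>> =
        HashMap::default();
    map.insert(0x9e37_79b9_7f4a_7c15, "field data");
    assert_eq!(map.get(&0x9e37_79b9_7f4a_7c15), Some(&"field data"));
}
```

Plugging this in via `BuildHasherDefault` is exactly the "custom Hasher with std::collections::HashMap" route mentioned above.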
If N random events – near the same time, but at different locations – have very low chance of interfering with each other, then we can theoretically make use of N processors at almost 100% efficiency.
Naturally, parallelism raises some practical challenges. However, this is an important goal.
I'm considering a bunch of license options, from the least restrictive (MIT) to the most restrictive (AGPL).
There are 2 main problem scenarios I want to avoid:
Scenario 1 is much more likely than scenario 2, obviously, so I should consider leaning towards less restrictive licenses, but that doesn't necessarily give me an easy answer.
There's a third scenario that might be nice to optimize for, but might be too difficult: 3. A big game studio makes a commercial game using the time steward and doesn't pay me any $$$ >:-(
I mean, I'd like to be paid for my labor, but it might be too impractical to do that – the easiest way would be if I hold onto the copyright for all code in the time steward, but it would be nice to be able to receive contributions from other Free Software developers without doing weird copyright negotiations.
TimeSteward is theoretically ideal for incremental garbage collection: it is already obligated to represent its data as small, separable chunks with links between them, and retain immutable copies of old state.
(Here follows some not-perfectly-explained thoughts; maybe I will rewrite them when I'm in a clearer mental state.)
The basic idea is to record when each object is created, and incrementally trace through all objects based on their links, starting at the TimeSteward globals. When a trace is finished, it then iterates through all objects that have been allocated, and drops the ones that were created before the trace started but not reached by the trace.
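A minimal sketch of that trace, using hypothetical stand-in types rather than TimeSteward's real ones:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical object record: when it was created, and links to other objects.
struct Object {
    created_at: u64,
    links: Vec<u64>,
}

/// Drop objects that were created before the trace started but were not
/// reached from the globals. Objects created during the trace are kept
/// unconditionally until the next trace.
fn collect(objects: &mut HashMap<u64, Object>, globals: &[u64], trace_start: u64) {
    let mut reached = HashSet::new();
    let mut frontier: VecDeque<u64> = globals.iter().copied().collect();
    // A real implementation would interleave this loop with simulation work
    // (that's the "incremental" part); here it just runs to completion.
    while let Some(id) = frontier.pop_front() {
        if reached.insert(id) {
            if let Some(obj) = objects.get(&id) {
                frontier.extend(obj.links.iter().copied());
            }
        }
    }
    objects.retain(|id, obj| obj.created_at >= trace_start || reached.contains(id));
}

fn main() {
    let mut objects = HashMap::new();
    objects.insert(1, Object { created_at: 0, links: vec![2] });
    objects.insert(2, Object { created_at: 0, links: vec![] });
    objects.insert(3, Object { created_at: 0, links: vec![] }); // unreachable
    objects.insert(4, Object { created_at: 9, links: vec![] }); // created mid-trace
    collect(&mut objects, &[1], 5);
    assert!(objects.contains_key(&1) && objects.contains_key(&2));
    assert!(!objects.contains_key(&3)); // old and unreached: dropped
    assert!(objects.contains_key(&4)); // too new to judge: kept
}
```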
In practice, implementing this will be very complicated. Some things to consider:
I wanted to include this for auditing, so that the auditing code can query immediately before and after an event. Then I exposed it to the Accessor query interface because it was easy to do so. But is it good?
Caching can be important for efficiency, but it is inherently dangerous to determinism. Maybe we should provide caching types that are impossible to use unsafely and/or can have runtime checks enabled to detect whether they cause nondeterminism.
Constant caches can be included in a Basics::Constants. They should probably serialize to nothing by default. If we write consistency checks, they have to be done differently than field consistency checks (since caches are only required to be consistent between the simulations, but are allowed to be different from each other).
Maybe we can use adapton for this? It might be appropriate. It is not currently easy to learn (few examples), but if it turns out to be appropriate for this use, using it and writing our own examples would probably be better than implementing our own caching system from scratch.
So far, I have been focusing on assembling code quickly so that I can have a working prototype. This has left the code in a somewhat messy state. Structs and impl's aren't in a consistent order. Vestigial glue code is still being used in some places. Many unnecessary warnings have not been fixed.
I intend to go through the code and do a cleanup pass at some point as I approach an MVP.
It may be convenient for this to wait on more API stabilization.
Because of rust-lang/rust#26925, Basics currently requires a bunch of unnecessary supertraits, just to make it possible to use #[derive] in situations where you should normally be able to use it. Eventually, we should have a better approach. Possible solutions:
One advantage of the TimeSteward is that you will be able to take snapshots (such as for a save file or certain networking things) asynchronously. That is, you will be able to copy all fields incrementally, without stalling the simulation for the user.
This purpose is kind of defeated by the fact that we currently use HashMap to store all the fields. HashMap must synchronously move all current field data when reallocating the table. We should use a map type that can resize incrementally.
When deciding what data structure to use, we should also consider how it might enable other potential long-term goals, like persistence or concurrency.
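For illustration, an incrementally-resizing map might keep the old table alongside the new one and migrate a few buckets per operation, so no single insert has to move everything at once. A simplified sketch (not a proposed implementation; no deletion or duplicate handling):

```rust
struct IncrementalMap {
    old: Vec<Vec<(u64, u64)>>, // table being drained
    new: Vec<Vec<(u64, u64)>>,
    migrated: usize, // buckets of `old` already moved
}

impl IncrementalMap {
    fn new() -> IncrementalMap {
        IncrementalMap { old: Vec::new(), new: vec![Vec::new(); 8], migrated: 0 }
    }
    /// Start a resize: the current table becomes `old` and is drained lazily.
    /// (Assumes any previous `old` table was already fully migrated.)
    fn grow(self) -> IncrementalMap {
        let len = self.new.len() * 2;
        IncrementalMap { old: self.new, new: vec![Vec::new(); len], migrated: 0 }
    }
    fn migrate_some(&mut self) {
        // Move up to 2 old buckets into the new table per call.
        for _ in 0..2 {
            if self.migrated >= self.old.len() {
                return;
            }
            let bucket = std::mem::take(&mut self.old[self.migrated]);
            self.migrated += 1;
            for (k, v) in bucket {
                let i = (k as usize) % self.new.len();
                self.new[i].push((k, v));
            }
        }
    }
    fn insert(&mut self, k: u64, v: u64) {
        self.migrate_some(); // amortize the move across operations
        let i = (k as usize) % self.new.len();
        self.new[i].push((k, v));
    }
    fn get(&self, k: u64) -> Option<u64> {
        let i = (k as usize) % self.new.len();
        if let Some(&(_, v)) = self.new[i].iter().find(|&&(kk, _)| kk == k) {
            return Some(v);
        }
        if self.old.is_empty() {
            return None;
        }
        let j = (k as usize) % self.old.len();
        self.old[j].iter().find(|&&(kk, _)| kk == k).map(|&(_, v)| v)
    }
}

fn main() {
    let mut map = IncrementalMap::new();
    for k in 0..8 {
        map.insert(k, k * 10);
    }
    map = map.grow(); // old entries stay findable while migrating lazily
    map.insert(100, 1000);
    assert_eq!(map.get(3), Some(30));
    assert_eq!(map.get(100), Some(1000));
}
```

Linear hashing is the classic way to make the migration schedule principled rather than ad hoc.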
Our main tool for testing TimeSteward behavior should be cross verified time stewards. If two time stewards receive the same valid input and give different results, there is an internal error. However, this testing should be cautious not to give false positives when the caller gives invalid input.
I began implementing something like this, but ran into trouble with Rust polymorphism limitations. In the short term, working around them may require a whole pile of macros.
We can also make a wrapper class that tests whether a TimeSteward obeys the valid_since() rules.
There are various ways that TimeSteward callers can misbehave, which we should find ways to audit for.
When I created the valid_since() concept, forget_before() didn't exist, and "when can snapshots be taken?" was the same as "when can fiat events be modified?". Now, simple_flat defaults to retaining enough data to take old snapshots, but can't insert old fiat events.
forget_before() is designed to allow memory to be freed – after you call forget_before(), you can't do anything before that time (except refer to snapshots you already took), but all TimeSteward implementors retain the ability to take new snapshots in all cases EXCEPT where you call forget_before(). Currently, valid_since() only determines when you're allowed to create fiat events, but its name isn't quite right for that.
It seems inconsistent that TimeSteward implementors are required to report valid_since() but aren't required to report the most recent time they have forgotten-before.
Ordering EventHandles by time forces their Eq implementation to be "equality by ExtendedTime", but this means that 2 different event handles measure "equal" even if one of them is an obsolete prediction that has been destroyed and replaced by a new prediction. This is confusing.
Event handles should probably implement Eq by object identity.
simple_flat and simple_full currently depend on this ordering, and it seems generally desirable to be ABLE to put event handles in sorted data structures, so we should implement a simple wrapper that implements Ord.
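A sketch of the combination (stand-in types, not the real EventHandle): identity-based Eq on the handle itself, plus a wrapper supplying by-time Ord for sorted structures:

```rust
use std::cmp::Ordering;
use std::sync::Arc;

struct EventInner {
    time: u64,
}

/// Stand-in for EventHandle: equality means "same object", so an obsolete
/// prediction no longer compares equal to its replacement.
#[derive(Clone)]
struct EventHandle(Arc<EventInner>);

impl EventHandle {
    fn time(&self) -> u64 {
        self.0.time
    }
}
impl PartialEq for EventHandle {
    fn eq(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}
impl Eq for EventHandle {}

/// Wrapper that orders handles by time, for use in sorted data structures.
struct ByTime(EventHandle);
impl PartialEq for ByTime {
    fn eq(&self, other: &Self) -> bool {
        self.0.time() == other.0.time()
    }
}
impl Eq for ByTime {}
impl PartialOrd for ByTime {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for ByTime {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.time().cmp(&other.0.time())
    }
}

fn main() {
    let a = EventHandle(Arc::new(EventInner { time: 7 }));
    let b = EventHandle(Arc::new(EventInner { time: 7 }));
    assert!(a != b); // same time, but different objects
    assert!(ByTime(a) == ByTime(b)); // equal under the by-time ordering
}
```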
The TimeSteward model assumes that a very large number of things do individual, relatively small computations according to the same rules. Theoretically, this is ideal for massively parallel programming.
This is a far-future goal, because the state of GPU programming support (across target platforms) is not very good currently, and there may be incompatibilities between our current implementations and GPU abilities (for instance, function pointers are not necessarily compatible with GPU control flow limitations).
Currently, queries have a structured protocol, but for modifications, you just pass in a closure that takes an &mut DataTimeline.
I haven't found any technical reason why modifications would benefit from a stricter protocol, but there might be one, and the current arrangement seems strange.
It could technically help audit that the canonical behavior is always the same.
Most of the current API functions are okay.
The most important immediate issue is related to serialization. time_steward should provide features that make it easy to:
Is unsafe_now() the best way to serve its purpose?
Should rng(), random_id(), and constants() remain as they are? Right now, they are trait methods, which means that they could be implemented in different ways by each trait implementor. But they have simple, fixed ways they are supposed to behave. This leads to duplicate code and potential bugs.
ValidSince should have a method indicating whether it includes a particular base time. Perhaps it should implement Ord for itself as well.
insert_fiat_event() and erase_fiat_event() should probably return Result<(), FiatEventOperationError>.
Right now, there's a Mul implementation for multiplying by i64 (which is a simpler case than multiplying by a Range), but no matching implementation for Div.
An event might want to look around in a medium-sized DataTimeline, in a way that would be more efficient using references than by first copying all the data it might be going to use.
This is tricky because it involves making the query API much more complex, and probably returning guards rather than plain references.
There might be other approaches that could accomplish the same thing.
Currently, we rely on SipHasher, which is definitely not endian-safe because the default implementations of Hasher functions use mem::transmute(). We MIGHT be able to work around this by having SiphashIdGenerator use only write(), and implementing the rest of the Hasher functions in an endian-safe way.
However, we also need to be on the lookout for Hash implementations that are not endian safe. If #[derive (Hash)] doesn't always produce endian-safe code, we will have to avoid Hash and Hasher entirely.
Whatever solution we use, we should create #[test] functions that check the output of a few known inputs to make sure the generation is behaving consistently for every build of time_steward.
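For instance, the workaround could wrap an inner Hasher and route every integer write through write() with a fixed byte order, overriding the std defaults (which use native byte order). A sketch, assuming the inner hasher only needs write():

```rust
use std::hash::Hasher;

/// Forwards all integer writes through write() in little-endian order, so the
/// hash output doesn't depend on the host's endianness.
struct EndianSafe<H: Hasher>(H);

impl<H: Hasher> Hasher for EndianSafe<H> {
    fn finish(&self) -> u64 {
        self.0.finish()
    }
    fn write(&mut self, bytes: &[u8]) {
        self.0.write(bytes)
    }
    // Override the endianness-dependent defaults with fixed-order versions.
    fn write_u64(&mut self, n: u64) {
        self.write(&n.to_le_bytes())
    }
    fn write_u32(&mut self, n: u32) {
        self.write(&n.to_le_bytes())
    }
    fn write_u16(&mut self, n: u16) {
        self.write(&n.to_le_bytes())
    }
    fn write_u8(&mut self, n: u8) {
        self.write(&[n])
    }
}

fn main() {
    use std::collections::hash_map::DefaultHasher;
    let mut a = EndianSafe(DefaultHasher::new());
    a.write_u64(0x0123_4567_89ab_cdef);
    // Same result as feeding the little-endian bytes directly:
    let mut b = DefaultHasher::new();
    b.write(&0x0123_4567_89ab_cdef_u64.to_le_bytes());
    assert_eq!(a.finish(), b.finish());
}
```

The #[test] functions with known inputs mentioned above would then pin down these outputs across builds and platforms.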
EventRng may also be a concern. The Rng functions that generate floats use mem::transmute(), which probably isn't endian-safe. Since floats are forbidden anyway, we can override those defaults with a simple panic. fill_bytes() does not use mem::transmute(), but has a comment implying that it might be reasonable for it to do so under some circumstances, so we need to beware that the rand crate might change implementations in a way that causes trouble for us.
With our cyclic data structures, the default Debug impl overflows the stack instead of displaying something reasonable. I should fix this by making manual impls that are somehow restrained in their recursion.
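One simple restraint is to print linked objects by id only instead of recursing into them. A sketch with a hypothetical Node type:

```rust
use std::cell::RefCell;
use std::fmt;
use std::rc::Rc;

struct Node {
    id: u64,
    next: Option<Rc<RefCell<Node>>>,
}

impl fmt::Debug for Node {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.debug_struct("Node")
            .field("id", &self.id)
            // Print the linked node's id only, instead of recursing into it.
            .field("next", &self.next.as_ref().map(|n| n.borrow().id))
            .finish()
    }
}

fn main() {
    let a = Rc::new(RefCell::new(Node { id: 1, next: None }));
    let b = Rc::new(RefCell::new(Node { id: 2, next: Some(a.clone()) }));
    a.borrow_mut().next = Some(b.clone()); // create a cycle
    // Terminates instead of overflowing the stack:
    println!("{:?}", a.borrow()); // prints: Node { id: 1, next: Some(2) }
}
```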
Ordinary floating-point numbers are nondeterministic. However, users will certainly want to use them. If we don't provide convenient emulation, they will be tempted to try to circumvent the rules or implement their own questionably-safe alternatives.
MPFR may be suitable for this?
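Alternatively, a deterministic fixed-point type covers many cases without external dependencies. A minimal 32.32 sketch (hypothetical, not part of the crate):

```rust
/// Deterministic 32.32 fixed-point number: the stored i64 represents
/// value / 2^32. All operations are plain integer arithmetic, so results
/// are identical on every platform.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Fixed(i64);

impl Fixed {
    fn from_int(n: i32) -> Fixed {
        Fixed((n as i64) << 32)
    }
    fn add(self, other: Fixed) -> Fixed {
        Fixed(self.0 + other.0)
    }
    fn mul(self, other: Fixed) -> Fixed {
        // Widen to i128 so the intermediate product can't overflow.
        Fixed((((self.0 as i128) * (other.0 as i128)) >> 32) as i64)
    }
}

fn main() {
    assert_eq!(Fixed::from_int(3).mul(Fixed::from_int(2)), Fixed::from_int(6));
    let three_halves = Fixed::from_int(3).mul(Fixed(1i64 << 31)); // 3 * 0.5
    assert_eq!(three_halves.add(three_halves), Fixed::from_int(3));
}
```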
Some of my panic messages are good. Others are not. Others shouldn't be panics at all, but Results instead.
Time spent hashing is currently a minority of the overhead, but not insignificant (10%-ish).
Apparently, siphash128 is no longer "experimental" (what's the hard rationale for this?).
HighwayHash is also worth considering (apparently it's much faster with SIMD, though that improvement depends on platform support).
I had a rationale for not doing this, but it's no longer consistent with my current general approach.
Currently, our standard TimeSteward implementations rely on macros which may or may not function correctly outside of the crate. We should clean this up and provide instructions for implementing TimeSteward, in case anyone wishes to do so.
There's a bit of the trade-off between usability and maintainability here, so it's not necessarily good to allow more things just because we can (if the implementation is too complicated).
api module, rather than a macro.

One possibility: the user can provide a function FieldId->[(PredictorId, RowId)] that lists predictors you KNOW will be invalidated by a change to that field, then have that predictor run its get() calls with an input called "promise_inferred" or something, so that we don't spend time and memory recording the dependency. (Can we also do something like this for events? It would be at least a little more complicated.)
Another: a predictor might have a costly computation to find the exact time of a future event, which it won't need to do if it gets invalidated long before that time comes. For that, we can provide a defer_until(time) method. (This is probably premature optimization until/unless we actually develop a simulation where it would help.)
What to do with the old code? Delete it? Put it in a subdirectory?
Deleting seems appropriate, considering that we do have the git history. If we want to move it into a subdirectory AND have it continue compiling successfully, it would require rewriting a lot of module paths in use statements. And it certainly seems desirable not to spend time compiling it when we only want to use the new stuff.
To be able to conveniently reference the old code, maybe I should just make a branch at the last commit where the old code still exists in the repo.
This change isn't trivial:

- api.rs and api_macros.rs will be deleted, but deterministic_random_id.rs is shared with the new code in its current form, and lib.rs will need edits.
- src/support/collision_detection/ is old, but the other things in src/support/ are compatible.
- src/implementation_support/common.rs has a few reused functions, which should be merged into the current src/rowless/implementation_support/common.rs, but is mostly old.
- src/implementation_support/insert_only.rs is shared.
- src/implementation_support/data_structures.rs isn't actually used by the new code at all, but isn't dependent on the old code, so it shouldn't be deleted (though it's not technically proper for it to stay where it is if it doesn't help implement time stewards).
- src/implementation_support/list_of_types.rs is old.
- src/stewards/ is old.
- The examples need to remove rowless:: from all their use statements, but otherwise should work. We also need to update the links in the HTML files after removing "rowless" from some example names.
- The contents of src/rowless/ deliberately use relative paths to ease this change.

There are a lot of statistics that would be useful for developing/optimizing TimeSteward simulations. Many of them aren't trivial to compute using client-side code. We should include features for getting some of the statistics, such as:
To make a standard way of synchronizing a simulation over the network, we need a way to transmit the fiat events, which means that they need to be serializable.
It's not obvious what the API for this should be. Is the user obligated to make a struct and implement Fn for it? Should we create our own trait for events (and maybe for predictors as well)? Can we provide macros that make this easier, to make up for losing the convenience of plain closures?
fn(&VaryingData)->Value generic parameter and returns a Value? Or is this something that should be handled on the TimeSteward-API end?

I was once delayed in solving a bug: I ran a test case that called the function before and after every operation, to detect the first moment the invariants were broken, but it only ended up triggering after the first broken invariant caused the second invariant to break.
Currently, I hesitate to make too many more test cases (and even support libraries) because I'm going to have to update all of them in loads of places when I make API changes.
I'm hoping to settle these ones in particular:
Checklist for "deciding what to do":
Checklist for actually implementing it:
The TimeSteward is designed with networking in mind – especially for the case of keeping a simulation synchronized on 2 or more computers. Any full TimeSteward implementation is inherently suitable for networking, but we should go beyond this. The time_steward crate should provide a default networking system to do this, so that developers can easily build a networked simulation without having to write very much of their own networking code.
We could implement a deterministic HashMap type (i.e. one where the iteration order depends only on the elements contained). For instance, it could use linear hashing and have each bucket contain a sorted vector (or B-tree) of elements.
We might want to make groups more inherent to the TimeSteward. Without special features, storing even a deterministic HashMap in the TimeSteward costs O(n) operations per event that makes a single insertion or deletion. Reducing that back to O(1) would be desirable. On the other hand, storing large amounts of data in a single field is discouraged, so we might not want to spend extra effort to support doing that. And if we do support it, it may also still be useful to create a deterministic HashMap type for use DURING single events.
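A minimal sketch of the deterministic map idea (hypothetical types; a fixed bucket count stands in for real linear hashing): each bucket holds a sorted Vec, so iteration order depends only on the contents, never on insertion order.

```rust
/// Map whose iteration order is a pure function of its contents:
/// buckets are indexed by key and each bucket keeps its entries sorted.
struct DetMap {
    buckets: Vec<Vec<(u64, String)>>,
}

impl DetMap {
    fn new() -> DetMap {
        DetMap { buckets: vec![Vec::new(); 16] }
    }
    fn insert(&mut self, key: u64, value: String) {
        let bucket = &mut self.buckets[(key % 16) as usize];
        match bucket.binary_search_by_key(&key, |&(k, _)| k) {
            Ok(i) => bucket[i].1 = value,             // overwrite existing key
            Err(i) => bucket.insert(i, (key, value)), // keep bucket sorted
        }
    }
    fn iter(&self) -> impl Iterator<Item = &(u64, String)> {
        self.buckets.iter().flatten()
    }
}

fn main() {
    let (mut a, mut b) = (DetMap::new(), DetMap::new());
    for &k in &[5, 21, 3] {
        a.insert(k, k.to_string());
    }
    for &k in &[3, 5, 21] {
        b.insert(k, k.to_string()); // different insertion order
    }
    let order_a: Vec<u64> = a.iter().map(|&(k, _)| k).collect();
    let order_b: Vec<u64> = b.iter().map(|&(k, _)| k).collect();
    assert_eq!(order_a, order_b); // identical iteration order
}
```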
TimeStewardLifetimedMethods and TimeStewardStaticMethods are hacks to work around the current limitations of Rust polymorphism. In the future, Rust will hopefully provide features that allow us to do these things with only one trait, TimeSteward.
This will also make it much easier to write code that is generic in the TimeSteward type. Our libraries, such as the collision detection, will no longer need to use awkward macros.
This will break compatibility with older versions of the TimeSteward, but some breaking changes are inevitable for this.
Currently, various parts of the code require trait StewardData, but I'm not sure they have the same actual requirements, or if the requirements I've chosen are exactly the correct ones.
Also, if/when StewardData is actually the correct concept, if it's just a collection of supertraits, I probably want to make a blanket impl so that you don't have to implement it yourself all over the place.
We currently place upon the user the unchecked requirement to use secure random data for ColumnId, PredictorId, and EventId construction. Lazy users typing in nonrandom numbers would be awful, so it is critical that we make it as easy as possible to do the right thing.
This presumably needs to be cross-platform (not just a shell script that you can run on Linux).
We should also do more automatic checks to try to guarantee randomness (for instance, ban 0, and have more user-friendly checks for when you accidentally use multiples of the same id).
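The runtime checks could start as simply as this (illustrative sketch, not the crate's API):

```rust
use std::collections::HashSet;

/// Reject the all-zero id and detect accidental reuse of the same id
/// across registrations.
fn check_ids(ids: &[u64]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for &id in ids {
        if id == 0 {
            return Err("id 0 is forbidden; use securely generated random ids".into());
        }
        if !seen.insert(id) {
            return Err(format!("id {:#x} was used more than once", id));
        }
    }
    Ok(())
}

fn main() {
    assert!(check_ids(&[0x3f9a_22c1, 0x8b17_d04e]).is_ok());
    assert!(check_ids(&[0]).is_err()); // banned value
    assert!(check_ids(&[5, 5]).is_err()); // duplicate id
}
```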
This is currently needed by SimpleTimeline. New tracked-queries are almost always inserted at the end of the structure, but it needs to be possible to insert them in the middle in less than O(n) time. So I currently use BTreeMap, but that isn't efficient because it takes O(log n) time to insert at the end (or remove from the beginning, as forget_before() requires). This is a significant chunk of the current CPU overhead.
There is no other already written code that would use this, but a lot of possible TimeSteward algorithms would benefit from it.
I have an idea for a modified B-tree, where the structure has pointers to the first and last leaf, and instead of just the root being allowed to have only 2 children, that relaxed condition would apply to every node on the left and right spine. That way, insertions at the end would be able to fill each node efficiently from empty to full without doing lots of operations further up the tree.
Without documentation, the TimeSteward is essentially useless to anyone but me, and not optimal for myself as well. I need to do a serious pass at documenting all of the important features.
Currently, a few things have documentation comments, but it is haphazard.
This may have to wait on more API stabilization.
So far, I have only finished implementing flat time stewards, even though they fail to fulfill the main point of this crate. I shall remedy this.
I made a bunch of implementation details public in case someone wants to implement trait TimeSteward in a different crate. However, I then crammed them all into one submodule so that they wouldn't clog up the documentation for TimeSteward users. Worse, for the ones that were macros, I labeled them #[doc (hidden)].
A logical thing to do instead would be to move the implementation support into a separate crate. Then it wouldn't appear in the TimeSteward USER documentation at all, but COULD be properly documented for the sake of TimeSteward implementors.
This is probably a long way off, due to various inconveniences. It will become more important if people start wanting to implement TimeSteward, or if the data structures in the implementation details become good enough and stable enough that I should provide them as separate libraries.
Currently, simply_synchronized has a few weaknesses:
I've been steadily exposing ExtendedTime more and more, and at this point, there's no reason not to go the rest of the way.
This applies to snapshot_before(), valid_since(), updated_until_before(), and forget_before().
This is mostly just an elegance thing, but it may be useful to allow snapshots of unusual ExtendedTimes for debug-examining stuff.
insert_fiat_event() could still use base time + id, because it's similar to the interface for creating predictions, and it supports automatically protecting the user from colliding fiat event ids with prediction ids.
The main downside of this change is that it would obligate the user to call beginning_of() themselves, but that seems tolerable.
IncrementalTimeSteward::step() currently has a bit of a problem: if you're using a flat TimeSteward, you might have some free time to take a bunch of steps, but you can't afford to step beyond time X, the time of the next frame. The problem is that if IncrementalTimeSteward::updated_until_before() is lower than X now, you don't know whether it will be lower than X after one step.
One approach would be to add an IncrementalTimeSteward::next_step_time()->ExtendedTime method. However, this isn't forwards-compatible with concurrent TimeStewards that do a bunch of concurrent operations during step().

So, I propose giving step() a second argument, making it fn step(&mut self, limit: Option<ExtendedTime>)->bool. It would be guaranteed not to advance the settled time (see #38) as far as limit. It would also return false if there was no more work to do.
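A sketch of how a caller would drive that proposed signature in a frame loop (the trait and the dummy steward here are hypothetical stand-ins, with u64 in place of ExtendedTime):

```rust
trait IncrementalSteward {
    /// Do one unit of work; never advance the settled time past `limit`.
    /// Returns false when there is no more work to do before `limit`.
    fn step(&mut self, limit: Option<u64>) -> bool;
}

struct DummySteward {
    now: u64,
    end: u64,
}

impl IncrementalSteward for DummySteward {
    fn step(&mut self, limit: Option<u64>) -> bool {
        let next = self.now + 1;
        if next > self.end || limit.map_or(false, |l| next > l) {
            return false; // nothing more to do before the limit
        }
        self.now = next;
        true
    }
}

/// Spend up to `budget` steps, but never step past the next frame's time.
fn run_frame<S: IncrementalSteward>(steward: &mut S, next_frame_time: u64, budget: u32) {
    for _ in 0..budget {
        if !steward.step(Some(next_frame_time)) {
            break;
        }
    }
}

fn main() {
    let mut s = DummySteward { now: 0, end: 100 };
    run_frame(&mut s, 10, 1000); // plenty of budget, but capped at time 10
    assert_eq!(s.now, 10);
}
```

This is exactly the situation described above: the caller has free time before frame X, and the limit argument lets it burn that time safely without knowing each step's duration in advance.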