leonbohn / lama

Learning and Manipulation of Automata

License: MIT License

Rust 99.90% Shell 0.10%

lama's Introduction

This is an umbrella library for dealing with (omega) automata in Rust. It contains the following subpackages:

automata introduces the basic types for dealing with transition systems and words. It also contains combinators for manipulating transition systems
hoars implements a parser for the HOA format for representing omega automata
automata-learning deals with the inference of automata, defines both passive and active automaton learning schemes and implements some algorithms
bin contains a command line interface to some of the functionality provided in the other packages

Transition systems and automata

In essence, an automaton consists of a transition system (TS) together with an acceptance component. A TS is simply a finite collection of states (the set of all states is denoted $Q$) which are connected with directed edges. It can have colors on its states (then each state $q \in Q$ is assigned precisely one color) as well as colors on its edges (meaning every edge between two states has a color).

The implementation of TS is generic over the alphabet, which can either be simple (i.e. it is just a collection of individual symbols/chars, as implemented in the CharAlphabet struct) or propositional (meaning the alphabet consists of a collection of atomic propositions). Similar to other libraries dealing with (omega) automata, we distinguish between edges and transitions in a TS. Specifically, an edge is determined by its origin/target state, the color it emits and a guard, which is of the expression type that the alphabet determines. A transition, on the other hand, is a concretization of an edge, where the guard is a single symbol (also determined by the alphabet). For simple alphabets, expressions and symbols coincide, whereas for propositional alphabets, expressions are formulas (represented as BDDs) over the atomic propositions and symbols are satisfying valuations of expressions.
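The difference between an edge and its transitions can be sketched as follows. This is only an illustration: the names `Expr`, `Valuation`, `Edge` and `Transition` are hypothetical, and guards are modelled with a tiny expression enum rather than the BDDs the crate uses.

```rust
/// A valuation of up to 8 atomic propositions: bit `i` is set iff proposition `i` holds.
/// (Hypothetical stand-in for a symbol of a propositional alphabet.)
type Valuation = u8;

/// Toy guard expression over atomic propositions with indices below 8.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    True,
    Prop(u8),
    Not(Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Returns true iff the valuation satisfies the expression.
    fn holds(&self, v: Valuation) -> bool {
        match self {
            Expr::True => true,
            Expr::Prop(i) => (v & (1u8 << *i)) != 0,
            Expr::Not(e) => !e.holds(v),
            Expr::And(l, r) => l.holds(v) && r.holds(v),
        }
    }
}

/// An edge carries a guard expression.
#[derive(Debug, Clone, PartialEq)]
struct Edge<C> {
    source: usize,
    guard: Expr,
    color: C,
    target: usize,
}

/// A transition is a concretization of an edge for one single symbol.
#[derive(Debug, Clone, PartialEq)]
struct Transition<C> {
    source: usize,
    symbol: Valuation,
    color: C,
    target: usize,
}

impl<C: Clone> Edge<C> {
    /// Enumerates the transitions of this edge over `num_props` (at most 8) propositions,
    /// i.e. one transition per valuation that satisfies the guard.
    fn transitions(&self, num_props: u8) -> Vec<Transition<C>> {
        (0..(1u16 << num_props))
            .map(|v| v as Valuation)
            .filter(|&v| self.guard.holds(v))
            .map(|symbol| Transition {
                source: self.source,
                symbol,
                color: self.color.clone(),
                target: self.target,
            })
            .collect()
    }
}
```

For a simple alphabet the guard would itself be a single symbol, so each edge would correspond to exactly one transition.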

The most important trait is TransitionSystem, which provides access to the indices of all states and is capable of returning iterators over the outgoing edges of a state. It also provides a lot of combinators, which allow manipulation of the TS. For example, map_state_color consumes the TS and relabels the colors on the states by applying a given function. Most combinators consume self and return a new TS, mainly to avoid unnecessary cloning of the underlying data structures. If the original TS should continue to exist, call as_ref or simply use a reference to the TS. As each combinator returns an object which again implements TransitionSystem, combinators can easily be chained together. While this is convenient, the applied manipulations are computed on demand, which may lead to considerable overhead when the same data is queried repeatedly. To circumvent this, it can be beneficial to collect the resulting TS into a structure, which then explicitly does all the necessary computations once and avoids recomputation at a later point. There are also the variants collect_with_initial and collect_ts, which take the designated initial state into account or collect into a specific representation of the TS, respectively.
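The trade-off between on-demand combinators and collecting can be illustrated with a small self-contained sketch. The trait `ColoredStates` and the types `Explicit` and `MapStateColor` below are simplified stand-ins that only model state colors; they are assumptions for illustration and do not mirror the crate's actual signatures.

```rust
/// Minimal stand-in for a transition system that only exposes state colors.
trait ColoredStates {
    type Color;

    /// Number of states; states are indexed `0..size()`.
    fn size(&self) -> usize;

    /// Color of the given state, or `None` if the index is out of range.
    fn state_color(&self, state: usize) -> Option<Self::Color>;

    /// Lazy combinator: consumes `self`, `f` is applied on every lookup.
    fn map_state_color<D, F>(self, f: F) -> MapStateColor<Self, F>
    where
        Self: Sized,
        F: Fn(Self::Color) -> D,
    {
        MapStateColor { ts: self, f }
    }

    /// Collects into an explicit structure, computing every color exactly once.
    fn collect_colors(&self) -> Explicit<Self::Color> {
        Explicit((0..self.size()).filter_map(|q| self.state_color(q)).collect())
    }
}

/// Explicit representation that stores one color per state.
struct Explicit<C>(Vec<C>);

impl<C: Clone> ColoredStates for Explicit<C> {
    type Color = C;
    fn size(&self) -> usize {
        self.0.len()
    }
    fn state_color(&self, state: usize) -> Option<C> {
        self.0.get(state).cloned()
    }
}

/// References implement the trait as well, so the original can be kept alive.
impl<T: ColoredStates> ColoredStates for &T {
    type Color = T::Color;
    fn size(&self) -> usize {
        (**self).size()
    }
    fn state_color(&self, state: usize) -> Option<T::Color> {
        (**self).state_color(state)
    }
}

/// Result of `map_state_color`; nothing is computed until a color is requested.
struct MapStateColor<T, F> {
    ts: T,
    f: F,
}

impl<T, D, F> ColoredStates for MapStateColor<T, F>
where
    T: ColoredStates,
    F: Fn(T::Color) -> D,
{
    type Color = D;
    fn size(&self) -> usize {
        self.ts.size()
    }
    fn state_color(&self, state: usize) -> Option<D> {
        self.ts.state_color(state).map(&self.f)
    }
}
```

In this sketch, calling `(&base).map_state_color(f)` keeps `base` usable, every `state_color` call on the mapped value runs `f` again, and `collect_colors` runs `f` once per state and stores the results.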

The crate defines some basic building blocks of TS which can easily be manipulated (see Sproutable), these are

NTS/DTS (the latter is just a thin wrapper around the former). These store edges in a vector; each state contains a pointer to its first edge in this collection, and each edge contains pointers to the previous and next edge.
BTS which stores transitions in an efficient HashMap
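The edge storage described for NTS can be pictured as a doubly linked list threaded through a vector. The following is a minimal sketch of that idea only; the names `LinkedTs` and `EdgeEntry` are hypothetical, and vector indices are assumed to play the role of the "pointers".

```rust
/// One edge, stored in a shared vector and linked to the other edges of its source state.
#[derive(Debug)]
struct EdgeEntry {
    source: usize,
    symbol: char,
    target: usize,
    /// Index of the previous edge with the same source, if any.
    prev: Option<usize>,
    /// Index of the next edge with the same source, if any.
    next: Option<usize>,
}

#[derive(Debug, Default)]
struct LinkedTs {
    /// For every state, the index of its first edge in `edges`.
    first_edge: Vec<Option<usize>>,
    edges: Vec<EdgeEntry>,
}

impl LinkedTs {
    /// Adds a fresh state without outgoing edges and returns its index.
    fn add_state(&mut self) -> usize {
        self.first_edge.push(None);
        self.first_edge.len() - 1
    }

    /// Adds an edge by making it the new head of the source state's edge list.
    fn add_edge(&mut self, source: usize, symbol: char, target: usize) -> usize {
        let id = self.edges.len();
        let old_head = self.first_edge[source];
        if let Some(head) = old_head {
            self.edges[head].prev = Some(id);
        }
        self.edges.push(EdgeEntry { source, symbol, target, prev: None, next: old_head });
        self.first_edge[source] = Some(id);
        id
    }

    /// Iterates over the outgoing edges of `state` by following the `next` links.
    fn edges_from(&self, state: usize) -> impl Iterator<Item = &EdgeEntry> + '_ {
        let mut cursor = self.first_edge[state];
        std::iter::from_fn(move || {
            let edge = &self.edges[cursor?];
            cursor = edge.next;
            Some(edge)
        })
    }
}
```

Because new edges are inserted at the head of the list, this sketch yields the most recently added edge of a state first.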

Further traits that are of importance are

Pointed, which picks one designated initial state; this is important for deterministic automata
Deterministic, a marker trait that distinguishes deterministic from nondeterministic TS. As TransitionSystem only provides iterators over the outgoing edges, it can also be used to deal with nondeterministic TS, which may have multiple overlapping edges. Implementing Deterministic guarantees that the outgoing transition for each state and symbol is unique.
Sproutable enables growing a TS state by state and edge/transition by edge/transition. Naturally, this is only implemented for the basic building blocks, i.e. BTS, DTS and NTS.
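How a marker trait and a growing trait can interact is sketched below. The trait names are borrowed from the text, but their methods and signatures here are simplified assumptions; `HashTs` is a hypothetical hash-map-backed structure in the spirit of BTS.

```rust
use std::collections::HashMap;

/// Only offers iteration over outgoing edges, so it also fits nondeterministic systems.
trait TransitionSystem {
    fn edges_from(&self, state: usize) -> Vec<(char, usize)>;
}

/// Marker trait: implementors promise at most one edge per state and symbol.
/// Because of that promise, a unique successor can be offered.
trait Deterministic: TransitionSystem {
    fn successor(&self, state: usize, symbol: char) -> Option<usize> {
        self.edges_from(state)
            .into_iter()
            .find(|(s, _)| *s == symbol)
            .map(|(_, target)| target)
    }
}

/// Allows growing a transition system state by state and edge by edge.
trait Sproutable: TransitionSystem {
    fn add_state(&mut self) -> usize;
    /// Adds an edge and returns the target it replaced, if there was one.
    fn add_edge(&mut self, source: usize, symbol: char, target: usize) -> Option<usize>;
}

/// Keyed by (state, symbol), so determinism holds by construction.
#[derive(Default)]
struct HashTs {
    states: usize,
    edges: HashMap<(usize, char), usize>,
}

impl TransitionSystem for HashTs {
    fn edges_from(&self, state: usize) -> Vec<(char, usize)> {
        let mut out: Vec<(char, usize)> = self
            .edges
            .iter()
            .filter(|((source, _), _)| *source == state)
            .map(|((_, symbol), target)| (*symbol, *target))
            .collect();
        out.sort(); // hash maps iterate in arbitrary order
        out
    }
}

impl Deterministic for HashTs {}

impl Sproutable for HashTs {
    fn add_state(&mut self) -> usize {
        self.states += 1;
        self.states - 1
    }
    fn add_edge(&mut self, source: usize, symbol: char, target: usize) -> Option<usize> {
        self.edges.insert((source, symbol), target)
    }
}
```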

Profiling while Benchmarking

We can profile the benchmark code with cargo bench --bench forc_paper -- --profile-time 20, where 20 is the number of seconds for which the benchmarks are run. A flamegraph will be generated and placed in target/criterion/forc_paper/profile/flamegraph.csv.

Development

To ensure code quality, the main branch should only be modified via pull requests. The CI is configured so that every pull request runs through a set of checks and tests. Specifically, the formatting is checked, then clippy is run and finally all tests are run. To avoid unnecessary runs of the CI, these steps can be run locally before every commit, using the check script included in the root of the repository.

lama's Issues

Streamline the code and remove unused stuff

There are lots of things in the code which are not used and can be removed. We should have a look at:

Boundedness
Rawpresentation and similar business
Run and Successful/Partial/Induces

Include representation for edges not associated to a transition system

The trait IsEdge should be implemented for everything that represents an edge. At the moment, there is no way to represent an edge that is not associated to a specific transition system. This is a problem because paths are not associated to transition systems and we therefore cannot extract edges from paths.

Comparing Offsets of Omega Words

Checking for equality of offsets of omega words (e.g. type automata::word::Offset<'_, char, Reduced<char>>) can return false even though the represented word is identical.
Failing code example:

let w0 = Reduced::ultimately_periodic("a", "b");
let w1 = Reduced::periodic("b");
let offset0 = w0.offset(1);
let offset1 = w1.offset(0);
assert_eq!(offset0, offset1);

This behavior is probably fine and can be avoided by normalizing the offsets with offset.normalized(). However, I think there should be a method of the form offset0.equals(offset1) on offsets of omega words.
Note that this already exists for offsets of finite words.
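A possible shape for such a comparison is sketched below on a self-contained model of ultimately periodic words. The type `Lasso` and the method `same_word` are hypothetical and are not the crate's `Reduced` or `Offset` types; the sketch only shows that comparing normal forms (primitive cycle, shortest possible spoke) decides equality of the represented words.

```rust
/// An ultimately periodic word `spoke · cycle^ω` with a non-empty cycle.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Lasso {
    spoke: Vec<char>,
    cycle: Vec<char>,
}

impl Lasso {
    fn new(spoke: &str, cycle: &str) -> Self {
        assert!(!cycle.is_empty(), "the cycle of an omega word must not be empty");
        Lasso { spoke: spoke.chars().collect(), cycle: cycle.chars().collect() }
    }

    /// The word obtained by dropping the first `k` symbols.
    fn offset(&self, k: usize) -> Self {
        if k <= self.spoke.len() {
            Lasso { spoke: self.spoke[k..].to_vec(), cycle: self.cycle.clone() }
        } else {
            // The offset ends inside the cycle, so the cycle is rotated accordingly.
            let mut cycle = self.cycle.clone();
            cycle.rotate_left((k - self.spoke.len()) % self.cycle.len());
            Lasso { spoke: Vec::new(), cycle }
        }
    }

    /// Normal form: the cycle is primitive and the spoke is as short as possible.
    fn normalized(&self) -> Self {
        let n = self.cycle.len();
        // Smallest period p dividing n such that the cycle repeats its first p symbols.
        let p = (1..=n)
            .find(|&p| n % p == 0 && (0..n).all(|i| self.cycle[i] == self.cycle[i % p]))
            .unwrap_or(n);
        let mut cycle = self.cycle[..p].to_vec();
        let mut spoke = self.spoke.clone();
        // Roll the end of the spoke into the cycle while the last symbols agree.
        while !spoke.is_empty() && spoke.last() == cycle.last() {
            spoke.pop();
            cycle.rotate_right(1);
        }
        Lasso { spoke, cycle }
    }

    /// Semantic equality: both representations denote the same infinite word.
    fn same_word(&self, other: &Self) -> bool {
        self.normalized() == other.normalized()
    }
}
```

With this model, `Lasso::new("a", "bb").offset(1)` and `Lasso::new("", "b")` differ structurally, but `same_word` reports that they represent the same word.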

Unclear naming

Some types in the automata module have names that can be confusing.
The ones I found are alphabet::Simple and word::omega::Reduced.
Additionally, as we discussed, it is possible to make representing omega words in their reduced form the default.

Specialize and rework collection of transition systems

Right now collect is pretty dumb and always builds an entirely new transition system. We should streamline this and specialize it when possible. For example when a DTS collects, we may simply return self. Similarly, when a MapStateColor collects, it would be enough to collect the underlying ts and then apply a transformation to the state colors in-place. This would be much more efficient and ergonomic.

The concrete steps for this issue are

Make sure different flavors of collect use each other to make specialization easy and require overwriting only a single default impl
Decide if we might skip a default implementation entirely to force efficiency and force implementors to deal with the operation explicitly
Implement efficient specializations wherever applicable
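One way the specialization could look is a default method that implementors override. The sketch below uses a deliberately tiny, hypothetical `Ts` trait with a single `collect_dts` method; it is not the crate's API, and since the color type changes, the "in-place" step is approximated by mapping over the color vector while reusing the edge vector.

```rust
/// Explicit representation: one color per state plus a list of (source, symbol, target).
#[derive(Debug, Clone, PartialEq)]
struct Dts<C> {
    colors: Vec<C>,
    edges: Vec<(usize, char, usize)>,
}

trait Ts: Sized {
    type Color;
    fn size(&self) -> usize;
    fn color(&self, state: usize) -> Self::Color;
    fn edges(&self) -> Vec<(usize, char, usize)>;

    /// Default: rebuild everything from scratch through the generic interface.
    fn collect_dts(self) -> Dts<Self::Color> {
        Dts {
            colors: (0..self.size()).map(|q| self.color(q)).collect(),
            edges: self.edges(),
        }
    }
}

impl<C: Clone> Ts for Dts<C> {
    type Color = C;
    fn size(&self) -> usize {
        self.colors.len()
    }
    fn color(&self, state: usize) -> C {
        self.colors[state].clone()
    }
    fn edges(&self) -> Vec<(usize, char, usize)> {
        self.edges.clone()
    }
    /// Specialization: a `Dts` is already explicit, so it is returned unchanged.
    fn collect_dts(self) -> Dts<C> {
        self
    }
}

/// Lazy relabeling of state colors.
struct MapStateColor<T, F> {
    ts: T,
    f: F,
}

impl<T, D, F> Ts for MapStateColor<T, F>
where
    T: Ts,
    F: Fn(T::Color) -> D,
{
    type Color = D;
    fn size(&self) -> usize {
        self.ts.size()
    }
    fn color(&self, state: usize) -> D {
        (self.f)(self.ts.color(state))
    }
    fn edges(&self) -> Vec<(usize, char, usize)> {
        self.ts.edges()
    }
    /// Specialization: collect the underlying system once, then relabel its colors
    /// and reuse the edge vector without copying it.
    fn collect_dts(self) -> Dts<D> {
        let MapStateColor { ts, f } = self;
        let Dts { colors, edges } = ts.collect_dts();
        Dts { colors: colors.into_iter().map(f).collect(), edges }
    }
}
```

Keeping a single overridable method, as in the first step listed above, means that other flavors of collect could be expressed in terms of it and inherit the specialization.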

Fix documentation

We would like to have thorough documentation that also includes some examples.

Running the examples from the docs during testing sounds reasonable and can be done by passing --test to rustdoc.

Reorder tuples for transitions

At the moment, we sometimes return tuples (source, expression, target, color) as a transition/edge. In other places, the more natural (source, expression, color, target) is used. We should only be using one of them. The only sensible choice is the latter.

Name Space Confusion with Idx

There is a type alias Idx for usize in the ts module but Idx is also commonly used as a generic type describing index types throughout the code.
An example is the Path struct.

Explain different types of Alphabets

Right now, there are multiple different types of alphabets, and the documentation does very little to explain what the differences between them are and why one should be used over another. Therefore, we should

Improve the documentation of each type of Alphabet to better indicate its uses
Add documentation to the module/library to highlight the different types of alphabets and make it easier to grasp why they exist

Write benchmarks for common functionality

We should introduce a comprehensive benchmark suite that tests the performance of common operations such as running words, manipulating transition systems and collecting transition systems.

It would be good to design a set of transition systems on which these benchmarks are performed, which are hand-crafted and not randomly generated. If we then use a library such as criterion or iai, we can reliably detect changes in performance.

Get recurrent transitions for runs of a transition system on omega words

For a learning algorithm I need to collect the transitions that are visited infinitely often when running a transition system on a set of omega words. For running the TS on the words I use the omega_run function, which returns a Lasso (if the run is successful). For these, it is only possible to get the recurrent state indices, state colors and edge colors separately. What I want is an iterator over tuples of the form (source, symbol, target, color) or similar.
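The requested computation can be sketched independently of the crate. The code below assumes a deterministic system stored in a hash map and an ultimately periodic input given as `spoke` and `cycle` strings; `Dts`, `from_edges` and `recurrent_transitions` are hypothetical names and do not refer to `omega_run` or `Lasso`.

```rust
use std::collections::{BTreeSet, HashMap};

/// (source, symbol, target, color)
type Transition = (usize, char, usize, u32);

struct Dts {
    /// Maps (state, symbol) to (target, edge color).
    delta: HashMap<(usize, char), (usize, u32)>,
}

impl Dts {
    fn from_edges(edges: &[Transition]) -> Self {
        let delta = edges.iter().map(|&(p, a, q, c)| ((p, a), (q, c))).collect();
        Dts { delta }
    }

    /// Transitions taken infinitely often when reading `spoke · cycle^ω` from `initial`.
    /// Returns `None` if the cycle is empty or the run leaves the transition system.
    fn recurrent_transitions(
        &self,
        initial: usize,
        spoke: &str,
        cycle: &str,
    ) -> Option<BTreeSet<Transition>> {
        if cycle.is_empty() {
            return None;
        }
        let mut state = initial;
        for a in spoke.chars() {
            state = self.delta.get(&(state, a))?.0;
        }
        // `seen[q]` is the position in `taken` at which a cycle iteration started in `q`.
        let mut seen: HashMap<usize, usize> = HashMap::new();
        let mut taken: Vec<Transition> = Vec::new();
        loop {
            // A repeated state at a cycle boundary means the run is periodic from there on,
            // so exactly the transitions taken since then are recurrent.
            if let Some(&start) = seen.get(&state) {
                return Some(taken[start..].iter().copied().collect());
            }
            seen.insert(state, taken.len());
            for a in cycle.chars() {
                let (next, color) = *self.delta.get(&(state, a))?;
                taken.push((state, a, next, color));
                state = next;
            }
        }
    }
}
```

The loop terminates because only finitely many states can occur at cycle boundaries. Applying the function to every word of a sample and taking the union of the results would give the set described above.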

Refactor prelude

The automata::prelude module should be looked at. There might be some traits in there, which really should not be present in a prelude. We should make sure that this enables frictionless work with the automata package but at the same time niche traits/structs should not be included there.

Reorganize the whole `automaton` package

Rework Macros for different automata types

Right now, different automata types are implemented by the impl_mealy_automaton and impl_moore_automaton macros in mealy.rs and moore.rs, respectively. We should explore ways to make this more ergonomic and easy to refactor.

One option would be to define each automaton type entirely by hand and have more control like that.

Alternatively, we could introduce a struct like Automaton, which takes in a transition system and a set of semantics.
These semantics then define how a run is handled.
Some testing of this idea has been done in semantics.rs. It could be nicer to work with, as we can use const generics to distinguish finite from infinite semantics.
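To make the const-generic idea concrete, here is a rough sketch. Everything in it is an assumption for illustration: the semantics are fixed to a set of accepting states (reachability for finite words, Büchi-style for omega words) instead of being a pluggable component, and `Dts`, `Automaton` and their methods are hypothetical names unrelated to semantics.rs.

```rust
use std::collections::{BTreeSet, HashMap};

struct Dts {
    initial: usize,
    delta: HashMap<(usize, char), usize>,
}

impl Dts {
    fn new(initial: usize, edges: &[(usize, char, usize)]) -> Self {
        Dts { initial, delta: edges.iter().map(|&(p, a, q)| ((p, a), q)).collect() }
    }

    /// State reached from `from` after reading the finite word, if the run exists.
    fn reached(&self, from: usize, word: &str) -> Option<usize> {
        word.chars().try_fold(from, |q, a| self.delta.get(&(q, a)).copied())
    }

    /// States visited infinitely often on `spoke · cycle^ω`, if the run exists.
    fn recurrent(&self, spoke: &str, cycle: &str) -> Option<BTreeSet<usize>> {
        if cycle.is_empty() {
            return None;
        }
        let mut q = self.reached(self.initial, spoke)?;
        let mut starts: Vec<usize> = Vec::new(); // state at the start of each iteration
        let mut visited: Vec<Vec<usize>> = Vec::new(); // states seen in each iteration
        loop {
            if let Some(i) = starts.iter().position(|&p| p == q) {
                return Some(visited[i..].iter().flatten().copied().collect());
            }
            starts.push(q);
            let mut seen = Vec::new();
            for a in cycle.chars() {
                q = *self.delta.get(&(q, a))?;
                seen.push(q);
            }
            visited.push(seen);
        }
    }
}

/// `OMEGA = false` selects finite-word semantics, `OMEGA = true` infinite-word semantics.
struct Automaton<const OMEGA: bool> {
    ts: Dts,
    accepting: BTreeSet<usize>,
}

impl Automaton<false> {
    /// Accepts a finite word iff the run ends in an accepting state.
    fn accepts(&self, word: &str) -> bool {
        self.ts
            .reached(self.ts.initial, word)
            .map_or(false, |q| self.accepting.contains(&q))
    }
}

impl Automaton<true> {
    /// Accepts `spoke · cycle^ω` iff some accepting state is visited infinitely often.
    fn accepts(&self, spoke: &str, cycle: &str) -> bool {
        self.ts
            .recurrent(spoke, cycle)
            .map_or(false, |inf| inf.iter().any(|q| self.accepting.contains(q)))
    }
}
```

The two `accepts` methods can coexist because `Automaton<false>` and `Automaton<true>` are distinct types, so the compiler rejects a call that uses the wrong kind of word.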

Smarter computation of Right congruence

Remove unused traits

Right now, there are quite a few traits that appear just once and are not used. These should be investigated and ultimately removed as they serve no purpose other than to confuse people using the library. For example mapping::Morphism or ts::HasColorMut/ts::HasColor are not really used and it is not really clear what exact purpose they serve at the moment.

These traits should either be removed or used more consistently. I am in favor of the former option, as it decreases the overall complexity of the code.

Introduce a `Void` type for uncolored edges or states

We should represent uncolored edges or states with a separate type that is not just (). As the unit type () implements Color, this leads to some annoyances if we want to properly implement the "dropping" of colors when collecting transition systems.

The rough idea of this change should follow along a type like this

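pub struct Void;
// Any color can be converted into `Void`, which simply discards it.
// Note: if `Void` itself implemented `Color`, this impl would overlap with
// the reflexive `impl<T> From<T> for T` from the standard library.
impl<C: Color> From<C> for Void {
    fn from(_: C) -> Self {
        Void
    }
}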
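A small usage sketch of this idea follows. The `Color` trait here is a hypothetical stand-in with explicit implementations (the crate's actual trait may be defined differently), and `drop_colors` is an illustrative helper, not an existing function.

```rust
/// Hypothetical stand-in for the crate's `Color` trait.
pub trait Color: Clone {}
impl Color for bool {}
impl Color for usize {}

/// Marker for "no color". It deliberately does not implement `Color` in this sketch,
/// which keeps the conversion below from overlapping with `impl<T> From<T> for T`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Void;

impl<C: Color> From<C> for Void {
    fn from(_: C) -> Self {
        Void
    }
}

/// Drops the colors of a list of states or edges while keeping its length.
pub fn drop_colors<C: Color>(colors: Vec<C>) -> Vec<Void> {
    colors.into_iter().map(Void::from).collect()
}
```

Whether `Void` should implement `Color` is a design question for the issue; the sketch only shows the conversion-based "dropping" in isolation.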