turmoil's Introduction

Turmoil

This is very experimental

Add hardship to your tests.

Turmoil is a framework for testing distributed systems. It provides deterministic execution by running multiple concurrent hosts within a single thread. It introduces "hardship" into the system via changes in the simulated network. The network can be controlled manually or with a seeded rng.

Quickstart

Add this to your Cargo.toml.

[dev-dependencies]
turmoil = "0.6"

See crate documentation for simulation setup instructions.

Examples

License

This project is licensed under the MIT license.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in turmoil by you, shall be licensed as MIT, without any additional terms or conditions.

turmoil's People

Contributors

battesonb, benjscho, brightcoder, camshaft, carllerche, davidpdrsn, domlupo, jeremymill, luciofranco, marcbowes, marinpostma, mcches, mh32, petrichorit, taiki-e, tannerrogalsky, tereshch-aws, th7nder, tthebst, zakvdm

turmoil's Issues

Infinitely running.

Hi, I'm not sure whether this is a bug, but given the code below (I know it's a misuse of TcpListener),
the sim runs forever. I expected it to stop within 10 logical seconds, and after a far shorter real-world duration.

Maybe it makes sense to add a new simulator config like realworld_duration?

#[test]
fn infinite() -> turmoil::Result {
    use std::time::SystemTime;
    use turmoil::{net::TcpListener, Builder};

    let mut sim = Builder::new().epoch(SystemTime::UNIX_EPOCH).build();
    sim.host("s", || async {
        loop {
            TcpListener::bind("0.0.0.0:80").await?;
        }
    });

    sim.run()
}

[Screenshot 2023-11-03 18 04 54]

Tokio `enable_io()` error

I was trying to initialize a simulator that runs a closure which binds to localhost, like this:

#[test]
fn test_main() -> Result {
    let mut sim = Builder::new()
        .build();

    sim.client("10.129.11.11", async {
        let (mut sock, addr) = TcpListener::bind((IpAddr::from(Ipv6Addr::UNSPECIFIED), 8080))
            .await?
            .accept()
            .await?;
        sock.write_i32(124).await?;
        Ok(())
    });
    sim.run()
}

And I get this error:
[Screenshot 2023-06-30 19 35 36]

It seems turmoil doesn't call tokio's enable_io() method when initializing the tokio runtime?
Related code in turmoil/src/rt.rs:

fn init() -> (Runtime, LocalSet) {
    let mut builder = tokio::runtime::Builder::new_current_thread();

    #[cfg(tokio_unstable)]
    builder.unhandled_panic(tokio::runtime::UnhandledPanic::ShutdownRuntime);

    let tokio = builder.enable_time().start_paused(true).build().unwrap();

    tokio.block_on(async {
        // Sleep to "round" `Instant::now()` to the closest `ms`
        tokio::time::sleep(Duration::from_millis(1)).await;
    });

    (tokio, new_local())
}

Support holding messages after send

Is your feature request related to a problem? Please describe.
No. This is new functionality.

Describe the solution you'd like
During the simulation I'd like to place a "hold" on the link between two hosts. Any messages sent will remain in the queue while the "hold" is active. At a later time I'd like to remove the "hold", which allows delivery for the queued messages.

This is useful for tests that need to control the ordering of events across multiple hosts.
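
The requested behavior can be sketched with a small queue model. This is only an illustration of the semantics described above, not turmoil's implementation; `Link`, `hold`, `release`, `send`, and `deliverable` are hypothetical names chosen for this example.

```rust
use std::collections::VecDeque;

/// Hypothetical model of the link between two hosts (not turmoil's real type).
struct Link<M> {
    held: bool,
    queue: VecDeque<M>,
}

impl<M> Link<M> {
    fn new() -> Self {
        Link { held: false, queue: VecDeque::new() }
    }

    /// Start holding: messages keep queueing but are not delivered.
    fn hold(&mut self) {
        self.held = true;
    }

    /// Stop holding: queued messages become deliverable again.
    fn release(&mut self) {
        self.held = false;
    }

    /// Sending always succeeds; the message waits in the queue.
    fn send(&mut self, msg: M) {
        self.queue.push_back(msg);
    }

    /// Drains the messages that may be delivered on this tick, in send order.
    /// Returns nothing while the hold is active.
    fn deliverable(&mut self) -> Vec<M> {
        if self.held {
            Vec::new()
        } else {
            self.queue.drain(..).collect()
        }
    }
}
```

A test would call `hold()`, send a few messages, assert on intermediate state, then call `release()` and observe the queued messages arriving in their original order.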

support ephemeral port assignments

I tried the following code:

#[test]
fn ephemeral_port() -> Result {
    let mut sim = Builder::new().build();

    sim.client("client", async {
        let sock = bind_to(0).await?;

        // turmoil should assign a port to the ephemeral range
        assert_ne!(sock.local_addr()?.port(), 0);
        assert!(sock.local_addr()?.port() >= 49152);

        Ok(())
    });

    sim.run()
}

It would be nice to support ephemeral port assignment. This is useful for clients that don't care about the specific port number; they just need a free port.

From https://www.rfc-editor.org/rfc/rfc6335#section-6:

o the System Ports, also known as the Well Known Ports, from 0-1023
(assigned by IANA)

o the User Ports, also known as the Registered Ports, from 1024-
49151 (assigned by IANA)

o the Dynamic Ports, also known as the Private or Ephemeral Ports,
from 49152-65535 (never assigned)
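
One way the assignment could work is sketched below, assuming a per-host allocator that treats port 0 as "pick any free port in the Dynamic range". `PortAllocator` and its methods are hypothetical names for this example, not part of turmoil.

```rust
use std::collections::BTreeSet;

// Dynamic (ephemeral) port range from RFC 6335, section 6.
const EPHEMERAL_START: u16 = 49152;
const EPHEMERAL_END: u16 = 65535;

/// Hypothetical per-host port allocator.
struct PortAllocator {
    in_use: BTreeSet<u16>,
    next: u16,
}

impl PortAllocator {
    fn new() -> Self {
        PortAllocator { in_use: BTreeSet::new(), next: EPHEMERAL_START }
    }

    /// Binds `requested`. Port 0 means "assign a free ephemeral port".
    /// Returns `None` if the port is taken or the range is exhausted.
    fn bind(&mut self, requested: u16) -> Option<u16> {
        if requested != 0 {
            return self.in_use.insert(requested).then_some(requested);
        }
        let span = u32::from(EPHEMERAL_END - EPHEMERAL_START) + 1;
        let base = u32::from(self.next - EPHEMERAL_START);
        // Scan the whole range once, starting from the last assigned position.
        for offset in 0..span {
            let candidate = EPHEMERAL_START + ((base + offset) % span) as u16;
            if self.in_use.insert(candidate) {
                self.next = if candidate == EPHEMERAL_END {
                    EPHEMERAL_START
                } else {
                    candidate + 1
                };
                return Some(candidate);
            }
        }
        None
    }
}
```

Scanning sequentially from a remembered position keeps the assignment deterministic, which matters for reproducible simulations.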

Add warning for blocking tasks that block the sim

Related to #139, we should add a warning that prints when a blocking task is still active in the runtime and prevents the next tick from happening. This can be done by (1) adding a blocking task count metric to tokio-metrics and (2) spinning up a background thread that checks this metric along with some sort of tick count. The thread would start printing when the tick cannot progress.

Return errors instead of panicking, when sending invalid packets.

Currently turmoil will panic if a packet is sent to an IP address that does not exist,
since this results in an invalid access to the index map in top.rs.

This does not mirror the behavior of tokio or std sockets, and panicking seems too extreme,
especially since some applications may create such sockets, expecting errors instead of
panics.

Therefore it might be advantageous to return errors instead of panicking in World::send_message.

Example

This example will panic.

fn main() -> Result {
    let mut sim = Builder::new().build();
    sim.client("client", async move {
        let _ = net::TcpStream::connect("192.168.30.1:80").await?;
        Ok(())
    });

    sim.run()
}

thread 'main' panicked at 'IndexMap: key not found', ~/dev/turmoil/src/top.rs:221:25
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: JoinError::Panic(Id(1), ...)
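
The difference between an indexing panic and a recoverable error can be sketched as follows. `Topology` and `host_at` are hypothetical stand-ins for the lookup in top.rs, and the choice of `ConnectionRefused` is an assumption made for this example; the error kind turmoil should report is open for discussion.

```rust
use std::collections::HashMap;
use std::io;
use std::net::IpAddr;

/// Hypothetical stand-in for the address-to-host map.
struct Topology {
    hosts: HashMap<IpAddr, String>,
}

impl Topology {
    /// Looks up the host at `addr`, returning an `io::Error` instead of
    /// panicking when no such host exists.
    fn host_at(&self, addr: IpAddr) -> io::Result<&str> {
        self.hosts.get(&addr).map(String::as_str).ok_or_else(|| {
            io::Error::new(
                io::ErrorKind::ConnectionRefused, // assumed error kind
                format!("no host registered at {addr}"),
            )
        })
    }
}
```

With a fallible lookup, the `?` in the example above would surface an error to the client instead of tearing down the simulation.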

Calling `run()` after crashing a host errors

Repro:

#[test]
fn run_after_host_crashes() -> Result {
    let mut sim = Builder::new().build();

    sim.host("h", || async { future::pending().await });

    sim.crash("h");

    sim.run()
}

Fails with:

running 1 test
Error: JoinError::Cancelled(Id(1))
test sim::test::run_after_host_crashes ... FAILED

failures:

failures:
    sim::test::run_after_host_crashes

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 10 filtered out; finished in 0.00s

Add a condensed tracing format

Currently, tracing emits a "pretty-print" JSON format for all events. Some scenarios warrant seeing a more condensed version of the output.

Make the format configurable. Perhaps it could look like this?

src(dot) | dst(dot) | what(send, recv, etc.) | timestamp | ...
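
A rough sketch of such a formatter, assuming an event carries a source, a destination, an action, and a simulated timestamp. The `Event` struct, its fields, and the millisecond rendering are assumptions made for this example; turmoil's real tracing events may look different.

```rust
use std::time::Duration;

/// Hypothetical event record; field names are illustrative only.
struct Event<'a> {
    src: &'a str,
    dst: &'a str,
    what: &'a str,
    at: Duration,
}

/// Renders one event per line as `src | dst | what | timestamp`.
fn condensed(event: &Event<'_>) -> String {
    format!(
        "{} | {} | {} | {}ms",
        event.src,
        event.dst,
        event.what,
        event.at.as_millis()
    )
}
```

A single-line format like this is easier to scan and to diff between two runs than pretty-printed JSON.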

Add simulated PRNG

We should support a deterministic PRNG for use in retries, hash maps, etc. We can accomplish this by providing a deterministic version of RandomState.
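
For hash maps, one possible shape is a fixed-key hasher built from standard library pieces, as sketched below. The aliases `DeterministicState` and `DetHashMap` and the helper `ordered_keys` are hypothetical names for this example. Iteration order is then repeatable for a given Rust toolchain, though it is not guaranteed to stay the same across toolchain versions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

/// Hypothetical deterministic replacement for `RandomState`:
/// every hasher starts from the same fixed keys.
type DeterministicState = BuildHasherDefault<DefaultHasher>;
type DetHashMap<K, V> = HashMap<K, V, DeterministicState>;

/// Inserts the names and returns them in the map's iteration order.
fn ordered_keys(names: &[&str]) -> Vec<String> {
    let mut map: DetHashMap<String, usize> = DetHashMap::default();
    for (index, name) in names.iter().enumerate() {
        map.insert(name.to_string(), index);
    }
    map.keys().cloned().collect()
}
```

Two maps built with the same insertions now iterate in the same order, which the default `RandomState` does not promise.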

Implement additional UDP features

It would be useful to extend the current UDP model with the following features:

  • Randomized packet corruption/truncation
  • Randomized packet duplication/retransmission
  • Randomized packet reordering - this can be accomplished by having some jitter assigned to each packet.
  • Preferring new packets over old ones on full receive buffers. Currently we drop new packets on full buffers. This was assumed not to be what network stacks do or what applications expect, but it turns out this is exactly what stacks do. See #128 (comment).
  • Setting the MTU for a path and being able to drop and/or truncate packets larger than that value
  • Simulate bufferbloat (i.e. latency increases by some function as the number of packets being buffered increases).

Re-starting a crashed host with bounce panics

Repro:

#[test]
fn restart_host_after_crash() -> Result {
    let mut sim = Builder::new().build();

    sim.host("h", || async { future::pending().await });

    // crash and step to execute the err handling logic
    sim.crash("h");
    sim.step()?;

    // restart and step to ensure the host software runs
    sim.bounce("h");
    sim.step()?;

    Ok(())
}
running 1 test
thread 'sim::test::restart_host_after_crash' panicked at 'missing host', src/sim.rs:143:43
stack backtrace:
   0: rust_begin_unwind
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:65:14
   2: core::panicking::panic_display
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:139:5
   3: core::panicking::panic_str
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:123:5
   4: core::option::expect_failed
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/option.rs:1879:5
   5: core::option::Option<T>::expect
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/option.rs:741:21
   6: turmoil::sim::Sim::run_with_hosts
             at ./src/sim.rs:143:22
   7: turmoil::sim::Sim::bounce
             at ./src/sim.rs:125:9
   8: turmoil::sim::test::restart_host_after_crash
             at ./src/sim.rs:606:9
   9: turmoil::sim::test::restart_host_after_crash::{{closure}}
             at ./src/sim.rs:596:5
  10: core::ops::function::FnOnce::call_once
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/ops/function.rs:251:5
  11: core::ops::function::FnOnce::call_once
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/ops/function.rs:251:5

On crash, the rt is removed causing this issue.
https://github.com/tokio-rs/turmoil/blob/main/src/sim.rs#L321-L322

Add support for `TcpStream#peek`

Method: TcpStream#peek

I have run into an issue that requires the use of this method. I would be happy to get started on a fix, but I'm uncertain which approach to take. The first issue (1) is that self is immutably referenced:

pub async fn peek(&self, buf: &mut [u8]) -> io::Result<usize>

This was previously mutable and relied on poll_peek.

The other issue (2) is that turmoil currently implements the ReadHalf and WriteHalf using a tokio::mpsc channel. However, it doesn't look like the Receiver has an option to immutably read the internal lock-free list.

So my question is whether we should paper over the ReadHalf with an internal data structure, try to get this implemented in tokio/chan or some other potential solution I'm missing?
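
The "internal data structure" option could look roughly like the synchronous sketch below, which assumes bytes are moved out of the channel into a local buffer before being read. `ReadHalf` here is a hypothetical model, not turmoil's type, and `RefCell` is used only because the simulation is single-threaded.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

/// Hypothetical read half that buffers received bytes locally so that
/// `peek` can work through a shared reference.
struct ReadHalf {
    buffered: RefCell<VecDeque<u8>>,
}

impl ReadHalf {
    fn new() -> Self {
        ReadHalf { buffered: RefCell::new(VecDeque::new()) }
    }

    /// Called when bytes arrive from the (simulated) network.
    fn deliver(&self, bytes: &[u8]) {
        self.buffered.borrow_mut().extend(bytes.iter().copied());
    }

    /// Copies up to `buf.len()` bytes without consuming them.
    fn peek(&self, buf: &mut [u8]) -> usize {
        let queue = self.buffered.borrow();
        let n = buf.len().min(queue.len());
        for (dst, src) in buf.iter_mut().zip(queue.iter()) {
            *dst = *src;
        }
        n
    }

    /// Copies up to `buf.len()` bytes and removes them from the buffer.
    fn read(&self, buf: &mut [u8]) -> usize {
        let mut queue = self.buffered.borrow_mut();
        let n = buf.len().min(queue.len());
        for (dst, src) in buf.iter_mut().zip(queue.drain(..n)) {
            *dst = src;
        }
        n
    }
}
```

Interior mutability keeps the `&self` signature while still letting the read half pull data into its own buffer.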

Explore state exploration in turmoil

State exploration entails navigating all (or some portion of) the possible states a program can enter during its execution. Model checkers exist (TLA+, P, etc.), but they require building a model that is separate from the actual implementation.

turmoil provides an interesting opportunity where we are running all of the real code, but with a simulated network. The network provides a place to both view states and control state transitions. Can we expose the right APIs to make state exploration possible?

Note that this approach differs from fuzzing the network, which is already possible today.

Question: Reproduce, Random and Time

I read the examples and test code in turmoil but still have some questions:

  1. In the similar project MadSim, there is a "Test Seed" for every run that generates deterministic time and random numbers, so users can reuse the same seed to get exactly the same result. Can turmoil do something like this, and how?

BTW, I thought Sim::epoch() would do this, but I got a different result on every run of the code below:

use rand::SeedableRng;
use std::time::SystemTime;
use turmoil::{Builder, Result};

fn main() {
    println!("Hello, world!");
}

#[test]
fn test_main() -> Result {
    let mut sim = Builder::new()
        .epoch(SystemTime::UNIX_EPOCH)
        .rng(rand::rngs::StdRng::seed_from_u64(10))
        .build();

    sim.client("host", async {
        println!("Hello world!");

        // now() is different on every run, and there seems to be no API in turmoil to mock time.
        println!("now: {:?}", std::time::Instant::now());
        Ok(())
    });
    sim.run()
}
  2. Can turmoil simulate IO other than network (for example, disk IO)?

Regex matching throws exception in Pair

When using hold with regular expressions, Pair panics because it expects the two IpAddrs to be different.
Here is a minimal test:

#[test]
#[cfg(feature = "regex")]
fn hold_all() -> Result {
    let mut sim = Builder::new().build();

    sim.host("host", || async { future::pending().await });
    sim.client("client", async {
        hold(regex::Regex::new(r".*")?, regex::Regex::new(r".*")?);
        Ok(())
    });

    sim.run()?;
    Ok(())
}

Fails with:

thread 'sim::test::hold_all' panicked at 'assertion failed: `(left != right)`
  left: `192.168.0.1`,
 right: `192.168.0.1`', src/top.rs:35:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'sim::test::hold_all' panicked at 'a spawned task panicked and the LocalSet is configured to shutdown on unhandled panic', /Users/foo/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/task/local.rs:603:17
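
One possible fix is to skip identical addresses when expanding the two matchers into pairs. The sketch below assumes both matchers have already been resolved to address lists; `distinct_pairs` is a hypothetical helper, not an existing turmoil function.

```rust
use std::collections::BTreeSet;
use std::net::IpAddr;

/// Expands two sets of matched addresses into unordered pairs,
/// skipping any pair where both sides are the same address.
fn distinct_pairs(left: &[IpAddr], right: &[IpAddr]) -> Vec<(IpAddr, IpAddr)> {
    let mut pairs = BTreeSet::new();
    for &a in left {
        for &b in right {
            if a != b {
                // Normalize the order so (a, b) and (b, a) count once.
                pairs.insert(if a < b { (a, b) } else { (b, a) });
            }
        }
    }
    pairs.into_iter().collect()
}
```

With `.*` on both sides, every host matches both matchers, so without the `a != b` check each host would be paired with itself.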

spawn_blocking blocks the sim runtime

main...lucio/spawn-blocking-bug#diff-ace3e8abab9fb7b84efd253a7cea095084172b5cc3431426e3391305a554b152R46

With this example code it's possible to never run the client future, as the server one will hang until all spawn_blocking tasks complete. The real answer here is to not use threads, since they remove determinism. But this is still surprising behavior. The workaround is to use another thread provider, such as a different tokio runtime (where you call spawn_blocking on that) or std::thread.

cc @MarinPostma @mcches

Support client and host software errors

Currently, we only have panic to trigger failure during simulation runs. This makes writing both the test and host software a little clunky, as we can't use ? to return on error.

To make the experience better, we can define a dynamic Error type and have both client and hosts supply a future that aligns. On each run() iteration, we can check if any host finishes with an error, end the simulation and return the error.

e.g.

pub type TurmoilResult<T = ()> = std::result::Result<T, Box<dyn std::error::Error>>;
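
A small sketch of how such an alias lets host and client code use `?`, and how a run loop could stop at the first failure. `parse_port` and `first_failure` are hypothetical helpers written for this example only.

```rust
use std::error::Error;

pub type TurmoilResult<T = ()> = std::result::Result<T, Box<dyn Error>>;

/// Any error type that implements `Error` converts into the boxed error via `?`.
fn parse_port(text: &str) -> TurmoilResult<u16> {
    Ok(text.parse::<u16>()?)
}

/// Models the check on each run() iteration: return the first error
/// produced by any host, otherwise succeed.
fn first_failure(results: Vec<TurmoilResult>) -> TurmoilResult {
    for result in results {
        result?;
    }
    Ok(())
}
```

The default type parameter `T = ()` keeps test signatures short: a test can simply return `TurmoilResult`.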

Add simulated disk Io

Is your feature request related to a problem? Please describe.
No. This is new functionality.

Describe the solution you'd like
Hosts have an Io concept today, but it is just network. Add the ability to write/read to/from disk, and have this state persist across host restarts.

Document determinism guidelines

Turmoil is built on the concept of deterministic execution. Structures such as HashMap are initialized with a non-deterministic RandomState. Both the internals of turmoil and applications using it need to buy in.

e.g. HashMap, HashSet, tokio::select!, etc.

Document the guidelines.

Support multiple network interfaces

This refactor aims to introduce the ability of nodes to have multiple addresses
in distinct subnets.

Immediate Goals

  • Each node should be bound to a unique Ipv4Addr AND a unique Ipv6Addr
  • All bound addresses should be in a predefined subnet (like 192.168.0.0/16)
  • The available subnets should be configurable in the Builder
  • Addresses can be either automatically or manually assigned
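
As an illustration of the subnet goal, a type representing an IPv4 subnet and its membership check might look like the sketch below. `Ipv4Subnet` is a hypothetical name and shape, not the type from the test implementation mentioned later in this issue.

```rust
use std::net::Ipv4Addr;

/// Hypothetical IPv4 subnet in CIDR form, e.g. 192.168.0.0/16.
struct Ipv4Subnet {
    network: Ipv4Addr,
    prefix_len: u8,
}

impl Ipv4Subnet {
    /// Returns `None` for prefix lengths above 32.
    fn new(network: Ipv4Addr, prefix_len: u8) -> Option<Self> {
        (prefix_len <= 32).then_some(Ipv4Subnet { network, prefix_len })
    }

    /// Bit mask with the top `prefix_len` bits set.
    fn mask(&self) -> u32 {
        if self.prefix_len == 0 {
            0
        } else {
            u32::MAX << (32 - u32::from(self.prefix_len))
        }
    }

    /// True when `addr` shares the subnet's network prefix.
    fn contains(&self, addr: Ipv4Addr) -> bool {
        (u32::from(addr) & self.mask()) == (u32::from(self.network) & self.mask())
    }
}
```

A check like this would let the Builder validate manually assigned addresses against the configured subnets.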

Challenges

I have tried to implement this, and have come to the conclusion that some major changes
internally AND externally would be necessary. Notably:

  • Nodes, and thus Rt/Host, can no longer be identified by a single IpAddr.
    The best possible solution would be to identify them by something like a MAC addr,
    but that would warrant major internal changes
  • lookup would need to return more than one possible address, thus the public API
    of ToIpAddr / lookup / lookup_many would need to change. This could be a good
    moment to introduce API compliance with either std::net or tokio::net
  • using ToIpAddr for module creation creates problems when statically assigning addresses.
    Even without this refactoring, nodes with explicitly assigned addresses cannot have human-readable
    names, since their place is traded for the address assignment. Mixed IP subnets only enlarge this
    problem. The current API of Sim::client / Sim::host provides no way to explicitly assign both
    an Ipv4 and an Ipv6 address. In short, the current API cannot support a node with explicitly assigned
    addresses, let alone a human-readable name, so changes would be necessary.

In my opinion, this amount of change would exceed the scope of one PR, so it might be beneficial
to make step-by-step changes to the public API. However, this warrants discussion.

Some related thoughts

  • While not in the scope of this refactoring, binding sockets to specific addresses may be beneficial
  • in std/tokio, Ipv6 sockets bound to [::] can receive incoming Ipv4 packets (addresses are mapped to Ipv6),
    however the reverse is not possible. This seems like a rare edge case, so I do not know whether we
    should ever support this behaviour
  • It might prove useful in the future to refrain from hardcoding only two possible addresses in two possible subnets per node.
    Supporting a set of bound addresses+subnets might be beneficial to a) remodel localhost to use top.rs links, or b) add support for multiple interfaces, and thus multiple subnets, should that ever be a goal

As a reference, my current test implementation can be found here.
I have closed the corresponding PR #125, since it is already out of date.

Progress

  • types representing subnets
  • dns lookups that may return multiple IpAddrs
  • decoupling dns lookup and dns registration
  • uuid as Host/Rt identifiers
  • updated node creation API
  • subnet configuration in Builder
  • multiple network interfaces per node (according to subnet configuration)
  • socket support for binding to specific addresses

Support binding multiple addrs within a host

Listener::bind() was added in #35, but it assumes the sole host's SocketAddr. It's both reasonable to support this (say for loopback or multiple acceptors within a process) and necessary to mirror tokio::net.

Support bouncing a host

Is your feature request related to a problem? Please describe.
No. This is new functionality.

Describe the solution you'd like
Hosts in the simulation are simply futures. During run_until() I'd like to have the ability to "bounce" a host (cancel, join and restart).

Fix AsyncRead impl for TcpStream

Is your feature request related to a problem? Please describe.

Yes, turmoil::net::TcpStream does not behave like tokio::net::TcpStream.

AsyncRead is broken if the supplied buf does not have capacity for the next message.

#[test]
fn read_buf_smaller_than_msg() -> Result {
    let mut sim = Builder::new().build();

    sim.client("server", async {
        let listener = bind().await?;
        let (mut s, _) = listener.accept().await?;

        s.write_u64(1234).await?;

        Ok(())
    });

    sim.client("client", async {
        let mut s = TcpStream::connect(("server", PORT)).await?;

        let mut buf = [0; 1];
        // panic!: buf.len() must fit in remaining()
        let _r = s.read(&mut buf).await?;

        Ok(())
    });

    sim.run()
}

See:

buf.put_slice(bytes.as_ref());

Describe the solution you'd like

Align turmoil with tokio::net.
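
The stream-like behavior could be modeled as in the synchronous sketch below, which assumes received messages are queued whole and read out in slices. `Inbox` is a hypothetical model of the receive side; it is not turmoil's actual AsyncRead implementation.

```rust
use std::collections::VecDeque;

/// Hypothetical receive buffer: whole messages in, arbitrary-size reads out.
struct Inbox {
    messages: VecDeque<Vec<u8>>,
    /// How many bytes of the front message have already been read.
    offset: usize,
}

impl Inbox {
    fn new() -> Self {
        Inbox { messages: VecDeque::new(), offset: 0 }
    }

    fn push(&mut self, bytes: Vec<u8>) {
        // Empty messages carry no data for a byte stream, so skip them.
        if !bytes.is_empty() {
            self.messages.push_back(bytes);
        }
    }

    /// Copies as many bytes as fit into `buf` and keeps the remainder
    /// of the front message for the next call. Returns 0 when empty.
    fn read(&mut self, buf: &mut [u8]) -> usize {
        let Some(front) = self.messages.front() else {
            return 0;
        };
        let front_len = front.len();
        let remaining = &front[self.offset..];
        let n = remaining.len().min(buf.len());
        buf[..n].copy_from_slice(&remaining[..n]);
        self.offset += n;
        if self.offset == front_len {
            self.messages.pop_front();
            self.offset = 0;
        }
        n
    }
}
```

With this shape, the one-byte buffer from the test above receives the first byte of the u64, and later reads return the rest instead of panicking.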

Loopback is incomplete

The following scenarios exist for client -> server within the same host:

(Only Tcp is shown, but we need to handle it for Udp as well)

// bind | connect

// 0s | 127.0.0.1
// client: local Ok(127.0.0.1:49582), peer Ok(127.0.0.1:1234)
// server: 127.0.0.1:49582, local Ok(127.0.0.1:1234), peer Ok(127.0.0.1:49582)

// 127.0.0.1 | 127.0.0.1
// client: local Ok(127.0.0.1:49622), peer Ok(127.0.0.1:1234)
// server: 127.0.0.1:49622, local Ok(127.0.0.1:1234), peer Ok(127.0.0.1:49622)

// 0s | 192.168.1.42
// client: local Ok(192.168.1.42:49716), peer Ok(192.168.1.42:1234)
// server: 192.168.1.42:49716, local Ok(192.168.1.42:1234), peer Ok(192.168.1.42:49716)

// 127.0.0.1 | 192.168.1.42
// Error: Os { code: 61, kind: ConnectionRefused, message: "Connection refused" }

The first two work as expected, including setting the correct local|peer_addr on each side of the stream. The last two cause panics today due to holes in the stop-gap implementation for loopback.

We need to address this with workarounds and/or include this in the refactor being discussed in #132 .
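
The four scenarios above reduce to a small decision rule, sketched here under the assumption that client and server are on the same host and the target is one of that host's addresses. `loopback_connect` is a hypothetical helper for this example; it returns the address that both sides would report as the server's local address.

```rust
use std::io;
use std::net::IpAddr;

/// Decides whether a same-host connect to `target` reaches a listener
/// bound to `bound`, mirroring the bind | connect table above.
fn loopback_connect(bound: IpAddr, target: IpAddr) -> io::Result<IpAddr> {
    // A listener on the unspecified address (0.0.0.0 or ::) accepts any
    // local address; otherwise the bound address must match exactly.
    if bound.is_unspecified() || bound == target {
        Ok(target)
    } else {
        Err(io::Error::new(
            io::ErrorKind::ConnectionRefused,
            "Connection refused",
        ))
    }
}
```

The last row of the table is the only refusal: a listener bound to 127.0.0.1 is not reachable through 192.168.1.42.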

More interesting examples of testing distributed system building blocks

Description

It would be great to provide a few more complex examples to showcase more of Turmoil's capabilities. The examples should be succinct, with lightweight dependencies, but should need functional testing for:

  1. Fault tolerance
  2. Scalability to many nodes

Proposal

Food for thought - https://martinfowler.com/articles/patterns-of-distributed-systems/ with a few candidates:

  1. Heartbeat: seems straightforward, can be made more complex with Gossip?
  2. Leader - Follower: some reference implementation here for raft, seems to touch heartbeat, quorum as well - too big?
  3. 2PC
    Others?? Happy to contribute

Clean up network topology semantics

The simulation has the ability to manually and randomly change network conditions during the simulation. This was initially designed for the datagram (UDP) APIs, and does not fully translate to streams (TCP), namely dropping messages. The goal of the simulation is not to test that TCP works; rather, it aims to test that applications built over TCP work correctly. These applications lean on the guarantees that TCP provides, i.e., message order.

Currently, one can apply two types of network partitions:

partition: All messages are dropped. Works for datagram. Not supported on established streams, however it works for new connections as we only send one message for the 3-way handshake.

See: https://github.com/tokio-rs/turmoil/blob/main/src/world.rs#L250

hold: Hold all messages "on the network". Works for both modes.

The goal of this issue is to figure out consistent semantics and naming for both networking modes.

Cannot build project with turmoil

I am trying to use turmoil for one of my projects, but it is failing to build.

steps to reproduce:

cargo new test-turmoil
cd test-turmoil
cargo add turmoil
cargo check

yields

error[E0433]: failed to resolve: could not find `UnhandledPanic` in `runtime`
  --> /Users/mpostma/Documents/code/rust/turmoil/src/rt.rs:94:42
   |
94 |         .unhandled_panic(tokio::runtime::UnhandledPanic::ShutdownRuntime)
   |                                          ^^^^^^^^^^^^^^ could not find `UnhandledPanic` in `runtime`

error[E0433]: failed to resolve: could not find `UnhandledPanic` in `runtime`
   --> /Users/mpostma/Documents/code/rust/turmoil/src/rt.rs:108:43
    |
108 |     local.unhandled_panic(tokio::runtime::UnhandledPanic::ShutdownRuntime);
    |                                           ^^^^^^^^^^^^^^ could not find `UnhandledPanic` in `runtime`

error[E0599]: no method named `unhandled_panic` found for mutable reference `&mut tokio::runtime::Builder` in the current scope
  --> /Users/mpostma/Documents/code/rust/turmoil/src/rt.rs:94:10
   |
94 |         .unhandled_panic(tokio::runtime::UnhandledPanic::ShutdownRuntime)
   |          ^^^^^^^^^^^^^^^ method not found in `&mut tokio::runtime::Builder`

error[E0599]: no method named `unhandled_panic` found for struct `LocalSet` in the current scope
   --> /Users/mpostma/Documents/code/rust/turmoil/src/rt.rs:108:11
    |
108 |     local.unhandled_panic(tokio::runtime::UnhandledPanic::ShutdownRuntime);
    |           ^^^^^^^^^^^^^^^ method not found in `LocalSet`

This seems to be caused by the fact that the tokio dependency in the turmoil project is set to 0.19, but unhandled_panic is not part of this version.

I tried to patch the tokio version in turmoil and use the path dependency, but this still does not work.

This is on both macOS and a fresh linux VM.
