tokio-rs / turmoil
Add hardship to your tests
License: MIT License
Repro:
#[test]
fn restart_host_after_crash() -> Result {
let mut sim = Builder::new().build();
sim.host("h", || async { future::pending().await });
// crash and step to execute the err handling logic
sim.crash("h");
sim.step()?;
// restart and step to ensure the host software runs
sim.bounce("h");
sim.step()?;
Ok(())
}
running 1 test
thread 'sim::test::restart_host_after_crash' panicked at 'missing host', src/sim.rs:143:43
stack backtrace:
0: rust_begin_unwind
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:575:5
1: core::panicking::panic_fmt
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:65:14
2: core::panicking::panic_display
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:139:5
3: core::panicking::panic_str
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:123:5
4: core::option::expect_failed
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/option.rs:1879:5
5: core::option::Option<T>::expect
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/option.rs:741:21
6: turmoil::sim::Sim::run_with_hosts
at ./src/sim.rs:143:22
7: turmoil::sim::Sim::bounce
at ./src/sim.rs:125:9
8: turmoil::sim::test::restart_host_after_crash
at ./src/sim.rs:606:9
9: turmoil::sim::test::restart_host_after_crash::{{closure}}
at ./src/sim.rs:596:5
10: core::ops::function::FnOnce::call_once
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/ops/function.rs:251:5
11: core::ops::function::FnOnce::call_once
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/ops/function.rs:251:5
On crash, the host's rt is removed, which is what triggers the 'missing host' panic on bounce.
https://github.com/tokio-rs/turmoil/blob/main/src/sim.rs#L321-L322
I am trying to use turmoil for one of my projects, but it is failing to build.
Steps to reproduce:
cargo new test-turmoil
cd test-turmoil
cargo add turmoil
cargo check
yields
error[E0433]: failed to resolve: could not find `UnhandledPanic` in `runtime`
--> /Users/mpostma/Documents/code/rust/turmoil/src/rt.rs:94:42
|
94 | .unhandled_panic(tokio::runtime::UnhandledPanic::ShutdownRuntime)
| ^^^^^^^^^^^^^^ could not find `UnhandledPanic` in `runtime`
error[E0433]: failed to resolve: could not find `UnhandledPanic` in `runtime`
--> /Users/mpostma/Documents/code/rust/turmoil/src/rt.rs:108:43
|
108 | local.unhandled_panic(tokio::runtime::UnhandledPanic::ShutdownRuntime);
| ^^^^^^^^^^^^^^ could not find `UnhandledPanic` in `runtime`
error[E0599]: no method named `unhandled_panic` found for mutable reference `&mut tokio::runtime::Builder` in the current scope
--> /Users/mpostma/Documents/code/rust/turmoil/src/rt.rs:94:10
|
94 | .unhandled_panic(tokio::runtime::UnhandledPanic::ShutdownRuntime)
| ^^^^^^^^^^^^^^^ method not found in `&mut tokio::runtime::Builder`
error[E0599]: no method named `unhandled_panic` found for struct `LocalSet` in the current scope
--> /Users/mpostma/Documents/code/rust/turmoil/src/rt.rs:108:11
|
108 | local.unhandled_panic(tokio::runtime::UnhandledPanic::ShutdownRuntime);
| ^^^^^^^^^^^^^^^ method not found in `LocalSet`
This seems to be caused by the fact that the tokio dependency in the turmoil project is set to 0.19, but unhandled_panic is not part of this version.
I tried to patch the tokio version in turmoil and use the path dependency, but this still does not work.
This is on both macOS and a fresh linux VM.
Is your feature request related to a problem? Please describe.
Yes, turmoil::net::TcpStream does not behave like tokio::net::TcpStream.
AsyncRead is broken if the supplied buf does not have capacity for the next message.
#[test]
fn read_buf_smaller_than_msg() -> Result {
let mut sim = Builder::new().build();
sim.client("server", async {
let listener = bind().await?;
let (mut s, _) = listener.accept().await?;
s.write_u64(1234).await?;
Ok(())
});
sim.client("client", async {
let mut s = TcpStream::connect(("server", PORT)).await?;
let mut buf = [0; 1];
// panic!: buf.len() must fit in remaining()
let _r = s.read(&mut buf).await?;
Ok(())
});
sim.run()
}
See:
Line 137 in 2d0fadd
Describe the solution you'd like
Align turmoil with tokio::net.
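As a rough illustration of the tokio-like behaviour being asked for, the sketch below keeps an offset into the message at the front of the queue, so a read with a small buffer returns a partial message instead of panicking. It uses the blocking std::io::Read trait to stay dependency-free; MessageReader and its fields are hypothetical names, not turmoil's internals.

```rust
use std::collections::VecDeque;
use std::io::{self, Read};

/// Hypothetical message-backed reader (not turmoil's actual type).
#[derive(Debug, Default)]
pub struct MessageReader {
    messages: VecDeque<Vec<u8>>,
    /// How many bytes of the front message have already been handed out.
    offset: usize,
}

impl MessageReader {
    /// Queue a message; empty messages are skipped so that a read
    /// returning 0 always means "nothing left".
    pub fn push(&mut self, msg: Vec<u8>) {
        if !msg.is_empty() {
            self.messages.push_back(msg);
        }
    }
}

impl Read for MessageReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let Some(front) = self.messages.front() else {
            return Ok(0);
        };
        // Copy as much of the front message as fits in `buf`.
        let remaining = &front[self.offset..];
        let n = remaining.len().min(buf.len());
        buf[..n].copy_from_slice(&remaining[..n]);
        self.offset += n;
        // Drop the message only once it has been fully consumed.
        if self.offset == front.len() {
            self.messages.pop_front();
            self.offset = 0;
        }
        Ok(n)
    }
}
```

With this shape, reading a u64 through a one-byte buffer simply takes eight reads.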
Tracing currently accepts a path to a file. Make this more flexible by accepting any Write. Using stdout is useful for short running tests.
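A minimal sketch of what "accepting any Write" could look like, assuming a sink type that is generic over std::io::Write. Tracer, event and the line format are hypothetical and are not turmoil's tracing API.

```rust
use std::io::{self, Write};

/// Hypothetical trace sink, generic over any writer (file, stdout, Vec<u8>).
pub struct Tracer<W: Write> {
    out: W,
}

impl<W: Write> Tracer<W> {
    pub fn new(out: W) -> Self {
        Self { out }
    }

    /// Writes one event per line to whatever sink was supplied.
    pub fn event(&mut self, what: &str, src: &str, dst: &str) -> io::Result<()> {
        writeln!(self.out, "{what} {src} -> {dst}")
    }

    /// Hands the sink back, which is handy for inspecting an in-memory buffer.
    pub fn into_inner(self) -> W {
        self.out
    }
}
```

A short test could pass std::io::stdout(), a longer run could pass a std::fs::File, and unit tests could pass a Vec<u8> and assert on its contents.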
I was trying to set up a simulation that runs a closure which binds to localhost, like this:
#[test]
fn test_main() -> Result {
let mut sim = Builder::new()
.build();
sim.client("10.129.11.11", async {
let (mut sock, addr) = TcpListener::bind((IpAddr::from(Ipv6Addr::UNSPECIFIED), 8080))
.await?
.accept()
.await?;
sock.write_i32(124).await?;
Ok(())
});
sim.run()
}
It seems turmoil doesn't call tokio's enable_io() method when building the tokio runtime?
Related code in turmoil/src/rt.rs:
fn init() -> (Runtime, LocalSet) {
let mut builder = tokio::runtime::Builder::new_current_thread();
#[cfg(tokio_unstable)]
builder.unhandled_panic(tokio::runtime::UnhandledPanic::ShutdownRuntime);
let tokio = builder.enable_time().start_paused(true).build().unwrap();
tokio.block_on(async {
// Sleep to "round" `Instant::now()` to the closest `ms`
tokio::time::sleep(Duration::from_millis(1)).await;
});
(tokio, new_local())
}
Hosts currently may only bind to 0.0.0.0.
https://github.com/tokio-rs/turmoil/blob/main/src/net/tcp/listener.rs#L28
Add support to bind 127.0.0.1 to unblock loopback scenarios. We need to decide how network topology is affected by these changes. For example, it doesn't make sense to allow partitions within a host.
Related to #139, we should add a warning that prints when a blocking task is still active in the runtime and is preventing the next tick from happening. This can be done by 1) adding a blocking task count metric to tokio-metrics and 2) spinning up a background thread that checks this metric along with some sort of tick count. It will then start printing if the tick cannot progress.
We should support deterministic PRNG for usage for retries, hashmaps, etc. We can accomplish this by providing a deterministic version of RandomState.
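One possible shape for a deterministic RandomState replacement, sketched with std only: a BuildHasher that always starts from the same seed. SeededState and iteration_order are hypothetical names. The sketch assumes determinism within a single build is enough, since the algorithm behind DefaultHasher is not guaranteed to stay the same across Rust releases.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical deterministic stand-in for RandomState.
#[derive(Clone, Copy, Debug, Default)]
pub struct SeededState {
    seed: u64,
}

impl SeededState {
    pub fn new(seed: u64) -> Self {
        Self { seed }
    }
}

impl BuildHasher for SeededState {
    type Hasher = DefaultHasher;

    fn build_hasher(&self) -> DefaultHasher {
        // DefaultHasher::new() uses fixed keys; mixing in the seed lets a
        // simulation vary hashing per run while staying reproducible.
        let mut hasher = DefaultHasher::new();
        hasher.write_u64(self.seed);
        hasher
    }
}

/// Returns the key iteration order of a map built with the given seed.
pub fn iteration_order(seed: u64) -> Vec<u32> {
    let mut map = HashMap::with_hasher(SeededState::new(seed));
    for key in 0..32u32 {
        map.insert(key, ());
    }
    map.keys().copied().collect()
}
```

Two maps built with the same seed and the same insertions iterate in the same order, which is the property a reproducible simulation needs.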
Currently, tracing emits a "pretty-print" JSON format for all events. Some scenarios warrant seeing a more condensed version of the output.
Make the format configurable. Perhaps it could look like this?
src(dot) | dst(dot) | what(send, recv, etc.) | timestamp | ...
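To make the proposed layout concrete, here is a small sketch that renders one event per line in that column order via Display. It reads "dot" as dotted-decimal addresses, which is an assumption. Event, What and the millisecond timestamp are hypothetical and are not turmoil's tracing types.

```rust
use std::fmt;
use std::net::IpAddr;
use std::time::Duration;

/// Hypothetical event kinds for the condensed format.
#[derive(Debug, Clone, Copy)]
pub enum What {
    Send,
    Recv,
}

/// Hypothetical trace event; `elapsed` is simulated time since start.
#[derive(Debug)]
pub struct Event {
    pub src: IpAddr,
    pub dst: IpAddr,
    pub what: What,
    pub elapsed: Duration,
}

impl fmt::Display for Event {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let what = match self.what {
            What::Send => "send",
            What::Recv => "recv",
        };
        // src | dst | what | timestamp
        write!(
            f,
            "{} | {} | {} | {}ms",
            self.src,
            self.dst,
            what,
            self.elapsed.as_millis()
        )
    }
}
```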
See comments in #48 re: spans.
For TcpStream and UdpSocket, spans might simplify the context needed for each event, i.e. syn, fin, etc.
Is your feature request related to a problem? Please describe.
No. This is new functionality.
Describe the solution you'd like
Hosts in the simulation are simply futures. During run_until() I'd like to have the ability to "bounce" a host (cancel, join and restart).
State exploration entails navigating all (or some portion of) the possible states a program can enter during its execution. Model checkers exist (TLA+, P, etc.), but they require building a model that is separate from the actual implementation.
turmoil provides an interesting opportunity where we are running all of the real code, but with a simulated network. The network provides a place to both view states and control state transitions. Can we expose the right APIs to make state exploration possible?
Note that this approach differs from fuzzing the network, which is already possible today.
Tokio's ToSocketAddrs (https://docs.rs/tokio/latest/tokio/net/trait.ToSocketAddrs.html) is an async operation under the hood, which presents surface area for DNS to hang, etc.
It would be useful to extend the current UDP model with the following features:
The following scenarios exist for client -> server within the same host:
(Only Tcp is shown, but we need to handle it for Udp as well)
// bind | connect
// 0s | 127.0.0.1
// client: local Ok(127.0.0.1:49582), peer Ok(127.0.0.1:1234)
// server: 127.0.0.1:49582, local Ok(127.0.0.1:1234), peer Ok(127.0.0.1:49582)
// 127.0.0.1 | 127.0.0.1
// client: local Ok(127.0.0.1:49622), peer Ok(127.0.0.1:1234)
// server: 127.0.0.1:49622, local Ok(127.0.0.1:1234), peer Ok(127.0.0.1:49622)
// 0s | 192.168.1.42
// client: local Ok(192.168.1.42:49716), peer Ok(192.168.1.42:1234)
// server: 192.168.1.42:49716, local Ok(192.168.1.42:1234), peer Ok(192.168.1.42:49716)
// 127.0.0.1 | 192.168.1.42
// Error: Os { code: 61, kind: ConnectionRefused, message: "Connection refused" }
The first two work as expected, including setting the correct local|peer_addr on each side of the stream. The last two cause panics today due to holes in the stop-gap implementation for loopback.
We need to address this with workarounds and/or include this in the refactor being discussed in #132.
This line in the UdpSocket implementation suggests that it can be used with localhost, :: or 0.0.0.0.
But if we try to change the binding address from "unspecified" to "localhost" in the udp tests, they all fail with a ConnectionRefused error.
If this is the expected behaviour for the socket, that part of the documentation can be seen as somewhat misleading.
When using hold with regular expressions, Pair panics because it expects the two IpAddrs to be different.
Here is a minimal test:
#[test]
#[cfg(feature = "regex")]
fn hold_all() -> Result {
let mut sim = Builder::new().build();
sim.host("host", || { async { future::pending().await } });
sim.client("client", async {
hold(regex::Regex::new(r".*")?, regex::Regex::new(r".*")?);
Ok(())
});
sim.run()?;
Ok(())
}
Fails with:
thread 'sim::test::hold_all' panicked at 'assertion failed: `(left != right)`
left: `192.168.0.1`,
right: `192.168.0.1`', src/top.rs:35:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'sim::test::hold_all' panicked at 'a spawned task panicked and the LocalSet is configured to shutdown on unhandled panic', /Users/foo/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/task/local.rs:603:17
I tried the following code:
#[test]
fn ephemeral_port() -> Result {
let mut sim = Builder::new().build();
sim.client("client", async {
let sock = bind_to(0).await?;
// turmoil should assign a port to the ephemeral range
assert_ne!(sock.local_addr()?.port(), 0);
assert!(sock.local_addr()?.port() >= 49152);
Ok(())
});
sim.run()
}
It would be nice to support ephemeral port assignment. This is useful for clients that don't care about the specific port number; they just need a free port.
From https://www.rfc-editor.org/rfc/rfc6335#section-6:
o the System Ports, also known as the Well Known Ports, from 0-1023 (assigned by IANA)
o the User Ports, also known as the Registered Ports, from 1024-49151 (assigned by IANA)
o the Dynamic Ports, also known as the Private or Ephemeral Ports, from 49152-65535 (never assigned)
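A sketch of how binding to port 0 could be resolved against the Dynamic Ports range quoted above. PortAllocator, its sequential scan and the Option return type are assumptions made for illustration, not turmoil's implementation.

```rust
use std::collections::BTreeSet;

/// Dynamic (ephemeral) port range from RFC 6335, section 6.
pub const EPHEMERAL_START: u16 = 49152;
pub const EPHEMERAL_END: u16 = 65535;
const EPHEMERAL_COUNT: u32 = (EPHEMERAL_END - EPHEMERAL_START) as u32 + 1;

/// Hypothetical per-host port table.
#[derive(Debug)]
pub struct PortAllocator {
    in_use: BTreeSet<u16>,
    /// Next ephemeral candidate; scanning in order keeps runs deterministic.
    cursor: u16,
}

impl PortAllocator {
    pub fn new() -> Self {
        Self { in_use: BTreeSet::new(), cursor: EPHEMERAL_START }
    }

    /// Port 0 asks for any free ephemeral port; any other value is taken
    /// literally. Returns None if the port (or the whole range) is in use.
    pub fn bind(&mut self, requested: u16) -> Option<u16> {
        if requested != 0 {
            return self.in_use.insert(requested).then_some(requested);
        }
        for _ in 0..EPHEMERAL_COUNT {
            let candidate = self.cursor;
            self.cursor = if candidate == EPHEMERAL_END {
                EPHEMERAL_START
            } else {
                candidate + 1
            };
            if self.in_use.insert(candidate) {
                return Some(candidate);
            }
        }
        None
    }

    pub fn release(&mut self, port: u16) {
        self.in_use.remove(&port);
    }
}
```

Scanning sequentially rather than picking at random is a deliberate choice here so that repeated runs assign the same ports.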
Turmoil is built on the concept of deterministic execution. Structures such as HashMap initialize with a non-deterministic RandomState. Both the internals of turmoil and applications using it need to buy in.
e.g. HashMap, HashSet, tokio::select!, etc.
Document the guidelines.
The simulation has the ability to manually and randomly change network conditions during the simulation. This was initially designed for the datagram (UDP) APIs, and does not fully translate to streams (TCP), namely dropping messages. The goal of the simulation is not to test that TCP works; rather, it aims to test that applications built over TCP work correctly. These applications lean on the guarantees that TCP provides, i.e. message order.
Currently, one can apply two types of network partitions:
partition: All messages are dropped. Works for datagram. Not supported on established streams, however it works for new connections as we only send one message for the 3-way handshake.
See: https://github.com/tokio-rs/turmoil/blob/main/src/world.rs#L250
hold: Hold all messages "on the network". Works for both modes.
The goal of this issue is to figure out consistent semantics and naming for both networking modes.
With this example code it's possible to never run the client future, as the server one will hang until all spawn_blocking tasks complete. The real answer here is to not use threads, since this removes determinism. But this is still surprising behavior. The workaround is to use another thread provider, such as a different tokio runtime (where you call spawn_blocking on that) or std::thread.
Currently turmoil will panic if a packet is sent to an IP address that does not exist, since this results in an invalid access to the index map in top.rs.
This does not mirror the behavior of tokio or std sockets, and panicking seems too extreme, especially since some applications may create such sockets expecting errors instead of panics.
Therefore it might be advantageous to return errors instead of panicking in World::send_message.
This example will panic.
fn main() -> Result {
let mut sim = Builder::new().build();
sim.client("client", async move {
let _ = net::TcpStream::connect("192.168.30.1:80").await?;
Ok(())
});
sim.run()
}
thread 'main' panicked at 'IndexMap: key not found', ~/dev/turmoil/src/top.rs:221:25
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: JoinError::Panic(Id(1), ...)
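The suggestion of returning an error rather than panicking can be sketched as a fallible lookup. Topology, register and this send_message signature are hypothetical stand-ins, not the real World or top.rs code, and the choice of ConnectionRefused as the error kind is only a placeholder.

```rust
use std::collections::HashMap;
use std::io;
use std::net::IpAddr;

/// Hypothetical host table keyed by address.
#[derive(Debug, Default)]
pub struct Topology {
    hosts: HashMap<IpAddr, String>,
}

impl Topology {
    pub fn register(&mut self, addr: IpAddr, name: &str) {
        self.hosts.insert(addr, name.to_owned());
    }

    /// Looks up the destination and reports an io::Error for unknown
    /// addresses instead of indexing into the map and panicking.
    pub fn send_message(&self, dst: IpAddr) -> io::Result<&str> {
        self.hosts.get(&dst).map(String::as_str).ok_or_else(|| {
            // Placeholder error kind; the right kind is an open question.
            io::Error::new(
                io::ErrorKind::ConnectionRefused,
                format!("no host at {dst}"),
            )
        })
    }
}
```

With this shape, the connect call in the example above would surface an Err through ? and end the test normally.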
[Placeholder]
Method: TcpStream#peek
I have run into an issue that requires the use of this method. I would be happy to get started on a fix, but I'm uncertain on the approach to take. The issue I'm running across is (1) that self is immutably referenced:
pub async fn peek(&self, buf: &mut [u8]) -> io::Result<usize>
This was previously mutable and relied on poll_peek.
The other issue (2) is that turmoil currently implements the ReadHalf and WriteHalf using a tokio::mpsc channel. However, it doesn't look like the Receiver has an option to immutably read the internal lock-free list.
So my question is whether we should paper over the ReadHalf with an internal data structure, try to get this implemented in tokio/chan, or pursue some other potential solution I'm missing?
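As one way to picture the "internal data structure" option, the sketch below keeps received bytes in a buffer behind a Mutex so that peek can take &self and copy bytes without consuming them. ReadBuffer and fill are hypothetical, and how bytes would move from the channel into the buffer is left out.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical buffer sitting in front of the channel receiver.
#[derive(Debug, Default)]
pub struct ReadBuffer {
    bytes: Mutex<VecDeque<u8>>,
}

impl ReadBuffer {
    /// Stand-in for draining a received message into the buffer.
    pub fn fill(&self, data: &[u8]) {
        self.bytes.lock().unwrap().extend(data.iter().copied());
    }

    /// Copies up to buf.len() bytes without consuming them.
    pub fn peek(&self, buf: &mut [u8]) -> usize {
        let bytes = self.bytes.lock().unwrap();
        let n = bytes.len().min(buf.len());
        for (dst, src) in buf.iter_mut().zip(bytes.iter()) {
            *dst = *src;
        }
        n
    }

    /// Copies up to buf.len() bytes and removes them from the buffer.
    pub fn read(&self, buf: &mut [u8]) -> usize {
        let mut bytes = self.bytes.lock().unwrap();
        let n = bytes.len().min(buf.len());
        for (dst, src) in buf.iter_mut().zip(bytes.drain(..n)) {
            *dst = src;
        }
        n
    }
}
```

Interior mutability is what lets peek keep the immutable receiver shown in the signature above; a single-threaded simulation might prefer RefCell over Mutex.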
At min we should run a full build per PR.
It would be great to provide a few more complex examples to showcase more Turmoil capabilities. The examples should be succinct with lightweight dependencies, but need functional testing for:
Food for thought - https://martinfowler.com/articles/patterns-of-distributed-systems/ with a few candidates:
[Placeholder]
I read the examples and tests code in turmoil but still have some questions:
BTW, I thought Sim::epoch() was meant to do this, but I get a different result on every run of the code below:
use rand::SeedableRng;
use std::time::SystemTime;
use turmoil::{Builder, Result};
fn main() {
println!("Hello, world!");
}
#[test]
fn test_main() -> Result {
let mut sim = Builder::new()
.epoch(SystemTime::UNIX_EPOCH)
.rng(rand::rngs::StdRng::seed_from_u64(10))
.build();
sim.client("host", async {
println!("Hello world!");
// now() is different in every run, and there seems to be no API in turmoil to mock time.
println!("now: {:?}", std::time::Instant::now());
Ok(())
});
sim.run()
}
Summary: Unreachable hosts should cause an UnreachableHost rather than ConnectionRefused on network partitions, etc.
Summary: I am not sure if Shuttle needs this level of fidelity just yet, and if anyone would notice the difference at this time. But someday simulations using Shuttle might take different actions based upon UnreachableHost vs ConnectionRefused, so it might make sense to fix.
detail
I modified the axum example by adding a single line before the client request:
turmoil::partition("client", "server");
Doing so resulted in this output:
[...]
thread 'main' panicked at examples/axum/src/main.rs:71:15:
called `Result::unwrap()` on an `Err` value: Error { kind: Connect, source: Some(Custom { kind:ConnectionRefused, error: "192.168.0.1:9999" }) }
[...]
Normally, when trying to reach a TCP server via a partitioned network, a HostUnreachable error will occur after a timeout period. A ConnectionRefused error will not occur, because ConnectionRefused happens when the machine receiving a TCP SYN rejects it because there is no listener or server running on that port.
This can be demonstrated by using curl from the command line.
# in this first example, I am curling an IP address without a computer.
# thus there is nothing to respond. it will timeout after ~3 seconds, and return host unreachable
c@intel12400 ~/t/e/axum (main) [7]> time curl -vvvvv 192.168.86.33
* Trying 192.168.86.33:80...
* connect to 192.168.86.33 port 80 from 192.168.86.5 port 59648 failed: Host is unreachable
* Failed to connect to 192.168.86.33 port 80 after 3055 ms: Couldn't connect to server
* Closing connection
curl: (7) Failed to connect to 192.168.86.33 port 80 after 3055 ms: Couldn't connect to server
________________________________________________________
Executed in 3.06 secs fish external
usr time 6.04 millis 960.00 micros 5.08 millis
sys time 0.32 millis 315.00 micros 0.00 millis
# this second example shows when a connection refused occurs
# I am curling to a valid IP with a computer running, but nothing running on the port specified
# thus the computer receives the TCP syn request, but denies it, cause nothing is on the port
c@intel12400 ~/t/e/axum (main) [7]> time curl -vvvvv 192.168.86.5:8888
* Trying 192.168.86.5:8888...
* connect to 192.168.86.5 port 8888 from 192.168.86.5 port 41228 failed: Connection refused
* Failed to connect to 192.168.86.5 port 8888 after 0 ms: Couldn't connect to server
* Closing connection
curl: (7) Failed to connect to 192.168.86.5 port 8888 after 0 ms: Couldn't connect to server
________________________________________________________
Executed in 5.57 millis fish external
usr time 5.49 millis 701.00 micros 4.79 millis
sys time 0.23 millis 226.00 micros 0.00 millis
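The distinction shown by the two curl runs can be summarised as a small decision function. ConnectOutcome and connect_outcome are hypothetical names used only to illustrate the proposed behaviour, and the timeout before HostUnreachable is not modelled.

```rust
/// Hypothetical result of a simulated TCP connect attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum ConnectOutcome {
    Connected,
    /// The SYN arrived, but nothing is listening on the port.
    ConnectionRefused,
    /// The SYN never arrived (partition, or no machine at that address).
    HostUnreachable,
}

/// `reachable`: whether the network can deliver the SYN to the target.
/// `listening`: whether the target has a listener bound on the port.
pub fn connect_outcome(reachable: bool, listening: bool) -> ConnectOutcome {
    match (reachable, listening) {
        (false, _) => ConnectOutcome::HostUnreachable,
        (true, false) => ConnectOutcome::ConnectionRefused,
        (true, true) => ConnectOutcome::Connected,
    }
}
```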
https://github.com/tokio-rs/turmoil/blob/main/src/host.rs#L69
I think code like this should be replaced with something that sources time off a virtual clock. If so, there are two broad patterns to apply:
Hi, I'm not sure whether this is a bug, but given the code below (I know it's a misuse of TcpListener), the sim runs infinitely, whereas it is expected to stop within 10 logical seconds and a far shorter real-world duration.
Maybe it makes sense to add a new simulator config like realworld_duration?
#[test]
fn infinite() -> turmoil::Result {
use std::time::SystemTime;
use turmoil::{net::TcpListener, Builder};
let mut sim = Builder::new().epoch(SystemTime::UNIX_EPOCH).build();
sim.host("s", || async {
loop {
TcpListener::bind("0.0.0.0:80").await?;
}
});
sim.run()
}
Is your feature request related to a problem? Please describe.
No. This is new functionality.
Describe the solution you'd like
Hosts have an Io concept today, but it is just network. Add the ability to write/read to/from disk, and have this state persist across host restarts.
Is your feature request related to a problem? Please describe.
No. This is new functionality.
Describe the solution you'd like
During the simulation I'd like to place a "hold" on the link between two hosts. Any messages sent will remain in the queue while the "hold" is active. At a later time I'd like to remove the "hold", which allows delivery for the queued messages.
This is useful for tests that need to control the ordering of events across multiple hosts.
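The requested semantics can be sketched as a per-link queue with a held flag: messages always enqueue, and delivery only drains the queue while no hold is active. Link and its methods are hypothetical names, not turmoil's API.

```rust
use std::collections::VecDeque;

/// Hypothetical one-directional link between two hosts.
#[derive(Debug)]
pub struct Link<T> {
    held: bool,
    queue: VecDeque<T>,
}

impl<T> Link<T> {
    pub fn new() -> Self {
        Self { held: false, queue: VecDeque::new() }
    }

    /// Start holding: messages keep queueing but are not delivered.
    pub fn hold(&mut self) {
        self.held = true;
    }

    /// Stop holding: queued messages become deliverable again.
    pub fn release(&mut self) {
        self.held = false;
    }

    pub fn send(&mut self, msg: T) {
        self.queue.push_back(msg);
    }

    /// Returns the messages delivered on this tick, in send order.
    pub fn deliver(&mut self) -> Vec<T> {
        if self.held {
            Vec::new()
        } else {
            self.queue.drain(..).collect()
        }
    }
}
```

Because nothing is dropped while held, a test can force host A's messages to arrive only after some event on host B.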
Would be great if turmoil can publicly support the ability to introduce one-way partitions between hosts: host A can send messages to host B, but host B messages don't get delivered to host A.
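A one-way partition can be modelled by keying blocked links on the ordered (from, to) pair rather than on an unordered pair of hosts. The sketch below is an assumption about how that could look. Partitions, partition_oneway and can_deliver are hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical set of blocked directions between hosts.
#[derive(Debug, Default)]
pub struct Partitions {
    blocked: HashSet<(String, String)>,
}

impl Partitions {
    /// Drop messages travelling from `from` to `to`; the reverse
    /// direction is unaffected.
    pub fn partition_oneway(&mut self, from: &str, to: &str) {
        self.blocked.insert((from.to_owned(), to.to_owned()));
    }

    pub fn repair_oneway(&mut self, from: &str, to: &str) {
        self.blocked.remove(&(from.to_owned(), to.to_owned()));
    }

    pub fn can_deliver(&self, from: &str, to: &str) -> bool {
        !self.blocked.contains(&(from.to_owned(), to.to_owned()))
    }
}
```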
This refactor aims to introduce the ability of nodes to have multiple addresses in distinct subnets.
Ipv4Addr AND a unique Ipv6Addr
192.168.0.0/16)
Builder
I have tried to implement this, and came to the conclusion that some major changes, internally AND externally, would be necessary. Notably:
Rt/Host can no longer be identified by a single IpAddr.
lookup would need to return more than one possible address, thus the public API
ToIpAddr / lookup / lookup_many would need to change. This could be a good
std::net or tokio::net
ToIpAddr for module creation creates problems when statically assigning addresses.
Sim::client / Sim::host provides no way to explicitly assign both
In my opinion, this amount of changes would exceed the scope of one PR, so it might be beneficial to make step by step changes to the public API. However, this warrants discussion.
std/tokio Ipv6 sockets bound to [::] can receive incoming Ipv4 packets (addresses are being mapped to Ipv6),
top.rs links or b) add support for multiple interfaces, thus multiple subnets, should that ever be a goal
As a reference, my current test implementation can be found here.
I have closed the corresponding PR #125, since it is already out of date.
IpAddrs
lookup and dns registration
uuid as Host/Rt identifiers
Builder
Repro:
#[test]
fn run_after_host_crashes() -> Result {
let mut sim = Builder::new().build();
sim.host("h", || async { future::pending().await });
sim.crash("h");
sim.run()
}
Fails with:
running 1 test
Error: JoinError::Cancelled(Id(1))
test sim::test::run_after_host_crashes ... FAILED
failures:
failures:
sim::test::run_after_host_crashes
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 10 filtered out; finished in 0.00s
Currently, we only have panic to trigger failure during simulation runs. This makes writing both the test and host software a little clunky, as we can't use ? to return on error.
To make the experience better, we can define a dynamic Error type and have both client and hosts supply a future that aligns. On each run() iteration, we can check if any host finishes with an error, end the simulation and return the error.
e.g.
pub type TurmoilResult<T = ()> = std::result::Result<T, Box<dyn std::error::Error>>;
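To show what the alias above buys, here is a small std-only sketch: any error type converts into Box<dyn Error> through ?, and a run loop can stop at the first failed host. parse_port and first_error are hypothetical helpers for illustration, not part of turmoil.

```rust
use std::error::Error;

pub type TurmoilResult<T = ()> = std::result::Result<T, Box<dyn Error>>;

/// Host-style code can use `?` on any error type; here a ParseIntError
/// is boxed automatically.
pub fn parse_port(s: &str) -> TurmoilResult<u16> {
    Ok(s.parse::<u16>()?)
}

/// Stand-in for the per-iteration check: return the first host error,
/// or Ok(()) if every host finished cleanly.
pub fn first_error(results: Vec<TurmoilResult>) -> TurmoilResult {
    for result in results {
        result?;
    }
    Ok(())
}
```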
Listener::bind() was added in #35, but it assumes the sole host's SocketAddr. It's both reasonable to support this (say for loopback or multiple acceptors within a process) and necessary to mirror tokio::net.
There are certain error conditions we want to test out that only happen when the TCP connection stalls and doesn't write right away (returns Poll::Pending on write). It would be great if Turmoil allowed for that, so that in the simple code below, the host b just waits forever instead of the simulation panicking with socket buffer full.
use std::{
net::{IpAddr, Ipv4Addr},
time::Duration,
};
use tokio::{io::AsyncWriteExt, time::sleep};
use turmoil::{
net::{TcpListener, TcpStream},
Result,
};
#[test]
fn want_backpressure() -> Result {
let mut sim = turmoil::Builder::new().build();
sim.host("b", || async {
let listener = TcpListener::bind((IpAddr::from(Ipv4Addr::UNSPECIFIED), 9876))
.await
.expect("Bind to local host");
let (mut conn, _addr) = listener.accept().await.expect("Accept conn");
for _ in 0..10000 {
conn.write_all(b"message").await.expect("Write");
conn.flush().await.expect("flush");
}
Ok(())
});
sim.client("a", async move {
let _conn = TcpStream::connect("b:9876").await.expect("Open to b");
sleep(Duration::from_millis(100)).await;
Ok(())
});
sim.run()
}
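The backpressure being asked for can be pictured as a bounded send buffer whose write reports Pending when full instead of panicking. This is a simplified sketch: SendBuffer is a hypothetical type, and a real poll_write would also take a Context and register a waker so the writer is woken once the peer reads.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Hypothetical bounded per-connection send buffer.
#[derive(Debug)]
pub struct SendBuffer {
    capacity: usize,
    bytes: VecDeque<u8>,
}

impl SendBuffer {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, bytes: VecDeque::new() }
    }

    /// Accepts as many bytes as fit. A full buffer yields Pending
    /// (backpressure) rather than a panic.
    pub fn poll_write(&mut self, data: &[u8]) -> Poll<usize> {
        let free = self.capacity - self.bytes.len();
        if free == 0 && !data.is_empty() {
            return Poll::Pending;
        }
        let n = free.min(data.len());
        self.bytes.extend(&data[..n]);
        Poll::Ready(n)
    }

    /// Stand-in for the peer reading `n` bytes, which frees capacity.
    pub fn drain(&mut self, n: usize) -> usize {
        let n = n.min(self.bytes.len());
        self.bytes.drain(..n).count()
    }
}
```

In the test above, host b would then stay parked on write_all once the buffer fills, because client a never reads.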