tikv / fail-rs
Fail points for Rust
License: Apache License 2.0
I have a use case where I'd like to add failpoints across several of my libraries, but I'm experiencing some friction due to the way this crate uses cargo features.
Failpoints are currently active by default and need to be disabled (opt-out) in production via the no_fail cargo feature. This poses a problem when nesting a couple of levels of dependencies, as the top-level consumer is no longer in charge of those features and can't directly opt out.
Considering that cargo features are additive, a better approach would be to make failpoints disabled by default and enable them via a dedicated feature (opt-in). That way, the top-level application/consumer would optionally be in charge of configuring the fail environment and enabling failpoints (transparent to all intermediate libraries).
In practice, this would mean:
- dropping the no_fail feature
- adding a failpoints feature to enable them
If this sounds fine to you, I can have a look around and send a PR in the next weeks.
This can probably be done shortly after upgrading to 2018: #21
I'll probably want to clean up the documentation a bit. It's overwhelming atm.
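The opt-in scheme proposed in this issue can be sketched with plain conditional compilation. This is only an illustration, not the crate's actual implementation: it assumes the consuming crate declares a failpoints feature in its Cargo.toml, and has_failpoints and eval_point here are hypothetical stand-ins written for this example.

```rust
// Assumption: the crate's Cargo.toml declares an opt-in feature, e.g.
//   [features]
//   failpoints = []
// Nothing below is active unless that feature is enabled.

/// Hypothetical helper: reports at compile time whether the opt-in
/// feature was enabled for this build.
pub const fn has_failpoints() -> bool {
    cfg!(feature = "failpoints")
}

/// Hypothetical fail point evaluation, compiled in only on opt-in.
#[cfg(feature = "failpoints")]
pub fn eval_point(name: &str) -> Option<String> {
    Some(format!("fail point {name} evaluated"))
}

/// Default build: the fail point is a no-op that the optimizer can remove.
#[cfg(not(feature = "failpoints"))]
#[inline(always)]
pub fn eval_point(_name: &str) -> Option<String> {
    None
}
```

Because cargo features are additive, only the top-level application has to turn the feature on; intermediate libraries can call eval_point unconditionally.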
Is your feature request related to a problem? Please describe.
Make fail points support dependencies (one fail point waits for another before proceeding).
We can refer to the implementation of the RocksDB sync point: https://github.com/facebook/rocksdb/blob/e9e0101ca46f00e8a456e69912a913d907be56fc/test_util/sync_point.h
Describe the solution you'd like
Support writing something like fail::cfg("point_A", "wait(point_B)").
- wait indicates a pause on point_A until point_B is passed.
- wait_local indicates that point_A is enabled only when point_A and point_B are processed on the same thread. It also pauses on point_A until point_B is passed.
Additional context
part of #tikv/rust-rocksdb#361
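A minimal sketch of the requested wait semantics, using only the standard library. SyncPoints, pass, and wait_for are hypothetical names chosen for this example; they are not part of fail-rs or of the RocksDB sync point API, and the wait_local variant is not covered.

```rust
use std::collections::HashSet;
use std::sync::{Condvar, Mutex};

/// Hypothetical registry of the points that have already been passed.
#[derive(Default)]
pub struct SyncPoints {
    passed: Mutex<HashSet<String>>,
    changed: Condvar,
}

impl SyncPoints {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record that `name` has been passed and wake every waiting thread.
    pub fn pass(&self, name: &str) {
        let mut passed = self.passed.lock().unwrap();
        passed.insert(name.to_string());
        self.changed.notify_all();
    }

    /// Block the calling thread until `name` has been passed.
    /// The loop guards against spurious wakeups.
    pub fn wait_for(&self, name: &str) {
        let mut passed = self.passed.lock().unwrap();
        while !passed.contains(name) {
            passed = self.changed.wait(passed).unwrap();
        }
    }
}
```

Under this sketch, code at point_A would call wait_for("point_B"), and code at point_B would call pass("point_B").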
I just noticed in a crater run that fail-rs is broken: https://crater-reports.s3.amazonaws.com/pr-60466/master%237840a0b753a065a41999f1fb6028f67d33e3fdd5/reg/fail-0.2.1/log.txt
It doesn't look like a problem with the crate, but I've asked @pietroalbini about it. Would be nice to have fail tested properly by crater.
Is your feature request related to a problem? Please describe.
Failpoint unit tests require taking a global lock, preventing test parallelism. An alternative or complementary solution to a global lock (#23) would be to have a thread-local failpoint configuration, protected by a guard.
Describe the solution you'd like
Add a thread-local configuration that is protected by a guard that performs teardown.
Describe alternatives you've considered
Global locks: #23
Additional context
This would work for single-threaded test cases, but not generally for tests that require multiple threads.
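The thread-local idea could look roughly like the following. LocalScenario, cfg, and local_actions are hypothetical names for this sketch, not existing fail-rs APIs; the guard clears the current thread's configuration when it is dropped.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    // Per-thread fail point configuration: name -> actions string.
    static LOCAL_CFG: RefCell<HashMap<String, String>> = RefCell::new(HashMap::new());
}

/// Hypothetical guard: owns the thread-local configuration for one test.
pub struct LocalScenario {
    _private: (),
}

impl LocalScenario {
    /// Start from a clean configuration on the current thread.
    pub fn setup() -> Self {
        LOCAL_CFG.with(|cfg| cfg.borrow_mut().clear());
        LocalScenario { _private: () }
    }

    /// Configure a fail point for the current thread only.
    pub fn cfg(&self, name: &str, actions: &str) {
        LOCAL_CFG.with(|cfg| {
            cfg.borrow_mut().insert(name.to_string(), actions.to_string());
        });
    }
}

impl Drop for LocalScenario {
    /// Teardown: remove everything this test configured.
    fn drop(&mut self) {
        // try_with avoids a panic if the thread-local is already destroyed.
        let _ = LOCAL_CFG.try_with(|cfg| cfg.borrow_mut().clear());
    }
}

/// Look up the actions configured for `name` on the current thread.
pub fn local_actions(name: &str) -> Option<String> {
    LOCAL_CFG.with(|cfg| cfg.borrow().get(name).cloned())
}
```

As the issue notes, a thread spawned inside the test sees an empty configuration, which is why this only helps single-threaded tests.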
Perhaps I'm doing something wrong, but I have code that looks very similar to the examples, and I can't get it to panic or otherwise respond to failpoints in the environment:
The full code is in https://github.com/sourcefrog/fail-repro
main.rs is:
    use fail::fail_point;

    fn main() {
        println!("Has failpoints: {}", fail::has_failpoints());
        println!(
            "FAILPOINTS is {:?}",
            std::env::var("FAILPOINTS").unwrap_or_default()
        );
        fail_point!("main");
        println!("Failpoint passed");
    }
When I run this:
$ FAILPOINTS=main=panic cargo +1.61 r --features fail/failpoints
Updating crates.io index
...
Running `target/debug/fail-repro`
Has failpoints: true
FAILPOINTS is "main=panic"
Failpoint passed
$ FAILPOINTS=main=print cargo +1.61 r --features fail/failpoints
Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/fail-repro`
Has failpoints: true
FAILPOINTS is "main=print"
Failpoint passed
In case this was broken by a later Cargo change, I tried it on both 1.76 and 1.63 and they both show the same behavior.
This is on x86_64 Linux.
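One thing worth checking, offered as an assumption about the cause rather than a confirmed diagnosis: as far as the fail crate's documentation describes it, the FAILPOINTS variable is only read when FailScenario::setup() is called, and the repro above never calls it. The sketch below does not use the crate. It only illustrates the name=actions;name=actions format with a hypothetical parser, to show that the variable does nothing until some code parses and applies it.

```rust
/// Hypothetical parser for a FAILPOINTS-style string such as
/// "main=panic;other=print". Returns (name, actions) pairs.
pub fn parse_failpoints(spec: &str) -> Result<Vec<(String, String)>, String> {
    let mut parsed = Vec::new();
    for entry in spec.split(';').map(str::trim).filter(|e| !e.is_empty()) {
        // Each entry must have the form name=actions.
        let (name, actions) = entry
            .split_once('=')
            .ok_or_else(|| format!("missing '=' in {entry:?}"))?;
        parsed.push((name.trim().to_string(), actions.trim().to_string()));
    }
    Ok(parsed)
}
```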
The API docs mention the no_fail feature, but that feature no longer exists. Instead, the API docs should mention, probably near the top, that failpoints are not active unless the failpoints feature is on, and that its presence can be checked (after #38) statically or dynamically with has_failpoints.
cc @lucab
Describe the bug
Cannot use full name qualification for the fail_point! macro in the three-argument case
To Reproduce
Just try to compile:
fail::fail_point!("fail-point-3", enable, |_| {});
And you'll get:
error: cannot find macro `fail_point` in this scope
--> my_code.rs:10
|
10 | fail::fail_point!("fail-point-3", enable, |_| {});
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: consider importing this macro:
fail::fail_point
= note: this error originates in the macro `fail::fail_point` (in Nightly builds, run with -Z macro-backtrace for more info)
Expected behavior
You should be able to use the macro without importing it with use
Additional context
Looks like the issue is here:
Line 841 in 6645f17
The recursive macro invocation should look like this:
$crate::fail_point!($name, $e);
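The effect of the $crate:: prefix can be reproduced with a small stand-in macro. demo_point is hypothetical and much simpler than fail_point!; it only mirrors the shape of the problem, where one arm of the macro invokes another arm recursively.

```rust
/// Hypothetical stand-in macro with a base arm and a conditional arm.
#[macro_export]
macro_rules! demo_point {
    // Base case: describe the point that was hit.
    ($name:expr) => {
        format!("hit {}", $name)
    };
    // Conditional case: recurse through `$crate::` so the expansion still
    // resolves when the caller never imported the macro by name.
    ($name:expr, $cond:expr) => {
        if $cond {
            Some($crate::demo_point!($name))
        } else {
            None
        }
    };
}

mod caller {
    // No `use` of the macro here: it is invoked by its full path only.
    pub fn run(enabled: bool) -> Option<String> {
        crate::demo_point!("fail-point-3", enabled)
    }
}
```

If the inner invocation were written as a bare demo_point!(...) and the macro lived in another crate, a caller using only the qualified path would hit the same "cannot find macro" error shown above.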
The crate docs are pretty overwhelming. Figure out how to defer some of that discussion to elsewhere in the docs.
I think we should consider defaulting to injecting the crate name in fail_point!. Otherwise it's just too likely to have clashes if this crate is used by library crates, for example.
This would need to happen at the next semver break.
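One way such a prefix could be built at compile time is sketched below. It uses module_path!() as a stand-in for the crate name, since the expansion of module_path!() starts with the crate name; scoped_name is a hypothetical macro for illustration and not part of fail-rs.

```rust
/// Hypothetical helper: prefixes a fail point name with the path of the
/// module that invokes the macro, producing a `&'static str`.
#[macro_export]
macro_rules! scoped_name {
    ($name:literal) => {
        concat!(module_path!(), "::", $name)
    };
}

/// Example call site: the result is "<module path>::write_failed".
pub fn example_name() -> &'static str {
    scoped_name!("write_failed")
}
```

Two libraries that both define a write_failed point would then register different names, at the cost of longer names in FAILPOINTS-style configuration.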
Is your feature request related to a problem? Please describe.
fail-rs uses a global registry to expose simple APIs and convenient fail point definitions. But it also means that tests which would otherwise run in parallel have to be run one by one, with cleanup between runs, so that their configurations do not affect each other.
Describe the solution you'd like
This issue proposes using thread groups. Each test case defines a unique thread group, and every configuration is bound to exactly one thread group. Every time a new thread is spawned, it needs to be registered to a thread group so that fail points on that thread read that group's configuration. If a thread is not registered to any group, it belongs to a default global group.
New public APIs include:
    pub fn current_fail_group() -> FailGroup;

    impl FailGroup {
        pub fn register_current(&self) -> Result<()>;
        pub fn deregister_current(&self);
    }
Note that this doesn't require users to control how threads are spawned; registering the thread before it reaches a fail point is enough.
Describe alternatives you've considered
One solution is to pass the registry to struct constructors, but this interferes heavily with the general code: it needs to be passed anywhere fail points are defined.
Another solution is #24, but it lacks support for multi-threaded cases.
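A rough sketch of how the proposed thread-group API might behave, built on the standard library only. The three function names follow the proposal; everything else (the cfg and actions helpers, the String error type, the internal layout) is an assumption made for this example.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// A group of threads sharing one fail point configuration.
#[derive(Clone, Default)]
pub struct FailGroup {
    actions: Arc<Mutex<HashMap<String, String>>>,
}

// Default group used by threads that never registered anywhere.
static GLOBAL_GROUP: OnceLock<FailGroup> = OnceLock::new();

thread_local! {
    // The group the current thread is registered to, if any.
    static REGISTERED: RefCell<Option<FailGroup>> = RefCell::new(None);
}

/// Returns the current thread's group, or the default global group.
pub fn current_fail_group() -> FailGroup {
    REGISTERED
        .with(|slot| slot.borrow().clone())
        .unwrap_or_else(|| GLOBAL_GROUP.get_or_init(FailGroup::default).clone())
}

impl FailGroup {
    pub fn new() -> Self {
        Self::default()
    }

    /// Hypothetical helper: configure a fail point inside this group.
    pub fn cfg(&self, name: &str, actions: &str) {
        self.actions
            .lock()
            .unwrap()
            .insert(name.to_string(), actions.to_string());
    }

    /// Hypothetical helper: read the actions configured for `name`.
    pub fn actions(&self, name: &str) -> Option<String> {
        self.actions.lock().unwrap().get(name).cloned()
    }

    /// Bind the current thread to this group; fails if already bound.
    pub fn register_current(&self) -> Result<(), String> {
        REGISTERED.with(|slot| {
            let mut slot = slot.borrow_mut();
            if slot.is_some() {
                return Err("thread is already registered to a group".to_string());
            }
            *slot = Some(self.clone());
            Ok(())
        })
    }

    /// Return the current thread to the default global group.
    pub fn deregister_current(&self) {
        REGISTERED.with(|slot| *slot.borrow_mut() = None);
    }
}
```

Each test would create its own FailGroup and register every thread it uses, so two tests running in parallel never observe each other's configuration.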
After TiKV itself is successfully upgraded (tikv/tikv#3896) we can bump fail to Rust 2018 as well. Do a major version bump.
Is your feature request related to a problem? Please describe.
In most of my failpoints I need to use the condition to enable a fail point, but I rarely use the return feature. Nevertheless, I'm forced to use the three-argument version of the macro, defining some return value that makes sense for my function.
Describe the solution you'd like
A two-argument fail_point macro taking the name and an enable flag, e.g.: fail_point!("my-fail-point", if: enableFlag)
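The requested call syntax is expressible in macro_rules. The sketch below uses a hypothetical cond_point! macro and a hypothetical record_hit function standing in for whatever the real fail point would do; it only demonstrates that the name, if: flag form parses and that the point is skipped when the flag is false.

```rust
use std::cell::RefCell;

thread_local! {
    // Names of the points hit on this thread (stand-in for real actions).
    static HITS: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

/// Hypothetical action: remember that the point was reached.
pub fn record_hit(name: &str) {
    HITS.with(|hits| hits.borrow_mut().push(name.to_string()));
}

/// Points recorded so far on the current thread.
pub fn hits() -> Vec<String> {
    HITS.with(|hits| hits.borrow().clone())
}

/// Hypothetical two-argument form: a name plus an enable condition,
/// with no return closure required.
macro_rules! cond_point {
    ($name:expr, if: $enabled:expr) => {
        if $enabled {
            record_hit($name);
        }
    };
}
```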
Describe the bug
From the README, the version has already been bumped to 0.3. But on https://crates.io/crates/fail, the version is still 0.2.1. I guess we need a tag for release 0.3?
When running failpoint unit tests, one must take a global lock so the failpoint configuration stays consistent during parallel execution. We do this in our own failpoints tests, and it's explained extensively in the fail docs. Since the library is significantly less useful without a global lock, we might add one directly to the library and use it in the tikv failpoints tests.
Just copy the pattern from tikv/tests/failpoints into this library, then test tikv against the new failpoints library. This can be done by temporarily replacing the fail dependency in Cargo.toml with a path dependency to the modified version of fail, then running cargo test --test failpoints.
If it all works, then submit the patch here.
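The global-lock pattern described here can be sketched as a guard type. TestScenario and scenario_in_progress are hypothetical names for this example and are not copied from tikv/tests/failpoints; the sketch shows only the locking, not any fail point configuration or cleanup.

```rust
use std::sync::{Mutex, MutexGuard, TryLockError};

// One process-wide lock shared by every failpoint test.
static SCENARIO_LOCK: Mutex<()> = Mutex::new(());

/// Hypothetical guard: holding it means this test owns the global
/// fail point configuration until the guard is dropped.
pub struct TestScenario {
    _guard: MutexGuard<'static, ()>,
}

impl TestScenario {
    /// Block until no other test holds the lock, then take it.
    pub fn setup() -> Self {
        // A test that panicked while holding the lock poisons it. The lock
        // protects no data, so it is safe to recover and continue.
        let guard = SCENARIO_LOCK
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner());
        TestScenario { _guard: guard }
    }
}

/// True while some test currently holds the scenario lock.
pub fn scenario_in_progress() -> bool {
    matches!(SCENARIO_LOCK.try_lock(), Err(TryLockError::WouldBlock))
}
```

Each test would start with let _scenario = TestScenario::setup(); so that tests touching the shared configuration run one at a time even under cargo's parallel test runner.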