nimiq / core-rs-albatross

Rust implementation of the Albatross protocol

Home Page: https://nimiq.com

License: Other

Dockerfile 0.06% Rust 97.36% Shell 0.24% Python 1.90% HTML 0.03% Jinja 0.16% JavaScript 0.14% TypeScript 0.10% Makefile 0.02%
blockchain rust

core-rs-albatross's People

Contributors

albermonte, antonleviathan, brantje, brunoffranca, burtonqin, cud4m, curdbecker, dependabot[bot], eligioo, faberto, fiaxh, frvfrvr, hrxi, jeffesquivels, jgraef, josefschabasser, jsdanielh, maestroi, mar-v-in, nibhar, onmax, paberr, redmaner, rex4539, riptl, sdschmidt, sisou, styppo, syvb, viquezclaudio

core-rs-albatross's Issues

Apparently not enough votes on macro blocks

The DevNet demo shows some macro blocks short a few votes to be actually valid. I don't think it's a verification issue, since the method PbftProof::votes is both used for verification of a macro block and for returning the number of votes in the RPC server. Maybe for the latter case the Validators object is not correctly derived from the CompressedList?

I remember implementing the conversion, where I followed @terorie's comments:

// CompressedList compresses a list of items by deduplication,
// using a bit vector to track duplicates.
// Compression algorithm:
//  - Group ranges of identical items, remembering the starting index of each group
//  - Insert all distinct items into the vector
//  - Insert all starting indexes into the BitSet (setting bit at index to one)
// Decompression algorithm:
//  - Iterate over the bits in the bitset (size is policy::SLOTS)
//  - If one bit, the next item is popped off the start of the vector
//  - If zero bit, the next item is the same as the previous item
impl<T> From<CompressedList<T>> for GroupedList<T>
    where T: Clone + Eq + PartialEq + Serialize + Deserialize
{
    fn from(mut compressed: CompressedList<T>) -> Self {
        let mut current: Option<T> = None;
        let mut groups = Vec::with_capacity(compressed.distinct.len());
        let mut distinct = compressed.distinct.drain(..);
        let mut n = 0;

        for i in 0 .. (compressed.count as usize) {
            if compressed.allocation.contains(i) {
                if let Some(x) = current {
                    groups.push(Group(n, x))
                }
                current = distinct.next();
                n = 0;
            }
            n += 1;
        }
        if let Some(x) = current {
            groups.push(Group(n, x))
        }

        GroupedList(groups)
    }
}

There is a test to cover this though: it_can_be_from_compressed_list.

I'll try to find a relevant block and test the conversion on it.
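
To check the conversion in isolation, here is a minimal self-contained sketch of the round trip described in the comment above. It is not the repository's code: `Compressed`, `compress`, `to_groups` and `total_slots` are hypothetical names, a plain `Vec<bool>` stands in for the BitSet, and the `count` field is replaced by the length of that vector.

```rust
/// A run of `count` consecutive slots that all hold the same `item`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Group<T>(pub u16, pub T);

/// Simplified stand-in for `CompressedList` (assumption): `allocation[i]`
/// is true when slot `i` starts a new group.
pub struct Compressed<T> {
    pub distinct: Vec<T>,
    pub allocation: Vec<bool>,
}

/// Compression: mark the first index of every run and store its item once.
pub fn compress<T: Clone + PartialEq>(slots: &[T]) -> Compressed<T> {
    let mut distinct = Vec::new();
    let mut allocation = vec![false; slots.len()];
    for (i, item) in slots.iter().enumerate() {
        if i == 0 || slots[i - 1] != *item {
            allocation[i] = true;
            distinct.push(item.clone());
        }
    }
    Compressed { distinct, allocation }
}

/// Decompression into groups, following the same loop shape as the
/// conversion quoted above.
pub fn to_groups<T>(compressed: Compressed<T>) -> Vec<Group<T>> {
    let mut groups = Vec::with_capacity(compressed.distinct.len());
    let mut distinct = compressed.distinct.into_iter();
    let mut current: Option<T> = None;
    let mut n: u16 = 0;
    for starts_group in compressed.allocation {
        if starts_group {
            // Close the previous run before opening the next one.
            if let Some(x) = current.take() {
                groups.push(Group(n, x));
            }
            current = distinct.next();
            n = 0;
        }
        n += 1;
    }
    if let Some(x) = current {
        groups.push(Group(n, x));
    }
    groups
}

/// Sum of all group sizes; this should equal the number of slots.
pub fn total_slots<T>(groups: &[Group<T>]) -> usize {
    groups.iter().map(|g| g.0 as usize).sum()
}
```

If the group sizes of a real block do not add up to the number of slots, the vote count derived from them would come out too low, which is the symptom described here.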

Punishing validators building on top of invalid blocks

@mar-v-in had the idea to punish validators that built on invalid blocks.
His idea only works if the invalid block was produced by the correct block producer for this slot.

As explained in #31, we can immediately start a view change in this case, and that view change could be annotated with the information that it was caused by an invalid block.
Punishing validators that built on top of the invalid block then works similarly to fork proofs.
One simply needs the header and signature, which includes the previous block hash.
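
A rough sketch of what checking such a proof could look like. Everything here is hypothetical (`Header`, `BuiltOnInvalidProof`, `is_punishable`), and signature verification is passed in as a closure because the real key handling is outside the scope of this issue.

```rust
/// Hypothetical, stripped-down header with only the fields the check needs.
pub struct Header {
    pub block_number: u32,
    pub parent_hash: [u8; 32],
}

/// Hypothetical proof that `signer` built on top of a known-invalid block:
/// just the header and the signature over it.
pub struct BuiltOnInvalidProof {
    pub header: Header,
    pub signature: Vec<u8>,
    pub signer: u16,
}

/// Returns true if the proof shows that the signer extended the invalid block.
/// `verify` is an assumed callback that checks the signature for the signer.
pub fn is_punishable<F>(
    proof: &BuiltOnInvalidProof,
    invalid_hash: &[u8; 32],
    invalid_number: u32,
    verify: F,
) -> bool
where
    F: Fn(u16, &Header, &[u8]) -> bool,
{
    // The signed header must name the invalid block as its parent ...
    proof.header.parent_hash == *invalid_hash
        // ... and be its direct successor ...
        && Some(proof.header.block_number) == invalid_number.checked_add(1)
        // ... and the signature must really come from the accused validator.
        && verify(proof.signer, &proof.header, &proof.signature)
}
```

The cheap hash and height comparisons run first, so the signature is only verified for proofs that actually point at the invalid block.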

Failed to compute transactions root, micro blocks missing

Stack trace

logs/devnet_validator_13.log- INFO  consensus            | Now at block #38912
logs/devnet_validator_13.log- WARN  staking_contract     | Slashing NQ26 349P FBHS 4D6Y JR21 164P Q709 NQKP F12D with 6.49077
logs/devnet_validator_13.log: ERROR panic                | thread 'tokio-runtime-worker-1' panicked at 'Failed to compute transactions root, micro blocks missing': src/libcore/option.rs:1185
logs/devnet_validator_13.log-stack backtrace:
logs/devnet_validator_13.log-   0: log_panics::init::{{closure}}
logs/devnet_validator_13.log-   1: std::panicking::rust_panic_with_hook
logs/devnet_validator_13.log-             at src/libstd/panicking.rs:468
logs/devnet_validator_13.log-   2: std::panicking::continue_panic_fmt
logs/devnet_validator_13.log-             at src/libstd/panicking.rs:373
logs/devnet_validator_13.log-   3: rust_begin_unwind
logs/devnet_validator_13.log-             at src/libstd/panicking.rs:302
logs/devnet_validator_13.log-   4: core::panicking::panic_fmt
logs/devnet_validator_13.log-             at src/libcore/panicking.rs:139
logs/devnet_validator_13.log-   5: core::option::expect_failed
logs/devnet_validator_13.log-             at src/libcore/option.rs:1185
logs/devnet_validator_13.log-   6: nimiq_block_production_albatross::BlockProducer::next_macro_header
logs/devnet_validator_13.log-   7: nimiq_block_production_albatross::BlockProducer::next_macro_block_proposal
logs/devnet_validator_13.log-   8: nimiq_validator::validator::Validator::produce_macro_block
logs/devnet_validator_13.log-   9: <futures::future::lazy::Lazy<F,R> as futures::future::Future>::poll
logs/devnet_validator_13.log-  10: futures::task_impl::Spawn<T>::poll_future_notify
logs/devnet_validator_13.log-  11: std::panicking::try::do_call
logs/devnet_validator_13.log-  12: __rust_maybe_catch_panic
logs/devnet_validator_13.log-             at src/libpanic_unwind/lib.rs:79
logs/devnet_validator_11.log- INFO  consensus            | Now at block #30014
logs/devnet_validator_11.log- INFO  consensus            | Now at block #30015

Invalid blockchain state after network restart

I just restarted the DevNet. The nodes synced up with albatross-seed2. After finishing the sync, all nodes get disconnected:

Rejecting block - commit failed: AccountsError(InvalidCoinValue)
 INFO  consensus            | Disconnected from ws://5.0.0.34:8443/36796f558ea20fc9430341a07f208094
 INFO  validator_network    | Validator left: 36796f558ea20fc9430341a07f208094

Before that there is also this in albatross-seed2:

 INFO  validator            | Produced block #13577.1: 421320fe7e0fa7ca9103e1edd74bb294d81073368081bdb82546d908a35fafd2
 WARN  blockchain           | Rejecting block - lower view number 1 < 2
 WARN  blockchain           | Rejecting block - Bad header / justification
 ERROR validator            | Failed to push produced micro block to blockchain: InvalidBlock(InvalidViewNumber)
 INFO  consensus            | Now at block #13600

Is the validator code perhaps producing blocks even though the node is not synced up yet? The node is producing block #13577.1, but right afterwards we're already at #13600. I don't think this error is fatal, since the producer itself immediately rejects the block. But it might break the node's blockchain state, so that other peers can't connect.

Logs of all nodes with log_level=debug from after the restart: invalid_coin_value.tar.gz
There is no panic or deadlock in these logs.

Incorrect debug message for old view change

When we receive a view change for an old epoch, we print a debug message, but the current block number is incorrect. It currently shows the block number of the blockchain head, instead of the next block number.

It'll also spam debug messages for every old view change proof it receives. This is because it'll just check the signature instead of first checking if the view change proof is actually for this epoch.
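
A minimal sketch of the ordering suggested here: compare against the next block number first and only verify the signature for messages that are still relevant. The types and the staleness rule (a message is old if its block/view pair is not ahead of the current one) are assumptions, not the validator's actual logic.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    IgnoredOld,
    InvalidSignature,
    Accepted,
}

/// Hypothetical view change message: target block and the view to change to.
#[derive(Debug, Clone, Copy)]
pub struct ViewChange {
    pub block_number: u32,
    pub new_view_number: u32,
}

/// `head_block_number` is the current blockchain head; view changes always
/// refer to the *next* block, so that is what we compare (and log) against.
pub fn handle_view_change<F>(
    head_block_number: u32,
    current_view: u32,
    msg: ViewChange,
    verify_signature: F,
) -> Outcome
where
    F: FnOnce(&ViewChange) -> bool,
{
    let next_block = head_block_number.saturating_add(1);
    // Cheap relevance check first: drop anything not ahead of our position.
    if (msg.block_number, msg.new_view_number) <= (next_block, current_view) {
        return Outcome::IgnoredOld;
    }
    // Only now pay for the signature check.
    if !verify_signature(&msg) {
        return Outcome::InvalidSignature;
    }
    Outcome::Accepted
}
```

With this order an old proof never reaches signature verification, so it cannot produce `InvalidSignature` debug lines.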

Log

 DEBUG validator_agent      | [VIEW-CHANGE] Ignoring view change message for an old epoch: current=#15872/125, change_to=#15872/124
 DEBUG validator_agent      | [VIEW-CHANGE] Ignoring view change message for an old epoch: current=#15872/125, change_to=#15872/124
 DEBUG validator_agent      | [VIEW-CHANGE] Ignoring view change message for an old epoch: current=#15872/125, change_to=#15872/124
 DEBUG validator_agent      | [VIEW-CHANGE] Ignoring view change message for an old epoch: current=#15872/125, change_to=#15872/124
 DEBUG validator_agent      | [VIEW-CHANGE] Ignoring view change message for an old epoch: current=#15872/125, change_to=#15872/124
 DEBUG validator_agent      | [VIEW-CHANGE] Ignoring view change message for an old epoch: current=#15872/125, change_to=#15872/124
 DEBUG validator_agent      | [VIEW-CHANGE] Ignoring view change message for an old epoch: current=#15872/125, change_to=#15872/124
 DEBUG validator_network    | Invalid view change proof: InvalidSignature
 INFO  validator_network    | Received view change proof for: #15872.1
 DEBUG validator            | Completed view change to #15872.1
 DEBUG validator_network    | Invalid view change proof: InvalidSignature
 DEBUG validator_network    | Invalid view change proof: InvalidSignature
 DEBUG validator_network    | Invalid view change proof: InvalidSignature
 DEBUG validator_network    | Invalid view change proof: InvalidSignature
 DEBUG validator_agent      | [VIEW-CHANGE] Ignoring view change message for an old epoch: current=#15872/125, change_to=#15872/124

Possible manipulation of validator selection

The current algorithm for validator selection uses a max_considered to cap the amount of computation needed to get the set of active validators.

This opens up the possibility of manipulating who is considered.

We could efficiently compute the active validator set by keeping the SegmentTree in the StakingContract and then only update it when there is a new staking transaction.

Additionally, to discourage splitting up stakes or spamming staking transactions (and thus bloating the SegmentTree) we could add a proof of work to the staking transaction.
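
To illustrate the idea of keeping the tree up to date incrementally, here is a small sum tree sketch. `StakeTree` is a hypothetical name and not the existing SegmentTree type; it only shows that a staking transaction costs one O(log n) update and that picking a staker for a random point is one O(log n) descent, without any cap on the number of stakers considered.

```rust
/// Binary sum tree over stakes: leaves hold individual stakes, inner nodes
/// hold the sum of their two children. Node 1 is the root.
pub struct StakeTree {
    size: usize,
    sums: Vec<u64>,
}

impl StakeTree {
    pub fn new(capacity: usize) -> Self {
        let size = capacity.max(1).next_power_of_two();
        StakeTree { size, sums: vec![0; 2 * size] }
    }

    /// Set the stake of staker `index` and refresh the sums on the path
    /// up to the root. Called once per staking transaction.
    pub fn set(&mut self, index: usize, stake: u64) {
        assert!(index < self.size, "index out of range");
        let mut node = self.size + index;
        self.sums[node] = stake;
        while node > 1 {
            node /= 2;
            self.sums[node] = self.sums[2 * node] + self.sums[2 * node + 1];
        }
    }

    pub fn total(&self) -> u64 {
        self.sums[1]
    }

    /// Find the staker whose stake interval contains `point`,
    /// where 0 <= point < total(). Returns None if `point` is out of range.
    pub fn find(&self, mut point: u64) -> Option<usize> {
        if point >= self.total() {
            return None;
        }
        let mut node = 1;
        while node < self.size {
            let left = 2 * node;
            if point < self.sums[left] {
                node = left;
            } else {
                point -= self.sums[left];
                node = left + 1;
            }
        }
        Some(node - self.size)
    }
}
```

Drawing a slot owner then amounts to `tree.find(random % tree.total())`, with the random value coming from whatever seed the protocol uses.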

Deadlock in Aggregation::start_level

devnet_validator_14.log- INFO  validator_network    | Received view change proof for: #7552.637
devnet_validator_14.log- INFO  validator            | Starting view change to #7552.638
devnet_validator_14.log- INFO  validator_network    | Received view change proof for: #7552.638
devnet_validator_14.log- INFO  validator            | Starting view change to #7552.639
devnet_validator_14.log- INFO  validator_network    | Received view change proof for: #7552.639
devnet_validator_14.log: ERROR deadlock             | 1 deadlocks detected
devnet_validator_14.log: ERROR deadlock             | Deadlock #0
devnet_validator_14.log: ERROR deadlock             | Thread Id 139870780675840
devnet_validator_14.log: ERROR deadlock             | stack backtrace:
devnet_validator_14.log-   0:     0x56274629dc57 - backtrace::backtrace::trace::h23dbc4c26a81c3f9
devnet_validator_14.log-   1:     0x56274629cdc3 - backtrace::capture::Backtrace::new::h999dbb7d13f7c4e8
devnet_validator_14.log:   2:     0x56274628728d - parking_lot_core::parking_lot::deadlock_impl::on_unpark::h7bcf4e70001f6819
devnet_validator_14.log-   3:     0x56274628e631 - parking_lot_core::parking_lot::park_internal::hc6dcbe0872ee3623
devnet_validator_14.log-   4:     0x562746286057 - parking_lot::raw_rwlock::RawRwLock::lock_shared_slow::hc3763480dfed0194
devnet_validator_14.log-   5:     0x562745ddf290 - nimiq_handel::aggregation::Aggregation<P>::start_level::h9df986e411f7e937
devnet_validator_14.log-   6:     0x562745de08a2 - nimiq_handel::aggregation::Aggregation<P>::check_completed_level::h5e7afa92e01f4325
devnet_validator_14.log-   7:     0x562745ddfaef - nimiq_handel::aggregation::Aggregation<P>::push_contribution::h7e27ee902d190f46
devnet_validator_14.log-   8:     0x562745e13c26 - nimiq_validator::signature_aggregation::pbft::PbftAggregation::push_signed_prepare::h52429bd96e42b2c5
devnet_validator_14.log-   9:     0x562745e0ffde - nimiq_validator::validator_network::ValidatorNetwork::push_prepare::h3910161a6044fe83
devnet_validator_14.log-  10:     0x562745dbdd28 - nimiq_validator::validator::Validator::on_pbft_proposal::he3aa15529dae46d6
devnet_validator_14.log-  11:     0x562745db83a2 - <F as nimiq_utils::observer::PassThroughListener<E>>::on_event::h16b7138d1938fbac
devnet_validator_14.log-  12:     0x562745e0e739 - nimiq_validator::validator_network::ValidatorNetwork::on_pbft_proposal::h362c9d49e774abd8
devnet_validator_14.log-  13:     0x562745e1ff2a - <F as nimiq_utils::observer::PassThroughListener<E>>::on_event::h67eeb1f5d8a9685d
devnet_validator_14.log-  14:     0x562745e1256a - nimiq_validator::validator_agent::ValidatorAgent::on_pbft_proposal_message::h8afd9679218b7feb
devnet_validator_14.log-  15:     0x562745e2114f - <F as nimiq_utils::observer::PassThroughListener<E>>::on_event::hdd65bf97c4539c19
devnet_validator_14.log-  16:     0x5627460a6f9d - nimiq_messages::MessageNotifier::notify::he8f10588941b6573
devnet_validator_14.log-  17:     0x562745ea549d - <F as nimiq_utils::observer::PassThroughListener<E>>::on_event::h19808fbd19898a36
devnet_validator_14.log-  18:     0x562745e5bba2 - nimiq_utils::observer::PassThroughNotifier<E>::notify::h174a290f62415cf7
devnet_validator_14.log-  19:     0x562745e7fedd - <futures::stream::for_each::ForEach<S,F,U> as futures::future::Future>::poll::h72da4b09c79f6579
devnet_validator_14.log-  20:     0x562745e5854c - futures::future::chain::Chain<A,B,C>::poll::he9fb69a3e7fc3bac
devnet_validator_14.log-  21:     0x562745e7efec - <futures::future::select::Select<A,B> as futures::future::Future>::poll::h0947b8a821e86a50
devnet_validator_14.log-  22:     0x562745ea66e1 - <futures::future::map::Map<A,F> as futures::future::Future>::poll::h59c38b1c736fc607
devnet_validator_14.log-  23:     0x562745e3e36a - <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll::h0e3a577500db6572
devnet_validator_14.log-  24:     0x562746202bcd - futures::task_impl::Spawn<T>::poll_future_notify::h7bb4085b14d49f6e
devnet_validator_14.log-  25:     0x562746202861 - std::panicking::try::do_call::h09f73c275b109d8e
devnet_validator_14.log-  26:     0x5627462d274a - __rust_maybe_catch_panic
devnet_validator_14.log-                               at src/libpanic_unwind/lib.rs:83
devnet_validator_14.log-  27:     0x562746200c49 - tokio_threadpool::task::Task::run::hd68f1083bc521c34
devnet_validator_14.log-  28:     0x5627461fcf0a - tokio_threadpool::worker::Worker::run_task::head025ba2e2fc00e
devnet_validator_14.log-  29:     0x5627461fc565 - tokio_threadpool::worker::Worker::run::hc9fda95a92e3b29b
devnet_validator_14.log-  30:     0x5627461e2d00 - std::thread::local::LocalKey<T>::with::h126572f9ec5b1b7c
devnet_validator_14.log-  31:     0x5627461e2e48 - std::thread::local::LocalKey<T>::with::h509921bfab7ed9b8
devnet_validator_14.log-  32:     0x5627461e283c - tokio_reactor::with_default::h5e2c5e64d938a001
devnet_validator_14.log-  33:     0x5627461dd7ff - tokio::runtime::threadpool::builder::Builder::build::{{closure}}::hc83a6ada0dfef39c
devnet_validator_14.log-  34:     0x5627462005aa - std::thread::local::LocalKey<T>::with::heb678555caec55f6
devnet_validator_14.log-  35:     0x562746200399 - std::thread::local::LocalKey<T>::with::h91a9994ef809d26b
devnet_validator_14.log-  36:     0x5627461f9c18 - std::sys_common::backtrace::__rust_begin_short_backtrace::h85f58666e938a950
devnet_validator_14.log-  37:     0x5627462028bc - std::panicking::try::do_call::h211c0bd537af46ef
devnet_validator_14.log-  38:     0x5627462d274a - __rust_maybe_catch_panic
devnet_validator_14.log-                               at src/libpanic_unwind/lib.rs:83
devnet_validator_14.log-  39:     0x5627461fa760 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h904b95757ee0c525
devnet_validator_14.log-  40:     0x5627462b929f - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h7e48ca79c501d374
devnet_validator_14.log-                               at /rustc/aa4e57ca8f18b836bf77923cd0d9ad1390f0110b/src/liballoc/boxed.rs:942
devnet_validator_14.log-  41:     0x5627462d18f0 - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h7f80bcfd5ad6a8ff
devnet_validator_14.log-                               at /rustc/aa4e57ca8f18b836bf77923cd0d9ad1390f0110b/src/liballoc/boxed.rs:942
devnet_validator_14.log-                           std::sys_common::thread::start_thread::h2f29bc8a5ef472d5
devnet_validator_14.log-                               at src/libstd/sys_common/thread.rs:13
devnet_validator_14.log-                           std::sys::unix::thread::Thread::new::thread_start::h2b3e9caa754ff917
devnet_validator_14.log-                               at src/libstd/sys/unix/thread.rs:79
devnet_validator_14.log-  42:     0x7f363544a182 - start_thread
devnet_validator_14.log-  43:     0x7f3635357b1f - __clone
devnet_validator_14.log-  44:                0x0 - <unknown>
devnet_validator_14.log-

Circular dead-lock in Validator

logs/devnet_validator_32.log: ERROR deadlock             | 1 deadlocks detected
logs/devnet_validator_32.log: ERROR deadlock             | Deadlock #0
logs/devnet_validator_32.log: ERROR deadlock             | Thread Id 140681859139328
logs/devnet_validator_32.log: ERROR deadlock             | stack backtrace:
logs/devnet_validator_32.log-   0:     0x556215725df7 - backtrace::backtrace::trace::he1bd0e735dbbc667
logs/devnet_validator_32.log-   1:     0x556215724f63 - backtrace::capture::Backtrace::new::hd1afaac098ef9e52
logs/devnet_validator_32.log:   2:     0x55621570f42d - parking_lot_core::parking_lot::deadlock_impl::on_unpark::hebd2fcc23c174c94
logs/devnet_validator_32.log-   3:     0x5562157167d1 - parking_lot_core::parking_lot::park_internal::h6202e8ed10c8808f
logs/devnet_validator_32.log-   4:     0x55621570de13 - parking_lot::raw_rwlock::RawRwLock::lock_exclusive_slow::h88ffb00ce5a26ddc
logs/devnet_validator_32.log-   5:     0x556215248db9 - nimiq_validator::validator::Validator::init_epoch::h994add5d77f823bc
logs/devnet_validator_32.log-   6:     0x556215243a63 - <F as nimiq_utils::observer::Listener<E>>::on_event::h56bd2759a88ec87d
logs/devnet_validator_32.log-   7:     0x556215574b1b - nimiq_utils::observer::Notifier<E>::notify::h258267e0e524901a
logs/devnet_validator_32.log-   8:     0x55621554d92a - nimiq_blockchain_albatross::blockchain::Blockchain::push_block::h37853048d9ce43ab
logs/devnet_validator_32.log-   9:     0x556215555f4c - <nimiq_blockchain_albatross::blockchain::Blockchain as nimiq_blockchain_base::AbstractBlockchain>::push::h32b106710917c72a
logs/devnet_validator_32.log-  10:     0x556214fb73d2 - nimiq_consensus::inventory::InventoryAgent<B,MA>::on_block::h7e0e116205208d7b
logs/devnet_validator_32.log-  11:     0x55621502a0ff - <F as nimiq_utils::observer::PassThroughListener<E>>::on_event::habbbf6892186b07b
logs/devnet_validator_32.log-  12:     0x55621553532c - nimiq_utils::observer::PassThroughNotifier<E>::notify::ha4112aa6b5ee2f22
logs/devnet_validator_32.log-  13:     0x55621552df0c - nimiq_messages::MessageNotifier::notify::h18f3153878fc1387
logs/devnet_validator_32.log-  14:     0x55621532cead - <F as nimiq_utils::observer::PassThroughListener<E>>::on_event::h926d5a09c1f63f01
logs/devnet_validator_32.log-  15:     0x5562152e35b2 - nimiq_utils::observer::PassThroughNotifier<E>::notify::h16c0ed17809685f4
logs/devnet_validator_32.log-  16:     0x5562153082fd - <futures::stream::for_each::ForEach<S,F,U> as futures::future::Future>::poll::h9aaf65e4d097f7e6
logs/devnet_validator_32.log-  17:     0x5562152d95ac - futures::future::chain::Chain<A,B,C>::poll::h0b738fc7e6b2bc47
logs/devnet_validator_32.log-  18:     0x556215306a1c - <futures::future::select::Select<A,B> as futures::future::Future>::poll::h6706b8de82ace6b5
logs/devnet_validator_32.log-  19:     0x55621532e181 - <futures::future::map::Map<A,F> as futures::future::Future>::poll::h7c225d6f02abad06
logs/devnet_validator_32.log-  20:     0x5562152c5e0a - <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll::h621d65ac7587dbe7
logs/devnet_validator_32.log-  21:     0x55621568aefd - futures::task_impl::Spawn<T>::poll_future_notify::ha31b037a092a2736
logs/devnet_validator_32.log-  22:     0x55621568aa01 - std::panicking::try::do_call::h185dec94ef9a1086
--
logs/devnet_validator_32.log: ERROR deadlock             | Thread Id 140681872049920
logs/devnet_validator_32.log: ERROR deadlock             | stack backtrace:
logs/devnet_validator_32.log-   0:     0x556215725df7 - backtrace::backtrace::trace::he1bd0e735dbbc667
logs/devnet_validator_32.log-   1:     0x556215724f63 - backtrace::capture::Backtrace::new::hd1afaac098ef9e52
logs/devnet_validator_32.log:   2:     0x55621570f42d - parking_lot_core::parking_lot::deadlock_impl::on_unpark::hebd2fcc23c174c94
logs/devnet_validator_32.log-   3:     0x5562157167d1 - parking_lot_core::parking_lot::park_internal::h6202e8ed10c8808f
logs/devnet_validator_32.log-   4:     0x55621570d404 - parking_lot::raw_mutex::RawMutex::lock_slow::h30f2c0b535d56ab0
logs/devnet_validator_32.log-   5:     0x5562155560c7 - <nimiq_blockchain_albatross::blockchain::Blockchain as nimiq_blockchain_base::AbstractBlockchain>::lock::h6788afa82f31c091
logs/devnet_validator_32.log-   6:     0x5562152b1d62 - nimiq_block_production_albatross::BlockProducer::next_macro_block_proposal::hc17c254fde431839
logs/devnet_validator_32.log-   7:     0x55621524aad5 - nimiq_validator::validator::Validator::produce_macro_block::h1688236826015392
logs/devnet_validator_32.log-   8:     0x55621528e16a - <futures::future::lazy::Lazy<F,R> as futures::future::Future>::poll::h795c6655b030e207
logs/devnet_validator_32.log-   9:     0x55621568aefd - futures::task_impl::Spawn<T>::poll_future_notify::ha31b037a092a2736
logs/devnet_validator_32.log-  10:     0x55621568aa01 - std::panicking::try::do_call::h185dec94ef9a1086
logs/devnet_validator_32.log-  11:     0x55621575a8ea - __rust_maybe_catch_panic
logs/devnet_validator_32.log-                               at src/libpanic_unwind/lib.rs:83
logs/devnet_validator_32.log-  12:     0x556215688de9 - tokio_threadpool::task::Task::run::h7910cafcf67341a7
logs/devnet_validator_32.log-  13:     0x5562156850aa - tokio_threadpool::worker::Worker::run_task::he831297aeedb2153
logs/devnet_validator_32.log-  14:     0x556215684530 - tokio_threadpool::worker::Worker::run::h3b776189020f3bd5
logs/devnet_validator_32.log-  15:     0x55621566aca0 - std::thread::local::LocalKey<T>::with::hefc133a8c4e93cae
logs/devnet_validator_32.log-  16:     0x55621566ab78 - std::thread::local::LocalKey<T>::with::hd985fae9b1c1d1d7
logs/devnet_validator_32.log-  17:     0x55621566a69c - tokio_reactor::with_default::hb0e802b210729b6f
logs/devnet_validator_32.log-  18:     0x55621566ca5f - tokio::runtime::threadpool::builder::Builder::build::{{closure}}::hbc5a0d67213046cd
logs/devnet_validator_32.log-  19:     0x55621568874a - std::thread::local::LocalKey<T>::with::hd069e4a876a9abd1
logs/devnet_validator_32.log-  20:     0x5562156881e9 - std::thread::local::LocalKey<T>::with::h95e390417fdeca45
logs/devnet_validator_32.log-  21:     0x556215681db8 - std::sys_common::backtrace::__rust_begin_short_backtrace::hce13dccd4f5cc0f6

Thread behavior

  • Thread 1:
    • Blockchain::push_block locks blockchain mutex
    • Blockchain::extend write-locks blockchain state, but drops it again
    • Validator::init_epoch write-locks validator state
    • Validator::get_pk_idx_and_slots read-locks blockchain state
  • Thread 2:
    • Validator::produce_macro_block write-locks validator state
    • BlockProducer::next_macro_block_proposal locks blockchain mutex

So the two threads dead-lock on blockchain's push_lock Mutex and validator's state RwLock.
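
One common way out is to make both paths take the locks in the same order. The sketch below uses `std::sync` instead of parking_lot and made-up names (`Shared`, `run_concurrently`); it only demonstrates the ordering rule, not the actual validator code.

```rust
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Stand-ins for the two locks involved: the blockchain's push lock and
/// the validator's state lock.
pub struct Shared {
    pub push_lock: Mutex<()>,
    pub validator_state: RwLock<u32>,
}

/// Path of thread 1: already holds the push lock when it reaches the
/// validator state.
pub fn init_epoch(shared: &Shared) {
    let _push = shared.push_lock.lock().unwrap();
    let mut state = shared.validator_state.write().unwrap();
    *state += 1;
}

/// Path of thread 2, reordered: take the push lock first and the validator
/// state second, the same order as `init_epoch`.
pub fn produce_macro_block(shared: &Shared) {
    let _push = shared.push_lock.lock().unwrap();
    let mut state = shared.validator_state.write().unwrap();
    *state += 1;
}

/// Run both paths concurrently; with a consistent lock order this always
/// terminates and returns the number of state updates.
pub fn run_concurrently(rounds: u32) -> u32 {
    let shared = Arc::new(Shared {
        push_lock: Mutex::new(()),
        validator_state: RwLock::new(0),
    });
    let a = {
        let s = Arc::clone(&shared);
        thread::spawn(move || (0..rounds).for_each(|_| init_epoch(&s)))
    };
    let b = {
        let s = Arc::clone(&shared);
        thread::spawn(move || (0..rounds).for_each(|_| produce_macro_block(&s)))
    };
    a.join().unwrap();
    b.join().unwrap();
    let total = *shared.validator_state.read().unwrap();
    total
}
```

The alternative would be to release the validator state lock before calling into the block producer, which avoids holding both locks at once.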

Guaranteeing progression of the chain

Once all slot owners have been slashed and thus deactivated, we should select a random block producer from them to immediately produce a macro block. Otherwise the blockchain will stop, because there are no candidates for block producers.
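
A sketch of the fallback rule, with hypothetical types and a plain `seed % len` pick standing in for the real randomness source:

```rust
/// Hypothetical slot owner: an identifier plus whether it has been slashed.
#[derive(Debug, PartialEq, Eq)]
pub struct SlotOwner {
    pub id: u16,
    pub slashed: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum NextBlock {
    /// Normal case: an active owner produces the next micro block.
    Micro(u16),
    /// Fallback: everyone is slashed, so one of them closes the epoch
    /// with a macro block right away.
    ForcedMacro(u16),
}

pub fn next_producer(owners: &[SlotOwner], seed: u64) -> Option<NextBlock> {
    if owners.is_empty() {
        return None;
    }
    let active: Vec<&SlotOwner> = owners.iter().filter(|o| !o.slashed).collect();
    if active.is_empty() {
        // No candidates left: pick among the slashed owners instead of halting.
        let pick = &owners[(seed % owners.len() as u64) as usize];
        Some(NextBlock::ForcedMacro(pick.id))
    } else {
        let pick = active[(seed % active.len() as u64) as usize];
        Some(NextBlock::Micro(pick.id))
    }
}
```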

Failed to collect receipts during block production: InvalidForTarget

This panic seems to have occurred on all nodes. I will download the full logs and sync up the explorer, so I can inspect the blockchain.

Log

today at 4:55 PM   DEBUG validator_network    | View change already complete: #63980.38
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ83 87U0 P3TR XVDT M2VV V5RL RMKX BDRE FYFC
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ58 8FPT L6FS JQE5 7MCX 438L HNBU JR5Q A5S6
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ49 FNT2 3A0E K8P5 VQX3 AS5D PUVY 5TLQ XUE0
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ19 DFJQ UD8Y J0B7 BSQD AM8L 05YH SVB5 JRN5
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ76 3Q58 AKXH 69K3 YHPE 90JU JPYQ S9G6 FB7H
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ58 8FPT L6FS JQE5 7MCX 438L HNBU JR5Q A5S6
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ76 PX5M PUKD 3J3B YHUJ UK2D J0Q7 7GFX A9EF
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ76 3Q58 AKXH 69K3 YHPE 90JU JPYQ S9G6 FB7H
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ79 86D0 BS19 QJ90 55JQ P2PP RSE8 4SHG 22XB
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ98 VSM6 CTKE 108F AVHA VLXF M9F1 L6YU GP92
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ92 NRQS 1QE9 TFRQ FMVF S65A 9GU5 2P83 SCDL
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ76 3Q58 AKXH 69K3 YHPE 90JU JPYQ S9G6 FB7H
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ91 LGLK TVNM 3RSE GKQD 7N34 G7S9 G70H GC56
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ79 86D0 BS19 QJ90 55JQ P2PP RSE8 4SHG 22XB
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ98 VSM6 CTKE 108F AVHA VLXF M9F1 L6YU GP92
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ79 86D0 BS19 QJ90 55JQ P2PP RSE8 4SHG 22XB
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ81 Q8MC SKLJ P1JQ JXUA AF54 MTAL 8X0E S14Q
today at 4:55 PM   DEBUG blockchain           | Slash inherent: view change: 2.90041 NIM, NQ19 DFJQ UD8Y J0B7 BSQD AM8L 05YH SVB5 JRN5
today at 4:55 PM   DEBUG validator_network    | View change already complete: #63980.38
today at 4:55 PM   ERROR panic                | thread 'tokio-runtime-worker-7' panicked at 'Failed to collect receipts during block production: InvalidForTarget': src/libcore/result.rs:1165
today at 4:55 PM  stack backtrace:
today at 4:55 PM     0: log_panics::init::{{closure}}
today at 4:55 PM     1: std::panicking::rust_panic_with_hook
today at 4:55 PM               at src/libstd/panicking.rs:468
today at 4:55 PM     2: std::panicking::continue_panic_fmt
today at 4:55 PM               at src/libstd/panicking.rs:373
today at 4:55 PM     3: rust_begin_unwind
today at 4:55 PM               at src/libstd/panicking.rs:302
today at 4:55 PM     4: core::panicking::panic_fmt
today at 4:55 PM               at src/libcore/panicking.rs:141
today at 4:55 PM     5: core::result::unwrap_failed
today at 4:55 PM               at src/libcore/result.rs:1165
today at 4:55 PM     6: nimiq_block_production_albatross::BlockProducer::next_micro_block
today at 4:55 PM     7: nimiq_validator::validator::Validator::produce_micro_block
today at 4:55 PM     8: <futures::future::lazy::Lazy<F,R> as futures::future::Future>::poll
today at 4:55 PM     9: futures::task_impl::Spawn<T>::poll_future_notify
today at 4:55 PM    10: std::panicking::try::do_call
today at 4:55 PM    11: __rust_maybe_catch_panic
today at 4:55 PM               at src/libpanic_unwind/lib.rs:79
today at 4:55 PM    12: tokio_threadpool::task::Task::run
today at 4:55 PM    13: tokio_threadpool::worker::Worker::run_task
today at 4:55 PM    14: tokio_threadpool::worker::Worker::run
today at 4:55 PM    15: std::thread::local::LocalKey<T>::with
today at 4:55 PM    16: std::thread::local::LocalKey<T>::with
today at 4:55 PM    17: tokio_reactor::with_default
today at 4:55 PM    18: tokio::runtime::threadpool::builder::Builder::build::{{closure}}
today at 4:55 PM    19: std::thread::local::LocalKey<T>::with
today at 4:55 PM    20: std::thread::local::LocalKey<T>::with
today at 4:55 PM    21: std::sys_common::backtrace::__rust_begin_short_backtrace
today at 4:55 PM    22: std::panicking::try::do_call
today at 4:55 PM    23: __rust_maybe_catch_panic
today at 4:55 PM               at src/libpanic_unwind/lib.rs:79
today at 4:55 PM    24: core::ops::function::FnOnce::call_once{{vtable.shim}}
today at 4:55 PM    25: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
today at 4:55 PM               at /rustc/ac162c6abe34cdf965afc0389f6cefa79653c63b/src/liballoc/boxed.rs:942
today at 4:55 PM    26: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
today at 4:55 PM               at /rustc/ac162c6abe34cdf965afc0389f6cefa79653c63b/src/liballoc/boxed.rs:942
today at 4:55 PM        std::sys_common::thread::start_thread
today at 4:55 PM               at src/libstd/sys_common/thread.rs:13
today at 4:55 PM        std::sys::unix::thread::Thread::new::thread_start
today at 4:55 PM               at src/libstd/sys/unix/thread.rs:79
today at 4:55 PM    27: start_thread
today at 4:55 PM    28: __clone

Implement `serde::{Serialize, Deserialize}` for primitives

Implementing serde's Serialize and Deserialize would greatly reduce the amount of parsing and serialization code we need to write or have already written. E.g.:

  • The RPC server currently uses the json! macro to manually serialize some structs like Block and Transaction.
  • The config file loader implements Deserialize for structs that contain almost identical information to e.g. NetworkId.
  • It makes it easy to serialize to a lot of common formats.

Integration tests

With the new client lib it should be easier to write some integration tests. Especially given the complexity of the validator protocol, we should have them.

We could e.g. have a test that just runs a blockchain for about 2 epochs. The blockchain event handler can then let the test finish successfully once we arrive at block 256 or whatever.

Also: Write integration tests for Handel: Just use a trivial IdentityRegistry and provide a Sender that just uses mpsc.
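
For the Handel part, an in-memory transport could look roughly like this. The `Sender` trait below is a hypothetical stand-in (the real trait in the handel crate may have a different shape), and the identity registry is trivial in the sense that a peer's identity is simply its index.

```rust
use std::sync::mpsc::{channel, Receiver, Sender as ChannelSender};

/// Hypothetical sender abstraction: deliver `msg` to the peer with the given index.
pub trait Sender<M> {
    fn send_to(&self, peer: usize, msg: M);
}

/// Sender backed by one mpsc channel per peer. Messages are tagged with the
/// index of the node that sent them.
pub struct MpscSender<M> {
    own_id: usize,
    peers: Vec<ChannelSender<(usize, M)>>,
}

impl<M> Sender<M> for MpscSender<M> {
    fn send_to(&self, peer: usize, msg: M) {
        if let Some(tx) = self.peers.get(peer) {
            // A closed receiver only means that peer is gone; ignore the error.
            let _ = tx.send((self.own_id, msg));
        }
    }
}

/// Build `n` connected nodes: one sender handle and one inbox per node.
pub fn in_memory_network<M>(n: usize) -> (Vec<MpscSender<M>>, Vec<Receiver<(usize, M)>>) {
    let (txs, rxs): (Vec<_>, Vec<_>) = (0..n).map(|_| channel::<(usize, M)>()).unzip();
    let senders = (0..n)
        .map(|own_id| MpscSender { own_id, peers: txs.clone() })
        .collect();
    (senders, rxs)
}
```

A test can then hand each aggregation its `MpscSender`, drain the inboxes in a loop, and assert that every node ends up with the full aggregate.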

Integer overflow in validator timeout

DEBUG validator            | Completed view change to #128.1
thread 'tokio-runtime-worker-5' panicked at 'attempt to subtract with overflow', validator/src/validator.rs:377:44
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
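
Without knowing the exact expression at validator.rs:377, the usual fix for this kind of panic is to replace the bare `-` with a saturating or checked subtraction. A sketch with hypothetical names, assuming the timeout is computed from two unsigned millisecond values:

```rust
/// Remaining time until `deadline_ms`, clamped to zero if the deadline has
/// already passed. A plain `deadline_ms - now_ms` would panic in debug
/// builds (and wrap around in release builds) when `now_ms > deadline_ms`.
pub fn remaining_ms(deadline_ms: u64, now_ms: u64) -> u64 {
    deadline_ms.saturating_sub(now_ms)
}

/// Variant that lets the caller distinguish "already expired" from "zero left".
pub fn checked_remaining_ms(deadline_ms: u64, now_ms: u64) -> Option<u64> {
    deadline_ms.checked_sub(now_ms)
}
```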

`syn::attr::Meta` needs to be `Debug`

We're debug printing syn::attr::Meta in a panic! in beserial/beserial_derive/lib.rs. Apparently Meta doesn't implement Debug by default, but needs a feature in the syn crate to be enabled. It's unclear why it's working on albatross but not on core-rs/master. Maybe some dependency enables that feature?

Anyway, just add this to the syn dependency in beserial_derive:

features = ["extra-traits"]

Thanks to @jeffesquivels for finding this :)

Validator without full chain history

A pure validator only needs to remember blocks of the last two epochs.
Introduce a new consensus mode that automatically discards old blocks.
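A rough sketch of what "automatically discards old blocks" could look like, using a sliding window over block numbers. Everything here is an assumption for illustration: the `PrunedStore` type, the `String` stand-in for a block, and an epoch length of 128 blocks. A real implementation would probably prune at epoch boundaries and go through the database.

```rust
use std::collections::BTreeMap;

const EPOCH_LENGTH: u32 = 128; // assumed value, for illustration only
const EPOCHS_KEPT: u32 = 2;

/// Hypothetical in-memory block store that forgets old blocks.
struct PrunedStore {
    blocks: BTreeMap<u32, String>, // block number -> block (stand-in type)
}

impl PrunedStore {
    fn new() -> Self {
        PrunedStore { blocks: BTreeMap::new() }
    }

    fn push(&mut self, number: u32, block: String) {
        self.blocks.insert(number, block);
        // Oldest block number we still want to keep.
        let cutoff = number.saturating_sub(EPOCH_LENGTH * EPOCHS_KEPT);
        // `split_off` returns all entries with key >= cutoff;
        // assigning it back drops everything older.
        self.blocks = self.blocks.split_off(&cutoff);
    }
}
```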

Deadlock in `Blockchain::push_block`

 ERROR deadlock             | 1 deadlocks detected
 ERROR deadlock             | Deadlock #0
 ERROR deadlock             | Thread Id 140079429310208
 ERROR deadlock             | stack backtrace:
   0:     0x55c607f7c217 - backtrace::backtrace::trace::he1bd0e735dbbc667
   1:     0x55c607f7b383 - backtrace::capture::Backtrace::new::hd1afaac098ef9e52
   2:     0x55c607f6584d - parking_lot_core::parking_lot::deadlock_impl::on_unpark::hebd2fcc23c174c94
   3:     0x55c607f6cbf1 - parking_lot_core::parking_lot::park_internal::h6202e8ed10c8808f
   4:     0x55c607f64c0f - parking_lot::raw_rwlock::RawRwLock::upgrade_slow::h67d9f3a330f7302d
   5:     0x55c607da3257 - nimiq_blockchain_albatross::blockchain::Blockchain::push_block::h37853048d9ce43ab
   6:     0x55c607da15ac - nimiq_blockchain_albatross::blockchain::Blockchain::push::hbafafebd93562921
   7:     0x55c607aa14c4 - nimiq_validator::validator::Validator::produce_micro_block::h898ef6eada548a33
   8:     0x55c607ae428d - <futures::future::lazy::Lazy<F,R> as futures::future::Future>::poll::h795c6655b030e207
   9:     0x55c607ee131d - futures::task_impl::Spawn<T>::poll_future_notify::ha31b037a092a2736
  10:     0x55c607ee0e21 - std::panicking::try::do_call::h185dec94ef9a1086
  11:     0x55c607fb0d0a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:83
  12:     0x55c607edf209 - tokio_threadpool::task::Task::run::h7910cafcf67341a7
  13:     0x55c607edb4ca - tokio_threadpool::worker::Worker::run_task::he831297aeedb2153
  14:     0x55c607eda950 - tokio_threadpool::worker::Worker::run::h3b776189020f3bd5
  15:     0x55c607ec10c0 - std::thread::local::LocalKey<T>::with::hefc133a8c4e93cae
  16:     0x55c607ec0f98 - std::thread::local::LocalKey<T>::with::hd985fae9b1c1d1d7
  17:     0x55c607ec0abc - tokio_reactor::with_default::hb0e802b210729b6f
  18:     0x55c607ec2e7f - tokio::runtime::threadpool::builder::Builder::build::{{closure}}::hbc5a0d67213046cd
  19:     0x55c607edeb6a - std::thread::local::LocalKey<T>::with::hd069e4a876a9abd1
  20:     0x55c607ede609 - std::thread::local::LocalKey<T>::with::h95e390417fdeca45
  21:     0x55c607ed81d8 - std::sys_common::backtrace::__rust_begin_short_backtrace::hce13dccd4f5cc0f6
  22:     0x55c607ee0e7c - std::panicking::try::do_call::hd0dfef565d14dc52
  23:     0x55c607fb0d0a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:83
  24:     0x55c607ed8d20 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hb0f8ffe5e3b63f17
  25:     0x55c607f9785f - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h55e2fe7570195774
                               at /rustc/c553e8e8812c19809e70523064989e66c5cfd3f1/src/liballoc/boxed.rs:942
  26:     0x55c607fafeb0 - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h5dc11c0f82cb11d1
                               at /rustc/c553e8e8812c19809e70523064989e66c5cfd3f1/src/liballoc/boxed.rs:942
                           std::sys_common::thread::start_thread::h80378c40e1d94fb4
                               at src/libstd/sys_common/thread.rs:13
                           std::sys::unix::thread::Thread::new::thread_start::h96368244a6c4ab72
                               at src/libstd/sys/unix/thread.rs:79
  27:     0x7f66d1db8182 - start_thread
  28:     0x7f66d1cc5b1f - __clone
  29:                0x0 - <unknown>

Don't just broadcast `ValidatorInfo`s

Currently, a ValidatorInfo is broadcast to all other validators whenever we receive one that we don't know yet. This leads to a lot of them being sent around.

Better solution:

  • Collect them, but don't broadcast
  • Each ValidatorAgent can remember which ones are already known by the particular validator peer.
  • Send a peer only the ValidatorInfos it doesn't know yet
  • Only send out ValidatorInfo periodically and batch them together (The message can hold a Vec of ValidatorInfos)
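The bookkeeping proposed above could be sketched roughly like this. `InfoGossip`, the id types and the one-field `ValidatorInfo` are placeholders, not the real types. The idea is that `collect` never sends anything, and a periodic timer calls `next_batch` once per peer.

```rust
use std::collections::{HashMap, HashSet};

type ValidatorId = u32; // placeholder id types
type PeerId = u32;

#[derive(Clone, Debug, PartialEq)]
struct ValidatorInfo {
    validator_id: ValidatorId, // the real struct carries more than this
}

#[derive(Default)]
struct InfoGossip {
    collected: HashMap<ValidatorId, ValidatorInfo>,
    known_by_peer: HashMap<PeerId, HashSet<ValidatorId>>,
}

impl InfoGossip {
    /// Store an info received from `from`, without broadcasting it.
    fn collect(&mut self, from: PeerId, info: ValidatorInfo) {
        // The sender obviously knows this info already.
        self.known_by_peer.entry(from).or_default().insert(info.validator_id);
        self.collected.insert(info.validator_id, info);
    }

    /// Called periodically: everything `peer` doesn't know yet, as one batch.
    fn next_batch(&mut self, peer: PeerId) -> Vec<ValidatorInfo> {
        let known = self.known_by_peer.entry(peer).or_default();
        let mut batch: Vec<ValidatorInfo> = self
            .collected
            .values()
            .filter(|info| !known.contains(&info.validator_id))
            .cloned()
            .collect();
        batch.sort_by_key(|info| info.validator_id);
        // Remember what we are about to send so it isn't sent twice.
        for info in &batch {
            known.insert(info.validator_id);
        }
        batch
    }
}
```

An empty batch simply means no message needs to go out to that peer in this round.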

Circular deadlock in Handel

There are a bunch of circular dependencies between locks in the Handel code. I fixed some of them already. My approach is to always acquire the state lock first. Also, avoid holding locks when you don't absolutely need to.
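To illustrate the "state lock first" rule, here is a minimal sketch with std locks (the real code uses parking_lot). The struct layout and the `outbox` field are invented for the example. Only the method names are borrowed from the backtrace.

```rust
use std::sync::{Mutex, RwLock};

/// Hypothetical stand-in for the aggregation: two locks, one fixed order.
struct Aggregation {
    state: RwLock<Vec<u32>>,  // always acquired first
    outbox: Mutex<Vec<u32>>,  // only acquired after (or without) `state`
}

impl Aggregation {
    fn new() -> Self {
        Aggregation { state: RwLock::new(Vec::new()), outbox: Mutex::new(Vec::new()) }
    }

    fn push_contribution(&self, contribution: u32) {
        let mut state = self.state.write().unwrap(); // 1st: state
        state.push(contribution);
        let mut outbox = self.outbox.lock().unwrap(); // 2nd: outbox
        outbox.push(contribution);
    }

    /// Returns the number of known contributions and the drained outbox.
    fn send_update(&self) -> (usize, Vec<u32>) {
        // Read what we need from `state`; the guard is dropped at the end
        // of this statement, so we never hold it longer than necessary.
        let known = self.state.read().unwrap().len();
        let mut outbox = self.outbox.lock().unwrap();
        (known, std::mem::take(&mut *outbox))
    }
}
```

Because no code path takes `outbox` and then waits for `state`, the two methods can't end up waiting on each other in a cycle.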

 INFO  consensus            | Now at block #1914
 DEBUG consensus_agent      | Known block 701687bf101322ffa398eea1b371b673e010cf0bd9ead4d4c1316c165622dc30 from ws://5.0.0.34:8443/a16f18b24658bfe2bef5cf5f98069d7c
 DEBUG consensus_agent      | Known block 551021cdd642ad22e3f97d52ba353293e454c9352673e5d23aeec50998752f74 from ws://5.0.0.34:8443/a16f18b24658bfe2bef5cf5f98069d7c
 DEBUG validator_network    | New view change for: #1915.5, node_id=24
 INFO  validator            | Starting view change to #1915.5
 ERROR deadlock             | 1 deadlocks detected
 ERROR deadlock             | Deadlock #0
 ERROR deadlock             | Thread Id 140187675358976
 ERROR deadlock             | stack backtrace:
   0:     0x55db0ee8a867 - backtrace::backtrace::trace::he1bd0e735dbbc667
   1:     0x55db0ee899d3 - backtrace::capture::Backtrace::new::hd1afaac098ef9e52
   2:     0x55db0ee73e9d - parking_lot_core::parking_lot::deadlock_impl::on_unpark::hebd2fcc23c174c94
   3:     0x55db0ee7b241 - parking_lot_core::parking_lot::park_internal::h6202e8ed10c8808f
   4:     0x55db0ee72883 - parking_lot::raw_rwlock::RawRwLock::lock_exclusive_slow::h88ffb00ce5a26ddc
   5:     0x55db0e9bbf2a - nimiq_handel::aggregation::Aggregation<P>::push_contribution::heac32547103f8595
   6:     0x55db0ea0f759 - nimiq_validator::signature_aggregation::voting::VoteAggregation<T>::push_contribution::ha6691d3a19c522a6
   7:     0x55db0e9f990a - nimiq_validator::validator_network::ValidatorNetwork::start_view_change::h3f29f3c42a3d6904
   8:     0x55db0e9ac6d7 - nimiq_validator::validator::Validator::on_block_timeout::hca839f33c9f25afc
   9:     0x55db0e9e2d74 - <futures::stream::for_each::ForEach<S,F,U> as futures::future::Future>::poll::h451e7966dbf40f1b
  10:     0x55db0ea0d13a - <futures::future::select::Select<A,B> as futures::future::Future>::poll::hf8653b009ab69b6d
  11:     0x55db0e9d3838 - <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll::hd2172076a10929a5
  12:     0x55db0edef96d - futures::task_impl::Spawn<T>::poll_future_notify::ha31b037a092a2736
  13:     0x55db0edef471 - std::panicking::try::do_call::h185dec94ef9a1086
  14:     0x55db0eebf35a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:83
  15:     0x55db0eded859 - tokio_threadpool::task::Task::run::h7910cafcf67341a7
  16:     0x55db0ede9b1a - tokio_threadpool::worker::Worker::run_task::he831297aeedb2153
  17:     0x55db0ede9175 - tokio_threadpool::worker::Worker::run::h3b776189020f3bd5
  18:     0x55db0edcf710 - std::thread::local::LocalKey<T>::with::hefc133a8c4e93cae
  19:     0x55db0edcf5e8 - std::thread::local::LocalKey<T>::with::hd985fae9b1c1d1d7
  20:     0x55db0edcf10c - tokio_reactor::with_default::hb0e802b210729b6f
  21:     0x55db0edd14cf - tokio::runtime::threadpool::builder::Builder::build::{{closure}}::hbc5a0d67213046cd
  22:     0x55db0eded1ba - std::thread::local::LocalKey<T>::with::hd069e4a876a9abd1
  23:     0x55db0edecc59 - std::thread::local::LocalKey<T>::with::h95e390417fdeca45
  24:     0x55db0ede6828 - std::sys_common::backtrace::__rust_begin_short_backtrace::hce13dccd4f5cc0f6
  25:     0x55db0edef4cc - std::panicking::try::do_call::hd0dfef565d14dc52
  26:     0x55db0eebf35a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:83
  27:     0x55db0ede7370 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hb0f8ffe5e3b63f17
  28:     0x55db0eea5eaf - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h55e2fe7570195774
                               at /rustc/c553e8e8812c19809e70523064989e66c5cfd3f1/src/liballoc/boxed.rs:942
  29:     0x55db0eebe500 - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h5dc11c0f82cb11d1
                               at /rustc/c553e8e8812c19809e70523064989e66c5cfd3f1/src/liballoc/boxed.rs:942
                           std::sys_common::thread::start_thread::h80378c40e1d94fb4
                               at src/libstd/sys_common/thread.rs:13
                           std::sys::unix::thread::Thread::new::thread_start::h96368244a6c4ab72
                               at src/libstd/sys/unix/thread.rs:79
  30:     0x7f7ffd294182 - start_thread
  31:     0x7f7ffd1a1b1f - __clone
  32:                0x0 - <unknown>

 ERROR deadlock             | Thread Id 140187673249536
 ERROR deadlock             | stack backtrace:
   0:     0x55db0ee8a867 - backtrace::backtrace::trace::he1bd0e735dbbc667
   1:     0x55db0ee899d3 - backtrace::capture::Backtrace::new::hd1afaac098ef9e52
   2:     0x55db0ee73e9d - parking_lot_core::parking_lot::deadlock_impl::on_unpark::hebd2fcc23c174c94
   3:     0x55db0ee7b241 - parking_lot_core::parking_lot::park_internal::h6202e8ed10c8808f
   4:     0x55db0ee72c37 - parking_lot::raw_rwlock::RawRwLock::lock_shared_slow::ha87dd17cda083b88
   5:     0x55db0e9ba3c0 - nimiq_handel::aggregation::Aggregation<P>::send_update::ha2c9122dd7dd7108
   6:     0x55db0e9bb181 - nimiq_handel::aggregation::Aggregation<P>::start_level::hb0f8d63a2bf9cff8
   7:     0x55db0e9bd522 - nimiq_handel::aggregation::Aggregation<P>::check_completed_level::h6e633393da6b7ed0
   8:     0x55db0e9d42f3 - <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll::hed9efe52824d43e2
   9:     0x55db0edef96d - futures::task_impl::Spawn<T>::poll_future_notify::ha31b037a092a2736
  10:     0x55db0edef471 - std::panicking::try::do_call::h185dec94ef9a1086
  11:     0x55db0eebf35a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:83
  12:     0x55db0eded859 - tokio_threadpool::task::Task::run::h7910cafcf67341a7
  13:     0x55db0ede9b1a - tokio_threadpool::worker::Worker::run_task::he831297aeedb2153
  14:     0x55db0ede8fa0 - tokio_threadpool::worker::Worker::run::h3b776189020f3bd5
  15:     0x55db0edcf710 - std::thread::local::LocalKey<T>::with::hefc133a8c4e93cae
  16:     0x55db0edcf5e8 - std::thread::local::LocalKey<T>::with::hd985fae9b1c1d1d7
  17:     0x55db0edcf10c - tokio_reactor::with_default::hb0e802b210729b6f
  18:     0x55db0edd14cf - tokio::runtime::threadpool::builder::Builder::build::{{closure}}::hbc5a0d67213046cd
  19:     0x55db0eded1ba - std::thread::local::LocalKey<T>::with::hd069e4a876a9abd1
  20:     0x55db0edecc59 - std::thread::local::LocalKey<T>::with::h95e390417fdeca45
  21:     0x55db0ede6828 - std::sys_common::backtrace::__rust_begin_short_backtrace::hce13dccd4f5cc0f6
  22:     0x55db0edef4cc - std::panicking::try::do_call::hd0dfef565d14dc52
  23:     0x55db0eebf35a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:83
  24:     0x55db0ede7370 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hb0f8ffe5e3b63f17
  25:     0x55db0eea5eaf - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h55e2fe7570195774
                               at /rustc/c553e8e8812c19809e70523064989e66c5cfd3f1/src/liballoc/boxed.rs:942
  26:     0x55db0eebe500 - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h5dc11c0f82cb11d1
                               at /rustc/c553e8e8812c19809e70523064989e66c5cfd3f1/src/liballoc/boxed.rs:942
                           std::sys_common::thread::start_thread::h80378c40e1d94fb4
                               at src/libstd/sys_common/thread.rs:13
                           std::sys::unix::thread::Thread::new::thread_start::h96368244a6c4ab72
                               at src/libstd/sys/unix/thread.rs:79
  27:     0x7f7ffd294182 - start_thread
  28:     0x7f7ffd1a1b1f - __clone
  29:                0x0 - <unknown>

StakingValidator sorted set inconsistency

The StakingContract uses a BTreeSet to have all active stakes sorted. The ordering is ascending, yet the slot selection assumes descending balances:

        // Build potential validator set and find minimum stake.
        // Iterate from highest balance to lowest.
        for validator in self.active_stake_sorted.iter() {

This will be obsolete once we remove min_stake.
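For reference, a BTreeSet always iterates in ascending order, so "highest balance first" needs `.rev()`. A small sketch, assuming the set is keyed by a (balance, validator id) tuple, which is a simplification of the real entries:

```rust
use std::collections::BTreeSet;

/// Returns the entries from highest to lowest balance.
/// The `(balance, validator_id)` key is an assumption for this example.
fn highest_first(active_stake_sorted: &BTreeSet<(u64, u32)>) -> Vec<(u64, u32)> {
    // `iter()` alone yields ascending order; `rev()` flips it.
    active_stake_sorted.iter().rev().copied().collect()
}
```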

Increasing view change timeouts

Both PBFT and Albatross linearly increase the timeout for view changes after a view change.
Our code does not reflect that at the moment and uses a constant timeout instead.

First, to avoid starting a view change too soon, a replica that multicasts a view-change message for view v+1 waits for 2f+1 view-change messages for view v+1 and then starts its timer to expire after some time T. If the timer expires before it receives a valid new-view message for v+1 or before it executes a request in the new view that it had not executed previously, it starts the view change for view v+2 but this time it will wait 2T before starting a view change for view v+3.
see PBFT paper

We should adapt our code and increase the timeouts. In our Albatross paper we reset the factor after each slot.
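A possible shape for the linear increase, as a sketch only. It assumes `view_number` counts the view changes within the current slot and starts again at 0 for the next slot, so the factor resets by itself. The function name and the base timeout are made up.

```rust
use std::time::Duration;

/// Timeout for the given view: base, 2 * base, 3 * base, ...
/// `view_number` is assumed to restart at 0 with every new slot.
fn view_change_timeout(base: Duration, view_number: u32) -> Duration {
    let factor = view_number.saturating_add(1);
    // Fall back to the maximum duration instead of panicking on overflow.
    base.checked_mul(factor).unwrap_or(Duration::MAX)
}
```

With a base of 2 seconds this gives 2 s for view 0, 4 s for view 1 and 6 s for view 2.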

Make (Get)EpochTransactions more DoS resistant

Once #16 is merged, it still needs to become more DoS resistant.

Currently, the following could be exploited:

  • Peer A sends a GetEpochTransactions request to malicious Peer B
  • Let's suppose the epoch contained 4 transactions: [0, 1, 2, 3]
  • Peer B is now able to send EpochTransactions messages with correct proofs for any subsets. For example, he could send such messages for [0, 1], [0, 2], [0, 3], [0, 1, 2], [2, 3], ...
  • All these messages would currently pass our checks.
  • Only after receiving the last message would Peer A try to compute the Merkle root of [0, 1, 0, 2, 0, 3, 0, 1, 2, 2, 3, ...] and realise that it doesn't match.

How to prevent this?
The best way is to replace the current Merkle proofs with a scheme similar to our AccountsTreeChunks.
Intuitively, each chunk holds consecutive transactions and we only provide a proof for the rightmost transaction of the chunk. Since the receiver already holds all transactions to the left of it, it can compute the remaining hashes itself and check the result against the root.

Consider the following Merkle tree:

                   h13
            /               \
          h11                h12
       /      \            /   \
     h7        h8        h9     h10
   /   \     /   \     /   \    |
  h0   h1   h2   h3   h4   h5   h6
   |    |    |    |    |    |    |
  t0   t1   t2   t3   t4   t5   t6

We want to split the list of seven transactions into three chunks: [t0, t1, t2], [t3, t4, t5], [t6].
The first chunk consists of the transactions [t0, t1, t2] and the following list of hashes (what we call proof in the following): [h3, h12]
From the tree picture above, one can see that these two hashes, together with the first three transactions, suffice to verify the root.

The next chunk consists of [t3, t4, t5] and the proof is [h10].
In combination with the previously received transactions, this hash again suffices to calculate the root.

The proof for our last chunk of transactions is always empty (which also signals that this is the last set of transactions).
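The chunk checks for the example tree can be written down directly. This is a toy sketch, not the real Merkle code: it uses std's DefaultHasher (not cryptographic) as a stand-in hash, hard-codes the seven-leaf tree from the picture, and assumes h10 is simply h6 carried up one level.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type H = u64; // toy hash value, NOT a cryptographic hash

fn leaf(tx: &str) -> H {
    let mut s = DefaultHasher::new();
    (0u8, tx).hash(&mut s); // 0 = leaf domain tag
    s.finish()
}

fn node(left: H, right: H) -> H {
    let mut s = DefaultHasher::new();
    (1u8, left, right).hash(&mut s); // 1 = inner node domain tag
    s.finish()
}

/// Root h13 of the seven-leaf example tree (assumes h10 = h6).
fn full_root(txs: &[&str; 7]) -> H {
    let h: Vec<H> = txs.iter().map(|t| leaf(t)).collect();
    let h11 = node(node(h[0], h[1]), node(h[2], h[3]));
    let h12 = node(node(h[4], h[5]), h[6]);
    node(h11, h12)
}

/// Checks chunk [t0, t1, t2] with proof [h3, h12].
fn verify_first_chunk(chunk: &[&str; 3], proof: &[H; 2], root: H) -> bool {
    let h7 = node(leaf(chunk[0]), leaf(chunk[1]));
    let h8 = node(leaf(chunk[2]), proof[0]);
    node(node(h7, h8), proof[1]) == root
}

/// Checks chunk [t3, t4, t5] with proof [h10], reusing the first chunk.
fn verify_second_chunk(first: &[&str; 3], chunk: &[&str; 3], proof: &[H; 1], root: H) -> bool {
    let h7 = node(leaf(first[0]), leaf(first[1]));
    let h8 = node(leaf(first[2]), leaf(chunk[0]));
    let h9 = node(leaf(chunk[1]), leaf(chunk[2]));
    node(node(h7, h8), node(h9, proof[0])) == root
}
```

The important property is that every chunk is checked against the root as soon as it arrives, so a bogus or out-of-order chunk can be rejected right away instead of only after the last message.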

Slot calculation

Right now, after a view change, we could still assign the same, inactive validator to the next slot.
The specification, however, says:

The prohibition of producing micro blocks is applied immediately after a view-change (for a delay) or in the block where a slash transaction is included (for a fork).

Ignoring blocks in slot after invalid block

Our implementation currently differs from the specification in that it ignores invalid blocks, but still accepts further blocks from the same validator during the slot.
According to our specs, it should however ignore any further blocks from this validator during the current slot.

Due to this, in fact, we can immediately start a view change.
EDIT: Of course, we should only start a view change, if the invalid block was from the correct block producer for this slot!
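The intended behaviour, including the EDIT, could be sketched as a small per-slot filter. `SlotFilter`, `Action` and the `u32` producer id are hypothetical names, and `valid` stands for the result of whatever block verification we already do.

```rust
#[derive(Debug, PartialEq)]
enum Action {
    Accept,
    Ignore,
    IgnoreAndStartViewChange,
}

/// Hypothetical per-slot state; create a fresh one for every slot.
struct SlotFilter {
    expected_producer: u32,
    producer_banned: bool,
}

impl SlotFilter {
    fn new(expected_producer: u32) -> Self {
        SlotFilter { expected_producer, producer_banned: false }
    }

    fn on_block(&mut self, producer: u32, valid: bool) -> Action {
        // Blocks from anyone but the slot's producer never trigger a view change.
        if producer != self.expected_producer {
            return Action::Ignore;
        }
        // After one invalid block, ignore everything else from this producer.
        if self.producer_banned {
            return Action::Ignore;
        }
        if valid {
            Action::Accept
        } else {
            self.producer_banned = true;
            Action::IgnoreAndStartViewChange
        }
    }
}
```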

Refactor KeyStore

Right now the KeyStore is just a file that stores anything that is Serialize + Deserialize. But it should make some other things easy as well. E.g. it should generate a key, if the key store doesn't exist yet. That means we need a Generate trait. The Generate trait should make sure we only use a CSPRNG.

I also think we should just store the keys in the database. Afaik core-js does this too. Then our whole "storage backend" is just which database we use: LMDB, LMDB-volatile or IndexedDB. Just use a database key-store and give keys names: peer-key, validator-key, wallet-default, etc.

Deadlock in InventoryAgent

logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log- 2019-12-16 18:57:28.683 WARN  blockchain           | Rejecting block - unknown predecessor
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log- 2019-12-16 18:57:28.683 DEBUG sync                 | blockchain.push() took 0ms (1 txs)
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log- 2019-12-16 18:57:28.683 DEBUG consensus_agent      | Received orphan block e9b730bb2c32ba3de696fa127789f3606e5cb91501328d61f2e7ce6bc88842be from ws://7.0.0.214:8443/1a9c3545e64efca6283c1d2c9c1ba231
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log- 2019-12-16 18:57:28.708 DEBUG sync                 | blockchain.push() took 13ms (1 txs)
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log- 2019-12-16 18:57:31.609 WARN  consensus            | Peer ws://7.0.0.217:8443/3171884c76ec214391cdd1285b91d5ca out of sync, re-syncing
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log: 2019-12-16 18:57:38.064 ERROR deadlock             | 1 deadlocks detected
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log: 2019-12-16 18:57:38.064 ERROR deadlock             | Deadlock #0
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log: 2019-12-16 18:57:38.064 ERROR deadlock             | Thread Id 140140896990976
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log: 2019-12-16 18:57:38.065 ERROR deadlock             | stack backtrace:
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-   0:     0x5642a6b4c077 - backtrace::backtrace::trace::hef5f178d8fb1f682
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-   1:     0x5642a6b4a1d3 - backtrace::capture::Backtrace::new::h2750fb2118bd8fbb
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log:   2:     0x5642a6b2e2c2 - parking_lot_core::parking_lot::deadlock_impl::on_unpark::ha4894833f2313316
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-   3:     0x5642a6b2502a - parking_lot::raw_rwlock::RawRwLock::wait_for_readers::h847d464ca6cfafcb
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-   4:     0x5642a6b21dd6 - parking_lot::raw_rwlock::RawRwLock::lock_exclusive_slow::h91e6374355f38ec5
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-   5:     0x5642a64167ce - nimiq_consensus::inventory::InventoryAgent<P>::on_objects_received::h8f49cd23bfcbbac4
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-   6:     0x5642a6416164 - nimiq_consensus::inventory::InventoryAgent<P>::on_object_received::h3f01b9a34f7f683a
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-   7:     0x5642a6419608 - nimiq_consensus::inventory::InventoryAgent<P>::on_tx::h37a1eb2b9eaf4ec8
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-   8:     0x5642a64699af - <F as nimiq_utils::observer::PassThroughListener<E>>::on_event::hd3bab14919dd3d06
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-   9:     0x5642a6a3c45f - nimiq_utils::observer::PassThroughNotifier<E>::notify::h679aa917c268ed6e
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-  10:     0x5642a6a3377c - nimiq_messages::MessageNotifier::notify::h552e34ab7fe03672
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-  11:     0x5642a6819c6a - <F as nimiq_utils::observer::PassThroughListener<E>>::on_event::h4868a009be3ccf6b
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-  12:     0x5642a67d52fb - <futures::stream::for_each::ForEach<S,F,U> as futures::future::Future>::poll::hdf99e51a9ea23fa5
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-  13:     0x5642a67a8421 - futures::future::chain::Chain<A,B,C>::poll::he8804179c18546d9
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-  14:     0x5642a67d2b0c - <futures::future::select::Select<A,B> as futures::future::Future>::poll::hfa6551d5fef46b97
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-  15:     0x5642a6810711 - <futures::future::map::Map<A,F> as futures::future::Future>::poll::hd7940180a480f2e8
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-  16:     0x5642a681ba6a - <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll::h0b51b9f54b6fcab8
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-  17:     0x5642a6b0c602 - futures::task_impl::std::set::h2c2830f6cb06e220
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-  18:     0x5642a6b0e5f2 - std::panicking::try::do_call::hba99e89c5b62b580
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-  19:     0x5642a6b734aa - __rust_maybe_catch_panic
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-                               at src/libpanic_unwind/lib.rs:78
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-  20:     0x5642a6b0ca1b - tokio_threadpool::task::Task::run::hc0d76b7d1c1a053f
logs/devnet_validator.1.u7ldwz67bv3xaxet0tmtucf57.log-  21:     0x5642a6b05653 - tokio_threadpool::worker::Worker::run_task::h57a0fae50b7432d1

Building fork proofs

Currently, our code doesn't check for forks.
We should, however, detect those, build fork proofs, and send them to the network and/or include them in the chain.

Duplicate delay for key mempool

General information

  • Branch: devnet-rc3

Bug report

Node log:

DEBUG sync                 | blockchain.push() took 12ms (0 txs)
DEBUG sync                 | blockchain.push() took 12ms (0 txs)
ERROR timers               | Duplicate delay for key Mempool
DEBUG sync                 | blockchain.push() took 12ms (0 txs)
DEBUG consensus            | Finished sync with peer ws://7.0.0.2:8443/1a9c3545e64efca6283c1d2c9c1ba231
DEBUG sync                 | blockchain.push() took 0ms (0 txs)
DEBUG sync                 | blockchain.push() took 0ms (0 txs)
DEBUG sync                 | blockchain.push() took 0ms (0 txs)
DEBUG sync                 | blockchain.push() took 0ms (0 txs)
DEBUG sync                 | blockchain.push() took 0ms (0 txs)
DEBUG sync                 | blockchain.push() took 12ms (0 txs)
INFO  nimiq_client         | Head: #229 - aa2a888a841a9133fee6f8d6e20602068d7ffd6873b91d79e93782b1a3fd4615, Peers: 4
INFO  nimiq_client         | Head: #229 - aa2a888a841a9133fee6f8d6e20602068d7ffd6873b91d79e93782b1a3fd4615, Peers: 4
INFO  nimiq_client         | Head: #229 - aa2a888a841a9133fee6f8d6e20602068d7ffd6873b91d79e93782b1a3fd4615, Peers: 4
INFO  nimiq_client         | Head: #229 - aa2a888a841a9133fee6f8d6e20602068d7ffd6873b91d79e93782b1a3fd4615, Peers: 4

After that, the client is stuck. Not sure if that's caused by the duplicate delay key.

Panic in Blockchain::push_block

Branch: devnet-rc3.

Occurred during the first 100-validator cloud test.

Panic at

Block::Macro(_) => unreachable!(),

Crash log

Panic
 2019-12-02 03:02:13.667 DEBUG sync                 | blockchain.push() took 30ms (0 txs)
 2019-12-02 03:02:16.470 DEBUG consensus_agent      | Accepted tx 525eea42fc5abf56d047a2b2b15715aab869211b3794f346de8ec57f38828784 from ws://7.0.0.238:8443/af54b271816d56ce2f2b2833d649e252
 2019-12-02 03:02:16.478 DEBUG consensus_agent      | Accepted tx 95dcf357e611024b2ec272a3100b62aea1ba275908dd23a1d6612c294de37f85 from ws://7.0.0.238:8443/af54b271816d56ce2f2b2833d649e252
 2019-12-02 03:02:20.862 DEBUG blockchain           | Rebranching to fork 69afd4f93d614870c556686faf55299743bfca2428c2853a4ae479695f65e9e7, height #27185, view number 6
 2019-12-02 03:02:20.862 DEBUG blockchain           | Found common ancestor 335655801f3d89eedd6f9cf1d1bb8709b7aec1a11ef20f9c92dff2c7376510e2 at height #27184, 1 blocks up
 2019-12-02 03:02:20.865 DEBUG blockchain           | Slash inherent: view change: NQ54 XYRY HT58 NAU8 RDFR H7KD X0YQ SFQL 4ACY
thread 'tokio-runtime-worker-1' panicked at 'internal error: entered unreachable code', blockchain-albatross/src/blockchain.rs:697:36
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:84
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:61
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1024
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1428
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:65
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:50
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:193
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:210
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:471
  11: std::panicking::begin_panic
  12: nimiq_blockchain_albatross::blockchain::Blockchain::push_block
  13: <nimiq_blockchain_albatross::blockchain::Blockchain as nimiq_blockchain_base::AbstractBlockchain>::push
  14: <nimiq_consensus::consensus_agent::sync::FullSync<B> as nimiq_consensus::consensus_agent::sync::SyncProtocol<B>>::on_block
  15: nimiq_consensus::inventory::InventoryAgent<P>::on_block
  16: <F as nimiq_utils::observer::PassThroughListener<E>>::on_event
  17: nimiq_utils::observer::PassThroughNotifier<E>::notify
  18: nimiq_messages::MessageNotifier::notify
  19: <F as nimiq_utils::observer::PassThroughListener<E>>::on_event
  20: <futures::stream::for_each::ForEach<S,F,U> as futures::future::Future>::poll
  21: futures::future::chain::Chain<A,B,C>::poll
  22: <futures::future::select::Select<A,B> as futures::future::Future>::poll
  23: <futures::future::map::Map<A,F> as futures::future::Future>::poll
  24: <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll
  25: futures::task_impl::std::set
  26: futures::task_impl::Spawn<T>::poll_future_notify
  27: std::panicking::try::do_call
  28: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:81
  29: tokio_threadpool::task::Task::run
  30: tokio_threadpool::worker::Worker::run_task
  31: tokio_threadpool::worker::Worker::run
  32: tokio_timer::clock::clock::with_default
  33: tokio::runtime::threadpool::builder::Builder::build::{{closure}}
  34: std::thread::local::LocalKey<T>::with
  35: std::thread::local::LocalKey<T>::with
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
 2019-12-02 03:02:21.137 INFO  nimiq_client         | Head: #27272 - 7f65e610dee399e0ee543439a125ee5c557119c6daa028a2ee2f9583f991fe7e, Peers: 62
 2019-12-02 03:02:23.670 DEBUG validator_network    | New view change for: #27273.2, node_id=18
 2019-12-02 03:02:23.670 INFO  validator            | Starting view change to #27273.2
 2019-12-02 03:02:24.724 DEBUG sink                 | Closing connection, reason: SendFailed (Some("SendFailed"))

Deadlock InventoryManager and InventoryAgent

Log

 DEBUG consensus            | Now at block #108
 ERROR deadlock             | 1 deadlocks detected
 ERROR deadlock             | Deadlock #0
 ERROR deadlock             | Thread Id 140299464595200
 ERROR deadlock             | stack backtrace:
   0:     0x55a34cd27e37 - backtrace::backtrace::trace::ha32ddd3bd1865c9b
   1:     0x55a34cd26fa3 - backtrace::capture::Backtrace::new::h497f7bc3d19d47ca
   2:     0x55a34cd12e62 - parking_lot_core::parking_lot::deadlock_impl::on_unpark::h35536aa489cad7c5
   3:     0x55a34cd0bbee - parking_lot::raw_rwlock::RawRwLock::lock_exclusive_slow::h84076fc30b0d7f02
   4:     0x55a34c6f8e94 - nimiq_consensus::inventory::InventoryAgent<P>::on_inv::h73cfc94dfbc2b9e0
   5:     0x55a34c610aa6 - <F as nimiq_utils::observer::PassThroughListener<E>>::on_event::h5a26572b5cdded2d
   6:     0x55a34cb39622 - nimiq_messages::MessageNotifier::notify::h1de7cbc6ae1c7fa2
   7:     0x55a34c952dba - <F as nimiq_utils::observer::PassThroughListener<E>>::on_event::hbf80717f1f5c85d6
   8:     0x55a34c900382 - nimiq_utils::observer::PassThroughNotifier<E>::notify::h9e8e3581021eb4e1
   9:     0x55a34c923b5d - <futures::stream::for_each::ForEach<S,F,U> as futures::future::Future>::poll::h8ca01f084935b934
  10:     0x55a34c8f50f1 - futures::future::chain::Chain<A,B,C>::poll::h1df1b0aaa883c703
  11:     0x55a34c92e00c - <futures::future::select::Select<A,B> as futures::future::Future>::poll::hb3344e7bfcdb9f1a
  12:     0x55a34c953601 - <futures::future::map::Map<A,F> as futures::future::Future>::poll::h490d91e2f4faf5e6
  13:     0x55a34c8e268a - <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll::h07ac67317c82b61e
  14:     0x55a34cc8291d - futures::task_impl::Spawn<T>::poll_future_notify::h578192f6f2ceaa7d
  15:     0x55a34cc82521 - std::panicking::try::do_call::hbd3bb5a4fe726b8d
  16:     0x55a34cd5719a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:79
  17:     0x55a34cc809a9 - tokio_threadpool::task::Task::run::h428af919e1c4bc25
  18:     0x55a34cc7ccea - tokio_threadpool::worker::Worker::run_task::he9b680dae937c0d2
  19:     0x55a34cc7c345 - tokio_threadpool::worker::Worker::run::h57425c93fdabd8a3
  20:     0x55a34cc65f10 - std::thread::local::LocalKey<T>::with::h59030a792b45c3c4
  21:     0x55a34cc66058 - std::thread::local::LocalKey<T>::with::h5d932d7d80f3fbbe
  22:     0x55a34cc65a4c - tokio_reactor::with_default::h807439988ae28b9a
  23:     0x55a34cc609ef - tokio::runtime::threadpool::builder::Builder::build::{{closure}}::hf3c7ed9481edd9ed
  24:     0x55a34cc7fd9a - std::thread::local::LocalKey<T>::with::h1355ee9ab466f05f
  25:     0x55a34cc7feb9 - std::thread::local::LocalKey<T>::with::h9749da1bf60553ee
  26:     0x55a34cc79ab8 - std::sys_common::backtrace::__rust_begin_short_backtrace::hb3b9a4c5d85d330e
  27:     0x55a34cc824dc - std::panicking::try::do_call::h7404f5a8f67048d9
  28:     0x55a34cd5719a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:79
  29:     0x55a34cc7aad0 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h4e812cafcaae0b61
  30:     0x55a34cd3dcef - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h93b6874db877fc34
                               at /rustc/ac162c6abe34cdf965afc0389f6cefa79653c63b/src/liballoc/boxed.rs:942
  31:     0x55a34cd56340 - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::ha81aa42908ef2d29
                               at /rustc/ac162c6abe34cdf965afc0389f6cefa79653c63b/src/liballoc/boxed.rs:942
                           std::sys_common::thread::start_thread::h8f7df45dc4b098bd
                               at src/libstd/sys_common/thread.rs:13
                           std::sys::unix::thread::Thread::new::thread_start::he096a55a4d133fff
                               at src/libstd/sys/unix/thread.rs:79
  32:     0x7f9a14460182 - start_thread
  33:     0x7f9a1436db1f - __clone
  34:                0x0 - <unknown>

 ERROR deadlock             | Thread Id 140299673511680
 ERROR deadlock             | stack backtrace:
   0:     0x55a34cd27e37 - backtrace::backtrace::trace::ha32ddd3bd1865c9b
   1:     0x55a34cd26fa3 - backtrace::capture::Backtrace::new::h497f7bc3d19d47ca
   2:     0x55a34cd12e62 - parking_lot_core::parking_lot::deadlock_impl::on_unpark::h35536aa489cad7c5
   3:     0x55a34cd0bbee - parking_lot::raw_rwlock::RawRwLock::lock_exclusive_slow::h84076fc30b0d7f02
   4:     0x55a34c6fb8e9 - nimiq_consensus::inventory::InventoryManager<P>::request_vector::h478a165f3b96f77a
   5:     0x55a34c6fcfcf - nimiq_consensus::inventory::InventoryManager<P>::note_vector_not_received::h75dc4fe51e13c4b3
   6:     0x55a34c6ec601 - nimiq_consensus::inventory::InventoryAgent<P>::on_not_found::hb5bea1e1caad35b0
   7:     0x55a34c611ed6 - <F as nimiq_utils::observer::PassThroughListener<E>>::on_event::hbf6b31d10b6c54f2
   8:     0x55a34cb38c31 - nimiq_messages::MessageNotifier::notify::h1de7cbc6ae1c7fa2
   9:     0x55a34c952dba - <F as nimiq_utils::observer::PassThroughListener<E>>::on_event::hbf80717f1f5c85d6
  10:     0x55a34c900382 - nimiq_utils::observer::PassThroughNotifier<E>::notify::h9e8e3581021eb4e1
  11:     0x55a34c923b5d - <futures::stream::for_each::ForEach<S,F,U> as futures::future::Future>::poll::h8ca01f084935b934
  12:     0x55a34c8f50f1 - futures::future::chain::Chain<A,B,C>::poll::h1df1b0aaa883c703
  13:     0x55a34c92e00c - <futures::future::select::Select<A,B> as futures::future::Future>::poll::hb3344e7bfcdb9f1a
  14:     0x55a34c953601 - <futures::future::map::Map<A,F> as futures::future::Future>::poll::h490d91e2f4faf5e6
  15:     0x55a34c8e268a - <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll::h07ac67317c82b61e
  16:     0x55a34cc8291d - futures::task_impl::Spawn<T>::poll_future_notify::h578192f6f2ceaa7d
  17:     0x55a34cc82521 - std::panicking::try::do_call::hbd3bb5a4fe726b8d
  18:     0x55a34cd5719a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:79
  19:     0x55a34cc809a9 - tokio_threadpool::task::Task::run::h428af919e1c4bc25
  20:     0x55a34cc7ccea - tokio_threadpool::worker::Worker::run_task::he9b680dae937c0d2
  21:     0x55a34cc7c345 - tokio_threadpool::worker::Worker::run::h57425c93fdabd8a3
  22:     0x55a34cc65f10 - std::thread::local::LocalKey<T>::with::h59030a792b45c3c4
  23:     0x55a34cc66058 - std::thread::local::LocalKey<T>::with::h5d932d7d80f3fbbe
  24:     0x55a34cc65a4c - tokio_reactor::with_default::h807439988ae28b9a
  25:     0x55a34cc609ef - tokio::runtime::threadpool::builder::Builder::build::{{closure}}::hf3c7ed9481edd9ed
  26:     0x55a34cc7fd9a - std::thread::local::LocalKey<T>::with::h1355ee9ab466f05f
  27:     0x55a34cc7feb9 - std::thread::local::LocalKey<T>::with::h9749da1bf60553ee
  28:     0x55a34cc79ab8 - std::sys_common::backtrace::__rust_begin_short_backtrace::hb3b9a4c5d85d330e
  29:     0x55a34cc824dc - std::panicking::try::do_call::h7404f5a8f67048d9
  30:     0x55a34cd5719a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:79
  31:     0x55a34cc7aad0 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h4e812cafcaae0b61
  32:     0x55a34cd3dcef - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h93b6874db877fc34
                               at /rustc/ac162c6abe34cdf965afc0389f6cefa79653c63b/src/liballoc/boxed.rs:942
  33:     0x55a34cd56340 - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::ha81aa42908ef2d29
                               at /rustc/ac162c6abe34cdf965afc0389f6cefa79653c63b/src/liballoc/boxed.rs:942
                           std::sys_common::thread::start_thread::h8f7df45dc4b098bd
                               at src/libstd/sys_common/thread.rs:13
                           std::sys::unix::thread::Thread::new::thread_start::he096a55a4d133fff
                               at src/libstd/sys/unix/thread.rs:79
  34:     0x7f9a14460182 - start_thread
  35:     0x7f9a1436db1f - __clone
  36:                0x0 - <unknown>

Obey consensus mode configuration

This is not yet implemented. The ConsensusProtocol determines which SyncProtocol is used. The ConsensusProtocol used in nimiq_lib2 is currently hard-coded to AlbatrossConsensus, which in turn hard-codes the SyncProtocol to FullSync.

We can make the SyncProtocol a type parameter of the AlbatrossConsensusProtocol. In nimiq_lib2 I think the best solution is to wrap the protocol in a BoxedConsensusProtocol:

struct BoxedConsensusProtocol(Box<dyn ConsensusProtocol>);
impl ConsensusProtocol for BoxedConsensusProtocol { ... }

With some modifications this would allow nimiq_lib2::Client to also instantiate Nimiq 1.0 clients.
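A minimal sketch of the boxing idea, to make the proposal concrete. Everything here is an assumption for illustration: the trait below is a drastically simplified stand-in for the real ConsensusProtocol (which probably has associated types that would need to be erased first), and LegacyConsensus, name(), sync_protocol() and protocol_from_mode() are hypothetical names.

```rust
/// Hypothetical, simplified stand-in for the real ConsensusProtocol trait.
/// It only uses `&self` methods so that it can be used as a trait object.
pub trait ConsensusProtocol {
    fn name(&self) -> &'static str;
    fn sync_protocol(&self) -> &'static str;
}

/// Stand-in for the Albatross protocol with its hard-coded full sync.
pub struct AlbatrossConsensus;

impl ConsensusProtocol for AlbatrossConsensus {
    fn name(&self) -> &'static str {
        "albatross"
    }
    fn sync_protocol(&self) -> &'static str {
        "full"
    }
}

/// Hypothetical stand-in for a Nimiq 1.0 protocol.
pub struct LegacyConsensus;

impl ConsensusProtocol for LegacyConsensus {
    fn name(&self) -> &'static str {
        "nimiq-1.0"
    }
    fn sync_protocol(&self) -> &'static str {
        "full"
    }
}

/// Type-erased wrapper: the client only ever sees this one concrete type.
pub struct BoxedConsensusProtocol(Box<dyn ConsensusProtocol>);

impl BoxedConsensusProtocol {
    pub fn new<P: ConsensusProtocol + 'static>(inner: P) -> Self {
        BoxedConsensusProtocol(Box::new(inner))
    }
}

/// The wrapper forwards every call to the boxed protocol.
impl ConsensusProtocol for BoxedConsensusProtocol {
    fn name(&self) -> &'static str {
        self.0.name()
    }
    fn sync_protocol(&self) -> &'static str {
        self.0.sync_protocol()
    }
}

/// Picks a protocol at runtime from a (hypothetical) configuration string.
pub fn protocol_from_mode(mode: &str) -> Option<BoxedConsensusProtocol> {
    match mode {
        "albatross" => Some(BoxedConsensusProtocol::new(AlbatrossConsensus)),
        "nimiq-1.0" => Some(BoxedConsensusProtocol::new(LegacyConsensus)),
        _ => None,
    }
}
```

The cost is one dynamic dispatch per call; in exchange the client type no longer needs a protocol type parameter.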

Inactivating delaying slot owners

From our discussion:

Instead of slashing (with a fine) misbehaving validators, we can deactivate all stakes of that slot owner. The deactivation of the stakes will be done after a grace period of 2 epochs. The slot owner then has to either:

  • Produce a block in the next epoch (where their stake is still active) - optional
  • Send a restaking transaction in the next epoch - this should be the default behavior

The restaking transaction can be done using a warm address that is linked to that stake. Thus the validator only needs to keep a private key of the warm wallet which only needs funds for transaction fees.

ed25519-dalek breaking change

nimiq-keys depends on the develop branch of ed25519-dalek. They don't derive Default for the secret key anymore. Thus our code doesn't compile anymore. They released 1.0.0-pre.3, so we can depend on that version.

Update rand crate and provide default CSPRNG

We're using an old version of the rand crate with some compatibility layer. We should really update to the latest version.

Furthermore, since a lot of our code depends on having a cryptographically secure RNG, we should probably provide a default somewhere in nimiq-utils, which a consumer can then just pass to AnySecretKey::generate.

Macro Block PBFT

As explained in #53 and #49, there are still some implementation issues around the Macro Block PBFT.
I discussed these with @jgraef and Bruno Franca just now and here's the list of things that still need to be changed:

  1. The timeout for view changes should stop when receiving a valid proposal. Even if the network is split at that moment and half of the validators start a view change, they won't be able to complete it and can thus accept the valid proposal after a reunion of the network.
  2. A validator could provide two valid, but different proposals. The way to resolve this is to define a strict ordering on proposals (e.g., by their hash) and require honest validators to always choose the lowest.
  3. Considering case 2. and a split network, it can happen that half of the validators send prepare for proposal A and the other half for B. So, after a reunion of the network, validators have to be allowed to send another prepare for the lower block (e.g., A). But under no circumstances will they ever send commit messages for both blocks. This last requirement is actually already enforced by the rule to only send commit if you have 2f+1 prepares.
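Points 2 and 3 could be sketched roughly as follows. This is only an illustration under assumptions: the Proposal struct, the 32-byte hash type and all function names are hypothetical, and "lowest" is taken to mean lexicographic byte order of the hash.

```rust
/// Hypothetical 32-byte proposal hash. Byte arrays compare lexicographically,
/// which gives the strict ordering on proposals.
pub type ProposalHash = [u8; 32];

/// Hypothetical, minimal proposal type.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Proposal {
    pub hash: ProposalHash,
    pub proposer: u16,
}

/// Of all valid proposals seen so far, an honest validator prefers
/// the one with the lowest hash. Returns None if there are no proposals.
pub fn preferred(proposals: &[Proposal]) -> Option<&Proposal> {
    proposals.iter().min_by_key(|p| p.hash)
}

/// A validator that already sent prepare for `current` may send another
/// prepare only for a strictly lower proposal.
pub fn may_switch_prepare(current: &ProposalHash, candidate: &ProposalHash) -> bool {
    candidate < current
}

/// Commit is only sent once 2f+1 prepares for the same proposal were seen.
pub fn may_commit(prepares: usize, f: usize) -> bool {
    prepares >= 2 * f + 1
}
```

Because two different proposals cannot both collect 2f+1 prepares from honest validators that only ever move to a lower hash, the commit rule is what prevents commits for both blocks.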

Albatross RPC Methods

  • getProducer(): query the producer of a block
  • slotState(): get the active slots
  • Staking contract information
  • Wallet Staking calls
  • Macro block prepare/commit progress
  • View Change progress

Remove path from dependencies

Instead of specifying our dependencies like this

nimiq-database = { path = "../database", version = "0.1" }

we can omit the path and have a patch section in the workspace's Cargo.toml:

[patch.crates-io]
nimiq-database = { path = "../database" }

Remove slashing completely

The idea is that losing one's part of the rewards is economic disincentive enough to prevent forks. Bruno came up with this idea and provided a document that was checked by @mar-v-in and me.

Basically, together with #9, no more slashing needs to be done.
Validators that forked will lose their share of the rewards and cannot produce any more blocks during the epoch.
Validators that failed to produce a block will be treated according to #9.

This has huge advantages:

  • We don't need to calculate a slash amount.
  • We don't need to restrict the validator selection to max_considered.

This requires optimisation of the SegmentTree though: it should be updatable, so that we don't build a new segment tree for every validator selection process.
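One way an updatable structure could look is sketched below. Note that this is a Fenwick tree (binary indexed tree), not the existing SegmentTree, and the names WeightTree, add, sub and find are made up for this example. It supports changing a single weight and finding the entry that covers a given point in the cumulative weight range, both in O(log n).

```rust
/// Fenwick tree over validator weights (illustrative sketch only).
pub struct WeightTree {
    /// 1-based Fenwick array; tree[0] is unused.
    tree: Vec<u64>,
}

impl WeightTree {
    /// Builds the tree from an initial list of weights.
    pub fn new(weights: &[u64]) -> Self {
        let mut t = WeightTree { tree: vec![0; weights.len() + 1] };
        for (i, &w) in weights.iter().enumerate() {
            t.add(i, w);
        }
        t
    }

    /// Number of entries.
    pub fn len(&self) -> usize {
        self.tree.len() - 1
    }

    /// Increases the weight at 0-based `index` by `delta`.
    pub fn add(&mut self, index: usize, delta: u64) {
        let mut i = index + 1;
        while i < self.tree.len() {
            self.tree[i] += delta;
            i += i & i.wrapping_neg(); // move to the next node covering `index`
        }
    }

    /// Decreases the weight at 0-based `index` by `delta`.
    /// The caller must ensure `delta` does not exceed the current weight.
    pub fn sub(&mut self, index: usize, delta: u64) {
        let mut i = index + 1;
        while i < self.tree.len() {
            self.tree[i] -= delta;
            i += i & i.wrapping_neg();
        }
    }

    /// Sum of the first `count` weights.
    pub fn prefix_sum(&self, count: usize) -> u64 {
        let (mut i, mut sum) = (count, 0);
        while i > 0 {
            sum += self.tree[i];
            i -= i & i.wrapping_neg(); // strip the lowest set bit
        }
        sum
    }

    /// Sum of all weights.
    pub fn total(&self) -> u64 {
        self.prefix_sum(self.len())
    }

    /// Returns the 0-based index of the entry whose cumulative weight range
    /// contains `point`, or None if `point >= total()`.
    pub fn find(&self, mut point: u64) -> Option<usize> {
        if point >= self.total() {
            return None;
        }
        let n = self.len();
        let mut step = 1;
        while step * 2 <= n {
            step *= 2;
        }
        let mut pos = 0;
        while step > 0 {
            let next = pos + step;
            if next <= n && self.tree[next] <= point {
                point -= self.tree[next];
                pos = next;
            }
            step /= 2;
        }
        Some(pos)
    }
}
```

With such a structure, removing a selected validator from the pool would be a single sub() call instead of rebuilding the tree for every selection.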

Deadlock in `Blockchain::push_block`

Note: This deadlock has a different code path than #6

Stack trace

logs/devnet_validator_26.log- WARN  pool                 | No validator info for: 10 (23 votes)
logs/devnet_validator_26.log- WARN  pool                 | No validator info for: 12 (34 votes)
logs/devnet_validator_26.log- WARN  pool                 | No validator info for: 14 (21 votes)
logs/devnet_validator_26.log- WARN  pool                 | No validator info for: 16 (25 votes)
logs/devnet_validator_26.log- INFO  validator_network    | Received view change proof for: #23394.31
logs/devnet_validator_26.log: ERROR deadlock             | 1 deadlocks detected
logs/devnet_validator_26.log: ERROR deadlock             | Deadlock #0
logs/devnet_validator_26.log: ERROR deadlock             | Thread Id 140380133652224
logs/devnet_validator_26.log: ERROR deadlock             | stack backtrace:
logs/devnet_validator_26.log-   0:     0x55d912efd2b7 - backtrace::backtrace::trace::hec5b9775dc1d48cf
logs/devnet_validator_26.log-   1:     0x55d912efc423 - backtrace::capture::Backtrace::new::he8e2cfc45bc90990
logs/devnet_validator_26.log:   2:     0x55d912ee68ed - parking_lot_core::parking_lot::deadlock_impl::on_unpark::hbc1ebf8950306e52
logs/devnet_validator_26.log-   3:     0x55d912eedc91 - parking_lot_core::parking_lot::park_internal::he84998e5df5c11e3
logs/devnet_validator_26.log-   4:     0x55d912ee58cf - parking_lot::raw_rwlock::RawRwLock::upgrade_slow::ha02972c4993bed5a
logs/devnet_validator_26.log-   5:     0x55d912d35e31 - nimiq_blockchain_albatross::blockchain::Blockchain::push_block::h44f02ec4e09d2652
logs/devnet_validator_26.log-   6:     0x55d912d3edec - <nimiq_blockchain_albatross::blockchain::Blockchain as nimiq_blockchain_base::AbstractBlockchain>::push::hfd80ebcf8157243b
logs/devnet_validator_26.log-   7:     0x55d91288aeb3 - <nimiq_consensus::consensus_agent::sync::FullSync<B> as nimiq_consensus::consensus_agent::sync::SyncProtocol<B>>::on_block::h6e500077ebaf5fd2
logs/devnet_validator_26.log-   8:     0x55d9128dabb0 - nimiq_consensus::inventory::InventoryAgent<P>::on_block::hfb909ea99485ba50
logs/devnet_validator_26.log-   9:     0x55d9127f2bef - <F as nimiq_utils::observer::PassThroughListener<E>>::on_event::hae74b681fed177f7
logs/devnet_validator_26.log-  10:     0x55d912d1d96c - nimiq_utils::observer::PassThroughNotifier<E>::notify::h0dec4b1e700e784d
logs/devnet_validator_26.log-  11:     0x55d912d15fcb - nimiq_messages::MessageNotifier::notify::h22d165486707c28b
logs/devnet_validator_26.log-  12:     0x55d912b2ed3d - <F as nimiq_utils::observer::PassThroughListener<E>>::on_event::h0d1eb895ecababc8
logs/devnet_validator_26.log-  13:     0x55d912adde62 - nimiq_utils::observer::PassThroughNotifier<E>::notify::h352d5c6cb670a11e
logs/devnet_validator_26.log-  14:     0x55d912b0258d - <futures::stream::for_each::ForEach<S,F,U> as futures::future::Future>::poll::hc1eac64dcf437bc7
logs/devnet_validator_26.log-  15:     0x55d912ad570c - futures::future::chain::Chain<A,B,C>::poll::h4506e149e0a36ee6
logs/devnet_validator_26.log-  16:     0x55d912b0111c - <futures::future::select::Select<A,B> as futures::future::Future>::poll::h5bde97e4142fea98
logs/devnet_validator_26.log-  17:     0x55d912b305d1 - <futures::future::map::Map<A,F> as futures::future::Future>::poll::hc106dc243de6998c
logs/devnet_validator_26.log-  18:     0x55d912abfe8a - <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll::h6cce08ccd9a3745a
logs/devnet_validator_26.log-  19:     0x55d912e6164d - futures::task_impl::Spawn<T>::poll_future_notify::h0d68f206582f2312
logs/devnet_validator_26.log-  20:     0x55d912e612f1 - std::panicking::try::do_call::h6559204dd63416eb
logs/devnet_validator_26.log-  21:     0x55d912f31daa - __rust_maybe_catch_panic
logs/devnet_validator_26.log-                               at src/libpanic_unwind/lib.rs:79

Reward distribution

Right now, we distribute the reward equally between all non-misbehaving validators.
While this incentivises reporting of forks, it also incentivises validators to attack each other to maximise their reward.

The current specification thus says:

His reward is confiscated and burned\footnote{The reward is not divided among the other validators so as to not incentivize them to attack each other (ex: by doing a denial-of-service).}.
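A small worked example of why redistribution creates the incentive and burning does not. The function names and the equal-share model are assumptions for illustration; integer division remainders are simply ignored here.

```rust
/// Current behaviour (as described above): the whole reward is split
/// equally between the non-misbehaving validators.
/// Returns the reward per honest validator.
pub fn reward_redistributed(total: u64, validators: u64, misbehaving: u64) -> Option<u64> {
    let honest = validators.checked_sub(misbehaving)?;
    if honest == 0 {
        return None;
    }
    Some(total / honest)
}

/// Specified behaviour: every validator's share is fixed up front and the
/// shares of misbehaving validators are burned.
/// Returns (reward per honest validator, total burned).
pub fn reward_burned(total: u64, validators: u64, misbehaving: u64) -> Option<(u64, u64)> {
    if validators == 0 || misbehaving > validators {
        return None;
    }
    let share = total / validators;
    Some((share, share * misbehaving))
}
```

For example, with a reward of 100, 10 validators and 2 of them misbehaving, redistribution pays each honest validator 12 instead of 10, so knocking out a competitor pays off. With burning, each honest validator gets 10 regardless and 20 is burned.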

Failed to compute transactions root, micro blocks missing

This panic was captured on our DevNet (branch devnet-rc3):

logs/devnet_validator_9.log- WARN  staking_contract     | Slashing NQ98 LFT2 96K1 314V H2DT D3UE 9BME BY9R FNM4 with 3.39218
logs/devnet_validator_9.log- WARN  staking_contract     | Slashing NQ89 1551 J9H2 K8C0 4XMV SG37 PP38 V6RT E5N4 with 3.39218
logs/devnet_validator_9.log- WARN  staking_contract     | Slashing NQ76 0JQR LU02 9RHX ES9A TSSX CNGE UPYE KTFC with 3.39218
logs/devnet_validator_9.log- INFO  consensus            | Now at block #7168
logs/devnet_validator_9.log- WARN  staking_contract     | Slashing NQ08 EGR6 4UUA LRBE GV9Q 50Y1 2FDU CPA0 GH7C with 3.37895
logs/devnet_validator_9.log: ERROR panic                | thread 'tokio-runtime-worker-5' panicked at 'Failed to compute transactions root, micro blocks missing': src/libcore/option.rs:1185
logs/devnet_validator_9.log-stack backtrace:
logs/devnet_validator_9.log-   0: log_panics::init::{{closure}}
logs/devnet_validator_9.log-   1: std::panicking::rust_panic_with_hook
logs/devnet_validator_9.log-             at src/libstd/panicking.rs:468
logs/devnet_validator_9.log-   2: std::panicking::continue_panic_fmt
logs/devnet_validator_9.log-             at src/libstd/panicking.rs:373
logs/devnet_validator_9.log-   3: rust_begin_unwind
logs/devnet_validator_9.log-             at src/libstd/panicking.rs:302
logs/devnet_validator_9.log-   4: core::panicking::panic_fmt
logs/devnet_validator_9.log-             at src/libcore/panicking.rs:139
logs/devnet_validator_9.log-   5: core::option::expect_failed
logs/devnet_validator_9.log-             at src/libcore/option.rs:1185
logs/devnet_validator_9.log-   6: nimiq_block_production_albatross::BlockProducer::next_macro_header
logs/devnet_validator_9.log-   7: nimiq_block_production_albatross::BlockProducer::next_macro_block_proposal
logs/devnet_validator_9.log-   8: nimiq_validator::validator::Validator::produce_macro_block
logs/devnet_validator_9.log-   9: <futures::future::lazy::Lazy<F,R> as futures::future::Future>::poll
logs/devnet_validator_9.log-  10: futures::task_impl::Spawn<T>::poll_future_notify
logs/devnet_validator_9.log-  11: std::panicking::try::do_call
logs/devnet_validator_9.log-  12: __rust_maybe_catch_panic
logs/devnet_validator_9.log-             at src/libpanic_unwind/lib.rs:83

Use RwLock for LazyPublicKey

It currently uses a Mutex. Since the internal cache for uncompressed public keys is only accessed once for reading, it makes much more sense to use a RwLock here.
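A rough sketch of the RwLock variant, assuming the cache is filled once and only read afterwards. The types CompressedKey, UncompressedKey and LazyKey are placeholders, not the real nimiq types. The sketch uses std::sync::RwLock to stay self-contained and returns a clone rather than a reference.

```rust
use std::sync::RwLock;

/// Placeholder for the compressed key bytes.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct CompressedKey(pub [u8; 4]);

/// Placeholder for the uncompressed key.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct UncompressedKey(pub Vec<u8>);

impl CompressedKey {
    /// Stand-in for the expensive decompression step.
    fn uncompress(&self) -> UncompressedKey {
        UncompressedKey(self.0.iter().map(|b| b.wrapping_mul(2)).collect())
    }
}

/// Lazily uncompressed key with an RwLock-protected cache.
pub struct LazyKey {
    compressed: CompressedKey,
    cache: RwLock<Option<UncompressedKey>>,
}

impl LazyKey {
    pub fn new(compressed: CompressedKey) -> Self {
        LazyKey { compressed, cache: RwLock::new(None) }
    }

    /// Returns the uncompressed key, computing it on first use.
    pub fn uncompress(&self) -> UncompressedKey {
        // Fast path: many readers can hold the shared lock at the same time.
        {
            let guard = self.cache.read().unwrap();
            if let Some(key) = guard.as_ref() {
                return key.clone();
            }
        } // read guard is dropped here, before the write lock is taken

        // Slow path: take the exclusive lock and fill the cache. Another
        // thread may have filled it in the meantime, so check again.
        let mut guard = self.cache.write().unwrap();
        guard
            .get_or_insert_with(|| self.compressed.uncompress())
            .clone()
    }
}
```

After the first call, all later calls only take the shared read lock, so concurrent readers no longer serialize on a Mutex.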

Furthermore what's the point of uncompressed()? There is uncompress() that returns a reference and uncompressed just clones that. No need to have a method for that IMHO.

Unknown signers in signature: MultiSignature

Seems like there is a race condition where we reset the ValidatorPool which is used by Handel as an IdentityRegistry. But the impl looks like this:

impl WeightRegistry for ValidatorRegistry {
    fn weight(&self, id: usize) -> Option<usize> {
        self.validators.read().get_public_key(id).map(|(_, weight)| weight)
    }
}

So in between calls the validator pool might change. Actually ValidatorPool::reset_epoch should do all changes in one go (while holding a write lock), but maybe not all validators got selected for the new epoch, so a validator ID is missing. Also if this wasn't the case, the IDs would still be messed up.

Solution

As a first fix, the impl for the IdentityRegistry can override the signers_weight implementation, so that at least all weights can be fetched while holding the lock only once.

In a second step we need to remove the possibility that an aggregation for epoch X queries the ValidatorPool which was reset to epoch X + 1.
In the first implementation of IdentityRegistry it actually wasn't intended to have interior mutability. We can reimplement that again by copying all public keys and weights only when the IdentityRegistry is constructed. All necessary data should be available at this point anyway.
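The second step could look roughly like this. It is a sketch under assumptions: EpochRegistry and PublicKey are hypothetical stand-ins, and the real IdentityRegistry/WeightRegistry traits are not reproduced here. The point is only that the registry owns a copy of the epoch's keys and weights, so a later reset of the pool cannot affect a running aggregation.

```rust
/// Placeholder for a validator's public key.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct PublicKey(pub u8);

/// Immutable snapshot of the validators of one epoch.
/// No locks and no interior mutability: the data is copied once at construction.
pub struct EpochRegistry {
    epoch: u32,
    entries: Vec<(PublicKey, usize)>,
}

impl EpochRegistry {
    /// Copies all public keys and weights out of the pool's current state.
    pub fn new(epoch: u32, validators: &[(PublicKey, usize)]) -> Self {
        EpochRegistry { epoch, entries: validators.to_vec() }
    }

    /// The epoch this snapshot was taken for.
    pub fn epoch(&self) -> u32 {
        self.epoch
    }

    /// Public key of validator `id`, if it exists in this epoch.
    pub fn public_key(&self, id: usize) -> Option<&PublicKey> {
        self.entries.get(id).map(|(key, _)| key)
    }

    /// Weight of validator `id`, if it exists in this epoch.
    pub fn weight(&self, id: usize) -> Option<usize> {
        self.entries.get(id).map(|(_, weight)| *weight)
    }

    /// Total weight of all signers, or None if any signer is unknown.
    pub fn signers_weight<I: IntoIterator<Item = usize>>(&self, signers: I) -> Option<usize> {
        signers.into_iter().map(|id| self.weight(id)).sum()
    }
}
```

An aggregation for epoch X would keep its own EpochRegistry for X, and the aggregation for epoch X + 1 would construct a fresh one.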

Notes

  • Same log dump as #59
  • The node kept working after this panic and this should happen very rarely. So low priority.

Log

 DEBUG consensus            | Now at block #32895
 DEBUG validator_network    | pBFT proposal by validator 11: 341b44023fc028f26b6beef4b8f01aadbc3e25b2f77a1a5a9ff59dfd31e9b8b8
 DEBUG consensus            | Now at block #32896
 DEBUG validator            | Setting validator to active: pk_idx=28
 WARN  pool                 | No validator info for: 8 (2 votes)
 ERROR panic                | thread 'tokio-runtime-worker-2' panicked at 'Unknown signers in signature: MultiSignature { signature: Signature(850d7f1bd3cef158bf45c7631db6820c0ad7801f83a9c7d178ccf8f027575225d652d8cb1c3091fb6279232c742d095f), signers: BitSet(29)[0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30] }': validator/src/signature_aggregation/voting.rs:171
stack backtrace:
   0: log_panics::init::{{closure}}
   1: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:468
   2: std::panicking::continue_panic_fmt
             at src/libstd/panicking.rs:373
   3: std::panicking::begin_panic_fmt
             at src/libstd/panicking.rs:328
   4: nimiq_validator::signature_aggregation::voting::VotingProtocol<T>::votes::{{closure}}::{{closure}}
   5: nimiq_validator::signature_aggregation::voting::VotingProtocol<T>::votes
   6: nimiq_validator::validator_network::ValidatorNetwork::on_pbft_commit_level_update
   7: <F as nimiq_utils::observer::PassThroughListener<E>>::on_event
   8: nimiq_validator::validator_agent::ValidatorAgent::on_pbft_commit_message
   9: <F as nimiq_utils::observer::PassThroughListener<E>>::on_event
  10: nimiq_messages::MessageNotifier::notify
  11: <F as nimiq_utils::observer::PassThroughListener<E>>::on_event
  12: nimiq_utils::observer::PassThroughNotifier<E>::notify
  13: <futures::stream::for_each::ForEach<S,F,U> as futures::future::Future>::poll
  14: futures::future::chain::Chain<A,B,C>::poll
  15: <futures::future::select::Select<A,B> as futures::future::Future>::poll
  16: <futures::future::map::Map<A,F> as futures::future::Future>::poll
  17: <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll
  18: futures::task_impl::Spawn<T>::poll_future_notify
  19: std::panicking::try::do_call
  20: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:79
  21: tokio_threadpool::task::Task::run
  22: tokio_threadpool::worker::Worker::run_task
  23: tokio_threadpool::worker::Worker::run
  24: std::thread::local::LocalKey<T>::with
  25: std::thread::local::LocalKey<T>::with
  26: tokio_reactor::with_default
  27: tokio::runtime::threadpool::builder::Builder::build::{{closure}}
  28: std::thread::local::LocalKey<T>::with
  29: std::thread::local::LocalKey<T>::with
  30: std::sys_common::backtrace::__rust_begin_short_backtrace
  31: std::panicking::try::do_call
  32: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:79
  33: core::ops::function::FnOnce::call_once{{vtable.shim}}
  34: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
             at /rustc/ac162c6abe34cdf965afc0389f6cefa79653c63b/src/liballoc/boxed.rs:942
  35: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
             at /rustc/ac162c6abe34cdf965afc0389f6cefa79653c63b/src/liballoc/boxed.rs:942
      std::sys_common::thread::start_thread
             at src/libstd/sys_common/thread.rs:13
      std::sys::unix::thread::Thread::new::thread_start
             at src/libstd/sys/unix/thread.rs:79
  36: start_thread
  37: __clone

 DEBUG consensus            | Now at block #32897
 DEBUG sink                 | Closing connection, reason: SendFailed (Some("SendFailed"))
 DEBUG sink                 | Error closing connection: send failed because receiver is gone
 DEBUG consensus            | Now at block #32898

Invalid Block after finishing sync

With the new client and client API the connection to the seed fails after finishing the sync. This is configured as full sync, but synced up suspiciously fast.

The code for this is on branch janosch/new-client.

Log

 INFO  consensus            | Now at block #33000
 INFO  nimiq_client2        | Head: #33072 - 868593894cb7856dd4c1c726c8748645446ec8789ceada7667bca060ed71d952, Peers: 1
 INFO  consensus            | Now at block #33100
 INFO  consensus            | Now at block #33200
 DEBUG consensus            | Finished sync with peer ws://5.0.0.5:8443/1a9c3545e64efca6283c1d2c9c1ba231
 INFO  consensus            | Synced with all connected peers (1), consensus established
 INFO  consensus            | Blockchain at block #33256 [4e81becd8edcbb779a94f448a1f3782ddc9487b2855d5dd58f9ac1c57a8ded97]
 WARN  blockchain           | Failed to determine slots - preceding macro block not found: block_number=33300, view_number=0, state.block_number()=33256
 DEBUG sink                 | Closing connection, reason: InvalidBlock (None)
 DEBUG connection_pool      | Peer left: ws://albatross.nimiq.dev:8444/1a9c3545e64efca6283c1d2c9c1ba231 144.76.172.80 (version=Some(1), closeType=InvalidBlock)
 INFO  consensus            | Disconnected from ws://5.0.0.5:8443/1a9c3545e64efca6283c1d2c9c1ba231
 DEBUG connection_pool      | Connection established (outbound) #0 144.76.172.80 ws://albatross.nimiq.dev:8444/1a9c3545e64efca6283c1d2c9c1ba231
 DEBUG connection_pool      | Peer joined: ws://5.0.0.5:8443/1a9c3545e64efca6283c1d2c9c1ba231 (v1, FULL | VALIDATOR, core-rs/0.1.0 (native; linux x86_64))
 INFO  consensus            | Connected to ws://5.0.0.5:8443/1a9c3545e64efca6283c1d2c9c1ba231
 DEBUG network_agent        | Requesting addresses from ws://5.0.0.5:8443/1a9c3545e64efca6283c1d2c9c1ba231
 DEBUG consensus            | Syncing blockchain with peer ws://5.0.0.5:8443/1a9c3545e64efca6283c1d2c9c1ba231
 INFO  consensus            | Now at block #33257
 INFO  consensus            | Now at block #33258
 INFO  consensus            | Now at block #33259

Analysis

As you can see the client reconnects. It then syncs again, but the same error repeats - with the only difference that in the repeating errors there is no: Failed to determine slots - preceding macro block not found
After some time, it works without errors.

Accounts::commit takes block number

Accounts::commit takes a block number, but for macro-block-sync we don't exactly know that number. Currently Blockchain::push_isolated_macro uses the block number of the macro block. A full sync uses the block number of the actual micro block the transactions and inherents were in.
