cpr's Introduction

Consensus Protocol Research

CPR is a toolbox for specifying, simulating, and attacking proof-of-work consensus protocols. In this repository you will find

  • protocol specifications for Bitcoin, Ethereum PoW, and others,
  • implementations of known attacks against these protocols,
  • a simulator that executes the specified protocols and attacks in a virtual environment,
  • tooling for automatic attack search with reinforcement learning (RL), and
  • evaluation scripts and notebooks for the above.

I'm working on a website with more details.

Related Work
  • CPR was inspired by previous work on HotPoW and Parallel Proof-of-Work / $\mathcal B_k$. [code] [preprint] [AFT'22 paper]
  • We applied CPR to analyze the Tailstorm consensus and cryptocurrency. [preprint]

Python/RL Quickstart

CPR provides an OpenAI Gym environment for attack search with Python RL frameworks. If you meet the following requirements, you can install it from PyPI.

  • Unix-like operating system with x86_64 support
  • CPython, version >= 3.9
pip install cpr-gym

If this worked, you are ready to go. The following snippet simulates 2016 steps of honest behaviour in Nakamoto consensus.

import gym
import cpr_gym

env = gym.make("cpr-nakamoto-v0", episode_len = 2016)
obs = env.reset()
done = False
while not done:
    action = env.policy(obs, "honest")
    obs, rew, done, info = env.step(action)

Install from Source

The protocol specifications and simulator are OCaml programs. Most parts of the Gym environment are also written in OCaml. The Python module cpr_gym loads the OCaml code from a pre-compiled shared object named cpr_gym_engine.so. To install the package from source, you have to build this shared object yourself and hence need the OCaml toolchain installed.

Opam is the OCaml package manager. It's a bit like Python's pip or JavaScript's npm. We use it to download and install our OCaml dependencies and to manage different versions of the OCaml compiler. Make sure that a recent version (>= 2.0) is installed on your system; follow these instructions. Then run make setup to set up the compiler and dependencies in the current working directory under _opam. Later, e.g. when dependencies change, run make dependencies to update the toolchain. If you ever suspect that the OCaml dependencies are broken and you do not know how to fix them, delete the _opam directory and run make setup again.

Dune is an OCaml build system. We use it to build executables and shared objects, and to run tests. You do not have to interact with dune directly. Just run make build to test whether the OCaml build works.

Now, installing cpr_gym as an editable Python package should work. Try pip install -e . and follow the short Python example above. If it works, you're ready to go.

import cpr_gym tries to detect editable installs. If it finds one, cpr_gym_engine.so is loaded from the OCaml build directory (./_build). You can rebuild the shared object with make build.

It might be useful to install all Python development dependencies with pip install -r requirements.txt. Afterwards, you can run the full test suite, OCaml and Python, with make test.

cpr's Issues

Visualize blockchains and executions in browser

The current approach to visualization is to generate Graphviz dot files and render them to PNG. This works for small blockchains and short executions, and only on the local machine.

A better approach might be to log executions into a graph exchange format (e.g. GraphML), then render them in the browser using JavaScript. Candidate libraries are sigma.js and cytoscape.js. It would be really nice if make visualize created a website listing all available executions. I could also support loading an execution that is not listed, e.g. from a raised exception on a different machine.
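A minimal sketch of the logging side, assuming executions can be dumped as a list of blocks with parent references; the field names and the networkx dependency are illustrative, not the simulator's actual interface:

# Hypothetical sketch: dump a block DAG to GraphML so it can be rendered in the
# browser with sigma.js or cytoscape.js. The block dictionaries below are
# placeholders, not the simulator's real data layout.
import networkx as nx

def export_execution(blocks, path="execution.graphml"):
    g = nx.DiGraph()
    for b in blocks:
        g.add_node(b["id"], miner=str(b.get("miner", "")))
        for parent in b["parents"]:
            g.add_edge(b["id"], parent)
    nx.write_graphml(g, path)

export_execution(
    [
        {"id": "genesis", "miner": None, "parents": []},
        {"id": "b1", "miner": 0, "parents": ["genesis"]},
        {"id": "b2", "miner": 1, "parents": ["genesis"]},
    ]
)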

Tasks:

  • Log executions into graph exchange format (see 2c368cb)
  • Visualize in browser

Efficient Tailstorm block selection

In the Tailstorm protocol, participants choose which (sub) blocks to include in the next summary (or strong block) such that their own reward is maximized. The current implementation uses a brute-force algorithm to select the optimal combination of sub blocks. This is very expensive for high $k$.

Things can be sped up with a dynamic programming algorithm. I think George has some notes about that somewhere.
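For illustration only, here is the shape of the brute-force search in Python; the validity constraint on sub-block combinations is abstracted away and the data layout is made up, so this is not the simulator's actual code:

from itertools import combinations

def best_quorum(sub_blocks, k, my_id, is_valid_quorum):
    # Enumerates all (n choose k) candidate quorums -- this is what gets
    # expensive for high k; a dynamic programming approach would avoid it.
    best, best_reward = None, -1
    for quorum in combinations(sub_blocks, k):
        if not is_valid_quorum(quorum):
            continue
        my_reward = sum(1 for sb in quorum if sb["miner"] == my_id)
        if my_reward > best_reward:
            best, best_reward = quorum, my_reward
    return best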

Synchronize Tailstorm implementations

The Tailstorm implementation here in this repo (spec) does not align with the C++ implementation of the protocol (impl).

Impl: (full/strong) blocks do not require a proof-of-work; they are formed deterministically from $k$ sub blocks; all miners derive their own strong block as soon as they have $k$ sub blocks available; due to the determinism, the resulting strong blocks are compatible (identical) as long as the $k$ referenced sub blocks are the same.

Spec: (full/strong) blocks require a proof-of-work; there are $k-1$ sub blocks per strong block; miners share their strong block with the other nodes as soon as the required proof-of-work is solved.

I think the difference is not substantial. However, communicating it is tedious and might confuse readers. It would be better if spec and impl aligned. I cannot change impl, so I'll adapt the spec.


The current DAG / proof-of-work abstraction does not support deterministic appends. In principle this could be added, but we might get away without it, at least for a first draft.

I plan to implement a new version (spec') as follows.

Spec': (full/strong) blocks do not require a proof-of-work; forming them requires $k$ sub-block parents; all miners derive their own strong block as soon as they have $k$ sub blocks available; due to the non-determinism, the resulting strong blocks are not compatible; each miner mines on their own strong block until the conflict is resolved by counting confirming sub blocks; this will create a lot of orphaned strong blocks.

This should be relatively straightforward. We have something similar (without the tree structure of sub blocks) in the $\mathcal B_k$ protocol.

At a later stage, I can add support for deterministic appends. This will get rid of the orphaned strong blocks and transform spec' into spec.


  1. Implement spec'.
  2. Support deterministic appends.
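The conflict resolution in spec' could look roughly like the following toy Python sketch; the data layout is invented for illustration, not taken from the spec:

def preferred_strong_block(candidates):
    # Toy version of the spec' tie-break: among conflicting strong blocks,
    # prefer the one confirmed by the most sub blocks.
    return max(candidates, key=lambda block: block["confirming_sub_blocks"])

preferred_strong_block(
    [
        {"id": "s1", "confirming_sub_blocks": 3},
        {"id": "s2", "confirming_sub_blocks": 5},  # wins
    ]
)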

RL episode termination

From https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html

Two important things to keep in mind when creating a custom environment is to avoid breaking Markov assumption and properly handle termination due to a timeout (maximum number of steps in an episode). For instance, if there is some time delay between action and observation (e.g. due to wifi communication), you should give a history of observations as input.

Termination due to timeout (max number of steps per episode) needs to be handled separately. You should fill the key in the info dict: info["TimeLimit.truncated"] = True. If you are using the gym TimeLimit wrapper, this will be done automatically. You can read Time Limit in RL or take a look at the RL Tips and Tricks video for more details.

Delayed actions should not be an issue for us, but termination might be. The linked paper's abstract is instructive:

In reinforcement learning, it is common to let an agent interact for a fixed amount of time with its environment before resetting it and repeating the process in a series of episodes. The task that the agent has to learn can either be to maximize its performance over (i) that fixed period, or (ii) an indefinite period where time limits are only used during training to diversify experience. In this paper, we provide a formal account for how time limits could effectively be handled in each of the two cases and explain why not doing so can cause state aliasing and invalidation of experience replay, leading to suboptimal policies and training instability. In case (i), we argue that the terminations due to time limits are in fact part of the environment, and thus a notion of the remaining time should be included as part of the agent's input to avoid violation of the Markov property. In case (ii), the time limits are not part of the environment and are only used to facilitate learning. We argue that this insight should be incorporated by bootstrapping from the value of the state at the end of each partial episode. For both cases, we illustrate empirically the significance of our considerations in improving the performance and stability of existing reinforcement learning algorithms, showing state-of-the-art results on several control tasks.

In principle, we are in case (ii). Blockchain protocols run forever and so do the attacks; we end the episode only to facilitate training. The paper recommends bootstrapping the start state of the next episode from the end state of the finished episode. In our setting, this would mean that we do not reset the DAG and participant state on episode end. Instead, we would only reset the reward calculation, and maybe truncate the DAG to avoid memory leaks. This is certainly feasible but requires some time to implement.
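For case (ii) we could also, as the Stable Baselines quote above suggests, at least signal truncation to the RL library so it can bootstrap from the value of the final state. A minimal sketch, assuming the classic (pre-0.26) gym API used in this repo and that episode_len can simply be set large enough to never fire on its own:

import gym
import cpr_gym
from gym.wrappers import TimeLimit

# Make the inner episode effectively unbounded so that the wrapper, not the
# environment, ends the episode; TimeLimit then sets
# info["TimeLimit.truncated"] = True on timeout.
inner = gym.make("cpr-nakamoto-v0", episode_len=10**9)
env = TimeLimit(inner, max_episode_steps=2016)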

In the meantime, we can apply the solution proposed for case (i) to our problem: add episode progress (this can be chain progress, chain time, or the number of steps in the episode) to the observation. I've implemented a wrapper for this in 5093252.

cpr/python/train/ppo.py

Lines 197 to 201 in 5093252

fields = []
fields.append(((lambda self, info: info["episode_progress"]), 0, float("inf"), 0))
fields.append(((lambda self, info: info["episode_chain_time"]), 0, float("inf"), 0))
fields.append(((lambda self, info: info["episode_n_steps"]), 0, float("inf"), 0))
env = cpr_gym.wrappers.ExtendObservationWrapper(env, fields)

But it is not clear to me whether we indeed violate the Markov property without one of the above fixes. The wind-down of the episode does not do anything special: we just return done = True and restart from scratch at the end of the episode. In the dense wrapper, rewards have already been calculated and reported in the previous step. Maybe I should read the full paper to understand the problem better.

SSZ attack spaces bug

Consider the prepare handler in the Ethereum attack space.

| Deliver x ->
  let state =
    (* simulate defender *)
    handle_public state event
  in
  (* deliver visible (not ignored) messages *)
  if Public_view.visibility x
  then `Deliver, handle_private state event
  else `Deliver, state

L414 (the Public_view.visibility check) must be a bug. If anything, we should filter for private visibility.

But it seems to me that handle_private should not be applied at all. Imagine a situation where the agent wants to work on a private chain that is shorter than the public chain. Applying handle_private to delivered messages forces the agent to adopt the longer (public) chain.

I fixed the behaviour for tailstorm (now tailstormll) in #10. All other attack spaces have this problem.

When fixing this problem, consider adopting the recent changes from tailstorm_ssz for the other protocols. I think the new match/override mechanism translates to all protocols.

Seed OCaml random state from Python

For reproducible evaluations it would be great if we could set the RL engine's (OCaml) random state from the Python API.

# Option 1
env = gym.make("cpr_gym:cpr-v0", seed=42)
# Option 2
env.reset(seed=42)
# Option 3
env.set_random_state(42)

Option 3 is the most versatile. We could call set_random_state from reset, and make/__init__ already calls reset.
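A usage sketch for option 3; env.set_random_state is the proposed, not yet existing, API:

import gym
import cpr_gym

def rollout(seed, n=100):
    env = gym.make("cpr-nakamoto-v0", episode_len=n)
    env.set_random_state(seed)  # proposed API (option 3), does not exist yet
    obs, done, rewards = env.reset(), False, []
    while not done:
        obs, rew, done, info = env.step(env.policy(obs, "honest"))
        rewards.append(rew)
    return rewards

# Identical seeds should give identical executions.
assert rollout(42) == rollout(42)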

Optimize observation spaces

All SSZ attack spaces use integer observations and set their range to either [0, max_int] or [min_int, max_int]. For some fields the actual range is much smaller. Maybe we can provide more restrictive bounds to the OpenAI Gym observation space.
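The idea in gym terms, with made-up field names and bounds (a sketch, not the attack spaces' real observation layout):

import numpy as np
from gym import spaces

# Instead of [0, max_int] / [min_int, max_int] for every field, give each
# field the tightest bound we can justify.
observation_space = spaces.Box(
    low=np.array([0, 0, 0], dtype=np.float32),       # e.g. counters start at 0
    high=np.array([256, 256, 8], dtype=np.float32),  # e.g. bounded by k or a cutoff
    dtype=np.float32,
)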

Simulator failure when training Tailstorm with k=2

Observed this error after running make train-online on 247c11f.

## Environment (before vectorization) ##
Tailstorm with k=2, constant rewards, and optimal sub-block selection; SSZ'16-like attack space; α=0.25 attacker
public_blocks: 0
private_blocks: 0
diff_blocks: 0
public_votes: 1
private_votes_inclusive: 2
private_votes_exclusive: 1
public_depth: 0
private_depth_inclusive: 1
private_depth_exclusive: 1
event: 2
Actions: (0) Adopt_Prolong | (1) Override_Prolong | (2) Match_Prolong | (3) Wait_Prolong | (4) Adopt_Proceed | (5) Override_Proceed | (6) Match_Proceed | (7) Wait_Proceed
## Training ##
Using cpu device
-----------------------------------
| rollout/           |            |
|    ep_len_mean     | 248        |
|    ep_rew_mean     | 0.63059205 |
| time/              |            |
|    fps             | 10568      |
|    iterations      | 1          |
|    time_elapsed    | 23         |
|    total_timesteps | 245760     |
-----------------------------------
Process ForkServerProcess-20:
Traceback (most recent call last):
  File "/usr/lib64/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib64/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 29, in _worker
    observation, reward, done, info = env.step(data)
  File "/home/patrik/devel/cpr/python/gym/cpr_gym/wrappers.py", line 208, in step
    obs, reward, done, was_info = self.env.step(action)
  File "/home/patrik/devel/cpr/python/gym/cpr_gym/wrappers.py", line 184, in step
    obs, reward, done, info = self.env.step(action)
  File "/home/patrik/devel/cpr/python/gym/cpr_gym/wrappers.py", line 159, in step
    obs, reward, done, info = self.env.step(action)
  File "/home/patrik/devel/cpr/python/gym/cpr_gym/wrappers.py", line 84, in step
    obs, reward, done, info = self.env.step(action)
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/gym/wrappers/order_enforcing.py", line 11, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/patrik/devel/cpr/python/gym/cpr_gym/envs.py", line 47, in step
    obs, r, d, i = engine.step(self.ocaml_env, a)
  File "ocaml/gym/bridge.ml", line 105, in Dune__exe__Bridge.(fun):105
  File "ocaml/gym/engine.ml", line 183, in Dune__exe__Engine.of_module.step:183
  File "ocaml/protocols/tailstorm_ssz.ml", line 293, in Cpr_protocols__Tailstorm_ssz.Make.Agent.apply:293
  File "ocaml/protocols/tailstorm.ml", line 519, in Cpr_protocols__Tailstorm.Make.Honest.next_summary':519
  File "ocaml/protocols/tailstorm.ml", line 415, in Cpr_protocols__Tailstorm.Make.Honest.optimal_quorum:415
  File "ocaml/protocols/combinatorics.ml", line 17, in Cpr_protocols__Combinatorics.n_choose_k:17
ValueError: (Division_by_zero)
Traceback (most recent call last):
  File "/home/patrik/devel/cpr/python/train/ppo.py", line 315, in <module>
    model.learn(
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/ppo/ppo.py", line 314, in learn
    return super().learn(
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 251, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 185, in collect_rollouts
    if callback.on_step() is False:
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/callbacks.py", line 88, in on_step
    return self._on_step()
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/callbacks.py", line 192, in _on_step
    continue_training = callback.on_step() and continue_training
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/callbacks.py", line 88, in on_step
    return self._on_step()
  File "/home/patrik/devel/cpr/python/train/ppo.py", line 232, in _on_step
    r = super()._on_step()
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/callbacks.py", line 435, in _on_step
    episode_rewards, episode_lengths = evaluate_policy(
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/evaluation.py", line 87, in evaluate_policy
    observations, rewards, dones, infos = env.step(actions)
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 162, in step
    return self.step_wait()
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/vec_monitor.py", line 76, in step_wait
    obs, rewards, dones, infos = self.venv.step_wait()
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 120, in step_wait
    results = [remote.recv() for remote in self.remotes]
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 120, in <listcomp>
    results = [remote.recv() for remote in self.remotes]
  File "/usr/lib64/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/usr/lib64/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib64/python3.9/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError

Selfish mining network with gamma = 0 is broken

The selfish mining network as described in the following blog post is broken: gamma = 0 results in a divide-by-zero and hence unbounded message delays from the attacker to the defenders. These message delays weaken the attacker.

https://pkel.github.io/cpr/blog/generalizing-selfish-minings-gamma/#constructing-a-network


As a hotfix, I've decreased the epsilon to 1e-9. This enables gamma=0.01 without risking significant message delays.

e7521ad#diff-d63bdccebde9f15fb0baea8999c8565dfe15c0cb0dbcf37ef3a896770550c290


It would be better to sample the message delays from a different interval. Maybe fix the interval length, then shift it up or down to meet the given gamma. I recall that we had something like this before; it could be worth browsing the history of the network implementation.

Revise API for incentive schemes

The current API for implementing incentive schemes looks like this.

cpr/ocaml/lib/intf.ml

Lines 37 to 39 in 3f8cc5e

(** Calculate and assign rewards for the vertex and (potentially) its neighbours.
    Typically this is called on the history of the winning chain. *)
type 'a reward_function = assign:(float -> 'a Dag.vertex -> unit) -> 'a Dag.vertex -> unit

I think it would be easier to add something like a coinbase transaction to each block. E.g. the referee could define a function

val reward: data -> (int * float) list

Previously, the incentive scheme assigned rewards to vertices. The framework looked up the origin of the vertex and redirected the reward to the originating node.

With the new scheme this is not possible, which means we cannot assign rewards to votes directly. Instead, the new scheme would hand out the vote rewards with the next block.

The new scheme works better with deterministic appends, where vertices can have more than one origin.


One feature of the existing API is that one protocol can define multiple reward schemes. If we want to keep this, we could do something like

val reward: scheme -> data -> (int * float) list

The simulator could accumulate the past rewards for each DAG vertex. This would simplify the implementation of the RL engine, where we currently recalculate the rewards for the whole chain on each step. See

cpr/ocaml/gym/engine.ml

Lines 277 to 295 in 3f8cc5e

(* TODO. We calculate rewards for the whole chain on each step, then return the
   delta. If this turns out to be too expensive, we can record safepoints and
   use them for caching. *)
let reward_attacker = ref 0.
and reward_defender = ref 0.
and n_pow = ref 0 in
let () =
  let f vertex =
    let open Simulator in
    if Option.is_some (Dag.data vertex).pow then incr n_pow else ();
    t.reward_function
      ~assign:(fun reward vertex ->
        match reward_recipient vertex with
        | None -> ()
        | Some 0 -> reward_attacker := !reward_attacker +. reward
        | Some _ -> reward_defender := !reward_defender +. reward)
      vertex
  in
  Ref.history head |> Seq.iter f

Tailstorm must disambiguate summaries by reward

In Tailstorm, participants optimize summary proposals such that their own reward is maximized. The current implementation is buggy: optimal summaries are appended, but they are not necessarily adopted as the preferred tip of the chain. The update_head function should prefer blocks that yield a higher reward.

Might be easier to fix after implementing #14.
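A toy Python sketch of the intended tie-break; the height comparison and field names are placeholders for whatever criterion the spec actually applies first:

def update_head(current, candidate, my_reward):
    # Prefer strictly better chains first (placeholder criterion), then break
    # ties in favour of the summary that pays this node more.
    if candidate["height"] > current["height"]:
        return candidate
    if candidate["height"] == current["height"] and my_reward(candidate) > my_reward(current):
        return candidate
    return current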

Test tailstorm[ll] reward schemes

It happens from time to time that I have to change the implementation of reward schemes, most recently in #28. Currently I do not have any tests for these functions; I resort to manually looking at the output of make visualize. Everything looks fine at the moment, but manual inspection is prone to error.

If reward scheme bugs sneak in, they will probably stay undetected for some time and might invalidate expensive RL results.

I should add some tests for that. Maybe render short blockchains with their rewards into a string, then do expect tests.
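For example, a pytest-style sketch of the idea (function names are placeholders, not the repo's actual helpers):

def render_chain_with_rewards(chain):
    # Render a tiny chain and its per-block rewards into a stable string.
    return "\n".join(f"{block['id']}: {block['reward']:.2f}" for block in chain)

def test_constant_reward_rendering():
    chain = [{"id": "genesis", "reward": 0.0}, {"id": "b1", "reward": 1.0}]
    expected = "genesis: 0.00\nb1: 1.00"
    assert render_chain_with_rewards(chain) == expected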
