
nomad-xyz / rust

Rust work for nomad actors

License: Apache License 2.0

Dockerfile 0.06% Rust 97.78% Shell 0.60% Python 1.02% TypeScript 0.53%

rust's People

Contributors

anna-carroll, arnaud036, erinhales, imti, kekonen, lattejed, luketchang, mattsse, prestwich, psushi, yourbuddyconner

rust's Issues

Document steps for running relayer

  • Describe what the agent does, namely forwarding signed updates from the home to configured replicas
  • Describe how configuration maps to relayer behavior (the relayer forwards updates for the channels from the home to each configured replica)
  • Describe behavior during failure modes for clarity

Allow external config files

The current agent configuration scheme is rigid and breaks when an external config file is supplied. We want to allow any arbitrary party to provide a unified config file and a set of env vars to run an agent against arbitrary networks.

  • Fix base config validation to allow for config files
  • Compare secrets against supplied config (whether builtin or from file)
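The lookup could distinguish builtin environments from external files; a minimal sketch, where the `ConfigSource` enum and `resolve_config` function are hypothetical (real configs would deserialize into NomadConfig):

```rust
use std::path::{Path, PathBuf};

/// Hypothetical resolution step: builtin configs for known environments,
/// external file otherwise.
enum ConfigSource {
    Builtin(String),
    File(PathBuf),
}

fn resolve_config(run_env: &str) -> ConfigSource {
    match run_env {
        // Known environments keep using the builtin configs.
        "development" | "staging" | "production" => ConfigSource::Builtin(run_env.to_string()),
        // Anything else is treated as a path to an external config file.
        path => ConfigSource::File(Path::new(path).to_path_buf()),
    }
}
```

Validation would then run against whichever source was chosen, comparing supplied secrets to the networks named in the config.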

document process to deploy contracts

Copied from a Discord message that should be transcribed:

the code for deploying the contracts is here: https://github.com/nomad-xyz/monorepo/tree/main/packages/new-deploy

this script is an example of invoking the deploy script: https://github.com/nomad-xyz/monorepo/blob/5b603ee6f58c27d9d5dda6b2a1fa9435a9fc9b97/packages/new-deploy/package.json#L31
the arguments are

  1. the local path to the config file
  2. the local path to the overrides for submitting transactions (if desired)

in my case, i opted to store the configs in a local file in the same package, here: https://github.com/nomad-xyz/monorepo/tree/main/packages/new-deploy/config/development

the layout of the config file should look like this: https://github.com/nomad-xyz/rust/blob/main/configuration/configs/development.json

steps to make your own deploy are:

  1. configure the chains you want to deploy in the protocol section
  2. delete the core and bridge components so that they are empty objects {} (TODO: make an example pre-deploy configuration)
  3. plop that file into the local path, like above, and invoke the deploy script, just like in npm run deploy

Document general steps for running a Nomad agent

Document steps for running a Nomad agent in general. This would include some of the following:

  • Where/how to get/set up the image for an agent (cc @yourbuddyconner)
  • Supplying configuration file or specifying run env
  • Supplying env vars for RPC endpoints, transaction signers, and optional attestation signer

Additional Relevant Deploy Info:

  • We release images via Github actions to our GCR Image Repository which get tagged with the git sha/branch they were built with
  • You can see the build process here if you'd like to build from source -- here is the Dockerfile.
  • We've got a helm repository here, with a nomad-agent helm chart.

feature/refactor: allow agents to optionally read with timelag

  • Some reads want timelag (updater produce_update), some don't (processor message_status)
  • Having all or nothing leads to unexpected behavior with agents reading old state
  • Want granular control over when timelag is used per contract call at call time
  • Want something like home.produce_update(TimeLag::On)
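A minimal sketch of the per-call control; `TimeLag` and the `produce_update(TimeLag::On)` shape come from this issue, while the `Home` fields and `read_height` helper are illustrative stand-ins:

```rust
/// Per-call timelag control, as proposed in this issue.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum TimeLag {
    On,
    Off,
}

struct Home {
    tip: u64,        // latest indexed block
    lag_blocks: u64, // configured timelag in blocks
}

impl Home {
    /// Hypothetical helper: pick the block height a contract read runs at.
    fn read_height(&self, lag: TimeLag) -> u64 {
        match lag {
            // Lagged reads (e.g. updater's produce_update) stay behind the tip.
            TimeLag::On => self.tip.saturating_sub(self.lag_blocks),
            // Unlagged reads (e.g. processor's message_status) use the tip.
            TimeLag::Off => self.tip,
        }
    }
}
```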

bug(merklesync-updater): tries to poll new_root before it has finished syncing all messages

TL;DR the bug in the updater that failed all dev homes is as follows:

  • The updater tries to produce new_roots from a local merkle tree built from all the emitted messages for that home (same process as the processor)
  • The updater has until now never synced/indexed messages
  • The updater started syncing messages, but before syncing finished, it picked the current root of the local tree and used that as its new_root
  • This update of last_committed --> new_root is invalid (the new_root is actually a very old root)
  • The improper update is submitted and the home is failed (for each updater)

Solution:

  • Have the syncing merkle tree keep track of the current committed_root corresponding to its current root
  • I.e. Each time the syncing tree stores a new leaf, it will get the Dispatch.committed_root field and update its current committed_root field
  • The updater will only use tree.root() if tree.current_committed() == old_root
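The guard above could be sketched with toy types; u64 values stand in for real tree roots, and the toy `ingest` just mixes the leaf in, where the real tree appends a leaf:

```rust
/// Sketch of the proposed fix: the syncing tree tracks the committed_root
/// seen alongside each stored leaf, and the updater only produces an update
/// when that committed root matches old_root.
struct SyncingTree {
    root: u64,              // stand-in for the real tree root
    current_committed: u64, // committed_root from the last ingested Dispatch
}

impl SyncingTree {
    fn ingest(&mut self, leaf: u64, committed_root: u64) {
        // Toy "tree": mix the leaf into the root.
        self.root = self.root.wrapping_mul(31).wrapping_add(leaf);
        self.current_committed = committed_root;
    }

    /// Only hand out a new_root if the tree has caught up to old_root.
    fn produce_new_root(&self, old_root: u64) -> Option<u64> {
        if self.current_committed == old_root {
            Some(self.root)
        } else {
            None // still syncing; producing now would be an improper update
        }
    }
}
```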

Support new NomadConfig from nomad-xyz/configuration

High Level Agent Startup Flow

  1. Retrieve NomadConfig for the intended environment (dev, staging, prod)
  2. Use NomadConfig to build union of nomad_base::Settings and agent-specific settings
  3. Instantiate agent from settings (e.g. Updater::from_settings(settings).await? in agents/updater/main.rs) and run agent (implementation already there)

Implementation Details (modify decl_settings! macro)

  • Implement nomad_base::Settings::from_nomad_config(config: NomadConfig, home_name: &str) to map NomadConfig to agent-specific settings (used in next step)
  • fn new() -> Result<Self> must:
    • Fetch NomadConfig based on the RUN_ENV env variable by calling config::get_builtins(<environment name>)
    • Get separation of home vs. replicas based on the AGENT_HOME env variable
    • Return result of Settings::from_nomad_config(config, home_name)
  • Agent takes the result of Settings::new()? and uses it to run the agent (rest of agent/main.rs)
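The new() flow above can be sketched as follows; the getter closure stands in for std::env::var, and comments mark where the real config::get_builtins / Settings::from_nomad_config calls would go:

```rust
/// Toy Settings; the real type is the union of nomad_base::Settings and
/// agent-specific settings.
struct Settings {
    env_name: String,
    home_name: String,
}

fn settings_from_vars<F>(get: F) -> Result<Settings, String>
where
    F: Fn(&str) -> Option<String>,
{
    // In the agent, `get` would be |k| std::env::var(k).ok().
    let env_name = get("RUN_ENV").ok_or_else(|| "RUN_ENV not set".to_string())?;
    let home_name = get("AGENT_HOME").ok_or_else(|| "AGENT_HOME not set".to_string())?;
    // Real flow: let config = config::get_builtins(&env_name);
    //            Settings::from_nomad_config(config, &home_name)
    Ok(Settings { env_name, home_name })
}
```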

Loading secrets (signers & RPC)?

  • It looks like the new config crate is not meant to support secrets, since all NomadConfig values will come directly from the public crate imported as a dependency (using something like config::get_builtins(...))
  • Possible solution:
    • Create new AgentSecrets struct (below) that can be deserialized from JSON (shown below)
    • In agent-specific Settings::new() function, expect/load a secrets.json file (below) that deserializes into AgentSecrets (inject secrets into json file as we already do)
    • Implement some kind of Settings::from_configuration(config: NomadConfig, secrets: AgentSecrets, home_name: &str) where all public values come from NomadConfig while all private values come from AgentSecrets
    • Implement checks if we are missing values in AgentSecrets when comparing to NomadConfig (e.g. we accidentally left out rpc for one network)
=== Rust - AgentSecrets ===
struct AgentSecrets {
   rpcs: HashMap<String, ChainConf>,
   transaction_signers: HashMap<String, SignerConf>,
   attestation_signer: SignerConf,
}

=== JSON - secrets.json ===
{
    "rpcs": {
        "ethereum": {
            "type": "http",
            "url": ""
        },
        "moonbeam": {
            "type": "http",
            "url": ""
        },
        "milkomedaC1": {
            "type": "http",
            "url": ""
        }
    },
    "transactionSigners": {
        "ethereum": {
            "key": "",
            "type": "hexKey"
        },
        "moonbeam": {
            "key": "",
            "type": "hexKey"
        },
        "milkomedaC1": {
            "key": "",
            "type": "hexKey"
        }
    },
    "attestationSigner": {
        "key": "",
        "type": "hexKey"
    }
}
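The missing-values check from the last bullet might look like this, with plain string sets standing in for the networks named in NomadConfig and AgentSecrets:

```rust
use std::collections::HashSet;

/// Sketch: every network named in the public NomadConfig must have an RPC
/// and a transaction signer in AgentSecrets; report what is missing.
fn check_secrets_coverage(
    config_networks: &HashSet<String>,
    rpc_networks: &HashSet<String>,
    signer_networks: &HashSet<String>,
) -> Result<(), Vec<String>> {
    let mut missing = Vec::new();
    for net in config_networks {
        if !rpc_networks.contains(net) {
            missing.push(format!("missing rpc for {}", net));
        }
        if !signer_networks.contains(net) {
            missing.push(format!("missing transaction signer for {}", net));
        }
    }
    if missing.is_empty() { Ok(()) } else { Err(missing) }
}
```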

feature: preflight reverts in fixed-gas actions

Currently, reverts in certain actions are not caught in preflight because we no longer estimate gas. We should run an eth_call to see whether the transaction would revert, and take appropriate action.

example action:
update on a failed home
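The preflight could be factored as a guard around dispatch; a sketch where a closure stands in for the real eth_call against the provider:

```rust
/// Sketch: before sending a fixed-gas transaction, run an eth_call-style
/// simulation and abort if it would revert. `simulate` and `dispatch` are
/// stand-ins; Err(reason) models a revert.
fn preflight_then_send<S, D>(simulate: S, dispatch: D) -> Result<(), String>
where
    S: Fn() -> Result<(), String>,
    D: Fn() -> Result<(), String>,
{
    // eth_call first: a revert here costs nothing on-chain.
    simulate().map_err(|reason| format!("preflight revert: {}", reason))?;
    // Only dispatch the real (fixed-gas) transaction if the call succeeded.
    dispatch()
}
```

For the example action above, the simulation of an update on a failed home would return the revert reason and the transaction would never be sent.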

refactor: deprecate base `Settings` and move info to config crate

Should have something like BaseAgentSettings in configuration and then all the types with lots of functionality (e.g. CachingHome) implement From<BaseAgentSettings>. Overall, we should move from implementing <config struct>::into_<some struct with functionality> to implementing <struct with functionality>::from_<some config struct>
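The proposed direction, sketched with illustrative stand-ins for BaseAgentSettings and CachingHome (the real types carry far more fields and functionality):

```rust
/// Dumb config struct; would live in the configuration crate.
struct BaseAgentSettings {
    db_path: String,
}

/// Functionality-heavy type; owns the conversion via From, rather than the
/// config struct owning an into_* method.
struct CachingHome {
    db_path: String,
}

impl From<BaseAgentSettings> for CachingHome {
    fn from(s: BaseAgentSettings) -> Self {
        CachingHome { db_path: s.db_path }
    }
}
```

This keeps the configuration crate free of agent logic: the conversion lives next to the type that needs it.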

refactor: have ContractSync catch missed updates when syncing at tip

  • backtracking algorithm for catching updates only works if we assume there will be more updates to reveal a discontinuity in RPC
  • relayer and watcher (which sync at tip) are then vulnerable to missing updates completely for hours until next update shows up (if it ever does)
  • we want to index at the tip and also reindex ranges finality_blocks behind the tip to catch any missed updates
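The re-index window could be computed per sync loop; a sketch with u64 block numbers, where the function and parameter names are assumptions (ContractSync's real loop tracks more state than this):

```rust
/// Next query range for a sync loop that indexes at the tip but always
/// re-reads `finality_blocks` behind it, so an update missed by the
/// backtracking algorithm is caught once its block finalizes.
fn next_query_range(last_indexed: u64, tip: u64, finality_blocks: u64) -> (u64, u64) {
    // Start at the lagged point or where we left off, whichever is earlier,
    // so the finality window is re-scanned every pass.
    let from = last_indexed.min(tip.saturating_sub(finality_blocks));
    (from, tip)
}
```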

Allow env overriding agent config values

Want to allow agent-specific values to be overridden through environment variables.

  • implement EnvOverridable for each agent config in configuration/src/agent
  • Call agent_settings.load_env_overrides() in nomad-base/src/settings/macro.rs before adding to the overarching agent settings
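A sketch of the trait; the config struct and the RELAYER_INTERVAL variable name are hypothetical:

```rust
use std::env;

/// Proposed trait: each agent config overrides its own fields from env vars.
trait EnvOverridable {
    fn load_env_overrides(&mut self);
}

struct RelayerConfig {
    interval: u64,
}

impl EnvOverridable for RelayerConfig {
    fn load_env_overrides(&mut self) {
        // Only override when the variable is set and parses.
        if let Some(v) = env::var("RELAYER_INTERVAL").ok().and_then(|v| v.parse().ok()) {
            self.interval = v;
        }
    }
}
```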

feature: Secrets Reconciliation Script

See comment here for details on executing the secrets_template script.

We need to write an automation tool that wraps the secrets_template script, and diffs the output of it with a secret that exists in Google Secrets Manager and outputs a patch to the existing secret.

This tool can then be used as a pre-deploy hook to error if the secrets are malformed or missing entries.

Document steps for running processor

  • Describe the functionality of the processor, namely that it tracks messages on the home and proves/processes them on any configured replicas
  • Describe how configuration maps to processor behavior (proves/processes messages for channels from the home to X replicas)
  • Describe the functionality of the allowlist, denylist, index_only, and s3 options and how they can be used for custom clients/use cases
  • Describe behavior during failure modes for extra clarity

fix: remove updater `blocktime x finality` latency between updates

Problem

  • Currently, the updater must wait until its last update becomes final before moving on to producing a new one (it does so by re-reading its own update on a timelag)
  • This incurs a blocktime x finality latency between each update
  • If you dispatch a message that reaches finality just after the last update was submitted, you incur a potential 2 x (blocktime x finality) latency

Solution

  • We know that all updates produced in the db will form a valid chain; thus the new_root of the previously produced update will always be the old_root of the next produced update
  • UpdateProducer should immediately retrieve next root to build off of from produced updates db
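The solution can be sketched with a Vec standing in for the produced-updates db and u64 toy roots:

```rust
/// Toy produced update; the real struct carries signed roots.
struct Update {
    old_root: u64,
    new_root: u64,
}

/// Because produced updates form a valid chain, the next update builds off
/// the new_root of the last produced update, read from the local db instead
/// of waiting for the chain to finalize it. Fall back to the committed root
/// read from chain only when nothing has been produced yet.
fn next_old_root(produced: &[Update], chain_committed: u64) -> u64 {
    produced.last().map(|u| u.new_root).unwrap_or(chain_committed)
}
```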

Document steps for running watcher

  • Describe key provision process and watcher enrollment on xapp connection managers
  • Describe what watcher configuration with given homes and replicas means (the watcher watches the channels from the home to X replicas)
  • Suggest running watcher with multiple RPC endpoints for safety
  • Describe behavior during failure modes

feature: agent expects JSON config and secrets files by default

Need some way to override agent config values at agent run time.

Nomad Team:

  1. Run the pre-agent script, which takes in a RUN_ENV environment variable. This script will output a template secrets.json and a config.json from the Nomad configuration crate for that RUN_ENV (the config will be the development or production JSON file in the crate).
  2. Override any necessary values in secrets.json or config.json.
  3. Run the agent with RUN_ENV and AGENT_HOME. All the agent expects is secrets.json and config.json in the cwd.

External Party (no usage of the Nomad configuration crate):

  1. Create custom config.json and secrets.json in cwd (might not include all our networks).
  2. Run agent.
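The cwd expectation could be checked at startup; a sketch where the file names come from this issue and the error shape is an assumption:

```rust
use std::path::Path;

/// Verify the two files the agent expects in its working directory before
/// doing anything else, so a misconfigured run fails with a clear message.
fn check_run_files(dir: &Path) -> Result<(), String> {
    for name in ["config.json", "secrets.json"] {
        if !dir.join(name).is_file() {
            return Err(format!("expected {} in {}", name, dir.display()));
        }
    }
    Ok(())
}
```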

feature: add logging/metrics to watcher

Just watcher:

  • Gauge: number of updates sent through UpdateHandler (number of updates checked)
  • Gauge: 0 or 1 binary for double update detected
  • Histogram: latency between Update event emitted and update sent through UpdateHandler (see ContractSync for example)

All agents that check home failure:

  • Gauge: 0 or 1 for improper update observed (failed home; can be for all agents)
  • Gauge: number of home failed checks (can be for all agents)

observability: add Grafana graphs/alerts for failure checks and watcher
