
base-org / pessimism

Detect real-time threats and events on OP Stack compatible blockchains

Home Page: https://base-org.github.io/pessimism/

License: MIT License

Makefile 0.56% Go 98.87% Dockerfile 0.12% Shell 0.45%

Pessimism's Introduction

As of May 28, 2024, Pessimism has been deprecated and is no longer being actively maintained

Because you can't always be optimistic

Pessimism is a public-good monitoring service that allows OP Stack and EVM-compatible blockchains to be continuously assessed for real-time threats using custom, user-defined heuristic rule sets. To learn about Pessimism's architecture, please consult the documentation.


Warning: Pessimism is currently experimental and very much in development. This means Pessimism is currently unstable, so code will change and builds can break over the coming months. If you come across problems, it would help greatly to open issues so that we can fix them as quickly as possible.

Setup

To set up the application, complete the following step(s):

  1. Create a local config file (config.env) to store all necessary environment variables. An example config.env.template in the repo stores the default env vars.

  2. Download or upgrade to Go 1.19.

  3. Install all project Go dependencies by running go mod download.

To Run

  1. Compile Pessimism to a machine binary by running the following project-level command(s):

    • Using Make: make build-app
  2. To run the compiled binary, you can use the following project-level command(s):

    • Using Make: make run-app
    • Direct Call: ./bin/pessimism

Docker

  1. Ensure Docker is installed on your machine.

  2. Pull the latest image from the GitHub Container Registry (ghcr) via docker pull ghcr.io/base-org/pessimism:latest

  3. Make sure you have followed the above instructions to create a local config file (config.env) using the config.env.template

  4. Run the following:

    • Without genesis.json:
    docker run -p 8080:8080 -p 7300:7300 --env-file=config.env -it ghcr.io/base-org/pessimism:latest
    • With genesis.json:
    docker run -p 8080:8080 -p 7300:7300 --env-file=config.env -it -v ${PWD}/genesis.json:/app/genesis.json ghcr.io/base-org/pessimism:latest

Note: If you want to bootstrap the application and run specific heuristics/paths upon start, update the config.env BOOTSTRAP_PATH value to the location of your genesis.json file, then run the container as shown above.

Building and Running New Images

  • Run make docker-build at the root of the repository to build a new docker image.

  • Run make docker-run at the root of the repository to run the new docker image.

Linting

golangci-lint is used to perform code linting. Configurations are defined in .golangci.yml. It can be run using the following project-level command(s):

  • Using Make: make lint
  • Direct Call: golangci-lint run

Linting Markdown Files

To ensure consistent formatting and avoid common mistakes in our Markdown documents, we use markdownlint. Before submitting a pull request, you can check your Markdown files for compliance.

Installation

  1. Install Node.js: If you haven't already, install Node.js.

  2. Install markdownlint CLI globally:

npm install -g markdownlint-cli

Linting with markdownlint

To lint your Markdown files, navigate to the root directory of the project and run:

markdownlint '**/*.md'

If markdownlint reports any issues, please fix them before submitting your pull request.

Testing

Unit Tests

Unit tests are written using the native go test library, with test mocks generated using Go's native mock library. These tests live throughout the project's /internal directory and are named with the suffix _test.go.

Unit tests can be run using the following project-level command(s):

  • Using Make: make test
  • Direct Call: go test ./...

Integration Tests

Integration tests leverage the existing op-e2e testing framework to spin up pieces of the bedrock system. Additionally, the httptest library is used to mock downstream alerting services (e.g. Slack's webhook API). These tests live in the project's /e2e directory.

Running integration tests requires generating devnet allocation files for compatibility with the Optimism monorepo. The scripts/devnet_allocs.sh script can be run to perform this generation. If successful, a new .devnet directory will be created in the project's root directory. These allocations should only be regenerated when go.mod rebases to a new monorepo release.

Integration tests can be run using the following project-level command(s):

  • Using Make: make e2e-test
  • Direct Call: go test ./e2e/...

Bootstrap Config

A bootstrap config file is used to define the initial state of the Pessimism service. The file must be JSON formatted, with its path defined in the BOOTSTRAP_PATH env var (e.g. BOOTSTRAP_PATH=./genesis.json).

Example File

[
    {
        "network": "layer1",
        "type": "contract_event",
        "start_height": null,
        "alerting_params": {
            "message": "",
            "destination": "slack"
        },
        "heuristic_params": {
            "address": "0xfC0157aA4F5DB7177830ACddB3D5a9BB5BE9cc5e",
            "args": ["Transfer(address, address, uint256)"]
        }
    },
    {
        "network": "layer1",
        "type": "balance_enforcement",
        "start_height": null,
        "alerting_params": {
            "message": "",
            "destination": "slack"
        },
        "heuristic_params": {
            "address": "0xfC0157aA4F5DB7177830ACddB3D5a9BB5BE9cc5e",
            "lower": 1,
            "upper": 2
        }
    }
]

Spawning a heuristic session

To learn about the currently supported heuristics and how to spawn them, please consult the heuristics documentation.

pessimism's People

Contributors

adelowo, adrain-cb, ale-coinbase, alessandromazza98, anupsv-cb, bthornton-cb, dependabot[bot], eltociear, epociask, nadir-akhtar-coinbase, omahs, richardgreg, the-amazing-atharva, vavrajosef


pessimism's Issues

L1/L2 geth block polling algorithm could fall out of sync with head

Edge Case

An arbitrary node failure occurs that causes an execution node on Base or Ethereum to temporarily fall out of sync with the majority network for blocks h0...hn. If Pessimism is actively subscribed to this node, it'll continue to poll for a new block at height h0 until retrieved. Once retrieved, the oracle implementation will perform a time wait before retrieving a block at h1. Given that these time waits are typically proportional to the block production rate, the oracle could be operating on a false head with no perception that the application is out of sync.

Mitigation(s)

  • Check for the most recent block number at every poll step and perform an immediate backfill operation if the oracle is >1 block behind head; this would require more node calls and could over-exhaust downstream subsystems in a real application if the backfill spans many blocks.
  • Reduce default polling times to ensure the application catches up; this would require more CPU operation.

Lack of Architectural Documentation

Problem

The current repository lacks sufficient architectural documentation, making it challenging for developers to understand the system's design and make informed decisions during development and maintenance. Without comprehensive documentation, developers may face difficulties in onboarding, troubleshooting, and extending the project. This issue inhibits efficient collaboration and hinders the growth and sustainability of the project.

Proposed Solution

To address this problem, we should prioritize the creation of comprehensive architectural documentation for the repository.

Configurable Alerting Definition

Problem

Currently, the severity routing configuration is hardcoded. While this works for some opinionated use cases, it fails to generalize to many unique use cases. Additionally, the service only supports integration with a single Slack webhook and two PagerDuty integration keys. Ideally, a user could have arbitrarily many Slack webhooks and PagerDuty services to alert.

Problem Solution

Support an alerting definition JSON that allows multiple downstream dependencies to be routed to for a given severity. This will be a global definition that the application ingests and uses to understand where to alert.

This would look something like:

{
    "sev_low": [
        {
            "destination": "slack",
            "config": {}
        }
    ],
    "sev_mid": [
        {
            "destination": "slack",
            "config": {}
        },
        {
            "destination": "pagerduty",
            "config": {}
        }
    ]
}

Support Process Removals

Problem

The existing DAG implementation only supports component/edge addition. This works well for bringing new pipelines on board but doesn't allow for the removal of temporary pipelines (backtests). Additionally, in the future, once a user opts to remove an invariant, the corresponding pipeline should be removed as well (if applicable).

Problem Solution

Add removal functionality to support the deletion of both edges and components.

Acceptance Criteria

I. removeEdge and removeComponent functions are fully implemented (There may need to be other functions added to support this logic)
II. Unit tests exist that comprehensively assess the behavioral assumptions of each function
III. PoC exists that showcases edge addition/removal using real-time components

Conduit Package Abstraction is Inviable within the OP-Stack Ecosystem

Problem

Conduit was recently announced as a ground-breaking DevOps solution for operationalizing the OP Stack. Pessimism currently uses an abstraction named Conduit to refer to its internal ETL system. This will likely create confusion for developers given the name's conflict with the recently announced service offering.

Proposed Solution

Determine a more appropriate package name that adheres to Pessimism's ETL abstraction and create a PR with:

  • A justification for a new name
  • Remapped dependency imports with new package name

No Metrics Support

I. Determine which metrics to track / gauge

  • Documentation PR

II. Implementing telemetry

  • Metrics integration PR

III. Internal deployment to dev & PRD

  • Development of internal dashboards

No Aggregation Component Support

Problem

As of now, there is no way to sync data between two unique asynchronous data sources. This functionality is necessary for assessing data from both L1 (Ethereum) and L2 (Base, Optimism) blockchains within a single invariant implementation (e.g., native ETH bridge supply monitoring).

Proposed Solution (Conveyor Component)

[Diagram: Pessimism - Conveyor]

Introduce a Conveyor pipeline component to be used as an intermediary aggregator to convert many input data types into a single output transit data.

Every conveyor should have:

  1. A set of input channels from which it continuously polls data
  2. A synchronization policy that defines how transit data from multiple input streams will be aggregated into a single piece of data
  3. An OutputRouter to handle routing downstream transit data to other components or destinations

Single Value Subscription

Only send output at the interval of a single Input Parameter channel

Single Value Subscription refers to a synchronization policy where a bucketed multi-data tuple is submitted every time there’s an update to a single input data queue.

For example, we can have an invariant that subscribes to blocks from two heterogeneous chains (L1/L2), or {Chain A, Chain B}; let's assume Chain A's block production time is greater than Chain B's.

We can either specify that the invariant will run every time there's an update (i.e., a new block) from Chain A:

{
   "A:latest_blocks": [xi] where cardinality = 1,
   "B:latest_blocks": [yj, ..., yn] where cardinality >= 1,
}

Or we can specify the inverse, running every time there's a new block from Chain B:

{
   "A:latest_blocks": [NULL OR xi] where cardinality <= 1,
   "B:latest_blocks": [yj] where cardinality = 1,
}

Proof-of-Concept

A proof-of-concept should be built that orchestrates a lightweight component pipeline that uses a conveyor to sync data from different sources (e.g., an Ethereum/Optimism conveyor that performs aggregations every time a new ETH block is added). This is similar to how the Oracle and Pipe components were previously proved out.

No API Specifications

Problem

As of now, there is no detailed specification outlining the current and expected behavior of the Pessimism API.

Problem Solution

Build a standardized API spec. This can be done using swagger.

Markdown Linting Support

Problem

Many of the existing repo markdown documents are riddled with typos and formatting issues. This can cause confusion for users and is nonetheless poor hygiene. Adding markdown linting will help establish a consistent formatting style for Markdown files.

Problem Solution

  • Integrated markdown linting in CI
  • Make directive to run markdown linter
  • Repo passing markdown linting scan

No E2E Integration Between Pessimism Tests & EVM Execution Nodes

Problem

There currently exist no end-to-end integration tests that validate Pessimism's correctness. Currently, the only way to validate the e2e flow (i.e., block --> invariantInput --> activation --> alert) is via manual tests on a local instance of the application. This is problematic because:
I. Code changes that can impact correctness aren't programmatically assessed when being considered for merge
II. Performing manual validation is exhaustive and disparate from the tests in the actual repo

Problem Solution

  • [1] (#74) Re-architect the existing main caller abstractions into an Application struct that has accessor methods for starting, stopping, and calling application-level methods. The Application should hold all three subsystems and the HTTP server.
  • [2] (#85) Develop a way to register downstream alert invocations. A huge part of testing will require validating that an alert actually fires given an expected sequence of blocks and transactions.
  • [3] (#85) Develop a testing framework that asserts the correctness of the balance_enforcement and contract_event invariants. The existing Optimism monorepo already has an op-geth node representation that can be leveraged for these tests.

Enable optional environment file loading

Problem

Currently, in order to build and run the application, a config.env file is required; otherwise the application fails. The application should not fail if an env file is not found. This is not ideal when running in a production environment where env variables are injected through the system process instead of through an env file.

Solution

Instead of requiring the application to be run with a .env file, we should instead have a solution that allows the application to run with or without an env file present. This may mean simply logging a warning instead of failing out of the application upon not loading in an env file, or something more complex.

`Pipeline` Lacks Introspective Capabilities into its Components

Problem

Existing Pipeline struct receiver logic has no support for an event loop routine that can read existing component states to understand when:
I. A component crashes or stops
II. An oracle component finishes backfilling: syncing --> live
III. (STRETCH) Inter-component communication latency is below some threshold

This logic is critical for the etlManager abstraction to be able to:
I. Perform merging operations when a syncing pipeline has become live, with there existing >1 identical pipelines with the same ID and live state
II. Understand when pipelines have successfully shut down
III. Understand when pipelines have crashed in real-time

Proposed Solution

Two current possible solution paths, both of which are open for discussion:

Solution 1: Higher Order Event Polling

This event loop should run in the following fashion, using some interval-based polling algorithm, to understand when key changes to component (and consequently pipeline) state occur:


while True:
    time.sleep(x)
    oracle_state = pipeline[0].state

    if pipeline.state != PipelineState(oracle_state):
        pipeline.state = oracle_state
        emit pipeline.state

    for i, state in enumerate([c.state for c in pipeline]):
        if i == 0:
            continue

        if state != "active":
            pipeline.state = state
            emit pipeline.state

The issues with this approach are:
I. Polling increases computational load
II. Increased latency given time discrepancies between when a component state change happens versus when the poller performs a pipeline read

Solution 2: Event Based

Otherwise, a more event-based (listener/subscriber) model could be leveraged, where the pipeline actively listens for components to emit activationState events.

The issues with this approach are:
I. (abstraction leak) Components need higher-order knowledge (i.e., a Go channel) of the greater Pipeline.
II. Increased concurrency management.

NOTE

These proposed solutions don't take failure management into consideration. For example, in the instance of a failed pipeline, we could enact some retry procedure to re-attempt running it N times. These fail-safe or recovery procedures should be explored in a subsequent issue.

No Support for Alerting Cooldown Configurations

Problem

Currently, if an alert is produced for some invariant session, it could be produced so frequently that it spams a downstream subscribed system (e.g., Slack, a third-party server).

For example, say we deploy a balance_enforcement session bound to account 0x421 to ensure that the account balance never exceeds 100 ETH. Now, let's say some transaction to 0x421 tops the account balance above the 100 ETH threshold. Every time a balance enforcement invalidation is checked for this session, an alert will be produced until the account's balance again falls below 100 ETH. This results in an alert being produced every time a new block is produced; on L2 this would be multiple times a minute.

Problem Solution (Spam Prevention)

Introduce an optional CooldownTime parameter to all invariant session configurations that silences alerts for some user-defined time before alerting again.

Add Internal Process Activity State Tracking

Components currently have no tracking in place for operational states. For higher-level abstractions and constructs to better reason about the status of a running component, an ActivityState should be added to each component to enhance system introspection and concurrency management.

Refine Invariant Registry to Use Less Anti-Patterns

The existing invariant registry implementation is a functional package with a single GetInvariant method.
As invariants become more complex, having a register-based representation similar to the component registry is essential for:

  • Better modularity for invariant definitions
  • Providing a clean way for preprocessing injection
  • Learning about an invariant without having to instantiate it

GETH Block Oracle Definition Logic is Bug Prone and Untested

Problem

As noted in Initial Pipeline PR, the oracle implementation for indexing sequential block metadata from an op-geth execution node is incredibly lackluster and could use some heavy refinements to ensure better resiliency/extensibility.

Proposed Solution

The following improvements should be supported to better enhance the oracle definition:

  1. A backtest routine should be supported by the Oracle definition that takes a starting and ending block height parameter to then sequentially iterate from block_start to block_end inclusively; reading each block before transiting it to downstream components.
    NOTE: Batched concurrent processing techniques could be leveraged here to ensure more efficient backtesting.

  2. Failure cases should be threat modeled with engineered fail safe solutions. Some of these cases have been noted in the internal/conduit/registry/geth_block.go file. However there are likely more that aren't currently considered.

  3. All oracle definition logic should be fully unit tested; this includes validating happy path logic as well as ensuring that fail safe routines behave as expected.

NOTE: The ethClient struct could be bound to some internal interface so that failure cases can be properly unit tested.


Support for Initializing Pessimism Service with an Optional Configuration File

Problem

Currently, Pessimism starts with a blank slate and requires clients to teach it what to monitor. This process may involve lengthy client interaction logic for deploying and initializing the service.

Problem Solution

Support an optional initialization file that defines arbitrary invariant schemas. These schemas would be executed upon service start, similar to how a blockchain node ingests a genesis state during bootstrapping. This feature aims to provide more seamless operationalization of the service by multiple users.

API Request Validation Submodule

Problem

Currently, Pessimism's API has no request validation module, including for invariant deployment requests. This means that request bodies that successfully unmarshal can contain invalid data fields, which could result in unforeseen consequences when the server attempts to process the request.

Problem Solution

Implement API validation logic to ensure that request fields are properly vetted before triggering subsystem logic. This will require analyzing each structured request body field (i.e., InvariantRequest) to understand how a variable's expression should be constrained and which value ranges should apply to it.

No Architectural Documentation for ETL Component Specification & Usage

Problem

Pessimism has no architectural READMEs that specify component architecture and technical specifications.

Problem Solution

Build a technical markdown rendered writeup that describes:
I. Each component type and a cross-analysis between all component types
II. Component metadata (i.e., ids, egress, outgress, outputType, activityState)
III. Component IDs and their encoding/decoding byte orderings

Support For Arbitrary Alert Messages

Heuristic configurations should be able to define metadata that holds arbitrary user text. As of now, alerts only contain heuristic-specific data fields that may not provide the necessary context for the on-call individuals/analysts who triage and respond to an alert.

Verify precision loss

Problem

The current code for converting wei to ether in the given account_balance oracle definition has a potential precision-loss issue, since the returned overflow/underflow byte is unchecked. This can lead to inaccurate conversion results and affect the accuracy of the overall calculations.

Solution

To ensure accurate conversion from wei to ether, it is necessary to verify the precision loss in the existing code implementation. This can be addressed by performing thorough testing and validation.

State Key Representation is Insecure

Problem

The current state key representation is insecure given that a struct is used with no pointer references by the respective stateful component definitions. Currently, this results in the loss of the state keys held in the component's structure once higher-level callers garbage-collect the value. Because of this, active components fail to look up the stateful values necessary for secure operation.

This bug is currently only prevalent in the account_balance oracle implementation, consequently causing the existing balance_enforcement invariant to not work.

There are some band-aid fixes we could easily apply to remediate this in the short term:
I. Make the balance_oracle store a reference to the stateKey type instead of a direct value

However, the fundamental representation of the key itself is flawed, as by nature it should be a low-memory primitive type. Additionally, a lower-level representation would ensure that there'd be no need to keep a string representation in memory.

Problem Solution - 1 (32 byte array)

Update the state key representation to be consistent with the ID representations already used across the application. A fixed-size byte array would be ideal, encoding the necessary metadata for stateful lookups using the following 256-bit representation:

type StateKey [32]byte

With a byte encoding schema like:

0        1        2        3                     22              32
|--------|--------|--------|---------------------|----------------|
 nested    prefix   register  (optional) address       PUUID
                    byte

This would avoid the need for pointers when referencing keys.
Additionally, a string representation of this key would occupy several times more memory than the raw 32 bytes.

Problem Solution - 2 (57 byte struct)

Store references to state keys in the oracle metadata. This would require keeping the existing state key struct types as-is; i.e.:

// StateKey ... Represents a key in the state store
type StateKey struct {
	Nested bool // Indicates whether the key is nested
	Prefix uint8
	Key    string
}

func (sk StateKey) IsNested() bool {
	return sk.Nested
}

// WithPUUID ... Adds a pipeline UUID to the state key prefix and returns a new state key
func (sk StateKey) WithPUUID(pUUID PipelineUUID) StateKey {
	return StateKey{
		sk.Nested,
		sk.Prefix,
		pUUID.String() + ":" + sk.Key,
	}
}

NOTE: We should probably update the struct definition to store the PUUID instead of doing a string concatenation inside a clone method

Versus solution - 1

In comparison to solution 1, solution 2 will occupy more space, since the Nested boolean plus struct padding add overhead and the key would hold 25 bytes for the entire PUUID. However, the other data fields (i.e., nested, prefix, registerType, address) would occupy the same space as they do in solution 1. It's important to note that this solution is more intuitive for developers and for readability.

Account Balance Oracle Components Fail to Successfully Generate

Problem

2023-06-19T23:29:04.654-0700	ERROR	handlers/invariant.go:35	Could not process invariant request	{"error": "could not cast component initializer function to oracle constructor type"}

Problem Solution

Fix the oracle constructor definition for the current AccountBalance oracle.

No Pull Request Template

A pull request template should be added to the Pessimism service so that developers can easily describe their work and changes in a standardized fashion.

Poll Interval is a Hardcoded Value for EVM Block Polling Client

Problem

The poll interval is a hardcoded value for the geth.block poller. This is infeasible given that the poller will read live block data from both layer-1 Ethereum and layer-2 OP Stack blockchains, which have different block production rates.

Problem Solution

The poll interval should be stored in the env vars L1_POLL_INTERVAL and L2_POLL_INTERVAL.

Add `Account.Balance` Backtesting Support

Problem

The current implementation of the BackTestRoutine function lacks support for backtesting when using this oracle. This limitation prevents users from simulating and analyzing historical trading scenarios based on account balances.

Solution

To enhance the functionality of the oracle and enable backtesting, it is necessary to add support for account balance backtesting in the BackTestRoutine function, where iterations are performed from a user-specified starting block height to a future one. This will allow users to test real-time scenarios against historical ledger data.

Add `Log` as Default Alert Destination Type

Problem

Currently, our alerting system supports various alert destination types such as email, SMS, and webhook. However, we have received a requirement to include a new default alert destination type called Log. When Log is selected as the alert destination, the alert should be generated as a debug log entry.

Proposed Solution

Introduce a new default alert destination type called Log that just emits a log message from the alerting subsystem to represent an alert invocation.

Optional ETL Persistence Support

Problem

We need persistent storage for faster invariant bootstrapping and to reduce redundant ETL operations on external sources.

Proposed Solution

  1. Postgres support, with the ability to get data at a particular block or between blocks.
  2. Store deposit transactions from L1 to L2, and withdrawals from L2
  3. Pooled connection with retries
  4. Configurable parameters for SSL connections

Use Go Mock

Problem

Mocks used for testing are currently written manually using testify/mock. While this provides the necessary functionality for testing, it doesn't scale well for adding new testing mocks or regenerating existing ones.

Problem Solution

GoMock provides mock autogeneration capabilities using embed instructions within Go files.

Acceptance Criteria

I. All current mocks (i.e., EthClientInterface) are regenerated using mockgen and stored as independent files in an internal/mocks package

II. All current mocks have embedded gomock instructions at the top of the code files which declare the interface that we'd like to mock. For example, to mock EthClientInterface, we add the following line to the top of internal/client/eth_client.go:

//go:generate mockgen -package=mocks -destination=../mocks/eth_client.go github.com/base-org/pessimism/internal/client EthClientInterface

III. Mock generation should be possible using a make command. (e.g, make mock-gen)

Pipeline Collisions Occur When They Shouldn't

Bug Description

The current DAG implementation will add edges between existing components based on conflicting ComponentID values. However, this means that pipelines that perform backfills can conflict with live ones, and the same holds for backtesting pipelines.

Example Scenario

Two pipelines are shown below (RP0, RP1), both of which use the same exact components, but require backfilling from some starting heights (x0, x1) where:

x in Z+ and x <= current_block_height

[Diagram: Pessimism - Merging Pipelines]

In this example, since RP0.component_set = RP1.component_set, RP1 will be treated as a duplicate pipeline by the EtlManager, since their respective pipeline IDs are equal. This results in RP1 powering some invariant from whatever point in chain history RP0 is at, causing invariants to fail to successfully backfill.

It's important to note that in this scenario, x0 could equal x1 but doesn't necessarily have to. The presence of some x for a registerPipeline request denotes that it has backfill requirements to be run. This means that each pipeline's state is:
I. Syncing while its monotonically increasing height x < current_block_height
II. Live once x >= current_block_height
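
The two states above can be sketched as a simple transition function. This is an illustrative sketch, not the repo's actual API; PipelineState and stateFor are hypothetical names:

```go
package main

import "fmt"

// PipelineState is a hypothetical enum for the two states described above.
type PipelineState string

const (
	Syncing PipelineState = "syncing"
	Live    PipelineState = "live"
)

// stateFor returns Syncing while the pipeline's height x trails the chain tip,
// and Live once it has caught up (x >= currentBlockHeight).
func stateFor(x, currentBlockHeight uint64) PipelineState {
	if x < currentBlockHeight {
		return Syncing
	}
	return Live
}

func main() {
	fmt.Println(stateFor(10, 100))  // syncing: still backfilling
	fmt.Println(stateFor(100, 100)) // live: caught up to the tip
}
```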

Problem Solution

Introduce access management logic that performs additional integrity checks before reusing an existing component. E.g., the presence of a sync flag for a pipeline deems it globally unique, meaning its components cannot be reused.

In the example of identical pipelines (P0, P1) with the same IDs but different backfill toggles, the pipelines would merge once both of their states have transitioned (syncing --> live).

More Granular Health Checking

Problem

The existing /health endpoint performs no real health assessment. It fails to consider dependency health (i.e. the layer-1 node, layer-2 node, Slack), meaning that Pessimism will report Healthy=true even in the event of upstream node/dependency failures.

Problem Solution

The health check /health endpoint should be extended to support health checking assessments for critical application dependencies. Starting out, the health check struct itself should look something like:

type HealthCheck struct {
      IsL1Healthy bool
      IsL2Healthy bool
}

From there, the Healthy value yielded in the client response should be:

 true if IsL1Healthy == true && IsL2Healthy == true
 false otherwise
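
The aggregation rule above can be sketched as a method on the struct. This is a minimal sketch; the Healthy method name is an assumption, not the repo's actual API:

```go
package main

import "fmt"

// HealthCheck mirrors the struct sketched above.
type HealthCheck struct {
	IsL1Healthy bool
	IsL2Healthy bool
}

// Healthy is true only when every tracked dependency is healthy.
func (hc HealthCheck) Healthy() bool {
	return hc.IsL1Healthy && hc.IsL2Healthy
}

func main() {
	fmt.Println(HealthCheck{IsL1Healthy: true, IsL2Healthy: true}.Healthy())  // true
	fmt.Println(HealthCheck{IsL1Healthy: true, IsL2Healthy: false}.Healthy()) // false
}
```

Modeling the result as a conjunction keeps the endpoint honest: adding a new dependency check (e.g. Slack) only requires extending the struct and the boolean expression.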

Introduce optional Retry-able EthClient

Problem

Current eth client doesn't support retry logic and isn't easily testable/mockable. With the ability to specify another HTTP client, it would become retry-able with minimal code change, and testable as well.

Proposed Solution

A client such as resty, which has retry and mocking support built in, could be constructed and provided to the ethclient's NewClient function. Combined with an ethclient constructor that accepts options, this would be a useful tool for many.

Parallelized Node RPC Calls for Health Checking

Problem

The existing health check service logic queries both the L1 & L2 nodes for health statuses. Currently this is done sequentially, but it can be parallelized so that both nodes are queried concurrently.

Problem Solution

Parallelize these queries using goroutines and wait-group concurrency management patterns.
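
The wait-group pattern mentioned above can be sketched as follows; checkNode and checkAll are illustrative stand-ins for the real RPC probes, not the repo's API:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// checkNode is a stand-in for a real RPC health probe.
func checkNode(name string, latency time.Duration) bool {
	time.Sleep(latency) // simulate network round-trip
	return true
}

// checkAll queries the L1 and L2 nodes concurrently instead of sequentially,
// so total latency is the max of the two probes rather than their sum.
func checkAll() (l1Healthy, l2Healthy bool) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); l1Healthy = checkNode("l1", 10*time.Millisecond) }()
	go func() { defer wg.Done(); l2Healthy = checkNode("l2", 10*time.Millisecond) }()
	wg.Wait() // blocks until both probes have written their results
	return l1Healthy, l2Healthy
}

func main() {
	l1, l2 := checkAll()
	fmt.Println(l1 && l2) // true
}
```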

Mermaid Diagrams fail to generate when viewing documentation via github pages

Problem

Currently, our GitHub Pages site is encountering an issue where it fails to generate Mermaid diagrams. Mermaid is a popular JavaScript library for generating diagrams and flowcharts. It is an essential tool for visualizing complex information and improving the overall user experience on our site.

Problem Solution

  • Investigate and identify the root cause of the failure in generating Mermaid diagrams on the GitHub Pages site.
  • Fix the issue to ensure that Mermaid diagrams are correctly rendered and displayed on the site.

History

Some transactions don't appear in the history.

Pipeline Analysis Functionality

Problem

Current ETL has no way to detect when pipeline collisions occur such that components could be reused.

Proposed Solution

Introduce an analysis struct type that analyzes two pipelines with the same PipelineUID for mergeability.
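
Combining this with the syncing/live collision rules described earlier, the analysis could look something like the sketch below; Pipeline, Analyzer, and Mergable are hypothetical names:

```go
package main

import "fmt"

// Pipeline is a minimal stand-in with just the fields the merge check needs.
type Pipeline struct {
	UUID    string
	Syncing bool // true while the pipeline is still backfilling
}

// Analyzer decides whether two pipelines may share components.
type Analyzer struct{}

// Mergable reports true only when both pipelines have the same UUID and
// neither is still syncing, so a backfilling pipeline never collides with
// a live one.
func (Analyzer) Mergable(a, b Pipeline) bool {
	return a.UUID == b.UUID && !a.Syncing && !b.Syncing
}

func main() {
	an := Analyzer{}
	fmt.Println(an.Mergable(Pipeline{"p0", false}, Pipeline{"p0", false})) // true
	fmt.Println(an.Mergable(Pipeline{"p0", true}, Pipeline{"p0", false}))  // false
}
```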

DoS vector w/ Infinite Go Routines

Risk Vector

Currently, the /v0/invariant endpoint can be exploited by an attacker to spawn unbounded goroutines on an application instance of Pessimism. While deduplication policies do exist within the ETL implementation, an attacker could meticulously request invariant deployments for which deduplication would not occur for some time (e.g. backfills from the L1 genesis block). Eventually the machine running the Pessimism app would exhaust its compute resources (i.e. CPU, RAM) to the point where the app could no longer function.

Mitigation(s)

  • Introduce concurrency rate limiting within the Pessimism application logic (e.g. MAX_GO_ROUTINES) OR
  • Introduce a max number of active pipelines that can exist at once (e.g. MAX_PIPELINES)
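
Either mitigation reduces to a counting semaphore, which in Go is idiomatically a buffered channel. The sketch below shows the second option (a cap on active pipelines); all names are illustrative, and maxPipelines stands in for the hypothetical MAX_PIPELINES setting:

```go
package main

import (
	"errors"
	"fmt"
)

// maxPipelines caps concurrently active pipelines.
const maxPipelines = 2

// newSlots returns a counting semaphore with the given capacity.
func newSlots(capacity int) chan struct{} {
	return make(chan struct{}, capacity)
}

// acquire rejects the request instead of spawning unbounded work when
// every slot is taken, closing the DoS vector described above.
func acquire(slots chan struct{}) error {
	select {
	case slots <- struct{}{}:
		return nil
	default:
		return errors.New("max pipelines reached")
	}
}

// release frees a slot when a pipeline shuts down.
func release(slots chan struct{}) { <-slots }

func main() {
	slots := newSlots(maxPipelines)
	fmt.Println(acquire(slots)) // <nil>
	fmt.Println(acquire(slots)) // <nil>
	fmt.Println(acquire(slots)) // max pipelines reached
	release(slots)
	fmt.Println(acquire(slots)) // <nil>
}
```

Failing fast with an error (rather than blocking) lets the /v0/invariant handler return a 429-style response to the caller instead of queuing attacker-controlled work.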

Add closure logic to all component types

Problem

While contexts do a good job of handling goroutine cancellation, we need a reliable way to ensure that all routines have shut down and that we aren't leaking memory or leaving jobs unfinished.

Proposed solution

Add a blocking Close() function, called when the program needs to exit, which guarantees that by the time it returns, all the routines the specific component cares about have shut down correctly. This provides completion assurance and helps design more resilient systems.
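
A common way to implement the blocking Close() described above is a quit channel paired with a sync.WaitGroup. This is a generic sketch under assumed names (component, start), not the repo's component interface:

```go
package main

import (
	"fmt"
	"sync"
)

// component runs a worker goroutine and supports blocking shutdown.
type component struct {
	quit    chan struct{}
	wg      sync.WaitGroup
	stopped bool
}

func (c *component) start(events <-chan int) {
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		for {
			select {
			case <-c.quit:
				c.stopped = true
				return
			case e := <-events:
				_ = e // placeholder for real event-processing work
			}
		}
	}()
}

// Close blocks until the worker goroutine has fully exited, giving the
// caller assurance that nothing is leaked and no job is left half-done.
func (c *component) Close() {
	close(c.quit)
	c.wg.Wait()
}

func main() {
	c := &component{quit: make(chan struct{})}
	c.start(make(chan int))
	c.Close()
	fmt.Println("worker stopped:", c.stopped) // worker stopped: true
}
```

Because wg.Wait() establishes a happens-before edge with the worker's exit, reading c.stopped after Close() is race-free.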

Verify config validity during `HeaderTraversal` construction

Problem

Currently, the passed-in block heights are validated inside functions that shouldn't have to worry about it. The configs should instead be verified when the oracle is constructed.

Proposed Solution

When the NewOracle function is called, we need to verify that the configs are valid and return the appropriate errors if they're not.
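
Fail-fast validation in the constructor could look like the sketch below; TraversalConfig and NewHeaderTraversal are illustrative names standing in for the repo's actual types:

```go
package main

import (
	"errors"
	"fmt"
)

// TraversalConfig is a hypothetical config for HeaderTraversal construction.
type TraversalConfig struct {
	StartHeight uint64
	EndHeight   uint64 // 0 means "follow the chain tip"
}

// NewHeaderTraversal validates the config up front so downstream functions
// can assume sane heights instead of re-checking them everywhere.
func NewHeaderTraversal(cfg TraversalConfig) (*TraversalConfig, error) {
	if cfg.EndHeight != 0 && cfg.EndHeight < cfg.StartHeight {
		return nil, errors.New("end height precedes start height")
	}
	return &cfg, nil
}

func main() {
	if _, err := NewHeaderTraversal(TraversalConfig{StartHeight: 100, EndHeight: 10}); err != nil {
		fmt.Println("rejected:", err) // rejected: end height precedes start height
	}
}
```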
