
Rewrite of indexer-service in Rust with TAP payments implementation

License: Apache License 2.0


indexer-rs's Introduction

indexer-service-rs

Introduction

A Rust implementation of The Graph indexer service, providing data services as an Indexer. It is integrated with TAP, a fast, efficient, and trustless unidirectional micro-payments system.

Features

  • Receive paid or free query requests and route them to the graph node
  • Route "meta" queries on indexing statuses and deployment health
  • Serve indexer information such as health, indexer version, and operator address
  • Monitor allocations and attestation signers, and manage TAP receipts, storing them in the indexer database
  • Record performance and service metrics

Quick start

$ cargo run -p service -- --help

Usage: service --config <FILE>

Options:
      --config <FILE>  Path to the configuration file.
                       See https://github.com/graphprotocol/indexer-rs/tree/main/service for examples.
  -h, --help           Print help

All configuration is done through a TOML file. Please see the up-to-date TOML configuration templates referenced in the --help output above.

Upgrading

We follow semantic versioning conventions for the package. An indexer may pin a minor version specification to receive automatic patch updates while preventing breaking changes. To safely upgrade the package, we recommend the following steps:

  1. Review Release Notes: Before upgrading, check the release notes for the new version to understand what changes, fixes, or new features are included.
  2. Review Documentation: Check the up-to-date documentation for an accurate reflection of the changes made during the upgrade.
  3. Backup Configuration: Save your current configuration files and any local modifications you've made to the existing codebase.
  4. Deploy: Replace the old executable or docker image with the new one and restart the service to apply the upgrade.
  5. Monitor and Validate: After the upgrade, monitor system behavior and performance metrics to validate that the service is running as expected.

These steps should ensure a smooth transition to the latest version of indexer-service-rs, harnessing new capabilities while maintaining system integrity.

Contributing

Contributions guide

Supported request and response format examples

✗ curl http://localhost:7300/
Ready to roll!

✗ curl http://localhost:7300/health
{"healthy":true}

✗ curl http://localhost:7300/version
{"version":"0.1.0","dependencies":{}}

✗ curl http://localhost:7300/operator/info
{"publicKey":"0xacb05407d78129b5717bb51712d3e23a78a10929"}

# Subgraph queries
# Checks for receipts and authorization
✗ curl -X POST -H 'Content-Type: application/json' -H 'Authorization: Bearer token-for-graph-node-query-endpoint' --data '{"query": "{_meta{block{number}}}"}' http://localhost:7300/subgraphs/id/QmacQnSgia4iDPWHpeY6aWxesRFdb8o5DKZUx96zZqEWrB
"{\"data\":{\"_meta\":{\"block\":{\"number\":9425787}}}}"

# Also accepts the hex representation of a subgraph deployment ID, in addition to the IPFS hash representation
✗ curl -X POST -H 'Content-Type: application/json' -H 'Authorization: Bearer token-for-graph-node-query-endpoint' --data '{"query": "{_meta{block{number}}}"}' http://localhost:7300/subgraphs/id/0xb655ca6f49e73728a102219726ff678d61d8fb792874792e9f0d9887dc616600
"{\"data\":{\"_meta\":{\"block\":{\"number\":9425787}}}}"

# Free query auth token check failed
✗ curl -X POST -H 'Content-Type: application/json' -H 'Authorization: blah' --data '{"query": "{_meta{block{number}}}"}' http://localhost:7300/subgraphs/id/0xb655ca6f49e73728a102219726ff678d61d8fb792874792e9f0d9887dc616600
"Invalid Tap-Receipt header provided"%

# Subgraph health check
✗ curl http://localhost:7300/subgraphs/health/QmVhiE4nax9i86UBnBmQCYDzvjWuwHShYh7aspGPQhU5Sj
"Subgraph deployment is up to date"%
## Subgraph deployment not found
✗ curl http://localhost:7300/subgraphs/health/QmacQnSgia4iDPWHpeY6aWxesRFdb8o5DKZUx96zZqEWrB
"Invalid indexing status"%

# Network queries
# Checks for auth and configuration to serve-network-subgraph
✗ curl -X POST -H 'Content-Type: application/json' -H 'Authorization: token-for-network-subgraph' --data '{"query": "{_meta{block{number}}}"}' http://localhost:7300/network
"Not enabled or authorized query"

# Indexing status resolver - Route supported root field queries to graph node status endpoint
✗ curl -X POST -H 'Content-Type: application/json' --data '{"query": "{blockHashFromNumber(network:\"goerli\", blockNumber: 9069120)}"}' http://localhost:7300/status
{"data":{"blockHashFromNumber":"e1e5472636db73ba5496aee098dc21310683c95eb30fc46f9ba6c36d8b28d58e"}}%

# Indexing status resolver -
✗ curl -X POST -H 'Content-Type: application/json' --data '{"query": "{indexingStatuses {subgraph health} }"}' http://localhost:7300/status
{"data":{"indexingStatuses":[{"subgraph":"QmVhiE4nax9i86UBnBmQCYDzvjWuwHShYh7aspGPQhU5Sj","health":"healthy"},{"subgraph":"QmWVtsWk8Pqn3zY3czDjyoVreshRLmoz9jko3mQ4uvxQDj","health":"healthy"},{"subgraph":"QmacQnSgia4iDPWHpeY6aWxesRFdb8o5DKZUx96zZqEWrB","health":"healthy"}]}}

# Indexing status resolver - Filter out the unsupported queries
✗ curl -X POST -H 'Content-Type: application/json' --data '{"query": "{_meta{block{number}}}"}' http://localhost:7300/status
{"errors":[{"locations":[{"line":1,"column":2}],"message":"Type `Query` has no field `_meta`"}]}%

# Cost server - read-only GraphQL queries
curl -X GET -H 'Content-Type: application/json' --data '{"query": "{ costModel(deployment: \"Qmb5Ysp5oCUXhLA8NmxmYKDAX2nCMnh7Vvb5uffb9n5vss\") { deployment model variables }} "}' http://localhost:7300/cost
{"data":{"costModel":{"deployment":"0xbd499f7673ca32ef4a642207a8bebdd0fb03888cf2678b298438e3a1ae5206ea","model":"default => 0.00025;","variables":null}}}%

curl -X GET -H 'Content-Type: application/json' --data '{"query": "{ costModel(deployment: \"Qmb5Ysp5oCUXhLA8NmxmYKDAX2nCMnh7Vvb5uffb9n5vas\") { deployment model variables }} "}' http://localhost:7300/cost
{"data":{"costModel":null}}%

curl -X GET -H 'Content-Type: application/json' --data '{"query": "{ costModel(deployment: \"Qmb5Ysp5oCUXhLA8NmxmYKDAX2nCMnh7Vvb5uffb9n5vss\") { deployment odel variables }} "}' http://localhost:7300/cost
{"errors":[{"message":"Cannot query field \"odel\" on type \"CostModel\". Did you mean \"model\"?","locations":[{"line":1,"column":88}]}]}%

curl -X GET -H 'Content-Type: application/json' --data '{"query": "{ costModels(deployments: [\"Qmb5Ysp5oCUXhLA8NmxmYKDAX2nCMnh7Vvb5uffb9n5vss\"]) { deployment model variables }} "}' http://localhost:7300/cost
{"data":{"costModels":[{"deployment":"0xbd499f7673ca32ef4a642207a8bebdd0fb03888cf2678b298438e3a1ae5206ea","model":"default => 0.00025;","variables":null}]}}%

Dependency choices

  • Switched from actix-web to axum for the service server; a minimal routing sketch follows this list
  • App profiling should use perf, flamegraphs or CPU profilers, and benches to track and collect performance data. The TypeScript implementation uses gcloud-profile.
  • Consider replacing and adding parts from the TAP manager
  • A Postgres connection to the indexer management server database is required; the database is shared with the indexer agent
  • No migrations are run by the indexer service, as they might introduce conflicts to the database; the indexer agent is solely responsible for database management.
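
As a small illustration of the axum choice, here is a minimal sketch of the kind of routing the service performs. The paths mirror the constant service paths, but the handlers and dependencies (axum 0.7, tokio, serde_json, anyhow) are assumptions, not the actual implementation:

use axum::{routing::get, Json, Router};
use serde_json::json;

// Hypothetical handlers mirroring a couple of the constant service paths.
async fn health() -> Json<serde_json::Value> {
    Json(json!({ "healthy": true }))
}

async fn version() -> Json<serde_json::Value> {
    Json(json!({ "version": env!("CARGO_PKG_VERSION"), "dependencies": {} }))
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let app = Router::new()
        .route("/health", get(health))
        .route("/version", get(version));

    // axum 0.7-style serving; older versions use axum::Server::bind instead.
    let listener = tokio::net::TcpListener::bind("0.0.0.0:7300").await?;
    axum::serve(listener, app).await?;
    Ok(())
}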

Indexer common components

These temporarily live inside the indexer-service package under src/common.

A simple indexer management client tracks the NetworkSubgraph and the Postgres connection.

  • The NetworkSubgraph instance tracks both the remote API endpoint and the local deployment query endpoint.
    • TODO: query the indexing status of the local deployment and only use the remote API as a fallback.
  • Keeps the cost model schema and resolvers with Postgres and GraphQL types: costModel(deployment) and costModels(deployments). If deployments is empty, all cost models are returned.
    • The global cost model is used as a fallback when specific deployments are queried.
  • No database migrations are run by the indexer service, as they might introduce schema conflicts; the indexer agent is solely responsible for database management.

Indexer native dependency

The dependency could not be linked directly with both the git URL "https://github.com/graphprotocol/indexer" and the path "packages/indexer-native/native" at the same time, and it is not available on crates.io. The folder was therefore copied into this repository at the version from https://github.com/graphprotocol/indexer/blob/972658b3ce8c512ad7b4dc575d29cd9d5377e3fe/packages/indexer-native/native.

Since indexer-service is written in Rust and TypeScript is not needed, indexer-native's neon build and utilities have been removed.

The NativeSignatureVerifier component has been renamed to SignatureVerifier.

It lives as a separate package in the workspace under native.

common-ts components

These temporarily live inside the indexer-service package under src/types:

  • Address
  • readNumber

Components checklist (basic, not extensive)

  • Server path routing
    • basic structure
    • CORS
    • timeouts
    • Rate limiting levels
    • Logger stream
  • Query processor
    • graph node query endpoint at specific subgraph path
    • wrap request to and response from graph node
    • extract receipt header
    • Free query
      • Query struct
      • Free query auth token check
      • Query routes + responses
      • set graph-attestable in response header to true
    • Network subgraph query
      • Query struct
      • serve network subgraph boolean + auth token check
      • Query routes + responses
      • set graph-attestable in response header to false
    • Paid query
      • receipts graphQL schema
      • TAP manager to handle receipts logic
        • derive, cache, and look up attestation signers
          • contracts - connect by network chain id
            • network provider
        • validate receipt format (need unit tests)
        • parse receipt (need unit tests)
        • validate signature (need unit tests)
        • store
      • extract graph-attestable from graph node response header
      • monitor eligible allocations
        • network subgraph
        • operator wallet -> indexer address
    • subgraph health check
    • query timing logs
  • Deployment health server
    • query status endpoint and process result
  • Status server
    • indexing status resolver - to query indexingStatuses
    • Filter for unsupported queries
  • Cost server
    • Simple indexer management client to track postgres connection and network subgraph endpoint.
    • serve queries with defined graphQL schema and psql resolvers to database: costModel(deployment) and costModels(deployments). If deployments is empty, all cost models are returned.
    • Global cost model fallback used when specific deployments are queried
  • Constant service paths
    • health
    • ready to roll
    • versions
    • operator public key
      • validate mnemonics to public key
  • Import indexer native
  • Metrics
    • Metrics setup
    • serve basic indexer service metrics
    • Add cost model metrics
  • CLI args
  • App profiling
    • No gcloud profiling, can use perf to collect performance data.

indexer-rs's People

Contributors

aasseman, carlosvdr, gusinacio, hopeyen, jannis, lnsd, renovate[bot], suchapalaver, theodus, yaroshkvorets


indexer-rs's Issues

[Feat.Req] Deny senders over a certain amount of unaggregated fees

Problem statement
Implement a maximum unaggregated fees trigger (that is larger than the trigger for RAV request) over which business with a sender is to be denied.

Expectation proposal
Adding:

  • A new unaggregated fees trigger for sender denial
  • A sender deny list in DB
  • PG notifications to quickly inform the indexer-service instances of additions to and removals from the deny list; a rough listener sketch follows
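
A minimal sketch of how a service instance might consume such notifications, using sqlx's PgListener. The channel name and payload handling are assumptions for illustration, not an agreed design:

use sqlx::postgres::PgListener;

// Hypothetical listener task; the channel name and payload format are illustrative only.
async fn watch_sender_deny_list(database_url: &str) -> Result<(), sqlx::Error> {
    let mut listener = PgListener::connect(database_url).await?;
    listener.listen("sender_deny_list_updated").await?;

    loop {
        let notification = listener.recv().await?;
        // The payload could carry the sender address and whether it was added or removed.
        println!("deny list changed: {}", notification.payload());
        // Here the in-memory deny set used when accepting receipts would be updated.
    }
}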

Indexer management client

An indexer management client is required to resolve queries against the indexer database.
In indexer-service, the indexer management client is needed to serve cost model queries.

  • Create an IndexerManagementClient and add it to ServerOptions
  • Define IndexerManagementModels to include CostModelModels
  • Allow SQL queries for cost models through the client; a rough resolver sketch follows this list
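
A rough sketch of the cost model resolver queries, assuming sqlx and an indexer-agent-managed "CostModels" table; the table and column names are assumptions for illustration only:

use sqlx::PgPool;

#[derive(sqlx::FromRow)]
struct CostModelRow {
    deployment: String,
    model: Option<String>,
    // Stored as JSON in the database; selected as text here to keep the sketch simple.
    variables: Option<String>,
}

// costModels(deployments): an empty filter returns every stored cost model.
async fn cost_models(pool: &PgPool, deployments: &[String]) -> sqlx::Result<Vec<CostModelRow>> {
    if deployments.is_empty() {
        sqlx::query_as::<_, CostModelRow>(
            r#"SELECT deployment, model, variables::text AS variables FROM "CostModels""#,
        )
        .fetch_all(pool)
        .await
    } else {
        sqlx::query_as::<_, CostModelRow>(
            r#"SELECT deployment, model, variables::text AS variables FROM "CostModels" WHERE deployment = ANY($1)"#,
        )
        .bind(deployments)
        .fetch_all(pool)
        .await
    }
}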

Initiate integration testing

Since the merge of #47, we can start preparing for some very basic integration tests. What's missing:

  • #49 Handling of the receipt aggregate requests. So for now the receipts would just get stored and nothing happens. The "tap_receipts" DB migrations would have to be run by hand for the indexer-service to be happy (the indexer_tap_agent would take care of the migrations, following the existing logic for the other components).
  • #50 Handling of the RAVs in indexer-agent when it's time for the indexer to redeem their query fees

panic on subgraph query

Subgraph queries panic early when attempting to use the QUERY_DURATION metric.

thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: InconsistentCardinality { expect: 2, got: 1 }', /home/theodus/.cargo/registry/src/index.crates.io-6f17d22bba15001f/prometheus-0.13.3/src/vec.rs:258:49
...
at /home/theodus/src/edgeandnode/local-network/build/graphops/indexer-service-rs/service/src/server/routes/subgraphs.rs:40:32

QUERY_DURATION is declared with the expectation of 2 labels, deployment & name, here. But only deployment is set here. I would have just fixed this, but I'm not sure what the name label is supposed to be populated with.
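
For reference, a minimal reproduction of the cardinality rule in the prometheus crate: a vec metric registered with two label names must be observed with exactly two label values (metric and label names mirror the description above, otherwise illustrative):

use prometheus::register_histogram_vec;

fn main() {
    // Registered with two labels, like QUERY_DURATION.
    let query_duration = register_histogram_vec!(
        "query_duration",
        "Duration of processing a query",
        &["deployment", "name"]
    )
    .unwrap();

    // Supplying only one label value panics with
    // InconsistentCardinality { expect: 2, got: 1 }:
    // query_duration.with_label_values(&["Qm..."]).observe(0.1);

    // Both label values must be supplied:
    query_duration
        .with_label_values(&["Qm...", "some-name"])
        .observe(0.1);
}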

Make RAV value threshold per sender

          That's a good point. Is our thinking so far that the RAV request threshold is per `(sender, allocation)`? From a user (i.e. indexer) perspective, that doesn't feel very intuitive. More intuitive would indeed be: "per sender, I'm willing to run a risk of X GRT remaining uncollectable". That would also open the door to per-sender risk thresholds.

Originally posted by @Jannis in #83 (comment)

[Feat] More flexible IndexerServiceMetrics

Problem statement
There are only 3 fields tracked in the metrics. It would be good to have more flexibility in this struct so we can add metrics for individual data services instead of hosting a separate metrics service.

pub struct IndexerServiceMetrics {
    pub requests: IntCounterVec,
    pub successful_requests: IntCounterVec,
    pub failed_requests: IntCounterVec,
}

Expectation proposal

Generic type/field added to IndexerServiceMetrics and function to register more metrics.
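
A rough sketch of one way the proposal could look: keep the fixed counters but expose a registry plus a registration helper so each data service can add its own metrics. The names here are suggestions, not an agreed design:

use prometheus::{IntCounterVec, Opts, Registry};

pub struct IndexerServiceMetrics {
    pub requests: IntCounterVec,
    pub successful_requests: IntCounterVec,
    pub failed_requests: IntCounterVec,
    // Registry shared with the data service so it can add its own metrics.
    pub registry: Registry,
}

impl IndexerServiceMetrics {
    // Hypothetical helper: register an extra counter owned by the data service.
    pub fn register_counter(
        &self,
        name: &str,
        help: &str,
        labels: &[&str],
    ) -> prometheus::Result<IntCounterVec> {
        let counter = IntCounterVec::new(Opts::new(name, help), labels)?;
        self.registry.register(Box::new(counter.clone()))?;
        Ok(counter)
    }
}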

Alternative considerations
We can have a separate metrics server on the data service level, but that structure seems a bit convoluted for the data providers.

Multi-network support

While multi-network support has landed for indexer-agent, it hasn't yet for indexer-service.
So no rush there yet.

malformed indexing status response

The /status route seems to be returning a JSON string value of the expected response body. Example:

"{\"data\":{\"indexingStatuses\":[{\"subgraph\":\"QmNmVRCM39PKPXqC9HNFSGW9WQWzDTWUEmpLktAd26vYCQ\",\"chains\":[{\"network\":\"hardhat\",\"latestBlock\":{\"number\":\"599\",\"hash\":\"cd3e683d1b9c8829f21b0a0c39a00ec4ec67fdb38756fcf4a275121de177ba9b\"},\"earliestBlock\":{\"number\":\"0\",\"hash\":\"0x0\"}}]},{\"subgraph\":\"QmRR4o5jApp3v7Wcym6o8dXJtBo2chnxDyFyKt12nK6eTE\",\"chains\":[{\"network\":\"hardhat\",\"latestBlock\":{\"number\":\"2265\",\"hash\":\"ca89f224b4e310a790266b94608e270b01072e514c2a5ec70f4ddf580837bc64\"},\"earliestBlock\":{\"number\":\"0\",\"hash\":\"0x0\"}}]},{\"subgraph\":\"QmeVg9Da6uyBvjUEy5JqCgw2VKdkTxjPvcYuE5riGpkqw1\",\"chains\":[{\"network\":\"hardhat\",\"latestBlock\":{\"number\":\"2312\",\"hash\":\"64272c37ada8842f1dec2fb0e7ee2f5e43684d0c1080b791223aa100ad194174\"},\"earliestBlock\":{\"number\":\"0\",\"hash\":\"0x0\"}}]},{\"subgraph\":\"QmTcYL6jsUDW2y8QE4zre7jgS26qen7PCURSdymsaVL9cG\",\"chains\":[{\"network\":\"hardhat\",\"latestBlock\":{\"number\":\"2312\",\"hash\":\"64272c37ada8842f1dec2fb0e7ee2f5e43684d0c1080b791223aa100ad194174\"},\"earliestBlock\":{\"number\":\"0\",\"hash\":\"0x0\"}}]}]}}"

Example response from the TS indexer-service in the same environment (expected):

{"data":{"indexingStatuses":[{"subgraph":"QmNmVRCM39PKPXqC9HNFSGW9WQWzDTWUEmpLktAd26vYCQ","chains":[{"network":"hardhat","latestBlock":{"number":"599","hash":"cd3e683d1b9c8829f21b0a0c39a00ec4ec67fdb38756fcf4a275121de177ba9b"},"earliestBlock":{"number":"0","hash":"0x0"}}]},{"subgraph":"QmRR4o5jApp3v7Wcym6o8dXJtBo2chnxDyFyKt12nK6eTE","chains":[{"network":"hardhat","latestBlock":{"number":"2265","hash":"ca89f224b4e310a790266b94608e270b01072e514c2a5ec70f4ddf580837bc64"},"earliestBlock":{"number":"0","hash":"0x0"}}]},{"subgraph":"QmeVg9Da6uyBvjUEy5JqCgw2VKdkTxjPvcYuE5riGpkqw1","chains":[{"network":"hardhat","latestBlock":{"number":"2328","hash":"ed848b1de1913d3f857026bd48f2f5574ae547657c2df3c77e815e3e0333c3ba"},"earliestBlock":{"number":"0","hash":"0x0"}}]},{"subgraph":"QmTcYL6jsUDW2y8QE4zre7jgS26qen7PCURSdymsaVL9cG","chains":[{"network":"hardhat","latestBlock":{"number":"2328","hash":"ed848b1de1913d3f857026bd48f2f5574ae547657c2df3c77e815e3e0333c3ba"},"earliestBlock":{"number":"0","hash":"0x0"}}]}]}}

steps to reproduce

curl http://${host}:${port}/status \
  -H 'content-type: application/json' \
  -d '{"query": "{ indexingStatuses(subgraphs: []) { subgraph chains { network latestBlock { number hash } earliestBlock { number hash } } } }"}'

feat: Deployment health server

Deployment health server at subgraphs/health/:deployment to

  • query graph node status endpoint for deployment's indexing status
  • process graph node's block number results for formatted response

feat: TAP integration on paid query receipts

Receipt-related operations should be handled by an allocation receipt manager. The original flow from the TypeScript indexer should be replaced with the TAP manager.

Brief functionalities needed

  • TAP manager to handle receipts logic
    • derive, cache, and look up attestation signers
      • contracts - connect by network chain id
        • network provider
    • validate receipt format, parse receipt, validate receipt signature
    • store receipt to db
  • receipts graphQL schema for storage and queries
  • extract graph-attestable from graph node response header
  • monitor eligible allocations for client signer
    • network subgraph query for eligibility
    • operator wallet -> indexer address

feat: write indexer_tap_agent

The agent will be tasked with:

  • Monitoring outstanding, un-aggregated query fees
  • Completing all the receipt checks
  • Requesting receipt aggregate vouchers (RAVs) periodically from senders
  • Storing the RAVs in DB for the indexer-agent to eventually redeem on-chain.

EPIC: Testing

  • unit tests #14
  • API tests #15
    • enable sqlx test in CI #44
  • Integration tests #16

Escrow subgraph deployment: QmcPoPmisXLzsJXsdU6UUQ8Xot9ucfgaQgXvcnQDi4cUsX; queries can be made to https://api.studio.thegraph.com/proxy/53925/timeline-aggregation-protocol/v0.0.2/graphql

Shorten allowed time for recently closed allocations

Currently the allocation monitor allows recently closed allocations until the end of the next epoch.
Instead, they should be allowed for a short time after the allocation close itself (matching indexer-agent; a few minutes?).

[Feat.Req] Garbage collect finished SenderAllocationRelationships

Problem statement
TAP Agent currently tells SenderAllocationRelationships to do their last RAV request when an allocation or a sender becomes ineligible. The actual RAV request is done asynchronously, and therefore the finished SenderAllocationRelationships cannot be removed immediately.

Expectation proposal
There needs to be a process that will check SenderAllocationRelationships that are finished and deletes them. Ideally in a separately spawned async task of some sort.

Additional context

// TODO: remove SenderAllocationRelationship instances that are finished. Ideally done in
// another async task?
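
A rough sketch of the kind of separately spawned cleanup task the TODO suggests; the relationship type and its finished flag are placeholders for the real bookkeeping:

use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::Duration;

// Placeholder for the real state; only the finished flag matters for this sketch.
struct SenderAllocationRelationship {
    finished: bool,
}

fn spawn_garbage_collector(
    relationships: Arc<Mutex<HashMap<(String, String), SenderAllocationRelationship>>>,
) -> tokio::task::JoinHandle<()> {
    tokio::spawn(async move {
        let mut ticker = tokio::time::interval(Duration::from_secs(60));
        loop {
            ticker.tick().await;
            // Drop every (sender, allocation) entry whose final RAV request has completed.
            relationships
                .lock()
                .unwrap()
                .retain(|_, rel| !rel.finished);
        }
    })
}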

[Feat.Req] Limit number of receipts per RAV request

Problem statement
TAP Agent will currently try to send all unaggregated receipts in a single RAV request.
The TAP Aggregator will not accept more than 10 MB of data per request (otherwise it would be a DoS vector). It was therefore decided that the TAP Aggregator should be guaranteed to handle a maximum of 15,000 receipts per call.

Expectation proposal
Implement logic that pulls only up to the 15,000 oldest unaggregated receipts per RAV request, and keeps issuing RAV requests while more than 15,000 receipts remain to aggregate.
This will involve changes in https://github.com/semiotic-ai/timeline-aggregation-protocol/tree/main/tap_core.
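
A rough sketch of the batching logic; the receipt type and helper are placeholders, and the real change would live in tap_core / TAP Agent:

// Hypothetical stored receipt; only the timestamp matters for ordering here.
#[derive(Clone)]
struct StoredReceipt {
    timestamp_ns: u64,
}

const MAX_RECEIPTS_PER_RAV_REQUEST: usize = 15_000;

// Split unaggregated receipts into RAV-request-sized batches, oldest first.
// Each batch would then be sent as its own RAV request until none remain.
fn rav_request_batches(mut receipts: Vec<StoredReceipt>) -> Vec<Vec<StoredReceipt>> {
    receipts.sort_by_key(|r| r.timestamp_ns);
    receipts
        .chunks(MAX_RECEIPTS_PER_RAV_REQUEST)
        .map(|chunk| chunk.to_vec())
        .collect()
}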

Additional context
https://github.com/semiotic-ai/timeline-aggregation-protocol/blob/main/tap_aggregator/README.md

Bug: indexer_allocations limited to first 1000

Describe the bug

https://github.com/graphprotocol/indexer-rs/blob/main/common/src/allocations/monitor.rs#L68
This query string limits the indexer allocation query to the first 1000 active allocations, with no pagination. As shown by an indexer with more than 1000 active allocations, the function fails to fetch all eligible allocations for monitoring.

Expected behavior
Add pagination to the query and ensure all active allocations for an indexer are monitored.
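
One possible shape for the pagination, using the common id_gt pattern for subgraph queries. The entity fields follow the network subgraph's Allocation entity, and the fetch closure stands in for the actual GraphQL client, so treat this as a sketch only:

// Hypothetical helper: `fetch_page` takes a GraphQL query and returns the allocation IDs
// from one page of results.
fn fetch_all_active_allocations<F>(indexer: &str, mut fetch_page: F) -> Vec<String>
where
    F: FnMut(&str) -> Vec<String>,
{
    let mut all = Vec::new();
    let mut last_id = String::new();
    loop {
        let query = format!(
            r#"{{ allocations(first: 1000, orderBy: id, orderDirection: asc,
                   where: {{ indexer: "{indexer}", status: Active, id_gt: "{last_id}" }}) {{ id }} }}"#
        );
        let page = fetch_page(&query);
        let last_page = page.len() < 1000;
        if let Some(last) = page.last() {
            last_id = last.clone();
        }
        all.extend(page);
        if last_page {
            break;
        }
    }
    all
}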

Additional context
Uncovered by the same issue on TS version of indexer-service

feat: routes IPFS

If the indexer provides endpoints for a self-hosted IPFS node, enable routing at /ipfs with a limited set of APIs.

Apache-2 license headers

inherited from the issue made by @aasseman hopeyen/indexer-service-rs#16

The license requires headers at the top of each source file.

I propose that we use this format:

// Copyright 2023-, [attribution]
// SPDX-License-Identifier: Apache-2.0

However, I'm not 100% sure about the attribution. Should it be "Hope Yen", "GraphOps", "GraphOps and Semiotic Labs", "The Graph Foundation"? Perhaps it could also be per file, such that files that I create have "Alexis Asseman", and yours have "Hope Yen"?

And then we can also use CI like here to make sure we don't forget in the future?

I think we could do GraphOps and Semiotic Labs, will assign myself and do similar to the reference CI example

[Feat.Req] Check receipt timestamps early

Problem statement
We are not checking (in indexer-service) that new receipts have a reasonably accurate timestamp (let's say within 5 seconds of our own clock). So as it is right now it is not really safe :/.

Expectation proposal
Check in-band that receipt timestamps are reasonably close to the system clock.
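
An illustrative version of such a check (not the tap_core API); the 5-second tolerance from the problem statement is passed in as max_skew:

use std::time::{Duration, SystemTime, UNIX_EPOCH};

// Reject receipts whose timestamp deviates from the local clock by more than max_skew.
fn timestamp_is_acceptable(receipt_timestamp_ns: u64, max_skew: Duration) -> bool {
    let now_ns = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the UNIX epoch")
        .as_nanos() as u64;
    now_ns.abs_diff(receipt_timestamp_ns) <= max_skew.as_nanos() as u64
}

// Usage: timestamp_is_acceptable(receipt.timestamp_ns, Duration::from_secs(5))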

Additional context
#83 (comment)

[Feat.Req] Add HTTP request compression to RAV requests

Problem statement
The TAP Aggregator API is based on JSON-RPC. Thus there are easy bandwidth gains to be had with request compression since JSON is usually highly compressible.

Additional context

// TODO: Request compression and response decompression. Also a fancy user agent?
let client = HttpClientBuilder::default()
    .request_timeout(Duration::from_secs(
        inner.config.tap.rav_request_timeout_secs,
    ))
    .build(&inner.sender_aggregator_endpoint)?;

[Feat.Req] Implement receipt value check

Problem statement
TAP Agent does not currently check that the receipt values are correct w.r.t. the query and the Agora model.

Expectation proposal
Computing a query's price w.r.t. an Agora model is fairly easy in principle, as the canonical Agora implementation is already a Rust library: https://github.com/graphprotocol/agora.

However:

  • Indexers can change their Agora models at any time.
  • Gateways poll the indexer's models at a regular interval (~30sec).
  • The indexer is not informed of which model was used for a particular query (would have to guess by testing its most recent model versions).
  • There is no standard currently as to how long a gateway is authorized to use an obsolete model.

Additional context

let required_checks = vec![
    ReceiptCheck::CheckUnique,
    ReceiptCheck::CheckAllocationId,
    ReceiptCheck::CheckTimestamp,
    // ReceiptCheck::CheckValue,
    ReceiptCheck::CheckSignature,
    ReceiptCheck::CheckAndReserveEscrow,
];

test: simple unit tests

Task from EPIC: Testing#7

Full coverage is the goal, but be realistic about priorities.

  • Validate configs, fail at startup if any provided configuration isn't valid
  • types: Subgraph deployment id IpfsHash and bytes representations
  • Create graph node instance
  • Util functions
    • package_version
    • public_key
  • attestation_signer
  • routing helpers
  • routes::deployment::block_numbers
  • routes::basic::operator_info

feat: Cost server

  • Serve cost model requests
  • provide cost model GraphQL schema
  • build request to query the indexer management server
  • process and format response before returning

test: API tests with mocked server/deps

Task from #7

API tests for

  • constant routes
    - [ ] health
    - [ ] version
    - [ ] operator/info
  • nested passthroughs
    - [x] graph node query endpoint
    - [x] network subgraph (if not locally syncing the deployment)
    - [x] allocation monitors and signers
    - [ ] cost server
    - [ ] graph node status endpoint

feat: Server stream logger and rate limiter

  • Stream logging for the server
  • Basic rate limiter (network limiter for network queries, slow limiter for other routes, excluding subgraph queries)
  • Performance measurements are required before adding a rate limiter for subgraph queries

incorrect subgraph indexing status check

The behavior of monitor_deployment_status is incorrect for the following reasons:

  1. The query uses the wrong filter key, resulting in the expected 0 index value potentially being the status for a different deployment. It should be the following:
    query indexingStatuses($ids: [ID!]!) {
        indexingStatuses(subgraphs: $ids) {
            synced
            health
        }
    }
  2. The status_url is not the correct URL. On my local-network, it's set to http://172.17.0.1:8000/status instead of http://localhost:8030/graphql, despite the latter being set as --graph-node-status-endpoint.

[Feat.Req] TAP Agent gRPC API

Problem statement

For facilitating payments for Firehose and Substreams data, we will need to extend the functionality of TAP agent as described in this issue. Note: The bold parts in the use case description are what require new TAP agent functionality.

This assumes that indexers will run e.g. a Firehose Indexer Service that consumers will open a connection with in order to exchange TAP receipts and RAVs. This service will need to know when to request an RAV from a consumer and when to stop serving them.

Unlike with gateways that have a public endpoint to send RAV requests to, the only way to get RAVs from consumers is by "talking" to them directly. We therefore envision roughly the following architecture for this:

flowchart TB

  subgraph Indexer
    fis[Firehose Indexer Service]
    f[Firehose]
  end

  subgraph Consumer
    fc[Firehose Client]
    fnc[Firehose Network Client]
  end

  fc -- Firehose requests --> fnc
  fnc -- Data stream --> fc
  fis -- Authorization response, receipt & RAV requests --> fnc
  fnc -- Authorization request, receipts & RAVs --> fis

  fnc -- Firehose requests --> f
  f -- Data stream --> fnc
  f -- Report bytes sent/read --> fis

How does this work in practice?

  1. Firehose Client wants to stream some data and makes requests to a local Firehose Network Client.
  2. Firehose Network Client picks an indexer to use and opens a payment connection with its Firehose Indexer Service.
  3. Firehose Indexer Service decides whether it still owes the consumer some data from a previous receipt.
    1. If yes, it sends an authorization message back that includes a Firehose URL and auth token.
    2. If no, it sends a receipt request back. Once it gets a receipt from the Firehose Network Client, it sends the authorization message.
  4. Firehose Network Client opens the data connection and starts streaming data from the indexer's Firehose.
  5. Firehose Indexer Service tracks the bytes served to the consumer against the latest receipt it has received.
    1. When it has served the data corresponding to the receipt amount, it requests another receipt.
  6. Firehose Indexer Service also
    1. periodically checks how much collateral the consumer has remaining. If this ever goes to zero, it instructs the Firehose to terminate the data connection.
    2. periodically checks whether it needs a RAV from the consumer. When it does, it sends a RAV request to the consumer and waits for a RAV. If it doesn't get one back in a certain time frame, it instructs Firehose to terminate the data connection.

Proposal

We propose that TAP agent serve a gRPC API that Firehose Indexer Services, and also Subgraph Indexer Services, can connect to. The API could look as follows:

service TAPAgent {
  rpc PayerStatus(PayerStatusRequest) returns (PayerStatusResponse);
}

message PayerStatusRequest {
  bytes payer_address = 1;
}

message PayerStatusResponse {
  bytes payer_address = 1;
  bytes remaining_collateral = 2;
  optional bytes rav_request = 3;
}

Using this, different indexer service implementations can:

  1. Decide when to stop serving a consumer (e.g. when their remaining collateral goes to zero).
  2. Forward RAV requests to the consumer whenever necessary.

The subgraph indexer service could use this by periodically checking the payer status for all gateways it has interacted with. If a RAV is required for any of them, it could then send that request to their aggregator endpoint. This way, TAP agent would not need to know anything about gateways and their URLs.

The Firehose indexer service could use this by periodically checking the payer status for all consumers that have a payment connection open with the indexer. It can forward RAV requests to them via these payment connections.

Alternative considerations

  • The above gRPC API may not be ideal for performance reasons. I (Jannis) am not a gRPC/protobuf expert. Perhaps a stream would be better instead of a request/response pattern? 🤔

Additional context

  • It may make sense to also use this gRPC API between the subgraph indexer service and TAP agent. The way that could work is that the subgraph indexer service would be the one to have awareness of what gateways exist and what their RAV endpoints are. It could then periodically poll the TAP agent for whether it needs a RAV from any of the gateways, and take action accordingly. This way, the TAP agent would not have to be aware of gateways at all and the way Firehose indexer service, subgraph indexer service and others obtain RAV requests would be uniform.

[Feat.Req] Handle TAP payment faults

Problem statement
Currently TAP Agent ignores payment faults.

Expectation proposal
Faults can be invalid receipts (wrong value, receipt replay, etc), invalid RAVs, or aggregation denial.

For each type of fault, criteria should be determined for deciding when to stop doing business with a problematic sender.

The signal to reject a sender should be communicated promptly to the service instances for it to be effective.

EDIT:
Sender denial moved to issue #103

Additional context

// TODO: Handle invalid receipts

Create an indexer-service framework within indexer-common

My "dream" is to be able to write an indexer service for a new data service with only a few lines of code. Think along the lines of running a single run_indexer_service function to which a few things are passed:

  • A function to extract receipts from incoming requests.
  • A function to forward incoming requests to the data service backend (e.g. graph-node, SQL database).
  • A status API of sorts
  • A cost model API of sorts
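
A very rough sketch of what that entry point could look like, using Rust 1.75+ async fns in traits for brevity; every name and signature here is hypothetical, just to make the idea concrete:

// Hypothetical request/response/receipt types.
pub struct TapReceipt;
pub struct DataServiceRequest(pub Vec<u8>);
pub struct DataServiceResponse {
    pub body: Vec<u8>,
    pub attestable: bool,
}

// What a new data service would implement to get a full indexer service "for free".
pub trait IndexerServiceImpl {
    // Extract a TAP receipt from an incoming request.
    fn extract_receipt(&self, request: &DataServiceRequest) -> Option<TapReceipt>;
    // Forward the request to the backend (graph-node, SQL database, ...).
    async fn forward(&self, request: DataServiceRequest) -> DataServiceResponse;
    // Status and cost model APIs of sorts.
    fn status(&self) -> String;
    fn cost_models(&self) -> String;
}

// The "few lines of code" entry point: receipt checks, allocation monitoring,
// metrics and routing would all live behind this one function.
pub async fn run_indexer_service(service: impl IndexerServiceImpl) {
    let _ = service;
    // Common server setup would go here.
}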

Consider splitting `AllocationMonitor` from `AttestationSigners`, etc

Would it make sense?
I'm considering this because TAP will also need to watch for new allocations, and having lots of TAP stuff inside of AllocationMonitor may be a little bit of an anti-pattern.
For such a change, I was looking into leveraging tokio::sync::broadcast or tokio::sync::watch, and the other components would just have to subscribe to know when the current set of eligible allocations has changed.

That may be overkill though 🤷
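
A rough sketch of the watch-based approach: the allocation monitor owns the sender and publishes the current set of eligible allocations, while other components hold receivers and react to changes. Types are simplified for illustration:

use std::collections::HashSet;
use tokio::sync::watch;

#[tokio::main]
async fn main() {
    // The allocation monitor would own the sender.
    let (tx, rx) = watch::channel(HashSet::<String>::new());

    // A subscriber (attestation signers, TAP components, ...) clones the receiver
    // and wakes up whenever the set of eligible allocations changes.
    let mut signer_rx = rx.clone();
    let subscriber = tokio::spawn(async move {
        while signer_rx.changed().await.is_ok() {
            let allocations = signer_rx.borrow().clone();
            println!("eligible allocations updated: {} entries", allocations.len());
        }
    });

    // Simulated monitor update.
    tx.send(HashSet::from(["0xallocation".to_string()])).unwrap();
    drop(tx); // closing the channel ends the subscriber loop
    subscriber.await.unwrap();
}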

Spike: json-rpc provider as a service

It would be valuable to spike out what is required for indexer-service to provide JSON-RPC services.

rough points:

  • add json-rpc/block provider dependencies to support such service
  • structure similar to QueryProcesser
  • auth-token should be flexibly rolled and communicated without shutting down the service (perhaps scheduled in an agreement contract or triggered by gateway)
  • service fee manageable by TAP?

[Feat.Req] Integrate TAP Agent with gateway registry contract

Problem statement
Currently the TAP Agent relies on a user-provided list of sender addresses with corresponding TAP aggregator endpoint URLs. This is a stopgap solution for integration testing.

Expectation proposal
Once the gateway registry contract is implemented (TBD), TAP Agent will be modified to get the TAP aggregator endpoint URLs from there.


Additional context

// TODO: replace with a proper implementation once the gateway registry contract is ready
let sender_aggregator_endpoints = aggregator_endpoints::load_aggregator_endpoints(
    config.tap.sender_aggregator_endpoints_file.clone(),
);

feat: Limit indexing status API

For the status server:

  1. Extract the GraphQL query from the received request
  2. Filter the query for supported root fields (a rough filtering sketch follows this list):

    let supported_root_fields = [
        "indexingStatuses",
        "publicProofsOfIndexing",
        "entityChangesInBlock",
        "blockData",
        "cachedEthereumCalls",
        "subgraphFeatures",
        "apiVersions",
    ];

  3. Send the updated query request to graph_node_status_endpoint
  4. Return the response from graph node
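
A rough sketch of the filtering step, assuming the graphql_parser crate is used to inspect the query's root fields; the actual service may use a different GraphQL library:

use graphql_parser::query::{parse_query, Definition, OperationDefinition, Selection};

// Abbreviated here; the full list of supported root fields is shown above.
const SUPPORTED_ROOT_FIELDS: &[&str] = &["indexingStatuses", "blockData", "apiVersions"];

// Returns true only if every root field of every query operation is supported.
fn query_is_supported(query: &str) -> bool {
    let Ok(document) = parse_query::<&str>(query) else {
        return false;
    };
    document.definitions.iter().all(|def| {
        let selection_set = match def {
            Definition::Operation(OperationDefinition::Query(q)) => &q.selection_set,
            Definition::Operation(OperationDefinition::SelectionSet(set)) => set,
            _ => return false,
        };
        selection_set.items.iter().all(|sel| {
            matches!(sel, Selection::Field(f) if SUPPORTED_ROOT_FIELDS.contains(&f.name))
        })
    })
}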

Feat: Record metrics

  • Register metrics and serve a metrics port that can be scraped by Prometheus
  • record queries metrics
  • record cost model metrics
  • record indexer errors
  • Agreeing with comments from @aasseman, we can further improve metrics by ensuring that:

    query duration is discarded if there's an error
    FAILED_QUERIES is incremented at all the points where the function returns with an error
