go-tableland's Introduction

Tableland Validator


Go implementation of the Tableland database—run your own node, handling on-chain mutating events and serving read-queries.

Background

go-tableland is a Go implementation of a Tableland node, enabling developers and service providers to run nodes on the Tableland network and host databases for web3 users and applications. Note that the Tableland protocol is currently in open beta: node operators have the opportunity to be early network adopters, but the responsibilities of a validator will continue to change as the protocol evolves.

What is a validator?

Validators are the execution unit/actors of the protocol.

They have the following responsibilities:

  • Listen to on-chain events to materialize Tableland-compliant SQL queries in a database engine (currently, SQLite by default).
  • Serve read-queries (e.g., SELECT * FROM foo_69_1) to the external world.

In the future, validators will have more responsibilities in the network.

Validator and network relationship

At a high level, the validator sits between supported EVM chains and the external world, syncing on-chain events and serving read-queries.

To better understand the usual mechanics of the validator, let’s go through a typical use case where a user mints a table, adds data to the table, and reads from it:

  1. The user will mint a table (ERC721) from the Tableland Registry smart contract on a supported EVM chain.
  2. The Registry contract will emit a CreateTable event containing the CREATE TABLE statement as extra data.
  3. Validators will detect the new event and execute the CREATE TABLE statement.
  4. The user will call the mutate method in the Registry smart contract, with mutating statements such as INSERT INTO ..., UPDATE ..., DELETE FROM ..., etc.
  5. The Registry contract, as a result of that call, will emit a RunSQL event that contains the mutating SQL statement as extra data.
  6. The validators will detect the new event and execute the mutating query in the corresponding table, assuming the user has the right permissions (e.g., table ownership and/or smart contract defined access controls).
  7. The user can query the validator's /query?statement=... REST endpoint to execute read-queries (e.g., SELECT * FROM ...) and see the materialized result of their interaction with the smart contract (a sketch follows this list).
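
To make step 7 concrete, here's a minimal sketch in Go of issuing a read-query against a validator's /query endpoint. The localhost:8080 host and the foo_69_1 table name are placeholders, and the response is printed as-is rather than parsed.

package main

import (
    "fmt"
    "io"
    "net/http"
    "net/url"
)

func main() {
    // Placeholder validator host (the default non-TLS port is :8080) and table name.
    endpoint := "http://localhost:8080/query"
    statement := "SELECT * FROM foo_69_1"

    resp, err := http.Get(endpoint + "?statement=" + url.QueryEscape(statement))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }
    fmt.Printf("status=%d body=%s\n", resp.StatusCode, body)
}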

The description above is simplified to convey the general mechanics of the validator. Minting tables and executing mutating statements also involve more work at both the smart contract and validator levels (e.g., ACL enforcement), which is omitted here for simplicity's sake.

The validator detects the smart contract events using an EVM node API (e.g., geth node), which can be self-hosted or served by providers (e.g., Alchemy, Infura, etc).

If you're curious about Tableland network growth, eager to contribute, or interested in experimenting, we encourage you to try running a validator. To get started, follow the step-by-step instructions below. We appreciate your interest and welcome any questions or feedback during the process; stay tuned for updates and developments on our Discord and Twitter.

For projects that want to use the validator API, Tableland maintains a public gateway that can be used to query the network.

Running a validator

Running a validator involves running only a single process. Since SQLite is the default database engine, the database is embedded in the validator process, which has several advantages:

  • There’s no separate process for the database.
  • There’s no inter-process communication between the validator and the database.
  • There’s no separate configuration or monitoring needed for the database.

We provide everything you need to run a validator with a single command using a docker-compose setup. This automatically builds everything from source, making it platform-independent since most OSes support Docker. The build process is also dockerized, so node operators don't need to worry about installing compilers or similar tooling.

If you prefer creating your own setup (e.g., running raw binaries, using systemd, k8s, etc.), we're also planning to publish versioned Docker images and compiled executables. If there are other setups you're interested in, feel free to let us know or even share your own.

The Docker Compose setup section below describes how to run a validator in more detail, including:

  • Folder structure.
  • Configuration files.
  • Where the state of the validator lives.
  • Baked-in observability stack (i.e., Prometheus + Grafana with a dashboard).
  • Optional healthbot process to have an end-to-end (e2e) healthiness check of the validator.

Reviewing this section is strongly recommended but not strictly necessary.

Usage

System requirements

Currently, we recommend running the validator on a machine that has at least:

  • 4 vCPUs.
  • 8GiB of RAM.
  • SSD disk with 10GiB of free space.
  • Reliable and fast internet connection.
  • Static IP.

Hardware requirements might change over time, but this setup is probably over-provisioned for the current state of the network. We're planning to build a stress-testing benchmark suite to understand and predict the validator's behavior under different loads, which will give us more data for future recommended system requirements.

Firewall configuration

If you’re behind a firewall, you should open ports :8080 or :443, depending on if you run with TLS certificates. By default, TLS is not required, thus, expecting :8080 to be open to the external world.

System prerequisites

There are two prerequisites for running a validator:

  • Install host-level dependencies.
  • Get EVM node API keys.

Tableland has two separate networks:

  • mainnet: this network syncs mainnet EVM chains (e.g., Ethereum mainnet, Arbitrum mainnet, etc.).
  • testnet: this network syncs testnet EVM chains (e.g., Ethereum Sepolia, Arbitrum Sepolia, etc.).

This guide will focus on running the validator in the mainnet network.

We do this for two reasons:

  • The mainnet network is the most stable one and is also where we want the largest number of validators.
  • We can provide concrete file paths related to mainnet and avoid being abstract.

We’ll also explain how to run a validator using Alchemy as a provider for the EVM node API the validator will use. The configuration will be analogous if you use self-hosted nodes or other providers. Note that if you do want to support testnets, you can, generally, replace this documentation's mainnet reference with testnet (e.g., an environment variable with MAINNET would be TESTNET; docker/deployed/mainnet would shift to docker/deployed/testnet).

Install host-level dependencies

To run the provided docker-compose setup, you'll need Docker (with Docker Compose) and git installed on the host.

Note that there's no need for a particular Go installation since binaries are compiled within a Docker container containing the correct Go compiler version. Although not strictly necessary, creating a separate user on the host to run the validator is usually recommended.

Create EVM node API keys

The current setup needs one API key per supported chain. The default setup expects Alchemy keys for Ethereum, Optimism, Arbitrum One, and Polygon, and a QuickNode key for Arbitrum Nova. You're free to use a self-hosted node or another provider that supports the targeted chains.

To get your Alchemy keys, create an Alchemy account, log in, and follow these steps:

  1. Create one app for each chain using the + Create App button.
  2. You’ll see one row per chain—click the View Key button and copy/save the API KEY.

To get your QuickNode Arbitrum Nova key, create a QuickNode account, log in, and follow these steps:

  1. Create an endpoint.
  2. Select Arbitrum Nova Mainnet.
  3. When you finish the wizard, you should have access to your API key.

Note: For Filecoin, we recommend Glif.io RPC support, which does not require authentication; the .env variable's value (shown below) can be left empty.

Run the validator

Now that you have installed the host-level dependencies, have one wallet per chain, and provider (Alchemy, QuickNode, etc.) API keys, you’re ready to configure the validator and run it.

1. Clone the go-tableland repository

Navigate to the folder where you want to clone the repository and run:

git clone https://github.com/tablelandnetwork/go-tableland.git

Running the main branch should always be safe since it's the exact code the public validator is running. We recommend this approach since we're moving quickly with features and improvements, but we expect this to soon be better guided by official releases.

2. Configure your secrets in .env files

You must configure each EVM account's private keys and EVM node provider API keys into the validator secrets:

  1. Create a .env_validator file in docker/deployed/mainnet/api folder—an example is provided with .env_validator.example.

  2. Add the following to .env_validator (as noted, this focuses on mainnet configurations but could be generally replicated for testnet support):

    VALIDATOR_ALCHEMY_ETHEREUM_MAINNET_API_KEY=<your ethereum mainnet alchemy key>
    VALIDATOR_ALCHEMY_OPTIMISM_MAINNET_API_KEY=<your optimism mainnet alchemy key>
    VALIDATOR_ALCHEMY_ARBITRUM_MAINNET_API_KEY=<your arbitrum mainnet alchemy key>
    VALIDATOR_ALCHEMY_POLYGON_MAINNET_API_KEY=<your polygon mainnet alchemy key>
    VALIDATOR_QUICKNODE_ARBITRUM_NOVA_MAINNET_API_KEY=<your arbitrum nova mainnet quicknode key>
    VALIDATOR_GLIF_FILECOIN_MAINNET_API_KEY=

    Note: there is also an optional METRICS_HUB_API_KEY variable; this can be left empty. It's a service (cmd/metricshub) that aggregates metrics like git summary and pushes them to centralized infrastructure (GCP Cloud Run) managed by the core team. If you'd like to have your validator push metrics to this hub, please reach out to the Tableland team, and we may make it available to you. However, this process will further be decentralized in the future and remove this dependency entirely.

  3. Tune docker/deployed/mainnet/api/config.json:

    1. Change the ExternalURIPrefix configuration attribute to the DNS name (or IP) where your validator will serve external requests.

    2. In the Chains section, only leave the chains you’ll be running; remove any chain entries you do not wish to support.

      Reference: example entry
      {
        "Name": "Ethereum Mainnet",
        "ChainID": 1,
        "Registry": {
          "EthEndpoint": "wss://eth-mainnet.g.alchemy.com/v2/${VALIDATOR_ALCHEMY_ETHEREUM_MAINNET_API_KEY}",
          "ContractAddress": "0x012969f7e3439a9B04025b5a049EB9BAD82A8C12"
        },
        "EventFeed": {
          "ChainAPIBackoff": "15s",
          "NewBlockPollFreq": "10s",
          "MinBlockDepth": 1,
          "PersistEvents": true
        },
        "EventProcessor": {
          "BlockFailedExecutionBackoff": "10s",
          "DedupExecutedTxns": true,
          "WebhookURL": "https://discord.com/api/webhooks/${VALIDATOR_DISCORD_WEBHOOK_ID}/${VALIDATOR_DISCORD_WEBHOOK_TOKEN}"
        },
        "HashCalculationStep": 150
      }
  4. Create a .env_grafana file in the docker/deployed/mainnet/grafana folder—an example is provided with .env_grafana.example.

  5. Add the following to .env_grafana:

GF_SECURITY_ADMIN_USER=<username you'd like to use to log into Grafana>
GF_SECURITY_ADMIN_PASSWORD=<password for that user>

Note: the GF_SERVER_ROOT_URL variable is optional and can be left empty. By default, Grafana is hosted locally at http://localhost:3000.

That’s it...your validator is now configured!

It's worthwhile to review the config.json file to see how the environment variables configured in the .env files inject these secrets into the validator configuration. Also, note how supporting more chains only requires adding an extra entry in the Chains section, so it's straightforward to add support for any of the supported testnets of each mainnet chain. However, adding a new mainnet chain that's not yet supported by the network is not possible, as it requires the core Tableland protocol to deploy a Registry smart contract to enable the new chain. This is done on a case-by-case basis, so please reach out to the Tableland team if you'd like support for a new mainnet chain.

3. Run the validator

To run the validator, move to the docker folder and run the following:

make mainnet-up

Some general comments and tips:

  • The first time you run this, it can take some time since you’ll have a cold cache regarding images and dependencies in Docker; subsequent runs will be quite fast.
  • You can inspect the general health of containers with docker ps.
  • You can tail the logs with docker logs docker-api-1 -f.
  • You can tear down the stack with make mainnet-down.

The default docker-compose setup has a baked-in observability substack with Prometheus and Grafana. You can learn more about this in the next section.

While the validator is syncing, you may see logs generated rather quickly, and the SQLite database at docker/deployed/mainnet/api/database.db should start to grow in size.

Docker Compose setup

The docker-compose setup can feel a bit magical, so in this section, we’ll explain the setup's folder structure and important considerations. Remember that you don’t need to understand this section to run a validator, but knowing how things work is highly recommended.

Architecture and port bindings

When you run make mainnet-up, you're running a stack of four containers (api, healthbot, grafana, and prometheus), which you'll see listed with docker ps.

There’re two main port binding groups:

  • :8080 and :443 map to the api container (the validator), depending on whether you have configured TLS in the validator.
  • :3000 maps to the grafana container to access the Grafana dashboard. Remember that if you want to access Grafana from the external world, you'll have to configure your firewall.

Regarding the containers:

  • api is the container running the validator.
  • healthbot is an optional container that runs an e2e daemon checking the healthiness of the full write-query transaction and event-execution flow. More about this in the Healthbot section.
  • grafana and prometheus are part of the observability stack, allowing a fully-featured Grafana dashboard that provides useful live information about the validator. There's more information about this in the Observability section.

Folder structure

The docker/deployed/mainnet folder contains one folder per process that it runs:

  • api folder: contains all the relevant secrets, configuration, and state of the validator.
    • config.json file: the full configuration file of the validator.
    • .env_validator file: contains secrets that are injected into the config.json file.
    • database.db* files: when you run the validator, you'll see these files, which are the SQLite database of the validator (running in WAL mode).
  • grafana and prometheus folders: contain any state from these daemons. For example, Grafana can include alerts or settings customizations, and Prometheus has the time-series database, so whenever you reset the container, it will keep historical data.
  • healthbot folder: contains secrets and configuration for the healthbot.

From an operational point of view, you usually don't have to touch these folders apart from api/config.json or api/.env_validator if you want to change something about the validator configuration or secrets. The Prometheus setup has a default 15-day retention time for the time-series data, so the database size should be automatically bounded.

Configuration files

The validator configuration is done via a JSON file located at deployed/mainnet/api/config.json.

This file contains general and chain-specific configuration, such as desired listening ports, gateway configuration, log level configuration, and chain-specific configuration, including name, chain ID, contract address, wallet private keys, and EVM node API endpoints.

The provided configurations in each deployed/<environment> folder already have everything needed for the environment, along with other recommended values. Environment variable expansion in config.json (for secrets and other attributes set in the .env_validator file) was explained in the secrets configuration section above. For example, the VALIDATOR_ALCHEMY_ETHEREUM_MAINNET_API_KEY variable configured in .env_validator expands the ${VALIDATOR_ALCHEMY_ETHEREUM_MAINNET_API_KEY} placeholder present in the config.json file. If you want to use a self-hosted Ethereum mainnet node API or another provider, you can edit the EthEndpoint attribute in the config.json file. The same logic applies to every other configuration in the validator.
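
For illustration only, here's a minimal sketch of how this kind of ${VAR} expansion can be done in Go with os.ExpandEnv; this isn't necessarily the exact mechanism the validator uses.

package main

import (
    "fmt"
    "os"
)

func main() {
    // Pretend this value came from .env_validator.
    os.Setenv("VALIDATOR_ALCHEMY_ETHEREUM_MAINNET_API_KEY", "my-api-key")

    // A fragment of config.json containing a ${VAR} placeholder.
    raw := `"EthEndpoint": "wss://eth-mainnet.g.alchemy.com/v2/${VALIDATOR_ALCHEMY_ETHEREUM_MAINNET_API_KEY}"`

    // os.ExpandEnv replaces ${VAR} (and $VAR) references with environment values.
    fmt.Println(os.ExpandEnv(raw))
}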

Observability stack

As mentioned earlier, the default docker-compose setup provides a fully configured observability stack by running Prometheus and Grafana.

This setup configures the Prometheus scrape endpoints to pull metrics from the validator, as well as the data sources and dashboard for Grafana. These automatically bound configuration files live in the docker/observability/(grafana|prometheus) folders. They are not part of the state of the processes; this is intentional so that, for example, the dashboard is part of the go-tableland repository and you'll get automatic dashboard upgrades as it is improved or extended.

After you spin up the validator, you can go to http://localhost:3000 and access the Grafana setup. Recall that you configured the credentials in the .env_grafana file in docker/deployed/mainnet/grafana.

If you browse the existing dashboards, you should see a Validator dashboard that aggregates all the metrics the validator generates.

Healthbot (optional)

The healthbot daemon is an optional feature of the docker-compose stack and is only needed if you support a testnet network; it's disabled by default.

The main goal of healthbot is to test, end to end, whether the validator is running correctly:

  • For every configured chain, it executes a write statement against the Tableland smart contract to increase a counter value in a pre-minted table owned by the validator.
  • It waits to see whether the increased counter was materialized in the target table, signaling that:
    • The transaction with the UPDATE statement was correctly sent to the chain.
    • The transaction was correctly mined in the target blockchain.
    • The event for that UPDATE was detected and processed by the validator.
    • A SELECT statement reading that table returns the increased counter.

In short, it tests most of the processing healthiness of the validator. For each of the target chains, you should mint a table with the following statement:

CREATE TABLE healthbot_{chainID} (counter INTEGER);

This would result in having four tables—one per chain:

  • healthbot_11155111_{tableID} (Ethereum Sepolia)
  • healthbot_11155420_{tableID} (Optimism Sepolia)
  • healthbot_421614_{tableID} (Arbitrum Sepolia)
  • healthbot_314159_{tableID} (Filecoin Calibration)

You should create a file .env_healthbot in the docker/deployed/testnet/healthbot folder with the following content (an example is provided with .env_healthbot.example):

HEALTHBOT_ETHEREUM_SEPOLIA_TABLE=healthbot_11155111_{tableID}
HEALTHBOT_OPTIMISM_SEPOLIA_TABLE=healthbot_11155420_{tableID}
HEALTHBOT_ARBITRUM_SEPOLIA_TABLE=healthbot_421614_{tableID}
HEALTHBOT_FILECOIN_CALIBRATION_TABLE=healthbot_314159_{tableID}

Finally, edit the Target attribute in the docker/deployed/testnet/healthbot/config.json file to the public DNS name where your validator is serving requests to the external world. This is the endpoint the healthbot will probe for healthiness. Since running the healthbot requires custom tables to be minted, it's disabled by default.

To enable the healthbot, run make testnet-up with the HEALTHBOT_ENABLED=true environment variable set:

HEALTHBOT_ENABLED=true make testnet-up

After a few minutes, you should see the HealthBot e2e check section of the Grafana dashboard populated.

Pruning docker images (optional)

Removing old docker images from time to time may be beneficial to avoid unnecessary disk usage. You can set up a cron rule to do that automatically. For example, you could do the following:

  1. Run crontab -e.
  2. Add the rule: 0 0 * * FRI /usr/bin/docker system prune --volumes -f  >> /home/validator/cronrun 2>&1

Backups and other routines

All validators are equipped with a backup scheduler: a background routine that executes a backup of the SQLite database file at a configurable, regular frequency. Besides the main backup of the database, the Backuper process executes a VACUUM on the backup file and compresses it with zstd.

How the backup process works

The backup process, called Backuper, takes a backup of the SQLite database file and stores it in a local directory relative to where the database is stored.

The process uses the SQLite Backup API provided by mattn/go-sqlite3. It is a full backup in a single step. Right now, the database is small enough not to worry about locking or how long the backup takes, but an incremental backup approach may be needed as the database grows in the future.
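
For illustration, here's a minimal sketch of a single-step full backup using the mattn/go-sqlite3 Backup API (Backup, Step, and Finish); the file paths are placeholders, and the real Backuper is more involved.

package main

import (
    "database/sql"
    "log"

    sqlite3 "github.com/mattn/go-sqlite3"
)

func main() {
    // Capture raw SQLite connections via a connect hook, since the backup
    // API operates on *sqlite3.SQLiteConn values.
    var conns []*sqlite3.SQLiteConn
    sql.Register("sqlite3_backup", &sqlite3.SQLiteDriver{
        ConnectHook: func(conn *sqlite3.SQLiteConn) error {
            conns = append(conns, conn)
            return nil
        },
    })

    // Placeholder paths for the live database and the backup target.
    src, err := sql.Open("sqlite3_backup", "database.db")
    if err != nil {
        log.Fatal(err)
    }
    defer src.Close()
    dst, err := sql.Open("sqlite3_backup", "tbl_backup.db")
    if err != nil {
        log.Fatal(err)
    }
    defer dst.Close()

    // Ping forces each pool to open a connection so the hook fires,
    // first for the source and then for the destination.
    if err := src.Ping(); err != nil {
        log.Fatal(err)
    }
    if err := dst.Ping(); err != nil {
        log.Fatal(err)
    }
    srcConn, dstConn := conns[0], conns[1]

    // The backup is driven from the destination connection; Step(-1)
    // copies every remaining page at once, i.e., a full backup in a single step.
    bk, err := dstConn.Backup("main", srcConn, "main")
    if err != nil {
        log.Fatal(err)
    }
    if _, err := bk.Step(-1); err != nil {
        log.Fatal(err)
    }
    if err := bk.Finish(); err != nil {
        log.Fatal(err)
    }
}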

How the scheduler works

The scheduler ticks at a regular interval defined by the Frequency config. Importantly, tick times are aligned to the Unix epoch: once the validator becomes operational and healthy after a deployment, it will start the backup routine at the next timestamp that is a multiple of Frequency relative to the epoch. This keeps backup files evenly distributed across timestamps.
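
As an illustration of the epoch alignment described above, here's a small sketch; the nextTick helper is hypothetical and not the validator's actual code.

package main

import (
    "fmt"
    "time"
)

// nextTick returns the next timestamp that is a multiple of freq relative to
// the Unix epoch, matching the alignment described above (illustrative only).
func nextTick(now time.Time, freq time.Duration) time.Time {
    elapsed := time.Duration(now.UnixNano())
    aligned := elapsed - (elapsed % freq) + freq
    return time.Unix(0, int64(aligned)).UTC()
}

func main() {
    freq := 240 * time.Minute // matches the default Frequency of 240 minutes
    fmt.Println("next backup at:", nextTick(time.Now(), freq))
}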

Vacuum

After the backup finishes, it executes the VACUUM SQL statement on the backup database to remove any unused rows and reduce the database file size. This process may take a while, but that's expected, since there shouldn't be any other connections to the backup database at this point.

Compression

After vacuuming, we shrink the database even further by compressing it with the zstd algorithm implemented by the compress library.
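
A minimal sketch of this step, assuming the zstd implementation from klauspost/compress (the exact library isn't named above); the file names are placeholders.

package main

import (
    "io"
    "log"
    "os"

    "github.com/klauspost/compress/zstd"
)

func main() {
    // Placeholder file names; real backups follow the tbl_backup_{{TIMESTAMP}} convention.
    in, err := os.Open("tbl_backup.db")
    if err != nil {
        log.Fatal(err)
    }
    defer in.Close()

    out, err := os.Create("tbl_backup.db.zst")
    if err != nil {
        log.Fatal(err)
    }
    defer out.Close()

    // Stream the vacuumed backup file through a zstd encoder.
    enc, err := zstd.NewWriter(out)
    if err != nil {
        log.Fatal(err)
    }
    if _, err := io.Copy(enc, in); err != nil {
        log.Fatal(err)
    }
    if err := enc.Close(); err != nil {
        log.Fatal(err)
    }
}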

Pruning

We don't keep all backup files around—at the end, we remove any files exceeding the backup's KeepFiles config, located in cmd/api/config.go. The default value is 5.

Filename convention

The backup files follow the pattern: tbl_backup_{{TIMESTAMP}}.db.zst. For example, it should resemble the following: tbl_backup_2022-08-25T20:00:00Z.db.zst.

Decompressing the file

If you're on Linux or Mac, you should have unzstd installed out of the box. For example, run unzstd tbl_backup_2022-08-25T20:00:00Z.db.zst (replace with your file name) to decompress the compressed database file.

Metrics

We collect the following metrics from the process through logs:

Timestamp              time.Time
ElapsedTime            time.Duration
VacuumElapsedTime      time.Duration
CompressionElapsedTime time.Duration
Size                   int64
SizeAfterVacuum        int64
SizeAfterCompression   int64

Additionally, we collect the metric tableland.backup.last_execution through Open Telemetry and Prometheus.

Configs

The backup configuration files are located in the docker/deployed/mainnet/api/config.json file. The following is the default configuration:

"Backup" : {
  "Enabled": true,       // enables the backup scheduler to execute backups
  "Dir": "backups",      // where backup files are stored relative to db
  "Frequency": 240,      // backup frequency in minutes
  "EnableVacuum": true,
  "EnableCompression": true,
  "Pruning" : {
    "Enabled": true,  // enables pruning
    "KeepFiles": 5    // pruning keeps at most `KeepFiles` backup files
  }
}

Development

Get started by following the validator setup steps described above. From there, you can make changes to the codebase and run the validator locally. For a validator stack against a local Hardhat network, you can run the following from the docker folder:

  • make local-up
  • make local-down

For a validator stack against deployed staging environments, you can run:

  • make staging-up
  • make staging-down

Configuration

Note that for deployed environments, there are two relevant configuration files in each docker/deployed/<environment> folder:

  • .env_validator: lets you set environment variables that fill in the validator's secrets and expand variables present in the config file (see the .env_validator.example example file).
  • config.json: the configuration file for the validator.

Besides that, you may want to configure Grafana's admin_user and admin_password. To do that, configure the .env_grafana file with the values of the expected keys shown in .env_grafana.example. This all should have been set up already but is worth noting.

Contributing

PRs accepted. Feel free to get in touch with the Tableland team on Discord or Twitter.

Small note: If editing the README, please conform to the standard-readme specification.

License

MIT AND Apache-2.0, © 2021-2024 Tableland Network Contributors

go-tableland's People

Contributors

andrewxhill, asutula, avichalp, brunocalza, carsonfarmer, datadanne, dependabot[bot], dtbuchholz, eightysteele, joewagner, jsign, sanderpick


go-tableland's Issues

Switch to use sqlc again

When we switched from Postgres to SQLite, we had to stop using the great library https://github.com/kyleconroy/sqlc, because at the time it didn't support SQLite. Since then, we've been evolving the code in a similar way with the same folder structure.

Some time ago, it was announced that SQLite is now supported, so we can (and I think should) start using this library again. It makes maintaining database access for a big part of the validator much easier and less error-prone.

See here for docs.

Separate testnet and mainnet networks

This came up in a chat with @andrewxhill. He pointed out that allowing JOINs between testnet chains and mainnet chains might set up some bad expectations.

Is there an easy way for us to ensure all tables involved in a query are either on a testnet or on a mainnet?

Move the analytics database and related infra to BigQuery

Following a suggestion by @brunocalza, we should have some ETL or agreed-upon process to move the analytics database information living in the validator's SQLite to BigQuery.

This is much nicer since:

  • Retool has a native interface to BigQuery, so we don't need to run postlite in the validator VM to act as a reverse proxy giving Postgres wire-protocol support for SQLite. It also potentially means removing the firewall rules that let Retool access the VM.
  • Analytics read load won't hit the validator, letting it prioritize syncing chains and serving read-queries.
  • We'd have easier access to analytics information for the team and community by directly accessing the BQ database for ad-hoc queries.

[GOT-60] Improve the performance of table state hash calculations

Currently, we calculate table state hashes every N blocks per chain. We do this to collect the hashes as metrics to detect state divergence between validators in the network.

There are a couple of improvements we can make to the current implementation:

  • We don't necessarily need to use sha1, since we don't need "cryptographic" guarantees for the hashing. We can use faster hashing functions.
  • On every state hash, we calculate the hash of all the tables in the chain. An improvement would be to only hash the tables that changed since the last state hash calculation. This might require changing what the "root" hash means (i.e., today it is a sha1(...) of a byte-stream representation of the whole state; it could instead be a hash of table hashes, vector- or tree-based).
  • Related to the last point, calculating the hash of a linear byte stream of the whole state doesn't allow parallelizing hash calculations. If we switch to a hash of hashes, we can parallelize work at the "tables" level.

The above are just quick ideas around the problem. This needs more exploration and thinking.
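
For illustration only, a minimal sketch of the hash-of-hashes idea: per-table hashes are computed in parallel and the sorted per-table digests are hashed into a root. It still uses sha1 just to keep the example small, and none of the names below come from the actual codebase.

package main

import (
    "crypto/sha1"
    "fmt"
    "sort"
    "sync"
)

// rootHash hashes each table's byte representation in parallel, then hashes
// the per-table digests (sorted by table name) into a single root.
// Purely illustrative; the real state hash is computed differently today.
func rootHash(tables map[string][]byte) []byte {
    type digest struct {
        name string
        sum  [sha1.Size]byte
    }
    digests := make([]digest, 0, len(tables))
    var mu sync.Mutex
    var wg sync.WaitGroup
    for name, data := range tables {
        wg.Add(1)
        go func(name string, data []byte) {
            defer wg.Done()
            sum := sha1.Sum(data) // could be swapped for a faster non-cryptographic hash
            mu.Lock()
            digests = append(digests, digest{name, sum})
            mu.Unlock()
        }(name, data)
    }
    wg.Wait()

    // Sort by table name so the root is deterministic regardless of goroutine order.
    sort.Slice(digests, func(i, j int) bool { return digests[i].name < digests[j].name })
    root := sha1.New()
    for _, d := range digests {
        root.Write(d.sum[:])
    }
    return root.Sum(nil)
}

func main() {
    tables := map[string][]byte{
        "foo_69_1": []byte("serialized table state ..."),
        "bar_69_2": []byte("other serialized table state ..."),
    }
    fmt.Printf("%x\n", rootHash(tables))
}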

Finally, we should consider another improvement for when a validator syncs from scratch. The way it works now, it will calculate the hashes every X blocks. It might make sense to pause table hash calculation until the validator syncs to the tip and only then enable the feature. This could let a validator sync from scratch much faster, since (for now) the state hash is only collected as a metric and has no other internal use in the protocol. Maybe this isn't needed if we make hash calculations much faster, as described earlier.

GOT-60

Misleading error message when creating a table with a not allowed prefix

The issue

When executing a CREATE TABLE statement with a prefix that is not allowed (e.g., sqlite_, registry, or system_), we get the error message query references a table name with the wrong format. That error message does not indicate the cause of the problem very well.

Replicate the issue by running:

curl --location --request POST 'https://testnet.tableland.network/rpc' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer '"$TBL_TOKEN"'' \
--data-raw '{
   "jsonrpc": "2.0",
    "method": "tableland_validateCreateTable",
    "id" : 1,
    "params": [{
        "create_statement": "create table registry_5 (id int primary key)"
    }]
}'

cc @dtbuchholz

[GOT-31] Explore if we can make an iterator-based read-query execution pipeline

This captures an idea I had a while ago. Currently, we store all the read-query results in memory and do the formatting afterward.

That might be convenient for formatting, but it also means we hold the entire read-query result in memory, which is very aggressive for the validator. Users could send queries with very big results, and that will directly impact memory usage.

We should explore whether we can leverage the iteration we already do with rows.Next() in the lowest layer and stream it upstream for formatting, so memory usage stays bounded.
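
For illustration, a minimal sketch of streaming rows to an encoder as they're scanned, so only one row is held in memory at a time; this is not the validator's actual formatting layer.

package main

import (
    "database/sql"
    "encoding/json"
    "io"
    "log"
    "os"

    _ "github.com/mattn/go-sqlite3"
)

// streamQuery encodes each row as soon as it is scanned instead of
// accumulating the full result set in memory first.
func streamQuery(w io.Writer, db *sql.DB, stmt string) error {
    rows, err := db.Query(stmt)
    if err != nil {
        return err
    }
    defer rows.Close()

    cols, err := rows.Columns()
    if err != nil {
        return err
    }
    vals := make([]any, len(cols))
    ptrs := make([]any, len(cols))
    for i := range vals {
        ptrs[i] = &vals[i]
    }

    enc := json.NewEncoder(w)
    for rows.Next() {
        if err := rows.Scan(ptrs...); err != nil {
            return err
        }
        // One row at a time: memory usage is bounded by a single row.
        if err := enc.Encode(vals); err != nil {
            return err
        }
    }
    return rows.Err()
}

func main() {
    db, err := sql.Open("sqlite3", ":memory:")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
    if _, err := db.Exec("CREATE TABLE t (id INTEGER, name TEXT)"); err != nil {
        log.Fatal(err)
    }
    if _, err := db.Exec("INSERT INTO t VALUES (1, 'a'), (2, 'b')"); err != nil {
        log.Fatal(err)
    }
    if err := streamQuery(os.Stdout, db, "SELECT * FROM t"); err != nil {
        log.Fatal(err)
    }
}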

cc @brunocalza @sanderpick @asutula

GOT-31

[GOT-55] Table validator controllers are not discoverable

Discussed in https://github.com/orgs/tablelandnetwork/discussions/228

Originally posted by sanderpick August 19, 2022
There's currently no way to list the EOA addresses that have been granted write permission to a table. This is an issue for a couple of reasons:

  • After using GRANT, it's natural to wonder if it worked.
    • The txn could succeed, but you may have had an error in the SQL statement.
    • Even if the SQL statement was validated, you might just want to sanity check by listing the current controllers.
  • It would be nice if dapps (or even the SDK) did some checking to ensure you have access to a table and don't waste funds.

Stepping back, we have two avenues to access control:

  1. Validator based access control with GRANT and REVOKE
  2. Smart contract based access control with Policys

If we had an API for getting validator based access control info, the SDK could combine that with a view of the current on-chain controller in a unified data model that could be presented to the user.

GOT-55

Deprecate Optimism Kovan from staging and testnet networks

On Wednesday 10/05 at 11 PST, Kovan will cease to exist; thus, Alchemy will stop providing APIs targeting this network.

This means that we should remove Optimism Kovan support from both the testnet and staging networks, which includes:

  1. Removing the network in the validators config, so we stop syncing the chain.
  2. Decide what to do with existing data in these chains.
  3. Communicate with users and validators about this event.

My first thoughts on this are that we could stop syncing the chain when Alchemy starts erroring or decide to do it some hours earlier. No strong opinion about this.

If necessary, we can promise to keep the existing data for two weeks for read-queries to keep working. Still, I'm not sure that would be entirely useful for users since the underlying chain is unusable.

We have a pending cold full-sync for testnet and staging validators because of fixes and spec changes made over the last months that affect the consistent state. That will probably happen later than two weeks from now, so we can simply wait until we do it for this data to disappear (or, more precisely, it won't be rebuilt in the full sync).

cc @brunocalza @sanderpick @carsonfarmer

Have mobile based alerting for the #grafana-alerting Discord channel

Since our Discord server has more than 1,000 users, Discord disables channel notifications on mobile devices.
This is pretty bad, since it is useful to know about any alerting in this channel as soon as possible.

We should configure additional contact points in Grafana Alerts, apart from Discord, to get mobile notifications.
It doesn't need to be PagerDuty; any option that can pipe messages as mobile notifications would work.

[GOT-56] feat: add support for randomness in SQL queries

Is your feature request related to a problem? Please describe.

Normal SQL supports a randomness function, which can be useful for both read and write queries. I'd like to leverage something similar to the RANDOM() function in Tableland to help increase the parity with native SQL.

Describe the solution you'd like

Create/enable support for a RANDOM([seed]) function that generates a random number. This could be used in a read or write query:

SELECT * FROM mytable ORDER BY random()
INSERT INTO mytable VALUES (random())

Ideally, the RANDOM keyword could be used for 1:1 SQLite parity to reduce potential friction, but this may not be possible and might require something like RANDOM_TBL(). Non-SQLite SQL dialects often use RAND, in case that's possible.

Describe alternatives you've considered

Using something like ChainLink VRF, which may still be useful but adds a significant cost for "more secure" rng guarantees.

Additional context

Discussion about this in Discord: here. For context, the SQLite random() and randomblob() both use pseudo-rng: here.

GOT-56

Consider re-organizing dependencies creation

I think our main function is reasonably clear, but we can probably do a better job organizing how the dependency tree gets created.
We don't need to use any dependency injection tool or magical indirection.

I'm referring to better encapsulation: making it more obvious that everything that should be gracefully closed is registered, and avoiding the cognitive load of figuring out how the modules fit together.

There's no rush to do this; I'm just capturing a feeling.

Support EVM specific functions for read-queries and write-queries

A few cases have come up where it would be nice to grab some available state from transactions:

  • TXN_HASH(): The validator would replace this text with the hash of the txn that delivered the event (only available to write queries).
  • BLOCK_NUM(): The validator would replace this text with the number of the block that delivered the event (only available to write queries).
  • BLOCK_NUM(<chain_id>): The validator would replace this text with the number of the last seen block (only available to read queries). For example, in Rigs we want to show how long a rig has been in-flight as a metadata attribute. We have start_time and end_time columns on a flight sessions table that contains block numbers. Each flight session results in a row in this table with a start time. If end_time is null, the rig is in-flight. So, to calculate total flight time including an active session, we'd use this method:
SELECT SUM(COALESCE(end_time, BLOCK_NUM(<chain_id>))) - SUM(start_time) 
FROM pilot_sessions_80001_3515 WHERE rig_id=11

BLOCK_TIME could be added in the same way as BLOCK_NUM, with a read and write flavor.

[GOT-34] Validate Query against Table structure

The Parser currently allows the user to validate a query in isolation, but I'd like to validate that a query will work for a given table.

I imagine it would work something like this:

const createTableStatement = "CREATE TABLE MyTable_1_1 (id integer, name text, message text);";
const insertStatement = "INSERT INTO MyTable_1_1 (id, person) VALUES (1, 'Allen');";
parser.validate(createTableStatement, insertStatement); // Returns a failure, since "person" isn't a valid column.

An alternative would be to simply instantiate an in-memory database with SQLite and attempt to run the query against it. For my purposes, I'd have to import sql.js or some other SQLite-in-the-browser solution into the application or SDK in which I want to validate the query. This adds a lot of overhead for this validation.

GOT-34

Add `type` to `runReadQuery` RPC response `columns` contained object

User Story

A simple read query returns the columns and rows for a particular table. However, for each column, it is not clear what the column's type is. Namely, if I create a column named id, there's no way to directly know what its type should be without some specific SQL knowledge.

As an example, I would have to run select sql from sqlite_schema where name='rest_api_5_85' (see query here), which returns a row of CREATE TABLE rest_api_5_85 (id INT) STRICT. Then, I'd have to parse that response etc. to get the actual type.

Requirements

Add type to each object returned in the runReadQuery RPC response's columns array:

{
  "columns": [
    {
      "name": "id",
      "type": "int"
    }
  ],
  "rows": [
    [
      0
    ]
  ]
}

Gather information about read-queries

Today we collect operational (Prometheus) metrics about read-queries, histogram latencies, and similar. We show this in our operational dashboard, which is quite useful.

But I feel we're missing something extra: visibility into outliers.
Once in a while, I see spikes in the 95th-percentile latency of the /query endpoint, usually the result of a user sending complex/big queries.

It would be useful to know:

  • Which are these queries?
  • Who is sending these queries, and whether it's worth taking appropriate action, such as "banning" per user/IP address.
  • Looking again at our timeouts for read-query execution to put some hard bounds.

Additionally, it might be interesting to simply collect all the read-queries (or do some sampling, but I don't think the volume is that big) so we can understand better how people are using the network to improve their experience or have better focus points. This can be an optional feature to enable in the validator since it has some storage overhead.

cc @sanderpick @brunocalza @carsonfarmer @dtbuchholz

[GOT-32] Reconsider the .json validator configuration file

The current config.json validator configuration file looks reasonable, but I believe it has some problems.

In the same configuration file, we have two big groups of configurations:

  • Protocol-level configurations: TableConstraints, QueryConstraints, SC addresses, etc.
  • Validator-level configurations: wallet addresses, private keys, port numbers, etc.

In my opinion, a problem arises when we make a protocol-level configuration change. If people use that config.json file with docker-compose and we edit any field, it might conflict with edits the node operator has made to the file. For example, it would feel very odd if the operator had edited the port number or any other validator-related configuration.

Moreover, the "Protocol level configuration" fields shouldn't be changed by validators, since they're part of the protocol. This makes the config.json file somewhat confusing.

If I'm a validator and I look at config.json, it isn't clear which fields I can change and which ones I must not change. I feel it would be better to have these two groups live in different places. For example, it might make sense to leave the protocol-level configurations hard-coded and migrated by code into a SQLite table rather than kept in the config file. (This is just an idea.)

We should reflect on this and see if we can do better. Open to suggestions.

GOT-32

[GOT-33] Test framework with SQL fuzzer to test out new spec language features

From a conversation on Discord... it would be pretty cool to have a testing framework where we could just test out adding features to our SQL language spec and run them with a fuzzer or something using a simulated network, to try to catch non-determinism.

Questions like, "what if we added sub-selects to write queries, does that lead to non-determinism?" could be answered in an open and testable framework, and could be run via CI etc.

GOT-33

Add animation_url to table NFT uri object

I'm working on an HTML app which will render a Table's contents, along with the query (which will be editable by the user).
The HTML app will be pretty simple: there will be an input box for a query, which can be edited but will be prepopulated with SELECT * FROM ${table_from_page_params} LIMIT 20, and that query will be automatically run, with the contents shown below the query box. The animation_url of a Table's NFT would point to this app. Eventually, I'm thinking it could be an IPFS pointer, but for now we can point to it the same way we point to the SVG at render.tableland.xyz.

For now, I'm thinking https://render.tableland.xyz/anim/?chain=1&id=1.

Send structure hash in response to tableland_createTable RPC endpoint

This is a feature request from the #creator-chat on Discord. I also started a ticket there.
The request is that the call to the SDK's create method return the structure of the table; currently, it only returns the queryable name.
The developer was also asking that the SDK expose a method that allows devs to get the structure for a given create statement. If we want to support that idea, I think it would require a new RPC endpoint, or we could duplicate the hashing function in the SDK. This would add complexity to the SDK and force us to maintain that hashing in two places. I tend to think setting up a new RPC endpoint is the better option, but since the hashing function will probably not change, it might be OK to rewrite it in the SDK. Thoughts?

If we do want to implement this, as a first draft suggestion I'd say we could do the following:

  1. calls to the tableland_createTable RPC endpoint can return JSON with the form {"name": "brave_joe_87", "structure": "ef7be01282ea97380e4d3bbcba6774cbc7242c46ee51b7e611f1efdfa3623e53"}
  2. A new RPC endpoint called tableland_structureHash is created and expects the params to have the form [{statement: "create table newtlb1 (a int);"}]. This will return JSON of the form {"structure": "ef7be01282ea97380e4d3bbcba6774cbc7242c46ee51b7e611f1efdfa3623e53"}

Again, all of this is more of a draft than a spec, and it would be great to hear if anyone has suggestions for a better implementation. Or, if there's a reason not to do it, I'm sure the person requesting the change will be amenable.

If we do start making these changes, I can open an issue/PR in the SDK.

The SIWE message `domain` field does not match spec

The SIWE spec defines the message domain field as an RFC 3986 URI. We are currently enforcing that the value be set to "Tableland", which is not an RFC 3986 compliant URI value.
After re-reading the SIWE spec, we should potentially ignore the value of domain and start checking that the value of the message uri field matches the validator config attribute holding the DNS where the gateway is served.
This change will be breaking for anyone using the JS SDK <= 3.1.x, and as suggested in tablelandnetwork/js-tableland#112 by @jsign, we should potentially accept both message formats for some period to allow clients to update their SDK versions gracefully.

Add support for a `tableland.network` gateway, redirecting to Tableland testnet or mainnet

Is your feature request related to a problem? Please describe.
Register & use a tableland.network domain that handles redirects to the Tableland testnet or mainnet gateways via inference.

Describe the solution you'd like (via @sanderpick)
Ultimately, we want this:

  • Tableland testnet only deals with evm testnets
  • Tableland mainnet only deals with evm mainnets

This means that chainID actually infers the Tableland network. So, instead of standing up some new gateway like mainnet.tableland.network and requiring devs to make changes to their contracts, we can just alias testnet.tableland.network with tableland.network and create some redirect rules:

  • If chainID is an evm testnet, permanent redirect to testnet.tableland.network if need be
  • If chainID is an evm mainnet, permanent redirect to tableland.network if need be

Describe alternatives you've considered
As noted, a dedicated mainnet.tableland.network was considered, but the solution above makes any future migration much more seamless.

Additional context
This reduces the need to update an ERC721 contract's tokenURI (or other methods) that reference the Tableland gateway. It alleviates some friction points, especially, if a contract was not upgradable and couldn't change the URI itself.

[GOT-66] Create stress tests to understand how read-query load affects syncing time

We should try to understand better how read-query load affects background work regarding syncing chains.

Technically speaking, we use WAL mode in SQLite, so readers shouldn't block writers, compared to the default journal mode where this isn't the case. Using WAL mode was done intentionally exactly for this reason.

This can help us understand better the limits of the validator in particular VM setups (e.g: the one we use in the cloud).

In any case, reads are easy to scale since they can be executed on "replica" nodes without interfering with writes. That's a more complex setup, since it requires SQLite replication (we can use excellent existing projects) and other processes running, but it is totally doable. Still, it would be good to understand that limit better and know whether we're getting close to needing to work on it.

GOT-66

Allow a fresh validator to sync from a snapshot URL

Our cloud validator is creating snapshots of its database with some frequency.
This is useful for many things, but in particular, it can be the basis to allow a validator to sync from a trusted/verified point in time forward.

Some obvious use-cases for this:

  • Spinning up a second cloud validator quickly.
  • Allowing external validators to have a synced validator in less time and at lower cost than syncing from scratch.
  • Having a local validator up and synced as fast as possible for local debugging.

I'm pretty sure we need to do some extra work in the backup databases since they contain data in some tables that shouldn't be there. For example, we still copy the relayed transactions table, etc.

As part of uploading the database backups to some cloud storage to have a public URL, we can add a prior step that cleans up information that shouldn't be there. But we could also fine-tune the backup process to generate this database directly.

Open to comments; this might require some more dedicated scoping.

Investigate WAL checkpointing problem

We've detected that the WAL size of the validator seems to be growing without bounds.

This might be caused by checkpoint starvation, as described in the SQLite WAL documentation. After a quick look some time ago, we didn't find any obvious reason why this would be the case. We had even set all the database pools to SetMaxIdleConns(0) to make sure idle connections weren't kept open and blocking checkpoints.

Unfortunately, being so aggressive with idle connections has a considerable performance cost for all the validators. The hit isn't noticeable today given our current throughput needs, but considering it hasn't solved the checkpointing problem, it should go away.

We need to dedicate some extra time to investigate what might be happening regarding checkpoint starvation and find the ultimate cause. If we can't find it, we should create a background process in the validator that forces checkpointing via a PRAGMA.
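
For illustration, a minimal sketch of such a background process, forcing a TRUNCATE checkpoint via PRAGMA wal_checkpoint on a fixed interval; the interval and database path are placeholders.

package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/mattn/go-sqlite3"
)

// forceCheckpoints periodically runs a TRUNCATE checkpoint so the WAL file
// cannot grow without bounds even if passive checkpoints are starved.
func forceCheckpoints(db *sql.DB, every time.Duration) {
    ticker := time.NewTicker(every)
    defer ticker.Stop()
    for range ticker.C {
        if _, err := db.Exec("PRAGMA wal_checkpoint(TRUNCATE);"); err != nil {
            log.Printf("forcing WAL checkpoint: %v", err)
        }
    }
}

func main() {
    // Placeholder database path and interval.
    db, err := sql.Open("sqlite3", "database.db")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
    go forceCheckpoints(db, 10*time.Minute)
    select {} // the real validator would keep doing its normal work here
}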

[GOT-58] Add query transformation capabilities (e.g., render SVG from a query)

User Story

Tableland is hyper-focused on enabling composable, dynamic NFT experiences. However, every dynamic NFT requires some sort of rendering component. This means web2 infra is required, which detracts from the value prop of dNFTs in the first place. If possible, it'd provide a major value add if Tableland could act as a permissionless, serverless relational database plus renderer.

Requirements

Enable query transformations such that a developer can call a Tableland endpoint and receive an SVG as a response (or a more generic implementation vs. only enabling SVGs).

Additional Context

The future internet + UIs will be made up of ERC721-based objects, so we should enable that functionality the best we can. Original TPD here.

Note: As this has been a commonly asked request, please use this issue to log +1s upon hearing the request. And for any devs in the community, please voice your opinion and add any context, as desired.

GOT-58

REST endpoint doesn't return any tables for a controller's address

User Story

When I call the /chain/{chainId}/tables/controller/{address} endpoint, I expect a response of an array of objects where each object represents a table controlled at the path param address. However, when calling this endpoint using a smart contract address that implements TablelandController and has been set as the controller for a table, the results are empty.

Steps to Reproduce

The following was performed on Goerli (chainId: 5)

  1. Create a table
  2. Deploy a smart contract that implements TablelandController (allow all example from the evm-tableland repo)
  3. Call setController on the TablelandTables deployed smart contract with the address from (2) and the table from (1)
  4. Insert data into the table with a random wallet, which should be possible due to the allow-all contract (this works as expected)
  5. Call the REST API at endpoint /chain/{chainId}/tables/controller/{address} -- here, the response is empty, i.e., []
  6. Validate the controller address is the controller by calling getController on the registry smart contract with the tableId from (1), which should match the address from (2)

Here's an example that does the steps above:

  1. See table rest_api_5_85 (created at tx: 0xf8da1143d65fde76c3a175685810146ea32d6695af76cf1d77758e879abaa321)
  2. See the deployed controller contract at address 0x627A289FDfFA63b662FC52FE8b30b479B8AED2a8, which simply implements the provided allow all contract in the evm-tableland repo (link above in step 2)
  3. Calling setController with the address above on the Tableland registry contract for Goerli -- tx hash: 0x83f1f3204cc8df3b4e27789579bbf485af566cbaa1a26bc2cf8f2c1556989c69
  4. Insert data with a random account, successfully -- tx hash: 0x880b05171f22930f55c5099e7115c434f6f299abf982b2258c1bf77f6838364a
  5. Make a GET request at the address in (2), which incorrectly returns nothing (i.e., this is the controller of rest_api_5_85 but the API says otherwise and returns []): curl 'https://testnet.tableland.network/chain/5/tables/controller/0x627A289FDfFA63b662FC52FE8b30b479B8AED2a8'
  6. Lastly, check the registry smart contract using getController, passing 85, which does return 0x627A289FDfFA63b662FC52FE8b30b479B8AED2a8 as the table's controller -- validating that (5) does not work as expected

Create a multi-versioned execution pipeline

This is a big one that probably deserves a project.

Our current execution pipeline isn't multi-versioned, meaning that any new feature or code change would affect historic events that happened in the network. This is fine for now, since we're just starting with the protocol definition.

At some point, we should start including new functionality in the protocol spec that activates at a particular block number in each chain. If we don't do this, by creating new features we would be "rewriting history" and potentially changing the table state for users who shouldn't expect that to happen.

To do this, we need to do some refactoring of the execution pipeline to have multiple versioned Executor (and downstream) "packages" that are "plugged in"/"switched" depending on the block number we're at. At activation time, this might also require adding functionality to run migrations in the database.

For example, if we announce that at block X on Mumbai, Tableland will run a database migration on all tables defined by some rule in the spec, we need to support that in the validator. Another example is announcing that a new SQL statement will be supported starting at block X on Goerli (and not before, if you cold-sync a validator).
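
For illustration only, a small sketch of the block-number-based switching idea; the Executor interface, activation heights, and names here are hypothetical and not the actual design.

package main

import "fmt"

// Executor is a stand-in for the validator's execution pipeline interface.
type Executor interface {
    Execute(statement string) error
}

// activation ties an Executor version to the block number at which it
// becomes active; activations must be sorted by ascending fromBlock.
type activation struct {
    fromBlock int64
    executor  Executor
}

// executorFor returns the newest Executor whose activation block is not
// greater than the given block number.
func executorFor(activations []activation, block int64) Executor {
    selected := activations[0].executor
    for _, a := range activations {
        if block >= a.fromBlock {
            selected = a.executor
        }
    }
    return selected
}

type printExecutor struct{ version string }

func (e printExecutor) Execute(statement string) error {
    fmt.Printf("executing %q with pipeline %s\n", statement, e.version)
    return nil
}

func main() {
    activations := []activation{
        {fromBlock: 0, executor: printExecutor{"v1"}},
        {fromBlock: 1_000_000, executor: printExecutor{"v2"}}, // hypothetical activation height
    }
    _ = executorFor(activations, 999_999).Execute("INSERT INTO foo_69_1 VALUES (1)")
    _ = executorFor(activations, 1_000_000).Execute("INSERT INTO foo_69_1 VALUES (1)")
}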

Hopefully, this makes sense, but let me know if the idea isn't clear. Of course, we need to dedicate time to design and scope the work to do since this will take some work.

[GOT-62] Create benchmarks to measure validator's performance

Standing on the shoulders of #299, we can create benchmarks to better understand:

  • How fast we can sync from scratch using production data.
  • Which are the non-network-related bottlenecks in the execution pipeline.
  • Build a table of average execution latency (and potential outliers) per event type.
  • Measure the performance for both in-memory and file-based SQLite databases.
  • Do some plots to compress all the information and understand it more quickly.

The above are just some ideas.

The main point is that we can have a baseline to:

  • Understand how the validator behaves today to see if there's any pressing bottleneck to tackle. My experience with #299 is that there isn't any Go-related code that needs to be optimized, but it was a quick exploration.
  • Having averages and known latencies per event type allows us to understand how future features and changes affect the performance of the validator. Without a baseline, it's hard to see if we're introducing important performance regressions.
  • My feeling today is that the main improvements can come from improving file-based SQLite performance, mostly by exploring other drivers (at least for write-queries, i.e., Executors). But without measurements, we can't guide our decision.

At some point, I'd be interested in working on this.

It may be the case that this deserves its own project. For now, capturing it with some ideas in this issue.

GOT-62

Design a process to publish, announce and document new releases

Currently, we don't have any process in the validator regarding:

  • Tagging new versions.
  • Generating publicly usable infra assets (docker images, binaries, etc.).
  • A changelog pipeline to explain what was included in each version.
  • Potential release notes that might need to explain to validators changes in their config or particular things to watch out for (e.g., migrations).
  • Keep improving and updating validator documentation. (Today, we only have a Notion doc, but we'll probably need a section in our docs).
  • What merging to a branch means (e.g., merging to main).

When we start thinking about this, I feel it may be better to transform this GH issue into a full project since it has many angles to cover and requires scoping/discussion. I'm creating the GH issue to capture the need.

Add `tableland_relayCreateTable` to testnet rpc calls (and, optionally, REST API on top of it)

User Story:

I'm a developer that only uses the Tableland JSON RPC API (alternatively, only the REST API). I do not use the Tableland CLI. On testnet, I can write to a table using raw JSON RPC calls, but I cannot create a table. I have to first download the CLI, get a SIWE token, and run tableland create, then use that SIWE token in my RPC calls along with the table that was created from the CLI.

Requirements:

  • Add tableland_relayCreateTable to the testnet RPCs (to match a similar flow with tableland_relayWriteQuery), taking params owner (string) and statement (string)
  • (Optional) If possible, the SIWE generation should be made available from an RPC (not sure if this is feasible) so that a dev does not have to install the CLI to access these calls. Maybe an oversimplification so ignore if not possible/ideal.
  • (Optional) Every RPC should have a REST API on top of it (where it makes sense) to allow for a developer that uses purely REST APIs to get the same functionality. This is a "nice to have" but the focus is on the RPC first; scope creep but a consideration.

Add a PR template

It would be useful to add a PR template, so we always have a suggested structure for PRs.

I think we've already experimented with some ideas in the latest PRs, so we can be inspired by those plus get ideas from other repos.

Use snake case for `errorEventIdx` in `tableland_getReceipt`

Description

A small nit -- I'm not sure if there is a reason why camelCase was used for errorEventIdx (part of the tableland_getReceipt response); if so, nbd. But, for consistency with the other RPC fields (which all use snake case), errorEventIdx should instead be error_event_idx.

[GOT-64] Avoid returning 404 if read-query statement returns 0 rows

Some days ago, I was looking at error level logs in the validator, and I noticed a couple of "Rows not found" messages.

These came from /query API calls with read-queries that return 0 rows. It isn't reasonable to log an error-level message in this case, since it's fine for users to make read-queries with empty results, so the log line was removed to avoid noise in the logs.

Apart from that, while chatting with @brunocalza about this case, we had the sensation that the API shouldn't be returning StatusNotFound either (see here).

If we receive a read-query that returns an empty result, it seems OK to return a 200 status code with an empty result. Technically speaking, the read-query executed fine, and the correct result is empty. Checking whether the result is empty may even be the intention of the read-query; in that case, getting a 404 status code seems odd.

We don't remember if this was done on purpose, so maybe it has a good reason. Also, this is a breaking change, since some people might be relying on the 404 behavior today.

We're surfacing this situation here for further discussion, since some other API changes are coming and it's worth bundling this with them.

cc @sanderpick @carsonfarmer @joewagner @awmuncy @dtbuchholz

GOT-64
