paritytech / helm-charts

Parity Helm charts collection

License: GNU General Public License v3.0

Smarty 63.50% Mustache 6.35% Shell 3.29% Makefile 1.36% Dockerfile 0.92% Go 24.59%

helm-charts's People

Contributors

alvicsam, andreieres, arshamteymouri, asiniscalchi, bakhtin, bulatsaif, ccubu, dblane-digicatapult, dependabot[bot], fevo1971, ironoa, kogeler, lazam, michalziobro, mutantcornholio, okalenyk, parutger, pierrebesson, radupopa2010, sudo-whodo


helm-charts's Issues

Create umbrella charts for "polkadot-stack" and "polkadot-parachain-stack"

We should create dedicated charts to provide standardized deployment of polkadot and polkadot-parachain (cumulus). Those would be wrappers around the generic "node" helm-chart with values correctly set for their respective latest image tag as well as more comprehensive docs on how to correctly configure them.

  • For simplicity, the chart version could be equal to the polkadot/cumulus release version.
  • We need to find a way to semi-automate version upgrades.
  • This is an opportunity to add end-to-end chart tests to validate that the node can connect to the network.

Update:
As agreed in this comment, this will now be two charts: polkadot-stack and polkadot-parachain-stack.

Docker images in use, to be wrapped with the umbrella chart:

Allow fixing the p2p IP

Automatic node port discovery was introduced in #28; however, in some cases the operator will not want, or be able, to open a large range of ports (30000-32767) on their Kubernetes nodes.

An option should be added to fix the attributed nodePort; however, in this case it might be impossible to support more than one replica for the StatefulSet, as fixing the port would result in a port conflict for the second replica.

Similarly, it should be possible to deploy a node which uses a fixed p2p IP by setting loadBalancerIP for LoadBalancer services, using a pre-reserved IP at the cloud provider (e.g. for GCP).
However, in this case the p2p service would no longer be of type NodePort but of type LoadBalancer.
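For illustration, a minimal sketch of such a p2p Service (name, selector and IP are placeholder values, not the chart's actual output):

apiVersion: v1
kind: Service
metadata:
  name: mynode-p2p
spec:
  type: LoadBalancer
  loadBalancerIP: 35.200.100.10   # pre-reserved static IP at the cloud provider
  ports:
    - name: p2p
      port: 30333
      targetPort: 30333
      protocol: TCP
  selector:
    app.kubernetes.io/instance: mynode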

Allow annotating created p2p services

It would be useful to be able to independently annotate the auto-created p2p Services.
This would enable using kubernetes-sigs/external-dns to manage Bootnodes / RPC nodes DNS entries.

For example:

apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/load-balancer-type: Internal
    external-dns.alpha.kubernetes.io/hostname: bootnode.testnet.parity.io.

Implement PodDisruptionBudget in node helm-chart

PodDisruptionBudget support would be very useful for staying resilient to Kubernetes node pool upgrades.

It should be configured as such:

node:
  disruptionBudget:
    # only one of minAvailable and maxUnavailable can be set
    minAvailable:
    maxUnavailable: 

The template logic must check that minAvailable and maxUnavailable are not set at the same time.
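A possible shape for that template logic, assuming the values layout above (a sketch only, not the final template):

{{- if and .Values.node.disruptionBudget.minAvailable .Values.node.disruptionBudget.maxUnavailable }}
{{- fail "node.disruptionBudget: set only one of minAvailable and maxUnavailable" }}
{{- end }}
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: {{ .Release.Name }}
spec:
  {{- with .Values.node.disruptionBudget.minAvailable }}
  minAvailable: {{ . }}
  {{- end }}
  {{- with .Values.node.disruptionBudget.maxUnavailable }}
  maxUnavailable: {{ . }}
  {{- end }}
  selector:
    matchLabels:
      app.kubernetes.io/instance: {{ .Release.Name }}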

Use static version instead of `tag: latest` for the node helm chart

Currently, when a new polkadot release is available, people have to change their values files to point to the new version.
I think the best approach is to release a new version of the node helm chart for each polkadot release and use the chart's appVersion as the default image tag. People who use our chart for their deployments could then update their nodes simply by bumping the chart version, instead of editing the values.yaml file manually or via a pipeline.
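For example, the image tag could default to the chart's appVersion in the StatefulSet template (value names here are illustrative, not necessarily the chart's current ones):

image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"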

Allow mounting a Kubernetes emptyDir at the keystore path

When inserting keys into a substrate node, they end up in the /data/.../chains/chain_name/keystore folder, which in our setup is stored in the "data" Kubernetes volume. In a secure setup we don't want this data volume to contain our private keys, so we should mount a tmpfs (i.e. a Kubernetes emptyDir volume) on this path.
This will prevent keys from being persisted on the data disk after having been sourced from HashiCorp Vault or other secure places.
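A rough sketch of what this could look like in the pod spec (the exact keystore path depends on the chain and is shown here only as a placeholder):

volumes:
  - name: keystore
    emptyDir:
      medium: Memory            # tmpfs, never written to the data disk
containers:
  - name: node
    volumeMounts:
      - name: keystore
        mountPath: /data/chains/<chain-name>/keystore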

[node] Bug: empty vault key injection

When we use .extraDerivation in .Values.node.vault.keys, the node helm chart will still inject the vault key, even if the vault agent failed to mount it:

cat: /vault/secrets/name: No such file or directory
Inserted key aura (type=aura, scheme=sr25519) into Keystore

The key insert command will not fail, since it can derive from a well-known key (for example //Alice, //extraDerivation).

We need to add a check in the inject-vault-keys init container: if the file does not exist, the init container should fail.
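A rough sketch of the intended check at the top of the init container script (the secret path and container details are placeholders):

initContainers:
  - name: inject-vault-keys
    image: parity/polkadot:latest        # placeholder image
    command: ["/bin/sh", "-c"]
    args:
      - |
        # fail fast if the vault agent did not mount the key file
        if [ ! -s /vault/secrets/name ]; then
          echo "vault key file /vault/secrets/name is missing or empty" >&2
          exit 1
        fi
        # ... key insertion continues only when the file exists ...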

Add an option to run the polkadot-introspector kvdb exporter as a sidecar

The polkadot-introspector kvdb tool can be used to monitor the database continuously. We should add support for running this exporter as a sidecar in the node helm chart.

We have successfully set it up with this configuration, but it feels like a lot of boilerplate for people who would like to set this up.


extraContainers:
  - name: relaychain-kvdb-introspector
    image: paritytech/polkadot-introspector:438d3406
    command: [
      "polkadot-introspector",
      "kvdb",
      "--db",
      "/data/chains/versi_v1_9/db/full",
      "--db-type",
      "rocksdb",
      "prometheus",
      "--port",
      "9620"
    ]
    resources:
      limits:
        memory: "1Gi"
    ports:
      - containerPort: 9620
        name: relay-kvdb-prom
    volumeMounts:
      - mountPath: /data
        name: chain-data
  - name: parachain-kvdb-introspector
    image: paritytech/polkadot-introspector:438d3406
    command: [
      "polkadot-introspector",
      "kvdb",
      "--db",
      "/data/chains/versi_v1_9/db/full/parachains/db",
      "--db-type",
      "rocksdb",
      "prometheus",
      "--port",
      "9621"
    ]
    resources:
      limits:
        memory: "1Gi"
    ports:
      - containerPort: 9621
        name: para-kvdb-prom
    volumeMounts:
      - mountPath: /data
        name: chain-data

Note there should be an option to run one or two sidecars: one to monitor the main DB and one for the parachain DB (as an option for relay chains).
We also need to create the appropriate ServiceMonitor for loading data in Prometheus.
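A minimal ServiceMonitor sketch for the two ports above (the name and selector are assumptions about how the chart labels its services):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: node-kvdb-introspector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: mynode
  endpoints:
    - port: relay-kvdb-prom
      interval: 30s
    - port: para-kvdb-prom
      interval: 30s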

Missing backslash escaping causes flags to be ignored

The following issue has been reported to me by @kogeler :

Containers:
  kusama:
    Container ID:  containerd://8a9995292c78499072b0abaf828c67bfbf27024548bf144cb9e2dd511c5d7eb1
    Image:         parity/polkadot:v0.9.16
    Image ID:      docker.io/parity/polkadot@sha256:46ec2899a865ff7640ea3eaaf7306ecfc3128609fc40d7fe587f486c7ff9eba9
    Ports:         9933/TCP, 9944/TCP, 9615/TCP, 30333/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /bin/sh
    Args:
      -c
      RELAY_CHAIN_P2P_PORT="$(cat /data/relay_chain_p2p_port)"
      echo "RELAY_CHAIN_P2P_PORT=${RELAY_CHAIN_P2P_PORT}"
      exec polkadot \
        --name=${POD_NAME} \
        --base-path=/data/ \
        --chain=${CHAIN} \
        --pruning=archive --rpc-external --ws-external --rpc-methods=safe --rpc-cors=all --prometheus-external --telemetry-url='wss://submit.telemetry.parity-stg.parity.io/submit/ 1'
        --listen-addr=/ip4/0.0.0.0/tcp/${RELAY_CHAIN_P2P_PORT} \
        --listen-addr=/ip4/0.0.0.0/tcp/30333 \
The newline is not escaped after the flags' unwrapping (https://github.com/paritytech/helm-charts/blob/main/charts/node/templates/statefulset.yaml#L283).

If you check the pod you can see:

polkadot@kusama-public-sidecar-node-0:/$ ps -p 1 -o args
COMMAND
polkadot --name=kusama-public-sidecar-node-0 --base-path=/data/ --chain=kusama --pruning=archive --rpc-external --ws-external --rpc-methods=safe --rpc-cors=all --prometheus-external --telemetry-url=wss://submit.telemetry.parity-stg.parity.io/submit/ 1
The --listen-addr flags are, in fact, absent.

End-to-end tests for the node Helm chart

Create end-to-end tests that would cover some of the most common scenarios for deploying a node with the Helm chart: a full node, an RPC node, a bootnode, a validator, and a collator.

The tests should be included in the CI pipeline. All the tests should pass before a PR can be merged.

Implement S3 support for backups

This could, for example, be achieved with one tool for both clouds: s5cmd, which potentially also downloads backups faster than gsutil. It would be interesting to do some tests.
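As an untested sketch, a restore init container using s5cmd could look roughly like this (the image tag, bucket and the GCS interoperability endpoint are assumptions to be verified):

initContainers:
  - name: restore-backup
    image: peakcom/s5cmd:v2.2.2          # placeholder tag
    command: ["/s5cmd"]
    # for GCS, the S3-compatible endpoint https://storage.googleapis.com with HMAC
    # credentials could be passed via --endpoint-url instead
    args: ["cp", "s3://my-backups/kusama/*", "/data/"]
    volumeMounts:
      - mountPath: /data
        name: chain-data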

Add node pruning option

We should be able to set the pruning option (pruned/archive) in the chart values to set the --pruning flag and add a useful pod/service label.
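For example, a values-level switch along these lines (the value name is hypothetical):

node:
  pruning: archive          # or a block count for a pruned node; rendered as --pruning=<value>
                            # and also attached as a pod/service label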

Revamp node helm-chart YAML templates

The node Helm chart templates are missing some of the configurable parameters available in Kubernetes (like loading environment variables from a ConfigMap using envFrom).
It is time-consuming (though rewarding at larger scale) to maintain our own standardized library of Helm templates, so instead I reused the great template library from Bitnami during the development of the staking-miner Helm chart.

The node Helm chart templates should be refactored in the same way to be consistent with the available Kubernetes features.

Use helm-readme-generator for documenting helm chart values

The Bitnami Helm Readme Generator is very useful for maintaining up-to-date README files. We should generalize the use of the tool:

  • All charts' values.yaml comments need to be rewritten to follow the proper syntax (see the sketch after this list)
  • Apply readme-generator on all charts and document how to use it in CONTRIBUTING.md
  • Add a CI job that validates that the readme-generator has been correctly applied on PRs and blocks the merge if it is not the case
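For reference, a sketch of the comment format the readme-generator expects (the parameter names here are only illustrative):

## @section Node parameters

## @param node.chain Chain spec to use
## @param node.image.repository Node image repository
## @param node.image.tag Node image tag (defaults to the chart appVersion)
node:
  chain: polkadot
  image:
    repository: parity/polkadot
    tag: ""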

helm-charts/charts/substrate-faucet/

Hello @PierreBesson and thank you for creating this helm chart.

I am trying to use it to deploy a faucet for Picasso Rococo

Question: How can I test that the deployment is working, and how do I use it? Via the Matrix bot?

Steps I took before installing the helm chart:

  1. create a matrix bot account composablefi_faucet
  2. create a matrix access token for the bot

These are the values I used (secret data removed):

helm install substrate-faucet parity/substrate-faucet \
    --set faucet.secret.SMF_BACKEND_FAUCET_ACCOUNT_MNEMONIC="removed" \
    --set faucet.secret.SMF_BOT_MATRIX_ACCESS_TOKEN="removed" \
    --set faucet.config.SMF_BACKEND_RPC_ENDPOINT="https://picasso-rococo-rpc-lb.composablenodes.tech/" \
    --set faucet.config.SMF_BACKEND_INJECTED_TYPES='{}' \
    --set faucet.config.SMF_BACKEND_NETWORK_DECIMALS='12' \
    --set faucet.config.SMF_BOT_MATRIX_SERVER="https://matrix.org" \
    --set faucet.config.SMF_BOT_MATRIX_BOT_USER_ID="@composablefi_faucet:matrix.org" \
    --set faucet.config.SMF_BOT_NETWORK_UNIT="PICA" \
    --set faucet.config.SMF_BOT_DRIP_AMOUNT="1"

Testing the endpoint picasso-rpc-lb.composablenodes.tech seems to be fine

curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "rpc_methods"}'  https://westend-rpc.polkadot.io
{"jsonrpc":"2.0","result":{"methods":["account_nextIndex","author_hasKey","author_hasSessionKeys","author_insertKey
... (content removed)


echo '{"id":1, "jsonrpc":"2.0", "method": "rpc_methods"}' | websocat wss://picasso-rpc-lb.composablenodes.tech
{"jsonrpc":"2.0","result":{"methods":["account_nextIndex","assets_balanceOf","assets_listAssets","author_hasKey","author_hasSessionKeys","author_insertKey","author_pendingExtrinsics","author_removeExtrinsic","author_rotateKeys","author_submitAndWatchExtrinsic","author_sub
...  (content removed)

By the way https://rococo-rpc.polkadot.io seems to be down

curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "rpc_methods"}'  https://rococo-rpc.polkadot.io
Service Unavailable

echo '{"id":1, "jsonrpc":"2.0", "method": "rpc_methods"}' | websocat wss://rococo-rpc.polkadot.io
websocat: WebSocketError: WebSocketError: Received unexpected status code (503 Service Unavailable)
websocat: error running

while https://westend-rpc.polkadot.io/ works

curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "rpc_methods"}'  https://westend-rpc.polkadot.io
{"jsonrpc":"2.0","result":{"methods":["account_nextIndex","author_hasKey","author_hasSessionKeys","author_insertKey","author_pendingExtrinsics","author_removeExtrinsic","author_rotateKeys","author_submitAndWatchExtrinsic","author_submitExtrinsic","author_unwatchExtrinsi
... (content removed)

echo '{"id":1, "jsonrpc":"2.0", "method": "rpc_methods"}' | websocat wss://westend-rpc.polkadot.io
{"jsonrpc":"2.0","result":{"methods":["account_nextIndex","author_hasKey","author_hasSessionKeys","author_insertKey
... (content removed)

Here is what I have in the logs

kubectl logs substrate-faucet-85b7c64b7f-vcqsn
yarn run v1.22.5
$ node ./build/src/start.js
2023-03-30 12:46:26        API/INIT: Api will be available in a limited mode since the provider does not support subscriptions
[2023-03-30T12:46:26.928] [INFO] default - 🚰 Plip plop - Creating the faucets's account
[2023-03-30T12:46:26.929] [INFO] default - Ignore list: (1 entries)
[2023-03-30T12:46:26.929] [INFO] default -  ''
SMF:
  📦 BOT:
     ✅ BACKEND_URL: "http://localhost:5555"
     ✅ DRIP_AMOUNT: 1
     ✅ MATRIX_ACCESS_TOKEN: *****
     ✅ MATRIX_BOT_USER_ID: "@composablefi_faucet:matrix.org"
     ✅ MATRIX_SERVER: "https://matrix.org"
     ✅ NETWORK_DECIMALS: 12
     ✅ NETWORK_UNIT: "PICA"
     ✅ FAUCET_IGNORE_LIST: ""
     ✅ DEPLOYED_REF: "unset"
     ✅ DEPLOYED_TIME: "unset"
[2023-03-30T12:46:26.965] [INFO] default - ✅ BOT config validated
SMF:
  📦 BACKEND:
     ✅ FAUCET_ACCOUNT_MNEMONIC: *****
     ✅ FAUCET_BALANCE_CAP: 100
     ✅ INJECTED_TYPES: "[]"
     ✅ NETWORK_DECIMALS: 12
     ✅ PORT: 5555
     ✅ RPC_ENDPOINT: "https://picasso-rococo-rpc-lb.composablenodes.tech/"
     ✅ DEPLOYED_REF: "paritytech/faucet:latest"
     ✅ DEPLOYED_TIME: "2023-03-30T15:45:28"
     ✅ EXTERNAL_ACCESS: false
     ✅ DRIP_AMOUNT: "0.5"
     ✅ RECAPTCHA_SECRET: *****
[2023-03-30T12:46:26.979] [INFO] default - ✅ BACKEND config validated
[2023-03-30T12:46:26.995] [INFO] default - Starting faucet v1.1.2
[2023-03-30T12:46:26.995] [INFO] default - Faucet backend listening on port 5555.
[2023-03-30T12:46:26.995] [INFO] default - Using @polkadot/api 10.0.1
Connected to the in-memory SQlite database.
Getting saved sync token...
Getting push rules...
Attempting to send queued to-device messages
Got saved sync token
Got reply from saved sync, exists? false
All queued to-device messages sent
2023-03-30 12:46:31        API/INIT: RPC methods not decorated: assets_balanceOf, assets_listAssets, crowdloanRewards_amountAvailableToClaimFor, ibc_clientUpdateTimeAndHeight, ibc_generateConnectionHandshakeProof, ibc_queryBalanceWithAddress, ibc_queryChannel, ibc_queryChannelClient, ibc_queryChannels, ibc_queryClientConsensusState, ibc_queryClientState, ibc_queryClients, ibc_queryConnection, ibc_queryConnectionChannels, ibc_queryConnectionUsingClient, ibc_queryConnections, ibc_queryDenomTrace, ibc_queryDenomTraces, ibc_queryEvents, ibc_queryLatestHeight, ibc_queryNewlyCreatedClient, ibc_queryNextSeqRecv, ibc_queryPacketAcknowledgement, ibc_queryPacketAcknowledgements, ibc_queryPacketCommitment, ibc_queryPacketCommitments, ibc_queryPacketReceipt, ibc_queryProof, ibc_queryRecvPackets, ibc_querySendPackets, ibc_queryUnreceivedAcknowledgement, ibc_queryUnreceivedPackets, ibc_queryUpgradedClient, ibc_queryUpgradedConnectionState, pablo_pricesFor, pablo_simulateAddLiquidity, pablo_simulateRemoveLiquidity
2023-03-30 12:46:32        API/INIT: picasso/10011: Not decorating unknown runtime apis: 0x9c53906fa888fe7c/1, 0x5c497be959ff24ab/1, 0xf60c4a6e7ca253cc/1, 0xa74824145d05c12a/1
Got push rules
Adding default global override for .org.matrix.msc3786.rule.room.server_acl
Checking lazy load status...
Checking whether lazy loading has changed in store...
Storing client options...
Stored client options
Getting filter...
[2023-03-30T12:46:36.615] [INFO] default - Fetched faucet balance 💰
Sending initial sync request...
Waiting for saved sync before starting sync processing...
Adding default global override for .org.matrix.msc3786.rule.room.server_acl
Caught /sync error TypeError: Cannot read properties of undefined (reading 'cryptoStore')
    at /faucet/node_modules/matrix-js-sdk/lib/sync.js:1191:49
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async Object.promiseMapSeries (/faucet/node_modules/matrix-js-sdk/lib/utils.js:445:5)
    at async SyncApi.processSyncResponse (/faucet/node_modules/matrix-js-sdk/lib/sync.js:1184:5)
    at async SyncApi.doSync (/faucet/node_modules/matrix-js-sdk/lib/sync.js:843:9)
[2023-03-30T13:03:00.027] [INFO] default - Auto-joined !JTMeQUcNDfTSdeIIvP:matrix.org.
EventTimelineSet.addLiveEvent: ignoring duplicate event $yZzI-R0yRHKSxnsfHpGveuj3QMgs7T3Zugk7NB__mmI
[2023-03-30T13:03:06.314] [INFO] default - Auto-joined !JTMeQUcNDfTSdeIIvP:matrix.org.
2023-03-30 20:41:56        RPC-CORE: queryStorageAt(keys: Vec<StorageKey>, at?: BlockHash): Vec<StorageChangeSet>:: [502]: Bad Gateway
[2023-03-30T20:41:56.989] [ERROR] default - Error: [502]: Bad Gateway
    at HttpProvider._HttpProvider_send (/faucet/node_modules/@polkadot/rpc-provider/cjs/http/index.js:162:19)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async callWithRegistry (/faucet/node_modules/@polkadot/rpc-core/cjs/bundle.js:172:28)
2023-03-30 20:44:06        RPC-CORE: queryStorageAt(keys: Vec<StorageKey>, at?: BlockHash): Vec<StorageChangeSet>:: [502]: Bad Gateway
[2023-03-30T20:44:06.311] [ERROR] default - Error: [502]: Bad Gateway
    at HttpProvider._HttpProvider_send (/faucet/node_modules/@polkadot/rpc-provider/cjs/http/index.js:162:19)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async callWithRegistry (/faucet/node_modules/@polkadot/rpc-core/cjs/bundle.js:172:28)

Make it easy to deploy a substrate-connect enabled bootnode to Kubernetes

Substrate-connect light clients have been officially announced as ready for the general public (https://www.youtube.com/watch?v=TDbTCrDDO2U). However, from an operational point of view, an obstacle to adoption is that, to work properly, the light client needs to access a bootnode which exposes its p2p port over WebSocket (typically --listen-addr /ip4/0.0.0.0/tcp/30444/ws --listen-addr /ip6/::/tcp/30444/ws). Moreover, as browsers will only allow connecting to a secure WebSocket, you need a reverse proxy in front, such as nginx, to add a Let's Encrypt certificate.

I believe it would be valuable to offer an easy way to deploy such a bootnode to kubernetes with the following:

  • Option to auto-generate the relevant p2p-ws Ingress and Service on the helm-chart
  • Example config showing how to set up an ingress controller and external-dns / cert-manager for automatic certificate management.

ping @tomaka

Standardize default exposed ports in the node chart

We want to make it easier to reason about the exposed ports for substrate nodes (especially collators, which run 2 nodes in 1). From internal discussions at Parity we brainstormed the following table. The logic is to reuse conventions that arose organically while minimizing confusion (e.g. it is very hard to differentiate 30334 and 30344 at a glance).

To achieve this we propose to shift port numbers by -1000 for the secondary chain (i.e. the relay-chain for the collator). Note that most of the time those ports don't need to be exposed.

Type     Primary  Secondary
p2p_tcp  30333    29333
p2p_ws   30444    29444
prom     9615     8615
rpc      9933     8933
rpc_ws   9944     8944

[node] Fix readinessProbe

Starting from polkadot v0.9.28, the logs are full of the following lines:

2022-10-03 20:29:32 Accepting new connection 1/100
2022-10-03 20:29:33 Rejected connection: Transport(i/o error: unexpected end of file
Caused by:
     unexpected end of file)

It is caused by the readinessProbe, which uses tcpSocket to check whether the port is open.
Previously the port stayed closed until the node was synced, but the substrate networking was refactored several times and this is no longer the case.

Rename helper templates in node chart

Since helm v3.7.0 we have the option to refer to a subchart's templated helper functions. However, if the parent chart uses identically named helper templates, the child ones will be overwritten.

Additionally, chart.name is fairly close to Chart.Name, which is a helm built-in object, so we should move away from it to avoid confusion.
I suggest we change the naming of these functions from chart.blah to node.blah or some other nomenclature to avoid helper templates being overwritten when called from a parent chart.

statefulSet inject keys doesn't work

When iterating over {{- range $index, $key := .Values.node.keys }}, the helm chart fails to correctly insert .Values.node.command in the subsequent initContainer command. This is because when you use a range you change the scope, and .Values is no longer directly reachable.

Consider the following values.yaml

node:
    keys: 
    - type: "gran"
      scheme: "ed25519"
      seed: "//Blah"

This results in the following error: nil pointer evaluating interface {}.node as node.command is no longer in scope.

The following works but is obviously undesirable:

node:
  keys:
    - type: "gran"
      scheme: "ed25519"
      seed: "//Blah"
      Values:
        node:
          command: "command"

An easy fix is evaluating {{ .Values.node.command }} as $COMMAND and setting that as an env var, similar to how we set {{ .Values.node.chain }} in the same init container. Alternatively, we can set {{ .Values.node.command }} as a Helm variable before the loop.
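For example, a rough sketch of the second approach (the surrounding container spec and the exact key-insert invocation are simplified):

{{- /* capture the value before entering the range, where the scope changes */}}
{{- $command := .Values.node.command }}
{{- range $index, $key := .Values.node.keys }}
{{ $command }} key insert \
  --key-type {{ $key.type }} \
  --scheme {{ $key.scheme }} \
  --suri {{ $key.seed | quote }}
{{- end }}

Using the root context ($.Values.node.command) inside the range would also work.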

Happy to put up a fix for this if you let me know your preferred method.

Method for adding custom node-keys

In our use case we use the pallet-node-authorization and we need to be able to insert the node-key.

Currently the only methods for achieving this are:

  • Bring up a node with node.persistGeneratedNodeKey true and then kubectl exec into the container and put our key in /data/node-key OR
  • Bring up a node with node.persistGeneratedNodeKey true and take the generated node-key and recreate this in our node-authorization pallet, recreate the chainspec then download it using the node.customChainspecUrl

A better solution would be to give us the option to provide our own node-key, mounted as a read-only Secret, instead of reading it from the RW data volume.

We can provide more detail if requested.
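For illustration, one possible shape for such an option (the value names and the use of --node-key-file are assumptions, not an agreed design):

node:
  customNodeKey:
    secretName: my-node-key        # existing Secret holding the key
    secretKey: node-key            # mounted read-only and passed via --node-key-file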

inject-keys init container displays keys

When inserting keys via the values.node.keys method the init container displays them, meaning that anyone who has read access to the statefulSet can read them.

It would probably be better to mount these as a Secret and inject them using a file redirect rather than an echo.
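A rough sketch of what that could look like, with the seeds mounted from a Secret (image, paths and key parameters are placeholders):

  - name: inject-keys
    image: parity/polkadot:latest
    command: ["/bin/sh", "-c"]
    args:
      - |
        # read the seed from the mounted Secret instead of templating it into the
        # command line, so it never appears in the StatefulSet spec
        polkadot key insert \
          --base-path /data \
          --key-type gran \
          --scheme ed25519 \
          --suri "$(cat /keys/gran-seed)"
    volumeMounts:
      - name: keys
        mountPath: /keys
        readOnly: true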

Revamp node flags settings using charts values

When using the node helm chart, it is really hard to figure out whether flags should be set in the chart values or inside node.flags. The situation can be improved:

  • List common substrate flag options and allow configuring them with chart values (--unsafe-rpc-external, --telemetry-url, --log). We shouldn't include more esoteric (polkadot/cumulus-specific) flags.
  • The pruning and database options should be under chainData and relayChainData.
  • Preferentially use default values (i.e. not setting the flags when value=null).
  • relayChainFlags should have --name=${POD_NAME} by default and the same telemetry URLs as the main chain.
  • Add a script on node startup that checks whether any passed flag is in the list of flags handled by chart values, and fails if that is the case.
  • Also forbid setting flags which don't make sense for the chart: --ws-port and --rpc-port.

Support externalized relay-chain node for collators

The parachain collator 0.9.300 supports collation via an RPC relay-chain node. In this mode the collator doesn't need to run a local relay-chain node and simply needs to point to a relay-chain RPC URL.

Add support for this mode:

  • add node.collatorRelayChain.rpcUrl which, when set, will add --relay-chain-rpc-url ws://rpc-node-url and disable the relay-chain data and keystore volumes
  • remove the collator flags when this mode is on

[Question] p2p networking between pods (across nodes)

Hi team 👋

I'm using the node chart and having some trouble with p2p networking between pods.

It looks like I created a fork: pods on the aks-testnet-23183882-vmss000001 node are isolated from those on the other node (peer discovery is working locally within each node, thanks to the --allow-private-ipv4 flag).

> k get pods -o wide
NAME                  READY   STATUS    RESTARTS   AGE     IP            NODE                              NOMINATED NODE   READINESS GATES
ajuna-node-0          1/1     Running   0          5m51s   10.244.1.10   aks-testnet-23183882-vmss000000   <none>           <none>
ajuna-node-1          1/1     Running   0          5m5s    10.244.2.13   aks-testnet-23183882-vmss000001   <none>           <none>
ajuna-validator-0-0   1/1     Running   0          5m51s   10.244.2.11   aks-testnet-23183882-vmss000001   <none>           <none>
ajuna-validator-1-0   1/1     Running   0          5m51s   10.244.2.12   aks-testnet-23183882-vmss000001   <none>           <none>
ajuna-validator-2-0   1/1     Running   0          5m51s   10.244.1.11   aks-testnet-23183882-vmss000000   <none>           <none>

Commands I'm using:

--chain=testnet
--name=$(POD_NAME)
--base-path=/data
--rpc-cors=all
--ws-external
--rpc-methods=safe
--allow-private-ipv4
--listen-addr=/ip4/0.0.0.0/tcp/30333

Any pointers you could share to debug this?

HPA scaling event does not handle creation of p2p services

The Horizontal Pod Autoscaler added in #120, when enabled, conditionally removes the replicas field from the StatefulSet. Creation of p2p services relies on the presence of that field. Thus, when the replica count is scaled up or down by the HPA, the additional p2p services are not created/removed and the Pods cannot work.

We need to check whether Helm has any mechanism to rely on the current replica count set by the HPA. If not, we should implement some custom handler that monitors K8s scaling events and creates/removes p2p services accordingly.

Default http startup probes failing

Startup probes appear to be consistently failing in our testnets:
polkadot version: 0.9.36

probe config:

    Startup:    http-get http://:http-rpc/health delay=0s timeout=1s period=10s #success=1 #failure=30
  Warning  Unhealthy  18m (x123 over 27h)  kubelet  Startup probe failed: Get "http://10.20.142.31:9933/health": dial tcp 10.20.142.31:9933: connect: connection refused

Remove `node.serviceAccountName` property and fully replace with `serviceAccount.name`

The service account name is already defined in serviceAccount.name, so it's strange that it's possible to redefine it in node.serviceAccountName and unclear what purpose it has.

Note that apparently it is mandatory to set this to the right value for Vault auth to work:

        vault.hashicorp.com/role: {{ .Values.node.vault.authRole | default (include "node.serviceAccountName" .) | squote }}

Using HashiCorp Vault in Helm Charts

We would like to use HashiCorp Vault in the Substrate/Polkadot helm charts for secret management and for protecting sensitive data, in this case the node keys, for instance.

Replace curl for startup probe

Currently an exec to curl is used in the startup probe to work around the issue of the RPC endpoint only allowing local network access by default.

Possible solutions:

  • Disable the startup probe by default, and use a regular httpGet probe. However, it will be the user's responsibility to set the correct flags to allow Kubernetes probe traffic to the RPC endpoint (see the sketch after this list)
  • Enable the probes by default, assuming the default rpc flag has not been overridden.
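For the first option, the probe could become a plain httpGet (the port name is taken from the probe config shown in the issue above; the flag requirements still apply):

startupProbe:
  httpGet:
    path: /health
    port: http-rpc
  periodSeconds: 10
  failureThreshold: 30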

Provide examples of deploying a node with Ingress

As mentioned in #108 (comment) it would be nice to have examples of provisioning a Substrate node with an Ingress object (possibly for different popular Ingress Controllers). Ingress is usually required to have more control over proxying the traffic to the node. We at Parity are using Ingress to proxy traffic to boot nodes and RPC nodes. We can put examples of using it into a separate examples directory so as not to pollute the original Helm chart.

Run all containers as read-only and prevent privilege escalation by default

If I recall correctly, the polkadot images are pretty friendly with running as read-only as long as you use a volume for the chain data.

containers:
  - name: mynode
    image: <the_image>
    securityContext:
      runAsUser: 1000
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false

For temporary containers where we do not want to mount a volume, the solution to keep running the container as read-only is to mount whatever folder the node needs to write to as tmpfs.
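A small sketch of that pattern (the volume name and mount path are arbitrary examples):

containers:
  - name: mynode
    volumeMounts:
      - name: tmp
        mountPath: /tmp
volumes:
  - name: tmp
    emptyDir:
      medium: Memory      # tmpfs, so the read-only root filesystem still has a writable /tmp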

[node] fix `backup-chain-gcs` init container

The node chart has an init container, backup-chain-gcs, that dumps the DB to GCS on startup.

If the pod is in CrashLoopBackOff, the same dump will be uploaded to GCS on every restart.

We need to create a "lock file" or "status of the last backup" file, Which will be checked before db is uploaded to GCP.
if the last backup is younger than 1h (1h - should be configurable in values.yml) - skip the backup.
if the last backup failed (we have a lock file) - fail with an error message.

Since the pod is in CrashLoopBackOff it will be hard to exec into the pod and clean the lock file, so we also need to add an option to remove it.
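A rough sketch of the intended logic in the init container (the file names, image and 1h threshold are placeholders):

initContainers:
  - name: backup-chain-gcs
    image: google/cloud-sdk:slim
    command: ["/bin/sh", "-c"]
    args:
      - |
        # refuse to run if a previous backup left its lock file behind
        if [ -f /data/backup.lock ]; then
          echo "previous backup did not finish cleanly; remove /data/backup.lock to retry" >&2
          exit 1
        fi
        # skip if the last successful backup is younger than 1h
        if [ -f /data/backup.last ] && [ -n "$(find /data/backup.last -mmin -60)" ]; then
          echo "last backup is younger than 1h, skipping"
          exit 0
        fi
        touch /data/backup.lock
        # ... upload the db to GCS here ...
        date > /data/backup.last
        rm /data/backup.lock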

Telemetry chart does not install "by default"

Here is my test:

NS=testing
kubectl create ns $NS
helm install substrate-telemetry parity/substrate-telemetry -n $NS

The 3 deployments fail:

(screenshots of the three failing deployments)

NOTE: In the meantime, I did install a LoadBalancer in my cluster but the lack of it does not seem to be the issue.
