
status-im / infra-nim-waku


Infrastructure for Nim Waku

Home Page: https://github.com/status-im/nim-waku

Languages: Python 85.89%, HCL 5.77%, Makefile 3.67%, Jinja 2.77%, Shell 1.91%
Topics: infra, fleet, waku

infra-nim-waku's People

Contributors

abresting, apentori, arthurk, decanus, ebubeud, ivansete-status, jakubgs, jm-clius, kaiserd, menduist, rshiv, rymnc, sionois, staheri14, yakimant


infra-nim-waku's Issues

Waku V2 prod fleet WebSockify connectivity issues

I've noticed the following possible issues with the Waku v2 prod fleet:

Logging/Kibana issues (Resolved)

  1. Logs aren't available for any Waku v2 prod nodes on Kibana since ~11:13 UTC on the 14th of June. (In fact, the only Waku v2 service that's still logging seems to be the one on host node-01.gc-us-central1-a.wakuv2.test.)
  2. The logs for all services on node-01.do-ams3.wakuv2.prod disappeared even earlier from Kibana (9th of June).

Connectivity to node-01.do-ams3.wakuv2.prod (Resolved)

prod nodes do not seem to be (stably) connected to node-01.do-ams3.wakuv2.prod. From available logs:

  1. The other two prod nodes seemed to be trying to reach the peerId for node-01.do-ams3.wakuv2.prod at the wire address for node-01.ac-cn-hongkong-c.wakuv2.prod (/ip4/8.210.222.231/tcp/30303). Perhaps some earlier inconsistency in the connect.yml script? Since these get persisted, we may need to drop the Peer table on each of the prod nodes before running the connect script again (see the sketch after this list).
  2. It seems possible that node-01.do-ams3.wakuv2.prod was not updated along with the other two nodes after deploying to prod. This is based on the other two nodes complaining that it does not support the same protocols as they do. (Though this could just be a side effect of issue (1) above.)
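A minimal sketch of how that could be done, assuming the node's SQLite store lives at a hypothetical path such as /path/to/store.sqlite3 (the actual path, and whether to drop or merely empty the table, should be confirmed against the deployed nim-waku version):

# Stop the node first; hypothetical DB path, adjust to the node's actual db-path setting.
sqlite3 /path/to/store.sqlite3 'DROP TABLE IF EXISTS Peer;'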

Possible websockify issues

  1. websockify (on all hosts) logs frequent errors in the available logs (e.g. "Connection reset", "no shared cipher", "bad syntax", "Method not allowed").

Prod fleet (websocket) down

Cannot connect to any of the prod fleet waku2 nodes using websocket:

client.js:18 WebSocket connection to 'wss://node-01.ac-cn-hongkong-c.wakuv2.prod.statusim.net/p2p/16Uiu2HAm4v86W3bmT1BiH6oSPzcsSr24iDQpSN5Qa992BCjjwgrD' failed: WebSocket is closed before the connection is established.
client.js:18 WebSocket connection to 'wss://node-01.do-ams3.wakuv2.prod.statusim.net/p2p/16Uiu2HAmL5okWopX7NqZWBUKVqW8iUxCEmd5GMHLVPwCgzYzQv3e' failed: WebSocket is closed before the connection is established.
client.js:18 WebSocket connection to 'wss://node-01.gc-us-central1-a.wakuv2.prod.statusim.net/p2p/16Uiu2HAmVkKntsECaYfefR1V2yCR79CegLATuTPE6B9TxgxBiiiA' failed: WebSocket is closed before the connection is established.
client.js:18 WebSocket connection to 'wss://node-01.do-ams3.wakuv2.prod.statusim.net/p2p/16Uiu2HAmL5okWopX7NqZWBUKVqW8iUxCEmd5GMHLVPwCgzYzQv3e' failed: WebSocket is closed before the connection is established.
client.js:18 WebSocket connection to 'wss://node-01.ac-cn-hongkong-c.wakuv2.prod.statusim.net/p2p/16Uiu2HAm4v86W3bmT1BiH6oSPzcsSr24iDQpSN5Qa992BCjjwgrD' failed: WebSocket is closed before the connection is established.

https://status-im.github.io/js-waku/ can be used to check (open console logs)

bug: deployed DNS lists for `wakuv2.*` fleets contain wrong ENR

Problem

As part of automating Merkle tree lists for DNS discovery in the status.* fleets, we also updated the deployed DNS lists for the wakuv2.* fleets. However, the ENRs in the deployed lists for the wakuv2.* fleets (and only those) are incomplete, as reported in waku-org/nwaku#1464. The lists for the status.* fleets are correct.

Not sure how this happened, as a quick scan through the current logs indicates that the nodes themselves log the correct ENRs and should provide the correct ones in response to info requests. It may be that some inconsistent/outdated versions of nwaku were running at the time of list deployment.

What should be done?

  1. Verify from at least one node in a wakuv2.* fleet that it provides a complete ENR in response to info RPC requests (see the sketch after this list). AFAIK these responses are used to scrape the ENR records as the source for the deployed lists. For our fleets, a "complete" ENR in decoded form:
  • usually ends in TIP or Mg8 (not Dw), or
  • contains a multiaddrs field when pasted and decoded at https://enr-viewer.com/
  2. If (1) is successful, trigger the deployment script to redeploy the lists for the wakuv2.* fleets with the correct ENRs.
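A minimal sketch of such a check from the host itself, assuming the node exposes the nwaku JSON-RPC API locally on port 8545 and that the debug info method is named get_waku_v2_debug_v1_info (both are assumptions to confirm against the deployed version):

curl -s -X POST http://localhost:8545 \
  --header 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","id":1,"method":"get_waku_v2_debug_v1_info","params":[]}'

The ENR returned in the response can then be decoded at https://enr-viewer.com/ to confirm it contains a multiaddrs field.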

Deploy Waku v2 `prod` nodes list to DNS provider

UPDATE (original description follows below, but is outdated):

After Release v0.7, the nim-waku prod list to be deployed to a DNS provider is as follows, in zone file format:

; name                        ttl     class type  content
@                             60      IN    TXT   enrtree-root:v1 e=2FPT7YBR4ZHGPTSEDRD5D6N7MU l=FDXN3SN67NA5DKA4J2GOK7BVQI seq=1 sig=6fuJ5SvsUX_9se681PVe2ZToyTfkm6WHTD9hoy3RhFtDMTD2ZvxgJ3V1283aFhBsftch0T3UvKohWM404A7eIgE
FDXN3SN67NA5DKA4J2GOK7BVQI    86900   IN    TXT   enrtree-branch:
2FPT7YBR4ZHGPTSEDRD5D6N7MU    86900   IN    TXT   enrtree-branch:MFAJGGC6CLWF6CXCR2V6CNED7Q,QHKIV5TAXBC6N24HILY7ASGSFU,WTAZHIHIXAEAFB4KVC7SNBIFLU
MFAJGGC6CLWF6CXCR2V6CNED7Q    86900   IN    TXT   enr:-Iu4QJUcYA1ULMF9UfwOZBLNPT0dSxrVtGRws6KhE9b3WH9IS-1gAfTSzJjm30gp3Wz46upPJbdarGOIZTNlG3k_n0YBgmlkgnY0gmlwhAjS3ueJc2VjcDI1NmsxoQKNAvlb34B4l68WYt7pg4qKFzBmPRb2I-f3dUOBHku3soN0Y3CCdl-Fd2FrdTIP
QHKIV5TAXBC6N24HILY7ASGSFU    86900   IN    TXT   enr:-Iu4QALmEco5FZq1DDf3egW27AJ6YSg4Rpdp3RJxQlLQbdrNDxsYJXIVH8ZUh346EVOldWhxr8zoiyfBwJPDgPoNoOsBgmlkgnY0gmlwhLymh5GJc2VjcDI1NmsxoQNuXVf_MkjDcQAka2YcxWfhnbGGlKhRo_kSWwcoK_uGVYN0Y3CCdl-Fd2FrdTIP
WTAZHIHIXAEAFB4KVC7SNBIFLU    86900   IN    TXT   enr:-Iu4QE2RvpRiwWwoCf-EkwCx25_ftH5VOiv67bp2gO1qzi6yKzYMV3ZrG4JDEOyJuZs5miH-nRtM3r48ksb6QL2gvWYBgmlkgnY0gmlwhCJ5ZGyJc2VjcDI1NmsxoQP99JB4EKn12UYqGuCf7uWrIF0yeYsP_MN5RCAh-Exbv4N0Y3CCdl-Fd2FrdTIP

This must be deployed to:
prod.waku.nodes.status.im

The public key used to sign this list is:
ANTL4SLG2COUILKAPE7EF2BYNL2SHSHVCHLRD5J7ZJLN5R3PRJD2Y

The list can then be accessed, verified and linked to other trees using the URL:
enrtree://ANTL4SLG2COUILKAPE7EF2BYNL2SHSHVCHLRD5J7ZJLN5R3PRJD2Y@prod.waku.nodes.status.im
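Once deployed, the root record can be sanity-checked with a plain DNS query, e.g.:

dig TXT prod.waku.nodes.status.im +short

The returned TXT value should be the enrtree-root:v1 entry above, referencing the same e= and l= branch hashes.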

cc @richard-ramos @D4nte


NB: everything below serves only as background. The list to be deployed and the DNS domain are set out in the updated section above.

Background

Release v0.6 of nim-waku includes node discovery via DNS. This allows new nodes to discover a bootstrap list of Waku v2 peers to connect to using DNS queries. The list is encoded as a Merkle tree and deployed in TXT records to a DNS provider according to EIP-1459.

We have previously deployed such a list for the test fleet. Since the release, this is now available for the prod fleet as well.

List to deploy to DNS provider

In zone file format:

; name                        ttl     class type  content
@                             60      IN    TXT   enrtree-root:v1 e=CYNE2WDADG4RP2UTJGMHMQZK6I l=FDXN3SN67NA5DKA4J2GOK7BVQI seq=1 sig=DRT5-3_lQdnSiD3Uw_kapkqyq6W4aimj8nu_4IDSYix8DA4-h318AUDcx2_2aZ0PTT85n75-6LVUkF4rig3fLwA
FDXN3SN67NA5DKA4J2GOK7BVQI    86900   IN    TXT   enrtree-branch:
CYNE2WDADG4RP2UTJGMHMQZK6I    86900   IN    TXT   enrtree-branch:M5XMY2Q5CSBCKPPR6IUAXNDMBE,IJ27VGBIE5LC42G4HNWQW6UR7A,4GZP2NQGNRHCKQ22ZCUFSPJZW4
IJ27VGBIE5LC42G4HNWQW6UR7A    86900   IN    TXT   enr:-IS4QLi9BXG9ysPjAC1ZJfgCW_RuGZXX6oOSG7S93P7ioSDRbzFeZhW0SxmjpTMBRCUFuBDUM88Q9jc1XZi3Qfo4XdEBgmlkgnY0gmlwhLymh5GJc2VjcDI1NmsxoQNuXVf_MkjDcQAka2YcxWfhnbGGlKhRo_kSWwcoK_uGVYN0Y3CCdl8
M5XMY2Q5CSBCKPPR6IUAXNDMBE    86900   IN    TXT   enr:-IS4QD61QfYLaKgo3wwAeNOk23yCGV8ujcix0iYdgJUNj32eWJymXSTTXyL_sIrDpcIImovpVX5dby1Tf71FL12ZyvoBgmlkgnY0gmlwhAjS3ueJc2VjcDI1NmsxoQKNAvlb34B4l68WYt7pg4qKFzBmPRb2I-f3dUOBHku3soN0Y3CCdl8
4GZP2NQGNRHCKQ22ZCUFSPJZW4    86900   IN    TXT   enr:-IS4QECyE7awbeqDqZurmKfl1qHnlkWcCIebx8JcLPiE3cnmKB_sP8n_7QDaNWHNc1inPRk-Zk6H5jVOsBmDPdk0MRsBgmlkgnY0gmlwhCJ5ZGyJc2VjcDI1NmsxoQP99JB4EKn12UYqGuCf7uWrIF0yeYsP_MN5RCAh-Exbv4N0Y3CCdl8

Where to deploy

The list above should be deployed against four different domain names. This is because of a Desktop requirement to distinguish between nodes for each of Waku's four main protocols. Although we currently have the same three prod nodes for all 4 main Waku protocols, it's reasonable to expect these lists to diverge once user-run nodes are added for some protocols.

The chosen (sub)domain names should:

  1. not be associated with vac as a team, but rather indicate waku
  2. differentiate lists for our main protocols: relay, store, filter, lightpush.

My suggestion is therefore to deploy the list at:

  • relay.waku.nodes.status.im
  • store.waku.nodes.status.im
  • filter.waku.nodes.status.im
  • lightpush.waku.nodes.status.im

I'd like to get comments from @richard-ramos and @oskarth here as well about the choice of domain.

The future of key management for DNS discovery

Currently this list is signed by a password-protected key on my local system, which is far from ideal, especially if we want to update these lists. We need to deploy the tree creator utility somewhere where it can be initialised with a Status-managed key. This requires some modifications to the existing utility. Tracking issue here.

Cannot connect to websocket port.

Having issues connecting to the wakuv2.prod fleet via websocket:

Firefox can’t establish a connection to the server at wss://node-01.do-ams3.wakuv2.prod.statusim.net:8000/p2p/16Uiu2HAmL5okWopX7NqZWBUKVqW8iUxCEmd5GMHLVPwCgzYzQv3e.
 Firefox can’t establish a connection to the server at wss://node-01.do-ams3.wakuv2.prod.statusim.net:8000/p2p/16Uiu2HAmL5okWopX7NqZWBUKVqW8iUxCEmd5GMHLVPwCgzYzQv3e. index.js:42875:19

Can easily be reproduced with https://examples.waku.org/light-js/

  • Take one of the wss addresses from https://fleets.status.im/
  • Enter it in the multiaddr field (remove the double quotes)
  • Hit dial
  • Check the console with F12.

However, the issue does not seem to be Waku/libp2p related, as it also fails with the websocket command-line tool websocat:

▶ websocat -v wss://node-01.do-ams3.wakuv2.prod.statusim.net:8000
[INFO  websocat::lints] Auto-inserting the line mode
[INFO  websocat::sessionserve] Serving Line2Message(Stdio) to Message2Line(WsClient("wss://node-01.do-ams3.wakuv2.prod.statusim.net:8000/")) with Options { websocket_text_mode: true, websocket_protocol: None, websocket_reply_protocol: None, udp_oneshot_mode: false, unidirectional: false, unidirectional_reverse: false, exit_on_eof: false, oneshot: false, unlink_unix_socket: false, exec_args: [], ws_c_uri: "ws://0.0.0.0/", linemode_strip_newlines: false, linemode_strict: false, origin: None, custom_headers: [], custom_reply_headers: [], websocket_version: None, websocket_dont_close: false, one_message: false, no_auto_linemode: false, buffer_size: 65536, broadcast_queue_len: 16, read_debt_handling: Warn, linemode_zero_terminated: false, restrict_uri: None, serve_static_files: [], exec_set_env: false, reuser_send_zero_msg_on_disconnect: false, process_zero_sighup: false, process_exit_sighup: false, socks_destination: None, auto_socks5: None, socks5_bind_script: None, tls_domain: None, tls_insecure: false, headers_to_env: [], max_parallel_conns: None, ws_ping_interval: None, ws_ping_timeout: None }
[INFO  websocat::stdio_peer] get_stdio_peer (async)
[INFO  websocat::stdio_peer] Setting stdin to nonblocking mode
[INFO  websocat::stdio_peer] Installing signal handler
[INFO  websocat::ws_client_peer] get_ws_client_peer
## HANGS HERE FOR A WHILE
[INFO  websocat::stdio_peer] Restoring blocking status for stdin
websocat: WebSocketError: I/O failure
[INFO  websocat::stdio_peer] Restoring blocking status for stdin
websocat: error running

Expected output (working websocket server):

▶ websocat -v ws://ws.vi-server.org/mirror
[INFO  websocat::lints] Auto-inserting the line mode
[INFO  websocat::sessionserve] Serving Line2Message(Stdio) to Message2Line(WsClient("ws://ws.vi-server.org/mirror")) with Options { websocket_text_mode: true, websocket_protocol: None, websocket_reply_protocol: None, udp_oneshot_mode: false, unidirectional: false, unidirectional_reverse: false, exit_on_eof: false, oneshot: false, unlink_unix_socket: false, exec_args: [], ws_c_uri: "ws://0.0.0.0/", linemode_strip_newlines: false, linemode_strict: false, origin: None, custom_headers: [], custom_reply_headers: [], websocket_version: None, websocket_dont_close: false, one_message: false, no_auto_linemode: false, buffer_size: 65536, broadcast_queue_len: 16, read_debt_handling: Warn, linemode_zero_terminated: false, restrict_uri: None, serve_static_files: [], exec_set_env: false, reuser_send_zero_msg_on_disconnect: false, process_zero_sighup: false, process_exit_sighup: false, socks_destination: None, auto_socks5: None, socks5_bind_script: None, tls_domain: None, tls_insecure: false, headers_to_env: [], max_parallel_conns: None, ws_ping_interval: None, ws_ping_timeout: None }
[INFO  websocat::stdio_peer] get_stdio_peer (async)
[INFO  websocat::stdio_peer] Setting stdin to nonblocking mode
[INFO  websocat::stdio_peer] Installing signal handler
[INFO  websocat::ws_client_peer] get_ws_client_peer
[INFO  websocat::ws_client_peer] Connected to ws

Issues with chat2bridge deployment

The chat2bridge deployment is currently not automated via Ansible, and there are a few issues that need to be addressed:

Hanno wrote in the Discord #waku-infra chat:

There are a couple of strange things that I'm noticing on the prod fleet. Perhaps you can help or point me in the right direction?

  1. I cannot view logs for any service on node-01.do-ams3.wakuv2.prod after June 9 on Kibana.
  2. The other two nodes seem to be trying to reach the peerId for node-01.do-ams3.wakuv2.prod at the wire address for node-01.ac-cn-hongkong-c.wakuv2.prod (/ip4/8.210.222.231/tcp/30303). Perhaps some earlier inconsistency in the connect.yml script? Since these get persisted, we may need to drop the Peer table on each of the prod nodes, before running the connect-script again.
  3. It seems possible that node-01.do-ams3.wakuv2.prod was not updated with the other two nodes after deploying to prod. This is based on the other two nodes complaining that it does not support the same protocols as them. (Though this could just be a side-effect of the issue (2) above.)
  4. websockify (on all hosts) logs frequent errors (e.g. "Connection reset", "no shared cipher", "bad syntax", "Method not allowed")

DNS4 domain name for `wakuv2.test` fleet

Background

We recently added the ability to configure a nim-waku node with an (IPv4) domain name.
This allows the node's publicly announced multiaddrs to use the /dns4/ scheme. In addition, nodes with a domain name and secure websocket configured will generate a discoverable ENR containing the wss multiaddr with the /dns4/ domain name. This is necessary to verify domain certificates when connecting to this node over secure websocket and is a prerequisite for addressing #38.

Required change

For the wakuv2.test fleet:
Add config option
--dns4-domain-name=<node domain>

e.g. the effective config option for the node on node-01.do-ams3.wakuv2.test should be:
--dns4-domain-name=node-01.do-ams3.wakuv2.test.statusim.net
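For illustration (a sketch, not taken from an actual node), with this option and secure websocket enabled the announced wss multiaddr would take roughly this form:

/dns4/node-01.do-ams3.wakuv2.test.statusim.net/tcp/8000/wss/p2p/<peer-id>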

This option is not yet available on the prod fleet, but will be after the release: waku-org/nwaku#870. If we encounter no issues with this config on wakuv2.test we can apply the same to wakuv2.prod after the release.

More documentation at https://github.com/status-im/nim-waku/tree/master/waku/v2#configuring-a-domain-name

Static sharding & Postgres

Hi,

@Ivansete-status and I need a fleet for static sharding and the new DB.

Configuring nwaku for static sharding is just a matter of changing the pubsub topics, but for the DB I'm not sure; I'll let Ivan explain.

We seek guidance on this, O great infra master :P

I was thinking of copying wakuv2.test and then modifying it for our needs. What else do we need?

Deploy `chat2bridge` to Waku v2 `prod` fleet

Problem

To facilitate Waku v2 dogfooding, nim-waku now has a chat2bridge target that bridges messages between matterbridge and Waku v2 (specifically the chat2 app used on the wakuv2 fleet nodes). This should be set up to create a bridge between the wakuv2.prod fleet and the #waku channel on Discord.

Acceptance criteria

  • chat2bridge deployed and connected to wakuv2.prod fleet
  • matterbridge deployed and configured to bridge messages between the API and the #waku channel on the Discord Status Server
  • chat2bridge successfully pointed to the matterbridge API

Instructions

There are step-by-step instructions here.
Note that, since chat2bridge encompasses a wakunode2, it can be configured with the same parameters as the other v2 fleet nodes. The only added config options are:

--mb-host-address         Listening address of the Matterbridge host
--mb-host-port            Listening port of the Matterbridge host
--mb-gateway              Matterbridge gateway
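A rough sketch of an invocation, assuming the binary is built to ./build/chat2bridge (the path, addresses and gateway name below are placeholders, and the usual wakunode2 flags are omitted):

./build/chat2bridge \
  --mb-host-address=127.0.0.1 \
  --mb-host-port=4242 \
  --mb-gateway=gateway1 \
  <usual wakunode2 flags>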

Also note that the last step is to connect the chat2bridge to the wakuv2.prod fleet. Using something like this script is recommended.

Related issues:

blocks waku-org/nwaku#404

Deploy `chat2bridge` off latest `master`

Relates to #8

It is necessary to update the chat2bridge image to run the latest nim-waku master (we've made some important changes). Because the prod fleet currently connects to chat2bridge only on startup, this upgrade has to be done before the prod fleet can be upgraded with the latest changes.

enrtree's multiaddrs do not match SSL cert

Regarding the enr tree stored at test.waku.nodes.status.im.

The values stored in the multiaddrs field of the various ENR records are as follows:

  • /ip4/134.209.139.210/tcp/8000/wss
  • /ip4/104.154.239.128/tcp/8000/wss
  • /ip4/47.242.210.73/tcp/8000/wss

However, the SSL certificates used are Let's Encrypt certificates:

nim_waku_websocket_ssl_cert: '/etc/letsencrypt/live/{{ nim_waku_websocket_domain }}/fullchain.pem'

which have the FQDN of the host as Subject Alternative Name:

DNS Name: node-01.ac-cn-hongkong-c.wakuv2.test.statusim.net

Hence, the certificates are not valid for the given multiaddresses, as they do not certify the nodes' IPs (contained in the multiaddrs).

Proposed solution

Update the ENR tree so that the multiaddrs field contains dns4 multiaddresses.
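For illustration with placeholders: an entry of the form /ip4/<node ip>/tcp/8000/wss would become /dns4/<node fqdn>/tcp/8000/wss, e.g. /dns4/node-01.ac-cn-hongkong-c.wakuv2.test.statusim.net/tcp/8000/wss, so that the hostname in the multiaddr matches the certificate's Subject Alternative Name.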

Note: Only js-waku should be using the websocket multiaddrs, so nim-waku's lack of dns4 support should not block this issue (AFAIK).

Cc @jm-clius

Deploy updated Waku v2 `test` nodes list to DNS provider

Background

This issue relates to #29.
We previously deployed a discoverable list for the wakuv2.test fleet to a DNS provider, following EIP-1459. Since then:

  1. the format of the deployed ENRs has evolved to include more information, including wss addresses. This follows RFC 31.
  2. we have decided to move away from a Vac-specific domain (test.nodes.vac.dev) to test.waku.nodes.status.im

This necessitates deploying an updated list for the test fleet to the new domain. Once the latest release has been installed on prod, a list will be deployed for that too (tracked in #29, blocked by waku-org/nwaku#808).

List to deploy to DNS provider

In zone file format:

; name                        ttl     class type  content
@                             60      IN    TXT   enrtree-root:v1 e=SLJBDRNPBTM7X5RSO4WH25IDJY l=FDXN3SN67NA5DKA4J2GOK7BVQI seq=1 sig=zQ0_i4YO91B6wJ9doGoOLKb-V-stM8RdZAmnqSFtwKA8NAx6IbK-k-rRR3vrC5xzDDEFDUYCpC5--OvhKHgr8gA
FDXN3SN67NA5DKA4J2GOK7BVQI    86900   IN    TXT   enrtree-branch:
SLJBDRNPBTM7X5RSO4WH25IDJY    86900   IN    TXT   enrtree-branch:3JSK24BF37BTAFHD4PAHCWBDOY,LECZMNF5HMUHARM32YUF7RH3OE,26XUO2RSK7DWIGSTCBP4DQAH5Y
3JSK24BF37BTAFHD4PAHCWBDOY    86900   IN    TXT   enr:-KO4QBlzNVtUWMCNvsMkslFNSpa67cDaLXrXkMdGRddToD3jcRfEEi345NV6h56-rXO3C8oMxZ44UaZfJL10Tm7U8vMBgmlkgnY0gmlwhIbRi9KKbXVsdGlhZGRyc4wACgSG0YvSBh9A3gOJc2VjcDI1NmsxoQOevTdO6jvv3fRruxguKR-3Ge4bcFsLeAIWEDjrfaigNoN0Y3CCdl-Fd2FrdTIP
LECZMNF5HMUHARM32YUF7RH3OE    86900   IN    TXT   enr:-KO4QBzUhElxiM-NSjSA1B4iEkFw3_SZj-Q18jPfvruKTLy9Na3VgbUwVMFOXcvMylAB_2C6uesZJSHm99FbFqir6-0BgmlkgnY0gmlwhGia74CKbXVsdGlhZGRyc4wACgRomu-ABh9A3gOJc2VjcDI1NmsxoQNYImLJllfa9siWWAIjBhyX4y5S3qgLkwKJFBjzf_6ZMYN0Y3CCdl-Fd2FrdTIP
26XUO2RSK7DWIGSTCBP4DQAH5Y    86900   IN    TXT   enr:-KO4QFi0j_vwI2p4j9lPn_mCEPCx3G32ZQt1uTd8aql6chBncFQqbu5m6sW18mwpidE_Fn1U_5RrtWmqwWpVRicH9TIBgmlkgnY0gmlwhC_y0kmKbXVsdGlhZGRyc4wACgQv8tJJBh9A3gOJc2VjcDI1NmsxoQIQJvj-KupljjSnBp2yz1PFl2SeELNwMSG68nL3Oz1SrIN0Y3CCdl-Fd2FrdTIP

Where to deploy

test.waku.nodes.status.im, if possible

cc @D4nte @richard-ramos - the list contains the updated ENRs for the test fleet, including the waku2 and multiaddrs fields. Note the proposed change in domain. The tree can still be accessed and verified using the previous public key, i.e. AOFTICU2XWDULNLZGRMQS4RIZPAZEHYMV4FYHAPW563HNRAOERP7C@test.waku.nodes.status.im

Missing Swap Protocol Logs on Kibana

Currently, logs from the Swap protocol are not displayed on Kibana.

Solution:
The solution to this is to enable Swap in the docker-compose.yml file, as stated here:
#573

Deploy websockify with nim-waku nodes in test fleet

Problem

  • nim-waku does not support secure websocket connectivity.
  • js-waku in the browser only supports secure websocket connections

Solution

  1. Get certificates using letsencrypt
  2. Deploy and run websockify to proxy nim-waku tcp connections and wrap them with SSL+Websocket

Steps

For each wakunode2 binary deployed in the fleet (assuming wakunode2 is deployed on <testmachine.statusim.net>, listening on TCP port <tcp_port>), perform the following on the same machine:

  1. Install certbot:
sudo apt install certbot
  2. Get certificates:
sudo certbot certonly -d <testmachine.statusim.net>
  3. Install websockify:
sudo apt install websockify
  4. Start websockify:
sudo websockify --cert /etc/letsencrypt/live/<testmachine.statusim.net>/fullchain.pem --key /etc/letsencrypt/live/<testmachine.statusim.net>/privkey.pem 0.0.0.0:443 127.0.0.1:<tcp_port>
  5. Register the websocket address with https://fleets.status.im/

Value should be: /dns4/<testmachine.statusim.net>/tcp/443/wss/p2p/<peer-id>, with the same peer-id value used in the current JSON file.

Notes:

a. If the file system is preserved between deployments, steps 1, 2 & 3 may not be necessary.
b. Not sure what will be the best way to embed this new value in https://fleets.status.im/
c. Once this is confirmed as working as expected on the test fleet, I'll open an issue for a similar setup on the prod fleet.
d. When running certbot, it detects an existing web server and proposes several methods to complete the challenge.

Cc @oskarth @jm-clius

Deploy updated Waku v2 nodes lists to DNS provider

Background

We previously deployed discoverable lists for the wakuv2 test and prod fleets to a DNS provider.
The records for each node have now been updated with a domain name in order to match the SSL certificate on the corresponding host and the lists should be updated accordingly.
This fixes #38.

List to deploy to DNS provider:

wakuv2.prod

To deploy to prod.waku.nodes.status.im:

; name                        ttl     class type  content
@                             60      IN    TXT   enrtree-root:v1 e=6H46RFQ6PMETFLWGHNEXDQWHWA l=FDXN3SN67NA5DKA4J2GOK7BVQI seq=1 sig=MrAgKpfEeYbXCRv5p6MPPYqEgEpmhy_NlXbCUlXByxtAbgdR-py1HFYVcnEeTYjjca7U_Jy_hPUAA-psp_dftwA
FDXN3SN67NA5DKA4J2GOK7BVQI    86900   IN    TXT   enrtree-branch:
6H46RFQ6PMETFLWGHNEXDQWHWA    86900   IN    TXT   enrtree-branch:DQFS7TZNID2772ZTMP3OHCTFQA,RT5J5PK55RQN3GSELSH42DK5GQ,G2CY6SUSGLDHHRWNEVXHNJPRTA
DQFS7TZNID2772ZTMP3OHCTFQA    86900   IN    TXT   enr:-NK4QHk-qAIQIDDjQmtKYZJmYikTgechWNeT9Tl-tCaKsImVL-_i4p54tYUW8KBjA4F3T97A5F6Ecq8brC4vhKaXRQ4BgmlkgnY0gmlwhCJ5ZGyKbXVsdGlhZGRyc7g6ADg2MW5vZGUtMDEuZ2MtdXMtY2VudHJhbDEtYS53YWt1djIucHJvZC5zdGF0dXNpbS5uZXQGH0DeA4lzZWNwMjU2azGhA_30kHgQqfXZRioa4J_u5asgXTJ5iw_8w3lEICH4TFu_g3RjcIJ2X4V3YWt1Mg8
RT5J5PK55RQN3GSELSH42DK5GQ    86900   IN    TXT   enr:-NK4QGWuLwNtQFSGUx_r_1ORLdZXCnSKX-dD1bO2QUGvz7LceTW4I4iBxrstbk7_9a7BdY5kbXr-YlhySHkEMkvNPqoBgmlkgnY0gmlwhAjS3ueKbXVsdGlhZGRyc7g6ADg2MW5vZGUtMDEuYWMtY24taG9uZ2tvbmctYy53YWt1djIucHJvZC5zdGF0dXNpbS5uZXQGH0DeA4lzZWNwMjU2azGhAo0C-VvfgHiXrxZi3umDiooXMGY9FvYj5_d1Q4EeS7eyg3RjcIJ2X4V3YWt1Mg8
G2CY6SUSGLDHHRWNEVXHNJPRTA    86900   IN    TXT   enr:-Mi4QLr6Xdnrzp3imebRPr8LXuUKWablTwnXLzxGi5Q47-lYMwf4zPeGcfLy9IXXdT-PwXXv96eBS8ZunmaFlUKJ81cBgmlkgnY0gmlwhLymh5GKbXVsdGlhZGRyc7EALzYobm9kZS0wMS5kby1hbXMzLndha3V2Mi5wcm9kLnN0YXR1c2ltLm5ldAYfQN4DiXNlY3AyNTZrMaEDbl1X_zJIw3EAJGtmHMVn4Z2xhpSoUaP5ElsHKCv7hlWDdGNwgnZfhXdha3UyDw

wakuv2.test

To deploy to test.waku.nodes.status.im:

; name                        ttl     class type  content
@                             60      IN    TXT   enrtree-root:v1 e=ASH2KK77DJ4HFEGIVQUUAOZD3Q l=FDXN3SN67NA5DKA4J2GOK7BVQI seq=1 sig=jb_mNdzqWiy6N496NjSQMz6R2Pt3DqqIc5o4_vZGIFxfYHeHhMejLj3fMRJfS_thWbXPftPVQmDxpCrFGZh-IwA
FDXN3SN67NA5DKA4J2GOK7BVQI    86900   IN    TXT   enrtree-branch:
ASH2KK77DJ4HFEGIVQUUAOZD3Q    86900   IN    TXT   enrtree-branch:XDJGBTC2TZPTCROQFOAFUO6L5E,HEBEZQWP7ZIDNVPHJXN4P7QADA,GNW657LRZJA7COEYYY4NYJ6TLU
XDJGBTC2TZPTCROQFOAFUO6L5E    86900   IN    TXT   enr:-NK4QP9Nb0sPOEZ85EFsm7brEHemAtQz0zRUlHH68E2zWtiWPbbVi7aAUA8Swe_CIFsHLQIYr8mnn9keO65Pw9ucwwwBgmlkgnY0gmlwhC_y0kmKbXVsdGlhZGRyc7g6ADg2MW5vZGUtMDEuYWMtY24taG9uZ2tvbmctYy53YWt1djIudGVzdC5zdGF0dXNpbS5uZXQGH0DeA4lzZWNwMjU2azGhAhAm-P4q6mWONKcGnbLPU8WXZJ4Qs3AxIbrycvc7PVKsg3RjcIJ2X4V3YWt1Mg8
HEBEZQWP7ZIDNVPHJXN4P7QADA    86900   IN    TXT   enr:-NK4QGYl1WKxTYL3qRf_LHgvpwuOA7eUX4ZBmppDTG7pBCL4VgZiXO-J5_Dai25Mt71T--oo7NfVid5T4C1ebMdOAP4BgmlkgnY0gmlwhGia74CKbXVsdGlhZGRyc7g6ADg2MW5vZGUtMDEuZ2MtdXMtY2VudHJhbDEtYS53YWt1djIudGVzdC5zdGF0dXNpbS5uZXQGH0DeA4lzZWNwMjU2azGhA1giYsmWV9r2yJZYAiMGHJfjLlLeqAuTAokUGPN__pkxg3RjcIJ2X4V3YWt1Mg8
GNW657LRZJA7COEYYY4NYJ6TLU    86900   IN    TXT   enr:-Mi4QNgb4XSvOmmjvzOpnmxZRffvqBBpND0mkoWWbacUPLRLeFvM9R07Fyjbw7lYFozdicN_QITRPXmOnUeJlR9zzs4BgmlkgnY0gmlwhIbRi9KKbXVsdGlhZGRyc7EALzYobm9kZS0wMS5kby1hbXMzLndha3V2Mi50ZXN0LnN0YXR1c2ltLm5ldAYfQN4DiXNlY3AyNTZrMaEDnr03Tuo77930a7sYLikftxnuG3BbC3gCFhA4632ooDaDdGNwgnZfhXdha3UyDw

These lists must replace the existing entries at those domains.

The trees will still be accessible and verifiable with the previously used public keys, namely:
ANTL4SLG2COUILKAPE7EF2BYNL2SHSHVCHLRD5J7ZJLN5R3PRJD2Y@prod.waku.nodes.status.im
AOFTICU2XWDULNLZGRMQS4RIZPAZEHYMV4FYHAPW563HNRAOERP7C@test.waku.nodes.status.im

cc @D4nte @richard-ramos

Renaming of Nim-Waku V2 fleets for consistency

I have set up a new infra-go-waku fleet as requested by @richard-ramos, configured in the same manner as infra-nim-waku.

The small issue with that is that the current naming of hosts and fleets does not include the language used, which is confusing.

Currently the fleet names are wakuv2.{test,prod}, and my idea is to rename those to just nim-waku.{test,prod}, in the same scheme as go-waku.{test,prod}. I want to drop the v2 since Waku V1 is essentially in maintenance mode and we mostly use or talk about V2, so keeping it seems not useful and dropping it would make the names more concise. Of course we would have a transitory period where both old and new naming schemes are available under https://fleets.status.im/.

I'd like some feedback from @richard-ramos, @oskarth and @jm-clius.

Deploy Waku v2 node Merkle tree to DNS provider

Background

Waku v2 has recently integrated a POC version of node discovery via DNS. This allows new nodes to discover a bootstrap list of Waku v2 peers to connect to using DNS queries. The list is encoded as a Merkle tree and deployed in TXT records to a DNS provider according to EIP-1459.

Deployment to DNS provider for test fleet POC

In order to create a POC version of DNS discovery for Waku v2, the wakuv2.test nodes were encoded as a Merkle tree, resulting in the following subdomains and TXT records.

In zone file format:

; name                        ttl     class type  content
@                             60      IN    TXT   enrtree-root:v1 e=XLH4Z6DFDGBT36MDANNCQH3EUM l=FDXN3SN67NA5DKA4J2GOK7BVQI seq=1 sig=3fwyT4idon8DaQwRaORQ-y00faWQk8_TEzzn4DS-30ZTG0uieGDR-WFu3BJF6UAtc8Hpz8iVhVyEXLgwQYaUzwA
FDXN3SN67NA5DKA4J2GOK7BVQI    86900   IN    TXT   enrtree-branch:
XLH4Z6DFDGBT36MDANNCQH3EUM    86900   IN    TXT   enrtree-branch:37W35K6WDTQZJYYHCEUHJUT7OM,LX3OHTPYYKHLGOX45A4ZHI73XU,PPIJVI5OG6T5HO7CQK26AA3DGA
PPIJVI5OG6T5HO7CQK26AA3DGA    86900   IN    TXT   enr:-IS4QCdt9fr5dayFCBx9FYCfJIiHR89iVAdD9GHOcV1fglAdebrRmYRcgJw4un8kf5YCmm7Trc2uGamp3N1IRQilZ4IBgmlkgnY0gmlwhC_y0kmJc2VjcDI1NmsxoQIQJvj-KupljjSnBp2yz1PFl2SeELNwMSG68nL3Oz1SrIN0Y3CCdl8
37W35K6WDTQZJYYHCEUHJUT7OM    86900   IN    TXT   enr:-IS4QAmC_o1PMi5DbR4Bh4oHVyQunZblg4bTaottPtBodAhJZvxVlWW-4rXITPNg4mwJ8cW__D9FBDc9N4mdhyMqB-EBgmlkgnY0gmlwhIbRi9KJc2VjcDI1NmsxoQOevTdO6jvv3fRruxguKR-3Ge4bcFsLeAIWEDjrfaigNoN0Y3CCdl8
LX3OHTPYYKHLGOX45A4ZHI73XU    86900   IN    TXT   enr:-IS4QBGb7cP-S3aD8y7a7ANcbD8o-kZp9L51Erh1wlbB5R8WCVtU5PmYN71fxRJ3qO7wHNdR-HQyjdAhXh2GIVpge8kBgmlkgnY0gmlwhGia74CJc2VjcDI1NmsxoQNYImLJllfa9siWWAIjBhyX4y5S3qgLkwKJFBjzf_6ZMYN0Y3CCdl8

These records should be deployed against the domain and subdomain names as indicated. Since this is for POC purposes, this can be done for any reasonable domain (though it would be great if the origin could be something like nodes.vac.dev). From the root record, the entries at each subdomain (e.g. PPIJVI5OG6T5HO7CQK26AA3DGA.nodes.vac.dev) are iteratively resolved by DNS discovery clients.
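A sketch of how a client walks the tree with plain DNS queries, assuming the list is deployed under nodes.vac.dev:

dig TXT nodes.vac.dev +short                              # enrtree-root record
dig TXT XLH4Z6DFDGBT36MDANNCQH3EUM.nodes.vac.dev +short   # enrtree-branch record
dig TXT PPIJVI5OG6T5HO7CQK26AA3DGA.nodes.vac.dev +short   # leaf enr: record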

Recurrent disk limitation in wakuv2.prod fleet

Problem

The nodes of the wakuv2.prod fleet are often running out of disk space.

Current configuration

The fleet is configured with a 30-day retention policy.

Observations

  • Before the last cleanup, the database took ~32 GB for 6.13 million messages.
  • Five days after the deletion, the database takes 6.8 GB for 1.315 million messages.

The message size is similar in both cases (5.16 KB/msg vs ~5.2 KB/msg), and no deletion was done by Waku before the cleanup.
With 40 GB disks and this size per message, we can store around 28-29 days of messages (~7.5 million messages).
Therefore, adding the vacuum at this stage won't solve the issue if we keep the retention at 30 days.
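As a rough check with the figures above:

1.315 million messages / 5 days      ≈ 263 k messages per day
40 GB / ~5.2 KB per message          ≈ 7.5 million messages of capacity
7.5 million messages / 263 k per day ≈ 28-29 days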

Note:

  • In SQLite, deleting data doesn't free disk space; a VACUUM has to be run (see the sketch after this list).
  • To perform a VACUUM in SQLite, a significant portion of the disk needs to be free.
  • The Waku VACUUM is done only when the node is starting.
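For reference, a manual vacuum is a single SQLite command (hypothetical DB path; it requires enough free disk, as noted above):

sqlite3 /path/to/store.sqlite3 'VACUUM;'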

Solution ideas

  • Lower the maximum number of stored messages (either by reducing the retention period or by limiting the max number of messages) and add a regular vacuum to the DB.
  • Increase the disk size and add a regular vacuum to free the space of deleted messages.

Websocket connection issue

See log below, time is AEST (UTC+10)

15:21:04.961 Firefox can’t establish a connection to the server at wss://node-01.gc-us-central1-a.wakuv2.prod.statusim.net/p2p/16Uiu2HAmVkKntsECaYfefR1V2yCR79CegLATuTPE6B9TxgxBiiiA.

It seems to be at websocket level (as it's a Firefox error, not a JS one)

Infra to run waku-simulator on latest nwaku master

In order to detect potential issues in nwaku as soon as possible, we need an instance of waku-simulator deployed with the latest nwaku master commit. Every time we merge a new PR to nwaku, the waku-simulator tool should be redeployed with that image so we can monitor whether we are introducing any issues (especially related to networking or performance in general).

waku-simulator makes it easy to:

  • Create a network with an arbitrary number of nwaku nodes (max 250)
  • Automatically inject gossipsub traffic into the network with some configurable parameters.
  • Monitor said network with an already provisioned Grafana dashboard.

What would we need?

  • Some infra to run waku-simulator
  • Redeploy the setup on every new commit to nwaku master. Unsure if this requires changes in nwaku CI, or perhaps it can be auto-detected?
  • A static IP with port :3000 open so that we can visualize the metrics.

Important notes:

  • The amount of metrics that waku-simulator generates is quite high, which is why we provide our own Grafana/Prometheus instance. I would suggest not "indexing" these metrics in Status infra. To avoid using too much disk space, the Prometheus retention time is set to 7 days.
  • Nodes run with a simple configuration, where each one should use < 100 MB. A machine with 64 GB should be enough.
  • Disk usage shouldn't be very high; the store protocol is not used, so the only data stored is the Prometheus metrics.

TLDR:

We would need some infra to run waku-simulator. The original idea was to redeploy it every time we merge a PR to nwaku master; instead, once a day it should deploy the latest nightly nwaku release, roughly as follows.

This is the repo

git clone https://github.com/waku-org/waku-simulator.git
cd waku-simulator

And only LATEST_MASTER_PLACEHOLDER should be updated.

export NWAKU_IMAGE=statusteam/nim-waku:LATEST_MASTER_PLACEHOLDER
export NUM_NWAKU_NODES=100
export GOWAKU_IMAGE=statusteam/go-waku:v0.7.0
export NUM_GOWAKU_NODES=0
export MSG_PER_SECOND=10
export MSG_SIZE_KBYTES=10
docker-compose up -d

And then have the already provisioned dashboard available at ip:3000.
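A minimal sketch of such a daily redeploy, assuming a checkout location and a nightly image tag (both placeholders to be replaced with the actual values):

#!/usr/bin/env bash
# redeploy-waku-simulator.sh -- run once a day from cron
set -euo pipefail
cd /opt/waku-simulator                            # placeholder checkout location
git pull
export NWAKU_IMAGE=statusteam/nim-waku:nightly    # placeholder: actual nightly tag
export NUM_NWAKU_NODES=100
export MSG_PER_SECOND=10
export MSG_SIZE_KBYTES=10
docker-compose pull
docker-compose up -d --force-recreate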

cc @jakubgs

Decommission wakuv2 fleets

After discussion in Doha we've come to the conclusion that it makes no sense to migrate the wakuv2.* fleets to use Postgres if we already have waku.test that uses it correctly. We have an overabundance of fleets and this chaos needs to be curtailed.

The wakuv2.test fleet is not depended upon in the same way as wakuv2.prod, so it should not require a grace period.
In order to decommission wakuv2.prod, extra caution will be necessary due to its usage in documentation and by 3rd parties.
Only the waku.sandbox fleet should be referenced in public-facing documentation in lieu of wakuv2.prod.

Prod host on Google Cloud not responding

The node-01.gc-us-central1-a.wakuv2.prod host appears to be down, which was detected by failing waku-peers script runs:

Exception: RPC Error: failed to dial 16Uiu2HAmVkKntsECaYfefR1V2yCR79CegLATuTPE6B9TxgxBiiiA:
 * [/ip4/34.121.100.108/tcp/30303] dial tcp4 0.0.0.0:30303->34.121.100.108:30303: i/o timeout

I've seen this before but it seems to be getting more frequent. The host just stopped responding around 01:06:00 UTC:

(graph omitted)

Monitor websocket

We noticed issues with websockify that we were not able to resolve at this time.
Because websockify can have issues accepting secure websocket connections, TCP monitoring alone is not sufficient.

Ultimately, websockify should be sunset in favor of native websocket support in wakunode2, tracked with waku-org/nwaku#434.

It is possible to use curl to test a secure websocket connection. However, the result is not great for programmatic parsing:

▶ curl \
    --include \
    --no-buffer \
    --http1.1 \
    --header "Connection: Upgrade" \
    --header "Upgrade: websocket" \
    --header "Host: https://node-01.ac-cn-hongkong-c.wakuv2.test.statusim.net:443" \
    --header "Origin: https://node-01.ac-cn-hongkong-c.wakuv2.test.statusim.net:443" \
    --header "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
    --header "Sec-WebSocket-Version: 13" \
    https://node-01.ac-cn-hongkong-c.wakuv2.test.statusim.net:443/
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: qGEgH3En71di5rrssAZTmtRTyFk=

�/multistream/1.0.0
^C
  • SIGINT needs to be sent to stop the curl command
  • if multistream is present in the output then it means that the websocket connection was successful (libp2p protocol negotiation started).

If this is not suitable, a utility such as websocat could be used instead; alternatively, see the monitoring sketch below.
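A minimal monitoring sketch based on the curl probe above: time-box the request and check for the 101 response instead of parsing the streamed body (target host and timeout are placeholders):

#!/usr/bin/env bash
# check-wss.sh [host:port] -- exits 0 if the websocket upgrade succeeds
set -eu
TARGET="${1:-node-01.ac-cn-hongkong-c.wakuv2.test.statusim.net:443}"
# curl exits non-zero when --max-time cuts the stream, so swallow its exit code
OUT="$(curl --include --no-buffer --http1.1 --max-time 10 -s \
  --header "Connection: Upgrade" \
  --header "Upgrade: websocket" \
  --header "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
  --header "Sec-WebSocket-Version: 13" \
  "https://${TARGET}/" || true)"
echo "$OUT" | grep -q "101 Switching Protocols"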

Memory issues on `ac-cn-hongkong-c.wakuv2.prod` host

On 2021-10-08 starting around 07:20 UTC the node-01.ac-cn-hongkong-c.wakuv2.prod host started having memory issues:

(graph omitted)

There were also a few major CPU usage spikes:

(graph omitted)

It appears this coincides with a major traffic spike:

(graph omitted)

Which caused a spike in orphaned sockets:

(graph omitted)

This did not subside until I restarted the host.

Nim-Waku Clusters

Stable Cluster

This will be the first cluster; we will release to it whenever a new tag is pushed indicating a stable release of Waku V2. This cluster should contain 3 nodes in geographically different regions so it is similar to the production clusters Status is currently running.

Testing Cluster

This cluster will be trailing the master branch. It is the less stable, testing-oriented cluster of Waku V2. The geographical location of this cluster is probably not as important as for the stable cluster, but there should also be multiple nodes.

DNS4 domain names for `wakuv2.prod` fleet

Background

We added dns4 domain name config for the wakuv2.test fleet in #41. This option is now available for the wakuv2.prod fleet as well and must be added to the configuration.
This allows the node's publicly announced multiaddrs to use the /dns4/ scheme.

Required change

For the wakuv2.prod fleet:
Add config option
--dns4-domain-name=<node domain>

e.g. the effective config option for the node on node-01.ac-cn-hongkong-c.wakuv2.prod should be:
--dns4-domain-name=node-01.ac-cn-hongkong-c.wakuv2.prod.statusim.net

More documentation at https://github.com/status-im/nim-waku/tree/master/waku/v2#configuring-a-domain-name

Connecting waku v2 test fleets to an Ethereum node on Goerli testnet

Problem

As part of the waku-rln-relay protocol, waku v2 nodes are supposed to listen to events emitted from a smart contract deployed on the Goerli testnet and perform some operations internally. The necessary code for this is already implemented; that is, waku v2 nodes can now accept the address of a node on the Goerli testnet via a dedicated config parameter, i.e., eth-client-address, which is the websocket address of a node (e.g., Infura). Once passed, the waku node makes the necessary connections for event subscription and further communication.

The current issue is to explore how it should be done on the waku test fleets. My understanding is that 1) we need the address of an Ethereum node on the Goerli testnet to which the waku test fleets can connect, and 2) the corresponding config option eth-client-address should be added to https://github.com/status-im/infra-nim-waku/blob/master/ansible/group_vars/wakuv2-test.yml#L45, similar to the other rln-relay parameters.
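For illustration, the resulting node flag would look roughly like this (the endpoint is a placeholder for whatever Goerli websocket endpoint infra provides, e.g. an Infura project):

--eth-client-address=wss://goerli.infura.io/ws/v3/<project-id>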

ERROR gowaku.discv5 discv5/discover.go:252 obtaining peer info from enode

I noticed the error below in the console logs when using the Status desktop app.

According to @jm-clius, the issue here seems to be that gowaku attempted to parse this into an ip4 multiaddr rather than simply discarding it.

(screenshot omitted)

2023-05-30T10:36:45.118+0300    WARN    peerstore    pstoremem/addr_book.go:248    Was passed p2p address with a different peerId. found: 16Uiu2HAmFy8BrJhCEmCYrUfBdSNkrPw6VHExtv4rRp1DSBnCPgx8, expected: 16Uiu2HAmVTJAnf519TCwucVAuM5g5sGnrQFBhT9GF6ttZnuzDzka
2023-05-30T10:10:25.748+0300    ERROR    gowaku.discv5    discv5/discover.go:252    obtaining peer info from enode    {"enr": "enr:-KG4QBLYUYZv3s6rFuPAiP-keY7ubolEqLuaPiPJ5YFEH1itAIdQiIerz13zvRzf5OZW7M7oUFp7l33T9ihvurAVwMwBgmlkgnY0g2lwNpAkAUkAHLkEu8KDTInhNThciXNlY3AyNTZrMaED4B5bJ27C5xfXCd5F1FRE7mqtf11XQiouHlofluLMYgmEdGNwNoLqYIR1ZHA2giMohXdha3UyAQ", "error": "failed to parse multiaddr \"/ip4/2401:4900:1cb9:4bb:c283:4c89:e135:385c/tcp/0/p2p/16Uiu2HAmTjrXEE1HqVZ37pDFw7ckhefzYqDdifEVWQJSHTRbpekY\": invalid value \"2401:4900:1cb9:4bb:c283:4c89:e135:385c\" for protocol ip4: failed to parse ip4 addr: 2401:4900:1cb9:4bb:c283:4c89:e135:385c"}

Set `user_version` on fleet SQLite DBs

Background

During Waku v2 dogfooding, @richard-ramos discovered that the nim-waku fleet nodes' SQLite DBs indicate PRAGMA user_version == 2, even though their Message table schema still corresponds to user_version == 1.

After some investigation, we determined that this was due to a failed DB migration that appeared as a success and hence resulted in the DB user_version being bumped from 1 to 2. I've merged a fix that will prevent this user_version inconsistency. The underlying cause was that the node did not find any migration scripts to run during the upgrade. It's still unclear why this happened, and any effort to replicate locally has failed so far.

Suggested steps to fix or debug:

  1. Choose one node from the nim-waku test fleet and manually revert the user_version to 1 (see the sketch after this list):
sqlite> PRAGMA user_version = 1;
  2. Restart that node. This will force the node to attempt to re-migrate the DB. If it fails, we should now have better logging to try to debug the underlying cause. If it is successful, we can follow the same process to fix the schema consistency on the remaining prod or test nodes.
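The same steps from the host shell, with a hypothetical DB path (adjust to the node's actual db-path):

sqlite3 /path/to/store.sqlite3 'PRAGMA user_version;'      # inspect the current version
sqlite3 /path/to/store.sqlite3 'PRAGMA user_version = 1;'  # revert to 1 before restarting the node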

CPU and memory usage spikes on hosts

I've been seeing some big spikes in resource usage on some hosts that cause them to become inaccessible:

(graph omitted)

In this case it's node-01.gc-us-central1-a.wakuv2.prod, but I've seen it on other hosts.

published ENRs do not contain the UDP port

The ENRs made available in #42 do not contain the UDP port.
The cause might be that discv5 is not enabled on the fleet nodes. If --discv5-discovery=false, which is the default, only DNS related ENRs are available.

Currently, nwaku manages two separate ENRs, see waku-org/nwaku#915
The JSON-RPC call returns the discv5-related ENR; only if discv5 is disabled will it return the DNS-related ENR.

cc @jakubgs @jm-clius

(Further, only the Jenkins pipeline for the status-test fleet has the -d:discv5_protocol_id:d5waku define. This is unrelated to this issue, but necessary for Waku discv5 operation. The other fleets would need that parameter set, too.)
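For reference, based only on the flags mentioned here, enabling Waku discv5 on a fleet would involve both the compile-time define and the runtime flag (a sketch, to be confirmed against the deployed nwaku version):

# build-time define (currently only set for the status-test pipeline)
-d:discv5_protocol_id:d5waku
# runtime flag on the node
--discv5-discovery=true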

Prod fleet slows down

The following is observed on the wakuv2.prod fleet (2022-05-16 8 UTC):

  • CPU Usage on node-01.ac-cn-hongkong-c.wakuv2.prod and node-01.gc-us-central1-a.wakuv2.prod has increased to 30% and 60% respectively.
  • Connections to either of these two nodes are slow or fail after a timeout
  • No visible increase in memory usage or evidence of swapping (yet). Also no obvious correlation with increase in traffic or connections. Will investigate this further using more sources/metrics/logs.
  • node-01.do-ams3.wakuv2.prod remains seemingly unaffected (for now)
