
Comments (22)

kase113 commented on May 21, 2024

I don't know whether there is a problem with the way I run drand. Please help.

from drand.

CluEleSsUK commented on May 21, 2024

Does it work if you pass the --tls-disable flag to the leader's share command? I would expect a different error but that's the first thing that stands out

CluEleSsUK commented on May 21, 2024

Also there could be subtle timing issues here: since you background a few of the commands with & and don't check the status of the ping command, nodes might not be up and running when you issue the first share command.
Do you have any logs from the leader node?
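
One way to rule out the timing issue is to poll each node until it answers before running the first share command. A minimal sketch: `wait_for` is a hypothetical helper, and the `drand util ping` call in the usage comment is an assumption about how the script pings its nodes.

```shell
# wait_for ATTEMPTS DELAY CMD...: run CMD until it succeeds,
# at most ATTEMPTS times, sleeping DELAY seconds after each failure.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Hypothetical usage (assumes the leader's control port is 17000):
#   wait_for 10 1 drand util ping --control 17000 || exit 1
#   drand share --control 17000 --leader ...
```

Unlike a fixed sleep, this only proceeds once the node has actually responded, and fails loudly if it never does.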

kase113 commented on May 21, 2024

Does it work if you pass the --tls-disable flag to the leader's share command? I would expect a different error but that's the first thing that stands out

You're right, that is an important point and I missed it. But even if I add the --tls-disable flag to the leader's share command, the problem still occurs. I did not manage to print the leader's log.

kase113 commented on May 21, 2024

I added a sleep(1) before pinging the node, and a new problem occurs:

2023-04-27T21:57:40.771+0800    DEBUG   0.0.0.0:61200.default.4 beacon/node.go:346         {"beacon_loop": "new_round", "round": 1499, "lastbeacon": 894}
2023-04-27T21:57:40.772+0800    DEBUG   0.0.0.0:61200.default.4 beacon/node.go:427         {"broadcast_partial": 895, "prev_sig": "96513c", "msg_sign": "f19070"}
2023-04-27T21:57:40.772+0800    DEBUG   0.0.0.0:61200.default.4 beacon/node.go:359         {"beacon_loop": "run_sync_catchup", "last_is": "{ round: 894, sig: 96513c, prevSig: 88fc4d }", "should_be": 1499}
2023-04-27T21:57:40.772+0800    DEBUG   0.0.0.0:61200.default.4.SyncManager        beacon/sync_manager.go:300      starting new sync       {"sync_manager": "start sync", "up_to": 1499, "nodes": "[ 127.0.0.1:36647 - 127.0.0.1:35151 - 127.0.0.1:42071 - 127.0.0.1:40255 - 127.0.0.1:34347 - 127.0.0.1:37309 ]"}
2023-04-27T21:57:40.772+0800    ERROR   0.0.0.0:61200.default.4.SyncManager.tryNode        beacon/sync_manager.go:361      unable_to_sync  {"with_peer": "127.0.0.1:42071", "err": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:42071: connect: connection refused\""}
2023-04-27T21:57:40.772+0800    DEBUG   0.0.0.0:61200.default.4.SyncManager        beacon/sync_manager.go:305      skipping sync with our own node {"sync_manager": "sync"}
2023-04-27T21:57:40.773+0800    ERROR   0.0.0.0:61200.default.4.SyncManager.tryNode        beacon/sync_manager.go:361      unable_to_sync  {"with_peer": "127.0.0.1:37309", "err": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:37309: connect: connection refused\""}
2023-04-27T21:57:40.773+0800    ERROR   0.0.0.0:61200.default.4.SyncManager.tryNode        beacon/sync_manager.go:361      unable_to_sync  {"with_peer": "127.0.0.1:36647", "err": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:36647: connect: connection refused\""}
2023-04-27T21:57:40.773+0800    ERROR   0.0.0.0:61200.default.4.SyncManager.tryNode        beacon/sync_manager.go:361      unable_to_sync  {"with_peer": "127.0.0.1:35151", "err": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:35151: connect: connection refused\""}
2023-04-27T21:57:40.773+0800    ERROR   0.0.0.0:61200.default.4.SyncManager.tryNode        beacon/sync_manager.go:361      unable_to_sync  {"with_peer": "127.0.0.1:40255", "err": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:40255: connect: connection refused\""}
2023-04-27T21:57:40.773+0800    DEBUG   0.0.0.0:61200.default.4.SyncManager        beacon/sync_manager.go:321      Tried all nodes without success {"sync_manager": "failed sync"}
2023-04-27T21:57:40.772+0800    DEBUG   0.0.0.0:61200.default.4 beacon/node.go:457 sending partial {"round": 895, "to": "127.0.0.1:36647"}
2023-04-27T21:57:40.773+0800    ERROR   0.0.0.0:61200.default.4 beacon/node.go:460 error sending partial   {"round": 895, "err": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:36647: connect: connection refused\"", "to": "127.0.0.1:36647"}
2023-04-27T21:57:40.772+0800    DEBUG   0.0.0.0:61200.default.4 beacon/node.go:457 sending partial {"round": 895, "to": "127.0.0.1:40255"}
2023-04-27T21:57:40.773+0800    ERROR   0.0.0.0:61200.default.4 beacon/node.go:460 error sending partial   {"round": 895, "err": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:40255: connect: connection refused\"", "to": "127.0.0.1:40255"}
2023-04-27T21:57:40.772+0800    DEBUG   0.0.0.0:61200.default.4 beacon/node.go:457 sending partial {"round": 895, "to": "127.0.0.1:35151"}
2023-04-27T21:57:40.773+0800    ERROR   0.0.0.0:61200.default.4 beacon/node.go:460 error sending partial   {"round": 895, "err": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:35151: connect: connection refused\"", "to": "127.0.0.1:35151"}
2023-04-27T21:57:40.772+0800    DEBUG   0.0.0.0:61200.default.4 beacon/node.go:457 sending partial {"round": 895, "to": "127.0.0.1:42071"}
2023-04-27T21:57:40.773+0800    ERROR   0.0.0.0:61200.default.4 beacon/node.go:460 error sending partial   {"round": 895, "err": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:42071: connect: connection refused\"", "to": "127.0.0.1:42071"}
2023-04-27T21:57:40.772+0800    DEBUG   0.0.0.0:61200.default.4 beacon/chainstore.go:194           {"store_partial": "127.0.0.1:34347", "round": 895, "len_partials": "1/4"}
2023-04-27T21:57:40.772+0800    DEBUG   0.0.0.0:61200.default.4 beacon/node.go:457 sending partial {"round": 895, "to": "127.0.0.1:37309"}
2023-04-27T21:57:40.773+0800    ERROR   0.0.0.0:61200.default.4 beacon/node.go:460 error sending partial   {"round": 895, "err": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:37309: connect: connection refused\"", "to": "127.0.0.1:37309"}

I don't know why ports such as 37309 and 42071 appear.

CluEleSsUK commented on May 21, 2024

what commands are you using to generate the keypairs for each node? It's important that the address in the keypair corresponds to the one you're using to start the node
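
To illustrate the point, the sketch below builds both commands from a single address variable so the two cannot drift apart. The flags are the ones that appear in scripts elsewhere in this thread. The `node_cmds` helper, the folder layout, and the address and port scheme are hypothetical, and the function only prints the commands rather than running them.

```shell
# node_cmds ID: print the generate-keypair and start commands for node ID,
# both derived from the same private address.
node_cmds() {
  id=$1
  folder="/tmp/drand/node-$id"   # assumed folder layout
  addr="127.0.0.1:1600$id"       # assumed address scheme (single-digit ids)
  echo "drand generate-keypair --tls-disable --folder $folder --control 1700$id --id default $addr"
  echo "drand start --folder $folder --control 1700$id --private-listen $addr --tls-disable"
}

node_cmds 0
```

If the address baked into the keypair differs from the one the daemon listens on, other nodes will dial an address where nothing is listening.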

kase113 commented on May 21, 2024

what commands are you using to generate the keypairs for each node? It's important that the address in the keypair corresponds to the one you're using to start the node

Thank you for your help! I was able to run drand successfully, but I still have a few questions:
Can the period be set below 1s? I would like a node to construct the random number as soon as it has received a threshold of shares.
Which command queries the latest random number from a running drand node?
--out seems to output only the group information. Which command writes the logs to a file (or do I need to implement that in bash)?

CluEleSsUK commented on May 21, 2024

Glad it worked for you!

In principle, the network can run at a period <1s, however in practice 1 or 2 seconds is the realistic limit.
Verifying BLS signatures is fairly slow which sets a lower bound in the 100s of milliseconds.
In our production network (which is geographically distributed with 23 nodes) we're seeing on average 850ms to aggregate a beacon. We run at a period of 3s as it's the fastest we feel comfortable supporting with 100% uptime right now.

If you were running your own network that was geographically closer together, had a smaller number of nodes or a lower threshold, or you don't mind a round being delayed here and there, you could reasonably run at 1s.

Lower than that will likely cause issues as getting behind causes nodes to start a sync process; if multiple nodes are behind and trying to sync from one another it might take a while to catch up etc

kase113 commented on May 21, 2024

Glad it worked for you!

In principle, the network can run at a period <1s, however in practice 1 or 2 seconds is the realistic limit. Verifying BLS signatures is fairly slow which sets a lower bound in the 100s of milliseconds. In our production network (which is geographically distributed with 23 nodes) we're seeing on average 850ms to aggregate a beacon. We run at a period of 3s as it's the fastest we feel comfortable supporting with 100% uptime right now.

If you were running your own network that was geographically closer together, had a smaller number of nodes or a lower threshold, or you don't mind a round being delayed here and there, you could reasonably run at 1s.

Lower than that will likely cause issues as getting behind causes nodes to start a sync process; if multiple nodes are behind and trying to sync from one another it might take a while to catch up etc

Thanks for your answer! Is there any documented data on performance metrics, such as the latency to complete a beacon relative to the number of nodes?

CluEleSsUK commented on May 21, 2024

Unfortunately there is not - we haven't extensively tested different network layouts/sizes.

Here's a snapshot from the last 24h for how long it took us to aggregate a beacon on our nodes:
[image: snapshot of beacon aggregation time discrepancy over the last 24h]

We currently have 23 nodes in our network: some on the US west coast, some on the east coast, some in Europe, and a few in China/Singapore.

A previous contributor noted a strange bug that the time you start a node has some small effect on the discrepancy, so that + no previous signature might account for why fastnet has a considerably lower discrepancy!
Note: time discrepancy = the time after the designated clock-time that the beacon was aggregated. Potentially we could do things like pre-prepare our partial signatures before the clock time to reduce it further, but we don't right now
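
As a worked illustration of that definition: assuming round r is scheduled at genesis + (r - 1) * period, the discrepancy is the aggregation timestamp minus that scheduled time. The genesis time, period, and timestamps below are made-up values, not measurements from any real network.

```shell
# round_time GENESIS PERIOD ROUND: scheduled time of ROUND, in the unit of GENESIS.
round_time() {
  echo $(( $1 + ($3 - 1) * $2 ))
}

# discrepancy AGGREGATED_AT GENESIS PERIOD ROUND: delay after the scheduled time.
discrepancy() {
  echo $(( $1 - $(round_time "$2" "$3" "$4") ))
}

# Hypothetical network: genesis at t=1000000 ms, period 3000 ms.
# Round 5 is scheduled at 1012000 ms and was aggregated at 1012850 ms.
discrepancy 1012850 1000000 3000 5   # prints 850
```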

CluEleSsUK commented on May 21, 2024

Also to note: mainnet1 = US, mainnet2 = Europe, mainnet3 = Asia

kase113 commented on May 21, 2024

I am trying to update the drand group members, and the updated bash script is as follows:


COUNTER=6
FOLDER=/tmp

for ((id=0;id<=COUNTER-1;id++)); do
  if [ "$id" -eq 0 ]; then
    leader=1600$id
    # echo $leader
    # touch $FOLDER/drand/node-$id/log
    drand share --control 1700$id --leader --transition --tls-disable --secret-file $FOLDER/secret/Reshare --nodes 7 --threshold 4 --out $FOLDER/drand/node-$id/group2.toml &
  else
    # touch $FOLDER/drand/node-$id/log 
    drand share --control 1700$id --connect 127.0.0.1:$leader --transition --secret-file $FOLDER/secret/Reshare --tls-disable --out $FOLDER/drand/node-$id/group2.toml &
  fi
done

rm -r$FOLDER/drand/node-6/
mkdir -p $FOLDER/drand/node-6
drand generate-keypair --tls-disable --folder $FOLDER/drand/node-6 --control 17006 --id default 0.0.0.0:16006 &
drand start --folder $FOLDER/drand/node-6 --control 17006 --private-listen 0.0.0.0:16006 --public-listen 0.0.0.0:15006 --tls-disable &
drand share --tls-disable --control 17006 --connect 127.0.0.1:16000 --from /tmp/drand/node-0/group.toml --secret-file /tmp/secret/Reshare --out /tmp/drand/node-6/group2.toml &
echo drand share --tls-disable --control 17006 --connect 127.0.0.1:16000 --from /tmp/drand/node-0/group.toml --secret-file /tmp/secret/Reshare --out /tmp/drand/node-6/group2.toml

But it didn't work out:

Generating private / public key pair without TLS.
Keypair already present in `/tmp/drand/node-6/multibeacon/default`.
Remove them before generating new one
Keys couldn't be loaded on drand daemon. If it is not running, these new keys will be loaded on startup. Err: could not reload the beacon process [default]: rpc error: code = Unknown desc = beacon id [default] is already running
drand 1.5.4 (date 2023-03-16T14:23:50Z, commit 11b42f0)
2023-05-06T17:17:39.417+0800    INFO    0.0.0.0:16006   core/drand_daemon.go:119                {"network": "init", "insecure": true}
can't instantiate drand daemon listen tcp 0.0.0.0:15006: bind: address already in use
drand 1.5.4 (date 2023-03-16T14:23:50Z, commit 11b42f0)
Participating to the resharing. Beacon ID: [default] 
2023-05-06T17:17:39.422+0800    INFO    0.0.0.0:16006.default   core/drand_beacon_control.go:148                {"init_reshare": "begin", "leader": false}
2023-05-06T17:17:39.424+0800    INFO    0.0.0.0:16006.default   core/drand_beacon_control.go:682                {"setup_reshare": "signaling_key_to_leader"}

I am not sure whether there is a problem with the bash script.
Also, is there any performance evaluation of updating the drand group?
Thank you for your help!

CluEleSsUK commented on May 21, 2024

The bind: address already in use suggests you are already running a drand node on that port.
Are you possibly running a docker stack or another node in another shell?

is there any performance evaluation about updating the drand group

do you mean the key resharing process here or something else?

kase113 commented on May 21, 2024

do you mean the key resharing process here or something else?

I want to know the total latency of updating a drand group, including key resharing, chain synchronization of new nodes, and so on.

kase113 commented on May 21, 2024

I also have a question about whether drand can operate on a completely asynchronous network, where data sent from a node may take an unbounded (but finite) amount of time to reach the destination node, and where clocks do not need to be synchronized.

kase113 commented on May 21, 2024

Also, can drand tolerate 1/3 of the nodes being adversarial?

CluEleSsUK commented on May 21, 2024

I want to know the total latency of updating a drand group, including key resharing, chain synchronization of new nodes, and so on.

There are quite a few factors that influence this, so we don't have any concrete metrics on it.
Right now resharing takes at worst ~40secs (as we have a hard timeout for it in Kyber). That could be extended, and if you were doing a DKG with e.g. 100 or 200 participants it could well take longer. 'Fast mode' is enabled by default, so generally if all nodes run their sharing command at the sameish time, the DKG finishes much faster (on the order of 5secs).
Chain synchronisation depends a lot on how many beacons have been produced and whether the network itself is catching up, or just a single node. A single node can run the follow command from the CLI and catch up fairly quickly (by syncing with other nodes). Additionally, a new node could just rsync an existing beacon database from another node to catchup as it would likely be faster.

If the network gets behind, it depends on the catchupPeriod defined in the DKG - by default we set it to the period/2 so the network doesn't get overloaded in the case of an outage.
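
For a rough feel of what that default implies, here is a back-of-the-envelope model (an assumption for illustration, not taken from the drand code): while the network produces one beacon per catchup period c, new rounds keep coming due once per period P, so a backlog of B rounds clears after a time T with T/c = B + T/P, i.e. T = B * c * P / (P - c). With c = P/2 this reduces to T = B * P.

```shell
# catchup_ms BACKLOG PERIOD_MS CATCHUP_MS: estimated time to clear BACKLOG rounds.
catchup_ms() {
  echo $(( $1 * $3 * $2 / ($2 - $3) ))
}

# Backlog taken from the log earlier in the thread (round 1499 expected,
# last beacon 894); the 3000 ms period is a hypothetical value.
catchup_ms $((1499 - 894)) 3000 1500   # prints 1815000
```

Under this model, a network that is 605 rounds behind at a 3s period would need roughly half an hour to catch up.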

Re: clock synchronisation, nodes in the LoE network use NTP for clock synchronisation, but this is purely a governance issue - nodes could do their own synchronisation and get ahead/behind in theory.

Also, can drand tolerate 1/3 of the nodes being adversarial?

This depends on what you set the threshold to in the DKG. We currently use n/2 + 1, so it can tolerate up to 1/2 hostile nodes in the network
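
Spelling out the arithmetic for the 23-node network mentioned earlier, assuming integer division in n/2 + 1: an attacker needs a full threshold of shares to produce a beacon, while the network needs a threshold of honest, online nodes to keep producing them. The function names are illustrative only.

```shell
# t = n/2 + 1 (integer division)
threshold()     { echo $(( $1 / 2 + 1 )); }
# t - 1 colluding nodes cannot produce a beacon on their own
max_colluding() { echo $(( $(threshold "$1") - 1 )); }
# n - t nodes can be offline before beacons stop
max_offline()   { echo $(( $1 - $(threshold "$1") )); }

n=23
echo "n=$n threshold=$(threshold $n) max_colluding=$(max_colluding $n) max_offline=$(max_offline $n)"
# prints: n=23 threshold=12 max_colluding=11 max_offline=11
```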

kase113 commented on May 21, 2024

I seem to have misunderstood how a drand update works: for a new node to join the drand network, the existing nodes must first be paused and then run the reshare command; a new node cannot directly join a running network.

I also learned from the security model that the drand setup stage requires the network to be synchronized. Is it really necessary to synchronize the network?

CluEleSsUK commented on May 21, 2024

I'm not quite sure I understand - nodes doing a resharing continue to make beacons on the existing network while the DKG happens, and during the DKG a future round is selected to switch over to the new keys.
A new node running the follow command will continue to sync new beacons while the DKG takes place, and will start participating once the key distribution has finished using its new key share.

Strictly speaking nodes can participate in the DKG without synchronising the network; this poses a liveness risk however: imagine a network of 9 nodes with a threshold of 5; if 7 new nodes were to join unsynchronised, with a new node count of 16 and a threshold of 9, any of the original nodes going down would cause an outage.
To that end, it's better everybody is synchronised or close to synchronised before joining the network
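
The numbers in that example can be checked quickly. This sketch just counts how many already-synchronised nodes could fail before fewer than a threshold remain able to contribute; `spare_nodes` is a hypothetical helper.

```shell
# spare_nodes SYNCED THRESHOLD: how many synced nodes can fail before an outage.
spare_nodes() {
  echo $(( $1 - $2 ))
}

spare_nodes 9 5   # before the reshare: 9 synced nodes, threshold 5, prints 4
spare_nodes 9 9   # after 7 unsynced nodes join: threshold 9, prints 0
```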

kase113 commented on May 21, 2024

Thank you for your help! I would like to cite Drand as a reference result, but I am not sure about the citation format for Drand.

CluEleSsUK commented on May 21, 2024

Hmm I’m not quite sure - sounds like a question for @AnomalRoil !

AnomalRoil commented on May 21, 2024

Depends on which part you're interested in, I guess.

The "producing random beacon" part of it stemmed from https://doi.org/10.1109/SP.2017.45 using the BLS optimization from section 4.E
The "timelock encryption" part is in https://eprint.iacr.org/2023/189
The "distributed key generation" part isn't really part of any specific paper, so you might just reference the drand.love website I guess for that.

Otherwise maybe that would be fine as a generic one:

@misc{drand,
title={drand: Distributed Publicly Verifiable Randomness Beacon},
howpublished = {\url{https://drand.love}}
} 

cc @nikkolasg too
