libp2p / go-libp2p-kad-dht

A Kademlia DHT implementation on go-libp2p

Home Page: https://github.com/libp2p/specs/tree/master/kad-dht

License: MIT License

Makefile 0.04% Go 99.96%
libp2p dht ipfs kad-dht


go-libp2p-kad-dht's Issues

PutValue appears to not put to K peers

In order for queries to function properly, PutValue should ensure it puts out to K (20) peers. It currently seems that this is not happening, and that a much smaller number of records (less than 16, our GetValue threshold) is actually being put.

This was brought up in ipfs/kubo#3860

We should investigate and fix this.

GetValues and friends should not use RecvdVal

Given the stated use case for RecvdVal:

// RecvdVal represents a dht value record that has been received from a given peer
// it is used to track peers with expired records in order to correct them.
type RecvdVal struct {
	From peer.ID
	Val  []byte
}

We really shouldn't be exposing these on a public interface; this information appears to be for internal bookkeeping only. Furthermore, not all Routers can even fill in a reasonable peer.ID.

channel returned by FindProvidersAsync emits empty peers

FindProvidersAsync returns a channel - when I assign that channel to a variable, and select on it, I get tons of output like:

Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}Got something from providers: {<peer.ID > []}

(This is obviously pasted from a log line I made to check)

causing my select statement to consume 100% CPU. I've looked at the internals of that function, and they seem pretty intricate, at least to me. I don't know if this is a bug or whether I am using the function incorrectly.
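
For context, a closed channel is always ready in a select, so if FindProvidersAsync closes its result channel when the query finishes, a bare select will spin and burn CPU. A minimal consumption sketch, assuming the channel is closed on completion and that empty results should simply be skipped (the count of 10 and handleProvider are placeholders, not part of the API):

provs := dht.FindProvidersAsync(ctx, key, 10)
for {
	select {
	case prov, ok := <-provs:
		if !ok {
			return // channel closed: the query is done
		}
		if prov.ID == "" {
			continue // skip empty results instead of busy-looping on them
		}
		handleProvider(prov)
	case <-ctx.Done():
		return
	}
}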

Record validation issue with multiple naming systems

I have been looking at record validation and saw that a record in the DHT has to be valid on the peer that stores it. If we want to allow different naming schemes, this is going to be a problem, because each naming scheme has its own validation routines.

So either we define, in one central place, a list of all naming schemes with validation functions that every peer must implement, or we change how we do validation or how we store our records.

Assuming we want to be flexible about the validation functions, we have two choices:

  • we stop verifying records, which can lead to abuse
  • or we manage somehow to only store and retrieve records of a defined type on peers that understand this record type

If the DHT implementation is somewhere near what the Kademlia DHT does, the second option should be possible to implement, essentially by creating sub-networks for each validation scheme a peer supports.
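
One way to stay flexible without a single central list is to dispatch validation on the key's namespace prefix (e.g. /ipns/..., /pk/...), so a peer only accepts record types it knows how to validate. A rough sketch with hypothetical types (needs fmt and strings; the real code would wire this through the record validator machinery):

// ValidatorFunc checks a record value for one naming scheme.
type ValidatorFunc func(key string, value []byte) error

// validators maps a namespace ("ipns", "pk", ...) to its validation routine.
var validators = map[string]ValidatorFunc{}

func validate(key string, value []byte) error {
	// DHT record keys look like "/<namespace>/<rest>"
	parts := strings.SplitN(key, "/", 3)
	if len(parts) < 3 || parts[1] == "" {
		return fmt.Errorf("invalid record key: %q", key)
	}
	fn, ok := validators[parts[1]]
	if !ok {
		// unknown scheme: reject rather than store blindly
		return fmt.Errorf("unknown record type: %q", parts[1])
	}
	return fn(key, value)
}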

What's the status of the 'old' DHT?

In dht.go there seem to be two DHT protocols in use:

var ProtocolDHT protocol.ID = "/ipfs/kad/1.0.0"
var ProtocolDHTOld protocol.ID = "/ipfs/dht"

Are they both in use? If so, are they being used in parallel, and when is the old one going to be deprecated?

Race condition in tracking known DHT nodes

Currently, we handle connect/disconnect notifications asynchronously so we have a race where a peer can be recorded as a DHT node even when we're not connected to them. We did this to avoid dialing peers (to test if they are DHT nodes) from within the notification handlers.

IMO, the best fix is to simply add all peers to the routing table optimistically and then remove them as we learn that they aren't DHT nodes. At the end of the day, this should be the same net amount of work.
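
In notification-handler terms, the optimistic version would be roughly the following (a sketch; the type and method names are from memory and may not match notif.go exactly):

func (nn *netNotifiee) Connected(n inet.Network, c inet.Conn) {
	dht := nn.DHT()
	// Optimistically record every connected peer in the routing table;
	// it is removed later if it turns out not to speak the DHT protocol.
	dht.routingTable.Update(c.RemotePeer())
}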

FindProviders doesn't add peer addresses to the peerstore

When running FindProviders, the DHT returns a list of peers complete with their IDs and addresses. However, when later asking to connect to these peers, their addresses are not in the peerstore.

The reason is that the DHT doesn't store the addresses on FindProviders. Perhaps storing the addresses could result in fewer queries to the DHT.
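
A rough sketch of what storing the discovered addresses could look like at the point where provider infos come back from the query (the peerstore method and TTL constant names here are assumptions and may differ from the actual API):

for _, prov := range provs {
	if len(prov.Addrs) == 0 {
		continue
	}
	// remember the provider's addresses so a later Connect finds them in
	// the peerstore instead of needing another lookup
	dht.peerstore.AddAddrs(prov.ID, prov.Addrs, pstore.ProviderAddrTTL)
}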

Does go-libp2p use a specific styleguide?

I notice that the comments in this repo are inconsistent. Sometimes comments start with an uppercase letter, sometimes a lowercase one. Sometimes they end with a period, sometimes not.

Test instructions

I've run go get -u -y github.com/libp2p/go-libp2p-kad-dht but go test fails out of the box. What steps should I take to make the tests run?

As-MacBook-Pro:go-libp2p-kad-dht a$ go test
11:42:37.756 ERROR        dht: adding value on:  <peer.ID WgYphF> dht_test.go:168
11:42:37.757 ERROR        dht: requesting value on dht:  <peer.ID YSaBVe> dht_test.go:188
11:42:38.693 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:57487: use of closed network connection swarm_dial.go:404
11:42:38.696 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:57505: use of closed network connection swarm_dial.go:404
11:42:38.697 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:57504: use of closed network connection swarm_dial.go:404
11:42:38.698 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:57512: use of closed network connection swarm_dial.go:404
11:42:39.827 ERROR        dht: checking dht client type: stream closed notif.go:43
11:42:39.850 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58391: use of closed network connection swarm_dial.go:404
11:42:39.851 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58398: use of closed network connection swarm_dial.go:404
11:42:39.851 ERROR        dht: checking dht client type: stream closed notif.go:43
11:42:39.851 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58383: use of closed network connection swarm_dial.go:404
11:42:39.863 ERROR        dht: checking dht client type: stream closed notif.go:43
11:42:39.864 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58389: use of closed network connection swarm_dial.go:404
11:42:39.870 ERROR        dht: checking dht client type: stream closed notif.go:43
11:42:39.872 ERROR        dht: checking dht client type: stream closed notif.go:43
11:42:39.875 ERROR        dht: checking dht client type: stream closed notif.go:43
11:42:39.901 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58525: use of closed network connection swarm_dial.go:404
11:42:39.903 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58513: use of closed network connection swarm_dial.go:404
11:42:39.904 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58538: use of closed network connection swarm_dial.go:404
11:42:39.929 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58615: use of closed network connection swarm_dial.go:404
11:42:39.935 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58535: use of closed network connection swarm_dial.go:404
11:42:39.936 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58598: use of closed network connection swarm_dial.go:404
11:42:39.948 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58595: use of closed network connection swarm_dial.go:404
11:42:39.948 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58613: use of closed network connection swarm_dial.go:404
11:42:39.950 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58582: use of closed network connection swarm_dial.go:404
11:42:39.950 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58584: use of closed network connection swarm_dial.go:404
11:42:39.963 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58605: use of closed network connection swarm_dial.go:404
11:42:39.972 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58631: use of closed network connection swarm_dial.go:404
11:42:39.975 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58624: use of closed network connection swarm_dial.go:404
11:42:39.984 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58608: use of closed network connection swarm_dial.go:404
11:42:39.984 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58618: use of closed network connection swarm_dial.go:404
11:42:39.985 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58626: use of closed network connection swarm_dial.go:404
11:42:39.988 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58621: use of closed network connection swarm_dial.go:404
11:42:39.994 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58611: use of closed network connection swarm_dial.go:404
11:42:39.997 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58638: use of closed network connection swarm_dial.go:404
11:42:40.004 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58642: use of closed network connection swarm_dial.go:404
11:42:40.004 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58627: use of closed network connection swarm_dial.go:404
11:42:40.011 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58641: use of closed network connection swarm_dial.go:404
11:42:40.015 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58646: use of closed network connection swarm_dial.go:404
11:42:40.031 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58645: use of closed network connection swarm_dial.go:404
11:42:40.036 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58681: use of closed network connection swarm_dial.go:404
11:42:40.052 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58685: use of closed network connection swarm_dial.go:404
11:42:40.054 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58691: use of closed network connection swarm_dial.go:404
11:42:40.058 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58741: use of closed network connection swarm_dial.go:404
11:42:40.058 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58699: use of closed network connection swarm_dial.go:404
11:42:40.058 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58719: use of closed network connection swarm_dial.go:404
11:42:40.058 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58710: use of closed network connection swarm_dial.go:404
11:42:40.060 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58748: use of closed network connection swarm_dial.go:404
11:42:40.061 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58716: use of closed network connection swarm_dial.go:404
11:42:40.061 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58718: use of closed network connection swarm_dial.go:404
11:42:40.061 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58701: use of closed network connection swarm_dial.go:404
11:42:40.062 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58947: use of closed network connection swarm_dial.go:404
11:42:40.062 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58695: use of closed network connection swarm_dial.go:404
11:42:40.063 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58838: use of closed network connection swarm_dial.go:404
11:42:40.063 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58754: use of closed network connection swarm_dial.go:404
11:42:40.063 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58739: use of closed network connection swarm_dial.go:404
11:42:40.063 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58771: use of closed network connection swarm_dial.go:404
11:42:40.063 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58890: use of closed network connection swarm_dial.go:404
11:42:40.063 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58780: use of closed network connection swarm_dial.go:404
11:42:40.064 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58820: use of closed network connection swarm_dial.go:404
11:42:40.064 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58724: use of closed network connection swarm_dial.go:404
11:42:40.064 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58703: use of closed network connection swarm_dial.go:404
11:42:40.064 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58938: use of closed network connection swarm_dial.go:404
11:42:40.064 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58767: use of closed network connection swarm_dial.go:404
11:42:40.064 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58706: use of closed network connection swarm_dial.go:404
11:42:40.064 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58746: use of closed network connection swarm_dial.go:404
11:42:40.065 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58827: use of closed network connection swarm_dial.go:404
11:42:40.066 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58772: use of closed network connection swarm_dial.go:404
11:42:40.065 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58785: use of closed network connection swarm_dial.go:404
11:42:40.066 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58798: use of closed network connection swarm_dial.go:404
11:42:40.067 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58790: use of closed network connection swarm_dial.go:404
11:42:40.067 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58863: use of closed network connection swarm_dial.go:404
11:42:40.067 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58971: use of closed network connection swarm_dial.go:404
11:42:40.067 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58789: use of closed network connection swarm_dial.go:404
11:42:40.067 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:59210: use of closed network connection swarm_dial.go:404
11:42:40.065 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58830: use of closed network connection swarm_dial.go:404
11:42:40.066 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58955: use of closed network connection swarm_dial.go:404
11:42:40.066 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58755: use of closed network connection swarm_dial.go:404
11:42:40.068 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58846: use of closed network connection swarm_dial.go:404
11:42:40.068 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58883: use of closed network connection swarm_dial.go:404
11:42:40.068 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58988: use of closed network connection swarm_dial.go:404
11:42:40.068 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58807: use of closed network connection swarm_dial.go:404
11:42:40.072 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:59084: use of closed network connection swarm_dial.go:404
11:42:40.073 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:59105: use of closed network connection swarm_dial.go:404
11:42:40.074 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:59149: use of closed network connection swarm_dial.go:404
11:42:40.074 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:59067: use of closed network connection swarm_dial.go:404
11:42:40.075 ERROR     swarm2: failed to reset connection deadline after setup:  set tcp 127.0.0.1:58956: use of closed network connection swarm_dial.go:404
11:42:40.316 ERROR        dht: checking dht client type: stream closed notif.go:43
counts:  20 17
11:42:43.026 ERROR net/identi: <peer.ID RzicRh> cannot unmarshal key from remote peer: <peer.ID QZjMJu> id.go:257
11:42:43.026 ERROR net/identi: <peer.ID QZjMJu> cannot unmarshal key from remote peer: <peer.ID QZjMJu> id.go:257
11:42:43.026 ERROR net/identi: <peer.ID RzicRh> cannot unmarshal key from remote peer: <peer.ID RzicRh> id.go:257
11:42:43.027 ERROR   boguskey: TestBogusPrivateKey.Sign -- this better be a test! key.go:49
11:42:43.028 ERROR net/identi: <peer.ID QwUcMK> cannot unmarshal key from remote peer: <peer.ID QNkYsY> id.go:257
11:42:43.028 ERROR net/identi: <peer.ID THthiC> cannot unmarshal key from remote peer: <peer.ID QwUcMK> id.go:257
11:42:43.028 ERROR net/identi: <peer.ID Rm7TkB> cannot unmarshal key from remote peer: <peer.ID QNkYsY> id.go:257
11:42:43.028 ERROR net/identi: <peer.ID Rm7TkB> cannot unmarshal key from remote peer: <peer.ID QwUcMK> id.go:257
11:42:43.028 ERROR net/identi: <peer.ID SMMwYd> cannot unmarshal key from remote peer: <peer.ID QwUcMK> id.go:257
11:42:43.028 ERROR net/identi: <peer.ID QNkYsY> cannot unmarshal key from remote peer: <peer.ID QNkYsY> id.go:257
11:42:43.030 ERROR net/identi: <peer.ID QwUcMK> cannot unmarshal key from remote peer: <peer.ID QwUcMK> id.go:257
11:42:43.037 ERROR net/identi: <peer.ID THthiC> cannot unmarshal key from remote peer: <peer.ID THthiC> id.go:257
11:42:43.037 ERROR net/identi: <peer.ID UHVWRe> cannot unmarshal key from remote peer: <peer.ID QNkYsY> id.go:257
11:42:43.037 ERROR net/identi: <peer.ID TeeRkD> cannot unmarshal key from remote peer: <peer.ID THthiC> id.go:257
11:42:43.037 ERROR net/identi: <peer.ID WWfwWC> cannot unmarshal key from remote peer: <peer.ID QNkYsY> id.go:257
11:42:43.037 ERROR net/identi: <peer.ID XJUHLG> cannot unmarshal key from remote peer: <peer.ID QNkYsY> id.go:257
11:42:43.037 ERROR net/identi: <peer.ID UHVWRe> cannot unmarshal key from remote peer: <peer.ID THthiC> id.go:257
11:42:43.037 ERROR net/identi: <peer.ID XZwjrQ> cannot unmarshal key from remote peer: <peer.ID QNkYsY> id.go:257
11:42:43.037 ERROR net/identi: <peer.ID ZbSq3U> cannot unmarshal key from remote peer: <peer.ID QNkYsY> id.go:257
11:42:43.037 ERROR net/identi: <peer.ID WWfwWC> cannot unmarshal key from remote peer: <peer.ID THthiC> id.go:257
11:42:43.037 ERROR net/identi: <peer.ID XJUHLG> cannot unmarshal key from remote peer: <peer.ID THthiC> id.go:257
11:42:43.037 ERROR net/identi: <peer.ID aAo1wX> cannot unmarshal key from remote peer: <peer.ID QNkYsY> id.go:257
11:42:43.037 ERROR net/identi: <peer.ID avjHsv> cannot unmarshal key from remote peer: <peer.ID QNkYsY> id.go:257
11:42:43.038 ERROR net/identi: <peer.ID bCJhEQ> cannot unmarshal key from remote peer: <peer.ID QNkYsY> id.go:257
11:42:43.038 ERROR net/identi: <peer.ID XzyjTp> cannot unmarshal key from remote peer: <peer.ID THthiC> id.go:257
11:42:43.038 ERROR net/identi: <peer.ID dFEqHY> cannot unmarshal key from remote peer: <peer.ID QNkYsY> id.go:257
11:42:43.030 ERROR net/identi: <peer.ID SMMwYd> cannot unmarshal key from remote peer: <peer.ID QNkYsY> id.go:257
11:42:43.038 ERROR net/identi: <peer.ID aAo1wX> cannot unmarshal key from remote peer: <peer.ID THthiC> id.go:257
11:42:43.030 ERROR net/identi: <peer.ID TeeRkD> cannot unmarshal key from remote peer: <peer.ID QNkYsY> id.go:257
PASS
11:42:43.032 ERROR net/identi: <peer.ID ZbSq3U> cannot unmarshal key from remote peer: <peer.ID WWfwWC> id.go:257
11:42:43.032 ERROR net/identi: <peer.ID UHVWRe> cannot unmarshal key from remote peer: <peer.ID UHVWRe> id.go:257
11:42:43.033 ERROR net/identi: <peer.ID TeeRkD> cannot unmarshal key from remote peer: <peer.ID TeeRkD> id.go:257
11:42:43.033 ERROR net/identi: <peer.ID UHVWRe> cannot unmarshal key from remote peer: <peer.ID TeeRkD> id.go:257
11:42:43.033 ERROR net/identi: <peer.ID WWfwWC> cannot unmarshal key from remote peer: <peer.ID TeeRkD> id.go:257
11:42:43.033 ERROR net/identi: <peer.ID XJUHLG> cannot unmarshal key from remote peer: <peer.ID TeeRkD> id.go:257
11:42:43.033 ERROR net/identi: <peer.ID XZwjrQ> cannot unmarshal key from remote peer: <peer.ID TeeRkD> id.go:257
11:42:43.033 ERROR net/identi: <peer.ID XzyjTp> cannot unmarshal key from remote peer: <peer.ID TeeRkD> id.go:257
11:42:43.033 ERROR net/identi: <peer.ID ZbSq3U> cannot unmarshal key from remote peer: <peer.ID TeeRkD> id.go:257
11:42:43.033 ERROR net/identi: <peer.ID aAo1wX> cannot unmarshal key from remote peer: <peer.ID TeeRkD> id.go:257
11:42:43.033 ERROR net/identi: <peer.ID avjHsv> cannot unmarshal key from remote peer: <peer.ID TeeRkD> id.go:257
11:42:43.033 ERROR net/identi: <peer.ID dFEqHY> cannot unmarshal key from remote peer: <peer.ID TeeRkD> id.go:257
11:42:43.034 ERROR net/identi: <peer.ID XJUHLG> cannot unmarshal key from remote peer: <peer.ID UHVWRe> id.go:257
11:42:43.034 ERROR net/identi: <peer.ID XZwjrQ> cannot unmarshal key from remote peer: <peer.ID UHVWRe> id.go:257
11:42:43.034 ERROR net/identi: <peer.ID XzyjTp> cannot unmarshal key from remote peer: <peer.ID UHVWRe> id.go:257
11:42:43.034 ERROR net/identi: <peer.ID dFEqHY> cannot unmarshal key from remote peer: <peer.ID WWfwWC> id.go:257
11:42:43.034 ERROR net/identi: <peer.ID ZbSq3U> cannot unmarshal key from remote peer: <peer.ID UHVWRe> id.go:257
ok  	github.com/libp2p/go-libp2p-kad-dht	5.420s

panic in FindPeersConnectedToPeer

triggered by my debug crawler:

goroutine 142220 [running]:
runtime.throw(0x903377, 0x21)
	/usr/lib64/go/src/runtime/panic.go:605 +0x95 fp=0xc42283bbf8 sp=0xc42283bbd8 pc=0x42b805
runtime.mapaccess2_faststr(0x85c6a0, 0xc4212a9c80, 0xc42053ba40, 0x22, 0xc420dc4868, 0xc420688300)
	/usr/lib64/go/src/runtime/hashmap_fast.go:324 +0x47a fp=0xc42283bc50 sp=0xc42283bbf8 pc=0x40d73a
gx/ipfs/QmT9TxakNKCHg3uBcLnNzBSBhhACvqH8tRzJvYZjUevrvE/go-libp2p-kad-dht.(*IpfsDHT).FindPeersConnectedToPeer.func1(0xc7da80, 0xc42399adf0, 0xc420b7dd70, 0x22, 0x1, 0x1, 0x1)
	/home/vyzo/src/golang/src/gx/ipfs/QmT9TxakNKCHg3uBcLnNzBSBhhACvqH8tRzJvYZjUevrvE/go-libp2p-kad-dht/routing.go:527 +0x174 fp=0xc42283bdb0 sp=0xc42283bc50 pc=0x77c4a4
gx/ipfs/QmT9TxakNKCHg3uBcLnNzBSBhhACvqH8tRzJvYZjUevrvE/go-libp2p-kad-dht.(*dhtQueryRunner).queryPeer(0xc4229b3400, 0xc80980, 0xc4212d96e0, 0xc420b7dd70, 0x22)
	/home/vyzo/src/golang/src/gx/ipfs/QmT9TxakNKCHg3uBcLnNzBSBhhACvqH8tRzJvYZjUevrvE/go-libp2p-kad-dht/query.go:272 +0x139 fp=0xc42283bf68 sp=0xc42283bdb0 pc=0x778999
gx/ipfs/QmT9TxakNKCHg3uBcLnNzBSBhhACvqH8tRzJvYZjUevrvE/go-libp2p-kad-dht.(*dhtQueryRunner).spawnWorkers.func1(0xc80980, 0xc4212d96e0)
	/home/vyzo/src/golang/src/gx/ipfs/QmT9TxakNKCHg3uBcLnNzBSBhhACvqH8tRzJvYZjUevrvE/go-libp2p-kad-dht/query.go:213 +0x50 fp=0xc42283bfa0 sp=0xc42283bf68 pc=0x77bdb0
gx/ipfs/QmSF8fPo3jgVBAy8fpdjjYqgG87dkJgUprRBHRd2tmfgpP/goprocess.(*process).Go.func1(0xc421bbde40, 0xc4212d96e0, 0xc4212d9740)
	/home/vyzo/src/golang/src/gx/ipfs/QmSF8fPo3jgVBAy8fpdjjYqgG87dkJgUprRBHRd2tmfgpP/goprocess/impl-mutex.go:112 +0x3c fp=0xc42283bfc8 sp=0xc42283bfa0 pc=0x7257ac
runtime.goexit()
	/usr/lib64/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc42283bfd0 sp=0xc42283bfc8 pc=0x459d31
created by gx/ipfs/QmSF8fPo3jgVBAy8fpdjjYqgG87dkJgUprRBHRd2tmfgpP/goprocess.(*process).Go
	/home/vyzo/src/golang/src/gx/ipfs/QmSF8fPo3jgVBAy8fpdjjYqgG87dkJgUprRBHRd2tmfgpP/goprocess/impl-mutex.go:111 +0x2d1

The peersSeen map needs to be mutex protected.

(DoS vector?) Peers are removed from the routing table on Disconnect

Currently, when a peer disconnects, it is removed from the routing table. To me this seems needlessly aggressive. Shouldn't the Kademlia least-recently-seen eviction policy deal with clearing inactive nodes? Certainly a node shouldn't be evicted from the routing table for a temporary disconnection.

I don't see this remove-on-disconnect policy in the Kademlia whitepaper, and to my eye it is a DoS attack vector. An attacker can flood a victim at the network level to force the temporary closure of all its connections. This would flush the node's routing state, and the attacker could then fill the victim's routing table with bad nodes.

Even in a non-hostile scenario, if a node is temporarily cut off from the internet (e.g. for just a few minutes), it needlessly has to repopulate its routing table from scratch.

Am I missing something obvious?

Are DHT messages signed and/or encrypted?

I'm interested in the transport layer for DHT messages. When a message (e.g. a PING) is sent from peer A to peer B, what kind of cryptographic guarantees are there? For example, does peer A encrypt the message with peer B's public key? Does peer A sign the message with its private key?

I ask because I'd like to know how reliable the PING messages are, i.e. whether they can be tampered with or trusted. Looking in dht_net.go, I don't see any sign of crypto happening at the transport layer for DHT messages.

multiple-value p.ExtractPublicKey() in single-value context

Hi,
I tried to import the DHT. When I import it in my project I am getting the following error:
src/github.com/libp2p/go-libp2p-kad-dht/records.go:30:26: multiple-value p.ExtractPublicKey() in single-value context.

Reproduce:

Just run go get github.com/libp2p/go-libp2p-kad-dht and the error should be displayed. My Go version: go version go1.10 darwin/amd64.
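
The error suggests the go-libp2p-peer package on your GOPATH returns (PubKey, error) from ExtractPublicKey, while records.go at that commit still treats it as a single value. Once the versions line up, the call site would look roughly like this (a sketch, not the actual fix in the repo):

pk, err := p.ExtractPublicKey()
if err != nil {
	// the key is not embedded in the peer ID; fall back to fetching it
	return nil, err
}
return pk, nil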

DHT review

So I did a deep dive into the DHT code to figure out why it might not work as well as possible and took notes along the way:

GetValue:

  • if a record is local, the DHT is queried anyway. The local record could be trusted for a fraction of its age vs. TTL, with more and more peers queried as it gets older. But this optimization should probably be done only after the DHT works well.
  • if an error happens during the query but we still got some values, the error is dropped silently

handlePutValue:

  • the incoming message is echoed back exactly, including the value. This could be optimized with a simpler success/failure response

FindProviders:

  • providers are streamed up to KValue. When, at some point, there are more than KValue providers available:
    • excess providers are just dropped
    • there is no selection to return the best ones; it's just the first of them
  • newly discovered providers don't seem to be stored locally
  • there is special handling for a bug closed in 2016, likely safe to remove

FindPeer:

  • the comments seem to equate a peer being in the routing table with a peer being connected. Is that still the case with the connection manager?

FindPeersConnectedToPeer:

  • is that even used? There is no relevant hit anywhere on GitHub. Maybe it could be dropped.

Can someone have a look to confirm?

Disjoint Path Lookups

Currently, we retrieve 16 values to try to prevent eclipse attacks. However, according to S/Kademlia, we actually need to query along 16 different paths. That is, we need to complete 16 independent queries starting at 16 independent peers.

Edit (@jacobheun):

Design notes

S/Kademlia (secure Kademlia) proposes a security improvement to Kademlia. Instead of searching along a single path, S/Kademlia searches multiple distinct paths.

  1. Start D queries with distinct starting peers.
  2. When querying a peer on a path, "claim" that peer. Peers claimed by one path cannot be queried on another path (a sketch of this bookkeeping follows the list).
  3. Finally, return the K peers found on each path. Technically, the S/Kademlia paper says that peers should ask the final peers for their K "siblings". Asking the closest peer to a key for the closest peers to that key should yield the same result.
    • Note: It's unclear whether we should take the union of the K closest or the K closest of the union of the D sets of K closest. However, given that it's trivial to (a) find the closest node to a key and then (b) generate 20 identities close to that peer and closer to the key (without true sybil resistance), we might as well assume that the 20 closest nodes are "good".
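
A minimal sketch of the per-path claim bookkeeping referenced in step 2 (hypothetical types; the real query runner would be more involved):

// claimedSet tracks peers already used by some path.
type claimedSet struct {
	mu    sync.Mutex
	peers map[peer.ID]struct{}
}

func newClaimedSet() *claimedSet {
	return &claimedSet{peers: make(map[peer.ID]struct{})}
}

// Claim returns true if p was free and is now owned by the calling path;
// a peer that cannot be claimed is simply skipped by that path.
func (c *claimedSet) Claim(p peer.ID) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, taken := c.peers[p]; taken {
		return false
	}
	c.peers[p] = struct{}{}
	return true
}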

The second part of S/Kademlia is sybil/eclipse resistance: when joining the network, peers should have to, e.g., perform some proof of work or get a signature from a central authority.

  • This second part is tricky to tune and we'll likely punt on it until later. S/Kademlia is still useful.
  • The disjoint path lookups are useful even without this feature as they reduce the likelihood of getting lost in a disconnected partition of the network.

Testing mechanics

Multipath queries have the potential to multiply traffic in the network. Our hypothesis is that the end result, when coupled with Terminating Queries Correctly, will not be worse than the current situation, where we basically walk around the DHT relentlessly. We need to verify this with comparative testing.

The combination of both changesets should yield better characteristics than the status quo, in terms of number of dials, number of peers queried, query convergence speed, and number of peers at each distance, the latter of which should be n-fold (where n is the number of paths) the value of the same metric with the Terminating Queries Correctly changeset only.

Success Criteria

The DHT can tolerate small subgraphs of partitioned nodes without queries getting trapped in these subgraphs.

Ensure that queries converge in log(key-size) steps

Currently, when queried for peers closer to a key, we return at most the 20 closest known peers. Instead, we should return the 20 closest known peers that are more than half-way between us and the target key. Otherwise, the worst-case convergence is linear in the size of the network.

The client should also validate this constraint.
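
As a sketch of the server-side filter: in XOR-distance terms, "more than half-way" means the candidate's distance to the key is under half of ours. The keys below are assumed to be the already-hashed, equal-length Kademlia identifiers, compared as big integers (illustrative only, not the repo's code; needs math/big):

// closerThanHalf reports whether candidate is more than half-way from us
// to target, i.e. dist(candidate, target) < dist(self, target)/2.
func closerThanHalf(self, candidate, target []byte) bool {
	dist := func(a, b []byte) *big.Int {
		x := make([]byte, len(a))
		for i := range a {
			x[i] = a[i] ^ b[i]
		}
		return new(big.Int).SetBytes(x)
	}
	half := new(big.Int).Rsh(dist(self, target), 1)
	return dist(candidate, target).Cmp(half) < 0
}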

Atomically update DHT records

It looks like there's a bit of a race condition when putting new DHT records. If we have two parallel puts, we may end up overwriting a newer record with an older record because we don't serialize puts to any given key.

We should either:

  1. Add a CompareAndSwap to the datastore interface (useful but probably not worth the trouble).
  2. Add a per-key lock (simpler).
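
Option 2 could be as small as a mutex-per-key map guarded by one outer lock, with puts for a given key serialized through it (a sketch that ignores cleanup of idle entries):

type keyLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newKeyLocks() *keyLocks {
	return &keyLocks{locks: make(map[string]*sync.Mutex)}
}

// lock returns the per-key mutex, already locked by the caller.
func (k *keyLocks) lock(key string) *sync.Mutex {
	k.mu.Lock()
	l, ok := k.locks[key]
	if !ok {
		l = new(sync.Mutex)
		k.locks[key] = l
	}
	k.mu.Unlock()
	l.Lock()
	return l
}

// usage in the put path:
//   l := kl.lock(key)
//   defer l.Unlock()
//   // read the existing record, compare, then store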

Find self on bootstrap

We need to find ourselves in the DHT to properly bootstrap the routing table. Finding random IDs will connect us to the rest of the DHT, but we need to find ourselves to:

  1. Connect to our neighbors.
  2. Distribute our peer addresses to our neighbors.

This is kind of critical for proper DHT functionality. Actually, in addition to finding our own peer, I wonder if we should be trying to find a key in each bucket (or some sampling thereof)?
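
A minimal sketch of the extra bootstrap step: looking up our own peer ID fills the buckets nearest to us and tells our closest neighbors about our addresses. It assumes the self-lookup ends with routing.ErrNotFound, which is fine here because the walk itself is the point (field and error names approximate):

if _, err := dht.FindPeer(ctx, dht.self); err != nil && err != routing.ErrNotFound {
	log.Warningf("self lookup failed: %s", err)
}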

Add more tests

We need to test some more situations here. In particular, things I'm concerned about:

  • GetClosestPeers should always return min(K, len(allPeers)) peers.
  • FindPeer should return more than just the first addresses it gets
  • FindPeer (even in the absence of the previous point) should never return an empty address set.
  • The DHT should reuse streams to peers. I'm not 100% confident this is happening yet.
  • PutValue and PutProvider should actually send out K messages, in all cases.

Finish dependency extraction

Currently, this package works, but many of its dependencies are still located in the main go-ipfs codebase, which makes it rather difficult to import back into ipfs. To finalize the extraction, we should remove or extract all of this project's dependencies on the main go-ipfs codebase.

This means we need to either extract more packages into their own repos, or find ways to just not use certain bits of code to remove a dependency.

Consecutive queries to the same key are not much faster

DHT RPC calls such as GetValues and GetClosestPeers start by getting the alpha nearest peers to the key from the k-table, where alpha is currently set to 3.

The RPC requests the k closest peers to the key from each of the alpha peers, filters out the peers that have already been queried, and requests the k closest peers from the remaining set of peers. It repeats the process of requesting k closest peers and filtering until enough values have been received, or there are no more peers to query. k is currently set to 20.

As peers are retrieved, they are added to the k-table, so that when the RPC completes this peer will know about most of the peers close to the key it has just queried.

However, on a subsequent query for the same key, only 3 of these nearest peers will be queried initially, so the requests will essentially follow the same pattern as described above. The RPC would complete faster (and use fewer resources) if it directly queried the k closest peers for that key that are already in its k-table.

The current DHT implementation follows the spec correctly, so there may be a reason I'm not aware of for why it should behave this way. An alternative could be to use a separate cache (i.e. not the k-table) for the closest peers from recent queries to particular keys.

Faster consecutive queries for the same key would help the performance of, for example, the proposed changes to DHT Publish suggested by @Stebalien, and go some way toward addressing the concerns mentioned by @vyzo.
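
Concretely, the suggestion amounts to seeding the lookup with the K nearest routing-table peers instead of alpha; something like the following at the start of the query (identifier names approximate):

// today the lookup is seeded with the AlphaValue (3) closest known peers:
//   peers := dht.routingTable.NearestPeers(kb.ConvertKey(key), AlphaValue)
// the suggestion is to seed it with the K (20) closest instead, so a repeat
// query for the same key goes straight to the peers learned last time:
peers := dht.routingTable.NearestPeers(kb.ConvertKey(key), KValue)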

Cannot run go test

So this morning I was going to resume running through all the tests on the DHT to learn it.

And I got this:

christopher@holochain:~/dev.kadDHT$ go test
# github.com/libp2p/go-libp2p/p2p/host/basic
../go/src/github.com/libp2p/go-libp2p/p2p/host/basic/basic_host.go:114: cannot use h (type *BasicHost) as type host.Host in argument to identify.NewIDService:
        *BasicHost does not implement host.Host (missing ConnManager method)
FAIL    _/home/christopher/dev.kadDHT [build failed]

I'm pretty sure this has to do with the fact that the holochain stuff has caused gx-go to rewrite some of the packages.
I've run gx-go rewrite --undo from the holochain package, but that makes no difference.

Does anyone have any idea what I can do about this?

TestValueGetSet outputs error logs.

go-libp2p-kad-dht commit hash: d30bf628ca4c1c686840a213316610fc7a1dae2f

Running TestValueGetSet produces error logs even though the test passes. I'm a beginner with libp2p and IPFS, so I don't have a clear understanding of what the valid key format is, but I would expect the tests to be written in such a way that they don't cause errors. Do all keys have to be CIDs, or is any key acceptable?

$ go test . -v -run TestValueGetSet
=== RUN   TestValueGetSet
11:47:16.662 ERROR        dht: adding value on:  <peer.ID fEAbZS> dht_test.go:165
11:47:16.662 ERROR        dht: loggableKey could not cast key to a CID: 2f762f68656c6c6f invalid cid version number: 104 lookup.go:62
11:47:16.663 ERROR        dht: loggableKey could not cast key to a CID: 2f762f68656c6c6f invalid cid version number: 104 lookup.go:62
11:47:16.663 ERROR        dht: requesting value on dht:  <peer.ID cyU1YL> dht_test.go:185
11:47:16.664 ERROR        dht: loggableKey could not cast key to a CID: 2f762f68656c6c6f invalid cid version number: 104 lookup.go:62
11:47:16.665 ERROR        dht: loggableKey could not cast key to a CID: 2f762f68656c6c6f invalid cid version number: 104 lookup.go:62
--- PASS: TestValueGetSet (0.13s)
PASS
ok  	github.com/libp2p/go-libp2p-kad-dht	0.133s

tests fail on windows

$ gx test
15:17:26.959 ERROR        dht: adding value on: <peer.ID PaD3Ee> dht_test.go:168
15:17:26.959 ERROR        dht: loggableKey could not cast key: invalid cid version number: 47 lookup.go:35
15:17:26.960 ERROR        dht: requesting value on dht: <peer.ID Sijobb> dht_test.go:188
--- FAIL: TestProvides (0.02s)
        dht_test.go:84: While setting up DHTs peerid got duplicated.
--- FAIL: TestBootstrap (0.00s)
        dht_test.go:84: While setting up DHTs peerid got duplicated.
--- FAIL: TestPeriodicBootstrap (0.02s)
        dht_test.go:84: While setting up DHTs peerid got duplicated.
--- FAIL: TestProvidesAsync (0.00s)
        dht_test.go:84: While setting up DHTs peerid got duplicated.

Externally provided peer filter to exclude on lookup

When wanting to use the DHT for other purposes, it seems the peer set is limited in size with no way to clear it (from the outside), and there is no way for me to pre-filter the peers (from the outside) before they are added. Any suggestions on the best approach? Some options:

  1. Add PeerSet.Clear()
  2. Add some kind of peer filter getter/setter on IpfsDHT (sketched after this list)
  3. Add an overload, e.g. GetClosestFilteredPeers
  4. Add something like QueryRawPeers that does not use the peerset or maybe even peerstore (but would need to return back PeerInfo itself)
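
Option 2 might look like a filter callback stored on the DHT that lookups consult before admitting a peer (hypothetical names, purely illustrative):

// QueryFilter reports whether a peer may participate in lookups.
type QueryFilter func(p peer.ID) bool

// hypothetical setter on IpfsDHT
func (dht *IpfsDHT) SetQueryFilter(f QueryFilter) { dht.queryFilter = f }

// consulted inside the query loop before adding a candidate peer p:
func (dht *IpfsDHT) peerAllowed(p peer.ID) bool {
	return dht.queryFilter == nil || dht.queryFilter(p)
}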

Dial backpressure

We currently queue up a bunch of dials, which ends up using a lot of memory. It would be nice to have some sort of backpressure mechanism.

Still getting 'unexpected EOF' from notif.go

This looks like it shouldn't be possible since 3094255, but I just got this error message on a fresh install of go-ipfs:

Agent Version go-ipfs/0.4.8/
Protocol Version ipfs/0.1.0

ERROR        dht: checking dht client type: unexpected EOF notif.go:43

The fact that it's on line 43, rather than 38 as in ipfs/kubo#3599, suggests to me that I do indeed have the newest version of notif.go. I'm not sure why, but it looks like that change somehow didn't fix what it was supposed to.

Is the PING message type used?

I can see all the other message types (see dht.proto) being used (e.g. PUT_VALUE, GET_VALUE, etc.), but PING seems unused. I'm writing a DHT crawler and this PING message seems quite useful for knowing who is online, so I'm considering writing a wrapper function for it (similar to getValueSingle).
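
A wrapper in the spirit of getValueSingle might look like this (a sketch; the pb.NewMessage and sendRequest signatures are assumed from the surrounding code and may differ between versions):

func (dht *IpfsDHT) pingSingle(ctx context.Context, p peer.ID) error {
	// PING carries no key or payload; we only care that a well-formed
	// response comes back from the peer
	pmes := pb.NewMessage(pb.Message_PING, "", 0)
	_, err := dht.sendRequest(ctx, p, pmes)
	return err
}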

Improve GetProviders Performance by around 2x

By doing a series of small improvements I was able to improve the query time in TestLargeProvidersSet by around 2x.

Improvement | Time (single CPU) | Speedup (from initial)
--- | --- | ---
Initial time | 7290 ms | 1x
Avoid Go channels #38 | 6249 ms | 1.2x
Avoid redundant key check #38 | 3791 ms | 1.92x
Fast Base32 conversion (already in master) | 3719 ms | 1.96x
Avoid unnecessary copy of datastore Value ipfs/go-datastore#59 | 3430 ms | 2.13x

Redundant Error checking

I don't see the purpose of the second err != nil check; that case is already handled in the switch. It might get optimised out during compilation, but it's still not needed.

go-libp2p-kad-dht/dht.go

Lines 150 to 162 in 654c41b

switch err {
case ErrReadTimeout:
	log.Warningf("read timeout: %s %s", p.Pretty(), key)
	fallthrough
default:
	return err
case nil:
	break
}
if err != nil {
	return err
}
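
Since every case is already handled in the switch, the trailing check can be dropped; one equivalent, slightly simpler arrangement would be (a sketch of the same logic, not the committed fix):

switch err {
case nil:
	// success: continue with the code below
case ErrReadTimeout:
	log.Warningf("read timeout: %s %s", p.Pretty(), key)
	return err
default:
	return err
}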

Examples

Stand-alone usage examples would be lovely, as godoc snippets or buildable code. :)

Add GetValuesAsync function

Just like there is FindPeersAsync, there should be a function to look up a value in the DHT asynchronously: GetValuesAsync, with no maximum number of records (just a cancellable context) and a channel to get the results.
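
A sketch of what the proposed API could look like (the result type here is a plain []byte for illustration; the existing GetValues returns RecvdVal, which another issue above argues against exposing):

// GetValuesAsync streams values for key as they are found. The returned
// channel is closed when the lookup finishes or ctx is cancelled.
func (dht *IpfsDHT) GetValuesAsync(ctx context.Context, key string) <-chan []byte {
	out := make(chan []byte)
	go func() {
		defer close(out)
		// run the usual iterative lookup here, sending each record's
		// value on out as it arrives and stopping once ctx is done
	}()
	return out
}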

Relay connection should be prioritized, especially for PutValue/Provide

Storing the discussion here so the idea doesn't get lost:

15:45 < BatO_> does someone know if there is a reason for the choice of Kademlia for the DHT ?
15:57 < whyrusleeping> BatO_: because its the state of the art in terms of DHTs
15:57 < whyrusleeping> really we're supposed to have implemented a mixture of S/Kademlia and coral
16:11 < BatO_> whyrusleeping: i'm asking because some other DHT design implement what they call a leaf set in addition to the routing table
16:12 < whyrusleeping> like which DHTs?
16:12 < whyrusleeping> chord?
16:13 < BatO_> pastry, chimera, bamboo
16:14 < BatO_> each node maintain a list of the closest nodes, and they propagate values and request to them
16:15 < whyrusleeping> thats because routing in those systems is done relative to the peer in question, not to the content they are dealing with
16:15 < whyrusleeping> if kademlia kept a table of the closest peers for each key it dealt with, it would quickly run out of memory
16:15 < whyrusleeping> what would likely help though is caching findclosestpeers responses
16:16 < whyrusleeping> at least for some short period of time
16:17 < BatO_> hmm, i'm not sure which one of us didn't get it
16:17 < BatO_> each node maintain a list of the closest nodes to himself
16:18 < whyrusleeping> kademlia does this
16:18 < whyrusleeping> via the routing table
16:19 < whyrusleeping> the higher order the bucket of the kademlia routing table, the closer the peers are to us
16:19 < whyrusleeping> unless by 'close' you mean latency
16:19 < BatO_> yes but there is no relay of the query, especially not the putValues queries
16:19 < whyrusleeping> yeah, relaying queries is bad
16:20 < whyrusleeping> it leads to very easy amplification attacks
16:20 < BatO_> even if it's just the end ?
16:22 < BatO_> i'm asking because, if a node cannot connect to the target node and it's neighbor to store a value, the query later will perform badly
16:22 < BatO_> relaying just the end could mitigate that
16:23 < BatO_> i'm just wondering though, i'm not sure if that's actually a problem now
16:23 < whyrusleeping> if the nodes online, and a node we're connected to could relay the value for us, we could just use the circuit relay to connect to that target node and do the put instead
16:23 < whyrusleeping> actually
16:23 < whyrusleeping> thats a great idea BatO_
16:23 < whyrusleeping> the dht should prioritize using relayed connection for DHT queries
16:24 < BatO_> huu, yeah, i totally had this idea ;)
16:24 < whyrusleeping> more dht nodes would have to support relaying for this to be effective
16:24 < whyrusleeping> but
16:24 < whyrusleeping> if we get that implemented, it should really speed things up

@whyrusleeping

Are there bootstrap seed nodes?

In dht_bootstrap.go I somehow expected seed nodes to be defined somewhere. How does the very first connection to the DHT work? Presumably there needs to be at least one seed node.

Tests Hang

Saw a test run hang on CI; don't yet have enough information to debug (no idea which test hung).

( @Kubuxu )

What is clientOnly?

The IpfsDHT object defined in dht.go has a clientOnly bool field which seems to be unused.

What is this field? Is it really unused, and if not what is its use?

DHT Query Performance

DHT queries are slow, on the order of several seconds, which is a performance problem.
This issue is here to discuss performance and propose ways to measure and improve it.

backoff triggered inappropriately by `context canceled` errors from DHT queries.

It seems like backoff is getting triggered inappropriately by context canceled errors from DHT queries.

I have been unable to get the tests to run from my own fork of go-libp2p-kad-dht because of my fights with gx; however, the problem can be duplicated on the holochain codebase, which uses a variant of kad-dht.

Here is a test that reveals the problem. It simply creates 10 nodes and has each node query the other nodes in a nested for loop.

When I run it, the test invariably fails because this line in swarm_dial.go (https://github.com/libp2p/go-libp2p-swarm/blob/7edb555bea6f8c4a3183d79cedc0cd0ebced06da/swarm_dial.go#L188) gets called after a context canceled error, which I'm pretty sure comes from the query, and is interpreted as a connection failure rather than the cancellation it really is.

As an experiment, I changed that line to:

if err.Error() != "context canceled" {
	s.backf.AddBackoff(p) // let others know to backoff
}

And then my test runs just fine, which indicates that the nodes are actually all available for connection; it's just that the backoff from the cancel causes a subsequent connection to the same node that was cancelled to be treated incorrectly as a network failure.
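
If that diagnosis is right, a slightly less fragile version of the same experiment would compare against the error value instead of its string (on newer Go with error wrapping, errors.Is(err, context.Canceled) is the more general form):

if err != context.Canceled {
	s.backf.AddBackoff(p) // only back off on real dial failures
}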
