dht-crawler's People

Contributors

0xcaff

dht-crawler's Issues

Add Readme

This project is awesome but no one can tell because there isn't a readme.

Iterative Query Helper Method

Make a helper that asynchronously and iteratively issues find_node and get_node queries. For bootstrapping we would only need find_node (find_node(self)).

Timeout and Dropping Behavior

Currently, the TransactionFuture doesn't clean up when it is dropped. It should do this so timeouts can be easily implemented.
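One way this could look is a `Drop` implementation that removes the pending entry. This is only a sketch: `TransactionHandle`, `TransactionId`, and `Pending` are hypothetical stand-ins, not the project's actual types, and the shared map is an assumption about how in-flight transactions are tracked.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical identifier for an in-flight query.
type TransactionId = u32;

/// Hypothetical shared table of in-flight transactions. The `()` value stands
/// in for whatever per-request state the real client keeps.
type Pending = Arc<Mutex<HashMap<TransactionId, ()>>>;

/// Stand-in for the project's TransactionFuture; only the cleanup is shown.
struct TransactionHandle {
    id: TransactionId,
    pending: Pending,
}

impl TransactionHandle {
    /// Registers the transaction and returns a handle that owns the entry.
    fn register(id: TransactionId, pending: Pending) -> Self {
        pending.lock().unwrap().insert(id, ());
        TransactionHandle { id, pending }
    }
}

impl Drop for TransactionHandle {
    fn drop(&mut self) {
        // Runs whenever the handle goes away: on completion, on cancellation,
        // or when a timeout wrapper drops it.
        if let Ok(mut map) = self.pending.lock() {
            map.remove(&self.id);
        }
    }
}
```

With cleanup tied to `Drop`, a timeout only has to drop the future; nothing else needs to know about the pending table.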

Better documentation

How is this tool used, and what can be achieved with it? Can I get torrent infohashes, and how are they stored?

Bucket Refresh

From BEP0005:

Each bucket should maintain a "last changed" property to indicate how "fresh" the contents are. When a node in a bucket is pinged and it responds, or a node is added to a bucket, or a node in a bucket is replaced with another node, the bucket's last changed property should be updated. Buckets that have not been changed in 15 minutes should be "refreshed." This is done by picking a random ID in the range of the bucket and performing a find_nodes search on it. Nodes that are able to receive queries from other nodes usually do not need to refresh buckets often. Nodes that are not able to receive queries from other nodes usually will need to refresh all buckets periodically to ensure there are good nodes in their table when the DHT is needed.

I believe this can also be used to do bootstrapping. First, the bootstrap nodes are put into the routing table. Next, the only bucket in the routing table is refreshed. This would be the only time (aside from adding the bootstrap node) that nodes are added to the routing table.

Need to think about the ownership of the bucket refresh process.
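A rough sketch of the bookkeeping described in the BEP0005 excerpt, independent of who ends up owning the refresh process. The `Bucket` layout (a 20-byte prefix plus a bit count) is an assumption for illustration, and the random bytes are passed in by the caller so the sketch stays free of external crates.

```rust
use std::time::{Duration, Instant};

/// Buckets unchanged for this long should be refreshed (15 minutes per BEP0005).
const REFRESH_AFTER: Duration = Duration::from_secs(15 * 60);

/// Hypothetical bucket: every ID in it shares the first `prefix_bits` bits of `prefix`.
struct Bucket {
    prefix: [u8; 20],
    prefix_bits: usize, // must be at most 160
    last_changed: Instant,
}

impl Bucket {
    /// Call when a node responds to a ping, is added, or is replaced.
    fn touch(&mut self, now: Instant) {
        self.last_changed = now;
    }

    /// True once the bucket has gone unchanged for the refresh interval.
    fn needs_refresh(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_changed) >= REFRESH_AFTER
    }

    /// Builds an ID inside the bucket's range: the leading `prefix_bits` bits
    /// come from the prefix, the rest from caller-supplied random bytes.
    fn id_in_range(&self, random: [u8; 20]) -> [u8; 20] {
        let mut id = random;
        for bit in 0..self.prefix_bits {
            let byte = bit / 8;
            let mask = 0x80u8 >> (bit % 8);
            id[byte] = (id[byte] & !mask) | (self.prefix[byte] & mask);
        }
        id
    }
}
```

The ID returned by `id_in_range` would then be the target of the find_nodes search that refreshes the bucket.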

Rename Envelope -> Message

Envelopes usually contain a binary string with an encoded message. Our Envelope isn't an envelope in that sense.

Logging

Implement logging.

There should be a way to keep track of relevant events. It also shouldn't get in the way of the DHT client.

Stateless Tokens

Currently, verifying the validity of a token requires having state, and that state needs to be updated every 15 minutes. We can remove the need for this state by sending an encrypted blob which includes the IP address and the time at which the token expires. This would work similarly to JWT, but with a more compact encoding and a fixed encryption algorithm.

Network Penetration

We would like to maximize the number of IP addresses and torrent infohashes collected.

We will collect infohashes whenever we see them. For example, whenever we receive a request for get_peer or announce_peer, the infohash will be collected.

We will collect IP addresses whenever someone calls get_peer or announce_peer. BEP42 says that only peers with valid node IDs are considered valid storage targets, which makes it difficult to scale up collection from announce_peer without many IP addresses.

Prior work in this area creates an abusive client which pretends to be near the node IDs it finds: https://github.com/boramalper/magnetico

We need to figure out a strategy which balances being good and collecting tons of information.

Stub Out Routing Table

Operations:

  • Add Node
  • Get Nodes Nearest To NodeID
  • Get Node ID's Contact Information
  • Schedule Node Updates
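A possible starting point for the stub, covering the first three operations with a flat map and XOR distance. `RoutingTable`, `NodeId`, and the method names are placeholders rather than the project's API, and scheduling node updates is left out of this sketch.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// 160-bit node ID, as used by the BitTorrent DHT.
type NodeId = [u8; 20];

/// XOR distance between two IDs. Comparing the result lexicographically is
/// the same as comparing the distances as big-endian integers.
fn distance(a: &NodeId, b: &NodeId) -> NodeId {
    let mut out = [0u8; 20];
    for i in 0..20 {
        out[i] = a[i] ^ b[i];
    }
    out
}

/// Hypothetical stub: a flat map instead of real buckets.
#[derive(Default)]
struct RoutingTable {
    nodes: HashMap<NodeId, SocketAddr>,
}

impl RoutingTable {
    /// Add Node.
    fn add_node(&mut self, id: NodeId, addr: SocketAddr) {
        self.nodes.insert(id, addr);
    }

    /// Get Node ID's Contact Information.
    fn contact(&self, id: &NodeId) -> Option<SocketAddr> {
        self.nodes.get(id).copied()
    }

    /// Get Nodes Nearest To NodeID, closest first.
    fn nearest(&self, target: &NodeId, count: usize) -> Vec<NodeId> {
        let mut ids: Vec<NodeId> = self.nodes.keys().copied().collect();
        ids.sort_by_key(|id| distance(id, target));
        ids.truncate(count);
        ids
    }
}
```

A flat map makes `nearest` linear in the table size, which is fine for a stub; real buckets can replace it later without changing the method signatures.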

Timeouts

Currently, if a request is sent and no response is received, the Future is polled forever. The request should time out and return an error instead.

Write Bootstrap Method

There should be a method, bootstrap. It returns a future which, while polled, keeps the DHT updated by:

  • handling inbound messages
  • updating peers as needed

It should do the following.

  • Bootstrap
  • Generate Random Node ID
  • Fetch Nodes Near Us
  • Add Them to Routing Table

Handle Congestion

After using a timeout for our only outbound request (#24), we've now encountered another problem: if too much traffic is sent at a time, some of it gets dropped. During bootstrap, eventually so much traffic is sent at the same time that all requests fail and the bootstrap dies.

Some ideas on how to handle this:

  1. With a limited number of retries, resend the request after a fixed interval. This could work but requests might pile up and get resent at the same time just moving the problem into the future.
  2. Don't send so many requests when bootstrapping. libtorrent and synapse only send a few requests to bootstrap. Bootstrapping is done after 32 nodes are in the routing table (or buckets for synapse?). We are discovered by other nodes responding to find_nodes with our information.
  3. Only allow a certain number of requests to be sent in a time interval. This is going down the road of measuring the available bandwidth-delay product (very complicated).
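Idea 3 does not have to start with bandwidth measurement. A fixed-window counter is a much cruder variant that could be tried first. In this sketch `WindowLimiter` and its fields are made-up names, and the caller passes in the current time so the logic is easy to test.

```rust
use std::time::{Duration, Instant};

/// Hypothetical fixed-window limiter: at most `max_per_window` sends per `window`.
struct WindowLimiter {
    max_per_window: u32,
    window: Duration,
    window_start: Instant,
    sent: u32,
}

impl WindowLimiter {
    fn new(max_per_window: u32, window: Duration, now: Instant) -> Self {
        WindowLimiter { max_per_window, window, window_start: now, sent: 0 }
    }

    /// Returns true if a request may be sent now, and counts it against the window.
    fn try_acquire(&mut self, now: Instant) -> bool {
        // Start a fresh window once the current one has elapsed.
        if now.saturating_duration_since(self.window_start) >= self.window {
            self.window_start = now;
            self.sent = 0;
        }
        if self.sent < self.max_per_window {
            self.sent += 1;
            true
        } else {
            false
        }
    }
}
```

Requests that are refused would need to be queued or delayed by the caller. A fixed window still allows a burst at each window boundary, so this limits the rate without smoothing it.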

How others handle this:

  • Synapse. Doesn't handle it. Ignores dropped requests.
  • anacrolix/dht. Handles by re-sending after a fixed interval a limited number of times.
  • libtorrent. Doesn't seem to handle it. I'm not sure exactly how libtorrent handles bootstrapping either. I think it just bootstraps a few nodes and gets other nodes by gossiping.
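For comparison, the fixed-interval, limited-retry approach (idea 1, and what anacrolix/dht is described as doing above) is small to write down. This sketch is blocking and uses `thread::sleep` purely for illustration. `retry_fixed` is a hypothetical helper, and a real version in this codebase would presumably be a future driven by a timer.

```rust
use std::thread;
use std::time::Duration;

/// Calls `op` until it succeeds or `max_attempts` attempts have been made,
/// waiting `interval` between attempts. `op` is always called at least once.
fn retry_fixed<T, E>(
    max_attempts: u32,
    interval: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                attempt += 1;
                thread::sleep(interval);
            }
        }
    }
}
```

Because every failed request waits the same interval, requests that failed together are also resent together, which is the pile-up concern raised in idea 1.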

Add Top Level `get_nodes`

Problem

The DHT implementation isn't really useful because it can't be used as a library. get_nodes and announce_peer are needed for this to work well.

Make Packages

Encapsulation, tests and documentation suck. Use packages as a forcing function to make these things better.
