
DHT crawling about bittorrent-dht HOT 5 CLOSED

webtorrent commented on May 21, 2024
DHT crawling

from bittorrent-dht.

Comments (5)

feross commented on May 21, 2024

You can't really query a DHT node to find out what torrents it is tracking (as far as I'm aware). You can do the reverse though (look up an infohash to get the peers for it), then build a map of ip -> infohash as you crawl the DHT.
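
A minimal sketch of such an ip -> infohash map, assuming peers arrive as a host, a port and a 20-byte infohash Buffer. The `PeerIndex` class and its method names are hypothetical, used only for illustration:

```javascript
// Hypothetical helper: remembers which infohashes each peer address was seen for.
class PeerIndex {
  constructor() {
    // "host:port" -> Set of hex-encoded infohashes
    this.byAddress = new Map();
  }

  // Records one sighting. Returns true if this (address, infohash) pair is new.
  record(host, port, infoHash) {
    const address = `${host}:${port}`;
    const hex = Buffer.isBuffer(infoHash)
      ? infoHash.toString('hex')
      : String(infoHash).toLowerCase();

    let hashes = this.byAddress.get(address);
    if (!hashes) {
      hashes = new Set();
      this.byAddress.set(address, hashes);
    }
    const isNew = !hashes.has(hex);
    hashes.add(hex);
    return isNew;
  }

  // Lists the hex infohashes seen for one address (empty if unknown).
  infoHashesFor(host, port) {
    return [...(this.byAddress.get(`${host}:${port}`) ?? [])];
  }
}
```

With bittorrent-dht, `record(peer.host, peer.port, infoHash)` could be called from the `'peer'` event handler, which receives the peer and the infohash it was found for.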


feross commented on May 21, 2024

@fbodz Looks like they're just issuing repeated, pseudo-random find_node or get_peers queries to the DHT and printing out a message whenever they find a new infohash.
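
For reference, here is a rough sketch of what one such get_peers query looks like on the wire, following the KRPC message layout in BEP 5. bittorrent-dht builds these messages internally; `bencode`, `buildGetPeersQuery` and `randomTarget` below are hypothetical helpers written only for this illustration, and the encoder covers just the types a query needs:

```javascript
// CommonJS require so the snippet runs as a plain .js file.
const crypto = require('node:crypto');

// Minimal bencode encoder: byte strings, integers, lists and dictionaries.
function bencode(value) {
  if (Buffer.isBuffer(value)) {
    return Buffer.concat([Buffer.from(`${value.length}:`), value]);
  }
  if (typeof value === 'string') return bencode(Buffer.from(value, 'utf8'));
  if (Number.isInteger(value)) return Buffer.from(`i${value}e`);
  if (Array.isArray(value)) {
    return Buffer.concat([Buffer.from('l'), ...value.map(bencode), Buffer.from('e')]);
  }
  if (value && typeof value === 'object') {
    const parts = [Buffer.from('d')];
    // Dictionary keys must be sorted; plain sort() is enough for ASCII keys.
    for (const key of Object.keys(value).sort()) {
      parts.push(bencode(key), bencode(value[key]));
    }
    parts.push(Buffer.from('e'));
    return Buffer.concat(parts);
  }
  throw new TypeError(`cannot bencode ${typeof value}`);
}

// A pseudo-random 20-byte target to aim the next query at.
function randomTarget() {
  return crypto.randomBytes(20);
}

// Builds a get_peers query: t = transaction ID, y = "q" (query),
// q = method name, a = arguments (our node ID and the target infohash).
function buildGetPeersQuery(nodeId, infoHash, transactionId = crypto.randomBytes(2)) {
  return bencode({
    t: transactionId,
    y: 'q',
    q: 'get_peers',
    a: { id: nodeId, info_hash: infoHash },
  });
}
```

The resulting Buffer would be sent over UDP to a DHT node. A crawler repeats this with a new `randomTarget()` each time and records any infohashes it has not seen before.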

So, in this way, you can discover infohashes. If you combine that with a torrent client that connects to peers and downloads the metadata (via the ut_metadata extension, supported in webtorrent btw), you can learn what the torrent actually contains.
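
One detail worth keeping in mind when downloading metadata from untrusted peers: for a v1 torrent the infohash is the SHA-1 of the bencoded info dictionary, so the downloaded bytes can be checked against the infohash found on the DHT. A minimal sketch, assuming `metadata` holds the raw bencoded info dictionary (the function name is hypothetical):

```javascript
// CommonJS require so the snippet runs as a plain .js file.
const crypto = require('node:crypto');

// Returns true when the SHA-1 of the raw info dictionary equals the infohash.
// infoHash may be a 20-byte Buffer or a 40-character hex string.
function metadataMatchesInfoHash(metadata, infoHash) {
  const expected = Buffer.isBuffer(infoHash) ? infoHash : Buffer.from(infoHash, 'hex');
  const actual = crypto.createHash('sha1').update(metadata).digest();
  return actual.equals(expected);
}
```

Metadata that fails this check should be discarded and requested again from another peer.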


ralyodio commented on May 21, 2024

Has anyone written a tutorial or script on how to do this?


ralyodio commented on May 21, 2024

Can someone help me with this code? I'm trying to crawl the DHT (using ChatGPT to help me, but it's pretty buggy):

import DHT from 'bittorrent-dht';
import bencode from 'bencode';
import Protocol from 'bittorrent-protocol';
import net from 'net';
import Tracker from 'bittorrent-tracker';
import crypto from 'crypto';
import dotenv from 'dotenv-flow';
import Surreal from 'surrealdb.js';
import BaseController from './base.js';
import { Account } from '../../src/models/account.js';

dotenv.config()
const { DB_RPC_URL, DB_USER, DB_PASS, DB_NS, DB_DB, DB_PORT } = process.env;

export default class DHTCrawler extends BaseController {
    constructor(targetNodes = 1000) {
        super();
        // this.db = new Surreal(DB_RPC_URL);
        // this.account = new Account(this.db)
        this.targetNodes = targetNodes;
        this.dht = new DHT();
        this.discoveredInfoHashes = new Set();
    }

    async init() {
        await new Promise((resolve) => {
            this.dht.on('ready', () => {
                console.log('DHT is ready');
                resolve();
            });
        });

        this.dht.on('announce', (peer, infoHash) => {
            const { host, port } = peer;
            console.log(`announce: ${host}:${port} ${infoHash.toString('hex')}`);
        });

        this.dht.on('peer', (peer, infoHash, from) => {
            // peer is a { host, port } object, so format it explicitly.
            console.log('peer:', `${peer.host}:${peer.port}`, infoHash.toString('hex'));
            const infoHashHex = infoHash.toString('hex');

            if (!this.discoveredInfoHashes.has(infoHashHex)) {
                this.discoveredInfoHashes.add(infoHashHex);
                console.log(`Discovered infohash: ${infoHashHex}`);
                this.fetchMetadata(infoHash, peer);
                this.lookupNext(infoHash);
            }
        });

        // bittorrent-dht reports newly found DHT nodes through its 'node' event.
        // Node IDs are not infohashes, so they are logged rather than stored
        // in discoveredInfoHashes.
        this.dht.on('node', (node) => {
            console.log(`Discovered node: ${node.host}:${node.port}`);
        });


        // Bootstrap the DHT crawler with a known DHT node.
        this.dht.addNode({
            host: 'router.bittorrent.com',
            port: 6881
        });

        console.log('DHT bootstrap completed');
        await this.lookupNext();
    }

    async fetchMetadata(infoHash, peer) {
        // Metadata exchange (BEP 9) is handled by the separate ut_metadata package
        // (npm install ut_metadata). It is loaded here so that this method does not
        // depend on an extra import at the top of the file.
        const { default: utMetadata } = await import('ut_metadata');
        const socket = new net.Socket();
        const wire = new Protocol();
        // bencode may return Buffers or Uint8Arrays depending on its version.
        const text = (bytes) => Buffer.from(bytes).toString('utf-8');

        const onMetadata = (metadata) => {
            const torrent = bencode.decode(metadata);
            console.log('Torrent metadata:', {
                infoHash: infoHash.toString('hex'),
                name: text(torrent.info.name),
                // Each file path is a list of path segments.
                files: torrent.info.files
                    ? torrent.info.files.map((file) => file.path.map(text).join('/'))
                    : [],
            });

            this.getSeedersAndLeechers(infoHash);
            socket.destroy();
        };

        socket.setTimeout(5000, () => {
            socket.destroy();
        });

        // Unreachable peers are common; an unhandled 'error' event would crash the process.
        socket.on('error', () => {
            socket.destroy();
        });

        socket.connect(peer.port, peer.host, () => {
            socket.pipe(wire).pipe(socket);
            // Register the extension before the handshake so it is advertised to the peer.
            wire.use(utMetadata());
            wire.ut_metadata.on('metadata', onMetadata);
            // A random 20-byte peer ID is enough for a one-off metadata fetch.
            wire.handshake(infoHash, crypto.randomBytes(20), { dht: true });
        });

        wire.on('handshake', () => {
            // Ask the peer for the metadata once it has answered the handshake.
            wire.ut_metadata.fetch();
        });

        wire.on('timeout', () => {
            socket.destroy();
        });

        wire.on('close', () => {
            socket.destroy();
        });
    }

    getSeedersAndLeechers(infoHash) {
        const infoHashHex = infoHash.toString('hex');
        const client = new Tracker({
            infoHash: infoHash,
            // A random 20-byte peer ID; the DHT instance has a nodeId but no peerId.
            peerId: crypto.randomBytes(20),
            announce: ['udp://tracker.openbittorrent.com:80'],
            // bittorrent-tracker also expects the port this client would listen on.
            port: 6881,
        });

        client.start();

        client.once('update', (data) => {
            console.log('Torrent seeders and leechers:', {
                infoHash: infoHashHex,
                seeders: data.complete,
                leechers: data.incomplete,
            });
            client.stop();
        });

        client.on('error', (err) => {
            console.error(`Error getting seeders and leechers for ${infoHashHex}:`, err.message);
            client.stop();
        });
    }

    async lookupNext(infoHash) {
        if (this.dht.nodes.count() >= this.targetNodes) {
            console.log('Reached target node count');
            return;
        }

        if (!infoHash) {
            infoHash = crypto.randomBytes(20);
        }
        try {
            await new Promise((resolve, reject) => {
                this.dht.lookup(infoHash, (err) => {
                    if (err) {
                        reject(err);
                    } else {
                        resolve();
                    }
                });
            });
        } catch (err) {
            console.error('Error during lookup:', err);
        }


        // Continue with a fresh random target instead of repeating the same lookup.
        setTimeout(() => this.lookupNext(), 1000);
    }
}

const crawler = new DHTCrawler();
crawler.init();
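
A crawl loop along the lines of lookupNext can also be sketched without recursion or timers that reschedule themselves. This is only an illustration under stated assumptions: `crawl` is a hypothetical helper, and `lookup` is any function that takes a 20-byte target and returns a promise, for example a promisified wrapper around `dht.lookup`.

```javascript
// CommonJS require so the snippet runs as a plain .js file.
const crypto = require('node:crypto');

// Runs `rounds` lookups, each against a fresh random 20-byte target,
// pausing `delayMs` between rounds. Returns the hex targets that were tried.
async function crawl(lookup, { rounds = 10, delayMs = 1000 } = {}) {
  const targets = [];
  for (let i = 0; i < rounds; i++) {
    const target = crypto.randomBytes(20);
    targets.push(target.toString('hex'));
    try {
      await lookup(target);
    } catch (err) {
      // A failed lookup should not stop the crawl.
      console.error('Error during lookup:', err.message);
    }
    if (i < rounds - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return targets;
}
```

With bittorrent-dht, `lookup` could be `(target) => new Promise((resolve, reject) => dht.lookup(target, (err) => (err ? reject(err) : resolve())))`, while the `'peer'` handler keeps collecting whatever infohashes turn up.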


ralyodio commented on May 21, 2024

https://stackoverflow.com/questions/77810843/how-can-i-index-the-bittorrent-dht-properly-for-infohashes

https://www.youtube.com/watch?app=desktop&v=cvQrNoCwxgE

