combinatorrent's Introduction

Combinatorrent - a bittorrent client.

Introduction

This is a bittorrent client. I am the introduction document and I need to be written by some generous soul!

Installation

Here is what I do to install haskell-torrent locally on my machine:

cabal install --prefix=$HOME --user

Since we are using the magnificent cabal, this is enough to install haskell-torrent in our $HOME/bin directory.

Usage

Combinatorrent can currently only do one very simple thing. If you call it with

Combinatorrent foo.torrent

then it will begin downloading the file in foo.torrent to the current directory via the Bittorrent protocol.

Protocol support

Currently haskell-torrent supports the following BEPs (see the BEP Process document for an explanation of these):

  • 0003, 0004, 0006, 0010, 0020

Combinatorrent implicitly supports these extensions:

  • 0027: Supported by virtue of only supporting a single tracker and no DHT.

Partially supported extensions:

  • 0007: Combinatorrent understands and uses the "peers6" response from the tracker to connect clients. On the other hand, it does nothing to provide the "ipv4=" and "ipv6=" keys on tracker requests. As such, it can be claimed that 0007 support is available, as everything we left out is only qualified as MAY.

  • 0023: Combinatorrent supports the "compact" response only, although it is explicitly stated that the client must support both. In practice it has little impact as all modern trackers will only return compact responses anyway.
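
To make the "compact" response concrete: BEP 0023 packs each IPv4 peer into six bytes, four for the address and two for the port, all in network byte order. The following is a minimal sketch of decoding such a list. The names `PeerAddr` and `decodeCompact` are hypothetical and are not taken from the Combinatorrent source, and the input is modelled as a plain list of bytes rather than a ByteString.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word16, Word8)

-- | Hypothetical type: four IPv4 address octets plus a TCP port.
data PeerAddr = PeerAddr (Word8, Word8, Word8, Word8) Word16
  deriving (Eq, Show)

-- | Decode a compact peer list, six bytes per peer.
-- Returns Nothing when the input length is not a multiple of six.
decodeCompact :: [Word8] -> Maybe [PeerAddr]
decodeCompact [] = Just []
decodeCompact (a : b : c : d : hi : lo : rest) =
  (PeerAddr (a, b, c, d) port :) <$> decodeCompact rest
  where
    -- The port is big-endian: high byte first.
    port = (fromIntegral hi `shiftL` 8) .|. fromIntegral lo
decodeCompact _ = Nothing
```

For example, the six bytes 10, 0, 0, 1, 0x1A, 0xE1 decode to the address 10.0.0.1 with port 6881.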

Combinatorrent does not support these BEPs yet, but strives to do so one day:

  • 0005, 0009, 0012, 0015, 0016, 0017, 0018, 0019, 0021, 0022, 0024, 0026, 0028, 0029, 0030, 0031, 0032

Debugging

For debugging, jlouis tends to use the following:

make conf build test

This builds Combinatorrent with the Debug flag set and also builds the software with profiling by default, so it is easy to hunt down performance regressions. It also runs the internal test-suite for various values. There are a couple of interesting targets in the top-level Makefile.

Reading material for hacking Combinatorrent:

  • Protocol specification - BEP0003: This is the original protocol specification, tracked into the BEP process. It is worth reading because it explains the general overview and the precision with which the original protocol was written down.

  • Bittorrent Enhancement Process - BEP0000: The BEP process is an official process for adding extensions on top of the BitTorrent protocol. It allows implementors to mix and match the extensions making sense for their client and it allows people to discuss extensions publicly in a forum. It also provisions for the deprecation of certain features in the long run as they prove to be of less value.

  • wiki.theory.org: An alternative description of the protocol. This description is in general much more detailed than the BEP structure. It is worth a read because it acts somewhat as a historic remark and a side channel. Note that there is some commentary on these pages which can be disputed quite a lot.

  • "Supervisor Behaviour" From the Erlang documentation. How the Erlang Supervisor behaviour works. The Supervisor and process structure of Combinatorrent is somewhat inspired by the Erlang ditto.

Source code Hierarchy

  • Data: Data structures.

    • Queue: Functional queues. Standard variant with two lists.
    • PendingSet: A wrapper around Data.PSQueue for tracking how common a piece is.
    • PieceSet: BitArrays of pieces and their operations.
  • Process: Process definitions for the different processes comprising Combinatorrent

    • ChokeMgr: Manages choking and unchoking of peers, based upon the current speed of the peer and its current state. Global for multiple torrents.
    • Console: Simple console process. Only responds to 'quit' at the moment.
    • DirWatcher: Watches a directory and adds any torrent present in it.
    • FS: Process managing the file system.
    • Listen: Not used at the moment. Step towards listening sockets.
    • Peer: Several process definitions for handling peers. Two for sending, one for receiving and one for controlling the peer and handling its state.
    • PeerMgr: Management of a set of peers for a single torrent.
    • PieceMgr: Keeps track of what pieces have been downloaded and what are missing. Also hands out blocks for downloading to the peers.
    • Status: Keeps track of uploaded/downloaded/left bytes for a single torrent. Could be globalized.
    • Timer: Timer events.
    • TorrentManager: Manages torrents at the top-level.
    • Tracker: Communication with the tracker.
  • Protocol: Modules for interacting with the various bittorrent protocols.

    • BCode: The bittorrent BCode coding. Used by several protocols.
    • Wire: The protocol used for communication between peers.
  • Top Level:

    • Channels: Various Channel definitions.
    • Combinatorrent: Main entry point to the code. Sets up processes.
    • Digest: SHA1 digests as used in the bittorrent protocol.
    • FS: Low level Filesystem code. Interacts with files.
    • Process: Code for Erlang-inspired processes.
    • RateCalc: Rate calculations for a network socket. We use this to keep track of the current speed of a peer in one direction.
    • Supervisor: Erlang-inspired Supervisor processes.
    • Test.hs: Code for test-framework
    • TestInstance.hs: Various helper instances not present in the test framework by default
    • Torrent: Various helpers and types for Torrents.
    • Tracer: Code for simple "ring"-like tracing.
    • Version.hs.in: Generates Version.hs via the configure script.
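
As an illustration of the "standard variant with two lists" mentioned for the Queue module above, here is a minimal sketch of that classic structure. It is not the project's actual code: the names `Queue`, `emptyQ`, `push`, `pop` and `queueToList` are assumptions made for this example.

```haskell
-- | Hypothetical two-list queue: elements are popped from the front list
-- and pushed onto the back list, which is kept in reverse order.
data Queue a = Queue [a] [a]

emptyQ :: Queue a
emptyQ = Queue [] []

-- | Pushing is O(1): cons onto the reversed back list.
push :: a -> Queue a -> Queue a
push x (Queue front back) = Queue front (x : back)

-- | Popping is amortized O(1): the back list is reversed only when
-- the front list runs empty.
pop :: Queue a -> Maybe (a, Queue a)
pop (Queue [] []) = Nothing
pop (Queue [] back) = pop (Queue (reverse back) [])
pop (Queue (x : front) back) = Just (x, Queue front back)

-- | All elements in the order they would be popped.
queueToList :: Queue a -> [a]
queueToList (Queue front back) = front ++ reverse back
```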

combinatorrent's People

Contributors

abhin4v, astro, axman6, erikd, jlouis, johngunderman, nikmikov, saizan, thomaschrstnsn, trofi, unkindpartition, vincenthz

combinatorrent's Issues

Use intelligent flushing of the sender queue

Currently, the send queue just flushes itself after each message has been delivered. If we have #5 implemented, we can choose to flush on a rarer basis than right now. This would definitely improve our system and help the kernel some more.

Do not send HAVE messages if the peer already has the piece.

When we complete a piece at our end, it is mandatory to send a HAVE message to other peers so they can begin requesting the piece from us. However, if the given peer already possesses the given piece, there is no need to send a message over the wire to that peer.

This optimization is fairly straightforward since the Peer process already has all the needed information at its disposal.
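
A minimal sketch of the filter described here, assuming each peer record carries the set of pieces that peer is known to have. The `Peer` type and `haveRecipients` function are hypothetical, and a plain list stands in for whatever piece set the real Peer process keeps.

```haskell
type PieceNum = Int

-- | Hypothetical peer record: a name and the pieces the peer already has.
data Peer = Peer
  { peerName :: String
  , peerHas :: [PieceNum]
  }

-- | Names of the peers that still need a HAVE message for the given piece.
-- Peers that already hold the piece are skipped.
haveRecipients :: PieceNum -> [Peer] -> [String]
haveRecipients pn peers =
  [peerName p | p <- peers, pn `notElem` peerHas p]
```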

Handle Snubbing correctly.

When a peer has not sent us any data for some time he is snubbing us. Handle this to more aggressively go for new peers.

Optimize block reading access

Right now, 16k blocks are read fairly early and then kept in a queue of requested pieces. So a client requesting 32 blocks will have a memory consumption of at least 32 x 16K = 512K. That is far too much, if we expect there to be 40-100 connections. It amounts to something along the lines of 20-50 megabytes of waste.

It is possible to optimize this. The first part is simply to only request this at the last possible time so the fetched data can be thrown away after it has gone down the wire. It will also pave the way for the fast extension SUGGEST option.
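
The memory estimate above can be checked with a few lines of arithmetic. This is only a worked example; `blockSize` and `queuedBytes` are names made up for it.

```haskell
-- | Size of one block in bytes (16K, as in the text above).
blockSize :: Int
blockSize = 16 * 1024

-- | Bytes held in memory when every connection has the given number
-- of blocks read ahead of time.
queuedBytes :: Int -> Int -> Int
queuedBytes connections blocksPerPeer =
  connections * blocksPerPeer * blockSize
```

With 32 blocks per peer, `queuedBytes 40 32` is 20971520 bytes (20 MiB) and `queuedBytes 100 32` is 52428800 bytes (50 MiB), which matches the 20-50 megabyte range quoted above.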

Improve the command line parser

The command-line parser is currently quite weak and simple. It can be improved to handle many more commands and do much more than what is currently possible.

If you are feeling really adventurous, you should first play a couple of Infocom games, play with the inform interactive fiction creator tool, take a bit of craziness and implement "Adventure for a torrent". Less than this will do as well however :)

Optimize Piece Manager requests

When we grab pieces from the Piece Manager, let it provide us with a pruned set of pieces we can ask about later. This way, we only need to consider pieces we already have once and we get a faster system. When doing this, only prune pieces which are done and checked.

Currently, this is not in the hot code path, so it is not that important to pull off yet.

Add support for partial downloads.

This is a popular feature in modern clients. Rather than having to download everything, you allow the client to just download part of the file and then stop when the partial data is downloaded.

There are some consequences for getting this to work in the client, so it might be worth analyzing and planning a bit ahead before embarking on doing it.

PieceMgr assertion failure

"PieceMgrP"(Fatal): Process exiting due to ex: user error (P/Blk (655,Block {blockOffset = 81920, blockSize = 16384}) is in the HaveBlocks set)
"ConsoleP"(Info):   Process Terminated by Supervisor

This bug manifests itself as a failure in the PieceManager. Specifically, the above is an assertion failure because a block is both in the set of blocks we have and in the set of blocks we are currently downloading. It may be caused by stray blocks, which are a part of the bittorrent protocol without the FAST extension.

Improve the HTML pages

My lack of HTML/Webdev skills is showing. If anybody wants to improve it, they are free to do so!

Add support for multiple trackers.

There is an extension for supporting multiple trackers in the same torrent. This extension should be added. There are numerous torrents out there where the main tracker is long dead and gone but the "backup" trackers in the multi-tracker list are still up and strong.

Send Queue Optimization

Currently, the send queue is one queue on which messages flow. The reason we are requesting fairly small 16k piece blocks is because we might want to interleave other messages in the queue stream. We don't do any kind of optimization on this at the moment.

If you take a look at the SCTP protocol, it has a session design in which sessions are multiplexed on top of the same line. That way, you could run a control channel independent of the data channel. We want to simulate this construction in combinatorrent.

A message to be queued is either a control message or a data message. We want messages on the control stream to take precedence over the messages on the data stream.

  • Change the sender Queue into having two queues, one for short
    messages and one for long messages.
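
One possible shape for the two-queue design is sketched below, with control messages always dequeued before data messages. All names (`Message`, `SendQueue`, `enqueue`, `dequeue`, `drain`) are assumptions for the example, and plain lists with append are used for brevity where a real sender would want a proper queue.

```haskell
-- | Hypothetical message classes: short control messages and long data messages.
data Message
  = Control String
  | Payload String
  deriving (Eq, Show)

-- | Two FIFO queues, one per message class.
data SendQueue = SendQueue
  { controlQ :: [Message]
  , dataQ :: [Message]
  }

emptySQ :: SendQueue
emptySQ = SendQueue [] []

-- | Route a message to the queue matching its class.
enqueue :: Message -> SendQueue -> SendQueue
enqueue m@(Control _) q = q {controlQ = controlQ q ++ [m]}
enqueue m q = q {dataQ = dataQ q ++ [m]}

-- | Control messages take precedence; data is sent only when no
-- control message is waiting.
dequeue :: SendQueue -> Maybe (Message, SendQueue)
dequeue (SendQueue (c : cs) ds) = Just (c, SendQueue cs ds)
dequeue (SendQueue [] (d : ds)) = Just (d, SendQueue [] ds)
dequeue (SendQueue [] []) = Nothing

-- | Dequeue everything, in sending order.
drain :: SendQueue -> [Message]
drain q = case dequeue q of
  Nothing -> []
  Just (m, q') -> m : drain q'
```

Each class keeps its own FIFO order, so two control messages are still sent in the order they were queued.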

Use mmap() for file I/O

We will assume a 64 bit architecture from the start. This means we can use mmap()'ed I/O all over the place and just map files into the VM memory space. In turn, this will enable fast disk I/O while outsourcing the caching problems to the kernel.

The task is somewhat isolated to the FS Process and its backend FileSystem library. However, it does need some work to get into a running state. Most importantly, one needs a way to do SHA1 on the mmap() store as well.

Listen port improvements.

There are two things to do to the Listen-port. One is to make it possible to select a different port than the default. The other is to select a random port from a range.

Not really an issue - burn after reading!

I found the post in master/doc/haskell-vs-erlang.mkd to be both fun and enlightening - where on the web is it published so I can "like it" (in the new real world ;-) )
(Google took me here)

Improve run-times of PieceSets

Our new PieceSet implementation is expensive to run. It now accounts for more than half of all the work done in the client. Improve this situation.

Obviously, one can track the complete pieceset as a specific constructor in the PieceSet datatype and thus skip a lot of the work in this common case.
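
A sketch of that idea, under the assumption that a piece set knows the total number of pieces in the torrent. The constructor and function names are hypothetical, and the partial case uses a plain list where the real module would use its own representation.

```haskell
-- | Hypothetical piece set with a dedicated constructor for the full set.
data PieceSet
  = Complete Int      -- ^ all pieces 0..n-1 are present
  | Partial Int [Int] -- ^ total piece count and the pieces present
  deriving (Eq, Show)

-- | Membership is a range check in the complete case.
memberPiece :: Int -> PieceSet -> Bool
memberPiece p (Complete n) = p >= 0 && p < n
memberPiece p (Partial _ ps) = p `elem` ps

-- | Size is constant time in the complete case.
sizePieces :: PieceSet -> Int
sizePieces (Complete n) = n
sizePieces (Partial _ ps) = length ps

-- | Insert a piece, switching to Complete once every piece is present.
-- Out-of-range and duplicate pieces leave the set unchanged.
insertPiece :: Int -> PieceSet -> PieceSet
insertPiece _ s@(Complete _) = s
insertPiece p s@(Partial n ps)
  | p < 0 || p >= n || p `elem` ps = s
  | length ps + 1 == n = Complete n
  | otherwise = Partial n (p : ps)
```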

Properly link supervisors into a tree.

Some supervisors are currently faked and are hanging on some processes in the wrong way. They should be part of a supervisor tree so the client will be closing down gracefully.

DHT support

One very interesting extension is that of DHT support. Getting this done enables a client to fetch peers from the DHT rather than from the tracker. It greatly improves the robustness of torrents. When doing this, it is important to heed the "private" field in the torrent file.

Utilize the HAVE ALL/NONE messages.

Combinatorrent currently just naively sends the Bitfield message regardless of the corner cases. It would be more beneficial to send HAVE ALL or HAVE NONE if that is indeed the case.
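
The choice could look roughly like the sketch below. HAVE ALL and HAVE NONE belong to the Fast Extension (BEP 0006), so the sketch assumes a flag saying whether that extension was negotiated with the peer. The names `Announce` and `announce` are made up for this example, and the bitfield is modelled as a list of booleans.

```haskell
-- | Hypothetical first message describing which pieces we hold.
data Announce
  = HaveAll
  | HaveNone
  | Bitfield [Bool]
  deriving (Eq, Show)

-- | Pick the cheapest message. The first argument says whether the
-- Fast Extension is in use on this connection; without it, only the
-- plain Bitfield message is available.
announce :: Bool -> [Bool] -> Announce
announce fastExt bits
  | fastExt && not (or bits) = HaveNone -- also covers an empty bitfield
  | fastExt && and bits = HaveAll
  | otherwise = Bitfield bits
```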

Keep a track record of peers

Rather than just accepting any peer blindly, we should keep a track record of the peers we have spoken to in the past. This gives us a way to filter out peers based on their earlier merits, not connect to the same peer twice, blackhole peers which are consistently bad and so on.

It also paves the way for blocklist support, should you want that kind of thing.

Support UDP tracking (BEP 0015)

Popular trackers die due to excessive TCP handshaking for sending very little information back to the client. The UDP tracking extension allows one to communicate with the tracker through a UDP interface.

Write a Users Guide

A Users guide would be nice to have. Currently I wouldn't really bother with it, because the client changes too much all the time.

ETA estimation.

Write code which can estimate the completion time of torrents.
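
The simplest possible estimate divides the bytes left by the current download rate. The sketch below assumes both numbers are available (for instance from the Status and RateCalc modules); the function name `eta` and its signature are hypothetical.

```haskell
-- | Estimated seconds until completion, given the bytes left and the
-- current download rate in bytes per second. Returns Nothing when no
-- estimate is possible because nothing is being downloaded.
eta :: Integer -> Double -> Maybe Integer
eta bytesLeft rate
  | bytesLeft <= 0 = Just 0
  | rate <= 0 = Nothing
  | otherwise = Just (ceiling (fromIntegral bytesLeft / rate))
```

A real implementation would probably smooth the rate over time, since the instantaneous rate of a torrent tends to fluctuate.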

Use a rate estimate to decide how many blocks to request from the piece manager.

We currently always request 25 blocks and then re-request when 5 blocks are left. In practice, a better solution is to take the upload rate, multiply it by a number of seconds (3-5) and then divide by the typical block size, e.g., around 17 kilobytes. The result is the number of blocks to request. We should also use this value to update the bound where we should fill up more blocks. In practice you need a bandwidth*delay product if this is to be done right.

If we can't get enough blocks, we can stop asking until something changes the game or a timer of 10 seconds or so expires.
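
The calculation described above might be sketched as follows. The function name is hypothetical, "17 kilobytes" is taken to mean 17 * 1024 bytes, and the lower bound of one block is an assumption added so that an idle peer still gets a request.

```haskell
-- | Number of blocks to request, given a measured rate in bytes per
-- second and a time window in seconds (3-5 in the text above).
blocksToRequest :: Double -> Double -> Int
blocksToRequest rate seconds =
  max 1 (ceiling (rate * seconds / typicalBlock))
  where
    -- Assumed typical block size on the wire: about 17 kilobytes.
    typicalBlock = 17 * 1024
```

At 100 KiB/s and a 4 second window this gives 24 blocks, close to the fixed value of 25 used today.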

Implement scraping

There is a nice scraper methodology for asking a tracker for different kinds of information. Rather than using the current methodology, we could in its stead use the scraper method.

Consider a "pure seeder" mode.

In this mode, the client will assume it has ample upstream bandwidth and change its internal algorithms with the sole purpose of using as much bandwidth as possible. Details are to be worked out.

Combinatorrent may freak out GHC 6.12.1

We have hit this one (unfortunately with lost context):

HaskellTorrent: internal error: throwTo: unrecognised why_blocked value
     (GHC version 6.12.1 for x86_64_unknown_linux)
Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
Aborted

It is bug #3923 at the GHC trac.

Currently, we have not seen this bug in the wild for quite a while. If it is there, it is pretty rare. Given that we rearrange the concurrency all the time, it might be impossible to reproduce anymore.
