linux-microsecondrto's Introduction

Linux TCP microsecond timers patch
Author: Vijay Vasudevan <[email protected]>

This patch adds support for microsecond-granularity TCP
retransmissions.  It replaces the existing coarse-grained, jiffy-based
timers with microsecond-granularity timers built on the
high-resolution timer (hrtimer) infrastructure.

Warning: This patch can cause the system to hang during heavy network
activity on kernel versions after 2.6.28.  I have not found the root
cause, so proceed with caution.  If you identify the problem, please
let me know or submit a pull request.

Background: This patch was developed to help mitigate the "incast"
problem for wide-fan-in, synchronized communication in datacenters.
See http://www.pdl.cmu.edu/Incast/ for details.  Our solution required
retransmissions to fire on the timescale of latencies in a datacenter
(i.e., microseconds).  This patch enables machines with
high-resolution timer capabilities to retransmit quickly on packet
loss.  Keep in mind that the retransmission timeout is always
calculated using the flow latency, so wide-area communication should
still have timeouts in the tens to hundreds of milliseconds using this
patch.

This patch has only been tested in local clusters and on a few
externally facing machines; it needs much larger-scale testing before
further adoption.  This repository contains two versions of the patch,
one for Linux 2.6.28.10 and one for 2.6.39 (cloned just hours before
3.0-rc1 was released, so it should work on 3.0-rc1 too).  The 2.6.28.10
patch has been tested in real-world clusters, whereas the 2.6.39
patch compiles but has not been tested and may contain bugs
(maintaining a separate patch across 2 years of independent kernel
development is tough!).

This patch requires that the kernel config options HIGH_RES_TIMERS and
TCP_HIGH_RES_TIMERS are enabled.
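
One way to check both options before booting the patched kernel is to
grep the kernel config file.  This is a minimal sketch, not part of the
patch: the check_opt helper and the KCONFIG_FILE variable are
hypothetical names, and the default path /boot/config-$(uname -r) is an
assumption that holds on many distributions but not all.

```shell
#!/bin/sh
# check_opt FILE OPTION
# Reports whether OPTION (given without the CONFIG_ prefix) is built in
# (=y) according to the kernel config file FILE.
check_opt() {
    if grep -q "^CONFIG_$2=y" "$1"; then
        echo "$2: enabled"
    else
        echo "$2: NOT enabled"
    fi
}

# Assumed config location; override with KCONFIG_FILE if yours differs.
cfg="${KCONFIG_FILE:-/boot/config-$(uname -r)}"

if [ -r "$cfg" ]; then
    for opt in HIGH_RES_TIMERS TCP_HIGH_RES_TIMERS; do
        check_opt "$cfg" "$opt"
    done
else
    echo "cannot read $cfg; locate your kernel config manually" >&2
fi
```

TCP_HIGH_RES_TIMERS is added by this patch, so it will only appear in
the config of a kernel tree that has the patch applied.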

The TCP timeout defaults are unchanged on boot (e.g., 200ms minimum
TCP RTO), so you will have to execute the following sysctl commands to
change them.

sysctl -w net.ipv4.tcp_rto_min=X
  sets the minimum retransmission timeout to X microseconds. (Default 200000)

sysctl -w net.ipv4.tcp_delack_min=X
  sets the minimum delayed ACK timeout to X microseconds. (Default 40000)

sysctl -w net.ipv4.tcp_delayed_ack=0
  "disables" the delayed ACK mechanism. (=1 enables). (Default 1)

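As a sketch of how the sysctl settings above might be applied together,
the script below prints the commands by default and only runs them when
APPLY=1 is set.  The set_tcp_timer helper, the APPLY variable, and the
1000-microsecond values are illustrative assumptions, not
recommendations from this patch; pick values that match the latencies
of your own network.

```shell
#!/bin/sh
# Illustrative values in microseconds (assumptions, not recommendations).
RTO_MIN_US=1000       # minimum retransmission timeout
DELACK_MIN_US=1000    # minimum delayed ACK timeout

# set_tcp_timer NAME VALUE
# Prints the sysctl command for net.ipv4.NAME, or runs it when APPLY=1.
# Running it requires root and a kernel with this patch applied.
set_tcp_timer() {
    key="net.ipv4.$1"
    if [ "${APPLY:-0}" = "1" ]; then
        sysctl -w "$key=$2"
    else
        echo "sysctl -w $key=$2"
    fi
}

set_tcp_timer tcp_rto_min "$RTO_MIN_US"
set_tcp_timer tcp_delack_min "$DELACK_MIN_US"
```

Settings made with sysctl -w do not survive a reboot, so the defaults
(200000 and 40000 microseconds) return unless the commands are rerun.
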
Notes:

1) These patches have not been tested, nor do they compile, with TCP
   congestion control options such as cubic, bicubic, lp, etc.
   Modifying these congestion control implementations should be
   fairly straightforward if modeled on the changes made to the
   default implementation.  Volunteers to implement/test are welcome.

2) An alternative to these patches is to simply reduce TCP_RTO_MIN in
   tcp.h to 1, which is a one-line change.  However, this provides
   retransmissions no faster than 5ms because the existing timer
   infrastructure is tied to HZ, which does not increase above 1000
   in most configurations.  Note that this may interact strangely
   with delayed ACK unless you use one of the patches above to
   disable delayed ACK.  See
   http://www.pdl.cmu.edu/PDL-FTP/Storage/sigcomm147-vasudevan.pdf for
   details.

3) Please read up on the incast problem using the links above to
   understand when this patch is applicable.
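
To illustrate why note 2 says the jiffy timer is tied to HZ: one jiffy
lasts 1/HZ seconds, so the tick length bounds how finely a jiffy-based
timer can fire.  The arithmetic below is a sketch using a hypothetical
tick_us helper; it shows the tick length only, not the effective
minimum retransmission delay quoted in the note.

```shell
#!/bin/sh
# tick_us HZ
# Prints the length of one jiffy in microseconds for a given HZ value.
tick_us() {
    echo $((1000000 / $1))
}

# Common HZ settings; 1000 is the usual upper limit.
for hz in 100 250 1000; do
    echo "HZ=$hz -> $(tick_us "$hz") us per jiffy"
done
```

Even at HZ=1000 a jiffy is 1000 microseconds long, which is why
microsecond-scale timeouts need the hrtimer infrastructure instead.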

Please email me if you have trouble, and let me know your kernel
version and the errors you are receiving.  Pull requests to fix bugs
or add functionality are greatly appreciated!
