kansas

kansas is a custom reverse proxy which is used internally as part of Zulip. Its purpose is to sit between the nginx load-balancer and the Tornado-based real-time event system, and forward connections from clients to their appropriate back-end shards. It is not necessary unless there is more than one Tornado shard.

Constraints

  • The database is authoritative for where a realm's queues are.
  • Durability
    • A failure of the service should not result in dropping all queues on the floor.
    • Memcached is not durable, so it is not enough to satisfy the previous criterion.
    • The default codepath should not require a database access.
  • We may want a concept of a "default" assignment which uses consistent hashing, so that we do not have to make an explicit choice about which backend to use.
    • However, not all realms are the same size; it is useful to be able to pin some large realms to be on different shards from each other.
    • This also allows us to slowly shift load off of a shard before shutting it down, in order to not shock-load the shards by suddenly moving a large number of queues.
  • Given that we want to be able to move queues between shards:
    • We want to generally have all queues for a realm on the same shard, since that makes for a better common experience in case of shard outage, and minimizes messages on the RabbitMQ (or eventually PostgreSQL) bus.
    • While sharding is configured at the realm level, it happens at the queue level. This means that moving a realm requires locking and serializing all queues -- or relaxing the constraint that realms are only ever on one shard.
    • Enabling realms to move piecewise, and not atomically, also allows us to spread the load on the backend Tornado instances for doing the queue serialization, at the cost of an extra message on the RabbitMQ bus per shard that the realm is on. This is not a high cost.
    • The durable storage in the database knows nothing about queue-ids, however, which means that moves have to happen at the granularity of users, at the very least.
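
The "default assignment plus pinning" idea can be sketched as below. This is a minimal illustration in Python, not kansas's actual implementation: the `ShardMap` class, the shard names, and the `replicas` parameter are all hypothetical. Explicit pins win, and any realm without a pin falls through to a consistent-hash ring, so removing a shard only moves the realms that were on it.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash; Python's built-in hash() is randomized per process."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ShardMap:
    """Hypothetical realm -> shard lookup: explicit pins first, hash ring otherwise."""

    def __init__(self, shards, pins=None, replicas=64):
        self.pins = dict(pins or {})
        # Each shard gets several points on the ring to even out the split.
        self._ring = sorted(
            (_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, realm: str) -> str:
        if realm in self.pins:
            return self.pins[realm]
        # First ring point clockwise of the realm's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(realm)) % len(self._ring)
        return self._ring[idx][1]
```

Draining a shard before shutdown would then amount to adding pins that point its realms elsewhere a few at a time, and only removing the shard from the ring once nothing resolves to it.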

Queue moves

Queue moves with page reloads

  • Commit change to the database
  • Respond with a 400 and BAD_EVENT_QUEUE_ID to any request which comes in that would go to the old shard
    • Can either do this invalidation at the kansas level, or by enqueuing something to the old Tornado shard which tells it to drop the relevant queues.
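
Doing the invalidation at the kansas level could look roughly like the sketch below. Everything here is an assumption made for illustration: the `route_request` function, the two lookup maps, and the exact JSON body (modeled on the general shape of Zulip's error responses) are not taken from kansas itself.

```python
import json


def route_request(queue_id, user_id, queue_shard, assigned_shard):
    """Decide whether to forward an event-queue request or reject it.

    queue_shard: hypothetical map of queue_id -> shard the queue was created on.
    assigned_shard: hypothetical map of user_id -> shard committed in the database.
    Returns ("forward", shard) or ("reject", status, body).
    """
    current = assigned_shard[user_id]
    if queue_shard.get(queue_id) != current:
        # The queue lives on the old shard (or is unknown): tell the client
        # its queue is gone so that it registers a fresh one.
        body = json.dumps(
            {
                "result": "error",
                "code": "BAD_EVENT_QUEUE_ID",
                "msg": f"Bad event queue ID: {queue_id}",
                "queue_id": queue_id,
            }
        )
        return ("reject", 400, body)
    return ("forward", current)
```

A new queue registered after the move is recorded against the new shard, so subsequent requests for it are forwarded normally.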

Transparent queue moves

  • Create queues in new shard
  • Commit change to database
  • Ensure nothing is writing to old shards using stale database data (...somehow?)
  • Ensure old shard has consumed all of the data
  • Pause requests from clients
  • Serialize queues in old shard
  • Send to new shard
  • Merge serialized data with anything which has been sent to the queue already
  • Resume requests, but to the new shard

Restarts of Kansas without connection drops

  • Open a UNIX socket to transfer state
  • Fork and exec the new binary
  • Old process writes the listen socket (port 9799) into the UNIX socket
  • New process reads the socket out and calls listen() but not accept()
  • New process sends a signal (e.g. USR1) to the old one that it has started and is ready
  • Old process closes the listen socket
  • Old process serializes its internal state to the UNIX socket
  • New process reads the state
  • New process starts the accept() calls on the listen socket
  • Old process finishes any in-flight requests
  • Old process exits gracefully
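
The socket handoff in the steps above relies on passing a file descriptor over a UNIX socket. The sketch below shows that mechanism using Python's standard library (`socket.send_fds` and `socket.recv_fds`, available on Unix from Python 3.9). It is only an illustration of the technique: the helper names are hypothetical, both ends run in one process for brevity, and an ephemeral port stands in for 9799.

```python
import socket


def send_listener(channel: socket.socket, listener: socket.socket) -> None:
    """Old process: pass the listening socket's descriptor over a UNIX socket."""
    socket.send_fds(channel, [b"listen"], [listener.fileno()])


def recv_listener(channel: socket.socket) -> socket.socket:
    """New process: receive the descriptor and wrap it in a socket object."""
    _msg, fds, _flags, _addr = socket.recv_fds(channel, 1024, 1)
    return socket.socket(fileno=fds[0])


def demo() -> bytes:
    """Hand a listener across a socketpair, then serve one client from the copy."""
    old_end, new_end = socket.socketpair()  # AF_UNIX by default on Unix
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: any free port (stand-in for 9799)
    listener.listen()
    port = listener.getsockname()[1]

    send_listener(old_end, listener)
    inherited = recv_listener(new_end)
    listener.close()  # old process drops its copy; the kernel socket stays open

    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = inherited.accept()  # new process starts accepting
    client.sendall(b"ping")
    data = conn.recv(4)
    for s in (client, conn, inherited, old_end, new_end):
        s.close()
    return data
```

Because the new process only calls accept() after the old one has closed its copy, connections that arrive during the handoff wait in the kernel's listen backlog instead of being refused.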

Supervisor won't like this, because it wants to be the parent PID of what it monitors. We can work around that by making the first process that Supervisor runs be the one in charge of mediating all of the above actions, so it never exits (unless its one child exits) and is the parent of the actual kansas programs which do request handling.

Additionally, supervisorctl stop kansas will always do a full stop / start. As of Supervisor 3.2.0, however, you can use supervisorctl signal to send SIGUSR1 or similar.

Restarts of Tornado without connection drops

  • On connection close mid-request from Tornado, or new request with dead Tornado:
    • 502 anything that's not a GET (e.g. DELETEs are not safe to manufacture)
    • Add socket to waiting queue for that shard
    • Wait until the earlier of 50s elapsing or the backend coming back
    • Respond with an empty response, or maybe some synthetic "waiting" response if we hit 50s
    • Add a rate-limit before popping the next one off, to spread request load
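
The waiting behavior above can be sketched with asyncio as follows. This is an assumption-laden illustration rather than kansas's design: the `ShardGate` class, its method names, and the string return values are hypothetical placeholders for "send a 502", "send an empty response", and "forward to the backend".

```python
import asyncio


class ShardGate:
    """Hypothetical per-shard gate that parks GET requests while Tornado is down.

    Construct it inside a running event loop.
    """

    def __init__(self, max_wait: float = 50.0, release_interval: float = 0.01):
        self.max_wait = max_wait
        self.release_interval = release_interval
        self._up = asyncio.Event()
        self._release_lock = asyncio.Lock()

    def backend_down(self) -> None:
        self._up.clear()

    def backend_up(self) -> None:
        self._up.set()

    async def hold(self, method: str) -> str:
        if method != "GET":
            return "502"  # only GETs are parked
        try:
            # Wait for the backend, but no longer than max_wait seconds.
            await asyncio.wait_for(self._up.wait(), timeout=self.max_wait)
        except asyncio.TimeoutError:
            return "empty"  # timed out: answer with an empty response
        # Release waiters one at a time, spaced out, to spread the load.
        async with self._release_lock:
            await asyncio.sleep(self.release_interval)
        return "forward"
```

The lock plus sleep is the simplest possible rate limit: parked requests are released one by one, `release_interval` seconds apart, rather than all hitting the restarted backend at once.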

TODO

  • Load-test
  • Garbage-collection of queues after 10 minutes, to match Tornado
  • Persist the listen socket, using https://github.com/tormol/uds
  • Persist the internal state
    • send in-memory store using serde over a socket
    • write into a local sqlite database
    • store in redis
    • just talk to postgres directly
  • Paper over Tornado restarts
  • Request tracing?
