Java Library Benchmark

This benchmark illustrates the basic flushing principles of the Segment.io Java library.

Goals

Our libraries should be convenient and should not force the user to write extra code or worry about performance.

Our libraries should never crash the host application.

Equivalently, we shouldn't starve the host of resources like CPU, memory, or network, even when there's a lot of data going through the library.

Benchmark

The benchmark calls

Analytics.track(userId, "Benchmark");

at a specified rate. Once per second, the benchmark samples these variables:

Variables

  • The system's CPU %
  • The JVM's memory usage (Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory())
  • The number of messages inserted into the queue
  • The number of messages successfully sent to the server
  • The number of messages that failed to make it to the server
  • The current queue size
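
As a rough illustration of how such a per-second sample could be collected, here is a minimal sketch. The `BenchmarkSampler` class and its field names are hypothetical and are not taken from the benchmark's source. CPU sampling is left out because it depends on platform-specific APIs.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sampler; the class and field names are illustrative only.
class BenchmarkSampler {
    // Counters the benchmark would increment as messages move through the library.
    final AtomicLong inserted = new AtomicLong();
    final AtomicLong sent = new AtomicLong();
    final AtomicLong failed = new AtomicLong();

    // Memory currently in use by the JVM, in bytes (the formula listed above).
    static long usedMemoryBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Builds one CSV row: used memory, inserted, sent, failed, queue size.
    String sample(int queueSize) {
        return usedMemoryBytes() + "," + inserted.get() + "," + sent.get()
                + "," + failed.get() + "," + queueSize;
    }
}
```

Calling `sample(...)` from a task scheduled with `ScheduledExecutorService.scheduleAtFixedRate` at a one-second period would produce one row per sample.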

Testbed

This benchmark was run on JDK 7 on my Retina Mac laptop running OS X Mountain Lion.

Non-Blocking

Our client libraries use in-memory queues to make sure your calls to:

Analytics.identify(...)
Analytics.track(...)

return very quickly, without waiting for the actual HTTP request to our servers to happen. This allows you to send us data from web servers and other performance-sensitive code without worrying about us blocking the calling thread.

When you call Analytics.track or any similar method, all we do is validate the input and put it in the in-memory queue. In practice, we found this takes less than a millisecond.
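
The validate-then-enqueue step can be sketched as follows. This is not the library's actual implementation: `EnqueueSketch`, its validation rules, and the message representation are assumptions chosen only to show that the calling thread does no network I/O.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for the client; not the library's real code.
class EnqueueSketch {
    // Unbounded, lock-free queue: add() never blocks the caller.
    private final ConcurrentLinkedQueue<String[]> queue = new ConcurrentLinkedQueue<>();

    // Validates the arguments and enqueues the message. No HTTP request
    // is made here; a separate thread would drain the queue later.
    void track(String userId, String event) {
        if (userId == null || userId.isEmpty()) {
            throw new IllegalArgumentException("userId is required");
        }
        if (event == null || event.isEmpty()) {
            throw new IllegalArgumentException("event is required");
        }
        queue.add(new String[] { userId, event });
    }

    int queueSize() {
        return queue.size();
    }
}
```

Because the only work on the calling thread is a couple of checks and a queue insert, the cost per call stays small regardless of how slow the network is.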

Flushing

The library will use exactly one extra thread to constantly flush the queue.

The flushing thread does the following:

do
  batch = []
  do
    msg = wait_for_message(queue)
    batch += msg
  while batch.size() < 20 and queue.size() > 0

  if batch.size() > 0
    flush(batch)

while active

It will wait until the queue has a message. As soon as there's something to send, the flushing thread will collect up to 20 messages. Once it has collected its current batch, it will make the request to the server. Then it repeats.

The advantage is that if you aren't sending many messages, each message will be sent immediately. However, if you're sending tons of messages, the flushing thread will collect large batches and send them together to maximize request throughput and decrease TCP connection overhead.
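
The batching step described above could look roughly like this in Java. It is a sketch built on `java.util.concurrent.BlockingQueue`, not the library's own code; the `FlushSketch` class and `nextBatch` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Illustrative batching helper; names and structure are assumptions.
class FlushSketch {
    static final int MAX_BATCH = 20;

    // Blocks until one message is available, then takes whatever else is
    // already queued, up to MAX_BATCH messages in total. Returns an empty
    // batch if the thread is interrupted while waiting.
    static <T> List<T> nextBatch(BlockingQueue<T> queue) {
        List<T> batch = new ArrayList<>(MAX_BATCH);
        try {
            batch.add(queue.take());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return batch;
        }
        // drainTo never blocks: it moves only the messages present right now.
        queue.drainTo(batch, MAX_BATCH - 1);
        return batch;
    }
}
```

A flushing thread would call `nextBatch` in a loop and send each non-empty batch in a single request. With a nearly idle queue the batch holds one message and goes out right away; under load each batch fills to 20.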

Even at a high rate of 50 requests a second, the flushing thread can match the insert rate:

In this test, we can see that the queue rarely grows since the flushing thread can flush as fast as messages are coming in.

CPU Usage

Since there's only one flushing thread, neither the CPU nor the network will be saturated.

Here's the CPU usage over the 50 requests per second test:

And then again at 500 requests per second:

Maximum Queue Size

In situations where more messages are coming in than can be flushed, the library avoids running out of memory by disallowing new messages into the queue. The maximumQueueSize defaults to 10000.
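
One way to get this behavior is a capacity-bounded queue whose insert fails fast when full. The sketch below uses `LinkedBlockingQueue.offer`, which returns `false` instead of blocking. `BoundedQueueSketch` and its method names are hypothetical, and the library may implement the limit differently.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative bounded queue; the class and method names are assumptions.
class BoundedQueueSketch {
    private final LinkedBlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedQueueSketch(int maximumQueueSize) {
        this.queue = new LinkedBlockingQueue<>(maximumQueueSize);
    }

    // offer() returns false immediately when the queue is at capacity, so
    // the caller is never blocked and memory use stays bounded.
    boolean enqueue(String message) {
        boolean accepted = queue.offer(message);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    int size() {
        return queue.size();
    }

    long dropped() {
        return dropped.get();
    }
}
```

With a capacity of 10000, the queue grows while inserts outpace flushing and then holds steady at the cap, rejecting further inserts until the flushing thread frees space.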

Here's a demonstration of 500 analytics calls each second:

In this test, more messages are being added than can be flushed. You can see that the queue size grows until it reaches 10,000, at which point it stops accepting new messages until more are flushed.

Looking at the JVM's memory (totalMemory - freeMemory) during this time, you can see that this constraint prevents the memory from overflowing:

Here's another test at 500 requests per second, except without a maximum queue size constraint:

And you can see the memory growing out of control:

Data

The data and graphs are available as public Google spreadsheets:

Run it yourself

You can run the benchmark yourself via the JavaClientBenchmark main class.
