
Load testing (deployment) · 4 comments · CLOSED

tfwiki commented on June 23, 2024
Load testing


Comments (4)

rjackson commented on June 23, 2024

WIP comment; updating as per findings

What's the typical traffic we currently receive? (users/sec; insight from Google Analytics)

From Google Analytics' "Audience" data over the last 2 years, 2016-01-09 to 2018-01-09, converting to per-timeframe equivalents:

| Metric | Value | Per day | Per hour | Per minute | Per second |
| --- | --- | --- | --- | --- | --- |
| Sessions | 34,556,894 | 47,338 | 1,972 | 32 | 0.54 |
| Users | 14,372,905 | 19,689 | 820 | 13 | 0.21 |
| Page views | 119,635,151 | 163,884 | 6,829 | 114 | 1.9 |
| Pages/session | 3.46 | | | | |
| Avg. session duration | 00:04:26 | | | | |

(What we refer to as "users" in the load test would map to "sessions" in the above data)

To mimic this average traffic in a load test, we would have to create a user which browses 3.5 pages every 4.5 minutes (1 page per 90 seconds):

from locust import HttpLocust

class AverageUser(HttpLocust):
    """ Emulate an average user according to Google Analytics data collected between
        2016-01-09 and 2018-01-09 """

    task_set = Top10Pages  # TaskSet covering the wiki's most-viewed pages (defined elsewhere)

    # Average of 90 seconds between page views, with +/- 50% variance
    avg_wait = 90 * 1000  # milliseconds
    min_wait = int(0.5 * avg_wait)
    max_wait = int(1.5 * avg_wait)
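The Top10Pages task set isn't included in the comment; as a rough illustration of what one looks like in Locust's old (pre-1.0) API, it might be something along these lines (the page choice below is an assumption, not taken from the actual load-test repo):

from locust import TaskSet, task

class Top10Pages(TaskSet):
    """ Illustrative sketch only: request one of the wiki's most-viewed pages """

    @task
    def main_page(self):
        # Main_Page exists on any MediaWiki install; the real task set presumably
        # weights the wiki's actual top 10 pages
        self.client.get("/wiki/Main_Page")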

From the average session duration, we can also figure out how many simultaneous visitors the website serves: 32 sessions per minute * 4.5 minute average session duration = 144 simultaneous sessions.
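As a minimal sketch of that arithmetic (Little's law: concurrency = arrival rate * time in system); the helper name is illustrative, not part of the load-test code:

def simultaneous_sessions(sessions_per_minute, avg_session_minutes):
    """ Little's law: concurrency = arrival rate * average time in the system """
    return sessions_per_minute * avg_session_minutes

# Figures from the Google Analytics table above
print(simultaneous_sessions(32, 4.5))  # -> 144.0 simultaneous sessions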

How many requests-per-second does that equate to?

Running a load test (tfwiki/load-tests) that models this average user behaviour with 144 simultaneous users, and that also loads page resources (images, stylesheets, JavaScript), we see our typical traffic generate approximately 40 requests per second to the server.
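As a back-of-envelope cross-check (my own inference from the figures above, not a measured number), ~40 requests/second from 144 users implies roughly 25 HTTP requests per page view once static assets are included:

page_views_per_second = 144 / 90.0          # 144 users, 1 page view per 90 s each -> ~1.6 pages/s
requests_per_page = 40 / page_views_per_second
print(round(requests_per_page, 1))          # ~25 requests per page view, including assets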

This appears to be severely slowed down by the MediaWiki pod handling images (the pod is I/O-bound rather than CPU-bound). It may be worth re-evaluating without image handling to get a better idea of raw MediaWiki performance.


rjackson commented on June 23, 2024

What does a major update spike look like?

The Pyromania Update caused the largest single-day traffic spike in the Wiki's history on June 28, 2012. Let's create a "Major Update" traffic model based on traffic on that day:

| Metric | Value | Per hour | Per minute | Per second |
| --- | --- | --- | --- | --- |
| Sessions | 447,249 | 18,635.38 | 310.59 | 5.18 |
| Users | 250,774 | 10,448.92 | 174.15 | 2.9 |
| Page views | 3,257,921 | 135,746.71 | 2,262.45 | 37.71 |
| Pages/session | 7.28 | | | |
| Avg. session duration | 00:08:02 | | | |

(What we refer to as "users" in the load test would map to "sessions" in the above data)

To mimic these users in a load test, we would have to create a user which browses 7.5 pages every 8 minutes (1 page per 64 seconds):

from locust import HttpLocust

class PyromaniacUser(HttpLocust):
    """ Emulate Pyromania update users, according to behaviour observed on 2012-06-28 """

    task_set = PyromaniaTop10Pages  # TaskSet of the update's most-viewed pages (defined elsewhere)

    # Average of 64 seconds between page views, with +/- 50% variance
    avg_wait = 64 * 1000  # milliseconds
    min_wait = int(0.5 * avg_wait)
    max_wait = int(1.5 * avg_wait)

From the average session duration, we can also figure out how many simultaneous visitors the website served during this event: 311 sessions per minute * 8 minute average session duration = 2488 simultaneous sessions.


rjackson commented on June 23, 2024

Yeeaah, the current setup with 4 Varnish instances (1 per server) handles a boatload of traffic perfectly fine – I've had it handling 10,000 simulated Pyromania users and it's not breaking a sweat. So no need to worry about getting resource limits perfect yet; we can handle that down the line.

[Screenshot of load-test results, captured at 127.0.0.1:58906]
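For context, a swarm like that would be started with Locust's old (pre-1.0) CLI roughly as follows; the locustfile name and target host are assumptions, not taken from the repo:

locust -f locustfile.py PyromaniacUser --host=https://wiki.teamfortress.com
# then open Locust's web UI (http://127.0.0.1:8089 by default) and start a swarm of 10,000 users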


rjackson commented on June 23, 2024

Don't care about these reports any more. Live traffic never quite matched up due to the Wiki having a lot of pages, and thus a lot of uncached content when we went live.

So the numbers seemed impressive, but weren't realistic.

With typical traffic nowadays, 4 MediaWiki containers just about manage the non-cached traffic. Clearing Varnish significantly increases load, and Kubernetes' horizontal pod autoscaler can be slow to react, leading to single-digit minutes of perceived downtime. In those scenarios, manually scaling MediaWiki up to 8 containers seems to handle typical traffic well enough (likely more than needed), and Kubernetes will automatically scale back down to the minimum of 4 containers once Varnish has taken over the brunt of the load.
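For reference, that kind of manual scaling is a one-liner against the MediaWiki Deployment; the deployment name below is a placeholder, not necessarily what the cluster uses:

kubectl scale deployment mediawiki --replicas=8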

