Comments (4)
WIP comment; updating as findings come in.
What's the typical traffic we currently receive (users/sec, insight from Google Analytics)?
From Google Analytics' "Audience" data over the last two years (2016-01-09 to 2018-01-09), converted to per-timeframe equivalents:
Metric | Value | Per day | Per hour | Per minute | Per second
---|---|---|---|---|---
Sessions | 34,556,894 | 47,338 | 1,972 | 32 | 0.54
Users | 14,372,905 | 19,689 | 820 | 13 | 0.21
Page views | 119,635,151 | 163,884 | 6,829 | 114 | 1.9
Pages/session | 3.46 | | | | |
Avg. session duration | 00:04:26 | | | | |
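The per-timeframe columns above are straight divisions of the two-year totals. A quick sketch of the conversion (assuming the window is exactly 730 days):

```python
# Convert two-year Google Analytics totals into per-day/hour/minute/second rates.
# DAYS = 730 assumes the 2016-01-09 to 2018-01-09 window is exactly two years.
DAYS = 730

totals = {
    "Sessions": 34_556_894,
    "Users": 14_372_905,
    "Page views": 119_635_151,
}

for metric, total in totals.items():
    per_day = total / DAYS
    per_hour = per_day / 24
    per_minute = per_hour / 60
    per_second = per_minute / 60
    print(f"{metric}: {per_day:,.0f}/day, {per_hour:,.0f}/h, "
          f"{per_minute:,.1f}/min, {per_second:.2f}/s")
```

Small rounding differences against the table are expected, since the table rounds each column independently.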
(What we refer to as "users" in load test would map to "sessions" in the above data)
To mimic this average traffic in a load test, we would have to create a user that browses ~3.5 pages every 4.5 minutes (roughly 1 page per 90 seconds):
```python
from locust import HttpLocust

class AverageUser(HttpLocust):
    """Emulate an average user, according to Google Analytics data
    collected between 2016-01-09 and 2018-01-09."""
    task_set = Top10Pages  # defined in the tfwiki/load-tests suite

    # Average of 90 seconds, but include a ±50% variance.
    # Locust wait times are expressed in milliseconds.
    avg_wait = 90 * 1000
    min_wait = int(0.5 * avg_wait)
    max_wait = int(1.5 * avg_wait)
```
From the average session duration, we can also figure out how many simultaneous visitors the website serves: 32 sessions per minute * 4.5 minute average session duration = 144 simultaneous sessions.
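That concurrency figure is an application of Little's Law (concurrent sessions = arrival rate × average time in system):

```python
# Little's Law: concurrent sessions = arrival rate * average session duration.
sessions_per_minute = 32
avg_session_minutes = 4.5   # 00:04:26, rounded to 4.5 minutes

concurrent_sessions = sessions_per_minute * avg_session_minutes
print(concurrent_sessions)  # 144.0
```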
How many requests-per-second does that equate to?
Running a load test (tfwiki/load-tests) that models this average user behaviour with 144 simultaneous users, and that also loads page resources (images, stylesheets, JavaScript), we see our typical traffic generates approximately 40 requests per second to the server.
This appears to be severely slowed down by the MediaWiki pod's handling of images (the pod is I/O-bound rather than CPU-bound). It may be worth re-evaluating without image handling to get a better idea of raw MediaWiki performance.
from deployment.
What does a major update spike look like?
The Pyromania Update caused the largest single-day traffic spike in the Wiki's history on June 28, 2012. Let's create a "Major Update" traffic model based on traffic on that day:
Metric | Value | Per hour | Per minute | Per second
---|---|---|---|---
Sessions | 447,249 | 18,635.38 | 310.59 | 5.18
Users | 250,774 | 10,448.92 | 174.15 | 2.9
Page views | 3,257,921 | 135,746.71 | 2,262.45 | 37.71
Pages/session | 7.28 | | | |
Avg. session duration | 00:08:02 | | | |
(What we refer to as "users" in load test would map to "sessions" in the above data)
To mimic these users in a load test, we would have to create a user that browses ~7.5 pages every 8 minutes (roughly 1 page per 64 seconds):
```python
from locust import HttpLocust

class PyromaniacUser(HttpLocust):
    """Emulate Pyromania-update users, according to behaviour
    observed on 2012-06-28."""
    task_set = PyromaniaTop10Pages  # defined in the tfwiki/load-tests suite

    # Average of 64 seconds, but include a ±50% variance.
    # Locust wait times are expressed in milliseconds.
    avg_wait = 64 * 1000
    min_wait = int(0.5 * avg_wait)
    max_wait = int(1.5 * avg_wait)
```
From the average session duration, we can also figure out how many simultaneous visitors the website served during this event: 311 sessions per minute * 8 minute average session duration = 2488 simultaneous sessions.
from deployment.
Yeeaah, the current setup with 4 Varnish instances (1 per server) handles a boatload of traffic perfectly fine – I've had it handle 10,000 simulated Pyromania users and it's not breaking a sweat. So no need to worry about getting resource limits perfect yet; we can sort that out down the line.
from deployment.
Don't care about these reports any more. Live traffic never quite matched up due to the Wiki having a lot of pages, and thus a lot of uncached content when we went live.
So the numbers seemed impressive, but weren't realistic.
With typical traffic nowadays, 4 MediaWiki containers just about manage the non-cached traffic. Clearing Varnish significantly increases load, and Kubernetes' horizontal pod autoscaler can be slow to react, leading to single-digit minutes of perceived downtime. In those scenarios, manually scaling MediaWiki up to 8 containers handles typical traffic well enough (likely more than needed), and Kubernetes will autoscale back down to the minimum of 4 containers once Varnish has taken over the brunt of the impact.
from deployment.