osu-infrastructure's Issues
Score statistics processor needs to process PP for imported highscores
When we deploy lazer for ranking purposes, we need to completely shut down osu-performance and have PP be solely updated by osu-queue-score-statistics. Otherwise, because osu-performance only reads from the legacy tables, it will exclude any scores set in lazer.
But right now all imported highscores bypass osu-queue-score-statistics, so it will miss any scores set in osu!(stable).
Add partitioning to `solo_scores` table
See https://gist.github.com/peppy/8851f98d0fb783ff29043f408e6a3923 for three proposals.
(I'd recommend opening each in separate browser tabs to easily compare the differences between them by switching tabs, or something like that)
- `partition_preserve` is the most pessimistic (less optimised, but also less fussy for the future maybe, when we add more rulesets)
- `partition_ruleset_preserve` is the most optimal: it removes `ruleset_id` from the largest index, and removes the `preserve` index (see caveat in inline comments)
- `partition_preserve_ruleset` removes the caveat of `preserve` lookups, but as a result can't optimise `ruleset_id` out of the other index
I'd recommend running that and importing some data to test performance / query plans for any queries of concern.
`insert into partitioned_solo_scores select * from solo_scores order by id desc limit 10000;`
can be used if you already have data in the actual table.
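For a rough sense of shape, the simplest (`partition_preserve`-style) variant would split purely on the `preserve` flag; something like the following (illustrative only — see the gist for the actual proposals):

```sql
-- Rough illustration of the partition_preserve shape: rows split on the
-- preserve flag, with a time column to allow future rotation of the p0 side.
ALTER TABLE `solo_scores`
PARTITION BY RANGE COLUMNS(`preserve`, `created_at`)
(
    PARTITION p0 VALUES LESS THAN (0, MAXVALUE) ENGINE = InnoDB,
    PARTITION p1 VALUES LESS THAN (MAXVALUE, MAXVALUE) ENGINE = InnoDB
);
```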
SR/PP update checklist 2022-09
Important changes
- osu: Added HD+FL difficulty adjustment mod combination (1032)
- osu: Added TD difficulty adjustment mod (4)
- osu: Added `speed_note_count` difficulty attribute.
Table setup
- No changes required.
- Database already contains the new `Speed note count` attribute (`attrib_id` = 21).
Medal updates
Related PRs
- https://github.com/peppy/osu-web-10/pull/201
- https://github.com/peppy/osu-stable/pull/2412
- ppy/osu-beatmap-difficulty-lookup-cache#11
Deployment (in order)
Suggested ruleset order: taiko -> osu -> mania -> catch

- `osu-difficulty-calculator` (watch - all rulesets)
- `osu-difficulty-calculator` (reprocess - all rulesets)
- `osu-web-10` (with above PRs)
- `osu-performance` (watch - all rulesets)
- `osu-performance` (reprocess - all rulesets)
- `osu-beatmap-difficulty-lookup-cache` (with above PR)
- `osu-stable` (with above PR)
- `osu-queue-score-statistics` (watch)
- `osu-queue-score-statistics` (`scores all` - all rulesets)
Infrastructure deployment tasks for path-to-ranking
From a high level, tracking the order of deployment tasks that have dependencies on other changes.
Testing notes for self:
To reset everything:
# nuke all indices
osu.ElasticIndexer# dotnet run index nuke
# view all indices
curl -X GET "localhost:9200/_cat/indices?v&pretty"
# restart es service
systemctl restart elasticsearch
truncate table scores;
truncate table score_legacy_id_map;
truncate table score_performance;
Ongoing stuff which needs to be run:
# osu-queue-score-statistics
cd ~/repos/osu-queue-score-statistics/osu.Server.Queues.ScoreStatisticsProcessor
git fetch; git reset --hard peppy/new-table-names
SCHEMA=20231208 dotnet run queue import-high-scores --start-id 0
# osu-elastic-indexer
cd ~/repos/osu-elastic-indexer/osu.ElasticIndexer
git fetch --all; git reset --hard peppy/new-table-names
SCHEMA=20231208 dotnet run queue watch
Replays and PP should not be processed for non-passing scores
Migrate osu-web (workers/cronjob) to Kubernetes
First place scores on user profiles don't consider lazer scores
- depends on ppy/osu#27685
In addition, the recent event section doesn't show when a user gets a high rank on a beatmap for lazer scores.
Add osu-web support for new score infrastructure
Currently documented at https://github.com/ppy/osu-infrastructure/blob/master/score-submission.md
osu-web PRs:
- ppy/osu-web#10887
- ppy/osu-web#10888
- ppy/osu-web#10889
- ppy/osu-web#10890
- ppy/osu-web#10891
- ppy/osu-web#10892
- ppy/osu-web#10893
- ppy/osu-web#10894
- ppy/osu-web#10895
osu-web Preparation (done)
- update profile page recent plays to use solo scores
- add user/ruleset index on solo scores
  - need to figure out how to filter out the failed ones
  - also probably need to add `id` as part of the index for sorting
- update beatmap user scores to use es
- add beatmap leader reset job/command
- update profile page first place section to use solo scores
- create query builder-like class for querying scores
- measure index size for `solo_scores(beatmap_id)`: 19GB/16GB (dynamic/+lz4), 9GB/8GB (compressed/+lz4)
- measure index size for `solo_scores(user_id, ruleset_id, created_at desc)` - not needed anymore since we're going to show all recent plays, not limited to just 24h
- update beatmap pack page to use es (partially waiting for ppy/osu-elastic-indexer#111)
- fix tests (queuing score for index and update github action and docker dev) (partially waiting for ppy/osu-elastic-indexer#110)
- update beatmap scoreboard reset (during state change) to also delete solo scores
- queue new scores (waiting on the specs?)
SSL migration
Our current wildcard certificate expires on September 3rd. Our current provider (DigiCert) has increased their pricing and we're looking to move away from them.
After reviewing and testing Let's Encrypt's device compatibility, we have decided to integrate ACME into our infrastructure and switch to LE.
Google Trust Services has also put a public ACME service in place. They offer a similar service to Let's Encrypt, except the compatibility is as good as a root certificate from 1998 gets. Using GTS enables us to retain the same compatibility that osu! users are used to. This service is in free public beta. It is not impossible that this service will become paid at the end of the beta phase, but as they both use ACME we can switch back-and-forth with these providers in just a few minutes, so the plan is to roll with GTS for now.
- Kubernetes clusters SSL migration
  Our Kubernetes clusters will issue certificates using cert-manager, an ACME client by Jetstack built for Kubernetes.
  - Staging cluster (http01 validation only)
  - Production cluster (dns01 & http01 validation)
- Individual droplets SSL migration
  Most of the work goes here, as there are dozens of droplets to migrate - or rather, to create an SSL infrastructure for. As we used to renew with DigiCert every 3 years, no automated process has been put in place. LE/GTS deliver certificates for only up to 90 days, so we must switch to an automated solution.
  Our individual droplets run on a huge variety of different operating systems. Managing ACME clients on each would be a huge overhead, and we'd rather not share our CloudFlare API token with every droplet that needs wildcard certificates.
  Therefore, we will rely on the cert-manager in our production Kubernetes cluster to issue and renew all the certificates we need. All our droplets will fetch these certificates on a regular basis using small bash/curl scripts, via a custom-made HTTPS service that will be running inside the production cluster. Droplets will be authenticated using client-side certificate authentication.
  - Certificates serving back-end development
  - Certificates serving back-end deployment
  - Certificates fetching script development
  - Certificates fetching script deployment across all ~15 droplets/nodes that need them.
-
Automatically refresh our custom edge certificate on Cloudflare
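A droplet-side fetch script along these lines might be all that's needed (the hostname, paths and service names here are hypothetical placeholders, not the real infrastructure):

```shell
#!/usr/bin/env sh
# Hypothetical sketch of the small bash/curl fetch script described above.
set -eu

CERT_HOST="${CERT_HOST:-certs.internal.example}"           # in-cluster serving service
CLIENT_CERT="${CLIENT_CERT:-/etc/ssl/client/droplet.pem}"  # client-side auth cert
OUT_DIR="${OUT_DIR:-/etc/ssl/fetched}"

fetch_cert() {
    # --fail surfaces HTTP errors; the client certificate authenticates the droplet.
    curl --fail --silent --cert "$CLIENT_CERT" \
        "https://${CERT_HOST}/certs/$1/fullchain.pem" -o "${OUT_DIR}/$1.pem"
}

# Guarded so the sketch can be exercised without the real endpoint; a cron job
# on each droplet would run it with DO_FETCH=1 and then reload its web server.
if [ "${DO_FETCH:-0}" = "1" ]; then
    fetch_cert "star.ppy.sh" && systemctl reload nginx
else
    echo "dry run: would fetch certificates from ${CERT_HOST}"
fi
```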
Replay handling for imported (and new) `solo_scores`
Just documenting some IRL discussion regarding the path forward with migration of the `has_replay` flag (aka knowing whether a score has a replay available), which is currently not present in the new `solo_scores` schema.
This was decided with the goal of keeping things simple and flexible for now, and may change in the future once we have the systems online.
Current proposal:
- Add `BOOL has_replay` to `solo_scores`.
- Add `legacy_score_id` to the JSON `data` (an alternative is to add a second index to the `legacy_id_map` table, but it feels like the legacy ID should be in the JSON data).
- Change replay retrieval code to first retrieve using `solo_score.ID`, falling back to `solo_score.data.legacy_score_id`.
- Migrate S3 data from legacy to new IDs once everything has settled (then remove the fallback).
Tasks to make this happen:
- Run online schema changes to add the new `has_replay` column
- Update `ImportHighScores` (ppy/osu-score-statistics-processor) to include `legacy_score_id` and `has_replay`
- Update web-10 submission to correctly update the new flags (on replay deletion)
- Update web-10 and osu-web replay retrieval to perform fallback lookups
- Migrate existing replays to new IDs on S3
cc/ @nanaya @smoogipoo
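The first two tasks might look roughly like this (values illustrative; the column addition would be run as an online schema change in practice):

```sql
-- Sketch: new flag column (default matches existing rows having no replay).
ALTER TABLE `solo_scores`
    ADD COLUMN `has_replay` tinyint(1) NOT NULL DEFAULT '0';

-- legacy_score_id lives inside the JSON data blob rather than a new column.
UPDATE `solo_scores`
    SET `data` = JSON_SET(`data`, '$.legacy_score_id', 123456789)
    WHERE `id` = 1;  -- illustrative ids only
```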
Populate `max_combo` in `osu_beatmaps` using `osu-difficulty-calculator`
Ensure that only `max_combo` is written, to avoid applying a new diffcalc version.
osu!(lazer) scores not showing on profile
I have set multiple plays in osu!(lazer) that do not show on my public profile. Let me provide examples:
Here, you can see a play I set on "how to create the weeknd's "blinding lights"" by Seth Everman. You can see that this submitted score added a 162pp play to my profile.
This is a screenshot of my profile; as you can see, the play does not show up. I will provide one more example:
As you can see in this image, I set a 202pp play on "Horrible Kids" by Set It Off. If you look at the previous screenshot of my profile, you can see I have lazer mode selected and these plays do not show up on my public profile. Is this a visual bug, or a bug with submitting scores? I would really like to have this issue resolved, as it is my first 200pp play and another score with a decent pp count. Thank you for reading. (Both of these scores were set over a week ago, by the way.)
Migrate camo to Kubernetes
Improving multiplayer things
I've been working through migration to the new `multiplayer_score_links` table, along with review on ppy/osu#24697 / ppy/osu-server-spectator#185. I've also been in discussion with @nanaya over my slight unhappiness with (inability to easily comprehend) the current structure of things.
To recap:
CREATE TABLE `multiplayer_score_links`
(
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`user_id` int unsigned NOT NULL,
`room_id` bigint unsigned NOT NULL,
`playlist_item_id` bigint unsigned NOT NULL,
`beatmap_id` mediumint unsigned NOT NULL,
`build_id` mediumint unsigned NOT NULL DEFAULT '0',
`score_id` bigint unsigned DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `multiplayer_score_links_score_id_index` (`score_id`),
KEY `multiplayer_score_links_room_id_user_id_index` (`room_id`, `user_id`),
KEY `multiplayer_score_links_playlist_item_id_index` (`playlist_item_id`),
KEY `multiplayer_score_links_user_id_index` (`user_id`)
)
`multiplayer_score_links` is a table that replaces `multiplayer_scores`, and allows the majority of the score metadata to be stored in the main `solo_scores` table. While not immediately obvious from the structure or naming, it currently serves a dual purpose:
- Used by osu-web for recalculating "user best" scores in a playlist or multiplayer room.
- Used instead of `solo_score_tokens` for multiplayer scores. In other words, it is used to give the user a token when they begin gameplay, which they can use to submit the final score.
This is weird. So my proposal is that we stop using this table for the token process, and use `solo_score_tokens` instead.
CREATE TABLE `solo_score_tokens`
(
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`score_id` bigint DEFAULT NULL,
`user_id` bigint NOT NULL,
`beatmap_id` mediumint NOT NULL,
`ruleset_id` smallint NOT NULL,
`build_id` mediumint unsigned DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`)
)
To make this work, we should add a `playlist_item_id NULL DEFAULT NULL` column to `solo_score_tokens`. This table is already ephemeral data, so adding extra columns like this is not a huge deal. We can then restructure the `multiplayer_score_links` table to be used specifically for lookup purposes.
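The column addition itself would be a single statement along these lines:

```sql
-- Sketch: add the playlist item link to the ephemeral token table.
ALTER TABLE `solo_score_tokens`
    ADD COLUMN `playlist_item_id` bigint unsigned NULL DEFAULT NULL;
```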
The new tables would look something like this:
CREATE TABLE `solo_score_tokens`
(
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`score_id` bigint DEFAULT NULL,
`user_id` bigint NOT NULL,
`beatmap_id` mediumint NOT NULL,
`ruleset_id` smallint NOT NULL,
`playlist_item_id` bigint unsigned NULL DEFAULT NULL,
`build_id` mediumint unsigned DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `playlist_scores`
(
`playlist_item_id` bigint unsigned NOT NULL,
`score_id` bigint unsigned DEFAULT NULL,
`user_id` int unsigned NOT NULL,
PRIMARY KEY (`score_id`),
KEY `multiplayer_score_links_score_id_index` (`playlist_item_id`),
KEY `multiplayer_score_links_user_id_index` (`user_id`)
)
This will make ppy/osu#24697 / ppy/osu-server-spectator#185 obsolete. It would also fix my naming issues with the `multiplayer_score_links` table.
Also some remaining cleanup tasks:
- `DROP multiplayer_scores`
- ppy/osu-web#10678

Some other considerations:
- Rename `solo_scores` to `scores`? Use a `VIEW` to alias the old name to ease migration?
- Rename `multiplayer_score_links` to `playlist_scores`
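If we go the rename-plus-alias route for `solo_scores`, a minimal sketch (MySQL):

```sql
-- Sketch only: rename the table, then alias the old name with a view so
-- existing queries keep working during the migration window.
RENAME TABLE `solo_scores` TO `scores`;
CREATE VIEW `solo_scores` AS SELECT * FROM `scores`;
```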
Investigate data size concerns for adding a new index to `solo_scores`
Database structure testing
As per #15, we need an index on `(user_id, ruleset_id, beatmap_id)` if using the database for operations like ppy/osu-queue-score-statistics#149 (and in the future, ranked score processing and probably more).
I want to test on production data (scaled down slightly) to get a real-world idea of how adding/changing indices affects the size of this table.
Test whether changing primary key composition affects index size
TL;DR it doesn't seem to.
-- original
ALTER TABLE `solo_scores`
DROP PRIMARY KEY,
ADD PRIMARY KEY (`id`, `preserve`, `ruleset_id`, `created_at`);
MySQL root@(none):osu> SELECT
-> table_name AS `Table`,
-> ROUND(data_length / 1024 / 1024) AS `Data MB`,
-> ROUND(index_length / 1024 / 1024) AS `Index MB`
-> FROM
-> information_schema.tables
-> WHERE
-> table_schema = 'osu' AND
-> table_name = 'solo_scores';
+-------------+---------+----------+
| Table | Data MB | Index MB |
+-------------+---------+----------+
| solo_scores | 1868 | 237 |
+-------------+---------+----------+
MySQL root@(none):osu> SHOW TABLE STATUS LIKE 'solo_scores';
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+
| solo_scores | InnoDB | 10 | Compressed | 9523175 | 205 | 1958215680 | 0 | 248758272 | 3932160 | 2406267336 | 2023-09-01 01:24:18 | <null> | <null> | utf8mb4_0900_ai_ci | <null> | row_format=COMPRESSED KEY_BLOCK_SIZE=4 partitioned | |
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+
-- remove ruleset_id and see if index size changes
ALTER TABLE `solo_scores`
DROP PRIMARY KEY,
ADD PRIMARY KEY (`id`, `preserve`, `created_at`)
MySQL root@(none):osu> SELECT
-> table_name AS `Table`,
-> ROUND(data_length / 1024 / 1024) AS `Data MB`,
-> ROUND(index_length / 1024 / 1024) AS `Index MB`
-> FROM
-> information_schema.tables
-> WHERE
-> table_schema = 'osu' AND
-> table_name = 'solo_scores';
+-------------+---------+----------+
| Table | Data MB | Index MB |
+-------------+---------+----------+
| solo_scores | 1867 | 236 |
+-------------+---------+----------+
MySQL root@(none):osu> SHOW TABLE STATUS LIKE 'solo_scores';
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+
| solo_scores | InnoDB | 10 | Compressed | 9577016 | 204 | 1957953536 | 0 | 247709696 | 5505024 | 2406267336 | 2023-09-01 02:58:21 | <null> | <null> | utf8mb4_0900_ai_ci | <null> | row_format=COMPRESSED KEY_BLOCK_SIZE=4 partitioned | |
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+
Test whether the existing index needs `id DESC` at the end (or whether it can use the implicit primary key)
TL;DR it doesn't; it can use the primary key.
explain SELECT * FROM solo_scores WHERE user_id = 2 AND ruleset_id = 0 ORDER BY id DESC LIMIT 50;
ALTER TABLE `solo_scores`
DROP INDEX `user_ruleset_id_index`,
ADD INDEX `user_ruleset_id_index` (`user_id`,`ruleset_id`);
explain SELECT * FROM solo_scores WHERE user_id = 2 AND ruleset_id = 0 ORDER BY id DESC LIMIT 50;
Test how badly inserting `beatmap_id` into the existing index (`KEY user_ruleset_id_index (user_id,ruleset_id,id DESC)`) breaks things (100% it will)
TL;DR it does, as expected. We can get around this by changing the existing `KEY beatmap_id` to include `user_id` at the end. This comes at almost zero storage cost due to the high cardinality of users per beatmap (almost completely unique). So we're just changing the ordering of the index rather than adding new overheads.
explain SELECT * FROM solo_scores WHERE user_id = 2 AND ruleset_id = 0 ORDER BY id DESC LIMIT 50;
MySQL root@(none):osu> explain SELECT `id` FROM `solo_scores_p` WHERE `user_id` = 19743981 AND `ruleset_id` = 0 and preserve in (0,1) ORDER BY `id`limit 10;
+----+-------------+---------------+------------+------+---------------+---------------+---------+-------------+---------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+---------------+---------------+---------+-------------+---------+----------+--------------------------+
| 1 | SIMPLE | solo_scores_p | p0catch,p1 | ref | user_id_index | user_id_index | 6 | const,const | 1700684 | 20.0 | Using where; Using index |
+----+-------------+---------------+------------+------+---------------+---------------+---------+-------------+---------+----------+--------------------------+
10 rows in set
Time: 0.005s
ALTER TABLE `solo_scores`
DROP INDEX `user_ruleset_id_index`,
ADD INDEX `user_ruleset_id_index` (`user_id`,`ruleset_id`, `beatmap_id`, `id` DESC);
explain SELECT * FROM solo_scores WHERE user_id = 2 AND ruleset_id = 0 ORDER BY id DESC LIMIT 50;
MySQL root@(none):osu> explain SELECT `id` FROM `solo_scores_p` WHERE `user_id` = 19743981 AND `ruleset_id` = 0 and preserve
-> in (0,1) ORDER BY `id`limit 10;
+----+-------------+---------------+------------+-------+---------------+---------+---------+--------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+-------+---------------+---------+---------+--------+------+----------+-------------+
| 1 | SIMPLE | solo_scores_p | p0catch,p1 | index | user_id_index | PRIMARY | 16 | <null> | 62 | 3.18 | Using where |
+----+-------------+---------------+------------+-------+---------------+---------+---------+--------+------+----------+-------------+
10 rows in set
Time: 0.498s
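The reordering described above (extending the existing `beatmap_id` key with `user_id` rather than adding a brand-new index) can be sketched as:

```sql
-- Sketch: reorder rather than add. Near-unique (beatmap_id, user_id) pairs
-- mean this costs almost no extra storage.
ALTER TABLE `solo_scores`
    DROP INDEX `beatmap_id`,
    ADD INDEX `beatmap_user_index` (`beatmap_id`, `user_id`);
```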
Test whether including `ruleset_id` in the constructed partitions (as touched on in #16) means we can remove it from indices
TL;DR correct, but I'm avoiding this because of concerns for future extensibility (custom rulesets). Also, there's no storage saving on disk.
explain select * from solo_scores where user_id = 4937439 and ruleset_id = 3 and beatmap_id = 75;
ALTER TABLE `solo_scores`
PARTITION BY RANGE COLUMNS(`preserve`, `ruleset_id`, created_at)
(
PARTITION p0r0catch VALUES LESS THAN (0,0,MAXVALUE) ENGINE = InnoDB,
PARTITION p0r1catch VALUES LESS THAN (0,1,MAXVALUE) ENGINE = InnoDB,
PARTITION p0r2catch VALUES LESS THAN (0,2,MAXVALUE) ENGINE = InnoDB,
PARTITION p0r3catch VALUES LESS THAN (0,3,MAXVALUE) ENGINE = InnoDB,
PARTITION p1r0 VALUES LESS THAN (1,0,MAXVALUE) ENGINE = InnoDB,
PARTITION p1r1 VALUES LESS THAN (1,1,MAXVALUE) ENGINE = InnoDB,
PARTITION p1r2 VALUES LESS THAN (1,2,MAXVALUE) ENGINE = InnoDB,
PARTITION p1r3 VALUES LESS THAN (1,3,MAXVALUE) ENGINE = InnoDB
);
-- working alternative?
ALTER TABLE `solo_scores`
PARTITION BY list COLUMNS(`preserve`, `ruleset_id`)
(
PARTITION p0r0catch VALUES IN ((0,0)),
PARTITION p0r1catch VALUES IN ((0,1)),
PARTITION p0r2catch VALUES IN ((0,2)),
PARTITION p0r3catch VALUES IN ((0,3)),
PARTITION p1r0 VALUES IN ((1,0)),
PARTITION p1r1 VALUES IN ((1,1)),
PARTITION p1r2 VALUES IN ((1,2)),
PARTITION p1r3 VALUES IN ((1,3))
);
+-------------+----------------+---------+----------+
| Table | AVG_ROW_LENGTH | Data MB | Index MB |
+-------------+----------------+---------+----------+
| solo_scores | 178 | 1635 | 342 |
+-------------+----------------+---------+----------+
MySQL root@(none):osu> explain select * from solo_scores where user_id = 4937439 and ruleset_id = 3 and beatmap_id = 75 and preserve = 1\G;
***************************[ 1. row ]***************************
id | 1
select_type | SIMPLE
table | solo_scores
partitions | p1r3
type | ref
possible_keys | user_ruleset_id_index,beatmap_id
key | user_ruleset_id_index
key_len | 6
ref | const,const
rows | 1
filtered | 5.0
Extra | Using index condition; Using where
ALTER TABLE `solo_scores`
DROP INDEX `user_ruleset_id_index`,
ADD KEY `user_id_index` (`user_id`,`id` DESC);
+-------------+----------------+---------+----------+
| Table | AVG_ROW_LENGTH | Data MB | Index MB |
+-------------+----------------+---------+----------+
| solo_scores | 178 | 1635 | 342 |
+-------------+----------------+---------+----------+
explain select * from solo_scores where user_id = 4937439 and ruleset_id = 3 and beatmap_id = 75;
Check why we have a `beatmap_id` index
- Used when a beatmap changes `approved` state (`osu-web`).
Notes
- Using `LIST` partitioning, MySQL can't determine that it should use specific partitions when `preserve=` is not in the query:
MySQL root@(none):osu> explain select id from solo_scores where user_id = 19743981 and ruleset_id = 0 order by id desc limit 10\G;
***************************[ 1. row ]***************************
id | 1
select_type | SIMPLE
table | solo_scores
partitions | p0r0,p0r1,p0r2,p0r3,p1r0,p1r1,p1r2,p1r3
type | index
possible_keys | user_id_index
key | PRIMARY
key_len | 16
ref | <null>
rows | 64
filtered | 1.56
Extra | Using where; Backward index scan
MySQL root@(none):osu> explain select id from solo_scores where user_id = 19743981 and ruleset_id = 0 and preserve in (0,1) order by id desc limit 10\G;
***************************[ 1. row ]***************************
id | 1
select_type | SIMPLE
table | solo_scores
partitions | p0r0,p1r0
type | index
possible_keys | user_id_index
key | PRIMARY
key_len | 16
ref | <null>
rows | 56
filtered | 0.35
Extra | Using where; Backward index scan
- Enabling partitioning seems to reduce `DATA_SIZE` by around 33%. But there's no documentation anywhere saying that this should be the case. What gives?
+-----------------+----------------+---------+----------+
| Table | AVG_ROW_LENGTH | Data MB | Index MB |
+-----------------+----------------+---------+----------+
| solo_scores | 300 | 2835 | 369 |
| solo_scores_p | 178 | 1780 | 359 |
| solo_scores_p_r | 178 | 1782 | 359 |
+-----------------+----------------+---------+----------+
(fixed by running `ALTER TABLE solo_scores FORCE` ...)
Migrate osu-web (octane) to Kubernetes
Figure out the final partitioning structure for `solo_scores`
I set up partitioning to work with my last major effort on this table, but two pieces are missing.
Partition scheme includes `ruleset_id` but it is not used
Originally we were looking to partition per ruleset, which seems like a great idea from a performance angle. As such, this was included in the partition scheme, but it hasn't actually been used in the constructed partitions.
I want to test the overhead that having `ruleset_id` in the primary key adds; if it is negligible, then we should leave it there and ensure the constructed partitions are per-ruleset.
(will be tested in #17)
Conclusion: Including ruleset in the partitioning schema is not a good direction, in terms of future extensibility and complexity in partition rotation.
Partition rotation is not yet implemented
Currently we have only two partitions: `preserve=0` and `preserve=1`. The plan is to add partition rotation for the `p0` case, but this hasn't been done yet. It still seems like a good idea, but comes with a concern:
We are going in a direction where a score can be switched between `preserve=0` and `preserve=1`. We also have tooling for removing `preserve=0` scores after a certain delay and when all criteria are correct (see ppy/osu-queue-score-statistics#141).
So, what happens when a score becomes `p=0` and falls into a partition that is about to be rotated for cleanup? If the cleanup process is guaranteeing a time window before scores are cleaned up after the last `preserve` flag switch, then they may end up getting rotated out of existence too soon. An example would be a user unpinning a score, then realising they want to pin it again a few minutes later. With unfortunate timing, it would be lost during this period.
A solution may be to change the partitioning to be on `updated_at` instead of `created_at`, and ensure we are updating `updated_at` on any row change.
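Combined with rotation, that could look something like the following (partition names and boundary dates are illustrative only):

```sql
-- Sketch: partition on updated_at so that flipping preserve (which bumps
-- updated_at) moves a score out of any partition pending rotation.
ALTER TABLE `solo_scores`
PARTITION BY RANGE COLUMNS(`preserve`, `updated_at`)
(
    PARTITION p0_w1 VALUES LESS THAN (0, '2023-10-09 00:00:00'),
    PARTITION p0_w2 VALUES LESS THAN (0, '2023-10-16 00:00:00'),
    PARTITION p0_max VALUES LESS THAN (0, MAXVALUE),
    PARTITION p1 VALUES LESS THAN (MAXVALUE, MAXVALUE)
);
```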
Partition rotation should be considered for other tables
- `solo_score_tokens`
Considerations for `solo_scores` table structure / indices
For certain tasks like ppy/osu-queue-score-statistics#149, we need the ability to do lookups on `(user_id,ruleset_id,beatmap_id)`. Currently we have no index on this.
For other similar lookups in osu-web, elasticsearch is used. My concerns with using elasticsearch are reliability and correctness. The latter is important: if an item in elasticsearch has been queued for indexing/reindexing which hasn't yet been completed, incorrect (outdated) data could be returned.
So I'm going to focus on the database side of things first.
Adding an index has both performance and data size concerns.
I'll be investigating and testing structural changes to figure out the correct path forward.
Structural changes to be applied from investigations:
--- old.sql 2023-09-01 18:17:38
+++ new.sql 2023-09-01 20:03:04
@@ -6,12 +6,12 @@
`data` json NOT NULL,
`has_replay` tinyint(1) DEFAULT '0',
`preserve` tinyint(1) NOT NULL DEFAULT '0',
- `created_at` datetime NOT NULL,
- `updated_at` timestamp NULL DEFAULT NULL,
- PRIMARY KEY (`id`,`ruleset_id`,`preserve`,`created_at`),
- KEY `user_ruleset_id_index` (`user_id`,`ruleset_id`,`id` DESC),
- KEY `beatmap_id` (`beatmap_id`)
+ `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
+ `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
+ PRIMARY KEY (`id`,`preserve`,`updated_at`),
+ KEY `user_ruleset_index` (`user_id`,`ruleset_id`),
+ KEY `beatmap_user_index` (`beatmap_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
-/*!50500 PARTITION BY RANGE COLUMNS(`preserve`,created_at)
+/*!50500 PARTITION BY RANGE COLUMNS(`preserve`,`updated_at`)
(PARTITION p0catch VALUES LESS THAN (0,MAXVALUE) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (MAXVALUE,MAXVALUE) ENGINE = InnoDB) */
Investigate whether there's a better way to do the recent scores lookup
The only reason the index `KEY user_ruleset_id_index (user_id,ruleset_id,id DESC)` exists is to do recent score display on user profiles.
Historically, we've maintained a separate table for this. We may just want to go back to doing this if it means the main table gets a whole lot smaller.
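A separate table for this might look something like the sketch below (name and columns are assumptions, not a settled design):

```sql
-- Sketch: a small dedicated table for profile "recent plays", trimmed of old
-- rows periodically, letting the main table drop the wide user index.
CREATE TABLE `solo_scores_recent`
(
    `score_id` bigint unsigned NOT NULL,
    `user_id` int unsigned NOT NULL,
    `ruleset_id` smallint unsigned NOT NULL,
    PRIMARY KEY (`score_id`),
    KEY `user_ruleset_index` (`user_id`, `ruleset_id`, `score_id` DESC)
);
```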
Redeploy server components with new legacy difficulty attribute storage
[Discussion] Is `osu_scores_high` `UNLOGGED`?
Having a table be `UNLOGGED` can lead to much faster performance (if you're using PostgreSQL). Maybe give it a try if it's not already unlogged?
ALTER TABLE "osu_scores_high" SET UNLOGGED
Kubernetes `osu-server-spectator` deploy isn't reporting to datadog
As seen on https://status.ppy.sh (see datadog metric and note only one host is reporting).
Deploy ProxySQL on kubernetes
The version we are running is quite outdated, and has a rare tendency to fall over. It would be beneficial if we can run one (or more) instances on kubernetes, to allow for easier upgrades and better resilience.
Things that need consideration:
- Management of configuration (needs to be easy to make quick changes). I don't know what kind of import/export options are available, potentially just a `mysqldump`.
- If we run multiple instances, I'd hope kubernetes can round-robin between them automatically while only giving a single host specification to deployments.
- I'd probably limit usage to only apps deployed to kubernetes. We can keep a separate instance for other usage as we migrate.

For reference, ProxySQL is high cpu, low everything-else.
Migrate osu-notification-server to Kubernetes
Migrate scthumber to Kubernetes
This service is running on an oversized droplet and is not a user-facing service, making it a perfect candidate for migration to Kubernetes.
[Discussion] OpenSearch instead of Elasticsearch
Create a new elasticsearch schema
OpenSearch is a fork of Elasticsearch that's more performant. Maybe give it a shot?
Investigate adding a new column for rankability of individual scores
While we can control the rankability (i.e. for pp) of individual builds via the `osu_builds` flag, it may be worthwhile to also be able to control the rankability of individual scores.
For example, if we want to unrank only those scores set with a particular mod.