
osu-infrastructure's People

Contributors

bdach, nanaya, peppy


osu-infrastructure's Issues

Score statistics processor needs to process PP for imported highscores

When we deploy lazer for ranking purposes, we need to completely shut down osu-performance and have PP be solely updated by osu-queue-score-statistics. Otherwise, because osu-performance only reads from the legacy tables, it will exclude any scores set in lazer.

But right now all imported highscores bypass osu-queue-score-statistics, so it'll miss any scores set in osu!stable.

Add partitioning to `solo_scores` table

See https://gist.github.com/peppy/8851f98d0fb783ff29043f408e6a3923 for three proposals.
(I'd recommend opening each in a separate browser tab so you can compare the differences by switching between them.)

  • partition_preserve is the most pessimistic (less optimised, but maybe also less fussy in the future when we add more rulesets)
  • partition_ruleset_preserve is the most optimal; it moves ruleset_id out of the largest index and removes the preserve index (see caveat in inline comments)
  • partition_preserve_ruleset removes the caveat around preserve lookups, but as a result can't optimise ruleset_id out of the other index

I'd recommend running each of these and importing some data to test performance / query plans for any queries of concern.

insert into partitioned_solo_scores select * from solo_scores order by id desc limit 10000;

can be used if you already have data in the actual table.
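
For a rough idea of the shape of these proposals, here is a minimal sketch of a preserve-based range partitioning, assumed from the scheme used elsewhere in this tracker (the authoritative DDL for all three variants is in the gist above):

CREATE TABLE `partitioned_solo_scores` LIKE `solo_scores`;

-- Non-preserved scores land in a rotatable partition; preserved scores stay
-- in a stable one. The partition columns must be part of the primary key.
ALTER TABLE `partitioned_solo_scores`
    PARTITION BY RANGE COLUMNS(`preserve`, `created_at`)
        (
        PARTITION p0catch VALUES LESS THAN (0, MAXVALUE) ENGINE = InnoDB,
        PARTITION p1 VALUES LESS THAN (MAXVALUE, MAXVALUE) ENGINE = InnoDB
        );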

SR/PP update checklist 2022-09

Important changes

  • osu: Added HD+FL difficulty adjustment mod combination (1032)
  • osu: Added TD difficulty adjustment mod (4)
  • osu: Added speed_note_count difficulty attribute.

Table setup

  • No changes required.
    • Database already contains the new Speed note count attribute (attrib_id = 21).
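
A quick sanity check (a sketch, assuming the attribute definitions live in an osu_difficulty_attribs table as in osu-web's schema):

-- Confirm the speed_note_count attribute definition exists.
SELECT * FROM `osu_difficulty_attribs` WHERE `attrib_id` = 21;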

Medal updates

Related PRs

Deployment (in order)

Suggested ruleset order:
taiko -> osu -> mania -> catch

  • osu-difficulty-calculator (watch - all rulesets)
  • osu-difficulty-calculator (reprocess - all rulesets)
  • osu-web-10 (with above PRs)
  • osu-performance (watch - all rulesets)
  • osu-performance (reprocess - all rulesets)
  • osu-beatmap-difficulty-lookup-cache (with above PR)
  • osu-stable (with above PR)
  • osu-queue-score-statistics (watch)
  • osu-queue-score-statistics (scores all - all rulesets)

Wiki

Infrastructure deployment tasks for path-to-ranking

At a high level, this tracks the order of deployment tasks that depend on other changes.

Tasks

  1. peppy
  2. 9 of 9
    area:results type:online
    bdach
  3. peppy
  4. peppy
  5. 13 of 13
    smoogipoo
  6. bdach
  7. bdach
  8. 5 of 7
    bdach peppy

Testing notes for self:

To reset everything:

# nuke all indices
osu.ElasticIndexer# dotnet run index nuke

# view all indices
curl -X GET "localhost:9200/_cat/indices?v&pretty"

# restart es service
systemctl restart elasticsearch

truncate table scores;
truncate table score_legacy_id_map;
truncate table score_performance;

Ongoing stuff which needs to be run:

# osu-queue-score-statistics
cd ~/repos/osu-queue-score-statistics/osu.Server.Queues.ScoreStatisticsProcessor
git fetch; git reset --hard peppy/new-table-names
SCHEMA=20231208 dotnet run queue import-high-scores --start-id 0

# osu-elastic-indexer
cd ~/repos/osu-elastic-indexer/osu.ElasticIndexer
git fetch --all; git reset --hard peppy/new-table-names
SCHEMA=20231208 dotnet run queue watch

Add osu-web support for new score infrastructure

Currently documented at https://github.com/ppy/osu-infrastructure/blob/master/score-submission.md

osu-web PRs:

osu-web Preparation (done)
  • update profile page recent plays to use solo scores
  • add user/ruleset index on solo scores
    • need to figure out how to filter out the failed ones
    • also probably need to add id as part of the index for sorting
  • update beatmap user scores to use es
  • add beatmap leader reset job/command
  • update profile page first place section to use solo scores
  • create query builder-like class for querying scores
  • measure index size for solo_scores(beatmap_id): 19GB/16GB (dynamic/+lz4), 9GB/8GB (compressed/+lz4)
  • measure index size for solo_scores(user_id, ruleset_id, created_at desc): not needed anymore since we're going to show all recent plays rather than just the last 24h
  • update beatmap pack page to use es (partially waiting for ppy/osu-elastic-indexer#111)
  • fix tests (queuing score for index and update github action and docker dev) (partially waiting for ppy/osu-elastic-indexer#110)
  • update beatmap scoreboard reset (during state change) to also delete solo scores
  • queue new scores (waiting on the specs?)

SSL migration

Our current wildcard certificate expires on September 3rd. Our current provider (DigiCert) has increased their pricing and we're looking to move away from them.

After reviewing and testing Let's Encrypt's device compatibility, we have decided to integrate ACME into our infrastructure and switch to LE.

Google Trust Services has also put a public ACME service in place. They offer a similar service to Let's Encrypt, except their root certificate dates back to 1998, so device compatibility is about as good as it gets. Using GTS lets us retain the compatibility that osu! users are used to. This service is in a free public beta, and it may become paid once the beta ends, but since both providers use ACME we can switch back and forth between them in just a few minutes, so the plan is to roll with GTS for now.

  • Kubernetes clusters SSL migration
    Our Kubernetes clusters will issue certificates using cert-manager, an ACME client by jetstack built for Kubernetes.

    • Staging cluster (http01 validation only)
    • Production cluster (dns01 & http01 validation)
  • Individual droplets SSL migration
    Most of the work goes here, as there are dozens of droplets to migrate - or rather, to create an SSL infrastructure for. As we used to renew with DigiCert every 3 years, no automated process was ever put in place. LE/GTS deliver certificates valid for at most 90 days, so we must switch to an automated solution.

    Our individual droplets run a huge variety of operating systems. Managing ACME clients on each would be a huge overhead, and we'd rather not share our Cloudflare API token with every droplet that needs wildcard certificates.

    Therefore, we will rely on the cert-manager in our production Kubernetes cluster to issue and renew all the certificates we need. All our droplets will fetch these certificates on a regular basis using small bash/curl scripts, via a custom-made HTTPS service running inside the production cluster. Droplets will be authenticated using client-side certificates.

    • Certificates serving back-end development
    • Certificates serving back-end deployment
    • Certificates fetching script development
    • Certificates fetching script deployment across all ~15 droplets/nodes that need them.
  • Automatically refresh our custom edge certificate on Cloudflare

Replay handling for imported (and new) `solo_scores`

Just documenting some IRL discussion regarding the path forward with migration of the has_replay flag (aka knowing if a score has a replay available), which is currently not present in the new solo_scores schema.

This was decided with the goal of keeping things simple and flexible for now, and may change in the future once we have the systems online.

Current proposal:

  • Add a BOOL has_replay column to solo_scores
  • Add legacy_score_id to the JSON data (the alternative is to add a second index to the legacy_id_map table, but it feels like the legacy ID should be in the JSON data)
  • Change replay retrieval code to first retrieve using solo_score.ID, falling back to solo_score.data.legacy_score_id.
  • Migrate S3 data from legacy to new IDs once everything has settled (then remove the fallback).

Tasks to make this happen:

  • Run online schema changes to add the new has_replay column (see the sketch after this list)
  • Update ImportHighScores (ppy/osu-score-statistics-processor) to include legacy_score_id and has_replay
  • Update web-10 submission to correctly update the new flags (on replay deletion)
  • Update web-10 and osu-web replay retrieval to perform fallback lookups
  • Migrate existing replays to new IDs on S3
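
For the first task, the schema change itself is small (a sketch; the column definition mirrors the has_replay column in the solo_scores structural diff elsewhere in this tracker, and in production it would be run through an online schema change tool rather than a direct ALTER):

ALTER TABLE `solo_scores`
    ADD COLUMN `has_replay` tinyint(1) DEFAULT '0';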

cc/ @nanaya @smoogipoo

Osu! Lazer Scores not showing on profile

I have set multiple plays in Osu! Lazer that do not show on my public profile. Let me provide examples:

image
Here, you can see a play I set on "how to create the weeknd's "blinding lights"" by Seth Everman. You can see that this submitted score added a 162pp play to my profile.
profile
This is a screenshot of my profile; as you can see, the play does not show up. I will provide one more example:
image
As you can see in this image, I set a 202pp play on "Horrible Kids" by Set It Off. If you look at the previous screenshot of my profile, you can see I have Lazer mode selected and these plays do not show up on my public profile. Is this a visual bug, or a bug with submitting scores? I would really like to have this issue resolved, as it is my first 200pp play and another score with a decent pp count. Thank you for reading. (Both of these scores were set over a week ago, btw.)

Improving multiplayer things

I've been working through migration to the new multiplayer_score_links table, along with review on ppy/osu#24697 / ppy/osu-server-spectator#185. I've also been in discussion with @nanaya over my slight unhappiness with (read: inability to easily comprehend) the current structure of things.

To recap:

CREATE TABLE `multiplayer_score_links`
(
    `id`               bigint unsigned    NOT NULL AUTO_INCREMENT,
    `user_id`          int unsigned       NOT NULL,
    `room_id`          bigint unsigned    NOT NULL,
    `playlist_item_id` bigint unsigned    NOT NULL,
    `beatmap_id`       mediumint unsigned NOT NULL,
    `build_id`         mediumint unsigned NOT NULL DEFAULT '0',
    `score_id`         bigint unsigned             DEFAULT NULL,
    `created_at`       timestamp          NULL     DEFAULT NULL,
    `updated_at`       timestamp          NULL     DEFAULT NULL,
    PRIMARY KEY (`id`),
    KEY `multiplayer_score_links_score_id_index` (`score_id`),
    KEY `multiplayer_score_links_room_id_user_id_index` (`room_id`, `user_id`),
    KEY `multiplayer_score_links_playlist_item_id_index` (`playlist_item_id`),
    KEY `multiplayer_score_links_user_id_index` (`user_id`)
)

multiplayer_score_links is a table that replaces multiplayer_scores, and allows the majority of the score metadata to be stored in the main solo_scores table. While not immediately obvious from the structure or naming, it currently serves a dual purpose:

  • Used by osu-web for recalculating "user best" scores in a playlist or multiplayer room.
  • Used instead of solo_score_tokens for multiplayer scores. In other words, it is used to give the user a token when they begin gameplay that they can use to submit the final score.

This is weird. So my proposal is that we stop using this table for the token process, and use solo_score_tokens instead.

CREATE TABLE `solo_score_tokens`
(
    `id`         bigint unsigned NOT NULL AUTO_INCREMENT,
    `score_id`   bigint               DEFAULT NULL,
    `user_id`    bigint          NOT NULL,
    `beatmap_id` mediumint       NOT NULL,
    `ruleset_id` smallint        NOT NULL,
    `build_id`   mediumint unsigned   DEFAULT NULL,
    `created_at` timestamp       NULL DEFAULT NULL,
    `updated_at` timestamp       NULL DEFAULT NULL,
    PRIMARY KEY (`id`)
)

To make this work, we should add a playlist_item_id bigint unsigned NULL DEFAULT NULL column to solo_score_tokens (see the sketch below). This table holds ephemeral data, so adding extra columns like this is not a huge deal. We can then restructure the multiplayer_score_links table to be used specifically for lookup purposes.
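
As a sketch, the token table change is a single nullable column (matching the definition in the proposed table below):

ALTER TABLE `solo_score_tokens`
    ADD COLUMN `playlist_item_id` bigint unsigned NULL DEFAULT NULL;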

The new tables would look something like this:

CREATE TABLE `solo_score_tokens`
(
    `id`               bigint unsigned NOT NULL AUTO_INCREMENT,
    `score_id`         bigint               DEFAULT NULL,
    `user_id`          bigint          NOT NULL,
    `beatmap_id`       mediumint       NOT NULL,
    `ruleset_id`       smallint        NOT NULL,
    `playlist_item_id` bigint unsigned NULL DEFAULT NULL,
    `build_id`         mediumint unsigned   DEFAULT NULL,
    `created_at`       timestamp       NULL DEFAULT NULL,
    `updated_at`       timestamp       NULL DEFAULT NULL,
    PRIMARY KEY (`id`)
);

CREATE TABLE `playlist_scores`
(
    `playlist_item_id` bigint unsigned NOT NULL,
    `score_id`         bigint unsigned NOT NULL,
    `user_id`          int unsigned    NOT NULL,
    PRIMARY KEY (`score_id`),
    KEY `playlist_scores_playlist_item_id_index` (`playlist_item_id`),
    KEY `playlist_scores_user_id_index` (`user_id`)
)

This will make ppy/osu#24697 / ppy/osu-server-spectator#185 obsolete. It would also fix my naming issues with the multiplayer_score_links table.

Also some remaining cleanup tasks

Some other considerations

  • Rename solo_scores to scores? Use a VIEW to alias the old name to ease migration? (see the sketch below)
  • Rename multiplayer_score_links to playlist_scores
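
If we go the VIEW route for the rename, it's a quick operation (a sketch, assuming a straight rename with no column changes):

RENAME TABLE `solo_scores` TO `scores`;

-- Old queries keep working against the aliased name during migration.
CREATE VIEW `solo_scores` AS SELECT * FROM `scores`;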

cc/ @nanaya @notbakaneko @bdach @smoogipoo

Investigate data size concerns for adding a new index to `solo_scores`

Database structure testing

As per #15, we need an index on (user_id, ruleset_id, beatmap_id) if using the database for operations like ppy/osu-queue-score-statistics#149 (and in the future, ranked score processing and probably more).

I want to test on production data (scaled down slightly) to get a real-world idea of how adding/changing indices affects the size of this table.

Test whether changing primary key composition affects index size

TL;DR it doesn't seem to.

-- original
ALTER TABLE `solo_scores`
DROP PRIMARY KEY,
ADD PRIMARY KEY (`id`, `preserve`, `ruleset_id`, `created_at`);

MySQL root@(none):osu> SELECT
                    ->     table_name AS `Table`,
                    ->     ROUND(data_length / 1024 / 1024) AS `Data MB`,
                    ->     ROUND(index_length / 1024 / 1024) AS `Index MB`
                    -> FROM
                    ->     information_schema.tables
                    -> WHERE
                    ->     table_schema = 'osu' AND
                    ->     table_name = 'solo_scores';
+-------------+---------+----------+
| Table       | Data MB | Index MB |
+-------------+---------+----------+
| solo_scores | 1868    | 237      |
+-------------+---------+----------+


MySQL root@(none):osu> SHOW TABLE STATUS LIKE 'solo_scores';
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+
| Name        | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation          | Checksum | Create_options                                     | Comment |
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+
| solo_scores | InnoDB | 10      | Compressed | 9523175 | 205            | 1958215680  | 0               | 248758272    | 3932160   | 2406267336     | 2023-09-01 01:24:18 | <null>      | <null>     | utf8mb4_0900_ai_ci | <null>   | row_format=COMPRESSED KEY_BLOCK_SIZE=4 partitioned |         |
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+
-- remove ruleset_id and see if index size changes
ALTER TABLE `solo_scores`
DROP PRIMARY KEY,
ADD PRIMARY KEY (`id`, `preserve`, `created_at`)

MySQL root@(none):osu> SELECT
                    ->     table_name AS `Table`,
                    ->     ROUND(data_length / 1024 / 1024) AS `Data MB`,
                    ->     ROUND(index_length / 1024 / 1024) AS `Index MB`
                    -> FROM
                    ->     information_schema.tables
                    -> WHERE
                    ->     table_schema = 'osu' AND
                    ->     table_name = 'solo_scores';
+-------------+---------+----------+
| Table       | Data MB | Index MB |
+-------------+---------+----------+
| solo_scores | 1867    | 236      |
+-------------+---------+----------+

MySQL root@(none):osu> SHOW TABLE STATUS LIKE 'solo_scores';
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+
| Name        | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation          | Checksum | Create_options                                     | Comment |
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+
| solo_scores | InnoDB | 10      | Compressed | 9577016 | 204            | 1957953536  | 0               | 247709696    | 5505024   | 2406267336     | 2023-09-01 02:58:21 | <null>      | <null>     | utf8mb4_0900_ai_ci | <null>   | row_format=COMPRESSED KEY_BLOCK_SIZE=4 partitioned |         |
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+

Test whether the existing index needs id DESC at the end (or whether it can use the implicit primary key)

TL;DR it doesn't; it can use the primary key.

explain SELECT * FROM solo_scores WHERE user_id = 2 AND ruleset_id = 0 ORDER BY id DESC LIMIT 50;

ALTER TABLE `solo_scores`
    DROP INDEX `user_ruleset_id_index`,
    ADD INDEX `user_ruleset_id_index` (`user_id`,`ruleset_id`);

explain SELECT * FROM solo_scores WHERE user_id = 2 AND ruleset_id = 0 ORDER BY id DESC LIMIT 50;

Test how badly inserting beatmap_id into the existing index (KEY user_ruleset_id_index (user_id,ruleset_id,id DESC)) breaks things (100% it will)

TL;DR it does, as expected. We can get around this by changing the existing KEY beatmap_id to include user_id at the end (see the sketch after the query plans below). This comes at almost zero storage cost due to the high cardinality of users per beatmap (almost completely unique), so we're just changing the ordering of the index rather than adding new overhead.

explain SELECT * FROM solo_scores WHERE user_id = 2 AND ruleset_id = 0 ORDER BY id DESC LIMIT 50;


MySQL root@(none):osu> explain SELECT `id` FROM `solo_scores_p` WHERE `user_id` = 19743981 AND `ruleset_id` = 0 and preserve in (0,1) ORDER BY `id` LIMIT 10;
+----+-------------+---------------+------------+------+---------------+---------------+---------+-------------+---------+----------+--------------------------+
| id | select_type | table         | partitions | type | possible_keys | key           | key_len | ref         | rows    | filtered | Extra                    |
+----+-------------+---------------+------------+------+---------------+---------------+---------+-------------+---------+----------+--------------------------+
| 1  | SIMPLE      | solo_scores_p | p0catch,p1 | ref  | user_id_index | user_id_index | 6       | const,const | 1700684 | 20.0     | Using where; Using index |
+----+-------------+---------------+------------+------+---------------+---------------+---------+-------------+---------+----------+--------------------------+

10 rows in set
Time: 0.005s


ALTER TABLE `solo_scores`
    DROP INDEX `user_ruleset_id_index`,
    ADD INDEX `user_ruleset_id_index` (`user_id`,`ruleset_id`, `beatmap_id`, `id` DESC);

explain SELECT * FROM solo_scores WHERE user_id = 2 AND ruleset_id = 0 ORDER BY id DESC LIMIT 50;

MySQL root@(none):osu> explain SELECT `id` FROM `solo_scores_p` WHERE `user_id` = 19743981 AND `ruleset_id` = 0 and preserve
                    -> in (0,1) ORDER BY `id` LIMIT 10;
+----+-------------+---------------+------------+-------+---------------+---------+---------+--------+------+----------+-------------+
| id | select_type | table         | partitions | type  | possible_keys | key     | key_len | ref    | rows | filtered | Extra       |
+----+-------------+---------------+------------+-------+---------------+---------+---------+--------+------+----------+-------------+
| 1  | SIMPLE      | solo_scores_p | p0catch,p1 | index | user_id_index | PRIMARY | 16      | <null> | 62   | 3.18     | Using where |
+----+-------------+---------------+------------+-------+---------------+---------+---------+--------+------+----------+-------------+

10 rows in set
Time: 0.498s
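
A sketch of the workaround described in the TL;DR above, matching the beatmap_user_index that ends up in the final structural diff:

ALTER TABLE `solo_scores`
    DROP INDEX `beatmap_id`,
    -- user_id is almost unique per beatmap, so this mostly reorders the
    -- existing index data rather than adding new overhead.
    ADD INDEX `beatmap_user_index` (`beatmap_id`, `user_id`);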

Test whether including ruleset_id in the constructed partitions (as touched on in #16) means we can remove it from indices

TL;DR correct, but I'm avoiding this because of concerns for future extensibility (custom rulesets). Also there's no storage saving on disk.

explain select * from solo_scores where user_id = 4937439 and ruleset_id = 3 and beatmap_id = 75;

ALTER TABLE `solo_scores`
    PARTITION BY RANGE COLUMNS(`preserve`, `ruleset_id`, created_at)
        (
        PARTITION p0r0catch VALUES LESS THAN (0,0,MAXVALUE) ENGINE = InnoDB,
        PARTITION p0r1catch VALUES LESS THAN (0,1,MAXVALUE) ENGINE = InnoDB,
        PARTITION p0r2catch VALUES LESS THAN (0,2,MAXVALUE) ENGINE = InnoDB,
        PARTITION p0r3catch VALUES LESS THAN (0,3,MAXVALUE) ENGINE = InnoDB,
        PARTITION p1r0 VALUES LESS THAN (1,0,MAXVALUE) ENGINE = InnoDB,
        PARTITION p1r1 VALUES LESS THAN (1,1,MAXVALUE) ENGINE = InnoDB,
        PARTITION p1r2 VALUES LESS THAN (1,2,MAXVALUE) ENGINE = InnoDB,
        PARTITION p1r3 VALUES LESS THAN (1,3,MAXVALUE) ENGINE = InnoDB
        );

-- working alternative?
ALTER TABLE `solo_scores`
 PARTITION BY LIST COLUMNS(`preserve`, `ruleset_id`)
     (
     PARTITION p0r0catch VALUES IN ((0,0)),
     PARTITION p0r1catch VALUES IN ((0,1)),
     PARTITION p0r2catch VALUES IN ((0,2)),
     PARTITION p0r3catch VALUES IN ((0,3)),
     PARTITION p1r0 VALUES IN ((1,0)),
     PARTITION p1r1 VALUES IN ((1,1)),
     PARTITION p1r2 VALUES IN ((1,2)),
     PARTITION p1r3 VALUES IN ((1,3))
     );

+-------------+----------------+---------+----------+
| Table       | AVG_ROW_LENGTH | Data MB | Index MB |
+-------------+----------------+---------+----------+
| solo_scores | 178            | 1635    | 342      |
+-------------+----------------+---------+----------+

MySQL root@(none):osu> explain select * from solo_scores where user_id = 4937439 and ruleset_id = 3 and beatmap_id = 75 and preserve = 1\G;
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | solo_scores
partitions    | p1r3
type          | ref
possible_keys | user_ruleset_id_index,beatmap_id
key           | user_ruleset_id_index
key_len       | 6
ref           | const,const
rows          | 1
filtered      | 5.0
Extra         | Using index condition; Using where

ALTER TABLE `solo_scores`
    DROP INDEX `user_ruleset_id_index`,
    ADD KEY `user_id_index` (`user_id`,`id` DESC);


+-------------+----------------+---------+----------+
| Table       | AVG_ROW_LENGTH | Data MB | Index MB |
+-------------+----------------+---------+----------+
| solo_scores | 178            | 1635    | 342      |
+-------------+----------------+---------+----------+

explain select * from solo_scores where user_id = 4937439 and ruleset_id = 3 and beatmap_id = 75;


Check why we have a beatmap_id index

  • Used when a beatmap changes approved state (osu-web).

Notes

  • Using LIST partitioning, MySQL can't determine that it should use specific partitions when preserve is not in the query:
MySQL root@(none):osu> explain select id from solo_scores where user_id = 19743981 and ruleset_id = 0 order by id desc limit 10\G;
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | solo_scores
partitions    | p0r0,p0r1,p0r2,p0r3,p1r0,p1r1,p1r2,p1r3
type          | index
possible_keys | user_id_index
key           | PRIMARY
key_len       | 16
ref           | <null>
rows          | 64
filtered      | 1.56
Extra         | Using where; Backward index scan


MySQL root@(none):osu> explain select id from solo_scores where user_id = 19743981 and ruleset_id = 0 and preserve in (0,1) order by id desc limit 10\G;
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | solo_scores
partitions    | p0r0,p1r0
type          | index
possible_keys | user_id_index
key           | PRIMARY
key_len       | 16
ref           | <null>
rows          | 56
filtered      | 0.35
Extra         | Using where; Backward index scan
  • Enabling partitioning seems to reduce DATA_SIZE by around 33%. But there's no documentation anywhere saying that this should be the case. What gives?
+-----------------+----------------+---------+----------+
| Table           | AVG_ROW_LENGTH | Data MB | Index MB |
+-----------------+----------------+---------+----------+
| solo_scores     | 300            | 2835    | 369      |
| solo_scores_p   | 178            | 1780    | 359      |
| solo_scores_p_r | 178            | 1782    | 359      |
+-----------------+----------------+---------+----------+

(fixed by running ALTER TABLE solo_scores FORCE...)

Figure out the final partitioning structure for `solo_scores`

I set up partitioning as part of my last major effort on this table, but two pieces are missing.

Partition scheme includes ruleset_id but it is not used

Originally we were looking to partition per ruleset, which seems like a great idea from a performance angle. As such, this was included in the partition scheme, but it hasn't actually been used in the constructed partitions.

I want to test the overhead that having ruleset_id in the primary key adds; if it is negligible, we should leave it there and ensure the constructed partitions are per-ruleset.

(will be tested in #17)

Conclusion: including ruleset in the partitioning scheme is not a good direction, in terms of both future extensibility and complexity of partition rotation.

Partition rotation is not yet implemented

Currently we have only two partitions: preserve=0 and preserve=1. The plan is to add partition rotation for the p0 case, but this hasn't been done yet. It still seems like a good idea, but comes with a concern:

We are going in a direction where a score can be switched between preserve=0 and preserve=1. We also have tooling for removing preserve=0 scores after a certain delay and when all criteria are correct (see ppy/osu-queue-score-statistics#141).

So, what happens when a score becomes preserve=0 and falls into a partition that is about to be rotated for cleanup? Even if the cleanup process guarantees a time window between the last preserve flag switch and the score being cleaned up, partition rotation may still remove it too soon. An example would be a user unpinning a score, then realising they want to pin it again a few minutes later. With unfortunate timing, it would be lost during this period.

A solution may be to change the partitioning to use updated_at instead of created_at, and ensure we update updated_at on any row change (see the sketch below).
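
A sketch of that repartitioning (matching the partition definition in the structural diff from the index investigation; note that updated_at would need to be NOT NULL and part of the primary key):

ALTER TABLE `solo_scores`
    PARTITION BY RANGE COLUMNS(`preserve`, `updated_at`)
        (
        PARTITION p0catch VALUES LESS THAN (0, MAXVALUE) ENGINE = InnoDB,
        PARTITION p1 VALUES LESS THAN (MAXVALUE, MAXVALUE) ENGINE = InnoDB
        );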

Partition rotation should be considered for other tables

  • solo_score_tokens

Considerations for `solo_scores` table structure / indices

For certain tasks like ppy/osu-queue-score-statistics#149, we need the ability to do lookups on (user_id,ruleset_id,beatmap_id). Currently we have no index on this.

For other similar lookups in osu-web, elasticsearch is used. My concerns with using elasticsearch for this are reliability and correctness. The latter is important: if an item has been queued for indexing/reindexing in elasticsearch but that indexing hasn't yet completed, incorrect (outdated) data could be returned.

So I'm going to focus on the database side of things first.


Adding an index has both performance and data size implications.

I'll be investigating and testing structural changes to figure out the correct path forward.

Tasks

  1. peppy
  2. peppy
  3. peppy

Structural changes to be applied from investigations:

--- old.sql	2023-09-01 18:17:38
+++ new.sql	2023-09-01 20:03:04
@@ -6,12 +6,12 @@
   `data` json NOT NULL,
   `has_replay` tinyint(1) DEFAULT '0',
   `preserve` tinyint(1) NOT NULL DEFAULT '0',
-  `created_at` datetime NOT NULL,
-  `updated_at` timestamp NULL DEFAULT NULL,
-  PRIMARY KEY (`id`,`ruleset_id`,`preserve`,`created_at`),
-  KEY `user_ruleset_id_index` (`user_id`,`ruleset_id`,`id` DESC),
-  KEY `beatmap_id` (`beatmap_id`)
+  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
+  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
+  PRIMARY KEY (`id`,`preserve`,`updated_at`),
+  KEY `user_ruleset_index` (`user_id`,`ruleset_id`),
+  KEY `beatmap_user_index` (`beatmap_id`,`user_id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
-/*!50500 PARTITION BY RANGE  COLUMNS(`preserve`,created_at)
+/*!50500 PARTITION BY RANGE COLUMNS(`preserve`,`updated_at`)
 (PARTITION p0catch VALUES LESS THAN (0,MAXVALUE) ENGINE = InnoDB,
  PARTITION p1 VALUES LESS THAN (MAXVALUE,MAXVALUE) ENGINE = InnoDB) */

Investigate whether there's a better way to do the recent scores lookup

The only reason the index KEY user_ruleset_id_index (user_id,ruleset_id,id DESC) exists is to display recent scores on user profiles.

Historically, we've maintained a separate table for this. We may just want to go back to doing this if it means the main table gets a whole lot smaller.
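
If we did bring a dedicated table back, it might look something like this (purely a hypothetical sketch; neither the table name nor the exact columns exist today):

CREATE TABLE `user_recent_scores`
(
    `score_id`   bigint unsigned   NOT NULL,
    `user_id`    int unsigned      NOT NULL,
    `ruleset_id` smallint unsigned NOT NULL,
    -- Rows are inserted on submission and pruned after a retention window,
    -- keeping this table tiny compared to solo_scores.
    PRIMARY KEY (`score_id`),
    KEY `user_ruleset_index` (`user_id`, `ruleset_id`, `score_id` DESC)
);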

Redeploy server components with new legacy difficulty attribute storage

Tasks

Deploy ProxySQL on kubernetes

The version we are running is quite outdated and has a rare tendency to fall over. It would be beneficial to run one (or more) instances on Kubernetes, to allow for easier upgrades and better resilience.

Things that need consideration:

  • Management of configuration (needs to be easy to make quick changes). I don't know what kind of import/export options are available; potentially just a mysql dump.
  • If we run multiple instances, I'd hope Kubernetes can round-robin between them automatically while only giving a single host specification to deployments.
  • I'd probably limit usage to only apps deployed to Kubernetes. We can keep a separate instance for other usage as we migrate.

For reference, ProxySQL is high CPU, low everything-else:

(screenshot of resource usage, 2022-06-13)

Migrate scthumber to Kubernetes

This service is running on an oversized droplet and is not a user-facing service, making it a perfect candidate for migration to Kubernetes.
