
osu-infrastructure's People

Contributors

bdach, nanaya, peppy


osu-infrastructure's Issues

Score statistics processor needs to process PP for imported highscores

When we deploy lazer for ranking purposes, we need to completely shut down osu-performance and have PP be solely updated by osu-queue-score-statistics. Otherwise, because osu-performance only reads from the legacy tables, it will exclude any scores set in lazer.

But right now all imported highscores bypass osu-queue-score-statistics, so it'll miss any scores set in osu!stable.

Add partitioning to `solo_scores` table

See https://gist.github.com/peppy/8851f98d0fb783ff29043f408e6a3923 for three proposals.
(I'd recommend opening each in a separate browser tab so you can compare the differences by switching between them.)

  • partition_preserve is the most pessimistic (less optimised, but maybe also less fussy in the future when we add more rulesets)
  • partition_ruleset_preserve is the most optimal; it moves ruleset_id out of the largest index and removes the preserve index (see caveat in inline comments)
  • partition_preserve_ruleset removes the caveat around preserve lookups, but as a result can't optimise ruleset_id out of the other index

I'd recommend running each of these and importing some data to test performance / query plans for any queries of concern.

insert into partitioned_solo_scores select * from solo_scores order by id desc limit 10000;

can be used if you already have data in the actual table.
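
For a rough idea of the shape of these proposals, here is a minimal sketch of a preserve-based range partitioning, assumed from the scheme used elsewhere in this tracker (the authoritative DDL for all three variants is in the gist above):

CREATE TABLE `partitioned_solo_scores` LIKE `solo_scores`;

-- Non-preserved scores land in a rotatable partition; preserved scores stay
-- in a stable one. The partition columns must be part of the primary key.
ALTER TABLE `partitioned_solo_scores`
    PARTITION BY RANGE COLUMNS(`preserve`, `created_at`)
        (
        PARTITION p0catch VALUES LESS THAN (0, MAXVALUE) ENGINE = InnoDB,
        PARTITION p1 VALUES LESS THAN (MAXVALUE, MAXVALUE) ENGINE = InnoDB
        );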

SR/PP update checklist 2022-09

Important changes

  • osu: Added HD+FL difficulty adjustment mod combination (1032)
  • osu: Added TD difficulty adjustment mod (4)
  • osu: Added speed_note_count difficulty attribute.

Table setup

  • No changes required.
    • Database already contains the new Speed note count attribute (attrib_id = 21).
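
A quick sanity check (a sketch, assuming the attribute definitions live in an osu_difficulty_attribs table as in osu-web's schema):

-- Confirm the speed_note_count attribute definition exists.
SELECT * FROM `osu_difficulty_attribs` WHERE `attrib_id` = 21;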

Medal updates

Related PRs

Deployment (in order)

Suggested ruleset order:
taiko -> osu -> mania -> catch

  • osu-difficulty-calculator (watch - all rulesets)
  • osu-difficulty-calculator (reprocess - all rulesets)
  • osu-web-10 (with above PRs)
  • osu-performance (watch - all rulesets)
  • osu-performance (reprocess - all rulesets)
  • osu-beatmap-difficulty-lookup-cache (with above PR)
  • osu-stable (with above PR)
  • osu-queue-score-statistics (watch)
  • osu-queue-score-statistics (scores all - all rulesets)

Wiki

Infrastructure deployment tasks for path-to-ranking

At a high level, this tracks the order of deployment tasks that depend on other changes.

Tasks

  1. peppy
  2. 9 of 9
    area:results type:online
    bdach
  3. peppy
  4. peppy
  5. 13 of 13
    smoogipoo
  6. bdach
  7. bdach
  8. 5 of 7
    bdach peppy

Testing notes for self:

To reset everything:

# nuke all indices
osu.ElasticIndexer# dotnet run index nuke

# view all indices
curl -X GET "localhost:9200/_cat/indices?v&pretty"

# restart es service
systemctl restart elasticsearch

truncate table scores;
truncate table score_legacy_id_map;
truncate table score_performance;

Ongoing stuff which needs to be run:

# osu-queue-score-statistics
cd ~/repos/osu-queue-score-statistics/osu.Server.Queues.ScoreStatisticsProcessor
git fetch; git reset --hard peppy/new-table-names
SCHEMA=20231208 dotnet run queue import-high-scores --start-id 0

# osu-elastic-indexer
cd ~/repos/osu-elastic-indexer/osu.ElasticIndexer
git fetch --all; git reset --hard peppy/new-table-names
SCHEMA=20231208 dotnet run queue watch

Add osu-web support for new score infrastructure

Currently documented at https://github.com/ppy/osu-infrastructure/blob/master/score-submission.md

osu-web PRs:

osu-web Preparation (done)
  • update profile page recent plays to use solo scores
  • add user/ruleset index on solo scores
    • need to figure out how to filter out the failed ones
    • also probably need to add id as part of the index for sorting
  • update beatmap user scores to use es
  • add beatmap leader reset job/command
  • update profile page first place section to use solo scores
  • create query builder-like class for querying scores
  • measure index size for solo_scores(beatmap_id): 19GB/16GB (dynamic/+lz4), 9GB/8GB (compressed/+lz4)
  • measure index size for solo_scores(user_id, ruleset_id, created_at desc): not needed anymore since we're going to show all recent plays rather than just the last 24h
  • update beatmap pack page to use es (partially waiting for ppy/osu-elastic-indexer#111)
  • fix tests (queuing score for index and update github action and docker dev) (partially waiting for ppy/osu-elastic-indexer#110)
  • update beatmap scoreboard reset (during state change) to also delete solo scores
  • queue new scores (waiting on the specs?)

SSL migration

Our current wildcard certificate expires on September 3rd. Our current provider (DigiCert) has increased their pricing and we're looking to move away from them.

After reviewing and testing Let's Encrypt's device compatibility, we have decided to integrate ACME into our infrastructure and switch to LE.

Google Trust Services has also put a public ACME service in place. They offer a similar service to Let's Encrypt, except their root certificate dates back to 1998, so device compatibility is about as good as it gets. Using GTS lets us retain the compatibility that osu! users are used to. This service is in a free public beta, and it may become paid once the beta ends, but since both providers use ACME we can switch back and forth between them in just a few minutes, so the plan is to roll with GTS for now.

  • Kubernetes clusters SSL migration
    Our Kubernetes clusters will issue certificates using cert-manager, an ACME client by jetstack built for Kubernetes.

    • Staging cluster (http01 validation only)
    • Production cluster (dns01 & http01 validation)
  • Individual droplets SSL migration
    Most of the work goes here, as there are dozens of droplets to migrate - or rather, to create an SSL infrastructure for. As we used to renew with DigiCert every 3 years, no automated process was ever put in place. LE/GTS deliver certificates valid for at most 90 days, so we must switch to an automated solution.

    Our individual droplets run a huge variety of operating systems. Managing ACME clients on each would be a huge overhead, and we'd rather not share our Cloudflare API token with every droplet that needs wildcard certificates.

    Therefore, we will rely on the cert-manager in our production Kubernetes cluster to issue and renew all the certificates we need. All our droplets will fetch these certificates on a regular basis using small bash/curl scripts, via a custom-made HTTPS service running inside the production cluster. Droplets will be authenticated using client-side certificates.

    • Certificates serving back-end development
    • Certificates serving back-end deployment
    • Certificates fetching script development
    • Certificates fetching script deployment across all ~15 droplets/nodes that need them.
  • Automatically refresh our custom edge certificate on Cloudflare

Replay handling for imported (and new) `solo_scores`

Just documenting some IRL discussion regarding the path forward with migration of the has_replay flag (aka knowing if a score has a replay available), which is currently not present in the new solo_scores schema.

This was decided with the goal of keeping things simple and flexible for now, and may change in the future once we have the systems online.

Current proposal:

  • Add a BOOL has_replay column to solo_scores
  • Add legacy_score_id to the JSON data (the alternative is to add a second index to the legacy_id_map table, but it feels like the legacy ID should be in the JSON data)
  • Change replay retrieval code to first retrieve using solo_score.ID, falling back to solo_score.data.legacy_score_id.
  • Migrate S3 data from legacy to new IDs once everything has settled (then remove the fallback).

Tasks to make this happen:

  • Run online schema changes to add the new has_replay column (see the sketch after this list)
  • Update ImportHighScores (ppy/osu-score-statistics-processor) to include legacy_score_id and has_replay
  • Update web-10 submission to correctly update the new flags (on replay deletion)
  • Update web-10 and osu-web replay retrieval to perform fallback lookups
  • Migrate existing replays to new IDs on S3
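
For the first task, the schema change itself is small (a sketch; the column definition mirrors the has_replay column in the solo_scores structural diff elsewhere in this tracker, and in production it would be run through an online schema change tool rather than a direct ALTER):

ALTER TABLE `solo_scores`
    ADD COLUMN `has_replay` tinyint(1) DEFAULT '0';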

cc/ @nanaya @smoogipoo

Osu! Lazer Scores not showing on profile

I have set multiple plays in Osu! Lazer that do not show on my public profile. Let me provide examples:

image
Here, you can see a play I set on "how to create the weeknd's "blinding lights"" by Seth Everman. You can see that this submitted score added a 162pp play to my profile.
profile
This is a screenshot of my profile; as you can see, the play does not show up. I will provide one more example:
image
As you can see in this image, I set a 202pp play on "Horrible Kids" by Set It Off. If you look at the previous screenshot of my profile, you can see I have Lazer mode selected and these plays do not show up on my public profile. Is this a visual bug, or a bug with submitting scores? I would really like to have this issue resolved, as it is my first 200pp play and another score with a decent pp count. Thank you for reading. (Both of these scores were set over a week ago, btw.)

Improving multiplayer things

I've been working through migration to the new multiplayer_score_links table, along with review on ppy/osu#24697 / ppy/osu-server-spectator#185. I've also been in discussion with @nanaya over my slight unhappiness with (read: inability to easily comprehend) the current structure of things.

To recap:

CREATE TABLE `multiplayer_score_links`
(
    `id`               bigint unsigned    NOT NULL AUTO_INCREMENT,
    `user_id`          int unsigned       NOT NULL,
    `room_id`          bigint unsigned    NOT NULL,
    `playlist_item_id` bigint unsigned    NOT NULL,
    `beatmap_id`       mediumint unsigned NOT NULL,
    `build_id`         mediumint unsigned NOT NULL DEFAULT '0',
    `score_id`         bigint unsigned             DEFAULT NULL,
    `created_at`       timestamp          NULL     DEFAULT NULL,
    `updated_at`       timestamp          NULL     DEFAULT NULL,
    PRIMARY KEY (`id`),
    KEY `multiplayer_score_links_score_id_index` (`score_id`),
    KEY `multiplayer_score_links_room_id_user_id_index` (`room_id`, `user_id`),
    KEY `multiplayer_score_links_playlist_item_id_index` (`playlist_item_id`),
    KEY `multiplayer_score_links_user_id_index` (`user_id`)
)

multiplayer_score_links is a table that replaces multiplayer_scores, and allows the majority of the score metadata to be stored in the main solo_scores table. While not immediately obvious from the structure or naming, it currently serves a dual purpose:

  • Used by osu-web for recalculating "user best" scores in a playlist or multiplayer room.
  • Used instead of solo_score_tokens for multiplayer scores. In other words, it is used to give the user a token when they begin gameplay that they can use to submit the final score.

This is weird. So my proposal is that we stop using this table for the token process, and use solo_score_tokens instead.

CREATE TABLE `solo_score_tokens`
(
    `id`         bigint unsigned NOT NULL AUTO_INCREMENT,
    `score_id`   bigint               DEFAULT NULL,
    `user_id`    bigint          NOT NULL,
    `beatmap_id` mediumint       NOT NULL,
    `ruleset_id` smallint        NOT NULL,
    `build_id`   mediumint unsigned   DEFAULT NULL,
    `created_at` timestamp       NULL DEFAULT NULL,
    `updated_at` timestamp       NULL DEFAULT NULL,
    PRIMARY KEY (`id`)
)

To make this work, we should add a playlist_item_id bigint unsigned NULL DEFAULT NULL column to solo_score_tokens (see the sketch below). This table holds ephemeral data, so adding extra columns like this is not a huge deal. We can then restructure the multiplayer_score_links table to be used specifically for lookup purposes.
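
As a sketch, the token table change is a single nullable column (matching the definition in the proposed table below):

ALTER TABLE `solo_score_tokens`
    ADD COLUMN `playlist_item_id` bigint unsigned NULL DEFAULT NULL;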

The new tables would look something like this:

CREATE TABLE `solo_score_tokens`
(
    `id`               bigint unsigned NOT NULL AUTO_INCREMENT,
    `score_id`         bigint               DEFAULT NULL,
    `user_id`          bigint          NOT NULL,
    `beatmap_id`       mediumint       NOT NULL,
    `ruleset_id`       smallint        NOT NULL,
    `playlist_item_id` bigint unsigned NULL DEFAULT NULL,
    `build_id`         mediumint unsigned   DEFAULT NULL,
    `created_at`       timestamp       NULL DEFAULT NULL,
    `updated_at`       timestamp       NULL DEFAULT NULL,
    PRIMARY KEY (`id`)
);

CREATE TABLE `playlist_scores`
(
    `playlist_item_id` bigint unsigned NOT NULL,
    `score_id`         bigint unsigned NOT NULL,
    `user_id`          int unsigned    NOT NULL,
    PRIMARY KEY (`score_id`),
    KEY `playlist_scores_playlist_item_id_index` (`playlist_item_id`),
    KEY `playlist_scores_user_id_index` (`user_id`)
)

This will make ppy/osu#24697 / ppy/osu-server-spectator#185 obsolete. It would also fix my naming issues with the multiplayer_score_links table.

Also some remaining cleanup tasks

Some other considerations

  • Rename solo_scores to scores? Use a VIEW to alias the old name to ease migration? (see the sketch below)
  • Rename multiplayer_score_links to playlist_scores
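
If we go the VIEW route for the rename, it's a quick operation (a sketch, assuming a straight rename with no column changes):

RENAME TABLE `solo_scores` TO `scores`;

-- Old queries keep working against the aliased name during migration.
CREATE VIEW `solo_scores` AS SELECT * FROM `scores`;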

cc/ @nanaya @notbakaneko @bdach @smoogipoo

Investigate data size concerns for adding a new index to `solo_scores`

Database structure testing

As per #15, we need an index on (user_id, ruleset_id, beatmap_id) if using the database for operations like ppy/osu-queue-score-statistics#149 (and in the future, ranked score processing and probably more).

I want to test on production data (scaled down slightly) to get a real-world idea of how adding/changing indices affects the size of this table.

Test whether changing primary key composition affects index size

TL;DR it doesn't seem to.

-- original
ALTER TABLE `solo_scores`
DROP PRIMARY KEY,
ADD PRIMARY KEY (`id`, `preserve`, `ruleset_id`, `created_at`);

MySQL root@(none):osu> SELECT
                    ->     table_name AS `Table`,
                    ->     ROUND(data_length / 1024 / 1024) AS `Data MB`,
                    ->     ROUND(index_length / 1024 / 1024) AS `Index MB`
                    -> FROM
                    ->     information_schema.tables
                    -> WHERE
                    ->     table_schema = 'osu' AND
                    ->     table_name = 'solo_scores';
+-------------+---------+----------+
| Table       | Data MB | Index MB |
+-------------+---------+----------+
| solo_scores | 1868    | 237      |
+-------------+---------+----------+


MySQL root@(none):osu> SHOW TABLE STATUS LIKE 'solo_scores';
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+
| Name        | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation          | Checksum | Create_options                                     | Comment |
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+
| solo_scores | InnoDB | 10      | Compressed | 9523175 | 205            | 1958215680  | 0               | 248758272    | 3932160   | 2406267336     | 2023-09-01 01:24:18 | <null>      | <null>     | utf8mb4_0900_ai_ci | <null>   | row_format=COMPRESSED KEY_BLOCK_SIZE=4 partitioned |         |
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+
-- remove ruleset_id and see if index size changes
ALTER TABLE `solo_scores`
DROP PRIMARY KEY,
ADD PRIMARY KEY (`id`, `preserve`, `created_at`)

MySQL root@(none):osu> SELECT
                    ->     table_name AS `Table`,
                    ->     ROUND(data_length / 1024 / 1024) AS `Data MB`,
                    ->     ROUND(index_length / 1024 / 1024) AS `Index MB`
                    -> FROM
                    ->     information_schema.tables
                    -> WHERE
                    ->     table_schema = 'osu' AND
                    ->     table_name = 'solo_scores';
+-------------+---------+----------+
| Table       | Data MB | Index MB |
+-------------+---------+----------+
| solo_scores | 1867    | 236      |
+-------------+---------+----------+

MySQL root@(none):osu> SHOW TABLE STATUS LIKE 'solo_scores';
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+
| Name        | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation          | Checksum | Create_options                                     | Comment |
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+
| solo_scores | InnoDB | 10      | Compressed | 9577016 | 204            | 1957953536  | 0               | 247709696    | 5505024   | 2406267336     | 2023-09-01 02:58:21 | <null>      | <null>     | utf8mb4_0900_ai_ci | <null>   | row_format=COMPRESSED KEY_BLOCK_SIZE=4 partitioned |         |
+-------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------------------------------------------+---------+

Test whether the existing index needs id DESC at the end (or whether it can use the implicit primary key)

TL;DR it doesn't; it can use the primary key.

explain SELECT * FROM solo_scores WHERE user_id = 2 AND ruleset_id = 0 ORDER BY id DESC LIMIT 50;

ALTER TABLE `solo_scores`
    DROP INDEX `user_ruleset_id_index`,
    ADD INDEX `user_ruleset_id_index` (`user_id`,`ruleset_id`);

explain SELECT * FROM solo_scores WHERE user_id = 2 AND ruleset_id = 0 ORDER BY id DESC LIMIT 50;

Test how badly inserting beatmap_id into the existing index (KEY user_ruleset_id_index (user_id,ruleset_id,id DESC)) breaks things (100% it will)

TL;DR it does, as expected. We can get around this by changing the existing KEY beatmap_id to include user_id at the end (see the sketch after the query plans below). This comes at almost zero storage cost due to the high cardinality of users per beatmap (almost completely unique), so we're just changing the ordering of the index rather than adding new overhead.

explain SELECT * FROM solo_scores WHERE user_id = 2 AND ruleset_id = 0 ORDER BY id DESC LIMIT 50;


MySQL root@(none):osu> explain SELECT `id` FROM `solo_scores_p` WHERE `user_id` = 19743981 AND `ruleset_id` = 0 and preserve in (0,1) ORDER BY `id` LIMIT 10;
+----+-------------+---------------+------------+------+---------------+---------------+---------+-------------+---------+----------+--------------------------+
| id | select_type | table         | partitions | type | possible_keys | key           | key_len | ref         | rows    | filtered | Extra                    |
+----+-------------+---------------+------------+------+---------------+---------------+---------+-------------+---------+----------+--------------------------+
| 1  | SIMPLE      | solo_scores_p | p0catch,p1 | ref  | user_id_index | user_id_index | 6       | const,const | 1700684 | 20.0     | Using where; Using index |
+----+-------------+---------------+------------+------+---------------+---------------+---------+-------------+---------+----------+--------------------------+

10 rows in set
Time: 0.005s


ALTER TABLE `solo_scores`
    DROP INDEX `user_ruleset_id_index`,
    ADD INDEX `user_ruleset_id_index` (`user_id`,`ruleset_id`, `beatmap_id`, `id` DESC);

explain SELECT * FROM solo_scores WHERE user_id = 2 AND ruleset_id = 0 ORDER BY id DESC LIMIT 50;

MySQL root@(none):osu> explain SELECT `id` FROM `solo_scores_p` WHERE `user_id` = 19743981 AND `ruleset_id` = 0 and preserve
                    -> in (0,1) ORDER BY `id` LIMIT 10;
+----+-------------+---------------+------------+-------+---------------+---------+---------+--------+------+----------+-------------+
| id | select_type | table         | partitions | type  | possible_keys | key     | key_len | ref    | rows | filtered | Extra       |
+----+-------------+---------------+------------+-------+---------------+---------+---------+--------+------+----------+-------------+
| 1  | SIMPLE      | solo_scores_p | p0catch,p1 | index | user_id_index | PRIMARY | 16      | <null> | 62   | 3.18     | Using where |
+----+-------------+---------------+------------+-------+---------------+---------+---------+--------+------+----------+-------------+

10 rows in set
Time: 0.498s
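
A sketch of the workaround described in the TL;DR above, matching the beatmap_user_index that ends up in the final structural diff:

ALTER TABLE `solo_scores`
    DROP INDEX `beatmap_id`,
    -- user_id is almost unique per beatmap, so this mostly reorders the
    -- existing index data rather than adding new overhead.
    ADD INDEX `beatmap_user_index` (`beatmap_id`, `user_id`);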

Test whether including ruleset_id in the constructed partitions (as touched on in #16) means we can remove it from indices

TL;DR correct, but I'm avoiding this because of concerns for future extensibility (custom rulesets). Also there's no storage saving on disk.

explain select * from solo_scores where user_id = 4937439 and ruleset_id = 3 and beatmap_id = 75;

ALTER TABLE `solo_scores`
    PARTITION BY RANGE COLUMNS(`preserve`, `ruleset_id`, created_at)
        (
        PARTITION p0r0catch VALUES LESS THAN (0,0,MAXVALUE) ENGINE = InnoDB,
        PARTITION p0r1catch VALUES LESS THAN (0,1,MAXVALUE) ENGINE = InnoDB,
        PARTITION p0r2catch VALUES LESS THAN (0,2,MAXVALUE) ENGINE = InnoDB,
        PARTITION p0r3catch VALUES LESS THAN (0,3,MAXVALUE) ENGINE = InnoDB,
        PARTITION p1r0 VALUES LESS THAN (1,0,MAXVALUE) ENGINE = InnoDB,
        PARTITION p1r1 VALUES LESS THAN (1,1,MAXVALUE) ENGINE = InnoDB,
        PARTITION p1r2 VALUES LESS THAN (1,2,MAXVALUE) ENGINE = InnoDB,
        PARTITION p1r3 VALUES LESS THAN (1,3,MAXVALUE) ENGINE = InnoDB
        );

-- working alternative?
ALTER TABLE `solo_scores`
 PARTITION BY LIST COLUMNS(`preserve`, `ruleset_id`)
     (
     PARTITION p0r0catch VALUES IN ((0,0)),
     PARTITION p0r1catch VALUES IN ((0,1)),
     PARTITION p0r2catch VALUES IN ((0,2)),
     PARTITION p0r3catch VALUES IN ((0,3)),
     PARTITION p1r0 VALUES IN ((1,0)),
     PARTITION p1r1 VALUES IN ((1,1)),
     PARTITION p1r2 VALUES IN ((1,2)),
     PARTITION p1r3 VALUES IN ((1,3))
     );

+-------------+----------------+---------+----------+
| Table       | AVG_ROW_LENGTH | Data MB | Index MB |
+-------------+----------------+---------+----------+
| solo_scores | 178            | 1635    | 342      |
+-------------+----------------+---------+----------+

MySQL root@(none):osu> explain select * from solo_scores where user_id = 4937439 and ruleset_id = 3 and beatmap_id = 75 and preserve = 1\G;
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | solo_scores
partitions    | p1r3
type          | ref
possible_keys | user_ruleset_id_index,beatmap_id
key           | user_ruleset_id_index
key_len       | 6
ref           | const,const
rows          | 1
filtered      | 5.0
Extra         | Using index condition; Using where

ALTER TABLE `solo_scores`
    DROP INDEX `user_ruleset_id_index`,
    ADD KEY `user_id_index` (`user_id`,`id` DESC);


+-------------+----------------+---------+----------+
| Table       | AVG_ROW_LENGTH | Data MB | Index MB |
+-------------+----------------+---------+----------+
| solo_scores | 178            | 1635    | 342      |
+-------------+----------------+---------+----------+

explain select * from solo_scores where user_id = 4937439 and ruleset_id = 3 and beatmap_id = 75;


Check why we have a beatmap_id index

  • Used when a beatmap changes approved state (osu-web).

Notes

  • Using LIST partitioning, MySQL can't determine that it should use specific partitions when preserve is not in the query:
MySQL root@(none):osu> explain select id from solo_scores where user_id = 19743981 and ruleset_id = 0 order by id desc limit 10\G;
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | solo_scores
partitions    | p0r0,p0r1,p0r2,p0r3,p1r0,p1r1,p1r2,p1r3
type          | index
possible_keys | user_id_index
key           | PRIMARY
key_len       | 16
ref           | <null>
rows          | 64
filtered      | 1.56
Extra         | Using where; Backward index scan


MySQL root@(none):osu> explain select id from solo_scores where user_id = 19743981 and ruleset_id = 0 and preserve in (0,1) order by id desc limit 10\G;
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | solo_scores
partitions    | p0r0,p1r0
type          | index
possible_keys | user_id_index
key           | PRIMARY
key_len       | 16
ref           | <null>
rows          | 56
filtered      | 0.35
Extra         | Using where; Backward index scan
  • Enabling partitioning seems to reduce DATA_SIZE by around 33%. But there's no documentation anywhere saying that this should be the case. What gives?
+-----------------+----------------+---------+----------+
| Table           | AVG_ROW_LENGTH | Data MB | Index MB |
+-----------------+----------------+---------+----------+
| solo_scores     | 300            | 2835    | 369      |
| solo_scores_p   | 178            | 1780    | 359      |
| solo_scores_p_r | 178            | 1782    | 359      |
+-----------------+----------------+---------+----------+

(fixed by running ALTER TABLE solo_scores FORCE...)

Figure out the final partitioning structure for `solo_scores`

I set up partitioning as part of my last major effort on this table, but two pieces are missing.

Partition scheme includes ruleset_id but it is not used

Originally we were looking to partition per ruleset, which seems like a great idea from a performance angle. As such, this was included in the partition scheme, but it hasn't actually been used in the constructed partitions.

I want to test the overhead that having ruleset_id in the primary key adds; if it is negligible, we should leave it there and ensure the constructed partitions are per-ruleset.

(will be tested in #17)

Conclusion: including ruleset in the partitioning scheme is not a good direction, in terms of both future extensibility and complexity of partition rotation.

Partition rotation is not yet implemented

Currently we have only two partitions: preserve=0 and preserve=1. The plan is to add partition rotation for the p0 case, but this hasn't been done yet. It still seems like a good idea, but comes with a concern:

We are going in a direction where a score can be switched between preserve=0 and preserve=1. We also have tooling for removing preserve=0 scores after a certain delay and when all criteria are correct (see ppy/osu-queue-score-statistics#141).

So, what happens when a score becomes preserve=0 and falls into a partition that is about to be rotated for cleanup? Even if the cleanup process guarantees a time window between the last preserve flag switch and the score being cleaned up, partition rotation may still remove it too soon. An example would be a user unpinning a score, then realising they want to pin it again a few minutes later. With unfortunate timing, it would be lost during this period.

A solution may be to change the partitioning to use updated_at instead of created_at, and ensure we update updated_at on any row change (see the sketch below).
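
A sketch of that repartitioning (matching the partition definition in the structural diff from the index investigation; note that updated_at would need to be NOT NULL and part of the primary key):

ALTER TABLE `solo_scores`
    PARTITION BY RANGE COLUMNS(`preserve`, `updated_at`)
        (
        PARTITION p0catch VALUES LESS THAN (0, MAXVALUE) ENGINE = InnoDB,
        PARTITION p1 VALUES LESS THAN (MAXVALUE, MAXVALUE) ENGINE = InnoDB
        );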

Partition rotation should be considered for other tables

  • solo_score_tokens

Considerations for `solo_scores` table structure / indices

For certain tasks like ppy/osu-queue-score-statistics#149, we need the ability to do lookups on (user_id,ruleset_id,beatmap_id). Currently we have no index on this.

For other similar lookups in osu-web, elasticsearch is used. My concerns with using elasticsearch for this are reliability and correctness. The latter is important: if an item has been queued for indexing/reindexing in elasticsearch but that indexing hasn't yet completed, incorrect (outdated) data could be returned.

So I'm going to focus on the database side of things first.


Adding an index has both performance and data size implications.

I'll be investigating and testing structural changes to figure out the correct path forward.

Tasks

  1. peppy
  2. peppy
  3. peppy

Structural changes to be applied from investigations:

--- old.sql	2023-09-01 18:17:38
+++ new.sql	2023-09-01 20:03:04
@@ -6,12 +6,12 @@
   `data` json NOT NULL,
   `has_replay` tinyint(1) DEFAULT '0',
   `preserve` tinyint(1) NOT NULL DEFAULT '0',
-  `created_at` datetime NOT NULL,
-  `updated_at` timestamp NULL DEFAULT NULL,
-  PRIMARY KEY (`id`,`ruleset_id`,`preserve`,`created_at`),
-  KEY `user_ruleset_id_index` (`user_id`,`ruleset_id`,`id` DESC),
-  KEY `beatmap_id` (`beatmap_id`)
+  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
+  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
+  PRIMARY KEY (`id`,`preserve`,`updated_at`),
+  KEY `user_ruleset_index` (`user_id`,`ruleset_id`),
+  KEY `beatmap_user_index` (`beatmap_id`,`user_id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
-/*!50500 PARTITION BY RANGE  COLUMNS(`preserve`,created_at)
+/*!50500 PARTITION BY RANGE COLUMNS(`preserve`,`updated_at`)
 (PARTITION p0catch VALUES LESS THAN (0,MAXVALUE) ENGINE = InnoDB,
  PARTITION p1 VALUES LESS THAN (MAXVALUE,MAXVALUE) ENGINE = InnoDB) */

Investigate whether there's a better way to do the recent scores lookup

The only reason the index KEY user_ruleset_id_index (user_id,ruleset_id,id DESC) exists is to display recent scores on user profiles.

Historically, we've maintained a separate table for this. We may just want to go back to doing this if it means the main table gets a whole lot smaller.
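
If we did bring a dedicated table back, it might look something like this (purely a hypothetical sketch; neither the table name nor the exact columns exist today):

CREATE TABLE `user_recent_scores`
(
    `score_id`   bigint unsigned   NOT NULL,
    `user_id`    int unsigned      NOT NULL,
    `ruleset_id` smallint unsigned NOT NULL,
    -- Rows are inserted on submission and pruned after a retention window,
    -- keeping this table tiny compared to solo_scores.
    PRIMARY KEY (`score_id`),
    KEY `user_ruleset_index` (`user_id`, `ruleset_id`, `score_id` DESC)
);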

Redeploy server components with new legacy difficulty attribute storage

Tasks

Deploy ProxySQL on kubernetes

The version we are running is quite outdated and has a rare tendency to fall over. It would be beneficial to run one (or more) instances on Kubernetes, to allow for easier upgrades and better resilience.

Things that need consideration:

  • Management of configuration (needs to be easy to make quick changes). I don't know what kind of import/export options are available; potentially just a mysql dump.
  • If we run multiple instances, I'd hope Kubernetes can round-robin between them automatically while only giving a single host specification to deployments.
  • I'd probably limit usage to only apps deployed to Kubernetes. We can keep a separate instance for other usage as we migrate.

For reference, ProxySQL is high CPU, low everything-else:

(screenshot of resource usage, 2022-06-13)

Migrate scthumber to Kubernetes

This service is running on an oversized droplet and is not a user-facing service, making it a perfect candidate for migration to Kubernetes.
