Comments (3)
Based on investigations above, the change in structure I'm looking to apply is:
--- old.sql 2023-09-01 18:17:38
+++ new.sql 2023-09-01 18:18:47
@@ -8,9 +8,9 @@
`preserve` tinyint(1) NOT NULL DEFAULT '0',
`created_at` datetime NOT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
- PRIMARY KEY (`id`,`ruleset_id`,`preserve`,`created_at`),
- KEY `user_ruleset_id_index` (`user_id`,`ruleset_id`,`id` DESC),
- KEY `beatmap_id` (`beatmap_id`)
+ PRIMARY KEY (`id`,`preserve`,`created_at`),
+ KEY `user_ruleset_index` (`user_id`,`ruleset_id`),
+ KEY `beatmap_user_index` (`beatmap_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
/*!50500 PARTITION BY RANGE COLUMNS(`preserve`,created_at)
(PARTITION p0catch VALUES LESS THAN (0,MAXVALUE) ENGINE = InnoDB,
- Index added to allow lookups of a user's scores on a specific beatmap (as required for ppy/osu-queue-score-statistics#149). Note that the order of this index keeps `beatmap_id` first to aid in operations on a whole beatmap (i.e. deleting all scores).
- `id DESC` removed from `user_ruleset_index`: it's not required as it's in the primary key and implicitly available at the end of the index. This doesn't reduce the index size, so MySQL was likely doing this optimisation internally.
- Removed `ruleset_id` from primary key. It wasn't being used and will not be used in partitioning scheme due to extensibility concerns (see OP for more commentary).
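To illustrate why keeping `beatmap_id` first lets one index serve both the per-user and whole-beatmap cases, here is a minimal sketch. It uses SQLite from the Python standard library as a stand-in, so it is an assumption that MySQL's planner behaves the same way here; the cut-down `scores` table is hypothetical and only mirrors the column and index names from the diff.

```python
import sqlite3

# SQLite stand-in for the MySQL table; only the indexed columns are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        ruleset_id INTEGER NOT NULL,
        beatmap_id INTEGER NOT NULL
    );
    CREATE INDEX user_ruleset_index ON scores (user_id, ruleset_id);
    CREATE INDEX beatmap_user_index ON scores (beatmap_id, user_id);
""")


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable detail string.
    return " | ".join(row[-1] for row in rows)


# A user's scores on one beatmap: both index columns are constrained.
user_on_beatmap = plan("SELECT id FROM scores WHERE beatmap_id = 1 AND user_id = 2")
# Whole-beatmap operation: only the leading column is constrained.
whole_beatmap = plan("SELECT id FROM scores WHERE beatmap_id = 1")
# User-only lookup: beatmap_user_index cannot help, user_ruleset_index can.
user_only = plan("SELECT id FROM scores WHERE user_id = 2")

for name, detail in [("user on beatmap", user_on_beatmap),
                     ("whole beatmap", whole_beatmap),
                     ("user only", user_only)]:
    print(f"{name}: {detail}")
```

Had the index been declared as (`user_id`, `beatmap_id`) instead, the whole-beatmap query would have no leading-column match to use. Running `EXPLAIN` against the real MySQL table is still the way to confirm the actual plan.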
This is a first checkpoint; I still have some further smaller changes to test.
TODO:
- Consider changing partitioning to be on `updated_at` instead of `created_at` (see concerns in #16)
- Change `created_at` to `timestamp` (saving 4 bytes per row)
- Add default value for `created_at` and `updated_at`
- Make `updated_at` `NOT NULL`
from osu-infrastructure.
Updated with TODO changes applied:
--- old.sql 2023-09-01 18:17:38
+++ new.sql 2023-09-01 20:03:04
@@ -6,12 +6,12 @@
`data` json NOT NULL,
`has_replay` tinyint(1) DEFAULT '0',
`preserve` tinyint(1) NOT NULL DEFAULT '0',
- `created_at` datetime NOT NULL,
- `updated_at` timestamp NULL DEFAULT NULL,
- PRIMARY KEY (`id`,`ruleset_id`,`preserve`,`created_at`),
- KEY `user_ruleset_id_index` (`user_id`,`ruleset_id`,`id` DESC),
- KEY `beatmap_id` (`beatmap_id`)
+ `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
+ `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
+ PRIMARY KEY (`id`,`preserve`,`updated_at`),
+ KEY `user_ruleset_index` (`user_id`,`ruleset_id`),
+ KEY `beatmap_user_index` (`beatmap_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
-/*!50500 PARTITION BY RANGE COLUMNS(`preserve`,created_at)
+/*!50500 PARTITION BY RANGE COLUMNS(`preserve`,`updated_at`)
(PARTITION p0catch VALUES LESS THAN (0,MAXVALUE) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (MAXVALUE,MAXVALUE) ENGINE = InnoDB) */
Bonus round: compression testing to ensure we have things fine tuned
Compression testing
Time values are from `ALTER`. In general this seems very optimised, so the actual overheads for `INSERT` etc. operations would be higher.
# key_size=16
# Time: 268.081s
-rw-r----- 1 dean admin 6069157888 Sep 4 12:35 solo_scores_p#p#p0catch.ibd
-rw-r----- 1 dean admin 3179282432 Sep 4 12:35 solo_scores_p#p#p1.ibd
# key_size=8
# Time: 195.931s
-rw-r----- 1 dean admin 3036676096 Sep 4 11:33 solo_scores_p#p#p0catch.ibd
-rw-r----- 1 dean admin 1598029824 Sep 4 11:33 solo_scores_p#p#p1.ibd
# key_size=4
# Time: 204.383s
-rw-r----- 1 dean admin 1606418432 Sep 4 12:20 solo_scores_p#p#p0catch.ibd
-rw-r----- 1 dean admin 872415232 Sep 4 12:20 solo_scores_p#p#p1.ibd
# key_size=2
# Time: 249.902s
-rw-r----- 1 dean admin 1602224128 Sep 4 12:24 solo_scores_p#p#p0catch.ibd
-rw-r----- 1 dean admin 1019215872 Sep 4 12:24 solo_scores_p#p#p1.ibd
# row_format=compact
# Time: 122.149s
-rw-r----- 1 dean admin 6069157888 Sep 4 12:35 solo_scores_p#p#p0catch.ibd
-rw-r----- 1 dean admin 3179282432 Sep 4 12:35 solo_scores_p#p#p1.ibd
# row_format=dynamic
# Time: 58.805s
-rw-r----- 1 dean admin 6480199680 Sep 4 12:42 solo_scores_p#p#p0catch.ibd
-rw-r----- 1 dean admin 3376414720 Sep 4 12:42 solo_scores_p#p#p1.ibd
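To make the listings easier to compare, here is a small sketch that sums the two partition files per configuration and expresses each total relative to `row_format=dynamic`. The numbers are copied from the listings above; the choice of `row_format=dynamic` as the baseline and the helper name `summarise` are my own assumptions for illustration.

```python
# (label, ALTER time in seconds, p0catch bytes, p1 bytes), copied from the listings above.
results = [
    ("key_size=16", 268.081, 6069157888, 3179282432),
    ("key_size=8", 195.931, 3036676096, 1598029824),
    ("key_size=4", 204.383, 1606418432, 872415232),
    ("key_size=2", 249.902, 1602224128, 1019215872),
    ("row_format=compact", 122.149, 6069157888, 3179282432),
    ("row_format=dynamic", 58.805, 6480199680, 3376414720),
]


def summarise(rows, baseline_label="row_format=dynamic"):
    """Map each label to (total bytes, size relative to the baseline)."""
    totals = {label: p0 + p1 for label, _, p0, p1 in rows}
    baseline = totals[baseline_label]
    return {label: (total, total / baseline) for label, total in totals.items()}


summary = summarise(results)
smallest = min(summary, key=lambda label: summary[label][0])

for label, seconds, _, _ in results:
    total, ratio = summary[label]
    print(f"{label:20s} {total / 2**30:6.2f} GiB  {ratio:6.1%} of dynamic  {seconds:8.3f}s")
print("smallest total:", smallest)
```

On these figures `key_size=4` gives the smallest combined size; `key_size=2` ends up slightly larger in total despite the smaller block size, and also takes longer to `ALTER`.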