Comments (3)

peppy commented on May 18, 2024

Based on investigations above, the change in structure I'm looking to apply is:

--- old.sql	2023-09-01 18:17:38
+++ new.sql	2023-09-01 18:18:47
@@ -8,9 +8,9 @@
   `preserve` tinyint(1) NOT NULL DEFAULT '0',
   `created_at` datetime NOT NULL,
   `updated_at` timestamp NULL DEFAULT NULL,
-  PRIMARY KEY (`id`,`ruleset_id`,`preserve`,`created_at`),
-  KEY `user_ruleset_id_index` (`user_id`,`ruleset_id`,`id` DESC),
-  KEY `beatmap_id` (`beatmap_id`)
+  PRIMARY KEY (`id`,`preserve`,`created_at`),
+  KEY `user_ruleset_index` (`user_id`,`ruleset_id`),
+  KEY `beatmap_user_index` (`beatmap_id`,`user_id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
 /*!50500 PARTITION BY RANGE  COLUMNS(`preserve`,created_at)
 (PARTITION p0catch VALUES LESS THAN (0,MAXVALUE) ENGINE = InnoDB,
  • Index added to allow lookups of a user's scores on a specific beatmap (as required for ppy/osu-queue-score-statistics#149). Note that this index keeps beatmap_id first to aid operations on a whole beatmap (i.e. deleting all scores).
  • id DESC removed from user_ruleset_index. It's not required, because id is part of the primary key and is implicitly available at the end of the index. Removing it doesn't reduce the index size, so MySQL was likely doing this optimisation internally already.
  • Removed ruleset_id from the primary key. It wasn't being used, and it will not be used in the partitioning scheme due to extensibility concerns (see OP for more commentary).
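
As a rough illustration of why the column order in beatmap_user_index matters, here is a minimal sketch of leftmost-prefix index matching. It makes several assumptions: it uses SQLite (via Python's built-in sqlite3 module) as a stand-in rather than MySQL/InnoDB, the scores_demo table is a hypothetical cut-down version of the real one, and the plan() helper is invented for this example. The real MySQL optimiser may choose differently, so treat this as a demonstration of the principle only.

```python
import sqlite3

# ASSUMPTION: SQLite stands in for MySQL purely to illustrate leftmost-prefix
# matching. `scores_demo` is a hypothetical, cut-down version of the table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores_demo (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        beatmap_id INTEGER NOT NULL,
        ruleset_id INTEGER NOT NULL
    );
    CREATE INDEX user_ruleset_index ON scores_demo (user_id, ruleset_id);
    CREATE INDEX beatmap_user_index ON scores_demo (beatmap_id, user_id);
""")


def plan(sql):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


queries = {
    # A user's scores on one beatmap: both index columns are constrained.
    "user on beatmap": "SELECT * FROM scores_demo WHERE beatmap_id = 1 AND user_id = 2",
    # Whole-beatmap predicate (the same filter a bulk delete would use):
    # only the leading column is constrained, which is still a usable prefix.
    "whole beatmap": "SELECT * FROM scores_demo WHERE beatmap_id = 1",
    # A user's scores in one ruleset: served by the other index.
    "user in ruleset": "SELECT * FROM scores_demo WHERE user_id = 2 AND ruleset_id = 0",
}

for label, sql in queries.items():
    print(f"{label}: {plan(sql)}")
```

Had the index been declared as (user_id, beatmap_id) instead, the whole-beatmap predicate would no longer match a leading prefix of it, which is the trade-off the first bullet describes.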

This is a first checkpoint; I still have some smaller changes to test:

TODO:

  • Consider changing partitioning to be on updated_at instead of created_at (see concerns in #16)
  • Change created_at to timestamp (saving 4 bytes per row)
  • Add default value for created_at and updated_at
  • Make updated_at NOT NULL


peppy commented on May 18, 2024

Updated with TODO changes applied:

--- old.sql	2023-09-01 18:17:38
+++ new.sql	2023-09-01 20:03:04
@@ -6,12 +6,12 @@
   `data` json NOT NULL,
   `has_replay` tinyint(1) DEFAULT '0',
   `preserve` tinyint(1) NOT NULL DEFAULT '0',
-  `created_at` datetime NOT NULL,
-  `updated_at` timestamp NULL DEFAULT NULL,
-  PRIMARY KEY (`id`,`ruleset_id`,`preserve`,`created_at`),
-  KEY `user_ruleset_id_index` (`user_id`,`ruleset_id`,`id` DESC),
-  KEY `beatmap_id` (`beatmap_id`)
+  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
+  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
+  PRIMARY KEY (`id`,`preserve`,`updated_at`),
+  KEY `user_ruleset_index` (`user_id`,`ruleset_id`),
+  KEY `beatmap_user_index` (`beatmap_id`,`user_id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
-/*!50500 PARTITION BY RANGE  COLUMNS(`preserve`,created_at)
+/*!50500 PARTITION BY RANGE COLUMNS(`preserve`,`updated_at`)
 (PARTITION p0catch VALUES LESS THAN (0,MAXVALUE) ENGINE = InnoDB,
  PARTITION p1 VALUES LESS THAN (MAXVALUE,MAXVALUE) ENGINE = InnoDB) */


peppy commented on May 18, 2024

Bonus round: compression testing to ensure we have things fine-tuned.

Compression testing

Time values are from running ALTER. ALTER seems very optimised in general, so the actual overheads for INSERT and similar operations would be higher.

# key_size=16
# Time: 268.081s
-rw-r-----  1 dean  admin  6069157888 Sep  4 12:35 solo_scores_p#p#p0catch.ibd
-rw-r-----  1 dean  admin  3179282432 Sep  4 12:35 solo_scores_p#p#p1.ibd

# key_size=8
# Time: 195.931s

-rw-r-----  1 dean  admin  3036676096 Sep  4 11:33 solo_scores_p#p#p0catch.ibd
-rw-r-----  1 dean  admin  1598029824 Sep  4 11:33 solo_scores_p#p#p1.ibd

# key_size=4
# Time: 204.383s

-rw-r-----  1 dean  admin  1606418432 Sep  4 12:20 solo_scores_p#p#p0catch.ibd
-rw-r-----  1 dean  admin   872415232 Sep  4 12:20 solo_scores_p#p#p1.ibd

# key_size=2
# Time: 249.902s

-rw-r-----  1 dean  admin  1602224128 Sep  4 12:24 solo_scores_p#p#p0catch.ibd
-rw-r-----  1 dean  admin  1019215872 Sep  4 12:24 solo_scores_p#p#p1.ibd

# row_format=compact
# Time: 122.149s

-rw-r-----  1 dean  admin  6069157888 Sep  4 12:35 solo_scores_p#p#p0catch.ibd
-rw-r-----  1 dean  admin  3179282432 Sep  4 12:35 solo_scores_p#p#p1.ibd

# row_format=dynamic
# Time: 58.805s

-rw-r-----  1 dean  admin  6480199680 Sep  4 12:42 solo_scores_p#p#p0catch.ibd
-rw-r-----  1 dean  admin  3376414720 Sep  4 12:42 solo_scores_p#p#p1.ibd
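
To make the listings above easier to compare, the following sketch sums the two partition files for each configuration and expresses the total relative to row_format=dynamic. The byte counts are copied from the listings as given; using dynamic as the baseline and reporting sizes in GiB are choices made for this example, not something stated in the test notes.

```python
# File sizes in bytes, copied from the `ls -l` listings above as
# (p0catch, p1) for each tested configuration.
sizes = {
    "key_size=16": (6069157888, 3179282432),
    "key_size=8": (3036676096, 1598029824),
    "key_size=4": (1606418432, 872415232),
    "key_size=2": (1602224128, 1019215872),
    "row_format=compact": (6069157888, 3179282432),
    "row_format=dynamic": (6480199680, 3376414720),
}

# Total on-disk size per configuration (both partitions combined).
totals = {name: sum(parts) for name, parts in sizes.items()}

# ASSUMPTION: row_format=dynamic is used as the uncompressed baseline.
baseline = totals["row_format=dynamic"]

# Print configurations from smallest to largest total size.
for name, total in sorted(totals.items(), key=lambda item: item[1]):
    gib = total / 2**30
    print(f"{name:20s} {gib:6.2f} GiB  {total / baseline:6.1%} of dynamic")

smallest = min(totals, key=totals.get)
print("smallest total:", smallest)
```

This only compares disk usage; the ALTER timings would need to be weighed separately.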

