
azrenbeth / rust-synapse-compress-state


This project forked from matrix-org/rust-synapse-compress-state


A tool to compress some state in a Synapse instance's database

License: Apache License 2.0

Rust 99.64% Shell 0.36%


Compress Synapse State Tables

An experimental tool that reads in the rows from state_groups_state and state_group_edges tables for a particular room and calculates the changes that could be made that (hopefully) will significantly reduce the number of rows.

Unless the -c option is given, this tool does not write to the database, so it should be safe to run. If the -o option is specified, SQL that would change the tables to match the calculated state is written to the given file. (If -t is also given, each change to a particular state group is wrapped in a transaction.)

The SQL generated by the -o option is safe to apply against the database with Synapse running, provided -t was also given. This is because the state_groups and state_groups_state tables are append-only: once rows are written to the database, Synapse never modifies them, so there is no danger of a modification racing against a running Synapse. Further, with -t the changes are made within atomic transactions, and no transaction should affect the results of any of the queries that Synapse performs.
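To illustrate what wrapping each state group's change in its own transaction means, the Python sketch below builds the SQL for a single group. The helper names (quote, group_change_sql), the shape of the delta dictionary and the statement layout are assumptions made for this example; this is not the exact SQL the tool writes. Only the table and column names come from Synapse's schema. The quoting helper is for illustration only; real code should use parameterised queries.

```python
from typing import Dict, Optional, Tuple


def quote(value: str) -> str:
    """Render a string as a SQL literal by doubling any single quotes."""
    return "'" + value.replace("'", "''") + "'"


def group_change_sql(
    room_id: str,
    state_group: int,
    prev_group: Optional[int],
    delta: Dict[Tuple[str, str], str],
    transactions: bool = True,
) -> str:
    """Hypothetical SQL that swaps one group's edge and delta rows.

    `delta` maps (event type, state key) to an event ID. Only the table and
    column names come from Synapse's schema; the layout is an assumption.
    """
    # With transactions enabled, the whole change for this group is atomic.
    lines = ["BEGIN;"] if transactions else []
    lines.append(f"DELETE FROM state_group_edges WHERE state_group = {state_group};")
    if prev_group is not None:
        # A group without a predecessor is stored as a full snapshot.
        lines.append(
            "INSERT INTO state_group_edges (state_group, prev_state_group) "
            f"VALUES ({state_group}, {prev_group});"
        )
    lines.append(f"DELETE FROM state_groups_state WHERE state_group = {state_group};")
    for (event_type, state_key), event_id in sorted(delta.items()):
        lines.append(
            "INSERT INTO state_groups_state "
            "(state_group, room_id, type, state_key, event_id) VALUES "
            f"({state_group}, {quote(room_id)}, {quote(event_type)}, "
            f"{quote(state_key)}, {quote(event_id)});"
        )
    if transactions:
        lines.append("COMMIT;")
    return "\n".join(lines)
```

Because each group's DELETE and INSERT statements commit together, a concurrent reader sees either the old rows for that group or the new ones, never a half-applied mix.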

Before generating any SQL, the tool also checks that the new state deltas produce the same state as the existing ones.
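This check can be pictured as resolving every state group's full state twice, once through the existing predecessor links and deltas and once through the new ones, then comparing the results. The following is a minimal Python sketch, assuming each group is stored as a (predecessor, delta) pair and that a delta only adds or overrides entries; the names resolve and same_state are hypothetical.

```python
from typing import Dict, Optional, Tuple

StateKey = Tuple[str, str]        # (event type, state key)
StateMap = Dict[StateKey, str]    # state key -> event ID
# Each state group maps to (predecessor group or None, delta from predecessor).
Groups = Dict[int, Tuple[Optional[int], StateMap]]


def resolve(groups: Groups, group: int) -> StateMap:
    """Return the full state of `group` by walking its predecessor chain."""
    chain = []
    current: Optional[int] = group
    while current is not None:
        prev, delta = groups[current]
        chain.append(delta)
        current = prev
    state: StateMap = {}
    # Apply deltas oldest-first so that newer groups override older entries.
    for delta in reversed(chain):
        state.update(delta)
    return state


def same_state(old: Groups, new: Groups) -> bool:
    """True if every group resolves to identical state under both encodings."""
    return old.keys() == new.keys() and all(
        resolve(old, g) == resolve(new, g) for g in old
    )
```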

Algorithm

The algorithm works by attempting to create a tree of deltas, produced by appending state groups to different "levels". Each level has a maximum size, where each state group is appended to the lowest level that is not full.

This produces a graph that looks approximately like the following for two levels, where the bottom level (L1) has a maximum size of 3:

L2 <-------------------- L2 <---------- ...
^--- L1 <--- L1 <--- L1  ^--- L1 <--- L1 <--- L1

The sizes and number of levels used can be controlled via -l, and defaults to 3 levels of sizes 100, 50 and 25.
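The level assignment can be sketched in Python as below. This is a simplified model, not the tool's Rust code: it assumes that when a level is full, its chain restarts with the incoming state group as the new head and the group moves on to the next level up, and that a group which lands on an empty level, or finds no level with space, is stored as a full snapshot (predecessor None). The names Level, assign_predecessors and chain_length are hypothetical.

```python
from typing import Dict, List, Optional


class Level:
    """One level: its maximum chain length, current length and head group."""

    def __init__(self, max_length: int) -> None:
        self.max_length = max_length
        self.chain_length = 0
        self.head: Optional[int] = None

    def has_space(self) -> bool:
        return self.chain_length < self.max_length


def assign_predecessors(groups: List[int], sizes: List[int]) -> Dict[int, Optional[int]]:
    """Pick a new predecessor for each state group (None means a snapshot)."""
    levels = [Level(size) for size in sizes]
    preds: Dict[int, Optional[int]] = {}
    for group in groups:
        prev = None
        for level in levels:
            if level.has_space():
                # Lowest level with room: extend its chain from the current head.
                prev = level.head
                level.head = group
                level.chain_length += 1
                break
            # Full level: restart its chain with this group as the new head.
            level.head = group
            level.chain_length = 1
        preds[group] = prev
    return preds


def chain_length(preds: Dict[int, Optional[int]], group: int) -> int:
    """Number of groups visited when walking back from `group` to a snapshot."""
    length = 0
    current: Optional[int] = group
    while current is not None:
        length += 1
        current = preds[current]
    return length
```

With sizes [3, 2] and groups 1 to 10, this model makes groups 1, 4 and 10 snapshots; groups 5 and 6 chain back to group 4, and group 7 points at group 4, giving roughly the shape of the diagram above. Walking back from any group visits at most sum(sizes) groups, which is why the sum of the level sizes bounds the cost of fetching state.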

Note: Increasing the sum of the sizes of the levels will increase the time it takes to query the full state of a given state group. By default Synapse attempts to keep this below 100.

Example usage

$ synapse_compress_state -p "postgresql://localhost/synapse" -r '!some_room:example.com' -o out.sql -t
Fetching state from DB for room '!some_room:example.com'...
Got initial state from database. Checking for any missing state groups...
Number of state groups: 73904
Number of rows in current table: 2240043
Number of rows after compression: 165754 (7.40%)
Compression Statistics:
  Number of forced resets due to lacking prev: 34
  Number of compressed rows caused by the above: 17092
  Number of state groups changed: 2748
New state map matches old one

# It's finished, so we can now go and rewrite the DB
$ psql synapse < out.sql
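The percentage in the example output appears to be the number of rows after compression divided by the current number of rows. The short calculation below, using a hypothetical helper named compression_summary, reproduces the 7.40% figure from the two counts shown and also reports how many rows would be removed.

```python
def compression_summary(rows_before: int, rows_after: int) -> str:
    """Summarise a compression run from the before/after row counts."""
    percent = 100 * rows_after / rows_before  # share of rows that remain
    saved = rows_before - rows_after          # rows removed from the tables
    return f"{rows_after} ({percent:.2f}%), saving {saved} rows"


# Row counts taken from the example output above.
print(compression_summary(2240043, 165754))
# prints: 165754 (7.40%), saving 2074289 rows
```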

Running Options

  • -p [URL] Required
    The URL for connecting to the Postgres database. This should be of the form "postgresql://username:password@hostname/database"

  • -r [ROOM_ID] Required
    The room to process. This is the value found in the rooms table of the database, not the common name for the room. It should look like: "!wOlkWNmgkAZFxbTaqj:matrix.org"

  • -b [MIN_STATE_GROUP]
    The state group to start processing from (non inclusive)

  • -n [GROUPS_TO_COMPRESS]
    How many groups to load into memory to compress (starting from the first group in the room or the group specified by -b)

  • -l [LEVELS]
    Sizes of each new level in the compression algorithm, as a comma-separated list. The first entry in the list is for the lowest, most granular level, with each subsequent entry being for the next highest level. The number of entries in the list determines the number of levels that will be used. The sum of the sizes of the levels affects the performance of fetching the state from the database, as the sum of the sizes is the upper bound on the number of iterations needed to fetch a given set of state. [defaults to 100,50,25]

  • -m [COUNT]
    If the compressor cannot save this many rows from the database then it will stop early

  • -s [MAX_STATE_GROUP]
    If a max_state_group is specified, then only state groups with IDs lower than this number can be compressed.

  • -o [FILE]
    File to output the SQL transactions to (for later running on the database)

  • -t
    If this flag is set, then each change to a particular state group is wrapped in a transaction. This should be done if you wish to apply the changes while Synapse is still running.

  • -g
    If this flag is set, then output the node and edge information for the state_group directed graph built up from the predecessor state_group links. These can be viewed in a tool such as Gephi (https://gephi.org).

  • -c
    If this flag is set, then the changes the compressor makes will be committed to the database. This should be safe to use while Synapse is running, as it assumes by default that the transactions flag is set.
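When the tool is driven from a script, the options above can be assembled into an argument list. The Python sketch below is a hypothetical helper: build_args and its keyword names are made up for this example and are not necessarily the names used by the Python bindings. It only emits the flags documented above and passes -l as a comma-separated list.

```python
from typing import List, Optional, Sequence


def build_args(
    db_url: str,
    room_id: str,
    *,
    min_state_group: Optional[int] = None,     # -b
    groups_to_compress: Optional[int] = None,  # -n
    level_sizes: Optional[Sequence[int]] = None,  # -l
    min_saved_rows: Optional[int] = None,      # -m
    max_state_group: Optional[int] = None,     # -s
    output_file: Optional[str] = None,         # -o
    transactions: bool = False,                # -t
    graphs: bool = False,                      # -g
    commit_changes: bool = False,              # -c
) -> List[str]:
    """Build an argv list for synapse_compress_state from the options above."""
    args = ["synapse_compress_state", "-p", db_url, "-r", room_id]
    valued = [
        ("-b", min_state_group),
        ("-n", groups_to_compress),
        ("-m", min_saved_rows),
        ("-s", max_state_group),
        ("-o", output_file),
    ]
    for flag, value in valued:
        if value is not None:
            args += [flag, str(value)]
    if level_sizes is not None:
        # -l takes a comma-separated list, lowest (most granular) level first.
        args += ["-l", ",".join(str(size) for size in level_sizes)]
    for flag, enabled in (("-t", transactions), ("-g", graphs), ("-c", commit_changes)):
        if enabled:
            args.append(flag)
    return args
```

The resulting list can be passed to subprocess.run(..., check=True), assuming synapse_compress_state is on the PATH. For the earlier command-line example it yields the same -p, -r, -o and -t arguments.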

Using as python library

The compressor can also be built as a Python library, as it uses PyO3. It can be built and installed into the current virtual environment by running maturin develop.

All the same running options are available, see the comments in the Config struct in lib.rs for the names of each argument. All arguments other than db_url and room_id are optional.

The following code does exactly the same as the command-line example from above:

import synapse_compress_state as comp

comp.run_compression(
  db_url="postgresql://localhost/synapse",
  room_id="!some_room:example.com",
  output_file="out.sql",
  transactions=True
)

Note: since this library uses Jemalloc, you might get an error of the form:

ImportError: /[LONG_PATH]/synapse_compress_state.abi3.so: cannot allocate memory in static TLS block

If this happens then try running the following:

LD_PRELOAD=/[LONG_PATH]/synapse_compress_state.abi3.so ./my_python_script

Or just try disabling jemalloc:

$ maturin develop --cargo-extra-args="--no-default-features"

Running tests

There are integration tests for this tool stored in compressor_integration_tests/.

To run the integration tests, you first need to start up a Postgres database for the library to talk to. There is a docker-compose file that sets one up with all of the correct tables. The tests can therefore be run as follows:

$ cd compressor_integration_tests/
$ docker-compose up -d
$ cargo test --workspace
$ docker-compose down

Note that any output from these tests goes into compressor_integration_tests/tmp/, so if this directory doesn't already exist, you will need to create it.

Contributors

azrenbeth, erikjohnston, heftig, jo-so, ananace, richvdh, reivilibre
