osu-elastic-indexer's Introduction

ElasticIndexer

Component for loading osu! scores into Elasticsearch.

Requirements

  • .NET 6
  • Elasticsearch 7
  • Redis 6

Elasticsearch 8 compatibility

If using Elasticsearch 8, a minimum version of Elasticsearch 8.2 is required.

The following environment variable must be set on the indexer:

ELASTIC_CLIENT_APIVERSIONING=true

and the following must be set in the Elasticsearch server configuration (elasticsearch.yml):

xpack.security.enabled: false

or docker environment, e.g. in docker compose:

environment:
  xpack.security.enabled: false

These settings allow plain HTTP connections to Elasticsearch, removing the HTTPS and authentication requirement, and make the server return responses compatible with the client.

Usage

Schema

A string value is used to indicate the current schema version to be used.

When the queue processor is running, it will store the version it is processing in a set in Redis at osu-queue:score-index:${prefix}active-schemas.

If a queue processor stops automatically due to a schema version change, it will remove the version it is processing from the set of versions. The version will not be removed if the processor is stopped manually or stops because of a failure; this allows other services to continue pushing to those queues.
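The bookkeeping described above can be modelled in a few lines. This is only an in-memory sketch: a plain Python set stands in for the Redis set, and the function names and the "schema_changed" / "manual" / "failure" reason labels are hypothetical, not taken from the indexer.

```python
def active_schemas_key(prefix: str = "") -> str:
    """Build the Redis key of the active-schemas set for a given prefix."""
    return f"osu-queue:score-index:{prefix}active-schemas"


def on_processor_start(active_schemas: set, version: str) -> None:
    """A running queue processor registers the version it is processing."""
    active_schemas.add(version)


def on_processor_stop(active_schemas: set, version: str, reason: str) -> None:
    """Only an automatic stop caused by a schema change unregisters the version.

    The reason labels are hypothetical names used for this illustration.
    """
    if reason == "schema_changed":
        active_schemas.discard(version)


active = set()
on_processor_start(active, "1")
on_processor_start(active, "2")
on_processor_stop(active, "1", "schema_changed")  # version "1" is removed
on_processor_stop(active, "2", "manual")          # version "2" stays registered
print(active_schemas_key("docker."), sorted(active))
```

Keeping a version registered after a manual stop or a failure means other services still treat its queue as a valid push target.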

Adding items to be indexed

Scores with preserve=true belonging to a user with user_warnings=0 will be added to the index; scores for which either condition is false will be removed from the index.
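The rule above can be written as a small predicate. This is an illustration of the stated conditions only, not the indexer's actual implementation; the function name should_index and its parameters are hypothetical.

```python
def should_index(preserve: bool, user_warnings: int) -> bool:
    """Return True if the score belongs in the index, False if it should be removed.

    Hypothetical helper mirroring the rule stated in the text above.
    """
    return bool(preserve) and user_warnings == 0


# A preserved score from a user without warnings is added to the index.
print(should_index(preserve=True, user_warnings=0))
# Failing either condition means the score is removed from the index.
print(should_index(preserve=False, user_warnings=0))
print(should_index(preserve=True, user_warnings=1))
```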

Push items to osu-queue:score-index-${schema}

Switching to a new schema

Run dotnet run schema set ${schema} or set osu-queue:score-index:${prefix}schema directly in Redis

Automatic index switching

If there is an already running indexer watching the queue for the new schema version, it will automatically update the alias to point to the new index. When the alias is updated, any index previously used by the alias will be closed.

The alias will not be updated if:

  • the schema value does not change
  • the indexer processing the queue for that version was not running before the change.

When the schema version changes, all indexers processing the queues for any other version will automatically stop.
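The switching rules above can be summarised as two small functions. This is a sketch under assumptions: running_versions is a hypothetical set of schema versions that had an indexer running before the change, and the function names are illustrative rather than taken from the indexer.

```python
def should_update_alias(previous_schema, new_schema, running_versions) -> bool:
    """The alias moves only if the schema value changed and an indexer for
    the new version was already running before the change."""
    return new_schema != previous_schema and new_schema in running_versions


def indexers_to_stop(new_schema, running_versions) -> set:
    """Indexers processing queues for any other version stop automatically."""
    return {version for version in running_versions if version != new_schema}


running = {"1", "2"}
print(should_update_alias("1", "2", running))   # schema changed, indexer was running
print(should_update_alias("1", "1", running))   # schema value did not change
print(should_update_alias("1", "3", running))   # no indexer was running for "3"
print(sorted(indexers_to_stop("2", running)))
```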

Configuration

Configuration is loaded from environment variables. No environment files are automatically loaded.

To read environment variables from an env file, you can prefix the command to run with env $(cat {envfile}), replacing {envfile} with your env file, e.g.

env $(cat .env) dotnet run

Note that this method of passing environment variables does not support values with spaces.

Additional envs can be set:

env $(cat .env) SCHEMA=1 dotnet run
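The limitation on values with spaces comes from how the shell splits the unquoted $(cat .env) substitution into words. The sketch below approximates that splitting with Python's str.split(); it ignores glob expansion and custom IFS settings, and the DB_PASS value is a made-up example.

```python
def shell_words(env_file_text: str) -> list:
    """Approximate the shell's whitespace field splitting of $(cat .env).

    Quotes inside the file are not interpreted by the shell at this stage,
    so they do not keep a value with spaces together.
    """
    return env_file_text.split()


# One NAME=VALUE word per variable: env receives them as intended.
print(shell_words("ES_INDEX_PREFIX=docker.\nSCHEMA=1\n"))
# → ['ES_INDEX_PREFIX=docker.', 'SCHEMA=1']

# A value containing a space is split into two words, even when quoted.
print(shell_words('DB_PASS="my secret"\n'))
# → ['DB_PASS="my', 'secret"']
```

In the second case env would receive a word without an = sign and would most likely try to run it as the command, so such values need to be passed another way, for example by exporting them in the shell beforehand.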

Environment Variables

BATCH_SIZE

Maximum number of items to handle/dequeue per batch. This affects the size of the _bulk request sent to Elasticsearch.

Defaults to 10000.

BUFFER_SIZE

Multiplier for the in-flight limit: at most BATCH_SIZE * BUFFER_SIZE items are allowed in flight during queue processing.

Defaults to 5 (50000 items with the default BATCH_SIZE).
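As a quick check of the arithmetic, the in-flight limit can be computed from the two variables. This is a minimal sketch; the helper name max_in_flight is hypothetical, and the defaults are the ones documented above.

```python
import os


def max_in_flight(env=os.environ) -> int:
    """Compute BATCH_SIZE * BUFFER_SIZE using the documented defaults."""
    batch_size = int(env.get("BATCH_SIZE", 10000))
    buffer_size = int(env.get("BUFFER_SIZE", 5))
    return batch_size * buffer_size


print(max_in_flight({}))                                          # → 50000
print(max_in_flight({"BATCH_SIZE": "1000", "BUFFER_SIZE": "2"}))  # → 2000
```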

DB_HOST

Host for MySQL.

Defaults to localhost.

DB_NAME

Database name.

Defaults to osu.

DB_USER

Database username.

Defaults to root.

DB_PASS

Database password.

DD_AGENT_HOST

Host to submit DataDog/StatsD metrics to.

Defaults to localhost.

DD_ENTITY_ID

Enables DataDog origin detection when running in a container. See DataDog documentation.

ES_INDEX_PREFIX

Optional prefix for the index names in elasticsearch.

ES_HOST

URL of the Elasticsearch host.

Defaults to http://localhost:9200

REDIS_HOST

Redis connection string; see here for configuration options.

Defaults to localhost

SCHEMA

Schema version for the queue; see Schema.

Commands

This documentation assumes dotnet run can be used; in cases where dotnet run is not available, the assembly should be used, e.g. dotnet osu.ElasticIndexer.dll

Watching a queue for new scores

Running queue will automatically create an index if an open index matching the requested schema does not exist. If a matching open index exists, it will be reused.

SCHEMA=${schema} dotnet run queue watch

e.g.

SCHEMA=1 dotnet run queue watch

Getting the current schema version

dotnet run schema get

Setting the schema version

dotnet run schema set ${schema}

Unsetting the schema version

This is used to unset the schema version for testing purposes.

dotnet run schema clear

Changing the alias to a new index

The index the alias points to can be changed manually:

dotnet run schema alias ${schema}

will update the index alias to point to the latest index tagged with schema ${schema}.

List indices

To list all indices and their corresponding states (schema, aliased, open or closed):

dotnet run index list

Closing unused indices

This will close all score indices except the active one, unloading them from Elasticsearch's memory pool.

dotnet run index close

A specific index can be closed by passing the index's name as an argument; e.g. the following will close index_1:

dotnet run index close index_1

Cleaning up closed indices

This will delete all closed indices and free up the storage space used by those indices. The command will only delete an index if it is in the closed state.

dotnet run index delete

Passing arguments to the command will delete the matching index:

dotnet run index delete index_1

Adding fake items to the queue

For testing purposes, we can add fake items to the queue:

SCHEMA=1 dotnet run queue pump-fake

It should be noted that these items will not exist or match the ones in the database.

Queuing a specific score for indexing

SCHEMA=${schema} dotnet run queue pump-score ${id}

will queue the score with ${id} for indexing; the score will be added or deleted as necessary, according to the value of SoloScore.ShouldIndex.

See Queuing items for processing from another client

Adding existing database records to the queue

SCHEMA=1 dotnet run queue pump-all

will read existing solo_scores in chunks and add them to the queue for indexing. Only scores with a corresponding phpbb_users entry will be queued.

Extra options:

--from {id}: solo_scores.id to start reading from

--switch: Sets the schema version after the last item is queued; it does not wait for the item to be indexed. This option is provided as a convenience for testing.

Listing known versions currently being processed

dotnet run active-schemas list

will list the versions known to have queue processors listening on the queue.

Manually add or remove known versions

For debugging purposes or to perform any manual maintenance or cleanup, the list of versions can be updated manually:

dotnet run active-schemas add ${schema}
dotnet run active-schemas remove ${schema}

(Re)Populating an index

Populating an index is done by pushing score items to a queue.

Docker

docker build -t ${tagname} -f osu.ElasticIndexer/Dockerfile osu.ElasticIndexer

docker run -e SCHEMA=1 -e "ES_HOST=http://host.docker.internal:9200" -e "ES_INDEX_PREFIX=docker." -e "REDIS_HOST=host.docker.internal" -e "DB_CONNECTION_STRING=Server=host.docker.internal;Database=osu;Uid=osuweb;SslMode=None;" ${tagname} ${cmd}

where ${cmd} is the command to run, e.g. dotnet osu.ElasticIndexer.dll queue

Typical use-cases

Queuing items for processing from another client

Push items into the Redis queue "osu-queue:score-index-${schema}" e.g.

ListLeftPush("osu-queue:score-index-1", "{ \"ScoreId\": 1 }");

or from redis-cli:

LPUSH "osu-queue:score-index-1" "{\"ScoreId\":1}"
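For clients written in other languages, the queue name and item can be built with the standard library alone. The sketch below is an illustration; the helper names queue_key and score_id_item are hypothetical, and the actual push (an LPUSH through whichever Redis client is in use) is left out.

```python
import json


def queue_key(schema: str) -> str:
    """Name of the Redis queue for a given schema version."""
    return f"osu-queue:score-index-{schema}"


def score_id_item(score_id: int) -> str:
    """JSON item asking the indexer to process the score with this id."""
    return json.dumps({"ScoreId": score_id})


key = queue_key("1")
item = score_id_item(1)
# These two strings are the arguments to pass to LPUSH.
print(key)   # → osu-queue:score-index-1
print(item)  # → {"ScoreId": 1}
```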

Indexing a score by id

{ "ScoreId": 1 }

Queuing a whole score

{
    "Score": {Solo.Score}
}

osu-elastic-indexer's People

Contributors

dependabot-preview[bot], dependabot[bot], nanaya, notbakaneko, peppy, smoogipoo, thepoon

osu-elastic-indexer's Issues

Convert scores may need to be marked

Needed for beatmap pack completion state check as it only counts plays on non-converted difficulties. Splitting search per ruleset is possible but that means one query per ruleset will be run for the check.

beatmapset_id would also be nice although not strictly needed.

Store in-progress schema versions in redis

Not just the current schema used, but any versions being reindexed as well.

The use case for this is during a reindex, if a score with an id lower than the current indexing progress gets deleted, it will get missed since other processors pushing to the queue will still be pushing to the old queue.
This should let processors know there is more than one queue to push to.

Investigate growing resources usage until crash

Running pppy/osu-elastic-indexer:2023.823.0 in production, we are observing CPU and RAM usage growing on every instance, until it hits a StackExchange.Redis.RedisTimeoutException and crashes. This happens about every 30 minutes.

Elasticsearch 7 compatibility

The form of some of the queries currently being used won't work with ES7+ (especially the index_meta ones).
This metadata should probably be moved into the high_score indices themselves; the metadata queries can be replaced by listing and filtering for matching indices.

Document/improve method to index scores

Maybe also allow pushing just the id? So it's easier to push scores for reindexing. Mainly for user restriction/unrestriction and deleting scores on beatmapset state change (and testing).

Also, the current JSON being pushed doesn't match the score data structure anymore, so either the indexer needs to deal with an incomplete JSON structure or everything else pushing scores for indexing will need to sync its JSON as needed.

`--switch` to new version may miss scores

osu-queue-score-statistics will push to the queue externally. It will query osu-queue:score-index:{AppSettings.Prefix}schema to get the version.

This version is set after an all command completes. There may be a few scores missed between the time during which the all command is iterating scores and when the schema version is updated.

An easy solution is to do one final index operation after switching the schema version, to catch any new scores which may have been missed.
