Comments (9)
If I understand correctly, there are two steps involved in this pipeline that you want to check whether they can be optimized:
Step 1): a large indexing machine periodically checkpointing to distributed storage. Each time a checkpoint happens, it uploads the entire index (the whole DB). We want to check if this can be improved.
Step 2): read-only servers periodically loading from distributed storage to local SSD and serving read traffic. Each time the loading happens, it loads the entire index too. We want to check if this can be improved.
RocksDB's backup feature supports incremental backup; essentially, it avoids copying duplicated files. It seems to me that using this feature for step 1) can achieve what you need; you would need to implement a rocksdb::Env object for your distributed file system.
For step 2): can you use the backup feature again and treat what is on local storage as a backup of what is on distributed storage?
I'm curious how the read-only server handles a new snapshot: does it reopen the DB every time a new download is made?
from rocksdb.
@jowlyzhang: thanks, your understanding is correct.
Our current operations are:
Indexing (write-only data ingestion jobs)
- Indexers: index a few hundred GBs for each shard (1 indexer per shard). Storage: local SSD or network-attached SSD.
- Every 30 minutes, or at a synchronization point, we run a full compaction and copy the full index (all SST files) to distributed storage like GCS or S3.
So even if the incremental data within those 30 minutes was a few hundred megabytes, we end up copying the entire hundreds of GBs from local/network SSDs to remote storage.
Serving (read-only, serving queries)
- One server can load multiple shards (produced by multiple indexers), so we typically serve TB-scale data on a single serving machine. We limit it to 1 TB per serving node to ensure we can scale up new servers in under 10 minutes.
- Servers copy data from remote GCS/S3 buckets to local SSD and then open multiple RocksDB instances (one for each shard served on the machine).
- When a new snapshot appears, background threads copy the new full snapshot to local SSDs, then close the current instance and open a new DB instance pointing at the new snapshot.
We want to make this process efficient by making data ingested on the indexing side available for serving faster, i.e. we want to take these snapshots every minute (or possibly every 30 seconds).
The main bottlenecks are copying full snapshots (local to remote for the indexer) and (copying the full snapshot from remote to local + opening and initializing new instances) for the servers.
I was wondering if incremental checkpoints can help us scale: even if our indexes are large, the incremental updates are only around a few hundred MBs, so we could possibly reduce the end-to-end latency.
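To put rough numbers on the gap (round figures assumed from the description above: ~300 GB per shard, a few hundred MB of new data per 30-minute window; this ignores compaction rewriting existing files, which makes real deltas somewhat larger):

```python
# Back-of-envelope comparison of full vs. incremental snapshot traffic.
# All figures are illustrative assumptions, not measurements.
index_size_gb = 300          # full index per shard (hundreds of GB)
delta_mb_per_30min = 300     # new data per 30-minute window (hundreds of MB)

# Full-snapshot scheme: the whole index moves every 30 minutes.
full_gb_per_hour = index_size_gb * 2

# Incremental scheme at a 30-second cadence: only the delta moves,
# split across 60 snapshots per 30-minute window.
incr_mb_per_snapshot = delta_mb_per_30min / 60
incr_gb_per_hour = delta_mb_per_30min * 2 / 1024

print(full_gb_per_hour)      # 600 (GB copied per hour)
print(incr_gb_per_hour)      # about 0.59 GB per hour
print(incr_mb_per_snapshot)  # 5.0 MB per 30-second snapshot
```

Under these assumptions the copy volume drops by roughly three orders of magnitude, which is the case for making the snapshot cost proportional to the delta rather than the index size.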
@patelprateek Thank you for the context. Do you think the incremental capability of the RocksDB backup feature can help here?
@jowlyzhang: my thought process was that if there were a way to propagate deltas of the index so that they can be replicated easily among query servers, that would be the way to go.
Another approach is having servers both ingest and serve queries on the same node (read-write, with rate limiting enabled so as to not impact query read latencies). The issue in this scenario is that all replica servers would essentially be ingesting the same data and doing the same compaction work. I wanted to avoid that, hence leaning towards indexers doing the compaction and read-only servers just replicating the state of the DB, which is compute-efficient. Alternatively, one master node could ingest and compact while all the replicas copy some delta files and metadata, so that they don't have to do compaction work and can make new data available to be queried with a latency of a few seconds.
My question for the RocksDB team was to see if this is feasible. What you suggested might help, but I don't know if it lets us scale; for example, do you see any issues if we have, say, a 128 GB index and try to take an incremental backup every 30 seconds (the new data arriving every 30 seconds is probably a few hundred MBs)?
Your current workflow already does checkpointing every 30 seconds, right? Backup is built on top of checkpointing, so it shouldn't be more expensive since it's incremental.
No, the current cadence is 30 minutes (per my update above in the thread) with a full snapshot (not incremental). That's why I wanted to understand the perf implications if we need to take an incremental backup every 30 seconds.
In that case, you would need to do a DB reopen on the read-only servers every 30 seconds, right? I think sometimes it will even take longer than 30 seconds for the DB to open.
Yes, I was wondering if it's possible to apply an incremental update without having to re-open.
I don't know the implementation details, but if an incremental update can tell us something like "files x, y deleted and new files a, b added", then possibly we could copy those new files, or replicate them, to end up in the same state?
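The delta described here is essentially a set difference between two snapshots' live-file lists. A minimal sketch, assuming a manifest is just the set of live SST file names (a real protocol would also have to ship the new MANIFEST and switch over atomically):

```python
def manifest_delta(old_files, new_files):
    """Return (added, deleted) between two snapshots' live-file sets.

    Because SST files are immutable, a replica that fetches `added` and
    drops `deleted` ends up holding the same file set as the source.
    (Sketch only: a real scheme must also replicate the new MANIFEST
    and apply the switch atomically so readers never see a mix.)
    """
    old_files = set(old_files)
    new_files = set(new_files)
    added = new_files - old_files
    deleted = old_files - new_files
    return added, deleted
```

For example, going from {x.sst, y.sst, z.sst} to {z.sst, a.sst, b.sst} yields added = {a.sst, b.sst} and deleted = {x.sst, y.sst}, so only the two new files cross the network.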
I see what you mean. This feature is backup, though; it's mainly for backing up a DB and only used to restore a DB when an accident happens, which is rare, so it is designed in a way that needs a reopen. Your case is about serving online, real-time changes. We have a secondary instance feature that can catch up with the primary's changes, but that's for accessing a common set of files on the same file system.