delta snapshots computation · rocksdb · 9 comments · open

patelprateek commented on April 27, 2024
delta snapshots computation

Comments (9)

jowlyzhang commented on April 27, 2024

If I understand correctly, there are two steps in this pipeline that you want to check whether they can be optimized:

Step 1): a large indexing machine periodically checkpoints to distributed storage. Each time a checkpoint happens, it uploads the entire index (the whole DB). We want to check if this can be improved.
Step 2): read-only servers periodically load from distributed storage to local SSD and serve read traffic. Each time the loading happens, they load the entire index too. We want to check if this can be improved.

RocksDB's backup feature supports incremental backup; essentially, it avoids copying duplicated files. It seems to me that using this feature for step 1) can achieve what you need; you would need to implement a rocksdb::Env object for your distributed file system.

For step 2): can you use the backup feature again and treat what is on local storage as a backup of what is on the distributed storage?
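To make that concrete, here is a minimal sketch of both directions using the BackupEngine API from recent RocksDB releases (`rocksdb/utilities/backup_engine.h`). The paths are placeholders, and `remote_env` stands for a hypothetical custom rocksdb::Env over the distributed file system; with `Env::Default()` everything stays on local disk:

```cpp
#include <cassert>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/env.h"
#include "rocksdb/utilities/backup_engine.h"

// Step 1) on the indexer: incremental backup of one shard. The first argument
// to Open() is the Env of the live DB (local here); the backup directory's Env
// is passed through BackupEngineOptions. Only files that are not already in
// the backup directory get copied.
void BackupShard(rocksdb::DB* db, rocksdb::Env* remote_env) {
  rocksdb::BackupEngineOptions options("/remote/backups/shard-0",
                                       /*_backup_env=*/remote_env);
  rocksdb::BackupEngine* backup_engine = nullptr;
  auto s = rocksdb::BackupEngine::Open(rocksdb::Env::Default(), options,
                                       &backup_engine);
  assert(s.ok());
  s = backup_engine->CreateNewBackup(db, /*flush_before_backup=*/true);
  assert(s.ok());
  delete backup_engine;
}

// Step 2) on a read-only server: restore the latest backup onto local SSD
// (the DB Env is local, the backup Env is the remote one).
void RestoreShard(rocksdb::Env* remote_env) {
  rocksdb::BackupEngineOptions options("/remote/backups/shard-0",
                                       /*_backup_env=*/remote_env);
  rocksdb::BackupEngineReadOnly* backup_engine = nullptr;
  auto s = rocksdb::BackupEngineReadOnly::Open(rocksdb::Env::Default(), options,
                                               &backup_engine);
  assert(s.ok());
  s = backup_engine->RestoreDBFromLatestBackup("/local/ssd/shard-0",
                                               "/local/ssd/shard-0");
  assert(s.ok());
  delete backup_engine;
}
```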

I'm curious how the read-only server handles a new snapshot. Does it reopen the DB every time a new download is made?

patelprateek commented on April 27, 2024

@jowlyzhang: thanks, your understanding is correct.
Our current operations are:
Indexing (write-only data ingestion jobs):

  1. Indexers: each indexer handles one shard and indexes about a few hundred GB per shard. Storage: local SSD or network-attached SSD.
  2. Every 30 minutes, or at a synchronization point, we run a full compaction and copy the full index (all SST files) to distributed storage such as GCS or S3.
     So even if the incremental data within those 30 minutes is only a few hundred megabytes, we end up copying the entire hundreds of GB from local/network SSDs to remote storage.

Serving (read-only, serving queries):

  1. One server can load multiple shards (produced by multiple indexers), so typically we serve TB-scale data on a single serving machine. We limit it to 1 TB per serving node to ensure we can scale up new servers in under 10 minutes.
  2. Servers copy data from remote GCS/S3 buckets to local SSD and then open multiple rocksdb instances (one for each shard served on this machine).
  3. When a new snapshot appears, background threads copy the new full snapshot to local SSDs, then close the current instance and open a new DB instance pointing at the new snapshot.

We want to make this process efficient by making data ingested on the indexing side available on the serving side faster, i.e. we want to take these snapshots every minute (or possibly every 30 seconds).
The main bottlenecks are copying full snapshots from local to remote on the indexer, and copying the full snapshot from remote to local plus opening and initializing new instances on the servers.
I was wondering whether incremental checkpoints can help us scale, so that even though our indexes are large, since the incremental updates are only a few hundred MB we can reduce the end-to-end latency.
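For reference, the swap in serving step 3 follows roughly this pattern (a sketch only, assuming a hypothetical ShardReader wrapper; the GCS/S3 download, path layout, and synchronization with in-flight reads are elided):

```cpp
#include <memory>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

// One shard served read-only; `current_` is swapped once a new snapshot has
// been fully downloaded into a fresh local directory.
class ShardReader {
 public:
  // Point the shard at a newly downloaded snapshot directory, e.g.
  // "/local/ssd/shard-0/snapshot-000123". Resetting the unique_ptr closes
  // the previous instance; a real server would coordinate this with
  // in-flight reads.
  rocksdb::Status LoadSnapshot(const std::string& snapshot_dir) {
    rocksdb::Options options;
    rocksdb::DB* raw = nullptr;
    // Read-only open: this instance will not write or run compactions.
    rocksdb::Status s =
        rocksdb::DB::OpenForReadOnly(options, snapshot_dir, &raw);
    if (!s.ok()) return s;
    current_.reset(raw);
    return s;
  }

  rocksdb::DB* db() { return current_.get(); }

 private:
  std::unique_ptr<rocksdb::DB> current_;
};
```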

jowlyzhang commented on April 27, 2024

@patelprateek Thank you for the context. Do you think the incremental capability of the RocksDB backup feature can help here?

patelprateek commented on April 27, 2024

@jowlyzhang: my thought process was that if there is a way to propagate deltas of the index so that they can be replicated easily among the query servers, that would be the way to go.
Another approach is having servers both ingest and serve queries from the same node (read-write, with rate limiting enabled so as not to impact query read latencies). The issue in this scenario is that all replicated servers would essentially be ingesting the same data and doing the same compaction work. I wanted to avoid that, hence leaning towards the indexers doing the compaction and the read-only servers just replicating the state of the DB, which is compute-efficient. Or somehow have one master node ingest and compact while all the other replicas copy some delta files and metadata, so that they don't have to do the compaction work and can make new data queryable within a few seconds of latency.

My question for the RocksDB team was whether this is feasible. What you suggested might help, but I don't know if it allows us to scale. For example, do you see any issues if we have, say, a 128 GB index and try to take an incremental backup every 30 seconds (the new data arriving every 30 seconds is probably a few hundred MB)?
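As a rough back-of-the-envelope (assuming ~300 MB of new data per 30-second interval): the current full snapshot moves 128 GB every 30 minutes, i.e. roughly 70 MB/s of sustained copy bandwidth per shard, whereas shipping only the delta would be on the order of 10 MB/s, so the copy volume itself drops by roughly 7x before even considering the reopen cost on the serving side.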

jowlyzhang commented on April 27, 2024

Your current workflow already does checkpointing every 30 seconds, right? Backup is built on top of checkpointing, so it shouldn't be more expensive, since it's incremental.
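For context, the checkpoint primitive that backup builds on is roughly the following (a sketch; the target directory is a placeholder):

```cpp
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/utilities/checkpoint.h"

// Create a consistent, openable copy of the DB under `checkpoint_dir`.
// When the checkpoint directory is on the same file system as the DB,
// SST files are hard-linked rather than copied, which keeps this cheap.
rocksdb::Status TakeCheckpoint(rocksdb::DB* db,
                               const std::string& checkpoint_dir) {
  rocksdb::Checkpoint* checkpoint = nullptr;
  rocksdb::Status s = rocksdb::Checkpoint::Create(db, &checkpoint);
  if (!s.ok()) return s;
  s = checkpoint->CreateCheckpoint(checkpoint_dir);
  delete checkpoint;
  return s;
}
```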

patelprateek commented on April 27, 2024

No, the current cadence is 30 minutes (per my update above in the thread) for a full snapshot (not incremental). That's why I wanted to understand the perf implications if we need to take an incremental backup every 30 seconds.

jowlyzhang commented on April 27, 2024

In that case, you would need to do a DB reopen on the read-only servers every 30 seconds, right? I think sometimes it will take even longer than 30 seconds just to open the DB.

patelprateek commented on April 27, 2024

Yes, I was wondering if it's possible to apply an incremental update without having to re-open.
I don't know the implementation details, but if the incremental update can tell us something like "files x, y were deleted and new files a, b were added", then possibly we could copy those new files and replicate them to end up in the same state?
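One way to approximate that kind of file-level diff with existing APIs (a sketch only; shipping the lists and files between machines is left out) would be to compare the live-file lists of consecutive snapshots, e.g. via DB::GetLiveFiles, and transfer only the SST files that are new:

```cpp
#include <set>
#include <string>
#include <vector>

#include "rocksdb/db.h"

// Returns the set of live files (SSTs, MANIFEST, CURRENT), relative to the
// DB directory, flushing the memtable first so the list is complete.
std::set<std::string> LiveFileSet(rocksdb::DB* db) {
  std::vector<std::string> files;
  uint64_t manifest_size = 0;
  rocksdb::Status s =
      db->GetLiveFiles(files, &manifest_size, /*flush_memtable=*/true);
  (void)s;  // error handling elided in this sketch
  return {files.begin(), files.end()};
}

// Files present in the new snapshot but not in the previous one are the
// delta to copy; files that disappeared can be garbage-collected on the
// serving side.
std::vector<std::string> NewFiles(const std::set<std::string>& prev,
                                  const std::set<std::string>& curr) {
  std::vector<std::string> added;
  for (const auto& f : curr) {
    if (prev.find(f) == prev.end()) {
      added.push_back(f);
    }
  }
  return added;
}
```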

jowlyzhang commented on April 27, 2024

I see what you mean. This feature is backup, though; it's mainly for backing up a DB and is only used to restore one when an accident happens, which is rare, so it is designed in a way that requires a reopen. What you are describing is serving online, real-time changes. We have a secondary instance feature that can catch up with the primary's changes, but that is for accessing a common set of files on the same file system.
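For completeness, the secondary instance is used roughly like this (a sketch, assuming the primary's directory is visible to this process, e.g. over a shared file system; paths are placeholders):

```cpp
#include <memory>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

// Open a read-only secondary that follows an existing primary. `primary_path`
// is the primary's DB directory; `secondary_path` is a private directory where
// the secondary keeps its own info log.
std::unique_ptr<rocksdb::DB> OpenSecondary(const std::string& primary_path,
                                           const std::string& secondary_path) {
  rocksdb::Options options;
  options.max_open_files = -1;  // keeping all files open is recommended here
  rocksdb::DB* raw = nullptr;
  rocksdb::Status s =
      rocksdb::DB::OpenAsSecondary(options, primary_path, secondary_path, &raw);
  if (!s.ok()) {
    return nullptr;
  }
  return std::unique_ptr<rocksdb::DB>(raw);
}

// Periodically pick up the primary's new WAL entries and SST files without
// reopening the DB.
void CatchUp(rocksdb::DB* secondary) {
  rocksdb::Status s = secondary->TryCatchUpWithPrimary();
  (void)s;  // error handling elided in this sketch
}
```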
