Comments (9)
If I understand correctly, there are two steps involved in this pipeline that you want to check whether they can be optimized:
Step 1): a large indexing machine periodically checkpointing to distributed storage. Each time a checkpoint happens, it uploads the entire index (the whole DB). We want to check if this can be improved.
Step 2): read-only servers periodically loading from distributed storage to local SSD and serving read traffic. Each time the loading happens, it loads the entire index too. We want to check if this can be improved.
RocksDB's backup feature supports incremental backup; essentially, it avoids copying duplicated files. It seems to me that using this feature for step 1) can achieve what you need; you would need to implement a rocksdb::Env object for your distributed file system.
For step 2): can you use the backup feature again and treat what is on local storage as a backup of what is on distributed storage?
I'm curious how the read-only server handles a new snapshot: does it reopen the DB every time a new download is made?
from rocksdb.
@jowlyzhang: thanks, your understanding is correct.
Our current operations are:
Indexing (write-only data ingestion jobs)
- Indexers: index a few hundred GBs for each shard (1 indexer per shard). Storage: local SSD or network-attached SSD.
- Every 30 minutes, or at a synchronization point, we run a full compaction and copy the full index (all SST files) to distributed storage like GCS or S3.
So even if the incremental data within those 30 minutes was a few hundred megabytes, we end up copying the entire hundreds of GBs from local/network SSDs to remote storage.
Serving (read-only, serving queries)
- One server can load multiple shards (produced by multiple indexers), so we typically serve TB-scale data on a single serving machine. We limit it to 1 TB per serving node to ensure we can scale up new servers in under 10 minutes.
- Servers copy data from remote GCS/S3 buckets to local SSD and then open multiple RocksDB instances (one for each shard served on the machine).
- When a new snapshot appears, background threads copy the new full snapshot to local SSDs, then close the current instance and open a new DB instance pointing at the new snapshot.
We want to make this process efficient by making data ingested on the indexing side available for serving faster, i.e. we want to take these snapshots every minute (or possibly every 30 seconds).
The main bottlenecks are copying full snapshots (local to remote for the indexer) and (copying the full snapshot from remote to local + opening and initializing new instances) for the servers.
I was wondering if incremental checkpoints can help us scale: even if our indexes are large, the incremental updates are only around a few hundred MBs, so we could possibly reduce the end-to-end latency.
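To put rough numbers on the gap (round figures assumed from the description above: ~300 GB per shard, a few hundred MB of new data per 30-minute window; this ignores compaction rewriting existing files, which makes real deltas somewhat larger):

```python
# Back-of-envelope comparison of full vs. incremental snapshot traffic.
# All figures are illustrative assumptions, not measurements.
index_size_gb = 300          # full index per shard (hundreds of GB)
delta_mb_per_30min = 300     # new data per 30-minute window (hundreds of MB)

# Full-snapshot scheme: the whole index moves every 30 minutes.
full_gb_per_hour = index_size_gb * 2

# Incremental scheme at a 30-second cadence: only the delta moves,
# split across 60 snapshots per 30-minute window.
incr_mb_per_snapshot = delta_mb_per_30min / 60
incr_gb_per_hour = delta_mb_per_30min * 2 / 1024

print(full_gb_per_hour)      # 600 (GB copied per hour)
print(incr_gb_per_hour)      # about 0.59 GB per hour
print(incr_mb_per_snapshot)  # 5.0 MB per 30-second snapshot
```

Under these assumptions the copy volume drops by roughly three orders of magnitude, which is the case for making the snapshot cost proportional to the delta rather than the index size.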
@patelprateek Thank you for the context. Do you think the incremental capability of the RocksDB backup feature can help here?
@jowlyzhang: my thought process was that if there were a way to propagate deltas of the index so that they can be replicated easily among query servers, that would be the way to go.
Another approach is having servers both ingest and serve queries on the same node (read-write, with rate limiting enabled so as to not impact query read latencies). The issue in this scenario is that all replica servers would essentially be ingesting the same data and doing the same compaction work. I wanted to avoid that, hence leaning towards indexers doing the compaction and read-only servers just replicating the state of the DB, which is compute-efficient. Alternatively, one master node could ingest and compact while all the replicas copy some delta files and metadata, so that they don't have to do compaction work and can make new data available to be queried with a latency of a few seconds.
My question for the RocksDB team was to see if this is feasible. What you suggested might help, but I don't know if it lets us scale; for example, do you see any issues if we have, say, a 128 GB index and try to take an incremental backup every 30 seconds (the new data arriving every 30 seconds is probably a few hundred MBs)?
Your current workflow already does checkpointing every 30 seconds, right? Backup is built on top of checkpointing, so it shouldn't be more expensive since it's incremental.
No, the current cadence is 30 minutes (per my update above in the thread) with a full snapshot (not incremental). That's why I wanted to understand the perf implications if we need to take an incremental backup every 30 seconds.
In that case, you would need to do a DB reopen on the read-only servers every 30 seconds, right? I think sometimes it will even take longer than 30 seconds for the DB to open.
Yes, I was wondering if it's possible to apply an incremental update without having to re-open.
I don't know the implementation details, but if an incremental update can tell us something like "files x, y deleted and new files a, b added", then possibly we could copy those new files, or replicate them, to end up in the same state?
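The delta described here is essentially a set difference between two snapshots' live-file lists. A minimal sketch, assuming a manifest is just the set of live SST file names (a real protocol would also have to ship the new MANIFEST and switch over atomically):

```python
def manifest_delta(old_files, new_files):
    """Return (added, deleted) between two snapshots' live-file sets.

    Because SST files are immutable, a replica that fetches `added` and
    drops `deleted` ends up holding the same file set as the source.
    (Sketch only: a real scheme must also replicate the new MANIFEST
    and apply the switch atomically so readers never see a mix.)
    """
    old_files = set(old_files)
    new_files = set(new_files)
    added = new_files - old_files
    deleted = old_files - new_files
    return added, deleted
```

For example, going from {x.sst, y.sst, z.sst} to {z.sst, a.sst, b.sst} yields added = {a.sst, b.sst} and deleted = {x.sst, y.sst}, so only the two new files cross the network.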
I see what you mean. This feature is backup, though; it's mainly for backing up a DB and only used to restore a DB when an accident happens, which is rare, so it is designed in a way that needs a reopen. Your case is about serving online, real-time changes. We have a secondary instance feature that can catch up with the primary's changes, but that's for accessing a common set of files on the same file system.