Comments (9)
Your suggestions for reducing write-amp are valid. However, getting more data into memory isn't trivial. If you make the memtable huge, then queries against it and inserts into it can be slow. An alternative is to keep old memtables in memory and then merge several of them to create a larger L0 file. RocksDB also has alternate memtable implementations where you sacrifice some features provided by the skip list to get better performance.
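As a rough illustration of the "merge several memtables into one larger L0 file" idea, here is a minimal sketch. It assumes each memtable can be modeled as a plain dict and that the newest value for a key wins; the function name `merge_memtables` is hypothetical and this is not RocksDB code.

```python
import heapq


def merge_memtables(memtables):
    """Merge memtables (list of dicts, oldest first) into one sorted run.

    Toy model only: returns a list of (key, value) pairs sorted by key,
    keeping the value from the newest memtable when a key repeats.
    """
    streams = []
    for age, table in enumerate(memtables):
        # Use -age so that, for equal keys, entries from newer tables sort first.
        streams.append([(key, -age, value) for key, value in sorted(table.items())])

    merged = []
    last_key = object()  # sentinel that never equals a real key
    for key, _, value in heapq.merge(*streams):
        if key != last_key:
            # First occurrence of a key comes from the newest table holding it.
            merged.append((key, value))
            last_key = key
    return merged
```

The output plays the role of a single larger sorted L0 file, written once instead of one small file per memtable.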
Having spent a lot of time thinking about write-amplification this year, I am happy to read what others write about it. With RocksDB we now have LevelDB-style (leveled) and HBase-style compaction (universal). I think you are describing write-amp for leveled compaction above and write-amp for it is usually much worse than for universal. However, I don't understand why you include the "2*" term in the cost formula above. I think that might be doubling the write-amp that is likely to occur. The 2X will account for the reads & writes that can be done during compaction, but I call that compaction amplification, not write amplification.
I use the following estimate for write-amp with leveled compaction when there are N levels -- L0, L1, L2, ... LN-1.
- +1 for the redo log
- +1 for the write to an L0 file
- +1 for compaction to L1 assuming size(L0 files in compaction) == size(L1)
- +10 for each level that follows assuming default growth ratio of 10 between levels was used
So my estimate is 3 + 10*X, where X is the number of levels excluding L0 and L1. If you use the --stats_per_interval and --stats_interval options with db_bench when running a benchmark that writes data, it will show the amount of reads and writes per level and the total write-amp. The value printed there tends to match the estimated cost I describe here.
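The estimate above can be written out as a small function. This is only a sketch of the arithmetic in this comment; the function name `leveled_write_amp` and its parameters are made up for illustration.

```python
def leveled_write_amp(num_levels, growth_ratio=10):
    """Estimate write-amp for leveled compaction with levels L0 .. L(num_levels-1).

    Follows the estimate in the comment above:
      +1 redo log, +1 write to L0, +1 compaction into L1,
      +growth_ratio for every level after L1.
    """
    if num_levels < 2:
        raise ValueError("the estimate assumes at least L0 and L1")
    redo_log = 1
    l0_write = 1
    l0_to_l1 = 1
    levels_after_l1 = num_levels - 2  # X in "3 + 10*X"
    return redo_log + l0_write + l0_to_l1 + growth_ratio * levels_after_l1


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, "levels ->", leveled_write_amp(n))
```

For example, with 4 levels (L0 to L3) and the default growth ratio of 10, X is 2 and the estimate is 3 + 10*2 = 23.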
from rocksdb.
My calculation includes the read I/O, which is why I include the "2*" term.
Given that the read/write speed is 126k/17k, a ratio of about 7.4, I think I/O amplification is the bottleneck. Making the memtable huge or keeping more memtables in memory may be a worthwhile strategy.
Another non-trivial strategy is to split ranges dynamically, as Bigtable and HBase do. That keeps the number of levels low.
from rocksdb.
RocksDB has two compaction algorithms -- leveled and universal. Leveled is what we discussed earlier in this thread. Universal is more like HBase, and write amplification is much lower with it. Try db_bench --compaction_style=1. But we really need to provide more docs for it.
from rocksdb.
We have added a new compaction algorithm called FIFO compaction:
6de6a06
FIFO compaction allows a database to behave more like a TTL-based cache where older entries are automatically purged. This setting can be configured such that write-amplification is really low, close to 1 or 2.
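To see why write-amp can stay near 1 or 2 here, consider a toy model in which each flushed file is written once and the oldest files are simply dropped when a size cap is exceeded. The class `FifoStore` and its fields are hypothetical; this sketches the idea and does not use or reproduce the RocksDB API.

```python
from collections import deque


class FifoStore:
    """Toy model of FIFO-style compaction: files are never rewritten, only dropped."""

    def __init__(self, max_total_bytes, use_wal=True):
        self.max_total_bytes = max_total_bytes
        self.use_wal = use_wal
        self.files = deque()   # sizes of flushed files, oldest on the left
        self.user_bytes = 0    # bytes written by the application
        self.device_bytes = 0  # bytes written to storage

    def flush(self, nbytes):
        """Account for nbytes of user data going through the log and into a new file."""
        self.user_bytes += nbytes
        if self.use_wal:
            self.device_bytes += nbytes  # redo log write
        self.device_bytes += nbytes      # single write of the flushed file
        self.files.append(nbytes)
        # Purge oldest files once over the cap; dropping a file writes no data.
        while sum(self.files) > self.max_total_bytes and len(self.files) > 1:
            self.files.popleft()

    def write_amp(self):
        if self.user_bytes == 0:
            return 0.0
        return self.device_bytes / self.user_bytes
```

In this model the ratio is 2 with the redo log and 1 without it, regardless of how much data has been purged, because no file is ever read back and rewritten.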
from rocksdb.
Will there be any performance impact from using FIFO compaction with LevelDB?
Basically, I want to know what the trade-offs are.
from rocksdb.
LevelDB?
FIFO compaction currently has very low overhead and write amplification, but it might increase space and read amplification. It is also TTL-based, so it deletes the oldest records automatically.
from rocksdb.
Sorry, I meant RocksDB :-)
Thanks for the information. But deletes are very expensive on flash. Do you have any benchmark data (performance and induced WA) at scale on flash for RocksDB?
from rocksdb.
These are our performance benchmarks: https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks
from rocksdb.
Closing this since it's not really an issue. Please post in https://www.facebook.com/groups/rocksdb.dev/ if you wish to discuss this further.
from rocksdb.