I think you will be surprised by the performance of Aerospike for writes.
Even without batch functionality, we have found that Aerospike outperforms Cassandra on writes by at least 5:1, or even 10:1, on the same hardware. We've had a few people who switched from Cassandra run benchmarks on their old Cassandra hardware; YMMV.
I think the "critical part of the system's processing" must be high-performance writes, because Cassandra does not provide any kind of consistency guarantee on a batch write. Batch writes are certainly required in Cassandra to achieve good performance.
Writing individually has a number of benefits: you can write to different tables/sets, get individual error codes, and use individual write policies, and one write can overtake another. This is especially beneficial when you use an async API, such as the async Java and/or C interfaces.
There are certainly a few cases, especially with very small objects, where batching at the network layer improves performance simply by reducing processing overhead, but I'd suggest simply trying multithreaded writing. I heard that one of our large customers is now running more than 10Gb/sec on a single server, and they use fairly small objects and a lot of writes.
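A minimal sketch of the multithreaded approach suggested above, in plain Java with no Aerospike dependency. `writeOne` is a hypothetical stand-in for a single-record blocking write; it only sleeps for an assumed 1 ms round trip, so the printed timings illustrate how overlapping round trips across threads raises throughput, not how fast Aerospike itself is.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelWrites {
    // Hypothetical stand-in for one single-record write (e.g. a blocking put).
    // It only sleeps for an assumed ~1 ms round trip; `key` would select the record in real code.
    static void writeOne(int key, AtomicInteger written) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        written.incrementAndGet();
    }

    // Submits `count` independent writes to a fixed pool of `threads` workers
    // and returns how many completed; round trips overlap across threads.
    static int writeAll(int count, int threads) {
        AtomicInteger written = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < count; i++) {
            final int key = i;
            pool.execute(() -> writeOne(key, written));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written.get();
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 8, 32}) {
            long start = System.nanoTime();
            int n = writeAll(1000, threads);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " thread(s): " + n + " writes in " + ms + " ms");
        }
    }
}
```

In this toy model wall-clock time drops roughly in proportion to the thread count, because the simulated latency is pure waiting; on a real cluster some other resource eventually saturates, which is what a benchmark would need to show.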
If you'd like to discuss your exact need further, feel free to PM me.
from aerospike-client-java.
Aerospike does not do batch writes; it only does batch gets.
That sounds like a major oversight.
If I have 1000 keys to update in a batch, does that mean I have to make 1000 separate network calls instead of just 1?
Yep! Sorry, I forgot to mention that you can submit multiple operations, in batch mode, on a single record affecting different bins. However, you cannot batch write to different keys; you can only batch read.
Sorry, we would definitely have different keys, in different sets (e.g., entities, audit trail, and messages for internal message queues), etc.
We're able to do this easily today in Cassandra, so it is a critical part of our system's processing.
Hi,
It is not so easy. In Cassandra we already write to different rows and to different column families within a single batch. We have no expectations around consistency; eventual consistency is fine in this case (as long as writes are atomic per row, which is what Cassandra guarantees).
A single batch may have 2,000 writes in it, spread evenly (more or less) across 4 different column families (or sets as they would be in Aerospike).
That is the difference between a single network call and 2,000.
We cannot simply multithread the writes, because the server is already massively multithreaded. This would just add more threads and more CPU contention. We are already writing these large mutation batches to Cassandra across many parallel threads.
Our customers have different network speeds available, and we cannot always assume we will be lucky enough to get the fastest network possible.
So instead of a single network call on a thread, I will now need to make 2,000 sequential network calls on the same thread.
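The cost difference described above can be made concrete with a back-of-the-envelope model. The round-trip time and per-record server cost below are assumed values for illustration, not measurements: n sequential calls pay the round trip n times, while one batched call pays it once.

```java
public class RoundTripCost {
    // n writes issued one after another on one thread: every write pays a full round trip.
    static double sequentialMicros(int n, double rttMicros, double perRecordMicros) {
        return n * (rttMicros + perRecordMicros);
    }

    // The same n writes in one batched request: one round trip plus per-record work.
    static double batchedMicros(int n, double rttMicros, double perRecordMicros) {
        return rttMicros + n * perRecordMicros;
    }

    public static void main(String[] args) {
        int n = 2000;            // writes per batch, as described in the thread
        double rtt = 500.0;      // assumed round-trip time in microseconds (0.5 ms)
        double perRecord = 5.0;  // assumed server-side work per record in microseconds
        System.out.println("sequential us: " + sequentialMicros(n, rtt, perRecord)); // 1010000.0
        System.out.println("batched us:    " + batchedMicros(n, rtt, perRecord));    // 10500.0
    }
}
```

With these assumed values the sequential path takes about 1.01 s and the batched path about 10.5 ms. The model ignores the bandwidth cost of a larger request and any client-side parallelism, so it overstates the gap when writes are already spread across many threads.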
> I think you will be surprised by the performance of Aerospike for writes.
We've been happy so far :)
> Writing individually has a number of benefits: you can write to different tables/sets, get individual error codes, and use individual write policies, and one write can overtake another. This is especially beneficial when you use an async API, such as the async Java and/or C interfaces.
There's no reason the API couldn't be extended to respond with individual errors per key/record, as well as to support individual write policies. Indeed, the `get` method could take a `Listener` argument, much like the async client, which could accept `(key, record, exception)`. I do think individual write policies would be unnecessary in the vast majority of use cases.
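The per-key reporting proposed above can be sketched without giving up the single call: the batch runs once, and a listener receives one `(key, value, exception)` result per entry. `BatchWriteListener` and `writeBatch` are hypothetical names, not part of the Aerospike client, and the in-memory map stands in for the server.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerKeyResults {
    // Hypothetical callback modeled on the (key, record, exception) idea;
    // `exception` is null when the write for that key succeeded.
    interface BatchWriteListener {
        void onResult(String key, String value, Exception exception);
    }

    // Hypothetical batch write against an in-memory map: every entry gets its own
    // callback, and one failed record does not abort the rest of the batch.
    static void writeBatch(Map<String, String> store, Map<String, String> batch,
                           BatchWriteListener listener) {
        for (Map.Entry<String, String> e : batch.entrySet()) {
            if (e.getValue() == null) {
                listener.onResult(e.getKey(), null, new IllegalArgumentException("null value"));
            } else {
                store.put(e.getKey(), e.getValue());
                listener.onResult(e.getKey(), e.getValue(), null);
            }
        }
    }

    // Runs one batch and collects the keys whose writes failed.
    static List<String> failedKeys(Map<String, String> batch) {
        Map<String, String> store = new LinkedHashMap<>();
        List<String> failed = new ArrayList<>();
        writeBatch(store, batch, (key, value, exception) -> {
            if (exception != null) {
                failed.add(key);
            }
        });
        return failed;
    }

    public static void main(String[] args) {
        Map<String, String> batch = new LinkedHashMap<>();
        batch.put("entity:1", "a");
        batch.put("audit:1", null); // simulated bad record
        batch.put("msg:1", "c");
        System.out.println(failedKeys(batch)); // [audit:1]
    }
}
```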
> one write can overtake another
True. However, this could be a warning in the documentation, and the responsibility could be left to the user rather than imposing such a limitation. In fact, the API could ensure that only distinct keys are written, and only a single bin at a time. Wouldn't that prevent this unexpected behavior?
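One way an API could enforce the distinct-keys restriction suggested above is to reject a batch up front when any key repeats, so no two writes in the same batch can race on one record. This is a hypothetical client-side check, not existing Aerospike behavior.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DistinctKeyCheck {
    // Returns the first key that appears more than once, or null if all keys are distinct.
    static String firstDuplicate(List<String> keys) {
        Set<String> seen = new HashSet<>();
        for (String key : keys) {
            if (!seen.add(key)) { // add() returns false when the key was already present
                return key;
            }
        }
        return null;
    }

    // Hypothetical guard a batch-write API could run before sending anything,
    // so that no two writes in one batch can race on the same record.
    static void requireDistinctKeys(List<String> keys) {
        String dup = firstDuplicate(keys);
        if (dup != null) {
            throw new IllegalArgumentException("key appears more than once in batch: " + dup);
        }
    }

    public static void main(String[] args) {
        requireDistinctKeys(Arrays.asList("user:1", "user:2", "audit:7")); // accepted
        try {
            requireDistinctKeys(Arrays.asList("user:1", "msg:3", "user:1"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // key appears more than once in batch: user:1
        }
    }
}
```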
> There are certainly a few cases, especially with very small objects, where batching at the network layer improves performance simply by reducing processing overhead, but I'd suggest simply trying multithreaded writing.
Yep, and this is the exact use case I've had recently at my company. We have relatively small objects, distinct keys, single bins, and millions of these in a single file. I'm using the async client along with a thread pool for the read/write listeners. However, I presume that a batched operation sending 2,000 of these over the wire at a time, rather than one by one, would be faster.
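Whichever way the records are eventually sent, an importer like the one described above needs to cut a multi-million-record file into fixed-size groups without loading it all at once. A minimal streaming chunker is sketched below; the 2,000 batch size comes from the comment, and the `sink` that would send each chunk (through a batched call, if one existed, or as individual async writes) is a hypothetical placeholder.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;

public class Chunker {
    // Streams records into consecutive chunks of at most `size`, handing each to `sink`,
    // and returns the number of chunks. Only one chunk is held in memory at a time.
    static <T> int forEachChunk(Iterator<T> records, int size, Consumer<List<T>> sink) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        int chunks = 0;
        List<T> current = new ArrayList<>(size);
        while (records.hasNext()) {
            current.add(records.next());
            if (current.size() == size) {
                sink.accept(current);
                chunks++;
                current = new ArrayList<>(size);
            }
        }
        if (!current.isEmpty()) {
            sink.accept(current); // trailing partial chunk
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        Iterator<Integer> records = IntStream.range(0, 5001).iterator();
        List<Integer> sizes = new ArrayList<>();
        int n = forEachChunk(records, 2000, chunk -> sizes.add(chunk.size()));
        System.out.println(n + " chunks, sizes " + sizes); // 3 chunks, sizes [2000, 2000, 1001]
    }
}
```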
> I heard that one of our large customers is now running more than 10Gb/sec on a single server, and they use fairly small objects and a lot of writes.
Should this be 10Mb/sec? 10Gb/sec seems quite high for a single server. I would love to hear more about how they've achieved those speeds, if such knowledge would make my import program faster :)
Thanks for your time!
Sorry, but when I run some benchmarks inserting a lot of records (a key and a 1 KB bin), I see all my indicators (CPU, network and disk I/O) at a very low level. But the result is not so good (20,000 records/sec, about 25 MB/sec). I think it's due to network and client/server latency. So with methods for batch writes (and batch deletes), I'm sure I could go much faster!
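Idle CPU, network, and disk alongside a throughput ceiling is consistent with a latency-bound client. Little's law relates the two: average requests in flight = throughput × latency. The 20,000 records/sec figure is from the comment above; the 0.5 ms per-write latency is an assumption chosen only to show the arithmetic.

```java
public class LittlesLaw {
    // Little's law: average requests in flight = throughput (req/s) x latency (s).
    // Latency is passed in microseconds to keep the arithmetic exact for these inputs.
    static double inFlight(double perSecond, double latencyMicros) {
        return perSecond * latencyMicros / 1_000_000.0;
    }

    // Rearranged: the throughput a given number of in-flight requests can sustain.
    static double throughput(double inFlight, double latencyMicros) {
        return inFlight * 1_000_000.0 / latencyMicros;
    }

    public static void main(String[] args) {
        double latencyMicros = 500.0; // assumed 0.5 ms per write, round trip included
        System.out.println(inFlight(20_000, latencyMicros)); // 10.0
        System.out.println(throughput(1, latencyMicros));    // 2000.0
        System.out.println(throughput(64, latencyMicros));   // 128000.0
    }
}
```

Under that assumption, 20,000 records/sec implies only about 10 writes outstanding at any moment, while 64 in flight at the same latency would allow 128,000/sec if the server and network keep up. Batching raises the records carried per request, and more threads or async calls raise the requests in flight; both push against the same limit.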
Louis,
How many threads are reasonable for your app?
Batches generally lower CPU consumption, I/O, network, and disk usage.
They also get you, essentially, more parallelism, at least at the database layer. This can be driven similarly by greater threading; the Java client uses concurrent data structures and similar techniques.
Sent from my phone, please excuse any terseness
(Replying to "Louis From Funés", Apr 29, 2015, 9:52 AM.)
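The point that parallelism can come from concurrency as well as from batching can be sketched without adding threads: one issuing thread keeps a bounded number of asynchronous writes in flight. `asyncWrite` is a hypothetical placeholder that completes after a simulated 1 ms delay; it is not the Aerospike async API, which has its own listener and event-loop types. The `Semaphore` cap stands in for whatever concurrency limit a real client would enforce.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedAsyncWrites {
    // Hypothetical async write: the future completes on a timer thread after an assumed 1 ms.
    // `key` is unused here; a real write would target that record.
    static CompletableFuture<Void> asyncWrite(ScheduledExecutorService timer, int key) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        timer.schedule(() -> { done.complete(null); }, 1, TimeUnit.MILLISECONDS);
        return done;
    }

    // Issues `count` writes from the calling thread, never allowing more than
    // `maxInFlight` to be outstanding, and returns how many completed.
    static int writeAll(int count, int maxInFlight) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger completed = new AtomicInteger();
        try {
            for (int i = 0; i < count; i++) {
                permits.acquire(); // blocks only while maxInFlight writes are outstanding
                asyncWrite(timer, i).whenComplete((ignored, error) -> {
                    completed.incrementAndGet(); // a real client would inspect `error` here
                    permits.release();
                });
            }
            permits.acquire(maxInFlight); // wait for the remaining writes to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.shutdown();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int n = writeAll(2000, 64);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(n + " writes completed in " + ms + " ms using one issuing thread");
    }
}
```

Compared with a thread-per-request pool, this keeps the thread count fixed, which speaks to the concern about adding CPU contention on an already heavily threaded server.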
Just to add some weight to why batches are important:
- there is no async Python client
- if you use Celery, you cannot use multiprocessing (which until now is the most efficient way I have found for writing batches in Python)
- if you use multithreading, you still cannot get past 1k TPS in Python
- so you face a choice: rip out Celery in favour of the much less mature python-rq, which can do multiprocessing; choose a different DB (Couchbase happily accepts 3k TPS from a single Python process with pipeline()); or choose a different language (hardly a choice if you are invested in other parts of the code)
I'm leaning closer and closer to choosing a different DB :(
I was wondering if there is an example of how to write data from an RDD into Aerospike, and how to read data from Aerospike and create an RDD from it. A link to such an example would be nice. Thanks
@hammad2861 check out the Launchpad at http://www.aerospike.com/launchpad/ which has a few Spark-related examples.