
Comments (12)

bbulkow commented on September 28, 2024

I think you will be surprised by the performance of Aerospike for writes.

Even without batch functionality, we have found that Aerospike will outperform Cassandra in writes by a factor of at least 5 to 10 on the same hardware. We've had a few people who switched from Cassandra and ran benchmarks on their old Cassandra hardware - YMMV.

I think the "critical part of the system's processing" must be high-performance writes, because Cassandra does not provide any kind of consistency guarantee on a batch write. Batch writes are certainly required in Cassandra to achieve performance.

Writing individually has a number of benefits. You can write to different tables/sets, get individual error codes, and use individual write policies, and one write can overtake another. This is especially beneficial when you use an async API like the async Java and/or C interfaces.

There are certainly a few cases - especially very small objects - where batching at the network layer improves performance simply due to processing overhead, but I'd suggest simply trying multithreaded writing. I heard that one of our large customers is running > 10Gb/sec on a single server now, and they use fairly small objects and a lot of writes.
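A minimal sketch of the multithreaded approach suggested above, under stated assumptions. It does not use the Aerospike client: `writeOne` is a hypothetical stand-in for whatever single-record write call the application makes, and `ParallelWrites` and `writeAll` are illustrative names. Each key is submitted as its own task on a fixed-size pool, and each key gets its own success or error result. Keys are assumed to be distinct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelWrites {

    /**
     * Runs one write per key on a fixed-size thread pool.
     * Returns a map from key to null (success) or the Throwable that write threw.
     * writeOne is a hypothetical stand-in for a single-record client call.
     */
    static Map<String, Throwable> writeAll(List<String> keys,
                                           Consumer<String> writeOne,
                                           int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every write first so they can run concurrently.
            List<Future<?>> pending = new ArrayList<>();
            for (String key : keys) {
                pending.add(pool.submit(() -> writeOne.accept(key)));
            }
            // Then collect one outcome per key, in submission order.
            Map<String, Throwable> results = new LinkedHashMap<>();
            for (int i = 0; i < keys.size(); i++) {
                try {
                    pending.get(i).get();
                    results.put(keys.get(i), null);
                } catch (ExecutionException e) {
                    results.put(keys.get(i), e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Throwable> results = writeAll(
                Arrays.asList("entity:1", "audit:1", "queue:1"),
                key -> {
                    // Simulated failure for one key only.
                    if (key.startsWith("queue")) {
                        throw new IllegalStateException("simulated failure");
                    }
                },
                4);
        results.forEach((key, error) ->
                System.out.println(key + " -> " + (error == null ? "ok" : error)));
    }
}
```

The pool size bounds how many writes are in flight at once, so it is the knob to tune against the client's CPU budget.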

If you'd like to discuss your exact need further, feel free to PM me.

from aerospike-client-java.

jpfuentes2 commented on September 28, 2024

Aerospike does not do batch writes; it only does batch gets.


jacek99 commented on September 28, 2024

That sounds like a major oversight.

If I have 1000 keys to update in a batch, does that mean I have to make 1000 separate network calls instead of just 1?


jpfuentes2 commented on September 28, 2024

Yep! Sorry, I forgot to mention that you can submit multiple operations, in batch mode, on a single record affecting different bins. However, you cannot batch write to different keys; you can only batch read.


jacek99 commented on September 28, 2024

Sorry, we would definitely have different keys, in different sets (e.g. entities + audit trail + messages for internal message queues), etc.

We're able to do this easily today in Cassandra, so it is a critical part of our system's processing.


jacek99 commented on September 28, 2024

Hi,

It is not so easy. In Cassandra we already write to different rows and to different column families within a single batch. We have no expectations around consistency; eventual is fine in this case (as long as it is atomic per row, which is what Cassandra guarantees).

A single batch may have 2,000 writes in it, spread evenly (more or less) across 4 different column families (or sets as they would be in Aerospike).

That is a difference of a single network call vs 2000.

We cannot simply multithread the writes, because the server is already massively multi-threaded. This would just add more threads and more CPU contention. We are already writing these large mutation batches in Cassandra across many parallel threads.

Our customers have different network speeds available and we cannot always assume we will be lucky enough to get the fastest network possible.

So instead of a single network call on a thread, I will now need to make 2000 sequential network calls on the same thread.
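The cost being described can be put into rough numbers. The sketch below is a simplified model, not a measurement: it assumes every request costs one fixed round trip, ignores server processing time and bandwidth limits, and the 0.5 ms round-trip time is an assumed value for illustration. `RoundTripCost` and `totalMillis` are illustrative names.

```java
final class RoundTripCost {

    /**
     * Wall-clock milliseconds to finish `writes` requests when at most
     * `inFlight` requests are outstanding at once and each one costs a
     * fixed round trip of `rttMillis`.
     */
    static double totalMillis(int writes, int inFlight, double rttMillis) {
        int waves = (writes + inFlight - 1) / inFlight; // ceiling division
        return waves * rttMillis;
    }

    public static void main(String[] args) {
        double rtt = 0.5; // assumed round-trip time in milliseconds

        // 2000 calls, one after another on a single thread.
        System.out.println(totalMillis(2000, 1, rtt));    // 1000.0
        // 2000 calls with 100 outstanding at a time (threads or async).
        System.out.println(totalMillis(2000, 100, rtt));  // 10.0
        // All 2000 writes carried by a single round trip (a batch).
        System.out.println(totalMillis(2000, 2000, rtt)); // 0.5
    }
}
```

Under this model the gap between sequential calls and a batch grows linearly with the round-trip time, which is why slower customer networks make the difference more visible.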


jpfuentes2 commented on September 28, 2024

> I think you will be surprised by the performance of Aerospike for writes.

We've been happy so far : )

> Writing individually has a number of benefits. You can write to different tables/sets, get individual error codes, and use individual write policies, and one write can overtake another. This is especially beneficial when you use an async API like the async Java and/or C interfaces.

There's no reason the API couldn't be extended to respond with individual errors per key/record as well as the write policies. Indeed, the get method could take a Listener argument much like the async client which could accept (key, record, exception). I do think individual write policies would be unnecessary in the vast majority of use cases.

> one write can overtake another

True. However, this could be a warning in documentation and the responsibility could be left to the user rather than imposing such a limitation. In fact, the API could ensure that only distinct keys could be written to and only a single bin at a time -- wouldn't that prevent this unexpected behavior?
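A rough sketch of what such a check could look like. Everything here is hypothetical and is not part of the Aerospike client: `DistinctKeyBatch`, `writeBatch`, the `writeOne` callback standing in for a single-record write, and the listener receiving `(key, error)` are all illustrative. The batch is rejected up front if any key repeats, so no two writes in the same batch target the same record.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

final class DistinctKeyBatch {

    /** Returns the first key seen twice, or null when every key is distinct. */
    static String firstDuplicate(List<String> keys) {
        Set<String> seen = new HashSet<>();
        for (String key : keys) {
            if (!seen.add(key)) {
                return key;
            }
        }
        return null;
    }

    /**
     * Hypothetical batch entry point: rejects the whole batch if a key
     * repeats, otherwise writes each key and reports (key, error-or-null)
     * to the listener.
     */
    static void writeBatch(List<String> keys,
                           Consumer<String> writeOne,
                           BiConsumer<String, Exception> listener) {
        String duplicate = firstDuplicate(keys);
        if (duplicate != null) {
            throw new IllegalArgumentException("duplicate key in batch: " + duplicate);
        }
        for (String key : keys) {
            Exception error = null;
            try {
                writeOne.accept(key);
            } catch (RuntimeException e) {
                error = e; // reported per key instead of aborting the batch
            }
            listener.accept(key, error);
        }
    }

    public static void main(String[] args) {
        writeBatch(
                Arrays.asList("k1", "k2", "k3"),
                key -> { /* a real single-record write would go here */ },
                (key, error) -> System.out.println(key + " -> " + (error == null ? "ok" : error)));
    }
}
```

This only guards against reordering within one batch; two separate batches that touch the same key could still overtake each other.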

> There are certainly a few cases - especially very small objects - where batching at the network layer improves performance simply due to processing overhead, but I'd suggest simply trying multithreaded writing.

Yep, and this is the exact use case I've had recently at my company. We have relatively small objects, distinct keys, single bins, and millions of these in a single file. I'm using the async client along with a thread pool for the read/write listeners. However, I am presuming that a batched operation to send 2000 of these over the wire at a time rather than 1-by-1 would be faster.

> I heard that one of our large customers is running > 10Gb/sec on a single server now, and they use fairly small objects and a lot of writes.

Should this be 10Mb/sec? 10Gb/sec seems quite high for a single server. I would love to hear more about how they've achieved those speeds if such knowledge would make my import program faster :)

Thanks for your time!


giena commented on September 28, 2024

Sorry, but when I run some benchmarks inserting a lot of records (a key and a 1 KB bin), I see all my indicators (CPU, network and disk I/O) at a very low level. But the result is not so good (20000 records/sec ==> 25 Mbytes/sec). I think it's due to network and client/server latency. So with methods for batch write (and batch delete), I'm sure I could go much faster!
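One way to sanity-check the numbers above is Little's law: the average number of requests in flight equals throughput multiplied by latency. The sketch below is back-of-the-envelope arithmetic only; the 1 ms per-request latency is an assumption, not a figure from this thread, and `ThroughputEstimate` is an illustrative name.

```java
final class ThroughputEstimate {

    /** Little's law: average requests in flight = throughput x latency. */
    static double inFlight(double requestsPerSecond, double latencySeconds) {
        return requestsPerSecond * latencySeconds;
    }

    /** Payload bandwidth in megabytes per second (1 MB = 1,000,000 bytes). */
    static double payloadMBPerSecond(double requestsPerSecond, int bytesPerRecord) {
        return requestsPerSecond * bytesPerRecord / 1_000_000.0;
    }

    public static void main(String[] args) {
        double observedRate = 20_000;   // records/sec, from the benchmark above
        double assumedLatency = 0.001;  // 1 ms per write: an assumption

        // Roughly how many writes were outstanding on average.
        System.out.println("in flight: " + inFlight(observedRate, assumedLatency));

        // Concurrency needed to reach 5x the observed rate at the same latency.
        System.out.println("needed for 5x: " + inFlight(5 * observedRate, assumedLatency));

        // Payload volume for 1 KB bins, excluding keys and protocol overhead.
        System.out.println("payload MB/s: " + payloadMBPerSecond(observedRate, 1024));
    }
}
```

With these assumed values the benchmark corresponds to roughly 20 writes in flight and about 20 MB/s of payload, in the same range as the 25 Mbytes/sec reported. If CPU, network, and disk are all idle, raising the number of outstanding requests (more threads, async calls, or batching) is the lever that remains.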


bbulkow commented on September 28, 2024

Louis,

How many threads is reasonable for your app?

Batches generally lower CPU consumption, I/O, network, and disk usage.

It also gets you, essentially, more parallelism, at least at the
database layer. This can be driven, similarly, by greater threading. The
Java client uses concurrent data structures and similar.

sent from my phone, please excuse any terseness
On Apr 29, 2015 9:52 AM, "Louis From Funés" wrote:

> Sorry, but when I run some benchmarks inserting a lot of records (a key and
> a 1 KB bin), I see all my indicators (CPU, network and disk I/O) at a
> very low level. But the result is not so good (20000 records/sec ==> 25
> Mbytes/sec). I think it's due to network and client/server latency. So with
> methods for batch write (and batch delete), I'm sure I could go much
> faster!


jmizgajski commented on September 28, 2024

Just to add some weight to why batches are important:

  • there is no async Python client
  • if you use Celery you cannot use multiprocessing (which is so far the most efficient way I have found for writing batches in Python)
  • if you use multithreading you still cannot get past 1k TPS in Python
  • so you face a choice: rip out Celery in favour of the much less mature python-rq, which can do multiprocessing; choose a different db (Couchbase happily accepts 3k TPS from a single Python process with pipeline()); or choose a different language (hardly a choice if you are invested in other parts of the code)

I'm leaning closer and closer to choosing a different db:(


hammad-platalytics commented on September 28, 2024

I was wondering if there is an example of how to write data from an RDD into Aerospike, and how to read data from Aerospike and create an RDD from it. A link to such an example would be nice. Thanks


wchu-citrusleaf commented on September 28, 2024

@hammad2861 check out the Launchpad at http://www.aerospike.com/launchpad/ which has a few Spark-related examples.
