This project forked from azure-samples/cassandra-on-azure-vms-performance-experiments

License: MIT License

Apache Cassandra on Azure VMs Performance Experiments

September 2019

Overview

Many customers run Apache Cassandra in Azure and are looking for experiment-based guidance for tuning Cassandra on Azure VMs. This repo summarizes learnings from performing various relative performance tests of Apache Cassandra on different Azure VM configurations to answer a few common questions such as:

  • What stripe/chunk size should I use for Cassandra data disks?
  • What is the impact of data disk caching?
  • Will my cluster performance scale linearly if I add another Cassandra data center to my ring in another Azure region?
  • Is there a performance difference between ext4 and xfs filesystems?

The goal of this repo is to share interesting learnings to help increase knowledge around how Apache Cassandra will behave and perform on various Azure VM configurations. It is not meant to be used as a benchmark of Cassandra on Azure, but rather as a summary of observations and conclusions from micro-scale tests comparing relative performance observed when tuning Azure VM and Cassandra configurations.

Test setup and methodology

The Azure Cosmos DB service offers "Cassandra to Cosmos Exchange (CCX)", which is designed to enable large enterprise customers to use hybrid Cassandra migrations or runtime scenarios where Cosmos DB augments Cassandra clusters. CCX provides specially designated Azure VM SKUs: Special_CCX_DS14_v2 and Special_CCX_DS13_v2. These Azure VM sizes are performance-equivalent to the usual Azure Standard_DS14_v2 and Standard_DS13_v2 VM sizes but are hosted on the same infrastructure as Cosmos DB itself, which gives the Cosmos DB service team more control over the updates and maintenance of these compute clusters.

By default, you will not see Special_CCX VM SKUs in your subscription. If you are interested in the CCX offer, please email [email protected] with a description of your Cassandra on Azure scenario.

In these tests, both the Special_CCX_DS14_v2 and Standard_DS14_v2 sizes were used interchangeably, depending on the Azure region where the Cassandra rings were deployed. Therefore, expect to see comparable performance numbers, even if using Standard_DS14_v2 VMs.

All tests use the standard cassandra-stress tool that ships with Apache Cassandra (version 3.11.4).
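
As a rough illustration of how such a run might be launched, the sketch below assembles a cassandra-stress command line in Python. The helper name `stress_command`, the node addresses, the operation count, and the thread count are placeholders chosen for illustration and are not the exact invocation used in these tests. It relies only on the `write`/`read` commands and the `n=`, `-rate threads=`, and `-node` options of cassandra-stress.

```python
import shlex


def stress_command(operation, node_ips, threads, op_count):
    """Build a cassandra-stress argument list (hypothetical helper)."""
    if operation not in ("write", "read"):
        raise ValueError("operation must be 'write' or 'read'")
    return [
        "cassandra-stress", operation, f"n={op_count}",
        "-rate", f"threads={threads}",      # client thread count
        "-node", ",".join(node_ips),        # comma-separated contact points
    ]


# Placeholder addresses for a six-node ring; substitute real node IPs.
nodes = [f"10.0.0.{i}" for i in range(4, 10)]
cmd = stress_command("write", nodes, threads=128, op_count=1_000_000)

# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run` on the client VM would start the run; printing it first makes it easy to review the parameters before generating load.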

Test setup

| Parameter | Value | Comments |
|---|---|---|
| Cassandra Node VMs | 6 x DS14_v2 | VM disk throttles are: uncached 51k IOPS/768MBps, cached 64k IOPS/576MBps. Also tested using the Special_CCX_DS14_v2 VM size, which has performance identical to DS14_v2. |
| Data disks | 4 x P30 (1TB) | P30 disk throttle is 5k IOPS/200MBps. |
| Disk Caching | None and ReadOnly | For many VMs the throughput throttle for uncached disks is higher than for cached disks due to limits of the Azure Blob Cache. Average latency is higher with uncached disks since all IOs go to backend storage. Tested both DiskCaching=None and DiskCaching=ReadOnly with empty and full host cache. See Disk Caching learnings. |
| Accelerated Networking | Enabled | Accelerated Networking decreases network latency from ~250us to ~50us and improves throughput, allowing DS14_v2 to reach ~11Gbps (1.2GB/s) with single-stream iperf3. |
| Linux distro and version | CentOS 7.5 | A modern Linux distro is required for the latest drivers and the ability to use Accelerated Networking. The latest versions of Ubuntu 16.x and 18.x could also be used. We used CentOS 7.5 mainly because it matched one specific customer scenario. |
| Cassandra Version | Apache Cassandra 3.11.4 | Repo base URL (link) |
| mdadm chunk size | 256KB, 128KB, 64KB, 4KB | The default value in mdadm is 256k and was tested initially. Also tested 4k (outlier), 64k, and 128k chunk sizes to see if there was any corresponding difference in performance. See Chunk Sizes learnings. |
| commitlog | Local SSD, Premium Disk | The specific customer scenario was using local/ephemeral SSD, likely on the assumption that it provides the fastest writes due to the low latency of local disk. However, since these local disks are not durable, a VM host crash could cause the commitlog on local/ephemeral disk to be lost. Therefore, we also tested writes with the commitlog on attached Premium data disks to assess the relative impact on performance. |
| SSTables | RAID 0, XFS | mdadm RAID 0 with the various chunk sizes above, XFS filesystem with the default 4096 byte block size. |
| Cassandra Java Garbage Collection | UseG1GC | Default is CMS GC, but during testing we noticed syscpu on some nodes spiking to 90% and the node becoming unresponsive for some time, presumably during GC pauses. With G1GC, high syscpu is much less frequent, but still occurs sometimes. This requires further investigation. Early straces show the Java process doing lots of futex and epoll_wait syscalls during the pause. |
| Cassandra-Stress client VMs | 1 x DS14_v2 (usually with 256KB for writes and 128 for reads) | Client VM is deployed in the same VNet and on a separate subnet. It connects to all 6 Cassandra nodes with up to 128 connections per node and uses 128 threads with no throttling. The thread count could be increased, which may increase the ops/sec for some scenarios, including with more clients. That said, in most scenarios we tested, client CPU and network were not the bottleneck. |
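
The disk and VM throttles listed in the table interact: a striped volume can go no faster than the sum of its disks' limits, and no faster than the VM-level limit. The sketch below works through that arithmetic for 4 x P30 on DS14_v2. The `Throttle` class and `effective_limit` function are illustrative names, and the model is a simplification that assumes every IO reaches the disks; reads served from the host cache under ReadOnly caching are not bounded by the per-disk limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Throttle:
    iops: int
    mbps: int


def effective_limit(disk, disk_count, vm_limit):
    """Ceiling for a striped volume: min(sum of disk limits, VM limit)."""
    return Throttle(
        iops=min(disk.iops * disk_count, vm_limit.iops),
        mbps=min(disk.mbps * disk_count, vm_limit.mbps),
    )


# Values taken from the test setup table.
P30 = Throttle(iops=5_000, mbps=200)
DS14_V2_UNCACHED = Throttle(iops=51_000, mbps=768)
DS14_V2_CACHED = Throttle(iops=64_000, mbps=576)

uncached = effective_limit(P30, 4, DS14_V2_UNCACHED)
cached = effective_limit(P30, 4, DS14_V2_CACHED)
print(uncached)  # Throttle(iops=20000, mbps=768)
print(cached)    # Throttle(iops=20000, mbps=576)
```

Under these assumptions the four disks cap IOPS at 20k in both modes, while throughput is capped by the VM limit (768MBps uncached, 576MBps cached) rather than by the 800MBps the disks could deliver together.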

For more details see setup of Azure VMs used in Cassandra tests.
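
For readers who want to reproduce the data-disk layout, the following sketch prints the kind of commands that create an mdadm RAID 0 array with a chosen chunk size and format it as XFS. It is not the script used in these tests: the device names (`/dev/sdc` through `/dev/sdf`), the array name `/dev/md0`, and the helper `raid0_commands` are assumptions for illustration, and the commands are only printed, not executed. The `--chunk` value is given to mdadm in kilobytes.

```python
import shlex


def raid0_commands(devices, chunk_kb, md_device="/dev/md0"):
    """Return argument lists for creating a RAID 0 array and an XFS filesystem."""
    create = [
        "mdadm", "--create", md_device,
        "--level=0",
        f"--raid-devices={len(devices)}",
        f"--chunk={chunk_kb}",  # chunk size in KB
        *devices,
    ]
    # mkfs.xfs with no options uses its default 4096 byte block size.
    make_fs = ["mkfs.xfs", md_device]
    return [create, make_fs]


# Hypothetical device names for the four attached P30 data disks.
data_disks = ["/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf"]

for chunk_kb in (64, 128, 256):
    for command in raid0_commands(data_disks, chunk_kb):
        print(shlex.join(command))
```

Generating the commands per chunk size keeps each test variant explicit, which helps when rebuilding the array between runs to compare chunk sizes.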

Learnings and Observations

Appendix

  • Coming Soon - Sample for deploying Apache Cassandra on Azure IaaS and running cassandra-stress tests (GitHub)
