
latte's People

Contributors

jakubzytka, pkolaczk, rukai, vponomaryov, vrischmann

latte's Issues

Histograms stored in samples take too much memory during long runs

Screenshot from 2024-03-29 14-19-56
In the screenshot above we see the memory utilization of the 2 nodes used for running latte.
Memory utilization grew to 10 GB over 3 hours of uptime on each of the nodes.

I debugged a bit locally and observed that memory grows at each sampling event.
My observation is that memory utilization is directly related to the number of operations performed during a sampling period.

Can't go faster than ~2 million calls / s on a 2x12 core Xeon

There seems to be contention somewhere that limits the performance of the empty benchmark to about 1.5-2 million calls per second.
This is probably not an issue in most real benchmarks, where Cassandra is a lot slower anyway. However, the performance bar for this tool is set very high, so this needs to be solved.

The issue doesn't seem to be visible on single core processors, hence I guess it could be caused by false sharing / shared atomic updates, which are inherently costly on multiprocessor machines.
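
To illustrate the false-sharing hypothesis, here is a minimal sketch (not latte's actual code) of per-thread counters padded so that no two of them share a cache line. The names `PaddedCounter` and `count_calls` are hypothetical, and the 64-byte alignment assumes a typical x86_64 cache line size.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// One counter per worker, aligned to 64 bytes (assumed cache line size)
/// so that neighbouring counters never land on the same cache line.
#[repr(align(64))]
#[derive(Default)]
pub struct PaddedCounter(AtomicU64);

/// Runs `threads` workers that each increment only their own counter
/// `iters` times, then returns the sum of all counters.
pub fn count_calls(threads: usize, iters: u64) -> u64 {
    let counters: Arc<Vec<PaddedCounter>> =
        Arc::new((0..threads).map(|_| PaddedCounter::default()).collect());

    let handles: Vec<_> = (0..threads)
        .map(|id| {
            let counters = Arc::clone(&counters);
            thread::spawn(move || {
                for _ in 0..iters {
                    // Each worker touches a distinct cache line.
                    counters[id].0.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker panicked");
    }

    // Aggregate once at the end instead of updating a shared total.
    counters.iter().map(|c| c.0.load(Ordering::Relaxed)).sum()
}
```

Comparing this against a variant where all workers update a single shared `AtomicU64` would show whether shared atomic updates are the limiting factor on a given machine.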

Plot is saved as .png, but should be .svg

The plot.rs code explicitly creates an SVG file, but the extension is hard-coded as png. This confuses the operating system; the only change required is to switch the extension to svg.

Support for Credentials

It would be useful to have Latte support Credentials on a Cassandra cluster to allow for use on clusters that have the Authorizer and Authenticators enabled.

Don't erase data by default

The erase stage is executed by default. The idea was that each benchmark run should be repeatable, so it should start from the same, clean state. However, I've learnt this is also a bit dangerous. If you have a big set of data for read benchmarking, forgetting the --no-load option will erase all of it, and you can lose hours of loading work if you didn't save a backup.

fs::read_lines doesn't make a vector of strings

I have prepared this workload

const READ = "read";

const KEYSPACE = "keyspace";
const TABLE = "table";

pub async fn prepare(ctx) {
    ctx.data.ids = fs::read_lines("ids.txt")?;
    ctx.prepare(READ, `SELECT sum(table_column) FROM ${KEYSPACE}.${TABLE} WHERE id = ? AND event_time >= '2021-11-01'`).await?;
}

pub async fn run(ctx, i) {
    let random_id = latte::hash_select(i, ctx.data.ids);
    ctx.execute_prepared(READ, [random_id]).await?
}

The ids.txt file contains ids (id per line). But it looks like the fs::read_lines() function doesn't work as expected, because the workload fails with error:

LOG ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
    Time  ───── Throughput ─────  ────────────────────────────────── Response times [ms] ───────────────────────────────────
     [s]      [op/s]     [req/s]         Min        25        50        75        90        95        99      99.9       Max
error: Cassandra error: Failed to execute query "SELECT sum(table_column) FROM keyspace.table WHERE id = ? AND event_time >= '2021-11-01'" with params [Text("k42qjlyix0lulp3m\njl36ct77wor07183\nr7hpjf012xdw87jn\n ... all ids from the ids.txt file
\n")]: Database returned an error: The query is syntatically correct but invalid, Error message: Key length of 152685 is longer than maximum of 65535
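
The parameter dump above suggests the whole file was bound as a single text value rather than one value per line. For comparison, the behavior the workload expects can be sketched in plain Rust as follows. `lines_to_vec` is a hypothetical helper, not latte's implementation of `fs::read_lines`; skipping blank lines is an assumption of this sketch.

```rust
/// Splits file contents into one owned string per line,
/// dropping line terminators and skipping blank lines.
pub fn lines_to_vec(contents: &str) -> Vec<String> {
    contents
        .lines() // handles both "\n" and "\r\n"
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}
```

With a result shaped like this, `hash_select` would pick a single 16-character id instead of the 152685-byte blob reported in the error.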

Provide a separate load subcommand

In serious benchmarking the data set is big enough that we want to load it only once per series of benchmarks.
Currently there is no way to just load the data without running the benchmark.

Express workloads as lists of queries to execute

Currently Workload is responsible for both defining and running the queries.
Because running is an async operation, and we want more than one workload, this requires #[async_trait]. Using async functions in traits in Rust is not perfect yet - it requires boxing the returned futures (boxing = additional heap allocation).

Additionally, the workload code still has far too many redundant elements, e.g. preparing the statements, inspecting the result and counting the rows/partitions.

The goal of this task is to move the responsibility for running queries to the main loop. Workloads would only be responsible for providing the queries as CQL text plus some minimal binding guidance. This should simplify workload definitions and also make it easy to add user-defined workloads in the future.

have output that is friendly to being run without terminal

We might run this tool inside a Docker container as part of a bigger setup/test, while still wanting to follow the log and parse it in real time, so we can send metrics with stats to our Grafana.

For example, here's the output without a terminal, running under Docker:

LOG ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
    Time  ───── Throughput ─────  ────────────────────────────────── Response times [ms] ───────────────────────────────────
     [s]      [op/s]     [req/s]         Min        25        50        75        90        95        99      99.9       Max
Running...           [                                                            ]   0.0%             0.0/120s            0
Running...           [                                                            ]   0.8%             1.0/120s       290323
   1.000      290314      290314       0.015     0.234     0.308     0.409     0.615     0.952     1.650     2.078     4.037
Running...           [                                                            ]   0.8%             1.0/120s       290323
Running...           [▪                                                           ]   1.7%             2.0/120s       576337
Running...           [▪                                                           ]   1.7%             2.0/120s       576337
   2.000      285969      285969       0.019     0.238     0.303     0.417     0.806     0.964     1.115     1.264     1.598
Running...           [▪                                                           ]   2.5%             3.0/120s       853499
   3.000      277207      277207       0.017     0.234     0.292     0.451     0.906     0.985     1.124     1.377     2.408
Running...           [▪                                                           ]   2.5%             3.0/120s       853499
Running...           [▪▪                                                          ]   3.3%             4.0/120s      1116203
Running...           [▪▪                                                          ]   3.3%             4.0/120s      1116203
   4.000      262662      262662       0.016     0.239     0.307     0.640     0.936     0.982     1.100     1.289     1.804
Running...           [▪▪                                                          ]   4.2%             5.0/120s      1326661
Running...           [▪▪                                                          ]   4.2%             5.0/120s      1326789

Response time percentiles should be given with higher precision

RESPONSE TIMES [ms] ════════════════════════════════════════════════════════════════════════════════════════════════════════
          Min                       0 ± 0
           25                       1 ± 0
           50                       1 ± 0
           75                       2 ± 0
           90                       2 ± 0
           95                       2 ± 0
           98                       3 ± 0
           99                       3 ± 0
           99.9                    17 ± 9
           99.99                   97 ± 19
          Max                     135 ± 26

Somehow the decimal places are missing.
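
One possible cause, which the issue does not confirm, is that durations are truncated to whole milliseconds before formatting. The sketch below shows the difference using only the standard library; `as_millis_f64` and `format_ms` are hypothetical helper names, and three decimal places is an arbitrary choice.

```rust
use std::time::Duration;

/// Converts a duration to fractional milliseconds.
/// `Duration::as_millis` would instead truncate to a whole number.
pub fn as_millis_f64(d: Duration) -> f64 {
    d.as_secs_f64() * 1000.0
}

/// Formats a response time in milliseconds with three decimal places.
pub fn format_ms(d: Duration) -> String {
    format!("{:.3}", as_millis_f64(d))
}
```

For a 1234 µs response, `as_millis()` yields 1 while `format_ms` yields "1.234", which is the kind of precision the per-second log lines already show.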

Pass elapsed time to workload

Perhaps there's another way of doing so, but I would like to be able to benchmark workloads that change over time. It seems like the easiest way to do this would be to have the run function also accept the elapsed time so that whatever logic executed on each step can adapt to the time the workload has been running. This is different from using the cycle number since different operations may take a different amount of time and using the cycle number could cause different threads to switch at very different times.
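
As a rough illustration of the request, the logic a workload might apply once it receives the elapsed time could look like the sketch below. The `Phase` enum, the `phase_at` function and the 60 s / 300 s boundaries are all made up for this example; none of them exist in latte.

```rust
use std::time::Duration;

/// Hypothetical workload phases that switch on wall-clock time.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Phase {
    WarmUp,
    Steady,
    Spike,
}

/// Picks the phase from the elapsed run time, so every thread switches
/// at the same moment regardless of how many cycles it has completed.
pub fn phase_at(elapsed: Duration) -> Phase {
    match elapsed.as_secs() {
        0..=59 => Phase::WarmUp,
        60..=299 => Phase::Steady,
        _ => Phase::Spike,
    }
}
```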

Add PostgreSQL as database target

Hi, have you considered adding other databases to this tool?
I like your work and detail on making sure a thread can run without being blocked.
One of the main databases I am working with is PostgreSQL.

Ability to prime BoundedCycleCounter with a different starting position

When we want to split some load across multiple loader machines (i.e. machines that generate traffic in our tests),
we have lots of cases where we need to split the ranges each machine is working on.
For example, if we want to write a big amount of data (a few terabytes), doing so with multiple machines that run through the exact same case with the same counter would cause excessive rewrites and slow the whole thing down.

It would be nice if we had a way to pass latte something like:

latte run -d 10000000 --start-counter 0 -- workload.ns 127.0.0.1
latte run -d 10000000 --start-counter 10000000 -- workload.ns 127.0.0.1
latte run -d 10000000 --start-counter 20000000 -- workload.ns 127.0.0.1
latte run -d 10000000 --start-counter 30000000 -- workload.ns 127.0.0.1

or:

latte run -d 10000000 --counter-range 0-10000000 -- workload.ns 127.0.0.1
latte run -d 10000000 --counter-range 10000000-20000000 -- workload.ns 127.0.0.1
latte run -d 10000000 --counter-range 20000001-30000000 -- workload.ns 127.0.0.1
latte run -d 10000000 --counter-range 30000001-40000000 -- workload.ns 127.0.0.1
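
The per-machine starting positions for such a split can be computed mechanically. The sketch below is only an illustration of that arithmetic: `split_cycles` is a hypothetical helper, and the use of half-open ranges (start inclusive, end exclusive) is an assumption chosen so that adjacent ranges never overlap or leave gaps.

```rust
use std::ops::Range;

/// Splits the cycle range `0..total` into `loaders` contiguous,
/// non-overlapping half-open ranges of nearly equal length.
pub fn split_cycles(total: u64, loaders: u64) -> Vec<Range<u64>> {
    assert!(loaders > 0, "need at least one loader");
    let base = total / loaders;
    // The first `extra` loaders each take one additional cycle.
    let extra = total % loaders;
    let mut start = 0;
    (0..loaders)
        .map(|i| {
            let len = base + if i < extra { 1 } else { 0 };
            let range = start..start + len;
            start += len;
            range
        })
        .collect()
}
```

For 40000000 cycles and 4 loaders this gives starting positions 0, 10000000, 20000000 and 30000000, matching the start-counter values in the example commands.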

Latte stopped reporting per-line stats at some point

 418.000       17146      171419       0.279     1.179     1.422     1.690     1.959     2.144     2.802     5.194     6.554
 419.000       16963      169686       0.181     1.163     1.411     1.692     1.993     2.214     4.178     5.726    13.074
 420.000       17032      170217       0.238     1.158     1.410     1.694     1.989     2.206     3.994     5.636     6.849
 421.000       17194      171957       0.232     1.169     1.415     1.689     1.965     2.154     2.968     5.136     7.905
 422.000       17276      172785       0.236     1.177     1.415     1.683     1.953     2.128     2.560     4.624     6.205
 423.008       15905      159015       0.255     1.154     1.405     1.687     1.971     2.179     3.492     5.493    77.136
 424.001       16995      169992       0.220     1.171     1.414     1.691     1.982     2.195     3.557    79.364    85.393
 631.218       16129      161285       0.163     1.196     1.449     1.734     2.033     2.251     3.549     5.923   675.807

SUMMARY STATS ══════════════════════════════════════════════════════════════════════════════════════════════════════════════
    Elapsed time       [s]    631.223
        CPU time       [s]   1204.266
 CPU utilisation       [%]       95.4

Throughput ceiling at 200k-300k req/s

There seems to be a throughput ceiling caused by the fact that spawning asynchronous queries is essentially single-threaded.
The following operations take a surprisingly large amount of time:

  • spawning a new tokio task
  • binding a statement and submitting it asynchronously to the driver (the major cost)

With the current design, these operations are serial and don't scale on multicore.

Proposed solution:

  • refactor the main loop to use asynchronous streams (using the Stream abstraction)
  • don't spawn each query as a separate task, make each stream concurrent, but single-threaded (should decrease scheduling costs; async/await are cheaper than spawn)
  • create many independent query streams and spawn them on separate threads; merge them using an mpsc channel

To be decided later: should we share a single Session, or give each stream its own Session?
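
The threading part of the proposal can be sketched with the standard library alone. This is a simplified stand-in, not the proposed design itself: it uses plain OS threads and `std::sync::mpsc` instead of async streams, the "query" is a placeholder computation, and `WorkerStats` and `run_workers` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Statistics accumulated locally by one worker.
pub struct WorkerStats {
    pub ops: u64,
    pub busy: Duration,
}

/// Runs `workers` independent loops on separate threads and merges
/// their results through a single mpsc channel.
pub fn run_workers(workers: usize, ops_per_worker: u64) -> (u64, Duration) {
    let (tx, rx) = mpsc::channel::<WorkerStats>();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let tx = tx.clone();
            thread::spawn(move || {
                let mut stats = WorkerStats { ops: 0, busy: Duration::ZERO };
                for i in 0..ops_per_worker {
                    let start = Instant::now();
                    std::hint::black_box(i + 1); // placeholder for a query
                    stats.busy += start.elapsed();
                    stats.ops += 1;
                }
                // One message per worker keeps the channel off the hot path.
                let _ = tx.send(stats);
            })
        })
        .collect();

    // Drop the original sender so the receiver ends when workers finish.
    drop(tx);

    let mut total_ops = 0;
    let mut total_busy = Duration::ZERO;
    for stats in rx {
        total_ops += stats.ops;
        total_busy += stats.busy;
    }
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    (total_ops, total_busy)
}
```

Each worker keeps its own state and only communicates through the channel, which is the property the proposal relies on to scale across cores.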

Unable to build latte on Ubuntu 22.04

Hello!

I tried to build latte from source using cargo and the latest stable Rust (1.73.0), but I received a build error:

$ cargo build
error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
  --> /home/dr/.asdf/installs/rust/1.73.0/registry/src/index.crates.io-6f17d22bba15001f/rune-0.12.3/src/hash.rs:68:18
   |
68 |         unsafe { mem::transmute(type_id) }
   |                  ^^^^^^^^^^^^^^
   |
   = note: source type: `std::any::TypeId` (128 bits)
   = note: target type: `hash::Hash` (64 bits)

error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
  --> /home/dr/.asdf/installs/rust/1.73.0/registry/src/index.crates.io-6f17d22bba15001f/rune-0.12.3/src/modules/any.rs:15:14
   |
15 |     unsafe { std::mem::transmute(item.type_hash().expect("no type known for item!")) }
   |              ^^^^^^^^^^^^^^^^^^^
   |
   = note: source type: `hash::Hash` (64 bits)
   = note: target type: `modules::any::TypeId` (128 bits)

For more information about this error, try `rustc --explain E0512`.
error: could not compile `rune` (lib) due to 2 previous errors
warning: build failed, waiting for other jobs to finish...

Build env:

╰─$ rustc --version
rustc 1.73.0 (cc66ad468 2023-10-03)
╰─$ uname -a
Linux eniac-4 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
╰─$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04 LTS
Release:        22.04
Codename:       jammy

Could you fix the issue or suggest a workaround?

Support retry control

We are looking at using latte in some of our tests. Those tests are chaos-oriented and can take a node down, or grow/shrink/upgrade the cluster, while a tool like latte (or cassandra-stress) is working.

Because of that, we'll need some way to control request retries:

  • either in the script itself
  • or via command-line options, similar to cassandra-stress -error retries=20
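
A bounded retry policy of the kind requested can be sketched generically as below. This is an illustration only: `with_retries` is a hypothetical helper, it is synchronous for simplicity, and it retries on every error with a fixed delay, whereas a real implementation would probably distinguish retryable errors and run inside the async runtime.

```rust
use std::thread;
use std::time::Duration;

/// Calls `op` until it succeeds or `max_retries` retries are used up.
/// Returns the first success, or the last error once retries run out.
pub fn with_retries<T, E>(
    max_retries: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of retries: surface the most recent error.
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                attempt += 1;
                thread::sleep(delay);
            }
        }
    }
}
```

With `max_retries` set to 20, an operation is attempted at most 21 times, which mirrors the intent of the cassandra-stress option mentioned above.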

Support for user defined functions

The Rust driver can read/write UDTs:
https://rust-driver.docs.scylladb.com/stable/data-types/udt.html

but it seems that using them from a latte workload is not currently possible.

Passing in a Rune object didn't work:

❯ latte load docker/latte/workloads/custom_d1.rn -- 172.17.0.2
info: Loading workload script /home/fruch/projects/scylla-cluster-tests/docker/latte/workloads/custom_d1.rn...
info: Connecting to ["172.17.0.2"]...
info: Connected to  running Cassandra version 3.0.8
info: Preparing...
info: Erasing data...
info: Loading data...
error: Cassandra error: Unsupported type: Object

I'm guessing some constructs need to be exposed to Rune for building the needed UDT objects.

execute functions to return responses, so return values can get validated by the workload

While playing around with latte, I was looking at how to add data validation to a workload,
and I found that the execute functions always return an empty (unit) value:

    /// Executes an ad-hoc CQL statement with no parameters. Does not prepare.
    pub async fn execute(&self, cql: &str) -> Result<(), CassError> {
        let start_time = self.stats.try_lock().unwrap().start_request();
        let rs = self.session.query(cql, ()).await;
        let duration = Instant::now() - start_time;
        self.stats
            .try_lock()
            .unwrap()
            .complete_request(duration, &rs);
        rs.map_err(|e| CassError::query_execution_error(cql, &[], e))?;
        Ok(())
    }

    /// Executes a statement prepared and registered earlier by a call to `prepare`.
    pub async fn execute_prepared(&self, key: &str, params: Value) -> Result<(), CassError> {
        let statement = self
            .statements
            .get(key)
            .ok_or_else(|| CassError(CassErrorKind::PreparedStatementNotFound(key.to_string())))?;
        let params = bind::to_scylla_query_params(&params)?;
        let start_time = self.stats.try_lock().unwrap().start_request();
        let rs = self.session.execute(statement, params.clone()).await;
        let duration = Instant::now() - start_time;
        self.stats
            .try_lock()
            .unwrap()
            .complete_request(duration, &rs);
        rs.map_err(|e| CassError::query_execution_error(statement.get_statement(), &params, e))?;
        Ok(())
    }

So in cases where the query succeeds but returns no rows, latte considers it correct, while I would like to be able to treat it as an
error, and maybe even stop the workload because of that.
