varadaio / presto-workload-analyzer

The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them

License: GNU General Public License v3.0

Topics: presto, prestodb, prestosql, trino, trinodb, visualization, analysis, analyzer, workloads, cluster


Presto Workload Analyzer


The Workload Analyzer collects Presto® and Trino workload statistics and analyzes them. The analysis provides improved visibility into your analytical workloads and enables query optimization, enhancing cluster performance.

The Presto® Workload Analyzer collects and stores QueryInfo JSONs for queries executed while it is running, as well as for any historical queries still held in the Presto® Coordinator's memory.

The collection process has a negligible compute cost and does not impact cluster query execution in any way. Ensure that sufficient disk space is available in your working directory; a compressed JSON file is typically 50 KB to 200 KB.
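As a rough sketch of the disk space to budget for, using the 50–200 KB per-query figure above (the query rate below is illustrative; substitute your cluster's actual rate):

```python
# Back-of-the-envelope estimate of collection disk usage.
# Assumes ~200 KB per compressed QueryInfo JSON (the upper bound quoted
# above); the queries-per-day figure is an illustrative assumption.
def collection_size_gb(queries_per_day, days, kb_per_query=200):
    """Return the estimated disk usage in GB (1 GB = 1024**2 KB)."""
    total_kb = queries_per_day * days * kb_per_query
    return total_kb / (1024 * 1024)

# e.g. 10,000 queries/day collected for 14 days at 200 KB each:
print(f"{collection_size_gb(10_000, 14):.1f} GB")  # → 26.7 GB
```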

Table of contents

  • Features
  • Supported Versions of Presto
  • Installation
  • Usage
  • Screencasts
  • Advanced Features
  • Notes

Features

  • Continuously collects and stores QueryInfo JSONs in the background, without impacting query performance.
  • Summarizes key query metrics to a summary.jsonl file.
  • Generates an analysis report:
    • Query detail: query peak memory, input data read by query, and join distribution.
    • Table activity: wall time utilization and input bytes read, by table scans.
    • Presto® Operators: wall time usage and input bytes read, by operator.
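The summary file mentioned above uses the JSON Lines format (one JSON record per line, gzip-compressed by default). A minimal sketch for iterating over it, assuming only that standard layout (the path below matches the commands in this README; the record fields themselves are not assumed here):

```python
import gzip
import json

def iter_summary(path="JSONs/summary.jsonl.gz"):
    """Yield one dict per summarized query from a gzipped JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # tolerate trailing blank lines
                yield json.loads(line)

# e.g. count the summarized queries:
# n = sum(1 for _ in iter_summary())
```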

Supported Versions of Presto

The Workload Analyzer supports the following versions:

  1. Trino (formerly PrestoSQL): 402 and older.
  2. PrestoDB: 0.245.1 and older.
  3. Starburst Enterprise: 402e and older.
  4. Dataproc: 1.5.x and older.

Although the Workload Analyzer may run with newer versions of Presto®, these scenarios have not been tested.
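To check which version your coordinator is running before collecting, both Trino and PrestoDB expose a /v1/info endpoint whose response includes a nodeVersion.version field. A small sketch (the coordinator URL below is illustrative):

```python
import json
from urllib.request import urlopen

def coordinator_version(base_url):
    """Fetch the server version string (e.g. "402" or "0.245.1") from the
    coordinator's /v1/info endpoint, exposed by both Trino and PrestoDB."""
    with urlopen(f"{base_url}/v1/info") as resp:
        info = json.load(resp)
    return info["nodeVersion"]["version"]

# e.g. coordinator_version("http://presto-coordinator:8080")
```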

Installation

For installation, see here.

Usage

Local machine / remote machine

First, go to the analyzer directory, where the Workload Analyzer Python code can be found.

cd analyzer/

To collect statistics from your cluster, run the following script for a period that will provide a representative sample of your workload.

./collect.py -c http://<presto-coordinator>:8080 --username-request-header "X-Trino-User" -o ./JSONs/ --loop

Notes:

  1. In most cases, this period will be between 5 and 15 days; longer durations yield a more representative analysis.
  2. The above command will continue running until stopped by the user (Ctrl+C).

To analyze the downloaded JSONs directory (e.g. ./JSONs/) and generate a zipped HTML report, execute the following command:

./extract.py -i ./JSONs/ && ./analyze.py -i ./JSONs/summary.jsonl.gz -o ./output.zip
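To spot-check a few collected files before running the full analysis, a small sketch can be used. It assumes the collector writes gzip-compressed JSON files into the output directory (consistent with the compressed-file sizes quoted above); queryId and state are standard fields of a Presto/Trino QueryInfo document:

```python
import glob
import gzip
import json

def peek_query_infos(directory="JSONs", limit=3):
    """Print the queryId and state of a few collected QueryInfo JSONs.

    Assumes gzip-compressed JSON files (*.gz) in the given directory;
    missing fields are printed as None rather than raising.
    """
    for path in sorted(glob.glob(f"{directory}/*.gz"))[:limit]:
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            info = json.load(fh)
        print(path, info.get("queryId"), info.get("state"))
```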

Docker

To collect statistics from your cluster, run the following script for a period that will provide a representative sample of your workload.

$ mkdir JSONs/
$ docker run -v $PWD/JSONs/:/app/JSONs analyzer ./analyzer/collect.py -c http://$PRESTO_COORDINATOR:8080 --username-request-header "X-Trino-User" -o JSONs/ --loop

To analyze the downloaded JSONs directory (e.g. ./JSONs/), and generate a zipped HTML report, execute the following commands:

$ docker run -v $PWD/JSONs/:/app/JSONs analyzer ./analyzer/extract.py -i JSONs/
$ docker run -v $PWD/JSONs/:/app/JSONs analyzer ./analyzer/analyze.py -i JSONs/summary.jsonl.gz -o JSONs/output.zip

Notes:

  1. In most cases, this period will be between 5 and 15 days; longer durations yield a more representative analysis.
  2. The above command will continue running until stopped by the user (Ctrl+C).

Screencasts

See the following screencasts for usage examples:

Collection

asciicast

Analysis

asciicast

Advanced Features

In exceptional circumstances, it may be desirable to do one or more of the following:

  1. Obfuscate the schema names.
  2. Remove the SQL queries from the summary file.
  3. Analyze queries for a specific schema only (joins with other schemas are included).

To meet these requirements, run the ./jsonl_process.py script after the ./extract.py script, but before the ./analyze.py script.

In the example below, only queries from the transactions schema are kept, and the SQL queries are removed from the new summary file:

./jsonl_process.py -i ./JSONs/summary.jsonl.gz -o ./processed_summary.jsonl.gz --filter-schema transactions --remove-query 

In the following example, all the schema names are obfuscated:

./jsonl_process.py -i ./JSONs/summary.jsonl.gz -o ./processed_summary.jsonl.gz --rename-schemas 

In the following example, all the partition and user names are obfuscated:

./jsonl_process.py -i ./JSONs/summary.jsonl.gz -o ./processed_summary.jsonl.gz --rename-partitions --rename-user 

After the ./jsonl_process.py script has been executed, to generate a report based on the new summary file, run:

./analyze.py -i ./processed_summary.jsonl.gz -o ./output.zip
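A quick sanity check after running ./jsonl_process.py is to compare record counts: filtering on a schema should never make the processed summary larger. This sketch assumes only the JSON Lines layout (one record per non-empty line) of the summary files:

```python
import gzip

def count_records(path):
    """Count non-empty lines (one JSON record each) in a .jsonl.gz file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip())

# e.g. after --filter-schema, the processed file should be no larger:
# assert count_records("processed_summary.jsonl.gz") <= count_records("JSONs/summary.jsonl.gz")
```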
To create a high-contrast report, use the --high-contrast-mode parameter, for example:

./analyze.py --high-contrast-mode -i ./JSONs/summary.jsonl.gz -o ./output.zip

Notes

Presto® is a trademark of The Linux Foundation.

presto-workload-analyzer's People

Contributors: duhanmin, guy-mast, ns-mkusper, rzeyde-varada, sbernauer, vromanv


presto-workload-analyzer's Issues

Faced many errors while extracting

I am using Trino 406 and Python 3.8, and hit the errors below while extracting:
[2023-07-08 12:47:15.354243] INFO: analyze: 28 queries loaded
[2023-07-08 12:47:15.685055] ERROR: analyze: failed to generate scheduled_by_date
Traceback (most recent call last):
  File "./analyze.py", line 1254, in main
    item["doc"]["roots"]["references"].sort(key=lambda r: (r["type"], r["id"]))
TypeError: list indices must be integers or slices, not str

The same traceback repeats for scheduled_by_hour, input_by_date, input_by_hour, queries_by_date, queries_by_hour, and peak_mem_by_query.
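The TypeError indicates that doc["roots"] is a plain list in this environment rather than a dict with a "references" key, most likely because a newer Bokeh changed its serialized-document layout. A defensive sketch (not the project's actual fix) that sorts the references in either shape:

```python
def sort_references(doc):
    """Sort reference dicts deterministically by (type, id), whether
    doc["roots"] is the older {"references": [...]} dict shape or a
    newer plain list of reference dicts."""
    roots = doc["roots"]
    refs = roots["references"] if isinstance(roots, dict) else roots
    refs.sort(key=lambda r: (r["type"], r["id"]))
    return refs
```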

failed to generate joins_selectivity because of DynamicFilterSourceOperator

The analyzer cannot handle DynamicFilterSourceOperator when generating the join-related graphs:

ERROR: analyze: missing keys ('LookupJoinOperator', 'HashBuilderOperator') in {'DynamicFilterSourceOperator': {'node_id': '398', 'type': 'DynamicFilterSourceOperator', 'input_size': 17950.0, 'output_size': 17950.0, 'network_size': 0.0, 'input_rows': 359, 'output_rows': 359, 'network_rows': 0, 'peak_mem': 0.0, 'input_cpu': 0.0007441399999999999, 'output_cpu': 0.00037216, 'finish_cpu': 0.0, 'input_wall': 0.0007452899999999999, 'output_wall': 0.00036805, 'finish_wall': 0.0, 'blocked_wall': 0.0}, 'HashBuilderOperator': {'node_id': '398', 'type': 'HashBuilderOperator', 'input_size': 17950.0, 'output_size': 17950.0, 'network_size': 0.0, 'input_rows': 359, 'output_rows': 359, 'network_rows': 0, 'peak_mem': 196860.0, 'input_cpu': 0.0008561799999999999, 'output_cpu': 0.0, 'finish_cpu': 0.0, 'input_wall': 0.0008708499999999999, 'output_wall': 0.0, 'finish_wall': 0.0, 'blocked_wall': 0.0}}
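One defensive pattern for this class of failure (a sketch, not the project's actual fix) is to skip the join-selectivity computation for any query whose operator map lacks the expected operator types, instead of raising and aborting the whole graph:

```python
def join_operators(operators_by_type):
    """Return the (lookup, build) operator stats needed for the
    join-selectivity graphs, or None when a query's join exposes other
    operators (e.g. only a DynamicFilterSourceOperator, as reported above)."""
    needed = ("LookupJoinOperator", "HashBuilderOperator")
    if not all(key in operators_by_type for key in needed):
        return None  # skip this query instead of failing the whole analysis
    return tuple(operators_by_type[key] for key in needed)
```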
