test-automation

Tools for making our tests easier to run. They automate setting up a cluster on Azure (or AWS CloudFormation) and install a script that sets up Citus and everything else required for testing Citus.

Azure
- Getting Started
  - Setup steps for each test
  - Steps to delete a cluster
- Under The Hood
AWS (Deprecated)
- Getting Started
  - Setup steps for each test
  - Steps to delete a cluster
- Detailed Configuration
  - Starting a Cluster
  - Connecting to the Master
Running Tests
Example fab Commands
Tasks, and Ordering of Tasks
Task Namespaces
Advanced fab Usage
- Using Multiple Citus Installations
TroubleShooting

Azure

Getting Started

Each step below is covered in more detail in later sections. This list of commands shows how to get started quickly; see the later sections for details and for help with any problems you run into.

Prerequisites

You need the Azure CLI (az) installed locally to continue; see the Azure CLI install instructions.
Run az login to log the CLI in to your account.

Make sure that your default subscription is the right one (Azure SQL DB Project Orcas - CitusData):

```
# List your subscriptions
az account list
# Pick the correct one from the list and run
az account set --subscription {uuid-of-correct-subscription}
```

If your subscription list doesn't contain Azure SQL DB Project Orcas - CitusData, ask someone who is authorized to add it for you.
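If you want a script to fail early when the wrong subscription is selected, here is a minimal sketch. check_subscription and EXPECTED_SUBSCRIPTION are hypothetical names, not part of this repo; the only CLI call is az account show, which reports the currently selected subscription.

```shell
# Hypothetical helper: compare the active az subscription with the expected one.
EXPECTED_SUBSCRIPTION="Azure SQL DB Project Orcas - CitusData"

check_subscription() {
    # --query name -o tsv reduces the output to the subscription's display name
    current=$(az account show --query name -o tsv) || return 2
    if [ "$current" = "$EXPECTED_SUBSCRIPTION" ]; then
        echo "OK: using subscription '$current'"
    else
        echo "Wrong subscription: '$current'" >&2
        echo "Run: az account set --subscription <uuid>" >&2
        return 1
    fi
}
```

Run check_subscription after az login; a non-zero exit status means you should switch subscriptions before creating a cluster.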

Use ssh-agent to load your SSH keys; they are used for downloading the enterprise repository. The agent keeps your keys only in memory, so this step is safe.
```
# start ssh agent
eval `ssh-agent -s`

# Add your Github ssh key for enterprise (private) repo
ssh-add
```
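To confirm that the agent is running and actually holds a key before going further, a small check like the one below can help. check_agent is a hypothetical helper; it relies on ssh-add -l, which exits with 0 when it lists at least one key, 1 when the agent has no keys, and 2 when it cannot contact an agent.

```shell
# Hypothetical helper: report whether ssh-agent is reachable and holds keys.
check_agent() {
    status=0
    ssh-add -l >/dev/null 2>&1 || status=$?
    case $status in
        0) echo "agent ok" ;;
        1) echo "agent running but has no keys; run ssh-add" >&2 ;;
        *) echo 'no agent found; run: eval "$(ssh-agent -s)"' >&2 ;;
    esac
    return $status
}
```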

General info

The azuredeploy.parameters.json file contains the parameters you can change. For example, to change the number of workers, change the numberOfWorkers parameter. The VM types of the coordinator and the workers can be set separately in the parameters file. By default, workers use memory-intensive VMs (E type) and the coordinator uses CPU-intensive VMs (D type).

After you run tests, you can find the results in the results folder, named after the config used for the test.

The data is stored on the attached disk, whose size can be configured in the parameters file.

If you don't specify a region, one of eastus, westus2 and southcentralus is chosen at random, so that resource usage is spread evenly across regions.
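The region fallback can be sketched as follows. pick_region is a hypothetical helper and not necessarily how the scripts implement it: it honors AZURE_REGION when set and otherwise picks one of the three regions at random.

```shell
# Hypothetical helper: use AZURE_REGION if set, else a random default region.
pick_region() {
    if [ -n "${AZURE_REGION:-}" ]; then
        echo "$AZURE_REGION"
        return
    fi
    # awk's rand() returns a value in [0, 1), so the index is 1, 2 or 3
    awk 'BEGIN { srand(); split("eastus westus2 southcentralus", r, " "); print r[int(rand() * 3) + 1] }'
}
```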

Port 3456 is used for SSH: you can connect to any node on port 3456. If you use any other port, your connection will be blocked by the security rules.

Setup Steps For Each Test

On your local machine, follow these steps to create a cluster and connect to it:

```
# in the session that you will use to ssh, set the resource group name
export RESOURCE_GROUP_NAME=give_your_name_citus_test_automation_r_g

# if you want to configure the region
# export AZURE_REGION=eastus2

# Go to the azure directory to have access to the scripts
cd azure

# review the instance types/disks, then edit them as you wish
less azuredeploy.parameters.json

# Quickly start a cluster with defaults. This will create a resource group and use it for the cluster.
./create-cluster.sh

# connect to the coordinator
./connect.sh
```

Steps to delete a cluster

After you are done with testing, run the following to delete the cluster and its resource group:

```
# Delete the resource group and everything in it
# It's a good practice to check the deletion status from the Azure portal
./delete-resource-group.sh
```
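Instead of checking the portal, you can poll from the command line. This is a minimal sketch: wait_for_group_deletion is a hypothetical helper, and the default attempt count and interval are arbitrary. It relies on az group exists, which prints "true" or "false".

```shell
# Hypothetical helper: poll until $RESOURCE_GROUP_NAME no longer exists.
# usage: wait_for_group_deletion [attempts] [seconds_between_attempts]
wait_for_group_deletion() {
    attempts=${1:-60}
    interval=${2:-30}
    while [ "$attempts" -gt 0 ]; do
        if [ "$(az group exists --name "$RESOURCE_GROUP_NAME")" = "false" ]; then
            echo "resource group $RESOURCE_GROUP_NAME deleted"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$interval"
    done
    echo "resource group $RESOURCE_GROUP_NAME still exists" >&2
    return 1
}
```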

Under The Hood

Azure has ARM templates that make it easy to deploy servers. There are two main files: azuredeploy.json contains the main template, and azuredeploy.parameters.json contains the parameters used in it. For example, the number of workers is set in the parameters file. You shouldn't change anything in the template file for configuration.

The main template has 4 main parts:

- Parameters
- Variables
- Resources
- Outputs

Parameters can be configured from the parameters file. Variables are constants. Resources holds all the resource definitions, such as VMs and network security groups. Outputs are useful for displaying things like a connection string.

When creating resources, we can declare dependencies so that a resource is not created before the resources it depends on exist. We can also create multiple instances of a resource with a copy loop.

The first virtual machine (index 0) is treated as the coordinator. When all the virtual machines are ready, a custom script is run on each VM to initialize it. The initialization script is downloaded from GitHub via a URL.

The initialization script also gives the coordinator the private IP addresses of the workers. It does this with a storage account resource that is created within the template itself: every VM uploads its private IP address to the storage account, and once all are uploaded, the coordinator downloads them and writes them to the worker-instances file, which is then used when creating the Citus cluster.
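The exchange can be sketched locally, with a plain directory standing in for the storage account (an assumption for illustration only; the real scripts upload to and download from Azure Storage). publish_ip and collect_worker_ips are hypothetical helpers, and the one-IP-per-line layout of worker-instances is assumed.

```shell
# Each VM "uploads" its private IP under a name derived from its index.
# Writing to a temp file and renaming keeps readers from seeing partial files.
publish_ip() {  # usage: publish_ip <store_dir> <vm_index> <private_ip>
    echo "$3" > "$1/.vm-$2.tmp" && mv "$1/.vm-$2.tmp" "$1/vm-$2"
}

# The coordinator (VM 0) waits for all VMs, then writes worker IPs.
collect_worker_ips() {  # usage: collect_worker_ips <store_dir> <vm_count> <out_file>
    # wait until every VM, coordinator included, has published its IP
    i=0
    while [ "$i" -lt "$2" ]; do
        while [ ! -f "$1/vm-$i" ]; do
            sleep 1
        done
        i=$((i + 1))
    done
    # write worker IPs (indexes 1..N-1) in index order, one per line
    : > "$3"
    i=1
    while [ "$i" -lt "$2" ]; do
        cat "$1/vm-$i" >> "$3"
        i=$((i + 1))
    done
}
```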

We have a special security group that blocks SSH traffic. Its rule has priority 103, and priorities 100, 101 and 102 are also taken by this security group.

You can use connect.sh, which connects you to the coordinator on a custom SSH port (3456 at the time of writing).

Before starting the process you should set the environment variable RESOURCE_GROUP_NAME, which is used in all scripts.

```
export RESOURCE_GROUP_NAME=give_your_name_citus_test_automation_r_g
```

If you want to configure the region, you can also set that:

```
export AZURE_REGION=eastus2
```

Use a single session, because an exported variable is only available in the current session and its child processes. Start ssh-agent and add your key with ssh-add in that same session.

By default, your public key from ~/.ssh/id_rsa.pub is used. It is added to the virtual machines so that you can SSH to them.
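A quick pre-flight check before running create-cluster.sh can catch a missing variable or key. This is a sketch: preflight is a hypothetical helper, and SSH_PUBLIC_KEY is a hypothetical override used only here (the scripts themselves use ~/.ssh/id_rsa.pub by default).

```shell
# Hypothetical helper: verify RESOURCE_GROUP_NAME is set and a public key exists.
preflight() {
    if [ -z "${RESOURCE_GROUP_NAME:-}" ]; then
        echo "RESOURCE_GROUP_NAME is not set; export it in this shell first" >&2
        return 1
    fi
    # SSH_PUBLIC_KEY is a hypothetical override for this sketch only
    key=${SSH_PUBLIC_KEY:-$HOME/.ssh/id_rsa.pub}
    if [ ! -f "$key" ]; then
        echo "public key $key not found; generate one with ssh-keygen" >&2
        return 1
    fi
    echo "resource group: $RESOURCE_GROUP_NAME, key: $key"
}
```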

To simplify this process, there is a script called create-cluster.sh, which:

- creates a resource group named after the RESOURCE_GROUP_NAME environment variable, in AZURE_REGION (or a random region if AZURE_REGION is not set)
- creates a cluster with the azuredeploy.json template in that resource group
- prints the connection string to SSH
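The first two steps correspond roughly to the sketch below. The exact flags create-cluster.sh uses may differ; az group create and az deployment group create are standard Azure CLI commands. For simplicity the sketch falls back to eastus when AZURE_REGION is unset, whereas the real script picks a random region.

```shell
# Rough sketch (hypothetical function) of the steps create-cluster.sh performs.
create_cluster_sketch() {
    region=${AZURE_REGION:-eastus}
    # 1. create the resource group
    az group create --name "$RESOURCE_GROUP_NAME" --location "$region" || return 1
    # 2. deploy the ARM template with its parameters file into that group
    az deployment group create \
        --resource-group "$RESOURCE_GROUP_NAME" \
        --template-file azuredeploy.json \
        --parameters @azuredeploy.parameters.json || return 1
}
```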

Then run:

```
./connect.sh
```

After you are done with testing, you can delete the resource group with:

```
./delete-resource-group.sh
```

The default duration of each test is currently 300 seconds, and since there are many tests, running all of them takes a while. When testing a change, it is better to shorten the test time to something like 5 seconds. The time is set with the -T parameter:

```
pgbench_command: pgbench -c 32 -j 16 -T 600 -P 10 -r
```

->

```
pgbench_command: pgbench -c 32 -j 16 -T 5 -P 10 -r
```
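If you do this often, a small helper can rewrite the -T value for you. set_test_time is hypothetical, not part of the repo; note that it rewrites every "-T <number>" occurrence in the file.

```shell
# Hypothetical helper: set every "-T <number>" in a config file to <seconds>.
set_test_time() {  # usage: set_test_time <config_file> <seconds>
    tmp=$(mktemp) || return 1
    # -E enables extended regexes, so "+" means "one or more digits"
    sed -E "s/-T [0-9]+/-T $2/g" "$1" > "$tmp" || { rm -f "$tmp"; return 1; }
    # copy back with cat so the config file keeps its original permissions
    cat "$tmp" > "$1"
    rm -f "$tmp"
}
```

For example: set_test_time fabfile/pgbench_confs/pgbench_default.ini 5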

If you want to add different vm sizes, you should change the allowed values for coordinatorVMSize and workerVMSize in azuredeploy.json.

We run a custom script to initialize the VMs. The script is downloaded to /var/lib/waagent/custom-script/download/0, and its logs (stdout and stderr) are in the same directory.

AWS (Deprecated)

Getting Started

Setup Steps For Each Test

On your local machine, follow these steps to create a cluster and connect to it:

```
# start ssh agent
eval `ssh-agent -s`

# Add your EC2 keypair's private key to your agent
ssh-add path_to_keypair/metin-keypair.pem

# Add your Github ssh key for enterprise (private) repo
ssh-add

# Quickly start a cluster of (1 + 3) c3.4xlarge nodes
cloudformation/create-stack.sh -k metin-keypair -s PgBenchFormation -n 3 -i c3.4xlarge

# When your cluster is ready, it will prompt you with the connection string; connect to the master node
ssh -A ec2-user@[hostname]
```

Steps to delete a cluster

On your local machine:

```
# Delete the formation (use the name you passed to -s)
# It's a good practice to check deletion status from the CloudFormation console
aws cloudformation delete-stack --stack-name "PgBenchFormation"
```

Detailed Configuration

Starting a Cluster

You'll need the AWS CLI installed. Once it's installed, configure it with aws configure. Then you can run something like:

```
cloudformation/create-stack.sh -k [your keypair name] -s MyStack
```

This will take some time (around 7 minutes) and emit some instructions for connecting to the master node.

The name you pass to -s must be unique. There are more parameters you can pass, such as -n, which changes the number of worker nodes that are launched. Run create-stack with no parameters to see them all.

If you forget the name of your cluster you can get the list of active clusters by running:

```
aws cloudformation list-stacks --stack-status-filter CREATE_COMPLETE CREATE_IN_PROGRESS CREATE_FAILED --query "StackSummaries[*].{StackName:StackName,StackStatus:StackStatus}"
```

This will only list clusters which are in your default region. You can specify a region with the --region flag.

To get the hostname of the master you can run:

```
aws cloudformation describe-stacks --stack-name MyStack --query 'Stacks[0].Outputs[0].OutputValue'
```

Connecting to the Master

Make sure you have a running ssh-agent

If you are running linux you either have a running ssh-agent or know that you don't ;). If you're using OSX you probably have a working ssh-agent but it's possible Apple has pushed an update and broken things.

If it is running then the SSH_AUTH_SOCK environment variable will be set:
```
brian@rhythm:~$ echo $SSH_AUTH_SOCK
/tmp/ssh-s1OzC5ULhRKg/agent.1285
```
If your SSH_AUTH_SOCK is empty then Google is your friend for getting an ssh-agent to start automatically when you login. As a temporary fix you can run exec ssh-agent bash:
```
brian@rhythm:~$ echo $SSH_AUTH_SOCK

brian@rhythm:~$ exec ssh-agent bash
brian@rhythm:~$ echo $SSH_AUTH_SOCK
/tmp/ssh-et5hwGiqPxUn/agent.31580
```
Add your keypair's private key to your ssh agent

When you created your EC2 keypair it gave you a .pem file for safekeeping. Add it to your agent with:
```
brian@rhythm:~$ ssh-add Downloads/brian-eu.pem
Identity added: Downloads/brian-eu.pem (Downloads/brian-eu.pem)
```
find ~ -name '*.pem' can help you find your key. Running ssh-add ~/.ssh/id_rsa will not work; you must use the keypair you received from EC2.
If you plan on checking out private repos, add that private key to your agent as well

When you check out a private repo on the master/workers, they reach back to your ssh-agent to authenticate as you when talking to GitHub. Your list of keys is in your GitHub SSH key settings (https://github.com/settings/keys); one of them should be added to ssh-agent with another ssh-add command. Personally, I run:
```
brian@rhythm:~$ ssh-add
Identity added: /home/brian/.ssh/id_rsa (/home/brian/.ssh/id_rsa)
```
Note: If you use 2-factor authentication that will not change these instructions in any way. For the purposes of ssh access your key counts as a factor. If you have a passphrase on your key you will be prompted for it when you run ssh-add, that's the second factor. If you do not have a passphrase on your key, well, Github has no way of knowing that.
ssh into the master in a way which allows it to use your local ssh-agent

The create-stack.sh script should have given you a connection string to use. You can also find the hostname of the master node in the cloudformation control panel in the "outputs" section. There's a command above which tells you how to get the hostname without going to the control panel.

You'll connect using a string like:
```
ssh -A ec2-user@[hostname]
```
The -A is not optional. It is required so the master node can ssh into the worker nodes and so the nodes can checkout private repos.

That means you should not pass the -i flag to ssh:
```
# careful, this is wrong!
ssh -i Downloads/brian2.pem ec2-user@[hostname]
```
If you need to pass the -i flag in order to connect this means the key is not in your agent. That means the master node will not be able to ssh into the worker nodes when you later run fab.

It's unfortunate that you have no flexibility here. Hopefully this restriction will be lifted in the future.

Running Tests

Running Automated Tests

Depending on the tests you trigger here, you can block up to 3 job slots in CircleCI for around 3 hours. Choose carefully when to run the tests so that you don't block development.

If you want, you can trigger a job that runs pgbench, scale and tpch tests. The job:

- creates a cluster with the test resource group name
- connects to the coordinator
- runs the corresponding test for the job
- deletes the cluster.

There is a separate job for each test and you can run any combinations of them. To trigger a job, you should create a branch which has specific prefixes.

- If the branch has the prefix pgbench/, the pgbench job is triggered.
- If the branch has the prefix scale/, the scale job is triggered.
- If the branch has the prefix tpch/, the tpch job is triggered.
- If the branch has the prefix all_performance_test/, all jobs are triggered.
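The prefix rule can be summarized as a tiny function. jobs_for_branch is hypothetical; the real mapping lives in the CircleCI configuration, and this only mirrors the rules listed above.

```shell
# Hypothetical helper mirroring the branch-prefix rules above.
jobs_for_branch() {
    case "$1" in
        pgbench/*) echo "pgbench" ;;
        scale/*) echo "scale" ;;
        tpch/*) echo "tpch" ;;
        all_performance_test/*) echo "pgbench scale tpch" ;;
        *) echo "" ;;  # no performance job is triggered
    esac
}
```

So, for example, git checkout -b pgbench/my-change followed by git push -u origin pgbench/my-change would trigger only the pgbench job.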

Push your branch to GitHub so that the CircleCI job is triggered.

Each job uses a specific resource group name, so there will be at most 3 resource groups for these jobs. If the resource group already exists, make sure that no one else is currently running the same test as you.

If no one is, you can delete the resource group from the Azure portal; you can find it by searching for the prefix citusbot. Under normal circumstances the resource group is deleted at the end of the test, even if the test fails.
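As an alternative to searching the portal, this sketch lists the resource groups whose names start with citusbot. list_citusbot_groups is a hypothetical helper; the JMESPath filter in --query is evaluated by the Azure CLI itself.

```shell
# Hypothetical helper: list resource groups whose names start with "citusbot".
list_citusbot_groups() {
    az group list --query "[?starts_with(name, 'citusbot')].name" -o tsv
}
```

Once you are sure no one is using a group, az group delete --name <group name> --yes deletes it.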

You can find your test results in https://github.com/citusdata/release-test-results under the periodic_job_results folder. Test results are pushed to a branch named in the format ${rg_name}/${month_day_year_uniqueID}.

By default, the tests run against enterprise-master and the latest released version. To test a custom branch, set your branch name in the config files of the relevant tests, for example:

```
postgres_citus_versions: [('12.1', 'your-custom-branch-name-in-enterprise'), ('12.1', 'release-9.1')]
```

You can change all the settings in these files. The config files for the tests are located in fabfile/pgbench_confs/ (pgbench and scale tests) and fabfile/tpch_confs/ (tpch tests).

By default, the following config files are used for each job:

- pgbench: pgbench_default.ini and pgbench_default_without_transaction.ini
- scale: scale_test.ini
- tpch: tpch_default.ini

If you don't want to use the default cluster settings (instance types etc.), you can change them in https://github.com/citusdata/test-automation/blob/master/azure/azuredeploy.parameters.json.

To change how long each test runs, change the -T parameter, for example in https://github.com/citusdata/test-automation/blob/master/fabfile/pgbench_confs/pgbench_default.ini#L33:

```
pgbench_command: pgbench -c 32 -j 16 -T <test time in seconds> -P 10 -r
```

Running Automated Hammerdb Benchmark

You should create a new branch and change the settings in the new branch and push the branch so that when the tool clones the repository it can download your branch.

HammerDB tests are run from a driver node, which is in the same virtual network as the cluster. You can customize the HammerDB cluster with hammerdb/azuredeploy.parameters.json. Note that this is the configuration for the cluster, which is separate from the benchmark configurations (fabfile/hammerdb_confs/).

In the config files in fabfile/hammerdb_confs (you should probably add at least one more config to this folder), you can:

- change the postgres version
- use enterprise or community
- use a custom branch (you can also use git refs instead of branch names)
- change/add postgres/citus settings

You can add as many configs as you want to the fabfile/hammerdb_confs folder, and the automation tool runs the benchmark for each config. It cleans all the tables in each iteration to get more accurate results. So to compare two branches, create two identical config files that differ only in the branch. The result logs contain the config file, so it is easy to tell which config was used for a run.

After adding the configs, fabfile/hammerdb_confs could look like:

```
./master.ini
./some_branch.ini
./some_other_branch.ini
```

To run the HammerDB benchmark:

```
eval `ssh-agent -s`
ssh-add
export RESOURCE_GROUP_NAME=<your resource group name>
export GIT_USERNAME=<Your github username>
export GIT_TOKEN=<Your github token with repo, write:packages and read:packages permissions> # You can create a github token from https://github.com/settings/tokens
cd hammerdb
# YOU SHOULD CREATE A NEW BRANCH AND CHANGE THE SETTINGS/CONFIGURATIONS IN THE NEW BRANCH
# AND PUSH THE BRANCH SO THAT WHEN THE TOOL CLONES THE REPOSITORY
# IT CAN DOWNLOAD YOUR BRANCH.
vim fabfile/hammerdb_confs/<branch_name>.ini # verify that your custom config file is correct
./create-run.sh
# you will be given a command to connect to the driver node and what
# to run afterwards.
```

After running ./create-run.sh, you don't need to stay connected to the driver node; it takes care of the rest for you.

The cluster is deleted if everything goes well, but you should check that it has been deleted to be on the safe side. (If it hasn't, delete it with azure/delete-resource-group.sh or from the portal.)

Sometimes you might get random/temporary errors while provisioning the cluster. In that case, delete the previous resource group and try again. If the error persists, try again after a while, and if it is still persistent, open an issue on test-automation.

To follow the progress of the tests, connect to the driver node and attach to its screen session:

```
./connect-driver.sh
screen -r
```

You can see the screen logs in ~/screenlog.0.

You will find the results in a branch named hammerdb_date_id in https://github.com/citusdata/release-test-results. You won't get any notifications for the results, so you need to check for them manually. The following files are pushed to GitHub:

- build.tcl (the configuration file used for building the HammerDB tables)
- run.tcl (the configuration file used for running the HammerDB TPC-C benchmark)
- build_<config_file_name>.log (the output of building the HammerDB tables for 'config_file_name')
- run_<config_file_name>.log (the output of running the HammerDB TPC-C benchmark for 'config_file_name')
- ch_benchmarks.log (the log file generated by the ch-benCHmark script)
- ch_results.txt (the results of the CH benchmark; each config file's result is on its own line)
- <config_file_name>.NOPM.log (the NOPM for the given config file)

hammerdb/build.tcl creates and fills the HammerDB TPC-C tables. You should have a vuuser:warehouse_count ratio of at least 1:5, otherwise build.tcl might get stuck.

hammerdb/run.tcl runs tpcc benchmark. You can configure things such as test duration here.

Note that running the whole process with a single config file, 250 vuusers and 1000 warehouses can take around 2-3 hours.

If you want to run only the TPC-C benchmark or only the analytical queries, change the is_tpcc and is_ch variables in create-run.sh. For example, to run only the TPC-C benchmark, set is_tpcc to true and is_ch to false (alternatively, you can set the IS_TPCC and IS_CH environment variables). When running only the analytical queries, you can also set how long they run with the DEFAULT_CH_RUNTIME_IN_SECS variable in build-and-run.sh. By default they run for 3600 seconds.

You can change the thread count and initial sleep time for analytical queries from build-and-run.sh with CH_THREAD_COUNT and RAMPUP_TIME variables respectively.

If you want to run HammerDB 4.0, change hammerdb_version to 4.0 in create-run.sh.

By default a random region will be used, if you want you can specify the region with AZURE_REGION environment variable prior to running create-run.sh such as export AZURE_REGION=westus2.

Basic Cluster Setup

On the coordinator node:

```
# Setup your test cluster with PostgreSQL 12.1 and Citus master branch
fab use.postgres:12.1 use.citus:master setup.basic_testing

# Let's change some conf values
fab pg.set_config:max_wal_size,"'5GB'"
fab pg.set_config:max_connections,1000

# And restart the cluster
fab pg.restart
```

If you want to add the coordinator to the cluster, you can run:

```
fab add.coordinator_to_metadata
```

If you want the coordinator to have shards, you can run:

```
fab add.shards_on_coordinator
```

Running PgBench Tests

On the coordinator node:

```
# This will run default pgBench tests with PG=12.1 and Citus Enterprise 9.0 and 8.3 release branches
# and it will log results to pgbench_results_{timemark}.csv file
# Yes, that's all :) You can change settings in fabfile/pgbench_confs/pgbench_default.ini
fab run.pgbench_tests

# It's possible to provide another configuration file for tests
# For example, this runs the same set of default pgBench tests without transactions
fab run.pgbench_tests:pgbench_default_without_transaction.ini
```

Running Scale Tests

On the coordinator node:

```
# This will run scale tests with PG=12.1 and Citus Enterprise 9.0 and 8.3 release branches
# and it will log results to pgbench_results_{timemark}.csv file
# You can change settings in files under the fabfile/pgbench_confs/ directory
fab run.pgbench_tests:scale_test.ini
fab run.pgbench_tests:scale_test_no_index.ini
fab run.pgbench_tests:scale_test_prepared.ini
fab run.pgbench_tests:scale_test_reference.ini
fab run.pgbench_tests:scale_test_foreign.ini
fab run.pgbench_tests:scale_test_100_columns.ini
```

Running PgBench Tests Against Hyperscale (Citus)

On the coordinator node:

```
# Use pgbench_cloud.ini config file with connection string of your Hyperscale (Citus) cluster
# Don't forget to escape `=` at the end of your connection string
fab run.pgbench_tests:pgbench_cloud.ini,connectionURI='postgres://citus:HJ3iS98CGTOBkwMgXM-RZQ@c.fs4qawhjftbgo7c4f7x3x7ifdpe.db.citusdata.com:5432/citus?sslmode\=require'
```

Running TPC-H Tests

On the coordinator node:

```
# This will run TPC-H tests with PG=12.1 and Citus Enterprise 9.0 and 8.3 release branches
# and it will log results to their own files on the home directory. You can use diff to
# compare results.
# You can change settings in files under the fabfile/tpch_confs/ directory
fab run.tpch_automate

# If you want to run only Q1 with scale factor=1 against community master,
# you can use this config file. Feel free to edit conf file
fab run.tpch_automate:tpch_q1.ini
```

Running TPC-H Tests Against Hyperscale (Citus)

On the coordinator node:

```
# Provide your tpch config file or go with the default file
# Don't forget to escape `=` at the end of your connection string
fab run.tpch_automate:tpch_q1.ini,connectionURI='postgres://citus:dwVg70yBfkZ6hO1WXFyq1Q@c.fhhwxh5watzbizj3folblgbnpbu.db.citusdata.com:5432/citus?sslmode\=require'
```

Running Valgrind Tests

TL;DR

```
# 1 # start valgrind test

# create valgrind instance to run
export RESOURCE_GROUP_NAME='your-valgrind-test-rg-name-here'
export VALGRIND_TEST=1
cd azure
./create-cluster.sh

# connect to coordinator
eval `ssh-agent -s`
ssh-add
./connect.sh

# run fab command in coordinator in a detachable session
tmux new -d "fab use.postgres:12.3 use.enterprise:enterprise-master run.valgrind"

# simply exit from coordinator after detaching

# 2 # finalize valgrind test

# reconnect to coordinator after 9.5 hours (if you kept the default coordinator configuration)
export RESOURCE_GROUP_NAME='your-valgrind-test-rg-name-here'

eval `ssh-agent -s`
ssh-add
cd azure
./connect.sh

# you can first check if valgrind test is finished by attaching to the tmux session
tmux a
# then detach from the session (press Ctrl+b, then d) before moving forward

# run push results script
cd test-automation/azure
./push-results.sh <branch name you prefer to push results>

# simply exit from coordinator after pushing the results

# delete resource group finally
cd azure
./delete-resource-group.sh
```

DETAILS:

To create a valgrind instance, follow the steps in Setup Steps For Each Test, but run the following before executing create-cluster.sh:

```
export VALGRIND_TEST=1
```

This makes the numberOfWorkers setting irrelevant, because the tests use our regression test structure, which creates a local cluster itself. Also, since valgrind is installed only on the coordinator, having worker nodes would make the PostgreSQL build fail: the build requires valgrind on the workers too, even though we do not need them.

On the coordinator node:

```
# an example usage: Use PostgreSQL 12.1 and run valgrind test on enterprise/enterprise-master
fab use.postgres:12.1 use.enterprise:enterprise-master run.valgrind
```

However, as valgrind tests take a long time to complete, we recommend running them in a detached session:

```
tmux new -d "fab use.postgres:12.1 use.enterprise:enterprise-master run.valgrind"
```

After the tests are finished (up to 9 hours with the default coordinator size), reconnect to the coordinator. The results can be found under the $HOME/results directory.

To push the results to the release-test-results repository, run the command below on the coordinator node:

```
sh $HOME/test-automation/azure/push-results.sh <branch_name_to_push>
```

Finally, delete your resource group. Note that the automated (weekly) valgrind test destroys the resources it uses on its own.

Example fab Commands

Use fab --list to see all the tasks you can run! These are just a few examples.

Once you have a cluster, you can use many different variations of the "fab" command to install Citus:

- fab --list returns a list of the tasks you can run.
- fab setup.basic_testing creates a vanilla cluster with postgres and citus. Once this has run, you can simply run psql to connect to it.
- fab use.citus:v7.1.1 setup.basic_testing does the same, but uses the tag v7.1.1 when installing Citus. You can give it any git ref; it defaults to master.
- fab use.postgres:10.1 setup.basic_testing lets you choose your postgres version.
- fab use.enterprise:v7.1.1 setup.enterprise installs postgres and the v7.1.1 tag of the enterprise repo.

Tasks, and Ordering of Tasks

When you run a command like fab use.citus:v7.1.1 setup.basic_testing you are running two different tasks: use.citus with a v7.1.1 argument and setup.basic_testing. Those tasks are always executed from left to right, and running them is usually equivalent to running them as separate commands. For example:

```
# this command:
fab setup.basic_testing add.tpch
# has exactly the same effect as this series of commands:
fab setup.basic_testing
fab add.tpch
```

The exception is the use namespace: tasks such as use.citus and use.postgres only have an effect on the current command:

```
# this works:
fab use.citus:v7.1.1 setup.basic_testing
# this does not work:
fab use.citus:v7.1.1  # tells fabric to install v7.1.1, but only works during this command
fab setup.basic_testing  # will install the master branch of citus
```

use tasks must come before setup tasks:

```
# this does not work!
# since the `setup` task is run before the `use` task the `use` task will have no effect
fab setup.basic_testing use.citus:v7.1.1
```

Finally, there are tasks, such as the ones in the add namespace, which assume a cluster is already installed and running. They must be run after a setup task!

Task Namespaces

`use` Tasks

These tasks configure the tasks you run after them. When run alone they have no effect. Some examples:

```
fab use.citus:v7.1.1 setup.basic_testing
fab use.enterprise:v7.1.1 setup.enterprise
fab use.debug_mode use.postgres:10.1 use.citus:v7.1.1 setup.basic_testing
```

use.debug_mode passes the following flags to postgres' configure: --enable-debug --enable-cassert CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer"

use.asserts passes --enable-cassert, it's a subset of use.debug_mode.

`add` Tasks

It is possible to add extra extensions and features to a Citus cluster:

- fab add.tpch:scale_factor=1,partition_type='hash' generates and copies the TPC-H tables. The default scale factor is 10. The default partition type is reference for nation, region and supplier, and hash for the remaining tables. If you set the partition type to 'hash' or 'append', all the tables are created with that partition type.
- fab add.session_analytics builds and installs the session_analytics package (see the instructions above for how to check out this private repo).

For a complete list, run fab --list.

As described above, you can run these at the same time as you run setup tasks:

fab use.citus:v7.1.1 setup.enterprise add.shard_rebalancer does what you'd expect.

`pg` Tasks

These tasks run commands which involve the current postgres instance.

- fab pg.stop stops postgres on all nodes
- fab pg.restart restarts postgres on all nodes
- fab pg.start guess what this does :)
- fab pg.read_config:[parameter] runs SHOW [parameter] on all nodes. For example:

```
fab pg.read_config:max_prepared_transactions
```

If you want to use a literal comma in a command you must escape it (this applies to all fab tasks)

```
fab pg.set_config:shared_preload_libraries,'citus\,cstore_fdw'
```
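When a value comes from a variable (for example a Hyperscale connection string), escaping by hand is easy to get wrong. The sketch below escapes both commas and equals signs, since fab uses "," to separate task arguments and "=" for keyword arguments. fab_escape is a hypothetical helper, not part of the repo.

```shell
# Hypothetical helper: backslash-escape "," and "=" for use in a fab task argument.
fab_escape() {
    printf '%s\n' "$1" | sed -e 's/,/\\,/g' -e 's/=/\\=/g'
}
```

For example: fab pg.set_config:shared_preload_libraries,"$(fab_escape 'citus,cstore_fdw')" or fab run.pgbench_tests:pgbench_cloud.ini,connectionURI="$(fab_escape "$URI")", where URI holds your connection string.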

pg.set_config can get you into trouble: it uses ALTER SYSTEM, so if you break your postgres instance so badly that it won't boot, you won't be able to use pg.set_config to fix it.

To reset to a clean configuration run this command:

```
fab -- rm pg-latest/data/postgresql.auto.conf
```

`run` Tasks

In order to run pgbench and tpch tests automatically, you can use run.pgbench_tests or run.tpch_automate. If you want to use default configuration files, running commands without any parameter is enough.

To use a different configuration file for pgbench tests, prepare a configuration file similar to fabfile/pgbench_confs/pgbench_default.ini.

To use a different configuration file for tpch tests, prepare a configuration file similar to fabfile/tpch_confs/tpch_default.ini.

Advanced fab Usage

By default your fab commands configure the entire cluster, however you can target roles or individual machines.

- fab -R master pg.restart restarts postgres on the master node.
- fab -R workers pg.stop shuts down pg on all the workers.
- fab -H 10.0.1.240 pg.start starts pg on that specific node.

You can also run arbitrary commands by adding them after --:

- fab -H 10.0.1.240 -- 'echo "max_prepared_transactions=0" >> pg-latest/data/postgresql.conf' appends to the postgresql.conf file on the specified worker.
- fab -- 'cd citus && git checkout master && make install' switches the branch of Citus you're using. (This runs on all nodes.)

Using Multiple Citus Installations, `pg-latest`

Some kinds of tests (such as TPC-H) are easier to perform if you create multiple simultaneous installations of Citus and are able to switch between them. The fabric scripts allow this by maintaining a symlink called pg-latest.

Most tasks which interact with a postgres installation (such as add.cstore or pg.stop) simply use the installation in pg-latest. Tasks such as setup.basic_testing which install postgres will overwrite whatever is currently in pg-latest.

You can change where pg-latest points by running fab set_pg_latest:some-absolute-path. For example: fab set_pg_latest:$HOME/enterprise-installation. Using multiple installations is a matter of changing your prefix whenever you want to act upon or create a different installation.

Here's an example:

```
fab set_pg_latest:$HOME/pg-960-citus-600
fab use.postgres:9.6.0 use.citus:v6.0.0 setup.basic_testing
fab set_pg_latest:$HOME/pg-961-citus-601
fab use.postgres:9.6.1 use.citus:v6.0.1 setup.basic_testing
# you now have 2 installations of Citus!
fab pg.stop  # stop the currently active instance (pg-961-citus-601)
fab set_pg_latest:$HOME/pg-960-citus-600  # point pg-latest back at the first installation
fab pg.start  # start the first installation
# now you've switched back to the first installation

# the above can be abbreviated by writing the following:
fab pg.stop set_pg_latest:$HOME/pg-960-citus-600 pg.start
```
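Under the hood, switching installations amounts to repointing a symlink. The sketch below illustrates that idea with plain ln; point_pg_latest is a hypothetical helper and not the set_pg_latest task's actual code, so keep using fab set_pg_latest in practice.

```shell
# Hypothetical helper illustrating the pg-latest symlink mechanism.
point_pg_latest() {  # usage: point_pg_latest <link_path> <install_dir>
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing link to a directory as a file instead of descending into it
    ln -sfn "$2" "$1"
}
```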

TroubleShooting

Test automation currently has a lot of dependencies, such as fabfile, Azure and more. Failures are usually temporary, although they can last as long as a few days (if the problem is in the Azure service). In that case there is nothing we can do, but sometimes the problem is one we can fix, and it is useful to try the following steps:

- Even if creating a cluster fails, you can still look at the logs to see what caused the problem:
  - Find the public IP address of any instance (the connect scripts might not work if the cluster is in an incorrect state)
  - Connect to the machine with ssh -p 3456 pguser@<public_ip>
  - Switch to the root user (pguser doesn't have access to the logs): sudo su root
  - cd into the log directory /var/lib/waagent/custom-script/download/0
  - Look at stderr or stdout to see what went wrong.
- If you find a problem and need to update one of the scripts used for cluster initialization (listed in the fileUris part of azuredeploy.json), make sure you also change the branch name there to test the fix. By default those scripts are taken from the master branch, so if you don't update it, your change won't be used.
  - https://github.com/citusdata/test-automation/blob/master/hammerdb/azuredeploy.json is used for hammerdb
  - https://github.com/citusdata/test-automation/blob/master/azure/azuredeploy.json is used for everything else
- Updating the az CLI is often worth trying; follow the instructions in https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-linux to update your local installation.
- If you suspect that a particular az foo bar command doesn't work as expected, add --debug to take a closer look.
- If you consistently get connection timeout errors (255) when trying to connect to a VM, consider setting the AZURE_REGION environment variable to eastus.
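Once you are root on the VM, a helper like the one below prints the end of both initialization logs in one go. show_init_logs is hypothetical; it defaults to the log directory named in the steps above, and the 50-line tail length is an arbitrary choice.

```shell
# Hypothetical helper: show the tail of the init script's stdout and stderr.
show_init_logs() {  # usage: show_init_logs [log_dir]
    log_dir=${1:-/var/lib/waagent/custom-script/download/0}
    for f in stdout stderr; do
        if [ -f "$log_dir/$f" ]; then
            echo "==> $log_dir/$f <=="
            tail -n 50 "$log_dir/$f"
        else
            echo "missing: $log_dir/$f" >&2
        fi
    done
}
```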
