
queryanalyzeragent's Introduction

Query Analyzer Agent - Capture and analyze queries without overhead.

Query Analyzer Agent runs on the database server. It captures queries by sniffing the network port, aggregates them, and sends the results to a remote server for further analysis. Refer to LinkedIn's Engineering Blog for more details.

Getting Started

Prerequisites

Query Analyzer Agent is written in Go, so before you get started you should install and set up Go, for example with the following steps:

$ wget https://dl.google.com/go/go1.14.linux-amd64.tar.gz
$ sudo tar -C /usr/local -xzf go1.14.linux-amd64.tar.gz
$ mkdir ~/projects
$ export PATH=$PATH:/usr/local/go/bin
$ export GOPATH=~/projects
$ export GOBIN=~/projects/bin

Query Analyzer Agent requires the following external libraries:

  • pcap.h (provided by the libpcap-dev package), and gcc or build-essential for building this package
    • RHEL/CentOS/Fedora:
      $ sudo yum install gcc libpcap libpcap-devel git
      
    • Debian/Ubuntu:
      $ sudo apt-get install build-essential libpcap-dev git
      
  • Go-MySQL-Driver
    $ go get github.com/go-sql-driver/mysql
    

Third Party Libraries

The Go build system automatically downloads the following third-party libraries from their respective GitHub repositories when compiling this project.

GoPacket
https://github.com/google/gopacket
Copyright (c) 2012 Google, Inc. All rights reserved.
Copyright (c) 2009-2011 Andreas Krennmair. All rights reserved.
License: BSD 3-Clause "New" or "Revised" License

Percona Go packages for MySQL
https://github.com/percona/go-mysql
Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
License: BSD 3-Clause "New" or "Revised" License

Viper
https://github.com/spf13/viper
Copyright (c) 2014 Steve Francia
License: MIT

Setting up remote database

Query Analyzer Agent either prints the aggregated queries to a local log file or sends them to a remote database, which can store queries collected from all the agents. We have chosen MySQL as the remote database.

Execute the following SQL statements on the remote database server.

CREATE DATABASE IF NOT EXISTS `query_analyzer`;

CREATE TABLE IF NOT EXISTS `query_analyzer`.`query_info` (
  `hostname` varchar(64) NOT NULL DEFAULT '',
  `checksum` char(16) NOT NULL DEFAULT '',
  `fingerprint` longtext NOT NULL,
  `sample` longtext CHARACTER SET utf8mb4,
  `firstseen` datetime NOT NULL,
  `mintime` float NOT NULL DEFAULT '0',
  `mintimeat` datetime NOT NULL,
  `maxtime` float NOT NULL DEFAULT '0',
  `maxtimeat` datetime NOT NULL,
  `is_reviewed` enum('0','1','2') NOT NULL DEFAULT '0',
  `reviewed_by` varchar(20) DEFAULT NULL,
  `reviewed_on` datetime DEFAULT NULL,
  `comments` mediumtext,
  PRIMARY KEY (`hostname`,`checksum`),
  KEY `checksum` (`checksum`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;

CREATE TABLE IF NOT EXISTS `query_analyzer`.`query_history` (
  `hostname` varchar(64) NOT NULL DEFAULT '',
  `checksum` char(16) NOT NULL DEFAULT '',
  `src` varchar(39) NOT NULL DEFAULT '',
  `user` varchar(16) DEFAULT NULL,
  `db` varchar(64) NOT NULL DEFAULT '',
  `ts` datetime NOT NULL,
  `count` int unsigned NOT NULL DEFAULT '1',
  `querytime` float NOT NULL DEFAULT '0',
  `bytes` int unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`hostname`,`checksum`,`ts`),
  KEY `checksum` (`checksum`),
  KEY `user` (`user`),
  KEY `covering` (`hostname`,`ts`,`querytime`,`count`,`bytes`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (TO_DAYS(ts))
(PARTITION p202004 VALUES LESS THAN (TO_DAYS('2020-05-01')) ENGINE = InnoDB,
 PARTITION p202005 VALUES LESS THAN (TO_DAYS('2020-06-01')) ENGINE = InnoDB,
 PARTITION p202006 VALUES LESS THAN (TO_DAYS('2020-07-01')) ENGINE = InnoDB,
 PARTITION p202007 VALUES LESS THAN (TO_DAYS('2020-08-01')) ENGINE = InnoDB,
 PARTITION p202008 VALUES LESS THAN (TO_DAYS('2020-09-01')) ENGINE = InnoDB,
 PARTITION p202009 VALUES LESS THAN (TO_DAYS('2020-10-01')) ENGINE = InnoDB,
 PARTITION p202010 VALUES LESS THAN (TO_DAYS('2020-11-01')) ENGINE = InnoDB,
 PARTITION p202011 VALUES LESS THAN (TO_DAYS('2020-12-01')) ENGINE = InnoDB,
 PARTITION pMAX VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */;
/* You can use different partition scheme based on your retention */

CREATE USER /*!50706 IF NOT EXISTS*/ 'qan_rw'@'qan_agent_ip' IDENTIFIED BY 'Complex_P@ssw0rd';

GRANT SELECT, INSERT, UPDATE ON `query_analyzer`.* TO 'qan_rw'@'qan_agent_ip';

The above SQL statements can be found in the remote_database/remote_schema.sql and remote_database/users.sql files.
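
The comment above notes that the partition scheme can differ based on your retention. As a rough sketch (the partition names and dates here are illustrative), monthly rotation on query_history could look like this: split pMAX to add the next month, then drop the oldest month once it falls outside your retention window.

/* Illustrative partition rotation; adjust names and dates to your retention */
ALTER TABLE `query_analyzer`.`query_history`
  REORGANIZE PARTITION pMAX INTO
  (PARTITION p202012 VALUES LESS THAN (TO_DAYS('2021-01-01')) ENGINE = InnoDB,
   PARTITION pMAX VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB);

ALTER TABLE `query_analyzer`.`query_history` DROP PARTITION p202004;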

Build and Install

$ git clone https://github.com/linkedin/QueryAnalyzerAgent
$ cd QueryAnalyzerAgent
$ go get
$ go build -o $GOBIN/QueryAnalyzerAgent

Configuration

The QueryAnalyzerAgent config is in TOML format, organized into several sections. For basic use, you need to specify the network interface, the port, and the connection details of the remote database endpoint in the config file, qan.toml.

Once the remote database is set up, update qan.toml:

[remoteDB]
Enabled = 1

# remote database hostname to send results to
Hostname = "remote_database_hostname"

# remote database port to send results to
Port = 3306

# remote database username to send results to
Username = "qan_rw"

# remote database password to send results to
Password = "Complex_P@ssw0rd"

# remote database name to send results to
DBName = "query_analyzer"

If the user and db details of each connection are needed, create a user to connect to the local database and update the localDB section. The CREATE USER SQL can be found in local_database/users.sql.
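
The authoritative statements are in local_database/users.sql; as an illustrative sketch only (the host and password are placeholders, and qan_ro matches the default LocalUsername in qan.toml), the local user mainly needs to be able to read the processlist:

/* Illustrative only; see local_database/users.sql for the exact statements.
   PROCESS lets the agent read the processlist for user/db lookups. */
CREATE USER /*!50706 IF NOT EXISTS*/ 'qan_ro'@'localhost' IDENTIFIED BY 'Complex_P@ssw0rd';
GRANT PROCESS ON *.* TO 'qan_ro'@'localhost';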

Running Query Analyzer Agent

Since the agent sniffs the network interface, it needs the cap_net_raw capability.
$ sudo setcap cap_net_raw+ep $GOBIN/QueryAnalyzerAgent
$ $GOBIN/QueryAnalyzerAgent --config-file qan.toml (or the full path to qan.toml)

If you do not set the cap_net_raw capability, you can run the agent as root instead.
$ sudo $GOBIN/QueryAnalyzerAgent --config-file qan.toml (or the full path to qan.toml)
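
Once the agent has been running for at least one ReportInterval, you can confirm that results are arriving with a quick sanity check against the schema above (an illustrative query, not part of the agent):

/* Last report time per agent host */
SELECT hostname, MAX(ts) AS last_report
FROM query_analyzer.query_history
GROUP BY hostname;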

Query Analytics

Once you understand the schema, you can write queries and build a UI to extract the information you want. Examples:

  • Top 5 queries by total run time during a specific interval. If a query takes 1 second and executes 1000 times, its total run time is 1000 seconds.

    SELECT 
        SUM(count),
        SUM(querytime) 
    INTO 
        @count, @qt 
    FROM 
        query_history history 
    WHERE 
        history.hostname='mysql.database-server-001.linkedin.com' 
        AND ts>='2020-03-11 09:00:00' 
        AND ts<='2020-03-11 10:00:00';
      
    SELECT 
        info.checksum,
        info.firstseen AS first_seen,
        info.fingerprint,
        info.sample,
        SUM(count) as count,
        ROUND(((SUM(count)/@count)*100),2) AS pct_total_query_count,
        ROUND((SUM(count)/(TIME_TO_SEC(TIMEDIFF(MAX(history.ts),MIN(history.ts))))),2) AS qps,
        ROUND((SUM(querytime)/SUM(count)),6) AS avg_query_time,
        ROUND(SUM(querytime),6) AS sum_query_time,
        ROUND((SUM(querytime)/@qt)*100,2) AS pct_total_query_time,
        MIN(info.mintime) AS min_query_time,
        MAX(info.maxtime) AS max_query_time
    FROM 
        query_history history 
    JOIN     
        query_info info 
    ON 
        info.checksum=history.checksum 
        AND info.hostname=history.hostname 
    WHERE 
        info.hostname='mysql.database-server-001.linkedin.com' 
        AND ts>='2020-03-11 09:00:00' 
        AND ts<='2020-03-11 10:00:00' 
    GROUP BY 
        info.checksum 
    ORDER BY
        pct_total_query_time DESC 
    LIMIT 5\G
    
  • Trend for a particular query

    SELECT 
        UNIX_TIMESTAMP(ts),
        ROUND(querytime/count,6) 
    FROM 
        query_history history 
    WHERE 
        history.checksum='D22AB75FA3CC05DC' 
        AND history.hostname='mysql.database-server-001.linkedin.com' 
        AND ts>='2020-03-11 09:00:00' 
        AND ts<='2020-03-11 10:00:00';
    
  • Queries fired from a particular IP

    SELECT
        info.checksum,
        info.fingerprint,
        info.sample
    FROM 
        query_history history 
    JOIN     
        query_info info 
    ON 
        info.checksum=history.checksum 
        AND info.hostname=history.hostname 
    WHERE 
        history.src='10.251.225.27'
    LIMIT 5;
    
  • New queries on a particular day

    SELECT
        info.firstseen,
        info.checksum,
        info.fingerprint,
        info.sample
    FROM   
        query_info info 
    WHERE 
        info.hostname = 'mysql.database-server-001.linkedin.com' 
        AND info.firstseen >= '2020-03-10 00:00:00'
        AND info.firstseen < '2020-03-11 00:00:00'
    LIMIT 5;
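
  • Query load per user during an interval (a sketch along the same lines as the examples above; it assumes the localDB processlist lookup is enabled so that the user column is populated)

    SELECT
        history.user,
        SUM(count) AS count,
        ROUND(SUM(querytime),6) AS sum_query_time
    FROM
        query_history history
    WHERE
        history.hostname='mysql.database-server-001.linkedin.com'
        AND ts>='2020-03-11 09:00:00'
        AND ts<='2020-03-11 10:00:00'
    GROUP BY
        history.user
    ORDER BY
        sum_query_time DESC;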
    

Limitations

  • As of now, it works only for MySQL.

  • Does not account for

    • SSL
    • Compressed packets
    • Replication traffic
    • Big queries for performance reasons
  • The number of unique query fingerprints should stay bounded (say, under 100K). For example, if a query contains a blob and the tool is unable to generate a correct fingerprint, this leads to a huge number of fingerprints and can increase the memory footprint of QueryAnalyzerAgent.

    Another example: if you are using GitHub's Orchestrator in pseudo-GTID mode, it generates queries like

    drop view if exists `_pseudo_gtid_`.`_asc:5d8a58c6:0911a85c:865c051f49639e79`
    

    The fingerprint for these queries is unique every time, which inflates the number of distinct queries tracked by QueryAnalyzerAgent. Code to ignore such queries is included but commented out; uncomment it if needed. A diagnostic sketch for spotting fingerprint explosion follows this list.

  • Test the performance of QueryAnalyzerAgent in your staging environment before running it in production.
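
If you suspect fingerprint explosion, a simple check (an illustrative query against the remote schema, not part of the agent) is to count distinct fingerprints per host:

/* Hosts with the most distinct fingerprints in query_info */
SELECT hostname, COUNT(*) AS fingerprints
FROM query_analyzer.query_info
GROUP BY hostname
ORDER BY fingerprints DESC
LIMIT 10;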

queryanalyzeragent's People

Contributors

appigatla

queryanalyzeragent's Issues

UI availability?

I guess the UI developed by LinkedIn for this is not open source, right?

Is there any open source UI for this project? Even a basic one?

No reports sent to remote server

So I have MySQL and the agent running on localhost.

I also have another MySQL running in a virtual machine; that's the remote DB.

The agent is running, and I'm generating MySQL traffic using ab, making 1000 requests. Yet I can't see any queries captured. The log shows only this:

2021-01-30 17:57:54
Number of queries more than Capture Length: 0
Number of aborted connections: 0
Number of access denied requests: 0
Length of Query Info Hashmap: 0

The agent produces no output.

I'm new to this tool. Shouldn't it capture all queries? The local MySQL is v5.7.32 while the remote one is 8.x. Is this a problem?

My qan.toml:

[sniffer]
# Database type. As of now only MySQL is supported
Database = "MySQL"

# Port to sniff
ListenPort = 3306

# interface to listen on (ex: eth0, eth1, bond0 etc)
ListenInterface = "en0"

# The max TCP payload is 64K, but capturing 64K per packet is a big CPU overhead.
# The capture length can be reduced (most queries won't use the full payload length) to suit your query length
CaptureLength = 8192

# You can get packets bigger than the capture length and logging the number of such packets helps us in tuning the capture length.
# You can also get number of aborted connections due to timeout, reset connections etc
# Print such instances every ReportStatsInterval seconds
ReportStatsInterval = 60

# Comma separated list of IPs to be ignored from sniffing
IgnoreIPs = ""

[qan-agent]
# Comma separated debug levels
# Example:
# DebugLevels = "2,3,4,5,9,10"
# 1 - Processed query info, this can be used if you do not send results to remote server
# 2 - Source, Query, Query Time
# 3 - Queries greater than capture length
# 4 - User and connection related information, memory related info
# 5 - MySQL packet info
# 6 - Query Response info
# 7 - Orphan packets garbage collection information
# 8 - Orphan packets garbage collection detailed information
# 9 - Access denied requests
# 10 - Processlist
DebugLevels = "1,2,3,4,5,6,7,8,9,10"

# If the query is bigger than MaxRequestLength, it will be ignored. Probably it is a huge insert
MaxRequestLength = 524288

# Maximum number of db connections. This decides the connection buffer for qan agent. Buffer will be set to 1.5 times the max connections. It is fine to have connections to db more than what is specified here.
MaxDBConnections = 1024

# Send the query report to remote server every ReportInterval seconds
ReportInterval = 20

# Log file to print
LogFile = "/var/log/qan.log"

[localDB]
# Some connections might have been established before the sniffer was started. It is not possible to get those connections' details, like user and db. If enabled, the agent connects to the local database, checks the processlist and gets the user and db info
# 0 - Do not check processlist
# 1 - Check processlist only once at startup
# 2 - Check processlist as and when required
Enabled = 0

# Username to connect to the local database
LocalUsername = "qan_ro"

# Password to connect to the local database
LocalPassword = "xxxx"

# Socket to connect to the local database
LocalSocket = "/var/lib/mysql/mysql.sock"

### Send the query reports to remote database server
[remoteDB]
Enabled = 1

# remote database hostname to send results to. If it is IPv6, enclose with [] example: [::1]
Hostname = "10.211.55.6"

# remote database port to send results to
Port = 3306

# remote database username to send results to
Username = "qan_rw"

# remote database password to send results to
Password = "Complex_P@ssw0rd"

# remote database name to send results to
DBName = "query_analyzer"

# Sample query is the query which took maximum time for that fingerprint.
# Sample query contains data. If you do not want to send data, disable this
IncludeSample = 1

# send the reports over SSL
# 0 - disabled
# 1 - enabled with skip verify
# 2 - self signed SSL certificates taken from Ca_cert, Client_cert, Client_key config
# 3 - SSL certificates taken from Ca_cert, Client_cert, Client_key config
SSLEnabled = 0

# SSL certificate details
Ca_cert = ""
Client_cert = ""
Client_key = ""

### post to remote API instead of remote database (not implemented yet)
[remoteAPI]
Enabled = 0
URL = "https://xxxx"
apikey = "xxxx"

Using this on AWS

Is there any way to deploy this to AWS?

I mean, my database is an RDS instance. I can't install it on that host, since it's a managed service.
How do I monitor an RDS instance using this app?
