
cloudwise-opensource / gaia-dataset


GAIA, with the full name Generic AIOps Atlas, is an overall dataset for analyzing operation problems such as anomaly detection, log analysis, fault localization, etc.

License: GNU General Public License v2.0

dataset aiops ai ops devops analysis metrics

gaia-dataset's Introduction

GAIA

Website | Docs | Community & Forum

Quick start

GAIA contains data from MicroSS (in the MicroSS repository) and metrics from companions (in the Companion Data repository). The MicroSS data contains more than 6,500 metrics, 7,000,000 log items, and detailed trace data, collected continuously for two weeks. In this scenario, we also simulate anomalies that may happen in real systems and provide a record of every anomaly injection, so that root cause analysis algorithms can be evaluated fairly. The anomalies are produced by controlling users' behavior and mimicking erroneous operations on the systems.

The data files are listed below.

Git repository Relevant repository Download
MicroSS metric | trace | business | run MicroSS
Companion Data metric_detection | metric_forecast | log Companion Data

Change Log

  • 2022.05.12 V1.10

Previously, we provided MicroSS data for July 2021. As promised, we are now updating GAIA to V1.10. In this update, we added one month of MicroSS data, for August 2021. The repository structure is unchanged, except that we omitted trace data whose patterns closely resemble those already published. In other good news, we are deploying a new business scenario on MicroSS. The new scenario will contain system logs, which are not provided in the current scenario. It will also support monitoring of more commonly used middleware and databases, including Zookeeper, Redis, and MySQL. We have also designed more anomaly injection methods to simulate system faults as realistically as possible. The next big update of GAIA may be in September 2021, with data from the new scenario. We hope everyone enjoys research on IT operations and benefits from GAIA.

MicroSS

The MicroSS repository contains data of several types, selected from the business simulation system MicroSS. The data comes from a scenario of logging in with a QR code. A description of this scenario is also included in MicroSS.

metric

In the "metric" folder, each CSV filename encodes the node the file belongs to, its IP, the indicator name, and the time period. The files are reformulated from raw data collected by Metricbeat and include the following fields.

timestamp value
1625133601000 34201179
  • timestamp: the time of data collection, as a 13-digit timestamp (Unix epoch in milliseconds)
  • value: value of metric at the timestamp
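
The two fields above can be read with the standard library alone. The following is a minimal sketch, not the maintainers' tooling: it assumes the files are comma-delimited with a header row, and it interprets the timestamp as UTC, which the description does not confirm. The function name `read_metric_rows` is hypothetical.

```python
import csv
import io
from datetime import datetime, timezone


def read_metric_rows(text, delimiter=","):
    """Parse metric rows into (datetime, float) pairs.

    Assumption: a header row with 'timestamp' and 'value' columns,
    and a comma delimiter. Adjust `delimiter` if the files differ.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for row in reader:
        # 13-digit timestamps are milliseconds since the Unix epoch.
        # Interpreting them as UTC is an assumption.
        seconds = int(row["timestamp"]) / 1000
        moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
        rows.append((moment, float(row["value"])))
    return rows


# The sample row from the field description above.
sample = "timestamp,value\n1625133601000,34201179\n"
rows = read_metric_rows(sample)
print(rows[0][0].isoformat(), rows[0][1])
```

For a real file, pass the file's contents as `text`, or adapt the function to take an open file handle.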

trace

In the "trace" folder, each file contains trace records, reformulated from raw data collected by OpenTracing. The data includes the following fields.

timestamp host_ip service_name trace_id span_id parent_id start_time end_time url status_code message
2021-07-01 10:54:23 0.0.0.4 dbservice1 c124e30fb40651dc 58ac80ceea500f66 8b3e4a4003c5119c 2021-07-01 10:54:22.632751 2021-07-01 10:54:22.632751 http://0.0.0.4:9388/db_login_methods?uuid=a3036736-da17-11eb-9811-0242ac110003&user_id=ToeLCkHR 200 request call function 1 dbservice1.db_login_methods
  • timestamp: string of time record with the form YYYY-MM-DD hh:mm:ss
  • host_ip: the IP of the host running the service named service_name
  • service_name: name of service or host
  • trace_id: UUID of the business trace
  • span_id: UUID of the node in current trace
  • parent_id: UUID of the parent node in current trace
  • start_time: the time this call is created
  • end_time: the time this call is closed
  • url: the RPC url
  • status_code: 200 for normal, and others for anomalies.
  • message: the out-band message for this call
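
As an illustration of how these fields fit together, the sketch below computes a span's duration from start_time and end_time, flags non-200 status codes, and groups spans under their parent within a trace. It assumes each record has already been parsed into a dictionary keyed by the field names above; the helper names are hypothetical and not part of GAIA.

```python
from collections import defaultdict
from datetime import datetime

# Format of start_time and end_time as shown in the sample record.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def span_duration_seconds(span):
    """Return end_time minus start_time in seconds."""
    start = datetime.strptime(span["start_time"], TIME_FORMAT)
    end = datetime.strptime(span["end_time"], TIME_FORMAT)
    return (end - start).total_seconds()


def is_anomalous(span):
    """status_code 200 is normal; any other value marks an anomaly."""
    return str(span["status_code"]) != "200"


def children_by_parent(spans):
    """Map (trace_id, parent_id) to the list of child span_ids."""
    tree = defaultdict(list)
    for span in spans:
        tree[(span["trace_id"], span["parent_id"])].append(span["span_id"])
    return dict(tree)


# Values copied from the sample record above.
span = {
    "trace_id": "c124e30fb40651dc",
    "span_id": "58ac80ceea500f66",
    "parent_id": "8b3e4a4003c5119c",
    "start_time": "2021-07-01 10:54:22.632751",
    "end_time": "2021-07-01 10:54:22.632751",
    "status_code": "200",
}
print(span_duration_seconds(span), is_anomalous(span))
print(children_by_parent([span]))
```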

business

In the "business" folder, each file contains the business log of a node, reformulated from the raw data. The data includes the following fields.

datetime service message
2021-07-01 00:00:00 dbservice2 2021-07-01 14:11:54,950 | INFO | 0.0.0.2 | 172.17.0.2 | dbservice2 | 12ef1025e43ec0ef | 3b12f3fa-da33-11eb-875f-0242ac110003-JKrdHZDV-END!RH0>_qOJ token generate success
token=MTYyNTExOTkxNC45NTA0Njk1OjNiMTJmM2ZhLWRhMzMtMTFlYi04NzVmLTAyNDJhYzExMDAwM0pLcmRIWkRWRU5EIVJIMD5fcU9KOjE2MjUxMTk5NzQuOTUwNDc5NTpkZjk2YmIyOThmN2M4ZDg3N2NiYmY2MWZkYWM4ZjBlYw==
  • datetime: string of time record with the form YYYY-MM-DD hh:mm:ss
  • service: the relevant node ID
  • message: extra information in this log.

run

In the "run" folder, we provide system logs and anomaly injection records. The data includes the following fields, which have the same meaning as in the "business" folder.

datetime service message
2021-07-01 dbservice1 2021-07-01 22:33:05,033 | WARNING | 0.0.0.4 | 172.17.0.3 | dbservice1 | [memory_anomalies] trigger a high memory program, start at 2021-07-01 22:23:04.230332 and lasts 600 seconds and use 1g memory
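
An injection record like the one above can be turned into a time window for evaluation. The sketch below is one possible approach under stated assumptions: the regular expression is inferred from this single sample message and only covers records that carry a bracketed tag such as [memory_anomalies] followed by "start at ... and lasts N seconds". Other record wordings are not handled, and `parse_injection` is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta

# Pattern inferred from the sample message; it is an assumption that
# other bracket-tagged injection records follow the same wording.
INJECTION_PATTERN = re.compile(
    r"\[(?P<kind>\w+)\].*?"
    r"start at (?P<start>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)"
    r" and lasts (?P<seconds>\d+(?:\.\d+)?) seconds"
)


def parse_injection(message):
    """Return (kind, start, end) for a matching record, else None."""
    match = INJECTION_PATTERN.search(message)
    if match is None:
        return None
    start = datetime.strptime(match["start"], "%Y-%m-%d %H:%M:%S.%f")
    end = start + timedelta(seconds=float(match["seconds"]))
    return match["kind"], start, end


message = (
    "2021-07-01 22:33:05,033 | WARNING | 0.0.0.4 | 172.17.0.3 | dbservice1 | "
    "[memory_anomalies] trigger a high memory program, start at "
    "2021-07-01 22:23:04.230332 and lasts 600 seconds and use 1g memory"
)
kind, start, end = parse_injection(message)
print(kind, start, end)
```

Lines that do not match the pattern return None, so they can be filtered out or handled by a separate rule.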

Companion Data

Companion Data contains metric and log data provided by the companions of Cloudwise. All data in Companion Data has been strictly desensitized to protect the privacy of users and companies. It contains 406 anomaly detection and metric prediction datasets in total, 279 of which are labeled, and covers the following types of time series data:

  • Changepoint data
  • Concept_drift_data
  • Linear_data
  • Low_signal-to-noise_ratio_data
  • Partially_stationary_data
  • Periodic_data
  • Staircase_data

For logs, Companion Data contains log parsing, log semantics anomaly detection, and named entity recognition (NER) data, about 218,736 log entries in total. Please refer to Companion Data for the data description.

metric_detection

The "metric_detection" folder stores each type of time series data in its own subfolder. All metrics here are labeled, so metric anomaly detection can be evaluated fairly. The data includes the following fields.

timestamp value label
1546272000000 168899765 0
1546272300000 168900938.6 0
1546272600000 168902112.2 0
1546272900000 168896334 0
1546273200000 168880129 0
1546273500000 168863924 0
  • timestamp: the time of data collection, as a 13-digit timestamp (Unix epoch in milliseconds).
  • value: metric value at the time.
  • label: anomaly label. 0 for normal, and 1 for anomaly.
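
Because every point carries a 0/1 label, a detector's output can be scored directly against it. The sketch below shows a plain point-wise precision, recall, and F1 computation. This is a generic metric, not an evaluation protocol prescribed by GAIA, and the function name is hypothetical.

```python
def precision_recall_f1(labels, predictions):
    """Point-wise scores for binary labels (1 = anomaly, 0 = normal)."""
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)

    # Guard against division by zero when a class is never predicted
    # or never present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Toy labels and predictions, made up for illustration only.
labels = [0, 1, 1, 0]
predictions = [0, 1, 0, 1]
print(precision_recall_f1(labels, predictions))
```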

metric_forecast

The "metric_forecast" folder stores each type of time series data in its own subfolder. Time series prediction algorithms can be trained on this dataset. The data includes the following fields.

timestamp value
1546272000000 168899765
1546272300000 168900938.6
1546272600000 168902112.2
1546272900000 168896334
1546273200000 168880129
1546273500000 168863924
  • timestamp: the time of data collection.
  • value: metric value at the time.

log

The "log" folder includes three sub-folders, "log parsing", "log semantics anomaly detection", and "named entity recognition (NER)", each serving the task of the same name. Detailed descriptions of the files can be found in each sub-folder.

License

GAIA-DataSet is under the Apache 2.0 license. See the LICENSE file for details.

gaia-dataset's People

Contributors

aiwhj, neeke

gaia-dataset's Issues

About the labels of MicroSS data

First of all, thanks for sharing your data. However, I cannot find any labels in the MicroSS\metric dataset. Could you please provide complete labels, such as 0 or 1, for the MicroSS metrics? Thanks a lot.

About the data for July

Hello, I only see data for the webservice1 node in July. Is there July data for the other nodes, as there is for August?

Duplicate timestamps in metrics

There are duplicate timestamps in many metrics. Some of these duplicates have the same value, but often the same timestamp appears in multiple rows with different values. Usually in such cases, one of these rows has a valid value and the remaining rows are 0. Can I just take the non-zero row as the correct row to use for this timestamp? Is this expected when you collected and compiled the data? Thanks.
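
The heuristic proposed in this question can be sketched as follows. This only illustrates the poster's idea of keeping the non-zero row; whether it matches how the data was collected is not confirmed here. The function name is hypothetical, and when several different non-zero values share a timestamp, this sketch simply keeps the first one seen.

```python
def dedupe_prefer_nonzero(rows):
    """Keep one value per timestamp, preferring a non-zero value.

    `rows` is an iterable of (timestamp, value) pairs. The result is
    a list of pairs sorted by timestamp.
    """
    chosen = {}
    for timestamp, value in rows:
        if timestamp not in chosen:
            chosen[timestamp] = value
        elif chosen[timestamp] == 0 and value != 0:
            # Replace a zero placeholder with the non-zero reading.
            chosen[timestamp] = value
    return sorted(chosen.items())


# Toy rows, made up for illustration only.
rows = [(1, 0.0), (1, 5.0), (2, 3.0), (2, 3.0), (3, 0.0)]
print(dedupe_prefer_nonzero(rows))
```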

Incomplete data

The data in the business file is incomplete, yet it has been used in published papers. Is some of the data aligned along the time dimension? For example, is the data from July 1st to July 15th complete? The log appears to be complete only for August.

Some injected memory anomalies do not have impact on the memory metrics

For example, I am checking the impact of the following injected anomaly:

2021-08-14 03:49:04,212 | WARNING | 0.0.0.4 | 172.17.0.3 | dbservice1 | [memory_anomalies] trigger a high memory program, start at 2021-08-14 03:39:03.575551 and lasts 600 seconds and use 1g memory

Below is a plot of the memory-related metrics on this node and service around that time. Red region is the duration of the high-memory program.
image

There is no change in any metric before and after the anomaly is injected.

Dataset repository not accessible

Hello,

This repository seems not accessible from GitHub except for its mobile app. Please take a look if you know what's going on.

Question regarding [cpu anomalies] in MicroSS dataset

First of all, thank you for establishing the Atlas dataset, which includes metrics, logs, and trace data with anomaly labels!

I have looked into the MicroSS dataset and tried to understand the injected faults, which are recorded in the file MicroSS/run/run.zip.
The run logs show that possibly four fault types have been injected:

  • memory anomalies: 2021-07-26 20:38:42,890 | WARNING | 0.0.0.2 | 172.17.0.2 | dbservice2 | [memory_anomalies] trigger a high memory program, start at 2021-07-26 20:28:42.025044 and lasts 600 seconds and use 1g memory
  • permission denied, 2021-07-27 00:01:00,853 | WARNING | 0.0.0.2 | 172.17.0.2 | dbservice2 | trigger an access permission denied exception, will lasts an hour
  • file missing, 2021-07-28 16:10:01,076 | WARNING | 0.0.0.3 | 172.17.0.4 | webservice2 | trigger the file moving program, start with 2021-07-28 16:00:00.976817, last for 600 seconds
  • cpu anomalies, 2021-07-28 06:40:20,943 | WARNING | 0.0.0.4 | 172.17.0.2 | mobservice2 | [cpu_anomalies] trigger a parallel fast sorting program , start at 2021-07-28 06:40:20.936320 and lasts 3.0034542083740234 seconds

I am curious about the duration of the injected cpu anomalies. The other anomalies are injected for several hundred seconds, but cpu anomalies last only about 3 seconds. Can a 3-second cpu anomaly really affect the reliability and availability of the system?

When I searched the durations of all cpu anomalies, a stranger issue emerged. Several run logs show cpu anomalies injected for more than one million seconds. For example:

  • 2021-07-29 | logservice1 | 2021-07-29 22:09:57,277 | WARNING | 0.0.0.3 | 172.17.0.3 | logservice1 | [cpu_anomalies] trigger a parallel fast sorting program , start at 2021-07-29 22:09:57.274933 and lasts 1985016.0505759716 seconds

Question about MicroSS Data

In "metric" folder, each csv filename contains the node to which the file belongs. In "business" folder, each file contains the business log of a node. But in "trace" folder, "service_name" is the name of service or host.
What is the difference between a service and a node here?
And why do some nodes have corresponding metrics but no associated logs or traces?
I would also like to know the relationship between nodes and containers here, for example, is each node here deployed in a container?

Thanks a lot.

Question about fault types in MicroSS dataset

First of all, thank you for your MicroSS dataset!
After extracting the templates for run_table_2021-07.csv, I found that there are 16 templates. Apart from the two fault types [cpu_anomalies] and [memory_anomalies], I am curious whether the following are also considered fault types:

  • "<*>-<*>-<*> <*>:<*>:<*>,<*> | WARNING | <*> | <*> | <*> | [normal memory freed label] lasts ten minutes
  • "<*>-<*>-<*> <*>:<*>:<*>,<*> | WARNING | <*> | <*> | <*> | <*> | wait for <*> seconds for follow-up operations to simulate the login failure of the QR code expired
  • "<*>-<*>-<*> <*>:<*>:<*>,<*> | ERROR | <*> | <*> | <*> | upload run_logs logs on <*>-<*>-<*> failed: 'str' object does not support item assignment
  • "<*>-<*>-<*> <*>:<*>:<*>,<*> | WARNING | <*> | <*> | <*> | trigger the file moving program, start with <*>-<*>-<*> <*>:<*>:<*>.<*>, last for <*> seconds
  • "<*>-<*>-<*> <*>:<*>:<*>,<*> | ERROR | <*> | <*> | <*> | upload business logs on <*>-<*>-<*> failed: (pymysql.err.OperationalError) (<*>, 'ny connections')
  • "<*>-<*>-<*> <*>:<*>:<*>,<*> | WARNING | <*> | <*> | <*> | trigger an access permission denied exception, will lasts an hour
  • "<*>-<*>-<*> <*>:<*>:<*>,<*> | ERROR | <*> | <*> | <*> | upload <*> logs on <*>-<*>-<*> failed: (pymysql.err.OperationalError) (<*>, ""Can't connect to MySQL server on '<*>' ([Errno <*>] Connection refused)"")

Question about log message

First, thank you for sharing this data set.
I have some questions about the message field in log.

For example,
2021-07-01 10:54:22,639 | INFO | 0.0.0.4 | dbservice1 | permission_operate.py -> permission_operation -> 35 | c124e30fb40651dc | the list of all available services are redisservice1: http://0.0.0.1:9386, redisservice2: http://0.0.0.2:9387

What does the trace id "c124e30fb40651dc" mean? Is there any relationship between this log and this trace?
And what does "permission_operate.py -> permission_operation -> 35" mean?

Thanks a lot.

Complete log data

Excuse me, first of all, thank you for providing the data set. The logs in the business folder only have data between July 1st and July 6th. Can you provide a log data set for a whole month? Thanks a lot.

Missing Files in July and August

Thank you for providing this valuable dataset. However, when I unzipped the business and trace zip files, I found trace data only for July, and the log data for almost all services is from August, except for webservice1. Could you please provide the complete business and trace data so that we can align the multi-source data by timestamp?
Looking forward to your reply.

Regarding the inconsistent types of metrics

In a preliminary analysis of the metrics, we found that metric names differ both for the same service across periods and across services within the same period. Below are some screenshots of memory-related information:
image
image
image

I would like to ask whether the metrics of different services are collected in different ways. Could you give a detailed description of these metrics? This would help us greatly in analyzing service performance and detecting abnormalities.

DataSet is Empty

Hi, when accessing the GAIA-DataSet, it shows "This repository is empty." Can you please guide me how to access the data? Thank you!

Causes and measures of failure

Could you explain the causes of the several types of failures in the dataset and the corresponding countermeasures?
Thanks a lot.

Question about the run file

Hello! Thank you for your recently uploaded GAIA dataset! I would like to ask whether each line in the CSV file in the run folder corresponds to an injected exception.
I ask because I see some log lines such as "upload business logs on 2021-07-31 successfully". Is this also an exception? If so, what type of exception? Looking forward to your reply, thank you very much!

What are the meanings of the fields in a "business" log message?

For example in 'business_table_2021-08.csv',

the 2nd message is
2021-08-01 00:00:01,315 | INFO | 0.0.0.2 | dbservice2 | permission_operate.py -> permission_operation -> 51 | 7a379c0e7ccf9a58 | now call service:redisservice2, inst:http://0.0.0.2:9387 as a downstream service\n

the 3rd message is
2021-08-01 00:00:01,333 | INFO | 0.0.0.2 | 172.17.0.2 | dbservice2 | 7a379c0e7ccf9a58 | query = select passwards from username_table where user_name='pkiwOTAa'\n

What are the meanings of the 4th and 5th fields in a message?

Business data missing

After unzipping, there is only July data for webservice1; the July data for the other nodes is missing.

Injection Schedule

Hello,

I'm trying to extract the time windows between failure injection and the occurrence of failure messages inside the log data.

Where is the record providing this information?

Thank you
