aicoe-aiops / ocp-ci-analysis

Developing AI tools for developers by leveraging the data made openly available by OpenShift and Kubernetes CI platforms.

Home Page: https://old.operate-first.cloud/data-science/ai4ci/

License: GNU General Public License v3.0

Python 0.12% Makefile 0.01% Jupyter Notebook 99.87% Shell 0.01%

ocp-ci-analysis's People

Contributors

aakankshaduggal, amsaparov, antter, cdolfi, codificat, csibbitt, dependabot[bot], durandom, fridex, goern, harshad16, hemajv, humairak, isabelizimm, martinpovolny, michaelclifford, oindrillac, sankbad, sesheta, shreyanand, suppathak, tumido


ocp-ci-analysis's Issues

Additional EDA on TestGrid Data set

To close issue #15 and build upon the initial EDA work done in #16, there are a number of additional questions that we would like answered about the TestGrid dataset. Specifically:

  • How comparable are the testgrids?
  • How do we analyze them in aggregate to learn from their combined behavior?
  • How many/which tests do they all have in common?
  • Are their time series dates comparable?
  • Are there sub-groups that should only be compared with one another?
  • Is looking at the grid matrices independent of test names a valid approach for issue identification?
  • What is the expected behavior of a test over time across multiple jobs?
  • How does the entire test platform/specific tests perform on a given day?
  • How does the entire test platform behavior evolve over time?
  • Is there sufficient data here for useful ML approaches?
  • Can we develop some meaningful alerting/problem identification with the results of the above questions?

Acceptance Criteria:

  • Notebook that addresses the questions above (a minimal dashboard-comparison sketch follows below)
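To make the "compare dashboards in aggregate" question concrete, here is a minimal sketch of pulling several dashboards and tallying their overall tab health. It assumes the public TestGrid summary endpoint (`https://testgrid.k8s.io/{dashboard}/summary`) returns a JSON object keyed by tab name with an `overall_status` field; treat the endpoint shape and the example dashboard names as assumptions to verify against the live service.

```python
# A minimal sketch (not from the repo): compare overall tab health across
# several Red Hat TestGrid dashboards via the public summary endpoint.
# The endpoint shape and dashboard names below are assumptions to verify.
import requests
from collections import Counter

DASHBOARDS = [
    "redhat-openshift-ocp-release-4.6-informing",  # hypothetical example names
    "redhat-openshift-ocp-release-4.6-blocking",
]

for dashboard in DASHBOARDS:
    resp = requests.get(f"https://testgrid.k8s.io/{dashboard}/summary", timeout=30)
    resp.raise_for_status()
    tabs = resp.json()  # assumed: dict keyed by tab name
    statuses = Counter(tab.get("overall_status", "UNKNOWN") for tab in tabs.values())
    print(dashboard, dict(statuses))
```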

Add data collection step from sippy to this repo.

As a data scientist I would like a simple way to collect new data from OpenShift CI using sippy, so that we can include updated datasets in our analysis process.

Acceptance Criteria:

  • a script that can achieve the following:

```
# install go, then:
git clone https://github.com/openshift/sippy
cd sippy
make
mkdir /tmp/sippy
# fetch a fresh copy of the raw data from testgrid
./sippy --fetch-data /tmp/sippy --release 4.6
# perform the analysis on the raw data
./sippy --local-data /tmp/sippy --release 4.6 -o json > /tmp/sippy.json
```



Aakanksha up to speed with project

As a data scientist and contributor to this project I need to have a strong hands-on understanding of all the work done to date as well as knowledge of how to improve upon and extend the existing work.

PLEASE READ: This issue should be used as a template. Please make a copy of it and replace <NAME> with your name when creating the new issue.

Acceptance Criteria:

  • Use ocp-ci-analysis:latest image on https://jupyterhub-opf-jupyterhub.apps.cnv.massopen.cloud/ and successfully run every notebook in the notebooks directory.
  • Submit at least 1 Issue/PR fixing a bug, fixing graph formatting, clarifying an unclear notebook section, or adding a small additional data analysis to a notebook. (Look for something to improve as you go through the existing work 😃)
  • Familiarize yourself with the following 3 resources:
    * Sippy Repo and Dashboard for an example of metrics and TestGrid data analysis.
    * TestGrid Repo and Dashboard to familiarize yourself with our initial data source.
    * Prow and google cloud storage to see the underlying CI data informing these higher levels of abstraction.

Understand the testgrid ecosystem

Have a look at https://github.com/GoogleCloudPlatform/testgrid and the video linked from there.
Testgrid already applies some logic to the test runs, like identifying boards with flaky tests, or boards without tests reporting in. You can also report some additional metrics like test coverage.

  1. Is there a central definition for the data uploaded to the Red Hat TestGrid: test status, name, infra (gws)?
  2. Is there some tooling to download the data from TestGrid in Python? (A minimal sketch follows below.)
  3. Is there some prior work on analyzing TestGrid data at scale? Blog posts?
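On question 2, I am not aware of an official Python client, but the TestGrid frontend serves its grid data as JSON, so plain `requests` is usually enough. A minimal sketch, assuming the `table` endpoint with a run-length-encoded `statuses` field and hypothetical dashboard/tab names; confirm the layout against a live response.

```python
# Minimal sketch: fetch one TestGrid tab as JSON and expand the run-length
# encoded statuses into a flat list per test. Endpoint layout is an assumption.
import requests

DASHBOARD = "redhat-openshift-ocp-release-4.6-informing"  # hypothetical example
TAB = "periodic-ci-openshift-release-master-ci-4.6-e2e-gcp"  # hypothetical example

url = f"https://testgrid.k8s.io/{DASHBOARD}/table"
resp = requests.get(url, params={"tab": TAB, "grid": "old"}, timeout=30)
resp.raise_for_status()
table = resp.json()

grid = {}
for test in table.get("tests", []):
    # "statuses" is assumed to be run-length encoded: [{"count": n, "value": v}, ...]
    flat = []
    for run in test.get("statuses", []):
        flat.extend([run["value"]] * run["count"])
    grid[test["name"]] = flat

print(f"{len(grid)} tests, up to {max(map(len, grid.values()), default=0)} runs each")
```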

Short video walkthrough for EDA

As a potential Data Science Contributor I want to quickly understand how to work with the data, and to review and reproduce the EDA that has already been done.

Acceptance Criteria:

  • Short video on data and data access

Initial EDA on google cloud storage log data.

There is a fair amount of semi-structured log (text) data that gets generated by the CI process. There are likely valuable insights in this data that could be leveraged by SMEs if there were an automated way of reducing the total amount of logs that had to be reviewed. As a data scientist, I would like to understand the nature of this data and how best to access it in a data-science-friendly format, so that I can contribute to the development of automated ML methods for analyzing it.

Acceptance Criteria:

  • EDA notebook for the log data found here (see the access sketch below)
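As a starting point for that EDA, the Prow artifacts live in a public Google Cloud Storage bucket and can be listed anonymously with the `google-cloud-storage` client. The bucket name and prefix below are my assumptions about where the OpenShift CI logs live; verify them against the gcsweb links shown in the CI results.

```python
# Minimal sketch: list a few log objects from the (assumed) public OpenShift CI
# bucket without credentials. Bucket name and prefix are assumptions to verify.
from google.cloud import storage

client = storage.Client.create_anonymous_client()
bucket_name = "origin-ci-test"   # assumed public Prow artifacts bucket
prefix = "logs/"                 # assumed top-level prefix for periodic job logs

# List a handful of objects; the main job logs typically end in "build-log.txt".
for blob in client.list_blobs(bucket_name, prefix=prefix, max_results=20):
    print(blob.name, blob.size)
```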

Spike: Identify TestGrid Metrics/KPI's - PR Performance

As Data Scientists, our first task is to convert the raw data generated by the CI processes into meaningful KPIs/Metrics/Features (numbers) that we can track and use to describe the state or behavior of the CI process over time. One of the key elements of the CI process is the PRs that are being tested. The ability to understand the potential behavior of these code changes is critical.

PR KPIs could be things like "Number of commits before merge", "Diff size", "PR complexity", etc. As data scientists, we must admit that we are not currently subject matter experts in CI monitoring and do not know the best metrics to track for monitoring CI to support developers. As such, we need to perform a research spike and look for example KPIs used in the industry that we could collect and monitor from the TestGrid data.

Acceptance Criteria:

  • Open an issue with one new PR performance KPI. The issue must include a link to the resource used to discover the metric, an explanation of why it would be useful to track, and a brief outline describing how we could generate it from our existing data sources.

Identifying flaky tests in TestGrid data

A flaky test exhibits both passing and failing results on the same code. It therefore takes a lot of developer effort to manually determine whether a new failure is a flaky result or a legitimate failure. Hence, we are interested in identifying failures due to flaky tests in TestGrid data using data-driven methods.

Acceptance criteria:

  • Notebook on identifying flaky tests in TestGrid data (a toy heuristic sketch follows below)
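As one possible starting point (not the notebook itself), a simple heuristic is to count how often a test's status flips between pass and fail across consecutive runs of the same job; tests with many flips and a middling pass rate are flake candidates. A sketch over a TestGrid-style status vector (1 = pass, 0 = fail), with an arbitrary flip-rate threshold:

```python
# Toy flakiness heuristic: flag tests whose pass/fail sequence flips often.
# The 0/1 encoding and the 0.3 threshold are illustrative assumptions.
import numpy as np

def flip_rate(statuses):
    """Fraction of consecutive run pairs where the result changed."""
    s = np.asarray(statuses)
    if len(s) < 2:
        return 0.0
    return float(np.mean(s[1:] != s[:-1]))

def looks_flaky(statuses, threshold=0.3):
    s = np.asarray(statuses)
    return flip_rate(s) >= threshold and 0.0 < s.mean() < 1.0

print(looks_flaky([1, 1, 0, 1, 0, 1, 1, 0, 1]))  # True: alternates frequently
print(looks_flaky([1, 1, 1, 1, 0, 0, 0, 0, 0]))  # False: looks like a regression
```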

Missing commit IDs for OpenShift testgrids

For the k8s testgrids we have a commit ID for each run, marked in the red circle in the screenshot below:
[screenshot omitted]

But for the OpenShift testgrids we don't have commit IDs, as marked in the red circle in the image below:
[screenshot omitted]

We can collect a lot of metadata if a commit ID is available for each run. From those commit IDs we can create different features that might be used in the data analysis. Some of those features are:

  1. What type of file is changed in a commit? Test cases that fail due to changes in config files are very likely to be flaky.
  2. A test case that has failed on a git revision that changed a file which was recently changed by more than two authors is highly likely to be a real failure.
  3. A test case that has failed on a git revision where many source code files were changed is highly likely to be a real failure.

Acceptance criteria:

  • Communication with the appropriate team to include commit IDs in TestGrid.

Sanket up to speed with project

As a data scientist and contributor to this project I need to have a strong hands-on understanding of all the work done to date as well as knowledge of how to improve upon and extend the existing work.

Acceptance Criteria:

  • Use ocp-ci-analysis:latest image on https://jupyterhub-opf-jupyterhub.apps.cnv.massopen.cloud/ and successfully run every notebook in the notebooks directory.
  • Submit at least 1 Issue/PR fixing a bug, fixing graph formatting, clarifying an unclear notebook section, or adding a small additional data analysis to a notebook. (Look for something to improve as you go through the existing work 😃)
  • Familiarize yourself with the following 3 resources:
    * Sippy Repo and Dashboard for an example of metrics and TestGrid data analysis.
    * TestGrid Repo and Dashboard to familiarize yourself with our initial data source.
    * Prow and google cloud storage to see the underlying CI data informing these higher levels of abstraction.

ML Request: Implement a Predictive Test Selection Tool

In an effort to leverage the CI data available to us and improve the kubernetes development process through machine learning, we should look into the development of a predictive test selection tool that can be used to identify a limited number of tests that are most likely to find a regression for a given code change.

Please see this blog post from Facebook engineering outlining their approach.

As noted in the blog, the "system automatically develops a test selection strategy by learning from a large data set of historical code changes and test outcomes," which should be feasible for us given the data we have access to in this project.
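For orientation only, and not the approach from the blog post, here is a minimal sketch of what such a learned selector could look like: train a classifier on historical (code change, test) pairs with simple features and rank tests by predicted failure probability for a new change. Every feature name and the synthetic data are hypothetical placeholders.

```python
# Toy predictive test selection sketch with hypothetical features and fake data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
# Hypothetical features per (code change, test) pair:
#   [files_changed, change_touches_test_deps, historical_failure_rate]
X = rng.random((500, 3))
y = (0.6 * X[:, 1] + 0.4 * X[:, 2] + 0.1 * rng.random(500) > 0.6).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# Rank a handful of candidate tests for one new change and keep the top-k.
candidates = rng.random((10, 3))
scores = model.predict_proba(candidates)[:, 1]
top_k = np.argsort(scores)[::-1][:3]
print("run these tests first:", top_k, scores[top_k].round(2))
```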

Include the job run URL in the correlation results

As an end user investigating the underlying issue represented by the highly correlated failure sets, I would like to also be provided with the job run URL for the instances where these failures occurred, so that I can see more details about the failures and determine a root cause.

Acceptance Criteria:

  • Add job run URLs as an additional column to the output of Inittial_EDA.ipynb

Review flakiness detection in testgrid

We want to know if there is an opportunity to use ML techniques to improve upon the existing Flake detection tool currently being used by testgrid. To answer that question we first have to identify how the current Flake Detection tool is implemented.

Acceptance Criteria:

  • Explanation of Flake detection implementation.

Write overarching project doc

As a data scientist I want to make sure that this project is well defined, so that all stakeholders agree on the work to be done.

Acceptance Criteria

  • Project Document Agreed upon by all stakeholders

Catalog the existing Research Papers/Articles for flaky test detection.

Is your feature request related to a problem? Please describe.
There is a lot of research done on flaky test detection. We want to catalog/collect the existing research work. We can explore these research papers/articles in the future.

Describe the solution you'd like
A markdown document with a short summary of each research paper.

Review Sippy Analysis Output

As a data scientist I want to list what analysis output is generated by Sippy to determine if it could be recreated in a notebook environment.

Acceptance Criteria:

  • Jupyter Notebook that recreates values generated by Sippy or an explanation why it can't be done.

Complete Sippy Notebook EDA

The Sippy EDA notebook currently only looks at a portion of the available data set. As a data scientist and contributor to this project, I would like a full explanation of the aggregated CI data that I have access to, so that I do not have to repeat the discovery phase myself.

Acceptance Criteria:

  • EDA notebook is complete, including an exploratory section for each section of the Sippy data sample.

Spike: Identify TestGrid Metrics/KPIs - Test Performance

As Data Scientists, our first task is to convert the raw data generated by the CI processes into meaningful KPIs/Metrics/Features (numbers) that we can track and use to describe the state or behavior of the CI process over time. One of the key elements of the CI process is the tests. Understanding the health and behavior of these test suites is critical.

Test KPI's could be things like "Test Pass Rate", "Test Run Rate", "Number of Correlated Failures with Test", etc. Sippy is currently quantifying these types of metrics and might be a good place to start looking for examples. But, as data scientists, we must admit that we are not currently subject matter experts in CI monitoring and do not know the best metrics to track for monitoring CI platform test health. As such, we need to perform a research spike and look for example KPI's used in the industry that we could collect and monitor from the TestGrid Data.

Acceptance Criteria:

  • Create a markdown document of KPIs. Each entry must include a link to the resource used to discover the metric, an explanation of why it would be useful to track, and a brief outline describing how we could generate it from our existing data sources. (A toy pass-rate computation sketch follows below.)
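To make one of these concrete, a test pass rate can be computed directly from a TestGrid-style grid once it is in a DataFrame. A minimal sketch, assuming rows are tests, columns are runs, and cells are 1 for pass / 0 for fail (an illustrative encoding, not TestGrid's native one):

```python
# Toy KPI sketch: per-test and platform-wide pass rates from a pass/fail matrix.
# The 1/0 encoding and the tiny example grid are illustrative assumptions.
import pandas as pd

grid = pd.DataFrame(
    {"run_1": [1, 1, 0], "run_2": [1, 0, 0], "run_3": [1, 1, 1]},
    index=["test_a", "test_b", "test_c"],
)

per_test_pass_rate = grid.mean(axis=1)   # KPI: pass rate per test
overall_pass_rate = grid.values.mean()   # KPI: platform-wide pass rate
print(per_test_pass_rate)
print(f"overall pass rate: {overall_pass_rate:.2f}")
```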

Rename Project

Please add your suggestions for a new project name to this issue. We'll decide/vote on it at our next sprint meeting.

List of Potential OpenShift CI Data ML Projects

There are a number of potential avenues of investigation for providing ML or automated analysis to the OpenShift CI data. After an initial review of existing work the three ideas that have been presented to date are:

  • Identify canary failures
  • Analyze job runs with a large number of test failures
  • Look for correlation patterns in test failures

That said, I'm sure there are many more potential projects that could be pursued with CI data that could benefit OpenShift. Please use this issue as a forum to list and discuss these potential projects.

Documentation of different cell labels in the testgrid

Is your feature request related to a problem? Please describe.
There are different cell labels in the testgrids, for example: green cell, red cell, red cell with an 'F' annotation, purple cell, and cell with an 'R' annotation.

We want to understand the meaning of each of these cells and the logic behind each annotation or color.

Describe the solution you'd like
A Google doc and markdown with a description of all cell types and the logic behind their annotations.

ML Request: Implement a probabilistic flakiness score for tests

Develop and implement a probabilistic flakiness score for each test, as outlined in this article from Facebook engineering. It provides a reliable real-time metric that can give insight into the health of individual tests in a CI pipeline and tell engineers where to focus their efforts when updating tests.
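The article describes a trained probabilistic model; as a much simpler baseline to prototype against (explicitly not the article's method), one could score each test with the posterior mean failure rate under a Beta-Bernoulli model of its recent runs, which naturally hedges for tests with little history. A sketch with an arbitrary Beta(1, 1) prior:

```python
# Toy baseline (not the method from the article): Beta-Bernoulli failure score.
# score = posterior mean failure probability given a test's recent pass/fail history.
def flakiness_score(failures, runs, prior_fail=1.0, prior_pass=1.0):
    """Posterior mean of the failure rate under a Beta(prior_fail, prior_pass) prior."""
    return (failures + prior_fail) / (runs + prior_fail + prior_pass)

# A test that failed 3 of its last 40 runs vs. one that failed its only recorded run.
print(round(flakiness_score(3, 40), 3))  # ~0.095: lots of history, close to the observed 3/40
print(round(flakiness_score(1, 1), 3))   # ~0.667, not 1.0: little history, pulled toward the prior
```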

Remove dependency on "nbimporter" as it is no longer maintained and breaks pre-commit checks

Describe the bug

The nbimporter package is used throughout the repo to import functions from other notebooks, but it is no longer supported by its developers and breaks the pre-commit check. Recommend removing it and accessing shared functions another way.

To Reproduce

Steps to reproduce the behavior:

  1. Open a new notebook
  2. Import nbimporter
  3. Import function from adjacent notebook
  4. Run git add <new-notebook>
  5. Run pre-commit
  6. See error: F401 'nbimporter' imported but unused from flake8-nb

Expected behavior

pre-commit does not produce any F401 errors


Additional context

From the repo's readme:
[screenshot of the relevant README note omitted]

Collect a fixed train/test/validate data set for TestGrid

As a data scientist, it's important to have a fixed, immutable dataset to work with while developing, evaluating, and validating our initial models. Since the TestGrid data updates every day, there is potential for poor reproducibility of experiments if we don't maintain a fixed experimental data set before applying our methods to the live data.

Acceptance Criteria:

  • The maximum available Red Hat TestGrid data at the date of collection.

  • Stored and accessible in Ceph (or other public hosting); see the upload sketch below.
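Since Ceph exposes an S3-compatible API, a dated snapshot can be uploaded with `boto3` pointed at the Ceph endpoint. A minimal sketch; the endpoint URL, bucket name, and key layout are placeholder assumptions, and credentials are read from the environment:

```python
# Minimal sketch: push a dated, immutable TestGrid snapshot to S3-compatible
# storage (e.g. Ceph). Endpoint, bucket, and key layout are placeholder assumptions.
import datetime
import json
import os

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["S3_ENDPOINT_URL"],          # e.g. the Ceph RGW endpoint
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)

snapshot = {"collected": datetime.date.today().isoformat(), "grids": {}}  # placeholder data
key = f"ocp-ci-analysis/testgrid/{snapshot['collected']}/testgrid.json"

s3.put_object(
    Bucket=os.environ.get("S3_BUCKET", "ai4ci"),          # placeholder bucket name
    Key=key,
    Body=json.dumps(snapshot).encode("utf-8"),
)
print(f"uploaded snapshot to {key}")
```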

Write EDA Notebook based on available Sippy Data.

Beyond the failure correlations already started, there may be other features in the sippy data that could be used for additional analysis. In order to do our data science due diligence, we will create a notebook going through this dataset.

Initial EDA

At the onset of the project, as a data scientist I would like to examine the type of data we will be working with, as well as provide some minor insights around correlated tests.

Acceptance Criteria:

  • EDA notebook that explores sippydata.json

  • Find highly correlated failure sets in sippydata.json (a minimal correlation sketch follows below)
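For the correlation part, one straightforward approach is to build a runs-by-tests failure indicator matrix and inspect pairwise correlations between test columns; highly correlated pairs are candidate "failure sets". A minimal sketch on made-up data (no assumptions are made here about the actual structure of sippydata.json):

```python
# Toy sketch: find pairs of tests whose failures are highly correlated.
# The indicator matrix below is made up; in practice it would be built from the data set.
import numpy as np
import pandas as pd

# rows = job runs, columns = tests, 1 = test failed in that run
failures = pd.DataFrame(
    {
        "test_a": [1, 0, 1, 1, 0, 1],
        "test_b": [1, 0, 1, 1, 0, 1],   # always fails together with test_a
        "test_c": [0, 1, 0, 0, 1, 0],
    }
)

corr = failures.corr()
# keep only the upper triangle so each pair appears once and self-pairs are dropped
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack().sort_values(ascending=False)
print(pairs[pairs > 0.8])   # candidate highly correlated failure sets
```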

Links to resources

Vertical white column: usually means "install failed". That's because we can't run tests if the installer didn't complete, and test grid omits squares. Almost every "infrastructure" flake related to the process of CI will be in this category - if there is any green in the column, odds are the problem is either in the test or in the cluster, not in the CI cluster or the job itself (very rarely will this be something network related).

Rows with red interspersed with green: almost always a flaky test, but sometimes a core bug across multiple tests. You can guesstimate the frequency by counting the red squares and then visually estimating how many runs showed up over that interval (the boxes are regular, so I usually just hold up two fingers for ~20 results and see how many fall into that range). If you look at this page frequently, you can also narrow down the day when the flake was introduced, because it'll start flaking at some point.

Rows with solid red chunks: almost always a regression either in the test or the product. If you see it on test-grid, it usually means it's post merge, so either someone force merged (bad!) or the test behaves differently between PR and release jobs (for instance, auth and whether it's on quay).

Row with solid red chunk and white to the right: a new test was added that is failing when run in the release job. In the picture below, that's the storage test we added that worked in PR but didn't work against the older RHCoS image

Repeating vertical red bars: If you see a set of rows that all fail together on the same runs, that usually means a subsystem has a bug. Previously, we were seeing that on quota, so every 5-10 runs all of the quota tests would fail in a given run because the kube-apiserver stopped handling quota and all tests would fail.

Failure waterfall: If you see a meandering line moving from bottom to top, right to left, this almost always means "core control plane (cluster infra) flake during the run". This is because the sort order of the grid prioritizes failed runs, so you can see different tests hitting the flake each time. e2e tests are run in random order, so the same test is unlikely to fail twice if the tests run at different times on each run. Also, in e2e we re-run up to a limited number of failures at the end of the run to see whether these are reproducible failures or just flaky tests. If the test passes the second time we record it here (red square), but the run itself is allowed to pass if the limit is low.

In the picture below, the line is caused by the kube-apiserver doing a rolling restart after the e2e tests are started (which shouldn't be happening) but if graceful restart is working correctly, the tests shouldn't fail (the point of graceful restart is to drain all short lived requests before we stop accepting new connections). It impacts different tests each time, and some tests are more impacted than others.
-- Clayton

[screenshot omitted]

Relative link for the readme.md file in project-doc.md

Is your feature request related to a problem? Please describe.
Currently, we keep a copy of the readme.md file at docs/publish/project-doc.md. Hence, whenever we update the readme.md file we also need to update docs/publish/project-doc.md.

Describe the solution you'd like
Use a relative link in one of the markdown files to avoid two different copies of the same file.

Acceptance criteria
A relative link to the readme.md file in docs/publish/project-doc.md

Milestone 1: EDA Notebook and project doc on operate first website

We want to make sure that the work we are doing for the OCP CI data analysis is easy to follow and interact with so that we can get more contributions from other Data Scientists.

Acceptance Criteria:

Include well-written and polished versions of the following on operate-first.github.io:

  • Project Document: Outlining project goal
  • Rendered notebook on initial testgrid EDA
  • Rendered notebook on in-depth testgrid EDA
  • Polish all content, focused on ease of use by new contributors

Oindrilla up to speed with project

As a data scientist and contributor to this project I need to have a strong hands-on understanding of all the work done to date as well as knowledge of how to improve upon and extend the existing work.

PLEASE READ: This issue should be used as a template. Please make a copy of it and replace <NAME> with your name when creating the new issue.

Acceptance Criteria:

  • Use ocp-ci-analysis:latest image on https://jupyterhub-opf-jupyterhub.apps.cnv.massopen.cloud/ and successfully run every notebook in the notebooks directory.
  • Submit at least 1 Issue/PR fixing a bug, fixing graph formatting, clarifying an unclear notebook section, or adding a small additional data analysis to a notebook. (Look for something to improve as you go through the existing work 😃)
  • Familiarize yourself with the following 3 resources:
    * Sippy Repo and Dashboard for an example of metrics and TestGrid data analysis.
    * TestGrid Repo and Dashboard to familiarize yourself with our initial data source.
    * Prow and google cloud storage to see the underlying CI data informing these higher levels of abstraction.

<NAME> up to speed with project

As a data scientist and contributor to this project I need to have a strong hands-on understanding of all the work done to date as well as knowledge of how to improve upon and extend the existing work.

PLEASE READ: This issue should be used as a template. Please make a copy of it and replace <NAME> with your name when creating the new issue.

Acceptance Criteria:

  • Use Openshift CI Analysis Notebook Image on https://jupyterhub-opf-jupyterhub.apps.smaug.na.operate-first.cloud/ and successfully run every notebook in the notebooks directory.
  • Submit at least 1 Issue/PR fixing a bug, fixing graph formatting, clarifying an unclear notebook section, or adding a small additional data analysis to a notebook. (Look for something to improve as you go through the existing work 😃)
  • Familiarize yourself with the following 3 resources:
    * Sippy Repo and Dashboard for an example of metrics and TestGrid data analysis.
    * TestGrid Repo and Dashboard to familiarize yourself with our initial data source.
    * Prow and google cloud storage to see the underlying CI data informing these higher levels of abstraction.

Documentation: Continuous Integration Artifacts From a Data Science Perspective

As a Data Scientist interested in applying my machine learning expertise to the problem of developing intelligent CI/CD tools, I would like clear and concise documentation explaining the CI/CD process, giving special attention to the data types and artifacts (logs, metrics, bug reports, code diffs, etc.) generated by these development processes and how these data artifacts relate to each other, so that there is a lower barrier to entry to making meaningful contributions in this domain.

My assumption is that the average data scientist has little experience with the inner workings of large-scale application development infrastructure. This lack of domain expertise could be a major blocker to contributions. I want to make sure that we have a simple to understand, well vetted (accurate), and singular "anatomy of the Kubernetes/OpenShift CI process" documented that contributors can reference when developing new tools.

This should also address the need in our planning document for an "Anatomy of Kubernetes/OpenShift CI Data".

Acceptance Criteria:

Include `Testgrid_flakiness_detection.ipynb` notebook on OperateFirst website.

Is your feature request related to a problem? Please describe.
We want the flakiness detection notebook on the Operate First website so that we get more contributions/feedback from other Data Scientists.

Describe the solution you'd like
Rendered Testgrid_flakiness_detection.ipynb notebook on the Operate First website

Spike: Research Existing AIOps Features/Offerings

In an effort to drive open source AIOps for CI/CD, we want to ensure that we have a complete and up-to-date understanding of what features are available, being developed, and considered state-of-the-art, both in industry and in the open source community. To start, we will do a research spike identifying the existing offerings by leading AIOps service providers.

Acceptance Criteria:

  • Open an issue outlining an opensource alternative to an offering provided for each leading AIOps service provider.

Some existing providers

Karan up to speed with project

As a data scientist and contributor to this project I need to have a strong hands-on understanding of all the work done to date as well as knowledge of how to improve upon and extend the existing work.

Acceptance Criteria:

  • Use ocp-ci-analysis:latest image on https://jupyterhub-opf-jupyterhub.apps.cnv.massopen.cloud/ and successfully run every notebook in the notebooks directory.
  • Submit at least 1 Issue/PR fixing a bug, fixing graph formatting, clarifying an unclear notebook section, or adding a small additional data analysis to a notebook. (Look for something to improve as you go through the existing work 😃)
  • Familiarize yourself with the following 3 resources:

Downloading GitHub metadata from commit IDs

Is your feature request related to a problem? Please describe.
In TestGrid, we have a commit ID for each run. We want to download the GitHub metadata for these commit IDs.

Acceptance criteria
Download the following metadata for each commit ID (a minimal API sketch follows below):

  1. What files are changed in a particular commit
  2. Author of the commit
  3. For each changed file: how many times that file has been changed previously
  4. For each changed file: how many authors have edited that file
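A minimal sketch of items 1 and 2 using the GitHub REST API commits endpoint with `requests`; the repository and SHA are placeholders, items 3 and 4 would need additional per-file history queries, and a token (read here from an optional environment variable) is advisable to avoid rate limits.

```python
# Minimal sketch: pull changed files and author for one commit via the GitHub REST API.
# Repo and SHA are placeholders; items 3 and 4 would need extra per-file history calls.
import os
import requests

OWNER, REPO = "openshift", "origin"      # placeholder repository
SHA = "<commit-sha>"                     # placeholder commit ID taken from TestGrid

headers = {"Accept": "application/vnd.github+json"}
token = os.environ.get("GITHUB_TOKEN")
if token:
    headers["Authorization"] = f"token {token}"

resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/commits/{SHA}",
    headers=headers,
    timeout=30,
)
resp.raise_for_status()
commit = resp.json()

print("author:", commit["commit"]["author"]["name"])                # item 2
print("changed files:", [f["filename"] for f in commit["files"]])   # item 1
```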

Hema up to speed with project

As a data scientist and contributor to this project I need to have a strong hands-on understanding of all the work done to date as well as knowledge of how to improve upon and extend the existing work.

PLEASE READ: This issue should be used as a template. Please make a copy of it and replace <NAME> with your name when creating the new issue.

Acceptance Criteria:

  • Use ocp-ci-analysis:latest image on https://jupyterhub-opf-jupyterhub.apps.cnv.massopen.cloud/ and successfully run every notebook in the notebooks directory.
  • Submit at least 1 Issue/PR fixing a bug, fixing graph formatting, clarifying an unclear notebook section, or adding a small additional data analysis to a notebook. (Look for something to improve as you go through the existing work 😃)
  • Familiarize yourself with the following 3 resources:
    * Sippy Repo and Dashboard for an example of metrics and TestGrid data analysis.
    * TestGrid Repo and Dashboard to familiarize yourself with our initial data source.
    * Prow and google cloud storage to see the underlying CI data informing these higher levels of abstraction.

Find a place to publicly host data sets

Although the data is public, we will want an additional location to store our interim datasets as we analyze them, as well as to keep immutable test/train/validation sets that do not change daily and are independent of the source service's availability.

This should be done with a Ceph bucket hosted on the MOC-ODH environment.

Acceptance Criteria:

  • Publicly hosted object storage on MOC
