
rht-labs / sre-enablement-content


Home Page: https://rht-labs.com/sre-enablement-content/#/

License: Apache License 2.0

Mustache 2.41% Shell 2.48% Jsonnet 95.11%

sre-enablement-content's Introduction

sre-enablement-content

Exercises

Exercises are created using Docsify. Write the docs in Markdown and use the Docsify CLI to serve them. Store your lab exercises in the docs/<lab-number> directory.

To run and serve the docs:

  1. Install the CLI: npm i -g docsify-cli
  2. Serve the docs: docsify serve docs, then open http://localhost:3000 in a browser

sre-enablement-content's People

Contributors

ckavili, eformat, jfilipcz, tylerauerbeck


sre-enablement-content's Issues

[facilitator guide] Update Deployment Facilitator Guide

๐Ÿ“ Description

After adding steps to introduce problems, update facilitator guide with appropriate information to initiate various problem scenarios (and steps to resolve them)

โœ… A/Cs

  • Update facilitator guide with appropriate steps to initiate problems
  • Update facilitate guide with appropriate steps to resolve problems (in cases where students become stuck)

Related to #7 - Update guide to tell facilitators when to enact the planned problems/incidents

[student guide] - PRR

๐Ÿ“ Description

The purpose of the PRR section is to introduce the concept of a PRR to students and have them walk through building a PRR. What we want to do is start collecting information on what students think are important when running a production system.

This is more or less 90% done, just the below A/Cs left to be completed.

โœ… A/Cs

  • Update student guide and more clearly define / call out(visually) what tasks they should be completing.
  • Clearly defined tasks/exercises

[exercise] Create/Identify Problem Scenarios For Pet Battle Deployment

📝 Description

We want to introduce the capability and mindset of troubleshooting and resolving problems early in the course. To do that, we need at least a handful of scenarios that cause small problems during the deployment of Pet Battle for students to investigate. We also want facilitators to be able to initiate these scenarios where possible. Current ideas include blocking required images, resource limits, RBAC, uploading dog pics, etc.

✅ A/Cs

  • Identify initial problem scenarios
  • Create manifests/artifacts to put problems into place
  • Ensure problems are repeatable
  • Ensure problems are able to be initiated by facilitator

[facilitator guide] - Facilitator Guide Formatting

๐Ÿ“ Description

Our current goal is to maintain as few disparate areas of documentation as possible. Currently we have notes for our students and facilitators in the same markdown files. What would be useful (and would allow us to prevent confusion between the two groups / prevent us from potentially giving away the answers) would be a way of hiding certain areas of content unless something like a param/flag/path is provided.

Example: The sections labeled "Facilitator Guide" in this page:
https://rht-labs.com/sre-enablement-content/#/1-production-readiness-review/1-before-onboarding-petbattle

✅ A/Cs

  • Identify whether it is possible to hide certain areas of markdown content unless a certain condition is met utilizing the existing framework that we're using.

[exercise] Create Chaos Testing

๐Ÿ“ Description

Write some chaos testing content to run through some basic experiments and discuss findings within teams.

✅ A/Cs

  • Learn how to create chaos engineering experiments.
  • Monitor and observe the effects of our experiments on our applications.
  • Have conversations about how to mitigate weaknesses in our design based on the findings.

[facilitator guide] Create facilitator guide for postmortem exercise

๐Ÿ“ Description

We should make sure to have documentation highlighting the key components we want to enfoce within our postmortem exercise

โœ… A/Cs

  • Reiterate the importance of a blameless postmortem.
  • Underling the importance of learning from both the problem itself as well as the way the team reacted to it
  • Highlight the importance of having defined action items coming out of this to address the original problem and events that came about after it occurred.

[student guide] Create student guide for running postmortems

📝 Description

Based on the problems introduced during the deployment and incident management modules, we will run a postmortem to learn what we can from the problems that were encountered. So that students understand what we are doing, we need to create a student guide to walk them through this exercise.

✅ A/Cs

  • Create user documentation that introduces the idea of postmortem discussions and their purpose
  • Create exercise steps to walk through creating a postmortem template and highlight the importance of the pieces of information you should look to gain from it.

[exercise] Create a postmortem template

๐Ÿ“ Description

The way that a postmortem is run can directly impact the information that is shared there and the resulting actions that are taken. To ensure that one is run smoothly, it can be beneficial to have a template that helps you focus in on what you are trying to achieve. In this exercise, we'll look to build our template and highlight the key areas that are important to a general postmortem

โœ… A/Cs

  • Create building blocks that can be used to put together a postmortem template
  • Identify if there are existing templates in use by other groups at RH (ideally) or other good examples from others out in the wild that we can link to / reference.

Related to #10

[exercise] Create SLA Template

๐Ÿ“ Description

Now that we have a consumer of our application, we want to inspect the differences between crafting an SLO and an SLA. In order to begin understand the types of questions we should be asking, we should go through an exercise of crafting an SLA template.

โœ… A/Cs

  • Create user documentation with examples of concerns that should be included in an SLA
  • Create a template that can be used to establish an SLA between Pet Battle and the Tournament Service
  • Identify whether there are example SLA templates that we can reference (internally/externally)

[exercise] Introduce manual tasks to represent toil after initial application deployment

๐Ÿ“ Description

After the initial deployment of our application, we want to show that not everything has been automated and that there is still some toil that exists in our life. We should come up with a set of tasks that needs to be "hand-rolled" once the rollout of PB completes. The current idea for this could be having to manually upload a base set of cat images to populate Pet Battle (but could include other tasks as well).

โœ… A/Cs

  • Identify manual tasks (i.e. image upload)
  • Show manual process within exercise
  • Identify somewhere to store an initial set of base image (github repo that we can source from repeatedly)
  • Upload base set of cat images
  • Create tasks that would pull image from repository into cluster ( store into PVC)
  • Create tasks that would load images from PVC into application
  • Create documentation showing student how to accomplish these tasks (and reduce this toil)

📚 [exercise] - SLI for Platform

📝 Description

The purpose of the Platform SLIs section is to define the right indicators and data points for tracking how happy users are with the platform.

✅ A/Cs

  • Facilitator notes updated (if applicable)
  • Exercise peer reviewed/tested with one other region member
  • Addition of new exercise does not affect previous exercise (maintain modularity)

๐Ÿพ Steps

  • Describe the platform and the capabilities to provide its end-users briefly
  • Introduce a persona and define user journeys for the platform.
  • Select one (important) user journey and map it in the canvas
  • Draw system boundaries and map them to the journey
  • Define the SLIs
  • Select one SLI and define the metrics
  • Create a dashboard & screenshot

[exercise] Create dashboard to track current system experience based on example platform/app SLI and SLO

๐Ÿ“ Description

Now that we've defined an example set of of SLI/SLO, we should look to visualize them. We should look to begin building a dashboard that allows us to see what's going on in the system and make decisions based off of it without having to parse through json/log/etc.

โœ… A/Cs

  • Introduce users to Grafana
  • Show how to pull in various data sources (prometheus)
  • Basic visualization of SLI and SLO's
  • Basic alertmanager configuration to alert when SLO is breached.
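
As a rough illustration of what such a dashboard and alert would compute, the sketch below derives an availability SLI from request counts and compares it with an SLO target. The function names, the request counts, and the 99% target are assumptions made for illustration; in the exercise these values would come from the metrics stored in Prometheus.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Return the fraction of requests that were served successfully."""
    if total_requests == 0:
        # Assumption: with no traffic, treat the objective as met.
        return 1.0
    return good_requests / total_requests


def slo_met(sli: float, slo_target: float) -> bool:
    """Return True when the measured SLI is at or above the SLO target."""
    return sli >= slo_target


# Hypothetical numbers: 9,950 successful requests out of 10,000.
sli = availability_sli(good_requests=9_950, total_requests=10_000)
print(f"SLI: {sli:.4f}")
print("SLO met:", slo_met(sli, slo_target=0.99))
```

An alert on SLO breach is then the negation of the same comparison, evaluated over a chosen time window.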

[exercise] Introduce deployment of tournament service

๐Ÿ“ Description

In order to have a consumer of our new service, we will look to deploy the tournament service. This will allow us to begin discussing SLA's and the impact of changing priorities on an SRE team.

โœ… A/Cs

  • Update user documentation to show how to update helm chart to deploy and configure tournament service

Related to #16

[exercise] Expand dashboard to visualize Error Budget

๐Ÿ“ Description

After establishing our Error Budget (EB) and Error Budget policy, we should look to further visualize our EB process by adding it to both our dashboards and alerting

โœ… A/Cs

  • Expand our previously created dashboard to add EB information
  • Add simple alerting in order to make team aware of EB violations.
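
The arithmetic behind an error budget can be sketched as follows, assuming a request-based availability SLO. The function names and the sample numbers (a 99% target, 10,000 requests, 50 failures) are hypothetical and are not taken from the course material.

```python
def allowed_failures(slo_target: float, total_requests: int) -> float:
    """Number of failed requests the SLO tolerates in the window."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    budget = allowed_failures(slo_target, total_requests)
    return 1 - failed_requests / budget


# Hypothetical window: 10,000 requests, 50 failures, 99% SLO.
remaining = budget_remaining(0.99, 10_000, 50)
print(f"Error budget remaining: {remaining:.0%}")
if remaining < 0:
    print("Error budget exhausted: apply the Error Budget policy")
```

A dashboard panel could plot the remaining fraction over time, and a simple alert could fire when it drops below a threshold the team agrees on.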

[exercise] Add automation that allows us to force breach of SLO

๐Ÿ“ Description

In order to exercise and experience an SLO being breached, we should have automation (or a trigger of some sort) that allows us to force errors that will cause an SLO to go red.

✅ A/Cs

  • Based on the example SLI and SLO that we have created, create a piece of automation that causes the collected metrics to miss the SLO target.
  • Facilitator has documentation for how to trigger this automation to inject errors
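
To make the idea concrete, the following sketch simulates injecting failures into a stream of requests and shows the SLI dropping below the target. It is a plain in-memory simulation, not the automation this issue asks for; the 99% target, the `fail_every` parameter, and the function names are all assumptions for illustration.

```python
from typing import List, Optional

SLO_TARGET = 0.99  # assumed availability target


def simulate_requests(total: int, fail_every: Optional[int] = None) -> List[bool]:
    """Return request outcomes (True = success).

    When fail_every is set, every fail_every-th request is forced to fail,
    standing in for a facilitator-triggered fault.
    """
    outcomes = []
    for i in range(1, total + 1):
        injected_failure = fail_every is not None and i % fail_every == 0
        outcomes.append(not injected_failure)
    return outcomes


def availability_sli(outcomes: List[bool]) -> float:
    """Fraction of successful requests."""
    return sum(outcomes) / len(outcomes)


baseline = availability_sli(simulate_requests(1_000))
degraded = availability_sli(simulate_requests(1_000, fail_every=20))
print(f"baseline SLI: {baseline:.3f}, breached: {baseline < SLO_TARGET}")
print(f"degraded SLI: {degraded:.3f}, breached: {degraded < SLO_TARGET}")
```

Failing one request in every 20 gives a 5% error rate, which is enough to push a 99% SLO into breach.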

[video] - Intro To Pet Battle Product

📝 Description

Currently we "throw" students into a scenario where we want them to support a product with little context beyond a high-level overview of our story. A fun product video might help people wrap their heads around it (and could also bring in some of the fun that we like to include).

✅ A/Cs

  • Create a video explaining Pet Battle
  • Call out high level components
