
packit / deployment


Ansible playbooks and scripts for deploying packit-service to OpenShift

License: MIT License

Makefile 1.82% Python 23.78% Dockerfile 0.31% Shell 7.90% Jinja 66.19%
Topics: ansible, hacktoberfest

deployment's Introduction

Packit


Packit is a CLI tool that helps developers auto-package upstream projects for the Fedora operating system.

You can use packit to continuously build your upstream project in Fedora.

With packit you can create SRPMs, open pull requests in dist-git, submit koji builds and even create bodhi updates, effectively replacing the whole Fedora packaging workflow.


To start using Packit

See our documentation

To start developing Packit

The Contributing Guidelines host all the information you need to contribute to the code and documentation, run the tests, and set up additional configuration.

Workflows covered by packit

This list contains the workflows covered by the packit tool, with links to the documentation.

Requirements

Packit is written in Python 3 and supports Python 3.9 or later.

Installation

For complete information on how to start using packit, please click here.

User configuration file

User configuration file for packit is described here.

Who is interested

For the up to date list of projects which are using packit, click here.

Logo design

Created by Marián Mrva - @surfer19

deployment's People

Contributors

dhodovsk, icewreck, jpopelka, lachmanfrantisek, lbarcziova, majamassarini, mfocko, mmuzila, nforro, nikromen, pre-commit-ci[bot], rishavanand, rpitonak, sakalosj, shreyaspapi, softwarefactory-project-zuul[bot], subpop, tomastomecek


deployment's Issues

Rename stream to [centos-]stream-source-git

For the Fedora source-git bot we have the fedora-source-git OpenShift project and DNS domain.
But the OpenShift projects and DNS domains for the CentOS Stream source-git bot contain only "stream", i.e. stream-[prod|stg] projects in OpenShift and [prod|stg].stream.packit.dev DNS domains.

Some time ago we discussed with @csomh that the CentOS Stream Source-git bot projects and DNS domain names should also include the "source-git".

Would be nice to do this together with #380

Start new rollouts when secrets or configuration changes

With Ansible it should be possible to register when a secret or configuration was changed in the cluster. Register these changes when they happen and force a new rollout of the deployments which depend on them.

This would save us from starting rollouts manually when all that a make deploy does is update some secrets or configs.
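One way to implement this, sketched below under the assumption that the playbooks have the secret/config files available locally at deploy time (the file names are hypothetical), is to hash the content and attach the digest as a pod-template annotation; the deployment then rolls out whenever the digest changes:

```python
import hashlib
from pathlib import Path

def config_checksum(paths):
    """Stable digest over file names and contents; any change produces a new
    digest, which can be attached as a pod-template annotation to force a rollout."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        data = Path(path).read_bytes()
        digest.update(path.encode())
        digest.update(len(data).to_bytes(8, "big"))  # length prefix avoids ambiguity
        digest.update(data)
    return digest.hexdigest()
```

In an Ansible task this digest would be set as something like a checksum/config annotation on the pod template, so the cluster itself performs the rollout when the annotation changes.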

Configuring Secrets while deploying packit service for development

I see there is a requirement of the following files:

private-key.pem - mentioned in the secrets list in

https://github.com/packit-service/deployment/blob/master/secrets/README.md

This might be a mundane question, but the documentation asks me to generate a private key using a GitHub app.
https://developer.github.com/apps/building-github-apps/authenticating-with-github-apps/#authenticating-as-a-github-app
Do I need to create a new GitHub app for this? How exactly should I proceed with creating the private-key PEM?

UX of moving stable branches

Planned:

  • Print warning if init has not been run and exit
  • Add a Makefile target that could run init if necessary and then the script (#162)
  • (refactor) clean up the subprocess calls

Trailing newlines stripped from secret files

When a file which we put into a k8s Secret contains a newline at the end, the newline is removed somewhere in the process.
This can cause problems, for example with ssh keys.
This occurs when running make deploy (if you remove the comment suffix from the secrets files).
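The issue doesn't pinpoint where the newline is lost; a common culprit is a strip()/trim step (for example a template filter) applied before the content is base64-encoded into the Secret. A minimal illustration of lossy vs. safe handling:

```python
import base64

# Hypothetical illustration: a k8s Secret stores base64 of the *raw* bytes,
# so any strip()/trim applied before encoding silently drops the final newline.
key = b"-----BEGIN OPENSSH PRIVATE KEY-----\n...\n-----END OPENSSH PRIVATE KEY-----\n"

lossy = base64.b64encode(key.strip()).decode()  # e.g. a template filter trimming whitespace
safe = base64.b64encode(key).decode()           # encode the bytes exactly as read

print(base64.b64decode(lossy).endswith(b"\n"))  # False: ssh rejects such a key
print(base64.b64decode(safe).endswith(b"\n"))   # True
```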

validation tests: make the time constraints more generous

Last two days, two validation checks failed even though all GitHub check runs were green after some time:

https://github.com/packit/hello-world/runs/6587387284
https://sentry.io/share/issue/3d1ddfb488cd40eaae3550af18f15d00/
https://sentry.io/organizations/red-hat-0p/issues/3297006723/?referrer=alert_email&alert_type=email&alert_timestamp=1653371582691&alert_rule_id=855946&environment=production

Let's make the time limits more generous and aligned with our actual OKRs.

Example: the check "Basic test case: copr build and test" (https://github.com/packit/hello-world/pull/648) failed with "These check runs were not completed"; 20 minutes is too strict.
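A sketch of what a more generous wait could look like in the validation script (the 40-minute default and the check-run shape are assumptions, not the script's actual code):

```python
import time

def wait_for_checks(fetch_runs, timeout=40 * 60, interval=30):
    """Poll GitHub check runs until all are completed, with a generous
    deadline aligned with our OKRs rather than a hard 20 minutes."""
    deadline = time.monotonic() + timeout
    while True:
        runs = fetch_runs()
        if runs and all(run["status"] == "completed" for run in runs):
            return runs
        if time.monotonic() >= deadline:
            raise TimeoutError(f"check runs not completed within {timeout}s")
        time.sleep(interval)
```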

Move staging projects from auto-stage to auto-prod

Per Hubert's e-mail from 9/29/22, the auto-stage cluster will be de-provisioned after we move our staging projects to the auto-prod cluster.

  1. Create packit-stg, stream-stg and fedora-source-git-stg projects at auto-prod cluster.
  2. Switch the DNS. We probably want to regenerate the TLS certs but I'm not sure.

(For each project)

  1. Update the host in vars/*/stg[_template].yaml
  2. First, deploy only postgres (DEPLOYMENT=stg SERVICE=xyz make deploy TAGS=postgres); copy db data
  3. Deploy the rest of the project (DEPLOYMENT=stg SERVICE=xyz make deploy)
  4. Add the team members to the project. (Developer -> Project -> Project Access)
  5. Scale down the old project

Don't deploy on Saturday

I was just checking production deployment and realized it was deployed on Saturday - can we change the auto-deployment thingy to run on Sunday night and not on Saturday?

Build and push the cron-job images

Create a GitHub workflow which, when a change in cron-jobs/*/ is merged to the main branch, builds the corresponding image and pushes it to Quay.io (delete-pvcs, import-images, packit-service-validation).

Additionally: check whether jobs pull a new image from Quay.io when they are started, so that we know whether a new deployment needs to be triggered in addition to rebuilding the image.
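The workflow's path filter could be backed by a small mapping from changed files to images; a sketch, assuming the cron-jobs/<name>/ layout mentioned above:

```python
from pathlib import PurePosixPath

# The three cron-job images named in the issue.
CRON_JOB_IMAGES = {"delete-pvcs", "import-images", "packit-service-validation"}

def images_to_rebuild(changed_files):
    """Map files changed in a push to main to the cron-job images needing a rebuild."""
    jobs = set()
    for changed in changed_files:
        parts = PurePosixPath(changed).parts
        if len(parts) > 1 and parts[0] == "cron-jobs" and parts[1] in CRON_JOB_IMAGES:
            jobs.add(parts[1])
    return sorted(jobs)
```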

Fill the repository-cache

This issue just tracks the work and does not require any work on this repository.

  • Locally, prepare (=git clone) a set of repositories we want to include in the cache.
    • kernel
    • + one of our projects to test this workflow
    • + some other big project(s): let's start slowly and verify on our projects first.
  • Use oc rsync to sync the content to all of the volumes in both of our deployments.
    • For kernel, this is the only solution. For other projects, you can clone the repo from the pod shell.
  • Document the workflow in this repository.
  • Think about possible automation and tooling when you are working on this. #234
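The oc rsync step above could be scripted roughly like this; the pod names and the /repository-cache mount path are hypothetical, and the commands are only built here, not executed (run them with subprocess once logged in to the right project):

```python
def rsync_commands(pods, local_cache="repository-cache/", remote_dir="/repository-cache"):
    """Build one `oc rsync` invocation per pod that should receive the
    pre-cloned repositories on its mounted cache volume."""
    return [["oc", "rsync", local_cache, f"{pod}:{remote_dir}"] for pod in pods]
```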

[Spike-ish] Use kubernetes.core.k8s apply=true by default?

The apply parameter docs says:
"apply compares the desired resource definition with the previously supplied resource definition, ignoring properties that are automatically generated. apply works better with Services than force=yes."

I don't quite understand how it's actually different from the default behavior (what are the "properties that are automatically generated"?), but @csomh says it helps when there are properties being removed in the object.

Investigate:

Automate moving of the stable branches

We'd like to automate moving the stable branches and creating a blog post so that the scripts don't need to be run locally.

Definition of done:

  • moving of the stable branches and creation of the weekly packit.dev blog posts are run in a Github action
    • the work that is currently done by running a script locally will be moved to a GitHub action that can be triggered manually in Github UI (=> the script can run non-interactively)
    • the PR with the blog post will be opened automatically by running the action and the content can be modified by a person
  • instructions for the Service Guru role are updated

When this is done, our team will benefit from the elimination of human error during the process.

Failed to patch object: sts.packit-worker, updates to statefulset spec are forbidden

From time to time make deploy fails with

fatal: [localhost]: FAILED! => {"changed": false, "error": 422, "msg": "Failed to patch object: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"StatefulSet.apps \\"packit-worker\\" is invalid: spec: Forbidden: updates to statefulset spec for fields other than \'replicas\', \'template\', and \'updateStrategy\' are forbidden.","reason":"Invalid","details":{"name":"packit-worker","group":"apps","kind":"StatefulSet","causes":[{"reason":"FieldValueForbidden","message":"Forbidden: updates to statefulset spec for fields other than \'replicas\', \'template\', and \'updateStrategy\' are forbidden.","field":"spec"}]},"code":422}\n'", "reason": "Unprocessable Entity", "status": 422}

One has to oc delete sts/packit-worker and re-run the deployment.
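The 422 comes from Kubernetes itself: StatefulSet spec fields other than replicas, template and updateStrategy are immutable. A hypothetical helper the playbook could use to detect up front that delete-and-recreate is needed, instead of failing mid-deploy:

```python
# Kubernetes only allows in-place updates to these StatefulSet spec fields;
# touching anything else (e.g. volumeClaimTemplates) returns the 422 above.
MUTABLE_STS_FIELDS = {"replicas", "template", "updateStrategy"}

def needs_recreate(old_spec, new_spec):
    """True when the desired spec differs in fields that cannot be patched,
    i.e. when `oc delete sts/...` plus a re-deploy is the only way forward."""
    changed = {
        key
        for key in old_spec.keys() | new_spec.keys()
        if old_spec.get(key) != new_spec.get(key)
    }
    return bool(changed - MUTABLE_STS_FIELDS)
```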

Add redis & postgres image streams

On one of the recent architecture meetings, we (cc @praiskup) realized that we don't update our redis & postgres, i.e. we still use the same versions of container images even though they've been meanwhile updated in the external container registry.

TODO:

  • Add image streams for redis & postgres; they will (like the already existing ones) have scheduled: {{ auto_import_images }}, so for stg they'd be automatically synchronized with the external image registry and for prod they'd be synchronized manually/periodically by our import-images cron job. Use those image streams in the DeploymentConfigs.
  • Add those image streams to the import-images cron job, then rebuild and redeploy it.
  • Bonus point: since this means redeploying redis & postgres anyway, we can at the same time change them from DeploymentConfig to Deployment, which is the successor of DeploymentConfig and generally preferred in OCP 4.

packit-service-validation: check both stg and prod

With the more granular split of prod and stg, we can now better pinpoint jobs and /packit[-stg] comments to specific deployments.

At the same time, the validation script did not account for the stg deployment.

This issue tracks work so that the validation script can check both prod and stg (design and implementation are open).

This will give us a daily update on the status of the stg deployment, so we can become aware of problems in stg faster.

We should be able to define which test cases are triggered against both deployments to not overload the infra.

Please comment if you disagree or have ideas about stg validation.

Refinement notes:

  • create a new cron job
  • make the cron job configurable

Automate usage of `changelog.py`

Our release process is already quite automatic but we can take it even one step further!

The changelog script could be automatically run based on some kind of indication that we want to create a release and a PR for the release could be created automatically so that we would just have to review and merge. Perhaps Github actions could be used for this?

We also need to keep in mind that the generated output is not always perfect and may need some manual love, so the approach must support this flow.

CentOS Stream 9 based base image

The base image we use for building our images is Fedora-based.

Tomáš has suggested using CentOS Stream 9 instead so that we don't have to bump the base image (to newer Fedora) so often.

I checked what base image Software Collections use (we use their c9s-based redis & postgres) and, if I'm checking the right place, it's quay.io/centos/centos:stream9.

The aim of this spike is to check whether all the dependencies installed in our (above-mentioned) images are available for c9s as well and what's missing.

Fedora-36/37 and Centos-7 compatible deployment

We need our deployment to be able to run on Fedora 36/37 and CentOS 7 (Zuul).
Figure out how to do that, options:

  • have ansible 2 and 5 compatible playbooks
  • install ansible-5 from PyPI in Zuul
  • try ansible-navigator (it runs in a container environment)

To be done before #384

Run validation job on GitLab

Improve the validation script to run on GitLab as well; namely, we can use the https://gitlab.com/packit-service/hello-world repo.


Part of packit/packit-service#1821 epic.

Provide a script for manipulating with the repository-cache

  • placed in the deployment repository
  • can add a new repository to all of the volumes (for a start, we can somehow specify a list of pods/volumes/...)
    • rsync mode for bigger repositories
    • clone in pod mode (optional)

Implementation notes:

  • For workers, the volumes are mounted and accessible from the worker pod.
  • For sandcastle pods, we need to create a new pod that will mount the volume => it blocks the deployment.
  • => We can use a different approach for both or create a new pod each time and scale down the workers.

Keep in mind that this will be extended to support updates and will run as a cron job in the future.

Make it easier to create secrets for local deployment

Documentation for how to create secrets for a local deployment is missing.

We know that we want steps similar to the ones taken when setting up an env in CI (thanks @TomasTomecek!).

Ideally, we should factor out those tasks into a playbook in this repo and create a make target which would run this playbook and create all the secrets needed for a local deployment.

Also update the README documenting the above.

CentOS Stream based sclorg postgres & redis images

More aggressive delete-pvcs cron job

Currently the cron job runs every hour and deletes PVCs older than 1 day.
Recently we hit a noncompute/storage quota, and when checking PVCs in packit-prod-sandbox there were about a dozen PVCs no older than 2 hours.
So we should either delete PVCs older than 1-2 hours or, even better, somehow check whether a PVC is used by any running pod and, if not, delete it.
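The "delete only unused PVCs" variant could look roughly like this; the PVC and pod shapes are simplified stand-ins for what oc get -o json returns:

```python
from datetime import datetime, timedelta, timezone

def pvcs_to_delete(pvcs, pods, max_age=timedelta(hours=2)):
    """PVCs that are both unmounted by any running pod and older than max_age."""
    in_use = {claim for pod in pods for claim in pod["claims"]}
    now = datetime.now(timezone.utc)
    return [
        pvc["name"]
        for pvc in pvcs
        if pvc["name"] not in in_use and now - pvc["created"] > max_age
    ]
```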

Unify the OpenShift cron-job deployment approach

Current state:

  • delete-pvcs and import-images use their own Makefile
  • validation is deployed via a separate target in the root Makefile

The separate target in the root Makefile is preferred.

Consider incorporating into the global deployment process (make deploy).

Refine alerting setup

We're getting too many alerts from the alert manager and successfully ignoring most of them. Not cool.

Let's refine the existing setup and improve it:

  • send less emails
  • separately configure SLO-related alerts
  • think about configuration for failing tasks
    • especially document how these are meant to be processed

Store the non-secret Packit service configuration options in vars/

Currently, the Packit service configuration contains both secrets and real configuration options. As a result, each time we want to update the real configuration options, we need to interact with Bitwarden. To avoid this, store these options publicly in this repo (probably in vars/) and inject the secrets into the config during the deployment.
TODO:

  • move the real configuration options from Bitwarden to vars in this repo
  • inject the secret values into the Packit service config during the deployment
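The injection step could be as simple as overlaying the secret values on the public options; a sketch with hypothetical option names:

```python
def render_service_config(public_options, secrets):
    """Combine the public options kept in vars/ with secret values injected at
    deploy time; refuse silent shadowing so a vars/ change can't be masked by a secret."""
    overlap = public_options.keys() & secrets.keys()
    if overlap:
        raise ValueError(f"keys defined both publicly and as secrets: {sorted(overlap)}")
    return {**public_options, **secrets}
```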

Remove centosmsg

tl;dr: revert #71

centosmsg deployment was added when we started working on CentOS Stream.
Back then, the repositories were in git.centos.org Pagure, which exposes events to mqtt.git.centos.org, which centosmsg listens to.
Since then, the repositories have been moved to GitLab and we don't serve the git.centos.org repos anymore.

Note, we still use the centosmsg in dist2src service, but it has its own deployment.

Bitwarden CLI broken on Fedora Linux 36

I noticed only today, that the bw get item secrets-tls-certs is not working on Fedora Linux 36. This makes download_secrets.sh unusable.

$ bw --version
1.22.1

The above seems to be happening b/c bw fails to decrypt the name and notes fields of secrets shared within an organization (this does not happen to my own secrets). When I get a secret by ID, I see lines like:

...
  "name": "[error: cannot decrypt]",
  "notes": "[error: cannot decrypt]",
...

Indeed, I can run bw get item '[error: cannot decrypt]', which will complain that there is more than one item with this name:

$ bw get item '[error: cannot decrypt]'
More than one result was found. Try getting a specific object by `id` instead. The following objects were found:
...

bitwarden/clients#2726 is just about this issue, and I made a note that it's also happening on Fedora Linux 36.

The issue might be that these newer distros switched to OpenSSL 3, but I don't know enough about crypto to figure out a solution for this. Theoretically, OpenSSL 1.1 is still present in Fedora as a compatibility layer, but for some reason it doesn't seem to work.

Until then, the workaround is to run deployments from a container environment running Fedora Linux 35.

Provide a clear list of tools, services and processes for investigation

You're a Chief of Monitors. Something goes wrong. Panic 😨😱. "How do I find out what's wrong?"

TODO:

  • Create a new document with a complete list of the tools, services, and people that can help you with an investigation of alerts and Sentry events

It's fine to provide only links to other documents or documentation sites.

SPIKE: Auto-scaling of workers

Take a look at possible ways to scale our workers automatically when needed, and provide the findings as a document in the research repo.

TODO:

  • describe the possible triggers for the scaling (and how to get such info)
  • propose options we have for the implementation
  • discuss the findings with the team and agree on a solution we will implement
  • create follow-up cards (=issues) to implement it
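As a starting point for the discussion, one possible trigger is the task-queue depth, turned into a replica count with configured bounds; all numbers below are made up:

```python
def desired_workers(queue_length, tasks_per_worker=20, min_workers=1, max_workers=8):
    """One possible scaling rule: size the worker StatefulSet from the
    queue depth, clamped between configured bounds."""
    needed = -(-queue_length // tasks_per_worker)  # ceiling division
    return max(min_workers, min(max_workers, needed))
```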

changelog generation: markdown is lost

It is a bit frustrating that GitHub strips markdown formatting from the commit metadata in a merge commit.

We should instead parse the description of the PR to get the changelog entry, rather than using the merge commit.
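A sketch of the proposed approach: extract the PR number from the merge-commit subject and fetch the PR description, where the markdown is intact (fetch_pr_description is a stand-in for a GitHub API call):

```python
import re

MERGE_COMMIT_RE = re.compile(r"^Merge pull request #(\d+)")

def changelog_entry(merge_commit_msg, fetch_pr_description):
    """Prefer the PR description (markdown intact) over the merge-commit body,
    where GitHub has already flattened the formatting."""
    match = MERGE_COMMIT_RE.match(merge_commit_msg)
    if match:
        description = fetch_pr_description(int(match.group(1)))
        if description:
            return description
    # fall back to whatever the merge commit carries
    return merge_commit_msg.split("\n", 1)[-1].strip()
```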

make `move-stable.py` and `changelog.py` friends

I know that you cannot make friends forcefully, but with scripts... we can! :)

Seriously now: let's integrate changelog.py script with move-stable so that moving stable branches produces a blog-post-ready changelog for you.

move-stable should still print git-log of the changes that are being pushed to stable. The ask here is to produce a complete changelog after all repos are updated.

Research deployment methodologies

Research different ways we could deploy our projects. Focus on time savings, manual steps, rolling back to previous versions, and the time-to-deploy for a change.

Talk to other teams in our organization and check how they are doing, esp. those working with the SRE team.

Definition of done: we have data to make a decision about the way we deploy our services and can make a commitment to change it.

Fix the deployment tests

Deploying on the cluster started failing, b/c images couldn't be pulled from the image streams. No clue why.

Until fixed 71f67c0 disabled the tests.

Figure out why the failure occurred, fix it, and re-enable the deployment test.
