jelmer / janitor

Platform for making incremental changes to code in VCSes

Home Page: https://jelmer.uk/code/janitor

License: GNU General Public License v2.0

Makefile 0.13% Python 75.30% Shell 0.04% CSS 3.45% HTML 12.38% Sieve 0.02% JavaScript 0.14% PLpgSQL 2.02% Rust 6.52%
debian lintian lintian-brush silver-platter git vcs bzr

janitor's Introduction

This repository contains the setup for a "Janitor" bot. This is basically a platform for managing large-scale automated code improvements on top of silver-platter.

Any code that is not related to the platform but to actually making changes should probably live in either silver-platter, breezy or a specific codemod (such as lintian-brush).

There are currently several instances of the Janitor running; see the configuration of each instance for details.

Philosophy

There are some straightforward changes to code that can be made with scripting. The janitor's job is to opportunistically make those changes when it can do so with high confidence, and to back off otherwise.

The janitor continuously tries to run changes on the set of repositories it knows about. It tries to be clever about scheduling those operations that are more likely to yield results and be published (i.e. merged or pushed).
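The scheduling idea described above can be sketched as a priority queue ordered by expected payoff. This is an illustrative toy, not the Janitor's actual scoring model; the names and the value/cost ratio heuristic are invented for the example.

```python
import heapq

def schedule(candidates):
    """Yield candidate names, most promising first.

    Each candidate is (name, est_value, est_cost); we use a simple
    value/cost ratio as the priority. Hypothetical scoring, for
    illustration only.
    """
    heap = [(-value / cost, name) for name, value, cost in candidates]
    heapq.heapify(heap)
    while heap:
        _, name = heapq.heappop(heap)
        yield name

order = list(schedule([("a", 10, 5), ("b", 1, 10), ("c", 8, 2)]))
```

The real scheduler additionally weighs how likely a change is to be published, but the basic shape — pop the highest-expected-value item first — is the same.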

Design

The janitor is made up of multiple components. It is built on top of silver-platter and relies on that project for most of the grunt work.

Several permanently running jobs:

  • the publisher proposes or pushes changes that have been successfully created and built previously, and which can provide VCS diffs
  • the vcs store manages and stores VCS repositories (git, bzr) [optional]
  • the ognibuild dep server is used to resolve missing dependencies
  • the runner processes the queue, kicks off workers for each package and stores the results.
  • one or more workers, which are responsible for actually generating and building changes
  • an archiver that takes care of managing the apt archives and publishes them
  • a site job that renders the web site
  • the differ takes care of running e.g. debdiff or diffoscope between binary runs

Each instance of the janitor should somehow upload "codebase" entries and "candidates", which describe where to find code and what to do with it.
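A "codebase" entry points at code and a "candidate" says what to do with it. The shape below is purely illustrative — the actual upload API, field names, and campaign names may differ.

```python
import json

# Hypothetical record shapes; real field names may differ.
codebase = {
    "name": "example",
    "branch_url": "https://example.org/example.git",
}
candidate = {
    "codebase": "example",
    "campaign": "lintian-fixes",  # what to do with the code
    "value": 50,                  # rough estimate of the change's worth
}

payload = json.dumps({"codebases": [codebase], "candidates": [candidate]})
```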

There are no requirements that these jobs run on the same machine, but they are expected to have secure network access to each other.

Every job runs an HTTP server that handles API requests and exposes /metrics for Prometheus monitoring.
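A /metrics endpoint returns plain text in the Prometheus exposition format. A minimal sketch of rendering that format, with a made-up metric name:

```python
def render_metrics(counters):
    """Render counters in the Prometheus text exposition format.

    The metric name below is invented for illustration; each Janitor
    service exposes its own set of metrics.
    """
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

body = render_metrics({"janitor_runs_total": 42})
```

In practice services would use a client library such as prometheus_client rather than formatting this by hand.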

Workers are fairly naive; they simply run a silver-platter subcommand to create branches and they build the resulting branches. The runner then fetches the results from each run and (if the run was successful) uploads the .debs and optionally proposes a change.

The publisher is responsible for enforcing rate limiting, i.e. making sure that there are no more than X pull requests open per maintainer.
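The per-maintainer cap can be pictured as a simple counter per "bucket". This is a deliberately simplified in-memory sketch; the real publisher persists its state and supports per-bucket limits.

```python
class RateLimiter:
    """Cap the number of open merge proposals per bucket (e.g. per maintainer)."""

    def __init__(self, max_open=3):
        self.max_open = max_open
        self.open = {}

    def allowed(self, bucket):
        # May we open another proposal for this bucket?
        return self.open.get(bucket, 0) < self.max_open

    def opened(self, bucket):
        self.open[bucket] = self.open.get(bucket, 0) + 1

    def closed(self, bucket):
        # Called when a proposal is merged or abandoned.
        self.open[bucket] -= 1
```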

See the various files in devnotes/ for details.

Worker

The actual changes are made by various codemod scripts that implement the silver-platter protocol.

Installation

There are two common ways of deploying a new janitor instance.

  • On top of Kubernetes (see the configuration for the Debian & Upstream janitor)
  • Using e.g. ansible and/or a venv

Docker

Several Docker images are provided:

  • ghcr.io/jelmer/janitor/archive - APT archive generator
  • ghcr.io/jelmer/janitor/differ - diffoscope/debdiff generator
  • ghcr.io/jelmer/janitor/publish - VCS publisher
  • ghcr.io/jelmer/janitor/runner - Queue management & Run handling
  • ghcr.io/jelmer/janitor/site - Example web site & public API
  • ghcr.io/jelmer/janitor/git_store - storage for Git
  • ghcr.io/jelmer/janitor/bzr_store - storage for Bazaar
  • ghcr.io/jelmer/janitor/worker - Base for workers

Contributing

See CONTRIBUTING.md for instructions on e.g. setting up a development environment.

If you're interested in working on adding another campaign for a janitor instance, see adding-a-new-campaign.

Some of us hang out in the #debian-janitor IRC channel on OFTC (irc.oftc.net) or #debian-janitor:matrix.debian.social.

janitor's People

Contributors

baldurmen, dependabot[bot], df7cb, dkg, g0tmi1k, guimard, jelmer, joejoejoejoejoejoejoe, mapreri, peutch, rhertzog, venthur


janitor's Issues

test_suite in setup.py is deprecated

Would you mind if I modernize this setup a bit? I'd like to use pytest, which is pretty standard nowadays. The test code can then be simplified a lot: you don't need TestCase classes anymore, but can write simple test_ functions instead and use assert instead of unittest's crufty self.assertX methods.
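The simplification the issue describes looks roughly like this. The `parse` function is a made-up stand-in for code under test, not anything from the janitor codebase:

```python
# unittest style (what the issue wants to move away from):
#
#     class ParseTests(unittest.TestCase):
#         def test_roundtrip(self):
#             self.assertEqual(parse("x=1"), {"x": "1"})

def parse(s):
    """Hypothetical function under test: parse 'key=value' into a dict."""
    key, _, value = s.partition("=")
    return {key: value}

# pytest style: a plain function and a bare assert.
def test_roundtrip():
    assert parse("x=1") == {"x": "1"}
```

pytest discovers `test_` functions automatically, and on failure its assertion rewriting prints both sides of the comparison, which is what makes the bare `assert` viable.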

Trigger rescheduling regularly

Perhaps as soon as a result has been uploaded?

  • On candidate upload
  • On web hook trigger
  • On run finish
  • On codebase refresh

codemods / builders as containers

For practical purposes, it would be great if we could ship codemods as containers. This is tricky though, since it requires running docker-in-docker (or podman-in-podman/podman-in-kubernetes). These are possible in theory but tricky in practice, with the ways that the janitor is currently being run (e.g. in GKE).

Support for MUT (Multiple Upstream Tarball) seems to be broken

Cf. https://janitor.debian.net/cupboard/pkg/policykit-1/ -- we can see a lot of failures there.

Let's look at https://janitor.debian.net/cupboard/pkg/policykit-1/2f64a9d8-be99-487e-af7e-541327338aaa/ for example.

Error is:

Status: unpack-unexpected-local-upstream-changes ()
Description: Tree has local changes: 97 files

There are changes between the upstream tarball and the non-debian/ part of the packaging repository that are not accounted for by any of the patches under debian/patches.

[...]

policykit-1-121+compat0.1/polkit-pkla-compat/test/mocklibc/src/netgroup.h
policykit-1-121+compat0.1/polkit-pkla-compat/test/mocklibc/src/pwd.c
policykit-1-121+compat0.1/polkit-pkla-compat/test/polkitbackendlocalauthoritytest.c
policykit-1-121+compat0.1/polkit-pkla-compat/test/polkitbackendlocalauthorizationstoretest.c
policykit-1-121+compat0.1/polkit-pkla-compat/test/polkittesthelper.c
policykit-1-121+compat0.1/polkit-pkla-compat/test/polkittesthelper.h
dpkg-source: error: aborting due to unexpected upstream changes, see /tmp/policykit-1_121+compat0.1-4~jan+lint1.diff.wSBH8Z
dpkg-source: info: Hint: make sure the version in debian/changelog matches the unpacked source tree
dpkg-source: info: you can integrate the local changes with dpkg-source --commit
E: Failed to package source directory /tmp/janitorwvcrpyl6/build-area/policykit-1-121+compat0.1
brz: ERROR: The build failed.

The "special thing" with this package is the use of Multiple Upstream Tarball, as can be seen in:

This extra component is unpacked in the directory polkit-pkla-compat, and that's what dpkg-source is complaining about. So clearly, something, somewhere, doesn't handle the MUT correctly.

Note that the Janitor page suggests that I run:

debcheckout policykit-1
cd policykit-1
lintian-brush

And that... works fine. The issue can't be reproduced this way.

NB: same error with Kali's Janitor: https://janitor.kali.org/cupboard/pkg/policykit-1/4df2618a-59c3-4eed-8221-a0e91c6cc117/

Thanks!

implement change sets

Todo:

  • Link to changeset from runs
  • Better changeset overview page
  • Look at change set state when publishing
  • Move changeset state changes to postgresql triggers
  • Mark changesets as publishing once that has started
  • Mark changesets as 'done' when they've been fully published
  • add changesets for every candidate and make that field mandatory
  • add publish blocker for change set state

replace terminal color codes with equivalent HTML

\x1b[1;38;2;255;165;0mTo fix, try combinations of the following: \x1b[0m
\x1b[1;38;2;255;165;0m	 •  Add or edit overrides in your config file:\x1b[0m
\x1b[1;38;2;255;165;0m	    debian/debcargo.toml\x1b[0m
\x1b[1;38;2;255;165;0m	 •  Add or edit files in your overlay directory:\x1b[0m
\x1b[1;38;2;255;165;0m	    debian\x1b[0m
Building the package in /tmp/janitorbmql7dec/build-area/rust-erbium-0.2.12-rc2, using sbuild -A -s -v
dpkg-source: error: can't build with source format '3.0 (quilt)': no upstream tarball found at ../rust-erbium_0.2.12-rc2.orig.tar.{bz2,gz,lzma,xz}
E: Failed to package source directory /tmp/janitorbmql7dec/build-area/rust-erbium-0.2.12-rc2
brz: ERROR: The build failed.

Implement new policy

  • migrate publish policy to named_publish_policy, adding reference in policy to publish policy name
  • Split out rate limit buckets
  • Drop policy table

Interactive verification

Have a way for tools to say "please have a human verify X (description starts with article)" in the web UI (ideally with context), and then rerun with an environment variable set indicating those things have been verified (e.g. for Standards-Version checks that can't be automated)

cc @isomer

split out codebase from package

Keep package Debian-specific, but split out codebase with:

  • name (optional, unique)
  • url (unique)
  • vcs_type_hint (git, hg, etc)
  • last seen commit
  • subpath

and with package referencing codebase.

This is another step towards making the janitor less Debian-specific, since the other fields in package aren't relevant for upstream projects.
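The proposed record could be sketched as follows. Field names follow the issue text; the types and defaults are guesses for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Codebase:
    """Sketch of the proposed codebase record (types are assumptions)."""
    url: str                              # unique
    name: Optional[str] = None            # optional, unique
    vcs_type_hint: Optional[str] = None   # "git", "hg", etc.
    last_seen_commit: Optional[str] = None
    subpath: str = ""                     # path within the repository

cb = Codebase(url="https://example.org/proj.git", vcs_type_hint="git")
```

A `package` row would then carry a foreign key to `codebase`, keeping the Debian-specific fields in `package` only.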

Support operation without VCS

In some cases we'd like to operate without a VCS to start with. For example, when importing existing packages that are not in a VCS.

Handle source-not-derived-from-target

Some hosters (e.g. gitlab) don't allow creating pull requests when the source repository doesn't derive from the target repository.

This is rare, but can sometimes happen when e.g. the target repository is deleted and created again from scratch.

The janitor should remove the source repository and create it again from scratch in this situation.

support pushing to a derived branch, but in the main repository

See https://salsa.debian.org/jelmer/debian-janitor/-/issues/156 for background

It would be useful to support new publishing modes that function like push_derived and propose, but create a branch in the original repository rather than in the bot's repository. Maybe this should be a flag in the publishing policy rather than an altogether different mode - "derived_branches_in_origin: true" or something like that.

For example, it could create a "merge-upstream/3.3" branch in the main packaging repository that the maintainer can then easily find and pull into the main packaging branch at their own leisure.

Most of this is straightforward on the janitor/silver-platter side. The main challenge is around garbage collection - we'd presumably want to clean up these branches when they become obsolete.

See jelmer/silver-platter#15 for the matching silver-platter bug report - which is where the core logic should be.

add per run-specific apt repositories

It should be possible to add per-run-specific apt repositories, to make it possible to e.g. just install the fresh-releases version of a single package.

Support shallow cloning for git repositories

For some campaigns, it might be good to do shallow cloning. Others do need more history (e.g. deb-new-upstream), so perhaps we should make this configurable.

I.e. the equivalent of "git clone --depth=1".

This requires depth support in Breezy and Dulwich.

We probably don't want this for all campaigns, e.g. not for deb-import-uncommitted or deb-new-upstream.
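Per-campaign shallowness could be as simple as choosing the clone command by campaign. Treating the two campaigns named above as the complete full-history set is an assumption made for this sketch:

```python
def clone_cmd(url, target, campaign):
    """Build a git clone command, shallow unless the campaign needs history.

    The campaign names come from the issue text; assuming they are the
    only ones needing full history is this sketch's simplification.
    """
    needs_history = {"deb-new-upstream", "deb-import-uncommitted"}
    cmd = ["git", "clone"]
    if campaign not in needs_history:
        cmd.append("--depth=1")  # equivalent of "git clone --depth=1"
    return cmd + [url, target]
```

(In the Janitor itself this would go through Breezy/Dulwich rather than the git CLI, which is exactly why the issue notes depth support is needed there first.)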

Git clone is slow

We should do some performance work on Dulwich/breezy-git.

HTTP operations currently cache entire packs in memory rather than streaming them.

track stage separately from result code

Currently result codes are prefixed by the stage ("build", "worker", etc) in which they occurred. We should split these, for easier processing and prioritization.

This would also allow more granular tracking of the stage.
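Splitting a stage-prefixed code could look like this. The stage names are examples taken from the issue text, not an exhaustive list:

```python
def split_result_code(code, known_stages=("worker", "build", "publish")):
    """Split a stage-prefixed result code like 'build-missing-tarball'
    into (stage, bare_code).

    Returns (None, code) when no known stage prefix matches, so
    un-prefixed legacy codes pass through unchanged.
    """
    for stage in known_stages:
        if code == stage or code.startswith(stage + "-"):
            return stage, code[len(stage) + 1:] or None
    return None, code
```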

close merge proposals when value drops below threshold

Currently, the publisher will only open merge proposals for runs that are above a certain value threshold. This is used to e.g. filter out changes that just strip whitespace.

However, when a merge proposal is updated (e.g. when some of the changes have been merged or applied independently) and drops below the value threshold, the merge proposal is kept open. It would probably make more sense to close it and mark it as either applied (since some of the earlier changes have been applied) or abandoned.

upload VCS results, even when build failed

This is useful for e.g. debugging purposes.

This would require some more logic on the worker side to push the relevant tags ($UUID/$role) but not update the matching branches ($campaign/$role).

support running without initial vcs

Useful for e.g.

  • debianize
  • deb-import-uncommitted for full history

Requires:

  • Add init flag to campaign
  • Runner special cases init
  • Worker special cases init

prevent duplicate work in publisher

In some cases, the publisher gets notified about a new build via different means. In these cases, it can end up trying to create or push the same run multiple times simultaneously.

The outcome of this isn't /terrible/:

  • For pushes, one of the attempts will win the race and push where the others get an error saying the ref was changed while they were pushing.
  • For merge proposals, in most cases one proposal is created slightly before the other and we detect that and abort. In some cases, we create two merge proposals.

It would still be good to prevent the publisher from launching multiple operations on the same run, so as to prevent occasional creation of multiple merge proposals and to simply reduce unnecessary work (for both the janitor and the forge).
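A minimal dedup guard keeps a set of in-flight run ids and refuses a second attempt. An in-memory sketch; a real implementation would need the check-and-add to be atomic across whatever concurrency the publisher uses:

```python
class PublishGuard:
    """Prevent concurrent publish attempts for the same run."""

    def __init__(self):
        self._in_flight = set()

    def try_acquire(self, run_id):
        # Returns False if a publish for this run is already underway.
        if run_id in self._in_flight:
            return False
        self._in_flight.add(run_id)
        return True

    def release(self, run_id):
        self._in_flight.discard(run_id)
```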

automatically file bugs for certain kinds of result codes

…either against the debian-janitor project or the Debian package.

e.g. for janitor devs:

  • worker-failure

for package maintainers:

  • upstream-branch-unknown
  • quilt-refresh-error
  • hosted-on-alioth
  • upstream-pgp-signature-verification-failed

regular IncompleteRead on workers

See e.g. https://janitor.debian.net/cupboard/pkg/pacemaker/eb48a0cd-94b4-46a6-bc40-a159d6ed8f44/ or https://jenkins.debian.net/job/janitor-worker/1033/console:

Started by timer
Running as SYSTEM
[EnvInject] - Loading node environment variables.
Building remotely on osuosl-build167-amd64.debian.net (osuosl167 amd64) in workspace /srv/jenkins/pseudo-hosts/osuosl-build167-amd64/workspace/janitor-worker
[janitor-worker] $ /bin/sh -xe /tmp/jenkins10539654540650077313.sh
+ /srv/jenkins/bin/jenkins_master_wrapper.sh
Opening branch at https://salsa.debian.org/ha-team/pacemaker.git/
Using cached branch https://janitor.debian.net/git/pacemaker/
Resuming from branch https://janitor.debian.net/git/pacemaker/
Elapsed time: 0:05:10.851644
Traceback (most recent call last):
  File "/usr/lib/python3.7/http/client.py", line 554, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "/usr/lib/python3.7/http/client.py", line 521, in _read_next_chunk_size
    return int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/http/client.py", line 586, in _readinto_chunked
    chunk_left = self._get_chunk_left()
  File "/usr/lib/python3.7/http/client.py", line 556, in _get_chunk_left
    raise IncompleteRead(b'')
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/srv/janitor/debian-janitor/janitor/pull_worker.py", line 436, in <module>
    sys.exit(asyncio.run(main()))
  File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
    return future.result()
  File "/srv/janitor/debian-janitor/janitor/pull_worker.py", line 396, in main
    possible_transports=possible_transports))
  File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/srv/janitor/debian-janitor/janitor/pull_worker.py", line 173, in run_worker
    possible_transports=possible_transports) as (ws, result):
  File "/usr/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/srv/janitor/debian-janitor/janitor/worker.py", line 864, in process_package
    pick_additional_colocated_branches(main_branch))) as ws:
  File "/srv/janitor/debian-janitor/silver-platter/silver_platter/proposal.py", line 300, in __enter__
    dir=self._dir, path=self._path)
  File "/srv/janitor/debian-janitor/silver-platter/silver_platter/utils.py", line 86, in create_temp_sprout
    raise e
  File "/srv/janitor/debian-janitor/silver-platter/silver_platter/utils.py", line 70, in create_temp_sprout
    stacked=use_stacking)
  File "/srv/janitor/debian-janitor/breezy/breezy/git/dir.py", line 177, in sprout
    mapping=source_branch.mapping)
  File "/srv/janitor/debian-janitor/breezy/breezy/git/interrepo.py", line 780, in fetch_objects
    determine_wants, graphwalker, f.write)
  File "/srv/janitor/debian-janitor/breezy/breezy/git/remote.py", line 451, in fetch_pack
    progress)
  File "/srv/janitor/debian-janitor/dulwich/dulwich/client.py", line 1810, in fetch_pack
    progress)
  File "/srv/janitor/debian-janitor/dulwich/dulwich/client.py", line 747, in _handle_upload_pack_tail
    SIDE_BAND_CHANNEL_PROGRESS: progress}
  File "/srv/janitor/debian-janitor/dulwich/dulwich/client.py", line 530, in _read_side_band64k_data
    for pkt in proto.read_pkt_seq():
  File "/srv/janitor/debian-janitor/dulwich/dulwich/protocol.py", line 269, in read_pkt_seq
    pkt = self.read_pkt_line()
  File "/srv/janitor/debian-janitor/dulwich/dulwich/protocol.py", line 220, in read_pkt_line
    pkt_contents = read(size-4)
  File "/srv/janitor/debian-janitor/breezy/breezy/transport/http/__init__.py", line 1914, in read
    return self._actual.read(amt)
  File "/usr/lib/python3.7/http/client.py", line 457, in read
    n = self.readinto(b)
  File "/usr/lib/python3.7/http/client.py", line 491, in readinto
    return self._readinto_chunked(b)
  File "/usr/lib/python3.7/http/client.py", line 602, in _readinto_chunked
    raise IncompleteRead(bytes(b[0:total_bytes]))
http.client.IncompleteRead: IncompleteRead(4679 bytes read)

verify that expected changes have been made

lintian is run as part of the package build step; the janitor should verify that those issues that were reported as fixed by lintian-brush no longer show up in the lintian output when building the package, and that no new lintian issues show up.

Ideally this would be done in a way that is not specific to lintian-brush changes, but can also be applied to e.g. multi-arch fixes or fresh upstreams.

Cannot get all tests run locally

Hi,

I'd like to contribute a bit to this repo. Step 1 is usually to get the tests running, so I can make sure things work as expected before and after my changes. However, I'm already struggling with step 1 :) Here's what I'm doing:

# create a venv
python3 -m venv venv
. venv/bin/activate
pip install -U pip setuptools
pip install -e .[dev]

make test

Some tests run, but three fail with this error:


======================================================================
ERROR: test_worker (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_worker
Traceback (most recent call last):
  File "/usr/lib/python3.10/unittest/loader.py", line 154, in loadTestsFromName
    module = __import__(module_name)
  File "/home/venthur/git/janitor/janitor/tests/test_worker.py", line 18, in <module>
    from janitor.worker import bundle_results
  File "/home/venthur/git/janitor/janitor/worker.py", line 56, in <module>
    from silver_platter.workspace import Workspace
  File "/home/venthur/git/janitor/venv/lib/python3.10/site-packages/silver_platter/__init__.py", line 26, in <module>
    import breezy.plugins.debian  # For apt: URL support  # noqa: F401
ModuleNotFoundError: No module named 'breezy.plugins.debian'

I see you do some magic in the Makefile to install the plugin (although I'm not sure what breezy or the plugin does), but how can I reproduce this environment on my machine?

don't rebuild revisions that exist in the archive

It might make sense to shortcut builds where we're trying to build something that's already in the archive.

One way of implementing this: have janitor.debian.build check the base apt repository for the version, and dget + provide it if present - perhaps after verifying the source matches.

(just a thought - it's not clear to me that this is worth the extra complexity)

don't scan all possible proposals every 24h

For the Debian janitor, the number of historical proposals is now so large (>20000) that we don't want to do this so regularly. In addition to that:

  • older proposals that were merged >2y ago are unlikely to change, and it's okay if it takes us more than 24h to notice
  • we do also scrape emails as a way of being notified of changes (not 100% reliable, though)
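The age-based backoff described above could be a simple tiered interval. The 2-year and 24-hour figures come from the issue; the intermediate tier is an invented example:

```python
from datetime import timedelta

def rescan_interval(merged_age):
    """How often to re-scan a proposal, based on time since it merged.

    Tiers are illustrative; only the >2y and 24h bounds are from the issue.
    """
    if merged_age > timedelta(days=730):   # merged more than ~2 years ago
        return timedelta(days=30)
    if merged_age > timedelta(days=90):    # invented middle tier
        return timedelta(days=7)
    return timedelta(hours=24)             # recent: keep the 24h cadence
```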
