minimization / content-resolver

Reporting and notifications regarding dependencies and sizes of Fedora-based workloads.

Home Page: https://tiny.distro.builders

License: MIT License


content-resolver's Introduction

Content Resolver

Content Resolver makes it easy to define and inspect package sets of RPM-based Linux distributions.

You define what packages you need, and Content Resolver gives you the whole picture including all the dependencies. And it keeps it up-to-date as packages get updated over time.

This works across a wide range of scales. It will handle anything from a single person wanting to see how big a web server installation is, up to dozens of independent teams comprising hundreds of people defining an entire Linux distribution, including the entire set of build dependencies and ownership recommendations.

Content Resolver also helps with minimisation efforts by showing detailed dependency information, flagging certain packages as unwanted, and more.

See it live! (https://tiny.distro.builders)

Using Content Resolver

Controlling Content Resolver

Content Resolver is entirely controlled by a set of YAML files stored in a git repository. Changes to the content can easily be proposed and reviewed via pull requests.

See Fedora's input repository

Main concepts

Workloads and Environments

Workloads are the primary reason for Content Resolver's existence, representing a package set with a specific purpose — an application, a runtime, a set of dependencies for a specific use case, etc.

Workloads consist of two parts:

  • Required Packages — defined by the user
  • Dependencies — resolved by Content Resolver

Workloads are resolved on top of Environments, which are meant to represent the environments in which workloads would typically run — container base images, cloud images, or even desktop installations.

Similar to workloads, environments consist of the same two parts:

  • Required Packages — defined by the user
  • Dependencies — resolved by Content Resolver

Environments are fully resolved first, and workloads are then resolved on top of them.

Repositories are the sources of packages Content Resolver uses to resolve Workloads and Environments.

Finally, everything is connected by Labels. Workloads are resolved on top of environments with the same label, using repositories with (you guessed it!) the same label.
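For illustration, here is a minimal sketch of three definitions tied together by a shared label. The real file format lives in the input repository linked above; treat the field names here as hypothetical (only packages appears verbatim elsewhere in this document):

  # workload definition (sketch)
  name: Web Server
  packages:        # Required Packages, defined by the user
  - httpd
  labels:
  - label-example

  # environment definition (sketch)
  name: Container Base
  packages:
  - fedora-release-container
  labels:
  - label-example

  # repository definition (sketch)
  name: Fedora Rawhide
  labels:
  - label-example

With these, the "Web Server" workload would be resolved on top of the "Container Base" environment using the "Fedora Rawhide" repository, because all three carry label-example.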

Views

Views are an advanced concept that combines multiple workloads into a single view.

Views can serve many purposes, but the primary reason for their existence is Fedora ELN. Multiple contributors are able to define a subset of Fedora Rawhide to be rebuilt with Enterprise Linux flags.

For this, Content Resolver can do additional things with Views:

  • resolve build dependencies
  • show detailed reasons behind the presence of every package
  • make recommendations for the primary maintainer of shared dependencies
  • mark and track unwanted packages

Build Dependencies (draft)

Content Resolver can resolve build dependencies for views, showing what needs to be in the buildroot in order to rebuild the entire view, including all the build dependencies themselves.

This way, the entire set (required packages + runtime dependencies + build dependencies) is self-hosting (capable of rebuilding itself).

Build dependencies are currently resolved in an external service (dep-tracker) because every SRPM (source package) needs to be rebuilt for all architectures. That's necessary because dependencies can vary across different architectures. And unlike with RPMs (binary packages), SRPMs are not distributed per-architecture, as their primary use case is to distribute the sources rather than provide dependency information.

Current limitations: Because the build dependencies are resolved by an external service, the results might lag as much as several hours behind the runtime views. There is also currently no distinction between direct build dependencies and dependencies of build dependencies on the package details page.

There's work being done (#27) to change the way build dependencies are resolved (using data from Koji rather than the SRPMs themselves) that will remove the current limitations.

Reasons of Presence

To make minimisation of a set defined by a view easier, each package gets its own detail page.

This page includes information about what workloads pull it in and who owns that workload, as well as package-level dependencies, both runtime and build.

Unwanted Packages

Another way to make minimisation easier is the ability to track unwanted packages.

Maintainers can flag certain packages as unwanted. That causes these packages to be highlighted (if they're present in the set), and they are also listed on a dedicated page.

In combination with Reasons of Presence, this helps maintainers identify what dependencies need to be cut in order to remove the unwanted packages.

Current limitations: There's currently only a single level of unwanted packages. But there's work being done (#28) to allow additional levels, such as a distinction between "unwanted in the runtime set" vs. "unwanted in buildroot as well".

Maintainer recommendation

Because views are defined by multiple parties, it's not always clear who should be responsible for owning shared dependencies.

That's why Content Resolver helps identify who pulls each package into the set "the most" and recommends ownership based on that.

Current limitations: Content Resolver doesn't take anything else into account, just the dependencies and who pulls each package in. But parties can volunteer to own a package; in that case, Content Resolver should be able to see that and show it instead. There's work being done (#29) that will allow maintainers to accept a package, allowing Content Resolver to use this information when recommending owners.

Running Content Resolver

Content Resolver is designed to be highly available without needing the operator to run critical infrastructure.

The input is consumed from a git repository. This allows the use of existing services such as GitHub.

The output is just a set of static HTML pages. This allows it to be served from CDNs using services such as AWS S3.

Lastly, there is a script that reads the input and regenerates the output. This can be run as a periodic job on any appropriate system. Notably, this script only refreshes the output, meaning that it doesn't need to be kept running for the service to be available, and that upgrades and maintenance of Content Resolver won't result in downtime of the resolved views.

Please use the refresh.sh script as a reference for deployment.
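For orientation, such a refresh boils down to a few steps. The following is a simplified sketch only, not the actual refresh.sh; the input checkout, output directory, and bucket name are made up:

  $ git -C content-resolver-input pull
  $ ./feedback_pipeline.py content-resolver-input/configs output
  $ aws s3 sync output s3://example-content-resolver-bucket --delete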

Contributing to Content Resolver (TBD)

Developer preview

If you want to contribute and test your changes, run the feedback_pipeline.py script with test configs in the test_configs directory.

To run the script, you'll need Python 3 and the following dependencies:

  • yaml
  • jinja2

Option 1: on Fedora in a container

$ podman build . -t content-resolver-env
$ podman run --rm -it --cap-add CAP_SYS_CHROOT --tmpfs /dnf_cachedir -v $(pwd):/workspace:z content-resolver-env bash

... which starts a shell in the container. And inside the container:

# mkdir -p output/history
# ./feedback_pipeline.py --dev-buildroot --dnf-cache-dir /dnf_cachedir test_configs output

The output will be generated in the output directory. Open the output/index.html in your web browser of choice to see the result.

Option 2: on a Mac using Docker

$ docker build . -t content-resolver-env
$ docker run --rm -it --tmpfs /dnf_cachedir -v $(pwd):/workspace content-resolver-env bash

... which starts a shell in the container. And inside the container:

# mkdir -p output/history
# ./feedback_pipeline.py --dev-buildroot --dnf-cache-dir /dnf_cachedir test_configs output

The output will be generated in the output directory. Open the output/index.html in your web browser of choice to see the result.

content-resolver's People

Contributors

adamsaleh, adamwill, asamalik, pablomh, regexowl, sgallagher, stickster, tdawson, uchu-kitagawa, voxik, yselkowitz


content-resolver's Issues

RFE: machine-readable dependency graph

Could I please get the dependency graph in a machine-readable form, preferably a single json file?

Clicking in the web app to explore a complex dep tree ultimately leads me to loops and dead ends. I basically get lost. I need the data, so I can visualize my own subgraph of what I need to see.

What needs to be in the data:

  • components/source packages
  • binary packages
  • binary packages mapped to source packages (edges from binary packages to source packages)
  • requires (edges from binary packages to binary packages)
  • buildrequires (edges from source packages to binary packages)
  • binary packages and/or components mapped to workloads
  • information about unwanted-ness for each binary package and/or component
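For illustration, a hypothetical shape for such a file (every field name here is invented, not an existing Content Resolver output) could be:

  {
    "sources": {
      "httpd": {"buildrequires": ["gcc", "apr-devel"]}
    },
    "binaries": {
      "httpd": {
        "source": "httpd",
        "requires": ["apr"],
        "workloads": ["Web Server"],
        "unwanted": false
      }
    }
  }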

Thanks.

Reports: make output chronological

When comparing images, it's easier to read the evolution of changes if the columns run chronologically from left to right, i.e. F30 --> F31 --> Rawhide.

Implement the "Use cases by releases" reports

The Use cases by releases view is empty now!

It needs to list all the use cases, and for each one offer reports comparing the use case across all Fedora releases. Because there might be multiple bases each use case is installed on, it needs to offer potentially multiple reports per use case, one for each base.

So the items on the page linked above would look like this:


Apache HTTP Server

compared between all releases.
Fedora Container Base | Empty base


And the reports something like this:

  All packages in this report   Fedora 30   Fedora 31
  pkg1                          pkg1        pkg1
  pkg2                          -           pkg2
  ...                           ...         ...

As in the Use cases on bases reports, the required packages could be indicated in the table as well.

The Why column says (dependency) even if the package was pulled via group

When packages are listed explicitly, for example

  packages:
  - httpd

in https://github.com/minimization/content-resolver-input/blob/master/configs/httpd-no-weak-deps.yaml, the package is then listed as (required) in the output, for example at https://tiny.distro.builders/workload--httpd-no-weak-deps--fedora-container-base--repository-fedora-rawhide--x86_64.html.

However, when packages are listed via groups like

  groups:
  - core

in https://github.com/minimization/content-resolver-input/blob/master/configs/sst_security_readiness-core-weak.yaml, the packages installed because they are explicitly listed in the groups are not distinguished from their dependencies; they are all listed as (dependency), see https://tiny.distro.builders/workload--sst_security_readiness-core-weak--fedora-empty-base--repository-fedora-eln--x86_64.html.

The support for groups was added via #18, but it seems like the presentation of results should be amended as well, to make it clear in the output which packages are listed as part of the groups (potentially also distinguishing mandatory and default ones) and which packages are dependencies.

RFE: Send automated notifications to "soon-next-in-line" maintainers

Several teams would appreciate being notified when they are next in line (or soon to be) to maintain a package whose current maintainer has dropped its dependency on it. This would ideally be achieved via some automated means based on Content Resolver data.

For example: team A is third in the line of succession to maintain a package. Team Z, the current maintainer, drops its dependency, which makes team A second in line. Team Y is the likely new maintainer as soon as team Z negotiates the handover (which was probably already underway).

In that situation, a notification should be sent to team A alerting it to the new status quo, including the possibility that team Y may also drop its dependency and ask team A to own the package. Team A would then be advised to prepare for that possible state, which could also encourage it to eliminate the dependency altogether.

Buildroot dependency resolution 2.0

Build dependencies are currently resolved in an external service (dep-tracker) because every SRPM (source package) needs to be rebuilt for all architectures. That's necessary because dependencies can vary across different architectures. And unlike with RPMs (binary packages), SRPMs are not distributed per-architecture, as their primary use case is to distribute the sources rather than provide dependency information.

Because the build dependencies are resolved by an external service, the results might lag several hours behind. There's also currently no distinction between direct build dependencies and dependencies of build dependencies on the package details page.

So I want to figure out how to optimise this situation.

View of packages dependent on unwanted packages by maintainer

For minimisation, one of the main tasks is identifying where my team has dependencies on packages that other teams have marked "unwanted".

I can't see an easy way to do this today; am I missing something?

I have to iterate through each package in https://tiny.distro.builders/view-unwanted--view-eln.html and check whether one of my team's packages is listed.

Ideally I want a view like https://tiny.distro.builders/view-rpm--view-eln--libdb.html but doing the inverse, per maintainer/team:

  • a list of packages in my workloads which runtime-require packages marked unwanted
  • same for weak deps
  • same for build deps

Content Resolver is treating non-default module streams as if they were enabled

As seen in https://tiny.distro.builders/workload--sst_cs_infra_services-subversion-eln--fedora-empty-base--repository-fedora-eln--aarch64.html

 Problem 1: conflicting requests
  - nothing provides python(abi) = 3.9 needed by python3-subversion-1.14.0-6.module_eln+13718+74d85072.aarch64
 Problem 2: package subversion-perl-1.14.0-6.module_eln+13718+74d85072.aarch64 requires perl(:MODULE_COMPAT_5.32.1), but none of the providers can be installed
  - package subversion-perl-1.14.0-6.module_eln+13718+74d85072.aarch64 requires libperl.so.5.32()(64bit), but none of the providers can be installed
  - conflicting requests
  - package perl-libs-4:5.32.1-477.module_eln+13903+73720ee2.aarch64 is filtered out by modular filtering

Unwanted packages 2.0

Currently there's just a single level of unwanted packages.

Implement multiple levels of unwanted packages:

  • unwanted in the runtime set
  • unwanted completely (runtime + build)

And also have a distinction between

  • unwanted in the set
  • unwanted only by a specific maintainer (they don't want to own the package, but are OK if someone else does)

This must not over-complicate the user experience. So I need to come up with a simple and elegant solution.
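For the sake of discussion, one hypothetical configuration shape (not an agreed design; all field names invented) could be:

  unwanted-packages:
  - name: libdb
    scope: buildroot        # unwanted completely (runtime + build)
  - name: sendmail
    scope: runtime          # unwanted in the runtime set only
    maintainer-only: true   # this maintainer doesn't want it; others may own it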

Include plain package lists in the output to have history

Pushing package lists and the overall size as plain text into git will give us a nice view into the history — we'll be able to see the package and size difference between any two points in time.

And thanks to Pagure or GitHub, we'll get a free web UI for that!

Since all the results are already pushed to the reports repo, all that needs to happen is to output those lists.

The format could be as simple as:

< Use case name >
on < Base name >
42 MB
---
pkg1
pkg2
pkg3

RFE: a way to deal with alternatives

There doesn't appear to be any way to select an "alternative" in a workload. e.g. iptables could be iptables-nft or iptables-legacy. How do I select an alternative?

[root@vmhost-fedora-test1 ~]# alternatives --list
[..]
iptables                auto    /usr/sbin/iptables-legacy
ebtables                auto    /usr/sbin/ebtables-legacy
arptables               auto    /usr/sbin/arptables-nft

Get RPM data without installing

In the backend, the script first installs all bases and use cases into directories and then queries them to get the data it needs.

That's quite inefficient and takes a long time (but I knew how to do that!).

So we need to find a way to get the same data without the installation step. I'm sure it's possible and it might be even quite easy.
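One avenue worth exploring might be dnf repoquery, which reads repository metadata without installing anything. For example, the following lists the packages that provide the direct dependencies of httpd:

  $ dnf repoquery --requires --resolve httpd

That only covers direct dependencies, though; computing the full transitive set would still need a real depsolver run (e.g. via DNF's Python API).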

Graph grows vertically

Whenever I display a page like https://tiny.distro.builders/workload-overview--httpd-no-weak-deps--repository-fedora-rawhide.html in Firefox under XFCE, the canvas with the graph gradually grows vertically, keeping the CPU quite busy.

Looking at https://www.chartjs.org/docs/latest/general/responsive.html#important-note, the surrounding div should have position: relative. I am able to stop the growth when I put, say, style="position: relative; height: 80vh" (in the Firefox web developer editor) on the div with class="card-body". However, I have no idea what value of height I should pick for a potential pull request.

The Important note also suggests that another way of fixing the problem is disabling the resizing altogether, with maintainAspectRatio: true. But I have no idea what that change would break or how important the resizing is for Content Resolver's operation.

I don't know what causes the issue to manifest itself in my Firefox installations or how to prevent it on my side, while I understand it does not happen for others. However, since the Chart.js documentation says "this method requires the container to be relatively positioned", I believe Content Resolver should go with that recommendation/requirement anyway.

Provide an API that maintainers can consume on the CLI

Already partially addressed by the JSON payloads that can be retrieved in some views.
It would also be useful to dynamically retrieve the recommendations that Content Resolver issues, based on the multiple SSTs that require a package at different levels of dependency.

Add the ability to specify accounting for weak dependencies on a per-workload basis

Some workloads (like KDE Plasma Desktop) have numerous weak dependencies that lead to a fully functional experience. These weak dependencies are declared as weak dependencies as a compromise, not because the intended desktop experience is to go without them.

It would be great if this intention could be codified in the workloads tracked by Content Resolver, so that weak dependencies can be included (denoted specially in the graph and charts, of course) and we can be sure we've captured everything.
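A hypothetical per-workload switch (the option name is invented here) could look like:

  name: KDE Plasma Desktop
  packages:
  - plasma-desktop
  include-weak-deps: true   # hypothetical option: also pull in weak dependencies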

Maintainer Recommendation 2.0

Content Resolver can recommend maintainers of shared dependencies in views.

However, it only takes the workload definitions and the dependency information into account. It doesn't allow maintainers to arbitrarily volunteer to own something (like a Python maintainer offering to own some Python-related packages, without necessarily pulling them in themselves).

I want to add functionality to allow maintainers to explicitly volunteer to maintain a package. This then needs to influence the dependency recommendations, too.

Minimise the HTML output

There's a lot of white space in the HTML output; get rid of that.

This should help some of the larger lists load faster.

Implement CI for this repo - initial setup

My goal is to make it easier for others to contribute to Content Resolver. And because it's a fairly complex thing, people need to have confidence that their changes didn't break something. Especially now, when one of the things I want this project to focus on is performance improvements (which will involve concurrency and other fun stuff).

So let's set up a CI and a set of tests for the core functionality, to make sure our changes don't break Content Resolver.

The initial setup (this issue) is about:

  • Setting up a CI for this repo using GitHub Actions (needs to run in a Fedora container)
  • A basic test that just runs Content Resolver with the test configs on every PR
