solus-project / ferryd

Fast, safe and reliable transit for the delivery of software updates to users.

License: Apache License 2.0

Makefile 0.41% Go 99.59%

binary daemon database delivery delta delta-packages eopkg linux linux-distribution manifest monitor needs-more-tags package pool repository repository-management software-update solus transit

ferryd's Introduction

ferryd

ferryd is a Solus project.

ferryd is the binary repository manager for Solus. In addition to providing basic management for repositories, it is also an asynchronous job-based daemon, processing incoming package uploads from authorised builder machines. ferryd attempts to optimise all operations ahead of time by caching all of the metadata required for repository indexes.

The primary goal for ferryd is to provide a daemon that constantly monitors new uploads and processes them as fast as possible, so that new packages are available almost immediately. Complex, long-running operations are run in the background within a dedicated worker pool. This allows new packages to turn up in batches, and delta packages to be produced lazily. Once those delta packages are available, they're inserted into the main repository (and will appear in the index).
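The background worker pool described above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not ferryd's actual code: `Job`, `RunWorkerPool` and the `process` callback are hypothetical names. Each result is written to the slot matching its job, so the output order stays deterministic no matter which worker finishes first.

```go
package main

import "sync"

// Job is a hypothetical unit of background work, e.g. producing the deltas
// for one package.
type Job struct {
	ID   int
	Name string
}

// RunWorkerPool drains jobs using a fixed number of goroutines and returns one
// result per job. Workers write to distinct slice elements, so no extra
// locking is needed for results.
func RunWorkerPool(jobs []Job, workers int, process func(Job) string) []string {
	if workers < 1 {
		workers = 1 // avoid deadlocking on an unbuffered queue with no readers
	}
	results := make([]string, len(jobs))
	queue := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range queue {
				results[i] = process(jobs[i])
			}
		}()
	}
	for i := range jobs {
		queue <- i
	}
	close(queue) // lets each worker's range loop finish
	wg.Wait()
	return results
}
```

Called as `RunWorkerPool(jobs, 4, buildDelta)`, this returns once every job has been processed. Real long-running jobs would also need persistence and error reporting, which are omitted here.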

The design of ferryd allows us to blit a repository index from the database to disk very quickly (around 2-3s for a large repository). Special care is taken to only perform atomic updates to the index, so clients never run into corrupt or partially written indexes. The repository index should always be available, and all published packages should permanently be present.

ferryd caches wherever possible and uses a reference-counted package pool. All package files within each repository are hard-linked from the pool tree, which saves disk space through enforced deduplication. As such, a package's ID (the basename of the file) must be unique within a ferryd instance. Putting it all together, we can simply "ref" a package from the pool into a repository, which makes clone and pull operations very fast.

ferryd replaces the aging binman.py script previously used by Solus, and is designed to avoid the design mistakes of that implementation. Emphasis is placed on speed, scaling, and having packages immediately and permanently available: fewer delays for developers, and rapid updates and sync deployments for users.

Lastly, ferryd aims to provide very simple sync abilities to help control the deployment of packages to other repositories. An explicit design goal is to enable "pulling" one repository into another existing repository, which in effect publishes one channel to another. Solus uses this to control sync windows from unstable to stable, and each pull is done as a single atomic operation.
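The atomic pull can be illustrated with an in-memory model in which a repository maps package names to published package IDs. `Repo` and `PullFrom` are hypothetical. The target is updated entirely while its write lock is held, so concurrent readers see either the pre-pull or the post-pull state and never a mix. The method also returns the names that changed, which a follow-up delta step would need.

```go
package main

import (
	"sort"
	"sync"
)

// Repo is a hypothetical view of a repository: package name -> published ID.
type Repo struct {
	mu       sync.RWMutex
	packages map[string]string
}

func NewRepo(pkgs map[string]string) *Repo {
	m := make(map[string]string, len(pkgs))
	for k, v := range pkgs {
		m[k] = v
	}
	return &Repo{packages: m}
}

// Lookup returns the published ID for name, if any.
func (r *Repo) Lookup(name string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	id, ok := r.packages[name]
	return id, ok
}

// PullFrom publishes every package of src into r and returns the sorted names
// whose published ID changed.
func (r *Repo) PullFrom(src *Repo) []string {
	// Snapshot the source first so the two locks are never held together.
	src.mu.RLock()
	snapshot := make(map[string]string, len(src.packages))
	for k, v := range src.packages {
		snapshot[k] = v
	}
	src.mu.RUnlock()

	r.mu.Lock()
	defer r.mu.Unlock()
	var changed []string
	for name, id := range snapshot {
		if r.packages[name] != id {
			r.packages[name] = id
			changed = append(changed, name)
		}
	}
	sort.Strings(changed)
	return changed
}
```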

Disclaimer regarding the name: Solus has this weird obsession with all things nautical. Oh and birds.

Usage (basic)

Start ferryd, listening on ./ferryd.sock:

./bin/ferryd -d myRepoBase -s ./ferryd.sock

Create a repo:

./bin/ferryctl -s ./ferryd.sock create-repo testing

Add packages:

./bin/ferryctl -s ./ferryd.sock import testing path/to/eopkgs

License

ferryd is available under the terms of the Apache-2.0 license.

ferryd's People

pombredanne jasonmccallister gitalot staudey tubbz-alt aby-holding ionutnechita

ferryd's Issues

attempt conversion to gorm/sqlite

tl;dr: boltdb is really, really bad for concurrency.

Establish a simpler model that gets rid of all the bucket-orientated logic and relies on many-to-many relationships:

type RepoEntry struct {
    Repository Repository
    Published  Package
    Packages   []Package // ..
    Deltas     []Package
}

Basically, rip out all the current boltdb crap and start a new subproject to test the data storage.

Spawn async delta creation on pull

The delta repo job is fairly slow as it has to check every package in the repo, and is only really intended to be used after initial import. During a pull operation, track the names in the repo that are being modified, and then spawn a delta job for each one of them on the async queue.

This will greatly reduce the time it takes to delta the repo after sync windows.
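Tracking the modified names during a pull and queueing one delta job per name could look like the sketch below. `DeltaJob`, `ModifiedNames` and `EnqueueDeltas` are hypothetical names, and the channel stands in for ferryd's async queue.

```go
package main

import "sort"

// DeltaJob is a hypothetical async job that produces delta packages for a
// single package name in a single repository.
type DeltaJob struct {
	Repo    string
	Package string
}

// ModifiedNames returns, sorted, the names whose published ID differs between
// the repository state before and after a pull, including newly added names.
func ModifiedNames(before, after map[string]string) []string {
	var names []string
	for name, id := range after {
		if old, ok := before[name]; !ok || old != id {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

// EnqueueDeltas pushes one DeltaJob per modified name instead of running a
// delta pass over every package in the repository.
func EnqueueDeltas(repo string, names []string, queue chan<- DeltaJob) {
	for _, name := range names {
		queue <- DeltaJob{Repo: repo, Package: name}
	}
}
```

After a sync window that touches a handful of packages, only that handful of delta jobs would be queued.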

Add ability to promote packages (incl deps)

One of the last remaining issues in getting packages to users is shipping security updates outside the standard repo syncs, or quick fixes for problems not spotted before a sync, e.g. https://dev.solus-project.com/T4575

Seeing the full list of promoted packages beforehand would be important, so we can validate that the revdeps are safe to push (the full repo may not be properly tested or in a sync state). It may be that waiting for a full sync is the better option in some circumstances.
The repo should also be validated (all packages can be installed, including correct release numbers) so that a partial push doesn't create issues, whether that validation lives in ferryd or elsewhere.
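Seeing the full promotion list beforehand amounts to computing the transitive dependency closure of the requested packages. This is a minimal sketch assuming a hypothetical `deps` map from package name to its direct runtime dependencies. The revdep check would walk the inverted graph in the same way.

```go
package main

import "sort"

// PromotionSet returns the requested packages plus everything they depend on,
// directly or transitively, sorted for review. The seen set also protects
// against dependency cycles.
func PromotionSet(requested []string, deps map[string][]string) []string {
	seen := make(map[string]bool)
	stack := append([]string(nil), requested...)
	for len(stack) > 0 {
		name := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[name] {
			continue
		}
		seen[name] = true
		stack = append(stack, deps[name]...)
	}
	out := make([]string, 0, len(seen))
	for name := range seen {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}
```

For example, with `curl` depending on `openssl` and `zlib`, both of which depend on `glibc`, promoting `curl` yields `curl glibc openssl zlib`. In practice, packages already identical in the target repo would then be filtered out.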

Actually implement this

Long story short, I'm pulling my hair out over how long each repo settle is taking. This needs implementing immediately.

In order of Things To Make It All Work:

Add basic .eopkg add support (cache into pool and repo storage)
Add eopkg-index.xml write support (stable sort + constant time emission)
Add components.xml distribution.xml groups.xml merge support
Add compression and validation files (.xz .sha1sum)
Hook in deltas
Throw the entire repo at it and ensure constant time is still true
Now add the monitor
Profit $$$

PullRepo should not delta

If we do a pull that results in 320 jobs, and each sync job averages ~11s, that's an hour added for no reason. Sure, it won't always need to reindex, because the deltas may already exist or not be needed, but still, optimise it.

Just swap it out for a DeltaJob and then we can manually index it once status shows it done.

Marking it here so I don't actually forget.

Logo in Readme is Broken

It returns an error: "Cannot proxy the given URL."

Rework libdb to return Connection handle

We should have a simple system in place that returns a connection handle for a given duration (i.e. for batched functions), which is later closed. When the last handle is closed and the connection count drops to 0, the underlying database connection should be closed after some timeout. This will help ensure that leveldb memory is reclaimed and that recovery, etc., all work properly.

Without this, our memory usage is going to keep growing unbounded.

Use pool caching for deltas

Due to the sync/async/sync -> sync/async changes, we have a lot of specialist code within the delta job.
This means we're always creating new deltas. Instead, we should first check whether the delta ID already exists within the pool and ref it; otherwise, create the delta and then ref it.
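The check-then-create flow could be sketched as below. The sketch uses an in-memory refcount map in place of the real pool, and `DeltaID` builds a hypothetical `name-from-to.delta.eopkg` name; the real delta naming scheme may differ.

```go
package main

import "fmt"

// DeltaPool is a hypothetical in-memory stand-in for the package pool: it
// records which delta IDs exist and how many repositories reference each.
type DeltaPool struct {
	refs map[string]int
}

func NewDeltaPool() *DeltaPool { return &DeltaPool{refs: make(map[string]int)} }

// DeltaID builds an illustrative delta file name from the package name and
// the from/to release numbers.
func DeltaID(name string, from, to int) string {
	return fmt.Sprintf("%s-%d-%d.delta.eopkg", name, from, to)
}

// RefDelta refs an existing pooled delta when there is one and only calls
// create on a cache miss. It reports whether a new delta had to be built.
func (p *DeltaPool) RefDelta(id string, create func(id string) error) (bool, error) {
	if _, ok := p.refs[id]; ok {
		p.refs[id]++
		return false, nil
	}
	if err := create(id); err != nil {
		return false, err
	}
	p.refs[id] = 1
	return true, nil
}
```

With this in place, pulling the same package into a second repository reuses the pooled delta instead of rebuilding it.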