alexoundos / mirror-nix

mirror Nix binary cache for the offline workflow

License: GNU General Public License v3.0

Languages: Haskell 93.12%, Nix 4.80%, Shell 2.08%
Topics: nixos, mirroring, haskell, mirror, nix

mirror-nix's Introduction

THE README IS OUTDATED BY A LOT


preface

Prehistory: there is an unresolved issue, NixOS/nix#2774, indirectly related to the problems of mirroring the nix binary cache, caused by the excessive complexity of the nix program for such a simple task. Meanwhile, I started to address the issue in a separate project. However, the project is not aimed only at the binary cache; see below.


Resources here help you prepare Nix/OS infrastructure for a completely offline (no further internet connection) workflow. That is: download the binary cache and fixed outputs (i.e. tarballs / sources), serve them on your offline LAN, and use them.


The terms fixed output and derivation may be used incorrectly throughout this README! Sorry, I still don’t quite understand Nix Store Paths.


Currently, the project contains the nix-mirror program and helper Nix expressions (see below).

The program is not guaranteed to run in constant space as a whole, but it is for the crucial part: downloading a single file. The binary cache download mechanics do not seem to be deterministic (each step is effectful); they are driven by the narinfo metainformation for references (dependencies of store paths), not by nixpkgs. But at least the initial goal (solving the issue for me) seems to be met.

table of contents

goals

binary cache mirroring

program input

implementation

  • [x] download the narinfo for every input store path
  • [x] recursively download the narinfo for every reference (dependency)
    • dependencies are read from the References: field of each narinfo
  • [x] download the *.nar.xz taken from the URL: field of each narinfo
  • [x] validate each downloaded *.nar.xz against the checksum from its narinfo
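For illustration, here is how the two narinfo fields that drive the steps above can be pulled out in shell. The sample narinfo below is made up (shortened fake hashes), but the field layout follows the cache.nixos.org narinfo format:

```shell
# Write a made-up (but correctly shaped) narinfo to parse.
cat > sample.narinfo <<'EOF'
StorePath: /nix/store/aaaaaaaa-hello-2.10
URL: nar/bbbbbbbb.nar.xz
Compression: xz
FileHash: sha256:cccccccc
FileSize: 41234
NarHash: sha256:dddddddd
NarSize: 205824
References: aaaaaaaa-hello-2.10 eeeeeeee-glibc-2.27
EOF

# `URL:` gives the *.nar.xz path to download, relative to the cache root.
nar_url=$(sed -n 's/^URL: //p' sample.narinfo)

# `References:` lists the dependencies whose narinfos are fetched recursively.
refs=$(sed -n 's/^References: //p' sample.narinfo)

echo "$nar_url"   # nar/bbbbbbbb.nar.xz
echo "$refs"
```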

features

  • [x] resumable downloads
    • An interrupted download session should (unless I’m mistaken) be easily resumed, since all partially downloaded files are stored under a temporary filename, and only fully downloaded, valid (correct checksum) files obtain the final filename.

      So a file download either results in a genuine file, or it does not. Individual file downloads are thus not resumed mid-file, but already downloaded valid files are neither re-downloaded nor have their contents re-checked.

    • Every time the program is run it traverses the whole narinfo tree, but this is not very time-consuming even on a NanoPi Neo2.
  • [ ] ensure saved file integrity later by leveraging the btrfs snapshot capability
    • At least some obstacles are expected in the btrfs snapshot machinery when it is manipulated as a non-superuser.
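The temporary-filename download scheme above can be sketched in shell; the "download" here is a local cp, and all filenames and contents are made up:

```shell
# Pretend remote object and its expected checksum (as taken from a narinfo).
printf 'nar contents' > remote-file
want=$(sha256sum remote-file | cut -d' ' -f1)

# 1. Download under a temporary name; a partial file never gets the final name.
cp remote-file final-name.tmp

# 2. Validate the checksum, then rename; otherwise drop the .tmp file.
got=$(sha256sum final-name.tmp | cut -d' ' -f1)
if [ "$got" = "$want" ]; then
    mv final-name.tmp final-name
else
    rm -f final-name.tmp
fi
```

A later run simply skips any path whose final filename already exists, which is why already-downloaded valid files are never re-fetched.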

fixed outputs mirroring

While the binary cache is enough for NixOS installation, this, I believe, is the last part needed for completely offline NixOS operation.

Currently, my guess is that there are two kinds of sources for nix derivations when installing:

  • fixed output derivations (reside in the binary cache)
  • fixed output tarballs (http://tarballs.nixos.org/ contains them, but not only them), which relate to the “bootstrapping” of Nix itself and might not be found on http://cache.nixos.org/.

Following that, mirroring fixed output derivations from http://cache.nixos.org/ should give everything, unless you try to build Nix without Nix. But this may be a faulty assumption; it needs proof.

So, the goals are:

  • [-] provide means of getting the metainformation required to perform a download of all fixed outputs (the download can be performed by a separate program)
    • [x] derivations
    • [-] tarballs (see below)
    • [ ] usable on machines with moderate specs (beware the peculiar RAM needs!)
  • [x] a downloading program, consuming the fixed outputs metainformation as input
  • [ ] a fully automatic mirroring procedure, requiring only a nixpkgs commit hash (seems genuinely possible with nix tools on board, but what about a standalone application?)

Currently, if we stick only to fixed output tarballs (as opposed to fixed output derivations), we will still be missing some of them, because the find-fixed-outputs.nix expression does not find some tarball URLs. Some packages omit the common attribute-set keys (both url and urls are missing) and use manual fetch expressions.

getting fixed output metainformation

This project hosts a Nix expression and a helper script for obtaining the aforementioned metainformation, required to download both kinds of fixed outputs.

how to install NixOS when offline using the mirror

# at the end this command produces `./result` symbolic link for nixos-install
$ nix-build -vvv -I nixos-config=/mnt/etc/nixos/configuration.nix '<nixpkgs/nixos>' -A system --option substituters http://$HOST:$PORT/$ENDPOINT

# (for some reason substituters option is still needed here)
$ nixos-install --option substituters http://$HOST:$PORT/$ENDPOINT --system ./result

Set the HOST, PORT and ENDPOINT variables according to your setup.
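For example (the values here are hypothetical), the variables compose into the substituter URL like this:

```shell
# Hypothetical mirror address; adjust to your LAN setup.
HOST=192.168.0.10
PORT=8080
ENDPOINT=nix-cache

SUBSTITUTER="http://$HOST:$PORT/$ENDPOINT"
echo "$SUBSTITUTER"   # http://192.168.0.10:8080/nix-cache
```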

The following option could also be used:

--option hashed-mirrors http://$HOST:$PORT/$ENDPOINT_FIXED_OUTPUTS

But I’m not sure this is useful for a typical installation.

build instructions

$ stack build

nix-mirror program help

nix-mirror - download nix binary cache and fixed outputs

Usage: nix-mirror [--base-path BASE_PATH] COMMAND

Available options:
  -h,--help                Show this help text
  --base-path BASE_PATH    Base path for mirror contents (unimplemented!).

Available commands:
  binaryCache              Download Nix binary cache given `store-paths` file.
  fixedOutputs             Download Nix fixed outputs given json array of
                           derivations.
Usage: nix-mirror binaryCache [--input-help] --store-paths STORE_PATHS
                              [--conduit-recurse]
  Download Nix binary cache given `store-paths` file.

Available options:
  --input-help             Instructions for obtaining `store-paths` input file.
  --store-paths STORE_PATHS
                           Path to a "store-paths" file (a list of /nix/store/*
                           paths).
  --conduit-recurse        Use `leftover` conduit streaming mechanism for
                           `NarInfo` recursion.
  -h,--help                Show this help text
Usage: nix-mirror fixedOutputs [--input-help] --drvs-json DRVS_JSON_FILE
                               [--dry-run] ([--print-drv] | [--print-hash] |
                               [--print-mode] | [--print-name] | [--print-path]
                               | [--print-hash-type] | [--print-urls])
                               (--derivations | --tarballs)
  Download Nix fixed outputs given json array of derivations.

Available options:
  --input-help             Instructions for obtaining fixed output derivations
                           json input file.
  --drvs-json DRVS_JSON_FILE
                           Path to a json file produced with
                           find-fixed-outputs.nix.
  --dry-run                Do not actually download. Useful in combination with
                           --print-*.
  --print-drv              Print `drv` path (/nix/store/*.drv).
  --print-hash             Print hashes.
  --print-mode             Print mode: `flat` or `recursive`.
  --print-name             Print name of derivations.
  --print-path             Print store path (/nix/store/*).
  --print-hash-type        Print hash type, e.g. `sha1`.
  --print-urls             Print original source urls.
  --derivations            Download fixed output derivations (from
                           cache.nixos.org), targeting at /nix/store/.
  --tarballs               Download the "tarballs" of fixed output derivations,
                           building up a mirror of tarballs.nixos.org.
  -h,--help                Show this help text

reports

aarch64 build

This section may be outdated.

Builds and runs successfully under NixOS, but see caveats below.

As for the Raspberry Pi 3 / NanoPi Neo2, building the whole project may take a ton of time (maybe half a month) even with ~4 GiB of swap provided. There are lots of packages to build as dependencies. Personally, I have never completed the Cabal dependency build on the real hardware: too little RAM. Based on my experience, it needs at least 4 GiB (the rpi3 has only 1 GiB).

So, the solution is to build under a qemu virtual machine. It works fine, except for the 3 GiB RAM limit caused by broken AHCI emulation. It takes approximately two days to build from scratch. All the built dependencies can then be copied from ~/.stack to the real aarch64 hardware, so there you only have to build the project's own source code. But even that takes almost an hour on a NanoPi Neo2 with 512 MiB of RAM.

downloaded binary cache stats

nixos-19.03.173202.31d476b8797

Git revision: 31d476b87972d8f97d67fd65e74c477b23227434.

  • store paths count: 32187
  • narinfo count: 38634
    • I haven’t checked yet whether these are really all the narinfos available for this specific nixpkgs revision
  • nar count: 38093
    • lower than the narinfo count because of duplicates, i.e. several narinfos point to the same nar file
  • size
    • on disk (ext4):
      • total: 72263 MiB
      • narinfos: 154 MiB
      • nars: 72109 MiB
    • apparent:
      • total: 72067 MiB
      • narinfos: 36 MiB
      • nars: 72032 MiB
  • approximate time consumed: 30 hours, running on a NanoPi Neo2 over my 100 Mbit internet connection.

previously supposed methods of getting fixed output metainformation

instantiate find-tarballs.nix

$ nix-instantiate --readonly-mode --eval --strict --json ./maintainers/scripts/find-tarballs.nix --arg expr 'import ./maintainers/scripts/all-tarballs.nix'
  • Produces a json array, each element of which contains: name, hash, original url, hash type (sha256, sha1, sha512, etc.).
  • This is the way copy-tarballs.pl does it.
  • Omits many sources, at least fetchgit ones. The produced array is a subset of what all-sources.nix produces (if we could get name, hash, url, type from derivations).
  • Uses a ton (~8 GiB) of RAM.

instantiate all-sources.nix

Assuming all-sources.nix is put into ./maintainers/scripts.

$ nix-instantiate --readonly-mode --eval --strict --json ./maintainers/scripts/all-sources.nix --arg expr 'import ./maintainers/scripts/all-tarballs.nix'
  • Produces a json array of fixed outputs. A superset of what find-tarballs.nix produces, if converted to fixed outputs.
  • Are there any other missing fixed outputs?
  • How to download the files knowing only their fixed output name? nix-store -r?
  • Contains a few duplicates.
  • Uses a ton (~8 GiB) of RAM.
  • The fixed outputs are the tarballs themselves, not having a .drv extension.

Assuming find-fixed-outputs.nix is put into ./maintainers/scripts.

$ nix-instantiate --readonly-mode --eval --strict --json ./maintainers/scripts/find-fixed-outputs.nix --arg expr 'import ./maintainers/scripts/all-tarballs.nix'
  • Produces a json array, each element of which contains: name, hash, drv (derivation name), hash type, mode (with two possible values: flat, recursive).
  • Gives the largest number of items of the supposed methods. I checked that this is a superset of the all-sources method! Great thanks to the author!
  • Compared to all-sources.nix, it allows easy downloading using either the hash, or the drv for nix-store -r (if you do not want the downloaded fixed outputs as a separate entity, but prefer fetching them into your /nix/store).
  • Uses a ton (~8 GiB) of RAM.
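As an illustration of consuming that output, here is a hypothetical post-processing sketch; the exact JSON key names (name, hash, drv, type, mode) are assumptions based on the field list above, and the sample entry is made up:

```shell
# A made-up one-element sample of the find-fixed-outputs.nix output;
# the key names are assumed, not taken from the real format.
cat > fixed-outputs.json <<'EOF'
[{"name":"hello-2.10.tar.gz","hash":"0ssi1abcd","drv":"/nix/store/aaaa-hello-2.10.tar.gz.drv","type":"sha256","mode":"flat"}]
EOF

# Pull out every drv path, e.g. to feed into `nix-store -r`.
drvs=$(sed -n 's/.*"drv":"\([^"]*\)".*/\1/p' fixed-outputs.json)
echo "$drvs"
```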

Finally, the last method is taken as the current basis for this project.

resources I used to get into this

serve in LAN

For example, with nginx. Nix-side integration has not been worked on yet.
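A minimal sketch of an nginx server block for this, assuming the mirrored files live under /srv/nix-mirror (a hypothetical path) and are served on port 8080:

```nginx
server {
    listen 8080;

    # Serve narinfo / nar files (and, if mirrored, fixed outputs) as-is.
    root /srv/nix-mirror;
    autoindex off;
}
```

Clients would then point the substituters option at http://<server>:8080/.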

warnings

Please do not stress the nixos.org servers with excessive load caused by nix-mirror overuse (when not really needed).

The current README is subject to mistakes and factual inaccuracies.

questions

  • Does the mirror process really benefit from the req package (instead of http-conduit)?
    • advantages: automatic retries, sharing the same connection across requests?
    • disadvantage: req brings twice as many dependencies
  • How to generate store-paths list for a specific nixpkgs commit?
  • All the supposed methods for downloading fixed outputs:
    • use all-tarballs.nix as an argument; is that the right way?
    • use a ton (~8 GiB) of RAM; should I give hnix a try?

ideas / TODO

  • Get store-paths and download the binary cache for aarch64!
  • Accumulate and print statistics on the number of downloaded derivation references.
  • Download nix-cache-info as the first step of downloading the binary cache.
  • Benchmark the naive solution vs. Conduit streaming (untested) for recursion.
  • Produce nar URLs and download nars ahead of consumption (when using conduit).
  • A download-narinfos-only feature to compute the estimated binary cache size.
  • Show download progress.
  • Gain experience serving an offline artifact bundle for several nixpkgs commits at the same time from a single HTTP endpoint, while retaining control of every bundle (binary cache + fixed outputs). Hard-link the files?
  • Implement a program option for checksum recalculation and recheck of downloaded files.
  • Add a --verbose flag as an alias for some --print- options.
  • Log failed file downloads with maximum info, or in their native input format.
  • Compute and print the estimated/downloaded size live.

mirror-nix's People

Contributors

alexoundos


Forkers

jhhuh

mirror-nix's Issues

realise and nix copy in parallel

The bottleneck switches back and forth, since xz compression takes a considerable amount of time. We could realise the next nar while compressing the previous one. This is more relevant for fixed output paths.

For example, nix copy /nix/store/4x7f05y3zxkaq74mk44jinnxnpx26yri-MafDb.gnomAD.r2.1.hs37d5_3.8.0.tar.gz means compressing 5.8 GiB with the slow lzma/xz. Moreover, it is already compressed with zlib/gzip.

Maybe I should consider switching between compression algorithms when the size is this big, or when the path is already compressed anyway.

Skip errors

Some narinfo downloads fail for me. Sometimes mirror-nix can't parse them for some reason; sometimes my content filter doesn't like the look of them.

When a narinfo fails, mirror-nix stops. I think it should continue downloading the rest.

Is this project maintained? Has it been superseded by something else?
