serokell / deploy-rs

A simple multi-profile Nix-flake deploy tool.

License: Other

deploy-rs's Introduction

A Simple, multi-profile Nix-flake deploy tool.

Questions? Need help? Join us on Matrix: #deploy-rs:matrix.org

Usage

Basic usage: deploy [options] <flake>.

Using this method all profiles specified in the given <flake> will be deployed (taking into account the profilesOrder).

Optionally the flake can be constrained to deploy just a single node (my-flake#my-node) or a profile (my-flake#my-node.my-profile).

If your profile or node name has a . in it, simply wrap it in quotes, and the flake path in quotes (to avoid shell escaping), for example 'my-flake#"myserver.com".system'.

Any "extra" arguments will be passed into the Nix calls, so for instance to deploy an impure profile, you may use deploy . -- --impure (note the explicit flake path is necessary for doing this).

You can try out this tool easily with nix run:

  • nix run github:serokell/deploy-rs your-flake

If you want to deploy multiple flakes or a subset of profiles with one invocation, instead of calling deploy <flake> you can issue deploy --targets <flake> [<flake> ...] where <flake> is supposed to take the same format as discussed before.

Running in this mode, if any of the deploys fails, the deploy will be aborted and all successful deploys rolled back. --rollback-succeeded false can be used to override this behavior; otherwise the auto-rollback argument takes precedence.

If you require a signing key to push closures to your server, specify the path to it in the LOCAL_KEY environment variable.

Check out deploy --help for CLI flags! Remember to check there before making one-time changes to things like hostnames.

There is also an activate binary, though it should be ignored: it is only used internally (on the deployed system) and for testing/hacking purposes.

Ideas

deploy-rs is a simple Rust program that will take a Nix flake and use it to deploy any of your defined profiles to your nodes. This is strongly based off of serokell/deploy, designed to replace it and expand upon it.

Multi-profile

This type of design (as opposed to more traditional tools like NixOps or morph) allows for lesser-privileged deployments, and the ability to update different things independently of each other. You can deploy any type of profile to any user, not just a NixOS profile to root.
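As a minimal sketch of what this looks like in a flake, the node below carries two independent profiles: a NixOS system profile deployed as root and a small custom profile deployed to an unprivileged user. The node name my-node, the user someuser, the profile name hello and the pkgs binding are placeholders chosen for illustration; the activate.nixos and activate.custom calls are the ones shown in the API section.

```nix
{
  # Fragment of the flake's `outputs`; `deploy-rs`, `self` and `pkgs`
  # are assumed to be in scope there.
  deploy.nodes.my-node = {
    hostname = "my-node";

    profiles = {
      # Privileged profile: a full NixOS configuration, activated as root.
      system = {
        user = "root";
        path = deploy-rs.lib.x86_64-linux.activate.nixos self.nixosConfigurations.my-node;
      };

      # Unprivileged profile: a single package, activated as `someuser`.
      hello = {
        user = "someuser";
        path = deploy-rs.lib.x86_64-linux.activate.custom pkgs.hello "./bin/hello";
      };
    };
  };
}
```

With this layout, deploy my-flake#my-node.hello would update only the user profile and leave the system profile untouched.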

Magic Rollback

There is a built-in feature to prevent you from making changes that might render your machine unconnectable or unusable. It works by connecting to the machine after profile activation to confirm the machine is still available, and instructing the target node to automatically roll back if it is not confirmed. If you do not disable magicRollback in your configuration (see later sections) or with the CLI flag, you will be unable to make changes to the system that affect how you connect to it (changing the SSH port, changing your IP, etc).
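If a deployment is expected to change how the machine is reached, magic rollback can be switched off for just that node. A minimal sketch, assuming a node named my-node (a placeholder) and relying on the magicRollback option and the priority order described under "Generic options":

```nix
{
  # Fragment of the flake's `outputs`; `deploy-rs` and `self` come from its arguments.
  deploy = {
    nodes.my-node = {
      hostname = "my-node";

      # Disable magic rollback for this node only, e.g. before a deployment
      # that moves the SSH port. Other nodes keep the default (`true`).
      magicRollback = false;

      profiles.system = {
        user = "root";
        path = deploy-rs.lib.x86_64-linux.activate.nixos self.nixosConfigurations.my-node;
      };
    };
  };
}
```

Setting the option back to true once the change has landed restores the safety net for later deployments.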

API

Overall usage

deploy-rs is designed to be used with Nix flakes. There is a flake-less mode of operation which will automatically be used if your available Nix version does not support flakes; however, you will likely want to use a flake anyway, just with flake-compat (see this wiki page for usage).

deploy-rs also outputs a lib attribute, with tools used to make your definitions simpler and safer, including deploy-rs.lib.${system}.activate (see later section "Profile"), and deploy-rs.lib.${system}.deployChecks which will let nix flake check ensure your deployment is defined correctly.

There are full working deploy-rs Nix expressions in the examples folder, and there is a JSON schema here which is used internally by the deployChecks mentioned above to validate your expressions.

A basic example of a flake that works with deploy-rs and deploys a simple NixOS configuration could look like this:

{
  description = "Deployment for my server cluster";

  # For accessing `deploy-rs`'s utility Nix functions
  inputs.deploy-rs.url = "github:serokell/deploy-rs";

  outputs = { self, nixpkgs, deploy-rs }: {
    nixosConfigurations.some-random-system = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./some-random-system/configuration.nix ];
    };

    deploy.nodes.some-random-system = {
        hostname = "some-random-system";
        profiles.system = {
          user = "root";
          path = deploy-rs.lib.x86_64-linux.activate.nixos self.nixosConfigurations.some-random-system;
        };
    };

    # This is highly advised, and will prevent many possible mistakes
    checks = builtins.mapAttrs (system: deployLib: deployLib.deployChecks self.deploy) deploy-rs.lib;
  };
}

In the above configuration, deploy-rs is built from the flake, not from nixpkgs. To take advantage of the nixpkgs binary cache, the deploy-rs package can be overridden in an overlay:

{
  # ...
  outputs = { self, nixpkgs, deploy-rs }: let
    system = "x86_64-linux";
    # Unmodified nixpkgs
    pkgs = import nixpkgs { inherit system; };
    # nixpkgs with deploy-rs overlay but force the nixpkgs package
    deployPkgs = import nixpkgs {
      inherit system;
      overlays = [
        deploy-rs.overlay # or deploy-rs.overlays.default
        (self: super: { deploy-rs = { inherit (pkgs) deploy-rs; lib = super.deploy-rs.lib; }; })
      ];
    };
  in {
    # ...
    deploy.nodes.some-random-system.profiles.system = {
        user = "root";
        path = deployPkgs.deploy-rs.lib.activate.nixos self.nixosConfigurations.some-random-system;
    };
  };
}

Profile

This is the core of how deploy-rs was designed: any number of these can run on a node, as any user (see further down for specifying user information). If you want to mimic the behaviour of traditional tools like NixOps or morph, try defining just one profile called system, as root, containing a nixosSystem. You can similarly use home-manager on any non-privileged user.

{
  # A derivation containing your required software, and a script to activate it in `${path}/deploy-rs-activate`
  # For ease of use, `deploy-rs` provides a function to easily add the required activation script to any derivation
  # Both the working directory and `$PROFILE` will point to `profilePath`
  path = deploy-rs.lib.x86_64-linux.activate.custom pkgs.hello "./bin/hello";

  # An optional path to where your profile should be installed to, this is useful if you want to use a common profile name across multiple users, but would have conflicts in your node's profile list.
  # This will default to `/nix/var/nix/profiles/system` if `user` is `root` and profile name is `system`,
  # `/nix/var/nix/profiles/per-user/root/$PROFILE_NAME` if profile name is different.
  # For non-root profiles will default to `/nix/var/nix/profiles/per-user/$USER/$PROFILE_NAME` if `/nix/var/nix/profiles/per-user/$USER` already exists,
  # and `${XDG_STATE_HOME:-$HOME/.local/state}/nix/profiles/$PROFILE_NAME` otherwise.
  profilePath = "/home/someuser/.local/state/nix/profiles/someprofile";

  # ...generic options... (see lower section)
}
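For the home-manager case mentioned above, a profile definition might look like the following sketch. It assumes that the flake exposes a home-manager configuration as self.homeConfigurations.someuser (a hypothetical name) and that your version of the deploy-rs library provides an activate.home-manager helper alongside activate.nixos and activate.custom; check the lib attribute of the revision you use before relying on it.

```nix
{
  # Deploy to the unprivileged user `someuser` (placeholder name).
  user = "someuser";

  # Assumption: `homeConfigurations.someuser` is built with home-manager
  # and `activate.home-manager` exists in your deploy-rs revision.
  path = deploy-rs.lib.x86_64-linux.activate.home-manager self.homeConfigurations.someuser;
}
```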

Node

This defines a single node/server, and the profiles you intend it to run.

{
  # The hostname of your server. Can be overridden at invocation time with a flag.
  hostname = "my.server.gov";

  # An optional list containing the order you want profiles to be deployed.
  # This will take effect whenever you run `deploy` without specifying a profile, causing it to deploy every profile automatically.
  # Any profiles not in this list will still be deployed (in an arbitrary order) after those which are listed
  profilesOrder = [ "something" "system" ];

  profiles = {
    # Definition format shown above
    system = {};
    something = {};
  };

  # ...generic options... (see lower section)
}

Deploy

This is the top-level attribute containing all of the options for this tool.

{
  nodes = {
    # Definition format shown above
    my-node = {};
    another-node = {};
  };

  # ...generic options... (see lower section)
}

Generic options

This is a set of options that can be put in any of the above definitions, with the priority being profile > node > deploy.

{
  # This is the user that deploy-rs will use when connecting.
  # This will default to your own username if not specified anywhere
  sshUser = "admin";

  # This is the user that the profile will be deployed to (will use sudo if not the same as above).
  # If `sshUser` is specified, this will be the default (though it will _not_ default to your own username)
  user = "root";

  # Which sudo command to use. Must accept at least two arguments:
  # the user name to execute commands as and the rest is the command to execute
  # This will default to "sudo -u" if not specified anywhere.
  sudo = "doas -u";

  # This is an optional list of arguments that will be passed to SSH.
  sshOpts = [ "-p" "2121" ];

  # Fast connection to the node. If this is true, copy the whole closure instead of letting the node substitute.
  # This defaults to `false`
  fastConnection = false;

  # If the previous profile should be re-activated if activation fails.
  # This defaults to `true`
  autoRollback = true;

  # See the earlier section about Magic Rollback for more information.
  # This defaults to `true`
  magicRollback = true;

  # The path which deploy-rs will use for temporary files, this is currently only used by `magicRollback` to create an inotify watcher in for confirmations
  # If not specified, this will default to `/tmp`
  # (if `magicRollback` is in use, this _must_ be writable by `user`)
  tempPath = "/home/someuser/.deploy-rs";

  # Build the derivation on the target system.
  # Will also fetch all external dependencies from the target system's substituters.
  # This defaults to `false`
  remoteBuild = true;

  # Timeout for profile activation.
  # This defaults to 240 seconds.
  activationTimeout = 600;

  # Timeout for profile activation confirmation.
  # This defaults to 30 seconds.
  confirmTimeout = 60;
}

Some of these options can be provided during deploy invocation to override default values or values provided in your flake, see deploy --help.
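To illustrate the profile > node > deploy priority, here is a sketch that sets sshUser at all three levels. The node, user and profile names (my-node, ops, someuser, hello) and the pkgs binding are placeholders, not anything deploy-rs prescribes.

```nix
{
  # Fragment of the flake's `outputs`; `deploy-rs`, `self` and `pkgs`
  # are assumed to be in scope there.
  deploy = {
    # Deploy-wide default: lowest priority.
    sshUser = "admin";

    nodes.my-node = {
      hostname = "my-node";

      # Overrides the deploy-wide value for every profile on this node.
      sshUser = "ops";

      profiles.system = {
        user = "root";
        path = deploy-rs.lib.x86_64-linux.activate.nixos self.nixosConfigurations.my-node;
      };

      profiles.hello = {
        # Highest priority: applies to this profile only.
        sshUser = "someuser";
        user = "someuser";
        path = deploy-rs.lib.x86_64-linux.activate.custom pkgs.hello "./bin/hello";
      };
    };
  };
}
```

Under the stated priority, the system profile would connect as ops, the hello profile as someuser, and any other node without its own setting as admin.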

About Serokell

deploy-rs is maintained and funded with ❤️ by Serokell. The names and logo for Serokell are trademarks of Serokell OÜ.

We love open source software! See our other projects or hire us to design, develop and grow your idea!

deploy-rs's People

Contributors

06kellyjac, anillc, antifuchs, atry, balsoft, blaggacao, derekmahar, emanueljg, eyeinsky, fd, flakebi, ivarmedi, lovesegfault, ma27, notgne2, nrdxp, philtaken, pjjw, rvem, rycee, sereja313, serokell-bot, stevenroose, stuebinm, supersandro2000, talw, wigust, xvello, ysndr, zhenyavinogradov

deploy-rs's Issues

Try to limit the number of SSH connections

When someone uses a Yubikey for their SSH key, or doesn't use ssh-agent, there are currently 4 distinct SSH connections: one in a first step and then 3 in a row:

🚀 ℹ️ [deploy] [INFO] Building profile `system` for node `XX`
warning: Git tree '/home/steven/projects/XX' is dirty
Enter passphrase for key '/home/steven/.ssh/XXXX': 
🚀 ℹ️ [deploy] [INFO] Activating profile `system` for node `XX`
🚀 ℹ️ [deploy] [INFO] Creating activation waiter
Enter passphrase for key '/home/steven/.ssh/XXXX': 
Enter passphrase for key '/home/steven/.ssh/XXXX': 
Enter passphrase for key '/home/steven/.ssh/XXXX': 
👀 ℹ️ [wait] [INFO] Waiting for confirmation event...

It might be the case that the latter 3 could be bundled into a single SSH command.

Don't quit until SSH connections are closed

The current behaviour allows deploy to quit on errors while leaving SSH connections open, so messages keep printing after the program has exited and you have returned to your shell. The extra log output is useful but untidy; we should keep deploy running until these connections are closed.

semver

Somewhat self explanatory, we need to actually use versions (correctly)

Infinite Recursion Encountered

I get the following error when running deploy-rs. I ended up pulling down the example to see what I was doing wrong and I am still getting this. I am a bit stuck.

nix run github:serokell/deploy-rs -- . -- --impure --show-trace
🚀 ℹ️ [deploy] [INFO] Running checks for flake in .
warning: Git tree '/home/eric/workspace/foobar' is dirty
warning: unknown flake output 'deploy'
🚀 ℹ️ [deploy] [INFO] Evaluating flake in .
warning: Git tree '/home/eric/workspace/foobar' is dirty
error: --- EvalError ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- nix
infinite recursion encountered
------------------------------------------------------------------------------------------------ show-trace ------------------------------------------------------------------------------------------------
trace: while evaluating the attribute 'text' of the derivation 'deploy.json'
at: (7:7) in file: /nix/store/sdjxjsd5phr225rs2qzklj2xci0c9gr0-source/pkgs/build-support/trivial-builders.nix

 6|     stdenv.mkDerivation ({
 7|       name = lib.strings.sanitizeDerivationName name;
  |       ^
 8|       inherit buildCommand;

trace: while evaluating the attribute 'buildCommand' of the derivation 'jsonschema-deploy-system'
at: (7:7) in file: /nix/store/sdjxjsd5phr225rs2qzklj2xci0c9gr0-source/pkgs/build-support/trivial-builders.nix

 6|     stdenv.mkDerivation ({
 7|       name = lib.strings.sanitizeDerivationName name;
  |       ^
 8|       inherit buildCommand;

🚀 ❌ [deploy] [ERROR] Failed to evaluate deployment data: Evaluation resulted in a bad exit code: Some(1)

How to trouble-shoot no confirmation signal coming back

I'm observing this quite regularly on my setup. I changed quite a few things (new nixpkgs, replaying many of the active services), but it just hangs and times out and then doesn't exit because of some lingering SSH connection.

Activate on aarch64-linux from x86_64-linux

The current_exe used within the remote activation command is taken from std::env::current_exe.
This assumes that the system on the remote node is the same as the system on the deployment node, which is not the case when we want to deploy to a Raspberry Pi from an x86 desktop.
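On the configuration side, one way to keep target and deployer architectures apart is to build the profile with the deploy-rs library for the target's system. The sketch below assumes that deploy-rs.lib exposes an aarch64-linux attribute, that the activation binary is shipped inside the profile (as the activate-rs store paths in later logs suggest), and that the node is called raspberry (a placeholder). It is not a confirmed fix for the behaviour reported here.

```nix
{
  # Fragment of the flake's `outputs`; `nixpkgs`, `deploy-rs` and `self`
  # come from its arguments.
  nixosConfigurations.raspberry = nixpkgs.lib.nixosSystem {
    system = "aarch64-linux";
    modules = [ ./raspberry/configuration.nix ];
  };

  deploy.nodes.raspberry = {
    hostname = "raspberry";

    # Assumption: the x86_64 machine cannot build aarch64 derivations itself,
    # so the build is delegated to the target (see "Generic options").
    remoteBuild = true;

    profiles.system = {
      user = "root";
      # Use the aarch64-linux variant of the library so that the activation
      # wrapper is built for the target rather than for the deploying machine.
      path = deploy-rs.lib.aarch64-linux.activate.nixos self.nixosConfigurations.raspberry;
    };
  };
}
```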

Magic rollback is not working if previous version was not deployed with deploy-rs

I'm trying to migrate my deployments to deploy-rs. For one of my nodes (previously deployed with nixops), something went wrong during the first deployment with deploy-rs and the network was lost, but magic rollback failed with:
ERROR [activate] Error de-activating due to another error waiting for confirmation, oh no...: Failed to run command for re-activating the last generation: No such file or directory (os error 2)

As I understand it, this is because deploy-rs-activate is called from the old profile:

let re_activate_exit_status = Command::new(format!("{}/deploy-rs-activate", profile_path))

I know that it's a corner case, but it's inconvenient.

λ cat -p /var/log/deploy/deploy_2021-05-09_09-06-38.log
DEBUG [deploy] Checking for flake support
INFO [deploy] Evaluating flake in .
INFO [deploy] The following profiles are going to be deployed:
[router.system]
user = "root"
ssh_user = "root"
path = "/nix/store/mblwh0vfb61lsrgfskaa4vplilnky0a2-activatable-nixos-system-nixos-21.05.20210506.6358647"
hostname = "router"
ssh_opts = []

INFO [deploy::push] Building profile `system` for node `router`
DEBUG [deploy::push] Copying profile `system` to node `router`
INFO [deploy::deploy] Activating profile `system` for node `router`
DEBUG [deploy::deploy] Constructed activation command: /nix/store/mblwh0vfb61lsrgfskaa4vplilnky0a2-activatable-nixos-system-nixos-21.05.20210506.6358647/activate-rs --debug-logs --log-dir /var/log/deploy --temp-path '/tmp' activate '/nix/store/mblwh0vfb61lsrgfskaa4vplilnky0a2-activatable-nixos-system-nixos-21.05.20210506.6358647' '/nix/var/nix/profiles/system' --confirm-timeout 30 --magic-rollback --auto-rollback
DEBUG [deploy::deploy] Constructed wait command: /nix/store/mblwh0vfb61lsrgfskaa4vplilnky0a2-activatable-nixos-system-nixos-21.05.20210506.6358647/activate-rs --debug-logs --log-dir /var/log/deploy --temp-path '/tmp' wait '/nix/store/mblwh0vfb61lsrgfskaa4vplilnky0a2-activatable-nixos-system-nixos-21.05.20210506.6358647'
INFO [deploy::deploy] Creating activation waiter
DEBUG [deploy::deploy] Wait command ended
ERROR [deploy] Failed to deploy profile: Waiting over SSH resulted in a bad exit code: Some(255)

# cat /var/log/deploy/activate_activate_2021-05-09_06-06-48.log
INFO [activate] Activating profile
DEBUG [activate] Running activation script
INFO [activate] Activation succeeded!
INFO [activate] Magic rollback is enabled, setting up confirmation hook...
DEBUG [activate] Ensuring parent directory exists for canary file
DEBUG [activate] Creating canary file
DEBUG [activate] Creating notify watcher
INFO [activate] Waiting for confirmation event...
ERROR [activate] Error waiting for confirmation event: Timeout elapsed for confirmation
WARN [activate] De-activating due to error
DEBUG [activate] Listing generations
DEBUG [activate] Removing generation entry   88   2021-05-09 06:06:48
WARN [activate] Removing generation by ID 88
INFO [activate] Attempting to re-activate the last generation
ERROR [activate] Error de-activating due to another error waiting for confirmation, oh no...: Failed to run command for re-activating the last generation: No such file or directory (os error 2)
# cat /var/log/deploy/activate_wait_2021-05-09_06-06-47.log
INFO [activate] Waiting for confirmation event...
INFO [activate] Found canary file, done waiting!

Unable to disable magic rollback individually

In trying to work around #32 I wanted to disable magic rollback for the one host where the issue reproduces consistently, my laptop. The README says the following about the generic options:

This is a set of options that can be put in any of the above definitions, with the priority being profile > node > deploy

This means I expected to be able to simply say deploy.nodes.myHost.magicRollback = false; and have it override my previous deploy.magicRollback = true; for that host, as I did in lovesegfault/nix-config@c74063d.

During deploy, however, I still see that magicRollback is enabled and that setting has not taken effect:

❯ deploy .#foucault
 INFO  deploy > Running checks for flake in .
warning: unknown flake output 'deploy'
 INFO  deploy > Evaluating flake in .
 WARN  deploy > The following profiles are going to be deployed:
[foucault.system]
user = "root"
ssh_user = "bemeurer"
path = "/nix/store/m8cv0h46jl4krqvwhf469561c3mlv1xs-activatable-nixos-system-foucault-21.03.20201218.9e67377"
hostname = "100.67.182.67"
ssh_opts = []

 INFO  deploy::utils::push > Building profile `system` for node `foucault`
 INFO  deploy::utils::deploy > Activating profile `system` for node `foucault`
 INFO  activate > Activating profile
activating the configuration...
setting up /etc...
setting up secrets...
sops-install-secrets: Imported /etc/ssh/ssh_host_rsa_key with fingerprint 5c8b2aca031733d4c59a8af19de06fed73f48056
reloading user units for bemeurer...
setting up tmpfiles
 INFO  activate > Activation succeeded!
 INFO  activate > Magic rollback is enabled, setting up confirmation hook...
Shared connection to 100.67.182.67 closed.
 INFO  deploy::utils::deploy > Success activating!
 INFO  deploy::utils::deploy > Attempting to confirm activation
 INFO  deploy::utils::deploy > Deployment confirmed.

"value is a string while a set was expected" in profiles

$ deploy .
 INFO  deploy > Running checks for flake in .
warning: Git tree 'XXX' is dirty
warning: unknown flake output 'deploy'
error: --- TypeError ------------------------------------------------------------------------------------------------------------------- nix
at: (97:188) in file: /nix/store/37pnjz9p96pfjmgfz8hcsc9ijazpzvfp-source/flake.nix

    96|               let
    97|                 profiles = builtins.concatLists (pkgs.lib.mapAttrsToList (nodeName: node: pkgs.lib.mapAttrsToList (profileName: profile: [ (toString profile.path) nodeName profileName ]) node.profiles) deploy.nodes);
      |                                                                                                                                                                                            ^
    98|               in

value is a string while a set was expected
(use '--show-trace' to show detailed location information)
 ERROR deploy > Failed to check deployment: Nix checking command resulted in a bad exit code: Some(1)

This is my flake.nix:

{
  description = "XXX";

  # For accessing `deploy-rs`'s utility Nix functions
  inputs.deploy-rs.url = "github:serokell/deploy-rs";

  outputs = { self, nixpkgs, deploy-rs }: {
    nixosConfigurations.peertube = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./peertube/configuration.nix ];
    };

    deploy.nodes = {
      peertube = {
        hostname = "192.168.1.224";
        profiles.system = {
            user = "root";
            path = deploy-rs.lib.x86_64-linux.activate.nixos self.nixosConfigurations.peertube;
        };
      };

      sshUser = "admin";
    };

    # This is highly advised, and will prevent many possible mistakes
    checks = builtins.mapAttrs (system: deployLib: deployLib.deployChecks self.deploy) deploy-rs.lib;
  };
}

(Also: because there is a networking.hostName inside the configuration.nix of a node, I assumed deploy.nodes.XX.hostname would be the address to deploy to. Is that correct?)
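Judging from the flake above, a likely cause is that sshUser sits inside deploy.nodes, so it is read as a node named sshUser whose value is a string rather than a set. A sketch of the probable fix, moving the option up one level to deploy (placing it inside the peertube node should work as well):

```nix
{
  # Fragment of the flake's `outputs`; `deploy-rs` and `self` come from its arguments.
  deploy = {
    # Generic option at the deploy level, not inside `nodes`.
    sshUser = "admin";

    nodes.peertube = {
      hostname = "192.168.1.224";
      profiles.system = {
        user = "root";
        path = deploy-rs.lib.x86_64-linux.activate.nixos self.nixosConfigurations.peertube;
      };
    };
  };
}
```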

Using different nixpkgs channels for different machines causes excessive builds

So I just ran into an interesting problem in my "infrastructure" (basically, a remote linode box and a raspberry pi in my home): I was using two different nixpkgs release channels for the two machines, with the raspi running nixos-20.03, and the linode box running unstable. That meant that deploy-rs's nixpkgs version was pinned to the newer of the two channels. Now, when I tried deploying to the raspberry pi, it would have to download and compile a whole build toolchain (with two versions of llvm, a lot of font-rendering stuff, X libraries, gcc and gdb) - all on the Raspberry pi.

I think that's due to the "activate-rs" binary from #14 - it gets required on all target systems (which is right), but as you can't pick which toolchain it is built with, it pulls in the rustc and cargo from the toolchain on its flake inputs... and that can be quite a heavy lift.

The way I worked around this issue for now is by pinning all the deployed-to boxes to the same NixOS version (20.09), which builds fast enough on all the systems involved. But a real solution that lets me run different (and unstable) versions of NixOS on subsets of the running machines would be ideal.

I'm not sure that's easily solvable (maybe by putting an activate-rs derivation into nixpkgs itself?). Maybe a solution to #12 would fix this problem, too.

nixos-manual-combined.drv failed to validate

I created some really basic configuration and this happened:

$ deploy .
 INFO  deploy > Running checks for flake in .
warning: Git tree 'XXX' is dirty
warning: unknown flake output 'deploy'
warning: unknown flake output 'sshUser'
error: --- Error ----------------------------------------------------------------------------------------------------------------------- nix
builder for '/nix/store/n54gx86pxxlypkybayvdjfrr8zjxa3cj-nixos-manual-combined.drv' failed with exit code 3; last 10 log lines:
  manual-combined.xml:3: element info: Relax-NG validity error : Element book has extra content: info
       1     <?xml version="1.0"?>
       2  <book xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xi="http://www.w3.org/2001/XInclude" version="5.0" xml:id="book-nixos-manual">
       3     <info>
       4          <title>NixOS Manual</title>
       5    <subtitle>Version 21.03
       6        </subtitle>
       7   </info>
  
  manual-combined.xml fails to validate
error: --- Error ----------------------------------------------------------------------------------------------------------------------- nix
1 dependencies of derivation '/nix/store/hab0rywg9xljxyqsnd4rpg5f7pz8wd3j-nixos-manpages.drv' failed to build
error: --- Error ----------------------------------------------------------------------------------------------------------------------- nix
1 dependencies of derivation '/nix/store/f13in78rbmzwq3zwiw79misya6xykbmd-nixos-manual-html.drv' failed to build
error: --- Error ----------------------------------------------------------------------------------------------------------------------- nix
1 dependencies of derivation '/nix/store/gkkd5a9lpyf4ggndjjwpxbwxfq0zk9rd-system-path.drv' failed to build
error: --- Error ----------------------------------------------------------------------------------------------------------------------- nix
1 dependencies of derivation '/nix/store/pccknb8sn6ay4imgr9d5mhwgkd3s0x4q-nixos-system-XXX-21.03.20201205.118b6d8.drv' failed to build
error: --- Error ----------------------------------------------------------------------------------------------------------------------- nix
1 dependencies of derivation '/nix/store/l24jvvr2kz9zbhnd27982bs9f2krym5v-activatable-nixos-system-XXX-21.03.20201205.118b6d8.drv' failed to build
error: --- Error ----------------------------------------------------------------------------------------------------------------------- nix
1 dependencies of derivation '/nix/store/y0lkh206j9pwh61nfc7wh8xky2ppcisj-deploy-rs-check-activate.drv' failed to build
error: --- Error ----------------------------------------------------------------------------------------------------------------------- nix
build of '/nix/store/2ngx6q2fl6wkaz389xv4l4cdnf72752h-jsonschema-deploy-system.drv', '/nix/store/y0lkh206j9pwh61nfc7wh8xky2ppcisj-deploy-rs-check-activate.drv' failed
 ERROR deploy > Failed to check deployment: Nix checking command resulted in a bad exit code: Some(100)

It's hard for me to tell if this is deploy-rs's fault, though. I'm new to using flakes as well.

Configurable file permissions for tempPath

Once this is added, we can safely make the permissions default to 777 and the default path /tmp/deploy-rs, fixing the permission errors that user profiles run into when they are deployed after system profiles.

deploy-rs gets stuck after deployment failure when magicRollback is enabled

For some reason, when magicRollback is enabled and the deployment script fails, deploy-rs hangs until the default timeout elapses instead of exiting immediately. Example of logs:

🚀 ℹ️ [deploy] [INFO] Activating profile `myprofile` for node `test`
🚀 ℹ️ [deploy] [INFO] Creating activation waiter
⭐ ℹ️ [activate] [INFO] Activating profile
👀 ℹ️ [wait] [INFO] Waiting for confirmation event...
⭐ ❌ [activate] [ERROR] The activation script resulted in a bad exit code: Some(1)
<here it gets stuck until the timeout>
👀 ❌ [wait] [ERROR] Error waiting for activation: Timeout elapsed for confirmation
🚀 ❌ [deploy] [ERROR] Failed to deploy profile: Waiting over SSH resulted in a bad exit code: Some(1)

Disabling magicRollback makes it fail immediately as expected:

⭐ ℹ️ [activate] [INFO] Activating profile
⭐ ❌ [activate] [ERROR] The activation script resulted in a bad exit code: Some(1)
🚀 ❌ [deploy] [ERROR] Failed to deploy profile: Activating over SSH resulted in a bad exit code: Some(1)

"Activation succeeded" even with errors

deploy-rs 1.0

A dependency job for systemd-networkd-wait-online.service failed. See 'journalctl -xe' for details.
A dependency job for network-local-commands.service failed. See 'journalctl -xe' for details.
Job for home-manager-bbigras.service failed because the control process exited with error code.
See "systemctl status home-manager-bbigras.service" and "journalctl -xe" for details.
the following new units were started: sys-fs-fuse-connections.mount
warning: the following units failed: home-manager-bbigras.service

● home-manager-bbigras.service - Home Manager environment for bbigras
     Loaded: loaded (/nix/store/awkgjr5m89r2d478z2jsv9kbczqn2psb-unit-home-manager-bbigras.service/home-manager-bbigras.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Tue 2021-01-19 18:39:34 EST; 432ms ago
    Process: 5155 ExecStart=/nix/store/6r4xq5sysphida907rnp3ka3zcq1v8p2-activate-bbigras (code=exited, status=1/FAILURE)
   Main PID: 5155 (code=exited, status=1/FAILURE)
         IP: 0B in, 0B out
        CPU: 1.110s

jan 19 18:39:34 laptop hm-activate-bbigras[5680]: error: --- Error --- nix-daemon
jan 19 18:39:34 laptop hm-activate-bbigras[5680]: packages '/nix/store/rmc0qqkr8h4cvldingcq0vxg5hvlixww-home-manager-path/bin/tab' and '/nix/store/g75ymy4s3ypm5n24wwxhmz41q3gpz1c6-tab-rs-0.5.5/bin/tab' have the same priority 5; use 'nix-env --set-flag priority NUMBER INSTALLED_PKGNAME' to change the priority of one of the conflicting packages (0 being the highest priority)
jan 19 18:39:34 laptop hm-activate-bbigras[5680]: error: --- Error --- nix-env
jan 19 18:39:34 laptop hm-activate-bbigras[5680]: builder for '/nix/store/dyddazcgksnpjmkzkm4sc050q2a9qb8x-user-environment.drv' failed with exit code 1; last 2 log lines:
jan 19 18:39:34 laptop hm-activate-bbigras[5680]:   error: --- Error --- nix-daemon
jan 19 18:39:34 laptop hm-activate-bbigras[5680]:   packages '/nix/store/rmc0qqkr8h4cvldingcq0vxg5hvlixww-home-manager-path/bin/tab' and '/nix/store/g75ymy4s3ypm5n24wwxhmz41q3gpz1c6-tab-rs-0.5.5/bin/tab' have the same priority 5; use 'nix-env --set-flag priority NUMBER INSTALLED_PKGNAME' to change the priority of one of the conflicting packages (0 being the highest priority)
jan 19 18:39:34 laptop systemd[1]: home-manager-bbigras.service: Main process exited, code=exited, status=1/FAILURE
jan 19 18:39:34 laptop systemd[1]: home-manager-bbigras.service: Failed with result 'exit-code'.
jan 19 18:39:34 laptop systemd[1]: Failed to start Home Manager environment for bbigras.
jan 19 18:39:34 laptop systemd[1]: home-manager-bbigras.service: Consumed 1.110s CPU time, no IP traffic.
⭐ ℹ️ [activate] [INFO] Activation succeeded!
⭐ ℹ️ [activate] [INFO] Magic rollback is enabled, setting up confirmation hook...
⭐ ℹ️ [activate] [INFO] Waiting for confirmation event...
🚀 ℹ️ [deploy] [INFO] Success activating, attempting to confirm activation
🚀 ℹ️ [deploy] [INFO] Deployment confirmed.

Default boot entry is never updated

If I look at my system profiles I see this:

❯ ls /nix/var/nix/profiles
Permissions Size User Date Modified Name
lrwxrwxrwx    15 root 14 Dec 18:01  default -> default-87-link
lrwxrwxrwx    60 root 14 Dec  7:37  default-87-link -> /nix/store/nivqnkygpax10ph1s2f5m8km4nr8sc24-user-environment
drwxr-xr-x     - root  4 Jun  1:27  per-user/
lrwxrwxrwx    15 root 18 Dec 18:12  system -> system-434-link
lrwxrwxrwx   100 root 18 Dec 17:28  system-432-link -> /nix/store/nzd9if3g1m1z2b37694pkq3472mdrhnx-activatable-nixos-system-foucault-21.03.20201218.9e67377
lrwxrwxrwx   100 root 18 Dec 17:46  system-433-link -> /nix/store/yycw0bmzpi7l5bx5zkpk4nb79sfydyki-activatable-nixos-system-foucault-21.03.20201218.9e67377
lrwxrwxrwx   100 root 18 Dec 18:12  system-434-link -> /nix/store/m8cv0h46jl4krqvwhf469561c3mlv1xs-activatable-nixos-system-foucault-21.03.20201218.9e67377

This looks exactly right, my last system profile, 434, was a successful deploy and so /nix/var/nix/profiles/system points to it.

If, now, I go look at my bootloader (systemd-boot) config, I see this:

❯ cat /boot/loader/loader.conf
───────┬──────────────────────────────────────────────────
       │ File: /boot/loader/loader.conf
───────┼──────────────────────────────────────────────────
   1   │ timeout 2
   2   │ default nixos-generation-426.conf
   3   │ console-mode keep
───────┴──────────────────────────────────────────────────

And this is despite the fact that the correct generation is available in the boot partition

❯ ls /boot/loader/entries
Permissions Size User Date Modified Name
.rwxr-xr-x   748 root 18 Dec 18:12  nixos-generation-432.conf*
.rwxr-xr-x   748 root 18 Dec 18:12  nixos-generation-433.conf*
.rwxr-xr-x   748 root 18 Dec 18:12  nixos-generation-434.conf*

The current default entry dates from around the time I switched to deploy-rs. Could it be that deploy-rs is just not updating that parameter?

Password based sudo

After bootstrapping a host with a regular nixos-install and setting a password for the nixos user, I tried to deploy subsequent generations with deploy-rs. However, even with interactive mode turned on, I was not able to enter the sudo password.

Either I didn't understand how to do it (a UX problem) or it is simply not possible at the moment (I think this is the case).

Some people might be reluctant to configure passwordless sudo as a baseline, so a way is needed to supply elevation credentials during deployment.
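
Until password entry is supported, one narrower alternative to system-wide passwordless sudo is to exempt a single dedicated account. The following is a minimal sketch of a NixOS module for the target, assuming a hypothetical user named `deploy` that is then used as `sshUser` in the deploy profile. It still grants that one account passwordless root, so it reduces the exposure rather than removing it.

```nix
# Hypothetical module for the target host; the user name "deploy" is an assumption.
{ ... }:
{
  users.users.deploy = {
    isNormalUser = true;
  };

  # Only this account may run sudo without a password;
  # every other user keeps the default password prompt.
  security.sudo.extraRules = [
    {
      users = [ "deploy" ];
      commands = [
        { command = "ALL"; options = [ "NOPASSWD" ]; }
      ];
    }
  ];
}
```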

Coming from divnix/digga#197 (comment)

Parallel Deploys

Would you consider allowing deploy-rs to deploy to multiple nodes simultaneously?

I wanted to attempt implementing that, but figured I'd check before so as not to end up wasting time.

Huge first deployment

I'm currently using nixus and would like to give deploy-rs a try.

I wasn't using flakes before.

One weird thing is that a lot of files seem to be copied for the deployment:

[2/210/2351 copied (931.2/7020.5 MiB)] copying path '/nix/store/cmrw93jr3a97jdbgqfzv3wi

I'm surprised since my computer shouldn't be that far behind.

Is that normal? Maybe it's just because I'm starting to use flakes or maybe deploy-rs does something different from nixus.

Likely confusion between target system & system running deploy

I finally got to the "activate" stage in my porting to deploy-rs (while running deploy on my macOS Big Sur machine), and unfortunately the deploy failed with:

 INFO  deploy > Evaluating flake in ./
 WARN  deploy > The following profiles are going to be deployed:
[monitor.system]
user = "root"
ssh_user = "asf"
path = "/nix/store/rscx6ya03wvd4qj2vv80pz1kazw7whrh-activatable-nixos-system-monitor-21.03.20201121.1ffd7cf"
hostname = "monitor.example.com"
ssh_opts = []

 INFO  deploy::utils::push > Building profile `system` for node `monitor`
warning: Git tree '/Users/asf/home' is dirty
 INFO  deploy::utils::deploy > Activating profile `system` for node `monitor`
sudo: /nix/store/n8hj08m4n6bwbvdm92ka59rjmaz29hvg-deploy-rs-0.1.0/bin/activate: command not found
Connection to monitor.example.com closed.
 ERROR deploy                > Failed to deploy profile: Activation over SSH resulted in a bad exit code: Some(1)

What's more, the closure that lands on the linux machine, printed above (/nix/store/n8hj08m4n6bwbvdm92ka59rjmaz29hvg-deploy-rs-0.1.0/bin/) contains a deploy binary that's built for macOS.

My guess is that the closure that gets copied to the target machine is compiled for the machine running the deploy. Maybe there's one more thing that needs building with the correct target platform set.
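
For reference, the `lib` output of deploy-rs is keyed by system, so the activation wrapper can be selected for the target's platform rather than for the machine running `deploy`. A minimal sketch, assuming an x86_64-linux target; `./configuration.nix` is a placeholder path, and whether this alone avoids the failure above depends on the deploy-rs version in use.

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
  inputs.deploy-rs.url = "github:serokell/deploy-rs";

  outputs = { self, nixpkgs, deploy-rs }:
    let
      # Platform of the machine being deployed TO, not of the machine running `deploy`.
      targetSystem = "x86_64-linux";
    in {
      nixosConfigurations.monitor = nixpkgs.lib.nixosSystem {
        system = targetSystem;
        modules = [ ./configuration.nix ]; # placeholder path
      };

      deploy.nodes.monitor = {
        hostname = "monitor.example.com";
        profiles.system = {
          user = "root";
          sshUser = "asf";
          # The activation wrapper comes from the lib set for the target platform.
          path = deploy-rs.lib.${targetSystem}.activate.nixos
            self.nixosConfigurations.monitor;
        };
      };
    };
}
```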

Binary install?

👋

Forgive the naive question here, but I can't figure out how I'm supposed to ensure the deploy binary is present on the system I want to deploy from.

My flake.nix houses my system configuration for my laptops and remote machines I want to deploy to.
I want the deploy binary to be present in my laptop's PATH so I can use it to deploy from the flake.

I see the flake presents a packages.deploy-rs and have tried a combination of adding this to my home-manager packages/systemPackages, but I'm not having any luck.

I was able to confirm my deployment worked by using nix run github:serokell/deploy-rs <my flake>, which is great - I'd just like to not rely on running the flake using the GitHub fetcher :)

Here's a succinct example of what I've tried:

{
  inputs.nixpkgs.url = "github:nixos/nixpkgs-unstable";
  inputs.darwin.url = "github:lnl7/nix-darwin";
  inputs.deploy-rs.url = "github:serokell/deploy-rs";

  outputs = { self, nixpkgs, darwin, deploy-rs }: {
    darwinConfigurations.example = darwin.lib.darwinSystem {
      modules = [
        {
          # have also tried just `deploy-rs`
          environment.systemPackages = [ deploy-rs.packages.deploy-rs ];
        }
      ];
    };
  };
}

I'm sure it's just something stupid I'm missing, but I'd really appreciate a hand here 🙇
Many thanks for building and sharing this wonderful tool!
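
A likely cause is that flake `packages` outputs are keyed by system, so `deploy-rs.packages.deploy-rs` does not name a package. A sketch of the same flake with the system spelled out, assuming an Intel Mac (`x86_64-darwin`); substitute `aarch64-darwin` on Apple Silicon. The nixpkgs URL is also written here in its usual `owner/repo/ref` form.

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
  inputs.darwin.url = "github:lnl7/nix-darwin";
  inputs.deploy-rs.url = "github:serokell/deploy-rs";

  outputs = { self, nixpkgs, darwin, deploy-rs }: {
    darwinConfigurations.example = darwin.lib.darwinSystem {
      modules = [
        {
          # packages.<system>.<name>: the system attribute was missing in the original attempt.
          environment.systemPackages = [
            deploy-rs.packages.x86_64-darwin.deploy-rs
          ];
        }
      ];
    };
  };
}
```

With this in place the `deploy` binary should end up on the laptop's PATH after the next system rebuild, pinned by the flake's lock file instead of fetched on every `nix run`.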

Specify MSRV

Every time I try to rebuild deploy-rs I get compile errors because the Rust I'm using is too old (1.42). It'd be nice if the project could pick a minimum supported Rust version, document its support for it, and add a CI job to ensure compatibility is kept.

Activate is trying to connect with the wrong user name

I deploy the flake:
{
  description = "Deployment for my servers";
  inputs = {
    deploy-rs.url = "github:serokell/deploy-rs";
    nixpkgs.url = "github:NixOS/nixpkgs/release-20.09";
  };

  outputs = { self, nixpkgs, deploy-rs }: {

    nixosConfigurations = {
      nixos-test = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [ ./nixos-test/configuration.nix ];
      };
    };

    deploy = {
      magicRollback = false;
      autoRollback = true;
      nodes = {
        nixos-test = {
          hostname = "192.168.122.107";
          profiles = {
            system = {
              user = "root";
              sshUser = "dz";
              fastConnection = false;
              path = deploy-rs.lib.x86_64-linux.activate.nixos self.nixosConfigurations.nixos-test;
            };
          };
        };
      };
    };

    checks = builtins.mapAttrs (system: deployLib: deployLib.deployChecks self.deploy) deploy-rs.lib;

  };
}

The build and "nix copy" steps run fine, but the activate step tries to connect with the username "ssh".

fqdn / domain attribute or cli flag

hostname is a good approximation, but we also need a domain, so that we can construct FQDNs even in split-horizon scenarios, where the domain is swapped.

This is the case for Multicast DNS which, per RFC 6762, swaps the domain for local., turning the name into hostname.local.

NixOS to that end knows:

{
    networking.hostName = mkOption {
      default = "nixos";
      # Only allow hostnames without the domain name part (i.e. no FQDNs, see
      # e.g. "man 5 hostname") and require valid DNS labels (recommended
      # syntax). Note: We also allow underscores for compatibility/legacy
      # reasons (as undocumented feature):
      type = types.strMatching
        "^$|^[[:alnum:]]([[:alnum:]_-]{0,61}[[:alnum:]])?$";
      description = ''
        The name of the machine. Leave it empty if you want to obtain it from a
        DHCP server (if using DHCP). The hostname must be a valid DNS label (see
        RFC 1035 section 2.3.1: "Preferred name syntax", RFC 1123 section 2.1:
        "Host Names and Numbers") and as such must not contain the domain part.
        This means that the hostname must start with a letter or digit,
        end with a letter or digit, and have as interior characters only
        letters, digits, and hyphen. The maximum length is 63 characters.
        Additionally it is recommended to only use lower-case characters.
        If (e.g. for legacy reasons) a FQDN is required as the Linux kernel
        network node hostname (uname --nodename) the option
        boot.kernel.sysctl."kernel.hostname" can be used as a workaround (but
        the 64 character limit still applies).
      '';
    };

    networking.fqdn = mkOption {
      readOnly = true;
      type = types.str;
      default = if (cfg.hostName != "" && cfg.domain != null)
        then "${cfg.hostName}.${cfg.domain}"
        else throw ''
          The FQDN is required but cannot be determined. Please make sure that
          both networking.hostName and networking.domain are set properly.
        '';
      defaultText = literalExample ''''${networking.hostName}.''${networking.domain}'';
      description = ''
        The fully qualified domain name (FQDN) of this host. It is the result
        of combining networking.hostName and networking.domain. Using this
        option will result in an evaluation error if the hostname is empty or
        no domain is specified.
      '';
    };


    networking.domain = mkOption {
      default = null;
      example = "home.arpa";
      type = types.nullOr types.str;
      description = ''
        The domain.  It can be left empty if it is auto-detected through DHCP.
      '';
    };
}

deployd-rs

I imagine a deployd daemon that acts, in principle, similarly to https://fluxcd.io: it regularly polls a cache for updates to a profile that is mapped through some sort of machine ID, and if a new version is available it automagically 1) pulls it, 2) updates its current generation and 3) does basic safety rollback stuff.

Check generated fstab for changes

I had the unfortunate experience of accidentally swapping the mapping of servers, then deploying. For example, I deployed my laptop configuration to my desktop by accident. This passed the local deploy-rs checks, but resulted in my server mounting the filesystems wrong and crashing.

I don't know how difficult this would be to implement but it would be a cool feature to add a check for this to deploy-rs. This check would ensure that the filesystem currently mounted (current /etc/static/fstab) on the system(s) to deploy to isn't changed by the new configuration. If it is, one could prompt the user if they still want to deploy (because it might be an accident).

rollback even if I got "Success activating!"

I'm pretty sure I'm seeing a rollback even if I got "Success activating!".

Is there a log or something on the target to verify?

EDIT: I see 2 hm-activate-bbigras runs for the same deploy.

sshd.conf-validated "lacks valid signature"

I finally got my config to pass checks, yay! This is what I get when trying to deploy to virtualbox:

 WARN  deploy > The following profiles are going to be deployed:
[peertube.system]
user = "root"
ssh_user = "admin"
path = "/nix/store/ciawrnkay7nf3400cxa897ngqh1p14gz-activatable-nixos-system-peertube-test-20.09.20201205.99f8282"
hostname = "172.16.5.118"
ssh_opts = []

 INFO  deploy::utils::push > Building profile `system` for node `peertube`
warning: Git tree 'XXX' is dirty
Enter passphrase for key 'XXX': 
copying 171 pathserror: cannot add path '/nix/store/38j8q2djbd2n6kh84gqdaz8z9gdilvm3-sshd.conf-validated' because it lacks a valid signature
error (ignored): error: --- EndOfFile ------------------------------------------------------------------------------------------------------------------- nix
unexpected end-of-file
error (ignored): error: --- EndOfFile ------------------------------------------------------------------------------------------------------------------- nix
unexpected end-of-file
error: --- Error ----------------------------------------------------------------------------------------------------------------------- nix
unexpected end-of-file
 ERROR deploy              > Failed to push profile: Nix copy command resulted in a bad exit code: Some(1)
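
This error commonly appears when the SSH user on the target is not trusted by the Nix daemon there, so unsigned store paths are refused. A minimal sketch of one way around it, assuming the `admin` user from the log above and a NixOS 20.09 target; on newer NixOS releases the option is named `nix.settings.trusted-users`. The alternative described in the README is to sign the closure by pointing the LOCAL_KEY environment variable at a signing key.

```nix
# Module for the target host (NixOS 20.09 option name).
{ ... }:
{
  # Allow this user to import store paths that carry no trusted signature.
  # Note that a trusted user effectively has root-equivalent control over the Nix store.
  nix.trustedUsers = [ "root" "admin" ];
}
```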

Can't run `nix run github:serokell/deploy-rs`

❯ nix run github:serokell/deploy-rs                                           nix-shell
error: --- UsageError -------------------------------------------------------------- nix
unrecognised flag '--command'
Try 'nix --help' for more information

Master doesn't build with the rust toolchain in 20.09 release

I was surprised to find that the latest HEAD of deploy-rs doesn't build with rust 1.45.2, the version that's in the 20.09 release:

:;    cargo +1.45.2 test
   Compiling smol_str v0.1.17
   Compiling thin-dst v1.1.0
   Compiling rustc-hash v1.1.0
   Compiling text_unit v0.1.10
   Compiling cbitset v0.2.0
error[E0658]: `while` is not allowed in a `const fn`
  --> /Users/asf/.cargo/registry/src/github.com-1ecc6299db9ec823/smol_str-0.1.17/src/lib.rs:58:9
   |
58 | /         while i < text.len() {
59 | |             buf[i] = text.as_bytes()[i];
60 | |             i += 1
61 | |         }
   | |_________^
   |
   = note: see issue #52000 <https://github.com/rust-lang/rust/issues/52000> for more information

error: aborting due to previous error

For more information about this error, try `rustc --explain E0658`.
error: could not compile `smol_str`.

To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
error: build failed

Looks like smol_str (pulled in by rowan, via rnix) is using const fn's in a slightly too-modern way.

Rollback not working as expected.

Hi,

I was testing the rollback feature by deliberately introducing a mistake in both the sshd service and the networking configuration.

restarting the following units: network-addresses-ens18.service, sshd.service
⭐ ℹ️ [activate] [INFO] Activation succeeded!
⭐ ℹ️ [activate] [INFO] Magic rollback is enabled, setting up confirmation hook...
⭐ ℹ️ [activate] [INFO] Waiting for confirmation event...
setting up /etc...
⭐ ❌ [activate] [ERROR] Error waiting for confirmation event: Timeout elapsed for confirmation
⭐ ⚠️ [activate] [WARN] De-activating due to error
warning: unknown setting 'experimental-features'
switching from generation 19 to 18
⭐ ⚠️ [activate] [WARN] Removing generation by ID 19
warning: unknown setting 'experimental-features'
removing generation 19
⭐ ℹ️ [activate] [INFO] Attempting to re-activate the last generation
warning: unknown setting 'experimental-features'
activating the configuration...
reloading user units for root...
setting up tmpfiles
restarting the following units: network-addresses-ens18.service

As expected, the deploy rolls back, but instead of restarting both network and sshd (as switching profiles does), it only restarts the network.

When I test with a mistake in only the network or only sshd, the rollback works.

Is there something I missed?

Best,

Show profiles before deploying them

Sometimes it would be nice to see what's going to happen before deploy-rs starts doing its work. In particular, I propose to add a generic interactive :: boolean attribute to the deploy attrs, and if it is set to true for a profile, deploy-rs dumps the fully evaluated profile to stdout/stderr (in pretty-printed JSON/YAML/TOML, doesn't really matter here but it should be human-readable) and interactively asks the user to confirm the deployment. A very nice feature would be to also consider where the generic setting was specified, and ask before doing anything for that particular level (i.e. if interactive = true for a node, dump all profiles that are to be deployed for that node and ask before deploying anything for that node).

Example of desired behavior:

$ nix run github:serokell/deploy-rs -- .#foo.system --interactive
 INFO  deploy > Running checks for flake in .
warning: unknown flake output 'deploy'
 INFO  deploy > Evaluating flake in .
 INFO  deploy > The following profiles for flake . are to be deployed:
 foo:
   system:
     user: root
     path: /nix/store/...-nixos-system-foo-20.09pre-git
     hostname: foo.example.com
     interactive: true
     sshOpts:
       - "-p"
       - "1234"
 INFO  deploy > Are you sure you want to deploy those profiles? (only the word yes is accepted)
 ? yes
 INFO  deploy::utils::push > Building profile `system` for node `foo`
<...>
$ nix run github:serokell/deploy-rs -- .#foo --interactive
 INFO  deploy > Running checks for flake in .
warning: unknown flake output 'deploy'
 INFO  deploy > Evaluating flake in .
 INFO  deploy > The following profiles for flake . are to be deployed:
 foo:
   system:
     user: root
     path: /nix/store/...-nixos-system-foo-20.09pre-git
     hostname: foo.example.com
     interactive: true
     sshOpts:
       - "-p"
       - "1234"
   home-manager:
     user: vasya
     path: /nix/store/...-home-manager-generation-vasya
     hostname: foo.example.com
     interactive: true
     sshOpts:
       - "-p"
       - "1234"
 INFO  deploy > Are you sure you want to deploy those profiles? (only the word yes is accepted)
 ? yes
 INFO  deploy::utils::push > Building profile `system` for node `foo`
<...>

deploy install

In the context of https://devos.divnix.com/doc/start/bootstrapping.html, it would be nice if deploy-rs could install onto /mnt during bootstrapping, just as the official installer does.

While it is currently possible to bootstrap (almost) fully remotely, except for sticking the USB drive into the host, using deploy-rs for the initial deployment onto /mnt has the added benefit of a single, uniform workflow from generation 1 onwards.

There are probably more benefits waiting downstream.

Activation preset for home-manager

Despite intentionally remaining profile-agnostic, deploy-rs provides support for NixOS in the form of activate.nixos. I would like to provide a similar option for home-manager deployments. I know this to be possible, as I use it in my own deployments, though the last time I checked it required modifying some of home-manager's code.

example `system` doesn't work.

To Reproduce

I ran these commands:

git clone https://github.com/serokell/deploy-rs
cd deploy-rs/examples/system/
nix build .#nixosConfigurations.bare.config.system.build.vm
QEMU_NET_OPTS=hostfwd=tcp::2221-:22 ./result/bin/run-bare-system-vm &
nix run github:serokell/deploy-rs 

Error

And got this error:

🚀 ℹ️ [deploy] [INFO] Running checks for flake in .
warning: unknown flake output 'deploy'
error: attribute 'activate' missing

       at /nix/store/frx9n1lwlhvl6ab3hm6mnq69yid117xz-source/examples/system/flake.nix:38:18:

           37|           sshUser = "hello";
           38|           path = deploy-rs.lib.x86_64-linux.activate.custom self.defaultPackage.x86_64-linux "./bin/activate";
             |                  ^
           39|           user = "hello";
(use '--show-trace' to show detailed location information)
🚀 ❌ [deploy] [ERROR] Failed to check deployment: Nix checking command resulted in a bad exit code: Some(1)

About my system

  • system: "x86_64-linux"
  • host os: Linux 5.11.11, NixOS, 20.09.4021.2685792d396 (Nightingale)
  • multi-user?: yes
  • sandbox: yes
  • version: nix-env (Nix) 2.4pre20210326_dd77f71
  • channels(root): "nixos-20.09.4021.2685792d396, nixos-unstable-21.05pre285574.8e4fe32876c"
  • channels(wucke13): ""
  • nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos

Sporadic (??!) "Activation script deploy-rs-activate does not exist in profile." errors

I'm running into a bit of a stumper: Occasionally, deploy-rs (c599d36) will error with "Activation script deploy-rs-activate does not exist in profile." However, if I re-run the deploy command, deploying exactly the same revision as before, it succeeds.

Here's a shell transcript:

$ nix run ./#deploy -- './#bonnetmaker' -- --show-trace -L
warning: Git tree '/Users/asf/home' is dirty
trace: Warning: `stdenv.lib` is deprecated and will be removed in the next release. Please use `lib` instead. For more information see https://github.com/NixOS/nixpkgs/issues/108938
🚀 ℹ️ [deploy] [INFO] Running checks for flake in ./
warning: Git tree '/Users/asf/home' is dirty
warning: unknown flake output 'deploy'
warning: unknown flake output 'config'
warning: unknown flake output 'darwinConfigurations'
🚀 ℹ️ [deploy] [INFO] Evaluating flake in ./
warning: Git tree '/Users/asf/home' is dirty
trace: Warning: `stdenv.lib` is deprecated and will be removed in the next release. Please use `lib` instead. For more information see https://github.com/NixOS/nixpkgs/issues/108938
trace: Warning: `stdenv.lib` is deprecated and will be removed in the next release. Please use `lib` instead. For more information see https://github.com/NixOS/nixpkgs/issues/108938
🚀 ℹ️ [deploy] [INFO] The following profiles are going to be deployed:
[bonnetmaker.system]
user = "root"
ssh_user = "asf"
path = "/nix/store/dxslwd32gzd9nn3zw3rks4mr1gghrpln-activatable-nixos-system-bonnetmaker-21.05.20210224.fdd622d"
hostname = "bonnetmaker.example.com"
ssh_opts = []

🚀 ℹ️ [deploy] [INFO] Building profile `system` for node `bonnetmaker`
warning: Git tree '/Users/asf/home' is dirty
trace: Warning: `stdenv.lib` is deprecated and will be removed in the next release. Please use `lib` instead. For more information see https://github.com/NixOS/nixpkgs/issues/108938
trace: Warning: `stdenv.lib` is deprecated and will be removed in the next release. Please use `lib` instead. For more information see https://github.com/NixOS/nixpkgs/issues/108938
activatable-nixos-system-bonnetmaker> created 20 symlinks in user environment
🚀 ❌ [deploy] [ERROR] Failed to push profile: Activation script deploy-rs-activate does not exist in profile.
Did you forget to use deploy-rs#lib.<...>.activate.<...> on your profile path?

$ nix run ./#deploy -- './#bonnetmaker' -- --show-trace -L
warning: Git tree '/Users/asf/home' is dirty
trace: Warning: `stdenv.lib` is deprecated and will be removed in the next release. Please use `lib` instead. For more information see https://github.com/NixOS/nixpkgs/issues/108938
🚀 ℹ️ [deploy] [INFO] Running checks for flake in ./
warning: Git tree '/Users/asf/home' is dirty
warning: unknown flake output 'deploy'
warning: unknown flake output 'config'
warning: unknown flake output 'darwinConfigurations'
🚀 ℹ️ [deploy] [INFO] Evaluating flake in ./
warning: Git tree '/Users/asf/home' is dirty
trace: Warning: `stdenv.lib` is deprecated and will be removed in the next release. Please use `lib` instead. For more information see https://github.com/NixOS/nixpkgs/issues/108938
trace: Warning: `stdenv.lib` is deprecated and will be removed in the next release. Please use `lib` instead. For more information see https://github.com/NixOS/nixpkgs/issues/108938
🚀 ℹ️ [deploy] [INFO] The following profiles are going to be deployed:
[bonnetmaker.system]
user = "root"
ssh_user = "asf"
path = "/nix/store/8p86hrg6sb5gayzlfxa41ml26ry77smg-activatable-nixos-system-bonnetmaker-21.05.20210224.fdd622d"
hostname = "bonnetmaker.example.com"
ssh_opts = []

🚀 ℹ️ [deploy] [INFO] Building profile `system` for node `bonnetmaker`
warning: Git tree '/Users/asf/home' is dirty
🚀 ℹ️ [deploy] [INFO] Activating profile `system` for node `bonnetmaker`
🚀 ℹ️ [deploy] [INFO] Creating activation waiter
👀 ℹ️ [wait] [INFO] Waiting for confirmation event...
⭐ ℹ️ [activate] [INFO] Activating profile
810 blocks
810 blocks
810 blocks
810 blocks
810 blocks
810 blocks
810 blocks
810 blocks
810 blocks
810 blocks
stopping the following units: home-manager-asf.service
activating the configuration...
setting up /etc...
setting up secrets...
sops-install-secrets: Imported /etc/ssh/ssh_host_rsa_key with fingerprint 3092a98082c20dbf46ea0a94570ada1798d03d7e
reloading user units for asf...
setting up tmpfiles
starting the following units: home-manager-asf.service
⭐ ℹ️ [activate] [INFO] Activation succeeded!
⭐ ℹ️ [activate] [INFO] Magic rollback is enabled, setting up confirmation hook...
👀 ℹ️ [wait] [INFO] Found canary file, done waiting!
⭐ ℹ️ [activate] [INFO] Waiting for confirmation event...
🚀 ℹ️ [deploy] [INFO] Success activating, attempting to confirm activation
🚀 ℹ️ [deploy] [INFO] Deployment confirmed.

Question about multiple profiles

This type of design (as opposed to more traditional tools like NixOps or morph) allows for lesser-privileged deployments, and the ability to update different things independently of each other. You can deploy any type of profile to any user, not just a NixOS profile to root.

If I have a system profile for my server with nginx set up, is it possible to deploy another profile that would set another vhost for nginx (services.nginx.virtualHosts)?

I'm guessing it can't be done and multiple profiles are more for things that wouldn't have to modify the configuration.nix file.
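
That guess matches how Nix profiles work in general: each profile is an independent store path, so a second profile cannot contribute options to the NixOS configuration evaluated for the system profile. A sketch of the split, under the assumption that extra vhosts live as additional modules of the one NixOS configuration while unrelated per-user software goes into its own profile. The host name, vhost, `./hardware-configuration.nix` and the `hello` user are all placeholders.

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
  inputs.deploy-rs.url = "github:serokell/deploy-rs";

  outputs = { self, nixpkgs, deploy-rs }: {
    nixosConfigurations.web = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./hardware-configuration.nix # placeholder
        { services.nginx.enable = true; }
        # Additional vhosts are ordinary modules merged into the same system.
        { services.nginx.virtualHosts."blog.example.com".root = "/var/www/blog"; }
      ];
    };

    deploy.nodes.web = {
      hostname = "web.example.com";
      profilesOrder = [ "system" "hello" ];
      profiles = {
        # Everything that touches NixOS options is deployed together, as root.
        system = {
          user = "root";
          path = deploy-rs.lib.x86_64-linux.activate.nixos
            self.nixosConfigurations.web;
        };
        # An independent, unprivileged profile; it cannot alter the system profile.
        hello = {
          user = "hello";
          path = deploy-rs.lib.x86_64-linux.activate.custom
            nixpkgs.legacyPackages.x86_64-linux.hello "./bin/hello";
        };
      };
    };
  };
}
```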

Build on target server?

Is it possible to build the flake on the target server instead of my laptop?
I don't want to upload big build closures to my machines all the time.
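
Later deploy-rs releases appear to document a `remoteBuild` option for this. The sketch below assumes that option exists in the version being used, which should be confirmed against the README and `deploy --help` before relying on it; the node name, hostname and `./configuration.nix` are placeholders.

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
  inputs.deploy-rs.url = "github:serokell/deploy-rs";

  outputs = { self, nixpkgs, deploy-rs }: {
    nixosConfigurations.server = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./configuration.nix ]; # placeholder path
    };

    deploy.nodes.server = {
      hostname = "server.example.com";
      # Assumption: supported by the deploy-rs version in use.
      # Builds the profile on the target instead of copying a local build.
      remoteBuild = true;
      profiles.system = {
        user = "root";
        path = deploy-rs.lib.x86_64-linux.activate.nixos
          self.nixosConfigurations.server;
      };
    };
  };
}
```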

Deploying to other platforms with remote builders?

I'd love to deploy from my darwin system to an x86_64 linux system (with the aid of #9), which used to work in my previous nixus-based deploy system via remote builders.

I thought I could use the same builder setup with deploy-rs:

$ nix run github:antifuchs/deploy-rs/attempt-to-fix-corefoundation-error -- ./#monitor -- --show-trace --builders 'ssh-ng://linux-box.local x86_64-linux ; ssh-ng://raspi.local aarch64-linux' --builders-use-substitutes
warning: unknown flake output 'config'
warning: unknown flake output 'darwinConfigurations'
warning: unknown flake output 'deploy'
 INFO  deploy > Evaluating flake in ./
 WARN  deploy > The following profiles are going to be deployed:
[monitor.system]
user = "root"
ssh_user = "asf"
path = "/nix/store/rscx6ya03wvd4qj2vv80pz1kazw7whrh-activatable-nixos-system-monitor-21.03.20201121.1ffd7cf"
hostname = "monitor.example.com"
ssh_opts = []

 INFO  deploy::utils::push > Building profile `system` for node `monitor`
error: --- Error ------------------------------------------------------------------------------------------------------ nix
a 'x86_64-linux' with features {} is required to build '/nix/store/91slb8lh1xpqbxx7lhd58hzv4wpmwbh2-builder.pl.drv', but I am a 'x86_64-darwin' with features {benchmark, big-parallel, nixos-test, recursive-nix}
 ERROR deploy              > Failed to push profile: Nix build command resulted in a bad exit code: Some(1)

(redacted actual host names).

I think using remote builders ought to work, considering that a previous non-flake-based nix-build did use the given builders to build off-platform binaries. Is this a deploy-rs limitation, or is the beta nix build program not working right?

Hostnames with . in them have weird results

I thought I'd define my nixos config entries to be the same as the FQDN of the hosts that I'm deploying to - but apparently deploy-rs uses . as a separator between the hostname and a profile name:

$ nix run github:serokell/deploy-rs ./#monitor.example.com
[...]
 ERROR deploy > Failed to deploy all profiles: example.com

I assume the . separates hosts from deploy profiles; is there a better way to do that? I think FQDNs would be really useful for identifying larger sets of hosts, making the dot character quite valuable.
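
The README now covers this case: a node or profile name containing a . can be wrapped in double quotes, with the whole flake reference in single quotes for the shell. A sketch of the matching flake side, where the attribute name is quoted in Nix as well; `./configuration.nix` is a placeholder.

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
  inputs.deploy-rs.url = "github:serokell/deploy-rs";

  outputs = { self, nixpkgs, deploy-rs }: {
    # Attribute names containing dots must be quoted in Nix.
    nixosConfigurations."monitor.example.com" = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./configuration.nix ]; # placeholder path
    };

    deploy.nodes."monitor.example.com" = {
      hostname = "monitor.example.com";
      profiles.system = {
        user = "root";
        path = deploy-rs.lib.x86_64-linux.activate.nixos
          self.nixosConfigurations."monitor.example.com";
      };
    };
  };
}

# Invocation, following the quoting rule from the README:
#   deploy './#"monitor.example.com".system'
```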

Activation script runs forever

Even after the deploy has succeeded and my deploy .#myHost command has exited, I can see the activation script still running on my machine and pegging one CPU at 100% utilization.

❯ ps aux | rg -i deploy-rs
root     18497  100  0.0 953168   608 ?        Sl   20:57   0:41 /nix/store/5sdsrjrk314jv8zw8179bl65bkcg3dim-deploy-rs-0.1.0/bin/activate /nix/var/nix/profiles/system /nix/store/m8cv0h46jl4krqvwhf469561c3mlv1xs-activatable-nixos-system-foucault-21.03.20201218.9e67377 --temp-path /tmp --confirm-timeout 30 --magic-rollback --auto-rollback
bemeurer 18988  0.0  0.0   9700  4556 pts/2    R+   20:58   0:00 rg -i deploy-rs

This happens every single time I deploy to my machine.

does not provide attribute 'packages.x86_64-linux.deploy.nodes.laptop.nix.profiles.system.path', 'legacyPackages.x86_64-linux.deploy.nodes.laptop.nix.profiles.system.path' or 'deploy.nodes.laptop.nix.profiles.system.path'

Not sure what I'm doing wrong.

Here's my https://github.com/bbigras/nix-config/blob/ec185dc51b4818c154cdd6d88a0c0080f3000a71/flake.nix .

Note that I'm using a nix file for each system instead of a directory with a default.nix.

[bbigras@desktop:~/nix-config]$ deploy
 INFO  deploy > Running checks for flake in .
warning: Git tree '/home/bbigras/nix-config' is dirty
warning: unknown flake output 'deploy'
 INFO  deploy > Evaluating flake in .
warning: Git tree '/home/bbigras/nix-config' is dirty
 WARN  deploy > The following profiles are going to be deployed:
["laptop.nix".system]
user = "root"
ssh_user = "bbigras"
path = "/nix/store/5q9256z5p4fxqksq4mqj44j7rf02ngn6-activatable-nixos-system-laptop-21.03.20201121.2247d82"
hostname = "laptop"
ssh_opts = []

 INFO  deploy::utils::push > Building profile `system` for node `laptop.nix`
warning: Git tree '/home/bbigras/nix-config' is dirty
error: --- Error --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- nix
flake 'git+file:///home/bbigras/nix-config' does not provide attribute 'packages.x86_64-linux.deploy.nodes.laptop.nix.profiles.system.path', 'legacyPackages.x86_64-linux.deploy.nodes.laptop.nix.profiles.system.path' or 'deploy.nodes.laptop.nix.profiles.system.path'
 ERROR deploy              > Failed to push profile: Nix build command resulted in a bad exit code: Some(1)

Also it's weird that I can't run nix run github:serokell/deploy-rs. I'm using a nix develop shell instead:

❯ nix run github:serokell/deploy-rs                                                                                                                                                                     nix-shell
error: --- UsageError ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- nix
unrecognised flag '--command'
Try 'nix --help' for more information.

How to prepare target machines?

This is the first time I'm using a deploy tool to set up NixOS machines. So far I have only managed NixOS machines manually, by pushing configurations to them and updating.

So now for testing I have a VirtualBox VM that I have installed NixOS onto and I'd like to use deploy-rs to deploy onto the machine (in order to later deploy onto a real VPS).

Is there a standard way of "preparing" a machine to be a target for a deployment like with deploy-rs?
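
As a starting point, the target mostly needs to be reachable over SSH by a user that can elevate to the profile's `user` and push store paths. A minimal sketch of a NixOS module for the VM, not an official checklist: the `deploy` user name and the SSH key are placeholders, and the option names are those of recent NixOS releases (older releases use `nix.trustedUsers`).

```nix
# Hypothetical baseline module for a machine that deploy-rs will target.
{ ... }:
{
  # deploy-rs connects over SSH.
  services.openssh.enable = true;

  users.users.deploy = {
    isNormalUser = true;
    extraGroups = [ "wheel" ];
    openssh.authorizedKeys.keys = [
      "ssh-ed25519 AAAA...replace-with-your-public-key" # placeholder
    ];
  };

  # Activation of a root-owned profile goes through sudo without a TTY prompt.
  security.sudo.wheelNeedsPassword = false;

  # Lets the SSH user receive closures that are not signed by a trusted key.
  nix.settings.trusted-users = [ "root" "deploy" ];
}
```

After installing the VM once with this module included, the deploy profile would set `sshUser = "deploy"` and `user = "root"`.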

deploy stuck when target network goes down during deploy

My deploy has been stuck like this for a while.

I was deploying from my desktop to my laptop.

My laptop lost network during the deploy.

[...]
🚀 ℹ️ [deploy] [INFO] Activating profile `system` for node `laptop`
🚀 ℹ️ [deploy] [INFO] Creating activation waiter
⭐ ℹ️ [activate] [INFO] Activating profile
👀 ℹ️ [wait] [INFO] Waiting for confirmation event...
stopping the following units: acpid.service, alsa-store.service, audit.service, chronyd.service, eternal-terminal.service, fstrim.timer, geoclue.service, home-manager-bbigras.service, iwd.service, kmod-static-nodes.service, network-local-commands.service, nix-daemon.service, nix-daemon.socket, nscd.service, systemd-journal-catalog-update.service, systemd-modules-load.service, systemd-networkd-wait-online.service, systemd-networkd.service, systemd-networkd.socket, systemd-resolved.service, systemd-sysctl.service, systemd-tmpfiles-clean.timer, systemd-tmpfiles-setup-dev.service, systemd-udev-trigger.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, systemd-udevd.service, systemd-update-done.service, tailscaled.service, thermald.service, tlp.service, zerotierone.service
