equinix-metal-builders's Introduction

Transient Nix Builders on Packet.com

I use Packet.com's spot market to run transient, powerful Nix builders. The files here are custom to my builders and my use case. However, they could easily serve as a nice template for you to use.

I would accept PRs parameterizing the code.

Principles of Operation

This repository creates a bootable iPXE image. We assume each boot starts with an empty set of disks. Because that may not be true, every boot erases all attached disks and creates one large ZFS stripe across them. If you have any disks attached, it WILL erase them on every boot.
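As a rough illustration of what "one large ZFS stripe" means, the sketch below only prints a zpool command and never runs it. The helper name stripe_command and the pool name scratch are hypothetical; the repository's actual boot-time logic may differ.

```shell
# Sketch only: prints the zpool command instead of executing it.
# "stripe_command" and the pool name are hypothetical, not from this repo.

# Build a `zpool create` command that stripes across every disk given.
# Listing plain devices with no "mirror" or "raidz" keyword yields a stripe.
stripe_command() {
    pool="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo "no disks given" >&2
        return 1
    fi
    # -f forces creation even when the disks carry old labels (destructive).
    echo "zpool create -f $pool $*"
}

# Example (prints, does not execute):
#   stripe_command scratch /dev/sda /dev/sdb
```

A stripe has no redundancy, which fits the stateless design: losing a disk only loses data that would be discarded at shutdown anyway.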

Each machine is stateless. As soon as it boots it is ready to build. When the machine shuts down, all data is lost.

To customize

  1. Edit ./user.nix to have your user and your user's key. If you have a key which is only used by Nix's remote builder protocol, then it might belong in the sshKeys list at the top.

  2. Edit ./instances/m2.xlarge.x86.nix to match the hardware you'll be deploying to. These machines are all Packet.com's m2.xlarge.x86 type, so if you're also using those, it is ready to go.

Building

m2.xlarge.x86

You can simply run nix-build ./instances/m2.xlarge.x86.nix in this directory to create a bootable image. I use ./build-x86_64-linux.sh instead, which instantiates locally and builds on my netboot server. The remote server builds much faster and saves my battery life.

c2.large.arm

In principle this one is just as easy. If you're on a machine which can build aarch64 binaries, then you can just run nix-build ./instances/c2.large.arm.nix. However, I am not, which makes this a bit annoying.

Therefore, I've written ./build-aarch64-linux.sh which requires a configuration file of this format:

buildHost       [email protected]
pxeHost         [email protected]
pxeDir          /var/lib/nginx/netboot/
opensslServer   my.netboot.server
opensslPort     61616

This will copy the derivations to buildHost for building, and then set up an openssl-wrapped netcat tunnel from buildHost to opensslServer:opensslPort for transferring the build products.

My laptop will SSH to the pxeHost and launch openssl and netcat, then SSH to the buildHost and initiate a connection to opensslServer:opensslPort. If this doesn't work, make sure that port is open.
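The configuration file shown above is a list of whitespace-separated key/value pairs. A minimal sketch of reading one value from such a file follows; config_get is a hypothetical helper and is not taken from ./build-aarch64-linux.sh, which may parse the file differently.

```shell
# Look up one key in a "key value" config file like the one above.
# config_get is a hypothetical helper, not part of build-aarch64-linux.sh.
config_get() {
    # $1 = config file, $2 = key.
    # Prints the first matching value; exits non-zero if the key is absent.
    awk -v key="$2" '
        $1 == key { print $2; found = 1; exit }
        END { exit !found }
    ' "$1"
}

# Example:
#   port=$(config_get ./config opensslPort) || exit 1
```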

Deploying

After building, copy the resulting directory's files to a web accessible directory and instruct the server to boot from the netboot.ipxe file in the result.

On Packet, edit ./create-spot-request.sh to include the Packet API information, and the URL of the netboot.ipxe. This might be expensive! Make sure you understand what it will cost.

I always use their spot market, but you could deploy this to a regular or reserved server just the same.

If you use their spot market, this repository leaves server discovery as an exercise for the reader. If you're using Hydra, though, an importer already exists at https://github.com/NixOS/nixos-org-configurations/tree/master/hydra-packet-importer.

Implementation Notes

  • A naive implementation of a remote Nix builder might stick with the default unionfs. However, this approach uses a lot of extra CPU and makes more complex builds unstable or broken. Because of this, we switched to making a full, proper filesystem across all the disks present. See: NixOS/nixpkgs#64126

btw: I don't work for Packet. Just a fan.

equinix-metal-builders's People

Contributors

andir, cole-h, delroth, dustinmiller, grahamc, julienmalka, lheckemann, mweinelt, thefloweringash, vcunat

equinix-metal-builders's Issues

Pass on 404

If any of these calls 404, exit the build with a success immediately:

https://github.com/grahamc/packet-nix-builder/blob/master/drain.sh#L30-L51

https://github.com/grahamc/packet-nix-builder/blob/master/reboot.sh#L29-L35

https://github.com/grahamc/packet-nix-builder/blob/master/restore.sh#L30-L50

A 404 means the spot instance was taken back, not a problem. Probably fine to check for a 404 before running and aborting early, instead of modifying each call. A minor race, but low stakes.
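The "check for a 404 before running" idea could look roughly like the sketch below. Both helper names are hypothetical and nothing here is taken from drain.sh, reboot.sh, or restore.sh; it assumes the API key is in PACKET_TOKEN and reuses the device URL and headers shown elsewhere in these issues.

```shell
# Hypothetical helpers; not taken from drain.sh, reboot.sh, or restore.sh.

# Print only the HTTP status code for a device lookup.
# Assumes PACKET_TOKEN holds the API key and $1 is the device ID.
device_status_code() {
    curl --silent --output /dev/null --write-out '%{http_code}' \
        --header 'Accept: application/json' \
        --header "X-Auth-Token: $PACKET_TOKEN" \
        "https://api.packet.net/devices/$1"
}

# Succeed when the status code means the instance is gone.
instance_is_gone() {
    [ "$1" = "404" ]
}

# Usage at the top of a build step:
#   if instance_is_gone "$(device_status_code "$device_id")"; then
#       echo "spot instance was reclaimed, nothing to do"
#       exit 0
#   fi
```

Checking once up front keeps the individual API calls unchanged, at the cost of the small race the issue already accepts.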

Change the reboot to a shutdown / bootup

2020-11-25 21:44:43 gchristensen change https://github.com/grahamc/packet-nix-builder/blob/master/reboot.sh#L30 to do a power_off, then wait for . "state": "powering_off",
2020-11-25 21:44:57 gchristensen erm, wait for "state": "inactive",
2020-11-25 21:45:11 gchristensen then send a curl -v --data '{"type": "power_on"}'
2020-11-25 21:45:23 sphalerite why that?
2020-11-25 21:45:41 gchristensen these machines' firmware gets persnickety after a bunch of reboots
2020-11-25 21:45:51 sphalerite oh wow
2020-11-25 21:45:59 gchristensen :)
2020-11-25 21:47:42 sphalerite gchristensen: the power-off thing is what you want a PR for?
2020-11-25 21:47:53 gchristensen yeah
2020-11-25 21:48:00 gchristensen curl --header 'Accept: application/json' --header "X-Auth-Token: ..." https://api.packet.net/devices/74d546a1-7f1f-46b2-b271-263ad0474495 | jq .state is how to check for "inactive"
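The sequence discussed in the log (power off, wait for "inactive", power on) might be sketched as follows. The function names are hypothetical and this is not the code in reboot.sh. It assumes the API key is in PACKET_TOKEN, that actions are posted to /devices/<id>/actions, and that a JSON Content-Type header is wanted; check those against the API documentation before relying on them.

```shell
# Hypothetical sketch; names are illustrative, not from reboot.sh.
# Assumes PACKET_TOKEN holds the API key and that actions are sent as
# POST /devices/<id>/actions (an assumption to verify against the API docs).

api="https://api.packet.net"

device_action() {   # $1 = device ID, $2 = action type (power_off, power_on)
    curl --silent --fail \
        --header 'Content-Type: application/json' \
        --header "X-Auth-Token: $PACKET_TOKEN" \
        --data "{\"type\": \"$2\"}" \
        "$api/devices/$1/actions"
}

device_state() {    # $1 = device ID; prints the raw state, e.g. inactive
    curl --silent --fail \
        --header 'Accept: application/json' \
        --header "X-Auth-Token: $PACKET_TOKEN" \
        "$api/devices/$1" | jq -r .state
}

# Poll a state-printing command until it prints the wanted state.
# $1 = wanted state, $2 = max attempts, $3 = seconds between attempts,
# remaining arguments = the command to run.
wait_for_state() {
    want="$1"; attempts="$2"; delay="$3"
    shift 3
    while [ "$attempts" -gt 0 ]; do
        if [ "$("$@")" = "$want" ]; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1
}

# Usage:
#   device_action "$id" power_off
#   wait_for_state inactive 60 5 device_state "$id" || exit 1
#   device_action "$id" power_on
```

Bounding the number of attempts means a machine that never reaches "inactive" fails the step instead of hanging it.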

Delete instances which fail to boot

If the "reboot" phase fails, we should upload a new build step to delete the machine, which might get us a firmware update. We'll get the machine back later via terraform or the spot market request.

flake.lock contains an invalid "treeHash" attribute for nixpkgs (a github: input)

Regression from f8bba1b causing https://hydra.nixos.org/jobset/equinix-metal-builders/main to not eval correctly anymore.

'nix flake metadata' returned exit code 1:
error:
       … while updating the lock file of flake 'github:NixOS/equinix-metal-builders/aeae1ffb8c1b1682913a1e7c6bc8c970b3afcfba'

       error: input attribute 'treeHash' not supported by scheme 'github'

I wonder if this maybe was locked with a weird version of Nix. Anyway, needs a refresh.

Create a way to merge PRs without rebooting all the machines

For example, maybe a PR to the terraform config in #7 would scale out and not necessitate a full reboot. Or, a change is only a debug change to aarch64. Having a way to not reboot everything would be nice.

It could be as simple as "don't reboot the machines" ... or as complicated as "reboot this specific machine" / "reboot this architecture" / "reboot no machines".

Manage spot market bids with Terraform

Right now the Packet spot market bids are just managed by hand. I think it'd be better and more transparent to manage them via terraform, right here.

Here is a starter config:

variable "project_id" {
  default = "86d5d066-b891-4608-af55-a481aa2c0094"
}

resource "packet_spot_market_request" "req" {
  project_id    = var.project_id
  max_bid_price = 1.80
  facilities    = ["ams1", "sjc1", "dfw2", "nrt1", "ewr1"]
  devices_min   = 1
  devices_max   = 1

  instance_parameters {
    hostname         = "c2.large.arm"
    billing_cycle    = "hourly"
    operating_system = "custom_ipxe"
    always_pxe       = true
    plan             = "c2.large.arm"
    ipxe_script_url  = "https://netboot.gsc.io/hydra-aarch64-linux/netboot.ipxe"
    project_ssh_keys = []
    user_ssh_keys    = []
    tags             = [ "hydra" ]
  }
}

resource "packet_spot_market_request" "req2" {
  project_id    = var.project_id
  max_bid_price = 1.70
  facilities    = ["ams1", "sjc1", "dfw2", "nrt1", "ewr1"]
  devices_min   = 1
  devices_max   = 1

  instance_parameters {
    hostname         = "c2.large.arm"
    billing_cycle    = "hourly"
    operating_system = "custom_ipxe"
    always_pxe       = true
    plan             = "c2.large.arm"
    ipxe_script_url  = "https://netboot.gsc.io/hydra-aarch64-linux/netboot.ipxe"
    project_ssh_keys = []
    user_ssh_keys    = []
    tags             = [ "hydra" ]
  }
}

Setting up the shell.nix, adding it to the build pipeline, etc. is still to-do. We should assume the Packet API key will come from the environment. I'm not sure exactly what the environment variable should be named; it is currently named PACKET_TOKEN.
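One possible way to bridge the naming question is a small wrapper that fails early when the key is missing and re-exports it. This is only a sketch: export_packet_token is a hypothetical helper, and it assumes the Terraform provider reads PACKET_AUTH_TOKEN, which should be confirmed against the provider documentation.

```shell
# Hypothetical wrapper: fail early if the key is missing, then expose it
# under the name the Terraform provider is assumed to read.
export_packet_token() {
    if [ -z "${PACKET_TOKEN:-}" ]; then
        echo "PACKET_TOKEN is not set" >&2
        return 1
    fi
    PACKET_AUTH_TOKEN="$PACKET_TOKEN"
    export PACKET_AUTH_TOKEN
}

# Usage:
#   export_packet_token && terraform plan
```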
