
equinix-metal-builders's Issues

Create a way to merge PRs without rebooting all the machines

For example, a PR to the terraform config in #7 might only scale out and not necessitate a full reboot. Or a change might only be a debug change to aarch64. Having a way to not reboot everything would be nice.

It could be as simple as "don't reboot the machines" ... or as complicated as "reboot this specific machine" / "reboot this architecture" / "reboot no machines".
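
One way to express those options is a single scope value that each machine's reboot step checks before doing anything. This is only a sketch: the scope syntax (`all`, `none`, `arch:<name>`, `host:<name>`) and the function name `should_reboot` are hypothetical, not something the pipeline has today.

```shell
# Decide whether a machine should be rebooted for this merge.
# The scope values are hypothetical: "all", "none", "arch:<name>", "host:<name>".
should_reboot() {
  local scope="$1" arch="$2" host="$3"
  case "$scope" in
    all)    return 0 ;;
    none)   return 1 ;;
    arch:*) [ "${scope#arch:}" = "$arch" ] ;;   # reboot one architecture only
    host:*) [ "${scope#host:}" = "$host" ] ;;   # reboot one specific machine only
    *)      echo "unknown reboot scope: $scope" >&2; return 2 ;;
  esac
}
```

A reboot step could then start with something like `should_reboot "$REBOOT_SCOPE" "$arch" "$host" || exit 0`, where `REBOOT_SCOPE` would have to come from the PR somehow (a label, a commit trailer, or similar).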

flake.lock contains an invalid "treeHash" attribute for nixpkgs (a github: input)

Regression from f8bba1b, which causes https://hydra.nixos.org/jobset/equinix-metal-builders/main to no longer evaluate correctly.

'nix flake metadata' returned exit code 1:
error:
       … while updating the lock file of flake 'github:NixOS/equinix-metal-builders/aeae1ffb8c1b1682913a1e7c6bc8c970b3afcfba'

       error: input attribute 'treeHash' not supported by scheme 'github'

I wonder if this was locked with a weird version of Nix. Either way, the lock file needs a refresh.
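
A quick check like the following could guard against the attribute sneaking back in after a refresh. It is a minimal sketch: the function name `lock_has_tree_hash` is hypothetical, and it simply greps for the attribute name rather than parsing the JSON.

```shell
# Report whether a flake.lock mentions the "treeHash" attribute anywhere.
# Exit status 0 means the attribute is present, 1 means it is absent.
lock_has_tree_hash() {
  grep -q '"treeHash"' "$1"
}
```

For the refresh itself, re-locking the nixpkgs input should be enough. Depending on the Nix version that is `nix flake lock --update-input nixpkgs` (older releases) or `nix flake update nixpkgs` (newer releases); which one applies here is an assumption to verify against the installed Nix.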

Manage spot market bids with Terraform

Right now the Packet spot market bids are just managed by hand. I think it'd be better and more transparent to manage them via terraform, right here.

Here is a starter config:

variable "project_id" {
  default = "86d5d066-b891-4608-af55-a481aa2c0094"
}

resource "packet_spot_market_request" "req" {
  project_id    = var.project_id
  max_bid_price = 1.80
  facilities    = ["ams1", "sjc1", "dfw2", "nrt1", "ewr1"]
  devices_min   = 1
  devices_max   = 1

  instance_parameters {
    hostname         = "c2.large.arm"
    billing_cycle    = "hourly"
    operating_system = "custom_ipxe"
    always_pxe       = true
    plan             = "c2.large.arm"
    ipxe_script_url  = "https://netboot.gsc.io/hydra-aarch64-linux/netboot.ipxe"
    project_ssh_keys = []
    user_ssh_keys    = []
    tags             = [ "hydra" ]
  }
}

resource "packet_spot_market_request" "req2" {
  project_id    = var.project_id
  max_bid_price = 1.70
  facilities    = ["ams1", "sjc1", "dfw2", "nrt1", "ewr1"]
  devices_min   = 1
  devices_max   = 1

  instance_parameters {
    hostname         = "c2.large.arm"
    billing_cycle    = "hourly"
    operating_system = "custom_ipxe"
    always_pxe       = true
    plan             = "c2.large.arm"
    ipxe_script_url  = "https://netboot.gsc.io/hydra-aarch64-linux/netboot.ipxe"
    project_ssh_keys = []
    user_ssh_keys    = []
    tags             = [ "hydra" ]
  }
}

Setting up the shell.nix, adding it to the build pipeline, etc. is still to do. We should assume the Packet API key will come from the environment. I'm not sure exactly what the environment variable should be named; it is currently named PACKET_TOKEN.
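
If the variable keeps its current name, a small wrapper could map it to whatever the provider expects before running terraform. This sketch assumes the Packet Terraform provider reads `PACKET_AUTH_TOKEN` from the environment (worth checking against the provider docs); the function name `export_packet_token` is hypothetical.

```shell
# Export the Packet API key under the name the Terraform provider reads.
# Assumption: the provider picks up PACKET_AUTH_TOKEN; the pipeline sets PACKET_TOKEN.
export_packet_token() {
  if [ -z "${PACKET_TOKEN:-}" ]; then
    echo "PACKET_TOKEN is not set" >&2
    return 1
  fi
  export PACKET_AUTH_TOKEN="$PACKET_TOKEN"
}
```

A pipeline step would then run `export_packet_token && terraform plan`, failing early with a clear message when the key is missing.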

Delete instances which fail to boot

If the "reboot" phase fails, we should upload a new build step to delete the machine, which might get us a firmware update. We'll get the machine back later via terraform or the spot market request.
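
A rough sketch of the fallback logic, for discussion. Note that it calls the delete directly instead of uploading a new build step, which keeps the example short. It assumes `PACKET_TOKEN` holds the API key and that a `DELETE` on `/devices/<id>` removes the device; the function names are hypothetical.

```shell
# Delete a device through the Packet API.
# Assumptions: PACKET_TOKEN holds the API key; DELETE /devices/<id> removes the device.
delete_device() {
  curl --silent --fail --request DELETE \
    --header "X-Auth-Token: ${PACKET_TOKEN}" \
    "https://api.packet.net/devices/$1"
}

# Run the given reboot command; if it fails, delete the device instead.
# Everything after the device ID is the reboot command to run.
reboot_or_delete() {
  local device_id="$1"
  shift
  if "$@"; then
    return 0
  fi
  echo "reboot of ${device_id} failed, deleting the device" >&2
  delete_device "$device_id"
}
```

Usage would look like `reboot_or_delete "$id" ./reboot.sh "$id"`, with the exact arguments to the reboot script being a guess.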

Change the reboot to a shutdown / bootup

2020-11-25 21:44:43 gchristensen change https://github.com/grahamc/packet-nix-builder/blob/master/reboot.sh#L30 to do a power_off, then wait for . "state": "powering_off",
2020-11-25 21:44:57 gchristensen erm, wait for "state": "inactive",
2020-11-25 21:45:11 gchristensen then send a curl -v --data '{"type": "power_on"}'
2020-11-25 21:45:23 sphalerite why that?
2020-11-25 21:45:41 gchristensen these machines' firmware gets persnickety after a bunch of reboots
2020-11-25 21:45:51 sphalerite oh wow
2020-11-25 21:45:59 gchristensen :)
2020-11-25 21:47:42 sphalerite gchristensen: the power-off thing is what you want a PR for?
2020-11-25 21:47:53 gchristensen yeah
2020-11-25 21:48:00 gchristensen curl --header 'Accept: application/json' --header "X-Auth-Token: ..." https://api.packet.net/devices/74d546a1-7f1f-46b2-b271-263ad0474495 | jq .state is how to check for "inactive"
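
Put together, the sequence from the chat could look roughly like this. It is a sketch under assumptions: `PACKET_TOKEN` holds the API key, actions are posted to `/devices/<id>/actions`, and the function names, the `POLL_DELAY` variable, and the retry count are all hypothetical.

```shell
API="https://api.packet.net"

# Print the current state of a device, e.g. "active" or "inactive".
device_state() {
  curl --silent --header 'Accept: application/json' \
    --header "X-Auth-Token: ${PACKET_TOKEN}" \
    "${API}/devices/$1" | jq -r .state
}

# Send an action such as power_off or power_on to a device.
# Assumption: actions are posted to /devices/<id>/actions.
device_action() {
  curl --silent --fail --header 'Content-Type: application/json' \
    --header "X-Auth-Token: ${PACKET_TOKEN}" \
    --data "{\"type\": \"$2\"}" \
    "${API}/devices/$1/actions"
}

# Poll until the device reports the wanted state; give up after a number of tries.
wait_for_state() {
  local device_id="$1" wanted="$2" tries="${3:-60}"
  while [ "$tries" -gt 0 ]; do
    [ "$(device_state "$device_id")" = "$wanted" ] && return 0
    tries=$((tries - 1))
    sleep "${POLL_DELAY:-5}"
  done
  return 1
}

# Power the device off, wait until it is inactive, then power it back on.
power_cycle() {
  device_action "$1" power_off &&
    wait_for_state "$1" inactive &&
    device_action "$1" power_on
}
```

`power_cycle "$device_id"` would replace the single reboot call. If the device never reaches "inactive", the function returns non-zero without sending the power_on.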

Pass on 404

If any of these calls 404, exit the build with a success immediately:

https://github.com/grahamc/packet-nix-builder/blob/master/drain.sh#L30-L51

https://github.com/grahamc/packet-nix-builder/blob/master/reboot.sh#L29-L35

https://github.com/grahamc/packet-nix-builder/blob/master/restore.sh#L30-L50

A 404 means the spot instance was taken back, which is not a problem. It is probably fine to check for a 404 once before running and exit early, instead of modifying each call. That leaves a minor race, but the stakes are low.
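
The up-front check could be sketched like this, assuming `PACKET_TOKEN` holds the API key. The function names are hypothetical.

```shell
# Print the HTTP status code returned for a device lookup.
device_http_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' \
    --header "X-Auth-Token: ${PACKET_TOKEN}" \
    "https://api.packet.net/devices/$1"
}

# Succeed if the device lookup returns 404, i.e. the spot instance is gone.
device_is_gone() {
  [ "$(device_http_status "$1")" = "404" ]
}
```

Each of drain.sh, reboot.sh, and restore.sh could then begin with `if device_is_gone "$id"; then echo "device is gone, nothing to do"; exit 0; fi`.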
