cafebazaar / blacksmith


Bare-Metal CoreOS Cluster Manager

License: GNU General Public License v2.0

Go 69.04% Shell 8.86% HTML 7.46% JavaScript 9.81% CSS 1.71% Makefile 3.12%
bare-metal ignition coreos cluster

blacksmith's People

Contributors

alialaee, arastu, danderson, ebraminio, farnasirim, givia, jonboulle, mehdy, praal, remohammadi

blacksmith's Issues

Restarting goroutines instead of ending the process

The HA feature is implemented (see #7), but the process is killed when it is no longer the master. This behaviour makes blacksmith dependent on a service manager's configuration to bring it back up. A better mechanism is to stop only the DHCP server and start again from the watch loop, so the instance can become master again.
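One way to picture the proposed mechanism is a supervisor loop that starts the services in a goroutine when leadership is gained and cancels them when it is lost. The sketch below is only an illustration: the `leadership` channel, `runWhileLeader`, and the `serve` callback are hypothetical names standing in for the etcd watch loop and the DHCP server, not blacksmith's actual code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// runWhileLeader starts serve in its own goroutine whenever leadership is
// gained and cancels it when leadership is lost, then keeps watching.
// The leadership channel is a hypothetical stand-in for the etcd watch loop:
// true means "became master", false means "no longer master".
func runWhileLeader(leadership <-chan bool, serve func(ctx context.Context)) {
	var (
		cancel context.CancelFunc
		wg     sync.WaitGroup
	)
	stop := func() {
		if cancel == nil {
			return
		}
		cancel()
		wg.Wait() // wait until the service has really shut down
		cancel = nil
	}
	for isLeader := range leadership {
		switch {
		case isLeader && cancel == nil:
			ctx, c := context.WithCancel(context.Background())
			cancel = c
			wg.Add(1)
			go func() {
				defer wg.Done()
				serve(ctx)
			}()
		case !isLeader:
			stop()
		}
	}
	stop() // channel closed: shut down cleanly
}

// demo simulates gaining and losing leadership twice and reports how often
// the service was started and stopped.
func demo() (starts, stops int) {
	var mu sync.Mutex
	serve := func(ctx context.Context) { // stand-in for the DHCP server
		mu.Lock()
		starts++
		mu.Unlock()
		<-ctx.Done() // a real server would close its listener here
		mu.Lock()
		stops++
		mu.Unlock()
	}
	events := make(chan bool, 4)
	for _, e := range []bool{true, false, true, false} {
		events <- e
	}
	close(events)
	runWhileLeader(events, serve)
	return starts, stops
}

func main() {
	starts, stops := demo()
	fmt.Printf("started %d times, stopped %d times\n", starts, stops)
}
```

The process itself never exits on a leadership change; only the goroutine bound to the cancelled context does, so no service manager is needed to restart it.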

IPMI

Understanding IPMI (e.g. for power control) would virtually turn blacksmith into a cloud host.

Automated upgrade mechanism

  • The image files should be downloaded inside <workspace>/images/<new-version>/
  • The etcd entry should be updated after validating the image files
  • Nodes should restart one by one (by signaling locksmith?).

etcd configuration should be passed to the templates

With the state unknown, the bootstrappers may need access to BoB's etcd (e.g. for the reboot service). This can be implemented by adding the etcd configuration to the cloud-config template context and providing a way to access it from the template files.
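As a rough illustration of what exposing etcd settings to templates could look like, the sketch below renders a chunk with Go's `text/template`. The `templateContext` type, its `EtcdEndpoints` and `EtcdDir` fields, the `join` helper, and the `BLACKSMITH_ETCD_*` variable names are all assumptions made for this example; only `{{.IP}}` appears in the existing templates.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// templateContext is a hypothetical template context. Only IP is known from
// the existing templates; the etcd fields are assumptions for illustration.
type templateContext struct {
	IP            string
	EtcdEndpoints []string
	EtcdDir       string
}

// render executes a template chunk against the context and returns the text.
func render(text string, ctx templateContext) (string, error) {
	// "join" lets templates turn the endpoint list into a single string.
	funcs := template.FuncMap{"join": strings.Join}
	t, err := template.New("cloud-config").Funcs(funcs).Parse(text)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, ctx); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	const chunk = `public_ipv4={{.IP}}
BLACKSMITH_ETCD_ENDPOINTS={{join .EtcdEndpoints ","}}
BLACKSMITH_ETCD_DIR={{.EtcdDir}}
`
	out, err := render(chunk, templateContext{
		IP:            "172.19.1.11",
		EtcdEndpoints: []string{"http://172.19.1.1:2379"},
		EtcdDir:       "aghajoon",
	})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(out)
}
```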

Web UI refactor

  • Improving the list of machines: suggestions are noted in the attached screenshot.
  • When a machine is clicked, a modal window should be shown with these details and actions for the selected machine:
    • Hardware address
    • Links to addresses which generate cloudconfig, ignition, and bootparams files for the selected machine.
    • The editable list of current flags for this machine, and a way to add more.

[Screenshot: suggestions]

Authorization for API

The API (/api/*) should be protected. However, a machine should still have access to /api/machine/<MAC>/ when its IP is equal to the IP assigned to that MAC.
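The per-machine exception could be checked with a small predicate like the one below. This is a sketch under assumptions: `machineMayAccess` and the `leases` map (MAC to assigned IP) are hypothetical stand-ins for whatever the DHCP data source provides, and the general protection of the rest of /api/* is not covered here.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// machineMayAccess reports whether a request coming from remoteAddr
// ("ip:port") may access path. It only allows /api/machine/<MAC>/... when
// the caller's IP equals the IP assigned to that MAC.
// leases is a hypothetical MAC -> assigned IP table keyed by the canonical
// lower-case colon form produced by net.HardwareAddr.String().
func machineMayAccess(remoteAddr, path string, leases map[string]string) bool {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return false
	}
	const prefix = "/api/machine/"
	if !strings.HasPrefix(path, prefix) {
		return false
	}
	// The MAC is the first path segment after the prefix.
	mac := strings.SplitN(strings.TrimPrefix(path, prefix), "/", 2)[0]
	hw, err := net.ParseMAC(mac)
	if err != nil {
		return false
	}
	assigned, ok := leases[hw.String()]
	if !ok {
		return false
	}
	want, got := net.ParseIP(assigned), net.ParseIP(host)
	return want != nil && got != nil && want.Equal(got)
}

func main() {
	leases := map[string]string{"00:02:7d:15:be:86": "172.19.1.13"}
	path := "/api/machine/00:02:7d:15:be:86/"
	fmt.Println(machineMayAccess("172.19.1.13:49154", path, leases)) // own record
	fmt.Println(machineMayAccess("172.19.1.12:49154", path, leases)) // someone else's record
}
```

Note that a source-IP check like this only makes sense on a network where addresses cannot be spoofed easily; it complements, rather than replaces, real authentication for the rest of the API.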

Bootstrappers dns entry

Bootstrappers need to register their (name, address) identity pairs in the skydns etcd directory since their hostnames need to be resolved by skydns.
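For reference, SkyDNS stores a record under the /skydns prefix with the domain labels reversed into path segments, and a JSON value carrying a "host" field. The sketch below computes such a key/value pair; the function name `skydnsRecord` and the example domain `cluster.local` are assumptions, and the actual write to etcd is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"strings"
)

// skydnsRecord returns the etcd key and JSON value that SkyDNS expects for
// resolving name to addr. Labels are reversed into path segments under
// /skydns, e.g. a.b.c -> /skydns/c/b/a.
func skydnsRecord(name, addr string) (key, value string, err error) {
	if net.ParseIP(addr) == nil {
		return "", "", fmt.Errorf("invalid IP address %q", addr)
	}
	labels := strings.Split(strings.Trim(name, "."), ".")
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	body, err := json.Marshal(struct {
		Host string `json:"host"`
	}{Host: addr})
	if err != nil {
		return "", "", err
	}
	return "/skydns/" + strings.Join(labels, "/"), string(body), nil
}

func main() {
	// "bootstrapper1.cluster.local" is a made-up name for illustration.
	key, value, err := skydnsRecord("bootstrapper1.cluster.local", "172.19.1.1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(key)   // /skydns/local/cluster/bootstrapper1
	fmt.Println(value) // {"host":"172.19.1.1"}
}
```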

DNS forwarder

We should also handle DNS requests so that node names can be resolved, which is required by some platforms such as Kubernetes. Other requests should be forwarded to the given DNS addresses.

If blacksmith runs in HA mode, the addresses of the other instances should also be passed to the nodes, and those instances should respond to DNS requests even when they are not the leader (see #7).

Leader election through etcd

If multiple instances run against the same etcd cluster and the same etcd directory (-etcd-dir), only one of them should be active (responding to DHCP requests), and the others should wait until the leader is killed somehow.

Proposal: Dynamic CloudConfig and IgnitionConfig, using etcd

The idea is to compile those configs by merging yaml chunks stored in etcd, in a tree like this:

/aghajoon                     (namespace, can be changed by a flag)
  ├─ coreos-version           (e.g. `835.1.0`)
  ├─ cloud-config
  │    ├─ etcd
  │    ├─ customizations
  │    ├─ docker
  │    ├─ flannel
  │    └─ ssh-keys-staff1
  └─ ignition-config
       └─ bootstrap-harddisk

Example content for key=aghajoon/cloud-config/customizations:

coreos:
  units:
    - name: increase-nf_conntrack-connections.service
      command: start
      content: |
        [Unit]
        Description=Increase the number of connections in nf_conntrack. default is 65536

        [Service]
        Type=oneshot
        ExecStartPre=/usr/sbin/modprobe nf_conntrack
        ExecStart=/bin/sh -c "sysctl -w net.netfilter.nf_conntrack_max=262144"
write_files:
  - path: /etc/environment
    owner: core
    content: |
      public_ipv4={{.IP}}
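The proposal does not say how chunks are merged, so the following is only one possible rule, written against already-decoded YAML (nested `map[string]interface{}` and `[]interface{}` values): maps are merged recursively, lists such as `units` and `write_files` are concatenated, and for scalars the later chunk wins. The name `mergeChunks` is hypothetical and YAML parsing itself is omitted.

```go
package main

import "fmt"

// mergeChunks merges src into dst (modifying dst in place) and returns dst.
// Assumed rule: maps merge recursively, lists are concatenated, and any
// other value from the later chunk replaces the earlier one.
func mergeChunks(dst, src map[string]interface{}) map[string]interface{} {
	for key, srcVal := range src {
		dstVal, exists := dst[key]
		if !exists {
			dst[key] = srcVal
			continue
		}
		switch d := dstVal.(type) {
		case map[string]interface{}:
			if s, ok := srcVal.(map[string]interface{}); ok {
				dst[key] = mergeChunks(d, s)
				continue
			}
		case []interface{}:
			if s, ok := srcVal.([]interface{}); ok {
				merged := append([]interface{}{}, d...) // copy, do not alias d
				dst[key] = append(merged, s...)
				continue
			}
		}
		dst[key] = srcVal // scalar or type mismatch: later chunk wins
	}
	return dst
}

func main() {
	// Decoded stand-ins for two chunks stored under cloud-config/.
	etcd := map[string]interface{}{
		"coreos": map[string]interface{}{
			"units": []interface{}{"etcd2.service"},
		},
	}
	customizations := map[string]interface{}{
		"coreos": map[string]interface{}{
			"units": []interface{}{"increase-nf_conntrack-connections.service"},
		},
		"write_files": []interface{}{"/etc/environment"},
	}
	fmt.Println(mergeChunks(etcd, customizations))
}
```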

Workspace syncing among the instances

Currently, it is the user's task to sync the workspace among the instances so that, if the master fails for some reason, the next master is as up to date as the first one.

In practice, the lack of a syncing mechanism makes the "files" feature useless.

Node names are too random

Nodes would be much easier to manage if their names were selected from a dictionary instead of being a sequence of random characters. For example, node0cc47a781c9a could be node_jaguar or node_saturn.
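A minimal sketch of the idea, assuming the name should stay stable for a given machine: hash the MAC address and use the result to pick a word from a list. The word list, the `nodeName` function, and the use of FNV hashing are all illustrative choices; with a small dictionary two machines can land on the same word, so a real implementation would need a larger list or a collision check.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"net"
)

// words is a tiny illustrative dictionary; a real one would be much larger.
var words = []string{"jaguar", "saturn", "falcon", "cedar", "comet", "otter", "maple", "quartz"}

// nodeName derives a stable, human-friendly name from a MAC address by
// hashing its bytes and picking a dictionary word.
func nodeName(mac string) (string, error) {
	hw, err := net.ParseMAC(mac)
	if err != nil {
		return "", err
	}
	h := fnv.New32a()
	h.Write(hw) // hashing the parsed bytes makes the name format-independent
	return "node_" + words[h.Sum32()%uint32(len(words))], nil
}

func main() {
	for _, mac := range []string{"0c:c4:7a:78:1c:9a", "00:02:7d:15:be:86"} {
		name, err := nodeName(mac)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(mac, "->", name)
	}
}
```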

TLS for the web server

Add a flag that makes blacksmith's web service serve over TLS.
#28 depends on this issue if we want to implement it using TLS client authentication.

For example:

./blacksmith --http-listen 0.0.0.0:8443 --tls-ca <CA-PATH> --tls-cert <CERT-PATH> --tls-key <KEY-PATH>

should make the web service of blacksmith listen on 0.0.0.0:8443 for https requests.
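On the Go side, this could be wired up roughly as follows, using the standard `crypto/tls` and `net/http` packages. The function names are hypothetical, and requiring client certificates is only relevant if the client-authentication route mentioned above is chosen; the contents of the CA file would be read from the path given to the proposed `--tls-ca` flag.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
)

// clientAuthTLSConfig builds a TLS configuration that only accepts clients
// presenting a certificate signed by one of the CAs in caPEM.
func clientAuthTLSConfig(caPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found in PEM data")
	}
	return &tls.Config{
		ClientCAs:  pool,
		ClientAuth: tls.RequireAndVerifyClientCert,
		MinVersion: tls.VersionTLS12,
	}, nil
}

// newServer returns an HTTP server bound to addr that uses cfg for TLS.
func newServer(addr string, cfg *tls.Config, h http.Handler) *http.Server {
	return &http.Server{Addr: addr, Handler: h, TLSConfig: cfg}
}

func main() {
	// Invalid CA data is rejected before the server is ever started.
	if _, err := clientAuthTLSConfig([]byte("not a certificate")); err != nil {
		fmt.Println("refusing to start:", err)
	}
	// With real files, the server would be started along these lines:
	//   cfg, _ := clientAuthTLSConfig(caBytes)
	//   srv := newServer("0.0.0.0:8443", cfg, handler)
	//   err := srv.ListenAndServeTLS(certPath, keyPath)
}
```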

Blacksmith UI cache

Blacksmith UI caches machine states, which is a problem, especially during the bootstrapping phase, when states may change several times in a short period.

Global flags

Currently, flags can only be assigned to specific machines. It would be nice to be able to declare global flags that affect all of the workers (e.g. for updating docker tags or any other bulk upgrade).
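One simple way to combine the two levels is to compute the effective flags for a machine at lookup time. In the sketch below, `effectiveFlags` is a hypothetical helper, and letting a machine-specific flag override a global one with the same name is an assumption; the issue does not specify the precedence.

```go
package main

import "fmt"

// effectiveFlags returns the flags that apply to one machine: all global
// flags, overridden by the machine's own flags where names collide
// (an assumed precedence rule).
func effectiveFlags(global, machine map[string]string) map[string]string {
	out := make(map[string]string, len(global)+len(machine))
	for name, value := range global {
		out[name] = value
	}
	for name, value := range machine {
		out[name] = value
	}
	return out
}

func main() {
	global := map[string]string{"docker-tag": "v2", "channel": "stable"}
	machine := map[string]string{"docker-tag": "v3"}
	fmt.Println(effectiveFlags(global, machine))
}
```

Neither input map is modified, so a bulk upgrade only needs to change the global entry once.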

Announcements

Subscribe to this issue to get notified about project announcements.

Please do not use this thread to report issues.

panic: inconsistent label cardinality

Running dev_run.bash I get this panic:

panic: inconsistent label cardinality

Log:

Blacksmith (testing)
  Commit:        ffe69c5a70ab88d01ed6f3ab0293d883234bd153
  Build Time:    2017-02-18_16:30:55_+0330
Interface IP:    172.19.1.1
Interface Name:  vboxnet0
INFO[0000] Listening on 172.19.1.1:8000                  action=announce where=web.ServeWeb
DEBU[0000] Now we're the master instance. Starting the services...  action=debug where=blacksmith.main
INFO[0000] Listening on 172.19.1.1:70                    action=announce where=pxe.ServeHTTPBooter
INFO[0000] TFTP listening on 172.19.1.1:69               action=tftp where=pxe.ServeTFTP
INFO[0000] Listening on 172.19.1.1:4011                  action=announce where=pxe.ServePXE
INFO[0000] Listening on 172.19.1.1:67 (interface: vboxnet0)  action=announce where=dhcp.StartDHCP
INFO[0001] assignedIp=172.19.1.12 isPxe=true             action=debug object="00:02:7d:15:be:84" subject=Discover where=dhcp.ServeDHCP
INFO[0002] assignedIp=172.19.1.13 isPxe=true             action=debug object="00:02:7d:15:be:86" subject=Discover where=dhcp.ServeDHCP
INFO[0003] assignedIp=172.19.1.11 isPxe=true             action=debug object="00:02:7d:15:be:82" subject=Discover where=dhcp.ServeDHCP
INFO[0004] assignedIp=172.19.1.13 isPxe=true             action=debug object="00:02:7d:15:be:86" subject=Request where=dhcp.ServeDHCP
INFO[0005] assignedIp=172.19.1.12 isPxe=true             action=debug object="00:02:7d:15:be:84" subject=Request where=dhcp.ServeDHCP
INFO[0007]                                               action=debug object=00:02:7d:15:be:86 where=pxe.ServePXE
WARN[0008] error while transfering to 172.19.1.13:2070   action=tftp-transfer error="\"172.19.1.13:2070\": sending OACK: client aborted transfer: TFTP Aborted" subject=boot where=pxe.ServeTFTP
INFO[0008] clamping blocksize to "172.19.1.13:2071": 1456 -> 1450  action=tftp where=pxe.ServeTFTP
DEBU[0008] transfered to 172.19.1.13:2071                action=tftp-transfer subject=boot where=pxe.ServeTFTP
INFO[0008]                                               host=172.19.1.1 method=GET proto="HTTP/1.0" referer= uri="/ldlinux.c32" user-agent="Syslinux/6.03" where=pxe.ldlinuxHandler
DEBU[0008] 172.19.1.13:49154 requested a pxelinux config from URL "/pxelinux.cfg/a7ec255c-cb17-4785-b47b-146ffb6de852", which does not include a correct MAC address  host=172.19.1.1 method=GET proto="HTTP/1.0" referer= uri="/pxelinux.cfg/a7ec255c-cb17-4785-b47b-146ffb6de852" user-agent="Syslinux/6.03" where=pxe.pxelinuxConfig
INFO[0008]                                               host="172.19.1.1:70" method=GET proto="HTTP/1.0" referer= uri="/pxelinux.cfg/01-00-02-7d-15-be-86" user-agent="Syslinux/6.03" where=pxe.pxelinuxConfig
INFO[0008]                                               action=debug object=00:02:7d:15:be:84 where=pxe.ServePXE
WARN[0010] error while transfering to 172.19.1.12:2070   action=tftp-transfer error="\"172.19.1.12:2070\": sending OACK: client aborted transfer: TFTP Aborted" subject=boot where=pxe.ServeTFTP
INFO[0010] clamping blocksize to "172.19.1.12:2071": 1456 -> 1450  action=tftp where=pxe.ServeTFTP
DEBU[0010] transfered to 172.19.1.12:2071                action=tftp-transfer subject=boot where=pxe.ServeTFTP
INFO[0010]                                               host=172.19.1.1 method=GET proto="HTTP/1.0" referer= uri="/ldlinux.c32" user-agent="Syslinux/6.03" where=pxe.ldlinuxHandler
DEBU[0010] 172.19.1.12:49154 requested a pxelinux config from URL "/pxelinux.cfg/8f7e1db2-1494-4ad5-b2e3-b93d20a6d084", which does not include a correct MAC address  host=172.19.1.1 method=GET proto="HTTP/1.0" referer= uri="/pxelinux.cfg/8f7e1db2-1494-4ad5-b2e3-b93d20a6d084" user-agent="Syslinux/6.03" where=pxe.pxelinuxConfig
INFO[0010]                                               host="172.19.1.1:70" method=GET proto="HTTP/1.0" referer= uri="/pxelinux.cfg/01-00-02-7d-15-be-84" user-agent="Syslinux/6.03" where=pxe.pxelinuxConfig
INFO[0011] assignedIp=172.19.1.11 isPxe=true             action=debug object="00:02:7d:15:be:82" subject=Request where=dhcp.ServeDHCP
INFO[0014]                                               action=debug object=00:02:7d:15:be:82 where=pxe.ServePXE
WARN[0015] error while transfering to 172.19.1.11:2070   action=tftp-transfer error="\"172.19.1.11:2070\": sending OACK: client aborted transfer: TFTP Aborted" subject=boot where=pxe.ServeTFTP
INFO[0015] clamping blocksize to "172.19.1.11:2071": 1456 -> 1450  action=tftp where=pxe.ServeTFTP
DEBU[0015] transfered to 172.19.1.11:2071                action=tftp-transfer subject=boot where=pxe.ServeTFTP
INFO[0015]                                               host=172.19.1.1 method=GET proto="HTTP/1.0" referer= uri="/ldlinux.c32" user-agent="Syslinux/6.03" where=pxe.ldlinuxHandler
DEBU[0015] 172.19.1.11:49154 requested a pxelinux config from URL "/pxelinux.cfg/2f4a4bca-9d41-4298-9ffb-7e63b0cb842c", which does not include a correct MAC address  host=172.19.1.1 method=GET proto="HTTP/1.0" referer= uri="/pxelinux.cfg/2f4a4bca-9d41-4298-9ffb-7e63b0cb842c" user-agent="Syslinux/6.03" where=pxe.pxelinuxConfig
INFO[0015]                                               host="172.19.1.1:70" method=GET proto="HTTP/1.0" referer= uri="/pxelinux.cfg/01-00-02-7d-15-be-82" user-agent="Syslinux/6.03" where=pxe.pxelinuxConfig
INFO[0023] assignedIp=172.19.1.13 isPxe=false            action=debug object="00:02:7d:15:be:86" subject=Discover where=dhcp.ServeDHCP
INFO[0023] assignedIp=172.19.1.13 isPxe=false            action=debug object="00:02:7d:15:be:86" subject=Request where=dhcp.ServeDHCP
WARN[0023] error while executeTemplate(templateName=workspace-installer.sh machine=00:02:7d:15:be:86)  error="template with name=workspace-installer.sh wasn't found for root=&{ 0xc4201088c0 0xc4202b6380 {{ }}}" where=templating.executeTemplate
INFO[0023] 172.19.1.13 - - [18/Feb/2017:13:01:32 +0000] "GET /t/ig/00:02:7d:15:be:86 HTTP/1.1" 200 1482 
INFO[0024] assignedIp=172.19.1.12 isPxe=false            action=debug object="00:02:7d:15:be:84" subject=Discover where=dhcp.ServeDHCP
INFO[0024] assignedIp=172.19.1.12 isPxe=false            action=debug object="00:02:7d:15:be:84" subject=Request where=dhcp.ServeDHCP
INFO[0029] assignedIp=172.19.1.11 isPxe=false            action=debug object="00:02:7d:15:be:82" subject=Discover where=dhcp.ServeDHCP
INFO[0029] assignedIp=172.19.1.11 isPxe=false            action=debug object="00:02:7d:15:be:82" subject=Request where=dhcp.ServeDHCP
WARN[0030] error while executeTemplate(templateName=workspace-installer.sh machine=00:02:7d:15:be:82)  error="template with name=workspace-installer.sh wasn't found for root=&{ 0xc4201bc620 0xc42044ff80 {{ }}}" where=templating.executeTemplate
INFO[0030] 172.19.1.11 - - [18/Feb/2017:13:01:39 +0000] "GET /t/ig/00:02:7d:15:be:82 HTTP/1.1" 200 1482 
WARN[0030] error while executeTemplate(templateName=workspace-installer.sh machine=00:02:7d:15:be:84)  error="template with name=workspace-installer.sh wasn't found for root=&{ 0xc4201bc7e0 0xc42031a440 {{ }}}" where=templating.executeTemplate
INFO[0030] 172.19.1.12 - - [18/Feb/2017:13:01:40 +0000] "GET /t/ig/00:02:7d:15:be:84 HTTP/1.1" 200 1482 
INFO[0033] assignedIp=172.19.1.13 isPxe=false            action=debug object="00:02:7d:15:be:86" subject=Discover where=dhcp.ServeDHCP
INFO[0033] assignedIp=172.19.1.13 isPxe=false            action=debug object="00:02:7d:15:be:86" subject=Request where=dhcp.ServeDHCP
WARN[0033] error while executeTemplate(templateName=workspace-updater.sh machine=00:02:7d:15:be:86)  error="template with name=workspace-updater.sh wasn't found for root=&{ 0xc4201bcb60 0xc42031b240 {{ }}}" where=templating.executeTemplate
INFO[0033] 172.19.1.13 - - [18/Feb/2017:13:01:43 +0000] "GET /t/cc/00:02:7d:15:be:86 HTTP/1.1" 200 2046 
INFO[0035] assignedIp=172.19.1.12 isPxe=false            action=debug object="00:02:7d:15:be:84" subject=Discover where=dhcp.ServeDHCP
INFO[0035] assignedIp=172.19.1.12 isPxe=false            action=debug object="00:02:7d:15:be:84" subject=Request where=dhcp.ServeDHCP
panic: inconsistent label cardinality

goroutine 670 [running]:
panic(0xb4efe0, 0xc420130cf0)
        /home/sina/src/go/src/runtime/panic.go:500 +0x1a1
github.com/prometheus/client_golang/prometheus.(*MetricVec).WithLabelValues(0xc4201262c0, 0xc42039dc18, 0x2, 0x2, 0x12204e0, 0xc4203d5280)
        /home/sina/go/src/github.com/prometheus/client_golang/prometheus/vec.go:135 +0xa0
github.com/prometheus/client_golang/prometheus.(*HistogramVec).WithLabelValues(0xc42012a068, 0xc42039dc18, 0x2, 0x2, 0x0, 0x0)
        /home/sina/go/src/github.com/prometheus/client_golang/prometheus/histogram.go:337 +0x50
github.com/miekg/coredns/middleware/proxy.Proxy.ServeDNS(0x0, 0x0, 0xc4203d3240, 0x1, 0x1, 0x7fa179883278, 0xc4202691a0, 0x12204e0, 0xc4203d5280, 0xc420346090, ...)
        /home/sina/go/src/github.com/miekg/coredns/middleware/proxy/proxy.go:109 +0x477
github.com/cafebazaar/blacksmith/dns.(*dnsServer).generalDNS(0xc42034f460, 0x12204e0, 0xc4203d5280, 0xc420346090)
        /home/sina/go/src/github.com/cafebazaar/blacksmith/dns/dns.go:117 +0x19f
github.com/cafebazaar/blacksmith/dns.(*dnsServer).(github.com/cafebazaar/blacksmith/dns.generalDNS)-fm(0x12204e0, 0xc4203d5280, 0xc420346090)
        /home/sina/go/src/github.com/cafebazaar/blacksmith/dns/dns.go:127 +0x48
github.com/miekg/dns.HandlerFunc.ServeDNS(0xc42036a220, 0x12204e0, 0xc4203d5280, 0xc420346090)
        /home/sina/go/src/github.com/miekg/dns/server.go:84 +0x44
github.com/miekg/dns.(*ServeMux).ServeDNS(0xc420130790, 0x12204e0, 0xc4203d5280, 0xc420346090)
        /home/sina/go/src/github.com/miekg/dns/server.go:210 +0x61
github.com/miekg/dns.(*Server).serve(0xc420191e10, 0x12190a0, 0xc42027a360, 0x1215360, 0xc420130790, 0xc420288400, 0x27, 0x200, 0xc42012a240, 0xc4203d1940, ...)
        /home/sina/go/src/github.com/miekg/dns/server.go:579 +0x500
created by github.com/miekg/dns.(*Server).serveUDP
        /home/sina/go/src/github.com/miekg/dns/server.go:533 +0x2d7

Provision failure

vagrant up --provider=libvirt pxeserver gives me this:

==> pxeserver:  /usr/local/go/src/github.com/elazarl/go-bindata-assetfs (from $GOROOT)
==> pxeserver:  /go/src/github.com/elazarl/go-bindata-assetfs (from $GOPATH)
==> pxeserver: 
==> pxeserver: The command '/bin/sh -c go build .' returned a non-zero code: 1

Adding a line to the Dockerfile to "go get" that external dependency seemed to fix the issue. Pull request sent.

Data sources refactoring

Already in progress.
TODO:
1- A workaround for templating, previously done by "Repo"s with references to multiple data sources.
2- etcd_datasource is getting pretty big. Perhaps the implementations of the different parts of the MasterDataSource interface should be split into multiple files.
3- Switching the DHCP server's back-end data source from the lease pool to an implementation of DHCPDataSource.
