cafebazaar / blacksmith
Bare-Metal CoreOS Cluster Manager
License: GNU General Public License v2.0
The HA feature is implemented (see #7), but the process is killed when it is no longer the master. This behaviour makes blacksmith dependent on a service manager's configuration. A better mechanism is to stop only the DHCP server and start again from the watch loop (to become master again).
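A minimal sketch of that idea, assuming the watch loop can report leadership changes on a channel. The names `supervise` and `countStarts` and the channel itself are hypothetical and not taken from the blacksmith code base. The server is stopped by cancelling its context instead of exiting the process, and it is started again when leadership comes back.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// supervise starts serve each time leadership is gained and cancels it each
// time leadership is lost. The process itself keeps running in both cases.
func supervise(leadership <-chan bool, serve func(ctx context.Context)) {
	var (
		cancel context.CancelFunc
		wg     sync.WaitGroup
	)
	stop := func() {
		if cancel != nil {
			cancel()
			wg.Wait() // wait until the server has actually returned
			cancel = nil
		}
	}
	for isMaster := range leadership {
		if !isMaster {
			stop()
			continue
		}
		if cancel != nil {
			continue // already serving
		}
		var ctx context.Context
		ctx, cancel = context.WithCancel(context.Background())
		wg.Add(1)
		go func() {
			defer wg.Done()
			serve(ctx)
		}()
	}
	stop()
}

// countStarts replays a sequence of leadership events against a stand-in
// server and reports how many times that server was started.
func countStarts(events []bool) int {
	ch := make(chan bool, len(events))
	for _, e := range events {
		ch <- e
	}
	close(ch)
	starts := 0
	supervise(ch, func(ctx context.Context) {
		starts++
		<-ctx.Done() // a real DHCP server would serve until cancelled
	})
	return starts
}

func main() {
	// master, then not master, then master again
	fmt.Println(countStarts([]bool{true, false, true}))
}
```

With this shape, losing leadership no longer requires a service manager to restart the binary; only the DHCP goroutine goes away.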
To manage a cluster, aghajoon should be able to assign IPs when there isn't any DHCP available in the network.
The --dhcp-range option of DNSMASQ can be seen as a syntax example.
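As an illustration, here is a rough sketch of parsing the simplest dnsmasq-style form, `<start>,<end>,<lease>`. The `dhcpRange` type and `parseDHCPRange` function are hypothetical names, only IPv4 is handled, and the lease time is parsed with Go's duration syntax, which is an assumption that covers values like `12h` but not every form dnsmasq accepts.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
	"strings"
	"time"
)

// dhcpRange is a hypothetical holder for a parsed range.
type dhcpRange struct {
	Start, End net.IP
	Lease      time.Duration
}

// Size returns how many addresses the range contains, both ends included.
func (r dhcpRange) Size() int {
	return int(binary.BigEndian.Uint32(r.End)-binary.BigEndian.Uint32(r.Start)) + 1
}

// parseDHCPRange parses "<start>,<end>,<lease>", e.g. "172.19.1.11,172.19.1.20,12h".
func parseDHCPRange(s string) (dhcpRange, error) {
	parts := strings.Split(s, ",")
	if len(parts) != 3 {
		return dhcpRange{}, fmt.Errorf("expected <start>,<end>,<lease>, got %q", s)
	}
	start := net.ParseIP(strings.TrimSpace(parts[0])).To4()
	end := net.ParseIP(strings.TrimSpace(parts[1])).To4()
	if start == nil || end == nil {
		return dhcpRange{}, fmt.Errorf("start and end must be IPv4 addresses")
	}
	if binary.BigEndian.Uint32(start) > binary.BigEndian.Uint32(end) {
		return dhcpRange{}, fmt.Errorf("start %v is after end %v", start, end)
	}
	lease, err := time.ParseDuration(strings.TrimSpace(parts[2]))
	if err != nil {
		return dhcpRange{}, fmt.Errorf("invalid lease time: %v", err)
	}
	return dhcpRange{Start: start, End: end, Lease: lease}, nil
}

func main() {
	r, err := parseDHCPRange("172.19.1.11,172.19.1.20,12h")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(r.Start, r.End, r.Lease, r.Size())
}
```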
Understanding IPMI (e.g. for power control) would virtually turn blacksmith into a cloud host.
<workspace>/images/<new-version>/
Having the state unknown, the bootstrappers may need to have access to BoB's etcd (e.g. for a reboot service). This can be implemented by adding the etcd configuration to the cloud-config template context and providing a way to access it from the template files.
The API (/api/*) should be protected. However, a machine should have access to /api/machine/<MAC>/ when its IP is equal to the IP assigned to this MAC.
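A sketch of the per-machine rule as an HTTP middleware. The `leases` map (lowercase MAC to assigned IP) stands in for whatever the DHCP data source exposes, and `machineMayAccess` and `protect` are hypothetical names. How operators authenticate for the rest of /api/* is not specified in the issue, so this sketch simply rejects everything else.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
)

// machineMayAccess reports whether the request path is /api/machine/<MAC>/...
// and the caller's IP equals the IP assigned to that MAC.
func machineMayAccess(path, remoteAddr string, leases map[string]string) bool {
	const prefix = "/api/machine/"
	if !strings.HasPrefix(path, prefix) {
		return false
	}
	mac := strings.SplitN(strings.TrimPrefix(path, prefix), "/", 2)[0]
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return false
	}
	remote := net.ParseIP(host)
	if remote == nil {
		return false
	}
	assigned, ok := leases[strings.ToLower(mac)]
	return ok && remote.Equal(net.ParseIP(assigned))
}

// protect wraps next and answers 403 unless the machine rule allows the request.
func protect(leases map[string]string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if machineMayAccess(r.URL.Path, r.RemoteAddr, leases) {
			next.ServeHTTP(w, r)
			return
		}
		http.Error(w, "forbidden", http.StatusForbidden)
	})
}

func main() {
	leases := map[string]string{"00:02:7d:15:be:86": "172.19.1.13"}
	api := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	handler := protect(leases, api)

	// Same path requested from the owning machine and from another node.
	for _, addr := range []string{"172.19.1.13:49154", "172.19.1.12:49154"} {
		req := httptest.NewRequest("GET", "/api/machine/00:02:7d:15:be:86/", nil)
		req.RemoteAddr = addr
		rec := httptest.NewRecorder()
		handler.ServeHTTP(rec, req)
		fmt.Println(addr, rec.Code)
	}
}
```

Note that a check based on the source IP only makes sense on a network where addresses cannot be spoofed easily; client TLS authentication would be a stronger option.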
Bootstrappers need to register their (name, address) identity pairs in the skydns etcd directory since their hostnames need to be resolved by skydns.
We want to be able to reconfigure some of the network parameters at runtime. These are the important ones:
We should also handle DNS requests, to be able to resolve node names, which is required by some platforms like Kubernetes. Other requests should be forwarded to the given DNS addresses.
If blacksmith is run in HA mode, the addresses of the other instances should also be passed to the nodes, and those instances should respond to DNS requests even if they aren't in leader mode (see #7).
If multiple instances are running using the same etcd cluster and the same etcd directory (-etcd-dir), only one of them should be active (responding to DHCP requests), and the others should wait until the leader is killed somehow.
The idea is to compile those configs by merging yaml chunks stored in etcd, in a tree like this:
/aghajoon (namespace, can be changed by a flag)
├─ coreos-version (e.g. `835.1.0`)
├─ cloud-config
│  ├─ etcd
│  ├─ customizations
│  ├─ docker
│  ├─ flannel
│  └─ ssh-keys-staff1
└─ ignition-config
   └─ bootstrap-harddisk
Example content for key=aghajoon/cloud-config/customizations:
coreos:
  units:
    - name: increase-nf_conntrack-connections.service
      command: start
      content: |
        [Unit]
        Description=Increase the number of connections in nf_conntrack. default is 65536
        [Service]
        Type=oneshot
        ExecStartPre=/usr/sbin/modprobe nf_conntrack
        ExecStart=/bin/sh -c "sysctl -w net.netfilter.nf_conntrack_max=262144"
write_files:
  - path: /etc/environment
    owner: core
    content: |
      public_ipv4={{.IP}}
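To make the merging idea concrete, here is a rough sketch that works on chunks already decoded into Go maps (the YAML decoding itself is left out, since it needs a third-party package). The merge rules are an assumption, not something the issue specifies: nested maps are merged, lists are concatenated, and scalars from the later chunk win. `mergeChunks` and `renderChunk` are hypothetical names; the `{{.IP}}` placeholder is rendered with the standard `text/template` package.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// mergeChunks merges src into dst: nested maps are merged key by key, lists
// are concatenated, and any other value in src replaces the one in dst.
func mergeChunks(dst, src map[string]interface{}) map[string]interface{} {
	for key, srcVal := range src {
		switch s := srcVal.(type) {
		case map[string]interface{}:
			if d, ok := dst[key].(map[string]interface{}); ok {
				dst[key] = mergeChunks(d, s)
				continue
			}
		case []interface{}:
			if d, ok := dst[key].([]interface{}); ok {
				dst[key] = append(d, s...)
				continue
			}
		}
		dst[key] = srcVal
	}
	return dst
}

// renderChunk executes a chunk as a Go template against per-machine data.
func renderChunk(text string, data interface{}) (string, error) {
	tmpl, err := template.New("chunk").Parse(text)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Two decoded chunks, e.g. cloud-config/etcd and cloud-config/customizations.
	etcd := map[string]interface{}{
		"coreos": map[string]interface{}{"units": []interface{}{"etcd2.service"}},
	}
	custom := map[string]interface{}{
		"coreos":      map[string]interface{}{"units": []interface{}{"increase-nf_conntrack-connections.service"}},
		"write_files": []interface{}{"/etc/environment"},
	}
	fmt.Println(mergeChunks(etcd, custom))

	out, err := renderChunk("public_ipv4={{.IP}}", struct{ IP string }{"172.19.1.13"})
	fmt.Println(out, err)
}
```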
Currently, it's the user's task to sync the workspace among the instances so that, if the master fails for some reason, the next master is as up to date as the first one.
In practice, the lack of a syncing mechanism makes the "files" feature useless.
It would be much easier to manage nodes if their names were selected from a dictionary instead of a sequence of random characters. node0cc47a781c9a could be node_jaguar or node_saturn or something like this.
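One possible way to do this, sketched under the assumption that the name is derived from the MAC address: hash the MAC and use it to pick a word. `nodeName` and the word list are hypothetical. Two MACs can land on the same word, so a real implementation would still have to record taken names (e.g. in etcd) and pick another word on collision.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// nodeName picks a dictionary word for a MAC address. The same MAC always
// maps to the same word. With an empty dictionary it falls back to the
// "node" + MAC form.
func nodeName(mac string, words []string) string {
	mac = strings.ToLower(mac)
	if len(words) == 0 {
		return "node" + strings.Replace(mac, ":", "", -1)
	}
	h := fnv.New32a()
	h.Write([]byte(mac))
	return "node_" + words[int(h.Sum32()%uint32(len(words)))]
}

func main() {
	words := []string{"jaguar", "saturn", "falcon", "maple"}
	fmt.Println(nodeName("0c:c4:7a:78:1c:9a", words))
	fmt.Println(nodeName("0c:c4:7a:78:1c:9a", nil))
}
```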
Add a flag to make the blacksmith web service serve over TLS.
#28 depends on this issue, if we want to do it using client TLS authentication.
For example:
./blacksmith --http-listen 0.0.0.0:8443 --tls-ca <CA Path> --tls-cert <CERT-PATH> --tls-key <KEY-PATH>
should make the web service of blacksmith listen on 0.0.0.0:8443 for HTTPS requests.
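A minimal sketch of how such flags could be wired up with the standard library. The flag names follow the proposal above; `buildTLSConfig` is a hypothetical helper, and requiring client certificates whenever a CA is given is an assumption made with the client-authentication use case of #28 in mind.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"flag"
	"fmt"
	"net/http"
	"os"
)

// buildTLSConfig returns a server TLS config. When a CA bundle is given,
// clients must present a certificate signed by that CA.
func buildTLSConfig(caPEM []byte) (*tls.Config, error) {
	cfg := &tls.Config{MinVersion: tls.VersionTLS12}
	if len(caPEM) == 0 {
		return cfg, nil
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no certificates found in the CA file")
	}
	cfg.ClientCAs = pool
	cfg.ClientAuth = tls.RequireAndVerifyClientCert
	return cfg, nil
}

func main() {
	listen := flag.String("http-listen", "0.0.0.0:8443", "address to listen on")
	caPath := flag.String("tls-ca", "", "CA bundle used to verify client certificates")
	certPath := flag.String("tls-cert", "", "server certificate")
	keyPath := flag.String("tls-key", "", "server private key")
	flag.Parse()

	if *certPath == "" || *keyPath == "" {
		fmt.Println("--tls-cert and --tls-key are required to serve HTTPS")
		return
	}
	var caPEM []byte
	if *caPath != "" {
		var err error
		if caPEM, err = os.ReadFile(*caPath); err != nil {
			fmt.Println("reading CA:", err)
			return
		}
	}
	cfg, err := buildTLSConfig(caPEM)
	if err != nil {
		fmt.Println(err)
		return
	}
	// http.NotFoundHandler is a placeholder for the real web handlers.
	srv := &http.Server{Addr: *listen, TLSConfig: cfg, Handler: http.NotFoundHandler()}
	fmt.Println(srv.ListenAndServeTLS(*certPath, *keyPath))
}
```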
The Blacksmith UI caches machine states, which is a problem especially during the bootstrapping phase, when the states may change several times in a short period.
Currently, flags can only be assigned to specific machines. It would be nice to be able to declare global flags which would affect all of the workers (e.g. for updating docker tags or any other bulk upgrade).
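One way this could look, as a sketch only: resolve a machine's flags by layering its own flags on top of the global ones. The precedence (machine overrides global) and the name `effectiveFlags` are assumptions, not part of the proposal.

```go
package main

import "fmt"

// effectiveFlags returns the global flags overlaid with the machine's own
// flags; on a key conflict the machine-specific value wins.
func effectiveFlags(global, machine map[string]string) map[string]string {
	out := make(map[string]string, len(global)+len(machine))
	for k, v := range global {
		out[k] = v
	}
	for k, v := range machine {
		out[k] = v
	}
	return out
}

func main() {
	global := map[string]string{"docker-tag": "v2"}
	machine := map[string]string{"role": "worker"}
	fmt.Println(effectiveFlags(global, machine))
}
```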
e.g. for being able to assign a used IP to another node, or removing a node from the cluster.
Subscribe to this issue to get notified about project announcements.
Please do not use this thread to report issues.
Running dev_run.bash I get this panic:
panic: inconsistent label cardinality
Log:
Blacksmith (testing)
Commit: ffe69c5a70ab88d01ed6f3ab0293d883234bd153
Build Time: 2017-02-18_16:30:55_+0330
Interface IP: 172.19.1.1
Interface Name: vboxnet0
INFO[0000] Listening on 172.19.1.1:8000 action=announce where=web.ServeWeb
DEBU[0000] Now we're the master instance. Starting the services... action=debug where=blacksmith.main
INFO[0000] Listening on 172.19.1.1:70 action=announce where=pxe.ServeHTTPBooter
INFO[0000] TFTP listening on 172.19.1.1:69 action=tftp where=pxe.ServeTFTP
INFO[0000] Listening on 172.19.1.1:4011 action=announce where=pxe.ServePXE
INFO[0000] Listening on 172.19.1.1:67 (interface: vboxnet0) action=announce where=dhcp.StartDHCP
INFO[0001] assignedIp=172.19.1.12 isPxe=true action=debug object="00:02:7d:15:be:84" subject=Discover where=dhcp.ServeDHCP
INFO[0002] assignedIp=172.19.1.13 isPxe=true action=debug object="00:02:7d:15:be:86" subject=Discover where=dhcp.ServeDHCP
INFO[0003] assignedIp=172.19.1.11 isPxe=true action=debug object="00:02:7d:15:be:82" subject=Discover where=dhcp.ServeDHCP
INFO[0004] assignedIp=172.19.1.13 isPxe=true action=debug object="00:02:7d:15:be:86" subject=Request where=dhcp.ServeDHCP
INFO[0005] assignedIp=172.19.1.12 isPxe=true action=debug object="00:02:7d:15:be:84" subject=Request where=dhcp.ServeDHCP
INFO[0007] action=debug object=00:02:7d:15:be:86 where=pxe.ServePXE
WARN[0008] error while transfering to 172.19.1.13:2070 action=tftp-transfer error="\"172.19.1.13:2070\": sending OACK: client aborted transfer: TFTP Aborted" subject=boot where=pxe.ServeTFTP
INFO[0008] clamping blocksize to "172.19.1.13:2071": 1456 -> 1450 action=tftp where=pxe.ServeTFTP
DEBU[0008] transfered to 172.19.1.13:2071 action=tftp-transfer subject=boot where=pxe.ServeTFTP
INFO[0008] host=172.19.1.1 method=GET proto="HTTP/1.0" referer= uri="/ldlinux.c32" user-agent="Syslinux/6.03" where=pxe.ldlinuxHandler
DEBU[0008] 172.19.1.13:49154 requested a pxelinux config from URL "/pxelinux.cfg/a7ec255c-cb17-4785-b47b-146ffb6de852", which does not include a correct MAC address host=172.19.1.1 method=GET proto="HTTP/1.0" referer= uri="/pxelinux.cfg/a7ec255c-cb17-4785-b47b-146ffb6de852" user-agent="Syslinux/6.03" where=pxe.pxelinuxConfig
INFO[0008] host="172.19.1.1:70" method=GET proto="HTTP/1.0" referer= uri="/pxelinux.cfg/01-00-02-7d-15-be-86" user-agent="Syslinux/6.03" where=pxe.pxelinuxConfig
INFO[0008] action=debug object=00:02:7d:15:be:84 where=pxe.ServePXE
WARN[0010] error while transfering to 172.19.1.12:2070 action=tftp-transfer error="\"172.19.1.12:2070\": sending OACK: client aborted transfer: TFTP Aborted" subject=boot where=pxe.ServeTFTP
INFO[0010] clamping blocksize to "172.19.1.12:2071": 1456 -> 1450 action=tftp where=pxe.ServeTFTP
DEBU[0010] transfered to 172.19.1.12:2071 action=tftp-transfer subject=boot where=pxe.ServeTFTP
INFO[0010] host=172.19.1.1 method=GET proto="HTTP/1.0" referer= uri="/ldlinux.c32" user-agent="Syslinux/6.03" where=pxe.ldlinuxHandler
DEBU[0010] 172.19.1.12:49154 requested a pxelinux config from URL "/pxelinux.cfg/8f7e1db2-1494-4ad5-b2e3-b93d20a6d084", which does not include a correct MAC address host=172.19.1.1 method=GET proto="HTTP/1.0" referer= uri="/pxelinux.cfg/8f7e1db2-1494-4ad5-b2e3-b93d20a6d084" user-agent="Syslinux/6.03" where=pxe.pxelinuxConfig
INFO[0010] host="172.19.1.1:70" method=GET proto="HTTP/1.0" referer= uri="/pxelinux.cfg/01-00-02-7d-15-be-84" user-agent="Syslinux/6.03" where=pxe.pxelinuxConfig
INFO[0011] assignedIp=172.19.1.11 isPxe=true action=debug object="00:02:7d:15:be:82" subject=Request where=dhcp.ServeDHCP
INFO[0014] action=debug object=00:02:7d:15:be:82 where=pxe.ServePXE
WARN[0015] error while transfering to 172.19.1.11:2070 action=tftp-transfer error="\"172.19.1.11:2070\": sending OACK: client aborted transfer: TFTP Aborted" subject=boot where=pxe.ServeTFTP
INFO[0015] clamping blocksize to "172.19.1.11:2071": 1456 -> 1450 action=tftp where=pxe.ServeTFTP
DEBU[0015] transfered to 172.19.1.11:2071 action=tftp-transfer subject=boot where=pxe.ServeTFTP
INFO[0015] host=172.19.1.1 method=GET proto="HTTP/1.0" referer= uri="/ldlinux.c32" user-agent="Syslinux/6.03" where=pxe.ldlinuxHandler
DEBU[0015] 172.19.1.11:49154 requested a pxelinux config from URL "/pxelinux.cfg/2f4a4bca-9d41-4298-9ffb-7e63b0cb842c", which does not include a correct MAC address host=172.19.1.1 method=GET proto="HTTP/1.0" referer= uri="/pxelinux.cfg/2f4a4bca-9d41-4298-9ffb-7e63b0cb842c" user-agent="Syslinux/6.03" where=pxe.pxelinuxConfig
INFO[0015] host="172.19.1.1:70" method=GET proto="HTTP/1.0" referer= uri="/pxelinux.cfg/01-00-02-7d-15-be-82" user-agent="Syslinux/6.03" where=pxe.pxelinuxConfig
INFO[0023] assignedIp=172.19.1.13 isPxe=false action=debug object="00:02:7d:15:be:86" subject=Discover where=dhcp.ServeDHCP
INFO[0023] assignedIp=172.19.1.13 isPxe=false action=debug object="00:02:7d:15:be:86" subject=Request where=dhcp.ServeDHCP
WARN[0023] error while executeTemplate(templateName=workspace-installer.sh machine=00:02:7d:15:be:86) error="template with name=workspace-installer.sh wasn't found for root=&{ 0xc4201088c0 0xc4202b6380 {{ }}}" where=templating.executeTemplate
INFO[0023] 172.19.1.13 - - [18/Feb/2017:13:01:32 +0000] "GET /t/ig/00:02:7d:15:be:86 HTTP/1.1" 200 1482
INFO[0024] assignedIp=172.19.1.12 isPxe=false action=debug object="00:02:7d:15:be:84" subject=Discover where=dhcp.ServeDHCP
INFO[0024] assignedIp=172.19.1.12 isPxe=false action=debug object="00:02:7d:15:be:84" subject=Request where=dhcp.ServeDHCP
INFO[0029] assignedIp=172.19.1.11 isPxe=false action=debug object="00:02:7d:15:be:82" subject=Discover where=dhcp.ServeDHCP
INFO[0029] assignedIp=172.19.1.11 isPxe=false action=debug object="00:02:7d:15:be:82" subject=Request where=dhcp.ServeDHCP
WARN[0030] error while executeTemplate(templateName=workspace-installer.sh machine=00:02:7d:15:be:82) error="template with name=workspace-installer.sh wasn't found for root=&{ 0xc4201bc620 0xc42044ff80 {{ }}}" where=templating.executeTemplate
INFO[0030] 172.19.1.11 - - [18/Feb/2017:13:01:39 +0000] "GET /t/ig/00:02:7d:15:be:82 HTTP/1.1" 200 1482
WARN[0030] error while executeTemplate(templateName=workspace-installer.sh machine=00:02:7d:15:be:84) error="template with name=workspace-installer.sh wasn't found for root=&{ 0xc4201bc7e0 0xc42031a440 {{ }}}" where=templating.executeTemplate
INFO[0030] 172.19.1.12 - - [18/Feb/2017:13:01:40 +0000] "GET /t/ig/00:02:7d:15:be:84 HTTP/1.1" 200 1482
INFO[0033] assignedIp=172.19.1.13 isPxe=false action=debug object="00:02:7d:15:be:86" subject=Discover where=dhcp.ServeDHCP
INFO[0033] assignedIp=172.19.1.13 isPxe=false action=debug object="00:02:7d:15:be:86" subject=Request where=dhcp.ServeDHCP
WARN[0033] error while executeTemplate(templateName=workspace-updater.sh machine=00:02:7d:15:be:86) error="template with name=workspace-updater.sh wasn't found for root=&{ 0xc4201bcb60 0xc42031b240 {{ }}}" where=templating.executeTemplate
INFO[0033] 172.19.1.13 - - [18/Feb/2017:13:01:43 +0000] "GET /t/cc/00:02:7d:15:be:86 HTTP/1.1" 200 2046
INFO[0035] assignedIp=172.19.1.12 isPxe=false action=debug object="00:02:7d:15:be:84" subject=Discover where=dhcp.ServeDHCP
INFO[0035] assignedIp=172.19.1.12 isPxe=false action=debug object="00:02:7d:15:be:84" subject=Request where=dhcp.ServeDHCP
panic: inconsistent label cardinality
goroutine 670 [running]:
panic(0xb4efe0, 0xc420130cf0)
/home/sina/src/go/src/runtime/panic.go:500 +0x1a1
github.com/prometheus/client_golang/prometheus.(*MetricVec).WithLabelValues(0xc4201262c0, 0xc42039dc18, 0x2, 0x2, 0x12204e0, 0xc4203d5280)
/home/sina/go/src/github.com/prometheus/client_golang/prometheus/vec.go:135 +0xa0
github.com/prometheus/client_golang/prometheus.(*HistogramVec).WithLabelValues(0xc42012a068, 0xc42039dc18, 0x2, 0x2, 0x0, 0x0)
/home/sina/go/src/github.com/prometheus/client_golang/prometheus/histogram.go:337 +0x50
github.com/miekg/coredns/middleware/proxy.Proxy.ServeDNS(0x0, 0x0, 0xc4203d3240, 0x1, 0x1, 0x7fa179883278, 0xc4202691a0, 0x12204e0, 0xc4203d5280, 0xc420346090, ...)
/home/sina/go/src/github.com/miekg/coredns/middleware/proxy/proxy.go:109 +0x477
github.com/cafebazaar/blacksmith/dns.(*dnsServer).generalDNS(0xc42034f460, 0x12204e0, 0xc4203d5280, 0xc420346090)
/home/sina/go/src/github.com/cafebazaar/blacksmith/dns/dns.go:117 +0x19f
github.com/cafebazaar/blacksmith/dns.(*dnsServer).(github.com/cafebazaar/blacksmith/dns.generalDNS)-fm(0x12204e0, 0xc4203d5280, 0xc420346090)
/home/sina/go/src/github.com/cafebazaar/blacksmith/dns/dns.go:127 +0x48
github.com/miekg/dns.HandlerFunc.ServeDNS(0xc42036a220, 0x12204e0, 0xc4203d5280, 0xc420346090)
/home/sina/go/src/github.com/miekg/dns/server.go:84 +0x44
github.com/miekg/dns.(*ServeMux).ServeDNS(0xc420130790, 0x12204e0, 0xc4203d5280, 0xc420346090)
/home/sina/go/src/github.com/miekg/dns/server.go:210 +0x61
github.com/miekg/dns.(*Server).serve(0xc420191e10, 0x12190a0, 0xc42027a360, 0x1215360, 0xc420130790, 0xc420288400, 0x27, 0x200, 0xc42012a240, 0xc4203d1940, ...)
/home/sina/go/src/github.com/miekg/dns/server.go:579 +0x500
created by github.com/miekg/dns.(*Server).serveUDP
/home/sina/go/src/github.com/miekg/dns/server.go:533 +0x2d7
https://github.com/cafebazaar/blacksmith/blob/master/web/files.go#L53-L62
vagrant up --provider=libvirt pxeserver gives me this:
==> pxeserver: /usr/local/go/src/github.com/elazarl/go-bindata-assetfs (from $GOROOT)
==> pxeserver: /go/src/github.com/elazarl/go-bindata-assetfs (from $GOPATH)
==> pxeserver:
==> pxeserver: The command '/bin/sh -c go build .' returned a non-zero code: 1
Adding a line to the Dockerfile to "go get" that external dependency seemed to fix the issue. Pull request sent.
Already in progress.
TODO:
1- workaround for templating, previously done by "Repo"s with references to multiple data sources.
2- etcd_datasource is getting pretty big. Perhaps the implementations for different parts of the MasterDataSource interface should be separated out into multiple files.
3- Switching the DHCP server's back-end data source from the lease pool to an implementation of DHCPDataSource.