
ciao's Issues

Add networking benchmark to ciao-cli

Our developers need a quick and easy way to measure networking performance between tenant instances (both VMs and containers). Extend the SSNTP tracing support to measure network performance, and add benchmarks to the CLI that measure how long it takes to create the network interface as well as the throughput between instances.

ciao-cli -list-tenants should not ask for a tenant to be passed as argument

When a user has more than one tenant and tries to list them, ciao-cli asks for a tenant to be specified. Since the tenant list is exactly the information being requested, the following error should not occur:

root@sn-controller ~ # ./ciao-cli -list-tenants
Available projects for csr:
Project[1]: ciao1 (4e3290a89b764b979d508b22cef4048d)
Project[2]: service (92834d7fd3494ef994e421fdb3eff3a9)
F0421 13:11:04.421239 16964 main.go:67] ciao-cli FATAL: Please specify a project to use with -tenant-name or -tenant-id
goroutine 1 [running]:
github.com/golang/glog.stacks(0xa6ca00, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:769 +0xb8
github.com/golang/glog.(*loggingT).output(0xa507e0, 0xc800000003, 0xc8201e3080, 0xa31c88, 0x7, 0x43, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:720 +0x2ce
github.com/golang/glog.(*loggingT).printf(0xa507e0, 0x3, 0xc820016280, 0x4f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:655 +0x1d4
github.com/golang/glog.Fatalf(0xc820016280, 0x4f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:1148 +0x5d
main.fatalf(0x8d9a80, 0x3f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/main.go:67 +0x89
main.main()
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/main.go:793 +0x5cf

Not able to list tenants with ciao-cli. Tenants are listed only after listing workloads or quotas

root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -list-tenants

root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -list-quotas -tenant 92834d7fd3494ef994e421fdb3eff3a9
Quotas for tenant 92834d7fd3494ef994e421fdb3eff3a9:
Instances: 0 | Unlimited
CPUs: 0 | Unlimited
Memory: 0 | Unlimited
Disk: 0 | Unlimited

root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -list-tenants
Tenant 1
UUID: 92834d7fd3494ef994e421fdb3eff3a9
Name:

Introduce new 'downloading' State for instances

If you start a docker container on a compute node using a base image that no container on that node has used before, ciao-launcher needs to download the image. This can greatly increase the start time of the instance. We should therefore add a new state that tells users why the instance start is taking so long. 'downloading' should do the trick.

Implement database schema migration

We need to add functionality to migrate the database schema to newer versions so that we don't all have to delete our databases every time I change the schema.
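
A minimal sketch of what versioned migrations could look like, assuming each schema change is recorded as a numbered step. The `migration` type, the `migrate` function and the step names are hypothetical and not part of ciao; in a real implementation `apply` would execute SQL against the database and the current version would be stored in it.

```go
package main

import (
	"fmt"
	"sort"
)

// migration is one hypothetical schema step. In a real system apply
// would run the SQL statements that upgrade the schema.
type migration struct {
	version int
	name    string
	apply   func() error
}

// migrate runs, in ascending version order, every step whose version is
// greater than current. It returns the version that was reached.
func migrate(current int, steps []migration) (int, error) {
	sorted := append([]migration(nil), steps...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].version < sorted[j].version })
	for _, m := range sorted {
		if m.version <= current {
			continue // already applied
		}
		if err := m.apply(); err != nil {
			return current, fmt.Errorf("migration %d (%s): %v", m.version, m.name, err)
		}
		current = m.version
	}
	return current, nil
}

func main() {
	noop := func() error { return nil }
	steps := []migration{
		{2, "add instance state column", noop},
		{1, "create initial tables", noop},
	}
	// A database already at version 1 only needs step 2.
	v, err := migrate(1, steps)
	fmt.Println(v, err)
}
```

Stopping at the first failing step and returning the last version reached keeps the database in a known state, so a later run can resume from there.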

SSNTP version checking

We do not check the SSNTP version, so we cannot provide backward compatibility.
We need, for example, to keep track of the commands supported by a given version.
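
One possible shape for this bookkeeping is a table from protocol version to the commands that version understands. This is only a sketch: the version numbers and the assignment of commands to versions below are invented for illustration, and `supported` and `supports` are hypothetical names, not part of the ssntp package.

```go
package main

import "fmt"

// supported maps a hypothetical protocol version to the commands a peer
// speaking that version understands. The contents are illustrative only.
var supported = map[int][]string{
	1: {"START", "FULL"},
	2: {"START", "FULL", "CONFIGURE"},
}

// supports reports whether a peer at the given version understands command.
// Unknown versions support nothing.
func supports(version int, command string) bool {
	for _, c := range supported[version] {
		if c == command {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(supports(1, "CONFIGURE"), supports(2, "CONFIGURE"))
}
```

A sender could consult such a table before dispatching a command to an older peer and fall back or report an error when the command is not supported.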

Failed to start container: failed to add interface svp_ae86911 to sandbox

I0414 15:19:53.285424   10726 network.go:282] CN VNIC created = svn_ae86911 &{2 sbr_3f2cf6ba {172.16.0.0 ffffff00} 172.16.0.1 br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.0.0/24_902d35e7-d635-4000-91ba-dc76ef8ef389_192.168.0.118} &{2 192.168.0.118 192.168.0.83 172.16.0.0/24 8fa6d6ec08c444ed951b2c2018a4c6ad 172.16.0.0/24 902d35e7-d635-4000-91ba-dc76ef8ef389  4268 }
I0414 15:19:53.762754   10726 network.go:282] CN VNIC created = svn_ae86911 &{0 sbr_3f2cf6ba {172.16.0.0 ffffff00} 172.16.0.1 br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.0.0/24_902d35e7-d635-4000-91ba-dc76ef8ef389_192.168.0.118} <nil>
E0414 15:19:54.201167   10726 docker.go:257] Unable to start container Error response from daemon: Cannot start container 0665d3cb511f9405f291dd4b43a1865e9bd69359bf5a280867d2fd9af2840aab: [9] System error: failed to add interface svp_ae86911 to sandbox: failed in prefunc: failed to get link by name "svp_ae86911": Link not found
E0414 15:19:54.201192   10726 instance.go:134] Unable to start instance[launch_failure]: Error response from daemon: Cannot start container 0665d3cb511f9405f291dd4b43a1865e9bd69359bf5a280867d2fd9af2840aab: [9] System error: failed to add interface svp_ae86911 to sandbox: failed in prefunc: failed to get link by name "svp_ae86911": Link not found

Extend CONNECTED payload

Extend it to support the CONFIGURE payload and dispatch cluster configuration information to the connected client.

networking - cnci-agent - Add support for recovery on reboot

Add support in the CNCI Agent for storing state that can be recovered if the CNCI VM hosting the CNCI Agent reboots. Currently the CNCI Agent can recover its state if the Agent itself crashes, but it cannot recover its state after a VM reboot without persisting it in a database.

Ideally the database should be stored outside the VM, in a network-attached storage volume or KV store.

Calls to docker APIs cannot be cancelled

The reason is that when we call these functions we simply pass in a background context. As a result, the instance goroutines that call these functions block until the calls return. This is only really a problem when shutting down launcher: a blocking call to a docker API may prevent ciao-launcher from shutting down properly when it receives a SIGTERM.

Unable to launch new CNCIs when trying to launch instances with other tenants

  1. Launched docker and clear instances with tenant 92834d7fd3494ef994e421fdb3eff3a9. Instances were successfully created.
  2. Tried to launch instances with another tenant of the same user and failed with error: Unable to Launch Tenant CNCI.
  3. Tried to launch instances with the admin tenant and failed with same error as above.

Below logs from cli and network node:

  • cli logs:

root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -tenant 92834d7fd3494ef994e421fdb3eff3a9 -launch-instances -workload ca957444-fa46-11e5-94f9-38607786d9ec
Created new instance: 2369578a-8f12-4bf0-af1e-a129295427dc
root@sn-controller ~/bin # ./ciao-cli -password supernova -username admin -identity https://sn-keystone.zpn.intel.com:35357 -tenant d750a83434e941639ba0c3a37fe18efa -scope admin -launch-instances -workload e35ed972-c46c-4aad-a1e7-ef103ae079a2
F0418 14:18:53.028643 3137 ciao-cli.go:109] ciao-cli FATAL: HTTP Error [500] for [POST https://localhost:8774/v2.1/d750a83434e941639ba0c3a37fe18efa/servers]: Unable to Launch Tenant CNCI
goroutine 1 [running]:
github.com/golang/glog.stacks(0xa60a00, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:769 +0xb8
github.com/golang/glog.(*loggingT).output(0xa447a0, 0xc800000003, 0xc8201c8fc0, 0xa26054, 0xb, 0x6d, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:720 +0x2ce
github.com/golang/glog.(*loggingT).printf(0xa447a0, 0x3, 0xc8200f45a0, 0x8f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:655 +0x1d4
github.com/golang/glog.Fatalf(0xc8200f45a0, 0x8f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:1148 +0x5d
main.fatalf(0xc8200e4680, 0x7f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:109 +0x89
main.createTenantInstance(0x7ffd87812721, 0x20, 0x7ffd8781276b, 0x24, 0x1)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:598 +0x473
main.main()
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:807 +0x6b7

  • Network node Error Log:

Log file created at: 2016/04/18 14:01:51
Running on machine: sn-network
Binary: Built with gc go1.6 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0418 14:01:51.307449 588 network.go:287] cn.CreateCnciVnic failed cncivnic error: enable link up device or resource busy
E0418 14:01:51.307578 588 instance.go:134] Unable to start instance[network_failure]: cncivnic error: enable link up device or resource busy
E0418 14:13:51.977771 588 network.go:287] cn.CreateCnciVnic failed cncivnic error: enable link up device or resource busy
E0418 14:13:51.977792 588 instance.go:134] Unable to start instance[network_failure]: cncivnic error: enable link up device or resource busy
E0418 14:18:52.817650 588 network.go:287] cn.CreateCnciVnic failed cncivnic error: enable link up device or resource busy
E0418 14:18:52.817667 588 instance.go:134] Unable to start instance[network_failure]: cncivnic error: enable link up device or resource busy

  • output from # ip -d link

root@sn-network ~ # ip -d link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 addrgenmode eui64
2: tunl0@NONE: mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0 promiscuity 0
ipip remote any local any ttl inherit nopmtudisc addrgenmode eui64
3: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
link/ether 34:13:e8:36:b0:a0 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64
4: enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether b8:ae:ed:77:02:86 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:52:d2:61:8e brd ff:ff:ff:ff:ff:ff promiscuity 0
bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q addrgenmode eui64
6: svc_6c531194@enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 500
link/ether 02:37:a6:4c:58:33 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_92834d7fd3494ef994e421fdb3eff3a9_044939c4-feff-447b-ac05-6e3ec8bacc2c
7: svc_bd5f8286@enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 500
link/ether 02:13:95:c5:51:f8 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_4e3290a89b764b979d508b22cef4048d_d1b20e75-a07f-471b-8315-d04095137235
8: svc_b4206548@enp0s25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
link/ether 02:13:95:c5:51:f8 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_4e3290a89b764b979d508b22cef4048d_7f1c862d-ba25-47f3-9f85-a86ea73d531a
9: svc_2c11fbc3@enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 500
link/ether 02:3d:80:f6:6f:40 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_d750a83434e941639ba0c3a37fe18efa_b479959c-99ee-4d59-9331-8b76e625225c
10: svc_44124e30@enp0s25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
link/ether 02:3d:80:f6:6f:40 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_d750a83434e941639ba0c3a37fe18efa_18e4cdca-4d6b-4078-87e7-f8ce647cc961
11: svc_9e0bcea4@enp0s25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
link/ether 02:3d:80:f6:6f:40 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_d750a83434e941639ba0c3a37fe18efa_aded9538-e4f4-49e5-97d7-9b9b014743c9

Limit number of parallel starts

Currently, there is no limit to the number of instances that ciao-launcher will start in parallel. This places extreme load on the compute node when spawning large numbers of instances at once and can lead to various errors. It may be better for launcher to introduce some sort of semaphore that limits the number of instances launched in parallel to some function of the number of cores on the machine. We could also return a special STATUS, e.g., throttle, to indicate that launcher is overloaded but not full.

When launching large numbers of instances, e.g., 10000, we often see some failures and timeouts in qemu and networking. Reducing the load on the compute node may prevent these failures.

Networking - cn api - support macvtap bridge mode

Add support for macvtap bridge mode for CNCI VNICs. This will allow single-node test systems to be set up. Currently the CNCI VNICs are always set up in VEPA mode, which prevents the host (and other CNCIs) from being able to reach the CNCI.

docker.go file needs cleaning up

Ideally this file should contain only functions for the virtualizer, i.e., functions that are executed in the context of the instance goroutine. The functions at the end of this file that are not executed in this context need to be moved elsewhere.

Handle corrupt instances

We need some sort of strategy for handling corrupt instances. For example, launcher might detect on startup that the state it maintains about an instance has become corrupted, or that the docker container associated with that instance has been deleted.

Currently, if launcher cannot retrieve an instance state it simply ignores it. This means that it is not reported to the upper layers, and so cannot be deleted, and is essentially leaked on the compute node.

Special handling is also required for instances associated with docker containers that get deleted by some out-of-band mechanism, i.e., by someone logging into the compute node and running

sudo docker rm -f

Networking - cn api - limit concurrency

Add concurrency control in the CN API to limit the number of simultaneous calls into the netlink library. Making too many concurrent netlink calls to create/delete interfaces seems to add large latencies to the API.

Longer term the degree of concurrency needs to be tuneable based on system response.

ciao-launcher: do not start docker plugin on NN

On NN nodes docker may not be provisioned, so it may not make sense to start the docker plugin or interact with Docker on the NN. Alternatively, start the docker plugin only if docker is detected on the NN.

Support volatile instances

ciao-launcher currently only supports persistent instances. It retains information about these instances even after they exit. Support for volatile instances needs to be added.

Should specify media type when creating iso images

ciao-launcher creates iso images when launching QEMU VMs. The paths to these images are passed to the VMs so that their contents can be accessed when the VM boots. These images are used to store the cloud-init data and also some ciao-specific information needed by the cnci-agent. ciao-launcher does not specify a media type for these isos when it invokes qemu, and this results in a qemu warning.

Race detected when closing launcher

Here's the race

==================
WARNING: DATA RACE
Write by goroutine 10:
  runtime.closechan()
      /usr/local/go/src/runtime/chan.go:292 +0x0
  main.connectToServer()
      /home/markus/go/src/github.com/01org/ciao/ciao-launcher/main.go:375 +0x507

Previous read by goroutine 16:
  runtime.chansend()
      /usr/local/go/src/runtime/chan.go:115 +0x0
  main.(*instanceData).instanceLoop()
      /home/markus/go/src/github.com/01org/ciao/ciao-launcher/instance.go:243 +0x3fac

Goroutine 10 (running) created at:
  main.main()
      /home/markus/go/src/github.com/01org/ciao/ciao-launcher/main.go:599 +0xb8e

Goroutine 16 (running) created at:
  main.startInstance()
      /home/markus/go/src/github.com/01org/ciao/ciao-launcher/instance.go:283 +0x474
  main.startOverseer.func1()
      /home/markus/go/src/github.com/01org/ciao/ciao-launcher/overseer.go:680 +0x6f1
  path/filepath.walk()
      /usr/local/go/src/path/filepath/path.go:349 +0xa3
  path/filepath.walk()
      /usr/local/go/src/path/filepath/path.go:374 +0x619
  path/filepath.Walk()
      /usr/local/go/src/path/filepath/path.go:396 +0x106
  main.startOverseer()
      /home/markus/go/src/github.com/01org/ciao/ciao-launcher/overseer.go:696 +0x2b7
  main.connectToServer()
      /home/markus/go/src/github.com/01org/ciao/ciao-launcher/main.go:328 +0x3b3
==================

The problem is that the overseer channel is closed and written to from different goroutines. We need to find a better way to shut down the overseer.
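
One common way to avoid this kind of race, sketched here under the assumption that shutdown can be signalled separately, is to never close the data channel and to close a dedicated done channel instead. Senders then select on both. The names `send`, `ch` and `done` are hypothetical and do not correspond to launcher code.

```go
package main

import (
	"fmt"
	"sync"
)

// send delivers v on ch unless done has been closed first.
// It reports whether the value was delivered.
func send(ch chan<- int, done <-chan struct{}, v int) bool {
	select {
	case ch <- v:
		return true
	case <-done:
		return false
	}
}

func main() {
	ch := make(chan int)        // stands in for the overseer channel; never closed
	done := make(chan struct{}) // closed exactly once to signal shutdown

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			send(ch, done, i)
		}(i)
	}

	<-ch        // the receiver handles one value
	close(done) // shutdown: the remaining senders return instead of blocking
	wg.Wait()
	fmt.Println("all senders stopped")
}
```

Because only the done channel is ever closed, and only by the side that initiates shutdown, no goroutine can send on a closed channel.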

hard-reset should delete all docker ciao networks

Currently when we perform a hard-reset we only delete the docker networks associated with the containers we have just deleted. In general this is fine, but docker networks could be leaked if there is some sort of launcher crash or node shutdown while hard-reset is executing. These networks would then not get cleaned up. This makes it hard to ensure that we are always testing on a clean environment.

Instead we should enumerate all the networks and delete the networks created by the ciao plugin.

cannot delete all docker instances through the cli

Launched around 50 docker instances and then tried to delete them. One of them failed to be deleted. Listing the instances shows it is still there, but no docker instance is running on the compute nodes.

root@sn-controller ~/bin # for i in $(./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -tenant 92834d7fd3494ef994e421fdb3eff3a9 -list-instances | grep -A1 'Instance' | awk '/UUID/ {print $2}'); do ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -tenant 92834d7fd3494ef994e421fdb3eff3a9 -delete-instance -instance $i; done
Deleted instance: 93b0f44b-253b-492d-92ca-47cea6bdf1d6
Deleted instance: b9957b8c-0489-41fe-99cf-e2da361f3f6f
Deleted instance: 51c8ab26-f8e7-48ae-a5e8-ba210aa391d1
....
Deleted instance: 10e096de-28f7-4f7b-a1e4-ac550818dff6
Deleted instance: d2727311-66db-4f35-8025-9a3d11b6d018

  • failure

F0415 21:12:32.270183 1798 ciao-cli.go:109] ciao-cli FATAL: HTTP Error [500] for [DELETE https://localhost:8774/v2.1/92834d7fd3494ef994e421fdb3eff3a9/servers/6eef8f8d-a6a8-48d0-9e93-2701c2a49ac4]: Instance not available
goroutine 1 [running]:
github.com/golang/glog.stacks(0xa64a00, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:769 +0xb8
github.com/golang/glog.(*loggingT).output(0xa487c0, 0xc800000003, 0xc82000a0c0, 0xa2a264, 0xb, 0x6d, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:720 +0x2ce
github.com/golang/glog.(*loggingT).printf(0xa487c0, 0x3, 0xc820112d10, 0xb0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:655 +0x1d4
github.com/golang/glog.Fatalf(0xc820112d10, 0xb0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:1148 +0x5d
main.fatalf(0xc820104b40, 0xa0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:109 +0x89
main.deleteTenantInstance(0x7ffffff4372f, 0x20, 0x7ffffff4376b, 0x24)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:625 +0x27b
main.main()
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:907 +0x7c9
Deleted instance: 715c7858-0f4b-451c-8cff-d903b16b4f8d
Deleted instance: c3ff8985-e9c5-4912-9331-ceda5740404b
Deleted instance: 74c84ae2-c81c-4e6a-a592-263b7fc9f1ad

  • Instance still shown

root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -tenant 92834d7fd3494ef994e421fdb3eff3a9 -list-instances
Instance #1
UUID: 4d4ff4f3-c3fc-41b0-950f-173362273f37
Status: running
Private IP: 172.16.0.15
MAC Address: 02:00:ac:10:00:0f
CN UUID: 3534b9eb-b2ad-47b5-9248-4f329aa59f41
Image UUID: fa7d86d8-fa46-11e5-8493-38607786d9ec
Tenant UUID: 92834d7fd3494ef994e421fdb3eff3a9
SSH IP: 192.168.10.169
SSH Port: 33015

  • tried to delete again

root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -tenant 92834d7fd3494ef994e421fdb3eff3a9 -delete-instance -instance 4d4ff4f3-c3fc-41b0-950f-173362273f37
F0415 21:14:24.212779 1880 ciao-cli.go:109] ciao-cli FATAL: HTTP Error [500] for [DELETE https://localhost:8774/v2.1/92834d7fd3494ef994e421fdb3eff3a9/servers/4d4ff4f3-c3fc-41b0-950f-173362273f37]: Instance not available
goroutine 1 [running]:
github.com/golang/glog.stacks(0xa64a00, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:769 +0xb8
github.com/golang/glog.(*loggingT).output(0xa487c0, 0xc800000003, 0xc820224180, 0xa2a264, 0xb, 0x6d, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:720 +0x2ce
github.com/golang/glog.(*loggingT).printf(0xa487c0, 0x3, 0xc8202086e0, 0xb0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:655 +0x1d4
github.com/golang/glog.Fatalf(0xc8202086e0, 0xb0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:1148 +0x5d
main.fatalf(0xc8202021e0, 0xa0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:109 +0x89
main.deleteTenantInstance(0x7ffe45f5b72f, 0x20, 0x7ffe45f5b76b, 0x24)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:625 +0x27b
main.main()
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:907 +0x7c9

  • on compute nodes, there is no docker instance running:

root@sn-compute01 /var/lib/ciao/logs/launcher # docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

root@sn-compute02 /var/lib/ciao/logs/launcher # docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
root@sn-compute02 /var/lib/ciao/logs/launcher #

Add launch time benchmark to ciao-cli

Our developers need a way to very quickly and easily run a simple launch time benchmark on a cluster. Add functionality to ciao-cli to launch vms and docker instances and report the total number of seconds elapsed using the existing tracing functionality. Add options to specify the total number of instances to launch or the instances per node to launch.

FULL command should not contain a payload

ciao-launcher currently sends a payload.Ready payload when sending the FULL command. It shouldn't do this: in the SSNTP specification FULL does not have a payload. In addition, ciao-scheduler doesn't even look at it, so it's a waste.

Failed to create docker instance: cn.CreateVnic failed Timeout waiting for device ready

E0405 17:45:05.075607   11595 network.go:252] cn.CreateVnic failed br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.33.0/24_db9ea216-aa11-46ad-a715-1616d5a74984_192.168.0.108Timeout waiting for device ready
E0405 17:45:05.075977   11595 instance.go:111] Unable to start instance[network_failure]: br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.33.0/24_db9ea216-aa11-46ad-a715-1616d5a74984_192.168.0.108Timeout waiting for device ready
E0405 17:45:05.171449   11595 network.go:252] cn.CreateVnic failed br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.33.0/24_db9ea216-aa11-46ad-a715-1616d5a74984_192.168.0.108Timeout waiting for device ready
E0405 17:45:05.171485   11595 instance.go:111] Unable to start instance[network_failure]: br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.33.0/24_db9ea216-aa11-46ad-a715-1616d5a74984_192.168.0.108Timeout waiting for device ready

Also, we've seen a lot of slowness today when launching containers. I wonder if this could be somehow connected.

These failures are happening on the first attempt to create a subnet for a docker instance on a compute node. I guess launcher will block for 60 seconds and, due to the lock I added the other day, all start commands for the same subnet will block on that launcher.

I suspect once we've fixed this problem our docker related performance problems will go away as well.

Feel free to assign back to me if it's a launcher problem. I'll try to reproduce Monday anyway.

Cleanup launcher source

The launcher source needs to be cleaned up a little

  1. There are some functions which are too long, particularly in instance.go
  2. The payloads.go file needs to go and to be split up over the other files that implement commands.
  3. It would be nice to rename some of the files that implement commands, e.g., start to indicate that these commands run in the instance go routine context.

Query qmp for spice and netcat ports

The --with-ui option enables netcat and spice support for qemu. A port is reserved for each instance that is launched, which spice and netcat can use to communicate with the underlying VM. Internally, launcher maintains a list of free ports so it knows which ports to use when a new START request comes in. The problem is that when you stop and restart launcher, launcher loses track of which ports are assigned to running instances. It is possible for launcher to query the running VMs to find out which ports are currently in use, but it does not yet do this. The end result is that VM launch time can increase while launcher searches for a free port.

Now this is a debug option, for the time being at least, so there's no urgency to fix this.
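
Once the in-use ports are known, rebuilding the free list is straightforward. The sketch below assumes the ports reported by the running VMs have already been collected into a slice; how they are obtained over QMP is not shown. `freePorts` and the port range in `main` are hypothetical.

```go
package main

import "fmt"

// freePorts returns the ports in [first, last] that do not appear in inUse.
func freePorts(first, last int, inUse []int) []int {
	used := make(map[int]bool, len(inUse))
	for _, p := range inUse {
		used[p] = true
	}
	var free []int
	for p := first; p <= last; p++ {
		if !used[p] {
			free = append(free, p)
		}
	}
	return free
}

func main() {
	// Ports 5901 and 5903 are assumed to be held by running instances.
	fmt.Println(freePorts(5900, 5904, []int{5901, 5903}))
}
```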

Restrict container resource usage

The START payload that creates docker containers contains a list of resource restrictions for that container. ciao-launcher currently ignores these restrictions. This needs to change.
