ciao-project / ciao Goto Github PK

Ciao - Cloud Integrated Advanced Orchestrator

License: Apache License 2.0

Go 98.50% Shell 1.50%

ciao's Issues

Add networking benchmark to ciao-cli

Our developers need a quick and easy way to measure networking performance between tenant instances (both VMs and containers). Extend the SSNTP tracing support to measure network performance, and add benchmarks to the CLI that measure how long it takes to create the network interface as well as the throughput between instances.

ciao-cli -list-tenants should not ask for a tenant to be passed as argument

When a user has more than one tenant and tries to list them, ciao-cli asks for a tenant to be specified. Since the tenant list is exactly the information being requested, the following error should not occur:

root@sn-controller ~ # ./ciao-cli -list-tenants
Available projects for csr:
Project[1]: ciao1 (4e3290a89b764b979d508b22cef4048d)
Project[2]: service (92834d7fd3494ef994e421fdb3eff3a9)
F0421 13:11:04.421239 16964 main.go:67] ciao-cli FATAL: Please specify a project to use with -tenant-name or -tenant-id
goroutine 1 [running]:
github.com/golang/glog.stacks(0xa6ca00, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:769 +0xb8
github.com/golang/glog.(*loggingT).output(0xa507e0, 0xc800000003, 0xc8201e3080, 0xa31c88, 0x7, 0x43, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:720 +0x2ce
github.com/golang/glog.(*loggingT).printf(0xa507e0, 0x3, 0xc820016280, 0x4f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:655 +0x1d4
github.com/golang/glog.Fatalf(0xc820016280, 0x4f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:1148 +0x5d
main.fatalf(0x8d9a80, 0x3f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/main.go:67 +0x89
main.main()
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/main.go:793 +0x5cf

Check container instance memory use is correct when container has multiple processes

ciao-launcher returns the pss of the process associated with a docker instance in the stats command. It's not really clear what happens if the container has multiple processes. I'm guessing that launcher does not return the correct memory usage for such containers. This needs to be checked and fixed if it is indeed a problem.
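One way to approach the check is to sum the PSS of every process in the container instead of reading a single PID. The following is a minimal sketch, assuming the input is text in the format of /proc/<pid>/smaps; the helper names are hypothetical and not part of ciao-launcher, and how the launcher would enumerate a container's PIDs is left open.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// sumPssKB adds up every "Pss:" line in smaps-formatted text and returns
// the total in kB. Hypothetical helper, not part of ciao-launcher.
func sumPssKB(r io.Reader) (int64, error) {
	var total int64
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		// A Pss line looks like: "Pss:                 12 kB"
		if len(fields) < 2 || fields[0] != "Pss:" {
			continue
		}
		kb, err := strconv.ParseInt(fields[1], 10, 64)
		if err != nil {
			return 0, err
		}
		total += kb
	}
	return total, scanner.Err()
}

// containerPssKB sums PSS over the smaps text of several processes,
// one string per process in the container.
func containerPssKB(smapsPerProcess []string) (int64, error) {
	var total int64
	for _, s := range smapsPerProcess {
		kb, err := sumPssKB(strings.NewReader(s))
		if err != nil {
			return 0, err
		}
		total += kb
	}
	return total, nil
}

func main() {
	// Two made-up processes; real input would come from /proc/<pid>/smaps.
	procA := "Size: 100 kB\nPss: 12 kB\nPss: 8 kB\n"
	procB := "Pss: 30 kB\n"
	total, err := containerPssKB([]string{procA, procB})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("container PSS (kB):", total)
}
```

Comparing this total against the value reported by the stats command for a multi-process container would show whether the reported figure is too low.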

Not able to list tenants with ciao-cli. Tenants are listed only after listing workloads or quotas

root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -list-tenants

root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -list-quotas -tenant 92834d7fd3494ef994e421fdb3eff3a9
Quotas for tenant 92834d7fd3494ef994e421fdb3eff3a9:
Instances: 0 | Unlimited
CPUs: 0 | Unlimited
Memory: 0 | Unlimited
Disk: 0 | Unlimited

root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -list-tenants
Tenant 1
UUID: 92834d7fd3494ef994e421fdb3eff3a9
Name:

Introduce new 'downloading' State for instances

If you start a docker container on a compute node using a base image that has not previously been used by any container on that node, ciao-launcher needs to download the image. This can greatly increase the start time of the instance. We should therefore add a new state that tells users why the instance start is taking so long. 'downloading' should do the trick.

Send a keystone request for ciao-cli -list-tenants

Instead of only talking to the ciao-controller db.

Implement database schema migration

We need to add functionality to migrate the database schema to newer versions so that we don't all have to delete our databases every time I change the schema.

Send Node UUID on StartFailure error

It would be handy to be able to see if Start Failures were tied to a particular node.

SSNTP version checking

We do not check the SSNTP version, so we cannot provide backward compatibility.
We need to keep track of, for example, the commands supported by a given version.
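A per-version table of supported commands could look roughly like the sketch below. The `Command` type, the version numbers and the command sets are all invented for illustration; they are not taken from the SSNTP specification or the ciao source.

```go
package main

import "fmt"

// Command is a stand-in for an SSNTP command identifier (hypothetical type).
type Command string

// supported maps a protocol version to the commands it understands.
// The versions and command sets here are invented for illustration.
var supported = map[uint8]map[Command]bool{
	1: {"START": true, "STOP": true},
	2: {"START": true, "STOP": true, "CONFIGURE": true},
}

// canSend reports whether a peer speaking the given version supports cmd.
// Unknown versions support nothing.
func canSend(version uint8, cmd Command) bool {
	return supported[version][cmd]
}

func main() {
	fmt.Println("v1 CONFIGURE:", canSend(1, "CONFIGURE"))
	fmt.Println("v2 CONFIGURE:", canSend(2, "CONFIGURE"))
}
```

A sender would consult such a table after the version has been exchanged and fall back to an older command, or fail cleanly, when the peer is too old.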

Failed to start container: failed to add interface svp_ae86911 to sandbox

I0414 15:19:53.285424   10726 network.go:282] CN VNIC created = svn_ae86911 &{2 sbr_3f2cf6ba {172.16.0.0 ffffff00} 172.16.0.1 br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.0.0/24_902d35e7-d635-4000-91ba-dc76ef8ef389_192.168.0.118} &{2 192.168.0.118 192.168.0.83 172.16.0.0/24 8fa6d6ec08c444ed951b2c2018a4c6ad 172.16.0.0/24 902d35e7-d635-4000-91ba-dc76ef8ef389  4268 }
I0414 15:19:53.762754   10726 network.go:282] CN VNIC created = svn_ae86911 &{0 sbr_3f2cf6ba {172.16.0.0 ffffff00} 172.16.0.1 br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.0.0/24_902d35e7-d635-4000-91ba-dc76ef8ef389_192.168.0.118} <nil>
E0414 15:19:54.201167   10726 docker.go:257] Unable to start container Error response from daemon: Cannot start container 0665d3cb511f9405f291dd4b43a1865e9bd69359bf5a280867d2fd9af2840aab: [9] System error: failed to add interface svp_ae86911 to sandbox: failed in prefunc: failed to get link by name "svp_ae86911": Link not found
E0414 15:19:54.201192   10726 instance.go:134] Unable to start instance[launch_failure]: Error response from daemon: Cannot start container 0665d3cb511f9405f291dd4b43a1865e9bd69359bf5a280867d2fd9af2840aab: [9] System error: failed to add interface svp_ae86911 to sandbox: failed in prefunc: failed to get link by name "svp_ae86911": Link not found

networking - unit tests - move parallel and concurrency tests to main test suite

Add parallel and concurrency testing to the CN API. Currently there is a bug in the parallel testing and the tests are hosted in a separate directory https://github.com/01org/ciao/tree/master/networking/libsnnet/tests/parallel.

These tests should be incorporated into the main unit tests once bug #22 has been resolved.

Extend CONNECTED payload

Extend it to support the CONFIGURE payload and dispatch cluster configuration information to the connected client.

networking - cnci-agent - Add support for recovery on reboot

Add support in the CNCI Agent to store state that can be recovered if the CNCI VM hosting the CNCI Agent reboots. Currently the CNCI Agent can recover state if the Agent itself crashes. However, it cannot recover its state after a VM reboot unless that state is persisted in a database.

Ideally the database should be stored outside of the context of the VM, in a network-attached storage volume or KV store.

Retrieve the scheduler FQDN from the CA certificate

We should use the CA certificate FQDN as either the main server URI or an alternative one if no URI is passed through the SSNTP client config.

Calls to docker APIs cannot be cancelled

The reason is that when we call these functions we simply pass in a background context. The result is that the instance go routines that call these functions block until they return. This is only really a problem when shutting down launcher. A blocking call to a docker API may prevent ciao-launcher from shutting down properly when it receives a SIGTERM.

Unable to launch new CNCIs when trying to launch instances with other tenants

Launched docker and clear instances with tenant 92834d7fd3494ef994e421fdb3eff3a9. Instances were successfully created.
Tried to launch instances with another tenant of the same user and failed with error: Unable to Launch Tenant CNCI.
Tried to launch instances with the admin tenant and failed with same error as above.

Below logs from cli and network node:

cli logs:

root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -tenant 92834d7fd3494ef994e421fdb3eff3a9 -launch-instances -workload ca957444-fa46-11e5-94f9-38607786d9ec
Created new instance: 2369578a-8f12-4bf0-af1e-a129295427dc
root@sn-controller ~/bin # ./ciao-cli -password supernova -username admin -identity https://sn-keystone.zpn.intel.com:35357 -tenant d750a83434e941639ba0c3a37fe18efa -scope admin -launch-instances -workload e35ed972-c46c-4aad-a1e7-ef103ae079a2
F0418 14:18:53.028643 3137 ciao-cli.go:109] ciao-cli FATAL: HTTP Error [500] for [POST https://localhost:8774/v2.1/d750a83434e941639ba0c3a37fe18efa/servers]: Unable to Launch Tenant CNCI
goroutine 1 [running]:
github.com/golang/glog.stacks(0xa60a00, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:769 +0xb8
github.com/golang/glog.(*loggingT).output(0xa447a0, 0xc800000003, 0xc8201c8fc0, 0xa26054, 0xb, 0x6d, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:720 +0x2ce
github.com/golang/glog.(*loggingT).printf(0xa447a0, 0x3, 0xc8200f45a0, 0x8f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:655 +0x1d4
github.com/golang/glog.Fatalf(0xc8200f45a0, 0x8f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:1148 +0x5d
main.fatalf(0xc8200e4680, 0x7f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:109 +0x89
main.createTenantInstance(0x7ffd87812721, 0x20, 0x7ffd8781276b, 0x24, 0x1)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:598 +0x473
main.main()
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:807 +0x6b7

Network node Error Log:

Log file created at: 2016/04/18 14:01:51
Running on machine: sn-network
Binary: Built with gc go1.6 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0418 14:01:51.307449 588 network.go:287] cn.CreateCnciVnic failed cncivnic error: enable link up device or resource busy
E0418 14:01:51.307578 588 instance.go:134] Unable to start instance[network_failure]: cncivnic error: enable link up device or resource busy
E0418 14:13:51.977771 588 network.go:287] cn.CreateCnciVnic failed cncivnic error: enable link up device or resource busy
E0418 14:13:51.977792 588 instance.go:134] Unable to start instance[network_failure]: cncivnic error: enable link up device or resource busy
E0418 14:18:52.817650 588 network.go:287] cn.CreateCnciVnic failed cncivnic error: enable link up device or resource busy
E0418 14:18:52.817667 588 instance.go:134] Unable to start instance[network_failure]: cncivnic error: enable link up device or resource busy

output from # ip -d link

root@sn-network ~ # ip -d link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 addrgenmode eui64
2: tunl0@NONE: mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0 promiscuity 0
ipip remote any local any ttl inherit nopmtudisc addrgenmode eui64
3: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
link/ether 34:13:e8:36:b0:a0 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64
4: enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether b8:ae:ed:77:02:86 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:52:d2:61:8e brd ff:ff:ff:ff:ff:ff promiscuity 0
bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q addrgenmode eui64
6: svc_6c531194@enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 500
link/ether 02:37:a6:4c:58:33 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_92834d7fd3494ef994e421fdb3eff3a9_044939c4-feff-447b-ac05-6e3ec8bacc2c
7: svc_bd5f8286@enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 500
link/ether 02:13:95:c5:51:f8 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_4e3290a89b764b979d508b22cef4048d_d1b20e75-a07f-471b-8315-d04095137235
8: svc_b4206548@enp0s25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
link/ether 02:13:95:c5:51:f8 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_4e3290a89b764b979d508b22cef4048d_7f1c862d-ba25-47f3-9f85-a86ea73d531a
9: svc_2c11fbc3@enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 500
link/ether 02:3d:80:f6:6f:40 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_d750a83434e941639ba0c3a37fe18efa_b479959c-99ee-4d59-9331-8b76e625225c
10: svc_44124e30@enp0s25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
link/ether 02:3d:80:f6:6f:40 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_d750a83434e941639ba0c3a37fe18efa_18e4cdca-4d6b-4078-87e7-f8ce647cc961
11: svc_9e0bcea4@enp0s25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
link/ether 02:3d:80:f6:6f:40 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_d750a83434e941639ba0c3a37fe18efa_aded9538-e4f4-49e5-97d7-9b9b014743c9

ciao-launcher needs some unit tests

Its one existing test case covers only 6.2% of the code. We need to get this up to at least 70%.

deleting docker instances causes failures on next launch

To reproduce:

launch some docker instances.
delete them.
launch some more docker instances.

observe the breakage.

Limit number of parallel starts

Currently, there is no limit to the number of instances that ciao-launcher will start in parallel. This places extreme load on the compute node when spawning large numbers of instances at once and can lead to various errors. It may be better for launcher to introduce some sort of semaphore that limits the number of instances launched in parallel to some function of the number of cores on the machine. We could also return a special STATUS, e.g., throttle, to indicate that launcher is overloaded but not full.

When launching large numbers of instances, e.g., 10000, we often see failures and timeouts in qemu and networking. Reducing the load on the compute node may prevent these failures.
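The semaphore idea can be sketched with a buffered channel. This is only an illustration: `runLimited` is a hypothetical helper, and the choice of twice the core count as the limit is an assumption, not a tuned value.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited runs every job, but never more than limit at the same time.
// It returns the highest number of jobs observed running concurrently.
func runLimited(limit int, jobs []func()) int32 {
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	var running, peak int32

	for _, job := range jobs {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit jobs are in flight
		go func(job func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when the job ends

			// Track the peak concurrency for reporting.
			n := atomic.AddInt32(&running, 1)
			for {
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			job()
			atomic.AddInt32(&running, -1)
		}(job)
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	limit := 2 * runtime.NumCPU() // assumed function of the core count
	jobs := make([]func(), 100)
	for i := range jobs {
		// Each job stands in for one instance start.
		jobs[i] = func() { time.Sleep(time.Millisecond) }
	}
	peak := runLimited(limit, jobs)
	fmt.Println("peak within limit:", int(peak) <= limit)
}
```

A throttle STATUS could be reported whenever the semaphore is full, which distinguishes a temporarily busy node from one that has run out of resources.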

Networking - cn api - support macvtap bridge mode

Add support for macvtap bridge mode for CNCI VNICs. This will allow the setup of single-node test systems. Currently the CNCI VNICs are always set up in VEPA mode, which prevents the hosts (and other CNCIs) from being able to reach the CNCI.

docker.go file needs cleaning up

Ideally this file should just contain functions for the virtualizer, i.e., functions that are executed in the context of the instance go routine. The functions at the end of this file that are not executed in this context need to be moved elsewhere.

networking - hard reset - docker database needs to be cleaned up

On a hard reset the docker database needs to be cleaned up.
A workaround is to rm /var/lib/ciao/supernova/networking/docker_plugin.db after a hard reset.

The networking code should clean up the docker database as part of network reset.

Listing and dumping trace labels should not be a privileged operation

A user should be able to see her/his own labels.

Handle corrupt instances

We need some sort of strategy for handling corrupt instances. For example, launcher might detect on start up that the state it maintains about an instance has become corrupted, or that the docker container associated with that instance has been deleted.

Currently, if launcher cannot retrieve an instance's state it simply ignores the instance. This means that the instance is not reported to the upper layers, cannot be deleted, and is essentially leaked on the compute node.

Special handling is also required for instances associated with docker containers that get deleted by some out-of-band mechanism, i.e., by someone logging into the compute node and doing

sudo docker rm -f

Networking - cn api - limit concurrency

Add concurrency control in the CN API to limit the number of simultaneous calls into the netlink library. Making too many concurrent netlink calls to create/delete interfaces seems to add large latencies to the API.

Longer term the degree of concurrency needs to be tuneable based on system response.

Use environment variables to specify ciao-cli credentials

We could export and use CIAO_IDENTITY, CIAO_CONTROLLER, CIAO_USERNAME and CIAO_PASSWORD instead of always passing those as command line options.

ciao-launcher: do not start docker plugin on NN

On NN nodes docker may not be provisioned. Hence it may not make sense to start the docker plugin or interact with Docker on the NN. Alternatively, start the docker plugin only if docker is detected on the NN.

Support volatile instances

ciao-launcher currently only supports persistent images. It retains instance information about these instances even after they exit. Support for volatile instances needs to be added.

Implement frame tracing compute interface

We want to be able to launch traced frames from the cli and then retrieve the tracing data back.

Client connection/disconnection events

We want to have SSNTP events for client disconnection/connection events and among other things forward them to ciao-controller.

Should specify media type when creating iso images

ciao-launcher creates ISO images when launching QEMU VMs. The paths to these images are passed to the VMs so that their contents can be accessed when the VM boots. These images are used to store the cloud-init data and also some ciao-specific information needed by the cnci-agent. ciao-launcher does not specify a media type for these ISOs when it invokes qemu, and this results in a qemu warning.

Race detected when closing launcher

Here's the race

==================
WARNING: DATA RACE
Write by goroutine 10:
  runtime.closechan()
      /usr/local/go/src/runtime/chan.go:292 +0x0
  main.connectToServer()
      /home/markus/go/src/github.com/01org/ciao/ciao-launcher/main.go:375 +0x507

Previous read by goroutine 16:
  runtime.chansend()
      /usr/local/go/src/runtime/chan.go:115 +0x0
  main.(*instanceData).instanceLoop()
      /home/markus/go/src/github.com/01org/ciao/ciao-launcher/instance.go:243 +0x3fac

Goroutine 10 (running) created at:
  main.main()
      /home/markus/go/src/github.com/01org/ciao/ciao-launcher/main.go:599 +0xb8e

Goroutine 16 (running) created at:
  main.startInstance()
      /home/markus/go/src/github.com/01org/ciao/ciao-launcher/instance.go:283 +0x474
  main.startOverseer.func1()
      /home/markus/go/src/github.com/01org/ciao/ciao-launcher/overseer.go:680 +0x6f1
  path/filepath.walk()
      /usr/local/go/src/path/filepath/path.go:349 +0xa3
  path/filepath.walk()
      /usr/local/go/src/path/filepath/path.go:374 +0x619
  path/filepath.Walk()
      /usr/local/go/src/path/filepath/path.go:396 +0x106
  main.startOverseer()
      /home/markus/go/src/github.com/01org/ciao/ciao-launcher/overseer.go:696 +0x2b7
  main.connectToServer()
      /home/markus/go/src/github.com/01org/ciao/ciao-launcher/main.go:328 +0x3b3
==================

The problem is that the overseer channel is closed and written to from different go routines. Need to find a better way to close down the overseer.
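One common way around this kind of race is to leave the data channel open and signal shutdown through a separate channel that is closed exactly once. The sketch below illustrates the pattern only; `send`, `ch` and `done` are hypothetical names and do not reflect how the overseer is actually structured.

```go
package main

import (
	"fmt"
	"sync"
)

// send delivers msg on ch unless done has been closed first. It reports
// whether the message was delivered. ch itself is never closed, so no
// goroutine can ever write to a closed channel.
func send(ch chan<- string, done <-chan struct{}, msg string) bool {
	select {
	case ch <- msg:
		return true
	case <-done:
		return false
	}
}

func main() {
	ch := make(chan string)     // stands in for the overseer channel
	done := make(chan struct{}) // closed once to signal shutdown
	var wg sync.WaitGroup

	// Several senders, standing in for instance go routines.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for send(ch, done, fmt.Sprintf("instance %d", id)) {
			}
		}(i)
	}

	fmt.Println("received:", <-ch)
	close(done) // shutting down: senders stop instead of racing with a close
	wg.Wait()
	fmt.Println("all senders stopped")
}
```

Only the side that owns shutdown closes `done`, and only senders write to `ch`, so the close and the writes never touch the same channel.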

networking - hard reset - ciao links without alias not cleaned up

Hard reset fails to clean up links that were created by ciao but on which, due to a failure, the ciao alias was never set.

ciao-cli should allow --list-events per tenant

A tenant should be able to view their own error log.

address "race: limit on 8192 simultaneously alive goroutines is exceeded, dying" when launching 10K containers

Build controller with --race
Attempt to launch 10K instances
Observe the controller die with the following error message:

race: limit on 8192 simultaneously alive goroutines is exceeded, dying

Determine if this is a --race limitation or something real to be concerned about.

hard-reset should delete all docker ciao networks

Currently when we perform a hard-reset we only delete the docker networks associated with the containers we have just deleted. In general this is fine, but docker networks might get leaked if there is some sort of launcher crash or node shutdown while hard-reset is executing. These networks would then not get cleaned up. This makes it hard to ensure that we are always testing on a clean environment.

Instead we should enumerate all the networks and delete the networks created by the ciao plugin.
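The selection step could be as simple as filtering the enumerated networks by driver name. In this sketch the `network` struct is a hypothetical reduction of what docker reports for a network, and the driver name "ciao" is an assumption; the actual enumeration and deletion calls are not shown.

```go
package main

import "fmt"

// network holds the two fields of a docker network that matter here.
// Hypothetical type; a real client returns a richer structure.
type network struct {
	ID     string
	Driver string
}

// ciaoNetworks returns the IDs of all networks whose driver matches
// driverName, i.e. the ones hard-reset should delete.
func ciaoNetworks(all []network, driverName string) []string {
	var ids []string
	for _, n := range all {
		if n.Driver == driverName {
			ids = append(ids, n.ID)
		}
	}
	return ids
}

func main() {
	all := []network{
		{ID: "net-a", Driver: "bridge"},
		{ID: "net-b", Driver: "ciao"}, // assumed plugin driver name
		{ID: "net-c", Driver: "ciao"},
	}
	for _, id := range ciaoNetworks(all, "ciao") {
		fmt.Println("would delete:", id)
	}
}
```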

cannot delete all docker instances through the cli

Launched around 50 docker instances and then tried to delete them. One of them failed to be deleted. Listing the instances shows it is still there, but there is no docker instance running on the compute nodes.

root@sn-controller ~/bin # for i in $(./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -tenant 92834d7fd3494ef994e421fdb3eff3a9 -list-instances | grep -A1 'Instance' | awk '/UUID/ {print $2}'); do ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -tenant 92834d7fd3494ef994e421fdb3eff3a9 -delete-instance -instance $i; done
Deleted instance: 93b0f44b-253b-492d-92ca-47cea6bdf1d6
Deleted instance: b9957b8c-0489-41fe-99cf-e2da361f3f6f
Deleted instance: 51c8ab26-f8e7-48ae-a5e8-ba210aa391d1
....
Deleted instance: 10e096de-28f7-4f7b-a1e4-ac550818dff6
Deleted instance: d2727311-66db-4f35-8025-9a3d11b6d018

failure

F0415 21:12:32.270183 1798 ciao-cli.go:109] ciao-cli FATAL: HTTP Error [500] for [DELETE https://localhost:8774/v2.1/92834d7fd3494ef994e421fdb3eff3a9/servers/6eef8f8d-a6a8-48d0-9e93-2701c2a49ac4]: Instance not available
goroutine 1 [running]:
github.com/golang/glog.stacks(0xa64a00, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:769 +0xb8
github.com/golang/glog.(*loggingT).output(0xa487c0, 0xc800000003, 0xc82000a0c0, 0xa2a264, 0xb, 0x6d, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:720 +0x2ce
github.com/golang/glog.(*loggingT).printf(0xa487c0, 0x3, 0xc820112d10, 0xb0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:655 +0x1d4
github.com/golang/glog.Fatalf(0xc820112d10, 0xb0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:1148 +0x5d
main.fatalf(0xc820104b40, 0xa0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:109 +0x89
main.deleteTenantInstance(0x7ffffff4372f, 0x20, 0x7ffffff4376b, 0x24)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:625 +0x27b
main.main()
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:907 +0x7c9
Deleted instance: 715c7858-0f4b-451c-8cff-d903b16b4f8d
Deleted instance: c3ff8985-e9c5-4912-9331-ceda5740404b
Deleted instance: 74c84ae2-c81c-4e6a-a592-263b7fc9f1ad

Instance still shown

root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -tenant 92834d7fd3494ef994e421fdb3eff3a9 -list-instances
Instance #1
UUID: 4d4ff4f3-c3fc-41b0-950f-173362273f37
Status: running
Private IP: 172.16.0.15
MAC Address: 02:00:ac:10:00:0f
CN UUID: 3534b9eb-b2ad-47b5-9248-4f329aa59f41
Image UUID: fa7d86d8-fa46-11e5-8493-38607786d9ec
Tenant UUID: 92834d7fd3494ef994e421fdb3eff3a9
SSH IP: 192.168.10.169
SSH Port: 33015

tried to delete again

root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -tenant 92834d7fd3494ef994e421fdb3eff3a9 -delete-instance -instance 4d4ff4f3-c3fc-41b0-950f-173362273f37
F0415 21:14:24.212779 1880 ciao-cli.go:109] ciao-cli FATAL: HTTP Error [500] for [DELETE https://localhost:8774/v2.1/92834d7fd3494ef994e421fdb3eff3a9/servers/4d4ff4f3-c3fc-41b0-950f-173362273f37]: Instance not available
goroutine 1 [running]:
github.com/golang/glog.stacks(0xa64a00, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:769 +0xb8
github.com/golang/glog.(*loggingT).output(0xa487c0, 0xc800000003, 0xc820224180, 0xa2a264, 0xb, 0x6d, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:720 +0x2ce
github.com/golang/glog.(*loggingT).printf(0xa487c0, 0x3, 0xc8202086e0, 0xb0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:655 +0x1d4
github.com/golang/glog.Fatalf(0xc8202086e0, 0xb0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:1148 +0x5d
main.fatalf(0xc8202021e0, 0xa0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:109 +0x89
main.deleteTenantInstance(0x7ffe45f5b72f, 0x20, 0x7ffe45f5b76b, 0x24)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:625 +0x27b
main.main()
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:907 +0x7c9

on compute nodes, there is no docker instance running:

root@sn-compute01 /var/lib/ciao/logs/launcher # docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

root@sn-compute02 /var/lib/ciao/logs/launcher # docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
root@sn-compute02 /var/lib/ciao/logs/launcher #

SSNTP fuzzing

Use go-fuzz to fuzz the SSNTP APIs.

Networking: Implement test case to evaluate routing performance

Implement a test that generates traffic routed between two subnets belonging to a tenant and measures the resulting network bandwidth. In this case the traffic will be routed between CNCI bridges.

networking - cn api - support container MTU configuration

The container MTU is currently set to 1400. The MTU should either be configured by the controller or be auto-discovered.

Networking - Docker Plugin - Data race on http server shutdown

The graceful http server has a data race on launcher invoked shutdown.

Implement custom 'delete all instances' endpoint

Add an endpoint for deleting all running instances.

Add launch time benchmark to ciao-cli

Our developers need a way to very quickly and easily run a simple launch time benchmark on a cluster. Add functionality to ciao-cli to launch vms and docker instances and report the total number of seconds elapsed using the existing tracing functionality. Add options to specify the total number of instances to launch or the instances per node to launch.

Implement an event log retrieval endpoint

That debug endpoint would allow us to fetch all controller received events.

Networking - Add documentation for outbound NAT set

Add documentation on how to set up the cluster so that traffic outbound from the CNCI is properly NATed to establish external connectivity for the workloads.

FULL command should not contain a payload

ciao-launcher currently sends a payload.Ready payload when sending the FULL command. It shouldn't do this. In the SSNTP specification FULL does not have a payload. In addition, ciao-scheduler doesn't even look at it, so it's a waste.

Failed to create docker instance: cn.CreateVnic failed Timeout waiting for device ready

E0405 17:45:05.075607   11595 network.go:252] cn.CreateVnic failed br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.33.0/24_db9ea216-aa11-46ad-a715-1616d5a74984_192.168.0.108Timeout waiting for device ready
E0405 17:45:05.075977   11595 instance.go:111] Unable to start instance[network_failure]: br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.33.0/24_db9ea216-aa11-46ad-a715-1616d5a74984_192.168.0.108Timeout waiting for device ready
E0405 17:45:05.171449   11595 network.go:252] cn.CreateVnic failed br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.33.0/24_db9ea216-aa11-46ad-a715-1616d5a74984_192.168.0.108Timeout waiting for device ready
E0405 17:45:05.171485   11595 instance.go:111] Unable to start instance[network_failure]: br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.33.0/24_db9ea216-aa11-46ad-a715-1616d5a74984_192.168.0.108Timeout waiting for device ready

Also, we've seen a lot of slowness today when launching containers. I wonder if this could be somehow connected.

These failures are happening on the first attempt to create a subnet for a docker instance on a compute node. I guess launcher will block for 60 seconds and, due to the lock I added the other day, all start commands for the same subnet will block on that launcher.

I suspect once we've fixed this problem our docker related performance problems will go away as well.

Feel free to assign back to me if it's a launcher problem. I'll try to reproduce Monday anyway.

Cleanup launcher source

The launcher source needs to be cleaned up a little

There are some functions which are too long, particularly in instance.go
The payloads.go file needs to go and to be split up over the other files that implement commands.
It would be nice to rename some of the files that implement commands, e.g., start to indicate that these commands run in the instance go routine context.

Query qmp for spice and netcat ports

The --with-ui option enables netcat and spice support for qemu. A port is reserved for each instance that is launched, which spice and netcat can use to communicate with the underlying VM. Internally, launcher maintains a list of free ports so it knows which ports to use when a new START request comes in. The problem is that when you stop and restart launcher, launcher loses track of which ports are assigned to running instances. It is possible for launcher to query the running VMs to find out which ports are currently in use, but it does not yet do this. The end result is that VM launch time can increase while launcher searches for a free port.

Now this is a debug option, for the time being at least, so there's no urgency to fix this.

Restrict container resource usage

The START payload that creates docker containers contains a list of resource restrictions for that container. ciao-launcher currently ignores these restrictions. This needs to change.
