ciao-project / ciao
Ciao - Cloud Integrated Advanced Orchestrator
License: Apache License 2.0
Our developers need a quick and easy way to measure networking performance between tenant instances (both VMs and containers). Extend the SSNTP tracing support to measure network performance, and add some benchmarks to the cli to measure how long it takes to create the network interface, as well as the throughput between instances.
When a user has more than one tenant and tries to list those tenants, ciao-cli asks them to specify a tenant. Since the tenant list is exactly the information being requested, the following error should not occur:
root@sn-controller ~ # ./ciao-cli -list-tenants
Available projects for csr:
Project[1]: ciao1 (4e3290a89b764b979d508b22cef4048d)
Project[2]: service (92834d7fd3494ef994e421fdb3eff3a9)
F0421 13:11:04.421239 16964 main.go:67] ciao-cli FATAL: Please specify a project to use with -tenant-name or -tenant-id
goroutine 1 [running]:
github.com/golang/glog.stacks(0xa6ca00, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:769 +0xb8
github.com/golang/glog.(_loggingT).output(0xa507e0, 0xc800000003, 0xc8201e3080, 0xa31c88, 0x7, 0x43, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:720 +0x2ce
github.com/golang/glog.(_loggingT).printf(0xa507e0, 0x3, 0xc820016280, 0x4f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:655 +0x1d4
github.com/golang/glog.Fatalf(0xc820016280, 0x4f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:1148 +0x5d
main.fatalf(0x8d9a80, 0x3f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/main.go:67 +0x89
main.main()
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/main.go:793 +0x5cf
ciao-launcher returns the PSS of the process associated with a docker instance in the stats command. It's not really clear what happens if the container has multiple processes. I'm guessing that launcher does not return the correct memory usage for such containers. This needs to be checked and fixed if it is indeed a problem.
root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -list-tenants
root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357
-list-quotas -tenant 92834d7fd3494ef994e421fdb3eff3a9
Quotas for tenant 92834d7fd3494ef994e421fdb3eff3a9:
Instances: 0 | Unlimited
CPUs: 0 | Unlimited
Memory: 0 | Unlimited
Disk: 0 | Unlimited
root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -list-tenants
Tenant 1
UUID: 92834d7fd3494ef994e421fdb3eff3a9
Name:
If you start a docker container on a compute node using a base image that has not previously been used by any container on that node, ciao-launcher needs to download the image. This can greatly increase the start time of the instance. We should therefore add a new state to indicate to users why the instance start is taking so long. 'downloading' should do the trick.
Instead of only talking to the ciao-controller db.
We need to add functionality to migrate the database schema to newer versions so that we don't all have to delete our databases every time the schema changes.
It would be handy to be able to see if Start Failures were tied to a particular node.
We do not check the SSNTP version to provide backward compatibility.
We need to, for example, keep track of the commands supported by a given version.
I0414 15:19:53.285424 10726 network.go:282] CN VNIC created = svn_ae86911 &{2 sbr_3f2cf6ba {172.16.0.0 ffffff00} 172.16.0.1 br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.0.0/24_902d35e7-d635-4000-91ba-dc76ef8ef389_192.168.0.118} &{2 192.168.0.118 192.168.0.83 172.16.0.0/24 8fa6d6ec08c444ed951b2c2018a4c6ad 172.16.0.0/24 902d35e7-d635-4000-91ba-dc76ef8ef389 4268 }
I0414 15:19:53.762754 10726 network.go:282] CN VNIC created = svn_ae86911 &{0 sbr_3f2cf6ba {172.16.0.0 ffffff00} 172.16.0.1 br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.0.0/24_902d35e7-d635-4000-91ba-dc76ef8ef389_192.168.0.118} <nil>
E0414 15:19:54.201167 10726 docker.go:257] Unable to start container Error response from daemon: Cannot start container 0665d3cb511f9405f291dd4b43a1865e9bd69359bf5a280867d2fd9af2840aab: [9] System error: failed to add interface svp_ae86911 to sandbox: failed in prefunc: failed to get link by name "svp_ae86911": Link not found
E0414 15:19:54.201192 10726 instance.go:134] Unable to start instance[launch_failure]: Error response from daemon: Cannot start container 0665d3cb511f9405f291dd4b43a1865e9bd69359bf5a280867d2fd9af2840aab: [9] System error: failed to add interface svp_ae86911 to sandbox: failed in prefunc: failed to get link by name "svp_ae86911": Link not found
Add parallel and concurrency testing to the CN API. Currently there is a bug in the parallel testing and the tests are hosted in a separate directory: https://github.com/01org/ciao/tree/master/networking/libsnnet/tests/parallel.
These tests should be incorporated into the main unit tests once bug #22 has been resolved.
Extend it to support the CONFIGURE payload and dispatch cluster configuration information to the connected client.
Add support in the CNCI Agent to store state that can be recovered if the CNCI VM hosting the CNCI Agent reboots. Currently the CNCI Agent can recover state if the Agent itself crashes, but it cannot recover its state across a VM reboot without persisting it in a database.
Ideally the database should be stored outside of the context of the VM, in a network-attached storage volume or a KV store.
We should use the CA certificate FQDN as either the main server URI or an alternative one if no URI is passed through the SSNTP client config.
The reason is that when we call these functions we simply pass in a background context. The result is that the instance go routines that call these functions block until they return. This is only really a problem when shutting down launcher. A blocking call to a docker API may prevent ciao-launcher from shutting down properly when it receives a SIGTERM.
Below are logs from the cli and the network node:
root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -tenant 92834d7fd3494ef994e421fdb3eff3a9 -launch-instances -workload ca957444-fa46-11e5-94f9-38607786d9ec
Created new instance: 2369578a-8f12-4bf0-af1e-a129295427dc
root@sn-controller ~/bin # ./ciao-cli -password supernova -username admin -identity https://sn-keystone.zpn.intel.com:35357 -tenant d750a83434e941639ba0c3a37fe18efa -scope admin -launch-instances -workload e35ed972-c46c-4aad-a1e7-ef103ae079a2
F0418 14:18:53.028643 3137 ciao-cli.go:109] ciao-cli FATAL: HTTP Error [500] for [POST https://localhost:8774/v2.1/d750a83434e941639ba0c3a37fe18efa/servers]: Unable to Launch Tenant CNCI
goroutine 1 [running]:
github.com/golang/glog.stacks(0xa60a00, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:769 +0xb8
github.com/golang/glog.(_loggingT).output(0xa447a0, 0xc800000003, 0xc8201c8fc0, 0xa26054, 0xb, 0x6d, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:720 +0x2ce
github.com/golang/glog.(_loggingT).printf(0xa447a0, 0x3, 0xc8200f45a0, 0x8f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:655 +0x1d4
github.com/golang/glog.Fatalf(0xc8200f45a0, 0x8f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:1148 +0x5d
main.fatalf(0xc8200e4680, 0x7f, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:109 +0x89
main.createTenantInstance(0x7ffd87812721, 0x20, 0x7ffd8781276b, 0x24, 0x1)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:598 +0x473
main.main()
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:807 +0x6b7
Log file created at: 2016/04/18 14:01:51
Running on machine: sn-network
Binary: Built with gc go1.6 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0418 14:01:51.307449 588 network.go:287] cn.CreateCnciVnic failed cncivnic error: enable link up device or resource busy
E0418 14:01:51.307578 588 instance.go:134] Unable to start instance[network_failure]: cncivnic error: enable link up device or resource busy
E0418 14:13:51.977771 588 network.go:287] cn.CreateCnciVnic failed cncivnic error: enable link up device or resource busy
E0418 14:13:51.977792 588 instance.go:134] Unable to start instance[network_failure]: cncivnic error: enable link up device or resource busy
E0418 14:18:52.817650 588 network.go:287] cn.CreateCnciVnic failed cncivnic error: enable link up device or resource busy
E0418 14:18:52.817667 588 instance.go:134] Unable to start instance[network_failure]: cncivnic error: enable link up device or resource busy
root@sn-network ~ # ip -d link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 addrgenmode eui64
2: tunl0@NONE: mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0 promiscuity 0
ipip remote any local any ttl inherit nopmtudisc addrgenmode eui64
3: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
link/ether 34:13:e8:36:b0:a0 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64
4: enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether b8:ae:ed:77:02:86 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:52:d2:61:8e brd ff:ff:ff:ff:ff:ff promiscuity 0
bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q addrgenmode eui64
6: svc_6c531194@enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 500
link/ether 02:37:a6:4c:58:33 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_92834d7fd3494ef994e421fdb3eff3a9_044939c4-feff-447b-ac05-6e3ec8bacc2c
7: svc_bd5f8286@enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 500
link/ether 02:13:95:c5:51:f8 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_4e3290a89b764b979d508b22cef4048d_d1b20e75-a07f-471b-8315-d04095137235
8: svc_b4206548@enp0s25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
link/ether 02:13:95:c5:51:f8 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_4e3290a89b764b979d508b22cef4048d_7f1c862d-ba25-47f3-9f85-a86ea73d531a
9: svc_2c11fbc3@enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 500
link/ether 02:3d:80:f6:6f:40 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_d750a83434e941639ba0c3a37fe18efa_b479959c-99ee-4d59-9331-8b76e625225c
10: svc_44124e30@enp0s25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
link/ether 02:3d:80:f6:6f:40 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_d750a83434e941639ba0c3a37fe18efa_18e4cdca-4d6b-4078-87e7-f8ce647cc961
11: svc_9e0bcea4@enp0s25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
link/ether 02:3d:80:f6:6f:40 brd ff:ff:ff:ff:ff:ff promiscuity 0
macvtap mode vepa addrgenmode eui64
alias cncivnic_d750a83434e941639ba0c3a37fe18efa_aded9538-e4f4-49e5-97d7-9b9b014743c9
Its single existing test case only covers 6.2% of the code. We need to get this up to at least 70%.
To reproduce:
launch some docker instances.
delete them.
launch some more docker instances.
observe the breakage.
Currently, there is no limit on the number of instances that ciao-launcher will start in parallel. This places extreme load on the compute node when spawning large numbers of instances at once and can lead to various errors. It may be better for launcher to introduce some sort of semaphore to limit the number of instances that can be launched in parallel to some function of the number of cores on the machine. We could also return a special STATUS, e.g., throttle, to indicate that launcher is overloaded but not full.
When launching large numbers of instances, e.g., 10000, we often see some failures and timeouts in qemu and networking. Reducing the load on the compute node may prevent these failures.
Add support for macvtap bridge mode for CNCI VNICs. This will allow the setup of single-node test systems. Currently the CNCI VNICs are always set up in VEPA mode, which prevents the hosts (and other CNCIs) from being able to reach the CNCI.
Ideally this file should just contain functions for the virtualizer, i.e., functions that are executed in the context of the instance go routine. The functions at the end of this file that are not executed in this context need to be moved elsewhere.
On a hard reset the docker database needs to be cleaned up.
A workaround is to rm /var/lib/ciao/supernova/networking/docker_plugin.db after hard-reset.
The networking code should clean up the docker database as part of network reset.
A user should be able to see her/his own labels.
We need some sort of strategy for handling corrupt instances. For example, launcher might detect on start up that the state it maintains about an instance has become corrupted, or that the docker container associated with that instance has been deleted.
Currently, if launcher cannot retrieve an instance's state it simply ignores it. This means the instance is not reported to the upper layers, so it cannot be deleted and is essentially leaked on the compute node.
Special handling is also required for instances whose docker containers get deleted by some out-of-band mechanism, i.e., by someone logging into the compute node and running
sudo docker rm -f
Add concurrency control in the CN API to limit the number of simultaneous calls into the netlink library. Making too many concurrent netlink calls to create/delete interfaces seems to add large latencies to the API.
Longer term the degree of concurrency needs to be tuneable based on system response.
We could export and use CIAO_IDENTITY, CIAO_CONTROLLER, CIAO_USERNAME and CIAO_PASSWORD instead of always passing those as command line options.
On NN nodes docker may not be provisioned, so it may not make sense to start the docker plugin or interact with docker on the NN (or the docker plugin should be started only if docker is detected on the NN).
ciao-launcher currently only supports persistent instances. It retains information about these instances even after they exit. Support for volatile instances needs to be added.
We want to be able to launch traced frames from the cli and then retrieve the tracing data back.
We want SSNTP events for client connection and disconnection and, among other things, to forward them to ciao-controller.
ciao-launcher creates ISO images when launching QEMU VMs. The paths to these images are passed to the VMs so their contents can be accessed when the VM boots. These images are used to store the cloud-init data and also some ciao-specific information needed by the cnci-agent. ciao-launcher does not specify a media type for these ISOs when it invokes qemu, and this results in a qemu warning.
Here's the race:
==================
WARNING: DATA RACE
Write by goroutine 10:
runtime.closechan()
/usr/local/go/src/runtime/chan.go:292 +0x0
main.connectToServer()
/home/markus/go/src/github.com/01org/ciao/ciao-launcher/main.go:375 +0x507
Previous read by goroutine 16:
runtime.chansend()
/usr/local/go/src/runtime/chan.go:115 +0x0
main.(*instanceData).instanceLoop()
/home/markus/go/src/github.com/01org/ciao/ciao-launcher/instance.go:243 +0x3fac
Goroutine 10 (running) created at:
main.main()
/home/markus/go/src/github.com/01org/ciao/ciao-launcher/main.go:599 +0xb8e
Goroutine 16 (running) created at:
main.startInstance()
/home/markus/go/src/github.com/01org/ciao/ciao-launcher/instance.go:283 +0x474
main.startOverseer.func1()
/home/markus/go/src/github.com/01org/ciao/ciao-launcher/overseer.go:680 +0x6f1
path/filepath.walk()
/usr/local/go/src/path/filepath/path.go:349 +0xa3
path/filepath.walk()
/usr/local/go/src/path/filepath/path.go:374 +0x619
path/filepath.Walk()
/usr/local/go/src/path/filepath/path.go:396 +0x106
main.startOverseer()
/home/markus/go/src/github.com/01org/ciao/ciao-launcher/overseer.go:696 +0x2b7
main.connectToServer()
/home/markus/go/src/github.com/01org/ciao/ciao-launcher/main.go:328 +0x3b3
==================
The problem is that the overseer channel is closed and written to from different go routines. We need to find a better way to close down the overseer.
Hard reset fails to clean up links that were created by ciao but, due to a failure, never had the ciao alias set on them.
A tenant should be able to view their own error log.
race: limit on 8192 simultaneously alive goroutines is exceeded, dying
Determine if this is a --race limitation or something real to be concerned about.
Currently when we perform a hard-reset we only delete the docker networks associated with the containers we have just deleted. In general this is fine, but it might be possible for docker networks to get leaked if there is some sort of launcher crash or node shutdown while executing hard-reset. These networks would then not get cleaned up, which makes it hard to ensure that we are always testing in a clean environment.
Instead we should enumerate all the networks and delete the ones created by the ciao plugin.
Launched around 50 docker instances and then tried to delete them. One of them failed to be deleted; listing the instances confirmed it still appears, but on the compute nodes there is no docker instance running.
root@sn-controller ~/bin # for i in $(./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -tenant 92834d7fd3494ef994e421fdb3eff3a9 -list-instances | grep -A1 'Instance' | awk '/UUID/ {print $2}'); do ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -tenant 92834d7fd3494ef994e421fdb3eff3a9 -delete-instance -instance $i; done
Deleted instance: 93b0f44b-253b-492d-92ca-47cea6bdf1d6
Deleted instance: b9957b8c-0489-41fe-99cf-e2da361f3f6f
Deleted instance: 51c8ab26-f8e7-48ae-a5e8-ba210aa391d1
....
Deleted instance: 10e096de-28f7-4f7b-a1e4-ac550818dff6
Deleted instance: d2727311-66db-4f35-8025-9a3d11b6d018
F0415 21:12:32.270183 1798 ciao-cli.go:109] ciao-cli FATAL: HTTP Error [500] for [DELETE https://localhost:8774/v2.1/92834d7fd3494ef994e421fdb3eff3a9/servers/6eef8f8d-a6a8-48d0-9e93-2701c2a49ac4]: Instance not available
goroutine 1 [running]:
github.com/golang/glog.stacks(0xa64a00, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:769 +0xb8
github.com/golang/glog.(_loggingT).output(0xa487c0, 0xc800000003, 0xc82000a0c0, 0xa2a264, 0xb, 0x6d, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:720 +0x2ce
github.com/golang/glog.(_loggingT).printf(0xa487c0, 0x3, 0xc820112d10, 0xb0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:655 +0x1d4
github.com/golang/glog.Fatalf(0xc820112d10, 0xb0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:1148 +0x5d
main.fatalf(0xc820104b40, 0xa0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:109 +0x89
main.deleteTenantInstance(0x7ffffff4372f, 0x20, 0x7ffffff4376b, 0x24)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:625 +0x27b
main.main()
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:907 +0x7c9
Deleted instance: 715c7858-0f4b-451c-8cff-d903b16b4f8d
Deleted instance: c3ff8985-e9c5-4912-9331-ceda5740404b
Deleted instance: 74c84ae2-c81c-4e6a-a592-263b7fc9f1ad
root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -tenant 92834d7fd3494ef994e421fdb3eff3a9 -list-instances
Instance #1
UUID: 4d4ff4f3-c3fc-41b0-950f-173362273f37
Status: running
Private IP: 172.16.0.15
MAC Address: 02:00:ac:10:00:0f
CN UUID: 3534b9eb-b2ad-47b5-9248-4f329aa59f41
Image UUID: fa7d86d8-fa46-11e5-8493-38607786d9ec
Tenant UUID: 92834d7fd3494ef994e421fdb3eff3a9
SSH IP: 192.168.10.169
SSH Port: 33015
root@sn-controller ~/bin # ./ciao-cli -password hello -username csr -identity https://sn-keystone.zpn.intel.com:35357 -tenant 92834d7fd3494ef994e421fdb3eff3a9 -delete-instance -instance 4d4ff4f3-c3fc-41b0-950f-173362273f37
F0415 21:14:24.212779 1880 ciao-cli.go:109] ciao-cli FATAL: HTTP Error [500] for [DELETE https://localhost:8774/v2.1/92834d7fd3494ef994e421fdb3eff3a9/servers/4d4ff4f3-c3fc-41b0-950f-173362273f37]: Instance not available
goroutine 1 [running]:
github.com/golang/glog.stacks(0xa64a00, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:769 +0xb8
github.com/golang/glog.(_loggingT).output(0xa487c0, 0xc800000003, 0xc820224180, 0xa2a264, 0xb, 0x6d, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:720 +0x2ce
github.com/golang/glog.(_loggingT).printf(0xa487c0, 0x3, 0xc8202086e0, 0xb0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:655 +0x1d4
github.com/golang/glog.Fatalf(0xc8202086e0, 0xb0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/golang/glog/glog.go:1148 +0x5d
main.fatalf(0xc8202021e0, 0xa0, 0x0, 0x0, 0x0)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:109 +0x89
main.deleteTenantInstance(0x7ffe45f5b72f, 0x20, 0x7ffe45f5b76b, 0x24)
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:625 +0x27b
main.main()
/home/fuentess/go/src/github.com/01org/ciao/ciao-cli/ciao-cli.go:907 +0x7c9
root@sn-compute01 /var/lib/ciao/logs/launcher # docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
root@sn-compute02 /var/lib/ciao/logs/launcher # docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
root@sn-compute02 /var/lib/ciao/logs/launcher #
Use go-fuzz to fuzz the SSNTP APIs.
Implement a test to generate traffic and measure network bandwidth, with the traffic routed between two subnets belonging to a tenant. In this case the traffic will be routed between CNCI bridges.
The container MTU is currently set to 1400. The MTU should either be configured by the controller or auto-discovered.
The graceful http server has a data race on launcher-invoked shutdown.
Add an endpoint for deleting all running instances.
Our developers need a way to very quickly and easily run a simple launch time benchmark on a cluster. Add functionality to ciao-cli to launch VMs and docker instances and report the total number of seconds elapsed, using the existing tracing functionality. Add options to specify either the total number of instances to launch or the number of instances per node to launch.
That debug endpoint would allow us to fetch all controller received events.
Add documentation on how to set up the cluster such that traffic outbound from the CNCI is properly NATed to establish external connectivity for the workloads.
ciao-launcher currently sends a payload.Ready payload when sending the FULL command. It shouldn't do this: in the SSNTP specification FULL does not have a payload. In addition, ciao-scheduler doesn't even look at it, so it's wasted effort.
E0405 17:45:05.075607 11595 network.go:252] cn.CreateVnic failed br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.33.0/24_db9ea216-aa11-46ad-a715-1616d5a74984_192.168.0.108Timeout waiting for device ready
E0405 17:45:05.075977 11595 instance.go:111] Unable to start instance[network_failure]: br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.33.0/24_db9ea216-aa11-46ad-a715-1616d5a74984_192.168.0.108Timeout waiting for device ready
E0405 17:45:05.171449 11595 network.go:252] cn.CreateVnic failed br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.33.0/24_db9ea216-aa11-46ad-a715-1616d5a74984_192.168.0.108Timeout waiting for device ready
E0405 17:45:05.171485 11595 instance.go:111] Unable to start instance[network_failure]: br_8fa6d6ec08c444ed951b2c2018a4c6ad_172.16.33.0/24_db9ea216-aa11-46ad-a715-1616d5a74984_192.168.0.108Timeout waiting for device ready
Also, we've seen a lot of slowness today when launching containers. I wonder if this could be somehow connected.
These failures are happening on the first attempt to create a subnet for a docker instance on a compute node. I guess launcher will block for 60 seconds, and due to the lock I added the other day, all start commands for the same subnet will block on that launcher.
I suspect once we've fixed this problem our docker related performance problems will go away as well.
Feel free to assign back to me if it's a launcher problem. I'll try to reproduce Monday anyway.
The launcher source needs to be cleaned up a little.
The --with-ui option enables netcat and spice support for qemu. A port is reserved for each instance that is launched, which spice and netcat can use to communicate with the underlying VM. Internally, launcher maintains a list of free ports so it knows which ports to use when a new START request comes in. The problem is that when you stop and restart launcher, it loses track of which ports are assigned to running instances. It is possible for launcher to query the running VMs to find out which ports are currently in use, but it does not yet do this. The end result is that VM launch time can increase while launcher searches for a free port.
For now this is a debug option, so there's no urgency to fix this.
The START payload that creates docker containers contains a list of resource restrictions for that container. ciao-launcher currently ignores these restrictions. This needs to change.