lago-project / lago-ost-plugin

Lago ovirt-system-tests plugin

License: GNU General Public License v2.0

Makefile 2.09% Shell 20.16% Python 77.75%
lago ovirt python testing virtualization

lago-ost-plugin's People

Contributors

david-caro, didib, dimakuz, eedri, galitf, gbenhaim, lago-bot, leongold, machacekondra, mpolednik, mz-pdm, nirs, nvgoldin, ovirt-infra, pilou-, sandrobonazzola

lago-ost-plugin's Issues

Explain why "lago ovirt stop" fails

Below is the stack trace that is currently printed.
We shouldn't show it to the user. Instead, we need to explain why the operation failed
and what needs to be done in order to run it again successfully.

@ Stopping oVirt environment: ERROR (in 0:05:25)
Error occured, aborting
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 362, in do_run
self.cli_plugins[args.ovirtverb].do_run(args)
File "/usr/lib/python2.7/site-packages/lago/plugins/cli.py", line 184, in do_run
self._do_run(**vars(args))
File "/usr/lib/python2.7/site-packages/lago/utils.py", line 501, in wrapper
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/lago/utils.py", line 512, in wrapper
return func(*args, prefix=prefix, **kwargs)
File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 294, in do_ovirt_stop
prefix.virt_env.engine_vm().stop_all_hosts()
File "/usr/lib/python2.7/site-packages/ovirtlago/utils.py", line 148, in wrapped_func
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/ovirtlago/virt.py", line 446, in stop_all_hosts
testlib.assert_true_within(_host_is_maint, timeout=timeout)
File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 263, in assert_true_within
assert_equals_within(func, True, timeout, allowed_exceptions)
File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 237, in assert_equals_within
'%s != %s after %s seconds' % (res, value, timeout)
AssertionError: None != True after 300 seconds
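
One way to hide the trace, sketched under assumptions: wrap the failing step and turn the `AssertionError` into a short message plus a hint. The helper name `run_with_explanation` and the wording of the messages are hypothetical and are not part of ovirtlago.

```python
import sys


def run_with_explanation(action, explanation, hint, out=sys.stderr):
    """Run action(); on AssertionError print a short message, not a trace.

    Returns a process-style exit code: 0 on success, 1 on failure.
    Hypothetical helper for illustration only.
    """
    try:
        action()
    except AssertionError as exc:
        # Keep the original assertion text as a detail, but lead with
        # a sentence the user can act on.
        out.write('ERROR: %s (%s)\n' % (explanation, exc))
        out.write('Hint: %s\n' % hint)
        return 1
    return 0
```

A caller could pass something like "hosts did not reach maintenance mode within the timeout" as the explanation and a suggestion to retry as the hint; the full traceback could still go to the log file.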

'lago ovirt stop' fails due to running asynchronous tasks

12:16:15 + cd /dev/shm/ost/deployment-basic-suite-master
12:16:15 + lago ovirt stop
12:16:15 @ Stopping oVirt environment:
12:16:15 # Stopping Engine VMs:
12:16:20 # Stopping Engine VMs: Success (in 0:00:05)
12:16:20 # Putting hosts in maintenance mode:
12:16:21 # Putting hosts in maintenance mode: ERROR (in 0:00:00)
12:16:21 @ Stopping oVirt environment: ERROR (in 0:00:06)
12:16:21 Error occured, aborting
12:16:21 Traceback (most recent call last):
12:16:21 File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 360, in do_run
12:16:21 self.cli_plugins[args.ovirtverb].do_run(args)
12:16:21 File "/usr/lib/python2.7/site-packages/lago/plugins/cli.py", line 184, in do_run
12:16:21 self._do_run(**vars(args))
12:16:21 File "/usr/lib/python2.7/site-packages/lago/utils.py", line 501, in wrapper
12:16:21 return func(*args, **kwargs)
12:16:21 File "/usr/lib/python2.7/site-packages/lago/utils.py", line 512, in wrapper
12:16:21 return func(*args, prefix=prefix, **kwargs)
12:16:21 File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 292, in do_ovirt_stop
12:16:21 prefix.virt_env.engine_vm().stop_all_hosts()
12:16:21 File "/usr/lib/python2.7/site-packages/ovirtlago/utils.py", line 145, in wrapped_func
12:16:21 return func(*args, **kwargs)
12:16:21 File "/usr/lib/python2.7/site-packages/ovirtlago/virt.py", line 390, in stop_all_hosts
12:16:21 host_service.deactivate()
12:16:21 File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line 30877, in deactivate
12:16:21 return self._internal_action(action, 'deactivate', None, headers, query, wait)
12:16:21 File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 290, in _internal_action
12:16:21 return future.wait() if wait else future
12:16:21 File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 53, in wait
12:16:21 return self._code(response)
12:16:21 File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 287, in callback
12:16:21 self._check_fault(response)
12:16:21 File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 125, in _check_fault
12:16:21 self._raise_error(response, body.fault)
12:16:21 File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 109, in _raise_error
12:16:21 raise error
12:16:21 Error: Fault reason is "Operation Failed". Fault detail is "[Cannot switch Host to Maintenance mode. Host has asynchronous running tasks,
12:16:21 wait for operation to complete and retry.]". HTTP response code is 409.
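
Since the fault text itself says to wait and retry, a bounded retry loop is one possible mitigation. The sketch below is generic: `retry_on_conflict` and `looks_like_conflict` are hypothetical names, and matching on the message text is an assumption based only on the log above, not on a documented SDK attribute.

```python
import time


def looks_like_conflict(exc):
    """Guess whether an error is the HTTP 409 seen in the log above."""
    return 'HTTP response code is 409' in str(exc)


def retry_on_conflict(operation, attempts=5, delay=10.0,
                      is_conflict=looks_like_conflict, sleep=time.sleep):
    """Call operation(), retrying while it fails with a conflict.

    Any other error, or a conflict on the last attempt, is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if not is_conflict(exc) or attempt == attempts:
                raise
            sleep(delay)
```

In `stop_all_hosts` the operation would be the `host_service.deactivate()` call from the trace; the number of attempts and the delay are placeholders to tune.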

Make the repository server into a fully fledged Lago object

The repository server is currently an odd component. It lives in the Lago environment, but:

  • It isn't mentioned in the LagoInitFile
  • It's not started on lago start
  • It's not stopped on lago stop
  • It can be left behind even after lago destroy
  • lago ovirt serve is the only lago command that creates a long-lived Lago process.

The above has been causing several issues:

  • Users frequently forget to run lago ovirt serve
  • When cleanup steps fail, lago ovirt serve can keep running and prevent the same command from being run in a new lago environment on the same host

What needs to be done IMO:

  1. The code for running the local server should be changed. Instead of assuming the server will always be started and stopped within the same Python process, the server should be double-forked into a new daemon process whose PID is tracked in a file in the prefix.
  2. A new syntax should be added to the LagoInitFile to specify that a local repo server should be available in the environment. (It could be expanded in the future to enable running multiple servers and to specify that reposync should run at deploy, but let's not spend time on enhancements ATM.)
  3. lago start needs to be changed to start the server if asked for, lago stop to stop it, etc.
  4. lago status should show the status of the repo server. To do that, the server should have a special URL defined that returns some status JSON. To be robust, the status command should probably always check that the server process is up before trying to query it over HTTP.
  5. The lago ovirt serve command should be converted into a no-op that shows a deprecation warning with some instructions on how to add the server to the LagoInitFile.

It might also be useful to move the server code to its own separate Python file, so that we don't have to keep the whole Lago code base in memory just to serve some files over HTTP.
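
The "check that the process is up before querying over HTTP" step from the list above could look roughly like this. It is a minimal sketch that assumes a POSIX host and a PID file containing a single integer; the function names and the PID file layout are hypothetical.

```python
import errno
import os


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or bad."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def server_is_running(pid_file):
    """Return True if the process recorded in pid_file still exists."""
    pid = read_pid(pid_file)
    if pid is None or pid <= 0:
        return False
    try:
        # Signal 0 performs the existence and permission checks only;
        # nothing is delivered to the process.
        os.kill(pid, 0)
    except OSError as exc:
        # EPERM means the process exists but belongs to another user.
        return exc.errno == errno.EPERM
    return True
```

Note that a stale PID file can point at an unrelated process that reused the PID, so the HTTP status query would still be needed as a second check.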

Query the engine for supported CPUs

  • It will save us the effort of maintaining ovirt_cpu_map.yaml.
  • If the hypervisor's CPU family is not supported, we can dynamically find another CPU family to use (for example, we currently map IvyBridge to the Intel SandyBridge Family)
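
The fallback idea could be sketched as follows, assuming the supported families have already been fetched from the engine and that an ordered list of families, oldest first, is available. The function name, the short family names, and the sample data are illustrative assumptions, not the engine's actual identifiers.

```python
def pick_cpu_family(host_family, supported, ordered_families):
    """Pick host_family if supported, else the nearest older supported one.

    ordered_families lists families from oldest to newest.
    Returns None when nothing suitable is found.
    """
    if host_family in supported:
        return host_family
    if host_family not in ordered_families:
        return None
    index = ordered_families.index(host_family)
    # Walk backwards from the host family towards older generations.
    for family in reversed(ordered_families[:index]):
        if family in supported:
            return family
    return None


# Illustrative data only.
ORDER = ['Nehalem', 'Westmere', 'SandyBridge', 'IvyBridge', 'Haswell']
print(pick_cpu_family('IvyBridge', {'Nehalem', 'SandyBridge'}, ORDER))
# prints "SandyBridge"
```

Choosing an older family rather than a newer one is deliberate here: a guest CPU model should not advertise features the hypervisor lacks.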

Rotate logs between tests

Now that we collect the entire '/var/log' directory in ost-plugin, it would be better if each 'runtest' command collected only the logs produced by that test, instead of repeatedly collecting the entire directory. This would also reduce the size of the collected logs.
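
Short of real log rotation, one approximation is to record a timestamp before each test and collect only the files modified since then. The sketch below assumes the files are reachable on a local path; `files_modified_since` is a hypothetical helper, not an existing ost-plugin function.

```python
import os


def files_modified_since(root, since):
    """Return sorted paths under root whose mtime is at or after `since`.

    `since` is a Unix timestamp, e.g. time.time() taken before the test.
    """
    recent = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) >= since:
                    recent.append(path)
            except OSError:
                # The file vanished between listing and stat; skip it.
                continue
    return sorted(recent)
```

This still copies a changed file in full, so it reduces the number of files collected but does not isolate the lines written by a single test the way rotation would.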

hosts cpu set by image build

Moved from lago-project/lago#548 to here:

@dron1 wrote:

I created a new image on http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/ and tried to use it to run lago on my local machine.
When I ran 'lago init', the VMs failed to activate the hosts:

Activating Engine Hosts: ERROR (in 0:00:57)

@ Starting oVirt environment: ERROR (in 0:00:58)
Error occured, aborting
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 360, in do_run
self.cli_plugins[args.ovirtverb].do_run(args)
File "/usr/lib/python2.7/site-packages/lago/plugins/cli.py", line 184, in do_run
self._do_run(**vars(args))
File "/usr/lib/python2.7/site-packages/lago/utils.py", line 501, in wrapper
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/lago/utils.py", line 512, in wrapper
return func(*args, prefix=prefix, **kwargs)
File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 254, in do_ovirt_start
prefix.virt_env.engine_vm().start_all_hosts(timeout=5 * 60)
File "/usr/lib/python2.7/site-packages/ovirtlago/utils.py", line 145, in wrapped_func
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/ovirtlago/virt.py", line 414, in start_all_hosts
api = self.get_api_v4(check=True)
File "/usr/lib/python2.7/site-packages/ovirtlago/virt.py", line 302, in get_api_v4
self._api_v4 = self._get_api(api_ver=4)
File "/usr/lib/python2.7/site-packages/ovirtlago/virt.py", line 282, in _get_api
raise RuntimeError('test api call failed')
RuntimeError: test api call failed

Looking further into the cause, I can see that it is a CPU issue on the hosts:

May 21 18:45:34 dhcp-0-198 kernel: kvm [3247]: vcpu0 unhandled rdmsr: 0x345
May 21 18:45:34 dhcp-0-198 kernel: kvm [3247]: vcpu0 unhandled wrmsr: 0x680 data 0
May 21 18:45:34 dhcp-0-198 kernel: kvm [3245]: vcpu0 unhandled rdmsr: 0x345
May 21 18:45:34 dhcp-0-198 kernel: kvm [3245]: vcpu0 unhandled wrmsr: 0x680 data 0
May 21 18:45:34 dhcp-0-198 kernel: kvm [3245]: vcpu0 unhandled wrmsr: 0x6c0 data 0
May 21 18:45:34 dhcp-0-198 kernel: kvm [3245]: vcpu0 unhandled wrmsr: 0x681 data 0
May 21 18:45:34 dhcp-0-198 kernel: kvm [3245]: vcpu0 unhandled wrmsr: 0x6c1 data 0
May 21 18:45:34 dhcp-0-198 kernel: kvm [3245]: vcpu0 unhandled wrmsr: 0x682 data 0
May 21 18:45:34 dhcp-0-198 kernel: kvm [3245]: vcpu0 unhandled wrmsr: 0x6c2 data 0
May 21 18:45:34 dhcp-0-198 kernel: kvm [3245]: vcpu0 unhandled wrmsr: 0x683 data 0
May 21 18:45:34 dhcp-0-198 kernel: kvm [3245]: vcpu0 unhandled wrmsr: 0x6c3 data 0
May 21 18:45:34 dhcp-0-198 kernel: kvm [3245]: vcpu0 unhandled wrmsr: 0x684 data 0
May 21 18:45:34 dhcp-0-198 kernel: kvm [3249]: vcpu0 unhandled rdmsr: 0x345
May 21 18:45:40 dhcp-0-198 kvm: 2 guests now active
May 21 18:45:40 dhcp-0-198 kvm: 1 guest now active

When I logged in to the oVirt engine VM, I could see that the hosts fail to activate because of a wrong CPU type.

This seems to happen because the CPU type was selected based on the hardware on which I created the image.
Hence, if I try to use the image with lago on any computer that has a different CPU, it will not be able to activate the oVirt hosts.

running 'lago ovirt status' after 'lago ovirt stop' hangs

Trying to run the lago demo tool with the following commands ( after extracting the image ) works:

lago init
lago ovirt start --with-vm
lago ovirt status
lago stop
But when running 'lago ovirt status' after the environment is stopped, the command hangs. Eventually I had to press CTRL-C to stop it, and got this exception:

lago ovirt status
Error occured, aborting
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 325, in do_run
self.cli_plugins[args.ovirtverb].do_run(args)
File "/usr/lib/python2.7/site-packages/lago/plugins/cli.py", line 184, in do_run
self._do_run(**vars(args))
File "/usr/lib/python2.7/site-packages/lago/utils.py", line 501, in wrapper
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/lago/utils.py", line 512, in wrapper
return func(*args, prefix=prefix, **kwargs)
File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 211, in do_ovirt_status
prefix.virt_env.engine_vm().status()
File "/usr/lib/python2.7/site-packages/ovirtlago/utils.py", line 145, in wrapped_func
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/ovirtlago/virt.py", line 463, in status
api = self.get_api_v4(check=True)
File "/usr/lib/python2.7/site-packages/ovirtlago/virt.py", line 301, in get_api_v4
self._api_v4 = self._get_api(api_ver=4)
File "/usr/lib/python2.7/site-packages/ovirtlago/virt.py", line 281, in _get_api
raise RuntimeError('test api call failed')
RuntimeError: test api call failed
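
A cheap guard against the hang would be to probe the engine's address with a short timeout before attempting any API call. This is a sketch under assumptions: `is_reachable` is a hypothetical helper, and the host and port to probe would have to come from the prefix.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
    sock.close()
    return True
```

If the probe fails, 'lago ovirt status' could report that the engine VM is down and return immediately instead of waiting on the API.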

Log output shows success even when a test fails

testlib.LogCollectorPlugin should be fixed. It should show an ERROR message when a test fails.

17:12:52 [basic-suit] @ Run test: 007_sd_reattach.py: 
17:12:52 [basic-suit] nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
17:12:52 [basic-suit]   # deactivate_storage_domain: 
17:12:52 [basic-suit]     * Collect artifacts: 
17:13:19 [basic-suit]     * Collect artifacts: Success (in 0:00:23)
17:13:19 [basic-suit]   # deactivate_storage_domain: Success (in 0:00:24)
17:13:19 [basic-suit]   # Results located at /dev/shm/ost/deployment-basic-suite-4.2/default/007_sd_reattach.py.junit.xml
17:13:19 [basic-suit] @ Run test: 007_sd_reattach.py: Success (in 0:00:24)
17:13:19 [basic-suit] Error occured, aborting
17:13:19 [basic-suit] Traceback (most recent call last):
17:13:19 [basic-suit]   File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 362, in do_run
17:13:19 [basic-suit]     self.cli_plugins[args.ovirtverb].do_run(args)
17:13:19 [basic-suit]   File "/usr/lib/python2.7/site-packages/lago/plugins/cli.py", line 184, in do_run
17:13:19 [basic-suit]     self._do_run(**vars(args))
17:13:19 [basic-suit]   File "/usr/lib/python2.7/site-packages/lago/utils.py", line 505, in wrapper
17:13:19 [basic-suit]     return func(*args, **kwargs)
17:13:19 [basic-suit]   File "/usr/lib/python2.7/site-packages/lago/utils.py", line 516, in wrapper
17:13:19 [basic-suit]     return func(*args, prefix=prefix, **kwargs)
17:13:19 [basic-suit]   File "/usr/lib/python2.7/site-packages/ovirtlago/cmd.py", line 99, in do_ovirt_runtest
17:13:19 [basic-suit]     raise RuntimeError('Some tests failed')
17:13:19 [basic-suit] RuntimeError: Some tests failed

Use a multi-threaded web server to serve the internal repo

It seems that the server that we currently use is overloaded:

11:49:32 @ Deploy oVirt environment: 
11:49:33   # Deploy environment: 
11:49:33     * [Thread-2] Deploy VM lago-basic-suite-master-host-0: 
11:49:33     * [Thread-3] Deploy VM lago-basic-suite-master-host-1: 
11:49:33     * [Thread-4] Deploy VM lago-basic-suite-master-engine: 
11:49:55     * [Thread-3] Deploy VM lago-basic-suite-master-host-1: Success (in 0:00:22)
11:49:57 Traceback (most recent call last):
11:49:57   File "/usr/lib64/python2.7/SocketServer.py", line 295, in _handle_request_noblock
11:49:57     self.process_request(request, client_address)
11:49:57   File "/usr/lib64/python2.7/SocketServer.py", line 321, in process_request
11:49:57     self.finish_request(request, client_address)
11:49:57   File "/usr/lib64/python2.7/SocketServer.py", line 334, in finish_request
11:49:57     self.RequestHandlerClass(request, client_address, self)
11:49:57   File "/usr/lib64/python2.7/SocketServer.py", line 651, in __init__
11:49:57     self.finish()
11:49:57   File "/usr/lib64/python2.7/SocketServer.py", line 710, in finish
11:49:57     self.wfile.close()
11:49:57   File "/usr/lib64/python2.7/socket.py", line 279, in close
11:49:57     self.flush()
11:49:57   File "/usr/lib64/python2.7/socket.py", line 303, in flush
11:49:57     self._sock.sendall(view[write_offset:write_offset+buffer_size])
11:49:57 error: [Errno 32] Broken pipe

I think that yum aborts the connection because the data arrives too slowly.
Another option is to configure yum to limit the number of connections or to extend the timeout.
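
For reference, a threaded file server can be put together from the standard library alone. The sketch below targets Python 3.7 or later, whereas the traces above come from Python 2.7, where the equivalent mixin lives in the `SocketServer` module. The names `ThreadedHTTPServer` and `serve_directory` are hypothetical and this is not the plugin's current server code.

```python
import functools
import http.server
import socketserver
import threading


class ThreadedHTTPServer(socketserver.ThreadingMixIn, http.server.HTTPServer):
    """HTTP server that handles each request in its own thread."""
    # Do not let in-flight request threads block interpreter exit.
    daemon_threads = True


def serve_directory(root, host='127.0.0.1', port=0):
    """Serve files under root in a background thread; return the server.

    With port=0 the OS picks a free port; read it from
    server.server_address. Call server.shutdown() to stop serving.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=root)
    server = ThreadedHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    return server
```

With one thread per request, a slow download by one VM no longer blocks the others, which is the overload pattern suspected above. Whether that is enough to stop yum from dropping connections would still need to be measured.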
