log-extractor's Introduction

log-extractor

When running many tests, logs quickly grow huge, and searching them for the data relevant to the time window in which a specific test ran is tedious, even with timestamps. log-extractor restructures the logs, creating per-test-case log files.

Install

sudo pip install . -U

Usage:

Source can be either:

Jenkins build URL:

$ log-extractor \
    --source https://jenkins.example.com/job/rhv-master-ge-runner-network/275 \
    --team networking \
    --logs engine.log,vdsm.log

Locally downloaded logs in zip format from Jenkins job artifacts:

$ log-extractor \
    --source /home/kkoukiou/Downloads/archive.zip \
    --team networking \
    --logs engine.log,vdsm.log

Local folder containing the logs:

$ log-extractor \
    --source /home/kkoukiou/Downloads/archive \
    --team networking \
    --logs engine.log,vdsm.log

log-extractor's People

Contributors

cynepco3hahue, kkoukiou, myakove, stluke, tareqalayan

log-extractor's Issues

Add support for flow jobs

If a flow job is passed to the tool, it downloads the whole HTML page and tries to unpack it. First, this case should produce a sensible error, based on a check of the archive format.

As an RFE, the tool should redirect from the flow job to the runner job.
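
A minimal Python 3 sketch of such a format check, using the standard zipfile and tarfile modules. The function name ensure_archive and the exception NotAnArchiveError are hypothetical, and the HTML sniffing is only a heuristic for producing a friendlier message:

```python
import tarfile
import zipfile


class NotAnArchiveError(Exception):
    """Raised when a downloaded file is not a zip or tar archive."""


def ensure_archive(path):
    """Return 'zip' or 'tar' for a supported archive, else raise."""
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):
        return "tar"
    # Peek at the first bytes to give a more helpful hint for HTML pages.
    with open(path, "rb") as fh:
        head = fh.read(512).lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        raise NotAnArchiveError(
            "%s looks like an HTML page, not an archive; "
            "was a flow job given instead of a runner job?" % path
        )
    raise NotAnArchiveError("%s is not a zip or tar archive" % path)
```

Calling this right after the download and before any unpacking would turn the current unpack failure into a clear message.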

parse_art_log skipped during run

[lsvaty@******** ~]$ log-extractor --job rhv-4.2-ge-runner-coresystem --build 304 --team coresystem
Download artifacts from the link https://*****/job/rhv-4.2-ge-runner-coresystem/304/
==== Unpack the file /home/lsvaty/art-tests-logs/rhv-4.2-ge-runner-coresystem/304/artifact.zip ====
==== Parse ART logs ====
parse file /home/lsvaty/art-tests-logs/rhv-4.2-ge-runner-coresystem/304/art_test_runner.log
Traceback (most recent call last):
  File "/bin/log-extractor", line 11, in <module>
    sys.exit(run())
  File "/usr/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/log_extractor/log_extractor.py", line 503, in run
    log_extractor.parse_logs()
  File "/usr/lib/python2.7/site-packages/log_extractor/log_extractor.py", line 348, in parse_logs
    raise RuntimeError("You need to run parse_art_logs first")
RuntimeError: You need to run parse_art_logs first

parsing engine.log throws: TypeError: unsupported operand type(s) for +: 'NoneType' and 'datetime.timedelta'

When trying to parse the engine.log, I am getting a traceback with TypeError:

$ log-extractor --job rhv-4.2-ge-runner-webui-dev --build 42 --logs engine.log
Download artifacts from the link https://jenkins.example.com/job/rhv-4.2-ge-runner-webui-dev/42/
==== Unpack the file /home/pnovotny/art-tests-logs/rhv-4.2-ge-runner-webui-dev/42/artifact.zip ====
==== Unpack the file /home/pnovotny/art-tests-logs/rhv-4.2-ge-runner-webui-dev/42/artifact/archive/ansible-playbooks/playbooks/ovirt-collect-logs/logs/db-system-ge/ovirt-engine-logs.tar.gz ====
==== Unpack the file /home/pnovotny/art-tests-logs/rhv-4.2-ge-runner-webui-dev/42/artifact/archive/ansible-playbooks/playbooks/ovirt-collect-logs/logs/dwh-system-ge/ovirt-engine-logs.tar.gz ====
==== Unpack the file /home/pnovotny/art-tests-logs/rhv-4.2-ge-runner-webui-dev/42/artifact/archive/ansible-playbooks/playbooks/ovirt-collect-logs/logs/hypervisor-ge-system-01/ovirt-engine-logs.tar.gz ====
==== Unpack the file /home/pnovotny/art-tests-logs/rhv-4.2-ge-runner-webui-dev/42/artifact/archive/ansible-playbooks/playbooks/ovirt-collect-logs/logs/hypervisor-ge-system-03/ovirt-engine-logs.tar.gz ====
==== Unpack the file /home/pnovotny/art-tests-logs/rhv-4.2-ge-runner-webui-dev/42/artifact/archive/ansible-playbooks/playbooks/ovirt-collect-logs/logs/hypervisor-ge-system-02/ovirt-engine-logs.tar.gz ====
==== Unpack the file /home/pnovotny/art-tests-logs/rhv-4.2-ge-runner-webui-dev/42/artifact/archive/ansible-playbooks/playbooks/ovirt-collect-logs/logs/engine-system-ge/ovirt-engine-logs.tar.gz ====
==== Unpack the file /home/pnovotny/art-tests-logs/rhv-4.2-ge-runner-webui-dev/42/artifact/archive/ansible-playbooks/playbooks/ovirt-collect-logs/logs/db-system-ge/ovirt-engine-logs.tar.gz ====
==== Unpack the file /home/pnovotny/art-tests-logs/rhv-4.2-ge-runner-webui-dev/42/artifact/archive/ansible-playbooks/playbooks/ovirt-collect-logs/logs/dwh-system-ge/ovirt-engine-logs.tar.gz ====
==== Unpack the file /home/pnovotny/art-tests-logs/rhv-4.2-ge-runner-webui-dev/42/artifact/archive/ansible-playbooks/playbooks/ovirt-collect-logs/logs/hypervisor-ge-system-01/ovirt-engine-logs.tar.gz ====
==== Unpack the file /home/pnovotny/art-tests-logs/rhv-4.2-ge-runner-webui-dev/42/artifact/archive/ansible-playbooks/playbooks/ovirt-collect-logs/logs/hypervisor-ge-system-03/ovirt-engine-logs.tar.gz ====
==== Unpack the file /home/pnovotny/art-tests-logs/rhv-4.2-ge-runner-webui-dev/42/artifact/archive/ansible-playbooks/playbooks/ovirt-collect-logs/logs/hypervisor-ge-system-02/ovirt-engine-logs.tar.gz ====
==== Unpack the file /home/pnovotny/art-tests-logs/rhv-4.2-ge-runner-webui-dev/42/artifact/archive/ansible-playbooks/playbooks/ovirt-collect-logs/logs/engine-system-ge/ovirt-engine-logs.tar.gz ====
==== Parse ART logs ====
parse file /home/pnovotny/art-tests-logs/rhv-4.2-ge-runner-webui-dev/42/art_test_runner.log
==== Parse engine.log's ====
parse file /home/pnovotny/art-tests-logs/rhv-4.2-ge-runner-webui-dev/42/engine.log
Traceback (most recent call last):
  File "/usr/local/bin/log-extractor", line 11, in <module>
    load_entry_point('log-extractor==1.0', 'console_scripts', 'log-extractor')()
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/log_extractor/log_extractor.py", line 503, in run
    log_extractor.parse_logs()
  File "/usr/local/lib/python2.7/dist-packages/log_extractor/log_extractor.py", line 439, in parse_logs
    test_name=cur_test_name
  File "/usr/local/lib/python2.7/dist-packages/log_extractor/log_extractor.py", line 220, in _define_tss
    self.tss[test_name][const.TS_END] + datetime.timedelta(minutes=1)
TypeError: unsupported operand type(s) for +: 'NoneType' and 'datetime.timedelta'
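
The last frame shows that self.tss[test_name][const.TS_END] is None at the moment one minute is added to it, which suggests that no end timestamp was recorded for that test. Below is a minimal Python 3 sketch of the failure and of one possible guard. The dictionary layout, the value of TS_END, the helper name padded_end and the fallback policy are all assumptions made for illustration, not the project's actual code:

```python
import datetime

TS_END = "end"  # hypothetical stand-in for const.TS_END


def padded_end(tss, test_name, fallback_end, pad=datetime.timedelta(minutes=1)):
    """Return the test's end timestamp plus padding.

    Falls back to `fallback_end` when no end timestamp was recorded.
    """
    end = tss[test_name].get(TS_END)
    if end is None:
        end = fallback_end
    return end + pad


# A test whose end was never seen in the ART log (assumed situation).
tss = {"test_a": {TS_END: None}}
last_seen = datetime.datetime(2018, 1, 1, 12, 0, 0)

try:
    tss["test_a"][TS_END] + datetime.timedelta(minutes=1)
except TypeError as err:
    print("reproduced:", err)

print(padded_end(tss, "test_a", fallback_end=last_seen))
```

Whether a fallback timestamp or an explicit error is the better behaviour depends on why the end timestamp is missing, which the traceback alone does not show.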

Refactoring log-extractor

As you know, I would like to integrate this script into our Jenkins jobs as a post-build task, so that the logs are restructured at the end of each run.
Because the logs can be quite big (~30 GB), we need to handle them efficiently and avoid duplicating files.
I would therefore like to raise a few points here and, if you agree, refactor log-extractor to make it more generic in how it handles file types, more modular, and with fewer parameters.

  • --source parameter: The input logs can currently be either downloaded from Jenkins or read from a local archive downloaded from the job's artifacts. I need to add one more input method that parses the $WORKSPACE of a Jenkins job directly, because no archives have been generated yet at the point where I parse the logs, and archiving followed by unarchiving would be completely redundant. That gives us three input methods and many related parameters, which can be merged into one.
    --build, --job, --skip-download, --local-log-file and a future local-log-dir option can all be merged into a single --source parameter.
    We can then check whether the parameter value is a local directory, a local file or a URL, and act accordingly.

  • extract_all: The extracting and moving of files done in this method is not needed. Whether we are dealing with zip files, tar.gz files or directories, we can list and open the files in place without extracting the entire archive, which is time-consuming and takes up space. I believe we should skip extraction and parse only the relevant files in place, using the zipfile and tarfile modules.
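
The two points above can be sketched together in Python 3 (the tracebacks in other issues show the tool running on Python 2.7, so this is illustrative only). The names classify_source and iter_log_lines are hypothetical. The sketch matches files by base name and does not descend into archives nested inside other archives:

```python
import io
import os
import tarfile
import zipfile
from urllib.parse import urlparse


def classify_source(source):
    """Return 'url', 'dir' or 'archive' for a --source value."""
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    if os.path.isdir(source):
        return "dir"
    if os.path.isfile(source):
        return "archive"
    raise ValueError("unsupported --source value: %r" % source)


def _text(raw):
    # Wrap a binary file object so it yields decoded text lines.
    return io.TextIOWrapper(raw, encoding="utf-8", errors="replace")


def iter_log_lines(source, wanted):
    """Yield (name, line) for files whose base name is in `wanted`.

    Directories are walked; zip and tar archives are read in place,
    without extracting them to disk.
    """
    if os.path.isdir(source):
        for root, _dirs, files in os.walk(source):
            for name in files:
                if name in wanted:
                    path = os.path.join(root, name)
                    with open(path, "rb") as raw:
                        for line in _text(raw):
                            yield path, line
    elif zipfile.is_zipfile(source):
        with zipfile.ZipFile(source) as zf:
            for info in zf.infolist():
                if os.path.basename(info.filename) in wanted:
                    with zf.open(info) as raw:
                        for line in _text(raw):
                            yield info.filename, line
    elif tarfile.is_tarfile(source):
        with tarfile.open(source) as tf:
            for member in tf:
                if member.isfile() and os.path.basename(member.name) in wanted:
                    for line in _text(tf.extractfile(member)):
                        yield member.name, line
    else:
        raise ValueError("not a directory, zip or tar archive: %r" % source)
```

A URL source would still need a download step before iter_log_lines can be used; only the local cases are covered here.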

What do you think?

Unnecessarily wasted storage space

It might be better to remove all unnecessary content after extraction.
The extracted data takes 17 GB of my storage space, of which the folder I want, the only thing I actually need, takes 223 MB.

Maybe we could have an option for this? Or something like removing everything apart from the desired folder when a team is specified.

I don't mind downloading the zip, nor do I mind the space being used while everything is parsed, but there is no need to keep those files afterwards.
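
One way the suggested cleanup could work, as a Python 3 sketch: once parsing has finished, delete everything directly under the output directory except the entries to keep. The function name prune_except and the idea of passing the team folder name in keep are assumptions; no such option exists in log-extractor as described here.

```python
import os
import shutil


def prune_except(root, keep):
    """Delete every entry directly under `root` whose name is not in `keep`.

    Returns the sorted list of removed names.
    """
    removed = []
    for name in os.listdir(root):
        if name in keep:
            continue
        path = os.path.join(root, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # real directory: remove recursively
        else:
            os.remove(path)  # regular file or symlink
        removed.append(name)
    return sorted(removed)
```

Because the deletion is irreversible, it would be safer to make this opt-in behind an explicit option rather than the default.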
