
juju-crashdump's Introduction

juju-crashdump

Script to assist in gathering logs and other debugging info from a Juju model

Installation

The best way to install this plugin is via the snap:

sudo snap install --classic juju-crashdump

However, you can also install using pip:

sudo pip install git+https://github.com/juju/juju-crashdump.git

Usage

juju crashdump [-h] [-d] [-m MODEL] [-f MAX_FILE_SIZE] [-b BUG]
               [-o OUTPUT_DIR] [-u UNIQ] [-s] [-a ADDON]
               [--addons-file ADDONS_FILE]
               [extra_dir [extra_dir ...]]
extra_dir
    Extra directories to snapshot
-h, --help
    show this help message and exit
-d, --description
    Output a short description of the plugin
-m MODEL, --model MODEL
    Model to act on
-f MAX_FILE_SIZE, --max-file-size MAX_FILE_SIZE
    The max file size (bytes) for included files
-b BUG, --bug BUG
    Upload the crashdump to the given Launchpad bug number
-o OUTPUT_DIR, --output-dir OUTPUT_DIR
    Store the completed crashdump in this directory
-u UNIQ, --uniq UNIQ
    Unique id for this crashdump. We generate a UUID if this is not specified.
-s, --small
    Make a 'small' crashdump by skipping the contents of /var/lib/juju
-a ADDON, --addon ADDON
    Enable the addon with the given name
--addons-file ADDONS_FILE
    Use this file for addon definitions
--as-root
    Collect logs as root; the output may contain passwords and other secrets. Addons with local commands only run if this flag is enabled.

Addons

Addons can be used to collect information that is not already present in files on the nodes. The following addons are available:

  • crm-status
  • listening (shows netstat)
  • psaux
  • juju-show-unit
  • juju-show-status-log
  • juju-show-machine
  • ps-mem
  • sosreport
  • config (shows juju-config)
  • engine-report (shows juju-introspection)

Additional addons can be loaded using --addons-file. Addon files must use the following format:

addon-name:
 # command to run locally (on the machine running juju crashdump),
 # all created files will be pushed to {location} on all units.
 local: echo "example" > example.txt
 # command to run on every unit, all files created in {output} will be saved in the crashdump.
 remote: mv {location}/example.txt {output}/example.txt
 # local command to run for each {unit} or each {machine}; standard output will be saved.
 local-per-unit: echo "example including {unit}"

The commands can appear in any order and any of them can be left out, but each command can be used only once.
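The `{location}`, `{output}` and `{unit}` placeholders look like Python `str.format` fields. Assuming crashdump expands them that way (the text above does not say), a minimal sketch of validating one parsed addon definition and expanding its templates could look like this. `validate_addon`, `expand` and the example paths are hypothetical.

```python
# Hypothetical sketch: check one parsed addon definition and expand its
# placeholders with str.format. The real plugin may work differently.
ALLOWED_KEYS = {"local", "remote", "local-per-unit"}


def validate_addon(name, definition):
    """Reject unknown command keys in a parsed addon mapping.

    Once parsed into a dict, each key can appear only once; note that some
    YAML parsers silently keep the last duplicate instead of rejecting it.
    """
    unknown = set(definition) - ALLOWED_KEYS
    if unknown:
        raise ValueError("addon {!r}: unknown keys {}".format(name, sorted(unknown)))
    return definition


def expand(template, **values):
    # str.format ignores unused keyword arguments, so every known placeholder
    # can be passed and each template picks the ones it needs.
    return template.format(**values)


addon = validate_addon("addon-name", {
    "local": 'echo "example" > example.txt',
    "remote": "mv {location}/example.txt {output}/example.txt",
    "local-per-unit": 'echo "example including {unit}"',
})
print(expand(addon["remote"], location="/tmp/uuid/addons",
             output="/tmp/uuid/addon_output"))
```

One consequence of this design: a command containing literal braces (for example `awk '{print $1}'`) would have to double them as `{{` and `}}`.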

juju-crashdump's People

Contributors

asbalderson, ashleylai, basdbruijne, chrismacnaughton, cynerva, dannf, fnordahl, freyes, genet022, jhobbs, johnsca, kwmonroe, lutostag, marosg42, nobuto-m, pengale, sabaini, smoser, tvansteenburgh


juju-crashdump's Issues

KeyError in ip_to_machine

command juju scp -- -r {}:/tmp/17e43808-f4e4-4ea0-9133-c349ed9fb3be/addons failed
command juju scp -- -r {}:/tmp/17e43808-f4e4-4ea0-9133-c349ed9fb3be/addons failed
command juju scp -- -r {}:/tmp/17e43808-f4e4-4ea0-9133-c349ed9fb3be/addons failed
Traceback (most recent call last):
  File "/snap/juju-crashdump/156/bin/juju-crashdump", line 11, in <module>
    load_entry_point('jujucrashdump==0.0.0', 'console_scripts', 'juju-crashdump')()
  File "/snap/juju-crashdump/156/lib/python3.6/site-packages/jujucrashdump/crashdump.py", line 432, in main
    filename = collector.collect()
  File "/snap/juju-crashdump/156/lib/python3.6/site-packages/jujucrashdump/crashdump.py", line 311, in collect
    self.run_addons()
  File "/snap/juju-crashdump/156/lib/python3.6/site-packages/jujucrashdump/crashdump.py", line 228, in run_addons
    machines = service_unit_addresses(juju_status).keys()
  File "/snap/juju-crashdump/156/lib/python3.6/site-packages/jujucrashdump/crashdump.py", line 108, in service_unit_addresses
    machine = ip_to_machine[u_info['public-address']]
KeyError: '10.244.41.26'
Traceback (most recent call last):
  File "/usr/local/bin/fce", line 11, in <module>
    load_entry_point('foundationcloudengine', 'console_scripts', 'fce')()
  File "/home/ubuntu/cpe/foundation/foundationcloudengine/foundationcloudengine/main.py", line 141, in entry_point
    sys.exit(main(sys.argv[1:]))
  File "/home/ubuntu/cpe/foundation/foundationcloudengine/foundationcloudengine/main.py", line 132, in main
    opts.func(opts)
  File "/home/ubuntu/cpe/foundation/foundationcloudengine/foundationcloudengine/collect_logs.py", line 41, in collect_logs_main
    log_paths = collect_logs_inner(project, args.layer)
  File "/home/ubuntu/cpe/foundation/foundationcloudengine/foundationcloudengine/collect_logs.py", line 29, in collect_logs_inner
    log_paths[layer.name] = layer.collect_logs()
  File "/home/ubuntu/cpe/foundation/foundationcloudengine/foundationcloudengine/layers/jujuworkloadlayer.py", line 954, in collect_logs
    addons_files=self.addons_files, addons=self.addons)]
  File "/home/ubuntu/cpe/foundation/foundationcloudengine/foundationcloudengine/juju_cli.py", line 352, in capture_crashdump
    subprocess.check_call(command)
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['juju-crashdump', '-m', 'foundations-maas:openstack', '--timeout', '300', '--small', '-f', '100000000', '--compression', 'gz', '-o', '/home/ubuntu/project/generated/openstack/juju-crashdump-openstack-2019-12-01-03.05.21.tar.gz', '--addons-file', '/home/ubuntu/cpe/foundation/foundationcloudengine/foundationcloudengine/layers/../resources/addons.yaml', '--addon', 'juju-engine-report']' returned non-zero exit status 1.
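The traceback shows `service_unit_addresses` indexing `ip_to_machine` with a unit's `public-address`, which raises `KeyError` whenever that address is missing from the map. As a hedged sketch (not the plugin's code), assuming the status dict comes from `juju status --format=json`, where each unit entry normally carries its own `machine` field, the lookup could prefer that field, fall back to the address map, and skip units it still cannot place. `unit_machines` is a hypothetical name.

```python
import logging

log = logging.getLogger(__name__)


def unit_machines(juju_status, ip_to_machine):
    """Map unit name -> machine id without raising KeyError.

    Hypothetical helper; assumes the juju status JSON layout
    applications -> <app> -> units -> <unit> -> {machine, public-address}.
    """
    result = {}
    for app in juju_status.get("applications", {}).values():
        for unit, u_info in app.get("units", {}).items():
            # Prefer the machine id juju reports for the unit itself.
            machine = u_info.get("machine")
            if machine is None:
                machine = ip_to_machine.get(u_info.get("public-address"))
            if machine is None:
                log.warning("cannot map %s to a machine; skipping", unit)
                continue
            result[unit] = machine
    return result


status = {"applications": {"ceph-mon": {"units": {
    "ceph-mon/0": {"machine": "1/lxd/0", "public-address": "10.244.41.26"},
    "ceph-mon/1": {"public-address": "10.0.0.9"},
}}}}
print(unit_machines(status, {"10.0.0.9": "2"}))
# prints {'ceph-mon/0': '1/lxd/0', 'ceph-mon/1': '2'}
```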

Collect from windows machines

Include ability to collect from windows machines, where juju ssh/scp don't work.

I'm not quite sure of the path forward here; I will have to spec something out.
Two big options:

  • make sure the windows images have an ssh server
  • use the windows remote tooling instead of ssh

Looking for anyone with background to help here.

debug-layer and config addons don't work

The debug-layer and config addons don't seem to work. When I run juju-crashdump with those addons selected, it spews some errors:

$ juju-crashdump -a debug-layer -a config
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju scp -- -r   {}:/tmp/310c1585-66c5-43db-9bab-960f0804e520/addons failed
command juju ssh {} -- "cd /tmp/310c1585-66c5-43db-9bab-960f0804e520/addons; mkdir /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; . /etc/profile.d/juju-introspection.sh; for app in $(juju_machine_lock | grep -E '^\w' |sed 's/-[0-9]\+:$//g' | sed 's/unit-//g'); do cp $app-config.yaml /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; done;" failed
command juju ssh {} -- "cd /tmp/310c1585-66c5-43db-9bab-960f0804e520/addons; mkdir /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; . /etc/profile.d/juju-introspection.sh; for app in $(juju_machine_lock | grep -E '^\w' |sed 's/-[0-9]\+:$//g' | sed 's/unit-//g'); do cp $app-config.yaml /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; done;" failed
command juju ssh {} -- "cd /tmp/310c1585-66c5-43db-9bab-960f0804e520/addons; mkdir /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; . /etc/profile.d/juju-introspection.sh; for app in $(juju_machine_lock | grep -E '^\w' |sed 's/-[0-9]\+:$//g' | sed 's/unit-//g'); do cp $app-config.yaml /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; done;" failed
command juju ssh {} -- "cd /tmp/310c1585-66c5-43db-9bab-960f0804e520/addons; mkdir /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; . /etc/profile.d/juju-introspection.sh; for app in $(juju_machine_lock | grep -E '^\w' |sed 's/-[0-9]\+:$//g' | sed 's/unit-//g'); do cp $app-config.yaml /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; done;" failed
command juju ssh {} -- "cd /tmp/310c1585-66c5-43db-9bab-960f0804e520/addons; mkdir /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; . /etc/profile.d/juju-introspection.sh; for app in $(juju_machine_lock | grep -E '^\w' |sed 's/-[0-9]\+:$//g' | sed 's/unit-//g'); do cp $app-config.yaml /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; done;" failed
command juju ssh {} -- "cd /tmp/310c1585-66c5-43db-9bab-960f0804e520/addons; mkdir /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; . /etc/profile.d/juju-introspection.sh; for app in $(juju_machine_lock | grep -E '^\w' |sed 's/-[0-9]\+:$//g' | sed 's/unit-//g'); do cp $app-config.yaml /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; done;" failed
command juju ssh {} -- "cd /tmp/310c1585-66c5-43db-9bab-960f0804e520/addons; mkdir /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; . /etc/profile.d/juju-introspection.sh; for app in $(juju_machine_lock | grep -E '^\w' |sed 's/-[0-9]\+:$//g' | sed 's/unit-//g'); do cp $app-config.yaml /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; done;" failed
command juju ssh {} -- "cd /tmp/310c1585-66c5-43db-9bab-960f0804e520/addons; mkdir /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; . /etc/profile.d/juju-introspection.sh; for app in $(juju_machine_lock | grep -E '^\w' |sed 's/-[0-9]\+:$//g' | sed 's/unit-//g'); do cp $app-config.yaml /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; done;" failed
command juju ssh {} -- "cd /tmp/310c1585-66c5-43db-9bab-960f0804e520/addons; mkdir /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; . /etc/profile.d/juju-introspection.sh; for app in $(juju_machine_lock | grep -E '^\w' |sed 's/-[0-9]\+:$//g' | sed 's/unit-//g'); do cp $app-config.yaml /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; done;" failed
command juju ssh {} -- "cd /tmp/310c1585-66c5-43db-9bab-960f0804e520/addons; mkdir /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; . /etc/profile.d/juju-introspection.sh; for app in $(juju_machine_lock | grep -E '^\w' |sed 's/-[0-9]\+:$//g' | sed 's/unit-//g'); do cp $app-config.yaml /tmp/310c1585-66c5-43db-9bab-960f0804e520/addon_output/config; done;" failed

The resulting crashdump .tar.gz doesn't include config info or debug action output.

Affected version:

$ which juju-crashdump
/snap/bin/juju-crashdump
$ snap list juju-crashdump
Name            Version              Rev  Tracking  Publisher    Notes
juju-crashdump  1.0.2+git55.8fcf96c  156  edge      jason-hobbs  classic

Scope request to a single application/unit

It would be great if I could scope the crashdump collector to a single application or unit. I understand wanting the entire model; however, that can bloat a debug dump when all I need is the output of one unit in failure mode, and it would let me skip all the healthy units that came up fine.

This would be especially useful for point-in-time captures, such as bug feedback, or peer review of a feature under development that is in failure mode while pair programming.
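A hedged illustration of what the requested scoping could look like: filter the units in `juju status --format=json` down to the named applications or units, and collect only from their machines. `select_units` and its `targets` argument are hypothetical; the JSON layout is an assumption.

```python
def select_units(juju_status, targets):
    """Map each selected unit to its machine.

    A target is either an application name ("glance") or a unit name
    ("glance/0"). Hypothetical helper; assumes the juju status JSON layout
    applications -> <app> -> units -> <unit> -> {"machine": ...}.
    """
    targets = set(targets)
    selected = {}
    for app_name, app in juju_status.get("applications", {}).items():
        for unit, info in app.get("units", {}).items():
            if app_name in targets or unit in targets:
                selected[unit] = info.get("machine")
    return selected


status = {"applications": {
    "glance": {"units": {"glance/0": {"machine": "2/lxd/2"}}},
    "ceph-mon": {"units": {"ceph-mon/0": {"machine": "1/lxd/0"},
                           "ceph-mon/1": {"machine": "0/lxd/1"}}},
}}
print(select_units(status, ["glance", "ceph-mon/1"]))
```

Returning machines as well as units matters because the collection steps shown elsewhere on this page run per machine (`juju ssh`/`juju scp` to a machine id).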

Not collecting logs from errored units

Logs are not collected from specific units that are in an error state (in an OpenStack deployment). All other units have crashdump logs; only the errored units are missing them.

Example 1:

ceph-mon/0 error idle 1/lxd/0 10.245.168.46 hook failed: "radosgw-relation-changed"

crashdump error:

/snap/bin/juju-crashdump -o crashdumps/arm64-mosci-ruxton-maas-2030.tar.xz
Command "juju scp 1/lxd/0:/tmp/juju-dump-8bcc3e57-0831-4f1a-94f0-55f35194abc5.tar 8b65003b-6656-4e75-93c2-f832869680a4.tar" failed
Command "tar -pxf 8b65003b-6656-4e75-93c2-f832869680a4.tar -C 1/lxd/0" failed
Command "rm 8b65003b-6656-4e75-93c2-f832869680a4.tar" failed

Example 2:

glance/0* error idle 2/lxd/2 10.245.168.46 hook failed: "install"

/snap/bin/juju-crashdump -o crashdumps/amd64-mosci-ruxton-maas-2031.tar.xz
Command "juju scp 2/lxd/2:/tmp/juju-dump-62aca719-25fb-4d93-a366-8a3112c38f73.tar 2af6c82b-e785-4abf-86ad-fab8c60e7ac7.tar" failed
Command "tar -pxf 2af6c82b-e785-4abf-86ad-fab8c60e7ac7.tar -C 2/lxd/2" failed
Command "rm 2af6c82b-e785-4abf-86ad-fab8c60e7ac7.tar" failed

Unfortunately the deployments had been destroyed before I could check the units to see if the tar files were actually there.

KeyError from retrieve_unit_tarballs

$ juju-crashdump -m default -o /tmp/

- MachineId: "0"
  Stdout: ""
- MachineId: 0/lxd/0
  Stdout: ""
- MachineId: 0/lxd/1
  Stdout: ""
- MachineId: 0/lxd/2
  Stdout: ""
- MachineId: 0/lxd/3
  Stdout: ""
- MachineId: 0/lxd/4
  Stdout: ""
- MachineId: "1"
  Stdout: ""

Traceback (most recent call last):
  File "/snap/juju-crashdump/1/bin/juju-crashdump", line 259, in <module>
    main()
  File "/snap/juju-crashdump/1/bin/juju-crashdump", line 253, in main
    filename = collector.collect()
  File "/snap/juju-crashdump/1/bin/juju-crashdump", line 159, in collect
    self.retrieve_unit_tarballs()
  File "/snap/juju-crashdump/1/bin/juju-crashdump", line 141, in retrieve_unit_tarballs
    any_unit = alias_group.intersection(units).pop()
KeyError: 'pop from an empty set'
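The final error is what Python raises for `set.pop()` on an empty set, so here none of the `units` intersected `alias_group`. A defensive variant, sketched with a hypothetical `pick_any_unit` helper, returns `None` so the caller can skip that machine instead of aborting the whole run:

```python
def pick_any_unit(alias_group, units):
    """Return an arbitrary unit present in both collections, or None."""
    common = set(alias_group) & set(units)
    # next() with a default never raises, unlike set.pop() on an empty set.
    return next(iter(common), None)


print(pick_any_unit({"a/0", "b/0"}, ["b/0"]))  # prints b/0
print(pick_any_unit({"a/0"}, ["b/0"]))         # prints None
```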

debug-log doesn't include the date, but time only

def juju_debuglog():
    juju_cmd('debug-log --replay --no-tail', to_file='debug_log.txt')

It would be nice to pass --date (which is false by default) so that we can easily tell which date each log entry is from.

--date  (= false)
    Show dates as well as times

Without --date

unit-containerd-1: 18:07:04 INFO unit.containerd/1.juju-log status-set: active: Container runtime available
unit-containerd-1: 18:07:04 INFO juju.worker.uniter.operation ran "update-status" hook (via explicit, bespoke hook script)

With --date

unit-containerd-1: 2020-09-07 18:07:04 INFO unit.containerd/1.juju-log status-set: active: Container runtime available
unit-containerd-1: 2020-09-07 18:07:04 INFO juju.worker.uniter.operation ran "update-status" hook (via explicit, bespoke hook script)
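A minimal sketch of the requested change, written as a standalone function rather than through the plugin's `juju_cmd` helper, whose exact behaviour is not shown here. `--replay`, `--no-tail` and `--date` are the `juju debug-log` flags quoted in this issue; `debuglog_command` and `write_debuglog` are hypothetical names.

```python
import subprocess


def debuglog_command(model=None, include_date=True):
    """Build the argv for a full, non-following juju debug-log dump."""
    cmd = ["juju", "debug-log", "--replay", "--no-tail"]
    if include_date:
        cmd.append("--date")  # show dates as well as times
    if model:
        cmd += ["-m", model]
    return cmd


def write_debuglog(path="debug_log.txt", model=None, include_date=True):
    # Stream stdout straight into the file; check=True raises
    # CalledProcessError if juju exits non-zero.
    with open(path, "w") as out:
        subprocess.run(debuglog_command(model, include_date),
                       stdout=out, check=True)
```

Keeping the command builder separate from the call that runs it makes the flag handling easy to test without a live controller.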

Crashdump leaves important information out as -f is 20MB by default

In production environments it is very common (if not the rule) for logs to exceed 20MB.
Since it is also not easy to ask customers to check the size of those files before running the tool,
and then call it with '-f <right_size_of_the_file_I_want>', it would be nice if -f at least
tailed the logs and sent the last 20MB of information.
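The suggested fallback, keeping only the last `max_size` bytes of an oversized log instead of dropping it, can be sketched in Python as below. `tail_bytes` is a hypothetical helper; on the units themselves, `tail -c` does the same job from the shell.

```python
import os


def tail_bytes(path, max_size):
    """Return at most the last max_size bytes of the file at path."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Move back only if the file is larger than the limit.
        f.seek(max(0, size - max_size))
        return f.read()
```

Because the cut is by bytes, the first line of the result may be partial.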

Exclude argument is not passed on to tar collecting data on units

It appears the exclude option (-x / --exclude) was originally developed for a specific set of Juju files.

Since juju crashdump now also collects data from other places on the system, it would be useful to pass the exclude list on to the tar command collecting data elsewhere too.

tar_cmd = (
    "sudo find {dirs} -mount -type f -size -{max_size}c -o -size "
    "{max_size}c 2>/dev/null | sudo tar -pcf /tmp/juju-dump-{uniq}.tar"
    " --files-from - 2>/dev/null"
).format(dirs=" ".join(directories),
         max_size=self.max_size,
         uniq=self.uniq)
self._run_all(';'.join([
    tar_cmd,
    _append("cmd_output", "."),
    _append("", "journalctl"),
    _append("addon_output", "."),
]))
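One way to honour the exclude list here, sketched under assumptions: prune matching paths inside `find` so they never reach `tar`. The helper `build_tar_cmd` and its `excludes` parameter are hypothetical, and patterns are matched with `find -path`, whose wildcards apply to the whole path. The sketch also groups the two `-size` tests in parentheses so they combine correctly with the prune branch, and leaves the directory arguments unquoted because the real list contains shell globs.

```python
import shlex


def build_tar_cmd(directories, max_size, uniq, excludes=()):
    """Build the unit-side find | tar pipeline, skipping excluded paths.

    Hypothetical helper modelled on the snippet above.
    """
    prune = ""
    if excludes:
        # \( -path P1 -o -path P2 \) -prune -o ... : matching paths are not
        # printed, and matching directories are not descended into.
        tests = " -o ".join("-path " + shlex.quote(p) for p in excludes)
        prune = "\\( {} \\) -prune -o ".format(tests)
    return (
        "sudo find {dirs} -mount {prune}-type f "
        "\\( -size -{size}c -o -size {size}c \\) -print 2>/dev/null"
        " | sudo tar -pcf /tmp/juju-dump-{uniq}.tar --files-from - 2>/dev/null"
    ).format(dirs=" ".join(directories), prune=prune, size=max_size, uniq=uniq)


print(build_tar_cmd(["/var/log", "/etc/ceph"], 5000000, "abc",
                    excludes=["/var/log/*.gz"]))
```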

Add juju metrics collection

Juju provides wrappers on each machine that can be used to collect metrics.
/etc/profile.d/juju-introspection.sh defines all the calls.

They can be called with 'juju run' or 'juju ssh'.

The juju team asked that the following be collected by juju-crashdump:

  1. juju_machine_lock
    • Can help with understanding deployment failures
    • Get from every machine
  2. juju_metrics
    • Get from all API servers
  3. juju_goroutines
    • Collects details about the current execution path of each goroutine
  4. juju_presence_report
    • Get from all API servers
    • Who is connected to whom
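A sketch of how these calls could be issued over `juju ssh`, following the pattern the `config` addon command on this page already uses: source `/etc/profile.d/juju-introspection.sh`, then call the function. `introspection_commands` is hypothetical; it assumes the API servers are machines of the `controller` model, and because the list does not say where to run `juju_goroutines`, running it on every machine is also an assumption.

```python
PROFILE = "/etc/profile.d/juju-introspection.sh"


def _ssh(machine, func, model=None):
    # Source the helper script, then call one introspection function.
    remote = ". {}; {}".format(PROFILE, func)
    cmd = ["juju", "ssh"]
    if model:
        cmd += ["-m", model]
    return cmd + [machine, "--", remote]


def introspection_commands(machines, api_servers):
    """Return argv lists for the requested juju introspection calls."""
    cmds = []
    for m in machines:
        cmds.append(_ssh(m, "juju_machine_lock"))
        cmds.append(_ssh(m, "juju_goroutines"))  # scope is an assumption
    for m in api_servers:
        cmds.append(_ssh(m, "juju_metrics", model="controller"))
        cmds.append(_ssh(m, "juju_presence_report", model="controller"))
    return cmds
```

Each argv list can be passed to `subprocess.run` directly, so no local shell quoting is involved.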

Option to disable compression

In a standard OpenStack deployment, juju-crashdump collects a fair amount of data, which is basically good. However, compression takes a significant amount of time, so it would be nice to have an option to disable it. That way the tar file can be transferred as-is to a separate host without affecting the CPU resources of the juju client machine itself.

[with --small]
3.0GB (before compression) -> 12m32.913s -> 452MB (xz)

[without --small]
33GB (before compression) -> takes forever... -> ???
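With Python's standard `tarfile` module, disabling compression only changes the open mode (`"w"` instead of `"w:gz"`, `"w:bz2"` or `"w:xz"`). A minimal sketch, with a hypothetical `write_dump` helper and `compression` option:

```python
import tarfile

# Map a user-facing option to a tarfile write mode; "none" = plain tar.
MODES = {"none": "w", "gz": "w:gz", "bz2": "w:bz2", "xz": "w:xz"}


def write_dump(output, src_dir, compression="none"):
    """Archive src_dir into output using the requested compression."""
    if compression not in MODES:
        raise ValueError("unknown compression: {}".format(compression))
    with tarfile.open(output, MODES[compression]) as tar:
        tar.add(src_dir, arcname=".")
    return output
```

An uncompressed tar can still be compressed later on another host, for example with `xz`, which is the workflow the request describes.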

Add option to upload to GitHub bug

Crash reports can be uploaded to Launchpad bugs, but many charms / bundles are now primarily on GitHub. We should support uploading to an existing bug and / or (ideally) creating a new bug.

Breakage @ master

$ sudo snap refresh --channel=edge juju-crashdump 
juju-crashdump (edge) 1.0.2+git35.05671e4 from Cory Johns (johnsca) refreshed

$ juju crashdump
Traceback (most recent call last):
  File "/snap/juju-crashdump/102/bin/juju-crashdump", line 11, in <module>
    load_entry_point('jujucrashdump==0.0.0', 'console_scripts', 'juju-crashdump')()
  File "/snap/juju-crashdump/102/lib/python2.7/site-packages/jujucrashdump/crashdump.py", line 393, in main
    opts.addons_file.insert(0, ADDONS_FILE_PATH)
AttributeError: 'NoneType' object has no attribute 'insert'
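The traceback says `opts.addons_file` is `None` when `--addons-file` is not given. That is what argparse produces for an `action="append"` option with no default, which is likely, though not confirmed, how the plugin declares it. A sketch of a None-safe version; the `ADDONS_FILE_PATH` value is a placeholder.

```python
import argparse

ADDONS_FILE_PATH = "addons.yaml"  # placeholder for the bundled definitions


def parse(argv):
    parser = argparse.ArgumentParser(prog="juju-crashdump")
    # With action="append" and no default, the attribute stays None
    # unless the flag appears at least once.
    parser.add_argument("--addons-file", action="append")
    opts = parser.parse_args(argv)
    # Build a fresh list instead of calling .insert() on a possible None.
    opts.addons_file = [ADDONS_FILE_PATH] + (opts.addons_file or [])
    return opts
```

Alternatively, argparse documents that a non-empty `default` for an `append` option is kept, with command-line values appended after it, so `default=[ADDONS_FILE_PATH]` should also work.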

Latest juju-crashdump (rev 259) fails to run on arm64

With the latest revision 259 of the juju-crashdump snap (which is the same revision for arm64 and amd64), we were unable to run it on arm64 systems.

17:54:34  + timeout -s KILL 5m sudo -u root -H /snap/bin/juju crashdump -s -a debug-layer
17:54:34  /snap/juju-crashdump/259/snap/command-chain/snapcraft-runner: 3: exec: /snap/juju-crashdump/259/usr/bin/python3: Exec format error

This smells like the amd64 snap was published for arm64.

Please include `juju export-bundle`

juju export-bundle may not perfectly reflect the deployment, but it would still be nice to capture its output in the crashdump, as in juju export-bundle --filename mymodel.yaml

Crashdump upload broken

Getting:

juju-crashdump -b 1838964 -l DEBUG
2020-10-23 12:38:02,977 - juju-crashdump started.
2020-10-23 12:38:02,977 - Apport not available in this environment.
You must 'apt install' apport to use the 'bug' option.
Aborting run.

The snapcraft.yaml includes python-apport (the Python 2 package) but also specifies python3:

parts:
  juju-crashdump:
    plugin: python
    python-version: python3
    build-packages: [lsb-release]
    stage-packages:
      - python-apport
      - jq

As far as I can tell, the reason for not switching to python3-apport is https://bugs.launchpad.net/ubuntu/+source/python-launchpadlib/+bug/1425575

Add option to sanitise data

Some people might want to run this tool in production environments that contain sensitive data e.g. passwords, that they do not wish to share in the public domain. It would be nice to have an option (or perhaps just do this by default) to sanitise data pulled.
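A rough sketch of what a sanitising pass over collected text files might do: replace the value that follows keys that look like secrets. The key list and regular expression are assumptions for illustration only; a real option would need a much more carefully reviewed set of patterns, and `sanitise` is a hypothetical name.

```python
import re

# Hypothetical key fragments; extend for the charms actually deployed.
SECRET_KEYS = ("password", "passwd", "secret", "token", "api_key")

# Matches "key: value" or "key=value" on one line (case-insensitive),
# capturing the key and separator so only the value is replaced.
_PATTERN = re.compile(
    r"(?P<key>\b[\w-]*(?:{})[\w-]*[ \t]*[:=][ \t]*)(?P<value>\S+)".format(
        "|".join(map(re.escape, SECRET_KEYS))),
    re.IGNORECASE,
)


def sanitise(text, placeholder="<redacted>"):
    """Return text with values of secret-looking keys replaced."""
    return _PATTERN.sub(lambda m: m.group("key") + placeholder, text)
```

Pattern-based redaction will miss secrets in unexpected formats, which is an argument for making it opt-in on top of not collecting sensitive paths at all.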

Exception when running juju-crashdump on a kubernetes-model

I'm GUESSING that list(services.values()) is empty?

❯ snap list juju-crashdump
Name            Version               Rev  Tracking       Publisher    Notes
juju-crashdump  1.0.2+git114.6c1c71e  259  latest/stable  jason-hobbs  classic

❯ juju-crashdump -m test-kubernetes-metrics-server-krgf
2022-04-28 14:44:36,738 - juju-crashdump started.
Traceback (most recent call last):
  File "/snap/juju-crashdump/259/bin/juju-crashdump", line 33, in <module>
    sys.exit(load_entry_point('jujucrashdump==0.0.0', 'console_scripts', 'juju-crashdump')())
  File "/snap/juju-crashdump/259/lib/python3.6/site-packages/jujucrashdump/crashdump.py", line 597, in main
    filename = collector.collect()
  File "/snap/juju-crashdump/259/lib/python3.6/site-packages/jujucrashdump/crashdump.py", line 387, in collect
    self.run_addons()
  File "/snap/juju-crashdump/259/lib/python3.6/site-packages/jujucrashdump/crashdump.py", line 277, in run_addons
    units = [v for v in list(set.union(*list(services.values()))) if "/" in v]
TypeError: descriptor 'union' of 'set' object needs an argument
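The guess fits the error: `set.union(*values)` called through the class with an empty sequence has no set to operate on, and Python 3.6 reports exactly this `TypeError` (newer Python versions word the message differently). Starting from an empty set avoids it. A sketch with a hypothetical function name:

```python
def unit_names(services):
    """Collect unit names ("app/N") from a dict of app -> set of names."""
    # set().union() accepts zero or more iterables, so an empty model
    # yields an empty list instead of a TypeError.
    return [v for v in set().union(*services.values()) if "/" in v]
```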

Collect systemd service log

Collecting the systemd logs for a service using journalctl seems like it would be a common enough need that it should be supported directly in crashdump without the need for layer:debug doing it at the charm level.
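A minimal sketch of the requested collection, assuming the service names are known up front. `journal_command` is hypothetical, while `-u`, `--no-pager` and `-o short-iso` are standard `journalctl` options; the resulting command string could run on each unit in the same way as the other collection commands.

```python
import shlex


def journal_command(service):
    """Build a shell command that prints one systemd unit's journal."""
    parts = ["sudo", "journalctl", "--no-pager", "-o", "short-iso",
             "-u", service]
    # Quote every argument so unusual service names cannot break the line.
    return " ".join(shlex.quote(p) for p in parts)


print(journal_command("ceph-mon.service"))
# prints sudo journalctl --no-pager -o short-iso -u ceph-mon.service
```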

turns out /var/lib/juju/agents isn't small

#29 was fixed on the theory that /var/lib/juju/agents would be small. It isn't.

For just one container, it's 67M in this example. We have around 70 containers, so that's over 4 GB.

ubuntu@staging-cpe-140a4b83-023c-40b6-a5a3-da929cef4535:~/project/17/lxd/0/var/lib/juju/agents$ du -ckhs *
4.0K machine-17-lxd-0
4.2M unit-ceph-mon-2
31M unit-filebeat-60
944K unit-nrpe-container-37
32M unit-telegraf-61
67M total

We need to not collect it by default.

capture additional juju debug info

For each machine run:

juju_machine_lock
juju_engine_report - run without args and run with each unit log.

I mentioned this the other day in IRC, but thought I'd follow up in email.

By adding /var/lib/juju/agents to the crash dump we'll see local
configuration files, and also charm related information. It will grab
the entire charm and charm state for any deployed unit. It shouldn't be
too big.

We do not want /var/lib/juju/tools for example, as it would add a huge
amount, and we don't care about the agent binaries.

optionally collect application config settings

Config values are critical for debugging many issues.

We should add a feature to enable optionally collecting config settings for all applications, and to allow excluding some config settings, since config settings may include secrets.
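A sketch of optional config collection with an exclusion list, assuming the JSON shape printed by `juju config <application> --format=json`, where each entry under `settings` carries a `value`. `redact_config` and the example key names are hypothetical.

```python
import copy


def redact_config(config, excluded, placeholder="<redacted>"):
    """Return a copy of a juju config dict with excluded settings masked."""
    redacted = copy.deepcopy(config)  # never modify the caller's data
    for key, setting in redacted.get("settings", {}).items():
        if key in excluded and "value" in setting:
            setting["value"] = placeholder
    return redacted


cfg = {"application": "keystone", "settings": {
    "admin-password": {"type": "string", "value": "s3cret"},
    "debug": {"type": "boolean", "value": False},
}}
print(redact_config(cfg, {"admin-password"})["settings"]["admin-password"])
```

Masking the value while keeping the key shows that a setting was present without leaking what it was set to.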

stdout and stderr from failed commands are sent to /dev/null

I recently ran juju-crashdump with:

juju-crashdump -s -a debug-layer -a config -m mymodel

and received this output:

2021-08-09 18:04:51,930 - juju-crashdump started.
2021-08-09 18:05:51,103 - command juju ssh {} -- "cd /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/addons; for unit in $(cat debug-layer-units); do sudo juju-run $unit actions/debug; done; cp /home/ubuntu/debug-*.tar.gz /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/addon_output || true" failed
2021-08-09 18:05:51,104 - command juju ssh {} -- "cd /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/addons; for unit in $(cat debug-layer-units); do sudo juju-run $unit actions/debug; done; cp /home/ubuntu/debug-*.tar.gz /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/addon_output || true" failed
2021-08-09 18:06:01,741 - Command "timeout 45s ssh -o StrictHostKeyChecking=no -i ~/.local/share/juju/ssh/juju_id_rsa [email protected] 'mkdir -p /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/cmd_output;sudo netstat -taupn | grep LISTEN 2>/dev/null | tee /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/cmd_output/listening.txt || true'" failed
2021-08-09 18:06:01,743 - Command "timeout 45s ssh -o StrictHostKeyChecking=no -i ~/.local/share/juju/ssh/juju_id_rsa [email protected] 'mkdir -p /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/cmd_output;sudo netstat -taupn | grep LISTEN 2>/dev/null | tee /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/cmd_output/listening.txt || true'" failed
2021-08-09 18:06:04,813 - Command "timeout 45s ssh -o StrictHostKeyChecking=no -i ~/.local/share/juju/ssh/juju_id_rsa [email protected] 'mkdir -p /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/cmd_output;sudo ps aux | tee /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/cmd_output/psaux.txt || true'" failed
2021-08-09 18:06:04,815 - Command "timeout 45s ssh -o StrictHostKeyChecking=no -i ~/.local/share/juju/ssh/juju_id_rsa [email protected] 'mkdir -p /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/cmd_output;sudo ps aux | tee /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/cmd_output/psaux.txt || true'" failed
2021-08-09 18:06:07,885 - Command "timeout 45s ssh -o StrictHostKeyChecking=no -i ~/.local/share/juju/ssh/juju_id_rsa [email protected] 'mkdir -p /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa'" failed
2021-08-09 18:06:07,885 - Command "timeout 45s ssh -o StrictHostKeyChecking=no -i ~/.local/share/juju/ssh/juju_id_rsa [email protected] 'mkdir -p /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa'" failed
2021-08-09 18:06:10,957 - Command "timeout 45s ssh -o StrictHostKeyChecking=no -i ~/.local/share/juju/ssh/juju_id_rsa [email protected] 'find /etc/alternatives /etc/ceilometer /etc/ceph /etc/cinder /etc/cloud /etc/glance /etc/gnocchi /etc/keystone /etc/netplan /etc/network /etc/neutron /etc/nova /etc/quantum /etc/swift /etc/udev/rules.d /lib/udev/rules.d /opt/nedge/var/log /run/cloud-init /usr/share/lxc/config /var/lib/charm /var/lib/libvirt/filesystems/plumgrid-data/log /var/lib/libvirt/filesystems/plumgrid/var/log /var/lib/cloud/seed /var/log /var/snap/simplestreams/common/sstream-mirror-glance.log /var/crash /var/snap/juju-db/common/logs/ /var/lib/mysql/*-mysql-router /tmp/juju-exec*/script.sh /var/lib/lxd/containers/*/rootfs/etc/alternatives /var/lib/lxd/containers/*/rootfs/etc/ceilometer /var/lib/lxd/containers/*/rootfs/etc/ceph /var/lib/lxd/containers/*/rootfs/etc/cinder /var/lib/lxd/containers/*/rootfs/etc/cloud /var/lib/lxd/containers/*/rootfs/etc/glance /var/lib/lxd/containers/*/rootfs/etc/gnocchi /var/lib/lxd/containers/*/rootfs/etc/keystone /var/lib/lxd/containers/*/rootfs/etc/netplan /var/lib/lxd/containers/*/rootfs/etc/network /var/lib/lxd/containers/*/rootfs/etc/neutron /var/lib/lxd/containers/*/rootfs/etc/nova /var/lib/lxd/containers/*/rootfs/etc/quantum /var/lib/lxd/containers/*/rootfs/etc/swift /var/lib/lxd/containers/*/rootfs/etc/udev/rules.d /var/lib/lxd/containers/*/rootfs/lib/udev/rules.d /var/lib/lxd/containers/*/rootfs/opt/nedge/var/log /var/lib/lxd/containers/*/rootfs/run/cloud-init /var/lib/lxd/containers/*/rootfs/usr/share/lxc/config /var/lib/lxd/containers/*/rootfs/var/lib/charm /var/lib/lxd/containers/*/rootfs/var/lib/libvirt/filesystems/plumgrid-data/log /var/lib/lxd/containers/*/rootfs/var/lib/libvirt/filesystems/plumgrid/var/log /var/lib/lxd/containers/*/rootfs/var/lib/cloud/seed /var/lib/lxd/containers/*/rootfs/var/log /var/lib/lxd/containers/*/rootfs/var/snap/simplestreams/common/sstream-mirror-glance.log 
/var/lib/lxd/containers/*/rootfs/var/crash /var/lib/lxd/containers/*/rootfs/var/snap/juju-db/common/logs/ /var/lib/lxd/containers/*/rootfs/var/lib/mysql/*-mysql-router /var/lib/lxd/containers/*/rootfs/tmp/juju-exec*/script.sh -mount -type f -size -5000000c -o -size 5000000c 2>/dev/null | tar -pcf /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/juju-dump-ce11e8be-cb51-4b5b-8555-7b5ab370f4fa.tar --files-from - 2>/dev/null'" failed
2021-08-09 18:06:10,959 - Command "timeout 45s ssh -o StrictHostKeyChecking=no -i ~/.local/share/juju/ssh/juju_id_rsa [email protected] 'find /etc/alternatives /etc/ceilometer /etc/ceph /etc/cinder /etc/cloud /etc/glance /etc/gnocchi /etc/keystone /etc/netplan /etc/network /etc/neutron /etc/nova /etc/quantum /etc/swift /etc/udev/rules.d /lib/udev/rules.d /opt/nedge/var/log /run/cloud-init /usr/share/lxc/config /var/lib/charm /var/lib/libvirt/filesystems/plumgrid-data/log /var/lib/libvirt/filesystems/plumgrid/var/log /var/lib/cloud/seed /var/log /var/snap/simplestreams/common/sstream-mirror-glance.log /var/crash /var/snap/juju-db/common/logs/ /var/lib/mysql/*-mysql-router /tmp/juju-exec*/script.sh /var/lib/lxd/containers/*/rootfs/etc/alternatives /var/lib/lxd/containers/*/rootfs/etc/ceilometer /var/lib/lxd/containers/*/rootfs/etc/ceph /var/lib/lxd/containers/*/rootfs/etc/cinder /var/lib/lxd/containers/*/rootfs/etc/cloud /var/lib/lxd/containers/*/rootfs/etc/glance /var/lib/lxd/containers/*/rootfs/etc/gnocchi /var/lib/lxd/containers/*/rootfs/etc/keystone /var/lib/lxd/containers/*/rootfs/etc/netplan /var/lib/lxd/containers/*/rootfs/etc/network /var/lib/lxd/containers/*/rootfs/etc/neutron /var/lib/lxd/containers/*/rootfs/etc/nova /var/lib/lxd/containers/*/rootfs/etc/quantum /var/lib/lxd/containers/*/rootfs/etc/swift /var/lib/lxd/containers/*/rootfs/etc/udev/rules.d /var/lib/lxd/containers/*/rootfs/lib/udev/rules.d /var/lib/lxd/containers/*/rootfs/opt/nedge/var/log /var/lib/lxd/containers/*/rootfs/run/cloud-init /var/lib/lxd/containers/*/rootfs/usr/share/lxc/config /var/lib/lxd/containers/*/rootfs/var/lib/charm /var/lib/lxd/containers/*/rootfs/var/lib/libvirt/filesystems/plumgrid-data/log /var/lib/lxd/containers/*/rootfs/var/lib/libvirt/filesystems/plumgrid/var/log /var/lib/lxd/containers/*/rootfs/var/lib/cloud/seed /var/lib/lxd/containers/*/rootfs/var/log /var/lib/lxd/containers/*/rootfs/var/snap/simplestreams/common/sstream-mirror-glance.log 
/var/lib/lxd/containers/*/rootfs/var/crash /var/lib/lxd/containers/*/rootfs/var/snap/juju-db/common/logs/ /var/lib/lxd/containers/*/rootfs/var/lib/mysql/*-mysql-router /var/lib/lxd/containers/*/rootfs/tmp/juju-exec*/script.sh -mount -type f -size -5000000c -o -size 5000000c 2>/dev/null | tar -pcf /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/juju-dump-ce11e8be-cb51-4b5b-8555-7b5ab370f4fa.tar --files-from - 2>/dev/null'" failed
2021-08-09 18:06:11,442 - Command "timeout 45s ssh -o StrictHostKeyChecking=no -i ~/.local/share/juju/ssh/juju_id_rsa [email protected] 'find /etc/alternatives /etc/ceilometer /etc/ceph /etc/cinder /etc/cloud /etc/glance /etc/gnocchi /etc/keystone /etc/netplan /etc/network /etc/neutron /etc/nova /etc/quantum /etc/swift /etc/udev/rules.d /lib/udev/rules.d /opt/nedge/var/log /run/cloud-init /usr/share/lxc/config /var/lib/charm /var/lib/libvirt/filesystems/plumgrid-data/log /var/lib/libvirt/filesystems/plumgrid/var/log /var/lib/cloud/seed /var/log /var/snap/simplestreams/common/sstream-mirror-glance.log /var/crash /var/snap/juju-db/common/logs/ /var/lib/mysql/*-mysql-router /tmp/juju-exec*/script.sh /var/lib/lxd/containers/*/rootfs/etc/alternatives /var/lib/lxd/containers/*/rootfs/etc/ceilometer /var/lib/lxd/containers/*/rootfs/etc/ceph /var/lib/lxd/containers/*/rootfs/etc/cinder /var/lib/lxd/containers/*/rootfs/etc/cloud /var/lib/lxd/containers/*/rootfs/etc/glance /var/lib/lxd/containers/*/rootfs/etc/gnocchi /var/lib/lxd/containers/*/rootfs/etc/keystone /var/lib/lxd/containers/*/rootfs/etc/netplan /var/lib/lxd/containers/*/rootfs/etc/network /var/lib/lxd/containers/*/rootfs/etc/neutron /var/lib/lxd/containers/*/rootfs/etc/nova /var/lib/lxd/containers/*/rootfs/etc/quantum /var/lib/lxd/containers/*/rootfs/etc/swift /var/lib/lxd/containers/*/rootfs/etc/udev/rules.d /var/lib/lxd/containers/*/rootfs/lib/udev/rules.d /var/lib/lxd/containers/*/rootfs/opt/nedge/var/log /var/lib/lxd/containers/*/rootfs/run/cloud-init /var/lib/lxd/containers/*/rootfs/usr/share/lxc/config /var/lib/lxd/containers/*/rootfs/var/lib/charm /var/lib/lxd/containers/*/rootfs/var/lib/libvirt/filesystems/plumgrid-data/log /var/lib/lxd/containers/*/rootfs/var/lib/libvirt/filesystems/plumgrid/var/log /var/lib/lxd/containers/*/rootfs/var/lib/cloud/seed /var/lib/lxd/containers/*/rootfs/var/log /var/lib/lxd/containers/*/rootfs/var/snap/simplestreams/common/sstream-mirror-glance.log 
/var/lib/lxd/containers/*/rootfs/var/crash /var/lib/lxd/containers/*/rootfs/var/snap/juju-db/common/logs/ /var/lib/lxd/containers/*/rootfs/var/lib/mysql/*-mysql-router /var/lib/lxd/containers/*/rootfs/tmp/juju-exec*/script.sh -mount -type f -size -5000000c -o -size 5000000c 2>/dev/null | tar -pcf /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/juju-dump-ce11e8be-cb51-4b5b-8555-7b5ab370f4fa.tar --files-from - 2>/dev/null'" failed
2021-08-09 18:06:11,553 - Command "timeout 45s ssh -o StrictHostKeyChecking=no -i ~/.local/share/juju/ssh/juju_id_rsa [email protected] 'find /etc/alternatives /etc/ceilometer /etc/ceph /etc/cinder /etc/cloud /etc/glance /etc/gnocchi /etc/keystone /etc/netplan /etc/network /etc/neutron /etc/nova /etc/quantum /etc/swift /etc/udev/rules.d /lib/udev/rules.d /opt/nedge/var/log /run/cloud-init /usr/share/lxc/config /var/lib/charm /var/lib/libvirt/filesystems/plumgrid-data/log /var/lib/libvirt/filesystems/plumgrid/var/log /var/lib/cloud/seed /var/log /var/snap/simplestreams/common/sstream-mirror-glance.log /var/crash /var/snap/juju-db/common/logs/ /var/lib/mysql/*-mysql-router /tmp/juju-exec*/script.sh /var/lib/lxd/containers/*/rootfs/etc/alternatives /var/lib/lxd/containers/*/rootfs/etc/ceilometer /var/lib/lxd/containers/*/rootfs/etc/ceph /var/lib/lxd/containers/*/rootfs/etc/cinder /var/lib/lxd/containers/*/rootfs/etc/cloud /var/lib/lxd/containers/*/rootfs/etc/glance /var/lib/lxd/containers/*/rootfs/etc/gnocchi /var/lib/lxd/containers/*/rootfs/etc/keystone /var/lib/lxd/containers/*/rootfs/etc/netplan /var/lib/lxd/containers/*/rootfs/etc/network /var/lib/lxd/containers/*/rootfs/etc/neutron /var/lib/lxd/containers/*/rootfs/etc/nova /var/lib/lxd/containers/*/rootfs/etc/quantum /var/lib/lxd/containers/*/rootfs/etc/swift /var/lib/lxd/containers/*/rootfs/etc/udev/rules.d /var/lib/lxd/containers/*/rootfs/lib/udev/rules.d /var/lib/lxd/containers/*/rootfs/opt/nedge/var/log /var/lib/lxd/containers/*/rootfs/run/cloud-init /var/lib/lxd/containers/*/rootfs/usr/share/lxc/config /var/lib/lxd/containers/*/rootfs/var/lib/charm /var/lib/lxd/containers/*/rootfs/var/lib/libvirt/filesystems/plumgrid-data/log /var/lib/lxd/containers/*/rootfs/var/lib/libvirt/filesystems/plumgrid/var/log /var/lib/lxd/containers/*/rootfs/var/lib/cloud/seed /var/lib/lxd/containers/*/rootfs/var/log /var/lib/lxd/containers/*/rootfs/var/snap/simplestreams/common/sstream-mirror-glance.log 
/var/lib/lxd/containers/*/rootfs/var/crash /var/lib/lxd/containers/*/rootfs/var/snap/juju-db/common/logs/ /var/lib/lxd/containers/*/rootfs/var/lib/mysql/*-mysql-router /var/lib/lxd/containers/*/rootfs/tmp/juju-exec*/script.sh -mount -type f -size -5000000c -o -size 5000000c 2>/dev/null | tar -pcf /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/juju-dump-ce11e8be-cb51-4b5b-8555-7b5ab370f4fa.tar --files-from - 2>/dev/null'" failed
2021-08-09 18:06:14,029 - Command "timeout 45s ssh -o StrictHostKeyChecking=no -i ~/.local/share/juju/ssh/juju_id_rsa [email protected] 'tar --append -f /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/juju-dump-ce11e8be-cb51-4b5b-8555-7b5ab370f4fa.tar -C /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/cmd_output . || true;tar --append -f /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/juju-dump-ce11e8be-cb51-4b5b-8555-7b5ab370f4fa.tar -C /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/ journalctl || true;tar --append -f /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/juju-dump-ce11e8be-cb51-4b5b-8555-7b5ab370f4fa.tar -C /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/addon_output . || true'" failed
2021-08-09 18:06:14,036 - Command "timeout 45s ssh -o StrictHostKeyChecking=no -i ~/.local/share/juju/ssh/juju_id_rsa [email protected] 'tar --append -f /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/juju-dump-ce11e8be-cb51-4b5b-8555-7b5ab370f4fa.tar -C /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/cmd_output . || true;tar --append -f /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/juju-dump-ce11e8be-cb51-4b5b-8555-7b5ab370f4fa.tar -C /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/ journalctl || true;tar --append -f /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/juju-dump-ce11e8be-cb51-4b5b-8555-7b5ab370f4fa.tar -C /tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/addon_output . || true'" failed
2021-08-09 18:06:17,102 - Command "scp -o StrictHostKeyChecking=no -i ~/.local/share/juju/ssh/juju_id_rsa [email protected]:/tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/juju-dump-ce11e8be-cb51-4b5b-8555-7b5ab370f4fa.tar a4cb1336-c7c6-4e1d-b238-ed7478c40f1f.tar" failed
2021-08-09 18:06:17,105 - Command "scp -o StrictHostKeyChecking=no -i ~/.local/share/juju/ssh/juju_id_rsa [email protected]:/tmp/ce11e8be-cb51-4b5b-8555-7b5ab370f4fa/juju-dump-ce11e8be-cb51-4b5b-8555-7b5ab370f4fa.tar f6b8e6df-e43e-43fc-9506-c4f5294c0e5f.tar" failed
2021-08-09 18:06:23,904 - juju-crashdump finished.

Important information is missing from the crashdump: the debug-layer and config addons did not add their output to it as they were supposed to. I can't troubleshoot this, because the only thing juju-crashdump's own output tells me is that the commands "failed".

Command output is being sent to /dev/null in several places (note the `2>/dev/null` redirections in the `find` and `tar` commands above). Additionally, in the failure case, stdout is captured but never logged. Being able to see the command output would make it much easier to troubleshoot further.
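A minimal sketch of the requested behaviour, assuming a Python helper (the name `run_cmd` is hypothetical and is not juju-crashdump's actual code): run the command with its output captured and, when it exits non-zero, log the exit code, stdout and stderr next to the existing "failed" message. The log format mirrors the `timestamp - message` lines shown above.

```python
import logging
import subprocess

log = logging.getLogger(__name__)


def run_cmd(cmd):
    """Run a shell command; on failure, log its exit code, stdout and stderr.

    Hypothetical helper for illustration only. Returns the CompletedProcess
    so callers can still inspect the captured output themselves.
    """
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if proc.returncode != 0:
        log.warning('Command "%s" failed with exit code %d', cmd, proc.returncode)
        # Log both streams instead of silently discarding them.
        if proc.stdout:
            log.warning("stdout:\n%s", proc.stdout.rstrip())
        if proc.stderr:
            log.warning("stderr:\n%s", proc.stderr.rstrip())
    return proc


if __name__ == "__main__":
    # Same "timestamp - message" layout as the juju-crashdump log above.
    logging.basicConfig(format="%(asctime)s - %(message)s", level=logging.INFO)
    # A deliberately failing command: both streams end up in the log.
    run_cmd("echo partial output; echo 'no such file' >&2; exit 2")
```

This only surfaces what reaches the local process: a `2>/dev/null` inside the remote `ssh` command would still discard the remote stderr, so those redirections would also need to be dropped for remote errors to appear in the log.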

feature request: provide `show-status-log -n 100` data on all units

The unit status history is very helpful for diagnosing deployments after the fact. This is a feature request to start collecting that information for all units, for example:

juju show-status-log keystone/0 -n 100
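A minimal sketch of how this could be gathered for every unit, assuming the `juju` CLI is on the PATH and that `juju status --format=json` nests units under `applications -> <app> -> units`, with subordinate units under a `subordinates` key. The function names `unit_names` and `collect_status_logs` are hypothetical and not part of juju-crashdump.

```python
import json
import subprocess


def unit_names(status):
    """Return every unit name (principals and subordinates) found in a
    `juju status --format=json` document, in the order they appear.

    Assumed layout: applications -> <app> -> units -> <unit>, with
    subordinate units nested under that unit's "subordinates" key.
    """
    names = []
    for app in status.get("applications", {}).values():
        # Subordinate applications have no "units" key of their own;
        # their units are listed under the principal unit instead.
        for unit, info in app.get("units", {}).items():
            names.append(unit)
            names.extend(info.get("subordinates", {}))
    return names


def collect_status_logs(model=None, lines=100):
    """Run `juju show-status-log <unit> -n <lines>` for every unit.

    Hypothetical helper: returns {unit: output}, keeping stderr instead of
    stdout when the command fails so the failure reason is not lost.
    """
    model_args = ["-m", model] if model else []
    status_json = subprocess.run(
        ["juju", "status", "--format=json", *model_args],
        check=True, capture_output=True, text=True,
    ).stdout
    logs = {}
    for unit in unit_names(json.loads(status_json)):
        res = subprocess.run(
            ["juju", "show-status-log", *model_args, unit, "-n", str(lines)],
            capture_output=True, text=True,
        )
        logs[unit] = res.stdout if res.returncode == 0 else res.stderr
    return logs
```

For example, `collect_status_logs(model="openstack")` (a hypothetical model name) would return a dict mapping each unit name to its last 100 status-log entries, or to the error text if the command failed. Each entry could then be written to its own file inside the crashdump.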
