A sophisticated low memory handler for Linux

License: MIT License

nohang

The nohang package provides a highly configurable daemon for Linux that can correctly prevent out-of-memory (OOM) conditions and keep the system responsive when memory is low.

The package also includes additional diagnostic tools (oom-sort, psi2log, psi-top).

What is the problem?

OOM conditions may cause freezes and livelocks, drop caches, and lead to processes being killed (via SIGKILL) instead of being terminated correctly (via SIGTERM or another corrective action). Some applications may crash if memory cannot be allocated.

Here are the statements of some users:

"How do I prevent Linux from freezing when out of memory? Today I (accidentally) ran some program on my Linux box that quickly used a lot of memory. My system froze, became unresponsive and thus I was unable to kill the offender. How can I prevent this in the future? Can't it at least keep a responsive core or something running?"

serverfault

"With or without swap it still freezes before the OOM killer gets run automatically. This is really a kernel bug that should be fixed (i.e. run OOM killer earlier, before dropping all disk cache). Unfortunately kernel developers and a lot of other folk fail to see the problem. Common suggestions such as disable/enable swap, buy more RAM, run less processes, set limits etc. do not address the underlying problem that the kernel's low memory handling sucks camel's balls."

serverfault

"The traditional Linux OOM killer works fine in some cases, but in others it kicks in too late, resulting in the system entering a livelock for an indeterminate period."

engineering.fb.com

Also look at these discussions:

Solution

Use one of the userspace OOM killers:

  • earlyoom: This is a simple, stable and tiny OOM prevention daemon written in C (the best choice for embedded systems and old servers). It has minimal dependencies and works with old kernels. It is enabled by default on Fedora 32 Workstation (and F33 KDE).
  • oomd: This is a userspace OOM killer for Linux systems written in C++ and developed by Facebook. This is the best choice for use in large data centers. It needs Linux 4.20+.
  • systemd-oomd: Provided by systemd as systemd-oomd.service that uses cgroups-v2 and pressure stall information (PSI) to monitor and take action on processes before an OOM occurs in kernel space. It's used by default on desktop versions of Fedora 34.
  • low-memory-monitor: There's a project announcement.
  • psi-monitor: It's used by default on Endless OS.
  • nohang: nohang is earlyoom on steroids and has many useful features, see below. It may be a good choice for modern desktops and servers if you need fine-tuning. Previously it was used by default on Garuda Linux.

Use these tools to improve responsiveness during heavy swapping:

  • MGLRU: The patchset was merged in Linux 6.1. Setting min_ttl_ms > 50 can help.
  • le9-patch: [PATCH] mm: Protect clean file pages under memory pressure to prevent thrashing, avoid high latency and prevent livelock in near-OOM conditions. It's a kernel-side solution that can fix the OOM killer behavior.
  • prelockd: Lock executables and shared libraries in memory to improve system responsiveness under low-memory conditions.
  • memavaild: Maintain the amount of available memory by evicting memory of selected cgroups into swap space.
  • uresourced: This daemon will give resource allocations to active graphical users. It's enabled by default on Fedora 33 Workstation.

Of course, you can also download more RAM, tune virtual memory, use zram/zswap and use limits for cgroups.

Features

  • Sending the SIGTERM signal is the default corrective action. If the victim does not respond to SIGTERM and available memory keeps dropping, it gets SIGKILL;
  • Customizing victim selection: influence the badness of processes by matching their names, cgroups, exe realpaths, environs, cmdlines and euids against specified regular expressions;
  • Customizing corrective actions: if the name or control group of the victim matches a certain regex pattern, you can run any command instead of sending the SIGTERM signal (the default corrective action) to the victim. For example:
    • systemctl restart foo;
    • kill -INT $PID (you can override the signal sent to the victim, $PID will be replaced by the victim's PID).
  • GUI notifications:
    • Notification of corrective actions taken and displaying the name and PID of the victim;
    • Low memory warnings.
  • zram support (mem_used_total as a trigger);
  • PSI (pressure stall information) support;
  • Optional checking of kernel messages for OOM events;
  • Easy setup with configuration files (nohang.conf, nohang-desktop.conf).
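
The two customization features above (regex-based badness correction and command substitution) can be illustrated with a minimal Python sketch. This is not nohang's actual implementation: the rule list, the adjustment values, the clamping to zero and the helper names adjusted_badness and build_command are all assumptions made for illustration. Only the idea of matching names against regular expressions and replacing $PID with the victim's PID comes from the feature list.

```python
import re

# Hypothetical rules: (compiled regex, badness adjustment).
# Patterns and values are illustrative, not taken from nohang.conf.
NAME_RULES = [
    (re.compile(r"^(Xorg|sshd)$"), -300),   # protect these names
    (re.compile(r"^Web Content$"), 200),    # prefer these as victims
]


def adjusted_badness(oom_score, name, rules=NAME_RULES):
    """Shift oom_score by the adjustment of every rule whose regex matches name."""
    badness = oom_score
    for pattern, adjustment in rules:
        if pattern.search(name):
            badness += adjustment
    # Assumption of this sketch: badness is clamped so it never goes negative.
    return max(badness, 0)


def build_command(template, pid):
    """Replace the $PID placeholder in a command template with the victim's PID."""
    return template.replace("$PID", str(pid))
```

With these hypothetical rules, a process named Xorg is pushed down to a badness of 0, while a process named kate keeps its original oom_score because no rule matches it. build_command("kill -INT $PID", 27275) yields the string "kill -INT 27275".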

Demo

nohang prevents Out Of Memory with GUI notifications:

Requirements

For basic usage:

  • Linux (>= 3.14, since MemAvailable appeared in /proc/meminfo)
  • Python (>= 3.3)
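
As a rough illustration of why MemAvailable matters, the following Python sketch parses text in the /proc/meminfo format and sums MemAvailable and SwapFree, the same quantity the --memload option refers to. The function names parse_meminfo and free_mib and the sample values are assumptions for this example, not part of nohang.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def free_mib(meminfo):
    """Return MemAvailable + SwapFree in MiB.

    Raises KeyError if MemAvailable is missing, as on kernels older than 3.14.
    """
    return (meminfo["MemAvailable"] + meminfo["SwapFree"]) / 1024


# Made-up sample in the /proc/meminfo layout.
SAMPLE = """\
MemTotal:        8000000 kB
MemAvailable:    2048000 kB
SwapFree:        1024000 kB
HugePages_Total:       0
"""
```

On a real system, the text would come from reading /proc/meminfo and passing its contents to parse_meminfo.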

To respond to PSI metrics (optional):

  • Linux (>= 4.20) with CONFIG_PSI=y
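
For orientation, kernels built with PSI expose memory pressure in /proc/pressure/memory as lines of the form "some avg10=... avg60=... avg300=... total=...". The sketch below parses that format. The function name parse_psi and the sample numbers are assumptions for illustration, and this is not how nohang itself necessarily reads the file.

```python
def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}} with float values."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, pairs = parts[0], parts[1:]
        # Each pair looks like "avg10=1.50"; "total" is in microseconds.
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


# Made-up sample in the /proc/pressure/memory layout.
SAMPLE = """\
some avg10=1.50 avg60=0.75 avg300=0.10 total=123456
full avg10=0.50 avg60=0.25 avg300=0.00 total=7890
"""
```

A monitoring loop could compare, for example, the "full" avg10 value against a threshold, but the choice of metric and threshold is left to the configuration.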

To show GUI notifications (optional):

  • a notification server (most desktop environments use their own implementations)
  • libnotify (Arch Linux, Fedora, openSUSE) or libnotify-bin (Debian GNU/Linux, Ubuntu)
  • sudo, if nohang is started with UID=0.

Memory and CPU usage

  • VmRSS is about 10–14 MiB depending on the settings: about 10–11 MiB by default (with Python <= 3.8), about 16–17 MiB with Python 3.9.
  • CPU usage depends on the level of available memory and monitoring intensity.

Warnings

  • the daemon runs with super-user privileges and has full access to all private memory of all processes and sensitive user data;
  • the daemon does not forbid you to shoot yourself in the foot: with some settings, unwanted killings of processes can occur;
  • the daemon is not a panacea: there are no universal settings that reliably protect against all types of threats.

Known problems

  • The documentation is terrible.
  • The ZFS ARC cache is memory-reclaimable, like the Linux buffer cache. However, in contrast to the buffer cache, it is currently not counted in MemAvailable (see openzfs/zfs#10255). See also rfjakob/earlyoom#191 and #89.
  • Linux kernels without CONFIG_CGROUP_CPUACCT=y (linux-ck, for example) provide incorrect PSI metrics, see issue.

nohang vs nohang-desktop

nohang comes with two configs (nohang.conf and nohang-desktop.conf) and two systemd service unit files (nohang.service and nohang-desktop.service). Choose one.

  • nohang.conf provides vanilla default settings without PSI checking enabled, without any badness correction and without GUI notifications enabled.
  • nohang-desktop.conf provides default settings optimized for desktop usage.

How to install

To install on Fedora:

Orphaned for 6+ weeks, not available.

To install on RHEL 7 and RHEL 8:

nohang is available in EPEL repos.

sudo yum install nohang
sudo systemctl enable nohang.service
sudo systemctl start nohang.service

To enable PSI on RHEL 8, pass psi=1 on the kernel boot command line.

For Arch Linux there's an AUR package

Use your favorite AUR helper. For example,

yay -S nohang-git
sudo systemctl enable --now nohang-desktop.service

To install on Ubuntu 20.04/20.10

To install from PPA:

sudo add-apt-repository ppa:oibaf/test
sudo apt update
sudo apt install nohang
sudo systemctl enable --now nohang-desktop.service

To install on Debian and Ubuntu-based systems:

The outdated and buggy nohang v0.1 release was packaged for Debian 11 and Ubuntu 20.10.

It's easy to build a deb package with the latest git snapshot. Install build dependencies:

sudo apt install make fakeroot

Clone the latest git snapshot and run the build script to build the package:

git clone https://github.com/hakavlad/nohang.git && cd nohang
deb/build.sh

Install the package:

sudo apt install --reinstall ./deb/package.deb

Start and enable nohang.service or nohang-desktop.service after installing the package:

sudo systemctl enable --now nohang-desktop.service

To install on Gentoo and derivatives (e.g. Funtoo):

Add the eph kit overlay, for example using layman or as a local repository. Then update your repos:

sudo layman -S # if added via layman
sudo emerge --sync # local repo on Gentoo
sudo ego sync # local repo on Funtoo

Install:

sudo emerge -a nohang

Start the service:

sudo rc-service nohang-desktop start

Optionally add to startup:

sudo rc-update add nohang-desktop default

To install the latest version on any distro:

git clone https://github.com/hakavlad/nohang.git && cd nohang
sudo make install

Config files will be located in /usr/local/etc/nohang/. To enable and start the unit without GUI notifications:

sudo systemctl enable --now nohang.service

To enable and start the unit with GUI notifications:

sudo systemctl enable --now nohang-desktop.service

On systems with OpenRC:

sudo make install-openrc

To uninstall:

sudo make uninstall

Command line options

./nohang -h
usage: nohang [-h|--help] [-v|--version] [-m|--memload]
              [-c|--config CONFIG] [--check] [--monitor] [--tasks]

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show version of installed package and exit
  -m, --memload         consume memory until 40 MiB (MemAvailable + SwapFree)
                        remain free, and terminate the process
  -c CONFIG, --config CONFIG
                        path to the config file. This should only be used
                        with one of the following options:
                        --monitor, --tasks, --check
  --check               check and show the configuration and exit. This should
                        only be used with -c/--config CONFIG option
  --monitor             start monitoring. This should only be used with
                        -c/--config CONFIG option
  --tasks               show tasks state and exit. This should only be used
                        with -c/--config CONFIG option

How to configure

The program can be configured by editing the config file. The configuration includes the following sections:

  1. Checking kernel messages for OOM events;
  2. Common zram settings;
  3. Common PSI settings;
  4. Poll rate;
  5. Warnings and notifications;
  6. Soft threshold;
  7. Hard threshold;
  8. Customize victim selection;
  9. Customize soft corrective actions;
  10. Misc settings;
  11. Verbosity, debug, logging.

Just read the description of the parameters and edit the values. Please restart the daemon to apply the changes.
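
To give a feel for how the warning, soft threshold and hard threshold sections relate to each other, here is a minimal Python sketch of a threshold check. The function name choose_action, its parameters and the default percentages are hypothetical and are not the parameter names or default values used in nohang.conf. The sketch only mirrors the documented behavior that SIGTERM is the default (soft) action and SIGKILL follows when memory drops further.

```python
def choose_action(free_percent, warn=20.0, soft=10.0, hard=5.0):
    """Pick an action for the given percentage of free memory.

    All thresholds are hypothetical illustration values, expressed as a
    percentage of total memory that is still available.
    """
    if not hard <= soft <= warn:
        raise ValueError("thresholds must satisfy hard <= soft <= warn")
    if free_percent <= hard:
        return "SIGKILL"   # hard threshold: kill the victim
    if free_percent <= soft:
        return "SIGTERM"   # soft threshold: ask the victim to exit
    if free_percent <= warn:
        return "warn"      # low memory warning only
    return "none"
```

A real configuration distinguishes more triggers (swap, zram, PSI), so treat this only as a mental model of the ordering between the thresholds.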

How to test nohang

  • The safest way is to run nohang --memload. This causes memory consumption, and the process will exit before OOM occurs.
  • Another way is to run tail /dev/zero. This causes fast memory consumption and ends in OOM.

If testing occurs while nohang is running, these processes should be terminated before OOM occurs.

Tasks state

Run sudo nohang -c/--config CONFIG --tasks to see the table of processes with their badness values, oom_scores, names, UIDs, etc.

Output example
Config: /etc/nohang/nohang.conf
###################################################################################################################
#    PID     PPID  badness  oom_score  oom_score_adj        eUID  S  VmSize  VmRSS  VmSwap  Name             CGroup
#-------  -------  -------  ---------  -------------  ----------  -  ------  -----  ------  ---------------  --------
#    336        1        1          1              0           0  S      85     25       0  systemd-journal  /system.slice/systemd-journald.service
#    383        1        0          0          -1000           0  S      46      5       0  systemd-udevd    /system.slice/systemd-udevd.service
#    526     2238        7          7              0        1000  S     840     96       0  kate             /user.slice/user-1000.slice/session-7.scope
#    650        1        3          3              0        1000  S     760     50       0  kate             /user.slice/user-1000.slice/session-7.scope
#    731        1        0          0              0         100  S     126      4       0  systemd-timesyn  /system.slice/systemd-timesyncd.service
#    756        1        0          0              0         105  S     181      3       0  rtkit-daemon     /system.slice/rtkit-daemon.service
#    759        1        0          0              0           0  S     277      7       0  accounts-daemon  /system.slice/accounts-daemon.service
#    761        1        0          0              0           0  S     244      3       0  rsyslogd         /system.slice/rsyslog.service
#    764        1        0          0           -900         108  S      45      5       0  dbus-daemon      /system.slice/dbus.service
#    805        1        0          0              0           0  S      46      5       0  systemd-logind   /system.slice/systemd-logind.service
#    806        1        0          0              0           0  S      35      3       0  irqbalance       /system.slice/irqbalance.service
#    813        1        0          0              0           0  S      29      3       0  cron             /system.slice/cron.service
#    814        1       11         11              0           0  S     176    160       0  memlockd         /system.slice/memlockd.service
#    815        1        0          0            -10           0  S      32      9       0  python3          /fork.slice/fork-bomb.slice/fork-bomb-killer.slice/fork-bomb-killer.service
#    823        1        0          0              0           0  S      25      4       0  smartd           /system.slice/smartd.service
#    826        1        0          0              0         113  S      46      3       0  avahi-daemon     /system.slice/avahi-daemon.service
#    850      826        0          0              0         113  S      46      0       0  avahi-daemon     /system.slice/avahi-daemon.service
#    868        1        0          0              0           0  S     281      8       0  polkitd          /system.slice/polkit.service
#    903        1        1          1              0           0  S    4094     16       0  stunnel4         /system.slice/stunnel4.service
#    940        1        0          0           -600           0  S      39     10       0  python3          /nohang.slice/nohang.service
#   1014        1        0          0              0          13  S      22      2       0  obfs-local       /system.slice/obfs-local.service
#   1015        1        0          0              0        1000  S      36      4       0  ss-local         /system.slice/ss-local.service
#   1023        1        0          0              0         116  S      33      2       0  dnscrypt-proxy   /system.slice/dnscrypt-proxy.service
#   1029        1        1          1              0         119  S    4236     16       0  privoxy          /system.slice/privoxy.service
#   1035        1        0          0              0           0  S     355      6       0  lightdm          /system.slice/lightdm.service
#   1066        1        0          0              0           0  S      45      7       0  wpa_supplicant   /system.slice/wpa_supplicant.service
#   1178        1        0          0              0           0  S      14      2       0  agetty           /system.slice/system-getty.slice/[email protected]
#   1294        1        0          0          -1000           0  S       4      1       0  watchdog         /system.slice/watchdog.service
#   1632        1        1          1              0        1000  S    1391     22       0  pulseaudio       /user.slice/user-1000.slice/session-2.scope
#   1689     1632        0          0              0        1000  S     125      5       0  gconf-helper     /user.slice/user-1000.slice/session-2.scope
#   1711        1        0          0              0           0  S     367      8       0  udisksd          /system.slice/udisks2.service
#   1819        1        0          0              0           0  S     304      8       0  upowerd          /system.slice/upower.service
#   1879        1        0          0              0        1000  S      64      7       0  systemd          /user.slice/user-1000.slice/[email protected]/init.scope
#   1880     1879        0          0              0        1000  S     229      2       0  (sd-pam)         /user.slice/user-1000.slice/[email protected]/init.scope
#   1888        1        0          0              0           0  S      14      2       0  agetty           /system.slice/system-getty.slice/[email protected]
#   1889        1        0          0              0           0  S      14      2       0  agetty           /system.slice/system-getty.slice/[email protected]
#   1890        1        0          0              0           0  S      14      2       0  agetty           /system.slice/system-getty.slice/[email protected]
#   1891        1        0          0              0           0  S      14      2       0  agetty           /system.slice/system-getty.slice/[email protected]
#   1892        1        0          0              0           0  S      14      2       0  agetty           /system.slice/system-getty.slice/[email protected]
#   1893     1035       14         14              0           0  R     623    208       0  Xorg             /system.slice/lightdm.service
#   1904        1        0          0              0         111  S      64      7       0  systemd          /user.slice/user-111.slice/[email protected]/init.scope
#   1905     1904        0          0              0         111  S     229      2       0  (sd-pam)         /user.slice/user-111.slice/[email protected]/init.scope
#   1916     1904        0          0              0         111  S      44      3       0  dbus-daemon      /user.slice/user-111.slice/[email protected]/dbus.service
#   1920        1        0          0              0         111  S     215      5       0  at-spi2-registr  /user.slice/user-111.slice/session-c2.scope
#   1922     1904        0          0              0         111  S     278      6       0  gvfsd            /user.slice/user-111.slice/[email protected]/gvfs-daemon.service
#   1935     1035        0          0              0           0  S     238      6       0  lightdm          /user.slice/user-1000.slice/session-7.scope
#   1942        1        0          0              0        1000  S     210      9       0  gnome-keyring-d  /user.slice/user-1000.slice/session-7.scope
#   1944     1935        1          1              0        1000  S     411     21       0  mate-session     /user.slice/user-1000.slice/session-7.scope
#   1952     1879        0          0              0        1000  S      45      5       0  dbus-daemon      /user.slice/user-1000.slice/[email protected]/dbus.service
#   1981     1944        0          0              0        1000  S      11      0       0  ssh-agent        /user.slice/user-1000.slice/session-7.scope
#   1984     1879        0          0              0        1000  S     278      6       0  gvfsd            /user.slice/user-1000.slice/[email protected]/gvfs-daemon.service
#   1990     1879        0          0              0        1000  S     341      5       0  at-spi-bus-laun  /user.slice/user-1000.slice/[email protected]/at-spi-dbus-bus.service
#   1995     1990        0          0              0        1000  S      44      4       0  dbus-daemon      /user.slice/user-1000.slice/[email protected]/at-spi-dbus-bus.service
#   1997     1879        0          0              0        1000  S     215      5       0  at-spi2-registr  /user.slice/user-1000.slice/[email protected]/at-spi-dbus-bus.service
#   2000     1879        0          0              0        1000  S     184      5       0  dconf-service    /user.slice/user-1000.slice/[email protected]/dbus.service
#   2009     1944        2          2              0        1000  S    1308     35       0  mate-settings-d  /user.slice/user-1000.slice/session-7.scope
#   2013     1944        2          2              0        1000  S     436     32       0  marco            /user.slice/user-1000.slice/session-7.scope
#   2024     1944        4          4              0        1000  S    1258     55       0  caja             /user.slice/user-1000.slice/session-7.scope
#   2032        1        1          1              0        1000  S     333     18       0  msd-locate-poin  /user.slice/user-1000.slice/session-7.scope
#   2033     1879        0          0              0        1000  S     348     11       0  gvfs-udisks2-vo  /user.slice/user-1000.slice/[email protected]/gvfs-udisks2-volume-monitor.service
#   2036     1944        1          1              0        1000  S     331     17       0  polkit-mate-aut  /user.slice/user-1000.slice/session-7.scope
#   2038     1944        5          5              0        1000  S     682     78       0  mate-panel       /user.slice/user-1000.slice/session-7.scope
#   2041     1944        2          2              0        1000  S     514     31       0  nm-applet        /user.slice/user-1000.slice/session-7.scope
#   2046     1944        1          1              0        1000  S     495     25       0  mate-power-mana  /user.slice/user-1000.slice/session-7.scope
#   2047     1944        2          2              0        1000  S     692     32       0  mate-volume-con  /user.slice/user-1000.slice/session-7.scope
#   2049     1944        3          3              0        1000  S     548     44       0  mate-screensave  /user.slice/user-1000.slice/session-7.scope
#   2059     1879        0          0              0        1000  S     263      5       0  gvfs-goa-volume  /user.slice/user-1000.slice/[email protected]/gvfs-goa-volume-monitor.service
#   2076     1879        0          0              0        1000  S     352      7       0  gvfsd-trash      /user.slice/user-1000.slice/[email protected]/gvfs-daemon.service
#   2077     1879        0          0              0        1000  S     362      7       0  gvfs-afc-volume  /user.slice/user-1000.slice/[email protected]/gvfs-afc-volume-monitor.service
#   2087     1879        0          0              0        1000  S     263      5       0  gvfs-mtp-volume  /user.slice/user-1000.slice/[email protected]/gvfs-mtp-volume-monitor.service
#   2093     1879        0          0              0        1000  S     275      6       0  gvfs-gphoto2-vo  /user.slice/user-1000.slice/[email protected]/gvfs-gphoto2-volume-monitor.service
#   2106     1879        3          3              0        1000  S     544     42       0  wnck-applet      /user.slice/user-1000.slice/[email protected]/dbus.service
#   2108     1879        1          1              0        1000  S     396     21       0  notification-ar  /user.slice/user-1000.slice/[email protected]/dbus.service
#   2112     1879        1          1              0        1000  S     499     25       0  mate-sensors-ap  /user.slice/user-1000.slice/[email protected]/dbus.service
#   2113     1879        1          1              0        1000  S     390     21       0  mate-brightness  /user.slice/user-1000.slice/[email protected]/dbus.service
#   2114     1879        1          1              0        1000  S     534     22       0  mate-multiload-  /user.slice/user-1000.slice/[email protected]/dbus.service
#   2118     1879        2          2              0        1000  S     547     29       0  clock-applet     /user.slice/user-1000.slice/[email protected]/dbus.service
#   2152     1879        1          1              0        1000  S     218     22       0  gvfsd-metadata   /user.slice/user-1000.slice/[email protected]/gvfs-metadata.service
#   2206        1        3          3              0         110  S     106     48       0  tor              /system.slice/system-tor.slice/[email protected]
#   2229        1        3          3              0        1000  S     999     42       0  kactivitymanage  /user.slice/user-1000.slice/session-7.scope
#   2238        1        0          0              0        1000  S     150      9       0  kdeinit5         /user.slice/user-1000.slice/session-7.scope
#   2239     2238        3          3              0        1000  S     648     41       0  klauncher        /user.slice/user-1000.slice/session-7.scope
#   3959        1        1          1              0           0  S     615     18       0  NetworkManager   /system.slice/NetworkManager.service
#   3977     3959        0          0              0           0  S      20      4       0  dhclient         /system.slice/NetworkManager.service
#   5626     1879        0          0              0        1000  S     355      7       0  gvfsd-network    /user.slice/user-1000.slice/[email protected]/gvfs-daemon.service
#   5637     1879        1          1              0        1000  S     623     14       0  gvfsd-smb-brows  /user.slice/user-1000.slice/[email protected]/gvfs-daemon.service
#   6296     1879        0          0              0        1000  S     435      7       0  gvfsd-dnssd      /user.slice/user-1000.slice/[email protected]/gvfs-daemon.service
#  11129     1879        3          3              0        1000  S     597     42       0  kded5            /user.slice/user-1000.slice/[email protected]/dbus.service
#  11136     1879        2          2              0        1000  S     639     39       0  kuiserver5       /user.slice/user-1000.slice/[email protected]/dbus.service
#  11703     1879        3          3              0        1000  S     500     45       0  mate-system-mon  /user.slice/user-1000.slice/[email protected]/dbus.service
#  16798     1879        0          0              0        1000  S     346     10       0  gvfsd-http       /user.slice/user-1000.slice/[email protected]/gvfs-daemon.service
#  18133        1        3          3              0        1000  S     760     49       0  kate             /user.slice/user-1000.slice/session-7.scope
#  18144     2038        1          1              0        1000  S     301     23       0  lxterminal       /user.slice/user-1000.slice/session-7.scope
#  18147    18144        0          0              0        1000  S      14      2       0  gnome-pty-helpe  /user.slice/user-1000.slice/session-7.scope
#  18148    18144        1          1              0        1000  S      42     26       0  bash             /user.slice/user-1000.slice/session-7.scope
#  18242     2238        1          1              0        1000  S     194     14       0  file.so          /user.slice/user-1000.slice/session-7.scope
#  18246    18148        0          0              0           0  S      54      4       0  sudo             /user.slice/user-1000.slice/session-7.scope
#  19003        1        0          0              0           0  S     310     12       0  packagekitd      /system.slice/packagekit.service
#  26993     2038       91         91              0        1000  S    3935   1256       0  firefox-esr      /user.slice/user-1000.slice/session-7.scope
#  27275    26993      121        121              0        1000  S    3957   1684       0  Web Content      /user.slice/user-1000.slice/session-7.scope
#  30374        1        1          1              0        1000  S     167     14       0  VBoxXPCOMIPCD    /user.slice/user-1000.slice/session-7.scope
#  30380        1        2          2              0        1000  S     958     27       0  VBoxSVC          /user.slice/user-1000.slice/session-7.scope
#  30549    30380       86         86              0        1000  S    5332   1192       0  VirtualBox       /user.slice/user-1000.slice/session-7.scope
#  30875        1        1          1              0        1000  S     345     26       0  leafpad          /user.slice/user-1000.slice/session-7.scope
#  32689        1        7          7              0        1000  S     896     99       0  dolphin          /user.slice/user-1000.slice/session-7.scope
###################################################################################################################
Process with highest badness (found in 55 ms):
  PID: 27275, Name: Web Content, badness: 121
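
The last two lines of the output report the process with the highest badness. Without any badness correction, this corresponds to the process with the highest oom_score. The Python sketch below shows one way to find it by scanning a /proc-like directory. The function names and the proc_root parameter are assumptions for illustration, and nohang's own search may work differently.

```python
import os


def read_first_line(path):
    """Return the first line of a file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.readline().strip()
    except OSError:
        return None  # the process exited or access was denied


def find_victim(proc_root="/proc"):
    """Return (oom_score, pid, name) of the process with the highest oom_score.

    Returns None if no process could be read. proc_root is a parameter so
    that the function can be tried against a fake directory tree.
    """
    best = None
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self"
        score = read_first_line(os.path.join(proc_root, entry, "oom_score"))
        name = read_first_line(os.path.join(proc_root, entry, "comm"))
        if name is None or not score or not score.isdigit():
            continue
        candidate = (int(score), int(entry), name)
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best
```

Reading the files individually and ignoring errors matters because processes can disappear between listing /proc and opening their files.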

Logging

To view the latest entries in the log (for systemd users):

sudo journalctl -eu nohang.service

# or

sudo journalctl -eu nohang-desktop.service

You can also enable separate_log in the config to log to /var/log/nohang/nohang.log.

oom-sort

oom-sort is an additional diagnostic tool that is installed with the nohang package. It sorts processes in descending order of their oom_score and also displays oom_score_adj, Uid, Pid, Name, VmRSS, VmSwap and optionally cmdline. Run oom-sort --help for more info. Man page: oom-sort.manpage.md.
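
The fields listed above are available from /proc/PID/status and /proc/PID/oom_score. As a rough sketch of the idea (not the actual oom-sort code), the following Python snippet extracts a few fields from status-formatted text and sorts rows by oom_score in descending order. The helper names, the sample text and the use of floor division for the MiB conversion are assumptions of this example.

```python
def parse_status(text, fields=("Name", "Uid", "VmRSS", "VmSwap")):
    """Extract selected fields from /proc/PID/status-formatted text."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in fields:
            info[key] = value.strip()
    return info


def kb_to_mib(value):
    """Convert a status value such as '59392 kB' to whole MiB (rounded down)."""
    return int(value.split()[0]) // 1024


def sort_by_oom_score(rows):
    """Sort dictionaries with an 'oom_score' key, highest score first."""
    return sorted(rows, key=lambda row: row["oom_score"], reverse=True)


# Made-up sample in the /proc/PID/status layout (tab-separated).
SAMPLE = "Name:\tXorg\nUid:\t0\t0\t0\t0\nVmRSS:\t   59392 kB\nVmSwap:\t   22528 kB\n"
```

Kernel threads have no VmRSS or VmSwap lines in their status file, so callers should use info.get("VmRSS") rather than assume the key exists.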

Usage:

oom-sort
Output example
oom_score oom_score_adj  UID   PID Name            VmRSS   VmSwap   cmdline
--------- ------------- ---- ----- --------------- ------- -------- -------
       23             0    0   964 Xorg               58 M     22 M /usr/libexec/Xorg -background none :0 vt01 -nolisten tcp -novtswitch -auth /var/run/lxdm/lxdm-:0.auth
       13             0 1000  1365 pcmanfm            38 M     10 M pcmanfm --desktop --profile LXDE
       10             0 1000  1408 dnfdragora-upda     9 M     27 M /usr/bin/python3 /bin/dnfdragora-updater
        5             0    0   822 firewalld           0 M     19 M /usr/bin/python3 /usr/sbin/firewalld --nofork --nopid
        5             0 1000  1364 lxpanel            18 M      2 M lxpanel --profile LXDE
        5             0 1000  1685 nm-applet           6 M     12 M nm-applet
        5             0 1000  1862 lxterminal         16 M      2 M lxterminal
        4             0  996   890 polkitd             8 M      6 M /usr/lib/polkit-1/polkitd --no-debug
        4             0 1000  1703 pnmixer             6 M     11 M pnmixer
        3             0    0   649 systemd-journal    10 M      1 M /usr/lib/systemd/systemd-journald
        3             0 1000  1360 openbox             9 M      2 M openbox --config-file /home/user/.config/openbox/lxde-rc.xml
        3             0 1000  1363 notification-da     3 M     10 M /usr/libexec/notification-daemon
        2             0 1000  1744 clipit              5 M      3 M clipit
        2             0 1000  2619 python3             9 M      0 M python3 /bin/oom-sort
        1             0    0   809 rsyslogd            3 M      3 M /usr/sbin/rsyslogd -n
        1             0    0   825 udisksd             2 M      2 M /usr/libexec/udisks2/udisksd
        1             0    0   873 sssd_nss            4 M      1 M /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
        1             0    0   876 systemd-logind      2 M      2 M /usr/lib/systemd/systemd-logind
        1             0    0   907 abrt-dump-journ     2 M      1 M /usr/bin/abrt-dump-journal-oops -fxtD
        1             0    0   920 NetworkManager      3 M      2 M /usr/sbin/NetworkManager --no-daemon
        1             0 1000  1115 systemd             4 M      1 M /usr/lib/systemd/systemd --user
        1             0 1000  1118 (sd-pam)            0 M      5 M (sd-pam)
        1             0 1000  1366 xscreensaver        5 M      0 M xscreensaver -no-splash
        1             0 1000  1851 gvfsd-trash         3 M      1 M /usr/libexec/gvfsd-trash --spawner :1.6 /org/gtk/gvfs/exec_spaw/0
        1             0 1000  1969 gvfsd-metadata      6 M      0 M /usr/libexec/gvfsd-metadata
        1             0 1000  2262 bash                5 M      0 M bash
        0         -1000    0   675 systemd-udevd       0 M      4 M /usr/lib/systemd/systemd-udevd
        0         -1000    0   787 auditd              0 M      1 M /sbin/auditd
        0             0    0   807 ModemManager        0 M      1 M /usr/sbin/ModemManager
        0             0    0   808 smartd              0 M      1 M /usr/sbin/smartd -n -q never
        0             0    0   810 alsactl             0 M      0 M /usr/sbin/alsactl -s -n 19 -c -E ALSA_CONFIG_PATH=/etc/alsa/alsactl.conf --initfile=/lib/alsa/init/00main rdaemon
        0             0    0   811 mcelog              0 M      0 M /usr/sbin/mcelog --ignorenodev --daemon --foreground
        0             0  172   813 rtkit-daemon        0 M      0 M /usr/libexec/rtkit-daemon
        0             0    0   814 VBoxService         0 M      1 M /usr/sbin/VBoxService -f
        0             0    0   817 rngd                0 M      1 M /sbin/rngd -f
        0          -900   81   818 dbus-daemon         3 M      0 M /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
        0             0    0   823 irqbalance          0 M      0 M /usr/sbin/irqbalance --foreground
        0             0   70   824 avahi-daemon        0 M      0 M avahi-daemon: running [linux.local]
        0             0    0   826 sssd                0 M      2 M /usr/sbin/sssd -i --logger=files
        0             0  995   838 chronyd             1 M      0 M /usr/sbin/chronyd
        0             0    0   849 gssproxy            0 M      1 M /usr/sbin/gssproxy -D
        0             0    0   866 abrtd               0 M      2 M /usr/sbin/abrtd -d -s
        0             0   70   870 avahi-daemon        0 M      0 M avahi-daemon: chroot helper
        0             0    0   871 sssd_be             0 M      2 M /usr/libexec/sssd/sssd_be --domain implicit_files --uid 0 --gid 0 --logger=files
        0             0    0   875 accounts-daemon     0 M      1 M /usr/libexec/accounts-daemon
        0             0    0   906 abrt-dump-journ     1 M      2 M /usr/bin/abrt-dump-journal-core -D -T -f -e
        0             0    0   908 abrt-dump-journ     1 M      2 M /usr/bin/abrt-dump-journal-xorg -fxtD
        0             0    0   950 crond               2 M      1 M /usr/sbin/crond -n
        0             0    0   951 atd                 0 M      0 M /usr/sbin/atd -f
        0             0    0   953 lxdm-binary         0 M      0 M /usr/sbin/lxdm-binary
        0             0    0  1060 dhclient            0 M      2 M /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-enp0s3.pid -lf /var/lib/NetworkManager/dhclient-939eab05-4796-3792-af24-9f76cf53ca7f-enp0s3.lease -cf /var/lib/NetworkManager/dhclient-enp0s3.conf enp0s3
        0             0    0  1105 lxdm-session        0 M      1 M /usr/libexec/lxdm-session
        0             0 1000  1123 pulseaudio          0 M      3 M /usr/bin/pulseaudio --daemonize=no
        0             0 1000  1124 lxsession           1 M      2 M /usr/bin/lxsession -s LXDE -e LXDE
        0             0 1000  1134 dbus-daemon         2 M      0 M /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
        0             0 1000  1215 imsettings-daem     0 M      1 M /usr/libexec/imsettings-daemon
        0             0 1000  1218 gvfsd               3 M      1 M /usr/libexec/gvfsd
        0             0 1000  1223 gvfsd-fuse          0 M      1 M /usr/libexec/gvfsd-fuse /run/user/1000/gvfs -f -o big_writes
        0             0 1000  1309 VBoxClient          0 M      0 M /usr/bin/VBoxClient --display
        0             0 1000  1310 VBoxClient          0 M      0 M /usr/bin/VBoxClient --clipboard
        0             0 1000  1311 VBoxClient          0 M      0 M /usr/bin/VBoxClient --draganddrop
        0             0 1000  1312 VBoxClient          0 M      0 M /usr/bin/VBoxClient --display
        0             0 1000  1313 VBoxClient          1 M      0 M /usr/bin/VBoxClient --clipboard
        0             0 1000  1316 VBoxClient          0 M      0 M /usr/bin/VBoxClient --seamless
        0             0 1000  1318 VBoxClient          0 M      0 M /usr/bin/VBoxClient --seamless
        0             0 1000  1320 VBoxClient          0 M      0 M /usr/bin/VBoxClient --draganddrop
        0             0 1000  1334 ssh-agent           0 M      0 M /usr/bin/ssh-agent /bin/sh -c exec -l bash -c "/usr/bin/startlxde"
        0             0 1000  1362 lxpolkit            0 M      1 M lxpolkit
        0             0 1000  1370 lxclipboard         0 M      1 M lxclipboard
        0             0 1000  1373 ssh-agent           0 M      1 M /usr/bin/ssh-agent -s
        0             0 1000  1485 agent               0 M      1 M /usr/libexec/geoclue-2.0/demos/agent
        0             0 1000  1751 menu-cached         0 M      1 M /usr/libexec/menu-cache/menu-cached /run/user/1000/menu-cached-:0
        0             0 1000  1780 at-spi-bus-laun     0 M      1 M /usr/libexec/at-spi-bus-launcher
        0             0 1000  1786 dbus-daemon         1 M      0 M /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3
        0             0 1000  1792 at-spi2-registr     1 M      1 M /usr/libexec/at-spi2-registryd --use-gnome-session
        0             0 1000  1840 gvfs-udisks2-vo     0 M      2 M /usr/libexec/gvfs-udisks2-volume-monitor
        0             0 1000  1863 gnome-pty-helpe     1 M      0 M gnome-pty-helper
        0             0 1000  1864 bash                0 M      1 M bash
        0             0    0  1899 sudo                0 M      1 M sudo -i
        0             0    0  1901 bash                0 M      1 M -bash
        0             0    0  1953 oomd_bin            0 M      0 M oomd_bin -f /sys/fs/cgroup/unified
        0          -600    0  2562 python3            10 M      0 M python3 /usr/sbin/nohang --config /etc/nohang/nohang.conf

Kthreads, zombies and Pid 1 will not be displayed.

psi-top

psi-top is a script that prints the PSI metric values for every cgroup. It requires Linux >= 4.20 with CONFIG_PSI=y. Man page: psi-top.manpage.md.

Output example
$ psi-top
cgroup2 mountpoint: /sys/fs/cgroup
      avg10  avg60 avg300         avg10  avg60 avg300  cgroup2
      -----  ----- ------         -----  ----- ------  ---------
some   0.00   0.21   1.56 | full   0.00   0.16   1.14  [SYSTEM_WIDE]
some   0.00   0.21   1.56 | full   0.00   0.16   1.14
some   0.00   0.15   1.11 | full   0.00   0.12   0.89  /user.slice
some  45.92  28.77  20.19 | full  45.05  28.17  19.56  /user.slice/user-1000.slice
some   1.44   4.67   9.24 | full   1.44   4.65   9.20  /user.slice/user-1000.slice/user@1000.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /user.slice/user-1000.slice/user@1000.service/pulseaudio.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /user.slice/user-1000.slice/user@1000.service/gvfs-daemon.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /user.slice/user-1000.slice/user@1000.service/dbus.socket
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /user.slice/user-1000.slice/user@1000.service/gvfs-udisks2-volume-monitor.service
some   0.25   1.97   4.05 | full   0.25   1.96   4.03  /user.slice/user-1000.slice/user@1000.service/xfce4-notifyd.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /user.slice/user-1000.slice/user@1000.service/init.scope
some   0.00   0.66   1.99 | full   0.00   0.66   1.97  /user.slice/user-1000.slice/user@1000.service/gpg-agent.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /user.slice/user-1000.slice/user@1000.service/gvfs-gphoto2-volume-monitor.service
some   0.93   0.75   0.20 | full   0.93   0.75   0.20  /user.slice/user-1000.slice/user@1000.service/at-spi-dbus-bus.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /user.slice/user-1000.slice/user@1000.service/gvfs-metadata.service
some   0.00   2.44   6.78 | full   0.00   2.43   6.74  /user.slice/user-1000.slice/user@1000.service/dbus.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /user.slice/user-1000.slice/user@1000.service/gvfs-mtp-volume-monitor.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /user.slice/user-1000.slice/user@1000.service/gvfs-afc-volume-monitor.service
some  44.99  28.30  19.41 | full  44.10  27.70  18.79  /user.slice/user-1000.slice/session-2.scope
some   0.00   0.31   0.53 | full   0.00   0.31   0.53  /init.scope
some   7.25  11.40  13.34 | full   7.23  11.32  13.24  /system.slice
some   0.00   0.01   0.02 | full   0.00   0.01   0.02  /system.slice/systemd-udevd.service
some   0.00   0.58   1.55 | full   0.00   0.58   1.55  /system.slice/cronie.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /system.slice/sys-kernel-config.mount
some   0.00   0.22   0.35 | full   0.00   0.22   0.35  /system.slice/polkit.service
some   0.00   0.06   0.20 | full   0.00   0.06   0.20  /system.slice/rtkit-daemon.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /system.slice/sys-kernel-debug.mount
some   0.00   0.14   0.62 | full   0.00   0.14   0.62  /system.slice/accounts-daemon.service
some   7.86  11.48  12.56 | full   7.84  11.42  12.51  /system.slice/lightdm.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /system.slice/ModemManager.service
some   0.00   1.82   5.47 | full   0.00   1.81   5.43  /system.slice/systemd-journald.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /system.slice/dev-mqueue.mount
some   0.00   1.64   4.07 | full   0.00   1.64   4.07  /system.slice/NetworkManager.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /system.slice/tmp.mount
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /system.slice/lvm2-lvmetad.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /system.slice/dev-disk-by\x2duuid-5d7355c0\x2dc131\x2d40c5\x2d8541\x2d1e04ad7c8b8d.swap
some   0.00   0.09   0.11 | full   0.00   0.09   0.11  /system.slice/upower.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /system.slice/udisks2.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /system.slice/dev-hugepages.mount
some   0.00   0.27   0.49 | full   0.00   0.27   0.48  /system.slice/dbus.service
some   0.00   0.00   0.00 | full   0.00   0.00   0.00  /system.slice/system-getty.slice
some   0.00   0.12   0.20 | full   0.00   0.12   0.20  /system.slice/avahi-daemon.service
some   0.00   0.18   0.30 | full   0.00   0.18   0.30  /system.slice/systemd-logind.service
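
The columns above come from the kernel's PSI files: `/proc/pressure/memory` system-wide and `memory.pressure` inside each cgroup2 directory. A minimal sketch of how such a file could be parsed is shown below. It is not taken from psi-top itself; the function name `parse_psi` and the `total` values in the sample are assumptions made for illustration.

```python
def parse_psi(text):
    """Parse PSI file contents into {'some': {...}, 'full': {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind = fields[0]  # 'some' or 'full'
        metrics = {}
        for field in fields[1:]:
            # Each field looks like 'avg10=0.00' or 'total=123456'.
            key, _, value = field.partition('=')
            metrics[key] = float(value)
        result[kind] = metrics
    return result


# Sample text in the kernel's PSI format (the totals are made up).
sample = (
    "some avg10=0.00 avg60=0.21 avg300=1.56 total=123456\n"
    "full avg10=0.00 avg60=0.16 avg300=1.14 total=98765\n"
)
psi = parse_psi(sample)
print(psi['some']['avg60'], psi['full']['avg300'])
```

On a real system the text would be read from one of the files named above instead of the `sample` string.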

psi2log

psi2log is a CLI tool that can check and log PSI metrics from a specified target. It requires Linux >= 4.20 with CONFIG_PSI=y. Man page: psi2log.manpage.md.

Output example
$ psi2log
Starting psi2log
target: SYSTEM_WIDE
period: 2
------------------------------------------------------------------------------------------------------------------
 some cpu pressure   || some memory pressure | full memory pressure ||  some io pressure    |  full io pressure
---------------------||----------------------|----------------------||----------------------|---------------------
 avg10  avg60 avg300 ||  avg10  avg60 avg300 |  avg10  avg60 avg300 ||  avg10  avg60 avg300 |  avg10  avg60 avg300
------ ------ ------ || ------ ------ ------ | ------ ------ ------ || ------ ------ ------ | ------ ------ ------
  0.13   0.26   0.08 ||   3.36  10.31   3.47 |   2.68   7.69   2.56 ||  20.24  26.90   8.60 |  18.80  23.16   7.33
  0.11   0.25   0.08 ||   2.75   9.97   3.45 |   2.20   7.44   2.54 ||  18.38  26.34   8.61 |  17.21  22.73   7.35
  0.09   0.25   0.07 ||   2.25   9.65   3.43 |   1.80   7.20   2.52 ||  15.05  25.48   8.55 |  14.09  21.99   7.30
  0.07   0.24   0.07 ||   1.84   9.33   3.40 |   1.47   6.96   2.51 ||  13.05  24.78   8.52 |  12.26  21.40   7.28
^C
Peak values:  avg10  avg60 avg300
-----------  ------ ------ ------
some cpu       0.13   0.26   0.08
-----------  ------ ------ ------
some memory    3.36  10.31   3.47
full memory    2.68   7.69   2.56
-----------  ------ ------ ------
some io       20.24  26.90   8.61
full io       18.80  23.16   7.35
$ psi2log -t /user.slice -l pm.log
Starting psi2log
target: /user.slice
period: 2
log file: pm.log
cgroup2 mountpoint: /sys/fs/cgroup
------------------------------------------------------------------------------------------------------------------
 some cpu pressure   || some memory pressure | full memory pressure ||  some io pressure    |  full io pressure
---------------------||----------------------|----------------------||----------------------|---------------------
 avg10  avg60 avg300 ||  avg10  avg60 avg300 |  avg10  avg60 avg300 ||  avg10  avg60 avg300 |  avg10  avg60 avg300
------ ------ ------ || ------ ------ ------ | ------ ------ ------ || ------ ------ ------ | ------ ------ ------
 28.32  11.97   3.03 ||   0.00   1.05   1.65 |   0.00   0.85   1.33 ||   0.55   7.79   7.21 |   0.54   7.52   6.80
 29.53  12.72   3.25 ||   0.00   1.01   1.64 |   0.00   0.82   1.32 ||   0.81   7.60   7.17 |   0.44   7.27   6.76
 29.80  13.32   3.44 ||   0.00   0.98   1.63 |   0.00   0.79   1.31 ||   0.66   7.35   7.12 |   0.36   7.03   6.71
 29.83  13.86   3.62 ||   0.00   0.95   1.62 |   0.00   0.77   1.30 ||   0.54   7.11   7.08 |   0.30   6.80   6.66
 29.86  14.39   3.80 ||   0.00   0.91   1.60 |   0.00   0.74   1.29 ||   0.44   6.88   7.03 |   0.24   6.58   6.62
 30.07  14.94   3.99 ||   0.00   0.88   1.59 |   0.00   0.72   1.28 ||   0.36   6.65   6.98 |   0.20   6.36   6.57
^C
Peak values:  avg10  avg60 avg300
-----------  ------ ------ ------
some cpu      30.07  14.94   3.99
-----------  ------ ------ ------
some memory    0.00   1.05   1.65
full memory    0.00   0.85   1.33
-----------  ------ ------ ------
some io        0.81   7.79   7.21
full io        0.54   7.52   6.80

Contribution

  • Use cases, feature requests and any questions are welcome.
  • Pull requests to the dev branch are welcome.

Documentation

License

This project is licensed under the terms of the MIT license.

nohang's People

Contributors

actionless, bcho, elijahlynn, esthvnferrtn, flaviut, hakavlad, kamikazow, kawaii-ghost, literacyfanatic, maximvl, mikhailnov, monkeysareevil, muesli, rx14, simonheimberg, tim77, yiffyrusdev

nohang's Issues

Problem with PSI and zram

Jun 07 05:54:28 user-pc nohang[3047]: PSI avg:  85.81 | MemAvail: 2035 M, 90.6 % | SwapFree: 4636 M,  93.8 % | dMem:    -1 M/s
Jun 07 05:54:28 user-pc nohang[3047]: psi_post_action_delay_exceeded: True
Jun 07 05:54:28 user-pc nohang[3047]: sigkill_psi_exceeded: False
Jun 07 05:54:28 user-pc nohang[3047]: psi_kill_exceeded_timer: 0
Jun 07 05:54:28 user-pc nohang[3047]: sigterm_psi_exceeded: True
Jun 07 05:54:28 user-pc nohang[3047]: psi_term_exceeded_timer: 159.4
Jun 07 05:54:28 user-pc nohang[3047]: min_delay_after_sigterm IS EXCEEDED, it is time to action
Jun 07 05:54:28 user-pc nohang[3047]: PSI avg (85.81) > sigterm_psi_threshold (60.0)
Jun 07 05:54:28 user-pc nohang[3047]: PSI avg exceeded psi_excess_duration (value = 30.0 sec) for 159.4 seconds
Jun 07 05:54:28 user-pc nohang[3047]: Found 66 processes with existing /proc/[pid]/exe
Jun 07 05:54:28 user-pc nohang[3047]: Process with highest badness (found in 7 ms):
Jun 07 05:54:28 user-pc nohang[3047]:   PID: 482, Name: Xorg, badness: 11
Jun 07 05:54:28 user-pc nohang[3047]: Thresholds is not exceeded now
Jun 07 05:54:28 user-pc nohang[3047]: psi_post_action_delay_exceeded: False
Jun 07 05:54:28 user-pc nohang[3047]: sigkill_psi_exceeded: False
Jun 07 05:54:28 user-pc nohang[3047]: psi_kill_exceeded_timer: 0
Jun 07 05:54:28 user-pc nohang[3047]: sigterm_psi_exceeded: True
Jun 07 05:54:28 user-pc nohang[3047]: psi_term_exceeded_timer: 159.4

Fix output

Fix it:

Apr 25 23:11:06 PC nohang[31850]:   The victim died in the search process: ValueError: 10
Apr 25 23:11:06 PC nohang[31850]:   The victim died in the search process: FileNotFoundError: 10
Apr 25 23:11:06 PC nohang[31850]:   ProcessLookupError (the victim died in the search process): : 10

Something went wrong

Monitoring has started!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Memory status that requires corrective actions:
  MemAvailable [0 MiB, 0.0 %] <= soft_threshold_min_mem [982 MiB, 10.0 %]
  SwapFree [2168 MiB, 14.7 %] <= soft_threshold_min_swap [2210 MiB, 15.0 %]
Found 92 processes with existing /proc/[pid]/exe realpath
Process with highest badness (found in 6 ms):
  PID: 12297, Name: tail, badness: 727
Recheck memory levels...
Memory status that requires corrective actions:
  MemAvailable [0 MiB, 0.0 %] <= soft_threshold_min_mem [982 MiB, 10.0 %]
  SwapFree [2163 MiB, 14.7 %] <= soft_threshold_min_swap [2210 MiB, 15.0 %]
Regexp '^tail$' matches with name 'tail'
Execute the command(1) in Thread-1: kill -TERM 12297
Implement a corrective action:
  Run the command: kill -TERM 12297
  Exit status: None; total response time: 7 ms
The victim doesn't respond on corrective action in 0.027 sec
Memory status after implementing a corrective action:
  MemAvailable: 0.0 MiB, SwapFree: 2130.9 MiB
Total stat (what happened in the last 26 sec):
  Run the command 'kill -TERM $PID': 1
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
victim_cache_time is not exceeded for 4579503_pid12297 (0.082 < 10.0)
Memory status that requires corrective actions:
  MemAvailable [0 MiB, 0.0 %] <= soft_threshold_min_mem [982 MiB, 10.0 %]
  SwapFree [2073 MiB, 14.1 %] <= soft_threshold_min_swap [2210 MiB, 15.0 %]
New victim is cached victim 12297 (tail)
Recheck memory levels...
Memory status that requires corrective actions:
  MemAvailable [0 MiB, 0.0 %] <= soft_threshold_min_mem [982 MiB, 10.0 %]
  SwapFree [2073 MiB, 14.1 %] <= soft_threshold_min_swap [2210 MiB, 15.0 %]
max_soft_exit_time is not exceeded (0.1 < 10.0) for the victim
Command(1) execution completed in 0.128 sec; exit status: 0
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
victim_cache_time is not exceeded for 4579503_pid12297 (0.132 < 10.0)
Memory status that requires corrective actions:
  MemAvailable [0 MiB, 0.0 %] <= soft_threshold_min_mem [982 MiB, 10.0 %]
  SwapFree [2099 MiB, 14.2 %] <= soft_threshold_min_swap [2210 MiB, 15.0 %]
New victim is cached victim 12297 (tail)
Recheck memory levels...
Memory status that requires corrective actions:
  MemAvailable [0 MiB, 0.0 %] <= soft_threshold_min_mem [982 MiB, 10.0 %]
  SwapFree [2104 MiB, 14.3 %] <= soft_threshold_min_swap [2210 MiB, 15.0 %]
victim badness (0) < min_badness (10); nothing to do; response time: 0 ms
Total stat (what happened in the last 26 sec):
  Run the command 'kill -TERM $PID': 1
  victim badness < min_badness: 1
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Improve psi-monitor

TODO: new options

--target /system.slice
--period 2
--log-file ./psi-monitor.log
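
The three proposed options could be wired up with `argparse` roughly as follows. This is only a sketch: the defaults (`SYSTEM_WIDE`, a period of 2 seconds, no log file) are assumptions inferred from the sample output in this document, and `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Build a parser for the proposed psi-monitor options."""
    parser = argparse.ArgumentParser(prog='psi-monitor')
    parser.add_argument('--target', default='SYSTEM_WIDE',
                        help='cgroup2 path to monitor, or SYSTEM_WIDE')
    parser.add_argument('--period', type=float, default=2,
                        help='interval between checks, in seconds')
    parser.add_argument('--log-file', default=None,
                        help='optional path of a log file')
    return parser


# Parse the example options listed above.
args = build_parser().parse_args(
    ['--target', '/system.slice', '--period', '2',
     '--log-file', './psi-monitor.log'])
print(args.target, args.period, args.log_file)
```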

TODO: new output

2019-08-03 16:47:37,629: Starting psi-monitor, target: SYSTEM_WIDE, period: 2, log-file: ./psi-monitor.log
2019-08-03 16:47:37,629: ----------------------------------------------------------------------------------------------------------------
2019-08-03 16:47:37,629:  some cpu pressure   || some memory pressure| full memory pressure || some io pressure    |  full io pressure
2019-08-03 16:47:37,629: ---------------------||---------------------|----------------------||---------------------|---------------------
2019-08-03 16:47:37,629:  avg10  avg60 avg300 || avg10  avg60 avg300 |  avg10  avg60 avg300 || avg10  avg60 avg300 |  avg10  avg60 avg300
2019-08-03 16:47:37,629: ------ ------ ------ ||------ ------ ------ | ------ ------ ------ ||------ ------ ------ | ------ ------ ------
2019-08-03 16:47:37,629:  44.29  28.54  20.08 || 44.29  28.54  20.08 |  43.42  27.88  19.43 || 44.29  28.54  20.08 |  43.42  27.88  19.43
2019-08-03 16:49:37,629:  44.29  28.54  20.08 || 44.29  28.54  20.08 |  43.42  27.88  19.43 || 44.29  28.54  20.08 |  43.42  27.88  19.43
2019-08-03 16:47:37,629:  44.29  28.54  20.08 || 44.29  28.54  20.08 |  43.42  27.88  19.43 || 44.29  28.54  20.08 |  43.42  27.88  19.43
2019-08-03 16:49:37,629:  44.29  28.54  20.08 || 44.29  28.54  20.08 |  43.42  27.88  19.43 || 44.29  28.54  20.08 |  43.42  27.88  19.43
2019-08-03 16:47:37,629:  44.29  28.54  20.08 || 44.29  28.54  20.08 |  43.42  27.88  19.43 || 44.29  28.54  20.08 |  43.42  27.88  19.43
2019-08-03 16:49:37,629:  44.29  28.54  20.08 || 44.29  28.54  20.08 |  43.42  27.88  19.43 || 44.29  28.54  20.08 |  43.42  27.88  19.43
2019-08-03 16:47:37,629:  44.29  28.54  20.08 || 44.29  28.54  20.08 |  43.42  27.88  19.43 || 44.29  28.54  20.08 |  43.42  27.88  19.43
2019-08-03 16:49:37,629:  44.29  28.54  20.08 || 44.29  28.54  20.08 |  43.42  27.88  19.43 || 44.29  28.54  20.08 |  43.42  27.88  19.43
2019-08-03 16:47:37,629:  44.29  28.54  20.08 || 44.29  28.54  20.08 |  43.42  27.88  19.43 || 44.29  28.54  20.08 |  43.42  27.88  19.43
2019-08-03 16:49:37,629:  44.29  28.54  20.08 || 44.29  28.54  20.08 |  43.42  27.88  19.43 || 44.29  28.54  20.08 |  43.42  27.88  19.43
2019-08-03 16:47:37,629:  44.29  28.54  20.08 || 44.29  28.54  20.08 |  43.42  27.88  19.43 || 44.29  28.54  20.08 |  43.42  27.88  19.43
2019-08-03 16:49:37,629:  44.29  28.54  20.08 || 44.29  28.54  20.08 |  43.42  27.88  19.43 || 44.29  28.54  20.08 |  43.42  27.88  19.43

descendants_badness_adj

descendants_badness_adj = 300
This means:
A victim is chosen.
Its descendants are found.
The descendants get increased badness.
The victim is re-selected from the new pool: the first victim and its descendants, with their new badness.
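
The steps above can be sketched in Python as follows. This is an illustration under assumptions, not nohang's code: `reselect_victim` is a hypothetical name, badness values and parent PIDs are passed in as plain dictionaries, and the adjustment is assumed to be added to each descendant's badness.

```python
def reselect_victim(badness, parent_of, descendants_badness_adj=300):
    """badness: {pid: badness}; parent_of: {pid: parent pid}."""
    # Step 1: the first victim is the process with the highest badness.
    first_victim = max(badness, key=badness.get)

    # Step 2: walk the process tree to collect all descendants.
    children = {}
    for pid, ppid in parent_of.items():
        children.setdefault(ppid, []).append(pid)
    descendants = set()
    stack = [first_victim]
    while stack:
        for child in children.get(stack.pop(), []):
            if child != first_victim and child not in descendants:
                descendants.add(child)
                stack.append(child)

    # Step 3: descendants get increased badness (assumed additive).
    pool = {first_victim: badness[first_victim]}
    for pid in descendants:
        if pid in badness:
            pool[pid] = badness[pid] + descendants_badness_adj

    # Step 4: re-select from the first victim and its descendants.
    return max(pool, key=pool.get)


# Made-up example: PID 100 is a parent with two children, 101 and 102.
badness = {100: 500, 101: 300, 102: 250, 200: 400}
parent_of = {100: 1, 101: 100, 102: 100, 200: 1}
victim = reselect_victim(badness, parent_of)
print(victim)
```

With these made-up numbers the parent (badness 500) is chosen first, but child 101 ends up with 300 + 300 = 600 and becomes the final victim, so a single child is terminated rather than the whole process tree.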

Allow prefer_regex and avoid_regex to match cmdline

I'd rather have nohang kill a single Firefox tab instead of the whole browser. Sadly, both processes have the same name, so I can't use prefer_regex for that.

If nohang gave me some way to also match the cmdline parameters (which can be read from /proc/$pid/cmdline), I would be able to give processes with a -childID argument "preferred" treatment :)

Not sure if this should require extra config options (enable_cmdline, prefer_cmdline_regex, ...) or just be included in the standard behaviour.
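
A rough sketch of what matching against the command line could look like is given below. The names `read_cmdline`, `adjust_badness` and `prefer_factor` are hypothetical, `prefer_cmdline_regex` is the option name suggested in this request, and the sample command lines are made up.

```python
import re


def read_cmdline(pid):
    """Return the command line of a process as one space-separated string."""
    with open(f'/proc/{pid}/cmdline', 'rb') as f:
        raw = f.read()
    # Arguments in /proc/[pid]/cmdline are separated by NUL bytes.
    return raw.replace(b'\x00', b' ').decode('utf-8', 'replace').strip()


def adjust_badness(badness, cmdline, prefer_cmdline_regex, prefer_factor):
    """Multiply badness by prefer_factor if the cmdline matches the regex."""
    if re.search(prefer_cmdline_regex, cmdline):
        return int(badness * prefer_factor)
    return badness


# Made-up command lines: a browser parent process and one content process.
parent = '/usr/lib/firefox/firefox'
tab = '/usr/lib/firefox/firefox -contentproc -childID 3'
print(adjust_badness(100, parent, r'-childID\b', 3))
print(adjust_badness(100, tab, r'-childID\b', 3))
```

Both processes share the same name, but only the one whose command line contains -childID gets its badness raised.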

Overcommit detector to prevent memory errors

A new detector.

Memory errors if vm.overcommit_memory=2: https://imgur.com/a/p9j67KA

This may be useful with vm.overcommit_memory=2, to prevent MemoryError in innocent processes.

See also: https://www.kernel.org/doc/html/latest/vm/overcommit-accounting.html

overcommit_checking_enabled = False

soft_threshold_overcommit = 5%
hard_threshold_overcommit = 10%

/proc/meminfo:

CommitLimit:    25149116 kB
Committed_AS:    1909064 kB

vm.overcommit_memory=2, Committed_AS/CommitLimit = 50.0 %

/proc/sys/vm/overcommit_memory
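
The ratio such a detector would watch can be computed from the two /proc/meminfo fields quoted above. The sketch below only computes the percentage; how it would be compared with soft_threshold_overcommit and hard_threshold_overcommit is not specified in this proposal, so that part is left out. The function name `commit_ratio` is an assumption.

```python
def commit_ratio(meminfo_text):
    """Return Committed_AS / CommitLimit as a percentage."""
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(':')
        if name in ('CommitLimit', 'Committed_AS'):
            values[name] = int(rest.split()[0])  # value in kB
    return 100.0 * values['Committed_AS'] / values['CommitLimit']


# The two /proc/meminfo lines quoted in this issue.
sample = (
    "CommitLimit:    25149116 kB\n"
    "Committed_AS:    1909064 kB\n"
)
ratio = commit_ratio(sample)
print(f'Committed_AS/CommitLimit = {ratio:.1f} %')
```

For the values quoted above the ratio is roughly 7.6 %, so the 50.0 % figure in the proposed log line is presumably just a mock-up of the output format.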

No versioning

I couldn't find the script's version specified anywhere. It may be needed when creating a package for a package manager. This could be done with git tag.

Fallback mode

allow_fallback = False
fallback_mem = 1 M
fallback_swap = 1 M
fallback_zram = 90 %

In a critical situation: disable customization, quickly find a victim by oom_score and send SIGKILL.
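
A minimal sketch of the "quickly find a victim by oom_score" step is shown below. It is an assumption about how the fallback might work, not existing nohang code: the helper names and the `exclude` parameter are hypothetical, and the demo runs against a temporary directory that imitates /proc so that nothing is actually killed.

```python
import os
import signal
import tempfile


def highest_oom_score_pid(proc_dir='/proc', exclude=(1,)):
    """Return (pid, oom_score) of the process with the highest oom_score."""
    best_pid, best_score = None, -1
    for entry in os.listdir(proc_dir):
        if not entry.isdigit() or int(entry) in exclude:
            continue
        try:
            with open(os.path.join(proc_dir, entry, 'oom_score')) as f:
                score = int(f.read())
        except (OSError, ValueError):
            continue  # the process exited or the file is unreadable
        if score > best_score:
            best_pid, best_score = int(entry), score
    return best_pid, best_score


def fallback_kill(proc_dir='/proc'):
    """Send SIGKILL to the process with the highest oom_score."""
    pid, _ = highest_oom_score_pid(proc_dir)
    if pid is not None:
        os.kill(pid, signal.SIGKILL)
    return pid


# Demo on a fake /proc tree with two made-up processes.
with tempfile.TemporaryDirectory() as fake_proc:
    for pid, score in ((10, 5), (20, 700)):
        os.mkdir(os.path.join(fake_proc, str(pid)))
        with open(os.path.join(fake_proc, str(pid), 'oom_score'), 'w') as f:
            f.write(f'{score}\n')
    demo = highest_oom_score_pid(fake_proc)
print(demo)
```

Reading only oom_score keeps the search cheap, which matches the stated goal of skipping all customization in a critical situation.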

Undefined variables

This piece of code is using undefined variables:

nohang:1439:54: F821 undefined name 'prefer_regex'
nohang:1440:54: F821 undefined name 'prefer_factor'
nohang:1441:54: F821 undefined name 'avoid_regex'
nohang:1442:54: F821 undefined name 'avoid_factor'
nohang:1447:54: F821 undefined name 'prefer_re_cmdline'
nohang:1448:54: F821 undefined name 'prefer_cmd_factor'
nohang:1449:54: F821 undefined name 'avoid_re_cmdline'
nohang:1450:54: F821 undefined name 'avoid_cmd_factor'
nohang:1455:54: F821 undefined name 'prefer_re_uid'
nohang:1456:54: F821 undefined name 'prefer_uid_factor'
nohang:1457:54: F821 undefined name 'avoid_re_uid'
nohang:1458:54: F821 undefined name 'avoid_uid_factor'

Also this import is not used anywhere:

nohang:1033:5: F401 'sre_constants' imported but unused

output bug

Jul 03 23:16:44 PC nohang[6885]: Regexp '' matches with name 'tail'
Jul 03 23:16:44 PC nohang[6885]: Execute the command: kill -SEGV 7026
Jul 03 23:16:44 PC nohang[6885]: Exit status: 0; exe duration: 0.009 sec
Jul 03 23:16:44 PC nohang[6885]: Implement a corrective action:
Jul 03 23:16:44 PC nohang[6885]:   Run the command: kill -SEGV 7026
Jul 03 23:16:44 PC nohang[6885]:   Exit status: 0; total response time: 77 ms
Jul 03 23:16:44 PC nohang[6885]: Total stat (what happened in the last 9 min 22 sec):
Jul 03 23:16:44 PC nohang[6885]:   Run the command 'kill -SEGV 6895': 2
Jul 03 23:16:44 PC nohang[6885]:   Run the command 'kill -SEGV 6925': 2
Jul 03 23:16:44 PC nohang[6885]:   Run the command 'kill -SEGV 6971': 2
Jul 03 23:16:44 PC nohang[6885]:   Run the command 'kill -SEGV 6997': 2
Jul 03 23:16:44 PC nohang[6885]:   Run the command 'kill -SEGV 7026': 1
Jul 03 23:16:44 PC nohang[6885]: Execute the command: /usr/sbin/nohang_notify_helper --uid 0 --time 1562163404.0337765 &
Jul 03 23:16:44 PC nohang[6885]: Exit status: 0; exe duration: 0.026 sec
Jul 03 23:16:44 PC nohang[6885]: Implement a corrective action:
Jul 03 23:16:44 PC nohang[6885]:   Run the command: kill -SEGV 7026
Jul 03 23:16:44 PC nohang[6885]:   Exit status: 0; total response time: 77 ms
Jul 03 23:16:44 PC nohang[6885]: success:         True
Jul 03 23:16:44 PC nohang[6885]: victim will die: None
Jul 03 23:16:44 PC nohang[6885]: response_time:   0.10454392433166504 sec
Jul 03 23:16:44 PC nohang[6885]: Process exited (VmRSS = 0) in 0.00046 sec
Jul 03 23:16:44 PC nohang[6885]: The victim died in 0.58 sec
Jul 03 23:16:44 PC nohang[6885]: Memory status after implementing a corrective action:
Jul 03 23:16:44 PC nohang[6885]:   MemAvailable: 3097.5 MiB, SwapFree: 7390.2 MiB
Jul 03 23:16:44 PC nohang[6885]: Total stat (what happened in the last 9 min 23 sec):
Jul 03 23:16:44 PC nohang[6885]:   Run the command 'kill -SEGV 6895': 2
Jul 03 23:16:44 PC nohang[6885]:   Run the command 'kill -SEGV 6925': 2
Jul 03 23:16:44 PC nohang[6885]:   Run the command 'kill -SEGV 6971': 2
Jul 03 23:16:44 PC nohang[6885]:   Run the command 'kill -SEGV 6997': 2
Jul 03 23:16:44 PC nohang[6885]:   Run the command 'kill -SEGV 7026': 2

notify_helper bug

Jun 15 03:27:39 pc nohang[5282]: MemAvail: 9335 M, 95.2 % | SwapFree: 1682 M,  82.1 % | dMem:    -2 M/s
Jun 15 03:27:42 pc nohang[5282]: MemAvail: 9327 M, 95.1 % | SwapFree: 1686 M,  82.3 % | dMem:    -1 M/s
Jun 15 03:27:45 pc nohang[5282]: MemAvail: 9321 M, 95.0 % | SwapFree: 1690 M,  82.5 % | dMem:    -1 M/s
Jun 15 03:27:46 pc nohang[5282]: nohang_notify_helper: wait_time: 8
Jun 15 03:27:46 pc nohang[5282]: Send GUI notification: Low memory MemAvail: 0%
Jun 15 03:27:46 pc nohang[5282]: SwapFree: 50% ('user', 'DISPLAY=:0', 'DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus')
Jun 15 03:27:46 pc nohang[5282]: Send GUI notification: Low memory MemAvail: 0%
Jun 15 03:27:46 pc nohang[5282]: SwapFree: 50% ('root', 'DISPLAY=:0', 'DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-7XOX46CH5b,guid=c22809dd399138da090795805d03d826')
Jun 15 03:27:46 pc nohang[5282]: TimeoutExpired: notify user: user
Jun 15 03:27:46 pc nohang[5282]: TimeoutExpired: notify user: root
Jun 15 03:27:46 pc nohang[5282]: nohang_notify_helper: wait_time: 8
Jun 15 03:27:46 pc nohang[5282]: Send GUI notification: Low memory MemAvail: 0%
Jun 15 03:27:46 pc nohang[5282]: SwapFree: 17% ('root', 'DISPLAY=:0', 'DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-7XOX46CH5b,guid=c22809dd399138da090795805d03d826')
Jun 15 03:27:46 pc nohang[5282]: Send GUI notification: Low memory MemAvail: 0%
Jun 15 03:27:46 pc nohang[5282]: SwapFree: 17% ('user', 'DISPLAY=:0', 'DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus,guid=7625e2f7cd8770020fae5d565d03d64d')
Jun 15 03:27:46 pc nohang[5282]: TimeoutExpired: notify user: root
Jun 15 03:27:46 pc nohang[5282]: TimeoutExpired: notify user: user

https://imgur.com/a/nwMon8k

TODO: exclude root

improve cgroups support

cgroup_v1 -> cgroup_name_systemd

:name= -> :name=systemd

find_cgroup_mountpoints()
return c1_systemd_mountpoint, c2_mountpoint

improve pid_to_cgroup_v2
rename and improve pid_to_cgroup_v1

find_cgroup_mountpoints()
return c1_systemd_mountpoint, c2_mountpoint

pid_to_cgroup_v1_systemd_pid_list(pid)
return []

pid_to_cgroup_v2_pid_list(pid)
return []

' cgroup2 rw,'
',name=systemd 0 0'

nohang --cgroup-info

mountpoint + /system.slice/foo.service + /cgroup.procs
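
The notes above could translate into something like the following sketch. The function name `find_cgroup_mountpoints` and its two return values come from the notes; everything else is an assumption, including the helper `cgroup_procs_path` and the sample mount lines. Instead of matching the substrings `' cgroup2 rw,'` and `',name=systemd 0 0'`, this version splits each /proc/mounts line into fields.

```python
def find_cgroup_mountpoints(mounts_text):
    """Return (c1_systemd_mountpoint, c2_mountpoint) from /proc/mounts text."""
    c1_systemd_mountpoint = None
    c2_mountpoint = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mountpoint, fstype, options = fields[1], fields[2], fields[3]
        if fstype == 'cgroup2' and c2_mountpoint is None:
            c2_mountpoint = mountpoint
        elif fstype == 'cgroup' and 'name=systemd' in options.split(','):
            c1_systemd_mountpoint = mountpoint
    return c1_systemd_mountpoint, c2_mountpoint


def cgroup_procs_path(mountpoint, cgroup):
    """Build the path: mountpoint + cgroup + /cgroup.procs."""
    return mountpoint + cgroup + '/cgroup.procs'


# Made-up /proc/mounts excerpt for a hybrid cgroup layout.
sample_mounts = (
    "cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0\n"
    "cgroup /sys/fs/cgroup/systemd cgroup "
    "rw,nosuid,nodev,noexec,relatime,xattr,name=systemd 0 0\n"
)
c1, c2 = find_cgroup_mountpoints(sample_mounts)
print(c1, c2)
print(cgroup_procs_path(c2, '/system.slice/foo.service'))
```

Each line of the resulting cgroup.procs file holds one PID, which is what the two pid_list helpers in the notes would return.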

Kill innocent victim

May 26 23:02:22 user-VirtualBox nohang[530]: Memory status before implementing a corrective action:
May 26 23:02:22 user-VirtualBox nohang[530]:   MemAvailable: 1574.5 MiB, SwapFree: 546.0 MiB
May 26 23:02:22 user-VirtualBox nohang[530]: Victim VmRSS: 3268608 KiB
May 26 23:02:22 user-VirtualBox nohang[530]: Timer (value = 0.01 sec) expired; seems like the victim handles signal
May 26 23:02:22 user-VirtualBox nohang[530]: Implement a corrective action:
May 26 23:02:22 user-VirtualBox nohang[530]:   Send SIGTERM to the victim; total response time: 13246 ms

todo: recheck thresholds before implementing corrective action.

Improvement: show list of worst processes instead of background kill

As a casual user, I find it more convenient to choose which process should be killed. It is difficult to constantly update process priorities in the config, especially since situations may arise that are decided not in the user's favor.
I think it's better to show a list of the worst processes, with the option to send SIGTERM or SIGKILL to one or more of them. If the load indicator for the subsystem reaches critical values, the process is killed automatically, as it works now.

Option to ignore oom_score completely

In my case there are a lot of memory-hungry applications that set oom_score_adj=-1000 for themselves. As a result, xfce4-notifyd (with 25M VmRSS), which occasionally gets an oom_score of 1, keeps getting killed first instead of the process that actually ate all of my RAM and triggered the -m limits (but keeps its oom_score at 0).
I don't know whether this is a common problem, but in my opinion it would be good to have an option to decide which process to kill based exclusively on the VmRSS value.

rfjakob/earlyoom#140
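
Selecting a victim purely by resident set size, as requested here, could look like the sketch below. It parses the VmRSS line of /proc/[pid]/status; the helper names and the two sample status texts are made up for illustration, and nohang does not necessarily offer such an option.

```python
def vm_rss_kib(status_text):
    """Return VmRSS in KiB from the text of /proc/[pid]/status."""
    for line in status_text.splitlines():
        if line.startswith('VmRSS:'):
            return int(line.split()[1])
    return 0  # kernel threads have no VmRSS line


def pick_by_rss(status_by_pid):
    """Return the PID with the largest VmRSS, ignoring oom_score."""
    return max(status_by_pid, key=lambda pid: vm_rss_kib(status_by_pid[pid]))


# Made-up status excerpts: a small notifier and a large memory hog.
status_by_pid = {
    1234: "Name:\txfce4-notifyd\nVmRSS:\t   25600 kB\n",
    5678: "Name:\tmemhog\nVmRSS:\t 3268608 kB\n",
}
victim_pid = pick_by_rss(status_by_pid)
print(victim_pid)
```

Because oom_score and oom_score_adj are never read, a process that sets oom_score_adj=-1000 for itself cannot avoid selection this way.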

execute_the_command

nohang always tries to run the command instead of sending SIGTERM, even if execute_the_command = False

Long search for victims

2019-05-26 23:02:21,140: Process with highest badness (found in 13245 ms):

TODO: Document the conditions under which this behavior occurs.

UnboundLocalError

Jun 19 22:06:22 pc systemd[1]: nohang.service: Service RestartSec=100ms expired, scheduling restart.
Jun 19 22:06:22 pc systemd[1]: nohang.service: Scheduled restart job, restart counter is at 582.
Jun 19 22:06:22 pc systemd[1]: Stopped Highly configurable OOM prevention daemon.
Jun 19 22:06:22 pc systemd[1]: Started Highly configurable OOM prevention daemon.
Jun 19 22:06:22 pc nohang[3204]: Config: /etc/nohang/nohang.conf
Jun 19 22:06:22 pc nohang[3204]: Monitoring has started!
Jun 19 22:06:22 pc nohang[3204]: Traceback (most recent call last):
Jun 19 22:06:22 pc nohang[3204]:   File "/usr/sbin/nohang", line 3075, in <module>
Jun 19 22:06:22 pc nohang[3204]:     swap_free, swap_total) = check_mem_swap_ex()
Jun 19 22:06:22 pc nohang[3204]:   File "/usr/sbin/nohang", line 1298, in check_mem_swap_ex
Jun 19 22:06:22 pc nohang[3204]:     if swap_total > swap_min_sigkill_kb:
Jun 19 22:06:22 pc nohang[3204]: UnboundLocalError: local variable 'swap_min_sigkill_kb' referenced before assignment
Jun 19 22:06:22 pc systemd[1]: nohang.service: Main process exited, code=exited, status=1/FAILURE
Jun 19 22:06:22 pc systemd[1]: nohang.service: Failed with result 'exit-code'.
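
The report does not show the code around line 1298, so the actual cause is unknown. As an illustration only, here is one common way this exception arises in Python: a local name is bound in just one branch and then read unconditionally. Both functions and their parameters are hypothetical and are not taken from nohang:

```python
def exceeds_threshold_buggy(swap_total_kb, threshold_is_percent, threshold_value):
    # The local name is bound only when the threshold is given in percent.
    if threshold_is_percent:
        swap_min_kb = swap_total_kb * threshold_value / 100.0
    # Raises UnboundLocalError whenever the branch above was skipped.
    return swap_total_kb > swap_min_kb


def exceeds_threshold_fixed(swap_total_kb, threshold_is_percent, threshold_value):
    # Every path binds the name before it is read.
    if threshold_is_percent:
        swap_min_kb = swap_total_kb * threshold_value / 100.0
    else:
        swap_min_kb = threshold_value  # assumed to be an absolute value in KiB
    return swap_total_kb > swap_min_kb
```

If the real bug has this shape, binding the variable on every code path (or giving it a default before the branch) would stop the restart loop seen in the log.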

Badness bug

Incorrect badness value in the log; will be fixed later.

Problem with PSI

Jun 12 01:11:35 user-pc nohang[447]: ##################################################################
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_timer: 25.589
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: sigkill_psi_exceeded: False
Jun 12 01:11:35 user-pc nohang[447]: psi_kill_exceeded_timer: 0
Jun 12 01:11:35 user-pc nohang[447]: sigterm_psi_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: psi_term_exceeded_timer: 66.0
Jun 12 01:11:35 user-pc nohang[447]: PSI avg:  92.97 | MemAvail: 2039 M, 90.8 % | SwapFree: 4586 M,  92.8 % | dMem:     0 M/s
Jun 12 01:11:35 user-pc nohang[447]: min_delay_after_sigterm IS EXCEEDED, it is time to action
Jun 12 01:11:35 user-pc nohang[447]: PSI avg (92.97) > sigterm_psi_threshold (90.0)
Jun 12 01:11:35 user-pc nohang[447]: PSI avg exceeded psi_excess_duration (value = 40.0 sec) for 66.0 seconds
Jun 12 01:11:35 user-pc nohang[447]: Found 70 processes with existing /proc/[pid]/exe
Jun 12 01:11:35 user-pc nohang[447]: Process with highest badness (found in 6 ms):
Jun 12 01:11:35 user-pc nohang[447]:   PID: 477, Name: Xorg, badness: 12
Jun 12 01:11:35 user-pc nohang[447]: Recheck memory levels...
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_timer: 25.596
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: sigkill_psi_exceeded: False
Jun 12 01:11:35 user-pc nohang[447]: psi_kill_exceeded_timer: 0
Jun 12 01:11:35 user-pc nohang[447]: sigterm_psi_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: psi_term_exceeded_timer: 66.0
Jun 12 01:11:35 user-pc nohang[447]: PSI avg (92.97) > sigterm_psi_threshold (90.0)
Jun 12 01:11:35 user-pc nohang[447]: PSI avg exceeded psi_excess_duration (value = 40.0 sec) for 66.0 seconds
Jun 12 01:11:35 user-pc nohang[447]: victim badness 12 < min_badness 20; nothing to do; response time: 7 ms
Jun 12 01:11:35 user-pc nohang[447]: Total stat (what happened in the last 31 min 59 sec):
Jun 12 01:11:35 user-pc nohang[447]:   Send SIGTERM to tail: 1
Jun 12 01:11:35 user-pc nohang[447]:   victim badness < min_badness: 52
Jun 12 01:11:35 user-pc nohang[447]: ##################################################################
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_timer: 25.698
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: sigkill_psi_exceeded: False
Jun 12 01:11:35 user-pc nohang[447]: psi_kill_exceeded_timer: 0
Jun 12 01:11:35 user-pc nohang[447]: sigterm_psi_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: psi_term_exceeded_timer: 66.1
Jun 12 01:11:35 user-pc nohang[447]: PSI avg:  92.97 | MemAvail: 2039 M, 90.8 % | SwapFree: 4586 M,  92.8 % | dMem:     0 M/s
Jun 12 01:11:35 user-pc nohang[447]: min_delay_after_sigterm IS EXCEEDED, it is time to action
Jun 12 01:11:35 user-pc nohang[447]: PSI avg (92.97) > sigterm_psi_threshold (90.0)
Jun 12 01:11:35 user-pc nohang[447]: PSI avg exceeded psi_excess_duration (value = 40.0 sec) for 66.1 seconds
Jun 12 01:11:35 user-pc nohang[447]: Found 70 processes with existing /proc/[pid]/exe
Jun 12 01:11:35 user-pc nohang[447]: Process with highest badness (found in 7 ms):
Jun 12 01:11:35 user-pc nohang[447]:   PID: 477, Name: Xorg, badness: 12
Jun 12 01:11:35 user-pc nohang[447]: Recheck memory levels...
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_timer: 25.707
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: sigkill_psi_exceeded: False
Jun 12 01:11:35 user-pc nohang[447]: psi_kill_exceeded_timer: 0
Jun 12 01:11:35 user-pc nohang[447]: sigterm_psi_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: psi_term_exceeded_timer: 66.1
Jun 12 01:11:35 user-pc nohang[447]: PSI avg (92.97) > sigterm_psi_threshold (90.0)
Jun 12 01:11:35 user-pc nohang[447]: PSI avg exceeded psi_excess_duration (value = 40.0 sec) for 66.1 seconds
Jun 12 01:11:35 user-pc nohang[447]: victim badness 12 < min_badness 20; nothing to do; response time: 9 ms
Jun 12 01:11:35 user-pc nohang[447]: Total stat (what happened in the last 31 min 59 sec):
Jun 12 01:11:35 user-pc nohang[447]:   Send SIGTERM to tail: 1
Jun 12 01:11:35 user-pc nohang[447]:   victim badness < min_badness: 53
Jun 12 01:11:35 user-pc nohang[447]: ##################################################################
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_timer: 25.809
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: sigkill_psi_exceeded: False
Jun 12 01:11:35 user-pc nohang[447]: psi_kill_exceeded_timer: 0
Jun 12 01:11:35 user-pc nohang[447]: sigterm_psi_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: psi_term_exceeded_timer: 66.2
Jun 12 01:11:35 user-pc nohang[447]: PSI avg:  92.97 | MemAvail: 2039 M, 90.7 % | SwapFree: 4586 M,  92.8 % | dMem:    -2 M/s
Jun 12 01:11:35 user-pc nohang[447]: min_delay_after_sigterm IS EXCEEDED, it is time to action
Jun 12 01:11:35 user-pc nohang[447]: PSI avg (92.97) > sigterm_psi_threshold (90.0)
Jun 12 01:11:35 user-pc nohang[447]: PSI avg exceeded psi_excess_duration (value = 40.0 sec) for 66.2 seconds
Jun 12 01:11:35 user-pc nohang[447]: Found 70 processes with existing /proc/[pid]/exe
Jun 12 01:11:35 user-pc nohang[447]: Process with highest badness (found in 6 ms):
Jun 12 01:11:35 user-pc nohang[447]:   PID: 477, Name: Xorg, badness: 12
Jun 12 01:11:35 user-pc nohang[447]: Recheck memory levels...
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_timer: 25.816
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: sigkill_psi_exceeded: False
Jun 12 01:11:35 user-pc nohang[447]: psi_kill_exceeded_timer: 0
Jun 12 01:11:35 user-pc nohang[447]: sigterm_psi_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: psi_term_exceeded_timer: 66.2
Jun 12 01:11:35 user-pc nohang[447]: PSI avg (92.97) > sigterm_psi_threshold (90.0)
Jun 12 01:11:35 user-pc nohang[447]: PSI avg exceeded psi_excess_duration (value = 40.0 sec) for 66.2 seconds
Jun 12 01:11:35 user-pc nohang[447]: victim badness 12 < min_badness 20; nothing to do; response time: 7 ms
Jun 12 01:11:35 user-pc nohang[447]: Total stat (what happened in the last 31 min 59 sec):
Jun 12 01:11:35 user-pc nohang[447]:   Send SIGTERM to tail: 1
Jun 12 01:11:35 user-pc nohang[447]:   victim badness < min_badness: 54
Jun 12 01:11:35 user-pc nohang[447]: ##################################################################
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_timer: 25.917
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: sigkill_psi_exceeded: False
Jun 12 01:11:35 user-pc nohang[447]: psi_kill_exceeded_timer: 0
Jun 12 01:11:35 user-pc nohang[447]: sigterm_psi_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: psi_term_exceeded_timer: 66.3
Jun 12 01:11:35 user-pc nohang[447]: PSI avg:  92.97 | MemAvail: 2039 M, 90.7 % | SwapFree: 4586 M,  92.8 % | dMem:     0 M/s
Jun 12 01:11:35 user-pc nohang[447]: min_delay_after_sigterm IS EXCEEDED, it is time to action
Jun 12 01:11:35 user-pc nohang[447]: PSI avg (92.97) > sigterm_psi_threshold (90.0)
Jun 12 01:11:35 user-pc nohang[447]: PSI avg exceeded psi_excess_duration (value = 40.0 sec) for 66.3 seconds
Jun 12 01:11:35 user-pc nohang[447]: Found 70 processes with existing /proc/[pid]/exe
Jun 12 01:11:35 user-pc nohang[447]: Process with highest badness (found in 6 ms):
Jun 12 01:11:35 user-pc nohang[447]:   PID: 477, Name: Xorg, badness: 12
Jun 12 01:11:35 user-pc nohang[447]: Recheck memory levels...
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_timer: 25.924
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: sigkill_psi_exceeded: False
Jun 12 01:11:35 user-pc nohang[447]: psi_kill_exceeded_timer: 0
Jun 12 01:11:35 user-pc nohang[447]: sigterm_psi_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: psi_term_exceeded_timer: 66.3
Jun 12 01:11:35 user-pc nohang[447]: PSI avg (92.97) > sigterm_psi_threshold (90.0)
Jun 12 01:11:35 user-pc nohang[447]: PSI avg exceeded psi_excess_duration (value = 40.0 sec) for 66.3 seconds
Jun 12 01:11:35 user-pc nohang[447]: victim badness 12 < min_badness 20; nothing to do; response time: 7 ms
Jun 12 01:11:35 user-pc nohang[447]: Total stat (what happened in the last 31 min 59 sec):
Jun 12 01:11:35 user-pc nohang[447]:   Send SIGTERM to tail: 1
Jun 12 01:11:35 user-pc nohang[447]:   victim badness < min_badness: 55
Jun 12 01:11:35 user-pc nohang[447]: ##################################################################
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_timer: 26.026
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: sigkill_psi_exceeded: False
Jun 12 01:11:35 user-pc nohang[447]: psi_kill_exceeded_timer: 0
Jun 12 01:11:35 user-pc nohang[447]: sigterm_psi_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: psi_term_exceeded_timer: 66.4
Jun 12 01:11:35 user-pc nohang[447]: PSI avg:  92.97 | MemAvail: 2039 M, 90.7 % | SwapFree: 4586 M,  92.8 % | dMem:     2 M/s
Jun 12 01:11:35 user-pc nohang[447]: min_delay_after_sigterm IS EXCEEDED, it is time to action
Jun 12 01:11:35 user-pc nohang[447]: PSI avg (92.97) > sigterm_psi_threshold (90.0)
Jun 12 01:11:35 user-pc nohang[447]: PSI avg exceeded psi_excess_duration (value = 40.0 sec) for 66.4 seconds
Jun 12 01:11:35 user-pc nohang[447]: Found 70 processes with existing /proc/[pid]/exe
Jun 12 01:11:35 user-pc nohang[447]: Process with highest badness (found in 5 ms):
Jun 12 01:11:35 user-pc nohang[447]:   PID: 477, Name: Xorg, badness: 12
Jun 12 01:11:35 user-pc nohang[447]: Recheck memory levels...
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_timer: 26.032
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: sigkill_psi_exceeded: False
Jun 12 01:11:35 user-pc nohang[447]: psi_kill_exceeded_timer: 0
Jun 12 01:11:35 user-pc nohang[447]: sigterm_psi_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: psi_term_exceeded_timer: 66.4
Jun 12 01:11:35 user-pc nohang[447]: PSI avg (92.97) > sigterm_psi_threshold (90.0)
Jun 12 01:11:35 user-pc nohang[447]: PSI avg exceeded psi_excess_duration (value = 40.0 sec) for 66.4 seconds
Jun 12 01:11:35 user-pc nohang[447]: victim badness 12 < min_badness 20; nothing to do; response time: 6 ms
Jun 12 01:11:35 user-pc nohang[447]: Total stat (what happened in the last 31 min 59 sec):
Jun 12 01:11:35 user-pc nohang[447]:   Send SIGTERM to tail: 1
Jun 12 01:11:35 user-pc nohang[447]:   victim badness < min_badness: 56
Jun 12 01:11:35 user-pc nohang[447]: ##################################################################
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_timer: 26.133
Jun 12 01:11:35 user-pc nohang[447]: psi_post_action_delay_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: sigkill_psi_exceeded: False
Jun 12 01:11:35 user-pc nohang[447]: psi_kill_exceeded_timer: 0
Jun 12 01:11:35 user-pc nohang[447]: sigterm_psi_exceeded: True
Jun 12 01:11:35 user-pc nohang[447]: psi_term_exceeded_timer: 66.5
Jun 12 01:11:35 user-pc nohang[447]: PSI avg:  92.25 | MemAvail: 2039 M, 90.7 % | SwapFree: 4586 M,  92.8 % | dMem:    -2 M/s
Jun 12 01:11:35 user-pc nohang[447]: min_delay_after_sigterm IS EXCEEDED, it is time to action
Jun 12 01:11:35 user-pc nohang[447]: PSI avg (92.25) > sigterm_psi_threshold (90.0)
Jun 12 01:11:35 user-pc nohang[447]: PSI avg exceeded psi_excess_duration (value = 40.0 sec) for 66.5 seconds
Jun 12 01:11:35 user-pc nohang[447]: Found 70 processes with existing /proc/[pid]/exe
Jun 12 01:11:35 user-pc nohang[447]: Process with highest badness (found in 4 ms):
Jun 12 01:11:36 user-pc nohang[447]:   PID: 477, Name: Xorg, badness: 12
Jun 12 01:11:36 user-pc nohang[447]: Recheck memory levels...
Jun 12 01:11:36 user-pc nohang[447]: psi_post_action_delay_timer: 26.138
Jun 12 01:11:36 user-pc nohang[447]: psi_post_action_delay_exceeded: True
Jun 12 01:11:36 user-pc nohang[447]: sigkill_psi_exceeded: False
Jun 12 01:11:36 user-pc nohang[447]: psi_kill_exceeded_timer: 0
Jun 12 01:11:36 user-pc nohang[447]: sigterm_psi_exceeded: True
Jun 12 01:11:36 user-pc nohang[447]: psi_term_exceeded_timer: 66.5
Jun 12 01:11:36 user-pc nohang[447]: PSI avg (92.25) > sigterm_psi_threshold (90.0)
Jun 12 01:11:36 user-pc nohang[447]: PSI avg exceeded psi_excess_duration (value = 40.0 sec) for 66.5 seconds
Jun 12 01:11:36 user-pc nohang[447]: victim badness 12 < min_badness 20; nothing to do; response time: 5 ms
Jun 12 01:11:36 user-pc nohang[447]: Total stat (what happened in the last 32 min 0 sec):
Jun 12 01:11:36 user-pc nohang[447]:   Send SIGTERM to tail: 1
Jun 12 01:11:36 user-pc nohang[447]:   victim badness < min_badness: 57
Jun 12 01:11:36 user-pc nohang[447]: ##################################################################
Jun 12 01:11:36 user-pc nohang[447]: psi_post_action_delay_timer: 26.239
Jun 12 01:11:36 user-pc nohang[447]: psi_post_action_delay_exceeded: True
Jun 12 01:11:36 user-pc nohang[447]: sigkill_psi_exceeded: False
Jun 12 01:11:36 user-pc nohang[447]: psi_kill_exceeded_timer: 0
Jun 12 01:11:36 user-pc nohang[447]: sigterm_psi_exceeded: True
Jun 12 01:11:36 user-pc nohang[447]: psi_term_exceeded_timer: 66.6
Jun 12 01:11:36 user-pc nohang[447]: PSI avg:  92.25 | MemAvail: 2038 M, 90.7 % | SwapFree: 4586 M,  92.8 % | dMem:    -1 M/s
Jun 12 01:11:36 user-pc nohang[447]: min_delay_after_sigterm IS EXCEEDED, it is time to action
Jun 12 01:11:36 user-pc nohang[447]: PSI avg (92.25) > sigterm_psi_threshold (90.0)
Jun 12 01:11:36 user-pc nohang[447]: PSI avg exceeded psi_excess_duration (value = 40.0 sec) for 66.6 seconds
Jun 12 01:11:36 user-pc nohang[447]: Found 70 processes with existing /proc/[pid]/exe
Jun 12 01:11:36 user-pc nohang[447]: Process with highest badness (found in 5 ms):
Jun 12 01:11:36 user-pc nohang[447]:   PID: 477, Name: Xorg, badness: 12
Jun 12 01:11:36 user-pc nohang[447]: Recheck memory levels...
Jun 12 01:11:36 user-pc nohang[447]: psi_post_action_delay_timer: 26.245
Jun 12 01:11:36 user-pc nohang[447]: psi_post_action_delay_exceeded: True
Jun 12 01:11:36 user-pc nohang[447]: sigkill_psi_exceeded: False
Jun 12 01:11:36 user-pc nohang[447]: psi_kill_exceeded_timer: 0
Jun 12 01:11:36 user-pc nohang[447]: sigterm_psi_exceeded: True
Jun 12 01:11:36 user-pc nohang[447]: psi_term_exceeded_timer: 66.6
Jun 12 01:11:36 user-pc nohang[447]: PSI avg (92.25) > sigterm_psi_threshold (90.0)
Jun 12 01:11:36 user-pc nohang[447]: PSI avg exceeded psi_excess_duration (value = 40.0 sec) for 66.6 seconds
Jun 12 01:11:36 user-pc nohang[447]: victim badness 12 < min_badness 20; nothing to do; response time: 5 ms
Jun 12 01:11:36 user-pc nohang[447]: Total stat (what happened in the last 32 min 0 sec):
Jun 12 01:11:36 user-pc nohang[447]:   Send SIGTERM to tail: 1
Jun 12 01:11:36 user-pc nohang[447]:   victim badness < min_badness: 58
Jun 12 01:11:36 user-pc nohang[447]: ##################################################################
Jun 12 01:11:36 user-pc nohang[447]: psi_post_action_delay_timer: 26.346
Jun 12 01:11:36 user-pc nohang[447]: psi_post_action_delay_exceeded: True
Jun 12 01:11:36 user-pc nohang[447]: sigkill_psi_exceeded: False
Jun 12 01:11:36 user-pc nohang[447]: psi_kill_exceeded_timer: 0
Jun 12 01:11:36 user-pc nohang[447]: sigterm_psi_exceeded: True
Jun 12 01:11:36 user-pc nohang[447]: psi_term_exceeded_timer: 66.7
Jun 12 01:11:36 user-pc nohang[447]: PSI avg:  92.25 | MemAvail: 2038 M, 90.7 % | SwapFree: 4586 M,  92.8 % | dMem:     1 M/s
Jun 12 01:11:36 user-pc nohang[447]: min_delay_after_sigterm IS EXCEEDED, it is time to action
Jun 12 01:11:36 user-pc nohang[447]: PSI avg (92.25) > sigterm_psi_threshold (90.0)
Jun 12 01:11:36 user-pc nohang[447]: PSI avg exceeded psi_excess_duration (value = 40.0 sec) for 66.7 seconds
Jun 12 01:11:36 user-pc nohang[447]: Found 70 processes with existing /proc/[pid]/exe
Jun 12 01:11:36 user-pc nohang[447]: Process with highest badness (found in 5 ms):
Jun 12 01:11:36 user-pc nohang[447]:   PID: 477, Name: Xorg, badness: 12
Jun 12 01:11:36 user-pc nohang[447]: Recheck memory levels...
Jun 12 01:11:36 user-pc nohang[447]: psi_post_action_delay_timer: 26.353
Jun 12 01:11:36 user-pc nohang[447]: psi_post_action_delay_exceeded: True
Jun 12 01:11:36 user-pc nohang[447]: sigkill_psi_exceeded: False
Jun 12 01:11:36 user-pc nohang[447]: psi_kill_exceeded_timer: 0
Jun 12 01:11:36 user-pc nohang[447]: sigterm_psi_exceeded: True
Jun 12 01:11:36 user-pc nohang[447]: psi_term_exceeded_timer: 66.7
Jun 12 01:11:36 user-pc nohang[447]: PSI avg (92.25) > sigterm_psi_threshold (90.0)
Jun 12 01:11:36 user-pc nohang[447]: PSI avg exceeded psi_excess_duration (value = 40.0 sec) for 66.7 seconds
Jun 12 01:11:36 user-pc nohang[447]: victim badness 12 < min_badness 20; nothing to do; response time: 6 ms
Jun 12 01:11:36 user-pc nohang[447]: Total stat (what happened in the last 32 min 0 sec):
Jun 12 01:11:36 user-pc nohang[447]:   Send SIGTERM to tail: 1
Jun 12 01:11:36 user-pc nohang[447]:   victim badness < min_badness: 59
Jun 12 01:11:36 user-pc nohang[447]: ##################################################################
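
To cross-check the "PSI avg" values in a log like the one above against the kernel's own numbers, the pressure file can be read directly. This is a minimal sketch, assuming a kernel with PSI enabled and the documented format of /proc/pressure/memory (lines starting with `some` or `full`, followed by `avg10=`, `avg60=`, `avg300=` and `total=` fields). Which of these metrics nohang compares against its threshold depends on its configuration and is not shown in the log. The function names are hypothetical:

```python
def parse_psi(text):
    """Parse PSI file content into {'some': {'avg10': ...}, 'full': {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        values = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            values[key] = float(value)
        result[kind] = values
    return result


def read_memory_psi(path="/proc/pressure/memory"):
    """Read and parse the system-wide memory pressure file."""
    with open(path) as f:
        return parse_psi(f.read())
```

Comparing `read_memory_psi()` output with MemAvailable at the same moment can help tell whether the high pressure reading is real or stale.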

Packaging for RPM

Some rpmlint errors:

E: zero-length /etc/nohang/version

Is this empty file really necessary, or could it be deleted safely?

Install fails

Installing the latest git master (c4f6a66) fails for me (Ubuntu 18.04):

$ sudo make install                
install -d /usr/local/bin
install -m0755 nohang /usr/local/bin/nohang
install -m0755 nohang_notify_helper /usr/local/bin/nohang_notify_helper
install -m0755 oom-sort /usr/local/bin/oom-sort
install -m0755 psi-top /usr/local/bin/psi-top
install -m0755 psi-monitor /usr/local/bin/psi-monitor
install -d /etc/nohang
git describe --tags --long --dirty > version
install -m0644 version /etc/nohang/version
rm -fv version
removed 'version'
install -m0644 nohang.conf /etc/nohang/nohang.conf
install -m0644 nohang.conf /etc/nohang/nohang.conf.default
install -d /etc/logrotate.d
install -m0644 nohang.logrotate /etc/logrotate.d/nohang
install -d /usr/share/man/man1
gzip -c nohang.1 > /usr/share/man/man1/nohang.1.gz
gzip -c oom-sort.1 > /usr/share/man/man1/oom-sort.1.gz
install -d /etc/systemd/system
sed "s|:TARGET_BIN:|/usr/local/bin|g;s|:TARGET_CONF:|/etc|g" nohang.service.in > nohang.service
install -m0644 nohang.service /etc/systemd/system/nohang.service
rm -fv nohang.service
removed 'nohang.service'
chcon -t systemd_unit_file_t /etc/systemd/system/nohang.service
chcon: can't apply partial context to unlabeled file '/etc/systemd/system/nohang.service'
Makefile:12: recipe for target 'install' failed
make: [install] Error 1 (ignored)

I guess the problem is that

File context can be temporarily modified with the chcon command. If you want to permanently change the file context you need to use the semanage fcontext command. This will modify the SELinux labeling database. You will need to use restorecon to apply the labels.

-- https://www.systutorials.com/docs/linux/man/8-system_selinux/

Kill a cgroup as a single unit

$respect_memory.oom.group = False

@kill_cgroup_v2_group_re  ^/workload\.slice/

@kill_cgroup_v1_group_re  ^/workload\.slice/

@kill_cgroup_v1_group_re  ^/system\.slice/foo\.service$

If the cgroup of the process matches the specified pattern, all processes in the same cgroup will be killed.
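
A minimal sketch of how such group selection could work for cgroup v2, assuming the cgroup path is taken from the `0::<path>` entry of /proc/[pid]/cgroup and matched against a regular expression like the ones proposed above. The function names and the `pid_to_cgroup` mapping are hypothetical, not part of nohang:

```python
import re


def cgroup_v2_path(cgroup_text):
    """Return the cgroup v2 path from /proc/[pid]/cgroup content, or None."""
    for line in cgroup_text.splitlines():
        # The unified (v2) hierarchy entry has the form "0::<path>".
        if line.startswith("0::"):
            return line[3:]
    return None


def group_victims(victim_pid, pid_to_cgroup, group_re):
    """Return the sorted PIDs that should be killed together with victim_pid.

    If the victim's cgroup does not match group_re, only the victim is returned.
    """
    victim_cgroup = pid_to_cgroup.get(victim_pid)
    if victim_cgroup is None or not re.search(group_re, victim_cgroup):
        return [victim_pid]
    return sorted(
        pid for pid, cgroup in pid_to_cgroup.items() if cgroup == victim_cgroup
    )
```

For example, with the pattern `^/workload\.slice/`, a victim in `/workload.slice/job.service` would take every other process in that same cgroup with it, while a victim in `/system.slice/ssh.service` would be killed alone.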

remove oom_score_adj_max

decrease_oom_score_adj = False
oom_score_adj_max = 0

->

ignore_positive_oom_score_adj = False
