rfparedes / gdg

Granular Data Gatherer is an easy and open all-in-one tool to collect OS metrics for troubleshooting

License: GNU General Public License v3.0

Go 100.00%

gdg's Introduction


Granular Data Gatherer (gdg)

Collects Granular OS Metrics for Troubleshooting
Report Bug · Request Feature

Table of Contents

  1. About The Project
  2. Getting Started
  3. Technical Details
  4. Usage
  5. Build It Yourself
  6. Validated Distributions
  7. Roadmap
  8. Contributing
  9. License
  10. Reference

About The Project

gdg, or Granular Data Gatherer, was developed in Go to fill a gap: there was no easy, open, all-in-one tool for collecting OS metrics for troubleshooting. OSWatcher and nmon cannot be the only viable options.

Getting Started

To get a local copy up and running, follow these simple steps.

Prerequisites

  • a server, instance, or VM running a systemd-enabled Linux distribution

Installation

Download the binary from Releases (https://github.com/rfparedes/gdg/releases/latest/download/gdg) to /usr/local/sbin on the server and run:

sudo chmod +x /usr/local/sbin/gdg

Start it

sudo /usr/local/sbin/gdg --start

Check Status Anytime

/usr/local/sbin/gdg --status

Technical Details

  • gdg has three components, each of which can be started or stopped separately:

    1. granular data collection using standard utilities
    2. rtmon collection of network state information
    3. detection of processes in D state, with an automated sysrq-t
  • gdg uses standard Linux utilities to perform its work, including:

    • iostat
    • top
    • mpstat
    • vmstat
    • ss
    • nstat
    • ps
    • nfsiostat
    • ethtool
    • ip
    • pidstat
    • numastat
    • sar
    • rtmon
  • gdg detects which utilities are available and uses only those that are installed. You can install any of the utilities above at any time, before or after setup. They are spread across six packages. On most distributions, the sysstat package contains iostat, mpstat, pidstat, and sar; nfs-common or nfs-client contains nfsiostat; procps contains top, vmstat, and ps; iproute2 contains ss, nstat, ip, and rtmon; ethtool contains ethtool; and numactl contains numastat.

  • gdg keeps seven days of logs by default; the --logdays option changes this. In addition, every log file that has not yet been gzipped, except the one currently being written, is gzipped hourly. gdg --status reports the space gdg is currently using.

  • gdg will create a configuration file in /etc/gdg.cfg and a data directory in /var/log/gdg-data.

  • gdg uses a systemd timer so there is no running daemon.

  • gdg installs two systemd services and two systemd timers on --start. One service and timer pair runs the data collection. The other pair tidies the logs every hour.

  • gdg removes the systemd service and systemd timer on --stop. All other files are untouched.

  • gdg collects data in the /var/log/gdg-data directory. Each subdirectory is named after the utility (e.g. iostat) that collected the data. Inside each subdirectory are .dat files named in the format utility_YY.MM.DD.HH00.dat (e.g. meminfo_21.03.07.2300.dat). Each .dat file contains at most one hour of data.

  • To step chronologically through the data collected in a .dat file, search for the string zzz.

  • rtmon logging must be enabled explicitly. It collects network state information directly from the kernel on an ongoing basis. Enabling it starts a systemd service that keeps running for as long as rtmon is enabled. The resulting log can be used to prove that service issues started after an external network failure. [1]

  • If d-state is enabled, each interval run counts the processes in D state. If the count is greater than or equal to a user-defined value, echo t > /proc/sysrq-trigger is executed to get a task trace of all processes. This is a one-time action: once a task trace has been triggered, it is not triggered again until the user explicitly enables d-state again.
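
The D-state check described above can be illustrated with a short Go sketch. This is not gdg's actual code; it only assumes the documented Linux /proc/<pid>/stat layout, where the process state is the field following the parenthesised command name. The function names and the threshold value are hypothetical, and the sketch only reports the count rather than writing to /proc/sysrq-trigger.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// stateFromStat extracts the one-letter process state from the contents of
// /proc/<pid>/stat. The state follows the command name, which is wrapped in
// parentheses and may itself contain spaces or parentheses.
func stateFromStat(stat string) (byte, bool) {
	end := strings.LastIndexByte(stat, ')')
	if end < 0 || end+2 >= len(stat) {
		return 0, false
	}
	return stat[end+2], true
}

// countDState walks procDir and counts processes whose state is 'D'
// (uninterruptible sleep).
func countDState(procDir string) (int, error) {
	entries, err := os.ReadDir(procDir)
	if err != nil {
		return 0, err
	}
	count := 0
	for _, e := range entries {
		if _, err := strconv.Atoi(e.Name()); err != nil {
			continue // not a PID directory
		}
		data, err := os.ReadFile(filepath.Join(procDir, e.Name(), "stat"))
		if err != nil {
			continue // the process may have exited in the meantime
		}
		if s, ok := stateFromStat(string(data)); ok && s == 'D' {
			count++
		}
	}
	return count, nil
}

func main() {
	const numProcs = 5 // hypothetical threshold, standing in for <NUMPROCS>
	n, err := countDState("/proc")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc:", err)
		os.Exit(1)
	}
	fmt.Printf("%d process(es) in D state; threshold reached: %t\n", n, n >= numProcs)
}
```

Searching for the last closing parenthesis, rather than splitting on spaces, keeps the parse correct for command names that contain spaces.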

Usage

To start collection in 30s intervals and keep logs for 7 days, run

sudo /usr/local/sbin/gdg --interval 30 --logdays 7 --start
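
To make the effect of --logdays concrete, here is a hedged Go sketch that reads the hour stamp out of a .dat file name (the utility_YY.MM.DD.HH00.dat format from Technical Details) and checks whether the file is older than the retention period. Whether gdg decides by file name or by modification time is not stated in this README, so treat this as one possible approach. The function names and the .gz suffix for compressed logs are assumptions.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// hourLayout mirrors the YY.MM.DD.HH00 stamp in gdg .dat file names,
// expressed in Go's reference-time notation.
const hourLayout = "06.01.02.1504"

// parseDatHour returns the hour encoded in a name such as
// "meminfo_21.03.07.2300.dat". A trailing ".gz" is assumed for
// compressed logs and is stripped first.
func parseDatHour(name string) (time.Time, error) {
	base := strings.TrimSuffix(strings.TrimSuffix(name, ".gz"), ".dat")
	i := strings.LastIndexByte(base, '_')
	if i < 0 {
		return time.Time{}, fmt.Errorf("no timestamp in %q", name)
	}
	return time.Parse(hourLayout, base[i+1:])
}

// expired reports whether the file's hour stamp is more than logDays old.
func expired(name string, logDays int, now time.Time) (bool, error) {
	ts, err := parseDatHour(name)
	if err != nil {
		return false, err
	}
	return now.Sub(ts) > time.Duration(logDays)*24*time.Hour, nil
}

func main() {
	now := time.Date(2021, 3, 15, 0, 0, 0, 0, time.UTC)
	for _, name := range []string{
		"meminfo_21.03.07.2300.dat",
		"iostat_21.03.14.1000.dat.gz",
	} {
		old, err := expired(name, 7, now)
		fmt.Println(name, old, err)
	}
}
```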

To stop collection, run

sudo /usr/local/sbin/gdg --stop

To see the collected data, run

cd /var/log/gdg-data
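
Once inside the data directory, the zzz search string mentioned in Technical Details can also be used programmatically. The Go sketch below lists the line numbers at which zzz appears in a file, so that each sample can be located in order. It assumes only that zzz appears somewhere on a marker line; the exact layout of gdg's .dat files is not documented here, the sample text is made up, and the function names are hypothetical. Gzipped files are not handled.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// markerLines returns the 1-based numbers of the lines that contain marker.
func markerLines(r io.Reader, marker string) ([]int, error) {
	var hits []int
	sc := bufio.NewScanner(r)
	for n := 1; sc.Scan(); n++ {
		if strings.Contains(sc.Text(), marker) {
			hits = append(hits, n)
		}
	}
	return hits, sc.Err()
}

// markerLinesInText is a convenience wrapper for in-memory text.
func markerLinesInText(text, marker string) []int {
	hits, _ := markerLines(strings.NewReader(text), marker)
	return hits
}

func main() {
	if len(os.Args) < 2 {
		// Made-up sample; the real layout of a .dat file may differ.
		sample := "zzz sample one\ndata\ndata\nzzz sample two\ndata\n"
		fmt.Println(markerLinesInText(sample, "zzz"))
		return
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()
	hits, err := markerLines(f, "zzz")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(hits)
}
```

Run without arguments, the program scans the built-in sample; given a path to an uncompressed .dat file, it scans that file instead.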

To see the current status of gdg including start/stop status, version, interval, data location, and current size of collected data, run

/usr/local/sbin/gdg --status

e.g.

~~~~~~~~~~~~~~~
  gdg status
~~~~~~~~~~~~~~~
VERSION: gdg-0.9.1
STATUS: started
RTMON: started
INTERVAL: 15s
LOG DAYS TO KEEP: 14d
DATA LOCATION: /var/log/gdg-data/
CONFIG LOCATION: /etc/gdg.cfg
CURRENT DATA SIZE: 318MB
~~~~~~~~~~~~~~~
DSTATE: stopped
NUMPROCS: 0

To change the interval (-t) or logdays (-l), or after installing additional supported utilities, run

sudo /usr/local/sbin/gdg --reload --interval 60 --logdays 14
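
gdg is described above as using only the utilities it finds installed. A minimal Go sketch of that kind of check, using exec.LookPath from the standard library, is shown here. It is an illustration rather than gdg's own detection code, and the function names are hypothetical. The utility list is the one from Technical Details.

```go
package main

import (
	"fmt"
	"os/exec"
)

// utilities is the list of tools named in Technical Details.
var utilities = []string{
	"iostat", "top", "mpstat", "vmstat", "ss", "nstat", "ps",
	"nfsiostat", "ethtool", "ip", "pidstat", "numastat", "sar", "rtmon",
}

// installed reports whether name resolves to an executable on PATH.
func installed(name string) bool {
	_, err := exec.LookPath(name)
	return err == nil
}

// available filters names down to those for which isInstalled returns true.
// Passing the check as a function makes the filter easy to exercise
// without touching the real system.
func available(names []string, isInstalled func(string) bool) []string {
	var found []string
	for _, n := range names {
		if isInstalled(n) {
			found = append(found, n)
		}
	}
	return found
}

func main() {
	fmt.Println("available utilities:", available(utilities, installed))
}
```

Note that exec.LookPath searches the PATH of the calling process, which can differ between an interactive shell, sudo, and a systemd service.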

To toggle rtmon logging on or off, run

sudo /usr/local/sbin/gdg --rtmon

To enable d-state detection, which triggers sysrq-t once at least NUMPROCS processes are in D state, run

sudo /usr/local/sbin/gdg --dst <NUMPROCS>

For help

/usr/local/sbin/gdg --help

Build It Yourself

  • You'll need a Go compiler installed.

Clone it

git clone https://github.com/rfparedes/gdg.git

Build it

cd gdg
go build -o gdg

Move it

sudo mv gdg /usr/local/sbin/
sudo chmod +x /usr/local/sbin/gdg

Start it

sudo /usr/local/sbin/gdg --start

Validated Distributions

gdg has been validated on:

  • SLE-12 (SLES or SLES-SAP 12 all SPs)
  • SLE-15 (SLES or SLES-SAP 15 all SPs)
  • openSUSE Leap 12/15
  • Debian 9
  • Debian 10
  • RHEL7
  • RHEL8
  • Ubuntu 18.04
  • Ubuntu 20.04

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the GPL-3.0 License. See LICENSE for more information.

Reference

[1] https://www.suse.com/support/kb/doc/?id=000019863

gdg's People

Contributors

johanburati, rfparedes

Forkers

johanburati

gdg's Issues

Service file permissions

I'm seeing the following message in the logs:

systemd[1]: Configuration file /etc/systemd/system/gdg.service is marked executable. Please remove executable permission bits. Proceeding anyway.

The service file is created with 755 permissions; I think it should be changed to 644.

-rwxr-xr-x. 1 root root 153 Mar  9 06:47 /etc/systemd/system/gdg.service

Btw great project !
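
For illustration, a Go sketch of writing a unit file with 644 permissions follows. It is not taken from the gdg source; the function names and the placeholder file content are hypothetical. Because os.WriteFile applies the mode only when it creates the file, and the result is subject to the umask, the sketch also calls os.Chmod so that an existing 755 file is corrected too.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeUnit writes content to path and makes sure the file ends up with
// mode 0644, whether or not it already existed.
func writeUnit(path, content string) error {
	if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
		return err
	}
	return os.Chmod(path, 0o644)
}

// modeOf returns the permission bits of path.
func modeOf(path string) (os.FileMode, error) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	// Work in a temporary directory instead of /etc/systemd/system.
	dir, err := os.MkdirTemp("", "gdg-unit")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "gdg.service")
	if err := writeUnit(path, "# placeholder, not gdg's real unit content\n"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	mode, err := modeOf(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(mode, path)
}
```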

gdg.service: Succeeded. keeps repeating in the logs

Hi Rich,

The gdg.service: Succeeded. message keeps repeating in the logs,
I understand it is related to the interval, but it would be nice if we could get rid of that message.

Mar  9 07:11:20 r83t systemd[1]: gdg.service: Succeeded.
Mar  9 07:11:50 r83t systemd[1]: gdg.service: Succeeded.
Mar  9 07:12:21 r83t systemd[1]: gdg.service: Succeeded.
Mar  9 07:12:51 r83t systemd[1]: gdg.service: Succeeded.
Mar  9 07:13:22 r83t systemd[1]: gdg.service: Succeeded.
Mar  9 07:13:52 r83t systemd[1]: gdg.service: Succeeded.
Mar  9 07:14:23 r83t systemd[1]: gdg.service: Succeeded.
Mar  9 07:14:53 r83t systemd[1]: gdg.service: Succeeded.
Mar  9 07:15:24 r83t systemd[1]: gdg.service: Succeeded.
Mar  9 07:15:54 r83t systemd[1]: gdg.service: Succeeded.

Overwrite the gdg.cfg anytime stopped and started

This is necessary because start always rediscovers the supported binaries and adds them to gdg.cfg. If a binary or, for instance, a network interface has since been removed, its entries in gdg.cfg will be stale.

SLES 11 support

SLES 11 doesn't have systemd, so cron would be used as the scheduler.
