
mdb's Introduction

mdb


An MPI-aware frontend for serial debuggers, such as gdb and lldb.

Documentation

For help with installation, a quick-start tutorial (with an example debug session) and an API reference, please check out mdb's documentation.

Purpose

mdb is a debugger aimed at parallel programs using the MPI programming paradigm. mdb acts as an MPI-aware frontend for different backend debuggers, such as gdb and lldb. As such, it supports the following languages:

  • C
  • C++
  • Fortran

Technically gdb supports other languages as well, but this is the intersection of languages that MPI is implemented in. For lldb your mileage may vary when debugging Fortran.

Usage

Please see the quick start guide in the documentation for a walk-through of a simple debug session. The guide covers basic debug commands and information on how to launch the debugger.

Installation

These instructions are for normal use of mdb. Please see below for a developer install.

  1. Clone the repository.

    git clone https://github.com/TomMelt/mdb.git
  2. (optional - but recommended) Create a conda environment or venv.

    conda create -n mdb python
    conda activate mdb
  3. Install mdb.

    cd mdb/
    pip install .

More information can be found in the installation guide.

Please note: mdb doesn't currently support Windows (see here for more info).

Dependencies

Non-Python Dependencies

  • Either gdb or lldb (depending on your preference)

mdb does not package gdb or lldb. You will need one of them installed on your system in order to run mdb. Please visit the respective debugger's site for installation instructions, e.g., gdb and lldb.

Python Dependencies

The main Python dependencies are listed in the pyproject.toml file, e.g.,

  • click
  • matplotlib
  • numpy
  • pexpect

These will all be installed as part of the default pip installation. See installing mdb in the documentation for more information.

  • termgraph (optional - fancy Unicode plots straight to your terminal)

termgraph is optional but can be installed alongside mdb. See installing mdb in the documentation for more information.

Supported MPI implementations

Currently I am building and testing for Open MPI only. In principle it won't take much work to expand to other implementations, but I just haven't done it yet.

  • Open MPI mpirun and mpiexec
  • Intel MPI mpirun and mpiexec
  • Slurm srun (should work but still needs testing)
  • others...

TODO

  • rewrite launcher to add more functionality (e.g., auto-restart if MPI job fails)
  • intercept stdin to run commands on another process (or processes) inside of an interactive session
  • track MPI communication dependencies (holistic metric)
  • print aggregated backtrace (holistic metric)
  • record asciinema demo? / youtube video?

Contributing

If you would like to be involved in the development, feel free to submit a PR. A word of caution though: the code is currently in a highly volatile state and I plan major changes to the interface and layout. I will update this section when I reach a more stable part of the development. Either way, changes are welcome at any time.

Please see CONTRIBUTING.md for more details on how best to contribute.

Developers

For development it is best to install mdb with some additional dependencies. These can be installed following the installing mdb for developers guide.

Acknowledgements

This project was inspired by @mystery-e204's mpidb tool and @Azrael3000's tmpi tmux interface.

Similar Projects

I have recently come across @robertu94's mpigdb. It seems to offer similar functionality and it has a closer integration with gdb using gdb's inbuilt inferiors to handle multiple processes at the same time (see gdb manual sec. 4.9 for more info). The main difference from my perspective is that I can plot variables across MPI processes using mdb and AFAIK mpigdb cannot. If you like mdb you may want to check out mpigdb as well.

mdb's People

Contributors

fjebaker, tommelt


mdb's Issues

gdbserver is not secure

Due to an underlying security issue with gdbserver, it is not recommended to run mdb on multi-user systems connected to a public network e.g., most HPC systems.

Warning: gdbserver does not have any built-in security. Do not run gdbserver connected to any public network; a GDB connection to gdbserver provides access to the target system with the same privileges as the user running gdbserver.

(source gdb manual)

I am working on a fix which involves replacing the gdbserver backend with my own tls/ssl-encrypted server to manage connections and directly use gdb as a backend instead.

I am planning to use randomly-generated tokens (similar to jupyter notebook) to secure the connection.

  • replace gdbserver backend with tls/ssl-encrypted server + gdb backend
  • update documentation and remove security notice
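As a rough illustration of the token idea mentioned above, the following sketch generates a random token and checks a presented token in constant time. The helper names `new_token` and `token_matches` are hypothetical and are not part of mdb; the TLS layer itself is not shown.

```python
import hmac
import secrets


def new_token(nbytes: int = 24) -> str:
    """Return a random URL-safe token (hypothetical helper, not mdb's API)."""
    return secrets.token_urlsafe(nbytes)


def token_matches(expected: str, presented: str) -> bool:
    """Compare two tokens in constant time to avoid leaking timing information."""
    return hmac.compare_digest(expected.encode(), presented.encode())
```

The launcher would print the token once at start-up, and the attaching client would have to present it before any debugger traffic is accepted.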

Multi-node jobs unsupported

Whilst remote debugging will work, if a single MPI launch spans multiple nodes then I currently have no way of passing more than one host to mdb attach. This will need to be fixed.

Support for MPICH and derived MPI implementations

I just wanted to point out that the -configfile argument [1] passed to mpirun in the Intel MPI case (though perhaps with just one leading - rather than two) should also work for MPICH and derived implementations.

Pretty much all MPI implementations except OpenMPI are derived from MPICH (including Intel MPI) so it might be reasonable to assume any non-OpenMPI mpirun (or mpiexec) command supports this argument (the output of mpirun --version in the case of vanilla MPICH is not the nicest to parse).
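The heuristic suggested above could be sketched as follows. It assumes only that Open MPI's launcher identifies itself with the string "Open MPI" in its version output; everything else is treated as MPICH-derived. The function names are hypothetical and not part of mdb.

```python
import subprocess


def launcher_version(launcher: str = "mpirun") -> str:
    """Return the raw text printed by `<launcher> --version`."""
    result = subprocess.run(
        [launcher, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr


def supports_configfile(version_output: str) -> bool:
    """Guess whether the launcher accepts -configfile.

    Assumption from the discussion above: any launcher that does not
    identify itself as Open MPI is MPICH-derived and accepts the argument.
    """
    return "Open MPI" not in version_output
```

This avoids parsing the vanilla MPICH version output at all, at the cost of misclassifying any launcher that is neither Open MPI nor MPICH-derived.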

Footnotes

  1. In fact, the MPI spec explicitly recommends the -configfile argument (p513).

Proposal: broadcast port

The current issue

At the moment, mdb uses the following procedure for launching and attaching:

# Instance A
mdb launch -n $RANKS ./exe

# Instance B
mdb attach -n $RANKS -b MAIN__

The number of ranks in both places must be equal. If the gdb servers are spawned at different addresses (#8), then the attach subcommand becomes more complex. If A has some configuration that is currently unknown to B but that B ought to know about, then B must also be given it on the command line.

Also, say some service is listening on 2005. Then if mdb is using more than 5 ranks, the launch will fail and the user will have to trial and error a new port range that has a contiguous range of open ports.

The proposed solution

My proposal is to have a dedicated broadcast port which does a handshake to exchange information about what is being debugged and who is doing the debugging.

This could look something like:

Instance A: launches N gdb servers on 2001,2002,...2004,2006,...,2001+N, but listens itself at port 2000.

Instance B: connects to port 2000
  A: sends information about the launch, e.g. "there are N ranks, they are at these addresses: localhost:2001, localhost:2002. they are using intel MPI (might be relevant, who knows)"
  B: returns a checksum which A can verify
  A: ack to B and end handshake. stops listening on 2000
  B: connects to the GDB servers it received addresses for with knowledge about what's going on
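Skipping occupied ports, as Instance A does in the outline above (2004 is followed by 2006), could look roughly like this. It is only a sketch: `port_is_free` and `pick_ports` are hypothetical names, and the default start port of 2001 is taken from the example.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def pick_ports(count: int, start: int = 2001, host: str = "127.0.0.1") -> list[int]:
    """Collect `count` free ports at or above `start`, skipping busy ones."""
    ports: list[int] = []
    port = start
    while len(ports) < count:
        if port > 65535:
            raise RuntimeError("ran out of ports before finding enough free ones")
        if port_is_free(port, host):
            ports.append(port)
        port += 1
    return ports
```

A port that is free when checked can be taken by another process before the gdb server binds to it, so a real launcher would still need to handle a failed bind.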

In practice:

# Instance A
mdb launch -n $RANKS ./exe

# Instance B
mdb attach -b MAIN__

I think there's a lot of cool flexibility to be gained from this type of broadcast architecture. Am very happy to help with the implementation of this too.
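The handshake outlined above could be sketched with newline-delimited JSON over an already-connected socket. This is an illustration under assumptions, not mdb's implementation: the message fields, the SHA-256 checksum and the function names `announce` and `discover` are all made up for the example.

```python
import hashlib
import json
import socket


def _send(conn: socket.socket, message: dict) -> None:
    """Send one JSON message terminated by a newline."""
    conn.sendall(json.dumps(message).encode() + b"\n")


def _recv(conn: socket.socket) -> dict:
    """Read one newline-terminated JSON message, a byte at a time."""
    buf = bytearray()
    while not buf.endswith(b"\n"):
        byte = conn.recv(1)
        if not byte:
            raise ConnectionError("peer closed the connection mid-handshake")
        buf += byte
    return json.loads(buf)


def digest(info: dict) -> str:
    """Checksum of the launch information (SHA-256 is an assumption)."""
    return hashlib.sha256(json.dumps(info, sort_keys=True).encode()).hexdigest()


def announce(conn: socket.socket, info: dict) -> bool:
    """Instance A: send launch info, verify B's checksum, then acknowledge."""
    _send(conn, info)
    ok = _recv(conn).get("checksum") == digest(info)
    _send(conn, {"ack": ok})
    return ok


def discover(conn: socket.socket) -> dict:
    """Instance B: receive launch info, return its checksum, wait for the ack."""
    info = _recv(conn)
    _send(conn, {"checksum": digest(info)})
    if not _recv(conn).get("ack"):
        raise ConnectionError("launcher rejected the checksum")
    return info
```

Instance A would call `announce` on the connection it accepts on port 2000, and Instance B would call `discover` after connecting, then use the returned addresses to reach the gdb servers.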

What this could bring into scope

This could then also be used when daisy chaining mdb instances. Consider the above handshake. After A stops listening, B can listen on 2000, and if C comes along to connect, it can learn everything from B about what has happened and get in sync.

Potential issues

There are 100% zero things that could ever go wrong with this forever and ever.

finalize mdb launch logger

When the core mdb code is more stable, update the mdb launch logger to save the log to the ~/.mdb directory with a UUID.
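One possible shape for this, using only the standard library, is sketched below. The function name `make_launch_logger`, the `launch-<uuid>.log` file name and the log format are assumptions for illustration.

```python
import logging
import uuid
from pathlib import Path


def make_launch_logger(
    log_dir: Path = Path.home() / ".mdb",
) -> tuple[logging.Logger, Path]:
    """Create a logger that writes to <log_dir>/launch-<uuid>.log."""
    log_dir.mkdir(parents=True, exist_ok=True)
    run_id = uuid.uuid4().hex
    log_path = log_dir / f"launch-{run_id}.log"

    # A per-run logger name keeps handlers from separate launches apart.
    logger = logging.getLogger(f"mdb.launch.{run_id}")
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, log_path
```

Returning the path alongside the logger lets the launcher tell the user where the log for this run ended up.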

catch user input error

For example, if the user types interact 0' instead of interact 0 they will get a rather annoying error that exits the debugging session.

(mdb 0-7) interact 0'
Traceback (most recent call last):
  File "/home/melt/miniconda3/envs/mdb/bin/mdb", line 33, in <module>
    sys.exit(load_entry_point('mdb', 'console_scripts', 'mdb')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/melt/miniconda3/envs/mdb/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/melt/miniconda3/envs/mdb/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/melt/miniconda3/envs/mdb/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/melt/miniconda3/envs/mdb/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/melt/miniconda3/envs/mdb/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/melt/sync/cambridge/projects/side/mdb/mdb/mdb_attach.py", line 115, in attach
    mshell.cmdloop()
  File "/home/melt/miniconda3/envs/mdb/lib/python3.12/cmd.py", line 138, in cmdloop
    stop = self.onecmd(line)
           ^^^^^^^^^^^^^^^^^
  File "/home/melt/miniconda3/envs/mdb/lib/python3.12/cmd.py", line 217, in onecmd
    return func(arg)
           ^^^^^^^^^
  File "/home/melt/sync/cambridge/projects/side/mdb/mdb/mdb_shell.py", line 114, in do_interact
    rank: int = int(line)
                ^^^^^^^^^
ValueError: invalid literal for int() with base 10: "0'"

It is probably sufficient to just catch the ValueError and return a message like Optional argument to command must be a string containing comma- and/or hyphen-separated numbers e.g., command 1,2-5 [command]. For help type "?" or "help"
