
tlparse's Introduction

tlparse: Parse structured PT2 logs

tlparse parses structured torch trace logs and outputs HTML files that analyze the data.

Quick start: Run PT2 with the TORCH_TRACE environment variable set:

TORCH_TRACE=/tmp/my_traced_log python example.py

Feed input into tlparse:

tlparse /tmp/my_traced_log -o tl_out/

Adding custom parsers

You can extend tlparse with custom parsers, which take existing structured log data and output arbitrary files. To do so, first implement the StructuredLogParser trait for your own type:

pub struct MyCustomParser;
impl StructuredLogParser for MyCustomParser {
    fn name(&self) -> &'static str {
        "my_custom_parser"
    }
    fn get_metadata<'e>(&self, e: &'e Envelope) -> Option<Metadata<'e>> {
        // Get required metadata from the Envelope.
        // You'll need to update Envelope with your custom Metadata if you need new types here.
        todo!()
    }

    fn parse<'e>(
        &self,
        lineno: usize,
        metadata: Metadata<'e>,
        _rank: Option<u32>,
        compile_id: &Option<CompileId>,
        payload: &str,
    ) -> anyhow::Result<ParserResult> {
        // Use the metadata and payload however you'd like.
        // Return either a ParserOutput::File(filename, payload) or a ParserOutput::Link(name, url).
        todo!()
    }
}
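For illustration, here is a hypothetical parse implementation that simply writes each payload to its own output file. This is only a sketch: it assumes ParserResult is a collection of ParserOutput values and that ParserOutput::File takes a path-like filename plus the file contents, which may not exactly match the real tlparse API.

impl StructuredLogParser for MyCustomParser {
    // ... name() and get_metadata() as above ...

    fn parse<'e>(
        &self,
        lineno: usize,
        _metadata: Metadata<'e>,
        _rank: Option<u32>,
        _compile_id: &Option<CompileId>,
        payload: &str,
    ) -> anyhow::Result<ParserResult> {
        // Write the raw payload to a per-line output file so it shows up as
        // an artifact in the generated report. (Assumed API: ParserResult is
        // a Vec of ParserOutput and File takes (filename, contents).)
        let filename = format!("my_custom_parser/line_{}.txt", lineno);
        Ok(vec![ParserOutput::File(filename.into(), payload.to_string())])
    }
}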


tlparse's Issues

Improve stack trie suffix pruning

I'm still regularly seeing stack tries that look like this:

torch/nn/modules/module.py:1566 in _wrapped_call_impl
torch/nn/modules/module.py:1575 in _call_impl
torch/_dynamo/eval_frame.py:433 in _fn
torch/nn/modules/module.py:1566 in _wrapped_call_impl
torch/nn/modules/module.py:1575 in _call_impl
torch/_dynamo/convert_frame.py:1116 in __call__
[[2/0]](https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/.tmp0umcnJ/index.html#[2/0]) [[2/1]](https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/.tmp0umcnJ/index.html#[2/1]) torch/_dynamo/convert_frame.py:472 in __call__

The torch/_dynamo/convert_frame.py:1116 in __call__ frame definitely needs to be killed. But there's also some funny business with the _wrapped_call_impl indirection that is also unnecessary 🤔
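One possible approach, sketched below with made-up types, is to strip known wrapper frames against a denylist before inserting a stack into the trie. The Frame struct and the exact denylist are assumptions for illustration, not tlparse's actual data model.

// Hypothetical frame type; tlparse's real representation may differ.
#[derive(Clone, Debug)]
struct Frame {
    filename: String,
    line: u32,
    name: String,
}

// Wrapper frames that add no information to the stack trie.
const DENYLIST: &[(&str, &str)] = &[
    ("torch/nn/modules/module.py", "_wrapped_call_impl"),
    ("torch/nn/modules/module.py", "_call_impl"),
    ("torch/_dynamo/eval_frame.py", "_fn"),
    ("torch/_dynamo/convert_frame.py", "__call__"),
];

// Drop denylisted frames from a stack before it goes into the trie.
fn prune_wrapper_frames(stack: Vec<Frame>) -> Vec<Frame> {
    stack
        .into_iter()
        .filter(|f| {
            !DENYLIST
                .iter()
                .any(|(file, func)| f.filename.ends_with(file) && f.name == *func)
        })
        .collect()
}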

tlparse --latest

If I repeatedly TORCH_TRACE into a single directory, I'll accumulate lots of log files across runs. It would be convenient to have a --latest flag that makes tlparse process only the latest log.
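A minimal sketch of the selection logic such a flag could use, with only the standard library: pick the most recently modified file in the trace directory and feed just that one to the existing parsing path. The flag and the function below don't exist yet; they are illustrative only.

use std::fs;
use std::io;
use std::path::PathBuf;
use std::time::SystemTime;

// Return the most recently modified regular file in `dir`, if any.
fn latest_log(dir: &str) -> io::Result<Option<PathBuf>> {
    let mut newest: Option<(SystemTime, PathBuf)> = None;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if !meta.is_file() {
            continue;
        }
        let modified = meta.modified()?;
        let is_newer = match &newest {
            Some((best, _)) => modified > *best,
            None => true,
        };
        if is_newer {
            newest = Some((modified, entry.path()));
        }
    }
    Ok(newest.map(|(_, path)| path))
}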

UX polish wrt id fragments

It should be possible to click on something like [33/0] and get a URL with the fragment hash, so you can bookmark the link.

When you navigate to the #[33/0] fragment, the relevant element should be highlighted in yellow.
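Since the reports are static HTML, the highlighting half could be handled by a single CSS :target rule baked into the generated page. A sketch of how the generator might embed it (the constant name is made up):

// Style block the report generator could emit into index.html: the element
// whose id matches the URL fragment (e.g. #[33/0]) gets highlighted.
const FRAGMENT_HIGHLIGHT_CSS: &str = r#"<style>
  :target { background-color: yellow; }
</style>"#;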

Self documentation on reports

Reports should say which MAST job they were generated from, which tlparse command was used to generate them, and which version of tlparse was used.
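The command line and tlparse version are cheap to capture at generation time using only the standard library and Cargo's built-in version variable; the MAST job ID would have to come from the environment or a flag. The helper below is hypothetical:

// Collect self-describing provenance to stamp into the report header.
fn report_provenance() -> String {
    let version = env!("CARGO_PKG_VERSION"); // set by Cargo at build time
    let command: Vec<String> = std::env::args().collect();
    format!(
        "generated by tlparse {} with command: {}",
        version,
        command.join(" ")
    )
}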

Support collapsing nodes in stack trie

Example: https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/20240321-wei-guo-ads-regression-f543344225-rank0/index.html

You should be able to click on the minus sign to fold the tree up (so you can easily jump to the next sibling node).

We might also want to think about whether we actually want these children:

- <torch_package_0>.caffe2/torch/fb/module_factory/sync_sgd/train_loop_pipeline/memcpy_comm_compute/torchrec/train_step.py:598 in run
- <torch_package_0>.caffe2/torch/fb/module_factory/sync_sgd/train_loop_pipeline/memcpy_comm_compute/torchrec/train_step.py:691 in run

These are technically the same function, so maybe they should be put together; see the sketch below.
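A sketch of one way to put them together: key sibling nodes on (file, function name) and collect the line numbers, so entries that differ only by line collapse into one child. The Frame type is the same made-up one as in the pruning sketch above.

use std::collections::HashMap;

// Merge sibling frames that differ only by line number into a single entry
// carrying all of the observed line numbers.
fn merge_siblings(children: Vec<Frame>) -> Vec<(String, String, Vec<u32>)> {
    let mut merged: HashMap<(String, String), Vec<u32>> = HashMap::new();
    for f in children {
        merged.entry((f.filename, f.name)).or_default().push(f.line);
    }
    merged
        .into_iter()
        .map(|((file, name), lines)| (file, name, lines))
        .collect()
}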

Limited amount of runtime information associated with compiled frames

tlparse is currently a compile-time-only metrics collector. However, there is a small amount of runtime information that I think would be really useful:

  1. How many times a particular compiled frame was hit. In particular, if we recompile multiple times, we might be interested to know how hot each of the recompiles is, which tells you whether there is some warmup / unspec thing going on or legitimate multiple dispatch going on.
  2. How quickly the compiled frames run: a sort of poor man's profiling, but mostly I just want to see the timings from a compiled product's perspective, as opposed to the usual performance trace perspective (see the sketch after this list).
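Assuming runtime events like (compile id, wall time) were logged at all (no such log entry exists today; the record type below is purely hypothetical), the aggregation side is simple:

use std::collections::HashMap;
use std::time::Duration;

// Hypothetical runtime record: which compiled frame ran and for how long.
struct FrameRun {
    compile_id: String,
    wall_time: Duration,
}

// Aggregate hit count and total runtime per compiled frame, covering both
// "how hot is each recompile" and the poor man's profiling use case.
fn aggregate_runs(runs: &[FrameRun]) -> HashMap<&str, (u64, Duration)> {
    let mut stats: HashMap<&str, (u64, Duration)> = HashMap::new();
    for run in runs {
        let entry = stats
            .entry(run.compile_id.as_str())
            .or_insert((0, Duration::ZERO));
        entry.0 += 1;
        entry.1 += run.wall_time;
    }
    stats
}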

Dump information about is_dynamo_compiling queries

When diagnosing why code doesn't work with torch.compile but works without it, a query to is_dynamo_compiling is one way for the problem to turn out to be a userspace problem. It should be obvious when such a query has been hit under torch.compile, so that we can tell whether it is suspicious and needs further investigation.

Render custom information in index.html

Internally, we want to render things like the MAST job ID and other metadata, but we should also allow custom metadata (and artifacts of custom parsers) to be rendered somewhere in index.html when they aren't necessarily associated with a specific compile ID.
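A sketch of what a compile-ID-independent metadata section could look like on the Rust side; the helper and its place in index.html generation are hypothetical:

use std::collections::BTreeMap;

// Render arbitrary key/value metadata (e.g. a job ID) as an HTML table that
// could sit at the top of index.html, independent of any compile ID.
// A real implementation should HTML-escape the keys and values.
fn render_global_metadata(metadata: &BTreeMap<String, String>) -> String {
    let mut html = String::from("<table class=\"global-metadata\">\n");
    for (key, value) in metadata {
        html.push_str(&format!("  <tr><td>{}</td><td>{}</td></tr>\n", key, value));
    }
    html.push_str("</table>\n");
    html
}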
