adrianstone55 / symbolsort

A Utility for Measuring C++ Code Bloat

Home Page: http://gameangst.com/?p=320

License: Apache License 2.0

C# 100.00%

symbolsort's Introduction

This is an example application for analyzing the symbols from an executable extracted either from the PDB or from a dump using DumpBin /headers. More documentation is available at http://gameangst.com/?p=320

This code was originally authored and released by Adrian Stone ([email protected]). It is available for use under the Apache 2.0 license. See LICENCE file for details.

SYMBOLSORT OVERVIEW:

SymbolSort is a utility for analyzing code bloat in C++ applications. It works by extracting the symbols from a dump generated by the Microsoft DumpBin utility or by reading a PDB file. It processes the symbols it extracts and generates lists sorted by a number of different criteria.

The lists are:

  • Raw Symbols, sorted by size

This list is generated from the complete set of symbols. No deduplication is performed so this list is intended to highlight individual large symbols.

  • File contributions, sorted by size

This list is generated by calculating the total size of symbols that contribute to a folder path. If the input is a COMDAT dump, the source location for symbols is the .obj or .lib file that DumpBin was run on (see usage for details). It is important to note that for COMDAT dumps individual symbols will appear multiple times coming from different .obj files. If the input is a PDB file, the source location for symbols is the actual source file in which the symbol is defined. The source file for data symbols is not always clearly defined within the PDB so in some cases it is a best guess.
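The folder roll-up described above can be sketched as follows. This is a minimal illustration, not SymbolSort's actual implementation: the `(path, size)` tuple layout, the function name `folder_contributions`, and the sample paths are all assumptions made for the example.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def folder_contributions(symbols):
    """Add each symbol's size to its source file and to every parent folder."""
    totals = defaultdict(int)
    for path, size in symbols:
        source = PureWindowsPath(path)
        totals[str(source)] += size
        for parent in source.parents:
            totals[str(parent)] += size
    return dict(totals)


# Hypothetical (source location, size in bytes) pairs.
symbols = [
    (r"c:\src\engine\render.obj", 300),
    (r"c:\src\engine\audio.obj", 100),
    (r"c:\src\game\main.obj", 50),
]

totals = folder_contributions(symbols)

# Largest contributors first; ties are broken by path.
for path, size in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{size:8d}  {path}")
```

Because every parent folder receives the full size of each file below it, the totals are cumulative: a folder's figure always includes its subfolders.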

  • File contributions, sorted by path

This is a complete, hierarchical list of the size of symbols in all contributing source files.

  • Symbol Sections / Types, sorted by total size and by total count

This shows a breakdown of symbols by section or type, depending on the kind of information that can be extracted from the input source.

  • Merged Duplicate Symbols, sorted by total size and by total count

This list is generated by merging symbols with identical names. The symbols are not guaranteed to be the same symbol. In the case of PDB input there will be very few duplicate symbols. COMDAT input, however, should contain a large number of duplicate symbols. This list is useful for measuring total compile and link time for a particular symbol. A relatively small symbol that appears in a very large number of .obj files will have a large total size and appear near the top of this list.
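As a rough sketch of the merge described above, the snippet below groups symbols by name and tallies total size and count. The `(name, size)` layout and the sample symbols are assumptions for illustration; the real tool works on richer symbol records.

```python
from collections import defaultdict


def merge_duplicates(symbols):
    """Merge symbols with identical names into (total_size, count) pairs."""
    merged = defaultdict(lambda: [0, 0])
    for name, size in symbols:
        merged[name][0] += size
        merged[name][1] += 1
    return {name: (total, count) for name, (total, count) in merged.items()}


# Hypothetical COMDAT-style input: a small inline function emitted into
# three .obj files, plus one larger function that appears only once.
symbols = [
    ("Vec3::Length(void)", 40),
    ("Vec3::Length(void)", 40),
    ("Vec3::Length(void)", 40),
    ("LoadLevel(char const *)", 100),
]

merged = merge_duplicates(symbols)
by_total_size = sorted(merged.items(), key=lambda kv: (-kv[1][0], kv[0]))
for name, (total, count) in by_total_size:
    print(f"{total:6d} {count:4d}  {name}")
```

In this sample the 40-byte function outranks the 100-byte one because its three copies add up to 120 bytes, which is the effect the list is meant to expose.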

  • Merged Template Symbols, sorted by total size and by total count

This list is generated by stripping template parameters from symbols and then merging duplicates. Symbols std::auto_ptr<int> and std::auto_ptr<float> will be transformed into std::auto_ptr<T> in this list and be counted together.

  • Merged Overloaded Symbols, sorted by total size and by total count

This list is generated by stripping template parameters and function parameters from symbols and then merging duplicates. Overloaded functions sqrt(float) and sqrt(double) will be transformed into sqrt(...) in this list and be counted together.
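One way to picture the stripping used by the template and overload lists is a small bracket collapser. This is a sketch under assumptions, not the tool's own code: the helper names and the `<T>` placeholder are made up for the example, and operator names such as `operator<` or `operator()` would need special handling that is omitted here.

```python
def collapse(name, open_ch, close_ch, placeholder):
    """Replace each top-level open_ch...close_ch group with placeholder."""
    out = []
    depth = 0
    for ch in name:
        if ch == open_ch:
            if depth == 0:
                out.append(placeholder)
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def template_key(name):
    """Key for the merged-template list: template arguments removed."""
    return collapse(name, "<", ">", "<T>")


def overload_key(name):
    """Key for the merged-overload list: template and function arguments removed."""
    return collapse(template_key(name), "(", ")", "(...)")


print(template_key("std::map<int, std::vector<float> >::find"))
print(overload_key("sqrt(float)"))
print(overload_key("sqrt(double)"))
```

Symbols that produce the same key would then be merged and counted together, in the same way as symbols with identical names.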

  • Symbol Tags, sorted by total size and by total count

This list represents a tag cloud generated from the symbol names. The symbols are tokenized and the total size and count is tallied for each token. I'm not sure what this list is good for, but I'm all about tag clouds so I couldn't resist including it.
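The tally behind such a tag list might look like the sketch below. The tokenization rule (identifier-like runs of characters) and the choice to count each token once per symbol are assumptions; the README does not say how SymbolSort tokenizes names.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def symbol_tags(symbols):
    """Tally (total_size, count) for every token found in the symbol names."""
    tags = defaultdict(lambda: [0, 0])
    for name, size in symbols:
        # Assumption: a token is counted once per symbol, even if it repeats.
        for token in set(TOKEN_RE.findall(name)):
            tags[token][0] += size
            tags[token][1] += 1
    return {token: (total, count) for token, (total, count) in tags.items()}


# Hypothetical (name, size) pairs.
symbols = [
    ("std::vector<int>::push_back", 120),
    ("std::vector<float>::resize", 80),
    ("MyGame::Update", 50),
]

tags = symbol_tags(symbols)
for token, (total, count) in sorted(tags.items(), key=lambda kv: (-kv[1][0], kv[0])):
    print(f"{total:6d} {count:4d}  {token}")
```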

USAGE:

SymbolSort [options]

Options:
  -in[:type] filename
      Specify an input file with optional type.  Exe and PDB files are
      identified automatically by extension.  Otherwise type may be:
          comdat - the format produced by DumpBin /headers
          sysv   - the format produced by nm --format=sysv
          bsd    - the format produced by nm --format=bsd --print-size

  -out filename
      Write output to specified file instead of stdout

  -count num_symbols
      Limit the number of symbols displayed to num_symbols

  -exclude substring
      Exclude symbols that contain the specified substring

  -diff[:type] filename
      Use this file as a basis for generating a differences report.
      See -in option for valid types.

  -searchpath path
      Specify the symbol search path when loading an exe

  -path_replace regex_match regex_replace
      Specify a regular expression search/replace for symbol paths.
      Multiple path_replace sequences can be specified for a single
      run.  The match term is escaped but the replace term is not.
      For example: -path_replace d:\\SDK_v1 c:\SDK -path_replace
      d:\\SDK_v2 c:\SDK

  -complete
      Include a complete listing of all symbols sorted by address.
    
Options specific to Exe and PDB inputs:
  -include_public_symbols
      Include 'public symbols' from PDB inputs.  Many symbols in the
      PDB are listed redundantly as 'public symbols.'  These symbols
      provide a slightly different view of the PDB as they are named
      more descriptively and usually include padding for alignment
      in their sizes.
    
  -keep_redundant_symbols
      Normally symbols are processed to remove redundancies.  Partially
      overlapped symbols are adjusted so that their sizes aren't
      over-reported and completely overlapped symbols are discarded.
      This option preserves all symbols and their reported sizes.
    
  -include_sections_as_symbols
      Attempt to extract entire sections and treat them as individual
      symbols.  This can be useful when mapping sections of an
      executable that don't otherwise contain symbols (such as .pdata).
    
  -include_unmapped_addresses
      Insert fake symbols representing any unmapped addresses in the
      PDB.  This option can highlight sections of the executable that
      aren't directly attributable to symbols.  In the complete view
      this will also highlight space lost due to alignment padding.
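The redundancy removal that -keep_redundant_symbols disables can be pictured with the following sketch. The usage text only says that partially overlapped symbols are adjusted and completely overlapped ones are discarded; which symbol gets trimmed, the `(address, size, name)` layout, and the function name are assumptions made for illustration.

```python
def remove_redundancy(symbols):
    """Drop fully overlapped symbols and trim partially overlapped ones.

    symbols: iterable of (address, size, name) tuples.
    """
    result = []
    covered_end = 0  # first address not yet claimed by an earlier symbol
    # Visit symbols by address; at equal addresses the larger one goes first.
    for address, size, name in sorted(symbols, key=lambda s: (s[0], -s[1], s[2])):
        end = address + size
        if end <= covered_end:
            continue  # completely overlapped: discard
        if address < covered_end:
            address = covered_end  # partially overlapped: keep only the new part
        result.append((address, end - address, name))
        covered_end = end
    return result


# Hypothetical symbols: "b" lies inside "a", and "c" overlaps the tail of "a".
symbols = [
    (0x1000, 0x40, "a"),
    (0x1010, 0x10, "b"),
    (0x1030, 0x20, "c"),
]

cleaned = remove_redundancy(symbols)
for address, size, name in cleaned:
    print(f"{address:#x} {size:#x} {name}")
```

With this treatment the sizes of the surviving symbols sum to the address range they cover (0x50 bytes here), so nothing is counted twice.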

Supported Input files

SymbolSort supports several types of input files:

COMDAT dump

A COMDAT dump is generated using the DumpBin utility with the /headers option. DumpBin is included with the Microsoft compiler toolchain. SymbolSort can accept the dump from a single .lib or .obj file, but the best way to use it is to create a complete dump of all the .obj files from an entire application. The Windows command line utility FOR can be used for this:

for /R "c:\obj_file_location" %n in (*.obj) do "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\DumpBin.exe" /headers "%n" >> c:\comdat_dump.txt

This will generate a concatenated dump of all the headers in all the .obj files in c:\obj_file_location. Beware, for large applications this could produce a multi-gigabyte file.
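For anyone who prefers a script to the FOR one-liner, the same concatenated dump could be produced along these lines. This is a sketch only: the function names are made up, and it assumes `dumpbin.exe` is on the PATH (for example in a Visual Studio developer command prompt).

```python
import subprocess
from pathlib import Path


def find_obj_files(root):
    """Return every .obj file below root, in a stable sorted order."""
    return sorted(Path(root).rglob("*.obj"))


def dump_headers(root, out_path, dumpbin="dumpbin.exe"):
    """Append the output of 'dumpbin /headers' for each .obj file to out_path."""
    with open(out_path, "wb") as out:
        for obj in find_obj_files(root):
            # Assumption: dumpbin is reachable under the given name or path.
            subprocess.run([dumpbin, "/headers", str(obj)], stdout=out, check=True)


# Example call (not executed here because it needs the Microsoft toolchain):
# dump_headers(r"c:\obj_file_location", r"c:\comdat_dump.txt")
```

The resulting file can then be passed to SymbolSort with -in:comdat, the same as the output of the FOR command.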

PDB or EXE

SymbolSort supports reading debug symbol information from .exe files and .pdb files. The .exe file will only be used to find the location of its matching .pdb file, and then the symbols will be extracted from the PDB. SymbolSort uses msdia140.dll to extract data from the PDB file. Msdia140.dll is included with the Microsoft compiler toolchain. In order to use it you will probably have to register the dll by running this command from an elevated command prompt:

regsvr32 "c:\Program Files (x86)\Microsoft Visual Studio 14.0\DIA SDK\bin\amd64\msdia140.dll"

It is important that you register the 64-bit version of msdia140.dll on 64-bit Windows and the 32-bit version on 32-bit Windows. Note that SymbolSort works with multiple versions of msdia*.dll, from at least msdia90.dll to msdia140.dll.

NM dump

Similar to the COMDAT dump, SymbolSort can accept symbol dumps from the Unix utility nm. The symbols can be extracted from .obj files or entire .elfs. SymbolSort supports bsd and sysv format dumps. Sysv is preferred because it contains more information. The recommended nm command lines are:

nm --format=sysv --demangle --line-numbers input_file.elf
nm --format=bsd --demangle --line-numbers --print-size input_file.elf
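To give a feel for what the bsd dump contains, here is a small parser sketch. It assumes the usual GNU nm layout for `--format=bsd --print-size` (hex address, hex size, one-letter type, name) with the `--line-numbers` location appended after a tab; the sample lines and the function name are invented, and this is not SymbolSort's own parser.

```python
import re

# address, size, one-letter type, then the (possibly demangled) name.
LINE_RE = re.compile(r"^([0-9a-fA-F]{8,16})\s+([0-9a-fA-F]{8,16})\s+(\S)\s+(.*)$")


def parse_nm_bsd(text):
    """Parse sized symbols from nm bsd-format output; other lines are skipped."""
    symbols = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # undefined symbols and symbols without a size
        address, size, kind, rest = match.groups()
        # Assumption: --line-numbers appends "<tab>file:line" after the name.
        name, _, location = rest.partition("\t")
        symbols.append({
            "address": int(address, 16),
            "size": int(size, 16),
            "type": kind,
            "name": name,
            "location": location,
        })
    return symbols


sample = (
    "0000000000401136 0000000000000015 T main\t/src/main.cpp:3\n"
    "0000000000401150 000000000000002a W Foo::bar(int, char)\t/src/foo.h:10\n"
    "                 U printf\n"
    "0000000000404028 B __bss_start\n"
)

parsed = parse_nm_bsd(sample)
for sym in parsed:
    print(sym["size"], sym["type"], sym["name"], sym["location"])
```

Sizes are printed by nm in hexadecimal, so they are converted with base 16 before any totals are computed.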

BUILDING:

SymbolSort comes with a Solution and Project file for Visual Studio 2015. If you want to build with VS2015 or a compatible future version of Visual Studio, just open SymbolSort.sln and hit build.

If you want to build it with a different version of Visual Studio, you can pretty easily start with just SymbolSort.cs and place it in a default-generated C# command line application. In order to get the msdia140 interop to work you must add msdia140.dll as a reference to the C# project. That is done either by dragging and dropping the dll onto the references folder in the C# project or by right clicking the references folder, selecting "Add Reference" and then browsing for the msdia140 dll.

You may get this error message:

A reference to 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\DIA
SDK\bin\amd64\msdia140.dll' could not be added. Please make sure that the
file is accessible, and that it is a valid assembly or COM component.

This just means that msdia140.dll has not been registered. This is easily fixed by running this command from an administrator command prompt:

regsvr32 "c:\Program Files (x86)\Microsoft Visual Studio 14.0\DIA SDK\bin\amd64\msdia140.dll"

REVISION HISTORY:

1.2

  • Upgraded to Visual Studio 2010 / msdia100.dll
  • Added -path_replace option to convert paths stored in PDBs.
  • Added -complete option to dump a full list of all symbols sorted by address.
  • Added several options for controlling what symbols are included in PDB dumps since PDBs often list the same address redundantly under different labels.

1.1

  • Added support for computing differences between multiple input sources
  • Added support for nm output for PS3 / unix platforms.
  • Changed command line parameters. See usage for details.
  • Added section / type information to output.

1.0

  • First release!

FUTURE WORK (to be done by someone else!):

  • Add a GUI frontend to allow interactive filtering and sorting.
  • Read both the PDB and the COMDAT dump simultaneously and cross-reference the two. This would enable new kinds of analysis and richer dumps.
  • Produce additional merged symbol reports by merging all symbols from the same class or namespace or that match based on some more clever fuzzy comparison.
  • Improve relative -> absolute path conversion for nm inputs
  • Figure out how to extract string literal information from PDB.

symbolsort's People

Contributors

adrianstone55, kasper93, randomascii, stgatilov

symbolsort's Issues

See how much bloat is generated by template class

A lot of code bloat comes from generic template classes, for instance a MyVector class defined in MyVector.h. It would be great if SymbolSort made it possible to see how much code was generated by such a class.

Right now it is possible to analyze object files (COMDAT), but in that case there is no way to group symbols by class or by header file. It is also possible to analyze a PDB, but then duplication of symbols across object files is not taken into account (and that matters when analyzing build times).

I see two approaches to implement this feature:

  1. Extract classes from symbol names. Ideally, they can be extracted with namespaces, e.g. std::_XTree, and then grouped like SymbolSort does for paths. This is perhaps the best approach, but given how many special types of symbols exist, it is very hard to do right. In fact, doing it right requires a full-fledged parser of symbol names (and decorated symbols are perhaps even easier to parse than undecorated ones).

  2. Attribute each symbol to the source file where its code is located. This information is absent from object files, but it is present in PDB files. So it is possible to read object file dumps for the main data, then read PDB files solely to assign the proper code location to each symbol. This approach has some disadvantages: mainly, not all symbols are present in the PDB, and not all symbols have a location in source code.

Running SymbolSort with VS2017?

I'm having issues with running SymbolSort after building it with VS 2017. I tried both the provided solution file, and creating my own, using info from #17

I've tried registering msdia140.dll in various ways. With regsvr32, and with the registry patch from csoltenborn/GoogleTestAdapter#124

But still I get this:

Unhandled Exception: System.IO.FileNotFoundException: Retrieving the COM class factory for component with CLSID {E6756135-1E65-4D17-8576-610761398C3C} failed due to the following error: 8007007e The specified module could not be found. (Exception from HRESULT: 0x8007007E).

What can I do?

Allow to filter symbols by sections

I'm trying to analyze code size in TheDarkMod game based on idTech4 engine. It turns out that the engine uses global variables a LOT. So when I look at the result, 90% of symbols size is taken by .bss section.

I'd like to add a new parameter, which specifies which sections to include (all sections if it is not present). It would look like:

-sections code,data,rdata

Here is the PR: #24.

Symbol names in reports should be sorted

A useful technique for monitoring regressions is to create symbolsort reports and then textually compare them. However, symbolsort doesn't have a consistent way of breaking ties in the Sorted by Size section. In a recent pair of reports on Chrome this led to these two series of globals in the list of symbols:

   32768         data  gDigits                                                                                                                   
   32768         data  nacl_user                                                                                                                 
   32768         data  gInvSqrtTable                                                                                                             
   32768         data  nacl_thread_ids                                                                                                           

and:

   32768         data  nacl_thread_ids                                                                                                           
   32768         data  nacl_user                                                                                                                 
   32768         data  gInvSqrtTable                                                                                                             
   32768         data  gDigits                                                                                                                   

The only difference is that the first and last globals are swapped. Using symbol name as a tie-breaker would avoid this problem. I'll probably put together a PR for this later. A couple of swaps is not a problem but on Chrome reports there are typically hundreds.
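The tie-breaker proposed in this issue amounts to sorting on a compound key. The sketch below is illustrative only (SymbolSort itself is written in C#, and the `(size, name)` layout and function name are assumptions); it uses the four globals from the reports above.

```python
def sort_by_size(symbols):
    """Sort by descending size, using the symbol name to break ties."""
    return sorted(symbols, key=lambda sym: (-sym[0], sym[1]))


# The same four globals in the two orders seen in the reports.
report_a = [
    (32768, "gDigits"),
    (32768, "nacl_user"),
    (32768, "gInvSqrtTable"),
    (32768, "nacl_thread_ids"),
]
report_b = [
    (32768, "nacl_thread_ids"),
    (32768, "nacl_user"),
    (32768, "gInvSqrtTable"),
    (32768, "gDigits"),
]

sorted_a = sort_by_size(report_a)
sorted_b = sort_by_size(report_b)
for size, name in sorted_a:
    print(f"{size:8d}  {name}")
```

With the name included in the key, both inputs produce the same order, so a textual diff of the two reports would show no spurious changes for these entries.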

BadImageFormatException on x64

Compiled with VS2017 15.9.3 Enterprise edition using x64/AnyCPU.
Registered "C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\DIA SDK\bin\amd64\msdia140.dll". When I run the exe, it throws an error:

PS C:\Users\foobar\work_projects\SymbolSort> C:\Users\foobar\work_projects\SymbolSort\bin\Release\SymbolSort.exe -in C:\Users\foobar\work_projects\someproduct\build\Windows\RelWithDebInfo\RelWithDebInfo\someproduct.pdb -out symbol_sort.txt

Loading symbols from C:\Users\foobar\work_projects\someproduct\build\Windows\RelWithDebInfo\RelWithDebInfo\someproduct.pdb

Unhandled Exception: System.BadImageFormatException: Retrieving the COM class factory for component with CLSID {761D3BCD-1304-41D5-94E8-EAC54E4AC172} failed due to the following error: 800700c1  is not a valid Win32 application. (Exception from HRESULT: 0x800700C1).
   at SymbolSort.SymbolSort.ReadSymbolsFromPDB(List`1 symbolsOutput, String filename, String searchPath, UserFlags options) in C:\Users\foobar\work_projects\SymbolSort\SymbolSort.cs:line 1181
   at SymbolSort.SymbolSort.LoadSymbols(InputFile inputFile, List`1 symbols, String searchPath, UserFlags options) in C:\Users\foobar\work_projects\SymbolSort\SymbolSort.cs:line 1550
   at SymbolSort.SymbolSort.Main(String[] args) in C:\Users\foobar\work_projects\SymbolSort\SymbolSort.cs:line 0

Crash when using -diff on Chrome

I hit the following crash when trying to compare two Chrome PDBs:

Loading symbols from old_06ef945\chrome.dll.pdb
Reading section info...
Reading source file info...
Reading global function symbols... 100% complete
Reading thunk symbols... 100% complete
Reading private data symbols... 100% complete
Reading global data symbols... 100% complete
Subtracting overlapping symbols... 100

Loading symbols from new_bb2dd59\chrome.dll.pdb
Reading section info...
Reading source file info...
Reading global function symbols... 3% complete
Unhandled Exception: System.ArgumentException: Value does not fall within the expected range.
at Dia2Lib.IDiaLineNumber.get_sourceFile()
at SymbolSort.SymbolSort.FindSourceFileForRVA(IDiaSession session, UInt32 rva, UInt32 rvaLength)
at SymbolSort.SymbolSort.ReadSymbolsFromScope(IDiaSymbol parent, SymTagEnum type, SymbolFlags additionalFlags, UInt32 startPercent, UInt32 endPercent, IDiaSession diaSession, List`1 sectionContribs, Dictionary`2 compilandFileMap, List`1 symbols)
at SymbolSort.SymbolSort.ReadSymbolsFromPDB(List`1 symbols, String filename, String searchPath, Options options)
at SymbolSort.SymbolSort.LoadSymbols(InputFile inputFile, List`1 symbols, String searchPath, Options options)
at SymbolSort.SymbolSort.Main(String[] args)

I tried debugging it, but the VS debugger seems to default to launching Any CPU C# programs as 32-bit, which led to a quick out-of-memory crash. Running the debug x64 version under the debugger worked correctly. So, this isn't a very useful bug report but I thought I'd put it out there anyway.

Issue when running on Windows

The command SymbolSort.exe -in SymbolSort.exe works fine (even though the results are all zeros), but when I try to run the command on my real target, I get the following issue. Maybe the developer has insight into what might be causing this?

[edit]: I've tried building multiple different ways (with the default VS solution, with a brand new C# project, etc.) per the other Issues on this repo. Everything seems to lead to this same NotImplementedException. I have no issues with building, and my dll file is in the right place and included properly per the instructions I could find here and other places. Clearly I'm still not doing something right though.....

Reading section info...

Unhandled Exception: System.NotImplementedException: The method or operation is not implemented.
   at Dia2Lib.IDiaSession.getEnumTables(IDiaEnumTables& ppEnumTables)
   at SymbolSort.SymbolSort.GetEnumSectionContribs(IDiaSession session) in C:\SymbolSort\SymbolSort.cs:line 679
   at SymbolSort.SymbolSort.BuildSectionContribTable(IDiaSession session, List`1 sectionContribs) in C:\SymbolSort\SymbolSort.cs:line 889
   at SymbolSort.SymbolSort.ReadSymbolsFromPDB(List`1 symbolsOutput, String filename, String searchPath, UserFlags options) in C:\SymbolSort\SymbolSort.cs:line 1198
   at SymbolSort.SymbolSort.LoadSymbols(InputFile inputFile, List`1 symbols, String searchPath, UserFlags options) in C:\SymbolSort\SymbolSort.cs:line 1550
   at SymbolSort.SymbolSort.Main(String[] args) in C:\SymbolSort\SymbolSort.cs:line 1825

Not working on VS2019?

I've registered the dll DIA SDK\bin\amd64\msdia140.dll and also added the dll as a reference.
I build and run the x64 target, but it always throws this exception:

Reading section info...
Reading source file info...
Reading global function symbols...
System.NotImplementedException: The method or operation is not implemented.
   at Dia2Lib.IDiaSymbol.findChildren(SymTagEnum symTag, String name, UInt32 compareFlags, IDiaEnumSymbols& ppResult)
   at SymbolSort.SymbolSort.ReadSymbolsFromScope(IDiaSymbol parent, SymTagEnum type, SymbolFlags additionalFlags, UInt32 startPercent, UInt32 endPercent, IDiaSession diaSession, List`1 sectionContribs, Dictionary`2 compilandFileMap, List`1 symbols) in  C:\ProgramsData\VSProject\DEPENDENCY\SymbolSort\SymbolSort.cs:915
   at SymbolSort.SymbolSort.ReadSymbolsFromPDB(List`1 symbolsOutput, String filename, String searchPath, UserFlags options)  in C:\ProgramsData\VSProject\DEPENDENCY\SymbolSort\SymbolSort.cs: 1224
   at SymbolSort.SymbolSort.LoadSymbols(InputFile inputFile, List`1 symbols, String searchPath, UserFlags options) in C:\ProgramsData\VSProject\DEPENDENCY\SymbolSort\SymbolSort.cs:1550
   at SymbolSort.SymbolSort.Main(String[] args) at C:\ProgramsData\VSProject\DEPENDENCY\SymbolSort\SymbolSort.cs:1825

The msdia140.dll version is 14.28.29910.0

Build instructions don't seem to work

I tried building SymbolSort, with VC++ 2015, and I couldn't get it to work. All possible ways of adding msdia140.dll as a reference seemed to have no effect. This may be due to my lack of recent experience with driving COM from C#.

However, with some poking around I found a method. I don't know if it is the one-true method or if I was doing something wrong previously. Here is what I found - I think it would be good to get this or something equivalent added to the instructions.

First, I needed to add a reference to msdia140.dll. I found some ideas here:
http://stackoverflow.com/questions/697541/how-do-i-use-the-ms-dia-sdk-from-c

which I turned in to these steps:

"%VS140COMNTOOLS%..\..\VC\vcvarsall.bat" amd64
set DIASDK=%VS140COMNTOOLS%..\..\DIA SDK
midl /I "%DIASDK%\include" "%DIASDK%\idl\dia2.idl" /tlb dia2.tlb
tlbimp dia2.tlb

This created dia2lib.dll which I added as a reference. I did this by right-clicking on References, selecting Add Reference, clicking Browse, finding the file, then clicking okay.

Building then gave these two errors:

1>SymbolSort.cs(1164,7,1164,21): error CS1752: Interop type 'DiaSourceClass' cannot be embedded. Use the applicable interface instead.
1>SymbolSort.cs(1164,38,1164,52): error CS1752: Interop type 'DiaSourceClass' cannot be embedded. Use the applicable interface instead.

After some quick research I found this article http://stackoverflow.com/questions/2483659/interop-type-cannot-be-embedded which told me to select dia2lib from the references in the project, view properties, and change Embed Interop Types to False. This worked.

So, this worked for me. Maybe some combination of a batch file and solution/project file could be added to simplify this process? I'm not sure. Mostly I wanted to get my instructions/observations up here so I don't have to figure this out again.
