Comments (6)

GoogleCodeExporter avatar GoogleCodeExporter commented on August 18, 2024

Original comment by [email protected] on 22 Mar 2011 at 7:05

from sawbuck.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 18, 2024
Our current disassembler makes many assumptions about the code it is parsing.  
Notably, we assume certain behaviour regarding the placement and use of lookup 
tables.  Hand-coded assembly does many things that violate these assumptions 
(the entire CRT library does so; a particularly bad offender is memcpy).  It 
would be useful to be able to distinguish hand-written assembly from 
compiler-generated code, and only enforce our stronger assumptions on the 
latter.  The DIA API exposes this information via IDiaSymbol::get_language, and 
it would be useful to annotate blocks with it by extending BlockAttributeEnum.

Original comment by [email protected] on 23 Mar 2011 at 6:55


GoogleCodeExporter avatar GoogleCodeExporter commented on August 18, 2024
Unfortunately, an exhaustive exploration of the DIA symbols turned up no 
reliable way to determine whether a function is built from assembly or from a 
higher level language.  

The main motivation for finding this information was to handle data sections. 
The compiler seems to put any static data, including jump tables, at the end 
of the function.  Assembly functions can place data wherever they want, 
including in the middle of the function body.  Our data detection routines 
could be smarter if we knew that the code was generated by the compiler.

Further investigations into the available DIA symbols revealed that information 
regarding all static data *is* included in the PDB.  Pushing this information 
to the disassembler (along with alignment information, also present in the PDB) 
should allow us to get a full disassembly of functions, including all data and 
padding bytes.  It also allows us to move away from heuristics for finding data 
locations, which often fail in hand-coded assembly.  (For example, we presently 
assume that lookup tables are zero-indexed, but in 'memcpy' they are not.  This 
causes us to identify certain bytes as data, when they are in fact part of an 
instruction.)

With this new information we will be able to skip the heuristics and reliably 
label data.  This will also allow us to stop the disassembler from running into 
data.

Presently, the Decomposer provides information to the Disassembler in two 
ways: through the OnInstruction callback, and through the Disassembler API 
prior to calling 'Walk'.  Using the OnInstruction callback is not sufficiently 
elegant because we can only provide information regarding an already 
disassembled instruction; we would be able to tell the disassembler to back up 
if it started running into known data, but without greatly changing the API we 
could not tell it about data extents.

In my mind, the simplest approach would be to extend Disassembler to accept 
data extents much like it currently accepts labels using 'Unvisited'.
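
As a rough illustration of that idea, here is a minimal sketch of a container 
for known data extents that a walker could consult before decoding.  Every 
name in it (DataExtents, AddDataExtent, IsData, CodeBytesAvailable) is 
hypothetical and is not part of the existing Disassembler API; it also 
assumes that registered extents never overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical sketch only: these names are not taken from the real
// Disassembler. Addresses are plain 32-bit values for simplicity.
using Address = std::uint32_t;

class DataExtents {
 public:
  // Records that [start, start + size) is known to be data.
  // Assumption: callers never register overlapping extents.
  void AddDataExtent(Address start, std::size_t size) {
    assert(size > 0);
    extents_[start] = size;
  }

  // Returns true if addr falls inside any registered data extent.
  bool IsData(Address addr) const {
    auto it = extents_.upper_bound(addr);  // First extent starting after addr.
    if (it == extents_.begin()) return false;
    --it;  // Last extent starting at or before addr.
    return addr - it->first < it->second;
  }

  // Returns how many bytes starting at addr may be decoded as code before
  // the walker would run into known data, capped at limit.
  std::size_t CodeBytesAvailable(Address addr, std::size_t limit) const {
    if (IsData(addr)) return 0;
    auto it = extents_.upper_bound(addr);
    if (it == extents_.end()) return limit;
    std::size_t gap = it->first - addr;
    return gap < limit ? gap : limit;
  }

 private:
  std::map<Address, std::size_t> extents_;  // Start address -> size in bytes.
};
```

With something like this populated before 'Walk', the walker could refuse to 
decode an instruction that would cross into a data extent, rather than being 
told to back up afterwards.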

Original comment by [email protected] on 24 Mar 2011 at 8:21


GoogleCodeExporter avatar GoogleCodeExporter commented on August 18, 2024
It has now been observed that our data finding/hitting heuristics are in fact 
incorrect.  We had previously been using the base address of table lookups (as 
an operand to jmp instructions) as an indication that data lives at that 
address.  We would then stop disassembly when it would overrun what had been 
assumed to be data.  Unfortunately, for hand-written assembly these lookup 
tables are not always meant to be zero-indexed, in which case our assumed data 
location is wrong (see, for example, memcpy).
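
To make the failure concrete, here is a small sketch with made-up numbers.  
The IndexedJump struct and both functions are hypothetical, and min_index is 
something the heuristic cannot know from the jmp operand alone, which is 
exactly the problem.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of an indexed jump such as
//   jmp dword ptr [operand_base + reg * entry_size]
struct IndexedJump {
  std::uint32_t operand_base;  // Address encoded in the jmp operand.
  std::uint32_t min_index;     // Smallest index the code ever uses.
  std::uint32_t entry_size;    // Bytes per table entry.
};

// What the old heuristic assumed: the table starts at the operand base.
std::uint32_t HeuristicTableStart(const IndexedJump& jump) {
  return jump.operand_base;
}

// Where the first entry that is ever read actually lives.
std::uint32_t ActualTableStart(const IndexedJump& jump) {
  assert(jump.entry_size > 0);
  return jump.operand_base + jump.min_index * jump.entry_size;
}
```

For a one-indexed table with 4-byte entries and an operand base of 0x1FFC, 
the heuristic labels 0x1FFC as data while the table really begins at 0x2000, 
so the four bytes in between can be instruction bytes wrongly marked as data.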

All of these heuristics become unnecessary once we extract reliable data 
information via DIA.

Original comment by [email protected] on 28 Mar 2011 at 1:40


GoogleCodeExporter avatar GoogleCodeExporter commented on August 18, 2024
More accumulated knowledge that I feel the need to write down somewhere: the 
public symbols provided by DIA do not have meaningful lengths. In fact, the 
lengths are simply the distance between successive public symbols. However, we 
need to use them because they are the only place we get information about the 
location of virtual tables.
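
A minimal sketch of what that observation implies: the reported lengths can 
be reproduced from the sorted symbol addresses alone.  The PublicSymbol 
struct and LengthsFromSuccessiveSymbols are hypothetical names, and treating 
the last symbol as running to the end of its section is my assumption, not 
something confirmed for DIA.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a public symbol: an address and a length.
struct PublicSymbol {
  std::uint32_t rva;
  std::uint32_t length;
};

// Derives each length as the distance to the next public symbol.
// Assumption: the last symbol extends to section_end.
std::vector<PublicSymbol> LengthsFromSuccessiveSymbols(
    std::vector<std::uint32_t> rvas, std::uint32_t section_end) {
  std::sort(rvas.begin(), rvas.end());
  std::vector<PublicSymbol> symbols;
  for (std::size_t i = 0; i < rvas.size(); ++i) {
    std::uint32_t next = (i + 1 < rvas.size()) ? rvas[i + 1] : section_end;
    assert(next >= rvas[i]);
    symbols.push_back({rvas[i], next - rvas[i]});
  }
  return symbols;
}
```

A length computed this way says nothing about the real extent of the object, 
since it silently absorbs any padding or unnamed data that follows it.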

Original comment by [email protected] on 8 Apr 2011 at 8:18


GoogleCodeExporter avatar GoogleCodeExporter commented on August 18, 2024
Fixed in http://code.google.com/p/sawbuck/source/detail?r=253.

Original comment by [email protected] on 19 Apr 2011 at 6:09

  • Changed state: Fixed

