Comments (6)
Original comment by [email protected]
on 22 Mar 2011 at 7:05
from sawbuck.
Our current disassembler makes many assumptions about the code it is parsing.
Notably, we assume certain behaviour regarding the placement and use of lookup
tables. Hand-coded assembly does many things that violate these assumptions
(notably throughout the CRT library; a particularly bad offender is memcpy). It
would be useful to be able to distinguish hand-written assembly from
compiler-generated code, and only enforce our stronger assumptions on the latter. The
DIA API exposes this information via IDiaSymbol::get_language, and it would be
useful to annotate blocks with this information, extending BlockAttributeEnum.
Original comment by [email protected]
on 23 Mar 2011 at 6:55
from sawbuck.
Unfortunately, after exhaustively exploring the DIA symbols there is no
reliable way to determine whether a function is built from assembly or from a
higher level language.
The main motivation for finding this information was to handle data sections.
We know that the compiler (seems to?) puts any static data at the end of the
function, including jump tables, etc. Assembly functions can place data
wherever they want, including in the middle of the function body. Our data
detection routines could be smarter if we knew that the code was generated by
the compiler.
Further investigations into the available DIA symbols revealed that information
regarding all static data *is* included in the PDB. Pushing this information
to the disassembler (along with alignment information, also present in the PDB)
should allow us to get a full disassembly of functions, including all data and
padding bytes. It also allows us to move away from heuristics for finding data
locations, which often fail in hand-coded assembly. (For example, we presently
assume that lookup tables are zero-indexed, but in 'memcpy' they are not. This
causes us to identify certain bytes as data, when they are in fact part of an
instruction.)
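To make the zero-indexing problem concrete, here is a minimal sketch. The struct and function names are hypothetical and the addresses are made up (they are not taken from memcpy); it only shows how the assumed data location and the real one diverge when the smallest index in use is not zero.

```cpp
#include <cstdint>

// Hypothetical model of an indirect jump of the form
// `jmp [base + index * 4]`. Not a type from the real code base.
struct TableJump {
  uint32_t base;         // Displacement encoded in the jmp instruction.
  uint32_t first_index;  // Smallest index value the code ever uses.
  uint32_t entry_count;  // Number of entries actually stored in the table.
};

struct Extent {
  uint32_t start;
  uint32_t size;
};

constexpr uint32_t kEntrySize = 4;  // 32-bit code addresses.

// What the heuristic assumes: the table begins at the base address.
Extent AssumedExtent(const TableJump& jump) {
  return {jump.base, jump.entry_count * kEntrySize};
}

// Where the table actually lives when indexing starts at first_index.
Extent ActualExtent(const TableJump& jump) {
  return {jump.base + jump.first_index * kEntrySize,
          jump.entry_count * kEntrySize};
}
```

With a base of 0x1000 and a first index of 1, the heuristic labels the bytes at 0x1000..0x1003 as data even though the table only starts at 0x1004, so whatever instruction bytes precede the table get mislabeled.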
With this new information we will be able to skip the heuristics and reliably
label data. This will also allow us to stop the disassembler from running into
data.
Presently, the Decomposer provides information to the Disassembler in two
ways: through the OnInstruction callback, and through the Disassembler API
prior to calling 'Walk'. Using the OnInstruction callback is not sufficiently
elegant because we can only provide information regarding an already
disassembled instruction; we would be able to tell the disassembler to back up
if it started running into known data, but without greatly changing the API we
could not tell it about data extents.
In my mind, the simplest approach would be to extend Disassembler to accept
data extents much like it currently accepts labels using 'Unvisited'.
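A rough sketch of what that could look like. Everything here is hypothetical: 'DataExtents' and 'WalkUntilData' are illustrative names, not part of the existing Disassembler interface, and the walk uses fixed-size "instructions" as a stand-in for real variable-length decoding. It also assumes the extents do not overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical container for known data extents; not the real API.
class DataExtents {
 public:
  // Records that [start, start + size) is known to be data.
  void Add(uint32_t start, size_t size) {
    if (size != 0) extents_[start] = size;
  }

  // Returns true if addr falls inside a recorded extent.
  bool Contains(uint32_t addr) const {
    auto it = extents_.upper_bound(addr);  // First extent starting after addr.
    if (it == extents_.begin()) return false;
    --it;  // Last extent starting at or before addr.
    return addr - it->first < it->second;
  }

 private:
  std::map<uint32_t, size_t> extents_;  // Start address -> size in bytes.
};

// Walks from start toward end in steps of instruction_size and stops
// before any step whose first or last byte lies in known data.
// Returns the address at which the walk stopped.
uint32_t WalkUntilData(const DataExtents& data, uint32_t start, uint32_t end,
                       uint32_t instruction_size) {
  assert(instruction_size > 0);
  uint32_t addr = start;
  while (addr < end && end - addr >= instruction_size) {
    if (data.Contains(addr) || data.Contains(addr + instruction_size - 1))
      break;
    addr += instruction_size;
  }
  return addr;
}
```

The point of handing the extents over before the walk, rather than through OnInstruction, is that the disassembler can refuse to decode into data in the first place instead of backing up afterwards.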
Original comment by [email protected]
on 24 Mar 2011 at 8:21
from sawbuck.
It has been observed that our data finding/hitting heuristics are in fact
incorrect. We had previously been using the base address of table lookups (as
an argument to jmp instructions) as an indication that data lives at that
address. We would then stop disassembly when it would overrun what had been
assumed to be data. Unfortunately, for hand-written assembly these lookup
tables are not always meant to be zero-indexed, in which case our assumed data
location was wrong (see, for example, memcpy).
All of these heuristics become unnecessary once we extract reliable data
information via DIA.
Original comment by [email protected]
on 28 Mar 2011 at 1:40
from sawbuck.
More accumulated knowledge that I feel the need to write down somewhere: the
public symbols provided by DIA do not have meaningful lengths. In fact, the
lengths are simply the distance between successive public symbols. However, we
need to use them because they are the only place we get information about the
location of virtual tables.
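For illustration, a small sketch of what "distance between successive public symbols" means in practice. 'ImpliedLengths' is a hypothetical helper, not a DIA call, and how the last symbol's length is bounded (here, by a caller-supplied end address) is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Given the addresses of public symbols, returns the length each one
// would be reported with if length is just the gap to the next symbol.
// The final symbol is assumed to run up to range_end.
std::vector<uint32_t> ImpliedLengths(std::vector<uint32_t> addresses,
                                     uint32_t range_end) {
  std::sort(addresses.begin(), addresses.end());
  std::vector<uint32_t> lengths;
  lengths.reserve(addresses.size());
  for (size_t i = 0; i < addresses.size(); ++i) {
    uint32_t next = (i + 1 < addresses.size()) ? addresses[i + 1] : range_end;
    lengths.push_back(next - addresses[i]);
  }
  return lengths;
}
```

A length computed this way says nothing about how many bytes the symbol really owns: any padding or unnamed data between two public symbols gets attributed to the first one.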
Original comment by [email protected]
on 8 Apr 2011 at 8:18
from sawbuck.
Fixed in http://code.google.com/p/sawbuck/source/detail?r=253.
Original comment by [email protected]
on 19 Apr 2011 at 6:09
- Changed state: Fixed
from sawbuck.