
Comments (5)

yegord commented on August 24, 2024

It would be great to see a minimal example.

You can disassemble a range of addresses using either the GUI or the command-line decompiler (--from, --to, --print-instructions). If a range starting with 0x00 0x00, followed by a sensible instruction, is disassembled correctly, and a range starting with a single 0x00, followed by the same sensible instruction, is not, we probably have a bug.

Could you provide such an example (essentially, several bytes of executable code)?

from snowman.

fourierules commented on August 24, 2024

I'll need some time to get everything set up on my box at home, but in the meantime...
What I see is something like this out of objdump:
12c366: c3 ret
12c367: 00 00 add
12c369: 00 00 add
12c36b: 00 48 89 add
12c36e: 5c pop
12c36f: 24 e8 and
12c370: 48 89 6c 24 f0 mov

Somewhere else in the code I might see a call or jump to address 12c36c, which seems to make the add at 12c36b erroneous.

Snowman's output for the same function shows the add operations, and if I am following decode.c properly, it sets the instruction offset at the same incorrect address.
I'll try to get a full snippet from the binary later.


yegord commented on August 24, 2024

Probably what you see is the following.

The decompiler is pretty stupid at deciding where an instruction begins.
It just tries to disassemble starting from the 0th byte in the code section.
If disassembling succeeds, it takes the size of the instruction, adds it to the address of the instruction, and tries to disassemble the instruction at this address.
If disassembling fails, it adds 1 (on x86) to the address of the previous attempt and tries again.

So, if libudis86 says that "00 48 89" is a 3-byte-long add instruction, the decompiler trusts it and goes on to disassemble the immediately following "5c".
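A minimal sketch of that linear sweep, for illustration only. It does not use libudis86: `toyDecodeLength` is a hypothetical stand-in that knows just the handful of opcodes in the objdump excerpt above, and the byte sequence in the usage note is an assumption modeled on that excerpt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

struct Insn { std::size_t offset; std::size_t length; };

// Size of ModRM + optional SIB + displacement (64-bit mode); 0 if out of bounds.
static std::size_t modrmLength(const Bytes& code, std::size_t pos) {
    if (pos >= code.size()) return 0;
    const unsigned mod = code[pos] >> 6, rm = code[pos] & 7;
    std::size_t len = 1;
    if (mod == 3) return len;                      // register operand
    if (rm == 4) {                                 // SIB byte follows
        if (pos + 1 >= code.size()) return 0;
        if (mod == 0 && (code[pos + 1] & 7) == 5) len += 4;  // disp32, no base
        len += 1;
    } else if (mod == 0 && rm == 5) {
        len += 4;                                  // RIP-relative disp32
    }
    if (mod == 1) len += 1;                        // disp8
    if (mod == 2) len += 4;                        // disp32
    return len;
}

// Hypothetical toy decoder: returns the instruction length, or 0 on failure.
static std::size_t toyDecodeLength(const Bytes& code, std::size_t pos) {
    if (pos >= code.size()) return 0;
    const std::uint8_t op = code[pos];
    std::size_t len = 0;
    if (op == 0xC3 || op == 0x5C) {                // ret, pop rsp
        len = 1;
    } else if (op == 0x24) {                       // and al, imm8
        len = 2;
    } else if (op == 0x00) {                       // add r/m8, r8
        const std::size_t m = modrmLength(code, pos + 1);
        if (m == 0) return 0;
        len = 1 + m;
    } else if (op == 0x48 && pos + 1 < code.size() && code[pos + 1] == 0x89) {
        const std::size_t m = modrmLength(code, pos + 2);  // REX.W mov r/m64, r64
        if (m == 0) return 0;
        len = 2 + m;
    } else {
        return 0;                                  // unknown opcode
    }
    return pos + len <= code.size() ? len : 0;
}

// Linear sweep: on success advance by the instruction length, on failure by 1.
std::vector<Insn> linearSweep(const Bytes& code) {
    std::vector<Insn> insns;
    std::size_t pos = 0;
    while (pos < code.size()) {
        const std::size_t len = toyDecodeLength(code, pos);
        if (len == 0) { ++pos; continue; }
        assert(pos + len <= code.size());
        insns.push_back({pos, len});
        pos += len;
    }
    return insns;
}
```

Fed the bytes `c3 00 00 00 00 00 48 89 5c 24 e8 48 89 6c 24 f0`, the sweep decodes `00 48 89` at offset 5 as one instruction and never produces an instruction at offset 6, where `48 89 5c 24 e8` would decode as a 5-byte mov.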

Ideally, one should do something more clever: like first identifying jump destinations, then disassembling everything starting from these jump destinations, and then trying to disassemble all the rest using something like the current approach.

So, IR generation (when you have IR, you know what is a jump, and can estimate its destination) should be intertwined with disassembling.
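A rough sketch of the two-pass idea, under stated assumptions: the jump destinations are handed in as a precomputed list (in practice they would come from IR generation), `disassembleFromTargets` and `toyDecodeLength` are hypothetical names, and the toy decoder knows only a few opcodes rather than calling libudis86.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Size of ModRM + optional SIB + displacement (64-bit mode); 0 if out of bounds.
static std::size_t modrmLength(const Bytes& code, std::size_t pos) {
    if (pos >= code.size()) return 0;
    const unsigned mod = code[pos] >> 6, rm = code[pos] & 7;
    std::size_t len = 1;
    if (mod == 3) return len;
    if (rm == 4) {                                 // SIB byte follows
        if (pos + 1 >= code.size()) return 0;
        if (mod == 0 && (code[pos + 1] & 7) == 5) len += 4;
        len += 1;
    } else if (mod == 0 && rm == 5) {
        len += 4;                                  // RIP-relative disp32
    }
    if (mod == 1) len += 1;
    if (mod == 2) len += 4;
    return len;
}

// Hypothetical toy decoder: returns the instruction length, or 0 on failure.
static std::size_t toyDecodeLength(const Bytes& code, std::size_t pos) {
    if (pos >= code.size()) return 0;
    const std::uint8_t op = code[pos];
    std::size_t len = 0;
    if (op == 0xC3 || op == 0x5C) len = 1;         // ret, pop rsp
    else if (op == 0x24) len = 2;                  // and al, imm8
    else if (op == 0x00) {                         // add r/m8, r8
        const std::size_t m = modrmLength(code, pos + 1);
        if (m == 0) return 0;
        len = 1 + m;
    } else if (op == 0x48 && pos + 1 < code.size() && code[pos + 1] == 0x89) {
        const std::size_t m = modrmLength(code, pos + 2);  // REX.W mov r/m64, r64
        if (m == 0) return 0;
        len = 2 + m;
    } else return 0;
    return pos + len <= code.size() ? len : 0;
}

// True if none of the bytes in [pos, pos + len) belong to an accepted instruction.
static bool isFree(const std::vector<bool>& claimed, std::size_t pos, std::size_t len) {
    for (std::size_t i = pos; i < pos + len; ++i)
        if (claimed[i]) return false;
    return true;
}

// Returns a map from instruction offset to instruction length.
std::map<std::size_t, std::size_t> disassembleFromTargets(
        const Bytes& code, const std::vector<std::size_t>& targets) {
    std::map<std::size_t, std::size_t> insns;
    std::vector<bool> claimed(code.size(), false);
    auto accept = [&](std::size_t pos, std::size_t len) {
        insns[pos] = len;
        for (std::size_t i = pos; i < pos + len; ++i) claimed[i] = true;
    };
    // Pass 1: decode straight-line code starting at each known destination.
    for (std::size_t pos : targets) {
        while (pos < code.size()) {
            const std::size_t len = toyDecodeLength(code, pos);
            if (len == 0 || !isFree(claimed, pos, len)) break;
            accept(pos, len);
            if (code[pos] == 0xC3) break;          // ret ends the run
            pos += len;
        }
    }
    // Pass 2: sweep the remaining bytes, rejecting anything that overlaps pass 1.
    for (std::size_t pos = 0; pos < code.size();) {
        const std::size_t len = claimed[pos] ? 0 : toyDecodeLength(code, pos);
        if (len == 0 || !isFree(claimed, pos, len)) { ++pos; continue; }
        accept(pos, len);
        pos += len;
    }
    return insns;
}
```

With the bytes `c3 00 00 00 00 00 48 89 5c 24 e8 48 89 6c 24 f0` and a single target at offset 6, the mov at offset 6 is accepted first, so the bogus 3-byte add at offset 5 is rejected in the second pass because it would overlap it. The stray zero byte at offset 5 is simply left undecoded.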


fourierules commented on August 24, 2024

Agreed. I tried building a test object but couldn't get the padding to show up between functions. I was, however, able to put together a simple hack that skips some cases of zeros after UD_Iret and UD_Ijmp. I'm still trying to trace it all out. It's not as simple as I had thought.

Do blank lines between instructions have any specific meaning?

1dce5b: mov [rsp], edx

1dce5f: jz 0x1dce5f

1dce62: inc dword [rax]

I think this is getting misinterpreted too. Right after this is another group of adds from odd zero padding.


yegord commented on August 24, 2024

Blank lines between instructions mean that one instruction begins not where the previous ends:

if (instr->addr() != successorAddress) {
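To illustrate the rule, here is a small self-contained sketch of a listing printer that emits a blank line whenever an instruction does not start where the previous one ended. It is not Snowman's actual code: `ListedInsn` and `formatListing` are hypothetical names, and only the `addr != successorAddress` comparison mirrors the line quoted above.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one disassembled instruction.
struct ListedInsn {
    std::uint64_t addr;    // address of the first byte
    std::size_t length;    // size in bytes
    std::string text;      // mnemonic and operands
};

// Formats instructions one per line, inserting a blank line before any
// instruction whose address differs from the end of the previous one.
std::string formatListing(const std::vector<ListedInsn>& insns) {
    std::ostringstream out;
    out << std::hex;
    bool first = true;
    std::uint64_t successorAddress = 0;
    for (const ListedInsn& insn : insns) {
        if (!first && insn.addr != successorAddress) {
            out << '\n';   // gap: some bytes in between were skipped
        }
        out << insn.addr << ": " << insn.text << '\n';
        successorAddress = insn.addr + insn.length;
        first = false;
    }
    return out.str();
}
```

Assuming lengths of 3, 2 and 2 bytes for the three instructions quoted in the question, each one ends one byte before the next begins, which would produce exactly the blank lines shown there.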

