dds's People

Contributors

pgoodman

dds's Issues

Extend fall-through conditions to cover more cases.

Right now we have the following relations, which detect a common pattern: trailing padding/NOPs between the last jmp or ret in a function and the beginning of the subsequent function. The relations look as follows:

; Keep track of a linear sequence of instructions that
; falls-through to the beginning of a function.
#local falls_through_to_function(EA)


; The base case is an instruction that falls-through
; into the head of a function.
falls_through_to_function(EA)
    : raw_transfer(EA, FuncEA, EDGE_FALL_THROUGH)
    , function(FuncEA).


; The inductive case is an instruction that falls-through
; to another instruction that falls-through to a function.
falls_through_to_function(EA)
    : raw_transfer(EA, ToEA, EDGE_FALL_THROUGH)
    , falls_through_to_function(ToEA).


; Oftentimes there is padding between functions. This manifests
; as one function ending in a `ret` or `jmp`, followed by some
; padding NO-OPs, followed by the head of another function. We
; don't want any of the instructions following the `ret`/`jmp`
; to be included as reachable from inside a function if they fall
; through in this way, and so here we restrict the pseudo
; fall-through.
transfer(FromEA, ToEA, EDGE_PSEUDO_FALL_THROUGH)
    : fixed_transfer(FromEA, ToEA, EDGE_PSEUDO_FALL_THROUGH)
    , !falls_through_to_function(ToEA).


; If a terminator instruction, e.g. `jmp` or `ret`, is immediately
; followed by a function head then we don't want to treat the
; pseudo-flow as being an inter-procedural flow.
transfer(FromEA, ToEA, EDGE_PSEUDO_FALL_THROUGH)
    : fixed_transfer(FromEA, ToEA, EDGE_PSEUDO_FALL_THROUGH)
    , !falls_through_to_function(ToEA)
    , !function(ToEA).

We should extend these relations as follows:

  • If we fall through into a non-instruction, then we should not treat the transfer as a pseudo-transfer.
  • If we fall through into a new section, then maybe we should not treat the transfer as a pseudo-transfer. Think about this a bit more.
  • If we fall through to an error instruction, then we should not treat the transfer as a pseudo-transfer.
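As a rough illustration of how the existing relations and the first and third proposed extensions fit together, here is a minimal Python sketch. It is not the Datalog implementation: the function names mirror the relations above, while the `non_instructions` and `error_instructions` sets are hypothetical inputs standing in for facts the database would need to provide. The "new section" case is left out, since it is still undecided.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]  # (FromEA, ToEA)


def falls_through_to_function(fall_throughs: Iterable[Edge],
                              functions: Set[int]) -> Set[int]:
    """EAs from which a chain of fall-through edges reaches a function head."""
    edges = list(fall_throughs)
    # Base case: an instruction that falls through into a function head.
    result = {src for src, dst in edges if dst in functions}
    # Inductive case: iterate until no new EA is added (a fixpoint).
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            if dst in result and src not in result:
                result.add(src)
                changed = True
    return result


def pseudo_transfers(fixed: Iterable[Edge],
                     fall_throughs: Iterable[Edge],
                     functions: Set[int],
                     non_instructions: Set[int],
                     error_instructions: Set[int]) -> Set[Edge]:
    """Keep a pseudo fall-through only if its target is ordinary code."""
    padding = falls_through_to_function(fall_throughs, functions)
    # `non_instructions` and `error_instructions` are assumed inputs that
    # model the proposed extensions; they are not existing relations.
    rejected = padding | functions | non_instructions | error_instructions
    return {(src, dst) for src, dst in fixed if dst not in rejected}


# A `ret` at 0x1f, two padding NOPs at 0x20 and 0x21, a function at 0x22.
fall_throughs = [(0x20, 0x21), (0x21, 0x22)]
kept = pseudo_transfers(
    fixed=[(0x1f, 0x20), (0x30, 0x32), (0x40, 0x42)],
    fall_throughs=fall_throughs,
    functions={0x22},
    non_instructions={0x42},
    error_instructions=set(),
)
print(sorted(kept))  # [(48, 50)]
```

Only the edge into ordinary code at 0x32 survives; the edge into padding and the edge into a non-instruction are both dropped.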

Semantics integration, per-function throwaway databases for heavyweight dataflow analyses

This issue serves to track ideas and goals related to integrating instruction semantics.

Basic example:

EA_LEA:   lea eax, [edi + esi * 2 + 0xf00]
EA_MOV:   mov ecx, dword ptr [eax]

From this, we'd like to create the following approximate relations:

raw_operation(0, FuncEA, EA_LEA, READ_REGISTER_32, REG_EDI, _, _)  ; read edi
raw_operation(1, FuncEA, EA_LEA, READ_REGISTER_32, REG_ESI, _, _)  ; read esi
raw_operation(2, FuncEA, EA_LEA, CONSTANT_32, 0xf00, _, _)
raw_operation(3, FuncEA, EA_LEA, BASE_2xINDEX_DISP, 0, 1, 2) ; edi + (esi * 2) + 0xf00
raw_operation(4, FuncEA, EA_LEA, WRITE_REGISTER_32, REG_EAX, 3, _)

raw_operation(5, FuncEA, EA_MOV, READ_REGISTER_32, REG_EAX, _, _)
raw_operation(6, FuncEA, EA_MOV, READ_MEMORY_32, 5, _, _)  ; `dword ptr [eax]`
raw_operation(7, FuncEA, EA_MOV, WRITE_REGISTER_32, REG_ECX, 6, _)

In the above, things like REG_EDI would be the unique IDs of values for the registers, so the raw_operation IDs probably wouldn't start at 0.

From here, we'd want rules that do some basic things like copy propagation. This means defining the post-instruction register state. These rules would exist in a separate datalog db, with one instance per function. The idea would be to run these, have them publish messages that we'd store back into the main db, then destroy these instances until they're needed again. Key idea: throwaway databases.

; Orchestrated by creator of throwaway db, unique values for `ValOnFuncEntry`.
#message func_entry_reg_state(FuncEA, RegName, ValOnFuncEntry)

; Sent from the users of the main DB to us, based off of `transfer`.
#message local_transfer(FromEA, ToEA)

; !!!!
; Could have a bunch of messages, e.g. for constants or symbolic values that are
; user-provided, for interactive constant folding / symexec during a speculative execution
; or speculative specialization of a function.
; !!!!

; Base case for beginning of a function: each register has a tombstone `ValOnFuncEntry` value
; on entry to a function, unless the first instruction overwrites it.
post_inst_reg_state(FuncEA, FuncEA, RegName, ValOnFuncEntry)
    : func_entry_reg_state(FuncEA, RegName, ValOnFuncEntry)
    , !raw_operation(_, FuncEA, FuncEA, WRITE_REGISTER_32, RegName, _, _).

; Any operation that writes out the register value submits its value to the post-register state
; of that instruction.
post_inst_reg_state(FuncEA, InsnEA, RegName, WrittenId)
    : raw_operation(_, FuncEA, InsnEA, WRITE_REGISTER_32, RegName, WrittenId, _).

; If we have a flow from `PredInsnEA` to `InsnEA`, and if `InsnEA` doesn't write to
; `RegName`, then propagate the value of `WrittenId` for `RegName`.
post_inst_reg_state(FuncEA, InsnEA, RegName, WrittenId)
    : local_transfer(PredInsnEA, InsnEA)
    , !raw_operation(_, FuncEA, InsnEA, WRITE_REGISTER_32, RegName, _, _)
    , post_inst_reg_state(FuncEA, PredInsnEA, RegName, WrittenId).

I think with the simple rules above, execution would converge toward the following:

  • the datalog system would explore all possible paths through the function
  • the datalog system /should/ converge because the above doesn't do any actual constant folding, i.e. it never invents or introduces new values, it just copies some around. It's tantamount to repeated matrix multiplication.

For every point in a function, we would be able to express register values in terms of register values from another place in the function. This is something that the GrammaTech people mentioned in their ddisasm paper. Basically, we could say: there exists a path such that the register written at instruction EA1 is read by the instruction at EA2.
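To make the convergence argument concrete, here is a small Python sketch of the three `post_inst_reg_state` rules as a fixpoint over one function. It is an illustration under assumptions, not the planned implementation: the `writes` dictionary (at most one written value ID per instruction and register) and the string register names are simplifications invented for the example.

```python
from typing import Dict, Iterable, Set, Tuple

# (InsnEA, RegName) -> value IDs the register may hold after InsnEA.
RegState = Dict[Tuple[int, str], Set[int]]


def post_inst_reg_state(func_ea: int,
                        entry_vals: Dict[str, int],
                        writes: Dict[Tuple[int, str], int],
                        local_transfers: Iterable[Tuple[int, int]]) -> RegState:
    state: RegState = {}

    def publish(ea: int, reg: str, val: int) -> bool:
        """Record a value; return True only if it was not already known."""
        vals = state.setdefault((ea, reg), set())
        if val in vals:
            return False
        vals.add(val)
        return True

    # Base case: entry values survive the first instruction unless it
    # writes the register.
    for reg, val in entry_vals.items():
        if (func_ea, reg) not in writes:
            publish(func_ea, reg, val)

    # Any register write defines the post-state of its instruction.
    for (ea, reg), val in writes.items():
        publish(ea, reg, val)

    # Propagate along local transfers until nothing new is published.
    # No new value IDs are ever created, so this must terminate.
    edges = list(local_transfers)
    changed = True
    while changed:
        changed = False
        for pred_ea, ea in edges:
            for (state_ea, reg), vals in list(state.items()):
                if state_ea != pred_ea or (ea, reg) in writes:
                    continue
                for val in list(vals):
                    if publish(ea, reg, val):
                        changed = True
    return state


# Diamond: 0x100 branches to 0x104 (writes eax as value 4) and 0x108
# (writes nothing); both paths rejoin at 0x10c.
state = post_inst_reg_state(
    func_ea=0x100,
    entry_vals={"eax": 100},
    writes={(0x104, "eax"): 4},
    local_transfers=[(0x100, 0x104), (0x100, 0x108),
                     (0x104, 0x10c), (0x108, 0x10c)],
)
print(sorted(state[(0x10c, "eax")]))  # [4, 100]
```

At the join point, `eax` may hold either the entry value or the value written at 0x104, which is exactly the "there exists a path" reading described above.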

I think one thing that becomes apparent from this type of "raw operation" representation is that we'd want to represent conditional register writes, which either preserve the register contents or alter them. That way we can model those data-centric flows.

ROP gadget finding

Just for fun, here is what it would look like to identify possible ROP gadgets.

#foreign gadget ```python Gadget```


; Declare a functor that will create an initial gadget, containing
; only one instruction (typically a return instruction).
#functor init_gadget(bound u64 RetEA, bound bytes RetInstBytes,
                     free gadget Chain) range(.)


; Declare a functor that will extend a gadget with a single instruction.
#functor extend_gadget(bound u64 EA, bound bytes InstBytes,
                       bound gadget BaseChain, free gadget Chain) range(.)


#local gadget_at(EA, Gadget)


; Make all discovered gadgets available.
#query gadget(free gadget Gadget)

gadget(Gadget) : gadget_at(_, Gadget).


; Base case, a return instruction.
gadget_at(RetEA, Gadget)
    : instruction(RetEA, INSN_RETURN, RetBytes)
    , init_gadget(RetEA, RetBytes, Gadget).


; Inductive case, a fall-through into a gadget.
gadget_at(InstEA, Gadget)
    : raw_transfer(InstEA, GadgetEA, EDGE_FALL_THROUGH)
    , gadget_at(GadgetEA, BaseGadget)
    , instruction(InstEA, INSN_NORMAL, InstBytes)
    , extend_gadget(InstEA, InstBytes, BaseGadget, Gadget).


; Inductive case, an unconditional jump into a gadget.
gadget_at(InstEA, Gadget)
    : raw_transfer(InstEA, GadgetEA, EDGE_JUMP_TAKEN)
    , gadget_at(GadgetEA, BaseGadget)
    , instruction(InstEA, INSN_DIRECT_JUMP, InstBytes)
    , extend_gadget(InstEA, InstBytes, BaseGadget, Gadget).


#prologue ```python

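from typing import Tuple

# A gadget is a chain of (EA, instruction bytes) pairs, in execution order.
Gadget = Tuple[Tuple[int, bytes], ...]

def init_gadget_bbf(ea: int, ea_bytes: bytes) -> Gadget:
  return ((ea, ea_bytes),)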

def extend_gadget_bbbf(ea: int, ea_bytes: bytes, gadget_chain: Gadget) -> Gadget:
  return ((ea, ea_bytes),) + gadget_chain

#```
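For intuition about what the rules above would derive, the following standalone Python sketch emulates them with a worklist. The integer encodings of the `INSN_*` and `EDGE_*` constants, the `instructions` dictionary layout, and the `max_len` cut-off are all assumptions made for this example; the Datalog rules themselves have no length bound.

```python
from typing import Dict, List, Set, Tuple

Gadget = Tuple[Tuple[int, bytes], ...]

# Hypothetical encodings; the real constants live in the database.
INSN_NORMAL, INSN_RETURN, INSN_DIRECT_JUMP = range(3)
EDGE_FALL_THROUGH, EDGE_JUMP_TAKEN = range(2)


def init_gadget(ea: int, ea_bytes: bytes) -> Gadget:
    return ((ea, ea_bytes),)


def extend_gadget(ea: int, ea_bytes: bytes, chain: Gadget) -> Gadget:
    return ((ea, ea_bytes),) + chain


def find_gadgets(instructions: Dict[int, Tuple[int, bytes]],
                 raw_transfers: List[Tuple[int, int, int]],
                 max_len: int = 8) -> Set[Gadget]:
    gadget_at: Dict[int, Set[Gadget]] = {}
    work: List[Tuple[int, Gadget]] = []

    def publish(ea: int, gadget: Gadget) -> None:
        known = gadget_at.setdefault(ea, set())
        if gadget not in known:
            known.add(gadget)
            work.append((ea, gadget))

    # Base case: every return instruction is a one-instruction gadget.
    for ea, (kind, data) in instructions.items():
        if kind == INSN_RETURN:
            publish(ea, init_gadget(ea, data))

    # Inductive cases: a normal instruction falling through into a gadget,
    # or a direct jump whose taken edge lands on a gadget.
    allowed = {(EDGE_FALL_THROUGH, INSN_NORMAL),
               (EDGE_JUMP_TAKEN, INSN_DIRECT_JUMP)}
    while work:
        gadget_ea, base = work.pop()
        if len(base) >= max_len:  # guard added for this sketch only
            continue
        for src, dst, edge in raw_transfers:
            if dst != gadget_ea or src not in instructions:
                continue
            kind, data = instructions[src]
            if (edge, kind) in allowed:
                publish(src, extend_gadget(src, data, base))

    return set().union(*gadget_at.values())


# `jmp +0` at 0xffe, `pop rdi` at 0x1000, `ret` at 0x1001.
instructions = {
    0x0ffe: (INSN_DIRECT_JUMP, b"\xeb\x00"),
    0x1000: (INSN_NORMAL, b"\x5f"),
    0x1001: (INSN_RETURN, b"\xc3"),
}
raw_transfers = [
    (0x0ffe, 0x1000, EDGE_JUMP_TAKEN),
    (0x1000, 0x1001, EDGE_FALL_THROUGH),
]
gadgets = find_gadgets(instructions, raw_transfers)
print(len(gadgets))  # 3
```

The three results are the bare `ret`, `pop rdi; ret`, and the jump-prefixed chain, one per `gadget_at` derivation.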

Add support for recognizing and propagating error conditions

We should be able to mark functions as being no-return, e.g. _Exit, exit, abort, __assert_fail, etc.

We should propagate error-ness as follows:

  • A jump to a non-decodable instruction is an error EA
  • A jump to an error instruction DestEA is an error EA
  • A jump to an error DestEA is an error EA
  • A fall-through to an error instruction is a no-return EA
  • A fall-through to a non-decodable instruction is a no-return EA
  • A fall-through to an error DestEA is an error EA
  • A call to a non-decodable instruction DestEA is an error EA
  • A call to an error instruction DestEA is an error EA
  • A call to an error DestEA is an error EA
  • A conditional branch, where TakenEA and NotTakenEA are errors, is also an error EA

Eventually, we'd also want to augment error flows with:

  • Have a command-line flag that tells us to include/exclude privileged instructions as decodable
  • Introduce semantics support and look for things like dereferencing the NULL pointer, dividing by zero, etc.

Error flows should be omitted from the usual control-flows, as they are guaranteed to error.
