Comments (9)
I don't have much experience with integrating Python interpreters into C++ projects or the other way around, but the linux.py and ELF.py approach sounds good. It's actually how the Mach-O and, I think, PE binary formats are supported. ELF is hardcoded in fcd.
I think one could actually get something like "function filtering" to work by delegating some of the entry point discovery from fcd to the scripts: discover some entry points in the .py scripts, omit stuff like __libc_start_main, and pass the entry point addresses to fcd for lifting and decompilation.
The issue I see with this approach is that some entry points are discovered by Remill via recursive descent disassembly, and I'm not sure whether that would reintroduce some of the entry points filtered out by the scripts. Then again, one could also pass a list of filtered entry points from the script to fcd and have fcd omit them as well.
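The filtering idea above could be sketched roughly as follows. This is only an illustration: the function names, the `RUNTIME_STUBS` set, and the `{name: address}` symbol layout are all assumptions, not part of fcd's script interface.

```python
# Hypothetical names throughout; none of this is fcd's actual interface.
RUNTIME_STUBS = {"_start", "__libc_start_main", "__libc_csu_init", "__libc_csu_fini"}


def filter_entry_points(symbols, excluded_names=RUNTIME_STUBS):
    """Split a {name: address} mapping into addresses to lift and addresses to skip."""
    keep, skip = [], []
    for name, address in sorted(symbols.items(), key=lambda item: item[1]):
        (skip if name in excluded_names else keep).append(address)
    return keep, skip


def drop_rediscovered(discovered, skip):
    """Remove filtered addresses that recursive descent found again."""
    blocked = set(skip)
    return [address for address in discovered if address not in blocked]


symbols = {"_start": 0x1000, "__libc_start_main": 0x1040, "main": 0x1100, "helper": 0x1180}
keep, skip = filter_entry_points(symbols)

# Pretend the disassembler rediscovered 0x1040 and also found a new function at 0x1200.
lifted = drop_rediscovered([0x1100, 0x1040, 0x1200], skip)
```

Passing `skip` along with `keep` is what would let the C++ side drop entry points that the disassembler finds again on its own.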
from fcd.
To add, I think the scripting approach is strictly better than the interactive one. But that's just my opinion.
from fcd.
This raises one question for me, which is: should the main binary loading / parsing be done by C++ code? If we made fcd's C++ side cooperate with a Python side, then we could bring in third-party packages like Angr's cle to load in binary images, and have the C++ side actually invoke CLE to do the reading. I envision something like microx, where a class is provided that can be extended, and the extension implements methods for reading virtual memory, etc. This would then generalize to handling actual process memory dumps.
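A minimal sketch of what such an extensible memory class might look like, assuming a single `read` method is all the C++ side needs. Both class names and the `(base_address, bytes)` segment layout are hypothetical; a cle-backed or process-dump-backed subclass would fill the segments from its own source.

```python
from abc import ABC, abstractmethod


class ExecutableMemory(ABC):
    """Hypothetical base class that the C++ side would call back into."""

    @abstractmethod
    def read(self, addr, num_bytes):
        """Return num_bytes bytes at virtual address addr, or raise ValueError."""


class SegmentMemory(ExecutableMemory):
    """Serves reads from (base_address, bytes) segments, e.g. from a loader or a dump."""

    def __init__(self, segments):
        self.segments = sorted(segments)

    def read(self, addr, num_bytes):
        for base, data in self.segments:
            offset = addr - base
            # Only serve reads that fall entirely inside one segment.
            if 0 <= offset and offset + num_bytes <= len(data):
                return data[offset:offset + num_bytes]
        raise ValueError(f"unmapped read of {num_bytes} bytes at {addr:#x}")


memory = SegmentMemory([(0x400000, b"\x55\x48\x89\xe5"), (0x600000, b"hello")])
```

Because the decompiler would only ever see `read`, the same interface could sit on top of a file loader or a live process dump.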
from fcd.
Yeah, this sounds pretty good. And I think fcd actually has some support for this already, from glancing over fcd/scripts and fcd/fcd/executables. I think the idea there is that a Python script needs to provide certain functions, like a function to translate virtual addresses and maybe others, to a C++ class. But I bet this could be modified to better suit things like Angr's cle.
from fcd.
I can also imagine *.py scripts being very useful in scenarios with packed and / or encrypted executables.
from fcd.
So maybe something like...

import sys

import cle
import fcd  # proposed Python bindings; this module does not exist yet


# Memory abstraction that will let the decompiler read memory. You could
# implement Memory here by invoking APIs from cle, Binary Ninja, IDA Pro, etc.
# You could also provide info to fcd from a McSema-lifted CFG file, which contains
# rich info.
class ExecutableMemory(fcd.ExecutableMemory):
    def __init__(self, ld):
        self.ld = ld

    def read(self, addr, num_bytes):
        # Do something with self.ld, returning a list, tuple, or bytearray.
        raise NotImplementedError


# sys.argv[0] is the script itself; the binary to load is the first argument.
ld = cle.Loader(sys.argv[1])
memory = ExecutableMemory(ld)
decomp = fcd.Decompiler(memory)
decomp.add_entrypoint(0xf00, name="main")

# Fill in other named entrypoints from ld.
# Maybe bring in Angr's CFGFast to invoke other APIs,
# e.g. decomp.mark_as_function() or something. Down
# the line, having the ability to mark indirect xrefs would
# be nifty.

# Now lift to bitcode
bc = decomp.lift()

# Show me the bitcode!
bc.dump(address=0xf00)
bc.dump(name="main")

# Eventually we could implement the emulator test suite
# via whatever bc is, e.g. bc.execute(cpu), where cpu is
# an object of a class implementing methods like
# read_register and read_memory.
bc.set_calling_convention(...)
bc.decompile(address=0xf00)
bc.decompile(name="main")
from fcd.
I think your example looks good, but it's also the reverse of what fcd currently does. Currently fcd uses Python to parse executables. For example:

import pefile
import bisect

# helper globals
stubs = {}
sectionStart = []
sectionInfo = {}

# fcd interface below (I assume this is what fcd's C++ Executable class requires)
executableType = "Portable Executable"
targetTriple = "unknown-unknown-win32"
entryPoints = []


def init(data):
    # fill stubs, sectionStart, sectionInfo, ...
    pass


def getStubTarget(target):
    # returns the target of a stub function (library functions, etc)
    pass


def mapAddress(address):
    # maps virtual addresses to actual addresses in the binary
    pass

The above script is then passed to fcd via a command-line flag, for example $ fcd -f scripts/pe.py pefile.exe, and during lifting, fcd calls the functions from the script to resolve stub targets, virtual addresses and what have you. fcd then does the actual reading of binary data on its own.
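As a rough illustration of the address-translation hook described above, here is one way mapAddress could be written with bisect. The layout of sectionStart and sectionInfo is an assumption made for this sketch and is not taken from fcd's pe.py; the section values are made up.

```python
import bisect

# Assumed layout (not taken from fcd's pe.py): sectionStart is a sorted list of
# section virtual addresses, sectionInfo maps each start to (file_offset, size).
sectionStart = [0x401000, 0x402000]
sectionInfo = {0x401000: (0x400, 0x200), 0x402000: (0x600, 0x100)}


def mapAddress(address):
    """Translate a virtual address to a file offset, or None if it is unmapped."""
    # Find the last section that starts at or before the address.
    index = bisect.bisect_right(sectionStart, address) - 1
    if index < 0:
        return None
    start = sectionStart[index]
    file_offset, size = sectionInfo[start]
    if address - start >= size:
        return None  # past the end of the section
    return file_offset + (address - start)
```

Keeping sectionStart sorted makes each lookup a binary search, which matters if the hook is called for every address touched during lifting.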
In your example it seems to me that fcd would be more of a library with Python bindings than a standalone executable. I'm not opposed to that, but I assume it would be a bit more work. That being said, a C++ library with Python bindings seems to be the way a lot of projects go nowadays, so why do something different?
from fcd.
I think library-ifying it is something I could pull together in a reasonable amount of time. It'd be pretty cool to expose fcd to Binary Ninja, for example.
from fcd.
"It'd be pretty cool to expose fcd to Binary Ninja, for example."
That I completely agree with.
from fcd.