binaryanalysisplatform / bap
Binary Analysis Platform
License: MIT License
To reproduce on Trusty with 76c7cf2:
$ cat a.c
#include<stdio.h>
int main() {
printf("hello world\n");
}
$ gcc -g a.c
$ baptop
and then:
# let open Core_kernel.Std in let open Or_error in let open Bap.Std in
Image.create "a.out" >>= fun (image, errs) ->
printf "%d\n" @@ List.length errs;
return @@ Table.iteri (Image.symbols image) ~f:(fun m s ->
Image.section_of_symbol image s |> Fn.ignore);;
0
Exception:
("link_exn: unbound value "
((name main) (is_function true) (is_debug true) (locations (((addr ((z 0x40052D) (w 64) (signed false))) (len 16)) ())))).
BAP currently has many more requirements. Also, it would be a good idea to advertise the Python bindings.
Currently they are stubs; that should be easy to fix, though.
Hi,
Can I get documentation on how to install and use BAP, and on which versions of ocaml and opam to use? I am facing problems with the version numbers of ocaml and opam. If I run "./configure" in the bap folder, I get the following error.
./configure
File "preconfig.ml", line 24, characters 14-27:
Warning 3: deprecated: String.create
Use Bytes.create instead.
File "preconfig.ml", line 29, characters 12-25:
Warning 3: deprecated: String.create
Use Bytes.create instead.
Exception: Sys_error "myocamlbuild.ml: No such file or directory".
Current version of ocaml - 4.02.1
opam - 1.2.0
If I use ocaml 4.00.1, then I get an unbound value |> error.
Thank you.
I started with the official Ubuntu Trusty 64 Vagrant box and followed the instructions in the BAP README. I found that the list of system package dependencies was incomplete.
Installing OPAM from the Ubuntu package manager did not install ocaml-native-compilers. The BAP installation failed while building the ocamlfind dependency, and I had to revert OPAM and begin the installation again. Additionally, I needed the m4 package for a different BAP dependency. Unfortunately I didn't write down which one in my notes, but I had to revert OPAM and begin the BAP installation again.
The last error I had before I gave up:
===== ERROR while installing ssl.0.4.7 =====
# opam-version 1.1.1
# os linux
# command ./configure --prefix /home/vagrant/.opam/system
# path /home/vagrant/.opam/system/build/ssl.0.4.7
# compiler system (4.01.0)
# exit-code 1
# env-file /home/vagrant/.opam/system/build/ssl.0.4.7/ssl-21466-427793.env
# stdout-file /home/vagrant/.opam/system/build/ssl.0.4.7/ssl-21466-427793.out
# stderr-file /home/vagrant/.opam/system/build/ssl.0.4.7/ssl-21466-427793.err
### stdout ###
# ...[truncated]
# checking for ocamldep... /usr/bin/ocamldep
# checking for ocamllex... /usr/bin/ocamllex
# checking for ocamlyacc... /usr/bin/ocamlyacc
# checking for ocamldoc... /usr/bin/ocamldoc
# checking for ocamlmktop... /usr/bin/ocamlmktop
# checking for gcc... (cached) gcc
# checking whether we are using the GNU C compiler... (cached) yes
# checking whether gcc accepts -g... (cached) yes
# checking for gcc option to accept ISO C89... (cached) none needed
# checking for SSL_new in -lssl... no
### stderr ###
One last note that may be worth mentioning: the README should recommend that users run opam update before the installation. The installation instructions don't say to do that, and it took me a while to figure out why BAP was an "invalid package name." The corresponding apt-get message is Unable to locate package.
The opam file that comes with bap is unreliable. It would be nice to
give 'opam install bap' an option to specify where to find llvm-config.
I don't know how to do this.
Another issue is that 'opam remove' tries to run ./configure, which
can fail miserably and leave the .opam directory in a messy state.
This is annoying because opam will automatically attempt to remove
bap (and then recompile it) when you install something else (like utop).
It would be more robust to change the opam file as follows:
remove: [
["ocamlfind" "remove" "bap"]
["ocamlfind" "remove" "core_lwt"]
["rm" "-f" "%{bin}%/bap-mc" "%{bin}%/bap-server" "%{bin}%/baptop"
"%{bin}%/bapbuild" "%{bin}%/train" "%{bin}%/readbin"
"%{bin}%/fbi" "%{bin}%/byteweight"]
]
Bruno
It seems that dynamic libraries aren't installed by default on Debian, so maybe it's better to provide static linking?
The current parser uses the bitstring library, which requires us to convert the whole binary to a string and load it into memory.
Actually this is already done, but #34 showed up. So until I resolve that issue, I can't add them.
So that they will build with all the other bap stuff.
oasis outputs too many warnings that have no connection with real problems, like the warning about the piqi module that it can't find. It looks like everybody is scared by them. It would be nice to address them, or maybe just to hide them.
As well as the elf parser, bap_dwarf should also use bigstring and work without copying data.
Even if we do not specify --enable-serialization, piqi is still required, and compilation breaks with:
Exception: PropList.Not_set ("piqic_path", None)
P.S. piqi creates more problems than it solves.
readbin outputs addresses in hex, but immediate operands in decimal. Example:
readbin /Users/dbrumley/git/GitHub/binaryanalysisplatform/x86-binaries/elf/binutils/gcc_binutils_32_O0_a
readbin:
....
0804C7D9 (SUB32ri8 ESP ESP 48) | subl $48, %esp
objdump:
...
804c7d9: 83 ec 30 sub $0x30,%esp
Also, it would be good to standardize to lower case.
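To make the request concrete, here is a minimal sketch of the formatting being asked for. The helpers pp_imm and pp_name are hypothetical names invented for this illustration; they are not part of readbin or the BAP API.

```ocaml
(* Hypothetical helper, not part of readbin: render an immediate operand
   in lowercase hex with a 0x prefix, similar to the objdump line above. *)
let pp_imm (n : int) : string =
  if n < 0 then Printf.sprintf "-0x%x" (-n)
  else Printf.sprintf "0x%x" n

(* Hypothetical helper: lowercase an opcode or register name. *)
let pp_name (s : string) : string = String.lowercase_ascii s

(* Print the example instruction with the proposed conventions. *)
let () =
  Printf.printf "%s %s, %s\n" (pp_name "SUB32ri8") (pp_imm 48) (pp_name "ESP")
```

With this, the immediate 48 is rendered as 0x30, matching the objdump output.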
All jumps are decoded as relative.
In order to make BAP usable to the majority of people doing binary analysis today, it needs to have a functional, stable, and documented Python API.
It creates lots of files in the /tmp folder, which should be cleaned up with some policy.
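One possible policy, sketched here under assumptions, is to track every temporary file and remove it on normal exit. The names tracked, temp_file and cleanup, and the default prefix and suffix, are made up for this example and do not come from the bap sources.

```ocaml
(* Paths of temporary files created so far. *)
let tracked : string list ref = ref []

(* Create a temporary file and remember it for later removal.
   The default prefix and suffix are placeholders. *)
let temp_file ?(prefix = "bap") ?(suffix = ".tmp") () : string =
  let path = Filename.temp_file prefix suffix in
  tracked := path :: !tracked;
  path

(* Remove every tracked file that still exists. *)
let cleanup () : unit =
  List.iter (fun p -> if Sys.file_exists p then Sys.remove p) !tracked;
  tracked := []

(* Run the cleanup when the process exits normally. *)
let () = at_exit cleanup
```

Note that at_exit does not run if the process is killed by a signal, so a long-running server would probably also want periodic cleanup.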
Running baptop will result in a "Package requires itself: bap" error.
It should be packaged as a separate opam library.
Right now we only have min_addr and max_addr.
Will PR.
For a bap developer setup (built from github), it is easier to install the right dependencies by doing:
$ opam install bap --deps-only
The reasoning here is that, if you are interested in developing specifically against bap, this will install the exact dependencies it is built on, so you don't have to worry about opam upgrade. It can replace the following on the wiki:
The easiest way to install the OCaml dependencies of bap is to use the opam package manager:
$ opam install $(cat opam.deps)
If you are using a development version, e.g., you have just cloned this from github, then you will also need the oasis package in order to create a build environment.
$ opam install oasis
bap will be much easier to install with opam install bap. Also, packaging it for opam will give us access to Mac OS X testing.
At present, the first coordinate of the returned value of Disasm.Basic.insn_of_mem is the input memory. The doc says it returns the consumed memory (and this is indeed what a user would wish to know).
Example baptop inputs:
open Core_kernel.Std;;
open Or_error;;
open Bap.Std;;
let disassembler = Disasm.Basic.create ~backend:"llvm" "x86_64";;
let bigstr = Bigstring.of_string @@ String.init ~f:(fun _ -> '\x90') 100;;
let base = Addr.of_int64 0L;;
let mem = Memory.create LittleEndian base bigstr;;
let memok = ok_exn mem;;
let x = disassembler >>= fun d -> Disasm.Basic.insn_of_mem d memok;;
let a, b, c = ok_exn x;;
let memory_size mem = Memory.to_buffer mem |> Bigsubstring.length;;
let _ = printf "%d bytes to encode %S\n"
(memory_size a)
(Disasm.Basic.Insn.asm @@ uw b);;
Output:
100 bytes to encode "\tnop"
If one uses the readbin.native utility to read binary, there will be an error as below:
Elf_backend: failed with exn: ("validation errors" ((table.offset "value 52 < bound 64")))
Test binary:
https://github.com/tiffanyb/ARM-test/blob/master/coreutils/coreutils_O0_echo
bap-byteweight has everything needed on board: it can output the symbols found in the symbol table with bap-byteweight symbols file, as well as the symbols found by byteweight with bap-byteweight find file. Given this, we can build a test suite that checks that we have zero false negatives.
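The core of such a check could look like the sketch below. It assumes the output of the two commands has already been parsed into lists of addresses; the output format is not described here, so parsing is left out. The name false_negatives is invented for this example.

```ocaml
module Addrs = Set.Make (Int64)

(* Addresses that are present in the symbol table but were missed by the
   finder. An empty result means zero false negatives. *)
let false_negatives ~(symtab : int64 list) ~(found : int64 list) : int64 list =
  let found = Addrs.of_list found in
  List.filter (fun a -> not (Addrs.mem a found)) symtab
```

A test would then assert that false_negatives returns the empty list for every binary in the corpus.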
The ARM lifter fails while lifting certain instructions. The place of failure is always the same: bap_disasm_arm_mem_shift.ml:49:23: got register instead of imm, where in the place of the offset we have a register.
Here is the output on my mac:
$ ./configure --prefix=$(opam config var prefix)
Exception:
Failure
"Unable to load environment, the file '/Users/dbrumley/git/GitHub/binaryanalysisplatform/bap/setup.data' doesn't exist.".
$ opam config var prefix
/Users/dbrumley/.opam/401bap
As an FYI:
$ oasis setup
W: Cannot find source file matching module 'Stmt_piqi' in library serialization
W: Cannot find source file matching module 'Stmt_piqi_ext' in library serialization
The failure is an overflow, but the actual problem is that something is parsed in the wrong way. Maybe it is just the wrong version of the dwarf format. (By the way, that should be checked.)
or install baptop separately.
Although we're calling it with the --port option, currently it is ignored.
Currently we store all memory chunks provided by our clients. But since we're using memory mapping, they don't cost us very much. Unlike qira, bap has never been killed by OOM. But sooner or later it will happen.
This is a known issue that anyone can hit, so I will post it here as a precaution.
If the llvm library is compiled with the new libc++ library, and bap is compiled with GNU g++ with the libstdc++ library, then we will crash, since these libraries and compilers are not very ABI compatible. The table below summarizes the expected (and tested) behavior:
bap | llvm | result |
---|---|---|
gcc/stdc++ | clang/c++ | ๐ |
gcc/c++ | clang/c++ | ๐ |
gcc/stdc++ | clang/stdc++ | ๐ |
clang/stdc++ | clang/stdc++ | ๐ |
clang/stdc++ | clang/c++ | ๐ |
clang/c++ | clang/c++ | ๐ |
Bap's configuration scripts will try to evade these problems and set the proper flags based on the assumption that on Linux we use libstdc++, while on Mac OS X libc++ is used by default. So, for the end user this should just work. But if you're playing with the --with-cxx options and with-cxxlibs flags, then make sure that they're coherent with llvm.
It should be clearly stated that bap depends on oasis version 0.4.
The Travis system didn't fire an error when baptop broke. Here is an excerpt from the "green" build:
+./test.sh coreutils
Package requires itself: bap
File "bil.ml", line 1, characters 0-12:
Error: Unbound module Bap
There should be an option to output the linear sweep disassembly of a section/segment, as done with objdump -D. This would be useful towards making bap a complete replacement for objdump.
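For reference, linear sweep itself is simple; the sketch below shows the idea with a toy decoder. The decode parameter and toy_decode are hypothetical stand-ins invented for this example; a real implementation would call the BAP disassembler instead.

```ocaml
(* Linear sweep: decode at offset 0, advance by the decoded length, repeat.
   [decode buf off] is a hypothetical decoder that returns the mnemonic and
   the length of the instruction at [off], or None if the bytes are invalid. *)
let linear_sweep
    (decode : string -> int -> (string * int) option)
    (buf : string) : (int * string) list =
  let rec go off acc =
    if off >= String.length buf then List.rev acc
    else
      match decode buf off with
      | Some (asm, len) when len > 0 -> go (off + len) ((off, asm) :: acc)
      | _ ->
        (* Undecodable bytes: record them, skip one byte, and resynchronize. *)
        go (off + 1) ((off, "(bad)") :: acc)
  in
  go 0 []

(* Toy decoder for illustration only: knows x86 nop (0x90) and ret (0xc3). *)
let toy_decode (buf : string) (off : int) : (string * int) option =
  match buf.[off] with
  | '\x90' -> Some ("nop", 1)
  | '\xc3' -> Some ("ret", 1)
  | _ -> None
```

Unlike recursive descent, this never follows control flow, so it covers the whole section at the cost of possibly decoding data as code.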
Transports should be plugins, loadable at runtime.
The elf backend doesn't support symbol tables, except debugging ones.
Currently bap supports only the arm, x86 and x86_64 architectures. But since it uses LLVM, it can provide some level of support for many more architectures.
We should extend the Arch module in the Bap_types library with more architectures, so that readbin and bap-mc can work on them.
Bap-server has a pool of disassemblers, but they all run in one thread.
I'm not sure that it will give a significant speedup, especially for short runs, but we can try. It can be done either by using preemptive threads or lwt jobs.
It can and should reuse most of readbin facilities, like pretty printing, optimizations, etc. I think that it can even reuse its command-line interface.
The build system produces a few nasty warnings in OCaml 4.02: String.create is deprecated, and OCaml 4.02 asks us to use Bytes, which we can't use since we're ... String.make will help. Otherwise we can move ... Bigarrays ... myocamlbuild.ml.in.
The following code:
begin(dired_dump_obstack_ENTRY)
0000a48c: 00 48 2d e9 push {r11, lr} ; STMDB_UPD(SP,SP,0xe,Nil,R11,LR)
0000a490: 04 b0 8d e2 add r11, sp, #0x4 ; ADDri(R11,SP,0x4,0xe,Nil,Nil)
0000a494: 20 d0 4d e2 sub sp, sp, #0x20 ; SUBri(SP,SP,0x20,0xe,Nil,Nil)
0000a498: 20 00 0b e5 str r0, [r11, #-32] ; STRi12(R0,R11,0xffffffe0,0xe,Nil)
0000a49c: 24 10 0b e5 str r1, [r11, #-36] ; STRi12(R1,R11,0xffffffdc,0xe,Nil)
0000a4a0: 24 30 1b e5 ldr r3, [r11, #-36] ; LDRi12(R3,R11,0xffffffdc,0xe,Nil)
0000a4a4: 0c 30 0b e5 str r3, [r11, #-12] ; STRi12(R3,R11,0xfffffff4,0xe,Nil)
0000a4a8: 0c 30 1b e5 ldr r3, [r11, #-12] ; LDRi12(R3,R11,0xfffffff4,0xe,Nil)
0000a4ac: 0c 30 93 e5 ldr r3, [r3, #12] ; LDRi12(R3,R3,0xc,0xe,Nil)
0000a4b0: 03 20 a0 e1 mov r2, r3 ; MOVr(R2,R3,0xe,Nil,Nil)
0000a4b4: 0c 30 1b e5 ldr r3, [r11, #-12] ; LDRi12(R3,R11,0xfffffff4,0xe,Nil)
0000a4b8: 08 30 93 e5 ldr r3, [r3, #8] ; LDRi12(R3,R3,0x8,0xe,Nil)
0000a4bc: 02 30 63 e0 rsb r3, r3, r2 ; RSBrr(R3,R3,R2,0xe,Nil,Nil)
0000a4c0: 23 31 a0 e1 lsr r3, r3, #2 ; MOVsi(R3,R3,0x13,0xe,Nil,Nil)
0000a4c4: 10 30 0b e5 str r3, [r11, #-16] ; STRi12(R3,R11,0xfffffff0,0xe,Nil)
0000a4c8: 10 30 1b e5 ldr r3, [r11, #-16] ; LDRi12(R3,R11,0xfffffff0,0xe,Nil)
0000a4cc: 00 00 53 e3 cmp r3, #0x0 ; CMPri(R3,0x0,0xe,Nil)
0000a4d0: 4b 00 00 0a beq #0x12c ; Bcc(0x12c,0x0,CPSR)
end(dired_dump_obstack_ENTRY)
Outputs BIL that references memory through plenty of differently named variables, like mem, m2, and src, although all loads and stores should have side effects on only one memory, namely the machine memory.
begin(dired_dump_obstack_ENTRY) {
orig_base_3161 := SP
mem := mem with [orig_base_3161 + 0xFFFFFFFC:32, el]:u32 <- LR
mem := mem with [orig_base_3161 + 0xFFFFFFF8:32, el]:u32 <- R11
SP := SP - 0x8:32
R11 := SP + 0x4:32
SP := SP - 0x20:32
m1 := m2 with [R11 + 0xFFFFFFE0:32, el]:u32 <- R0
m1 := m2 with [R11 + 0xFFFFFFDC:32, el]:u32 <- R1
R3 := src[R11 + 0xFFFFFFDC:32, el]:u32
m1 := m2 with [R11 + 0xFFFFFFF4:32, el]:u32 <- R3
R3 := src[R11 + 0xFFFFFFF4:32, el]:u32
R3 := src[R3 + 0xC:32, el]:u32
R2 := R3
R3 := src[R11 + 0xFFFFFFF4:32, el]:u32
R3 := src[R3 + 0x8:32, el]:u32
R3 := R2 - R3
unshifted_3203 := R3
R3 := unshifted_3203 >> 0x2:32
m1 := m2 with [R11 + 0xFFFFFFF0:32, el]:u32 <- R3
R3 := src[R11 + 0xFFFFFFF0:32, el]:u32
orig1_3213 := R3
orig2_3214 := 0x0:32
dest_3211 := R3 - 0x0:32
CF := orig2_3214 <= orig1_3213
VF := high:1[(orig1_3213 ^ orig2_3214) & (orig1_3213 ^ dest_3211)]
NF := high:1[dest_3211]
ZF := dest_3211 = 0x0:32
if (ZF = true) {
jmp dired_dump_obstack_0x178
}
}
On OS X, make reinstall fails. It seems reinstall runs uninstall, then install. However, uninstall isn't removing bap from opam.
davids-air-5:bap dbrumley$ make reinstall
ocaml setup.ml -reinstall
W: Nothing to install for findlib library 'types_test'
W: Nothing to install for findlib library 'image_test'
W: Nothing to install for findlib library 'dwarf_test'
ocamlfind: Package bap is already installed
So, we have byteweight merged into bap, but there are a few issues we need to discuss. We should understand that currently it is mostly not a part of bap, but more of a demo application. That's not bad, but it is not enough.
What we should do next is split it into library and application parts, so that we can grab some neat stuff from byteweight and use it inside bap itself. Also, we need to make a plugin of byteweight. But before doing this we should figure out what kind of service it provides. Currently in BAP there is only one service, named bap.image, that provides facilities to load and parse binary files. So it is time to add a new service.
Now we should try to figure out the interface of the service. Indeed, we need to figure out two interfaces: one for the backend (i.e., the service provider) and another for the frontend (the service itself) (cf. elf_backend and image). So, let's start from the frontend. Two variants came to my mind: something like a function start identifier (FSI) or a function boundaries identifier (FBI). Currently, only dwarf can provide the latter. But since dwarf can't be used in real conditions, we can forget about it. Also we have elf itself, which can provide some useful information even for a stripped binary. But afaik it can also provide only function starts (correct me if I'm wrong, but all we can rely on is the dynsym table coupled with the relocation table, and they give us only starting locations). So, my idea is that instead of starting with FBI and then downcasting it to FSI, we should start with the latter.
Another question is symbol names. I think that function boundaries and function names are orthogonal ideas and shouldn't be mixed. It would be a better idea to have a separate service that resolves names.
So, back to FSI. What this service can actually provide is a predicate over a binary that marks certain addresses as starts of functions, which gives us image -> addr seq or mem -> arch -> addr seq. The problem with these interfaces is that they don't grant any access to file metainformation, so we can't implement any providers that rely on it (like dwarf or elf). That means that the FSI backend should work on a lower level, directly with the file, so we come out with Bigstring.t -> arch -> addr seq. Also, having in mind some other possible backend implementations, such as ones based on llvm code, we can make it even a little more low-level:
Bigstring.t -> arch -> addr -> bool
So, I'm eager to hear from others. Everyone is welcome.
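To make the proposal easier to discuss, here is a rough sketch of the predicate-style backend and a frontend helper built on top of it. Everything here is an assumption for illustration: arch, addr and bigstring are stand-ins for the real BAP types, and Fsi_backend, starts and Push_bp are invented names. It uses only the standard library (Bigarray is part of it in recent OCaml versions).

```ocaml
(* Hypothetical stand-ins for BAP's types; names are illustrative only. *)
type arch = ARM | X86 | X86_64
type addr = int64
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Backend interface: a predicate that marks an address as a function start. *)
module type Fsi_backend = sig
  val is_start : bigstring -> arch -> addr -> bool
end

(* Frontend helper: list every address in [base, base + size) that the
   backend marks as a function start, in increasing order. *)
let starts (module B : Fsi_backend) data arch ~base : addr list =
  let n = Bigarray.Array1.dim data in
  let rec loop i acc =
    if i < 0 then acc
    else
      let a = Int64.add base (Int64.of_int i) in
      loop (i - 1) (if B.is_start data arch a then a :: acc else acc)
  in
  loop (n - 1) []

(* Toy backend for demonstration: treats the x86 byte 0x55 (push ebp/rbp)
   as a function start. It assumes the data is loaded at address 0. *)
module Push_bp : Fsi_backend = struct
  let is_start data arch addr =
    match arch with
    | ARM -> false
    | X86 | X86_64 ->
      let i = Int64.to_int addr in
      i >= 0 && i < Bigarray.Array1.dim data
      && Bigarray.Array1.get data i = '\x55'
end

(* Usage: run the toy backend over six bytes of x86 code. *)
let demo : addr list =
  let bytes = "\x55\x90\x90\xc3\x55\xc3" in
  let data =
    Bigarray.Array1.create Bigarray.char Bigarray.c_layout
      (String.length bytes) in
  String.iteri (fun i c -> Bigarray.Array1.set data i c) bytes;
  starts (module Push_bp) data X86 ~base:0L
```

The point of the sketch is that the addr seq form can be derived from the addr -> bool form, so the lower-level predicate loses nothing for the frontend.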
Since we're looking for plugins using a compiled-in reference to the opam folder, bap executables can't be shipped as is. We need to figure out the best way to ship bap executables as a bundle, with the plugin system configured to search in a plugin folder.
The BAP oasis file currently says version 0.2. The previous BAP (before moving to github) went up through 0.8. Therefore, it does not seem to make sense to call this 0.2.
The original idea was to call this series BAP 1.x, as it is not backward compatible. There is a serious con here: 1.x might imply stability as opposed to newness.
We need a new name. I'm open to better solutions than 1.x. I would prefer 0.9 myself, and just say it's completely backward incompatible.
In the following code, emitted from 00 88 bd e8 pop {r11, pc}, the last statement should be before the jump:
begin(emit_mandatory_arg_note_0x28) {
orig_base_2605 <- SP
R11 <- mem[orig_base_2605 + 0x0:32, el]:u32
jmp mem[orig_base_2605 + 0x4:32, el]:u32
SP <- SP + 0x8:32
}
Currently the instruction printer requires tabulation with a proper tab stop set up. This should either be rewritten without tabulation, or something clever should be done; otherwise printing is broken.
Consider
$ echo "0x31 0xd2 0x48 0xf7 0xf3" | bap-mc --show-inst --show-asm --arch x86
With 8c8b68e, the program terminates silently and returns 0 to the OS. It is not obvious to the user what went wrong. We should report the error to the console and also to the OS.
P.S. The failure lies in disassembler creation: x86 is not a valid cpu name in llvm.
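A minimal sketch of the requested behavior, assuming the failure is available as a result value. The functions report and create_disassembler are hypothetical; in particular, create_disassembler and its list of accepted names are stand-ins for this example, not the actual bap-mc or llvm logic.

```ocaml
(* Turn the outcome of a run into an exit code, printing failures to stderr. *)
let report (r : (unit, string) result) : int =
  match r with
  | Ok () -> 0
  | Error msg ->
    prerr_endline ("bap-mc: " ^ msg);
    1

(* Hypothetical stand-in for disassembler creation that rejects unknown
   names; the accepted list is made up for the example. *)
let create_disassembler (arch : string) : (unit, string) result =
  if List.mem arch ["x86_64"; "arm"] then Ok ()
  else Error (Printf.sprintf "unknown architecture %S" arch)
```

The main entry point would then end with exit (report (create_disassembler arch)), so the shell sees a non-zero status and the user sees the message.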