gaasedelen / lighthouse
A Coverage Explorer for Reverse Engineers
License: MIT License
Hi,
I compiled the CodeCoverage Pintool and tested a demo program.
How do I parse the .log file generated by the tool? I opened it with gedit on Ubuntu, but found only invalid characters.
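A minimal sketch of one way to read such a file, assuming the Pintool writes the drcov layout that Lighthouse loads: a plain-text header and module table, then a "BB Table: N bbs" line followed by N packed 8-byte entries. That binary tail is what a text editor shows as invalid characters. The function name and the sample bytes below are made up for illustration.

```python
import re
import struct

def parse_drcov_bb_table(data):
    """Return a list of (start_offset, size, module_id) tuples from raw drcov bytes."""
    # The header and module table are plain text; the block entries are binary.
    match = re.search(rb"BB Table: (\d+) bbs\r?\n", data)
    if match is None:
        raise ValueError("no 'BB Table' header found; is this a drcov log?")
    count = int(match.group(1))
    entry = struct.Struct("<IHH")  # uint32 start, uint16 size, uint16 mod_id
    body = data[match.end():]
    if len(body) < count * entry.size:
        raise ValueError("basic block table is truncated")
    return [entry.unpack_from(body, i * entry.size) for i in range(count)]

# Synthetic two-entry log built in memory, so the sketch runs without a real trace.
sample = (
    b"DRCOV VERSION: 2\n"
    b"DRCOV FLAVOR: drcov\n"
    b"Module Table: version 2, count 1\n"
    b"Columns: id, base, end, entry, path\n"
    b" 0, 0x0000000000400000, 0x0000000000621000, 0x0000000000404030, /usr/bin/ls\n"
    b"BB Table: 2 bbs\n"
    + struct.pack("<IHH", 0x1000, 12, 0)
    + struct.pack("<IHH", 0x1040, 5, 0)
)
blocks = parse_drcov_bb_table(sample)
```

For a real log, read the file in binary mode (open(path, "rb")) and pass the bytes to the function.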
Hello,
I am tentatively using this tool to evaluate the coverage of some legacy code by feeding it a set of inputs. My setup is Binary Ninja + Intel Pin + a C++ binary compiled with gcc 7.3.0, on Ubuntu 18.04.
One thing confuses me about the instruction coverage. For instance, in the coverage output shown in Binary Ninja, function foo's instruction coverage is something like 95 / 187, which I interpreted as: foo has 187 assembly instructions in total, of which 95 are covered.
However, when I disassemble the binary and count the instructions within function foo, I get something different, say 875 instructions in total.
I am trying to understand this inconsistency. Am I missing something here? Thanks.
Hi, it seems like there is a bug when loading the coverage file produced by the Frida script. I know it's experimental but you may be interested in fixing it. :)
Traceback (most recent call last):
File "/Applications/IDA Pro 7.0/ida64.app/Contents/MacOS/plugins/lighthouse/ida_integration.py", line 234, in activate
self.action_function()
File "/Applications/IDA Pro 7.0/ida64.app/Contents/MacOS/plugins/lighthouse/core.py", line 321, in interactive_load_file
created_coverage, errors = self.director.create_coverage_from_drcov_list(drcov_list)
File "/Applications/IDA Pro 7.0/ida64.app/Contents/MacOS/plugins/lighthouse/director.py", line 439, in create_coverage_from_drcov_list
if coverage.suspicious:
File "/Applications/IDA Pro 7.0/ida64.app/Contents/MacOS/plugins/lighthouse/coverage.py", line 210, in suspicious
percent = (bad/float(total))*100
ZeroDivisionError: float division by zero
The coverage data produced by PIN loads fine on the same executable. It is a simple C test program compiled to a Mach-O executable.
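The traceback suggests total was zero, possibly because none of the addresses in the Frida log mapped into the database. A minimal sketch of a guard around that division follows; the helper name is hypothetical and this is not the actual Lighthouse fix.

```python
def percent_suspicious(bad, total):
    """Percentage of suspicious addresses; 0.0 when no addresses were mapped."""
    if total == 0:
        # Nothing mapped at all, so avoid dividing by zero.
        return 0.0
    return (bad / float(total)) * 100
```

Whether an empty mapping should count as 0% or be reported as a load error is a design choice left open here.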
#include <stdio.h>
#include <string.h>
void main(int argc, char** argv) {
char foo[20];
gets(foo);
if (strcmp(foo, "a") == 0) {
printf("foo\n");
} else {
printf("bar\n");
}
}
I've attached the failing frida-cov.log file.
I installed lighthouse, but have yet to use it for anything. However, despite not using it:
$ ls ~/.idapro/lighthouse_logs/ |wc -l
5
Example log:
$ cat ~/.idapro/lighthouse_logs/lighthouse.38900.log
09-28-2017 00:26:14 | Lighthouse | DEBUG: Cleaning logs directory
09-28-2017 00:26:37 | pip.utils | DEBUG: lzma module is not available
09-28-2017 00:26:37 | pip.vcs | DEBUG: Registered VCS backend: git
09-28-2017 00:26:37 | pip.vcs | DEBUG: Registered VCS backend: hg
09-28-2017 00:26:37 | pip.vcs | DEBUG: Registered VCS backend: svn
09-28-2017 00:26:37 | pip.vcs | DEBUG: Registered VCS backend: bzr
09-28-2017 00:26:37 | Lighthouse.STDOUT | INFO: [+] Using already installed pip (version 9.0.1)
This is based off of 0.6, a149769
Is this the intended behavior?
It would be great if, when hovering over a colored basic block, Lighthouse could display the name of at least one of the coverage files responsible for that block being hit (attribution).
A log file generated with DynamoRIO 7.0.0-RC1 on Linux has a different header than one generated on Windows, and Lighthouse fails to account for this.
The first lines of those log files look like this and lack some entries, probably because of this: https://github.com/DynamoRIO/dynamorio/blob/master/ext/drcovlib/modules.c#L388
DRCOV VERSION: 2
DRCOV FLAVOR: drcov-64
Module Table: version 2, count 6
Columns: id, base, end, entry, path
0, 0x0000000000400000, 0x0000000000621000, 0x0000000000404030, /usr/bin/ls
Actual error thrown is
line 307, in _parse_module_v2
self.checksum = int(data[4], 16)
ValueError: invalid literal for int() with base 16: '/usr/bin/ls'
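One way to avoid hard-coding field positions is to let the "Columns:" line decide what each field means. The sketch below is only an illustration of that idea: the function name is hypothetical, and the extra "checksum" and "timestamp" keys are assumed to be the names used when those columns are present.

```python
def parse_module_line(columns_line, module_line):
    """Map one module table row to a dict keyed by the names in the Columns header."""
    names = [n.strip() for n in columns_line.split(":", 1)[1].split(",")]
    # Split at most len(names) - 1 times so a path containing a comma stays intact.
    values = [v.strip() for v in module_line.split(",", len(names) - 1)]
    if len(values) != len(names):
        raise ValueError("expected %d fields, got %d" % (len(names), len(values)))
    row = dict(zip(names, values))
    # Hex fields are converted only when the header says they exist.
    for key in ("base", "end", "entry", "checksum", "timestamp"):
        if key in row:
            row[key] = int(row[key], 16)
    row["id"] = int(row["id"])
    return row

linux_row = parse_module_line(
    "Columns: id, base, end, entry, path",
    "0, 0x0000000000400000, 0x0000000000621000, 0x0000000000404030, /usr/bin/ls",
)
```

With the Linux header shown above, the path lands in linux_row["path"] and no checksum is looked up at all.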
I'm on IDA 6.8 and the Pseudocode view doesn't show any highlights.
What could be going wrong?
It seems there's a bug in loading any drcov v4 trace:
lighthouse/plugin/lighthouse/parsers/drcov.py
Lines 427 to 429 in 8e09989
The length of data ought to be 9 (I think?) on Windows:
The coverage Pin tool outputs addresses incorrectly for 32 bit traces.
I think this can be fixed by explicitly casting the ADDRINTs image.low_ and image.high_ at https://github.com/gaasedelen/lighthouse/blob/master/coverage/pin/CodeCoverage.cpp#L208 to uint64_t, since that is what the format string calls for. On a 32 bit trace the format string otherwise pulls too much off the stack, which also breaks the output of the module name. Doing so appeared to fix the issue on my system, but I haven't tested enough to issue a PR knowing it won't break anything...
Tested on: IDA Pro Version 7.0.170914 Windows x64
To reproduce:
\dynamo\bin32\drrun.exe -t drcov -- "C:\myfile.exe"
[Lighthouse] Failed to load coverage C:/drcov.myfile.exe.07872.0000.proc.log
If you have a coverage window docked in your regular IDA view and you start a debugging session, the coverage window switches to floating mode. It is not possible to move it back "inside" IDA.
My guess is that the "parent" window no longer exists, so the coverage window is orphaned and therefore floats.
At the moment it doesn't work with Python 3:
C:\Users\user\AppData\Roaming\Hex-Rays\IDA Pro\plugins\lighthouse_plugin.py: Missing parentheses in call to 'print'. Did you mean print(prefix_message)? (log.py, line 21)
Traceback (most recent call last):
File "C:\Program Files\IDA Pro 7.4\python\3\ida_idaapi.py", line 590, in IDAPython_ExecScript
exec(code, g)
File "C:/Users/user/AppData/Roaming/Hex-Rays/IDA Pro/plugins/lighthouse_plugin.py", line 1, in <module>
from lighthouse.util.log import logging_started, start_logging
File "C:/Users/user/AppData/Roaming/Hex-Rays/IDA Pro/plugins\lighthouse\util\__init__.py", line 3, in <module>
from .log import lmsg, logging_started, start_logging
File "C:/Users/user/AppData/Roaming/Hex-Rays/IDA Pro/plugins\lighthouse\util\log.py", line 21
print prefix_message
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(prefix_message)?
I haven't checked for the deprecated APIs but that would need to be checked too.
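For the print statement specifically, a call with a single parenthesized argument parses under both Python 2 and Python 3. A minimal sketch follows; lmsg is the name imported in the traceback, but the body and the prefix are guesses for illustration only.

```python
def lmsg(message, prefix="[Lighthouse]"):
    """Print a prefixed log message and return it."""
    prefix_message = "%s %s" % (prefix, message)
    # One parenthesized argument is valid for Python 2's print statement
    # and for Python 3's print function alike.
    print(prefix_message)
    return prefix_message
```

This only covers the syntax error shown above; any other Python 2 constructs in the plugin would need the same kind of review.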
It would improve readability and be a bit less confusing if the function names in the Coverage Overview were the same as in the Function Window.
Lighthouse adds the base address within the function rebase_blocks. However, in instances where static binaries have been traced, the basic block addresses have already been offset by the base.
I've created the following patch, which first checks whether the base address has already been applied and adds it only when it hasn't.
diff --git a/plugin/lighthouse/util/misc.py b/plugin/lighthouse/util/misc.py
index 99d9d92..bb859df 100644
--- a/plugin/lighthouse/util/misc.py
+++ b/plugin/lighthouse/util/misc.py
@@ -214,7 +214,13 @@ def rebase_blocks(base, basic_blocks):
     """
     Rebase a list of basic block offsets (offset, size) to the given imagebase.
     """
-    return map(lambda x: (base + x[0], x[1]), basic_blocks)
+    return_map = []
+    for x in basic_blocks:
+        if x[0] & base:
+            return_map.append((x[0], x[1]))
+        else:
+            return_map.append((base + x[0], x[1]))
+    return return_map
 
 def build_hitmap(data):
     """
Note that I initially tried this within map, but it led to IDA crashing; the for loop is much more stable.
I've been working on a project to convert qemu debugging information to drcov format, and its output is compatible with this patch.
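The x[0] & base test in the patch is a bitwise heuristic and can misfire: with a base of 0x400000, an already-absolute address such as 0x800010 has that bit clear and would be rebased a second time. A hedged alternative is a range check, sketched below. It assumes the image size is known; the image_size parameter is hypothetical and not part of the existing function.

```python
def rebase_blocks(base, image_size, basic_blocks):
    """Rebase (offset, size) pairs, leaving entries that are already absolute."""
    end = base + image_size
    rebased = []
    for address, size in basic_blocks:
        if base <= address < end:
            # Already inside the loaded image, so treat it as absolute.
            rebased.append((address, size))
        else:
            rebased.append((base + address, size))
    return rebased
```

The range check is still ambiguous when the image is larger than its base address, since a plain offset can then also fall inside [base, base + image_size).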
I have noticed this problem ever since I first started using Lighthouse. All the painting is perfect and the data is perfect, but when I'm done working and decide to close IDA (or IDA64), it won't finish shutting down. It leaves me with the output window, which I can close, but I cannot quit IDA and have to force quit. I removed IDA skins and other plugins, with no change.
Fortunately IDA does finish packing the database, so nothing is left undone.
To me it seems as if some loop is still running and keeps IDA from shutting down, but that is purely speculation.
IDA version 7.0, OS X 10.12.6
I've been trying out Lighthouse for coverage browsing. It's very polished, thanks!
I ran into one issue: the module name matching is too relaxed. If I have coverage for module "foo", but a module called "foobar" is listed earlier in the list, then "foobar" matches and Lighthouse claims there is no coverage.
I believe this is because you search for modules with fuzzy matching by default like this:
# attempt lookup using case-insensitive filename
for module in self.modules:
    if module_name.lower() in module.filename.lower():
        return module
I manually edited my coverage file so that unrelated modules didn't contain my module as a substring, and then it worked perfectly.
>>> "foo".lower() in "foobar".lower()
True
I think fuzzy should be the fallback, not the default, WDYT?
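A minimal sketch of that ordering: try an exact, case-insensitive filename match first, and fall back to substring matching only if nothing matches exactly. The function name and the plain list of paths are simplifications for illustration, not the plugin's actual data model.

```python
import re

def find_module(module_name, filenames):
    """Return the best matching path for module_name, or None."""
    wanted = module_name.lower()
    # Pass 1: exact, case-insensitive match on the final path component.
    for path in filenames:
        if re.split(r"[\\/]", path)[-1].lower() == wanted:
            return path
    # Pass 2: the relaxed substring match, used only as a fallback.
    for path in filenames:
        if wanted in path.lower():
            return path
    return None
```

With this ordering, a lookup for "foo" picks "/bin/foo" even when "/bin/foobar" appears earlier in the list.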
I am testing on Windows 7 SP1 x86. The .log files look correct, but Lighthouse in IDA shows the coverage as 0%.
I am using DynamoRIO-Windows-6.1.1-3 x86. Is this a log file version mismatch?
File "D:/IDA 6.8/plugins\lighthouse\parsers\drcov.py", line 123, in _parse_module_table_header
version_data, count_data = field_data.split(", ")
ValueError: need more than 1 value to unpack
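The failing split suggests the header has no ", " in it. A sketch of a more tolerant header parser follows, assuming (this is an assumption, not confirmed in this thread) that older DynamoRIO releases write only a count, as in "Module Table: 11", while newer ones write "Module Table: version 2, count 6".

```python
import re

def parse_module_table_header(line):
    """Return (version, count) for either style of 'Module Table' header."""
    field_data = line.split(":", 1)[1].strip()
    match = re.match(r"version (\d+), count (\d+)$", field_data)
    if match:
        return int(match.group(1)), int(match.group(2))
    # Assumed older style: only the module count is present; call it version 1.
    return 1, int(field_data)
```

Treating the count-only form as version 1 is a labeling choice made for this sketch.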
Hi!
Quick issue regarding coverage: I added some extra debug output for when a coverage file is matched against a given module:
-- snip --
filename: RPCRT4.dll
module name: gdi32.dll
filename: LPK.dll
module name: gdi32.dll
filename: GDI32.dll
-- snip --
As you can see, even though GDI32 coverage was there, it would yield "Failed to find module GDI32.dll in coverage data". The following patch fixes that for me:
diff --git a/plugin/lighthouse/parsers/drcov.py b/plugin/lighthouse/parsers/drcov.py
index 0cff331..457af96 100644
--- a/plugin/lighthouse/parsers/drcov.py
+++ b/plugin/lighthouse/parsers/drcov.py
@@ -44,7 +44,7 @@ class DrcovData(object):
# locate the coverage that matches the given module_name
for module in self.modules:
- if module.filename == module_name:
+ if module.filename == module_name or module.filename.lower() == module_name:
mod_id = module.id
break
Cheers!
Can I get coverage-related data through the Python API of IDA, so that the data can be processed further?
PS: awesome work!
If I understand your source code correctly this won't work for 64bit:
from ctypes import Structure, c_uint16, c_uint32

class DrcovBasicBlock(Structure):
    """
    Parser & wrapper for basic block details as found in a drcov coverage log.
    NOTE:
      Based off the C structure as used by drcov -
        /* Data structure for the coverage info itself */
        typedef struct _bb_entry_t {
            uint   start;      /* offset of bb start from the image base */
            ushort size;
            ushort mod_id;
        } bb_entry_t;
    """
    _pack_ = 1
    _fields_ = [
        ('start',  c_uint32),
        ('size',   c_uint16),
        ('mod_id', c_uint16)
    ]
and the block starts are indeed incorrect when you load a drcov file with 64bit addresses.
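One thing worth checking: the quoted comment describes start as an offset from the image base, not an absolute address. Read that way, a 32-bit field can still describe blocks in a module loaded at a 64-bit address, provided the module spans less than 4 GiB. The sketch below only illustrates that arithmetic with a made-up load address; it does not explain the incorrect starts reported here, which may have another cause.

```python
import struct
from ctypes import LittleEndianStructure, c_uint16, c_uint32, sizeof

class BasicBlockEntry(LittleEndianStructure):
    """Same layout as the bb_entry_t quoted above."""
    _pack_ = 1
    _fields_ = [("start", c_uint32), ("size", c_uint16), ("mod_id", c_uint16)]

# One packed entry: offset 0x1234, 16 bytes long, in module 0.
raw = struct.pack("<IHH", 0x1234, 16, 0)
entry = BasicBlockEntry.from_buffer_copy(raw)

module_base = 0x00007FF612340000  # hypothetical 64-bit load address
absolute = module_base + entry.start
```

Here absolute is a full 64-bit address even though the stored field is 32 bits wide.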
cc @domenukk @RobertBuhren
I compiled the CodeCoverage Pintool with TARGET:=ia32 for a 32 bit dylib, and the Overview doesn't have the color heat map.
The IDA view does show traced instructions, though.
The desire to prefix functions from the coverage overview was raised as an aside to issue #23. I think this is a reasonable request for the next release (v0.7)
Hi,
First, thanks for Lighthouse, it's awesome.
I feel that this issue has already been more or less discussed, but never on the parsing side alone.
I've developed my own Lighthouse-format coverage dumper around the Unicorn Engine, and it's working great.
One of my targets is a kernel that loads at address 0, but its .text starts at an offset like 0x0000842000000000.
So the offset between the start of my module and my basic blocks cannot fit in the 32 bits of the "start" field of the "_bb_entry_t" structure.
typedef struct _bb_entry_t {
    uint   start;      /* offset of bb start from the image base */
    ushort size;
    ushort mod_id;
} bb_entry_t;
Do you think Lighthouse should support this on the parsing side?
I think it should, and the patch should not be too complicated.
We could add a new field to the header of the BB Table, something like "Version", as in the Module Table header.
Version 2 of this header would then use a wider structure with a uint64_t as start.
If you think this is fine, I can try to do a pull request.
Another way is to have a new field in the module table header that takes an offset to add to the image base.
Actually, I've just noticed that version 4 of the header already supports an "offset" field in the declaration of a module, but I didn't see the offset used anywhere. Maybe it is still in development? Or maybe I completely missed something?
Thanks for the help!
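To make the first proposal concrete, here is a sketch of what a wider entry could look like on the Python side. This layout is purely hypothetical: it is not part of drcov or Lighthouse, and the class name is invented for illustration.

```python
from ctypes import LittleEndianStructure, c_uint16, c_uint64, sizeof

class WideBasicBlockEntry(LittleEndianStructure):
    """Hypothetical 'version 2' entry with a 64-bit start field."""
    _pack_ = 1
    _fields_ = [("start", c_uint64), ("size", c_uint16), ("mod_id", c_uint16)]

# Round-trip an offset that does not fit in 32 bits.
entry = WideBasicBlockEntry(start=0x0000842000000000, size=4, mod_id=0)
raw = bytes(bytearray(entry))
decoded = WideBasicBlockEntry.from_buffer_copy(raw)
```

Each entry grows from 8 to 12 bytes with this layout, which is the cost of the wider field.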
frida-drcov.py cannot calculate coverage for a DLL that is loaded partway through execution.
Hello.
Your plugin is awesome! DBI is better than breakpoint tracing...
I have one question. Do you know of any free solutions for code-trace analysis/visualisation (not coverage) with all register and memory states?
For example, REVEN AXION (https://www.youtube.com/watch?v=5WNRplDPf5s), but its price is around 30k$/host. Another example is QIRA, but it is for small Linux applications only...
I think a full-system code trace for difficult applications could be obtained via PANDA (https://github.com/panda-re/panda), but I don't know of any tools for analysing these traces.
(Sorry for my bad English.)
Hi,
Thanks for the great tool, but I am not sure whether you are aware that distributing compiled Pin binaries is not allowed under the End User Agreement. We are only allowed to distribute Pin tools in source code form.
I did not know of any other way to contact you, which is why I thought I should inform you here. Thanks
With 3.11, string gave an unidentified identifier error when compiling. Adding
using namespace std;
fixed the build.
Function names don't update in the Coverage Overview.
Also, it would be nice to be able to use your other plugin, Prefix, in the Overview.
Hi,
This is more of a feature request than an actual bug, but I think creating it here is best.
Could we add support for gcov [1][2] or any other Linux coverage output to Lighthouse? This would be amazing, as all the other coverage tools in the README are for Windows, as far as I can tell.
Note that gcov requires the source code, but it would still be valuable to visualise in IDA the basic blocks being taken.
Thanks,
[1] https://gcovr.com/guide.html
[2] https://github.com/gcovr/gcovr
Hi,
Thank you for your great plugin. I want to view 1500 drcov log files, but I can't see more than 28 (A-Z) coverage files in the combo box. Is there a limit on how many coverage files can be shown?
Best Regards
Hi all,
I get the following exception when I try to load a coverage file:
[Lighthouse] Successfully loaded 1 coverage file(s)...
Exception in thread DatabasePainter:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/robert/repo/ida-7.2/plugins/lighthouse/painting/painter.py", line 400, in _async_database_painter
result = self._paint_database()
File "/home/robert/repo/ida-7.2/plugins/lighthouse/painting/painter.py", line 295, in _paint_database
if not self._priority_paint():
File "/home/robert/repo/ida-7.2/plugins/lighthouse/painting/ida_painter.py", line 413, in _priority_paint
cursor_address = idaapi.get_screen_ea()
File "/home/robert/repo/ida-7.2/python/ida_kernwin.py", line 2251, in get_screen_ea
return _ida_kernwin.get_screen_ea(*args)
RuntimeError: Function can be called from the main thread only
The "Coverage Overview" window is successfully filled with the correct data; however, the disassembly output and the graph view are not painted.
IDA Version: Version 7.2.181105 Linux x86_64 (32-bit address size)
Regards,
Robert
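The RuntimeError says get_screen_ea was called from the DatabasePainter thread rather than the main thread. IDAPython exposes execute_sync for marshaling such calls onto the main thread; the sketch below deliberately does not use it, so that it can run outside IDA, and shows the same idea with a plain queue instead. All names in it are hypothetical.

```python
import queue
import threading

main_thread_calls = queue.Queue()

def run_on_main_thread(func):
    """Called from a worker: queue func, then block until the main thread ran it."""
    done = threading.Event()
    result = {}
    def wrapper():
        try:
            result["value"] = func()
        finally:
            done.set()
    main_thread_calls.put(wrapper)
    done.wait()
    return result.get("value")

def pump_main_thread_calls():
    """Called from the main thread: run every request queued so far."""
    while True:
        try:
            call = main_thread_calls.get_nowait()
        except queue.Empty:
            return
        call()

# Demo: the worker asks which thread ends up executing its request.
pumping_thread = threading.get_ident()
seen = []
worker = threading.Thread(
    target=lambda: seen.append(run_on_main_thread(threading.get_ident))
)
worker.start()
while worker.is_alive():
    pump_main_thread_calls()
    worker.join(0.01)
```

After the loop, seen holds the identifier of the pumping thread, which shows that the request ran there and not on the worker.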
Related to #60
It does not support IDA 7.0; loading fails with an "Invalid Win32" error.
Hi,
Thank you for this excellent tool.
Can I suggest that you add a comment to download Pin version "pin-3.4-97438" (gf90d1f746), as that is the only version that will work with your compiled CodeCoverage dlls?
I just tried Lighthouse with Binary Ninja (latest release) and got an error:
Loaded Python plugin 'lighthouse'
Traceback (most recent call last):
File "%appdata%\AppData\Roaming\Binary Ninja\plugins\lighthouse_plugin.py", line 1, in <module>
from lighthouse.util.log import logging_started, start_logging
File "%appdata%\AppData\Roaming\Binary Ninja\plugins\lighthouse\util\__init__.py", line 3, in <module>
from .log import lmsg, logging_started, start_logging
File "%appdata%\AppData\Roaming\Binary Ninja\plugins\lighthouse\util\log.py", line 5, in <module>
from .disassembler import disassembler
File "%appdata%\AppData\Roaming\Binary Ninja\plugins\lighthouse\util\disassembler\__init__.py", line 31, in <module>
disassembler = BinjaAPI()
File "%appdata%\AppData\Roaming\Binary Ninja\plugins\lighthouse\util\disassembler\binja_api.py", line 106, in __init__
self._init_version()
File "%appdata%\AppData\Roaming\Binary Ninja\plugins\lighthouse\util\disassembler\binja_api.py", line 116, in _init_version
major, minor, patch = map(int, disassembler_version.split("."))
ValueError: invalid literal for int() with base 10: '1344 Personal'
Python plugin 'lighthouse_plugin' could not be loaded
The problem is that the version string has the format '1.1.1344 Personal', so I fixed it with:
disassembler_version = binaryninja.core_version.split("-", 1)[0] if "-" in binaryninja.core_version else binaryninja.core_version.split(" ", 1)[0]
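A regular expression is another way to pull out the numeric part without enumerating separators. The sketch below works on plain strings so it can run without Binary Ninja installed; the function name is hypothetical.

```python
import re

def parse_core_version(version_string):
    """Extract (major, minor, patch) from strings like '1.1.1344 Personal'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version_string)
    return tuple(int(part) for part in match.groups())
```

Anything after the third number, whether it follows a space or a dash, is ignored.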
Hey, so we've talked about this a bit, but I wanted to document why Lighthouse supporting other formats would be nice while it was still fresh.
drcov is useful in that there are easy, cross-platform tools to generate it; however, it has some pretty significant shortcomings which I'm running into. Specifically, drcov is made up of a header which gives the module map, followed by a series of tuples (module id, bb offset, bb size). The main issue here is the bb size field. If you're generating a trace with something that is aware of bb sizes (e.g. a DBI), this is all fine; however, if you're dumping a trace from something that is not bb aware (e.g. an emulator, or code coverage collected via sampling), you just have a list of PC values.
Assuming you have a module map and a list of PC values, there are a few things you could do:
Basically, both of these require pre-processing the coverage in IDA before loading, which is doable but is a pain in the ass.
So, I'm pretty agnostic with regard to what the actual format is, but the feature request is the ability to load any coverage data format which can be generated from the module mappings and a list of PC values.
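As a rough illustration of the input such a format would need, here is a sketch that turns a module map plus raw PC values into (module id, offset) pairs with no block sizes involved. The tuple layout and the function name are made up for this example.

```python
import bisect

def pcs_to_module_offsets(modules, pcs):
    """modules: list of (mod_id, base, end). Returns sorted unique (mod_id, offset)."""
    ordered = sorted(modules, key=lambda m: m[1])
    bases = [m[1] for m in ordered]
    hits = set()
    for pc in pcs:
        # Find the last module whose base is <= pc.
        index = bisect.bisect_right(bases, pc) - 1
        if index < 0:
            continue  # below every module
        mod_id, base, end = ordered[index]
        if pc < end:
            hits.add((mod_id, pc - base))
    return sorted(hits)
```

PCs that fall outside every module are dropped, and duplicates collapse, so the result is one entry per executed address.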
Hi,
I think it would be useful to have a feature to disable painting in the disassembly.
P.S. Thank you for this great tool 👍
In the Hex-Rays decompiler you can right click and "mark as decompiled".
In the code coverage results I'd like to see a column that shows this status, or some way to filter out items that have been marked as decompiled.
I'm experimenting with Lighthouse and IDA 7, but I get the following error in the console at startup:
LoadLibrary(C:\Program Files\IDA Demo 7.0\plugins\CodeCoverage.dll) error: %1 is not a valid Win32 application.
C:\Program Files\IDA Demo 7.0\plugins\CodeCoverage.dll: can't load file
I'm using Lighthouse 0.6.0. Can you make Lighthouse 0.6.1 Windows binaries available? Thanks!
Hello, I can't get the plugin to load in IDA Pro 7.
I am using Windows 10 x64.
It gives an error like this:
LoadLibrary(C:\Program Files\IDA 7.0\plugins\CodeCoverage.dll) error: %1 not a valid Win32 application.
C:\Program Files\IDA 7.0\plugins\CodeCoverage.dll: can't load file
Loading a coverage file, closing the coverage overview window, and then opening a coverage file again (the same one or a different one) triggers the error message below:
[Lighthouse] Successfully loaded 1 coverage file(s)...
[Lighthouse] Failed to map coverage D:\cov\drcov.00204.0000.proc.log
[Lighthouse] - Internal C++ object (CoverageModel) already deleted.
[Lighthouse] Successfully loaded 0 coverage file(s)...
I'm also hitting a NULL deref in Python at random times (haven't been able to trigger it reliably yet but IDA crashes), and I'm wondering if it's not related to this issue. Let me know when you have a fix so that I can follow-up on this :).
Hi!
I think it would be useful to be able to export coverage data in JSON format, as such a feature would allow the coverage data to be used in other systems and contexts.
An example: given n binaries, we could call an IDA Pro Python script with command line switches to emit coverage for every binary, then parse the log(s) and compare coverage.
Thank you!
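To make the request concrete, here is a sketch of what such an export could look like. The field names and the shape of the input dictionary are invented for illustration; Lighthouse does not necessarily expose data in this form.

```python
import json

def coverage_to_json(binary_name, function_coverage):
    """function_coverage maps function name -> (instructions_hit, instructions_total)."""
    functions = []
    for name, (hit, total) in sorted(function_coverage.items()):
        percent = round(100.0 * hit / total, 2) if total else 0.0
        functions.append({"name": name, "hit": hit, "total": total, "percent": percent})
    return json.dumps({"binary": binary_name, "functions": functions},
                      indent=2, sort_keys=True)

report = coverage_to_json("demo.bin", {"foo": (95, 187), "bar": (0, 0)})
```

One JSON document per binary would then be easy to diff or aggregate with ordinary tooling.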
Hello, I am using IDA Pro 6.8. How can I get the effect shown under "Coverage Painting"?
In my IDA disassembly, graph and Pseudocode views, no code is painted blue.....
Thx~
Hi, I found nothing in trace.log.
I'm using Windows 10 x64.
It gives errors like this:
A: build\Source\pin\internal-include-windows-ia32\context_windows.H: LEVEL_VM::WINDOWS_PCTXT::BaseAddrOf: 325: assertion failed: 0 != ((1 << f) & cmask)
It would be good to have a feature like a Diff mode where you compare the output of two composers.
Nodes that appear in only one of them could be colored yellow, and those that appear in both blue.
There could also be a column in a Diff Overview which shows similarity, with each row colored based on its similarity.
The definition of similarity is a bit tricky, so it would be good to have multiple algorithms for it.
I remember BinDiff had good diff views you could refer to.
BTW, this is a great plugin, I was looking for this for a long time. Thanks :)
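As one possible starting point, the comparison could be done with plain set operations, using Jaccard similarity as one candidate metric among the several algorithms suggested. The function below is a hypothetical sketch and treats each coverage set as a collection of block addresses.

```python
def diff_coverage(blocks_a, blocks_b):
    """Compare two collections of block addresses."""
    a, b = set(blocks_a), set(blocks_b)
    union = a | b
    return {
        "only_a": a - b,   # would be painted yellow
        "only_b": b - a,   # would be painted yellow
        "both": a & b,     # would be painted blue
        # Jaccard similarity: shared blocks over all blocks seen.
        "similarity": len(a & b) / float(len(union)) if union else 1.0,
    }
```

Computed per function, the similarity value could feed the proposed column in a Diff Overview.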