mchalupa / dg
[LLVM Static Slicer] Various program analyses, construction of dependence graphs and program slicing of LLVM bitcode.
License: MIT License
We currently leak BBlocks and the nodes that are created as extra operands.
We look only at store and load instructions, but it can also appear in a call instruction (even via a constant expression).
In post-dominators and post-dominance frontiers
The BBs in DGParams have no key set, but we cannot set the keys to nullptr, because a key does not have to be a pointer.
At the moment we have control dependencies only on the first block's node and on the arguments. If we slice all of these away but leave some other instruction there, the function will go away.
A class that takes a DG and runs DFS on it, instead of hardcoding the DFS right into the code. Create some generic class and then derive DFS and BFS from it.
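A minimal sketch of that idea, assuming a hypothetical Node type with a successors list (these are illustrative names, not the project's classes). The generic Walk owns the work list and the visited set; DFS and BFS differ only in which end of the work list they take the next node from. Nodes are marked when they are taken, not when they are queued, so the LIFO variant yields a genuine depth-first order.

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <vector>

// Hypothetical stand-in for a graph node; not the project's Node class.
struct Node {
    int id;
    std::vector<Node *> successors;
};

// Generic walk: a subclass only decides which end of the work list
// the next node is taken from.
class Walk {
public:
    virtual ~Walk() = default;

    // Visits every node reachable from start, returns ids in visit order.
    std::vector<int> run(Node *start) {
        std::vector<int> order;
        std::set<Node *> seen;
        std::deque<Node *> worklist{start};
        while (!worklist.empty()) {
            Node *cur = take(worklist);
            // Mark on take: a node may be queued several times,
            // but it is processed only once.
            if (!seen.insert(cur).second)
                continue;
            order.push_back(cur->id);
            for (Node *succ : cur->successors)
                if (seen.count(succ) == 0)
                    worklist.push_back(succ);
        }
        return order;
    }

protected:
    virtual Node *take(std::deque<Node *> &worklist) = 0;
};

// Depth-first: take the most recently queued node (LIFO).
class DFS : public Walk {
    Node *take(std::deque<Node *> &w) override {
        Node *n = w.back();
        w.pop_back();
        return n;
    }
};

// Breadth-first: take the oldest queued node (FIFO).
class BFS : public Walk {
    Node *take(std::deque<Node *> &w) override {
        Node *n = w.front();
        w.pop_front();
        return n;
    }
};
```

A walk over control or data dependence edges could be added the same way, as another subclass or by parameterizing which edges count as successors, without changing what DFS and BFS mean.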
when we have
const char *array[] = { "first", "second" }
then we should have this points-to info:
node pts-to array
mem[0] pts-to "first"
mem[8] pts-to "second"
but we have only node pts-to array
in addOutParamsEdges on line 295, use getObjectRange instead of manually iterating over the definitions
#include <assert.h>

int a;

void foo(void)
{
    assert(a == 1);
}

int main(void)
{
    a = 1;
    foo();
    return 0;
}
We use BBlock everywhere, so use it here too
There's a bug in computing post-dominance frontiers. I don't know where: the code is from LLVM (check whether the output is the same as LLVM's), and the slicing ignores the frontiers in some cases...
Just a metabug for taking notes
this is initialization of structure with function pointers
%struct.callbacks = type { i32 (i32)*, void (i32*)* }
@main.C = private unnamed_addr constant %struct.callbacks { i32 (i32)* @inc, void (i32*)* @zero }, align 8
%C = alloca %struct.callbacks, align 8
%0 = bitcast %struct.callbacks* %C to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* bitcast (%struct.callbacks* @main.C to i8*), i64 16, i32 8, i1 false)
and there is also zeroing done with these functions, which should be treated as setting the pointer to null
When we call a function via an unknown pointer, we slice it away. We must assume that any function with the same prototype can be called.
Use <> and set correct include paths instead of "" with relative paths
We create a new root even when the root is an existing block.
Create a memory object for argv and, inside it, a memory object for the "strings" (probably one with UNKNOWN_OFFSET).
Now we have only a generic template and we specialize it, so we cannot use the full power of inheritance. Add a generic abstract interface, so that we can do something like:
DependenceGraph *dg = new llvmdg::DependenceGraph();
Node *node = new llvmdg::Node(value);
dg->addNode(value);
dg->slice(...)
NOTE: this depends on adding our edges iterator
If we slice away the entry BB, the graph still has the dangling reference set.
There are no guards around parts of code that use CFG (ENABLE_CFG)
For infinite loops (ones that can be identified at compile time, e.g. while(1) {...}) there is no post-dominator tree, because the loop has no exit. Therefore there are no control dependencies and the slicer incorrectly removes the loop even when it should be in the slice.
Fix it (probably) by implementing algorithm from [1]
[1] Danicic, S., Barraclough, R. W., et al. A unifying theory of control dependence and its application to arbitrary program structures.
What if we have, for example, an int and cast it to char to access particular bytes?
BBlockDFS is OK, but DFS inherits from NodesWalk and NodesWalk uses CD and DD for walking graph, so DFS is not DFS...
Any! Something easy is enough.
from the node of interest, like assert or similar
so that we can slice more precisely
critical
For a node it is addParameters and for a graph it is setParameters. I'm for setParameters in both.
Loads of pointers that were not initialized do not have any points-to set.
The filter accepts as an id anything that starts with a number (probably atoi?). Use str_to_uint instead.
(wldbg) f d 1e
Didn't find filter with id '1'
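The session above is consistent with atoi-style parsing, which stops at the first non-digit and so reads "1e" as 1. A hedged sketch of a strict parser follows; the name str_to_uint comes from the note, but its signature and behavior here are assumptions, not the project's actual helper.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Assumed signature: returns true and stores the value only when the
// whole string is a non-negative decimal number that fits in unsigned.
bool str_to_uint(const char *str, unsigned &out) {
    // Reject null, empty strings, signs and leading whitespace up front;
    // strtoul alone would accept "-3" and " 7".
    if (!str || *str < '0' || *str > '9')
        return false;

    char *end = nullptr;
    errno = 0;
    unsigned long value = std::strtoul(str, &end, 10);

    // Trailing garbage ("1e"), overflow, or a value too big for unsigned.
    if (errno == ERANGE || *end != '\0' || value > UINT_MAX)
        return false;

    out = static_cast<unsigned>(value);
    return true;
}
```

With a check like this, "f d 1e" could be rejected as an invalid id instead of being looked up as filter 1.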
Assertion `Offset.getBitWidth() == DL.getPointerSizeInBits(getPointerAddressSpace()) && "The offset must have exactly as many bits as our pointer."' failed.
we need to keep it correct, but we can make it more precise. Take a look at test4, it is correct, but can be sliced much more
According to dfsorder numbers printed by DG2Dot it looks like BFS or some random order, not DFS
Tests for:
-- Future --
If we run DataFlowAnalysis and, inside runOnBlock/Node, for some reason run e.g. a DFS, it does not work as expected. This is because both classes are derived from NodesWalk, and NodesWalk has only one global walk_run_counter.
We could fix it by replacing walk_run_counter by the address of the analysis (it is a number too), but two analyses can have the same address (new -> delete -> new)...
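One possible alternative, sketched under the assumption that the extra memory is acceptable: let each walk own its visited set instead of stamping nodes with a shared counter. Node and Walk below are hypothetical stand-ins, not the project's classes.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical stand-in for a graph node.
struct Node {
    int id;
};

// Each walk keeps its own record of visited nodes, so a walk started
// from inside another walk cannot disturb the outer one.
class Walk {
    std::unordered_set<const Node *> visited;

public:
    // Returns true the first time this particular walk sees the node.
    bool markVisited(const Node *n) { return visited.insert(n).second; }

    bool wasVisited(const Node *n) const { return visited.count(n) != 0; }
};
```

This trades the cheap per-node counter comparison for a hash lookup per visit, but it needs no global state and is unaffected by two analyses being allocated at the same address.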
Make use of namespaces when we have them! This is not C.
we must assume they have modified the memory, otherwise it is unsound. Also I'm not sure we are adding def-use edges properly, because the callinst can be in constant expr (bitcast) and in that case we won't add the def-use edges at all
Make it a generic container (DGContainer?) and move it into the ADT subdirectory.
If that is everywhere, the code is not more readable...
we always know how many of them there will be
Instead of pointer-> nodes set
use records of the form
memory [from : to] -> nodes set
meaning that the memory in the offset range [from, to] was defined at these nodes. It will simplify things a lot.
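The record form above could look roughly like this. It is a sketch only: DefSite, Definitions, and the use of int node ids are illustrative assumptions, and the linear scan stands in for whatever indexed structure a real implementation would use.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// One record: memory [from : to] -> nodes set (names are hypothetical).
struct DefSite {
    const void *memory;   // identifies the memory object
    uint64_t from, to;    // inclusive byte range inside the object
    std::set<int> nodes;  // ids of the nodes that define this range
};

class Definitions {
    std::vector<DefSite> records;

public:
    // Record that node defines bytes [from, to] of mem.
    void add(const void *mem, uint64_t from, uint64_t to, int node) {
        for (DefSite &r : records) {
            if (r.memory == mem && r.from == from && r.to == to) {
                r.nodes.insert(node);
                return;
            }
        }
        records.push_back({mem, from, to, {node}});
    }

    // All nodes that may define at least one byte of [from, to] in mem.
    std::set<int> lookup(const void *mem, uint64_t from, uint64_t to) const {
        std::set<int> result;
        for (const DefSite &r : records) {
            // Two inclusive ranges overlap when each one starts
            // no later than the other one ends.
            if (r.memory == mem && r.from <= to && from <= r.to)
                result.insert(r.nodes.begin(), r.nodes.end());
        }
        return result;
    }
};
```

A read of bytes 2 to 5 then finds the definitions of both [0, 3] and [4, 7] in a single overlap query, without tracking individual pointers.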
Let the analysis use a pointer state subgraph that will be built by the backend in some independent, generic way, so that the points-to analysis won't need to know the implementation of nodes.
Add an iterator that is independent of the container used for edges.
Most pointers returned by malloc have a cropped offset, because the type of the returned memory is i8*, which has size 1, and handleGep then cuts off the offsets.
It's glued together and buggy... rethink it and refactor
When a pointer points to some unknown location or has an unknown offset, does it work?
fails on this program
#include <assert.h>
#include <stddef.h>

char *
remove_newline(char *str)
{
    char *p = str;
    if (!p)
        return NULL;
    while (*p) {
        if (*p == '\n') {
            *p = 0;
            break;
        }
        ++p;
    }
    return str;
}

int main(void)
{
    char str[] = "This is a string\n";
    remove_newline(str);
    assert(str[sizeof str - 1] == 0);
    return 0;
}
If a GEP has a constant negative offset (e.g. -8), the offset is converted to an unsigned value and cropped, because the value is almost UINT64_MAX. We get UNKNOWN_OFFSET, so the result should be correct, but it is imprecise.
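The effect can be reproduced in isolation. The sketch below is not the project's handleGep: it assumes UNKNOWN_OFFSET is the all-ones 64-bit value and uses two hypothetical helpers, cropUnsigned to mimic the behavior described and applySigned to show one way a negative constant could be applied to a known base offset instead.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the "unknown" sentinel is the largest 64-bit value.
constexpr uint64_t UNKNOWN_OFFSET = ~static_cast<uint64_t>(0);

// Mimics the described behavior: the signed offset is reinterpreted as
// unsigned, so -8 becomes UINT64_MAX - 7 and is cropped to unknown.
uint64_t cropUnsigned(int64_t offset, uint64_t maxOffset) {
    uint64_t u = static_cast<uint64_t>(offset);
    return u > maxOffset ? UNKNOWN_OFFSET : u;
}

// Possible refinement: keep the offset signed and apply it to a known
// base, falling back to unknown only when the result leaves the range.
uint64_t applySigned(uint64_t base, int64_t offset, uint64_t maxOffset) {
    if (base == UNKNOWN_OFFSET)
        return UNKNOWN_OFFSET;
    if (offset < 0) {
        // Magnitude computed without overflowing on INT64_MIN.
        uint64_t magnitude = static_cast<uint64_t>(-(offset + 1)) + 1;
        if (magnitude > base)
            return UNKNOWN_OFFSET;  // would point before the object
        return base - magnitude;
    }
    uint64_t sum = base + static_cast<uint64_t>(offset);
    if (sum < base || sum > maxOffset)
        return UNKNOWN_OFFSET;      // wrapped around or past the limit
    return sum;
}
```

For a base offset of 16 and a GEP constant of -8, applySigned yields 8, where the unsigned conversion alone gives up and reports an unknown offset.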