yotann / bcdb
A database and infrastructure for distributed processing of LLVM bitcode.
License: Other
When invalidating a function with >10,000 cached results, bcdb invalidate is extremely slow (many minutes, CPU-bound). It's much faster to just run sqlite3 /path/to/bcdb 'DELETE FROM call WHERE fid = -1;', even though that should be equivalent. Probably the BCDB's connection setting pragmas (maybe the write-ahead log?) are making it slow.
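The manual workaround can also be scripted. The following is a minimal sketch using Python's standard sqlite3 module against an in-memory stand-in database. Only the table name call and the column fid come from the command above; the result column, the sample rows, and the helper name delete_cached_calls are assumptions for illustration. It does not reproduce the BCDB's connection pragmas, so it says nothing about why bcdb invalidate is slow.

```python
import sqlite3


def delete_cached_calls(conn: sqlite3.Connection, fid: int) -> int:
    """Delete every cached call row for one function id; return rows removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM call WHERE fid = ?", (fid,))
    return cur.rowcount


# Stand-in schema: only `call` and `fid` are taken from the issue text;
# the `result` column is a hypothetical placeholder.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call (fid INTEGER, result TEXT)")
conn.executemany(
    "INSERT INTO call (fid, result) VALUES (?, ?)",
    [(-1, f"r{i}") for i in range(10_000)] + [(7, "keep")],
)
conn.commit()

removed = delete_cached_calls(conn, -1)
remaining = conn.execute("SELECT COUNT(*) FROM call").fetchone()[0]
```

Timing this delete on a copy of a real database, with and without the pragmas the BCDB sets, would be one way to test the write-ahead-log guess.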
There are various cases in the outlining code that I haven't fully thought through, and some of them are probably handled incorrectly. If we try to perform actual outlining, this will lead to incorrect code being generated.
lib/Outlining
constructFunction in llvm/lib/Transforms/Utils/CodeExtractor.cpp
One concern I have, in looking at guided linking, is that it potentially causes a huge increase in the ROP surface of all programs linked using it. If it were possible to specify portions of the optimized set which, in the optimized output, must not have any (transitive) dependency between them, this could significantly alleviate the issue.
To illustrate, let's start with the simple optimized set given in Fig. 1 of the paper: program1 IR needs library IR; program2 IR needs library IR. If we consider the case where program1 is some normal program expected to be run by unprivileged users, and program2 is a tiny helper program that is SetUID in order to obtain specific resources it then hands off to program1, but both use the same set of libraries, then guided linking may significantly reduce the overall security of the system.
If, however, it were possible to state that no dependency relationship may be created from code in program2 IR to code in program1 IR (with being in the same merged library counting as a dependency relationship in both directions), this problem could be avoided.
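To make the proposed constraint concrete, here is a rough sketch of how such a check could be expressed as graph reachability. It is a toy model, not anything guided linking currently implements: the function name violates_isolation, the deps and placement dictionaries, and the output module names are all hypothetical. Units that share an output module are treated as depending on each other in both directions, as suggested above.

```python
from collections import defaultdict, deque


def violates_isolation(deps, placement, source, target):
    """Return True if `target` is (transitively) reachable from `source`.

    deps: dict mapping a unit to the set of units it depends on (directed).
    placement: dict mapping a unit to the output module that contains it.
    """
    graph = defaultdict(set)
    for unit, needed in deps.items():
        graph[unit].update(needed)

    # Co-location in one output module counts as a dependency both ways.
    by_module = defaultdict(set)
    for unit, module in placement.items():
        by_module[module].add(unit)
    for units in by_module.values():
        for unit in units:
            graph[unit].update(units - {unit})

    # Breadth-first search from `source`.
    seen, queue = {source}, deque([source])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


deps = {"program1 IR": {"library IR"}, "program2 IR": {"library IR"}}

# Hypothetical layout 1: every unit stays in its own output module.
separate = {"program1 IR": "p1", "program2 IR": "p2", "library IR": "lib"}
# Hypothetical layout 2: program1 code ends up in the merged library.
merged = {"program1 IR": "merged", "program2 IR": "p2", "library IR": "merged"}

separate_violation = violates_isolation(deps, separate, "program2 IR", "program1 IR")
merged_violation = violates_isolation(deps, merged, "program2 IR", "program1 IR")
```

In this model the second layout violates the constraint because program2 IR reaches program1 IR through the shared merged module, which is the situation the SetUID example is worried about.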
In the normal case, function @f may have blockaddresses stored in global constant @g, which is only used by function @f. The obvious solution in this case is to put @g in the split module along with @f.
In the general case, blockaddresses for function @f may be used by other functions, but this should be extremely rare and we don't need to handle it well.
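The decision rule described here could look roughly like the sketch below. It is a toy model over plain Python data, not LLVM API code; the function name classify_blockaddress_globals and the shape of globals_info are assumptions, and what to do with the rare fallback cases is deliberately left open.

```python
def classify_blockaddress_globals(globals_info):
    """Separate globals that can travel with their function from the rest.

    globals_info maps a global's name to a pair:
    (function whose blockaddresses it stores, set of functions using the global).
    """
    colocate, needs_fallback = {}, []
    for name, (target_fn, users) in sorted(globals_info.items()):
        if users <= {target_fn}:
            # Normal case: only the function itself uses the global (or nothing
            # does), so it can go in that function's split module.
            colocate[name] = target_fn
        else:
            # General case: other functions use the blockaddresses too.
            needs_fallback.append(name)
    return colocate, needs_fallback


# Hypothetical input: @g is used only by @f; @h is also used by @other.
info = {"@g": ("@f", {"@f"}), "@h": ("@f", {"@f", "@other"})}
colocate, needs_fallback = classify_blockaddress_globals(info)
```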
Currently, splitting and joining causes DICompileUnits to be duplicated so each function gets its own copy of the compile unit. To fix this, we need to use the remainder module to keep track of which compile units are actually the same.
For debugging purposes, we should add checks at run time and raise an error if any of the constraints are violated. How to do this is explained in the paper.
When applied to a module with megabytes of debug metadata, the splitter is extremely slow and uses way too much memory.
Backtrace:
llvm::MDNode::operator new
llvm::DISubprogram::getImpl
llvm::DISubprogram::cloneImpl
llvm::MDNode::clone
MDNodeMapper::mapTopLevelUniquedNode
Mapper::mapMetadata [clone .part.318]
Mapper::mapMetadata
Mapper::remapInstruction
Mapper::remapFunction
llvm::ValueMapper::remapFunction
llvm::RemapFunction
ExtractFunction
bcdb::SplitModule
Currently the guided linker combines the entire merged library into a single module. For large sets of software, optimizing and compiling this module is very slow (e.g., LLVM+Clang takes several hours). We should add ThinLTO support so the merged library can be optimized faster.
Because split functions have no name, the globalopt pass (included in opt -O1) deletes them. If we give all the split functions a standardized name (like f) this won't be a problem. However, any name we choose could potentially conflict with other names used by the program.
Another option: store split functions without a name, but give users the option to add a name when retrieving a function from the BCDB.
lib/outlining/FalseMemorySSA.cpp is based on MemorySSA.cpp from LLVM 12. LLVM 13 has a few improvements to this file, which are probably worth copying over.