plast-lab / cclyzer

A tool for analyzing LLVM bitcode using Datalog.

License: MIT License


cclyzer's Introduction

CCLYZER

A tool for analyzing LLVM bitcode (generated from either C or C++) using Datalog.

This project uses a commercial Datalog engine, developed by LogicBlox Inc.

System requirements

  • A 64-bit flavor of Linux. To verify, run uname -m, which should print x86_64.
  • At least 4GB of available memory.
  • Python 2.7 or newer (but not Python 3.x). Available from the Python Website
  • Java Development Kit (JDK) version 6 or newer. Available from Oracle's Java website
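
As a rough illustration, the first and third requirements can be checked from Python itself. This is a minimal sketch, not part of cclyzer: the function name check_requirements is hypothetical, and the memory and JDK requirements are not covered.

```python
import platform
import sys


def check_requirements(machine, py_version):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper (not part of cclyzer). It only covers the
    architecture and Python version requirements listed above.
    """
    problems = []
    if machine != "x86_64":
        problems.append("expected x86_64, found %s" % machine)
    # The requirement is Python 2.7 or newer, but not Python 3.x.
    if tuple(py_version[:2]) < (2, 7) or py_version[0] >= 3:
        problems.append("expected Python 2.7.x, found %d.%d" % tuple(py_version[:2]))
    return problems


if __name__ == "__main__":
    # platform.machine() reports the same value as `uname -m`.
    for problem in check_requirements(platform.machine(), sys.version_info):
        print("requirement not met: " + problem)
```

The function takes its inputs as parameters so that it can be exercised with made-up values as well as with the real platform.machine() and sys.version_info.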

Pre-installation steps

Install the LogicBlox engine

The LogicBlox engine needs to be installed. We recommend the PA-Datalog engine, which is a modified LogicBlox v3 engine, intended for use in program analysis projects.

Alternatively, you can download a full-fledged LogicBlox engine (version 3.*) from the LogicBlox Download Page. You will need to request an academic license (http://www.logicblox.com/learn/academic-license-request-form/).

You must also set the environment variable $LOGICBLOX_HOME and augment your $PATH accordingly. The following additions to either your .bashrc or .bash_profile should suffice, assuming that you have extracted the engine to /opt/lb/. If not, adjust the following lines appropriately:

export LOGICBLOX_HOME=/opt/lb/logicblox-3.10.14/logicblox
export PATH=$LOGICBLOX_HOME/bin:$PATH
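
To confirm that the two variables took effect in a new shell, a small check along the following lines may help. It is only a sketch: logicblox_problems is a hypothetical helper, and it assumes nothing beyond the two exports shown above.

```python
import os


def logicblox_problems(env):
    """Return a list of problems with the LogicBlox environment setup.

    Hypothetical helper (not part of cclyzer or LogicBlox). `env` is a
    mapping such as os.environ.
    """
    problems = []
    home = env.get("LOGICBLOX_HOME")
    if not home:
        problems.append("LOGICBLOX_HOME is not set")
        return problems
    # The exports above prepend $LOGICBLOX_HOME/bin to $PATH.
    bin_dir = os.path.join(home, "bin")
    path_entries = env.get("PATH", "").split(os.pathsep)
    if bin_dir not in path_entries:
        problems.append("%s is not on PATH" % bin_dir)
    return problems


if __name__ == "__main__":
    for problem in logicblox_problems(os.environ):
        print(problem)
```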

Install LLVM

This step is not needed for newer Linux distributions, where you can install LLVM version 3.7 (or later) from the system's package manager.

  • Download the LLVM 3.7.0 pre-built binary from the LLVM Download Page.
  • Untar the downloaded file to a destination path of your choice (e.g., /opt/llvm/) and modify permissions accordingly.
  • Add /path/to/llvm-3.7.0/bin to your $PATH (by modifying your .bashrc or .bash_profile).

Additional Libraries

You will also have to install the following packages:

Fedora 20, 21, 22

# yum install boost-devel boost-python protobuf-devel python-pip python-devel

Fedora 24

# dnf install boost-devel boost-python protobuf-devel python-pip python-devel
# dnf install llvm-devel clang-devel

Ubuntu

# apt-get install build-essential libboost-dev libboost-filesystem-dev libboost-program-options-dev libboost-python-dev libprotobuf-dev libprotoc-dev protobuf-compiler python-pip python-dev

Ubuntu 15.10

In the latest distro versions that have switched to gcc 5, binary compatibility between clang and gcc is broken (see bug 23529), so the pre-built LLVM binaries will not work there.

Instead, for Ubuntu 15.10, you can:

  1. Skip the pre-built binary download step entirely, but otherwise follow the (Ubuntu) instructions

  2. Additionally install LLVM 3.7 and libedit from the system's package manager by running:

     # apt-get install llvm-3.7 libedit-dev
    
  3. When compiling the project, run make as follows:

     (venv)$ LLVM_CONFIG=llvm-config-3.7 make
     (venv)$ make install
    

YAML Configuration

To be able to easily customize your analysis via a configuration file, you will also need to install the python-yaml package.

The default user configuration will be automagically installed at ~/.config/cclyzer/config.yaml the first time you run the tool. Then, you can tweak this config file, e.g., to change the printed statistics and the loaded logic modules.

Installation

We recommend first creating a virtual environment by running:

$ pip install virtualenv  # if not already installed
$ cd /path/to/cclyzer/
$ virtualenv venv

To activate the virtual environment, run:

$ . venv/bin/activate
(venv)$    # <--- your prompt should change to something like this

Now, while inside the virtualenv, build cclyzer as follows:

(venv)$ make
(venv)$ make install

Then, you should be able to run the main cclyzer script that analyzes LLVM Bitcode. Try:

(venv)$ cclyzer -h
(venv)$ cclyzer analyze -h

Testing

The basic test suite comprises the GNU Core Utilities.

You may run all the tests with:

$ make tests.run

or a particular test, e.g., stty, with:

$ make test-stty

It is also possible to invoke a python interpreter for a more interactive experience:

$ python
>>> from cclyzer import *
>>> config = AnalysisConfig('./tests/coreutils-8.24/sort.bc', output_dir='./build/tests/sort')
>>> analysis = Analysis(config)
>>> analysis.run()
...
>>> print analysis.stats
# instructions        : 25417
# functions           :   438
# app functions       :   317
...
>>>

Troubleshooting

The warnings and errors that may come up during execution are not very informative. Instead, the log file located at $XDG_CACHE_HOME/cclyzer/cclyzer.log (which on most systems defaults to ~/.cache/cclyzer/cclyzer.log), or the system log, can be much more helpful.
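
The lookup described above can be sketched as follows. The helper name cclyzer_log_path is hypothetical; the only behavior assumed is the one stated in this section, namely $XDG_CACHE_HOME with a fallback to ~/.cache.

```python
import os


def cclyzer_log_path(env, home):
    """Resolve the log file location described above.

    Hypothetical helper (not part of cclyzer). `env` is a mapping such
    as os.environ and `home` is the user's home directory.
    """
    # Fall back to ~/.cache when XDG_CACHE_HOME is unset or empty.
    cache_home = env.get("XDG_CACHE_HOME") or os.path.join(home, ".cache")
    return os.path.join(cache_home, "cclyzer", "cclyzer.log")


if __name__ == "__main__":
    print(cclyzer_log_path(os.environ, os.path.expanduser("~")))
```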

cclyzer's People

Contributors

efferifick, gbalats, kferles, yanniss

cclyzer's Issues

Handle super-classes of zero size w.r.t. field name generation

Right now, whenever a class inherits from at least two other classes and one of them has zero size (e.g., a facade class, or the base case of a variadic data structure such as a tuple), the fact-import step fails with a functional dependency violation.

The reason is that these two superclasses would both be mapped to zero bit offset for their corresponding fields in pred::struct_type::field_name.

We must somehow detect such zero-size supertypes and produce the correct mapping either at the front-end (during dwarf debug info parsing), or at the Datalog level by comparing with the known types of the Type entity at bit offset 0.
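
To make the failure mode concrete, here is a small Python model of the violation. It is an illustration only: it assumes the predicate behaves like a function from (struct type, bit offset) to a field name, and the class names Derived, EmptyBase, and Base are made up.

```python
def find_offset_conflicts(fields):
    """Find (type, bit offset) keys that map to more than one field name.

    `fields` is an iterable of (struct_type, bit_offset, field_name)
    tuples. This models a functional dependency check under the
    assumption that (struct_type, bit_offset) determines field_name.
    """
    seen = {}
    conflicts = []
    for struct_type, bit_offset, name in fields:
        key = (struct_type, bit_offset)
        if key in seen and seen[key] != name:
            conflicts.append((key, seen[key], name))
        else:
            seen.setdefault(key, name)
    return conflicts


# Hypothetical facts for `class Derived : EmptyBase, Base { int x; }`,
# where EmptyBase has zero size and so shares bit offset 0 with Base.
facts = [
    ("Derived", 0, "EmptyBase"),
    ("Derived", 0, "Base"),
    ("Derived", 64, "x"),
]
print(find_offset_conflicts(facts))
```

Any fix, whether in the front-end or at the Datalog level, would have to ensure that a check like this finds no conflicts.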

Python module Factgen error

Sir,
I am working on a feature extraction and selection technique to build a scheduler for OpenCL programs. When I run make, I get the error "blox compiler: Command not found", and it also reports that the module fact-gen is missing. Please help me fix it.

Points-to analysis does not detect dereferences in optimized LLVM IR

Hello,

I have been using cclyzer for running points-to analyses on some C programs. I have run into a potential issue. I have been looking at the results in pointer-dereferences.tsv for the following C program:

#include <stdlib.h>
 
 
 int execute(double *b) {
     double k = *b;
     return (int)k;
 }
 
 
 int main(int argc, char *argv[])
 {
     double *t = (double *)NULL;
     execute(t);
     return 0;
 }

When I run cclyzer, it tells me that %t in main and %1 in execute are both pointers to *null*, which is what I expected.

When I apply the LLVM -mem2reg optimization to the C code, I get the following IR code:

; Function Attrs: nounwind uwtable
 define i32 @execute(double* %b) #0 {
   %1 = load double, double* %b, align 8
   %2 = fptosi double %1 to i32
   ret i32 %2
 }
 
 ; Function Attrs: nounwind uwtable
 define i32 @main(i32 %argc, i8** %argv) #0 {
   %1 = call i32 @execute(double* null)
   ret i32 0
 }

In this code snippet, %1 is a pointer dereference to null. However, pointer-dereferences.tsv does not contain any dereferences after analyzing this code with cclyzer. Is it possible to expand the points-to analysis to account for loading from a pointer that does not have an associated alloca instruction (i.e. using mem2reg to promote memory operations to register operations)?

Thanks,

Leo
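
For reference, loads of the kind described in this report can be located with a rough text scan of the IR. This is only a sketch: it is regex-based rather than a real LLVM parser, it assumes the typed load syntax shown above (LLVM 3.7 and later), and it treats any pointer that is not directly the result of an alloca in the same function as "not from alloca", including pointers derived through getelementptr.

```python
import re

DEFINE_RE = re.compile(r'^\s*define\b.*?@([\w.]+)\s*\(')
ALLOCA_RE = re.compile(r'^\s*(%[\w.]+)\s*=\s*alloca\b')
# Captures the result register and the pointer operand of a load.
LOAD_RE = re.compile(
    r'^\s*(%[\w.]+)\s*=\s*load\s+[^,]+,\s*(?:.*?\s)?([%@][\w.]+)\s*(?:,|$)')


def loads_not_from_alloca(ir_text):
    """Return (function, result, pointer) for loads whose pointer operand
    is not directly defined by an alloca in the same function."""
    results = []
    function = None
    allocas = set()
    for line in ir_text.splitlines():
        m = DEFINE_RE.match(line)
        if m:
            # Register names are function-local, so reset per function.
            function, allocas = m.group(1), set()
            continue
        m = ALLOCA_RE.match(line)
        if m:
            allocas.add(m.group(1))
            continue
        m = LOAD_RE.match(line)
        if m and m.group(2) not in allocas:
            results.append((function, m.group(1), m.group(2)))
    return results


ir = """
define i32 @execute(double* %b) #0 {
  %1 = load double, double* %b, align 8
  %2 = fptosi double %1 to i32
  ret i32 %2
}

define i32 @main(i32 %argc, i8** %argv) #0 {
  %1 = call i32 @execute(double* null)
  ret i32 0
}
"""
print(loads_not_from_alloca(ir))
```

On the mem2reg output above, the scan reports the load through %b in execute, which is the dereference missing from pointer-dereferences.tsv.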

Error when analyzing debugging information

I am trying to analyze the following LLVM IR code:

 ; ModuleID = 'struct2.ll'
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"
 
 %struct.foo = type { i32* }
 
 ; Function Attrs: nounwind uwtable
 define i32 @main(i32 %argc, i8** %argv) #0 !dbg !4 {
   %x = alloca %struct.foo, align 8
   call void @llvm.dbg.value(metadata i32 %argc, i64 0, metadata !14, metadata !15), !dbg !16
   call void @llvm.dbg.value(metadata i8** %argv, i64 0, metadata !17, metadata !15), !dbg !18
   call void @llvm.dbg.declare(metadata %struct.foo* %x, metadata !19, metadata !15), !dbg !24
   %1 = getelementptr inbounds %struct.foo, %struct.foo* %x, i32 0, i32 0, !dbg !25
   store i32* null, i32** %1, align 8, !dbg !26
   call void @llvm.dbg.declare(metadata !2, metadata !27, metadata !15), !dbg !28
   call void @llvm.dbg.value(metadata i32 7, i64 0, metadata !29, metadata !15), !dbg !30
   %2 = getelementptr inbounds %struct.foo, %struct.foo* %x, i32 0, i32 0, !dbg !31
   %3 = load i32*, i32** %2, align 8, !dbg !31
   call void @llvm.dbg.value(metadata i32* %3, i64 0, metadata !32, metadata !15), !dbg !33
   %4 = load i32, i32* %3, align 4, !dbg !34
   call void @llvm.dbg.value(metadata i32 %4, i64 0, metadata !35, metadata !15), !dbg !36
   call void @llvm.dbg.value(metadata !2, i64 0, metadata !32, metadata !15), !dbg !33
   ret i32 7, !dbg !37
 }
 
 ; Function Attrs: nounwind readnone
 declare void @llvm.dbg.declare(metadata, metadata, metadata) #1
 
 ; Function Attrs: nounwind readnone
 declare void @llvm.dbg.value(metadata, i64, metadata, metadata) #1

 attributes #0 = { nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
 attributes #1 = { nounwind readnone }
 
 !llvm.dbg.cu = !{!0}
 !llvm.module.flags = !{!11, !12}
 !llvm.ident = !{!13}
 
 !0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 3.8.1 (tags/RELEASE_381/final)", isOptimized: false, runtimeVersion: 0, emissionKind: 1, enums: !2, subprograms: !3)
 !1 = !DIFile(filename: "null_deref_struct2.c", directory: "~/Documents/vivas/code/null_deref/synthesize/structs")
 !2 = !{}
 !3 = !{!4}
 !4 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 7, type: !5, isLocal: false, isDefinition: true, scopeLine: 7, flags: DIFlagPrototyped, isOptimized: false, variables: !2)
 !5 = !DISubroutineType(types: !6)
 !6 = !{!7, !7, !8}
 !7 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
 !8 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !9, size: 64, align: 64)
 !9 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !10, size: 64, align: 64)
 !10 = !DIBasicType(name: "char", size: 8, align: 8, encoding: DW_ATE_signed_char)
 !11 = !{i32 2, !"Dwarf Version", i32 4}
 !12 = !{i32 2, !"Debug Info Version", i32 3}
 !13 = !{!"clang version 3.8.1 (tags/RELEASE_381/final)"}
 !14 = !DILocalVariable(name: "argc", arg: 1, scope: !4, file: !1, line: 7, type: !7)
 !15 = !DIExpression()
 !16 = !DILocation(line: 7, column: 14, scope: !4)
 !17 = !DILocalVariable(name: "argv", arg: 2, scope: !4, file: !1, line: 7, type: !8)
 !18 = !DILocation(line: 7, column: 26, scope: !4)
 !19 = !DILocalVariable(name: "x", scope: !4, file: !1, line: 9, type: !20)
 !20 = !DICompositeType(tag: DW_TAG_structure_type, name: "foo", file: !1, line: 3, size: 64, align: 64, elements: !21)
 !21 = !{!22}
 !22 = !DIDerivedType(tag: DW_TAG_member, name: "bar", scope: !20, file: !1, line: 4, baseType: !23, size: 64, align: 64)
 !23 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !7, size: 64, align: 64)
 !24 = !DILocation(line: 9, column: 16, scope: !4)
 !25 = !DILocation(line: 10, column: 7, scope: !4)
 !26 = !DILocation(line: 10, column: 11, scope: !4)
 !27 = !DILocalVariable(name: "w", scope: !4, file: !1, line: 13, type: !7)
 !28 = !DILocation(line: 13, column: 11, scope: !4)
 !29 = !DILocalVariable(name: "v", scope: !4, file: !1, line: 13, type: !7)
 !30 = !DILocation(line: 13, column: 9, scope: !4)
 !31 = !DILocation(line: 16, column: 11, scope: !4)
 !32 = !DILocalVariable(name: "y", scope: !4, file: !1, line: 12, type: !23)
 !33 = !DILocation(line: 12, column: 10, scope: !4)
 !34 = !DILocation(line: 18, column: 9, scope: !4)
 !35 = !DILocalVariable(name: "z", scope: !4, file: !1, line: 13, type: !7)
 !36 = !DILocation(line: 13, column: 13, scope: !4)
 !37 = !DILocation(line: 22, column: 5, scope: !4)

When I run cclyzer, I get the following error message:

~/clang+llvm-3.8.1/include/llvm/Support/Casting.h:95: static bool llvm::isa_impl_cl<To, const From*>::doit(const From*) [with To = llvm::UndefValue; From = llvm::Value]: Assertion `Val && "isa<> used on a null pointer"' failed.
Aborted (core dumped)

When I run cclyzer on the code without including any debugging information, there are no errors.

Slowdown due to Value::printAsOperand() method

This was first posted on the LLVM Dev mailing list by @kferles. Yet it remains unresolved.

The tool makes use of the Value::printAsOperand() method to print operands from several LLVM bitcode instructions to the CSV file. But this approach doesn't scale and the problem seems to be the Value::printAsOperand() method, based on some profiling.

The problem is the slow path of this method, which constructs a TypePrinting object from scratch, every time this path is triggered. It seems that, each time this slow path is taken, it invokes methods (e.g., TypeFinder::run()) that perform many module-wide calculations that are redundant, except for the first time they are performed. This whole process accounts for most of the execution time.

We should find a faster way to perform the same task without relying on any internal API, since we want to keep our tool as an LLVM client.
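
The cost pattern, and the general shape of a fix, can be modeled in a few lines of Python. This is not LLVM code and does not use any LLVM API: OperandPrinter, build_context, and the dictionary standing in for the module-wide type information are all hypothetical. The point is only that the expensive module-wide computation runs once and is reused across calls.

```python
class OperandPrinter(object):
    """Builds an expensive module-wide context lazily, then reuses it.

    Hypothetical model of the slowdown described above; the context
    stands in for the per-module state that the slow path rebuilds on
    every call.
    """

    def __init__(self, module, build_context):
        self._module = module
        self._build_context = build_context
        self._context = None

    def print_operand(self, value):
        if self._context is None:
            # Module-wide work happens only on the first call.
            self._context = self._build_context(self._module)
        return self._context.get(value, "<unknown>")


build_calls = []


def build_context(module):
    # Stand-in for an expensive scan over the whole module.
    build_calls.append(module)
    return {"v1": "%1", "v2": "%2"}


printer = OperandPrinter("module", build_context)
names = [printer.print_operand(v) for v in ("v1", "v2", "v1")]
print(names, len(build_calls))
```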

Understanding `template_type`

I'm trying to debug some unexpected behavior (actually using the Souffle port, but the question applies to both) and am confused about what the template_type rule is doing here:

template_type(Type) <-

It doesn't seem to be referring to templates in the C++ sense I'm thinking, as LLVM does not append .base to the declarations for template classes. Instead, that suffix seems to be related to padding as it pertains to inheritance.

I'm missing some class hierarchy data because primary_superclass relies on _typeinfo_class_type, and that rule says that both template_type and template_typeinfo must hold (or both not hold). However, those two rules seem to be expressing quite different, unrelated things. One is looking for < > symbols, indicating a template, while the other is looking at padding information.

I'm probably just misunderstanding. Could anyone please provide some intuition for what's going on with this portion of the logic code?

boost/make_unique.hpp not found

I'm getting an error when I'm trying to make cclyzer. I have installed the boost libraries, but I'm still receiving the following error when I run make:

src/DebugInfoProcessor.cpp:1:33: fatal error: boost/make_unique.hpp: No such file or directory
 #include <boost/make_unique.hpp>

Any suggestions?

Thanks.
