galoisinc / crucible

Crucible is a library for symbolic simulation of imperative programs

Introduction

Crucible is a language-agnostic library for performing forward symbolic execution of imperative programs. It provides a collection of data structures and APIs for expressing programs as control-flow graphs (CFGs). Programs expressed as CFGs in this way can be automatically explored by the symbolic execution engine. In addition, new data types and operations can be added to the symbolic simulator by implementing fresh primitives directly in Haskell. Crucible relies on an underlying library called What4, which provides formula representations and connections to a variety of SAT and SMT solvers that can be used to perform verification and find counterexamples to logical conditions computed from program simulation.

Crucible has been designed as a set of Haskell packages organized so that Crucible itself has a minimal number of external dependencies, and functionality independent of crucible can be separated into sub-libraries.

Currently, the repository consists of the following Haskell packages:

crucible provides the core Crucible definitions, including the symbolic simulator and control-flow-graph program representations.
crucible-llvm provides translation and runtime support for executing LLVM assembly programs in the Crucible symbolic simulator.
crucible-jvm provides translation and runtime support for executing JVM bytecode programs in the Crucible symbolic simulator.
crucible-saw provides functionality for generating SAW Core terms from Crucible Control-Flow-Graphs.
crux provides common support libraries for running the crucible simulator in a basic "all-at-once" use mode for simulation and verification. This includes most of the setup steps required to actually set the simulator off and running, as well as functionality for collecting and discharging safety conditions and generated assertions via solvers. Both the crux-llvm and crucible-jvm executables are thin wrappers around the functionality provided by crux.

In addition, there are the following library/executable packages:

crux-llvm, a standalone frontend for executing C and C++ programs in the crucible symbolic simulator. The front-end invokes clang to produce LLVM bitcode, and runs the resulting programs using the crucible-llvm language frontend.
crux-llvm-svcomp, an alternative entrypoint to crux-llvm that uses the protocol established for the SV-COMP competition. See here for more details.

crucible-jvm also contains an executable for directly running compiled JVM bytecode programs, in a similar vein to the crux-llvm package.
crux-mir, a tool for executing Rust programs in the crucible symbolic simulator. This is the backend for the cargo crux-test command provided by mir-json. See the crux-mir README for details.
uc-crux-llvm, another standalone frontend for executing C and C++ programs in the Crucible symbolic simulator, using "under-constrained" symbolic execution. Essentially, this technique can start at any function in a given program with no user intervention and try to find bugs, but may raise false positives and is less useful for full verification than crux-llvm. See the README for details.

Finally, the following packages are intended primarily for use by Crucible developers:

crucible-cli provides a CLI for interacting with the Crucible simulator, via programs written in crucible-syntax.
crucible-llvm-cli provides a CLI for interacting with the Crucible simulator, via programs written in crucible-syntax with the extensions provided by crucible-llvm{,-syntax}.
crucible-syntax provides a native S-Expression based concrete syntax for crucible programs. It is useful for being able to directly interact with the core Crucible simulator without bringing in issues related to the translation of other front-ends (e.g. the LLVM translation). It is primarily intended for the purpose of writing test cases.

The development of major features and additions to crucible is done in separate branches of the repository, all of which are based off master and merge back into it when completed. Minor features and bug fixes are done in the master branch. Naming of feature branches is free-form.

Each library is BSD-licensed (see the LICENSE file in a project directory for details).

Quick start

Clone this repository and check out the immediate submodules to supply the needed dependencies (git submodule update --init).

Crucible can be built with the cabal tool:

cabal update
cabal new-configure
cabal new-build all

Alternatively, you can target a more specific sub-package instead of all.

Testing and Coverage

Testing with coverage tracking is done via cabal test --enable-coverage ... or cabal configure --enable-coverage, although additional workarounds will be needed as noted in #884 and haskell/cabal#6440.

Notes on Freeze Files

We use the cabal.GHC-*.config files to constrain dependency versions in CI. We recommend using the following command for best results before building locally:

ln -s cabal.GHC-<VER>.config cabal.project.freeze

These configuration files were generated using cabal freeze --enable-tests --enable-benchmarks. Note that at present, these configuration files assume a Unix-like operating system, as we do not currently test Windows on CI. If you would like to use these configuration files on Windows, you will need to make some manual changes to remove certain packages and flags:

regex-posix
tasty +unix
unix
unix-compat

Acknowledgements

Crucible is partly based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. N66001-18-C-4011. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Defense Advanced Research Projects Agency (DARPA).

crucible's Issues

what4 does not build with ghc 8.4.3

As of 2b2ac00 what4 fails to build (ghc 8.4.3) with a bunch of type errors in Builder. For example:

src/What4/Expr/Builder.hs:4479:20: error:
    • Could not deduce: ((w1 + 1) GHC.TypeNats.<=? w) ~ 'True
        arising from a use of ‘bvZext’
      from the context: 1 <= w
        bound by the type signature for:
                   integerToBV :: forall (w :: GHC.Types.Nat).
                                  (1 <= w) =>
                                  ExprBuilder t st
                                  -> SymInteger (ExprBuilder t st)
                                  -> NatRepr w
                                  -> IO (SymBV (ExprBuilder t st) w)
        at src/What4/Expr/Builder.hs:4473:3-13
      or from: (BaseIntegerType ~ 'BaseIntegerType, 1 <= w1)
        bound by a pattern with constructor:
                   BVToInteger :: forall (w :: GHC.Types.Nat) (e :: BaseType -> *).
                                  (1 <= w) =>
                                  e (BaseBVType w) -> App e BaseIntegerType,
                 in a pattern binding in
                      pattern guard for
                      an equation for ‘integerToBV’
        at src/What4/Expr/Builder.hs:4477:13-25
      or from: w ~ (w1 + (y + 1))
        bound by a pattern with constructor:
                   NatLT :: forall (y :: GHC.Types.Nat) (m :: GHC.Types.Nat).
                            NatRepr y -> NatComparison m (m + (y + 1)),
                 in a case alternative
        at src/What4/Expr/Builder.hs:4479:9-15
    • In the expression: bvZext sym w r
      In a case alternative: NatLT _ -> bvZext sym w r
      In the expression:
        case compareNat (bvWidth r) w of
          NatLT _ -> bvZext sym w r
          NatEQ -> return r
          NatGT _ -> bvTrunc sym w r
    • Relevant bindings include
        r :: Expr t (BaseBVType w1)
          (bound at src/What4/Expr/Builder.hs:4477:25)
        w :: NatRepr w (bound at src/What4/Expr/Builder.hs:4473:22)
        integerToBV :: ExprBuilder t st
                       -> SymInteger (ExprBuilder t st)
                       -> NatRepr w
                       -> IO (SymBV (ExprBuilder t st) w)
          (bound at src/What4/Expr/Builder.hs:4473:3)
     |
4479 |         NatLT _ -> bvZext sym w r
     |                    ^^^^^^^^^^^^^^

Using NULL as a C function pointer doesn't work

The following C program (compiled to LLVM bitcode) works just fine with Crucible. (I am testing with the crucible-integration branch of saw-script.)

#include <stddef.h>

typedef int func(int i);

int inc (int x) {
	return x + 1;
}

void apply (func *f, int *x) {
	*x = f(*x);
}

void test (int *x) {
	apply(inc, x);
}

However, if we add a test for NULL to the definition of apply

void apply (func *f, int *x) {
	if (f != NULL) { *x = f(*x); }
}

then we get an error from crucible:

at internal: arithmetic comparison on incompatible values Typed {typedType = PtrTo (FunTy (PrimType (Integer 32)) [PrimType (Integer 32)] False), typedValue = ValIdent (Ident "5")} ValNull

Similarly, if we modify function test to pass a NULL argument to apply

void test (int *x) {
	apply(NULL, x);
}

then we get a different error:

at internal: argument type mismatch in call to ValSymbol (Symbol "apply") [Typed {typedType = PtrTo (FunTy (PrimType (Integer 32)) [PrimType (Integer 32)] False), typedValue = ValNull},Typed {typedType = PtrTo (PrimType (Integer 32)), typedValue = ValIdent (Ident "3")}] PtrTo (FunTy (PrimType Void) [PtrTo (FunTy (PrimType (Integer 32)) [PrimType (Integer 32)] False),PtrTo (PrimType (Integer 32))] False)

I also tried modifying src/Lang/Crucible/LLVM/Translation.hs to print out argTypes and argTypes' in the "argument type mismatch" error message. Here they are:

[RecursiveRepr LLVM_pointer, RecursiveRepr LLVM_pointer]
[FunctionHandleRepr [BVRepr 32] (BVRepr 32), RecursiveRepr LLVM_pointer]

Implement remaining crucible expression formers in the concrete syntax

crucible-syntax now has pretty broad support for the core crucible language syntax. However, not all of the syntax formers have yet been exposed via the parser. I need to move on to other things for now, so this issue is here to remind me or someone else to go in and finish these up.

The remaining constructs are:

In addition, the following operations are not exposed, and might not be, because I'm thinking about removing them altogether:

AddSideCondition
BVUndef

Using a symbolic index into an array of function pointers doesn't work

This example is due to @glguy.

typedef int int_function(int);

int succ(int x) { return x+1; }
int pred(int x) { return x-1; }

int mytestfunction(int i, int j) {
    int_function *funs[] = { succ, pred };
    return funs[i](j);
}

Using the crucible-integration branch of saw, we can see that Crucible thinks that mytestfunction always calls succ and returns j+1.

load_crucible_llvm_module "test.bc";
crucible_llvm_verify "mytestfunction" []
  do {
    i <- crucible_fresh_var "i" (llvm_int 32);
    j <- crucible_fresh_var "j" (llvm_int 32);
    r <- crucible_execute_func [crucible_term i, crucible_term j];
    crucible_equal (llvm_int 32) r (crucible_term {{ j + 1 }});
  }
  z3;

(This returns Proof succeeded! @mytestfunction.)

If we make i concrete using saw-script, then the function behaves exactly the same way, always calling succ:

load_crucible_llvm_module "test.bc";
crucible_llvm_verify "mytestfunction" []
  do {
    let i = {{ 1:[32] }};
    j <- crucible_fresh_var "j" (llvm_int 32);
    r <- crucible_execute_func [crucible_term i, crucible_term j];
    crucible_equal (llvm_int 32) r (crucible_term {{ j + 1 }});
  }
  z3;

(This also returns Proof succeeded! @mytestfunction.)

Finally, specifying i = 1 by hard-coding it in another C function that calls mytestfunction can actually get it to look up index 1 and call pred.

int otherfunction(int j) {
    return mytestfunction(1, j);
}

crucible_llvm_verify "otherfunction" []
  do {
    j <- crucible_fresh_var "j" (llvm_int 32);
    r <- crucible_execute_func [crucible_term j];
    crucible_equal (llvm_int 32) r (crucible_term {{ j - 1 }});
  }
  z3;

So it appears that concrete values specified in saw-script are not treated exactly the same as concrete values from C/LLVM.
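
As a reference point, the concrete behavior that the simulator should agree with can be written down in plain C. The following is a minimal sketch, not code from the report: the name dispatch and the bounds assertion are assumptions added for illustration. Since index 1 has to reach pred, a spec claiming r == j + 1 for every i should not be provable.

```c
#include <assert.h>

typedef int int_function(int);

static int succ(int x) { return x + 1; }
static int pred(int x) { return x - 1; }

/* Same shape as mytestfunction above. The assertion is an addition for this
   sketch: it rejects out-of-range indices instead of leaving them undefined. */
int dispatch(int i, int j) {
    int_function *funs[] = { succ, pred };
    assert(i == 0 || i == 1);
    return funs[i](j);
}
```

Calling dispatch(1, 10) yields 9 rather than 11, which is the case that the first two proofs above miss.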

Symbolically execute recursive functions into transition relations

Given a recursive function with easy-to-handle parameters, extract a transition relation which relates the parameters to the recursive call to the original parameters.
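
To make the idea concrete, here is a small hand-written sketch in C. The gcd example and the names gcd_rec, gcd_state, gcd_step and gcd_iter are hypothetical and do not come from Crucible; the point is only that the relation between a call's parameters and those of its recursive call can be stated as a single step function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Recursive original: gcd(a, 0) = a, gcd(a, b) = gcd(b, a % b). */
uint32_t gcd_rec(uint32_t a, uint32_t b) {
    return b == 0 ? a : gcd_rec(b, a % b);
}

/* The parameters of one call. */
typedef struct { uint32_t a, b; } gcd_state;

/* Transition relation written as a step function. Returns true and stores
   the parameters of the recursive call in *next, or returns false when
   cur is the base case and no recursive call is made. */
bool gcd_step(gcd_state cur, gcd_state *next) {
    if (cur.b == 0) return false;
    next->a = cur.b;
    next->b = cur.a % cur.b;
    return true;
}

/* Iterating the relation until no step applies reproduces the recursion. */
uint32_t gcd_iter(uint32_t a, uint32_t b) {
    gcd_state s = { a, b }, n;
    while (gcd_step(s, &n)) s = n;
    return s.a;
}
```

An extracted relation of this kind would let the recursion be reasoned about one step at a time instead of by unrolling calls.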

Consider a more structured approach to conversions between representations

There are several notions of "type" in the crucible-llvm to SAW pipeline:

MemType/SymType
MemModel.Type (usually G.Type)
LLVM.AST.Type (usually L.Type)
TypeRepr (referred to as "Crucible type")

(Is this a complete list?)

Some of these have (partial) conversions between them:

tcType :: L.Type -> TC SymType
tcMemType :: L.Type -> TC (Maybe MemType)
resolveMemType :: SymType -> TC (Maybe MemType)
MemType :: MemType -> SymType
asCrucibleType :: G.Type -> (forall tpr. TypeRepr tpr -> x) -> x

There are also several notions of "value":

LLVMConst
LLVMVal
LLVMExpr
RegValue
CFG.Expr

With some conversions between them (these are somewhat more complex than the type conversions, because some of them require additional runtime information):

unpackMemValue :: ([...]) => sym -> LLVMVal sym -> IO (AnyValue sym) -- RegValue
liftConstant :: ([...]) => LLVMConst -> LLVMGenerator h s arch ret (LLVMExpr s arch)
packMemValue :: ([...]) => sym -> G.Type -> TypeRepr tp -> RegValue sym tp -> IO (LLVMVal sym)

and I'm currently writing LLVMConst -> LLVMVal.

Question: Can any of these "conversions" be systematized?

I found the above list by grepping through source code. Is there a way to codify these conversions? I think this might be helpful in the case of the types at least (where no additional runtime information is needed, and conversions are usually not in monads).

For superset/subset relationships (such as SymType/MemType), an Iso from the Lens package might be a good way to do this, e.g.

Iso' (Maybe SymType) MemType
Iso' (Maybe (forall tpr. TypeRepr tpr)) G.Type

For other overlapping-venn-diagram relationships (are there any of these?) something like semi-iso or partial-isomorphisms might do the trick.

Translation fails when C `extern` arrays are promoted to pointers

If the following C file

extern int arr[2];

int deref(int *x) {
	return x[0];
}

int test() {
	return deref(arr);
}

is compiled to LLVM and run through Crucible (using the crucible-integration branch of saw-script), we get the following error:

at internal: unsupported LLVM value: ValNull of type [2 x i32]

If the extern keyword is removed, then the example works.

LLVM memory model returns wrong values when reading symbolic pointers

The problem occurs when you read from a pointer with a symbolic base (i.e. allocation ID number) in a memory state where there has been some other more recent unrelated write to memory. In this situation (assuming the pointer offsets line up), the read from the symbolic pointer will return whichever value was most recently written to memory.

Here's a minimal example. This function returns a pointer whose offset is 0, but whose allocation number is symbolic (involving an if-then-else on c). It also performs an unrelated write to memory address pz.

int* test(int c, int* px, int* py, int* pz) {
	*pz = 5;
	return (c ? px : py);
}

Now we make a spec that assumes c == 1, and then says that reading from px should give 5, the value that was actually written to pz. (The symbolic pointers necessitate the rewriting workaround from GaloisInc/saw-script#216.)

m <- llvm_load_module "test.bc";
crucible_llvm_verify m "test" [] false
  do {
    c <- crucible_fresh_var "c" (llvm_int 32);
    px <- crucible_alloc (llvm_int 32);
    py <- crucible_alloc (llvm_int 32);
    pz <- crucible_alloc (llvm_int 32);
    crucible_precond {{ c == 1 }};
    crucible_execute_func [crucible_term c, px, py, pz];
    crucible_return px;
    crucible_points_to px (crucible_term {{ 5:[32] }}); // this statement performs a read from px
  }
  do { simplify (addsimps [equalNat_ite] basic_ss); abc; };

The proof shouldn't succeed (px still points to uninitialized memory upon function exit) yet it does.

Merge LLVM.MemModel.Type and LLVM.MemType

These two datatypes are similar enough that one should be an instance of the other:

In Crucible.LLVM.MemModel.Type:

data Field v =
  Field
  { fieldOffset :: Offset
  , _fieldVal   :: v
  , fieldPad    :: Bytes
  }
  deriving (Eq, Ord, Show, Functor, Foldable, Traversable, Typeable)

In Crucible.LLVM.MemType:

data FieldInfo = FieldInfo
  { fiOffset    :: !Offset  -- ^ Byte offset of field relative to start of struct.
  , fiType      :: !MemType -- ^ Type of field.
  , fiPadding   :: !Size    -- ^ Number of bytes of padding at end of field.
  }
  deriving (Eq, Show)

Large global arrays cause unreasonable memory use

Consider the following very simple C program:

#include <stdio.h>

unsigned big[10000004];

int main(void) {
  printf("Hi!");
}

The following command sequence generates a bitcode file.

clang-3.8 -c -O1 -emit-llvm large-init.c

When attempting to load this module via Crucible, an unreasonable amount of memory is consumed during the translation process, driving my machine into swap space. Profiling indicates that the zeroExpand function is somehow involved.

Simplify Crucible assumption management

In adding path satisfiability checking to the SAWCore backend recently, @robdockins and I concluded that the path condition management done by the sbAddAssumption, sbPushBranchPred, sbBacktrackToState, and sbSwitchToState functions in the IsSimpleBuilderState class might be more complex or fragile than necessary. Perhaps it should be refactored.

Support inbounds GEP in LLVM constant expressions

The LLVM front end currently doesn't implement a translation for constant expressions involving GEP operations with the inrange flag.

https://github.com/GaloisInc/crucible/blob/master/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Constant.hs#L894

LLVM memory model can't convert between similarly-sized array types

Here's an example C program that takes a pointer to an array of word8s and writes word32s into it:

#include <stdint.h>

void foo(uint32_t *a) {
	a[0] = 0x01234567;
	a[1] = 0x89abcdef;
}

void bar(uint8_t *a) {
	foo((uint32_t *)a);
}

The LLVM bitcode translates into Crucible just fine; the problem happens when we try to verify a spec for bar using an override for foo. Here is the saw-script:

load_crucible_llvm_module "castptr.bc";

foo_ov <- crucible_llvm_verify "foo" [] false
  do {
    p <- crucible_alloc (llvm_array 2 (llvm_int 32));
    crucible_execute_func [p];
    crucible_points_to p (crucible_term {{ [0x01234567,0x89abcdef] }});
  }
  z3;

bar_ov <- crucible_llvm_verify "bar" [foo_ov] false
  do {
    p <- crucible_alloc (llvm_array 8 (llvm_int 8));
    crucible_execute_func [p];
    crucible_points_to p (crucible_term {{ [0x67,0x45,0x23,0x01,0xef,0xcd,0xab,0x89] }});
  }
  z3;

The override for foo writes the value [0x01234567,0x89abcdef] (of type [2 x i32]) into the pointer a. Checking the postcondition in the spec for bar then involves reading from a at type [8 x i8] (which has the same size). However, this fails:

Loading module Cryptol
Loading file "/Users/huffman/Documents/saw/castptr.saw"
Proof succeeded! @foo
Registering override for `foo`
Executing override for `foo`
saw: user error ("crucible_llvm_verify" (/Users/huffman/Documents/saw/castptr.saw:13:11):
Invalid memory load: address (5, 0x0:[64]) at type [8 x i8])
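
The byte sequence expected in the spec for bar follows from little-endian layout. The sketch below spells that layout out explicitly so that it does not depend on the host's byte order; store_le32_array is a made-up helper name, not part of Crucible or SAW.

```c
#include <stddef.h>
#include <stdint.h>

/* Store each 32-bit word least-significant byte first, as a little-endian
   target would lay it out. dst must have room for 4 * n bytes. */
void store_le32_array(uint8_t *dst, const uint32_t *src, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t b = 0; b < 4; ++b) {
            /* Byte b of word i holds bits 8*b .. 8*b+7 of that word. */
            dst[4 * i + b] = (uint8_t)(src[i] >> (8 * b));
        }
    }
}
```

Applied to the two words written by foo, this produces 0x67, 0x45, 0x23, 0x01, 0xef, 0xcd, 0xab, 0x89, the same eight bytes that the spec for bar asks for, so a load at type [8 x i8] has a well-defined answer.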

Allow loading LLVM modules when some symbols can't be translated

At the moment, if an LLVM bitcode file contains anything that can't be translated into Crucible, loading the file fails. It would be useful to allow function definitions that can't be translated to be treated as function declarations, instead. Even nicer would be to have translation happen lazily, so that functions could be translated on demand, right before being called.

Translation fails for functions that write to globals

Compiling the following C file

static int count = 0;

void test() {
	count = 1;
}

to LLVM and running crucible (using crucible-integration branch of saw-script) yields a translation error:

at internal: Pointer type does not mach value type in store instruction

LLVM and pointer arithmetic

Could the LLVM memory model be updated to allow pointer arithmetic?

Here's an example C program that computes and returns the length of a string, but produces an error within the crucible-llvm library.

#include <stdio.h>
#include <stdint.h>

uint64_t mystrln(uint8_t *str) {
  uint8_t *s;
  for (s = str; *s != 0; ++s)
    ;
  return (s - str);
}

int main() {
  uint8_t foo[7] = {'f','o','o','b','a','r',0};
  printf("length of '%s' is %llu\n", foo, mystrln(foo));
  return 0;
}

The LLVM translation reports an error on line 8, indicating there is a

type mismatch when assigning register r11 , BVRepr 64 , RecursiveRepr LLVM_pointer") % 8:11

The LLVM output associated with line 8 of the above C program converts the pointers s and str to integers before doing the subtraction.

; <label>:5                                       ; preds = %1
  %s.0.lcssa = phi i8* [ %s.0, %1 ]
  %6 = ptrtoint i8* %s.0.lcssa to i64, !dbg !43
  %7 = ptrtoint i8* %str to i64, !dbg !43
  %8 = sub i64 %6, %7, !dbg !43
  ret i64 %8, !dbg !44
}
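
One possible shape for such support, assuming pointers are modeled as an allocation number paired with an offset (as the addresses in Crucible's memory error messages suggest), is to define subtraction only for pointers into the same allocation. This is a rough sketch with hypothetical names (model_ptr, ptr_add, ptr_diff), not the crucible-llvm implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pointer representation: allocation id plus byte offset. */
typedef struct { uint64_t block; uint64_t offset; } model_ptr;

/* p + n stays inside the same allocation; only the offset moves. */
model_ptr ptr_add(model_ptr p, uint64_t n) {
    model_ptr r = { p.block, p.offset + n };
    return r;
}

/* p - q is defined only when both pointers refer to the same allocation.
   On success the difference is stored in *out and true is returned. */
bool ptr_diff(model_ptr p, model_ptr q, uint64_t *out) {
    if (p.block != q.block) return false;
    *out = p.offset - q.offset;  /* wraps modulo 2^64, like an i64 sub */
    return true;
}
```

Under this model the s - str computation in mystrln only needs the two offsets, so the ptrtoint/sub sequence above could be handled without assigning numeric addresses to allocations.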

toStorableType should be in a MonadFail

This function appears to be wrapped in a monad only so that it can call fail. Perhaps this would be better in a MonadFail, or even in a Maybe (since it has exactly one failure mode).

toStorableType :: (Monad m, HasPtrWidth wptr)
               => MemType
               -> m G.Type
toStorableType mt =
  case mt of
    IntType n -> return $ G.bitvectorType (G.bitsToBytes n)
    PtrType _ -> return $ G.bitvectorType (G.bitsToBytes (natValue PtrWidth))
    FloatType -> return $ G.floatType
    DoubleType -> return $ G.doubleType
    ArrayType n x -> G.arrayType (fromIntegral n) <$> toStorableType x
    VecType n x -> G.arrayType (fromIntegral n) <$> toStorableType x
    MetadataType -> fail "toStorableType: Cannot store metadata values"

LLVM Translation can not load VarArg functions

Using crucible via SAW, consider:

#include <unistd.h>
size_t some_vararg(int x, ... ){
    return 1;
}

int main()
{
    some_vararg(1);
    return 0;
}

And the compilation/loading:

% clang -c -emit-llvm vararg.c
% saw
...
version 0.2 (c0a7dec)

Loading module Cryptol
sawscript> load_crucible_llvm_module "vararg.bc"
saw: crucible type mismatch BVRepr 32 VectorRepr AnyRepr

Unsoundness in BVDomain implementation

Module Lang.Crucible.Utils.BVDomain implements an abstract domain for bitvectors. The abstract domain is used to recognize situations where a symbolic formula can be reduced to a specific concrete value.

The problem is that the current implementation (as of e1838b6) is unsound: Sometimes it says that a formula containing symbolic variables is equivalent to a specific concrete value, even when there are instantiations of the variables that produce a different value.

Example 1, where crucible says the result is always 1:

int test (uint16_t x) {
    uint32_t y = 0xffff;
    uint16_t z = x * y;
    if (z < 2) return 1;
    else return 0;
}

Example 2, where crucible says that the pointer dereference is always aligned to a multiple of 2 bytes (equivalently, that i+j is always even):

uint16_t badptr (uint16_t *p, uint8_t i, uint8_t j) {
    uint8_t *q = (uint8_t *)p;
    uint8_t *r = (q + i) + j;
    uint16_t *s = (uint16_t *)r;
    return *s;
}

Example 3, where crucible says that the equality comparison will always be false, and that the function will always return 0:

int is42 (uint16_t x) {
    uint32_t a = 9;
    uint32_t b = 0xa9cf5725;
    uint32_t y = 42;
    if ((x+a)*b == (y+a)*b) return 1;
    else return 0;
}
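
Examples 1 and 3 can be checked against concrete inputs with an ordinary C compiler. The sketch below copies the two functions (Example 1 is renamed example1 here, and its truncating cast is made explicit); the inputs mentioned afterwards are chosen for illustration and are not part of the original report.

```c
#include <stdint.h>

/* Example 1: the product is computed at 32 bits, then truncated to 16. */
int example1(uint16_t x) {
    uint32_t y = 0xffff;
    uint16_t z = (uint16_t)(x * y);
    if (z < 2) return 1;
    else return 0;
}

/* Example 3: b is odd, so multiplying by b is invertible modulo 2^32 and
   the comparison holds exactly when x + a == y + a. */
int is42(uint16_t x) {
    uint32_t a = 9;
    uint32_t b = 0xa9cf5725;
    uint32_t y = 42;
    if ((x + a) * b == (y + a) * b) return 1;
    else return 0;
}
```

Here example1(1) returns 0 and is42(42) returns 1, so neither function is constant, contrary to what the abstract domain concludes.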

Check for overflow when shifting signed things

If we have an expression like x << y, and x is at a signed type (or promoted to one!), we need to check that we don't overflow into the sign bit (i.e., it doesn't get set after the shift).
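
A possible form for this check on 32-bit values is sketched below. The function name shl_is_safe is hypothetical; the treatment of negative operands and out-of-range shift amounts follows the C standard's definition of <<, and the final comparison is the sign-bit condition described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when x << y is well defined for a 32-bit signed x. */
bool shl_is_safe(int32_t x, uint32_t y) {
    if (y >= 32) return false;     /* shift amount out of range */
    if (x < 0) return false;       /* left shift of a negative value is undefined */
    return x <= (INT32_MAX >> y);  /* no set bit reaches or passes the sign bit */
}
```

For instance, shifting 1 by 30 is accepted, while shifting 1 by 31 is rejected because the result would set the sign bit.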

Crash! GlobalState.globalMuxFn

Using crucible via SAW, I encountered this:

%< --------------------------------------------------- 
  Revision:  2672c2252aefeb8027b93652d2b90f076d44d819
  Branch:    HEAD
  Location:  GlobalState.globalMuxFn
  Message:   Attempting to merge global states of incorrect branch depths:
              *** Depth 1:  2
              *** Depth 2:  1
              *** Location: internal
CallStack (from HasCallStack):
  panic, called at src/Lang/Crucible/Panic.hs:11:9 in crucible-0.4-5GF7fggLRYiAfjp8EgEf8:Lang.Crucible.Panic
  panic, called at src/Lang/Crucible/Simulator/GlobalState.hs:208:6 in crucible-0.4-5GF7fggLRYiAfjp8EgEf8:Lang.Crucible.Simulator.GlobalState
%< ---------------------------------------------------

I can provide a test case (sawscript and llvm) if you'd like.

default value for registers

In the crucible concrete syntax, it would be convenient if we could optionally specify initial values to the registers clause when defining functions. E.g.:

(defun @mjrty ( (xs (Vector Integer)) ) Integer
  (registers
    ($i Nat 0)
    ($sz Nat (vector-size xs))
    ($x Integer 0)
    ($k Integer 0))

...
)

I imagine this would be basically equivalent to having a sequence of set-register! statements in the initial block, but with the caveat that only the function arguments are in scope.

The `-Werror`s in the .cabal files are annoying

I'm trying to build crucible as a dependency, with profiling, on GHC 8.2, with stack --profile <my package that depends on crucible>. However, the build fails under -Werror because Stack passes -auto-all behind the scenes, a flag that has apparently been deprecated in favor of -fprof-auto.

Perhaps it would make sense to enable -Werror only in Travis builds, and not in the .cabal file itself (except perhaps behind a flag)?

LLVM `free` is too aggressive

The free primitive in the LLVM memory model doesn't play nicely with symbolic branching. If you perform a free under a symbolic condition, the condition is effectively ignored and the resulting memory will either be freed or not freed after the merge point, depending on the order the simulator chooses to execute the branches in. This is either unsound or later produces impossible proof goals, depending on how this goes.

The following program demonstrates this problem when run with crucible-c. The pointer arithmetic required for computing buf[0] produces impossible side-conditions, rather than indicating that b must be false.

#include <stdint.h>
#include <stdlib.h>
#include "crucible.h"

int main () {
  int8_t b = crucible_int8_t("b");

  char* buf = malloc( 4 );
  if( b ) {
    free( buf );
  }

  buf[0] = 'a';
  check( buf[0] == 'a' );
}
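
The intended outcome can be illustrated with a toy model in which the "freed" status of a block is a predicate over the single symbolic boolean b, stored as a two-entry truth table. Everything in this sketch (pred_b, pred_merge, and the other names) is hypothetical and much simpler than the real memory model; it only shows that merging the two branches with an if-then-else on b makes the later write require that b is false.

```c
#include <stdbool.h>

/* A predicate over one symbolic boolean b, stored as a truth table. */
typedef struct { bool if_b_false; bool if_b_true; } pred_b;

static const pred_b PRED_FALSE = { false, false };
static const pred_b PRED_TRUE  = { true, true };

/* ite(b, t, e): use t where b holds and e where it does not. */
pred_b pred_merge(pred_b t, pred_b e) {
    pred_b r = { e.if_b_false, t.if_b_true };
    return r;
}

pred_b pred_not(pred_b p) {
    pred_b r = { !p.if_b_false, !p.if_b_true };
    return r;
}

/* Condition under which buf is freed after: if (b) { free(buf); } */
pred_b freed_after_branch(void) {
    pred_b freed_then = PRED_TRUE;   /* the branch that calls free */
    pred_b freed_else = PRED_FALSE;  /* the branch that does nothing */
    return pred_merge(freed_then, freed_else);
}

/* Side condition for the later write buf[0] = 'a': buf is not freed. */
pred_b write_side_condition(void) {
    return pred_not(freed_after_branch());
}
```

The resulting side condition is true only in the row where b is false, which is the proof obligation one would expect in place of an impossible goal.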

Delete PtrToIntConst constructor

A PtrToIntConst is

  -- | A special marker for pointer constants that have been cast as an integer type.
  PtrToIntConst :: !LLVMConst -> LLVMConst

As such, it could only contain pointers. Only some LLVMConst expressions could possibly be interpreted as symbols, so this type should be refactored to reflect that:

ZeroConst: ? Is this the null pointer?
SymbolConst: Definitely a pointer
IntConst: ? Could these get cast to pointers?
All the other constructors are not pointers: FloatConst, DoubleConst, ArrayConst, VectorConst, StructConst.

`scripts/build_sandbox.sh` does not pull submodules for abcBridge

When trying to follow the build instructions, the build fails with a fairly mysterious error, which is fixed by getting the submodules for dependencies/abcBridge. The build_sandbox.sh script should probably fetch these submodules as well.

LLVM invalid memory load error

The program below causes the LLVM symbolic simulator to fail to load a value from memory. When the string s is set to foobar, it prints the bytes of the string and then exits.

#include <stdio.h>
int main() {

  unsigned char s[] = "foobarb"; // fails
  //unsigned char s[] = "foobar";  // works

  for (int i=0; s[i] !=0; ++i)
    printf("byte %d = %#x\n", i, s[i]);

  printf("Moving printf before loop changes index of error: '%s'\n", s);
  return 0;
}

When s is set to foobarb, the first byte of the string is printed and then an invalid memory error is generated.

byte 0 = 0x66
Invalid memory load: address (7, 0x8:[64], 0x1:[64]) at type i8
in main at ./read-bytes.c:7:17
  When calling llvm_load

The LLVM bitcode in the successful case identifies s as a pointer. In the failing case, it stores the string as a 64-bit integer and bitcasts. One idea @robdockins wanted to track down was whether this is related to the read failing to find the overlapping memory region it belongs to.

Further complicating this example, if you move the printf on line 10 to the line before the loop, the invalid memory load location changes to:

Invalid memory load: address (7, 0x8:[64], 0x0:[64]) at type i8
in main at ./read-bytes.c:7:3
  When calling printf

Commenting out the line that prints the whole string generates a translation error indicating that the arguments to llvm.dbg.declare are ill-formed. For these examples, I used clang with llvm-3.6.2 and compiled with -O3 optimizations.

add crux to the README

The README lists the packages in this repo, but omits crux (perhaps it's a recent addition?)

Variable scoping issues in the what4 SMTWriter

When expressions involving variable binders (e.g., forall or exists statements) are processed, it can sometimes happen that top-level variables only occur in the scope of a binder. In these cases, the top-level variable is declared in an inner scope and forgotten when the scope of the binder is left; this is an error because the top-level variable ought to be in a wider scope. This problem is new: we used to use the NeverDelete option on top-level variables, causing a cache writethrough; however, we recently switched to DeleteOnPop behavior to accommodate Z3 (which has different variable scoping rules than Yices). This eliminates the cache writethrough behavior, exposing this bug.

The symptoms of this bug are a bit baffling. If a solver finds a model, it will return SAT, but what4 thinks that the variable wasn't involved in the query (it was mistakenly erased from the scoping data structures) and is therefore unconstrained. It will simply assign a default value to unconstrained variables, and the resulting model may be incorrect.

A correct fix for this problem will involve some refactoring of the way SMTWriter works so that we can find and declare top-level variables in an appropriate scope before entering the local scope of a quantified formula.

Implement more LLVM intrinsic functions

Currently, Crucible implements a subset of available LLVM intrinsic functions. Which ones? Find the latest list:

curl https://raw.githubusercontent.com/GaloisInc/crucible/master/crucible-llvm/src/Lang/Crucible/LLVM/Intrinsics.hs | grep "let nm = " | perl -p -i -e 's/let nm = "(.+)" in/\1/' | sort

They are implemented here.

Here's a list of the ones suggested in the FIXME comment:

Two types are named `LLVMContext` in `crucible-llvm` package

Not only is this confusing, but it makes it impossible to define a single module that re-exports both of them.

One of these types needs to be renamed to something else.

Symbolic simulation fails when comparing symbolic pointer value with NULL

Here's the example C code:

#include <stddef.h>
const int val = 1;
static const int *get(int x) {
	if (x == 0) { return &val; }
	else { return NULL; }
}
int test(int x) {
	return get(x) != NULL;
}

When this is compiled with LLVM and function test is simulated with saw-script, we get the following error:

Attempted to compare pointers from different allocations
in test at internal
  When calling llvm_ptrEq
  In test at internal
[...]
Symbolic execution failed.
Attempted to compare pointers from different allocations
in test at internal
Stack frame
  Allocations:
    merge
      Condition:
        boolNot (bvEq 0x0:[32] c2:bv)
      True Branch:
        
      False Branch:
        
    StackAlloc (4, 0x0:[64]) 0x4:[64]
  Writes:
    merge
      Condition:
        boolNot (bvEq 0x0:[32] c2:bv)
      True Branch:
        *(5, 0x0:[64]) := ptr(0, 0x0:[64])
      False Branch:
        *(5, 0x0:[64]) := ptr(3, 0x0:[64])
    *(6, 0x0:[64]) := c2:bv
    *(4, 0x0:[64]) := c2:bv
Base memory
  Allocations:
    HeapAlloc (3, 0x0:[64]) 0x4:[64]
    HeapAlloc (2, 0x0:[64]) 0x0:[64]
    HeapAlloc (1, 0x0:[64]) 0x0:[64]
  Writes:
    *(3, 0x0:[64]) := 0x1:[32]
)

Lazy errors for failed LLVM constant translations

When translating LLVM instructions to Crucible, many failures result in Crucible code that will fail when executed, rather than failing during translation. This can be useful in the case where the untranslatable code is never actually executed. However, for translation of constants, the structure of the code doesn't make creation of error instructions straightforward. It would be nice to refactor it so that errors would only occur if untranslatable constants are actually used.

Add support for unsat cores to What4

Several solvers, including CVC4 and, recently, Yices, support generation of unsat cores. For clients that can make use of these, it would be convenient for What4 to support parsing and returning them along with unsat results.

Crucible panic when using SAW

I ran into the following panic when using the latest nightly build of SAW for macOS (https://saw.galois.com/builds/nightly/saw-0.2-2018-09-05-MacOSX-64.tar.gz):

You have encountered a bug in Crucible's implementation.
*** Please create an issue at https://github.com/GaloisInc/crucible/issues

%< ---------------------------------------------------
Revision: de3f0f9
Branch: HEAD
Location: Intrinsics.register_llvm_override
Message: Argument type mismatch when registering LLVM mss override.
*** Override name: Symbol "llvm.objectsize.i64.p0i8"
CallStack (from HasCallStack):
panic, called at src/Lang/Crucible/Panic.hs:11:9 in crucible-0.4-DND9C4WiHH7AxtfMml2CWw:Lang.Crucible.Panic
panic, called at src/Lang/Crucible/LLVM/Intrinsics.hs:317:14 in crucible-llvm-0.1-KeRiwQXtoJRASDAERGm0CZ:Lang.Crucible.LLVM.Intrinsics
%< ---------------------------------------------------

Path condition issue in crucible-c

Crucible-C fails to verify the program below (as far as I can tell, it loses the path condition a == 5, although it still has b >= 0).

extern int __VERIFIER_nondet_int(void);
extern void __VERIFIER_assert(int cond);

#include <assert.h>

int main()
{
  int a = __VERIFIER_nondet_int();
  int b = __VERIFIER_nondet_int();
  if (a == 5 && b >= 0) {
    __VERIFIER_assert(a == 5);
  }
  return 0;
}

crucible-syntax doesn't parse negative literals

I poked around in the parser, but I don't see any obvious reason. The following program fails to parse with an error.

(defun @test-integer () Integer
  (start start:
    (let q (the Integer -1))
    (return q)))

Loading from a field of a one-element struct fails with "Invalid memory load"

Here's the C code:

struct single {
	int value;
};

int get (struct single *p) {
	return p->value;
}

Which compiles to the following LLVM:

%struct.single = type { i32 }

; Function Attrs: nounwind ssp uwtable
define i32 @get(%struct.single*) #0 {
  %2 = alloca %struct.single*, align 8
  store %struct.single* %0, %struct.single** %2, align 8
  %3 = load %struct.single*, %struct.single** %2, align 8
  %4 = getelementptr inbounds %struct.single, %struct.single* %3, i32 0, i32 0
  %5 = load i32, i32* %4, align 4
  ret i32 %5
}

I ran this with the following saw-script:

load_crucible_llvm_module "single.bc";
crucible_llvm_verify "get" []
  do {
    x <- crucible_fresh_var "x" i32;
    p <- crucible_alloc (llvm_struct "struct.single");
    crucible_points_to p (crucible_struct [crucible_term x]);
    crucible_execute_func [p];
    crucible_return (crucible_term x);
  }
  z3;

And here's the result:

Symbolic execution failed.
Invalid memory load: (1, 0x8:[64], 0x0:[64])
in get at internal
Stack frame
  Allocations:
    StackAlloc (2, 0x0:[64]) 0x8:[64]
  Writes:
    *(2, 0x0:[64]) := ptr(1, 0x0:[64])
Base memory
  Allocations:
    HeapAlloc (1, 0x0:[64]) 0x8:[64]
  Writes:
    *(1, 0x0:[64]) := {base+0 = c2:bv}
)

If I add a second field to the struct declaration, then it works just fine.

Implement outlining

Write a transformation over Crucible programs that will extract a sub-region of a Crucible function into a separate function.

Calling functions that take void * parameters?

I'm trying to verify code that passes arrays of various types (e.g. unsigned int or typedef struct Foo { unsigned int[32] } Foo_t) to assembly subroutines that take void * parameters (e.g. memcpy, memset), but I'm not quite sure how to specify these subroutines and their compatibility with the callers. So far, I've tried the following mockup:

// voidTest.c

/** mockup of subroutine to assume/override */
void clear_void(void *arr, unsigned int size) {
    unsigned char *cArr = arr;
    for (int i = 0; i < size; i++) {
        cArr[i] = 0;
    }
}

/** mockup of caller that passes a pointer to an array of unsigned ints to the void *-based subroutine */
void clear_uints(unsigned int *uints, unsigned int numUInts) {
    clear_void(uints, numUInts * sizeof(unsigned int));
}

// saw -d 4 voidTest.saw

/** 
 * spec for mockup of subroutine that clears the given number of bytes from the 
 * arbitrary array pointed to by a given void * 
 */
let clear_void_spec : CrucibleSetup() = do {
    let voidArrayType = (llvm_array 12 (llvm_int 8));
    
    arr <- (crucible_fresh_var "arr" voidArrayType);
    p_arr <- (crucible_alloc voidArrayType);
    let v_arr = (crucible_term arr);
    
    size <- (crucible_fresh_var "size" (llvm_int 32));
    let v_size = (crucible_term size); 
    
    crucible_equal v_size (crucible_term {{ 12:[32] }});
    crucible_points_to p_arr v_arr;
    
    crucible_execute_func [p_arr, v_size];
    
    crucible_points_to p_arr (crucible_term {{ zero:[12][8] }});
};

/** 
 * spec for function that calls the subroutine to clear the given number of 
 * unsigned ints from the array pointed to by a specified unsigned int * 
 */ 
let clear_uints_spec : CrucibleSetup() = do {
    let uintsType = (llvm_array 3 (llvm_int 32));
    
    uints <- (crucible_fresh_var "uints" uintsType);
    p_uints <- (crucible_alloc uintsType);
    let v_uints = (crucible_term uints);
    
    numUInts <- (crucible_fresh_var "numUInts" (llvm_int 32));
    let v_numUInts = (crucible_term numUInts); 
    
    crucible_equal v_numUInts (crucible_term {{ 3:[32] }});
    crucible_points_to p_uints v_uints;
    
    crucible_execute_func [p_uints, v_numUInts];
    
    crucible_points_to p_uints (crucible_term {{ zero:[3][32] }});
};

let main : TopLevel () = do {
    voidTest <- llvm_load_module "voidTest.bc";
    
    // The actual subroutine result would be "crucible_llvm_unsafe_assume_spec ..."
    clear_void_12_result <- crucible_llvm_verify voidTest "clear_void" [] false clear_void_spec z3;
    clear_uints_3_result <- crucible_llvm_verify voidTest "clear_uints" [clear_void_12_result] false clear_uints_spec z3;
    
    print "Done!";
};

The code compiles in Clang/LLVM, but SAW reports an "invalid memory store"...

> saw voidTest.saw
Loading file "...voidTest.saw"
Invalid memory store in clear_void at ..\src\voidTest.c:11:17
When calling llvm_store
In clear_void at ..\src\voidTest.c:11:17
Symbolic simulation failed along some paths!
Subgoal failed: @clear_void safety assertion: literal equality postcondition
SolverStats {solverStatsSolvers = fromList ["SBV->Z3"], solverStatsGoalSize = 140}
----------Counterexample----------
("arr",[255,0,0,0,0,0,0,0,0,0,0,0])
("size",12)
user error ("crucible_llvm_verify" (...voidTest.saw:51:7):
Proof failed.)

...probably because my specification is incorrect. If Crucible/LLVM supports it, can you please provide an example that verifies a caller passing a pointer to an array of known type and length to a function that takes a void * and the corresponding length in bytes? Thank you.

SimpleBuilder `natToInteger` and `integerToReal` implementations are apparently wrong

I came across some suspicious code while browsing module Lang.Crucible.Solver.SimpleBuilder:

  natToInteger sym x
    | NatElt n l <- x = return $! IntElt (toInteger n) l
    | Just (IntegerToNat y) <- asApp x = return y
    | otherwise = sbMakeElt sym (NatToInteger x)

  integerToReal sym x
    | IntElt i l <- x = return $! RatElt (toRational i) l
    | Just (RealToInteger y) <- asApp x = return y
    | otherwise  = sbMakeElt sym (IntegerToReal x)

I had assumed that natToInteger should always return a non-negative integer, and integerToReal should always return a whole number. But if a negative integer was passed to integerToNat, and the result was passed to natToInteger, then we would get the original negative integer back! Similarly, passing a fractional real number to realToInteger followed by integerToReal would give the original fractional real number.

Does it make any sense to do this?
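To illustrate the concern, here is a small standalone model. It is not the SimpleBuilder code: the types IntExpr and NatExpr and the functions below are hypothetical, and the reference semantics assumes that integerToNat clamps negative inputs to 0 (any semantics that yields a non-negative result would show the same mismatch).

```haskell
module Main where

-- Toy expression types with the two coercions discussed above.
-- These are illustrative stand-ins, not the SimpleBuilder element types.
data IntExpr
  = IntLit Integer
  | NatToInteger NatExpr
  deriving (Eq, Show)

newtype NatExpr = IntegerToNat IntExpr
  deriving (Eq, Show)

-- Reference semantics. ASSUMPTION: integerToNat clamps negatives to 0.
evalInt :: IntExpr -> Integer
evalInt (IntLit i)       = i
evalInt (NatToInteger n) = evalNat n

evalNat :: NatExpr -> Integer
evalNat (IntegerToNat e) = max 0 (evalInt e)

-- The simplification from the snippet: natToInteger (integerToNat y) ==> y
natToIntegerRewrite :: NatExpr -> IntExpr
natToIntegerRewrite (IntegerToNat y) = y

main :: IO ()
main = do
  let n = IntegerToNat (IntLit (-3))
  print (evalInt (NatToInteger n))         -- 0: value under the reference semantics
  print (evalInt (natToIntegerRewrite n))  -- -3: value after the rewrite
```

Under that assumption the rewrite changes the meaning of the term whenever the inner integer is negative. The analogous model for realToInteger followed by integerToReal would differ on any fractional input.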

Implement profiling hooks

Now that the more fine-grained execution mechanism is in place, it would be nice to add support for tracing key events during symbolic execution (branches, merges, satisfiability checks, etc.) to make profiling more straightforward. Something in the spirit of this:

https://2018.splashcon.org/event/splash-2018-oopsla-finding-code-that-explodes-under-symbolic-evaluation
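As a rough sketch of what such a hook could look like, the following standalone example counts events by kind. Everything here is hypothetical: the SimEvent constructors, the ProfilingHook type, and newCountingHook are invented for illustration and do not correspond to existing Crucible definitions.

```haskell
module Main where

import Data.IORef (modifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- Hypothetical event type; a real one would likely carry program
-- locations, timestamps, and solver statistics.
data SimEvent = BranchEvent | MergeEvent | SatCheckEvent
  deriving (Eq, Ord, Show)

-- A hook is just a callback the simulator would invoke at each event.
type ProfilingHook = SimEvent -> IO ()

-- Build a hook that counts events, plus an action that reads the counts.
newCountingHook :: IO (ProfilingHook, IO (Map.Map SimEvent Int))
newCountingHook = do
  ref <- newIORef Map.empty
  let hook ev = modifyIORef' ref (Map.insertWith (+) ev 1)
  pure (hook, readIORef ref)

main :: IO ()
main = do
  (hook, getCounts) <- newCountingHook
  -- Stand-in for the simulator firing events during execution.
  mapM_ hook [BranchEvent, SatCheckEvent, BranchEvent, MergeEvent]
  counts <- getCounts
  print (Map.toList counts)
  -- prints [(BranchEvent,2),(MergeEvent,1),(SatCheckEvent,1)]
```

Keeping the hook as a plain callback means a profiler, a logger, or a no-op can be swapped in without touching the execution loop.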

broken saw-script integration test

Recent crucible-jvm commits have broken the test_crucible_jvm integration test. It seems to be a problem with string handling. Bisecting reveals the culprit to be commit 3fe7017.

That commit contains a lot of changes, so I'm not entirely sure what is going on.

Use outlining to translate loops to recursive functions

Use the outlining mechanism implemented in #70 to translate loops into recursive functions. A first implementation could be quite restrictive in the form of loops it could handle, and could require a manual specification of all state variables read or written to by the loop. Supporting C programs with a top level while(1) { ... } loop (as in many SV-COMP benchmarks) could be a first goal.

Crucible panic from SAW equivalence proof

Thank you for your help in solving my previous issue. I am now using clang 3.9. However, as I've continued to write SAW code, I've encountered a new panic. Any thoughts? Thank you again!

You have encountered a bug in Crucible's implementation.
*** Please create an issue at https://github.com/GaloisInc/crucible/issues

%< ---------------------------------------------------
Revision: de3f0f9
Branch: HEAD
Location: MemModel.packMemValue
Message: Type mismatch when storing value.
*** Expected storable type: bitvectorType 1
*** Given crucible type: RecursiveRepr LLVM_pointer [BVRepr 1]
CallStack (from HasCallStack):
panic, called at src/Lang/Crucible/Panic.hs:11:9 in crucible-0.4-DND9C4WiHH7AxtfMml2CWw:Lang.Crucible.Panic
panic, called at src/Lang/Crucible/LLVM/MemModel.hs:527:3 in crucible-llvm-0.1-KeRiwQXtoJRASDAERGm0CZ:Lang.Crucible.LLVM.MemModel
%< ---------------------------------------------------

Reduce Lang.Vector

See GaloisInc/parameterized-utils#9. The functions included in that PR should be removed from Crucible when that submodule/dependency is updated.

Crucible references `check_sat` in several places

The Crucible library refers to check_sat in several places, but, as far as I know, check_sat is a function from another project.

In particular, this error message might be confusing to a Crucible user:

https://github.com/GaloisInc/crucible/blob/master/crucible/src/Lang/Crucible/Solver/SimpleBackend/SMTLib2.hs#L469

Slowdown with Salsa20

The SAW example verifying Salsa20 went from taking ~13s to not finishing in several minutes. See here:

https://github.com/GaloisInc/saw-script/blob/master/examples/salsa20/salsa-crucible.saw

The change responsible seems to be either 7038542 or 51e9345. Both involve the interpretations of bit vector operations.

expose internals/knowledge of symbolic execution in tutorial materials

I asked two students who recently did an MSc project using Cryptol and SAW to give a couple of key recommendations to us. One of their suggestions was to teach a little bit about symbolic execution in our documentation and tutorial materials for SAW, since doing so will help users better understand how the tool works and give them hints about how to debug certain classes of problems.

Unify (or clarify) error handling

Some functions in Crucible that can fail take an implicit error handling parameter, like so:

?err :: String -> a

whereas others are in a MonadError or MonadFail:

instrResultType :: (?lc :: TyCtx.LLVMContext, MonadFail m, HasPtrWidth wptr) => L.Instr -> m MemType
liftSyntaxParse :: MonadError (ExprErr s) m => SyntaxParse Atomic a -> AST s -> m a

still others, in what4, use Either:

userSymbol :: String -> Either SolverSymbolError SolverSymbol

Our code will compose best if most of it uses the same error handling mechanism. If that is not practical, we should at least agree on guidelines for when to use each option.
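For reference, here is a minimal sketch of how the three styles can be bridged with small adapters. The names SymbolError, parseSymbol, liftEitherWith, and withErr are made up for this example and are not part of Crucible or What4; only MonadError, throwError, and runExcept come from the mtl package.

```haskell
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE ImplicitParams #-}
module Main where

import Control.Monad.Except (MonadError, runExcept, throwError)

-- Hypothetical error type, standing in for something like SolverSymbolError.
newtype SymbolError = SymbolError String
  deriving (Eq, Show)

renderError :: SymbolError -> String
renderError (SymbolError msg) = msg

-- Either-style function, in the spirit of userSymbol.
parseSymbol :: String -> Either SymbolError String
parseSymbol "" = Left (SymbolError "empty symbol")
parseSymbol s  = Right s

-- Adapter 1: lift an Either into any MonadError, converting the error type.
liftEitherWith :: MonadError e' m => (e -> e') -> Either e a -> m a
liftEitherWith f = either (throwError . f) pure

-- Adapter 2: discharge an Either using an implicit ?err handler.
withErr :: (?err :: String -> a) => Either SymbolError a -> a
withErr = either (?err . renderError) id

main :: IO ()
main = do
  -- MonadError style, run in the Except monad.
  print (runExcept (liftEitherWith renderError (parseSymbol "")))
  -- prints Left "empty symbol"

  -- Implicit-parameter style, with a handler that produces a fallback value.
  let ?err = \msg -> "fallback: " ++ msg
  putStrLn (withErr (parseSymbol ""))   -- prints fallback: empty symbol
  putStrLn (withErr (parseSymbol "x"))  -- prints x
```

Since Either converts cheaply into both of the other styles, one possible guideline is to use Either for pure library functions and reserve MonadError or ?err for code that already runs in a richer context.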