evincarofautumn / ward

A static analysis tool for C.

License: Other

Haskell 88.88% C 2.36% Shell 8.76%

static-analysis static-code-analysis c haskell locking signals

ward's Issues

Verbose mode with boring statistics?

At least in the HTML output, it would be nice to output some boring statistics (number of iterations until quiescence, total number of functions scanned, something like that), so that we can tell that the analysis is running even when the final report says:

0 Warnings, 0 Errors

Formally specify type system

It would be helpful to have a document (e.g. in ott) formally specifying the judgments that define the type system. Currently the only reference is prose.

Flesh out warnings and errors

Inconsistent annotations
Annotation on definition of non-static function
Added permission not granted
Dropped permission not revoked
Locally revoking permission not in context
Name collision between static definitions
Calling function with unknown permissions
Granting permission already in context
Aliasing variable with local permissions

Generate HTML report

We currently produce compiler-style output:

/path/file.c:line:column: (note|warning|error): Message

It would be nice to have a --mode=html flag that produces a formatted report, e.g., for viewing on CI.

Ward should define a preprocessor symbol

We should define __WARD__ or something similar so that non-Ward runs can ignore our attributes.

i.e., so we can do:

#if defined(__WARD__)
#  define WARD_PERMISSIONS(...) __attribute__((ward (__VA_ARGS__)))
#else
#  define WARD_PERMISSIONS(...) /*empty*/
#endif
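
As a rough sketch of how the macro above would be used, assuming Ward did define `__WARD__` (it is only proposed here) and using a made-up permission `foo_locked` and function `read_shared_counter`: an ordinary compiler sees no attribute at all, while a Ward run would see `ward(need(foo_locked))`.

```c
/* __WARD__ is the symbol proposed above; it is hypothetical until Ward defines it. */
#if defined(__WARD__)
#  define WARD_PERMISSIONS(...) __attribute__((ward(__VA_ARGS__)))
#else
#  define WARD_PERMISSIONS(...) /* empty */
#endif

/* Hypothetical permission and function names, for illustration only. */
WARD_PERMISSIONS(need(foo_locked))
int read_shared_counter(const int *counter);

int read_shared_counter(const int *counter) {
    /* The body is unaffected by the annotation in a non-Ward build. */
    return *counter;
}
```

In a non-Ward build the annotation expands to nothing, so the declaration compiles even with compilers that do not know the `ward` attribute.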

Error disappears when a module is added

Using 05b02cf I am seeing some rather concerning behavior when processing callmaps. I am using this configuration file:

sm_lock_held      "assume the storage manager lock is held";
take_sm_lock      "the ability to take the storage manager lock"
  -> ! sm_lock_held;

.enforce "sm/NonMovingMark.c";
.enforce "sm/NonMovingMark.h";                                              
.enforce "sm/BlockAlloc.c";                                                 
.enforce "rts/sm/BlockAlloc.h";                                             
.enforce "Block.h";

I have two callmap files: rts/sm/BlockAlloc.c.ward.graph and rts/Capability.c.ward.graph.

Checking the Capability callmap alone correctly reports no errors:

$ ward gcc --mode=compiler --config=rts/config.ward rts/Capability.c.ward.graph
Loading config files...
Preprocessing and parsing...
Checking...
Warnings: 0, Errors: 0

Checking the BlockAlloc callmap alone correctly reports several errors:

$ ward gcc --mode=compiler --config=rts/config.ward rts/sm/BlockAlloc.c.ward.graph
Loading config files...
Preprocessing and parsing...
Checking...
rts/sm/BlockAlloc.h:13: error: missing required annotation on 'allocLargeChunk'; annotation [] is missing: [revoke(take_sm_lock),need(take_sm_lock),grant(sm_lock_held)]
includes/rts/storage/Block.h:320: error: missing required annotation on 'allocGroupOnNode_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:315: error: missing required annotation on 'allocGroup_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:317: error: missing required annotation on 'allocBlock_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:322: error: missing required annotation on 'allocBlockOnNode_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:333: error: missing required annotation on 'freeGroup_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:335: error: missing required annotation on 'freeChain_lock'; annotation [] is missing: [need(take_sm_lock)]
Warnings: 0, Errors: 7

However, when I check the Capability and BlockAlloc callmaps together, the errors vanish:

$ ward gcc --mode=compiler --config=rts/config.ward rts/Capability.c.ward.graph rts/sm/BlockAlloc.c.ward.graph
Loading config files...
Preprocessing and parsing...
Checking...
Warnings: 0, Errors: 0

Even stranger, when I flip the order of the two, the errors are again correctly reported:

$ ward gcc --mode=compiler --config=rts/config.ward rts/sm/BlockAlloc.c.ward.graph rts/Capability.c.ward.graph
Loading config files...
Preprocessing and parsing...
Checking...
rts/sm/BlockAlloc.h:13: error: missing required annotation on 'allocLargeChunk'; annotation [] is missing: [revoke(take_sm_lock),need(take_sm_lock),grant(sm_lock_held)]
includes/rts/storage/Block.h:320: error: missing required annotation on 'allocGroupOnNode_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:315: error: missing required annotation on 'allocGroup_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:317: error: missing required annotation on 'allocBlock_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:322: error: missing required annotation on 'allocBlockOnNode_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:333: error: missing required annotation on 'freeGroup_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:335: error: missing required annotation on 'freeChain_lock'; annotation [] is missing: [need(take_sm_lock)]
Warnings: 0, Errors: 7

Differentiate “need” from “use”

Currently there is no distinction between permissions needed directly and those needed indirectly. mono/mono#4529 (comment) suggests adding a “use” annotation to the body of a function that actually uses a permission, e.g., __WARD_USE__ (foo_locked). Ward should raise a warning when you need a permission without using it (directly or indirectly), in order to help prevent stale/redundant annotations.

Conflicts and failed restrictions beneath 'Choice' nodes aren't reported

Take a look at https://github.com/evincarofautumn/Ward/blob/minimal-annotations/src/Check/Permissions.hs#L655

      let conflicts = HashSet.filter conflicting $ mconcat sites
      unless (HashSet.null conflicts) $ do
        record True $ Error pos $ Text.concat $
          [ "conflicting information for permissions "
          , Text.pack $ show $ sort $ map presencePermission
            $ HashSet.toList conflicts
          , " in '"
          , name
          , "'"
          ]

This will find all conflicts in sites :: Vector [Site] and report on them. The problem is that sites comes from https://github.com/evincarofautumn/Ward/blob/minimal-annotations/src/Check/Permissions.hs#L587

      do
        sites <- liftIO $ fmap Vector.toList $ Vector.freeze $ nodeSites node
        reportCallSites restrictions (sites, nodeCalls node, name, pos)

And the nodeSites record field in Node only contains one element for each toplevel element of nodeCalls :: CallTree (Function). It does not have any sites for calls beneath conditionals.

Now I don't think this means we won't report those conflicts at all. Rather - I think we'll just report a conflict at the position of the if statement. This is... not great.

The same problem, but worse, will happen with restrictions: https://github.com/evincarofautumn/Ward/blob/minimal-annotations/src/Check/Permissions.hs#L667

      for_ (zip [0 :: Int ..] sites) $ \ (index, s) -> do
        let
          position = case index of
            0 -> ["before first call"]
            _ ->
              [ "at "
              , Text.pack $ show $ callTreeIndex (index - 1) callees
              ]
        for_ restrictions $ \ restriction -> do
          unless (evalRestriction s restriction) $ do
            record True $ Error pos $ Text.concat $
              [ "restriction "
              , Text.pack $ show restriction
              , " violated in '"
              , name
              {-
              , "' with permissions '"
              , Text.pack $ show $ HashSet.toList s
              -}
              , "' "
              ]
              <> position

The issue here is worse, I think, because a restriction could be true at a Choice node, but false at a call in one of its branches.

Anyway, I think I can probably make this all go away by recomputing the site info under 'Choice' when reporting. (1. I think we can get the info in one pass, with no need to iterate again, since the 'Call's will have the best info we have gathered. 2. This should be pretty cheap to do; I think the propagation algorithm is fairly inexpensive.)

Poor performance on real-world codebase

I was looking into using Ward to lint GHC's runtime system, starting with simple lock checking. Unfortunately, even with no privileges defined and enforcement enabled for only a single file, the check runs for more than 10 minutes before sending my laptop with 32GB of RAM into swap-death. This seems a bit high for a 50kLoC codebase.

Checking each source file individually typically takes around 30 seconds per file. Is this the recommended strategy for non-small projects?

Join points at return?

Consider

void foo () {
  f1 ();
  if (cond) {
    f2 ();
    return;
  }
  f3 ();
}

I haven't looked too closely, but does CallTree represent the join point between the return after f2 and the fallthrough after f3()? Does it need to? (the usual representation for functions is a graph - not tree - of single-entry-single-exit basic blocks)

Upload to Hackage

Having Ward on Hackage would make usage much more convenient.

Fix SGen issues

This issue is to track functions with (seemingly) legitimate permission errors that need to be fixed in order to use Ward with Mono.

Failure to lock or possible deadlock in mono-profiler-log.c: heap_walk assumes the GC lock is held and the world is stopped by calling mono_gc_walk_heap, but also assumes it can take the GC lock and stop the world in the EXIT_LOG_EXPLICIT macro, which may call process_requests, which may call mono_gc_collect. gc_event shares this problem by calling heap_walk.
Possible deadlock in sgen-gc.c: major_copy_or_mark_from_roots assumes the GC lock is held and the world is stopped, but calls sgen_nursery_allocator_prepare_for_pinning, which may indirectly take the GC lock via sgen_clear_allocator_fragments → sgen_clear_range → sgen_client_array_fill_range → get_array_fill_vtable → mono_gc_make_descr_for_array.
sgen-gc.c: collect_nursery assumes the world is stopped, but calls sgen_debug_verify_nursery, which may call sgen_nursery_allocator_prepare_for_pinning.

Termination condition is wrong

https://github.com/evincarofautumn/Ward/blob/minimal-annotations/src/Check/Permissions.hs#L519

       writeIORef growing =<< permissionsFromCallSites (nodePermissions node) sites

Isn't this wrong? It should be something like:

     nodeGrowing <- permissionsFromCallSites (nodePermissions node) sites
     modifyIORef' growing (|| nodeGrowing)

i.e., if any node grew, then re-run the whole thing. Right now it just reruns if the last node grew.
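
To illustrate the difference, here is a small C model of the two loop conditions. It is only a sketch and not Ward's code: the `struct node` layout, the one-callee-per-node call graph, and the function names are all made up for the example. Each pass merges a callee's permission bits into its caller; the only difference between the two drivers is how the `growing` flag is updated.

```c
#include <stdbool.h>
#include <stddef.h>

#define NODE_COUNT 3

/* Hypothetical miniature call graph: each node calls at most one callee. */
struct node {
    unsigned perms;  /* bitset of permissions inferred so far */
    int callee;      /* index of the callee, or -1 for none */
};

/* Merge the callee's permissions into node i; report whether node i grew. */
static bool propagate_one(struct node *nodes, size_t i) {
    if (nodes[i].callee < 0)
        return false;
    unsigned before = nodes[i].perms;
    nodes[i].perms |= nodes[nodes[i].callee].perms;
    return nodes[i].perms != before;
}

/* Mirrors the reported behavior: the flag is overwritten by every node,
 * so only the last node decides whether another pass runs. */
static void run_overwrite(struct node *nodes, size_t n) {
    bool growing;
    do {
        growing = false;
        for (size_t i = 0; i < n; i++)
            growing = propagate_one(nodes, i);
    } while (growing);
}

/* Mirrors the suggested fix: the flag is set if any node grew. */
static void run_accumulate(struct node *nodes, size_t n) {
    bool growing;
    do {
        growing = false;
        for (size_t i = 0; i < n; i++)
            if (propagate_one(nodes, i))
                growing = true;
    } while (growing);
}
```

With a chain where node 0 calls node 1, node 1 calls node 2, and only node 2 starts with a permission bit, the overwriting loop stops after one pass and node 0 never receives the bit, whereas the accumulating loop runs until nothing changes.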

Errors are reported on a single line

In compiler mode I have noticed that errors are reported on a single line, greatly compromising legibility. For instance:

$ ward gcc --mode=compiler --config=rts/config.ward    rts/sm/BlockAlloc.c.ward.graph rts/Capability.c.ward.graph
Loading config files...
Preprocessing and parsing...
Checking...
rts/sm/BlockAlloc.h:13: error: missing required annotation on 'allocLargeChunk'; annotation [] is missing: [revoke(take_sm_lock),need(take_sm_lock),grant(sm_lock_held)]includes/rts/storage/Block.h:320: error: missing required annotation on 'allocGroupOnNode_lock'; annotation [] is missing: [need(take_sm_lock)]includes/rts/storage/Block.h:315: error: missing required annotation on 'allocGroup_lock'; annotation [] is missing: [need(take_sm_lock)]includes/rts/storage/Block.h:317: error: missing required annotation on 'allocBlock_lock'; annotation [] is missing: [need(take_sm_lock)]includes/rts/storage/Block.h:322: error: missing required annotation on 'allocBlockOnNode_lock'; annotation [] is missing: [need(take_sm_lock)]includes/rts/storage/Block.h:333: error: missing required annotation on 'freeGroup_lock'; annotation [] is missing: [need(take_sm_lock)]includes/rts/storage/Block.h:335: error: missing required annotation on 'freeChain_lock'; annotation [] is missing: [need(take_sm_lock)]Warnings: 0, Errors: 7

Add support for writing a CallMap to disk and loading it instead of the original C file

The idea is that we can do a pass over a bunch of C source files and extract the callgraph and run the analysis separately. If we do this right we can scan the C source files one at a time and be less dependent on language-c being efficient about memory usage.

@evincarofautumn I'm taking a stab at this. No PR yet but I wanted to track the idea.

Remove mentions of "permission" attribute

It looks like the correct syntax is __attribute__ ((ward(...))) but some tests and the README mention __attribute__((permission(...))).

Make annotations mandatory in some cases

E.g., in Mono, runtime entry- and exit-points should require annotations. (“things like icalls, the runtime API and when calling user supplied callbacks” — Kumpera)

Feature proposal: Parametric permissions

Currently permissions are merely names. However, Ward could be made significantly more powerful by extending the permission language with some notion of parameterisation on source program identifiers. To make this concrete, let's say we have some datastructure (call it Capability, drawing on the concept from GHC) which embeds a lock:

struct Capability {
  Mutex lock;
  int some_state;
};

We want to expose a function which can change some_state of a Capability yet only if the caller holds its lock. To do this we might define the following:

#define WARD(...) __attribute__((ward(__VA_ARGS__)))

WARD(need(may_take_capability_lock(cap)), grant(holds_capability_lock(cap)))
void acquire_capability_lock(Capability *cap) {
  ACQUIRE_MUTEX(&cap->lock);
}

WARD(need(holds_capability_lock(cap)), revoke(holds_capability_lock(cap)))
void release_capability_lock(Capability *cap) {
  RELEASE_MUTEX(&cap->lock);
}

WARD(need(holds_capability_lock(cap)))
void set_state(Capability *cap, int new_state) {
  cap->some_state = new_state;
}

Here we have extended the permission language as follows:

permission_name := string
permission := permission_name ['(' arg_list ')']
arg_list := argument [',' arg_list]
argument := c_identifier

Where argument must be either a global variable or an identifier bound in the function's argument list. For simplicity, call sites of functions with such permissions would be restricted to only "trivial" parameters (e.g. just identifiers). This would allow permissions to be easily propagated from the call-site to the definition site. For instance,

WARD(need(may_take_capability_lock(the_cap)))
void set_state_to_zero(Capability *the_cap) {
  acquire_capability_lock(the_cap);
  // We now have holds_capability_lock(the_cap) in our context
  set_state(the_cap, 0);
  // By the definition given above, the above call requires that we
  // have holds_capability_lock(the_cap), which we indeed have.
  release_capability_lock(the_cap);
}

The arity of a permission is fixed in the configuration. For instance, the Ward configuration for the above might look like:

holds_capability_lock(cap) "The given capability's lock is held";
may_take_capability_lock(cap) "The caller may take the given capability's lock";

Note that under this proposal arguments are untyped. That is, one is free to write seemingly nonsensical things like,

WARD(grant(may_take_capability_lock(n)))
void do_something(int n) {  /* ... */ }

Additionally, the fact that permissions may take multiple arguments allows us to represent relations between runtime values. For instance, in GHC, Tasks can "own" Capabilities, allowing us to do some things in a lock-free manner,

__thread struct Task my_task; // the current thread's task

WARD(need(task_owns_capability(my_task, cap)))
void do_something(Capability *cap) { /* ... */ }

Limitations

There are, of course, many cases that this minimal proposal is not able to cover. For instance,

void set_all_to_zero(int n, Capability *caps[n]) {
  for (int i=0; i < n; i++)
    set_state_to_zero(caps[i]);
}

We sadly have no way to express the permission requirements of this function. To handle this we need the ability to embed permission actions in function bodies:

void set_all_to_zero(int n, Capability *caps[n]) {
  for (int i=0; i < n; i++) {
    Capability *cap = caps[i];
    WARD(grant(may_take_capability_lock(cap)));
    set_state_to_zero(cap);
  }
}

This of course makes set_all_to_zero part of the trusted codebase.

Extensions

Return terms

We could do slightly better on the above case by extending our permission syntax with:

argument := c_identifier
          | 'return'

Where the return keyword denotes the return value of the function. This allows us to write:

WARD(need(may_take_capability_lock(caps)), grant(may_take_capability_lock(return)))
Capability *get_capability(int i, Capability *caps[i]) {
  return caps[i];
}

Allowing indexing

One could imagine further extending the permission syntax with:

argument := term [ '[' integer_literal ']' ]
term := c_identifier
      | 'return'

which would allow a few more patterns to be captured. For instance, output parameters:

WARD(grant(may_take_capability_lock(cap[0])))
void get_a_capability(Capability **cap) {  /* cap is an output */
  *cap = ...;
}

Implementation

My suspicion is that none of the above should be particularly difficult to implement. The only non-trivial aspect of the proposal is the argument renaming necessary at call sites, but this is a simple mechanical rewrite.
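
A minimal sketch of that renaming step, assuming call-site arguments are restricted to plain identifiers as proposed above. The function `rename_argument` and its array-based interface are hypothetical and not part of Ward: a permission argument that names one of the callee's formal parameters is replaced by the identifier passed at the call site, and anything else is assumed to be a global and kept as is.

```c
#include <stddef.h>
#include <string.h>

/* Map a permission argument from the callee's naming to the caller's.
 * formals[i] is the callee's i-th parameter name and actuals[i] is the
 * identifier passed for it at the call site. (Hypothetical helper.) */
static const char *rename_argument(const char *arg,
                                   const char *const formals[],
                                   const char *const actuals[],
                                   size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(arg, formals[i]) == 0)
            return actuals[i];
    }
    /* Not a parameter: assumed to be a global variable, so unchanged. */
    return arg;
}
```

For the call acquire_capability_lock(the_cap), the formal `cap` maps to `the_cap`, so holds_capability_lock(cap) in the callee's annotation becomes holds_capability_lock(the_cap) in the caller's context.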

Something isn't right with static functions in a CallMap

Consider the following example file "foo.c":

int foo (int a, int b) __attribute__((ward(deny(coop_can_checkpoint))));
static int bar (int a, int b);

int 
foo (int a, int b)  {
	if (a < 0)
		return foo (b, a - b);
	else
		return bar (a, b);
}

static int
bar (int a, int b) {
	int x = foo (1, a);
	int y = foo (1, b);
	return foo (x, y);
}

I expect that this defines 2 functions foo and bar where bar is static and foo is not.

Here's the callgraph I get out, however:

(callmap
  (function "foo" (node (span (source "foo.c" 128 2 1) 1 (source "foo.c" 199 2 72))(name 19))
    (actions (deny coop_can_checkpoint))
    (calltree
      (choice
        (call "foo")
        (call "foo.c`bar"))))
  (function "bar" (node (span (source "foo.c" 202 4 1) 1 (source "foo.c" 231 4 30))(name 33))
    (actions )
    (calltree
    ))
  (function "foo.c`bar" (node (span (source "foo.c" 327 15 1) 1 (source "foo.c" 421 20 1))(name 116))
    (actions )
    (calltree
      (call "foo")
      (call "foo")
      (call "foo")))

Note that there are evidently 3 functions. It looks like the calls end up being to the right functions, but I'm still surprised to see "bar" defined at all.

Config file for permission relationships

Re. mono/mono#4529 (comment) and mono/mono#4529 (comment), we want a way to specify how permissions are related. I propose adding a --config=<path> / -C<path> option, which reads a config file consisting of a series of declarations, each of which defines a permission or a relationship between permissions.

<config> ::= <decl>*
<decl> ::= <name> ("->" <expr>)? <desc>? ";"
<expr> ::= <or-expr>
<or-expr> ::= <and-expr> ("|" <and-expr>)*
<and-expr> ::= <term> ("&" <term>)*
<term> ::= <name> | "!" <term> | "(" <expr> ")"
<name> ::= /^[A-Za-z_][0-9A-Za-z_]*$/
<desc> ::= /^"([^"\\]|\\[\\"])*"$/

For example, suppose the foo lock can only be taken when the bar lock is held and the baz lock is not held.

lock_foo; foo_locked;
lock_bar; bar_locked;
lock_baz; baz_locked;

lock_foo -> bar_locked & !baz_locked;

Now checking need(lock_foo) also implies checking need(bar_locked) and deny(baz_locked).
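
One way to picture checking such a declaration is sketched below in C. This is an illustration under assumptions, not Ward's implementation: permissions are modeled as bits in a context bitset, the `struct expr` tree and both function names are invented, and a declaration `a -> e` is read as "whenever `a` is in the context, `e` must hold".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit assignments for three of the permissions declared above. */
enum { LOCK_FOO = 1u << 0, BAR_LOCKED = 1u << 1, BAZ_LOCKED = 1u << 2 };

enum expr_kind { EXPR_NAME, EXPR_NOT, EXPR_AND, EXPR_OR };

/* Expression tree mirroring the <expr> grammar: names, "!", "&" and "|". */
struct expr {
    enum expr_kind kind;
    unsigned name;            /* permission bit, used by EXPR_NAME */
    const struct expr *lhs;   /* operand of EXPR_NOT, left of AND/OR */
    const struct expr *rhs;   /* right of AND/OR */
};

static bool eval_expr(const struct expr *e, unsigned context) {
    switch (e->kind) {
    case EXPR_NAME: return (context & e->name) != 0;
    case EXPR_NOT:  return !eval_expr(e->lhs, context);
    case EXPR_AND:  return eval_expr(e->lhs, context) && eval_expr(e->rhs, context);
    case EXPR_OR:   return eval_expr(e->lhs, context) || eval_expr(e->rhs, context);
    }
    return false;
}

/* "subject -> e" holds if subject is absent from the context or e is true. */
static bool restriction_holds(unsigned subject, const struct expr *e,
                              unsigned context) {
    return (context & subject) == 0 || eval_expr(e, context);
}

/* lock_foo -> bar_locked & !baz_locked */
static const struct expr bar_locked = { EXPR_NAME, BAR_LOCKED, NULL, NULL };
static const struct expr baz_locked = { EXPR_NAME, BAZ_LOCKED, NULL, NULL };
static const struct expr not_baz    = { EXPR_NOT, 0, &baz_locked, NULL };
static const struct expr foo_rule   = { EXPR_AND, 0, &bar_locked, &not_baz };
```

Under this reading, a context containing lock_foo and bar_locked satisfies the rule, while adding baz_locked to that context violates it.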

When a config file is specified, use of a permission not declared in the config is an error, rather than implicitly creating the permission.

Parse config file (e95fa7e)
Check restrictions when checking permissions
Disallow undeclared permissions when using a config

Is simplifyCallTree wrong on Choice Nop t?

simplifyCallTree (Choice a b) = case (simplifyCallTree a, simplifyCallTree b) of
  (a', Nop) -> a'
  (Nop, b') -> b'
  (a', b') -> Choice a' b'

This seems wrong to me. That's saying that fnA and fnB below are equivalent. Is that really what we want?

void f1 (void);
void f2 (void);

void fnA (int i) {
  if (i)
    f1 ();
  f2 ();
}

void fnB (int i) {
  f1 ();
  f2 ();
}

Conditionally granting, revoking, needing, or using a permission shouldn't be the same as always having it, should it?
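
For comparison, here is one possible alternative rule, modeled in C rather than Haskell and with made-up type and function names: a Choice is collapsed only when both branches are empty, so "maybe call f1" stays distinct from "always call f1". This is a sketch of the idea and not a proposed patch to simplifyCallTree.

```c
#include <stddef.h>

enum tree_kind { TREE_NOP, TREE_CALL, TREE_CHOICE };

/* Hypothetical stand-in for CallTree, reduced to the cases discussed here. */
struct call_tree {
    enum tree_kind kind;
    const char *callee;       /* used by TREE_CALL only */
    struct call_tree *a, *b;  /* used by TREE_CHOICE only */
};

/* Simplify a tree, keeping a Choice whenever either branch does something. */
static struct call_tree *simplify_call_tree(struct call_tree *t) {
    if (t->kind != TREE_CHOICE)
        return t;
    t->a = simplify_call_tree(t->a);
    t->b = simplify_call_tree(t->b);
    /* Both branches empty: the whole choice is a no-op. */
    if (t->a->kind == TREE_NOP && t->b->kind == TREE_NOP)
        return t->a;
    /* Otherwise keep the Choice, even if one branch is Nop. */
    return t;
}
```

Under this rule the `if (i) f1 ();` in fnA remains a Choice between a call and a no-op, so fnA and fnB are no longer treated as equivalent.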
