evincarofautumn / ward
A static analysis tool for C.
License: Other
At least in the HTML output, it would be nice to output some boring statistics (# of iterations until quiescence, total number of functions scanned, something like that), so that we can tell the analysis actually ran even when the final report says:
0 Warnings, 0 Errors
It would be helpful to have a document (e.g. in ott) formally specifying the judgments that define the type system. Currently the only reference is prose.
static function
static definitions
We currently produce compiler-style output:
/path/file.c:line:column: (note|warning|error): Message
It would be nice to have a --mode=html flag that produces a formatted report, e.g., for viewing on CI.
We should define __WARD__ or something similar so that non-Ward runs can ignore our attributes.
That is, so we can do:
#if defined(__WARD__)
# define WARD_PERMISSIONS(...) __attribute__((ward (__VA_ARGS__)))
#else
# define WARD_PERMISSIONS(...) /*empty*/
#endif
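As a rough sketch of how this would look in use: the block below assumes __WARD__ is predefined only when Ward is driving the preprocessor (that is the proposal, not current behaviour), and the two example_* functions and the counter are made up for illustration. With an ordinary compiler the annotations expand to nothing and the file builds unchanged.

```c
#include <assert.h>

/* __WARD__ is the macro proposed above; assumed to be predefined only by Ward. */
#if defined(__WARD__)
# define WARD_PERMISSIONS(...) __attribute__((ward(__VA_ARGS__)))
#else
# define WARD_PERMISSIONS(...) /* empty */
#endif

/* Hypothetical lock state, only here so the example has something to do. */
static int sm_lock_depth = 0;

/* Annotated declarations: visible to Ward, invisible to other compilers. */
WARD_PERMISSIONS(need(take_sm_lock), revoke(take_sm_lock), grant(sm_lock_held))
void example_acquire_sm_lock(void);

WARD_PERMISSIONS(need(sm_lock_held), revoke(sm_lock_held), grant(take_sm_lock))
void example_release_sm_lock(void);

void example_acquire_sm_lock(void) {
    sm_lock_depth++;
}

void example_release_sm_lock(void) {
    assert(sm_lock_depth > 0);  /* releasing an unheld lock is a bug */
    sm_lock_depth--;
}
```

Keeping the annotation on the declaration rather than the definition means headers carry the permission contract, which is where callers in other translation units will see it.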
Using 05b02cf I am seeing some rather concerning behavior when processing callmaps. I am using this configuration file:
sm_lock_held "assume the storage manager lock is held";
take_sm_lock "the ability to take the storage manager lock"
-> ! sm_lock_held;
.enforce "sm/NonMovingMark.c";
.enforce "sm/NonMovingMark.h";
.enforce "sm/BlockAlloc.c";
.enforce "rts/sm/BlockAlloc.h";
.enforce "Block.h";
I have two callmap files: rts/sm/BlockAlloc.c.ward.graph and rts/Capability.c.ward.graph.
Checking the Capability callmap alone correctly reports no errors:
$ ward gcc --mode=compiler --config=rts/config.ward rts/Capability.c.ward.graph
Loading config files...
Preprocessing and parsing...
Checking...
Warnings: 0, Errors: 0
Checking the BlockAlloc callmap alone correctly reports several errors:
$ ward gcc --mode=compiler --config=rts/config.ward rts/sm/BlockAlloc.c.ward.graph
Loading config files...
Preprocessing and parsing...
Checking...
rts/sm/BlockAlloc.h:13: error: missing required annotation on 'allocLargeChunk'; annotation [] is missing: [revoke(take_sm_lock),need(take_sm_lock),grant(sm_lock_held)]
includes/rts/storage/Block.h:320: error: missing required annotation on 'allocGroupOnNode_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:315: error: missing required annotation on 'allocGroup_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:317: error: missing required annotation on 'allocBlock_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:322: error: missing required annotation on 'allocBlockOnNode_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:333: error: missing required annotation on 'freeGroup_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:335: error: missing required annotation on 'freeChain_lock'; annotation [] is missing: [need(take_sm_lock)]
Warnings: 0, Errors: 7
However, when I check the Capability and BlockAlloc callmaps together, the errors vanish:
$ ward gcc --mode=compiler --config=rts/config.ward rts/Capability.c.ward.graph rts/sm/BlockAlloc.c.ward.graph
Loading config files...
Preprocessing and parsing...
Checking...
Warnings: 0, Errors: 0
Even stranger, when I flip the order of the two, the errors are again correctly reported:
$ ward gcc --mode=compiler --config=rts/config.ward rts/sm/BlockAlloc.c.ward.graph rts/Capability.c.ward.graph
Loading config files...
Preprocessing and parsing...
Checking...
rts/sm/BlockAlloc.h:13: error: missing required annotation on 'allocLargeChunk'; annotation [] is missing: [revoke(take_sm_lock),need(take_sm_lock),grant(sm_lock_held)]
includes/rts/storage/Block.h:320: error: missing required annotation on 'allocGroupOnNode_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:315: error: missing required annotation on 'allocGroup_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:317: error: missing required annotation on 'allocBlock_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:322: error: missing required annotation on 'allocBlockOnNode_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:333: error: missing required annotation on 'freeGroup_lock'; annotation [] is missing: [need(take_sm_lock)]
includes/rts/storage/Block.h:335: error: missing required annotation on 'freeChain_lock'; annotation [] is missing: [need(take_sm_lock)]
Warnings: 0, Errors: 7
Currently there is no distinction between permissions needed directly and those needed indirectly. mono/mono#4529 (comment) suggests adding a “use” annotation to the body of a function that actually uses a permission, e.g., __WARD_USE__ (foo_locked). Ward should raise a warning when you need a permission without using it (directly or indirectly), in order to help prevent stale/redundant annotations.
Take a look at https://github.com/evincarofautumn/Ward/blob/minimal-annotations/src/Check/Permissions.hs#L655
let conflicts = HashSet.filter conflicting $ mconcat sites
unless (HashSet.null conflicts) $ do
  record True $ Error pos $ Text.concat $
    [ "conflicting information for permissions "
    , Text.pack $ show $ sort $ map presencePermission
      $ HashSet.toList conflicts
    , " in '"
    , name
    , "'"
    ]
This will find all conflicts in sites :: Vector [Site] and report on them. The problem is that sites comes from https://github.com/evincarofautumn/Ward/blob/minimal-annotations/src/Check/Permissions.hs#L587
do
  sites <- liftIO $ fmap Vector.toList $ Vector.freeze $ nodeSites node
  reportCallSites restrictions (sites, nodeCalls node, name, pos)
And the nodeSites record field in Node only contains one element for each toplevel element of nodeCalls :: CallTree (Function). It does not have any sites for calls beneath conditionals.
Now I don't think this means we won't report those conflicts at all. Rather, I think we'll just report a conflict at the position of the if statement. This is... not great.
The same problem, but worse, will happen with restrictions https://github.com/evincarofautumn/Ward/blob/minimal-annotations/src/Check/Permissions.hs#L667
for_ (zip [0 :: Int ..] sites) $ \ (index, s) -> do
  let
    position = case index of
      0 -> ["before first call"]
      _ ->
        [ "at "
        , Text.pack $ show $ callTreeIndex (index - 1) callees
        ]
  for_ restrictions $ \ restriction -> do
    unless (evalRestriction s restriction) $ do
      record True $ Error pos $ Text.concat $
        [ "restriction "
        , Text.pack $ show restriction
        , " violated in '"
        , name
        {-
        , "' with permissions '"
        , Text.pack $ show $ HashSet.toList s
        -}
        , "' "
        ]
        <> position
The issue here is worse, I think, because a restriction could be true at a Choice node, but false at a call in one of its branches.
Anyway, I think I can probably make this all go away by recomputing the site info under Choice when reporting. (1. I think we can get the info in one pass, with no need to iterate again, since the Calls will have the best info we have gathered. 2. This should be pretty cheap to do; I think the propagation algorithm is fairly inexpensive.)
I was looking into using Ward to lint GHC's runtime system, starting with simple lock checking. Unfortunately, even with no permissions defined and enforcement enabled for a single file, the check runs for more than 10 minutes before sending my laptop with 32GB of RAM into swap-death. This seems a bit high for a 50kLoC codebase.
Checking each source file individually typically takes around 30 seconds per file. Is this the recommended strategy for non-small projects?
Consider
void foo () {
  f1 ();
  if (cond) {
    f2 ();
    return;
  }
  f3 ();
}
I haven't looked too closely, but does CallTree represent the join point between the return after f2() and the fallthrough after f3()? Does it need to? (The usual representation for functions is a graph, not a tree, of single-entry-single-exit basic blocks.)
Having Ward on Hackage would make usage much more convenient.
This issue is to track functions with (seemingly) legitimate permission errors that need to be fixed in order to use Ward with Mono.
Failure to lock or possible deadlock in mono-profiler-log.c: heap_walk assumes the GC lock is held and the world is stopped by calling mono_gc_walk_heap, but also assumes it can take the GC lock and stop the world in the EXIT_LOG_EXPLICIT macro, which may call process_requests, which may call mono_gc_collect. gc_event shares this problem by calling heap_walk.
Possible deadlock in sgen-gc.c: major_copy_or_mark_from_roots assumes the GC lock is held and the world is stopped, but calls sgen_nursery_allocator_prepare_for_pinning, which may indirectly take the GC lock via sgen_clear_allocator_fragments → sgen_clear_range → sgen_client_array_fill_range → get_array_fill_vtable → mono_gc_make_descr_for_array.
sgen-gc.c: collect_nursery assumes the world is stopped, but calls sgen_debug_verify_nursery, which may call sgen_nursery_allocator_prepare_for_pinning.
https://github.com/evincarofautumn/Ward/blob/minimal-annotations/src/Check/Permissions.hs#L519
writeIORef growing =<< permissionsFromCallSites (nodePermissions node) sites
Isn't this wrong? It should be something like:
nodeGrowing <- permissionsFromCallSites (nodePermissions node) sites
modifyIORef' growing (|| nodeGrowing)
That is, if any node grew, then re-run the whole thing. Right now it only reruns if the last node grew.
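To make the difference concrete, here is a toy model of the loop in C. It is not Ward's code: Node, propagate and run_to_fixed_point are invented for illustration, permissions are a plain bitset, and each node has at most one callee. The point is only that the flag is OR-accumulated across the whole pass instead of being overwritten by each node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a call-graph node. */
typedef struct {
    unsigned perms;  /* bitset of permissions inferred so far */
    int callee;      /* index of the single callee, or -1 for none */
} Node;

/* Pull the callee's permissions into node i; report whether node i grew. */
static bool propagate(Node *nodes, size_t i) {
    if (nodes[i].callee < 0) return false;
    unsigned before = nodes[i].perms;
    nodes[i].perms |= nodes[nodes[i].callee].perms;
    return nodes[i].perms != before;
}

/* Repeat whole passes until no node grows; returns the number of passes. */
size_t run_to_fixed_point(Node *nodes, size_t n) {
    assert(nodes != NULL);
    size_t passes = 0;
    bool growing;
    do {
        growing = false;
        for (size_t i = 0; i < n; i++) {
            bool node_grew = propagate(nodes, i);
            /* Accumulate: overwriting here would keep only the last node's answer. */
            growing = growing || node_grew;
        }
        passes++;
    } while (growing);
    return passes;
}
```

With a chain 0 → 1 → 2 followed by an unrelated leaf node 3, node 1 grows in the first pass but node 3 does not. Overwriting the flag would stop after that pass and node 0 would never see the permission; accumulating it forces the second pass that node 0 needs.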
In compiler mode I have noticed that errors are reported on a single line, greatly compromising legibility. For instance:
$ ward gcc --mode=compiler --config=rts/config.ward rts/sm/BlockAlloc.c.ward.graph rts/Capability.c.ward.graph
Loading config files...
Preprocessing and parsing...
Checking...
rts/sm/BlockAlloc.h:13: error: missing required annotation on 'allocLargeChunk'; annotation [] is missing: [revoke(take_sm_lock),need(take_sm_lock),grant(sm_lock_held)]includes/rts/storage/Block.h:320: error: missing required annotation on 'allocGroupOnNode_lock'; annotation [] is missing: [need(take_sm_lock)]includes/rts/storage/Block.h:315: error: missing required annotation on 'allocGroup_lock'; annotation [] is missing: [need(take_sm_lock)]includes/rts/storage/Block.h:317: error: missing required annotation on 'allocBlock_lock'; annotation [] is missing: [need(take_sm_lock)]includes/rts/storage/Block.h:322: error: missing required annotation on 'allocBlockOnNode_lock'; annotation [] is missing: [need(take_sm_lock)]includes/rts/storage/Block.h:333: error: missing required annotation on 'freeGroup_lock'; annotation [] is missing: [need(take_sm_lock)]includes/rts/storage/Block.h:335: error: missing required annotation on 'freeChain_lock'; annotation [] is missing: [need(take_sm_lock)]Warnings: 0, Errors: 7
The idea is that we can do a pass over a bunch of C source files to extract the callgraph, and run the analysis separately. If we do this right we can scan the C source files one at a time and be less dependent on language-c being efficient about memory usage.
@evincarofautumn I'm taking a stab at this. No PR yet but I wanted to track the idea.
It looks like the correct syntax is __attribute__ ((ward(...))) but some tests and the README mention __attribute__((permission(...))).
E.g., in Mono, runtime entry- and exit-points should require annotations. (“things like icalls, the runtime API and when calling user supplied callbacks” — Kumpera)
Currently permissions are merely names. However, Ward could be made significantly more powerful by extending the permission language with some notion of parameterisation on source program identifiers. To make this concrete, let's say we have some data structure (call it Capability, drawing on the concept from GHC) which embeds a lock:
struct Capability {
Mutex lock;
int some_state;
};
We want to expose a function which can change some_state of a Capability, yet only if the caller holds its lock. To do this we might define the following:
#define WARD(...) __attribute__((ward(__VA_ARGS__)))
WARD(need(may_take_capability_lock(cap)), grant(holds_capability_lock(cap)))
void acquire_capability_lock(Capability *cap) {
    ACQUIRE_MUTEX(&cap->lock);
}
WARD(need(holds_capability_lock(cap)), revoke(holds_capability_lock(cap)))
void release_capability_lock(Capability *cap) {
    RELEASE_MUTEX(&cap->lock);
}
WARD(need(holds_capability_lock(cap)))
void set_state(Capability *cap, int new_state) {
    cap->some_state = new_state;
}
Here we have extended the permission language as follows:
permission_name := string
permission := permission_name ['(' arg_list ')']
arg_list := argument [',' arg_list]
argument := c_identifier
Where argument must be either a global variable or an identifier bound in the function's argument list. For simplicity, call sites of functions with such permissions would be restricted to only "trivial" parameters (e.g. just identifiers). This would allow permissions to be easily propagated from the call-site to the definition site. For instance,
WARD(need(may_take_capability_lock(the_cap)))
void set_state_to_zero(Capability *the_cap) {
    acquire_capability_lock(the_cap);
    // We now have holds_capability_lock(the_cap) in our context
    set_state(the_cap, 0);
    // By the definition given above, the above call requires that we
    // have holds_capability_lock(the_cap), which we indeed have.
    release_capability_lock(the_cap);
}
The arity of a permission is fixed in the configuration. For instance the Ward configuration for the above might look like:
holds_capability_lock(cap) "The given capability's lock is held";
may_take_capability_lock(cap) "The caller may take the given capability's lock";
Note that under this proposal arguments are untyped. That is, one is free to write seemingly nonsensical things like:
WARD(grant(may_take_capability_lock(n)))
void do_something(int n) { /* ... */ }
Additionally, the fact that permissions may take multiple arguments allows us to represent relations between runtime values. For instance, in GHC, Tasks can "own" Capabilities, allowing us to do some things in a lock-free manner:
__thread struct Task my_task; // the current thread's task
WARD(need(task_owns_capability(my_task, cap)))
void do_something(Capability *cap) { /* ... */ }
There are, of course, many cases that this minimal proposal is not able to cover. For instance,
void set_all_to_zero(int n, Capability *caps[n]) {
    for (int i=0; i < n; i++)
        set_state_to_zero(caps[i]);
}
We sadly have no way to express the permission requirements of this function. To handle this we need the ability to embed permission actions in function bodies:
void set_all_to_zero(int n, Capability *caps[n]) {
    for (int i=0; i < n; i++) {
        Capability *cap = caps[i];
        WARD(grant(may_take_capability_lock(cap)));
        set_state_to_zero(cap);
    }
}
This of course makes set_all_to_zero part of the trusted codebase.
We could do slightly better on the above case by extending our permission syntax with:
argument := c_identifier
| 'return'
Where the return keyword denotes the return value of the function. This allows us to write:
WARD(need(may_take_capability_lock(caps)), grant(may_take_capability_lock(return)))
Capability *get_capability(int i, Capability *caps[i]) {
    return caps[i];
}
One could imagine further extending the permission syntax with:
argument := term [ '[' integer_literal ']' ]
term := c_identifier
| 'return'
which would allow a few more patterns to be captured. For instance, output parameters:
WARD(grant(may_take_capability_lock(cap[0])))
void get_a_capability(Capability **cap) { /* cap is an output */
    *cap = ...;
}
My suspicion is that none of the above should be particularly difficult to implement. The only non-trivial aspect of the proposal is the argument renaming necessary at call-sites, but this is a simple mechanical rewrite.
Consider the following example file "foo.c":
int foo (int a, int b) __attribute__((ward(deny(coop_can_checkpoint))));
static int bar (int a, int b);
int
foo (int a, int b) {
  if (a < 0)
    return foo (b, a - b);
  else
    return bar (a, b);
}
static int
bar (int a, int b) {
  int x = foo (1, a);
  int y = foo (1, b);
  return foo (x, y);
}
I expect that this defines 2 functions, foo and bar, where bar is static and foo is not.
Here's the callgraph I get out, however:
(callmap
(function "foo" (node (span (source "foo.c" 128 2 1) 1 (source "foo.c" 199 2 72))(name 19))
(actions (deny coop_can_checkpoint))
(calltree
(choice
(call "foo")
(call "foo.c`bar"))))
(function "bar" (node (span (source "foo.c" 202 4 1) 1 (source "foo.c" 231 4 30))(name 33))
(actions )
(calltree
))
(function "foo.c`bar" (node (span (source "foo.c" 327 15 1) 1 (source "foo.c" 421 20 1))(name 116))
(actions )
(calltree
(call "foo")
(call "foo")
(call "foo")))
Note that there are evidently 3 functions. It looks like the calls end up being to the right functions, but I'm still surprised to see "bar" defined at all.
Re. mono/mono#4529 (comment) and mono/mono#4529 (comment), we want a way to specify how permissions are related. I propose adding a --config=<path> / -C<path> option, which reads a config file consisting of a series of declarations, each of which defines a permission or a relationship between permissions.
<config> ::= <decl>*
<decl> ::= <name> ("->" <expr>)? <desc>? ";"
<expr> ::= <or-expr>
<or-expr> ::= <and-expr> ("|" <and-expr>)*
<and-expr> ::= <term> ("&" <term>)*
<term> ::= <name> | "!" <term> | "(" <expr> ")"
<name> ::= /^[A-Za-z_][0-9A-Za-z_]*$/
<desc> ::= /^"([^"\\]|\\[\\"])*"$/
For example, suppose the foo lock can only be taken when the bar lock is held and the baz lock is not held.
lock_foo; foo_locked;
lock_bar; bar_locked;
lock_baz; baz_locked;
lock_foo -> bar_locked & !baz_locked;
Now checking need(lock_foo) also implies checking need(bar_locked) and deny(baz_locked).
When a config file is specified, use of a permission not declared in the config is an error, rather than implicitly creating the permission.
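One way to read a declaration like lock_foo -> bar_locked & !baz_locked is as an implication over the set of permissions in the current context. The sketch below evaluates the <expr> grammar under that reading. Everything here (Expr, eval_expr, restriction_holds, the bitset encoding) is made up for illustration and says nothing about how Ward would actually represent or check restrictions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: one bit per declared permission. */
enum { LOCK_FOO = 1u << 0, BAR_LOCKED = 1u << 1, BAZ_LOCKED = 1u << 2 };

/* AST for <expr>: names combined with "!", "&" and "|". */
typedef enum { EXPR_NAME, EXPR_NOT, EXPR_AND, EXPR_OR } ExprKind;

typedef struct Expr {
    ExprKind kind;
    unsigned perm;                 /* EXPR_NAME: the permission's bit */
    const struct Expr *lhs, *rhs;  /* operands; rhs unused for EXPR_NOT */
} Expr;

/* Evaluate an expression against the set of permissions currently held. */
bool eval_expr(const Expr *e, unsigned held) {
    assert(e != NULL);
    switch (e->kind) {
    case EXPR_NAME: return (held & e->perm) != 0;
    case EXPR_NOT:  return !eval_expr(e->lhs, held);
    case EXPR_AND:  return eval_expr(e->lhs, held) && eval_expr(e->rhs, held);
    case EXPR_OR:   return eval_expr(e->lhs, held) || eval_expr(e->rhs, held);
    }
    return false;
}

/* "name -> expr": if `name` is in the context, `expr` must hold there too. */
bool restriction_holds(unsigned name, const Expr *expr, unsigned held) {
    return (held & name) == 0 || eval_expr(expr, held);
}
```

Under this reading, a context holding lock_foo and bar_locked satisfies the declaration, while one that also holds baz_locked, or that lacks bar_locked, violates it. A context without lock_foo satisfies it vacuously.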
simplifyCallTree (Choice a b) = case (simplifyCallTree a, simplifyCallTree b) of
  (a', Nop) -> a'
  (Nop, b') -> b'
  (a', b') -> Choice a' b'
This seems wrong to me. That's saying that fnA and fnB below are equivalent. Is that really what we want?
void f1 (void);
void f2 (void);
void fnA (int i) {
  if (i)
    f1 ();
  f2 ();
}
void fnB (int i) {
  f1 ();
  f2 ();
}
Conditionally granting, revoking, needing, or using a permission shouldn't be the same as always doing so, should it?
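A small model of why the two should differ, under one assumed semantics: treat a Choice as joining its branches by keeping only the permissions held on both paths. The Tree type and held_after below are invented for this sketch and are not Ward's CallTree. If f1 grants some permission, fnA only has it on one path, while fnB always has it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call tree: nop, a call, a sequence, or a two-way choice. */
typedef enum { TREE_NOP, TREE_CALL, TREE_SEQ, TREE_CHOICE } TreeKind;

typedef struct Tree {
    TreeKind kind;
    unsigned grants;           /* TREE_CALL: bitset the callee grants */
    const struct Tree *a, *b;  /* children of TREE_SEQ / TREE_CHOICE */
} Tree;

/* Permissions definitely held after running `t`, starting from `held`. */
unsigned held_after(const Tree *t, unsigned held) {
    assert(t != NULL);
    switch (t->kind) {
    case TREE_NOP:    return held;
    case TREE_CALL:   return held | t->grants;
    case TREE_SEQ:    return held_after(t->b, held_after(t->a, held));
    /* Assumed join: only what both branches agree on survives. */
    case TREE_CHOICE: return held_after(t->a, held) & held_after(t->b, held);
    }
    return held;
}
```

In this model, rewriting Choice a Nop to a changes the answer for fnA, because the Nop branch is exactly what records that f1 might not run.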