microsoft / checkedc

Checked C is an extension to C that lets programmers write C code that is guaranteed by the compiler to be type-safe. The goal is to let people easily make their existing C code type-safe and eliminate entire classes of errors. Checked C does not address use-after-free errors. This repo has a wiki for Checked C, sample code, the specification, and test code.

Home Page: https://www.microsoft.com/en-us/research/project/checked-c/

License: Other

TeX 39.16% Makefile 0.09% C 60.22% CMake 0.13% R 0.40%

c programming-language system-programming clang llvm microsoft

checkedc's Introduction

Checked C

Checked C adds static and dynamic checking to C to detect or prevent common programming errors such as buffer overruns and out-of-bounds memory accesses. The goal of the project is to improve systems programming by making fundamental improvements to C. This repo contains sample code, the extension specification, and test code.

For a quick overview of Checked C, more information, and pointers to example code, see our Wiki.
The PDF of the specification is available here.
Compilers are available here.
The Checked C clang repo is here.
The instructions to build and test the Checked C compiler are documented on the Checked C clang wiki.

Publications and Presentations

We presented a research paper on Checked C at the IEEE 2018 Cybersecurity Development Conference: "Checked C: Making C Safe by Extension". The paper describes the key ideas of Checked C in 8 pages. Note that we have added features to Checked C for improving type safety (and reducing type confusion) since writing the paper. The Wiki and specification provide up-to-date descriptions of Checked C.
We presented another paper on Checked C at the 2019 Principles of Security and Trust Conference: "Achieving Safety Incrementally With Checked C". This paper describes a tool for converting existing C code to use Ptr types. It also proves a blame property about checked regions that shows that checked regions are blameless for any memory corruption. This proof is formalized for a core subset of the language extension.
We presented a poster at the LLVM Dev Meeting 2019: "Overflows Be Gone: Checked C for Memory Safety". The poster provides an introduction to Checked C, outlines the compiler implementation and presents an experimental evaluation of Checked C.
We presented a talk (slides) at the 2020 LLVM Virtual Dev Meeting: "Checked C: Adding memory safety support to LLVM". The talk describes the design of bounds annotations for checked pointers and array pointers as well as the framework for the static checking of the soundness of bounds. We also briefly describe novel algorithms to automatically widen bounds for null-terminated arrays and for comparison of expressions for equivalence.

Build Status

Configuration	Testing	Status
Debug X86 Windows	Checked C and clang regression tests
Debug X64 Windows	Checked C and clang regression tests
Debug X64 Linux	Checked C and clang regression tests
Release X64 Linux	Checked C, clang, and LLVM nightly tests

Participating

We're happy to have the help! You can contribute by trying out Checked C, reporting bugs, and giving us feedback. There are other ways to contribute too. You can watch the announcement page for announcements about the project.

Licensing

The software in this repository is covered by the MIT license. See the file LICENSE.TXT for the license. The Checked C specification is made available by Microsoft under the OpenWeb Foundation Final Specification Agreement, version 1.0. Contributions of code to the Checked LLVM/clang repos are subject to the CLANG/LLVM licensing terms.

Code of conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

checkedc's Issues

Simplify and extend test code in checkedc_scope_pragma.c

In pull request #203, I wrote:

2100 lines of test code is a lot of code to test the BOUNDS_CHECKED pragma. This may be costly to maintain in the future. Please simplify this test code to be more concise. I recommend selectively checking properties that you'd expect to be true with a checked or unchecked scope.

For example, you already test that unchecked parameters are not allowed when BOUNDS_CHECKED has been turned on. It would be good to have a few tests of that. I don't think we need as many as you have. You also check that unchecked local variables are not allowed when BOUNDS_CHECKED has been turned on. It would be good to also have a few tests of that.

Pragmas can also be turned on and off and I believe can be limited to specific scopes. Could you check that too?

So that I can commit changes to the compiler, I'm merging the pull request. Please revise the code for the BOUNDS_CHECKED pragma.

Describe bounds-safe interoperation types

The specification currently describes a way to annotate that a variable with unchecked pointer type has a bounds-safe interface that is a ptr type. A ptr annotation is used in place of a bounds expression. In practice, this is not sufficient to describe the bounds-safe interface for int **. In the clang implementation, we have generalized this to be a type annotation that is used in place of a bounds expression. The type annotation describes an alternate type. We need to update the specification with the new approach.

Allow use of unchecked keyword to declare unchecked arrays?

When an array declarator is declared to be checked, by default the checked property propagates to directly-nested array types. This is provided that the nested array types are declared as part of the declarator.

For example, int a checked[10][10]; declares a checked array of checked arrays. This should be thought of as syntactic shorthand for int a checked[10]checked[10]. In contrast, typedef int myarr[10]; myarr a checked[10] declares a checked array of unchecked arrays.

For the purposes of language completeness, it might be useful to allow programmers to declare an array to be unchecked. We don't recommend mixing checked arrays and unchecked arrays in a multi-dimensional array declaration, but when incrementally converting code, the need for this may come up. To do this, we might allow an array declarator to be prefixed by the 'unchecked' keyword.

discuss terminology of safe vs. unsafe pointers

We are using the terminology of safe and unsafe pointer types in the document. It might be better to call them checked and unchecked pointer types instead. There are fewer negative connotations to the term unchecked.

Think through declarations of bounds for checked array variables.

Currently, it is possible to declare bounds for a fixed-size checked array. Consider the following example of declaring a global array with bounds:

int len;
float g50 checked[10] : count(len);

This implies that 0 <= len < 10 must hold true throughout the program. Currently there is no constraint on len. We could require a where clause for len that implies that len is always within range of the fixed-size array. We would face a similar issue for variable-sized arrays.

Forbidding bounds on fixed-size arrays is not a reasonable option. It is a C idiom to have an incomplete array type as part of an interface:

extern int len;
extern float g50 checked[]

Allow implicit conversions from pointers to unchecked arrays to pointers to checked arrays

Currently, checked arrays and unchecked arrays are distinct types. This will cause problems when passing a multi-dimensional unchecked array to a checked array parameter. Consider the following code:

void f(int checked arr[10][10]);

During type checking, arr is treated as a pointer to a 10-element checked integer array:

int (*arr) checked[10];

If we have code of the form:

int g() {
    int myarr[10][10];
    f(myarr);
}

Typechecking will fail. myarr will be converted to a pointer to an unchecked array.

int (*myarr)[10]

and the array types will not be compatible. Compatibility is a specific notion in the C Definition. It is problematic that typechecking fails because we know that the actual size of the data is sufficient.

I propose that we add an implicit conversion from pointers to unchecked arrays to pointers to checked arrays, where the only difference between the array types is whether the arrays are checked or unchecked. We will rely on the checking of bounds declarations to enforce that the pointer to an unchecked array points to a valid region of memory of sufficient size for the checked array.

This is actually a deeper change at the level of the C Definition. C only defines compatibility, which means roughly "two types are equivalent". C does not have a notion of "assignment compatibility" that differentiates between source and destination types. With bounds checking, assignment compatibility arises naturally. It is OK to pass an array of T with more elements than expected to a function. It is also OK to pass an unchecked array (of sufficient runtime size) where a checked array is expected. The reverse, though, is not OK. This means that we need to have a "directional" notion of compatibility.

Disallow array_ptrs to function types

Function types have no size associated with them. For this reason, it does not make sense to have array_ptrs to function types. Pointer arithmetic is not meaningful for these kinds of array_ptrs, which implies that we can't define bounds checks for these kinds of pointers.

Provide guidance on bounds-safe interfaces for function pointers.

The specification allows bounds-safe interfaces on parameters of function pointer types. Right now, we can write something like:

void qsort(void *base : byte_count(nmemb * size),
           size_t nmemb, size_t size,
           int(*compar)(const void *, const void *) :
             itype(_Ptr<int(const void * : itype(_Ptr<const void>),
                             const void * : itype(_Ptr<const void>))>));

The bounds-safe interface type for compar is pretty confusing. It could be _Ptr<int (const void *, const void *)> or _Ptr<int (_Ptr<const void>, _Ptr<const void>)>. The first is a checked pointer to a function that takes unchecked pointer arguments. The second is a checked pointer to a function that takes checked pointer arguments.

In the event that qsort is called from code that mixes checked and unchecked code, we require that if one parameter takes a checked type, all the other parameters with pointer type have to take a checked type. It seems non-intuitive that we'd allow base to be a checked array and compar to be a function that operates on unchecked pointers.

The intent of a bounds-safe interface type is to define a checked interface to existing unchecked code. I suggest that we simplify things for now by forbidding bounds-safe interfaces on parameters in function types. The revised definition would be:

void qsort(void *base : byte_count(nmemb * size),
           size_t nmemb, size_t size,
           int(*compar)(const void *, const void *) :
             itype(_Ptr<int (_Ptr<const void>, _Ptr<const void>)>));

In this approach, the bounds-safe interface type specifies one of two alternatives, and there is no question about what the intended types for the parameters of the interface type are.

Return values for functions with bounds-safe interfaces

In an unchecked context, we check the correctness of bounds for argument expressions to calls of functions with bounds-safe interfaces on an all-or-nothing basis. If an argument expression has a safe pointer type and the corresponding formal parameter has a bounds-safe interface, then all argument expressions will be checked.

However, we don't propagate the requirement for checking to the return value in that case, i.e. treat the result as being something that must be checked at its use. We should consider doing that to avoid possibly unintuitive behavior. Consider the following example:

int *f(int *x : count(5)) : count(5);
int *g(int *x : count(6));

array_ptr<int> y : count(5);
g(f(y));

f(y) will be checked, but its use as an argument to g will not be checked.

how to do runtime check

Dear Mr/Miss:
I ran a test using the following code:
void add()
{
    int a checked[2]:bounds(a, a + 2);
    int b checked[2]:bounds(b, b + 2);
    for (int i = 0; i < 5; i++)
    {
        a[i] += b[i];
    }
    return;
}
After I compiled this code and generated an .o file, I compared the call stack; it is the same as for this code:
void add()
{
int a checked[2]:bounds(a, a + 2);
int b checked[2]:bounds(b, b + 2);

for (int i = 0; i < 5; i++)
{
    a[i] += b[i];
}

return;

}

No code was added by compilation.
So I have two questions:

How does Checked C perform runtime checks?
How can I test this? I want to test the runtime checks; can you give me some examples?

Waiting for your reply.
Thank you.
Best wishes

The 'any' bounds is a confusing concept

@parjong has provided feedback that the 'any' bounds could be confusing to developers. The 'any' bounds is the bounds used for null pointers. It means that the expression could have any bounds. He suggested the use of 'unknown' instead.

@reubeno, @gdr-at-ms, @Chris-Hawblitzel, and I discussed this and we agreed that the bounds name could be confusing.

One possibility is to rename the bounds. Another possibility is to remove the 'any' bounds from the syntax of the language. It is a concept that is needed when checking bounds declarations, but it does not seem likely that programmers actually need to write it down. This would be similar to the special treatment of null pointers in the C semantics, where a null value can be given any pointer type. We are leaning toward removing the 'any' bounds from the language.

Can you instead upload your spec in the Markdown language that's more amenable to GitHub?

Allow bounds declarations on union members?

Pull request #37 has raised the issue of what to do about bounds declarations on union members in the clang implementation of Checked C. The question is whether to allow them or make it an error to declare bounds on union members. I am interested in feedback about this question.

The specification is currently silent about bounds declarations on union members. Union members are problematic because storing a value in one member and accessing it at a different member is implicitly a cast of a value from one type to another type. Of course, this can easily accidentally bypass bounds checking.

Even though union members are problematic, my thinking is that we should still allow bounds declarations on union members. First, we already allow explicit casts that could violate bounds safety, so we are not worse off safety-wise than before. Of course, it is easier to misuse union member accesses because the casts happen implicitly, not explicitly in the code. Second, we do plan to extend the specification to check pointer type casts that preserve bounds safety. We will address the issue of union members acting as implicit casts as part of that work. Many times unions are used in a type-safe way to save space. There might be a tag that controls access to members, or program invariants that ensure that values stored in a union member are only read using that same union member. We still face the problem of unions being used to implement truly unsafe casts, but my guess is that we'll often be able to show that unions are being used in ways that do preserve bounds safety.
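As an illustration of the "tag that controls access" pattern mentioned above, here is a minimal sketch in plain C. The names struct value, value_kind, and value_buffer_len are hypothetical and do not come from the Checked C repo; the sketch only shows the discipline a checker would need to recognize, not how Checked C handles unions.

```c
#include <stddef.h>

/* Hypothetical tagged union: the tag records which member was stored last. */
enum value_kind { VALUE_INT, VALUE_BUFFER };

struct value {
    enum value_kind kind;
    union {
        int number;
        struct { char *data; size_t len; } buffer;
    } u;
};

/* Read the buffer length only when the buffer member is the active one,
   so the union is never used as an implicit cast between members. */
size_t value_buffer_len(const struct value *v) {
    return v->kind == VALUE_BUFFER ? v->u.buffer.len : 0;
}
```

Code written this way never reads a member other than the one last stored, which is the kind of invariant the paragraph above hopes can often be shown.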

From a practical viewpoint, it will make it easier to add bounds declarations to existing code if we allow bounds declarations on union members. Consider the alternative of banning bounds declarations on union members. If a programmer wants to still have bounds declaration in code that uses the member, the programmer would have to place bounds casts at each use of a member. This is error-prone and tedious, so many programmers won't do that. A programmer might create wrapper functions and use them, but that can affect performance negatively and require more significant changes to code. A likely outcome is that code that uses unions would just be left as unchecked code. That seems worse than allowing bounds even though a programmer might compromise them through a type-safety violation.

Fix missing header file include in tests for bounds-safe interfaces of C standard libraries

In the test of wrappers for wcstoimax and wcstoumax, we are only including <inttypes.h>. We are not including <stddef.h>. This results in a test failure on a Linux x64 Ubuntu box because wchar_t is missing.

Describe how function types are extended with bounds information

We need to add a description to the Checked C specification of how function types are extended with bounds information. This includes describing rules for type compatibility and implicit conversions. The rules should be written down before implementing checking of redeclarations of functions and typechecking of function pointers in clang. We particularly want to allow functions that have arguments with unchecked pointers to be redeclared with additional bounds information.

Simplify bounds inference rules for integer-typed expressions

The bounds inference rules for integer-typed binary operations require that the subexpression that has bounds also be non-zero. This is probably too burdensome for programmers to use in practice. We should change the rules to avoid this requirement.

This requirement arose from the definition of bounds for pointers, which can either be null or a valid pointer. The bounds may not be valid if the pointer is null. We have to handle the case where a null pointer is converted to an integer and then made non-null through integer operations.

Having this requirement transfer to integers is likely to be burdensome in practice, however. The specification had an example involving tagged pointers with a 2-bit tag in the least significant bits. To store an integer in the tag, you might need to first clear the tag:

array_ptr<int> set(array_ptr<int> p : tagged_bounds(p)) : tagged_bounds(result_value) {
   if (p != null) {
     size_t untagged : tagged_bounds(p) = (size_t) p & ~0x3;
     ...
   }
}

It is not logically true from the code that (size_t) p & ~0x3 is non-zero. This means that the bounds inference rule would not succeed in inferring a bounds for untagged.
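The point can be seen in a small plain C sketch of 2-bit tagging. The helper names clear_tag, set_tag, and get_tag are hypothetical, and the sketch works on integer values only, assuming the two low bits of the underlying pointer are free.

```c
#include <stddef.h>

/* Mask for a 2-bit tag stored in the least significant bits. */
#define TAG_MASK ((size_t)0x3)

/* Clear the tag bits, as in (size_t) p & ~0x3 from the example above. */
size_t clear_tag(size_t tagged) { return tagged & ~TAG_MASK; }

/* Replace the tag: clear the old tag first, then OR in the new one. */
size_t set_tag(size_t tagged, unsigned tag) {
    return clear_tag(tagged) | ((size_t)tag & TAG_MASK);
}

/* Extract the tag bits. */
unsigned get_tag(size_t tagged) { return (unsigned)(tagged & TAG_MASK); }
```

A non-zero input such as 0x2 becomes 0 after clear_tag, so knowing that the tagged value is non-zero says nothing about whether the untagged value is non-zero.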

If we drop the non-zero requirement from the bounds inference rules for integer-typed binary expressions, we need to ensure that bounds are always valid for integer-typed values, even if the values are 0. We can achieve this by:

Changing the definition of bounds for integer-typed values to be slightly different from the definition of bounds for pointers. Specifically, we can define bounds for integer-typed values so that they are required to always be valid (correspond to bounds for an actual object). This eliminates the special case of bounds possibly being invalid when the value of an integer expression becomes 0.
At casts from pointers to integers, requiring that the source pointer either be non-null (and have valid bounds) or that the bounds be provably empty if the source pointer is null.

This pushes the burden from integer operations to casts from pointers to integers. This seems likely to be easier for programmers to use.

Update Syntactic Restrictions on Commas in Non-Modifying Expressions

Non-modifying expressions currently disallow all comma expressions. This should be changed to only disallow them at the top level, so they cannot be confused with a function parameter list.

Remove error-check in rel_align.c

Remove the error check in rel_align.c to go along with "Avoiding generating error message 'expected bounds expression' when there is a typechecking error" (checkedc-clang : #195).

Revisit `volatile` qualifiers on checked types

C11 allows volatile-qualified types. We should make sure their constraints are reasonable and, if not, add some for volatile-qualified checked types.

One proposal of mine is to ban volatile checked types entirely. The understanding is that volatile types are for interacting with specific hardware protocols, or for places where the programmer doesn't want the compiler to interfere. The programmer should therefore use unchecked volatile types, and then assign the results of operations on them to variables with checked types before the values go further into the program.

strncmp declaration error on Ubuntu 14.04 (Trusty)

Our declaration for strncmp(s1, s2, n) on Ubuntu 14.04 is causing errors.

This is because Ubuntu overrides the declaration of strncmp with the following macro:

/* Compare N characters of S1 and S2.  */
#ifndef _HAVE_STRING_ARCH_strncmp
# define strncmp(s1, s2, n)                                                   \
  (__extension__ (__builtin_constant_p (n)                                    \
                  && ((__builtin_constant_p (s1)                              \
                       && strlen (s1) < ((size_t) (n)))                       \
                      || (__builtin_constant_p (s2)                           \
                          && strlen (s2) < ((size_t) (n))))                   \
                  ? strcmp (s1, s2) : strncmp (s1, s2, n)))
#endif

This seems a lot harder to solve than what we did for Apple's system headers.
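For context, standard C lets a library function also be provided as a function-like macro, and a use of the name in parentheses is never expanded as that macro. The sketch below shows that general technique in plain C; it is not necessarily the fix adopted for the Checked C headers, and same_prefix is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Parenthesizing the name suppresses function-like macro expansion,
   so this redeclares the real function even if strncmp is also a macro. */
int (strncmp)(const char *s1, const char *s2, size_t n);

/* Call the real function, again bypassing any macro definition. */
int same_prefix(const char *a, const char *b, size_t n) {
    return (strncmp)(a, b, n) == 0;
}
```

An alternative with the same effect is to write #undef strncmp after including <string.h>, at the cost of losing the platform's macro for the rest of the translation unit.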

Fix non-portable tests of integers with bounds declarations.

I built and ran the Checked C tests on Ubuntu Linux x64. typechecking/bounds.c and typechecking/no_prototype_functions.c failed because of 32-bit vs. 64-bit pointer size differences. We are trying to test integers with bounds declarations. To test this, we need to cast some pointers to integers. We used int as the integer type in the tests, which does not work on a 64-bit platform. It causes a warning for no_prototype_functions.c because a smaller type is being cast to a pointer size. It causes an outright error in bounds.c because the initializer for a static variable is no longer a compile-time constant. We had already hit this when attempting to test short integers on a 32-bit platform.

I've reviewed the C specification and there is no portable way to write code that casts between pointers and integers. For example, the C spec does not guarantee that size_t is actually large enough to hold a pointer value. It does not guarantee that there is any integer type large enough to hold a pointer value, in fact. At best, we can write some non-portable code for specific target platforms that tests this functionality. Clang has test support for doing this.
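A small plain C sketch of the platform-specific approach follows. It relies on the fact that uintptr_t is an optional type in <stdint.h> and that UINTPTR_MAX is defined only when the type exists; round_trips is a hypothetical helper, not part of the Checked C tests.

```c
#include <stdint.h>

/* Returns 1 if a pointer survives a round trip through an integer,
   or -1 on platforms that provide no integer type wide enough for it. */
int round_trips(void *p) {
#ifdef UINTPTR_MAX
    uintptr_t bits = (uintptr_t)p;   /* wide enough to hold any void * value */
    return (void *)bits == p;
#else
    (void)p;
    return -1;                       /* no suitable integer type here */
#endif
}
```

Using int in place of uintptr_t would truncate the value on a typical 64-bit platform, which matches the failures described above.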

This work item is to move the integer bounds tests to platform-specific tests and make sure that the tests only run on platforms where the tests are expected to work. We should restrict the platforms to avoid the tests breaking on other platforms where assumptions may be different.

add discussion of const/volatile modifiers for new pointer types

The new pointer types allow const/volatile modifiers, just as other existing pointer types allow them. The syntax is different, of course. There is no discussion of type modifiers in the specification, so it should be added.

Add type compatibility rules for structure types

We need to add type compatibility rules to the Checked C specification for structure types. The rules are important for bounds-safe interfaces, where a member in one declaration of the type may have a bounds declaration and a member in another declaration can omit it.

Change specification to describe updated Checked C keywords

We've changed the Checked C keywords to begin with an underscore and a capital letter. This is to avoid conflicts with identifiers in existing programs. The keywords are _Ptr, _Array_ptr, _Checked, _Unchecked, and _Where. We will have a header file "stdchecked.h" that defines macros that map the original names to these new types.

We plan to use the original shorter names through the spec because that is the way we intend for code to be written. We need to describe the actual keywords and the assumption of the use of a header file when we introduce the new pointer types.

Allow bounds declarations for integer-typed variables

C programs sometimes cast pointers to integers, do arithmetic on the integer, and then cast the integer back to a pointer. Section 5.4 describes this and how bounds can be tracked through conversions between integers and pointers in expressions. Programmers should be able to assign the results of conversions to integers to variables. A programmer might want to use a temporary variable for a complex expression, for example. This means that we should allow bounds declarations for integer-typed variables.

The clang implementation already allows this. Section 5.4 of the specification needs to be updated to allow this.

improve description of cast operators

In the description of cast operators in Section 5.1.2, we need a description of dynamic_cast<T, A> (where dynamic_cast has two type arguments)

We should add a chart that summarizes the cast operators and their meanings, instead of just having text.

It would be helpful to have an example of dynamic_bounds_cast implemented by hand.

Add additional references to related work section

Here are some additional references to add:

New paper on data-oriented programming:

https://www.comp.nus.edu.sg/~shweta24/publications/dop_oakland16.pdf

Data-Oriented Programming: On the Expressiveness of Non-Control Data Attacks
Hong Hu, Shweta Shinde, Adrian Sendroiu, Zheng Leong Chua, Prateek Saxena, and Zhenkai Liang (National University of Singapore)

Predecessors to the current address sanitizers in GCC

GCC bounded pointers extension:
-- https://gcc.gnu.org/ml/gcc-patches/2000-03/msg00358.html
-- http://www.imperial.ac.uk/pls/portallive/docs/1/18619746.PDF
Mudflap, which succeeded the bounded pointers extension:
-- ftp://ftp.uvsq.fr/pub/gcc/summit/2003/mudflap.pdf

An earlier proposal for safe arrays/pointers

Safe arrays and pointers for C, by John Nagle: http://www.animats.com/papers/languages/safearraysforc43.pdf

Intel MPX extension

Change tests of out-of-bounds memory accesses to use signals.

The tests in tests\dynamic_checking intentionally cause out-of-bounds memory accesses. Currently, this causes processes to crash during testing. We are using the not program from the LLVM test harness with the --crash option to determine when processes have exited with a crash.

We have found that this approach is not working reliably on Windows. The not program spawns a child process and looks at the process exit code, looking for negative exit codes to indicate a crash. When running automated testing under Visual Studio Team Services, sometimes the exit code is a positive number. In addition, on Windows, it is bad form to just cause a process to crash. This causes Windows Error Reporting (WER) to be invoked, which causes spurious logging of errors and slows down testing.

Instead of allowing tests with out-of-bounds memory accesses to just crash, we will switch to using signals. The tests will catch signals generated by out-of-bounds memory accesses and exit. Currently, the Checked C clang implementation causes an illegal instruction signal (SIGILL) to be generated when a bounds check fails on x86 or x64. This may change in the future, but we'll catch this signal and use it to cause tests to exit early.
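A minimal sketch of such a handler in standard C is shown here. The names on_illegal_instruction, install_trap_handler, and EXPECTED_TRAP_EXIT_CODE are hypothetical, and the exit code is an assumption about what a test harness might expect; the actual tests may be structured differently.

```c
#include <signal.h>
#include <stdlib.h>

/* Hypothetical exit code meaning "the bounds check failed as expected". */
#define EXPECTED_TRAP_EXIT_CODE 0

/* Exit immediately; _Exit is one of the few functions that is safe
   to call from a signal handler. */
static void on_illegal_instruction(int sig) {
    (void)sig;
    _Exit(EXPECTED_TRAP_EXIT_CODE);
}

/* Install the handler for SIGILL. Returns 0 on success, -1 on failure. */
int install_trap_handler(void) {
    return signal(SIGILL, on_illegal_instruction) == SIG_ERR ? -1 : 0;
}
```

A test would call install_trap_handler() first and then perform the out-of-bounds access. Reaching the code after the access would then indicate that the expected check did not fire.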

Revisit Restrictions on Bounds Expressions

In particular, the following is going to give us code generation problems, because we need to do a bounds check when we dereference p, which would use the bounds we're currently defining:

p : bounds(l, u)

where l or u contain any dereference of p, such as *p, p[i], p.f or p->f

Coloring of text missing in Section 3.7.3

We lost the highlighting of text in sections 3.7.3 and 4.7 when the document was converted to LaTeX.

discuss mixing of safe and unsafe types

Do we want to allow unsafe pointers to point to safe pointers or vice versa? What about arrays?

Right now, the spec restricts multidimensional arrays so that all dimensions are checked or unchecked. Because a multidimensional array is technically an array of array types, this is a restriction on mixing checked and unchecked types. No other restrictions are placed on mixing other kinds of safe and unsafe types.

Update motivation and definitions of Explicit Dynamic Checks

After a discussion today, we realise we need to update section 2.8 on dynamic checks to reflect the motivation that these should be able to be checked statically or dynamically, as the compiler wishes, and therefore should contain only non-modifying expressions.

Allow uses of enumeration constants in member bounds

Currently the specification says that member bounds may only use members of the structure that is being defined. It would be useful to allow member bounds to use enumeration constants as well.

Add missing address-of cases to description of bounds declaration checking

Bounds declaration checking does not have rules for the bounds of the address of a pointer dereference expression (&*e) or the address of an array subscript expression (&e1[e2]). Those rules need to be added.

The address-of operator and the pointer dereference operator cancel in the C semantics, so no memory is accessed. &e1[e2] is the same as &*(e1 + e2).

There is a question for both expressions whether &*e should return the bounds for exactly the element that would be accessed or the bounds of e, which could be wider. We discussed this and decided that it should return the bounds of e. The bounds can be narrowed to an individual element when the result is assigned to a variable or through a cast operation. The issue is that there is no way to tell whether a pointer to a single element or a pointer to a slice of an array is needed. We agreed that we would add a discussion of alternate possible designs.
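The equivalence can be checked directly in plain C. The helper address_forms_agree is hypothetical and is only valid for indices from 0 up to and including the array length, since one past the end is the last address that may be formed.

```c
#include <stddef.h>

/* Returns 1 if &base[i], &*(base + i), and base + i all name the same
   address. No element is read, because & cancels the dereference. */
int address_forms_agree(int *base, size_t i) {
    return &base[i] == &*(base + i) && &base[i] == base + i;
}
```

Because no memory is accessed, the index equal to the array length is allowed here even though reading that element would not be.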

Decide what to do about old-style C function declarations

C allows old-style function declarations where the types of function arguments are not checked at function calls. For example, you can write:

int f();

The function 'f' returns an integer, but takes unspecified arguments. Arguments are passed at the call site based on their types and some well-specified conversions to ints and doubles.

This is inherently unsafe because a mismatch between a function definition and function call in the number of arguments or the types of the arguments can corrupt the program call stack.

My initial proposal is the following:

We do not allow bounds declarations to be used with old-style C function declarations.
We disallow passing arguments with checked pointer types to functions declared using old-style declarations.
We disallow calls to functions declared with old-style declarations in checked scopes. Even if checked pointers are not passed, the stack can still be corrupted.

Of course, a function can be declared with an old-style declaration and then have a subsequent redeclaration that adds the parameter list. The above rules would apply only to code that is only in the scope of the old-style declaration.

I am interested in feedback about this proposal.

Handle parameters with array types specially in description of bounds declaration checking

Typechecking in C treats a parameter with the type "array of T" as though it has the type "pointer to T". It does not enforce at function calls that any actual arguments have the size required by the array type.

This can easily result in incorrect code. For the function g,

int g(int input[10]) { ... }

there is no guarantee that g is passed a pointer to a 10-element array. The following incorrect code will typecheck.

int f() {
int myarr[3];
g(myarr, ...);
}

With the new array types, checking of bounds declarations should flag this as an error. The description of bounds declaration checking needs to be updated to handle parameters with array types specially.
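The adjustment of array parameters to pointers can be observed in plain C. In this sketch, first_element is a hypothetical stand-in for g. Assigning it to a pointer of type int (*)(int *) compiles without a cast because the two function types are identical, and the call then accepts a 3-element array.

```c
/* Hypothetical stand-in for g: the [10] is not enforced by C. */
int first_element(int input[10])
{
    return input[0];
}

int call_with_short_array(void)
{
    /* No cast is needed: the parameter type was adjusted to int *. */
    int (*as_pointer_fn)(int *) = first_element;
    int myarr[3] = {7, 8, 9};
    /* Only element 0 is read, so this particular call stays in bounds. */
    return as_pointer_fn(myarr);
}
```

Nothing in the type system stops first_element from reading input[9], which is the gap that bounds declaration checking for array-typed parameters is meant to close.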

Consider changing the treatment of null pointers.

Overview

I am concerned that the treatment of null pointers in Checked C will lead to too many runtime checks. We have been implementing the runtime checks required by the current Checked C specification. At memory accesses using an array_ptr, there will be a null pointer check followed by a bounds check. At pointer arithmetic involving array_ptr, there will also be a non-null check before the pointer arithmetic operation. There will be a lot of checking.

The cause is the semantics that we’ve chosen for bounds when null pointers are around: a pointer is either null or has valid bounds. This means that a null pointer may not have valid bounds. From Section 3.1 of the Checked C v0.6 specification:

The meaning of a bounds expression can be defined more precisely. At runtime, given an expression e with a bounds expression bounds(lb, ub), let the runtime values of e, lb, and ub be ev, lbv, and ubv, respectively. The value ev will be 0 (null) or have been derived via a sequence of operations from a pointer to some object obj with bounds(low, high). The following statement will be true at runtime: ev == 0 || (low <= lbv && ubv <= high). In other words, if ev is null, the bounds may or may not be valid. If ev is non-null, the bounds must be valid. This implies that any access to memory where ev != 0 && lbv <= ev && ev < ubv will be within the bounds of obj.

We chose this definition because C treats null pointers as interchangeable with other pointers. The definition results in less work and typing when converting programs. However, it has led to several issues in the semantics:

We can’t allow arithmetic involving a null pointer because that could lead to the forging of a non-null pointer with invalid bounds. This is why we need runtime checks on pointer arithmetic.
We “lose” bounds information when a pointer becomes null.
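To make the cost concrete, here is a rough model in plain C of the checks described above. The struct and function names are hypothetical, and a real compiler would emit these checks inline rather than as calls.

```c
#include <stddef.h>
#include <stdint.h>

enum check_result { CHECK_OK = 0, CHECK_NULL_FAILED = 1, CHECK_BOUNDS_FAILED = 2 };

/* Hypothetical model of an array_ptr<int> with bounds(lb, ub). */
struct bounded_int_ptr {
    int *p;
    int *lb;
    int *ub;
};

/* Memory access: null check first, then lb <= p && p < ub. */
enum check_result check_access(struct bounded_int_ptr b)
{
    if (b.p == NULL)
        return CHECK_NULL_FAILED;
    if ((uintptr_t)b.p < (uintptr_t)b.lb || (uintptr_t)b.p >= (uintptr_t)b.ub)
        return CHECK_BOUNDS_FAILED;
    return CHECK_OK;
}

/* Pointer arithmetic: non-null check before computing p + n. */
enum check_result checked_add(struct bounded_int_ptr *b, ptrdiff_t n)
{
    if (b->p == NULL)
        return CHECK_NULL_FAILED;
    b->p = b->p + n;
    return CHECK_OK;
}

/* Returns 1 when every check behaves as described in the text. */
int run_current_semantics_checks(void)
{
    int arr[4] = {0};
    struct bounded_int_ptr ok = { arr, arr, arr + 4 };
    struct bounded_int_ptr null_ptr = { NULL, NULL, NULL };

    if (check_access(ok) != CHECK_OK) return 0;
    if (checked_add(&ok, 4) != CHECK_OK) return 0;            /* p is now arr + 4 */
    if (check_access(ok) != CHECK_BOUNDS_FAILED) return 0;    /* one past the end */
    if (check_access(null_ptr) != CHECK_NULL_FAILED) return 0;
    if (checked_add(&null_ptr, 1) != CHECK_NULL_FAILED) return 0;
    return 1;
}
```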

Proposal

We’re running into problems because we’re trying to combine bounds checking and the handling of null pointers. The fact that C pointers can either be null or point to valid objects is a source of complexity when reasoning about C programs.

I propose that we adapt the idea of nullable pointers to Checked C. We would use types to distinguish between the different ways in which null will be allowed or handled:

ptr values must point to valid objects that can hold values of type T. ptr values cannot be null.
array_ptr values can point anywhere in memory or be null. Bounds for array_ptr values must always be valid (a subrange of a valid object). This restricts when array_ptr values that have bounds can be null. It also prevents array_ptr values that are null from being used to access memory. Null is not within the range of any object, so bounds checks will always fail. No runtime checks are needed for pointer arithmetic.
We introduce a nullable modifier that can be applied to ptr and array_ptr types.
For a pointer of type nullable ptr<T>, a runtime null check is done before accessing memory.
For a pointer of type nullable array_ptr<T>, a runtime null check is done before accessing memory. The runtime null check precedes the bounds check. The bounds for a nullable array_ptr<T> are only required to be valid when the value is non-null.
Null pointer constants have empty bounds (corresponding to the empty object) instead of having ‘any’ bounds.
We may decide to allow conditional bounds expressions. I’d prefer to put this off for now.
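The proposed checks can be modeled in plain C as follows. This is only a sketch with hypothetical names. It assumes null pointer constants carry empty bounds, as proposed, so the bounds check alone rejects a null array_ptr.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a pointer value with bounds(lb, ub). */
struct bounded_int_ptr {
    int *p;
    int *lb;
    int *ub;
};

/* array_ptr under the proposal: a bounds check only, no null check. */
int array_ptr_access_ok(struct bounded_int_ptr b)
{
    return (uintptr_t)b.lb <= (uintptr_t)b.p
        && (uintptr_t)b.p < (uintptr_t)b.ub;
}

/* nullable array_ptr: the null check precedes the bounds check. */
int nullable_array_ptr_access_ok(struct bounded_int_ptr b)
{
    if (b.p == NULL)
        return 0;
    return array_ptr_access_ok(b);
}

/* Returns 1 when every check behaves as the proposal describes. */
int run_proposed_semantics_checks(void)
{
    int arr[4] = {0};
    struct bounded_int_ptr in_range = { arr + 2, arr, arr + 4 };
    /* A null constant with empty bounds: lb == ub, so p < ub never holds. */
    struct bounded_int_ptr null_empty = { NULL, NULL, NULL };
    /* A nullable pointer holding null: its bounds are never consulted. */
    struct bounded_int_ptr nullable_null = { NULL, arr, arr + 4 };

    return array_ptr_access_ok(in_range)
        && !array_ptr_access_ok(null_empty)
        && nullable_array_ptr_access_ok(in_range)
        && !nullable_array_ptr_access_ok(nullable_null);
}
```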

Examples

It is valid to assign a ptr variable a value that is guaranteed to be non-null. The following declarations and assignments are valid:

int y;
ptr<int> px = &y;
int arr[10];
px = &arr[5];

It is not valid to assign a ptr variable a value that is null. The following will be rejected at compile time:

ptr<int> px = NULL;

void f(int *a) {
  ptr<int> p = &*a;  // a could be null and a may not have valid bounds. 
}

It is valid to assign null to an array_ptr variable with bounds, if the bounds are empty:

int len = 0;
array_ptr<int> x : count(len) = NULL;

The empty bounds are a subrange of any valid object.

It is invalid to assign null to an array_ptr variable with non-empty bounds. This declaration is invalid:

array_ptr<int> x : count(5) = NULL;

bounds(NULL, NULL + 5) is not a subrange of a valid object.

It is valid to assign null to a nullable array_ptr variable with non-empty bounds. This declaration is valid:

nullable array_ptr<int> x : count(5) = NULL;

Additional thoughts

There is another way to understand why values with ptr cannot be null. The declaration ptr<T> x is equivalent to array_ptr<T> x : count(1). The bounds (NULL, NULL + 1) are invalid because no valid object includes NULL in its bounds.
ptr values become pointers that can be used unconditionally (without runtime checks).
array_ptr only requires bounds checks.

Bounds-safe interfaces

My strawman proposal is to allow the keyword nullable to precede the in-line bounds declaration for an unchecked pointer type. For example:

void *calloc(size_t num, size_t size) : nullable byte_count(num * size);

This implies in a checked context that calloc returns a nullable array_ptr<void>.

For interface types, nullable can be applied as a type qualifier to _Ptr types. For example, the bounds-safe interface for the string-to-double function would be:

double strtod(const char * restrict nptr,
                char ** restrict endptr : itype(restrict _Nullable _Ptr<char *>));

If endptr is non-null, strtod returns the location where the conversion stopped by modifying *endptr.

Conversions

ptr values and array_ptr values can always be converted to nullable ptr and nullable array_ptr, respectively.
The reverse conversion (from nullable ptr and nullable array_ptr to ptr and array_ptr, respectively) is allowed only when it is provable that the value being converted is not null.
Conversions from array_ptr to ptr continue to require that the array_ptr have bounds large enough to hold the ptr value.

Next steps

I modified the Checked C wrappers from the C standard library to add nullable type modifiers where necessary. I didn’t modify functions involving strings because we haven’t added support for null-terminated arrays. The results are on Github at https://github.com/dtarditi/checkedc/tree/nullable. There are two quick take-aways:

Most functions aren’t expecting or prepared to handle a null pointer: nullable modifiers were needed in only a few places.
It makes the interface descriptions more precise. This is no surprise; comparisons with SAL may arise. It seems better to have machine-checkable descriptions than to rely on imprecise English descriptions.

Release version 0.6 of the Checked C specification

We plan to release version 0.6 of the Checked C specification in December. There have been enough important improvements and changes to the specification that it is time to bump the version number.

Change array_view to span in specification document

The C++ standards committee is planning to use the name "span" for the type that is currently called "array_view" in the document. Update the document to use the new name to avoid confusion.

decide whether to use C underscore pattern for new keywords

Checked C introduces several new keywords: checked, unchecked, ptr, array_ptr, and where. It also sometimes treats identifiers as keywords based on context (count, bounds, and byte_count). The new keywords are problematic because they could break existing programs that already use them as variable names.

C has a well-established design pattern for introducing new keywords: prefix them with _. This was used to introduce the Boolean type, for example. The keyword in C is _Bool. The unprefixed keywords can be used by including a header file that #defines the unprefixed keyword to the underscore-based keyword. For example, the standard header file stdbool.h does this for Boolean.

We could follow the same pattern for Checked C, creating a new header file checked.h that #defines the names to new keywords.

The pros:

Standard C pattern.
Avoids breaking existing programs.
Allows incremental conversion.

The cons:

Now programs have to include a header file everywhere.
More verbose for programs that have been converted to be checked.
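The stdbool.h pattern, and how a checked.h could mirror it, might look like the sketch below. The header name checked.h comes from the proposal above. The underscore-prefixed spellings on the right-hand side of the #defines are assumptions for illustration, and the macros are never expanded here, so the block compiles as plain C.

```c
#include <stdbool.h>   /* makes the unprefixed name bool available for _Bool */

/* Hypothetical contents of the proposed checked.h. Programs that already
   use ptr or where as identifiers simply do not include the header. */
#define checked   _Checked
#define unchecked _Unchecked
#define ptr       _Ptr
#define array_ptr _Array_ptr
#define where     _Where

/* Uses the unprefixed spelling provided by stdbool.h. */
bool is_even(int n)
{
    return n % 2 == 0;
}

/* Returns 1 when bool and _Bool behave as the same type. */
int bool_matches_underscore_bool(void)
{
    _Bool prefixed = is_even(4);
    bool unprefixed = prefixed;
    return sizeof(bool) == sizeof(_Bool) && unprefixed && !is_even(3);
}
```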

add uncheckedptr<T> type constructor

Greg Morrisett suggested that we add an unsafe_ptr constructor that is an abbreviation for T *. I think this is a good idea because it will allow code to be parameterized by the kind of pointer. This will be useful in C when using macros and likely useful when extending Checked C to C++.
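As a rough illustration of parameterizing code by pointer kind with macros, the sketch below spells the constructor as a function-like macro. UNCHECKED_PTR and DEFINE_SWAP are hypothetical names. The suggestion above is a type constructor, and a macro with parentheses is only the closest plain C approximation.

```c
/* Hypothetical macro spelling of the suggested constructor: T * */
#define UNCHECKED_PTR(T) T *

/* Generates a swap function whose pointer kind is a macro parameter. */
#define DEFINE_SWAP(NAME, PTR_KIND, T)        \
    void NAME(PTR_KIND(T) a, PTR_KIND(T) b)   \
    {                                         \
        T tmp = *a;                           \
        *a = *b;                              \
        *b = tmp;                             \
    }

/* Expands to: void swap_int(int * a, int * b) { ... } */
DEFINE_SWAP(swap_int, UNCHECKED_PTR, int)

/* Swaps 1 and 2, then encodes the result as x * 10 + y. */
int swap_demo(void)
{
    int x = 1;
    int y = 2;
    swap_int(&x, &y);
    return x * 10 + y;
}
```

A checked variant could pass a different pointer-kind macro as the second argument without touching the body of DEFINE_SWAP.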

usage of unchecked pointers with bounds-safe interface type annotation

The current design document has a way to declare that an unchecked pointer type should be treated as a _Ptr type in a bounds-safe interface:

int *y : ptr;

This does not handle the case where there is a pointer to a pointer:

int **y : ?

We will be generalizing the ptr notation in the specification to be a checked pointer type (_Ptr and _Array_ptr). This allows an unchecked pointer type that points to other pointer types to be treated as a checked pointer type in a bounds-safe interface. For example:

int **y : _Ptr<_Ptr<int>>

This raises an interesting question. If a programmer specifies an _Array_ptr annotation but does not specify bounds, should we allow the unchecked pointer to be used to access memory in unchecked code?

int **y : _Array_ptr<_Ptr<int>>

I believe the answer is no. The reasoning is this: an _Array_ptr without bounds has bounds(none) and cannot be used to access memory. It would clearly be an error for a programmer to assign an _Array_ptr with bounds(none) to a variable with bounds-safe interface and then use that variable to access memory. In other words, even uses of unchecked pointers must respect an interop annotation that says they are not expected to be used to access memory.

This does raise another possibility: should we require that all variables or members with an _Array_ptr bounds-safe interface annotation have bounds declarations? The answer is clearly no. Such a variable or member may be a bound (or be used in a bounds expression) by the programmer, with no intent that the variable or member actually be used to access memory.

Update checking of bounds declarations to handle lexically-hidden variables

We dropped the requirement that variables not be lexically hidden. We need to add some additional conditions about checking of bounds declarations involving lexically hidden variables. For example, we might need to add a condition that variables are suitably renamed before checking so that there is no lexical hiding, or something equivalent to that.

In a compiler implementation, this all drops out because the compiler will internally distinguish between variables in different scopes with the same name.
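A small plain C example of lexical hiding and the renaming mentioned above. The function and variable names are hypothetical, and the two functions are meant to be equivalent.

```c
/* The inner len hides the outer len for the extent of the nested block. */
int hidden_variable_demo(void)
{
    int len = 10;
    int inner_value;
    {
        int len = 3;          /* hides the outer len */
        inner_value = len;    /* refers to the inner len */
    }
    return inner_value * 100 + len;   /* outer len is visible again */
}

/* The same function after renaming so that no variable is hidden. */
int renamed_variable_demo(void)
{
    int len_outer = 10;
    int inner_value;
    {
        int len_inner = 3;
        inner_value = len_inner;
    }
    return inner_value * 100 + len_outer;
}
```

A bounds declaration written against the outer len would, after renaming, refer unambiguously to len_outer.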

Revisit `const` qualifiers on checked types

C11 allows const-qualified types. We should make sure their constraints are reasonable and, if not, add some for const-qualified checked types.

Revisit language around dynamic checks

We should be clearer about dynamic checks. In some places we're slightly too explicit about where and when a dynamic check will happen. This should be changed to say something like "at or before " which allows us to hoist some dynamic checks.

We will probably have to reference a set of rules for how we hoist checks, in order to prevent us from hoisting checks too early, but these can be fleshed out later.
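The difference between checking at each access and checking once beforehand can be sketched in plain C. The function names and the -1 failure convention are assumptions for illustration. A real implementation would signal a failed dynamic check in whatever way the specification prescribes.

```c
#include <stddef.h>

/* Dynamic check performed at every access a[i]. */
int sum_checked_each_access(const int *a, size_t count, size_t n, long *out)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i >= count)
            return -1;        /* check at the access */
        total += a[i];
    }
    *out = total;
    return 0;
}

/* The same check hoisted: one test before the loop covers every a[i]. */
int sum_checked_hoisted(const int *a, size_t count, size_t n, long *out)
{
    long total = 0;
    if (n > count)
        return -1;            /* check before any access */
    for (size_t i = 0; i < n; i++)
        total += a[i];
    *out = total;
    return 0;
}

/* Returns 1 when both versions agree on a passing and a failing case. */
int hoisting_demo(void)
{
    int arr[4] = {1, 2, 3, 4};
    long each = 0;
    long hoisted = 0;
    if (sum_checked_each_access(arr, 4, 4, &each) != 0) return 0;
    if (sum_checked_hoisted(arr, 4, 4, &hoisted) != 0) return 0;
    if (each != 10 || hoisted != 10) return 0;
    return sum_checked_each_access(arr, 4, 5, &each) == -1
        && sum_checked_hoisted(arr, 4, 5, &hoisted) == -1;
}
```

The two versions are interchangeable here only because the loop has no visible side effects before the failing access. If it did, the hoisted check would fail earlier than the original, which is why rules for hoisting are needed.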

Add rules for checking that variables and members with checked pointer types are initialized before use.

We need to write rules for making sure that checked pointer variables are definitely initialized before use. Use includes taking the address of a checked pointer variable. We also need to make sure that structure members with checked pointer types are initialized before use.

use-after-free, complementary library

I just came across this project and thought I'd let you guys know about the complementary "language extension" library (note, shameless plug) for higher level C/C++ application code (as opposed to low-level system code).

I haven't read your whole spec document yet, but it seems to me like you are addressing the bounds-checking issue but not the "use-after-free" or "use-of-uninitialized-value" issues? While the bounds-checking is great, "use-after-free" seems to be responsible for more critical vulnerabilities these days. At least if you go by Pwn2Own or Chromium's critical security bug tracker. Have you guys considered addressing the use-after-free issue? If so, you might consider introducing a low-level pointer type analogous to mse::TRegisteredPointer<> that is automatically set to null when the target object is deallocated.

Describe bounds checking rules for member reference expression.

Figure out how to compile Checked C programs using existing compilers that do not support it.

We've thought a lot about how to make Checked C backwards compatible with C. Some people have pointed out that there is a different backwards compatibility problem to consider. How would a Checked C program be compiled by a C compiler that does not support Checked C? Without support for this, projects that use C to achieve wide portability across platforms are unlikely to adopt Checked C.

The usual approach would be to macroize the usage of Checked C extensions in the program. One would then strip the Checked C extensions from the programs via the macros when the program is compiled with a C compiler that does not understand Checked C. It would be helpful to define a standard set of macros for this.

One sticking point in doing this is the change to the syntax of pointer types that Checked C made. For example, _Ptr<int> x. We might consider changing the syntax for checked pointer types to use parentheses so that _Ptr could be treated as a macro. _Ptr(int) x could be mapped to int *x. However, there will still be problems for pointer types where the referent type is a function type or an array type. The simple conversion won't work. The C syntax for this requires embedding the identifier inline as part of the declarator. We could try to define macros that fix this, but that'll just dig the hole deeper.

An alternative is to allow usage of type modifiers on pointers. This is analogous to the way const can be declared on a pointer type and is what the Deputy system did. For example, one could write int * _Ptr. Then "deconverting" simply consists of erasing _Ptr.

We could allow both "old" and "new" syntax for constructing pointer types. People who are writing new code would probably prefer "_Ptr(t)" for t * _Ptr.
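Both stripping strategies can be sketched with macros. Everything named here is hypothetical: HAVE_CHECKED_C stands for a flag that a build system would set, and CHECKED_PTR and PTR_QUAL are non-reserved stand-ins for the _Ptr(t) and t * _Ptr spellings discussed above. Without the flag, the block compiles as plain C.

```c
#ifdef HAVE_CHECKED_C                 /* hypothetical build-system flag */
#  define CHECKED_PTR(T) _Ptr<T>      /* constructor-style spelling */
#  define PTR_QUAL       _Ptr         /* modifier-style spelling (proposal only) */
#else
#  define CHECKED_PTR(T) T *          /* strip to an ordinary pointer */
#  define PTR_QUAL                    /* erase the modifier entirely */
#endif

/* Constructor style: becomes int *p on a plain C compiler. */
int read_through(CHECKED_PTR(int) p)
{
    return *p;
}

/* Modifier style: becomes int * p on a plain C compiler. */
int read_through_qual(int * PTR_QUAL p)
{
    return *p;
}

/* Note: CHECKED_PTR(int (void)) would strip to "int (void) *", which is not
   a valid C type. This is the function-type problem described above. */

int stripping_demo(void)
{
    int v = 42;
    return read_through(&v) + read_through_qual(&v);
}
```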

Update initialization requirements for automatic variables

For now, automatic variables with _Array_ptr or _Ptr type must have initializers. The spec needs to be updated to describe this clearly.