Cake: a C23 front end and transpiler written in C

Home Page: http://thradams.com/cake/index.html

License: GNU General Public License v3.0

cake's Introduction

The C Programming language 1978

"C is a general-purpose programming language which features economy of expression, modern control flow and data structures, and a rich set of operators. C is not a "very high level" language, nor a "big" one, and is not specialised to any particular area of application. But its absence of restrictions and its generality make it more convenient and effective for many tasks than supposedly more powerful languages."

"In our experience, C has proven to be a pleasant, expressive, and versatile language for a wide variety of programs. It is easy to learn, and it wears well as one's experience with it grows"

The C Programming language Second Edition 1988

"As we said in the preface to the first edition, C "wears well as one's experience with it grows." With a decade more experience, we still feel that way."

Me 2024

As my experience with any language grows, the more I like C.

๐Ÿฐ Cake

Cake is a compiler front end written from scratch in C, designed from the C23 language specification. It allows you to translate newer versions of C, such as C23, to C99. Additionally, Cake provides a platform for experimenting with new features for the C language, including extensions like lambdas, defer and static ownership checks.

Web Playground

This is the best way to try.

http://thradams.com/cake/playground.html

Use cases

If your project is distributed as source code, you don't need to limit development to the lowest supported language version. For instance, you can use attributes like nodiscard, or defer, during development; both features improve code safety. Then, by adding an extra step to your build, you can distribute readable C99 source code that compiles everywhere. Cake can also be used as a static analyzer, especially with the new ownership analysis.
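To make the idea concrete, here is a hand-written C99 sketch of the cleanup pattern that a defer-style feature saves you from writing by hand. It is not Cake's actual output; the names `measured_copy`, `release` and `release_count` are made up for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter so the cleanup can be observed from outside. */
static int release_count = 0;

static void release(void *p)
{
    free(p);
    release_count++;
}

/* Copies `text` into a scratch buffer and returns its length,
   or -1 for an empty string or a failed allocation.
   With defer, release(buf) would be written once, next to the
   allocation; in plain C99 every later exit goes through `done`. */
int measured_copy(const char *text)
{
    int result = -1;
    size_t n = strlen(text);
    char *buf = malloc(n + 1);
    if (buf == NULL)
        return -1;          /* nothing to clean up yet */

    if (n == 0)
        goto done;          /* early exit still runs the cleanup */

    memcpy(buf, text, n + 1);
    result = (int)n;

done:
    release(buf);
    return result;
}
```

The single `done` label plays the role of the end of scope: every path that has acquired the buffer releases it exactly once.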

Features

  • C23 preprocessor
  • C23 syntax analysis
  • C23 semantic analysis
  • Static ownership checks (Extension)
  • Sarif output
  • C backend
  • AST

Build

GitHub https://github.com/thradams/cake

MSVC build instructions

Open the Developer Command Prompt for Visual Studio. Go to the src directory and type:

cl build.c && build

This will build cake.exe, then run cake on its own source code.

GCC (Linux) build instructions

Go to the src directory and type:

gcc build.c -o build && ./build

To run the unit tests on Windows or Linux, add -DTEST. For instance:

gcc -DTEST build.c -o build && ./build

Emscripten build instructions (web)

Emscripten https://emscripten.org/ is required.

First do the normal build.

The normal build also generates a file lib.c that is the amalgamated version of the "core lib".

Then at ./src dir type:

call emcc -DMOCKFILES "lib.c" -o "Web\cake.js" -s WASM=0 -s EXPORTED_FUNCTIONS="['_CompileText']" -s EXTRA_EXPORTED_RUNTIME_METHODS="['ccall', 'cwrap']"

This will generate the \src\Web\cake.js

Running cake at command line

Make sure cake is on your system path.

Samples

cake source.c

This will output ./out/source.c

See Manual

Road map

  • Ownership static analysis
  • Fixes
  • Ownership specification

References

How did we get here?

A copy of each C standard draft is included in the docs folder.

A very nice introduction was written by Al Williams

C23 Programming For Everyone

https://hackaday.com/2022/09/13/c23-programming-for-everyone/

Influenced by

  • Typescript
  • Small C compilers

Participating

You can contribute by trying out cake, reporting bugs, and giving feedback.

Have a suggestion for C?

DISCORD SERVER

https://discord.gg/YRekr2N65S

How is cake developed?

I use the Visual Studio 2022 IDE to write and debug the cake source. Cake parses itself using the MSVC includes and generates the out directory after the build.

I use Visual Studio Code with WSL for testing and compiling the code for Linux.

The cake source code does not use any extensions, so the output is the same as the input. This compilation is useful for tracking errors together with the unit tests.

Differences from CFront

CFront was the original compiler for C++ which converted C++ to C.

CFront-generated code was used only for direct compilation because it had all macros expanded, which made the generated code useless for reuse on other platforms.

Cake has two modes. One is for direct compilation (like CFront) and the other preserves macros, includes, etc., making it suitable for distribution.

The other difference is that C++ is a second branch of evolution, making C++ more compatible with C89 than C99.

The idea of Cake is to keep to the main line of evolution of C and always be 100% compatible. Cake ♥ C.

The added extensions aim to keep the spirit of the language and implement proposed features in a way that lets them be tried out even before standardization.

cake's Issues

Array types are pointers for _Generic and others.

int main()
{
    char s[2];
    _Generic(s,
             char *: "char*",
             char [2]: "char [2]",
             default: "?");
}

It is returning "char [2]" and should return "char*".

The idea is to keep the most complete information always, but we need to
convert to "lower information" when required by the standard.

maybe add something like

struct type type_lvalue_conversion(struct type*);

and call it on _Generic.
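For reference, standard C11 behavior can be checked with a small sketch like the one below. The function names are hypothetical. The controlling expression of _Generic undergoes array-to-pointer conversion, while taking the address of the array keeps the full array type.

```c
/* Returns 1 when an array operand selects the pointer association. */
int array_selects_pointer(void)
{
    char s[2] = {0};
    return _Generic(s, char *: 1, default: 0);
}

/* Returns 1 when &s still carries the complete array type char[2]. */
int address_keeps_array_type(void)
{
    char s[2] = {0};
    return _Generic(&s, char (*)[2]: 1, default: 0);
}
```

This matches the idea above: the complete type information exists (as seen through `&s`), and the "lower information" form is what _Generic must see for a plain `s`.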

preprocessing emulation

The preprocessing emulation seems to get stuck on a first call to a macro:

typedef typeof(nullptr) nullptr_t;

#define giveit(X) _Generic(0UL, unsigned long: 1, void*: 2, nullptr_t: 2, default: 0)

#pragma expand giveit

int main(void)
{
   int a0 = giveit(0UL);
   int b0 = giveit(nullptr);
   int c0 = giveit((void*)0);
   int d0 = giveit(0);
}

Here the preprocessing output that is generated is

typedef typeof(nullptr) nullptr_t;

int main(void)
{
   int a0 = _Generic(0UL,unsignedlong:1,void*:2,nullptr_t:2,default:0);
   int b0 = _Generic(0UL,unsignedlong:1,void*:2,nullptr_t:2,default:0);
   int c0 = _Generic(0UL,unsignedlong:1,void*:2,nullptr_t:2,default:0);
   int d0 = _Generic(0UL,unsignedlong:1,void*:2,nullptr_t:2,default:0);
}

which just seems to reuse the argument for the first call to following calls.

(Also, the fact that this output lacks space characters, as in "unsignedlong", is a bit disturbing.)
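As a point of comparison, the sketch below shows the per-call results one would expect from a macro that dispatches on its own argument. The macro `classify` and the wrapper functions are hypothetical and are not the reporter's code; the nullptr case is left out so that the example also compiles as C11.

```c
/* Hypothetical macro: selects a value based on the type of its argument X. */
#define classify(X) _Generic((X), unsigned long: 1, void *: 2, default: 0)

/* Each call site expands with its own argument. */
int classify_unsigned_long(void) { return classify(0UL); }
int classify_void_pointer(void)  { return classify((void *)0); }
int classify_int(void)           { return classify(0); }
```

Each expansion has to substitute the argument of its own invocation, so the three functions return different values.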

typeof error

This code

int a[2];
typeof(a) b;

is generating

int a[2];
int[2] b;

instead of

int a[2];
int b[2];

Preprocessor rescanning

I was reading this doc

https://learn.microsoft.com/en-us/cpp/preprocessor/preprocessor-experimental-overview?source=recommendations&view=msvc-170

and found the same problem in cake preprocessor for the following item

"Rescanning replacement list for macros"

#define CAT(a,b) a ## b
#define ECHO(...) __VA_ARGS__
// IMPL1 and IMPL2 are implementation details
#define IMPL1(prefix,value) do_thing_one( prefix, value)
#define IMPL2(prefix,value) do_thing_two( prefix, value)

// MACRO chooses the expansion behavior based on the value passed to macro_switch
#define DO_THING(macro_switch, b) CAT(IMPL, macro_switch) ECHO(( "Hello", b))
DO_THING(1, "World");

// Traditional preprocessor:
// do_thing_one( "Hello", "World");
// Conforming preprocessor:
// IMPL1 ( "Hello","World");
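One way to observe which behavior a preprocessor has is to stringify the expansion and inspect it at run time. The sketch below reuses the macros from the Microsoft example; the `STR` and `XSTR` helpers and the function `expanded_do_thing` are additions made for this illustration.

```c
#define CAT(a, b) a ## b
#define ECHO(...) __VA_ARGS__
#define IMPL1(prefix, value) do_thing_one(prefix, value)
#define IMPL2(prefix, value) do_thing_two(prefix, value)
#define DO_THING(macro_switch, b) CAT(IMPL, macro_switch) ECHO(("Hello", b))

/* Helpers (not in the original): XSTR fully expands its argument,
   then STR turns the resulting tokens into a string literal. */
#define STR(...) #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)

/* Returns the text DO_THING(1, "World") expands to. */
const char *expanded_do_thing(void)
{
    return XSTR(DO_THING(1, "World"));
}
```

On a conforming preprocessor the returned string starts with `IMPL1`, because `IMPL1` was not followed by `(` when it was rescanned; a traditional MSVC-style preprocessor would instead produce text starting with `do_thing_one`.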

Feature request: struct introspection

Another feature I really miss in C is struct introspection (i.e., having automatic getters and setters for struct fields accessed by their numerical index), which would open the door to automatic struct serialization and other very useful related things.

For example, I imagine something like this:

struct myvector{
   double x;
   double y;
   double z;
};

struct mydata{
   uint32_t onevalue;
   struct myvector v;
   char  id[5];
};

struct mydata datavalues={0};

size_t numfields = numfieldsof(struct mydata);
for(size_t f=0; f<numfields; f++) {
   fread((char *)&datavalues + offsetof(fieldof(struct mydata, f)), sizeof(fieldof(struct mydata, f)), 1, fileptr);
}

This would need two new operators: numfieldsof(), and fieldof(), and of course offsetof() should be available.
I also foresee the need of assigning flags or attributes to struct fields, so that the program could have for example an enum for the type of the field (which could be necessary in some kinds of serialization), or a flag for not serializing a field which shouldn't be serialized). Perhaps an extra operator returning the field name as a C string could be nice as well, something that could let you do printf("The name of the third field in struct mydata is: '%s'", fieldident(fieldof(struct mydata, 2)));

Things that might require some additional thought:

  • Pointers to dynamic arrays (maybe a raw implementation as just a pointer could be the way to go).
  • C strings (null terminated strings: if they are arrays, no problem, but if they are pointers to dynamic arrays, then it's perhaps a particular case of previous comment).
  • Bitfields (not sure if they would need an additional consideration).
  • How to let the program define flags or enums for each struct field (i.e: How to set and read per-field flags such as "DONT_SERIALIZE", "TYPE_U32", "TYPE_FLOAT"... of course all these enums or definitions should be something defined only in the user program... Cake shouldn't know nor care about these flags/enums, because otherwise it would limit this feature by constraining what the implementation can do and cannot do).

Note that nested structs shouldn't require additional thought: My simple example above has a nested struct.
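Until something like this exists, the same information can be approximated in standard C with a hand-maintained field table. The sketch below is only an emulation under that assumption: `struct field_info`, the `FIELD` macro, `mydata_fields`, `MYDATA_NUMFIELDS` and `copy_fields` are hypothetical names standing in for what fieldof(), fieldident() and numfieldsof() might provide.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct myvector { double x, y, z; };

struct mydata {
    uint32_t onevalue;
    struct myvector v;
    char id[5];
};

/* Hypothetical descriptor: name, offset and size of one field. */
struct field_info {
    const char *name;
    size_t offset;
    size_t size;
};

#define FIELD(type, member) \
    { #member, offsetof(type, member), sizeof(((type *)0)->member) }

/* Hand-written table; the proposal would have the compiler generate it. */
static const struct field_info mydata_fields[] = {
    FIELD(struct mydata, onevalue),
    FIELD(struct mydata, v),
    FIELD(struct mydata, id),
};

/* Stand-in for the proposed numfieldsof(struct mydata). */
#define MYDATA_NUMFIELDS (sizeof mydata_fields / sizeof mydata_fields[0])

/* Copies every field from src to dst by walking the table. */
void copy_fields(struct mydata *dst, const struct mydata *src)
{
    for (size_t f = 0; f < MYDATA_NUMFIELDS; f++) {
        const struct field_info *fi = &mydata_fields[f];
        memcpy((char *)dst + fi->offset,
               (const char *)src + fi->offset, fi->size);
    }
}
```

The nested struct is handled as a single field here, which is consistent with the note above; per-field flags could be added as one more member of the descriptor.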

Better support for defer at AST

Today defer is implemented at code generation with little support from AST.
For instance, a stack of defers is built at code generation. (visit_ctx)

I am considering putting this stack in the AST to have better support and to be able to
do static analysis that accounts for the effects of defer happening at the end of the scope, not where
defer is parsed.

To be able to check this effect, the parser needs to evaluate defer at the end of the scope
during compilation. Having this defer stack in the AST may help.

Annotation feature also requires this analysis.

creating embed files

Today #embed expands the byte sequence in place.
For instance:

  static const char file_txt[] = {
   #embed "stdio.h"
   ,0
  };

becomes

static const char file_txt[] = {
    35,112,113, /*... lot more ... */ 10
   ,0
  };

The idea of this feature is to add an option that generates a file with an "embed_" prefix, which is then included,
resulting in:

  static const char file_txt[] = {
   #include "embed_stdio.h"
   ,0
  };

that looks much more like the original version.

style options for -n option

-n Check naming conventions (it is hardcoded for its own naming convention)

Today it is hardcoded to cake's own style.

type normalization

When building a type data structure we need to remove all redundant parentheses.
For example,

char (((A)));

is the same as

char A;
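A small C11 sketch can confirm that redundant parentheses in a declarator do not change the declared type. The function-pointer pair and the checking functions are additions for illustration only.

```c
/* Redundant parentheses around a declarator leave the type unchanged. */
char (((A)));
char B;

int (*(fp1))(void);   /* pointer to function returning int */
int (*fp2)(void);     /* same type, written without the extra parentheses */

/* Returns 1 when both A and B are plain char objects. */
int same_char_type(void)
{
    return _Generic(&A, char *: 1, default: 0)
        && _Generic(&B, char *: 1, default: 0);
}

/* Returns 1 when fp1 and fp2 have the same function-pointer type. */
int same_function_pointer_type(void)
{
    return _Generic(&fp1, int (**)(void): 1, default: 0)
        && _Generic(&fp2, int (**)(void): 1, default: 0);
}
```

Note that only the parentheses directly around a declarator are redundant; the ones in `(*fp2)` are needed to bind the `*` before the parameter list.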

wrong size / type literals

static_assert(sizeof("abc") == 4);
static_assert(sizeof("\nabc") == 5);
static_assert(sizeof(L"abc") == 16);
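The expected values follow from counting the characters plus the terminating null. The sketch below, with hypothetical function names, expresses the wide case in units of wchar_t, since the byte count of 16 only holds where sizeof(wchar_t) is 4 (for example on Linux); with a 2-byte wchar_t, as on Windows, it would be 8.

```c
#include <stddef.h>  /* size_t, wchar_t */

/* 3 characters + terminating '\0' */
size_t size_narrow(void) { return sizeof("abc"); }

/* "\n" is a single character, so 4 characters + '\0' */
size_t size_escape(void) { return sizeof("\nabc"); }

/* 3 wide characters + terminating L'\0', counted in wchar_t units */
size_t size_wide_units(void) { return sizeof(L"abc") / sizeof(wchar_t); }
```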

nullptr does not seem to have a dedicated type

The following

int main(void)
{
    auto a = nullptr;
}

is resolved to long, even when the target language is C23. In general, one of the points of nullptr is to be able to use _Generic with it and distinguish macro calls that receive nullptr from those that receive other pointer values.

constness troubles

Qualifiers seem to be treated quite wrongly

int main()
{
   double const x = 78.9;
   double y = 78.9;
   auto q = x;
   auto const p = &x;
   auto const r = &y;
}

Here this results in q being const double, which is wrong because the initializer undergoes lvalue conversion and loses its qualification.

p and r are even worse, because the const doesn't land on the right side of the * as it should. The const in an auto declaration applies to the whole type on the RHS, much as if it were encapsulated in a typedef.
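The typedef analogy can be spelled out in C11 without auto. In the sketch below the typedef names and the checking functions are hypothetical; they only illustrate the types that q, p and r are expected to get.

```c
double const x = 78.9;
double y = 78.9;

typedef const double *ptr_to_const_double;  /* type of &x */
typedef double *ptr_to_double;              /* type of &y */

/* const applied to the typedef qualifies the pointer itself. */
ptr_to_const_double const p = &x;
ptr_to_double const r = &y;

/* Lvalue conversion drops the top-level const of x,
   so a variable initialized from x would be a plain double. */
int q_would_be_plain_double(void)
{
    return _Generic(x, double: 1, default: 0);
}

/* p is a const pointer to const double: const lands right of the *. */
int p_is_const_pointer_to_const_double(void)
{
    return _Generic(&p, const double *const *: 1, default: 0);
}

/* r is a const pointer to (non-const) double. */
int r_is_const_pointer_to_double(void)
{
    return _Generic(&r, double *const *: 1, default: 0);
}
```

The address of `p` and `r` is used in the checks because _Generic itself drops top-level qualifiers from its controlling expression.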

C23 auto

With this sample

int main() {
 int a5[5];
 auto a[3] = &a5;
}

GCC emits
error: 'auto' requires a plain identifier, possibly with attributes, as declarator

Cake generates (similar to what happens with typedefs)

int main()
{
 int a5[5];
 int  (* a[3])[5] = &a5;
}

Also

int main()
{
 int * p1;
 auto * p2 = p1;
}

GCC
error: 'auto' requires a plain identifier, possibly with attributes, as declarator

Cake generates

int main()
{
  int * p1;
  auto * p2 = p1;
}

and warning: auto with pointer is UB in C23

return type not working

The same function we use must accept a function or a function pointer
and be able to handle this code:

int (*(*F1)(void))(int, int*);
int (* F2(void) )(int, int*);
static_assert(_Generic(F1(), int (*)(int, int*) : 1));
static_assert(_Generic(F2(), int (*)(int, int*) : 1));

missing conversions in auto initializers

Arrays and functions are not converted to pointers in expressions where they should be.
A simple example

int main(void)
{
    double ar[6];
    extern int func(void);
    auto a = ar;
    auto f = func;
}

Both a and f should have pointer type.
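Written with explicit types instead of auto, the expected conversions look like the sketch below. The function `decayed_types_work` and the body given to `func` are additions so that the example is self-contained.

```c
/* Stand-in definition so the example links on its own. */
int func(void) { return 42; }

/* Explicitly typed equivalents of `auto a = ar;` and `auto f = func;`. */
int decayed_types_work(void)
{
    double ar[6] = {1.5};
    double *a = ar;          /* array converts to pointer to its first element */
    int (*f)(void) = func;   /* function designator converts to function pointer */
    return a == &ar[0] && *a == 1.5 && f() == 42;
}
```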

Redeclaration error

At file scope you can have multiple declarations.

typedef int I;
I i;
int i;
int i = 0;

In function scope, and for parameters, only one declaration is allowed, except for typedefs.

typedef int I;
typedef int I;
typedef int I;
typedef int I;
I i;
int i;
int i = 0;

int main()
{
    typedef int I;
    typedef int I;
}

When multiple declarations are accepted, they must have the same type. This check is missing.

wrong type comparison

typedef int A1[2];
typedef A1 *B1 [1];

static_assert(
  (typeof(B1)) == (int (* [1]) [2])
  );

      
typedef int A2[2];
typedef A2 *B2 ;

static_assert(
  (typeof(B2)) == (int (*) [2])
  );

Wrong error report

Wrong error report here

void F() {
  char (*s)[10];
  *s + 1;
}

error: left operator is not scalar

Incorrect error report array

void f() {
    int a[] = { 1, 2 };
    *(a + 1) = 1;
}

error: indirection requires pointer operand.
The problem is caused because a + 1 is not considered a pointer.

Wrong output of elif

For some reason "defined(C)" is disappearing.

#if defined(A)
#if defined(B)
#elif defined(C)
#endif
#endif

Is generating

#if defined(A)
#if defined(B)
#elif 
#endif
#endif

Strange 8s in output

#include <windows.h>
int a;

is generating

#include <windows.h>
888888888int a;

Feature request: define user numerical types with their arithmetic operators

One of the features I miss the most in C is being able to define for example my own floating point data type and provide its arithmetic operators implementation. There's a proposal in WG14: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3051.pdf (and I believe the LCC win32 compiler implements that).

But, aside from that proposal, given that Cake translates to C99, I believe the implementation could be quite straightforward, because the generated C99 code would just need to call the function pointers of the operators implementation provided by the user. Then maybe the data storage could be provided by an opaque struct defined by the user and its sizeof() result.

Strange character on the middle of windows header

In the middle of some windows header like

C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um\winnt.h:9763:2

We can find a strange character that produces an error:

Opening the file in the VC IDE makes the character visible.

What is the problem?
This causes a surprise and a build error for people trying cake.
(I deleted this character myself.)

Implementing sizeof / alignof

sizeof/alignof does not affect transpilation but it may trigger some static asserts and it is on the roadmap
of the complete front end compiler.

Missing size of array

The type information that a has 3 elements is missing.

int a[] = { 1, 2, 3 };
static_assert(sizeof(a) == sizeof(int) * 3);

Ensure we have a guaranteed path to free and destroy

We should have a warning here

[[free]] void *  malloc(int i){}
void free([[free]] void *p) {}

struct X {
  int i;
};

[[free]] struct X* f() {
    struct X * p = malloc(1);
    struct X * p2;
    p2 = p;
    return p2; /*p2 is moved*/
}

int g(int runtime_condition) {
   struct X * p = f();


   if (runtime_condition)
   {
     free(p); /*not guaranteed path*/
   }
}
int main() {}

Parsing error

Error 'i' not found in

void (*f(int i))(void) {
        i = 1;
        return 0;
}

The problem was the parameters scope was (void) instead of (correct one) (int i).

Move static analysis to "visit"

Cake is doing static analysis while parsing (for instance ownership checks).
The problem is that it is impossible to handle gotos without the full AST.
We cannot see for instance, what are the scopes we are leaving without having
the goto destination. Then we cannot check defers and end-of-life of variables.

To be able to do static analysis of defer, we also may need a secondary phase.

fix generic sample using lambdas and typeof

This is a regression. We didn't have a test for this.
Lambdas are the most complicated part of code generation because local declarations have to be exported and then renamed locally.
It's the only feature that requires two passes.

I am also considering removing lambdas to focus on existing C23 features, like enum and constexpr.

typeof error inside typename

typeof(int [2]) *p1;
int main(){
 auto p2 = (typeof(int [2]) *) p1 ;
}

This generates wrong code at line 3 (line 1 is correct):

 int  (* p2)[2] = (int[2] *) p1 ;

should be:

 int  (* p2)[2] = (int (*)[2]) p1 ;

fixing auto type inference details

type for auto should be const char* not array of chars.

int main()
{
  auto s = "string";
  static_assert(_is_same(const char*, typeof(s)));
}

duplicated try catch make already used label

If we repeat a try catch block:

#include <stdio.h>
int main()
{
    FILE * f = NULL;
    try
    {
        f = fopen("file.txt", "r");
        if (f == NULL) throw;
        /*success here*/
    }
    catch
    {
        /*some error*/
    }
    if (f)
        fclose(f);

    try
    {
        f = fopen("file.txt", "r");
        if (f == NULL) throw;
        /*success here*/
    }
    catch
    {
        /*some error*/
    }
    if (f)
        fclose(f);
}

The compiled code for the second try catch block uses the same label, causing a compile error.
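One possible shape for the fix is to give each try block its own label pair. The sketch below is a hand-written, hypothetical lowering to plain C and not Cake's actual output; `run_two_blocks` and the label names are invented, and the two parameters simulate a failing fopen().

```c
/* Hypothetical lowering of two consecutive try/catch blocks.
   fail_first / fail_second simulate the condition that triggers throw. */
int run_two_blocks(int fail_first, int fail_second)
{
    int errors = 0;

    /* first try block */
    if (fail_first) goto catch_label_1;    /* throw */
    goto end_try_1;                        /* success: skip the handler */
catch_label_1:
    errors += 1;                           /* catch body */
end_try_1:

    /* second try block: a fresh label pair avoids the redefinition */
    if (fail_second) goto catch_label_2;   /* throw */
    goto end_try_2;
catch_label_2:
    errors += 10;                          /* catch body */
end_try_2:

    return errors;
}
```

Because labels have function scope in C, a counter appended to the label name (or any other per-block unique suffix) is enough to keep repeated blocks from colliding.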

implement __func__

int fname()
{
  const char * s = __func__;
  _Static_assert(sizeof(__func__) == 6, "");
}
