WARNING: this is just my experiment I work on in my free time. Do NOT rely on it.

poica

This library exports type-driven development to plain C11.

Motivation
Features
Installation
ADTs (algebraic data types)
Type introspection
Safe, consistent error handling
Built-in ADTs
Type-generic programming
Roadmap
FAQ

Motivation

Programming (especially in C!) is notoriously error-prone. The bad thing is that only some of a programmer's mistakes are caught during testing; the rest reach production, leaving our users with malfunctioning devices and costing the business money.

The good news is that many kinds of mistakes can be detected during compilation, thereby:

Improving safety of our software;
Increasing execution speed by eliminating run-time assertions.

Powerful type systems are good for this, because they limit the ranges of values that can be assigned to variables. Unfortunately, C has a very weak type system, which is unable to express most of the business-logic constraints between communicating software components. To remedy the situation, poica imitates various features of modern type systems via the macro system of C. The explanations and examples below will convey the key ideas to you!

Features

C11-compliant
Can work on bare-metal environments
Comes with the specification
No third-party code generators, just #include <poica.h> and go!

Installation

git clone https://github.com/Hirrolot/poica.git
cd poica

The only dependency is Boost/Preprocessor (yes, it supports plain C). If you are on a UNIX-like system, just run the following script:

sudo bash scripts/install_boost_pp.sh

Alternatively, Boost/Preprocessor can be downloaded and then installed from its official releases.

Since poica is a header-only library, feel free to copy the necessary files into your project, add their directory to the include path (using the -I compiler option), and #include <poica.h> to use its API. That's all.

ADTs (algebraic data types)

ADTs provide a convenient approach to combine, destruct, and introspect data types. There are two main kinds of them: sum types and product types.

Simply put, a sum type holds exactly one of T1, ..., Tn, and a product type holds all of T1, ..., Tn at once. Another name for a sum type is a tagged union, and product types correspond to structures in C.

Pattern matching means checking each variant of a sum type and, if the checked variant is the actual one, triggering some action. It is like an if statement, but for sum types rather than for boolean expressions.

ADTs have a tremendous number of applications in real-world programming, including:

Safe, consistent error handling
Compiler construction: tokens & AST evaluation
Concurrency: message passing

Motivation

Usually in C we use unions to tell a compiler that we're going to interpret a single memory region in different ways. To decide how to interpret a union, we endow it with a tag and get a tagged union.

However, there'll be quite a lot of duplication in the code:

typedef struct {
    enum {
        OUR_TAGGED_UNION_STATE_1,
        OUR_TAGGED_UNION_STATE_2,
        OUR_TAGGED_UNION_STATE_3,
    } state;

    union {
        int state_1;
        const char *state_2;
        double state_3;
    } data;
} OurTaggedUnion;

What's even worse is that this approach is unsafe, meaning that we can (i) construct an invalid OurTaggedUnion, or (ii) access data.state_1 when the actual state is OUR_TAGGED_UNION_STATE_3:

// (i)
OurTaggedUnion res1 = { .state = OUR_TAGGED_UNION_STATE_2, .data.state_1 = 123 };

// (ii)
OurTaggedUnion res2 = { .state = OUR_TAGGED_UNION_STATE_3, .data.state_3 = .99 };
some_procedure(res2.data.state_1);

poica solves these two problems by introducing algebraic data types (discussed in the next section). Here is how the same type is written with poica:

choice(
    OurTaggedUnion,
    variant(MkState1, int)
    variant(MkState2, const char *)
    variant(MkState3, double)
);

// (i) Compilation failed!
OurTaggedUnion res1 = MkState2(123);

OurTaggedUnion res2 = MkState3(.99);
some_procedure(/* Impossible to pass state_1! */);
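
One way to picture why the mismatched construction in (i) is rejected: when every variant can only be built through a constructor with a typed parameter, the tag and the payload cannot disagree. Below is a minimal hand-written sketch in plain C. It is not poica's actual macro expansion, and the names TaggedUnionSketch, mk_state1, mk_state2, mk_state3, and as_state1 are hypothetical.

```c
#include <stddef.h>

/* Hand-written tagged union, shaped like the one in the motivation. */
typedef struct {
    enum { STATE_1, STATE_2, STATE_3 } state;
    union {
        int state_1;
        const char *state_2;
        double state_3;
    } data;
} TaggedUnionSketch;

/* Each constructor sets the tag and the matching payload together. */
static inline TaggedUnionSketch mk_state1(int x) {
    TaggedUnionSketch u = {.state = STATE_1, .data.state_1 = x};
    return u;
}

static inline TaggedUnionSketch mk_state2(const char *s) {
    TaggedUnionSketch u = {.state = STATE_2, .data.state_2 = s};
    return u;
}

static inline TaggedUnionSketch mk_state3(double d) {
    TaggedUnionSketch u = {.state = STATE_3, .data.state_3 = d};
    return u;
}

/* Checked accessor: yields NULL unless the value really is in state 1. */
static inline const int *as_state1(const TaggedUnionSketch *u) {
    return u->state == STATE_1 ? &u->data.state_1 : NULL;
}
```

With these definitions, mk_state2(123) passes an int where a const char * is expected, which a conforming compiler must diagnose, and as_state1 on a value built by mk_state3 returns NULL instead of reinterpreting the bytes of a double.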

Sum types

For example, a binary tree can be conveniently represented as a sum type and further manipulated using pattern matching. In the code below we first construct a binary tree, and then print all its elements to stdout:

[examples/binary_tree.c]

#include <poica.h>

#include <stdio.h>

choice(
    Tree,
    variant(MkEmpty)
    variant(MkLeaf, int)
    variantMany(MkNode,
        field(left, struct Tree *)
        field(number, int)
        field(right, struct Tree *)
    )
);

void print_tree(const Tree *tree) {
    match(*tree) {
        of(MkEmpty) {
            return;
        }
        of(MkLeaf, number) {
            printf("%d\n", *number);
        }
        ofMany(MkNode, (left, number, right)) {
            print_tree(*left);
            printf("%d\n", *number);
            print_tree(*right);
        }
    }
}

#define TREE(tree)                obj(tree, Tree)
#define NODE(left, number, right) TREE(MkNode(left, number, right))
#define LEAF(number)              TREE(MkLeaf(number))

int main(void) {
    const Tree *tree =
        NODE(NODE(LEAF(81), 456, NODE(LEAF(90), 7, LEAF(111))), 57, LEAF(123));

    print_tree(tree);
}

#undef TREE
#undef NODE
#undef LEAF

Output

81
456
90
7
111
57
123

Product types

If we have structures in C, why do we need product types? Well, because product types provide type introspection (discussed in the next section). A product type is represented like this:

record(
    UserAccount,
    field(name, const char *)
    field(balance, double)
    field(age, unsigned)
);

And it can be further manipulated like an ordinary structure:

UserAccount user = {"Gandalf", 14565.322, 715};
user.name = "Mithrandir";
user.age++;
user.balance *= 2;

Type introspection

Type introspection is supported in the sense that you can query the type properties of ADTs at compile-time and then handle them somehow in your hand-written macros.

Motivation

Sometimes it's desirable not only to declare new data types, but also to introspect their inner structure. Type introspection makes such things as these possible:

And more (planned!).

Sum types

[examples/introspection/choice.c]

#include <poica.h>

#include <stdio.h>

#include <boost/preprocessor.hpp>

#define MY_CHOICE                                                           \
    Something,                                                              \
    variant(MkA)                                                            \
    variant(MkB, int)                                                       \
    variantMany(MkC, field(c1, double) field(c2, char))

choice(MY_CHOICE);
#define Something_INTROSPECT POICA_CHOICE_INTROSPECT(MY_CHOICE)

int main(void) {
    puts(BOOST_PP_STRINGIZE(Something_INTROSPECT));
}

Output

((POICA_VARIANT_KIND_EMPTY)(MkA))
((POICA_VARIANT_KIND_SINGLE)(MkB)(int))
((POICA_VARIANT_KIND_MANY)(MkC)( ((c1)(double)) ((c2)(char)) ))

Product types

[examples/introspection/record.c]

#include <poica.h>

#include <stdio.h>

#include <boost/preprocessor.hpp>

#define MY_RECORD                                                           \
    Something,                                                              \
    field(a, int)                                                           \
    field(b, const char *)                                                  \
    field(c, double)

record(MY_RECORD);
#define Something_INTROSPECT POICA_RECORD_INTROSPECT(MY_RECORD)

int main(void) {
    puts(BOOST_PP_STRINGIZE(Something_INTROSPECT));
}

Output

((a)(int)) ((b)(const char *)) ((c)(double))

That is, the metainformation about a type is a sequence in Boost/Preprocessor terms. So the BOOST_PP_SEQ_* macros can be applied to it, as well as other utility macros from poica.
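
The underlying idea, describing a type once as data that other macros can traverse, can also be sketched in plain C with X-macros. This is a different and simpler technique than the Boost/Preprocessor sequences that poica produces, shown here only as an illustration; the names USER_FIELDS, User, user_field_names, and USER_FIELD_COUNT are hypothetical.

```c
#include <stddef.h>

/* The field list is written once; each consumer supplies its own X. */
#define USER_FIELDS(X)                                                         \
    X(name, const char *)                                                      \
    X(balance, double)                                                         \
    X(age, unsigned)

/* Consumer 1: generate the structure definition. */
#define AS_MEMBER(field, type) type field;
typedef struct {
    USER_FIELDS(AS_MEMBER)
} User;

/* Consumer 2: generate a table of field names for run-time use. */
#define AS_NAME(field, type) #field,
static const char *const user_field_names[] = {USER_FIELDS(AS_NAME)};

/* The number of fields, derived from the same list. */
enum {
    USER_FIELD_COUNT = sizeof(user_field_names) / sizeof(user_field_names[0])
};
```

Adding a field to USER_FIELDS updates the structure, the name table, and the count together, which is the property that type-driven serializers or argument parsers rely on.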

Safe, consistent error handling

ADTs provide a safe, consistent approach to error handling. A procedure that can fail returns a sum type, designating either a successful or a failure value, like this:

typedef enum RecvMsgErrKind {
    BAD_CONN,
    NO_SUCH_USER,
    ...
} RecvMsgErrKind;

typedef const char *Msg;

DefRes(Msg, RecvMsgErrKind);

Res(Msg, RecvMsgErrKind) recv_msg(...) { ... }

And then Res(Msg, RecvMsgErrKind) can be matched to decide what to do in the case of Ok(Msg, RecvMsgErrKind) and Err(Msg, RecvMsgErrKind):

Res(Msg, RecvMsgErrKind) res = recv_msg(...);
match(res) {
    of(Ok(Msg, RecvMsgErrKind), msg) { ... }
    of(Err(Msg, RecvMsgErrKind), err_kind) { ... }
}

But why is this better than int error codes? Because of:

Readability. Identifiers such as Ok and Err are written for humans, so it is much harder to confuse them with each other. In contrast, the usual approach in C is to signal an error through magic values or ranges (for example, <0 or -1).
Consistency. No need to invent different strategies to handle different kinds of errors (e.g. exceptions for less likely errors, int codes for normal control flow, ...); ADTs address the problem of error handling generally.
Exhaustiveness checking (case analysis). A smart compiler or a static analysis tool can check that all the variants of Res are handled in match, so we can't forget to handle an error and introduce a possibly serious bug by letting the application carry on as if there were no error when there is one.

ADTs even have advantages over exceptions: they do not manipulate the program stack, since they are ordinary values with no implicit logic that could hurt performance.

See examples/error_handling.c as an example of error handling using ADTs.
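
To make the "just values" point concrete, here is a hand-written sketch of a result type in plain C, without poica. It only approximates what a Res-like type offers; the names ParseErrKind, LenRes, checked_len, and describe are hypothetical and do not come from the library.

```c
#include <stddef.h>
#include <string.h>

typedef enum { ERR_EMPTY_INPUT, ERR_TOO_LONG } ParseErrKind;

typedef struct {
    enum { RES_OK, RES_ERR } tag;
    union {
        size_t ok;        /* valid only when tag == RES_OK */
        ParseErrKind err; /* valid only when tag == RES_ERR */
    } as;
} LenRes;

/* Returns the length of `s` if it is non-empty and at most `max` bytes. */
static LenRes checked_len(const char *s, size_t max) {
    size_t n = strlen(s);
    if (n == 0) return (LenRes){.tag = RES_ERR, .as.err = ERR_EMPTY_INPUT};
    if (n > max) return (LenRes){.tag = RES_ERR, .as.err = ERR_TOO_LONG};
    return (LenRes){.tag = RES_OK, .as.ok = n};
}

/* The caller inspects the tag before touching the payload. */
static const char *describe(LenRes res) {
    switch (res.tag) {
    case RES_OK:
        return "ok";
    case RES_ERR:
        return res.as.err == ERR_EMPTY_INPUT ? "empty input" : "too long";
    }
    return "unreachable";
}
```

Because the switch has no default label, GCC and Clang can warn under -Wswitch (enabled by -Wall) if a new tag is added and not handled, which is a weak form of the exhaustiveness checking described above. Unlike the poica version, nothing here stops a caller from reading as.ok without checking the tag.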

Built-in ADTs

ADT	Description	Example
`Maybe`	An optional value	`examples/maybe.c`
`Either`	Either this value or that	`examples/either.c`
`Pair`	A pair of elements	`examples/pair.c`
`Res`	Either a successful or a failure value	`examples/error_handling.c`

The last one was presented in the previous section. All these generic types share a common API:

// Generate a definition of an ADT.
DefX(T1, ..., Tn);

// Generate a type name.
X(T1, ..., Tn) x = ...;

The utility functions can be found in the specification.

Type-generic programming

Type-generic programming is a way to abstract over concrete data types: instead of writing the same function or data structure each time for concrete types, you write it generically, allowing specific types to be substituted later.

Motivation

This problem is often addressed via void * in C. However, it has three big disadvantages:

A compiler is unable to perform type-specific optimisations;
void * types could be confused with each other;
Not self-documenting.

poica uses a technique called monomorphisation, which means that it instantiates your generic types with concrete substitutions during preprocessing, eliminating all of these disadvantages of void *.

Generic types

Below is a trivial implementation of a generic linked list:

[examples/generic_linked_list.c]

#include <poica.h>

#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DeclLinkedList(type)                                                   \
    typedef struct LinkedList(type) {                                          \
        type *data;                                                            \
        struct LinkedList(type) * next;                                        \
    }                                                                          \
    LinkedList(type);                                                          \
                                                                               \
    static LinkedList(type) * listNew(type)(type item);                        \
    static void listFree(type)(LinkedList(type) * list);                       \
                                                                               \
    POICA_FORCE_SEMICOLON

#define DefLinkedList(type)                                                    \
    static LinkedList(type) * listNew(type)(type item) {                       \
        LinkedList(type) *list = malloc(sizeof(*list));                        \
        assert(list);                                                          \
                                                                               \
        list->data = malloc(sizeof(type));                                     \
        assert(list->data);                                                    \
        memcpy(list->data, &item, sizeof(type));                               \
        list->next = NULL;                                                     \
                                                                               \
        return list;                                                           \
    }                                                                          \
                                                                               \
    static void listFree(type)(LinkedList(type) * list) {                      \
        LinkedList(type) *node = list;                                         \
                                                                               \
        do {                                                                   \
            free(node->data);                                                  \
            LinkedList(type) *next_node = node->next;                          \
            free(node);                                                        \
            node = next_node;                                                  \
        } while (node);                                                        \
    }                                                                          \
                                                                               \
    POICA_FORCE_SEMICOLON

#define LinkedList(type) POICA_MONOMORPHISE(LinkedList, type)
#define listNew(type)    POICA_MONOMORPHISE(listNew, type)
#define listFree(type)   POICA_MONOMORPHISE(listFree, type)

DeclLinkedList(int);
DefLinkedList(int);

int main(void) {
    LinkedList(int) *list = listNew(int)(123);
    list->next = listNew(int)(456);
    list->next->next = listNew(int)(789);

    listFree(int)(list);
}

There's not much to say, except that POICA_MONOMORPHISE expands to a unique function or type identifier, i.e. it performs type substitution.
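
How a macro can turn a name and a type into a unique identifier can be sketched with token pasting. The block below is an assumption-laden stand-in, not poica's implementation: MONO, MONO_PASTE, maxOf, and DefMaxOf are hypothetical names, and the identifiers that POICA_MONOMORPHISE actually generates may follow a different scheme.

```c
/* Two-level pasting so that macro arguments are expanded before ##. */
#define MONO_PASTE(name, type) name##_##type
#define MONO(name, type)       MONO_PASTE(name, type)

/* Generic "maximum of two values", instantiated once per type. */
#define maxOf(type) MONO(maxOf, type)
#define DefMaxOf(type)                                                         \
    static type maxOf(type)(type a, type b) { return a > b ? a : b; }

DefMaxOf(int)    /* defines maxOf_int */
DefMaxOf(double) /* defines maxOf_double */
```

A call such as maxOf(int)(3, 7) then resolves to the ordinary function maxOf_int, so the compiler sees concrete types and can check and optimise each instantiation separately. Pasting only works when the type is a single token, which is why multi-word types would need a typedef first in this sketch.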

Interfaces

An interface declares a collection of procedures, which shall be defined by its implementors. Interfaces can be used to achieve ad-hoc polymorphism, by defining a parametrically polymorphic procedure with type constraints on a specific interface.

For example, consider the Register interface with load and store operations:

[examples/swap_registers.c]

#include <poica.h>

#include <stdio.h>

#define declRegisterLoad(type) static type registerLoad(type)(const type *self)
#define declRegisterStore(type)                                                \
    static void registerStore(type)(type * self, const type *src)

#define registerLoad(type)  POICA_MONOMORPHISE(registerLoad, type)
#define registerStore(type) POICA_MONOMORPHISE(registerStore, type)

And then we can define a parametrically polymorphic swap procedure, taking three pointers to some type that implements Register:

#define declSwap(type)                                                         \
    static void swap(type)(type * left, type * right, type * tmp)
#define defSwap(type)                                                          \
    declSwap(type) {                                                           \
        registerStore(type)(tmp, left);                                        \
        registerStore(type)(left, right);                                      \
        registerStore(type)(right, tmp);                                       \
    }                                                                          \
                                                                               \
    POICA_FORCE_SEMICOLON

#define swap(type) POICA_MONOMORPHISE(swap, type)

After that, we implement Register and define swap for int:

declRegisterLoad(int);
declRegisterStore(int);
declSwap(int);

declRegisterLoad(int) {
    return *self;
}

declRegisterStore(int) {
    *self = *src;
}

defSwap(int);

The main procedure looks like this. Here we work only with int as a register, but later you can implement Register for arbitrary types in the same manner.

int main(void) {
    int ax = 2, bx = 63, tmp = 0;
    printf("ax = %d, bx = %d\n", ax, bx);

    swap(int)(&ax, &bx, &tmp);

    printf("ax = %d, bx = %d\n", ax, bx);
}

Output

ax = 2, bx = 63
ax = 63, bx = 2
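
Implementing the interface for another type follows the same steps. The sketch below is self-contained, so it replaces POICA_MONOMORPHISE with a hypothetical token-pasting stand-in called MONO and omits POICA_FORCE_SEMICOLON; the Reg16 type and its masking behaviour are likewise invented for illustration.

```c
/* Stand-in for POICA_MONOMORPHISE so that this sketch compiles on its own. */
#define MONO_PASTE(name, type) name##_##type
#define MONO(name, type)       MONO_PASTE(name, type)

#define registerStore(type) MONO(registerStore, type)
#define swap(type)          MONO(swap, type)

/* Generic swap, written once against the store operation of the interface. */
#define defSwap(type)                                                          \
    static void swap(type)(type *left, type *right, type *tmp) {               \
        registerStore(type)(tmp, left);                                        \
        registerStore(type)(left, right);                                      \
        registerStore(type)(right, tmp);                                       \
    }

/* A second implementor: a 16-bit register that masks its value on store. */
typedef struct {
    unsigned value;
} Reg16;

static void registerStore(Reg16)(Reg16 *self, const Reg16 *src) {
    self->value = src->value & 0xFFFFu;
}

defSwap(Reg16)
```

Calling swap(Reg16)(&a, &b, &tmp) goes through the Reg16 store operation three times, so any type-specific behaviour (here, the masking) is applied without the generic swap knowing about it.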

HKTs (higher-kinded types)

Higher-kinded types allow you to write code even more generically. Consider these facts:

int has kind *
LinkedList, Vect, Set have kind * -> *
HashMap has kind * -> * -> *

Do you see the pattern? int is already a concrete type, so its kind is just *. To turn LinkedList into a concrete type, we need to apply it to some other type, i.e. POICA_MONOMORPHISE(LinkedList, SomeType).

poica supports partial application of higher-kinded types, meaning that you can pass a higher-kinded type as a type argument into another generic type, thereby completing it at some later point.

For instance, TreeG (taken from a Stack Overflow answer) has kind (* -> *) -> * -> *:

[examples/hkt.c]

#include <poica.h>

#define DefTreeG(branch, type)                                                 \
    choice(                                                                    \
        TreeG(branch, type),                                                   \
        variantMany(Branch(branch, type),                                      \
            field(data, type)                                                  \
            field(branches,                                                    \
                POICA_MONOMORPHISE(branch, TreeG(branch, type))                \
            )                                                                  \
        )                                                                      \
        variant(Leaf(branch, type), type))

#define TreeG(branch, type)  POICA_MONOMORPHISE(TreeG, branch, type)
#define Branch(branch, type) POICA_MONOMORPHISE(Branch, branch, type)
#define Leaf(branch, type)   POICA_MONOMORPHISE(Leaf, branch, type)

The branch type parameter has kind * -> *, so it can be something like LinkedList or Vect. Below we define BinaryTree and WeirdTree. They are about to be passed into TreeG later:

#define DefBinaryTree(type)                                                    \
    record(                                                                    \
        BinaryTree(type),                                                      \
        field(left, struct type *)                                             \
        field(right, struct type *)                                            \
    )
#define BinaryTree(type) POICA_MONOMORPHISE(BinaryTree, type)

#define DefWeirdTree(type)                                                     \
    record(                                                                    \
        WeirdTree(type),                                                       \
        field(text, const char *)                                              \
    )
#define WeirdTree(type) POICA_MONOMORPHISE(WeirdTree, type)

DefBinaryTree(TreeG(BinaryTree, int));
DefTreeG(BinaryTree, int);

DefWeirdTree(TreeG(WeirdTree, int));
DefTreeG(WeirdTree, int);

And they can be constructed as follows:

void binary_tree(void) {
    TreeG(BinaryTree, int) _456_leaf = Leaf(BinaryTree, int)(456);
    TreeG(BinaryTree, int) _789_leaf = Leaf(BinaryTree, int)(789);

    TreeG(BinaryTree, int) binary_tree =
        Branch(BinaryTree, int)(123,
                                (BinaryTree(TreeG(BinaryTree, int))){
                                    &_456_leaf,
                                    &_789_leaf,
                                });
}

void weird_tree(void) {
    TreeG(WeirdTree, int) weird_tree_1 =
        Branch(WeirdTree, int)(123,
                               (WeirdTree(TreeG(WeirdTree, int))){
                                   .text = "Hey",
                               });
}
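
The mechanism that makes this possible, passing the name of a function-like macro unapplied and applying it later, can be sketched in isolation. The block below uses plain token pasting instead of poica; CAT, Box, Twin, and Labelled are hypothetical names chosen for the illustration.

```c
#define CAT_(a, b) a##_##b
#define CAT(a, b)  CAT_(a, b)

/* Kind * -> *: a box holding one value. */
#define Box(T)    CAT(Box, T)
#define DefBox(T) typedef struct { T value; } Box(T)

/* Kind * -> *: two values of the same type. */
#define Twin(T)    CAT(Twin, T)
#define DefTwin(T) typedef struct { T first, second; } Twin(T)

/* Kind (* -> *) -> * -> *: F arrives unapplied and is applied here as F(T). */
#define Labelled(F, T) CAT(CAT(Labelled, F), T)
#define DefLabelled(F, T)                                                      \
    typedef struct {                                                           \
        const char *label;                                                     \
        F(T) payload;                                                          \
    } Labelled(F, T)

DefBox(int);
DefTwin(int);
DefLabelled(Box, int);  /* Labelled_Box_int, payload is Box_int */
DefLabelled(Twin, int); /* Labelled_Twin_int, payload is Twin_int */
```

A function-like macro name that is not followed by an opening parenthesis is left alone by the preprocessor, so Box and Twin travel through DefLabelled as plain identifiers until F(T) is formed and rescanned.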

Roadmap

Higher-order abstract syntax
Allow specifying attributes on ADTs and their items (probably a breaking change).
Add several libraries of wrappers for common libraries (see issue #1).
Add a library for type-driven JSON (de)serialization (see serde-json).
Add a library for type-driven command-line argument parsing (see clap).

FAQ

Q: What does "poica" mean?

A: "poica" is a Quenya word meaning clean, pure. The name reflects the library's API.

Q: Any pitfalls?

Scary macro errors, describing consequences, not causes.
Macro blueprinting. It occurs typically in advanced metaprogramming and can be solved using the DEFER + EXPAND combination, presented in the link above.

Q: Why are ADTs algebraic?

A: Read "The algebra (and calculus!) of algebraic data types" by Joel Burget.

Q: How to resolve name collisions?

A: #define POICA_USE_PREFIX before #include <poica.h> renames all the camelCased and PascalCased identifiers (match -> poicaMatch, DefRes -> PoicaDefRes, ...) in the current translation unit.
