
abc-llvm's People

Contributors

michael-lehn

Forkers

gmh5225

abc-llvm's Issues

Empty strings are treated by the compiler as a string of two double quotes

Description

Two double quotes directly after one another are interpreted by the compiler as a string containing two double quotes instead of an empty string.

For convenience I will call them strings, although they are actually pointers to a u8.
type String : -> u8

Expected behavior: "" should be interpreted as if there is nothing in the string.
Actual behavior: "" is interpreted like the string "\"\"".

This appears to be a bug either in the parser or in the pre-processor.
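Wherever the bug turns out to be, the expected handling of a literal token can be sketched in C. This is not abc's actual lexer: `literal_content` is a hypothetical helper, and escape sequences are left undecoded to keep the sketch short. The point is only that a token consisting of exactly two quotes must yield zero content bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the abc sources.
 * Copies the text between the delimiting quotes of a string-literal token
 * into out and zero-terminates it. Returns the content length, or
 * (size_t)-1 if the token is malformed or does not fit into out. */
size_t literal_content(const char *token, char *out, size_t cap)
{
    size_t n = strlen(token);
    if (n < 2 || token[0] != '"' || token[n - 1] != '"')
        return (size_t)-1;

    size_t len = n - 2;            /* the token "" has n == 2, so len == 0 */
    if (len + 1 > cap)
        return (size_t)-1;

    memcpy(out, token + 1, len);   /* copies nothing for the empty literal */
    out[len] = '\0';
    return len;
}
```

With this handling, the two-character token `""` produces a buffer that holds only the zero terminator.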

To Reproduce

The following code segment is used to analyze the strings and to make the issue easier to reproduce.

@ <stdio.hdr>

// Type aliases for better readability
type Char : u8;
type String : -> Char;

// Returns the length of a String
fn strlen (str : String) : size_t {
    local length : size_t = 0;
    for (; *str != 0; ++str, ++length) {;} // Explicit empty statement
    return length;
}

// Analyzes a String by printing it out together with its length.
fn analyzeString (string : String) {
    local length : size_t = strlen(string);
    printf("The String '%s' is %zu Char(s) long.\n", string, length);
}

In the following example, emptyString is analyzed. The program reports a length of 2 and the content "" (exact output: The String '""' is 2 Char(s) long.).

fn main () {
    local emptyString : String = "";
    analyzeString(emptyString);
}

If multiple empty strings are concatenated by the pre-processor, the double quotes simply accumulate.
The call analyzeString("" "" "" ""); outputs The String '""""""""' is 8 Char(s) long. to the console.

But if text is inserted between the double quotes, those parts of the concatenation behave as expected.
The call analyzeString("" "123" "" "abc"); outputs The String '""123""abc' is 10 Char(s) long. to the console.
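For comparison, this is how C treats the same two inputs. That abc intends the same concatenation semantics as C is my assumption, based on the expected behavior stated above; the helper names are made up for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Adjacent string literals are merged into one literal at compile time. */
static const char *const only_empty = "" "" "" "";
static const char *const mixed      = "" "123" "" "abc";

/* Illustrative helpers that report the resulting lengths. */
size_t only_empty_length(void) { return strlen(only_empty); }
size_t mixed_length(void)      { return strlen(mixed); }
```

Under these semantics the first call would report a length of 0 and the second a length of 6, with the content 123abc.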

Current workaround

If a pointer to a u8 with the value 0 is used, it behaves like an empty String.

fn main () {
    local zeroValue : u8 = 0;
    analyzeString(&zeroValue);
}

In this case the console output will be The String '' is 0 Char(s) long.

Buffer overflows with strings as global variables

Description

It is possible to create buffer overflows with global string variables.
For convenience I will call them strings, although they are actually pointers to u8 or arrays of u8.

To Reproduce

The following code should illustrate the problem.

@ <stdio.hdr>
extern fn exit (:int);

global string0 : array [32] of u8 = "Malicious Buffer Overflow       "; // Misses the zero char at the end
global hexu64 : u64 = 0x6161616161616161; // 8 printable bytes buffered to get to the next string address
global hexu64_ : u64 = 0x6161616161616161; // 8 printable bytes buffered to get to the next string address
global hello1 : array [6] of u8 = "Hallo1"; // Misses the zero char at the end
global hello2 : array [7] of u8 = "Hallo2"; // Has the zero char
global hello3 : array [6] of u8 = "Hallo3"; // Misses the zero char at the end
global hello4 : array [6] of u8 = "Hallo4"; // Misses the zero char at the end

fn main () {
    printf (">> %zu @ %s\n", string0, string0);
    printf (">> %zu @ %s\n", hello1, hello1);
    printf (">> %zu @ %s\n", hello2, hello2);
    printf (">> %zu @ %s\n", hello3, hello3);
    printf (">> %zu @ %s\n", hello4, hello4);
    printf (">> %zu @ %s\n", &hexu64, &hexu64);
    exit (0);
}

The output of this snippet on my machine is the following:

>> 140356855304208 @ Malicious Buffer Overflow       aaaaaaaaaaaaaaaaHallo1Hallo2
>> 140356855304256 @ Hallo1Hallo2
>> 140356855304262 @ Hallo2
>> 140356855304269 @ Hallo3Hallo4
>> 140356855304275 @ Hallo4
>> 140356855304240 @ aaaaaaaaaaaaaaaaHallo1Hallo2

This shows that strings stored as global variables can "escape" their bounds, because an array of u8 is allowed to be one byte too small to hold the terminating zero byte.

The declaration global hello2 : array [7] of u8 = "Hallo2" shows that the overflow does not occur when the array is one byte larger, so that it can hold the zero byte.

The output for string0 shows that the read does not stop at the neighboring string: it also runs through other variables that should not normally be printed as text.
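C has the same pitfall: an array sized exactly to the visible characters silently loses the terminator. The sketch below shows one way to print such an array safely in C by passing the length explicitly. `format_bounded` is a name made up for this example, and whether abc's printf binding is used the same way is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* C (unlike C++) accepts this initializer: the literal fills the array
 * exactly and the terminating zero byte is dropped. */
static const char hello1[6] = "Hallo1";

/* Formats at most len bytes of data into out. The precision given to
 * "%.*s" keeps the read inside the array even without a terminator. */
int format_bounded(char *out, size_t cap, const char *data, size_t len)
{
    return snprintf(out, cap, "%.*s", (int)len, data);
}
```

Called as format_bounded(buf, sizeof buf, hello1, sizeof hello1), the function never reads past the six bytes of hello1, regardless of what is stored next to it.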

Reason and possible Solutions

The generated assembler code explains the behavior.

string0:
	.ascii	"Malicious Buffer Overflow       "
	.size	string0, 32

	.type	hexu64,@object
	.p2align	3, 0x0
hexu64:
	.quad	7016996765293437281
	.size	hexu64, 8

	.type	hexu64_,@object
	.p2align	3, 0x0
hexu64_:
	.quad	7016996765293437281
	.size	hexu64_, 8

	.type	hello1,@object
hello1:
	.ascii	"Hallo1"
	.size	hello1, 6

	.type	hello2,@object
hello2:
	.asciz	"Hallo2"
	.size	hello2, 7

	.type	hello3,@object
hello3:
	.ascii	"Hallo3"
	.size	hello3, 6

	.type	hello4,@object
hello4:
	.ascii	"Hallo4"
	.size	hello4, 6

	.type	.L0,@object
	.section	.rodata,"a",@progbits

This snippet only contains the global space for variables. The assembler code for the main function was omitted.

As the assembler code shows, only the hello2 variable is stored with .asciz, which denotes a zero-terminated ASCII string.
The other strings are stored with .ascii, which means that they are not zero-terminated.
This may be intended behavior, since the size may be known when arrays of u8 are used.

The issue is that global strings have to be stored in arrays, and those arrays do not require the last entry to be a zero terminator.

One possible solution is to force the u8 array to be large enough to hold the zero terminator whenever a string literal is stored in it.
This may be an easy fix, since a length check is already in place, as shown below. But it only checks whether the array is long enough to hold the string without the zero terminator.

global hello5 : array [5] of u8 = "Hallo5";
global hello5 : array [5] of u8 = "Hallo5";
                                  ^^^^^^^^^
overflow.abc:13.35-13.43: : error: : excess elements in array initializer
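The stricter check proposed above can be sketched as follows. This is a hypothetical illustration in C, not the check in the abc sources: `initializer_fits` and its parameters are invented for this sketch, and `literal_len` counts only the visible characters (6 for "Hallo1").

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check, not taken from the abc sources.
 * With require_terminator set, one extra byte is demanded for the zero
 * byte; without it, the check matches the behavior described above. */
bool initializer_fits(size_t array_len, size_t literal_len,
                      bool require_terminator)
{
    size_t needed = literal_len + (require_terminator ? 1 : 0);
    return needed <= array_len;
}
```

With require_terminator enabled, array [6] of u8 = "Hallo1" would be rejected and array [7] of u8 = "Hallo2" would still be accepted.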

Another solution would be to allow zero-terminated strings to be stored in u8 pointers, as in the following example.
This would also simplify writing global string variables, since the array length would no longer need to be updated every time the string changes, either to make room for it or to avoid allocating global space that is never used.

global moin : -> u8 = "Moin";
gen::cat: can not cast 'array [5] of u8' to '-> u8'
abc: gen/cast.cpp:39: llvm::Value* gen::cast(gen::Value, const abc::Type*, const abc::Type*): Assertion `0' failed.

As shown, this is currently not possible, and it even triggers an assertion failure.
This should also be fixed and may become the subject of a future issue once I understand it well enough.
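For reference, the C counterpart of the requested feature looks like this. It only illustrates the semantics being asked for and says nothing about how abc would implement the cast; `moin_length` is a helper added for the example.

```c
#include <string.h>

/* The literal is placed in static storage together with its terminator;
 * the global itself only holds the address. */
static const char *const moin = "Moin";

/* Illustrative helper: the length is found by scanning for the zero byte. */
size_t moin_length(void) { return strlen(moin); }
```

No array length has to be written or maintained by hand, and the terminator is always present.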

TL;DR

It is possible to create overflows with string variables, since the compiler presumably takes the length of a string literal to be only the written content without the zero terminator (i.e. the empty string "" has length 0, but it should actually contain 1 character: the zero terminator).

Switch-case fallthrough crashes the compiler

Description

Using fallthrough in the switch-case construct causes a segmentation fault in the compiler.

To Reproduce

The following minimal example triggers the error.

fn main () {
    switch (42) {
        case 1:
        case 2:
    }
}

This issue occurs when two or more case clauses follow one another without a statement separating them.

A single semicolon after case 1: is enough to circumvent the error.

fn main () {
    switch (42) {
        case 1:;
        case 2:
    }
}
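For comparison, consecutive case labels are ordinary fallthrough in C. The sketch below shows the behavior one would expect from the abc construct as well; that abc intends the same semantics is an assumption, and the function name is made up.

```c
/* Returns 1 for the values 1 and 2, which share one body through
 * fallthrough, and 0 for every other value. */
int is_one_or_two(int x)
{
    switch (x) {
    case 1:   /* no statement here: control falls through to case 2 */
    case 2:
        return 1;
    default:
        return 0;
    }
}
```

Here is_one_or_two(42) returns 0 and matches neither label, just as the switch (42) in the minimal example would be expected to compile to a no-op.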

I am using WSL with Ubuntu 22.04.4 LTS and LLVM 18.1.3. The abc compiler was built with clang++-18 (Ubuntu clang version 18.1.3) for the target x86_64-pc-linux-gnu.

Further insight

A case clause directly followed by a default clause (or vice versa) does not cause this problem.
It only occurs if two or more case clauses are involved with no statements in between. However, the following also triggers the error.

fn main () {
    switch (42) {
        case 1:
        default:
        case 2:
    }
}

Running the compiler under valgrind suggests that the issue may lie in LLVM or in a function call into it. Valgrind reports:

==2774== Invalid read of size 1
==2774==    at 0x981395F: ??? (in /usr/lib/llvm-18/lib/libLLVM.so.1)
==2774==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==2774==
==2774==
==2774== Process terminating with default action of signal 11 (SIGSEGV)
==2774==  Access not within mapped region at address 0x0
==2774==    at 0x981395F: ??? (in /usr/lib/llvm-18/lib/libLLVM.so.1)
==2774==  If you believe this happened as a result of a stack
==2774==  overflow in your program's main thread (unlikely but
==2774==  possible), you can try to increase the size of the
==2774==  main thread stack using the --main-stacksize= flag.
==2774==  The main thread stack size used in this run was 8388608.
