michael-lehn / abc-llvm
ABC: A Bloody Compiler for A Better C
Two double quotes in direct succession are interpreted by the compiler as a string containing two double quotes instead of as an empty string.
I will call them strings for convenience, although they are actually pointers to u8.
type String : -> u8
Expected behavior: "" should be interpreted as an empty string, i.e. a string with no content.
Actual behavior: "" is interpreted like the string "\"\"".
This appears to be a bug in either the parser or the preprocessor.
The following code is used to analyze the strings and to make the problem easier to reproduce.
@ <stdio.hdr>
// Type aliases for better readability
type Char : u8;
type String : -> Char;
// Returns the length of a String
fn strlen (str : String) : size_t {
    local length : size_t = 0;
    for (; *str != 0; ++str, ++length) {;} // Explicit empty statement
    return length;
}
// Analyzes a String by printing it out together with its length.
fn analyzeString (string : String) {
    local length : size_t = strlen(string);
    printf("The String '%s' is %zu Char(s) long.\n", string, length);
}
In the following example, emptyString is analyzed. The console output reports that the string has a length of 2 and the content "" (exact output: The String '""' is 2 Char(s) long.).
fn main () {
    local emptyString : String = "";
    analyzeString(emptyString);
}
If multiple empty strings are concatenated by the preprocessor, the double quotes simply accumulate.
The call analyzeString("" "" "" ""); outputs The String '""""""""' is 8 Char(s) long. on the console.
But if text is placed between the double quotes, those parts of the string concatenation behave as expected.
The call analyzeString("" "123" "" "abc"); outputs The String '""123""abc' is 10 Char(s) long. on the console.
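The expected behavior can be sketched outside the compiler. The following C function is a hypothetical illustration, not code from the ABC sources: decode_adjacent_literals is an invented name, and the sketch assumes that the delimiting quotes never belong to the content and that adjacent literals are simply joined. Escape sequences are copied through undecoded to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the ABC sources.
 * Decodes a run of adjacent string literals, e.g.  "" "123" "" "abc",
 * into out (always zero terminated on success) and returns the content
 * length. Returns (size_t)-1 on malformed input or insufficient space. */
size_t decode_adjacent_literals(const char *src, char *out, size_t cap)
{
    assert(src != NULL && out != NULL);
    if (cap == 0)
        return (size_t)-1;

    size_t n = 0;
    for (;;) {
        /* Skip whitespace between adjacent literals. */
        while (*src == ' ' || *src == '\t' || *src == '\n')
            ++src;
        if (*src != '"')
            break;
        ++src;                          /* opening quote: not content */
        while (*src != '"') {
            if (*src == '\0')
                return (size_t)-1;      /* unterminated literal */
            if (*src == '\\' && src[1] != '\0') {
                if (n + 1 >= cap)
                    return (size_t)-1;
                out[n++] = *src++;      /* keep the backslash as-is */
            }
            if (n + 1 >= cap)
                return (size_t)-1;      /* keep room for the terminator */
            out[n++] = *src++;
        }
        ++src;                          /* closing quote: not content */
    }
    if (*src != '\0')
        return (size_t)-1;              /* trailing non-literal text */
    out[n] = '\0';
    return n;
}
```

With this logic, "" decodes to a length of 0 and "" "123" "" "abc" decodes to the six characters 123abc.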
If a pointer to a u8 with the value 0 is used, it behaves like an empty String.
fn main () {
    local zeroValue : u8 = 0;
    analyzeString(&zeroValue);
}
In this case the console output is The String '' is 0 Char(s) long.
It is possible to create buffer overflows with global string variables.
For convenience I will call them strings, although they are actually pointers to u8 or arrays of u8.
The following code illustrates the problem.
@ <stdio.hdr>
extern fn exit (:int);
global string0 : array [32] of u8 = "Malicious Buffer Overflow "; // Misses the zero char at the end
global hexu64 : u64 = 0x6161616161616161; // 8 printable bytes buffered to get to the next string address
global hexu64_ : u64 = 0x6161616161616161; // 8 printable bytes buffered to get to the next string address
global hello1 : array [6] of u8 = "Hallo1"; // Misses the zero char at the end
global hello2 : array [7] of u8 = "Hallo2"; // Has the zero char
global hello3 : array [6] of u8 = "Hallo3"; // Misses the zero char at the end
global hello4 : array [6] of u8 = "Hallo4"; // Misses the zero char at the end
fn main () {
    printf (">> %zu @ %s\n", string0, string0);
    printf (">> %zu @ %s\n", hello1, hello1);
    printf (">> %zu @ %s\n", hello2, hello2);
    printf (">> %zu @ %s\n", hello3, hello3);
    printf (">> %zu @ %s\n", hello4, hello4);
    printf (">> %zu @ %s\n", &hexu64, &hexu64);
    exit (0);
}
On my machine, this snippet produces the following output:
>> 140356855304208 @ Malicious Buffer Overflow aaaaaaaaaaaaaaaaHallo1Hallo2
>> 140356855304256 @ Hallo1Hallo2
>> 140356855304262 @ Hallo2
>> 140356855304269 @ Hallo3Hallo4
>> 140356855304275 @ Hallo4
>> 140356855304240 @ aaaaaaaaaaaaaaaaHallo1Hallo2
It illustrates that strings stored in global variables can "escape" their bounds, because the array of u8 is allowed to be one byte too small to hold the terminating zero byte.
The example global hello2 : array [7] of u8 = "Hallo2" shows that this overflow does not happen when the array is one byte bigger, so that it can contain the zero byte.
The output for string0 even shows that the read can run not only into the neighboring string, but also through other variables that should not normally be printed as text.
The generated assembly code explains this behavior.
string0:
.ascii "Malicious Buffer Overflow "
.size string0, 32
.type hexu64,@object
.p2align 3, 0x0
hexu64:
.quad 7016996765293437281
.size hexu64, 8
.type hexu64_,@object
.p2align 3, 0x0
hexu64_:
.quad 7016996765293437281
.size hexu64_, 8
.type hello1,@object
hello1:
.ascii "Hallo1"
.size hello1, 6
.type hello2,@object
hello2:
.asciz "Hallo2"
.size hello2, 7
.type hello3,@object
hello3:
.ascii "Hallo3"
.size hello3, 6
.type hello4,@object
hello4:
.ascii "Hallo4"
.size hello4, 6
.type .L0,@object
.section .rodata,"a",@progbits
This snippet only contains the global variables. The assembly code for the main function was omitted.
As the assembly code shows, only the hello2 variable is stored as .asciz, which denotes a zero-terminated ASCII string.
The other strings are stored as .ascii, which means that they are not zero terminated.
This may be intended behavior, since the size of an array of u8 may be known.
The issue is that arrays are required to store global strings, and those arrays do not require the last entry to be a zero terminator.
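If the array size is known, as suggested above, the bytes can be printed without relying on a terminator. The following is a minimal sketch in C rather than ABC, assuming the size is passed along explicitly. format_bounded is a hypothetical helper, and hello1 mirrors the six-byte global from the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Six bytes and no terminator, comparable to the .ascii data above. */
const char hello1[6] = { 'H', 'a', 'l', 'l', 'o', '1' };

/* Hypothetical helper: formats at most len bytes of buf into out.
 * The precision in "%.*s" limits how many bytes printf-style functions
 * read, so buf does not need to be zero terminated. */
int format_bounded(char *out, size_t cap, const char *buf, size_t len)
{
    assert(out != NULL && buf != NULL);
    return snprintf(out, cap, "%.*s", (int)len, buf);
}
```

A call such as format_bounded(out, sizeof out, hello1, sizeof hello1) stops after the six bytes of hello1 instead of reading on into the neighboring variable.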
One possible solution is to force the size of the u8 array to be big enough to store the zero terminator when a string literal is stored in it.
This may be an easy solution, since a length check is already in place, as shown below. But it only checks whether the array is long enough to store the string excluding the zero terminator.
global hello5 : array [5] of u8 = "Hallo5";
global hello5 : array [5] of u8 = "Hallo5";
^^^^^^^^^
overflow.abc:13.35-13.43: : error: : excess elements in array initializer
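The stricter check proposed above could look roughly like this. It is a sketch in C with invented names (check_string_init, init_check), not the compiler's actual check, and it assumes that literal_len counts the written characters without the zero terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for checking a string literal initializer. */
enum init_check {
    INIT_OK,             /* content and zero terminator both fit   */
    INIT_NO_TERMINATOR,  /* content fits, but the terminator is cut */
    INIT_EXCESS          /* content alone does not fit              */
};

/* literal_len is assumed to exclude the zero terminator. */
enum init_check check_string_init(size_t array_len, size_t literal_len)
{
    if (literal_len > array_len)
        return INIT_EXCESS;
    if (literal_len == array_len)
        return INIT_NO_TERMINATOR;
    return INIT_OK;
}

/* The array sizes from the report, each initialized with a 6-character literal. */
void check_string_init_examples(void)
{
    assert(check_string_init(5, 6) == INIT_EXCESS);         /* hello5 */
    assert(check_string_init(6, 6) == INIT_NO_TERMINATOR);  /* hello1 */
    assert(check_string_init(7, 6) == INIT_OK);             /* hello2 */
}
```

Whether the middle case should be an error or only a warning is a design decision; the sketch merely makes it distinguishable from the other two.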
Another solution would be to allow storing zero-terminated strings in u8 pointers, as in the following example.
This would also simplify writing global string variables, since the length of the global array would no longer have to be updated every time, either to make room for the string or to avoid allocating global space that is never used.
global moin : -> u8 = "Moin";
gen::cat: can not cast 'array [5] of u8' to '-> u8'
abc: gen/cast.cpp:39: llvm::Value* gen::cast(gen::Value, const abc::Type*, const abc::Type*): Assertion `0' failed.
As shown, this is currently not possible, and it even triggers an assertion failure.
This should also be fixed and may become the subject of a future issue once I understand it well enough.
It is possible to create overflows with string variables, since the compiler presumably treats the length of a string literal as only the written content without the zero terminator (i.e. the empty string "" has length 0 but should actually contain 1 character: the zero terminator).
The fall-through feature of the switch-case construct causes a segmentation fault in the compiler.
The following minimal example triggers the error.
fn main () {
    switch (42) {
        case 1:
        case 2:
    }
}
This issue occurs when two or more case clauses follow one another without a statement separating them.
A single semicolon after case 1: is enough to circumvent the error.
fn main () {
    switch (42) {
        case 1:;
        case 2:
    }
}
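For comparison, this is the fall-through behavior one would expect from consecutive case labels, sketched in C rather than ABC. The function name bucket and its return values are made up for the illustration.

```c
#include <assert.h>

/* Hypothetical example: case 1 has no statement of its own and
 * falls through into the code of case 2. */
int bucket(int x)
{
    switch (x) {
    case 1:          /* no statement here: falls through */
    case 2:
        return 12;
    default:
        return 0;
    }
}

/* Both labels select the same code, any other value takes the default. */
void bucket_examples(void)
{
    assert(bucket(1) == 12);
    assert(bucket(2) == 12);
    assert(bucket(42) == 0);
}
```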
I am using WSL with Ubuntu 22.04.4 LTS and LLVM 18.1.3. The abc compiler was compiled with clang++-18 (Ubuntu clang version 18.1.3) for the target x86_64-pc-linux-gnu.
A case clause directly followed by a default clause (or vice versa) does not cause this problem.
It only occurs if two or more case clauses are involved with no statements in between. However, the following also triggers the error.
fn main () {
    switch (42) {
        case 1:
        default:
        case 2:
    }
}
Running the compiler under Valgrind suggests that the issue may lie within LLVM or in a function call into it. Valgrind reports:
==2774== Invalid read of size 1
==2774== at 0x981395F: ??? (in /usr/lib/llvm-18/lib/libLLVM.so.1)
==2774== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==2774==
==2774==
==2774== Process terminating with default action of signal 11 (SIGSEGV)
==2774== Access not within mapped region at address 0x0
==2774== at 0x981395F: ??? (in /usr/lib/llvm-18/lib/libLLVM.so.1)
==2774== If you believe this happened as a result of a stack
==2774== overflow in your program's main thread (unlikely but
==2774== possible), you can try to increase the size of the
==2774== main thread stack using the --main-stacksize= flag.
==2774== The main thread stack size used in this run was 8388608.