llir / llvm

Library for interacting with LLVM IR in pure Go.

Home Page: https://llir.github.io/document/

License: BSD Zero Clause License

Go 99.68% Makefile 0.30% Shell 0.02%

llvm-ir go golang llvm

llvm's Issues

ir: rename NewFunction to NewFunc

Just want to question a4f2487. Is there a good rationale for it? One rule I try to abide by is "only have one name for a thing, to the extent it is possible". So it should be type Func xor func NewFunction.

Visualize Control-Flow Graphs using the DOT graph description language

Corresponds to requirement 8, decomp/decomp#98.

Distinguish between unnamed local %42 and named local %"42"

LLVM distinguishes between unnamed local variables (e.g. %42), and named local variables (e.g. %"42"). To be compatible, we should too.

Example test case llvm/test/Analysis/DominanceFrontier/new_pm_test.ll:

define void @a_linear_impl_fig_1() nounwind {
0:
  br label %"1"
1:
  br label %"2"
2:
  br label %"3"
3:
  br i1 1, label %"13", label %"4"
4:
  br i1 1, label %"5", label %"1"
5:
  br i1 1, label %"8", label %"6"
6:
  br i1 1, label %"7", label %"4"
7:
  ret void
8:
  br i1 1, label %"9", label %"1"
9:
  br label %"10"
10:
  br i1 1, label %"12", label %"11"
11:
  br i1 1, label %"9", label %"8"
13:
  br i1 1, label %"2", label %"1"
12:
   switch i32 0, label %"1" [ i32 0, label %"9"
                              i32 1, label %"8"]
}

When parsing the example file above, we currently get the error:

invalid local ID in function "@a_linear_impl_fig_1", expected %12, got %13

This is because the basic block names are treated as unnamed IDs, and their order is out of place, since basic block 13 appears before 12.
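The distinction can be sketched as follows. This is a minimal illustration using only the standard library, not the parser's actual code; the LocalIdent type and parseLocalIdent function are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// LocalIdent is a hypothetical representation of a local identifier that
// keeps unnamed IDs (%42) apart from named identifiers (%"42" or %foo).
type LocalIdent struct {
	Name    string // set for named identifiers
	ID      uint64 // set for unnamed identifiers
	Unnamed bool
}

// parseLocalIdent classifies the textual form of a local identifier.
// Escape sequences inside quoted names are not handled in this sketch.
func parseLocalIdent(s string) (LocalIdent, error) {
	if !strings.HasPrefix(s, "%") {
		return LocalIdent{}, fmt.Errorf("missing %% prefix in %q", s)
	}
	rest := s[1:]
	// A quoted identifier is always a name, even if it contains only digits.
	if len(rest) >= 2 && strings.HasPrefix(rest, `"`) && strings.HasSuffix(rest, `"`) {
		return LocalIdent{Name: rest[1 : len(rest)-1]}, nil
	}
	// An unquoted run of decimal digits is an unnamed ID.
	if id, err := strconv.ParseUint(rest, 10, 64); err == nil {
		return LocalIdent{ID: id, Unnamed: true}, nil
	}
	return LocalIdent{Name: rest}, nil
}

func main() {
	for _, s := range []string{`%42`, `%"42"`, `%foo`} {
		ident, err := parseLocalIdent(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-6s unnamed=%t\n", s, ident.Unnamed)
	}
}
```

With a representation like this, only identifiers flagged as unnamed would need to take part in the sequential ID check that currently rejects the test case.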

Add an Operands() function for inspecting operands of instructions

I think it would be very useful to have a way of finding the operands of an instruction.

Here's an example of a code generator which makes such a function.

https://gist.github.com/pwaller/255654cd78b77484a02cdfaa6a22237c

In the end, I don't know exactly how I feel about it (especially the code generation part...).

But I guess this gets me quite close to being able to implement a simple dead code pass which can kill unused private functions.

Example use:

func loopOperands(irModule *ir.Module) {
	for _, f := range irModule.Funcs {
		for _, bb := range f.Blocks {
			for _, i := range bb.Insts {
				log.Println("inst:", i.Def())
				var tmp [16]irvalue.Value
				for _, o := range ir.Operands(tmp[:0], i) {
					log.Println("  op:", o)
				}
			}
		}
	}
}

I note that Operands() could almost return pointers to values, so that the references are mutable. However, this is currently broken. The only cause I can find is the Scope field on InstCatchPad and InstCleanupPad. I think if we want to be able to obtain mutable references to operands, those fields should become of type value.Value. I guess there are pros and cons to that. But if you want mutable references to operands, I think the alternatives are going to be much uglier.

Read support for remaining LLVM IR language concepts

The intention is to provide read support for LLVM IR assembly using a Gocc generated lexer and parser from a BNF grammar of the LLVM IR assembly language.

The BNF grammar is located at ast/internal/ll.bnf. The grammar is kept in an internal directory because the lexer and parser packages generated by Gocc are considered internal packages, and should not be used by end-users directly. Instead, high-level libraries will make use of these internal packages to parse LLVM IR assembly into the data structures of the llir/llvm/ir package.

Since LLVM IR makes use of unnamed local variables and basic blocks, a context is required to keep track of and map local IDs to their associated values. A bit unfortunate, but this essentially means we cannot use syntax directed translation to translate directly from LLVM IR assembly to the data structures of the ir package. Instead, we must introduce an intermediate step which keeps the necessary information around for us to create and make use of this contextual information. Said and done, the current approach is to define an ast package for LLVM IR assembly, which will later be traversed to create the aforementioned context and translate AST nodes into their corresponding ir data types.

To get a feel for what the production action expressions of Gocc look like, see the following example.

FuncDef
    : "define" OptFuncLinkage
      FuncHeader FuncBody                         << irx.NewFuncDef($2, $3) >>
;

Help wanted

If anyone manages to figure out a clean way for us to skip this step (i.e. not having to translate from BNF grammar to AST, then from AST to ir data types; but instead, translating directly from BNF grammar to ir data types), and go directly from the BNF grammar to the ir package data types using production action expressions, please let us know. This would facilitate the maintainability and future development of this package a lot!

ir: add type as first argument to NewStruct, NewArray and NewVector?

Currently, the type of struct, array and vector constants is inferred from the elements and fields passed to their respective constructors. The idea was to make it easier for users to create these constants. However, there are valid cases where the user may wish to pass a specific type to these constructors, especially the NewStruct constructor, as struct types are equated by type identity and not structural equality.

Also for consistency with NewInt, NewFloat and other constructors of constants, we may wish to add a type as the first argument to the constructors NewStruct, NewArray and NewVector.

I'll leave this open for discussion, so we can collect different benefits and drawbacks with the various approaches.

To be specific, this issue suggests to update the constant.NewStruct, constant.NewArray and constant.NewVector constructors, as follows:

 package constant

-func NewStruct(fields ...Constant) *Struct
+func NewStruct(t *types.StructType, fields ...Constant) *Struct

-func NewArray(elems ...Constant) *Array
+func NewArray(t *types.ArrayType, elems ...Constant) *Array

-func NewVector(elems ...Constant) *Vector
+func NewVector(t *types.VectorType, elems ...Constant) *Vector

astx/fix: doesn't handle quoted names correctly

Input:

define void @fn() {
    call void @"quoted1"()
    call void @"quoted 2"()
    ret void
}

declare void @"quoted1"()
declare void @"quoted 2"()

Output:

panic: unable to locate global identifier "\"quoted1\""

goroutine 1 [running]:
github.com/llir/llvm/asm/internal/astx.(*fixer).getGlobal(0xc421d896a8, 0xc420012381, 0x9, 0x5fa520, 0xc421d45360)
	/home/dominikh/prj/src/github.com/llir/llvm/asm/internal/astx/fix.go:345 +0x12c
[...]

getGlobal and getLocal in astx/fix.go do not handle quoted names correctly. The maps store unquoted names, but the lookup includes the surrounding quotes.
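One way to address the mismatch is to normalize the name before the map lookup. The sketch below is illustrative only: unquote and lookupGlobal are hypothetical helpers, the map is a stand-in for the fixer's real state, and escape sequences inside quoted names are deliberately not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// unquote strips one pair of surrounding double quotes, if present.
// Escape sequences inside the quotes are not handled in this sketch.
func unquote(name string) string {
	if len(name) >= 2 && strings.HasPrefix(name, `"`) && strings.HasSuffix(name, `"`) {
		return name[1 : len(name)-1]
	}
	return name
}

// lookupGlobal normalizes the name before consulting the map, so that a
// quoted use site finds the entry stored under the unquoted name.
func lookupGlobal(globals map[string]bool, name string) bool {
	return globals[unquote(name)]
}

func main() {
	// Stand-in for the map of known globals, keyed by unquoted name.
	globals := map[string]bool{"quoted1": true, "quoted 2": true}
	fmt.Println(lookupGlobal(globals, `"quoted1"`))
	fmt.Println(lookupGlobal(globals, `"quoted 2"`))
	fmt.Println(lookupGlobal(globals, "missing"))
}
```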

Add WriteTo(w) methods in addition to Def() to produce asm

At the moment, every Def() allocates its own string builder. This results in a lot of allocation and copying overhead for building IR outputs.

I don't (yet) have a benchmark showing this to be a problem, but in terms of API it would be nice to supply a writer and have the llvm package write directly there.

This issue is a reminder to come back to this.

Generate a c-shared library for the standard C LLVM API

This idea was originally presented by @quarnster in #3 (comment). Creating a dedicated issue to track any discussions on the LLVM-dev mailing list, and implementation discussions. Please post updates in this issue if you read about the direction in which the definition of a stable C LLVM API is heading (this is still an active topic of discussion).

With go 1.5's support for creating c-shared libraries, I really like the idea of having a go generate tool which generates the standard C LLVM api (or the implemented subset anyways). That way this code could be used as a drop in replacement for anything that currently uses the LLVM C api, presuming all the functions used are implemented.

Just figured I'd mention this as it's been on my mind due to the discussion on the llvm-dev list about potentially splitting the LLVM C api into a separate project.

ir.InlineAsm does not implement value.Named

The documentation for NewCall says it may have one of the following types:

*ir.Function
*types.Param
*constant.ExprBitCast
*ir.InstBitCast
*ir.InstLoad
*ir.InlineAsm

However, ir.InlineAsm doesn't implement the value.Named interface, so trying to use it as the callee parameter results in a compile time error about *ir.InlineAsm not implementing GetName()

Produce Control-Flow Graphs from LLVM IR basic blocks

Corresponds to requirement 7, decomp/decomp#97.

Ensure that Gocc generated code conforms to gofmt

Ref: https://travis-ci.org/llir/llvm/builds/180995728

### gofmt
./asm/internal/token/token.go
./asm/internal/lexer/lexer.go
./asm/internal/lexer/transitiontable.go
./asm/internal/lexer/acttab.go
./asm/internal/parser/actiontable.go
./asm/internal/parser/productionstable.go
./asm/internal/parser/gototable.go
./asm/internal/parser/action.go
./asm/internal/parser/parser.go
./asm/internal/util/rune.go
./asm/internal/util/litconv.go
./asm/internal/errors/errors.go

Extract from gofmt -d

diff ./util/litconv.go gofmt/./util/litconv.go
--- /tmp/gofmt318333961	2016-12-04 12:01:27.498715220 +0100
+++ /tmp/gofmt274563540	2016-12-04 12:01:27.498715220 +0100
@@ -1,4 +1,3 @@
-
 // generated by gocc; DO NOT EDIT.
 
 //Copyright 2013 Vastech SA (PTY) LTD
diff ./util/rune.go gofmt/./util/rune.go
--- /tmp/gofmt147755799	2016-12-04 12:01:27.502048553 +0100
+++ /tmp/gofmt271031690	2016-12-04 12:01:27.502048553 +0100
@@ -1,4 +1,3 @@
-
 // generated by gocc; DO NOT EDIT.
 
 //Copyright 2013 Vastech SA (PTY) LTD
diff ./parser/parser.go gofmt/./parser/parser.go
--- /tmp/gofmt528173389	2016-12-04 12:01:27.522048553 +0100
+++ /tmp/gofmt702109256	2016-12-04 12:01:27.522048553 +0100
@@ -1,9 +1,8 @@
-
 // generated by gocc; DO NOT EDIT.
 
 package parser
 
-import(
+import (
 	"bytes"
 	"fmt"
 
@@ -20,16 +19,16 @@
 // Stack
 
 type stack struct {
-	state []int
-	attrib	[]Attrib
+	state  []int
+	attrib []Attrib
 }
 
 const iNITIAL_STACK_SIZE = 100
 
 func newStack() *stack {
-	return &stack{ 	state: 	make([]int, 0, iNITIAL_STACK_SIZE),
-					attrib: make([]Attrib, 0, iNITIAL_STACK_SIZE),
-			}
+	return &stack{state: make([]int, 0, iNITIAL_STACK_SIZE),
+		attrib: make([]Attrib, 0, iNITIAL_STACK_SIZE),
+	}
 }
 
 func (this *stack) reset() {
@@ -42,8 +41,8 @@
 	this.attrib = append(this.attrib, a)
 }
 
-func(this *stack) top() int {
-	return this.state[len(this.state) - 1]
+func (this *stack) top() int {
+	return this.state[len(this.state)-1]
 }
 ...

VAArg declaration

How do I create a variadic function (VAArg), like int printf(const char *format, ...);?

For example:

mod := ir.NewModule()
f := mod.NewFunc(
	"printf",
	types.I32,
	ir.NewParam("format", types.NewPointer(types.I8)),
)
fmt.Printf("%s\n", f.Def())

After searching, I think llir only has an instruction for using VAArg, but provides no way to declare a function that takes variadic arguments.

Test cases

This meta issue is meant to track the implementation of test cases. Ideally these test cases will be implemented after the API skeleton has been drafted but prior to the implementation of any core logic.

Create round-trip test cases which read an LLVM IR assembly file, store it, and read it back again. The IR of the two reads should be identical.

Feature: Specialized Metadata Nodes

https://llvm.org/docs/LangRef.html#specialized-metadata-nodes

ir: If array length is uint64, should bitsize also be?

Just came across 1f63577 which changes the type of array/vector lengths from int64 to uint64. This broke some code I have that multiplies by the bitsize of the element type because now they have different types.

I don't mind too much which type is used, but it seems to me that whatever logic is used to choose signed vs unsigned would apply equally well to both, and the consistency breakage is a downside

llvm/ir/types/types.go

Lines 208 to 214 in acfb969

// IntType is an LLVM IR integer type.
type IntType struct {
	// Type name; or empty if not present.
	TypeName string
	// Integer size in number of bits.
	BitSize int64
}

llvm/ir/types/types.go

Lines 601 to 609 in acfb969

// ArrayType is an LLVM IR array type.
type ArrayType struct {
	// Type name; or empty if not present.
	TypeName string
	// Array length.
	Len uint64
	// Element type.
	ElemType Type
}
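To make the breakage concrete, here is a minimal sketch with simplified stand-ins for the two types (the element type is fixed to an integer type for brevity, which is an assumption of this example). Because Go has no implicit numeric conversions, multiplying the length by the bit size now needs an explicit conversion.

```go
package main

import "fmt"

// IntType and ArrayType are simplified stand-ins that mirror only the
// field types under discussion.
type IntType struct {
	BitSize int64
}

type ArrayType struct {
	Len      uint64
	ElemType IntType
}

// totalBits returns the size of the array in bits. Writing
// t.Len * t.ElemType.BitSize does not compile, since the operands have
// mismatched types (uint64 and int64). The conversion assumes BitSize is
// never negative.
func totalBits(t ArrayType) uint64 {
	return t.Len * uint64(t.ElemType.BitSize)
}

func main() {
	t := ArrayType{Len: 4, ElemType: IntType{BitSize: 32}}
	fmt.Println(totalBits(t)) // 128
}
```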

How to handle attribute group IDs with missing attribute group definition?

Recently, we've been porting the suite of test cases from the official LLVM project. Many have helped uncover corner cases in the grammar, and the AST to IR translation code.

One of the corner cases seems quite strange though, as it appears to be valid to use attribute group IDs (e.g. #42) in LLVM IR modules that do not contain any associated attribute group definition (e.g. #42 = {...}).

For instance, test/DebugInfo/X86/parameters.ll uses #0, #1 and #2, but only contains definitions for #0 and #1. The definition for #2 is missing.

define void @_ZN7pr147634funcENS_3fooE(%"struct.pr14763::foo"* noalias sret %agg.result, %"struct.pr14763::foo"* %f) #0

declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

declare void @_ZN7pr147633fooC1ERKS0_(%"struct.pr14763::foo"*, %"struct.pr14763::foo"*) #2

attributes #0 = { uwtable }
attributes #1 = { nounwind readnone }

Any ideas why this may be? Also, how shall we handle these cases? It seems to be an error that should be reported, but since Clang and opt silently ignore it, perhaps we must too.

@pwaller what are your thoughts?

Cannot parse foo.ll containing basic block named `header:`

The generated lexer tokenizes header: as a token distinct from LabelIdent, because header: is also used as the field name of the specialized metadata node GenericDINodeField.

For this reason, any input file containing a basic block named header: will report a syntax error.

Example from llvm/test/Analysis/ScalarEvolution/2008-02-15-UMax.ll:

define i32 @foo(i32 %n) {
entry:
        br label %header
header:
        %i = phi i32 [ 100, %entry ], [ %i.inc, %next ]
        %cond = icmp ult i32 %i, %n
        br i1 %cond, label %next, label %return
next:
        %i.inc = add i32 %i, 1
        br label %header
return:
        ret i32 %i
}

Remove debug output before v0.3.0 release.

Just a reminder to remove the asm: parsing into AST took: 24.431252ms debug output before the v0.3.0 release.

unable to resolve circular local value references

Input file issue_27.ll:

; minimal test case adapted from the @main function of base32.ll, as part of
; coreutils in https://github.com/decomp/testdata

define i32 @main(i32, i8**) {
entry:
	br label %loop_init

loop_init:
	br label %loop_post

loop_cond:
	%cond = icmp ult i32 %i.0, 42
	br i1 %cond, label %loop_post, label %loop_exit

loop_post:
	%i.1 = phi i32 [ %i.0, %loop_cond ], [ 0, %loop_init ]
	%i.0 = add i32 %i.1, 1
	br label %loop_cond

loop_exit:
	ret i32 %i.0
}

$ lparse issue_27.ll
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x4d928a]

goroutine 1 [running]:
github.com/llir/llvm/ir.(*InstAdd).Type(0xc420707600, 0x65e400, 0xc421d17640)
	/home/u/Desktop/go/src/github.com/llir/llvm/ir/inst_binary.go:48 +0x2a
github.com/llir/llvm/asm/internal/irx.(*Module).basicBlock(0xc421d87e08, 0xc420706b80, 0xc420707500)
	/home/u/Desktop/go/src/github.com/llir/llvm/asm/internal/irx/translate.go:1114 +0xe69
github.com/llir/llvm/asm/internal/irx.(*Module).funcDecl(0xc421d87e08, 0xc420080320)
	/home/u/Desktop/go/src/github.com/llir/llvm/asm/internal/irx/translate.go:622 +0x2fd9
github.com/llir/llvm/asm/internal/irx.Translate(0xc421d4a000, 0x179, 0x379, 0xc421d4a000)
	/home/u/Desktop/go/src/github.com/llir/llvm/asm/internal/irx/translate.go:127 +0x1322
github.com/llir/llvm/asm.ParseBytes(0xc421d48000, 0x179, 0x379, 0x179, 0x379, 0x0)
	/home/u/Desktop/go/src/github.com/llir/llvm/asm/asm.go:43 +0x60
github.com/llir/llvm/asm.ParseFile(0x7ffcdd84892c, 0xa, 0xc42005c060, 0xc42000a090, 0x1)
	/home/u/Desktop/go/src/github.com/llir/llvm/asm/asm.go:22 +0x9f
main.parse(0x7ffcdd84892c, 0xa, 0xc4208ca240, 0xc420079f58)
	/home/u/Desktop/go/src/github.com/llir/llvm/cmd/lparse/lparse.go:22 +0x39
main.main()
	/home/u/Desktop/go/src/github.com/llir/llvm/cmd/lparse/lparse.go:15 +0x79
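A common way to cope with forward and circular references such as the one between %i.1 and %i.0 is to resolve locals in two passes. The sketch below shows the idea on a toy model; rawInst, Inst and resolve are hypothetical names, and this is not necessarily how the translation code is or should be structured. In particular, anything that depends on operand types (such as the Type call in the trace above) would still have to wait until the second pass has run.

```go
package main

import "fmt"

// rawInst is a hypothetical parsed instruction: a result name plus the
// names of the local values it uses.
type rawInst struct {
	Name     string
	Operands []string
}

// Inst is the resolved form, with operands pointing at other instructions.
type Inst struct {
	Name     string
	Operands []*Inst
}

// resolve links operands in two passes so that forward and circular
// references can be handled.
func resolve(raws []rawInst) (map[string]*Inst, error) {
	// Pass 1: create a placeholder for every definition.
	locals := make(map[string]*Inst, len(raws))
	for _, r := range raws {
		if _, dup := locals[r.Name]; dup {
			return nil, fmt.Errorf("duplicate local %%%s", r.Name)
		}
		locals[r.Name] = &Inst{Name: r.Name}
	}
	// Pass 2: every name is known by now, so a lookup cannot miss a value
	// that is merely defined later in the function.
	for _, r := range raws {
		inst := locals[r.Name]
		for _, op := range r.Operands {
			v, ok := locals[op]
			if !ok {
				return nil, fmt.Errorf("undefined local %%%s", op)
			}
			inst.Operands = append(inst.Operands, v)
		}
	}
	return locals, nil
}

func main() {
	// Mirrors the cycle in issue_27.ll; constants and labels are omitted.
	locals, err := resolve([]rawInst{
		{Name: "cond", Operands: []string{"i.0"}},
		{Name: "i.1", Operands: []string{"i.0"}},
		{Name: "i.0", Operands: []string{"i.1"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(locals["i.1"].Operands[0].Name) // i.0
	fmt.Println(locals["i.0"].Operands[0].Name) // i.1
}
```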

llvm/asm: missing files

hi there,

trying to compile "llvm/asm" I get:

$> go get -u -v github.com/llir/llvm/asm
github.com/llir/llvm (download)
github.com/pkg/errors (download)
package github.com/llir/llvm/asm/internal/lexer: cannot find package "github.com/llir/llvm/asm/internal/lexer" in any of:
	/usr/lib/go/src/github.com/llir/llvm/asm/internal/lexer (from $GOROOT)
	/home/binet/work/igo/src/github.com/llir/llvm/asm/internal/lexer (from $GOPATH)

it seems to me there are a few files missing in, at least, "llvm/asm/internal/{lexer,parser}".

could this be fixed?

thx!

NewFloat does not generate the expected output

Hello all,

I would first like to say thanks for an amazing library and I appreciate the hard work and dedication to support LLVM through Go.

However, using the library in a project of my own to generate LLVM instructions for different variables, I ran into a problem where using NewFloat did not produce the expected results. I will demonstrate using a C program as a comparison.

Using clang -S -emit-llvm main.c on the following program:

Input C Program:

int main() {
  float j = 1.1;

  return 0;
}

Produces the following store instruction for the float variable:

store float 0x3FF19999A0000000, float* %2, align 4

This is a 64 bit float with the last 28 bits dropped and converted to hex. (According to: http://lists.llvm.org/pipermail/llvm-dev/2011-April/039811.html)

However, attempting to generate the same instruction using this library:

mainBlock.NewStore(constant.NewFloat(value, types.Float), mainBlock.NewAlloca(types.Float))

where value is the float literal 1.1.

I obtained the following instruction:

store float 1.1, float* %1

Putting this into LLVM to generate assembly using:

llc -march=x86 -o main.expr.assembly main.expr.ll

Generates an error of:

llc: main.expr.ll:6:14: error: floating point constant invalid for type
        store float 1.1, float* %1

I can provide more information if needed, but a few questions:

Is this expected behavior?
Should this be expected behavior?
If yes, why does this produce code that doesn't work?

I can get it to work using types.Double, and if that is the solution then so be it for now, but I'd like to investigate if this is actually the expected output.
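For reference, the hexadecimal form that Clang emits can be reproduced with the Go standard library alone: round the value to float32, widen it to float64, and print the 64-bit pattern. The helper name floatHex is made up for this sketch, and it says nothing about how the library formats constants internally.

```go
package main

import (
	"fmt"
	"math"
)

// floatHex returns the bit pattern of f, widened to float64, in the
// hexadecimal notation used for float constants in the Clang output above.
func floatHex(f float32) string {
	return fmt.Sprintf("0x%016X", math.Float64bits(float64(f)))
}

func main() {
	var j float32 = 1.1 // rounded to the nearest float32, as in the C program
	fmt.Println(floatHex(j)) // 0x3FF19999A0000000
}
```

The widening is exact, so the low bits of the float64 mantissa are zero, which is why the constant ends in a run of zeros.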

Again,
Thanks for the work and dedication

Figure out how to handle index vectors in getelementptr instructions and constant expressions

Today I noticed a strange use of the getelementptr instruction, for which I have yet to find any official documentation describing the semantics. Rather than integer values being used as indices of gep, I found an instruction which uses integer vectors. Also, the resulting type of the gep instruction is not a pointer type but a vector of pointers type.

From ls.ll of Coreutils:

%37 = getelementptr inbounds %struct.fileinfo, %struct.fileinfo* %20, <2 x i64> %34, !dbg !4706
...
%40 = bitcast i8** %39 to <2 x %struct.fileinfo*>*, !dbg !4708
store <2 x %struct.fileinfo*> %37, <2 x %struct.fileinfo*>* %40, align 8, !dbg !4708, !tbaa !1793

Notice that the first index of gep is <2 x i64> %34, a vector value and not an integer value.

Furthermore, notice that the type of %37 is <2 x %struct.fileinfo*>, a vector of pointers type, and not a pointer type.

@pwaller Have you seen this before, and do you know how the result type of gep is calculated?

I skimmed through https://llvm.org/docs/GetElementPtr.html and found no reference to this behaviour.

Cheers!
Robin

Linking modules

First of all, I love this library. The LLVM bindings were a pain to work with because of their compile times and the fact that they take away the cross compilation that Go gives us. But I am running into a problem: in the current build, is there any way to define a function in one module and use it in another one, without getting an error because it isn't defined?

Any help would be appreciated!

Code coverage

This issue tracks code coverage for the different llir/llvm packages. We will seek to add test cases where code coverage is absent.

asm

The list of concepts to add test cases for is presented below. It is based on rev e157748 and was constructed by assessing the output of go test -coverprofile=a.out && go tool cover -html=a.out for the asm package.

Prior to adding test cases for these concepts, the code coverage of asm was ~75%.

$ go test -cover
coverage: 75.6% of statements

ir: rename Def method to LLString?

Analogous to fmt.GoStringer, we could use LLString (or LLVMString) to have LLVM IR constructs print their own definition.

The main reason to switch is to free up the name Def, which we may want to use for use-def chains as part of the v0.4.0 release which focuses on data flow analysis.

Formal Grammar of LLVM IR

I've been unable to locate an official formal grammar for LLVM IR. If anyone has information about work in this direction, please point it out to me.

To address this issue a formal grammar of LLVM IR will be created, prior to the implementation of the LLVM IR Assembly Language parser. This work was taking place at mewlang/llvm/asm/grammar (old link superseded by https://github.com/llir/llvm/blob/master/asm/internal/ll.bnf).

Edit: For anyone who happens to stumble upon this issue, the latest version of the grammar is located in the llir/grammar repository. More specifically, see ll.tm for an EBNF grammar for LLVM IR assembly.

Performance

This issue is intended to profile the performance of the llir/llvm library, measure it against the official LLVM distribution and evaluate different methods for improving the performance.

This is a continuation of mewspring/mewmew-l#6

The benchmark suite is at https://github.com/decomp/testdata. Specifically, the LLVM IR assembly of the following projects is used in the benchmark: Coreutils and SQLite.

Below follows a first evaluation of using concurrency to speed up parsing. The evaluation is based on a very naive implementation of concurrency, just to get some initial runtime numbers. It is based on 3011396 of the development branch, with subsets of the following patch applied: https://gist.github.com/mewmew/d127b562fdd8f560222b4ded739861a7

Official LLVM results

For comparison, below are the runtime results of the opt tool from the official LLVM distribution (using opt -verify foo.ll).

Coreutils

real 8.18
user 7.22
sys 0.88

SQLite

real 1.90
user 1.73
sys 0.13

`llir/llvm` results

Coreutils

no concurrency

total time for file "testdata/coreutils/testdata/yes.ll": 55.744113ms
real 11.54
user 14.70
sys 0.16

concurrent `translateTopLevelEntities`

4 go-routines with waitgroup in translateTopLevelEntities

total time for file "testdata/coreutils/testdata/yes.ll": 53.49785ms
real 10.28
user 16.06
sys 0.15

concurrent `translateGlobals`

2 go-routines with waitgroup in translateGlobals (for global and function definitions)

total time for file "testdata/coreutils/testdata/yes.ll": 55.567134ms
real 9.83
user 17.18
sys 0.17

concurrent `translateTopLevelEntities` and `translateGlobals`

4 go-routines with waitgroup in translateTopLevelEntities
2 go-routines with waitgroup in translateGlobals (for global and function definitions)

total time for file "testdata/coreutils/testdata/yes.ll": 58.474581ms
real 9.23
user 18.08
sys 0.16

SQLite3

no concurrency

total time for file "shell.ll": 3.147106433s
real 3.18
user 3.86
sys 0.32

concurrent `translateTopLevelEntities`

4 go-routines with waitgroup in translateTopLevelEntities

total time for file "shell.ll": 2.848574349s
real 2.88
user 4.67
sys 0.32

concurrent `translateGlobals`

2 go-routines with waitgroup in translateGlobals (for global and function definitions)

total time for file "testdata/sqlite/testdata/shell.ll": 2.86919391s
real 2.90
user 4.90
sys 0.32

concurrent `translateTopLevelEntities` and `translateGlobals`

4 go-routines with waitgroup in translateTopLevelEntities
2 go-routines with waitgroup in translateGlobals (for global and function definitions)

total time for file "shell.ll": 2.897873366s
real 2.93
user 4.79
sys 0.33
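The general shape of the "N goroutines with waitgroup" experiments can be sketched as below. This is not the patch from the gist; translateAll and the string-based translate callback are placeholders invented for the example, standing in for the translation of top-level entities.

```go
package main

import (
	"fmt"
	"sync"
)

// translateAll applies translate to every item using a fixed number of
// goroutines and waits for all of them with a sync.WaitGroup.
// workers is assumed to be positive.
func translateAll(items []string, workers int, translate func(string) string) []string {
	out := make([]string, len(items))
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			// Each worker handles a strided subset and writes to distinct
			// indices of out, so no locking is needed.
			for i := w; i < len(items); i += workers {
				out[i] = translate(items[i])
			}
		}(w)
	}
	wg.Wait()
	return out
}

func main() {
	entities := []string{"typedef", "global", "func", "metadata", "attrgroup"}
	got := translateAll(entities, 4, func(s string) string { return "ir:" + s })
	fmt.Println(got)
}
```

The numbers above show the expected trade-off for this pattern: wall-clock time (real) drops somewhat while total CPU time (user) rises.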

Sort top-level entities in the same way as opt and Clang

Alphabetically sorted

Type definitions

Given the input:

%x = type { i32 }
%1 = type { i32 }
%0 = type { %1, %2 }
%2 = type { float, double }

opt -S -o output.ll < input.ll produces the following output:

%0 = type { %1, %2 }
%1 = type { i32 }
%2 = type { float, double }
%x = type { i32 }

As such, the order of occurrence in the input source file is not taken into consideration during output, but rather, type names are sorted alphabetically. We should do the same.

Comdat definitions

Input:

$b = comdat any
$a = comdat any

@x = global i32 42, comdat($a)
@y = global i32 42, comdat($b)

opt output:

$a = comdat any
$b = comdat any

@x = global i32 42, comdat($a)
@y = global i32 42, comdat($b)

Attribute group definitions

Input:

define void @a() #0 {
	ret void
}

define void @b() #0 #2 {
	ret void
}

define void @c() #22 {
	ret void
}

define void @d() {
	ret void
}

define void @e() #2 {
	ret void
}

attributes #22 = { "foobar" }
attributes #0 = { nounwind readnone "target-cpu"="hexagonv60" }

opt output:

define void @a() #0 {
  ret void
}

define void @b() #0 {
  ret void
}

define void @c() #1 {
  ret void
}

define void @d() {
  ret void
}

define void @e() {
  ret void
}

attributes #0 = { nounwind readnone "target-cpu"="hexagonv60" }
attributes #1 = { "foobar" }

Note: besides sorting in numerical order, opt also renamed #22 to #1, the first attribute group ID not yet in use.
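The behaviour observed in this one example can be modelled as follows: defined IDs are sorted numerically and renumbered densely from zero, and uses of IDs without a definition are dropped. This is only a sketch that reproduces the output shown above; opt's actual rule may differ (for instance, it may number by order of first use), and renumber and rewriteUses are hypothetical helper names.

```go
package main

import (
	"fmt"
	"sort"
)

// renumber maps each defined attribute group ID to a dense ID, assigned in
// increasing numerical order of the original IDs.
func renumber(defined []int) map[int]int {
	ids := append([]int(nil), defined...)
	sort.Ints(ids)
	m := make(map[int]int, len(ids))
	for newID, oldID := range ids {
		m[oldID] = newID
	}
	return m
}

// rewriteUses translates the IDs used by one function and drops any ID
// that has no definition.
func rewriteUses(uses []int, m map[int]int) []int {
	var out []int
	for _, id := range uses {
		if newID, ok := m[id]; ok {
			out = append(out, newID)
		}
	}
	return out
}

func main() {
	m := renumber([]int{22, 0})              // definitions of #22 and #0
	fmt.Println(rewriteUses([]int{0, 2}, m)) // @b: [0]
	fmt.Println(rewriteUses([]int{22}, m))   // @c: [1]
	fmt.Println(rewriteUses([]int{2}, m))    // @e: []
}
```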

(unnamed) Metadata definitions

Input:

define void @a() !x !2 {
	ret void
}

define void @b() !x !21 !a !2 {
	ret void
}

define void @c() !x !0 {
	ret void
}

define void @d() {
	ret void
}

define void @e() !x !2 {
	ret void
}

!21 = !{ !"foo" }
!2 = !{ !"baz" }
!0 = !{ !"bar" }

opt output:

define void @a() !x !0 {
	ret void
}

define void @b() !x !1 !a !0 {
	ret void
}

define void @c() !x !2 {
	ret void
}

define void @d() {
	ret void
}

define void @e() !x !0 {
	ret void
}

!0 = !{!"baz"}
!1 = !{!"foo"}
!2 = !{!"bar"}

Note: besides sorting in numerical order, opt also renumbered the metadata IDs by order of first use: !2 became !0, !21 became !1 and !0 became !2.

Sorted by order of occurrence in input

Global variable declarations and definitions

Input:

@x = external global i32
@0 = global i32 42
@1 = external global i32
@a = external global i32
@b = global i32 42

opt output:

@x = external global i32
@0 = global i32 42
@1 = external global i32
@a = external global i32
@b = global i32 42

Function declarations and definitions

Input:

declare void @x()

define void @0() {
   ret void
}

declare void @1()

declare void @a()

define void @b() {
   ret void
}

opt output:

declare void @x()

define void @0() {
  ret void
}

declare void @1()

declare void @a()

define void @b() {
  ret void
}

Indirect symbols (aliases and indirect functions)

Input:

@foo = global i32 42

@x = alias i32, i32* @foo
@y = ifunc void (), void ()* @bar
@0 = alias i32, i32* @foo
@1 = alias i32, i32* @foo
@2 = ifunc void (), void ()* @bar
@3 = ifunc void (), void ()* @bar
@a = alias i32, i32* @foo
@c = ifunc void (), void ()* @bar
@b = alias i32, i32* @foo
@d = ifunc void (), void ()* @bar

define void @bar() {
	ret void
}

opt output:

@foo = global i32 42

@x = alias i32, i32* @foo
@0 = alias i32, i32* @foo
@1 = alias i32, i32* @foo
@a = alias i32, i32* @foo
@b = alias i32, i32* @foo

@y = ifunc void (), void ()* @bar
@2 = ifunc void (), void ()* @bar
@3 = ifunc void (), void ()* @bar
@c = ifunc void (), void ()* @bar
@d = ifunc void (), void ()* @bar

define void @bar() {
	ret void
}

Named metadata definitions

Input:

define void @a() !x !2 {
	ret void
}

define void @b() !x !21 !a !2 {
	ret void
}

define void @c() !x !0 {
	ret void
}

define void @d() {
	ret void
}

define void @e() !x !2 {
	ret void
}

!foo = !{!2}
!bar = !{!0}
!aaa = !{!21}

!21 = !{!"foo"}
!2 = !{!"baz"}
!0 = !{!"bar"}

opt output:

define void @a() !x !0 {
	ret void
}

define void @b() !x !2 !a !0 {
	ret void
}

define void @c() !x !1 {
	ret void
}

define void @d() {
	ret void
}

define void @e() !x !0 {
	ret void
}

!foo = !{!0}
!bar = !{!1}
!aaa = !{!2}

!0 = !{!"baz"}
!1 = !{!"bar"}
!2 = !{!"foo"}

Use tracking

The issue is intended to track discussions and experimental implementation related to use tracking.

The C++ API of LLVM defines the concepts of a Use and a User. A Use is an edge between a used value and its user. Each User has a number of operands which specify the Used values. Pseudo code follows:

type Value interface {
	Uses() []Use
}

type Use interface {
	OpNum() int
	User() User
	Usee() Value
}

type User interface {
	NOps() int
	Op(i int) Value
	SetOp(i int, v Value) error
}
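To make the discussion more concrete, here is one possible toy realization of the pseudo code, collapsed into a single Node type that can act as both value and user. It is only a sketch under that simplifying assumption; Node, AddOp and the concrete Use struct are invented for this example and are not a proposal for the final API.

```go
package main

import "fmt"

// Node is a hypothetical value that may also use other values as operands.
type Node struct {
	Name string
	ops  []*Node
	uses []Use
}

// Use records that User reads a value as operand number OpNum.
type Use struct {
	User  *Node
	OpNum int
}

func (n *Node) Uses() []Use    { return n.uses }
func (n *Node) NOps() int      { return len(n.ops) }
func (n *Node) Op(i int) *Node { return n.ops[i] }

// AddOp appends v as a new operand of n and records the use on v.
func (n *Node) AddOp(v *Node) {
	v.uses = append(v.uses, Use{User: n, OpNum: len(n.ops)})
	n.ops = append(n.ops, v)
}

// SetOp replaces operand i with v, keeping both use lists in sync.
func (n *Node) SetOp(i int, v *Node) error {
	if i < 0 || i >= len(n.ops) {
		return fmt.Errorf("operand index %d out of range", i)
	}
	old := n.ops[i]
	// Remove the stale use from the old operand.
	for j, u := range old.uses {
		if u.User == n && u.OpNum == i {
			old.uses = append(old.uses[:j], old.uses[j+1:]...)
			break
		}
	}
	n.ops[i] = v
	v.uses = append(v.uses, Use{User: n, OpNum: i})
	return nil
}

func main() {
	x, y := &Node{Name: "x"}, &Node{Name: "y"}
	add := &Node{Name: "add"}
	add.AddOp(x)
	add.AddOp(x)
	fmt.Println(len(x.Uses()), len(y.Uses())) // 2 0
	if err := add.SetOp(1, y); err != nil {
		panic(err)
	}
	fmt.Println(len(x.Uses()), len(y.Uses())) // 1 1
}
```

Keeping the use lists consistent on every SetOp is the main cost of this design, which is one reason a separate use-tracking package that computes uses on demand may be worth comparing against.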

Anyone is invited to join the discussion. How would users of the API wish to use it? Could it be implemented by a dedicated package separate from the ir package? How would the interaction work? Could several implementations of use tracking co-exist, and would that ever be useful?

Requirements

This issue summarizes the requirements of the LLVM packages, as specified by its intended use cases.

The requirements of llgo as stated by @axw (in this issue) are as follows:

As for llgo's requirements:

in terms of using the LLVM API for generating code, it's mostly write-only via the builder API. Bitcode and IR reading is not important (at the moment?), but writing is; one or the other is required, but preferably both.

llgo uses the DIBuilder API for generating debug metadata (DWARF, et al.). This could be built outside of the core (it's just a matter of creating metadata nodes in a particular format), just be aware that it's pretty finicky and easy to break.

llgo needs to be able to look up target data (arch word size, alignment, etc.) from triples

For the decompilation pipeline the llvm packages should be able to:

represent the LLVM IR using Control Flow Graphs, where each node represents a BasicBlock.
insert new nodes and rearrange existing ones.

interaction with go runtime

Thanks for this cool project!

@mewmew I'm wondering if llir/llvm could be used to generate code in a Go program that interacts with the Go (gc) runtime... allocating objects that are gc-ed; starting goroutines that are scheduled; making blocking system calls that the runtime handles with blocking... the kinds of things that would make llir/llvm viable for implementing an interpreter in Go.

What kind of interaction (if any) are you thinking could be supported? Or is that not on the radar at all?

asm: use natural instead of lexicographic sorting of top-level entities

Given the following input:

%t1 = type {}
%t19 = type {}
%t20 = type {}
%t21 = type {}
%t2 = type {}

define void @main() {
   alloca %t1
   alloca %t2
   alloca %t19
   alloca %t20
   alloca %t21
   ret void
}

opt -S -o < foo.ll produces the following output, in which type definitions are sorted using natural instead of lexicographic sorting:

%t1 = type {}
%t2 = type {}
%t19 = type {}
%t20 = type {}
%t21 = type {}

define void @main() {
  %1 = alloca %t1
  %2 = alloca %t2
  %3 = alloca %t19
  %4 = alloca %t20
  %5 = alloca %t21
  ret void
}

Edit: for comparison, using sort.Strings, we currently get the following output:

%t1 = type {}
%t19 = type {}
%t2 = type {}
%t20 = type {}
%t21 = type {}

define void @main() {
; <label>:0
	%1 = alloca %t1
	%2 = alloca %t2
	%3 = alloca %t19
	%4 = alloca %t20
	%5 = alloca %t21
	ret void
}
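One possible way to get the natural ordering shown for opt is a comparison function that compares runs of digits by numeric value. The sketch below is an assumption about how this could be done with the standard sort package; naturalLess and naturalSort are hypothetical names, not functions of llir/llvm.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// isDigit reports whether c is an ASCII decimal digit.
func isDigit(c byte) bool {
	return '0' <= c && c <= '9'
}

// naturalLess reports whether a sorts before b when runs of digits are
// compared by numeric value and all other bytes are compared as is. Names
// that differ only in leading zeros (e.g. "t1" and "t01") compare as equal.
func naturalLess(a, b string) bool {
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if isDigit(a[i]) && isDigit(b[j]) {
			// Extract the digit run of each string.
			si, sj := i, j
			for i < len(a) && isDigit(a[i]) {
				i++
			}
			for j < len(b) && isDigit(b[j]) {
				j++
			}
			x := strings.TrimLeft(a[si:i], "0")
			y := strings.TrimLeft(b[sj:j], "0")
			// Without leading zeros, a longer run is a larger number.
			if len(x) != len(y) {
				return len(x) < len(y)
			}
			if x != y {
				return x < y
			}
			continue
		}
		if a[i] != b[j] {
			return a[i] < b[j]
		}
		i++
		j++
	}
	// The string that is exhausted first sorts first.
	return len(a)-i < len(b)-j
}

// naturalSort sorts names in place using natural ordering.
func naturalSort(names []string) {
	sort.SliceStable(names, func(i, j int) bool {
		return naturalLess(names[i], names[j])
	})
}

func main() {
	names := []string{"%t1", "%t19", "%t20", "%t21", "%t2"}
	naturalSort(names)
	fmt.Println(names)
}
```

For the type names of the example input, this places %t2 before %t19, where sort.Strings places it after.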

ir: use value.Value for operands of instructions?

This has come up in discussion and would enable APIs such as Operands() []*value.Value, which would allow not only use tracking, but also value replacement; as proposed by @pwaller. (ref: #42)

Depending on how far we wish to take this API change, there are some benefits and drawbacks. The main drawback I can see is if we change instructions (and terminators) to take value.Value instead of *ir.BasicBlock, since then users of the API cannot make use of the basic block directly, but would have to type assert to inspect the instructions of the basic block for instance. This is also true for the phi instruction, for which Incoming may be redefined as follows:

 // Incoming is an incoming value of a phi instruction.
 type Incoming struct {
 	// Incoming value.
 	X value.Value
 	// Predecessor basic block of the incoming value.
-	Pred *BasicBlock
+	Pred value.Value // *ir.BasicBlock
 }

Another instruction that would change is catchpad, which would take a value.Value instead of the concrete type *TermCatchSwitch:

 // InstCatchPad is an LLVM IR catchpad instruction.
 type InstCatchPad struct {
 	// Name of local variable associated with the result.
 	LocalIdent
 	// Exception scope.
- 	Scope *TermCatchSwitch
+ 	Scope value.Value // *ir.TermCatchSwitch
 	// Exception arguments.
 	Args []value.Value
 
 	// extra.
 
 	// (optional) Metadata.
 	Metadata []*metadata.MetadataAttachment
 }

Besides the phi and catchpad instructions, quite a few terminators would be updated to make use of value.Value instead of *ir.BasicBlock.

 // TermCondBr is a conditional LLVM IR br terminator.
 type TermCondBr struct {
 	// Branching condition.
 	Cond value.Value
 	// True condition target branch.
-	TargetTrue *BasicBlock
+	TargetTrue value.Value // *ir.BasicBlock
 	// False condition target branch.
-	TargetFalse *BasicBlock
+	TargetFalse value.Value // *ir.BasicBlock
 
 	// extra.
 
 	// Successor basic blocks of the terminator.
 	Successors []*BasicBlock
 	// (optional) Metadata.
 	Metadata []*metadata.MetadataAttachment
 }

The catchret terminator would take a value.Value instead of *ir.InstCatchPad.

It is also possible we'd have to update ir.Case to take a value.Value instead of a constant.Constant, if we also want to use this approach for refining/updating/replacing values and not just for read-only access to operands.

 // Case is a switch case.
 type Case struct {
 	// Case comparand.
-	X constant.Constant // integer constant or integer constant expression
+	X value.Value // integer constant or integer constant expression
 	// Case target branch.
 	Target *BasicBlock
 }

I'm currently on the fence whether this change is good or not. The data types become less exact with what values they may contain, and specifically for basic blocks, users of the API would have to type assert to access the fields specific to basic blocks (such as its instructions). On the other hand, it would enable a general and quite powerful API for operand tracking and replacement.
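The trade-off described above can be sketched with toy types. The sketch assumes that Operands returns pointers to the operand fields, which is one reading of the proposed Operands() []*value.Value; the Value, BasicBlock, Const and TermCondBr types below are simplified stand-ins and not the actual definitions of the ir package.

```go
package main

import "fmt"

// Value is a stand-in for value.Value.
type Value interface {
	Ident() string
}

// BasicBlock is a stand-in for ir.BasicBlock.
type BasicBlock struct {
	Name  string
	Insts []string
}

func (b *BasicBlock) Ident() string { return "%" + b.Name }

// Const is a toy value used as branching condition.
type Const struct{ Name string }

func (c *Const) Ident() string { return c.Name }

// TermCondBr mirrors the proposed field types of the conditional br
// terminator, where branch targets are stored as Value.
type TermCondBr struct {
	Cond        Value
	TargetTrue  Value // *BasicBlock
	TargetFalse Value // *BasicBlock
}

// Operands returns pointers to the operand fields, so that callers can both
// inspect and replace operands.
func (term *TermCondBr) Operands() []*Value {
	return []*Value{&term.Cond, &term.TargetTrue, &term.TargetFalse}
}

// replaceOperand replaces each operand equal to old with repl and returns
// the number of replaced operands.
func replaceOperand(ops []*Value, old, repl Value) int {
	n := 0
	for _, op := range ops {
		if *op == old {
			*op = repl
			n++
		}
	}
	return n
}

func main() {
	cond := &Const{Name: "%cond"}
	a := &BasicBlock{Name: "a"}
	b := &BasicBlock{Name: "b"}
	c := &BasicBlock{Name: "c", Insts: []string{"ret void"}}
	term := &TermCondBr{Cond: cond, TargetTrue: a, TargetFalse: b}

	// Benefit: generic operand replacement without knowing the terminator.
	fmt.Println(replaceOperand(term.Operands(), b, c))

	// Drawback: a type assertion is needed to reach basic block fields.
	if bb, ok := term.TargetFalse.(*BasicBlock); ok {
		fmt.Println(bb.Name, len(bb.Insts))
	}
}
```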

I'll label this for the v0.4.0 release for now, as it mostly targets data analysis and the use-def chains API.

Any input is warmly welcome.

Cheers,
/u

[challenge] hello

"The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time."
— Tom Cargill, Bell Labs

To encourage the development of the final 10%, a set of challenges have been produced. This is a meta-issue to track the hello challenge of the parse-me repository.

Once these challenges have been beaten, lexing, parsing, and potentially type checking of LLVM IR assembly will have been implemented. At this point, the project is ready for an API overhaul and will welcome an open discussion with other members of the community interested in finding a clean, minimal API for interacting with LLVM IR.

Note: there will exist several, almost identical, challenge issues. The main reason for this is that the developer finds childish joy in closing issues once a challenge has been beaten :)

grammar: add module summary (introduced in LLVM 7.0)

In LLVM 7.0 the concept of ThinLTO module summaries was introduced. We currently have no IR representation of module summaries, and the grammar for module summaries has not yet been written.

A test case containing a module summary is present in llvm/test/Assembler/thinlto-summary.ll:

; ModuleID = 'thinlto-summary.thinlto.bc'

^0 = module: (path: "thinlto-summary1.o", hash: (1369602428, 2747878711, 259090915, 2507395659, 1141468049))
^1 = module: (path: "thinlto-summary2.o", hash: (2998369023, 4283347029, 1195487472, 2757298015, 1852134156))

; Check a function that makes several calls with various profile hotness, and a
; reference (also tests forward references to function and variables in calls
; and refs).
^2 = gv: (guid: 1, summaries: (function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 10, calls: ((callee: ^15, hotness: hot), (callee: ^17, hotness: cold), (callee: ^16, hotness: none)), refs: (^13))))

; Function with a call that has relative block frequency instead of profile
; hotness.
^3 = gv: (guid: 2, summaries: (function: (module: ^1, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 10, calls: ((callee: ^15, relbf: 256)))))

; Summaries with different linkage types.
^4 = gv: (guid: 3, summaries: (function: (module: ^0, flags: (linkage: internal, notEligibleToImport: 0, live: 0, dsoLocal: 1), insts: 1)))
; Make this one an alias with a forward reference to aliasee.
^5 = gv: (guid: 4, summaries: (alias: (module: ^0, flags: (linkage: private, notEligibleToImport: 0, live: 0, dsoLocal: 1), aliasee: ^14)))
^6 = gv: (guid: 5, summaries: (function: (module: ^0, flags: (linkage: available_externally, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1)))
^7 = gv: (guid: 6, summaries: (function: (module: ^0, flags: (linkage: linkonce, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1)))
^8 = gv: (guid: 7, summaries: (function: (module: ^0, flags: (linkage: linkonce_odr, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1)))
^9 = gv: (guid: 8, summaries: (function: (module: ^0, flags: (linkage: weak_odr, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1)))
^10 = gv: (guid: 9, summaries: (function: (module: ^0, flags: (linkage: weak, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1)))
^11 = gv: (guid: 10, summaries: (variable: (module: ^0, flags: (linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 0))))
; Test appending globel variable with reference (tests backward reference on
; refs).
^12 = gv: (guid: 11, summaries: (variable: (module: ^0, flags: (linkage: appending, notEligibleToImport: 0, live: 0, dsoLocal: 0), refs: (^4))))

; Test a referenced global variable.
^13 = gv: (guid: 12, summaries: (variable: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0))))

; Test a dsoLocal variable.
^14 = gv: (guid: 13, summaries: (variable: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 1))))

; Functions with various flag combinations (notEligibleToImport, Live,
; combinations of optional function flags).
^15 = gv: (guid: 14, summaries: (function: (module: ^1, flags: (linkage: external, notEligibleToImport: 1, live: 1, dsoLocal: 0), insts: 1)))
^16 = gv: (guid: 15, summaries: (function: (module: ^1, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1, funcFlags: (readNone: 1, noRecurse: 1))))
; This one also tests backwards reference in calls.
^17 = gv: (guid: 16, summaries: (function: (module: ^1, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1, funcFlags: (readOnly: 1, returnDoesNotAlias: 1), calls: ((callee: ^15)))))

; Alias summary with backwards reference to aliasee.
^18 = gv: (guid: 17, summaries: (alias: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 1), aliasee: ^14)))

; Test all types of TypeIdInfo on function summaries.
^19 = gv: (guid: 18, summaries: (function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 4, typeIdInfo: (typeTests: (^24, ^26)))))
^20 = gv: (guid: 19, summaries: (function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 8, typeIdInfo: (typeTestAssumeVCalls: (vFuncId: (^27, offset: 16))))))
^21 = gv: (guid: 20, summaries: (function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 5, typeIdInfo: (typeCheckedLoadVCalls: (vFuncId: (^25, offset: 16))))))
^22 = gv: (guid: 21, summaries: (function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 15, typeIdInfo: (typeTestAssumeConstVCalls: (vFuncId: (^27, offset: 16), args: (42), vFuncId: (^27, offset: 24), args: (43))))))
^23 = gv: (guid: 22, summaries: (function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 5, typeIdInfo: (typeCheckedLoadConstVCalls: (vFuncId: (^28, offset: 16), args: (42))))))

; Test TypeId summaries:

; Test the AllOnes resolution, and all kinds of WholeProgramDevirtResolution
; types, including all optional resolution by argument kinds.
^24 = typeid: (name: "_ZTS1A", summary: (typeTestRes: (kind: allOnes, sizeM1BitWidth: 7), wpdResolutions: ((offset: 0, wpdRes: (kind: branchFunnel)), (offset: 8, wpdRes: (kind: singleImpl, singleImplName: "_ZN1A1nEi")), (offset: 16, wpdRes: (kind: indir, resByArg: (args: (1, 2), byArg: (kind: indir, byte: 2, bit: 3), args: (3), byArg: (kind: uniformRetVal, info: 1), args: (4), byArg: (kind: uniqueRetVal, info: 1), args: (5), byArg: (kind: virtualConstProp)))))))
; Test TypeId with other optional fields (alignLog2/sizeM1/bitMask/inlineBits)
^25 = typeid: (name: "_ZTS1B", summary: (typeTestRes: (kind: inline, sizeM1BitWidth: 0, alignLog2: 1, sizeM1: 2, bitMask: 3, inlineBits: 4)))
; Test the other kinds of type test resoultions
^26 = typeid: (name: "_ZTS1C", summary: (typeTestRes: (kind: single, sizeM1BitWidth: 0)))
^27 = typeid: (name: "_ZTS1D", summary: (typeTestRes: (kind: byteArray, sizeM1BitWidth: 0)))
^28 = typeid: (name: "_ZTS1E", summary: (typeTestRes: (kind: unsat, sizeM1BitWidth: 0)))

Rewrite Git history to prune large 'old files'?

Update summary, 23/11/2018: This repository currently requires ~10MiB of download, which isn't ideal considering the source is only a few hundred kilobytes. @mewmew and I propose to shrink it to ~800kiB, to give a faster "Go install" experience for anyone using the repository.

The reason for the blowup is that there were some large test cases (including sqlite) which measure in the tens of MiB, and various other bits relating to parsing were also quite large. Those have now moved into other repositories in the llir organization, so we don't need to download those anymore if you just want to import llir.

Original issue text.

I just saw @mewmew's comment in ec48d54 but thought it would be easier to have a separate issue for discussion - the commit itself is very long so if I commented on the commit the discussion would be way down at the bottom!

First, can I clarify the question - are you asking how to remove lots of old large assets from the history of the repository?

If that is the question, the answer is, yes you can do it, but anyone who cloned the repository needs to know about it otherwise they might get in a mess, since it requires rewriting history. At least, that's the best I know. See github's guidance on the issue.

Constant globals in constant expressions

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    printf("%d\n", strncmp("a", "a", 1));
    return 0;
}

clang translates this to:

  %6 = call i32 @strncmp(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str.1, i32 0, i32 0), i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str.1, i32 0, i32 0), i64 1) #3
  %7 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i32 0, i32 0), i32 %6)

Note that strncmp requires i8*, but the constant globals are i8 arrays, so it uses a constant getelementptr expression on the global to get the i8* from the static data.

As far as I can tell, I can't achieve the same effect at the moment because *ir.Global does not implement IsConstant, so can't be fed into a constant.NewGetElementPtr. Is that correct?

Ambiguity in grammar for parsing alignment attributes, string attributes

The grammar contains an ambiguity when parsing global variable alignment attributes. More specifically, an alignment attribute of a global variable may be interpreted either as a GlobalAttr or a FuncAttr, and since the list of both global attributes and function attributes may be optionally empty, this leads to a shift/reduce ambiguity in the parser.

From the ll.tm EBNF grammar:

GlobalDecl -> GlobalDecl
	: Name=GlobalIdent '=' ExternLinkage Preemptionopt Visibilityopt DLLStorageClassopt ThreadLocalopt UnnamedAddropt AddrSpaceopt ExternallyInitializedopt Immutable ContentType=Type (',' Section)? (',' Comdat)? (',' Align)? Metadata=(',' MetadataAttachment)+? FuncAttrs=(',' FuncAttribute)+?
;

FuncAttribute -> FuncAttribute
	: AttrString
	| AttrPair
	# not used in attribute groups.
	| AttrGroupID
	# used in functions.
	#| Align # NOTE: removed to resolve reduce/reduce conflict, see above.
	# used in attribute groups.
	| AlignPair
	| AlignStack
	| AlignStackPair
	| AllocSize
	| FuncAttr
;

Specifically, the end of the line is of interest: (',' Align)? Metadata=(',' MetadataAttachment)+? FuncAttrs=(',' FuncAttribute)+?

Given that there are no metadata attachments, the alignment attribute (align 8) of the following LLVM IR:

@a = global i32 42, align 8

may be reduced either to a global attribute (i.e. Align before MetadataAttachment), or to a function attribute (i.e. FuncAttribute after MetadataAttachment).

The solution employed by the C++ parser is the opposite of maximal munch, as it will try to reduce rather than shift when possible.

Read the assembly language representation of LLVM IR

Corresponds to requirement 4, decomp/decomp#94.

Lexer: implemented in lexer and token.
Parser: generated from a BNF grammar defined in the llir/spec repository.

[challenge] rand

"The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time."
— Tom Cargill, Bell Labs

To encourage the development of the final 10%, a set of challenges have been produced. This is a meta-issue to track the rand challenge of the parse-me repository.

Note: there will exist several, almost identical, challenge issues. The main reason for this is that the developer finds childish joy in closing issues once a challenge has been beaten :)

How to create malloc in llir/llvm?

Basically what I need is something like this:
https://godoc.org/llvm.org/llvm/bindings/go/llvm#Builder.CreateMalloc

In my case, I'm trying to create a currying function call, it would need to store parameters in a structure

ir: rethink sumtypes to allow for user-definable types

Should we rethink sumtypes to allow for user-defined types?

For instance, ir.Instruction currently requires the unexported isInstruction method, but there are valid use cases where users may wish to define their own instructions to put in basic blocks.

One such use case seen in the wild is a comment pseudo-instruction which prints itself as ; data..., e.g.

// Comment is a pseudo-instruction that may be used for adding LLVM IR comments
// to basic blocks.
type Comment struct {
   // Line-comment contents.
   Data string
}

// IsInstruction implements the ir.Instruction interface for Comment.
func (inst *Comment) IsInstruction() {}

// LLString returns the LLVM syntax representation of the comment.
func (inst *Comment) LLString() string {
   return fmt.Sprintf("; %s", inst.Data)
}
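For illustration, the following sketch shows what an open variant of the interface could look like. It assumes an Instruction interface whose marker method is exported, so that types outside the defining package can satisfy it; Instruction, RawInst and Block are simplified stand-ins and do not reflect the current ir package.

```go
package main

import (
	"fmt"
	"strings"
)

// Instruction is a hypothetical open variant of ir.Instruction. Because the
// marker method is exported, user packages can define their own instructions.
type Instruction interface {
	// LLString returns the LLVM syntax representation of the instruction.
	LLString() string
	// IsInstruction marks the type as an instruction.
	IsInstruction()
}

// Comment is a user-defined pseudo-instruction for LLVM IR line comments.
type Comment struct {
	// Line-comment contents.
	Data string
}

func (inst *Comment) IsInstruction() {}

func (inst *Comment) LLString() string {
	return fmt.Sprintf("; %s", inst.Data)
}

// RawInst is a toy stand-in for the instructions defined by the library.
type RawInst struct {
	Text string
}

func (inst *RawInst) IsInstruction()   {}
func (inst *RawInst) LLString() string { return inst.Text }

// Block is a toy basic block holding library and user-defined instructions.
type Block struct {
	Name  string
	Insts []Instruction
}

// LLString returns the LLVM syntax representation of the basic block.
func (b *Block) LLString() string {
	var sb strings.Builder
	fmt.Fprintf(&sb, "%s:\n", b.Name)
	for _, inst := range b.Insts {
		fmt.Fprintf(&sb, "\t%s\n", inst.LLString())
	}
	return sb.String()
}

func main() {
	entry := &Block{
		Name: "entry",
		Insts: []Instruction{
			&Comment{Data: "nothing to do"},
			&RawInst{Text: "ret void"},
		},
	}
	fmt.Print(entry.LLString())
}
```

The cost of opening the interface is that the set of instruction types is no longer known to the library, so exhaustive type switches lose their guarantee.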

Provide a full implementation of LLVM IR

Based on the time constraints of this project a full implementation of LLVM IR will not be developed during its time frame. The ambition is to develop a full implementation once the project is finished. With this in mind, the focus is now to implement a minimal subset required for decompilation. Any code not directly related to this subset will be removed from the repository for now, and will be added back once the project is completed. This issue will make sure to track these code changes so they can be reverted easily.

Support specifying linkage type for globals and functions

Per https://llvm.org/docs/LangRef.html#linkage-types global variables and functions can have linkage types. Trying to generate IR (as a compiler frontend), this is a required feature.

Unless I've been particularly blind, setting the linkage type isn't currently possible.

members of call instruction not resolved when type set to FuncType rather than VoidType

The type of the callee operand of a call instruction can be written in two ways: as the return type of the callee, or as the complete function signature of the callee. Currently, the latter format causes a nil-pointer dereference when the callee has a void return type.

Successful parse:

declare void @g()

define void @f() {
	call void @g()
	ret void

	call void @g()
	ret void
}

Crash with nil-pointer dereference when parsing:

declare void @g()

define void @f() {
	call void () @g()
	ret void

	call void () @g()
	ret void
}
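A minimal sketch of handling both spellings is given below. It is not the fix applied in llir/llvm; callRetType and the Type, VoidType and FuncType definitions are hypothetical stand-ins that only illustrate how the return type can be recovered from either form.

```go
package main

import (
	"fmt"
	"strings"
)

// Type is a stand-in for an LLVM IR type.
type Type interface {
	String() string
}

// VoidType is the LLVM IR void type.
type VoidType struct{}

func (*VoidType) String() string { return "void" }

// FuncType is an LLVM IR function signature.
type FuncType struct {
	RetType Type
	Params  []Type
}

func (t *FuncType) String() string {
	params := make([]string, len(t.Params))
	for i, p := range t.Params {
		params[i] = p.String()
	}
	return fmt.Sprintf("%s (%s)", t.RetType, strings.Join(params, ", "))
}

// callRetType returns the return type of a call instruction, given the type
// written before the callee. That type is either the return type itself
// (call void @g()) or the full function signature (call void () @g()).
func callRetType(typ Type) Type {
	if sig, ok := typ.(*FuncType); ok {
		return sig.RetType
	}
	return typ
}

func main() {
	short := &VoidType{}
	full := &FuncType{RetType: &VoidType{}}
	fmt.Println(callRetType(short)) // type written as "void"
	fmt.Println(callRetType(full))  // type written as "void ()"
}
```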

Implement floating point numbers

To be able to implement the instructions taking floats, floats first need to be implemented. This issue tracks all work related to implementing the float types and instructions.

Floating point types:

half 16-bit floating point value
float 32-bit floating point value
double 64-bit floating point value
fp128 128-bit floating point value (112-bit mantissa)
x86_fp80 80-bit floating point value (X87)
ppc_fp128 128-bit floating point value (two 64-bits)

Floating point constants:

Decimal point
Scientific
Hex

Instructions:

fast-math-flags
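Of the three constant notations listed above, the hexadecimal one is the least obvious: for double, LLVM IR writes 0x followed by the 16 hexadecimal digits of the IEEE 754 bit pattern. The sketch below converts between that notation and float64 using the standard library; parseHexDouble and formatHexDouble are hypothetical helper names, and the prefixed forms used for the other floating point types (such as 0xK for x86_fp80) are deliberately rejected rather than handled.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseHexDouble parses an LLVM IR hexadecimal floating point constant of
// type double, e.g. 0x3FF0000000000000. The digits are the IEEE 754 bit
// pattern of the value, not a hexadecimal mantissa.
func parseHexDouble(s string) (float64, error) {
	if !strings.HasPrefix(s, "0x") {
		return 0, fmt.Errorf("invalid hex float %q: missing 0x prefix", s)
	}
	// Prefixed forms such as 0xK... fail here, since K is not a hex digit.
	bits, err := strconv.ParseUint(s[2:], 16, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid hex float %q: %v", s, err)
	}
	return math.Float64frombits(bits), nil
}

// formatHexDouble returns the hexadecimal notation of f.
func formatHexDouble(f float64) string {
	return fmt.Sprintf("0x%016X", math.Float64bits(f))
}

func main() {
	f, err := parseHexDouble("0x3FF0000000000000")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(f)
	fmt.Println(formatHexDouble(-2))
}
```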

Upcoming release of the llir/llvm project

This notice is intended to give a heads up for those using the llir/llvm library. The next release will include complete support for all intrinsics of the LLVM IR language. The work is currently in flux. To experiment with different API designs, simplify the parser logic, and reduce code duplication in the project, a new repo has been created for the experimental phase.

https://github.com/mewmew/l

At the current stage, the grammar is capable of parsing the entirety of the LLVM IR language, including specialized metadata nodes (#26).

While working on this we will also try to take into consideration previous issues that have been identified with the parser (such as the handling of quoted strings #24).

The llir/llvm/ir package will be extended to support the entire LLVM IR language; thus resolving #23 as linkage information will be present in the in-memory intermediate representation form.

With the upcoming release, read support for all of the LLVM IR language concepts will have been implemented; thus resolving #15.

Similarly; we will now have a grammar covering the entire LLVM IR language; thus resolving #2.

With the addition of support for specialized metadata nodes, the second requirement of llgo will also be fully supported (#3): "llgo uses the DIBuilder API for generating debug metadata (DWARF, et al.). This could be built outside of the core (it's just a matter of creating metadata nodes in a particular format), just be aware that it's pretty finicky and easy to break."

For IR construction, a similar approach will be used as has been done before. Personally, we feel this approach has worked out well and has been quite pleasant to use. If anyone has input on their own experience using the API of the llir/llvm/ir package to construct LLVM IR, please let us know as that could help shape the upcoming release. As for llgo, the first requirement ("in terms of using the LLVM API for generating code, it's mostly write-only via the builder API. Bitcode and IR reading is not important (at the moment?), but writing is; one or the other is required, but preferably both.") is satisfied by this API, and has been for a while. Now, however, the llir/llvm/ir package will support the entire LLVM IR language, and not just a subset; thus the requirement should be satisfied in full.

Module top-level information such as target triple and data layout has been and will continue to be recorded and maintained by the IR API, thus supporting the third requirement of llgo: "llgo needs to be able to look up target data (arch word size, alignment, etc.) from triples".

Generating C-shared library bindings compatible with the official C library of the LLVM project is an ambitious goal that is left for a future release (#12). Anyone specifically interested in this topic, feel free to get in touch with us or continue the discussion in the dedicated issue.

Similarly, interaction with the Go runtime is targeted for a future release, and those with knowledge in this domain are happily invited to the discussion on what is needed and how to bring this about (#18).

As for use-tracking and data analysis support (#19), more thought will be required to get a clean API. This is therefore targeted for a future release.

So, to summarize, the upcoming release of the llir/llvm project will include read and write support for the entire LLVM IR language. In other words, it will be possible to parse arbitrary LLVM IR assembly files into an in-memory representation, aka the one defined in package llir/llvm/ir. And the in-memory IR representation will have support for the entire LLVM IR language, and can be converted back to LLVM IR assembly for interaction with other tools, such as the LLVM optimizer.

Any feedback is welcome, so we know we're heading in the right direction.

Cheerful regards,
/u & i

String examples

Could you provide an example of generating llvm ir with string usage and concatenation using the API? Your main example on the readme is great for using integer values and variables but I am struggling to convert the example to use strings or pointer values in general.

I know what I would like the ir to look like, but I am unclear how to generate the resultant ir using the API.

example:

From C:

#include <stdio.h>
#include <string.h>

int main() {
   char src[50], dest[50];

   strcpy(src,  "This is source");
   strcpy(dest, "This is destination");

   strcat(dest, src);

   printf("Final destination string : |%s|", dest);
   
   return(0);
}

To desired llvm ir:

@.str = private unnamed_addr constant [15 x i8] c"This is source\00", align 1
@.str.1 = private unnamed_addr constant [20 x i8] c"This is destination\00", align 1
@.str.2 = private unnamed_addr constant [32 x i8] c"Final destination string : |%s|\00", align 1

; Function Attrs: noinline nounwind uwtable
define i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca [50 x i8], align 16
  %3 = alloca [50 x i8], align 16
  store i32 0, i32* %1, align 4
  %4 = getelementptr inbounds [50 x i8], [50 x i8]* %2, i32 0, i32 0
  %5 = call i8* @strcpy(i8* %4, i8* getelementptr inbounds ([15 x i8], [15 x i8]* @.str, i32 0, i32 0)) #3
  %6 = getelementptr inbounds [50 x i8], [50 x i8]* %3, i32 0, i32 0
  %7 = call i8* @strcpy(i8* %6, i8* getelementptr inbounds ([20 x i8], [20 x i8]* @.str.1, i32 0, i32 0)) #3
  %8 = getelementptr inbounds [50 x i8], [50 x i8]* %3, i32 0, i32 0
  %9 = getelementptr inbounds [50 x i8], [50 x i8]* %2, i32 0, i32 0
  %10 = call i8* @strcat(i8* %8, i8* %9) #3
  %11 = getelementptr inbounds [50 x i8], [50 x i8]* %3, i32 0, i32 0
  %12 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([32 x i8], [32 x i8]* @.str.2, i32 0, i32 0), i8* %11)
  ret i32 0
}

; Function Attrs: nounwind
declare i8* @strcpy(i8*, i8*) #1

; Function Attrs: nounwind
declare i8* @strcat(i8*, i8*) #1

declare i32 @printf(i8*, ...) #2
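Independent of which IR construction API is used, the string globals in the desired output need two pieces: the array type and the character array constant with its terminating NUL byte. The helper below is a sketch using only the standard library; llvmCString is a hypothetical name and not a function of llir/llvm, and the escaping rule (printable ASCII as is, everything else plus the double quote and backslash as a two-digit hex escape) is an assumption chosen to match how the constants above are printed.

```go
package main

import (
	"fmt"
	"strings"
)

// llvmCString returns the LLVM IR array type and character array constant
// for the NUL-terminated C string s.
func llvmCString(s string) (typ, val string) {
	// Copy the bytes of s and append the terminating NUL byte.
	data := append([]byte(s), 0)
	var sb strings.Builder
	for _, c := range data {
		if c >= 0x20 && c < 0x7F && c != '"' && c != '\\' {
			// Printable ASCII is written as is.
			sb.WriteByte(c)
		} else {
			// Everything else is written as a two-digit hex escape.
			fmt.Fprintf(&sb, "\\%02X", c)
		}
	}
	typ = fmt.Sprintf("[%d x i8]", len(data))
	val = fmt.Sprintf(`c"%s"`, sb.String())
	return typ, val
}

func main() {
	typ, val := llvmCString("This is source")
	fmt.Printf("@.str = private unnamed_addr constant %s %s, align 1\n", typ, val)
}
```

The resulting global has an array type, so it still has to be converted to i8* with a getelementptr constant expression before it is passed to strcpy, as in the desired IR above.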

// IntType is an LLVM IR integer type.
type IntType struct {
	// Type name; or empty if not present.
	TypeName string
	// Integer size in number of bits.
	BitSize int64
}

// ArrayType is an LLVM IR array type.
type ArrayType struct {
	// Type name; or empty if not present.
	TypeName string
	// Array length.
	Len uint64
	// Element type.
	ElemType Type
}
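To show how these two types relate to the [50 x i8] buffers in the desired IR, here is a small sketch that adds String methods to copies of the structs. The Type interface and the String methods are assumptions made for illustration; the actual types package may expose different methods.

```go
package main

import "fmt"

// Type is a stand-in for the LLVM IR type interface.
type Type interface {
	String() string
}

// IntType is an LLVM IR integer type.
type IntType struct {
	// Type name; or empty if not present.
	TypeName string
	// Integer size in number of bits.
	BitSize int64
}

// String returns the type name if present, and the LLVM syntax otherwise.
func (t *IntType) String() string {
	if t.TypeName != "" {
		return "%" + t.TypeName
	}
	return fmt.Sprintf("i%d", t.BitSize)
}

// ArrayType is an LLVM IR array type.
type ArrayType struct {
	// Type name; or empty if not present.
	TypeName string
	// Array length.
	Len uint64
	// Element type.
	ElemType Type
}

// String returns the type name if present, and the LLVM syntax otherwise.
func (t *ArrayType) String() string {
	if t.TypeName != "" {
		return "%" + t.TypeName
	}
	return fmt.Sprintf("[%d x %s]", t.Len, t.ElemType)
}

func main() {
	i8 := &IntType{BitSize: 8}
	buf := &ArrayType{Len: 50, ElemType: i8}
	fmt.Println(buf) // type of the src and dest buffers
}
```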
