riscv-non-isa / riscv-elf-psabi-doc
A RISC-V ELF psABI Document
Home Page: https://jira.riscv.org/browse/RVG-4
License: Creative Commons Attribution 4.0 International
Hi,
I assume that packed structs are always passed on the stack? Can't find it documented?
Also, an RV32IMAD (no F extension), should it still have float arguments in float registers? (Unlikely combination, but should be documented).
A struct { double x; long y; }
is passed in a GPR and an FPR on call, but in two GPRs on return. It'd be didactically useful to have the rules for return values be precisely the same as for call arguments.
There is a mention for bitfields, but it's still slightly under-documented.
The behaviour I'm observing from GCC is that:
struct s1 { float f; long long int j : 32; }
and similar are passed as FPR+GPR on ILP32D (it should be clarified in the doc that this is still allowable, despite long long int being 64 bits).
Can someone please confirm that the above is expected behaviour, suitable for documentation in the psabi-doc?
http://refspecs.linuxbase.org/cxxabi-1.83.html#calls implies that classes without nontrivial copy constructors are supposed to be handled like structs, but this code is passing a one-word class in memory:
class A {
public:
virtual void fn();
};
extern void abc(A);
void cba() { A q; abc(q); }
When using the hard FP ABI, a struct containing just an int+fp may be passed in one GPR and one FPR. If the integer is < XLEN, is its extension to XLEN bits specified? i.e. is it passed as if it were a scalar parameter (extended according to the sign of the type up to 32-bits, then sign extended to XLEN), or are the bits beyond the width of the original type undefined?
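To make the first reading in the question concrete, here is a minimal sketch of the scalar-style extension on RV64 (XLEN = 64). The function name `gpr_from_scalar` is hypothetical, and the sketch only models the "extend to 32 bits by signedness, then sign-extend to XLEN" rule mentioned above; whether struct members passed in a GPR follow it is exactly what the question asks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit GPR holding a scalar of `width` bits.
   Assumed rule: extend to 32 bits according to the signedness of the
   type, then sign-extend the 32-bit value to XLEN. */
uint64_t gpr_from_scalar(uint64_t bits, unsigned width, int is_signed)
{
    assert(width >= 1 && width <= 32);
    uint64_t mask = (UINT64_C(1) << width) - 1;
    uint64_t v = bits & mask;

    /* Step 1: extend to 32 bits according to signedness. */
    if (is_signed && ((v >> (width - 1)) & 1))
        v |= UINT64_C(0xFFFFFFFF) & ~mask;

    /* Step 2: sign-extend the 32-bit value to 64 bits. */
    if (v & UINT64_C(0x80000000))
        v |= UINT64_C(0xFFFFFFFF00000000);

    return v;
}
```

Under this model an unsigned char 0xFF stays 0xFF, a signed char 0xFF becomes all ones, and an unsigned int 0x80000000 has its upper 32 bits set. The alternative reading in the question would leave the upper bits unspecified instead.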
Hi, guys
There isn't a t3 in RVE. Thus, I propose the following solution: replace t2 with t0 and t3 with t2 in the entries.
...The first entry in the PLT occupies two 16 byte entries:
1: auipc t0, %pcrel_hi(.got.plt)
sub t1, t1, t2 # shifted .got.plt offset + hdr size + 12
l[w|d] t2, %pcrel_lo(1b)(t0) # _dl_runtime_resolve
addi t1, t1, -(hdr size + 12) # shifted .got.plt offset
addi t0, t0, %pcrel_lo(1b) # &.got.plt
srli t1, t1, log2(16/PTRSIZE) # .got.plt offset
l[w|d] t0, PTRSIZE(t0) # link map
jr t2
Subsequent function entry stubs in the PLT take up 16 bytes and load a function pointer from the GOT. On the first call to a function, the entry redirects to the first PLT entry which calls _dl_runtime_resolve and fills in the GOT entry for subsequent calls to the function:
1: auipc t2, %pcrel_hi(function@.got.plt)
l[w|d] t2, %pcrel_lo(1b)(t2)
jalr t1, t2
nop
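The arithmetic that the PLT header performs on t1 can be followed in a small model. This is a sketch under stated assumptions: `got_plt_offset` is a hypothetical helper, the header size of 32 bytes comes from "two 16 byte entries" above, and t2 is assumed to hold the address of the first PLT entry when the header runs (because the unresolved GOT slot points there).

```c
#include <assert.h>
#include <stdint.h>

enum { PLT_HEADER_SIZE = 32, PLT_ENTRY_SIZE = 16 };

/* Models the value left in t1 after the header's sub/addi/srli sequence.
   plt0 is the address of the PLT header, stub the address of the
   per-function entry that was called, ptrsize is 4 or 8. */
uint64_t got_plt_offset(uint64_t plt0, uint64_t stub, unsigned ptrsize)
{
    assert(ptrsize == 4 || ptrsize == 8);
    assert(stub >= plt0 + PLT_HEADER_SIZE);
    assert((stub - plt0) % PLT_ENTRY_SIZE == 0);

    /* jalr t1, t2 is the third instruction of the stub, so the link
       value written to t1 is stub + 12. */
    uint64_t t1 = stub + 12;

    t1 -= plt0;                     /* sub:  shifted offset + hdr size + 12 */
    t1 -= PLT_HEADER_SIZE + 12;     /* addi: shifted offset = 16 * index    */

    unsigned shift = (ptrsize == 8) ? 1 : 2;    /* log2(16 / PTRSIZE) */
    return t1 >> shift;                         /* index * PTRSIZE    */
}
```

For the fourth stub (index 3) this gives 24 with 8-byte pointers and 12 with 4-byte pointers, i.e. the stub index scaled by the pointer size.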
Imagine I have a double or a struct containing a double on an RV32I system. This probably has 8-byte alignment (see #19). In the case that there are no GPRs available, will the stack slot it is written to be 4 or 8-byte aligned? (i.e. aligned according to the original type, or according to xlen).
There's a related case with a struct containing an int32 on an RV64I system. The struct has 4-byte alignment, but when I fail to allocate a GPR for it there's the choice of assigning 4-byte sized 4-byte aligned stack slot, an 8-byte size 4-byte aligned stack slot, or a 8-byte size 8-byte aligned stack slot. I think "bits past the end of an aggregate whose size in bits is not divisible by XLEN, are undefined" suggests an 8-byte stack slot, but the alignment is still unspecified.
Is the intended alignment rule max(xlen_align, object_align) for these cases?
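As a sketch of the candidate rule in the question (not a confirmed part of the ABI), the slot alignment and the resulting stack offset could be computed as follows. The helper names `stack_slot_align` and `align_up` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Candidate rule from the question: max(xlen_align, object_align).
   xlen_bytes is XLEN / 8; object_align is the type's own alignment. */
size_t stack_slot_align(size_t xlen_bytes, size_t object_align)
{
    assert(xlen_bytes == 4 || xlen_bytes == 8);
    assert(object_align != 0 && (object_align & (object_align - 1)) == 0);
    return xlen_bytes > object_align ? xlen_bytes : object_align;
}

/* Round a stack offset up to a power-of-two alignment. */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}
```

With this rule a double on RV32 (4-byte XLEN, 8-byte object alignment) lands in an 8-byte aligned slot, and a struct containing an int32 on RV64 also gets an 8-byte aligned slot.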
Both GCC and Clang support this syntax and generate working code even when compiling for a system with no vector support. Therefore, it would be good if we can standardise the ABI details.
The calling convention description is incomplete without specifying the alignment of scalar values. Of course the alignment is usually sizeof(ty), but some 32-bit ABIs allow doubles to be 4-byte aligned.
The rules for the floating point calling convention result in rather a high number of mutually incompatible ABIs. Is this desirable?
Currently we have:
Plus of course RV128 in the future. There is no RV32IFDQ as this combination is disallowed by the ISA specification.
TLSDESC (-mtls-dialect=gnu2) improves on the traditional General Dynamic and Local Dynamic TLS models (-mtls-dialect=gnu). In the most common case, where TLS variables are defined in initially-loaded modules, it simplifies the work done in __tls_get_addr and (probably more importantly) switches to a custom calling convention that doesn't clobber any register ("preserve any registers they modify"). This speedup is significant.
The linker may relax the TLSDESC code sequence to Initial Exec (targeting an executable, the symbol is preemptable) or Local Exec (targeting an executable, the symbol is non-preemptable) if applicable.
The initial (static) and outstanding (dynamic) relocation types for TLSDESC have to be defined, as well as how static relocation types are relaxed to Initial Exec and Local Exec models.
TLSDESC is currently available on x86, x86-64, arm and aarch64.
x86: https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt
ARM: https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-ARM.txt (the published paper was referenced by ELF for the Arm 64-bit Architecture (AArch64) and ARM 32-bit)
As I understand it, TLSDESC is a strict improvement, so it might be worth defaulting to TLSDESC and probably deprecating relocation types for General Dynamic/Local Dynamic.
This issue is about my interpretation of the ABI and GCC's implementation of the ABI. GCC is definitely broken in one specific case, but what the fix is depends on the ABI, and it's not obvious to me exactly what the existing text means.
Let's start with an example of what GCC does wrong. This is actually a C++ test, not C, and that is important:
#define MAKE_STRUCT_PASSING_TEST(type,val) \
static struct struct_ ## type ## _t \
{ \
struct { } e; \
struct { type f; } s; \
} global_struct_ ## type = { {}, { val } }; \
\
static bool \
check_struct_ ## type (struct_ ## type ## _t obj) \
{ \
return (obj.s.f == global_struct_ ## type .s.f); \
} \
\
int \
main () \
{ \
bool result = check_struct_ ## type ( global_struct_ ## type ); \
return result ? 0 : 1; \
}
MAKE_STRUCT_PASSING_TEST(float,2.5)
The relevant part of the ABI document is this:
Empty structs or union arguments or return values are ignored by C compilers which support them as a non-standard extension. This is not the case for C++, which requires them to be sized types.
As I said, this is a C++ test, so the empty sub-struct will have size 1-byte, but with trailing padding will account for 4-bytes in the containing struct.
Currently GCC is a bit of a mess in its handling of these cases. In the example above with a float argument GCC tries to ignore the empty struct, but then incorrectly passes the float field (it actually passes the empty struct instead). If the above example is changed to contain an integer field then GCC correctly passes the full struct, including the empty part and the integer field using two integer registers.
The quoted part of the ABI document doesn't really say what should happen with empty C++ structs, it seems to me the ABI document simply states that such structures are non-zero sized, which is true.
However, as far as I understand it, the content of that non-zero sized space is undefined, and as such a compiler could ignore the empty structure in C++ just as it does in C.
In the above case the caller could pass the non-empty fields and the callee can simply make up content with which to fill the non-zero sized empty struct, this feels just as valid as passing over some undefined bytes.
I currently see two routes forward, one would be we change the above quoted part to read something like:
Empty structs or union arguments or return values are ignored by C compilers which support them as a non-standard extension. In C++ empty structs and unions are required to be sized types, however, as their content is undefined, they are similarly ignored when passing arguments.
We then fix GCC to correctly ignore the empty structs in C++, this would fix the float case which is just broken, but would be ABI breaking in GCC for the case of passing an empty struct and an integer.
Alternatively, we make the intention of the ABI clearer by changing the above quoted text to read something like:
Empty structs or union arguments or return values are ignored by C compilers which support them as a non-standard extension. This is not the case for C++, which requires them to be sized types, and these are passed as integer arguments.
We then fix GCC to correctly pass over the empty structs in C++, this would fix the float case which is currently just broken, and would maintain the existing ABI in GCC for the integer case. However, the final ABI would be slightly less efficient in this case (which is probably an edge case anyway, so we probably don't care too much about efficiency).
The in-memory representation of _Bool is not documented anywhere.
The SysV64 ABI specifies it as follows:
Booleans, when stored in a memory object, are stored as single byte objects the value
of which is always 0 (false) or 1 (true).
Which section of the ABI would be appropriate to add this type of wording ?
Some parts of the ISA, e.g., the RISC-V Vector ISA, use 0 for false and ~0 for true. It might be saner to make the representation of _Bool match in all contexts, and use ~0 for true everywhere. ~0 is a mask that selects all bits. Being able to just do a "logical and" with _Bool to select all or no bits would be quite useful and elegant.
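The two representations discussed above can be compared with a short sketch. It assumes a one-byte _Bool that stores true as 1, as in the SysV wording quoted earlier; the helper names `bool_byte` and `select_bits` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: _Bool occupies a single byte. */
static_assert(sizeof(bool) == 1, "sketch assumes a one-byte _Bool");

/* Read back the in-memory byte of a _Bool object. */
unsigned char bool_byte(bool b)
{
    unsigned char byte;
    memcpy(&byte, &b, 1);
    return byte;
}

/* With a 0 / 1 representation, a mask has to be derived by negation
   before it can select bits; a 0 / ~0 representation would allow a
   plain AND with the stored value instead. */
uint32_t select_bits(bool cond, uint32_t value)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)cond;   /* 0 -> 0, 1 -> ~0 */
    return value & mask;
}
```

On a platform that follows the quoted wording, `bool_byte(true)` is 1, so the extra negation in `select_bits` is the cost the ~0 proposal would remove.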
The document currently states: "struct { struct { float f[1]; } g[2]; int h; } and struct { float f; float g; int h; } are treated the same"
There are multiple possible fixes depending on the point that is being made. Perhaps { struct { float f[1]; } float g[1]; int h; } is what was intended? However, regardless of whether you flatten the struct fields, this is (by my reading) always passed identically to the integer calling convention. struct { struct { float f[1]; } int h[1]; } would perhaps be a better example?
There is a table in the v2.1 spec which should be moved here. It might also be worth expanding it to include a mapping of _Complex and _Bool.
The table in the v2.1 specification lists long double as 16 bytes in RV32. Is that correct given that RV32IMFDQ is specifically disallowed in the spec? Or is the idea that it's a handy way to access 128-bit fp implemented via software emulation routines?
The current syscall ABI implemented in newlib passes arguments in $a0~$a3 and the syscall number in $a7 [1].
However, $a7 is not available in RV32E, so shall we put the syscall number in $a4 or $a5 instead?
[1] https://github.com/riscv/riscv-newlib/blob/riscv-newlib-2.4.0/libgloss/riscv/machine/syscall.h
We've been implementing TLS support in lld: https://reviews.llvm.org/D39324, but the documentation regarding the layout of TLS structures is missing.
If I'm understanding correctly from reading the various TLS headers in glibc: tp is set up to point to the end of the TCB and the beginning of the static TLS block; this means that it is neither variant I nor II as described in ELF Handling for Thread-Local Storage. The front of the TCB contains a pointer to the DTV, and each pointer in the DTV points to 0x800 past the start of a TLS block to make full use of the range of load/store instructions.
Is it safe to assume that this will be the case across all platforms? These values are hardcoded in bfd as well. What is the motivation behind making tp point to the end of the TCB as opposed to following the two variants defined in the ELF TLS paper?
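The effect of the 0x800 bias can be checked with a little arithmetic. This is only a sketch of the layout as described above, with hypothetical helper names; it relies on the fact that RISC-V load/store instructions take a signed 12-bit immediate in the range -2048 to 2047.

```c
#include <assert.h>
#include <stdint.h>

enum { TLS_DTV_BIAS = 0x800 };

/* DTV entry for a TLS block, per the description above:
   it points 0x800 past the start of the block. */
uint64_t dtv_entry_for_block(uint64_t block_start)
{
    return block_start + TLS_DTV_BIAS;
}

/* Immediate needed to reach byte `offset` of the block when the base
   register holds the biased DTV entry. */
int32_t imm_for_block_offset(uint32_t offset)
{
    assert(offset < 4096);
    return (int32_t)offset - TLS_DTV_BIAS;
}
```

Offsets 0 through 4095 map to immediates -2048 through 2047, so a single load or store can reach the first 4 KiB of a block; without the bias only the first 2 KiB would be reachable with a non-negative immediate.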
int foo();
int g() { return foo()+1; }
This compiles to R_RISCV_CALL (call foo) in -fno-pic mode and R_RISCV_CALL_PLT (call foo@plt) in -fpie/-fpic mode.
IMHO the distinction is not really useful. We can avoid R_RISCV_CALL and use R_RISCV_CALL_PLT everywhere (also in -fno-pic mode). If the target symbol is non-preemptable (local/hidden/not -shared/etc), the linker can omit PLT creation for R_RISCV_CALL_PLT. AFAICT, the only differences are these lines:
// binutils-gdb/bfd/elfnn-riscv.c
case R_RISCV_CALL_PLT:
/* This symbol requires a procedure linkage table entry. We
actually build the entry in adjust_dynamic_symbol,
because this might be a case of linking PIC code without
linking in any dynamic objects, in which case we don't
need to generate a procedure linkage table after all. */
if (h != NULL)
{
h->needs_plt = 1;
h->plt.refcount += 1;
}
break;
case R_RISCV_CALL:
/* Handle a call to an undefined weak function. This won't be
relaxed, so we have to handle it here. */
if (h != NULL && h->root.type == bfd_link_hash_undefweak
&& h->plt.offset == MINUS_ONE)
{
/* We can use x0 as the base register. */
bfd_vma insn = bfd_get_32 (input_bfd,
contents + rel->r_offset + 4);
insn &= ~(OP_MASK_RS1 << OP_SH_RS1);
bfd_put_32 (input_bfd, insn, contents + rel->r_offset + 4);
/* Set the relocation value so that we get 0 after the pc
relative adjustment. */
relocation = sec_addr (input_section) + rel->r_offset;
}
I don't know much about BFD internals, but it appears that the existence of a PLT in -fpie/-fpic code makes no difference. R_RISCV_CALL_PLT is not necessary to force the creation of a PLT if the symbol is non-preemptable.
In -fno-pic code, there may be a so-called "canonical PLT" (created due to an absolute/pc relative relocation to a function). However, its handling has nothing to do with the dichotomy of R_RISCV_CALL{,_PLT}.
I don't know what the weak-undef code does.
On x86_64, newer gas/llvm-mc (after r358652) produces R_X86_64_PLT32 for call foo and call foo@plt. R_X86_64_PLT32 can be optimized to R_X86_64_PC32 if the symbol is non-preemptable.
powerpc32 ELFv1 specifies R_PPC_LOCAL24PC, R_PPC_REL24 and R_PPC_PLTREL24. powerpc64 ELFv2 just uses R_PPC64_REL24.
I've been having some questions about RV32E support in LLVM+Clang. It's difficult to consider adding support as the RV32E instruction set extension itself isn't yet "frozen" and the ABI isn't fully documented. A little more documentation on the proposed ABI has been added recently (thanks!), but I thought it would be worth making an issue to track the work that still needs to be done.
As far as I can see, we need:
Are there other issues that need to be addressed? Please note: I'm simply creating this issue to track what needs to be done, not to claim it - I'm not currently distributing or directly supporting RV32E IP myself.
The DWARF specification says that ABI implementations should define the hardware register -> DWARF register mappings (§2.6.1.1.2). It seems that this is not included in the RISC-V psABI, nor can I seem to find this mapping anywhere else in official RISC-V documentation. As an example, the x86-64 psABI documents the register mapping in §3.6.2.
For context, this is needed in Mono to do unwinding. E.g. for x86-64: https://github.com/mono/mono/blob/aec2773e1db0799479161688d6161f5d5ce586a3/mono/mini/unwind.c#L46-L51
I've just done a bunch of tests to nail this down for the libffi port, so I should probably document this very soon.
There's no red zone so the stack pointer must be decreased before any data can be stored in the frame. From openhwgroup/cv32e40p#10 .
From the wording, it is unclear to me if the following is acceptable or not
.text
.globl _start
foo:
addi a1, a0, %pcrel_lo(label)
ret
.section .text.new_section
_start:
label:
auipc a0, %pcrel_hi(bar)
j foo
bar:
ret
GNU ld complains with dangerous relocation: %pcrel_lo missing matching %pcrel_hi
, so perhaps there is an underlying assumption that the reference and the label must be found in the same section?
I think the assumption is reasonable but I fail to see the text explicitly constraining this. For example, something like this
The `R_RISCV_PCREL_LO12_I` or `R_RISCV_PCREL_LO12_S` relocations contain
a label pointing to an instruction with a `R_RISCV_PCREL_HI20` relocation
entry that points to the target symbol:
- At label: `R_RISCV_PCREL_HI20` relocation entry ⟶ symbol
- - `R_RISCV_PCREL_LO12_I` relocation entry ⟶ label
+ - `R_RISCV_PCREL_LO12_I` relocation entry ⟶ label. The reference to the label
+ and the label definition must be in the same section.
What do you think?
Thanks!
The meaning of the relocations' operands in the Details column is not documented. Presumably S stands for symbol and A for addend. But maybe I'm interpreting it incorrectly, as something seems to be wrong. For instance:
Enum | ELF Reloc Type | Description | Details |
---|---|---|---|
1 | R_RISCV_32 | Runtime relocation | word32 = S + A |
39 | R_RISCV_SUB32 | 32-bit label subtraction | word32 = S - A |
My understanding of R_RISCV_32 is that it should replace the existing value at some memory location. Therefore the interpretation of word32 = S + A seems straightforward, with S being the referenced symbol and A being an addend offset. So if your assembly file has something like .word some_symbol+4 you'd get a R_RISCV_32 relocation with word32 = &some_symbol + 4. But then that interpretation doesn't work for R_RISCV_SUB32. If you have .word sym2+4-sym1 you get the two relocations R_RISCV_ADD32 sym2+4 and R_RISCV_SUB32 sym1+0. So shouldn't R_RISCV_SUB32 be documented as word32 -= S + A, or something along those lines?
Please clarify this issue and what the operands should refer to.
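The reading proposed above, where ADD32 and SUB32 are read-modify-write operations on the existing word, can be sketched as follows. The function names are hypothetical, and the initial word value of 0 is an assumption about what the assembler leaves in the section contents.

```c
#include <assert.h>
#include <stdint.h>

/* R_RISCV_32: word32 = S + A (replaces the existing value). */
uint32_t apply_r_riscv_32(uint32_t s, int32_t a)
{
    return s + (uint32_t)a;
}

/* R_RISCV_ADD32 under the proposed reading: word32 += S + A. */
uint32_t apply_add32(uint32_t word, uint32_t s, int32_t a)
{
    return word + (s + (uint32_t)a);
}

/* R_RISCV_SUB32 under the proposed reading: word32 -= S + A. */
uint32_t apply_sub32(uint32_t word, uint32_t s, int32_t a)
{
    return word - (s + (uint32_t)a);
}

/* .word sym2+4-sym1: ADD32 against sym2+4, then SUB32 against sym1+0. */
uint32_t resolve_label_difference(uint32_t sym2, uint32_t sym1)
{
    uint32_t word = 0;  /* assumed initial contents of the word */
    word = apply_add32(word, sym2, 4);
    word = apply_sub32(word, sym1, 0);
    assert(word == (sym2 + 4u) - sym1);  /* same as the direct difference */
    return word;
}
```

With sym2 at 0x2000 and sym1 at 0x1000 the word ends up as 0x1004, which is what the source expression asks for. A literal word32 = S - A reading of SUB32 would not give that result.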
The AArch64 AAPCS and x86-64 psABI both make an effort to standardise bit field representation. I don't know how much code is out there that exposes bit fields in its public ABI, but it may be worth specifying something sensible here.
The Floating-Point Control and Status Register (FCSR) contains reserved bits, the rounding mode (frm) and the accrued exception flags (fflags).
The frm are probably a system-wide setting.
What about the fflags? Are they supposed to be preserved across function calls?
In the ARM world, there were complaints that GCC and LLVM ended up making different choices regarding the stack slot fp points to. See here and here. There is a desire for "fast unwinding", i.e. unwinding without having to use DWARF metadata.
Given that the frame pointer will likely be omitted in the common case and the prevalence of optimisations like shrink wrapping, this feels like a minor point. However it did cause pain in the ARM community so thought I'd raise it here for consideration.
This document should clarify whether fs0-fs11 should be considered callee-save when compiling for the soft-float ABI, e.g. -march=rv32imaf -mabi=ilp32.
I would suggest that all floating point registers must be considered temporaries when compiling for the soft float ABI. Consider linking object file F.o compiled with -march=rv32imaf -mabi=ilp32 and D.o compiled with -march=rv32imafd -mabi=ilp32. If code in D.o stores a double in fs0 and code in F.o attempts to spill it, the value would obviously be corrupted. We could of course define new bits in e_flags and linker behaviour to try to catch these problems at link time (ensuring code compiled with FLEN=0 is only linked with code compiled with one other FLEN value, e.g. a mix of objects with flen=0 and flen=32 is fine, but not flen=0 and flen=32 and flen=64).
There's a missing trivial indication that x8 (s0) is the frame pointer. See for example https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf .
Krste pointed out that the FP calling convention isn't explicit about what happens when you have both int and fp args. It only says that FP regs are still used even if the int regs are exhausted. But doesn't say what happens if the int regs haven't been exhausted yet. Or what happens with int args after you start using FP regs for args. In particular, if you have 8 int args and 8 fp args (which are smaller than int/fp regs), then you can pass all of them in registers no matter what order they appear in.
There is apparently an old RISC-V calling convention which can still be found on the web which says otherwise.
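The reading described above, where integer and floating-point argument registers are consumed independently, can be modelled with two counters. This is a sketch only: `count_stack_args` is a hypothetical helper, every argument is assumed to fit in one register, and floating-point arguments are assumed to fall back to the integer convention once the FP argument registers are used up.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_ARG_GPRS = 8, NUM_ARG_FPRS = 8 };

/* kinds is a string with one character per argument:
   'i' for an integer argument, 'f' for a floating-point argument.
   Returns how many arguments end up on the stack. */
size_t count_stack_args(const char *kinds)
{
    size_t gprs = 0, fprs = 0, stack = 0;

    for (const char *k = kinds; *k != '\0'; ++k) {
        assert(*k == 'i' || *k == 'f');
        if (*k == 'f' && fprs < NUM_ARG_FPRS)
            ++fprs;     /* next free FP argument register */
        else if (gprs < NUM_ARG_GPRS)
            ++gprs;     /* integer arg, or FP arg after the FPRs ran out */
        else
            ++stack;    /* no register of either kind left */
    }
    return stack;
}
```

With 8 integer and 8 floating-point arguments, `count_stack_args` returns 0 for any interleaving, matching the claim that they can all be passed in registers regardless of order.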
See this x86 ABI bug for more details: https://sourceware.org/bugzilla/show_bug.cgi?id=21265
I vote we solve this on RISC-V by just requiring that lazily bound functions follow the standard ABI. This lets us avoid saving the rest of the X registers, and assuming we can do the fixup without any extensions (which seems reasonable, and is what we currently do despite it not actually being enforced) we won't need to save F or V state here.
I don't know where this should go in the manual, @asb?
The same reasoning as for long double under -m64 would apply here, and double values, being = 2 * XLEN and > FLEN, should be passed exclusively in memory.
Consider the following declarations:
struct s1_ty { float f; int i; };
struct s2_ty { float f; char c; };
union u_ty { struct s1_ty s1; struct s2_ty s2; };
struct s3_ty { union u_ty u; };
As it stands, the FP calling convention makes no mention of unions. So if I had a function taking u_ty as an argument I might reasonably pass this according to the integer calling convention. If a function took s3_ty, there's a question of whether a union can ever be "flattened". For instance in this particular case, the first field of the union will always be a float. Presumably even if you did "flatten" the fp field, you'd still consider the second field of the flattened result to be a union of int and char, which would be an "aggregate" rather than an "integer". The degenerate case would be a union of two identical structs, containing a real and an integer.
I assume the intention is that unions are never "flattened"?
Right now the Linux syscall interface uses a7 for the syscall number and a0-a6 for integer arguments. This is quite adequate for Linux, where syscalls have at most 7 integer arguments; but I think it'd be cleaner to have a general syscall ABI defined with the same functionality as the base ABI.
Straw proposal: syscall number in t0 or t1 (TBD; t1 is friendlier to millicode) before ecall, everything else as for the base calling convention.
This should be in the main document somewhere but right now I'm writing this primarily for @PkmX .
Consider the following:
.option norvc
.option relax
__global_pointer$:
.skip 2028
L1:
addi a0, a0, %lo(L3) // (D)
L2:
lui a1, %hi(L1) // (A)
addi a1, a1, %lo(L1) // (B)
lui a0, %hi(L3) // (C)
j L1
L3:
Relaxation of one instruction at a time in address order with immediate update of symbol values will break this code. When (D) is visited, L3 - __global_pointer$ is out of range for a 12-bit immediate so the addi has to be kept, but when (C) is visited that offset is in range because (A) + (B) can be relaxed to a single instruction. Relaxing (C) then causes a0 to be not set when control flow reaches (D).
It is necessary for all linked %hi / %lo pairs to make a consistent decision regarding whether the relaxation should be performed; since there's no metadata explicitly linking the pairs, the easiest way to arrange this is with a two-pass algorithm:
1. Visit each relaxable relocation, determine if it can be relaxed, and perform byte modifications but do not change any symbol values. The current binutils implementation uses R_RISCV_DELETE pseudo-relocations to record bytes which need to be deleted in the next phase; since it is never written into object files it is not an actual ABI relocation.
2. Perform all pending byte deletions.
As an additional benefit, the two-pass algorithm is linear time and somewhat parallelizable.
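The second phase can be sketched as a single linear sweep over the section contents. This is an illustration rather than the binutils implementation: `struct pending_delete` and `apply_pending_deletes` are hypothetical names, and the recorded deletions are assumed to be sorted by offset and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One deletion recorded during the first phase. */
struct pending_delete {
    size_t offset;  /* start of the bytes to drop */
    size_t length;  /* number of bytes to drop    */
};

/* Second phase: compact buf in place, dropping every recorded range.
   Each byte is moved at most once, so the sweep is linear in len.
   Returns the new length of the section. */
size_t apply_pending_deletes(uint8_t *buf, size_t len,
                             const struct pending_delete *dels, size_t n)
{
    size_t src = 0, dst = 0;

    for (size_t i = 0; i < n; ++i) {
        assert(dels[i].offset >= src);                  /* sorted, disjoint */
        assert(dels[i].offset + dels[i].length <= len); /* inside section   */

        size_t keep = dels[i].offset - src;
        memmove(buf + dst, buf + src, keep);    /* bytes before the hole */
        dst += keep;
        src = dels[i].offset + dels[i].length;  /* skip the hole */
    }

    memmove(buf + dst, buf + src, len - src);   /* tail after last hole */
    return dst + (len - src);
}
```

Symbol values and relocation offsets would then be adjusted against the same list of holes, after every relaxation decision has already been made in the first phase.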
The stack pointer need only be aligned to a 32-bit boundary.
Previously we only required the stack pointer to be aligned to 32 bits; however, we now allow RV32EF and RV32EFD, so this might be a problem for RV32EFD.
Here are two possible solutions:
I see binutils 2.29 added a new R_RISCV_32_PCREL relocation. Could somebody please document its purpose and semantics?
The "Default ABIs and C type sizes" section does not document the value of _Alignof(max_align_t), which is implementation-defined and part of the platform ABI.
long double has an alignment requirement of 16, which forces _Alignof(max_align_t) >= 16 on the 32-bit and 64-bit ISAs.
All memory allocations returned by malloc for sizes >= _Alignof(max_align_t) need to be aligned to a _Alignof(max_align_t) boundary, so from this POV, it makes sense to pick it "as large as necessary, as small as possible".
The SysV 32 and 64-bit ABIs set _Alignof(max_align_t) == 16, and this would also make sense here AFAICT.
The only reason to pick a higher value is if we plan to have types in the ISA with a fundamental alignment larger than 16 bytes according to the C standard.
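The value in question can be inspected on any toolchain with a few lines of C11. This is a sketch with hypothetical helper names; the result depends on the platform the code is compiled for, so no particular number is claimed here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* max_align_t has to be at least as aligned as every scalar type,
   long double included. */
static_assert(alignof(max_align_t) >= alignof(long double),
              "max_align_t must cover long double");

/* Alignment chosen by the toolchain that compiles this file. */
size_t platform_max_align(void)
{
    return alignof(max_align_t);
}

/* True if ptr is aligned to an _Alignof(max_align_t) boundary, which is
   the property discussed above for pointers returned by malloc. */
int is_max_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % alignof(max_align_t)) == 0;
}
```

Printing `platform_max_align()` on an RV32 and an RV64 toolchain would show what GCC and Clang currently pick, which is the value the psABI would need to record.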
The current docs say
The RV32E calling convention may only be used with the RV32E ISA, hence the role of registers x16-x31 and f0-f31 is not defined. A future version of this specification may relax this constraint.
But it is easy to make rve work with rv32i by making the extra registers call clobbered (i.e caller saved). This is how FP regs already work when using for instance the ilp32 ABI with the rv32gc architecture. This is how the proposed EABI works with registers x16-x31. And this is also how the gcc RVE implementation works, though it needed a small bug fix to make a6 and a7 usage depend on the ABI instead of the architecture. So I think this should be allowed.
The documentation says the following about the PCREL relocations:
23 R_RISCV_PCREL_HI20 PC-relative reference %pcrel_hi(symbol) (U-Type)
24 R_RISCV_PCREL_LO12_I PC-relative reference %pcrel_lo(symbol) (I-Type)
25 R_RISCV_PCREL_LO12_S PC-relative reference %pcrel_lo(symbol) (S-Type)
This led me to believe that the same data address would be referenced in the HI20 as in the LO12.
However, upon compiling an assembly file containing an access to a global of the following form:
ld t4,my_global+8
and inspecting the relocation table generated with elfdump, I discovered that the symbols referenced by the PCREL_HI20 and the PCREL_LO12_I relocations were different:
entry: 4
r_offset: 0x10
r_info: 0x400000017
r_addend: 8
entry: 5
r_offset: 0x10
r_info: 0x33
r_addend: 8
entry: 6
r_offset: 0x14
r_info: 0x2900000018
r_addend: 0
Looking at these symbols, the one referenced by the PCREL_HI20 was the global:
entry: 4
st_name: my_global
st_value: 0
st_size: 16
st_info: STT_OBJECT STB_LOCAL
st_shndx: 3
But the one referenced by the PCREL_LO12_I was a label into the text section:
entry: 45
st_name: .L0
st_value: 0x44
st_size: 0
st_info: STT_NOTYPE STB_LOCAL
st_shndx: 1
It appears that the PCREL_HI20 part refers to the data address being accessed, while the PCREL_LO12 part refers to a label indicating the start PC being accessed relative to. After going on this journey of discovery, I did find that the examples in the RISCV asm documentation could have given me this clue, although the example that describes this behavior points it out very subtly:
1: auipc a0, %pcrel_hi(msg) # load msg(hi)
addi a0, a0, %pcrel_lo(1b) # load msg(lo)
It's pretty easy to not notice that 1b is in pcrel_lo rather than msg, especially if you weren't looking for this piece of information specifically.
Would it be possible to update the documentation to indicate this behavior more clearly?
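The reason %pcrel_lo names the label rather than the symbol becomes clearer when both halves are computed: each depends on the distance from the auipc to the symbol, so the low-part relocation has to find the auipc first. The following is a sketch with hypothetical function names, not the linker's code.

```c
#include <assert.h>
#include <stdint.h>

/* Low 12 bits of (symbol - auipc_pc), sign-extended to -2048..2047.
   auipc_pc is the address of the instruction carrying %pcrel_hi,
   i.e. the label that %pcrel_lo refers to. */
int32_t pcrel_lo12(uint64_t symbol, uint64_t auipc_pc)
{
    uint64_t diff = symbol - auipc_pc;
    return (int32_t)((diff & 0xFFF) ^ 0x800) - 0x800;
}

/* Upper part for auipc, rounded so that hi * 4096 + lo == offset. */
int32_t pcrel_hi20(uint64_t symbol, uint64_t auipc_pc)
{
    uint64_t diff = symbol - auipc_pc;
    int32_t lo = pcrel_lo12(symbol, auipc_pc);
    int64_t hi = (int64_t)(diff - (uint64_t)(int64_t)lo) / 4096;

    assert(hi >= -(INT64_C(1) << 19) && hi < (INT64_C(1) << 19));
    return (int32_t)hi;
}
```

For a symbol 0x1800 bytes after the auipc, the high part is 2 and the low part is -2048. The instruction at the following address cannot compute that low part from the symbol alone, because the offset is measured from the auipc and not from itself.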
We have a fair number of RISC-V predefined macros which are either useless or redundant with standard macros or gcc extensions. We have an opportunity to remove them now and then spec the macros that RISC-V C compilers should implement. More details exist on private emails right now, but I'll make a PR soon.
I get the following disassembly code from an object file
0000000000000000 <init_module>:
0: 000007b7 lui a5,0x0
0: R_RISCV_HI20 name
0: R_RISCV_RELAX *ABS*
4: 0007b583 ld a1,0(a5) # 0 <init_module>
4: R_RISCV_LO12_I name
4: R_RISCV_RELAX *ABS*
8: 00000537 lui a0,0x0
8: R_RISCV_HI20 .LC0
8: R_RISCV_RELAX *ABS*
c: ff010113 addi sp,sp,-16
10: 00050513 mv a0,a0
10: R_RISCV_LO12_I .LC0
10: R_RISCV_RELAX *ABS*
14: 00113423 sd ra,8(sp)
18: 00000317 auipc t1,0x0
18: R_RISCV_CALL printk
18: R_RISCV_RELAX *ABS*
1c: 000300e7 jalr t1
20: 00813083 ld ra,8(sp)
24: 00000513 li a0,0
28: 01010113 addi sp,sp,16
2c: 00008067 ret
As mentioned in the ABI doc, it makes sense to place a REL entry of type R_RISCV_RELAX at address 18 to relax the function call to a JAL instruction:
Procedure call linker relaxation allows the AUIPC+JALR pair to be relaxed to the JAL instruction when the procedure or PLT entry is within (-2MiB to +2MiB-1) of the instruction pair.
However, I am confused about the REL entry of type R_RISCV_RELAX at address 0. How can an absolute address be relaxed?
At the bare minimum we should specify the va_list struct. The document should probably go much further and give a detailed run-down, much like the AArch64 and x86-64 psABI docs.
Nested structs with long doubles (more generally: float types = 2 * XLEN > FLEN) and other zero-byte fields are in some cases passed in GPR pairs. It'd be simpler if we could say that anything with a real long double field (in particular, not a zero-length array of long double) is forced to memory passing.
Section "Hardware floating-point calling convention" specifies only the usage of argument registers (argument passing and value returning). It should also specify that floating point values wider than FLEN stored in callee-saved registers aren't preserved across function calls. This allows registers to be spilled by code compiled without support for Q or D.
Presumably, an ILP32-on-RV64 ABI would use ELF32, so the ELF class no longer unambiguously equals XLEN.
This isn't an issue for x86-64's x32 ABI: since x86-64 uses a different ELF machine code than IA-32, there is no ambiguity.
The RISC-V Integer Calling Convention at https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md says that "Aggregates whose total size is no more than XLEN bits are passed in a register, with the fields laid out as though they were passed in memory. If no register is available, the aggregate is passed on the stack. Aggregates whose total size is no more than 2✕XLEN bits are passed in a pair of registers".
Does this mean that an aggregate of more than 2✕XLEN bits must be placed on the stack according to the RISC-V calling convention? If so, the calling convention seems to conflict with the code generated by GCC's -fipa-sra optimization, as this option may use more than two registers when converting aggregate types to scalars. Does this mean that -fipa-sra should be disabled when targeting RISC-V?
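The size classes in the quoted text can be written down as a small classifier. This is a sketch with hypothetical names. The first two cases come from the quote; the third (larger aggregates are passed by reference) is taken from the integer calling convention as I understand it and is not part of the quote. The sketch also ignores what happens when the argument registers are exhausted.

```c
#include <assert.h>
#include <stddef.h>

enum agg_passing {
    AGG_ONE_REG,        /* size <= XLEN                      */
    AGG_REG_PAIR,       /* XLEN < size <= 2 * XLEN           */
    AGG_BY_REFERENCE    /* size > 2 * XLEN (see lead-in)     */
};

/* Classify an aggregate by its total size in bits for a given XLEN. */
enum agg_passing classify_aggregate(size_t size_bits, size_t xlen)
{
    assert(xlen == 32 || xlen == 64);

    if (size_bits <= xlen)
        return AGG_ONE_REG;
    if (size_bits <= 2 * xlen)
        return AGG_REG_PAIR;
    return AGG_BY_REFERENCE;
}
```

A 128-bit struct is therefore a register pair on RV64 but falls into the third class on RV32.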
In our current CPU design, we extend the RISC-V ISA by adding several instructions, such as GP-implied load/store. To handle these instructions, we also extend the set of relocation types. However, there is currently no rule or guideline for adding customized relocation types. To avoid conflicts, in our current design new relocation IDs are assigned starting from 255 in decreasing order. Does anyone have an idea or a proposal for dealing with this issue?
The AMD64 psABI specifies that _Complex FOO is always exactly the same as struct { FOO re; FOO im; }. What is implemented in RISC-V gcc is significantly more complicated. For instance, a _Complex float can be passed in three different ways on RV64G hard-float:
fa4 and fa5, if there are 0–6 argument slots used already. This passes them in the recoded float format.
a7 if there are 7 argument slots used.
If we keep this, it'll add complexity to the psABI document, libffi, and other non-gcc programs which need to interwork with gcc such as llvm and various non-libffi ffis. If we don't, it'll require gcc changes and break compiled programs with complex number arguments (probably just LAPACK at this point). That complexity would only be justifiable if it gave a "sufficiently large" performance benefit to "sufficiently many" programs, but I don't have a great understanding of the tradeoffs here.
Attn @aswaterman @kito-cheng because this is a gcc calling convention issue. I am going to be filing a few more issues for complicated spots in the calling convention so we can decide which to keep.
The MVP calling convention is something close to the following:
(Not 100% certain about the sign extension rule here; I'd also like to write up a rationale in a bit to see if this makes sense)