t-crest / patmos
Patmos is a time-predictable VLIW processor, and the processor for the T-CREST project
Home Page: http://patmos.compute.dtu.dk
License: BSD 2-Clause "Simplified" License
Most top-level VHDL files are broken due to new pin names. Before fixing them we should go through all top-level files and discuss which ones we keep.
This issue will track the discussion into changing the Patmos ISA to allow predicate manipulation instructions in the second issue slot.
Like #73, there is no technical reason for disallowing predicate manipulation instructions in the second issue slot. Allowing them could be beneficial in many ways; for single-path code in particular, which has many independent predicate manipulation instructions, this could bring significant improvements.
Todo:
This issue will track the discussion into changing the Patmos ISA to make use of either deferred or split instructions.
Some types of instructions cannot be executed without incurring some kind of delay or latency in the pipeline. One example is load instructions, which currently have a one-cycle delay slot before the loaded value can be used. Another example is a multiply or division instruction, which requires multiple cycles to execute.
Deferred/split instructions try to address the inefficiency of instructions with latency by letting the compiler decide how to manage this latency.
Split instructions "split" a given instruction into two parts: (1) issue the instruction and (2) get the result.
E.g., loads could be split into issuing the load (lwc, load word from data-cache) and then putting the loaded value into a register (glw, get loaded word).
The two parts of the load can then be scheduled independently by the compiler, which can try to hide the latency by issuing other instructions between them.
Example:
lwc t1 = [r1] ; issue load of address in r1 to load-register t1
add r2 = r3, r4 ; do something else
add r2 = r2, r5 ; do something else
glw r1 = t1 ; get loaded value from load-register t1 into register r1
add r2 = r2, r1 ; use loaded value
Deferred instructions try to address the same problem with a different approach. In addition to the usual operands, the instruction is also given an immediate operand that specifies when the result is expected.
The immediate value defines after how many instruction words the result should be available in the target register. The compiler can then issue the instruction early and set the immediate to match the point in the instruction stream where it needs the result.
Example:
lwc r1 = [r1], 3 ; Issue a deferred load, with the value available to the third following instruction
add r2 = r3, r4 ; do something else
add r2 = r2, r5 ; do something else
add r2 = r2, r1 ; use loaded value
The deferral range is not specified yet, but a suitable range could be between 32 and 256.
Patmos emulator fails to build when the method cache is replaced by a direct-mapped instruction cache in hardware/config/default.xml:
- <ICache type="method" size="8k" assoc="16" repl="fifo" />
+ <ICache type="line" size="8k" assoc="1" repl="dm" />
Several exceptions are raised during the build process:
Patmos configuration "default configuration for DE2-115 board"
Frequency: 80 MHz
Pipelines: 2
Cores: 1
Instruction cache: 8 KB, direct-mapped
Data cache: 4 KB, direct-mapped, write through
Stack cache: 2 KB
Instruction SPM: 1 KB
Data SPM: 2 KB
Addressable external memory: 2 MB
MMU: false
Burst length: 4
[info] [2.097] Done elaborating.
[error] (run-main-0) firrtl.passes.PassExceptions:
[error] firrtl.passes.CheckFlows$WrongFlow: @[ICache.scala 103:20]: [module ICache] Expression ctrl.io.ctrlrepl is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.CheckFlows$WrongFlow: @[ICache.scala 106:20]: [module ICache] Expression ctrl.io.ocp_port is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.CheckFlows$WrongFlow: @[ICache.scala 106:20]: [module ICache] Expression io.ocp_port is used as a SourceFlow but can only be used as a SinkFlow.
[error] firrtl.passes.CheckFlows$WrongFlow: @[ICache.scala 107:16]: [module ICache] Expression ctrl.io.perf is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.CheckFlows$WrongFlow: @[ICache.scala 111:20]: [module ICache] Expression repl.io.icachefe is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.CheckFlows$WrongFlow: @[ICache.scala 112:20]: [module ICache] Expression repl.io.replctrl is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.CheckFlows$WrongFlow: @[ICache.scala 114:17]: [module ICache] Expression repl.io.memIn is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.PassException: 7 errors detected!
[error] Nonzero exit code: 1
[error] (Compile / runMain) Nonzero exit code: 1
[error] Total time: 8 s, completed Dec 22, 2021 3:46:38 PM
Build process does not finish successfully. It looks like some links for poseidon are broken:
===== Processing 'poseidon' =====
===== Cloning from https://github.com/t-crest/poseidon.git =====
Cloning into '/home/bob/t-crest/poseidon'...
remote: Enumerating objects: 3765, done.
remote: Total 3765 (delta 0), reused 0 (delta 0), pack-reused 3765
Receiving objects: 100% (3765/3765), 18.73 MiB | 9.69 MiB/s, done.
Resolving deltas: 100% (2212/2212), done.
~/t-crest/poseidon ~/t-crest
#@-mkdir -p lib 2>&1
#@cd lib && svn checkout http://pugixml.googlecode.com/svn/tags/release-1.2 pugixml
git submodule init
--2021-11-24 12:15:58-- http://mirrors.dotsrc.org/apache//commons/cli/source/commons-cli-1.4-src.tar.gz
Resolving mirrors.dotsrc.org (mirrors.dotsrc.org)... 130.225.254.116, 2001:878:346::116
Connecting to mirrors.dotsrc.org (mirrors.dotsrc.org)|130.225.254.116|:80... connected.
HTTP request sent, awaiting response... Submodule 'lib/pugixml' (https://github.com/zeux/pugixml.git) registered for path 'lib/pugixml'
git submodule update
404 Not Found
2021-11-24 12:15:58 ERROR 404: Not Found.
make: *** [Makefile:67: .common-cli] Error 8
make: *** Waiting for unfinished jobs....
Cloning into '/home/bob/t-crest/poseidon/lib/pugixml'...
Submodule path 'lib/pugixml': checked out '937ac8116e4feac075701d80211c4cafdf673142'
I am observing a behavior that I cannot explain when executing the same code on multiple cores in patemu. I was expecting that more cores executing the same code in parallel would slow down the execution on each core due to the increased number of memory accesses. However, the opposite seems to happen: code executed on just one core takes longer to execute than when the same code is run in parallel on multiple cores.
Steps to reproduce (starting from a clean build of the most recent version, i.e., commit 4e8f8d9):
Open patmos/hardware/config/altde2-115.xml and uncomment lines 7 and 8 as follows to build a multicore system (note that Argo is not used since it produces an error, see #104):
<!-- Default is single core -->
<pipeline dual="false" />
<cores count="8"/>
<!--<CmpDevs>
<CmpDev name="Argo" />
</CmpDevs> -->
Run misc/build.sh, then save the following code to a file called test.c (sorry for the huge file, I tried to keep it as short as possible while still being able to reproduce the behavior):
#include <stdio.h>
#include <stdint.h>
#include <machine/patmos.h>
#include <machine/rtc.h>
#include "libcorethread/corethread.h"
void test(uint8_t *data) {
int round;
for (round = 1; 1; round++) {
{
uint8_t tmp = data[0];
int i;
for (i = 0; i < 15; i++) {
data[i] = data[i+1];
}
data[15] = tmp;
}
if (round == 10)
break;
{
int i, j;
for (i = 0; i < 4; i++) {
uint8_t *col = data + (i * 4);
uint8_t copy[4];
for (j = 0; j < 4; j++) {
copy[j] = col[j];
}
col[0] = copy[3] ^ copy[2] ^ copy[1];
col[1] = copy[0] ^ copy[3] ^ copy[2];
col[2] = copy[1] ^ copy[0] ^ copy[3];
col[3] = copy[2] ^ copy[1] ^ copy[0];
}
}
}
}
static uint8_t test_data[MAX_CPUS * 16];
volatile _UNCACHED static unsigned long long t_start[MAX_CPUS];
volatile _UNCACHED static unsigned long long t_end [MAX_CPUS];
void work(void* arg) {
int cpuid = get_cpuid();
unsigned long long start, end;
// have all CPUs start roughly at the same time
while ((start = get_cpu_cycles()) < 1000000)
;
test(((uint8_t *)test_data) + (cpuid * 16));
end = get_cpu_cycles();
// wait for other CPUs to finish
while (get_cpu_cycles() < end + 1000000)
;
t_start[cpuid] = start;
t_end [cpuid] = end;
}
int main() {
int i, core_cnt = get_cpucnt();
if (core_cnt > MAX_CPUS) {
core_cnt = MAX_CPUS;
}
printf("Starting threads on %d CPUs\n", core_cnt);
for (i = 1; i < core_cnt; i++) {
corethread_create(i, &work, NULL);
}
work(NULL);
int ret;
for (i = 1; i < core_cnt; i++) {
corethread_join(i, (void *)&ret);
}
printf("Threads joined\n");
for (i = 0; i < core_cnt; i++) {
printf(" start time: %llu, duration: %llu\n", t_start[i], t_end[i] - t_start[i]);
}
return 0;
}
Compile and run with patemu:
$ patmos-clang -O3 test.c libcorethread/corethread.c -DMAX_CPUS=1
$ patemu a.out
Starting threads on 1 CPUs
Threads joined
start time: 1000005, duration: 108301
$ patmos-clang -O3 test.c libcorethread/corethread.c -DMAX_CPUS=2
$ patemu a.out
Starting threads on 2 CPUs
Threads joined
start time: 1000000, duration: 92199
start time: 1000000, duration: 92220
$ patmos-clang -O3 test.c libcorethread/corethread.c -DMAX_CPUS=4
$ patemu a.out
Starting threads on 4 CPUs
Threads joined
start time: 1000000, duration: 92199
start time: 1000000, duration: 92220
start time: 1000005, duration: 92236
start time: 1000002, duration: 92260
For the last command, four CPUs execute the same code. The execution time for that code should be higher than, or at least equal to, the time with just one CPU executing the code. However, the execution time on a single CPU is significantly higher than on multiple CPUs. Note that the execution time does not vary as more CPUs are added; there is only a huge difference between one and multiple CPUs.
Disclaimer: I assume that the value returned by get_cpucnt() is equal when called simultaneously by two cores; please correct me if this is wrong.
It appears the compiler optimizes away the masking of the shift amount; the behavior thus does not match the specification.
This issue will track the discussion into changing the Patmos ISA to allow branches in the second issue slot.
The limitation of not allowing branch instructions in the second issue slot is slightly "artificial" in that there is no technical reason for it (as there is for load instructions).
However, allowing branches in the second issue slot can provide significant benefits. One example is string copying, where we can use them in an implementation that needs only 3 cycles per character (amortized):
pmov $p1 = $p0
loop:
{ ($p1) lbuc $r1 = [$r2] ; Load next char
($p1) br loop }; Loop if not done
{ ($p1) add $r2 = $r2, 1 ; Increment source pointer
($p1) add $r3 = $r3, 1 }; Increment target pointer
{ ($p1) sbc [$r3-1] = $r1 ; Save char
cmpneq $p1 = $r1, 0 }; Check whether null was reached
Note: There might be a better way to do this by loading 4 chars at a time, but this is the fastest way to do it while loading only 1 char at a time.
Todo:
Running make platform in Aegean generates the multi-core system. A side effect is that make emulator fails in Patmos.
The file patmos/hardware/emulator_config.h is generated by the Aegean build process and makes the build fail by taking precedence over the correct configuration file (in patmos/hardware/build/).
The configuration file is generated by patmos/hardware/src/patmos/Config.scala at line 254. The Patmos generation is for some reason run with the cpp backend, but without a configured memory. This messes up the generated configuration header.
Workaround: delete patmos/hardware/emulator_config.h and run make emulator.
Steps to reproduce:
Open patmos/hardware/config/altde2-115.xml, uncomment lines 7 and 8, and change the core count to 16 as follows to build a multicore system (note that Argo is not used since it produces an error, see #104):
<!-- Default is single core -->
<pipeline dual="false" />
<cores count="16"/>
<!-- <CmpDevs>
<CmpDev name="Argo" />
</CmpDevs> -->
Run misc/build.sh, then save the following code to a file called test.c:
#include <stdio.h>
#include "libcorethread/corethread.h"
void work(void* arg) {
}
int main() {
int i;
for (i = 1; i < 16; i++) {
corethread_create(i, &work, NULL);
printf("started thread on core %d\n", i);
}
return 0;
}
$ patmos-clang test.c libcorethread/corethread.c
$ patemu a.out
started thread on core 1
started thread on core 2
started thread on core 3
started thread on core 4
started thread on core 5
started thread on core 6
started thread on core 7
^C
After starting threads on CPUs 1, 2, 3, 4, 5, 6, and 7, the emulator hangs indefinitely.
In the Patmos handbook, under the description of typed loads, it states:
The value of the destination register is undefined during this load delay slot.
This has some implications that I think we do not want.
Say we have some code that needs to load 2 values and add them to an existing value in a register. A naive implementation could look like:
lwc $r1 = [$r2]
lwc $r1 = [$r3]
add $r4 = $r4, $r1
add $r4 = $r4, $r1
However, the above wording would render this wrong: at the third instruction the value of $r1 is undefined, since it's the destination register of the previous load. A correct implementation would have to be:
lwc $r1 = [$r2]
lwc $r5 = [$r3]
add $r4 = $r4, $r1
add $r4 = $r4, $r5
This requires an additional register: $r1 is effectively unavailable for the second load. If we don't have enough registers available, a nop would be needed after the second load.
In my opinion, the value of the destination register should be unaffected by the load until after the delay slot. Then we can always reuse registers in successive loads. I also think this is the behavior most would expect.
I have already brought this to the attention of @schoeberl, so this issue is mostly to ensure we don't forget.
Also, I don't know how Patmos currently implements the loads or whether a change would be needed to conform to my proposal.
The input function scanf does not read any input from the command line. The C program is executed with patemu.
This failure occurs in the VM with the commit id a891aec.
The handbook defines the register calling convention.
In section "4.2 Register Usage Conventions" it says:
- r20 through r31 are callee-saved registers.
However, if we look in section "2.3 Register Files" figure 2.2(a) r20 is given the label (scratch)
as if it is caller-saved.
It clearly cannot be both.
Looking at the compiler, I can see that PatmosRegisterInfo.cpp defines Patmos::R21 through Patmos::R28 as callee-saved (in the function PatmosRegisterInfo::getCalleeSavedRegs).
Therefore, I suspect section 4.2 is wrong and should have r21 as the first callee-saved register.
Utility.printConfig(configFile) needs an update; it is currently commented out in Patmos.scala.
The setting of the PC to jump to an ELF file entry for Verilator is strange. We may also want to set the PC on our chip project. Maybe we can find a better solution that fits for both use cases.
Change all hardcoded constants in the C files to use the defined ones from machine/patmos.h
Also, drop OCPio and comConf. No dynamic class loading for normal IO devices.
Most board definitions are now broken. One issue (among others) is the multiple definitions of the UART in default.xml and in the other boards' .xml files.
Patmos emulator fails to build with recent versions of Verilator (tested with 4.212 and 4.214). misc/build.sh patmos produces the following error:
g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=1 -DVM_TRACE_FST=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -Wno-undefined-bool-conversion -O1 -DTOP_TYPE=VPatmos -DVL_USER_FINISH -include VPatmos.h -DVL_THREADED -std=c++17 -c -o VPatmos__Trace__4__Slow.o VPatmos__Trace__4__Slow.cpp
echo "" > VPatmos__ALL.verilator_deplist.tmp
../Patmos-harness.cpp: In member function 'void Emulator::emu_uart(int, int)':
../Patmos-harness.cpp:191:12: error: 'class VPatmos' has no member named 'Patmos__DOT__UartCmp__DOT__uart__DOT__uartOcpEmu_Cmd'
191 | if (c->Patmos__DOT__UartCmp__DOT__uart__DOT__uartOcpEmu_Cmd == 0x1
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../Patmos-harness.cpp:192:16: error: 'class VPatmos' has no member named 'Patmos__DOT__UartCmp__DOT__uart__DOT__uartOcpEmu_Addr'
192 | && (c->Patmos__DOT__UartCmp__DOT__uart__DOT__uartOcpEmu_Addr & 0xff) == 0x04) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../Patmos-harness.cpp:193:28: error: 'class VPatmos' has no member named 'Patmos__DOT__UartCmp__DOT__uart__DOT__uartOcpEmu_Data'
193 | unsigned char d = c->Patmos__DOT__UartCmp__DOT__uart__DOT__uartOcpEmu_Data;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../Patmos-harness.cpp:201:25: error: 'class VPatmos' has no member named 'Patmos__DOT__UartCmp__DOT__uart__DOT__tx_baud_tick'
201 | bool baud_tick = c->Patmos__DOT__UartCmp__DOT__uart__DOT__tx_baud_tick;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../Patmos-harness.cpp:216:16: error: 'class VPatmos' has no member named 'Patmos__DOT__UartCmp__DOT__uart__DOT__rx_state'
216 | c->Patmos__DOT__UartCmp__DOT__uart__DOT__rx_state = 0x3; // rx_stop_bit
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../Patmos-harness.cpp:217:16: error: 'class VPatmos' has no member named 'Patmos__DOT__UartCmp__DOT__uart__DOT__rx_baud_tick'
217 | c->Patmos__DOT__UartCmp__DOT__uart__DOT__rx_baud_tick = 1;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../Patmos-harness.cpp:218:16: error: 'class VPatmos' has no member named 'Patmos__DOT__UartCmp__DOT__uart__DOT__rxd_reg2'
218 | c->Patmos__DOT__UartCmp__DOT__uart__DOT__rxd_reg2 = 1;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../Patmos-harness.cpp:219:16: error: 'class VPatmos' has no member named 'Patmos__DOT__UartCmp__DOT__uart__DOT__rx_buff'
219 | c->Patmos__DOT__UartCmp__DOT__uart__DOT__rx_buff = d;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
At global scope:
cc1plus: note: unrecognized command-line option '-Wno-undefined-bool-conversion' may have been intended to silence earlier diagnostics
Downgrading to Verilator 4.200 fixes the problem.
If I interpret the OCP port signals correctly, it appears that Patmos is filling the method cache on every call and return instruction, regardless of whether the fetched method is already cached.
Steps to observe this behavior:
Save the following code to a file called test.c:
#include <machine/spm.h>
int fib(int n) {
if (n < 2) {
return n;
}
return fib(n-1) + fib(n-2);
}
int main() {
volatile _SPM int *led = (volatile _SPM int *) 0xF0090000;
*led = 1;
int res = fib(3);
*led = 0;
return 0;
}
Compile: patmos-clang test.c
Run the emulator with tracing: patemu -v a.out
Find the start address of fib (for me it starts at address 0x20bc4): patmos-llvm-objdump -d a.out > dump.S
Open Patmos.vcd with GTKWave or a similar program. The traces of interest are TOP/Patmos/Leds/ledReg (it helps with locating the relevant section), TOP/Patmos/cores_0/icache/io_ocp_port_M_cmd, and TOP/Patmos/cores_0/icache/io_ocp_port_M_Addr.
As can be seen in the screenshot below, shortly after ledReg is assigned 1, the instruction cache starts fetching addresses 0x20bc0 through 0x20c70, which corresponds to the function fib, as expected. However, after a brief break the cache starts fetching the same addresses again.
Zooming out, we can see that the instructions of fib are fetched 9 times in total, after which the instruction cache fetches the main function (address 0x20c84). Hence, the function fib is fetched every time it is called (5 times in total) and every time a recursive call returns to fib (4 times). This is redundant; the function is already in the cache when called recursively.
This appears to have a significant impact on the performance of Patmos. For algorithms repeatedly calling small leaf methods, the execution time is reduced by up to an order of magnitude when these calls are inlined in order to avoid redundant cache refills.
Most VHDL top-level files still contain the port to Argo, which shall be removed.
However, before changing all of them, we should decide which ones we will drop.
The pasim simulator does not take predicates into account when it throws an error about use of a load value without a delay slot.
Take this assembly program:
.file "main.bc"
.text
.globl main
.align 16
.type main,@function
.fstart main, .Ltmp0-main, 16
main: # @main
# BB#0: # %entry
sres 8
mfs $r9 = $s0
sws [1] = $r9 # 4-byte Folded Spill
sws [2] = $r26 # 4-byte Folded Spill
li $r1 = main
lwc $r2 = [$r1] #Load of r2
(!$p0)add $r1 = $r1, $r2 #Use of r2, throws error
mov $r1 = $r0
lws $r9 = [1] # 4-byte Folded Reload
ret
lws $r26 = [2] # 4-byte Folded Reload
mts $s0 = $r9
sfree 8
.Ltmp0:
.Ltmp1:
.size main, .Ltmp1-main
Compiling and running the program with pasim will make it throw the following error:
Cycle 24502: Illegal instruction at 00052048<main + 0x24>: Use of load result without delay slot!
Stacktrace:
#0 0x52024 <main>(): $rsp 0x68 stack cache size 0x40
at 0x52048 (base: 0x52024 <main>, offset: 0x24 <main + 0x24>)
#1 0x20124 <__start>(): $rsp 0x7fffffff stack cache size 0x0
at 0x201cc (base: 0x20194 <__start:.LBB1_1:.Ltmp2 + 0x8>, offset: 0x38 <__start:.LBB1_1:.Ltmp2 + 0x40>)
#2 0x20084 <_start>(): $rsp 0x7fffffff stack cache size 0x0
at 0x20110 (base: 0x20084 <.LBB0_0:_start>, offset: 0x8c <_start + 0x8c>)
The offending instruction is (!$p0)add $r1 = $r1, $r2. pasim's error does not point directly to it, but to the subsequent instruction. We can remove the error by introducing a nop between the load and the use:
.file "main.bc"
.text
.globl main
.align 16
.type main,@function
.fstart main, .Ltmp0-main, 16
main: # @main
# BB#0: # %entry
sres 8
mfs $r9 = $s0
sws [1] = $r9 # 4-byte Folded Spill
sws [2] = $r26 # 4-byte Folded Spill
li $r1 = main
lwc $r2 = [$r1] #Load of r2
nop
(!$p0)add $r1 = $r1, $r2 #Use of r2, doesn't throw an error
mov $r1 = $r0
lws $r9 = [1] # 4-byte Folded Reload
ret
lws $r26 = [2] # 4-byte Folded Reload
mts $s0 = $r9
sfree 8
.Ltmp0:
.Ltmp1:
.size main, .Ltmp1-main
Usually, this error is correct, because load instructions have a one-cycle load-to-use latency during which the destination register does not yet hold the loaded value. However, in our case the use instruction is predicated, and we can see that it will never execute. Here we can even see this before running the program, since (!$p0) is always false; but the predicate could also be run-time-dependent.
In my mind, pasim should check the value of the predicate before throwing this error, such that if the predicate evaluates to false at run time, the error is not thrown.
Am I totally off in my understanding of the instruction set and pasim, or should this be fixed?
Using bisect, I worked my way to a working commit for Argo, which seems to be titled:
chisel3: Last bug that i can resolve for compatability layer. Regtest…
I have tried looking through the changes to find what could possibly break Argo but I cannot, so I am opening an issue.
In Config.scala (line 244) we now force that every ExtMem device must have an sramAddrWidth param:
if (!(ExtMemNode \ "@DevTypeRef").isEmpty){
ExtMemDev = devFromXML(ExtMemNode,DevList,false)
ExtMemAddrWidth = ExtMemDev.params("sramAddrWidth")
}
However, if an on-board memory is used, this creates the awkward situation of needing duplicate address-width parameters; otherwise the build of course fails with a key-not-found error:
<Dev DevType="OCRam" entity="OCRamCtrl" iface="OcpBurst">
<params>
<param name="addrWidth" value="19" />
<param name="sramAddrWidth" value="19" />
</params>
</Dev>
Originally posted by LehrChristoph September 14, 2021
Hi all,
I started to develop a container image which provides all dependencies and has the compiler etc. set up. With the setup script more or less everything worked fine, but when I try synthesizing Patmos, the Quartus fitter takes around 17 minutes on my AMD Ryzen 7 5800X, whereas with Quartus 19.1 it only takes 7-8 seconds.
Additionally, when flashing the hello puts example onto my DE2-115 board, the process exits with an error; using 19.1 the program exits normally.
That behaviour seems quite strange to me, and I'm a little lost looking for the error, although Quartus 20.1 should still support the Cyclone IV FPGA line.
During setup of Patmos and Ethernet I found a set of bugs:
ethlib does not configure the initial RX buffer descriptor as empty; therefore the Ethernet MAC assumes the buffer is full and no packet is written into the system's RAM.
In ethlib_demo, the UDP checksum calculation blocks the CPU: added a line to skip the broken packet.
eth_wr and eth_rd are dedicated to the EMAC IO device and not to EthMac. To make this more clear, a disclaimer would be helpful.
Verify that the change in the method cache (759f455) for Chisel 3 does not change the generated Verilog code.
The connection is there; the real thing (ld, st) is missing.
Hello,
when I try to build patmos, it fails on emulator target with the following error message.
%Error: ../harnessConfig.vlt:2: syntax error, unexpected IDENTIFIER
%Error: Exiting due to 1 error(s)
%Error: Command Failed /usr/bin/verilator_bin --cc ../harnessConfig.vlt Patmos.v --top-module Patmos '+define+TOP_TYPE=VPatmos' --threads 1 -CFLAGS '-Wno-undefined-bool-conversion -O1 -DTOP_TYPE=VPatmos -DVL_USER_FINISH -include VPatmos.h' -Mdir /home/ahmed/t-crest/patmos/hardware/build --exe ../Patmos-harness.cpp -LDFLAGS -lelf --trace
OS: Ubuntu 18.04
verilator: 3.916
The standard output does not reflect the fact that the following program writes to the UART when it is run through the Patmos emulator (patemu). It does, however, work fine with the simulator (pasim).
#include <machine/spm.h>
int main() {
volatile _SPM int *uart_data = (volatile _SPM int *) 0xF0080004;
*uart_data = 'H';
for(;;);
}
The loop is only there to make sure the program doesn't terminate the UART communication early.
The following two commands were used to run the program myhello.c as above in the emulator:
$ make comp APP=myhello
$ patemu tmp/myhello.elf
There are two merge conflicts that need to be resolved:
both modified: hardware/config/altde2-all.xml
both modified: hardware/src/main/scala/io/EthMac.scala
The whole merge adds quite some files:
modified: c/ethlib/eth_mac_driver.c
modified: c/ethlib/eth_patmos_io.c
modified: c/ethlib/eth_patmos_io.h
new file: c/ethlib/tte.c
new file: c/ethlib/tte.h
new file: c/ethlib_tte_demo.c
new file: c/ethlib_tte_demo_interrupts.c
new file: c/ethlib_tte_demo_latency.c
new file: c/ethlib_tte_wcet.c
new file: hardware/config/altde2-interrupt.xml
new file: hardware/config/altde2-latency.xml
new file: hardware/config/default-no-timer.xml
modified: hardware/ethmac/eth_controller_top.vhdl
new file: hardware/ethmac/eth_controller_top2.vhdl
new file: hardware/quartus/altde2-interrupt/patmos.qpf
new file: hardware/quartus/altde2-interrupt/patmos.qsf
new file: hardware/quartus/altde2-interrupt/patmos.sdc
new file: hardware/quartus/altde2-latency/patmos.qpf
new file: hardware/quartus/altde2-latency/patmos.qsf
new file: hardware/quartus/altde2-latency/patmos.sdc
new file: hardware/src/main/scala/io/EthMac2.scala
new file: hardware/vhdl/patmos_de2-interrupt.vhdl
new file: hardware/vhdl/patmos_de2-latency.vhdl
new file: wcet/analyse_wcet.sh
new file: wcet/config_de2_115.pml
Hi all,
I wanted to test the ptplib_demo and I got a set of compilation errors. I will work on the fix
ptplib_demo.c:93:117: error: too many arguments to function call, expected 7, have 8
ptpv2_issue_msg(thisPtpPortInfo, tx_addr, rx_addr, PTP_BROADCAST_MAC, PTP_MULTICAST_IP, seqId, PTP_SYNC_MSGTYPE, syncInterval);
~~~~~~~~~~~~~~~ ^~~~~~~~~~~~
/opt/t-crest/patmos/c/ethlib/ptp1588.h:187:1: note: 'ptpv2_issue_msg' declared here
int ptpv2_issue_msg(PTPPortInfo ptpPortInfo, unsigned tx_addr, unsigned rx_addr, unsigned char destination_mac[6], unsigned char destination_ip[4], unsigned seqId, unsigned msgType);
^
ptplib_demo.c:95:119: error: too many arguments to function call, expected 7, have 8
ptpv2_issue_msg(thisPtpPortInfo, tx_addr, rx_addr, PTP_BROADCAST_MAC, PTP_MULTICAST_IP, seqId, PTP_FOLLOW_MSGTYPE, syncInterval);
~~~~~~~~~~~~~~~ ^~~~~~~~~~~~
/opt/t-crest/patmos/c/ethlib/ptp1588.h:187:1: note: 'ptpv2_issue_msg' declared here
int ptpv2_issue_msg(PTPPortInfo ptpPortInfo, unsigned tx_addr, unsigned rx_addr, unsigned char destination_mac[6], unsigned char destination_ip[4], unsigned seqId, unsigned msgType);
^
ptplib_demo.c:101:141: error: too many arguments to function call, expected 7, have 8
ptpv2_issue_msg(thisPtpPortInfo, tx_addr, rx_addr, PTP_BROADCAST_MAC, lastSlaveInfo.ip, rxPTPMsg.head.sequenceId, PTP_DLYRPLY_MSGTYPE, syncInterval);
~~~~~~~~~~~~~~~ ^~~~~~~~~~~~
/opt/t-crest/patmos/c/ethlib/ptp1588.h:187:1: note: 'ptpv2_issue_msg' declared here
int ptpv2_issue_msg(PTPPortInfo ptpPortInfo, unsigned tx_addr, unsigned rx_addr, unsigned char destination_mac[6], unsigned char destination_ip[4], unsigned seqId, unsigned msgType);
^
ptplib_demo.c:127:141: error: too many arguments to function call, expected 7, have 8
ptpv2_issue_msg(thisPtpPortInfo, tx_addr, rx_addr, PTP_BROADCAST_MAC, lastMasterInfo.ip, rxPTPMsg.head.sequenceId, PTP_DLYREQ_MSGTYPE, ptpTimeRecord.syncInterval);
~~~~~~~~~~~~~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/t-crest/patmos/c/ethlib/ptp1588.h:187:1: note: 'ptpv2_issue_msg' declared here
int ptpv2_issue_msg(PTPPortInfo ptpPortInfo, unsigned tx_addr, unsigned rx_addr, unsigned char destination_mac[6], unsigned char destination_ip[4], unsigned seqId, unsigned msgType);
^
ptplib_demo.c:133:140: error: too many arguments to function call, expected 7, have 8
ptpv2_issue_msg(thisPtpPortInfo, tx_addr, rx_addr, PTP_BROADCAST_MAC, lastMasterInfo.ip, rxPTPMsg.head.sequenceId, PTP_DLYREQ_MSGTYPE, ptpTimeRecord.syncInterval);
~~~~~~~~~~~~~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/t-crest/patmos/c/ethlib/ptp1588.h:187:1: note: 'ptpv2_issue_msg' declared here
int ptpv2_issue_msg(PTPPortInfo ptpPortInfo, unsigned tx_addr, unsigned rx_addr, unsigned char destination_mac[6], unsigned char destination_ip[4], unsigned seqId, unsigned msgType);
^
ptplib_demo.c:181:109: error: too few arguments to function call, expected 6, have 4
thisPtpPortInfo = ptpv2_intialize_local_port(PATMOS_IO_ETH, my_mac, (unsigned char[4]){192, 168, 2, 50}, 1);
~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
/opt/t-crest/patmos/c/ethlib/ptp1588.h:181:1: note: 'ptpv2_intialize_local_port' declared here
PTPPortInfo ptpv2_intialize_local_port(unsigned int eth_base, int portRole, unsigned char mac[6], unsigned char ip[4], unsigned short portId, int syncPeriod);
^
ptplib_demo.c:198:124: error: too few arguments to function call, expected 6, have 4
thisPtpPortInfo = ptpv2_intialize_local_port(PATMOS_IO_ETH, my_mac, (unsigned char[4]){192, 168, 2, rand_addr}, rand_addr);
~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
The Bintray repository is no longer available for installing sbt.
The new repository, including its configuration, can be found here: https://www.scala-sbt.org/download.html
Chisel 3 does not generate a Verilog case statement, but priority MUXes (see chipsalliance/chisel#983). This hurts fmax (also for Chisel switch statements, e.g., an ALU lookup in Leros). A workaround is to generate an inline Verilog table (case).
There is some dead code in the emulator harness (it was around setting on-chip memory). It should be removed.
We use Yahoo for the Patmos mailing list. We need to move soon. And all the history is lost :-(
I think we should extract the simulator (everything under patmos/simulator) into its own repository under t-crest.
I need this to set up good automatic testing and deployment using GitHub and Travis CI, because we currently have a circular dependency in our repositories: to build and test LLVM, we need pasim; however, to build and test Patmos (which contains pasim), we need LLVM. The simulator on its own does not depend on LLVM, which means that if we extract it, we can build it independently and then use it for the LLVM build.
I have already verified that the simulator folder is completely self-contained and can simply be extracted from the patmos repository. However, I do not have permission to create new t-crest repositories, so I need someone who can do this for me.
After #69, running make BOARD=altde2-115 gen synth fails with the following output:
[info] Compiling 4 Scala sources to /home/patmos/t-crest/patmos/hardware/target/scala-2.11/classes ...
[error] /home/patmos/t-crest/patmos/hardware/src/main/scala/io/I2controller.scala:68:10: value sclClk is not a member of Chisel.Bundle{val sdaIn: Chisel.Bool; val sdaOut: Chisel.Bool; val sclOut: Chisel.Bool; val sclIn: Chisel.Bool; val i2cEn: Chisel.Bool}
[error] io.pins.sclClk := sclClk
[error] ^
[error] /home/patmos/t-crest/patmos/hardware/src/main/scala/io/I2controller.scala:69:12: value sdaClk is not a member of Chisel.Bundle{val sdaIn: Chisel.Bool; val sdaOut: Chisel.Bool; val sclOut: Chisel.Bool; val sclIn: Chisel.Bool; val i2cEn: Chisel.Bool}
[error] io.pins.sdaClk := sdaClk
[error] ^
[error] /home/patmos/t-crest/patmos/hardware/src/main/scala/io/I2controller.scala:126:10: value busy is not a member of Chisel.Bundle{val sdaIn: Chisel.Bool; val sdaOut: Chisel.Bool; val sclOut: Chisel.Bool; val sclIn: Chisel.Bool; val i2cEn: Chisel.Bool}
[error] io.pins.busy := busy
[error] ^
[error] three errors found
[error] (Compile / compileIncremental) Compilation failed
The first two errors seem to be typos, as the correct pin names appear to be:
io.pins.sclOut
io.pins.sdaOut
When generating a multi-core Patmos device (e.g., 2x2), the instantiated UART, LED, and KEYS IO devices for cores that do not have access to them (i.e., cores 1/2/3) have an erroneous trailing comma, as shown in the figure, and Vivado 2018.2 reports an error during simulation.
Perhaps devices that are not used by a core should not be instantiated at all?
The test case vliw_test/add (and possibly other test cases) show inconsistent forwarding in the simulator. In the first cycle, r1 is assigned 2 in pipeline 0 and 5 in pipeline 1. r2-r7 are assigned r1+r1 in the subsequent cycles. The result is that r1=5 (result from pipeline 1), r2=r3=4 (apparently forwarded from pipeline 0), and r4=r5=r6=r7=10 (forwarded from pipeline 1 or the register file). This should be fixed such that the value in the register and the forwarded values are consistent.
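The described behavior can be sketched as follows (a hedged reconstruction; mnemonics, immediates, and bundle layout are illustrative, not the actual test source):

```
# Cycle 1: both pipelines write r1 in the same bundle
{ add   $r1 = $r0, 2        # pipeline 0: r1 := 2
  add   $r1 = $r0, 5 }      # pipeline 1: r1 := 5
# Subsequent cycles: r2-r7 are assigned r1 + r1
{ add   $r2 = $r1, $r1
  add   $r3 = $r1, $r1 }    # observed: r2 = r3 = 4 (forwarded from pipeline 0)
{ add   $r4 = $r1, $r1
  add   $r5 = $r1, $r1 }    # observed: r4 = r5 = 10 (forwarded from
                            # pipeline 1 or the register file)
# The final register value is r1 = 5, so all of r2-r7 should be 10.
```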
The test cases inst_tests/stackcache_load_store.s and inst_tests/datacache_load_store.s are almost identical. However, the first load to r3 sets the register to 0xffffffff when loading from the stack cache, while the result of the corresponding load from the data cache (or local memory or global memory) is 0. As the same data is written to the respective address before the load, the results should be identical.
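The difference can be sketched roughly as follows (a hedged illustration, not the actual test sources; the base register, offsets, reserved amount, and stored value are made up):

```
# stackcache_load_store.s, roughly:
sres    4                   # reserve stack-cache space
sws     [$r1 + 0] = $r2     # store a known value to the stack cache
lws     $r3 = [$r1 + 0]     # observed: r3 = 0xffffffff

# datacache_load_store.s, roughly (same value, same address):
swc     [$r1 + 0] = $r2     # store via the data cache
lwc     $r3 = [$r1 + 0]     # observed: r3 = 0
```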
The handbook says:
- The return information registers s7-s10 (srb, sro, sxb, sxo) are callee-saved saved registers.
This doesn't make sense to me, as a call would overwrite the registers with the callee's return information, making the callee unable to save the caller's return information.
After talking with @schoeberl, this seems to be a mistake, so I propose we reword it to caller-saved instead.
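Under the proposed caller-saved reading, a caller that still needs its own return information after a nested call must preserve it itself, along these lines (a hedged sketch; the register choices and the function name are illustrative):

```
mfs     $r20 = $srb         # save our own return base ...
mfs     $r21 = $sro         # ... and return offset
call    helper              # the call overwrites srb/sro with the
                            # return information for helper
# ... (delay slots; helper eventually returns here) ...
mts     $srb = $r20         # restore our own return information
mts     $sro = $r21
ret                         # return to our caller
```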
We used Python 2 in Aegean, which is gone now. I believe we no longer use Python anywhere else, so we should remove the dependency from the README and the handbook.
According to section 1.2.8 of the Patmos Reference Handbook to enable a multi-core Patmos one should "uncomment following lines in the configuration file:"
<pipeline dual="false" />
<cores count="4"/>
<CmpDevs>
<CmpDev name="Argo" />
</CmpDevs>
However, if I do that, building the Patmos emulator fails. I first did a build with the original configuration to verify that everything works, and then uncommented the lines in the configuration file as instructed. When rebuilding the Patmos emulator, I get the following error:
[info] [0.003] Elaborating design...
IO device Timer: entity Timer, offset 2, params Map(), interrupts: List(0, 1), all cores
IO device Deadline: entity Deadline, offset 3, params Map(), all cores
IO device Sram16: entity SRamCtrl, offset -1, params Map(ocpAddrWidth -> 21, sramAddrWidth -> 20, sramDataWidth -> 16), core 0
IO device Leds: entity Leds, offset 9, params Map(ledCount -> 9), core 0
IO device Keys: entity Keys, offset 10, params Map(keyCount -> 4), interrupts: List(2, 3, 4, 5), core 0
Config core count: 4
Reading /home/michael/Documents/t-crest-2021/patmos/hardware/../tmp/bootable-bootloader.bin
Reading /home/michael/Documents/t-crest-2021/patmos/hardware/../tmp/bootable-bootloader.bin
Reading /home/michael/Documents/t-crest-2021/patmos/hardware/../tmp/bootable-bootloader.bin
Reading /home/michael/Documents/t-crest-2021/patmos/hardware/../tmp/bootable-bootloader.bin
Config cmp:
device: Argo
Argo connecting 4 Patmos islands with configuration:
N=2
M=2
SPM_SIZE (Bytes)=4096
Emulation is false
o--Instantiating Nodes
|---Node #0 @ (0,0)
|---Node #1 @ (0,1)
|---Node #2 @ (1,0)
|---Node #3 @ (1,1)
o--Building Interconnect
[error] java.io.IOException: error=2, No such file or directory
[error] ...
[error] at argo.ArgoConfig$.genPoseidonSched(ArgoConfig.scala:157)
[error] at argo.Argo.<init>(Argo.scala:136)
[error] at patmos.Patmos.$anonfun$cmpdevios$2(Patmos.scala:257)
[error] at chisel3.Module$.do_apply(Module.scala:54)
[error] at patmos.Patmos.$anonfun$cmpdevios$1(Patmos.scala:257)
[error] at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:285)
[error] at scala.collection.immutable.Set$Set3.foreach(Set.scala:233)
[error] at scala.collection.TraversableLike.map(TraversableLike.scala:285)
[error] at scala.collection.TraversableLike.map$(TraversableLike.scala:278)
[error] at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:53)
[error] at scala.collection.SetLike.map(SetLike.scala:105)
[error] at scala.collection.SetLike.map$(SetLike.scala:105)
[error] at scala.collection.AbstractSet.map(Set.scala:53)
[error] at patmos.Patmos.<init>(Patmos.scala:254)
[error] at patmos.PatmosMain$.$anonfun$new$118(Patmos.scala:580)
[error] ... (Stack trace trimmed to user code only, rerun with --full-stacktrace if you wish to see the full stack trace)
[error] (run-main-0) firrtl.options.StageError:
[error] firrtl.options.StageError:
[error] at chisel3.stage.ChiselStage.run(ChiselStage.scala:60)
[error] at firrtl.options.Stage$$anon$1.transform(Stage.scala:43)
[error] at firrtl.options.Stage$$anon$1.transform(Stage.scala:43)
[error] at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:38)
[error] at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:15)
[error] at firrtl.options.Translator.transform(Phase.scala:248)
[error] at firrtl.options.Translator.transform$(Phase.scala:248)
[error] at firrtl.options.phases.DeletedWrapper.transform(DeletedWrapper.scala:15)
[error] at firrtl.options.Stage.$anonfun$transform$5(Stage.scala:47)
[error] at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
[error] at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
[error] at scala.collection.immutable.List.foldLeft(List.scala:91)
[error] at firrtl.options.Stage.$anonfun$transform$3(Stage.scala:47)
[error] at logger.Logger$.$anonfun$makeScope$2(Logger.scala:166)
[error] at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
[error] at logger.Logger$.makeScope(Logger.scala:164)
[error] at firrtl.options.Stage.transform(Stage.scala:47)
[error] at firrtl.options.Stage.execute(Stage.scala:58)
[error] at chisel3.stage.ChiselStage.emitVerilog(ChiselStage.scala:117)
[error] at patmos.PatmosMain$.delayedEndpoint$patmos$PatmosMain$1(Patmos.scala:580)
[error] at patmos.PatmosMain$delayedInit$body.apply(Patmos.scala:571)
[error] at scala.Function0.apply$mcV$sp(Function0.scala:39)
[error] at scala.Function0.apply$mcV$sp$(Function0.scala:39)
[error] at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
[error] at scala.App.$anonfun$main$1$adapted(App.scala:80)
[error] at scala.collection.immutable.List.foreach(List.scala:431)
[error] at scala.App.main(App.scala:80)
[error] at scala.App.main$(App.scala:78)
[error] at patmos.PatmosMain$.main(Patmos.scala:571)
[error] at patmos.PatmosMain.main(Patmos.scala)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] at java.lang.reflect.Method.invoke(Method.java:498)
[error] Caused by: chisel3.internal.ChiselException: Exception thrown when elaborating ChiselGeneratorAnnotation
[error] at chisel3.stage.ChiselGeneratorAnnotation.elaborate(ChiselAnnotations.scala:65)
[error] at chisel3.stage.phases.Elaborate.$anonfun$transform$1(Elaborate.scala:24)
[error] at scala.collection.immutable.List.flatMap(List.scala:366)
[error] at chisel3.stage.phases.Elaborate.transform(Elaborate.scala:23)
[error] at chisel3.stage.phases.Elaborate.transform(Elaborate.scala:16)
[error] at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:38)
[error] at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:15)
[error] at firrtl.options.Translator.transform(Phase.scala:248)
[error] at firrtl.options.Translator.transform$(Phase.scala:248)
[error] at firrtl.options.phases.DeletedWrapper.transform(DeletedWrapper.scala:15)
[error] at firrtl.options.DependencyManager.$anonfun$transform$3(DependencyManager.scala:278)
[error] at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
[error] at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
[error] at scala.collection.immutable.List.foldLeft(List.scala:91)
[error] at firrtl.options.DependencyManager.transform(DependencyManager.scala:269)
[error] at firrtl.options.DependencyManager.transform$(DependencyManager.scala:255)
[error] at firrtl.options.PhaseManager.transform(DependencyManager.scala:436)
[error] at chisel3.stage.ChiselStage.run(ChiselStage.scala:46)
[error] at firrtl.options.Stage$$anon$1.transform(Stage.scala:43)
[error] at firrtl.options.Stage$$anon$1.transform(Stage.scala:43)
[error] at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:38)
[error] at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:15)
[error] at firrtl.options.Translator.transform(Phase.scala:248)
[error] at firrtl.options.Translator.transform$(Phase.scala:248)
[error] at firrtl.options.phases.DeletedWrapper.transform(DeletedWrapper.scala:15)
[error] at firrtl.options.Stage.$anonfun$transform$5(Stage.scala:47)
[error] at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
[error] at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
[error] at scala.collection.immutable.List.foldLeft(List.scala:91)
[error] at firrtl.options.Stage.$anonfun$transform$3(Stage.scala:47)
[error] at logger.Logger$.$anonfun$makeScope$2(Logger.scala:166)
[error] at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
[error] at logger.Logger$.makeScope(Logger.scala:164)
[error] at firrtl.options.Stage.transform(Stage.scala:47)
[error] at firrtl.options.Stage.execute(Stage.scala:58)
[error] at chisel3.stage.ChiselStage.emitVerilog(ChiselStage.scala:117)
[error] at patmos.PatmosMain$.delayedEndpoint$patmos$PatmosMain$1(Patmos.scala:580)
[error] at patmos.PatmosMain$delayedInit$body.apply(Patmos.scala:571)
[error] at scala.Function0.apply$mcV$sp(Function0.scala:39)
[error] at scala.Function0.apply$mcV$sp$(Function0.scala:39)
[error] at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
[error] at scala.App.$anonfun$main$1$adapted(App.scala:80)
[error] at scala.collection.immutable.List.foreach(List.scala:431)
[error] at scala.App.main(App.scala:80)
[error] at scala.App.main$(App.scala:78)
[error] at patmos.PatmosMain$.main(Patmos.scala:571)
[error] at patmos.PatmosMain.main(Patmos.scala)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] at java.lang.reflect.Method.invoke(Method.java:498)
[error] Caused by: java.io.IOException: Cannot run program "../../local/bin/poseidon": error=2, No such file or directory
[error] at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
[error] at scala.sys.process.ProcessBuilderImpl$Simple.run(ProcessBuilderImpl.scala:75)
[error] at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.run(ProcessBuilderImpl.scala:104)
[error] at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang(ProcessBuilderImpl.scala:118)
[error] at argo.ArgoConfig$.genPoseidonSched(ArgoConfig.scala:157)
[error] at argo.Argo.<init>(Argo.scala:136)
[error] at patmos.Patmos.$anonfun$cmpdevios$2(Patmos.scala:257)
[error] at chisel3.Module$.do_apply(Module.scala:54)
[error] at patmos.Patmos.$anonfun$cmpdevios$1(Patmos.scala:257)
[error] at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:285)
[error] at scala.collection.immutable.Set$Set3.foreach(Set.scala:233)
[error] at scala.collection.TraversableLike.map(TraversableLike.scala:285)
[error] at scala.collection.TraversableLike.map$(TraversableLike.scala:278)
[error] at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:53)
[error] at scala.collection.SetLike.map(SetLike.scala:105)
[error] at scala.collection.SetLike.map$(SetLike.scala:105)
[error] at scala.collection.AbstractSet.map(Set.scala:53)
[error] at patmos.Patmos.<init>(Patmos.scala:254)
[error] at patmos.PatmosMain$.$anonfun$new$118(Patmos.scala:580)
[error] at chisel3.Module$.do_apply(Module.scala:54)
[error] at chisel3.stage.ChiselGeneratorAnnotation.$anonfun$elaborate$1(ChiselAnnotations.scala:60)
[error] at chisel3.internal.Builder$.$anonfun$build$1(Builder.scala:642)
[error] at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
[error] at chisel3.internal.Builder$.build(Builder.scala:639)
[error] at chisel3.internal.Builder$.build(Builder.scala:635)
[error] at chisel3.stage.ChiselGeneratorAnnotation.elaborate(ChiselAnnotations.scala:60)
[error] at chisel3.stage.phases.Elaborate.$anonfun$transform$1(Elaborate.scala:24)
[error] at scala.collection.immutable.List.flatMap(List.scala:366)
[error] at chisel3.stage.phases.Elaborate.transform(Elaborate.scala:23)
[error] at chisel3.stage.phases.Elaborate.transform(Elaborate.scala:16)
[error] at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:38)
[error] at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:15)
[error] at firrtl.options.Translator.transform(Phase.scala:248)
[error] at firrtl.options.Translator.transform$(Phase.scala:248)
[error] at firrtl.options.phases.DeletedWrapper.transform(DeletedWrapper.scala:15)
[error] at firrtl.options.DependencyManager.$anonfun$transform$3(DependencyManager.scala:278)
[error] at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
[error] at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
[error] at scala.collection.immutable.List.foldLeft(List.scala:91)
[error] at firrtl.options.DependencyManager.transform(DependencyManager.scala:269)
[error] at firrtl.options.DependencyManager.transform$(DependencyManager.scala:255)
[error] at firrtl.options.PhaseManager.transform(DependencyManager.scala:436)
[error] at chisel3.stage.ChiselStage.run(ChiselStage.scala:46)
[error] at firrtl.options.Stage$$anon$1.transform(Stage.scala:43)
[error] at firrtl.options.Stage$$anon$1.transform(Stage.scala:43)
[error] at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:38)
[error] at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:15)
[error] at firrtl.options.Translator.transform(Phase.scala:248)
[error] at firrtl.options.Translator.transform$(Phase.scala:248)
[error] at firrtl.options.phases.DeletedWrapper.transform(DeletedWrapper.scala:15)
[error] at firrtl.options.Stage.$anonfun$transform$5(Stage.scala:47)
[error] at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
[error] at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
[error] at scala.collection.immutable.List.foldLeft(List.scala:91)
[error] at firrtl.options.Stage.$anonfun$transform$3(Stage.scala:47)
[error] at logger.Logger$.$anonfun$makeScope$2(Logger.scala:166)
[error] at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
[error] at logger.Logger$.makeScope(Logger.scala:164)
[error] at firrtl.options.Stage.transform(Stage.scala:47)
[error] at firrtl.options.Stage.execute(Stage.scala:58)
[error] at chisel3.stage.ChiselStage.emitVerilog(ChiselStage.scala:117)
[error] at patmos.PatmosMain$.delayedEndpoint$patmos$PatmosMain$1(Patmos.scala:580)
[error] at patmos.PatmosMain$delayedInit$body.apply(Patmos.scala:571)
[error] at scala.Function0.apply$mcV$sp(Function0.scala:39)
[error] at scala.Function0.apply$mcV$sp$(Function0.scala:39)
[error] at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
[error] at scala.App.$anonfun$main$1$adapted(App.scala:80)
[error] at scala.collection.immutable.List.foreach(List.scala:431)
[error] at scala.App.main(App.scala:80)
[error] at scala.App.main$(App.scala:78)
[error] at patmos.PatmosMain$.main(Patmos.scala:571)
[error] at patmos.PatmosMain.main(Patmos.scala)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] at java.lang.reflect.Method.invoke(Method.java:498)
[error] Caused by: java.io.IOException: error=2, No such file or directory
[error] at java.lang.UNIXProcess.forkAndExec(Native Method)
[error] at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
[error] at java.lang.ProcessImpl.start(ProcessImpl.java:134)
[error] at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
[error] at scala.sys.process.ProcessBuilderImpl$Simple.run(ProcessBuilderImpl.scala:75)
[error] at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.run(ProcessBuilderImpl.scala:104)
[error] at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang(ProcessBuilderImpl.scala:118)
[error] at argo.ArgoConfig$.genPoseidonSched(ArgoConfig.scala:157)
[error] at argo.Argo.<init>(Argo.scala:136)
[error] at patmos.Patmos.$anonfun$cmpdevios$2(Patmos.scala:257)
[error] at chisel3.Module$.do_apply(Module.scala:54)
[error] at patmos.Patmos.$anonfun$cmpdevios$1(Patmos.scala:257)
[error] at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:285)
[error] at scala.collection.immutable.Set$Set3.foreach(Set.scala:233)
[error] at scala.collection.TraversableLike.map(TraversableLike.scala:285)
[error] at scala.collection.TraversableLike.map$(TraversableLike.scala:278)
[error] at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:53)
[error] at scala.collection.SetLike.map(SetLike.scala:105)
[error] at scala.collection.SetLike.map$(SetLike.scala:105)
[error] at scala.collection.AbstractSet.map(Set.scala:53)
[error] at patmos.Patmos.<init>(Patmos.scala:254)
[error] at patmos.PatmosMain$.$anonfun$new$118(Patmos.scala:580)
[error] at chisel3.Module$.do_apply(Module.scala:54)
[error] at chisel3.stage.ChiselGeneratorAnnotation.$anonfun$elaborate$1(ChiselAnnotations.scala:60)
[error] at chisel3.internal.Builder$.$anonfun$build$1(Builder.scala:642)
[error] at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
[error] at chisel3.internal.Builder$.build(Builder.scala:639)
[error] at chisel3.internal.Builder$.build(Builder.scala:635)
[error] at chisel3.stage.ChiselGeneratorAnnotation.elaborate(ChiselAnnotations.scala:60)
[error] at chisel3.stage.phases.Elaborate.$anonfun$transform$1(Elaborate.scala:24)
[error] at scala.collection.immutable.List.flatMap(List.scala:366)
[error] at chisel3.stage.phases.Elaborate.transform(Elaborate.scala:23)
[error] at chisel3.stage.phases.Elaborate.transform(Elaborate.scala:16)
[error] at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:38)
[error] at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:15)
[error] at firrtl.options.Translator.transform(Phase.scala:248)
[error] at firrtl.options.Translator.transform$(Phase.scala:248)
[error] at firrtl.options.phases.DeletedWrapper.transform(DeletedWrapper.scala:15)
[error] at firrtl.options.DependencyManager.$anonfun$transform$3(DependencyManager.scala:278)
[error] at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
[error] at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
[error] at scala.collection.immutable.List.foldLeft(List.scala:91)
[error] at firrtl.options.DependencyManager.transform(DependencyManager.scala:269)
[error] at firrtl.options.DependencyManager.transform$(DependencyManager.scala:255)
[error] at firrtl.options.PhaseManager.transform(DependencyManager.scala:436)
[error] at chisel3.stage.ChiselStage.run(ChiselStage.scala:46)
[error] at firrtl.options.Stage$$anon$1.transform(Stage.scala:43)
[error] at firrtl.options.Stage$$anon$1.transform(Stage.scala:43)
[error] at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:38)
[error] at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:15)
[error] at firrtl.options.Translator.transform(Phase.scala:248)
[error] at firrtl.options.Translator.transform$(Phase.scala:248)
[error] at firrtl.options.phases.DeletedWrapper.transform(DeletedWrapper.scala:15)
[error] at firrtl.options.Stage.$anonfun$transform$5(Stage.scala:47)
[error] at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
[error] at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
[error] at scala.collection.immutable.List.foldLeft(List.scala:91)
[error] at firrtl.options.Stage.$anonfun$transform$3(Stage.scala:47)
[error] at logger.Logger$.$anonfun$makeScope$2(Logger.scala:166)
[error] at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
[error] at logger.Logger$.makeScope(Logger.scala:164)
[error] at firrtl.options.Stage.transform(Stage.scala:47)
[error] at firrtl.options.Stage.execute(Stage.scala:58)
[error] at chisel3.stage.ChiselStage.emitVerilog(ChiselStage.scala:117)
[error] at patmos.PatmosMain$.delayedEndpoint$patmos$PatmosMain$1(Patmos.scala:580)
[error] at patmos.PatmosMain$delayedInit$body.apply(Patmos.scala:571)
[error] at scala.Function0.apply$mcV$sp(Function0.scala:39)
[error] at scala.Function0.apply$mcV$sp$(Function0.scala:39)
[error] at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
[error] at scala.App.$anonfun$main$1$adapted(App.scala:80)
[error] at scala.collection.immutable.List.foreach(List.scala:431)
[error] at scala.App.main(App.scala:80)
[error] at scala.App.main$(App.scala:78)
[error] at patmos.PatmosMain$.main(Patmos.scala:571)
[error] at patmos.PatmosMain.main(Patmos.scala)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] at java.lang.reflect.Method.invoke(Method.java:498)
[error] Nonzero exit code: 1
[error] (Compile / runMain) Nonzero exit code: 1
[error] Total time: 6 s, completed Nov 4, 2021 5:05:46 PM
The handbook documentation for brcf and brcfnd says that these instructions use the formats CFLi (immediate operand) and CFLrt (two register operands). However, in inline assembly, giving them two registers throws an error:
<inline asm>:35:16: error: invalid operand for instruction or syntax mismatch
brcfnd $r1, $r10
<inline asm>:36:16: error: invalid operand for instruction or syntax mismatch
brcf $r12, $r12
while giving them only one register operand (matching the CFLrs format) compiles without issue and is even accepted by patmos-llvm-objdump as a valid instruction.
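Summarizing the observed behavior from the errors above:

```
# Accepted (single register operand, matching the CFLrs format):
brcf    $r12
brcfnd  $r10
# Rejected with "invalid operand for instruction or syntax mismatch",
# despite the handbook listing CFLrt (two register operands):
brcf    $r12, $r12
brcfnd  $r1, $r10
```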
So, is the documentation wrong, or is it the implementation?
If the documentation is wrong, is the CFLrt format used by any instruction? (No other instructions in the handbook are listed with it.)
If the implementation is wrong, what would the semantics of the CFLrt format be?
It appears that some special symbols make the assembler crash (e.g., in an old version of fetch_double.s).
If we load a label into a register as part of a bundle, then dereference the register, and then use a multiply on the resulting value, it does not execute correctly in pasim.
Take this program (main.c):
#include <stdio.h>

volatile int _1 = 1;

int init_func(){
    int x;
    asm volatile(
        "{add $r3 = $r0, _1\n"
        "nop}\n"
        "lwc $r5 = [$r3]\n"
        "li $r4 = 2\n"
        "mul $r4, $r5\n"
        "nop\n"
        "mfs %[x] = $s2\n"
        :[x] "=r" (x)
        :
        :"$r3", "$r4", "$r5", "$r4", "$s2", "$s1"
    );
    return x;
}

// Should print "2" for correct execution
int main(){
    printf("%d\n", init_func());
}
Looking at the inline assembly, we start by loading the label _1 into r3. We then dereference r3 into r5, which means r5 = 1. We then load r4 = 2 and multiply r4 and r5, which should result in the value 2 in the special register s2 (the lower half of the mul result).
The code compiles successfully using patmos-clang main.c, but running it in pasim (using pasim a.out) results in the program printing 0, where we would expect 2.
This error is very specific. If we do any one of the following, the code executes correctly in pasim:
- Have the add not be part of a bundle. (Though, no matter which other instruction is part of the bundle, the add will not succeed.)
- Not use mul. E.g., if we exchange the mul for an add, the correct result is printed.
- Load an immediate with the add, or use a register, instead of a label.
Looking at pasim debug prints, I suspect the problem lies in the stalling of the regular pipeline by the multiply pipeline, but I am not sure. I will investigate.