koyamanx / rv32x_dev

RISC-V implementation in NSL for FPGA. (rv32xSoC)

License: GNU General Public License v3.0

Dockerfile 0.34% Shell 1.00% C 79.92% C++ 5.87% Python 6.43% Makefile 4.02% Assembly 1.82% Pawn 0.60%

rv32x_dev's Issues

Reset mtime after timer interrupt taken

mtime is a read-only register in the CLINT, so it should be reset after a timer interrupt is taken.
Otherwise, the timer interrupt will not arise at the correct interval.
There is no way to reset it from software.

Inference of memory block in cache module

  • Currently, Block RAM is not inferred for the cache modules (both icache and dcache), so the design does not fit into the FPGA.
  • The cache module is very complex in this project, so a redesign of the cache module is suggested.

Imprecise instret register

We implemented minstret for the rdinstret instruction.
Some instructions, such as ecall, never retire in practice. However, the current implementation increments the instret counter on every instruction except flushed ones.
This behaviour must be fixed.

Delegation of exceptions to supervisor mode from machine mode.

Delegation of some exceptions is missing, for example page faults.
Future work must implement all delegations. (Delegating some exceptions is not practical, e.g. delegating the illegal instruction exception to supervisor mode.)
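
As a rough behavioral sketch (not the project's NSL code), the delegation decision described in the privileged specification can be modeled as below. The function name `trap_target_mode` is hypothetical; the cause number 13 (load page fault) and the privilege encodings come from the specification.

```python
# Privilege encodings from the RISC-V privileged specification.
U_MODE, S_MODE, M_MODE = 0, 1, 3

LOAD_PAGE_FAULT = 13  # exception cause number


def trap_target_mode(cause: int, current_priv: int, medeleg: int) -> int:
    """Return the privilege mode that handles a synchronous exception.

    Hypothetical helper: `medeleg` is the raw value of the medeleg CSR.
    """
    delegated = (medeleg >> cause) & 1
    # Traps are never delegated downwards: an exception raised while in
    # M-mode is handled in M-mode even if its medeleg bit is set.
    if delegated and current_priv <= S_MODE:
        return S_MODE
    return M_MODE


if __name__ == "__main__":
    medeleg = 1 << LOAD_PAGE_FAULT
    print(trap_target_mode(LOAD_PAGE_FAULT, U_MODE, medeleg))
    print(trap_target_mode(LOAD_PAGE_FAULT, M_MODE, medeleg))
```

A hardware version would evaluate the same condition combinationally when selecting between the mtvec and stvec trap vectors.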

Occurrence of interrupts and exceptions during bus accesses.

Disable occurrence of interrupts and exceptions during bus accesses.

The current implementation allows a trap to occur even while the bus is being accessed.
This may produce unwanted execution behavior.
For example, assume the following instructions are in the pipeline.

IF                 DE  EX  MEM         WB
interrupt handler  x   x   page fault  x

In this case, the page fault in the memory stage overtakes the interrupt handler.
This is not correct behavior, since interrupts have higher precedence than synchronous exceptions.
In addition, it destroys the context saved by the first trap.
To avoid this situation, we need to prevent interrupts and exceptions from occurring during data memory access.
This will slightly lower the responsiveness of interrupts.
Alternatively, flush the memory stage if a trap is taken. (natural way?)
This requires a facility to cancel a bus transfer (memory access).

Arbitrate memory access from Instruction Bus and Data Bus

Currently, no arbitration takes place on memory accesses from the Instruction Bus and the Data Bus.
For simulation purposes this is manageable, because memory access is processed in software (the cpp top module).
However, when running on an FPGA, it needs dual-port RAM or arbitration of simultaneous memory accesses.
Even with dual-port RAM, the SDRAM region still requires arbitration.
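
One simple policy is a fixed-priority arbiter. The sketch below is only a behavioral illustration; giving the data bus priority over the instruction bus is an assumption made here, not something the issue decides, and the function name is hypothetical.

```python
from typing import Optional


def arbitrate(ibus_req: bool, dbus_req: bool) -> Optional[str]:
    """Grant the shared memory port to one requester for this cycle.

    Assumption: the data bus wins on a simultaneous request, so that a
    load/store in the memory stage is not starved by instruction fetch.
    """
    if dbus_req:
        return "dbus"
    if ibus_req:
        return "ibus"
    return None  # no request this cycle


if __name__ == "__main__":
    for reqs in [(True, True), (True, False), (False, False)]:
        print(reqs, "->", arbitrate(*reqs))
```

The losing requester has to be stalled until it is granted, so the ifetch stage needs a wait path for the cycles in which the data bus owns the memory.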

Handling illegal instruction exception on CSR instructions

CSR instructions may generate an illegal instruction exception if the instruction accesses a CSR not mapped in the CSR address space or writes a value to a read-only CSR.
In the current implementation, illegal instructions are evaluated at the execute stage of the pipeline.
However, CSR instructions are issued at the writeback stage, which complicates handling of the illegal instruction exception.

Possible implementation choices

  • Issue CSR instructions at the execute stage, which needs an additional forwarding unit for CSRs.
  • Check the CSR address at the execute stage beforehand; if the instruction is not illegal, issue the CSR instruction at writeback.
    -- If it is an illegal instruction, handle it at the execute stage.
    Needs an additional decoder for CSRs.
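
The early check in the second option could look roughly like the following behavioral sketch. The function name and the `mapped_csrs` set are hypothetical; the rule that csr_addr[11:10] == 0b11 marks a read-only CSR comes from the privileged specification.

```python
def csr_access_is_illegal(csr_addr: int, writes: bool, mapped_csrs: set) -> bool:
    """Decide at the execute stage whether a CSR instruction must trap.

    `mapped_csrs` is a hypothetical set of implemented CSR addresses.
    """
    if csr_addr not in mapped_csrs:
        return True  # CSR not mapped in the CSR address space
    read_only = ((csr_addr >> 10) & 0b11) == 0b11
    return writes and read_only  # write attempt to a read-only CSR


if __name__ == "__main__":
    mapped = {0x300, 0xF14}  # mstatus, mhartid (example subset)
    print(csr_access_is_illegal(0xF14, True, mapped))   # write to read-only
    print(csr_access_is_illegal(0x300, True, mapped))   # ordinary write
```

Only the address and the write-enable are needed, so this decoder does not depend on the CSR data path that stays in writeback.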

Extend physical address space to 34 bits

This is not required in most cases, but as an implementation note,
SV32 paging translates a 32-bit VA (Virtual Address) to a 34-bit PA (Physical Address), whereas
the current implementation assumes a 32-bit physical address space.
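
The 34 bits come from the 22-bit PPN of a leaf PTE concatenated with the 12-bit page offset. A minimal sketch for a 4 KiB page (the helper name is hypothetical):

```python
def sv32_physical_address(ppn: int, vaddr: int) -> int:
    """Combine a 22-bit leaf PPN with the 12-bit page offset of vaddr."""
    if not 0 <= ppn < (1 << 22):
        raise ValueError("Sv32 PPN is 22 bits wide")
    return (ppn << 12) | (vaddr & 0xFFF)


if __name__ == "__main__":
    pa = sv32_physical_address(0x3FFFFF, 0xFFFFFFFF)
    print(hex(pa), pa.bit_length())
```

With the largest PPN the result needs 34 bits, so a 32-bit physical address register would silently drop the top two bits.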

CSR write with uimm clears other bits except for lower 5 bits

The code below clears the other bits in the CSR on a uimm operation.

csr_wb_data = if(DEREG.uimm) 32'(DEREG.rs1) else execute_alu_a;

This is because the 5-bit uimm is zero-extended to create the new CSR value.
However, the CSR value written with uimm should be the concatenation of the previous CSR value with the 5-bit uimm, like the code below.
new_CSR = {prev_CSR[31:5], uimm[4:0]};
That way, the upper 27 bits of the CSR won't be cleared on a CSR write with uimm.

CSR accessibility check mechanism

csr_addr[9:8] encodes the lowest privilege level that can access the CSR.
Check this field to verify that the current operating mode has sufficient privilege to access the CSR.
The current implementation has only Machine mode available, so this check is not required right now.
However, when lower privilege levels are implemented alongside Machine mode, this mechanism must be implemented together with them.
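
The check itself is a single comparison. A behavioral sketch (the function name is hypothetical; privilege levels use the specification's encoding U=0, S=1, M=3):

```python
def csr_privilege_ok(csr_addr: int, current_priv: int) -> bool:
    """True if current_priv may access the CSR at csr_addr."""
    required_priv = (csr_addr >> 8) & 0b11  # csr_addr[9:8]
    return current_priv >= required_priv


if __name__ == "__main__":
    print(csr_privilege_ok(0x300, 3))  # mstatus from M-mode
    print(csr_privilege_ok(0x300, 1))  # mstatus from S-mode
```

A failing check raises an illegal instruction exception, so it belongs next to the other CSR legality checks.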

Possible deadlock in execute stage on issuing system instructions

  • The transition from the execute stage to the memory stage is determined by the following code.
  • If DEREG.alu_sel is not set to ALU on system instructions such as wfi and ecall, then no state transition occurs and the instruction is stuck in the execute stage forever.
  • To avoid this situation, we need to add a condition check for system instructions.

    rv32x_dev/core/rv32x5p.nsl

    Lines 324 to 346 in 14e15de

    if(taken && !stall_execute) {
        stall_ifetch();
        stall_decode();
        ifetch_nop();
        ifetch.finish();
        decode_nop();
        decode.finish();
        ifetch(DEREG.nextpc);
        memory(emreg);
    } else if(DEREG.jump && !stall_execute) {
        stall_ifetch();
        stall_decode();
        ifetch_nop();
        ifetch.finish();
        decode_nop();
        decode.finish();
        ifetch(alu_q);
        memory(emreg);
    } else if(!taken && (DEREG.alu_sel == ALU) && !stall_execute) {
        memory(emreg);
    } else if(!taken && (DEREG.alu_sel == MUNIT) && rv32x_munit32.done && !stall_execute) {
        memory(emreg);
    }

Rename dcache_unit to load_store_unit.

The dcache_unit module serves as the controller for memory operations to the cache unit inside it and to the memory unit outside it.
It's reasonable to change the name for readability.

Consideration of removal of check_daddr_range in rv32x_mcore module

Removal of check_daddr_range in rv32x_mcore module

Is checking the data memory address range with the check_daddr_range facility appropriate?
This check should be done in the PMP module.

check_daddr_range checks which address ranges are mapped (valid) for dmem accesses. However, checking these attributes is not a suitable feature for the CPU core itself but (supposedly) for the PMP facility.
PMP -> raise an access fault if the address is not valid or not mapped.

A bus error can be used for non-mapped regions, since these regions are connected to the default slave, which always returns an error response (in AHB-Lite).

Address range check for MMIO devices and CLINT

The current design does not check whether the accessed address is mapped to a register or memory region.
Accessing a vacant region hangs the simulator, since dmem_valid is never asserted.
Add a mechanism to check the valid address ranges of MMIO devices and the CLINT to avoid this situation.
Also consider implementing a bus timeout.
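
A bus timeout can be as small as a cycle counter that runs while a request is pending without a response. The class below is a hypothetical behavioral sketch; the name `BusWatchdog` and the cycle limit are assumptions for illustration.

```python
class BusWatchdog:
    """Counts cycles of an unanswered bus request (hypothetical model)."""

    def __init__(self, limit: int):
        self.limit = limit  # cycles to wait before giving up
        self.count = 0

    def tick(self, request_pending: bool, valid: bool) -> bool:
        """Advance one cycle; return True when the access should be aborted."""
        if not request_pending or valid:
            self.count = 0  # idle, or the slave answered
            return False
        self.count += 1
        return self.count >= self.limit


if __name__ == "__main__":
    wd = BusWatchdog(limit=3)
    print([wd.tick(True, False) for _ in range(3)])
```

When `tick` reports a timeout, the core could raise an access fault instead of waiting forever for dmem_valid.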

Supervisor mode interrupt takes over machine mode interrupt

Supervisor mode interrupts take over machine mode interrupts when s-mode interrupts happen while m-mode interrupts are being processed.

Fix the condition for supervisor_interrupt_enabled and the interrupt priority encoder so that less privileged interrupts are not taken while a more privileged one is being processed.
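
For reference, the conditions under which an interrupt is taken, as described in the privileged specification, can be sketched as follows. This is a behavioral model with hypothetical names, not the project's logic; sip/sie are modeled as the delegated bits of mip/mie.

```python
U_MODE, S_MODE, M_MODE = 0, 1, 3


def interrupt_taken(cause, priv, mip, mie, mideleg, mstatus_mie, mstatus_sie):
    """Return the mode that takes interrupt `cause`, or None if it stays pending."""
    bit = 1 << cause
    if not (mip & mie & bit):
        return None  # not both pending and enabled
    if mideleg & bit:
        # Delegated to S-mode: taken below S, or in S with SIE set.
        # It is never taken while the hart runs in M-mode.
        if priv < S_MODE or (priv == S_MODE and mstatus_sie):
            return S_MODE
        return None
    # Not delegated: taken below M, or in M with MIE set.
    if priv < M_MODE or mstatus_mie:
        return M_MODE
    return None


if __name__ == "__main__":
    STIP = 5  # supervisor timer interrupt
    print(interrupt_taken(STIP, M_MODE, 1 << STIP, 1 << STIP, 1 << STIP, True, True))
```

The first case in the usage line is the one from this issue: a delegated supervisor interrupt stays pending while an M-mode handler is running.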

Implementation does not cover all possible illegal instructions.

For example,

slli x1, x1, 0x20

is not processed as an illegal instruction, although it should be.
The code above executes x1 <= x1 << 0x0, but it must raise an illegal instruction exception on a valid implementation.
This code cannot be assembled by "GNU as"; however, similar code appears in rv32mi-p-shamt in riscv-tests.
This behaviour occurs because the decoder in the current implementation does not check some fields precisely.

Make sure the following conditions produce illegal instruction exceptions.

  • (opcode == OP_IMM) && (funct3 == SLLI) && (imm[11:5] != 0?00000) -> illegal
  • (opcode == OP_IMM) && (funct3 == SRLI) && (imm[11:5] != 0?00000) -> illegal
  • (opcode == OP_IMM) && (funct3 == SRAI) && (imm[11:5] != 0?00000) -> illegal
  • (opcode == OP) && ((funct7 != 0000000) && (funct7 != 0100000) && (funct7 != MULDIV)) -> illegal (can be omitted?)
  • (opcode == MISC_MEM) && (funct3 == FENCE) && ((rs1 != 00000) || (rd != 00000)) -> illegal
  • (opcode == SYSTEM) && (funct3 == PRIV) && ((rs1 != 00000) || (rd != 00000)) -> illegal
  • (opcode == MISC_MEM) && (funct3 == FENCE.I) && ((rs1 != 00000) || (rd != 00000) || (imm[11:0] != 000000000000)) -> illegal
  • (opcode == SYSTEM) && (funct3 == 100) -> illegal
  • opcode not implemented -> illegal
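
The shift-immediate conditions above can be illustrated with a small decoder model. This is a sketch with a hypothetical function name; the field positions and the RV32 encodings of SLLI, SRLI and SRAI follow the unprivileged specification.

```python
OP_IMM = 0b0010011


def shift_imm_is_illegal(instr: int) -> bool:
    """True if instr is an OP_IMM shift with a reserved imm[11:5] on RV32."""
    opcode = instr & 0x7F
    funct3 = (instr >> 12) & 0x7
    imm_11_5 = (instr >> 25) & 0x7F
    if opcode != OP_IMM:
        return False
    if funct3 == 0b001:  # SLLI: imm[11:5] must be 0000000
        return imm_11_5 != 0b0000000
    if funct3 == 0b101:  # SRLI (0000000) or SRAI (0100000)
        return imm_11_5 not in (0b0000000, 0b0100000)
    return False  # other OP_IMM instructions use the full immediate


if __name__ == "__main__":
    # 0x02009093 is the encoding of "slli x1, x1, 0x20" (imm[5] set).
    print(shift_imm_is_illegal(0x02009093))
```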

Reconsider condition of interrupts

if(mip.meip && mie.meip) {
    external_interrupt_enabled();
}
if(mip.mtip && mie.mtip) {
    timer_interrupt_enabled();
}
if(mip.msip && mie.msip) {
    software_interrupt_enabled();
}

machine_external_interrupt_req
    && interrupt_enabled && external_interrupt_enabled: external_interrupt();
machine_software_interrupt_req
    && interrupt_enabled && software_interrupt_enabled: software_interrupt();
machine_timer_interrupt_req
    && interrupt_enabled && timer_interrupt_enabled: timer_interrupt();

Enable Fast simulation

The current simulator always dumps a waveform (VCD file) during simulation,
so it is too slow for running practical programs.
Add a compilation switch to enable and disable output of the VCD file.

Implement CSR field attributes

Attribute  Valid write  Invalid write
WIRI       Ignore       Ignore
WPRI       Ignore       Ignore
WLRL       Write        Write
WARL       Write        Modified to valid
Notes

  • Reads of these fields always succeed; however, software cannot assume the value read back after writing an invalid value to a WLRL field.
  • Writing a value to a WIRI field may cause an illegal instruction exception (implementation dependent).
  • Writing a value to a register in which all fields are WIRI will always cause an illegal instruction exception.
  • Writing any value to WIRI or WPRI fields -> mask the whole value.
  • Writing any value to WLRL -> OK
  • Writing an invalid value to WARL -> corrected to a valid value
  • Writing a valid value to WARL -> OK
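
The masking and legalization rules in the notes can be sketched as two small helpers. Both names are hypothetical, and the fallback value chosen for an invalid WARL write is an implementation choice assumed here.

```python
MASK32 = 0xFFFFFFFF


def write_masked(old: int, value: int, writable_mask: int) -> int:
    """Write only the writable bits; WIRI/WPRI bits keep their old value."""
    return ((old & ~writable_mask) | (value & writable_mask)) & MASK32


def legalize_warl(value: int, legal_values: set, fallback: int) -> int:
    """Return value if it is legal for the WARL field, else the fallback."""
    return value if value in legal_values else fallback


if __name__ == "__main__":
    # Only the low 4 bits are writable in this made-up register layout.
    print(hex(write_masked(0x000000F0, 0xFFFFFFFF, 0x0000000F)))
    # A two-valued mode field: writing 3 is corrected to the fallback 0.
    print(legalize_warl(3, {0, 1}, fallback=0))
```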

CSR write does not take place on some condition

rv32x_dev/core/rv32x5p.nsl

Lines 421 to 449 in b47d284

func csrrw {
    csr_wb_data = if(DEREG.uimm) 32'(DEREG.rs1) else execute_alu_a;
    if(DEREG.rd != 5'b00000) {
        csr_read(DEREG.funct12);
    }
    csr_write(DEREG.funct12, csr_wb_data);
}
func csrrs {
    csr_wb_data = if(DEREG.uimm) 32'(DEREG.rs1) else execute_alu_a;
    csr_read(DEREG.funct12);
    if(DEREG.rs1 != 5'b00000) {
        if((csr_wb_data != 0x00000000) && DEREG.rd != 0) {
            csr_write(DEREG.funct12, (csr_wb_data | 32'(crdata)));
        } else {
            csr_write(DEREG.funct12, 32'(crdata));
        }
    }
}
func csrrc {
    csr_wb_data = if(DEREG.uimm) 32'(DEREG.rs1) else execute_alu_a;
    csr_read(DEREG.funct12);
    if(DEREG.rs1 != 5'b00000) {
        if((csr_wb_data != 0x00000000) && DEREG.rd != 0) {
            csr_write(DEREG.funct12, (~csr_wb_data & 32'(crdata)));
        } else {
            csr_write(DEREG.funct12, 32'(crdata));
        }
    }
}
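
For comparison with the code above, the read/write side-effect rules of the Zicsr instructions can be summarized in a small model. This is a sketch of the specification's rules with a hypothetical function name, not a description of what the project does.

```python
def csr_side_effects(op: str, rd: int, rs1_or_uimm: int):
    """Return (reads_csr, writes_csr) for a CSR instruction.

    `rs1_or_uimm` is the raw 5-bit field in instruction bits 19:15: the rs1
    register number for csrrw/csrrs/csrrc, or uimm for the immediate forms.
    """
    if op == "csrrw":
        # The read is skipped only when rd is x0; the write always happens.
        return (rd != 0, True)
    if op in ("csrrs", "csrrc"):
        # The write is skipped only when the rs1/uimm field is zero;
        # rd does not influence whether the write happens.
        return (True, rs1_or_uimm != 0)
    raise ValueError(f"unknown CSR operation: {op}")


if __name__ == "__main__":
    # csrrs with rd = x0 and a non-zero rs1 still writes the CSR.
    print(csr_side_effects("csrrs", rd=0, rs1_or_uimm=5))
```

Under these rules, conditioning the csrrs/csrrc write on rd, as the listing above does, skips writes that the specification expects.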

Wrong fields name of xie register.

We use the xip structure for xie,
so the field names of xie are not correct.
This mistake does not cause a problem, because xie and xip use the same format except for the field names.
However, for the understandability of the code, the fields should be changed to the correct names.

imprecise implementation of fence instruction.

In the current implementation, the fence instruction is executed as a nop.
However, with the D$, we need to guarantee that external devices observe memory instructions in execution order.
No external device accesses memory implicitly, so no problem arises for now.

Integrate RSP server to simulation module

When simulating practical applications on this simulator, GDB is often useful.
To enable debugging with GDB, integrate an RSP server into the simulation top module and give GDB access to the internal structure of the simulator.

A sample RSP server resides in simulation/rsp_server/main.c
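
As background for the integration, GDB's Remote Serial Protocol frames each packet as `$<payload>#<checksum>`, where the checksum is the modulo-256 sum of the payload bytes written as two hex digits. The sketch below illustrates only this framing in Python; it is not taken from simulation/rsp_server/main.c, the function name is hypothetical, and escaping of `$`, `#` and `}` in the payload is omitted.

```python
def rsp_packet(payload: str) -> bytes:
    """Frame an ASCII payload as an RSP packet: $payload#checksum."""
    data = payload.encode("ascii")
    checksum = sum(data) % 256  # modulo-256 sum of the payload bytes
    return b"$" + data + b"#" + f"{checksum:02x}".encode("ascii")


if __name__ == "__main__":
    print(rsp_packet("OK"))  # reply sent after a successful command
    print(rsp_packet("g"))   # "g" is GDB's read-all-registers request
```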

Add interrupt enable to UART module

The UART does not have an interrupt enable flag in its status register. The PLIC can mask or unmask the interrupt; however, it may sometimes be useful to have an interrupt enable in the UART module.

Simulator exit hints

This simulator ends its simulation when an exit code is written to memory location 0x80001000.
The exit code is then returned to your environment.
0x80001000 is used in riscv-tests as the address of the 'tohost' symbol to indicate the end of a test.
However, for more practical programs, writing to a fixed location is not suitable.
Provide options for how to exit the simulator.
Consider the following options.

  • Read the location of the 'tohost' symbol from the loaded executable and watch for a memory write to it to exit.
  • Output some character through the UART. (Not working for riscv-tests)
  • Implement a hint instruction to indicate simulator exit. (Not working for riscv-tests)

Add FIFO to uart modules

The current design of the UART module does not contain a FIFO, meaning that every time the UART module finishes an operation,
it generates an interrupt signal.
This degrades processor performance.
In addition, UART RXD will lose received data unless the processor takes that data before the next data arrives.
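
A receive FIFO with an interrupt threshold addresses both points. The following is a behavioral sketch only; the class name, the depth and the threshold are assumptions for illustration.

```python
from collections import deque


class RxFifo:
    """Hypothetical model of a UART receive FIFO with a level interrupt."""

    def __init__(self, depth: int, threshold: int):
        self.depth = depth          # number of bytes the FIFO can hold
        self.threshold = threshold  # fill level that raises the interrupt
        self.entries = deque()
        self.overrun = False        # set when a byte arrives while full

    def push(self, byte: int) -> None:
        """Called when the receiver has assembled one byte."""
        if len(self.entries) == self.depth:
            self.overrun = True     # byte is dropped
            return
        self.entries.append(byte)

    def pop(self):
        """Called when the processor reads the data register."""
        return self.entries.popleft() if self.entries else None

    @property
    def irq(self) -> bool:
        return len(self.entries) >= self.threshold


if __name__ == "__main__":
    fifo = RxFifo(depth=4, threshold=2)
    fifo.push(0x41)
    print(fifo.irq)
    fifo.push(0x42)
    print(fifo.irq, fifo.pop())
```

With a threshold above one, the processor is interrupted once per batch of bytes rather than once per byte, and data is only lost when the FIFO is full.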

Privilege mode transition in memory stage on trap handling

Exceptions such as an environment call from x mode do not flush the memory stage, so if the memory stage has at least one wait cycle, the operating privilege mode of the memory stage changes to the trapped privilege mode.
We need to either cancel everything up to the memory stage on all exceptions, or store the pre-trap privilege mode independently for the memory stage while a memory operation is being handled.

Check alignment of operand address in AMO operations.

The spec says in the unprivileged ISA V20191214-draft, p.49, that "For LR and SC, the A extension requires that the address held in rs1 be naturally aligned to the size of the operand.
If the address is not naturally aligned, an address-misaligned exception or an access-fault exception will be generated.
The access-fault exception can be generated for a memory access that would otherwise be able to complete except for the misalignment, if the misaligned access should not be emulated."

So, in this implementation, we shall generate an address-misaligned exception for them.
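
The natural-alignment test is a mask of the low address bits. A minimal sketch (hypothetical function name; the operand size is 4 bytes for the RV32 word operations):

```python
def amo_address_misaligned(addr: int, size_bytes: int = 4) -> bool:
    """True if addr is not naturally aligned to size_bytes (a power of two)."""
    return (addr & (size_bytes - 1)) != 0


if __name__ == "__main__":
    print(amo_address_misaligned(0x80000004))  # word aligned
    print(amo_address_misaligned(0x80000002))  # halfword aligned only
```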

Evaluation of instruction address misaligned

The current implementation evaluates whether the instruction address is misaligned at the ifetch stage (right before sending the address to the BUS).
However, the RISC-V privileged specification says on p.41 (riscv-tests@14f08f8):

Instruction address misaligned exceptions are raised by control-flow instructions with misaligned
targets, rather than by the act of fetching an instruction. Therefore, these exceptions have lower
priority than other instruction address exceptions.

With this design choice, and to retain the exception handling priority, we need to evaluate misaligned targets at address calculation.
Otherwise, the exception priority in the execute stage (the address calculation stage) will change.
Evaluation of misaligned targets must be done in the same stage as target address calculation (namely the execute stage for this implementation).

  • Jump address calculation is done at the execute stage, whereas branch target calculation is done at the decode stage.
  • In addition, change the priority encoder for exceptions.
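
Evaluating the target where it is computed can be sketched as below. This is a behavioral model with hypothetical names, and it assumes no compressed-instruction support, so targets must be 4-byte aligned.

```python
MASK32 = 0xFFFFFFFF


def jalr_target(rs1_value: int, imm: int) -> int:
    """JALR target: rs1 + sign-extended imm with the lowest bit cleared."""
    return (rs1_value + imm) & MASK32 & ~1


def target_misaligned(target: int, taken: bool = True) -> bool:
    """True if a taken control transfer goes to a non-4-byte-aligned target.

    A branch that is not taken never raises the exception.
    """
    return taken and (target & 0b11) != 0


if __name__ == "__main__":
    target = jalr_target(0x80000001, 2)
    print(hex(target), target_misaligned(target))
```

The exception is raised on the jump or branch itself, so the reported pc is that of the control-flow instruction rather than of the misaligned target.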

Some byte in D$ entry are incorrect in some condition where SV32 enabled.

The D$ was temporarily disabled at commit (89467eb).

The problem is under investigation; however, we note some observations here.
SV32 is enabled,
with a 0x0000_0000 -> 0x8000a0000 mapping with R, W, X, U, V set.
The privilege mode is User.
The instruction at VA:0x00000010 (PA:0x8000a010) is wrong,
e.g. the instruction at 0x0000_0000 is li a7, 7;
however, the instruction actually executed is li gp, 7. (Only one byte is incorrect.)

Executing xv6's initcode (before user proc init) always faults with an invalid system call number at 0x00000000, because it writes to gp instead of a7.

Loads from memory location 0x0000_0000 return correct data.

So we think that writeback of the D$ does not work correctly when two-level paging is enabled.

Determining whether accessing location is existed or not.

If it does not exist, then raise {INSTRUCTION,STORE,LOAD}_ACCESS_FAULT?
Or an implementation-defined BUS-ERROR exception?
Or deadlock if the bus does not support a timeout.

More information on ACCESS_FAULT is needed.

Determining memory accesses are whether cacheable or non-cacheable

This implementation includes a data cache, so it must check whether the accessed address is cacheable (memory location) or non-cacheable (I/O).
Otherwise, the data cache holds invalid data of I/O registers.

The data cache is instantiated in the 'rv32x_mcore' module; however, the memory map is set in the 'rv32x_integration' module.
So we need some way to test whether an address is cacheable or not.
If cacheable, the memory access goes through the data cache.
If non-cacheable, the memory access is done directly, bypassing the data cache.
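
One option is a small attribute table that the core consults before choosing the access path. The sketch below is purely illustrative: the region bases and sizes are made-up placeholders, not the project's memory map, and the function name is hypothetical.

```python
# Hypothetical memory map: (base, size, cacheable). Placeholder values only.
REGIONS = [
    (0x80000000, 0x08000000, True),   # main memory (assumed)
    (0x40000000, 0x00001000, False),  # MMIO device registers (assumed)
]


def is_cacheable(addr: int):
    """Return True/False for a mapped address, or None if it is unmapped."""
    for base, size, cacheable in REGIONS:
        if base <= addr < base + size:
            return cacheable
    return None


if __name__ == "__main__":
    for addr in (0x80001000, 0x40000004, 0x10000000):
        print(hex(addr), is_cacheable(addr))
```

Passing such a table (or a single cacheable bit per access) from the integration level into the core keeps the memory map in one place.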

Unnecessary condition check on load hazard

  • The condition checks below are not necessary. These checks were originally used to determine whether the memory stage is finished or not.

    rv32x_dev/core/rv32x5p.nsl

    Lines 285 to 288 in fb14d58

    if(!memory_load_hazzard_b) {
    DEREG.alu_b := execute_alu_b;
    DEREG.alu_b_forward_en := 0;
    }

    rv32x_dev/core/rv32x5p.nsl

    Lines 292 to 295 in fb14d58

    if(!memory_load_hazzard_a) {
    DEREG.alu_a := execute_alu_a;
    DEREG.alu_a_forward_en := 0;
    }

    rv32x_dev/core/rv32x5p.nsl

    Lines 299 to 302 in fb14d58

    if(!csr_load_hazzard_b) {
    DEREG.alu_b := execute_alu_b;
    DEREG.alu_b_forward_en := 0;
    }

    rv32x_dev/core/rv32x5p.nsl

    Lines 306 to 309 in fb14d58

    if(!csr_load_hazzard_a) {
    DEREG.alu_a := execute_alu_a;
    DEREG.alu_a_forward_en := 0;
    }
  • However, whether the memory stage is finished or not is determined by the code below, so the code above can be removed.
    if((DEREG.rs1 == EMREG.rd) && (DEREG.alu_a_forward_en) && (DEREG.rs1 != 0) && EMREG.load && memory) {

    if((DEREG.rs2 == EMREG.rd) && (DEREG.alu_b_forward_en) && (DEREG.rs2 != 0) && EMREG.load && memory) {

Exception handling priority on 5 staged pipeline

According to the privileged specification, the highest priority exception is the instruction address breakpoint, so ecall has lower priority than the instruction address breakpoint.

In this implementation, ecall takes place at execute and instruction fetch at ifetch.
In this case, the instruction in the ifetch stage is one that must not be executed and is flushed by ecall.

Assume the instruction flow below in the pipeline.

Ifetch               Decode            Execute  Memory            Writeback
break point address  some instruction  ecall    some instruction  some instruction

In this flow, ecall must change the instruction flow by flushing ifetch and decode, so no break point address exception arises at this point.
However, in this implementation, the break point address exception is taken and the ecall exception is lost.

This implementation causes a problem when multiple sources of exceptions arise simultaneously in multiple stages.
The priority encoder does not take these cases into account, so we need to change the priority order carefully, checking which stage raises each exception.
One way to solve this problem is to change the order of statements in the alt construct so that the exception of the preceding instruction is accepted and the following instructions are flushed.
NOTE: the priority order within a stage must be preserved.

alt {
    (rv32x.illegal_instruction || csr_not_mapped): illegal_instruction();
    i_misaligned: instruction_address_misaligned();
    rv32x.ecall: ecall();
    rv32x.ebreak: ebreak();
    l_misaligned: load_address_misaligned();
    l_access_fault: load_access_fault();
    s_misaligned: store_address_misaligned();
    s_access_fault: store_access_fault();
}
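
The idea of accepting the exception of the oldest instruction first can be sketched as a selection over pipeline stages. This is a behavioral illustration with hypothetical names, not a replacement for the alt construct above.

```python
# Later pipeline stages hold older instructions, so they are checked first.
STAGES_OLDEST_FIRST = ("writeback", "memory", "execute", "decode", "ifetch")


def select_trap(pending: dict):
    """Pick the exception of the oldest instruction that raised one.

    `pending` maps a stage name to the exception raised there (or None).
    Returns (stage, exception), or None if nothing is pending.
    """
    for stage in STAGES_OLDEST_FIRST:
        exception = pending.get(stage)
        if exception:
            return stage, exception
    return None


if __name__ == "__main__":
    # The case from this issue: a breakpoint in ifetch, an ecall in execute.
    print(select_trap({"ifetch": "breakpoint", "execute": "ecall"}))
```

Within a single stage, the priority order from the specification still applies; this selection only decides which stage wins.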

Add context 1 to plic to support supervisor external interrupts in core0.

Implement the interrupt request to context1 in the PLIC so that core0 can use context1's interrupt request line to implement the supervisor external interrupt.

Implementing the interrupt request line to context1 is straightforward: just copy and paste the interrupt request for context0.
Some signals require renaming.

Lint check of the design

Add a lint check with Verilator.

No lint check takes place in this project now; however, we should add one for design correctness.
A module with only combinational logic will generate an UNUSED warning on m_clock, so the -Wno-UNUSED flag must be added.

Inference of memory block in BTB

Currently, the memory blocks in the BTB module are not inferred on most FPGA boards, since the module does not register the output port of the memory and writes all blocks simultaneously on flush.
So, under the assumption that this memory is never inferred, it has a small number of entries (32 entries).
However, to improve performance with respect to clock frequency and BTB hit rate, it must use inferred block RAM with more entries.
