
Linux on VexRiscv about vexriscv HOT 345 CLOSED

spinalhdl avatar spinalhdl commented on May 26, 2024
Linux on VexRiscv

from vexriscv.

Comments (345)

daveshah1 avatar daveshah1 commented on May 26, 2024 6

I have the userspace segfault issue seemingly fixed!

[Screenshot from 2019-03-16 10-53-13]

The problem was that the mapping code in the kernel was always mapping pages as RWX. But the kernel relies on pages being mapped read-only and triggering a fault on writes (e.g. for copy-on-write optimisations). Fixing that, and hacking the DBusCached plugin so that all write faults trigger a store page fault exception (the store fault exception was going to M-mode and causing problems; I need to look into the correct behaviour here), seems to result in a reliable userspace.
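
To make the read-only/COW point concrete, here is a minimal sketch of Sv32 PTE flag handling, assuming the standard privileged-spec bit layout; make_pte and cow_pte are hypothetical helper names for illustration, not the kernel's actual code:

#include <stdint.h>

/* Sv32 PTE flag bits per the RISC-V privileged spec. */
#define PTE_V (1u << 0)  /* valid */
#define PTE_R (1u << 1)  /* readable */
#define PTE_W (1u << 2)  /* writable */
#define PTE_X (1u << 3)  /* executable */
#define PTE_U (1u << 4)  /* user-accessible */

/* Build a 4 KB Sv32 PTE from a physical page number and flags. */
static inline uint32_t make_pte(uint32_t ppn, uint32_t flags) {
    return (ppn << 10) | flags;
}

/* A copy-on-write mapping is readable but NOT writable, so the first
 * store raises a store page fault instead of silently succeeding -
 * exactly the behaviour an always-RWX mapping defeats. */
static inline uint32_t cow_pte(uint32_t ppn) {
    return make_pte(ppn, PTE_V | PTE_R | PTE_X | PTE_U); /* no PTE_W */
}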

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024 6

[Screenshot from 2019-03-16 15-05-20]

liteeth is working too! Although the combination of lack of caching and expensive context switches means this takes the best part of a minute...

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024 4

Hardware-refilled MMU + cacheless iBus/dBus plugin design is done. I will test it and keep you updated as soon as there is something stable enough.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024 3

Tested 8-, 12-, 16- and 32-interrupt PLIC configurations with close to minimal features.
It takes about 10 LC per interrupt. I estimate the cost of the ExternalInterruptArrayPlugin at about 4 LC per interrupt.

PlicBench8  -> iCE40 -> 110 MHz, 81 LC
PlicBench12 -> iCE40 -> 100 MHz, 118 LC
PlicBench16 -> iCE40 -> 92 MHz, 156 LC
PlicBench32 -> iCE40 -> 70 MHz, 272 LC

But with all features enabled, it is about 16 LC per interrupt XD

I have to say that the implementation I used was made to be fast on multi-bit priority configs, not to be small on single-bit priorities. So it can probably be made smaller.

https://github.com/SpinalHDL/VexRiscv/blob/linux/src/test/scala/vexriscv/experimental/PlicCost.scala

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024 2

@roman3017 Yes, it looks right to me; the cpu0.yaml should be the one generated by the SpinalHDL generation.
Then it will run this simulation workspace:
https://github.com/SpinalHDL/VexRiscv/blob/master/src/test/cpp/regression/main.cpp#L2697

But by default the golden model is disabled, and I don't think you can use it at the same time as the JTAG; it would think the CPU is doing crazy things.

So, what is required :

@enjoy-digital @roman3017 @kgugala @daveshah1

I think we need to consolidate things, trying to have a platform-independent minimal port / a robust foundation :D

So these are the things I'm considering for the hardware side:

  • Implementing another MMU as the RISC-V spec requires, a self-refilling one, and solving the memory coherency it would have against the data cache by sharing the data cache between the CPU load/store and the MMU. I have to investigate a bit, but it finally looks like it would come without noticeable synthesis area/speed compromises.
  • Reworking the cache to be a write-through one, removing the write buffer, and also removing the write-to-read bypass muxes to reduce area for a minimal IPC impact and probably a better Fmax, while supporting multi-way data caches

Refilling the MMU from the data cache would greatly reduce the need for internal TLB caches, which are quite heavy on FPGA, and also improve performance quite a lot.

So quite a bit of redesign, but I think it's worth it. Does it sound sensible? Would it compromise your progress much?

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024 1

Yes, I think an M-mode trap handler is the proper solution. We can probably use it to deal with any missing atomic instructions too.
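
As a hedged illustration of what such an M-mode handler could look like (not VexRiscv's actual code), here is a sketch that emulates amoadd.w on an illegal-instruction trap. The csr_* and read_gpr/write_gpr helpers are hypothetical wrappers over the trap frame, and it assumes both the faulting instruction and the operand address are directly readable from M-mode (a real handler would walk the page tables when translation is active):

#include <stdint.h>

#define CAUSE_ILLEGAL_INSTRUCTION 2

extern uint32_t csr_read_mcause(void);
extern uint32_t csr_read_mepc(void);
extern void     csr_write_mepc(uint32_t);
extern uint32_t read_gpr(int idx);           /* from the saved trap frame */
extern void     write_gpr(int idx, uint32_t);

void machine_trap_handler(void) {
    uint32_t cause = csr_read_mcause();
    uint32_t epc   = csr_read_mepc();
    if (cause == CAUSE_ILLEGAL_INSTRUCTION) {
        uint32_t insn = *(volatile uint32_t *)epc;  /* assumes M-mode can read it directly */
        if ((insn & 0x7f) == 0x2f &&                /* AMO opcode       */
            ((insn >> 12) & 0x7) == 0x2 &&          /* funct3 = .w      */
            ((insn >> 27) & 0x1f) == 0x00) {        /* funct5 = amoadd  */
            uint32_t rs1 = (insn >> 15) & 0x1f;
            uint32_t rs2 = (insn >> 20) & 0x1f;
            uint32_t rd  = (insn >> 7)  & 0x1f;
            volatile uint32_t *addr = (volatile uint32_t *)read_gpr(rs1);
            uint32_t old = *addr;          /* single hart: this read-modify-write */
            *addr = old + read_gpr(rs2);   /* cannot be interrupted by another CPU */
            write_gpr(rd, old);            /* amoadd returns the old value */
            csr_write_mepc(epc + 4);       /* skip the emulated instruction */
            return;
        }
    }
    /* other causes: handle here or forward to the supervisor */
}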

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024 1

The Linux repo is: https://github.com/daveshah1/litex-linux-riscv

To build:

cp litex_default_configuration .config
ARCH=riscv CROSS_COMPILE=riscv32-unknown-linux-gnu- make -j`nproc`
riscv32-unknown-linux-gnu-objcopy -O binary vmlinux vmlinux.bin

Defconfig: https://github.com/daveshah1/litex-linux-riscv/blob/master/litex_default_configuration
Device tree: https://github.com/daveshah1/litex-linux-riscv/blob/master/rv32.dts

The arch code is in https://github.com/daveshah1/litex-linux-riscv/tree/master/arch/riscv

In particular:

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024 1

@roman3017 So, I have to say, I merged and adapted the changes made by daveshah1 and kgugala by hand, as the head of the repo was quite different. Things will not work out of the box; anyway, they removed many bugs/spec mismatches :)

@daveshah1 and @kgugala About the Linux requirements: which part of the RISC-V Atomic extension is used? Only LR and SC? Is that right?

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024 1

I would personally prefer a RISC-V compatible interrupt controller rather than the one we have already, if it isn't too large or complex.

A small stub will be needed in M-mode to deal with SBI (e.g. setting timers), forwarding timer interrupts, atomic emulation, etc. Perhaps this could be part of the LiteX BIOS.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

About atomics, there is some support in VexRiscv to provide LR/SC in a local way; it only works for single-CPU systems.
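
For context, a sketch of the kind of sequence this support has to handle: a compare-and-swap retry loop built from LR/SC (rv32ia inline assembly; the function itself is an illustrative example, not from the port). On a single-CPU system a locally tracked reservation is enough for this to be correct:

#include <stdint.h>

/* Atomically: if (*p == expected) *p = desired; returns the old value. */
static inline uint32_t cas32(volatile uint32_t *p,
                             uint32_t expected, uint32_t desired) {
    uint32_t old, fail;
    do {
        __asm__ volatile(
            "lr.w   %0, (%2)\n\t"      /* load-reserved                   */
            "bne    %0, %3, 1f\n\t"    /* value mismatch: give up         */
            "sc.w   %1, %4, (%2)\n\t"  /* store-conditional (0 = success) */
            "j      2f\n"
            "1:\n\t"
            "li     %1, 0\n"           /* no store was attempted          */
            "2:"
            : "=&r"(old), "=&r"(fail)
            : "r"(p), "r"(expected), "r"(desired)
            : "memory");
    } while (old == expected && fail != 0);  /* retry only if the reservation was lost */
    return old;
}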

from vexriscv.

 avatar commented on May 26, 2024

Yeah, "dummy" implementations that work on single CPU systems should be perfectly fine.

from vexriscv.

enjoy-digital avatar enjoy-digital commented on May 26, 2024

As discussed at the Free Silicon Conference together with @Dolu1990, we are also working on it here:
enjoy-digital/litex#134.

We can continue the discussion here for the CPU aspect. @daveshah1: I saw you made some progress;
just for info, @Dolu1990 is OK to help getting things working. So if you see strange things or need help on things related to Spinal/VexRiscv, you can discuss your findings here.

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024

My current status is that I have made quite a few hacks to the kernel, vexriscv and LiteX, but I'm still only just getting into userspace and not anywhere useful yet.

VexRiscv: https://github.com/daveshah1/VexRiscv/tree/Supervisor
Build config: https://github.com/daveshah1/VexRiscv-verilog/tree/linux
LiteX: https://github.com/daveshah1/litex/tree/vexriscv-linux
kernel: https://github.com/daveshah1/litex-linux-riscv

@Dolu1990 I would be interested if you could look at 818f1f6 - loads were always reading 0xffffffff from virtual memory addresses when bit 10 of the offset (0x400) was set. This seems to fix it, but I'm not sure if a better fix is possible.

As it stands, the current issue is a kernel panic "Oops - environment call from S-mode" shortly after init starts. It seems that after a few syscalls it either isn't returning properly to userspace, or a spurious ECALL is accidentally triggered while in S-mode (it might be the ECALL getting "stuck" somewhere and lurking, so that what should be an IRQ triggers the ECALL instead).

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

Hi @daveshah1 @enjoy-digital :D

So, for sure we will hit bugs in VexRiscv, as only the machine mode was properly tested.
Things not tested enough in VexRiscv which could have bugs:

  • Supervisor / User mode
  • MMU

I think the best would be to set up a minimal test environment to run Linux on. It would save us a lot of time and sanity, especially for a Linux port project :D
So, to distinguish hardware bugs from software bugs, my proposal is that I set up a minimalistic environment where only the VexRiscv CPU is simulated and compared against an instruction-synchronised software model of the CPU (I already have one which does that, but CSRs are missing from it).
This would show exactly when the hardware diverges from what it should do, and bring serenity to the development ^.^

Does that sound good to you?

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024

That sounds very sensible! The minimal peripheral requirement is low: just a timer (right now I have the LiteX timer connected to the timerInterruptS pin, and hacked the kernel to talk directly to that rather than the proper SBI route for setting up a timer) and a UART of some kind.

My only concern with this is speed; right now it is taking about 30 s on hardware at 75 MHz to get to the point of failure. So we definitely want to use Verilator and not iverilog...

from vexriscv.

enjoy-digital avatar enjoy-digital commented on May 26, 2024

I can easily set up a Verilator simulation. But 30 s on hardware at 75 MHz will still be a bit slow: we can expect about 1 MHz execution speed, so that's still around 40 min...

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024

I did just manage to make a bit of progress on hardware (perhaps this talk of simulators is scaring it into behaviour πŸ˜„)

It does reach userspace successfully, so we can almost say Linux is working. If I set /bin/sh as init, then I can even use shell builtins - being able to run echo hello world counts as Linux, right? (But calls to other programs don't seem to work.) init itself is segfaulting deep within libc, so there's still something fishy, but it could just be a dodgy rootfs.

from vexriscv.

kgugala avatar kgugala commented on May 26, 2024

@daveshah1 this is great. The libc segfault also happened in our Renode (https://github.com/renode/renode) emulation. Can you share the rootfs you're using?

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024

initramdisk.gz

This is the initramdisk from antmicro/litex-linux-readme with a small change to inittab to remove some references to files that don't exist

In terms of other outstanding issues, I also had to patch VexRiscv so that interrupts are routed to S-mode rather than M-mode. This broke the LiteX BIOS, which expects M-mode interrupts, so I had to patch that to not expect interrupts at all, but that means there is now no useful UART output from the BIOS. I think a proper solution would be to select the interrupt privilege dynamically somehow.

from vexriscv.

kgugala avatar kgugala commented on May 26, 2024

We had to fix/work around IRQ delegation. I think this code should be in our repo, but I'll check that again.

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024

The segfault I see is:

[   53.060000] getty[45]: unhandled signal 11 code 0x1 at 0x00000004 in libc-2.26.so[5016f000+148000]
[   53.070000] CPU: 0 PID: 45 Comm: getty Not tainted 4.19.0-rc4-gb367bd23-dirty #105
[   53.080000] sepc: 501e2730 ra : 501e2e1c sp : 9f9b2c60
[   53.080000]  gp : 00120800 tp : 500223a0 t0 : 5001e960
[   53.090000]  t1 : 00000000 t2 : ffffffff s0 : 00000000
[   53.090000]  s1 : 00000000 a0 : 00000000 a1 : 502ba624
[   53.100000]  a2 : 00000000 a3 : 00000000 a4 : 000003ef
[   53.100000]  a5 : 00000160 a6 : 00000000 a7 : 0000270f
[   53.110000]  s2 : 502ba5f4 s3 : 00000000 s4 : 00000150
[   53.110000]  s5 : 00000014 s6 : 502ba628 s7 : 502bb714
[   53.120000]  s8 : 00000020 s9 : 00000000 s10: 000003ef
[   53.120000]  s11: 00000000 t3 : 00000008 t4 : 00000000
[   53.130000]  t5 : 00000000 t6 : 502ba090
[   53.130000] sstatus: 00000020 sbadaddr: 00000004 scause: 0000000d

The bad address (0x73730 in libc-2.26.so) seems to be in _IO_str_seekoff, the disassembly around it is:

   73700:	00080c93          	mv	s9,a6
   73704:	00048a13          	mv	s4,s1
   73708:	000e0c13          	mv	s8,t3
   7370c:	000d8993          	mv	s3,s11
   73710:	010a0793          	addi	a5,s4,16
   73714:	00000d93          	li	s11,0
   73718:	00000e93          	li	t4,0
   7371c:	00800e13          	li	t3,8
   73720:	3ef00d13          	li	s10,1007
   73724:	02f12223          	sw	a5,36(sp)
   73728:	04092483          	lw	s1,64(s2)
   7372c:	71648463          	beq	s1,s6,73e34 <_IO_str_seekoff@@GLIBC_2.26+0x41bc>
   73730:	0044a783          	lw	a5,4(s1)

from vexriscv.

kgugala avatar kgugala commented on May 26, 2024

I checked the code, and it looks like all has been pushed to github.

As for the segfault: note that we had to re-implement the mapping code in Linux, plus there are some hacks in the Vex MMU itself. This could be the reason for the segfault, as user space starts using virtual memory very extensively.

For example, the whole kernel memory space is mapped directly and we bypass the MMU translation maps, see:
https://github.com/antmicro/VexRiscv/blob/97d04a5243bbfee9d1dfe56857f3490da9fe1091/src/main/scala/vexriscv/plugin/MemoryTranslatorPlugin.scala#L116

The kernel range is defined in the MMU plugin instance: https://github.com/antmicro/VexRiscv/blob/97d04a5243bbfee9d1dfe56857f3490da9fe1091/src/main/scala/vexriscv/TestsWorkspace.scala#L98

I'm pretty sure there are many bugs hidden there :)

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

Ok, I will think about the best way to set up that test environment with the synchronised software golden model (to get maximum speed).
About the golden model, I will complete it (the MMU part). I can do the CSR part too, but it would probably be best if somebody other than me cross-checked my interpretation of the privileged spec, because if both the hardware and the software golden model implement the same wrong interpretation, that's not so helpful ^^.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

@enjoy-digital
Maybe we can keep the actual regression test environnement of VexRiscv, and just complet it with the required stuff.
It's a bit dirty, but it should be fine.
https://github.com/SpinalHDL/VexRiscv/blob/master/src/test/cpp/regression/main.cpp

The golden model is currently here:
https://github.com/SpinalHDL/VexRiscv/blob/master/src/test/cpp/regression/main.cpp#L193

from vexriscv.

enjoy-digital avatar enjoy-digital commented on May 26, 2024

@Dolu1990: in fact I already have the Verilator simulation working fine; I just need to improve it a little to load the vmlinux.bin/vmlinux.dtb and initramdisk into RAM more easily. But yes, we'll use whatever is more convenient for you. I'll look at your regression env and your golden model.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

@enjoy-digital Can you show me the verilator testbench sources :D ?

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

@kgugala Which CPU configuration are you using? Can you show me? (The test workspace you pointed to isn't using the caches or the MMU.)

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024

The config I am using is at https://github.com/daveshah1/VexRiscv-verilog/blob/linux/src/main/scala/vexriscv/GenCoreDefault.scala (which has a few small tweaks compared to @kgugala's, to skip over FENCEs for example).

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

@enjoy-digital The checks between the golden model and the RTL are :

  • Register file writes
  • Peripheral accesses
  • Some liveness checks

It should be enough to find divergences fast.

@daveshah1 Jumping over the FENCE instruction is probably fine for the moment, but jumping over the FENCE.I instruction isn't. There is no cache coherency between the instruction cache and the data cache.

You need to use the cache flush :) Is that used in some way?

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

(Memory coherency issues are something automatically caught by the golden model / RTL cross-checks.)

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024

As it stands it looks like all the memory has been set up as IO, which I suspect means the L1 caches won't be used at all - I think LiteX provides a single L2 cache.

Indeed, to get useful performance, proper use of caches and cache flushes will be needed.

from vexriscv.

kgugala avatar kgugala commented on May 26, 2024

yes, we disabled the caches as they were causing a lot of trouble. It didn't make sense to fight both the MMU and the caches at the same time

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

@daveshah1 Ok ^^ One thing to know: the instruction cache does not support IO instruction fetch; instead it caches those fetches. (Supporting IO instruction fetch costs area, and isn't really a useful thing, as far as I know?)
So you still need to flush the instruction cache on FENCE.I. It could be done easily; see the sketch below.
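
A hedged sketch of what that could look like, assuming the core traps FENCE.I as an illegal instruction and the platform exposes some instruction-cache flush primitive; icache_flush_all() and csr_write_mepc() are hypothetical hooks, not an existing VexRiscv API:

#include <stdint.h>

#define INSN_FENCE_I 0x0000100fu   /* fence.i encoding per the ISA spec */

extern void icache_flush_all(void);    /* hypothetical platform hook */
extern void csr_write_mepc(uint32_t);  /* hypothetical CSR helper    */

/* Returns 1 if the trapped instruction was fence.i and was handled. */
int try_emulate_fence_i(uint32_t insn, uint32_t epc) {
    if (insn == INSN_FENCE_I) {
        icache_flush_all();       /* make the I$ observe recent D$ writes */
        csr_write_mepc(epc + 4);  /* resume after the emulated fence.i    */
        return 1;
    }
    return 0;
}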

@kgugala The cacheless plugins aren't aware of the MMU.
I perfectly understand your point about avoiding the trouble of both at once. So my proposal is:

  • I port MMU support to the cacheless instruction and data plugins
  • We test things on that cacheless configuration
  • Later, when things are stable enough, we can introduce the cache stuff via a proper machine-mode FENCE.I emulation

So the roadmap would be:

  • Port MMU support into the cacheless plugins
  • Implement the cross-checked test environment
  • Test and fix stuff until it is stable enough
  • Introduce the caches in the loop with proper machine-mode emulation

from vexriscv.

kgugala avatar kgugala commented on May 26, 2024

TBH the real long-term solution will be to reimplement the MMU so it is fully compliant with the spec. Then we can get rid of the custom mapping code in Linux and restore the original mainline memory mapping code used for RV64.

I'm aware this will require quite a significant amount of work in Vex itself.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

I don't think it would require that much work. An MMU is a relatively easy piece of hardware.
I have to think about the heaviness, in terms of FPGA area, of a fully compliant MMU.

But what is the issue with a software-refilled MMU? If it uses machine mode to do it, it becomes transparent to the Linux kernel, right? So no Linux kernel modification required, just a piece of machine-mode code to have in addition to the raw Linux port :) ?

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

(troll on)
We should not forget the ultimate goal: RISC-V Linux on an iCE40 1K, I'm sure #28 would agree ^.^
(troll off)

from vexriscv.

kgugala avatar kgugala commented on May 26, 2024

It just may be difficult to push the custom mapping code to Linux mainline

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024

The trap handler need not sit in Linux at all, it can be part of the bootloader.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

@kgugala By mapping, do you mean the different flags of each MMU TLB of VexRiscv (https://github.com/SpinalHDL/VexRiscv/blob/master/src/main/scala/vexriscv/plugin/MemoryTranslatorPlugin.scala#L51)? If the given features aren't enough, I'm happy to fix that in the first place

from vexriscv.

kgugala avatar kgugala commented on May 26, 2024

@daveshah1 yes, it can. But that makes things even more complicated, as two pieces of software will have to be maintained.

@Dolu1990 the flags were sufficient. One of the missing parts is variable map size. AFAIK right now you can only map 4k pages. This made mapping the whole kernel space impossible - the MMU's map table is too small to fit so many 4k entries. This is the reason we added this constant kernel space mapping hack. Also, in user space, there are many mappings for different contexts. Those mappings are switched very often, so rewriting them every time, with 2 custom instructions for every 4k page, is very slow.

We haven't properly tested whether the reloading is done correctly, and whether the mappings are refreshed correctly in the MMU itself. This, IMO, is the reason for the segfault we're seeing in user space.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

@kgugala The initial idea to handle pages bigger than 4KB was to just translate them on demand into 4KB ones in the TLB.
For example:

Access at virtual address 0x12345678, via a 16 MB page which maps 0x12xxxxxx to 0xABxxxxxx =>
software emulation adds to the TLB cache a 4KB entry which maps 0x12345xxx to 0xAB345xxx.
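
A worked version of that arithmetic (the values are from the example above; the two masks are the only logic needed):

#include <assert.h>
#include <stdint.h>

int main(void) {
    uint32_t vaddr    = 0x12345678;     /* faulting access                 */
    uint32_t super_pa = 0xAB000000;     /* target of the 16 MB page        */
    uint32_t vpage    = vaddr & ~0xFFFu;                  /* 4 KB virtual page  */
    uint32_t ppage    = super_pa | (vaddr & 0x00FFF000u); /* 4 KB physical page */
    assert(vpage == 0x12345000u);
    assert(ppage == 0xAB345000u);
    return 0;
}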

But now that I think about it, maybe support for 16MB pages can be added with very little extra hardware over the existing solution.

The software model should also be able to indirectly pick up MMU translation errors :)

from vexriscv.

enjoy-digital avatar enjoy-digital commented on May 26, 2024

@Dolu1990: the simulation source is here:
https://github.com/enjoy-digital/litex/blob/master/litex/utils/litex_sim.py
and
https://github.com/enjoy-digital/litex/tree/master/litex/build/sim

With a vmlinux.bin with the .dtb appended, we can run Linux on mor1kx with:
litex_sim --cpu-type=or1k --ram-init=vmlinux.bin

For now, for VexRiscv, I was hacking the RAM initialization function to aggregate the vmlinux.bin, vmlinux.dtb and initramdisk.gz, but I'm thinking about using a .json file to describe how the RAM needs to be initialized:

{
    "vmlinux.bin":    0x00000000,
    "vmlinux.dtb":    0x01000000,
    "initramdisk.gz": 0x01002000,
}

and then just do:
litex_sim --cpu-type=vexriscv --ram-init=ram_init_linux.json

from vexriscv.

kgugala avatar kgugala commented on May 26, 2024

The software right now maps the pages on demand.

@Dolu1990 The problem is that kernel space has to be mapped the whole time. The whole kernel runs in S-mode in virtual memory. This space cannot be unmapped, because any interrupt/exception (including a TLB miss) may happen at any time. We cannot end up in a situation where a TLB miss causes a jump to a handler that is not mapped at the moment, causing another TLB miss. That would end in a terrible miss->handler->miss loop.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

@enjoy-digital Ahh, ok, so it is a SoC-level simulation. I think the best would really be to stick to a raw CPU simulation in Verilator, to really keep full control over the CPU, keep its raw nature, and keep simulation performance as high as possible to reduce sim time.

@kgugala This is the purpose of machine-mode emulation. Basically, in machine mode, the MMU translation is off, and the CPU can do all sorts of things without the supervisor mode even being able to notice.

This is the flow of a user-space TLB miss (a minimal sketch follows the list):

  • User-space TLB miss

  • It triggers a machine-mode exception

  • The machine-mode MMU software refiller checks the page tables in main memory

  • If a matching entry exists, it refills the hardware MMU and returns to user mode without the supervisor even knowing

  • If there was no entry mapping the required access, it emulates a supervisor exception and hands execution back to the supervisor.
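
A minimal sketch of that flow, assuming Sv32 page tables and physical addresses that fit in 32 bits; mmu_tlb_refill() (whatever mechanism VexRiscv exposes to load a TLB entry) and the other helpers are hypothetical:

#include <stdint.h>

#define PTE_V    0x1u
#define PTE_LEAF 0xeu   /* any of R/W/X set */

extern uint32_t csr_read_satp(void);
extern void     mmu_tlb_refill(uint32_t vaddr, uint32_t pte);  /* load the hardware TLB, then mret */
extern void     raise_supervisor_page_fault(uint32_t vaddr, int is_store); /* set scause/sepc/stval,
                                                                              point mepc at stvec, mret */

void handle_tlb_miss(uint32_t vaddr, int is_store) {
    uint32_t satp = csr_read_satp();
    uint32_t *l1  = (uint32_t *)((satp & 0x003fffffu) << 12);  /* root table PA */
    uint32_t pte  = l1[(vaddr >> 22) & 0x3ff];                 /* level-1 entry */
    if ((pte & PTE_V) && !(pte & PTE_LEAF)) {                  /* pointer, not a leaf */
        uint32_t *l0 = (uint32_t *)((pte >> 10) << 12);
        pte = l0[(vaddr >> 12) & 0x3ff];                       /* level-0 entry */
    }
    if (pte & PTE_V)
        mmu_tlb_refill(vaddr, pte);                   /* entry exists: refill and return */
    else
        raise_supervisor_page_fault(vaddr, is_store); /* no entry: emulate an S-mode fault */
}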

from vexriscv.

kgugala avatar kgugala commented on May 26, 2024

@daveshah1 this is awesome

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

@daveshah1 Great :D

from vexriscv.

 avatar commented on May 26, 2024

What do you think about no-MMU support for Linux on RISC-V? Would it be possible? That would require hacking the kernel, instead of VexRiscv, of course.

from vexriscv.

enjoy-digital avatar enjoy-digital commented on May 26, 2024

Awesome @daveshah1!

from vexriscv.

roman3017 avatar roman3017 commented on May 26, 2024

@wm4: https://en.wikipedia.org/wiki/%CE%9CClinux

from vexriscv.

kgugala avatar kgugala commented on May 26, 2024

@daveshah1 on what platform do you run it? Do you run it with the ramdisk you shared before? I tried to run it and it seems to be stuck at:

[    0.000000] RAMDISK: gzip image found at block 0

I boot Linux commit d27b7d5cb658ccb9ade4bea6a12feb08ebdcc541

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024

initramdisk.gz

Reuploading the ramdisk just in case, but I don't think there have been any changes.

The kernel requires the LiteX timer to be connected to the VexRiscv timerInterruptS, and the cycle/cycleh CSRs to work. IME, 'stuck during boot' has generally been a timer-related problem.

My platform:

from vexriscv.

kgugala avatar kgugala commented on May 26, 2024

This must be the timer interrupt then. I'll add this to my test platform

from vexriscv.

kgugala avatar kgugala commented on May 26, 2024

Oh, I see you run it with the latest LiteX. I tried it on the system we used for the initial work (from December 2018). I have to rebase our changes.

from vexriscv.

kgugala avatar kgugala commented on May 26, 2024

I bumped all the parts and have it running on Arty :)

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024

Awesome! I just pushed some very basic kernel-mode emulation of atomic instructions, which has improved software compatibility a bit (the current implementation I've done isn't actually atomic yet, as it ignores acquire/release for now...)

from vexriscv.

roman3017 avatar roman3017 commented on May 26, 2024

@Dolu1990 If I were to use RiscvGolden as you have suggested, would I run it with
VexRiscv/src/test/cpp/regression$ make DEBUG_PLUGIN_EXTERNAL=yes
then connect openocd with
openocd$ openocd -c "set VEXRISCV_YAML cpu0.yaml" -f tcl/target/vexriscv_sim.cfg
and then load vmlinux, dtb and initrd over gdb? I just want to make sure I use it as expected.

from vexriscv.

enjoy-digital avatar enjoy-digital commented on May 26, 2024

@Dolu1990: yes, that seems sensible and the way to go for the long term. Getting to the current situation was, I think, the hard work (thanks to all) and we now know what needs to be improved.
I'm not sure we'll run the Linux SoC on small FPGAs (the ice40 boards will be lacking external memory or resources), so even if the new MMU uses a bit more resources, it will be OK. The current situation still allows us to improve things that are not directly related to VexRiscv and the MMU: test it on various targets to verify LiteDRAM is working correctly everywhere (testing it on ULX3S will also be interesting), load code from SPI flash and SD card, integrate other cores, start working on drivers, etc... So plenty to do :)

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

@roman3017
Ok, so I would say, for the moment don't bother about the golden model; we will come back to it after the hardware rework :)

from vexriscv.

kgugala avatar kgugala commented on May 26, 2024

@Dolu1990 @enjoy-digital

With a spec-compliant MMU it will be much easier to push the 32-bit Linux code upstream, and make Vex the first 32-bit CPU supported by Linux mainline.

I agree that a simpler MMU in the spec may be beneficial for FPGA implementations. I'll raise this topic at the next RISC-V Soft Cores work group meeting.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

@kgugala I'm not sure a simpler MMU spec is required :) It should be fine with the current one. I will try it.

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024

Other remaining issues for Linux mainline support, as well as the MMU:

  • time/timeh CSRs (shouldn't be hard)
  • SBI stub for setting the timer - perhaps the timer should be built into VexRiscv (and thus tied to the time CSR) rather than provided by LiteX (see the sketch after this list)
  • Interrupt control: either getting the VexRiscv interrupt driver upstream, working on a PLIC implementation, or emulating a proper PLIC in M-mode
  • Atomics: re-adding the proper atomic instructions in Linux that upstream uses and removing the userspace atomic emulation that I added to the kernel. Probably doing this in M-mode is easier than adding all the amo* instructions
  • LiteX UART and liteeth Ethernet: either replacing these with modules that have an upstream driver or getting the drivers for these upstream
  • Fences/cache flush instructions
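
For the SBI stub item above, a hedged sketch of how a legacy SBI v0.1 set_timer call could be handled in the M-mode ecall trap: a7 carries the function ID (0 for set_timer) and a0/a1 the 64-bit compare value on RV32. The GPR/CSR helpers and the timer hook are hypothetical:

#include <stdint.h>

#define SBI_SET_TIMER 0   /* legacy SBI v0.1 function ID */

extern uint32_t read_gpr(int idx);                /* from the saved trap frame  */
extern uint32_t csr_read_mepc(void);
extern void     csr_write_mepc(uint32_t);
extern void     csr_clear_mip_stip(void);         /* clear pending S-timer bit  */
extern void     timer_set_compare(uint64_t when); /* hypothetical platform hook */

void handle_ecall_from_s(void) {
    uint32_t fid = read_gpr(17);                  /* a7 = x17 */
    if (fid == SBI_SET_TIMER) {
        uint64_t when = ((uint64_t)read_gpr(11) << 32) | read_gpr(10); /* a1:a0 */
        timer_set_compare(when);
        csr_clear_mip_stip();                     /* per SBI: setting the timer clears STIP */
        csr_write_mepc(csr_read_mepc() + 4);      /* step over the ecall */
    }
    /* other SBI calls (console putchar/getchar, etc.) would be dispatched here */
}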

from vexriscv.

roman3017 avatar roman3017 commented on May 26, 2024

@Dolu1990 Thank you very much for your explanation. I like the long term plan and will wait for HW changes for now.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

I created a new branch:
https://github.com/SpinalHDL/VexRiscv/tree/linux

The goal there is to develop the hardware-refilled MMU + the new data cache design, and to have the "raw" test environment.

@daveshah1

  • time/timeh => I agree, and maybe later, when everything is OK, we can think about emulation (it would save ~128 LUTs)
  • Timer => let's be as close as possible to the reference implementation ^.^ => +1
  • PLIC => there is one already implemented in SpinalHDL; it can be used for test purposes, but probably a migen one will be required to have flexibility in the LiteX/MiSoC flow.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

Got the VexRiscv with the new MMU design + cacheless config to pass all the standard regressions (which aren't using the MMU).
On Artix-7, the MMU (untested) costs about 250 LUTs + 400 registers for 4 iTLB + 4 dTLB entries.

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024

Very good! 250 LUTs is certainly not a problem; the existing design only uses about 50% of the Versa's ECP5 45k.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

Todo :

Just let me know if you want to pick one ^^

from vexriscv.

mithro avatar mithro commented on May 26, 2024

We should most certainly look at getting the liteeth and LiteX UART drivers upstream. @shenki was looking at this a while back, I believe.

from vexriscv.

mithro avatar mithro commented on May 26, 2024

We should also be able to share the liteeth and litex UART between or1k and riscv support, so @stffrdhrn might also be interested.

from vexriscv.

mithro avatar mithro commented on May 26, 2024

@mgielda is probably interested in this issue too.

from vexriscv.

stffrdhrn avatar stffrdhrn commented on May 26, 2024

@mithro I'm interested. Last I checked (~3 months ago) these drivers both worked on openrisc in qemu and on Arty.

I didn't read the whole conversation. Want me to clean them up and submit upstream? Or is someone else working on it?

from vexriscv.

mithro avatar mithro commented on May 26, 2024

I would suggest that @daveshah1, @shenki and @stffrdhrn coordinate on cleaning up the LiteX UART and LiteEth drivers for the upstream kernel? I don't know the best way to do that, however...

BTW, there is a linux-litex Google Group / mailing list.

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024

daveshah1/litex-linux-riscv@5f4338e makes the driver little-endian for RISC-V; this would need genericising.

I also made a few fixes along the way:
daveshah1/litex-linux-riscv@5c46c48 fixes a serious memory leak
daveshah1/litex-linux-riscv@ac29e8f fixes a panic
daveshah1/litex-linux-riscv@116b2d2 may or may not actually fix anything

The fact that I found at least two serious issues in a short period of time makes me think some more testing is warranted.

We should also look at performance; right now this is peaking at about 150 kB/s for me, and I am hoping to use this for a rootfs on NFS (the Versa has no other mass-storage options by default). I don't know how much of the performance problem is just the CPU/MMU stuff and how much is the driver/core.

from vexriscv.

enjoy-digital avatar enjoy-digital commented on May 26, 2024

@daveshah1: to give you an idea, with mor1kx, wget was 380 KB/s with a 50 MHz CPU. We were discussing adding DMA to LiteEth to improve that.

from vexriscv.

enjoy-digital avatar enjoy-digital commented on May 26, 2024

@mithro, @daveshah1, @shenki @stffrdhrn: to try to coordinate the work on the drivers and avoid polluting this issue too much, I just created some issues for the drivers in https://github.com/antmicro/litex-linux-riscv:
antmicro/litex-linux-riscv#1
antmicro/litex-linux-riscv#2

from vexriscv.

mithro avatar mithro commented on May 26, 2024

@daveshah1 I'm sure you will find a lot more bugs when cleaning up and testing.

FYI We collected some discussion about adding LiteEth DMA in this Google Doc.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

Currently testing the self-refilling MMU

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

The state now is :

I have to say, I'm really not very experienced with low-level Linux stuff.
So I need to know where all the platform-related configs are, and how to build the Linux image.
Do you have some documentation about it?

Thanks :)

from vexriscv.

enjoy-digital avatar enjoy-digital commented on May 26, 2024

@Dolu1990: do you want to also test on hardware? If so, I can prepare a design for you where you'll be able to
insert the generated VexRiscv. I just need to know which FPGA board you have with DRAM and Ethernet.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

@daveshah1 Thanks :D
@enjoy-digital Currently, I will try to stay in simulation, because of the software model which checks the VexRiscv behaviour. But if the simulation is reaaaaly too slow, sure, I will ask you :)

About the VexRiscv repo, to run the tests :

git clone https://github.com/SpinalHDL/SpinalHDL.git -b dev
git clone https://github.com/SpinalHDL/VexRiscv.git -b linux
cd VexRiscv
sbt "runMain vexriscv.demo.LinuxGen"
cd src/test/cpp/regression
make run IBUS=SIMPLE DBUS=SIMPLE REDO=10 DHRYSTONE=yes COMPRESSED=yes TRACE=no

It will take some time to generate the core the first time you run it, as it uses an unreleased version of SpinalHDL.

from vexriscv.

futaris avatar futaris commented on May 26, 2024

Where can I find documentation on getting this to run on renode?

from vexriscv.

roman3017 avatar roman3017 commented on May 26, 2024

@Dolu1990 The mmu tests work great. I have also tried to run the following command instead of mmu tests:

Workspace("run").withRiscvRef()->noInstructionReadCheck()->run(0xFFFFFFFFFFFF);

Then connecting openocd still works. But connecting gdb afterwards is for some reason crashing it:

BOOT
CONNECTED
makefile:181: recipe for target 'run' failed
make: *** [run] Segmentation fault (core dumped)

I have compiled vmlinux and was hoping to load it over gdb. I have also converted elf to hex and tried loadHex("vmlinux.hex") and bootAt(0xc0000000) but cannot connect gdb. Likely I am not using it as expected.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

Hmmm have to check that.
Also, i will have some fixes to do in openocd to manage the MMU stuff properly.

Anyway, i think the best is to create a dedicated class which extend Workspace. Then in it we can redefine the memeory mapping that we need, and also load the binaries directly without using the JTAG stuff :)

So, just have a fiew things to fix myself on Vex, then i setup a minimal workspace that we can extend to emulate the peripheral we need.

Things i'm currently fixing :

  • VexRiscv wasn't implementing the interruption flags exactly as the spec is saying. I'm fixing that now.
  • I'm also adding more regression test around the privilege modes and the delegation stuff.

I will tell you as soon it's done :)

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

@roman3017 Ahhh, now I get it: if you want to use the debug interface in the sim, you should not use the withRiscvRef() stuff, as the VexRiscv software model does not include the debug interface yet.

from vexriscv.

roman3017 avatar roman3017 commented on May 26, 2024

@Dolu1990 Thank you very much for the explanation. I will use the SoC on FPGA for now.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

Simulation command update:
make run DBUS=SIMPLE IBUS=SIMPLE SUPERVISOR=yes CSR=yes COMPRESSED=yes TRACE=no
The JTAG is broken, but that's fine for the moment; 10 tests will fail because of that.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

@kgugala @daveshah1 @enjoy-digital @roman3017
I pushed everything required to run the simulation and to load Linux into it.
There are some notes on how to use the thing, in case:
https://github.com/SpinalHDL/VexRiscv/blob/linux/src/main/scala/vexriscv/demo/Linux.scala#L30

Now, we have to make some choices together ^.^
It crashes on the second instruction, which is related to the interrupt controller:

        csrw VEXRISCV_CSR_IRQ_MASK, zero
c0000004:	bc001073          	csrw	0xbc0,zero

This is for the non-RISC-V interrupt controller added inside VexRiscv for MiSoC/LiteX compatibility.
Do we want to keep it for the Linux stuff? Or do we move to a regular RISC-V design?
Which means:

      input   timerInterrupt,
      input   externalInterrupt,
      input   softwareInterrupt,
      input   externalInterruptS, //(Supervisor)

So, per the spec, there is no input to set supervisor timer interrupt pending (STIP); it is done via machine mode, which has to set the STIP flag during a machine timer interrupt.
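
A sketch of that forwarding, assuming the machine timer interrupt is taken in M-mode and the kernel then consumes a supervisor timer interrupt; the mtimecmp handling is platform-specific and omitted:

#include <stdint.h>

#define MIP_STIP (1u << 5)   /* supervisor timer interrupt pending */

static inline void machine_timer_irq(void) {
    /* Make the S-mode kernel see a timer interrupt by setting STIP. */
    __asm__ volatile("csrs mip, %0" :: "r"(MIP_STIP));
    /* Also silence the machine timer source here, e.g. by pushing
     * mtimecmp into the future (platform-specific, omitted). */
}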

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

Just for info:
The LitexSoC peripheral emulation isn't written yet:

//TODO Emulate peripherals here

LitexSoC workspace usage :

LitexSoC("linux")

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

About the PLIC, its parametrization can greatly reduce the footprint: basically, "removing" the priority stuff by setting its width to one bit, and hard-wiring all the gateway priorities to 1 and all the target thresholds to zero.

I will do some iCE40 benchmarks to get a better idea of the final footprint.
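
On the software side, a regular RISC-V PLIC implies the standard claim/complete cycle sketched below. The base address and dispatch_irq() are hypothetical; the 0x200004 offset is the spec's claim/complete register for the first context:

#include <stdint.h>

#define PLIC_BASE  0xf0c00000u   /* hypothetical base address */
#define PLIC_CLAIM (*(volatile uint32_t *)(PLIC_BASE + 0x200004u))

extern void dispatch_irq(uint32_t id);   /* hypothetical driver dispatch */

void external_irq_handler(void) {
    uint32_t id;
    while ((id = PLIC_CLAIM) != 0) {  /* claim the highest-priority pending IRQ */
        dispatch_irq(id);
        PLIC_CLAIM = id;              /* complete: write the ID back */
    }
}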

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024

Looks good. I'm happy with the PLIC solution, so long as it doesn't cause too much trouble with the LiteX integration cc @enjoy-digital

from vexriscv.

enjoy-digital avatar enjoy-digital commented on May 26, 2024

I'm also fine with the PLIC solution; I need to look at it, but I don't think it will be too complex to integrate in LiteX.

from vexriscv.

futaris avatar futaris commented on May 26, 2024

About the Linux requirements: which part of the RISC-V Atomic extension is used? Only LR and SC? Is that right?

64-bit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/riscv/include/asm/atomic.h?h=v5.1-rc2

32-bit:
https://github.com/daveshah1/litex-linux-riscv/blob/master/arch/riscv/include/asm/atomic.h

Looks like only LR and SC are used.

from vexriscv.

Dolu1990 avatar Dolu1990 commented on May 26, 2024

So, I made some experiments with the head of the main riscv-linux repo.
The objective is to go as far as possible without any changes to the kernel, emulating all missing features in machine mode via an emulator.

The sources are here :
https://github.com/SpinalHDL/VexRiscv/tree/linux/src/main/c/emulator/src

It already work a bit :

[    0.000000] Linux version 4.20.0-g8fe28cb (spinalvm@spinalvm-VirtualBox) (gcc version 7.2.0 (GCC)) #1 Sun Mar 24 20:18:48 CET 2019
[    0.000000] printk: bootconsole [early0] enabled

πŸ’ƒ

Then it triggers

	BUG_ON(mem_size == 0);
c000419c:	00079463          	bnez	a5,c00041a4 <setup_arch+0x140>
c00041a0:	00100073          	ebreak

I will look further tomorrow.
Anyway, thanks all for the tips/help/commands/code, it really helped :)

@futaris In fact, there were atomic instructions (amoxxx) veeeery early and in multiple places in the binary. I had to emulate them in machine mode.

from vexriscv.

futaris avatar futaris commented on May 26, 2024

Looks like arch/riscv/include/asm/futex.h is where they come from in mainline Linux.

And it looks like userspace Linux needs the "A" (atomic) extension:
ivmai/libatomic_ops#31 (comment)

from vexriscv.

futaris avatar futaris commented on May 26, 2024

Oh, and I'm not sure how to support !CONFIG_GENERIC_ATOMIC64 with __riscv_xlen < 64 ... I think that we'd have to do something similar to what is done in arch/arm/include/asm/atomic.h.

from vexriscv.

futaris avatar futaris commented on May 26, 2024

Looks like your code for the AMOxxx opcodes is similar to daveshah1's:

daveshah1/litex-linux-riscv@a9819e6#diff-48943b18b315b64e8efabc4035b9ed19R114

from vexriscv.

futaris avatar futaris commented on May 26, 2024

https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/XVha867D0y0

from vexriscv.

futaris avatar futaris commented on May 26, 2024

If you want to try building a rootfs without atomics, try buildroot.

If you disable BR2_RISCV_ISA_RVA, then you'd need to enable support in uClibc or musl.

glibc needs atomics though.

from vexriscv.

futaris avatar futaris commented on May 26, 2024

openembedded / yocto is another alternative.
https://github.com/riscv/meta-riscv

glibc 32-bit support is still not upstream.

riscv/meta-riscv@ab1ebdc
@alistair23 seems to be working on 32-bit linux support in meta-riscv.

from vexriscv.

daveshah1 avatar daveshah1 commented on May 26, 2024

I don't think an atomic-free RISC-V userspace is possible. glibc and musl won't compile without atomics, although I haven't looked at uClibc. In practice I was finding that simple C stuff such as busybox tended not to call any atomic instructions, but C++ stuff was calling them at startup. This is why I ended up implementing the kernel-mode emulation of them.

from vexriscv.

futaris avatar futaris commented on May 26, 2024

It is possible to add support so that we can have an atomic-free RISC-V userspace and kernel, by making sure your gcc doesn't emit atomic instructions, i.e. by selecting the correct arch.

e.g. -march=rv32i

or -march=rv32ima (for something that supports multiply and atomic)

https://gcc.gnu.org/wiki/Atomic
https://github.com/gcc-mirror/gcc/blob/master/libgcc/config/arm/linux-atomic.c

You would need to do software atomic, in the kernel, like the above.

As of this writing, there are no A routine emulations because they were rejected as part of the Linux upstreaming process -- this might change in the future, but - for now - we plan to mandate that Linux-capable machines subsume the A extension as part of the RISC-V platform specification.

https://www.sifive.com/blog/all-aboard-part-1-compiler-args

It’s only possible to emulate the A extension on single processor machines, where it happens to be very cheap to implement the A extension. Thus, it seemed simpler to reduce the number of ABIs supported (4 instead of 6). If someone decides to build non-A, Linux-capable machines then we’ll re-evaluate the situation.

https://forums.sifive.com/t/questions-about-all-aboard-series-part-1/781

from vexriscv.
