Comments (6)

Dolu1990 commented on June 20, 2024

Hi,

Did you try with the vanilla GenFullNoMmuMaxPerf config?
In which test environment did you run your version?

from vexriscv.

piondeno commented on June 20, 2024

Hi,
I removed the TCM and restored the cache size to 8 KB for each of the I and D buses.

/home/datakey/tools/riscv64-unknown-elf-gcc-2018/bin/riscv64-unknown-elf-gcc -fno-inline -fno-common -O3 -DPREALLOCATE=1 -DHOST_DEBUG=0 -DMSC_CLOCK  -march=rv32im  -mabi=ilp32 -g -O3  -fno-inline   -MD -fstrict-volatile-bitfields  -o build/dhrystone.elf build/src/main.o build/src/dhry_1.o build/src/dhry_2.o build/src/crt.o build/src/stdlib.o -lc -lc  -march=rv32im  -mabi=ilp32 -nostdlib -lgcc -mcmodel=medany -nostartfiles -ffreestanding -Wl,-Bstatic,-T,../libs/linkerAllInSramForSim.ld,-Map,build/dhrystone.map,--print-memory-usage 
Memory region         Used Size  Region Size  %age Used
       onChipRam:       26992 B        32 KB     82.37%
           sdram:          0 GB        64 MB      0.00%

After downloading the bitstream to the FPGA and running the program in release mode, I get the result shown below:

Dhrystone Benchmark, Version 2.1 (Language: C)

Program compiled without 'register' attribute

Please give the number of runs through the benchmark:
Execution starts, 500 runs through Dhrystone
Execution ends

Final values of the variables used in the benchmark:

Int_Glob:            5
        should be:   5
Bool_Glob:           1
        should be:   1
Ch_1_Glob:           A
        should be:   A
Ch_2_Glob:           B
        should be:   B
Arr_1_Glob[8]:       7
        should be:   7
Arr_2_Glob[8][7]:    510
        should be:   Number_Of_Runs + 10
Ptr_Glob->
  Ptr_Comp:          -2147459732
        should be:   (implementation-dependent)
  Discr:             0
        should be:   0
  Enum_Comp:         2
        should be:   2
  Int_Comp:          17
        should be:   17
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:   DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
  Ptr_Comp:          -2147459732
        should be:   (implementation-dependent), same as above
  Discr:             0
        should be:   0
  Enum_Comp:         1
        should be:   1
  Int_Comp:          18
        should be:   18
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:   DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc:           5
        should be:   5
Int_2_Loc:           13
        should be:   13
Int_3_Loc:           7
        should be:   7
Enum_Loc:            1
        should be:   1
Str_1_Loc:           DHRYSTONE PROGRAM, 1'ST STRING
        should be:   DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc:           DHRYSTONE PROGRAM, 2'ND STRING
        should be:   DHRYSTONE PROGRAM, 2'ND STRING

Clock cycles=213512
                    DMIPS per Mhz:                              1.33

The benchmark result is 1.33 DMIPS/MHz.
This result is better than with the TCM, which does not make sense to me.
Do you have any idea how I could verify it?
Thanks.
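
For reference, the reported figure can be reproduced from the printed cycle count. This is a minimal sketch, assuming the usual Dhrystone convention of dividing Dhrystones per second by 1757 (the VAX 11/780 reference score); the object and method names are illustrative and not taken from the benchmark sources.

```scala
object DhrystoneScore {
  // Dhrystones per second of the VAX 11/780, the conventional 1 DMIPS reference.
  val VaxDhrystonesPerSecond = 1757.0

  // DMIPS/MHz = (runs / cycles) * 1e6 cycles per second at 1 MHz, divided by 1757.
  def dmipsPerMhz(runs: Int, cycles: Long): Double =
    (runs.toDouble / cycles.toDouble) * 1e6 / VaxDhrystonesPerSecond

  def main(args: Array[String]): Unit = {
    // Values from the log above: 500 runs in 213512 clock cycles.
    println(f"${dmipsPerMhz(500, 213512L)}%.2f DMIPS/MHz")
  }
}
```

With 500 runs in 213512 cycles this gives roughly 1.33, so the printed score is at least consistent with the cycle count.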

Dolu1990 commented on June 20, 2024

Hi,

I looked at the code, and I think I found the reason:

arbitration.haltItself setWhen(stages.dropWhile(_ != execute).tail.map(s => s.arbitration.isValid && s.input(HAS_SIDE_EFFECT)).orR)

Basically, the data cache has the advantage that writes are delayed until the writeback stage, while the tightly coupled dbus has the penalty that writes are scheduled early (in the execute stage), so it must ensure that there is no risk of them being unscheduled by a branch, an exception, or anything else.

So the tightly coupled dbus will sometimes have to wait for the pipeline to empty itself (when doing a store).
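
The quoted condition can be read as a plain Scala predicate. The sketch below is a toy software model, not VexRiscv code: `Stage`, `mustHalt`, and the stage names are hypothetical, and `orR` over the mapped booleans is modeled with `exists`.

```scala
object HaltModel {
  // Hypothetical stand-in for a pipeline stage: only the two flags the condition reads.
  final case class Stage(name: String, isValid: Boolean, hasSideEffect: Boolean)

  // Models stages.dropWhile(_ != execute).tail.map(...).orR:
  // true when any stage after `executeName` holds a valid side-effect instruction.
  // drop(1) is used instead of tail so an unknown stage name yields false
  // rather than throwing on an empty sequence.
  def mustHalt(stages: Seq[Stage], executeName: String): Boolean =
    stages
      .dropWhile(_.name != executeName)
      .drop(1)
      .exists(s => s.isValid && s.hasSideEffect)
}
```

In this model, a valid side-effect instruction sitting in a later stage (for example a hypothetical "memory" stage) makes `mustHalt` return true, while the same instruction in an earlier stage does not.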

piondeno commented on June 20, 2024

Hi,

Thanks for the reply.
I got it.

piondeno commented on June 20, 2024

Hi, @Dolu1990

May I ask one more question?

First, I changed the configuration to replace DivPlugin:

        //new DivPlugin,
        new MulDivIterativePlugin(genMul = false, genDiv = true, mulUnrollFactor = 1, divUnrollFactor = 2, dhrystoneOpt=true),

The benchmark result improves as follows:
1.33 DMIPS/MHz (8 KB cache IBUS, 8 KB cache DBUS) ->
1.38 DMIPS/MHz (8 KB cache IBUS, 8 KB cache DBUS, divUnrollFactor = 2) ->
1.44 DMIPS/MHz (8 KB cache IBUS, 8 KB cache DBUS, divUnrollFactor = 2, dhrystoneOpt=true)
When setting dhrystoneOpt=true, does it really help in real operation?
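
To illustrate why a larger divUnrollFactor can shorten a division, here is a toy cycle-count model of a textbook unsigned restoring divider that resolves `unrollFactor` quotient bits per modeled clock cycle. It is a sketch under stated assumptions, not the MulDivIterativePlugin implementation: the names are hypothetical, and it ignores sign handling, pipeline overhead, and the dhrystoneOpt shortcut.

```scala
object IterativeDivModel {
  // Returns (quotient, remainder, modeled cycles) for an unsigned 32-bit division.
  def divide(numerator: Long, denominator: Long, unrollFactor: Int): (Long, Long, Int) = {
    require(numerator >= 0 && numerator < (1L << 32), "numerator must fit in 32 bits")
    require(denominator > 0 && denominator < (1L << 32), "denominator must be in 1 until 2^32")
    require(unrollFactor > 0 && 32 % unrollFactor == 0, "unrollFactor must divide 32")

    var remainder = 0L
    var quotient = 0L
    var cycles = 0
    var bit = 31
    while (bit >= 0) {
      // One modeled clock cycle resolves unrollFactor quotient bits.
      for (_ <- 0 until unrollFactor) {
        // Shift in the next numerator bit, then try to subtract the denominator.
        remainder = (remainder << 1) | ((numerator >> bit) & 1L)
        quotient <<= 1
        if (remainder >= denominator) {
          remainder -= denominator
          quotient |= 1L
        }
        bit -= 1
      }
      cycles += 1
    }
    (quotient, remainder, cycles)
  }
}
```

In this model, going from an unroll factor of 1 to 2 halves the modeled cycle count from 32 to 16 for the same result, at the cost of a longer combinational path per cycle in real hardware.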

Second, when I set genMul = true and mulUnrollFactor=2 to replace MulPlugin,

        //new MulPlugin,
        //new DivPlugin,
        new MulDivIterativePlugin(genMul = true, genDiv = true, mulUnrollFactor = 2, divUnrollFactor = 2, dhrystoneOpt=true),

The benchmark result decreases to 1.33 DMIPS/MHz.
Although using genMul = true in MulDivIterativePlugin can replace MulPlugin,
the performance is lower than with MulPlugin.
Is that right?

Thanks

Dolu1990 commented on June 20, 2024

> When setting dhrystoneOpt=true, does it really help in real operation?

I would say it is not really useful, as it only works for divisions involving very small numbers.

> But performance is lower than MulPlugin.

Yes, at least in practice on FPGA.
