Comments (6)
Hi,
As far as I understand, using a Vec means using more LUTs, because all the signals have to be synthesized.
Yes, right, that's not the way to go.
Using the AXI Stream is indeed a bit slower, but it uses fewer LUTs, am I right?
Yes, serializing things is a better tradeoff I think, then eventually buffering things into a Mem (RAM) for later reuse.
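As a rough software analogy only (this is C, not SpinalHDL, and fifo_t, fifo_push and fifo_pop are hypothetical names), serializing words through a stream and buffering them in a Mem behaves like a ring buffer. One word enters or leaves per handshake, and the storage is a single memory array rather than many parallel registers. A push returning false plays the role of ready being low, and a pop returning false plays the role of valid being low.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software analogy of a FIFO: a fixed-size ring buffer held in one array.
 * FIFO_DEPTH must be a power of two so the indices can wrap with a mask. */
#define FIFO_DEPTH 8u

typedef struct {
    uint32_t mem[FIFO_DEPTH]; /* the "Mem" holding the buffered words */
    uint32_t head;            /* free-running read index */
    uint32_t tail;            /* free-running write index */
} fifo_t;

/* Push side: returns false ("not ready") when the buffer is full. */
static bool fifo_push(fifo_t *f, uint32_t word) {
    if (f->tail - f->head == FIFO_DEPTH) return false; /* unsigned wrap is safe */
    f->mem[f->tail & (FIFO_DEPTH - 1u)] = word;
    f->tail++;
    return true;
}

/* Pop side: returns false ("not valid") when the buffer is empty. */
static bool fifo_pop(fifo_t *f, uint32_t *word) {
    if (f->head == f->tail) return false;
    *word = f->mem[f->head & (FIFO_DEPTH - 1u)];
    f->head++;
    return true;
}
```

Keeping the depth a power of two lets the read and write indices run freely and wrap with a mask instead of a modulo.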
from vexriscv.
Thank you @Dolu1990 ,
So, moving to streaming mode, I'm using code taken from here: #53
I'm trying, without success, to make it work with readStreamNonBlocking. I need to investigate the valid and ready signals further.
I'll keep you updated. Any further help is appreciated.
Apb3Axis is connected in the main SoC Scala code; the Apb3Axis class looks like this:
package lk.lib
import spinal.core._
import spinal.lib._
import spinal.lib.bus.amba3.apb.{Apb3, Apb3Config, Apb3SlaveFactory}
case class Apb3Axis(apb3Config: Apb3Config) extends Component {
val io = new Bundle {
val apb = slave(Apb3(apb3Config))
val input = slave(Stream(Bits(32 bits)))
val output = master(Stream(Bits(32 bits)))
}
val ctrl = Apb3SlaveFactory(io.apb)
// Input stream read via readStreamNonBlocking: this does NOT work, whereas the commented-out StreamFifo code below does
ctrl.readStreamNonBlocking(io.input.queue(128), address = 0)
//val ififo = StreamFifo(dataType = Bits(32 bits), depth = 128)
//ififo.io.push << io.input
//ctrl.read(ififo.io.pop.payload, address = 0);
//val ififoPopReady = ctrl.drive(ififo.io.pop.ready, address = 4)
//ctrl.read(ififo.io.pop.valid, address = 8);
//when(ififo.io.pop.valid) { ififoPopReady := False }
// Debug: how many 32-bit bus words the valid bit + payload occupy starting at address 0
val wordCount = (1 + widthOf(io.input.payload) - 1) / 32 + 1
val wordAddressInc = 32 / 8
val addressHigh = 0 + (wordCount - 1) * wordAddressInc
SpinalInfo("Wordcount: " + wordCount)
SpinalInfo("addressHigh: " + addressHigh)
// Output stream goes through a StreamFifo; this still has to be converted to createAndDriveFlow
val ofifo = StreamFifo(dataType = Bits(32 bits), depth = 128)
ofifo.io.pop >> io.output
ctrl.drive(ofifo.io.push.payload, address = 12)
val ofifoPushValid = ctrl.drive(ofifo.io.push.valid, address = 16)
ctrl.read(ofifo.io.push.ready, address = 20)
when(ofifo.io.push.ready) { ofifoPushValid := False }
//val writeFlow = ctrl.createAndDriveFlow(Bits(32 bits), address = 0)
//writeFlow.toStream.stage() >> ofifo.io.push
}
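The wordCount and addressHigh lines above recompute by hand how much of the register map the non-blocking read occupies. The same arithmetic in C (word_count and address_high are hypothetical helper names) shows that a 32-bit payload plus its valid bit needs 33 bits, i.e. two 32-bit words at addresses 0 and 4, whereas a 31-bit payload plus the valid bit fits in a single word. This is only a size check: it assumes the formula copied into the snippet matches what the library does, and it says nothing about which word holds the valid bit.

```c
/* Same expression as the Scala wordCount line: one valid bit plus
 * payload_bits, split into bus_bits-wide words.
 * (1 + w - 1) / bus + 1 equals floor(w / bus) + 1, i.e. ceil((w + 1) / bus). */
static unsigned word_count(unsigned payload_bits, unsigned bus_bits) {
    return (1u + payload_bits - 1u) / bus_bits + 1u;
}

/* Byte address of the last word used, starting at `base`
 * (bus_bits / 8 bytes per word, i.e. 4 for a 32-bit bus). */
static unsigned address_high(unsigned base, unsigned words, unsigned bus_bits) {
    return base + (words - 1u) * (bus_bits / 8u);
}
```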
The main C sample code is below:
#include <stdint.h>
typedef struct
{
volatile uint32_t IN_DATA;   // 0x00
volatile uint32_t IN_READY;  // 0x04
volatile uint32_t IN_VALID;  // 0x08
volatile uint32_t OUT_DATA;  // 0x0C
volatile uint32_t OUT_VALID; // 0x10
volatile uint32_t OUT_READY; // 0x14
} AXIS_Reg;
#define AXIS ((AXIS_Reg *)(0xF0060000))
// inside the main function
while (1)
{
while (AXIS->IN_VALID == 0)
{
asm volatile("");
}
AXIS->OUT_DATA = 3 + AXIS->IN_DATA;
AXIS->OUT_VALID = 0xFFFF;
while (AXIS->OUT_VALID != 0)
{
asm volatile("");
}
AXIS->IN_READY = 0xFFFF;
while (AXIS->IN_READY != 0)
{
asm volatile("");
}
}
Then the synthesized Verilog top module.
I'm watching axis_input_payload and axis_output_payload in the analyzer.
They work (the output is the input + 3 on each tick) when the input stream is implemented with the StreamFifo (the commented-out code in Apb3Axis), but not when using readStreamNonBlocking.
//tick_1s_tick ticks every 1 second
reg axis_input_valid;
wire axis_input_ready;
reg [31:0] axis_input_payload;
wire axis_output_valid;
reg axis_output_ready;
wire [31:0] axis_output_payload;
always @(posedge clk)
begin
if(tick_1s_tick)
begin
axis_input_valid <= 1'b1;
axis_input_payload <= samplePayload;
end
else
begin
axis_input_valid <= 1'b0;
end
if(axis_output_valid)
begin
axis_output_ready <= 1'b1;
end
else
begin
axis_output_ready <= 1'b0;
end
end
Soc Soc_inst(
// all the other signals
.io_axis_input_valid(axis_input_valid),
.io_axis_input_ready(axis_input_ready),
.io_axis_input_payload(axis_input_payload),
.io_axis_output_valid(axis_output_valid),
.io_axis_output_ready(axis_output_ready),
.io_axis_output_payload(axis_output_payload)
);
I was able to make it work, but I have a performance problem.
First, here is how I made it work using readStreamNonBlocking and createAndDriveFlow with a 31-bit payload.
Apb3Axis now looks like this:
case class Apb3Axis(apb3Config: Apb3Config) extends Component {
val io = new Bundle {
val apb = slave(Apb3(apb3Config))
val input = slave(Stream(Bits(31 bits)))
val output = master(Stream(Bits(31 bits)))
}
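val busCtrl = Apb3SlaveFactory(io.apb)
// Incoming stream buffered in a 128-deep queue
val ioinputqueue = io.input.queueLowLatency(128)
// Address 0: valid flag in bit 31, 31-bit payload in bits 30..0
busCtrl.readStreamNonBlocking(
ioinputqueue,
address = 0,
validBitOffset = 31,
payloadBitOffset = 0
)
// Outgoing stream: writes to address 4 go through a Flow, a stage and a FIFO to io.output
val ofifo = StreamFifoLowLatency(dataType = Bits(31 bits), depth = 128)
ofifo.io.pop >> io.output
val writeFlow = busCtrl.createAndDriveFlow(Bits(31 bits), address = 4)
writeFlow.toStream.stage() >> ofifo.io.push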
}
Software side (roughly as below). Note that I'm using a GPIO output to check the timing on the analyzer:
#include <stdint.h>
typedef struct
{
volatile uint32_t IN_PAYLOAD;
volatile uint32_t OUT_PAYLOAD;
} AXIS_Reg;
#define AXIS ((AXIS_Reg *)(0xF0060000))
#define IN_PAYLOAD_VALID_MASK 0x80000000
#define IN_PAYLOAD_VALID_SHIFT 31
#define IN_PAYLOAD_DATA_MASK 0x7FFFFFFF
#define IN_PAYLOAD_DATA_SHIFT 0
//
// in main function, main while loop
//
while (1)
{
uint32_t payload = AXIS->IN_PAYLOAD;
if ((payload & IN_PAYLOAD_VALID_MASK) >> IN_PAYLOAD_VALID_SHIFT == 1) {
uint32_t data = (payload & IN_PAYLOAD_DATA_MASK) >> IN_PAYLOAD_DATA_SHIFT;
if (data == 10) {
gpioA_setOutputBit(0);
gpioA_clearOutputBit(0);
}
// AXIS->OUT_PAYLOAD = data;
}
}
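A minimal sketch, assuming (as the non-blocking read is used here) that every read of IN_PAYLOAD pops one queued element when bit 31 is set: the loop copies everything currently queued into a buffer in one pass and stops at the first read with valid cleared. axis_decode and axis_drain are hypothetical names, and the masks repeat the ones defined above. It does not by itself make each bus read cheaper; it only groups the reads.

```c
#include <stdbool.h>
#include <stdint.h>

#define IN_PAYLOAD_VALID_MASK 0x80000000u
#define IN_PAYLOAD_DATA_MASK  0x7FFFFFFFu

/* Split one raw word from the input register into the valid flag (bit 31)
 * and the 31-bit payload (bits 30..0). */
static bool axis_decode(uint32_t raw, uint32_t *data) {
    *data = raw & IN_PAYLOAD_DATA_MASK;
    return (raw & IN_PAYLOAD_VALID_MASK) != 0u;
}

/* Read up to max_items elements in one go.
 * Assumption: each read pops one element when valid is set, so the first
 * read with valid == 0 means the queue is currently empty. */
static unsigned axis_drain(volatile const uint32_t *in_reg,
                           uint32_t *out, unsigned max_items) {
    unsigned n = 0;
    while (n < max_items) {
        uint32_t data;
        if (!axis_decode(*in_reg, &data)) break; /* queue empty */
        out[n++] = data;
    }
    return n;
}
```

On the SoC it would be called with something like `uint32_t buf[32]; unsigned n = axis_drain(&AXIS->IN_PAYLOAD, buf, 32);`.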
Verilog side
reg axis_input_valid;
wire axis_input_ready;
reg [30:0] axis_input_payload;
wire axis_output_valid;
reg axis_output_ready;
wire [30:0] axis_output_payload;
reg [30:0] axis_output_payload_reg;
reg [30:0] signaldata_reg;
reg [30:0] signalretdata_reg;
initial signaldata_reg = 0;
always @(posedge clk)
begin
// start a new burst every second (this will be 10 kHz in the future)
if(tick_1s_tick)
begin
signaldata_reg <= 1;
end
// send 32 payloads, advancing while the SoC input is ready
if(signaldata_reg >= 1 && signaldata_reg <= 32 && axis_input_ready)
begin
signaldata_reg <= signaldata_reg + 1;
axis_input_valid <= 1'b1;
axis_input_payload <= signaldata_reg;
end
else
begin
axis_input_valid <= 1'b0;
end
// always accept output and latch the returned payload
axis_output_ready <= 1'b1;
if(axis_output_valid)
begin
signalretdata_reg <= axis_output_payload;
end
end
Soc Soc_inst(
// all the other signals
.io_axis_input_valid(axis_input_valid),
.io_axis_input_ready(axis_input_ready),
.io_axis_input_payload(axis_input_payload),
.io_axis_output_valid(axis_output_valid),
.io_axis_output_ready(axis_output_ready),
.io_axis_output_payload(axis_output_payload)
);
The code above works with StreamFifo and .queue as well as with the LowLatency variants.
I'm running it on a Briey-based SoC with a 72 MHz main clock on a Tang Primer 20K.
In the future I'm going to use the 31-bit payload to carry a command in the first 7 bits and data in the other 24.
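A sketch of that packing in C, assuming the "first 7 bits" are the top bits 30..24 of the 31-bit payload (the same slicing as the [30-:7] and [23:0] used in the Verilog later in this thread) and that bit 31 of the register word stays reserved for the valid flag. payload_pack, payload_cmd and payload_data are hypothetical helper names.

```c
#include <stdint.h>

/* Assumed layout of the 31-bit payload:
 *   bits 30..24 : 7-bit command
 *   bits 23..0  : 24-bit data
 * Bit 31 of the register word is left for the valid flag. */
#define CMD_SHIFT 24u
#define CMD_MASK  0x7Fu     /* 7 bits  */
#define DATA_MASK 0xFFFFFFu /* 24 bits */

static uint32_t payload_pack(uint32_t cmd, uint32_t data) {
    return ((cmd & CMD_MASK) << CMD_SHIFT) | (data & DATA_MASK);
}

static uint32_t payload_cmd(uint32_t payload) {
    return (payload >> CMD_SHIFT) & CMD_MASK;
}

static uint32_t payload_data(uint32_t payload) {
    return payload & DATA_MASK;
}
```

Masking on both pack and unpack keeps an out-of-range command from spilling into bit 31.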
My problem is performance: using StreamFifo and .queue instead of StreamFifoLowLatency and queueLowLatency makes no difference.
I used the GPIO output to measure how long it takes for a signal to be read (or sent), and reading a signal seems to take many cycles. In the 1024-cycle capture below, the GPIO toggle happens at about cycle 530 for data item number 10. That means sending 32 payloads would take about 1700 cycles, which is too much for my requirements. I'd like to send data at 10 kHz, which at 72 MHz leaves 7200 cycles for the (simple) math in each iteration of my software loop.
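The budget arithmetic above, written out with integer math (the helper names are hypothetical). It takes the 530-cycle figure at face value, although part of it may be the time the CPU needs to reach the loop.

```c
/* Average cost per item, from the cycle at which item `item_index` was seen. */
static unsigned cycles_per_item(unsigned cycle_at_item, unsigned item_index) {
    return cycle_at_item / item_index;              /* 530 / 10 = 53 */
}

/* Total cycles for `items` items at `per_item` cycles each. */
static unsigned long cycles_for(unsigned items, unsigned per_item) {
    return (unsigned long)items * per_item;         /* 32 * 53 = 1696 */
}

/* Cycles available per loop iteration at a given clock and loop rate. */
static unsigned long cycle_budget(unsigned long clk_hz, unsigned long rate_hz) {
    return clk_hz / rate_hz;                        /* 72 MHz / 10 kHz = 7200 */
}
```

Since the measured cost is in cycles, lowering the clock shrinks the budget rather than the cost: at 12 MHz a 10 kHz loop has only 12 000 000 / 10 000 = 1200 cycles.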
Running the SoC at 72 MHz or 12 MHz makes no difference. The FPGA logic, Briey and the buses (AXI + APB3) all run in the same clock domain.
Am I missing something?
Note on the image: here I'm using a payload that carries a command in the first 7 bits and data in the other 24.
Writing data back to the output (AXIS->OUT_PAYLOAD = xxx) makes no difference in timing, so the SoC-to-FPGA direction is fast. It's just the input that is a little too slow for me.
Thanks for your help.
Hi, looking at your simulation, it shows things from time 0, right?
The thing is, the CPU needs a bit of time to reach the while loop.
Thank you @Dolu1990
The analyzer is triggered on the tick_1s_tick edge and shows things from time 0. If you look at signalnum_reg, you'll see the 32 signals loaded into the StreamFifo in 32 cycles (zoomed view below).
I'm testing the InterruptCtrl, but that won't make a difference for now, because the while loop is always reading.
The other image zooms in on the timing of one load (uint32_t payload = AXIS->IN_PAYLOAD;) plus one unload (AXIS->OUT_PAYLOAD = data;).
It takes almost 90 cycles.
Could something involving DMA help?
Sorry for my ignorance, but I've only just entered the FPGA SoC world.
I know I'm asking a lot of this core. My plan is to make this work on VexRiscv (even at a slower speed), because I like this project (portable and customizable); then, when I'm ready, maybe move to a hard core (Xilinx ARM), changing the buses of course.
The Verilog below shows the actual payload content (signal number + integer) and should make it clearer:
always @(posedge clk)
begin
if(tick_1s_tick)
begin
signalnum_reg <= 1;
end
if(signalnum_reg >= 1 && signalnum_reg <= 32)
begin
if(axis_input_ready)
begin
signalnum_reg <= signalnum_reg + 1;
signaldata_reg <= signaldata_reg + 1;
end
axis_input_valid <= 1'b1;
axis_input_payload <= {signalnum_reg, signaldata_reg};
end
else
begin
axis_input_valid <= 1'b0;
end
axis_output_ready <= 1'b1;
if(axis_output_valid)
begin
signalretnum_reg <= axis_output_payload[30-:7];
signalretdata_reg <= axis_output_payload[23:0];
end
end
from vexriscv.
Hmm, one thing to be careful about as well: on the first run you will hit i$ and d$ (instruction and data cache) refills, so to take measurements you really have to run the code more than once and then measure the last execution.
Are your pictures from the very first execution?
Was your code compiled with -O3?
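A minimal measurement sketch along those lines: run the code under test several times and keep only the duration of the last run, so the first pass with instruction/data cache refills is not what gets measured. MEASURE_WARM is a hypothetical macro, and `now` stands for whatever free-running cycle counter the SoC exposes (for example a hardware timer); which counter is available depends on the configuration.

```c
#include <stdint.h>

/* Run `body` `runs` times and store the cycle count of the LAST run in
 * `result`, so cold-cache refills on the first pass are not measured.
 * `now` is any expression reading a free-running 32-bit cycle counter
 * (hypothetical here); unsigned subtraction handles counter wrap.
 * `body` must not contain top-level commas (macro argument rules). */
#define MEASURE_WARM(runs, now, body, result)              \
    do {                                                   \
        for (unsigned run_ = 0; run_ < (runs); run_++) {   \
            uint32_t t0_ = (now);                          \
            body;                                          \
            (result) = (uint32_t)((now) - t0_);            \
        }                                                  \
    } while (0)
```

Taking the last of several runs, rather than an average, matches the advice above of measuring only once the caches are warm.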