It appears that Parmys is unable to properly handle multipliers with unequal port widths.


Parmys fails to properly handle multipliers with unequal input widths (vtr-verilog-to-routing, open issue)

WhiteNinjaZ commented on June 30, 2024

Parmys fails to properly handle multipliers with unequal input widths

from vtr-verilog-to-routing.

Comments (7)

amirarjmand93 commented on June 30, 2024

Issues with Smaller Multipliers:

I've made some progress in handling multiplier input sizes smaller than the "max hard block bit width" (25) and larger than the "min hard block bit width" (18). The changes are in the pad_multiplier function.
It now passes a variety of multipliers of arbitrary input size in the design file (diffeq2.v), and the results look OK, but I'm not sure the pin mapping and node connections are correct.
Please take a look, @WhiteNinjaZ.
multiplier_v1.txt


vaughnbetz commented on June 30, 2024

@amirarjmand93


WhiteNinjaZ commented on June 30, 2024

Here is a little more detail on what I have found:
Using Valgrind and several print statements, I traced the issue to the split_multiplier() function in multiplier.cc. From what I can tell, the problem appears when this function tries to map a 32x32 multiply from diffeq2.v onto the 18x25 multiplier. In this scenario, node->num_output_pins = 32 + 32 = 64, b0 = 25, and addbig->num_output_pins = addsmall->num_output_pins = a1b0->num_output_pins + 1 = (32 - 18) + 25 + 1 = 40. The final for loop, which remaps the pins to the new node, accesses elements b0-1 through b0-1+addbig->num_output_pins. Since b0-1+addbig->num_output_pins = 64 > node->num_output_pins - 1 = 63, we exceed the array bounds by one.

If I did things right, the root of the problem is the assumption (stated in the header comment for split_multiplier) that the multiplication can be balanced to remove an addition. That assumption only holds if the target hard multiplier has equal input port widths. After removing it, by adding an if statement and an extra multiplier when the ports are unequal, I was able to compile diffeq2.v successfully. However, I may have connected the nodes incorrectly, since some of the larger benchmarks (e.g. mcml.v) fail the Parmys pass. @amirarjmand93, if you could take a look, that would be great.
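The index arithmetic above can be checked with a few lines of Python. This is a minimal sketch that only re-creates the pin counts quoted in the comment; the helper names (split_widths, last_index_accessed) are hypothetical, and reading the 18 in (32 - 18) as the width of the low part of input a is an assumption.

```python
# Hypothetical re-creation of the pin counts quoted above for split_multiplier();
# the names mirror the comment, not the actual multiplier.cc code.


def split_widths(a_width, b_width, a0, b0):
    """Output-pin counts for one split of an a_width x b_width multiply.

    a0 and b0 are the widths of the low parts of inputs a and b (assumption).
    """
    node_out = a_width + b_width   # node->num_output_pins: full product width
    a1 = a_width - a0              # width of the high part of a
    a1b0_out = a1 + b0             # a1b0->num_output_pins
    add_out = a1b0_out + 1         # addbig/addsmall: one extra carry bit
    return node_out, add_out


def last_index_accessed(b0, add_out):
    # The remapping loop touches elements b0-1 .. b0-1+add_out (inclusive).
    return b0 - 1 + add_out


node_out, add_out = split_widths(32, 32, a0=18, b0=25)
last = last_index_accessed(25, add_out)
print(node_out, add_out, last)               # 64 40 64
print("overrun:", last - (node_out - 1))     # overrun: 1
```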

Here are my changes to the multiplier.cc file. Almost all of them are in the split_multiplier function:
multiplier.txt


amirarjmand93 commented on June 30, 2024

I am approaching the split_multiplier function the same way you are. I have also found that if the bit width of the multiplier in the design (Verilog file) is lower than the minimum bit width of the hard block (architecture file), synthesis is correct. For example, mapping a 16x16 multiply (modified diffeq2.v) onto an 18x25 multiplier (architecture file) works. However, the error appears with larger sizes such as 20x20 or 32x32 (modified diffeq2.v) on the same 18x25 multiplier.
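One way to read this observation: a multiply can go straight onto the hard block only when each operand fits a port. Below is a minimal sketch with a hypothetical fits_hard_block helper, assuming the 18x25 block accepts either operand on either port (an assumption; the architecture file defines the actual port mapping).

```python
def fits_hard_block(a_width, b_width, port_x=18, port_y=25):
    """True if an a_width x b_width multiply fits one port_x x port_y multiplier
    in either operand orientation, i.e. it needs no splitting."""
    return ((a_width <= port_x and b_width <= port_y)
            or (a_width <= port_y and b_width <= port_x))


for n in (16, 20, 32):
    print(f"{n}x{n}:", "fits" if fits_hard_block(n, n) else "needs splitting")
# 16x16: fits
# 20x20: needs splitting
# 32x32: needs splitting
```

Under this rule, 16x16 maps directly while 20x20 and 32x32 do not, which lines up with where the errors appear.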

My other concern is the modified architecture file. I tried various modifications to test_multiplier_size.xml but never got an error related to the modified architecture. Also, is the modified version of 'k6_frac_N10_frac_chain_mem32K_40nm.xml' valid?


amirarjmand93 commented on June 30, 2024

As a quick update, I want to clarify some parts of the issue:

The core of the problem lies in iterate_multipliers(netlist_t *netlist) in the multiplier.cc file.

Arch file: test_multiplier_size.xml (fixed 25x18)

Design file: diffeq2.v (modified)

Success with Large Multipliers:

I checked multiplier.txt. I think we have been pursuing the same idea: replacing the "concatenating" approach ((a1 * b1) . (a0 * b0)) with "addsmall2" ((a1 * b1) + (a0 * b0)). With this change, calling split_multiplier(node, a0, b0, a1, b1, netlist) can handle multiplicands wider than the max hard block bit width (25). Good job.
So 26x26, 27x27, ..., 35x35 multipliers now pass. (35x35 is the maximum allowed; see "Issues with Handling Very Large Multipliers" below.)
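The arithmetic behind both variants can be checked independently of the netlist code. This sketch works on plain Python integers rather than Parmys nodes, and split_product is a hypothetical name. It splits a at bit ka and b at bit kb: (a1 * b1) and (a0 * b0) never overlap, so they can be concatenated, but the two cross terms land at different offsets whenever ka != kb. That is one plausible reading of why a balanced split can merge the cross terms into a single addition while an unequal split cannot.

```python
import random


def split_product(a, b, ka, kb):
    """Rebuild a * b from four partial products after splitting a at bit ka
    and b at bit kb: a = a1 * 2**ka + a0, b = b1 * 2**kb + b0."""
    a1, a0 = a >> ka, a & ((1 << ka) - 1)
    b1, b0 = b >> kb, b & ((1 << kb) - 1)
    # a0 * b0 < 2**(ka + kb), so it fits entirely below (a1 * b1) << (ka + kb):
    # the two products can simply be concatenated (OR-ed together).
    outer = ((a1 * b1) << (ka + kb)) | (a0 * b0)
    # The cross terms sit at offsets ka and kb; they only align when ka == kb.
    cross = ((a1 * b0) << ka) + ((a0 * b1) << kb)
    return outer + cross


random.seed(0)
for _ in range(1000):
    a, b = random.getrandbits(32), random.getrandbits(32)
    assert split_product(a, b, ka=18, kb=25) == a * b
print("ok")
```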

Issues with Smaller Multipliers:

The problem still persists for multiplier bit widths below 25 (the max hard block bit width). Designs whose multipliers are narrower than 25 bits, such as 24x24, 23x23, ..., 18x18, cannot be mapped correctly onto the 25x18 hard block. The error stems from the call to pad_multiplier(node, netlist);. Inside this function, a variable named "diffb" becomes negative, and oassert terminates the program with an error.
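A minimal sketch of how such a padding step can go negative. pad_widths is hypothetical and only mirrors the description above: it assumes input a is padded to the 25-bit port and input b to the 18-bit port, and it raises ValueError where the real code hits oassert.

```python
def pad_widths(a_width, b_width, port_a=25, port_b=18):
    """Number of zero bits to add to each operand so it matches the hard
    block ports (hypothetical helper, not the Parmys pad_multiplier)."""
    diffa = port_a - a_width
    diffb = port_b - b_width
    # A negative difference means the operand is wider than its port, so it
    # cannot be padded; the multiply would need splitting instead.
    if diffa < 0 or diffb < 0:
        raise ValueError(f"cannot pad: diffa={diffa}, diffb={diffb}")
    return diffa, diffb


print(pad_widths(16, 16))      # (9, 2)
try:
    pad_widths(24, 24)
except ValueError as err:
    print(err)                 # cannot pad: diffa=1, diffb=-6
```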

Issues with Handling Very Large Multipliers:

I think mcml.v contains a 64x64 multiplication. A multiply this wide cannot be mapped correctly onto a 25x18 hard block. I think the reason is that the splitter splits the inputs only once, so the split multiplicands (inputs) must be less than or equal to the min hard block bit width (18). That would only change if the splitter recursively broke the new multiplicands down again until they fit in the hard block (a divide-and-conquer algorithm).
So the following must hold: min hard block bit width > multiplier bit width / 2.
(Here, for mcml.v and the 25x18 architecture, the min hard block bit width is 18 and multiplier bit width / 2 is 32, so the inequality fails. This is also why 35x35 is the largest input multiplier allowed.)
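The recursive (divide-and-conquer) splitter suggested above can be sketched on plain integers. This is an illustration under stated assumptions, not Parmys code: tiled_multiply is a hypothetical name, each oversized operand is simply halved (which does not make full use of the 25-bit port), and the block is assumed to accept either operand on either port. Because it keeps splitting until every piece fits, it has no single-split limit such as 35x35.

```python
import random


def tiled_multiply(a, a_w, b, b_w, port_x=18, port_y=25, counter=None):
    """Compute a * b (a has a_w bits, b has b_w bits) using only products that
    fit a port_x x port_y hard multiplier. counter["blocks"] counts them."""
    if counter is None:
        counter = {"blocks": 0}
    fits = (a_w <= port_x and b_w <= port_y) or (a_w <= port_y and b_w <= port_x)
    if fits:
        counter["blocks"] += 1           # one hard multiplier instance
        return a * b
    if a_w >= b_w:
        # Split a: a = hi * 2**k + lo, then recurse on both halves.
        k = a_w // 2
        hi, lo = a >> k, a & ((1 << k) - 1)
        return ((tiled_multiply(hi, a_w - k, b, b_w, port_x, port_y, counter) << k)
                + tiled_multiply(lo, k, b, b_w, port_x, port_y, counter))
    # Otherwise split b the same way.
    k = b_w // 2
    hi, lo = b >> k, b & ((1 << k) - 1)
    return ((tiled_multiply(a, a_w, hi, b_w - k, port_x, port_y, counter) << k)
            + tiled_multiply(a, a_w, lo, k, port_x, port_y, counter))


random.seed(1)
counter = {"blocks": 0}
a, b = random.getrandbits(64), random.getrandbits(64)
assert tiled_multiply(a, 64, b, 64, counter=counter) == a * b
print(counter["blocks"])   # 16
```

Halving keeps the sketch short; a real mapper would likely choose split points that match the 18- and 25-bit ports to use fewer blocks.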


WhiteNinjaZ commented on June 30, 2024

I have run the changes through several different hard multiplier sizes with unequal input widths, and every one I tried worked on diffeq2.v. Nice work! As you mentioned, mcml is still broken because of the 64x64 multiplier. Looking at the netlist Parmys generates, it also looks like a few of the multipliers' input pins from the split multiplier a/b functions are completely unconnected. I am currently looking into this and will let you know what I find.
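A check for this symptom can be sketched over a toy netlist. The Node class and unconnected_inputs helper below are hypothetical and far simpler than Parmys's real netlist structures; each input pin just records the name of the net driving it, or None if nothing drives it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy netlist node: input_pins[i] is the driving net name, or None."""
    name: str
    input_pins: list = field(default_factory=list)


def unconnected_inputs(nodes):
    """Return (node name, pin index) for every input pin with no driver."""
    return [(node.name, i)
            for node in nodes
            for i, net in enumerate(node.input_pins)
            if net is None]


netlist = [
    Node("mult_a0b0", ["a[0]", "a[1]", "b[0]", None]),
    Node("mult_a1b1", ["a[2]", "a[3]", "b[1]", "b[2]"]),
]
print(unconnected_inputs(netlist))   # [('mult_a0b0', 3)]
```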


amirarjmand93 commented on June 30, 2024

Thank you, Joshua.
I have some suggestions that may be helpful.
Please ignore my changes to the padding function and work from your code. Next, test Verilog multiplier designs with widths below 18 and above 25, keeping to the upper bound (no more than 35). In other words, avoid the middle range (18 to 25) of multiplier sizes. Keep the architecture file intact (18x25), and check the netlist connection status.
(I think mcml works on the baseline architecture (k6...) because of its single 36x36 and dual 18x18 multiplier hard blocks, which satisfy the inequality mentioned above. Maybe!)
