Hello! First of all, great work on this awesome library! Really easy to use and get st

So I should clarify, the <div class="highlight highlight-source-rust notranslate p

Matrices are transposed about rust-psp HOT 5 CLOSED

overdrivenpotato commented on June 11, 2024

Matrices are transposed

from rust-psp.

Comments (5)

overdrivenpotato commented on June 11, 2024

Thanks for the detailed bug report.

I'm not sure I see the error here. Wolfram alpha is returning [538, 612, 686, 760], which seems to be the exact same result that the same code generates when using M000. The results in result_transposed differ from what wolfram returns.

Is this not the expected behaviour?

from rust-psp.

gardell commented on June 11, 2024

So I should clarify, the

assert_eq!((result.x, result.y, result.z, result.w), (538.0, 612.0, 686.0, 760.0));
assert_eq!((result_transposed.x, result_transposed.y, result_transposed.z, result_transposed.w), (190.0, 486.0, 782.0, 1078.0));

lines fail.

The output from the first assertion is:

panicked at `assertion failed: `(left == right)`
  left: `(190.0, 486.0, 782.0, 1078.0)`,
right: `(538.0, 612.0, 686.0, 760.0)`, src/main.rs:125:5

from rust-psp.

overdrivenpotato commented on June 11, 2024

Ah I see. Interestingly, I fed the same assembly through the original GCC toolchain (which was derived from Sony), and there was no difference in the output.

    ScePspFMatrix4 matrix = {
        { 1.0, 2.0, 3.0, 4.0 },
        { 5.0, 6.0, 7.0, 8.0 },
        { 9.0, 10.0, 11.0, 12.0 },
        { 13.0, 14.0, 15.0, 16.0, },
    };

    ScePspFVector4 vector = { 17.0, 18.0, 19.0, 20.0 };

    ScePspFVector4 result;

    __asm__ volatile (
         "lv.q C000, 0  + %1\n"
         "lv.q C010, 16 + %1\n"
         "lv.q C020, 32 + %1\n"
         "lv.q C030, 48 + %1\n"
         "lv.q C100, %2\n"
         "vtfm4.q C110, M000, C100\n"
         "sv.q C110, %0\n"
         : "=m"(result) : "m"(matrix), "m"(vector) : "memory"
    );

As it turns out, it seems that the transposed registers E___ are designed for column-mode operations. So the assembler generation is actually correct, you just have to use E000 because you are doing a column operation. M000 in this case is actually the transposed matrix.

It may seem counter-intuitive because you are using vtfm4.q with M000 * C100, but this is just a notational trick; you can just as easily do M000 * R100 if you stored vector into R100. The assembler doesn't care, the instruction always assumes the vector operand is a column vector, and thus the matrix operand can be transposed either way.

I wasn't aware of this myself. Should probably add more tests...

BTW: your snippet has some undefined behavior (taking a &mut reference on uninitialized data):

let mut result: psp::sys::ScePspFVector4 = unsafe { core::mem::MaybeUninit::uninit().assume_init() };

You can instead do:

let mut result = core::mem::MaybeUninit::uninit(); // No unsafe necessary

Then you can write into the result with result = in(reg) (result.as_mut_ptr()), and finally .assume_init() after the assembly executes.

from rust-psp.

gardell commented on June 11, 2024

Woah that's so confusing! Thanks for taking the time to look into this though!

But this should mean that the documentation at http://hitmen.c02.at/files/yapspd/psp_doc/chap4.html saying that E000 is the transpose matrix is wrong? I can't see how this wouldn't apply to the general case?

Thanks for the MaybeUninit thing, I haven't used it before.

from rust-psp.

overdrivenpotato commented on June 11, 2024

Indeed the description is off. It would perhaps be better to say that the E___ registers represent column-major matrices, while M___ registers represent row-major matrices.

It would then make sense that choosing e.g. M000 in vtfm4.q is actually the transpose of the data, because matrix is accessed by E000 as a column-major matrix. (Accessing a column-major matrix in row-major order is a transposition).

Operating under this definition, sv.q / lv.q therefore must work with row-major matrices, which is why you use M___ when loading/storing vectors. The shorthands C___ and R___ are just notation that you use when working with a row-major matrix. R___ is packed tightly, but C___ has a stride. The rule of thumb then should be do everything in row-major order until you need to access it differently in an instruction like vtfm4.q.

This is why using M___ like normal for loading/storing works in your example, but we have to use E___ when actually manipulating the data, to make sure it is interpreted in column-major format. For what it's worth, the only instructions that this matters for in the assembler are vtfm_ and vhtfm_, because nearly every single other operation isn't sensitive to the matrix format (vmidt, vmzero, vmmov).

The one big exception to this rule would be matrix multiplication. If we want to multiply 2 matrices "normally" with column-major order, according to our definition above we would expect to do vmmul.q E000, E100, E200, but instead we do:

; Column-major multiplication
;
; M000 = M100 * M200
vmmul.q M000, M100, M200

The assembler actually flips the M/E bit to make this happen. I'm not sure why Sony did this. I assume it's because logically applying E___ here would be too confusing for unfamiliar devs writing this common operation, so they just special-cased it. Unfortunately this just made things more complex. Here, the original definition of "transposed matrix" from hitmen.c02.at seems to apply.

EDIT: To make this more strange and subtle, only the M/E bit of the middle register is flipped, but the multiplication order of the two matrices is also flipped while interpreting the instruction. This combination of changes still has the same effect as transposing all registers as described above, so we can pretend that M___ gives us column-major multiplication.

I guess it somewhat means that each instruction has its own definition of "normal" vs "transposed", but if you use it with this mental model I think it should work fine. There may be more exceptions to this rule, but after playing around and examining the opcode tables, this seems to work as expected.

from rust-psp.

Matrices are transposed about rust-psp HOT 5 CLOSED

Comments (5)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs