GithubHelp home page GithubHelp logo

Comments (5)

overdrivenpotato avatar overdrivenpotato commented on June 11, 2024

Thanks for the detailed bug report.

I'm not sure I see the error here. Wolfram alpha is returning [538, 612, 686, 760], which seems to be the exact same result that the same code generates when using M000. The results in result_transposed differ from what wolfram returns.

Is this not the expected behaviour?

from rust-psp.

gardell avatar gardell commented on June 11, 2024

So I should clarify, the

assert_eq!((result.x, result.y, result.z, result.w), (538.0, 612.0, 686.0, 760.0));
assert_eq!((result_transposed.x, result_transposed.y, result_transposed.z, result_transposed.w), (190.0, 486.0, 782.0, 1078.0));

lines fail.

The output from the first assertion is:

panicked at `assertion failed: `(left == right)`
  left: `(190.0, 486.0, 782.0, 1078.0)`,
right: `(538.0, 612.0, 686.0, 760.0)`, src/main.rs:125:5

from rust-psp.

overdrivenpotato avatar overdrivenpotato commented on June 11, 2024

Ah I see. Interestingly, I fed the same assembly through the original GCC toolchain (which was derived from Sony), and there was no difference in the output.

    ScePspFMatrix4 matrix = {
        { 1.0, 2.0, 3.0, 4.0 },
        { 5.0, 6.0, 7.0, 8.0 },
        { 9.0, 10.0, 11.0, 12.0 },
        { 13.0, 14.0, 15.0, 16.0, },
    };

    ScePspFVector4 vector = { 17.0, 18.0, 19.0, 20.0 };

    ScePspFVector4 result;

    __asm__ volatile (
         "lv.q C000, 0  + %1\n"
         "lv.q C010, 16 + %1\n"
         "lv.q C020, 32 + %1\n"
         "lv.q C030, 48 + %1\n"
         "lv.q C100, %2\n"
         "vtfm4.q C110, M000, C100\n"
         "sv.q C110, %0\n"
         : "=m"(result) : "m"(matrix), "m"(vector) : "memory"
    );

As it turns out, it seems that the transposed registers E___ are designed for column-mode operations. So the assembler generation is actually correct, you just have to use E000 because you are doing a column operation. M000 in this case is actually the transposed matrix.

It may seem counter-intuitive because you are using vtfm4.q with M000 * C100, but this is just a notational trick; you can just as easily do M000 * R100 if you stored vector into R100. The assembler doesn't care, the instruction always assumes the vector operand is a column vector, and thus the matrix operand can be transposed either way.

I wasn't aware of this myself. Should probably add more tests...


BTW: your snippet has some undefined behavior (taking a &mut reference on uninitialized data):

let mut result: psp::sys::ScePspFVector4 = unsafe { core::mem::MaybeUninit::uninit().assume_init() };

You can instead do:

let mut result = core::mem::MaybeUninit::uninit(); // No unsafe necessary

Then you can write into the result with result = in(reg) (result.as_mut_ptr()), and finally .assume_init() after the assembly executes.

from rust-psp.

gardell avatar gardell commented on June 11, 2024

Woah that's so confusing! Thanks for taking the time to look into this though!

But this should mean that the documentation at http://hitmen.c02.at/files/yapspd/psp_doc/chap4.html saying that E000 is the transpose matrix is wrong? I can't see how this wouldn't apply to the general case?

Thanks for the MaybeUninit thing, I haven't used it before.

from rust-psp.

overdrivenpotato avatar overdrivenpotato commented on June 11, 2024

Indeed the description is off. It would perhaps be better to say that the E___ registers represent column-major matrices, while M___ registers represent row-major matrices.

It would then make sense that choosing e.g. M000 in vtfm4.q is actually the transpose of the data, because matrix is accessed by E000 as a column-major matrix. (Accessing a column-major matrix in row-major order is a transposition).

Operating under this definition, sv.q / lv.q therefore must work with row-major matrices, which is why you use M___ when loading/storing vectors. The shorthands C___ and R___ are just notation that you use when working with a row-major matrix. R___ is packed tightly, but C___ has a stride. The rule of thumb then should be do everything in row-major order until you need to access it differently in an instruction like vtfm4.q.

This is why using M___ like normal for loading/storing works in your example, but we have to use E___ when actually manipulating the data, to make sure it is interpreted in column-major format. For what it's worth, the only instructions that this matters for in the assembler are vtfm_ and vhtfm_, because nearly every single other operation isn't sensitive to the matrix format (vmidt, vmzero, vmmov).

The one big exception to this rule would be matrix multiplication. If we want to multiply 2 matrices "normally" with column-major order, according to our definition above we would expect to do vmmul.q E000, E100, E200, but instead we do:

; Column-major multiplication
;
; M000 = M100 * M200
vmmul.q M000, M100, M200

The assembler actually flips the M/E bit to make this happen. I'm not sure why Sony did this. I assume it's because logically applying E___ here would be too confusing for unfamiliar devs writing this common operation, so they just special-cased it. Unfortunately this just made things more complex. Here, the original definition of "transposed matrix" from hitmen.c02.at seems to apply.

EDIT: To make this more strange and subtle, only the M/E bit of the middle register is flipped, but the multiplication order of the two matrices is also flipped while interpreting the instruction. This combination of changes still has the same effect as transposing all registers as described above, so we can pretend that M___ gives us column-major multiplication.

I guess it somewhat means that each instruction has its own definition of "normal" vs "transposed", but if you use it with this mental model I think it should work fine. There may be more exceptions to this rule, but after playing around and examining the opcode tables, this seems to work as expected.

from rust-psp.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.