
yegord commented on August 24, 2024

Conditional calls can be implemented using calls and conditional jumps.
In the basic block obtained by getBasicBlockForInstruction you add a conditional jump to a fresh basic block (not having an address) and to the direct successor.
In the fresh basic block you create a call and an unconditional jump to the direct successor.
For a simple example you can have a look at CMOV implementation for x86.
On ARM all conditional instructions are implemented this way.
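
A minimal sketch of that lowering in Python, using a toy basic-block model rather than Snowman's real classes. `Block`, `lower_conditional_call` and the tuple-based statements are hypothetical names made up for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Block:
    """Toy basic block: an optional address plus a list of statement tuples."""
    address: Optional[int] = None
    statements: List[Tuple] = field(default_factory=list)


def lower_conditional_call(current: Block, successor: Block, cond: str, target: str) -> Block:
    """Model 'call target if cond' with a conditional jump and a plain call."""
    fresh = Block(address=None)  # fresh block, deliberately without an address
    # Current block: go to the fresh block if cond holds, else to the direct successor.
    current.statements.append(("jump_if", cond, fresh, successor))
    # Fresh block: perform the call, then rejoin the direct successor unconditionally.
    fresh.statements.append(("call", target))
    fresh.statements.append(("jump", successor))
    return fresh


current = Block(address=0x1000)
successor = Block(address=0x1004)
fresh = lower_conditional_call(current, successor, "r1 == r2", "callee")
```

The call itself stays unconditional; only the control flow around it carries the condition.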

directSuccessor in InstructionAnalyzers is just conventionally used to denote the basic block starting immediately after the instruction being currently handled, it has no special meaning otherwise. So, I do not understand your last question. Could you elaborate?

from snowman.

nihilus commented on August 24, 2024

For MIPS, delay slots are software-visible details:

  • Bxx : if taken, the next instruction is executed as a delay slot before jumping to the target address. If not taken, the next instruction is just executed normally.
  • BxxL : if taken, the next instruction is executed as a delay slot before jumping to the target address. If not taken, the next instruction is skipped (ignored).

So, as you see, you cannot simply ignore them in the CFG. As an example:

BEQ r1, r2, target // address + 0
ADD r1, r2, r3 // address + 4

    t1 = r1 == r2; // address + 0
    r1 = r2 + r3; // address + 4
    jump t1, target // address + 0
// address + 8

BEQL r1, r2, target // address + 0
ADD r1, r2, r3 // address + 4

    jump r1 != r2, skip // address + 0
    r1 = r2 + r3; // address + 4
    jump true, target // address + 0
skip: // address + 8

See how exactly the delay slot is emulated here?
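
The same two cases as a small executable model, just to make the difference concrete. This is only a sketch: `run_branch` and the register dictionary are made up for illustration, and the delay slot is hard-wired to `ADD r1, r2, r3`.

```python
def run_branch(regs, likely):
    """Execute 'BEQ/BEQL r1, r2, target' followed by 'ADD r1, r2, r3' in the delay slot.

    Returns "target" if the branch is taken, otherwise "fallthrough" (address + 8).
    """
    taken = regs["r1"] == regs["r2"]      # condition is evaluated before the delay slot runs
    if likely and not taken:
        return "fallthrough"              # BEQL not taken: the delay slot is skipped
    regs["r1"] = regs["r2"] + regs["r3"]  # delay slot instruction
    return "target" if taken else "fallthrough"


beq = {"r1": 1, "r2": 5, "r3": 1}
beql = {"r1": 1, "r2": 5, "r3": 1}
beq_result = run_branch(beq, likely=False)    # not taken, ADD still executes
beql_result = run_branch(beql, likely=True)   # not taken, ADD is skipped
```

With equal inputs and an untaken branch, only the plain BEQ variant modifies r1.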

yegord commented on August 24, 2024

I would do the following:

  1. I would generate the code for the instruction immediately following the jump as usual.
  2. I would try to find a jump generated by the jump instruction. You should be able to get it by something like Program::getBasicBlockCovering(instruction->addr() - 1)->getJump() (modulo checks for nullptr).
  3. I would clone the statements generated in step 1 and insert them before that jump using BasicBlock::insertBefore().
  4. I would fix the jump destination: instead of instruction->addr() (or the corresponding basic block pointer) it must become instruction->endAddr() (or the corresponding basic block pointer).

This way, if somebody jumps to the instruction following the jump, it will get to it, and there will be no jump past it.
If somebody executes the jump, the next instruction is executed first and only then the jump; the original copy of the instruction's IR is not executed, due to step 4.

Another issue is that during step 3 you should update the instruction of the cloned statements to be the jump instruction.
Currently this is not possible (an assert will fire); you can remove that assert.
If you do not change the instruction, weird things may happen if there are jumps to that jump: the basic block containing the jump may be split at the wrong place.
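
The four steps above can be sketched on a toy IR, where a block is a list of dictionaries and a jump has `then`/`else` destinations. None of this is Snowman's API: `apply_delay_slot` and the statement layout are assumptions made for illustration, and re-tagging the clones with the jump instruction is left out.

```python
import copy


def apply_delay_slot(jump_block, slot_stmts, slot_addr, slot_end_addr):
    """Fold a delay-slot instruction into the block that ends with the branch.

    jump_block    -- statements generated for the branch; the last one is the jump
    slot_stmts    -- statements generated as usual for the delay-slot instruction (step 1)
    slot_addr     -- address of the delay-slot instruction
    slot_end_addr -- address immediately after the delay-slot instruction
    """
    jump = jump_block[-1]                          # step 2: locate the jump
    if jump["op"] != "jump":
        raise ValueError("block does not end with a jump")
    jump_block[-1:-1] = copy.deepcopy(slot_stmts)  # step 3: clones go right before the jump
    for key in ("then", "else"):                   # step 4: skip the original copy
        if jump[key] == slot_addr:
            jump[key] = slot_end_addr
    return jump_block


# BEQ r1, r2, 0x100 at address 0, with ADD r1, r2, r3 in its delay slot at address 4.
block = [
    {"op": "assign", "dst": "t1", "src": "r1 == r2"},
    {"op": "jump", "cond": "t1", "then": 0x100, "else": 0x4},
]
slot = [{"op": "assign", "dst": "r1", "src": "r2 + r3"}]
apply_delay_slot(block, slot, slot_addr=0x4, slot_end_addr=0x8)
```

The original statements at address 4 stay in place, so a jump that lands directly on the delay-slot instruction still finds it.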

yegord commented on August 24, 2024

By the way, I wonder what should happen if

  1. you jump to the instruction in your delay slot,
  2. the instruction in the delay slot is a jump.

nihilus commented on August 24, 2024

Wow, one could probably get some semi-infinite regress by having lots of delay slots with branches... But in the end the hardware pipelining will break and give some sort of error in the EPC register. AFAIK, the reason for having delay slots was to speed up the CPU pipeline.

nihilus commented on August 24, 2024

One weird thing is this warning '[Warning] Invalid instruction `b 0x4010cc' at 0x40101c: Cannot assign expressions of different sizes: 32 and 1'.

How should I handle the operand(0) in this case?

yegord commented on August 24, 2024

You assign something of size 1 to something of size 32 somewhere.
Can you maybe show what in InstructionAnalyzer causes this exception?

nihilus commented on August 24, 2024

Seems like assigning register $zero to constant(0) is the culprit... :-/

yegord commented on August 24, 2024

That's very weird.
Does removing that assignment fix the exception? :-)

nihilus commented on August 24, 2024

$zero always holds the value 0 in 32 bits... Nope, but resizing it to 31 bits changed the errors :-(

yegord commented on August 24, 2024

Are you sure the problem is at that line?
Maybe you could set a breakpoint and look at the call stack?

nihilus commented on August 24, 2024

I'll try my best 👍

yegord commented on August 24, 2024

And it is better not to call directSuccessor within _[].
The resulting expression might not do what you expect.
For example,

_[
    regizter(MipsRegisters::ra()) ^= constant(instruction->endAddr()),
    directSuccessor(),
    call(operand(0))
];

will most likely create an assignment object, discard it, call directSuccessor, discard the result, create a call object, and generate an ir::Call from it.

Calling directSuccessor to end the current basic block is not necessary: IRGenerator will do it for you later.

nihilus commented on August 24, 2024

Ah, yes... That was a try to do 'branch delay' :-)

nihilus commented on August 24, 2024

Another question: for multiplication I have a register pair 'hi' and 'lo', which I try to make into one 64-bit register called 'hilo'. Using 'hilo' works, but when I try to bit-shift and bit-and out the 'hi' and 'lo' parts, I don't know how to do it correctly:
case MIPS_INS_MULTU: {
    auto operand0 = operand(0);
    auto operand1 = operand(1);
    _[
        regizter(MipsRegisters::hilo()) ^= zero_extend((std::move(operand0) * std::move(operand1)), 64),
        regizter(MipsRegisters::hi()) ^= unsigned_(regizter(MipsRegisters::hilo())) >> unsigned_(constant(32)),
        regizter(MipsRegisters::lo()) ^= unsigned_(regizter(MipsRegisters::hilo())) & unsigned_(constant(0xffffffff))
    ];
    break;
}

This should work, but apparently I need to set the size of the results to 32 for 'hi' and 'lo'. Any clues?

yegord commented on August 24, 2024
  1. I guess, to compute hilo, you need first to zero-extend and then multiply, not vice versa.
  2. To resize, use truncate().
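
A quick numeric check of both points, modelled with plain Python integers and explicit masks rather than Snowman expressions. The helper names (`multu_wrong`, `multu`, `split`) are made up for this sketch.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def multu_wrong(a, b):
    """Multiply in 32 bits, then zero-extend to 64: the high half is already lost."""
    return ((a * b) & MASK32) & MASK64


def multu(a, b):
    """Zero-extend both operands to 64 bits first, then multiply."""
    return ((a & MASK32) * (b & MASK32)) & MASK64


def split(hilo):
    """Truncate the 64-bit product into its 32-bit hi and lo halves."""
    return (hilo >> 32) & MASK32, hilo & MASK32


hi, lo = split(multu(0xFFFFFFFF, 2))                   # hi == 1, lo == 0xFFFFFFFE
wrong_hi, wrong_lo = split(multu_wrong(0xFFFFFFFF, 2)) # wrong_hi == 0
```

The low halves agree in both orders; only the high half depends on extending before multiplying.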

yegord commented on August 24, 2024

Actually, the last two assignments should not be there.
Your code is equivalent to

union hilo_t {
    struct {
        uint32_t lo;
        uint32_t hi;
    };
    uint64_t hilo;
};
hilo_t hilo;
hilo.hilo = a * b;
hilo.hi = hilo.hilo >> 32;
hilo.lo = hilo.hilo & 0xffffffff;

This is obviously incorrect, because the last assignment uses a version of hilo that was already modified by the second assignment. (Upd. Sorry, it is actually correct, because the upper bits of hilo do not influence the value of the expression for lo, but both assignments are no-ops anyway.)

Also, the decompiler knows about aliasing of memory locations, and it will generate correct (but ugly, with typecasts and pointer arithmetic) code for the case when, e.g., one assigns something to hilo and reads something from lo.

nihilus commented on August 24, 2024

It sure works when accessing the 'lo' part but not the higher part of the shared memory.

I will try to figure things out... I've got another problem where I set one bit in a 32-bit register. Different sizes etc. Maybe I need to use constant().

yegord commented on August 24, 2024

> It sure works when accessing the 'lo' part but not the higher part of the shared memory.

If this is the case, I would like to see a bug report with a minimal example.

> I've got another problem where I set one bit in a 32-bit register.

Either you do bit arithmetic, as in C, or you compute the memory location of that bit and assign to it:

auto ml = reg->memoryLocation().shifted(10).resized(1);
_[MemoryLocationExpression(ml) ^= constant(1)];

This will set bit 10 of reg (counting from bit 0) to 1.
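
For comparison, the "bit arithmetic, as in C" route looks like this on a plain integer. It is only a sketch of the arithmetic, not of the IR expressions, and `set_bit` and `clear_bit` are hypothetical helper names.

```python
def set_bit(value, index, width=32):
    """Return value with bit `index` (counting from 0) set, truncated to `width` bits."""
    mask = (1 << width) - 1
    return (value | (1 << index)) & mask


def clear_bit(value, index, width=32):
    """Return value with bit `index` (counting from 0) cleared, truncated to `width` bits."""
    mask = (1 << width) - 1
    return value & ~(1 << index) & mask


reg = set_bit(0, 10)          # 0x400
reg_cleared = clear_bit(reg, 10)
```

Both operands of the OR and AND have the full register width here, which is the property the size-mismatch warnings earlier in the thread were complaining about.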

I am closing the original issue with a won't-fix resolution: the IR language is Turing-complete (modulo finite address space), so it does not need to be extended.

nihilus commented on August 24, 2024

I'd appreciate it if you could write some "pseudo-code" for branch delays, including branch likely. I want to fix this first, before I make a pull request upstream.

yegord commented on August 24, 2024

Any pseudocode I write will surely be wrong, because delay handling involves several cases (jumps, calls, conditional calls), and I cannot get all of them right without testing. I already described the overall idea in #21 (comment). If you have concrete problems with a particular step, I can give you some guidance.

Alternatively, you can clean up everything, push it to me, and I will then add handling of delays.

nihilus commented on August 24, 2024

Thx... I will first do my best, now that I've got the conditional handling correct for move, set bits and branching.

hlide commented on August 24, 2024

@yegord

> By the way, I wonder what should happen if
>
>   1. you jump to the instruction in your delay slot,
>   2. the instruction in the delay slot is a jump.

eheheh

Standard MIPS will say: undefined behavior.

In reality, it can lead to some tricks used by some game developers on consoles (PlayStation 1, 2 and Portable, N64, etc.).

But the behavior is very dependent on the way the hardware is implemented. Such attempts on MIPS32/64 R6 will now raise an illegal instruction exception.

For Allegrex, I can tell what happens:

Allegrex conditional branch instructions (Bx) have 1 delay slot + 2 bubbles:

BNE        [IC][IF][RF][EX][DC][DF][WB]
Delay Slot     [IC][IF][RF][EX][DC][DF][WB]
Bubble Slot        [IC][IF][RF][EX][DC][DF][WB]
Bubble Slot            [IC][IF][RF][EX][DC][DF][WB]
Target Slot                [IC][IF][RF][EX][DC][DF][WB] <--- target instruction of BNE

PC is updated by stage EX for BNE, because EX performs the register computation for the condition.

Allegrex unconditional branch instructions (Jx) have 1 delay slot + 1 bubble.

JR         [IC][IF][RF][EX][DC][DF][WB]
Delay Slot     [IC][IF][RF][EX][DC][DF][WB]
Bubble Slot        [IC][IF][RF][EX][DC][DF][WB]
Target Slot            [IC][IF][RF][EX][DC][DF][WB] <--- target instruction of JR

PC is updated by stage RF for JR.

Since JR is just after BNE in the pipeline, their target instructions will collide in the same slot of the pipeline.

BNE        [IC][IF][RF][EX][DC][DF][WB]
JR             [IC][IF][RF][EX][DC][DF][WB] <--- delay slot!!!
Bubble Slot        [IC][IF][RF][EX][DC][DF][WB] <--- delay slot of JR is turned into bubble by BNE
Bubble Slot            [IC][IF][RF][EX][DC][DF][WB] <--- bubble due to BNE
Target Slot                [IC][IF][RF][EX][DC][DF][WB] <--- target instruction of JR or BNE?

Since stage EX is executed with a slight delay after stage RF in a pipeline, stage EX is the last to update PC, and that may explain the actual result. JR is really executed, but its PC update is overridden by the one from stage EX.

So, to be accurate, we need to check the actual category of the branch instruction in the delay slot (conditional or unconditional) to get the right behavior.

In the case of a Bx followed by a Jx, we can consider handling the Bx as if it had no delay slot. In the case of a Bx following another Bx, their target slots shouldn't collide, so I would expect something like this:

Bxx #1      [IC][IF][RF][EX][DC][DF][WB]
Bxx #2          [IC][IF][RF][EX][DC][DF][WB] <--- delay slot!!!
Bubble Slot         [IC][IF][RF][EX][DC][DF][WB] <--- delay slot of Bxx #2 is turned into bubble by Bxx #1
Bubble Slot             [IC][IF][RF][EX][DC][DF][WB] <--- bubble due to Bxx
Target #1 Slot/Bubble Slot  [IC][IF][RF][EX][DC][DF][WB] <--- target instruction of Bxx #1  is turned into bubble by Bxx #2 
Target #2 Slot                  [IC][IF][RF][EX][DC][DF][WB] <--- target instruction of Bxx #2

In case of a Jxx followed by a Bxx:

Jxx             [IC][IF][RF][EX][DC][DF][WB]
Bxx                 [IC][IF][RF][EX][DC][DF][WB] <--- delay slot!!!
Bubble Slot             [IC][IF][RF][EX][DC][DF][WB] <--- delay slot of Bxx is turned into bubble by Jxx
Target #1 Slot/Bubble Slot  [IC][IF][RF][EX][DC][DF][WB] <--- bubble due to Bxx
Bubble Slot                     [IC][IF][RF][EX][DC][DF][WB] <--- bubble due to Bxx
Target #2 Slot                      [IC][IF][RF][EX][DC][DF][WB] <--- target instruction of Bxx #2

Again, is "Target #1 Slot" turned into a bubble or not?

In case of a Jxx following a Jxx, it sounds similar to the previous case except that there is one less bubble in the pipeline:

Jxx #1          [IC][IF][RF][EX][DC][DF][WB]
Jxx #2              [IC][IF][RF][EX][DC][DF][WB] <--- delay slot!!!
Bubble Slot             [IC][IF][RF][EX][DC][DF][WB] <--- delay slot of Jxx #1 is turned into bubble by Jxx #2?
Target #1 Slot/Bubble Slot  [IC][IF][RF][EX][DC][DF][WB] <--- bubble due to Jxx #2?
Target #2 Slot                  [IC][IF][RF][EX][DC][DF][WB] <--- target instruction of Jxx #2

After some tests, I finally have a pseudo-algorithm:

# fetch(pc) is assumed to decode and return the instruction at address pc.

def interpret_delay_slot(pc, is_cond):
    # is_cond: True when the outer branch (the one owning this delay slot) is conditional.
    # Returns the delay-slot branch if it overrides the outer branch, else None.
    insn = fetch(pc)
    if insn.is_cond_branch():
        if insn.test_cond_branch():
            return insn
        else:
            return None
    elif insn.is_uncond_branch():
        if not is_cond:
            return insn
        else:
            return None
    else:
        ...

def interpret(pc, delay_slot):
    # delay_slot: True when the instruction at pc is itself in a delay slot.
    # Returns the address at which execution continues.
    insn = fetch(pc)
    if insn.is_cond_branch():
        if insn.test_cond_branch():
            if not delay_slot:
                delay_slot_insn = interpret_delay_slot(pc + 4, True)
                if delay_slot_insn is not None:
                    return delay_slot_insn.target_branch()
            return insn.target_branch()
        else:
            return pc + 4
    elif insn.is_uncond_branch():
        if not delay_slot:
            delay_slot_insn = interpret_delay_slot(pc + 4, False)
            if delay_slot_insn is not None:
                return delay_slot_insn.target_branch()
        return insn.target_branch()
    else:
        ...

  1. All conditional branch instructions, including the COP1 and COP2 ones, have one delay slot + 2 bubbles when the branch is taken.

1.1) When not taken, the next instruction is executed as a normal instruction, not as a delay slot instruction.

1.2) When taken, if its delay slot instruction is also a conditional branch instruction which is taken, only the second instruction jumps to its target.

1.3) When taken, if its delay slot instruction is an unconditional branch instruction, only the first instruction jumps to its target.

  2. All unconditional branch instructions have one delay slot + 1 bubble when the branch is taken.

2.1) When not taken, the next instruction is executed as a normal instruction, not as a delay slot instruction.

2.2) When taken, if its delay slot instruction is a branch instruction which is taken, only the second instruction jumps to its target.

NOTE: BAL has 2 bubbles while JAL has only 1 bubble, so JAL is preferable to BAL.
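
Rules 1.2, 1.3 and 2.2 can be checked with a self-contained toy model. This is only a sketch under assumptions: `Insn`, `slot_override` and `branch_target` are made-up names, the "taken" outcome is stored on the instruction instead of being computed from registers, and non-branch instructions simply fall through to pc + 4.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    kind: str            # "cond", "uncond" or "other"
    target: int = 0
    taken: bool = False  # only meaningful for "cond"


def slot_override(mem, pc, outer_is_cond):
    """Return the branch in the delay slot if it overrides the outer branch, else None."""
    insn = mem[pc]
    if insn.kind == "cond":
        return insn if insn.taken else None      # a taken conditional in the slot wins
    if insn.kind == "uncond":
        return None if outer_is_cond else insn   # wins only over an unconditional outer branch
    return None


def branch_target(mem, pc):
    """Address where control ends up after the instruction at pc (and its delay slot)."""
    insn = mem[pc]
    if insn.kind == "cond" and not insn.taken:
        return pc + 4                            # not taken: plain fall-through
    if insn.kind in ("cond", "uncond"):
        override = slot_override(mem, pc + 4, insn.kind == "cond")
        return override.target if override is not None else insn.target
    return pc + 4                                # assumption: ordinary instruction


# Rule 1.3: taken Bxx with a Jxx in its delay slot, the first branch wins.
bx_jx = {0: Insn("cond", 0x100, taken=True), 4: Insn("uncond", 0x200)}
# Rule 1.2: taken Bxx with a taken Bxx in its delay slot, the second branch wins.
bx_bx = {0: Insn("cond", 0x100, taken=True), 4: Insn("cond", 0x200, taken=True)}
# Rule 2.2: Jxx with a Jxx in its delay slot, the second branch wins.
jx_jx = {0: Insn("uncond", 0x100), 4: Insn("uncond", 0x200)}
```

Keeping the override decision in its own function mirrors the two-function split of the pseudo-algorithm above.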
