
Comments (12)

jdmpapin commented on August 26, 2024

Now that this issue is open, it occurs to me that this kind of change is probably supposed to be discussed at the OMR architecture meeting before being carried out. I can be the one to put it forth. I doubt that it will be controversial at all.

from omr.

Spencer-Comin commented on August 26, 2024

@IBMJimmyk @Akira1Saitoh I've opened a relevant issue here: #6992


Spencer-Comin commented on August 26, 2024

Closing this since #6983 has been merged


0xdaryl commented on August 26, 2024

@Spencer-Comin walked us through this proposal at the OMR Architecture Meeting today. There was consensus around the need and semantics of this new IL opcode. From the community's perspective this proposal is accepted and a PR with the changes can be brought forward.


Spencer-Comin commented on August 26, 2024

I'm looking into what instructions are currently being generated for arraycmp + arrayCmpLen flag, on the platforms that support it, to see what (if any) changes need to be made to support a 64-bit result.

I'm using the following test to generate an arraycmp with arrayCmpLen flag set, and looking at what I get in instruction selection in the log.

class Arraycmplen {
    static int unitTest1(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] a, b;
        a = new byte[10000];
        b = a.clone();
        a[a.length-1] ^= -1;
        int x = 0;
        for (int i = 0; i < 100000; i++) {
            x = unitTest1(a, b);
        }
        System.out.println(x);
    }
}

$ java -Xshareclasses:none -Xjit:"dontInline={*unitTest1*},{*unitTest1*}(traceFull,log=unitTest1.log,optLevel=scorching)" Arraycmplen

X

------------------------------
treetop
  arraycmp  <arraycmp>[#221  helper Method] [flags 0x400 0x0 ] (in GPR_0054) (arrayCmpLen )
    aladd (in GPR_0032) (X>=0 internalPtr sharedMemory )
      ==>aRegLoad (in &GPR_0017) (X!=0 SeenRealReference sharedMemory )
      lconst 16 (highWordZero X!=0 X>=0 cannotOverflow )
    aladd (in GPR_0048) (X>=0 internalPtr sharedMemory )
      ==>aRegLoad (in &GPR_0016) (X!=0 SeenRealReference sharedMemory )
      ==>lconst 16 (highWordZero X!=0 X>=0 cannotOverflow )
    i2l (in GPR_0049) (highWordZero X>=0 )
      ==>iloadi (in GPR_0018) (X>=0 cannotOverflow )
------------------------------

lea     GPR_0032, qword ptr [&GPR_0017+0x10]            # LEA8RegMem, SymRef [#429 +16]
lea     GPR_0048, qword ptr [&GPR_0016+0x10]            # LEA8RegMem, SymRef [#430 +16]
mov     GPR_0049, GPR_0018              # MOVZXReg8Reg4
mov     GPR_0054, 0x00000000    # MOV8RegImm4
Label L0048:                    # label # (Start of internal control flow)
mov     GPR_0053, GPR_0049              # MOV8RegReg
shr     GPR_0053, 0x04  # SHR8RegImm1
je      Label L0050                     # JE4
Label L0049:                    # label
movups  FPR_0055, xmmword ptr [GPR_0032+1*GPR_0054]             # MOVUPSRegMem
movups  FPR_0056, xmmword ptr [GPR_0048+1*GPR_0054]             # MOVUPSRegMem
pcmpeqb FPR_0055, FPR_0056              # PCMPEQBRegReg
pmovmskb        GPR_0050, FPR_0055              # PMOVMSKB4RegReg
cmp     GPR_0050, 0x0000ffff    # CMP8RegImm4
jne     Label L0052                     # JNE4
add     GPR_0054, 0x00000010    # ADD8RegImm4
sub     GPR_0053, 0x00000001    # SUB8RegImm4
jg      Label L0049                     # JG4
jmp     Label L0050                     # JMP4
Label L0052:                    # label
not     GPR_0050                        # NOT2Reg
bsf     GPR_0050, GPR_0050              # BSF2RegReg
add     GPR_0054, GPR_0050              # ADD8RegReg
jmp     Label L0057                     # JMP4
Label L0050:                    # label
mov     GPR_0052, GPR_0049              # MOV8RegReg
and     GPR_0052, 0x0000000f    # AND8RegImm4
je      Label L0057                     # JE4
Label L0051:                    # label
mov     GPR_0051, byte ptr [GPR_0048+1*GPR_0054]                # L1RegMem
cmp     byte ptr [GPR_0032+1*GPR_0054], GPR_0051                # CMP1MemReg
jne     Label L0057                     # JNE4
add     GPR_0054, 0x00000001    # ADD8RegImm4
sub     GPR_0052, 0x00000001    # SUB8RegImm4
jg      Label L0051                     # JG4
assocreg                        # assocreg

This looks alright to me, but I'm not an X expert. @0xdaryl the result in GPR_0054 is a 64-bit value, correct?
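
As a reading aid for the X listing above, here is a rough Java model of what the sequence appears to compute. It is only a sketch of one reading of the log, not OMR code: the class and method names are hypothetical, and the inner loop stands in for the 16-byte `pcmpeqb`/`pmovmskb` pair. The result is the index of the first mismatching byte, or the length when no mismatch is found.

```java
// Hypothetical model of the X sequence; names are illustrative, not OMR APIs.
class ArrayCmpLenX86Model {
    // Returns the index of the first mismatching byte, or length if there is none.
    static long firstMismatch(byte[] a, byte[] b, int length) {
        int index = 0;                                  // plays the role of GPR_0054
        for (int chunks = length >>> 4; chunks > 0; chunks--) {
            // pcmpeqb + pmovmskb: bit k of the mask is set when byte k is equal.
            int mask = 0;
            for (int k = 0; k < 16; k++) {
                if (a[index + k] == b[index + k]) {
                    mask |= 1 << k;
                }
            }
            if (mask != 0xFFFF) {
                // not + bsf: the lowest clear bit of the mask is the first mismatch.
                return index + Integer.numberOfTrailingZeros(~mask & 0xFFFF);
            }
            index += 16;
        }
        // Residue loop for the remaining (length & 0xF) bytes.
        for (int rest = length & 0xF; rest > 0; rest--) {
            if (a[index] != b[index]) {
                return index;
            }
            index++;
        }
        return index;                                   // equals length here
    }
}
```

In this model the running index is the only value that feeds the result, which matches the question above about whether GPR_0054 is carried as a full 64-bit value.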

P

------------------------------
treetop
  arraycmp  <arraycmp>[#240  helper Method] [flags 0x400 0x0 ] (in GPR_0089) (arrayCmpLen )
    aladd (in GPR_0080) (X>=0 internalPtr sharedMemory )
      ==>aRegLoad (in &GPR_0017) (X!=0 SeenRealReference sharedMemory )
      lconst 16 (highWordZero X!=0 X>=0 cannotOverflow )
    aladd (in GPR_0081) (X>=0 internalPtr sharedMemory )
      ==>aRegLoad (in &GPR_0016) (SeenRealReference sharedMemory )
      ==>lconst 16 (highWordZero X!=0 X>=0 cannotOverflow )
    i2l (in GPR_0082) (highWordZero X>=0 )
      ==>iloadi (in GPR_0018) (X>=0 cannotOverflow )
------------------------------

10      addi    GPR_0080, &GPR_0017, 16
13      addi    GPR_0081, &GPR_0016, 16
4       extsw   GPR_0082, GPR_0018
10      ori     GPR_0083, GPR_0082, 0x0
10      Label L0145:    ; (Start of internal control flow)
10      cmpdi   CCR_0086, GPR_0083, 8
10      blt     CCR_0086, Label L0148
10      addi    GPR_0080, GPR_0080, -8
10      addi    GPR_0081, GPR_0081, -8
10      srawi   GPR_0084, GPR_0083, 3
10      mtctr   GPR_0084
10      Label L0149:
10      ldu     GPR_0087, [GPR_0080, 8]
10      ldu     GPR_0088, [GPR_0081, 8]
10      cmpd    CCR_0085, GPR_0087, GPR_0088
10      bne     CCR_0085, Label L0146
10      bdnz    CCR_0085, Label L0149
10      addi    GPR_0080, GPR_0080, 8
10      addi    GPR_0081, GPR_0081, 8
10      Label L0146:
10      mfctr   GPR_0083
10      cmpdi   CCR_0086, GPR_0083, 0
10      subf    GPR_0083, GPR_0083, GPR_0084
10      rlwinm  GPR_0083, GPR_0083, 0000000000000003, FFFFFFFFFFFFFFF8
10      bne     CCR_0086, Label L0153
10      cmpd    CCR_0086, GPR_0083, GPR_0082
10      Label L0153:
10      subf    GPR_0083, GPR_0083, GPR_0082
10      Label L0148:
10      beq     CCR_0086, Label L0151
10      mtctr   GPR_0083
10      addi    GPR_0080, GPR_0080, -1
10      addi    GPR_0081, GPR_0081, -1
10      Label L0147:
10      lbzu    GPR_0087, [GPR_0080, 1]
10      lbzu    GPR_0088, [GPR_0081, 1]
10      cmpw    CCR_0085, GPR_0087, GPR_0088
10      bne     CCR_0085, Label L0152
10      bdnz    CCR_0085, Label L0147
10      Label L0152:
10      mfctr   GPR_0083
10      Label L0151:
10      subf    GPR_0089, GPR_0083, GPR_0082
10      Label L0150:    ; (End of internal control flow)

I'm not familiar with Power assembly. @IBMJimmyk Does this need any changes for the result in GPR_0089 to be a 64-bit value?

Z

------------------------------
treetop
  arraycmp  <arraycmp>[#246  helper Method] [flags 0x400 0x0 ] (in GPR_0054) (arrayCmpLen )
    aladd (in GPR_0048) (X>=0 internalPtr sharedMemory )
      ==>aRegLoad (in &GPR_0017) (X!=0 SeenRealReference sharedMemory )
      lconst 16 (highWordZero X!=0 X>=0 cannotOverflow )
    aladd (in GPR_0049) (X>=0 internalPtr sharedMemory )
      ==>aRegLoad (in &GPR_0016) (SeenRealReference sharedMemory )
      ==>lconst 16 (highWordZero X!=0 X>=0 cannotOverflow )
    i2l (in GPR_0020) (highWordZero X>=0 unneededConv )
      ==>iloadi (in GPR_0020) (X>=0 cannotOverflow signExtendedTo64BitAtSource )
------------------------------

LA      GPR_0048,#404 16(&GPR_0017)
LA      GPR_0049,#405 16(&GPR_0016)
LGR     GPR_0050,GPR_0020         ; LR=Clobber_eval
SLGFI   GPR_0050,1
LGR     GPR_0054,GPR_0050
Label L0032:     # (Start of internal control flow)
VLL     VRF_0051,GPR_0050,#406 0(GPR_0048)
VLL     VRF_0052,GPR_0050,#407 0(GPR_0049)
VFENEBS VRF_0053,VRF_0051,VRF_0052
BRC     BNH(0x6), Label L0034
LA      GPR_0048,#408 16(GPR_0048)
LA      GPR_0049,#409 16(GPR_0049)
SLGFI   GPR_0050,16
BRC     MASK4(0x3), Label L0032
AGHI    GPR_0054,0x1
BRC     NOP(0xf), Label L0033
Label L0034:
SGR     GPR_0054,GPR_0050
VLGVB   GPR_0050,VRF_0053,#410 7
AGR     GPR_0054,GPR_0050
assocreg
Label L0033:     # (End of internal control flow)

For a constant length, Z generates a different sequence:

------------------------------
treetop
  arraycmp  <arraycmp>[#246  helper Method] [flags 0x400 0x0 ] (in GPR_0069) (arrayCmpLen )
    aladd (in GPR_0064) (X>=0 internalPtr sharedMemory )
      ==>aRegLoad (in &GPR_0017) (SeenRealReference sharedMemory )
      lconst 16 (highWordZero X!=0 X>=0 cannotOverflow )
    aladd (in GPR_0065) (X>=0 internalPtr sharedMemory )
      ==>aRegLoad (in &GPR_0016) (SeenRealReference sharedMemory )
      ==>lconst 16 (highWordZero X!=0 X>=0 cannotOverflow )
    lconst 10000 (in GPR_0066) (highWordZero X!=0 X>=0 )
------------------------------

LA      GPR_0064,#403 16(&GPR_0017)
LA      GPR_0065,#404 16(&GPR_0016)
LGHI    GPR_0066,0x2710
LGR     GPR_0067,GPR_0066
LGR     GPR_0068,GPR_0066
CLCL    GPR_0064,GPR_0065
LGR     GPR_0069,GPR_0066
assocreg
SGR     GPR_0069,GPR_0067

Both of these sequences seem good to me.

AArch64

At first I couldn't get an arraycmp to generate on AArch64; I just had to build the JDK with a newer OMR/OpenJ9.

------------------------------
 n548n    (  0)  treetop
 n547n    (  2)    arraycmp  <arraycmp>[#199  helper Method] [flags 0x400 0x0 ] (in GPR_0068) (arrayCmpLen )
 n534n    (  0)      aladd (in GPR_0064) (X>=0 internalPtr )
 n12n     (  0)        ==>aRegLoad (in &GPR_0016) (X!=0 )
 n536n    (  0)        lconst 8 (highWordZero X!=0 X>=0 cannotOverflow )
 n540n    (  0)      aladd (in GPR_0066) (X>=0 internalPtr )
 n313n    (  0)        ==>aRegLoad (in &GPR_0017)
 n536n    (  0)        ==>lconst 8 (highWordZero X!=0 X>=0 cannotOverflow )
 n546n    (  0)      i2l (in GPR_0067) (highWordZero X>=0 )
 n13n     (  1)        ==>iloadi (in GPR_0018) (X>=0 cannotOverflow )
------------------------------

10      addimmx         GPR_0064, &GPR_0016, 8
10      movx    GPR_0065, GPR_0064
13      addimmx         GPR_0066, &GPR_0017, 8
4       sxtwx   GPR_0067, GPR_0018
10      Label L0145:    ; (Start of internal control flow)
10      movw    GPR_0068, GPR_0067
10      cmpx    GPR_0065, GPR_0066
10      ccmpimmw        GPR_0067, 0, 4, ne
10      b.eq    Label L0146      ; Done if src1 and src2 are the same array or length is 0.
10      cmpimmw         GPR_0067, 16
10      b.cc    Label L0149      ; Jumps to lessThan16Label if length < 16.
10      subimmw         GPR_0067, GPR_0067, 16
10      Label L0150:     ; loop16Label
10      ldppostx        GPR_0069, GPR_0071, [GPR_0065, 16]
10      ldppostx        GPR_0070, GPR_0072, [GPR_0066, 16]
10      subsx   GPR_0069, GPR_0069, GPR_0070
10      ccmpx   GPR_0071, GPR_0072, 0, eq
10      b.ne    Label L0147      ; Jumps to notEqual16Label if mismatch is found in the 16-byte data
10      subsimmw        GPR_0067, GPR_0067, 16
10      b.cs    Label L0150      ; Jumps to loop16Label if the remaining length >= 16 and no mismatch is found so far.
10      addimmw         GPR_0067, GPR_0067, 16
10      cbzw    GPR_0067, Label L0148    ; Jumps to done0Label if the remaining length is 0.
10      b       Label L0149      ; Jumps to lessThan16Label
10      Label L0147:     ; notEqual16Label. src register points 16-byte ahead of the location where the data in the registers was read.
10      eorx    GPR_0071, GPR_0071, GPR_0072
10      cmpimmx         GPR_0069, 0
10      cselx   GPR_0069, GPR_0069, GPR_0071, ne
10      movzw   GPR_0070, 0x0001
10      cincx   GPR_0070, GPR_0070, ne
10      subx    GPR_0065, GPR_0065, GPR_0070 lsl 3
10      rbitx   GPR_0069, GPR_0069
10      clzx    GPR_0069, GPR_0069
10      addx    GPR_0065, GPR_0065, GPR_0069 lsr 3
10      b       Label L0148      ; Jumps to done0Label.
10      Label L0149:     ; lessThan16Label
10      subsimmw        GPR_0067, GPR_0067, 1
10      ldrbpost        GPR_0069, [GPR_0065, 1]
10      ldrbpost        GPR_0070, [GPR_0066, 1]
10      ccmpw   GPR_0069, GPR_0070, 0, hi
10      b.eq    Label L0149      ; Jumps to lessThan16Label (byteloop) if the remaining length > 0 and no mismatch is found
10      cmpw    GPR_0069, GPR_0070
10      cset    GPR_0072, ne
10      subx    GPR_0065, GPR_0065, GPR_0072
10      Label L0148:     ; done0Label
10      subx    GPR_0068, GPR_0065, GPR_0064
10      assocreg[&GPR_0016 : x0] [&GPR_0017 : x1]
10      Label L0146:    ; (End of internal control flow)         ; doneLabel

@Akira1Saitoh does this need any changes to produce a 64 bit result?
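
For readers unfamiliar with the `notEqual16Label` path above: the `eorx` / `rbitx` / `clzx` / `lsr 3` steps appear to locate the first differing byte inside an 8-byte word by counting trailing zero bits of the difference. A minimal Java sketch of that trick follows, assuming little-endian loads; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical model of the "find first differing byte in a word" step.
class FirstDifferingByteModel {
    // Byte offset (0..7) of the first difference between two little-endian
    // 8-byte words. The caller must ensure x != y.
    static int firstDifferingByte(long x, long y) {
        long diff = x ^ y;                                  // eor
        // rbit followed by clz counts trailing zeros; lsr 3 turns bits into bytes.
        return Long.numberOfTrailingZeros(diff) >>> 3;
    }

    // Loads 8 bytes starting at offset as a little-endian 64-bit word.
    static long loadLittleEndian(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong(offset);
    }
}
```

With little-endian loads, the lowest-addressed byte sits in the least significant bits, so the lowest set bit of the difference identifies the first mismatching byte.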


IBMJimmyk commented on August 26, 2024

I think just two places need to be changed on Power:

10      cmpdi   CCR_0086, GPR_0083, 8
10      blt     CCR_0086, Label L0148
10      addi    GPR_0080, GPR_0080, -8
10      addi    GPR_0081, GPR_0081, -8
10      srawi   GPR_0084, GPR_0083, 3  //<--- This needs to be a sradi
10      mtctr   GPR_0084
10      Label L0149:
10      ldu     GPR_0087, [GPR_0080, 8]
10      ldu     GPR_0088, [GPR_0081, 8]

...

10      Label L0146:
10      mfctr   GPR_0083
10      cmpdi   CCR_0086, GPR_0083, 0
10      subf    GPR_0083, GPR_0083, GPR_0084
10      rlwinm  GPR_0083, GPR_0083, 0000000000000003, FFFFFFFFFFFFFFF8 //<--- This needs to be a rldicr
10      bne     CCR_0086, Label L0153
10      cmpd    CCR_0086, GPR_0083, GPR_0082
10      Label L0153:
10      subf    GPR_0083, GPR_0083, GPR_0082
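
To illustrate why the word forms matter once the length can exceed 32 bits, here is a small Java model of the two instruction pairs. It is a simplified sketch of the instruction semantics as I understand them, not generated code: the class and method names are hypothetical, and only the shift amount and mask used in the listing are modelled.

```java
// Hypothetical model of word vs. doubleword shifts on Power; not OMR code.
class PowerShiftModel {
    // srawi: arithmetic right shift of the low 32 bits, sign-extended to 64 bits.
    static long srawi(long rs, int sh) {
        return (long) ((int) rs >> sh);
    }

    // sradi: arithmetic right shift of the full 64-bit value.
    static long sradi(long rs, int sh) {
        return rs >> sh;
    }

    // rlwinm with rotate 3 and mask FFFFFFF8: shifts the low word left by 3;
    // the upper 32 bits of the result are cleared.
    static long rlwinmShl3(long rs) {
        return Integer.toUnsignedLong((int) rs << 3);
    }

    // rldicr with shift 3: shifts the full doubleword left by 3.
    static long rldicrShl3(long rs) {
        return rs << 3;
    }
}
```

For a length of 10000 both forms agree (1250 iterations, then 10000 bytes), but for a length of 2^32 the word forms yield 0 where the doubleword forms yield 2^29 and 2^32.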


Akira1Saitoh commented on August 26, 2024

In the generated code on AArch64, 32-bit instructions are used for the length register (GPR_0067). I think they need to be changed to 64-bit instructions.

For example, the sub instruction below must be changed to its 64-bit version.

10      subimmw         GPR_0067, GPR_0067, 16


IBMJimmyk commented on August 26, 2024

What is the expected behavior of arraycmp on 32-bit systems? Will arraycmplen need to return values that don't fit in a signed 32-bit value (>2^31 - 1)? Also, is the input length node still signed 32-bit?


jdmpapin commented on August 26, 2024

In IL the length child (along with the result type) should always be 64-bit so that the type of arraycmplen doesn't depend on the target bitness, and so that we don't need two separate opcodes either. It does represent the size in bytes of a single allocation in memory though, so on 32-bit I think it's fine for us to assume (and therefore require) that the input length (and so also the output length) is representable as an unsigned 32-bit integer - larger lengths wouldn't make any sense, and neither would negative lengths.

I imagine that on 32-bit the IL would be generated something like this anyway:

l2i
  arraycmplen
    <array a>
    <array b>
    iu2l
      <32-bit length>

Probably the length child of arraycmp should be made 64-bit as well, though I don't think that necessarily needs to happen at the same time (or even at all, since we seem to be dealing with it so far). But it might possibly help, e.g. if there's much logic that currently processes the length child of arraycmp regardless of the flag; any such logic would need to start to handle both arraycmp and arraycmplen, and it could continue to treat them uniformly if the types were kept consistent.
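
The 32-bit IL shape sketched above can be mimicked in Java, with `Integer.toUnsignedLong` standing in for `iu2l` and a narrowing cast for `l2i`. This is only an illustration under assumptions: `arraycmplen` here is a hypothetical scalar stand-in, and the "return the length when nothing differs" behaviour is inferred from the generated code earlier in the thread rather than from a specification.

```java
// Hypothetical Java model of the proposed IL; not an OMR API.
class ArrayCmpLenIlModel {
    // Stand-in for arraycmplen: 64-bit length in, 64-bit result out.
    static long arraycmplen(byte[] a, byte[] b, long length) {
        for (long i = 0; i < length; i++) {
            if (a[(int) i] != b[(int) i]) {
                return i;              // index of the first mismatch
            }
        }
        return length;                 // no mismatch found
    }

    // Shape of the 32-bit IL: l2i(arraycmplen(a, b, iu2l(length32))).
    static int arraycmplen32(byte[] a, byte[] b, int length32) {
        long length = Integer.toUnsignedLong(length32);   // iu2l: zero-extend
        return (int) arraycmplen(a, b, length);           // l2i: truncate
    }
}
```

The zero-extension is the point of using `iu2l` rather than `i2l`: a 32-bit length with the top bit set stays a large positive 64-bit value instead of becoming negative.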


Spencer-Comin commented on August 26, 2024

Relevant PR open for review now: #6983


IBMJimmyk commented on August 26, 2024

Just to make sure I have this right.

Original arraycmp node:
-Flag is set to change behavior to arraycmplen
-Length input is a signed 32 bit value. Does not depend on the arraycmplen flag.
-Return value is a signed 32 bit value. Does not depend on the arraycmplen flag.

New arraycmp node:
-No flag to change to arraycmplen
-Length input is still a signed 32 bit value but could be changed to signed 64 bit in the future.
-Return value is a signed 32 bit value (in practice -1, 0, 1 or 2)

New arraycmplen node:
-Length input is a signed 64 bit value. (Can be assumed to fit within an unsigned 32 bit value on 32 bit platforms)
-Return value is a signed 64 bit value.

None of this changes based on the number of bits of the target platform.

The length input to arraycmp being a signed 32-bit value is a bit shaky. I'm not sure about other platforms, but it only works on Power by coincidence. If I try to compare a large array of chars (>2^31-1 bytes), it will process the array length value as being negative. This causes the check against 8, which guards the use of 64-bit loads, to fail, and the entire array is then compared with byte-by-byte loads. This gets the right answer (very slowly), but it could very well have fallen over completely instead.
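
The wrap-around described above is easy to reproduce with plain integer arithmetic. The following Java sketch only models the length check, under the assumption that the byte count is first squeezed into a signed 32-bit value and then sign-extended; the class and method names are hypothetical.

```java
// Hypothetical model of the signed 32-bit length problem; not OMR code.
class SignedLengthModel {
    // What a 32-bit signed length node yields once sign-extended to 64 bits.
    static long asSigned32(long byteCount) {
        return (int) byteCount;
    }

    // Models the "cmpdi length, 8 / blt" guard in front of the 8-byte loop.
    static boolean usesWideLoads(long lengthInRegister) {
        return lengthInRegister >= 8;
    }
}
```

For example, 1,500,000,000 chars occupy 3,000,000,000 bytes; as a signed 32-bit value that becomes -1,294,967,296, so the guard fails and the wide-load loop is skipped.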


Akira1Saitoh commented on August 26, 2024

"The length input to arraycmp being a signed 32 bit value is a bit shaky. I'm not sure about other platforms, but it only works on Power by coincidence."

Agreed. I do not think that arraycmp works properly on AArch64 if the array length does not fit in 32 bits.
