
Comments (46)

jens-maus commented on June 6, 2024

@Futaura @tboeckel

Please find attached a debug build of the current os3/m68k version of amissl with the debug output so that you can try to debug things regarding the digest check failure and the test-cipherlist failure (ticket #1).

amissl4_os3.tar.gz

from amissl.

jens-maus commented on June 6, 2024

FYI:

Please note that I was finally able to reproduce the problem using the openssl s_client command-line tool using the following sequence:

openssl s_client -connect pop.gmail.com:995 -tls1_2

Note that this does not return an error by itself. However, with the newly added debug output one can see that, while the OS4 build reports matching memory areas in CRYPTO_memcmp(), the OS3/m68k build outputs the following:

        ../openssl/ssl/s3_both.c:271:BEFORE CRYPTO_memcmp(): 4149c88c 4149ba00 12
        ../openssl/crypto/cryptlib.c:1052:IN CRYPTO_memcmp(): 4149c88c 4149ba00 12
        ../openssl/crypto/cryptlib.c:1055:a[0]=df == b[0]=81
        ../openssl/crypto/cryptlib.c:1055:a[1]=af == b[1]=a3
        ../openssl/crypto/cryptlib.c:1055:a[2]=aa == b[2]=23
        ../openssl/crypto/cryptlib.c:1055:a[3]=7f == b[3]=41
        ../openssl/crypto/cryptlib.c:1055:a[4]=03 == b[4]=a2
        ../openssl/crypto/cryptlib.c:1055:a[5]=4c == b[5]=8b
        ../openssl/crypto/cryptlib.c:1055:a[6]=9b == b[6]=04
        ../openssl/crypto/cryptlib.c:1055:a[7]=c7 == b[7]=35
        ../openssl/crypto/cryptlib.c:1055:a[8]=a3 == b[8]=af
        ../openssl/crypto/cryptlib.c:1055:a[9]=e0 == b[9]=9f
        ../openssl/crypto/cryptlib.c:1055:a[10]=c4 == b[10]=be
        ../openssl/crypto/cryptlib.c:1055:a[11]=26 == b[11]=33

Please note that a and b are always different here while they should actually match.
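
For reference, a trace like the one above can come from a compare loop that ORs together the XOR of every byte pair and never exits early. The following is a minimal sketch, not the code from cryptlib.c; the name debug_memcmp and the exact log format are assumptions made for illustration (unlike the trace above, it prints "!=" on a mismatch).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of OpenSSL: accumulating compare that
 * also logs every byte pair, similar in spirit to the trace above. */
int debug_memcmp(const void *in_a, const void *in_b, size_t len)
{
    const unsigned char *a = in_a;
    const unsigned char *b = in_b;
    unsigned char diff = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        /* Log the pair, then fold the difference into diff.
         * There is no early exit: all len bytes are always visited. */
        fprintf(stderr, "a[%lu]=%02x %s b[%lu]=%02x\n",
                (unsigned long)i, a[i],
                a[i] == b[i] ? "==" : "!=",
                (unsigned long)i, b[i]);
        diff |= (unsigned char)(a[i] ^ b[i]);
    }
    return diff; /* 0 only if every byte matched */
}
```

A non-zero return value corresponds to the "digest check failed" case discussed in this thread.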

Interestingly, changing the openssl s_client invocation to

openssl s_client -connect pop.gmail.com:995 -ssl3

results in a succeeding comparison:

          ../openssl/ssl/s3_both.c:271:BEFORE CRYPTO_memcmp(): 414d582c 414d49a0 36
          ../openssl/crypto/cryptlib.c:1052:IN CRYPTO_memcmp(): 414d582c 414d49a0 36
          ../openssl/crypto/cryptlib.c:1055:a[0]=21 == b[0]=21
          ../openssl/crypto/cryptlib.c:1055:a[1]=33 == b[1]=33
          ../openssl/crypto/cryptlib.c:1055:a[2]=3b == b[2]=3b
          ../openssl/crypto/cryptlib.c:1055:a[3]=cc == b[3]=cc
          ../openssl/crypto/cryptlib.c:1055:a[4]=cd == b[4]=cd
          ../openssl/crypto/cryptlib.c:1055:a[5]=8d == b[5]=8d
          ../openssl/crypto/cryptlib.c:1055:a[6]=d7 == b[6]=d7
          ../openssl/crypto/cryptlib.c:1055:a[7]=e6 == b[7]=e6
          ../openssl/crypto/cryptlib.c:1055:a[8]=61 == b[8]=61
          ../openssl/crypto/cryptlib.c:1055:a[9]=fc == b[9]=fc
          ../openssl/crypto/cryptlib.c:1055:a[10]=e4 == b[10]=e4
          ../openssl/crypto/cryptlib.c:1055:a[11]=02 == b[11]=02
          ../openssl/crypto/cryptlib.c:1055:a[12]=16 == b[12]=16
          ../openssl/crypto/cryptlib.c:1055:a[13]=0d == b[13]=0d
          ../openssl/crypto/cryptlib.c:1055:a[14]=6b == b[14]=6b
          ../openssl/crypto/cryptlib.c:1055:a[15]=cc == b[15]=cc
          ../openssl/crypto/cryptlib.c:1055:a[16]=3e == b[16]=3e
          ../openssl/crypto/cryptlib.c:1055:a[17]=cc == b[17]=cc
          ../openssl/crypto/cryptlib.c:1055:a[18]=d1 == b[18]=d1
          ../openssl/crypto/cryptlib.c:1055:a[19]=3c == b[19]=3c
          ../openssl/crypto/cryptlib.c:1055:a[20]=00 == b[20]=00
          ../openssl/crypto/cryptlib.c:1055:a[21]=b3 == b[21]=b3
          ../openssl/crypto/cryptlib.c:1055:a[22]=78 == b[22]=78
          ../openssl/crypto/cryptlib.c:1055:a[23]=57 == b[23]=57
          ../openssl/crypto/cryptlib.c:1055:a[24]=fd == b[24]=fd
          ../openssl/crypto/cryptlib.c:1055:a[25]=3b == b[25]=3b
          ../openssl/crypto/cryptlib.c:1055:a[26]=f5 == b[26]=f5
          ../openssl/crypto/cryptlib.c:1055:a[27]=93 == b[27]=93
          ../openssl/crypto/cryptlib.c:1055:a[28]=9d == b[28]=9d
          ../openssl/crypto/cryptlib.c:1055:a[29]=61 == b[29]=61
          ../openssl/crypto/cryptlib.c:1055:a[30]=5d == b[30]=5d
          ../openssl/crypto/cryptlib.c:1055:a[31]=ff == b[31]=ff
          ../openssl/crypto/cryptlib.c:1055:a[32]=57 == b[32]=57
          ../openssl/crypto/cryptlib.c:1055:a[33]=ab == b[33]=ab
          ../openssl/crypto/cryptlib.c:1055:a[34]=ec == b[34]=ec
          ../openssl/crypto/cryptlib.c:1055:a[35]=48 == b[35]=48
          ../openssl/ssl/s3_both.c:277:AFTER CRYPTO_memcmp()

Another interesting observation is that as soon as I use -tls1_1 instead of -tls1_2, connections also work without the digest check problems:

openssl s_client -connect pop.gmail.com:995 -tls1_1

                ../openssl/ssl/s3_both.c:271:BEFORE CRYPTO_memcmp(): 414ff23c 414e0c00 12
                ../openssl/crypto/cryptlib.c:1052:IN CRYPTO_memcmp(): 414ff23c 414e0c00 12
                ../openssl/crypto/cryptlib.c:1055:a[0]=c4 == b[0]=c4
                ../openssl/crypto/cryptlib.c:1055:a[1]=0c == b[1]=0c
                ../openssl/crypto/cryptlib.c:1055:a[2]=18 == b[2]=18
                ../openssl/crypto/cryptlib.c:1055:a[3]=b1 == b[3]=b1
                ../openssl/crypto/cryptlib.c:1055:a[4]=f9 == b[4]=f9
                ../openssl/crypto/cryptlib.c:1055:a[5]=81 == b[5]=81
                ../openssl/crypto/cryptlib.c:1055:a[6]=d7 == b[6]=d7
                ../openssl/crypto/cryptlib.c:1055:a[7]=8b == b[7]=8b
                ../openssl/crypto/cryptlib.c:1055:a[8]=82 == b[8]=82
                ../openssl/crypto/cryptlib.c:1055:a[9]=36 == b[9]=36
                ../openssl/crypto/cryptlib.c:1055:a[10]=bd == b[10]=bd
                ../openssl/crypto/cryptlib.c:1055:a[11]=8f == b[11]=8f
                ../openssl/ssl/s3_both.c:277:AFTER CRYPTO_memcmp()

So it seems that the issue might be actually limited to TLSv1.2 connections because SSL3 and TLSv1.1 connections seem to work fine here.

Futaura commented on June 6, 2024

I agree. After initially suspecting particular ciphers, I too found
that TLSv1.1 seemed to work ok, whereas TLSv1.2 did not, even when
using the same ciphers.

Oliver Roberts / @Futaura / www.futaura.co.uk

Futaura commented on June 6, 2024

I'm not completely sure on this yet, but it is looking like it errors only on ciphers using SHA256 (SHA384 works OK). Doesn't explain why it works on OS4, but I'm concentrating my efforts there for now.

jens-maus commented on June 6, 2024

Thanks for the hint. I could actually verify that while the following fails:

openssl s_client -connect pop.gmail.com:995 -tls1_2 -cipher ECDHE-RSA-AES128-GCM-SHA256

the following s_client execution works perfectly without a digest failure:

openssl s_client -connect pop.gmail.com:995 -tls1_2 -cipher ECDHE-RSA-AES256-GCM-SHA384

Futaura commented on June 6, 2024

It's weird though because these work (most of the time!):

openssl s_client -connect ssllabs.com:443
openssl s_client -connect ssllabs.com:443 -tls1_2

But this doesn't (gives "tlsv1 alert decrypt error"):

openssl s_client -connect ssllabs.com:443 -cipher ECDHE-RSA-AES128-GCM-SHA256

Here all three result in the same cipher and TLSv1.2 being used.

Futaura commented on June 6, 2024

In case you have not already discovered it, the -msg option for s_client is quite useful. It can be seen that the error always occurs in the reply from ChangeCipherSpec where a 51 alert "fatal decrypt error" is given by the client. If I'm not mistaken, the received 5 byte message directly above that error should exactly match the message sent directly after the ClientKeyExchange message - the received/decrypted data looks wrong, at least compared to the results with the OS4 libs where they match. Might help narrow the problem down.

jens-maus commented on June 6, 2024

Thanks for the hint. I actually didn't know about the "-msg" option. But I already added quite a bunch of debug output to various related files to see the same happening somehow. The question really is if "p" or "s->s3->tmp.peer_finish_md" is incorrectly calculated here (https://github.com/jens-maus/amissl/blob/master/openssl/ssl/s3_both.c#L272).

Unfortunately it is quite hard for me to try to analyze this problem since I have almost no understanding of the cipher algorithms and their output is of course always different for every run, so it is really hard to debug at which position the code produces different results for the OS3 and OS4 build. Any help of course appreciated!

jens-maus commented on June 6, 2024

Please note that when I temporarily comment out the whole CRYPTO_memcmp() call in https://github.com/jens-maus/amissl/blob/master/openssl/ssl/s3_both.c#L272 the TLS1.2 connection succeeds correctly and doesn't throw any "digest check failed" error anymore. I could also verify that YAM started to successfully open a TLSv1.2 connection to pop.gmail.com as soon as that CRYPTO_memcmp() check is commented out.

If this remains consistent after more checks, then I think the calculation of s->init_msg is the problem, and we have to find out why it fails on OS3.

Futaura commented on June 6, 2024

This is a bit of a stab in the dark, but have you tried disabling the optimised memset, memcpy, memmove, etc, in libcmt, and replacing them with simple byte loops?

As I'm sure you've found too, it is very hard to trace back through the code before the CRYPTO_memcmp(), due to the abundance of function pointers, etc. I've not given up yet though.

jens-maus commented on June 6, 2024

I just tried your suggestion of disabling the optimized memset, memcpy, memmove, etc. and using simple byte-wise operations instead. However, this didn't solve the problem.

So I still think we have to take the hard road and trace back why either s->init_msg or s->s3->tmp.peer_finish_md is incorrectly calculated.

Futaura commented on June 6, 2024

OK - thanks. It also has to be considered if it is really just the hash
itself that is being miscalculated, or whether the actual data itself is
corrupt somehow. It could also be a bug in OpenSSL and/or an uninitialised
variable or something like that - sometimes such things are masked by the
compiler. Anyway, if we can figure out when and where those hashes are
created, we can dump the data.

Futaura commented on June 6, 2024

Have been playing with OpenSSL s_client and s_server together, with -msg and -debug options on both, and it can be seen that the "Finished" handshake message right before it errors is the same on both the client and server, but the server detects something is wrong with it and sends the fatal decrypt error back. So, I suspect it is wrong to focus too much on the CRYPTO_memcmp(), where s->init_msg matches the handshake data sent and s->s3->tmp.peer_finish_md doesn't (possibly because it was never calculated and contains stale data, as the server had already issued the alert and no "Finished" handshake response). In which case, that would mean the client calculates s->init_msg incorrectly in the first place, before encryption. And it does not appear to be linked to a unique cipher component.

jens-maus commented on June 6, 2024

I was also playing around with openssl s_client and openssl s_server yesterday, as I feel that this might be the right way to find the root of the problem. However, commenting out the CRYPTO_memcmp results in no error being reported on the client or server side, and thus the connection succeeds and the SSL/TLS encryption works perfectly from there on. So please find attached an OS3 version of amissl_v102f.library with the CRYPTO_memcmp commented out so that you can test yourself that with that version everything works:

amisslv4_os3.tar.gz

However, of course, this isn't a solid solution and we still have to find out why either init_msg or tmp.peer_finish_md is incorrectly calculated. So that everyone is able to try to debug this situation here is a short howto with some helpful commands:

  1. Create a private key and certificate for the s_server execution:
$ openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365 -nodes
  2. Start s_server listening on port 4433:
$ openssl s_server -key key.pem -cert cert.pem -accept 4433 -debug -msg
  3. Now go to your Amiga system and execute the following command (make sure to replace <IP> with the local IP of the system on which you started the s_server process):
$ openssl s_client -connect <IP>:4433 -tls1_2 -cipher ECDHE-RSA-AES128-GCM-SHA256 -msg -debug

Using this command sequence one should see an output like:

[...]
write to 0x7fce9a5016f0 [0x7fce9b80f200] (6 bytes => 6 (0x6))
0000 - 14 03 03 00 01 01                                 ......
>>> ??? [length 0005]
    16 03 03 00 28
>>> TLS 1.2 Handshake [length 0010], Finished
    14 00 00 0c 27 a9 dd 55 76 a9 24 69 95 67 a3 02
[...]
<<< TLS 1.2 Alert [length 0002], fatal decrypt_error
    02 33
ERROR
140735216296016:error:1409441B:SSL routines:ssl3_read_bytes:tlsv1 alert decrypt error:s3_pkt.c:1461:SSL alert number 51
shutting down SSL

Please note the "TLS 1.2 Handshake" message: the hex numbers starting at "27 a9 dd 55..." are the actual SHA256 digest, which is not correct, and thus the client returns the "TLS 1.2 Alert" message to the server, ending up in the decrypt_error, as far as I understand it.

When using the version with commented out CRYPTO_memcmp() the handshake looks like:

[...]
write to 0x7fce9a5016f0 [0x7fce9c001000] (6 bytes => 6 (0x6))
0000 - 14 03 03 00 01 01                                 ......
>>> ??? [length 0005]
    16 03 03 00 28
>>> TLS 1.2 Handshake [length 0010], Finished
    14 00 00 0c c5 6f 7a c4 09 db 45 5f b8 6f 85 f7
[...]

So the final "Finished" handshake looks essentially the same, but the client simply doesn't return a fatal TLS alert, and thus the connection continues to work and the server is able to properly decrypt the messages sent by the client.
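
To make the hex dumps above easier to read: "16 03 03 00 28" is a 5-byte TLS record header (content type 22 = handshake, version 3.3 = TLS 1.2, length 0x28 = 40 bytes), and "14 00 00 0c" is a 4-byte handshake header (type 20 = Finished, length 12). The 12 bytes that follow are the Finished verify_data, which TLS 1.2 derives from a hash over the handshake messages. A minimal sketch for decoding both headers follows; the struct and function names are invented for illustration and are not OpenSSL APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative types; the names are made up for this sketch. */
struct tls_record_hdr {
    uint8_t  type;    /* 20 = change_cipher_spec, 21 = alert, 22 = handshake */
    uint8_t  major;   /* 3 for SSLv3 and all TLS 1.x versions */
    uint8_t  minor;   /* 3 for TLS 1.2 */
    uint16_t length;  /* length of the record payload in bytes */
};

struct tls_handshake_hdr {
    uint8_t  msg_type; /* 20 = Finished */
    uint32_t length;   /* 24-bit body length */
};

/* Decode a 5-byte record header. Returns 0 on success, -1 if n < 5. */
int parse_record_hdr(const uint8_t *p, size_t n, struct tls_record_hdr *out)
{
    if (n < 5)
        return -1;
    out->type   = p[0];
    out->major  = p[1];
    out->minor  = p[2];
    out->length = (uint16_t)((p[3] << 8) | p[4]);
    return 0;
}

/* Decode a 4-byte handshake header. Returns 0 on success, -1 if n < 4. */
int parse_handshake_hdr(const uint8_t *p, size_t n,
                        struct tls_handshake_hdr *out)
{
    if (n < 4)
        return -1;
    out->msg_type = p[0];
    out->length   = ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
    return 0;
}
```

Applied to the dumps above, the "[length 0010]" of the Finished message is the 4-byte header plus the 12 bytes of verify_data.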

Futaura commented on June 6, 2024

I've added some more debug output which might hopefully provide a clue. Could you upload another 68k amissl_v102f.library (with CRYPTO_memcmp enabled) so I can test it?

jens-maus commented on June 6, 2024

Please find attached the updated amisslv4_os3.tar.gz package with your debug output added:
amisslv4_os3.tar.gz

However, please see my latest checkin (a6c005f), as I think I have finally found the root cause of the problem. If you look at the openssl/crypto/sha/sha256.c file, there are two different sha256_block_data_order() function implementations separated by an OPENSSL_SMALL_FOOTPRINT define. As soon as I add this define, the TLS1.2 connections start working properly with the SHA256 digest. This is reproducible here.

However, the final question remains: what is so different in these two implementations that the default one fails on OS3/m68k, while with OPENSSL_SMALL_FOOTPRINT the sha256 digest starts to work immediately? It would be great, @Futaura, if you could investigate that with your ASM knowledge. I will also try to add some more debugging output along this function to see at which stage of the TLS negotiation the digest is calculated incorrectly.

Futaura commented on June 6, 2024

The sha256 code doesn't always fail though - it works fine in some cases, so probably dependent on the source data (which my debug outputs).

jens-maus commented on June 6, 2024

I tried to investigate a bit further and have added the following debug output to the standard sha256_block_data_order() function:

 ../openssl/crypto/sha/sha256.c:315:LITTLE ENDIAN BRANCH!!!: is_endian.little = 0, in=10ca4d6f, (in % 4) = 3, sizeof(SHA_LONG) = 4
  ../openssl/crypto/sha/sha256.c:315:LITTLE ENDIAN BRANCH!!!: is_endian.little = 0, in=10ca4d6f, (in % 4) = 3, sizeof(SHA_LONG) = 4
  ../openssl/crypto/sha/sha256.c:315:LITTLE ENDIAN BRANCH!!!: is_endian.little = 0, in=10ca4d5a, (in % 4) = 2, sizeof(SHA_LONG) = 4
  ../openssl/crypto/sha/sha256.c:315:LITTLE ENDIAN BRANCH!!!: is_endian.little = 0, in=10ca4d5a, (in % 4) = 2, sizeof(SHA_LONG) = 4

As you can see, in the case of the TLS handshake the sha256 calculation enters the little endian branch within this function. This happens because the openssl developers expect the "in" pointer to always be at an address divisible by 4; otherwise the little endian branch is executed with byte swapping, etc.

When removing the (in % 4) == 0 check, everything works correctly and the SHA256 digest is correctly calculated.

So can you perhaps explain why the openssl developers have implemented it like this, applying little endian aware calculations when addresses are not divisible by 4? And why does this work on OS4/PPC and only break on OS3/m68k?

Futaura commented on June 6, 2024

Presumably, those latest libs have OPENSSL_SMALL_FOOTPRINT defined? None of my test cases are failing now. That said, the source data is showing as identical in cases both where it would normally fail or succeed.

With your latest debug, could you try s_client -connect ssllabs.com:443 and s_client -connect ssllabs.com:443 -cipher ECDHE-RSA-AES128-GCM-SHA256? The former should succeed whereas the latter should fail, despite the ciphers being the same. Hopefully, this will confirm whether or not the addresses are a multiple of 4.

Futaura commented on June 6, 2024

It probably works on OS4 because of higher alignment restrictions (in is always divisible by 4). I've not found where in is allocated yet - might be on the stack. No explanation on the % 4 part yet - a few days ago I thought the little endian stuff looked a bit weird, especially compared to sha512.c.

Futaura commented on June 6, 2024

If I'm not mistaken the two branches of code in sha256_block_data_order are essentially identical on big endian hosts - there is no byte swapping, looking at the HOST_c2l macro when it comes to compiling on 68k. Forcing the OS4 version to use the HOST_c2l portion still works, so perhaps the compiler is messing something up there for 68k?
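
A simplified sketch of that branch selection, assuming the structure described in this thread: a fast path for big endian hosts with 4-byte aligned input, and a byte-wise path otherwise. The function names are hypothetical, and the real function goes on to run the SHA-256 rounds, which are omitted here. On a big endian host both paths fill X[] with the same values, which is the point made above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise big endian load; works at any address on any host. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Non-zero if the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    const union { long one; char little; } is_endian = { 1 };
    return is_endian.little;
}

/* Fill X[0..15] from one 64-byte block.
 * Returns 1 if the aligned big endian fast path was taken, else 0. */
int load_block(const unsigned char *in, uint32_t X[16])
{
    int i;

    if (!host_is_little_endian() && ((uintptr_t)in % 4) == 0) {
        /* Host byte order already matches the message byte order,
         * so the words can be taken over unchanged. */
        memcpy(X, in, 64);
        return 1;
    }

    /* Unaligned input or little endian host: assemble each word
     * from four single-byte reads. */
    for (i = 0; i < 16; i++)
        X[i] = load_be32(in + 4 * i);
    return 0;
}
```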

jens-maus commented on June 6, 2024

That could actually be the case, yes. Later I will try to debug this function a bit further. But perhaps you can see if you could debug the asm of that function to see what is really causing this?!? Of course we could also simply comment out that branch of the function. But I want to make sure that the issue is not caused by some other macro that might be used in other functions. Thus, I think we should really try to understand what exactly is causing the current behaviour.

Futaura commented on June 6, 2024

There is so much asm in that function - it is very hard to debug with debug output, etc. So, we know the small footprint version works, but if you could undefine OPENSSL_SMALL_FOOTPRINT and double check that the first branch in the other function works and the second branch doesn't, that would be useful - perhaps by replacing the in % 4 part with 0 or 1 to test each branch. Also, try different optimisation levels when compiling sha256.c. In the meantime, I'm going to rip the code out of sha256.c and create an independent test program to see if the failures still show up when compiling for 68k here - hopefully they will and then it will be easier for me to debug.

BTW, closed this by accident - so easy to do so on a tablet, with the close button being next to the comment button on the website!

Futaura commented on June 6, 2024

So, my sha256 test code is producing identical results on PPC and 68k, with both unaligned and aligned data. It may be worth outputting the K256 array from inside sha256_block_data_order to see if it matches the source code - this is an a4 relative data reference, so just curious if this could be a baserel/reloc issue (I'm guessing not, but am running out of ideas).

jens-maus commented on June 6, 2024

I doubt that outputting K256 will show any difference since this data is also used in the other working branch of the sha256_block_data_order() function.

But after some investigation I found the following.

Have a look at my debug output for the working OS4 code:

../openssl/crypto/sha/sha256.c:311:LITTLE ENDIAN BRANCH!!!: is_endian.little = 0, in=3dff301e, (in % 4) = 2, sizeof(SHA_LONG) = 4
../openssl/crypto/sha/sha256.c:312:a=6a09e667 b=bb67ae85 c=3c6ef372 d=a54ff53a e=510e527f f=9b05688c g=1f83d9ab h=5be0cd19
../openssl/crypto/sha/sha256.c:316:0: 5a 30 a0 bc
../openssl/crypto/sha/sha256.c:316:1: 14 cd 66 20
../openssl/crypto/sha/sha256.c:316:2: 0a 32 2f d3
../openssl/crypto/sha/sha256.c:316:3: e2 46 cb 4a
../openssl/crypto/sha/sha256.c:316:4: 31 88 fd 62
../openssl/crypto/sha/sha256.c:316:5: 0d 66 4e 05
../openssl/crypto/sha/sha256.c:316:6: 97 44 4a c3
../openssl/crypto/sha/sha256.c:316:7: 63 f6 60 d8
../openssl/crypto/sha/sha256.c:316:8: 0a d9 d9 a0
../openssl/crypto/sha/sha256.c:316:9: 23 ca 96 6b
../openssl/crypto/sha/sha256.c:316:10: 36 95 2c 32
../openssl/crypto/sha/sha256.c:316:11: 7b 55 8c f8
../openssl/crypto/sha/sha256.c:316:12: c0 97 47 3f
../openssl/crypto/sha/sha256.c:316:13: 74 05 fc 65
../openssl/crypto/sha/sha256.c:316:14: cf 48 34 9d
../openssl/crypto/sha/sha256.c:316:15: a5 1d 05 c9
../openssl/crypto/sha/sha256.c:384:a=a3bc1ba8 b=4faf3a0e c=1c94a644 d=71e3e048 e=edb43d81 f=b8851361 g=cabcb9b4 h=7b91697c
../openssl/crypto/sha/sha256.c:386:X[0] = 5a30a0bc
../openssl/crypto/sha/sha256.c:386:X[1] = 14cd6620
../openssl/crypto/sha/sha256.c:386:X[2] = 0a322fd3
../openssl/crypto/sha/sha256.c:386:X[3] = e246cb4a
../openssl/crypto/sha/sha256.c:386:X[4] = 3188fd62
../openssl/crypto/sha/sha256.c:386:X[5] = 0d664e05
../openssl/crypto/sha/sha256.c:386:X[6] = 97444ac3
../openssl/crypto/sha/sha256.c:386:X[7] = 63f660d8
../openssl/crypto/sha/sha256.c:386:X[8] = 0ad9d9a0
../openssl/crypto/sha/sha256.c:386:X[9] = 23ca966b
../openssl/crypto/sha/sha256.c:386:X[10] = 36952c32
../openssl/crypto/sha/sha256.c:386:X[11] = 7b558cf8
../openssl/crypto/sha/sha256.c:386:X[12] = c097473f
../openssl/crypto/sha/sha256.c:386:X[13] = 7405fc65
../openssl/crypto/sha/sha256.c:386:X[14] = cf48349d
../openssl/crypto/sha/sha256.c:386:X[15] = a51d05c9

Please note that each X[i] should be essentially the same as the "i:" line above it. So by looking at the example OS4 output you will notice that X[0] = 5a30a0bc, and above that you will see "0: 5a 30 a0 bc", which shows that the values are equal; the same holds for the rest of the X array.

But now see the output for the broken OS3/m68k code:

../openssl/crypto/sha/sha256.c:311:LITTLE ENDIAN BRANCH!!!: is_endian.little = 0, in=10ce964a, (in % 4) = 2, sizeof(SHA_LONG) = 4
../openssl/crypto/sha/sha256.c:312:a=9a00f688 b=40d8fa6d c=4b8007dc d=0c3df705 e=ab9bd7a2 f=907e556e g=d5972d6b h=7549a15f
../openssl/crypto/sha/sha256.c:316:0: e9 23 4d d9
../openssl/crypto/sha/sha256.c:316:1: 62 c4 7a 07
../openssl/crypto/sha/sha256.c:316:2: 30 0e d5 9a
../openssl/crypto/sha/sha256.c:316:3: 5f b1 89 a4
../openssl/crypto/sha/sha256.c:316:4: fe 64 46 3e
../openssl/crypto/sha/sha256.c:316:5: 25 47 e0 19
../openssl/crypto/sha/sha256.c:316:6: 5e 0e 4e d6
../openssl/crypto/sha/sha256.c:316:7: c9 13 39 55
../openssl/crypto/sha/sha256.c:316:8: ff 42 6b 14
../openssl/crypto/sha/sha256.c:316:9: 09 49 1f f5
../openssl/crypto/sha/sha256.c:316:10: 94 a9 3f 14
../openssl/crypto/sha/sha256.c:316:11: b4 b0 d2 90
../openssl/crypto/sha/sha256.c:316:12: b4 a8 30 5d
../openssl/crypto/sha/sha256.c:316:13: 88 fb 8b c6
../openssl/crypto/sha/sha256.c:316:14: fe 4f ea 5e
../openssl/crypto/sha/sha256.c:316:15: 80 41 fc 0a
../openssl/crypto/sha/sha256.c:384:a=fa85fd5f b=cea098e7 c=a4bf6e58 d=81a45ffb e=0f8681cd f=7412ded1 g=ffb087e0 h=a3a40f68
../openssl/crypto/sha/sha256.c:386:X[0] = e9234dd9
../openssl/crypto/sha/sha256.c:386:X[1] = 62c47a07
../openssl/crypto/sha/sha256.c:386:X[2] = 300ed59a
../openssl/crypto/sha/sha256.c:386:X[3] = 0ed59a5f
../openssl/crypto/sha/sha256.c:386:X[4] = d59a5fb1
../openssl/crypto/sha/sha256.c:386:X[5] = 9a5fb189
../openssl/crypto/sha/sha256.c:386:X[6] = 5fb189a4
../openssl/crypto/sha/sha256.c:386:X[7] = b189a4fe
../openssl/crypto/sha/sha256.c:386:X[8] = 89a4fe64
../openssl/crypto/sha/sha256.c:386:X[9] = a4fe6446
../openssl/crypto/sha/sha256.c:386:X[10] = fe64463e
../openssl/crypto/sha/sha256.c:386:X[11] = 64463e25
../openssl/crypto/sha/sha256.c:386:X[12] = 463e2547
../openssl/crypto/sha/sha256.c:386:X[13] = 3e2547e0
../openssl/crypto/sha/sha256.c:386:X[14] = 2547e019
../openssl/crypto/sha/sha256.c:386:X[15] = 47e0195e

Please note that here X[0] to X[2] show correct output, but starting with X[3] the values of X become strangely shuffled. For example, X[3] = 0ed59a5f is reported, while it should be 5f b1 89 a4. If you look closer and analyze the differences, you will notice that the start of X[3] actually consists of bytes from X[2]: the 0ed59a at the start of X[3] is the tail of X[2] = 300ed59a.

I still have no explanation for this, but to me it currently looks as if HOST_c2l() is indeed at least partly to blame here.

Opinions?

Futaura commented on June 6, 2024

Can you let me know the full GCC command line for 68k that sha256.c is compiled with? I cannot reproduce the issue here so far, but I'll compare the asm from your build and mine - that debug output is ideal, and at least I now know where to look in the asm.

Futaura commented on June 6, 2024

I can reproduce it now. Looks like a GCC bug - I get the problem (fails at X[1] for me) when compiling with -fbaserel. It goes wrong in conjunction with -O2 or -O3. Works here with -O1. Going to have a closer look at the asm now.

jens-maus commented on June 6, 2024

Well, what should I say. I think I have at least identified the compiler flag that seems to trigger the problem. It doesn't seem to be the -Ox option but the -fomit-frame-pointer option, which I usually use to make the code a bit smaller and which didn't show any such problems in the past. However, now that you suggested looking at the compiler optimization flags, I removed -fomit-frame-pointer and voila, it worked instantly. I could even raise the optimization level back to -O3 with only the -fomit-frame-pointer option removed.

While removing this compiler option seems to have solved the "digest check failed" problem, I still don't really understand why this option has a negative effect on the OS3/m68k target. So it would be great if someone could continue to debug the ASM code and try to understand the effects of this option. Please find attached a new debug version of AmiSSL based on the latest sources in the master branch. Please check whether this version really works flawlessly now.

amisslv4_os3.tar.gz

Futaura commented on June 6, 2024

I'm not so sure -fomit-frame-pointer is the real cause - using it or not of course alters the register usage. Same could be said for -fbaserel triggering it. Whatever, it seems the optimiser is barfing on the source code, trashing the data variable. Here's a snippet of the assembly, with some comments:

     990: 2a6f 0098       moveal %sp@(152),%fp   ; put data in a5
     994: 52af 0098       addql #1,%sp@(152)     ; should be adding 4, not 1
     998: 101d            moveb %fp@+,%d0        ; HOST_c2l *((c)++)
     99a: 2c00            movel %d0,%d6
     99c: 7218            moveq #24,%d1
     99e: e3ae            lsll %d1,%d6
     9a0: 4280            clrl %d0
     9a2: 101d            moveb %fp@+,%d0        ; HOST_c2l *((c)++)
     9a4: 7210            moveq #16,%d1
     9a6: e3a8            lsll %d1,%d0
     9a8: 8c80            orl %d0,%d6
     9aa: 4280            clrl %d0
     9ac: 101d            moveb %fp@+,%d0        ; HOST_c2l *((c)++)
     9ae: e188            lsll #8,%d0
     9b0: 8c80            orl %d0,%d6
     9b2: 4280            clrl %d0
     9b4: 101d            moveb %fp@+,%d0        ; HOST_c2l *((c)++)
     9b6: 8c80            orl %d0,%d6
     9b8: 2f46 00b0       movel %d6,%sp@(176)
     9bc: 2408            movel %a0,%d2
     9be: ec9a            rorl #6,%d2
     9c0: 2008            movel %a0,%d0
     9c2: 7215            moveq #21,%d1
     9c4: e3b8            roll %d1,%d0
     9c6: b182            eorl %d0,%d2
     9c8: 2008            movel %a0,%d0
     9ca: ef98            roll #7,%d0
     9cc: b182            eorl %d0,%d2
     9ce: d4af 00a4       addl %sp@(164),%d2
     9d2: 2208            movel %a0,%d1
     9d4: 2609            movel %a1,%d3
     9d6: c283            andl %d3,%d1
     9d8: 2008            movel %a0,%d0
     9da: 4680            notl %d0
     9dc: c0af 00a8       andl %sp@(168),%d0
     9e0: b181            eorl %d0,%d1
     9e2: d481            addl %d1,%d2
     9e4: 4bec 807e       lea %a4@(-32642),%fp   ; load K256[] address in a5
...
     a30: 2a6f 0098       moveal %sp@(152),%fp   ; get data (now incorrect)
     a34: 52af 0098       addql #1,%sp@(152)

The code doesn't always compile like this - when the first few HOST_c2l()s succeed, it is because %fp is only used for data, and is not reloaded from the stack (unlike above), so it just works as %fp is incremented. It's only when some other operation tries to use %fp that it is then reloaded from the stack when data is referenced (with %fp not having been saved to the stack beforehand, so the value loaded is incorrect).
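
The effect of a source pointer that advances by 1 instead of 4 can be modelled in plain C. This is only a model of the symptom, not the generated code: the function name and the good_steps parameter are hypothetical. With good_steps = 2 and bad_step = 1 it reproduces the shuffled X[3] onwards from the OS3 debug output earlier in this thread (for example X[3] = 0ed59a5f and X[15] = 47e0195e).

```c
#include <stddef.h>
#include <stdint.h>

/* Model only: load 16 big endian words from in[], advancing the read
 * position by 4 bytes for the first good_steps words and by bad_step
 * bytes after every later word. Words that would read past in_len
 * are set to 0. */
void load_words_model(const unsigned char *in, size_t in_len,
                      uint32_t X[16], int good_steps, size_t bad_step)
{
    size_t pos = 0;
    int i;

    for (i = 0; i < 16; i++) {
        if (pos + 4 > in_len) {
            X[i] = 0;           /* not enough input left for this word */
            continue;
        }
        X[i] = ((uint32_t)in[pos]     << 24) |
               ((uint32_t)in[pos + 1] << 16) |
               ((uint32_t)in[pos + 2] << 8)  |
                (uint32_t)in[pos + 3];
        /* A correct loader always advances by 4 here. */
        pos += (i < good_steps) ? 4 : bad_step;
    }
}
```

Calling it with good_steps = 16 and bad_step = 4 gives the expected, unshuffled words for comparison.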

The HOST_c2l() stuff is used elsewhere too, I think. I guess the only real reason for it to be used on non 4-byte aligned addresses is to protect against unaligned accesses - if you try to read a 32-bit number from an odd address on a 68000, for example, it will crash (works fine on 68020+ of course). So, another thing to look at is why the input buffer is not aligned in the first place. Not sure what the alignment of malloc() is on OS3?

Futaura commented on June 6, 2024

As far as HOST_c2l() is concerned, the SHA256 code works when redefined as follows, even with -fomit-frame-pointer, -O3 and baserel:

#define HOST_c2l(c,l)    l =(((unsigned long)(c[0]))<<24),      \
                         l|=(((unsigned long)(c[1]))<<16),      \
                         l|=(((unsigned long)(c[2]))<< 8),      \
                         l|=(((unsigned long)(c[3]))    ), c += 4

The generated code for the above looks suboptimal though, with this being a more compact (faster?) approach in general:

#define HOST_c2l(c,l)   __asm__ ("moveb %1@+,%0\n" \
                                 "lsll  #8,%0\n"   \
                                 "moveb %1@+,%0\n" \
                                 "lsll  #8,%0\n"   \
                                 "moveb %1@+,%0\n" \
                                 "lsll  #8,%0\n"   \
                                 "moveb %1@+,%0\n" \
                                 : "=d" (l), "=a" (c) : "0" (l), "1" (c))
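
For anyone who wants to check a replacement macro on a host compiler first, here is a small sketch that compares the post-increment form with the indexed form. The macro names C2L_POSTINC and C2L_INDEXED and the helper c2l_forms_agree are made up for this example; the post-increment variant is written in the style used in md32_common.h for big endian loads.

```c
#include <stddef.h>

/* Post-increment form: four single-byte reads, each bumping c by one. */
#define C2L_POSTINC(c,l) (l =(((unsigned long)(*((c)++)))<<24), \
                          l|=(((unsigned long)(*((c)++)))<<16), \
                          l|=(((unsigned long)(*((c)++)))<< 8), \
                          l|=(((unsigned long)(*((c)++)))    ))

/* Indexed form: four indexed reads, then a single c += 4. */
#define C2L_INDEXED(c,l) (l =(((unsigned long)((c)[0]))<<24), \
                          l|=(((unsigned long)((c)[1]))<<16), \
                          l|=(((unsigned long)((c)[2]))<< 8), \
                          l|=(((unsigned long)((c)[3]))    ), (c) += 4)

/* Returns 1 if both forms decode nwords words from buf identically
 * and leave their pointers at the same position, otherwise 0. */
int c2l_forms_agree(const unsigned char *buf, size_t nwords)
{
    const unsigned char *p = buf;
    const unsigned char *q = buf;
    unsigned long a, b;
    size_t i;

    for (i = 0; i < nwords; i++) {
        C2L_POSTINC(p, a);
        C2L_INDEXED(q, b);
        if (a != b || p != q)
            return 0;
    }
    return 1;
}
```

In correct C both forms are equivalent, so a disagreement on the target would point at the compiler rather than at the macro.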

Futaura commented on June 6, 2024

My fear is without knowing the root cause of the bug in GCC, the problem can still occur elsewhere when using -fno-omit-frame-pointer. It is clear it is a bug in the code generation linked to optimisation, triggered by using -fbaserel, -fomit-frame-pointer and -O2/O3 simultaneously. If you were to add -fno-rerun-cse-after-loop, for example, it also makes the problem go away, albeit probably in an undesirable manner. Then there is the matter of a4 and a5 both being unavailable for usage, when not using -fomit-frame-pointer, which probably impacts performance.

Unfortunately, I am not familiar with the GCC sources at all, so it is no good me trying to pinpoint the cause. Is this something Gunther or somebody else would be willing/able to do? I'm not talking about fixing it, but simply finding out the best way to avoid it.

from amissl.

jens-maus avatar jens-maus commented on June 6, 2024

I know what you mean, and you are right that simply omitting -fomit-frame-pointer is not enough and just fixes the symptoms. Thanks for the alternatives for the HOST_c2l() macro. I think I will use one of those instead and reintroduce the omit-frame-pointer flag, because I think we should fix each issue where we identify it rather than work around it with different optimization flags we don't fully understand. And simply not using omit-frame-pointer might easily introduce other issues.

Regarding patching GCC, I am not entirely sure we would be able to do that on our own without completely reverse engineering the OS3 GCC patches. We would definitely need help from Gunther Nikl here. Unfortunately, he hasn't been active for a while and hasn't replied to emails lately. But I still hope he will return soon and probably also take care of ticket #1.

In addition, I think we should no longer consider OS3/m68k a primary platform, but concentrate on the PPC platforms. Another important thing, IMHO, would be to try to get baserel support reintegrated into the newer OS4 GCC sources, or to try to port AmiSSL to MorphOS with their latest GCC5 sources, which seem to come with baserel support again.

So if the OS3/m68k version works reliably now with your modified HOST_c2l version (I guess I would prefer the ASM version), then we should move on and concentrate on finalizing AmiSSLv4. Opinions?

from amissl.

jens-maus avatar jens-maus commented on June 6, 2024

Please note my latest checkin with your asm HOST_c2l() macro. First tests suggest that everything is working now. However, I would really appreciate it if you could rework your ASM macro to use the same syntax the OpenSSL group is using for their optimized ASM macros (see https://github.com/jens-maus/amissl/blob/master/openssl/crypto/md32_common.h#L221). They are using a mixture of C and asm, which IMHO would be more appropriate. In addition, it would be great to also have an asm version of HOST_l2c(), because I at least have the plan/idea to submit our OpenSSL changes to the OpenSSL developers.

from amissl.

Futaura avatar Futaura commented on June 6, 2024

Yes, the ASM HOST_c2l() would be better, although as we are not supporting 68000, we could also just perform an unaligned memory access, as some other platforms do - see https://github.com/jens-maus/amissl/blob/master/openssl/crypto/md32_common.h#L249. I'm not sure which is best in terms of performance. This could also be used for PPC, but as I mentioned earlier, I'm pretty sure the input data is always long word aligned on PPC, so the HOST_c2l case will never be reached there anyway.

I agree that we can only move on from this, as there is not much more we can do. As an aside, I noticed there is some PPC ASM optimised code that is not being used in the OS4 build yet, which would be good to use.

from amissl.

jens-maus avatar jens-maus commented on June 6, 2024

OK, then see if you can enable it, or point me to it, please.

from amissl.

Futaura avatar Futaura commented on June 6, 2024

OK - I added something - what do you think? There is no need to mix C in with the ASM since we do not need any temporary storage and the ASM modifies the variables directly. Not sure if I got the __mc680x0 defines correct. Could add a ROTATE() too.

from amissl.

Futaura avatar Futaura commented on June 6, 2024

Actually, forget ROTATE() - GCC already nicely translates the default macro to use ror and rol for 68k.
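
For reference, a rotate written in plain C looks like the sketch below. The names `ROTATE32` and `rotl32` are hypothetical, and the macro is a typical shift-and-or formulation rather than a verbatim copy of the OpenSSL header. Compilers generally recognise this idiom and can emit a single rotate instruction for it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: rotate a 32-bit value left by n bits (1..31). */
#define ROTATE32(a, n) \
    ((((uint32_t)(a)) << (n)) | (((uint32_t)(a)) >> (32 - (n))))

/* Hypothetical wrapper that rejects shift counts the macro cannot handle
 * (n == 0 or n == 32 would shift by the full width, which is undefined). */
static uint32_t rotl32(uint32_t a, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return (uint32_t)ROTATE32(a, n);
}
```

Checking the generated assembly for a function like `rotl32` is a quick way to confirm whether a hand-written ROTATE() would gain anything on a given target.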

from amissl.

jens-maus avatar jens-maus commented on June 6, 2024

This looks fine to me. However, if you really just want to use the ASM for __mc68000 I think we should remove it completely as AmiSSL will NEVER be released for pure 68000 so this is just dead code.

from amissl.

jens-maus avatar jens-maus commented on June 6, 2024

Please note my latest checkin where I simplified the whole story about HOST_c2l() by just using the already existing unaligned versions of HOST_c2l() and HOST_l2c(). What do you think? IMHO this should be a good compromise.

Please note, btw, that for the OS4/PPC build the HOST_c2l() branch of the function is also reached. But on OS4/PPC the HOST_c2l() seems to work without any problems.

from amissl.

Futaura avatar Futaura commented on June 6, 2024

Yes - wasn't sure what the performance impact is with an unaligned long access vs 4 individual byte accesses. My hunch would be that an unaligned access would be faster. We also need to consider the case where the data order is little endian (e.g. MD5). I had hoped to commit this tonight, but am experiencing issues with the compiler creating bad code for that (am using 3 lines of ASM to convert the value from little to big endian) just like before. Once that is working, time to move on.
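
To illustrate the little-endian case: digests such as MD5 define their words in little-endian order, so on a big-endian CPU a word fetched with a single 32-bit read has to be byte-swapped afterwards. The following is a minimal portable sketch, not the ASM discussed here, and the names `bswap32` and `load_le32` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Hypothetical helper: read a little-endian 32-bit value byte by byte.
 * The result does not depend on host endianness or on alignment. */
static uint32_t load_le32(const unsigned char *p)
{
    assert(p != NULL);
    return  (uint32_t)p[0]        | ((uint32_t)p[1] <<  8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

On a big-endian target that tolerates unaligned long reads, one 32-bit load followed by `bswap32()` should yield the same value as `load_le32()`, which gives a simple reference to compare an optimised macro against.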

from amissl.

Futaura avatar Futaura commented on June 6, 2024

This has turned into a complete nightmare... I have found that GCC can still even mess up the data variable when using those simple unaligned access macros. Therefore, I have added some really basic ASM macros which replicate the C macro, forcing data to be saved properly. They really should not be necessary and do not improve performance, but it is the only way I can see to avoid this issue in all the modules where these macros are used.

I have also added a new HOST_c2l and HOST_l2c for the little endian case which does an optimised byte swap, and also increments the data variable in asm for the same reason as above. I did initially try mixing the asm with a c+=4, like the other platforms, which would be nicer, but the compiler mangled the code again.

from amissl.

jens-maus avatar jens-maus commented on June 6, 2024

Sorry Oliver, but I had to revert your latest change to the HOST_c2l() and HOST_l2c() functions, since it broke the whole SSL/TLS handshake, so that openssl s_client wasn't able to establish a connection at all. I need to investigate this further, as I would still prefer to have the ASM stuff there, but perhaps you can already look at it carefully again. Please find attached a new binary set with your changes applied, which should show the problem:

amissl4_os3.tar.gz

from amissl.

jens-maus avatar jens-maus commented on June 6, 2024

Please note that with your ASM changes both MD5 and SHA256 seem to be broken. Here the sha256t and md5test test applications already return an error. So all of your macros seem to be somehow not completely correct. I hope you can find a final solution for this.

from amissl.

Futaura avatar Futaura commented on June 6, 2024

Found the issue - the optimiser is making HASH_MAKE_STRING() completely disappear. Just trying to figure out if there is a better way to fix than using asm volatile (which I'd rather not use, if possible).

from amissl.

Futaura avatar Futaura commented on June 6, 2024

OK, so it is mainly HOST_l2c that is the problem since GCC has no real way of knowing that both those macros are writing to memory, therefore the optimiser can remove that code when it determines that the output variables are not subsequently accessed.

After a lot of testing, the only real solution is to use asm volatile instead of asm, at the very least for the two HOST_l2c, but probably best to do so for HOST_c2l too. The ASM code itself is fine.

Could you try that and retest to see if it definitely works?
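
As an illustration of the point about the optimiser, here is a hedged sketch of a store macro. The m68k branch is an untested assumption written in the MIT syntax used earlier in this thread, and it is not the macro that was committed; the names `STORE_BE32` and `store_words` are hypothetical. The idea is that `volatile` stops GCC from deleting the asm when its output operand looks unused, and the "memory" clobber declares that the asm writes to memory. On any other target the plain C fallback is compiled instead.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && defined(__mc68020__)
/* Untested assumption for 68020+: one move.l stores l big-endian at c and
 * post-increments c. "volatile" keeps the asm from being optimised away,
 * and the "memory" clobber tells GCC that memory is modified. */
# define STORE_BE32(c, l) \
    __asm__ volatile ("movel %1,%0@+" : "+a" (c) : "d" (l) : "memory")
#else
/* Portable fallback with the same effect: four byte stores, then c += 4. */
# define STORE_BE32(c, l) do {                                  \
        (c)[0] = (unsigned char)(((l) >> 24) & 0xff);           \
        (c)[1] = (unsigned char)(((l) >> 16) & 0xff);           \
        (c)[2] = (unsigned char)(((l) >>  8) & 0xff);           \
        (c)[3] = (unsigned char)( (l)        & 0xff);           \
        (c) += 4;                                               \
    } while (0)
#endif

/* Hypothetical helper: write n words big-endian to dst and return the
 * advanced destination pointer. */
static unsigned char *store_words(unsigned char *dst,
                                  const unsigned long *in, size_t n)
{
    size_t i;

    assert(dst != NULL && in != NULL);
    for (i = 0; i < n; i++) {
        unsigned long l = in[i];

        STORE_BE32(dst, l);
    }
    return dst;
}
```

Without `volatile` or a memory clobber, an asm whose only declared output is the pointer can legally be dropped when that pointer is not read again, which matches the symptom described for HASH_MAKE_STRING().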

from amissl.

jens-maus avatar jens-maus commented on June 6, 2024

Thanks for the checkin. Now, after having merged openssl-1.1.x into master and having applied your proposed "asm volatile" change, the HOST_l2c and HOST_c2l macros finally seem to work and I am not able to reproduce these crashes anymore. Thus, I will close this ticket as solved.

from amissl.
