awslabs / aws-c-s3
C99 library implementation for communicating with the S3 service, designed for maximizing throughput on high bandwidth EC2 instances.
License: Apache License 2.0
I've been testing a combination of the latest aws-c-s3 update regarding #285 and the ability to stream data of unknown size to S3 using a publisher.
I have two concerns aside from the errors I'm getting:
Platform: MacBook Pro, Ventura 13.4, 32GB, Apple M1 Max ARM64 processor.
AWS CLI: aws-cli/2.12.1 Python/3.11.4 Darwin/22.5.0 source/arm64
TargetThroughput: 20.0Gbps
Minimum Part Size: 1000000L (I think this causes issues after 128M)
I've mocked up data which is just a string of 128 bytes that I send over and over in a ByteBuffer.
Errors include a failed response (400) from the AWS SDK, missing checksums for parts, and a SIGSEGV in libobjc.A.dylib... I've also received a SIGABRT which doesn't even give me a dump.
Attaching a simple Java project to test - configure whichever credentials and a bucket name and execute - as you increase the number of lines you'll start to see issues. The CRT log is created in whatever your working directory happens to be.
Files upload successfully, and at least 100 GB S3 objects can be managed.
Time to close the stream doesn't grow so drastically with file size as it does now.
Crashes, failed uploads, heavy delay on completing the upload.
aws.sdk.version 2.20.79, aws.crt.version 0.22.1
openjdk 17.0.3 2022-04-19
Darwin US10MAC44VWYPKH 22.5.0 Darwin Kernel Version 22.5.0: Mon Apr 24 20:52:24 PDT 2023; root:xnu-8796.121.2~5/RELEASE_ARM64_T6000 arm64 arm Darwin
AWS has a growing list of instance types with multiple NICs. There should be some way to configure the S3 client to use a particular network interface (or multiple network interfaces?)
Maximum throughput cannot be achieved with all instances of the S3 client using just the one default network interface.
Mountpoint (which uses aws-c-s3) recently received this feature request: awslabs/mountpoint-s3#815
poudriere interactive jail for:
package name: aws-c-s3-0.1.47_1
building for: FreeBSD 131amd64-devel 13.1-RELEASE FreeBSD 13.1-RELEASE amd64
-- The C compiler identification is Clang 13.0.0
make test
(...)
174 - test_s3_copy_source_prefixed_by_slash (Failed)
175 - test_s3_copy_source_prefixed_by_slash_multipart (Failed)
178 - test_s3_list_bucket_valid (Failed)
Errors while running CTest
Output from these tests are in: /wrkdirs/usr/ports/devel/aws-c-s3/work/.build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
FAILED: CMakeFiles/test.util
cd /wrkdirs/usr/ports/devel/aws-c-s3/work/.build && /usr/local/bin/ctest --force-new-ctest-process
ninja: build stopped: subcommand failed.
*** Error code 1
ctest --force-new-ctest-process --rerun-failed --output-on-failure
Results:
Thanks
Hello,
I have noticed that when using the AWS CRT to upload S3 objects with auto-ranged PUT requests, occasionally when the server sends an error response, instead of the meta request immediately terminating and invoking `struct aws_s3_meta_request_options.finish_callback`, the meta request will be retried until the maximum number of retry attempts has been met, at which point the finish callback will be invoked with either `AWS_IO_SOCKET_CLOSED` or `AWS_ERROR_HTTP_CONNECTION_CLOSED`.
After some investigation, I believe this happens if and only if the request body is still in the process of being transmitted when the error response (with a `Connection: close` header) has been received, followed by a subsequent TCP FIN or TLS close_notify alert.

When the `Connection: close` header is received, `aws-c-http/source/h1_connection.c:s_decoder_on_header()` sets `incoming_stream->is_final_stream = true`. Then in `aws-c-http/source/h1_connection.c:s_decoder_on_done()`, after seeing that `is_final_stream` is set, `s_stop(connection, true /*stop_reading*/, false /*stop_writing*/, false /*schedule_shutdown*/, AWS_ERROR_SUCCESS);` is called, which sets `connection->thread_data.is_reading_stopped = true`, but does not indicate at all to its leftward slot or the channel as a whole to stop reading.
Later in `aws-c-http/source/h1_connection.c:s_decoder_on_done()`, if `incoming_stream->is_outgoing_message_done` is set, then `s_stream_complete()` is called, and (since `is_final_stream` is set) `s_connection_close()` is called, and `s_stop(connection, false /*stop_reading*/, false /*stop_writing*/, true /*schedule_shutdown*/, AWS_ERROR_SUCCESS);` finally leads to `aws_channel_shutdown()` being called.

But if `is_outgoing_message_done` is not set, then the leftward slot will continue its read loop and immediately encounter the TCP FIN (when cleartext HTTP is being used) or TLS close_notify alert (when HTTPS is being used). In the HTTP case, the `aws_socket_read()` call in `aws-c-io/socket_channel_handler.c:s_do_read()` will get a zero-length read and raise `AWS_IO_SOCKET_CLOSED`.
In the HTTPS case (at least in s2n builds), the `s2n_recv()` call in `aws-c-io/source/s2n/s2n_tls_channel_handler.c:s_s2n_handler_process_read_message()` will get a zero-length read, the close_notify alert will be logged, and the call will return `AWS_OP_SUCCESS`. I haven't quite tracked down the exact code path yet, but this ends up with `aws-c-http/source/h1_connection.c:s_stream_complete()` being invoked with `AWS_ERROR_HTTP_CONNECTION_CLOSED`.
This `AWS_IO_SOCKET_CLOSED` or `AWS_ERROR_HTTP_CONNECTION_CLOSED` value eventually propagates to `aws-c-s3/source/s3_meta_request.c:s_s3_meta_request_send_request_finish_helper()`. Since this error code does not match `AWS_ERROR_S3_INVALID_RESPONSE_STATUS` or `AWS_ERROR_S3_NON_RECOVERABLE_ASYNC_ERROR`, `finish_code` is set to `AWS_S3_CONNECTION_FINISH_CODE_RETRY`, and the meta request is tried again despite the error response from the S3 server potentially being non-recoverable.
I am not sure if the proper solution to this lies in aws-c-s3 (e.g. changing the logic for when a meta request should be retried), aws-c-http (e.g. signal its leftward slot to stop reading once a complete `Connection: close` response has been processed), aws-c-io (e.g. change the socket and TLS handler logic to be able to stop reading immediately when signaled by its rightward slot), or some combination of the three, but I figured that raising the issue in the top-level repo would make the most sense. Let me know if this issue would be more appropriate in one of the other CRT repos instead.
It may also make sense to consider:

- Logic to immediately stop transmitting a request when a response that indicates failure has been received (perhaps if the HTTP response status >= 300) to avoid potentially triggering a RST from the server.
- Ensuring that a response is gracefully handled even if a RST, TLS alert, or write error is received after having received a complete response. I have seen instances where instead of the zero read in `aws-c-io/source/posix/socket.c:aws_socket_read()`, an `ECONNRESET` was received; looking at a pcap of the stream, a RST had been received immediately after the FIN.
I have also seen `AWS_IO_TLS_ERROR_WRITE_FAILURE` raised after the close_notify TLS alert was received, but this case also ends up with `AWS_ERROR_HTTP_CONNECTION_CLOSED` being propagated to `aws-c-s3/source/s3_meta_request.c:s_s3_meta_request_send_request_finish_helper()`.
Here are some log snippets of such failures that I was able to reproduce using both HTTP and HTTPS, against both the AWS S3 servers and a third-party, nominally S3-compatible server that I spun up locally for testing. Note that the `AWS_LL_TRACE` messages that were logged are listed with level DEBUG due to my logger implementation.
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Reading from body stream.
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Sending 16384 bytes of body, progress: 703902/804335
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming response status: 403 (Forbidden).
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming header: x-amz-request-id: DCT2S3BF391M7NX6
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming header: x-amz-id-2: tm8C9/WzAZn6vxs1/g9q36SVl4NvfTW1fgcwuY8nf6AkEeA2ZJsE7UCUIiUevjXX2tXomJ8LR30=
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming header: Content-Type: application/xml
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming header: Transfer-Encoding: chunked
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming header: Date: Thu, 02 Mar 2023 21:09:54 GMT
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming header: Server: AmazonS3
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming header: Connection: close
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Received 'Connection: close' header. This will be the final stream on this connection.
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Main header block done.
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming body: 344 bytes received.
[DEBUG]: AWS [00007f1b114fc700] [S3MetaRequest] - id=0x61600006ea80 Incoming body for request 0x611000059f00. Response status: 403. Data Size: 344. connection: 0x60400006dfd0.
[DEBUG]: AWS [00007f1b114fc700] [S3MetaRequest] - response body:
[DEBUG]: <?xml version="1.0" encoding="UTF-8"?>
[DEBUG]: <Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>minioadmin</AWSAccessKeyId><RequestId>DCT2S3BF391M7NX6</RequestId><HostId>tm8C9/WzAZn6vxs1/g9q36SVl4NvfTW1fgcwuY8nf6AkEeA2ZJsE7UCUIiUevjXX2tXomJ8LR30=</HostId></Error>
[INFO]: AWS [00007f1b114fc700] [socket] - id=0x613000053b00 fd=19: zero read, socket is closed
[DEBUG]: AWS [00007f1b114fc700] [task-scheduler] - id=0x61800007f680: Scheduling channel_shutdown task for immediate execution
[DEBUG]: AWS [00007f1b114fc700] [channel] - id=0x61800007f480: Channel shutdown is already pending, not scheduling another.
[DEBUG]: AWS [00007f1b114fc700] [task-scheduler] - id=0x61800007f680: Running channel_shutdown task with <Running> status
[DEBUG]: AWS [00007f1b114fc700] [channel] - id=0x61800007f480: beginning shutdown process
[DEBUG]: AWS [00007f1b114fc700] [channel] - id=0x61800007f480: handler 0x61200009b2c0 shutdown in read dir completed.
[DEBUG]: AWS [00007f1b114fc700] [channel] - id=0x61800007f480: handler 0x61700007eb08 shutdown in read dir completed.
[DEBUG]: AWS [00007f1b114fc700] [task-scheduler] - id=0x61800007f4a8: Scheduling (null) task for immediate execution
[DEBUG]: AWS [00007f1b114fc700] [task-scheduler] - id=0x61800007f4a8: Running (null) task with <Running> status
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Stream completed with error code 1051 (AWS_IO_SOCKET_CLOSED).
[INFO]: AWS [00007f1b114fc700] [http-connection] - id=0x61700007eb00: Shutting down connection with error code 0 (AWS_ERROR_SUCCESS).
[DEBUG]: AWS [00007f1b114fc700] [channel] - id=0x61800007f480: Channel shutdown is already pending, not scheduling another.
[DEBUG]: AWS [00007f1b114fc700] [S3MetaRequest] - id=0x61600006ea80: Request 0x611000059f00 finished with error code 1051 (aws-c-io: AWS_IO_SOCKET_CLOSED, socket is closed.) and response status 403
[ERROR]: AWS [00007f1b114fc700] [S3MetaRequest] - id=0x61600006ea80 Meta request failed from error 1051 (socket is closed.). (request=0x611000059f00, response status=403). Try to setup a retry.
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Reading from body stream.
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Sending 16384 bytes of body, progress: 720286/804335
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming response status: 403 (Forbidden).
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming header: x-amz-request-id: B5VDFTXHSRW9HVK3
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming header: x-amz-id-2: vJaT/N9Q9JXQkGu91IP3VH4ak//uge8vt9hDXolY8PJp06bsvNo0SkMoLPy9fpAjXRkDTeKh84Y=
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming header: Content-Type: application/xml
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming header: Transfer-Encoding: chunked
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming header: Date: Thu, 02 Mar 2023 21:09:54 GMT
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming header: Server: AmazonS3
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming header: Connection: close
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Received 'Connection: close' header. This will be the final stream on this connection.
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Main header block done.
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming body: 344 bytes received.
[DEBUG]: AWS [00007f1b10cfb700] [S3MetaRequest] - id=0x61600006ea80 Incoming body for request 0x611000059f00. Response status: 403. Data Size: 344. connection: 0x60400006dfd0.
[DEBUG]: AWS [00007f1b10cfb700] [S3MetaRequest] - response body:
[DEBUG]: <?xml version="1.0" encoding="UTF-8"?>
[DEBUG]: <Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>minioadmin</AWSAccessKeyId><RequestId>B5VDFTXHSRW9HVK3</RequestId><HostId>vJaT/N9Q9JXQkGu91IP3VH4ak//uge8vt9hDXolY8PJp06bsvNo0SkMoLPy9fpAjXRkDTeKh84Y=</HostId></Error>
[INFO]: AWS [00007f1b10cfb700] [socket] - id=0x613000045b00 fd=19: socket is closed.
[DEBUG]: AWS [00007f1b10cfb700] [task-scheduler] - id=0x61800006f680: Scheduling channel_shutdown task for immediate execution
[DEBUG]: AWS [00007f1b10cfb700] [channel] - id=0x61800006f480: Channel shutdown is already pending, not scheduling another.
[DEBUG]: AWS [00007f1b10cfb700] [task-scheduler] - id=0x61800006f680: Running channel_shutdown task with <Running> status
[DEBUG]: AWS [00007f1b10cfb700] [channel] - id=0x61800006f480: beginning shutdown process
[DEBUG]: AWS [00007f1b10cfb700] [channel] - id=0x61800006f480: handler 0x6120000832c0 shutdown in read dir completed.
[DEBUG]: AWS [00007f1b10cfb700] [channel] - id=0x61800006f480: handler 0x61700005ec08 shutdown in read dir completed.
[DEBUG]: AWS [00007f1b10cfb700] [task-scheduler] - id=0x61800006f4a8: Scheduling (null) task for immediate execution
[DEBUG]: AWS [00007f1b10cfb700] [task-scheduler] - id=0x61800006f4a8: Running (null) task with <Running> status
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Stream completed with error code 1051 (AWS_IO_SOCKET_CLOSED).
[INFO]: AWS [00007f1b10cfb700] [http-connection] - id=0x61700005ec00: Shutting down connection with error code 0 (AWS_ERROR_SUCCESS).
[DEBUG]: AWS [00007f1b10cfb700] [channel] - id=0x61800006f480: Channel shutdown is already pending, not scheduling another.
[DEBUG]: AWS [00007f1b10cfb700] [S3MetaRequest] - id=0x61600006ea80: Request 0x611000059f00 finished with error code 1051 (aws-c-io: AWS_IO_SOCKET_CLOSED, socket is closed.) and response status 403
[ERROR]: AWS [00007f1b10cfb700] [S3MetaRequest] - id=0x61600006ea80 Meta request failed from error 1051 (socket is closed.). (request=0x611000059f00, response status=403). Try to setup a retry.
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Reading from body stream.
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Sending 16331 bytes of body, progress: 244362/804335
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming response status: 403 (Forbidden).
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming header: Accept-Ranges: bytes
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming header: Content-Length: 314
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming header: Content-Type: application/xml
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming header: Server: MinIO
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming header: Vary: Origin
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming header: Date: Thu, 02 Mar 2023 19:09:19 GMT
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming header: Connection: close
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Received 'Connection: close' header. This will be the final stream on this connection.
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Main header block done.
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming body: 314 bytes received.
[DEBUG]: AWS [00007fe36ccfb700] [S3MetaRequest] - id=0x61600003ed80 Incoming body for request 0x611000059f00. Response status: 403. Data Size: 314. connection: 0x604000027fd0.
[DEBUG]: AWS [00007fe36ccfb700] [S3MetaRequest] - response body:
[DEBUG]: <?xml version="1.0" encoding="UTF-8"?>
[DEBUG]: <Error><Code>RequestTimeTooSkewed</Code><Message>The difference between the request time and the server's time is too large.</Message><Resource>/test0/twocities.txt</Resource><RequestId></RequestId><HostId>64e90a68-95ff-4541-bf12-7e354f1dc058</HostId></Error>
[DEBUG]: AWS [00007fe36ccfb700] [tls-handler] - id=0x61600007cb80: Alert code 0
[DEBUG]: AWS [00007fe36ccfb700] [task-scheduler] - id=0x61800003f680: Scheduling channel_shutdown task for immediate execution
[DEBUG]: AWS [00007fe36ccfb700] [task-scheduler] - id=0x61800003f680: Running channel_shutdown task with <Running> status
[DEBUG]: AWS [00007fe36ccfb700] [channel] - id=0x61800003f480: beginning shutdown process
[DEBUG]: AWS [00007fe36ccfb700] [channel] - id=0x61800003f480: handler 0x61200005b6c0 shutdown in read dir completed.
[DEBUG]: AWS [00007fe36ccfb700] [tls-handler] - id=0x61600007cb80: Shutting down read direction with error code 0
[DEBUG]: AWS [00007fe36ccfb700] [channel] - id=0x61800003f480: handler 0x61600007cb80 shutdown in read dir completed.
[DEBUG]: AWS [00007fe36ccfb700] [channel] - id=0x61800003f480: handler 0x61700005b788 shutdown in read dir completed.
[DEBUG]: AWS [00007fe36ccfb700] [task-scheduler] - id=0x61800003f4a8: Scheduling (null) task for immediate execution
[DEBUG]: AWS [00007fe36ccfb700] [task-scheduler] - id=0x61800003f4a8: Running (null) task with <Running> status
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Stream completed with error code 2058 (AWS_ERROR_HTTP_CONNECTION_CLOSED).
[INFO]: AWS [00007fe36ccfb700] [http-connection] - id=0x61700005b780: Shutting down connection with error code 0 (AWS_ERROR_SUCCESS).
[DEBUG]: AWS [00007fe36ccfb700] [channel] - id=0x61800003f480: Channel shutdown is already pending, not scheduling another.
[DEBUG]: AWS [00007fe36ccfb700] [S3MetaRequest] - id=0x61600003ed80: Request 0x611000059f00 finished with error code 2058 (aws-c-http: AWS_ERROR_HTTP_CONNECTION_CLOSED, The connection has closed or is closing.) and response status 403
[ERROR]: AWS [00007fe36ccfb700] [S3MetaRequest] - id=0x61600003ed80 Meta request failed from error 2058 (The connection has closed or is closing.). (request=0x611000059f00, response status=403). Try to setup a retry.
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Reading from body stream.
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Sending 16331 bytes of body, progress: 228031/804335
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming response status: 403 (Forbidden).
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming header: Accept-Ranges: bytes
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming header: Content-Length: 314
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming header: Content-Type: application/xml
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming header: Server: MinIO
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming header: Vary: Origin
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming header: Date: Thu, 02 Mar 2023 19:09:19 GMT
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming header: Connection: close
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Received 'Connection: close' header. This will be the final stream on this connection.
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Main header block done.
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming body: 314 bytes received.
[DEBUG]: AWS [00007fe36d4fc700] [S3MetaRequest] - id=0x61600003ed80 Incoming body for request 0x611000059f00. Response status: 403. Data Size: 314. connection: 0x604000027fd0.
[DEBUG]: AWS [00007fe36d4fc700] [S3MetaRequest] - response body:
[DEBUG]: <?xml version="1.0" encoding="UTF-8"?>
[DEBUG]: <Error><Code>RequestTimeTooSkewed</Code><Message>The difference between the request time and the server's time is too large.</Message><Resource>/test0/twocities.txt</Resource><RequestId></RequestId><HostId>64e90a68-95ff-4541-bf12-7e354f1dc058</HostId></Error>
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61100006a840: Scheduling socket_written_task task for immediate execution
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61100006a840: Running socket_written_task task with <Running> status
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61700007b710: Scheduling http1_connection_outgoing_stream task for immediate execution
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61700007b710: Running http1_connection_outgoing_stream task with <Running> status
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Reading from body stream.
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Sending 16331 bytes of body, progress: 244362/804335
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61100006a840: Scheduling socket_written_task task for immediate execution
[DEBUG]: AWS [00007fe36d4fc700] [tls-handler] - id=0x61600008d080: Alert code 0
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61800004f680: Scheduling channel_shutdown task for immediate execution
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61100006a840: Running socket_written_task task with <Running> status
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61700007b710: Scheduling http1_connection_outgoing_stream task for immediate execution
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61800004f680: Running channel_shutdown task with <Running> status
[DEBUG]: AWS [00007fe36d4fc700] [channel] - id=0x61800004f480: beginning shutdown process
[DEBUG]: AWS [00007fe36d4fc700] [channel] - id=0x61800004f480: handler 0x6120000748c0 shutdown in read dir completed.
[DEBUG]: AWS [00007fe36d4fc700] [tls-handler] - id=0x61600008d080: Shutting down read direction with error code 0
[DEBUG]: AWS [00007fe36d4fc700] [channel] - id=0x61800004f480: handler 0x61600008d080 shutdown in read dir completed.
[DEBUG]: AWS [00007fe36d4fc700] [channel] - id=0x61800004f480: handler 0x61700007b688 shutdown in read dir completed.
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61800004f4a8: Scheduling (null) task for immediate execution
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61700007b710: Running http1_connection_outgoing_stream task with <Running> status
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Reading from body stream.
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Sending 16331 bytes of body, progress: 260693/804335
[ERROR]: AWS [00007fe36d4fc700] [http-connection] - id=0x61700007b680: Failed to send message in write direction, error 1031 (AWS_IO_TLS_ERROR_WRITE_FAILURE). Closing connection.
[INFO]: AWS [00007fe36d4fc700] [http-connection] - id=0x61700007b680: Shutting down connection with error code 1031 (AWS_IO_TLS_ERROR_WRITE_FAILURE).
[DEBUG]: AWS [00007fe36d4fc700] [channel] - id=0x61800004f480: Channel shutdown is already pending, not scheduling another.
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61800004f4a8: Running (null) task with <Running> status
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Stream completed with error code 2058 (AWS_ERROR_HTTP_CONNECTION_CLOSED).
[INFO]: AWS [00007fe36d4fc700] [http-connection] - id=0x61700007b680: Shutting down connection with error code 0 (AWS_ERROR_SUCCESS).
[DEBUG]: AWS [00007fe36d4fc700] [channel] - id=0x61800004f480: Channel shutdown is already pending, not scheduling another.
[DEBUG]: AWS [00007fe36d4fc700] [S3MetaRequest] - id=0x61600003ed80: Request 0x611000059f00 finished with error code 2058 (aws-c-http: AWS_ERROR_HTTP_CONNECTION_CLOSED, The connection has closed or is closing.) and response status 403
[ERROR]: AWS [00007fe36d4fc700] [S3MetaRequest] - id=0x61600003ed80 Meta request failed from error 2058 (The connection has closed or is closing.). (request=0x611000059f00, response status=403). Try to setup a retry.
The CRT library versions that I'm currently using are
aws-c-auth: v0.6.25
aws-c-cal: v0.5.21
aws-c-common: v0.8.12
aws-c-compression: v0.2.16
aws-checksums: v0.1.14
aws-c-http: v0.7.5
aws-c-io: v0.13.18
aws-c-s3: v0.2.5
aws-c-sdkutils: v0.1.7
s2n-tls: v1.3.38
but I had seen this behavior for many months across multiple earlier versions
before finding the time to try to track down the error to a reportable issue.
Let me know if there is any additional information that you would like me to
provide.
Thank you,
Alex
I'm trying to run the samples given; however, I don't know what I'm doing wrong when running this line of code. Previously I added my key and secret key as the environment variables "AWS_ACCESS_KEY_ID" and "AWS_SECRET_ACCESS_KEY".
I tried to run this command in the Ubuntu CLI:
aws-c-s3/build/samples/s3/s3 ls s3://pruebaxpn --region us-west-2
However, this error pops up:
Failure while listing objects. Please check if you have valid credentials and s3 path is correct. Error: aws-c-s3: AWS_ERROR_S3_INVALID_RESPONSE_STATUS, Invalid response status from request
What am I doing wrong? Thank you in advance.
Hello!
During static analysis, a suspected copy-paste error was identified in s3_util.c:208. There are no differences between the two blocks (lines 192-196 and 208-214): lines 192 and 208 contain the same condition, "if (signing_config->service.len > 0) {". I think there should be signed_body_value instead of service in line 208. Please clarify whether this is right.
Should be, in line 208:
if (signing_config->signed_body_value.len > 0) {
Currently, in line 208:
if (signing_config->service.len > 0) {
This occurs when calling
struct aws_cached_signing_config_aws *aws_cached_signing_config_new(
struct aws_allocator *allocator,
const struct aws_signing_config_aws *signing_config)
Change "if (signing_config->service.len > 0) {" to "if (signing_config->signed_body_value.len > 0) {"
v0.3.18
g++ 8.3.0
OS Linux Debian
CompleteMultipartUpload in auto-ranged PUT failed due to a missing second (and final) UploadPart.
We ran into this problem with aws-c-s3 0.1.51 and aws-sdk-cpp 1.10.54 on Linux.
Our API issued an S3CrtClient->PutObject, and it resulted in the following error:
Expected: 'x is ok', with x := 'output_blob_->close()' [av::status::Status]
x = PutObject() failed
where: cloud/aws/s3/s3_streambuf.cc:93
extra: s3://perception-prod-training-data/opt/a831200c/s2a/2023-02-15-bless-collect_dking_updateOverlapFeb10_latestIssues/test/36c6f8923fe514d6b5a28ac5dbdea034.rats: HTTP response code: 400
Resolved remote host IP address:
Request ID: 2BAQ8WRZGGH3PPG1
Exception name: InvalidPart
Error message: Unable to parse ExceptionName: InvalidPart Message: One or more of the specified parts could not be found. The part may not have been uploaded, or the specified entity tag may not match the part's entity tag.
7 response headers:
connection : close
content-type : application/xml
date : Wed, 15 Feb 2023 16:39:26 GMT
server : AmazonS3
transfer-encoding : chunked
x-amz-id-2 : oFLatf5OZVV1Ny34iYIEhqA3Ft/+QRXSLzz6/K/c36nd4grUqR6kmhKJS/U32GftV1/GdmVEbOY=
x-amz-request-id : 2BAQ8WRZGGH3PPG1
Backtrace (most recent call first)
#10 at 0x560a5b826f76 in av::cloud::aws::s3::S3Ostreambuf::PutObject()
#9 at 0x560a5b82b63d in av::cloud::aws::s3::S3Ostreambuf::close()
#8 at 0x560a5b822170 in av::cloud::aws::s3::S3Ostream::close()
#7 at 0x560a5af329b8 in av::perception::s2a::DataExtractionModule::on_shutdown()
#6 at 0x560a5bf40593 in av::framework::Module::on_executor_shutdown()
#5 at 0x560a5ac5ed91 in av::detail::FuncImpl<>::invoke()
#4 at 0x560a5beacc8b in av::BaseThreadPool::do_next()
#3 at 0x560a5beacecd in std::thread::_State_impl<>::_M_run()
#2 at 0x7fc1887c76df in <?>
#1 at 0x7fc188ca26db in start_thread
The PutObject in frame 10 invokes the S3CrtClient->PutObject call.
Further investigation showed that the first UploadPart succeeded (visible in list-parts), but there was no evidence (neither list-parts nor API logs) that the second UploadPart completed.
In our logs, there was no further aws-c-s3 error indicating a failed operation.
The CompleteMultipartUpload request uses the ETags of the completed requests, so how could the CompleteMultipartUpload have been sent with the ETag of the second UploadPart? Perhaps it was sent with only 1 ETag (that of the first, successfully completed UploadPart).
Hi,
Thanks for this work. I am trying to understand what would be the simplest way to upload a file to S3.
I have seen your samples, but those are a bit more elaborate; they use command line arguments and then the relevant AWS functions to process them. However, I simply want to use C++ to upload a file to S3 without all the extra complexity.
Do you have example for this please?
Thank you
This is needed to support aws/aws-sdk-cpp#2477.
CopyObject support was added in #166, but client support as well as tests were disabled in #246. It seems the cause was the structure of the tests. As a result, CopyObject is currently neither supported nor continuously tested.
Please re-enable tests and client support for CopyObject.
When using a chunk size of 5MiB (default value of the C++ SDK), the final byte of the blob was not transferred, corrupting the download.
When transferring an S3 blob of size 31457281 (1 byte more than exactly 30 MiB), only 6 chunks were transferred, and the last chunk ended at 31457280. There should have been a 7th chunk that transferred the final byte.
I printed out the size of each `body_callback` write when this was invoked by the main C++ SDK:
partSize: 5242880
got 5242880
got 5242880
got 5242880
got 5242880
got 5242880
got 5242880
With a `partSize` (chunk size) of 8 MiB or 16 MiB, the problem did not occur; the chunks were divided so that the whole object was transferred:
got 8388608
got 8388608
got 8388608
got 6291457
In both cases, the code calculated `31457280` as the end of the object range. The end of the object (byte) range is set in `s_discover_object_range_and_content_length`:
// source/s3_auto_ranged_get.c
static int s_discover_object_range_and_content_length(
struct aws_s3_meta_request *meta_request,
struct aws_s3_request *request,
int error_code,
uint64_t *out_total_content_length,
uint64_t *out_object_range_start,
uint64_t *out_object_range_end) { /* ... */
case AWS_S3_AUTO_RANGE_GET_REQUEST_TYPE_PART: /* ... */
/* When discovering the object size via first-part, the object range is the entire object. */
object_range_start = 0;
object_range_end = total_content_length-1; // <=== HERE
// ...
}
To correctly calculate the end of the range and the number of chunks in a ranged-GET request, a value of content_length should be used instead of content_length-1.
After temporarily setting
// source/s3_auto_ranged_get.c
/* When discovering the object size via first-part, the object range is the entire object. */
object_range_start = 0;
object_range_end = total_content_length;
we now see that the final chunk of size 1B is sent when using a 5MiB chunk size, completing the download:
partSize: 5242880
got 5242880
got 5242880
got 5242880
got 5242880
got 5242880
got 5242880
got 1
Simply changing the object_range_end value does not seem right either: in Range headers, byte ranges are inclusive, starting at byte 0 and ending at byte n-1, whereas the content length counts n bytes. Hence, in order to fix the problem properly, both conventions need to be respected.
#360 introduced AWS_ERROR_HTTP_RESPONSE_FIRST_BYTE_TIMEOUT and others, which aren't defined
Package compiles
Does not compile
Compile package
No response
No response
0.3.23..0.4.0
gcc 12
debian 11 / buildroot 2023.08
We were experiencing stuck/hanging downloads and found 3 stalled TCP connections when debugging a stuck instance.
We have a program based on aws-c-s3 that downloads 9547 files from S3; it repeatedly got stuck during the download (6 confirmed cases). In the environment it runs in, the network speed is not very high.
Below are the results from debugging one such stuck program, which had hung for over an hour after downloading 9544 out of 9547 files. Corresponding to the 3 remaining files, it had 3 open TCP connections (lsof output):
avlog 32082 aurora 177u a_inode 0,13 0 11699 [eventfd]
avlog 32082 aurora 178u sock 0,8 0t0 931727 protocol: TCP
avlog 32082 aurora 179u sock 0,8 0t0 969081 protocol: TCP
avlog 32082 aurora 180u sock 0,8 0t0 927838 protocol: TCP
avlog 32082 aurora 181u sock 0,8 0t0 930058 protocol: TCP
avlog 32082 aurora 182u sock 0,8 0t0 931729 protocol: TCP
avlog 32082 aurora 183u sock 0,8 0t0 931730 protocol: TCP
avlog 32082 aurora 258u IPv4 928034 0t0 TCP car:41030->s3-us-east-1-r-w.amazonaws.com:https (ESTABLISHED)
avlog 32082 aurora 264u IPv4 898921 0t0 TCP car:53400->s3-us-east-1-r-w.amazonaws.com:https (ESTABLISHED)
avlog 32082 aurora 374u IPv4 943913 0t0 TCP car:48732->s3-us-east-1-r-w.amazonaws.com:https (ESTABLISHED)
avlog 32082 aurora 401w REG 0,44 601226302 228200 /tmp/aws_sdk_2022-08-18-15.log
Further data (from ss) showed that these connections had not been shut down yet:
avlog 32082 aurora 258u IPv4 RW,ND 928034 0t0 TCP car:41030->s3-us-east-1-r-w.amazonaws.com:https (ESTABLISHED)
avlog 32082 aurora 264u IPv4 RW,ND 898921 0t0 TCP car:53400->s3-us-east-1-r-w.amazonaws.com:https (ESTABLISHED)
avlog 32082 aurora 374u IPv4 RW,ND 943913 0t0 TCP car:48732->s3-us-east-1-r-w.amazonaws.com:https (ESTABLISHED)
After some searching/grepping, we found 3 "lost" connections in the log files, for which we looked up the requests:
[DEBUG] 2022-08-18 12:10:40.806 http-stream [140608895186688] id=0x7fe20de64400: Created client request on connection=0x7fe20de61f00: GET https://aurora-amendments.s3.us-east-1.amazonaws.com/20220716.052558Z.a579m-00509.VO-47484_PLT%40l1657949190.000000000u1657951584.000000000~isp_amen.e4b8797962d7c8ca2cfb67f9f106b8df/20220716.052558Z.a579m-00509.VO-47484_PLT%40l1657949190.000000000u1657951584.000000000~isp_amen.e4b8797962d7c8ca2cfb67f9f106b8df%23port_rear_camera.image_rgb.data.03123 HTTP/1.1
[DEBUG] 2022-08-18 12:59:24.799 http-stream [140608580613888] id=0x7fe1fb043200: Created client request on connection=0x7fe1fb061600: GET https://aurora-amendments.s3.us-east-1.amazonaws.com/20220716.052558Z.a579m-00509.VO-47484_PLT%40l1657949190.000000000u1657951584.000000000~isp_amen.e4b8797962d7c8ca2cfb67f9f106b8df/20220716.052558Z.a579m-00509.VO-47484_PLT%40l1657949190.000000000u1657951584.000000000~isp_amen.e4b8797962d7c8ca2cfb67f9f106b8df%23forward_center_camera.image_rgb.data.06782 HTTP/1.1
[DEBUG] 2022-08-18 14:22:34.721 http-stream [140608907769600] id=0x7fe20e036600: Created client request on connection=0x7fe20e07a700: GET https://aurora-amendments.s3.us-east-1.amazonaws.com/20220716.052558Z.a579m-00509.VO-47484_PLT%40l1657949190.000000000u1657951584.000000000~isp_amen.e4b8797962d7c8ca2cfb67f9f106b8df/20220716.052558Z.a579m-00509.VO-47484_PLT%40l1657949190.000000000u1657951584.000000000~isp_amen.e4b8797962d7c8ca2cfb67f9f106b8df%23starboard_rear_camera.image_rgb.data.03405 HTTP/1.1
All times are in UTC.
The last log entry was made at 15:24 UTC. Hence it looks as if there was no progress on these connections for over an hour.
By default, Linux does not enable TCP keep-alive, which would close (and thereby appropriately fail) stuck TCP connections. aws-c-s3 supports TCP keep-alive, but it needs to be enabled explicitly.
The Linux defaults are also too long (2 hours of idle time, plus roughly 11 minutes of probing):
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
Hence the values need to be lowered, ideally so that a stuck TCP connection is detected within 10 minutes.
We settled on an initial idle interval of 3 minutes (180 s), with 7 probes spaced 60 seconds apart. This detects a dead TCP connection after 10 minutes (180 + 7 × 60 = 600 seconds).
In our experience, before #204 was merged the following settings worked for us; they can easily be ported for use with aws_s3_endpoint_options:
--- a/source/s3_endpoint.c
+++ b/source/s3_endpoint.c
@@ -35,6 +35,10 @@
#include <math.h>
static const uint32_t s_connection_timeout_ms = 30000;
+/* SO_KEEPALIVE settings - detect dead connection after 10 minutes: */
+static const uint16_t s_keep_alive_interval_sec = 180;
+static const uint16_t s_keep_alive_timeout_sec = 60;
+static const uint16_t s_keep_alive_max_failed_probes = 7;
static const uint16_t s_http_port = 80;
static const uint16_t s_https_port = 443;
@@ -137,6 +141,12 @@ static struct aws_http_connection_manager *s_s3_endpoint_create_http_connection_
socket_options.type = AWS_SOCKET_STREAM;
socket_options.domain = AWS_SOCKET_IPV4;
socket_options.connect_timeout_ms = s_connection_timeout_ms;
+
+ socket_options.keepalive = true;
+ socket_options.keep_alive_interval_sec = s_keep_alive_interval_sec;
+ socket_options.keep_alive_timeout_sec = s_keep_alive_timeout_sec;
+ socket_options.keep_alive_max_failed_probes = s_keep_alive_max_failed_probes;
+
struct proxy_env_var_settings proxy_ev_settings;
AWS_ZERO_STRUCT(proxy_ev_settings);
/* Turn on envrionment variable for proxy by default */
I'm looking into benchmarking this library and came across this line, which could potentially be what I'm looking for. However, when I try to execute it, it's not one of the available tests. I have built with the flags laid out in:
mattbr@MacBook-Air ~/D/G/a/a/b/tests (main)> ./aws-c-s3-tests test_s3_get_performance
Available tests:
0. test_s3_copy_http_message
1. test_s3_message_util_assign_body
2. test_s3_ranged_get_object_message_new
3. test_s3_set_multipart_request_path
4. test_s3_create_multipart_upload_message_new
5. test_s3_upload_part_message_new
6. test_s3_complete_multipart_message_new
7. test_s3_abort_multipart_upload_message_new
8. test_s3_client_create_destroy
9. test_s3_client_monitoring_options_override
10. test_s3_client_proxy_ev_settings_override
11. test_s3_client_tcp_keep_alive_options_override
12. test_s3_client_max_active_connections_override
13. test_s3_client_get_max_active_connections
14. test_s3_request_create_destroy
15. test_s3_client_queue_requests
16. test_s3_meta_request_body_streaming
17. test_s3_update_meta_requests_trigger_prepare
18. test_s3_client_update_connections_finish_result
19. test_s3_client_exceed_retries
20. test_s3_client_acquire_connection_fail
21. test_s3_meta_request_fail_prepare_request
22. test_s3_meta_request_sign_request_fail
23. test_s3_meta_request_send_request_finish_fail
24. test_s3_auto_range_put_missing_upload_id
25. test_s3_cancel_mpu_create_not_sent
26. test_s3_cancel_mpu_create_completed
27. test_s3_cancel_mpu_one_part_completed
28. test_s3_cancel_mpu_all_parts_completed
29. test_s3_cancel_mpd_nothing_sent
30. test_s3_cancel_mpd_one_part_sent
31. test_s3_cancel_mpd_one_part_completed
32. test_s3_cancel_mpd_two_parts_completed
33. test_s3_cancel_mpd_head_object_sent
34. test_s3_cancel_mpd_head_object_completed
35. test_s3_cancel_mpd_get_without_range_sent
36. test_s3_cancel_mpd_get_without_range_completed
37. test_s3_cancel_prepare
38. test_s3_get_object_tls_disabled
39. test_s3_get_object_tls_enabled
40. test_s3_get_object_tls_default
41. test_s3_get_object_less_than_part_size
42. test_s3_get_object_empty_object
43. test_s3_get_object_multiple
44. test_s3_get_object_sse_kms
45. test_s3_get_object_sse_aes256
46. test_s3_get_object_backpressure_small_increments
47. test_s3_get_object_backpressure_big_increments
48. test_s3_get_object_backpressure_initial_size_zero
49. test_s3_no_signing
50. test_s3_signing_override
51. test_s3_put_object_tls_disabled
52. test_s3_put_object_tls_enabled
53. test_s3_put_object_tls_default
54. test_s3_multipart_put_object_with_acl
55. test_s3_put_object_multiple
56. test_s3_put_object_less_than_part_size
57. test_s3_put_object_empty_object
58. test_s3_put_object_with_part_remainder
59. test_s3_put_object_sse_kms
60. test_s3_put_object_sse_kms_multipart
61. test_s3_put_object_sse_aes256
62. test_s3_put_object_sse_aes256_multipart
63. test_s3_put_object_sse_c_aes256_multipart
64. test_s3_put_object_sse_c_aes256_multipart_with_checksum
65. test_s3_put_object_singlepart_no_content_md5_enabled
66. test_s3_put_object_singlepart_no_content_md5_disabled
67. test_s3_put_object_singlepart_correct_content_md5_enabled
68. test_s3_put_object_singlepart_correct_content_md5_disabled
69. test_s3_put_object_singlepart_incorrect_content_md5_enabled
70. test_s3_put_object_singlepart_incorrect_content_md5_disabled
71. test_s3_put_object_multipart_no_content_md5_enabled
72. test_s3_put_object_multipart_no_content_md5_disabled
73. test_s3_put_object_multipart_correct_content_md5_enabled
74. test_s3_put_object_multipart_correct_content_md5_disabled
75. test_s3_put_object_multipart_incorrect_content_md5_enabled
76. test_s3_put_object_multipart_incorrect_content_md5_disabled
77. test_s3_upload_part_message_with_content_md5
78. test_s3_upload_part_message_without_content_md5
79. test_s3_create_multipart_upload_message_with_content_md5
80. test_s3_complete_multipart_message_with_content_md5
81. test_s3_put_object_double_slashes
82. test_s3_round_trip
83. test_s3_round_trip_default_get
84. test_s3_round_trip_multipart_get_fc
85. test_s3_round_trip_default_get_fc
86. test_s3_round_trip_mpu_multipart_get_fc
87. test_s3_round_trip_mpu_multipart_get_with_list_algorithm_fc
88. test_s3_round_trip_mpu_default_get_fc
89. test_s3_chunked_then_unchunked
90. test_s3_cancel_mpu_one_part_completed_fc
91. test_s3_cancel_mpd_one_part_completed_fc
92. test_s3_meta_request_default
93. test_s3_put_object_fail_headers_callback
94. test_s3_put_object_fail_body_callback
95. test_s3_get_object_fail_headers_callback
96. test_s3_get_object_fail_body_callback
97. test_s3_default_fail_headers_callback
98. test_s3_default_invoke_headers_callback_on_error
99. test_s3_default_invoke_headers_callback_cancels_on_error
100. test_s3_get_object_invoke_headers_callback_on_error
101. test_s3_put_object_invoke_headers_callback_on_error
102. test_s3_put_object_invoke_headers_callback_on_error_with_user_cancellation
103. test_s3_default_fail_body_callback
104. test_s3_error_missing_file
105. test_s3_existing_host_entry
106. test_s3_put_fail_object_invalid_request
107. test_s3_put_fail_object_inputstream_fail_reading
108. test_s3_put_single_part_fail_object_inputstream_fail_reading
109. test_s3_put_object_clamp_part_size
110. test_s3_auto_ranged_get_sending_user_agent
111. test_s3_auto_ranged_put_sending_user_agent
112. test_s3_default_sending_meta_request_user_agent
113. test_s3_range_requests
114. test_s3_not_satisfiable_range
115. test_s3_bad_endpoint
116. test_s3_different_endpoints
117. test_s3_replace_quote_entities
118. test_s3_strip_quotes
119. test_s3_parse_content_range_response_header
120. test_s3_parse_content_length_response_header
121. test_s3_get_num_parts_and_get_part_range
122. test_s3_aws_xml_get_top_level_tag_with_root_name
123. test_add_user_agent_header
124. test_get_existing_compute_platform_info
125. test_get_nonexistent_compute_platform_info
126. sha1_nist_test_case_1
127. sha1_nist_test_case_2
128. sha1_nist_test_case_3
129. sha1_nist_test_case_4
130. sha1_nist_test_case_5
131. sha1_nist_test_case_5_truncated
132. sha1_nist_test_case_6
133. sha1_test_invalid_buffer
134. sha1_test_oneshot
135. sha1_test_invalid_state
136. sha256_nist_test_case_1
137. sha256_nist_test_case_2
138. sha256_nist_test_case_3
139. sha256_nist_test_case_4
140. sha256_nist_test_case_5
141. sha256_nist_test_case_5_truncated
142. sha256_nist_test_case_6
143. sha256_test_invalid_buffer
144. sha256_test_oneshot
145. sha256_test_invalid_state
146. crc32_nist_test_case_1
147. crc32_nist_test_case_2
148. crc32_nist_test_case_3
149. crc32_nist_test_case_4
150. crc32_nist_test_case_5
151. crc32_nist_test_case_5_truncated
152. crc32_nist_test_case_6
153. crc32_test_invalid_buffer
154. crc32_test_oneshot
155. crc32_test_invalid_state
156. crc32c_nist_test_case_1
157. crc32c_nist_test_case_2
158. crc32c_nist_test_case_3
159. crc32c_nist_test_case_4
160. crc32c_nist_test_case_5
161. crc32c_nist_test_case_5_truncated
162. crc32c_nist_test_case_6
163. crc32c_test_invalid_buffer
164. crc32c_test_oneshot
165. crc32c_test_invalid_state
166. verify_checksum_stream
167. verify_chunk_stream
168. test_s3_put_pause_resume_happy_path
169. test_s3_put_pause_resume_all_parts_done
170. test_s3_put_pause_resume_invalid_checksum
171. test_s3_list_bucket_init_mem_safety
172. test_s3_list_bucket_init_mem_safety_optional_copies
173. test_s3_list_bucket_valid
Failed: test_s3_get_performance is an invalid test name.
Encountered warnings (conversion from 'const uint64_t' to 'size_t') for the Windows 32-bit build at the following two places:
aws-c-s3/source/s3_meta_request.c
Line 701 in 245af3d
static const uint64_t s_response_body_error_buf_size = KB_TO_BYTES(1); /* uint64_t here */
/* ...... */
/* We may have an error body coming soon, so allocate a buffer for that error. */
aws_byte_buf_init(
&request->send_data.response_body_error, meta_request->allocator, s_response_body_error_buf_size); /* convert to size_t here */
And
Line 705 in 245af3d
struct aws_s3_client {
/* ...... */
/* Size of parts for files when doing gets or puts. This exists on the client as configurable option that is passed
* to meta requests for use. */
const uint64_t part_size; /* uint64_t here */
/* ...... */
};
/* ...... */
for (size_t buffer_index = 0; buffer_index < num_buffers; ++buffer_index) {
struct aws_s3_part_buffer *part_buffer =
aws_mem_calloc(client->allocator, 1, sizeof(struct aws_s3_part_buffer));
aws_byte_buf_init(&part_buffer->buffer, client->allocator, client->part_size); /* convert to size_t here */
aws_linked_list_push_back(free_list, &part_buffer->node);
++pool->num_allocated;
}
And build failed because warnings are treated as errors.
No issues with 64-bit build.
Would it be possible to avoid the HeadObject requests when doing a GET range request? I noticed this comment but I wonder if it's something that's feasible, or in the plans?
aws-c-s3/source/s3_auto_ranged_get.c
Line 166 in 83008e5
When reading data in Parquet format (e.g. data lake applications), the file footer needs to be read first, so an implementation that reads from S3 needs to start with a HeadObject request and thus already knows the object size. The data itself may then be read in several small range requests, so making redundant HeadObject requests for each of those adds up latency. I understand that this library is optimized for throughput, but it would be great if there was a way to have those performance benefits without introducing latency in cases where the amount of data read is small.
I'm not familiar with the internals of the auto-range request implementation, but maybe the first request could be made to the last range (at the end of the object) so that an Unsatisfiable error will be returned if the range is out of bounds?
No response
The Go requirement is not documented; I was able to solve it by installing Go 1.20.5 from source (the default Go on Ubuntu 20.04 is 1.14, I think, and with it I got an error about missing includes).
wget https://go.dev/dl/go1.20.5.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.20.5.linux-amd64.tar.gz
mkdir ~/.go
export GOROOT=/usr/local/go
export GOPATH=~/.go
export PATH=$PATH:$GOROOT/bin:$GOPATH/bin
sudo update-alternatives --install "/usr/bin/go" "go" "/usr/local/go/bin/go" 0
sudo update-alternatives --set go /usr/local/go/bin/go
After solving the go issue I am getting another error:
ubuntu@ip-172-31-30-217:/mnt/data/crtsdk$ cmake -S aws-checksums -B aws-checksums/build -DCMAKE_INSTALL_PREFIX=`pwd`
CMake Error at CMakeLists.txt:31 (include):
include could not find load file:
AwsCFlags
CMake Error at CMakeLists.txt:32 (include):
include could not find load file:
AwsCheckHeaders
CMake Error at CMakeLists.txt:33 (include):
include could not find load file:
AwsSharedLibSetup
CMake Error at CMakeLists.txt:34 (include):
include could not find load file:
AwsSanitizers
CMake Error at CMakeLists.txt:36 (include):
include could not find load file:
AwsFindPackage
CMake Error at CMakeLists.txt:37 (include):
include could not find load file:
AwsFeatureTests
CMake Error at CMakeLists.txt:117 (aws_set_common_properties):
Unknown CMake command "aws_set_common_properties".
-- Configuring incomplete, errors occurred!
See also "/mnt/data/crtsdk/aws-checksums/build/CMakeFiles/CMakeOutput.log".
Please advise on the correct way to compile. The first 3 projects do compile, but the 4th and later ones are missing those includes.
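Those Aws*.cmake helper modules (AwsCFlags, AwsFindPackage, ...) are installed by aws-c-common, so aws-c-common must be built and installed first, and every subsequent aws-c-* build must be pointed at that install prefix via CMAKE_PREFIX_PATH. A sketch of the full sequence, assuming all repos are cloned side by side (the exact dependency list varies by version; consult the aws-c-s3 README):

```shell
# Install everything into one shared prefix; later builds find the
# Aws*.cmake modules and the installed libraries through it.
PREFIX=$(pwd)/install

for dep in aws-c-common aws-checksums aws-c-cal aws-c-io \
           aws-c-compression aws-c-http aws-c-sdkutils aws-c-auth aws-c-s3; do
    cmake -S "$dep" -B "$dep/build" \
        -DCMAKE_INSTALL_PREFIX="$PREFIX" \
        -DCMAKE_PREFIX_PATH="$PREFIX"
    cmake --build "$dep/build" --target install
done
```

In the failing command above, only -DCMAKE_INSTALL_PREFIX was passed; adding -DCMAKE_PREFIX_PATH pointing at the prefix where aws-c-common was installed should resolve the "include could not find load file: AwsCFlags" errors.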
This problem occurs when passing an aws_s3_client_config.retry_strategy into the S3 client.
The aws_retry_strategy_new_[...]() functions all initialize the retry_strategy ref_count to 1, which is then relied upon in s_s3_client_finish_destroy_default to destroy the retry_strategy.
For a client-configured retry strategy, however, reference counting works out differently: the caller already holds the initial ref_count of 1, and aws_s3_client_new acquires an additional reference:
// source/s3_client.c
struct aws_s3_client *aws_s3_client_new(
struct aws_allocator *allocator,
const struct aws_s3_client_config *client_config) {
if (client_config->retry_strategy != NULL) {
aws_retry_strategy_acquire(client_config->retry_strategy); // <== HERE
client->retry_strategy = client_config->retry_strategy; // <== INITIAL REF COUNT IS NOW 2
} else {
struct aws_exponential_backoff_retry_options backoff_retry_options = {
.el_group = client_config->client_bootstrap->event_loop_group,
.max_retries = s_default_max_retries,
};
struct aws_standard_retry_options retry_options = {
.backoff_retry_options = backoff_retry_options,
};
client->retry_strategy = aws_retry_strategy_new_standard(allocator, &retry_options); // <== INITIAL REF COUNT = 1
}
}
As a result, the client-configured retry_strategy is never fully released in s_s3_client_finish_destroy_default, causing the code to hang.
Is there anywhere a set of simple examples for using aws-c-s3? The tests seem overly complex. It would be nice to have examples showing how to read an object, write an object, and list a bucket.
original issue: aws/aws-sdk-cpp#2822
When I try to copy an object from one folder to another in the same bucket, I get the error Invalid response status from request (aws-c-s3: AWS_ERROR_S3_INVALID_RESPONSE_STATUS). This happens only for CopyObject; for HeadObject, ListObjects, GetObject, and PutObject everything is correct.
The expected host URL is s3.giraffe360-mimosa.com, and the file should be copied
from
/cold-data/projects/db9768d14f7c4055aa7518e42b633888/floorplan/roomsketcher/floorplan-ALL-final_1_0.svg
to
/cold-data/projects/db9768d14f7c4055aa7518e42b633888/floorplan/roomsketcher/final_backups/floorplan-ALL-final_1_0_2024-01-24_13-03-01.svg
Based on the logs, I can see it uses the wrong host URL:
[INFO] 2024-01-24 11:03:01.675 AuthSigning [140735609304640] (id=0x7fff88004380) Signing successfully built canonical request for algorithm SigV4, with contents
HEAD
/projects/db9768d14f7c4055aa7518e42b633888/floorplan/roomsketcher/floorplan-ALL-final_1_0.svg
host:cold-data.giraffe360-mimosa.com
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20240124T110301Z
host;x-amz-content-sha256;x-amz-date
UNSIGNED-PAYLOAD
For other requests it is correct
[INFO] 2024-01-24 11:03:01.541 AuthSigning [140735592519232] (id=0x7fff840032d0) Signing successfully built canonical request for algorithm SigV4, with contents
PUT
/cold-data/projects/db9768d14f7c4055aa7518e42b633888/floorplan/roomsketcher/geometry.json
content-length:4073
content-md5:rg4HTU7lw0cuZltmQKY+2g==
content-type:binary/octet-stream
host:s3.giraffe360-mimosa.com
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20240124T110301Z
content-length;content-md5;content-type;host;x-amz-content-sha256;x-amz-date
UNSIGNED-PAYLOAD
cold-data is the bucket name.
Aws::Auth::AWSCredentials credentials(access_key, secret_key, session_token);
Aws::S3Crt::ClientConfiguration config;
config.endpointOverride = host_;
config.useVirtualAddressing = false;
config.verifySSL = false;
config.enableEndpointDiscovery = false;
config.enableHostPrefixInjection = false;
config.region = Aws::Region::US_EAST_1;
config.scheme = Aws::Http::Scheme::HTTPS;
config.disableMultiRegionAccessPoints = true;
config.disableS3ExpressAuth = true;
config.payloadSigningPolicy = Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy::Never;
config.useUSEast1RegionalEndPointOption = Aws::S3Crt::US_EAST_1_REGIONAL_ENDPOINT_OPTION::LEGACY;
config.enableEndpointDiscovery = false;
config.enableHostPrefixInjection = false;
config.version = Aws::Http::Version::HTTP_VERSION_3;
std::unique_ptr<Aws::S3Crt::S3CrtClient> client = std::make_unique<Aws::S3Crt::S3CrtClient>(credentials, config);
Aws::S3Crt::Model::CopyObjectRequest request;
request.SetBucket(bucket);
request.SetKey(dst_path);
request.SetCopySource(bucket + "/" + src_path);
auto outcome = client->CopyObject(request);
if (!outcome.IsSuccess()) {
std::cerr << "Failed to copy file " << src_path << " to " << dst_path
<< " in bucket: " << bucket
<< " with error: " << outcome.GetError().GetMessage()
<< " and error code: " << outcome.GetError().GetExceptionName();
}
No response
No response
1.11.249
gcc 13.1.0
Ubuntu 22.04.3 LTS
As far as I know, the API is undocumented (at least I can't find it in search engines, and it isn't listed on the official AWS docs links), but samples go a long way. Unfortunately, the only sample given is complicated by trying to make the s3 command act like aws s3, and it also uses private APIs. Standalone examples which only use the public API would go a long way.
I'm someone trying to use the API, but I'm not even sure where to start aside from calling aws_s3_client_new. I'm unable to follow what the given sample does because it uses private APIs which aren't installed when you install the library, and it's not clear from reading the public headers how to perform basic tasks like "get an object from S3."
No response
No response
Provide pagination metadata so that we can perform retrievals in parallel, i.e. an async ListObjectsV2. Today we never know how many pages there are, and we cannot pull a page until we know the page before it and its dynamic next-token.
Or suggest strategies. Thoughts:
We have a highly complex system writing streaming data to S3. On disaster recovery we need to scan the S3 objects, identify key patterns, and determine where we left off in order to restart at the correct location, across something like 10 million keys.
Today the maximum request is 1000 keys at a time, and there is no way to do this in parallel.
No response
No response
The aws-c-s3 library already supports ranged GETs for doing single byte-range requests in parallel. The S3 backend supports single byte ranges, but not multipart/byteranges, as the following experiment shows.
For a Range header containing bytes=0-499, the server responds with:
ETag: "adca0407e42a4f8b1caea85350a8d2ce-5"
Accept-Ranges: bytes
Content-Range: bytes 0-499/74153027
Content-Type: application/octet-stream
Server: AmazonS3
Content-Length: 500
Response code: 206
To a Range header containing bytes=0-499,500-502, the server responds with:
ETag:"adca0407e42a4f8b1caea85350a8d2ce-5"
Accept-Ranges: bytes
Content-Type: application/octet-stream
Server: AmazonS3
Content-Length: 74153027
Response code: 200
The absence of the Content-Range header and the response code (200 instead of 206) indicate that the backend does not support this optional feature.
Retrieving multiple ranges within a call is very useful, in particular when only a few selected ranges (e.g. chunks) of a large/huge blob are required. This saves round-trip-times and hence speeds up processing.
Multipart byte ranges could be emulated on the client side by the aws-c-s3 library, e.g. by dedicating a meta_request to the emulated multipart/byteranges transfer, issuing the individual single-range requests in parallel, and assembling their results into a multipart/byteranges response.
We are following the example by compiling the s3 demo copy program and running it on Ubuntu 20.04 using the command line:
nohup time aws-c-s3/build/samples/s3/s3 cp s3://vl-sample-dataset-kitti/Kitti/ ~/Kitti --region us-east-2 &
The program runs fine without error, but the number of received files is around 11K; namely, half the files are missing.
# Kitti is the folder of received files with s3 c demo
ubuntu@ip-172-31-30-217:~/Kitti$ du -sh .
6.1G .
ubuntu@ip-172-31-30-217:~/Kitti$ find . -type f | wc
11647 11647 391999
# Kitti2 is the data downloaded using aws s3 sync
ubuntu@ip-172-31-30-217:/mnt/data/crtsdk$ find ~/Kitti2/ | wc
22487 22487 1161674
ubuntu@ip-172-31-30-217:/mnt/data/crtsdk$ du -sh ~/Kitti2
12G /home/ubuntu/Kitti2
All files should be copied locally
Only 50% of the files are copied, there is no error.
The machine has enough disk space for the copy:
ubuntu@ip-172-31-30-217:~$ df -k .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 304681132 245898460 58766288 81% /
ubuntu@ip-172-31-30-217:~$
Attached below are the running logs: a listing via aws-c-s3 (the number of files is correctly 22K), a listing via the aws s3 command (again 22K), and the full output of the run that copied 11K files.
We have opened the bucket permissions so you can try on your own.
nohup time aws-c-s3/build/samples/s3/s3 cp s3://vl-sample-dataset-kitti/Kitti/ ~/Kitti --region us-east-2 &
No response
Interestingly, when looking at the number of download printouts,
grep "download: s3://vl-sample-dataset-kitti/Kitti/raw" nohup.out | sort -u | wc
I see something between 34K and 44K printouts. Maybe due to multithreading?
latest from repo compiled July 6, 2023
ubuntu@ip-172-31-30-217:~/Kitti$ cmake --version
cmake version 3.16.3
CMake suite maintained and supported by Kitware (kitware.com/cmake).
ubuntu@ip-172-31-30-217:~/Kitti$ uname -a
Linux ip-172-31-30-217 5.15.0-1039-aws #44~20.04.1-Ubuntu SMP Thu Jun 22 12:21:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
The instance was a t2.xlarge.
If you visit https://github.com/awslabs, most repos include an About summary that makes it simple to see, at a glance, what the repo is all about. This project does not contain About text.
While debugging a problem with a drastic reduction in transfer speed, I noticed that the client->stats.num_requests_in_flight count does not go back to 0 at the end of the transfer.
The last s3_client statement logged before the shutdown was:
[INFO] 2022-09-27 15:10:09.977 S3ClientStats [140233490298624] id=0x7f8aa4310180 Requests-in-flight(approx/exact):0/998 Requests-preparing:0 Requests-queued:0 Requests-network(get/put/default/total):0/0/0/0
Requests-streaming-waiting:0 Requests-streaming:0 Endpoints(in-table/allocated):0/0
[DEBUG] 2022-09-27 15:10:09.977 S3Client [140233490298624] id=0x7f8aa4310180 Client shutdown progress: starting_destroy_executing=0 body_streaming_elg_allocated=0 process_work_task_scheduled=0 process_work_ta
sk_in_progress=0 num_endpoints_allocated=0 finish_destroy=1
[DEBUG] 2022-09-27 15:10:09.977 S3Client [140233490298624] id=0x7f8aa4310180 Client finishing destruction.
Please note the 998 stats.num_requests_in_flight count in Requests-in-flight(approx/exact):0/998.
Since stats.num_requests_in_flight is used to regulate the number of new requests that are sent, the high exact requests-in-flight count (in contrast to the 0 approximate in-flight requests) caused the request rate to drop drastically.
The following debug log statement brought clarity:
// source/s3_request.c
static void s_s3_request_destroy(void *user_data) {
struct aws_s3_request *request = user_data;
if (request == NULL) {
return;
}
struct aws_s3_meta_request *meta_request = request->meta_request;
if (meta_request != NULL) {
struct aws_s3_client *client = meta_request->client;
if (client != NULL) {
aws_s3_client_notify_request_destroyed(client, request); // <== NOT CALLED WHEN client is NULL
}
}
if (request->tracked_by_client) { // <=== ADDED THIS STATEMENT TO LOGS
AWS_LOGF_ERROR(
AWS_LS_S3_REQUEST,
"id=%p REQUEST DESTROY meta=%p client=%p",
(void *)request,
(void *)meta_request,
(void *) (meta_request == NULL ? NULL : meta_request->client));
}
//...
}
In the log generated by the transfer, there were 1337 (actual number) entries of the following type:
[ERROR] 2022-09-27 15:19:24.323 S3Request [139957177939712] id=0x7f4a4d66af00 REQUEST DESTROY meta=0x7f489dd9b800 client=(nil)
According to the above code, this means that stats.num_requests_in_flight was off because the client field was NULL.
As a result, the transfer speed dropped drastically, from full speed to only a few requests per second.
The client field is needed to accurately maintain the stats.num_requests_in_flight field responsible for request-speed control.
If possible, find the code path that sets meta_request->client to NULL before destroying the request. Otherwise, a different approach is needed for maintaining the counter.
I believe this is responsible for an object metadata request failing when trying to copy objects using the AWS Java SDK v2 Transfer Manager API (and presumably for other clients using this library). I raised aws/aws-sdk-java-v2#3370 initially, then tracked the problem to the changes introduced in #166 .
The object metadata request will fail when talking to a local MinIO server. E.g. with the endpoint specified as localhost:9100, the Host header value should in theory be my-bucket.localhost:9100 (as it is for other SDK client requests), but is actually my-bucket.s3.us-west-2.amazonaws.com, to which MinIO will understandably respond with a 404. I don't know if this also fails when talking to Amazon S3 in regions other than us-west-2.
I'm not familiar enough with this code (and haven't touched C in too long) to offer a solution, but perhaps @cmello or someone else could take a look. No doubt the actual request endpoint is readily available somewhere and can be used to construct the Host header value.
The mem limiter provides a push-back mechanism on the scheduler when memory usage gets close to the limit.
With GETs there is a chicken-and-egg problem: we don't know the size of an object before doing a GET, and we want to avoid making an additional request to figure out that size beforehand (the extra round trips tank GET performance). So the CRT optimistically does a ranged GET with the configured part size to fetch the first ranged part and learn the overall size.
This approach works fine in most cases, but it unnecessarily slows down GETs when the part size is huge and the GETs themselves are small. For example, with a 1 GiB part size and 1 MiB files, the mem limiter can only schedule 4 GETs in parallel (assuming a 4 GiB memory limit), since it must account for the worst case of each GET returning a full 1 GiB part. In practice we should be able to schedule far more GETs in parallel, because they are all small.
refer to aws/aws-sdk-cpp#2922 for example of this in the wild
something better?
Download slows to a crawl on lots of small GETs if the part size is huge.
Set the part size to 1 GiB and observe downloads of 10,000 256 KiB files.
No response
No response
latest
every compiler
every os
We are seeing frequent/intermittent segmentation faults with v0.1.44 and aws-c-http v0.6.19; the problem is also present on master as of today (c1198ae):
[DEBUG] 2022-08-20 01:03:31.497 connection-manager [139657729799936] id=(nil): Acquire connection
FATAL: Received signal 11 (Segmentation fault)
Backtrace (most recent call first)
#10 <?> at 0x7f04b9f75980 in __restore_rt
#9 <?> at 0x55afd6f0535b in s_s3_client_acquired_retry_token
#8 <?> at 0x55afd6f60ff2 in s_exponential_retry_task
#7 <?> at 0x55afd6fceb39 in aws_task_run
#6 <?> at 0x55afd6fced90 in s_run_all
#5 <?> at 0x55afd6fcf1db in aws_task_scheduler_run_all
#4 <?> at 0x55afd6f6a215 in s_main_loop
#3 <?> at 0x55afd6fd3100 in thread_fn
#2 <?> at 0x55afd3f35304 in thread_metrics_start_routine
#1 <?> at 0x7f04b9f6a6db in start_thread
This happens on the first attempt, not on a retry.
It turns out that the `connection_manager` is `NULL` when it is needed:
```c
// aws-c-http/source/connection_manager.c
void aws_http_connection_manager_acquire_connection(
    struct aws_http_connection_manager *manager,
    aws_http_connection_manager_on_connection_setup_fn *callback,
    void *user_data) {

    AWS_LOGF_DEBUG(AWS_LS_HTTP_CONNECTION_MANAGER, "id=%p: Acquire connection", (void *)manager); // <=== HERE
    /* ... */
```
The above function is used as the `.acquire_http_connection` function pointer of the `s_s3_client_default_vtable`, and is invoked from `s_s3_client_acquired_retry_token` like this:
```c
// aws-c-s3/source/s3_client.c
static void s_s3_client_acquired_retry_token(
    struct aws_retry_strategy *retry_strategy,
    int error_code,
    struct aws_retry_token *token,
    void *user_data) {

    // ....

    client->vtable->acquire_http_connection(
        endpoint->http_connection_manager, s_s3_client_on_acquire_http_connection, connection);
}
```
So `endpoint->http_connection_manager` is `NULL` when it should not be.
At `debug` level, the logs do not reveal much, hence we used this patch to log more information:
[INFO] 2022-08-20 01:03:31.496 AuthSigning [139657322956544] (id=0x7f047f82c050) Http request successfully built final authorization value via algorithm SigV4, with contents
AWS4-HMAC-SHA256 Credential=ASIAYO6OYNH5WYXVUNEK/20220820/us-east-1/s3/aws4_request, SignedHeaders=content-length;content-type;host;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=0da7d
7a8a6257b602c2234599874dd5dba60f43ec6f51143efdc67a78269aced
[DEBUG] 2022-08-20 01:03:31.496 task-scheduler [139657528473344] id=0x7f04898cf158: Scheduling s3_client_process_work_task task for immediate execution
[DEBUG] 2022-08-20 01:03:31.496 task-scheduler [139657528473344] id=0x7f04898cf158: Running s3_client_process_work_task task with <Running> status
[DEBUG] 2022-08-20 01:03:31.496 S3Client [139657528473344] id=0x7f04898cf000 s_s3_client_process_work_default - Moving relevant synced_data into threaded_data.
[DEBUG] 2022-08-20 01:03:31.496 S3Client [139657528473344] id=0x7f04898cf000 s_s3_client_process_work_default - Processing any new meta requests.
[DEBUG] 2022-08-20 01:03:31.496 S3Client [139657528473344] id=0x7f04898cf000 Updating meta requests.
[DEBUG] 2022-08-20 01:03:31.496 S3Client [139657528473344] id=0x7f04898cf000 Updating connections, assigning requests where possible.
[ERROR] 2022-08-20 01:03:31.496 S3Client [139657528473344] id=0x7f04898cf000 0x7f04a4fd1ca0 s_s3_client_create_connection_for_request_default: (nil)
[DEBUG] 2022-08-20 01:03:31.497 exp-backoff-strategy [139657528473344] id=0x7f049a6c64d0: Initializing retry token 0x7f048bc12100
[DEBUG] 2022-08-20 01:03:31.497 task-scheduler [139657729799936] id=0x7f048bc12160: Scheduling aws_exponential_backoff_retry_task task for immediate execution
[INFO] 2022-08-20 01:03:31.497 S3ClientStats [139657528473344] id=0x7f04898cf000 Requests-in-flight(approx/exact):1/1 Requests-preparing:0 Requests-queued:0 Requests-network(get/put/default/total):0/1/0/1 Requests-streaming-waiting:0 Requests-streaming:0 Endpoints(in-table/allocated):1/0
[DEBUG] 2022-08-20 01:03:31.497 task-scheduler [139657729799936] id=0x7f048bc12160: Running aws_exponential_backoff_retry_task task with <Running> status
[DEBUG] 2022-08-20 01:03:31.497 exp-backoff-strategy [139657729799936] id=0x7f049a6c64d0: Vending retry_token 0x7f048bc12100
[DEBUG] 2022-08-20 01:03:31.497 connection-manager [139657729799936] id=(nil): Acquire connection
FATAL: Received signal 11 (Segmentation fault)
Backtrace (most recent call first)
#10 <?> at 0x7f04b9f75980 in __restore_rt
#9 <?> at 0x55afd6f0535b in s_s3_client_acquired_retry_token
#8 <?> at 0x55afd6f60ff2 in s_exponential_retry_task
#7 <?> at 0x55afd6fceb39 in aws_task_run
#6 <?> at 0x55afd6fced90 in s_run_all
#5 <?> at 0x55afd6fcf1db in aws_task_scheduler_run_all
#4 <?> at 0x55afd6f6a215 in s_main_loop
#3 <?> at 0x55afd6fd3100 in thread_fn
#2 <?> at 0x55afd3f35304 in thread_metrics_start_routine
#1 <?> at 0x7f04b9f6a6db in start_thread
The line
`[ERROR] 2022-08-20 01:03:31.496 S3Client [139657528473344] id=0x7f04898cf000 0x7f04a4fd1ca0 s_s3_client_create_connection_for_request_default: (nil)`
shows that client 0x7f04898cf000 uses endpoint 0x7f04a4fd1ca0 whose `http_connection_manager` is `(nil)`. This is the bug, which produces the subsequent crash.
Here is the trace for that endpoint from the log:
[DEBUG] 2022-08-20 01:03:31.393 S3Endpoint [139658686979648] id=0x7f04a4fd1ca0: Created connection manager 0x7f0430ea0fc0 for endpoint
[ERROR] 2022-08-20 01:03:31.393 S3Client [139658686979648] id=0x7f04898cf000 0x7f04a4fd1ca0 aws_s3_client_make_meta_request: aurora-simulation-prod-logs.s3.us-east-1.amazonaws.com ADDED 1
[ERROR] 2022-08-20 01:03:31.394 S3Client [139657528473344] id=0x7f04898cf000 0x7f04a4fd1ca0 s_s3_client_create_connection_for_request_default: 0x7f0430ea0fc0
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657616553728] id=0x7f04a4fd1ca0: 0x7f04898cf000 aws_s3_client_endpoint_release, count = 2
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657528473344] id=0x7f04a4fd1ca0: 0x7f04898cf000 aws_s3_client_endpoint_release, count = 2
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657528473344] id=0x7f04a4fd1ca0: s_s3_endpoint_ref_count_zero
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657528473344] id=0x7f04a4fd1ca0: s_s3_endpoint_ref_count_zero - removing connection_manager
[ERROR] 2022-08-20 01:03:31.495 S3Client [139658686979648] id=0x7f04898cf000 0x7f04a4fd1ca0 aws_s3_client_make_meta_request: aurora-simulation-prod-logs.s3.us-east-1.amazonaws.com REF conman: (nil)
[ERROR] 2022-08-20 01:03:31.496 S3Client [139657528473344] id=0x7f04898cf000 0x7f04a4fd1ca0 s_s3_client_create_connection_for_request_default: (nil)
Pay close attention to the thread IDs in this part:
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657616553728] id=0x7f04a4fd1ca0: 0x7f04898cf000 aws_s3_client_endpoint_release, count = 2
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657528473344] id=0x7f04a4fd1ca0: 0x7f04898cf000 aws_s3_client_endpoint_release, count = 2
That means that `aws_s3_client_endpoint_release` was invoked from 2 different threads at the same time (possibly synchronized via mutex), and both ended up calling `aws_ref_count_release(&endpoint->ref_count)`, decrementing it from 2 -> 1 -> 0.
Both initially saw an `endpoint->ref_count.ref_count` of 2, hence neither ejected the endpoint from the hash table.
Here is the function with the above logging added:

```c
// source/s3_endpoint.c
void aws_s3_client_endpoint_release(struct aws_s3_client *client, struct aws_s3_endpoint *endpoint) {
    AWS_PRECONDITION(endpoint);
    AWS_PRECONDITION(client);
    AWS_PRECONDITION(endpoint->handled_by_client);

    AWS_LOGF_ERROR(
        AWS_LS_S3_ENDPOINT,
        "id=%p: %p aws_s3_client_endpoint_release, count = %d",
        (void *)endpoint,
        (void *)client,
        aws_atomic_load_int(&endpoint->ref_count.ref_count)); // <== BOTH SEE 2 HERE

    /* BEGIN CRITICAL SECTION */
    {
        aws_s3_client_lock_synced_data(client);

        /* The last refcount to release */
        if (aws_atomic_load_int(&endpoint->ref_count.ref_count) == 1) { // <== BOTH SEE 2 HERE
            AWS_LOGF_ERROR(
                AWS_LS_S3_ENDPOINT,
                "id=%p: aws_s3_client_endpoint_release - removing from hashtable",
                (void *)endpoint);
            aws_hash_table_remove(&client->synced_data.endpoints, endpoint->host_name, NULL, NULL);
        }

        aws_s3_client_unlock_synced_data(client);
    }
    /* END CRITICAL SECTION */

    aws_ref_count_release(&endpoint->ref_count); // <== BOTH CALL THIS
}
```
Hence the expected `"aws_s3_client_endpoint_release - removing from hashtable"` does not appear in the log.
Instead, since `endpoint->ref_count` now reaches 0, the `http_connection_manager` of endpoint 0x7f04a4fd1ca0 is removed, released, and set to `NULL`:
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657616553728] id=0x7f04a4fd1ca0: 0x7f04898cf000 aws_s3_client_endpoint_release, count = 2
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657528473344] id=0x7f04a4fd1ca0: 0x7f04898cf000 aws_s3_client_endpoint_release, count = 2
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657528473344] id=0x7f04a4fd1ca0: s_s3_endpoint_ref_count_zero
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657528473344] id=0x7f04a4fd1ca0: s_s3_endpoint_ref_count_zero - removing connection_manager
When the `s3_client` next wants to make a request, it finds an entry in said hash table, uses it, and logs the fact that the `http_connection_manager` field of that endpoint is `(nil)`:
[DEBUG] 2022-08-20 01:03:31.495 S3Client [139657528473344] id=0x7f04898cf000 Updating connections, assigning requests where possible.
[DEBUG] 2022-08-20 01:03:31.495 S3MetaRequest [139658686979648] id=0x7f04a3dd0200 Created new Default Meta Request.
[ERROR] 2022-08-20 01:03:31.495 S3Client [139658686979648] id=0x7f04898cf000 0x7f04a4fd1ca0 aws_s3_client_make_meta_request: aurora-simulation-prod-logs.s3.us-east-1.amazonaws.com REF conman: (nil) <=== HERE
[INFO] 2022-08-20 01:03:31.495 S3Client [139658686979648] id=0x7f04898cf000: Created meta request 0x7f04a3dd0200
[INFO] 2022-08-20 01:03:31.495 S3ClientStats [139657528473344] id=0x7f04898cf000 Requests-in-flight(approx/exact):0/0 Requests-preparing:0 Requests-queued:0 Requests-network(get/put/default/total):0/0/0/0 Requests-streaming-waiting:0 Requests-streaming:0 Endpoints(in-table/allocated):1/0
It also reports that it has 1 endpoint (0x7f04a4fd1ca0) in-table, and 0 allocated.
So the cause of the problem is an inconsistency:

- when the `s3_client` loads an `s3_endpoint` from the hash table, it expects its `http_connection_manager` to not be `NULL`,
- but the race in `aws_s3_client_endpoint_release` prevented the required removal of the endpoint from the hash table.

Here is a possible sequence of two threads, T1 and T2, where `.Lock()` stands for taking the `synced_data.lock`:
```
T1.Lock()  // blocks T2, which also needs to go through this section
// reference count is 2, hence do nothing
T1.Unlock()
T2.Lock()  // now grabs the lock
// reference count is 2, hence do nothing
T2.Unlock()
T1.aws_ref_count_release(&endpoint->ref_count); // Atomic, happens in sequence
T2.aws_ref_count_release(&endpoint->ref_count);
// Value of endpoint->ref_count is now 0
```
The same would happen if T2 were allowed to decrement `endpoint->ref_count` first.
Since `ref_count.ref_count` is read within the critical section but modified outside of it, the race condition can occur. It can be avoided by pulling `aws_ref_count_release(&endpoint->ref_count);` inside the critical section.