awslabs / aws-c-s3
C99 library implementation for communicating with the S3 service, designed for maximizing throughput on high bandwidth EC2 instances.
License: Apache License 2.0
I've been testing a combination of the latest aws-c-s3 update regarding #285 and the ability to stream data of unknown size to S3 using a publisher.
I have two concerns aside from the errors I'm getting:
Platform: MacBook Pro, Ventura 13.4, 32GB, Apple M1 Max ARM64 processor.
AWS CLI: aws-cli/2.12.1 Python/3.11.4 Darwin/22.5.0 source/arm64
TargetThroughput: 20.0Gbps
Minimum Part Size: 1000000L (I think this causes issues after 128M)
I've mocked up data which is just a string of 128 bytes that I send over and over in a ByteBuffer.
Errors include a failed response (400) from the AWS SDK, missing checksums for parts, and a SIGSEGV in libobjc.A.dylib... I've also received a SIGABRT which doesn't even give me a dump.
Attaching a simple Java project to test - configure whichever credentials and a bucket name and execute - as you increase the number of lines you'll start to see issues. The CRT log is created in whatever your working directory happens to be.
Files upload successfully, and at least 100 GB S3 objects can be managed.
Time to close the stream doesn't grow so drastically with file size as it does now.
Crashes, failed uploads, heavy delay on completing the upload.
aws.sdk.version 2.20.79, aws.crt.version 0.22.1
openjdk 17.0.3 2022-04-19
Darwin US10MAC44VWYPKH 22.5.0 Darwin Kernel Version 22.5.0: Mon Apr 24 20:52:24 PDT 2023; root:xnu-8796.121.2~5/RELEASE_ARM64_T6000 arm64 arm Darwin
AWS has a growing list of instance types with multiple NICs. There should be some way to configure the S3 client to use a particular network interface (or multiple network interfaces?)
Maximum throughput cannot be achieved with all instances of the S3 client using just the one default network interface.
Mountpoint (which uses aws-c-s3) recently received this feature request: awslabs/mountpoint-s3#815
poudriere interactive jail for:
package name: aws-c-s3-0.1.47_1
building for: FreeBSD 131amd64-devel 13.1-RELEASE FreeBSD 13.1-RELEASE amd64
-- The C compiler identification is Clang 13.0.0
make test
(...)
174 - test_s3_copy_source_prefixed_by_slash (Failed)
175 - test_s3_copy_source_prefixed_by_slash_multipart (Failed)
178 - test_s3_list_bucket_valid (Failed)
Errors while running CTest
Output from these tests are in: /wrkdirs/usr/ports/devel/aws-c-s3/work/.build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
FAILED: CMakeFiles/test.util
cd /wrkdirs/usr/ports/devel/aws-c-s3/work/.build && /usr/local/bin/ctest --force-new-ctest-process
ninja: build stopped: subcommand failed.
*** Error code 1
ctest --force-new-ctest-process --rerun-failed --output-on-failure
Results:
Thanks
Hello,
I have noticed that when using the AWS CRT to upload S3 objects with auto-ranged PUT requests, occasionally when the server sends an error response, instead of the meta request immediately terminating and invoking `struct aws_s3_meta_request_options.finish_callback`, the meta request will be retried until the maximum number of retry attempts has been met, at which point the finish callback will be invoked with either `AWS_IO_SOCKET_CLOSED` or `AWS_ERROR_HTTP_CONNECTION_CLOSED`.
After some investigation, I believe this happens if and only if the request body is still in the process of being transmitted when the error response (with a `Connection: close` header) has been received, followed by a subsequent TCP FIN or TLS close_notify alert.

When the `Connection: close` header is received, `aws-c-http/source/h1_connection.c:s_decoder_on_header()` sets `incoming_stream->is_final_stream = true`. Then in `aws-c-http/source/h1_connection.c:s_decoder_on_done()`, after seeing that `is_final_stream` is set, `s_stop(connection, true /*stop_reading*/, false /*stop_writing*/, false /*schedule_shutdown*/, AWS_ERROR_SUCCESS);` is called, which sets `connection->thread_data.is_reading_stopped = true`, but does not indicate at all to its leftward slot or the channel as a whole to stop reading.
Later in `aws-c-http/source/h1_connection.c:s_decoder_on_done()`, if `incoming_stream->is_outgoing_message_done` is set, then `s_stream_complete()` is called, and (since `is_final_stream` is set) `s_connection_close()` is called, and `s_stop(connection, false /*stop_reading*/, false /*stop_writing*/, true /*schedule_shutdown*/, AWS_ERROR_SUCCESS);` finally leads to `aws_channel_shutdown()` being called.

But if `is_outgoing_message_done` is not set, then the leftward slot will continue its read loop and immediately encounter the TCP FIN (when cleartext HTTP is being used) or TLS close_notify alert (when HTTPS is being used). In the HTTP case, the `aws_socket_read()` call in `aws-c-io/socket_channel_handler.c:s_do_read()` will get a zero-length read and raise `AWS_IO_SOCKET_CLOSED`.
In the HTTPS case (at least in s2n builds), the `s2n_recv()` call in `aws-c-io/source/s2n/s2n_tls_channel_handler.c:s_s2n_handler_process_read_message()` will get a zero-length read, the close_notify alert will be logged, and the call will return `AWS_OP_SUCCESS`. I haven't quite tracked down the exact code path yet, but this ends up with `aws-c-http/source/h1_connection.c:s_stream_complete()` being invoked with `AWS_ERROR_HTTP_CONNECTION_CLOSED`.
This `AWS_IO_SOCKET_CLOSED` or `AWS_ERROR_HTTP_CONNECTION_CLOSED` value eventually propagates to `aws-c-s3/source/s3_meta_request.c:s_s3_meta_request_send_request_finish_helper()`. Since this error code does not match `AWS_ERROR_S3_INVALID_RESPONSE_STATUS` or `AWS_ERROR_S3_NON_RECOVERABLE_ASYNC_ERROR`, `finish_code` is set to `AWS_S3_CONNECTION_FINISH_CODE_RETRY`, and the meta request is tried again despite the error response from the S3 server potentially being non-recoverable.
I am not sure if the proper solution to this lies in aws-c-s3 (e.g. changing the logic for when a meta request should be retried), aws-c-http (e.g. signal its leftward slot to stop reading once a complete `Connection: close` response has been processed), aws-c-io (e.g. change the socket and TLS handler logic to be able to stop reading immediately when signaled by its rightward slot), or some combination of the three, but I figured that raising the issue in the top-level repo would make the most sense. Let me know if this issue would be more appropriate in one of the other CRT repos instead.
It may also make sense to consider:

- Logic to immediately stop transmitting a request when a response that indicates failure has been received (perhaps if the HTTP response status >= 300) to avoid potentially triggering a RST from the server.
- Ensuring that a response is gracefully handled even if a RST, TLS alert, or write error is received after having received a complete response. I have seen instances where instead of the zero read in `aws-c-io/source/posix/socket.c:aws_socket_read()`, an `ECONNRESET` was received; looking at a pcap of the stream, a RST had been received immediately after the FIN.
I have also seen `AWS_IO_TLS_ERROR_WRITE_FAILURE` raised after the close_notify TLS alert was received, but this case also ends up with `AWS_ERROR_HTTP_CONNECTION_CLOSED` being propagated to `aws-c-s3/source/s3_meta_request.c:s_s3_meta_request_send_request_finish_helper()`.
Here are some log snippets of such failures that I was able to reproduce using both HTTP and HTTPS, against both the AWS S3 servers and a third-party, nominally S3-compatible server that I spun up locally for testing. Note that the `AWS_LL_TRACE` messages that were logged are listed with level DEBUG due to my logger implementation.
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Reading from body stream.
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Sending 16384 bytes of body, progress: 703902/804335
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming response status: 403 (Forbidden).
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming header: x-amz-request-id: DCT2S3BF391M7NX6
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming header: x-amz-id-2: tm8C9/WzAZn6vxs1/g9q36SVl4NvfTW1fgcwuY8nf6AkEeA2ZJsE7UCUIiUevjXX2tXomJ8LR30=
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming header: Content-Type: application/xml
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming header: Transfer-Encoding: chunked
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming header: Date: Thu, 02 Mar 2023 21:09:54 GMT
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming header: Server: AmazonS3
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming header: Connection: close
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Received 'Connection: close' header. This will be the final stream on this connection.
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Main header block done.
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Incoming body: 344 bytes received.
[DEBUG]: AWS [00007f1b114fc700] [S3MetaRequest] - id=0x61600006ea80 Incoming body for request 0x611000059f00. Response status: 403. Data Size: 344. connection: 0x60400006dfd0.
[DEBUG]: AWS [00007f1b114fc700] [S3MetaRequest] - response body:
[DEBUG]: <?xml version="1.0" encoding="UTF-8"?>
[DEBUG]: <Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>minioadmin</AWSAccessKeyId><RequestId>DCT2S3BF391M7NX6</RequestId><HostId>tm8C9/WzAZn6vxs1/g9q36SVl4NvfTW1fgcwuY8nf6AkEeA2ZJsE7UCUIiUevjXX2tXomJ8LR30=</HostId></Error>
[INFO]: AWS [00007f1b114fc700] [socket] - id=0x613000053b00 fd=19: zero read, socket is closed
[DEBUG]: AWS [00007f1b114fc700] [task-scheduler] - id=0x61800007f680: Scheduling channel_shutdown task for immediate execution
[DEBUG]: AWS [00007f1b114fc700] [channel] - id=0x61800007f480: Channel shutdown is already pending, not scheduling another.
[DEBUG]: AWS [00007f1b114fc700] [task-scheduler] - id=0x61800007f680: Running channel_shutdown task with <Running> status
[DEBUG]: AWS [00007f1b114fc700] [channel] - id=0x61800007f480: beginning shutdown process
[DEBUG]: AWS [00007f1b114fc700] [channel] - id=0x61800007f480: handler 0x61200009b2c0 shutdown in read dir completed.
[DEBUG]: AWS [00007f1b114fc700] [channel] - id=0x61800007f480: handler 0x61700007eb08 shutdown in read dir completed.
[DEBUG]: AWS [00007f1b114fc700] [task-scheduler] - id=0x61800007f4a8: Scheduling (null) task for immediate execution
[DEBUG]: AWS [00007f1b114fc700] [task-scheduler] - id=0x61800007f4a8: Running (null) task with <Running> status
[DEBUG]: AWS [00007f1b114fc700] [http-stream] - id=0x6160000be580: Stream completed with error code 1051 (AWS_IO_SOCKET_CLOSED).
[INFO]: AWS [00007f1b114fc700] [http-connection] - id=0x61700007eb00: Shutting down connection with error code 0 (AWS_ERROR_SUCCESS).
[DEBUG]: AWS [00007f1b114fc700] [channel] - id=0x61800007f480: Channel shutdown is already pending, not scheduling another.
[DEBUG]: AWS [00007f1b114fc700] [S3MetaRequest] - id=0x61600006ea80: Request 0x611000059f00 finished with error code 1051 (aws-c-io: AWS_IO_SOCKET_CLOSED, socket is closed.) and response status 403
[ERROR]: AWS [00007f1b114fc700] [S3MetaRequest] - id=0x61600006ea80 Meta request failed from error 1051 (socket is closed.). (request=0x611000059f00, response status=403). Try to setup a retry.
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Reading from body stream.
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Sending 16384 bytes of body, progress: 720286/804335
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming response status: 403 (Forbidden).
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming header: x-amz-request-id: B5VDFTXHSRW9HVK3
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming header: x-amz-id-2: vJaT/N9Q9JXQkGu91IP3VH4ak//uge8vt9hDXolY8PJp06bsvNo0SkMoLPy9fpAjXRkDTeKh84Y=
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming header: Content-Type: application/xml
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming header: Transfer-Encoding: chunked
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming header: Date: Thu, 02 Mar 2023 21:09:54 GMT
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming header: Server: AmazonS3
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming header: Connection: close
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Received 'Connection: close' header. This will be the final stream on this connection.
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Main header block done.
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Incoming body: 344 bytes received.
[DEBUG]: AWS [00007f1b10cfb700] [S3MetaRequest] - id=0x61600006ea80 Incoming body for request 0x611000059f00. Response status: 403. Data Size: 344. connection: 0x60400006dfd0.
[DEBUG]: AWS [00007f1b10cfb700] [S3MetaRequest] - response body:
[DEBUG]: <?xml version="1.0" encoding="UTF-8"?>
[DEBUG]: <Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>minioadmin</AWSAccessKeyId><RequestId>B5VDFTXHSRW9HVK3</RequestId><HostId>vJaT/N9Q9JXQkGu91IP3VH4ak//uge8vt9hDXolY8PJp06bsvNo0SkMoLPy9fpAjXRkDTeKh84Y=</HostId></Error>
[INFO]: AWS [00007f1b10cfb700] [socket] - id=0x613000045b00 fd=19: socket is closed.
[DEBUG]: AWS [00007f1b10cfb700] [task-scheduler] - id=0x61800006f680: Scheduling channel_shutdown task for immediate execution
[DEBUG]: AWS [00007f1b10cfb700] [channel] - id=0x61800006f480: Channel shutdown is already pending, not scheduling another.
[DEBUG]: AWS [00007f1b10cfb700] [task-scheduler] - id=0x61800006f680: Running channel_shutdown task with <Running> status
[DEBUG]: AWS [00007f1b10cfb700] [channel] - id=0x61800006f480: beginning shutdown process
[DEBUG]: AWS [00007f1b10cfb700] [channel] - id=0x61800006f480: handler 0x6120000832c0 shutdown in read dir completed.
[DEBUG]: AWS [00007f1b10cfb700] [channel] - id=0x61800006f480: handler 0x61700005ec08 shutdown in read dir completed.
[DEBUG]: AWS [00007f1b10cfb700] [task-scheduler] - id=0x61800006f4a8: Scheduling (null) task for immediate execution
[DEBUG]: AWS [00007f1b10cfb700] [task-scheduler] - id=0x61800006f4a8: Running (null) task with <Running> status
[DEBUG]: AWS [00007f1b10cfb700] [http-stream] - id=0x6160000ae680: Stream completed with error code 1051 (AWS_IO_SOCKET_CLOSED).
[INFO]: AWS [00007f1b10cfb700] [http-connection] - id=0x61700005ec00: Shutting down connection with error code 0 (AWS_ERROR_SUCCESS).
[DEBUG]: AWS [00007f1b10cfb700] [channel] - id=0x61800006f480: Channel shutdown is already pending, not scheduling another.
[DEBUG]: AWS [00007f1b10cfb700] [S3MetaRequest] - id=0x61600006ea80: Request 0x611000059f00 finished with error code 1051 (aws-c-io: AWS_IO_SOCKET_CLOSED, socket is closed.) and response status 403
[ERROR]: AWS [00007f1b10cfb700] [S3MetaRequest] - id=0x61600006ea80 Meta request failed from error 1051 (socket is closed.). (request=0x611000059f00, response status=403). Try to setup a retry.
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Reading from body stream.
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Sending 16331 bytes of body, progress: 244362/804335
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming response status: 403 (Forbidden).
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming header: Accept-Ranges: bytes
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming header: Content-Length: 314
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming header: Content-Type: application/xml
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming header: Server: MinIO
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming header: Vary: Origin
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming header: Date: Thu, 02 Mar 2023 19:09:19 GMT
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming header: Connection: close
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Received 'Connection: close' header. This will be the final stream on this connection.
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Main header block done.
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Incoming body: 314 bytes received.
[DEBUG]: AWS [00007fe36ccfb700] [S3MetaRequest] - id=0x61600003ed80 Incoming body for request 0x611000059f00. Response status: 403. Data Size: 314. connection: 0x604000027fd0.
[DEBUG]: AWS [00007fe36ccfb700] [S3MetaRequest] - response body:
[DEBUG]: <?xml version="1.0" encoding="UTF-8"?>
[DEBUG]: <Error><Code>RequestTimeTooSkewed</Code><Message>The difference between the request time and the server's time is too large.</Message><Resource>/test0/twocities.txt</Resource><RequestId></RequestId><HostId>64e90a68-95ff-4541-bf12-7e354f1dc058</HostId></Error>
[DEBUG]: AWS [00007fe36ccfb700] [tls-handler] - id=0x61600007cb80: Alert code 0
[DEBUG]: AWS [00007fe36ccfb700] [task-scheduler] - id=0x61800003f680: Scheduling channel_shutdown task for immediate execution
[DEBUG]: AWS [00007fe36ccfb700] [task-scheduler] - id=0x61800003f680: Running channel_shutdown task with <Running> status
[DEBUG]: AWS [00007fe36ccfb700] [channel] - id=0x61800003f480: beginning shutdown process
[DEBUG]: AWS [00007fe36ccfb700] [channel] - id=0x61800003f480: handler 0x61200005b6c0 shutdown in read dir completed.
[DEBUG]: AWS [00007fe36ccfb700] [tls-handler] - id=0x61600007cb80: Shutting down read direction with error code 0
[DEBUG]: AWS [00007fe36ccfb700] [channel] - id=0x61800003f480: handler 0x61600007cb80 shutdown in read dir completed.
[DEBUG]: AWS [00007fe36ccfb700] [channel] - id=0x61800003f480: handler 0x61700005b788 shutdown in read dir completed.
[DEBUG]: AWS [00007fe36ccfb700] [task-scheduler] - id=0x61800003f4a8: Scheduling (null) task for immediate execution
[DEBUG]: AWS [00007fe36ccfb700] [task-scheduler] - id=0x61800003f4a8: Running (null) task with <Running> status
[DEBUG]: AWS [00007fe36ccfb700] [http-stream] - id=0x61600007bf80: Stream completed with error code 2058 (AWS_ERROR_HTTP_CONNECTION_CLOSED).
[INFO]: AWS [00007fe36ccfb700] [http-connection] - id=0x61700005b780: Shutting down connection with error code 0 (AWS_ERROR_SUCCESS).
[DEBUG]: AWS [00007fe36ccfb700] [channel] - id=0x61800003f480: Channel shutdown is already pending, not scheduling another.
[DEBUG]: AWS [00007fe36ccfb700] [S3MetaRequest] - id=0x61600003ed80: Request 0x611000059f00 finished with error code 2058 (aws-c-http: AWS_ERROR_HTTP_CONNECTION_CLOSED, The connection has closed or is closing.) and response status 403
[ERROR]: AWS [00007fe36ccfb700] [S3MetaRequest] - id=0x61600003ed80 Meta request failed from error 2058 (The connection has closed or is closing.). (request=0x611000059f00, response status=403). Try to setup a retry.
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Reading from body stream.
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Sending 16331 bytes of body, progress: 228031/804335
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming response status: 403 (Forbidden).
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming header: Accept-Ranges: bytes
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming header: Content-Length: 314
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming header: Content-Type: application/xml
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming header: Server: MinIO
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming header: Vary: Origin
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming header: Date: Thu, 02 Mar 2023 19:09:19 GMT
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming header: Connection: close
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Received 'Connection: close' header. This will be the final stream on this connection.
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Main header block done.
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Incoming body: 314 bytes received.
[DEBUG]: AWS [00007fe36d4fc700] [S3MetaRequest] - id=0x61600003ed80 Incoming body for request 0x611000059f00. Response status: 403. Data Size: 314. connection: 0x604000027fd0.
[DEBUG]: AWS [00007fe36d4fc700] [S3MetaRequest] - response body:
[DEBUG]: <?xml version="1.0" encoding="UTF-8"?>
[DEBUG]: <Error><Code>RequestTimeTooSkewed</Code><Message>The difference between the request time and the server's time is too large.</Message><Resource>/test0/twocities.txt</Resource><RequestId></RequestId><HostId>64e90a68-95ff-4541-bf12-7e354f1dc058</HostId></Error>
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61100006a840: Scheduling socket_written_task task for immediate execution
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61100006a840: Running socket_written_task task with <Running> status
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61700007b710: Scheduling http1_connection_outgoing_stream task for immediate execution
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61700007b710: Running http1_connection_outgoing_stream task with <Running> status
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Reading from body stream.
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Sending 16331 bytes of body, progress: 244362/804335
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61100006a840: Scheduling socket_written_task task for immediate execution
[DEBUG]: AWS [00007fe36d4fc700] [tls-handler] - id=0x61600008d080: Alert code 0
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61800004f680: Scheduling channel_shutdown task for immediate execution
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61100006a840: Running socket_written_task task with <Running> status
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61700007b710: Scheduling http1_connection_outgoing_stream task for immediate execution
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61800004f680: Running channel_shutdown task with <Running> status
[DEBUG]: AWS [00007fe36d4fc700] [channel] - id=0x61800004f480: beginning shutdown process
[DEBUG]: AWS [00007fe36d4fc700] [channel] - id=0x61800004f480: handler 0x6120000748c0 shutdown in read dir completed.
[DEBUG]: AWS [00007fe36d4fc700] [tls-handler] - id=0x61600008d080: Shutting down read direction with error code 0
[DEBUG]: AWS [00007fe36d4fc700] [channel] - id=0x61800004f480: handler 0x61600008d080 shutdown in read dir completed.
[DEBUG]: AWS [00007fe36d4fc700] [channel] - id=0x61800004f480: handler 0x61700007b688 shutdown in read dir completed.
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61800004f4a8: Scheduling (null) task for immediate execution
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61700007b710: Running http1_connection_outgoing_stream task with <Running> status
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Reading from body stream.
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Sending 16331 bytes of body, progress: 260693/804335
[ERROR]: AWS [00007fe36d4fc700] [http-connection] - id=0x61700007b680: Failed to send message in write direction, error 1031 (AWS_IO_TLS_ERROR_WRITE_FAILURE). Closing connection.
[INFO]: AWS [00007fe36d4fc700] [http-connection] - id=0x61700007b680: Shutting down connection with error code 1031 (AWS_IO_TLS_ERROR_WRITE_FAILURE).
[DEBUG]: AWS [00007fe36d4fc700] [channel] - id=0x61800004f480: Channel shutdown is already pending, not scheduling another.
[DEBUG]: AWS [00007fe36d4fc700] [task-scheduler] - id=0x61800004f4a8: Running (null) task with <Running> status
[DEBUG]: AWS [00007fe36d4fc700] [http-stream] - id=0x61600008c480: Stream completed with error code 2058 (AWS_ERROR_HTTP_CONNECTION_CLOSED).
[INFO]: AWS [00007fe36d4fc700] [http-connection] - id=0x61700007b680: Shutting down connection with error code 0 (AWS_ERROR_SUCCESS).
[DEBUG]: AWS [00007fe36d4fc700] [channel] - id=0x61800004f480: Channel shutdown is already pending, not scheduling another.
[DEBUG]: AWS [00007fe36d4fc700] [S3MetaRequest] - id=0x61600003ed80: Request 0x611000059f00 finished with error code 2058 (aws-c-http: AWS_ERROR_HTTP_CONNECTION_CLOSED, The connection has closed or is closing.) and response status 403
[ERROR]: AWS [00007fe36d4fc700] [S3MetaRequest] - id=0x61600003ed80 Meta request failed from error 2058 (The connection has closed or is closing.). (request=0x611000059f00, response status=403). Try to setup a retry.
The CRT library versions that I'm currently using are
aws-c-auth: v0.6.25
aws-c-cal: v0.5.21
aws-c-common: v0.8.12
aws-c-compression: v0.2.16
aws-checksums: v0.1.14
aws-c-http: v0.7.5
aws-c-io: v0.13.18
aws-c-s3: v0.2.5
aws-c-sdkutils: v0.1.7
s2n-tls: v1.3.38
but I had seen this behavior for many months across multiple earlier versions
before finding the time to try to track down the error to a reportable issue.
Let me know if there is any additional information that you would like me to
provide.
Thank you,
Alex
I'm trying to run the samples given; however, I don't know what I'm doing wrong when running this line of code. Previously I added my key and secret key as the environment variables "AWS_ACCESS_KEY_ID" and "AWS_SECRET_ACCESS_KEY".
I tried to run this command in the Ubuntu CLI:
aws-c-s3/build/samples/s3/s3 ls s3://pruebaxpn --region us-west-2
However, this error pops up:
Failure while listing objects. Please check if you have valid credentials and s3 path is correct. Error: aws-c-s3: AWS_ERROR_S3_INVALID_RESPONSE_STATUS, Invalid response status from request
What am I doing wrong? Thank you in advance.
Hello!
During static analysis, a suspected copy-paste error was identified in s3_util.c:208. There are no differences between the two blocks (lines 192-196 and 208-214): lines 192 and 208 contain the same condition, "if (signing_config->service.len > 0) {". I think there should be signed_body_value instead of service in line 208. Please clarify whether this is right.
Should be, in line 208:
if (signing_config->signed_body_value.len > 0) {
Currently, in line 208:
if (signing_config->service.len > 0) {
This occurs when calling
struct aws_cached_signing_config_aws *aws_cached_signing_config_new(
struct aws_allocator *allocator,
const struct aws_signing_config_aws *signing_config)
Change "if (signing_config->service.len > 0) {" to "if (signing_config->signed_body_value.len > 0) {"
v0.3.18
g++ 8.3.0
OS Linux Debian
CompleteMultipartUpload in auto-ranged PUT failed due to a missing second (and final) UploadPart.
We ran into this problem with aws-c-s3 0.1.51 and aws-sdk-cpp 1.10.54 on Linux.
Our API issued an S3CrtClient->PutObject, and it resulted in the following error:
Expected: 'x is ok', with x := 'output_blob_->close()' [av::status::Status]
x = PutObject() failed
where: cloud/aws/s3/s3_streambuf.cc:93
extra: s3://perception-prod-training-data/opt/a831200c/s2a/2023-02-15-bless-collect_dking_updateOverlapFeb10_latestIssues/test/36c6f8923fe514d6b5a28ac5dbdea034.rats: HTTP response code: 400
Resolved remote host IP address:
Request ID: 2BAQ8WRZGGH3PPG1
Exception name: InvalidPart
Error message: Unable to parse ExceptionName: InvalidPart Message: One or more of the specified parts could not be found. The part may not have been uploaded, or the specified entity tag may not match the part's entity tag.
7 response headers:
connection : close
content-type : application/xml
date : Wed, 15 Feb 2023 16:39:26 GMT
server : AmazonS3
transfer-encoding : chunked
x-amz-id-2 : oFLatf5OZVV1Ny34iYIEhqA3Ft/+QRXSLzz6/K/c36nd4grUqR6kmhKJS/U32GftV1/GdmVEbOY=
x-amz-request-id : 2BAQ8WRZGGH3PPG1
Backtrace (most recent call first)
#10 at 0x560a5b826f76 in av::cloud::aws::s3::S3Ostreambuf::PutObject()
#9 at 0x560a5b82b63d in av::cloud::aws::s3::S3Ostreambuf::close()
#8 at 0x560a5b822170 in av::cloud::aws::s3::S3Ostream::close()
#7 at 0x560a5af329b8 in av::perception::s2a::DataExtractionModule::on_shutdown()
#6 at 0x560a5bf40593 in av::framework::Module::on_executor_shutdown()
#5 at 0x560a5ac5ed91 in av::detail::FuncImpl<>::invoke()
#4 at 0x560a5beacc8b in av::BaseThreadPool::do_next()
#3 at 0x560a5beacecd in std::thread::_State_impl<>::_M_run()
#2 at 0x7fc1887c76df in <?>
#1 at 0x7fc188ca26db in start_thread
The PutObject in frame 10 invokes the S3CrtClient->PutObject call.
Further investigation showed that the first UploadPart succeeded (visible in list-parts), but there was no evidence (neither list-parts nor API logs) that the second UploadPart completed.
In our logs, there was no further aws-c-s3 error indicating a failed operation.
The CompleteMultipartUpload request uses the ETags of the completed requests, so how could the CompleteMultipartUpload have been sent with the ETag of the second UploadPart? Perhaps it was sent with only 1 ETag (that of the first, successfully completed UploadPart).
Hi,
Thanks for this work. I am trying to understand what would be the simplest way to upload a file to S3.
I have seen your samples, but those are a bit more elaborate; they use command line arguments and then the relevant AWS functions to process them. However, I simply want to use C++ to upload a file to S3 without all the extra complexity.
Do you have example for this please?
Thank you
This is needed to support aws/aws-sdk-cpp#2477.
CopyObject support was added in #166, but client support as well as tests were disabled in #246. It seems the cause was the structure of the tests. As a result, CopyObject is currently neither supported nor continuously tested.
Please re-enable tests and client support for CopyObject.
When using a chunk size of 5MiB (default value of the C++ SDK), the final byte of the blob was not transferred, corrupting the download.
When transferring an S3 blob of size 31457281 (1 byte more than exactly 30 MiB), only 6 chunks were transferred, and the last chunk ended at 31457280. There should have been a 7th chunk that transferred the final byte.
I printed out the size of each `body_callback` write when this was invoked by the main C++ SDK:
partSize: 5242880
got 5242880
got 5242880
got 5242880
got 5242880
got 5242880
got 5242880
With a `partSize` (chunk size) of 8 MiB or 16 MiB, the problem did not occur; the chunks were divided so that the whole object was transferred:
got 8388608
got 8388608
got 8388608
got 6291457
In both cases, the code calculated `31457280` as the end of the object range. The end of the object (byte) range is set in `s_discover_object_range_and_content_length`:
// source/s3_auto_ranged_get.c
static int s_discover_object_range_and_content_length(
struct aws_s3_meta_request *meta_request,
struct aws_s3_request *request,
int error_code,
uint64_t *out_total_content_length,
uint64_t *out_object_range_start,
uint64_t *out_object_range_end) { /* ... */
case AWS_S3_AUTO_RANGE_GET_REQUEST_TYPE_PART: /* ... */
/* When discovering the object size via first-part, the object range is the entire object. */
object_range_start = 0;
object_range_end = total_content_length-1; // <=== HERE
// ...
}
To correctly calculate the end of the range and the number of chunks in a ranged-GET request, a value of content_length should be used instead of content_length-1.
After temporarily setting
// source/s3_auto_ranged_get.c
/* When discovering the object size via first-part, the object range is the entire object. */
object_range_start = 0;
object_range_end = total_content_length;
we now see that the final chunk of size 1B is sent when using a 5MiB chunk size, completing the download:
partSize: 5242880
got 5242880
got 5242880
got 5242880
got 5242880
got 5242880
got 5242880
got 1
Simply changing the object_range_end value does not seem right either: in Range headers, byte ranges are inclusive, starting at byte 0 and ending at byte n-1, whereas the content length counts n bytes. Hence, in order to fix the problem properly, both conventions need to be respected.
#360 introduced AWS_ERROR_HTTP_RESPONSE_FIRST_BYTE_TIMEOUT and others, which aren't defined
Package compiles
Does not compile
Compile package
No response
No response
0.3.23..0.4.0
gcc 12
debian 11 / buildroot 2023.08
We were experiencing stuck/hanging downloads and found 3 stalled TCP connections when debugging a stuck instance.
We have a program based on aws-c-s3 that downloads 9547 files from S3; it repeatedly got stuck during the download (6 confirmed cases). In the environment it runs in, the network speed is not very high.
Below are the results from debugging one such stuck program, which had hung for over an hour after downloading 9544 out of 9547 files. Corresponding to the 3 remaining files, it had 3 open TCP connections (lsof output):
avlog 32082 aurora 177u a_inode 0,13 0 11699 [eventfd]
avlog 32082 aurora 178u sock 0,8 0t0 931727 protocol: TCP
avlog 32082 aurora 179u sock 0,8 0t0 969081 protocol: TCP
avlog 32082 aurora 180u sock 0,8 0t0 927838 protocol: TCP
avlog 32082 aurora 181u sock 0,8 0t0 930058 protocol: TCP
avlog 32082 aurora 182u sock 0,8 0t0 931729 protocol: TCP
avlog 32082 aurora 183u sock 0,8 0t0 931730 protocol: TCP
avlog 32082 aurora 258u IPv4 928034 0t0 TCP car:41030->s3-us-east-1-r-w.amazonaws.com:https (ESTABLISHED)
avlog 32082 aurora 264u IPv4 898921 0t0 TCP car:53400->s3-us-east-1-r-w.amazonaws.com:https (ESTABLISHED)
avlog 32082 aurora 374u IPv4 943913 0t0 TCP car:48732->s3-us-east-1-r-w.amazonaws.com:https (ESTABLISHED)
avlog 32082 aurora 401w REG 0,44 601226302 228200 /tmp/aws_sdk_2022-08-18-15.log
Further data (from ss) showed that these connections had not been shut down yet:
avlog 32082 aurora 258u IPv4 RW,ND 928034 0t0 TCP car:41030->s3-us-east-1-r-w.amazonaws.com:https (ESTABLISHED)
avlog 32082 aurora 264u IPv4 RW,ND 898921 0t0 TCP car:53400->s3-us-east-1-r-w.amazonaws.com:https (ESTABLISHED)
avlog 32082 aurora 374u IPv4 RW,ND 943913 0t0 TCP car:48732->s3-us-east-1-r-w.amazonaws.com:https (ESTABLISHED)
After some searching/grepping, we found 3 "lost" connections in the log files, for which we looked up the requests:
[DEBUG] 2022-08-18 12:10:40.806 http-stream [140608895186688] id=0x7fe20de64400: Created client request on connection=0x7fe20de61f00: GET https://aurora-amendments.s3.us-east-1.amazonaws.com/20220716.052558Z.a579m-00509.VO-47484_PLT%40l1657949190.000000000u1657951584.000000000~isp_amen.e4b8797962d7c8ca2cfb67f9f106b8df/20220716.052558Z.a579m-00509.VO-47484_PLT%40l1657949190.000000000u1657951584.000000000~isp_amen.e4b8797962d7c8ca2cfb67f9f106b8df%23port_rear_camera.image_rgb.data.03123 HTTP/1.1
[DEBUG] 2022-08-18 12:59:24.799 http-stream [140608580613888] id=0x7fe1fb043200: Created client request on connection=0x7fe1fb061600: GET https://aurora-amendments.s3.us-east-1.amazonaws.com/20220716.052558Z.a579m-00509.VO-47484_PLT%40l1657949190.000000000u1657951584.000000000~isp_amen.e4b8797962d7c8ca2cfb67f9f106b8df/20220716.052558Z.a579m-00509.VO-47484_PLT%40l1657949190.000000000u1657951584.000000000~isp_amen.e4b8797962d7c8ca2cfb67f9f106b8df%23forward_center_camera.image_rgb.data.06782 HTTP/1.1
[DEBUG] 2022-08-18 14:22:34.721 http-stream [140608907769600] id=0x7fe20e036600: Created client request on connection=0x7fe20e07a700: GET https://aurora-amendments.s3.us-east-1.amazonaws.com/20220716.052558Z.a579m-00509.VO-47484_PLT%40l1657949190.000000000u1657951584.000000000~isp_amen.e4b8797962d7c8ca2cfb67f9f106b8df/20220716.052558Z.a579m-00509.VO-47484_PLT%40l1657949190.000000000u1657951584.000000000~isp_amen.e4b8797962d7c8ca2cfb67f9f106b8df%23starboard_rear_camera.image_rgb.data.03405 HTTP/1.1
All times are in UTC.
The last log entry was made at 15:24 UTC. Hence it looks as if there was no progress on these connections for over an hour.
By default, Linux does not enable TCP keep-alive, which would close (and thereby appropriately fail) stuck TCP connections. aws-c-s3 supports TCP keep-alive, but it needs to be enabled explicitly.
The Linux defaults are also too long (2 hours of idle time, plus roughly 11 minutes of probing):
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
Hence the values need to be lowered, ideally so that a stuck TCP connection is detected within 10 minutes.
We settled on an initial idle interval of 3 minutes (180 s), with 7 probes spaced 60 seconds apart. This detects a dead TCP connection after 10 minutes (180 + 7 × 60 = 600 seconds).
In our experience, before #204 was merged the following settings worked for us; they can easily be ported for use with aws_s3_endpoint_options:
--- a/source/s3_endpoint.c
+++ b/source/s3_endpoint.c
@@ -35,6 +35,10 @@
#include <math.h>
static const uint32_t s_connection_timeout_ms = 30000;
+/* SO_KEEPALIVE settings - detect dead connection after 10 minutes: */
+static const uint16_t s_keep_alive_interval_sec = 180;
+static const uint16_t s_keep_alive_timeout_sec = 60;
+static const uint16_t s_keep_alive_max_failed_probes = 7;
static const uint16_t s_http_port = 80;
static const uint16_t s_https_port = 443;
@@ -137,6 +141,12 @@ static struct aws_http_connection_manager *s_s3_endpoint_create_http_connection_
socket_options.type = AWS_SOCKET_STREAM;
socket_options.domain = AWS_SOCKET_IPV4;
socket_options.connect_timeout_ms = s_connection_timeout_ms;
+
+ socket_options.keepalive = true;
+ socket_options.keep_alive_interval_sec = s_keep_alive_interval_sec;
+ socket_options.keep_alive_timeout_sec = s_keep_alive_timeout_sec;
+ socket_options.keep_alive_max_failed_probes = s_keep_alive_max_failed_probes;
+
struct proxy_env_var_settings proxy_ev_settings;
AWS_ZERO_STRUCT(proxy_ev_settings);
/* Turn on envrionment variable for proxy by default */
I'm looking into benchmarking this library and came across this line, which could potentially be what I'm looking for. However, when I try to execute it, it's not one of the available tests. I have built with the flags laid out in:
mattbr@MacBook-Air ~/D/G/a/a/b/tests (main)> ./aws-c-s3-tests test_s3_get_performance
Available tests:
0. test_s3_copy_http_message
1. test_s3_message_util_assign_body
2. test_s3_ranged_get_object_message_new
3. test_s3_set_multipart_request_path
4. test_s3_create_multipart_upload_message_new
5. test_s3_upload_part_message_new
6. test_s3_complete_multipart_message_new
7. test_s3_abort_multipart_upload_message_new
8. test_s3_client_create_destroy
9. test_s3_client_monitoring_options_override
10. test_s3_client_proxy_ev_settings_override
11. test_s3_client_tcp_keep_alive_options_override
12. test_s3_client_max_active_connections_override
13. test_s3_client_get_max_active_connections
14. test_s3_request_create_destroy
15. test_s3_client_queue_requests
16. test_s3_meta_request_body_streaming
17. test_s3_update_meta_requests_trigger_prepare
18. test_s3_client_update_connections_finish_result
19. test_s3_client_exceed_retries
20. test_s3_client_acquire_connection_fail
21. test_s3_meta_request_fail_prepare_request
22. test_s3_meta_request_sign_request_fail
23. test_s3_meta_request_send_request_finish_fail
24. test_s3_auto_range_put_missing_upload_id
25. test_s3_cancel_mpu_create_not_sent
26. test_s3_cancel_mpu_create_completed
27. test_s3_cancel_mpu_one_part_completed
28. test_s3_cancel_mpu_all_parts_completed
29. test_s3_cancel_mpd_nothing_sent
30. test_s3_cancel_mpd_one_part_sent
31. test_s3_cancel_mpd_one_part_completed
32. test_s3_cancel_mpd_two_parts_completed
33. test_s3_cancel_mpd_head_object_sent
34. test_s3_cancel_mpd_head_object_completed
35. test_s3_cancel_mpd_get_without_range_sent
36. test_s3_cancel_mpd_get_without_range_completed
37. test_s3_cancel_prepare
38. test_s3_get_object_tls_disabled
39. test_s3_get_object_tls_enabled
40. test_s3_get_object_tls_default
41. test_s3_get_object_less_than_part_size
42. test_s3_get_object_empty_object
43. test_s3_get_object_multiple
44. test_s3_get_object_sse_kms
45. test_s3_get_object_sse_aes256
46. test_s3_get_object_backpressure_small_increments
47. test_s3_get_object_backpressure_big_increments
48. test_s3_get_object_backpressure_initial_size_zero
49. test_s3_no_signing
50. test_s3_signing_override
51. test_s3_put_object_tls_disabled
52. test_s3_put_object_tls_enabled
53. test_s3_put_object_tls_default
54. test_s3_multipart_put_object_with_acl
55. test_s3_put_object_multiple
56. test_s3_put_object_less_than_part_size
57. test_s3_put_object_empty_object
58. test_s3_put_object_with_part_remainder
59. test_s3_put_object_sse_kms
60. test_s3_put_object_sse_kms_multipart
61. test_s3_put_object_sse_aes256
62. test_s3_put_object_sse_aes256_multipart
63. test_s3_put_object_sse_c_aes256_multipart
64. test_s3_put_object_sse_c_aes256_multipart_with_checksum
65. test_s3_put_object_singlepart_no_content_md5_enabled
66. test_s3_put_object_singlepart_no_content_md5_disabled
67. test_s3_put_object_singlepart_correct_content_md5_enabled
68. test_s3_put_object_singlepart_correct_content_md5_disabled
69. test_s3_put_object_singlepart_incorrect_content_md5_enabled
70. test_s3_put_object_singlepart_incorrect_content_md5_disabled
71. test_s3_put_object_multipart_no_content_md5_enabled
72. test_s3_put_object_multipart_no_content_md5_disabled
73. test_s3_put_object_multipart_correct_content_md5_enabled
74. test_s3_put_object_multipart_correct_content_md5_disabled
75. test_s3_put_object_multipart_incorrect_content_md5_enabled
76. test_s3_put_object_multipart_incorrect_content_md5_disabled
77. test_s3_upload_part_message_with_content_md5
78. test_s3_upload_part_message_without_content_md5
79. test_s3_create_multipart_upload_message_with_content_md5
80. test_s3_complete_multipart_message_with_content_md5
81. test_s3_put_object_double_slashes
82. test_s3_round_trip
83. test_s3_round_trip_default_get
84. test_s3_round_trip_multipart_get_fc
85. test_s3_round_trip_default_get_fc
86. test_s3_round_trip_mpu_multipart_get_fc
87. test_s3_round_trip_mpu_multipart_get_with_list_algorithm_fc
88. test_s3_round_trip_mpu_default_get_fc
89. test_s3_chunked_then_unchunked
90. test_s3_cancel_mpu_one_part_completed_fc
91. test_s3_cancel_mpd_one_part_completed_fc
92. test_s3_meta_request_default
93. test_s3_put_object_fail_headers_callback
94. test_s3_put_object_fail_body_callback
95. test_s3_get_object_fail_headers_callback
96. test_s3_get_object_fail_body_callback
97. test_s3_default_fail_headers_callback
98. test_s3_default_invoke_headers_callback_on_error
99. test_s3_default_invoke_headers_callback_cancels_on_error
100. test_s3_get_object_invoke_headers_callback_on_error
101. test_s3_put_object_invoke_headers_callback_on_error
102. test_s3_put_object_invoke_headers_callback_on_error_with_user_cancellation
103. test_s3_default_fail_body_callback
104. test_s3_error_missing_file
105. test_s3_existing_host_entry
106. test_s3_put_fail_object_invalid_request
107. test_s3_put_fail_object_inputstream_fail_reading
108. test_s3_put_single_part_fail_object_inputstream_fail_reading
109. test_s3_put_object_clamp_part_size
110. test_s3_auto_ranged_get_sending_user_agent
111. test_s3_auto_ranged_put_sending_user_agent
112. test_s3_default_sending_meta_request_user_agent
113. test_s3_range_requests
114. test_s3_not_satisfiable_range
115. test_s3_bad_endpoint
116. test_s3_different_endpoints
117. test_s3_replace_quote_entities
118. test_s3_strip_quotes
119. test_s3_parse_content_range_response_header
120. test_s3_parse_content_length_response_header
121. test_s3_get_num_parts_and_get_part_range
122. test_s3_aws_xml_get_top_level_tag_with_root_name
123. test_add_user_agent_header
124. test_get_existing_compute_platform_info
125. test_get_nonexistent_compute_platform_info
126. sha1_nist_test_case_1
127. sha1_nist_test_case_2
128. sha1_nist_test_case_3
129. sha1_nist_test_case_4
130. sha1_nist_test_case_5
131. sha1_nist_test_case_5_truncated
132. sha1_nist_test_case_6
133. sha1_test_invalid_buffer
134. sha1_test_oneshot
135. sha1_test_invalid_state
136. sha256_nist_test_case_1
137. sha256_nist_test_case_2
138. sha256_nist_test_case_3
139. sha256_nist_test_case_4
140. sha256_nist_test_case_5
141. sha256_nist_test_case_5_truncated
142. sha256_nist_test_case_6
143. sha256_test_invalid_buffer
144. sha256_test_oneshot
145. sha256_test_invalid_state
146. crc32_nist_test_case_1
147. crc32_nist_test_case_2
148. crc32_nist_test_case_3
149. crc32_nist_test_case_4
150. crc32_nist_test_case_5
151. crc32_nist_test_case_5_truncated
152. crc32_nist_test_case_6
153. crc32_test_invalid_buffer
154. crc32_test_oneshot
155. crc32_test_invalid_state
156. crc32c_nist_test_case_1
157. crc32c_nist_test_case_2
158. crc32c_nist_test_case_3
159. crc32c_nist_test_case_4
160. crc32c_nist_test_case_5
161. crc32c_nist_test_case_5_truncated
162. crc32c_nist_test_case_6
163. crc32c_test_invalid_buffer
164. crc32c_test_oneshot
165. crc32c_test_invalid_state
166. verify_checksum_stream
167. verify_chunk_stream
168. test_s3_put_pause_resume_happy_path
169. test_s3_put_pause_resume_all_parts_done
170. test_s3_put_pause_resume_invalid_checksum
171. test_s3_list_bucket_init_mem_safety
172. test_s3_list_bucket_init_mem_safety_optional_copies
173. test_s3_list_bucket_valid
Failed: test_s3_get_performance is an invalid test name.
Encountered warnings (conversion from 'const uint64_t' to 'size_t') for the Windows 32-bit build at the following two places:
aws-c-s3/source/s3_meta_request.c
Line 701 in 245af3d
static const uint64_t s_response_body_error_buf_size = KB_TO_BYTES(1); /* uint64_t here */
/* ...... */
/* We may have an error body coming soon, so allocate a buffer for that error. */
aws_byte_buf_init(
&request->send_data.response_body_error, meta_request->allocator, s_response_body_error_buf_size); /* convert to size_t here */
And
Line 705 in 245af3d
struct aws_s3_client {
/* ...... */
/* Size of parts for files when doing gets or puts. This exists on the client as configurable option that is passed
* to meta requests for use. */
const uint64_t part_size; /* uint64_t here */
/* ...... */
};
/* ...... */
for (size_t buffer_index = 0; buffer_index < num_buffers; ++buffer_index) {
struct aws_s3_part_buffer *part_buffer =
aws_mem_calloc(client->allocator, 1, sizeof(struct aws_s3_part_buffer));
aws_byte_buf_init(&part_buffer->buffer, client->allocator, client->part_size); /* convert to size_t here */
aws_linked_list_push_back(free_list, &part_buffer->node);
++pool->num_allocated;
}
And build failed because warnings are treated as errors.
No issues with 64-bit build.
Would it be possible to avoid the HeadObject requests when doing a GET range request? I noticed this comment but I wonder if it's something that's feasible, or in the plans?
aws-c-s3/source/s3_auto_ranged_get.c
Line 166 in 83008e5
When reading data in Parquet format (e.g. data lake applications), the file footer needs to be read first, so an implementation that reads from S3 needs to start with a HeadObject request and thus already knows the object size. The data itself may then be read in several small range requests, so making redundant HeadObject requests for each of those adds up latency. I understand that this library is optimized for throughput, but it would be great if there was a way to have those performance benefits without introducing latency in cases where the amount of data read is small.
I'm not familiar with the internals of the auto-range request implementation, but maybe the first request could be made to the last range (at the end of the object) so that an Unsatisfiable error will be returned if the range is out of bounds?
No response
The Go requirement is not documented; I was able to solve it by installing Go 1.20.5 from source (the default Go on Ubuntu 20.04 is 1.14, I think, and with it I got an error about missing includes).
wget https://go.dev/dl/go1.20.5.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.20.5.linux-amd64.tar.gz
mkdir ~/.go
export GOROOT=/usr/local/go
export GOPATH=~/.go
export PATH=$PATH:$GOROOT/bin:$GOPATH/bin
sudo update-alternatives --install "/usr/bin/go" "go" "/usr/local/go/bin/go" 0
sudo update-alternatives --set go /usr/local/go/bin/go
After solving the go issue I am getting another error:
ubuntu@ip-172-31-30-217:/mnt/data/crtsdk$ cmake -S aws-checksums -B aws-checksums/build -DCMAKE_INSTALL_PREFIX=`pwd`
CMake Error at CMakeLists.txt:31 (include):
include could not find load file:
AwsCFlags
CMake Error at CMakeLists.txt:32 (include):
include could not find load file:
AwsCheckHeaders
CMake Error at CMakeLists.txt:33 (include):
include could not find load file:
AwsSharedLibSetup
CMake Error at CMakeLists.txt:34 (include):
include could not find load file:
AwsSanitizers
CMake Error at CMakeLists.txt:36 (include):
include could not find load file:
AwsFindPackage
CMake Error at CMakeLists.txt:37 (include):
include could not find load file:
AwsFeatureTests
CMake Error at CMakeLists.txt:117 (aws_set_common_properties):
Unknown CMake command "aws_set_common_properties".
-- Configuring incomplete, errors occurred!
See also "/mnt/data/crtsdk/aws-checksums/build/CMakeFiles/CMakeOutput.log".
Please advise on the correct way to compile. The first 3 projects do compile, but the 4th and later ones are missing those includes.
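Those Aws*.cmake helper modules (AwsCFlags, AwsFindPackage, ...) are installed by aws-c-common, so aws-c-common must be built and installed first, and every subsequent aws-c-* build must be pointed at that install prefix via CMAKE_PREFIX_PATH. A sketch of the full sequence, assuming all repos are cloned side by side (the exact dependency list varies by version; consult the aws-c-s3 README):

```shell
# Install everything into one shared prefix; later builds find the
# Aws*.cmake modules and the installed libraries through it.
PREFIX=$(pwd)/install

for dep in aws-c-common aws-checksums aws-c-cal aws-c-io \
           aws-c-compression aws-c-http aws-c-sdkutils aws-c-auth aws-c-s3; do
    cmake -S "$dep" -B "$dep/build" \
        -DCMAKE_INSTALL_PREFIX="$PREFIX" \
        -DCMAKE_PREFIX_PATH="$PREFIX"
    cmake --build "$dep/build" --target install
done
```

In the failing command above, only -DCMAKE_INSTALL_PREFIX was passed; adding -DCMAKE_PREFIX_PATH pointing at the prefix where aws-c-common was installed should resolve the "include could not find load file: AwsCFlags" errors.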
This problem occurs when passing an aws_s3_client_config.retry_strategy into the S3 client.
The aws_retry_strategy_new_[...]() functions all initialize the retry_strategy ref_count to 1, which is then relied upon in s_s3_client_finish_destroy_default to destroy the retry_strategy.
For a client-configured retry strategy, however, reference counting works out differently: the caller already holds the initial ref_count of 1, and aws_s3_client_new acquires an additional reference:
// source/s3_client.c
struct aws_s3_client *aws_s3_client_new(
struct aws_allocator *allocator,
const struct aws_s3_client_config *client_config) {
if (client_config->retry_strategy != NULL) {
aws_retry_strategy_acquire(client_config->retry_strategy); // <== HERE
client->retry_strategy = client_config->retry_strategy; // <== INITIAL REF COUNT IS NOW 2
} else {
struct aws_exponential_backoff_retry_options backoff_retry_options = {
.el_group = client_config->client_bootstrap->event_loop_group,
.max_retries = s_default_max_retries,
};
struct aws_standard_retry_options retry_options = {
.backoff_retry_options = backoff_retry_options,
};
client->retry_strategy = aws_retry_strategy_new_standard(allocator, &retry_options); // <== INITIAL REF COUNT = 1
}
}
As a result, the client-configured retry_strategy is never fully released in s_s3_client_finish_destroy_default, causing the code to hang.
Is there anywhere a set of simple examples for using aws-c-s3? The tests seem overly complex. It would be nice to have examples showing how to read an object, write an object, and list a bucket.
original issue: aws/aws-sdk-cpp#2822
When I try to copy an object from one folder to another in the same bucket, I get the error Invalid response status from request (aws-c-s3: AWS_ERROR_S3_INVALID_RESPONSE_STATUS). This happens only for CopyObject; for HeadObject, ListObjects, GetObject, and PutObject everything is correct.
The expected host URL is s3.giraffe360-mimosa.com, and the file should be copied
from
/cold-data/projects/db9768d14f7c4055aa7518e42b633888/floorplan/roomsketcher/floorplan-ALL-final_1_0.svg
to
/cold-data/projects/db9768d14f7c4055aa7518e42b633888/floorplan/roomsketcher/final_backups/floorplan-ALL-final_1_0_2024-01-24_13-03-01.svg
Based on the logs, I can see it uses the wrong host URL:
[INFO] 2024-01-24 11:03:01.675 AuthSigning [140735609304640] (id=0x7fff88004380) Signing successfully built canonical request for algorithm SigV4, with contents
HEAD
/projects/db9768d14f7c4055aa7518e42b633888/floorplan/roomsketcher/floorplan-ALL-final_1_0.svg
host:cold-data.giraffe360-mimosa.com
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20240124T110301Z
host;x-amz-content-sha256;x-amz-date
UNSIGNED-PAYLOAD
For other requests it is correct
[INFO] 2024-01-24 11:03:01.541 AuthSigning [140735592519232] (id=0x7fff840032d0) Signing successfully built canonical request for algorithm SigV4, with contents
PUT
/cold-data/projects/db9768d14f7c4055aa7518e42b633888/floorplan/roomsketcher/geometry.json
content-length:4073
content-md5:rg4HTU7lw0cuZltmQKY+2g==
content-type:binary/octet-stream
host:s3.giraffe360-mimosa.com
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20240124T110301Z
content-length;content-md5;content-type;host;x-amz-content-sha256;x-amz-date
UNSIGNED-PAYLOAD
cold-data is the bucket name.
Aws::Auth::AWSCredentials credentials(access_key, secret_key, session_token);
Aws::S3Crt::ClientConfiguration config;
config.endpointOverride = host_;
config.useVirtualAddressing = false;
config.verifySSL = false;
config.enableEndpointDiscovery = false;
config.enableHostPrefixInjection = false;
config.region = Aws::Region::US_EAST_1;
config.scheme = Aws::Http::Scheme::HTTPS;
config.disableMultiRegionAccessPoints = true;
config.disableS3ExpressAuth = true;
config.payloadSigningPolicy = Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy::Never;
config.useUSEast1RegionalEndPointOption = Aws::S3Crt::US_EAST_1_REGIONAL_ENDPOINT_OPTION::LEGACY;
config.enableEndpointDiscovery = false;
config.enableHostPrefixInjection = false;
config.version = Aws::Http::Version::HTTP_VERSION_3;
std::unique_ptr<Aws::S3Crt::S3CrtClient> client = std::make_unique<Aws::S3Crt::S3CrtClient>(credentials, config);
Aws::S3Crt::Model::CopyObjectRequest request;
request.SetBucket(bucket);
request.SetKey(dst_path);
request.SetCopySource(bucket + "/" + src_path);
auto outcome = client->CopyObject(request);
if (!outcome.IsSuccess()) {
std::cerr << "Failed to copy file " << src_path << " to " << dst_path
<< " in bucket: " << bucket
<< " with error: " << outcome.GetError().GetMessage()
<< " and error code: " << outcome.GetError().GetExceptionName();
}
No response
No response
1.11.249
gcc 13.1.0
Ubuntu 22.04.3 LTS
As far as I know, the API is undocumented (at least I can't find it in search engines, and it isn't listed on the official AWS docs links), but samples go a long way. Unfortunately, the only sample given is complicated by trying to make the s3 command act like aws s3, and it also uses private APIs. Standalone examples which only use the public API would go a long way.
I'm someone trying to use the API, but I'm not even sure where to start aside from calling aws_s3_client_new. I'm unable to follow what the given sample does because it uses private APIs which aren't installed when you install the library, and it's not clear from reading the public headers how to perform basic tasks like "get an object from S3."
No response
No response
Provide pagination metadata so that we can perform retrievals in parallel, i.e. an async ListObjectsV2. Today we never know how many pages there are, and we cannot pull a page until we know the page before it and its dynamic next-token.
Or suggest strategies. Thoughts:
We have a highly complex system writing streaming data to S3. On disaster recovery we need to scan the S3 objects, identify key patterns, and determine where we left off in order to restart at the correct location, across something like 10 million keys.
Today the maximum request is 1000 keys at a time, and there is no way to do this in parallel.
No response
No response
The aws-c-s3 library already supports ranged GETs for doing single byte-range requests in parallel. The S3 backend supports single byte ranges, but not multipart/byteranges, as the following experiment shows.
For a Range header containing bytes=0-499, the server responds with:
ETag: "adca0407e42a4f8b1caea85350a8d2ce-5"
Accept-Ranges: bytes
Content-Range: bytes 0-499/74153027
Content-Type: application/octet-stream
Server: AmazonS3
Content-Length: 500
Response code: 206
To a Range header containing bytes=0-499,500-502, the server responds with:
ETag:"adca0407e42a4f8b1caea85350a8d2ce-5"
Accept-Ranges: bytes
Content-Type: application/octet-stream
Server: AmazonS3
Content-Length: 74153027
Response code: 200
The absence of the Content-Range header and the response code (200 instead of 206) indicate that the backend does not support this optional feature.
Retrieving multiple ranges within a call is very useful, in particular when only a few selected ranges (e.g. chunks) of a large/huge blob are required. This saves round-trip-times and hence speeds up processing.
Multipart byte ranges could be emulated on the client side by the aws-c-s3 library, e.g. by dedicating a meta_request to the emulated multipart/byteranges transfer, issuing the individual single-range requests in parallel, and assembling their results into a multipart/byteranges response.
We are following the example by compiling the s3 demo copy program and running it on Ubuntu 20.04 using the command line:
nohup time aws-c-s3/build/samples/s3/s3 cp s3://vl-sample-dataset-kitti/Kitti/ ~/Kitti --region us-east-2 &
The program runs fine without error, but the number of received files is around 11K; namely, half the files are missing.
# Kitti is the folder of received files with s3 c demo
ubuntu@ip-172-31-30-217:~/Kitti$ du -sh .
6.1G .
ubuntu@ip-172-31-30-217:~/Kitti$ find . -type f | wc
11647 11647 391999
# Kitti2 is the data downloaded using aws s3 sync
ubuntu@ip-172-31-30-217:/mnt/data/crtsdk$ find ~/Kitti2/ | wc
22487 22487 1161674
ubuntu@ip-172-31-30-217:/mnt/data/crtsdk$ du -sh ~/Kitti2
12G /home/ubuntu/Kitti2
All files should be copied locally
Only 50% of the files are copied, there is no error.
The machine has enough disk space for the copy:
ubuntu@ip-172-31-30-217:~$ df -k .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 304681132 245898460 58766288 81% /
ubuntu@ip-172-31-30-217:~$
Attached below are the running logs: a listing via aws-c-s3 (the number of files is correctly 22K), a listing via the aws s3 command (again 22K), and the full output of the run that copied 11K files.
We have opened the bucket permissions so you can try on your own.
nohup time aws-c-s3/build/samples/s3/s3 cp s3://vl-sample-dataset-kitti/Kitti/ ~/Kitti --region us-east-2 &
No response
Interestingly, when looking at the number of download printouts,
grep "download: s3://vl-sample-dataset-kitti/Kitti/raw" nohup.out | sort -u | wc
I see something between 34K and 44K printouts. Maybe due to multithreading?
latest from repo compiled July 6, 2023
ubuntu@ip-172-31-30-217:~/Kitti$ cmake --version
cmake version 3.16.3
CMake suite maintained and supported by Kitware (kitware.com/cmake).
ubuntu@ip-172-31-30-217:~/Kitti$ uname -a
Linux ip-172-31-30-217 5.15.0-1039-aws #44~20.04.1-Ubuntu SMP Thu Jun 22 12:21:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
The instance was a t2.xlarge.
If you visit https://github.com/awslabs, most repos include an About summary that makes it simple to see, at a glance, what the repo is all about. This project does not contain About text.
While debugging a problem with a drastic reduction in transfer speed, I noticed that the client->stats.num_requests_in_flight count does not go back to 0 at the end of the transfer.
The last s3_client statement logged before the shutdown was:
[INFO] 2022-09-27 15:10:09.977 S3ClientStats [140233490298624] id=0x7f8aa4310180 Requests-in-flight(approx/exact):0/998 Requests-preparing:0 Requests-queued:0 Requests-network(get/put/default/total):0/0/0/0
Requests-streaming-waiting:0 Requests-streaming:0 Endpoints(in-table/allocated):0/0
[DEBUG] 2022-09-27 15:10:09.977 S3Client [140233490298624] id=0x7f8aa4310180 Client shutdown progress: starting_destroy_executing=0 body_streaming_elg_allocated=0 process_work_task_scheduled=0 process_work_ta
sk_in_progress=0 num_endpoints_allocated=0 finish_destroy=1
[DEBUG] 2022-09-27 15:10:09.977 S3Client [140233490298624] id=0x7f8aa4310180 Client finishing destruction.
Please note the 998 stats.num_requests_in_flight count in Requests-in-flight(approx/exact):0/998.
Since stats.num_requests_in_flight is used to regulate the number of new requests that are sent, the high exact requests-in-flight count (in contrast to the 0 approximate in-flight requests) caused the request rate to drop drastically.
The following debug log statement brought clarity:
// source/s3_request.c
static void s_s3_request_destroy(void *user_data) {
struct aws_s3_request *request = user_data;
if (request == NULL) {
return;
}
struct aws_s3_meta_request *meta_request = request->meta_request;
if (meta_request != NULL) {
struct aws_s3_client *client = meta_request->client;
if (client != NULL) {
aws_s3_client_notify_request_destroyed(client, request); // <== NOT CALLED WHEN client is NULL
}
}
if (request->tracked_by_client) { // <=== ADDED THIS STATEMENT TO LOGS
AWS_LOGF_ERROR(
AWS_LS_S3_REQUEST,
"id=%p REQUEST DESTROY meta=%p client=%p",
(void *)request,
(void *)meta_request,
(void *) (meta_request == NULL ? NULL : meta_request->client));
}
//...
}
In the log generated by the transfer, there were 1337 (actual number) entries of the following type:
[ERROR] 2022-09-27 15:19:24.323 S3Request [139957177939712] id=0x7f4a4d66af00 REQUEST DESTROY meta=0x7f489dd9b800 client=(nil)
According to the above code, this means that stats.num_requests_in_flight was off because the client field was NULL.
As a result, the transfer speed dropped drastically, from full speed to only a few requests per second.
The client field is needed to accurately maintain the stats.num_requests_in_flight field responsible for request-speed control.
If possible, find the code path that sets meta_request->client to NULL before destroying the request. Otherwise, a different approach is needed for maintaining the counter.
I believe this is responsible for an object metadata request failing when trying to copy objects using the AWS Java SDK v2 Transfer Manager API (and presumably for other clients using this library). I raised aws/aws-sdk-java-v2#3370 initially, then tracked the problem to the changes introduced in #166 .
The object metadata request will fail when talking to a local MinIO server. E.g. with the endpoint specified as localhost:9100, the Host header value should in theory be my-bucket.localhost:9100 (as it is for other SDK client requests), but is actually my-bucket.s3.us-west-2.amazonaws.com, to which MinIO will understandably respond with a 404. I don't know if this also fails when talking to Amazon S3 in regions other than us-west-2.
I'm not familiar enough with this code (and haven't touched C in too long) to offer a solution, but perhaps @cmello or someone else could take a look. No doubt the actual request endpoint is readily available somewhere and can be used to construct the Host header value.
The mem limiter provides a push-back mechanism on the scheduler when memory usage gets close to the limit.
With GETs there is a chicken-and-egg problem: we don't know the size of an object before doing a GET, and we want to avoid making an additional request to figure out that size beforehand (the extra round trips tank GET performance). So the CRT optimistically does a ranged GET with the configured part size to fetch the first ranged part and learn the overall size.
This approach works fine in most cases, but it unnecessarily slows down GETs when the part size is huge and the GETs themselves are small. For example, with a 1 GiB part size and 1 MiB files, the mem limiter can only schedule 4 GETs in parallel (assuming a 4 GiB memory limit), since it must account for the worst case of each GET returning a full 1 GiB part. In practice we should be able to schedule far more GETs in parallel, because they are all small.
refer to aws/aws-sdk-cpp#2922 for example of this in the wild
something better?
Download slows to a crawl on lots of small GETs if the part size is huge.
Set the part size to 1 GiB and observe downloads of 10,000 256 KiB files.
No response
No response
latest
every compiler
every os
We are seeing frequent/intermittent segmentation faults with v0.1.44 and aws-c-http v0.6.19; the problem is also present on master as of today (c1198ae):
[DEBUG] 2022-08-20 01:03:31.497 connection-manager [139657729799936] id=(nil): Acquire connection
FATAL: Received signal 11 (Segmentation fault)
Backtrace (most recent call first)
#10 <?> at 0x7f04b9f75980 in __restore_rt
#9 <?> at 0x55afd6f0535b in s_s3_client_acquired_retry_token
#8 <?> at 0x55afd6f60ff2 in s_exponential_retry_task
#7 <?> at 0x55afd6fceb39 in aws_task_run
#6 <?> at 0x55afd6fced90 in s_run_all
#5 <?> at 0x55afd6fcf1db in aws_task_scheduler_run_all
#4 <?> at 0x55afd6f6a215 in s_main_loop
#3 <?> at 0x55afd6fd3100 in thread_fn
#2 <?> at 0x55afd3f35304 in thread_metrics_start_routine
#1 <?> at 0x7f04b9f6a6db in start_thread
This happens on the first attempt, not on a retry.
It turns out that the `connection_manager` is `NULL` when it is needed:
```c
// aws-c-http/source/connection_manager.c
void aws_http_connection_manager_acquire_connection(
    struct aws_http_connection_manager *manager,
    aws_http_connection_manager_on_connection_setup_fn *callback,
    void *user_data) {

    AWS_LOGF_DEBUG(AWS_LS_HTTP_CONNECTION_MANAGER, "id=%p: Acquire connection", (void *)manager); // <=== HERE
    /* ... */
```
The above function is used as the `.acquire_http_connection` function pointer of the `s_s3_client_default_vtable`, and is invoked from `s_s3_client_acquired_retry_token` like this:
```c
// aws-c-s3/source/s3_client.c
static void s_s3_client_acquired_retry_token(
    struct aws_retry_strategy *retry_strategy,
    int error_code,
    struct aws_retry_token *token,
    void *user_data) {

    // ....

    client->vtable->acquire_http_connection(
        endpoint->http_connection_manager, s_s3_client_on_acquire_http_connection, connection);
}
```
So `endpoint->http_connection_manager` is `NULL` when it should not be.
At `debug` level, the logs do not reveal much, hence we used this patch to log more information:
[INFO] 2022-08-20 01:03:31.496 AuthSigning [139657322956544] (id=0x7f047f82c050) Http request successfully built final authorization value via algorithm SigV4, with contents
AWS4-HMAC-SHA256 Credential=ASIAYO6OYNH5WYXVUNEK/20220820/us-east-1/s3/aws4_request, SignedHeaders=content-length;content-type;host;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=0da7d
7a8a6257b602c2234599874dd5dba60f43ec6f51143efdc67a78269aced
[DEBUG] 2022-08-20 01:03:31.496 task-scheduler [139657528473344] id=0x7f04898cf158: Scheduling s3_client_process_work_task task for immediate execution
[DEBUG] 2022-08-20 01:03:31.496 task-scheduler [139657528473344] id=0x7f04898cf158: Running s3_client_process_work_task task with <Running> status
[DEBUG] 2022-08-20 01:03:31.496 S3Client [139657528473344] id=0x7f04898cf000 s_s3_client_process_work_default - Moving relevant synced_data into threaded_data.
[DEBUG] 2022-08-20 01:03:31.496 S3Client [139657528473344] id=0x7f04898cf000 s_s3_client_process_work_default - Processing any new meta requests.
[DEBUG] 2022-08-20 01:03:31.496 S3Client [139657528473344] id=0x7f04898cf000 Updating meta requests.
[DEBUG] 2022-08-20 01:03:31.496 S3Client [139657528473344] id=0x7f04898cf000 Updating connections, assigning requests where possible.
[ERROR] 2022-08-20 01:03:31.496 S3Client [139657528473344] id=0x7f04898cf000 0x7f04a4fd1ca0 s_s3_client_create_connection_for_request_default: (nil)
[DEBUG] 2022-08-20 01:03:31.497 exp-backoff-strategy [139657528473344] id=0x7f049a6c64d0: Initializing retry token 0x7f048bc12100
[DEBUG] 2022-08-20 01:03:31.497 task-scheduler [139657729799936] id=0x7f048bc12160: Scheduling aws_exponential_backoff_retry_task task for immediate execution
[INFO] 2022-08-20 01:03:31.497 S3ClientStats [139657528473344] id=0x7f04898cf000 Requests-in-flight(approx/exact):1/1 Requests-preparing:0 Requests-queued:0 Requests-network(get/put/default/total):0/1/0/1 Requests-streaming-waiting:0 Requests-streaming:0 Endpoints(in-table/allocated):1/0
[DEBUG] 2022-08-20 01:03:31.497 task-scheduler [139657729799936] id=0x7f048bc12160: Running aws_exponential_backoff_retry_task task with <Running> status
[DEBUG] 2022-08-20 01:03:31.497 exp-backoff-strategy [139657729799936] id=0x7f049a6c64d0: Vending retry_token 0x7f048bc12100
[DEBUG] 2022-08-20 01:03:31.497 connection-manager [139657729799936] id=(nil): Acquire connection
FATAL: Received signal 11 (Segmentation fault)
Backtrace (most recent call first)
#10 <?> at 0x7f04b9f75980 in __restore_rt
#9 <?> at 0x55afd6f0535b in s_s3_client_acquired_retry_token
#8 <?> at 0x55afd6f60ff2 in s_exponential_retry_task
#7 <?> at 0x55afd6fceb39 in aws_task_run
#6 <?> at 0x55afd6fced90 in s_run_all
#5 <?> at 0x55afd6fcf1db in aws_task_scheduler_run_all
#4 <?> at 0x55afd6f6a215 in s_main_loop
#3 <?> at 0x55afd6fd3100 in thread_fn
#2 <?> at 0x55afd3f35304 in thread_metrics_start_routine
#1 <?> at 0x7f04b9f6a6db in start_thread
The line
`[ERROR] 2022-08-20 01:03:31.496 S3Client [139657528473344] id=0x7f04898cf000 0x7f04a4fd1ca0 s_s3_client_create_connection_for_request_default: (nil)`
shows that client 0x7f04898cf000 uses endpoint 0x7f04a4fd1ca0 whose `http_connection_manager` is `(nil)`. This is the bug, which produces the subsequent crash.
Here is the trace for that endpoint from the log:
[DEBUG] 2022-08-20 01:03:31.393 S3Endpoint [139658686979648] id=0x7f04a4fd1ca0: Created connection manager 0x7f0430ea0fc0 for endpoint
[ERROR] 2022-08-20 01:03:31.393 S3Client [139658686979648] id=0x7f04898cf000 0x7f04a4fd1ca0 aws_s3_client_make_meta_request: aurora-simulation-prod-logs.s3.us-east-1.amazonaws.com ADDED 1
[ERROR] 2022-08-20 01:03:31.394 S3Client [139657528473344] id=0x7f04898cf000 0x7f04a4fd1ca0 s_s3_client_create_connection_for_request_default: 0x7f0430ea0fc0
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657616553728] id=0x7f04a4fd1ca0: 0x7f04898cf000 aws_s3_client_endpoint_release, count = 2
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657528473344] id=0x7f04a4fd1ca0: 0x7f04898cf000 aws_s3_client_endpoint_release, count = 2
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657528473344] id=0x7f04a4fd1ca0: s_s3_endpoint_ref_count_zero
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657528473344] id=0x7f04a4fd1ca0: s_s3_endpoint_ref_count_zero - removing connection_manager
[ERROR] 2022-08-20 01:03:31.495 S3Client [139658686979648] id=0x7f04898cf000 0x7f04a4fd1ca0 aws_s3_client_make_meta_request: aurora-simulation-prod-logs.s3.us-east-1.amazonaws.com REF conman: (nil)
[ERROR] 2022-08-20 01:03:31.496 S3Client [139657528473344] id=0x7f04898cf000 0x7f04a4fd1ca0 s_s3_client_create_connection_for_request_default: (nil)
Pay close attention to the thread IDs in this part:
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657616553728] id=0x7f04a4fd1ca0: 0x7f04898cf000 aws_s3_client_endpoint_release, count = 2
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657528473344] id=0x7f04a4fd1ca0: 0x7f04898cf000 aws_s3_client_endpoint_release, count = 2
That means that `aws_s3_client_endpoint_release` was invoked from 2 different threads at the same time (possibly synchronized via mutex), and both ended up calling `aws_ref_count_release(&endpoint->ref_count)`, decrementing it from 2 -> 1 -> 0.
Both initially saw an `endpoint->ref_count.ref_count` of 2, hence neither ejected the endpoint from the hash table.
Here is the function with the above logging added:

```c
// source/s3_endpoint.c
void aws_s3_client_endpoint_release(struct aws_s3_client *client, struct aws_s3_endpoint *endpoint) {
    AWS_PRECONDITION(endpoint);
    AWS_PRECONDITION(client);
    AWS_PRECONDITION(endpoint->handled_by_client);

    AWS_LOGF_ERROR(
        AWS_LS_S3_ENDPOINT,
        "id=%p: %p aws_s3_client_endpoint_release, count = %d",
        (void *)endpoint,
        (void *)client,
        aws_atomic_load_int(&endpoint->ref_count.ref_count)); // <== BOTH SEE 2 HERE

    /* BEGIN CRITICAL SECTION */
    {
        aws_s3_client_lock_synced_data(client);

        /* The last refcount to release */
        if (aws_atomic_load_int(&endpoint->ref_count.ref_count) == 1) { // <== BOTH SEE 2 HERE
            AWS_LOGF_ERROR(
                AWS_LS_S3_ENDPOINT,
                "id=%p: aws_s3_client_endpoint_release - removing from hashtable",
                (void *)endpoint);
            aws_hash_table_remove(&client->synced_data.endpoints, endpoint->host_name, NULL, NULL);
        }

        aws_s3_client_unlock_synced_data(client);
    }
    /* END CRITICAL SECTION */

    aws_ref_count_release(&endpoint->ref_count); // <== BOTH CALL THIS
}
```
Hence the expected `"aws_s3_client_endpoint_release - removing from hashtable"` does not appear in the log.
Instead, since `endpoint->ref_count` now reaches 0, the `http_connection_manager` of endpoint 0x7f04a4fd1ca0 is removed, released, and set to `NULL`:
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657616553728] id=0x7f04a4fd1ca0: 0x7f04898cf000 aws_s3_client_endpoint_release, count = 2
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657528473344] id=0x7f04a4fd1ca0: 0x7f04898cf000 aws_s3_client_endpoint_release, count = 2
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657528473344] id=0x7f04a4fd1ca0: s_s3_endpoint_ref_count_zero
[ERROR] 2022-08-20 01:03:31.493 S3Endpoint [139657528473344] id=0x7f04a4fd1ca0: s_s3_endpoint_ref_count_zero - removing connection_manager
When the `s3_client` next wants to make a request, it finds an entry in said hash table, uses it, and logs the fact that the `http_connection_manager` field of that endpoint is `(nil)`:
[DEBUG] 2022-08-20 01:03:31.495 S3Client [139657528473344] id=0x7f04898cf000 Updating connections, assigning requests where possible.
[DEBUG] 2022-08-20 01:03:31.495 S3MetaRequest [139658686979648] id=0x7f04a3dd0200 Created new Default Meta Request.
[ERROR] 2022-08-20 01:03:31.495 S3Client [139658686979648] id=0x7f04898cf000 0x7f04a4fd1ca0 aws_s3_client_make_meta_request: aurora-simulation-prod-logs.s3.us-east-1.amazonaws.com REF conman: (nil) <=== HERE
[INFO] 2022-08-20 01:03:31.495 S3Client [139658686979648] id=0x7f04898cf000: Created meta request 0x7f04a3dd0200
[INFO] 2022-08-20 01:03:31.495 S3ClientStats [139657528473344] id=0x7f04898cf000 Requests-in-flight(approx/exact):0/0 Requests-preparing:0 Requests-queued:0 Requests-network(get/put/default/total):0/0/0/0 Requests-streaming-waiting:0 Requests-streaming:0 Endpoints(in-table/allocated):1/0
It also reports that it has 1 endpoint (0x7f04a4fd1ca0) in-table, and 0 allocated.
So the cause of the problem is an inconsistency:

- when the `s3_client` loads an `s3_endpoint` from the hash table, it expects its `http_connection_manager` to not be `NULL`,
- but the race in `aws_s3_client_endpoint_release` prevented the required removal of the endpoint from the hash table.

Here is a possible sequence of two threads, T1 and T2, where `.Lock()` stands for taking the `synced_data.lock`:
```
T1.Lock()  // blocks T2, which also needs to go through this section
// reference count is 2, hence do nothing
T1.Unlock()
T2.Lock()  // now grabs the lock
// reference count is 2, hence do nothing
T2.Unlock()
T1.aws_ref_count_release(&endpoint->ref_count); // Atomic, happens in sequence
T2.aws_ref_count_release(&endpoint->ref_count);
// Value of endpoint->ref_count is now 0
```
The same would happen if T2 were allowed to decrement `endpoint->ref_count` first.
Since `ref_count.ref_count` is read within the critical section but modified outside of it, the race condition can occur. It can be avoided by pulling `aws_ref_count_release(&endpoint->ref_count);` inside the critical section.