Comments (16)

aaronenberg avatar aaronenberg commented on July 28, 2024 1

I figured it out. Because I am using a custom storage class that inherits from S3Boto3Storage and overrides S3Boto3Storage.location (the default is ''), the new location is not picked up by s3file. s3file puts the upload into the <bucket_url>/tmp/s3file/ folder; it doesn't know about the prefix that S3Boto3Storage._normalize_name() adds to the path. So when the path is passed to boto3, the API call tries to read the file from the prefixed S3 location, while s3file uploaded it somewhere else.

My initial thought for a fix is to prepend default_storage.location to upload_path in the forms.py module, like so:

upload_path = getattr(
    settings,
    'S3FILE_UPLOAD_PATH',
    pathlib.PurePosixPath(default_storage.location, 'tmp', 's3file')
)

and then remove the default_storage.location + '/' prefix from the path again before passing it to django-storages (str.lstrip would not do here, since it strips a set of characters rather than a prefix).
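To make the mismatch concrete, here is a minimal, self-contained sketch in plain Python. It does not use Django or django-storages: `add_location` and `strip_location` are hypothetical helpers that only imitate the prefixing described above, and the keys are made-up examples. It also shows why `str.lstrip` is a risky way to remove the prefix.

```python
from pathlib import PurePosixPath


def add_location(location: str, key: str) -> str:
    """Imitate a storage backend that prefixes every key with its location."""
    return str(PurePosixPath(location, key)) if location else key


def strip_location(location: str, key: str) -> str:
    """Remove the location prefix from a key, if it is present."""
    prefix = location.strip("/") + "/"
    if prefix != "/" and key.startswith(prefix):
        return key[len(prefix):]
    return key


location = "media"                              # assumed custom storage location
uploaded_key = "tmp/s3file/abc123/report.zip"   # where the browser put the file

# The storage backend looks the file up under the prefixed key ...
looked_up_key = add_location(location, uploaded_key)
# ... which is not the key the file was uploaded to.
mismatch = looked_up_key != uploaded_key

# Removing the prefix by exact match restores the uploaded key.
restored = strip_location(location, looked_up_key)

# str.lstrip strips a *set of characters*, so it can eat into the key itself.
naive = "media/incoming/report.zip".lstrip("media/")  # -> 'ncoming/report.zip'

print(looked_up_key, mismatch, restored, naive)
```

The exact-match helper is used on purpose: a key whose first segment happens to start with one of the letters in the location would be silently mangled by `lstrip`.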

from django-s3file.

codingjoe avatar codingjoe commented on July 28, 2024

Hi @aaronenberg

Thanks for reaching out. Let me try to paraphrase what you are saying, to make sure I got it correctly.

I understand that files are created at /tmp on your application server, but they are not in the upload location on your S3 bucket.

In this particular case, I would suggest checking the browser's network log. Does it actually upload the files to S3, or are they uploaded to your application server? You can check whether your server is correctly configured by inspecting the HTML output of the form. If the input has the class s3file, it should be configured correctly.

Regarding upload time improvements: this package doesn't make any claims of being "faster". That's not the point. This package intends to take load off your application servers. Large file upload requests introduce unnecessary IO and can keep your application server busy or even blocked. In fact, you should configure your application server to reject overly large request bodies, since they can be used as an attack vector for a DoS attack. This is why nginx by default limits the request body size to 1MB.
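As an illustration of rejecting oversized request bodies at the application layer, here is a minimal WSGI middleware sketch. It is not part of django-s3file or nginx; `limit_body_size` and the 1 MiB default are assumptions made for this example, and it only checks the declared Content-Length header, so a reverse proxy limit is still advisable.

```python
def limit_body_size(app, max_bytes=1024 * 1024):
    """Wrap a WSGI app and answer 413 when the declared body is too large."""

    def middleware(environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0  # malformed header: treat as "no declared body"
        if length > max_bytes:
            start_response(
                "413 Payload Too Large",
                [("Content-Type", "text/plain")],
            )
            return [b"Request body too large\n"]
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Tiny demo application used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


# Wrap the demo app; any WSGI callable can be wrapped the same way.
application = limit_body_size(hello_app)
```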

You do have a point though. We could, in theory, send all files (from a single file input) within the same request. In practice, I don't know if S3 supports that functionality, since it's not really CRUD behavior. In fact, they recommend sending multiple concurrent requests. This does result in higher latency, but I presume it is favorable for their internal service architecture.

In fact, latency doesn't matter much on the client side either, since we post all files in parallel. That is probably why you see the supposed speed increase when uploading multiple files at once.

Anyhow, I hope that answer helps you a bit. Let me know if you have any further questions.

Best
-Joe

aaronenberg avatar aaronenberg commented on July 28, 2024

Thanks for responding to my issue.
I don't believe I am getting the intended functionality out of django-s3file, which is to relieve the application server of processing file uploads by sending uploads directly to an S3 bucket. I know I am not, because those uploads are being sent to my application server and then moved to the bucket. You mentioned checking the browser's network logs to see if uploads are sent to the bucket, and I did check for that, but to my knowledge browsers don't actually log that information. A little more about my app: I am using formsets, i.e. multiple file inputs in a single form. I see that this should handle formsets nicely, as on the client side you are creating an individual form baked with S3 API params for every file in an input and for all inputs with the .s3file class. So it is using AWS Signature V4? Thanks.

codingjoe avatar codingjoe commented on July 28, 2024

Hi @aaronenberg

Thanks again for sharing your questions here. I am certain they will help others to debug their own projects.

Anyhow, all modern browsers keep a network log. In Chrome, for example, you find a Network tab in the developer tools. If you tick the "Preserve log" checkbox, it will keep the log even when you navigate away from the page.

[Screenshot 2019-06-16 at 18 26 10]

In any event, I believe I can help you best if you share the rendered HTML with me. Just hit Ctrl+U (Cmd+Option+U on macOS) on your keyboard. I am particularly interested in the input tag. It should have a lot of data attributes, like so:

<input type="file" name="file" data-fields-x-amz-algorithm="AWS4-HMAC-SHA256" data-fields-x-amz-date="20170908T111600Z" data-fields-x-amz-signature="asdf" data-fields-x-amz-credential="testaccessid" data-fields-policy="asdf" data-fields-key="tmp/${filename}" data-url="/s3/" multiple id="id_file" class="s3file">

Oh, and the package utilizes pre-signed URLs, and depending on your configuration it will use v4 signatures. In fact, this should be the default.
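For readers wondering where the data-fields-x-amz-* values come from, the following is a rough standard-library sketch of how a Signature Version 4 POST policy is signed. It is not the code django-s3file or boto3 runs; in practice a library call such as boto3's generate_presigned_post produces these fields. The function name `sign_post_policy` and all credentials, bucket names, and dates below are placeholders.

```python
import base64
import hashlib
import hmac
import json


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_post_policy(secret_key, access_key, region, bucket,
                     key_prefix, amz_date, expiration):
    """Build and sign a POST policy; returns the form fields as a dict."""
    date_stamp = amz_date[:8]  # e.g. "20170908" from "20170908T111600Z"
    credential = f"{access_key}/{date_stamp}/{region}/s3/aws4_request"
    policy = {
        "expiration": expiration,
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],
            {"x-amz-algorithm": "AWS4-HMAC-SHA256"},
            {"x-amz-credential": credential},
            {"x-amz-date": amz_date},
        ],
    }
    policy_b64 = base64.b64encode(
        json.dumps(policy).encode("utf-8")
    ).decode("ascii")

    # Derive the signing key: date -> region -> service -> "aws4_request".
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, "s3")
    k_signing = _hmac_sha256(k_service, "aws4_request")

    # The string to sign is the base64-encoded policy itself.
    signature = hmac.new(
        k_signing, policy_b64.encode("ascii"), hashlib.sha256
    ).hexdigest()

    return {
        "key": key_prefix + "${filename}",
        "policy": policy_b64,
        "x-amz-algorithm": "AWS4-HMAC-SHA256",
        "x-amz-credential": credential,
        "x-amz-date": amz_date,
        "x-amz-signature": signature,
    }


fields = sign_post_policy(
    secret_key="placeholder-secret",
    access_key="testaccessid",
    region="us-east-1",
    bucket="example-bucket",
    key_prefix="tmp/s3file/",
    amz_date="20170908T111600Z",
    expiration="2017-09-08T12:16:00Z",
)
```

An input rendered from these fields carries x-amz-* attributes. An input that carries awsaccesskeyid and signature attributes instead is using the older signature scheme.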

aaronenberg avatar aaronenberg commented on July 28, 2024

I checked the Network tab and I do not see any POST requests to the S3 bucket, only to the application server. Here is a log after submitting the form with a file.

[Screenshot: network]

Here is the input tag:

<input type="file" name="media-0-media" data-fields-key="tmp/s3file/2cdoI5trSTCf9fU6Q1zhLg/${filename}" data-fields-awsaccesskeyid="AKIAYZJR7ZVV7OPJ6E44" data-fields-policy="eyJleHBpcmF0aW9uIjogIjIwMTktMDYtMzBUMTY6MzI6MTJaIiwgImNvbmRpdGlvbnMiOiBbeyJidWNrZXQiOiAic2lyaXVzLXN0YXRpYy1tZWRpYSJ9LCBbInN0YXJ0cy13aXRoIiwgIiRrZXkiLCAidG1wL3MzZmlsZS8yY2RvSTV0clNUQ2Y5ZlU2UTF6aExnIl0sIHsic3VjY2Vzc19hY3Rpb25fc3RhdHVzIjogIjIwMSJ9LCBbInN0YXJ0cy13aXRoIiwgIiRDb250ZW50LVR5cGUiLCAiIl0sIHsiYnVja2V0IjogInNpcml1cy1zdGF0aWMtbWVkaWEifSwgWyJzdGFydHMtd2l0aCIsICIka2V5IiwgInRtcC9zM2ZpbGUvMmNkb0k1dHJTVENmOWZVNlExemhMZy8iXV19" data-fields-signature="UKDVvLdoZKFFCfAhRgRWFYa4uUg=" data-url="https://sirius-static-media.s3.amazonaws.com/" multiple="multiple" id="id_media-0-media" class="s3file">
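When debugging an input like this, it can help to decode the data-fields-policy value, which is just base64-encoded JSON, and check which key prefix the policy allows. Below is a small sketch; `decode_policy` and `allowed_key_prefixes` are hypothetical helper names, and the policy used here is a made-up example rather than the one from the tag above.

```python
import base64
import json


def decode_policy(policy_b64: str) -> dict:
    """Decode a base64-encoded S3 POST policy into a dict."""
    return json.loads(base64.b64decode(policy_b64))


def allowed_key_prefixes(policy: dict) -> list:
    """Collect the prefixes from all ["starts-with", "$key", ...] conditions."""
    prefixes = []
    for condition in policy.get("conditions", []):
        if (
            isinstance(condition, list)
            and len(condition) == 3
            and condition[0] == "starts-with"
            and condition[1] == "$key"
        ):
            prefixes.append(condition[2])
    return prefixes


# Made-up policy, encoded the same way a data-fields-policy attribute is.
example_policy_b64 = base64.b64encode(
    json.dumps(
        {
            "expiration": "2030-01-01T00:00:00Z",
            "conditions": [
                {"bucket": "example-bucket"},
                ["starts-with", "$key", "tmp/s3file/example/"],
                ["starts-with", "$Content-Type", ""],
            ],
        }
    ).encode("utf-8")
).decode("ascii")

policy = decode_policy(example_policy_b64)
print(allowed_key_prefixes(policy))  # -> ['tmp/s3file/example/']
```

Comparing the printed prefix with the key the server later tries to open shows whether both sides agree on where the upload lives.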

codingjoe avatar codingjoe commented on July 28, 2024

Hi @aaronenberg, I am currently working on a new feature to improve local development. Please check out the branch dummy-backend and its documentation: https://github.com/codingjoe/django-s3file/tree/dummy-backend#using-s3file-in-development

Oh, and can you please also check that you have form.media included in your page? There should be a JavaScript file loaded called s3file.js.

aaronenberg avatar aaronenberg commented on July 28, 2024

Hey, I switched to the dummy-backend branch for my local dev environment. I am not seeing s3file.js loaded with either setup. I am including {{ form.media }} in the template.

codingjoe avatar codingjoe commented on July 28, 2024

@aaronenberg did you add s3file to your installed apps?

codingjoe avatar codingjoe commented on July 28, 2024

It's strange that your page does not include the s3file.js script. Can you maybe, just for testing, add it manually to the template?

aaronenberg avatar aaronenberg commented on July 28, 2024

Yes, I added it to installed apps. dummy-backend is not doing anything to the input tag: it is not loading the script, nor using the s3file URL endpoints when submitting the form. On my test server, though, when I add s3file.js directly to the template I get a 403 Forbidden. I have my static files in a public-read S3 bucket, so permissions are a non-issue.

Edit: I had the wrong path to the src in the template script tag.

aaronenberg avatar aaronenberg commented on July 28, 2024

I copied s3file.js into the template and this is the error I'm getting on my test server after submitting the form with a file.

2019-06-17 09:26:48 [ERROR ] Internal Server Error: /outcomes/create/new/
Traceback (most recent call last):
  File "/opt/python/run/venv/local/lib/python3.6/site-packages/storages/backends/s3boto3.py", line 464, in _open
    f = S3Boto3StorageFile(name, mode, self)
  File "/opt/python/run/venv/local/lib/python3.6/site-packages/storages/backends/s3boto3.py", line 72, in __init__
    self.obj.load()
  File "/opt/python/run/venv/local/lib/python3.6/site-packages/boto3/resources/factory.py", line 505, in do_action
    response = action(self, *args, **kwargs)
  File "/opt/python/run/venv/local/lib/python3.6/site-packages/boto3/resources/action.py", line 83, in __call__
    response = getattr(parent.meta.client, operation_name)(**params)
  File "/opt/python/run/venv/local/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/python/run/venv/local/lib/python3.6/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/python/run/venv/local/lib/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/opt/python/run/venv/local/lib/python3.6/site-packages/s3file/middleware.py", line 19, in __call__
    request.FILES.setlist(field_name, list(self.get_files_from_storage(paths)))
  File "/opt/python/run/venv/local/lib/python3.6/site-packages/s3file/middleware.py", line 27, in get_files_from_storage
    f = default_storage.open(path)
  File "/opt/python/run/venv/local/lib/python3.6/site-packages/django/core/files/storage.py", line 36, in open
    return self._open(name, mode)
  File "/opt/python/run/venv/local/lib/python3.6/site-packages/storages/backends/s3boto3.py", line 467, in _open
    raise IOError('File does not exist: %s' % name)
OSError: File does not exist: media/tmp/s3file/Zhi_Ct_KQGaKSFOt42i4Vg/SIRIUS W.M. Keck Collection-20190423T002314Z-001.zip

So boto3 is making a HeadObject API call on the file being uploaded, basically trying to read the file before s3file puts it there.

codingjoe avatar codingjoe commented on July 28, 2024

Great, I saw the new issue you opened and commented on the pull request. As for the missing JS file, this seems solved, correct?

aaronenberg avatar aaronenberg commented on July 28, 2024

Right now I am including the JS file directly in the template. Without that it won't load. It may have something to do with the js path in S3FileInputMixin's Media class. I have STATIC_URL='/static/' and so the widget's src path becomes '/static/s3file/js/s3file.js'. So that's where staticfiles should look for it. It's there but it's just not loading it. Also on dummy-backend, because the DummyS3Boto3Storage class doesn't have S3Boto3Storage as its parent, the isinstance(default_storage, S3Boto3Storage) check in apps.py fails so the mixin doesn't get included in ClearableFileInput.__bases__.
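The isinstance problem can be reproduced without Django. In the sketch below, every class is a stand-in with the same name as the real one, and `maybe_patch` is a hypothetical function imitating a check of the kind described above; it is not the actual apps.py code.

```python
class S3Boto3Storage:
    """Stand-in for the django-storages backend."""


class DummyS3Boto3Storage:
    """Stand-in for a dummy backend that does NOT inherit from S3Boto3Storage."""


class S3FileInputMixin:
    """Stand-in for the widget mixin."""
    uses_s3file = True


class FileInput:
    """Stand-in for Django's FileInput widget."""


class ClearableFileInput(FileInput):
    """Stand-in for Django's ClearableFileInput widget."""


def maybe_patch(widget_cls, storage) -> bool:
    """Add the mixin to widget_cls only if storage is an S3Boto3Storage."""
    if (
        isinstance(storage, S3Boto3Storage)
        and S3FileInputMixin not in widget_cls.__bases__
    ):
        widget_cls.__bases__ = (S3FileInputMixin,) + widget_cls.__bases__
    return S3FileInputMixin in widget_cls.__bases__


# The dummy backend fails the isinstance check, so nothing is patched.
patched_with_dummy = maybe_patch(ClearableFileInput, DummyS3Boto3Storage())

# A real (or subclassed) S3Boto3Storage passes, and the mixin is added.
patched_with_real = maybe_patch(ClearableFileInput, S3Boto3Storage())

print(patched_with_dummy, patched_with_real)  # -> False True
```

With the mixin missing, the widget renders as a plain file input, which would match the observation that the input tag is untouched and the script is not loaded.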

codingjoe avatar codingjoe commented on July 28, 2024

@aaronenberg good point, this needs to be fixed. It's still a working branch, so there are still some bugs. With that feature I intend to make debugging simpler. You are kind of my guinea pigs, thank you for that :)

ktryber avatar ktryber commented on July 28, 2024

@codingjoe @aaronenberg This post saved me quite a bit of time. I wanted to post my learnings in case anyone else is experiencing a slow request time after the file is sent to S3.

For me the django-s3file package was working perfectly, but after the POST request directly to S3 succeeded, my Django app would take about 20 seconds to redirect to the success page. I'm on Heroku, so if it took 30+ seconds my app would error.

The problem was exactly like what @aaronenberg said. I had a custom class overriding S3Boto3Storage

class MediaStorage(S3Boto3Storage):
    location = 'media/'
    file_overwrite = False

I ALSO had in my settings:
AWS_LOCATION = 'static/'

To fix this, I changed:
DEFAULT_FILE_STORAGE = 'config.settings.storage_backends.MediaStorage'

back to the default django-storages file storage:
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'

If anyone else runs into this, you'll need to move your files in S3 to the right directory; otherwise old files will not be available in your app without playing with the URL.

codingjoe avatar codingjoe commented on July 28, 2024

Hi @ktryber, thanks for sharing your findings. I wonder, is your custom storage backend actually a subclass of the django-storages backend? And do you have multiple storage backends set up? If files are copied between backends, this could cause a problem. It could also be problematic if your storage is in a different data center from your application, since the application server must retrieve the file from the storage while processing your POST request. Best, Jo
