DFP External Storage

The simplest cloud file management for Frappe / ERPNext. An S3-compatible external bucket can be assigned per Frappe folder, allowing you to fine-tune the location of your Frappe / ERPNext "File"s: in the local filesystem or in an external S3 bucket.

Frappe files within S3 buckets

Examples / Use cases

All Frappe / ERPNext files into external S3 compatible bucket

upload_frappe_erpnext_files_s3_compatible_bucket.webm

Move files / objects from S3 compatible bucket to another S3 compatible bucket (between buckets in same or different connection)

move_objects_from_one_s3_compatible_bucket_to_another.webm

Move files / objects from S3 compatible bucket to local file system

move_objects_from_s3_compatible_to_local_filesystem.webm

Move files in local filesystem to S3 compatible bucket

move_local_files_to_s3_compatible_bucket.webm

Per file examples

move_file_from_s3_compatible_bucket_to_different_one_then_to_local_file.webm

List all remote files in bucket

Shows all files in the bucket, even the ones not present in the Frappe File doctype.

list_files_in_remote_s3_bucket.webm

Customizable

Choose the setup that fits you best: S3 for all site files or only for specific folders, S3 / MinIO presigned URLs, optional caching of small files, etc.

Settings

Requirements

  • Frappe version >= 14

Functionalities

  • S3 bucket can be defined per folder. If the "Home" folder is assigned, all Frappe / ERPNext files use that S3 bucket.
  • Files are accessible through custom URLs: /file/[File ID]/[file name.extension]
  • Frappe / ERPNext private/public behavior is preserved for external files. If an external private file is requested without access, a "not found" page is shown.
  • External storages can be write-disabled while their files remain visible.
  • Bulk file relocation (upload and download). You can filter by S3 bucket / local filesystem and move all matching files to a different S3 bucket or to the local filesystem. Files are "moved" without being fully loaded into memory, which keeps large transfers efficient.
  • A small icon lets you see whether a file is stored in an S3 bucket.
  • Uploading the same file again (same file hash) reuses the existing S3 key instead of re-uploading, the same behavior Frappe has with local files.
  • File listing tool for the chosen S3 bucket.
  • An S3 bucket cannot be deleted while "File"s are assigned to it.
  • If the bucket is not accessible, the file is uploaded to the local filesystem instead.
  • Stream data in chunks to and from S3 without reading whole files into memory (thanks to Khoran).
  • List all remote objects in a bucket (including the ones not uploaded through Frappe).
  • Support for S3 / MinIO presigned URLs: enables video streaming and other S3 functionality.
  • Presigned URLs can be enabled for all files in the defined folders, restricted by MIME type.
  • Files are now streamed by default.
  • Extended settings per External Storage doc:
    • Cache only files smaller than a given size
    • Cache for x seconds
    • Stream buffer size
    • Presigned URL activation
    • Presigned URLs only for the defined MIME types
    • Presigned URL expiration
    • Use the S3 file size instead of the one saved on the Frappe File (needed for files > 2 GB)
  • ... maybe I am forgetting something ;)

Flow options

  • No S3 external storages defined, or S3 external storages defined but not assigned to folders:
    • All uploaded files are saved in the local filesystem
  • One S3 external storage assigned to the "Attachments" folder:
    • Only files uploaded to that folder use that S3 bucket
  • One S3 external storage assigned to the "Home" folder:
    • All files uploaded to Frappe are stored in that bucket, except files uploaded to "Attachments", which use the bucket defined above

File actions available

  • If a "File" has an "DFP External Storage" assigned.
    • If changed to a different "DFP External Storage" file will be:
      • "downloaded" from previous bucket > "uploaded" to new bucket > "deleted" from previous bucket.
    • If leaved empty, file will be "downloaded" to local filesystem > "deleted" from bucket.
  • If a "File" has no "DFP External Storage" assigned, so it is in local filesystem:
    • If assigned a "DFP External Storage", file will be:
      • "uploaded" to that bucket > "deleted" from filesystem

Setup or try it locally

Install Frappe 14

Follow all steps for your OS in the official guide: https://frappeframework.com/docs/v14/user/en/installation.

Create your personal "frappe-bench" environment (customizable folder name)

Into your home folder:

cd ~
bench init frappe-bench

Install "dfp_external_storage" app

cd ~/frappe-bench
bench get-app git@github.com:developmentforpeople/dfp_external_storage.git

Create a new site with "dfp_external_storage" app installed on it

cd ~/frappe-bench
bench new-site dfp_external_storage_site.localhost --install-app dfp_external_storage

Initialize servers to get site running

cd ~/frappe-bench
bench start

Create one or more "DFP External Storage"s

Add one or more S3 buckets and, this is the most important step, assign the "Home" folder to one of them. This makes every file uploaded to Frappe / ERPNext go to that bucket.

You can select a different folder so that only its files are uploaded, or assign different buckets to different folders; your imagination is the limit!! :D

Stream data to and from S3 without reading whole files into memory

This option is valuable when working with large files.

For uploading content from a local file, usage would look like:

file_doc = frappe.get_doc({
    "doctype":"File",
    "is_private":True,
    "file_name": "file name here"
})
file_doc.dfp_external_storage_upload_file(filepath)
file_doc.save()

To download content to a local file:

file_doc = frappe.get_doc("File", doc_name)
file_doc.dfp_external_storage_download_to_file("/path/to/local/file")

To read remote file directly via a proxy object:

import zipfile

file_doc = frappe.get_doc("File", doc_name)

# Read the zip file's table of contents without downloading the whole zip file
with zipfile.ZipFile(file_doc.dfp_external_storage_file_proxy()) as z:
    for zipinfo in z.infolist():
        print(zipinfo.filename)

Pending

  • Make tests:
    • Create DFP External Storage
    • Upload file to bucket
    • Read bucket file
    • Relocate bucket file
    • Delete bucket file

Contributing

  1. Code of Conduct

Attributions

License

MIT

dfp_external_storage's Issues

attachment sharing not working

When you share an attachment file with a user, the user can't view the content of the attachment, and no error is shown.

DocType Permissions

I am having issues with the app: when it is installed, I get a permission error that says "Access Denied: DFP External Storage by Folder". For some reason I cannot set permissions for this DocType, as it is a child object. Any ideas? Thanks in advance.

Repost Item Valuation currently broken when S3 used for attachments

When some stock values change, a Repost Item Valuation document is created by ERPNext, and it uses an attachment to store some data. At some point it calls get_full_path on this file and tries to write data to it. Of course, if the file has been stored in S3 (since it is an attachment), this fails.

Could you add this doctype ("Repost Item Valuation") to the DFP_EXTERNAL_STORAGE_IGNORE_S3_UPLOAD_FOR_DOCTYPES list? Maybe this could even be a user-configurable option?
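
For reference, a hedged sketch of the requested change; the constant name comes from this issue, and its existing entries are unknown, so they are shown only as a placeholder comment.

DFP_EXTERNAL_STORAGE_IGNORE_S3_UPLOAD_FOR_DOCTYPES = [
    # ...existing doctypes...
    "Repost Item Valuation",
]
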
Thanks.

file_url base is "file" instead of "files"

The default File doctype uses "/files" for the base of the file_url field. But DFPExternalFile uses "/file". This seems to work fine most of the time. But I just encountered a case where it is a problem. I had a Sales Order with two attachments. It was canceled and I was trying to amend it. Part of that process is transferring the attachments to the new, amended Sales Order. At some point the File.validate function is called, which calls File.validate_file_url, which contains the following:

def validate_file_url(self):
    if self.is_remote_file or not self.file_url:
        return

    if not self.file_url.startswith(("/files/", "/private/files/")):
        # Probably an invalid URL since it doesn't start with http either
        frappe.throw(
            _("URL must start with http:// or https://"),
            title=_("Invalid URL"),
        )

In particular, it checks that the file_url field starts with either "/files" or "/private/files". The first one fails since it starts with "/file". Now it seems like this check should never even happen because of the is_remote_file check at the beginning. But somehow I got that exact error message, i.e., "URL must start with http:// or https://" (which seems like a terrible error message since that's not what it's checking, but that's a different issue...). I don't have an explanation for that part. I suspect it is creating a new File (still local) by copying the fields from the DFPExternalFile. Thus it's not yet remote, but still has the invalid file_url data.

Is it possible to just change the prefix to "/files"?

ERPNext Report not working due to call of get_content on file in S3

The built in "Stock Balance" report (and likely other reports) in ERPNext is broken when using S3 for attachments.

I have DFP External Storage configured to put all Attachments in S3 storage. When I open the Stock Balance report and click "Generate New Report", I get the error: "Cannot access file path /file/c6238a107c/2023-43-18-13:12.json.gz". Now that file does exist as a "File" record, and is stored in S3, as it should be.

I dug into the code for the report. When you generate a report it runs the query in the background and then displays the content. The content is saved as a file (the one mentioned above), and this file is attached to a "Prepared Report" record type. When the report wants to display the query results, it fetches this Prepared Report (frappe.core.doctype.prepared_report.prepared_report.py) and calls get_prepared_data() on it. This function fetches the file attached to it (the above json file) as a "File" record, and then calls get_content() on it. This fails because the file does not exist locally, and the get_content() function is only defined on "File", but not on "DFPExternalStorageFile". Thus it does not know how to fetch the remote file contents and throws an exception.

A quick search of frappe and erpnext shows many places where get_content() is called, so this could be a more widespread issue.

I'm not sure what the best way to fix this is, hence this issue. It seems like it might be necessary to override get_content() in DFPExternalStorageFile to catch this case and fetch the content for remote files correctly.
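
A rough sketch of what such an override could look like, assuming DFPExternalStorageFile subclasses Frappe's File class and reusing the dfp_external_storage_file_proxy() helper shown earlier in this README; this is only an illustration, not the project's actual code.

from frappe.core.doctype.file.file import File

class DFPExternalStorageFile(File):
    def get_content(self) -> bytes:
        # If the file lives in a remote bucket, read it through the streaming proxy
        if self.dfp_external_storage_s3_key:
            return self.dfp_external_storage_file_proxy().read()
        # Otherwise fall back to Frappe's local-file behaviour
        return super().get_content()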

import problem

I would like to thank you first for this beautiful project. We currently have an issue with this module: when we try to import data, it doesn't select the file, so the import process doesn't start.

Here is a screenshot of the error; I think this also happens in other places too.

Feature: allow streaming file content directly to S3

I'm willing to implement this feature if you're interested in it. But I thought it would be good to get your feedback on how best to go about it.
I need to be able to store large files (over 2GB). Currently, the only way to save a file is to read its contents into memory and pass it through the contents field. This is not possible for large files without risking a system crash. I was thinking, if you create the file doc like this:

file_doc = frappe.get_doc({
    "doctype":"File",
    "folder":"Home/Uploads",
    "file_name":"name_of_file"
})

then before saving the document, some function is called to upload the content from a file in a temporary location or a byte stream. This could then be directly streamed to S3. Something like:

file_doc.set_contents(open(temp_file, "rb"))

or

file_doc.set_contents(io.BytesIO(b'hello'))

Internally this could call dfp_external_storage_upload_file, but with another argument, which is the byte stream.

Or do you see a better way to implement that?

duplicate file names

ERPNext's built-in file storage handles files with identical names by adding a random string to the file name and also including that in the file_url field.

However, with S3 this behavior is not the same. The URL is made unique by using the name of the File document, but the object storage key only uses the file_name field, not including the unique name field of the File document. Thus, while ERP stores two different File documents, S3 is writing to the same file for both of them.

The effect can be hidden by the cache for a time. If you've downloaded the first upload of a file, then it is cached by name. If you then upload another file with different content but the same file name, it will appear that you can download the two different files with their own content correctly. But if you disable or clear the cache, then you can confirm that the first file has been overwritten by the second file.

The issue seems to stem from the dfp_external_storage_upload_file function:

key = f"{frappe.local.site}/{self.file_name}"
is_public = "/public" if not self.is_private else ""
if not local_file:
    local_file = "./" + frappe.local.site + is_public + self.file_url

try:
    if not os.path.exists(local_file):
        frappe.throw(_("Local file not found"))
    with open(local_file, "rb") as f:
        self.dfp_external_storage_client.put_object(
            bucket_name=self.dfp_external_storage_doc.bucket_name,
            object_name=key,
            data=f,
            length=os.path.getsize(local_file),
            # Meta removed because same s3 file can be used within different File docs
            # metadata={"frappe_file_id": self.name}
        )

    self.dfp_external_storage_s3_key = key
    self.dfp_external_storage = self.dfp_external_storage_doc.name
    self.file_url = f"/{DFP_EXTERNAL_STORAGE_URL_SEGMENT_FOR_FILE_LOAD}/{self.name}/{self.file_name}"

    if delete_file:
        os.remove(local_file)
    self.save()
except Exception as e:

The key field is assigned the site name and file name, but not the unique name field. Then it is used as the object_name and saved as the dfp_external_storage_s3_key.

Could this be changed to include the unique name field in the key as well?
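
For illustration, the smallest version of the fix asked for above would include the File document's unique name in the object key; this is a sketch of the suggestion, not the project's code.

key = f"{frappe.local.site}/{self.name}/{self.file_name}"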

a few tweaks to recent streaming changes

I noticed that in the function dfp_external_file_proxy, you removed the following code

object_info = self.dfp_external_storage_client.stat_object(
    bucket_name=self.dfp_external_storage_doc.bucket_name,
    object_name=self.dfp_external_storage_s3_key)

and use self.file_size instead of object_info.file_size, which seems like a good idea at first. But I just encountered a case where a file that was really several gigabytes had the file_size field set to '14', as in, 14 bytes. The result was that the first small read tried to suck the entire file into memory. Of course, that file_size value should be corrected, but I think in order to make the code robust to user error, we should still fetch the true size from the S3 server. I agree it would have been nice to avoid that extra call though.

Another small issue I encountered was in the file function. The check if not response_values['response'] fails with a KeyError if the key 'response' is not even in the response_values dictionary. So an extra guard of 'response' not in response_values or ... would be good there.
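
A tiny sketch of the extra guard suggested above (variable names follow the issue text, not necessarily the project's code):

if "response" not in response_values or not response_values["response"]:
    # handle a missing or empty S3 response here instead of raising a KeyError
    ...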

Thanks.

URL must start with http:// or https://

Hi developmentforpeople, the last time I experienced this error ("URL must start with http:// or https://") I checked your issues and saw that it had been resolved. So I decided to reinstall the application, but after that I have still been experiencing the same error.

Whenever I try to attach a file from my S3 bucket I get this error: "URL must start with http:// or https://".
Can you please help?

streamed downloads

I need to be able to download files larger than memory (potentially). I just realized recently that this is not currently happening. While looking at the file function that does the downloading, I noticed your comment "# TODO: For videos enable direct stream from S3". So it seems you've thought about this too.

I'm willing to implement something, as I need this feature right now. But I thought I'd ask you first if you had some ideas/preferences on how to implement this? Or should I just come up with something on my own?

Do you think it would be possible and/or a good idea, to download directly from S3? Either via a redirect response, or just have a function that returns an S3 URL that could be used however?
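
One possible shape of the redirect idea above, sketched with the MinIO client's presigned_get_object() method; whether the app exposes the client and fields exactly like this is an assumption based on the code excerpts in these issues.

from datetime import timedelta

# Presigned URL valid for one hour; a Frappe endpoint could simply redirect the browser to it
url = file_doc.dfp_external_storage_client.presigned_get_object(
    bucket_name=file_doc.dfp_external_storage_doc.bucket_name,
    object_name=file_doc.dfp_external_storage_s3_key,
    expires=timedelta(hours=1),
)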

When simulating file loss on the bucket, images show a 404 error

I have tried the app in various scenarios, one of them being the total loss of the bucket.

Steps

  1. Delete all content from the bucket.
  2. I then use the files from the replication bucket to simulate file recovery.
  3. I managed to get everything back (had to reconstruct DFP External Storage: S3 Key by adding the folder).
  4. Everything seemed to work fine, but then I noticed that the image files are not recognized (404 error).
  5. I can move the files back to the server (private/files or public/files folders) and they show up, download and preview correctly. I do not get any error from the app; it moves the file back to the server and deletes it from the bucket.
  6. Assigning them back to the bucket again shows a 404 error (the file disappears from the server and appears in the bucket as expected).
  7. I re-checked the bucket and the files do exist, but ERPNext shows no preview and downloading generates a 404 error.

Actions taken

  1. I tried optimizing the file again (while on the server, step 5) but got the same error. I have confirmed this behavior with all "lost" image files and confirmed that all other file types work perfectly.
  2. Adding new images also works perfectly.

Using Backblaze B2 for the bucket.

Regards

Luis Montanaro

files in cloud are opening in browser instead of downloading

Files stored in S3 using dfp_external_storage, like Excel spreadsheets (.xlsx), open in the browser (showing binary junk) instead of downloading. If I move the file back to local storage, it downloads to the machine as expected. This can be triggered by going to the file entry and clicking the "Download" button at the top, or by using the file URL.

There may be some relation to the version of Frappe. I have a development version at frappe v14.24.0 and dfp_external_storage 0.9.1 which does NOT exhibit the problem. My production version is frappe v14.28.0, dfp_external_storage 0.9.1 and does have the problem. I also had a staging version that started at v14.24.0, with no problem, and then started to show the problem after upgrading it to frappe v14.28.0.

Here is an example file. The problem occurs with other file types as well, though for things that can be displayed by the browser (like .jpg, .pdf, etc.) it's not so much of a problem.
download_test.xlsx

accessing property on self.dfp_external_storage_doc when it is None

I installed this app and got an error about self.dfp_external_storage_doc.enabled being accessed on a "None" valued object. Looking at the code, this comes from line 209 of doctype/dfp_external_storage/dfp_external_storage.py:

if not self.dfp_external_storage_doc.enabled:

The value self.dfp_external_storage_doc is a property which can sometimes return None, leading to the above error. So I suggest adding a guard to that statement, like:

if not (self.dfp_external_storage_doc and self.dfp_external_storage_doc.enabled):

That fixed the bug for me and it has been working great since.
