janko / down
Streaming downloads using net/http, http.rb, HTTPX or wget
License: MIT License
Hey there!
I noticed this behavior while trying to make a request to an image resource, using If-None-Match: <etag> in the request headers. As expected, the server returns a 304 Not Modified response when the etag matches that of the existing resource.
In net_http_request, there is a clause that checks if the response.is_a?(Net::HTTPRedirection), and then attempts to verify that the redirect address (response["Location"]) is a valid address. Although 304 Not Modified belongs to the redirection class of status codes, it does not return a Location field.
As a result, we pass a nil url value to ensure_uri, and end up with ArgumentError: bad argument (expected URI object or URI string)
Is this expected behavior? It seems like it may be a good idea to add a check for Net::HTTPNotModified, and then raise an error that fits into Down's exception hierarchy (maybe Down::NotModified?).
If this would be a welcome change, I'm happy to help out.
Thanks!
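The class relationship behind this report can be checked without any network call, because Net::HTTPNotModified inherits from Net::HTTPRedirection in Ruby's standard library. The following is only a minimal sketch of the kind of guard suggested above, not Down's actual code: NotModifiedSketch and redirect_target are hypothetical names standing in for a possible Down::NotModified and for the redirect handling in net_http_request.

```ruby
require "net/http"

# Hypothetical stand-in for a possible Down::NotModified error class.
class NotModifiedSketch < StandardError; end

# Hypothetical helper: returns the redirect target, or nil when the
# response is not a redirect. A 304 is reported explicitly instead of
# falling through with a nil Location header.
def redirect_target(response)
  raise NotModifiedSketch, "304 Not Modified" if response.is_a?(Net::HTTPNotModified)
  return nil unless response.is_a?(Net::HTTPRedirection)

  response["Location"]
end

# Build a bare 304 response object by hand (no request is made).
not_modified = Net::HTTPNotModified.new("1.1", "304", "Not Modified")
puts not_modified.is_a?(Net::HTTPRedirection)
puts not_modified["Location"].inspect
```

Checking for the 304 class before the generic redirection check keeps the existing redirect path unchanged for responses that do carry a Location header.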
The detail is here: #5
I only changed README.md, but the Travis CI build failed. Does anyone know why?
Any help is appreciated. Thanks!
Can the call with the destination argument return a path?
Hi, I'm using Down to automatically download some files from my webserver... Some of these have spaces in the name, so I replaced all spaces with %20. By pasting the URL in the browser I get the file downloading, but when using Down.download I get a ResponseError - 301 Moved Permanently error from the gem. How can I fix it?
I don't understand how to save the file or is it not designed to do that?
Excellent gem, thanks for making it.
If you run this code...
# test.rb
require 'down'
down = Down.open('https://github.com')
down.each_chunk() {|chunk| }
down.close()
...using ruby -w test.rb, you'll get a lot of warnings while downloading:
~/.gem/ruby/2.7.0/gems/down-5.1.1/lib/down/chunked_io.rb:304: warning: instance variable @next_chunk not initialized
~/.gem/ruby/2.7.0/gems/down-5.1.1/lib/down/chunked_io.rb:264: warning: instance variable @closed not initialized
This isn't a big deal, but fills up my log files if warnings are turned on.
Thanks
Thanks for making an experimental wget wrapper; this is perfect for me because I need to handle an FTP server with flaky IPv6 support and I don't see any other Ruby libraries that let me force IPv4 easily!
When I try to download this file however it raises EOFError:
require "down/wget"
wget = Down::Wget.new("--inet4-only")
wget.open("ftp://lhcftp.nlm.nih.gov/Open-Access-Datasets/Pills/README")
# EOFError (end of file reached)
When I manually run the command it generates in my CLI, it works okay:
wget.send(:generate_command, "ftp://lhcftp.nlm.nih.gov/Open-Access-Datasets/Pills/README").join(" ")
# => "wget --no-verbose --save-headers -O - --inet4-only --user-agent Down/5.0.0 --max-redirect 2 --dns-timeout 30 --connect-timeout 30 --read-timeout 30 ftp://lhcftp.nlm.nih.gov/Open-Access-Datasets/Pills/README"
Here's my wget version FWIW
$ wget --version
GNU Wget 1.19.4 built on linux-gnu.
-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls
+ntlm +opie +psl +ssl/openssl
Wgetrc:
/etc/wgetrc (system)
Locale:
/usr/share/locale
Compile:
gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc"
-DLOCALEDIR="/usr/share/locale" -I. -I../../src -I../lib
-I../../lib -Wdate-time -D_FORTIFY_SOURCE=2 -DHAVE_LIBSSL -DNDEBUG
-g -O2 -fdebug-prefix-map=/build/wget-Xb5Z7Y/wget-1.19.4=.
-fstack-protector-strong -Wformat -Werror=format-security
-DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall
Link:
gcc -DHAVE_LIBSSL -DNDEBUG -g -O2
-fdebug-prefix-map=/build/wget-Xb5Z7Y/wget-1.19.4=.
-fstack-protector-strong -Wformat -Werror=format-security
-DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall -Wl,-Bsymbolic-functions
-Wl,-z,relro -Wl,-z,now -lpcre -luuid -lidn2 -lssl -lcrypto -lpsl
ftp-opie.o openssl.o http-ntlm.o ../lib/libgnu.a
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://www.gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Originally written by Hrvoje Niksic <[email protected]>.
Please send bug reports and questions to <[email protected]>
I will try to poke around and see if I can figure out what's going on but wanted to file an issue first.
Hello
We use the down gem heavily to import a lot of documents and files from other sites.
To do that, we run jobs on a Sidekiq queue, where each worker calls Down.download. We currently have 10 workers.
However, we noticed that some downloaded files are old ones, meaning Down.download is not downloading the fresh ones. It seems like the tempfile created internally is not unlinked.
I will try to create a separate test environment to demonstrate the bug.
Hi,
First of all, thank you for writing this software. I tried fixing this myself but could not get the ChunkedIO CSV test to work. Currently, Ruby's csv calls io#gets with a nil separator and a limit. In ChunkedIO's current implementation, this causes ChunkedIO to read the entire data source, because it always does so when the separator is nil.
When using CSV.new(Chunk.open('http://some/large/file.csv')).each {...}, I used print statements to verify that the entire file is downloaded before any parsing occurs.
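To make the expected behavior concrete, here is a minimal sketch of a gets(nil, limit) that pulls only as many chunks as the limit requires. ChunkedReaderSketch is a hypothetical, heavily simplified stand-in and not Down::ChunkedIO; it assumes ASCII or binary data and ignores encodings, rewinding and closing.

```ruby
# Hypothetical, simplified chunked reader; not Down::ChunkedIO.
class ChunkedReaderSketch
  attr_reader :chunks_pulled

  def initialize(chunks)
    @chunks = chunks.each   # external enumerator over the chunks
    @buffer = String.new    # unread data carried between calls
    @chunks_pulled = 0      # how many chunks were fetched so far
    @eof = false
  end

  # With a nil separator and a limit, return up to `limit` bytes and
  # stop pulling chunks as soon as the buffer holds enough data.
  def gets(separator = $/, limit = nil)
    fill_buffer until @eof || satisfied?(separator, limit)
    return nil if @buffer.empty?

    length = line_length(separator)
    length = [length, limit].min if limit
    @buffer.slice!(0, length)
  end

  private

  def fill_buffer
    @buffer << @chunks.next
    @chunks_pulled += 1
  rescue StopIteration
    @eof = true
  end

  def satisfied?(separator, limit)
    return true if limit && @buffer.bytesize >= limit

    !separator.nil? && @buffer.include?(separator)
  end

  def line_length(separator)
    index = separator && @buffer.index(separator)
    index ? index + separator.bytesize : @buffer.bytesize
  end
end

reader = ChunkedReaderSketch.new(%w[aaaa bbbb cccc])
puts reader.gets(nil, 6)
puts reader.chunks_pulled
```

The point of the sketch is only the stopping condition: when a limit is given, the nil separator no longer implies draining the whole source.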
I am using down (5.2.0) with Rails 5.x and am able to download the file successfully, but the filename gets changed, so I am trying to rename the tempfile. However, the call returns nil:
pry> tempfile = Down.download('https://findkollegie.dk/wp-content/uploads/2018/04/knap-FK-300x54.png', destination: '/home/amit')
=> nil
> tempfile.path
NoMethodError:
undefined method `path' for nil:NilClass
As per the doc this is expected behaviour but shouldn't it have consistent behaviour?
Hello all,
First of all, thank you for the wonderful gem.
I think I've encountered a bug when using the destination option - the file gets downloaded to the correct location, but the return value from Down.download is nil. Omitting that option returns a File instance.
For example, this works:
url = "https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png"
file = Down.download(url)
# <File:/var/folders/96/q0yz5pns6f7c2y4zld38hsjd4tw3vw/T/down-net_http20190515-83878-bdxd4c.png>
but this does not:
file = Down.download(url, destination: Rails.root.join('public/downloads/' + SecureRandom.uuid))
# nil
even though the file is created (running find public yields public/downloads/e959fba2-cc08-429c-b8df-7f9938eff8f9).
Is this the intended behavior, or am I doing something wrong? I am using version 4.8.0.
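Since the file does land at the requested location, one possible workaround is to reopen it from that path after the call returns. This is a sketch under assumptions, not part of Down: download_to is a hypothetical helper, and the downloader argument is any callable that accepts (url, destination:), so the example does not depend on the gem being installed.

```ruby
# Hypothetical wrapper around a download call that returns nil when a
# destination is given. `downloader` is any callable taking
# (url, destination:), for example:
#   ->(url, destination:) { Down.download(url, destination: destination) }
def download_to(url, destination, downloader:)
  path = destination.to_s
  downloader.call(url, destination: path)

  # Hand back an open File, similar to the call without a destination.
  # The caller is responsible for closing it.
  File.open(path, "rb")
end
```

With this shape, code that expects a file object (for example to call path on it) behaves the same whether or not a destination was supplied.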
How to fix this error:
Down::SSLError (SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get local issuer certificate))
I'm not sure if this is so much an issue with down as a side effect of not having used require 'open-uri' wherever I had previously used open(url_here) methods within my app.
After running a bundle update, down was updated to 4.0.0, and then I noticed it was breaking unless I required open-uri.
Is it possible that previously, down was requiring this, but now doesn't?
Also, is it always necessary to require 'open-uri' when using open()?
Hi,
I'm working with the Shrine gem and uploading PDF files to S3 (direct upload).
During the upload, the code is adding some metadata such as the page count.
I noticed that PDF::Reader accepts IO objects as input, but it's not working with the Down::ChunkedIO class.
This is the code from the shrine uploader:
add_metadata do |file, metadata:, record:, **|
  case metadata["mime_type"]
  when "application/pdf"
    io = file.to_io if file.respond_to?(:to_io)
    reader = PDF::Reader.new(io)
    { page_count: reader.page_count }
  end
end
Exception:
ArgumentError: input must be an IO-like object or a filename (Down::ChunkedIO)
Code from PDF::Reader that will try to read the IO object.
https://github.com/yob/pdf-reader/blob/625e8dca295d8b6a48f634cba812f14fd3b805a4/lib/pdf/reader/object_hash.rb#L601
Shouldn't Down::ChunkedIO be considered an IO?
The code may work if the input is a full file (file.download) or a StringIO, so there are alternatives there.
Any thoughts?
Thank you.
We are using Shrine for handling large CSV uploads that are then stream-processed. We have a progress meter for this which works off the underlying IO object's #pos values. For local files, this works perfectly. Once we went into our Staging environment with S3 as the storage engine, using Down under the hood, it all broke. It seems that after the first 1K of data, Down::ChunkedIO#pos starts returning values much, much higher than they should be - far beyond the end of the file.
For a particular test file of only 3669 bytes comprising around 55 CSV rows plus header, the size reported by the IO object was consistently correct. However, inside the CSV row iterator, the results of #pos were:
0
1024
1024
1024
1024
1024
1024
1024
1024
1024
1024
1024
1024
1024
3736
6268
8732
11134
13466
15730
17923
20045
22103
24087
26017
27888
29698
31455
33155
34794
36363
37878
39313
40687
41998
43249
44431
45549
46598
47581
48498
49349
50137
50861
51519
52117
52656
53138
53562
53924
54220
54465
54647
54774
54840
54840
The start offset is 0. The 1024 offset was presumed to be a chunk size from the CSV processor, but if I tried to rewind to zero and read 1024 bytes, I actually got a very strange 1057 bytes, perfectly aligned to a row end, instead. In any event, it then sits at 1024 for a while and once the CSV parsing seems to have gone past that first "chunk" - be it 1024 or 1057 bytes - then the positions reported become, as you can see, very wrong.
The above was generated with no rewinding or other shenanigans; in pseudocode we have:
# shrine_file is our Shrine subclass instance representing the S3 object. The
# encoding specifier is typically UTF-8.
#
# Inside the iterator, io_obj is the Down::ChunkedIO instance. CSV options are:
#
# {:headers=>true, :header_converters=>:symbol, :liberal_parsing=>true}
#
shrine_file.open(encoding: encoding_specifier) do |io_obj|
  csv = CSV.new(io_obj, **options)
  csv.each do |row|
    puts io_obj.pos
  end
end
for example:
# frozen_string_literal: true
io = Down::ChunkedIO.new(chunks: %w(one two three).each)
IO.copy_stream(io, some_stream)
=>
FrozenError:
can't modify frozen String
# /Users/lasto/.rvm/gems/ruby-2.5.3/gems/down-4.8.1/lib/down/chunked_io.rb:275:in `force_encoding'
# /Users/lasto/.rvm/gems/ruby-2.5.3/gems/down-4.8.1/lib/down/chunked_io.rb:275:in `retrieve_chunk'
# /Users/lasto/.rvm/gems/ruby-2.5.3/gems/down-4.8.1/lib/down/chunked_io.rb:162:in `readpartial'
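The backtrace points at force_encoding being called on a frozen chunk. As an illustration of one way to avoid mutating frozen input, here is a small sketch; binary_chunk is a hypothetical helper and not the gem's code. It relies on String#+@, which returns the receiver when it is mutable and an unfrozen copy when it is frozen.

```ruby
# Hypothetical helper, not Down's code: tag a chunk as binary without
# mutating a frozen string. Note that a mutable chunk is still
# re-tagged in place; only frozen chunks are copied first.
def binary_chunk(chunk)
  (+chunk).force_encoding(Encoding::BINARY)
end

# Frozen chunks, as produced under `# frozen_string_literal: true`.
chunks = %w[one two three].map(&:freeze)
chunks.each { |chunk| puts binary_chunk(chunk).encoding }
```

Copying only when the chunk is frozen avoids an extra allocation for the common case of mutable chunks coming off the network.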
Thank you for your work on this gem.
I've a problem with downloading a file, where I need to set a cookie:
dossier_file = Down::NetHttp.download(
dossier_path,
read_timeout: 300,
headers: {
'Cookie': "#{Settings.cookies.session_token}=#{@session_token}",
'User-Agent': @user_agent
}
)
When I do exactly the same with open it works.
unrecognized option: Cookie
/usr/local/lib/ruby/3.0.0/open-uri.rb:111:in `block in check_options'
/usr/local/lib/ruby/3.0.0/open-uri.rb:108:in `each'
/usr/local/lib/ruby/3.0.0/open-uri.rb:108:in `check_options'
/usr/local/lib/ruby/3.0.0/open-uri.rb:132:in `open_uri'
/usr/local/lib/ruby/3.0.0/open-uri.rb:721:in `open'
/usr/local/bundle/gems/down-5.2.1/lib/down/net_http.rb:146:in `open_uri'
/usr/local/bundle/gems/down-5.2.1/lib/down/net_http.rb:93:in `download'
/usr/local/bundle/gems/down-5.2.1/lib/down/backend.rb:13:in `download'
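One possible cause, judging from the backtrace: the 'Cookie': hash syntax builds Symbol keys, and open-uri treats Symbol keys as options while String keys are sent as header fields. The sketch below only demonstrates the key types; the header values are made up for illustration, and whether this fully explains the report is an assumption.

```ruby
# The quoted-colon syntax produces Symbol keys, not String keys.
headers = { 'Cookie': "session=abc", 'User-Agent': "example-agent" }
puts headers.keys.inspect

# Writing "Cookie" => "..." directly, or converting the keys, yields
# the String keys that are treated as header fields.
string_headers = headers.transform_keys(&:to_s)
puts string_headers.keys.inspect
```

If this is indeed the cause, writing the headers hash with "Cookie" => ... and "User-Agent" => ... should be enough on the calling side.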
Hi,
Down::InvalidUrl
/usr/local/bundle/gems/down-4.5.0/lib/down/net_http.rb:236:in `rescue in ensure_uri'
/usr/local/bundle/gems/down-4.5.0/lib/down/net_http.rb:231:in `ensure_uri'
/usr/local/bundle/gems/down-4.5.0/lib/down/net_http.rb:65:in `download'
/usr/local/bundle/gems/down-4.5.0/lib/down/backend.rb:13:in `download'
the standard URI lib used by down cannot parse URLs with some unusual characters, such as [.
Please consider using https://github.com/sporkmonger/addressable; if you don't mind, I can also make a PR for it.
Thanks in advance.
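As a stopgap that stays within the standard library, the offending characters can be percent-encoded before parsing. This is a minimal sketch, not what Down does: parse_with_escaped_brackets is a hypothetical helper, it handles only square brackets, and it would also mangle a bracketed IPv6 host, so it assumes URLs with ordinary hostnames.

```ruby
require "uri"

# Hypothetical helper: percent-encode square brackets before handing
# the string to the standard URI parser. Everything else is left
# untouched, so sequences such as %20 are not double-encoded.
def parse_with_escaped_brackets(url)
  URI.parse(url.gsub(/[\[\]]/) { |char| format("%%%02X", char.ord) })
end

uri = parse_with_escaped_brackets("https://example.com/files/report_[1].png")
puts uri.path
```

A dedicated library such as Addressable covers far more cases; the sketch only shows why pre-escaping makes the same URL acceptable to URI.parse.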
Add Travis CI here, too, please!
Thanks very much!
I am archiving a zip file uploaded with Shrine to S3. Zip::File is complaining that ChunkedIO is not an IO-like object.
The error is:
RuntimeError: Zip::File.open_buffer expects a String or IO-like argument (responds to tell, seek, read, close). Found: Down::ChunkedIO
It looks like ChunkedIO is not implementing tell or seek.
I have worked around this problem by calling download instead of open on the Shrine object.
I already found this issue (#44), but I'm using Active Storage in this case with an S3 back-end, so the proposed fix is not useful to me.
What I'm wanting to do is this:
property.asset.attach(io: Down.open("https://somedomain.tld/file.mp4"),
filename: "file.mp4",
content_type: "video/mp4")
So that it can read from the IO and stream the upload and download. Because it uses the S3 gem in the backend, it tries to read the byte_size to determine whether it needs a single-part or multipart upload.
Would it be possible to provide a fix in this gem so that ActiveStorage will work too?
Can I use this gem with open-uri-cached?
https://github.com/tigris/open-uri-cached
I tried, but got the error below.
ActionView::Template::Error (undefined method `metas' for #<StringIO:0x007fead305a968>)
This is my code.
# works.
image = open(item.first_image_url)
# this code works first time, but got above error second time.
#image = Down.download(item.first_image_url, read_timeout: 1)
I'm using this gem with Rails project.
thanks.
The max_size option is not working with the http backend:
#Gemfile.lock
down (5.2.2)
http (5.0.1)
to reproduce:
Down.backend :http
remote_file = Down.open(src, max_size: 5 * 1024 * 1024)
remote_file.each_chunk do |chunk|
# do something
end
Hi Janko, thank you for your awesome job!
We encountered a NoMethodError: undefined method 'bytesize' error from down/chunked_io.rb:173:in `readpartial'.
We use the Shrine gem, which depends on your gem, and got the error in the traceback below.
Could the aws-sdk-core gem update be related to this issue?
It would be much appreciated if you could look into this issue.
NoMethodError: undefined method `bytesize' for #<Array:0x00007f8c067a8688>
from down/chunked_io.rb:173:in `readpartial'
from shrine/uploaded_file.rb:146:in `copy_stream'
from shrine/uploaded_file.rb:146:in `block in stream'
from shrine/uploaded_file.rb:98:in `open'
from shrine/uploaded_file.rb:146:in `stream'
from shrine/uploaded_file.rb:122:in `download'
from shrine/plugins/derivatives.rb:272:in `process_derivatives'
from shrine/plugins/derivatives.rb:184:in `create_derivatives'
from shrine/plugins/derivatives.rb:44:in `block in define_model_methods'
from (eval):4:in `block (3 levels) in run_file'
from active_record/relation/batches.rb:70:in `block (2 levels) in find_each'
from active_record/relation/batches.rb:70:in `each'
from active_record/relation/batches.rb:70:in `block in find_each'
from active_record/relation/batches.rb:136:in `block in find_in_batches'
from active_record/relation/batches.rb:238:in `block in in_batches'
from active_record/relation/batches.rb:222:in `loop'
from active_record/relation/batches.rb:222:in `in_batches'
from active_record/relation/batches.rb:135:in `find_in_batches'
from active_record/relation/batches.rb:69:in `find_each'
from active_record/querying.rb:21:in `find_each'
from (eval):3:in `block (2 levels) in run_file'
from seed-fu/runner.rb:46:in `eval'
from seed-fu/runner.rb:46:in `block (2 levels) in run_file'
from seed-fu/runner.rb:58:in `block in open'
from seed-fu/runner.rb:57:in `open'
from seed-fu/runner.rb:57:in `open'
from seed-fu/runner.rb:36:in `block in run_file'
from active_record/connection_adapters/abstract/database_statements.rb:280:in `block in transaction'
from active_record/connection_adapters/abstract/transaction.rb:280:in `block in within_new_transaction'
from active_support/concurrency/load_interlock_aware_monitor.rb:26:in `block (2 levels) in synchronize'
from active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt'
from active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize'
from active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
from active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'
from active_record/connection_adapters/abstract/transaction.rb:278:in `within_new_transaction'
from active_record/connection_adapters/abstract/database_statements.rb:280:in `transaction'
from active_record/transactions.rb:212:in `transaction'
from seed-fu/runner.rb:35:in `run_file'
from seed-fu/runner.rb:26:in `block in run'
from seed-fu/runner.rb:25:in `each'
from seed-fu/runner.rb:25:in `run'
from seed-fu.rb:29:in `seed'
from tasks/seed_fu.rake:36:in `block (2 levels) in <main>'
from rake/task.rb:281:in `block in execute'
from rake/task.rb:281:in `each'
from rake/task.rb:281:in `execute'
from rake/task.rb:219:in `block in invoke_with_call_chain'
from monitor.rb:235:in `mon_synchronize'
from rake/task.rb:199:in `invoke_with_call_chain'
from rake/task.rb:188:in `invoke'
from rake/application.rb:160:in `invoke_task'
from rake/application.rb:116:in `block (2 levels) in top_level'
from rake/application.rb:116:in `each'
from rake/application.rb:116:in `block in top_level'
from rake/application.rb:125:in `run_with_threads'
from rake/application.rb:110:in `top_level'
from rails/commands/rake/rake_command.rb:23:in `block in perform'
from rake/application.rb:186:in `standard_exception_handling'
from rails/commands/rake/rake_command.rb:20:in `perform'
from rails/command.rb:48:in `invoke'
from rails/commands.rb:18:in `<main>'
from bootsnap/load_path_cache/core_ext/kernel_require.rb:23:in `require'
from bootsnap/load_path_cache/core_ext/kernel_require.rb:23:in `block in require_with_bootsnap_lfi'
from bootsnap/load_path_cache/loaded_features_index.rb:92:in `register'
from bootsnap/load_path_cache/core_ext/kernel_require.rb:22:in `require_with_bootsnap_lfi'
from bootsnap/load_path_cache/core_ext/kernel_require.rb:31:in `require'
from active_support/dependencies.rb:324:in `block in require'
from active_support/dependencies.rb:291:in `load_dependency'
from active_support/dependencies.rb:324:in `require'
from bin/rails:12:in `<main>'
Firstly, thank you for creating this gem, it saved me from a deep rabbit hole on my current project.
I've found an example of a url where the filename_from_content_disposition method returns nil instead of the filename.
The url in question is "https://codeload.github.com/RehabMan/patch-nvme/zip/master" and
meta['content-disposition'] = attachment; filename=patch-nvme-master.zip
Therefore:
meta["content-disposition"].to_s[/filename="([^"]+)"/, 1] # => nil
instead of the expected:
patch-nvme-master.zip
I have a forked version of the gem with a potential fix (sauy7@3363587) but perhaps you have a nicer regex?
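For illustration, a pattern that makes the quotes optional handles both the quoted and the unquoted form. This is only a sketch: extract_filename is a hypothetical helper, not the method from the gem or from the fork mentioned above, and it would cut a quoted filename short at an embedded semicolon.

```ruby
# Hypothetical helper: accept both filename="name" and filename=name.
# The capture stops at a double quote or a semicolon.
def extract_filename(content_disposition)
  content_disposition.to_s[/filename="?([^";]+)"?/, 1]
end

puts extract_filename("attachment; filename=patch-nvme-master.zip")
puts extract_filename('attachment; filename="my file.zip"')
```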
copy_to_tempfile does not work on Windows. I think it's because you are trying to move an opened IO on the line FileUtils.mv io.path, tempfile.path.
Errno::EACCES: Permission denied @ unlink_internal
Just noting that almost everything in the repo seems to suggest that 4.8.2 is the latest release, but in fact rubygems only has 4.8.1 (https://rubygems.org/gems/down/).
I was wondering if there's any guidance around using the Down::Http variant with a proxy? You have some docs for Down::NetHttp but none for http.rb.
Thanks in advance!
We ran into this issue out in the wild.
Downloading from https://emma.msrb.org/ER886357.pdf, the Content-Disposition header is set to:
"inline; filename=ER886357.pdf; creation-date=9/17/2012 1:51:37 PM; modification-date=9/17/2012 1:51:37 PM; size=3718678"
Using the method original_filename from https://github.com/janko/down/blob/master/lib/down/net_http.rb#L382 parses the filename from the header using https://github.com/janko/down/blob/master/lib/down/utils.rb#L14. Unfortunately, since the triggering regex (the third one) uses "Not whitespace" as the matcher, it includes the ; at the end of the filename field as part of the returned filename.
Currently in our project, we are working around this by chomping the ';' off the returned filename.
Researching the header on MDN and the like, it looks like using "Not ;" as the regex, content_disposition[/filename=([^;]+)/, 1], might work. It works for this specific use case, but I'm not sure if it'll work overall.
I'm happy to make a pull request if we can figure out a good way to handle this edge case that continues working for other cases.
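An alternative to tuning a single regex is to split the header into its parameters first and then look up filename. The sketch below takes that approach; disposition_params is a hypothetical helper rather than anything in the gem, and it does not handle semicolons inside quoted values or the RFC 5987 filename*= form.

```ruby
# Hypothetical helper: split a Content-Disposition value on ";" and
# return its parameters as a Hash with lowercased keys. Surrounding
# double quotes are removed from the values.
def disposition_params(header)
  header.to_s.split(";").drop(1).each_with_object({}) do |part, params|
    key, value = part.strip.split("=", 2)
    next if value.nil?

    params[key.downcase] = value.delete_prefix('"').delete_suffix('"')
  end
end

header = "inline; filename=ER886357.pdf; creation-date=9/17/2012 1:51:37 PM; " \
         "modification-date=9/17/2012 1:51:37 PM; size=3718678"
puts disposition_params(header)["filename"]
```

Because every parameter ends at the next semicolon, the trailing ; can no longer leak into the filename, and values containing spaces (such as the dates) are kept whole.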
I'm trying to download a file from S3 bucket, with a filename having brackets encoded with UTF-8, but getting 403 forbidden, works perfectly fine with open-uri.
Down.download("https://bucket_name.s3.amazonaws.com/first/second/filename%282%29.pdf?AWSAccessKeyId=*********&Expires=1594192255&Signature=*********")
Is this built in? If not it looks like you can use open to inspect headers to get checksums (artifactory does this)
I have installed the gem through the Gemfile, but when I tried to use it at the console it didn't work:
irb(main):004:0> Down.download("https://example.org/410897162341.pdf")
Traceback (most recent call last):
2: from (irb):4
1: from (irb):4:in `rescue in irb_binding'
NameError (uninitialized constant Down)
irb(main):002:0> require 'down'
Traceback (most recent call last):
2: from (irb):2
1: from (irb):2:in `rescue in irb_binding'
LoadError (cannot load such file -- down)
Hi Janko,
came across this odd problem today. The following URL works fine in the browser, curl, HTTParty, etc but always returns a 401 when using this gem (by means of remote URL plugin in shrine), i.e.:
url = "https://i.guim.co.uk/img/media/97b07b907a75e7f1b4aecb092f8181ca63d0ad44/2_254_1183_709/master/1183.jpg?width=1200&height=630&quality=85&auto=format&fit=crop&overlay-align=bottom%2Cleft&overlay-width=100p&overlay-base64=L2ltZy9zdGF0aWMvb3ZlcmxheXMvdGctZGVmYXVsdC5wbmc&enable=upscale&s=4c9af90b3d91c2269bad342e6b78d577"
Down.download(url)
I thought this was strange because the following works:
URI(url).open
I've narrowed it down to how the URL is encoded:
Lines 287 to 290 in 68754ed
this changes the comma in the URL (from bottom%2Cleft to bottom,left), making the signature in the s param invalid (works like secure URLs in Imgix, as it seems).
Would you call it a bug or is this the desired behavior?
I believe I've run into an edge case...
Try Down.download('https://www.adjustmentdecisionfold.cloud/apple-touch-icon.png').
I would expect it to raise a type of Down::ResponseError. However, it runs into a NoMethodError trying to raise the Down::ResponseError. I think there is no response.message available.
down-5.3.1/lib/down/net_http.rb:334:in `response_error!': undefined method `split' for nil:NilClass (NoMethodError)
message = response.message.split(" ").map(&:capitalize).join(" ")
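A nil-tolerant version of that formatting step could look like the sketch below. status_text is a hypothetical helper written for illustration; it is not the gem's response_error! method, and it simply falls back to the bare status code when the server sends no reason phrase.

```ruby
# Hypothetical helper: build text such as "404 Not Found", tolerating
# a response whose reason phrase (message) is nil or empty.
def status_text(code, message)
  words = message.to_s.split(" ").map(&:capitalize)
  [code, *words].join(" ")
end

puts status_text("404", "not found")
puts status_text("403", nil)
```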
While using the Down::NetHttp backend and downloading a file with an unusual response error code like 522, it crashes with a KeyError (key not found) caused by this line
Line 301 in 11181e5
because 522 is not in the list of error codes CODE_TO_OBJ in the HTTPResponse class
https://www.rubydoc.info/stdlib/net/Net/HTTPResponse
Not sure what would be the best way to fix this. In my project I'm just catching the KeyError...
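One way to avoid the missing-key failure is to fall back from the exact status code to its class (the first digit), which Net::HTTPResponse also exposes as CODE_CLASS_TO_OBJ. The following is a sketch of that idea, not the gem's fix; response_class_for is a hypothetical helper name.

```ruby
require "net/http"

# Hypothetical helper: look up the response class for a status code,
# falling back to the generic class for its first digit (for example
# Net::HTTPServerError for any unlisted 5xx), and finally to
# Net::HTTPUnknownResponse.
def response_class_for(code)
  Net::HTTPResponse::CODE_TO_OBJ.fetch(code) do
    Net::HTTPResponse::CODE_CLASS_TO_OBJ.fetch(code[0], Net::HTTPUnknownResponse)
  end
end

puts response_class_for("404")
puts response_class_for("522")
```

With this fallback, an unlisted code such as 522 can still be reported as a server error instead of raising from the lookup itself.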
Hi, I'm using the Down::Http backend on a project. http 5.0.0 was released a few days ago. I tried to upgrade the gem but I'm getting the following error:
Gem::LoadError: can't activate http (>= 2.1.0, < 5), already activated http-5.0.0. Make sure all dependencies are added to Gemfile.
Here is a script to reproduce the error.
require 'bundler/inline'
gemfile(true) do
source 'https://rubygems.org'
gem 'down', '~> 5.2'
gem 'http', '~> 5.0'
end
require "down/http"
tempfile = Down::Http.download("https://www.google.com")
puts tempfile.path
The error is raised by a restriction on Down::Http backend.
I made PR #55 to allow version 5.x.
Hi, thanks for this awesome gem, love it!
We wanted to rename the file after downloading, but I haven't found any way to do it.
Would it be possible to set the original_filename when using the download class method, such as Down.download(url, original_filename: 'hello.jpg')?
I'd appreciate your reply.
I'm not 100% sure this is a bug, but it's certainly unexpected (to me).
The following code works fine (without Down):
@config = Anyway::Config.for(:campus_access_manager)
data = open supervisor.access_letter, http_basic_authentication: [@config['username'],@config['password']]
send_data data.read, :type => data.content_type, :disposition => 'inline'
but I'd rather use Down, and I'd rather not save the file to disk. Switching to using Down seems fine (at least for the first step of fetching the file):
data = Down.download supervisor.access_letter, http_basic_authentication: [@config['username'], @config['password']]
but if I try to avoid downloading the file, passing chunks along while it is being downloaded so that it is never saved to disk, it fails with Down::ClientError - 401 Unauthorized.
data = Down.open supervisor.access_letter, http_basic_authentication: [@config['username'], @config['password']]
data.each_chunk { |chunk| chunk }
data.close
Why would HTTP Basic Auth work with download and not with open?
Is support for HTTP Basic just not implemented for open? Is this something that may be added in the near future?
I feel like I should set content-type header and somewhere between these lines
Lines 89 to 91 in 3a70e35
there should be
client.body = payload
For a while, I have been implementing an internal part of my project that downloads a series of URLs read from a local file. I am now having difficulty running down along with ruby-progressbar. The progress_proc option seemed to be the best solution at first, but after a quick dive into how down works, I noticed that it did not behave as I expected. Can anyone suggest a solution or a workaround?
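A small adapter between the two callbacks and whatever progress display is in use may help here. The sketch below is hypothetical: DownloadProgress is not part of down or ruby-progressbar, and it assumes that progress_proc receives the cumulative number of bytes downloaded so far (as open-uri's option of the same name does), not the size of the latest chunk.

```ruby
# Hypothetical tracker exposing lambdas shaped like the
# content_length_proc and progress_proc options.
class DownloadProgress
  attr_reader :total, :received

  def initialize
    @total = nil
    @received = 0
  end

  # Called once with the Content-Length, when the server sends one.
  def content_length_proc
    ->(length) { @total = length }
  end

  # Assumed to be called with the cumulative byte count so far, so the
  # value is assigned rather than added.
  def progress_proc
    ->(bytes_so_far) { @received = bytes_so_far }
  end

  def percent
    return nil unless @total && @total.positive?

    (100.0 * @received / @total).round(1)
  end
end

# Intended use (not executed here):
#   progress = DownloadProgress.new
#   Down.download(url,
#                 content_length_proc: progress.content_length_proc,
#                 progress_proc: progress.progress_proc)
```

Under that assumption, a progress bar should be set to the reported value instead of being incremented by it, otherwise it overshoots quickly.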
For some reason it looks like on the FileUtils.mv ruby is being denied the ability to rename the file.
Down::NotFound: Permission denied @ rb_file_s_rename - (C:/Users/vagrant/AppData/Local/Temp/open-uri20200930-1076-18a2wmi, C:/Users/vagrant/AppData/Local/Temp/down20200930-1076-19i1bix.pem)
  open_uri_file = downloaded_file
  downloaded_file = copy_to_tempfile(URI(url).path, open_uri_file)
  OpenURI::Meta.init downloaded_file, open_uri_file
  downloaded_file.extend DownloadedFile
  downloaded_file
rescue => error
  raise if error.is_a?(Down::Error)
  raise Down::NotFound, error.message
end

def copy_to_tempfile(basename, io)
  tempfile = Tempfile.new(['down', File.extname(basename)], binmode: true)
  if io.is_a?(OpenURI::Meta) && io.is_a?(Tempfile)
    FileUtils.mv io.path, tempfile.path
  else
    IO.copy_stream(io, tempfile.path)
    io.rewind
  end
  tempfile.open
  tempfile
end
OK, I'm still debugging this, but recording what I've found so far.
I can't share the entire original URL with you because it's confidential content, so I've intentionally corrupted this signed S3 URL. But we have a signed S3 URL that looks like this:
id = "https://scihist-digicoll-production-ingest-mount.s3.amazonaws.com/Verma_Inder%20%2B%20Cole_Charles/verma_i_and_cole_c_0198_FULL.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAU4GX5J7ESI6XHR42%2F20210322%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210322T192711Z&X-Amz-Expires=14400&X-Amz-SignedHeaders=host&X-Amz-Signature=[signature omitted]"
If I do:
Down::NetHttp.open(id)
It works.
But if I instead do:
Down::Http.open(id) # exact same id
It returns a 403 error.
I think something somewhere is not escaping correctly. Could very well be http-rb's fault. Still trying to debug and reproduce with a smaller isolated test case. Ooh, that gives me an idea.
Stay tuned.
Hey there! Trying to use this brilliant gem with Ruby 2.7
…
tempfile = Down.download(
link,
content_length_proc: -> (content_length) {
file_size = content_length
},
progress_proc: -> (progress) {
changed
downloaded_size = progress
notify_observers(file_size, downloaded_size)
}
)
…
and getting the following warnings:
/usr/local/Cellar/rbenv/1.1.2/versions/2.7.0/lib/ruby/gems/2.7.0/gems/down-5.1.0/lib/down.rb:10: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/usr/local/Cellar/rbenv/1.1.2/versions/2.7.0/lib/ruby/gems/2.7.0/gems/down-5.1.0/lib/down/backend.rb:12: warning: The called method `download' is defined here
It seems that some changes should be made for the future versions of Ruby.
Current workaround (just to hide the warnings, if someone needs it right now):
RUBYOPT='-W:no-deprecated -W:no-experimental' <RUBY_COMMAND>
or adding this ENV variable to your shell init script.
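The warning is the Ruby 2.7 notice about a trailing Hash being implicitly converted to keyword arguments. A delegating method can avoid it by forwarding keywords explicitly with a double splat. The sketch below shows the pattern in isolation; FacadeSketch and BackendSketch are hypothetical modules, not the gem's Down and Down::Backend.

```ruby
# Hypothetical backend whose method accepts keyword arguments.
module BackendSketch
  def self.download(url, max_size: nil, **options)
    { url: url, max_size: max_size, options: options }
  end
end

# Hypothetical facade: capture keywords with **options and pass them
# on with **options, instead of forwarding a trailing Hash.
module FacadeSketch
  def self.download(*args, **options, &block)
    BackendSketch.download(*args, **options, &block)
  end
end

puts FacadeSketch.download("https://example.com/file", max_size: 5, read_timeout: 1).inspect
```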
Thanks for the lib.
The gem works perfectly for 2.7, 3.0, and 3.1, but fails for ruby 3.2 with:
ArgumentError:
unrecognized option: max_redirects
# ./vendor/bundle/ruby/3.2.0/gems/down-5.4.0/lib/down/net_http.rb:145:in `open_uri'
# ./vendor/bundle/ruby/3.2.0/gems/down-5.4.0/lib/down/net_http.rb:94:in `download'
# ./vendor/bundle/ruby/3.2.0/gems/down-5.4.0/lib/down/backend.rb:13:in `download'
# ./vendor/bundle/ruby/3.2.0/gems/down-5.4.0/lib/down.rb:10:in `download'
For code:
Down.download(
url,
open_timeout: 10,
read_timeout: 10,
max_redirects: 10,
headers: headers,
content_length_proc: ->(content_length) {
progress_bar.total = content_length if content_length
},
progress_proc: -> (progress) {
progress_bar.increment(progress)
}
)
I'm currently noticing this when working with some CSV files I'm storing in S3 using Shrine. Down::ChunkedIO has a private gets method, which CSV.parse is trying to call (yes I know CSV.parse shouldn't be parsing by lines but that's what it does).
The actual error is:
NoMethodError (private method `gets' called for #<Down::ChunkedIO:0x007f7116dbbad0>):
It would probably help for compatibility with other things that process IO if the gets method was public and processed chunks until it completed a line and then returned that line.
I should add that I worked around this issue by simply calling download instead of to_io, so this is hardly a big issue. I noticed that ChunkedIO is using a Tempfile as an intermediate anyway, so it probably makes no difference. I'm curious though: why not just process the IO in memory?
Hey 👋
Down isn't able to download files with 308 redirects.
Example:
https://fakeimg.pl/1920x1280/fafbfc redirects to http://fakeimg.pl/1920x1280/fafbfc/ with a 308 code and then redirects to https://fakeimg.pl/1920x1280/fafbfc/ with a 301.
We can follow the redirects here: https://wheregoes.com/trace/20222494475/
Down::NetHttp.download("https://fakeimg.pl/1920x1280/fafbfc", :destination => "/Users/jules/Desktop/test.jpg", :max_redirects => 5)
308 Permanent Redirect
Do you have any ideas?
Best
Downloaded file doesn't keep the original name. It should be supported by default and you can optionally disable it.
I am trying to read 2GB+ files in chunks. However, when I read the file in chunks with remote_file.read(1024*1024*50), in order to return chunks of 50MB at a time, it times out at the 3rd method call and only reads 25MB instead of 50MB. All following remote_file.read calls return a chunk size of 0.
I initially tried using the Down.download method to download the files locally but for files larger than 2GB, the full file size was not being downloaded (downloading a 2GB resulted in a file size of 1.6GB and downloading a 5GB file returned a local file size of 2.3GB and 1.4GB when trying a second time). There are many inconsistencies using the download method.
I am using the exact code that is used in the documentation. Does this library have a solution for large file sizes or is there a way for it to prevent timeouts when streaming large files?
Hi Janko, debugging the shrine issue has led me here:
> Down.download('https://trello-attachments.s3.amazonaws.com/551edb81eda0610c6fd2d322/718x485/0ac204696b33b9eba1744aaa938ded42/large_audience_reach_twitter_%5B1%5D.png')
Bad uri(is not uri?): https://trello-attachments.s3.amazonaws.com/551edb81eda0610c6fd2d322/718x485/0ac204696b33b9eba1744aaa938ded42/large_audience_reach_twitter_[1].png
========================================================================================================================================================================
[0] /vendor/bundle/ruby/2.2.0/gems/down-1.1.0/lib/down.rb:47:in `rescue in download'
42: downloaded_file.extend DownloadedFile
43: downloaded_file
44:
45: rescue => error
46: raise if error.is_a?(Down::Error)
=> 47: raise Down::NotFound, error.message
48: end
49:
50: def copy_to_tempfile(basename, io)
51: tempfile = Tempfile.new(["down", File.extname(basename)], binmode: true)
52: if io.is_a?(OpenURI::Meta) && io.is_a?(Tempfile)
This should be very easy to reproduce.
Update: the issue doesn't manifest on my local machine (mac), while it does on server (heroku). Heroku runs 2.2, while locally I have ruby 2.1.5.
Uri.parse succeeds itself, so I guess there is something else failing.
I just got alert from bugsnag:
Net::ReadTimeout
/usr/lib/ruby/2.3.0/net/protocol.rb:158:in `rbuf_fill'
/usr/lib/ruby/2.3.0/net/protocol.rb:106:in `read'
/usr/lib/ruby/2.3.0/net/http/response.rb:291:in `block in read_body_0'
/usr/lib/ruby/2.3.0/net/http/response.rb:276:in `inflater'
/usr/lib/ruby/2.3.0/net/http/response.rb:281:in `read_body_0'
/usr/lib/ruby/2.3.0/net/http/response.rb:202:in `read_body'
gems/down-4.0.1/lib/down/chunked_io.rb:154:in `each'
gems/down-4.0.1/lib/down/chunked_io.rb:154:in `block in chunks_fiber'
Looks like Down didn't intercept Net::HTTP's Net::ReadTimeout exception.
I rescue only Down::Error and hope Down will intercept its dependencies' exceptions.
In my code I use Down.open.
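The expectation described here, that a library translates its dependencies' exceptions into its own hierarchy even while a body is being streamed, can be sketched as follows. DownloadSketch and its error classes are hypothetical names for illustration and do not mirror how Down is implemented.

```ruby
require "net/http"

# Hypothetical module with its own small error hierarchy.
module DownloadSketch
  class Error < StandardError; end
  class TimeoutError < Error; end

  # Yield each chunk to the caller, translating Net::HTTP timeouts
  # raised during iteration into DownloadSketch::TimeoutError.
  def self.each_chunk(chunks)
    chunks.each { |chunk| yield chunk }
  rescue Net::ReadTimeout, Net::OpenTimeout => error
    raise TimeoutError, error.message
  end
end

# Simulated body that times out after the first chunk.
failing_body = Enumerator.new do |yielder|
  yielder << "first chunk"
  raise Net::ReadTimeout
end

begin
  DownloadSketch.each_chunk(failing_body) { |chunk| puts chunk }
rescue DownloadSketch::Error => error
  puts error.class
end
```

Because the rescue wraps the iteration itself, a caller that rescues only the library's base error also catches timeouts that happen partway through the stream.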