janko / down

Streaming downloads using net/http, http.rb, HTTPX or wget

License: MIT License

Ruby 100.00%
download partial-responses ruby streaming http tempfile

down's People

Contributors

aglushkov, aldodelgado, antprt, benubois, bkmgit, darndt, ermolaev, evheny0, hmistry, honeyryderchuck, janko, jcmfernandes, korstiaan, kunliudji, mgrunberg, olleolleolle, ollym, rafbm, razum2um, sauy7, shime, zarqman

down's Issues

304 Not Modified leads to redirect ArgumentError

Hey there!

I noticed this behavior while trying to make a request to an image resource, using If-None-Match: <etag> in the request headers. As expected, the server returns a 304 Not Modified error when the etag matches that of the existing resource.

In net_http_request, there is a clause that checks if the response.is_a?(Net::HTTPRedirection), and then attempts to verify that the redirect address (response["Location"]) is a valid address. Although the response 304 Not Modified is a redirect code, it does not return a location field.

As a result, we pass a nil url value to ensure_uri, and end up with ArgumentError: bad argument (expected URI object or URI string)

Is this expected behavior? It seems like it may be a good idea to add a check for Net::HTTPNotModified, and then raise an error that fits into Down's exception hierarchy, (maybe Down::NotModified?).

If this would be a welcome change, I'm happy to help out.

Thanks!
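
A minimal sketch of the check proposed above, using only Ruby's standard library. The names NotModifiedError and redirect_location are hypothetical and not part of Down's API; the sketch assumes the caller already holds a Net::HTTP response object.

```ruby
require "net/http"

# Hypothetical error class for illustration; not part of Down.
class NotModifiedError < StandardError; end

# Returns the redirect target, or nil when the response is not a redirect.
# 304 is handled first because Net::HTTPNotModified is a subclass of
# Net::HTTPRedirection but normally carries no Location header.
def redirect_location(response)
  raise NotModifiedError, "304 Not Modified" if response.is_a?(Net::HTTPNotModified)
  return nil unless response.is_a?(Net::HTTPRedirection)

  response["Location"] || raise(ArgumentError, "redirect response without a Location header")
end
```

With a check like this, a 304 surfaces as a dedicated error instead of a nil URL reaching the URI parser.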

Moved permanently

Hi, I'm using Down to automatically download some files from my webserver. Some of these have spaces in the name, so I replaced all spaces with %20. When I paste the URL into the browser the file downloads, but when using Down.download I get a ResponseError - 301 Moved Permanently error from the gem. How can I fix it?

save the file

I don't understand how to save the file or is it not designed to do that?

ChunkedIO warnings

Excellent gem, thanks for making it.

If you run this code...

# test.rb
require 'down'

down = Down.open('https://github.com')
down.each_chunk() {|chunk| }
down.close()

...using ruby -w test.rb, you'll get a lot of warnings while downloading:

~/.gem/ruby/2.7.0/gems/down-5.1.1/lib/down/chunked_io.rb:304: warning: instance variable @next_chunk not initialized
~/.gem/ruby/2.7.0/gems/down-5.1.1/lib/down/chunked_io.rb:264: warning: instance variable @closed not initialized

This isn't a big deal, but fills up my log files if warnings are turned on.

Environment

  • ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]
  • down (5.1.1)

Thanks

wget backend raises EOFError

Thanks for making an experimental wget wrapper; this is perfect for me because I need to handle an FTP server with flaky IPv6 support and I don't see any other Ruby libraries that let me force IPv4 easily!

When I try to download this file however it raises EOFError:

require "down/wget"
wget = Down::Wget.new("--inet4-only")
wget.open("ftp://lhcftp.nlm.nih.gov/Open-Access-Datasets/Pills/README")
# EOFError (end of file reached)

When I manually run the command it generates in my CLI, it works okay:

wget.send(:generate_command, "ftp://lhcftp.nlm.nih.gov/Open-Access-Datasets/Pills/README").join(" ")
# => "wget --no-verbose --save-headers -O - --inet4-only --user-agent Down/5.0.0 --max-redirect 2 --dns-timeout 30 --connect-timeout 30 --read-timeout 30 ftp://lhcftp.nlm.nih.gov/Open-Access-Datasets/Pills/README"

Here's my wget version FWIW

$ wget --version
GNU Wget 1.19.4 built on linux-gnu.

-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls 
+ntlm +opie +psl +ssl/openssl 

Wgetrc: 
    /etc/wgetrc (system)
Locale: 
    /usr/share/locale 
Compile: 
    gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc" 
    -DLOCALEDIR="/usr/share/locale" -I. -I../../src -I../lib 
    -I../../lib -Wdate-time -D_FORTIFY_SOURCE=2 -DHAVE_LIBSSL -DNDEBUG 
    -g -O2 -fdebug-prefix-map=/build/wget-Xb5Z7Y/wget-1.19.4=. 
    -fstack-protector-strong -Wformat -Werror=format-security 
    -DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall 
Link: 
    gcc -DHAVE_LIBSSL -DNDEBUG -g -O2 
    -fdebug-prefix-map=/build/wget-Xb5Z7Y/wget-1.19.4=. 
    -fstack-protector-strong -Wformat -Werror=format-security 
    -DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall -Wl,-Bsymbolic-functions 
    -Wl,-z,relro -Wl,-z,now -lpcre -luuid -lidn2 -lssl -lcrypto -lpsl 
    ftp-opie.o openssl.o http-ntlm.o ../lib/libgnu.a 

Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://www.gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Originally written by Hrvoje Niksic <[email protected]>.
Please send bug reports and questions to <[email protected]>

I will try to poke around and see if I can figure out what's going on but wanted to file an issue first.

is Down gem thread safe?

Hello

We use the down gem heavily to import a lot of documents and files from other sites.
To do that, we run jobs on a Sidekiq queue, where each worker calls Down.download once. We currently have 10 workers.

However, we noticed that some downloaded files are old ones, meaning Down.download is not fetching the fresh ones. It seems that the tempfile created internally is not unlinked.

I will try to create a separate test environment to demonstrate the bug

ChunkedIO does not implement #gets correctly, and therefore does not parse CSV while streaming.

Hi,

First of all, thank you for writing this software. I tried fixing this myself but could not get the ChunkedIO CSV test to work. Currently, ruby's csv calls the io#gets with a nil separator and a limit. In ChunkedIO's current implementation, this causes ChunkedIO to read the entire data source, because it always does so when separator is nil.

When using CSV.new(Chunk.open('http://some/large/file.csv')).each {...}, I used print statements to verify that the entire file is downloaded before any parsing occurs.
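
A minimal sketch of a gets that honours both the separator and the limit while pulling chunks lazily. LazyChunkReader is a hypothetical stand-in, not Down::ChunkedIO, and it assumes an ASCII separator and an Enumerator that yields String chunks.

```ruby
# Hypothetical minimal reader for illustration; this is not Down::ChunkedIO.
class LazyChunkReader
  attr_reader :chunks_fetched

  def initialize(chunks)
    @chunks = chunks        # Enumerator yielding String chunks
    @buffer = String.new    # binary buffer of bytes fetched but not yet returned
    @chunks_fetched = 0
  end

  # Reads up to `limit` bytes, stopping early right after `separator` when
  # one is given. New chunks are pulled only until one of those holds.
  def gets(separator = "\n", limit = nil)
    until satisfied?(separator, limit)
      break unless fetch_chunk
    end
    return nil if @buffer.empty?

    cut = @buffer.bytesize
    if separator && (index = @buffer.index(separator))
      cut = index + separator.bytesize
    end
    cut = [cut, limit].min if limit
    @buffer.slice!(0, cut)
  end

  private

  def satisfied?(separator, limit)
    (separator && @buffer.include?(separator)) ||
      (limit && @buffer.bytesize >= limit)
  end

  # Appends the next chunk to the buffer; returns false when none are left.
  def fetch_chunk
    @buffer << @chunks.next.b
    @chunks_fetched += 1
    true
  rescue StopIteration
    false
  end
end
```

With a nil separator and a limit, only as many chunks are fetched as are needed to fill the limit, which is the behaviour the CSV parser relies on for streaming.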

Tempfile returned as nil

I am using down (5.2.0) with Rails 5.x and can download the file successfully. The problem is that the filename gets changed, so I am trying to rename the tempfile, but Down.download returns nil:

pry> tempfile = Down.download('https://findkollegie.dk/wp-content/uploads/2018/04/knap-FK-300x54.png', destination: '/home/amit')
=> nil
> tempfile.path

NoMethodError:
       undefined method `path' for nil:NilClass

As per the doc this is expected behaviour but shouldn't it have consistent behaviour?

Getting nil from "download" method when using the "destination" option

Hello all,

First of all, thank you for the wonderful gem.

I think I've encountered a bug when using the destination option - the file gets downloaded to the correct location, but the return value from Down.download is nil. Omitting that option returns a File instance.

For example, this works:

url = "https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png"
file = Down.download(url)
# <File:/var/folders/96/q0yz5pns6f7c2y4zld38hsjd4tw3vw/T/down-net_http20190515-83878-bdxd4c.png>

but this does not:

file = Down.download(url, destination: Rails.root.join('public/downloads/' + SecureRandom.uuid))
# nil

even though the file is created (running find public yields public/downloads/e959fba2-cc08-429c-b8df-7f9938eff8f9).

Is this the intended behavior, or am I doing something wrong? I am using version 4.8.0.
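
A minimal workaround sketch, assuming (as both reports above describe) that Down.download returns nil when the destination option is given. The helper downloaded_file is hypothetical; it simply reopens the destination path so the caller always gets a file object back.

```ruby
# `result` stands for whatever Down.download returned; `destination` is the
# path that was passed as the :destination option (String or Pathname).
def downloaded_file(result, destination = nil)
  return result if result
  raise ArgumentError, "no result and no destination given" if destination.nil?

  File.open(destination.to_s, "rb")
end
```

The caller is responsible for closing the returned file in the destination case.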

Update to 4.0.0 from 3.2.0 broke use of open() elsewhere in app

I'm not sure this is so much an issue with down as a side effect of not having used require 'open-uri' wherever I had previously used open(url_here) within my app.

After running a bundle update, down was updated to 4.0.0, and then I noticed it was breaking unless I required open-uri.

Is it possible that previously, down was requiring this, but now doesn't?

Also, is it always necessary to require 'open-uri' when using open()?

Compatibility with PDF::Reader

Hi,
I'm working with the Shrine gem and uploading PDF files to S3 (direct upload).
During the upload, the code is adding some metadata such as the page count.

I noticed that PDF::Reader accepts IO objects as input, but it's not working with the Down::ChunkedIO class.

This is the code from the shrine uploader:

add_metadata do |file, metadata:, record:, **|
  case metadata["mime_type"]
  when "application/pdf"
    io = file.to_io if file.respond_to?(:to_io)
    reader = PDF::Reader.new(io)

    { page_count: reader.page_count }
  end
end

Exception:

ArgumentError: input must be an IO-like object or a filename (Down::ChunkedIO)

Code from PDF::Reader that will try to read the IO object.
https://github.com/yob/pdf-reader/blob/625e8dca295d8b6a48f634cba812f14fd3b805a4/lib/pdf/reader/object_hash.rb#L601

Shouldn't Down::ChunkedIO be considered an IO?

The code may work if given a full file (file.download) or a StringIO, so there are alternatives.

Any thoughts?
Thank you.

Down::ChunkedIO#pos returns wildly incorrect values

We are using Shrine for handling large CSV uploads that are then stream-processed. We have a progress meter for this which works off the underlying IO object's #pos values. For local files, this works perfectly. Once we went into our Staging environment with S3 as the storage engine, using Down under-the-hood, it all broke. It seems that after the first 1K of data, Down::ChunkedIO#pos starts returning values much, much higher than they should be - far beyond the end of the file.

For a particular test file of only 3669 bytes comprising around 55 CSV rows plus header, the size reported by the IO object was consistently correct. However, inside the CSV row iterator, the results of #pos were:

0
1024
1024
1024
1024
1024
1024
1024
1024
1024
1024
1024
1024
1024
3736
6268
8732
11134
13466
15730
17923
20045
22103
24087
26017
27888
29698
31455
33155
34794
36363
37878
39313
40687
41998
43249
44431
45549
46598
47581
48498
49349
50137
50861
51519
52117
52656
53138
53562
53924
54220
54465
54647
54774
54840
54840

The start offset is 0. The 1024 offset was presumed to be a chunk size from the CSV processor, but if I tried to rewind to zero and read 1024 bytes, I actually got a very strange 1057 bytes, perfectly aligned to a row end, instead. In any event, it then sits at 1024 for a while and once the CSV parsing seems to have gone past that first "chunk" - be it 1024 or 1057 bytes - then the positions reported become, as you can see, very wrong.

The above was generated with no rewinding or other shenanigans; in pseudocode we have:

# shrine_file is our Shrine subclass instance representing the S3 object. The
# encoding specifier is typically UTF-8.
#
# Inside the iterator, io_obj is the Down::ChunkedIO instance. CSV options are:
#
#   {:headers=>true, :header_converters=>:symbol, :liberal_parsing=>true}
#
shrine_file.open(encoding: encoding_specifier) do | io_obj |
  csv = CSV.new(io_obj, **options)

  csv.each do |row|
    puts io_obj.pos
  end
end

Down::ChunkedIO.new fails if enumerator emits frozen strings

for example:

# frozen_string_literal: true
io = Down::ChunkedIO.new(chunks: %w(one two three).each)
IO.copy_stream(io, some_stream)
=>
    FrozenError:
       can't modify frozen String
     # /Users/lasto/.rvm/gems/ruby-2.5.3/gems/down-4.8.1/lib/down/chunked_io.rb:275:in `force_encoding'
     # /Users/lasto/.rvm/gems/ruby-2.5.3/gems/down-4.8.1/lib/down/chunked_io.rb:275:in `retrieve_chunk'
     # /Users/lasto/.rvm/gems/ruby-2.5.3/gems/down-4.8.1/lib/down/chunked_io.rb:162:in `readpartial'
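
A minimal sketch of one way to avoid the FrozenError, assuming the failure comes from calling force_encoding directly on a frozen chunk as the trace suggests. binary_chunk is a hypothetical helper, not the gem's code.

```ruby
# Returns the chunk as binary. Unfrozen chunks are re-tagged in place;
# frozen ones are copied, because String#b returns a new unfrozen string.
def binary_chunk(chunk)
  return chunk.force_encoding(Encoding::BINARY) unless chunk.frozen?

  chunk.b
end
```

The copy is only made for frozen input, so the common unfrozen case stays allocation-free.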

Download via NetHttp and Cookie header fails

Thank you for your work on this gem.

I've a problem with downloading a file, where I need to set a cookie:

      dossier_file = Down::NetHttp.download(
        dossier_path,
        read_timeout: 300,
        headers: {
          'Cookie': "#{Settings.cookies.session_token}=#{@session_token}",
          'User-Agent': @user_agent
        }
      )

When I do exactly the same with open it works.

unrecognized option: Cookie
/usr/local/lib/ruby/3.0.0/open-uri.rb:111:in `block in check_options'
/usr/local/lib/ruby/3.0.0/open-uri.rb:108:in `each'
/usr/local/lib/ruby/3.0.0/open-uri.rb:108:in `check_options'
/usr/local/lib/ruby/3.0.0/open-uri.rb:132:in `open_uri'
/usr/local/lib/ruby/3.0.0/open-uri.rb:721:in `open'
/usr/local/bundle/gems/down-5.2.1/lib/down/net_http.rb:146:in `open_uri'
/usr/local/bundle/gems/down-5.2.1/lib/down/net_http.rb:93:in `download'
/usr/local/bundle/gems/down-5.2.1/lib/down/backend.rb:13:in `download'
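
The trace suggests the header names reached open-uri as options. In a Ruby hash literal, 'Cookie': value creates the Symbol key :Cookie, and open-uri treats Symbol keys as options and String keys as header fields. A minimal sketch of a workaround, assuming the headers hash is passed through unchanged; stringify_header_names is a hypothetical helper.

```ruby
# Converts every header name to a String so it is not mistaken for an option.
def stringify_header_names(headers)
  headers.transform_keys(&:to_s)
end

headers = stringify_header_names({ 'Cookie': "session=abc123", 'User-Agent': "MyApp/1.0" })
```

Writing the literal with "Cookie" => "..." instead of 'Cookie': "..." has the same effect without a helper.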

Suggest using addressable

Hi,

Down::InvalidUrl
/usr/local/bundle/gems/down-4.5.0/lib/down/net_http.rb:236:in `rescue in ensure_uri'
/usr/local/bundle/gems/down-4.5.0/lib/down/net_http.rb:231:in `ensure_uri'
/usr/local/bundle/gems/down-4.5.0/lib/down/net_http.rb:65:in `download'
/usr/local/bundle/gems/down-4.5.0/lib/down/backend.rb:13:in `download'

the standard URI lib used by down cannot parse URLs with some bizarre characters, such as [.
Please consider to use https://github.com/sporkmonger/addressable, if you don't mind I can also make a PR for it.

Thanks in advance.

ChunkedIO Api is Incompatible with Zip::File.open_buffer

I am archiving a zip file uploaded with Shrine to S3. Zip::File is complaining that ChunkedIO is not an IO-like object.

The error is:

RuntimeError: Zip::File.open_buffer expects a String or IO-like argument (responds to tell, seek, read, close). Found: Down::ChunkedIO

It looks like ChunkedIO is not implementing tell or seek.

I have worked around this problem by calling download instead of open on the Shrine object.

unknown method byte_size

I already found this issue (#44), but I'm using Active Storage in this case with an S3 backend, so the proposed fix is not useful to me.

What I'm wanting to do is this:

property.asset.attach(io: Down.open("https://somedomain.tld/file.mp4"),
                      filename: "file.mp4",
                      content_type: "video/mp4")

That way it can read from the IO and stream the download and the upload. Because it uses the S3 gem in the backend, it tries to read the byte_size to determine whether to use a single-part or a multipart upload.

Would it be possible to provide a fix in this gem so that ActiveStorage will work too?

undefined method `max_size=' when using Down.backend :http

Using the http backend is not working with max_size option:

#Gemfile.lock
down (5.2.2)
http (5.0.1)

to reproduce:

Down.backend :http
remote_file = Down.open(src, max_size: 5 * 1024 * 1024)
remote_file.each_chunk do |chunk|
  # do something
end

NoMethodError: undefined method `bytesize'

Hi Janko, thank you for your awesome job!

We encountered NoMethodError: undefined method 'bytesize' error from down/chunked_io.rb:173:in `readpartial'.

We use the Shrine gem, which depends on your gem, and got the error shown in the traceback below.
The aws-sdk-core update below seems to be related to this issue.
We would much appreciate it if you could look into this.

aws-sdk-core update

aws/aws-sdk-ruby#2357

traceback

NoMethodError: undefined method `bytesize' for #<Array:0x00007f8c067a8688>
  from down/chunked_io.rb:173:in `readpartial'
  from shrine/uploaded_file.rb:146:in `copy_stream'
  from shrine/uploaded_file.rb:146:in `block in stream'
  from shrine/uploaded_file.rb:98:in `open'
  from shrine/uploaded_file.rb:146:in `stream'
  from shrine/uploaded_file.rb:122:in `download'
  from shrine/plugins/derivatives.rb:272:in `process_derivatives'
  from shrine/plugins/derivatives.rb:184:in `create_derivatives'
  from shrine/plugins/derivatives.rb:44:in `block in define_model_methods'
  from (eval):4:in `block (3 levels) in run_file'
  from active_record/relation/batches.rb:70:in `block (2 levels) in find_each'
  from active_record/relation/batches.rb:70:in `each'
  from active_record/relation/batches.rb:70:in `block in find_each'
  from active_record/relation/batches.rb:136:in `block in find_in_batches'
  from active_record/relation/batches.rb:238:in `block in in_batches'
  from active_record/relation/batches.rb:222:in `loop'
  from active_record/relation/batches.rb:222:in `in_batches'
  from active_record/relation/batches.rb:135:in `find_in_batches'
  from active_record/relation/batches.rb:69:in `find_each'
  from active_record/querying.rb:21:in `find_each'
  from (eval):3:in `block (2 levels) in run_file'
  from seed-fu/runner.rb:46:in `eval'
  from seed-fu/runner.rb:46:in `block (2 levels) in run_file'
  from seed-fu/runner.rb:58:in `block in open'
  from seed-fu/runner.rb:57:in `open'
  from seed-fu/runner.rb:57:in `open'
  from seed-fu/runner.rb:36:in `block in run_file'
  from active_record/connection_adapters/abstract/database_statements.rb:280:in `block in transaction'
  from active_record/connection_adapters/abstract/transaction.rb:280:in `block in within_new_transaction'
  from active_support/concurrency/load_interlock_aware_monitor.rb:26:in `block (2 levels) in synchronize'
  from active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt'
  from active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize'
  from active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
  from active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'
  from active_record/connection_adapters/abstract/transaction.rb:278:in `within_new_transaction'
  from active_record/connection_adapters/abstract/database_statements.rb:280:in `transaction'
  from active_record/transactions.rb:212:in `transaction'
  from seed-fu/runner.rb:35:in `run_file'
  from seed-fu/runner.rb:26:in `block in run'
  from seed-fu/runner.rb:25:in `each'
  from seed-fu/runner.rb:25:in `run'
  from seed-fu.rb:29:in `seed'
  from tasks/seed_fu.rake:36:in `block (2 levels) in <main>'
  from rake/task.rb:281:in `block in execute'
  from rake/task.rb:281:in `each'
  from rake/task.rb:281:in `execute'
  from rake/task.rb:219:in `block in invoke_with_call_chain'
  from monitor.rb:235:in `mon_synchronize'
  from rake/task.rb:199:in `invoke_with_call_chain'
  from rake/task.rb:188:in `invoke'
  from rake/application.rb:160:in `invoke_task'
  from rake/application.rb:116:in `block (2 levels) in top_level'
  from rake/application.rb:116:in `each'
  from rake/application.rb:116:in `block in top_level'
  from rake/application.rb:125:in `run_with_threads'
  from rake/application.rb:110:in `top_level'
  from rails/commands/rake/rake_command.rb:23:in `block in perform'
  from rake/application.rb:186:in `standard_exception_handling'
  from rails/commands/rake/rake_command.rb:20:in `perform'
  from rails/command.rb:48:in `invoke'
  from rails/commands.rb:18:in `<main>'
  from bootsnap/load_path_cache/core_ext/kernel_require.rb:23:in `require'
  from bootsnap/load_path_cache/core_ext/kernel_require.rb:23:in `block in require_with_bootsnap_lfi'
  from bootsnap/load_path_cache/loaded_features_index.rb:92:in `register'
  from bootsnap/load_path_cache/core_ext/kernel_require.rb:22:in `require_with_bootsnap_lfi'
  from bootsnap/load_path_cache/core_ext/kernel_require.rb:31:in `require'
  from active_support/dependencies.rb:324:in `block in require'
  from active_support/dependencies.rb:291:in `load_dependency'
  from active_support/dependencies.rb:324:in `require'
  from bin/rails:12:in `<main>'

DownloadedFile.filename_from_content_disposition returns nil instead of filename in certain cases

Firstly, thank you for creating this gem, it saved me from a deep rabbit hole on my current project.

I've found an example of a url where the filename_from_content_disposition method returns nil instead of the filename.

The url in question is "https://codeload.github.com/RehabMan/patch-nvme/zip/master" and

meta['content-disposition'] = attachment; filename=patch-nvme-master.zip

Therefore:

meta["content-disposition"].to_s[/filename="([^"]+)"/, 1] # => nil

instead of the expected:

patch-nvme-master.zip

I have a forked version of the gem with a potential fix (sauy7@3363587) but perhaps you have a nicer regex?
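
A minimal sketch of a parser that accepts both quoted and unquoted filename values. It is an illustration under the assumption that only the plain filename= parameter matters; it ignores the extended filename*= form and is not the gem's implementation.

```ruby
# Extracts the filename parameter from a Content-Disposition header value.
# Quoted values may contain spaces; unquoted values end at whitespace or ";".
def filename_from_content_disposition(header)
  value = header.to_s
  name = value[/filename="([^"]*)"/i, 1] || value[/filename=([^;\s"]+)/i, 1]
  name unless name.nil? || name.empty?
end
```

Stopping the unquoted match at ";" also keeps trailing parameters such as creation-date out of the filename.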

copy_to_tempfile does not work on Windows

copy_to_tempfile does not work on Windows. I think it's because it tries to move an open IO on the line FileUtils.mv io.path, tempfile.path.

Errno::EACCES: Permission denied @ unlink_internal
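
A minimal sketch of a portable alternative, assuming the failure is caused by renaming a file that is still open. This version always copies the bytes instead of moving the file, trading some speed for not depending on rename semantics; it is an illustration, not the gem's code.

```ruby
require "tempfile"

# Copies the bytes into a fresh tempfile instead of renaming the source,
# so it does not matter whether the source is still open.
def copy_to_tempfile(basename, io)
  tempfile = Tempfile.new(["down", File.extname(basename)], binmode: true)
  io.rewind
  IO.copy_stream(io, tempfile)
  io.rewind        # leave the source readable from the start
  tempfile.rewind  # hand back the copy positioned at the start
  tempfile
end
```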

Down::Http via a proxy?

I was wondering if there's any guidance around using the Down::Http variant with a proxy? You have some docs for Down::NetHttp but none for http.rb.

Thanks in advance!

Content-Disposition handling with non-standard header from IIS 10.0 returns `;` as part of filename

We ran into this issue out in the wild.

Downloading from https://emma.msrb.org/ER886357.pdf, the Content-Disposition header is set to:

"inline; filename=ER886357.pdf; creation-date=9/17/2012 1:51:37 PM; modification-date=9/17/2012 1:51:37 PM; size=3718678"

Using the method original_filename from https://github.com/janko/down/blob/master/lib/down/net_http.rb#L382 parses the filename from the header using https://github.com/janko/down/blob/master/lib/down/utils.rb#L14. Unfortunately since the triggering regex (the third one) uses "Not whitespace" as the matcher, it includes the ; at the end of the filename field as part of the returned filename.

Currently in our project, we are working around this by chomping the ';' off the filename after having it returned.

Researching the header on MDN and the like, it looks like doing "Not ;" as the regex: content_disposition[/filename=([^;]+)/, 1] might work. It works for this specific use case, but I'm not sure if it'll work overall.

I'm happy to make a pull request if we can figure out a good way to handle this edge case that continues working for other cases.

Can't get it working with Rails 5.2

I installed the gem through the Gemfile, but when I tried to use it in the console it didn't work:

irb(main):004:0> Down.download("https://example.org/410897162341.pdf")
Traceback (most recent call last):
        2: from (irb):4
        1: from (irb):4:in `rescue in irb_binding'
NameError (uninitialized constant Down)
irb(main):002:0> require 'down'
Traceback (most recent call last):
        2: from (irb):2
        1: from (irb):2:in `rescue in irb_binding'
LoadError (cannot load such file -- down)

Undesirable changing of URLs by Addressable

Hi Janko,

came across this odd problem today. The following URL works fine in the browser, curl, HTTParty, etc but always returns a 401 when using this gem (by means of remote URL plugin in shrine), i.e.:

url = "https://i.guim.co.uk/img/media/97b07b907a75e7f1b4aecb092f8181ca63d0ad44/2_254_1183_709/master/1183.jpg?width=1200&height=630&quality=85&auto=format&fit=crop&overlay-align=bottom%2Cleft&overlay-width=100p&overlay-base64=L2ltZy9zdGF0aWMvb3ZlcmxheXMvdGctZGVmYXVsdC5wbmc&enable=upscale&s=4c9af90b3d91c2269bad342e6b78d577"
Down.download(url)

I thought this was strange because the following works:

URI(url).open

I've narrowed it down to how the URL is encoded:

down/lib/down/net_http.rb

Lines 287 to 290 in 68754ed

def addressable_normalize(url)
  addressable_uri = Addressable::URI.parse(url)
  addressable_uri.normalize.to_s
end

This changes the encoded comma in the URL (from bottom%2Cleft to bottom,left), which makes the signature in the s param invalid (it seems to work like secure URLs in Imgix).

Would you call it a bug or is this the desired behavior?

`NoMethodError` trying to raise non-successful response `Down::ResponseError`

I believe I've run into an edge case...

Try Down.download('https://www.adjustmentdecisionfold.cloud/apple-touch-icon.png').

I would expect it to raise a type of Down::ResponseError. However, it runs into a NoMethodError trying to raise the Down::ResponseError. I think there is no response.message available.

down-5.3.1/lib/down/net_http.rb:334:in `response_error!': undefined method `split' for nil:NilClass (NoMethodError)

message = response.message.split(" ").map(&:capitalize).join(" ")
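
A minimal sketch of a nil-safe version of that formatting step, assuming the server sent a status line without a reason phrase. status_line is a hypothetical helper, not the gem's method.

```ruby
# Builds a "404 Not Found" style string; a missing message yields just the code.
def status_line(code, message)
  words = message.to_s.split(" ").map(&:capitalize).join(" ")
  [code.to_s, words].reject(&:empty?).join(" ")
end
```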

Crash due to non existing HTTP error code

While using the Down::NetHttp backend to download a file, an unusual response code like 522 makes it crash with a KeyError caused by this line

response_class = Net::HTTPResponse::CODE_TO_OBJ.fetch(code)

because 522 is not in the CODE_TO_OBJ list of response codes in the HTTPResponse class

https://www.rubydoc.info/stdlib/net/Net/HTTPResponse

Not sure what would be the best way to fix this. In my project I'm just catching the KeyError...
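
One possible approach, sketched with the standard library only: fall back to the generic class for the status family when the exact code is unknown. response_class_for is a hypothetical helper; CODE_CLASS_TO_OBJ and Net::HTTPUnknownResponse are constants from Ruby's net/http.

```ruby
require "net/http"

# Looks up the exact response class first, then the class for the first
# digit (e.g. "5" for server errors), and finally the unknown-response class.
def response_class_for(code)
  code = code.to_s
  Net::HTTPResponse::CODE_TO_OBJ.fetch(code) do
    Net::HTTPResponse::CODE_CLASS_TO_OBJ.fetch(code[0], Net::HTTPUnknownResponse)
  end
end
```

With this lookup a 522 is treated as a generic server error rather than raising.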

Support for Http 5.0

Hi, I'm using the Down::Http backend on a project. http 5.0.0 was released a few days ago. I tried to upgrade the gem but I'm getting the following error:

Gem::LoadError: can't activate http (>= 2.1.0, < 5), already activated http-5.0.0. Make sure all dependencies are added to Gemfile.

Here is a script to reproduce the error.

require 'bundler/inline'

gemfile(true) do
  source 'https://rubygems.org'

  gem 'down', '~> 5.2'
  gem 'http', '~> 5.0'
end

require "down/http"

tempfile = Down::Http.download("https://www.google.com")
puts tempfile.path

The error is raised by a restriction on Down::Http backend.

I made PR #55 to allow version 5.x.

rename filename when download

Hi, thanks for this awesome gem, love it!
I ran into a problem: we want to rename the file after downloading, but I haven't found a way to do it.
Would it be possible to set the original_filename when using the download class method, e.g. Down.download(url, original_filename: 'hello.jpg')?
I'd really appreciate a reply.

.open fails where .download works with HTTP Basic Authentication

I'm not 100% sure this is a bug, but it's certainly unexpected (to me).

The following code works fine (without Down):

    @config   = Anyway::Config.for(:campus_access_manager)
    data      = open supervisor.access_letter, http_basic_authentication: [@config['username'],@config['password']]
    send_data data.read, :type => data.content_type, :disposition => 'inline'

but I'd rather use Down, and I'd rather not save the file to disk. Switching to using Down seems fine (at least for the first step of fetching the file):

    data    = Down.download supervisor.access_letter, http_basic_authentication: [@config['username'], @config['password']]

but if I try to avoid saving the file to disk by passing chunks along while it is being downloaded, it fails with Down::ClientError - 401 Unauthorized.

    data    = Down.open supervisor.access_letter, http_basic_authentication: [@config['username'], @config['password']]
    data.each_chunk { |chunk| chunk }
    data.close

Why would HTTP Basic Auth work with download and not with open?

Is support for HTTP Basic just not implemented for open? Is this something that may be added in the near future?

How do I pass POST payload?

I feel like I should set content-type header and somewhere between these lines

down/lib/down/http.rb

Lines 89 to 91 in 3a70e35

client = @client
client = client.basic_auth(user: uri.user, pass: uri.password) if uri.user || uri.password
client = block.call(client) if block

there should be

client.body = payload

Progressbar Implementation

For a while, I have been building an internal part of my project that downloads a series of URLs read from a local file. Now I'm having difficulty running down together with ruby-progressbar. The progress_proc option seemed like the best solution at first, but after a quick dive into how down works, I noticed that it did not behave as I expected. Can anyone point me to a solution or a workaround?
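
A minimal sketch of keeping progress state between the content_length_proc and progress_proc callbacks, independent of any particular progress bar library. DownloadProgress is a hypothetical helper, and it assumes progress_proc receives the cumulative number of bytes downloaded so far.

```ruby
# Hypothetical helper; not part of Down or ruby-progressbar.
class DownloadProgress
  attr_reader :total, :downloaded

  def initialize
    @total = nil
    @downloaded = 0
  end

  # Lambda for the content_length_proc option; length may be nil when the
  # server sends no Content-Length header.
  def content_length_proc
    ->(length) { @total = length }
  end

  # Lambda for the progress_proc option; assumed to receive cumulative bytes.
  def progress_proc
    ->(bytes) { @downloaded = bytes }
  end

  # Whole-number percentage, or nil when the total size is unknown.
  def percent
    return nil if @total.nil? || @total.zero?

    (@downloaded * 100) / @total
  end
end
```

The two lambdas could be passed as content_length_proc: progress.content_length_proc and progress_proc: progress.progress_proc, with percent forwarded to whichever bar is in use.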

Windows - FileUtils.mv Rename not allowed on an open file.

For some reason it looks like on the FileUtils.mv ruby is being denied the ability to rename the file.

Down::NotFound: Permission denied @ rb_file_s_rename - (C:/Users/vagrant/AppData/Local/Temp/open-uri20200930-1076-18a2wmi, C:/Users/vagrant/AppData/Local/Temp/down20200930-1076-19i1bix.pem)

    open_uri_file = downloaded_file
    downloaded_file = copy_to_tempfile(URI(url).path, open_uri_file)
    OpenURI::Meta.init downloaded_file, open_uri_file

    downloaded_file.extend DownloadedFile
    downloaded_file
  rescue => error
    raise if error.is_a?(Down::Error)
    raise Down::NotFound, error.message
  end

  def copy_to_tempfile(basename, io)
    tempfile = Tempfile.new(['down', File.extname(basename)], binmode: true)
    if io.is_a?(OpenURI::Meta) && io.is_a?(Tempfile)
      FileUtils.mv io.path, tempfile.path
    else
      IO.copy_stream(io, tempfile.path)
      io.rewind
    end
    tempfile.open
    tempfile
  end

Down::Http can't get an S3 URL that Down::NetHttp can; 403 error

OK, I'm still debugging this, but recording what I've found so far.

I can't share the entire original URL with you because it's confidential content, so I've intentionally corrupted this signed S3 URL. But we have a signed S3 URL that looks like this:

id = "https://scihist-digicoll-production-ingest-mount.s3.amazonaws.com/Verma_Inder%20%2B%20Cole_Charles/verma_i_and_cole_c_0198_FULL.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAU4GX5J7ESI6XHR42%2F20210322%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210322T192711Z&X-Amz-Expires=14400&X-Amz-SignedHeaders=host&X-Amz-Signature=[signature omitted]"

If I do:

Down::NetHttp.open(id)

It works.

But if I instead do:

Down::Http.open(id) # exact same id

It returns a 403 error.

I think something somewhere is not escaping correctly. Could very well be http-rb's fault. Still trying to debug and reproduce with a smaller isolated test case. Ooh, that gives me an idea.

Stay tuned.

Ruby 2.7+ deprecation warning

Hey there! Trying to use this brilliant gem with Ruby 2.7


    tempfile = Down.download(
      link,
      content_length_proc: -> (content_length) {
        file_size = content_length
      },
      progress_proc: -> (progress) {
        changed
        downloaded_size = progress
        notify_observers(file_size, downloaded_size)
      }
    )

and getting the following warnings:

/usr/local/Cellar/rbenv/1.1.2/versions/2.7.0/lib/ruby/gems/2.7.0/gems/down-5.1.0/lib/down.rb:10: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/usr/local/Cellar/rbenv/1.1.2/versions/2.7.0/lib/ruby/gems/2.7.0/gems/down-5.1.0/lib/down/backend.rb:12: warning: The called method `download' is defined here

It seems that some changes should be made for the future versions of Ruby.

Current workaround (just to hide the warnings, if someone needs it right now):

RUBYOPT='-W:no-deprecated -W:no-experimental' <RUBY_COMMAND>

or adding this ENV variable to your shell init script.

Thanks for the lib.
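
The warning points at a hash being passed where keyword arguments are expected. A minimal sketch of the explicit ** forwarding that the message suggests; download and backend_download are hypothetical stand-ins, not Down's actual methods.

```ruby
# Stand-in for a backend method that declares keyword parameters.
def backend_download(url, max_size: nil, **options)
  { url: url, max_size: max_size, options: options }
end

# Accepts keywords and forwards them as keywords, so no implicit
# hash-to-keyword conversion takes place.
def download(url, **options)
  backend_download(url, **options)
end
```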

Ruby 3.2 unrecognized option: max_redirects

The gem works perfectly for 2.7, 3.0, and 3.1, but fails for ruby 3.2 with:

ArgumentError:
 unrecognized option: max_redirects
# ./vendor/bundle/ruby/3.2.0/gems/down-5.4.0/lib/down/net_http.rb:145:in `open_uri'
# ./vendor/bundle/ruby/3.2.0/gems/down-5.4.0/lib/down/net_http.rb:94:in `download'
# ./vendor/bundle/ruby/3.2.0/gems/down-5.4.0/lib/down/backend.rb:13:in `download'
# ./vendor/bundle/ruby/3.2.0/gems/down-5.4.0/lib/down.rb:10:in `download'

For code:

Down.download(
  url,
  open_timeout: 10,
  read_timeout: 10,
  max_redirects: 10,
  headers: headers,
  content_length_proc: ->(content_length) {
    progress_bar.total = content_length if content_length
  },
  progress_proc: -> (progress) {
    progress_bar.increment(progress)
  }
)

ChunkedIO API is Incompatible with CSV.parse

I'm currently noticing this when working with some CSV files I'm storing in S3 using Shrine. Down::ChunkedIO has a private gets method, which CSV.parse is trying to call (yes, I know CSV.parse shouldn't be parsing by lines, but that's what it does).

The actual error is:

NoMethodError (private method `gets' called for #<Down::ChunkedIO:0x007f7116dbbad0>): 

It would probably help compatibility with other code that processes IO objects if the gets method were public and processed chunks until it completed a line and then returned that line.

I should add that I worked around this issue by simply calling download instead of to_io, so this is hardly a big issue. I noticed that ChunkedIO is using a Tempfile as an intermediate anyway, so it probably makes no difference. I'm curious though: why not just process the IO in memory?
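To make the suggestion concrete, here is a rough sketch of a public `gets` that consumes chunks until a full line is available. `LineBufferedChunks` is a hypothetical wrapper written for illustration, not Down::ChunkedIO itself, and it assumes the chunks arrive as an enumerable of strings.

```ruby
# Hypothetical wrapper, not Down::ChunkedIO: it pulls chunks from any
# enumerable of strings and exposes a public, line-oriented #gets.
class LineBufferedChunks
  def initialize(chunks)
    @chunks = chunks.each # an Enumerator over string chunks
    @buffer = +""
  end

  # Returns the next line including its separator, or nil at end of input.
  def gets(separator = "\n")
    # Keep appending chunks until the buffer contains a separator.
    until (index = @buffer.index(separator))
      begin
        @buffer << @chunks.next
      rescue StopIteration
        break # no more chunks; fall through with whatever is buffered
      end
    end

    if index
      @buffer.slice!(0, index + separator.length)
    elsif @buffer.empty?
      nil
    else
      @buffer.slice!(0, @buffer.length) # final line without a separator
    end
  end
end

io = LineBufferedChunks.new(["a,b\n1,", "2\n3,4"])
lines = []
while (line = io.gets)
  lines << line
end
p lines
```

Only the data up to the current line is held in memory here, which is one possible answer to the in-memory question, though it gives up the ability to rewind.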

308 Permanent Redirect

Hey 👋

Down isn't able to download files with 308 redirects.

Example:
https://fakeimg.pl/1920x1280/fafbfc redirects to http://fakeimg.pl/1920x1280/fafbfc/ with a 308 code and then redirects to https://fakeimg.pl/1920x1280/fafbfc/ with a 301.

We can follow the redirects here: https://wheregoes.com/trace/20222494475/

Down::NetHttp.download("https://fakeimg.pl/1920x1280/fafbfc", :destination => "/Users/jules/Desktop/test.jpg", :max_redirects => 5)

308 Permanent Redirect

Do you have any ideas?

Best
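As an illustration of what handling this chain would involve, here is a sketch of a redirect-following loop that treats 308 like the other redirect codes. It is not the gem's implementation; `resolve_redirects`, `REDIRECT_CODES` and the injectable `fetcher` are names invented for this example, and the demo uses a fake fetcher that replays the redirect chain described above instead of touching the network.

```ruby
require "net/http"
require "uri"

# Status codes treated as redirects in this sketch; 308 is listed explicitly.
REDIRECT_CODES = [301, 302, 303, 307, 308].freeze

# Default fetcher: issues a GET and returns [status, location_header].
NET_HTTP_FETCHER = lambda do |uri|
  response = Net::HTTP.get_response(uri)
  [response.code.to_i, response["Location"]]
end

# Follows redirects up to max_redirects and returns the final URI.
# `fetcher` is a hypothetical injection point so the loop can be exercised
# without network access.
def resolve_redirects(url, max_redirects: 5, fetcher: NET_HTTP_FETCHER)
  uri = URI.parse(url)

  (max_redirects + 1).times do
    status, location = fetcher.call(uri)
    return uri unless REDIRECT_CODES.include?(status)
    raise ArgumentError, "redirect without Location header" if location.nil?

    uri = uri.merge(location) # also resolves relative Location values
  end

  raise "too many redirects"
end

# Fake fetcher replaying the chain from the report above.
hops = {
  "https://fakeimg.pl/1920x1280/fafbfc"  => [308, "http://fakeimg.pl/1920x1280/fafbfc/"],
  "http://fakeimg.pl/1920x1280/fafbfc/"  => [301, "https://fakeimg.pl/1920x1280/fafbfc/"],
  "https://fakeimg.pl/1920x1280/fafbfc/" => [200, nil]
}
fake_fetcher = ->(uri) { hops.fetch(uri.to_s) }

final = resolve_redirects("https://fakeimg.pl/1920x1280/fafbfc", fetcher: fake_fetcher)
puts final
```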

remote_file.read does not read full file for >1GB files

I am trying to read a 2GB+ file in chunks. However, when I read the file with remote_file.read(1024*1024*50), in order to get chunks of 50MB at a time, it times out at the 3rd method call and only reads 25MB instead of 50MB. All following remote_file.read calls return a chunk of size 0.

I initially tried using the Down.download method to download the files locally, but for files larger than 2GB the full file was not being downloaded (downloading a 2GB file resulted in a local file of 1.6GB, and downloading a 5GB file resulted in a local file of 2.3GB, then 1.4GB on a second attempt). The results of the download method are inconsistent.

I am using the exact code that is used in the documentation. Does this library have a solution for large file sizes or is there a way for it to prevent timeouts when streaming large files?
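For reference, a chunked read loop of the kind described can be sketched as follows. This is only an illustration with a `StringIO` standing in for the remote file; `copy_in_chunks` is a hypothetical helper, and it does not address the timeout itself. It does show one way to count the bytes actually received so that a truncated transfer can be detected.

```ruby
require "stringio"

CHUNK_SIZE = 1024 * 1024 * 50 # 50 MB, the chunk size mentioned above

# Copies everything from `source` to `destination` in chunks and returns the
# number of bytes copied. A short read is not treated as end of input; only a
# nil or empty result is.
def copy_in_chunks(source, destination, chunk_size: CHUNK_SIZE)
  total = 0
  loop do
    chunk = source.read(chunk_size)
    break if chunk.nil? || chunk.empty?

    destination.write(chunk)
    total += chunk.bytesize
  end
  total
end

# StringIO stands in for the remote file in this sketch.
source = StringIO.new("x" * 2_500)
destination = StringIO.new
copied = copy_in_chunks(source, destination, chunk_size: 1_000)
puts copied
```

Comparing the returned byte count with the size the server reported, when one is available, would make a partial download visible instead of silent.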

Parse uri error

Hi Janko, debugging the shrine issue has led me here:

> Down.download('https://trello-attachments.s3.amazonaws.com/551edb81eda0610c6fd2d322/718x485/0ac204696b33b9eba1744aaa938ded42/large_audience_reach_twitter_%5B1%5D.png')
Bad uri(is not uri?): https://trello-attachments.s3.amazonaws.com/551edb81eda0610c6fd2d322/718x485/0ac204696b33b9eba1744aaa938ded42/large_audience_reach_twitter_[1].png
========================================================================================================================================================================
[0] /vendor/bundle/ruby/2.2.0/gems/down-1.1.0/lib/down.rb:47:in `rescue in download'
    42:     downloaded_file.extend DownloadedFile
    43:     downloaded_file
    44: 
    45:   rescue => error
    46:     raise if error.is_a?(Down::Error)
 => 47:     raise Down::NotFound, error.message
    48:   end
    49: 
    50:   def copy_to_tempfile(basename, io)
    51:     tempfile = Tempfile.new(["down", File.extname(basename)], binmode: true)
    52:     if io.is_a?(OpenURI::Meta) && io.is_a?(Tempfile)

This should be very easy to reproduce.

Update: the issue doesn't manifest on my local machine (mac), while it does on server (heroku). Heroku runs 2.2, while locally I have ruby 2.1.5.

URI.parse itself succeeds, so I guess something else is failing.
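One way to narrow this down is to compare how `URI.parse` treats the percent-encoded and the decoded form of the path. The sketch below assumes, based only on the error message showing `[1]` rather than `%5B1%5D`, that the URL is being decoded somewhere before it is parsed; the `example.com` URLs and the `parseable?` helper are made up for the example.

```ruby
require "uri"

encoded = "https://example.com/images/large_audience_reach_twitter_%5B1%5D.png"
decoded = "https://example.com/images/large_audience_reach_twitter_[1].png"

# Returns true when URI.parse accepts the string, false when it raises.
def parseable?(string)
  URI.parse(string)
  true
rescue URI::InvalidURIError
  false
end

puts "encoded parses: #{parseable?(encoded)}"
puts "decoded parses: #{parseable?(decoded)}"
```

If the two forms behave differently on the affected Ruby version, the next step would be to find which layer decodes the URL before handing it on.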

Net::ReadTimeout

I just got an alert from Bugsnag:

Net::ReadTimeout /usr/lib/ruby/2.3.0/net/protocol.rb:158

/usr/lib/ruby/2.3.0/net/protocol.rb:158 rbuf_fill
/usr/lib/ruby/2.3.0/net/protocol.rb:106 read
/usr/lib/ruby/2.3.0/net/http/response.rb:291 block in read_body_0
/usr/lib/ruby/2.3.0/net/http/response.rb:276 inflater
/usr/lib/ruby/2.3.0/net/http/response.rb:281 read_body_0
/usr/lib/ruby/2.3.0/net/http/response.rb:202 read_body
gems/down-4.0.1/lib/down/chunked_io.rb:154 each
gems/down-4.0.1/lib/down/chunked_io.rb:154 block in chunks_fiber

Looks like Down didn't intercept Net::HTTP's Net::ReadTimeout exception.
I rescue only Down::Error and hope Down will intercept its dependencies' exceptions.

In my code I use Down.open.
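Until the library translates these errors itself, a caller-side stopgap could look like the sketch below. `DownloadTimeout` and `with_timeout_translation` are hypothetical names for this example and are not part of the gem; the block simulates the timeout rather than performing a real request.

```ruby
require "net/protocol" # defines Net::ReadTimeout and Net::OpenTimeout

# Hypothetical application-level error; not a class from the down gem.
class DownloadTimeout < StandardError; end

# Runs the block and re-raises Net::HTTP timeouts as DownloadTimeout so that
# callers only need to rescue one error class.
def with_timeout_translation
  yield
rescue Net::ReadTimeout, Net::OpenTimeout => error
  raise DownloadTimeout, error.message
end

begin
  # Simulated failure; in real code the block would open and read the file.
  with_timeout_translation { raise Net::ReadTimeout }
rescue DownloadTimeout => error
  puts "translated: #{error.class}"
end
```

Because the trace shows the timeout being raised while the body is read, the wrapper would need to surround the read calls as well as the call that opens the remote file.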
