GithubHelp home page GithubHelp logo

Comments (14)

dougwilson avatar dougwilson commented on August 22, 2024 1

Yea, the CD header existed well before the new specs were written, back in the historical days. The main reason why this module is emitting them was due to users who needed this (back when it was part of expressjs before extracted into it's own module).

Many languages are covered by latin1 and there are clients where when the extended filename parameter is included they fail to parse the header. So back then the compromise was to make it so the full usage of latin1 would result in just the plain filename parameter, allowing things like spanish, french, and more all to generate the header without the extended parameter being there and working for various clients.

There are many more clients out there than web browsers, for example both cURL and wget command line tools parse this header to determine the filename to save to disk.

This is a hard balancing act and so that I don't have to read every client implementation out there, the compromise was fairly straight forward: just follow the spec and since it allows the usage of iso-8859-1 (latin1) in the basic filename parameter it made everyone happy. Apparently for many mant years until now...? Haha

from content-disposition.

dougwilson avatar dougwilson commented on August 22, 2024

Ah, I see. So the default module of this module is to follow the specification, as in just because Chrome does not follow the actual specification does not make the bug in this module, as of course there are other clients who do follow the spec.

What if instead of just generating a longer string when not actually required this module adds an option to allow you to always generate both, for example, so you can decide what to do (maybe sniff the user-agent header and only do it then, or unconditionally if that is what you want)?

from content-disposition.

dougwilson avatar dougwilson commented on August 22, 2024

Another potentional option is that this module could kick in the use of extension parameter if the filename falls outside of US-ASCII instead of ISO-8859-1, though in order to know if that would actually be effective, we would need a much better understanding of what exactly these buggy clients are doing with the bytes in the file name, as not all single byte character schemes match the US-ASCII set for the first 127 characters.

from content-disposition.

jeremyspiegel avatar jeremyspiegel commented on August 22, 2024

I'll create a Chromium issue as well for this. However, the specification advises to use ASCII rather than ISO-8859-1:

Appendix D.  Advice on Generating Content-Disposition Header Fields

   To successfully interoperate with existing and future user agents,
   senders of the Content-Disposition header field are advised to:
   o  Avoid using non-ASCII characters in the filename parameter.
      Although most existing implementations will decode them as
      ISO-8859-1, some will apply heuristics to detect UTF-8, and thus
      might fail on certain names.

This strategy seems to be followed by Django, Flask, and Rails.

from content-disposition.

dougwilson avatar dougwilson commented on August 22, 2024

Right, probably the reason that is added to the appendix is because there are buggy clients out there who are not following the specification, as they are trying to account for buggy servers who are accidentally generated bad header values in the wrong character set. It is definitely a tricky problem, which is why I generally try to follow the actual standards, since they actually spell out the way to do it.

That is why I also provided the second suggestion above, though. The spec itself clearly spells out that the header is ISO-8859-1 character encoding, regardless of buggy client implementations. So this implementation is not buggy here.

I suggested that maybe we can add options so you can make the decisions you like to better control the behavior that would work better for your case (my first comment). Would that work for you?

from content-disposition.

jeremyspiegel avatar jeremyspiegel commented on August 22, 2024

I appreciate your quick response, and I understand wanting to avoid sending extra data for misbehaving clients, but I think this module would be better if by default it followed the advice of the spec and the behavior of other popular web server frameworks. I understand if you disagree — an option would also be helpful. Thanks!

from content-disposition.

dougwilson avatar dougwilson commented on August 22, 2024

Right. Of course the RFC has specific wording for the spec, like MUST NOT and SHOULD NOT, etc. The appendix is not even a main point of the spec, relegated to a footnote, without even using the standard wording, so it's hard to draw concrete instructions from it.

So to be clear: I'm not saying I am not going to change anything, and your discussion on this is valuable, so please do not feel dismayed.

from content-disposition.

jeremyspiegel avatar jeremyspiegel commented on August 22, 2024

I might be missing something, but I don't actually see the spec use MUST, SHOULD, MAY, etc. in regards to using ISO-8859-1. It seems to only mention ISO-8859-1 when talking about the "filename*" parameter, and how it allows characters outside the ISO-8859-1 character set. This does imply that the "filename" parameter can be ISO-8859-1, but it doesn't actually come out and say that anywhere else, so it seems a little ambiguous to me. However, I do think the appendix is clear about the recommendation to avoid non-ASCII characters.

Here is the Chromium bug I filed: https://bugs.chromium.org/p/chromium/issues/detail?id=1006345.

Thanks again for considering a change here!

from content-disposition.

dougwilson avatar dougwilson commented on August 22, 2024

I might be missing something, but I don't actually see the spec use MUST, SHOULD, MAY, etc. in regards to using ISO-8859-1. It seems to only mention ISO-8859-1 when talking about the "filename*" parameter, and how it allows characters outside the ISO-8859-1 character set. This does imply that the "filename" parameter can be ISO-8859-1, but it doesn't actually come out and say that anywhere else, so it seems a little ambiguous to me.

This is because since this spec is in relation to a HTTP header, the spec for HTTP itself is what declares header values as ISO-8859-1, so it only passively mentions it since it assumes that as prior knowledge to the specific Content-Disposition header specification. You can see they try to provide the needed link a few times in the spec, notably in https://tools.ietf.org/html/rfc6266#appendix-C

By default, HTTP header field parameters cannot carry characters outside the ISO-8859-1 ([ISO-8859-1]) character encoding (see [RFC2616], Section 2.2).

However, I do think the appendix is clear about the recommendation to avoid non-ASCII characters.

Yes, I do not disagree here. I was just noting it was relegated to an appendix, when it could have been made an actual rule by either not defining the ABNF of the the header as

     filename-parm       = "filename" "=" value
     value         = <value, defined in [RFC2616], Section 3.6>
                   ; token | quoted-string

in which case the value is defined in RFC2616 as

       value                   = token | quoted-string
       quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
       qdtext         = <any TEXT except <">>

with TEXT being

The TEXT rule is only used for descriptive field contents and values
that are not intended to be interpreted by the message parser. Words
of *TEXT MAY contain characters from character sets other than ISO-
8859-1 [22] only when encoded according to the rules of RFC 2047
[14].

       TEXT           = <any OCTET except CTLs,
                        but including LWS>

or annotating in the main part of the spec that TEXT "MUST NOT" contain characters above 0x7f (which, also, they could have just not used the quoted-string ABNF from the RFC2616).

from content-disposition.

dougwilson avatar dougwilson commented on August 22, 2024

Looking around it seems like Firefox has a similar bug open: https://bugzilla.mozilla.org/show_bug.cgi?id=588409

from content-disposition.

jeremyspiegel avatar jeremyspiegel commented on August 22, 2024

From the new spec:

   Historically, HTTP has allowed field content with text in the
   ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
   through use of [RFC2047] encoding.  In practice, most HTTP header
   field values use only a subset of the US-ASCII charset [USASCII].
   Newly defined header fields SHOULD limit their field values to
   US-ASCII octets.  A recipient SHOULD treat other octets in field
   content (obs-text) as opaque data.

I guess Content-Disposition isn't a newly defined field so I'm not sure if this is necessarily relevant, but it's interesting to note.

from content-disposition.

dougwilson avatar dougwilson commented on August 22, 2024

And, ultimately, nothing in the specs seems to be pointing to Chrome is doing the correct thing. At best, the specs admit that there are buggy clients out there (like Chrome, I guess).

from content-disposition.

jeremyspiegel avatar jeremyspiegel commented on August 22, 2024

Fair enough, thank you for that explanation.

from content-disposition.

jeremyspiegel avatar jeremyspiegel commented on August 22, 2024

A Chromium developer has responded to the bug:

I'm skeptical that changing behavior here is worth the investment and dealing with the inevitable breakage, unless we want to reach consensus with other browsers. https://tools.ietf.org/html/rfc7230 pretty clearly bans non-ASCII characters in HTTP headers:

 header-field   = field-name ":" OWS field-value OWS

 field-name     = token
 field-value    = *( field-content / obs-fold )
 field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
 field-vchar    = VCHAR / obs-text

From https://tools.ietf.org/html/rfc5234#appendix-B.1
VCHAR = %x21-7E

So we're dealing with violations of the HTTP spec, and the question is just how to deal with them. (obs-fold is related to line folding, and considered obsolete - regardless, not relevant here).

from content-disposition.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.