GithubHelp home page GithubHelp logo

Comments (8)

emilsedgh avatar emilsedgh commented on August 22, 2024 1

Hi Doug.

On aws-sdk-js#1648 the authors suggested the safest way to use that library is to send US-ASCII characters for filename and everything else on filename*.

So for now my solution was to strip my filename from any non-ascii character and provided that as a callback to content-disposition.

const fallback = name.replace(name, /[^\x00-\x7F]/ig, '?')

const ContentDisposition = contentDisposition(name, {
  type: 'inline',
  fallback
})

However, I think it's in everyone's best interest if aws-sdk would resolve the issue. Your detailed investigation would be valuable to the aws-sdk-js#1648.

from content-disposition.

dougwilson avatar dougwilson commented on August 22, 2024

Hi! I'm not familiar with that module, so can't really say for sure. This module tries to adhere to all specifications around the header field, though. Is there anything more specific perhaps I can answer? I skimmed though the linked issue but I don't see mention of this module and not really sure how you plan to fit them together and what, if any, errors happen when you do.

from content-disposition.

dougwilson avatar dougwilson commented on August 22, 2024

Sorry, I guess you do have an error posted. Is that when using this module? How does the signature calculation work? I guess reading it more, it sounds like the aws-sdk signature calculation is just broken, but it won't be fixed?

You can specify an argument to the fallback option that only contains ascii characters if the aws-sdk is broken as it reads like, or perhaps make a pull request to encode http headers correctly for signature calculation?

from content-disposition.

emilsedgh avatar emilsedgh commented on August 22, 2024

Alright let me try this from another angle.

const cp = require('content-disposition');

cp(String.fromCharCode(80, 80, 160, 80))

cp('©')

None of these are translated into filename*. Instead, they are kept as filename.

Are they valid according to RFC 5987?

from content-disposition.

dougwilson avatar dougwilson commented on August 22, 2024

Yes, that example is 100% valid according to the specification.

The specification for the Content-Disposition header is https://tools.ietf.org/html/rfc6266

The grammar for the header is in section 4.1 (https://tools.ietf.org/html/rfc6266#section-4.1):

     content-disposition = "Content-Disposition" ":"
                            disposition-type *( ";" disposition-parm )

     disposition-type    = "inline" | "attachment" | disp-ext-type
                         ; case-insensitive
     disp-ext-type       = token

     disposition-parm    = filename-parm | disp-ext-parm

     filename-parm       = "filename" "=" value
                         | "filename*" "=" ext-value

     disp-ext-parm       = token "=" value
                         | ext-token "=" ext-value
     ext-token           = <the characters in token, followed by "*">

   Defined in [RFC2616]:

     token         = <token, defined in [RFC2616], Section 2.2>
     quoted-string = <quoted-string, defined in [RFC2616], Section 2.2>
     value         = <value, defined in [RFC2616], Section 3.6>
                   ; token | quoted-string

   Defined in [RFC5987]:

     ext-value   = <ext-value, defined in [RFC5987], Section 3.2>

The plain filename is defined above as filename-parm = "filename" "=" value and value is defined as

     token         = <token, defined in [RFC2616], Section 2.2>
     quoted-string = <quoted-string, defined in [RFC2616], Section 2.2>
     value         = <value, defined in [RFC2616], Section 3.6>
                   ; token | quoted-string

Note that the output of your example has put the filename inside double quotes, which means it is the quoted-string ABNF above.

Going to the referenced RFC (https://tools.ietf.org/html/rfc2616#section-2.2), here is the quoted-string definition:

       quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
       qdtext         = <any TEXT except <">>
       TEXT           = <any OCTET except CTLs,
                        but including LWS>

The above example String.fromCharCode(80, 80, 160, 80) encodes into the octets 50 50 a0 50. All of those octets are valid for the TEXT ABF grammar, as they are all OCTET (<any 8-bit sequence of data>) and none are CTL (<any US-ASCII control character (octets 0 - 31) and DEL (127)>).

Please correct me if I made a mistake.

from content-disposition.

emilsedgh avatar emilsedgh commented on August 22, 2024

No you're right. Thank you so much for the great library and my apologies for the irrelevant and invalid issue.

from content-disposition.

dougwilson avatar dougwilson commented on August 22, 2024

Hi @emilsedgh it's no problem at all. This has me wondering if it might be useful if the module added a feature where you could do fallback: false, in which case it would not output the filename parameter at all and instead only output the filename* parameter.

The biggest downside to this is many (most?) parsers will only ever look at the filename parameter, which is why this module tries hard to always have one, if possible, currently. This may not be an issue with your use-case, however, though I'm not sure. From the thread in awk-sdk, it would seem like this may not be an issue for that particular service, though.

It would seem to me, though, that getting that error would indicate that the client (awk-sdk) is calculating the signature differently that the server. If this value is causing the trip up, it would seem likely that the server is handing the character correcrtly while the client is not, leading to the signature mismatch. It seems odd that the client not calculating the signature the same way the server is is not considered a bug in the client (??).

Just greping through the aws-sdk source code, it would seem the mistake is a disconnect between like https://github.com/aws/aws-sdk-js/blob/5d868465426a66c9bdae1f3f0bb67aeca5a5825f/lib/signers/v4.js#L123 and https://github.com/aws/aws-sdk-js/blob/5d868465426a66c9bdae1f3f0bb67aeca5a5825f/lib/signers/v4.js#L176 where the headers are concatenated into a JavaScript string (which is internally just UCS-2 encoded), but when Node.js is going to write that string out to the wire, it will do so as ISO-8859-1, following the specification, but the signature function in the SDK (which also needs to turn the internal JavaScript string into octets) encods using UTF-8 instead, causing the mismatch.

Node.js source where header string is encoded into octets for wire transfer: https://github.com/nodejs/node/blob/master/lib/_http_outgoing.js#L220-L233 (note the specification to the encoding that the headers are the latin1 encoding).

The location where ultimately that string in the sdk is turned into octets (in the form of a Node.js Buffer object) is https://github.com/aws/aws-sdk-js/blob/c175cb2b89576f01c08ebf39b232584e4fa2c0e0/lib/util.js#L415 at which time since the encoding is not specified, it is encoded as UTF-8.

I hope this helps!

from content-disposition.

emilsedgh avatar emilsedgh commented on August 22, 2024

As for fallback: false, I agree with your comment. It may break many clients.

Do you find value in providing us-ascii as a fallback option?

from content-disposition.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.