Comments (15)
> This is definitely not true.

I was referring to the real-world behaviour of `Accept-Language`, and wondering if the same applies. Clearly that's not the case, so you are probably right that we should use the quality value here.
from aiohttp.
@bdraco I think handling the 1st task here for case-sensitivity is quite easy to do before #8063 so I'll submit that.
However, the 3rd task of completely parsing it correctly per the RFC is less of a priority to backport, IMO. And implementing it raises a larger question for me...
Although it's relatively easy in Python, shouldn't that parsing be done by llhttp, and probably the conversion to lowercase too? Meaning, by the time it hits the code to use it for picking or compressing the file, it should be lowered and in a form like `{"<encoding>": <quality>, ...}` instead of just the raw string. There are other headers that use this format -- `Accept`, `Accept-Language`, `TE` -- but I don't see any of those parsed anywhere in the code.
@Dreamsorcerer is this a question for you?
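For illustration, the parsed form could look something like this (a rough sketch only, not a concrete proposal for llhttp or aiohttp internals; real RFC 9110 parsing is stricter):

```python
import re

def parse_accept_encoding(header: str) -> dict[str, float]:
    """Parse an Accept-Encoding header into {encoding: quality}.

    Illustrative only: splits on commas, lowercases the coding names,
    and applies the default quality of 1 when no q= parameter is given.
    """
    parsed: dict[str, float] = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        coding, _, params = item.partition(";")
        quality = 1.0  # RFC 9110: missing q= means quality 1
        match = re.search(r"q\s*=\s*([01](?:\.\d{0,3})?)", params)
        if match:
            quality = float(match.group(1))
        parsed[coding.strip().lower()] = quality
    return parsed

parse_accept_encoding("br;q=0.9, gzip, identity;q=0")
# {'br': 0.9, 'gzip': 1.0, 'identity': 0.0}
```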
The 3rd task would be partially accomplished by #7679, which is drafted for the 4.0 milestone.
For the 2nd task regarding the behavior of the `force` parameter in `enable_compression()`, the change would be pretty easy; however, a decision is needed on intent and backporting. Either:

- Bug fix back to 3.9: In other words, the intent of `force` was never to ignore `Accept-Encoding`.
- Feature for 3.10: Perhaps add a new `prefer` parameter to make the intent clear, and potentially deprecate `force` in version 4.
- Breaking for v4: I don't recommend this, as it would be problematic for users if Brotli were implemented for 3.10 unless it was the default over deflate.
Thoughts?
I think we need to do some git archeology to figure out why `force` was added in the first place before a path forward can be determined.
> Although it's relatively easy in Python, shouldn't that parsing be done by llhttp, and probably the conversion to lowercase too? Meaning, by the time it hits the code to use it for picking or compressing the file, it should be lowered and in a form like `{"<encoding>": <quality>, ...}` instead of just the raw string. There are other headers that use this format -- `Accept`, `Accept-Language`, `TE` -- but I don't see any of those parsed anywhere in the code.
My impression is that llhttp is just passing us header values. Meaning that it has parsed it as HTTP, but not converted anything based on the particular header names, which is probably out-of-scope. So, such changes would likely go into `_http_parser.pyx` instead, if you want the compiled performance.
However, is there actually any merit in implementing this quality check? I've seen the same syntax for locales, and I think the general consensus is that this was over-engineered and basically pointless. The quality numbers fail to achieve anything more than defining an order of preference (i.e. the actual numbers are never used for anything, they're just sorted on). Therefore, browsers don't provide any way for users to customise these quality values, they just pick evenly spaced values according to order of user preference. Because of all this, browsers will also always order them with the highest quality number at the start, so in reality, you can just read the values left-to-right while ignoring the quality numbers and you'll get the same result. Parsing the quality numbers would therefore just be a waste of CPU-time.
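A quick sketch of what that left-to-right shortcut means in practice (illustrative code, not aiohttp's; it assumes qualities arrive in non-increasing order, as browsers send them):

```python
def preferred_order(header: str) -> list[str]:
    """Order codings by quality, highest first (stable on ties)."""
    items = []
    for token in header.split(","):
        coding, _, params = token.strip().partition(";")
        q = 1.0  # default quality when no q= parameter is present
        if params.strip().startswith("q="):
            q = float(params.strip()[2:])
        items.append((coding.strip(), q))
    # sorted() is stable, so equal qualities keep their left-to-right order
    return [c for c, q in sorted(items, key=lambda t: -t[1])]

# For a header with non-increasing qualities, the sorted order is
# identical to simply reading the codings left to right:
preferred_order("br, gzip;q=0.8, deflate;q=0.5")
# ['br', 'gzip', 'deflate']
```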
> I think we need to do some git archeology to figure out why `force` was added in the first place before a path forward can be determined.
It was a boolean before #403 added gzip, and the code before that made it clear that it was actually a "force" to ignore `Accept-Encoding`. The boolean was originally added in #239, but there are no comments to explain any use case for it.
I think the `force` parameter is pretty clear in that it forces a given compression. I'm not clear why you'd want to do this, so I'm not sure that turning it into a preference makes sense without understanding the use case (can we find any projects using the parameter?).
Clearly it's not correct behaviour to use compression that the client claims not to support, but it also seems weird to try and use a fixed compression method in the first place...
> I've seen the same syntax for locales, and I think the general consensus is that this was over-engineered and basically pointless.
OK, there seems to be an important difference between `Accept-Encoding` and `Accept-Language`. The former can disallow encodings with `q=0`, so maybe we need to at least check for that.

Edit: OK, not a difference, but it's explicitly mentioned under `Accept-Encoding`, whereas it'd be very weird to try and exclude languages.
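A minimal sketch of such a `q=0` check (illustrative only; `is_disallowed` is a hypothetical helper, not aiohttp's API):

```python
def is_disallowed(coding: str, accept_encoding: str) -> bool:
    """Return True if the client explicitly excluded a coding with q=0."""
    for item in accept_encoding.lower().split(","):
        name, _, params = item.strip().partition(";")
        # Normalize "q = 0" / "q=0.0" style variants before comparing
        if name.strip() == coding and params.replace(" ", "") in (
            "q=0", "q=0.0", "q=0.00", "q=0.000"
        ):
            return True
    return False

is_disallowed("gzip", "br, gzip;q=0")    # True: explicitly excluded
is_disallowed("gzip", "br, gzip;q=0.5")  # False: low preference, not excluded
```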
> My impression is that llhttp is just passing us header values. Meaning that it has parsed it as HTTP, but not converted anything based on the particular header names, which is probably out-of-scope. So, such changes would likely go into `_http_parser.pyx` instead, if you want the compiled performance.
Okay, makes sense 👍🏻
> However, is there actually any merit in implementing this quality check? I've seen the same syntax for locales, and I think the general consensus is that this was over-engineered and basically pointless. The quality numbers fail to achieve anything more than defining an order of preference (i.e. the actual numbers are never used for anything, they're just sorted on).
That's certainly correct - the numbers mean nothing but a sort order. I think the reason it's not just a list in preference order is that it would then be impossible to communicate equal preference for some or all items. That's why the default quality, if not specified, is 1.
To a lesser extent, it's also a way to have negations with a wildcard, e.g. `compress;q=0, *` says "I'll take any encoding except compress". Although even the spec admits this pattern has little to no practical value.
FWIW, Apache does use the qualities in its content negotiation.
> Therefore, browsers don't provide any way for users to customise these quality values, they just pick evenly spaced values according to order of user preference.
True they are not user-facing for sure (nor would they have any reason to be?), and the values are just picked to set the browser's preference.
> Because of all this, browsers will also always order them with the highest quality number at the start, so in reality, you can just read the values left-to-right while ignoring the quality numbers and you'll get the same result. Parsing the quality numbers would therefore just be a waste of CPU-time.
This is definitely not true. The spec is clear that values without a quality default to 1, and browsers take advantage of that. For example, both Chrome and Firefox send `Accept-Encoding: gzip, deflate, br`, but that certainly does not mean they prefer gzip over Brotli. Decoding time is inconsequential, so the best interest for a client is the smallest transfer size. Google even makes this clear in their Lighthouse audit for compression.
What they send for the `Accept` header also makes this clear:
- Firefox = `text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8`
- Chrome = `text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7`
> Clearly it's not correct behaviour to use compression that the client claims not to support, but it also seems weird to try and use a fixed compression method in the first place...
I completely agree - neither makes sense to me. I think there is a use case for changing the server's default encoding order, though. For example, a user might benchmark and find that sending large files over long distances is likely faster with gzip than deflate (i.e. the larger encoding time is offset by the faster transfer time). So they might want the order to be `["br", "gzip", ..others]`.
If I were to redesign the current API, it might look something like this to allow a set of encodings to be preferred first but then fallback to others supported:
```python
def __init__(self, ...) -> None:
    ...
    self._ordered_codings = deque(ContentCoding, len(ContentCoding))

def enable_compression(self, *prefer: ContentCoding) -> None:
    """Enables response compression with preferred encodings."""
    self._compression = True
    for encoding in prefer:
        self._ordered_codings.remove(encoding)
    self._ordered_codings.extend(prefer)
    self._ordered_codings.rotate(len(prefer))

async def _start_compression(self, request: "BaseRequest") -> None:
    # Encoding comparisons should be case-insensitive
    # https://www.rfc-editor.org/rfc/rfc9110#section-8.4.1
    accept_encoding = request.headers.get(hdrs.ACCEPT_ENCODING, "").lower()
    for coding in self._ordered_codings:
        if coding.value in accept_encoding:
            await self._do_start_compression(coding)
            return
```
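The deque reordering idea can be demonstrated in isolation (a sketch using plain strings in place of the `ContentCoding` members):

```python
from collections import deque

codings = deque(["br", "gzip", "deflate", "identity"])
prefer = ["gzip", "deflate"]

for encoding in prefer:
    codings.remove(encoding)   # pull preferred codings out of the default order
codings.extend(prefer)         # append them at the end, in preference order
codings.rotate(len(prefer))    # rotate them around to the front

print(list(codings))  # ['gzip', 'deflate', 'br', 'identity']
```

The preferred codings end up first, in the order given, while the remaining codings keep their original relative order as the fallback.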
Yeah, I'd be open to something like that. I wonder if it's worth adding this as a new feature (and potentially removing `force` if we can't find any uses for it).
> Yeah, I'd be open to something like that. I wonder if it's worth adding this as a new feature (and potentially removing `force` if we can't find any uses for it).
Here's my attempt at a search for `enable_compression()` uses with an argument, after a lot of filtering to get rid of just copies of `aiohttp` itself.

There are some results using the old boolean, some oddly specifying "identity" 😕, some just specifying "gzip" that clearly would have no harm by turning to a preference (e.g. ignore the Home Assistant results), and even one result that is doing the job `aiohttp` should be doing and checking the `Accept-Encoding` header 😉.

Passing "identity" is concerning because currently that would just do nothing, but under my proposal, it would be skipped because it's not going to be in the `Accept-Encoding` header, and then the algorithm would go on to deflate. I could either put an assertion in that disallows it, or just adjust the header check because it's implied as included (unless explicitly excluded).
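One hypothetical shape for that adjusted check, treating identity as implied unless explicitly excluded (the `accepts` helper is illustrative only, not aiohttp's API):

```python
def accepts(coding: str, accept_encoding: str) -> bool:
    """True if the client can accept the given content coding.

    identity is implied as acceptable even when not listed, unless it is
    explicitly excluded with q=0 (RFC 9110 section 12.5.3).
    """
    header = accept_encoding.lower().replace(" ", "")
    if coding in header:
        # Listed: acceptable unless excluded with q=0
        return f"{coding};q=0" not in header
    # Not listed: only identity is implied as acceptable
    return coding == "identity"

accepts("identity", "gzip, br")      # True: implied even though unlisted
accepts("identity", "identity;q=0")  # False: explicitly excluded
accepts("gzip", "br")                # False: not offered by the client
```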
> Here's my attempt at a search for `enable_compression()` uses with an argument, after a lot of filtering to get rid of just copies of `aiohttp` itself.
That gives me no results for some reason. I did create a script previously to search dependents, though it can be slow to run (it searches 1 repo every ~7 seconds due to rate limit): https://gist.github.com/Dreamsorcerer/70285fac0a11c3d9c26b577f7dd989a7
> There are some results using the old boolean
Might be interesting to see if any of them mention any rationale in the commit they were added...
> Passing "identity" is concerning because currently that would just do nothing, but under my proposal, it would be skipped because it's not going to be in the `Accept-Encoding` header and then the algorithm would go on to deflate. I could either put an assertion in that disallows it, or just adjust the header check because it's implied as included (unless explicitly excluded).
My counter-proposal is to just add a new `prefer` parameter which takes a sequence of strings. Then add a deprecation warning to the `force` parameter and remove it in v4. If we get complaints from the deprecation notice, then we can reconsider removing it.
> That gives me no results for some reason.
The markdown sanitizer keeps removing all the escapes from the regex parts. The search term should look like this with 20 results for me:
```
aiohttp AND /\.enable_compression\(.+\)/ language:python NOT is:archived NOT is:fork NOT repo:aio-libs/aiohttp NOT is:vendored NOT is:generated NOT path:/(^|\/)aiohttp\/(multipart|web_response|client_reqrep)\.py$/ NOT path:/tests\/test_(web_response|client_functional|http_writer)\.py$/ NOT repo:cndn/intelligent-code-completion
```
> Might be interesting to see if any of them mention any rationale in the commit they were added...
I didn't check, but one of the false arguments did have a comment mentioning it was a workaround for the double compression issue recently fixed by @bdraco.
> My counter-proposal is to just add a new `prefer` parameter which takes a sequence of strings. Then add a deprecation warning to the `force` parameter and remove it in v4. If we get complaints from the deprecation notice, then we can reconsider removing it.
Not as elegant, but safer of course. Also, consider that either way this becomes a "breaking" change. For example, the Home Assistant use is definitely not expecting a force, so a change would be required to specify `prefer=`.