

pquentin commented on August 21, 2024

Just hit this myself when trying to release urllib3 1.26.19. urllib3 2.x made a change where string bodies are encoded as UTF-8 instead of Latin-1. It was an accidental change, and I've started working on fixing/documenting it in urllib3/urllib3#3053 and urllib3/urllib3#3063 but ultimately dropped the ball, sorry.

Then, #6589 adapted requests to work with urllib3 2.x by encoding string bodies to UTF-8 to compute the Content-Length. This means that with \xff, requests now sets Content-Length to 2, but urllib3 1.26.x only sends one byte, which is why the test hangs. Since we're not planning to revert to Latin-1 in urllib3, the fix would be for requests to explicitly encode string bodies to UTF-8 (or not try to guess the Content-Length, I suppose). If it does, it would be nice to avoid encoding twice, which is what happens today.
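
A minimal sketch of the "encode once" idea, assuming the fix is for requests to turn a str body into UTF-8 bytes up front and derive Content-Length from those same bytes. `prepare_text_body` is a hypothetical helper for illustration, not a requests API:

```python
def prepare_text_body(body):
    """Hypothetical helper: encode a str body exactly once and size the header from those bytes."""
    if isinstance(body, str):
        # Explicit UTF-8 encoding, so the transport layer receives bytes and never re-encodes text
        body = body.encode("utf-8")
    if not isinstance(body, (bytes, bytearray)):
        raise TypeError("expected a str or bytes-like body")
    payload = bytes(body)
    # Content-Length is computed from the exact bytes that will be sent
    headers = {"Content-Length": str(len(payload))}
    return payload, headers


payload, headers = prepare_text_body("\xff")
print(payload, headers)  # b'\xc3\xbf' {'Content-Length': '2'}
```

Because a bytes body is passed through unchanged by both urllib3 1.26.x and 2.x, the header and the payload in this sketch cannot disagree, whichever major version does the sending.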


nateprewitt commented on August 21, 2024

Thanks for reporting this, @frenzymadness! I'd thought we had a standalone GHA to still test on "urllib3<2" but that's for a separate project. I'll work on getting that added to ensure we don't have regressions.

We'll need to take a closer look at what's happening, but I have a feeling this may be a byproduct of #6589. I'm wondering if we're sending a Content-Length 1 byte longer than what we're actually emitting. When that issue was opened, I was surprised we hadn't hit this problem before, but there may be some subtle difference between the two major versions that was overlooked.


frenzymadness commented on August 21, 2024

I took the code from test_content_length_for_string_data_counts_bytes, and it seems to work fine:

>>> import requests
>>> data = "This is a string containing multi-byte UTF-8 ☃️"
>>> length = str(len(data.encode("utf-8")))
>>> req = requests.Request("POST", "http://foo.bar/post", data=data)
>>> p = req.prepare()
>>> p.headers["Content-Length"]
'51'
>>> length
'51'

And for the data from test_unicode_header_name:

>>> data = "\xff"
>>> length = str(len(data.encode("utf-8")))
>>> req = requests.Request("POST", "http://foo.bar/post", data=data)
>>> p = req.prepare()
>>> p.headers["Content-Length"]
'2'
>>> length
'2'


danilom-git commented on August 21, 2024

Hi all,

I was also facing a similar issue to @frenzymadness, and I can confirm that it is caused by #6589. I'm not sure whether I should continue the conversation here or in #6589, but I'll start off here.

Intro

First off I want to mention that when you send the request

requests.put('https://httpbin.org/put', headers={'Content-Type': 'application/octet-stream'}, data='\xff')

it doesn't actually hang: it is waiting for a response from the server, and after a while the code fails with

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

A similar thing occurred with the server I was communicating with, but it actually sent a response, something like

400: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.

I was sending a str with non-ASCII characters (same as @frenzymadness), and it turned out that the request had an incorrect Content-Length header: something like 545 instead of the correct 543.
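
For a string that Latin-1 can encode, the size of the overcount is predictable: every code point from U+0080 to U+00FF takes two bytes in UTF-8 but one byte in Latin-1, while ASCII takes one byte in both. A small sketch (the function name is hypothetical) makes that explicit; under this reading, and assuming the body was Latin-1 encodable, a header of 545 instead of 543 would correspond to two such characters in the body.

```python
def content_length_overcount(body: str) -> int:
    """How many bytes a UTF-8-based Content-Length overstates a Latin-1-encoded body.

    Only meaningful for strings Latin-1 can encode (every code point <= U+00FF).
    """
    if any(ord(ch) > 0xFF for ch in body):
        raise ValueError("body is not Latin-1 encodable")
    # U+0080..U+00FF: 2 bytes in UTF-8, 1 byte in Latin-1; ASCII is 1 byte in both
    return sum(1 for ch in body if ord(ch) >= 0x80)


text = "caf\xe9 \xff"
print(content_length_overcount(text))                           # 2
print(len(text.encode("utf-8")) - len(text.encode("latin-1")))  # 2
```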

Issue

The issue is that if you pass a str as the data of a request, super_len encodes the string with UTF-8 when calculating the Content-Length, but later, when the body is actually encoded in python3.9/http/client.py on line 1330, http.client uses Latin-1 (the same as ISO-8859-1).

So in the case of our simple example where we send '\xff' we have the following

>>> a = '\xff'
>>> len(a)
1
>>> len(a.encode('utf-8'))
2
>>> len(a.encode('latin-1'))
1

So we would be setting the Content-Length to 2 when we would actually be sending 1 byte of data.
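
To see why this looks like a hang rather than an immediate error, a rough local simulation helps: the receiving side trusts the header and keeps reading until it has Content-Length bytes, but one of them never arrives. The sketch below uses `socket.socketpair()` in place of a real HTTP connection, and `server_stalls` is a hypothetical name; it models only the receiver's behavior, not how requests or urllib3 are implemented.

```python
import socket


def server_stalls(body: str, timeout: float = 0.5) -> bool:
    """Return True if a reader trusting a UTF-8-based Content-Length times out.

    Models the mismatch described above: the header is sized with UTF-8,
    but the bytes on the wire are the Latin-1 encoding of the same string.
    """
    declared = len(body.encode("utf-8"))  # value the Content-Length header claims
    wire = body.encode("latin-1")         # bytes actually transmitted
    receiver, sender = socket.socketpair()
    try:
        sender.sendall(wire)
        receiver.settimeout(timeout)
        received = b""
        try:
            # Keep reading until the declared number of bytes has arrived
            while len(received) < declared:
                chunk = receiver.recv(declared - len(received))
                if not chunk:  # peer closed the connection
                    break
                received += chunk
        except socket.timeout:
            return True  # still waiting for bytes that will never come
        return False
    finally:
        receiver.close()
        sender.close()


print(server_stalls("abc"))   # False: 3 bytes declared, 3 bytes sent
print(server_stalls("\xff"))  # True: 2 bytes declared, 1 byte sent
```

A real server eventually gives up on the idle connection, which matches the RemoteDisconnected and 400 timeout responses quoted above.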

What I find interesting is that I don't think the tests added in #6589 serve a real purpose: if you sent a request like the following, your code would fail anyway, so it wouldn't matter that our Content-Length is 'correct'.

>>> requests.put('https://httpbin.org/put', data='👍👎')
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: Body ('👍👎') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
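
The argument can be summarized by sorting str bodies into three cases, assuming (as described above) that the sending side encodes str bodies with Latin-1 while the header is sized with UTF-8. The `classify_str_body` name and its labels are illustrative only.

```python
def classify_str_body(body: str) -> str:
    """Illustrative classification of a str body sent as Latin-1
    when Content-Length is computed from its UTF-8 encoding."""
    try:
        sent = len(body.encode("latin-1"))
    except UnicodeEncodeError:
        # Any code point above U+00FF: the send fails before Content-Length matters
        return "encode error"
    declared = len(body.encode("utf-8"))
    if declared == sent:
        return "ok"  # pure ASCII: both encodings agree
    return f"header too large by {declared - sent}"  # U+0080..U+00FF present


for sample in ["abc", "\xff", "\U0001F44D\U0001F44E"]:
    print(repr(sample), "->", classify_str_body(sample))
```

The three samples land in the three buckets in order. This models only the urllib3 1.26.x / http.client path; per the first comment, urllib3 2.x encodes str bodies as UTF-8, where a UTF-8-based header is correct.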

So the 'workaround' mentioned in #6586 is actually the way a request like this should be sent.

So with all that being said, I'd think the fix would be to just revert the commit that introduced this change.

What do you think?

