requests 2.32.3 & urllib3 1.26.18 issue with unicode put (requests issue, open)

frenzymadness commented on August 21, 2024

I'm building requests 2.32.3 in Fedora Linux and I have a problem with …

Comments (4)

pquentin commented on August 21, 2024

Just hit this myself when trying to release urllib3 1.26.19. urllib3 2.x made a change where string bodies are encoded as UTF-8 instead of Latin-1. It was an accidental change, and I've started working on fixing/documenting it in urllib3/urllib3#3053 and urllib3/urllib3#3063 but ultimately dropped the ball, sorry.

Then, #6589 adapted requests to work with urllib3 2.x by encoding string bodies to UTF-8 to compute the Content-Length. This means that with '\xff', requests now sets Content-Length to 2, but urllib3 1.26.x only sends one byte, which is why the test hangs. Since we're not planning to revert to Latin-1 in urllib3, the fix would be for requests to explicitly encode string bodies to UTF-8 (or to not try to guess the Content-Length, I suppose). If it does, it would be nice to avoid encoding twice, which is what happens today.
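A minimal sketch of the fix suggested above, assuming the goal is simply to encode a str body to UTF-8 once and derive Content-Length from those same bytes. `encode_body_once` is a hypothetical helper for illustration only, not part of requests' API.

```python
from typing import Tuple, Union


def encode_body_once(data: Union[str, bytes]) -> Tuple[bytes, str]:
    """Hypothetical helper (not part of requests): encode a str body to UTF-8
    exactly once and compute Content-Length from the resulting bytes."""
    body = data.encode("utf-8") if isinstance(data, str) else data
    return body, str(len(body))


body, content_length = encode_body_once("\xff")
print(body, content_length)  # b'\xc3\xbf' 2
```

Because the body handed down is then bytes, neither urllib3 1.26.x (via http.client) nor urllib3 2.x re-encodes it, so the advertised length and the bytes on the wire should agree.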


nateprewitt commented on August 21, 2024

Thanks for reporting this, @frenzymadness! I'd thought we had a standalone GHA to still test on "urllib3<2" but that's for a separate project. I'll work on getting that added to ensure we don't have regressions.

We'll need to take a closer look at what's happening, but I have a feeling this may be a byproduct of #6589. I'm wondering if we're sending a Content-Length 1 byte longer than what we're actually emitting. I was surprised, when that issue was opened, that we hadn't hit this problem before, but there may be some subtle variance between the two major versions that was overlooked.


frenzymadness commented on August 21, 2024

I took the code from test_content_length_for_string_data_counts_bytes and it seems to work fine:

>>> import requests
>>> data = "This is a string containing multi-byte UTF-8 ☃"
>>> length = str(len(data.encode("utf-8")))
>>> req = requests.Request("POST", "http://foo.bar/post", data=data)
>>> p = req.prepare()
>>> p.headers["Content-Length"]
'51'
>>> length
'51'

And for the data from test_unicode_header_name:

>>> data = "\xff"
>>> length = str(len(data.encode("utf-8")))
>>> req = requests.Request("POST", "http://foo.bar/post", data=data)
>>> p = req.prepare()
>>> p.headers["Content-Length"]
'2'
>>> length
'2'


danilom-git commented on August 21, 2024

Hi all,

I was also facing a similar issue like @frenzymadness, and I can confirm that it is caused by #6589. I'm not sure whether I should continue the conversation here or at #6589, but I'll start off here.

Intro

First off I want to mention that when you send the request

requests.put('https://httpbin.org/put', headers={'Content-Type': 'application/octet-stream'}, data='\xff')

it doesn't actually hang; it is waiting for a response from the server, and after a while the code fails with

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

A similar thing happened with the server I was communicating with, except that it actually sent a response, something like

400: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.

I was sending a str with non-ASCII characters (same as @frenzymadness), and it turned out that the request had an incorrect Content-Length header, something like 545 instead of the correct 543.

Issue

The issue is that if you pass a str as the data of a request, super_len encodes the string with UTF-8 when calculating the Content-Length, but later on, when the body is actually encoded in python3.9/http/client.py (line 1330), it uses Latin-1 (same as ISO-8859-1).
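The Latin-1 step can be observed without a network connection. This is a sketch under explicit assumptions: `RecordingSocket` and `capture_request` are hypothetical helpers, and assigning `conn.sock` by hand relies on CPython's http.client writing through that attribute, which is an implementation detail rather than documented API. It advertises the UTF-8 length, as requests 2.32.3 does, and records the bytes http.client actually writes.

```python
import http.client


class RecordingSocket:
    """Hypothetical stand-in for a socket: records bytes instead of sending them."""

    def __init__(self) -> None:
        self.sent = bytearray()

    def sendall(self, data) -> None:
        self.sent += data

    def close(self) -> None:
        pass


def capture_request(body: str, content_length: str) -> bytes:
    """Return the raw bytes http.client would write for a PUT with a str body."""
    conn = http.client.HTTPConnection("example.invalid")
    sock = RecordingSocket()
    conn.sock = sock  # pre-set so http.client never opens a real connection
    conn.request("PUT", "/put", body=body,
                 headers={"Content-Length": content_length})
    return bytes(sock.sent)


data = "\xff"
advertised = str(len(data.encode("utf-8")))  # UTF-8 length, as after #6589
raw = capture_request(data, advertised)
head, _, sent_body = raw.partition(b"\r\n\r\n")
print(b"Content-Length: 2" in head)  # True
print(sent_body, len(sent_body))     # b'\xff' 1
```

With the header claiming 2 bytes and only 1 byte written, a server keeps waiting for the missing byte, which matches the timeout behaviour described in the Intro.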

So in the case of our simple example where we send '\xff' we have the following

>>> a = '\xff'
>>> len(a)
1
>>> len(a.encode('utf-8'))
2
>>> len(a.encode('latin-1'))
1

So we would be setting the Content-Length to 2 when we would actually be sending 1 byte of data.
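Generalizing the '\xff' case: for text that Latin-1 can encode, each character in the range U+0080 to U+00FF takes two bytes in UTF-8 but one in Latin-1, so the advertised Content-Length overshoots the bytes actually sent by exactly the number of such characters. A gap of two bytes, like the 545 vs 543 reported above, would correspond to two such characters. The helper names and the sample string below are made up for illustration, since the original request body isn't shown.

```python
def content_length_overshoot(text: str) -> int:
    """Bytes by which a UTF-8-derived Content-Length exceeds the Latin-1 body
    http.client sends. Raises UnicodeEncodeError for non-Latin-1 text."""
    return len(text.encode("utf-8")) - len(text.encode("latin-1"))


def count_upper_latin1(text: str) -> int:
    """Count characters in U+0080..U+00FF (two bytes in UTF-8, one in Latin-1)."""
    return sum(1 for ch in text if 0x80 <= ord(ch) <= 0xFF)


sample = "caf\xe9 na\xefve"  # hypothetical body: "café naïve"
print(content_length_overshoot(sample), count_upper_latin1(sample))  # 2 2
```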

What I find interesting is that I don't think the tests added in #6589 serve a real purpose, since if you send a request like the following, your code fails, and it doesn't matter that our Content-Length is 'correct':

>>> requests.put('https://httpbin.org/put', data='👍👎')
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: Body ('👍👎') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

So the 'workaround' mentioned in #6586 is actually the way a request like this should be sent.
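The error message itself points at explicit encoding. A minimal standard-library sketch of both halves (no network): a str that Latin-1 can't represent fails, while encoding to UTF-8 up front yields bytes whose length is what Content-Length should be. Passing those bytes as `data=` to requests is the assumed usage; the exact code of the workaround in #6586 is not shown in this thread.

```python
data = "👍👎"

try:
    data.encode("latin-1")  # the encoding http.client applies to str bodies
except UnicodeEncodeError as exc:
    print("latin-1 fails:", exc.reason)  # latin-1 fails: ordinal not in range(256)

body = data.encode("utf-8")  # explicit encoding, as the error message suggests
print(len(body))  # 8
```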

So with all that being said, I'd think the fix would be to just revert the commit that introduced this change.

What do you think?

