
The Content-Length header for string `data` counts Unicode characters in the string when it should count encoded bytes

bruceadams commented on July 20, 2024

The Content-Length header for string `data` counts Unicode characters in the string when it should count encoded bytes

from requests.

Comments (6)

goelbenj commented on July 20, 2024

> I do not understand what you are saying. What hack?

Ha, looks like we reached the same conclusion here. What I meant by the "hack" was requiring the user to encode their string data as UTF-8 so that the Content-Length header is initialized correctly.


goelbenj commented on July 20, 2024

I assume that it is incorrect to require the data to be encoded as UTF-8, so I will work on a fix that removes the need for this hack. @bruceadams


bruceadams commented on July 20, 2024

> I assume that it is incorrect to require the data to be encoded as UTF-8, so I will work on a fix that removes the need for this hack. @bruceadams

I do not understand what you are saying. What hack?

A Python string can contain Unicode characters. To send a Python string as the body of an HTTP request, the string needs to be encoded into bytes. UTF-8 is a common encoding (and I see signs of UTF-8 being assumed elsewhere in the Requests code). In the behavior I saw in the wild, Requests did, in fact, encode the request body as UTF-8.
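The gap between the two counts is easy to reproduce with the standard library alone. This is a minimal sketch, not the original call from the report: the sample string is an assumption chosen only because it mixes ASCII and non-ASCII characters.

```python
# Illustrative only: this sample string is an assumption, not from the report.
body = "na\u00efve caf\u00e9 \u2615"  # "naïve café ☕"

# len() on a str counts Unicode code points.
char_count = len(body)

# The request body goes over the wire as bytes, so Content-Length
# has to match the length of the encoded form.
byte_count = len(body.encode("utf-8"))

print(char_count, byte_count)  # prints "12 16"
```

A Content-Length of 12 for this body would be four bytes short of what is actually sent.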


bruceadams commented on July 20, 2024

Ah! Your pull request lines up with how I thought this might be properly addressed! Nice! (I just created a similar pull request #6589.)


numblr commented on July 20, 2024

Can I fix this by downgrading to a previous version? I don't want to (and some users probably cannot) change the code to convert to bytes before passing it to the request.

Also, I don't really get your fixes: the body is converted to bytes at some point (there is a `body_to_chunks` in request.py), and that code also seems to set the Content-Length header. But that is just a side note. I'm not familiar with the code, so ignore it if I'm talking nonsense.


sigmavirus24 commented on July 20, 2024

Bytes are the language of the Internet regardless of whether you think that. Many things try to paper over that. The right thing is typically to send bytes whose encoding you already know, but barring that, we should always be dealing with bytes internally. Now that we have dropped 2.7 support, I'd support always encoding `data` parameters that are strs to bytes before doing anything else with them internally (e.g., calculating content length).
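The "encode first, then measure" idea can be sketched roughly as follows. This is not the Requests implementation: `encode_body` and `content_length` are hypothetical names, and the UTF-8 default is an assumption based on the behavior described earlier in the thread.

```python
def encode_body(data, encoding="utf-8"):
    """Hypothetical helper: return the body as bytes.

    str input is encoded first (UTF-8 is an assumed default);
    bytes input is passed through unchanged.
    """
    if isinstance(data, str):
        return data.encode(encoding)
    return data


def content_length(data):
    """Hypothetical helper: Content-Length value for a str or bytes body."""
    # Measure the encoded bytes, never the original str.
    return str(len(encode_body(data)))


print(content_length("caf\u00e9"))  # 5 bytes for a 4-character string
print(content_length(b"abc"))       # bytes are counted as-is
```

Normalizing once at the boundary means every later step, including the length calculation, sees the same bytes that are sent.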

