Filing this for tracking purposes - we may decide to immediately close. I thought

Add do not flush flag to encode API, accept DOMString (encoding issue, closed)

whatwg commented on May 22, 2024

Add do not flush flag to encode API, accept DOMString

Comments (3)

annevk commented on May 22, 2024

Note that this would require some special logic in the API since all the underlying algorithms operate on scalar values.
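
To make the point about scalar values concrete, here is a minimal sketch of what happens today when a caller splits a string in the middle of a surrogate pair and encodes the halves separately. It assumes a runtime that provides the standard `TextEncoder` global; the `toHex` helper is hypothetical and exists only for display.

```typescript
// TextEncoder.encode() converts its input to scalar values first,
// so a lone surrogate is encoded as U+FFFD (bytes EF BF BD).
const encoder = new TextEncoder();

// Hypothetical display helper: formats bytes as lowercase hex.
function toHex(bytes: Uint8Array): string {
  const parts: string[] = [];
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    parts.push((b < 16 ? "0" : "") + b.toString(16));
  }
  return parts.join(" ");
}

// "a", U+1F600 (surrogate pair D83D DE00), "b"
const text = "a\uD83D\uDE00b";

// Encoding the whole string keeps the astral character intact.
const whole = encoder.encode(text);
console.log(toHex(whole)); // 61 f0 9f 98 80 62

// Splitting at code unit index 2 separates the surrogate pair,
// and each half is replaced with U+FFFD independently.
const left = encoder.encode(text.slice(0, 2));
const right = encoder.encode(text.slice(2));
console.log(toHex(left)); // 61 ef bf bd
console.log(toHex(right)); // ef bf bd 62
```

A "do not flush" flag would have to carry the pending high surrogate from one call to the next, which is state the current API does not keep.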

hsivonen commented on May 22, 2024

I think requiring the caller to pass data in chunks of valid UTF-16 is a feature and not a bug.

As you note, it allows us not to have a separate streaming mode on the encoder side (thanks to ISO-2022-JP not being supported on the TextEncoder side).
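
As an illustration of passing data in chunks of valid UTF-16, the following sketch shows one way a caller might split a long string so that no chunk boundary falls inside a surrogate pair. The function names `splitIntoValidChunks` and `encodeInChunks` and the `maxUnits` parameter are hypothetical, not part of any API; only `TextEncoder` is standard.

```typescript
// Split `text` into chunks of at most `maxUnits` UTF-16 code units,
// never cutting between a high surrogate and its low surrogate.
function splitIntoValidChunks(text: string, maxUnits: number): string[] {
  if (maxUnits < 2) {
    throw new RangeError("maxUnits must be at least 2 to fit a surrogate pair");
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxUnits, text.length);
    if (end < text.length) {
      const last = text.charCodeAt(end - 1);
      const next = text.charCodeAt(end);
      const splitsPair =
        last >= 0xd800 && last <= 0xdbff && next >= 0xdc00 && next <= 0xdfff;
      // Move the boundary back one unit so the pair stays together.
      if (splitsPair) end -= 1;
    }
    chunks.push(text.slice(start, end));
    start = end;
  }
  return chunks;
}

// Encode each chunk independently and concatenate the results.
function encodeInChunks(text: string, maxUnits: number): Uint8Array {
  const encoder = new TextEncoder();
  const parts: Uint8Array[] = [];
  let total = 0;
  const chunks = splitIntoValidChunks(text, maxUnits);
  for (let i = 0; i < chunks.length; i++) {
    const bytes = encoder.encode(chunks[i]);
    parts.push(bytes);
    total += bytes.length;
  }
  const out = new Uint8Array(total);
  let offset = 0;
  for (let i = 0; i < parts.length; i++) {
    out.set(parts[i], offset);
    offset += parts[i].length;
  }
  return out;
}
```

Because every chunk is self-contained, each `encode()` call is independent and the encoder needs no streaming state.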

Additionally, accommodating strings that are not self-contained valid UTF-16 strings would be a step backwards in terms of steering the Web Platform in a direction that would allow browsers to use UTF-8 strings internally (except in the JS engine when a program manipulates a string by 16-bit units). Some years ago when I argued for document.write to take 16-bit code units instead of valid UTF-16 strings, getting rid of UTF-16 as an internal representation seemed hopeless. However, Servo gives me hope that we might be able to fix the design error of using UTF-16 as the browser-internal memory representation and use UTF-8 in the future. The least we can do on the spec side is to avoid adding new places that expose the internal memory representation of Unicode strings.

Furthermore, I have recently worked on a decoder that tries to fill char16_t output buffers fully, even if that means an astral character gets split across a buffer boundary, and on an encoder that tries to work properly in the face of invalid input (as if unpaired surrogates had been replaced with U+FFFD in the input). That experience has made me especially appreciate Rust's notion of making UTF-8 validity guarantees part of the core notion of safety of the language itself. To the extent we are stuck with using UTF-16 as the browser-internal representation, I think we would benefit from enforcing UTF-16 validity at the boundary between the JS engine and the rest of the browser, so that non-JS-engine code can be written with the assumption that UTF-16 sequences are always valid (as opposed to sprinkling unpaired surrogate handling all over the code base).
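
The idea of enforcing validity once at a boundary can be sketched as a single sanitizing step, after which downstream code may assume well-formed UTF-16. The function name `replaceUnpairedSurrogates` is hypothetical; this is an illustration of the approach, not how any browser implements it.

```typescript
// Replace every unpaired surrogate with U+FFFD, leaving valid pairs intact.
function replaceUnpairedSurrogates(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh && i + 1 < input.length) {
      const next = input.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Valid pair: copy both code units and skip the low surrogate.
        out += input.charAt(i) + input.charAt(i + 1);
        i++;
        continue;
      }
    }
    // Lone high or lone low surrogate becomes U+FFFD; anything else is copied.
    out += isHigh || isLow ? "\uFFFD" : input.charAt(i);
  }
  return out;
}

// One lone high surrogate, one lone low surrogate, then one valid pair.
const dirty = "a\uD83Db\uDE00\uD83D\uDE00";
const clean = replaceUnpairedSurrogates(dirty);
console.log(clean === "a\uFFFDb\uFFFD\uD83D\uDE00"); // true
```

Runtimes that implement ECMAScript 2024 expose the same operation natively as `String.prototype.toWellFormed()`, with `isWellFormed()` as the corresponding check.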

For these reasons, I think we should close this as "won't fix".

annevk commented on May 22, 2024

I agree and since @inexorabletash wasn't sure either, closing.
