Comments (3)
Note that this would require some special logic in the API since all the underlying algorithms operate on scalar values.
from encoding.
I think requiring the caller to pass data in chunks of valid UTF-16 is a feature and not a bug.
As you note, it allows us not to have a separate streaming mode on the encoder side (thanks to ISO-2022-JP not being supported on the TextEncoder side).
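To illustrate what "chunks of valid UTF-16" asks of a caller, here is a minimal sketch. The helper names `splitAtScalarBoundaries` and `encodeInChunks` are hypothetical and not part of the Encoding Standard; the only platform API assumed is the existing `TextEncoder.encode()`. The idea is that a caller who cuts chunks only at scalar value boundaries can use the plain, non-streaming encoder and still get the same bytes as encoding the whole string at once.

```typescript
// Hypothetical helper (not a platform API): split a string into chunks of at
// most maxUnits UTF-16 code units without separating a surrogate pair.
function splitAtScalarBoundaries(input: string, maxUnits: number): string[] {
  if (maxUnits < 2) {
    throw new RangeError("maxUnits must be at least 2 to fit a surrogate pair");
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < input.length) {
    let end = Math.min(start + maxUnits, input.length);
    if (end < input.length) {
      const last = input.charCodeAt(end - 1);
      const next = input.charCodeAt(end);
      const lastIsHigh = last >= 0xd800 && last <= 0xdbff;
      const nextIsLow = next >= 0xdc00 && next <= 0xdfff;
      // Back off by one unit so the pair stays together in the next chunk.
      if (lastIsHigh && nextIsLow) {
        end -= 1;
      }
    }
    chunks.push(input.slice(start, end));
    start = end;
  }
  return chunks;
}

// Hypothetical helper: encode chunk by chunk with a stateless TextEncoder and
// concatenate the resulting byte arrays.
function encodeInChunks(input: string, maxUnits: number): Uint8Array {
  const encoder = new TextEncoder();
  const parts = splitAtScalarBoundaries(input, maxUnits).map((chunk) =>
    encoder.encode(chunk)
  );
  const total = parts.reduce((sum, part) => sum + part.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}
```

With this approach the encoder needs no state between calls, which is the property the comment above relies on. A naive `input.slice(0, n)` split, by contrast, could hand the encoder half of a surrogate pair.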
Additionally, accommodating strings that are not self-contained valid UTF-16 strings would be a step backwards in terms of steering the Web Platform in a direction that would allow browsers to use UTF-8 strings internally (except in the JS engine when a program manipulates a string by 16-bit units). Some years ago when I argued for document.write to take 16-bit code units instead of valid UTF-16 strings, getting rid of UTF-16 as an internal representation seemed hopeless. However, Servo gives me hope that we might be able to fix the design error of using UTF-16 as the browser-internal memory representation and use UTF-8 in the future. The least we can do on the spec side is to avoid adding new places that expose the internal memory representation of Unicode strings.
Furthermore, having recently worked on a decoder that tries to fill char16_t output buffers fully even if it means that an astral character gets split across a buffer boundary and having worked on an encoder that tries to work properly (as if unpaired surrogates had been replaced with U+FFFD in the input) in the face of invalid input, I've come to especially appreciate Rust's notion of making UTF-8 validity guarantees part of the core notion of safety of the language itself. To the extent we are stuck with using UTF-16 as the browser-internal representation, I think we would benefit from enforcing UTF-16 validity at the boundary between the JS engine and the rest of the browser in order to be able to write non-JS engine code with the assumption that UTF-16 sequences are always valid. (As opposed to sprinkling unpaired surrogate handling all over the code base.)
For these reasons, I think we should close this as "won't fix".
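As a rough illustration of validating UTF-16 at a boundary, the sketch below checks a string for unpaired surrogates before it is handed to other code. The function names `firstUnpairedSurrogate` and `assertValidUtf16` are hypothetical and only stand in for whatever check an engine boundary might perform; this is not how any particular browser implements it.

```typescript
// Hypothetical helper: return the index of the first unpaired surrogate code
// unit in s, or -1 if s is a valid UTF-16 string.
function firstUnpairedSurrogate(s: string): number {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: valid only if immediately followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // Skip the low half of a well-formed pair.
        continue;
      }
      return i;
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return i;
    }
  }
  return -1;
}

// Hypothetical boundary check: reject invalid UTF-16 once, so that code behind
// the boundary can assume every string it sees is valid.
function assertValidUtf16(s: string): string {
  const index = firstUnpairedSurrogate(s);
  if (index !== -1) {
    throw new TypeError(`unpaired surrogate at code unit ${index}`);
  }
  return s;
}
```

For comparison, the existing `TextEncoder.encode()` does not throw on such input: it takes a USVString, so `new TextEncoder().encode("\uD83D")` produces the three bytes `EF BF BD`, the UTF-8 encoding of U+FFFD.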
I agree, and since @inexorabletash wasn't sure either, closing.