String / byte sequence instance manipulation (infra, 18 comments, open)

whatwg commented on June 5, 2024

String / byte sequence instance manipulation

from infra.

Comments (18)

domenic commented on June 5, 2024

Agreed we should probably just state that strings and byte sequences are mutable.

Mutable strings are very annoying in programming languages, but I don't think there's much of a problem with them in specs. Maybe we should try to solicit wider opinions though, especially since I can't remember why they're so bad in programming languages.

domenic commented on June 5, 2024

I guess one downside is that if we have an actual JavaScript string coming from a JavaScript program and we treat it as mutable, that is nonsensical.

adanilo commented on June 5, 2024

In statically compiled languages, string literals are typically immutable: if you include a given string in multiple source files, the linker can merge them into a single copy in the final binary, and they also end up in the read-only data section of the executable. It makes sense to state that strings and byte sequences are mutable for JS.

annevk commented on June 5, 2024

Strings are not mutable in JS. This is about what we do in standards with strings.

But that does bring up an interesting point: if JavaScript strings (as defined by Infra in due course) become mutable, does that mean IDL always needs to copy? It might be better if we match JavaScript after all...

esprehn commented on June 5, 2024

How is this web observable? The native string type used inside Blink and WebKit is immutable, and so is the JS string. As long as specs never expose the mutability through object identity, I'm not sure it matters what's in the spec, though it doesn't really match how many implementations work.

@bzbarsky

domenic commented on June 5, 2024

Repeating some discussion Elliott and I had offline:

This is not about anything web observable really. It's about whether we write our specs as "lowercase x" or "set x to the result of lowercasing x". Most specs seem to do the former. The question at hand is whether we should explicitly state that spec-strings are mutable so things like that work, or if we should try to move the spec ecosystem away from it and toward the latter style.
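The two styles can be contrasted in a small Python sketch. As an assumption for illustration only, a mutable spec string is modeled as a list of one-character strings and an immutable one as a plain `str`; both function names are hypothetical. They only map ASCII A-Z to lowercase, which is why they avoid Python's Unicode-aware `str.lower()`.

```python
def ascii_lowercase_in_place(s: list[str]) -> None:
    """'ASCII lowercase x': mutates the list x itself (hypothetical helper)."""
    for i, ch in enumerate(s):
        if "A" <= ch <= "Z":
            s[i] = chr(ord(ch) + 0x20)


def ascii_lowercase(s: str) -> str:
    """'Set x to the result of ASCII lowercasing x': returns a new string."""
    return "".join(chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch for ch in s)


# Style 1: the string is a mutable list, changed in place.
x = list("Hello")
ascii_lowercase_in_place(x)

# Style 2: the string is immutable; the variable is rebound to a new value.
y = "Hello"
y = ascii_lowercase(y)
```

Both end with the value "hello"; the difference is only whether the existing value was modified or replaced.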

Besides Elliott's point about Blink and WebKit using immutable string types, and how this means a mutable string type in specs makes spec <-> implementation translation harder, he reminded me why mutable strings in programming languages are scary: they can result in spooky action at a distance. That is, you could pass a string down through many algorithms, then one of them mutates it, and all the others are now affected. That's pretty bad.
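The "spooky action at a distance" problem can be reproduced in a few lines of Python, again assuming (for illustration) that a mutable spec string is a list of characters. The algorithms `remember` and `normalize` are hypothetical: one stores the string it was given, the other lowercases its argument in place, and the stored entry changes even though `remember` never touched it again.

```python
def ascii_lowercase_in_place(s: list[str]) -> None:
    for i, ch in enumerate(s):
        if "A" <= ch <= "Z":
            s[i] = chr(ord(ch) + 0x20)


def remember(registry: list[list[str]], key: list[str]) -> None:
    # Stores a reference to the caller's mutable string, not a copy.
    registry.append(key)


def normalize(key: list[str]) -> None:
    # An unrelated algorithm that lowercases its argument in place.
    ascii_lowercase_in_place(key)


registry: list[list[str]] = []
key = list("Content-Type")
remember(registry, key)
normalize(key)
# The entry stored by remember() changed as well: it is the same list object.
stored = "".join(registry[0])
```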

bzbarsky commented on June 5, 2024

Right, passing mutable references around should be done very carefully. There's a difference between that and having a mutable reference that's tightly scoped.

In any case, I don't have a strong opinion about whether we should allow the "lowercase x" thing. Either way, the "set y to be x" pattern has gotchas that people need to watch out for: lowercasing x may or may not cause y to also be lowercase, depending on how it's done. And if there is no aliasing, then you can't tell in-place lowercasing apart from copying lowercasing...
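A Python sketch of both gotchas, modeling a byte sequence as a `bytearray` (an assumption for illustration). `bytearray.lower()` is a real method that returns a copy with only ASCII A-Z lowercased; `byte_lowercase_in_place` is a hypothetical helper that writes that copy back into the same object.

```python
def byte_lowercase_in_place(s: bytearray) -> None:
    # bytearray.lower() returns a copy with only ASCII A-Z lowercased;
    # slice assignment then writes it back into the same object.
    s[:] = s.lower()


# In-place lowercasing: y aliases x, so y observes the change.
x = bytearray(b"ABC")
y = x
byte_lowercase_in_place(x)
in_place_y = bytes(y)

# Copying lowercasing: x is rebound to a new object, so y is unaffected.
x = bytearray(b"ABC")
y = x
x = x.lower()
copying_y = bytes(y)

# Without the alias y, both approaches leave the same value behind.
a = bytearray(b"ABC")
byte_lowercase_in_place(a)
b = bytearray(b"ABC").lower()
```

Here `in_place_y` ends up as `b"abc"` while `copying_y` stays `b"ABC"`, yet `a` and `b` are indistinguishable.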

annevk commented on June 5, 2024

It's observable in the sense that it defies logic, depending on how IDL is defined. We've already established that certain objects pass through IDL, so any JavaScript references to them can observe changes that happen in the specification algorithm for the IDL method or attribute, such as detaching an ArrayBuffer object.

Now, if I define a method that takes a DOMString x and whose steps are "ASCII lowercase x", then having mutable strings on the inside and immutable strings on the outside, without IDL copying the input, would result in some kind of logic error.

bzbarsky commented on June 5, 2024

I think there is a strong implication, which we should perhaps make explicit, that https://heycam.github.io/webidl/#es-to-DOMString and friends copy.
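If the conversion does copy, an in-place "ASCII lowercase x" only ever touches the algorithm's private copy. A minimal Python sketch of that model, where the caller's JavaScript string is an immutable `str` and the spec string is a list of characters; `convert_to_dom_string` and `lowercase_method` are hypothetical names, not Web IDL's actual algorithms.

```python
def convert_to_dom_string(js_value: str) -> list[str]:
    """Hypothetical IDL conversion: copies the caller's immutable string into a
    fresh mutable spec string, so later in-place edits stay local."""
    return list(js_value)


def lowercase_method(js_value: str) -> str:
    """Hypothetical method whose steps say "ASCII lowercase x"."""
    x = convert_to_dom_string(js_value)  # the copy happens at the boundary
    for i, ch in enumerate(x):           # "ASCII lowercase x", in place
        if "A" <= ch <= "Z":
            x[i] = chr(ord(ch) + 0x20)
    return "".join(x)                    # converted back to an immutable value


original = "Hello"
result = lowercase_method(original)
```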

annevk commented on June 5, 2024

@bzbarsky if that's acceptable that would certainly make things easier as we wouldn't have to change much (unless mutable strings are a problem waiting to happen), but it feels a bit like cheating.

That doesn't discount the potential for confusion of course, but we have embraced other subtle differences from JavaScript and as long as everything is defined in detail I'm okay with that.

annevk commented on June 5, 2024

It would be great if everyone here could leave a short reply that is one of these:

Mutable (aka please make copying at the IDL boundary explicit and be done with it)
Immutable (aka please stick to the same constraints as programmers have to and don't encourage unnecessary copying)

That would unblock changes to Infra and IDL (which should start adopting the various types defined by Infra). Thanks!

zcorpan commented on June 5, 2024

So for example in https://infra.spec.whatwg.org/#collect-a-sequence-of-code-points

this

Append that code point to the end of /result/.

would need to be something like

Set /result/ to /result/ concatenated with that code point.
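Both phrasings can be sketched in Python to show they compute the same thing. Infra's algorithm advances a position variable owned by the caller; as an assumption, this sketch instead returns the new position alongside the result. Python indexes `str` by code point, and the function names are hypothetical.

```python
from typing import Callable


def collect_append(
    condition: Callable[[str], bool], text: str, position: int
) -> tuple[str, int]:
    """'Append that code point to the end of result' (mutable style)."""
    result: list[str] = []
    while position < len(text) and condition(text[position]):
        result.append(text[position])
        position += 1
    return "".join(result), position


def collect_concat(
    condition: Callable[[str], bool], text: str, position: int
) -> tuple[str, int]:
    """'Set result to result concatenated with that code point' (immutable style)."""
    result = ""
    while position < len(text) and condition(text[position]):
        result = result + text[position]
        position += 1
    return result, position
```

For example, `collect_append(str.isdigit, "123abc", 0)` returns `("123", 3)`, and `collect_concat` gives the same answer.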

annevk commented on June 5, 2024

That is what immutable would end up requiring yes. (I've since found lots of places in the URL Standard that assume mutable strings and basically treat strings like lists, with appending and prepending being available. So personally I'm leaning towards mutable, even though immutable does seem cleaner.)

zcorpan commented on June 5, 2024

OK. Immutable does seem cleaner in that the spec will more closely map to an implementation, which seems like it would be easier to reason about. OTOH, I'm not aware of any cases where pretending strings are mutable in specs has caused bugs or problems.

annevk commented on June 5, 2024

Feedback from smaug---- (intentionally not used @): "the reason mutable [in specs] is fine, IMO, is that it probably makes specs easier to read".

domenic commented on June 5, 2024

We have IMO three options:

(1) Mutable strings
(2) Immutable strings
(3) Something subtle where we say that within an algorithm strings are mutable, but when you pass them to another algorithm, a copy is made. (Or, when you receive them, implicitly at the top of your algorithm?) Thus changes in the original algorithm do not propagate to others.

In practice my guess is that (3) matches existing specs and people's intuitions. We'd have to hand-wave in Infra about "pass to another algorithm", perhaps defining that later if we want (e.g. as part of #92).

(1) might be simpler than (3) if we do a survey of specs and find that in fact strings are never updated after being handed off to other algorithms. I guess the main thing to look for would be "in parallel" algorithms operating on the same string? Or passing in e.g. keys from a map somewhere.

(1) can later be changed to (3) in a "non-breaking" fashion if we discover that it's a problem in practice.

If we do (3) we may want to add an explicit note about the parallel between this system and C++'s std::string + const std::string&. Not sure though.

(2)'s only downside is extra spec verbosity, but in a way that is familiar for programmers, so I am not sure it is that bad.

So in conclusion I am OK with any of these. I did want to point out (3) as an explicit option though.

bzbarsky commented on June 5, 2024

I rather like (3), actually. It's basically the "mutable stringbuilder, immutable string" model, but with implicit coercions between them...
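One possible reading of option (3) as a "mutable stringbuilder, immutable string" model, sketched in Python: the builder is a list of characters, the value handed between algorithms is an immutable `str`, and the receiving algorithm takes its own mutable copy on entry. The names `caller` and `callee` are hypothetical, and where exactly the copy happens (on passing or on receipt) is the open question from the option (3) description.

```python
def callee(value: str) -> str:
    # On receipt, the algorithm takes its own mutable copy (the "builder").
    builder = list(value)
    builder.append("!")
    return "".join(builder)


def caller() -> tuple[str, str, str]:
    builder = list("ab")           # mutable while inside this algorithm
    passed = "".join(builder)      # coerced to an immutable value when passed on
    returned = callee(passed)
    builder.append("c")            # later local edits cannot reach the callee
    return "".join(builder), passed, returned
```

Calling `caller()` yields `("abc", "ab", "ab!")`: each algorithm's edits stay local.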

jyasskin commented on June 5, 2024

I'd like to vote for @domenic's (3). However, the interaction with #139 is interesting, in that it's definitely right for a variable holding a Document or an object to be an alias, and it seems confusing for an algorithm parameter to be different from inlining the algorithm and storing that parameter as a variable.

As @annevk suggested in #139, maybe this is the difference between value and reference types, like how WebIDL says some types are "always passed by value". Strings would be value types, so variables and parameters holding them would copy, unless the specification explicitly says to make a reference ("Let path be a reference to foo.path"). Then I think we'd want to use Rust's rule that you can mutate a string as long as you have the only reference to it.
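The value-type versus explicit-reference distinction can be sketched in Python, assuming (for illustration) that a spec string is a mutable list of characters and that "let path be foo.path" means an implicit copy. The `Foo` class is hypothetical. Python cannot enforce Rust's rule that mutation requires the only reference; the sketch shows only copying versus aliasing.

```python
from dataclasses import dataclass, field


@dataclass
class Foo:
    # Hypothetical object with a string-valued field, modeled as a mutable list.
    path: list[str] = field(default_factory=list)


foo = Foo(list("/a"))

# "Let path be foo.path": strings as value types, so the variable gets a copy.
path_value = list(foo.path)
path_value.extend("/x")

# "Let path be a reference to foo.path": an explicit alias to the same string.
path_ref = foo.path
path_ref.extend("/b")
```

Afterwards `foo.path` reads "/a/b" (changed through the reference), while the copy reads "/a/x" and left `foo` untouched.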
