Comments (14)
To illustrate using JSDOM:
const urlRecord = parseURL("data:space #hello");
const serialized = serializeURL(urlRecord, /* excludeFragment: */ true);
const reparsed = parseURL(serialized);
const serializedAgain = serializeURL(reparsed, /* excludeFragment: */ true);
assert.strictEqual(serialized, serializedAgain); // Fails
assert.strictEqual(received, expected)
Expected value to strictly be equal to:
"data:space"
Received:
"data:space "
from url.
Nice catch!
We could also try to encode a trailing space, maybe. Or maybe we already considered and discarded that last time for a good reason?
Within a browser, serializing without a fragment is mainly done for networking-related code (and thus https:), but data: URLs could be impacted I suppose.
from url.
I don't see any discussion last time about escaping trailing spaces. I suggested escaping all spaces, but that was considered likely incompatible with the web. I think it should be fine.
That said, I would suggest that we escape all trailing spaces (not just the final trailing space). I can't imagine that escaping just the final space would be any better for compatibility.
from url.
@achristensen07 @valenting @hayatoito thoughts on how to best tackle this? Given the discussion so far I see these options:
- Always replace a trailing space in an opaque path with %20.
- Replace a trailing space with %20 when it becomes problematic, such as when serializing or using one of the setters. (This would also change setter behavior away from removing trailing spaces.)
- Only replace a trailing space with %20 when serializing, but keep removing spaces when using the setters.
And then variants of 1/2/3 where we instead replace all trailing spaces with %20, let's call those 1b/2b/3b.
I like the simplicity of 1 personally.
from url.
I'd prefer to just escape only the last trailing space. It's simpler and the algorithm would be faster.
So I don't see any advantage in escaping all the trailing spaces, or am I wrong?
So, I think there is probably broad agreement that unescaped spaces are not ideal. Unfortunately, they are required for web compatibility, so we have to allow them as much as possible.
As part of this change, one or more trailing spaces that would previously have been unescaped would now be escaped. That's a change which has the potential to break some applications, but it is equally breaking whether we escape one trailing space or all trailing spaces. There is no compatibility advantage to only escaping a single space - the same applications would break either way, and the remedy would be the same.
I can't think of another time where we escape only a single occurrence of a particular character and leave all other instances of that character in the same URL component unescaped; generally it's always "code point X is escaped in component Y". So both possibilities add not-insignificant complexity. But if I were a developer working with these kinds of URLs, I think it would be overall simpler and more predictable if the parser escaped all of those trailing spaces for me at once. Then I could do my processing (perhaps removing some of those trailing spaces), and I wouldn't keep seeing the parser add escaping for me all the time.
As for performance? The difference is negligible. We're talking about an edge case of an edge case of an edge case (multiple occurrences of unescaped trailing spaces in a URL with opaque path); the important thing is that the more common scenarios can be fast-pathed, and they can.
from url.
@karwa however, the current proposal gives "data:text/html,blah blah%20" for "data:text/html,blah blah " so it's not quite correct that all spaces end up replaced. So we might as well do what is simpler.
from url.
Right. The question is only about multiple trailing unescaped spaces. If the source only contains a single trailing space to begin with, the result would be the same.
| | Result |
|---|---|
| Source (two trailing spaces) | `data:blah blah  ?q` |
| Escape single trailing | `data:blah blah %20?q` |
| Escape all trailing | `data:blah blah%20%20?q` |
I think escaping all trailing spaces leads to a simpler and more predictable outcome (and fewer unescaped spaces, even if we can't escape all of them) -- but again, it's an edge case of an edge case of an edge case, so it's not extremely important to me.
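To make the two rows of the table concrete, here is a minimal sketch of both strategies applied to an opaque path. The function names are hypothetical and do not come from the URL Standard or any implementation; the sketch only assumes the path has already been separated from the query and fragment.

```javascript
// Hypothetical helpers for illustration only; not taken from the URL Standard.

// "Escape single trailing": percent-encode only the final trailing space.
function escapeLastTrailingSpace(opaquePath) {
  // Fast path: most opaque paths do not end in a space.
  if (!opaquePath.endsWith(" ")) return opaquePath;
  return opaquePath.slice(0, -1) + "%20";
}

// "Escape all trailing": percent-encode every space in the trailing run.
function escapeAllTrailingSpaces(opaquePath) {
  // Same fast path as above.
  if (!opaquePath.endsWith(" ")) return opaquePath;
  let end = opaquePath.length;
  while (end > 0 && opaquePath[end - 1] === " ") end--;
  return opaquePath.slice(0, end) + "%20".repeat(opaquePath.length - end);
}

const source = "blah blah  "; // two trailing spaces, as in the table's Source row
console.log(`data:${escapeLastTrailingSpace(source)}?q`); // data:blah blah %20?q
console.log(`data:${escapeAllTrailingSpaces(source)}?q`); // data:blah blah%20%20?q
```

Both variants share the same fast path, and they produce identical output whenever the source has at most one trailing space. Interior spaces are left alone in both cases.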
from url.
https://software.hixie.ch/utilities/js/live-dom-viewer/saved/12022 demonstrates this issue using XMLHttpRequest. Gecko strips trailing spaces and Chromium/WebKit do not (none escape). Perhaps this lack of interoperability suggests we have some freedom here to do something better?
@achristensen07 @valenting @hayatoito thoughts?
from url.
While I would prefer a consistent behaviour such as escaping all spaces in a path, I suspect it might be easier to just escape or strip the trailing spaces. I don't have any preference between the two.
from url.
I haven't had time to understand the issue yet.
Can't we consider that encoding a trailing space is a user's responsibility, instead of URL Standard's responsibility?
For example, the following is bad practice. The spaces can be trimmed at any time by some accidental operation.
new URL("data:blah  #hello")
Thus, we encourage users to escape trailing spaces themselves before passing the string to URL, to avoid accidental removal.
new URL("data:blah %20#hello");
Please correct me if I don't understand the issue.
from url.
You understand it correctly, the problem here is that we have an idempotence goal: https://url.spec.whatwg.org/#goals. And while the goals are not immutable, idempotence is a property I think we want to keep. There have been quite a few security issues with URL implementations that do not have that property.
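The idempotence problem can be reproduced without a real URL library. The following is a toy model, not the URL Standard's parser: `toyParse` and `toySerialize` are hypothetical names, the model handles only the fragment, and it assumes that trailing spaces on the input are stripped, which only reaches the path when no fragment follows it.

```javascript
// Toy model of a URL with an opaque path. NOT the URL Standard's algorithm.
function toyParse(input) {
  const hashIndex = input.indexOf("#");
  const hasFragment = hashIndex !== -1;
  let rest = hasFragment ? input.slice(0, hashIndex) : input;
  const fragment = hasFragment ? input.slice(hashIndex + 1) : null;
  // Assumption: trailing spaces of the input are stripped, so the path only
  // loses its trailing spaces when nothing (here: no fragment) follows it.
  if (!hasFragment) rest = rest.replace(/ +$/, "");
  return { rest, fragment };
}

function toySerialize(url, excludeFragment = false) {
  if (excludeFragment || url.fragment === null) return url.rest;
  return `${url.rest}#${url.fragment}`;
}

// Parse, then serialize without the fragment.
function roundTrip(input) {
  return toySerialize(toyParse(input), true);
}

const once = roundTrip("data:space #hello");
const twice = roundTrip(once);
console.log(JSON.stringify(once));  // "data:space "
console.log(JSON.stringify(twice)); // "data:space"

// With the trailing space percent-encoded, the round trip is stable.
const escapedOnce = roundTrip("data:space%20#hello");
console.log(escapedOnce === roundTrip(escapedOnce)); // true
```

In this model the first round trip exposes a trailing space that the second one removes, so parse-then-serialize is not a fixed point. Percent-encoding the space avoids that.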
from url.
Thanks! I didn't notice that idempotence is clearly stated as a goal in the URL Standard. I understand now.
- Always replace a trailing space in an opaque path with %20.
Proposal 1 means:
const url = new URL("data:blah  #a");
assertEquals(url.pathname, "blah %20");
url.hash = "";
assertEquals(url.pathname, "blah %20");
Is that right? This sounds best to me (out of 1/2/3/1b/2b/3b).
However, as far as I understand, the following URLs (as a result of serialization) are not equivalent to each other:
- "data:blah  "
- "data:blah%20%20"
- "data:blah %20"
So every option proposed here seems to be technically a breaking change.
- We always trim trailing spaces from opaque paths. Technically a breaking change, but overall it's better at ensuring everything stays consistent.
This doesn't seem to be a popular option here; however, it looks like the simplest and easiest-to-understand rule to me.
I assume we introduce a breaking change anyway.
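As a rough sketch of the "always trim" option, the helper below strips every trailing U+0020 from an opaque path. The name `trimTrailingSpaces` is hypothetical, and the inputs are the three path variants listed above, written with two trailing spaces where applicable.

```javascript
// Hypothetical helper for the "always trim" option; not from the URL Standard.
function trimTrailingSpaces(opaquePath) {
  let end = opaquePath.length;
  while (end > 0 && opaquePath[end - 1] === " ") end--;
  return opaquePath.slice(0, end);
}

const candidates = ["blah  ", "blah%20%20", "blah %20"];
for (const path of candidates) {
  const trimmed = trimTrailingSpaces(path);
  // Trimming an already-trimmed path changes nothing, so the rule is idempotent.
  const stable = trimTrailingSpaces(trimmed) === trimmed;
  console.log(JSON.stringify(path), "->", JSON.stringify(trimmed), stable);
}
```

Only the unescaped variant changes (to "blah"); the two escaped variants pass through untouched, which shows that trimming discards information the escaping options would keep.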
from url.
We always trim trailing spaces from opaque paths. Technically a breaking change, but overall it's better at ensuring everything stays consistent.
This doesn't seem to be a popular option here; however, it looks like the simplest and easiest-to-understand rule to me.
I'd be fine with this, too.
from url.