I realised that round tripping a data uri breaks it. E.g. <div class="highlight hi

One more thing: IMHO the roundtrip property <div class="highlight highlight-source

Support for "data:" URI scheme? about modern-uri HOT 12 OPEN

janvogt commented on June 1, 2024

Support for "data:" URI scheme?

from modern-uri.

Comments (12)

mrkkrp commented on June 1, 2024

Dropping consecutive slashes in paths is a normalization procedure.

from modern-uri.

janvogt commented on June 1, 2024

I see. But it makes this lib unusable in contexts where data: URIs might exist. If that's intended, this can be closed.

from modern-uri.

mrkkrp commented on June 1, 2024

The problem here is that the slashes end up being interpreted as delimiters in a path. The uri value doesn't look like a valid URI to me.

from modern-uri.

janvogt commented on June 1, 2024

As far as I understand RFCs, it is valid though. See the example at RFC 2397 which also contains consecutive slashes.

from modern-uri.

janvogt commented on June 1, 2024

I'd really like to use the library, but since data: URIs are pretty ubiquitous in modern web, this library is unusable for manipulating websites without support for it. Maybe @mrkkrp can confirm that supporting RFC 2397 is out of scope for this project?

from modern-uri.

mrkkrp commented on June 1, 2024

Would you like to open a PR to add support for data: scheme?

from modern-uri.

janvogt commented on June 1, 2024

Actually, I did a little more digging. Let me tell you, that I become to appreciate this library striving to follow the general URI handling rules laid out in RFC 3986. I think it would be wrong to handle scheme specific cases here, be it data: or something else.

Problem

However, the normalisation of removing // (or more technically: empty path segments), while o.K. for http(s): et. al., is not warranted for generic URIs:

Section 1.2.3 states about hierarchical paths:

For some URI schemes, the visible hierarchy is limited to the scheme itself:
everything after the scheme component delimiter (":") is considered
opaque to URI processing. Other URI schemes make the hierarchy
explicit and visible to generic parsing algorithms.

Section 3.3 defines path segments as zero or more characters:

segment = *pchar

Section 6.1 states about normalisation (emphasis mine):

Because URIs exist to identify resources, presumably they should be
considered equivalent when they identify the same resource. However,
this definition of equivalence is not of much practical use, as there
is no way for an implementation to compare two resources unless it
has full knowledge or control of them. For this reason,
determination of equivalence or difference of URIs is based on string
comparison, perhaps augmented by reference to additional rules
provided by URI scheme definitions.

and the central guiding principle for normalisation:

Therefore, comparison methods are designed to minimize false negatives while strictly avoiding false positives.

Removing // in data: URIs though, leads to false positives and is thusly non compliant.

Solutions

I see two main solutions, going forward

Allow empty path segments in the general case.

This conforms to the standard and should make the roundtrip of the OP possible. I'd see this as the correct(tm) way to solve this issue. The problem though is, that it might break dependent code, that assumes the normalization as currently performed. We could provide a normalizePath :: URI -> URI function to make the fix as easy as possible.

Allow empty path in the general case and perform normalisation for http(s): and ftp:

This would be a more compatible solution. The problem is, that it's unprincipled: so for these protocols we provide normalization on top of the RFC and for other (hierarchical) protocols we don't?

Allow empty path in the general case and perform normalisation for all known protocols that allow it.

This would be the most compatible solution. The problem though is that it requires a lot of research. Also shouldn't we then other protocol specific normalisation as well, to improve the Eq instance as much as possible? This would need even more research.

Do nothing

This is the worst option IMO because it breaks the compliance of this, otherwise pretty nice, URI library.

Offer

I'd be happy to take a stab at either solution 1 or 2 and provide a pull request. Solution 3 would require to much research to commit on doing it alone. Option 4 does not involve any work.

from modern-uri.

janvogt commented on June 1, 2024

One more thing: IMHO the roundtrip property

import Text.URI

prop_roundtrip :: Text -> Bool
prop_roundtrip uri = render <$> mkURI uri == (pure uri)

should really hold for all possible values of uri. That is either parsing fails or the URL can be reproduced exactly as it was. This is incompatible with implicit normalisation, e.g. all solutions but number 1.

from modern-uri.

mrkkrp commented on June 1, 2024

I think the best way forward is starting with 2 and gradually as users request normalization for more schemes we will move forward to 3. If I understand correctly the slashes in your original example are just part of the binary data, they are not delimiters of a path?

from modern-uri.

janvogt commented on June 1, 2024

Yes indeed it's just base64 encoded binary. Which is totally fine with the generic parsing algorithm, as long as there is no normalisation happening by incorrectly assuming a hierarchy. If I understand RFC 3986 correctly, the whole part after the data: scheme is considered as path in this case, though not an hierarchical one.

Just to be clear: Solution (2) and (3) are out of the RFC 3986 spec, insofar as it recommends only basic string comparison for equivalence checks in the generic case.

Additionally having some normalisation for some protocols, seems to be a little confusing. But if that's the way to go forward I'll try to implement a PR for solution (2)

from modern-uri.

mrkkrp commented on June 1, 2024

Yes, please go ahead with (2) 👍

from modern-uri.

janvogt commented on June 1, 2024

Ok, I'm on it.

from modern-uri.

Support for "data:" URI scheme? about modern-uri HOT 12 OPEN

Comments (12)

Problem

Solutions

Offer

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs