
Comments (12)

mrkkrp commented on June 1, 2024

Dropping consecutive slashes in paths is a normalization procedure.

from modern-uri.

janvogt commented on June 1, 2024

I see. But it makes this lib unusable in contexts where data: URIs might exist. If that's intended, this can be closed.


mrkkrp commented on June 1, 2024

The problem here is that the slashes end up being interpreted as delimiters in a path. The uri value doesn't look like a valid URI to me.


janvogt commented on June 1, 2024

As far as I understand the RFCs, it is valid, though. See the example in RFC 2397, which also contains consecutive slashes.


janvogt commented on June 1, 2024

I'd really like to use the library, but since data: URIs are ubiquitous on the modern web, it is unusable for manipulating websites unless it supports them. Maybe @mrkkrp can confirm whether supporting RFC 2397 is out of scope for this project?


mrkkrp commented on June 1, 2024

Would you like to open a PR to add support for the data: scheme?


janvogt commented on June 1, 2024

Actually, I did a little more digging. I have come to appreciate that this library strives to follow the general URI handling rules laid out in RFC 3986. I think it would be wrong to handle scheme-specific cases here, be it data: or something else.

Problem

However, the normalisation of removing // (or, more technically, empty path segments), while OK for http(s): et al., is not warranted for generic URIs:

Section 1.2.3 states about hierarchical paths:

For some URI schemes, the visible hierarchy is limited to the scheme itself:
everything after the scheme component delimiter (":") is considered
opaque to URI processing. Other URI schemes make the hierarchy
explicit and visible to generic parsing algorithms.

Section 3.3 defines path segments as zero or more characters:

segment = *pchar
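To make the consequence of that rule concrete, here is a minimal sketch using only the Prelude. The helper splitSegments is hypothetical and not part of modern-uri; it only illustrates that splitting on "/" naturally yields empty segments, which the grammar permits.

```haskell
-- Hypothetical helper, not part of modern-uri: split a path on '/'
-- while keeping empty segments, as `segment = *pchar` permits.
splitSegments :: String -> [String]
splitSegments path = case break (== '/') path of
  (seg, [])       -> [seg]
  (seg, _ : rest) -> seg : splitSegments rest
```

Under this sketch, a path such as a//b splits into three segments, the middle one being empty.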

Section 6.1 states about normalisation (emphasis mine):

Because URIs exist to identify resources, presumably they should be
considered equivalent when they identify the same resource. However,
this definition of equivalence is not of much practical use, as there
is no way for an implementation to compare two resources unless it
has full knowledge or control of them. For this reason,
determination of equivalence or difference of URIs is based on string
comparison, perhaps augmented by reference to additional rules
provided by URI scheme definitions.

and the central guiding principle for normalisation:

Therefore, comparison methods are designed to minimize false negatives while strictly avoiding false positives.

Removing // in data: URIs, though, leads to false positives and is thus non-compliant.
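A small sketch of such a false positive, using only the Prelude. The function collapseSlashes is a hypothetical stand-in for the normalisation under discussion, and the two URI strings are made-up payloads chosen for illustration, not necessarily valid base64.

```haskell
-- Hypothetical normaliser that mimics the behaviour under discussion:
-- collapse every run of consecutive slashes into a single slash.
collapseSlashes :: String -> String
collapseSlashes ('/' : '/' : rest) = collapseSlashes ('/' : rest)
collapseSlashes (c : rest)         = c : collapseSlashes rest
collapseSlashes []                 = []

-- Two data: URIs whose payloads differ (illustrative strings only).
uriA, uriB :: String
uriA = "data:text/plain;base64,AA//BB"
uriB = "data:text/plain;base64,AA/BB"

-- The inputs differ, yet they compare equal after normalisation.
falsePositive :: Bool
falsePositive = uriA /= uriB && collapseSlashes uriA == collapseSlashes uriB
```

Here falsePositive evaluates to True: two different identifiers are reported as equivalent, which is exactly what Section 6.1 asks comparison methods to avoid.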

Solutions

I see four options going forward:

  1. Allow empty path segments in the general case.

This conforms to the standard and should make the round trip from the original post possible. I'd see this as the correct(tm) way to solve this issue. The problem, though, is that it might break dependent code that assumes the normalization as currently performed. We could provide a normalizePath :: URI -> URI function to make the fix as easy as possible.

  2. Allow empty path segments in the general case and perform normalisation for http(s): and ftp:.

This would be a more compatible solution. The problem is that it's unprincipled: for these protocols we provide normalization on top of the RFC, and for other (hierarchical) protocols we don't?

  3. Allow empty path segments in the general case and perform normalisation for all known protocols that allow it.

This would be the most compatible solution. The problem, though, is that it requires a lot of research. Also, shouldn't we then perform other protocol-specific normalisation as well, to improve the Eq instance as much as possible? This would need even more research.

  4. Do nothing.

This is the worst option IMO because it breaks the compliance of this otherwise pretty nice URI library.
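To illustrate the scheme-dependent behaviour of option 2, here is a rough sketch over a toy type. SimpleURI, normalisedSchemes and normalizePathFor are all hypothetical names invented for this example; the real Text.URI.URI type is richer, and this is not how the library implements anything.

```haskell
-- Toy stand-in for a parsed URI; the real Text.URI.URI type is richer.
data SimpleURI = SimpleURI
  { uriScheme   :: String    -- e.g. "http" or "data"
  , uriSegments :: [String]  -- path segments, possibly empty
  } deriving (Eq, Show)

-- Schemes for which option 2 would drop empty path segments.
normalisedSchemes :: [String]
normalisedSchemes = ["http", "https", "ftp"]

-- Hypothetical, scheme-aware variant of the proposed normalizePath:
-- empty segments are removed only for the schemes listed above.
normalizePathFor :: SimpleURI -> SimpleURI
normalizePathFor uri
  | uriScheme uri `elem` normalisedSchemes =
      uri { uriSegments = filter (not . null) (uriSegments uri) }
  | otherwise = uri
```

With this sketch, an http URI with segments a, empty, b loses the empty segment, while a data URI with the same segments is returned unchanged. Option 3 would amount to growing the normalisedSchemes list.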

Offer

I'd be happy to take a stab at either solution 1 or 2 and provide a pull request. Solution 3 would require too much research for me to commit to doing it alone. Option 4 does not involve any work.


janvogt commented on June 1, 2024

One more thing: IMHO the roundtrip property

import Data.Text (Text)
import Text.URI (mkURI, render)

-- Either parsing fails, or rendering reproduces the input exactly.
prop_roundtrip :: Text -> Bool
prop_roundtrip uri = maybe True (\parsed -> render parsed == uri) (mkURI uri)

should really hold for all possible values of uri. That is, either parsing fails or the URI can be reproduced exactly as it was. This is incompatible with implicit normalisation, i.e. with all solutions but number 1.
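The shape of that property can be sketched independently of the library, using only the Prelude. Everything here is hypothetical: roundtrips is a generic helper, and parseVerbatim and parseCollapsing are toy parsers over String that stand in for a non-normalising and a normalising parser respectively.

```haskell
-- Generic round-trip check: passes when parsing fails, or when
-- rendering the parsed value reproduces the input exactly.
roundtrips :: Eq a => (a -> Maybe b) -> (b -> a) -> a -> Bool
roundtrips parse render input =
  maybe True (\value -> render value == input) (parse input)

-- Toy parser that rejects the empty string and keeps input verbatim.
parseVerbatim :: String -> Maybe String
parseVerbatim s = if null s then Nothing else Just s

-- Toy parser that additionally collapses runs of slashes while parsing.
parseCollapsing :: String -> Maybe String
parseCollapsing = fmap collapse . parseVerbatim
  where
    collapse ('/' : '/' : rest) = collapse ('/' : rest)
    collapse (c : rest)         = c : collapse rest
    collapse []                 = []
```

For the input a//b, the check holds with parseVerbatim and fails with parseCollapsing (using id as the renderer), which is the incompatibility described above in miniature.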


mrkkrp commented on June 1, 2024

I think the best way forward is to start with 2 and, as users request normalization for more schemes, gradually move towards 3. If I understand correctly, the slashes in your original example are just part of the binary data and not delimiters of a path?


janvogt commented on June 1, 2024

Yes, indeed, it's just base64-encoded binary. That is totally fine with the generic parsing algorithm, as long as no normalisation happens by incorrectly assuming a hierarchy. If I understand RFC 3986 correctly, the whole part after the data: scheme is considered the path in this case, though not a hierarchical one.
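As a rough illustration of treating everything after the scheme delimiter as opaque, here is a Prelude-only sketch. The helper splitScheme is hypothetical, does not validate scheme characters, and is not how modern-uri parses URIs.

```haskell
-- Hypothetical helper: split a URI at the first ':' into its scheme and
-- everything after it, which is kept verbatim (no slash handling).
splitScheme :: String -> Maybe (String, String)
splitScheme uri = case break (== ':') uri of
  (scheme, ':' : rest) | not (null scheme) -> Just (scheme, rest)
  _                                        -> Nothing
```

Applied to a data: URI, the second component still contains any consecutive slashes from the payload, since nothing in this sketch interprets them as path delimiters.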

Just to be clear: solutions (2) and (3) go beyond the RFC 3986 spec, insofar as it recommends only basic string comparison for equivalence checks in the generic case.

Additionally, normalising for some protocols but not others seems a little confusing. But if that's the way forward, I'll try to implement a PR for solution (2).


mrkkrp commented on June 1, 2024

Yes, please go ahead with (2) 👍


janvogt commented on June 1, 2024

Ok, I'm on it.

