
modern-uri's Issues

Trailing slashes for URLs with no path

Hey,

Thanks for maintaining URI!

Is it intentional that a URI representing http://example.com (e.g. when parsed with mkURI) is rendered as http://example.com/? I find the trailing slashes a little ugly when showing URLs to people.

Cheers

Support for "data:" URI scheme?

I realised that round tripping a data uri breaks it. E.g.

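{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import Text.URI

-- True when rendering the parsed URI does not give back the input
let uri = ""
in (render <$> mkURI uri) /= (pure uri :: Maybe Text)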

I assume the data scheme is not RFC-compliant, so it's probably not a bug.

Nevertheless, I wanted to let you know, since it took me half a day to figure out why some of these URIs are broken. Maybe it is possible to have them round-trip correctly (e.g. by allowing empty path elements) or at least fail completely? Silently dropping slashes seems to me like unintended behaviour.

Doesn't build with GHC-9.0

The modern-uri library doesn't build with ghc-9.0-rc1. When trying to build with the new compiler, the following compilation errors occur:

Text/URI/Types.hs:126:10: error:
    • Couldn't match type ‘m’ with ‘TH.Q’
      Expected: URI -> m TH.Exp
        Actual: URI -> TH.Q TH.Exp
      ‘m’ is a rigid type variable bound by
        the type signature for:
          TH.lift :: forall (m :: * -> *). TH.Quote m => URI -> m TH.Exp
        at Text/URI/Types.hs:126:3-6
    • In the expression: liftData
      In an equation for ‘TH.lift’: TH.lift = liftData
      In the instance declaration for ‘TH.Lift URI’
    • Relevant bindings include
        lift :: URI -> m TH.Exp (bound at Text/URI/Types.hs:126:3)
    |
126 |   lift = liftData
    |          ^^^^^^^^

Text/URI/Types.hs:129:15: error:
    • Couldn't match type ‘TH.TExp a0’ with ‘URI’
      Expected: URI -> TH.Code m URI
        Actual: URI -> TH.Code m (TH.TExp a0)
    • In the expression: TH.unsafeTExpCoerce . TH.lift
      In an equation for ‘TH.liftTyped’:
          TH.liftTyped = TH.unsafeTExpCoerce . TH.lift
      In the instance declaration for ‘TH.Lift URI’
    |
129 |   liftTyped = TH.unsafeTExpCoerce . TH.lift
    |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Text/URI/Types.hs:169:10: error:
    • Couldn't match type ‘m’ with ‘TH.Q’
      Expected: Authority -> m TH.Exp
        Actual: Authority -> TH.Q TH.Exp
      ‘m’ is a rigid type variable bound by
        the type signature for:
          TH.lift :: forall (m :: * -> *).
                     TH.Quote m =>
                     Authority -> m TH.Exp
        at Text/URI/Types.hs:169:3-6
    • In the expression: liftData
      In an equation for ‘TH.lift’: TH.lift = liftData
      In the instance declaration for ‘TH.Lift Authority’
    • Relevant bindings include
        lift :: Authority -> m TH.Exp (bound at Text/URI/Types.hs:169:3)
    |
169 |   lift = liftData
    |          ^^^^^^^^

Text/URI/Types.hs:172:15: error:
    • Couldn't match type ‘TH.TExp a1’ with ‘Authority’
      Expected: Authority -> TH.Code m Authority
        Actual: Authority -> TH.Code m (TH.TExp a1)
    • In the expression: TH.unsafeTExpCoerce . TH.lift
      In an equation for ‘TH.liftTyped’:
          TH.liftTyped = TH.unsafeTExpCoerce . TH.lift
      In the instance declaration for ‘TH.Lift Authority’
    |
172 |   liftTyped = TH.unsafeTExpCoerce . TH.lift
    |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Text/URI/Types.hs:195:10: error:
    • Couldn't match type ‘m’ with ‘TH.Q’
      Expected: UserInfo -> m TH.Exp
        Actual: UserInfo -> TH.Q TH.Exp
      ‘m’ is a rigid type variable bound by
        the type signature for:
          TH.lift :: forall (m :: * -> *). TH.Quote m => UserInfo -> m TH.Exp
        at Text/URI/Types.hs:195:3-6
    • In the expression: liftData
      In an equation for ‘TH.lift’: TH.lift = liftData
      In the instance declaration for ‘TH.Lift UserInfo’
    • Relevant bindings include
        lift :: UserInfo -> m TH.Exp (bound at Text/URI/Types.hs:195:3)
    |
195 |   lift = liftData
    |          ^^^^^^^^

Text/URI/Types.hs:198:15: error:
    • Couldn't match type ‘TH.TExp a2’ with ‘UserInfo’
      Expected: UserInfo -> TH.Code m UserInfo
        Actual: UserInfo -> TH.Code m (TH.TExp a2)
    • In the expression: TH.unsafeTExpCoerce . TH.lift
      In an equation for ‘TH.liftTyped’:
          TH.liftTyped = TH.unsafeTExpCoerce . TH.lift
      In the instance declaration for ‘TH.Lift UserInfo’
    |
198 |   liftTyped = TH.unsafeTExpCoerce . TH.lift
    |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Text/URI/Types.hs:221:10: error:
    • Couldn't match type ‘m’ with ‘TH.Q’
      Expected: QueryParam -> m TH.Exp
        Actual: QueryParam -> TH.Q TH.Exp
      ‘m’ is a rigid type variable bound by
        the type signature for:
          TH.lift :: forall (m :: * -> *).
                     TH.Quote m =>
                     QueryParam -> m TH.Exp
        at Text/URI/Types.hs:221:3-6
    • In the expression: liftData
      In an equation for ‘TH.lift’: TH.lift = liftData
      In the instance declaration for ‘TH.Lift QueryParam’
    • Relevant bindings include
        lift :: QueryParam -> m TH.Exp (bound at Text/URI/Types.hs:221:3)
    |
221 |   lift = liftData
    |          ^^^^^^^^

Text/URI/Types.hs:224:15: error:
    • Couldn't match type ‘TH.TExp a3’ with ‘QueryParam’
      Expected: QueryParam -> TH.Code m QueryParam
        Actual: QueryParam -> TH.Code m (TH.TExp a3)
    • In the expression: TH.unsafeTExpCoerce . TH.lift
      In an equation for ‘TH.liftTyped’:
          TH.liftTyped = TH.unsafeTExpCoerce . TH.lift
      In the instance declaration for ‘TH.Lift QueryParam’
    |
224 |   liftTyped = TH.unsafeTExpCoerce . TH.lift
    |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Text/URI/Types.hs:267:10: error:
    • Couldn't match type ‘m’ with ‘TH.Q’
      Expected: RText l -> m TH.Exp
        Actual: RText l -> TH.Q TH.Exp
      ‘m’ is a rigid type variable bound by
        the type signature for:
          TH.lift :: forall (m :: * -> *). TH.Quote m => RText l -> m TH.Exp
        at Text/URI/Types.hs:267:3-6
    • In the expression: liftData
      In an equation for ‘TH.lift’: TH.lift = liftData
      In the instance declaration for ‘TH.Lift (RText l)’
    • Relevant bindings include
        lift :: RText l -> m TH.Exp (bound at Text/URI/Types.hs:267:3)
    |
267 |   lift = liftData
    |          ^^^^^^^^

Text/URI/Types.hs:270:15: error:
    • Couldn't match type: TH.TExp a4
                     with: RText l
      Expected: RText l -> TH.Code m (RText l)
        Actual: RText l -> TH.Code m (TH.TExp a4)
    • In the expression: TH.unsafeTExpCoerce . TH.lift
      In an equation for ‘TH.liftTyped’:
          TH.liftTyped = TH.unsafeTExpCoerce . TH.lift
      In the instance declaration for ‘TH.Lift (RText l)’
    • Relevant bindings include
        liftTyped :: RText l -> TH.Code m (RText l)
          (bound at Text/URI/Types.hs:270:3)
    |
270 |   liftTyped = TH.unsafeTExpCoerce . TH.lift
    |
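
The errors above are consistent with the Lift class change in template-haskell 2.17 (shipped with GHC 9.0): lift became polymorphic in a Quote monad and liftTyped now returns Code m t. Below is a minimal sketch of a Lift instance that compiles on both sides of that change. The Example type is a hypothetical stand-in for URI, Authority and the other types, and the sketch uses liftData from template-haskell itself rather than the package's own helper, so it illustrates one possible fix and not necessarily the one the library adopted.

```haskell
{-# LANGUAGE CPP #-}
{-# LANGUAGE DeriveDataTypeable #-}

import Data.Data (Data)
import qualified Language.Haskell.TH.Syntax as TH

-- | Hypothetical stand-in for the library's types (URI, Authority, ...).
newtype Example = Example String
  deriving (Show, Data)

instance TH.Lift Example where
  -- 'TH.liftData' is already 'Quote'-polymorphic on template-haskell >= 2.17
  -- and returns 'Q Exp' on older versions, so this line works on both.
  lift = TH.liftData
#if MIN_VERSION_template_haskell(2,17,0)
  -- template-haskell >= 2.17: 'liftTyped' must return 'Code m Example'.
  liftTyped = TH.unsafeCodeCoerce . TH.lift
#elif MIN_VERSION_template_haskell(2,16,0)
  -- template-haskell 2.16: 'liftTyped' returns 'Q (TExp Example)'.
  liftTyped = TH.unsafeTExpCoerce . TH.lift
#endif
```

On template-haskell versions older than 2.16 the class has no liftTyped method, which is why neither branch applies there.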

How to concatenate parts of URI?

I wonder if there are concatenation operators for the parts of a complex URI, similar to </> from the filepath library. For example, I want to have:

  1. https://markkarpov.com
  2. https://markkarpov.com/foo
  3. https://markkarpov.com/bar
  4. https://markkarpov.com/bar/baz

Sure, I can write four hardcoded quasi-quotes. But what is the preferred and simplest way to do this while reusing the first URI?

I found the mkPathPiece smart constructor but have no idea how to reuse it...
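
One way mkPathPiece could be reused is sketched below. The helper appendPieces and the demo action are hypothetical names, not part of modern-uri, and the sketch assumes a library version in which uriPath has the shape Maybe (Bool, NonEmpty (RText 'PathPiece)), where the Bool records a trailing slash.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad.Catch (MonadThrow)
import qualified Data.List.NonEmpty as NE
import Data.Text (Text)
import Text.URI (URI (..), mkPathPiece, mkURI, render)

-- | Hypothetical helper, not part of modern-uri: append path pieces to the
-- end of a URI's path. 'mkPathPiece' validates each piece (it rejects empty
-- text). The resulting path is marked as having no trailing slash.
appendPieces :: MonadThrow m => URI -> [Text] -> m URI
appendPieces u ts = do
  new <- traverse mkPathPiece ts
  let old = maybe [] (NE.toList . snd) (uriPath u)
  pure u {uriPath = (,) False <$> NE.nonEmpty (old ++ new)}

-- | Build the four URIs from the question out of a single base URI.
demo :: IO ()
demo = do
  base <- mkURI "https://markkarpov.com"
  foo <- appendPieces base ["foo"]
  bar <- appendPieces base ["bar"]
  barBaz <- appendPieces bar ["baz"]
  mapM_ (print . render) [base, foo, bar, barBaz]
```

Because the helper only touches uriPath, the scheme, authority, query and fragment of the base URI carry over unchanged.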

File URIs are not parsed

I wasn't sure if file:///... is actually a valid URI so checked the RFC and it gives this example: file:///etc/hosts (section 1.1). However modern-uri can't parse this:

Prelude Text.URI> mkURI "file:///etc/hosts"
*** Exception: ParseException "file:///etc/hosts" (TrivialError (SourcePos {sourceName = "", sourceLine = Pos 1, sourceColumn = Pos 8} :| []) (Just (Tokens ('/' :| ""))) (fromList [Tokens ('%' :| ""),Tokens ('[' :| ""),Label ('A' :| "SCII alpha-numeric character"),Label ('i' :| "nteger"),Label ('u' :| "sername")]))

For comparison, network-uri parses this:

Prelude Network.URI> parseURI "file:///etc/hosts"
Just file:///etc/hosts

Expose parsers of certain classes of URIs

e.g. for absolute URIs only, etc. One can always validate after the fact, but that is potentially less efficient. Might be easier to add a Parser or Internal module exposing all the bits for people to make their own sub parsers.

CC @luigy
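
For reference, the "validate after the fact" approach mentioned above could look roughly like this. mkAbsoluteURI is a hypothetical name, not a function exported by modern-uri; the sketch parses the whole input first and only then checks for a scheme, which is the extra work a dedicated parser would avoid.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Maybe (isJust)
import Data.Text (Text)
import Text.URI (URI (..), mkURI)

-- | Hypothetical wrapper, not part of modern-uri: parse with 'mkURI', then
-- keep the result only if it has a scheme, i.e. is an absolute URI.
mkAbsoluteURI :: Text -> Maybe URI
mkAbsoluteURI t = do
  u <- mkURI t
  if isJust (uriScheme u) then Just u else Nothing
```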

relativeTo and quasi-quotation

In the current implementation, relativeTo returns Nothing if the base URI doesn't have a scheme. If the base URI is provided at compile time, I can know that it has a scheme, but there's no way for relativeTo to know. That leaves me in an awkward situation with a Maybe URI that I "know" will contain a value, but unwrapping it with a partial function is brittle, because either the base URI or the implementation of relativeTo could change in the future. Would it be possible to generate a function for a base URI provided at compile time, so that it always succeeds at runtime (no Maybe)? Something like this:

relativeToTest :: URI -> URI
relativeToTest = [relativeToUri|http://test.com/|]

Empty path with a trailing slash

I don't see a way to represent a URL whose path consists only of a trailing slash, e.g. https://example.com/. Parsing such a URL and rendering it gives a different result, so the following two tests fail:

    it "1" $
      render [QQ.uri|https://example.com/|] `shouldBe` "https://example.com/"
    it "2" $
      let given = "https://example.com/"
       in (render <$> mkURI given) `shouldBe` Just given

If uriPath is Nothing I can't specify whether I want a trailing slash or not. Otherwise, I can't give an empty path since path pieces are a NonEmpty list.

Use newtypes instead of `RText`

Hi, would you accept a patch that replaces the RText machinery with a bunch of newtypes? i.e.

newtype Scheme = Scheme { unScheme :: Text }
newtype Host = Host { unHost :: Text }
... etc ...

I think it would make the library a bit friendlier to beginners :)

Bug parsing numeric subdomains

This gives weird behaviour:

> mkURI "https://104.155.144.4.sslip.io:443/"

The host gets chopped and half of it along with the port end up in the uriPath:

URI {
  uriScheme = Just "https",
  uriAuthority = Right (Authority {authUserInfo = Nothing, authHost = "104.155.144.4", authPort = Nothing}),
  uriPath = Just (True,".sslip.io:443" :| []),
  uriQuery = [],
  uriFragment = Nothing
}

I think it may be true that according to some RFCs there shouldn't be numbers in subdomains, but in practice these types of domains exist and work fine everywhere.

In any case, when I roundtrip the current result we get back the following, which is different than the original:

> render uri
"https://104.155.144.4/.sslip.io%3a443/"

Btw the example url I am using is a real url, part of the sslip.io service: https://104.155.144.4.sslip.io:443/

Thank you

Add tutorial or at least some examples

For someone who has never worked with uri or uri-bytestring, or for a Haskell beginner, it's not easy to understand how to use the modern-uri package and why one should.

Hashable instance

In your next release, it would be great if you could include a Hashable instance for URI :)

`mkURI` fails to correctly parse uri when hostname contains a "_"

Is this the expected behavior?

image

I might be missing something, but the hostname of that uri is auth_service. mkURI seems to be splitting it at the underscore for some reason.

Here is a screenshot showing it correctly handles the uri when I replace auth_service with auth-service:

image

appending paths to urls

One thing I am having a hard time seeing is how one adds a relative path to a URI.

For me it would be quite helpful to have a simple combinator like:

url +/+ dir (ie 'https://example.org/pub' +/+ 'some/subdir' == 'https://example.org/pub/some/subdir')

I dunno if the lens module makes this easier - but I generally try to avoid them.

Actually when I use relativeTo it replaces the path from the base URI with that of the relative URI path.
Is there any smarter way to handle such changes?

Allow `[` and `]` in the query component

This is a follow up to mrkkrp/req#102. I am going to try to convince you that modern-uri implements RFC 3986 too naively.

I will base my case on the two main arguments:

  1. The specification is ambiguous (or, at least, it is unclear to me personally how to interpret it).
  2. There is practice and it differs from what is currently implemented.

The specification

The Reserved Characters section lists [ and ] as gen-delims, which is part of reserved. Then it says:

URI producing applications should percent-encode data octets that
correspond to characters in the reserved set unless these characters
are specifically allowed by the URI scheme to represent data in that
component.

This specification does not explicitly state that it is using RFC 2119 keywords, so it is not entirely clear to me how to interpret this “should”, but, I believe, the best option we have is to interpret it as per RFC 2119, and thus web-apps not escaping square brackets are likely in compliance. This alone should be enough to allow them.

It then goes on:

If a reserved character is found in a URI component and
no delimiting role is known for that character, then it must be
interpreted as representing the data octet corresponding to that
character's encoding in US-ASCII.

Now this “no delimiting role is known” part is pretty mysterious. The paragraph right above the one I have been quoting seems to be an attempt at explaining what this means, but I am having a hard time understanding it.

If we go now to the Query section, it reads:

The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI.

which suggests that anything that is not a # is not a delimiter or, at least, the delimiting role of [ and ] is unknown (to me 🙂). This part of the text may be non-normative and it contradicts the ABNF right below it, but the specification never explains what is normative and what is informative. The intent seems to be to disallow the square brackets, but it is not that obvious, in my opinion.

Given how ambiguous all this is, I am not convinced that implementing exactly the ABNFs from the specification is enough.

The real world

The discussion in the req issue was based around the handling of square brackets in real-world browsers. A couple of years ago there was a discussion about this in the Firefox issue tracker and there is a fantastic survey of behaviours of various browsers.

They surveyed specifically what the browsers were sending to the server, while the discussion on req was about “my browser converts” – I am not sure what exactly you were testing. Note that the actual URL location in that issue was already encoded (likely, by GitHub’s markdown processor): the HTML code of the link was <a href="https://gitlab.com/morley-framework/morley/-/issues/new?issue%5Btitle%5D=Indigo%20website:%20" rel="nofollow">https://gitlab.com/morley-framework/morley/-/issues/new?issue[title]=Indigo%20website:%20</a>. And even when I follow this encoded link in Firefox, it decodes the brackets. This is a result of that survey I linked – as you can see, Firefox was the only browser forcefully encoding square brackets, so this was considered a bug and was changed.

I can’t make a hyperlink with unencoded brackets here on GitHub, but if you paste https://gitlab.com/morley-framework/morley/-/issues/new?issue[title]=Indigo%20website:%20 into the address bar of your Google Chrome it will (I hope – at least that is what I see testing in Chromium) not encode the brackets and send them as is.

Thus, we can see that, regardless of what the specification has to say, in practice the square brackets are thought to be allowed in the query component and not allowing them was fixed as a bug in a major web-browser.

Colons in path are escaped

Version: modern-uri-0.3.4.4 (also occurs on older)

Colons in path pieces are percent-encoded, while it seems to me from https://www.rfc-editor.org/rfc/rfc3986#section-3.3 that they can appear unencoded from the second path piece on.

[nix-shell:~/]$ ghci
GHCi, version 9.0.2: https://www.haskell.org/ghc/  :? for help
ghci> import Data.Text
ghci> import Text.URI
ghci> Right u = mkURI (pack "https://mybusinessbusinessinformation.googleapis.com/v1/categories:batchGet")
ghci> render u
"https://mybusinessbusinessinformation.googleapis.com/v1/categories%3abatchGet"

This causes problems with e.g. Google, which uses colons in paths but does not accept the percent-encoded variant.

Port numbers validation

I don't see any limitations on port numbers in https://tools.ietf.org/html/rfc3986 itself, but I'd expect parsing to fail when port numbers are >= 2^16, at least for the schemes that declare ports to be 16-bit numbers. Is this a bug?

> mkURI "http://localhost:65535"
URI {uriScheme = Just "http", uriAuthority = Right (Authority {authUserInfo = Nothing, authHost = "localhost", authPort = Just 65535}), uriPath = Nothing, uriQuery = [], uriFragment = Nothing}

> mkURI "http://localhost:65536"
URI {uriScheme = Just "http", uriAuthority = Right (Authority {authUserInfo = Nothing, authHost = "localhost", authPort = Just 65536}), uriPath = Nothing, uriQuery = [], uriFragment = Nothing}

> mkURI "http://localhost:65536000"
URI {uriScheme = Just "http", uriAuthority = Right (Authority {authUserInfo = Nothing, authHost = "localhost", authPort = Just 65536000}), uriPath = Nothing, uriQuery = [], uriFragment = Nothing}
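
Until the parser enforces a limit, a check after parsing is one possible workaround. The sketch below uses the hypothetical names hasValidPort and checkPort (neither is part of modern-uri) and relies only on the uriAuthority and authPort fields visible in the output above.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Text.URI (Authority (..), URI (..), mkURI)

-- | Hypothetical post-parse check, not part of modern-uri: 'True' when the
-- URI has no authority, no port, or a port that fits in 16 bits.
hasValidPort :: URI -> Bool
hasValidPort u = case uriAuthority u of
  Left _ -> True
  Right auth -> maybe True (<= 65535) (authPort auth)

-- | Parse and check in one step; 'Nothing' means the input did not parse.
checkPort :: Text -> Maybe Bool
checkPort t = hasValidPort <$> mkURI t
```

Given the parse results quoted above, checkPort should accept http://localhost:65535 and reject the other two inputs.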

Add an `updateQueryParams` function

Heya, thanks for the great library. Working with it I need to "update" the value of an existing QueryParam from a URI, and while writing a function to do this, it surprised me that something like it didn't exist in the library.

I'm not sure if you think this would be a nice thing to add to this library, or perhaps this is easy to do somehow with lenses (I do not know lenses very well, so it might be and I just don't know).
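
For illustration, such a function can be written without lenses, using plain record update on uriQuery. setQueryValue and demo are hypothetical names, not part of the library, and this sketch only rewrites existing key/value parameters; it does not add the key when it is missing.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Text.URI
  ( QueryParam (..),
    RText,
    RTextLabel (..),
    URI (..),
    mkQueryKey,
    mkQueryValue,
    mkURI,
    render,
  )

-- | Hypothetical helper, not part of modern-uri: replace the value of every
-- 'QueryParam' whose key equals @k@. Flags and other keys are left as-is.
setQueryValue :: RText 'QueryKey -> RText 'QueryValue -> URI -> URI
setQueryValue k v u = u {uriQuery = map step (uriQuery u)}
  where
    step (QueryParam k' _) | k' == k = QueryParam k v
    step other = other

-- | Change the value of the "q" parameter and render the result.
demo :: Maybe Text
demo = do
  u <- mkURI "https://example.com/search?q=old&page=2"
  k <- mkQueryKey "q"
  v <- mkQueryValue "new"
  pure (render (setQueryValue k v u))
```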
