Comments (14)
from url.
This additional signature could work:
bool parse_uri( url& dest, string_view s, error_code& ec, parse_options const& opt = {} );
It would require a separate BNF to handle non-compliant queries.
from url.
We need to reach an agreement before fixing that.
My proposed solution is that we differentiate between producers (url) and consumers (url_view), as defined by the RFC. url would always encode most gen-delims, but url_view would accept unencoded gen-delims that are not ambiguous, without any "loose" parsing mode.
I have two reasons and some evidence for each:
- The reserved characters change depending on the URL component. Even for producers (url), the RFC allows more than the reserved characters in some subcomponents.
  - The general case forbids gen-delims: of the ASCII character set, the characters : / ? # [ ] @ (gen-delims) are reserved for use as delimiters of the generic URI components and must be percent-encoded, for example %3F for a question mark. RFC3986 2.2
  - The general case allows sub-delims that are not ambiguous:
    - The characters ! $ & ' ( ) * + , ; = are permitted by generic URI syntax to be used unencoded in the user information, host, and path as delimiters. RFC3986 3.2.2 and RFC3986 3.3
    - Additionally, : and @ may appear unencoded within the path, query, and fragment; and ? and / may appear unencoded as data within the query or fragment. RFC3986 3.3, RFC3986 3.4, and RFC3986 3.5
- Consumers should accept reserved characters that are not ambiguous. For producers (url), the RFC tells us to usually encode the reserved characters (gen-delims), but it also says very often that consumers (url_view) should accept reserved characters that are not ambiguous in that component.
  - RFC3986 and RFC2396 define a difference between producers and consumers, even though they talk much more about producers and these references are sparse.
  - Producers should use unencoded chars sometimes.
    - Even for producers, it is sometimes recommended for usability to avoid percent-encoding some reserved characters in sub-delims. RFC3986 3.4
  - Consumers should accept unencoded chars that are not ambiguous. There's no need for a "loose" parsing mode.
    - The regular expression ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))? is also considered a valid URL for consumers. This includes reserved delimiters that are not ambiguous. RFC3986 B
    - Everything between the first ? and the first # fits the spec's definition of a query. It can include any characters such as : / . ? RFC3986 3.4 and SO question
    - HTML establishes that a form submitted via HTTP GET should encode the form values as name-value pairs in the form "?key1=value1&key2=value2..." (properly encoded). Parsing of the query string is up to the server-side code (e.g. Java servlet engine). URIs should support that.
    - Accepting reserved chars that are not ambiguous is common practice for consumers: I replicated the consumer algorithm used by Apache here. It basically accepts anything that is not ambiguous. For instance, anything after # is a valid fragment, anything but # after ? is a valid query, and so on. All other libraries I checked, including Apache, JavaScript URL and folly, present the same behaviour.
    - I don't know what they had in mind when allowing consumers to accept non-ambiguous delimiters, but this relaxation allows parsers to be faster. For instance, the Apache algorithm just looks for the delimiters ? and then #, and something similar happens for other components.
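The delimiter-scanning behaviour described in this comment (the fragment is whatever follows the first #, the query is whatever follows the first ? before that) can be sketched in a few lines. This is only an illustration under stated assumptions: it is not Apache's code and not the Boost.URL parser, the names split_result and split_path_query_fragment are hypothetical, and scheme and authority are ignored.

```cpp
#include <string_view>

// Hypothetical result type: views into the input, no copies are made.
struct split_result
{
    std::string_view path;
    std::string_view query;
    std::string_view fragment;
    bool has_query = false;
    bool has_fragment = false;
};

// Split "path?query#fragment" the way a lenient consumer would:
// the fragment starts after the first '#', and the query starts
// after the first '?' that appears before that '#'.
inline split_result
split_path_query_fragment(std::string_view s) noexcept
{
    split_result r;
    auto const hash = s.find('#');
    if (hash != std::string_view::npos)
    {
        r.has_fragment = true;
        r.fragment = s.substr(hash + 1);
        s = s.substr(0, hash);
    }
    auto const qmark = s.find('?');
    if (qmark != std::string_view::npos)
    {
        r.has_query = true;
        r.query = s.substr(qmark + 1);
        s = s.substr(0, qmark);
    }
    r.path = s;
    return r;
}
```

With this approach no character inside the query or fragment is ever inspected, which is why unencoded [ and ] in a query pose no problem for such consumers.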
from url.
I think we should focus on the query instead of broadening the question to the entire URL
from url.
> I think we should focus on the query instead of broadening the question to the entire URL
We could just change the query BNF to accept [ and ] and fix this. It's just that related issues keep coming up all the time and they're probably not going to stop.
from url.
I think that the grammar for query should accept any unescaped character except the pound sign ( # ), and that any percent sign ( % ) must be followed by two valid hex digits. When converting an unencoded string into a percent-encoded query string, it should use the general character set specified in the RFC:
query = *( pchar / "/" / "?" )
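The acceptance rule proposed here can be written as a small check. This is a sketch of the rule as stated in this comment, not the Boost.URL implementation; the function name is_lenient_query is hypothetical, and the input is assumed to be the query text without the leading ?.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Returns true if `s` is acceptable under the proposed rule:
// no '#', and every '%' is followed by two hex digits.
// Every other character is accepted as-is.
inline bool
is_lenient_query(std::string_view s) noexcept
{
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        char const c = s[i];
        if (c == '#')
            return false;
        if (c == '%')
        {
            // Need two more characters after the '%'.
            if (s.size() - i < 3)
                return false;
            if (!std::isxdigit(static_cast<unsigned char>(s[i + 1])) ||
                !std::isxdigit(static_cast<unsigned char>(s[i + 2])))
                return false;
            i += 2; // skip the two hex digits
        }
    }
    return true;
}
```

Under this rule key[]=value is accepted, while a=%2 and a=%zz are rejected as malformed escapes.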
from url.
> I think that the grammar for query should accept any unescaped character except the pound sign ( # )
This is what #124 ended up doing. The only reserved chars it didn't accept before were ['#', '[', ']'].
> any percent sign ( % ) must be followed by two valid hex digits
We should probably expand that to other components.
from url.
> We should probably expand that to other components
I think it already works that way, right?
from url.
> I think it already works that way, right?
key_chars accepts whatever is in unreserved_chars + subdelim_chars + ':' + '@' + '/' + '?' - '&' - '='. You mean we should wrap it in pct_encoded_rule, like pct_encoded_rule<fragment_chars_t>, right?
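To make the set arithmetic in that expression concrete, here is a standalone sketch that builds the same set with a small bitset-backed class. char_set and make_key_chars are hypothetical stand-ins written for this illustration, not the library's lut_chars; the unreserved and sub-delims members are taken from RFC 3986.

```cpp
#include <bitset>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for a lookup-table character set.
class char_set
{
    std::bitset<256> bits_;

public:
    char_set() = default;

    explicit char_set(std::string_view chars)
    {
        for (unsigned char c : chars)
            bits_.set(c);
    }

    bool contains(char c) const noexcept
    {
        return bits_[static_cast<unsigned char>(c)];
    }

    // Union of two sets.
    friend char_set operator+(char_set a, char_set const& b)
    {
        a.bits_ |= b.bits_;
        return a;
    }

    // Add a single character.
    friend char_set operator+(char_set a, char c)
    {
        a.bits_.set(static_cast<unsigned char>(c));
        return a;
    }

    // Remove a single character.
    friend char_set operator-(char_set a, char c)
    {
        a.bits_.reset(static_cast<unsigned char>(c));
        return a;
    }
};

// Builds the set described in the comment above.
inline char_set make_key_chars()
{
    char_set const unreserved(
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789-._~");
    char_set const sub_delims("!$&'()*+,;=");
    return unreserved + sub_delims + ':' + '@' + '/' + '?' - '&' - '=';
}
```

Note that [ and ] are in neither unreserved nor sub-delims, so a set built this way rejects them unless they are added explicitly.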
from url.
If a query doesn't need to be interpreted as params, why do we parse a range of key/value pairs in query_rule? If we remove this constraint, we could parse a query_rule as pct_encoded_rule<query_chars_t> and just make it:
constexpr
query_chars_t() noexcept
    : grammar::lut_chars(
        pchars + '/' + '?' + '[' + ']')
{
}
or something directly more permissive like
constexpr
query_chars_t() noexcept
    : grammar::lut_chars(
        pchars + gen_delim_chars - '#')
{
}
from url.
Because we need to know how many key/value pairs there are
from url.
> Because we need to know how many key/value pairs there are
OK. As key_chars and value_chars are already wrapped in pct_encoded_rule, I believe PR #124 is ready then.
pct_encoded_rule<
    query_rule::key_chars> t0;
pct_encoded_rule<
    query_rule::value_chars> t1;
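For context on why the pair count matters, counting key/value pairs only requires finding the & separators, so it does not by itself restrict which other characters appear in keys and values. A minimal sketch follows; count_params is a hypothetical name, and treating a present-but-empty query as one pair with an empty key is an assumption of this example.

```cpp
#include <cstddef>
#include <string_view>

// Count key/value pairs in a query (text after '?').
// `has_query` distinguishes "no '?' at all" from "'?' with empty text".
inline std::size_t
count_params(std::string_view query, bool has_query = true) noexcept
{
    if (!has_query)
        return 0;
    std::size_t n = 1; // assumption: an empty query still holds one empty pair
    for (char c : query)
        if (c == '&')
            ++n;
    return n;
}
```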
from url.
Hi! Although this issue is closed, I'm encountering a similar problem. Could you clarify whether square brackets are accepted as part of a query? My test fails in Boost 1.84:
auto origin1 = "/path/path?key=value";
auto rv1 = boost::urls::parse_origin_form( origin1 );
BOOST_CHECK( rv1.has_value() );
auto origin2 = "/path/path?key[]=value";
auto rv2 = boost::urls::parse_origin_form( origin2 );
BOOST_CHECK( rv2.has_value() ); // fails
BOOST_CHECK( rv2.error().message() == "leftover" ); // passes
from url.
@mkarasevych Thanks for reporting that.
Yes. It seems like #124 wasn't enough. I'll have a look.
from url.