httpwg / http-core


Core HTTP Specifications

Home Page: https://httpwg.org/http-core/

Shell 1.57% Perl 0.82% XSLT 92.15% Makefile 3.26% C 1.29% M4 0.05% Yacc 0.46% Lex 0.39% Awk 0.02%
http ietf rfc standards

http-core's Introduction

HTTP Core Documents

This is the working area for the IETF HTTP Working Group's documents that define the HTTP protocol. See also our extensions repository.

Pull requests and issues are welcome. See our contribution guidelines for information about how to participate. The building instructions explain how to build the drafts from source -- but please only create PRs against the XML (i.e., do not update the HTML).

Be aware that all contributions to our work fall under the "NOTE WELL" terms therein.

Status

The HTTP "core" documents have been published; see the specification listing.

http-core's People

Contributors

ioggstream, kaduk, lpardue, martinthomson, mikebishop, mnot, reschke, royfielding, ylafon

http-core's Issues

* in Accept-*

* in Accept, Accept-Charset and Accept-Encoding request headers is of very limited use, because:

  • With a qvalue higher than another value, it says "send anything in preference to something specific", which isn't useful
  • With a qvalue lower than another media type, it says "send a more specific value in preference to something else", which is already implied by HTTP
  • With a qvalue of 0, it says "don't ever send any other value" -- but a server can always ignore that, because this is an optional-to-implement extension
  • In media types (e.g., text/*), it has been cargo culted a lot, but doesn't actually give the server information it can use.

Can we deprecate * in these headers, or at least restrict its use / give better guidance?
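To make the qvalue cases above concrete, here is a minimal sketch of how a server might look up the quality of a candidate value when a wildcard is present. The function names `parse_qvalues` and `quality` are hypothetical, the parser ignores quoted strings and the special handling some fields define (such as "identity" in Accept-Encoding), and the fallback order (explicit entry, then "*", then 0) is an assumption for illustration only.

```python
def parse_qvalues(header: str) -> dict[str, float]:
    """Map each listed value (lowercased) to its qvalue; a missing q means 1.0."""
    result = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                q = float(value.strip())
        result[name.strip().lower()] = q
    return result


def quality(prefs: dict[str, float], candidate: str) -> float:
    """An explicit entry wins; otherwise fall back to "*", then to 0."""
    candidate = candidate.lower()
    if candidate in prefs:
        return prefs[candidate]
    return prefs.get("*", 0.0)


prefs = parse_qvalues("gzip;q=1.0, *;q=0.5")
gzip_q = quality(prefs, "gzip")   # explicit entry
other_q = quality(prefs, "br")    # falls back to the wildcard
```

With the wildcard ranked lower than a specific value, the lookup only restates what the explicit entry already says, which is the redundancy the second bullet describes.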

Retries

As discussed quite a bit, retries in HTTP need to be better defined. In particular, HTTP does not offer a guarantee that any particular request will not be automatically retried.

RFC7234: First-hand responses and age

From http://www.w3.org/mid/[email protected]:

Imagine a cache that has a stored response A with a Date value X. The
cache sends a conditional request to validate that cached response. The
cache receives a 200 OK response B with a Date value of Y.

If X <= Y, then the situation is clear -- A is stale and the cache
should use B.

What if X > Y? In other words, what if the cache receives a 200 OK
response B that appears to be older (i.e., even more stale) than the
response A the cache is trying to validate? Should the cache trust the
sender's staleness decision or its own date comparison logic?

  • RFC 7234 section 4 says "a cache MUST use the most recent response (as
    determined by the Date header field)". That means A wins.

  • RFC 7234 section 4.3.3 says "the cache MUST use the full response [it
    just received]". That means B wins.

  • RFC 2616 section 13.2.5 says "If a client performing a retrieval
    receives a non-first-hand response for a request that was already fresh
    in its own cache, and the Date header in its existing cache entry is
    newer than the Date on the new response, then the client MAY ignore the
    response". That means A wins if A was fresh and B came from a cache.

If I have to guess, I would use B if it does not have an Age header,
boldly assuming that it is a first-hand response. Otherwise, use A. With
more time/effort, revalidating with max-age=0 would be a good option
(but it may result in the same conundrum).

Is this a gray area, or did I miss a specific HTTPbis rule that resolves
this conflict? Was the quoted RFC 2616 MAY replaced with something
equally specific? If this is a gray area, what do you recommend?
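The guess described in the message above (prefer B unless it carries an Age header) can be written down as a small decision function. This is only a sketch of that heuristic, not a rule from any RFC; `choose_response` is a hypothetical name, and the headers are assumed to be plain dicts with canonical "Date" and "Age" keys.

```python
from email.utils import parsedate_to_datetime


def choose_response(stored_headers: dict, new_headers: dict) -> str:
    """Return "stored" (A) or "new" (B) following the heuristic in the message."""
    stored_date = parsedate_to_datetime(stored_headers["Date"])  # X
    new_date = parsedate_to_datetime(new_headers["Date"])        # Y
    if stored_date <= new_date:
        # X <= Y: the clear case, use B.
        return "new"
    if "Age" not in new_headers:
        # X > Y but no Age header: assume B is a first-hand response.
        return "new"
    # X > Y and B came through a cache: keep A.
    return "stored"


stored = {"Date": "Tue, 15 Nov 1994 08:12:31 GMT"}
older_first_hand = {"Date": "Tue, 15 Nov 1994 08:12:30 GMT"}
older_from_cache = {"Date": "Tue, 15 Nov 1994 08:12:30 GMT", "Age": "10"}
```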

Guidelines for header field names

Is it useful to give some guidance about the use of prefixes like:

  • Accept-
  • Allow-
  • Expect-

etc.? In particular, we could say that you can't depend on the semantics of a prefix, but it's a good idea to align your use with existing prefixes.

415 and Accept

This came up in ACME where they wanted to use 415 and Accept.

RFC 7694 implies this is OK, and makes two observations:

  • 415 is for cases where request content type or content coding is not acceptable
  • you can use Accept-Encoding with 415

RFC 7694 failed to explicitly allow Accept with 415, though it naturally follows.

We might consider rolling 7694 into any update given its tiny size, and then fix this in the process.

RFC 7231 content-type charset param case sensitivity

https://greenbytes.de/tech/webdav/rfc7231.html#media.type

The type, subtype, and parameter name tokens are case-insensitive. Parameter values might or might not be case-sensitive, depending on the semantics of the parameter name. The presence or absence of a parameter might be significant to the processing of a media-type, depending on its definition within the media type registry.

...it then continues showing examples that imply that the charset value is case-insensitive (which is correct), but it would be helpful if it cited RFC 2046 which defines charset's case sensitivity.

Multiple definitions of the same ABNF rule potentially confusing

In RFC 7230, Section 7, there are two definitions for "1#element". One is for what producers must generate and one is for what consumers must accept.

Having two definitions for a single symbol was unexpected and caused some confusion for me. I even mistakenly reported a technical erratum against the spec because the first definition didn't match up with something in the range request spec. Julian Reschke caught my mistake and suggested I report this as a potential usability issue.

In my particular case, I'm trying to write strict parsing code for HTTP range requests. This involves jumping back and forth between ABNF rules in the spec as I write the actual code. I didn't notice the prose around the rule that explained the duplication -- I just saw the first rule, assumed it was the only definition of that rule, and started writing code for it.

One change that could help would be to strictly avoid having two definitions for the same symbol. For example, maybe the left-hand side of the rule itself could be annotated with the context:

1#element {producer} => element *( OWS "," OWS element )
1#element {consumer} => *( "," OWS ) element *( OWS "," [ OWS element ] )
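In code, the two rules usually end up as two separate functions, which may be a useful way to think about the annotation proposed above. The following sketch uses hypothetical names (`generate_list`, `parse_list`) and assumes elements never contain commas or quoted strings, which holds for byte-range specs but not for every list-based field; the consumer side approximates the lenient rule rather than implementing the ABNF exactly.

```python
def generate_list(elements: list[str]) -> str:
    """Producer rule: element *( OWS "," OWS element ), using ", " as separator."""
    if not elements:
        raise ValueError("1#element requires at least one element")
    return ", ".join(elements)


def parse_list(value: str) -> list[str]:
    """Consumer rule: tolerate empty list elements, but require one real element."""
    elements = [part.strip(" \t") for part in value.split(",")]
    elements = [element for element in elements if element]
    if not elements:
        raise ValueError("1#element requires at least one element")
    return elements


lenient = parse_list(", 0-99,, 200-299 ,")
strict = generate_list(lenient)
```

A strict parser written only from the producer rule would reject the first input here, which is the mistake the duplicated definition makes easy.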

RFC 5789: Content-Location header usage

RFC 5789 describes the appropriate usage of the Content-Location header in PATCH:

A response to this method is only cacheable if it contains explicit freshness information (such as an Expires header or "Cache-Control: max-age" directive) as well as the Content-Location header matching the Request-URI, indicating that the PATCH response body is a resource representation.

So this means that the Content-Location header must appear only if the actual representation is part of the PATCH response body.

The usage of Content-Location is also mentioned in RFC 7231:

For a state-changing request like PUT (Section 4.3.4) or POST (Section 4.3.3), it implies that the server's response contains the new representation of that resource, thereby distinguishing it from representations that might only report about the action (e.g., "It worked!"). This allows authoring applications to update their local copies without the need for a subsequent GET request.

So, according to both RFCs, the appropriate usage of Content-Location with state-changing requests is as follows:
Content-Location must appear in the response only if the response body contains the new resource representation.

But, there is also an example in RFC 5789:

Successful PATCH response to existing text file:

HTTP/1.1 204 No Content
Content-Location: /file.txt
ETag: "e0023aa4f"

The 204 response code is used because the response does not carry a message body (which a response with the 200 code would have). Note that other success codes could be used as well.

Furthermore, the ETag response header field contains the ETag for the entity created by applying the PATCH, available at http://www.example.com/file.txt, as indicated by the Content-Location response header field.

As you can see, Content-Location is present in a response with the 204 (No Content) status code. Of course, this response contains no body and therefore no new resource representation. This adds some ambiguity to the description of the Content-Location header. What is the correct usage of this header?

Clarify rules around half-closed TCP connections

The HTTP RFCs say nothing about the expectations around half-closed TCP connections.

In practice, I haven't seen any HTTP client in the wild send a request, and then send a FIN (shutdown) while still waiting for the server's response.

Because we haven't seen any clients do that, as of Go 1.8, Go's HTTP server is starting to make assumptions that reading an EOF from the client means that the client is no longer interested in the response. (Reading EOF is the closest portable approximation to "the client has gone away".)

But in golang/go#18527, a user reports that they have an internal HTTP client which does indeed make a half-closed TCP request.

It would be nice if the HTTP RFCs provided guidance as to whether this is allowed or frowned upon.

I would recommend that the RFC suggest that clients SHOULD NOT half-close their TCP connections while awaiting responses. Because nobody else does, empirically, and relying on reading EOF is a useful signal for servers.

/cc @mnot @benburkert
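For readers unfamiliar with half-close, the following sketch shows the pattern in question: the client sends its request, shuts down its write side, and still expects a response. It uses `socket.socketpair()` as a stand-in for a real TCP connection (an assumption made so that the example runs without a network), and the request and response bytes are purely illustrative.

```python
import socket

# A connected pair of stream sockets standing in for client and server ends.
client, server = socket.socketpair()

client.sendall(b"GET / HTTP/1.1\r\nHost: example.test\r\n\r\n")
client.shutdown(socket.SHUT_WR)  # half-close: no more writes, reads still open

# Server side: recv() returning b"" is the EOF signal discussed above.
request = b""
while chunk := server.recv(4096):
    request += chunk

# The server saw EOF, yet the client is still waiting for this response.
server.sendall(b"HTTP/1.1 204 No Content\r\n\r\n")
server.close()

response = b""
while chunk := client.recv(4096):
    response += chunk
client.close()
```

A server that treats the EOF as "the client has gone away" would abandon the request before the final write, which is the conflict reported in golang/go#18527.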

Extension capabilities

We're seeing some interesting use of extensions starting to be discussed, e.g.,:

  • Overriding method semantics using a HTTP/2 setting (in the WebSockets draft)
  • Referring to pseudo-headers as primary artefacts in HTTP header payloads (in the Signature proposal)

Some guidance about the relationship of version-specific syntax/extensions and "top-level" HTTP semantics would be helpful.

RFC7233 - byte range response with empty representation

Regarding byte ranges, an empty (zero-length) representation is unsatisfiable according to section 2.1, but not unsatisfiable according to section 4.4 if the first-byte-pos is zero.
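The contradiction can be shown by writing both rules as predicates. This is a sketch based on my reading of the two sections: it only models byte-range-specs with a first-byte-pos (suffix ranges are left out), and the function names are hypothetical.

```python
def satisfiable_per_section_2_1(first_byte_positions: list[int], length: int) -> bool:
    """Section 2.1: satisfiable if some first-byte-pos is less than the length."""
    return any(pos < length for pos in first_byte_positions)


def unsatisfiable_per_section_4_4(first_byte_positions: list[int], length: int) -> bool:
    """Section 4.4: 416 applies if every first-byte-pos is greater than the length."""
    return all(pos > length for pos in first_byte_positions)


# "Range: bytes=0-" against an empty (zero-length) representation.
positions, length = [0], 0
by_2_1 = satisfiable_per_section_2_1(positions, length)
by_4_4 = unsatisfiable_per_section_4_4(positions, length)
```

Both predicates come out false for this input: the range is neither satisfiable under one rule nor unsatisfiable under the other.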

I would like to see an update to the RFC which explicitly resolves this self-contradiction in whatever way seems appropriate. The following is one suggestion, but there may be others:

I would like to suggest explicitly specifying that an empty 200 response should be returned in this case. It is the simplest solution to the current self-contradiction in the RFC, since it is a valid response anyway (if the server chooses to ignore the Range header), clients already handle it properly, it provides all necessary information about the representation to the client, and stating it explicitly can prevent subtle edge-case pitfalls in both the RFC and its implementations (as opposed to more intricate solutions).

Perhaps the following can be added at the end of section 3.1:
"If all of the preconditions are true and the target representation
length is zero, the server SHOULD send a 200 (OK) response."

(someone in the discussion preferred this to be a MUST, but the SHOULD might be more in line with the previous sections and backward-compatibility if some implementations resolved this in some other way.)

I raised this in the mailing list a while back, and it got some discussion and support but did not get officially resolved. More recently I reported it as errata but it was rejected as not being an erratum and redirected here (I apologize - didn't realize issues moved to github, and still not sure what the distinction is :-) ), so here it is, raised again.

All suggestions and feedback are welcome.

Header normalization rules

From httpwg/http-extensions#282 (see that for more context).

Are the following header fields considered equal or equivalent or are they different?

Header: value; p=pval
Header: value; p="pval"
Header: value; p="Pval"
Header: value; P="Pval"

Some specializations of the header syntax permit both quoted-string and token for parameter values. If the string value after removal of quotes is the same as the token, is it the same value?

What about case folding for parameter names? Do we do that?
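One possible answer can be made explicit in code. The sketch below assumes, purely for illustration, that parameter names are case-insensitive, that a quoted-string and a token with the same content are the same value, and that parameter values are case-sensitive; none of these is settled by the question above. The `normalize` helper is hypothetical and does not handle ";" inside quoted strings.

```python
import re


def normalize(field_value: str):
    """Return (value, sorted parameters) under the assumptions stated above."""
    value, *params = [part.strip() for part in field_value.split(";")]
    normalized = []
    for param in params:
        name, _, pval = param.partition("=")
        pval = pval.strip()
        if len(pval) >= 2 and pval.startswith('"') and pval.endswith('"'):
            # Drop the surrounding quotes and undo quoted-pair escapes.
            pval = re.sub(r"\\(.)", r"\1", pval[1:-1])
        # Fold the parameter name to lowercase; keep the value's case.
        normalized.append((name.strip().lower(), pval))
    return value, tuple(sorted(normalized))


a = normalize('value; p=pval')
b = normalize('value; p="pval"')
c = normalize('value; p="Pval"')
d = normalize('value; P="Pval"')
```

Under these assumptions the first two fields compare equal, the last two compare equal, and the two pairs differ from each other.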

Field-name syntax

Header field-names are defined as tokens. This is an extremely permissive syntax, including characters that will cause confusion and likely break some senders/recipients.

Most of the special characters allowed are not in the registry or seen "in the wild." Some research would be good to substantiate their use, but a starting point might be:

"-" / "_" / "." / "+" / DIGIT / ALPHA

There are a number of strategies we could take to the transition:

  1. Like OWS / BWS, mark some characters as "do not generate" but "should consume"

  2. Disallow registration of header fields with those characters, and discourage their use in unregistered headers

  3. If we have more confidence that they're not in use, just ignore headers containing those characters.
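The difference between the current token syntax and the suggested starting point is easy to see with two patterns. This is a sketch: `TOKEN` follows the tchar set from RFC 7230, `RESTRICTED` encodes the character set proposed above, and the sample field names are made up.

```python
import re

# tchar from RFC 7230: the full set currently allowed in field names.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

# The narrower starting point proposed above.
RESTRICTED = re.compile(r"[-_.+0-9A-Za-z]+")


def classify(field_name: str) -> str:
    """Report which of the two syntaxes accepts the field name."""
    if RESTRICTED.fullmatch(field_name):
        return "restricted"
    if TOKEN.fullmatch(field_name):
        return "token-only"
    return "invalid"


results = {name: classify(name) for name in ["Content-Type", "X|Odd~Name", "Bad Name"]}
```

Names in the "token-only" class are the ones the three strategies above would treat differently.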

Content-Encodings should be self-describing

As part of the discussion about the Encryption specification we discussed how Content-Encodings can or should take parameters from HTTP headers.

The consensus was that they shouldn't.

Whatever information a C-E may need should be inside the body of the object, just like the GZIP header etc.

We should document this as a rule, if we ever do a HTTP[1].ter

Trailers

7230 says:

The trailer fields are identical to header fields, except they are sent in a chunked trailer instead of the message's header section.

and:

When a chunked message containing a non-empty trailer is received, the recipient MAY process the fields (aside from those forbidden above) as if they were appended to the message's header section.

As discussed at the 2016 HTTP Workshop as part of @annevk's Fetch presentation, this isn't necessarily sensible.

Via header: host ABNF could allow ","

Reported by [email protected] in https://lists.w3.org/Archives/Public/ietf-http-wg/2016OctDec/0527.html:

I think I found a bug in the specification of the Via header as given
in RFC 7230

From RFC 7230: Via = 1#( received-protocol RWS received-by [ RWS comment ] )
where 1# is a special syntax that means "comma separated list, at
least one element"

From RFC 7230: received-by = ( uri-host [ ":" port ] ) / pseudonym
From RFC 7230: uri-host = <host, see [RFC3986], Section 3.2.2>
From RFC 3986: host = IP-literal / IPv4address / reg-name
From RFC 3986: reg-name = *( unreserved / pct-encoded / sub-delims )
From RFC 3986: sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" /
"+" / "," / ";" / "="

notice "," there in sub-delims; this means that comma is a valid
character in a host.
and hence, that using a comma to terminate a host makes no sense

e.g.
Via: 1.0 fred, 1.1 p.example.net
'fred,' is a valid uri-host
In this case, I think we might be saved by the fact that the rest of
the line doesn't match, so 'fred' ends up being a pseudonym rather
than a uri-host.

However, I believe that there might be corner cases not backed up by
this fallback.
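The observation can be checked mechanically by transcribing the reg-name rule from RFC 3986 into a regular expression. This is a sketch; the pattern name `REG_NAME` is my own, and it covers only reg-name, not IP-literal or IPv4address.

```python
import re

# reg-name = *( unreserved / pct-encoded / sub-delims ), from RFC 3986.
REG_NAME = re.compile(
    r"(?:[A-Za-z0-9._~-]"      # unreserved
    r"|%[0-9A-Fa-f]{2}"        # pct-encoded
    r"|[!$&'()*+,;=])*"        # sub-delims, which includes ","
)

with_comma = REG_NAME.fullmatch("fred,") is not None
plain_host = REG_NAME.fullmatch("p.example.net") is not None
with_space = REG_NAME.fullmatch("fred, 1.1") is not None
```

The string "fred," matches, so a parser that greedily consumes uri-host would swallow the list delimiter; only the following space stops the match in the example from the report.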

TE: trailers

The use case for TE: trailers is described in RFC7230, section 4.1.2:

Unless the request includes a TE header field indicating "trailers" is acceptable, as described in Section 4.3, a server SHOULD NOT generate trailer fields that it believes are necessary for the user agent to receive. Without a TE containing "trailers", the server ought to assume that the trailer fields might be silently discarded along the path to the user agent. This requirement allows intermediaries to forward a de-chunked message to an HTTP/1.0 recipient without buffering the entire response.

Section 4.3 adds to that:

The presence of the keyword "trailers" indicates that the client is willing to accept trailer fields in a chunked transfer coding, as defined in Section 4.1.2, on behalf of itself and any downstream clients. For requests from an intermediary, this implies that either: (a) all downstream clients are willing to accept trailer fields in the forwarded response; or, (b) the intermediary will attempt to buffer the response on behalf of downstream recipients

A few observations:

  1. Concrete advice about when to generate TE: trailers would be helpful. E.g., should a User-Agent include it on all connections that support trailers? Should an intermediary include it on all connections when the upstream connection also supports trailers, or when it commits to buffering the response?
  2. A server receiving TE: trailers can't infer much from it; it only knows that trailers are going to be processed by the path, but could still be dropped on the floor by the client (in 4.1.2: "recipient MAY process the fields"). Is it really a useful signal?

Content-Type header parsing

This is related to #33, but subtly different. As explored in whatwg/mimesniff#30 browsers have different code paths for request and response Content-Type header parsing. Values such as */*, text/html in a response Content-Type header end up being interpreted as text/html, presumably for compatibility with deployed content.

This seems like another fallout of intermediaries adding potentially duplicate headers and (early) implementations being poorly tested for erroneous input.

Transfer-Encoding header on 1xx, 204, 304 response

RFC 7230, section 3.3.3 forbids a response body when the status code is 1xx, 204, or 304. But it makes no mention of a Transfer-Encoding header on a response that cannot contain a message body. Is a Transfer-Encoding header on a bodyless response permissible?

An answer to the question should clarify what a client is to do with the connection when a Transfer-Encoding header is seen on a 1xx/204/304 response. @tombergan reported that Chrome will keep the connection open for reuse, while Node.js will close the connection. Closing the connection avoids the ambiguity in handling follow-up requests over the same connection, and might protect against response splitting attacks. However, it seems overly defensive to close the connection because the server may be about to invalidate the response by sending a body, even though the response is perfectly valid at the point the connection is being closed.

See golang/go#22330 for more context.

RFC 7230 A transfer-parameter of `q` should not be allowed

Continuing from http://www.rfc-editor.org/errata/eid4683

In the current spec, nothing is said about how to handle transfer-parameters.
Notably, nothing is said about the case sensitivity of the parameter key.

This results in a conflict with the TE header: if you see a "q" token,
you cannot know if it is a transfer-parameter vs a t-ranking.

It is noted that the "q" token is case insensitive in section 4.3.

When multiple transfer codings are acceptable, the client MAY rank
the codings by preference using a case-insensitive "q" parameter
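The ambiguity can be illustrated by listing every way a single TE list member could be read. This is a sketch with a hypothetical helper, `interpretations`, which treats a trailing "q" parameter (matched case-insensitively) as either an ordinary transfer-parameter or a t-ranking; it does not validate the coding name or the rank value.

```python
def interpretations(t_coding: str) -> list[dict]:
    """Return the possible readings of one TE list member."""
    name, *params = [part.strip() for part in t_coding.split(";")]
    pairs = [tuple(side.strip() for side in param.split("=", 1)) for param in params]

    # Reading 1: every parameter is a transfer-parameter, no ranking given.
    readings = [{"coding": name, "parameters": pairs, "rank": None}]

    # Reading 2: a trailing "q" parameter is really the t-ranking.
    if pairs and len(pairs[-1]) == 2 and pairs[-1][0].lower() == "q":
        readings.append(
            {"coding": name, "parameters": pairs[:-1], "rank": float(pairs[-1][1])}
        )
    return readings


ambiguous = interpretations("foo;q=0.5")
unambiguous = interpretations("gzip")
```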

Parsing content after a 204 response

When data is written to a connection following a 204 response, the user agent may interpret the data as it pleases. In web browsers today, this means:

  • The Chromium and Edge web browsers will inspect the first 4 bytes for the
    start of a valid HTTP response. If found, they will parse the data that
    follows as a new response (and if any of those first four bytes are
    invalid they will be discarded). If more than four invalid bytes are
    encountered, the browsers abort parsing and interpret the data as an
    HTTP/0.9 response. This is consistent with their general response parsing
    behavior (i.e. without a preceding 204 response).
  • The Firefox web browser may inspect 1 kilobyte of data or more (the exact
    number has been variable in my testing) for a valid response. If found, it
    will discard any preceding invalid data. This tolerant behavior is only
    observable following a 204 response; otherwise, Firefox seems to parse in the
    same way as Chromium and Edge.
  • The Safari web browser, upon receiving any invalid data, makes no attempt to
    recover and discards the remaining data.

This variation has led to instability in automated tests written for the Web Platform Tests project--see issue 5037.

I originally reported this inconsistency in issue 5227, where @mnot provided the following context (from RFC7230 section 3.3.3):

If the final response to the last request on a connection has been completely
received and there remains additional data to read, a user agent MAY discard
the remaining data or attempt to determine if that data belongs as part of
the prior response body, which might be the case if the prior message's
Content-Length value is incorrect. A client MUST NOT process, cache, or
forward such extra data as a separate response, since such behavior would be
vulnerable to cache poisoning.

Mark followed up by saying:

What I think's being requested is a recommendation for how much data should
be discarded before the client gives up; possibly a minimum. It feels kind of
analogous to when we established the minimum URL length that should be
supported by implementations, so it's not completely off base.

That said, this is truly a corner case; the right answer is "don't do that."
Anyone depending on interop in this case is doing it wrong to start with.

Though I agree with @annevk: "I think ideally HTTP defines how to parse HTTP." Can the specification language be made more explicit for expected behavior in this situation?

Thanks for your consideration!

Duplicate header resolution in H1

Firefox rejects responses with multiple Location headers (unless their values happen to be identical). Other browsers have different handling and it likely differs per header field. It would be great to get a consistent story around this.
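As a point of reference for what a consistent rule could look like, the behaviour described for Firefox can be modelled as follows. This is a sketch of the described behaviour only, not Firefox's actual code; `resolve_location` is a hypothetical name and headers are assumed to arrive as (name, value) pairs.

```python
def resolve_location(headers: list[tuple[str, str]]):
    """Return the single Location value, or None; reject conflicting duplicates."""
    values = {value.strip() for name, value in headers if name.lower() == "location"}
    if len(values) > 1:
        raise ValueError("conflicting Location header fields")
    return values.pop() if values else None


identical = resolve_location([("Location", "/a"), ("location", "/a")])
```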

.onion names

Note the requirements in RFC7686 regarding .onion names in RFC7230bis Sections 2.7.1 and 2.7.2.

Method case sensitivity

HTTP defines methods as case-sensitive, but many implementations / apps built on HTTP case-normalise to uppercase.

I don't remember discussing this on the list, and a quick search of the issues list doesn't show anything.

See also w3c/ServiceWorker#120.

Ranges and Content and Transfer Encoding

RFC 7233 does not mention content encoding at all. Same for transfer encoding. I assume that is because this is completely unspecified and therefore completely unreliable, however, for my sanity...

My reading is that a 206 response includes ranges of the encoded message, and that the content-encoding applies to the complete message body prior to being split into ranges. Thus, if I had an "x2" content encoding that turned "Hello World!" into "HHeelllloo  WWoorrlldd!!", asking for bytes 2-4 (byte offsets are zero-based) would get you "eel" and not "llo".

The text in Section 4.1 suggests that you would not include a Content-Encoding header field if the client used If-Range on the expectation that they already know. That seems pretty dangerous, but it's consistent with the idea that you are repairing a larger message.

On the other hand, I have to assume that a Transfer-Encoding applies after the range request.
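The "x2" thought experiment is small enough to run. The sketch below implements that imaginary coding (each byte doubled) and takes the same range from the encoded and the unencoded bytes. Byte ranges in RFC 7233 use zero-based, inclusive offsets, so the "eel" versus "llo" comparison corresponds to the range 2-4; the helper names are my own.

```python
def x2_encode(data: bytes) -> bytes:
    """The imaginary "x2" content coding: every byte is written twice."""
    return bytes(doubled for byte in data for doubled in (byte, byte))


def byte_range(data: bytes, first: int, last: int) -> bytes:
    """Select an inclusive, zero-based byte range, as in "bytes=first-last"."""
    return data[first:last + 1]


plain = b"Hello World!"
encoded = x2_encode(plain)

from_encoded = byte_range(encoded, 2, 4)  # range applied after content coding
from_plain = byte_range(plain, 2, 4)      # range applied before content coding
```

Under the reading given above, a 206 response would carry `from_encoded`, which is not independently decodable without knowing where it sits in the encoded stream.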
