intarchboard / draft-use-it-or-lose-it
Long-term Viability of Protocol Extension Mechanisms
License: Other
The way the grease section ends leaves it unclear in my mind whether greasing is being recommended as a solution or not. It reads as concluded (and maybe it's just informational, but then stating explicitly that there are pros and cons to this mechanism would make explicit that no conclusion is being offered).
It's also unclear why someone would be motivated to actually implement grease at all; what is gained by doing so unless you're a security scanner? Why would the average implementation do something like this? What motivation justifies the increase in complexity (which you specifically point out doesn't happen in typical engineering, in "Good Protocol Design is Not Sufficient")?
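On the cost question, one data point: the reserved GREASE codepoints for TLS (RFC 8701) follow a fixed 0x?A?A pattern, so the sender side amounts to a few lines. A minimal sketch (the helper functions are illustrative, not from any real stack):

```python
import random

# RFC 8701 reserves sixteen values where both bytes are equal and the low
# nibble of each byte is 0xA: 0x0A0A, 0x1A1A, ..., 0xFAFA.
GREASE_VALUES = [(n << 12) | (0xA << 8) | (n << 4) | 0xA for n in range(16)]

def pick_grease_value() -> int:
    """Return a random reserved GREASE codepoint."""
    return random.choice(GREASE_VALUES)

def add_grease_extension(extensions: list) -> list:
    """Prepend a GREASE extension (empty body) to a list of
    (extension_type, body) pairs destined for a ClientHello."""
    return [(pick_grease_value(), b"")] + extensions
```

Whether that small cost is worth it for an average implementation is exactly the question this issue raises; the code only shows that complexity is not the obstacle.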
...and their interaction with greasing. Invariants are the properties of the protocol that you want to protect from change.
We may want to consider the distinction between trying to add greasing to an existing protocol (TLS, HTTP) versus adding it from the start (QUIC).
The draft liberally uses the term "extension point". Having a definition seems warranted. I have a PR for this, but I think it might need a touch-up or two, and the placement might not be perfect.
The message of the document is rather simple, and I think it would help to convey it more clearly if the document were shorter (also crisper and less redundant, maybe). Here are some proposals:
"Bugs in how new codepoints or extensions are handled can manifest as abrupt termination of sessions, errors, crashes, or disappearances of endpoints and timeouts. These interoperability problems can hinder or stop deployment of new, valuable features even if the negative reactions happen infrequently or only under relatively rare conditions. Fixing bugs
that limit interoperability involves a difficult process that includes identifying the cause of these errors, finding the responsible implementation(s), coordinating a bug fix and release plan, contacting users and/or the operator of affected services, and waiting for the fix to be deployed. Especially for protocols involving multiple parties or that are considered critical infrastructure (e.g., IP, BGP, DNS, or TLS), it could even be necessary to come up with a new protocol design that uses a different method to achieve the same result."
"Even where extension points have multiple valid values, if the set of permitted values does not change over time, there is still a risk that new values are not tolerated by existing implementations. If the set of values for a particular field remains fixed over a long period, some implementations might not correctly handle a new value when it is introduced. For example, implementations of TLS broke when new values of the signature_algorithms extension were introduced."
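The intolerance described in that proposed text is easy to reproduce in miniature. A toy sketch (the parsers and the set of known codepoints are hypothetical, loosely modeled on TLS SignatureScheme values):

```python
# Codepoints this (hypothetical) implementation knew about at build time.
KNOWN_SCHEMES = {0x0401, 0x0501, 0x0804}

def strict_parse(offered: list) -> list:
    """Intolerant: any unrecognized value aborts the handshake."""
    for scheme in offered:
        if scheme not in KNOWN_SCHEMES:
            raise ValueError(f"unknown signature scheme {scheme:#06x}")
    return offered

def tolerant_parse(offered: list) -> list:
    """Tolerant: unrecognized values are ignored; known ones are negotiable."""
    return [s for s in offered if s in KNOWN_SCHEMES]
```

When a peer starts offering a new value (say 0x0807), tolerant implementations keep interoperating while strict ones abort, which is the kind of breakage the signature_algorithms example describes.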
In section 2.3, the part about MPTCP and TCP Fast Open could also go to the appendix (and maybe provide a bit more insight there).
Maybe 3.1 could also go to the appendix, not least because the last two paragraphs in 3.1 are really redundant with the intro of section 3.
I know we just shuffled this around, but I guess sections 3.2 and 3.3 could also be subsections of 4. I'm actually not quite sure what the purpose or message of section 3 is...?
The abstract doesn't actually say what the doc is going to cover. I have a PR ready to go to help this.
Instead of just crypto, we're really talking about limiting/controlling participation via a boundary. Cryptography is a good technique here, but there may be others.
This needs some careful editing to tighten it up.
The published Internet-Draft should have a link to the github repo where the document is being maintained.
From Wes:
Do we want to mention that the success of headers is functionally due to there being a mix of both protocol-specific fields and extra metadata? I.e., allowing users and developers to specify new fields that will never be part of the protocol ensures that new fields continue to be understood.
Consider grouping as:
Active Use
Other
The document talks a lot about how active use of extensions is good practice. Should it also encourage how those used extensions are expressed on the wire? As an example, the order in which TLS extensions appear in a ClientHello are often hard-coded for simplicity. Many implementations simply iterate over the fixed list of extensions, adding them as needed.
However, it's certainly possible for the order to be a point of ossification. (See this BoringSSL change, which acknowledges this fact and allows one to permute how these extensions are ordered on the wire.) Changing the order would hedge against that ossification. Altering the order may have other side effects, such as on performance (though nothing specific comes to mind).
Should the draft touch on this? If so, I can try and send a PR.
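The permutation hedge described above can be sketched as follows; `serialize_extensions` and the two-byte type/length layout mirror the TLS extension encoding, but the function itself is illustrative, not taken from any particular stack:

```python
import random

def serialize_extensions(extensions: list, permute: bool = True) -> bytes:
    """Serialize (type, body) extension pairs, optionally in a random order
    so that peers cannot come to depend on any fixed ordering."""
    exts = list(extensions)
    if permute:
        random.shuffle(exts)  # ordering is not load-bearing for valid peers
    out = b""
    for ext_type, body in exts:
        out += ext_type.to_bytes(2, "big") + len(body).to_bytes(2, "big") + body
    return out
```

A real implementation would also have to respect ordering constraints that are load-bearing, such as pre_shared_key needing to be the last ClientHello extension (RFC 8446).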
There's a lot of great material in this document, but I think one thing that's missing is the threat model. I think this was discussed on a prior call, but consider the following two types of TLS examples: (1) changes only to the syntax (wire image) of the protocol, and (2) changes to both the syntax and the semantics of the protocol.
In general, the threat model here seems to be more in line with (1). The syntax (wire image) of a protocol might trigger certain middlebox or endpoint behavior that is undesired, and so greasing to avoid these seems generally good. In contrast, (2) modifies both the syntax and, in way, the semantics of the protocol. Greasing in the spirit of this draft doesn't seem immediately applicable.
If it would be useful, I can try and contribute some text to clarify the desired threat model.
There was discussion of a statement or RFC that says "if you network it, it will be networked" or "any networked device should assume the internet threat model" or something along those lines. That would be a valuable citation in the context of the applicability statement.
There are many cases where people believe that their network won't end up connected to the Internet. They make assumptions on that basis and are eventually disappointed to find that there are pressures to become networked (or someone does it for them, see also "SSL added and removed here"). At that point, assumptions about security become invalid and bad things happen.
Here, the point is that the scope of potential use might not be as constrained as imagined. So if someone thought that these considerations don't apply because of a narrow deployment context, that might not hold true forever. This is less serious than the security thing, but certainly worth a caution.
Can we provide an example for this text?
Caution is advised to avoid assuming that building a dependency on an extension
mechanism is sufficient to ensure availability of that mechanism in the long
term. If the set of possible uses is narrowly constrained and deployments do
not change over time, implementations might not see new variations or assume a
narrower interpretation of what is possible. Those implementations might still
exhibit errors when presented with new variations.
One of the nice innovations in the design of TLS 1.3 was the inclusion of mechanisms that ensured that correct implementation of a feature was necessary for interoperation. Including mention of this in the "dependency" chapter would be great. The key element here is creating a mechanism where defection by one entity results in changes to keys, and so a mistake in implementation would result in an inability to communicate.
I can't recall the specific mechanism, but the QUIC Initial encryption is an example of this.
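The QUIC Initial protection is exactly such a mechanism: RFC 9001 derives the Initial secrets with HKDF from a version-specific salt and the client's Destination Connection ID, so an implementation that deviates from the derivation in any way cannot read its peer's packets at all. A sketch of the extract step, using only the Python standard library:

```python
import hashlib
import hmac

# Version-specific salt for QUIC v1 (RFC 9001, Section 5.2).
INITIAL_SALT_V1 = bytes.fromhex("38762cf7f55934b34d179ae6a4c80cadccbb7f0a")

def initial_secret(dcid: bytes, salt: bytes = INITIAL_SALT_V1) -> bytes:
    """HKDF-Extract(salt, dcid): the root secret for Initial packet
    protection. Both endpoints must compute exactly this value (and the
    client/server secrets expanded from it) to interoperate at all."""
    return hmac.new(salt, dcid, hashlib.sha256).digest()
```

Because the salt changes with each QUIC version, even a correct v1 implementation derives unrelated keys for another version's Initial packets; any defection or mistake in the derivation makes communication impossible, which is the property this issue describes.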
@jariarkko says
The areas of concern I personally have are s3.2 and s4.4 and possibly better discussion of where this advice and experience applies and where it does not. Machines vs. Humans, applications vs. network gear, etc.
Right now, I don't know how to act based on this. This will need more discussion.
From Stephen Farrell:
There are a bunch of assertions in the draft that may or may not be true but that are asserted as if they are true. I would bet that a bunch of 'em are true but don't see sufficient evidence given or cited.

To take the earliest example: "Protocols can react to these shifts in one of three ways: adjust usage patterns within the constraints of the protocol, extend the protocol, and replace the protocol." Well, first off, protocols don't react, chemicals and living things do :-) But assuming you mean some class of person, I at least can ignore the hell out of many many things as another option. So this doesn't seem to be a correct analysis of the options. I think there are a good few examples like this and that those add up to too many to be sure how they may affect the overall argument.
First step: collect the instances where this class of error might appear. Then perform some analysis.
@tfpauly identified a pattern that is useful for version negotiation: rather than build a mechanism that won't be used until v2, hoist the negotiation out of the protocol to the next layer down. That way you can take advantage of the extension mechanisms at that layer seeing more use (and therefore gaining use-it-or-lose-it advantages).
Add a new section specifically about version negotiation in the design principles section.
See also IPv6 Ethertype, ALPN, intarchboard/program-edm#8
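The "negotiate one layer down" pattern is what ALPN does for TLS: the client offers application protocols, and the server selects one during the handshake it was performing anyway. The selection step itself is trivial; a generic sketch (not any real TLS API):

```python
from typing import List, Optional

def select_protocol(server_preferences: List[str],
                    client_offers: List[str]) -> Optional[str]:
    """ALPN-style, server-preference selection: return the first protocol
    the server supports that the client also offered, or None if there is
    no overlap."""
    offered = set(client_offers)
    for proto in server_preferences:
        if proto in offered:
            return proto
    return None
```

For example, with server preferences `["h3", "h2", "http/1.1"]` and a client offering `["http/1.1", "h2"]`, the server selects `"h2"`; because ALPN is exercised on nearly every handshake, the negotiation path stays well tested.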
It might also be worth pointing to https://eprint.iacr.org/2016/072
In section 3.3: HMSV should be expanded on first use, with a reference for it (to RFC 6709). It's currently only expanded in the appendix for TLS.
There's not enough information to understand what happened and how this draft pertains to addressing the BGP failure. For one thing, the reference [RIPE-99] points to a Cisco URL that apparently no longer exists; IOS-XR at that time was brand new. And it's not as if BGP extensions weren't a regular thing at that time.
However, there is a different use case that might be interesting to pursue: partial deployment of ROAs and S*BGP led to some unintended consequences. See Lychev, Goldberg, Shapira. This gets into multiparty aspects.
There are currently two sections that cover roughly the same point, with separate examples. It seems odd to have one in section 2, and one in section 3. Can 3.3 be merged into 2.2?
2.2. Examples of Disuse
3.3. Unused Extension Points Become Unusable
The section on Active Use currently says:
There are currently no firm
guidelines for new protocol development.
Can we do a bit better here to define some principles about how deployment timings and evolution should relate to the use of extension points? Or does that become a separate document?
This was broadly successful; maybe it can be used as an example of where the Robustness Principle was applied over many years and did result in cruft. The flag day ended that. There were presentations at RIPE and at the last IETF.
The following text:
RFC 6709 {{?EXTENSIBILITY=RFC6709}} contains a great deal of well-considered
advice on designing for extension. It includes the following advice:
This means that, to be useful, a protocol version-negotiation mechanism
should be simple enough that it can reasonably be assumed that all the
implementers of the first protocol version at least managed to implement the
version-negotiation mechanism correctly.
This has proven to be insufficient in practice.
This isn't true for a great many protocols. Two good examples are HTTP and SMTP. SMTP survived with no versioning mechanism initially, and now has a retrofitted capabilities mechanism. There are absolutely invariants in the protocol: you can't just start speaking EBCDIC at it. When an invariant is violated, a new protocol is needed. Stewart Bryant observed this quite some time ago in RFC 5704.
I would reform the point along those lines.
For me there is a relation between invariants and active use/greasing: everything you don't actively use risks becoming an invariant over time that cannot be changed anymore. (Similarly, RFC 8558 says that everything that is exposed might be used as a path signal and ossify.) If people agree, maybe we can make this point more clearly and prominently?
Jeff seemed to suggest that experience with BGP might provide some useful insights.
Reclamation of code points here was eventually successful. Might talk to some mmusic folks about the history there.
The current text covers only TLS; we should add QUIC.
From Stephane Bortzmeyer:
Maybe add a mention of the "attribute 99" debacle? That was a good case of a legal value breaking implementations that were not expecting it.
https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment/
http://ccie-in-3-months.blogspot.kr/2010/08/decoding-ripe-experiment.html
As this section is written, it is not about "use it or lose it" but about the need for multiple parties to adopt capabilities before they can be used. That's true, but the analysis here is both facile and superfluous. It's superfluous to the point made in the abstract. It's facile because those middleboxes serve an often necessary function. I propose that you not get into that discussion here. If you want to go there, a clearer understanding of the function of a middlebox is important. At the IP layer, every router is either a middlebox or a participant.
This is another reason to limit recommendations to the higher layers more explicitly.
In our December EDM call, we discussed how coordination is actually more important than greasing overall as a strategy for active use. This topic should be brought up and discussed in the document.
I think it would be better if you mentioned which address space, by prefix, you are referring to; for example, 240/4 versus 224/4. A citation about the confusion you are referring to might also help.
The current draft says:
"Codepoints that are reserved for future use can be especially problematic. Reserving codepoints without attributing semantics to their use can result in diverse or conflicting semantics being attributed without any hope of interoperability. An example of this is the "class E" address space in IPv4 {{?RFC0988}}, which was reserved without assigning any semantics. For protocols that can use negotiation to attribute semantics to codepoints, it is possible that unused codepoints can be reclaimed for active use, though this requires that the negotiation include all protocol participants."
This is valid, of course. The text is good as is. However, it might be useful to point out that reclaiming in class-E-type cases is difficult, because you'd have to negotiate with everyone on the path, as otherwise the intervening routers may discard packets that seemingly use the wrong kind of addresses. Or clarify that the latter part of the paragraph applies only to actually unused codepoints, whereas reserved-but-unused has likely led to code that prohibits their use.
E.g., one could add this text to the end of the paragraph: "This is in practice impossible in situations such as those involving class E addresses, as the set of protocol participants would have to include every node along the path."
#3 might work, though Brian has volunteered to provide text, which is even better.
On the face of it, explicitly saying "these are the invariants" ought to imply that everything not listed is subject to change. Have we been specifying invariants anywhere for long enough to know whether this naive expectation actually holds?
The draft currently contains the following paragraph:
Interoperability with other implementations is usually highly valued, so
deploying mechanisms that trigger adverse reactions can be untenable. Where
interoperability is a competitive advantage, this is true even if problems are
infrequent or only occur under relatively rare conditions.
This is a bit opaque. Might I suggest the following reword:
Protocols generally require the backward compatibility necessary for implementations to properly interoperate.
But perhaps that doesn't cover the thrust you intend?
Ted suggests that while the name is new, the practice probably isn't. Send an email to architecture-discuss asking for examples of this sort of thing in the past.