
Comments (8)

LucioFranco commented on August 28, 2024

Would it make sense in the display to limit the amount of viewable bytes? That said, I think this makes sense to me.

from http.

hawkw commented on August 28, 2024

> Would it make sense in the display to limit the amount of viewable bytes? That said, I think this makes sense to me.

Yeah, we probably want to have some kind of "common sense limit" on how much of the URI we print...
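Here is a rough sketch of what such a limit could look like. It assumes a hypothetical error type that holds the rejected input. This is not the `http` crate's actual `InvalidUri`, and the 32-byte cap is an arbitrary placeholder.

```rust
use std::fmt;

/// Arbitrary placeholder cap on how many input bytes the message shows.
const MAX_DISPLAYED_BYTES: usize = 32;

/// Hypothetical error type (not `http::uri::InvalidUri`) that remembers the rejected input.
pub struct InvalidUriWithInput {
    input: Vec<u8>,
}

impl InvalidUriWithInput {
    pub fn new(input: &[u8]) -> Self {
        Self { input: input.to_vec() }
    }
}

impl fmt::Display for InvalidUriWithInput {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Only the first MAX_DISPLAYED_BYTES bytes are printed.
        let shown = &self.input[..self.input.len().min(MAX_DISPLAYED_BYTES)];
        // The input may not be valid UTF-8 (and the cut may split a character),
        // so convert lossily; `{:?}` adds quotes and escapes control characters.
        write!(f, "invalid uri: {:?}", String::from_utf8_lossy(shown))?;
        if self.input.len() > MAX_DISPLAYED_BYTES {
            write!(f, " (truncated, {} bytes total)", self.input.len())?;
        }
        Ok(())
    }
}
```

A 100-byte input would then print its first 32 bytes followed by `(truncated, 100 bytes total)`, so the reader still learns how large the rejected URI was.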

seanmonstar commented on August 28, 2024

On the one hand, better errors are a great user-centric improvement. I love 'em! If there's a way to improve them for users, even just some of the time, that'd be wonderful.

On the other hand, here are some concerns we'd need to overcome:

  • Some of the parsing methods take Bytes, which is cheap to clone, or even just own, similar to the TryFrom<String>. But for those taking a slice, there'd be a cost to copying into a new Vec to store in the error.
  • If it were a Bytes, would storing a clone of it in an error keep the original buffer alive much longer than one would assume (considering the Bytes could be a small slice of a much bigger buffer filled by a large read from the socket)?
    • Would that be a case for a Bytes method that forces a copy if the size difference is big enough?
  • The parse data usually comes from an outsider, would printing that data as part of the error message have any security considerations? I imagine the answer is a tentative yes, as in "people could screw up and print it in a way that an attacker can do bad things", so it wouldn't be our fault, but it could be making it easier for some to be pwned.
  • Likewise from a privacy point of view: would the data possibly have PII that people logging these errors weren't expecting?
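One hypothetical way to handle the cost and lifetime concerns together is to copy at most a fixed number of bytes into a fresh allocation. This is a variation on the forced-copy idea, not an existing API. Because the copy is bounded, the error never holds a view into a larger read buffer. Escaping the bytes when formatting also blunts the "print it badly" risk. The sketch below uses an arbitrary 64-byte cap and a made-up type name.

```rust
use std::fmt;

/// Arbitrary placeholder cap on how much of the rejected input is retained.
const MAX_CAPTURED_BYTES: usize = 64;

/// Hypothetical error type; not the `http` crate's `InvalidUri`.
pub struct InvalidUriCaptured {
    // A fresh, bounded allocation: copying at most MAX_CAPTURED_BYTES keeps the
    // cost predictable and never pins a larger read buffer in memory.
    captured: Box<[u8]>,
    total_len: usize,
}

impl InvalidUriCaptured {
    pub fn capture(input: &[u8]) -> Self {
        let end = input.len().min(MAX_CAPTURED_BYTES);
        Self {
            captured: input[..end].into(),
            total_len: input.len(),
        }
    }
}

impl fmt::Display for InvalidUriCaptured {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `escape_ascii` renders CR, LF, ESC, quotes and non-ASCII bytes as
        // escape sequences, so printed input cannot forge extra log lines or
        // emit terminal control codes.
        write!(f, "invalid uri: \"{}\"", self.captured.escape_ascii())?;
        if self.total_len > self.captured.len() {
            write!(f, " (first {} of {} bytes)", self.captured.len(), self.total_len)?;
        }
        Ok(())
    }
}
```

With this, an input such as `/a\r\nX-Injected: 1` is printed with a literal `\r\n` rather than a line break. It does nothing about the PII question, though, since the (escaped) data is still shown.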

hawkw commented on August 28, 2024
>   • Some of the parsing methods take Bytes, which is cheap to clone, or even just own, similar to the TryFrom<String>. But for those taking a slice, there'd be a cost to copying into a new Vec to store in the error.

Hmm, don't the parsing methods that take a slice convert the slice into a Bytes before trying to parse it, anyway? E.g.

http/src/uri/mod.rs, lines 713 to 715 in 34a9d6b:

    fn try_from(t: &'a [u8]) -> Result<Self, Self::Error> {
        Uri::from_shared(Bytes::copy_from_slice(t))
    }

If I understand the code correctly, it looks like all paths that construct a Uri from a &[u8] or an &str will already copy the bytes into a Bytes before trying to parse, and then the Bytes is dropped if the URI is invalid? Or am I misunderstanding something here?

>   • The parse data usually comes from an outsider, would printing that data as part of the error message have any security considerations? I imagine the answer is a tentative yes, as in "people could screw up and print it in a way that an attacker can do bad things", so it wouldn't be our fault, but it could be making it easier for some to be pwned.

>   • Likewise from a privacy point of view: would the data possibly have PII that people logging these errors weren't expecting?

This is a valid concern --- perhaps we would want to add a method to the InvalidUri error to access the input bytes, rather than always including it in the Debug implementation? That way, the application can choose whether or not to actually log the URI value. The downside is that this would probably mean that, for hyper users, hyper would need to actually expose the http::uri::InvalidUri as a source and/or downcast target for its error types, or else there would be no way for the application to explicitly choose to display the invalid input...
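Here is a sketch of the opt-in approach. It uses hypothetical types in place of `http::uri::InvalidUri` and hyper's error type. The input is kept out of `Display` and `Debug` and is only reachable through an explicit accessor. The wrapping error exposes the parse error through `std::error::Error::source`, so the application can locate it with `downcast_ref`. The helper `find_invalid_input` is likewise made up for illustration.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical parse error: the rejected input is kept, but never formatted.
pub struct InvalidUriOptIn {
    input: Box<[u8]>,
}

impl InvalidUriOptIn {
    pub fn new(input: &[u8]) -> Self {
        Self { input: input.into() }
    }

    /// Explicit, opt-in access: the application decides whether the raw
    /// (possibly sensitive) input is ever logged.
    pub fn input(&self) -> &[u8] {
        &self.input
    }
}

impl fmt::Display for InvalidUriOptIn {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("invalid uri")
    }
}

impl fmt::Debug for InvalidUriOptIn {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Only the length is shown, never the contents.
        f.debug_struct("InvalidUriOptIn")
            .field("input_len", &self.input.len())
            .finish()
    }
}

impl Error for InvalidUriOptIn {}

/// Hypothetical stand-in for a server library's error type.
#[derive(Debug)]
pub struct ServerError {
    cause: InvalidUriOptIn,
}

impl fmt::Display for ServerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("error parsing request")
    }
}

impl Error for ServerError {
    // Exposing the parse error as the source is what makes it reachable.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.cause)
    }
}

/// Application-side helper: walk the `source()` chain and return the input of
/// the first `InvalidUriOptIn` found, if any.
pub fn find_invalid_input<'a>(err: &'a (dyn Error + 'static)) -> Option<&'a [u8]> {
    let mut current: Option<&(dyn Error + 'static)> = Some(err);
    while let Some(e) = current {
        if let Some(invalid) = e.downcast_ref::<InvalidUriOptIn>() {
            return Some(invalid.input());
        }
        current = e.source();
    }
    None
}
```

Logging the wrapper with `{}` or `{:?}` never reveals the URI. An application that wants it must call `find_invalid_input` deliberately, which is the explicit choice described above.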

hawkw commented on August 28, 2024

To elaborate on the motivation here a bit, the particular use-case I'm thinking of with this feature request is one where the errors are being returned by a hyper server. In that case, there isn't currently a good way for the application to choose whether or not it wants to display the invalid input, because the buffer that the bytes were read into is owned by hyper, and it's hyper's HTTP parser that's calling into http::Uri's parsing functions, rather than the application itself. In that case, the application can't easily access the portion of the buffer that contained the invalid URI, without doing something really unpleasant like wrapping the underlying IO resource to buffer all the bytes, and essentially reimplementing most of the HTTP parser just to find potentially invalid URIs.

For applications where the primary interaction with URI parsing is passing a user input that might be a URI into hyper, it would be much easier for the application to just hang onto the input and choose to log it if hyper returns an invalid URI error. So, I'm mainly concerned about the server use case here.

seanmonstar commented on August 28, 2024

> don't the parsing methods that take a slice convert the slice into a Bytes before trying to parse it, anyway?

Uh, well, whoops. That could be improved...

> To elaborate on the motivation here a bit, the particular use-case I'm thinking of with this feature request is one where the errors are being returned by a hyper server.

Making better error messages is already splendid motivation, I didn't mean to seem against the idea. I think we should always help people understand what went wrong. Just wanted to make sure we didn't miss anything while doing so.

hawkw commented on August 28, 2024

>> don't the parsing methods that take a slice convert the slice into a Bytes before trying to parse it, anyway?

> Uh, well, whoops. That could be improved...

Hmm, if we're planning to change those functions to only copy the bytes into a Bytes if the URI is valid, I suppose we could make the input component of the error optional, and only include it when the URI was constructed from a Bytes or an already-owned String etc? But I think that's probably only a good idea if we're going to include the input in the Display output, rather than allowing explicit access to it, since it seems a bit weird for the error to sometimes contain the invalid input and sometimes not contain it, depending on what type the URI was parsed from...
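Here is a sketch of the optional-input variant, with the input included in `Display` when present. The functions `parse_slice` and `parse_owned` are hypothetical, and a toy validity check stands in for the real URI parser. Borrowed input is validated before anything is copied, so its error carries no input. An owned `String` can be moved into the error at no extra cost.

```rust
use std::fmt;

/// Hypothetical error: the input is present only when the caller handed over
/// an owned buffer that could be kept without copying.
#[derive(Debug)]
pub struct InvalidUriMaybeInput {
    input: Option<String>,
}

impl fmt::Display for InvalidUriMaybeInput {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.input {
            // `{:?}` quotes the input and escapes control characters.
            Some(input) => write!(f, "invalid uri: {:?}", input),
            None => f.write_str("invalid uri"),
        }
    }
}

/// Toy stand-in for the real URI grammar: ASCII whitespace or control
/// characters make the input invalid.
fn validate(bytes: &[u8]) -> bool {
    !bytes.iter().any(|b| b.is_ascii_whitespace() || b.is_ascii_control())
}

/// Hypothetical parsed URI that owns its text.
#[derive(Debug, PartialEq)]
pub struct ParsedUri(String);

/// Borrowed input: validate first and copy only on success.
pub fn parse_slice(s: &str) -> Result<ParsedUri, InvalidUriMaybeInput> {
    if validate(s.as_bytes()) {
        Ok(ParsedUri(s.to_owned()))
    } else {
        Err(InvalidUriMaybeInput { input: None })
    }
}

/// Owned input: on failure the `String` is moved into the error.
pub fn parse_owned(s: String) -> Result<ParsedUri, InvalidUriMaybeInput> {
    if validate(s.as_bytes()) {
        Ok(ParsedUri(s))
    } else {
        Err(InvalidUriMaybeInput { input: Some(s) })
    }
}
```

Suppose `/bad path` is passed as a `&str`. The error displays as `invalid uri`. The same text passed as a `String` displays as `invalid uri: "/bad path"`. That is exactly the inconsistency described above.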

>> To elaborate on the motivation here a bit, the particular use-case I'm thinking of with this feature request is one where the errors are being returned by a hyper server.

> Making better error messages is already splendid motivation, I didn't mean to seem against the idea. I think we should always help people understand what went wrong. Just wanted to make sure we didn't miss anything while doing so.

We're in agreement on that: you didn't come off as against it at all! I also want to consider all the possible reasons we shouldn't do this. I mainly thought it was worth explaining the motivation in order to see whether there might be other ways to surface invalid inputs that don't have the same drawbacks as always including them in the error (e.g. some kind of new API in hyper?)...

hawkw commented on August 28, 2024

Quick ping on this, I'd be happy to open a PR to add it if we can come to consensus on the constraints!
