Handling invalid UTF-8 bytes (vte issue, open)

alacritty commented on August 26, 2024
Handling invalid UTF-8 bytes

from vte.

Comments (7)

sunfishcode commented on August 26, 2024

> You could just handle C1 escapes in your application by printing the missing glyph symbol, would that be reasonable? As far as I can tell, all that would be required then would be to make them all available appropriately.

Yes, that's what I want to do. It's ok if vte reports these bytes through execute or a new invalid hook or some other hook. I just want to know when these bytes happen so that I know when to emit replacement characters.

Specifically, I want to do this for both C1 codes like 0x90, and non-C1 codes like 0xfd. I can cope if these two cases are reported differently, and it's even ok if the API doesn't tell me what the actual bytes are, as long as it provides indications that such bytes were processed.
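
For illustration, here is a minimal sketch of why 0x90 and 0xfd are different cases at the UTF-8 level. It uses only the standard byte ranges from the UTF-8 definition; the `Utf8Role` enum and `utf8_role` function are hypothetical names made up for this example and are not part of vte.

```rust
/// Role a single byte can play in a UTF-8 stream.
/// (Hypothetical helper type for this sketch, not a vte API.)
#[derive(Debug, PartialEq, Eq)]
enum Utf8Role {
    /// 0x00..=0x7f: a complete one-byte character.
    Ascii,
    /// 0x80..=0xbf: only valid after a lead byte. All raw C1 bytes
    /// (0x80..=0x9f), including 0x90, fall in this range.
    Continuation,
    /// Lead byte of a sequence with the given total length.
    Lead(u8),
    /// 0xc0, 0xc1 and 0xf5..=0xff: never appear in valid UTF-8.
    NeverValid,
}

fn utf8_role(byte: u8) -> Utf8Role {
    match byte {
        0x00..=0x7f => Utf8Role::Ascii,
        0x80..=0xbf => Utf8Role::Continuation,
        0xc2..=0xdf => Utf8Role::Lead(2),
        0xe0..=0xef => Utf8Role::Lead(3),
        0xf0..=0xf4 => Utf8Role::Lead(4),
        // Everything left over: 0xc0, 0xc1, 0xf5..=0xff.
        _ => Utf8Role::NeverValid,
    }
}
```

With this, `utf8_role(0x90)` is `Continuation` (invalid only because nothing introduces it), while `utf8_role(0xfd)` is `NeverValid` in any position.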

chrisduerr commented on August 26, 2024

For actually invalid UTF-8, we already print error glyphs (see echo -e "\xc2\xc2"). So as far as I can tell we'd probably just need to make sure that bytes that are ignored right now are somehow propagated (like C1 DCS/CSI/OSC).

For these specific bytes it would be possible to propagate them to the execute function without actually handling them, though I'm not sure about other things like 0xfd, I'd have to look into that myself.

chrisduerr commented on August 26, 2024

Non-UTF-8 8-bit C1 escapes should be passed to execute, so you should be able to handle C1 codes if that's your issue?

sunfishcode commented on August 26, 2024

Here's a more specific testcase:

$ echo -e '\x90' > test.txt
$ target/debug/examples/parselog < test.txt
[execute] 0a
$

The 0x90 byte is silently dropped with no execute or any other action.

chrisduerr commented on August 26, 2024

\x90 is an escape introducer, which, based on my understanding of the code, is stripped for security.

So escapes like \x85 will emit an execute, but the 8-bit DCS (\x90), CSI (\x9b), and OSC (\x9d) escapes are ignored.
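
As a rough summary of the behaviour described in this comment, the sketch below tabulates what happens to a raw C1 byte. It only encodes what this thread states (0x85 reaches execute; 0x90, 0x9b, 0x9d are dropped) and leaves every other C1 byte unspecified. The `C1Handling` enum and the function name are hypothetical; this is not vte's actual code.

```rust
/// What the thread says happens to a raw 8-bit byte.
/// (Hypothetical type for this sketch, not a vte API.)
#[derive(Debug, PartialEq, Eq)]
enum C1Handling {
    /// Not in the C1 range 0x80..=0x9f at all.
    NotC1,
    /// Forwarded to the `execute` callback (e.g. NEL, 0x85).
    Execute,
    /// Escape introducer that is silently dropped; carries its name.
    Dropped(&'static str),
    /// A C1 byte whose handling this thread does not describe.
    Unspecified,
}

fn c1_handling_as_described(byte: u8) -> C1Handling {
    match byte {
        0x85 => C1Handling::Execute,        // NEL
        0x90 => C1Handling::Dropped("DCS"), // Device Control String
        0x9b => C1Handling::Dropped("CSI"), // Control Sequence Introducer
        0x9d => C1Handling::Dropped("OSC"), // Operating System Command
        0x80..=0x9f => C1Handling::Unspecified,
        _ => C1Handling::NotC1,
    }
}
```

Here `c1_handling_as_described(0x90)` gives `Dropped("DCS")`, which matches the empty parselog output in the testcase, and `c1_handling_as_described(0xfd)` gives `NotC1`.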

sunfishcode commented on August 26, 2024

I don't actually want to interpret C1 controls in my use case; I want to replace all non-UTF-8 bytes with replacement characters.

Right now, vte doesn't support that, either for bytes like 0x90 which are C1 controls, or bytes like 0xfd which are not. Is this a use case vte is interested in supporting?
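
To make the desired behaviour concrete, here is a minimal standalone sketch of "replace every non-UTF-8 byte sequence with U+FFFD and tell me how many times it happened". It uses only the Rust standard library (`std::str::from_utf8` and `Utf8Error`), runs outside of any terminal parser, and the function name `decode_lossy_counted` is made up for this example.

```rust
use std::str;

/// Decode `input` as UTF-8, replacing each invalid sequence with U+FFFD.
/// Returns the decoded text and the number of replacements made.
/// (Hypothetical helper; it does not involve vte at all.)
fn decode_lossy_counted(mut input: &[u8]) -> (String, usize) {
    let mut out = String::new();
    let mut replaced = 0;
    loop {
        match str::from_utf8(input) {
            Ok(valid) => {
                out.push_str(valid);
                return (out, replaced);
            }
            Err(err) => {
                // Everything before `valid_up_to()` is known-good UTF-8.
                let (valid, rest) = input.split_at(err.valid_up_to());
                out.push_str(str::from_utf8(valid).expect("prefix was validated"));
                out.push('\u{FFFD}');
                replaced += 1;
                // `error_len()` is None when the input ends mid-sequence;
                // in that case the whole remainder is the bad sequence.
                let skip = err.error_len().unwrap_or(rest.len());
                input = &rest[skip..];
            }
        }
    }
}
```

Feeding it `b"a\x90b\xfd\n"` yields the text `a`, U+FFFD, `b`, U+FFFD, newline, with a count of 2, so both the C1 byte and the 0xfd byte are reported. The text part should agree with `String::from_utf8_lossy`; the count is the extra piece of information being asked for here.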

chrisduerr commented on August 26, 2024

> Is this a use case vte is interested in supporting?

I'm not sure if it's possible to support that without removing existing functionality.

Take things like NEL, the non-UTF-8 8-bit C1 escape \x85. We trigger the execute function for it with this byte attached, so it's a valid escape that we propagate upstream for handling. It's not actually invalid at all.

You could just handle C1 escapes in your application by printing the missing glyph symbol, would that be reasonable? As far as I can tell, all that would be required then would be to make them all available appropriately.
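
A minimal sketch of that suggestion, assuming the application gets `print` and `execute` callbacks shaped like the ones on vte's `Perform` trait. The `GlyphSink` type is a hypothetical local stand-in and does not implement the real trait, so treat this as an outline of the idea and not as working vte integration.

```rust
/// Hypothetical application-side handler. The method names mirror the
/// `print` and `execute` callbacks, but this is a plain struct, not an
/// implementation of vte's `Perform` trait.
#[derive(Default)]
struct GlyphSink {
    text: String,
}

impl GlyphSink {
    /// Printable characters are appended unchanged.
    fn print(&mut self, c: char) {
        self.text.push(c);
    }

    /// Control bytes: keep newline and tab, show C1 bytes as U+FFFD,
    /// ignore everything else.
    fn execute(&mut self, byte: u8) {
        match byte {
            b'\n' | b'\t' => self.text.push(byte as char),
            0x80..=0x9f => self.text.push('\u{FFFD}'),
            _ => {}
        }
    }
}
```

This only covers bytes the parser actually forwards. Per the testcase in this thread, 0x90 never reaches `execute`, so it would still need to be made available before a handler like this could see it.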
