elide-tools / charset

This project forked from jelmer/charset

Thunderbird-compatible character encoding decoding for email in Rust

License: Other

Rust 100.00%

charset

crates.io docs.rs Apache-2.0 OR MIT dual-licensed

charset is a wrapper around encoding_rs that provides (non-streaming) decoding for the character encodings that occur in email. It does so by adding UTF-7 decoding to the encodings defined by the Encoding Standard, which encoding_rs provides.

Note: Do not use this crate for consuming Web content. For security reasons, consumers of Web content are prohibited from supporting UTF-7. Use encoding_rs directly when consuming Web content.

The set of encodings consisting of UTF-7 and the encodings defined in the Encoding Standard is believed to be appropriate for consuming email, because that's the set of encodings supported by Thunderbird. Furthermore, UTF-7 support is believed to be necessary based on the experience of the Firefox OS email client. In fact, while the UTF-7 implementation in this crate is independent of Thunderbird's UTF-7 implementation, Thunderbird uses encoding_rs to decode the other encodings. The set of labels/aliases recognized by this crate matches those recognized by Thunderbird 60.0. Prior versions of Thunderbird as well as version 60.4 and later recognize more labels. Support for those is a TODO item for this crate.

Known compatibility limitations (shared with Thunderbird and known from Thunderbird bug reports):

  • JavaMail may use non-standard labels for legacy encodings such that the labels aren't recognized by this crate even if the encodings themselves would be supported. (Fixed in Thunderbird 60.4 but not in this crate.)
  • Some ancient Usenet postings in Chinese may not be decodable, because this crate does not support HZ.
  • Some emails sent in Chinese by Sun's email client for CDE on Solaris around the turn of the millennium may not be decodable, because this crate does not support ISO-2022-CN.
  • Some emails sent in Korean by IBM/Lotus Notes may not be decodable, because this crate does not support ISO-2022-KR.

This crate intentionally does not support encoding content into legacy encodings. When sending email, always use UTF-8. That is, just call .as_bytes() on &str and label the content as UTF-8.
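As a minimal sketch of that advice, the following standard-library-only example builds the bytes of a plain-text part labeled as UTF-8. The function name utf8_text_part and the exact header lines are assumptions made for illustration. They are not part of this crate, and the result is not a complete email message.

```rust
/// Hypothetical helper: builds the bytes of a plain-text part labeled as UTF-8.
/// The header lines are illustrative only, not a complete email message.
fn utf8_text_part(body: &str) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(b"Content-Type: text/plain; charset=UTF-8\r\n");
    out.extend_from_slice(b"Content-Transfer-Encoding: 8bit\r\n");
    out.extend_from_slice(b"\r\n");
    // A Rust &str is always valid UTF-8, so no encoder is needed:
    // the bytes returned by as_bytes() are already the UTF-8 form.
    out.extend_from_slice(body.as_bytes());
    out
}

fn main() {
    let part = utf8_text_part("Grüße aus Köln");
    // The body bytes are exactly the UTF-8 bytes of the string.
    assert!(part.ends_with("Grüße aus Köln".as_bytes()));
    println!("{} bytes", part.len());
}
```

No legacy encoder is involved at any point, which is why this crate can stay decode-only.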

Licensing

Apache-2.0 OR MIT; please see the file named COPYRIGHT.

API Documentation

Generated API documentation is available online.

Security Considerations

Again, this crate is for email. Please do NOT use it for Web content.

Never try to perform any security analysis on the undecoded data in ASCII-incompatible encodings and in UTF-7 in particular. Always decode first and analyze after. UTF-7 allows even characters that don't have to be represented as base64 to be represented as base64. Also, for consistency with Thunderbird, the UTF-7 decoder in this crate allows e.g. ASCII controls to be represented without base64 encoding even when the spec says they should be base64-encoded.
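To illustrate why analysis must happen after decoding, here is a rough sketch that uses only the standard library. The function toy_decode_utf7 is a hypothetical, deliberately simplified decoder written for this example. It is not this crate's decoder and does not reproduce its error handling or its Thunderbird-compatible leniency. The sketch shows ASCII characters hidden inside base64 runs, so a scan of the raw bytes misses them.

```rust
/// Maps one base64 alphabet byte to its 6-bit value.
fn b64_value(b: u8) -> Option<u32> {
    match b {
        b'A'..=b'Z' => Some((b - b'A') as u32),
        b'a'..=b'z' => Some((b - b'a') as u32 + 26),
        b'0'..=b'9' => Some((b - b'0') as u32 + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Toy UTF-7 decoder for illustration only (hypothetical helper).
/// Malformed input is decoded lossily instead of being reported.
fn toy_decode_utf7(input: &[u8]) -> String {
    let mut out = String::new();
    let mut i = 0;
    while i < input.len() {
        let b = input[i];
        i += 1;
        if b != b'+' {
            // Directly represented byte; the toy maps it to the code point
            // with the same value.
            out.push(b as char);
            continue;
        }
        if input.get(i) == Some(&b'-') {
            // "+-" is an escaped plus sign.
            out.push('+');
            i += 1;
            continue;
        }
        // Collect 6-bit groups into 16-bit UTF-16 code units.
        let mut units: Vec<u16> = Vec::new();
        let mut acc: u32 = 0;
        let mut bits: u32 = 0;
        while i < input.len() {
            match b64_value(input[i]) {
                Some(v) => {
                    acc = (acc << 6) | v;
                    bits += 6;
                    i += 1;
                    if bits >= 16 {
                        bits -= 16;
                        units.push((acc >> bits) as u16);
                        acc &= (1u32 << bits) - 1;
                    }
                }
                None => break,
            }
        }
        // A '-' that terminates the base64 run is consumed.
        if input.get(i) == Some(&b'-') {
            i += 1;
        }
        out.push_str(&String::from_utf16_lossy(&units));
    }
    out
}

fn main() {
    let raw = b"+ADw-script+AD4-";
    // A scan of the undecoded bytes sees no angle brackets at all.
    assert!(!raw.contains(&b'<'));
    // After decoding, the markup is plainly visible.
    assert_eq!(toy_decode_utf7(raw), "<script>");
}
```

Any filter that inspects the raw bytes here would pass the input as harmless, so checks of this kind belong on the decoded string.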

This implementation is non-constant-time by design. An attacker who can observe input length and the time it takes to decode it can make guesses about relative proportions of characters from different ranges. Guessing the proportion of ASCII vs. non-ASCII should be particularly feasible.

Serde support

The cargo feature serde enables Serde support for Charset.

Minimum Rust Version

The MSRV depends on the encoding_rs and base64 dependencies, not on this crate. A semver bump of base64 does not cause a semver bump of this crate.

Disclaimer

This is a personal project. It has a Mozilla copyright notice, because I copied and pasted from encoding_rs. You should not try to read anything more into Mozilla's name appearing.

Release Notes

0.1.3

  • Update base64 to 0.13.0.

0.1.2

  • Implemented From<&'static Encoding> for Charset.
  • Added optional Serde support.

0.1.1

  • Added decode_ascii().
  • Added decode_latin1().

0.1.0

  • Initial release.

Contributors

hsivonen, wookietreiber