cfcs / ocaml-punycode

RFC 3492: IDNA Punycode implementation

License: GNU Affero General Public License v3.0

OCaml 100.00%
ocaml-library ocaml punycode idna idn rfc-3492 rfc3492

ocaml-punycode's Introduction

ocaml-punycode

RFC 3492: IDNA Punycode implementation. It deals with full DNS domain names and their individual labels, and as such can be used for encoding/decoding these, but it does not expose a generic Bootstring or Punycode implementation.

Domain name / label validation is provided by Hannes Mehnert's domain-name library.

String handling and Unicode support come courtesy of David Bünzli's astring and uutf libraries.

Please report any issues or requests for additional features on the issue tracker.

Interface:

(** {1 Error messages} *)


type illegal_ascii_label =
  | Illegal_label_size of string
  | Label_contains_illegal_character of string
  | Label_starts_with_illegal_character of char
  | Label_ends_with_hyphen of string

type punycode_decode_error =
  | Overflow_error
  | Invalid_domain_name of string
  | Illegal_label of illegal_ascii_label

type punycode_encode_error =
  | Malformed_utf8_input of string
  | Overflow
  | Invalid_domain_name of string
  | Illegal_label of illegal_ascii_label

val msg_of_encode_error : punycode_encode_error -> [> `Msg of string]
(** [msg_of_encode_error err] is [err] transcribed to a [Rresult.R.msg],
    making it easier to display the error and use it with the Rresult monad.
    Can be used like this:
    [R.reword_error msg_of_encode_error (Punycode.to_ascii "example.com")]
*)

val msg_of_decode_error : punycode_decode_error -> [> `Msg of string]
(** [msg_of_decode_error err] is [err] transcribed to a [Rresult.R.msg],
    making it easier to display the error and use it with the Rresult monad.
    Can be used like this:
    [R.reword_error msg_of_decode_error (Punycode.to_utf8 "example.com")]
*)


(** {1:unicode2punycode Unicode -> Punycode}*)


val to_encoded_domain_name : string ->
  (Domain_name.t, punycode_encode_error) Rresult.result
(** [to_encoded_domain_name domain_name] is the ASCII-only Punycode
    representation of [domain_name], where [domain_name] is a UTF-8-encoded
    DNS domain name (or label). Each label that contains non-ASCII characters
    is Punycode-encoded and prefixed by ["xn--"].
    An error is returned if the input is not valid UTF-8, or if the resulting
    encoded string would be an invalid domain name, for example if:
    - a non-Punycode label starts with a hyphen
    - a label is >= 64 ASCII characters or has a zero length
    - the total length is >= 255 ASCII characters including trailing ['.']
      (if not present it will be assumed).
    See {!Domain_name.of_strings} for more information regarding the
    produced value and its validation.
*)

val to_ascii : string -> (string, punycode_encode_error) Rresult.result
(** [to_ascii domain_name] is the ASCII-only Punycode
    representation of [domain_name], with each Punycode-encoded label
    prefixed by ["xn--"] and the labels joined by ['.'].
    See {!to_encoded_domain_name} for more information.
*)


(** {1:punycode2unicode Punycode -> Unicode} *)


val of_domain_name : Domain_name.t ->
  (Uchar.t list list, punycode_decode_error) Rresult.result
(** [of_domain_name domain] is [domain] decoded to one list of [Uchar.t]
    elements per label in the domain name, where each label prefixed by
    ["xn--"] is decoded using the Punycode algorithm. *)

val to_utf8_list : string -> (string list, punycode_decode_error) Rresult.result
(** [to_utf8_list domain] is the UTF-8 representation of [domain]
    where each label prefixed by ["xn--"] is decoded using the
    Punycode algorithm.
    The implementation strives to only accept valid domain names,
    see {!to_ascii}.
    If [domain] is a FQDN (has a trailing ['.']), this is ignored.
*)

val to_utf8 : string -> (string, punycode_decode_error) Rresult.result
(** [to_utf8 domain] is {!to_utf8_list} concatenated with dots.
    Contrary to {!to_utf8_list}, if [domain] is a FQDN (has a trailing ['.']),
    the decoded string will also have a trailing ['.'].
*)
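
The rejection conditions listed in the documentation of to_encoded_domain_name can be illustrated with a small standard-library-only sketch. This is not how the library performs validation (it delegates to domain-name); the type check_error and the functions check_label and check_name are hypothetical names introduced for this illustration only.

```ocaml
(* Hypothetical sketch of the length and hyphen checks described above.
   None of these names belong to ocaml-punycode or domain-name. *)
type check_error =
  | Empty_label
  | Label_too_long of string
  | Leading_hyphen of string
  | Name_too_long of int

(* A label must be 1..63 ASCII characters and must not start with '-'. *)
let check_label label =
  let len = String.length label in
  if len = 0 then Error Empty_label
  else if len >= 64 then Error (Label_too_long label)
  else if label.[0] = '-' then Error (Leading_hyphen label)
  else Ok ()

let check_name name =
  (* A trailing '.' is assumed when absent, as the doc comment states. *)
  let fqdn =
    if name <> "" && name.[String.length name - 1] = '.' then name
    else name ^ "."
  in
  let total = String.length fqdn in
  if total >= 255 then Error (Name_too_long total)
  else
    (* Drop the trailing '.' before splitting into labels. *)
    let labels = String.split_on_char '.' (String.sub fqdn 0 (total - 1)) in
    List.fold_left
      (fun acc label ->
        match acc with Error _ -> acc | Ok () -> check_label label)
      (Ok ()) labels
```

With this sketch, check_name "xn--s4h.example.ir" evaluates to Ok (), while check_name "a..b" evaluates to Error Empty_label because of the zero-length label.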

Examples

utop # Punycode.to_ascii "☫.example.ir";;
- : (string, Punycode.punycode_encode_error) result = Ok "xn--s4h.example.ir"

utop # Punycode.to_utf8 "xn--s4h.example.ir";;
- : (string, Punycode.punycode_decode_error) result =
Result.Ok "☫.example.ir"

utop # Punycode.to_ascii "n☢clear.disarmament.☮.example.com";;
- : (string, Punycode.punycode_encode_error) result =
Ok "xn--nclear-3b9c.disarmament.xn--v4h.example.com"

utop # Punycode.to_utf8 "xn--nclear-3b9c.disarmament.xn--v4h.example.com";;
- : (string, Punycode.punycode_decode_error) result =
Result.Ok "n☢clear.disarmament.☮.example.com"
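
To see where a label such as "s4h" in the examples above comes from, here is a minimal sketch of the RFC 3492 decoding procedure for a single label with the "xn--" prefix already removed. It uses only the standard library and is not the code used by ocaml-punycode, which does not expose a generic Punycode decoder; decode_label, adapt, digit_of_char and insert_at are names chosen for this illustration. The sketch omits the overflow checks and the check that basic code points are ASCII, both of which RFC 3492 requires of a real implementation.

```ocaml
(* Bootstring parameters for Punycode (RFC 3492, section 5). *)
let base = 36
let tmin = 1
let tmax = 26
let skew = 38
let damp = 700
let initial_bias = 72
let initial_n = 128

(* Bias adaptation (RFC 3492, section 6.1). *)
let adapt delta numpoints first_time =
  let delta = if first_time then delta / damp else delta / 2 in
  let delta = delta + (delta / numpoints) in
  let rec go delta k =
    if delta > (base - tmin) * tmax / 2
    then go (delta / (base - tmin)) (k + base)
    else k + ((base - tmin + 1) * delta / (delta + skew))
  in
  go delta 0

(* Digit values: letters are 0..25 (case-insensitive), '0'..'9' are 26..35. *)
let digit_of_char c =
  match c with
  | 'a' .. 'z' -> Some (Char.code c - Char.code 'a')
  | 'A' .. 'Z' -> Some (Char.code c - Char.code 'A')
  | '0' .. '9' -> Some (Char.code c - Char.code '0' + 26)
  | _ -> None

(* Insert [x] at index [i] of [l]. *)
let rec insert_at i x l =
  if i = 0 then x :: l
  else match l with
    | [] -> [ x ]
    | h :: t -> h :: insert_at (i - 1) x t

(* Decode one label (without "xn--") to a list of Unicode code points. *)
let decode_label (s : string) : (int list, string) result =
  (* Everything before the last '-' is copied literally (basic code points). *)
  let basic, rest =
    match String.rindex_opt s '-' with
    | Some p when p > 0 ->
        (String.sub s 0 p, String.sub s (p + 1) (String.length s - p - 1))
    | _ -> ("", s)
  in
  let len = String.length rest in
  (* Read one generalized variable-length integer starting at [pos]. *)
  let rec read_int pos i w k bias =
    if pos >= len then Error "truncated input"
    else
      match digit_of_char rest.[pos] with
      | None -> Error "invalid digit"
      | Some d ->
          let i = i + (d * w) in
          let t =
            if k <= bias then tmin
            else if k >= bias + tmax then tmax
            else k - bias
          in
          if d < t then Ok (i, pos + 1)
          else read_int (pos + 1) i (w * (base - t)) (k + base) bias
  in
  (* Each decoded integer yields a code point [n] and an insertion index. *)
  let rec loop pos n i bias out out_len =
    if pos >= len then Ok out
    else
      match read_int pos i 1 base bias with
      | Error e -> Error e
      | Ok (i', pos') ->
          let points = out_len + 1 in
          let bias = adapt (i' - i) points (i = 0) in
          let n = n + (i' / points) in
          let idx = i' mod points in
          loop pos' n (idx + 1) bias (insert_at idx n out) points
  in
  let out = List.init (String.length basic) (fun j -> Char.code basic.[j]) in
  loop 0 initial_n 0 initial_bias out (List.length out)
```

For instance, decode_label "s4h" evaluates to Ok [0x262B], the code point of ☫, and decode_label "nclear-3b9c" yields the code points of "n☢clear". The resulting code points could be written out as UTF-8 with Buffer.add_utf_8_uchar from the standard library.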

Run tests

This project organizes its tests using Alcotest, and this dune alias runs them:

dune runtest --force --no-buffer

Using --no-buffer gets you colors in output.

ocaml-punycode's People

Contributors

cfcs, olleolleolle, reynir

ocaml-punycode's Issues

BUG: discrepancy between encoder and decoder

raised exception

Failure("s != processed: '
\\xE8\\xA4\\x86\\xD1\\xB8\\xE7\\xB6\\x8A\\xE4\\x9F\\xB4\\xE3\\xB5\\x9A\\xE8\\xBF\\x92
'!=
\\xE8\\x94\\xAB\\xC2\\x9D\\xE7\\xA6\\xAF\\xE4\\x90\\x99\\xE3\\xA5\\xBF\\xE8\\xAF\\xB7
 (encoded: xn--b3a8320avwg019c78lotj)
")

Port test suite to Alcotest (and Crowbar?)

The current test suite uses OUnit and QCheck.

The failure reporting is currently kind of ad-hoc, using exceptions to report error messages (which in turn get double-encoded by OUnit).

This is not so nice since it makes it hard to copy-paste the failing test cases when debugging issues.

  • It would be nice to port this to use Alcotest for nicer error reporting.

  • It would be nice to have a Crowbar-backed test in addition to the QCheck test-case (that currently just generates variable-size random UTF-8 strings).

    • Currently there's only one QCheck test case which tests internal integrity by encoding valid UTF-8 and checking that it decodes to the same value. It would be great to have something generating potentially invalid stuff and ensuring we reject that.
