A simple JavaScript library to encode/decode UTF-8 strings.
## Encoding
A char:
UTF8.setBytesFromCharCode('é'.charCodeAt(0));
// [0xC3, 0xA9]
A string:
UTF8.setBytesFromString('1.3$ ~= 1€');
// [49, 46, 51, 36, 32, 126, 61, 32, 49, 226, 130, 172]
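To make the byte values above easier to follow, here is a minimal sketch of how code points map to UTF-8 bytes. `encodeCodePoint` and `encodeString` are hypothetical helpers written for illustration; they are not part of this library's API and may differ from its implementation.

```javascript
// Hypothetical helper (not part of the library): encode one Unicode
// code point into an array of UTF-8 byte values.
function encodeCodePoint(codePoint) {
  if (codePoint < 0x80) {
    // 1 byte: 0xxxxxxx
    return [codePoint];
  }
  if (codePoint < 0x800) {
    // 2 bytes: 110xxxxx 10xxxxxx
    return [0xC0 | (codePoint >> 6), 0x80 | (codePoint & 0x3F)];
  }
  if (codePoint < 0x10000) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return [
      0xE0 | (codePoint >> 12),
      0x80 | ((codePoint >> 6) & 0x3F),
      0x80 | (codePoint & 0x3F),
    ];
  }
  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xF0 | (codePoint >> 18),
    0x80 | ((codePoint >> 12) & 0x3F),
    0x80 | ((codePoint >> 6) & 0x3F),
    0x80 | (codePoint & 0x3F),
  ];
}

// Hypothetical helper: encode a whole string, one code point at a time.
function encodeString(text) {
  return Array.from(text).flatMap((ch) => encodeCodePoint(ch.codePointAt(0)));
}

encodeCodePoint(0xE9);            // 'é' -> [0xC3, 0xA9]
encodeString('1.3$ ~= 1\u20AC');  // same 12 bytes as the string example above
```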
## Decoding
A char:
String.fromCharCode(UTF8.getCharCode([0xC3, 0xA9]));
// 'é'
A string:
UTF8.getStringFromBytes([49, 46, 51, 36, 32, 126, 61, 32, 49, 226, 130, 172]);
// '1.3$ ~= 1€'
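Decoding is the reverse operation: the lead byte gives the sequence length, and each continuation byte contributes six bits. The sketch below uses a hypothetical `decodeCodePoint` helper, not the library's own code, and deliberately skips overlong-sequence and surrogate checks.

```javascript
// Hypothetical helper (not part of the library): decode the code point
// that starts at `offset` and report how many bytes it used.
function decodeCodePoint(bytes, offset = 0) {
  const lead = bytes[offset];
  let length;
  let codePoint;
  if (lead < 0x80) {
    length = 1;
    codePoint = lead;
  } else if ((lead & 0xE0) === 0xC0) {
    length = 2;
    codePoint = lead & 0x1F;
  } else if ((lead & 0xF0) === 0xE0) {
    length = 3;
    codePoint = lead & 0x0F;
  } else if ((lead & 0xF8) === 0xF0) {
    length = 4;
    codePoint = lead & 0x07;
  } else {
    throw new Error('Invalid lead byte at offset ' + offset);
  }
  for (let i = 1; i < length; i++) {
    const next = bytes[offset + i];
    // Continuation bytes must look like 10xxxxxx (a missing byte fails too).
    if ((next & 0xC0) !== 0x80) {
      throw new Error('Invalid continuation byte at offset ' + (offset + i));
    }
    codePoint = (codePoint << 6) | (next & 0x3F);
  }
  return { codePoint, length };
}

const decoded = decodeCodePoint([0xC3, 0xA9]);
String.fromCodePoint(decoded.codePoint); // 'é'
```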
Typed arrays can be used as inputs:
var bytes=new Uint8Array([0xC3, 0xA9, 49, 46, 51, 36, 32, 126, 61, 32, 49, 226, 130, 172]);
// The first char
String.fromCharCode(UTF8.getCharCode(bytes));
// é
// The following string at the offset 2
UTF8.getStringFromBytes(bytes,2);
// '1.3$ ~= 1€'
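As a cross-check, the same bytes can be decoded with the standard `TextDecoder`, which is built into modern browsers and Node.js. This sketch does not use this library; it only shows that reading from an offset amounts to decoding a view of the same buffer.

```javascript
const input = new Uint8Array([0xC3, 0xA9, 49, 46, 51, 36, 32, 126, 61, 32, 49, 226, 130, 172]);
const decoder = new TextDecoder('utf-8');

// subarray() creates a view on the same memory, so nothing is copied.
const firstChar = decoder.decode(input.subarray(0, 2)); // 'é'
const rest = decoder.decode(input.subarray(2));         // '1.3$ ~= 1€'
```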
As well as outputs:
var bytes=new Uint8Array(14);
// First encoding a char into the buffer
UTF8.setBytesFromCharCode('é'.charCodeAt(0), bytes);
// Then encoding a string at the offset 2
UTF8.setBytesFromString('1.3$ ~= 1€', bytes, 2);
UTF8.isNotUTF8(bytes);
// true | false
This function can prove that the text contained in the given bytes is not UTF-8 (or is badly encoded UTF-8). The converse does not hold: a `false` result does not prove the bytes are UTF-8, especially for short strings, for which false positives are frequent.
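The asymmetry can be illustrated with the standard `TextDecoder` in fatal mode. `looksNotUTF8` below is a hypothetical stand-in for the idea, not the library's implementation: a decoding error proves the bytes are not well-formed UTF-8, while a clean decode proves nothing.

```javascript
// Hypothetical stand-in (not the library's code): returns true only when
// the bytes cannot be well-formed UTF-8.
function looksNotUTF8(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return false; // decoded cleanly: may or may not really be UTF-8
  } catch (err) {
    return true; // invalid sequence: certainly not well-formed UTF-8
  }
}

// "café" in ISO-8859-1: 0xE9 announces a 3-byte sequence that never comes.
const latin1Cafe = new Uint8Array([0x63, 0x61, 0x66, 0xE9]);
// Two bytes that are valid UTF-8 ('é') but could just as well be the
// ISO-8859-1 text 'Ã©': the check cannot tell the difference.
const ambiguous = new Uint8Array([0xC3, 0xA9]);

looksNotUTF8(latin1Cafe); // true
looksNotUTF8(ambiguous);  // false
```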
If you try to encode a UTF-8 string into an ArrayBuffer too short to contain the complete string, the encoding fails silently. To avoid this behavior, use strict mode:
UTF8.setBytesFromString('1.3$ ~= 1€', bytes, 2, null, true);
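For comparison, here is a minimal sketch of the same "fail loudly instead of truncating" idea using the standard `TextEncoder.encodeInto`. The `encodeIntoStrict` helper is hypothetical and is not how this library implements strict mode.

```javascript
// Hypothetical helper (not the library's code): write `text` as UTF-8 into
// `target` starting at `offset`, and throw if it does not fit entirely.
function encodeIntoStrict(text, target, offset = 0) {
  const { read, written } = new TextEncoder().encodeInto(text, target.subarray(offset));
  // `read` counts the UTF-16 code units consumed from `text`.
  if (read < text.length) {
    throw new RangeError(
      'Buffer too short: only ' + read + ' of ' + text.length + ' code units were encoded'
    );
  }
  return written; // number of bytes written
}

const roomy = new Uint8Array(14);
encodeIntoStrict('1.3$ ~= 1\u20AC', roomy, 2); // 12 bytes written

const cramped = new Uint8Array(8);
// encodeIntoStrict('1.3$ ~= 1\u20AC', cramped, 2) throws a RangeError
```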
Also available on NPM:
npm install utf-8
- Thanks to the Debian project for its free (as in freedom) Russian/Japanese man pages, used for real-world file tests!