mathiasbynens / esrever

A Unicode-aware string reverser written in JavaScript.

Home Page: https://git.io/esrever

License: MIT License

esrever's Introduction

Esrever

Esrever is a Unicode-aware string reverser written in JavaScript. It allows you to easily reverse any string of Unicode symbols, while handling combining marks and astral symbols just fine. Here’s an online demo.

Why not just use string.split('').reverse().join('')?

The following code snippet is commonly used to reverse a string in JavaScript:

// Don’t use this!
var naiveReverse = function(string) {
  return string.split('').reverse().join('');
};

However, there are some problems with this solution. For example:

naiveReverse('foo 𝌆 bar');
// → 'rab �� oof'
// Where did the `𝌆` symbol go? Whoops!

If you’re wondering why this happens, read up on JavaScript’s internal character encoding.
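
As a quick illustration of what goes wrong (a sketch added for clarity, not part of Esrever): JavaScript strings are sequences of UTF-16 code units, and an astral symbol such as `𝌆` (U+1D306) occupies two of them, a surrogate pair. `split('')` cuts between the two halves, so reversing puts them in the wrong order.

```javascript
// U+1D306 written as an escape so the example does not depend on file encoding.
const symbol = '\u{1D306}';

// One symbol, but two UTF-16 code units (a high and a low surrogate).
const codeUnits = symbol.split('').map(function(unit) {
  return unit.charCodeAt(0).toString(16);
});
console.log(symbol.length); // → 2
console.log(codeUnits);     // → [ 'd834', 'df06' ]

// Reversing the code units yields low surrogate + high surrogate,
// which is not a valid pair and is rendered as two replacement characters.
const broken = symbol.split('').reverse().join('');
console.log(broken === '\uDF06\uD834'); // → true
```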

But there’s more:

naiveReverse('mañana mañana');
// → 'anãnam anañam'
// Wait, so now the tilde is applied to the `a` instead of the `n`? WAT.
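
This second failure has a different cause. The sketch below assumes that `ñ` is written in decomposed form, as `n` followed by U+0303 COMBINING TILDE. A combining mark follows the symbol it modifies, so after a naive reversal it follows a different symbol.

```javascript
// 'mañana' in decomposed form: the tilde is a separate code point after `n`.
const word = 'man\u0303ana';
console.log(word.length); // → 7

const naive = word.split('').reverse().join('');
// The tilde (U+0303) now follows the third symbol, `a`, instead of `n`.
console.log(naive === 'ana\u0303nam'); // → true

// Precomposing with NFC happens to avoid the problem for this word,
// but only because a single-code-point `ñ` (U+00F1) exists.
const composed = word.normalize('NFC');
console.log(composed.length); // → 6
console.log(composed.split('').reverse().join('') === 'ana\u00F1am'); // → true
```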

In order to correctly reverse any given string, Esrever implements an algorithm that was originally developed by Missy ‘Misdemeanor’ Elliott in September 2002:

I put my thang down, flip it, and reverse it. I put my thang down, flip it, and reverse it.

And indeed: by swapping the position of any combining marks with the symbol they belong to, as well as reversing any surrogate pairs before further processing the string, the above issues are avoided successfully. Thanks, Missy!
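
The idea described above can be sketched roughly as follows. This is an illustrative approximation, not Esrever's actual source: the name `sketchReverse` is hypothetical, and using the `\p{M}` Unicode property escape (ES2018+) to decide what counts as a combining mark is an assumption made for brevity.

```javascript
// Hypothetical sketch of the two-step approach; not Esrever's real implementation.

// A base symbol (any code point that is not a mark) followed by one or more marks.
const symbolWithMarks = /(\P{M})(\p{M}+)/gu;
// A high surrogate followed by a low surrogate.
const surrogatePair = /([\uD800-\uDBFF])([\uDC00-\uDFFF])/g;

function sketchReverse(string) {
  // Step 1: move each run of marks in front of its base symbol (with the
  // run itself reversed), then swap the halves of every surrogate pair.
  const prepared = string
    .replace(symbolWithMarks, function(match, base, marks) {
      return sketchReverse(marks) + base;
    })
    .replace(surrogatePair, '$2$1');

  // Step 2: reverse code unit by code unit. The swaps from step 1 are
  // undone by this reversal, so marks and pairs end up in the right order.
  let result = '';
  for (let index = prepared.length - 1; index >= 0; index--) {
    result += prepared.charAt(index);
  }
  return result;
}

console.log(sketchReverse('man\u0303ana') === 'anan\u0303am');           // → true
console.log(sketchReverse('foo \u{1D306} bar') === 'rab \u{1D306} oof'); // → true
```

Because step 1 deliberately puts things in the "wrong" order, the plain reversal in step 2 restores them, which is why reversing twice returns the original string.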

Installation

Via npm:

npm install esrever

Via Bower:

bower install esrever

Via Component:

component install mathiasbynens/esrever

In a browser:

<script src="esrever.js"></script>

In Narwhal, Node.js, and RingoJS:

var esrever = require('esrever');

In Rhino:

load('esrever.js');

Using an AMD loader like RequireJS:

require(
  {
    'paths': {
      'esrever': 'path/to/esrever'
    }
  },
  ['esrever'],
  function(esrever) {
    console.log(esrever);
  }
);

API

esrever.version

A string representing the semantic version number.

esrever.reverse(string)

This function takes a string and returns the reversed version of that string, correctly accounting for Unicode combining marks and astral symbols.

Usage example

var input = 'Lorem ipsum 𝌆 dolor sit ameͨ͆t.';
var reversed = esrever.reverse(input);

console.log(reversed);
// → '.teͨ͆ma tis rolod 𝌆 muspi meroL'

esrever.reverse(reversed) == input;
// → true

Using the esrever binary

To use the esrever binary in your shell, simply install Esrever globally using npm:

npm install -g esrever

After that you will be able to reverse strings from the command line:

$ esrever 'I put my thang down, flip it, and reverse it.'
.ti esrever dna ,ti pilf ,nwod gnaht ym tup I

$ esrever 'H̹̙̦̮͉̩̗̗ͧ̇̏̊̾Eͨ͆͒̆ͮ̃͏̷̮̣̫̤̣ ̵̞̹̻̀̉̓ͬ͑͡ͅCͯ̂͐͏̨̛͔̦̟͈̻O̜͎͍͙͚̬̝̣̽ͮ͐͗̀ͤ̍̀͢M̴̡̲̭͍͇̼̟̯̦̉̒͠Ḛ̛̙̞̪̗ͥͤͩ̾͑̔͐ͅṮ̴̷̷̗̼͍̿̿̓̽͐H̙̙̔̄͜'
H̙̙̔̄͜Ṯ̴̷̷̗̼͍̿̿̓̽͐Ḛ̛̙̞̪̗ͥͤͩ̾͑̔͐ͅM̴̡̲̭͍͇̼̟̯̦̉̒͠O̜͎͍͙͚̬̝̣̽ͮ͐͗̀ͤ̍̀͢Cͯ̂͐͏̨̛͔̦̟͈̻ ̵̞̹̻̀̉̓ͬ͑͡ͅEͨ͆͒̆ͮ̃͏̷̮̣̫̤̣H̹̙̦̮͉̩̗̗ͧ̇̏̊̾

$ cat foo.txt
These are the contents of `foo.txt`.
This is line two.

$ esrever -f foo.txt
.owt enil si sihT
.`txt.oof` fo stnetnoc eht era esehT

$ esrever -l foo.txt
.`txt.oof` fo stnetnoc eht era esehT
.owt enil si sihT

Why not just use the good old rev command instead? Glad you asked. rev doesn’t account for Unicode combining marks:

$ rev <<< 'mañana mañana'
anãnam anañam

On the other hand, the esrever binary returns the expected result:

$ esrever 'mañana mañana'
anañam anañam

See esrever --help for the full list of options.

Support

Esrever has been tested in at least Chrome 27-29, Firefox 3-22, Safari 4-6, Opera 10-12, IE 6-10, Node.js v0.10.0, io.js v1.0.0, Narwhal 0.3.2, RingoJS 0.8-0.11, PhantomJS 1.9.0, and Rhino 1.7RC4.

Unit tests & code coverage

After cloning this repository, run npm install to install the dependencies needed for Esrever development and testing. You may want to install Istanbul globally using npm install istanbul -g.

Once that’s done, you can run the unit tests in Node using npm test or node tests/tests.js. To run the tests in Rhino, Ringo, Narwhal, and web browsers as well, use grunt test.

To generate the code coverage report, use grunt cover.

Author

twitter/mathias
Mathias Bynens

License

Esrever is available under the MIT license.

esrever's People

Contributors

mathiasbynens, redchair123

esrever's Issues

Incorrect reversal of the U+0489 character

I've got a string with this character:
҉ U+0489 COMBINING CYRILLIC MILLIONS SIGN

Before reversing: te҉st te\u0489st
After reversing: ts҉et ts\u0489et

I might be wrong, but I expected tse҉t tse\u0489t instead. Is there a reason why it behaves like this or is it just a bug? I found it when my unit tests failed while checking my code using random zalgo examples.

Support for Hebrew diacritics and other grapheme extenders

Hello, I've used your online demo http://mothereff.in/reverse-string and tried entering some Hebrew with niqqud (diacritics). Here is what I got:

Actual result: שָׁלוֹם (shalom) got reversed to םֹולָׁש (which is nonsense, because lamed ל got the diacritics from ש; compare שָׁ -> לָׁ)
Expected: שָׁלוֹם should at least be reversed to םוֹלשָׁ (so that each letter keeps its diacritics).

What do you think?

Example is just as bad in Firefox.

The example you used here doesn't work: link.

Input:
var input = 'foo 𝌆 bar mañana mañana';
console.log(`Using esrever: ${esrever.reverse(input)}`);
console.log(`Using bad way: ${input.split('').reverse().join('')}`);
Output:
Using esrever: anãnam anañam rab 𝌆 oof
Using bad way: anãnam anañam rab �� oof

Yep, you'd better start working on this weird issue. It doesn't work in Chrome either. This is a very big "oof" for you guys.

(sort of related to #18)

Manana example appears broken in Firefox

In Firefox 48.0.1 on OSX the manana example appears like this for me:

I was wondering what "so now the tilde is applied to the a instead of the n" was referring to until I checked it out in Chrome. Not sure if this is a known problem.

Myanmar vowel signs

Hi,
I'm trying to look through this for a way to detect diacritics and modifiers in Myanmar. I could write a regex for my own use, but I'd like to be consistent with other languages (similar to how this library uses node-unicode-data to list all combining marks).

For example the Myanmar string နေပြည်တော် currently looks like ်ာေတ်ညြပေန when reversed, with lots of misplaced and orphaned diacritics, but it should be တော်ည်ပြနေ

In the Unicode spec these are called "vowel signs" and not "combining characters". Should these be included in esrever?

Question about Step 2

More a question than an issue: why do you manually reverse the string in step 2? After step 1, the otherwise incorrect string.split('').reverse().join('') idiom should work, right? (I tried it and the tests pass.)

Is it for performance reasons? Because I would have thought copying a string over and over again yields O(n²), where the array approach would yield O(n).

Other than that thanks for the insightful article on esrever. :)

don't split combined emoji

  • input: 🏄🏼‍♂️
  • expected output: 🏄🏼‍♂️
  • actual output: ♂‍🏼🏄

I wonder how you can achieve tokenising a combined emoji into 1 character.

This library seems to correctly do it for me:
https://github.com/bluelovers/runes

It will tokenise 🏄🏼‍♂️ as just 1 character. (it doesn't flip sentences though...)

BTW, would you consider extracting just the tokeniser part of this library, without the reverse part, into a separate npm package?
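
One way to treat such a sequence as a single unit, sketched here as an assumption rather than as anything Esrever provides: `Intl.Segmenter` with `granularity: 'grapheme'` splits a string into extended grapheme clusters, which should keep ZWJ emoji sequences together in engines that ship it with current Unicode data. The helper names `graphemes` and `reverseGraphemes` are hypothetical.

```javascript
// Hypothetical helpers; requires an engine with Intl.Segmenter (e.g. Node.js 16+).
function graphemes(string) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  // Each item yielded by segment() has a `segment` property holding the text.
  return Array.from(segmenter.segment(string), function(part) {
    return part.segment;
  });
}

function reverseGraphemes(string) {
  return graphemes(string).reverse().join('');
}

// Surfer + skin tone modifier + ZWJ + male sign + variation selector-16.
const surfer = '\u{1F3C4}\u{1F3FC}\u200D\u2642\uFE0F';

console.log(graphemes(surfer).length);
console.log(reverseGraphemes('a' + surfer + 'b') === 'b' + surfer + 'a');
```

The same helper also keeps a base letter together with its combining marks, since those form one grapheme cluster as well.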

Help save Esrever (I know we can't use `str.split("").reverse().join("")`, but can't we use `Array.from(str).reverse().join("")` or even `[...str].reverse().join("")`?)

These days, there are loads of new one-liner JS methods and I think this repo needs a reason to be better than

window.nativeReverse = function(str) {
  return (Array.from(str)).reverse().join("");
}

or

window.nativeReverse = function(str) {
  return [...str].reverse().join("");
}

Yes. These are both one-liners that even reproduce esrever's result for the "foo 𝌆 bar" example. Esrever also gets "mañana mañana" wrong in Chrome and Firefox, and it has the 'zero-width joiner' issue as well. Let's hope to see this issue fixed soon, or let esrever fade into the vastness of GitHub.
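
For comparison, a small sketch of what the iterator-based one-liners do and do not handle. The function name `codePointReverse` is hypothetical, and the decomposed spelling of `ñ` (`n` followed by U+0303) is an assumption. String iteration walks code points, so surrogate pairs stay intact, but a combining mark is still a separate code point and ends up after the wrong symbol.

```javascript
// Hypothetical name; equivalent to the Array.from / spread one-liners above.
function codePointReverse(str) {
  return Array.from(str).reverse().join('');
}

// Astral symbols survive, because iteration never splits a surrogate pair.
console.log(codePointReverse('foo \u{1D306} bar') === 'rab \u{1D306} oof'); // → true

// Combining marks do not: the tilde ends up after `a` rather than `n`.
console.log(codePointReverse('man\u0303ana') === 'ana\u0303nam'); // → true
console.log(codePointReverse('man\u0303ana') === 'anan\u0303am'); // → false
```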
