This project forked from ufcpp/graphemesplitter

A C# implementation of the Unicode grapheme cluster breaking algorithm

License: MIT License

GraphemeSplitter

A C# implementation of the Unicode grapheme cluster breaking algorithm.

NuGet package

https://www.nuget.org/packages/GraphemeSplitter/

Install-Package GraphemeSplitter

Sample

using GraphemeSplitter;
using static System.Console;
using static System.String;

public partial class Program
{
    static string Split(string s) => Join(", ", s.GetGraphemes());

    static void Main()
    {
        WriteLine(Split("👨‍👨‍👧‍👦👩‍👩‍👧‍👦👨‍👨‍👧‍👦")); // 👨‍👨‍👧‍👦, 👩‍👩‍👧‍👦, 👨‍👨‍👧‍👦
    }
}

Web Sample:

Razor Page Sample

Implementation

This library basically implements the grapheme cluster boundary rules of UAX #29 (http://unicode.org/reports/tr29/).

Example:

| type | text | split result |
| --- | --- | --- |
| diacritical marks | à̠́̑ḅ̂̃̒c̣̤̃̄d̥̦̅̆ | "à̠́̑", "ḅ̂̃̒", "c̣̤̃̄", "d̥̦̅̆" |
| variation selector | 葛葛󠄀葛󠄁 | "葛", "葛󠄀", "葛󠄁" |
| asian syllable | 안녕하세요 | "안", "녕", "하", "세", "요" |
| family emoji | 👨‍👨‍👧‍👦👩‍👩‍👧‍👦👨‍👨‍👧‍👦 | "👨‍👨‍👧‍👦", "👩‍👩‍👧‍👦", "👨‍👨‍👧‍👦" |
| emoji skin tone | 👩🏻👱🏼👧🏽👦🏾 | "👩🏻", "👱🏼", "👧🏽", "👦🏾" |

However, it relaxes the GB10, GB12, and GB13 rules for simplicity.

original:

  • GB10 … (E_Base | EBG) Extend* × E_Modifier
  • GB12 … sot (RI RI)* RI × RI
  • GB13 … [^RI] (RI RI)* RI × RI

implemented:

  • GB10 … (E_Base | EBG) × Extend
  • GB10 … (E_Base | EBG | Extend) × E_Modifier
  • GB12/GB13 … RI × RI

The difference is:

| sequence | original | implemented |
| --- | --- | --- |
| à🏻 (U+61, U+300, U+1F3FB) | × ÷ | × × |
| 🇯🇵🇺🇸 (U+1F1EF, U+1F1F5, U+1F1FA, U+1F1F8) | × ÷ × | × × × |

(where ÷ and × mean boundary and no boundary, respectively.)
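
The two rows above can be reproduced with a small, self-contained sketch. This is not the library's code: `BoundarySketch`, `Marks`, and the `strict` flag are hypothetical names, the classifier is a toy that only knows a few code point ranges chosen for illustration, and only GB9, GB10, and GB12/GB13 are modeled (every other gap is treated as a boundary). E_Base and EBG are merged into one class here.

```csharp
using System;
using System.Collections.Generic;

public static class BoundarySketch
{
    // Toy subset of the grapheme break classes (assumption: not the full Unicode data).
    enum Cls { Other, Extend, EBase, EModifier, RI }

    static Cls Classify(int cp)
    {
        if (cp >= 0x0300 && cp <= 0x036F) return Cls.Extend;      // combining diacritical marks
        if (cp >= 0x1F3FB && cp <= 0x1F3FF) return Cls.EModifier; // skin tone modifiers
        if (cp >= 0x1F1E6 && cp <= 0x1F1FF) return Cls.RI;        // regional indicators
        if ((cp >= 0x1F466 && cp <= 0x1F469) || cp == 0x1F471)
            return Cls.EBase;                                     // a few people emoji only
        return Cls.Other;
    }

    // Decode the UTF-16 string into code points and classify each one.
    static List<Cls> Classes(string s)
    {
        var result = new List<Cls>();
        for (int i = 0; i < s.Length; i += char.IsSurrogatePair(s, i) ? 2 : 1)
            result.Add(Classify(char.ConvertToUtf32(s, i)));
        return result;
    }

    // Returns one mark per gap between adjacent code points:
    // "×" for no boundary, "÷" for boundary.
    // strict: true  -> rules as written under "original"
    // strict: false -> relaxed rules as written under "implemented"
    public static string Marks(string s, bool strict)
    {
        var c = Classes(s);
        var marks = new List<string>();
        int riRun = 0;           // consecutive RIs ending at the previous code point
        bool afterEBase = false; // previous code points end with E_Base Extend*
        for (int i = 1; i < c.Count; i++)
        {
            Cls prev = c[i - 1], cur = c[i];
            riRun = prev == Cls.RI ? riRun + 1 : 0;
            if (prev == Cls.EBase) afterEBase = true;
            else if (prev != Cls.Extend) afterEBase = false;

            bool join;
            if (cur == Cls.Extend)
                join = true;                                  // GB9: × Extend
            else if (cur == Cls.EModifier)
                join = strict
                    ? afterEBase                              // E_Base Extend* × E_Modifier
                    : prev == Cls.EBase || prev == Cls.Extend; // relaxed GB10
            else if (cur == Cls.RI && prev == Cls.RI)
                join = !strict || riRun % 2 == 1;             // strict: pair RIs two by two
            else
                join = false;                                 // everything else breaks
            marks.Add(join ? "×" : "÷");
        }
        return string.Join(" ", marks);
    }

    public static void Main()
    {
        const string accent = "a\u0300\U0001F3FB";                           // a, U+300, U+1F3FB
        const string flags = "\U0001F1EF\U0001F1F5\U0001F1FA\U0001F1F8";     // JP, US
        Console.WriteLine(Marks(accent, strict: true));   // × ÷
        Console.WriteLine(Marks(accent, strict: false));  // × ×
        Console.WriteLine(Marks(flags, strict: true));    // × ÷ ×
        Console.WriteLine(Marks(flags, strict: false));   // × × ×
    }
}
```

The relaxed rules only look at the two code points on either side of a gap, so no run counting or look-behind state is needed. The cost is that the unusual sequences in the table are merged into a single cluster.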

Acknowledgements

This library is influenced by
