Comments (4)
Hey @jrochkind,
This looks pretty interesting, but unfortunately we don't have plans to incorporate UTR#30 into TwitterCLDR at the moment. At first glance, it seems like a fairly straightforward algorithm, and I would happily accept a pull request. TwitterCLDR's current transformations are really normalizations, one of which UTR#30 specifically depends on (NFD), so at least that's already done. You can make use of NFD normalization using the corresponding class:
TwitterCldr::Normalization::NFD.normalize(text)
# alternatively:
text.localize.normalize(:using => :NFD)
Good luck!
from twitter-cldr-rb.
Thanks! I may try a pull request in the future.
from twitter-cldr-rb.
Do you have any advice as to how to use the mapping data files of the sort here with TwitterCLDR? That is, is there already a part of TwitterCLDR written to use this kind of mapping data, but applied to other mapping data? I ask because it seems like this may be some kind of standard Unicode mapping data file, but I'm not sure.
from twitter-cldr-rb.
It looks like those files contain a series of folding rules that map one character (or range of characters) to another. The algorithm in UTR#30 says to perform the following steps:
a. Apply optional folding operations (i.e. rules from the solr files)
b. Apply canonical decomposition (described above)
c. Repeat (a) and (b) until stable (I think "stable" means "until you can't decompose any more")
d. Apply composition if necessary (only if you want the string in composed form, based on your technical requirements)
Applying a folding operation might look something like this: given a rule like 058A>002D, every time you encounter a "058A" character, you'd replace it with "002D". Bear in mind that I only took a cursory glance over the UTR#30 spec, so that might be incorrect. Indeed, the spec is quite a bit more complicated than that.
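For what it's worth, here is a minimal sketch of steps (a) through (c) in plain Ruby. It assumes rules are single code point mappings written like 058A>002D, it does not handle ranges or any other syntax the real data files may use, and it takes "stable" to mean "another pass no longer changes the string". The names parse_folding_rules and fold are made up for illustration and are not part of TwitterCLDR. To keep it self-contained it uses Ruby's built-in String#unicode_normalize for the NFD step; TwitterCldr::Normalization::NFD.normalize could presumably be swapped in.

```ruby
# Hypothetical helper: turn lines like "058A>002D" into a { "\u058A" => "-" } hash.
# Assumption: one source code point per rule, hex without a "U+" prefix,
# optional "#" comments. Ranges are NOT handled in this sketch.
def parse_folding_rules(lines)
  lines.each_with_object({}) do |line, map|
    rule = line.sub(/#.*/, "").strip
    next if rule.empty?

    source, target = rule.split(">", 2)
    next if target.nil? # not a mapping rule, skip it

    # The target may be several space-separated code points, or empty (deletion).
    map[[source.strip.hex].pack("U")] = target.split.map(&:hex).pack("U*")
  end
end

# Hypothetical helper: fold, decompose (NFD), and repeat until nothing changes.
# max_passes is a safety net against rules that map in a cycle.
def fold(text, rules, max_passes: 10)
  max_passes.times do
    folded = text.each_char.map { |ch| rules.fetch(ch, ch) }.join # step (a)
    folded = folded.unicode_normalize(:nfd)                       # step (b)
    return folded if folded == text                               # step (c): stable
    text = folded
  end
  text
end

rules = parse_folding_rules(["058A>002D  # Armenian hyphen to hyphen-minus"])
puts fold("a\u058Ab", rules) # prints "a-b"
```

Step (d), recomposition, is left out because it only matters if you need the result in composed form.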
If you do decide to work on this feature and submit a PR, I'd suggest looking around for a test file. Unicode publishes a set of test data (inputs and correct outputs) for algorithms like normalization and bidi, so it's possible they have one for folding as well.
Good luck!
from twitter-cldr-rb.