eloraiby / arabtype
A small and simple implementation that transforms isolated Arabic UTF-8 character strings into contextual forms.
The following text بذيء appears to produce the wrong output by connecting the characters. Is there any way to fix it?
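One plausible cause, assuming the library picks the form index the way the Java port quoted later in this thread does: the letter before the final hamza is treated as linking forward simply because the next character is an Arabic letter. Hamza (U+0621) does not join on either side, so the ي in بذيء should stay isolated instead of taking its initial form. The sketch below is not the library's code; `joinsToPrevious` and `formIndex` are hypothetical names used only to show the adjusted condition.

```java
/** Hypothetical sketch: do not link a letter forward into a hamza. */
final class HamzaJoiningSketch {
    static final char LETTER_START = '\u0621';
    static final char LETTER_FINAL = '\u064A';
    static final char HAMZA = '\u0621';

    static boolean isArabicLetter(char c) {
        return c >= LETTER_START && c <= LETTER_FINAL;
    }

    /** True if c can attach to the letter before it; a standalone hamza cannot. */
    static boolean joinsToPrevious(char c) {
        return isArabicLetter(c) && c != HAMZA;
    }

    /**
     * Form index in the table's column order: 0 isolated, 1 final, 2 initial, 3 medial.
     * The two booleans say whether the previous and current letters can link forward.
     */
    static int formIndex(boolean prevLinksForward, boolean chLinksForward, char next) {
        int forward = (chLinksForward && joinsToPrevious(next)) ? 1 : 0;
        int backward = prevLinksForward ? 1 : 0;
        return (forward << 1) | backward;
    }
}
```

With this condition, the ي in بذيء (previous letter ذ does not link forward, next letter is hamza) gets index 0, the isolated form. Other non-joining code points in the same range would need similar treatment; this sketch only covers hamza.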
I adapted my algorithm to Java. It works great, but letters that don't need to be transformed don't seem to work:
This is the text:
مستوى صعوبة الحاسوب
This is how it is rendered:
مستو صعوبة الحاسوب
Basically, the 'ى' is missing!
Is that a bug in my code?
public static char correct(char prev, char next, char ch) {
    if ((ch >= ARABIC_LETTER_START) && (ch <= ARABIC_LETTER_FINAL)) {
        // convert Arabic letter - https://github.com/eloraiby/arabtype/blob/master/arabtype.c
        boolean isLa = isLamAlef(ch, next);
        boolean isApl = isAlefPrevLam(prev, ch);
        boolean isLapl = isLa | isApl;
        // determine char to return
        if (isLapl) {
            int index = ((isLinkingType(ch) ? 1 : 0) << 1) | (isLinkingType(prev) ? 1 : 0);
            return (char)ARABIC_FORMS_B[next - ARABIC_LETTER_START][1][index];
        }
        else {
            if (isApl) {
                return ch; // skip previously processed lam alef
            }
            else {
                int index = (((isArabicLetter(next) ? 1 : 0) & (isLinkingType(ch) ? 1 : 0)) << 1) | (isLinkingType(prev) ? 1 : 0);
                return (char)ARABIC_FORMS_B[ch - ARABIC_LETTER_START][0][index];
            }
        }
        // NOTE: compact form of the above...
        // int index = ((((isLapl | isArabicLetter(next)) & isLinkingType(ch)) ? 1 : 0) << 1) | (isLinkingType(prev) ? 1 : 0);
        // int ref = (next * (isLa ? 1 : 0)) + (ch * (isLa ? 0 : 1)) - ARABIC_LETTER_START;
        // return (char)ARABIC_FORMS_B[ref][isLapl ? 1 : 0][index];
    }
    else {
        // not an Arabic letter to be converted!
        return ch;
    }
}

private static final char ARABIC_LETTER_START = 0x0621;
private static final char ARABIC_LETTER_FINAL = 0x064A;
// Column indices into each ARABIC_FORMS_B row: isolated (0), ending (1), initial (2), medial (3).
// private static final int ENDING = 1;
private static final int INITIAL = 2;
private static final int MEDIAL = 3;
private static final int UNICODE_LAM = 0x644;

private static boolean isArabicLetter(char cp) {
    return ( cp >= ARABIC_LETTER_START && cp <= ARABIC_LETTER_FINAL );
}

// true when cp is lam and next is an alef variant that has a lam-alef ligature
private static boolean isLamAlef(char cp, char next) {
    return cp == UNICODE_LAM && isArabicLetter(next) && ARABIC_FORMS_B[next - ARABIC_LETTER_START][1][INITIAL] != 0;
}

// true when cp is an alef variant that follows a lam (already consumed by the ligature)
private static boolean isAlefPrevLam(char prev, char cp) {
    return prev == UNICODE_LAM && isArabicLetter(cp) && ARABIC_FORMS_B[cp - ARABIC_LETTER_START][1][INITIAL] != 0;
}

// true when cp has a medial form, i.e. it can connect to the following letter
private static boolean isLinkingType(char cp) {
    return isArabicLetter(cp) && ARABIC_FORMS_B[cp - ARABIC_LETTER_START][0][MEDIAL] != 0;
}
/** Table to convert to Arabic presentation form B. */
private static final int[][][] ARABIC_FORMS_B = {
{ {0xFE80, 0xFE80, 0, 0}, {-1, -1, 0, 0} }, // hamza (0)
{ {0xFE81, 0xFE82, 0, 0}, {-1, -1, 0xFEF5, 0xFEF6} }, // 2alif madda (1)
{ {0xFE83, 0xFE84, 0, 0}, {-1, -1, 0xFEF7, 0xFEF8} }, // 2alif hamza (2)
{ {0xFE85, 0xFE86, 0, 0}, {-1, -1, 0, 0} }, // waw hamza (3)
{ {0xFE87, 0xFE88, 0, 0}, {-1, -1, 0xFEF9, 0xFEFA} }, // 2alif hamza maksoura (4)
{ {0xFE89, 0xFE8A, 0xFE8B, 0xFE8C}, {-1, -1, 0, 0} }, // 2alif maqsoura hamza (5)
{ {0xFE8D, 0xFE8E, 0, 0}, {-1, -1, 0xFEFB, 0xFEFC} }, // 2alif (6)
{ {0xFE8F, 0xFE90, 0xFE91, 0xFE92}, {-1, -1, 0, 0} }, // ba2 (7)
{ {0xFE93, 0xFE94, 0, 0}, {-1, -1, 0, 0} }, // ta2 marbouta (8)
{ {0xFE95, 0xFE96, 0xFE97, 0xFE98}, {-1, -1, 0, 0} }, // ta2 (9)
{ {0xFE99, 0xFE9A, 0xFE9B, 0xFE9C}, {-1, -1, 0, 0} }, // tha2 (10)
{ {0xFE9D, 0xFE9E, 0xFE9F, 0xFEA0}, {-1, -1, 0, 0} }, // jim (11)
{ {0xFEA1, 0xFEA2, 0xFEA3, 0xFEA4}, {-1, -1, 0, 0} }, // 7a2 (12)
{ {0xFEA5, 0xFEA6, 0xFEA7, 0xFEA8}, {-1, -1, 0, 0} }, // kha2 (13)
{ {0xFEA9, 0xFEAA, 0, 0}, {-1, -1, 0, 0} }, // dal (14)
{ {0xFEAB, 0xFEAC, 0, 0}, {-1, -1, 0, 0} }, // dhal (15)
{ {0xFEAD, 0xFEAE, 0, 0}, {-1, -1, 0, 0} }, // ra2 (16)
{ {0xFEAF, 0xFEB0, 0, 0}, {-1, -1, 0, 0} }, // zayn (17)
{ {0xFEB1, 0xFEB2, 0xFEB3, 0xFEB4}, {-1, -1, 0, 0} }, // syn (18)
{ {0xFEB5, 0xFEB6, 0xFEB7, 0xFEB8}, {-1, -1, 0, 0} }, // shin (19)
{ {0xFEB9, 0xFEBA, 0xFEBB, 0xFEBC}, {-1, -1, 0, 0} }, // sad (20)
{ {0xFEBD, 0xFEBE, 0xFEBF, 0xFEC0}, {-1, -1, 0, 0} }, // dad (21)
{ {0xFEC1, 0xFEC2, 0xFEC3, 0xFEC4}, {-1, -1, 0, 0} }, // tah (22)
{ {0xFEC5, 0xFEC6, 0xFEC7, 0xFEC8}, {-1, -1, 0, 0} }, // thah (23)
{ {0xFEC9, 0xFECA, 0xFECB, 0xFECC}, {-1, -1, 0, 0} }, // 3ayn (24)
{ {0xFECD, 0xFECE, 0xFECF, 0xFED0}, {-1, -1, 0, 0} }, // ghayn (25)
{ { 0, 0, 0, 0}, {-1, -1, 0, 0} }, // (26)
{ { 0, 0, 0, 0}, {-1, -1, 0, 0} }, // (27)
{ { 0, 0, 0, 0}, {-1, -1, 0, 0} }, // (28)
{ { 0, 0, 0, 0}, {-1, -1, 0, 0} }, // (29)
{ { 0, 0, 0, 0}, {-1, -1, 0, 0} }, // (30)
{ {0x0640, 0x0640, 0x0640, 0x0640}, {-1, -1, 0, 0} }, // wasla (31)
{ {0xFED1, 0xFED2, 0xFED3, 0xFED4}, {-1, -1, 0, 0} }, // fa2 (32)
{ {0xFED5, 0xFED6, 0xFED7, 0xFED8}, {-1, -1, 0, 0} }, // qaf (33)
{ {0xFED9, 0xFEDA, 0xFEDB, 0xFEDC}, {-1, -1, 0, 0} }, // kaf (34)
{ {0xFEDD, 0xFEDE, 0xFEDF, 0xFEE0}, {-1, -1, 0, 0} }, // lam (35)
{ {0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4}, {-1, -1, 0, 0} }, // mim (36)
{ {0xFEE5, 0xFEE6, 0xFEE7, 0xFEE8}, {-1, -1, 0, 0} }, // noon (37)
{ {0xFEE9, 0xFEEA, 0xFEEB, 0xFEEC}, {-1, -1, 0, 0} }, // ha2 (38)
{ {0xFEED, 0xFEEE, 0, 0}, {-1, -1, 0, 0} }, // waw (39)
{ {0xFEEF, 0xFEF0, 0, 0}, {-1, -1, 0, 0} }, // 2alif maksoura (40)
{ {0xFEF1, 0xFEF2, 0xFEF3, 0xFEF4}, {-1, -1, 0, 0} }, // ya2 (41)
};
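A missing 'ى' with otherwise correct output usually points at the table rather than the joining logic. In مستوى the preceding waw does not link forward, so the letter is isolated and the code returns the first entry of row 40. That entry has to be U+FEEF (ARABIC LETTER ALEF MAKSURA ISOLATED FORM); the nearby U+FEFF is the zero-width no-break space and renders as nothing. The sketch below is not part of the port: it uses `java.text.Normalizer` to check that a presentation form decomposes to the intended base letter, and the class and method names are assumptions made for illustration.

```java
import java.text.Normalizer;

/** Hypothetical sanity check for entries of a presentation-form table. */
final class FormCheck {
    /**
     * Returns true if NFKC normalization maps the presentation form back
     * to the given base letter, which holds for genuine contextual forms.
     */
    static boolean isFormOf(char form, char base) {
        String normalized = Normalizer.normalize(String.valueOf(form), Normalizer.Form.NFKC);
        return normalized.equals(String.valueOf(base));
    }

    public static void main(String[] args) {
        char alefMaksura = '\u0649';
        System.out.println(isFormOf('\uFEEF', alefMaksura)); // isolated form
        System.out.println(isFormOf('\uFEF0', alefMaksura)); // final form
        System.out.println(isFormOf('\uFEFF', alefMaksura)); // zero-width no-break space
    }
}
```

Running the same check over every non-zero entry of column 0 in the table (against `ARABIC_LETTER_START + row`) would catch this kind of one-digit slip without inspecting rendered output.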
I am successfully using your library. It works great - Thanks :)
I wonder, would it be possible to add support for Farsi? I tried to render Farsi text, but the glyphs don't appear correctly connected. I believe part of the Presentation Forms-A characters would need to be used. How difficult would that be to add? Is it possible?
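As a rough idea of what Farsi support could involve: the additional Persian letters sit outside the U+0621..U+064A range, and their contextual forms live in Arabic Presentation Forms-A. The sketch below is not part of the library; the class `FarsiForms` and its methods are hypothetical. It keeps the same column order as the Java port's table in this thread (isolated, final, initial, medial), with 0 marking a form that does not exist.

```java
import java.util.Map;

/** Hypothetical lookup of Persian letters in Arabic Presentation Forms-A. */
final class FarsiForms {
    // Columns: isolated, final, initial, medial. 0 means "no such form".
    private static final Map<Character, int[]> FORMS = Map.of(
        '\u067E', new int[] {0xFB56, 0xFB57, 0xFB58, 0xFB59}, // peh
        '\u0686', new int[] {0xFB7A, 0xFB7B, 0xFB7C, 0xFB7D}, // tcheh
        '\u0698', new int[] {0xFB8A, 0xFB8B, 0, 0},           // jeh
        '\u06A9', new int[] {0xFB8E, 0xFB8F, 0xFB90, 0xFB91}, // keheh
        '\u06AF', new int[] {0xFB92, 0xFB93, 0xFB94, 0xFB95}, // gaf
        '\u06CC', new int[] {0xFBFC, 0xFBFD, 0xFBFE, 0xFBFF}  // farsi yeh
    );

    static boolean isFarsiLetter(char c) {
        return FORMS.containsKey(c);
    }

    /** A letter links forward only if it has a medial form. */
    static boolean isLinkingType(char c) {
        int[] forms = FORMS.get(c);
        return forms != null && forms[3] != 0;
    }

    /** Returns the requested form, or the character unchanged if none exists. */
    static char form(char c, int index) {
        int[] forms = FORMS.get(c);
        return (forms == null || forms[index] == 0) ? c : (char) forms[index];
    }
}
```

The remaining work would be in the range checks: anything that tests "is this an Arabic letter" against U+0621..U+064A would also need to accept these code points, so that neighbouring letters link to them correctly.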