w3c / alreq

Documenting gaps and requirements for support of Arabic and Persian on the Web and in eBooks.

License: Other

HTML 92.03% CSS 6.63% Groovy 1.16% Shell 0.15% JavaScript 0.03%
arabic writing-systems persian scripts typography text-layout

alreq's People

Contributors

behnam, deniak, ebraminio, manishearth, mostafah, moyogo, ntounsi, plehegar, r12a, shervinafshar, simplyahmazing, tgraham-antenna, titusnemeth, tntypography, xfq

alreq's Issues

Section on Abbreviation

There are various methods for abbreviating words in Persian/Arabic text, and I think covering best practices can help with software tooling.

If so, we can add the text to chapter 3, "Characters and Words", or to an appendix.

Anyway, I'll keep posting some examples here for reference.

Drafting “justification”

We have a wiki page to hold the draft for this section. The discussion can happen here. At the moment the draft is only an outline. I’ll be adding text under each title.

Uniform font family and size in sample images

As we create more sample images, we need to make sure they follow a uniform look, especially in font family and size, both for the sample (Arabic) text and for the annotations (Latin).

For example, the ARABIC LETTER MEEM image has a larger Arabic font size and large sans-serif annotations, but the ARABIC LETTER REH image, sitting close to it, has smaller sizes and sans annotations.

[Screenshot, 2017-04-03]

Test

please ignore

Drafting “direction”

I have a draft here in Google Drive. It is very short, because the role of this section is only to introduce the direction of the Arabic script briefly and to point to references for further reading, and I couldn’t think of more information to put here. If you have any suggestions for what can be added, please leave a comment.

Ditto Mark in Arabic script

In handwritten Persian text, a symbol similar to a double quotation mark is used as the ditto mark.

In movable-type Persian text, the right-pointing double angle quotation mark (U+00BB) is sometimes used as the ditto mark. Here's an example from Persian Grammar (by Parviz Natel-Khanlari):
[Screenshot, 2017-02-27]

IMHO, the angle marks are used instead of non-angle marks only for two reasons:

  • the practice follows the English tradition of using a double quotation mark as the ditto mark, while Persian typesetting usually uses angle marks in place of any quotation marks,
  • the single and double quotation mark characters were not available in the Persian type case, so the typesetter would have had to go to another case to use them.

I believe this is one of those things that may eventually need a new character, as both the meaning and the shape of the mark are different from those of any existing character. Either way, we need more evidence from various languages.

Shortcomings of Characters Table

Hi,

In the section "Punctuations and symbols", I see that these three symbols are marked as used only in Farsi, not in Arabic:

U+066A ARABIC PERCENT SIGN
U+066B ARABIC DECIMAL SEPARATOR
U+066C ARABIC THOUSANDS SEPARATOR

Which characters are used for that purpose in Arabic documents? Just the Western/European percent, comma, and point?

Any advice very much appreciated,

  • Michael

Section on overlapping letter glyphs

In various styles of the Persian/Arabic script, the glyphs of letters may overlap intentionally. I think we need to cover this script feature under the letter positioning section.

An example from Zarnegar 5.2 Catalog (http://sinasoft.com/Downloads/zarnegar5.2/catalog/Zar52Cat.pdf): on the right we have "before repositioning" and on the left we have "after repositioning".

[Image]

We should also note that sometimes, mostly because of bad (mainly practical) font glyph data, unintentional glyph overlap may occur. This unintentional behavior is different from the intentional design mentioned above, and a shaping engine may decide to be smart about this and resolve it somehow.

Expand on the “ligatures” section

Our ligatures section (currently numbered as 2.4.2) only talks about lam and alef ligatures. It should cover other ligatures as well.

Also, this is the last sentence of the section:

Each of these ligatures also provides a special shape for joining from its right side (to the preceding letter).

It is talking about variations of the lam and alef ligature, but that’s not clear enough and it can confuse readers into thinking that all ligatures are right-joining only.

Requiring control over different font size, baseline shift, and line height for different scripts

Topic Font and Typographical considerations already mentions “font size considerations for mixed-script text,” i.e., being able to select different font sizes for different scripts when mixing them.

My suggestion is that font size is not the only property that needs that level of control for different scripts: other text-related properties, like baseline shift, line height, and letter spacing, require different values for mixed scripts too.

This issue also needs more thought about how we want to say these different properties should be applied: per script or per font?

Mentioning glyph overlap behavior, specially with effects

Arabic glyphs overlap where they join. This shows up in an unwelcome form in some contexts, such as when opacity or a text border is applied to text.

The following image demonstrates this problem: glyph joins should not be seen (like in the “normal” text), but they become visible when transparency or text border is applied in a poor implementation.

[Image: glyph overlap]

A proper implementation would unify glyph paths into a single one before applying these filters. This is what the correct rendering would result in:

[Image: glyph overlap, proper rendering]

Is this something we should mention as a requirement?

New reference suggestion: The Style Guide from IUP

@shervinafshar, The Style Guide from IUP is an interesting reference, especially its second half, which covers recommendations for mathematics, physics, and chemistry writing, book pages, tables of contents, and glossaries.

But I don’t think that a digital version exists. Just wanted to let you know about it. We may be able to find a way to share it with the group as we need it in the future.

Create a section Arabic Script Overview

While working on the Tibetan layout requirements doc at http://w3c.github.io/tlreq/ it seemed useful to separate out the information that simply described how the script worked into a separate section, Tibetan script overview. The actual layout requirements start in the following section, Typography for Tibetan Characters.

I suspect that a similar Arabic Script Overview section could be useful to gather together descriptive text for alreq.

Drafting "Characters in Use"

@mostafah started a draft for the "Characters in Use" topic. The initial draft can be found here.

We're going to discuss the draft here and add the content to the Google Doc.

Deprecated Arabic presentation forms considered helpful

Dear all,

When individual Arabic letters within a word are styled (e.g. with colors), they may fail to join in some rendering tools. For example, the letters join in Firefox but not in Safari or Chrome.
The characters in Arabic Presentation Forms-B (U+FE70..U+FEFF) and Arabic Presentation Forms-A (U+FB50..U+FDFF) are deprecated, but they may help solve this problem.
Please see the example here:
http://www.w3c.org.ma/Tests/joiningColoredLetters.html
where deprecated contextual letter forms or deprecated ligature characters are used (cases 4 and 5).

Opinion?

Najib
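
One cost of storing presentation forms is that the text no longer matches the ordinary letters unless it is normalized first. The following is a minimal Python sketch, using only the standard unicodedata module, of how compatibility normalization (NFKC) maps presentation forms back to the base letters. The sample strings are chosen for illustration and are not taken from the test page above.

```python
import unicodedata

# "lam alef" spelled with explicit presentation forms:
# U+FEDF ARABIC LETTER LAM INITIAL FORM + U+FE8E ARABIC LETTER ALEF FINAL FORM
shaped = "\uFEDF\uFE8E"

# The same pair as a single ligature character:
# U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
ligature = "\uFEFB"

# NFKC applies the compatibility decompositions, which turn each
# presentation form into the ordinary letter(s) it stands for.
plain_from_shaped = unicodedata.normalize("NFKC", shaped)
plain_from_ligature = unicodedata.normalize("NFKC", ligature)

for label, text in (("shaped", plain_from_shaped), ("ligature", plain_from_ligature)):
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```

Both strings normalize to U+0644 U+0627, so search and comparison code would need this normalization step wherever presentation forms are used as a styling workaround.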

How to handle transcriptions in the glossary

Could we add vowelling to the glossary items? Not only would this be useful for cases such as رُقعة vs رِقعة , but it would also help non-native speakers, like me, to know how to pronounce the words.
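
If the glossary entries are vowelled, tools that look them up may want to compare words with the vowel marks removed. A minimal Python sketch of that idea, assuming it is acceptable to drop every nonspacing combining mark (general category Mn); strip_marks is a hypothetical helper name, not part of any existing tooling for the document:

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Remove nonspacing combining marks (category Mn), which include
    the Arabic short-vowel marks such as damma and kasra."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# The two glossary words from this issue, written with escapes:
with_damma = "\u0631\u064F\u0642\u0639\u0629"  # reh + DAMMA + qaf + ain + teh marbuta
with_kasra = "\u0631\u0650\u0642\u0639\u0629"  # reh + KASRA + qaf + ain + teh marbuta

print(with_damma == with_kasra)                            # the vowelled forms differ
print(strip_marks(with_damma) == strip_marks(with_kasra))  # the bare forms match
```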

Variant letter shapes

I believe our character list contains all of the characters ق and ڧ, and the characters ف and ڢ? Given that each pair represents two visual variants of the same letter, if someone in North Africa wants to use a shape relevant to their region, such as ڢ, is it appropriate to use a separate character, or should one expect the difference to be produced by using a different font? Same question for the qaf.

Of course, the reason I ask is that spelling the same word in two different ways is not ideal for searching, security (e.g. IDNs), etc.
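
To make the searching concern concrete, here is a minimal Python sketch. It assumes an application chooses to fold the two regional variants onto the common letters before comparing strings. The folding table is hypothetical and application-defined; Unicode normalization does not unify these characters.

```python
import unicodedata

# Hypothetical, application-defined folding table (not a Unicode equivalence):
#   U+06A2 ARABIC LETTER FEH WITH DOT MOVED BELOW -> U+0641 ARABIC LETTER FEH
#   U+06A7 ARABIC LETTER QAF WITH DOT ABOVE       -> U+0642 ARABIC LETTER QAF
VARIANT_FOLD = str.maketrans({"\u06A2": "\u0641", "\u06A7": "\u0642"})


def fold_variants(text: str) -> str:
    """Map the regional variant letters to their common counterparts."""
    return text.translate(VARIANT_FOLD)


variant_text = "\u06A2\u06A7"
common_text = "\u0641\u0642"

# NFKC leaves the variant letters untouched, so normalization alone
# does not make the two spellings compare equal.
print(unicodedata.normalize("NFKC", variant_text) == common_text)
# Explicit folding does.
print(fold_variants(variant_text) == common_text)
```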

Vertical Arabic

(Following issue w3c/i18n-drafts#81)

Independently of body text that normally runs vertically (e.g. CJK, Mongolian), short Arabic text can be set vertically, e.g. on a book spine or in a table header.

Although the flow direction may depend on writing style, country, or other factors, it might be desirable to have the text read from top to bottom.

Example-1) Vertical Arabic text running from top to bottom, and Latin from bottom to top (book spine).
https://app.box.com/s/i4j6xx6kn6gu5nt2ha5ynno7qemoxlq4

writing-mode: sideways-lr; appears appropriate for this case. It is implemented only in Firefox.
(BTW: the sideways value "is at-risk and may be dropped" from CSS3!)

Example-2) Vertical Arabic, letters upright (isolated form)

http://photo.elcinema.com.s3.amazonaws.com/uploads/_310x310_dec41d25dd424bae652d87c7c50339116861040b15b863c7e70c8ed9b48c9e12.jpg

http://farm4.static.flickr.com/3277/2830543747_de05b47f02.jpg

http://1.bp.blogspot.com/-46crVsYMaSw/VLICB1smWZI/AAAAAAAAzGQ/AbwLJoh_IQE/s1600/560671_433062960083202_2034873025_n.jpg

text-orientation: upright; does this.

In Firefox, Arabic runs from top to bottom (parallel to Latin, by the way!).
In Safari and Chrome, Arabic runs from bottom to top.

It should be possible to style a single character without breaking Arabic joining

It is sometimes a requirement in educational texts to style a single character of a word. This requires putting that single character (or a small group of successive characters) in a new group. This should not lead to broken joining.

This image compares the expected behavior with what may be achieved in a poor implementation:

[Image: glyph styling]

2.2 Direction needs section about "Numbers"

I have already learned that decimal numbers use the same digit positions as in other languages, i.e. the notation is the same. But in the real world there are uses of numbers, such as telephone numbers, that require localization.
E.g. a telephone number separated by spaces and entered as "+49 (555) 123 4567" may appear as "4567 123 (555) 49+", which is possibly wrong.
I suggest collecting more samples of real-world numbers and their appearance in Arabic documents.
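
A minimal Python sketch of why the telephone number above is prone to reordering, and of one possible mitigation. It lists the bidi classes of the characters in the number (none of them is strongly directional) and wraps the number in directional isolate characters. The helper name isolate_ltr is an assumption made for illustration; in HTML, the dir attribute or the bdi element would be the markup counterpart.

```python
import unicodedata

LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_ltr(text: str) -> str:
    """Wrap text so that it is laid out as one left-to-right unit."""
    return f"{LRI}{text}{PDI}"


phone = "+49 (555) 123 4567"

# Bidi classes present in the number: European numbers (EN), a European
# separator (ES), other neutrals (ON) and whitespace (WS), but no strong
# left-to-right character (L) that would anchor the order of the groups.
classes = sorted({unicodedata.bidirectional(ch) for ch in phone})
print(classes)

isolated = isolate_ltr(phone)
print([f"U+{ord(ch):04X}" for ch in (isolated[0], isolated[-1])])
```

Because every group of digits is separated by neutral characters, a right-to-left paragraph orders the groups from right to left unless the number is isolated or marked up as left-to-right.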

ACTION-63: Provide sources and urls for the images

Hi,

Here are the source links for the images (see below [1]).
For most of them, the Wikimedia Commons page suggests how to use the image, with at least an href to "the page URL" and a src to the "file URL". There is also an "attribution", but it is "not legally required".

Two questions arise:

  1. Should the document link the images to their source page?
  2. Should we keep the images locally (src = "images/...") or point to them at Wikimedia (src = "the original at Wikipedia")?

[1]
Kufic script
src = https://upload.wikimedia.org/wikipedia/commons/b/b6/A_section_of_the_Koran_-_Google_Art_Project.jpg
href = https://commons.wikimedia.org/wiki/File:A_section_of_the_Koran_-_Google_Art_Project.jpg

Thuluth script
src = https://upload.wikimedia.org/wikipedia/commons/thumb/b/be/Basmalah-1wm.png/220px-Basmalah-1wm.png
href = https://commons.wikimedia.org/wiki/File:Basmalah-1wm.png

Naskh script
src = https://upload.wikimedia.org/wikipedia/commons/8/83/FirstSurahKoran_%28fragment%29.jpg
href = https://commons.wikimedia.org/wiki/File:FirstSurahKoran_%28fragment%29.jpg

Ruq'ah script
src = https://upload.wikimedia.org/wikipedia/fa/f/f9/Ruq_ah.gif
href = https://fa.wikipedia.org/wiki/پرونده:Ruq_ah.gif

Taaliq
src = https://upload.wikimedia.org/wikipedia/commons/d/df/Miremad-1.jpg
href = https://commons.wikimedia.org/wiki/File:Miremad-1.jpg

Diwani script
src = https://upload.wikimedia.org/wikipedia/commons/0/01/Izzet_44.png
href = https://commons.wikimedia.org/wiki/File:Izzet_44.png

Nasta'liq script
src = https://upload.wikimedia.org/wikipedia/commons/9/9b/Khatt-e_Nastaliq.jpg
href = https://commons.wikimedia.org/wiki/File:Khatt-e_Nastaliq.jpg

Maghribi script
src = https://upload.wikimedia.org/wikipedia/commons/3/35/Maghribi_script_sura_5.jpg
href = https://commons.wikimedia.org/wiki/File:Maghribi_script_sura_5.jpg

Applying a unified style to the document

As I was drafting my sections, some style-related questions came to my mind:

  • Should we show the characters in images from left to right or from right to left?
  • What sizes should we use for example text in images and for in-image explanations, and what resolution for the images themselves?
  • Is it possible to use SVG images?
  • How should we refer to other works? By their unique codes or their full names?

We don’t have to think about these yet. I did my work without worrying about these questions. But I created this issue to have a place to gather these issues as they come up while we work.

BTW, there might be a W3C style guide that we can use.

Why are bidi categories of Arabic-indic & Eastern Arabic numbers different?

Are all numbers equal in category and directional property?

  • Digit 2 (U+0032) is of category "EN, European Number". OK.
  • Arabic-Indic digit ٢ (U+0662) is of category "AN, Arabic Number". OK.
  • But the other one, ۲ (U+06F2), its Eastern Arabic-Indic counterpart, is of category "EN, European Number", like digit 2. Is there any reason for this difference between the last two?

There is also a difference in bidi behavior: the same visual text a2b will be displayed in an RTL context as b2a if the two is an Arabic number, and as a2b if it is a European number (simply like "a 2 b"). Aren't ALL numbers WEAK in directional property?
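
The classes quoted above can be checked against the Unicode Character Database. A minimal Python sketch using the standard unicodedata module (the variable name bidi_classes is only illustrative):

```python
import unicodedata

# The three "two" digits discussed in this issue.
digits = ("2", "\u0662", "\u06F2")

# Look up the Bidi_Class property of each digit.
bidi_classes = {ch: unicodedata.bidirectional(ch) for ch in digits}

for ch, bidi_class in bidi_classes.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {bidi_class}")
```

All three classes (EN and AN) belong to the weak types of the bidirectional algorithm, but the algorithm's rules treat EN and AN differently, which is where the difference in display comes from.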

Whether to include Math in the first version or not

Let's have a discussion on whether to include Math, as one of the "special cases" we like to document, in the first version.

One reason not to do so is that the topic is fairly independent from the rest of the document.

See also:

  • "Arabic mathematical notation", a self-contained discussion of Arabic mathematical notation in MathML [http://www.w3.org/TR/arabic-math/]

Slanted font styles (Italic, Oblique, Iranic, ...)

As discussed in the weekly meeting, we want to document the existing methods for using slanted text, slanted to the left or to the right, as it has been common practice for the past few decades.

The goal is to document the details of the existing methods, and try to find the common names for them.

Also, we want to note that using slanted text is not a traditional way of emphasis/quotation/etc, but a half-baked borrowing from Latin-script Italic/Oblique methods.

Miscellaneous questions and suggestions about list of topics

I’m opening this issue to discuss ideas and questions about the current state of the document.

  • Should we also cover the problem of glyph overlap when we talk about joining? Arabic glyphs overlap when they join and this shows itself when applying filters like opacity to a word. The preferred behavior probably is to unify the glyph paths before applying these filters to them.
  • Another joining issue that is important in some educational contexts is to style a specific character of a word without breaking its joining.
  • Font size is not the only property that could be discussed about mixing scripts (in the line about Font and Typographical considerations). Line height and baseline shift are as important.
  • The topic of numbering is repeated in different sections and is in fact a requirement for a lot of them. Can we have a unified place in the document that covers numbering in detail and refer to it from other parts like page numbers and lists?

Proper tags for titles: H2 or H3?

Related to our recent cleanup changes to the document, I noticed that we are not consistent about the HTML tags for titles.

Most of the document uses h2 for all levels of headers, depending on nested sections for building hierarchy, but we also have some h3s in similar situations.

Is this something that we should care about? And if so, what is the suggested solution? I can send a PR if we have a policy for this.

[section] List alignment and directions

Questions to address:

  • Default list alignment and direction?
  • Alignment and direction for lists with all items having opposite directions?
  • Alignment and direction for mixed-direction items?

Section on spacings

From a typographical point-of-view, we need to answer the following questions:

  • How wide is inter-word spacing, and how does it relate to justification (mostly useful for type designers)?
  • What kinds of intra-word spaces (ZWNJ and narrow spaces) are used, and how?
  • How does spacing work around and between numerals (maybe also including number/date ranges and phone-number issues with bidi levels)?
  • How much spacing is there around punctuation marks, especially parentheses, period, comma, colon, semicolon, question and exclamation marks, dashes, etc.?
    • One important question here is whether there should be more space after (on the left-hand side of) a period compared to a comma/colon/semicolon.

Which transliteration schemes should we use?

Looking at the way Najib spelled 'Riqaa' made me realise that we need to establish some standard approach to transliterating Arabic and Persian words. I really don't want this to get in the way of creating real content for the document, but it's something we should take a look at (and spend as little time on as possible).

A unified numbering section

The topic of numbering is repeated under different headers. I suggest we create a section in the document dedicated to numbering. Other parts of the document that need numbering can refer to this section.

Layout (flow direction) of math formula

In the different countries where the Arabic script is used, the direction of math formulas differs. It is by all means different in those countries from the rest of the world.

In some countries it can be LTR:
x + y = z
While in others it can be RTL:
z = y + x

Some initial information on preferences in different countries is available from: http://www.wiris.com/editor/docs/resources/arabic-numbers-countries

[Image]

[Image]

Not all countries in which Arabic script is used are represented on the images above.
