
A lot of questions since I do not know Arabic - trying to compose the best online Arabic dictionary for non-Arabic speakers (arramooz, closed)

linuxscout avatar linuxscout commented on June 9, 2024
A lot of questions since i do not know Arabic - trying to compose best online Arabic dictionary for non-Arabic speakers

from arramooz.


linuxscout avatar linuxscout commented on June 9, 2024

Salam,
Most of those terms are explained in the ReadMe file.
I will answer your questions.
1- Can you tell me each of these word type in English?
All those terms are:
a- nouns; subcategories of nouns
"اسم فاعل": a noun derived from a verb to make a subject (actor) noun, like: write => writer
"اسم مفعول": a noun derived from a verb to make an object noun, like: write => written
"جامد": a non-derived noun, such as a borrowed word or an irregularly derived word, like "book" in English
"منسوب": relative noun, like عرب => عربي, Turkey => Turkish
"مصدر": Masdar, or nominal form, which names the action: كتب => كتابة, write => the writing
b- adjectives
"صيغة مبالغة": adjective of exaggeration (intensive)
"صفة مشبهة": adjective describing a characteristic
"صفة": adjective
"اسم تفضيل" : comparative
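
For a dictionary front end, the glosses above can be kept in a small lookup table. This is only a sketch: the names `WORD_TYPE_GLOSS` and `gloss` are hypothetical and not part of arramooz, and the English wording simply repeats the answers in this thread.

```python
# Hypothetical lookup table: Arabic word-type labels mapped to the
# English glosses given in this thread (not an official arramooz API).
WORD_TYPE_GLOSS = {
    "اسم فاعل": "subject (actor) noun derived from a verb",
    "اسم مفعول": "object noun derived from a verb",
    "جامد": "non-derived noun",
    "منسوب": "relative noun",
    "مصدر": "masdar (nominal form, the action)",
    "صيغة مبالغة": "adjective of exaggeration",
    "صفة مشبهة": "adjective describing a characteristic",
    "صفة": "adjective",
    "اسم تفضيل": "comparative",
}


def gloss(word_type):
    """Return the English gloss for a word-type label, or 'unknown'."""
    return WORD_TYPE_GLOSS.get(word_type.strip(), "unknown")


print(gloss("اسم تفضيل"))
```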
2-Are all roots in verbs database 3 letters? Because i have all conjugations of all roots that are 3 letters
If you want verb conjugations, you can use another project, Qutrub.
You can download all verb conjugations from https://sourceforge.net/projects/qutrub/files/conjugatedVerbs/All-verb-conjugated-0.2.tar.bz2/download

### 3: What are these categories mean in English?
Most of those categories are repeated, but some contain additional information, which should be moved to other fields, such as semantic or derivation fields.

"فاعل": subject noun ( actor)
"(فا.من حَصَد)" : the current word is an actor (فا. as abbreviation ) from verb حصد.
"(فا.من شَغَلَ)" : same
"(فا.من نَاسَبَ)": same
"مفعول": object noun
"اسم أداة": particle
"اسم": noun
"اِسْمٌ": same as above
"اسْم": same as above
"اسم هيئة": a noun describing a state (حالة)
"مبالغة": exaggeration adjective
"اسم المرة": a "one time" noun (the action done a single time)
"اسم نوع": a generic noun for a type of creature, like "human" is the type noun for humans
"(النوْعُ مِنْ فَرَشَ)" : a noun from the verb فرش
"(النَّوْعُ مِنْ قَامَ)" : same above
"(النَّوْعُ مِنْ هَانَ)": same above
"تصغير": a special (diminutive) form to describe a small thing, for example: a man رجل => a small man رويجل,
a dog كلب => a small dog كليب
"صفة" : adj
"طرف زمان": a particle to indicate time
"منسوب" : relative adjective, like turkish, capitalist
"اسم مكان": a noun indicates a place
"اسم آلة": a machine or instrument name, like wash => washer: سيّارة، طائرة
"أُنْثَى الحِرْباءِ.": here it indicates that this word is the female form of chameleon
"اِسْمٌ مِنْ أسْمَاءِ جَهَنَّمَ": indicates that the word is one of the names of جهنم
"(مَنْسُوبٌ إِلَى هُوَ)": relative to He
"(صِيغَةُ فَعِل)": a verb form
"مصدر صناعي" : a noun with ية suffix like اشتراكية
"مصدر": a nominal action
"اسم المصدر": a nominal action
"اسم مصدر": a nominal action
"مصدر ميمي": a noun starting with Meem
"صيغة": form
"صفة/صيغة" cf. Q1
"صفة مشبهة" cf. Q1
"(صفة مشبهة - ربــاعي)"
"اسم تفضيل" cf. Q1


MonsterMMORPG avatar MonsterMMORPG commented on June 9, 2024

@linuxscout Salam. Thank you very much for the answers. I have read the readme file, but since I don't know Arabic I cannot make sense of the terms.

I hope you can answer my other questions with examples as well. Thank you.

I want to generate all forms of a given word so I can add them relationally to my database.

For example, when a person searches for the word مرق, I should be able to show that person,
if they exist:
plural form
feminine form
feminine plural form
masculine plural form
masculine form
and other suffixed forms that I don't know about at the moment

So I should be able to give a non-Arabic speaker all forms of that particular word in a structured and sensible way :)


linuxscout avatar linuxscout commented on June 9, 2024

Also, since I don't know Arabic I am having some problems

I am listing all my questions one by one, for nouns
1: What does original mean? I mean, what is the difference between these 2

unvocalized: معتاد and original: اِعْتَادَ

The noun معتاد is derived from the verb اِعْتَادَ

2: What does stamped mean?
I mean, what is the difference between these 2
unvocalized: معتاد and stamped: معتد

It's used for search: we strip the weak letters (long vowels).
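
A rough Python sketch of such a stamp, for readers who want to reproduce the two examples in this thread (معتاد => معتد and بيت => بت). The exact letter set arramooz strips is an assumption here: this version removes diacritics plus alef, waw, yeh and alef maksura, and the function name `stamp` is hypothetical.

```python
import re

# Arabic diacritics (tanween, short vowels, shadda, sukun): U+064B..U+0652
DIACRITICS = re.compile("[\u064B-\u0652]")
# Assumed set of weak letters: alef, waw, yeh, alef maksura
WEAK_LETTERS = re.compile("[\u0627\u0648\u064A\u0649]")


def stamp(word):
    """Return a reduced search key: no diacritics, no weak letters."""
    word = DIACRITICS.sub("", word)
    return WEAK_LETTERS.sub("", word)


print(stamp("معتاد"))
```

Two words with the same stamp are only candidates for a match; the stamp is a search aid, not a proof that the words are related.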

3: What does wazn mean?
I mean, what is the difference between these 2
unvocalized: منزوي and wazn: مُنْفَعِلٌ

It is the word template (pattern). In Arabic, KaTaB كتب becomes مكتوب => maKTooB; the template is ma**oo*, where each * stands for one root letter.

4: What does mankous mean?
I mean, what is the difference between these 2
unvocalized: بَارِي and how do we obtain mankous?

Mankous (meaning "can be stripped") is a word ending with Yeh that can lose its Yeh in some cases, for example عالي => العالي => عال

5: to make feminable we add this literal ة to the end of the all words right?
for example to make بَارِي to feminine we make it as بَارِية . Is this approach correct?

Yes, if the word is feminable, we can add Teh Marbuta

6: What does the defined column mean?

Some words are definite by default and can't accept the definite article, like الله، جهنم

7: Are these genders correct?

Yes

مذكر : male
"" : no gender

not defined yet, missing info

مؤنث : female
مشترك : can be male or female : both?

yes

8: number
مفرد : single
جمع تكسير : broken plural?

Yes


MonsterMMORPG avatar MonsterMMORPG commented on June 9, 2024

Waiting you to complete all answers. So i can ask which parts i still do not get. Thank you very much.


linuxscout avatar linuxscout commented on June 9, 2024

9: i need some cases explanation about number, single, broken plural
case 1 :
number is مفرد which means single

It holds the singular form of a word when the word itself is an irregular (broken) plural.

single is not empty
for example : unvocalized: أكلة , single: رِعْيٌ , brokenplural : +ات [لا يجوز جمع مذكر سالم]

The single value here is a mistake; I will correct it.

Ok i need explanation about case 1.
If word is already single, why single column is not empty?
If word is single, how do i generate plural form of it from broken plural description. It has + and ] literals

If a word is singular, there are three types of plural:
1- regular masculine plural: we add ون in the indicative form, كاتب+ون, or ين in the accusative form, كاتب+ين
2- regular feminine plural: we add ات in all forms, كاتب+ات
3- irregular (broken) plural: it is irregular, so it is dictionary based, like باب => أبواب
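
The two regular cases can be sketched in Python as below. The function names are hypothetical, and one extra rule is assumed that the answer does not spell out: a final Teh Marbuta ة is dropped before adding ات. Broken plurals cannot be generated this way and must come from the dictionary.

```python
TEH_MARBUTA = "\u0629"  # ة


def masculine_plurals(singular):
    """Regular masculine plural: (indicative form, accusative form)."""
    return singular + "ون", singular + "ين"


def feminine_plural(singular):
    """Regular feminine plural: add ات (assumed: drop a final ة first)."""
    stem = singular[:-1] if singular.endswith(TEH_MARBUTA) else singular
    return stem + "ات"


print(masculine_plurals("كاتب"))
print(feminine_plural("كاتب"))
```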

case 2 :
number is مفرد which means single
single is empty
for example : unvocalized: شاذ , single: '' , brokenplural : ون;ات;شواذ

So in this case unvocalized form is single right? and for generating plural forms, i add each one of the broken plural to the end of the unvocalized form right?

The unvocalized form here is the singular; the number field must indicate مفرد.
The brokenplural is the irregular plural, but in this case the word also accepts the regular plurals, by adding ات or ون

for example plural forms of شاذ are شاذون and شاذات

This وحدان is mistaken

  • If this approach is correct, are there any meaning difference between those 3 plural forms?

see above

case 3 :
number is جمع تكسير which broken plural
single is empty
broken_plural is empty
in this case, does this mean, this word is already plural form and doesnt have single form?

Yes. It can have a singular form, but it is missing here.

for example vocalized is : ضأن so this word doesnt have singular form?
or does it mean it is both plural and singular?

In this example, the word is an uncountable noun.

case 4 :
number is جمع تكسير which broken plural
single is empty
broken_plural is كَاسِبَة

mistaken entry

no, see above

is this correct?
case 5 :

number is مفرد
single is empty
broken_plural is empty
so i am guessing that this word is regular plural
in this case, how do i generate its plural form
for example unvocalized is مخبر so how do i generate its plural form?

In some cases, we have incomplete entries.


linuxscout avatar linuxscout commented on June 9, 2024

10: What does the dualable column mean?
How do I make a word dual?

It accepts suffixes like: ان, ين

11: what does mamnou3_sarf column means?

Those words don't accept Nasb Tanween, e.g. جهنم can't be جهنمًا

how do i mamnou3_sarf a word?

It doesn't accept alef + tanween

12: what does relative column means?

Relative, like Turkey => Turkish, Italia => Italian, عرب => عربي
أمريكا => أمريكي

how do i relative a word?

+ Yeh

13: what does w_suffix means?

it's not necessary for a standard dictionary,
it's used if a regular plural is attached to next word,
workers of company => موظفو الشركة

how do i w_suffix a word?

add WAW

14: what does hm_suffix means?

it's not necessary for a standard dictionary,
if the word can be attached to a human ( عاقل, an intelligent being),
Their Salaries أجورهم

how do i hm_suffix a word?

accept suffixes: هم، هن

15: what does kal_prefix means?

it's not necessary for a standard dictionary,
if the word can be attached to the listener (you, 2nd person),
Your book: كتاب+ك

how do i kal_prefix a word?

It accepts suffixes like: ك you, كما you two, كم you (plural)

16: what does ha_suffix means?

it's not necessary for a standard dictionary,
if the word can be attached to a human or a non-human (عاقل, an intelligent being, or not),
legs of the table, her legs:
أرجلها

how do i ha_prefix a word?

+ ها

16: what does k_prefix means?

if the word accepts an attached preposition
in the house بالبيت

how do i k_prefix a word?

accept prefixes like:
ب، ل، ك

17: what does annex means?
how do i annex a word?

similar to w_suffix


linuxscout avatar linuxscout commented on June 9, 2024

I am listing my all questions 1 by 1 for verbs
1: what does stamped means?
for example unvocalized is بيت and stamped is بت
what is the difference?

Stamped is a reduced form without vowel letters, used to improve fuzzy search.
In some cases, the conjugated verb form changes or loses some vowel letters, like بات يبيت,
which makes finding the infinitive verb difficult. The stamped forms allow us to find all possible original verbs for a stem.

2: what does each of the following columns means and how do i obtain those forms?
transitive, double_trans, think_trans, unthink_trans, reflexive_trans,

transitive: a transitive verb
double_trans: a transitive verb which needs two objects: Ahmed gives Taha an apple
(Taha and apple are the two objects).

think_trans, unthink_trans: transitive to an intelligent being (think), or to a non-intelligent being (unthink).
for example: Ahmed eats me (wrong), Ahmed eats an apple (correct)

reflexive_trans: pronominal verbs, like I shave (I shave my hair) أنا أحلق، أنا أستحمّ

3: for verb conjugations i will use here : http://acon.baykal.be/index.php
can i obtain all conjugations? that you have listed as past, future, imperative, passive, future_moode,

Use Qutrub instead, you can find all conjugations in https://sourceforge.net/projects/qutrub/files/conjugatedVerbs/All-verb-conjugated-0.2.tar.bz2/download

4: what does confirmed means?

whether the verb accepts the confirmed form in the present and imperative

5: what does future_type means?

The Haraka (diacritic) of the second letter of a triliteral verb in the present tense

for example unvocalized is شهب and future_type is فتحة

in present ي+شْهَبُ
y+ch+H+[a]+b+u

what is the difference?
Final question: Can i somehow obtain adverbs and adjectives from your database?

Yes.
Adjectives include:
صفة, اسم فاعل، اسم مفعول، صيغة مبالغة، صفة مشبهة
Adverbs = ظرف
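
A small Python sketch of filtering adjectives by word type. It assumes each database row has been loaded as a dict with `wordtype` and `unvocalized` keys (the column names used in this thread); how the rows are loaded, and the name `is_adjective`, are assumptions.

```python
# Word types that play an adjective role, as listed in the answer above.
ADJECTIVE_TYPES = {"صفة", "اسم فاعل", "اسم مفعول", "صيغة مبالغة", "صفة مشبهة"}


def is_adjective(row):
    """True if the row's wordtype is one of the adjective-like types."""
    return row.get("wordtype", "").strip() in ADJECTIVE_TYPES


# Illustrative rows only, not real database content.
rows = [
    {"unvocalized": "خالع", "wordtype": "اسم فاعل"},
    {"unvocalized": "تربة", "wordtype": "جامد"},
]
adjectives = [r["unvocalized"] for r in rows if is_adjective(r)]
print(adjectives)
```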


MonsterMMORPG avatar MonsterMMORPG commented on June 9, 2024

@linuxscout Dear Taha, thank you very much for all of your answers.

Here are my questions after reading all of your answers:

1 : word type منسوب means relative. but relative to which word? can i learn it?

2 : word type اسم فاعل means derived. can i learn derived from which word?

3 : word type صفة is adjective. is that means those words are both adjective and noun?
for example : أبلق or بلقاء etc

4: word type أواسط is comparative. can i get its base form. i mean for example fast > faster > fastest
Does this database have such feature? So i could learn all forms of all adjectives

5: I do not understand your definition for wazn. Can you elaborate further with examples?
The word template, in arabic KaTab :كتب will be مكتوب =>maKTooB, the template is ma**oo*
I dont know KaTab or maKTooB

6: For mankous nouns, i can remove the last letter and it would be the still same noun right?
For example مناجي is equal to مناج

7: For the column defined can you elaborate your answer further? I didnt get what you mean
Some words are defined by default, and can't accept definition determinant, like الله، جهنم
I have checked those 2 words from Google translate and they are Allah and Hell but i do not get what you mean by defined column

8: Are these are all mistakes? If they are mistakes which parts of them are incorrect so i should ignore those parts. Like broken_plural information or number information etc.
mistakes1

9: We always add the broken plural to the end of the unvocalized form right?
For example داثر > داثردَوَاثِرُ

10: Some broken plurals are extremely long. Are they correct?
For example for the word عناق the broken plural value is أَعْنُقٌ; عُنُوقٌ : الأُنْثَى مِنْ أَوْلاَدِ الْمَاعِزِ وَالْغَنَمِ قَبْلَ بُلُوغِهَا السَّنَةَ . It has : and ;
So how do i make plural form of the word عناق

11 : Some broken plurals do contains parenthesizes. How do i use them?
For example for the word : شجيع
Broken plural value is : شُجَعَاءُ;شَجَائِعُ;(ج);شُجَعَاء;وشِجاع;وهي;شجيعة;(ج);شجائعُ;وشِجاع;(ج);شُجعانٌ;وشِجْعة

12: Some other cases of broken plurals that i can not make sense how to pluralize those words. Case by case explanation would be great
Word: بسيط broken plural : +ون(sûr);بُسَطاءُ;بُسُطٌ;(ج);بَسائِط
Word: رعي broken plural : أَرْعاءٌ [لا يجوز جمع مذكر سالم]
Word: مَنْسُوبٌ broken plural: +ون،‏ +ات;مناسيبُ

13: Can you give me all suffixes for dualable forms? So i can make all nouns duable
Like suffix to make indicative dual noun for male
Like suffix to make indicative dual noun for female
etc

14: When can you fix all mistaken entries? Or will you fix them? Or can i determine incorrect entries and do not use them?

15: Can you elaborate more mamnou3_sarf. I do not know Nasb or Tanween and what they do
Thoses words doesn't accept Nasb Tanween مlike جهنم can't be جهنمًا

16: Ok relative columns means relative but can i determine it is relative to which word?
For example وحمى is relative but relative to what? Id is : 100062

17: add WAW means adding و letter to the end of the unvocalized form right? And what does it do?

18: are these all hm suffixes? هم، هن Adding these suffixes do what?

19: are these all kal_prefixes? accpet suffixes like: ك you, كما you two, كم you (plural)
Do i add them to the end of the unvocalized form? What they do when added?

20: is this the only ha_prefix? ها
What does it do when added to a word?

21: are these all k_prefixes? ب، ل، ك
What do they do when added to the beginning of the word?

22:
I add all prefixes to the beginning of the word and suffixes to the end of the word right?
Also a noun can get all prefixes or suffixes that you listed?
Or they get only one of them according to some rules?

23: For annexing a word which letters do i add where? I mean end or beginning
And what does it do?

24: About adverb and adjectives
These are adjective types which are defined in the wordtype column : صفة, اسم فاعل، اسم مفعول، صيغة مبالغة، صفة مشبهة

Does this mean they are both adjective and noun or only adjective?
There are superlative and comparative forms of adjectives. I guess Arabic doesnt have superlative form right? Like fast > faster (comparative) > fastest (superlative)

25: Also it returns 0 results for wordtype ظرف adverb
So how do i obtain adverbs?

Does this mean, I can find adjectives and adverbs by only filtering by wordtype right?


linuxscout avatar linuxscout commented on June 9, 2024

1 : word type منسوب means relative. but relative to which word? can i learn it?

It's relative to the word without the Yeh:
عربي is relative to عرب

2 : word type اسم فاعل means derived. can i learn derived from which word?

yes, derived from a verb given in "original" field

3 : word type صفة is adjective. is that means those words are both adjective and noun?
for example : أبلق or بلقاء etc

Yes, in Arabic the adjective is also a noun, because the adjective is considered a case of the noun.
In classical Arabic there are 3 word types: verb, noun, particle.
The adjective is a function of a noun, not a class.
The اسم فاعل is a noun, and can be used as an adjective.

4: word type أواسط is comparative. can i get its base form. i mean for example fast > faster > fastest
Does this database have such feature? So i could learn all forms of all adjectives

The comparative is named اسم تفضيل; it has an adjective role.
The word اواسط is the plural of أوسط.
From simple syllabic words, we can make the comparative on the pattern أفعل, e.g. small, smaller: صغير، أصغر

5: I do not understand your definition for wazn. Can you elaborate further with examples?
The word template, in arabic KaTab :كتب will be مكتوب =>maKTooB, the template is ma**oo*
I dont know KaTab or maKTooB

Arabic has a template mechanism for derivation,
like regular expression patterns.
Take a pattern like ma12w3 and
the root ktb (which means write): k is the first letter, t the second, b the last.
We put every letter in its specific place: ktb in [ma\1\2w\3] => maktob
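
The slot filling can be sketched in Python. The wazn column writes templates with the letters ف ع ل standing for the first, second and third root letters (for example فعيل), so this sketch substitutes those three letters. The function name `apply_wazn` is hypothetical, and the sketch ignores the spelling changes that roots with weak letters undergo, so it is only reliable for sound triliteral roots.

```python
# Placeholder letters of a wazn and the root position each one stands for.
SLOTS = {"ف": 0, "ع": 1, "ل": 2}


def apply_wazn(root, wazn):
    """Fill a wazn template with the letters of a 3-letter root."""
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    return "".join(root[SLOTS[ch]] if ch in SLOTS else ch for ch in wazn)


print(apply_wazn("كتب", "مفعول"))
```

With the root كتب and the template مفعول this gives مكتوب, the maKTooB example above. Going the other way (recovering the root from a word and its wazn) is harder and is not attempted here.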

6: For mankous nouns, i can remove the last letter and it would be the still same noun right?
For example مناجي is equal to مناج

The stripped form is used in the indefinite case only.

7: For the column defined can you elaborate your answer further? I didnt get what you mean
Some words are defined by default, and can't accept definition determinant, like الله، جهنم

OK, it tells us if the word is definite or not.
In Arabic the definite article is ال and it's a prefix.
Some words are definite in the dictionary without the definite article, and some specific words, like ALLAH, have no indefinite form.
Hell, named جهنم, has no definite article, and it can't accept one.

I have checked those 2 words from Google translate and they are Allah and Hell but i do not get what you mean by defined column

8: Are these are all mistakes? If they are mistakes which parts of them are incorrect so i should ignore those parts. Like broken_plural information or number information etc.
mistakes1

They are mistaken in the single field; please empty the single field for those words.

9: We always add the broken plural to the end of the unvocalized form right?
For example داثر > داثردَوَاثِرُ

No, we add only the regular plural suffixes, if applicable. The irregular plural form is used as it is.

10: Some broken plurals are extremely long. Are they correct?
For example for the word عناق the broken plural value is أَعْنُقٌ; عُنُوقٌ : الأُنْثَى مِنْ أَوْلاَدِ الْمَاعِزِ وَالْغَنَمِ قَبْلَ بُلُوغِهَا السَّنَةَ . It has : and ;

The broken plural can't be a phrase; if it is, that's a mistake.
In this case, the different forms are given separated by ';', and the last part is an explanation.

So how do i make plural form of the word عناق

أَعْنُقٌ; عُنُوقٌ :

11 : Some broken plurals do contains parenthesizes. How do i use them?
For example for the word : شجيع

Broken plural value is : شُجَعَاءُ;شَجَائِعُ;(ج);شُجَعَاء;وشِجاع;وهي;شجيعة;(ج);شجائعُ;وشِجاع;(ج);شُجعانٌ;وشِجْعة

Multiple forms,
but ;وهي;شجيعة is a feminine form, in the wrong place.

12: Some other cases of broken plurals that i can not make sense how to pluralize those words. Case by case explanation would be great
Word: بسيط broken plural : +ون(sûr);بُسَطاءُ;بُسُطٌ;(ج);بَسائِط

plural is +ون;
بُسَطاءُ;
بُسُطٌ;
;بَسائِط

The flag (ج) is in the wrong place,
and the 'sûr' is a mistake.

Word: رعي broken plural : أَرْعاءٌ [لا يجوز جمع مذكر سالم]

[لا يجوز جمع مذكر سالم] means that the regular masculine plural is forbidden

Word: مَنْسُوبٌ broken plural: +ون،‏ +ات;مناسيبُ

The regular plurals are +ات and +ون,
which means: add these suffixes.
The broken plural is مناسيب
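
Putting the answers to questions 10 to 12 together, a tolerant parser for the broken plural field might look like the Python sketch below. All names are hypothetical and the rules are inferred from this thread only: ';' separates forms, ون and ات (with or without '+') are regular suffixes, parenthesised flags such as (ج) are dropped, anything after ':' is treated as an explanation, and a value containing a space is skipped as a probable phrase. Tokens such as وهي that announce a feminine form, and forms carrying a leading و, are not handled.

```python
import re

NO_REGULAR_MASC = "لا يجوز جمع مذكر سالم"  # "regular masculine plural not allowed"
REGULAR_SUFFIXES = {"ون", "ات"}


def parse_broken_plural(field):
    """Split a raw broken plural value into regular suffixes and broken forms."""
    result = {"regular_suffixes": [], "broken": [],
              "regular_masc_allowed": NO_REGULAR_MASC not in field}
    field = re.sub(r"\[[^\]]*\]", "", field)   # drop bracketed notes
    field = field.split(":", 1)[0]             # drop a trailing explanation
    for part in re.split("[;\u060C]", field):  # ';' or Arabic comma
        part = re.sub(r"\([^)]*\)", "", part)  # drop flags like (ج)
        part = part.replace("\u200f", "").replace("\u200e", "").strip()
        bare = part.lstrip("+").strip()
        if not bare:
            continue
        if bare in REGULAR_SUFFIXES:
            result["regular_suffixes"].append(bare)
        elif " " not in bare:                  # a phrase is probably a mistake
            result["broken"].append(bare)
    return result


print(parse_broken_plural("ون;ات;شواذ"))
```

Entries that the parser cannot make sense of are better flagged for manual review than guessed at.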

13: Can you give me all suffixes for dualable forms? So i can make all nouns duable

Most nouns, except where indicated otherwise.

Like suffix to make indicative dual noun for male

+ان

Like suffix to make indicative dual noun for female

+تان
The Teh Marbuta ة is transformed to ت.

etc

14: When can you fix all mistaken entries? Or will you fix them? Or can i determine incorrect entries and do not use them?

As soon as possible.
There are a lot of entries to validate.
I will fix the ones you noted.

15: Can you elaborate more mamnou3_sarf. I do not know Nasb or Tanween and what they do
Thoses words doesn't accept Nasb Tanween مlike جهنم can't be جهنمًا

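In Arabic, when a noun is not made definite by the definite article or by addition (being attached to the next word, like "water of sea"),
the word gets a diacritic suffix (tanween), for example كتاب kitab is pronounced kitabun. Mamnou3_sarf words are the ones that can't take this suffix.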

16: Ok relative columns means relative but can i determine it is relative to which word?
For example وحمى is relative but relative to what? Id is : 100062

to وحم

17: add WAW means adding و letter to the end of the unvocalized form right? And what does it do?

The regular masculine plural suffix is waw+noon, +ون.
The waw suffix comes from the regular plural suffix when a regular plural word is attached to the next word:
"The workers of the company", موظفون الشركة becomes موظفو الشركة

18: are these all hm suffixes? هم، هن Adding these suffixes to what?

To nouns.
Their book => kitab (book) + hm (them)

19: are these all kal_prefixes? accpet suffixes like: ك you, كما you two, كم you (plural)
Do i add them to the end of the unvocalized form? What they do when added?

Your book => kitab (book)+k (you)

20: is this the only ha_prefix? ها
What does it do when added to a word?

Ha is a suffix
Her book, => kitab + Ha

21: are these all k_prefixes? ب، ل، ك
What do they do when added to the beginning of the word?

They are prepositions:
ك => as a book=> k+kitab
ب=> by a book => b+kitab
ل => for a book => l+kitab
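
For completeness, attaching these affixes can be sketched in Python as plain concatenation. The table and function names are hypothetical. One rule is assumed beyond what is stated for these affixes: a final Teh Marbuta ة becomes ت before a suffix, as it does in the dual. The spelling change when ل meets the definite article ال is not handled.

```python
# Affixes mentioned in this thread, with rough English glosses.
PREPOSITION_PREFIXES = {"ب": "by/in", "ل": "for", "ك": "as"}
PRONOUN_SUFFIXES = {"ك": "your", "كما": "your (two)", "كم": "your (plural)",
                    "هم": "their", "هن": "their", "ها": "her"}

TEH_MARBUTA = "\u0629"  # ة
TEH = "\u062A"          # ت


def attach(noun, prefix="", suffix=""):
    """Concatenate an optional prefix and suffix onto an unvocalized noun."""
    if suffix and noun.endswith(TEH_MARBUTA):
        noun = noun[:-1] + TEH  # assumed: ة opens to ت before a suffix
    return prefix + noun + suffix


print(attach("كتاب", prefix="ب"))
print(attach("كتاب", suffix="ها"))
```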

22:
I add all prefixes to the beginning of the word and suffixes to the end of the word right?

For your project, you don't need to add all those affixes.

Also a noun can get all prefixes or suffixes that you listed?

There are some rules for that.
For prefixes, all nouns can take all prefixes, with a few exceptions.
For suffixes, there are rules for some letters, like Yeh and Hamza.

Or they get only one of them according to some rules?

23: For annexing a word which letters do i add where? I mean end or beginning

Annexing is at the end.

And what does it do?

It's like "of": in Arabic a word can attach to the next word without an "of" particle, e.g. water of sea, ماء البحر

24: About adverb and adjectives
These are adjective types which are defined in the wordtype column : صفة, اسم فاعل، اسم مفعول، صيغة مبالغة، صفة مشبهة

yes

Does this mean they are both adjective and noun or only adjective?

Yes: nouns as a class, adjective as a syntactic function.

There are superlative and comparative forms of adjectives. I guess Arabic doesnt have superlative form right? Like fast > faster (comparative) > fastest (superlative)

the superlative is the definite form of a comparative,
for example سريع adj
أسرع comparative
ال+أسرع superlative الأسرع
in other words, fast, faster, "the faster" = fastest
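
As a Python sketch of the two steps: the comparative is built on the pattern أفعل from a triliteral root, and the superlative is the comparative with the definite article. The function names are hypothetical, the root must be supplied by the caller (here سرع for سريع), and roots with weak or doubled letters are not handled.

```python
def comparative_from_root(root):
    """Comparative on the pattern أفعل, for a sound 3-letter root."""
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    return "أ" + root


def superlative(comparative):
    """Superlative = definite article ال + comparative."""
    return "ال" + comparative


faster = comparative_from_root("سرع")
print(faster, superlative(faster))
```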

25: Also it returns 0 results for wordtype ظرف adverb
So how do i obtain adverbs?

OK, you can get them from another project named Arabic stop words.

Does this mean, I can find adjectives and adverbs by only filtering by wordtype right?

Yes, for adjective.
No, for adverbs


MonsterMMORPG avatar MonsterMMORPG commented on June 9, 2024

Thank you very much again. Here my further questions. I need to know all rules to generate all possible real forms

1: if word has relative column, we can remove the last letter and obtain its relative form always?

2: Do derived noun means always derived from verb? Or noun can be derived from noun?

3: Can you show me all rules to generate all comparative and superlative forms of any given adjective word? Maybe you can add this feature to any of your existing projects? Or maybe you can update database to have comparative and superlative forms as additional column?
from simple syllabic words, we can make comparative as أفعل e.g. small, smaller صغير، أصغر

4: I still do not get how can i use wazn to generate wazn form. I understood from your saying that wazn defines a rule. But since i do not know Arabic, i dont get how do i apply this rule to generate wazn applied form of the word. Maybe you can add wazn applied form to the database as a value rather than wazn rule?
Examples
word: سواق wazn: فَعَّال
word: بطين wazn: فعيل

I see that there are 55 different wazn rules

5: You said mankous are indefinite case only. And for a word to be definite it always need to start with ال right? i checked database and there were 0 words that starts with ال while being also mankous=1. Am i correct?

6: if word's defined column is 0, can i add ال as prefix to all nouns to obtain definite form
for example unvocalized is : تربة means soil and for making in definite i add التربة and it becomes the soil

7: since nouns can also be adjectives, can i add ال front of adjectives to make them definite as well?
for example : eg خالع is an adjective. can i make it الخالع ?

8: The broken plural can't be a phrase, else is a mistake,
so if broken plural have space character can i assume it is a phrase?
and some of the broken plurals have multiple ; so what does each one mean can i understand? like plural female male etc?

for example word : شريف
broken plural : شُرَفَاءُ;أَشْرَافٌ;شَرَائِفُ;(ج);شُرَفاءُ;وأشْرَافٌ;وهُنَّ;شَرائف
so broken plurals are below ones right? and which means which?
شُرَفَاءُ
أَشْرَافٌ
شَرَائِفُ
(ج)
شُرَفاءُ
وأشْرَافٌ
وهُنَّ
شَرائف

9: Some broken plurals may have been written incorrect? for example
word : لتيا
broken plural : اللَّتَيَّاتِ التَّثْنِيَةُ : اللَّتَيَّانِ
It doesnt have any ; so how do i know which word is broken plural form?

Do i need to evaluate manually all broken_plurals because it seems like they do not follow a pattern that i can parse automatically?

10: are these all dual making suffixes?
for male dual : ان
for female dual : تان
and no other?

11: i dont need mamnou3_sarf for dictionary right? or people may type with addition of that?

12: If relative column is 1, can i always remove last letter to obtain relative form?
few examples: am i doing correct to obtain relative form? all these words have relative column as 1
وحدا : وحدان
عما : عمال
مصطا : مصطاف

13: i thought we were adding only ون suffix for making all regular nouns to plural male
do we add WAW as well always?

14: i guess i dont need the followings for a dictionary right? or which ones i need. i should add those suffix/prefix forms?
a) mamnou3_sarf
b) w_suffix
c) hm_suffix
d) kal_prefix
e) ha_suffix
f) k_prefix
g) annex

15: my biggest issue currently is generating comparative and superlative forms
for generating superlative forms i understood that i make definite the comparative form right?

so if i can somehow obtain comparative form, i can add ال to the front of the word and make it superlative right?
however i do not know how can i make comparative form of any given adjective
can you add to the DB all comparative forms? or tell me all rules?

16: I have downloaded arabicstopwords0.3.zip from here : https://sourceforge.net/projects/arabicstopwords/files/

But how do i obtain adverbs from there?

In arabicstopwords0.3 there is a stopwordsallforms.txt which contains 2 columns of values
what does each one mean?

e.g. first column is أفلغيرها and second column is أ-ف-ل-غير-ها

so first one is stop word? or adverb? second one shows what exactly?

I know I ask too much, but Arabic is an extremely complex and hard language and I do not know Arabic.

I hope my questions will be like FAQ for non-Arabic developers/researchers


linuxscout avatar linuxscout commented on June 9, 2024

Thank you very much again. Here my further questions. I need to know all rules to generate all possible real forms

OK, I think that the Sarf project will be a good guide.
I think that you don't have to generate all affixations; you only need to generate:

  • the word
  • the feminine form, if applicable
  • the regular plural or the irregular form
  • the word_type of the given word

The other affixes are syntactic, and there is no need to put them in a dictionary.

1: if word has relative column, we can remove the last letter and obtain its relative form always?

In most cases,
except in cases like: تركيا Turkey => تركي Turkish.

2: Do derived noun means always derived from verb? Or noun can be derived from noun?

In most cases, derived nouns are derived from verbs, but there are some cases derived from nouns,
like مصدر صناعي:
اشتراك => اشتراكية

3: Can you show me all rules to generate all comparative and superlative forms of any given adjective word? Maybe you can add this feature to any of your existing projects? Or maybe you can update database to have comparative and superlative forms as additional column?

The comparative wordtype is اسم تفضيل.
The superlative is just the comparative + the definite article.
Comparatives are derived from the root, not from the verb.

From simple syllabic words, we can make the comparative on the pattern أفعل, e.g. small, smaller: صغير، أصغر

4: I still do not get how can i use wazn to generate wazn form. I understood from your saying that wazn defines a rule. But since i do not know Arabic, i dont get how do i apply this rule to generate wazn applied form of the word. Maybe you can add wazn applied form to the database as a value rather than wazn rule?
Examples
word: سواق wazn: فَعَّال
word: بطين wazn: فعيل

In general, there are two levels of morphological generation in Arabic:

  • The first one is derivation, which creates words from other words using templates (wazn). This task is done by language experts and there are many cases; to do this with a program you can look at the Sarf system. This task is dictionary based, and dictionaries give us more information.
  • The second one is to generate feminine, masculine, or dual forms, which is a simple task for everyone.

I see that there are 55 different wazn rules

There are more, see https://sourceforge.net/projects/arabicpatterns/ and http://www.attiaspace.com/

5: You said mankous are indefinite case only. And for a word to be definite it always need to start with ال right? i checked database and there were 0 words that starts with ال while being also mankous=1. Am i correct?

The definition of mankous is: the word has a Yeh at the end; in the definite case we keep the Yeh, in the indefinite case we strip the Yeh.
But in the dictionary we keep the Yeh, to keep the word in its original form.
Otherwise, an Arabic word can also be made definite by addition to the next word.
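
In Python, the rule for the unvocalized spelling can be sketched as follows. The function names are hypothetical; `mankous` stands for the 0/1 flag from the database column of the same name.

```python
YEH = "\u064A"  # ي


def indefinite_form(word, mankous):
    """Strip the final Yeh of a mankous noun in the indefinite case."""
    if mankous and word.endswith(YEH):
        return word[:-1]
    return word


def definite_form(word):
    """Definite case: keep the Yeh and add the article ال."""
    return "ال" + word


print(indefinite_form("عالي", True), definite_form("عالي"))
```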

6: if word's defined column is 0, can i add ال as prefix to all nouns to obtain definite form

There is another column, definable, which says whether the word accepts the definite article.

for example unvocalized is : تربة means soil and for making in definite i add التربة and it becomes the soil

Yes,

7: since nouns can also be adjectives, can i add ال front of adjectives to make them definite as well?
for example : eg خالع is an adjective. can i make it الخالع ?

Yes

8: The broken plural can't be a phrase, else is a mistake,
so if broken plural have space character can i assume it is a phrase?

no space

and some of the broken plurals have multiple ; so what does each one mean can i understand? like plural female male etc?
for example word : شريف
broken plural : شُرَفَاءُ;أَشْرَافٌ;شَرَائِفُ;(ج);شُرَفاءُ;وأشْرَافٌ;وهُنَّ;شَرائف
so broken plurals are below ones right? and which means which?
شُرَفَاءُ
أَشْرَافٌ
شَرَائِفُ
(ج)
شُرَفاءُ
وأشْرَافٌ
وهُنَّ

this means that the feminine form comes next

شَرائف

9: Some broken plurals may have been written incorrect? for example
word : لتيا
broken plural : اللَّتَيَّاتِ التَّثْنِيَةُ : اللَّتَيَّانِ

Incorrect; some information is misplaced here.
التَّثْنِيَةُ means dual form.

It doesnt have any ; so how do i know which word is broken plural form?

Do i need to evaluate manually all broken_plurals because it seems like they do not follow a pattern that i can parse automatically?

I must treat it manually.
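For the rows that do follow the `;` convention, a rough parser can separate the clean entries from the ones that need manual treatment. The sketch below is an assumption about how one might do it, not the project's own code: the name `split_broken_plural` is hypothetical, `(ج)` is treated as a non-word marker (presumably the usual dictionary abbreviation for "plural"), and any token containing a space or a colon is flagged for review because a broken plural should contain no space.

```python
import unicodedata

NOISE_TOKENS = {"(ج)"}       # assumed to be a marker, not a plural form
FEMININE_MARKER = "وهن"      # compared after stripping diacritics

def strip_diacritics(text):
    """Remove combining marks (harakat, shadda, tanwin) from Arabic text."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))

def split_broken_plural(field):
    """Split a broken_plural field into (form, is_feminine) pairs.

    Returns (forms, needs_review); needs_review is True when some token
    does not look like a single word and should be checked by hand.
    """
    forms = []
    needs_review = False
    next_is_feminine = False
    for token in field.split(";"):
        token = token.strip()
        if not token or token in NOISE_TOKENS:
            continue
        if strip_diacritics(token) == FEMININE_MARKER:
            next_is_feminine = True   # the following form is feminine
            continue
        if " " in token or ":" in token:
            needs_review = True       # misplaced information, skip it
            continue
        forms.append((token, next_is_feminine))
        next_is_feminine = False
    return forms, needs_review
```

The sketch does not remove duplicates that differ only in vocalization, and it leaves a leading و on a form untouched, since a word may legitimately start with that letter.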

10: are these all dual making suffixes?
for male dual : ان
for female dual : تان
and no other?

In the accusative case, they become ين and تين,
but I think that should not be included in such a project.
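As a rough illustration of the suffixes above, the following sketch builds the dual of an unvocalized noun. It assumes that a feminine noun ends in Teh Marbuta (ة), which turns into ت before the suffix; the function name `make_dual` and the `accusative` flag are hypothetical, and irregular endings are not handled.

```python
TEH_MARBUTA = "ة"

def make_dual(unvocalized, accusative=False):
    """Return the dual of an unvocalized noun.

    Default suffix is ان (تان for words ending in Teh Marbuta);
    with accusative=True the suffix is ين (تين).
    """
    suffix = "ين" if accusative else "ان"
    if unvocalized.endswith(TEH_MARBUTA):
        # Teh Marbuta opens into a regular Teh before the dual suffix.
        return unvocalized[:-1] + "ت" + suffix
    return unvocalized + suffix

print(make_dual("كتاب"))                    # two books
print(make_dual("مدرسة"))                   # two schools
print(make_dual("مدرسة", accusative=True))  # accusative form
```

For a dictionary, the default form is probably enough, as noted above.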

11: i dont need mamnou3_sarf for dictionary right? or people may type with addition of that?

You can add it as a flag or tag.

12: If relative column is 1, can i always remove last letter to obtain relative form?
few examples: am i doing correct to obtain relative form? all these words have relative column as 1
وحدا : وحدان
عما : عمال
مصطا : مصطاف

That is a mistake; these are not relatives. A relative has Yeh at the end, or Yeh + Teh Marbuta (ية).
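Since a relative should end in Yeh or in ية, rows where the relative column is 1 but the word has another ending can be flagged as probable mistakes. This is only a sanity check sketched under that assumption (the name `looks_like_relative` is hypothetical); a word ending in Yeh is not necessarily a relative, so the check cannot confirm one.

```python
def looks_like_relative(unvocalized):
    """True if the word ends in Yeh or Yeh + Teh Marbuta."""
    return unvocalized.endswith(("ي", "ية"))

# Words from the question above that carry relative = 1 in the database.
flagged_as_relative = ["وحدان", "عمال", "مصطاف", "عربي"]
suspicious = [w for w in flagged_as_relative if not looks_like_relative(w)]
print(suspicious)  # candidates for manual correction
```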

13: i thought we were adding only ون suffix for making all regular nouns to plural male
do we add WAW as well always?

No, only in some contexts.

14: i guess i dont need the followings for a dictionary right? or which ones i need. i should add those suffix/prefix forms?

Good, you understand.

a) mamnou3_sarf
b) w_suffix
c) hm_suffix
d) kal_prefix
e) ha_suffix
f) k_prefix
g) annex

15: my biggest issue currently is generating comparative and superlative forms
for generating superlative forms i understood that i make definite the comparative form right?

Yes, great

so if i can somehow obtain comparative form, i can add ال to the front of the word and make it superlative right?

:+1:

however i do not know how can i make comparative form of any given adjective
can you add to the DB all comparative forms? or tell me all rules?

Comparatives are derived from the root, not from the adjective.
You can find them under اسم تفضيل.
For adjectives with more letters, we use a rule similar to English, where we add "more", as in "more expensive".
1- simple: سريع أسرع الأسرع fast, faster, fastest
2- complex: مرعب terrifying
أكثر رعبا more terrifying
الأكثر رعبا the most terrifying
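The two patterns above can be sketched as follows. The helper names are hypothetical, and the code does not derive anything by itself: the simple comparative has to come from the اسم تفضيل entries, and the noun used in the complex pattern (رعبا in the example) has to be supplied by the caller.

```python
def make_superlative(comparative):
    """Make a comparative definite to obtain the superlative."""
    return "ال" + comparative

def complex_comparative(noun_accusative):
    """Build the "more ..." pattern used for longer adjectives."""
    return "أكثر " + noun_accusative

# 1- simple: faster -> the fastest
print(make_superlative("أسرع"))
# 2- complex: more terrifying -> the most terrifying
print(make_superlative(complex_comparative("رعبا")))
```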

16: I have downloaded arabicstopwords0.3.zip from here : https://sourceforge.net/projects/arabicstopwords/files/

But how do i obtain adverbs from there?

In the classified folder, you will find a dictionary of lexical entries without affixes.
The allforms folder contains all forms after affixation.

In arabicstopwords0.3 there is a stopwordsallforms.txt which contains 2 columns of values
what does each one mean?
e.g. first column is أفلغيرها and second column is أ-ف-ل-غير-ها

The first column is the affixed word; the second is the segmented word, with affixes separated by '-'.
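A line of stopwordsallforms.txt could be read like this. The sketch assumes the two columns are separated by whitespace (the exact delimiter is not stated here), and the function names are hypothetical.

```python
def parse_allforms_line(line):
    """Split one line into (affixed_word, list_of_segments)."""
    surface, segmented = line.split()
    return surface, segmented.split("-")

def is_consistent(surface, segments):
    """Check that the segments join back into the affixed word."""
    return "".join(segments) == surface

surface, segments = parse_allforms_line("أفلغيرها\tأ-ف-ل-غير-ها")
print(surface, segments, is_consistent(surface, segments))
```

The segmentation alone does not say which segment is the stop word itself; matching the segments against the entries in the classified folder is one possible way to find out.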

so first one is stop word? or adverb? second one shows what exactly?
I know i ask to much but Arabic is extremely complex and hard language and i do not know Arabic

Not at all, it's a great language. I see that you ask questions as if you know Arabic well.

I hope my questions will be like FAQ for non-Arabic developers/researchers
