soshial / xdxf_makedict

XDXF — an open and free dictionary format, that stores word articles in a structural and semantic way. The most convertible format

dictionaries dictionary-format dsl-dictionaries pyglossary semantic stardict xdxf xdxf-format xml

xdxf_makedict's Issues

Needs proper namespace and schema

It would be nice to have a proper namespace and a W3C XML Schema (or RelaxNG or Schematron, though, XSD is the most supported by editors and other XML tools).

Modelling the XML in a way, that it could be intermixed with other namespaces easily, would allow for reuse in other types of applications. I came to XDXF, because I am currently developing a 'Flashcard' app, and, for vocabulary learning, I would like to incorporate an existing, well specified format for language dictionaries. This means, that I would have a Flashcard Markup Language file, that contains XDXF XML at appropriate locations, intermixed.
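
A minimal sketch of what such intermixing could look like from the consumer side, assuming XDXF elements lived in their own namespace. Both namespace URIs and the `deck`/`card` element names are hypothetical: XDXF does not currently define a namespace, and no Flashcard Markup Language schema is given here.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: neither is defined by any published spec.
NS = {
    "fc": "urn:example:flashcard",
    "xdxf": "urn:example:xdxf",
}

DOC = """\
<fc:deck xmlns:fc="urn:example:flashcard" xmlns:xdxf="urn:example:xdxf">
  <fc:card id="1">
    <xdxf:ar>
      <xdxf:k>home</xdxf:k>
      <xdxf:def><xdxf:deftext>a place where one lives</xdxf:deftext></xdxf:def>
    </xdxf:ar>
  </fc:card>
</fc:deck>
"""

def headwords(xml_text):
    """Return the text of every namespaced <k> element, in document order."""
    root = ET.fromstring(xml_text)
    return [k.text for k in root.iterfind(".//xdxf:k", NS)]

print(headwords(DOC))
```

With a namespace in place, a host format can locate the embedded dictionary articles without guessing which `<k>` or `<def>` elements belong to XDXF.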

Dictionary shells supporting XDXF

Which dictionary software supports XDXF?

GoldenDict Mobile (Android) does not support XDXF, however it understands StarDict with XDXF markup (type 'x').
Alpus (Android, iOS, desktop) officially supports XDXF. ~~In fact, it recognizes only first headword (<k>) in the articles with multiple keys.~~ (was fixed in release v9.0)
Aard2 (Android) reads only its own format Slob, but conversion tool xdxf2slob understands XDXF more or less correctly. (Extensive testing of every XDXF feature is needed.)

Poor conversion from dsl/stardict/xdxf to dictd format

After conversion from dsl to dictd format, markup tags appear in the output:
<k>...</k>, <blockquote>...</blockquote>, <kref>...</kref>
This formatting is not part of the DICT standard. I think plain text without these formatting elements would be better for the DICT format.
It would be better to replace the <kref>...</kref> tags with the format codes {...} and to remove all other tags.
The same applies to the other conversion directions, for example from stardict to dictd and from xdxf to dictd.
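
The requested behaviour can be sketched as follows. This is only an illustration of the idea, not makedict's code: the function name is hypothetical, and the regex-based tag stripping assumes simple, well-formed article markup without CDATA sections or `>` inside attribute values.

```python
import re
from html import unescape

KREF = re.compile(r"<kref\b[^>]*>(.*?)</kref>", re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")

def to_dict_plaintext(article):
    """Turn <kref>word</kref> into {word}, drop all other tags, decode entities."""
    text = KREF.sub(lambda m: "{" + m.group(1) + "}", article)
    text = ANY_TAG.sub("", text)
    return unescape(text)

sample = (
    "<k>home</k>\n"
    "<blockquote>see also <kref>house</kref> &amp; <kref>flat</kref></blockquote>"
)
print(to_dict_plaintext(sample))
```

The cross-references are rewritten before the generic tag removal so that they survive as {...} codes while everything else is reduced to plain text.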

How to create a bilingual dictionary entry

Hi, thanks for an amazing project. I am interested in it, but I can't see how to make a bilingual entry. In the Rev34.xml file there are 'to' and 'from' language elements in the meta_info, and they indicate translation between the languages en and lv. However, in the ar entries I don't see anything identified as lv. There does seem to be a translation of 'Home', but it appears to be in Russian, with no language specified. Can you point to any other example? Thanks!

undefined reference to File::printf

I had problems compiling under msys. Finally I succeeded and that's the comment from the fix:

/* Msys/mingw printf problem:
libintl defines a printf macro, which clashes with our File::printf function.
It seems to me that this happens after including glib/gi18n.h.
On the other hand, the macro is reset by the cstdio header, so I changed the order
of includes so that cstdio comes after gi18n.h. */

I'm going to attach the patch, if it's possible.

UPD: I put the patch into my repository, that's the link.

Compile error with glib-2.40.0: variable or field ‘g_log’ declared void

glib has its own g_log() macro since version 2.40.0 which conflicts with g_log() in log.hpp:

[  3%] Building CXX object CMakeFiles/makedict.dir/src/makedict.cpp.o
In file included from /tmp/xdxf_makedict/src/parser.hpp:37:0,
                 from /tmp/xdxf_makedict/src/connector.hpp:4,
                 from /tmp/xdxf_makedict/src/makedict.cpp:37:
/tmp/xdxf_makedict/src/log.hpp:22:13: error: variable or field ‘g_log’ declared void
/tmp/xdxf_makedict/src/log.hpp:22:13: error: expected primary-expression before ‘const’
/tmp/xdxf_makedict/src/log.hpp:22:13: error: expected primary-expression before ‘...’ token
make[2]: *** [CMakeFiles/makedict.dir/src/makedict.cpp.o] Error 1
make[1]: *** [CMakeFiles/makedict.dir/all] Error 2
make: *** [all] Error 2

Removing g_log() from log.hpp and log.cpp fixed the error for me.

Problem with DTD

Running xmllint with the dtd and sample data produces this error. It seems to not even want the |br at all in the dtd.

xmllint --dtdvalid xdxf_strict.dtd rev34.xml 
xdxf_strict.dtd:13: parser error : MixedContentDecl : '|' or ')*' expected
<!ELEMENT description (#PCDATA|br)>
                                 ^
xdxf_strict.dtd:13: parser error : expected '>'
<!ELEMENT description (#PCDATA|br)>
                                 ^
xdxf_strict.dtd:13: parser error : Content error in the external subset
<!ELEMENT description (#PCDATA|br)>

#include <cstdlib> missing in file.hpp

When trying to build makedict on Mac OS X 10.9, I was getting errors:
/dictd_parser.cpp:164:10: Use of undeclared identifier 'EXIT_FAILURE'

I fixed this by adding

#include <cstdlib>

to file.hpp

Now it compiles fine.

XDXF format comments, suggestions and needed corrections

@soshial and whoever else is involved in developing the XDXF format standard:

I recently realized that there are a number of great free Chinese and Japanese dictionaries but unfortunately each is made available in its own specific format, which means it takes a specific tool to read it. This made me start looking for a good dictionary format (preferably XML) that could be used for any language. I found that format in XDXF, which I do consider is the closest we have to an ideal open and global dictionary format standard. As I set out to write a converter for the Chinese-English CC-CEDICT dictionary, I unfortunately also noticed many problems with the format, some of those serious enough to prevent a good dictionary conversion (from non-alphabetic languages), some just minor (or major) inconveniences.

What follows is a series of comments, corrections, criticism and proposals about the XDXF standard.

There are four main points where I think improvement is needed for XDXF to fully achieve its purpose: format (the visual format needs to be completely dropped), file structure (there should actually be two different formats, a flat XML and a package), deeper semantic definition, and better support for non-European and non-alphabetic languages (especially multiple writing systems and transliterations).

It is important that this format must be able to display all information commonly found in a dictionary, be it paper or electronic and from any to any language.

(Any XML markup in the following suggestions that is not currently in the XDXF standard is just a suggestion. I am in no way implying that it should be the final version.)

I. Format

What makes XDXF stand out when compared to other formats is the ability to describe a dictionary in a semantic format. That is what XDXF brings to the table that previous dictionary formats cannot compete with. A stardict dictionary converted to visual XDXF may still be technically an improvement, but it'll be barely noticeable and so it doesn't make much sense to go through the trouble of converting it, when that is the most supported format anyway. (The same is true about other visual formats.)

I propose that the visual format be dropped from the XDXF standard; dictionaries in the visual format should be considered obsolete and no longer supported. I understand that, at least in part, the reason for a visual format is that it allows almost seamless conversion from most other dictionary formats. Its discontinuation would make many-to-many converters much harder to write (if possible at all), as all information needs to be parsed. As I argued above, I don't believe being an easy target of conversion is worth much if there isn't a significant improvement of some form, either to the DS maintainers or to the users. I think it reasonable to suggest that DS keep support for the deprecated revision 33 as a way to keep supporting the visual format dictionaries that may be available in the wild.

It is not enough to mention the visual format is not supported, it cannot be part of the most recent revision.

A beneficial side effect is that this will make the XML definition much clearer, as it won't be defining what in effect are two different formats.

II. File Structure

The structure seems very confusing. On the one hand, it seems to be trying to describe an XML format for a dictionary, in the classical meaning for a dictionary: a list of words/phrases with their corresponding description and, possibly, additional metadata. On the other, it describes what could be called a DS "dictionary package" with things that aren't actually traditionally part of a dictionary, like toolbar icons, images, sound files, a folder structure, etc.

I agree that it is reasonable to try to accommodate both interpretations of what a dictionary is and what an electronic dictionary should be, but a better solution would be to develop a standard for two different formats: (i) a "flat" XML dictionary format containing only the textual data that traditionally constitutes a dictionary and the metadata to describe it, written to a file with an identifying file name and .xdxf as extension; and (ii) a "dictionary package" possibly modeled after epub or opendocument. That is, essentially a zipped archive with an (XML) index file indicating the contents of the package, which must include one or more xdxf files (more than one allowed only when they're related, e.g. the Larousse English-French French-English dictionary would consist of two .xdxf files, the Oxford English Dictionary of only one). Icons, images, and other non-textual content should be part of the package, correctly arranged in folders and indicated in the index. An index of images, for example, would indicate their relative location (by default /images/), their file name and the words/phrases under which they should appear. Textual information that is not part of the dictionary per se but that is traditionally part of dictionaries can also be included in pre-defined XML-formatted files. More on that below.

In fact, the current dict.xdxf XML file in a folder with a more or less defined name and optionally toolbar icons (for a simple dictionary) is an overly complicated, impractical structure that is not easy to implement. (Imagine any other common file types in a similar structure; MP3, PDF, DOC all with the same name in a folder with toolbar icons... who would want to use them?) In fact, notice how all DS that support XDXF will already gladly accept a simple .xdxf file regardless of its name, as that is much more intuitive. A clear name identifying the dictionary and its edition/version should be recommended for practical reasons, but is in effect unnecessary as the information is already in the metadata.

This format would also allow for including information which is commonly included with dictionaries (both paper and electronic). One important example is conjugation/declension tables; while these aren't part of key phrases definitions and shouldn't be part of the XDXF file itself, they are commonly included as part of dictionaries and should be represented on the XDXF package format. Conjugations should be included in an independent XML conjugations standard file (to be developed) and referenced in the index file. The DS can then appropriately place a button/link on the entries for which there is a conjugation table which will display the properly formatted information. See the XML conjugation format for French conjugation software Verbiste as an example of such file.

The XML conjugations/declensions can also be used by the DS to recognize, for example, conjugated forms of verbs and display the correct entry (even indicating what form it is).

Some "sub-dictionaries" should also be in their own XML file. For example, some dictionaries include a "name's dictionary" as an annex. The user should be allowed to enable or disable these kinds of "sub-dictionaries" in the DS.

Icons can only be recommended, not required, as they are in no way part of the dictionary. In fact no DS requires icons and few would make any use of them. The icon reference in the current revision seems tailor-made for GoldenDict, and the specifications for a standard shouldn't be intended for any specific DS. Icons should be supported, though, in the dictionary package and for the DS that do make use of them. They should be in the appropriate folder and need to be better defined: what format(s) can be used?; which sizes can/must be present?; etc. The icon metadata should also be present in the main index file (possibly unnecessary if defaults are used).

A beneficial side effect of the zipped package format is the enormous size reduction. As XML formats require a constant repetition of opening and closing tags, files are inflated significantly, an inflation that is greatly reduced in a zipped archive. A significant example: the CC-CEDICT dictionary, with 114,959 entries takes 8.4 MB in its original minimalist format; when converted to XDXF it takes 31.4 MB, an almost 4-fold increase in size! A zipped CC-CEDICT file takes 3.3 MB, and the zipped XDXF-converted file only 4.3 MB, a minimal increase over the original file size. In fact, DS should be recommended to import zipped flat XDXF files directly, even when not part of an XDXF package.
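
The figures quoted above work out to roughly 3.7x the original size uncompressed (31.4 / 8.4) and about 1.3x compressed (4.3 / 3.3). The sketch below shows how a DS could read a compressed flat file directly, without unpacking it to disk. It uses gzip purely as one possible choice of compression, and the sample data is synthetic and highly repetitive, so its compression ratio says nothing about real dictionaries.

```python
import gzip
import io
import xml.etree.ElementTree as ET

def read_xdxf_gz(data):
    """Parse gzip-compressed XDXF bytes without writing an unpacked copy."""
    with gzip.open(io.BytesIO(data), "rb") as fh:
        return ET.parse(fh).getroot()

# Synthetic sample data (not a real dictionary).
entries = "".join(
    f"<ar><k>word{i}</k><def><deftext>definition {i}</deftext></def></ar>"
    for i in range(1000)
)
raw = f"<xdxf><lexicon>{entries}</lexicon></xdxf>".encode("utf-8")
packed = gzip.compress(raw)

root = read_xdxf_gz(packed)
print(len(raw), len(packed), len(root.findall("./lexicon/ar")))
```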

III. XML Structure

1. Root Element

See format argument above.

2. `<meta_info>`

All elements should be clearly described.

<title>/<full_title>: What exactly is the difference between <title> and <full_title>? How long can the <title> be, exactly? Several examples here would be very helpful.
- More importantly, why does the <title> need to be "written in English"? It makes absolutely no sense to me why a Chinese, or a Russian, or an Arabic monolingual dictionary's title would have to be written in English, a language that hypothetically its users can't even understand. Even for bilingual dictionaries in which none of the languages is English the requirement doesn't make much sense.
<description>: To include the amount of information required, this field will certainly include multiple lines. How should a line break be indicated? With a <br /> tag, like in the entries definitions? Then it should be stated so that it can be supported by the DS.
"<last_edited_date>, <dict_edition>, <publishing_date>, <dict_src_url> are optional meta info.": None of these elements were defined. <publishing_date> refers to the publishing of what? The original dictionary? The file the XDXF dictionary was converted from? This very XDXF file? The same could be asked about <last_edited_date> and <dict_edition>.
Some necessary additions to lexicon element mentioned below will require additional metadata. See below for more details.

3. Lexicon

<k>: This element must support defining different scripts/writing systems for the same "key phrase". This is different from different spellings, in that the user should be allowed to choose in the DS settings which script they prefer, and the DS should only display the chosen script and not show all key phrases repeated as many times as there are scripts. The DS may (possibly should) display the other script(s) with the definition text, as it does with transcription, etymology, etc. If the user searches correctly for a word in a different script than the one they chose, the entry should be displayed with the word in the chosen script as headword. An obvious example is Chinese: it is common to have all entries in both main variants, simplified and traditional Chinese, which, with the current format, means all entries are doubled in the DS, and a user will have to sift through the simplified entries even if they only read traditional Chinese (and vice versa). The same is true for any other language which can be written in more than one script/writing system (either because different areas speaking the same language use different scripts or because it used to be written in a different script and the dictionary includes both variants).

I suggest the different scripts be noted with a system attribute, as in <k system="simplified">词典</k> and <k system="traditional">詞典</k>.
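
A rough sketch of how a DS might handle this suggestion. The `system` attribute is the proposal made above and not part of the current XDXF standard; the function names are hypothetical, and the code assumes every article has at least one <k>.

```python
import xml.etree.ElementTree as ET

ARTICLE = (
    '<ar><k system="simplified">词典</k><k system="traditional">詞典</k>'
    "<def><deftext>dictionary</deftext></def></ar>"
)

def pick_headword(ar, preferred):
    """Return (headword, other_forms) for the user's preferred writing system.

    Falls back to the first <k> when no key uses the preferred system.
    """
    keys = ar.findall("k")
    chosen = next((k for k in keys if k.get("system") == preferred), keys[0])
    others = [k.text for k in keys if k is not chosen]
    return chosen.text, others

def matches(ar, query):
    """A search should hit the article whichever script the query uses."""
    return any(k.text == query for k in ar.findall("k"))

ar = ET.fromstring(ARTICLE)
print(pick_headword(ar, "traditional"))
print(matches(ar, "词典"))
```

The article is stored once; the preference only decides which key is shown as the headword, while every key remains searchable.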

As far as I know there isn't an ISO list for writing systems, only for scripts, which doesn't work in this case as some languages use more than one script in one word (i.e. Japanese), and different variants may be counted as only one script. So it seems the writing systems used in the dictionary will have to be defined in the <meta_info> element, in a similar way as abbreviations are defined. More on writing systems (what exactly constitutes a script or a writing system is debatable).
<def>/<deftext>: The usage of the elements <def> and <deftext> is confusing. It seems (from the examples) that a general <def> is always needed as a placeholder for the <def> elements that actually contain definitions. The first time I read the format description I thought <def> would be used for the more general meaning and <deftext> for the more detailed one. For example, this definition from the OED:

marry, v

1. To join in wedlock or matrimony (...)

a. in pass. (with ref. either to the act and ceremony, or to the wedded state as a result).

b. Said of the priest or other functionary who performs the rite. Also absol.

2. a. To give in marriage, cause to be married. Said esp. of a parent or guardian.

b. With off.

Would be rendered:
```
<def>To join in wedlock or matrimony (...)
<deftext>in pass. (with ref. either to the act and ceremony, or to the wedded state as a result).</deftext>
<deftext>Said of the priest or other functionary who performs the rite. Also absol.</deftext></def>
<def><deftext>To give in marriage, cause to be married. Said esp. of a parent or guardian.</deftext>
<deftext>With off</deftext></def>
```
But, of course, this wouldn't work when there are three levels of definitions, as in (still from the OED):

marry, v

I. trans.

1. To join in wedlock or matrimony (...)

a. in pass. (with ref. either to the act and ceremony, or to the wedded state as a result).

2. a. To give in marriage, cause to be married. Said esp. of a parent or guardian.

II. 6. intr. a. To enter into the conjugal or matrimonial state(...)

As I now understand it, the idea is to nest <def> elements. This would work for any number of levels of definitions but still requires the doubled elements <def><deftext> (except when there are examples). It seems the idea is to include elements like <gr> and <tr> inside the <def> element, but this kind of information generally belongs to the "key phrase", not to any specific definition, and as such should be directly inside the <ar> element. Otherwise the pronunciation would have to be repeated in each <def> element, even though it's not likely to change (except in the rare cases when it differs between definitions). It is possible I'm still not quite understanding the logic of these two elements, but that is also why they need to be more clearly described and examples need to be provided, especially complex examples with several levels of definitions. If a parent <def> tag is always needed to contain the <def> elements that in turn include the <deftext> elements which in turn include the actual definitions, then the first and subsequent <def> elements have very different purposes. Changing the first <def> to <definition>, <def_container> or another similar label could make this structure much clearer.
<tr>/<gr> I couldn't get the file to validate with a <tr> element directly inside <def>, so I have to follow the example provided and put each <tr> inside a <gr> element. This makes absolutely no sense, how can a transcription be considered grammar?
<gr>: the element seems to be meant for free text but, XDXF being a semantic format, it needs to allow you to define common grammatical properties semantically. Common examples are grammatical gender and number and parts of speech for European languages and, for Asian languages like Chinese and Japanese, measure words (or classifiers, as they are also known). You should be able to define measure words as such: <gr><mw>份</mw><mw>顿</mw></gr>, letting the DS handle how to display them and allowing for indexing. Similarly, for European languages you should be able to define gender and number, as such: <gr><gender>mas</gender><num>sing</num></gr>. The possible options should be predefined. There still needs to be a free text element such as <grtext> for properties not yet defined and other grammatical comments. Of course this could be an enormous undertaking, but note that only properties normally mentioned in dictionaries need to be defined; for example, verb tense doesn't have to be defined, as dictionaries usually don't define it: only verbs in the infinitive are listed (for the languages I know; if this is not true for some language, it still needs to be included). Also, not all possible properties for all languages in the world need to be included at once (which wouldn't even be possible), only the most common at first, and more can be added in future revisions as needed.

Attributes may be needed to indicate types of grammatical categories; these should be predefined and indicated by the 3-letter language codes from the ISO 639-3 standard, as they should be language-dependent. If there are non-language-specific categories, there is no need for the language code. For example, Japanese adjectives can be divided into -i adjectives, -ii adjectives, -na adjectives, -no adjectives, attributives, -taru adjectives, and nouns or verbs acting prenominally. (These are the categories as defined in JMDICT; what exactly constitutes an adjective is debatable, as are its categories.) A part-of-speech entry for the adjective 暑い, hot, could be as such: <part type="jpn-i">adj</part>. I'm not sure if these attributes should be defined in the <meta_info> element or generally in the XDXF format. In any case, it's not for the XDXF project to decide which categories are valid (or to make any other classification judgment of any kind); any dictionary must be able to set its categories freely, but the options may be predefined.
<tr>: "Marks transcription/pronunciation information" -- these can be very different things in dictionaries. The description also leaves out transliteration, which is essential to non-alphabetic languages, and which I assume is meant to be included in this element. Transliteration (generally, but not limited to, romanization) is particularly important because for entries to be easily searched their transliteration(s) need to be indexed (for ideographic languages like Chinese that's the only way to search for a word if you know the sound but not which characters are used to write it). My proposal is that instead of one, there should be three elements:
- Transcription (possibly <tr>) which by default should be IPA but should allow for an attribute defining the transcription system, e.g. <tr system="SAMPA">"s{mp@</tr>. (The current "mode" attribute isn't very clear.) The valid systems need to be defined in the standard so that DS know exactly what they are and can do things like converting SAMPA (meant to be read by computers) to IPA (meant to be read by people). This will allow the DS to know what transcription system is being used and display it to the user. It is not likely that more than one system will be used at a time, but if it is, that will also be supported.
- Pronunciation (possibly <pr>): There are also different ways to indicate pronunciation without phonetic transcription. Pronunciation respelling is very common in English monolingual dictionaries; pronunciation respelling for just a syllable, or even a letter when more than one pronunciation is possible, can also be seen in dictionaries of different languages. I have seen it in Portuguese monolingual dictionaries, usually enclosed in slashes, and it may be used for other languages also. As it's highly unlikely both will be used at the same time, there is no need for attributes, e.g. <pr>paɪəˈnɪə(r)</pr> or <pr>/nɪə/</pr>. Possibly partial respelling should be indicated with an attribute and not by enclosing it in slashes. There may be other ways to indicate pronunciation in languages I don't know, in which case an attribute may be needed.
- Transliteration (possibly <tl>): generally but not necessarily romanization. Needs an attribute to indicate the transliteration system used. As far as I know there is no ISO list of transliterations, so the names may have to be defined in the <meta_info> element. The transliteration(s) in the most common system(s) should be indexed so that "key phrases" may be searched; which transliterations should be indexed must be indicated in the <meta_info> element. For example, for the word 中国, China, the transliterations can be indicated as <tl system="pinyin">Zhōngguó</tl><tl system="Bopomofo">ㄓㄨㄥㄍㄨㄛˊ</tl><tl system="Gwoyeu Romatzyh">Jong'gwo</tl><tl system="Wade–Giles">Chung1-kuo2</tl>. But as only pinyin (and bopomofo in Taiwan) are commonly used to indicate pronunciation and to input characters, only one should be indexed (or two, if the dictionary is to be used in Taiwan). Common simplifications for typing the transliteration in ANSI characters should be handled by the DS, but should not be indicated in the XDXF file. For example, "zhong1guo2" should be recognized by the DS as "zhōngguó", and a search for "zhong" should show results for all possibilities (zhōng, zhóng, zhǒng, zhòng, zhong). When a transliteration is indexed and a user searches using the transliteration, the results should show not only the key phrases but the transliteration also, either next to the key phrases or as a tooltip--as very different words can have the same transliteration.
One more example of transliterations: for the Japanese word ローマ字 the transliterations can be defined as <tl system="revised-Hepburn">rōmaji</tl><tl system="kunrei-shiki">rômazi</tl><tl system="Nihon-shiki">rômazi</tl>. But only Hepburn is used nowadays and it is the only system that should be indicated to be indexed. The DS should recognize "roumaji" as both rōmaji and roumaji, but the simplified ANSI form should not be indicated in the XDXF file.
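
The DS-side handling of typed simplifications described above could be sketched like this. It is a deliberately crude approach and an assumption on my part: both sides are "folded" by dropping tone marks, macrons and tone digits, so tones are ignored rather than matched, and spelling variants such as "roumaji" for rōmaji are not covered. The index contents are illustrative only.

```python
import unicodedata

def fold(text):
    """Lowercase, then drop combining marks (tone marks, macrons) and tone digits."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) and not ch.isdigit()
    )

def search(query, index):
    """Return (transliteration, key) pairs whose folded form starts with the query."""
    q = fold(query)
    return [(tl, key) for tl, key in index if fold(tl).startswith(q)]

# Illustrative index of (indexed transliteration, key phrase) pairs.
INDEX = [
    ("Zhōngguó", "中国"),
    ("zhǒng", "种"),
    ("zhòng", "重"),
    ("rōmaji", "ローマ字"),
]

print(search("zhong1guo2", INDEX))
print(search("zhong", INDEX))
```

Because the results keep the original transliteration next to each key phrase, the DS can display both, as requested above for words that share a transliteration.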

There may be more than one transcription, pronunciation or transliteration, as there may be more than one pronunciation for a word. These elements may be repeated as many times as necessary. There should also be a comment attribute to indicate if a pronunciation is rare, archaic, regional, etc.

These three elements may possibly be organized inside a parent element.
<deftext>/writing systems: For dictionaries that have several writing systems defined you should be able to indicate alternatives in the definition text when a word/phrase in the original language is used. Only the selected writing system should be displayed. Also there needs to be a way to indicate transliteration in one of the systems defined, to be indicated by the DS as appropriate (e.g. simply next to the words, or with some specific formatting, or over the word, or as tooltip, etc.). An example of what this would look like for the Chinese word 乎: <deftext>classical particle similar to <tl sys="pinyin" text="yú"><altsys="traditional">於</altsys><altsys="simplified">于</altsys></tl>) in</deftext>.
<rref>: Should only be used in dictionary packages. It doesn't need as much information, as this should be defined in an index file. Something like <k>key phrase<audio type="transcription" /></k> may be enough. It is preferable not to indicate any external file in the flat XML file. What media types, formats and sizes are allowed should be defined in the XDXF (package) standard. <rref> should be directly inside <ar> or the "container-<def>", unless it refers to a specific sense of the word; I don't understand how it could ever be inside <gr>.
<c>: I don't think it is needed for a semantic format.
<ex>: Missing one of the most common types of examples: quotations. I don't quite understand why examples should be indexed.
<co> : Not sure why comments should be indexed. Grammatical comments belong in the <gr> element; comments on etymology belong in the <etm> element. Types need to be defined in the standard so DS know how to handle them.
<sr> : Might be better to allow semantic relations to be defined in a separate XML file in dictionary packages. The element should still be available for flat XML files.
<etm> There needs to be a way to define the genealogical relationship of words. This is a proposal using nesting and 3-letter language codes with the example "fetish", as per the OED: <etm><orig><k lang="fra">fétiche</k><orig><k lang="por">feitiço</k><deftext>charm, sorcery (from which the earliest Eng. forms are directly adopted)</deftext></orig></orig></etm>. Which could be displayed by the DS as Etymology: From Portuguese "feitiço": charm, sorcery (from which the earliest Eng. forms are directly adopted); via French "fétiche". There may be better ways of defining this relationship, but this example is enough to show what should be possible. There still needs to be a free text element for any other etymological comments.
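
As a sketch of how a DS could turn the nested proposal into the display string above: the <orig> element and `lang` attribute are the suggestion made here, not existing XDXF markup, and the language-name table is a tiny hypothetical subset.

```python
import xml.etree.ElementTree as ET

ETM = (
    '<etm><orig><k lang="fra">fétiche</k>'
    '<orig><k lang="por">feitiço</k>'
    "<deftext>charm, sorcery</deftext></orig></orig></etm>"
)

LANG_NAMES = {"fra": "French", "por": "Portuguese"}  # illustrative subset

def chain(etm):
    """Walk nested <orig> elements, returning steps from most recent to oldest."""
    steps = []
    node = etm.find("orig")
    while node is not None:
        k = node.find("k")
        lang = LANG_NAMES.get(k.get("lang"), k.get("lang"))
        steps.append((lang, k.text, node.findtext("deftext")))
        node = node.find("orig")
    return steps

def render(etm):
    """Format the chain oldest-first: From X "a": gloss; via Y "b"."""
    parts = []
    for i, (lang, word, gloss) in enumerate(reversed(chain(etm))):
        piece = f'{"From" if i == 0 else "via"} {lang} "{word}"'
        if gloss:
            piece += f": {gloss}"
        parts.append(piece)
    return "Etymology: " + "; ".join(parts)

print(render(ET.fromstring(ETM)))
```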

IV. Other Comments on XDXF

About transliterations and writing systems: I don't think there is an ISO (or ISO-like) list of these systems; it would however be extremely useful to have an official list of allowed systems. This would make it clearer and easier for DS to handle them. A solution would be an official XDXF list for each of the two, with an official code for each system. It could be built by starting from the official ISO transliterations (that is, one per language) and the Unicode scripts (not the same as writing systems) and then adding to it as appropriate.

Information Pages: To allow for the XDXF format to include all information that is traditionally part of a dictionary, I believe it's necessary to include a new element under the root element, something I would call "information pages", to allow for including things like introductions, prefaces, bibliographies, abbreviations, etc. All things that are normally part of a dictionary but aren't allowed in the XDXF standard yet. This element should allow for including the same style tags as the textual definitions for key phrases plus <h#> and <p>. The number of information pages should be very limited, this is not an ebook format.

XDXF Project

Some improvements need to happen with the XDXF project itself:

There should be a detailed change log for what changes with each revision and all previous revisions should be made available. This will greatly help DS developers when they want to update their DS implementation of XDXF and, as an open format, an archive of past revisions should be available.
The XDXF standard definition should be moved to its own repository instead of being hosted on a folder on the makedict repository. The current situation makes it look as if XDXF is an internal format to be used in a specific tool and not an XML dictionary standard in its own right. (The first time I found it when looking for a good XML format for dictionaries that's what I thought, and I'm likely not the only one.)
Still related to the point above: XDXF needs its own website where people can find information about the standard, the standard description, the DTD description and good example dictionaries without going into GitHub folders. GitHub is fine for developers wanting to implement XDXF but not for dictionary users trying to understand what this XDXF thing is. GitHub Pages would be a simple and effective solution.
DTD file needs to include all available languages, otherwise dictionaries for languages not included will not be validated.
It would be great if the DTD file were commented in detail. A good example of this would be the JMDICT DTD file.

Related issues: #28, #6, #5

installation doubts

Hi all,

Sorry for my clumsiness. I don't quite understand the last point (7):

After that add "export MAKEDICT_PLUGIN_DIR=/usr/share/makedict/codecs" somewhere into ~/.bashrc.

I'm in Ubuntu 16.04. I got to install makedict and use it in my previous Ubuntu installation, but forgot how I did it after a clean install.

Thank you.

zho for chinese

Regarding the use of ISO language standards in @lang_from and @lang_to:

I believe these should be lower case according to ISO 639-3
It might make sense to add all languages, the space needed is negligible. If making a selection perhaps focus on the largest ones: e.g. Chinese "zho" is not in the DTD, but the much rarer "zha" Chuang/Zhuang is.

Link error - ld cannot find glib-2.0

$ cd Dropbox/Documents/xdxf_makedict_build
$ cmake ../xdxf_makedict
-- The C compiler identification is AppleClang 5.1.0.5030040
-- The CXX compiler identification is AppleClang 5.1.0.5030040
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found PkgConfig: /opt/local/bin/pkg-config (found version "0.28")
-- checking for one of the modules 'glib-2.0'
-- Found GLib2: glib-2.0;intl /opt/local/include/glib-2.0;/opt/local/lib/glib-2.0/include;/opt/local/include
-- Looking for include file glib/gregex.h
-- Looking for include file glib/gregex.h - not found
-- Found ZLIB: /usr/lib/libz.dylib (found version "1.2.5")
-- Found Gettext: /opt/local/bin/msgmerge (found version "0.18.3")
-- Performing Test ICONV_HAVE_WERROR
-- Performing Test ICONV_HAVE_WERROR - Success
-- Performing Test ICONV_SECOND_ARGUMENT_IS_CONST
-- Performing Test ICONV_SECOND_ARGUMENT_IS_CONST - Failed
-- Found Iconv: /usr/lib/libiconv.dylib
-- Looking for dgettext
-- Looking for dgettext - not found
-- Found libintl: /opt/local/lib/libintl.dylib
-- Looking for mmap
-- Looking for mmap - found
-- Looking for locale.h
-- Looking for locale.h - found
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/nomorethirst/Dropbox/Documents/xdxf_makedict_build
$ make
Scanning dependencies of target makedict
[ 3%] Building CXX object CMakeFiles/makedict.dir/src/makedict.cpp.o
[ 7%] Building CXX object CMakeFiles/makedict.dir/src/file.cpp.o
[ 11%] Building CXX object CMakeFiles/makedict.dir/src/parser.cpp.o
[ 15%] Building CXX object CMakeFiles/makedict.dir/src/generator.cpp.o
[ 19%] Building CXX object CMakeFiles/makedict.dir/src/process.cpp.o
[ 23%] Building CXX object CMakeFiles/makedict.dir/src/connector.cpp.o
[ 26%] Building CXX object CMakeFiles/makedict.dir/src/log.cpp.o
[ 30%] Building CXX object CMakeFiles/makedict.dir/src/utils.cpp.o
[ 34%] Building CXX object CMakeFiles/makedict.dir/src/xml.cpp.o
[ 38%] Building CXX object CMakeFiles/makedict.dir/src/charset_conv.cpp.o
[ 42%] Building CXX object CMakeFiles/makedict.dir/src/dictd_generator.cpp.o
[ 46%] Building CXX object CMakeFiles/makedict.dir/src/dictd_parser.cpp.o
[ 50%] Building CXX object CMakeFiles/makedict.dir/src/dsl_ipa.cpp.o
[ 53%] Building CXX object CMakeFiles/makedict.dir/src/dsl_parser.cpp.o
[ 57%] Building CXX object CMakeFiles/makedict.dir/src/dummy_generator.cpp.o
[ 61%] Building CXX object CMakeFiles/makedict.dir/src/dummy_parser.cpp.o
[ 65%] Building CXX object CMakeFiles/makedict.dir/src/lang_tbl.cpp.o
[ 69%] Building CXX object CMakeFiles/makedict.dir/src/lang_tbl_auto.cpp.o
[ 73%] Building CXX object CMakeFiles/makedict.dir/src/mapfile.cpp.o
[ 76%] Building CXX object CMakeFiles/makedict.dir/src/normalize_tags.cpp.o
[ 80%] Building CXX object CMakeFiles/makedict.dir/src/sdict_parser.cpp.o
[ 84%] Building CXX object CMakeFiles/makedict.dir/src/stardict_generator.cpp.o
[ 88%] Building CXX object CMakeFiles/makedict.dir/src/stardict_parser.cpp.o
[ 92%] Building CXX object CMakeFiles/makedict.dir/src/xdxf_generator.cpp.o
[ 96%] Building CXX object CMakeFiles/makedict.dir/src/xdxf_parser.cpp.o
Linking CXX executable makedict
ld: library not found for -lglib-2.0
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [makedict] Error 1
make[1]: *** [CMakeFiles/makedict.dir/all] Error 2
make: *** [all] Error 2

discussion link in wiki broken

I'd like to have some discussion. Where is the place for that? Please update the wiki.

A lot of XDXF, Stardict, DSL dictionaries

There (mirror) are a lot of En-Ru, En-En, Ru-En and Ru-Ru dictionaries in XDXF, Stardict, DSL and other formats.

self test fails

make test:
69% tests passed, 4 tests failed out of 13

Total Test time (real) = 18.53 sec

The following tests FAILED:
2 - t_dslparser (Failed)
5 - t_xdxfgenerator (Failed)
10 - t_parser_options (Failed)
11 - t_sdict_parser (Failed)

Is it not?

kref link displaying one piece of text and linking to another?

Hi. I'm trying to convert a dictionary I have to XDXF. Using makedict, I get a file with things like this in it:
<A href="bword://aaa">bbb</A>
That's meant to link to "aaa" while displaying the text "bbb". Is there any way to achieve this with kref?

How can I add image to XDXF file?

Hello!
Please help me add an image to an XDXF file! The image is in the same directory as the XDXF file.

Thanks!

Structure of definitions [move to wiki]

@k-sl wrote that:

<def>/<deftext>: The usage of the elements <def> and <deftext> is confusing.

I would like to bring more clarity to the usage of these elements. I encountered a problem while working with nested meaning groups and complex meanings that have some specific properties. Before rev. 33 there was only one tag for the text of a word meaning, which was <def>.

Let's look at different possible cases of word article structures. <entry> and <sense> are pseudocode tags and are used for easy understanding.

The revision attribute is missing in the schema and not fully defined in the XDXF draft

Basically, it is unclear in what format the revision should be stated. This should be strictly defined so that 3rd party XDXF clients could figure out what format version they're dealing with.

[i18n] Support of non-european languages and non-latin scripts

Here is a list of proposals:

1. Writing systems and scripts

@k-sl wrote:

<k>: This element must support defining different scripts/writing systems for the same "key phrase". This is different from different spellings, in that the user should be allowed to choose in the DS settings which script they prefer and the DS should only display the chosen script, and not show all key phrases repeated as many times as there are scripts. The DS may (possibly should) display the other script(s) with the definition text, as it does with transcription, etymology, etc. If the user searches correctly for a word in a different script than the one they chose, the entry should be displayed with the word in the chosen script as headword. An obvious example is Chinese: it is common to have all entries in both main variants, simplified and traditional Chinese, which, with the current format, means all entries are doubled in the DS, and a user will have to sift through the simplified entries, even if they only read traditional Chinese (and vice versa). The same is true for any other language which can be written in more than one script/writing system (either because different areas speaking the same language use different scripts or because it used to be written in a different script and the dictionary includes both variants).

The dictionary needs to have both simplified and traditional Chinese headwords; you need to be able to look up a word in either of the two standards, regardless of which variant is used for the definitions. You also need to be able to see the characters used in the alternative standard when looking up a word. Your suggestion would mean all entries for which simplified and traditional characters are the same would be repeated and that, when looking up a word, the reader would have no way to know how the word is written in the other standard. Besides, most of what I'm describing already works fine in XDXF: I just add both <k> tags to each article in my Chinese dictionaries and I've been using them like this for years. The problem is there is no way to define which is which, something that should be defined semantically, so the DS can show which is which, hide one if the reader wants to do so, and show the preferred version first, in all Chinese dictionaries.

Proposed solution:
We allow <k> to appear with or without a specification of which language, script or country variant that <k> is written in:

<k xml:lang="zh-Hans">词典</k>
<k xml:lang="zh-Hant">詞典</k>
<k xml:lang="zh-Latn-pinyin" type="transliteration">Zhōngguó</k>
<k xml:lang="zh-Latn-wadegile" type="transliteration">Chung1-kuo2</k>
<k xml:lang="zh-Latn-pinyin" type="indexable_as">zhongguo</k>

How to encode language and scripts? The most reasonable approach, and the one requiring the least work, is to use the BCP 47 standard to support various writing systems.
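As an illustration of how a DS might use such tags, here is a minimal Python sketch. It assumes the proposed xml:lang attribute on <k>; the sample article, the untagged third key and the function name pick_headword are hypothetical and not part of the XDXF standard. It only does exact, case-insensitive tag comparison, not full BCP 47 matching.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical article using the proposed markup.
ARTICLE = """
<ar>
  <k xml:lang="zh-Hans">词典</k>
  <k xml:lang="zh-Hant">詞典</k>
  <k>cidian</k>
</ar>
"""

def pick_headword(ar, preferred_tag):
    """Return the text of the first <k> whose xml:lang equals preferred_tag.

    Tags are compared case-insensitively. If no <k> matches, fall back to
    the first <k> in the article; return None if there is no <k> at all.
    """
    keys = ar.findall("k")
    for k in keys:
        if (k.get(XML_LANG) or "").lower() == preferred_tag.lower():
            return k.text
    return keys[0].text if keys else None

ar = ET.fromstring(ARTICLE)
print(pick_headword(ar, "zh-Hant"))
```

All <k> elements would still be indexed for lookup; only the displayed headword depends on the user's preferred tag.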

What to do with multilingual dictionaries?

We should change lang_to and lang_from to support xml:lang and allow us to encode several languages for multilingual dictionaries. I think this is not possible with <!ATTLIST>, so I guess we will have to create a new <!ELEMENT> inside meta_info. Am I wrong?

For this reason, I don't think that we need additional tags <tl> (for transliteration) and <pr> (for pronunciation).

XSLT scripts

Would be nice to see some xslt transform scripts for sorting an xdxf file, or yq type scripts to convert from yaml to xdxf. That could provide some more user friendly data entry possibilities. Interesting project!
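No XSLT script is given here, but the sorting idea can be sketched in Python instead. This is only an illustration under assumptions: the articles are taken to be <ar> elements with a <k> headword under a common parent (a <lexicon> wrapper in the sample), and the function name sort_articles is hypothetical.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in document; a real XDXF file would also carry meta information.
DOC = (
    "<xdxf><lexicon>"
    "<ar><k>zebra</k></ar>"
    "<ar><k>Apple</k></ar>"
    "<ar><k>mango</k></ar>"
    "</lexicon></xdxf>"
)

def sort_articles(parent):
    """Reorder the <ar> children of parent by their first <k>, ignoring case."""
    articles = parent.findall("ar")

    def sort_key(ar):
        k = ar.find("k")
        # itertext() also collects text nested inside child tags of <k>.
        return "".join(k.itertext()).casefold() if k is not None else ""

    for ar in articles:
        parent.remove(ar)
    for ar in sorted(articles, key=sort_key):
        parent.append(ar)

root = ET.fromstring(DOC)
lexicon = root.find("lexicon")
sort_articles(lexicon)
print([ar.find("k").text for ar in lexicon])
```

Case folding is a crude ordering; a language-aware collation would be needed for most real dictionaries.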

Some contradictions

I skimmed through the standard and found that this sentence:

"The choice of how they have to be rendered is shifted to dictionary-browsing software ("DS"), its settings and user preferences"

has some conflicts in the text, which can be found by searching for e.g. "DS should" and "DS must".

cheers

Tables support (word forms, grammar)

What is the recommended way to create tables with XDXF?

Magic string did not match when converting from stardict

If the StarDict IFO file is saved as "UTF-8 with BOM" (which is commonly the case), then I get the error "Magic string did not match". I suggest making the comparison for the magic string more robust so that it tolerates the BOM (i.e. 0xEF,0xBB,0xBF at the beginning of the file before the "actual" magic string).
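The suggested check can be sketched as follows. This is a Python illustration rather than makedict's actual C++ code; the magic string value and the function name has_ifo_magic are assumptions for the example.

```python
import codecs

# Assumed first line of a StarDict .ifo file; verify against the format description.
IFO_MAGIC = b"StarDict's dict ifo file"

def has_ifo_magic(data: bytes) -> bool:
    """Return True if data starts with the magic string.

    A leading UTF-8 BOM (0xEF, 0xBB, 0xBF) is skipped before comparing.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.startswith(IFO_MAGIC)

plain = IFO_MAGIC + b"\nversion=2.4.2\n"
with_bom = codecs.BOM_UTF8 + plain
print(has_ifo_magic(plain), has_ifo_magic(with_bom))
```

Only the comparison changes; the rest of the file would be parsed from the position after the BOM.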

h type for stardict parser

Currently unsupported, but it seems to be trivial to add support for it. Providing the patch.

https://github.com/jarekczek/various_files/blob/master/makedict/h.patch

Add `xdxf` topic to the repo

Could you please add more topics to the repo:

How to keep media files for dictionary (audio, images, SVGs, video)

This is an important question. One solution is to keep the whole dictionary in one file by embedding all media in the XDXF XML with base64 encoding. Do you think this is reasonable?
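To make the trade-off concrete, here is a small Python sketch of the embedding idea. The <media> element and its type and encoding attributes are hypothetical, invented for this example, and not part of XDXF. Base64 turns every 3 bytes into 4 characters, so embedded media grows by roughly a third.

```python
import base64
import xml.etree.ElementTree as ET

def embed_media(parent, payload: bytes, mime: str):
    """Append a hypothetical <media> element holding payload as base64 text."""
    el = ET.SubElement(parent, "media", {"type": mime, "encoding": "base64"})
    el.text = base64.b64encode(payload).decode("ascii")
    return el

def extract_media(el) -> bytes:
    """Decode the base64 text of a <media> element back to raw bytes."""
    return base64.b64decode(el.text)

ar = ET.Element("ar")
payload = bytes(range(256)) * 3  # 768 bytes of stand-in binary data
el = embed_media(ar, payload, "image/png")
print(len(payload), len(el.text))
```

Besides the size overhead, a DS would have to decode every media item at load or display time, which may matter for large dictionaries.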

Can not compile xdxf_makedict on Mac OS X 10.8.4

I get the following errors when I try to compile xdxf_makedict. Has anybody tested it on Mac OS?

~/dvcs_src/xdxf_makedict$ cd ..
~/dvcs_src$ mkdir xdxf_makedict_build
~/dvcs_src$ cd xdxf_makedict_build
~/dvcs_src/xdxf_makedict_build$ dir
~/dvcs_src/xdxf_makedict_build$ cmake ../xdxf_makedict
-- The C compiler identification is Clang 4.1.0
-- The CXX compiler identification is Clang 4.1.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found PkgConfig: /opt/local/bin/pkg-config (found version "0.28") 
-- checking for one of the modules 'glib-2.0'
-- Found GLib2: glib-2.0;intl /opt/local/include/glib-2.0;/opt/local/lib/glib-2.0/include
-- Looking for include file glib/gregex.h
-- Looking for include file glib/gregex.h - not found
-- Found ZLIB: /usr/lib/libz.dylib (found version "1.2.5") 
-- Found Gettext: /opt/local/bin/msgmerge (found version "0.18.3") 
-- Performing Test ICONV_HAVE_WERROR
-- Performing Test ICONV_HAVE_WERROR - Success
-- Performing Test ICONV_SECOND_ARGUMENT_IS_CONST
-- Performing Test ICONV_SECOND_ARGUMENT_IS_CONST - Failed
-- Found Iconv: /usr/lib/libiconv.dylib
-- Looking for dgettext
-- Looking for dgettext - not found
-- Found libintl: /opt/local/lib/libintl.dylib
-- Looking for mmap
-- Looking for mmap - found
-- Looking for locale.h
-- Looking for locale.h - found
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/py/dvcs_src/xdxf_makedict_build
~/dvcs_src/xdxf_makedict_build$ check
checkRandomLinesExist.py  check_dylib               checkcites                checkgid                  checknr                   checkrad                  checksyms                 
~/dvcs_src/xdxf_makedict_build$ check
checkRandomLinesExist.py  check_dylib               checkcites                checkgid                  checknr                   checkrad                  checksyms                 
~/dvcs_src/xdxf_makedict_build$ make checkinstall
make: *** No rule to make target `checkinstall'.  Stop.
~/dvcs_src/xdxf_makedict_build$ make -j8
Scanning dependencies of target makedict
[  3%] [  7%] [ 11%] [ 15%] Building CXX object CMakeFiles/makedict.dir/src/file.cpp.o
Building CXX object CMakeFiles/makedict.dir/src/makedict.cpp.o
[ 19%] Building CXX object CMakeFiles/makedict.dir/src/generator.cpp.o
[ 23%] Building CXX object CMakeFiles/makedict.dir/src/parser.cpp.o
Building CXX object CMakeFiles/makedict.dir/src/process.cpp.o
[ 30%] [ 30%] Building CXX object CMakeFiles/makedict.dir/src/connector.cpp.o
Building CXX object CMakeFiles/makedict.dir/src/utils.cpp.o
Building CXX object CMakeFiles/makedict.dir/src/log.cpp.o
In file included from /Users/py/dvcs_src/xdxf_makedict/src/parser.cpp:32:
In file included from /Users/py/dvcs_src/xdxf_makedict/src/utils.hpp:4:
In file included from /usr/include/c++/4.2.1/algorithm:64:
In file included from /usr/include/c++/4.2.1/bits/stl_algobase.h:70:
In file included from /usr/include/c++/4.2.1/iosfwd:44:
/usr/include/c++/4.2.1/bits/c++locale.h:66:24: error: no member named 'libintl_setlocale' in namespace 'std'
    char* __old = std::setlocale(LC_NUMERIC, NULL);
                  ~~~~~^
/opt/local/include/libintl.h:432:19: note: expanded from macro 'setlocale'
#define setlocale libintl_setlocale
                  ^
In file included from /Users/py/dvcs_src/xdxf_makedict/src/parser.cpp:32:
In file included from /Users/py/dvcs_src/xdxf_makedict/src/utils.hpp:4:
In file included from /usr/include/c++/4.2.1/algorithm:64:
In file included from /usr/include/c++/4.2.1/bits/stl_algobase.h:70:
In file included from /usr/include/c++/4.2.1/iosfwd:44:
/usr/include/c++/4.2.1/bits/c++locale.h:72:7: error: no member named 'libintl_setlocale' in namespace 'std'
        std::setlocale(LC_NUMERIC, "C");
        ~~~~~^
/opt/local/include/libintl.h:432:19: note: expanded from macro 'setlocale'
#define setlocale libintl_setlocale
                  ^
In file included from /Users/py/dvcs_src/xdxf_makedict/src/parser.cpp:32:
In file included from /Users/py/dvcs_src/xdxf_makedict/src/utils.hpp:4:
In file included from /usr/include/c++/4.2.1/algorithm:64:
In file included from /usr/include/c++/4.2.1/bits/stl_algobase.h:70:
In file included from /usr/include/c++/4.2.1/iosfwd:44:
/usr/include/c++/4.2.1/bits/c++locale.h:88:7: error: no member named 'libintl_setlocale' in namespace 'std'
        std::setlocale(LC_NUMERIC, __sav);
        ~~~~~^
/opt/local/include/libintl.h:432:19: note: expanded from macro 'setlocale'
#define setlocale libintl_setlocale
                  ^
In file included from /Users/py/dvcs_src/xdxf_makedict/src/makedict.cpp:34:
In file included from /usr/include/c++/4.2.1/map:64:
In file included from /usr/include/c++/4.2.1/bits/stl_tree.h:68:
In file included from /usr/include/c++/4.2.1/bits/stl_algobase.h:70:
In file included from /usr/include/c++/4.2.1/iosfwd:44:
/usr/include/c++/4.2.1/bits/c++locale.h:66:24: error: no member named 'libintl_setlocale' in namespace 'std'
    char* __old = std::setlocale(LC_NUMERIC, NULL);
                  ~~~~~^
/opt/local/include/libintl.h:432:19: note: expanded from macro 'setlocale'
#define setlocale libintl_setlocale
                  ^
In file included from /Users/py/dvcs_src/xdxf_makedict/src/makedict.cpp:34:
In file included from /usr/include/c++/4.2.1/map:64:
In file included from /usr/include/c++/4.2.1/bits/stl_tree.h:68:
In file included from /usr/include/c++/4.2.1/bits/stl_algobase.h:70:
In file included from /usr/include/c++/4.2.1/iosfwd:44:
/usr/include/c++/4.2.1/bits/c++locale.h:72:7: error: no member named 'libintl_setlocale' in namespace 'std'
        std::setlocale(LC_NUMERIC, "C");
        ~~~~~^
/opt/local/include/libintl.h:432:19: note: expanded from macro 'setlocale'
#define setlocale libintl_setlocale
                  ^
In file included from /Users/py/dvcs_src/xdxf_makedict/src/makedict.cpp:34:
In file included from /usr/include/c++/4.2.1/map:64:
In file included from /usr/include/c++/4.2.1/bits/stl_tree.h:68:
In file included from /usr/include/c++/4.2.1/bits/stl_algobase.h:70:
In file included from /usr/include/c++/4.2.1/iosfwd:44:
/usr/include/c++/4.2.1/bits/c++locale.h:88:7: error: no member named 'libintl_setlocale' in namespace 'std'
        std::setlocale(LC_NUMERIC, __sav);
        ~~~~~^
/opt/local/include/libintl.h:432:19: note: expanded from macro 'setlocale'
#define setlocale libintl_setlocale
                  ^
In file included from /Users/py/dvcs_src/xdxf_makedict/src/generator.cpp:33:
In file included from /usr/include/c++/4.2.1/numeric:66:
In file included from /usr/include/c++/4.2.1/iterator:69:
In file included from /usr/include/c++/4.2.1/ostream:44:
In file included from /usr/include/c++/4.2.1/ios:42:
In file included from /usr/include/c++/4.2.1/iosfwd:44:
/usr/include/c++/4.2.1/bits/c++locale.h:66:24: error: no member named 'libintl_setlocale' in namespace 'std'
    char* __old = std::setlocale(LC_NUMERIC, NULL);
                  ~~~~~^
/opt/local/include/libintl.h:432:19: note: expanded from macro 'setlocale'
#define setlocale libintl_setlocale
                  ^
In file included from /Users/py/dvcs_src/xdxf_makedict/src/generator.cpp:33:
In file included from /usr/include/c++/4.2.1/numeric:66:
In file included from /usr/include/c++/4.2.1/iterator:69:
In file included from /usr/include/c++/4.2.1/ostream:44:
In file included from /usr/include/c++/4.2.1/ios:42:
In file included from /usr/include/c++/4.2.1/iosfwd:44:
/usr/include/c++/4.2.1/bits/c++locale.h:72:7: error: no member named 'libintl_setlocale' in namespace 'std'
        std::setlocale(LC_NUMERIC, "C");
        ~~~~~^
/opt/local/include/libintl.h:432:19: note: expanded from macro 'setlocale'
#define setlocale libintl_setlocale
                  ^
In file included from /Users/py/dvcs_src/xdxf_makedict/src/generator.cpp:33:
In file included from /usr/include/c++/4.2.1/numeric:66:
In file included from /usr/include/c++/4.2.1/iterator:69:
In file included from /usr/include/c++/4.2.1/ostream:44:
In file included from /usr/include/c++/4.2.1/ios:42:
In file included from /usr/include/c++/4.2.1/iosfwd:44:
/usr/include/c++/4.2.1/bits/c++locale.h:88:7: error: no member named 'libintl_setlocale' in namespace 'std'
        std::setlocale(LC_NUMERIC, __sav);
        ~~~~~^
/opt/local/include/libintl.h:432:19: note: expanded from macro 'setlocale'
#define setlocale libintl_setlocale
                  ^
[ 34%] [ 38%] Building CXX object CMakeFiles/makedict.dir/src/xml.cpp.o
Building CXX object CMakeFiles/makedict.dir/src/charset_conv.cpp.o
[ 42%] 3 errors generated.
make[2]: *** [CMakeFiles/makedict.dir/src/parser.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
Building CXX object CMakeFiles/makedict.dir/src/dictd_generator.cpp.o
[ 46%] Building CXX object CMakeFiles/makedict.dir/src/dictd_parser.cpp.o
3 errors generated.
make[2]: *** [CMakeFiles/makedict.dir/src/makedict.cpp.o] Error 1
3 errors generated.
make[2]: *** [CMakeFiles/makedict.dir/src/generator.cpp.o] Error 1
/Users/py/dvcs_src/xdxf_makedict/src/charset_conv.cpp:107:5: warning: add explicit braces to avoid dangling else [-Wdangling-else]
                } else {
                  ^
1 warning generated.
make[1]: *** [CMakeFiles/makedict.dir/all] Error 2
make: *** [all] Error 2

[feature request] Dictionary of synonyms - too many key phrases

I am compiling a dictionary of synonyms and some articles have too many keys. So, my problem is "how to show them in a DS (particularly GoldenDict)?" I have tried several styles, like dense line spacing or headers joined in a flat list "key1; key2; key3; …", but I still feel they consume too much space.

I am working on a dictionary of Latin words. There are no "official" spelling rules, so some words have spelling variants (e.g., pellex, pelex, paelex). All these variants are "correct" and are not conjugation/declension forms, so hunspell can't bring them to one lemma. Also, it is very common among Latin dictionaries to include Greek words.

Could we add more keys for search (lookup), but not show them to users? The problem is partially about visual design, but spelling variants and optional keys (Greek in my case) should be handled in some way by the dictionary format. #30 proposes a <spelling> tag; I am thinking about <k type="hidden">. Or maybe, instead of showing all keywords, we could include a predefined header for the DS?
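A minimal Python sketch of how a DS could treat the proposed <k type="hidden">: every key is indexed for lookup, but only non-hidden keys are rendered. The attribute is only the proposal above, not part of the standard, and the sample article and the function name split_keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical article: one visible headword, two search-only spelling variants.
ARTICLE = """
<ar>
  <k>paelex</k>
  <k type="hidden">pelex</k>
  <k type="hidden">pellex</k>
</ar>
"""

def split_keys(ar):
    """Return (searchable, displayed) lists of key texts for one article."""
    keys = ar.findall("k")
    searchable = [k.text for k in keys]
    displayed = [k.text for k in keys if k.get("type") != "hidden"]
    return searchable, displayed

searchable, displayed = split_keys(ET.fromstring(ARTICLE))
print(searchable, displayed)
```

A reader that does not know the attribute would simply show all keys, so existing behaviour would not break.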

Attached below is a sample article from my dictionary. It already has 11 keywords (all about mental disease), and I would like to add 9 more but doubt that the exceedingly long list of keys will be useful.

Share XDXF dictionaries here

Thousands of XDXF dictionaries.

makedict throws "Corrupted dictionary or problem with hard disk" error reading most FreeDict dictionaries, e.g. freedict-afr-deu.index

> makedict --verbose=9 -o xdxf freedict-afr-deu.index
fill_codecs_table: Plugin directory: /usr/local/lib/makedict-codecs
/usr/local/lib/makedict-codecs/mueller7_parser.py is executable, check if it is makedict codec
/usr/local/lib/makedict-codecs/apresyan.py is executable, check if it is makedict codec
Input codec: apresyan   /usr/local/lib/makedict-codecs/apresyan.py
Input codec: mueller7   /usr/local/lib/makedict-codecs/mueller7_parser.py
Corrupted dictionary or problem with hard disk

Sample freedict-afr-deu is here: https://app.box.com/s/jkyzx0aea7wtuaba8b2o

A few FreeDict dictionaries are processed just fine.

These throw the error:

freedict-afr-deu.index
freedict-cze-eng.index
freedict-dan-eng.index
freedict-deu-eng.index
freedict-deu-fra.index
freedict-deu-ita.index
freedict-deu-nld.index
freedict-deu-por.index
freedict-eng-deu.index
freedict-eng-fra.index
freedict-eng-hun.index
freedict-eng-iri.index
freedict-eng-ita.index
freedict-eng-lat.index
freedict-eng-nld.index
freedict-eng-por.index
freedict-eng-rom.index
freedict-eng-rus.index
freedict-eng-scr.index
freedict-eng-spa.index
freedict-eng-swe.index
freedict-eng-wel.index
freedict-fra-deu.index
freedict-fra-eng.index
freedict-fra-nld.index
freedict-gla-deu.index
freedict-hun-eng.index
freedict-iri-eng.index
freedict-ita-deu.index
freedict-ita-eng.index
freedict-jpn-deu.index
freedict-lat-deu.index
freedict-lat-eng.index
freedict-nld-deu.index
freedict-nld-eng.index
freedict-nld-fra.index
freedict-por-deu.index
freedict-por-eng.index
freedict-scr-eng.index
freedict-slo-eng.index
freedict-spa-eng.index
freedict-swe-eng.index
freedict-tur-deu.index
freedict-wel-eng.index

These are OK:

freedict-ara-eng.index
freedict-cro-eng.index
freedict-eng-ara.index
freedict-eng-cro.index
freedict-eng-cze.index
freedict-eng-hin.index
freedict-eng-swa.index
freedict-eng-tur.index
freedict-hin-eng.index
freedict-swa-eng.index
freedict-tur-eng.index