- [1] 2017, Interaction of law and language in the EU: Challenges of translating in multilingual environment (17 pages)
- [2] 2008, Translation at the United Nations as Specialized Translation (16 pages)
For the sake of simpler documentation, I think we could create a convention for language attributes that mark source terms as proofread (or terminologically reviewed based on translators' feedback). This would be optimized for cases where the original terms can no longer be changed, even if their creators would consider the complaints valid. [2], for example, mentions that the UN has a strong review system in place (even before texts are passed to translators), which is very different from the situation in [3].
I'm not sure how many types of attributes we should create, but we need at least two. One is for ordinary proofreading (for example, fixing work done by software developers, or by people copying and pasting from other references that may already be wrong). The other is for translations done from material that already has context explaining what the concept means, but where the exact term in that language is likely to produce wrong translations.
Since we are now encoding English with an approach similar to BCP47 (but with ISO 639-3 as the baseline and a non-optional ISO 15924 script subtag), the source terms could remain eng-Latn, with a variant such as "eng-Latn-x-term1234", where term1234 identifies the variant. When exporting translation jobs, the person preparing them could try to export "eng-Latn-x-term1234" and, for terms where no such variant exists, fall back to the base eng-Latn.
Alternatively, the XLIFF files (or spreadsheets) given to translators could already distinguish the official term from the reviewed term.
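A minimal sketch of the fallback export described above, assuming a simple in-memory glossary keyed by concept and language tag. All names here (`BASE_TAG`, `variant_tag`, `export_rows`, the concept IDs and sample terms) are hypothetical and only illustrate the idea. The code produces plain rows, not real XLIFF, but each row keeps the official term next to the term actually sent to translators, as the spreadsheet/XLIFF option suggests.

```python
import re
from typing import Dict, List, Tuple

# Base source tag following the convention above:
# ISO 639-3 language code plus a mandatory ISO 15924 script code.
BASE_TAG = "eng-Latn"

# Hypothetical in-memory glossary: concept ID -> {language tag -> term}.
Glossary = Dict[str, Dict[str, str]]


def variant_tag(variant: str, base: str = BASE_TAG) -> str:
    """Build a private-use tag such as 'eng-Latn-x-term1234'."""
    # BCP 47 private-use subtags after "x" are 1 to 8 alphanumeric characters.
    if not re.fullmatch(r"[A-Za-z0-9]{1,8}", variant):
        raise ValueError(f"invalid private-use subtag: {variant!r}")
    return f"{base}-x-{variant}"


def source_term(entry: Dict[str, str], variant: str) -> Tuple[str, str]:
    """Return (tag, term): the reviewed variant if present, else the base term."""
    tag = variant_tag(variant)
    if tag in entry:
        return tag, entry[tag]
    return BASE_TAG, entry[BASE_TAG]


def export_rows(glossary: Glossary, variant: str) -> List[Dict[str, str]]:
    """Rows for a spreadsheet-style export, keeping the official term
    next to the term actually given to translators."""
    rows = []
    for concept_id, entry in sorted(glossary.items()):
        tag, term = source_term(entry, variant)
        rows.append({
            "concept": concept_id,
            "official": entry[BASE_TAG],
            "source_for_translation": term,
            "source_tag": tag,
        })
    return rows


# Illustrative data only: "birth-date" has no reviewed variant, so it falls back.
glossary: Glossary = {
    "gender": {
        "eng-Latn": "gender",
        "eng-Latn-x-term1234": "Biological Sex or Gender Identity",
    },
    "birth-date": {"eng-Latn": "date of birth"},
}

for row in export_rows(glossary, "term1234"):
    print(row)
```

Keeping both the official and the reviewed term in each row means translators can still see what the published vocabulary says, while working from the reviewed wording.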
Examples of use cases
Core-Person-Vocab head term "gender" (with a definition that mixes two concepts)
See also comment SEMICeu/CPOV#12 (comment).
Note: from a terminology point of view, defining a term in such generic terms is already a problem (especially considering that Core-Person-Vocabulary was supposed to be a planned controlled vocabulary). But even if the ambiguity is intentional, the way to design the head term in English would be a composed term with "OR", as in "Sex or Gender" or "Biological Sex or Gender Identity". Taking the head term from one of the sub-concepts and attaching the definitions of both concepts causes even more confusion.
For example, this table by HL7 https://confluence.hl7.org/display/VOC/Gender+Coding+with+International+Data+Exchange+Standards (https://archive.ph/VQR42) already uses the term "gender" in English, while in German the word is "Biologisches Geschlecht".
Note that the old version 1.0 also used the tables from ISO/IEC 5218:2004 (so if the HL7 compilers tried to find the best English term by following CPV 1.00, they would be led to use "gender"). Add to this that [1] (2017) already mentions that English as a working language is more ambiguous than German or French; this is not an isolated case.
Examples of conflicting issues with the head term "gender"
"Gender interacts with but is different from sex, which refers to the different biological and physiological characteristics of females, males and intersex persons, such as chromosomes, hormones and reproductive organs. Gender and sex are related to but different from gender identity." -- WHO
Even in English, most references on the definition used in the preview of the Person specification (not just WHO) would disagree with using the same short head word for both concepts. Moreover, most de facto values used in these fields are strongly related to "biological sex", which already has its own terminology.
Quick comments on potential strategies to document changes to source terms before preparing translation jobs
This is based on another job we're doing: compiling TICO-19 into HXLTM (see https://tico-19.github.io/ and https://github.com/EticaAI/tico-19-hxltm). In this podcast https://www.stitcher.com/show/the-global-podcast-2/episode/episode-14-twb-and-tico-19-project-80576088 the TICO-19 members mention that using already translated versions as the source, instead of going directly from English, is relevant. Looking at both the Spanish and French versions, I also somewhat agree that those translations seem less literal than the English source.
But here is one thing: even for TICO-19 (which is very different from CPV, since it was a project driven by urgency), one alternative could also be a proofread version of the English source.
I think that for urgent projects like TICO-19, if we add a feature to label alternative versions of a source term, whoever prepares the work to distribute for new translations could have more freedom and optimize for speed (dozens of terms, and days, if not hours, to take action). But Core-Person-Vocab has fewer terms and involves more planning. So if we document an additional attribute to justify switching to a new variant of a source term, we may also document that this requires more metadata (for example, organizations like WHO that could back up translators' feedback that the source terms are not aligned with their definitions).
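A hypothetical sketch of how that metadata difference could be checked before preparing a translation job. It assumes two review kinds (mirroring the two attribute types proposed earlier) and treats `justification` and `backed_by` as required only for planned vocabularies, not for urgent projects. The class, enum, and field names, and the sample values, are illustrative assumptions, not an existing schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ReviewKind(Enum):
    # Hypothetical labels for the two attribute types discussed earlier.
    PROOFREAD = "proofread"
    CONTEXT_REVIEW = "context-review"


@dataclass
class SourceVariant:
    tag: str                  # e.g. "eng-Latn-x-term1234"
    term: str
    kind: ReviewKind
    justification: str = ""   # why the source term was changed
    backed_by: List[str] = field(default_factory=list)  # supporting organizations


def missing_metadata(variant: SourceVariant, urgent: bool) -> List[str]:
    """Return the names of metadata fields still missing for this variant.
    Urgent projects (like TICO-19) skip the check to optimize for speed."""
    if urgent:
        return []
    problems = []
    if not variant.justification.strip():
        problems.append("justification")
    if not variant.backed_by:
        problems.append("backed_by")
    return problems


# Illustrative values only.
reviewed = SourceVariant(
    tag="eng-Latn-x-term1234",
    term="Biological Sex or Gender Identity",
    kind=ReviewKind.CONTEXT_REVIEW,
    justification="Head term 'gender' has a definition that mixes two concepts",
    backed_by=["WHO"],
)
quick = SourceVariant(tag="eng-Latn-x-quick001", term="example term", kind=ReviewKind.PROOFREAD)

print(missing_metadata(reviewed, urgent=False))
print(missing_metadata(quick, urgent=False))
```

The same record works for both cases; only the strictness of the check changes with the kind of project.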