We are looking to add some pronunciation information to English WordNet and it would b

Pronunciation information about schemas HOT 11 CLOSED

globalwordnet commented on August 29, 2024

Pronunciation information

from schemas.

Comments (11)

goodmami commented on August 29, 2024

Looks good in general. A few things:

1. I would maybe go for a more general word for "dialect" to avoid political controversies. Maybe "variety"?
2. Instead of a country code in "dialect" (or whatever we call it) and some specialization under "notes", can we combine them into one BCP 47 tag? This saves an attribute and it dovetails as a specialization of the lexicon's language attribute. E.g., en-GB-x-RP.
3. Do we need the / in the transcription? I can't imagine a use for phonetic transcription ([...] vs /.../), so perhaps we can assume it is a phonemic transcription and drop the / characters?
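
To make point 2 concrete, here is a minimal sketch of how a tool might split a combined tag such as en-GB-x-RP and check that it specializes the lexicon's language. The function names `split_variety_tag` and `specializes` are hypothetical, and the splitter is deliberately simplified: it only handles the shape language[-region][-x-private...], not the full BCP 47 grammar.

```python
def split_variety_tag(tag: str) -> dict:
    """Split a simple BCP 47 tag into language, region, and private-use parts.

    Simplified on purpose: script, variant, and extension subtags are ignored.
    """
    subtags = tag.split("-")
    result = {"language": subtags[0].lower(), "region": None, "private": []}
    rest = subtags[1:]
    lowered = [s.lower() for s in rest]
    # Everything after the singleton "x" is private use (e.g. "RP").
    if "x" in lowered:
        i = lowered.index("x")
        result["private"] = rest[i + 1:]
        rest = rest[:i]
    # A two-letter alphabetic or three-digit subtag is treated as the region.
    for sub in rest:
        if (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            result["region"] = sub.upper()
    return result


def specializes(variety: str, lexicon_language: str) -> bool:
    """True if the variety tag shares its primary language with the lexicon."""
    return (split_variety_tag(variety)["language"]
            == split_variety_tag(lexicon_language)["language"])


print(split_variety_tag("en-GB-x-RP"))
print(specializes("en-GB-x-RP", "en"))
```

A production tool would more likely rely on a dedicated language-tag library than on a hand-written splitter like this one.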

jmccrae commented on August 29, 2024

Hi.

1. Yes, this is a good point, variety is better than dialect.
2. We could do this. I guess that this would mean duplicating the language code, but this is okay.
3. I would guess that some would prefer a phonemic transcription. We could drop the / and have an attribute for phonemic transcriptions?

fcbond commented on August 29, 2024

I agree with Michael's suggestions.

On Thu, Jul 30, 2020 at 11:37 PM John McCrae wrote:

Hi.
1. yes, this is a good point, variety is better than dialect
2. we could do this. I guess that this would mean duplicating the language code, but this is okay
3. I would guess that some would prefer a phonemic transcription. We could drop the / and have an attribute for phonemic transcriptions?

-- Francis Bond <http://www3.ntu.edu.sg/home/fcbond/> Division of Linguistics and Multilingual Studies Nanyang Technological University

lmorgadodacosta commented on August 29, 2024

Hi there,

We have been working/discussing this exact topic a tiny bit for Kristang -- we are hoping to provide IPA and voice recordings for individual lemmas soon.

The problem I'd like to raise here is that Kristang shows a lot of metathesis in certain consonant clusters, e.g. ‘-dr-’ (kodrah and kordah for ‘to wake up’). Within the context of revitalization, as we want people to start using a single spelling, we have decided it would be best to cluster these as "Forms" under a single lemma (i.e. the canonical form). However, these internal forms do have different pronunciations.

Up to this point we were happy to use the Tag element (available to both Forms and Lemmas) and come up with our own "category" notation. But I think including such a Pronunciation element is definitely an improvement. However, I would like to see what you all think about:

making "Pronunciation" available to both Lemmas and Forms
adding an explicit attribute to a sound file path

goodmami commented on August 29, 2024

making "Pronunciation" available to both Lemmas and Forms

I think this makes sense. Actually, in a new Python-based wordnet module I'm working on, all lemmas are just forms anyway, so doing something different for lemmas and forms would be more trouble than doing the same thing (but this, at least, is just a selfish reason).

adding an explicit attribute to a sound file path

I'm less enthused about this, but if we're adding logos (#3), then it's not breaking new ground to link to external files. However, shouldn't this be a URL instead of a file path? If a file path, then absolute paths won't work, and we'd need some kind of resource directory such that the paths are relative to this directory, or something. This sounds like over-engineering.

Better, perhaps, would be that your application provides a mapping from local paths into the ids of the wordnet. The trouble is that lemmas/forms do not have their own ids, so it would have to be linked to the LexicalEntry, then to the writtenForm under that entry (are forms guaranteed to be unique under a lexical entry?).

Another issue is when you want multiple audio files for the same lemma/form (e.g., from multiple speakers). It doesn't seem like an attribute for a file path or URL would easily scale to multiple files.
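
As a rough illustration of the application-side mapping described above, the sketch below keys recordings on the pair (lexical entry id, written form) and stores a list of URLs per key, so several speakers can be attached to the same form. The class name `AudioIndex`, the entry id, and the example.org URLs are all hypothetical, and the sketch assumes written forms are unique within an entry, which the thread leaves open.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Key: (LexicalEntry id, writtenForm). Assumes forms are unique per entry.
FormKey = Tuple[str, str]


class AudioIndex:
    """Application-side mapping from wordnet forms to audio URLs."""

    def __init__(self) -> None:
        self._urls: DefaultDict[FormKey, List[str]] = defaultdict(list)

    def add(self, entry_id: str, written_form: str, url: str) -> None:
        self._urls[(entry_id, written_form)].append(url)

    def lookup(self, entry_id: str, written_form: str) -> List[str]:
        # Return a copy; .get avoids creating empty entries on lookup.
        return list(self._urls.get((entry_id, written_form), []))


index = AudioIndex()
# Hypothetical entry id and URLs, for illustration only.
index.add("example-kodrah-v", "kodrah", "https://example.org/audio/kodrah-speaker1.ogg")
index.add("example-kodrah-v", "kodrah", "https://example.org/audio/kodrah-speaker2.ogg")
index.add("example-kodrah-v", "kordah", "https://example.org/audio/kordah-speaker1.ogg")

print(index.lookup("example-kodrah-v", "kodrah"))
```

Because each key holds a list, this approach handles multiple recordings per form, which a single file-path attribute would not.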

1313ou commented on August 29, 2024

Implications: IPA symbols are not ASCII, so all tools must handle UTF8 (or whatever charset is defined as desired)

jmccrae commented on August 29, 2024

Yes, I had intended this to be available for Forms as well as Lemmas
We could add a URL to the sound file if available, this is useful for some even if it is not ever used
I think UTF-8 is already required by the serialization. If someone wants a strict ASCII file they will have to use an ASCII based transcription scheme
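
The encoding point can be checked with a few lines of Python. This is only a sketch: the helper name `is_ascii_only` is hypothetical, and the transcription is a common IPA rendering of "tomato" used purely as sample data.

```python
def is_ascii_only(text: str) -> bool:
    """Return True if the text can be encoded as strict ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


ipa = "təˈmeɪtoʊ"  # sample IPA transcription, contains non-ASCII symbols
encoded = ipa.encode("utf-8")

# IPA symbols survive a UTF-8 round trip but cannot be written as ASCII.
assert encoded.decode("utf-8") == ipa
print(is_ascii_only(ipa))
print(is_ascii_only("tomato"))
```

A tool that opens wordnet files with an explicit `encoding="utf-8"` argument should therefore handle IPA transcriptions without further changes.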

lmorgadodacosta commented on August 29, 2024

Sorry, by sound file path I definitely meant URL or some public URI.
I do kinda see the problem raised by Mike for multiple recordings of the same lemma/form... But if pronunciations are multiple elements, then you could just provide multiple Pronunciation elements. Individual projects could then use the "notes" attribute to keep information about the speaker, if necessary (e.g. male/female). But I would give a quick link to something that is very meaningful under the element we're discussing.
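
The idea of repeating the element once per recording could look roughly like the sketch below. The attribute names `variety`, `audio`, and `notes` are assumptions pieced together from this discussion, not a confirmed schema; the entry id and example.org URLs are placeholders, and the transcriptions are common IPA renderings of "tomato" used only as sample data.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are assumptions.
SAMPLE = """
<LexicalEntry id="example-tomato-n">
  <Lemma writtenForm="tomato" partOfSpeech="n">
    <Pronunciation variety="en-GB" audio="https://example.org/audio/tomato-gb-1.ogg" notes="speaker 1">təˈmɑːtəʊ</Pronunciation>
    <Pronunciation variety="en-GB" audio="https://example.org/audio/tomato-gb-2.ogg" notes="speaker 2">təˈmɑːtəʊ</Pronunciation>
    <Pronunciation variety="en-US">təˈmeɪtoʊ</Pronunciation>
  </Lemma>
</LexicalEntry>
"""


def pronunciations(xml_text: str) -> list:
    """Collect every Pronunciation found under a Lemma or Form element."""
    root = ET.fromstring(xml_text)
    rows = []
    for parent in root.iter():
        if parent.tag not in ("Lemma", "Form"):
            continue
        for pron in parent.findall("Pronunciation"):
            rows.append({
                "form": parent.get("writtenForm"),
                "variety": pron.get("variety"),
                "audio": pron.get("audio"),  # None when no recording is linked
                "notes": pron.get("notes"),
                "text": (pron.text or "").strip(),
            })
    return rows


for row in pronunciations(SAMPLE):
    print(row)
```

Treating Lemma and Form identically in the loop mirrors the earlier suggestion that pronunciations be available on both.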

1313ou commented on August 29, 2024

To my knowledge the current state of EWN does not use characters that require coding outside ASCII, so the current files are both ASCII and UTF8. So the relevant tests are still to come.

goodmami commented on August 29, 2024

To my knowledge the current state of EWN does not use characters that require coding outside ASCII, so the current files are both ASCII and UTF8. So the relevant tests are still to come.

I was surprised by this and thought that surely things like jalapeño and résumé would have the diacritics in EWN, even if only as alternative forms, but found nothing but ASCII throughout the whole file. In any case, there are non-English wordnets with plenty of non-ASCII forms, so it would be unfortunate if any tools assumed wordnets to be ASCII-only.

goodmami commented on August 29, 2024

Returning to the issue of marking a transcription as phonemic or phonetic... What do people think of keeping the IPA delimiters (/.../ or [...]) in the actual transcription? When I suggested dropping the delimiters (see (3) in my comment above), I assumed we only cared about phonemic transcriptions. Since for non-IPA transcription the attribute may be irrelevant (or implicit given the notation attribute), perhaps the phonemic attribute is a bad idea. Furthermore, the IPA delimiters are shorter and may be clearer for someone familiar with IPA.
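
If the delimiters were kept in the transcription, a consumer could recover the phonemic/phonetic distinction from them, as in this minimal sketch. The function name `classify_transcription` is hypothetical, and undelimited input (for example a non-IPA notation) is simply reported as unclassified.

```python
from typing import Optional, Tuple


def classify_transcription(raw: str) -> Tuple[str, Optional[str]]:
    """Strip IPA delimiters and report the transcription level they imply.

    /.../ marks a phonemic transcription, [...] a phonetic one; anything
    else is returned unchanged with no level.
    """
    text = raw.strip()
    if len(text) >= 2 and text.startswith("/") and text.endswith("/"):
        return text[1:-1], "phonemic"
    if len(text) >= 2 and text.startswith("[") and text.endswith("]"):
        return text[1:-1], "phonetic"
    return text, None


print(classify_transcription("/kæt/"))
print(classify_transcription("[kʰæt]"))
```

This keeps the data self-describing for IPA while leaving other notations untouched, which is the trade-off raised in the comment above.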
