Data Input/Storage Recommendation (cog, open issue)

sillsdev commented on August 27, 2024
Data Input/Storage Recommendation

from cog.

Comments (7)

ddaspit commented on August 27, 2024

As you have found, Cog was never intended to be used for phonetic transcription. Cog was designed to specifically focus on comparison and analysis of word lists. The assumption is that the word lists have been captured and transcribed in another application and then imported into Cog. Cog was not intended to replace or duplicate the functionality of tools like WordSurv, Excel, or ELAN.

As you mentioned, one possible future direction for Cog is to access WordSurv data directly. WordSurv is specifically designed for capturing word lists, so it seems like a complementary fit for Cog and WordSurv to interact in this way. ELAN is designed for a different purpose, which is annotating audio and video data. Although it can be used to transcribe word lists, it is not specifically designed to do so. It is more of a general-purpose tool. The same is true of ELAN's XML format. It is a format for capturing annotations of video and audio data and not word lists. In this regard, it would probably be an inappropriate format for Cog to use natively to store word lists.

Having said that, I do think it makes sense to add EAF import to Cog. No one has requested this feature before, so I'm not sure how much ELAN is used by surveyors, but I could certainly see a surveyor using ELAN for word list transcription. Are you using ELAN to do phonetic transcription of word lists? The tricky thing with implementing EAF import is the generic nature of ELAN. There is probably no standard way of formatting the ELAN annotations when capturing word list transcriptions. Each user could do it differently. The EAF import would need to be flexible enough to deal with the various ways that a user could have set up their annotations.

Steve-Miller commented on August 27, 2024

I have indeed used ELAN together with SayMore for phonetic transcription of a word list. As a user, I didn't have to worry about the proper .eaf format. SayMore opened up the file for me with the proper format in ELAN with a click or two. This eventually became the beginning of my lexicon in FLEx. I'm envisioning the same workflow for Cog.

I doubt I would have thought of using ELAN's .eaf format as a data store on my own. I'm just following behind JohnH, who used it in SayMore, and used it quite well. The philosophy is: "SayMore borrows ELAN’s file format, so that you can do the basics in SayMore (transcription and translation) and then just double-click on the file to do further work in ELAN" if you wish. (http://saymore.palaso.org/news-about-saymore/) Since I've effectively used SayMore + ELAN to transcribe a word list, and since Cog, like SayMore, does not intend to have its own data store, I thought ELAN would be a natural fit for Cog.

In my mind, this is not just fitting WordSurv together with Cog. This is looking at the larger "ecosystem" (to use a buzzword) already established between SayMore, ELAN, and FLEx. I would think having a workflow from WordSurv/Cog all the way into FLEx would be of interest to SIL. I know it's of interest to me. I once wrote about this to Beth, but I found today that Ryan Pennington already wrote up and published a paper on it: https://www.academia.edu/6474779/Producing_time-aligned_interlinear_texts_Towards_a_SayMore_FLEx_ELAN_workflow. (This was tough for me to get to, even with an Academia account, so you might have to work at it.)

While storing word list transcriptions natively in a specific .eaf format seems like the most elegant solution to me, and entirely appropriate given everything SayMore has done, an .eaf import is my second choice.

Steve-Miller commented on August 27, 2024

FWIW, this is what SayMore says about the .eaf structure it expects, copied from the help file:


ELAN allows a richly nested and flexible set of tiers, which may be different for each media file. When SayMore uses an ELAN file as the basis for creating a media file's annotation file, it expects certain tiers to exist in that ELAN file. Others may be present, but they will be ignored by SayMore.

If you have an existing ELAN file and would like to associate it with a media file in SayMore, the following must be true:

--There is a Transcription tier which has a type for which Time-alignable is selected.

--If you already have a tier for translation of those phrases, it must be:

----a child of Transcription

----named Phrase Free Translation

----have a type for which the stereotype is Symbolic Association.

To use an existing ELAN file as the basis for a SayMore annotation file, select Copy an existing ELAN file on the Start Annotating tab. Then you can work with oral translation annotations and careful speech annotations, and transcription and free translation annotations.

If necessary, open the file in ELAN and work with transcription and free translations there. Be careful not to remove or rename the 'Transcription' and 'Free Translation' tiers, or add any additional tiers.
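The tier requirements quoted above could be checked mechanically before an import. The following is a minimal sketch, not SayMore's or Cog's actual code. It assumes the usual EAF element and attribute names (TIER, TIER_ID, PARENT_REF, LINGUISTIC_TYPE_REF, LINGUISTIC_TYPE, TIME_ALIGNABLE, CONSTRAINTS); the function name check_saymore_tiers and the trimmed-down sample document are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, heavily trimmed EAF document used only for illustration.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="Transcription" LINGUISTIC_TYPE_REF="Transcription"/>
  <TIER TIER_ID="Phrase Free Translation" LINGUISTIC_TYPE_REF="Translation"
        PARENT_REF="Transcription"/>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="Transcription" TIME_ALIGNABLE="true"/>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="Translation" TIME_ALIGNABLE="false"
                   CONSTRAINTS="Symbolic_Association"/>
</ANNOTATION_DOCUMENT>"""


def check_saymore_tiers(eaf_xml: str) -> List[str]:
    """Return a list of problems; an empty list means the layout looks usable."""
    root = ET.fromstring(eaf_xml)
    # Index tiers and linguistic types by their IDs.
    tiers = {t.get("TIER_ID"): t for t in root.iter("TIER")}
    types = {lt.get("LINGUISTIC_TYPE_ID"): lt for lt in root.iter("LINGUISTIC_TYPE")}
    problems = []

    transcription = tiers.get("Transcription")
    if transcription is None:
        problems.append("no tier named 'Transcription'")
    else:
        ling_type = types.get(transcription.get("LINGUISTIC_TYPE_REF"))
        if ling_type is None or ling_type.get("TIME_ALIGNABLE") != "true":
            problems.append("'Transcription' tier type is not time-alignable")

    # The translation tier is optional, but if present it must match the rules.
    translation = tiers.get("Phrase Free Translation")
    if translation is not None:
        if translation.get("PARENT_REF") != "Transcription":
            problems.append("'Phrase Free Translation' is not a child of 'Transcription'")
        ling_type = types.get(translation.get("LINGUISTIC_TYPE_REF"))
        if ling_type is None or ling_type.get("CONSTRAINTS") != "Symbolic_Association":
            problems.append("'Phrase Free Translation' type is not Symbolic Association")
    return problems


if __name__ == "__main__":
    print(check_saymore_tiers(SAMPLE_EAF))
```

Returning a list of problems rather than raising on the first one lets a user see everything that needs fixing in ELAN in a single pass.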

ddaspit commented on August 27, 2024

This information is definitely helpful. Cog could follow the same tier format as SayMore. The "Transcription" tier would be used for the IPA transcription and the "Phrase Free Translation" tier would be used for the meaning. Does that make sense? Obviously, SayMore is used for any kind of recorded session, not just word lists, so Cog would have the additional requirements that each word is in a separate annotation and that each meaning in the "Phrase Free Translation" tier is unique. Importing this format wouldn't be hard. I would have to think a lot more about how to use the EAF file directly instead of importing. Cog would still need its own project file, since it holds a lot of other information besides the word lists.
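To make the import idea concrete, here is a rough sketch of reading meaning/transcription pairs out of such a file. It is not Cog's implementation (Cog itself is not written in Python). It assumes the usual EAF layout in which time-aligned annotations are ALIGNABLE_ANNOTATION elements and dependent annotations are REF_ANNOTATION elements pointing back through ANNOTATION_REF. The function name read_word_list is hypothetical, and the word forms in the sample are invented placeholders, not real data.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical, trimmed EAF fragment; the transcriptions are invented.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="Transcription">
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a1">
      <ANNOTATION_VALUE>pata</ANNOTATION_VALUE>
    </ALIGNABLE_ANNOTATION></ANNOTATION>
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a2">
      <ANNOTATION_VALUE>kiso</ANNOTATION_VALUE>
    </ALIGNABLE_ANNOTATION></ANNOTATION>
  </TIER>
  <TIER TIER_ID="Phrase Free Translation" PARENT_REF="Transcription">
    <ANNOTATION><REF_ANNOTATION ANNOTATION_ID="a3" ANNOTATION_REF="a1">
      <ANNOTATION_VALUE>hand</ANNOTATION_VALUE>
    </REF_ANNOTATION></ANNOTATION>
    <ANNOTATION><REF_ANNOTATION ANNOTATION_ID="a4" ANNOTATION_REF="a2">
      <ANNOTATION_VALUE>eye</ANNOTATION_VALUE>
    </REF_ANNOTATION></ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_word_list(eaf_xml: str) -> List[Tuple[str, str]]:
    """Return (meaning, transcription) pairs in file order.

    Assumes one word per Transcription annotation (not checked here) and
    raises ValueError if a meaning appears more than once.
    """
    root = ET.fromstring(eaf_xml)
    tiers = {t.get("TIER_ID"): t for t in root.iter("TIER")}
    if "Transcription" not in tiers or "Phrase Free Translation" not in tiers:
        raise ValueError("expected 'Transcription' and 'Phrase Free Translation' tiers")

    # Map each transcription annotation ID to its text.
    words: Dict[str, str] = {}
    for ann in tiers["Transcription"].iter("ALIGNABLE_ANNOTATION"):
        words[ann.get("ANNOTATION_ID")] = (ann.findtext("ANNOTATION_VALUE") or "").strip()

    pairs: List[Tuple[str, str]] = []
    seen = set()
    for ref in tiers["Phrase Free Translation"].iter("REF_ANNOTATION"):
        meaning = (ref.findtext("ANNOTATION_VALUE") or "").strip()
        word = words.get(ref.get("ANNOTATION_REF"))
        if word is None:
            continue  # translation points at something outside the Transcription tier
        if meaning in seen:
            raise ValueError("duplicate meaning: " + meaning)
        seen.add(meaning)
        pairs.append((meaning, word))
    return pairs


if __name__ == "__main__":
    for meaning, word in read_word_list(SAMPLE_EAF):
        print(meaning, word)
```

Rejecting duplicate meanings outright is only one possible policy; an importer could instead treat repeated meanings as alternate forms of the same word.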

Steve-Miller commented on August 27, 2024

Yes, when I used SayMore/ELAN to transcribe the word list, the phonetic transcription went into the Transcription tier of ELAN, and the meaning/gloss went into the Phrase Free Translation tier.

If I know JohnH, he separated the data tier from the UX tier. SayMore is a Palaso project, isn't it? One suggestion is to use as much of that code as you can.

I nearly emailed you the word list audio recording I annotated and the SayMore project file yesterday. The mb size stopped me. (I pay by the mb here, plus I'm not sure if there's a size restriction.) I think what I will do instead is transcribe a one-word "word list" in SayMore/ELAN. If I can do it quickly, I'll either attach it in another comment here or email it to you. Then you can see it for yourself, without trying to figure out how to do it.

I emailed you a message a few minutes ago about a surveyor's perspective of the interaction between WordSurv and Cog. In short, the idea never crossed their minds until I asked them about it. It was a surprise to me, too, when I found it in Cog's help file. As far as I can see now, I don't think anyone will use WordSurv for a data store, nor do I expect people to use Cog and WordSurv together.

Steve-Miller commented on August 27, 2024

Incidentally, SayMore can read an Audacity file. I started there in Audacity, chopping up the word list into individual sound recordings. Really useful. Something more to think about.

Steve-Miller commented on August 27, 2024

Okay, so we're in luck. I found a recording that has three words in it. It's not a true word list, but it gives you the idea. I annotated this really quickly and gave it some glosses/meanings.

I zipped everything up into a .zip file. It has the .wav file, the SayMore project file, the ELAN .eaf file, and a couple of other files of note. If you install SayMore and ELAN, you should be able to unzip this under the SayMore directory. Unfortunately, Git choked on it, so I'll email it to you.
