
Comments (2)

patrick-wilken commented on September 15, 2024

Hi Marco,
thanks for the proposal, definitely sounds like a useful addition! You are in particular referring to this https://aclanthology.org/W04-3250.pdf, right?
Regarding sampling from subtitles: yes, that seems to be much less obvious than sampling from sentences. For the SubER calculation the files are already split into parts at points in time where both hypothesis and reference agree that there is no subtitle. So far this is an implementation detail for more efficient computation. But this is the closest thing to parallel segments that currently exists and those could maybe be used as units for sampling? There are several problems with this though: 1. segmentation depends on the hypotheses; 2. probably too few segments, depending on specific subtitle content; 3. length of segments varies greatly.
Another idea that comes to my mind is to calculate the SubER edit operations on the whole file, sample a subset of reference subtitle blocks, and calculate SubER scores using only the edit operations (and reference length) corresponding to those blocks. But this is only brainstorming right now, have to think it through...
I will be travelling the next two weeks, so can only really look into this after that. 🙃


mgaido91 commented on September 15, 2024

Hi @patrick-wilken ! Thanks for your reply. Yes, that is the paper I was referring to. I have looked into the code over the past few days, and the easiest approach that comes to my mind is the following:

In the SubER for loop (https://github.com/apptek/SubER/blob/main/suber/metrics/suber.py#L29), we can keep track of the individual edit counts and reference lengths, instead of just accumulating them. Once we have these fine-grained stats, we can bootstrap with them. I already have a rough implementation doing this. The main issues in this case would be:

  1. How to integrate this in a clean way in the tool?
  2. In this way we can only compute confidence intervals, rather than the statistical significance of the difference between two hypotheses. But that second part is very hard because of all the alignment issues. So as a first step, CIs may be enough. What do you think?
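To make the idea concrete, here is a minimal sketch of a percentile-bootstrap confidence interval computed from per-segment statistics. It assumes we already collected, per segment, the edit count and the reference length from the SubER loop; the function name and signature are illustrative, not part of the actual SubER code:

```python
# Hypothetical sketch: percentile-bootstrap CI for a corpus-level SubER score,
# given per-segment edit counts and reference lengths collected from the loop.
import random

def bootstrap_suber_ci(edits, ref_lengths, n_resamples=1000, alpha=0.05, seed=0):
    """Return (lower, upper) percentile-bootstrap bounds for the corpus score.

    edits[i] and ref_lengths[i] belong to segment i; the corpus-level score
    is assumed to be 100 * sum(edits) / sum(ref_lengths).
    """
    assert len(edits) == len(ref_lengths) and ref_lengths
    rng = random.Random(seed)
    n = len(edits)
    scores = []
    for _ in range(n_resamples):
        # Resample segments with replacement and recompute the corpus score.
        idx = [rng.randrange(n) for _ in range(n)]
        total_edits = sum(edits[i] for i in idx)
        total_ref = sum(ref_lengths[i] for i in idx)
        scores.append(100.0 * total_edits / total_ref)
    scores.sort()
    lower = scores[int(alpha / 2 * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper
```

Note that resampling whole segments (rather than individual edit operations) keeps the within-segment correlation of edits intact, which is the usual reason to bootstrap at the segment level.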

Thanks,
Marco
