cfournie / segmentation.evaluation

SegEval Segmentation Evaluation Package

Home Page: http://segeval.readthedocs.org/

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 99.70%, Shell 0.03%, Makefile 0.27%

segmentation.evaluation's People

anna-ka pombredanne contours rhoposit kuonanhong levstyle lxh-123 rotmanmi 5610613357 keighrim serapio nickledave

segmentation.evaluation's Issues

Number of boundary matches in Stargazers data set?

I am working with the code in the preview_b branch of your repo, which you so kindly provided for me earlier this spring. I am trying to use it to replicate the numbers in your thesis at http://hdl.handle.net/10393/24064, and I am having a little trouble.

Specifically I am trying to replicate the numbers in Table 5.3b on page 154 of your thesis. See the following test code: https://gist.github.com/rybesh/5627500

This code is using linear_edit_distance from the preview_b branch of your repo: https://github.com/cfournie/segmentation.evaluation/blob/preview_b/src/python/main/segeval/similarity/distance/SingleBoundaryDistance.py#L36

The test code shows 72 additions/deletions and 28 transpositions, but 211 matches rather than 125.

Looking at the routine for calculating matches, I see:

matches = 0
for string_a_i, string_b_i in zip(string_a, string_b):
    matches += len(set(string_a_i).union(set(string_b_i)))

https://github.com/cfournie/segmentation.evaluation/blob/preview_b/src/python/main/segeval/similarity/distance/MultipleBoundaryDistance.py#L289-291

I couldn't convince myself that this was correct, so I replaced that code with:

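from itertools import chain
# Boundaries placed at each potential boundary by both segmentations (single type 1: 0, 1 or 2)
bnds_per_pb = [sum(chain(*pair)) for pair in zip(string_a, string_b)]
assert all(x <= 2 for x in bnds_per_pb)
# A match is a position where both segmentations placed a boundary
matches = len([x for x in bnds_per_pb if x == 2])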

(Note that I am working only with single-boundary-type segmentations.)

But this gives me yet a third figure for the number of matches: 83.

So now I am a bit confused about how the number of matches is, or should be, counted in order to calculate the B variant of boundary similarity. Since I am hoping to use the approach outlined in your thesis for a segmentation evaluation I am conducting, any help you could provide would be much appreciated.
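The quoted routine and the replacement differ in which positions they count. A toy comparison on made-up boundary strings (one set of boundary types per potential boundary, single type 1, as in the report) makes the difference visible: the union counts every position where either segmentation placed a boundary, while the replacement, like an intersection, counts only positions where both did. The data and the function names below are hypothetical, and this sketch does not attempt to reproduce the 125 matches in Table 5.3b.

```python
from itertools import chain

# Hypothetical boundary strings: one set of boundary types per potential boundary.
string_a = [{1}, set(), {1}, set()]
string_b = [{1}, {1}, set(), set()]


def union_count(a, b):
    """Mirrors the quoted routine: types placed by EITHER segmentation."""
    return sum(len(set(x) | set(y)) for x, y in zip(a, b))


def intersection_count(a, b):
    """Types placed by BOTH segmentations at the same position."""
    return sum(len(set(x) & set(y)) for x, y in zip(a, b))


def both_placed_count(a, b):
    """The replacement from the report (assumes a single boundary type, 1)."""
    per_pb = [sum(chain(*pair)) for pair in zip(a, b)]
    return len([x for x in per_pb if x == 2])


print(union_count(string_a, string_b))         # 3
print(intersection_count(string_a, string_b))  # 1
print(both_placed_count(string_a, string_b))   # 1
```

For single-type segmentations the intersection and the replacement always agree; the union additionally counts positions where only one segmentation placed a boundary.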

Expected agreement not calculated correctly for π*?

I think there may be an issue with the way expected agreement is being calculated here:
https://github.com/cfournie/segmentation.evaluation/blob/master/segeval/agreement/pi.py#L30-L39

Assuming a single boundary type, expected agreement should be the square of the proportion of times a boundary was placed, right? But this is calculating the square of the mean of the proportions for each segmentation, which is not the same.

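Assume we have the following (PBs is the number of potential boundaries; the coder columns give the number of boundaries each coder placed):

Doc	PBs	Coder A	Coder B
1	6	1	2
2	8	2	3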

Proportion of times a boundary was placed: (1+2+2+3) / (2 * (6+8)) = 8/28 ≈ 0.286
Mean of the proportions for each segmentation: ((1/6)+(2/6)+(2/8)+(3/8)) / 4 = 9/32 ≈ 0.281

It's close, but not the same.
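The arithmetic above can be checked exactly with fractions. This is a minimal sketch assuming two coders and a single boundary type; it only reproduces the two candidate proportions from the example table and their squares, and is not the code in segeval's pi.py. The pooled proportion weights each document by its number of potential boundaries, whereas the mean of proportions gives every segmentation equal weight regardless of length, which is why the two differ.

```python
from fractions import Fraction

# (potential boundaries, boundaries by coder A, boundaries by coder B) per document,
# taken from the example table.
docs = [(6, 1, 2), (8, 2, 3)]


def pooled_proportion(docs):
    """Boundaries placed / potential boundaries, pooled over coders and documents."""
    placed = sum(a + b for _, a, b in docs)
    available = sum(2 * pbs for pbs, _, _ in docs)
    return Fraction(placed, available)


def mean_of_proportions(docs):
    """Unweighted mean of each coder's per-document proportion."""
    props = [Fraction(c, pbs) for pbs, a, b in docs for c in (a, b)]
    return sum(props) / len(props)


p_pooled = pooled_proportion(docs)
p_mean = mean_of_proportions(docs)
print(p_pooled, p_mean)            # 2/7 9/32
print(p_pooled ** 2, p_mean ** 2)  # 4/49 81/1024
```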

Some cases don't have results

The window_diff algorithm is not implemented robustly. For example, if hypothesis = [2] and reference = [1,1], then window_diff(hypothesis, reference) raises an error:

decimal.InvalidOperation: 0 / 0
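The report doesn't show which step of segeval's window_diff divides by zero. As a hedged illustration only, here is a small WindowDiff sketch in the style of Pevzner and Hearst (2002) that takes segment masses and an explicit window size, validates its inputs, and raises a descriptive ValueError when there are no windows to average over instead of failing with a bare 0 / 0. It is not segeval's implementation; window_diff_sketch, boundary_positions, and the explicit window_size parameter are hypothetical.

```python
from typing import Sequence, Set


def boundary_positions(masses: Sequence[int]) -> Set[int]:
    """Indices j such that a boundary falls between unit j and unit j + 1."""
    positions, total = set(), 0
    for mass in masses[:-1]:  # no boundary after the final segment
        total += mass
        positions.add(total - 1)
    return positions


def window_diff_sketch(hypothesis: Sequence[int], reference: Sequence[int],
                       window_size: int) -> float:
    """Fraction of windows whose boundary counts differ (hypothetical helper)."""
    if sum(hypothesis) != sum(reference):
        raise ValueError("segmentations must cover the same number of units")
    n_units = sum(reference)
    n_windows = n_units - window_size
    # Guard the division explicitly rather than letting 0 / 0 surface.
    if window_size < 1 or n_windows < 1:
        raise ValueError(
            f"window_size={window_size} leaves no windows for {n_units} units")
    hyp, ref = boundary_positions(hypothesis), boundary_positions(reference)
    errors = 0
    for i in range(n_windows):
        # Boundaries lying between unit i and unit i + window_size.
        span = range(i, i + window_size)
        if sum(j in hyp for j in span) != sum(j in ref for j in span):
            errors += 1
    return errors / n_windows


print(window_diff_sketch([2], [1, 1], window_size=1))  # 1.0
try:
    window_diff_sketch([2], [1, 1], window_size=2)
except ValueError as err:
    print(err)  # window_size=2 leaves no windows for 2 units
```

With only two units, a window of size 2 leaves nothing to average, so the sketch reports that condition by name; with window_size=1 the single window disagrees and the score is 1.0.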

assert len(window) is window_size + 1 for equal numbers

Hello,

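I would like to report an incorrect use of assert x is y on line 110 of

https://github.com/cfournie/segmentation.evaluation/blob/master/segeval/window/windowdiff.py

The assertion

assert len(window) is window_size + 1

can fail even when both sides are equal, because is compares object identity rather than value.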

It should be enough to write:

assert len(window) == window_size + 1

Examples (Python 2.7.6):

>>> assert 3 is 2 + 1
>>> assert 300 is 299 + 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
>>> id(300)
17688624
>>> id(299 + 1)
17688456
>>> id(1 + 299)
17688600
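Because REPL comparisons between literals can be affected by the interpreter's constant folding, the sketch below builds the integer at runtime, the way len(window) is. CPython caches small integers (-5 to 256), so an identity check happens to pass for small windows and fails for larger ones, while == behaves the same at every size. check_window and same_value_and_identity are hypothetical helpers for illustration, not part of segeval.

```python
def check_window(window, window_size):
    """Sanity check that a window holds window_size + 1 elements."""
    # == compares values; `is` compares object identity, which CPython only
    # shares for small cached integers (-5 to 256).
    assert len(window) == window_size + 1, (
        f"expected {window_size + 1} elements, got {len(window)}")


def same_value_and_identity(n):
    """Return (equal, identical) for n and an equal int computed at runtime."""
    computed = len(range(n))  # built at runtime, like len(window)
    return computed == n, computed is n


# Passes for any size, including sizes outside the small-int cache.
check_window(list(range(301)), 300)

print(same_value_and_identity(3))    # (True, True) on CPython: 3 is cached
print(same_value_and_identity(300))  # (True, False) on CPython: 300 is not
```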
