This repository summarizes an analysis of SF Place and its relation to the position of the SF description in the text.
The idea is to analyze whether mentions of the SF place occur near to or far from the sentence (segment) describing the SF.
First, we need to find the sentence describing each SF. For this, we use the `description` field in the SF annotations as a proxy for the sentence that describes the SF. We search for the text of the `description` field in the actual document and assign the corresponding segment ID to the SF as the sentence describing (triggering) it.
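The description-matching step above can be sketched in Python as follows. This is only an illustration, not the repository's code (which is driven by `get_loc_stats.bash`): the helper name `find_description_segment` is hypothetical, segments are assumed to be plain strings whose list index is the segment ID, and matching is assumed to be exact substring search after collapsing whitespace.

```python
from typing import Optional, Sequence


def find_description_segment(segments: Sequence[str], description: str) -> Optional[int]:
    """Return the ID of the first segment whose text contains `description`.

    Hypothetical helper: segment IDs are assumed to be 0-based list indices,
    and matching is exact substring search after collapsing whitespace.
    """
    # Collapse runs of whitespace so line breaks in the annotation don't block a match.
    target = " ".join(description.split())
    if not target:
        return None  # empty description field: no segment can be assigned
    for seg_id, text in enumerate(segments):
        if target in " ".join(text.split()):
            return seg_id
    return None  # description text does not occur in the document


segments = ["Heavy rains hit the region.", "Aid was  requested\nin Oromiyaa.", "Roads closed."]
print(find_description_segment(segments, "Aid was requested in Oromiyaa"))  # 1
```

If the description occurs in more than one segment, this sketch simply takes the first occurrence; the actual analysis may resolve such cases differently.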
Next, for each SF that has been assigned an entity as its location (SF Place), we find all mentions of that entity in the document and select the mention closest to the sentence containing the SF description. We then plot the segment ID of the SF description against the segment ID of the closest location mention in Figure 1.
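Selecting the closest mention amounts to minimizing the absolute difference in segment IDs. The sketch below assumes mentions are already reduced to their segment IDs; the function name is hypothetical, and the tie-breaking rule (prefer the earlier segment when a mention before and one after are equally far) is an assumption, since the text does not say how ties are handled.

```python
from typing import Iterable, Optional


def closest_mention_segment(desc_seg: int, mention_segs: Iterable[int]) -> Optional[int]:
    """Return the segment ID of the place mention closest to the SF description.

    Ties between a mention before and one after the description are broken in
    favor of the earlier segment (an assumed rule). Returns None if the place
    entity has no mentions.
    """
    return min(mention_segs, key=lambda seg: (abs(seg - desc_seg), seg), default=None)


# One point of the Figure 1 scatter: (description segment, closest mention segment).
desc_seg = 10
print((desc_seg, closest_mention_segment(desc_seg, [2, 9, 14])))  # (10, 9)
```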
Figure 1. The ID of the segment containing the SF description field vs. the ID of the segment containing the closest mention of the corresponding SF place in the gold annotations. The black identity line shows where the dots would lie if the place mention were found in the same segment as the SF description.
As we can see, most location mentions lie close to the diagonal, which means they are very likely to be found in the same sentence as the SF description or in a nearby one. Most of the points lie below the diagonal, which is expected, since place names are usually mentioned before the events there are described. But there are also cases where the events are described first and the details of the location are mentioned later.
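The "below the diagonal" reading can be checked numerically by counting points on each side of the identity line. This sketch assumes each point is a `(desc_seg, loc_seg)` pair with the description segment on the x-axis, which is the orientation the interpretation above implies; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def diagonal_breakdown(points: Iterable[Tuple[int, int]]) -> Counter:
    """Count Figure 1 points on, below, and above the identity line.

    Each point is (desc_seg, loc_seg), assuming the description segment is on
    the x-axis, so "below" means the place is mentioned before the SF description.
    """
    counts: Counter = Counter()
    for desc_seg, loc_seg in points:
        if loc_seg == desc_seg:
            counts["on"] += 1
        elif loc_seg < desc_seg:
            counts["below"] += 1
        else:
            counts["above"] += 1
    return counts


print(diagonal_breakdown([(5, 5), (8, 6), (12, 11), (1, 37)]))
```

Unlike the scatter plot, these counts also reflect overlapping points, which a plot cannot show.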
Next, since this plot does not show how many instances fall on each point, we also show in Figure 2 the histogram of the distance (the number of intervening sentences between the sentence containing the SF description and the one containing the location mention).
Figure 2. The histogram of distances (the number of intervening sentences between the sentence containing the SF description and the sentence containing the closest location mention) for the 4 ILs.
Here we can clearly see that most SF instances (70-90% of all SFs whose description we can find) have their location mention close to the sentence describing the SF. This might explain why the location-assignment methods used by most teams, which rely on proximity to the sentences found to trigger the SF, work reasonably well.
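The histogram and the "70-90% close by" figure can be sketched as below. Two assumptions here are not stated in the text: the distance is taken as the absolute difference of the two segment IDs (0 means the same segment), and "close" is expressed through a hypothetical `max_dist` threshold. Both function names are illustrative.

```python
from collections import Counter
from typing import Iterable, Tuple


def distance_histogram(points: Iterable[Tuple[int, int]]) -> Counter:
    """Map each distance |loc_seg - desc_seg| to the number of SFs at that distance."""
    return Counter(abs(loc_seg - desc_seg) for desc_seg, loc_seg in points)


def share_within(hist: Counter, max_dist: int) -> float:
    """Fraction of SFs whose closest place mention is at most `max_dist` segments away."""
    total = sum(hist.values())
    if total == 0:
        return 0.0
    return sum(n for dist, n in hist.items() if dist <= max_dist) / total


hist = distance_histogram([(5, 5), (8, 7), (12, 12), (1, 37)])
print(sorted(hist.items()))     # [(0, 2), (1, 1), (36, 1)]
print(share_within(hist, 1))    # 0.75
```

If "intervening sentences" is meant strictly (adjacent segments count as 0), the distance would need adjusting to `max(abs(loc_seg - desc_seg) - 1, 0)`.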
For completeness, Table 1 lists the number of all SFs, the number of SFs whose `description` field is non-empty and can be found in the document, and the number of SFs with a location mention.
| | IL5 | IL6 | IL9 | IL10 |
|---|---|---|---|---|
| #SFs | 1,581 | 1,146 | 354 | 390 |
| #SFs with descriptions | 749 | 722 | 140 | 166 |
| #SFs with place | 693 | 608 | 129 | 112 |
Table 1. Statistics of the SFs used in this experiment. Some SFs do not have a `description`, and for some the description is not found in the document. Of the remaining SFs, a minority (from about 7% in IL5 to about 33% in IL10) do not have a location mention.
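The way the three rows of Table 1 nest can be made explicit with a small sketch. The record layout (`SFRecord` with `desc_seg` and `place_segs`) is hypothetical, and treating "#SFs with place" as a subset of "#SFs with descriptions" is an assumption based on the caption and on each row being smaller than the one above it.

```python
from typing import Dict, List, Optional, TypedDict


class SFRecord(TypedDict):
    # Hypothetical per-SF record; the field names are illustrative only.
    desc_seg: Optional[int]  # segment ID of the description, None if empty or not found
    place_segs: List[int]    # segment IDs of mentions of the SF place entity


def table1_counts(sfs: List[SFRecord]) -> Dict[str, int]:
    """Compute the three Table 1 rows, treating each row as a subset of the previous one."""
    with_desc = [sf for sf in sfs if sf["desc_seg"] is not None]
    with_place = [sf for sf in with_desc if sf["place_segs"]]
    return {
        "#SFs": len(sfs),
        "#SFs with descriptions": len(with_desc),
        "#SFs with place": len(with_place),
    }


sfs: List[SFRecord] = [
    {"desc_seg": 3, "place_segs": [1, 3]},
    {"desc_seg": 5, "place_segs": []},
    {"desc_seg": None, "place_segs": [2]},
]
print(table1_counts(sfs))  # {'#SFs': 3, '#SFs with descriptions': 2, '#SFs with place': 1}
```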
The document ID and the segment IDs of the SF trigger and of the location mention are stored in `loc_stats_IL{5,6,9,10}.log`. For example, from `loc_stats_IL6.log` we can see that the outlier points in the IL6 plot in Figure 1 (seg_num={1,6,10}, loc_seg_num=37) come from document IL6_NW_020411_20160311_H0040LSIF and refer to the first mention of Oromiyaa in segment 37.
All the code is available in this repository. Note that the repository does not store the data needed to run the experiments; see the comments in `get_loc_stats.bash` for more information.