Comments (7)
Filepath
train/t5_economic_0_202.deft
Content
The data/source_txt/t5_economic_jlee_202.txt 8145 8148 O -1 -1 0
great data/source_txt/t5_economic_jlee_202.txt 8149 8154 O -1 -1 0
economist data/source_txt/t5_economic_jlee_202.txt 8155 8164 O -1 -1 0
Milton data/source_txt/t5_economic_jlee_202.txt 8165 8171 O -1 -1 0
Friedman data/source_txt/t5_economic_jlee_202.txt 8172 8180 O -1 -1 0
( data/source_txt/t5_economic_jlee_202.txt 8181 8182 O -1 -1 0
1912 data/source_txt/t5_economic_jlee_202.txt 8182 8186 O -1 -1 0
– data/source_txt/t5_economic_jlee_202.txt 8186 8187 O -1 -1 0
2006 data/source_txt/t5_economic_jlee_202.txt 8187 8191 O -1 -1 0
) data/source_txt/t5_economic_jlee_202.txt 8191 8192 O -1 -1 0
summed data/source_txt/t5_economic_jlee_202.txt 8193 8199 O -1 -1 0
up data/source_txt/t5_economic_jlee_202.txt 8200 8202 O -1 -1 0
the data/source_txt/t5_economic_jlee_202.txt 8203 8206 O -1 -1 0
neoclassical data/source_txt/t5_economic_jlee_202.txt 8207 8219 O -1 -1 0
view data/source_txt/t5_economic_jlee_202.txt 8220 8224 O -1 -1 0
of data/source_txt/t5_economic_jlee_202.txt 8225 8227 O -1 -1 0
the data/source_txt/t5_economic_jlee_202.txt 8228 8231 O -1 -1 0
long data/source_txt/t5_economic_jlee_202.txt 8232 8236 O -1 -1 0
- data/source_txt/t5_economic_jlee_202.txt 8236 8237 O -1 -1 0
term data/source_txt/t5_economic_jlee_202.txt 8237 8241 O -1 -1 0
Phillips data/source_txt/t5_economic_jlee_202.txt 8242 8250 O -1 -1 0
curve data/source_txt/t5_economic_jlee_202.txt 8251 8256 O -1 -1 0
tradeoff data/source_txt/t5_economic_jlee_202.txt 8257 8265 O -1 -1 0
in data/source_txt/t5_economic_jlee_202.txt 8266 8268 O -1 -1 0
a data/source_txt/t5_economic_jlee_202.txt 8269 8270 O -1 -1 0
1967 data/source_txt/t5_economic_jlee_202.txt 8271 8275 O -1 -1 0
speech data/source_txt/t5_economic_jlee_202.txt 8276 8282 O -1 -1 0
: data/source_txt/t5_economic_jlee_202.txt 8282 8283 O -1 -1 0
“ data/source_txt/t5_economic_jlee_202.txt 8284 8285 O -1 -1 0
[ data/source_txt/t5_economic_jlee_202.txt 8285 8286 O -1 -1 0
T]here data/source_txt/t5_economic_jlee_202.txt 8286 8292 O -1 -1 0
is data/source_txt/t5_economic_jlee_202.txt 8293 8295 O -1 -1 0
always data/source_txt/t5_economic_jlee_202.txt 8296 8302 O -1 -1 0
a data/source_txt/t5_economic_jlee_202.txt 8303 8304 O -1 -1 0
temporary data/source_txt/t5_economic_jlee_202.txt 8305 8314 O -1 -1 0
trade data/source_txt/t5_economic_jlee_202.txt 8315 8320 O -1 -1 0
- data/source_txt/t5_economic_jlee_202.txt 8320 8321 O -1 -1 0
off data/source_txt/t5_economic_jlee_202.txt 8321 8324 O -1 -1 0
between data/source_txt/t5_economic_jlee_202.txt 8325 8332 O -1 -1 0
inflation data/source_txt/t5_economic_jlee_202.txt 8333 8342 O -1 -1 0
and data/source_txt/t5_economic_jlee_202.txt 8343 8346 O -1 -1 0
unemployment data/source_txt/t5_economic_jlee_202.txt 8347 8359 O -1 -1 0
; data/source_txt/t5_economic_jlee_202.txt 8359 8360 O -1 -1 0
there data/source_txt/t5_economic_jlee_202.txt 8361 8366 O -1 -1 0
is data/source_txt/t5_economic_jlee_202.txt 8367 8369 O -1 -1 0
no data/source_txt/t5_economic_jlee_202.txt 8370 8372 O -1 -1 0
permanent data/source_txt/t5_economic_jlee_202.txt 8373 8382 O -1 -1 0
trade data/source_txt/t5_economic_jlee_202.txt 8383 8388 O -1 -1 0
- data/source_txt/t5_economic_jlee_202.txt 8388 8389 O -1 -1 0
off data/source_txt/t5_economic_jlee_202.txt 8389 8392 O -1 -1 0
. data/source_txt/t5_economic_jlee_202.txt 8392 8393 O -1 -1 0
” data/source_txt/t5_economic_jlee_202.txt 8393 8394 O -1 -1 0
Lines 1539-1590, error in 1569
from deft_corpus.
Strange tokens in train:
strange_train = {
'16,000.Now',
'1884)112',
'1962).It',
'1964)',
'1965.Shelby',
'1985).While',
'1988).Craig',
'1991.Gallup',
'2003).Since',
'2005)',
'2009).Cognitive',
'2010).The',
'2013).There',
'2014)',
'2014.Summary',
'2017.Other',
'23.2This',
'E]conomic',
'FDA).http://www.fda.gov',
'States.29',
'T]here',
'cM.By',
'issues.http://blacklivesmatter.com/about/',
'link](b',
'link]).Dan',
'link]).Jose',
'link]).Keyssar',
'link]).Louis',
'link]).The',
'link]).This',
'link].An',
'link].In',
'link].This',
'link]a',
'link]ab',
'link]b',
'link]c',
'views.https://www.aclu.org/',
'vs.-time'
}
In dev:
strange_dev = {
'1979)',
'2000).Comte',
'bb).Imagine',
'link]b',
'link]d',
}
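Tokens like these can be found with a simple scan. The sketch below is only a heuristic and not the method used to build the sets above. It assumes each line of a `.deft` file is whitespace-separated, with the token in the first field followed by source file, start offset, end offset and tag. The function names and both regular expressions are hypothetical.

```python
import re

# Assumed heuristics (illustrative only):
# 1) a closing bracket glued to a preceding word character: "link]b", "1964)", "T]here"
UNBALANCED_CLOSER = re.compile(r"\w[\])]")
# 2) a period followed directly by two or more letters: "16,000.Now", "1962).It"
FUSED_SENTENCE = re.compile(r"[\w\])]\.[A-Za-z]{2,}")


def is_suspicious(token: str) -> bool:
    """Return True if the token looks like several tokens fused together."""
    return bool(UNBALANCED_CLOSER.search(token) or FUSED_SENTENCE.search(token))


def find_suspicious(lines):
    """Scan .deft-style lines and collect (line_no, token, source, start, end, tag)."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 5:
            # Blank separator lines or malformed rows are skipped.
            continue
        token = fields[0]
        if is_suspicious(token):
            hits.append((line_no, token, fields[1], fields[2], fields[3], fields[4]))
    return hits
```

These two patterns do not catch every case listed above: `States.29`, `23.2This` and `vs.-time` pass through, so further patterns would be needed for full coverage.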
Filepath
train/t1_biology_0_202.deft
Content
The data/source_txt/t1_biology_jlee_202.txt 13100 13103 B-Term T98 0 Direct-Defines
mitochondria data/source_txt/t1_biology_jlee_202.txt 13104 13116 I-Term T98 0 Direct-Defines
- data/source_txt/t1_biology_jlee_202.txt 13116 13117 I-Term T98 0 Direct-Defines
first data/source_txt/t1_biology_jlee_202.txt 13117 13122 I-Term T98 0 Direct-Defines
hypothesis data/source_txt/t1_biology_jlee_202.txt 13123 13133 I-Term T98 0 Direct-Defines
proposes data/source_txt/t1_biology_jlee_202.txt 13134 13142 O -1 -1 0
that data/source_txt/t1_biology_jlee_202.txt 13143 13147 O -1 -1 0
mitochondria data/source_txt/t1_biology_jlee_202.txt 13148 13160 B-Definition T99 T98 Direct-Defines
were data/source_txt/t1_biology_jlee_202.txt 13161 13165 I-Definition T99 T98 Direct-Defines
first data/source_txt/t1_biology_jlee_202.txt 13166 13171 I-Definition T99 T98 Direct-Defines
established data/source_txt/t1_biology_jlee_202.txt 13172 13183 I-Definition T99 T98 Direct-Defines
in data/source_txt/t1_biology_jlee_202.txt 13184 13186 I-Definition T99 T98 Direct-Defines
a data/source_txt/t1_biology_jlee_202.txt 13187 13188 I-Definition T99 T98 Direct-Defines
prokaryotic data/source_txt/t1_biology_jlee_202.txt 13189 13200 I-Definition T99 T98 Direct-Defines
host data/source_txt/t1_biology_jlee_202.txt 13201 13205 I-Definition T99 T98 Direct-Defines
( data/source_txt/t1_biology_jlee_202.txt 13206 13207 O -1 -1 0
[ data/source_txt/t1_biology_jlee_202.txt 13207 13208 O -1 -1 0
link]b data/source_txt/t1_biology_jlee_202.txt 13208 13214 O -1 -1 0
) data/source_txt/t1_biology_jlee_202.txt 13214 13215 O -1 -1 0
, data/source_txt/t1_biology_jlee_202.txt 13215 13216 O -1 -1 0
which data/source_txt/t1_biology_jlee_202.txt 13217 13222 B-Definition-frag T99-frag T99 fragment
subsequently data/source_txt/t1_biology_jlee_202.txt 13223 13235 I-Definition-frag T99-frag T99 fragment
acquired data/source_txt/t1_biology_jlee_202.txt 13236 13244 I-Definition-frag T99-frag T99 fragment
a data/source_txt/t1_biology_jlee_202.txt 13245 13246 I-Definition-frag T99-frag T99 fragment
nucleus data/source_txt/t1_biology_jlee_202.txt 13247 13254 I-Definition-frag T99-frag T99 fragment
, data/source_txt/t1_biology_jlee_202.txt 13254 13255 I-Definition-frag T99-frag T99 fragment
by data/source_txt/t1_biology_jlee_202.txt 13256 13258 I-Definition-frag T99-frag T99 fragment
fusion data/source_txt/t1_biology_jlee_202.txt 13259 13265 I-Definition-frag T99-frag T99 fragment
or data/source_txt/t1_biology_jlee_202.txt 13266 13268 I-Definition-frag T99-frag T99 fragment
other data/source_txt/t1_biology_jlee_202.txt 13269 13274 I-Definition-frag T99-frag T99 fragment
mechanisms data/source_txt/t1_biology_jlee_202.txt 13275 13285 I-Definition-frag T99-frag T99 fragment
, data/source_txt/t1_biology_jlee_202.txt 13285 13286 I-Definition-frag T99-frag T99 fragment
to data/source_txt/t1_biology_jlee_202.txt 13287 13289 I-Definition-frag T99-frag T99 fragment
become data/source_txt/t1_biology_jlee_202.txt 13290 13296 I-Definition-frag T99-frag T99 fragment
the data/source_txt/t1_biology_jlee_202.txt 13297 13300 I-Definition-frag T99-frag T99 fragment
first data/source_txt/t1_biology_jlee_202.txt 13301 13306 I-Definition-frag T99-frag T99 fragment
eukaryotic data/source_txt/t1_biology_jlee_202.txt 13307 13317 I-Definition-frag T99-frag T99 fragment
cell data/source_txt/t1_biology_jlee_202.txt 13318 13322 I-Definition-frag T99-frag T99 fragment
. data/source_txt/t1_biology_jlee_202.txt 13322 13323 O -1 -1 0
Lines 1956-1994, error in 1973
Most data/source_txt/t1_biology_jlee_202.txt 13332 13336 O -1 -1 0
interestingly data/source_txt/t1_biology_jlee_202.txt 13337 13350 O -1 -1 0
, data/source_txt/t1_biology_jlee_202.txt 13350 13351 O -1 -1 0
the data/source_txt/t1_biology_jlee_202.txt 13352 13355 B-Term T86 T86 0
eukaryote data/source_txt/t1_biology_jlee_202.txt 13356 13365 I-Term T86 T86 0
- data/source_txt/t1_biology_jlee_202.txt 13365 13366 I-Term T86 T86 0
first data/source_txt/t1_biology_jlee_202.txt 13366 13371 I-Term T86 T86 0
hypothesis data/source_txt/t1_biology_jlee_202.txt 13372 13382 I-Term T86 T86 0
proposes data/source_txt/t1_biology_jlee_202.txt 13383 13391 O -1 -1 0
that data/source_txt/t1_biology_jlee_202.txt 13392 13396 O -1 -1 0
prokaryotes data/source_txt/t1_biology_jlee_202.txt 13397 13408 B-Definition T100 T86 Direct-Defines
actually data/source_txt/t1_biology_jlee_202.txt 13409 13417 I-Definition T100 T86 Direct-Defines
evolved data/source_txt/t1_biology_jlee_202.txt 13418 13425 I-Definition T100 T86 Direct-Defines
from data/source_txt/t1_biology_jlee_202.txt 13426 13430 I-Definition T100 T86 Direct-Defines
eukaryotes data/source_txt/t1_biology_jlee_202.txt 13431 13441 I-Definition T100 T86 Direct-Defines
by data/source_txt/t1_biology_jlee_202.txt 13442 13444 I-Definition T100 T86 Direct-Defines
losing data/source_txt/t1_biology_jlee_202.txt 13445 13451 I-Definition T100 T86 Direct-Defines
genes data/source_txt/t1_biology_jlee_202.txt 13452 13457 I-Definition T100 T86 Direct-Defines
and data/source_txt/t1_biology_jlee_202.txt 13458 13461 I-Definition T100 T86 Direct-Defines
complexity data/source_txt/t1_biology_jlee_202.txt 13462 13472 I-Definition T100 T86 Direct-Defines
( data/source_txt/t1_biology_jlee_202.txt 13473 13474 O -1 -1 0
[ data/source_txt/t1_biology_jlee_202.txt 13474 13475 O -1 -1 0
link]c data/source_txt/t1_biology_jlee_202.txt 13475 13481 O -1 -1 0
) data/source_txt/t1_biology_jlee_202.txt 13481 13482 O -1 -1 0
. data/source_txt/t1_biology_jlee_202.txt 13482 13483 O -1 -1 0
Lines 1996-2020, error in 2018
Filepath
train/t3_physics_0_101.deft
Content
Thus data/source_txt/t3_physics_jlee_101.txt 19912 19916 O -1 -1 0
we data/source_txt/t3_physics_jlee_101.txt 19917 19919 O -1 -1 0
can data/source_txt/t3_physics_jlee_101.txt 19920 19923 O -1 -1 0
think data/source_txt/t3_physics_jlee_101.txt 19924 19929 O -1 -1 0
of data/source_txt/t3_physics_jlee_101.txt 19930 19932 O -1 -1 0
the data/source_txt/t3_physics_jlee_101.txt 19933 19936 O -1 -1 0
electric data/source_txt/t3_physics_jlee_101.txt 19937 19945 O -1 -1 0
field data/source_txt/t3_physics_jlee_101.txt 19946 19951 O -1 -1 0
arrows data/source_txt/t3_physics_jlee_101.txt 19952 19958 O -1 -1 0
as data/source_txt/t3_physics_jlee_101.txt 19959 19961 O -1 -1 0
showing data/source_txt/t3_physics_jlee_101.txt 19962 19969 O -1 -1 0
the data/source_txt/t3_physics_jlee_101.txt 19970 19973 O -1 -1 0
direction data/source_txt/t3_physics_jlee_101.txt 19974 19983 O -1 -1 0
of data/source_txt/t3_physics_jlee_101.txt 19984 19986 O -1 -1 0
polarization data/source_txt/t3_physics_jlee_101.txt 19987 19999 O -1 -1 0
, data/source_txt/t3_physics_jlee_101.txt 19999 20000 O -1 -1 0
as data/source_txt/t3_physics_jlee_101.txt 20001 20003 O -1 -1 0
in data/source_txt/t3_physics_jlee_101.txt 20004 20006 O -1 -1 0
[ data/source_txt/t3_physics_jlee_101.txt 20007 20008 O -1 -1 0
link].An data/source_txt/t3_physics_jlee_101.txt 20008 20016 O -1 -1 0
EM data/source_txt/t3_physics_jlee_101.txt 20017 20019 O -1 -1 0
wave data/source_txt/t3_physics_jlee_101.txt 20020 20024 O -1 -1 0
, data/source_txt/t3_physics_jlee_101.txt 20024 20025 O -1 -1 0
such data/source_txt/t3_physics_jlee_101.txt 20026 20030 O -1 -1 0
as data/source_txt/t3_physics_jlee_101.txt 20031 20033 O -1 -1 0
light data/source_txt/t3_physics_jlee_101.txt 20034 20039 O -1 -1 0
, data/source_txt/t3_physics_jlee_101.txt 20039 20040 O -1 -1 0
is data/source_txt/t3_physics_jlee_101.txt 20041 20043 O -1 -1 0
a data/source_txt/t3_physics_jlee_101.txt 20044 20045 O -1 -1 0
transverse data/source_txt/t3_physics_jlee_101.txt 20046 20056 O -1 -1 0
wave data/source_txt/t3_physics_jlee_101.txt 20057 20061 O -1 -1 0
. data/source_txt/t3_physics_jlee_101.txt 20061 20062 O -1 -1 0
Lines 3598-3629, error in 3617
Filepath
train/t7_government_0_303.deft
Content
New data/source_txt/t7_government_jlee_303.txt 28279 28282 O -1 -1 0
Jersey data/source_txt/t7_government_jlee_303.txt 28283 28289 O -1 -1 0
governor data/source_txt/t7_government_jlee_303.txt 28290 28298 O -1 -1 0
Chris data/source_txt/t7_government_jlee_303.txt 28299 28304 O -1 -1 0
Christie data/source_txt/t7_government_jlee_303.txt 28305 28313 O -1 -1 0
gained data/source_txt/t7_government_jlee_303.txt 28314 28320 O -1 -1 0
national data/source_txt/t7_government_jlee_303.txt 28321 28329 O -1 -1 0
attention data/source_txt/t7_government_jlee_303.txt 28330 28339 O -1 -1 0
in data/source_txt/t7_government_jlee_303.txt 28340 28342 O -1 -1 0
2012 data/source_txt/t7_government_jlee_303.txt 28343 28347 O -1 -1 0
over data/source_txt/t7_government_jlee_303.txt 28348 28352 O -1 -1 0
his data/source_txt/t7_government_jlee_303.txt 28353 28356 O -1 -1 0
handling data/source_txt/t7_government_jlee_303.txt 28357 28365 O -1 -1 0
of data/source_txt/t7_government_jlee_303.txt 28366 28368 O -1 -1 0
the data/source_txt/t7_government_jlee_303.txt 28369 28372 O -1 -1 0
aftermath data/source_txt/t7_government_jlee_303.txt 28373 28382 O -1 -1 0
of data/source_txt/t7_government_jlee_303.txt 28383 28385 O -1 -1 0
Hurricane data/source_txt/t7_government_jlee_303.txt 28386 28395 O -1 -1 0
Sandy data/source_txt/t7_government_jlee_303.txt 28396 28401 O -1 -1 0
, data/source_txt/t7_government_jlee_303.txt 28401 28402 O -1 -1 0
which data/source_txt/t7_government_jlee_303.txt 28403 28408 O -1 -1 0
caused data/source_txt/t7_government_jlee_303.txt 28409 28415 O -1 -1 0
an data/source_txt/t7_government_jlee_303.txt 28416 28418 O -1 -1 0
estimated data/source_txt/t7_government_jlee_303.txt 28419 28428 O -1 -1 0
$ data/source_txt/t7_government_jlee_303.txt 28429 28430 O -1 -1 0
65 data/source_txt/t7_government_jlee_303.txt 28430 28432 O -1 -1 0
billion data/source_txt/t7_government_jlee_303.txt 28433 28440 O -1 -1 0
worth data/source_txt/t7_government_jlee_303.txt 28441 28446 O -1 -1 0
of data/source_txt/t7_government_jlee_303.txt 28447 28449 O -1 -1 0
damage data/source_txt/t7_government_jlee_303.txt 28450 28456 O -1 -1 0
and data/source_txt/t7_government_jlee_303.txt 28457 28460 O -1 -1 0
cost data/source_txt/t7_government_jlee_303.txt 28461 28465 O -1 -1 0
the data/source_txt/t7_government_jlee_303.txt 28466 28469 O -1 -1 0
lives data/source_txt/t7_government_jlee_303.txt 28470 28475 O -1 -1 0
of data/source_txt/t7_government_jlee_303.txt 28476 28478 O -1 -1 0
over data/source_txt/t7_government_jlee_303.txt 28479 28483 O -1 -1 0
150 data/source_txt/t7_government_jlee_303.txt 28484 28487 O -1 -1 0
individuals data/source_txt/t7_government_jlee_303.txt 28488 28499 O -1 -1 0
along data/source_txt/t7_government_jlee_303.txt 28500 28505 O -1 -1 0
the data/source_txt/t7_government_jlee_303.txt 28506 28509 O -1 -1 0
East data/source_txt/t7_government_jlee_303.txt 28510 28514 O -1 -1 0
Coast data/source_txt/t7_government_jlee_303.txt 28515 28520 O -1 -1 0
of data/source_txt/t7_government_jlee_303.txt 28521 28523 O -1 -1 0
the data/source_txt/t7_government_jlee_303.txt 28524 28527 O -1 -1 0
United data/source_txt/t7_government_jlee_303.txt 28528 28534 O -1 -1 0
States.29 data/source_txt/t7_government_jlee_303.txt 28535 28544 O -1 -1 0
October data/source_txt/t7_government_jlee_303.txt 28545 28552 O -1 -1 0
2014 data/source_txt/t7_government_jlee_303.txt 28553 28557 O -1 -1 0
. data/source_txt/t7_government_jlee_303.txt 28557 28558 O -1 -1 0
Lines 3699-3747, error in 3744
Filepath
train/t5_economic_1_101.deft
Content
At data/source_txt/t5_economic_mkaplan_101.txt 1806 1808 O -1 -1 0
point data/source_txt/t5_economic_mkaplan_101.txt 1809 1814 O -1 -1 0
A data/source_txt/t5_economic_mkaplan_101.txt 1815 1816 O -1 -1 0
on data/source_txt/t5_economic_mkaplan_101.txt 1817 1819 O -1 -1 0
the data/source_txt/t5_economic_mkaplan_101.txt 1820 1823 O -1 -1 0
budget data/source_txt/t5_economic_mkaplan_101.txt 1824 1830 O -1 -1 0
constraint data/source_txt/t5_economic_mkaplan_101.txt 1831 1841 O -1 -1 0
line data/source_txt/t5_economic_mkaplan_101.txt 1842 1846 O -1 -1 0
, data/source_txt/t5_economic_mkaplan_101.txt 1846 1847 O -1 -1 0
by data/source_txt/t5_economic_mkaplan_101.txt 1848 1850 O -1 -1 0
working data/source_txt/t5_economic_mkaplan_101.txt 1851 1858 O -1 -1 0
40 data/source_txt/t5_economic_mkaplan_101.txt 1859 1861 O -1 -1 0
hours data/source_txt/t5_economic_mkaplan_101.txt 1862 1867 O -1 -1 0
a data/source_txt/t5_economic_mkaplan_101.txt 1868 1869 O -1 -1 0
week data/source_txt/t5_economic_mkaplan_101.txt 1870 1874 O -1 -1 0
, data/source_txt/t5_economic_mkaplan_101.txt 1874 1875 O -1 -1 0
50 data/source_txt/t5_economic_mkaplan_101.txt 1876 1878 O -1 -1 0
weeks data/source_txt/t5_economic_mkaplan_101.txt 1879 1884 O -1 -1 0
a data/source_txt/t5_economic_mkaplan_101.txt 1885 1886 O -1 -1 0
year data/source_txt/t5_economic_mkaplan_101.txt 1887 1891 O -1 -1 0
, data/source_txt/t5_economic_mkaplan_101.txt 1891 1892 O -1 -1 0
the data/source_txt/t5_economic_mkaplan_101.txt 1893 1896 O -1 -1 0
utility data/source_txt/t5_economic_mkaplan_101.txt 1897 1904 O -1 -1 0
- data/source_txt/t5_economic_mkaplan_101.txt 1904 1905 O -1 -1 0
maximizing data/source_txt/t5_economic_mkaplan_101.txt 1905 1915 O -1 -1 0
choice data/source_txt/t5_economic_mkaplan_101.txt 1916 1922 O -1 -1 0
is data/source_txt/t5_economic_mkaplan_101.txt 1923 1925 O -1 -1 0
to data/source_txt/t5_economic_mkaplan_101.txt 1926 1928 O -1 -1 0
work data/source_txt/t5_economic_mkaplan_101.txt 1929 1933 O -1 -1 0
a data/source_txt/t5_economic_mkaplan_101.txt 1934 1935 O -1 -1 0
total data/source_txt/t5_economic_mkaplan_101.txt 1936 1941 O -1 -1 0
of data/source_txt/t5_economic_mkaplan_101.txt 1942 1944 O -1 -1 0
2,000 data/source_txt/t5_economic_mkaplan_101.txt 1945 1950 O -1 -1 0
hours data/source_txt/t5_economic_mkaplan_101.txt 1951 1956 O -1 -1 0
per data/source_txt/t5_economic_mkaplan_101.txt 1957 1960 O -1 -1 0
year data/source_txt/t5_economic_mkaplan_101.txt 1961 1965 O -1 -1 0
and data/source_txt/t5_economic_mkaplan_101.txt 1966 1969 O -1 -1 0
earn data/source_txt/t5_economic_mkaplan_101.txt 1970 1974 O -1 -1 0
$ data/source_txt/t5_economic_mkaplan_101.txt 1975 1976 O -1 -1 0
16,000.Now data/source_txt/t5_economic_mkaplan_101.txt 1976 1986 O -1 -1 0
suppose data/source_txt/t5_economic_mkaplan_101.txt 1987 1994 O -1 -1 0
that data/source_txt/t5_economic_mkaplan_101.txt 1995 1999 O -1 -1 0
a data/source_txt/t5_economic_mkaplan_101.txt 2000 2001 O -1 -1 0
government data/source_txt/t5_economic_mkaplan_101.txt 2002 2012 O -1 -1 0
antipoverty data/source_txt/t5_economic_mkaplan_101.txt 2013 2024 O -1 -1 0
program data/source_txt/t5_economic_mkaplan_101.txt 2025 2032 O -1 -1 0
guarantees data/source_txt/t5_economic_mkaplan_101.txt 2033 2043 O -1 -1 0
every data/source_txt/t5_economic_mkaplan_101.txt 2044 2049 O -1 -1 0
family data/source_txt/t5_economic_mkaplan_101.txt 2050 2056 O -1 -1 0
with data/source_txt/t5_economic_mkaplan_101.txt 2057 2061 O -1 -1 0
a data/source_txt/t5_economic_mkaplan_101.txt 2062 2063 O -1 -1 0
single data/source_txt/t5_economic_mkaplan_101.txt 2064 2070 O -1 -1 0
mother data/source_txt/t5_economic_mkaplan_101.txt 2071 2077 O -1 -1 0
and data/source_txt/t5_economic_mkaplan_101.txt 2078 2081 O -1 -1 0
two data/source_txt/t5_economic_mkaplan_101.txt 2082 2085 O -1 -1 0
children data/source_txt/t5_economic_mkaplan_101.txt 2086 2094 O -1 -1 0
$ data/source_txt/t5_economic_mkaplan_101.txt 2095 2096 O -1 -1 0
18,000 data/source_txt/t5_economic_mkaplan_101.txt 2096 2102 O -1 -1 0
in data/source_txt/t5_economic_mkaplan_101.txt 2103 2105 O -1 -1 0
income data/source_txt/t5_economic_mkaplan_101.txt 2106 2112 O -1 -1 0
. data/source_txt/t5_economic_mkaplan_101.txt 2112 2113 O -1 -1 0
Lines 356-416, error in 395
As data/source_txt/t5_economic_mkaplan_101.txt 21889 21891 O -1 -1 0
the data/source_txt/t5_economic_mkaplan_101.txt 21892 21895 O -1 -1 0
famous data/source_txt/t5_economic_mkaplan_101.txt 21896 21902 O -1 -1 0
British data/source_txt/t5_economic_mkaplan_101.txt 21903 21910 O -1 -1 0
economist data/source_txt/t5_economic_mkaplan_101.txt 21911 21920 O -1 -1 0
Joan data/source_txt/t5_economic_mkaplan_101.txt 21921 21925 O -1 -1 0
Robinson data/source_txt/t5_economic_mkaplan_101.txt 21926 21934 O -1 -1 0
wrote data/source_txt/t5_economic_mkaplan_101.txt 21935 21940 O -1 -1 0
some data/source_txt/t5_economic_mkaplan_101.txt 21941 21945 O -1 -1 0
decades data/source_txt/t5_economic_mkaplan_101.txt 21946 21953 O -1 -1 0
ago data/source_txt/t5_economic_mkaplan_101.txt 21954 21957 O -1 -1 0
: data/source_txt/t5_economic_mkaplan_101.txt 21957 21958 O -1 -1 0
“ data/source_txt/t5_economic_mkaplan_101.txt 21959 21960 O -1 -1 0
[ data/source_txt/t5_economic_mkaplan_101.txt 21960 21961 O -1 -1 0
E]conomic data/source_txt/t5_economic_mkaplan_101.txt 21961 21970 O -1 -1 0
theory data/source_txt/t5_economic_mkaplan_101.txt 21971 21977 O -1 -1 0
, data/source_txt/t5_economic_mkaplan_101.txt 21977 21978 O -1 -1 0
in data/source_txt/t5_economic_mkaplan_101.txt 21979 21981 O -1 -1 0
itself data/source_txt/t5_economic_mkaplan_101.txt 21982 21988 O -1 -1 0
, data/source_txt/t5_economic_mkaplan_101.txt 21988 21989 O -1 -1 0
preaches data/source_txt/t5_economic_mkaplan_101.txt 21990 21998 O -1 -1 0
no data/source_txt/t5_economic_mkaplan_101.txt 21999 22001 O -1 -1 0
doctrines data/source_txt/t5_economic_mkaplan_101.txt 22002 22011 O -1 -1 0
and data/source_txt/t5_economic_mkaplan_101.txt 22012 22015 O -1 -1 0
can data/source_txt/t5_economic_mkaplan_101.txt 22016 22019 O -1 -1 0
not data/source_txt/t5_economic_mkaplan_101.txt 22019 22022 O -1 -1 0
establish data/source_txt/t5_economic_mkaplan_101.txt 22023 22032 O -1 -1 0
any data/source_txt/t5_economic_mkaplan_101.txt 22033 22036 O -1 -1 0
universally data/source_txt/t5_economic_mkaplan_101.txt 22037 22048 O -1 -1 0
valid data/source_txt/t5_economic_mkaplan_101.txt 22049 22054 O -1 -1 0
laws data/source_txt/t5_economic_mkaplan_101.txt 22055 22059 O -1 -1 0
. data/source_txt/t5_economic_mkaplan_101.txt 22059 22060 O -1 -1 0
Lines 3578-3609, error in 3592
OK, I now log tokenization errors in a different way.
Each list below contains tuples: the first element is the file path, and the second is a nested tuple holding the token and its info (the token is the first element of the nested tuple).
Train data:
[
('train/t7_government_2_0.deft',
('1965.Shelby',
'data/source_txt/t7_government_rlacroix_0.txt',
'38939',
'38950',
'O')),
('train/t1_biology_0_0.deft',
('link]a', 'data/source_txt/t1_biology_jlee_0.txt', '18877', '18883', 'O')),
('deft/deft_corpus/data/deft_files/train/t1_biology_2_0.deft',
('link]b',
'data/source_txt/t1_biology_rlacroix_0.txt',
'18608',
'18614',
'O')),
('deft/deft_corpus/data/deft_files/train/t1_biology_1_303.deft',
('link]ab',
'data/source_txt/t1_biology_mkaplan_303.txt',
'6294',
'6301',
'O')),
('train/t1_biology_1_303.deft',
('link]a',
'data/source_txt/t1_biology_mkaplan_303.txt',
'22867',
'22873',
'O')),
('train/t3_physics_1_0.deft',
('link](b',
'data/source_txt/t3_physics_mkaplan_0.txt',
'1229',
'1236',
'O')),
('train/t3_physics_1_0.deft',
('vs.-time',
'data/source_txt/t3_physics_mkaplan_0.txt',
'6279',
'6287',
'O')),
('train/t6_sociology_0_0.deft',
('link]).This',
'data/source_txt/t6_sociology_jlee_0.txt',
'28157',
'28168',
'O')),
('train/t6_sociology_0_0.deft',
('1964)', 'data/source_txt/t6_sociology_jlee_0.txt', '29454', '29459', 'O')),
('train/t6_sociology_0_0.deft',
('2003).Since',
'data/source_txt/t6_sociology_jlee_0.txt',
'38163',
'38174',
'O')),
('train/t6_sociology_0_0.deft',
('2014.Summary',
'data/source_txt/t6_sociology_jlee_0.txt',
'40171',
'40183',
'O')),
('train/t7_government_1_303.deft',
('FDA).http://www.fda.gov',
'data/source_txt/t7_government_mkaplan_303.txt',
'39381',
'39404',
'O')),
('train/t1_biology_1_404.deft',
('link]a',
'data/source_txt/t1_biology_mkaplan_404.txt',
'17625',
'17631',
'O')),
('train/t1_biology_1_404.deft',
('link]b',
'data/source_txt/t1_biology_mkaplan_404.txt',
'18015',
'18021',
'O')),
('train/t1_biology_1_202.deft',
('23.2This',
'data/source_txt/t1_biology_mkaplan_202.txt',
'25314',
'25322',
'O')),
('train/t7_government_0_101.deft',
('1988).Craig',
'data/source_txt/t7_government_jlee_101.txt',
'7681',
'7692',
'O')),
('train/t7_government_0_101.deft',
('link]).Louis',
'data/source_txt/t7_government_jlee_101.txt',
'16449',
'16461',
'O')),
('train/t7_government_0_101.deft',
('link]).Keyssar',
'data/source_txt/t7_government_jlee_101.txt',
'20547',
'20561',
'O')),
('train/t7_government_0_101.deft',
('1884)112',
'data/source_txt/t7_government_jlee_101.txt',
'22467',
'22475',
'O')),
('train/t7_government_0_101.deft',
('1884)112',
'data/source_txt/t7_government_jlee_101.txt',
'22467',
'22475',
'O')),
('train/t7_government_0_101.deft',
('1991.Gallup',
'data/source_txt/t7_government_jlee_101.txt',
'32975',
'32986',
'O')),
('train/t7_government_0_101.deft',
('link]).Jose',
'data/source_txt/t7_government_jlee_101.txt',
'40326',
'40337',
'O')),
('train/t7_government_0_101.deft',
('link]).Dan',
'data/source_txt/t7_government_jlee_101.txt',
'48993',
'49003',
'O')),
('train/t7_government_0_101.deft',
('issues.http://blacklivesmatter.com/about/',
'data/source_txt/t7_government_jlee_101.txt',
'49359',
'49400',
'O')),
('train/t4_psychology_2_0.deft',
('1962).It',
'data/source_txt/t4_psychology_rlacroix_0.txt',
'35743',
'35751',
'B-Definition'))
]
Dev data:
[
('train/t4_psychology_1_303.deft',
('1979)',
'data/source_txt/t6_sociology_jlee_101.txt',
'22066',
'22071',
'O')),
('train/t6_sociology_0_101.deft',
('link]d',
'data/source_txt/t1_biology_jlee_303.txt',
'4515',
'4521',
'I-Definition')),
('train/t7_government_1_202.deft',
('bb).Imagine',
'data/source_txt/t4_psychology_mkaplan_0.txt',
'2108',
'2119',
'O')),
('train/t1_biology_0_101.deft',
('link]d',
'data/source_txt/t1_biology_jlee_303.txt',
'4515',
'4521',
'I-Definition')),
('train/t5_economic_1_101.deft',
('link]b', 'data/source_txt/t1_biology_jlee_0.txt', '20696', '20702', 'O')),
('train/t4_psychology_2_101.deft',
('2000).Comte',
'data/source_txt/t6_sociology_jlee_0.txt',
'5380',
'5391',
'O')),
('train/t1_biology_jlee_101.deft',
('bb).Imagine',
'data/source_txt/t4_psychology_rlacroix_0.txt',
'1759',
'1770',
'O'))
]
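The entries in these lists can be grouped by source file to see which source texts need fixing. The sketch below assumes the entry layout described above, `(filepath, (token, source, start, end, tag))`. The function name is hypothetical, and treating two entries with the same source file and offsets as duplicates is an assumption.

```python
from collections import defaultdict


def group_by_source(entries):
    """Group logged errors by source file, dropping repeated spans.

    Two entries count as the same error if they share
    (source, start, end), even when their .deft file paths differ.
    """
    grouped = defaultdict(list)
    seen = set()
    for _filepath, info in entries:
        token, source, start, end, tag = info
        key = (source, start, end)
        if key in seen:
            continue
        seen.add(key)
        grouped[source].append((int(start), int(end), token, tag))
    # Sort each file's errors by character offset.
    for spans in grouped.values():
        spans.sort()
    return dict(grouped)
```

Applied to the train list above, the two identical `1884)112` entries would collapse into one.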