kaggle-coleridge-initiative's People

Contributors

riow1983

kaggle-coleridge-initiative's Issues

[Solutions] 47th

[Reason for adding]
It is interesting that this solution uses neither fine-tuning nor string matching against the dataset labels.

[Discussion]
47th place solution - no training, no dataset label string matching

[Code]
coleridge_regex_electra

[In a nutshell]
Combines regex-based string matching with answers from a QA model (Hugging Face's pre-trained ELECTRA, used as-is without fine-tuning).
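
The combination described above can be sketched as follows. This is only a minimal illustration, not the solution's actual code: the regex `DATASET_PATTERN`, the question string, and the `qa_fn` callable (a stand-in for a pre-trained extractive QA model such as ELECTRA) are all assumptions made for this example.

```python
import re
from typing import Callable, List, Set

# Hypothetical pattern (an assumption, not taken from the solution):
# a run of capitalized words, optionally joined by short connectors,
# that ends in a dataset-like keyword.
DATASET_PATTERN = re.compile(
    r"\b(?:[A-Z][A-Za-z]+ )(?:(?:[A-Z][A-Za-z]+|of|and|the|for|in|on) )*"
    r"(?:Study|Survey|Dataset|Database|Initiative)\b"
)


def regex_candidates(text: str) -> Set[str]:
    """Collect every span of the text that matches the dataset-like pattern."""
    return {m.group(0) for m in DATASET_PATTERN.finditer(text)}


def predict_datasets(
    text: str,
    qa_fn: Callable[[str, str], str],
    question: str = "Which dataset is used?",  # hypothetical question
) -> List[str]:
    """Union of the regex matches and the QA model's (non-empty) answer."""
    candidates = regex_candidates(text)
    answer = qa_fn(question, text).strip()
    if answer:
        candidates.add(answer)
    return sorted(candidates)
```

In practice `qa_fn` would wrap a question-answering model loaded without any further training; here it is left as a plain callable so the sketch stays independent of any particular library.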




[Related techniques]

[Solutions] 17th

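[Reason for adding]
I found the following idea unique: within dataset-mention spans, every word other than a "neutral" word is masked with one of the special tokens {[TITLE], [UPPER], [MIXED]}, which can be assigned by rule, to keep the NER model from overfitting.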
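
A rough sketch of such rule-based masking is shown below. The contents of `NEUTRAL_WORDS` and the exact casing rules are assumptions made for illustration; the original solution may define "neutral" words and the three token classes differently.

```python
from typing import FrozenSet

# Hypothetical "neutral" vocabulary (an assumption): generic words that are
# kept verbatim because they carry little dataset-specific information.
NEUTRAL_WORDS: FrozenSet[str] = frozenset(
    {"study", "survey", "database", "dataset", "data",
     "of", "and", "the", "for", "on", "in"}
)


def mask_token(word: str) -> str:
    """Replace a non-neutral word with a special token based on its casing."""
    if word.lower() in NEUTRAL_WORDS:
        return word            # neutral words stay as they are
    if word.isupper():
        return "[UPPER]"       # e.g. acronyms such as "ADNI"
    if word.istitle():
        return "[TITLE]"       # e.g. "Baltimore"
    return "[MIXED]"           # anything else, e.g. "IBTrACS"


def mask_mention(mention: str) -> str:
    """Mask every whitespace-separated word of a dataset-mention span."""
    return " ".join(mask_token(w) for w in mention.split())
```

Because the model then sees only the casing class of the specific words, it cannot simply memorize the dataset names that appear in the training labels.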

[Discussion]
17th Place Solution - SpaCy 3 (EntityRuler) and NER CRF model

[Code]
{TITLE}

[In a nutshell]
Dataset candidates proposed by either the rule-based matcher or the NER model are filtered with LightGBM.





[Related techniques]

[Solutions] 14th

[Discussion]
14th Place Solution (with notebooks)

[Code]

[In a nutshell]
Fine-tuned bert-base-cased + CRF on BIO-tagged chunks of ~200-400 words, using "hand labels" picked out of train by eye.
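
The BIO tagging and chunking step can be illustrated with the sketch below. It assumes whitespace tokenization, exact matching of a single mention, and a chunk size of 300 words; all of these are choices made for this example, not details confirmed by the solution.

```python
from typing import List, Tuple


def bio_tag(words: List[str], mention: List[str]) -> List[str]:
    """Tag each word as B (mention start), I (inside mention), or O (outside)."""
    tags = ["O"] * len(words)
    n = len(mention)
    if n == 0:
        return tags
    for i in range(len(words) - n + 1):
        if words[i:i + n] == mention:
            tags[i] = "B"
            tags[i + 1:i + n] = ["I"] * (n - 1)
    return tags


def chunk_words(
    words: List[str], tags: List[str], size: int = 300
) -> List[Tuple[List[str], List[str]]]:
    """Split the words and their tags into consecutive chunks of `size` words.

    Note: a mention that straddles a chunk boundary is split by this naive
    scheme; handling that case is left out of the sketch.
    """
    return [
        (words[i:i + size], tags[i:i + size])
        for i in range(0, len(words), size)
    ]
```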




[Related techniques]

huggingface transformers + PyTorch for NER task fine-tuning

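Both spaCy and NERDA are ultimately wrappers, so the fact that you cannot customize them all the way down may make them a poor fit for competitions. In that sense, Hugging Face + PyTorch may be the standard route.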
https://www.reddit.com/r/LanguageTechnology/comments/lnca2q/some_questions_about_spacy_vs_hugging_face/

I came across a Colab notebook that fine-tunes Hugging Face's pre-trained BERT for an NER task with PyTorch/XLA (TPU).
https://colab.research.google.com/github/abhimishra91/transformers-tutorials/blob/master/transformers_ner.ipynb

On top of that, the training data is a Kaggle dataset: https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus
(not a competition, but a dataset created more than four years ago)

Originally posted by @riow1983 in #6 (comment)

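Create a CV split with Group KFold, grouped by categorized cleaned_label (on train.csv)

  • The target is the unprocessed train.csv
  • Run Group KFold using the categorized cleaned_label as the group
    • Which variable (column) to use as the target label is still undecided (would anything do?)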
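
A dependency-free sketch of the grouping idea follows. The mapping `LABEL_TO_CATEGORY` and its category names are hypothetical (only the cleaned_label strings come from the list below), and `group_kfold` is a simple greedy stand-in written for this example; in practice `sklearn.model_selection.GroupKFold` would be the usual choice. Since the split depends only on the group key, the choice of target column does not affect the folds.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

# Hypothetical categorization (an assumption): label variants that refer to
# the same dataset are mapped to one category, which serves as the group.
LABEL_TO_CATEGORY: Dict[str, str] = {
    "adni": "adni",
    "alzheimers disease neuroimaging initiative": "adni",
    "world ocean database": "wod",
    "noaa world ocean database": "wod",
    "cccsl": "cccsl",
}


def group_kfold(groups: Sequence[Hashable], n_splits: int) -> List[List[int]]:
    """Assign row indices to folds so that no group spans two folds."""
    rows_by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, group in enumerate(groups):
        rows_by_group[group].append(idx)
    if n_splits > len(rows_by_group):
        raise ValueError("n_splits cannot exceed the number of groups")

    folds: List[List[int]] = [[] for _ in range(n_splits)]
    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest rows.
    for group in sorted(rows_by_group, key=lambda g: -len(rows_by_group[g])):
        min(folds, key=len).extend(rows_by_group[group])
    return folds
```

Each returned list holds the validation row indices of one fold; the training rows for that fold are all remaining indices.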

[Reference: all 130 cleaned_label values from train.csv]

0 national education longitudinal study
1 noaa tidal station
2 slosh model
3 noaa c cap
4 aging integrated database agid 
5 alzheimers disease neuroimaging initiative
6 aging integrated database
7 noaa national water level observation network
8 noaa water level station
9 baltimore longitudinal study of aging blsa 
10 national water level observation network
11 arms farm financial and crop production practices
12 beginning postsecondary student
13 noaa sea lake and overland surges from hurricanes
14 noaa tide gauge
15 the national institute on aging genetics of alzheimer s disease data storage site
16 national center for education statistics common core of data
17 national science foundation survey of industrial research and development
18 baccalaureate and beyond
19 noaa international best track archive for climate stewardship
20 agricultural resource management survey
21 national teacher and principal survey
22 international best track archive for climate stewardship
23 nsf higher education research and development survey
24 national science foundation survey of earned doctorates
25 school survey on crime and safety
26 the national institute on aging genetics of alzheimer s disease data storage site niagads 
27 national oceanic and atmospheric administration world ocean database
28 beginning postsecondary students longitudinal study
29 nces common core of data
30 program for the international assessment of adult competencies
31 survey of earned doctorates
32 baltimore longitudinal study of aging
33 early childhood longitudinal study
34 adni
35 national science foundation survey of graduate students and postdoctorates in science and engineering
36 trends in international mathematics and science study
37 national oceanic and atmospheric administration c cap
38 nsf survey of earned doctorates
39 noaa tide station
40 education longitudinal study
41 optimum interpolation sea surface temperature
42 national oceanic and atmospheric administration optimum interpolation sea surface temperature
43 alzheimer s disease neuroimaging initiative adni 
44 baccalaureate and beyond longitudinal study
45 agricultural resources management survey
46 beginning postsecondary students
47 ibtracs
48 coastal change analysis program
49 survey of graduate students and postdoctorates in science and engineering
50 national assessment of education progress
51 sea surface temperature optimum interpolation
52 high school longitudinal study
53 nsf survey of graduate students and postdoctorates in science and engineering
54 national science foundation survey of doctorate recipients
55 survey of doctorate recipients
56 coastal change analysis program land cover
57 survey of industrial research and development
58 world ocean database
59 rural urban continuum codes
60 noaa optimum interpolation sea surface temperature
61 noaa world ocean database
62 common core of data
63 higher education research and development survey
64 noaa storm surge inundation
65 national weather service nws storm surge risk
66 survey of science and engineering research facilities
67 nsf survey of industrial research and development
68 national science foundation survey of science and engineering research facilities
69 national science foundation higher education research and development survey
70 national center for science and engineering statistics survey of earned doctorates
71 national center for science and engineering statistics survey of science and engineering research facilities
72 national center for science and engineering statistics survey of graduate students and postdoctorates in science and engineering
73 national center for science and engineering statistics survey of doctorate recipients
74 national center for science and engineering statistics survey of industrial research and development
75 national center for science and engineering statistics higher education research and development survey
76 nsf survey of science and engineering research facilities
77 ffrdc research and development survey
78 nsf ffrdc research and development survey
79 survey of state government research and development
80 ncses survey of doctorate recipients
81 ncses survey of graduate students and postdoctorates in science and engineering
82 anss comprehensive earthquake catalog
83 anss comprehensive catalog
84 advanced national seismic system anss comprehensive catalog comcat 
85 advanced national seismic system comprehensive catalog
86 census of agriculture
87 usda census of agriculture
88 nass census of agriculture
89 north american breeding bird survey
90 north american breeding bird survey bbs 
91 usgs north american breeding bird survey
92 covid 19 open research dataset cord 19 
93 covid 19 open research dataset
94 covid open research dataset
95 covid 19 open research data
96 complexity science hub covid 19 control strategies list cccsl 
97 complexity science hub covid 19 control strategies list
98 cccsl
99 our world in data covid 19 dataset
100 our world in data covid 19
101 our world in data
102 jh crown registry
103 characterizing health associated risks and your baseline disease in sars cov 2 charybdis 
104 characterizing health associated risks and your baseline disease in sars cov 2
105 covid 19 death data
106 sars cov 2 genome sequence
107 sars cov 2 genome sequences
108 covid 19 genome sequence
109 covid 19 genome sequences
110 2019 ncov genome sequence
111 2019 ncov genome sequences
112 sars cov 2 full genome sequence
113 sars cov 2 full genome sequences
114 sars cov 2 complete genome sequence
115 sars cov 2 complete genome sequences
116 2019 ncov complete genome sequences
117 genome sequences of sars cov 2
118 genome sequence of sars cov 2
119 genome sequence of covid 19
120 genome sequences of covid 19
121 genome sequence of 2019 ncov
122 genome sequences of 2019 ncov
123 covid 19 image data collection
124 rsna international covid 19 open radiology database ricord 
125 rsna international covid 19 open radiology database
126 rsna international covid open radiology database
127 cas covid 19 antiviral candidate compounds dataset
128 cas covid 19 antiviral candidate compounds data set
129 cas covid 19 antiviral candidate compounds data

reference

[Solutions] 1st

[Discussion]
1st place solution: Metric learning and GPT

1st solution: Matching the : Context Similarity via Deep Metric Learning and Beyond

[Code]

[GitHub]

[In a nutshell]
Two approaches were prepared independently of each other:

  • GPT (QA task) + beam search (private LB = 0.565 (0.594 w/o labels obtained by scispaCy's abbreviation detector)) by Khoi Nguyen
  • metric learning with usual MLM backbones (private LB = 0.576 (0.588 w/o labels obtained by scispaCy's abbreviation detector)) by Nguyen Quan Anh Minh


[Related knowledge]
