INSTALLATION
You need to have Jupyter Notebook installed.
I advise you to create an environment (virtualenv or conda).
Set up your environment with pip install -r req.txt
The notebooks are divided by purpose. To obtain a correct model that can replicate the experiments of the paper, consult (and run) the notebooks in the order shown below:
DATA PREPARATION
- (0.1) wikipedia_Abstracts.ipynb: downloads and lightly preprocesses the corpus.
  - Requires the corpus, which you can download as shown in the notebook or from here
- (0.2) import_elmo_embeddings.ipynb: aligns ELMo's vectors with the words in the corpus and creates the datasets.
  - Requires the ELMo vectors. You can generate them yourself with an ELMo implementation or download mine from here
  - The vectors can be downloaded in bulk (54.5 GB) or as 50+ zip parts. If you download the 50+ parts, you have to recompose them with
    cat splitted* > elmo_vectors.zip
- (0.2.1) composite_words.ipynb: retrieves the word phrases.
  - Requires the corpus and the ELMo vectors
- (0.2.2) minimal_type.ipynb: resolves polytype words (words that are retrieved from more than one class). The polytyping has to be removed to avoid a multilabel problem.
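If cat is not available (for example on Windows), the recomposition of the split ELMo archive from step (0.2) can be sketched in Python. This is a minimal sketch and not part of the repository: the function name recompose_archive and its parameters are hypothetical, and it assumes that the part names sort alphabetically into the correct order, which is what the splitted* glob relies on as well.

```python
import shutil
from pathlib import Path


def recompose_archive(parts_dir, pattern="splitted*", output_name="elmo_vectors.zip"):
    """Concatenate the split parts, in sorted name order, into one archive.

    Hypothetical helper: mirrors `cat splitted* > elmo_vectors.zip`.
    """
    parts_dir = Path(parts_dir)
    output_path = parts_dir / output_name

    # Assumption: sorting by name yields the original part order.
    parts = sorted(
        p for p in parts_dir.glob(pattern) if p.is_file() and p != output_path
    )
    if not parts:
        raise FileNotFoundError(f"no files matching {pattern!r} in {parts_dir}")

    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Copy in chunks so large parts are never fully loaded in memory.
                shutil.copyfileobj(src, out, length=16 * 1024 * 1024)
    return output_path
```

Calling recompose_archive(".") from the folder that holds the downloaded parts writes elmo_vectors.zip next to them. The bulk download does not need this step.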
BUILD THE NETWORK, TRAIN, EVALUATE
- (1) network.ipynb: