Currently under development
Solver for the Winograd Schema Challenge in Portuguese. Portuguese translations of the original Winograd Schema Challenge are also proposed here.
- Code for the language model is based on PyTorch's word-level language modeling RNN example.
- Code for parallelization of the PyTorch model is based on the PyTorch-Encoding package, with help from this Medium post.
- The idea of using a language model to solve the Winograd Schema Challenge comes from the paper "A Simple Method for Commonsense Reasoning":
```bibtex
@article{DBLP:journals/corr/abs-1806-02847,
  author        = {Trieu H. Trinh and Quoc V. Le},
  title         = {A Simple Method for Commonsense Reasoning},
  journal       = {CoRR},
  volume        = {abs/1806.02847},
  year          = {2018},
  url           = {http://arxiv.org/abs/1806.02847},
  archivePrefix = {arXiv},
  eprint        = {1806.02847},
  timestamp     = {Mon, 13 Aug 2018 16:46:22 +0200},
  biburl        = {https://dblp.org/rec/bib/journals/corr/abs-1806-02847},
  bibsource     = {dblp computer science bibliography, https://dblp.org}
}
```
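The core idea of that paper can be sketched in a few lines: substitute each candidate referent for the pronoun, score the resulting full sentences with a language model, and keep the candidate whose sentence the model finds more likely. The sketch below is illustrative only — it uses a toy Laplace-smoothed unigram scorer as a self-contained stand-in for the trained RNN language model, and none of these names come from this project's API:

```python
import math
from collections import Counter

def make_unigram_scorer(corpus_text):
    """Build a toy log-probability scorer (Laplace-smoothed unigrams) from a small text."""
    counts = Counter(corpus_text.lower().split())
    total = sum(counts.values())
    vocab = len(counts)

    def score(sentence):
        # Sum of smoothed unigram log-probabilities; higher means "more likely".
        return sum(
            math.log((counts[w] + 1) / (total + vocab))
            for w in sentence.lower().split()
        )

    return score

def resolve(sentence, pronoun, candidates, score_sentence):
    """Substitute each candidate for the pronoun; keep the highest-scoring one."""
    return max(
        candidates,
        key=lambda c: score_sentence(sentence.replace(pronoun, c, 1)),
    )

# Toy usage: the scorer has seen "cat" more often than "mat",
# so the cat-substituted sentence gets the higher score.
score = make_unigram_scorer("the cat sat on the mat because the cat was tired")
resolve("it was tired", "it", ["the cat", "the mat"], score)  # -> "the cat"
```

In the real setup, `score_sentence` would wrap a forward pass of the trained RNN; the substitution-and-compare loop stays the same.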
- This project has not been tested on machines without CUDA GPUs.
- A Dockerfile is available and may be used with `docker build -t wsc_port .`, followed by `nvidia-docker run -it -v $PWD/models:/code/models wsc_port`.
- The Dockerfile contains a few different options for running, which can be selected by commenting and uncommenting its final sections.
- To run outside a Docker container, Conda is required. To create the conda environment: `conda env create -f environment.yml`
- The Makefile contains some of the commands used to run the code. These commands must be run from inside the environment:
  - `make dev_init` sets up the environment for running the project. It also ensures `make processed_data` is run, which prepares the data needed to train the model.
  - `make corpus` speeds up the first run of the code (but is not necessary).
  - `make train` trains a model.
  - `make winograd_test` runs the Winograd Schema Challenge evaluation.
  - `make generate` runs the language model for text generation.
- The code runs for both the English and Portuguese cases; this setting is controlled by the variable `PORTUGUESE` in `src.consts`.
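For illustration, a boolean switch of this kind can steer language-specific choices downstream. Only the constant name `PORTUGUESE` comes from the project; the path below is hypothetical:

```python
# Hypothetical sketch of a language switch like src.consts.PORTUGUESE.
# Only the constant name comes from the project; the paths are made up.
PORTUGUESE = True

# Downstream code can branch on the flag to pick language-specific resources.
CORPUS_DIR = "data/processed/pt" if PORTUGUESE else "data/processed/en"
```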
- Run tests with `make tests`, which is equivalent to `pytest --cov=src tests/`. Use `pytest --cov=src --cov-report=html tests/` to generate an HTML coverage report. This requires the pytest and pytest-cov packages. If there are import errors, run `pip install -e .` to locally install the package from source.
```
├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`.
├── README.md          <- The top-level README for developers using this project.
├── environment.yml    <- Contains project's requirements, generated from Anaconda environment.
├── setup.py           <- Makes project pip installable (pip install -e .) so src can be imported.
│
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── githooks           <- Git hook scripts used for development. The repo's hook directory needs to be set to this folder.
│
├── models             <- Trained and serialized models, model predictions, or model summaries.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting.
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module.
│   │
│   └── scripts
└── tests              <- Tests module, using Pytest.
```
Project based on the cookiecutter data science project template. #cookiecutterdatascience