tpcoref's Introduction

Transformer Pointer Coreference Resolution

Trained Model

Parameters

The final model was trained with the following parameters:

Model Hyperparameters
---------------------
Model Hidden Dimension: 512
Num Layers: 4
Num Attention Heads: 8
Dropout: 0.2
Learning Rate: 0.0001 (1e-4)
Final Learning Rate: 6.3e-5
Num Epochs: 10
Batch Size: 4

Meta
    - criterion: nn.CrossEntropyLoss
    - optimizer: torch.optim.SGD
    - scheduler: StepLR(optim, step_size=1, gamma=0.95)
    - src_mask: None
    - tgt_mask: Used
    - torch.manual_seed(42)

Training Data
    - train.tsv
    - TEXT and LABEL vocab
    - NO_REF_TOKEN: '<nr>'
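
The Meta settings above can be wired together roughly as follows. This is a minimal sketch, not the repository's training script: the `nn.Linear` model, the random batch, and the `causal_mask` helper are hypothetical stand-ins, and only the criterion, optimizer, scheduler, seed, learning rate, and epoch count come from the list above. It also shows where the final learning rate comes from: with `gamma=0.95` applied once per epoch, the tenth epoch runs at 1e-4 × 0.95^9 ≈ 6.3e-5.

```python
import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

torch.manual_seed(42)

# Hypothetical stand-in for the real model, kept tiny so the sketch runs quickly.
model = nn.Linear(8, 3)

criterion = nn.CrossEntropyLoss()
optimizer = SGD(model.parameters(), lr=1e-4)
scheduler = StepLR(optimizer, step_size=1, gamma=0.95)


def causal_mask(size):
    """Additive mask: position i may attend only to positions <= i."""
    return torch.triu(torch.full((size, size), float("-inf")), diagonal=1)


# Assumed shape of the target mask ("tgt_mask: Used"); src_mask stays None.
tgt_mask = causal_mask(4)

lrs = []
for epoch in range(10):
    # Record the learning rate in effect during this epoch.
    lrs.append(optimizer.param_groups[0]["lr"])

    # One dummy batch of size 4 stands in for a full pass over train.tsv.
    inputs = torch.randn(4, 8)
    targets = torch.randint(0, 3, (4,))

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()

    # Decay the learning rate by 5% after every epoch.
    scheduler.step()

print(f"first epoch lr: {lrs[0]:.1e}, last epoch lr: {lrs[-1]:.1e}")
```

In the real setup, `tgt_mask` would be passed to the decoder so that each target position cannot see later positions.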

The model reached the following validation losses, one per epoch:

Validation Losses
    - Epoch 1: 416.91417050361633
    - Epoch 2: 323.9406987428665
    - Epoch 3: 225.71645319461823
    - Epoch 4: 144.5937288403511
    - Epoch 5: 91.24655830860138
    - Epoch 6: 61.019713655114174
    - Epoch 7: 44.32578928396106
    - Epoch 8: 34.620120372623205
    - Epoch 9: 28.595584958791733
    - Epoch 10: 24.60211064480245
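
To read the trend in these numbers, the short sketch below computes how much the validation loss drops from one epoch to the next and over the whole run. The values are copied from the list above; how the loss was aggregated (summed or averaged over batches) is not stated, so only the ratios are meaningful here.

```python
# Validation losses for epochs 1..10, copied from the list above.
val_losses = [
    416.91417050361633,
    323.9406987428665,
    225.71645319461823,
    144.5937288403511,
    91.24655830860138,
    61.019713655114174,
    44.32578928396106,
    34.620120372623205,
    28.595584958791733,
    24.60211064480245,
]

# Ratio of each epoch's loss to the previous epoch's loss.
ratios = [curr / prev for prev, curr in zip(val_losses, val_losses[1:])]

for epoch, ratio in enumerate(ratios, start=2):
    print(f"Epoch {epoch}: {(1 - ratio) * 100:.1f}% lower than epoch {epoch - 1}")

# Overall reduction from the first to the last epoch.
total_reduction = 1 - val_losses[-1] / val_losses[0]
print(f"Total reduction: {total_reduction * 100:.1f}%")
```

The loss falls in every epoch, but the per-epoch drop shrinks toward the end of the run, which is consistent with the decaying learning rate.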

Training

The model above was trained in Google Colab on a Tesla P100 GPU over 5 hours, using the full training set (~30,000 documents) and the validation set (500 documents).
