Baseline for Chinese Natural Language Inference (CNLI) dataset

Description

This repository provides the official training and development datasets for the Chinese Natural Language Inference (CNLI) shared task. We evaluate two baseline models on the cnli_1.0 corpus.

Data

The CNLI dataset can be downloaded here.

Both the train and dev sets are in tab-separated format. Each line in the train (or dev) file corresponds to one instance, with the fields arranged as:

sentence-id premise hypothesis label
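As a minimal sketch of reading this format, the snippet below parses tab-separated lines into named fields. It assumes the files are UTF-8 encoded and have no header row; the `NliExample` type, the `read_cnli` helper, and the sample sentences and label strings are hypothetical and used only for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class NliExample(NamedTuple):
    sentence_id: str
    premise: str
    hypothesis: str
    label: str


def read_cnli(lines: Iterable[str]) -> Iterator[NliExample]:
    """Yield one example per tab-separated line, skipping malformed rows."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # blank line or unexpected column count
        yield NliExample(*(field.strip() for field in row))


# Hypothetical two-line sample; the ids, sentences, and labels are made up.
sample = io.StringIO(
    "1\t一个男人在弹吉他。\t有人在演奏乐器。\tentailment\n"
    "2\t一个男人在弹吉他。\t那个男人在睡觉。\tcontradiction\n"
)
examples = list(read_cnli(sample))
```

For a real file, the same function can be fed an open file handle, for example `open(path, encoding="utf-8", newline="")`, where `path` points at the downloaded train or dev file.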

Model

This repository includes two baseline models for the Chinese Natural Language Inference (CNLI) dataset. (1) The Decomposable Attention Model, which uses feed-forward neural networks (FNNs) and an inter-attention mechanism. More details about the model can be found in the original paper. (2) The ESIM model (https://arxiv.org/pdf/1609.06038.pdf), which is a strong baseline for the SNLI dataset.
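To illustrate what inter-attention between the two sentences means, here is a simplified sketch of soft alignment in plain Python. It is not the code from this repository: it scores token pairs with a raw dot product, whereas the Decomposable Attention paper first passes each token vector through a feed-forward network, and ESIM applies the attention to BiLSTM outputs. The function names and toy vectors are assumptions for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def soft_align(premise: List[Vector], hypothesis: List[Vector]) -> List[Vector]:
    """For each premise token, return the attention-weighted mix of hypothesis tokens."""
    dim = len(hypothesis[0])
    aligned = []
    for a in premise:
        # Attention weights of this premise token over all hypothesis tokens.
        weights = softmax([dot(a, b) for b in hypothesis])
        aligned.append([
            sum(w * b[k] for w, b in zip(weights, hypothesis))
            for k in range(dim)
        ])
    return aligned


# Toy 2-dimensional token vectors, made up for illustration.
premise = [[1.0, 0.0], [0.0, 1.0]]
hypothesis = [[10.0, 0.0], [0.0, 10.0]]
aligned = soft_align(premise, hypothesis)
```

Each premise token ends up paired with a summary of the hypothesis tokens it attends to most. Running the same function with the arguments swapped gives the alignment in the other direction.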

Requirements

  • python 3.5
  • tensorflow 1.4.0
  • jieba 0.39

Training

Data Preprocessing
We use jieba to tokenize the sentences. During training, we use the pre-trained SGNS embeddings introduced in Analogical Reasoning on Chinese Morphological and Semantic Relations (https://arxiv.org/abs/1805.06504). You can download sgns.merge.word from here.
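The sketch below shows one way the pre-trained vectors could be loaded for the words that actually occur in the data. It assumes sgns.merge.word is in the word2vec text format (an optional header line with vocabulary size and dimension, then one word followed by its values per line); the `load_embeddings` helper and the tiny in-memory sample are hypothetical, not part of this repository.

```python
import io
from typing import Dict, Iterable, List, Optional, Set


def load_embeddings(lines: Iterable[str], vocab: Set[str]) -> Dict[str, List[float]]:
    """Read word2vec-style text vectors, keeping only words in `vocab`."""
    vectors: Dict[str, List[float]] = {}
    dim: Optional[int] = None
    for index, line in enumerate(lines):
        parts = line.split()
        if not parts:
            continue
        if index == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            dim = int(parts[1])  # header line: "<vocab_size> <dimension>"
            continue
        word, values = parts[0], parts[1:]
        if word not in vocab:
            continue
        if dim is not None and len(values) != dim:
            continue  # skip rows whose length does not match the header
        vectors[word] = [float(v) for v in values]
    return vectors


# Made-up 2-dimensional vectors standing in for the real embedding file.
sample = io.StringIO("3 2\n的 0.1 0.2\n吉他 0.3 0.4\n睡觉 0.5 0.6\n")
vocab = {"吉他", "睡觉", "未登录词"}
vectors = load_embeddings(sample, vocab)
```

The vocabulary would typically be built from the tokenized training sentences, for instance with `jieba.lcut(sentence)`. Words without a pre-trained vector (such as `未登录词` in the sample) are simply absent from the result and need a separate initialization.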

Main Scripts
config.py: the parameter configuration.
decomposable_att.py: implementation of the Decomposable Attention Model.
data_reader.py: preparing data for the model.
train.py: training the Decomposable Attention Model.

Running Model
You can train the Decomposable Attention model and the ESIM model with the following commands:

python3 train.py --model_type decomposable_att
python3 train.py --model_type esim

Results

We provide the whole training data, which comprises 90,000 items in the training set and 10,000 items in the dev set. We adopt early stopping on the dev set. The best results are shown in the following table:

Model             train-acc(%)  dev-acc(%)
Decomposable-Att  76.91         69.35
ESIM              76.82         73.57
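The early stopping on the dev set mentioned above can be sketched as follows. This is a generic illustration, not the logic in train.py: the `run_epoch` callback, the `patience` and `max_epochs` parameters, and the accuracy values are all hypothetical.

```python
from typing import Callable, List, Tuple


def train_with_early_stopping(
    run_epoch: Callable[[int], float],
    max_epochs: int = 50,
    patience: int = 3,
) -> Tuple[int, float]:
    """Run epochs until dev accuracy stops improving for `patience` epochs.

    `run_epoch(epoch)` is a hypothetical callback that trains for one epoch
    and returns the dev-set accuracy. Returns (best_epoch, best_dev_acc).
    """
    best_epoch, best_acc = -1, float("-inf")
    for epoch in range(max_epochs):
        dev_acc = run_epoch(epoch)
        if dev_acc > best_acc:
            # A real training loop would also save a checkpoint here.
            best_epoch, best_acc = epoch, dev_acc
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best_acc


# Made-up dev accuracies, for illustration only.
fake_curve = [0.60, 0.66, 0.69, 0.68, 0.69, 0.67, 0.70]
calls: List[int] = []


def fake_epoch(epoch: int) -> float:
    calls.append(epoch)
    return fake_curve[epoch]


best = train_with_early_stopping(fake_epoch, max_epochs=len(fake_curve), patience=3)
```

With a patience of 3, training stops once three consecutive epochs fail to beat the best dev accuracy, and the model from the best epoch is the one reported.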

Reporting issues

Please let us know if you encounter any problems.

Contributors

blcunlp, thinkingslow