This project forked from thunlp/sememe-sc

Source code and data for ACL 2019 paper "Modeling Semantic Compositionality with Sememe Knowledge"

License: MIT License

Modeling Semantic Compositionality with Sememe Knowledge

Code and data for the ACL 2019 paper "Modeling Semantic Compositionality with Sememe Knowledge" [pdf].

Requirements

  • TensorFlow >= 1.13.1
  • Python 3.6

Data

This repo contains three types of data.

  • Data for Semantic Compositionality Degree (SCD)

    • ./SC Degree/scd.txt Human-annotated MWEs with their SCD, constituents, and corresponding sememe sets.

      The format for each instance is as follows:

      农民                           ==>{constituent_word_1}
      职位 人 农                      ==>{sememe_set_of_constituent_word_1}
      起义                           ==>{constituent_word_2}
      暴动 事情 政                    ==>{sememe_set_of_constituent_word_2}
      农民起义                        ==>{MWE}
      事情 职位 政 暴动 人 农          ==>{sememe_set_of_MWE}
      3.0                           ==>{SCD_of_the_MWE}
      
  • Core data for our model

    • ./dataset/HowNet_original_new.txt Original HowNet data

    • ./dataset/hownet.txt Preprocessed and flattened HowNet data

    • ./dataset/train.bin Training data. Use pickle to load.

    • ./dataset/test.bin Test data. Use pickle to load.

    • ./dataset/dev.bin Dev data. Use pickle to load.

    • ./dataset/all.bin All data. Use pickle to load.

    • ./dataset/sememe_vector.txt 1,335 pretrained sememe embeddings. The original file can be downloaded here.

    • ./dataset/word_embedding.txt.zip Pretrained 200-dimensional GloVe embeddings. Unzip it before use.

      To load a *.bin file, import pickle and then run the following in Python:

      with open({file_name}, 'rb') as f:
          data = pickle.load(f)

  • Filtered word pairs with human annotated similarity data:

    • ./wordsim/filtered_wordsim240.txt
    • ./wordsim/filtered_wordsim297.txt
    • ./wordsim/COS960.txt
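
The seven-line scd.txt layout described above can be read with a short parser. The following is a minimal sketch, not the repository's own loader: it assumes records follow each other as consecutive seven-line groups, that tokens in a sememe line are separated by whitespace, and that blank lines can be skipped (which would break if a word had an empty sememe line). The function name `parse_scd_records` and the dictionary keys are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_scd_records(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Group non-empty lines into 7-line records (assumed layout)."""
    # Assumption: blank lines carry no information and can be dropped.
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) % 7 != 0:
        raise ValueError("expected a multiple of 7 non-empty lines")

    records = []
    for i in range(0, len(rows), 7):
        w1, s1, w2, s2, mwe, s_mwe, scd = rows[i:i + 7]
        records.append({
            "word1": w1,
            "sememes1": set(s1.split()),
            "word2": w2,
            "sememes2": set(s2.split()),
            "mwe": mwe,
            "mwe_sememes": set(s_mwe.split()),
            "scd": float(scd),
        })
    return records


# The instance from the format description, without the ==>{...} annotations.
sample = """农民
职位 人 农
起义
暴动 事情 政
农民起义
事情 职位 政 暴动 人 农
3.0""".splitlines()

records = parse_scd_records(sample)
print(records[0]["mwe"], records[0]["scd"])
```

To read the real file, pass an open file object instead of `sample`, for example `open("SC Degree/scd.txt", encoding="utf-8")`. The UTF-8 encoding is also an assumption.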

Sememe-based Semantic Compositionality Degree

To compute the correlation between the human-annotated SCD and our proposed sememe-based SCD, run:

cd 'SC Degree'
python test_scd.py
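
For readers who want to see what such a comparison involves, here is a hedged sketch. This README does not spell out the sememe-based SCD rule or the correlation measure that test_scd.py uses, so both are assumptions here: `sememe_scd` is a hypothetical set-based rule that compares the MWE's sememe set with the union of its constituents' sememe sets, and `pearson` is a plain Pearson correlation. The rule is at least consistent with the 农民起义 instance in scd.txt, where the MWE's sememes equal the union and the human score is 3.0.

```python
import math
from typing import Sequence, Set


def sememe_scd(s1: Set[str], s2: Set[str], s_mwe: Set[str]) -> int:
    """Hypothetical set-based SCD: 3 = most compositional, 0 = least."""
    union = s1 | s2
    if not (s_mwe & union):
        return 0               # no sememe shared with the constituents
    if s_mwe == union:
        return 3               # identical to the constituents' sememes
    if s_mwe < union:
        return 2               # proper subset of the constituents' sememes
    return 1                   # overlap, plus sememes the constituents lack


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# The instance from scd.txt: the MWE's sememes equal the union.
predicted = sememe_scd({"职位", "人", "农"},
                       {"暴动", "事情", "政"},
                       {"事情", "职位", "政", "暴动", "人", "农"})

# Toy scores (made up for illustration, not taken from the dataset).
r = pearson([3.0, 2.0, 1.0, 0.0], [3, 2, 2, 0])
print(predicted, round(r, 3))
```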

MWE Similarity Computation

We use Wordsim240, Wordsim297, and COS960 to test our models' performance on the MWE similarity computation task. From these three datasets we remove the words that are not MWEs in our dataset, and we manually move the MWEs that appear in them to the test set.

To train our four models on the similarity computation task, run the following commands:

SC-AS:

python ps_SC_AS.py

SC-MSA:

python ps_SC_MSA.py

SC-AS+R:

python ps_SC_AS_R.py

SC-MSA+R:

python ps_SC_MSA_R.py 

To evaluate the learned MWE embeddings, run:

python eval_wordsim.py {saved_MWE_embedding_path} 
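
A word-similarity evaluation of this kind is commonly done by scoring each word pair with the cosine similarity of its embeddings and then correlating those scores with the human ratings. The sketch below illustrates that general procedure only. Whether eval_wordsim.py uses cosine similarity and Spearman correlation is an assumption, and the function names, the in-memory `emb` dictionary, and the toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def evaluate(pairs: Sequence[Tuple[str, str, float]],
             emb: Dict[str, Sequence[float]]) -> float:
    """Correlate cosine similarities with human scores."""
    gold, pred = [], []
    for w1, w2, score in pairs:
        if w1 in emb and w2 in emb:  # skip pairs without embeddings
            gold.append(score)
            pred.append(cosine(emb[w1], emb[w2]))
    return spearman(gold, pred)


# Toy embeddings and ratings, made up for illustration.
emb = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
pairs = [("a", "b", 9.0), ("a", "c", 5.0), ("a", "d", 1.0)]
rho = evaluate(pairs, emb)
print(rho)
```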

MWE Sememe Prediction

To train and test our models on the MWE sememe prediction task, run the following commands:

SC-AS:

python sem_SC_AS.py

SC-MSA:

python sem_SC_MSA.py

SC-AS+R:

python sem_SC_AS_R.py

SC-MSA+R:

python sem_SC_MSA_R.py 
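
Sememe prediction can be viewed as ranking candidate sememes for each MWE, and a common way to score such rankings is mean average precision (MAP). The sketch below shows that metric in isolation. That the scripts above report exactly this quantity is an assumption, and the function names and the toy ranking are hypothetical.

```python
from typing import Sequence, Set


def average_precision(ranked: Sequence[str], gold: Set[str]) -> float:
    """Average precision of one ranked sememe list against its gold set."""
    if not gold:
        return 0.0
    hits = 0
    total = 0.0
    for rank, sememe in enumerate(ranked, start=1):
        if sememe in gold:
            hits += 1
            total += hits / rank  # precision at this hit
    return total / len(gold)


def mean_average_precision(predictions: Sequence[Sequence[str]],
                           golds: Sequence[Set[str]]) -> float:
    """Mean of the per-MWE average precision scores."""
    scores = [average_precision(r, g) for r, g in zip(predictions, golds)]
    return sum(scores) / len(scores)


# Toy ranking, made up for illustration: gold sememes sit at ranks 1 and 3.
ranked = ["事情", "人", "暴动", "农"]
gold = {"事情", "暴动"}
ap = average_precision(ranked, gold)
print(ap)
```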

Cite

If you use the code or data, please cite this paper:

@inproceedings{Qi2019ModelingSC,
title={Modeling Semantic Compositionality with Sememe Knowledge},
author={Fanchao Qi and Junjie Huang and Chenghao Yang and Zhiyuan Liu and Xiao Chen and Qun Liu and Maosong Sun},
booktitle={Proceedings of ACL 2019},
year={2019}
}
