This project forked from jogloran/cnccgbank

Chinese CCGbank conversion algorithm suite

Chinese CCGbank conversion
==========================
(c) 2008-2012 Daniel Tse <[email protected]>
University of Sydney

Use of this software is governed by the attached "Chinese CCGbank converter Licence Agreement"
supplied in the Chinese CCGbank conversion distribution. If the LICENCE file is missing, please
notify the maintainer Daniel Tse <[email protected]>.

    Licensees shall acknowledge use of the Licensed Software and Derivative Works in all 
    publications of research based in whole or in part on their use through citation of the following publication:

    Chinese CCGbank: extracting CCG derivations from the Penn Chinese Treebank
    Daniel Tse and James R. Curran
    Proceedings of the 23rd International Conference on Computational Linguistics (COLING), pp. 1083-1091, 
    Beijing, China, 2010.

How to obtain a copy of Chinese CCGbank:
----------------------------------------
Python version:
    The Chinese CCGbank conversion process has been tested on Python 2.7.1 and GNU bash 4.1.5.
    The scripts require a Unix environment with bash available.
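
    Before starting, it can help to confirm which interpreter is in use and whether bash is on
    the PATH. The following is a minimal sketch, not part of the distribution: the function names
    'find_on_path' and 'describe_environment' are hypothetical, and it uses only the standard
    library so that it should run on both Python 2.7 and Python 3.

```python
import os
import sys


def find_on_path(program, path=None):
    """Return the full path of an executable named `program`, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            # Skip empty PATH entries rather than searching the current directory.
            continue
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def describe_environment():
    """Summarise the interpreter version and where bash was found (if at all)."""
    return {
        "python": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "bash": find_on_path("bash"),
    }


if __name__ == "__main__":
    env = describe_environment()
    print("Python %s" % env["python"])
    print("bash: %s" % (env["bash"] or "not found on PATH"))
```

    The check only reports what it finds. It does not enforce the tested versions listed above.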

Obtaining a copy of the Penn Chinese Treebank:
    The Chinese CCGbank conversion process requires a copy of the Penn Chinese Treebank (tested on
    PCTB 6.0, LDC catalogue no. LDC2007T36; other versions may work), which can be obtained through
    the Linguistic Data Consortium (LDC). The LDC catalogue page for this corpus is located at:
        http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007T36

Installing required dependencies:
    These Python packages are required by the conversion process:
        PyYAML (tested on 3.10)
        PLY (tested on 3.4)
        cmd2 (tested on 0.6.2)

    The easiest way to install these dependencies is through 'easy_install' (Python setuptools).
    Instructions for installing setuptools can be found here:
        http://pypi.python.org/pypi/setuptools

    Once setuptools is installed, execute:
        easy_install PyYAML
        easy_install PLY
        easy_install cmd2
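
    After installation, a quick import check can confirm that the three packages are visible to
    the interpreter. This is a minimal sketch under stated assumptions: the module names 'yaml',
    'ply' and 'cmd2' are the names these packages usually install under and are not taken from
    the converter's source, and 'missing_packages' is a hypothetical helper.

```python
# Each pair is (package name used with easy_install, module name it provides).
# The module names are an assumption based on how these packages are usually
# imported, not something stated in this README.
REQUIRED = (("PyYAML", "yaml"), ("PLY", "ply"), ("cmd2", "cmd2"))


def missing_packages(required=REQUIRED):
    """Return the package names whose modules cannot be imported."""
    missing = []
    for package, module in required:
        try:
            __import__(module)
        except ImportError:
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: %s" % ", ".join(missing))
    else:
        print("All required packages are importable.")
```

    The check does not compare installed versions against the tested versions listed above.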

Installing Python C extensions:
    The tokeniser is written in C for speed. To build it, change to the directory of the
    Chinese CCGbank converter distribution, then execute:
        cd lib && python setup.py install
        (sudo may be required for setup.py to install the library in a location accessible
         to all users)

Generating the corpus:
    While in the directory of the Chinese CCGbank converter distribution, execute:
        ./make.sh -o <CHINESE CCGBANK DESTINATION DIRECTORY> -c <CHINESE PENN TREEBANK DIRECTORY>

        where the argument of '-o' is the desired output directory, and
              the argument of '-c' is the Penn Chinese Treebank directory containing '.fid' files.

    The conversion also writes debugging output to the screen. When the process is complete,
    the Chinese CCGbank corpus will be in the directory specified by the argument of '-o'.
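
    Because the '-c' argument must name a directory containing '.fid' files, it may be worth
    checking that directory before starting a conversion. The sketch below is illustrative
    only and is not part of the distribution: 'find_fid_files' and 'make_command' are
    hypothetical helpers, and the check looks only at the top level of the given directory.

```python
import os
import sys


def find_fid_files(treebank_dir):
    """Return sorted paths of '.fid' files directly inside treebank_dir."""
    if not os.path.isdir(treebank_dir):
        return []
    return sorted(
        os.path.join(treebank_dir, name)
        for name in os.listdir(treebank_dir)
        if name.endswith(".fid")
    )


def make_command(output_dir, treebank_dir):
    """Build the make.sh argument list in the form shown above."""
    return ["./make.sh", "-o", output_dir, "-c", treebank_dir]


if __name__ == "__main__":
    if len(sys.argv) == 3:
        out_dir, tb_dir = sys.argv[1], sys.argv[2]
        print("%d .fid files found in %s" % (len(find_fid_files(tb_dir)), tb_dir))
        # Joined with spaces for display only; paths are not shell-quoted.
        print(" ".join(make_command(out_dir, tb_dir)))
    else:
        print("usage: %s OUTPUT_DIR TREEBANK_DIR" % sys.argv[0])
```

    If the count is zero, the '-c' path probably points at the wrong level of the treebank
    distribution.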
