zhangguoxiao / cnccgbank
This project forked from jogloran/cnccgbank
Chinese CCGbank conversion algorithm suite
License: Other
Chinese CCGbank conversion
==========================
(c) 2008-2012 Daniel Tse <[email protected]>
University of Sydney

Use of this software is governed by the attached "Chinese CCGbank converter
Licence Agreement" supplied in the Chinese CCGbank conversion distribution.
If the LICENCE file is missing, please notify the maintainer
Daniel Tse <[email protected]>.

Licensees shall acknowledge use of the Licensed Software and Derivative Works
in all publications of research based in whole or in part on their use
through citation of the following publication:

    Chinese CCGbank: extracting CCG derivations from the Penn Chinese Treebank
    Daniel Tse and James R. Curran
    Proceedings of the 23rd International Conference on Computational
    Linguistics (COLING), pp. 1083-1091, Beijing, China, 2010.

How to obtain a copy of Chinese CCGbank:
----------------------------------------

Python version:

The Chinese CCGbank conversion process has been tested on Python 2.7.1 and
GNU bash 4.1.5. The scripts will require a Unix environment with bash
available.

Obtaining a copy of Penn Chinese Treebank:

The Chinese CCGbank conversion process requires a copy of Penn Chinese
Treebank (tested on PCTB 6.0, may work on other versions; LDC catalog no.
LDC2007T36), which can be obtained through the Linguistic Data Consortium
(LDC). The LDC catalogue page for this corpus is located at:

    http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007T36

Installing required dependencies:

These Python packages are required by the conversion process:

    PyYAML (tested on 3.10)
    PLY (tested on 3.4)
    cmd2 (tested on 0.6.2)

The easiest way to install these dependencies is through 'easy_install'
(Python setuptools). Instructions for installing setuptools can be found
here:

    http://pypi.python.org/pypi/setuptools

Once setuptools is installed, execute:

    easy_install PyYAML
    easy_install PLY
    easy_install cmd2

Installing Python C extensions:

The tokeniser is written in C for speed. To build it, change to the
directory of the Chinese CCGbank converter distribution, then execute:

    cd lib && python setup.py install

(sudo may be required for setup.py to install the library in a location
accessible to all users)

Generating the corpus:

While in the directory of the Chinese CCGbank converter distribution,
execute:

    ./make.sh -o <CHINESE CCGBANK DESTINATION DIRECTORY> -c <CHINESE PENN TREEBANK DIRECTORY>

where the argument of '-o' is the desired output directory, and the argument
of '-c' is the Penn Chinese Treebank directory containing '.fid' files.

Generating the corpus will also write debugging output to the screen. When
the process is complete, the CCGbank corpus will be output to the directory
specified by the argument of '-o'.
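Checking the setup before a run:

The following is a minimal pre-flight sketch, not part of the Chinese CCGbank distribution. It checks the two prerequisites described above before make.sh is run: that the required Python packages can be imported, and that the directory intended for '-c' contains '.fid' files. The function names missing_modules and fid_files are hypothetical, and the check assumes the '.fid' files sit directly in the given directory rather than in subdirectories.

```python
import importlib
import os

# Import names of the required packages: PyYAML imports as "yaml",
# PLY as "ply", and cmd2 as "cmd2".
REQUIRED_MODULES = ["yaml", "ply", "cmd2"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names in `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def fid_files(treebank_dir):
    """Return sorted paths of '.fid' files directly inside treebank_dir.

    Assumption: only the top level of the directory is inspected.
    Returns an empty list if the directory does not exist.
    """
    if not os.path.isdir(treebank_dir):
        return []
    return sorted(
        os.path.join(treebank_dir, name)
        for name in os.listdir(treebank_dir)
        if name.endswith(".fid")
    )
```

Calling missing_modules() with no arguments reports which of the three packages still need installing, and an empty result from fid_files() on the treebank path suggests that the '-c' argument points at the wrong directory.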