This project is forked from taku910/mecab.

mecab

Yet another Japanese morphological analyzer

Installation

First, compile MeCab.

git clone https://github.com/minervatech/mecab.git
cd mecab/mecab
./configure  --enable-utf8-only
make
make check
sudo make install

MeCab will be installed to:

/usr/local/etc/mecabrc
/usr/local/bin/mecab
/usr/local/bin/mecab-config
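
To confirm that the install step placed the files listed above, each path can be checked directly. The following is a minimal sketch, assuming the default /usr/local prefix; the helper name `missing_paths` is hypothetical and not part of MeCab.

```python
import os

# Paths listed above, assuming the default /usr/local install prefix.
EXPECTED_PATHS = (
    "/usr/local/etc/mecabrc",
    "/usr/local/bin/mecab",
    "/usr/local/bin/mecab-config",
)


def missing_paths(paths=EXPECTED_PATHS):
    """Return the paths from `paths` that do not exist on this machine."""
    return [p for p in paths if not os.path.exists(p)]
```

Calling `missing_paths()` after `sudo make install` should return an empty list if the default prefix was used; a different `--prefix` would need a different path list.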

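Then compile the dictionary. Use the sources in this fork (minervatech/mecab), because the original sources have an encoding problem. The nkf commands below convert the dictionary files to UTF-8 before the build.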

cd mecab/mecab-ipadic
nkf -w --overwrite *.csv
nkf -w --overwrite *.def
./configure --with-charset=utf8
make
sudo make install
sudo ldconfig

Custom dictionary

Add any CSV file under mecab/mecab-ipadic and rebuild the dictionary:

cd mecab/mecab-ipadic
vim Noun.new.csv
make clean
make
sudo make install
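
An entry in such a CSV file follows the 13-column IPADIC layout: surface form, left context ID, right context ID, cost, six part-of-speech and conjugation fields, base form, reading, and pronunciation. The sketch below builds one proper-noun row. The helper names are hypothetical, and the default context IDs (1288) and cost (5000) are assumptions for illustration; check them against left-id.def and right-id.def in your copy of mecab-ipadic before use.

```python
import csv
import io


def ipadic_noun_row(surface, reading, left_id=1288, right_id=1288, cost=5000):
    """Build a 13-column IPADIC row for a general proper noun.

    left_id, right_id, and cost are illustrative assumptions; verify the
    context IDs against left-id.def / right-id.def before rebuilding.
    """
    return [
        surface, str(left_id), str(right_id), str(cost),
        "名詞", "固有名詞", "一般", "*",   # part of speech and subcategories
        "*", "*",                          # conjugation type and form (none)
        surface, reading, reading,         # base form, reading, pronunciation
    ]


def to_csv_line(row):
    """Serialize one row as a single CSV line terminated by a newline."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue()
```

The resulting line can be appended to Noun.new.csv. Save the file as UTF-8 so that it matches the `--with-charset=utf8` build.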

Use mecab-ipadic-neologd

cd mecab
git clone https://github.com/neologd/mecab-ipadic-neologd
echo `mecab-config --dicdir`"/mecab-ipadic-neologd" # Check target dir
./mecab-ipadic-neologd/bin/install-mecab-ipadic-neologd -n -a

Comparison

  • ipadic

echo "10日放送の「中居正広のミになる図書館」(テレビ朝日系)で、SMAPの中居正広が、篠原信一の過去の勘違いを明かす一幕があった。" | mecab -d /usr/local/lib/mecab/dic/ipadic

  • mecab-ipadic-neologd

echo "10日放送の「中居正広のミになる図書館」(テレビ朝日系)で、SMAPの中居正広が、篠原信一の過去の勘違いを明かす一幕があった。" | mecab -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd
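
To compare the two dictionaries programmatically rather than by eye, the default `mecab` output (one `surface<TAB>features` line per token, ending with an `EOS` line) can be parsed into tokens. This is a minimal sketch, assuming the default output format and a `mecab` binary on the PATH; the function names are hypothetical.

```python
import subprocess


def parse_mecab_output(output):
    """Parse default-format mecab output into (surface, features) tuples."""
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, feature = line.partition("\t")
        tokens.append((surface, feature.split(",")))
    return tokens


def surfaces(output):
    """Return only the surface forms, in order."""
    return [surface for surface, _ in parse_mecab_output(output)]


def analyze(text, dicdir):
    """Run the mecab command with the given dictionary directory.

    Requires mecab to be installed; dicdir is passed to the -d option.
    """
    result = subprocess.run(
        ["mecab", "-d", dicdir],
        input=text, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `surfaces(analyze(sentence, dicdir))` once with each of the two dictionary paths above gives two token lists that can be compared directly, for example to see which names each dictionary keeps as a single token.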

Python bindings

pip install mecab-python

ipadic

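# coding: utf-8
import MeCab

text = '10日放送の「中居正広のミになる図書館」(テレビ朝日系)で、SMAPの中居正広が、篠原信一の過去の勘違いを明かす一幕があった。'
# Create a tagger that uses the ipadic dictionary.
tagger = MeCab.Tagger("-Ochasen -d /usr/local/lib/mecab/dic/ipadic/")
# Workaround: parse an empty string first so that node.surface is
# populated correctly with some versions of the bindings.
tagger.parse('')
node = tagger.parseToNode(text)
# Walk the linked list of nodes and print each surface form and its features.
while node:
    print(node.surface, node.feature)
    node = node.next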

mecab-ipadic-neologd

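# coding: utf-8
import MeCab

text = '10日放送の「中居正広のミになる図書館」(テレビ朝日系)で、SMAPの中居正広が、篠原信一の過去の勘違いを明かす一幕があった。'
# Create a tagger that uses the mecab-ipadic-neologd dictionary.
tagger = MeCab.Tagger("-Ochasen -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd/")
# Workaround: parse an empty string first so that node.surface is
# populated correctly with some versions of the bindings.
tagger.parse('')
node = tagger.parseToNode(text)
# Walk the linked list of nodes and print each surface form and its features.
while node:
    print(node.surface, node.feature)
    node = node.next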
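
The node loop used in both examples can be wrapped in a small generator that also skips the BOS/EOS nodes, whose surface is empty. This is a sketch only: `iter_tokens` is a hypothetical helper, and it relies solely on the `surface`, `feature`, and `next` attributes used above.

```python
def iter_tokens(node):
    """Yield (surface, features) for each node in the chain.

    BOS/EOS nodes have an empty surface and are skipped. `features` is the
    comma-separated feature string split into a list.
    """
    while node:
        if node.surface:
            yield node.surface, node.feature.split(",")
        node = node.next
```

With either tagger above, `for surface, features in iter_tokens(tagger.parseToNode(text)):` then gives one token per iteration, with `features[0]` holding the top-level part of speech for ipadic-style dictionaries.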

Uninstall

cd /usr/local/lib
sudo rm libmecab.*
sudo rm -rf mecab/

cd /usr/local/bin/
sudo rm mecab
sudo rm mecab-config

cd /usr/local/etc
sudo rm mecabrc
