kekyo / mecab.dotnet

License: GNU General Public License v2.0

mecab.dotnet's Introduction

MeCab.DotNet

A Japanese language morphological analysis engine for .NET, .NET Core and .NET Framework.

A Japanese version of this document is also available.

What's this?

NOTE: MeCab.DotNet and NMeCab will be merged in a future release. See the related issue.

"MeCab" is a Japanese language morphological analysis engine.

"NMeCab" is a re-implementation of the MeCab engine as a managed library for .NET Framework 2.0. It went without updates for a long time and looked suspended, but it has since been revived on GitHub.

"MeCab.DotNet" (this project) is a port of NMeCab to .NET, .NET Core and .NET Framework, packaged for NuGet.

How to use

MeCab.DotNet targets the following platforms:

  • .NET 8 to 5
  • .NET Core 3.1 to 2.0
  • .NET Standard 2.1 to 1.3
  • .NET Framework 4.8.1 to 2.0 (3.5 and 4.0 target the Client Profile; the 2.0 build doesn't include the extension methods)

Changes from NMeCab:

  • Supports a wider range of .NET platforms; the PCL libraries are deprecated.
  • The namespace changed from NMeCab to MeCab.
  • Removed App.config-based configuration. (Additional configuration parameters can only be set through MeCabParam.)
  • Added more convenience methods.

Enabling steps:

  1. Install the NuGet package named "MeCab.DotNet".
  2. Normally you will use the default dictionary, IPADIC; the package adds a dic folder to your project automatically. To use your own dictionary instead, declare the MeCabUseDefaultDictionary property inside a PropertyGroup in your csproj and set it to False.
  3. Build and run!
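
For step 2, the opt-out might look like the following project-file fragment. This is a minimal sketch built only from the property name and value given above; where your own dictionary files go is not covered here.

```xml
<!-- Fragment of a .csproj file; place it inside the <Project> element. -->
<PropertyGroup>
  <!-- Opt out of the bundled IPADIC dictionary in order to supply your own. -->
  <MeCabUseDefaultDictionary>False</MeCabUseDefaultDictionary>
</PropertyGroup>
```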

First step sample code

C#

using System;
using MeCab;

namespace ConsoleApp
{
    public static class Program
    {
        public static void Main(string[] args)
        {
            var sentence = "行く川のながれは絶えずして、しかももとの水にあらず。";

            var parameter = new MeCabParam();
            var tagger = MeCabTagger.Create(parameter);

            foreach (var node in tagger.ParseToNodes(sentence))
            {
                if (node.CharType > 0)
                {
                    var features = node.Feature.Split(',');
                    var displayFeatures = string.Join(", ", features);

                    Console.WriteLine($"{node.Surface}\t{displayFeatures}");
                }
            }
        }
    }
}

F#

open MeCab

[<EntryPoint>]
let main argv =
    let sentence = "行く川のながれは絶えずして、しかももとの水にあらず。"

    let parameter = new MeCabParam()
    let tagger = MeCabTagger.Create parameter

    let isCharType (node:MeCabNode) = node.CharType > 0u
    sentence
        |> tagger.ParseToNodes
        |> Seq.filter isCharType
        |> Seq.iter (fun node ->
            let features = node.Feature.Split ','
            let displayFeatures = System.String.Join(", ", features)
            printfn "%s\t%s" node.Surface displayFeatures)
    0

Results

行く    動詞, 自立, *, *, 五段・カ行促音便, 基本形, 行く, イク, イク
川      名詞, 一般, *, *, *, *, 川, カワ, カワ
の      助詞, 連体化, *, *, *, *, の, ノ, ノ
ながれ  動詞, 自立, *, *, 一段, 連用形, ながれる, ナガレ, ナガレ
は      助詞, 係助詞, *, *, *, *, は, ハ, ワ
絶えず  副詞, 一般, *, *, *, *, 絶えず, タエズ, タエズ
し      動詞, 自立, *, *, サ変・スル, 連用形, する, シ, シ
て      助詞, 接続助詞, *, *, *, *, て, テ, テ
、      記号, 読点, *, *, *, *, 、, 、, 、
しかも  接続詞, *, *, *, *, *, しかも, シカモ, シカモ
もと    名詞, 一般, *, *, *, *, もと, モト, モト
の      助詞, 連体化, *, *, *, *, の, ノ, ノ
水      名詞, 一般, *, *, *, *, 水, ミズ, ミズ
に      助詞, 格助詞, 一般, *, *, *, に, ニ, ニ
あら    動詞, 自立, *, *, 五段・ラ行, 未然形, ある, アラ, アラ
ず      助動詞, *, *, *, 特殊・ヌ, 連用ニ接続, ぬ, ズ, ズ
。      記号, 句点, *, *, *, *, 。, 。, 。

License

Licensed under GPL2, LGPL2.1, as derived from the NMeCab project.

mecab.dotnet's People

Contributors

kekyo, luojunyuan, shimat

mecab.dotnet's Issues

What Mecab version is used in MeCab.DotNet?

Hi,
Thank you for sharing MeCab.DotNet, it is wonderful!
Which version of MeCab does MeCab.DotNet use?
As the README says, MeCab.DotNet is a port of NMeCab, so the MeCab version in MeCab.DotNet is 0.98, isn't it?

MeCab.MeCabParam constructor is not compatible with .NET 6 single-file applications.

Hello.

.NET 6 single-file applications are not compatible with Assembly.Location.

The following code throws ArgumentNullException in a .NET 6 single-file application, and the application has no normal way to work around it.

this.GetType().Assembly.Location);

This is the exception message from a sample program.

System.ArgumentNullException: Value cannot be null. (Parameter 'paths')
   at System.IO.Path.Combine(String[] paths)
   at MeCab.MeCabParam.CombinePath(String[] paths)
   at MeCab.MeCabParam..ctor()
   at Program.<Main>$(String[] args) in C:\Users\sudou-daisuke\source\repos\MeCabSampleSolution\MeCabSample\Program.cs:line 5

MeCabSample.zip
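
For context, Assembly.Location returns an empty string for assemblies bundled into a single-file application, and AppContext.BaseDirectory is the usual replacement. The sketch below shows one way a base directory for the dic folder could be resolved in both cases. It is an illustration only: BaseDirectorySketch and ResolveDicDir are hypothetical names, not part of MeCab.DotNet, and this is not the library's actual code.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical helper for illustration; not part of MeCab.DotNet.
public static class BaseDirectorySketch
{
    // Returns "<base directory>/dic", falling back to AppContext.BaseDirectory
    // when the assembly has no usable on-disk location (single-file apps).
    public static string ResolveDicDir(Assembly assembly)
    {
        var location = assembly.Location; // empty string in a single-file app

        var baseDir = string.IsNullOrEmpty(location)
            ? AppContext.BaseDirectory
            : Path.GetDirectoryName(location) ?? AppContext.BaseDirectory;

        return Path.Combine(baseDir, "dic");
    }
}
```

Calling BaseDirectorySketch.ResolveDicDir(typeof(BaseDirectorySketch).Assembly) yields a path ending in dic whether or not the application is published as a single file.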

dotnet CLI does not add dic directory automatically

Hi there. Thank you for the great library.
When I added this package to my project, I found that the dotnet CLI did not add the dic directory.
I guess I missed a step, but I could not work out which one.
Please tell me how I should handle this.

Steps

$ dotnet new console -o MeCabTest --language "F#"
$ cd MeCabTest
$ dotnet add package MeCab.DotNet
$ vi Program.fs
$ cat Program.fs
open System
open MeCab

[<EntryPoint>]
let main argv =
    let input = "行く川のながれは絶えずして、しかももとの水にあらず。"
    let tagger = MeCabTagger.Create (MeCabParam())
    input
        |> tagger.ParseToNodes
        |> Seq.filter (fun node -> node.CharType > 0u)
        |> Seq.map (fun node -> node.Feature.Split ',')
        |> Seq.iter (fun features -> printf "%s" features.[7])
    0
$ dotnet run
Unhandled exception. System.IO.DirectoryNotFoundException: Could not find a part of the path '/home/********/Documents/MeCabTest/dic/char.bin'.
   at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode)
   at System.IO.FileStream.OpenHandle(FileMode mode, FileShare share, FileOptions options)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access)
   at MeCab.Core.CharProperty.Open(String dicDir)
   at MeCab.Core.Tokenizer.Open(MeCabParam param)
   at MeCab.Core.Viterbi.Open(MeCabParam param)
   at MeCab.MeCabTagger.Open(MeCabParam param)
   at MeCab.MeCabTagger.Create(MeCabParam param)
   at Program.main(String[] argv) in /home/********/Documents/MeCabTest/Program.fs:line 7

Could not find dictionary with dotnet script

This issue is related to #3. When we use the #r "nuget: ..." syntax, a new feature in .NET 5, in an F# script,
the runtime cannot find the dictionary. It can be reproduced with the following steps:

$ cat test.fsx
#r "nuget: MeCab.DotNet"
open System
open MeCab

let input = "行く川のながれは絶えずして、しかももとの水にあらず。"
let tagger = MeCabTagger.Create (MeCabParam())
input
    |> tagger.ParseToNodes
    |> Seq.filter (fun node -> node.CharType > 0u)
    |> Seq.map (fun node -> node.Feature.Split ',')
    |> Seq.iter (fun features -> printf "%s" features.[7])
$ dotnet fsi test.fsx
System.IO.DirectoryNotFoundException: Could not find a part of the path '/Users/masaya/.nuget/packages/mecab.dotnet/0.0.30/lib/net5.0/dic/char.bin'.
   at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access)
   at MeCab.Core.CharProperty.Open(String dicDir)
   at MeCab.Core.Tokenizer.Open(MeCabParam param)
   at MeCab.Core.Viterbi.Open(MeCabParam param)
   at MeCab.MeCabTagger.Open(MeCabParam param)
   at MeCab.MeCabTagger.Create(MeCabParam param)
   at <StartupCode$FSI_0002>.$FSI_0002.main@()

I don't know whether C# scripts behave the same way, but I guess they might.

License

Hello. This isn't so much an issue. I just want to make sure I understand the license. If I use MeCab.DotNet in a commercial project, what do I have to do as a developer to respect the license? Do I have to provide the source code of my app?

a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.)

Could not find the path 'dic\char.bin'

I'm using .NET 6 and VS Code to write some C# code with MeCab.DotNet. According to the documentation, there should be a 'dic' directory in the debug folder. But my app cannot find this folder and throws an exception:
Could not find a part of the path 'E:\Code\****\bin\Debug\net6.0-windows\dic\char.bin'.
