tianliangzhou / ffi-lac

A PHP library for intelligent Chinese word segmentation, based on Baidu's LAC project

Home Page: https://loocode.com/tool/chinese-word-segmentation

CMake 11.54% C++ 78.46% C 2.95% PHP 7.06%
php-ffi php-lac baidu-lac php-ffi-extension php8

ffi-lac

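ffi-lac is a high-performance PHP library for intelligent Chinese word segmentation. It is based on Baidu's open-source lac project: the C++ code exports C functions, which are built into a shared library that PHP calls. Most of the source code in this project can be found at https://github.com/baidu/lac.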

Environment

Requires PHP 7.4 or later with the FFI extension enabled.

You also need to set ffi.enable to On in php.ini.
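
The two requirements above can be checked from PHP before the library is loaded. The following is a minimal sketch, not part of ffi-lac: the function name `checkFfiEnvironment` is hypothetical, and it assumes that a `preload` value for `ffi.enable` is acceptable only when running under the CLI.

```php
<?php

/**
 * Returns a list of human-readable problems with the current PHP setup.
 * An empty list means the documented requirements appear to be met.
 * (Hypothetical helper, not part of ffi-lac.)
 */
function checkFfiEnvironment(): array
{
    $problems = [];

    // The README asks for PHP 7.4 or later.
    if (version_compare(PHP_VERSION, '7.4.0', '<')) {
        $problems[] = 'PHP 7.4 or later is required, found ' . PHP_VERSION;
    }

    if (!extension_loaded('ffi')) {
        $problems[] = 'The FFI extension is not loaded';
        return $problems;
    }

    // ini_get() may return "1", "On", "true", "preload", or an empty string.
    $value = strtolower((string) ini_get('ffi.enable'));
    $enabled = filter_var($value, FILTER_VALIDATE_BOOLEAN);

    // Assumption: "preload" still permits FFI calls when run from the CLI.
    $cliPreload = $value === 'preload' && PHP_SAPI === 'cli';

    if (!$enabled && !$cliPreload) {
        $problems[] = 'ffi.enable is "' . $value . '"; set it to On in php.ini';
    }

    return $problems;
}

foreach (checkFfiEnvironment() as $problem) {
    echo $problem, PHP_EOL;
}
```

If nothing is printed, the PHP version and FFI settings look usable; the shared libraries and model files still have to be in place.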

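Project dependencies: the Paddle inference library (download link) and the inference model library (download link).

Create symbolic links for the inference library.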

[meshell@/] ln -s paddle_inference/paddle/lib/libpaddle_inference.so libpaddle_inference.so
[meshell@/] ln -s paddle_inference/third_party/install/mklml/lib/libmklml_intel.so libmklml_intel.so
[meshell@/] ln -s paddle_inference/third_party/install/mklml/lib/libiomp5.so libiomp5.so
[meshell@/] ln -s paddle_inference/third_party/install/mkldnn/lib/libmkldnn.so.0 libdnnl.so.2

This directory does not support Windows. If you need Windows support, compile the library yourself.

Usage

<?php

include __DIR__ . '/../src/LAC.php';

$dictDir = ""; // defaults to model/lac_model under the library root directory

$lac = \FastFFI\LAC\LAC::new($dictDir);

var_dump(
    $lac->parse("LAC智能中文分词库")
);

Running the program above produces:

array(2) {
  ["words"]=>
  string(29) "LAC 智能 中文 分 词库 "
  ["tags"]=>
  string(11) "nz n n v n "
}

The result contains the words and their tags, each as a space-separated string.
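
Because both strings are space-separated and end with a trailing space, they can be split and zipped into word/tag pairs. This is a minimal sketch under the assumption that no word contains a space itself; `pairWordsWithTags` is a hypothetical helper, not part of ffi-lac, and the sample input is copied from the output above.

```php
<?php

/**
 * Pairs each word with its tag from the two space-separated strings.
 * (Hypothetical helper, not part of ffi-lac.)
 *
 * @param array{words: string, tags: string} $result
 * @return array<int, array{word: string, tag: string}>
 */
function pairWordsWithTags(array $result): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty piece left by the trailing space.
    $words = preg_split('/ +/', $result['words'], -1, PREG_SPLIT_NO_EMPTY);
    $tags  = preg_split('/ +/', $result['tags'], -1, PREG_SPLIT_NO_EMPTY);

    if (count($words) !== count($tags)) {
        throw new InvalidArgumentException('words and tags have different lengths');
    }

    $pairs = [];
    foreach ($words as $i => $word) {
        $pairs[] = ['word' => $word, 'tag' => $tags[$i]];
    }

    return $pairs;
}

// Sample result copied from the var_dump output above.
$result = [
    'words' => 'LAC 智能 中文 分 词库 ',
    'tags'  => 'nz n n v n ',
];

foreach (pairWordsWithTags($result) as $pair) {
    echo $pair['word'], "\t", $pair['tag'], PHP_EOL;
}
```

The length check guards against the two strings drifting out of step, which would otherwise pair words with the wrong tags silently.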

The tags have the following meanings:

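Tag   Meaning               Tag   Meaning               Tag   Meaning               Tag   Meaning
n     common noun           f     locative noun         s     place noun            nw    work title
nz    other proper noun     v     common verb           vd    adverbial verb        vn    nominal verb
a     adjective             ad    adverbial adjective   an    nominal adjective     d     adverb
m     numeral               q     measure word          r     pronoun               p     preposition
c     conjunction           u     particle              xc    other function word   w     punctuation
PER   person name           LOC   place name            ORG   organization name     TIME  time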
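
The tag table can be kept as a lookup map so that a tag returned by the library is shown with a readable description. This is a sketch only: the constant `LAC_TAG_MEANINGS` and the function `describeTag` are hypothetical names, and the English descriptions are translations of the table above rather than official names.

```php
<?php

// English descriptions are translations of the tag table, not official names.
const LAC_TAG_MEANINGS = [
    'n'   => 'common noun',
    'f'   => 'locative noun',
    's'   => 'place noun',
    'nw'  => 'work title',
    'nz'  => 'other proper noun',
    'v'   => 'common verb',
    'vd'  => 'adverbial verb',
    'vn'  => 'nominal verb',
    'a'   => 'adjective',
    'ad'  => 'adverbial adjective',
    'an'  => 'nominal adjective',
    'd'   => 'adverb',
    'm'   => 'numeral',
    'q'   => 'measure word',
    'r'   => 'pronoun',
    'p'   => 'preposition',
    'c'   => 'conjunction',
    'u'   => 'particle',
    'xc'  => 'other function word',
    'w'   => 'punctuation',
    'PER' => 'person name',
    'LOC' => 'place name',
    'ORG' => 'organization name',
    'TIME' => 'time',
];

/**
 * Returns the description for a tag, or a fallback for tags not in the table.
 * Tags are case-sensitive: "PER" is a person name, "per" is unknown.
 */
function describeTag(string $tag): string
{
    return LAC_TAG_MEANINGS[$tag] ?? 'unknown tag';
}

echo describeTag('nz'), PHP_EOL;
```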

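Weight definitions:

Label  Meaning                                   Common parts of speech
0      redundant words in the query              p, w, xc ...
1      weakly restrictive words in the query     r, c, u ...
2      strongly restrictive words in the query   n, s, v ...
3      core words in the query                   nz, nw, LOC ...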

Online conversion
