
Doubts about cohesion (凝固度) in pyunit [CLOSED]

jtyoui commented on August 21, 2024
Doubts about cohesion (凝固度)

Comments (3)

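jtyoui commented on August 21, 2024

By “巧客” I assume you mean “巧克”. This algorithm gathers its statistics bag-of-words style: to decide whether the three characters “巧克力” (chocolate) form a word, it first checks whether “巧克” (two characters) forms a word. When “巧克力” does not occur very often and the count of “巧克” is roughly equal to the count of “巧克力”, the statistics for “巧克” and “巧克力” come out nearly the same, so truncated fragments such as “巧克” naturally show up. The root cause is insufficient data. You can avoid these fragments by tuning the parameters and increasing the amount of data. Some fragments are actually meaningful, though: 中华人民共和国, 中华, 中华人民 and 共和国 are all meaningful words. If you only want the coarsest-grained words, just filter the rest out; for a filtering algorithm, see the remove_subset function at line 110 of https://github.com/jtyoui/Jtyoui/blob/master/jtyoui/data/methods.py.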

import jtyoui

print(jtyoui.remove_subset(['aa', 'a', 'ab']))  
# ['aa', 'ab']
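
To make the point above concrete, here is a rough sketch, not pyunit's or jtyoui's actual code. The helper names count_ngrams and drop_substrings and the tiny corpus are made up for illustration. It shows that in a small corpus “巧克” and “巧克力” get identical counts, and that a simple substring filter keeps only the longest candidates.

```python
from collections import Counter


def count_ngrams(text, max_len=3):
    """Count every substring of length 2..max_len in text (hypothetical helper)."""
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def drop_substrings(words):
    """Keep only words that are not a proper substring of another word.

    An illustrative stand-in for the idea behind remove_subset,
    not the library's implementation.
    """
    unique = list(dict.fromkeys(words))  # de-duplicate while keeping order
    return [w for w in unique
            if not any(w != other and w in other for other in unique)]


# Made-up corpus: "巧克" never occurs outside "巧克力",
# so both substrings are counted the same number of times.
corpus = "我爱巧克力,巧克力很甜"
counts = count_ngrams(corpus)
print(counts["巧克"], counts["巧克力"])

candidates = ["巧克", "巧克力", "中华", "中华人民共和国"]
print(drop_substrings(candidates))
```

With equal counts, frequency alone cannot separate the fragment from the full word, which is why either more data or a post-filter like this is needed. Note that the filter also drops meaningful shorter words such as 中华, so it only fits when the coarsest granularity is what you want.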


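jtyoui commented on August 21, 2024

Hello, I would appreciate an explanation: in “巧克力” (chocolate), the cohesion between “巧客” and “力” is very high, so “巧克力” should tend to be treated as a single word. Why, then, does the program, going by cohesion, pick out a half-word fragment like “巧客” (that is how the blog post puts it)? Thanks.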

https://github.com/jtyoui/Jtyoui/issues/14#issue-520022018


WangQi1024 commented on August 21, 2024
