smallseg's Issues

Encoding error when saving segmentation results to a text file

The code is as follows:
#def cuttest(s):
    #wlist = seg.cut(s)
    #wlist.reverse()
    #tmp = "/".join(wlist)
    #print tmp      
    #print "================================================================="        
if __name__=="__main__":
    s1 = file("text1.txt").read()
    wlist = seg.cut(s1)
    wlist.reverse()
    res1 = "/".join(wlist)
    print res1   

    fl=open("result.txt","w")
    fl.write(res1)
    fl.close()   

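With the cuttest function commented out and its body used directly below, reading the contents of text1.txt and segmenting them both work.
However, the last three lines, which save the segmentation result to result.txt, fail with an encoding error:
   UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)
I don't know how to solve this. Could someone more experienced take a look and tell me what to change?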



Original issue reported on code.google.com by [email protected] on 1 Jun 2013 at 5:14
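
One possible workaround, sketched under assumptions rather than taken from smallseg itself: the error message suggests that seg.cut returns unicode tokens and that text1.txt starts with a UTF-8 byte order mark (u'\ufeff'), which Python 2 then tries to encode with the default ASCII codec when writing. The helper name write_words is hypothetical. io.open is part of the standard library and behaves the same on Python 2.6+ and Python 3.

```python
# -*- coding: utf-8 -*-
import io


def write_words(words, path, sep=u"/"):
    """Join unicode tokens and write them to `path` encoded as UTF-8.

    Any byte order mark (U+FEFF) carried over from the input file is
    dropped so it does not end up inside the output.
    """
    text = sep.join(words).replace(u"\ufeff", u"")
    # An explicit encoding avoids the implicit ASCII encode that fails
    # on non-ASCII characters under Python 2.
    with io.open(path, "w", encoding="utf-8") as out:
        out.write(text)
```

With this helper, the last three lines of the script above could become write_words(wlist, "result.txt").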

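Why does it fail when deployed in a web application?

Running test.java directly in the IDE segments text correctly, but running it under a Tomcat server fails with:
javax.servlet.ServletException: java.lang.NoClassDefFoundError: Could not initialize class fx.sunjoy.SmallSeg
Could you explain in detail whether an additional configuration file is needed? Thanks.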

Original issue reported on code.google.com by [email protected] on 17 Aug 2010 at 11:59

Can't it handle Unicode Chinese strings?

As the title says: can this function not handle Unicode Chinese strings?
For example, cuttest(u"我喜欢python和c++。")
raises:
Traceback (most recent call last):
  File "D:\bluecat2\Desktop\smallseg_0.5.1\test_fenci.py", line 41, in <module>
    cuttest(u"我喜欢python和c++。")
  File "D:\bluecat2\Desktop\smallseg_0.5.1\test_fenci.py", line 18, in cuttest
    wlist = seg.cut(text)
  File "D:\bluecat2\Desktop\smallseg_0.5.1\smallseg.py", line 56, in cut
    text = text.decode('utf-8','ignore')
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
Windows, Python 2.7


Original issue reported on code.google.com by [email protected] on 22 Feb 2012 at 12:50
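
The traceback shows cut calling text.decode('utf-8','ignore') on its argument. On Python 2, calling .decode on a unicode object first encodes it implicitly with the ASCII codec, which would explain the UnicodeEncodeError. A minimal sketch of a guard, assuming the input is either UTF-8 bytes or already unicode; the name to_text is hypothetical and this is not the author's fix:

```python
# -*- coding: utf-8 -*-


def to_text(value, encoding="utf-8"):
    """Return `value` as unicode text.

    Byte strings are decoded (undecodable bytes are ignored, as in the
    original call); values that are already unicode pass through
    untouched, so no implicit ASCII encode can happen.
    """
    if isinstance(value, bytes):  # `bytes` is `str` on Python 2
        return value.decode(encoding, "ignore")
    return value
```

Until the library handles this itself, a caller could also pass s.encode('utf-8') to seg.cut when s is a unicode string.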

biomedical field

What steps will reproduce the problem?
Input: “新生小鼠中肌红蛋白含量较成年鼠高吗?” (Is the myoglobin content higher in newborn mice than in adult mice?)


What is the expected output?
新生 小鼠 中 肌红蛋白 含量 较 成年 鼠 高吗

What do you see instead?
新生 小鼠 中肌 肌红 蛋白 含量 较 成年 鼠高 高吗

What version of the product are you using? On what operating system?
0.6

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 1 May 2011 at 10:15

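How can segmentation deployed on GAE be made more efficient?

I saw that your online demo is deployed on GAE, and it runs reasonably fast. But when I deploy it on GAE myself, the dictionary is loaded once on every request, and that step is very slow. How did you get it to run quickly?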

Original issue reported on code.google.com by [email protected] on 13 May 2010 at 3:42
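
A common pattern for this kind of problem, sketched under assumptions: build the segmenter once per process and reuse it across requests, relying on module-level state surviving between requests on the same instance. The names get_segmenter and factory are hypothetical; factory stands for whatever code constructs the segmenter and loads its dictionary. How the demo site actually does it is not stated in this thread.

```python
import threading

_lock = threading.Lock()
_cache = {}


def get_segmenter(factory):
    """Return the shared segmenter, building it only on first use.

    `factory` is a zero-argument callable that does the expensive
    dictionary load. Later calls return the cached object.
    """
    with _lock:
        if "seg" not in _cache:
            _cache["seg"] = factory()
        return _cache["seg"]
```

A request handler would then call get_segmenter(load) on each request instead of constructing a new segmenter, where load is the application's own loading function.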

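The segmented result cannot be recombined into the original text. Changing the lines below fixed it; could the author check whether the change is correct, and explain why?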

"干脆就把那部蒙人的闲法给废了 拉倒!RT @laoship ukong : 
27日,全国人大常 委会第三次审议侵 
权责任法草案,删除了有关 医疗损害责任“举证 
倒置”的规定。在医患纠纷中本已处于弱势地位的消费者由此将陷入万劫不复的境地。"
The segmentation result is:
"干脆 就把 那部 蒙人 的闲 闲法 法给 废了 拉倒 RT @laoship 
ukong 27 日 全国人大 常  委会 第三 次 审议 侵 权 责任 
法草案  删除 了 有关 医疗 损害 责任 举证 倒置 的 规定 在 
医患 纠纷 中 本已 处于 弱势 地位 的 消费者 由此 将 陷入 
万劫不复 的 境地"

Note the overlap at 的闲 闲法: the character 闲 appears in both tokens.

I changed the following two lines:

http://code.google.com/p/smallseg/source/browse/trunk/smallseg.py#43

http://code.google.com/p/smallseg/source/browse/trunk/smallseg.py#44

to:
for i in xrange(ln,0,-1):
    tmp = s[i-1:i]
    ...

Original issue reported on code.google.com by [email protected] on 10 Aug 2012 at 10:01
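
To check a change like this, one option is a small heuristic that walks the original text and reports tokens that cannot be placed after the previous token without overlapping it. This is only a sketch: the name find_unplaced_tokens is hypothetical, it is not part of smallseg, and it can be fooled when the same token occurs again later in the text.

```python
# -*- coding: utf-8 -*-


def find_unplaced_tokens(text, tokens):
    """Return tokens that do not fit, in order, without overlap.

    Each token is searched for at or after the end of the previously
    placed token. Tokens that cannot be found there (for example
    because they reuse a character already consumed) are reported.
    """
    pos = 0
    unplaced = []
    for tok in tokens:
        idx = text.find(tok, pos)
        if idx == -1:
            unplaced.append(tok)
        else:
            pos = idx + len(tok)
    return unplaced
```

For the fragment 蒙人的闲法给废了 segmented as 蒙人 / 的闲 / 闲法 / 法给 / 废了, the function reports 闲法, because 闲 was already consumed by 的闲. An empty list means every token could be placed in order.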
