thucke's Issues

About res and Segmentation fault

Do I need to create the res folder myself?
I ran ./thucke -i ./test1.txt -n 5 -m ./res
and it printed:
invalid model_path: ./res
invalid model_path: ./res
Segmentation fault: 11

Still using the bundled text1 file, Segmentation fault?

Thank you for open-sourcing this.

Hello, after downloading your code and model, I found two fairly obvious problems in the code. Details below:

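1. When computing idf (= log(total number of documents / number of documents containing the word)), the total document count (lexiconNumDocs) is 0. The assignment is at line 109 of keywordLoad.cpp, in the step model>>lexiconNumDocs, and the value obtained there is 0. As a result, the idf of every word is inf (infinity).
2. When reading the file res/thucke/pro_forward, you intended to read it in book/key order, but the code actually reads it in key/book order, which is also a bug. You can check this by printing the size after loading: of the 3 million entries in the file, only 2 million are kept after reading.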
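
The first point can be illustrated with a minimal C++ sketch. It is not the THUCKE code: `readNumDocs` and `safeIdf` are hypothetical helpers, and the `long` type for the document count is an assumption. It only shows how checking the stream extraction and the counts keeps a zero document count from turning into a non-finite idf.

```cpp
#include <cmath>
#include <istream>
#include <sstream>

// Hypothetical helper (not from THUCKE): read the total document count
// from a model stream and report failure instead of silently keeping 0.
bool readNumDocs(std::istream& model, long& numDocs) {
    numDocs = 0;
    if (!(model >> numDocs)) {
        return false;  // extraction failed (wrong format or end of stream)
    }
    return numDocs > 0;  // a count of 0 or less is unusable for idf
}

// idf = log(numDocs / docFreq), following the formula quoted above.
// Returns false when either count is non-positive, so the caller never
// receives an infinite or NaN score.
bool safeIdf(long numDocs, long docFreq, double& idf) {
    if (numDocs <= 0 || docFreq <= 0) {
        return false;
    }
    idf = std::log(static_cast<double>(numDocs) /
                   static_cast<double>(docFreq));
    return true;
}
```

A caller would pass the opened model stream to `readNumDocs` and stop with an error message when it returns false, rather than continuing with a count of 0.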

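About training a new model

Is the model training code open source? The official model was trained on news data crawled from NetEase News (about 40,000 articles), and I would like to train a new model.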

dataset

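Hello, could you provide the NetEase News dataset used for training?

Model files

Hello, how were the forward-probability and backward-probability files in the model folder generated? Any guidance would be appreciated.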
