
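Dropdown suggestions (recommendations) for a search input box, with suggested terms sorted by weight. Built on darts, an implementation of the double-array trie, and written in Go.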

Home Page: https://github.com/wklken/suggestion

Python 6.66% Go 15.47% CSS 2.06% JavaScript 75.81%

suggestion's Introduction

suggestion

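A simple dropdown-suggestion service for input boxes

Introduction

When a user types a keyword into a search box or a similar input, the system suggests keywords they can use, which improves the user experience.

Screenshot

[image: screenshot]

Dependencies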

  1. jquery-2.1.1.min.js

  2. twitter typeahead 0.10.5 github | examples

Usage

  1. Clone the repository

  2. Run go run test_web.go

  3. Open http://localhost:9090

  4. Type into the input box


Data file format

Default file format:

format:    word\tweight
encoding:  utf-8 [required]
require:   weight must be an int

e.g.:  植物大战僵尸\t1000
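As an illustration of the format above, here is a minimal Go sketch that parses `word\tweight` lines. The names `Entry`, `ParseLine`, and `ParseDict`, and the choice to reject malformed lines, are assumptions made for this example and are not taken from the repository.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Entry is one dictionary record: a suggestion word and its integer weight.
// (Hypothetical type, used only in this sketch.)
type Entry struct {
	Word   string
	Weight int
}

// ParseLine splits a single "word<TAB>weight" line into an Entry.
func ParseLine(line string) (Entry, error) {
	parts := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
	if len(parts) != 2 {
		return Entry{}, fmt.Errorf("expected word<TAB>weight, got %q", line)
	}
	w, err := strconv.Atoi(strings.TrimSpace(parts[1]))
	if err != nil {
		return Entry{}, fmt.Errorf("weight must be an int: %w", err)
	}
	return Entry{Word: parts[0], Weight: w}, nil
}

// ParseDict parses every non-empty line of a UTF-8 text.
func ParseDict(text string) ([]Entry, error) {
	var entries []Entry
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		if strings.TrimSpace(sc.Text()) == "" {
			continue // skip blank lines
		}
		e, err := ParseLine(sc.Text())
		if err != nil {
			return nil, err
		}
		entries = append(entries, e)
	}
	return entries, sc.Err()
}

func main() {
	entries, err := ParseDict("植物大战僵尸\t1000\n")
	fmt.Println(entries, err)
}
```

Go strings are byte sequences and the file is required to be UTF-8, so the word can be stored as read without any transcoding.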

Implementation 1: easymap

Implements the tree structure with maps. There are Python and Go versions (see the easymap subdirectory).
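The map-based tree can be sketched roughly as follows. This is a minimal illustration, not the code in the easymap subdirectory: the names `node`, `Trie`, `Insert`, and `Suggest`, the per-rune children map, and the tie-breaking rule are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// node is one trie node. Children are keyed by rune, so each Chinese
// character is a single step down the tree.
type node struct {
	children map[rune]*node
	weight   int
	isWord   bool
}

// Suggestion is one result: a complete word and its weight.
type Suggestion struct {
	Word   string
	Weight int
}

// Trie is a prefix tree built from nested maps.
type Trie struct{ root *node }

func NewTrie() *Trie { return &Trie{root: &node{children: map[rune]*node{}}} }

// Insert adds a word with its weight, creating nodes as needed.
func (t *Trie) Insert(word string, weight int) {
	n := t.root
	for _, r := range word {
		next, ok := n.children[r]
		if !ok {
			next = &node{children: map[rune]*node{}}
			n.children[r] = next
		}
		n = next
	}
	n.isWord, n.weight = true, weight
}

// Suggest returns up to limit words that start with prefix, heaviest first.
// Ties are broken by word order so the result is deterministic.
func (t *Trie) Suggest(prefix string, limit int) []Suggestion {
	n := t.root
	for _, r := range prefix {
		next, ok := n.children[r]
		if !ok {
			return nil // no word has this prefix
		}
		n = next
	}
	var out []Suggestion
	var walk func(n *node, path []rune)
	walk = func(n *node, path []rune) {
		if n.isWord {
			out = append(out, Suggestion{Word: string(path), Weight: n.weight})
		}
		for r, c := range n.children {
			walk(c, append(path, r))
		}
	}
	walk(n, []rune(prefix))
	sort.Slice(out, func(i, j int) bool {
		if out[i].Weight != out[j].Weight {
			return out[i].Weight > out[j].Weight
		}
		return out[i].Word < out[j].Word
	})
	if len(out) > limit {
		out = out[:limit]
	}
	return out
}

func main() {
	t := NewTrie()
	t.Insert("植物大战僵尸", 154717704)
	t.Insert("植物大战僵尸2", 630955)
	t.Insert("植物大战外星人", 530403)
	for _, s := range t.Suggest("植物大战", 10) {
		fmt.Println(s.Word, s.Weight)
	}
}
```

Every node carries its own map, which keeps the code short and easy to extend but is also why memory grows quickly with the number of keywords.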

quick run:

git clone https://github.com/wklken/suggestion.git
cd suggestion/easymap
python suggest.py
go run suggest.go
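Suited to: small systems with around 100,000 keywords (about 300,000 entries once Chinese, pinyin, and pinyin initials are all included)
Pros: simple logic and clear structure, very little code, easy to extend (e.g. you can change the storage structure yourself and add extra information such as images to the response)
Cons: memory usage (300,000 keywords with an average length of 3 take about 800 MB); it also costs some CPU
Handling and practice:
      Python version: put a redis/memcached layer in front. On a single 16-core machine running 8 processes it used about 1 GB of memory, served 3 to 5 million requests per day in total, and peaked at around 300 QPS without strain [never load-tested....]
      Go version: never tried in production, but it should handle this load easily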

Implementation 2: double-array-trie

Implemented with darts, an implementation of the double-array trie; Go code

The darts implementation is based on: awsong/go-darts

Articles on the double-array trie: What is Trie | An Implementation of Double-Array Trie

quick run

go run test_web.go
Open http://localhost:9090

or 

 go run test_run.go

input dict length: 29
build out length 65708
3.55us
<nil>
搜索: 植物大战
Result Len: 10
植物大战僵尸 154717704
植物大战僵尸年度中文版 44592048
植物大战僵尸OL 43566752
植物大战僵尸2 630955
植物大战外星人 530403
植物大战怪兽 29727
植物大战异形变态版 14773
植物大战臭虫 5999
植物大战异形 4456
植物大战昆虫2无敌版 3419
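Suited to: systems with more than 100,000 keywords
Pros: low memory usage, dependable performance
Cons: it relies on a double-array trie underneath, so the logic is somewhat convoluted and customization is not very convenient
Handling and practice: put a redis/memcached layer in front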
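For readers new to the structure, here is a minimal sketch of the double-array idea: all transitions live in two integer arrays, and moving from state `s` on input code `c` goes to `t = base[s] + c`, which is valid only if `check[t] == s`. This is not the darts code used by the project. `Build`, `Lookup`, the byte-level keys, the naive first-fit search for `base` values, and the separate `value` array (darts-style implementations usually encode values differently) are all assumptions made to keep the example short.

```go
package main

import (
	"fmt"
	"sort"
)

// tnode is a plain map trie used only while building.
type tnode struct {
	kids  map[byte]*tnode
	value int // -1 unless a key ends at this node
}

// DAT is a toy double-array trie over UTF-8 bytes.
// value[s] holds the weight of the key ending at state s, or -1,
// so weights must be non-negative in this sketch.
type DAT struct {
	base, check, value []int
}

// grow extends the arrays so that index n is valid; free slots have check == -1.
func (d *DAT) grow(n int) {
	for len(d.check) <= n {
		d.base = append(d.base, 0)
		d.check = append(d.check, -1)
		d.value = append(d.value, -1)
	}
}

// fits reports whether every slot b+code is still free.
func (d *DAT) fits(b int, codes []int) bool {
	d.grow(b + codes[len(codes)-1])
	for _, code := range codes {
		if d.check[b+code] != -1 {
			return false
		}
	}
	return true
}

// Build converts a key-to-weight map into a double array.
func Build(keys map[string]int) *DAT {
	root := &tnode{kids: map[byte]*tnode{}, value: -1}
	for k, v := range keys {
		n := root
		for i := 0; i < len(k); i++ {
			if n.kids[k[i]] == nil {
				n.kids[k[i]] = &tnode{kids: map[byte]*tnode{}, value: -1}
			}
			n = n.kids[k[i]]
		}
		n.value = v
	}
	d := &DAT{}
	d.grow(0)
	d.check[0] = 0 // the root occupies slot 0
	type item struct {
		n *tnode
		s int
	}
	queue := []item{{root, 0}}
	for len(queue) > 0 {
		it := queue[0]
		queue = queue[1:]
		d.value[it.s] = it.n.value
		if len(it.n.kids) == 0 {
			continue
		}
		codes := make([]int, 0, len(it.n.kids))
		for c := range it.n.kids {
			codes = append(codes, int(c)+1) // codes start at 1
		}
		sort.Ints(codes)
		b := 1
		for !d.fits(b, codes) { // first-fit search for a base value
			b++
		}
		d.base[it.s] = b
		for _, code := range codes {
			d.check[b+code] = it.s
			queue = append(queue, item{it.n.kids[byte(code-1)], b + code})
		}
	}
	return d
}

// Lookup follows t = base[s] + code and accepts a step only if check[t] == s.
func (d *DAT) Lookup(key string) (int, bool) {
	s := 0
	for i := 0; i < len(key); i++ {
		t := d.base[s] + int(key[i]) + 1
		if t >= len(d.check) || d.check[t] != s {
			return 0, false
		}
		s = t
	}
	if d.value[s] < 0 {
		return 0, false
	}
	return d.value[s], true
}

func main() {
	d := Build(map[string]int{"植物大战僵尸": 154717704, "植物大战僵尸2": 630955})
	fmt.Println(d.Lookup("植物大战僵尸2"))
}
```

Because child states of different parents interleave inside the same two arrays, the structure is compact, which matches the low memory usage noted above; the same interleaving is what makes it harder to customize than a map per node.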

TODO

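Focus on the darts version (split easymap out)
1. Performance testing
2. Customizable data structure
3. Error tolerance
4. Handling of letter case, pinyin, pinyin initials, etc.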

Change Log

2013-10-13 created, Python version
2013-12-14 added the Go version
2014-05-11 added the Go version implemented with double-array-trie
2014-11-04 fixed a bug in the Go version, added the front-end display

Donation

If you find my project helpful, you can buy me a coffee :)

[image: donation]


wklken(Pythonista/Vimer)

Email: [email protected]

Blog: http://www.wklken.me

Github: https://github.com/wklken

2013-10-13, Shenzhen


suggestion's Issues

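Pinyin support

Is pinyin still not supported in production?

I saw this in a comment:

Adding pinyin would multiply memory usage; need to think about how to optimize the nodes and share memory

Running suggest.py raises an error

The error message is below. I'm not very familiar with Python, so any guidance would be appreciated.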

============ test2 ===============
Traceback (most recent call last):
File "suggest.py", line 290, in
print u'search 植物'
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 7-8: ordinal not in range(256)

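Storage efficiency of a Trie with Chinese characters?

I read the Python implementation of suggestion and learned a lot from it, but I have a few doubts:

  1. There are roughly 80,000 Chinese characters, about 3,500 of them in common use. Suppose nodes only use the common characters and a query is at most 10 characters long. A given node could then have up to 3,500 children, so in the worst case the trie has (3500)^10 nodes in total. Isn't that far too much memory?
  2. I thought of using pinyin instead; does your project support it? Pinyin letters plus tone marks add up to fewer than 40 symbols, but pinyin has a many-to-one mapping problem. For example, "稀饭" and "西范" map to the same pinyin with the same tones. Performance is another concern, since the lookup path is longer than with Chinese characters.

I'd like to hear your thoughts.
PS: Does darts refer to the double-array trie? How is it used to implement suggestion?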

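Open issue: code optimization

  1. Change the data structure
  2. Use a faster sorting method
  3. Change the type of score
  4. Add comments
  5. Performance testing
  6. Provide a web page for display and a test entry point
  7. Update the readme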

Method 2 fails to run

Problem

  • Command: go run test_run.go
  • Error message: build command-line-arguments: cannot find module for path _/Users/wqw/Desktop/code/nlp_base/sug/suggestion/darts
  • Environment:
    • go version go1.17.6 darwin/amd64
    • GOPATH is set to the current directory
export GOROOT=/usr/local/go
export PATH=$PATH:$GOROOT/bin
export GOPATH=$HOME/Desktop/code/nlp_base/sug/suggestion

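Notes:

  • Method 1 works fine; this looks like a local package import failure

Request for references

Hello, are there any references or materials behind the hash trie code you wrote? Could you share them?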
