mtianyan / mtianyansearch

Personalized search built with Word2vec + Scrapy 2.3.0 (data crawling) + ElasticSearch 7.9.1 (data storage and an external RESTful API) + Django 3.1.1 (search site)

Home Page: http://search.mtianyan.cn

License: MIT License

Python 92.95% Dockerfile 0.08% PLpgSQL 2.58% SCSS 4.39%

mtianyansearch's Introduction

Personalized search built with Word2vec + Scrapy 2.3.0 (data crawling) + ElasticSearch 7.9.1 (data storage and an external RESTful API) + Django 3.1.1 (search site)

This repository contains the code for the search website. For the crawler, see https://github.com/mtianyan/FunpySpiderSearchEngine

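Available features:

  1. A crawler for Zhihu questions and answers that stores its results in ElasticSearch
  2. Full-text search (to be used together with the website), with matched search terms highlighted in red
  3. A real-time display, backed by Redis, of the number of items crawled from the three sites, plus the Top-5 hot searches
  4. word2vec-adjusted ElasticSearch scoring (function_score, script_score). For example, if you have searched for Apple in the past, documents containing keywords that Word2vec derives from Apple, such as apple (苹果) and Steve Jobs (乔布斯), are scored and ranked higher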

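For the full word2vec model training process, see the README in the Word2VecModel directory of the FunpySpiderSearchEngine project. For how word2vec is used to influence ElasticSearch scoring, see the relevant code in mtianyanSearch.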

Core scoring code:

"source": "double final_score=_score;int count=0;int total = params.title_keyword.size();while(count < total) { String upper_score_title = params.title_keyword[count]; if(doc['title_keyword'].value.contains(upper_score_title)){final_score = final_score+_score;}count++;}return final_score;"

Each related word found in the title adds the original score once more, so a single match doubles the score.
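The scoring rule above can be sketched in Python. This is a minimal illustration, not the project's actual code: `build_query`, `simulate_score`, the `multi_match` clause, and the `title`/`content` field names are assumptions made for the example. Only the Painless source string and the `title_keyword` parameter come from the README.

```python
from typing import Any, Dict, List

# Painless script quoted from the README's core scoring code.
PAINLESS_SOURCE = (
    "double final_score=_score;int count=0;"
    "int total = params.title_keyword.size();"
    "while(count < total) { String upper_score_title = params.title_keyword[count]; "
    "if(doc['title_keyword'].value.contains(upper_score_title))"
    "{final_score = final_score+_score;}count++;}return final_score;"
)


def build_query(user_query: str, related_words: List[str]) -> Dict[str, Any]:
    """Build a function_score request body (hypothetical helper).

    The inner multi_match query and its field names are assumptions;
    related_words stands for the Word2vec-derived keywords.
    """
    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {"query": user_query, "fields": ["title", "content"]}
                },
                "script_score": {
                    "script": {
                        "source": PAINLESS_SOURCE,
                        "params": {"title_keyword": related_words},
                    }
                },
            }
        }
    }


def simulate_score(base_score: float, title: str, related_words: List[str]) -> float:
    """Pure-Python mirror of the Painless loop, for reasoning about the result."""
    final_score = base_score
    for word in related_words:
        if word in title:  # same substring test as String.contains
            final_score += base_score
    return final_score


body = build_query("Apple", ["Apple", "Jobs", "iPhone"])
score = simulate_score(2.0, "Apple and Steve Jobs", ["Apple", "Jobs", "iPhone"])
```

In this example two of the three related words occur in the title, so `simulate_score` returns three times the base score. The growth is linear in the number of matches rather than exponential.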

Project demo screenshot:

Getting started

Running locally

First install the environment required by the crawler.

git clone https://github.com/mtianyan/mtianyanSearch.git
cd mtianyanSearch
pip install -r requirements.txt
export not_use_docker=true
python manage.py runserver --settings=FunPySearch.settings.local

Running with Docker

docker network create search-spider
git clone https://github.com/mtianyan/mtianyanSearch.git
cd mtianyanSearch
docker-compose up -d
cd ..
git clone https://github.com/mtianyan/FunpySpiderSearchEngine
cd FunpySpiderSearchEngine
docker-compose up -d

Then visit 127.0.0.1:8080

Sponsorship

If my project code has helped you, treat me to a pack of latiao (spicy snack strips)!

mtianyansearch's People

Contributors

mtianyan

mtianyansearch's Issues

Why am I getting a 404? Asking the author for help

POST http://localhost:9200/jobbole/_suggest [status:404 request:0.004s]
Undecodable raw error response from server: Expecting value: line 1 column 1 (char 0)
[31/May/2018 02:29:09] "POST /jobbole/suggest HTTP/1.1" 404 825
[31/May/2018 02:29:09] "GET /suggest/?s=qq&s_type=article&_=1527733749729 HTTP/1.1" 500 927
[31/May/2018 02:29:09] "POST /jobbole/suggest HTTP/1.1" 404 825
POST http://localhost:9200/jobbole/_suggest [status:404 request:0.004s]
Undecodable raw error response from server: Expecting value: line 1 column 1 (char 0)
[31/May/2018 02:29:09] "GET /suggest/?s=qqq&s_type=article&_=1527733749846 HTTP/1.1" 500 927
[31/May/2018 02:29:10] "GET /jobbole/_search HTTP/1.1" 404 825
GET http://127.0.0.1:9200/jobbole/_search [status:404 request:0.003s]
Undecodable raw error response from server: Expecting value: line 1 column 1 (char 0)
[31/May/2018 02:29:10] "GET /search/?q=qqq&s_type=article HTTP/1.1" 500 927

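Hello, I ran into some problems while following your tutorial and would like to ask for your help. Thank you!

I hit a problem while implementing search autocompletion: after finishing the code, I did not get the expected result.
My first guess is that it is a Django version issue.
The problem looks like this: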

System check identified no issues (0 silenced).
February 27, 2018 - 12:22:10
Django version 2.0.1, using settings 'lcv_search.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CTRL-BREAK.
[27/Feb/2018 12:22:19] "GET / HTTP/1.1" 200 5670
[27/Feb/2018 12:22:19] "GET /static/css/style.css HTTP/1.1" 200 2822
[27/Feb/2018 12:22:19] "GET /static/css/index.css HTTP/1.1" 200 1846
[27/Feb/2018 12:22:19] "GET /static/js/global.js HTTP/1.1" 200 750
[27/Feb/2018 12:22:20] "GET /static/js/jquery.js HTTP/1.1" 200 252978
[27/Feb/2018 12:22:20] "GET /static/img/inputbg.png HTTP/1.1" 200 2841
[27/Feb/2018 12:22:20] "GET /static/img/seachbtn.png HTTP/1.1" 200 5163
[27/Feb/2018 12:22:20] "GET /static/img/logo.png HTTP/1.1" 200 2651
Not Found: /favicon.ico
[27/Feb/2018 12:22:21] "GET /favicon.ico HTTP/1.1" 404 2214
Method Not Allowed (GET): /suggest/
[27/Feb/2018 12:22:22] "GET /suggest/?s=s&s_type=article&_=1519705342620 HTTP/1.1" 405 0
[27/Feb/2018 12:31:54] "GET / HTTP/1.1" 200 5670
Method Not Allowed (GET): /suggest/
[27/Feb/2018 12:32:00] "GET /suggest/?s=s&s_type=article&_=1519705920782 HTTP/1.1" 405 0
Method Not Allowed (GET): /suggest/
[27/Feb/2018 12:32:01] "GET /suggest/?s=ss&s_type=article&_=1519705921028 HTTP/1.1" 405 0
Method Not Allowed (GET): /suggest/
[27/Feb/2018 12:32:01] "GET /suggest/?s=sss&s_type=article&_=1519705921066 HTTP/1.1" 405 0
Method Not Allowed (GET): /suggest/
[27/Feb/2018 12:32:01] "GET /suggest/?s=ssss&s_type=article&_=1519705921099 HTTP/1.1" 405 0

This is the log printed after I ran the following in the console:

python manage.py runserver

Here is the code from my project:
urls.py

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', TemplateView.as_view(template_name='index.html'), name='index'),
    path('suggest/', SearchSuggest.as_view(), name='suggest'),
]

views.py

class SearchSuggest(View):
    def suggest(self, request):
        key_words = request.GET.get('s', '')
        re_datas = []
        if key_words:
            s = ArticleType.search()
            s = s.suggest('my_suggest', key_words, completion={
                "field": "suggest",
                "fuzzy": {
                    "fuzziness": 2
                },
                "size": 10
            })
            suggestions = s.execute_suggest()
            for match in suggestions.my_suggest[0].options:
                source = match._source
                re_datas.append(source["title"])
        return HttpResponse(json.dumps(re_datas), content_type="appliction/json")

index.html

<script type="text/javascript">
    var suggest_url = "{% url 'suggest' %}"
    var search_url = "/search/"
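One possible reading of the "Method Not Allowed (GET)" lines in the log: Django's class-based `View` picks a handler by the lowercased HTTP method name (`get`, `post`, and so on), so a view whose only method is called `suggest` has nothing to handle a GET request, and the request ends in a 405. The sketch below is a simplified stand-in for that dispatch rule, not Django's actual code. `MiniView`, `NamedSuggest`, and `NamedGet` are hypothetical names used only for illustration.

```python
class MiniView:
    """Simplified stand-in for the dispatch rule of django.views.View."""

    http_method_names = ["get", "post", "put", "patch", "delete", "head", "options", "trace"]

    def dispatch(self, method, *args):
        handler = None
        if method.lower() in self.http_method_names:
            # Look for a method named after the HTTP verb, e.g. "get".
            handler = getattr(self, method.lower(), None)
        if handler is None:
            return 405  # Method Not Allowed
        return handler(*args)


class NamedSuggest(MiniView):
    # Never looked up by dispatch: "suggest" is not an HTTP method name.
    def suggest(self, query):
        return 200


class NamedGet(MiniView):
    # Found by dispatch for GET requests.
    def get(self, query):
        return 200


status_with_suggest = NamedSuggest().dispatch("GET", "s")
status_with_get = NamedGet().dispatch("GET", "s")
```

Under this reading, renaming the handler in `SearchSuggest` from `suggest` to `get` would let the GET requests to `/suggest/` reach it. Whether that alone resolves the issue depends on the rest of the project.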

Could you tell me what is causing this?

  File "/Users/admin/anaconda/lib/python3.5/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/admin/Documents/work_python/ArticleSpider-elasticsearch/ArticleSpider/pipelines.py", line 146, in process_item
    item.save_to_es()
  File "/Users/admin/Documents/work_python/ArticleSpider-elasticsearch/ArticleSpider/items.py", line 266, in save_to_es
    article.suggest = gen_suggests(es_article,ArticleType._doc_type.index, ((article.title, 10), (article.tags, 7),(article.content, 3)))
  File "/Users/admin/Documents/work_python/ArticleSpider-elasticsearch/ArticleSpider/items.py", line 51, in gen_suggests
    words = es.indices.analyze(index=index, analyzer="ik_max_word", params={'filter':["lowercase"]}, body=text)
  File "/Users/admin/anaconda/lib/python3.5/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/Users/admin/anaconda/lib/python3.5/site-packages/elasticsearch/client/indices.py", line 32, in analyze
    '_analyze'), params=params, body=body)
  File "/Users/admin/anaconda/lib/python3.5/site-packages/elasticsearch/transport.py", line 312, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/Users/admin/anaconda/lib/python3.5/site-packages/elasticsearch/connection/http_urllib3.py", line 129, in perform_request
    self._raise_error(response.status, raw_data)
  File "/Users/admin/anaconda/lib/python3.5/site-packages/elasticsearch/connection/base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
