
tongji-search's Introduction

#tongji-search

This website uses a spider to collect the URLs and titles of news items from sse.

You can then search the collected sse news on the website to find the items you want.
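
As a rough illustration of what collecting "URL and title" pairs involves, here is a minimal sketch that uses only the Java standard library. It is not the project's spider (which is built on Webmagic); the class name `NewsLinkExtractor`, the `NewsLink` type, and the sample HTML are all hypothetical, and the regular expression is deliberately simple rather than a full HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls (url, title) pairs out of anchor tags.
final class NewsLinkExtractor {

    /** A (url, title) pair, the two fields the spider is described as collecting. */
    static final class NewsLink {
        final String url;
        final String title;

        NewsLink(String url, String title) {
            this.url = url;
            this.title = title;
        }
    }

    // Matches <a ... href="...">text</a>. Only double-quoted href values are handled.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<NewsLink> extract(String html) {
        List<NewsLink> links = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            String url = m.group(1).trim();
            // Strip nested tags and collapse whitespace inside the anchor text.
            String title = m.group(2)
                    .replaceAll("<[^>]+>", "")
                    .replaceAll("\\s+", " ")
                    .trim();
            // Skip anchors that have no target or no visible text.
            if (!url.isEmpty() && !title.isEmpty()) {
                links.add(new NewsLink(url, title));
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<ul><li><a href=\"/news/1.html\">First <b>notice</b></a></li></ul>";
        for (NewsLink link : extract(html)) {
            System.out.println(link.url + " -> " + link.title);
        }
    }
}
```

A real crawl would normally rely on the spider framework's own selectors instead of a regular expression; the sketch only shows the shape of the data that ends up in the index.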

Built with Maven.

####Search-engine

  • Lucene 4.0.0 (update: Lucene 6.0.0)

####Analyzer

  • IKAnalyzer 2012FF

####Spider

  • Webmagic 0.5.2 (considering replacing it with Node.js)

####Database

  • Wilddog cloud (野狗云)

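##Project progress

###Front end

  • Rebuild the search home page
  • Add a back-end login, used for configuring the spider

####Database

  • Switch the database source to the Wilddog cloud database

###Search engine core

  • Adapt to Wilddog cloud

###Spider

  • Adapt to Wilddog cloud, or consider using a Node.js spider
  • Make the spider configurable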

##Issue

  • A cloud server is needed; Alibaba Cloud (阿里云) is too expensive
  • Separate the spider from Lucene
  • Store web page files using file storage


tongji-search's Issues

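Unfinished parts

  • Store the text of crawled pages; when the got URLs change, re-run tokenization and rebuild the index
  • Manage the target URLs on the manage.html page
  • Limit the string length on the search results page
  • On the manage.html page, "start spider" can only be clicked once; on the second click, Hibernate cannot getCurrentSession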
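
For the first item, one way to decide when a page needs to be tokenized and indexed again is to remember a digest of the stored text per URL. The sketch below assumes exactly that; the class name `PageChangeTracker` and its method names are hypothetical, the map lives only in memory, and the actual IKAnalyzer and Lucene calls are left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: tracks a SHA-256 digest of each page's text by URL.
final class PageChangeTracker {
    private final Map<String, String> digestByUrl = new HashMap<>();

    /**
     * Records the latest text for a URL and reports whether the page is new
     * or differs from the previously recorded text.
     */
    boolean recordAndCheckChanged(String url, String pageText) {
        String digest = sha256Hex(pageText);
        String previous = digestByUrl.put(url, digest);
        return !digest.equals(previous);
    }

    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PageChangeTracker tracker = new PageChangeTracker();
        String url = "http://example.com/news/1.html"; // placeholder URL
        System.out.println(tracker.recordAndCheckChanged(url, "old text")); // true: first time seen
        System.out.println(tracker.recordAndCheckChanged(url, "old text")); // false: unchanged
        System.out.println(tracker.recordAndCheckChanged(url, "new text")); // true: text changed
    }
}
```

Only pages for which the method returns `true` would need to go through tokenization and indexing again; persisting the digests between runs is left open here.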
