thehappymouse / ccmouse

Code I typed out myself while following ccmouse's course: a distributed, concurrent crawler.

Go 14.89% HTML 74.56% CSS 9.68% JavaScript 0.87%

ccmouse's Introduction

imooc ccmouse distributed crawler

Overall architecture diagram

Distributed structure

crawler

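Covers the single-task crawler through to the single-machine concurrent crawler. The core of the crawler lives in this folder.

Storage uses Elasticsearch: docker run -d -p 9200:9200 elasticsearch

Run: go run crawler/main.go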
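
The step from a single-task crawler to a concurrent one on a single machine can be sketched roughly as below. This is a minimal illustration of the general fan-out pattern with goroutines and channels, not the code in the crawler folder. The names Request, Result, fakeFetch and Run are hypothetical, and fakeFetch stands in for a real HTTP fetch and parser so the sketch runs offline.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Request is a hypothetical unit of work: one URL to fetch.
type Request struct {
	URL string
}

// Result is a hypothetical parse result for one request.
type Result struct {
	URL   string
	Title string
}

// fakeFetch stands in for a real fetcher and parser (assumption for the sketch).
func fakeFetch(r Request) Result {
	return Result{URL: r.URL, Title: "page at " + r.URL}
}

// Run fans the requests out to workerCount goroutines and collects every result.
func Run(requests []Request, workerCount int) []Result {
	in := make(chan Request)
	out := make(chan Result)

	var wg sync.WaitGroup
	for i := 0; i < workerCount; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range in {
				out <- fakeFetch(r)
			}
		}()
	}

	// Feed the workers, then close the input so they can exit.
	go func() {
		for _, r := range requests {
			in <- r
		}
		close(in)
	}()

	// Close the output once every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()

	var results []Result
	for res := range out {
		results = append(results, res)
	}
	// Workers finish in no fixed order, so sort for a stable result.
	sort.Slice(results, func(i, j int) bool { return results[i].URL < results[j].URL })
	return results
}

func main() {
	requests := []Request{{URL: "http://example.com/a"}, {URL: "http://example.com/b"}}
	for _, r := range Run(requests, 2) {
		fmt.Println(r.URL, "->", r.Title)
	}
}
```

In a real crawler each result would typically also yield new requests that are fed back to the workers. That scheduling step is left out here to keep the sketch short.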

crawler_distributed

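An extended version of the crawler, built on top of the single-machine crawler. It uses JSON-RPC so that multiple nodes can call each other. Main additions: a storage service (one instance), Worker services (page fetching, multiple instances), and an engine (one instance). Steps to run:

  1. Storage

go run crawler_distributed/persist/server/saver_main.go (listens on port 1234)

  2. Worker

go run crawler_distributed/worker/server/worker_main.go --port=9003

  3. Engine

go run crawler_distributed/distributed.go --hosts="192.168.1.8:9002,192.168.1.8:9004,:9002,:9003"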
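
To illustrate how nodes can call each other over JSON-RPC, here is a minimal sketch using Go's standard net/rpc and net/rpc/jsonrpc packages. It is not the repository's implementation: the Item type, the SaverService type with its Save method, and the helpers serve and callSave are all hypothetical names chosen for the example. The server keeps items in memory instead of writing to Elasticsearch, and it listens on a random local port rather than 1234.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
	"net/rpc/jsonrpc"
	"sync"
)

// Item is a hypothetical record produced by a worker.
type Item struct {
	URL  string
	Name string
}

// SaverService is a hypothetical storage service that keeps items in memory.
type SaverService struct {
	mu    sync.Mutex
	saved []Item
}

// Save follows the net/rpc method shape: exported, two arguments
// (the second a pointer for the reply), and an error return.
func (s *SaverService) Save(item Item, result *string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.saved = append(s.saved, item)
	*result = "ok"
	return nil
}

// serve registers the service and answers JSON-RPC requests on addr.
func serve(addr string, service interface{}) (net.Listener, error) {
	server := rpc.NewServer()
	if err := server.Register(service); err != nil {
		return nil, err
	}
	listener, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			conn, err := listener.Accept()
			if err != nil {
				return // listener was closed
			}
			go server.ServeCodec(jsonrpc.NewServerCodec(conn))
		}
	}()
	return listener, nil
}

// callSave dials the storage node and invokes Save remotely.
func callSave(addr string, item Item) (string, error) {
	client, err := jsonrpc.Dial("tcp", addr)
	if err != nil {
		return "", err
	}
	defer client.Close()

	var result string
	err = client.Call("SaverService.Save", item, &result)
	return result, err
}

func main() {
	// Port 0 asks the OS for any free port (assumption for the sketch).
	listener, err := serve("127.0.0.1:0", &SaverService{})
	if err != nil {
		log.Fatal(err)
	}
	defer listener.Close()

	result, err := callSave(listener.Addr().String(), Item{URL: "http://example.com/a", Name: "a"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("server replied:", result)
}
```

A worker service could be exposed the same way, with the engine holding one client per address passed in --hosts.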

ccmouse's People

Contributors

thehappymouse

ccmouse's Issues

Memory leak?

// Issue one request every 50 milliseconds
var rateLimiter = time.Tick(50 * time.Millisecond)

The ticker is never stopped. During testing I found that this makes memory usage grow continuously.
