zouyuwuse / spiders

This project forked from ychenracing/spiders

Python spiders using scrapy framework, or 3rd libraries such as requests, BeautifulSoup, etc.

Home Page: https://blog.csdn.net/c315838651/article/details/72675470

License: Apache License 2.0

Python 100.00%

Spiders

Several spiders built with requests, BeautifulSoup, Scrapy, and similar libraries.

Crawled data is stored in MongoDB or MySQL. The kongjie spider downloads the pictures of all users on kongjie.com.

Spider haofl:

It crawls haofl.net using Scrapy. It extends CrawlSpider but is written in Spider style.

Spider kongjie:

A spider that uses requests and BeautifulSoup to crawl kongjie.com. requests and bs4 keep the code concise. A Redis hash is used to de-duplicate persons.
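The Redis-hash de-duplication can be sketched roughly as follows. This is a minimal illustration, not the spider's actual code: the function names and the hash key `kongjie:persons` are hypothetical, and `client` is assumed to be a redis-py client (or any object exposing the same `hsetnx` method).

```python
def mark_if_new(client, person_id, hash_key="kongjie:persons"):
    """Return True the first time person_id is seen, False afterwards.

    HSETNX writes the field only when it does not exist yet and reports
    whether it did so, which makes the check and the write one atomic step.
    The hash key name is a hypothetical choice for this sketch.
    """
    return bool(client.hsetnx(hash_key, str(person_id), 1))


def filter_new(client, person_ids, hash_key="kongjie:persons"):
    """Yield only ids that were not recorded before, recording them as we go."""
    for person_id in person_ids:
        if mark_if_new(client, person_id, hash_key):
            yield person_id
```

With redis-py installed and a server running, `client = redis.Redis()` followed by `filter_new(client, ids)` would skip every person already crawled, even across restarts, because the hash lives in Redis rather than in process memory.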

Blog is here: Python web crawler: scraping kongjie.com pictures with requests and bs4

Spider qiubai:

This spider crawls qiushibaike.com using Scrapy. It extends CrawlSpider but is written in Spider style; CrawlSpider features such as Rule and LinkExtractor will be adopted soon. Crawled data is stored in MongoDB.
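Storing crawled items in MongoDB is typically done in a Scrapy item pipeline. The sketch below is one possible shape for such a pipeline, not the project's own code: the class name `MongoPipeline`, the `MONGO_URI` setting, and the database and collection names are all assumptions made for illustration. The collection is injected so that the class can be exercised without a running MongoDB.

```python
class MongoPipeline:
    """Scrapy-style item pipeline that writes each item to a MongoDB collection."""

    def __init__(self, collection, client=None):
        # Any object with insert_one(dict) works; pymongo collections do.
        self.collection = collection
        self.client = client

    @classmethod
    def from_crawler(cls, crawler):
        # Imported lazily so the class itself does not require pymongo.
        import pymongo

        # MONGO_URI is a hypothetical setting name for this sketch.
        uri = crawler.settings.get("MONGO_URI", "mongodb://localhost:27017")
        client = pymongo.MongoClient(uri)
        # Database and collection names are hypothetical as well.
        return cls(client["spiders"]["qiubai"], client=client)

    def process_item(self, item, spider):
        # Copy into a plain dict so the original item is left untouched.
        self.collection.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        # Release the connection when the crawl ends, if we opened one.
        if self.client is not None:
            self.client.close()
```

To enable a pipeline like this, it would be listed under `ITEM_PIPELINES` in the Scrapy project's `settings.py`, using the module path where the class actually lives.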

Blog is here: Python crawler framework Scrapy: crawling large amounts of joke data from Qiushibaike

Spider onesixnine:

A Scrapy spider that crawls all images on 169ee.com. It uses Scrapy's CrawlSpider to crawl the full site, with Rule and LinkExtractor extracting the links to follow. Images are saved to disk.

Blog is here: Advanced crawling: scraping beauty pictures from the whole 169ee site with CrawlSpider

Spider flhhkkSpider:

A spider that uses Scrapy and Selenium to crawl all candidate Baidu Wangpan download links and stores them in MySQL. It uses Scrapy's CrawlSpider to crawl the full site.

Running

Data

Blog is here: TODO

Give the project a star if it helps you.

Please cite it if you use it in a blog or post.

Copyright

ychen@fdu
