
yoyzhou / weibo_scrapy


WEIBO_SCRAPY is a multi-threaded Sina Weibo data extraction framework in Python.

Home Page: http://yoyzhou.github.io/blog/2013/04/08/weibo-scrapy-framework-with-multi-threading/

Python 100.00%

weibo_scrapy's Introduction

WEIBO_SCRAPY

WEIBO_SCRAPY is a multi-threaded Sina Weibo data extraction framework written in Python. It provides a Weibo login simulator and an interface for extracting Weibo data with multiple threads, so users do not have to write the tricky login simulation or the multi-threading code from scratch and can focus on their own extraction logic.


### WEIBO_SCRAPY Provides

1. WEIBO Login Simulator
2. Multi-Threading Extraction Framework
3. Extraction Task Interface
4. Easy Parameter Configuration
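The usage example passes an INI file through `config='my.ini'`, and an issue report further down mentions a `cookies_file = weibo_cookies.dat` entry in `scrapy.ini`. The section layout of these files is not documented on this page, so the following is only a sketch using Python 3's standard `configparser`: the helper `find_option` and the `[login]` section name are hypothetical, chosen for illustration, and the lookup searches every section rather than assuming one.

```python
import configparser


def find_option(ini_text, key):
    """Return (section, value) for the first section defining key, else None.

    Every section is searched because the real layout of weibo_scrapy's
    .ini files is not documented here (assumption).
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    for section in parser.sections():
        if parser.has_option(section, key):
            return section, parser.get(section, key)
    return None


# Hypothetical contents: only the cookies_file line comes from the issue report;
# the [login] section name is made up for this example.
SAMPLE_INI = """
[login]
cookies_file = weibo_cookies.dat
"""

print(find_option(SAMPLE_INI, "cookies_file"))  # → ('login', 'weibo_cookies.dat')
```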

### How to Use WEIBO_SCRAPY

Subclass `scrapy` and override `scrapy_do_task` with your own uid-based extraction logic:

```python
#!/usr/bin/env python
#coding=utf8

from weibo_scrapy import scrapy


class my_scrapy(scrapy):

    def scrapy_do_task(self, uid=None):
        '''
        User needs to override this method to perform a uid-based scrapy task.
        @param uid: weibo uid
        @return: a list of uids gained from this task, optional
        '''
        # Do what you want with uid here. Note that this scrapy is uid-based,
        # so make sure there are uids in the task queue, or gain new uids
        # from this function.
        print('WOW...')
        # Replace this empty list with the uids gained from this task.
        return []


if __name__ == '__main__':

    s = my_scrapy(uids_file='uids_all.txt', config='my.ini')
    s.scrapy()
```
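The comments in the example describe a uid-based loop: tasks take uids from a queue, and uids returned by a task can feed further tasks. The sketch below illustrates that general pattern with a thread pool over Python 3's `queue.Queue`; it is not weibo_scrapy's internal implementation, and the names `crawl`, `do_task`, `num_threads` and `max_tasks` are hypothetical.

```python
import queue
import threading


def crawl(seed_uids, do_task, num_threads=4, max_tasks=100):
    """Run do_task(uid) on a pool of threads; uids it returns are queued too.

    Returns the uids whose task completed without raising.
    """
    tasks = queue.Queue()
    seen = set()            # every uid ever scheduled, so none runs twice
    done = []
    lock = threading.Lock()

    def enqueue(uid):
        with lock:
            # None is reserved as the shutdown sentinel; max_tasks caps growth.
            if uid is None or uid in seen or len(seen) >= max_tasks:
                return
            seen.add(uid)
        tasks.put(uid)

    def worker():
        while True:
            uid = tasks.get()
            if uid is None:             # sentinel: no more work for this thread
                tasks.task_done()
                return
            try:
                new_uids = do_task(uid) or []
            except Exception:
                new_uids = []           # a failed task must not kill the worker
            else:
                with lock:
                    done.append(uid)
            # Queue follow-up uids before marking this task done, so that
            # tasks.join() cannot return while work is still being discovered.
            for new_uid in new_uids:
                enqueue(new_uid)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for uid in seed_uids:
        enqueue(uid)
    tasks.join()                        # block until the queue is fully drained
    for _ in threads:
        tasks.put(None)
    for thread in threads:
        thread.join()
    return done
```

The `seen` set plays the role that a deduplicating task queue would play in a uid-based crawler: without it, two users who follow each other would be scheduled forever.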

### Related Reading

A UID-Based WEIBO Information Extraction Framework: WEIBO_SCRAPY

weibo_scrapy's People

Contributors

yoyzhou


weibo_scrapy's Issues

how to generate weibo_cookies.dat

Hi, I looked through the code and found the `cookies_file = weibo_cookies.dat` setting in the scrapy.ini file. However, I couldn't find this file or any guide on how to generate it.
Could you give some tips on how to get this file?
Thanks in advance!
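The page does not answer how `weibo_cookies.dat` is produced. As one possibility, a Python 3 login script can persist its cookies with the standard library's `http.cookiejar`. This is a sketch under assumptions: the helper names are hypothetical, and whether weibo_scrapy expects the LWP cookie format written here is not confirmed by the source.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener(cookies_file):
    """Return (opener, jar); cookies set by responses fetched via opener land in jar."""
    jar = http.cookiejar.LWPCookieJar(cookies_file)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def save_cookies(jar):
    # Write session cookies too, and do not drop expired ones, so a later
    # run can try to reuse the login.
    jar.save(ignore_discard=True, ignore_expires=True)


def load_cookies(cookies_file):
    """Read a file written by save_cookies back into a new jar."""
    jar = http.cookiejar.LWPCookieJar(cookies_file)
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar
```

After performing the login requests through `opener`, calling `save_cookies(jar)` writes the file whose name was passed as `cookies_file`.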

weibo login

Replace `p = re.compile('location\.replace\(\"(.*?)\"\)')` with `p = re.compile(r'location\.replace\(\'(.*?)\'\)')`.
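The fix in this issue swaps the expected quote character inside `location.replace(...)` from double to single quotes. The sketch below shows the effect on a made-up page body; the sample HTML and URL are hypothetical, since the real Weibo login response is not shown in the source, and `extract_redirect` is an illustrative helper rather than part of weibo_scrapy.

```python
import re

# Pattern the issue reports as failing: expects double quotes in location.replace(...).
OLD_PATTERN = re.compile(r'location\.replace\("(.*?)"\)')
# Pattern the issue proposes: expects single quotes instead.
NEW_PATTERN = re.compile(r'location\.replace\(\'(.*?)\'\)')


def extract_redirect(page, pattern=NEW_PATTERN):
    """Return the URL passed to location.replace(...) in page, or None."""
    match = pattern.search(page)
    return match.group(1) if match else None


# Hypothetical response body for illustration only.
SAMPLE_PAGE = "<script>location.replace('https://example.com/callback?ticket=abc');</script>"

print(extract_redirect(SAMPLE_PAGE, OLD_PATTERN))  # → None
print(extract_redirect(SAMPLE_PAGE))               # → https://example.com/callback?ticket=abc
```

The lazy `(.*?)` stops at the first closing quote followed by `)`, so only the URL is captured.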
