crawler-7's Introduction

Crawler

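A collection of crawlers

Crawlers for internet recruitment websites:

Crawlers for job postings at well-known internet companies:

Crawlers for content providers:

Crawler scaffolding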

pipeline

There are currently only two pipelines: one stores data in MongoDB, and the other uses a set to deduplicate items (see the source code).
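The set-based deduplication can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it assumes the project is built on Scrapy (suggested by the pipeline/middleware terminology, but not stated here), and the class name `DuplicatesPipeline` and the `url` key field are hypothetical. The MongoDB storage pipeline is left out because it needs a running database.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Fallback so the sketch also runs where Scrapy is not installed.
    class DropItem(Exception):
        """Stand-in for scrapy.exceptions.DropItem."""


class DuplicatesPipeline:
    """Drop items whose key field has already been seen (hypothetical name)."""

    def __init__(self, key="url"):
        # Assumption: each item carries a field that identifies it uniquely.
        self.key = key
        self.seen = set()

    def process_item(self, item, spider):
        value = item[self.key]
        if value in self.seen:
            # Raising DropItem tells Scrapy to discard the item.
            raise DropItem("duplicate item: %r" % (value,))
        self.seen.add(value)
        return item
```

Because the set lives in memory, duplicates are only detected within a single crawl run; deduplicating across runs would need a persistent store.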

middleware

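There are currently only two middlewares: one uses fake_useragent to generate a random User-Agent, and the other routes requests through a list of HTTP proxies (see the source code).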
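A rough sketch of what these two middlewares might look like, assuming Scrapy-style downloader middlewares. The class names and the `ua_source` parameter are hypothetical and not taken from the project. The sketch relies on Scrapy's built-in `HttpProxyMiddleware` honoring `request.meta['proxy']`.

```python
import random


class RandomUserAgentMiddleware:
    """Set a random User-Agent header on every request (hypothetical name)."""

    def __init__(self, ua_source=None):
        if ua_source is None:
            # Imported lazily so the class can be used with a custom source
            # even when fake_useragent is not installed.
            from fake_useragent import UserAgent
            ua = UserAgent()
            ua_source = lambda: ua.random
        self.ua_source = ua_source

    def process_request(self, request, spider):
        request.headers["User-Agent"] = self.ua_source()
        return None  # let Scrapy continue processing the request


class ProxyListMiddleware:
    """Pick a proxy at random from a fixed list (hypothetical name)."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self.proxies = list(proxies)

    def process_request(self, request, spider):
        # Scrapy's HttpProxyMiddleware reads the proxy from request.meta.
        request.meta["proxy"] = random.choice(self.proxies)
        return None
```

Passing the User-Agent source as a callable keeps the middleware easy to test without network access.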

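Tools

Scraping free proxies

Scrapes the free proxies published on proxy-listing websites and runs a preliminary check on them (see the source code). The proxy websites currently scraped are as follows: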

Proxy validation

Uses httpbin to test whether a proxy is still working and what type of proxy it is.
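One way such a check could work is sketched below, using only the standard library. The function names, the header list, and the three-way classification (transparent, anonymous, elite) are assumptions based on a common heuristic, not the project's actual logic. The sketch relies on httpbin's `/get` endpoint echoing the caller's `origin` and request `headers` as JSON.

```python
import http.client
import json
import re
import urllib.request

# Headers that commonly reveal the presence of a proxy (assumed list).
PROXY_HEADERS = ("Via", "X-Forwarded-For", "Forwarded", "Proxy-Connection")


def classify_proxy(real_ip, body):
    """Classify a proxy from a parsed httpbin /get response body."""
    origin = body.get("origin", "")
    headers = body.get("headers", {})
    values = " ".join(str(v) for v in headers.values())
    # Split on commas and whitespace so only whole addresses are compared.
    tokens = set(re.split(r"[,\s]+", origin + " " + values))
    if real_ip in tokens:
        return "transparent"  # the target still sees our real address
    present = {name.lower() for name in headers}
    if any(h.lower() in present for h in PROXY_HEADERS):
        return "anonymous"  # real IP hidden, but proxy use is visible
    return "elite"  # no obvious trace of a proxy


def check_proxy(proxy_url, real_ip, timeout=10):
    """Return the proxy type, or None if the proxy does not respond."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url})
    )
    try:
        with opener.open("http://httpbin.org/get", timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        return None  # dead proxy, timeout, or non-JSON response
    return classify_proxy(real_ip, body)
```

Keeping the classification separate from the network call makes the heuristic testable offline and easy to adjust.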

IP information lookup

Uses geoiplookup to query information about an IP address.

Example:

from utils.ip_info import get_ip_info

print(get_ip_info('8.8.8.8'))
{u'countrycode': u'US', u'ip': u'8.8.8.8', u'isp': u'Google', u'longitude': u'-97.822', u'countryname': u'United States', u'host': u'8.8.8.8', u'latitude': u'37.751'}

Translation function

Currently only a thin wrapper; the following services are supported:
  • Youdao Dictionary
    from utils.translate import translate
    
    print(translate(u'努力工作', dict_name='youdao')['translateResult'][0][0]['tgt'])
    print(translate(u'hard work', dict_name='youdao', lfrom='en', lto='zh-CHS')['translateResult'][0][0]['tgt'])
        
    To work hard
    努力工作
        
  • Baidu Translate
    from utils.translate import translate
    
    print(translate(u'努力工作', dict_name='baidu')[0]['dst'])
    print(translate(u'hard work', dict_name='baidu', lfrom='en', lto='zh-CHS')[0]['dst'])
        
    Work hard
    艰苦的工作
        

crawler-7's People

Contributors

brantou, oyslboy
