
micang / lupro

This project is forked from luxuncang/lupro.


lupro is an elegant asynchronous web-crawler framework.

Home Page: https://luxuncang.github.io/lupro/

Python 100.00%

lupro's Introduction

lupro crawler framework

Installing Lupro

Install Lupro from PyPI

  • pip install lupro

Getting started

  1. Import it: from lupro import lupro

Compatibility with requests or httpx

from lupro import lupro as requests
# or
from lupro import lupro as httpx

With this aliased import, lupro can stand in for requests or httpx without changing the rest of your code.

Native lupro

from lupro import lupro
r = lupro.get('https://www.python.org')
r.status_code

Batch asynchronous tasks

from lupro import lupros, async_lupro

# List of URLs to request
urls = ['https://www.python.org', 'https://www.baidu.com']
async_lupro([lupros.get(url) for url in urls])
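The idea behind async_lupro, issuing a batch of requests concurrently instead of one after another, can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not lupro's implementation: `fake_get` is a hypothetical stand-in that sleeps instead of doing network I/O, and `run_batch` is a hypothetical helper.

```python
import asyncio
import time


async def fake_get(url: str, delay: float = 0.1) -> tuple[str, int]:
    """Hypothetical stand-in for an HTTP GET: waits, then returns (url, status)."""
    await asyncio.sleep(delay)  # simulates network latency without real I/O
    return url, 200


async def run_batch(urls: list[str]) -> list[tuple[str, int]]:
    """Schedule every request at once and wait for all of them.

    asyncio.gather returns the results in the same order as the inputs.
    """
    return await asyncio.gather(*(fake_get(url) for url in urls))


if __name__ == "__main__":
    urls = ['https://www.python.org', 'https://www.baidu.com']
    start = time.perf_counter()
    results = asyncio.run(run_batch(urls))
    elapsed = time.perf_counter() - start
    print(results)
    # Both waits overlap, so the total is close to one delay, not two.
    print(f"{elapsed:.2f}s")
```

Because the waits overlap, a batch of N requests takes roughly as long as the slowest one rather than the sum of all of them.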

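lupro configuration

# Create lupro_config.py in the working directory to customize the configuration
import httpx


# Custom proxy source
def get_proxies():
    '''Fetch proxies and return them as a list.

    Args:
        None : takes no arguments

    Returns:
        list : a list of proxies, e.g. ['xxx.xxx.xxx.xxx:xxxx', ...]
    '''
    raise NameError('Please rewrite the `get_proxies` function.')


# HTTP engine
HTTP_ENGINE = httpx

# Object persistence
PERSISTENCE_ENABLED = False

# Storage path for persisted objects
PERSISTENCE_PATH = 'endurance.db'

# Proxy pool
PROXIES = []

# Whether to validate the proxy pool
VERIFY_PROXIES = False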
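One way to fill in `get_proxies` is to read proxies from a local text file and keep only entries in the `host:port` form the docstring asks for. A minimal sketch, assuming a hypothetical `proxies.txt` with one proxy per line and IPv4 addresses only (hostnames are rejected); lupro itself does not say where proxies should come from, and `load_proxies` / `is_valid_proxy` are hypothetical helpers.

```python
import re
from pathlib import Path

# Hypothetical source file: one "host:port" entry per line.
PROXY_FILE = Path('proxies.txt')

# IPv4 address followed by a port, e.g. "127.0.0.1:8080".
_PROXY_RE = re.compile(r'^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,5})$')


def is_valid_proxy(entry: str) -> bool:
    """Return True if entry looks like 'x.x.x.x:port' with in-range numbers."""
    match = _PROXY_RE.match(entry)
    if match is None:
        return False
    *octets, port = (int(part) for part in match.groups())
    return all(0 <= o <= 255 for o in octets) and 1 <= port <= 65535


def load_proxies(path: Path) -> list[str]:
    """Return the well-formed proxies listed in path (empty if it is missing)."""
    if not path.exists():
        return []
    lines = (line.strip() for line in path.read_text(encoding='utf-8').splitlines())
    return [line for line in lines if is_valid_proxy(line)]


def get_proxies():
    '''Return proxies as ['xxx.xxx.xxx.xxx:xxxx', ...], read from PROXY_FILE.'''
    return load_proxies(PROXY_FILE)
```

Keeping the file reading in `load_proxies` leaves `get_proxies` argument-free, matching the docstring's "takes no arguments" contract.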

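lupro instance

'''
lupro instance parameters
Args:
    filename : str, file path or request name (naming by path is recommended)
    lupros : lupros, dictionary of requests arguments
    proxie : bool, whether to use a proxy
    format : str, file format used when saving
    content : int, minimum number of bytes required for the callback
    faultolt : int, number of allowed retries

lupro instance methods
method:
    task : send the request
    xpath_analysis : XPath parsing
    json_analysis : JSON parsing
    re_analysis : regular-expression parsing
    css_analysis : CSS-selector parsing
    save_file : downloader
'''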

from lupro import lupro, lupros
from pprint import pprint

def cusjoin(text):
    return 'lupro >>> ' + text

task = lupro('python', lupros.get('https://www.python.org/', timeout=15), content=200)

pprint(task.xpath_analysis(
    {'News or Events': '//*[@id="content"]//ul[@class="menu"]//li/a//text()'},
    cusjoin,
))
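The call above passes a dict that maps a result name to an expression, plus an optional callback (`cusjoin`) applied to each extracted item; in the batch example, the result is indexed by the same keys. That "named rules in, named lists out" pattern can be sketched with the standard library. `analyse` is a hypothetical helper, not lupro's xpath_analysis, and it swaps XPath for regular expressions so it runs without third-party packages.

```python
import re
from typing import Callable, Optional


def analyse(
    text: str,
    rules: dict[str, str],
    callback: Optional[Callable[[str], str]] = None,
) -> dict[str, list[str]]:
    """Apply each named regex rule to text and collect all matches per name.

    Each pattern should have exactly one capture group, so re.findall
    returns plain strings. If callback is given, it is applied to every match.
    """
    results: dict[str, list[str]] = {}
    for name, pattern in rules.items():
        matches = re.findall(pattern, text)
        results[name] = [callback(m) for m in matches] if callback else matches
    return results


def cusjoin(text: str) -> str:
    return 'lupro >>> ' + text


html = ('<ul class="menu"><li><a href="/news">News</a></li>'
        '<li><a href="/events">Events</a></li></ul>')
print(analyse(html, {'News or Events': r'<a href="[^"]*">([^<]+)</a>'}, cusjoin))
# {'News or Events': ['lupro >>> News', 'lupro >>> Events']}
```

Regular expressions are fragile on real-world HTML, which is why a framework like this offers XPath and CSS parsers; the sketch only shows the shape of the rule dict and the callback.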

batch example

from lupro import lupro, lupros, batch
from pprint import pprint

url = 'https://www.python.org/doc/versions/'

task = lupro('python docs/', lupros.get(url, timeout=15), content=200)

doc_ver = task.xpath_analysis({
    'documentation url': '//*[@id="python-documentation-by-version"]/ul//li/a/@href',
    'documentation': '//*[@id="python-documentation-by-version"]/ul//li/a/text()',
})
pprint(doc_ver)

save_html = batch(task, doc_ver['documentation url'], doc_ver['documentation'])

save_html.BulkDownload()
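`batch` receives the task plus two parallel lists, the extracted URLs and their link texts, so each download is presumably paired with a name. Such pairs typically need relative links resolved against the page URL and names turned into safe file names before anything is written. A minimal sketch with a hypothetical helper, `plan_downloads`; it only plans the downloads and does not reproduce what BulkDownload does internally.

```python
import re
from pathlib import Path
from urllib.parse import urljoin


def plan_downloads(
    base_url: str, hrefs: list[str], names: list[str], out_dir: str
) -> list[tuple[str, Path]]:
    """Pair each href with its name; return (absolute_url, target_path) tuples."""
    if len(hrefs) != len(names):
        raise ValueError('hrefs and names must have the same length')
    jobs = []
    for href, name in zip(hrefs, names):
        # Resolves relative links and leaves absolute ones unchanged.
        absolute = urljoin(base_url, href)
        # Replace runs of characters that are awkward in file names with "_".
        safe = re.sub(r'[^\w.-]+', '_', name.strip()).strip('_') or 'untitled'
        jobs.append((absolute, Path(out_dir) / f'{safe}.html'))
    return jobs
```

For example, with the base URL `https://www.python.org/doc/versions/`, the href `/downloads/` resolves to `https://www.python.org/downloads/`, and the name `Python 3.12` becomes `python docs/Python_3.12.html` when `out_dir` is `'python docs'`.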

Task introspection and cold restart

from lupro import batch
from pprint import pprint

# PERSISTENCE_ENABLED = True must be set before running batch
task_name = 'python docs/'
# Cold-restart the task
batch.coldheavy(task_name)
# Task callback
pprint(batch.callback(task_name))
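A cold restart only works if a task's progress outlives the process, which is what the persistence settings are for. The idea can be sketched with the standard library's shelve module: record which URLs have finished under the task name, and on restart skip them. This is an illustration under stated assumptions; the format lupro uses for `endurance.db` is not documented here, and `record_done` / `pending` are hypothetical names.

```python
import shelve


def record_done(db_path: str, task_name: str, url: str) -> None:
    """Persist that url has finished for task_name."""
    with shelve.open(db_path) as db:
        done = db.get(task_name, set())
        done.add(url)
        db[task_name] = done  # reassign so shelve writes the updated set


def pending(db_path: str, task_name: str, urls: list[str]) -> list[str]:
    """Return the urls not yet recorded as done, preserving their order."""
    with shelve.open(db_path) as db:
        done = db.get(task_name, set())
    return [u for u in urls if u not in done]
```

After a crash, calling `pending` with the original URL list yields only the work that is still outstanding, so the batch can resume instead of starting over.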

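Features

  • Fully compatible with requests or httpx
  • Asynchronous support
  • lupro generators
  • Automatic encoding correction
  • Parsers and parser chains
  • Selectors and selector chains
  • Downloader
  • Object persistence
  • Task introspection and cold restart
  • Interactive mode
  • Microservices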
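Automatic encoding correction generally means not trusting a single source for a page's charset. A common approach, sketched here without claiming it is what lupro does, checks the HTTP Content-Type header first, then a `<meta charset>` declaration near the top of the body, and finally falls back to UTF-8 with replacement characters. `decode_body` is a hypothetical helper.

```python
import codecs
import re
from typing import Optional

_HEADER_RE = re.compile(r'charset=["\']?([\w-]+)', re.IGNORECASE)
_META_RE = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def _known(name: Optional[str]) -> Optional[str]:
    """Return the canonical codec name if Python knows it, else None."""
    if not name:
        return None
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


def decode_body(body: bytes, content_type: str = '') -> str:
    """Decode body using the header charset, then <meta charset>, then UTF-8."""
    header_match = _HEADER_RE.search(content_type)
    meta_match = _META_RE.search(body[:2048])  # meta tags appear near the top
    candidates = [
        _known(header_match.group(1)) if header_match else None,
        _known(meta_match.group(1).decode('ascii', 'ignore')) if meta_match else None,
    ]
    for encoding in candidates:
        if encoding is None:
            continue
        try:
            return body.decode(encoding)
        except UnicodeDecodeError:
            continue  # the declared charset is wrong; try the next source
    return body.decode('utf-8', errors='replace')
```

Ordering the sources this way lets a correct header win, while a wrong or unknown declaration falls through instead of producing an exception.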

API documentation

lupro api

Contributors

luxuncang
