panoslin / aio-vextractor


Extracts video information from website, app, and H5 pages. Supports Douyin, Tencent Video, YouTube, Instagram, and more than 40 websites and apps in total.

License: Other

Dockerfile 0.06% Python 91.94% JavaScript 7.79% Shell 0.20%
python crawler video docker

aio-vextractor's Introduction

aioVextractor

Extract structured video data from more than 40 websites, mobile apps, and H5 pages, including Douyin (TikTok), Tencent Video, YouTube, and Instagram.

Developer documentation
  1. Quick deployment with Docker

    git clone https://github.com/panoslin/aioVextractor &&\
    cd aioVextractor &&\
    sudo chmod +x build.sh &&\
    sudo sh build.sh
  2. Usage

    from aioVextractor.api import (
        extract,
        breakdown,
        hybrid_worker
    )
    import aiohttp
    import asyncio
    
    async def test():
        async with aiohttp.ClientSession() as session:
            single_url = "https://creative.adquan.com/show/286788"
            playlist_url = "https://weibo.com/p/1005055882998192/photos?type=video#place"
            print(await extract(webpage_url=single_url, session=session))
            print(await hybrid_worker(webpage_url=single_url, session=session))
            print(await breakdown(webpage_url=playlist_url, session=session))
            print(await hybrid_worker(webpage_url=playlist_url, session=session))
    
    
    asyncio.run(test())

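    The functions above are the highest-level API for parsing video page links:

    • extract: parses a single video URL
    • breakdown: parses an entire playlist URL
    • hybrid_worker: automatically detects whether the page is a single video URL or a playlist URL and returns the corresponding result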
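
Since each of these functions is awaited and takes a shared `aiohttp` session, several URLs can in principle be processed concurrently. The following is a minimal sketch, not part of the library: `run_many` and `fake_worker` are hypothetical names, and `fake_worker` only stands in for `hybrid_worker` so that the example runs without network access. It assumes the worker accepts `webpage_url` and `session` keyword arguments, as in the usage example above.

```python
import asyncio


async def run_many(worker, urls, session=None, limit=5):
    """Run `worker` over `urls` concurrently, at most `limit` at a time.

    Results come back in the same order as `urls`; a failed URL yields
    its exception object instead of aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url):
        async with semaphore:
            return await worker(webpage_url=url, session=session)

    return await asyncio.gather(
        *(guarded(url) for url in urls), return_exceptions=True
    )


async def fake_worker(webpage_url, session=None):
    # Hypothetical stand-in for aioVextractor.api.hybrid_worker.
    await asyncio.sleep(0)
    return {"webpage_url": webpage_url}


urls = [
    "https://creative.adquan.com/show/286788",
    "https://weibo.com/p/1005055882998192/photos?type=video#place",
]
results = asyncio.run(run_many(fake_worker, urls))
```

With the real library, `hybrid_worker` and an open `aiohttp.ClientSession` would take the place of `fake_worker` and `None`.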
  3. Supported websites

    • youtube
    • tvcf
    • vimeo
    • vmovier
    • iwebad
    • douyin
    • naver
    • hellorf
    • pinterest
    • digitaling
    • weibo
    • adquan
    • xinpianchang
    • carben
    • bilibili
    • tencent
    • instagram
    • lanfan
    • youku
    • renren
    • socialbeta
    • weixin
    • eyepetizer
  4. Test demo

    from aioVextractor.extractor.tencent import Extractor as tencentIE
    from pprint import pprint
    
    with tencentIE() as extractor:
        webpage_url = "https://v.qq.com/iframe/player.html?vid=c0912n1rqrw&tiny=0&auto=0"
        res = extractor.sync_entrance(webpage_url=webpage_url)
        pprint(res)
    
    """
    OUTPUT:
    [{'ad_link': None,
      'author': 'Apple 官方频道',
      'author_attention': None,
      'author_avatar': None,
      'author_birthday': None,
      'author_description': None,
      'author_follwer_count': None,
      'author_follwing_count': None,
      'author_gender': None,
      'author_id': None,
      'author_sign': None,
      'author_url': 'http://v.qq.com/vplus/c855f20d041bc7e06f356522325b0902',
      'author_videoNum': None,
      'category': None,
      'cdn_url': None,
      'collect_count': None,
      'comment_count': None,
      'cover': 'http://vpic.video.qq.com/0/c0912n1rqrw.png',
      'description': None,
      'dislike_count': None,
      'download_count': None,
      'downloader': 'aria2c',
      'duration': '30',
      'forward_count': None,
      'from': 'tencent',
      'gender': None,
      'height': None,
      'language': None,
      'like_count': None,
      'play_addr': 'http://video.dispatch.tc.qq.com/uwMROfz2r5zIIaQXGdGlQmdfDmZvd0vRcymWSecrfGm8rzTb/c0912n1rqrw.mp4?vkey=0A9434327F854F742C34AEA63A4F5D91ECD3BD9941D4A21621691B03C74371E884E6AF55D20955207FFCE82AA75A01A55B29C753410E57BDCD9CB487C427D06C88D3DC8EEAF862862C5ACE1D009EA9AB4E9E9FD248C76EA2072BCAF06BA0F96DE76EE242119D5AAC873A6C18214552B745D194B35B1F1525CBE32AC7B90C7EAA',
      'rating': None,
      'recommend': None,
      'region': None,
      'share_count': None,
      'tag': ['敬 Mac 背后的你 - 试出可能 - Apple',
              '腾讯视频',
              '电影',
              '电视剧',
              '综艺',
              '新闻',
              '财经',
              '音乐',
              'MV',
              '高清',
              '视频',
              '在线观看'],
      'title': '敬 Mac 背后的你 - 试出可能 - Apple',
      'upload_date': None,
      'upload_ts': 1262275200,
      'vid': 'c0912n1rqrw',
      'view_count': '246304',
      'webpage_url': 'https://v.qq.com/x/page/c0912n1rqrw.html',
      'width': None}]
    """
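
Most fields in the record above are `None`, and the numeric fields arrive as strings. The sketch below shows one way to tidy such a record before storing it. `compact_record` is a hypothetical helper, not part of aioVextractor, and both the choice of fields to convert and the reading of `upload_ts` as a Unix timestamp in seconds are assumptions based on this one sample.

```python
from datetime import datetime, timezone


def compact_record(record, int_fields=("duration", "view_count")):
    """Drop None-valued fields and convert selected string fields to int."""
    cleaned = {key: value for key, value in record.items() if value is not None}
    for field in int_fields:
        value = cleaned.get(field)
        if isinstance(value, str) and value.isdigit():
            cleaned[field] = int(value)
    if "upload_ts" in cleaned:
        # Assumption: upload_ts is a Unix timestamp in seconds.
        cleaned["upload_time_utc"] = datetime.fromtimestamp(
            cleaned["upload_ts"], tz=timezone.utc
        ).isoformat()
    return cleaned


# Abridged copy of the sample output above.
sample = {
    "author": "Apple 官方频道",
    "description": None,
    "duration": "30",
    "from": "tencent",
    "upload_ts": 1262275200,
    "vid": "c0912n1rqrw",
    "view_count": "246304",
    "width": None,
}
record = compact_record(sample)
```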
  5. Test demo

    from aioVextractor.api import hybrid_worker
    import aiohttp
    import asyncio
    from pprint import pprint
    
    async def test(url):
        async with aiohttp.ClientSession() as session:
            result = await hybrid_worker(
                webpage_url=url,
                session=session,
            )
            return result
    
    url = "https://www.youtube.com/playlist?list=PLs54iBUqIopDv2wRhkqArl9AEV1PU-gmc"  # you can try any URL from `TEST_CASE`
    pprint(asyncio.run(test(url=url)))
    
    
    """
    OUTPUT:
    Processing URL: https://www.youtube.com/playlist?list=PLs54iBUqIopDv2wRhkqArl9AEV1PU-gmc
    ([{'ad_link': None,
       'author': None,
       'author_attention': None,
       'author_avatar': None,
       'author_birthday': None,
       'author_description': None,
       'author_follwer_count': None,
       'author_follwing_count': None,
       'author_gender': None,
       'author_id': None,
       'author_sign': None,
       'author_url': None,
       'author_videoNum': None,
       'category': None,
       'cdn_url': None,
       'collect_count': None,
       'comment_count': None,
       'cover': 'https://i.ytimg.com/vi/61CQm2zVVk0/hqdefault.jpg?sqp=-oaymwEZCPYBEIoBSFXyq4qpAwsIARUAAIhCGAFwAQ==&rs=AOn4CLAKICJl2FlmleQsKntUd0KIeOEjZA',
       'description': None,
       'dislike_count': None,
       'download_count': None,
       'downloader': 'ytd',
       'duration': None,
       'forward_count': None,
       'from': 'youtube',
       'gender': None,
       'height': None,
       'language': None,
       'like_count': None,
       'play_addr': None,
       'playlist_url': 'https://www.youtube.com/playlist?list=PLs54iBUqIopDv2wRhkqArl9AEV1PU-gmc',
       'rating': None,
       'recommend': None,
       'region': None,
       'share_count': None,
       'tag': None,
       'title': "The Avengers Earth's Mightiest Heroes Se1 - Ep01 Breakout (Part "
                '1) - Part 01',
       'upload_date': None,
       'upload_ts': None,
       'vid': '61CQm2zVVk0',
       'view_count': None,
       'webpage_url': 'https://www.youtube.com/watch?v=61CQm2zVVk0&list=PLs54iBUqIopDv2wRhkqArl9AEV1PU-gmc&index=2&t=0s',
       'width': None},
       ...
      {'ad_link': None,
       'author': None,
       'author_attention': None,
       'author_avatar': None,
       'author_birthday': None,
       'author_description': None,
       'author_follwer_count': None,
       'author_follwing_count': None,
       'author_gender': None,
       'author_id': None,
       'author_sign': None,
       'author_url': None,
       'author_videoNum': None,
       'category': None,
       'cdn_url': None,
       'collect_count': None,
       'comment_count': None,
       'cover': 'https://i.ytimg.com/vi/PRT3FjaP71E/hqdefault.jpg?sqp=-oaymwEZCNACELwBSFXyq4qpAwsIARUAAIhCGAFwAQ==&rs=AOn4CLA2zBcMa68iPw6tQO5nSbKlkwFv8w',
       'description': None,
       'dislike_count': None,
       'download_count': None,
       'downloader': 'ytd',
       'duration': None,
       'forward_count': None,
       'from': 'youtube',
       'gender': None,
       'height': None,
       'language': None,
       'like_count': None,
       'play_addr': None,
       'playlist_url': 'https://www.youtube.com/playlist?list=PLs54iBUqIopDv2wRhkqArl9AEV1PU-gmc',
       'rating': None,
       'recommend': None,
       'region': None,
       'share_count': None,
       'tag': None,
       'title': "The Avengers Earth's Mightiest Heroes Se1 - Ep10 Everything Is "
                'Wonderful - Screen 04',
       'upload_date': None,
       'upload_ts': None,
       'vid': 'PRT3FjaP71E',
       'view_count': None,
       'webpage_url': 'https://www.youtube.com/watch?v=PRT3FjaP71E&list=PLs54iBUqIopDv2wRhkqArl9AEV1PU-gmc&index=101&t=0s',
       'width': None}],
     True,
     {'clickTrackingParams': 'CD0QybcCIhMI16ucw-G35QIV40L1BR0A1weh',
      'continuation': '4qmFsgI2EiRWTFBMczU0aUJVcUlvcER2MndSaGtxQXJsOUFFVjFQVS1nbWMaDmVnWlFWRHBEUjFFJTNE'})
    """
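
In the output above, `hybrid_worker` returned a 3-tuple for the playlist URL: a list of video records, a boolean, and a dict holding `clickTrackingParams` and `continuation`. The meaning of the last two elements is not documented here. The sketch below assumes the boolean signals that more pages are available and that the dict carries the parameters for fetching them. `summarize_playlist` is a hypothetical helper, and the sample tuple is abridged from the output above.

```python
def summarize_playlist(result):
    """Unpack a playlist result and list (vid, title) pairs.

    Assumption: result is (entries, has_more, page_params), as in the
    sample output; the names has_more and page_params are interpretations.
    """
    entries, has_more, page_params = result
    videos = [(entry["vid"], entry["title"]) for entry in entries]
    return {
        "videos": videos,
        "has_more": bool(has_more),
        "page_params": page_params,
    }


sample_result = (
    [
        {
            "vid": "61CQm2zVVk0",
            "title": "The Avengers Earth's Mightiest Heroes Se1 - Ep01 Breakout (Part 1) - Part 01",
        },
        {
            "vid": "PRT3FjaP71E",
            "title": "The Avengers Earth's Mightiest Heroes Se1 - Ep10 Everything Is Wonderful - Screen 04",
        },
    ],
    True,
    {
        "clickTrackingParams": "CD0QybcCIhMI16ucw-G35QIV40L1BR0A1weh",
        "continuation": "4qmFsgI2EiRWTFBMczU0aUJVcUlvcER2MndSaGtxQXJsOUFFVjFQVS1nbWMaDmVnWlFWRHBEUjFFJTNE",
    },
)
summary = summarize_playlist(sample_result)
```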
  6. Links that pass testing:
