xiyaowong / spiders

622.0 21.0 209.0 3.67 MB

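Python crawlers that return information in a fixed format, download media, and expose a simple API through Flask. Supports Douyin (watermark-free), Pipixia, Kuaishou, NetEase Cloud Music, QQ Music, Migu Music, Lizhi FM audio, Zhihu video, Zuiyou voice and video, Weibo......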

License: MIT License

Python 99.88% Shell 0.12%

qqmusic 163music douyin kuaishou tudou lizhifm zhihu zuiyou music video

spiders's Introduction

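Update

This is an old project that has gone unmaintained for a long time. The code quality and style leave much to be desired, but some of the crawlers still work. The current plan is to build a simple parsing API service on the FastAPI framework. It will remain simple and rough, but it is good enough for learning or everyday use.

Switch to the fastapi branch to use it.

These are all relatively simple crawlers. An experienced developer should understand them at a glance, and beginners may still find some parts worth reading.
Details of the crawler files are here: extractor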

pip3 install -r requirements.txt
python3 extract.py

You may also need to install nodejs.

screenshot
release
Stars ⭐ and forks are welcome

spiders's People

rip-github lzjian119 1752325542 shilimin0 cnbillow simfeng sunqiang25 zhaoze92 sse001007 lieternity graceshare xaoyaoo whitespur yhonzou abdurihim seanid leaveorstay berluo kak2019 eewe11 gobyto hefangcan darkfunct blue-battery shadowmimosa fanxy121 lvfulongmy markfinding lanpice blaobla micross backupforkrepos jiesns leafisme narutoboruto pengjinfu qiuhanbao keven998 shawn-john covein ksksks2222 pantness cenchaojun ligenxun sanshao27 yuwenhou kid54001 sunjinbo2008 raodinghao 15737117639 chuyio panziqiang007 detector-m whdevlab jianyushu duhaibo0404 hiadou liulxd wrxxtt yzdsoul ziaenezhad pig3three jiapengwei q409640976 0xdeceiverangel ongbe xiaowenhuman remainsu zhengtongxue mhp08 klcintw shizhibin xunshengliuyin qswdlsr xgx0301 trock wangxianbiao ycj0217 utmcontent ljtonly copyit brianwalkertoretto sunshicheng kelvin-lin-2020 bugroom ming437 knowthyself7433 jimbunny gitchg newrain7803 wcjb arkizat suntao789 zhzack jingjie0718 fantasthu sxhylkl klzg002 pingxingshikong msopengl

spiders's Issues

kuaishou live video support

It might be good to add support for live videos. They are usually just .flv files, which you can see when loading the page. The URL would look something like this: https://live.kuaishou.com/u/LLLYYY666999

The NetEase Cloud Music API no longer works either

Hello, your AcFun interface returns m3u; could mp4 be added?

I recently saw someone else's program that can get an mp4 address for this kind of link. I don't know what the interface is; could you work it out?
The link looks like this: http://tx-safety-video.acfun.cn/mediacloud/acfun/acfun_video/_HqzwHV5zsUv5ikORMxxnCZ57jFEfr7yVZTsBkPY4IUoV9z7OHXzsWDBNN347awq.mp4?pkey=AAKf7uEXJLRQHesoGMG9YV0Lyj9zf45yhew8thKGBd3S8-SKGrykotFZE2SutIIytSib9pHPici_CWfopvgbKpc03Qj7IwAaN07B_eDm5JK2fOS4tN4dPqCWSqsSnLTRXilI4glJSuvwerGbtNZugdNWnwJRLpJp1wTIUDcikvT3cobxOuVBarRXjyBUj2HqQuIpSJ1HqVCl9TOYBtc5Hy9cNtyuv8A9ufeK4NfVbpGGWK-FYDiHgocQYAFI5VRSacZaqTP5vBT58CxLzQIz4bggpOPR8lQHnZLrUg4N8x6MMQ

kuaishou

It seems your fix that allows downloads for kuaishou has broken the desktop links again, with the same error: 'title': '快手，记录世界记录你'

Zhihu video no longer works

Please update it when you have time.

kuaishou

Traceback (most recent call last):
  File "kuaishou.py", line 53, in <module>
    Rprint(get(input("url: ")))
NameError: name 'Rprint' is not defined

set back to pprint and it's perfect
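
For readers hitting the same NameError, the workaround described above amounts to importing `pprint` from the standard library and calling it in place of the undefined `Rprint`. The following is only a minimal sketch: the `get` function here is a hypothetical stand-in, not the real extractor in kuaishou.py, and the URL is a placeholder.

```python
from pprint import pformat, pprint


def get(url: str) -> dict:
    # Hypothetical stand-in for the extractor defined in kuaishou.py;
    # it only echoes the URL back in the same dict shape.
    return {"title": "example", "videos": [url]}


result = get("https://example.com/video")
# pprint is defined by the import above, unlike Rprint.
pprint(result)
# pformat returns the same pretty-printed text as a string.
formatted = pformat(result)
```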

Cannot batch download

https://v.douyin.com/Jdg9Uxx/ This link raises an error; other links work fine

https://v.douyin.com/Jdg9Uxx/

Pipixia no longer works

Pipixia links can no longer be fetched. Could the author update it when time permits?

kuaishou issue

kuaishou gives error 404

Lizhi FM has blocked web playback

ies douyin support

Possible support for douyin desktop video link https://www.iesdouyin.com/share/video/6805024822687534344/?region=UK&mid
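
As a starting point for such support, the numeric video id could be pulled out of the share URL before any request is made. This is a sketch under assumptions: the function name `extract_share_video_id` is hypothetical and not part of the project, and the `/share/video/<id>/` path shape is inferred only from the example link above.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def extract_share_video_id(url: str) -> Optional[str]:
    """Return the numeric id from a /share/video/<id>/ path, or None."""
    parsed = urlparse(url)
    # Only accept hosts under iesdouyin.com (assumption based on the example link).
    if not parsed.netloc.endswith("iesdouyin.com"):
        return None
    match = re.search(r"/share/video/(\d+)", parsed.path)
    return match.group(1) if match else None


video_id = extract_share_video_id(
    "https://www.iesdouyin.com/share/video/6805024822687534344/?region=UK&mid"
)
```

How the id is then turned into a playable address is not covered here, since the source does not describe that step.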

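Please add more video download spiders, I really need them

Please add more video download spiders. I really need them to copy from. Many thanks.

Cannot download lossless audio from QQ Music

The result is not lossless quality.

The kuaishou parsing method seems to have been updated.

The kuaishou parsing method seems to have been updated. Any plans to upgrade?

Fix bilibili video download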

import re
import requests


def get(url: str) -> dict:
    """
    Extract the cover image, video URL and title of a bilibili video.

    Returns a dict with optional keys "imgs", "videos" and "videoName",
    or "msg" when extraction fails.
    """
    data = {}
    # Mobile (iPhone) user agent plus a bilibili Referer.
    headers = {
        "user-agent":
        "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1",
        "Referer": "https://www.bilibili.com/",
    }

    # Despite the name, this pattern matches a BV id, not a legacy av number.
    av_number_pattern = r'(BV[0-9a-zA-Z]*)'
    cover_pattern = r"image: '(.*?)',"
    video_pattern = r"video_url: '(.*?)',"
    title_pattern = r'title":"(.*?)",'

    av = re.findall(av_number_pattern, url)
    if av:
        av = av[0]
    else:
        data["msg"] = "The link may be incorrect: could not match the av number"
        return data
    # Rebuild the video page URL from the matched id.
    url = f"https://www.bilibili.com/video/{av}"

    with requests.get(url, headers=headers, timeout=10) as rep:
        if rep.status_code == 200:
            cover_url = re.findall(cover_pattern, rep.text)
            if cover_url:
                cover_url = cover_url[0]
                # Drop the "@..." suffix from the cover URL, if present.
                if '@' in cover_url:
                    cover_url = cover_url[:cover_url.index('@')]
                data["imgs"] = ['https:'+cover_url]

            video_url = re.findall(video_pattern, rep.text)
            title_text = re.findall(title_pattern, rep.text)
            if video_url:
                video_url = video_url[0]
                # Swap the Akamai mirror host for the bilivideo.com mirror.
                data["videos"] = ['https:' + video_url.replace('upos-hz-mirrorakam.akamaized.net','upos-sz-mirrorkodo.bilivideo.com')]
            if title_text:
                data["videoName"] = title_text[0]
        else:
            data["msg"] = "Failed to fetch the page"
        return data


if __name__ == "__main__":
    print(get(input("url: ")))

How to make extract.py download the mp4?

Sorry, I am a little confused. When I run extract.py, it lists all the mp4s but does not download them. How can I make it download like this? https://camo.githubusercontent.com/f9dc47c16f860c6b311fcf4cbf71e8f89621595c/68747470733a2f2f63646e2e6a7364656c6976722e6e65742f67682f786979616f776f6e672f737069646572732f73637265656e73686f742f72756e2e676966
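
One way to go from a list of mp4 addresses to files on disk is to stream each URL into a local file. The sketch below is not the project's own download code. It assumes the extractor result is a dict with a "videos" list (the shape used by the bilibili snippet on this page), and it uses the standard library's `urllib` rather than `requests` so that it stays self-contained. The names `download`, `download_all` and the `video_N.mp4` file naming are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest: Path, timeout: float = 30.0) -> Path:
    """Stream one URL to dest without loading the whole file into memory."""
    # Some hosts may also require a Referer header; that is not handled here.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def download_all(data: dict, folder: Path) -> list:
    """Download every URL in data["videos"] into folder as video_<n>.mp4."""
    folder.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(data.get("videos", [])):
        saved.append(download(url, folder / f"video_{index}.mp4"))
    return saved
```

Calling `download_all(result, Path("downloads"))` on an extractor result would then save each listed video, assuming the URLs are directly downloadable.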

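Douyin doesn't seem to work

bilibili HD mp4

I would like an interface for native bilibili 1080p mp4.

May I use part of the code from this project in my own project?

As the title says. I will not use the code for commercial purposes and will credit the source.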

Douyin issue

douyin.py produces all the valid information (video and audio), but the video link cannot be downloaded.

kuaishou.com one last issue

Your corrections fixed the v.kuaishou link, however it did not fix desktop links. With desktop links it is much easier to batch download. Here is an example desktop link

https://live.kuaishou.com/u/3x5tr2y938qzhx2/3xbt26hzgwwguaa

If you open the desktop link in a mobile browser, the watermark is removed, the same as with the v.kuaishou link.
