erma0 / douyin
Douyin scraper: collects public data from account home pages, likes, favorites, original-sound music, topics, search results, collections, individual posts, followings, and followers.
License: GNU General Public License v3.0
Download error message:
Exception caught
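Exception: [download_helper.cc:562] errorCode=1 Failed to open the file .\下载\user_🔱, cause: File not found or it is a directory
The download directory contains a download-list document named after the user, but no video folder and no video files.
Attached video list document: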
user_🔱 白子画 短视频拍摄_MS4wLjABAAAAUe1jo5bYxPJybmnDDMxh2e9A95NAvoNfJiL7JVX5nhQ.txt
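The error shows aria2c trying to open `.\下载\user_🔱`, which is the list file's name cut off at its first space; the real file name continues with `白子画 短视频拍摄_...`. One plausible, unconfirmed cause is that the list-file path reaches aria2c through an unquoted shell command string. The sketch below assumes aria2c is launched from Python and builds the argument list explicitly, so a path containing spaces or emoji stays a single argument. `build_aria2c_cmd` and `run_aria2c` are hypothetical helpers, not functions from this project.

```python
import subprocess
from pathlib import Path
from typing import List


def build_aria2c_cmd(list_file: Path, down_dir: Path) -> List[str]:
    """Build an aria2c argv list (hypothetical helper).

    Each argument is its own list element, so a file name containing
    spaces or emoji is passed to aria2c as one argument.
    """
    return ["aria2c", "--input-file", str(list_file), "--dir", str(down_dir)]


def run_aria2c(list_file: Path, down_dir: Path) -> int:
    # No shell=True: the argument list goes to the process directly,
    # so nothing splits the path on whitespace.
    completed = subprocess.run(build_aria2c_cmd(list_file, down_dir))
    return completed.returncode
```

Passing a list instead of a single string is the key design choice: it removes any dependence on shell quoting rules, which differ between Windows and Linux.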
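Runtime environment:
Command: exec.py -u users.txt
Please provide the command you ran (mainly the test URL) so the issue can be reproduced
(for example: ./douyin -u https://*/)
Problem description:
Describe the problem in detail
When logging in for the first time, the definition of save_cookies raises an error: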
def save_cookies(cookies: list, key: list[str] = None):
Error:
Exception occurred: TypeError
'type' object is not subscriptable
File "D:\codes\Douyin\login.py", line 33, in Login
def save_cookies(cookies: list, key: list[str] = []):
File "D:\codes\Douyin\login.py", line 6, in <module>
class Login(object):
TypeError: 'type' object is not subscriptable
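This is not a syntax error. list[str] is the built-in generic syntax from PEP 585, which is only supported on Python 3.9 and later. On Python 3.8 and earlier, annotations are evaluated when the def statement runs, and subscripting the built-in list type raises TypeError: 'type' object is not subscriptable. Either run the program on Python 3.9+, add from __future__ import annotations at the top of the module (Python 3.7+, so annotations are no longer evaluated at definition time), or drop the subscript and keep None as the default:
def save_cookies(cookies: list, key: list = None):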
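If the element type should stay in the hint while still running on Python 3.8, the `typing` module's `List[str]` works (it has existed since Python 3.5). The body below is a hypothetical stand-in that filters cookies by name; it is not the project's actual `save_cookies`. It also uses `None` as the default rather than the mutable `[]` that appears in the traceback.

```python
from typing import Any, Dict, List, Optional


def save_cookies(cookies: List[Dict[str, Any]],
                 key: Optional[List[str]] = None) -> List[Dict[str, Any]]:
    # Optional[List[str]] is valid on Python 3.5+, whereas list[str]
    # needs 3.9+ when the annotation is evaluated at definition time.
    # Hypothetical behaviour: keep only cookies whose name is in `key`,
    # or all cookies when `key` is None.
    if key is None:
        return list(cookies)
    wanted = set(key)
    return [c for c in cookies if c.get("name") in wanted]
```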
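In:
if self.type in ['post', 'like', 'follow', 'fans']:  # the post page requires extracting the initial page data
    self.title = render_data['42']['user']['user']['nickname']
    self.info = render_data['42']['user']  # kept for later use
The original program uses '41'; while debugging I found it has to be '42'.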
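Because the numeric key of the user entry in `render_data` has already moved from `'41'` to `'42'`, hard-coding it is fragile. A minimal sketch, assuming only the nesting used in the snippet above (`render_data[k]['user']['user']['nickname']`), searches every entry instead. `find_user_block` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional


def find_user_block(render_data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return render_data[k]['user'] for the first key k whose entry has
    ['user']['user']['nickname'], or None if no entry matches."""
    for value in render_data.values():
        if not isinstance(value, dict):
            continue
        user = value.get("user")
        if isinstance(user, dict) and isinstance(user.get("user"), dict) \
                and "nickname" in user["user"]:
            return user
    return None
```

In the branch above, `self.info` could then be set from `find_user_block(render_data)` and `self.title` from its `['user']['nickname']`, after checking that the result is not `None`.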
Also, the compiled exe does not run and I don't know why; the program only works when run from VS Code. I haven't had time to debug it yet.
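Runtime environment:
Problem description:
Apparently because of anti-scraping measures, when a profile has many videos, more videos cannot be loaded automatically, so only 19 videos or image posts can be collected per profile.
Even in a normal browser, loading more videos requires passing a slider check or clicking characters in a given order before they appear.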
Runtime environment:
Command:
Please provide the command you ran (mainly the test URL) so the issue can be reproduced
(for example: ./douyin -u https://*/)
Problem description:
Describe the problem in detail
Command: douyin -t post -u https://v.douyin.com/ia7kMcG/
OS: Windows 11
Error: requests.exceptions.SSLError: HTTPSConnectionPool(host='v.douyin.com', port=443): Max retries exceeded with url: /iaWnRyC (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1125)')))
[19128] Failed to execute script 'exec' due to unhandled exception!
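`SSLEOFError` means the TLS connection was closed unexpectedly, by the server or by something between the client and the server. If the failure is intermittent, retrying the short-link request may get past it; if it happens every time, retrying will not help and the network path (for example a proxy) is worth checking. The sketch below is a generic retry-with-backoff helper, not code from this project. It relies on `requests.exceptions.SSLError` being a subclass of `OSError` (through `IOError`), so the default `retry_on=(OSError,)` covers it.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_call(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
               retry_on: Tuple[Type[BaseException], ...] = (OSError,)) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff.

    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
            time.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be >= 1")
```

For example, `retry_call(lambda: requests.get(url, timeout=10))` would resolve a short link with up to three attempts.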
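Runtime environment:
Command:
Please provide the command you ran (mainly the test URL) so the issue can be reproduced
./douyin -b chrome -n 2 -u https://www.douyin.com/user/MS4wLjABAAAA_ELoNs05CNtn3foI5YZnvV25tlVectip3-uFFokqq_iSUtS6jakIkOBzSVMn5vc5?vid=7312361433088445730
Problem description:
During collection, the error says the Douyin object has no info attribute
Describe the problem in detail
It worked fine a few days ago, but today collection fails with the following error: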
Traceback (most recent call last):
File "exec.py", line 91, in <module>
File "click/core.py", line 1157, in __call__
File "click/core.py", line 1078, in main
File "click/core.py", line 1434, in invoke
File "click/core.py", line 783, in invoke
File "exec.py", line 74, in main
File "exec.py", line 83, in start
File "spider.py", line 513, in run
File "spider.py", line 449, in page_init
AttributeError: 'Douyin' object has no attribute 'info'
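The `AttributeError` means `page_init` read `self.info` before anything assigned it, which suggests the branch that extracts it from the page data was skipped, for example because the expected `render_data` entry was missing. A sketch of a clearer failure mode, using a hypothetical `ProfilePage` class rather than the project's `Douyin` class: define the attribute in `__init__` and raise a descriptive error when extraction fails.

```python
from typing import Any, Dict, Optional


class ProfileDataError(RuntimeError):
    """Raised when the profile data cannot be found in the page."""


class ProfilePage:
    def __init__(self) -> None:
        # Always define the attributes, so a failed extraction is reported
        # explicitly instead of surfacing later as an AttributeError.
        self.title: Optional[str] = None
        self.info: Optional[Dict[str, Any]] = None

    def page_init(self, render_data: Dict[str, Any], key: str = "42") -> None:
        entry = render_data.get(key)
        user = entry.get("user") if isinstance(entry, dict) else None
        if not isinstance(user, dict) or not isinstance(user.get("user"), dict):
            raise ProfileDataError(
                f"render_data[{key!r}]['user'] not found; the page layout may have changed")
        self.info = user
        self.title = user["user"].get("nickname")
```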
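Hi, do you plan to add a feature that collects a specified user's following list and writes it to a file?
Hey, render_data can no longer be retrieved. What should I do?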
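Earlier versions of the Douyin web page embedded their initial data in a `<script id="RENDER_DATA" type="application/json">` element holding URL-encoded JSON; whether the page still does so is exactly what this report calls into question. The sketch below assumes that layout, uses only the standard library, and returns `None` instead of crashing when the element is absent, so the caller can report that the page structure has changed. `extract_render_data` is a hypothetical helper name.

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional
from urllib.parse import unquote


class _RenderDataParser(HTMLParser):
    """Collect the text inside <script id="RENDER_DATA">, if present."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "RENDER_DATA":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_render_data(html: str) -> Optional[Dict[str, Any]]:
    parser = _RenderDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        return None  # element missing: the page layout has probably changed
    try:
        # The script body is URL-encoded JSON in the assumed layout.
        return json.loads(unquote("".join(parser.chunks)))
    except ValueError:
        return None
```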
Test URL:
https://www.douyin.com/user/MS4wLjABAAAAeYMREDSRXRWVVy3bk8ielQS59pkqnP-RmxZu5LTB5m-rEnOr0cbTEU-12RupXxAx
Any other creator's profile can be used instead.
def save(self):
    _.append(
        f'{line["download_addr"]}\n\tdir={self.down_path}\n\tout={filename}.mp4\n\tuser-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36\n\theader=Cookie:{msToken}\n')
msToken is generated by making one more request with auth.json.
If aria2c downloads without this cookie, the server seems to return an empty response or a 403.
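In an aria2c input file (passed with `-i`), a URI line can be followed by lines that start with whitespace and set options for that URI only, such as `dir`, `out`, `user-agent`, and `header`, which is the format the `save` snippet produces. The sketch below builds one such entry. It assumes the cookie string is already in `name=value` form (for example `msToken=<value>`); if the `msToken` variable in the snippet holds only the token value, the resulting `Cookie:` header carries no cookie name, which could explain the empty or 403 responses. `aria2_entry` and `write_input_file` are hypothetical helpers, not the project's `save` method.

```python
from typing import Iterable


def aria2_entry(url: str, down_dir: str, filename: str,
                user_agent: str, cookie: str) -> str:
    """Return one aria2c input-file entry: the URI line followed by
    tab-indented options that apply to that URI only."""
    options = [
        f"dir={down_dir}",
        f"out={filename}.mp4",
        f"user-agent={user_agent}",
        # Sent as an HTTP request header for this URI; `cookie` is
        # assumed to look like "msToken=<value>".
        f"header=Cookie: {cookie}",
    ]
    return url + "\n" + "".join(f"\t{opt}\n" for opt in options)


def write_input_file(path: str, entries: Iterable[str]) -> None:
    # One entry per video, written as UTF-8 text.
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(entries)
```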
def handle(self, route: Route):
<Route request=>
This is the endpoint; it now returns an empty, non-JSON response.
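When an endpoint starts returning an empty or non-JSON body, decoding it as JSON raises an exception deep inside the route handler. A small guard, shown here as a hypothetical helper that takes the response body text, lets `handle` log the problem and skip the page instead of crashing.

```python
import json
from typing import Any, Optional


def parse_json_body(body: str) -> Optional[Any]:
    """Return the decoded JSON value, or None if the body is empty or not JSON."""
    text = body.strip()
    if not text:
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

In a Playwright handler the body text could come from `route.fetch().text()` (available since Playwright 1.29); a `None` result should be treated as a rejected request rather than as an empty data page.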
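I'd like to ask the author how to solve the problem in the title; the Douyin web version no longer seems to support this service.
Linux
The cover the current code scrapes is not the real cover but the first frame of the short video. Is there a method or API for scraping the actual video cover?
When headless is True, the scraper cannot move on to the next page and keeps printing "retry +1"; with headless set to False it works normally.
It works very well. Would the author be interested in extending it to download from TikTok, the international version?
Hi, how can I collect only the few most recently updated videos? The current -l 5 option does not return the newest videos.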
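One possible, unconfirmed reason `-l 5` does not yield the newest posts is that a profile feed can begin with pinned posts, so the first five items are not necessarily the five most recent. If each item carries a creation timestamp, collecting somewhat more than needed and sorting locally avoids this. The sketch assumes a hypothetical `create_time` field holding a Unix timestamp; `latest` is a hypothetical helper.

```python
import heapq
from typing import Any, Dict, Iterable, List


def latest(items: Iterable[Dict[str, Any]], n: int) -> List[Dict[str, Any]]:
    """Return the n items with the largest create_time, newest first.

    Items without a create_time are treated as the oldest.
    """
    return heapq.nlargest(n, items, key=lambda it: it.get("create_time", 0))
```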
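Also, when running douyin.exe -t https://...., after logging in, the auth.json file generated in the current directory contains {"cookies": null}. On the next run this file has to be deleted and the login repeated; otherwise the following error occurs: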
Traceback (most recent call last):
File "douyin.py", line 322, in <module>
File "click\core.py", line 1130, in __call__
File "click\core.py", line 1055, in main
File "click\core.py", line 1404, in invoke
File "click\core.py", line 760, in invoke
File "douyin.py", line 298, in main
File "douyin.py", line 305, in start
File "douyin.py", line 230, in run
File "playwright\sync_api\_generated.py", line 14048, in new_context
File "playwright\_impl\_sync_base.py", line 104, in _sync
File "playwright\_impl\_browser.py", line 126, in new_context
File "playwright\_impl\_connection.py", line 61, in send
File "playwright\_impl\_connection.py", line 461, in wrap_api_call
File "playwright\_impl\_connection.py", line 96, in inner_send
playwright._impl._api_types.Error: storageState.cookies: expected array, got object
[9872] Failed to execute script 'douyin' due to unhandled exception!
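Playwright's `browser.new_context(storage_state=...)` expects `cookies` to be a list, so a saved file containing `{"cookies": null}` fails with the error above (JavaScript reports `typeof null` as `"object"`). A sketch of a loader that treats such a file as missing, so the program falls back to a fresh login instead of crashing. `load_auth_state` is a hypothetical name, and deleting the unusable file is a design choice of this sketch, not the project's current behaviour.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def load_auth_state(path: Path) -> Optional[Dict[str, Any]]:
    """Return the saved storage state if it is usable, else None.

    An unreadable file, invalid JSON, or a non-list "cookies" value makes
    the file unusable; it is removed so the next run starts a fresh login.
    """
    try:
        state = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None
    except (OSError, ValueError):
        state = None
    if isinstance(state, dict) and isinstance(state.get("cookies"), list):
        return state
    path.unlink(missing_ok=True)  # missing_ok requires Python 3.8+
    return None
```

When the result is not `None`, it can be passed directly as `browser.new_context(storage_state=state)`.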
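As in the title. Also, cookie-based login verification no longer works, so short videos can no longer be downloaded.
How do I build an API service with API.py?
When scraping a topic, each list request returns 10 posts, but the cursor is never updated for subsequent requests because the cursor in every returned list is 0. As a result, when limit > 10, the same first 10 videos are scraped over and over.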
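Cursor-based paging relies on each response returning the cursor for the next page; when the topic endpoint always returns `cursor: 0`, a loop that trusts it keeps requesting the first page. A defensive sketch, assuming a hypothetical `fetch_page(cursor)` that returns `(items, next_cursor, has_more)` and items with a hypothetical `aweme_id` field: it advances the cursor by the number of items received whenever the server's cursor does not move, de-duplicates by ID, and stops once a page brings nothing new. Whether the endpoint accepts an offset-style cursor is an assumption that needs checking.

```python
from typing import Any, Callable, Dict, List, Tuple

Page = Tuple[List[Dict[str, Any]], int, bool]


def collect(fetch_page: Callable[[int], Page], limit: int) -> List[Dict[str, Any]]:
    seen = set()
    results: List[Dict[str, Any]] = []
    cursor = 0
    while len(results) < limit:
        items, next_cursor, has_more = fetch_page(cursor)
        new = [it for it in items if it["aweme_id"] not in seen]
        seen.update(it["aweme_id"] for it in new)
        results.extend(new)
        if not has_more or not items or not new:
            break  # no more pages, or the server keeps repeating itself
        # Trust the server cursor only if it actually advances.
        cursor = next_cursor if next_cursor > cursor else cursor + len(items)
    return results[:limit]
```

Stopping when a page contains no new IDs guarantees the loop terminates even if the server ignores the cursor entirely.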
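Runtime environment:
Command:
Run douyin.exe directly, enter a Douyin web URL, and run it.
Problem description:
It reports that scraping succeeded, but every video in the results is 0 bytes; nothing was actually downloaded.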
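Zero-byte files can be left behind when the server answers with an empty body or an error status that the downloader does not treat as a failure. A quick post-download check, using only the standard library, lists such files so they can be reported or retried instead of being counted as successes. The `*.mp4` pattern and the helper name `find_empty_downloads` are assumptions of this sketch.

```python
from pathlib import Path
from typing import List


def find_empty_downloads(down_dir: Path, pattern: str = "*.mp4") -> List[Path]:
    """Return files under down_dir (recursively) matching pattern with size 0."""
    return sorted(p for p in down_dir.rglob(pattern)
                  if p.is_file() and p.stat().st_size == 0)
```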
I tried adding it myself, but could not find a suitable way to implement pagenext.