dermasmid / scrapetube
A YouTube scraper for scraping channels, playlists, and searching 🔎
Home Page: https://scrapetube.readthedocs.io/en/latest/
License: MIT License
In [764]: import scrapetube
...: videos = scrapetube.get_channel('@ToolingUSME')
...: vs=list(videos)
...: print(U.stime(),len(vs))
~/anaconda3/lib/python3.9/site-packages/scrapetube/scrapetube.py in get_channel(channel_id, channel_url, limit, sleep, sort_by)
48 api_endpoint = "https://www.youtube.com/youtubei/v1/browse"
49 videos = get_videos(url, api_endpoint, "videoRenderer", limit, sleep)
---> 50 for video in videos:
51 yield video
52
~/anaconda3/lib/python3.9/site-packages/scrapetube/scrapetube.py in get_videos(url, api_endpoint, selector, limit, sleep)
148 if is_first:
149 html = get_initial_data(session, url)
--> 150 client = json.loads(
151 get_json_from_html(html, "INNERTUBE_CONTEXT", 2, '"}},') + '"}}'
152 )["client"]
~/anaconda3/lib/python3.9/json/__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
344 parse_int is None and parse_float is None and
345 parse_constant is None and object_pairs_hook is None and not kw):
--> 346 return _default_decoder.decode(s)
347 if cls is None:
348 cls = JSONDecoder
~/anaconda3/lib/python3.9/json/decoder.py in decode(self, s, _w)
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()
339 if end != len(s):
~/anaconda3/lib/python3.9/json/decoder.py in raw_decode(self, s, idx)
353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
When attempting to extract videos from a YouTube playlist that contains only Shorts (example: https://www.youtube.com/playlist?list=PLxPmekEwS6-bF6AWu3prP2iRltY6Oslrw) using the get_videos function with the playlistVideoRenderer selector, the function fails to capture any videos. This issue does not occur with playlists containing regular videos or a mix of videos and Shorts.
Steps to reproduce: call the get_videos function with a playlist URL that contains only YouTube Shorts, passing playlistVideoRenderer as the selector parameter.
Expected behavior: the function should capture and yield videos from the playlist, regardless of whether they are Shorts or regular videos.
Actual behavior: no videos are captured or yielded when the playlist consists solely of Shorts.
A conditional check was implemented to solve this issue: if no videos are captured with playlistVideoRenderer, the function falls back to reelItemRenderer instead. Note that any(videos) would silently consume the generator's first item, so the updated snippet peeks with next() and chains the item back:

videos = get_videos(url, api_endpoint, "playlistVideoRenderer", limit, sleep)
first = next(videos, None)  # peek to see whether the generator is empty
if first is None:
    # Shorts-only playlists expose their items via reelItemRenderer
    videos = get_videos(url, api_endpoint, "reelItemRenderer", limit, sleep)
else:
    # put the peeked item back (requires "import itertools" at module top)
    videos = itertools.chain([first], videos)
for video in videos:
    yield video
Can I get video titles?
I would like to request a new feature to retrieve video titles and publication dates using a video ID.
The feature could be implemented by adding a new function, e.g. scrapetube.get_video_metadata(video_id: str), which accepts a video ID and returns the video's title and date of publication.
Otherwise, this might work:
https://github.com/egbertbouman/youtube-comment-downloader
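A minimal sketch of the proposed helper, assuming the scrapetube.get_video function mentioned further down this page and the usual videoPrimaryInfoRenderer layout; the "title" and "dateText" key names are assumptions about YouTube's response shape:

import scrapetube

def get_video_metadata(video_id: str) -> dict:
    # Hypothetical helper built on scrapetube.get_video, which returns
    # the videoPrimaryInfoRenderer block for a single video.
    info = scrapetube.get_video(video_id)
    # "title" runs and "dateText" are assumed renderer keys; they can
    # change whenever YouTube reshuffles its response format.
    title = "".join(run["text"] for run in info["title"]["runs"])
    published = info["dateText"]["simpleText"]
    return {"title": title, "published": published}

print(get_video_metadata("dQw4w9WgXcQ"))  # example video id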
Hi dermasmid,
First, thanks for your work; I really appreciate it.
I'm testing scrapetube, and I think that adding support for YouTube users (a get_user function, similar to get_channel) would expand its usefulness. I modified it in order to test.
Again thanks.
Scrapetube's documentation says this about the parameter channel_url:
channel_url (str, optional) – The url to the channel you want to get the videos for. Since there are a few types of channel urls, you can use the one you want by passing it here instead of using channel_id.
...yet I can't actually use that parameter:
>>> videos = scrapetube.channel_url("https://www.youtube.com/c/Cmaj7/videos")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'scrapetube' has no attribute 'channel_url'
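For reference, channel_url is a keyword argument of get_channel, not a module-level function; based on the documentation quoted above, a corrected call would look like this:

import scrapetube

# channel_url is passed to get_channel rather than called directly
videos = scrapetube.get_channel(channel_url="https://www.youtube.com/c/Cmaj7/videos")
for video in videos:
    print(video["videoId"])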
Some of the data returned by the internal YouTube API is localized, e.g. the upload date string. If no Accept-Language header is present, YouTube will infer the language from the user's IP.
This leads to inconsistent output data and possible parsing errors if scrapetube is used in a non-English-speaking country.
That's why I would suggest setting the Accept-Language header to en. I am from Germany and can confirm that YouTube outputs English date strings with that setting.
session = requests.Session()
session.headers["User-Agent"] = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36"
)
session.headers["Accept-Language"] = "en"
I have plenty of channel links from which I would like to get the video thumbnail URLs (Shorts), so that I can add them to a Google sheet. Can scrapetube achieve that?
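A rough sketch of what that could look like, assuming the content_type="shorts" filter that appears in other snippets on this page and the thumbnail layout from the sample output further down; the channel URL is a placeholder:

import scrapetube

# Placeholder channel URL; substitute each channel link from the sheet.
videos = scrapetube.get_channel(
    channel_url="https://www.youtube.com/@SomeChannel",
    content_type="shorts",
)
for video in videos:
    # Each entry lists several thumbnail sizes; the largest comes last.
    thumbnails = video["thumbnail"]["thumbnails"]
    print(video["videoId"], thumbnails[-1]["url"])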
from flask import Flask, jsonify
from flask_cors import CORS
import scrapetube

app = Flask(__name__)
CORS(app)

@app.route("/")
def hello_world():
    return "Hello world"

@app.route("/name/<string:song_name>")
def search(song_name):
    videos = scrapetube.get_search(song_name, limit=1, sleep=1)
    for video in videos:
        return video["videoId"]

if __name__ == "__main__":
    app.run(debug=True)
Previously we could get all video data for each channel. However, recently it is limited to only the last 20k videos.
How can we address this problem?
This is not a bug; it's a feature request!
YouTube supports at least two formats of channel url:
/channel/{channel_id}, which is currently covered by the default channel_id argument in the get_channel() function
/c/{channel_name}, which typically pops up when YouTube channels are big enough that they get claim to an actual "name", vs a randomized ID value.
The current workaround for this is to utilize the current channel_url argument; however, I thought it might prove more streamlined to support an actual channel_name argument in case users want to search videos by a list of actual channel names!
Here's an example of the additional format which this argument would support (one of my favorite streamers): https://www.youtube.com/c/Welyn
I'll make a PR with the changes; open to thoughts / comments / concerns! And, as always, thank you for your work on this useful package / contribution to the open source community!
Google is giving more and more importance to YouTube Music (see the upcoming Google Podcasts deprecation).
I am wondering if there will be the possibility to extend scrapetube's capabilities to YouTube Music as well, to search songs, podcasts, and playlists.
I noticed that podcasts are shown as playlists in YT Music; here is an example URL: https://music.youtube.com/playlist?list=PLNVOpW1eYiyzcqZkYP0PvPkRgAZrbwidN
Currently it appears that the get_playlist function is only able to collect 100 videos, and if the playlist is longer, it simply skips over the remaining videos. Is it possible for this to be fixed?
class notify(commands.Cog):
    def __init__(self, bot):
        self.bot = bot
        self.channels = {
            "Airi Viridis Ch. 【V-Dere】": "@AiriViridis"
        }
        self.videos = {}

    @commands.Cog.listener()
    async def on_ready(self):
        self.check.start()

    @tasks.loop(seconds=60)
    async def check(self):
        discord_channel = self.bot.get_channel(838365746069372982)
        for channel_name in self.channels:
            videos = scrapetube.get_channel(channel_url=self.channels[channel_name], limit=1, content_type="streams")
            video_ids = [video["videoId"] for video in videos]
            print(video_ids)
            if self.check.current_loop == 0:
                self.videos[channel_name] = video_ids
                continue
            for video_id in video_ids:
                if video_id not in self.videos[channel_name]:
                    url = f"https://youtu.be/{video_id}"
                    await discord_channel.send(f"@everyone\n{url}")
            self.videos[channel_name] = video_ids

def setup(bot):
    bot.add_cog(notify(bot))
I don't know if I'm doing something wrong, but when the bot is running, it sends old videos, some a month old or more.
Hello,
Any planned integration with Zyte?
The get_channel function does not gather YouTube "Shorts", which are short-duration videos on YouTube.
Very interesting and useful library.
I wonder if there is a way to force the traffic to go through a proxy instead of directly to YouTube, so as not to get blocked.
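One possible workaround, rather than a scrapetube feature: requests, which scrapetube uses internally, honors the standard proxy environment variables, so setting them before scraping should route the traffic through a proxy. The proxy address below is a placeholder:

import os

# Placeholder proxy address; requests picks these variables up automatically.
os.environ["HTTP_PROXY"] = "http://proxy.example.com:8080"
os.environ["HTTPS_PROXY"] = "http://proxy.example.com:8080"

import scrapetube

videos = scrapetube.get_search("python", limit=5)
for video in videos:
    print(video["videoId"])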
I'm getting the same error as #37 and #36
The package is up to date, as it was installed after the latest release.
Unhandled exception in internal background task 'upload_check'.
Traceback (most recent call last):
File "C:\Users\lee_p\AppData\Roaming\Python\Python311\site-packages\disnake\ext\tasks_init_.py", line 162, in loop
await self.coro(*args, **kwargs)
File "C:\Users\lee_p\Documents\Programming\The Handler\cogs\youtube.py", line 27, in upload_check
for latest_video in videos:
File "C:\Users\lee_p\AppData\Roaming\Python\Python311\site-packages\scrapetube\scrapetube.py", line 75, in get_channel
for video in videos:
File "C:\Users\lee_p\AppData\Roaming\Python\Python311\site-packages\scrapetube\scrapetube.py", line 199, in get_videos
client = json.loads(
^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\json_init.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
it was initially working but suddenly stopped
I would like to exit the get_channel generator after I have reached the video I want. However, it seems there is no safe exit; a socket is left open. Below is the code:
import scrapetube

channel_videos_generator = scrapetube.get_channel(channel_id=channel_id, sort_by="newest")
for video in channel_videos_generator:
    if video["videoId"] != since_video_id:
        yield video["videoId"]
    else:
        channel_videos_generator.close()
When I run the above code, I get the warning below:
ResourceWarning: unclosed <ssl.SSLSocket fd=756, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('192.168.0.20', 58222), raddr=('216.58.223.78', 443)>
channel_videos_generator.close()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
I have a video ID from user input. How can I get the info from the ID itself?
Currently get_video only returns the "videoPrimaryInfoRenderer" part of "ytInitialData", although for some other applications having "ytInitialPlayerResponse" would be useful as well, e.g. for storyboard thumbnails.
def get_video(
    id: str,
) -> dict:
    """Get a single video.

    Parameters:
        id (str):
            The video id from the video you want to get.
    """

    session = get_session()
    url = f"https://www.youtube.com/watch?v={id}"
    html = get_initial_data(session, url)
    client = json.loads(
        get_json_from_html(html, "INNERTUBE_CONTEXT", 2, '"}},') + '"}}'
    )["client"]
    session.headers["X-YouTube-Client-Name"] = "1"
    session.headers["X-YouTube-Client-Version"] = client["clientVersion"]
    data = json.loads(
        get_json_from_html(html, "var ytInitialData = ", 0, "};") + "}"
    )
    ytInitialPlayerResponse = json.loads(
        get_json_from_html(html, "var ytInitialPlayerResponse = ", 0, "};") + "}"
    )
    returning = next(search_dict(data, "videoPrimaryInfoRenderer"))
    returning["ytInitialPlayerResponse"] = ytInitialPlayerResponse
    return returning
Something like this would be appreciated.
If I copy the output of the search result (print(videos)) and paste it into a JSON editor, the editor fails to parse the output, because the names are single-quoted, i.e. {'name': 'value'}. Shouldn't it be {"name": "value"}?
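For what it's worth, that output is Python's dict repr rather than JSON; serializing each result with the standard json module yields valid, double-quoted JSON:

import json

import scrapetube

videos = scrapetube.get_search("python", limit=1)
for video in videos:
    # json.dumps emits {"name": "value"} instead of {'name': 'value'}
    print(json.dumps(video, indent=2))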
import scrapetube
videos = scrapetube.get_channel("UCCezIgC97PvUuR4_gbFUs5g")
for video in videos:
print(video['videoId'])
results in:
Traceback (most recent call last):
File "/tmp/test/main.py", line 5, in <module>
for video in videos:
File "/tmp/test/scrapetube.py", line 75, in get_channel
for video in videos:
File "/tmp/test/scrapetube.py", line 199, in get_videos
client = json.loads(
^^^^^^^^^^^
File "/usr/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I only see this in the output:
'publishedTimeText': {
'simpleText': 'Streamed 59 minutes ago'
},
Is there a way to get a proper date?
I've been trying to extract video links from Mixes (not playlists) on YouTube, but it seems the page is formatted differently from playlists.
Would it be possible to add this functionality to scrapetube?
Ex.:
Playlist link: https://www.youtube.com/playlist?list=PLx9cYq2MA5pf3QAGDuvosf0TgtMCbmBfR
Mix link: https://www.youtube.com/watch?v=ts9H1v0Ob4o&list=RDts9H1v0Ob4o
It seems that getting videos for a channel is not working because the HTML YouTube responds with doesn't contain the desired JSON anymore.
Consider the example
import scrapetube
videos = scrapetube.get_channel("UCCezIgC97PvUuR4_gbFUs5g")
for video in videos:
print(video['videoId'])
it currently only leads to
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "[removed for privacy]/.local/lib/python3.8/site-packages/scrapetube/scrapetube.py", line 46, in get_channel
for video in videos:
File "[removed for privacy]/.local/lib/python3.8/site-packages/scrapetube/scrapetube.py", line 142, in get_videos
client = json.loads(
File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I'm using the latest version 2.2.2. The search example still works fine.
Is it possible to get the video duration?
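Judging by the sample output shown elsewhere on this page, each video dict carries a lengthText block with the duration, so a sketch like this should work for regular videos:

import scrapetube

videos = scrapetube.get_channel("UCCezIgC97PvUuR4_gbFUs5g", limit=3)
for video in videos:
    # lengthText holds the duration, e.g. "27:07"; it may be missing
    # for live streams, so fall back to None rather than raising.
    length = video.get("lengthText", {}).get("simpleText")
    print(video["videoId"], length)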
What is the limit of scrapetube?
I am using it to get all video links at once from YouTube channels, channels that have a lot of videos. Doing this by hand is not humanly possible, so we need software to do it. But scrapetube can't manage it.
I just want to know why. Is it possible to get 22,000 video links at once or not?
The get_playlist method without the limit arg always returns at most 100 videos; I noticed this when I tried to get a playlist with 1000 videos.
This is a feature request to support Live Chat messages. Here's an example:
Using the Replit free environment, I kept getting an error message when using the scrapetube.get_video function.
I don't understand why, because the code looks fine.
It works after I created a custom Python file that keeps only the functions I need for get_video.
import json
from typing import Generator

import requests
from typing_extensions import Literal

type_property_map = {
    "videos": "videoRenderer",
    "streams": "videoRenderer",
    "shorts": "reelItemRenderer"
}

def get_session() -> requests.Session:
    session = requests.Session()
    session.headers["User-Agent"] = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
    )
    session.headers["Accept-Language"] = "en"
    return session

def get_initial_data(session: requests.Session, url: str) -> str:
    session.cookies.set("CONSENT", "YES+cb", domain=".youtube.com")
    response = session.get(url, params={"ucbcb": 1})
    html = response.text
    return html

def get_json_from_html(html: str, key: str, num_chars: int = 2, stop: str = '"') -> str:
    pos_begin = html.find(key) + len(key) + num_chars
    pos_end = html.find(stop, pos_begin)
    return html[pos_begin:pos_end]

def search_dict(partial: dict, search_key: str) -> Generator[dict, None, None]:
    stack = [partial]
    while stack:
        current_item = stack.pop(0)
        if isinstance(current_item, dict):
            for key, value in current_item.items():
                if key == search_key:
                    yield value
                else:
                    stack.append(value)
        elif isinstance(current_item, list):
            for value in current_item:
                stack.append(value)

def get_video(
    id: str,
) -> dict:
    """Get a single video.

    Parameters:
        id (str):
            The video id from the video you want to get.
    """

    session = get_session()
    url = f"https://www.youtube.com/watch?v={id}"
    html = get_initial_data(session, url)
    client = json.loads(
        get_json_from_html(html, "INNERTUBE_CONTEXT", 2, '"}},') + '"}}'
    )["client"]
    session.headers["X-YouTube-Client-Name"] = "1"
    session.headers["X-YouTube-Client-Version"] = client["clientVersion"]
    data = json.loads(
        get_json_from_html(html, "var ytInitialData = ", 0, "};") + "}"
    )
    return next(search_dict(data, "videoPrimaryInfoRenderer"))
Hey,
Firstly, amazing job on the code, saved a ton of time for me.
I have noticed that the pip package doesn't include the content_type filter in the source code. I thought I would point that out.
Cheers
Hi, I am facing issues after the latest update of the package.
With scrapetube.get_channel("UCCezIgC97PvUuR4_gbFUs5g", sort_by="popular"), the function is not retrieving popular videos; instead it gets only the recent videos.
Please fix it.
Hi,
I'm trying to use scrapetube on Ubuntu 20.04 with Python 3.8.10 behind a company proxy, set up as follows:
proxies = {"http": "http://x.x.253.137:80", "https": "http://x.x.253.137:80"}
scrapetube returns:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/blairfancy/.local/lib/python3.8/site-packages/scrapetube/scrapetube.py", line 121, in get_search
for video in videos:
File "/home/blairfancy/.local/lib/python3.8/site-packages/scrapetube/scrapetube.py", line 138, in get_videos
client = json.loads(
File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I suspect that the proxy blocks the connection.
This is a feature request to add support to retrieve a list of available playlists in a channel. Example:
https://stackoverflow.com/a/26833637
It would be nice to have this feature when getting videos from a channel.
video['publishedTimeText']['simpleText'] returns strings like "13 days ago" or "2 weeks ago", but the result changes over time.
Also, there are some issues when converting such text to a date.
Is there a way to check the publication date of a video while running the get_channel function?
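One rough workaround is to convert the relative label into an approximate timestamp at scrape time; the unit table below is an assumption (months and years are rounded) and only as precise as YouTube's labels:

import re
from datetime import datetime, timedelta

# Approximate seconds per unit; "month" and "year" are rounded values.
UNIT_SECONDS = {
    "second": 1, "minute": 60, "hour": 3600, "day": 86400,
    "week": 604800, "month": 2592000, "year": 31536000,
}

def approximate_date(published_text: str) -> datetime:
    # Handles labels like "13 days ago" or "Streamed 59 minutes ago".
    match = re.search(
        r"(\d+)\s+(second|minute|hour|day|week|month|year)s?\s+ago",
        published_text,
    )
    if match is None:
        raise ValueError(f"unrecognized format: {published_text!r}")
    count, unit = int(match.group(1)), match.group(2)
    return datetime.now() - timedelta(seconds=count * UNIT_SECONDS[unit])

print(approximate_date("13 days ago"))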
Hello
Am I doing something wrong? It seems like get_search always finds around 500-600 results even if there are more (see screenshot below), and the count is not the same from one run to the next (600 for "global warming" vs 2 million in reality?).
Do you know why?
In [1]: import scrapetube
In [2]: videos = scrapetube.get_search("global warming")
In [3]: len([1 for _ in videos])
Out[3]: 604
In [4]: videos = scrapetube.get_search("global warming")
In [5]: len([1 for _ in videos])
Out[5]: 605
In [6]: videos = scrapetube.get_search("global warming")
In [7]: len([1 for _ in videos])
Out[7]: 602
Issue: when posting a new video, the first scrape returns the correct video IDs including the new video; the next scrape returns the previous videos without the new video; the third scrape returns the correct video IDs again.
How to reproduce:
Every 70 seconds, run:
videos = scrapetube.get_channel(channel_username="<your YouTube username>")
video_ids = [video['videoId'] for video in videos]
print(video_ids)
Run the script and wait for a print to compare with, then post a video on your YouTube channel, wait for 3+ prints, and compare the results.
I'll use numbers instead of YouTube video IDs to demonstrate the results I get; think of each number as a video ID.
[5,4,3,2,1] (state of the channel before new video posted)
[6,5,4,3,2] (new video posted)
[5,4,3,2,1] (scrape now returns the old state of the channel, the previous 5 videos, is this from cache?)
[6,5,4,3,2] (from now on, it returns the correct video ID's)
[6,5,4,3,2]
[6,5,4,3,2]
EDIT: I increased the scrape period to 120 seconds and that worked
Partly working for me.
The pip install version did not work; can't remember the error. (Python 3.8)
I ended up downloading the repository as a zip file and unzipped it into my desired folder.
I now use the lovely code via a modified version of the test script in the 'tests' folder.
That works lovely.
Thanks for sharing 💯
Hello, this library you made is very good, I really like it, but I have a problem while using it.
When I try to get a list of videos from a YouTube channel using channel_url, it doesn't return anything. You can see my code below:
>>> from scrapetube import get_channel
>>> list(get_channel(channel_url = "https://youtube.com/@INSOMNIAFILM?si=idnQuTmk6g5XODzT", limit = 1))
[]
When I use channel_url it doesn't return anything, but when I use the id of the youtube channel it returns a list of videos.
>>> list(get_channel(channel_id='UCoWUsYrb3xtukm91d3_S3jw', limit=1))
[{'videoId': 'WH692kPmQfQ', 'thumbnail': {'thumbnails': [{'url': 'https://i.ytimg.com/vi/WH692kPmQfQ/hqdefault.jpg?sqp=-oaymwEbCKgBEF5IVfKriqkDDggBFQAAiEIYAXABwAEG&rs=AOn4CLDj0SgQg8icqmdeYnY8SuhU-Nm9aQ', 'width': 168, 'height': 94}, {'url': 'https://i.ytimg.com/vi/WH692kPmQfQ/hqdefault.jpg?sqp=-oaymwEbCMQBEG5IVfKriqkDDggBFQAAiEIYAXABwAEG&rs=AOn4CLDZ2tkPU-XkFB3gd56YAA9kjvp1Qw', 'width': 196, 'height': 110}, {'url': 'https://i.ytimg.com/vi/WH692kPmQfQ/hqdefault.jpg?sqp=-oaymwEcCPYBEIoBSFXyq4qpAw4IARUAAIhCGAFwAcABBg==&rs=AOn4CLBy3H3kE0Nwo7-zP0E8SyRogwUyDw', 'width': 246, 'height': 138}, {'url': 'https://i.ytimg.com/vi/WH692kPmQfQ/hqdefault.jpg?sqp=-oaymwEcCNACELwBSFXyq4qpAw4IARUAAIhCGAFwAcABBg==&rs=AOn4CLBUvFhCh4Ty2CzJy-mnRG9EZ2xEvg', 'width': 336, 'height': 188}]}, 'title': {'runs': [{'text': 'HABIS NONTON FILM INI LANGSUNG PRAKTEKIN ILMUNYA...'}], 'accessibility': {'accessibilityData': {'label': 'HABIS NONTON FILM INI LANGSUNG PRAKTEKIN ILMUNYA... by INSOMNIA FILM 343,444 views 4 days ago 27 minutes'}}}, 'descriptionSnippet': {'runs': [{'text': '"Terinspirasi" dari Film LE BRIO\n\nPengisi Suara :\nRichard Chandra\n\nManager :\nJoko Mulyanto\n\nPenulis :\nSurya Yahya Wijaya\n\nEditor :\nAnton Ramdhani\nDedi Hidayat\nAchmad Regi Permana\n\nMakasih udah...'}]}, 'publishedTimeText': {'simpleText': '4 days ago'}, 'lengthText': {'accessibility': {'accessibilityData': {'label': '27 minutes, 7 seconds'}}, 'simpleText': '27:07'}, 'viewCountText': {'simpleText': '343,444 views'}, 'navigationEndpoint': {'clickTrackingParams': 'CO0BENwwIhMI1uPzk5qsgQMVY071BR1sNwJcWhhVQ29XVXNZcmIzeHR1a205MWQzX1MzaneaAQMQ8jg=', 'commandMetadata': {'webCommandMetadata': {'url': '/watch?v=WH692kPmQfQ', 'webPageType': 'WEB_PAGE_TYPE_WATCH', 'rootVe': 3832}}, 'watchEndpoint': {'videoId': 'WH692kPmQfQ', 'watchEndpointSupportedOnesieConfig': {'html5PlaybackOnesieConfig': {'commonConfig': {'url': 'https://rr1---sn-uxa3vhnxa-ngpe.googlevideo.com/initplayback?source=youtube&oeis=1&c=WEB&oad=3200&ovd=3200&oaad=11000&oavd=11000&ocs=700&oewis=1&oputc=1&ofpcc=1&beids=24350018&msp=1&odepv=1&id=587ebdda43e641f4&ip=182.3.140.209&initcwndbps=270000&mt=1694766340&oweuc='}}}}}, 'ownerBadges': [{'metadataBadgeRenderer': {'icon': {'iconType': 'CHECK_CIRCLE_THICK'}, 'style': 'BADGE_STYLE_TYPE_VERIFIED', 'tooltip': 'Verified', 'trackingParams': 'CO0BENwwIhMI1uPzk5qsgQMVY071BR1sNwJc', 'accessibilityData': {'label': 'Verified'}}}], 'trackingParams': 'CO0BENwwIhMI1uPzk5qsgQMVY071BR1sNwJcQPSDmZ-ku6-_WA==', 'showActionMenu': False, 'shortViewCountText': {'accessibility': {'accessibilityData': {'label': '343K views'}}, 'simpleText': '343K views'}, 'menu': {'menuRenderer': {'items': [{'menuServiceItemRenderer': {'text': {'runs': [{'text': 'Add to queue'}]}, 'icon': {'iconType': 'ADD_TO_QUEUE_TAIL'}, 'serviceEndpoint': {'clickTrackingParams': 'CPIBEP6YBBgHIhMI1uPzk5qsgQMVY071BR1sNwJc', 'commandMetadata': {'webCommandMetadata': {'sendPost': True}}, 'signalServiceEndpoint': {'signal': 'CLIENT_SIGNAL', 'actions': [{'clickTrackingParams': 'CPIBEP6YBBgHIhMI1uPzk5qsgQMVY071BR1sNwJc', 'addToPlaylistCommand': {'openMiniplayer': True, 'videoId': 'WH692kPmQfQ', 'listType': 'PLAYLIST_EDIT_LIST_TYPE_QUEUE', 'onCreateListCommand': {'clickTrackingParams': 'CPIBEP6YBBgHIhMI1uPzk5qsgQMVY071BR1sNwJc', 'commandMetadata': {'webCommandMetadata': {'sendPost': True, 'apiUrl': '/youtubei/v1/playlist/create'}}, 'createPlaylistServiceEndpoint': {'videoIds': ['WH692kPmQfQ'], 'params': 'CAQ%3D'}}, 'videoIds': ['WH692kPmQfQ']}}]}}, 'trackingParams': 
'CPIBEP6YBBgHIhMI1uPzk5qsgQMVY071BR1sNwJc'}}, {'menuServiceItemDownloadRenderer': {'serviceEndpoint': {'clickTrackingParams': 'CPEBENGqBRgIIhMI1uPzk5qsgQMVY071BR1sNwJc', 'offlineVideoEndpoint': {'videoId': 'WH692kPmQfQ', 'onAddCommand': {'clickTrackingParams': 'CPEBENGqBRgIIhMI1uPzk5qsgQMVY071BR1sNwJc', 'getDownloadActionCommand': {'videoId': 'WH692kPmQfQ', 'params': 'CAI%3D'}}}}, 'trackingParams': 'CPEBENGqBRgIIhMI1uPzk5qsgQMVY071BR1sNwJc'}}, {'menuServiceItemRenderer': {'text': {'runs': [{'text': 'Share'}]}, 'icon': {'iconType': 'SHARE'}, 'serviceEndpoint': {'clickTrackingParams': 'CO0BENwwIhMI1uPzk5qsgQMVY071BR1sNwJc', 'commandMetadata': {'webCommandMetadata': {'sendPost': True, 'apiUrl': '/youtubei/v1/share/get_share_panel'}}, 'shareEntityServiceEndpoint': {'serializedShareEntity': 'CgtXSDY5MmtQbVFmUQ%3D%3D', 'commands': [{'clickTrackingParams': 'CO0BENwwIhMI1uPzk5qsgQMVY071BR1sNwJc', 'openPopupAction': {'popup': {'unifiedSharePanelRenderer': {'trackingParams': 'CPABEI5iIhMI1uPzk5qsgQMVY071BR1sNwJc', 'showLoadingSpinner': True}}, 'popupType': 'DIALOG', 'beReused': True}}]}}, 'trackingParams': 'CO0BENwwIhMI1uPzk5qsgQMVY071BR1sNwJc'}}], 'trackingParams': 'CO0BENwwIhMI1uPzk5qsgQMVY071BR1sNwJc', 'accessibility': {'accessibilityData': {'label': 'Action menu'}}}}, 'thumbnailOverlays': [{'thumbnailOverlayTimeStatusRenderer': {'text': {'accessibility': {'accessibilityData': {'label': '27 minutes, 7 seconds'}}, 'simpleText': '27:07'}, 'style': 'DEFAULT'}}, {'thumbnailOverlayToggleButtonRenderer': {'isToggled': False, 'untoggledIcon': {'iconType': 'WATCH_LATER'}, 'toggledIcon': {'iconType': 'CHECK'}, 'untoggledTooltip': 'Watch later', 'toggledTooltip': 'Added', 'untoggledServiceEndpoint': {'clickTrackingParams': 'CO8BEPnnAxgCIhMI1uPzk5qsgQMVY071BR1sNwJc', 'commandMetadata': {'webCommandMetadata': {'sendPost': True, 'apiUrl': '/youtubei/v1/browse/edit_playlist'}}, 'playlistEditEndpoint': {'playlistId': 'WL', 'actions': [{'addedVideoId': 'WH692kPmQfQ', 'action': 'ACTION_ADD_VIDEO'}]}}, 'toggledServiceEndpoint': {'clickTrackingParams': 'CO8BEPnnAxgCIhMI1uPzk5qsgQMVY071BR1sNwJc', 'commandMetadata': {'webCommandMetadata': {'sendPost': True, 'apiUrl': '/youtubei/v1/browse/edit_playlist'}}, 'playlistEditEndpoint': {'playlistId': 'WL', 'actions': [{'action': 'ACTION_REMOVE_VIDEO_BY_VIDEO_ID', 'removedVideoId': 'WH692kPmQfQ'}]}}, 'untoggledAccessibility': {'accessibilityData': {'label': 'Watch later'}}, 'toggledAccessibility': {'accessibilityData': {'label': 'Added'}}, 'trackingParams': 'CO8BEPnnAxgCIhMI1uPzk5qsgQMVY071BR1sNwJc'}}, {'thumbnailOverlayToggleButtonRenderer': {'untoggledIcon': {'iconType': 'ADD_TO_QUEUE_TAIL'}, 'toggledIcon': {'iconType': 'PLAYLIST_ADD_CHECK'}, 'untoggledTooltip': 'Add to queue', 'toggledTooltip': 'Added', 'untoggledServiceEndpoint': {'clickTrackingParams': 'CO4BEMfsBBgDIhMI1uPzk5qsgQMVY071BR1sNwJc', 'commandMetadata': {'webCommandMetadata': {'sendPost': True}}, 'signalServiceEndpoint': {'signal': 'CLIENT_SIGNAL', 'actions': [{'clickTrackingParams': 'CO4BEMfsBBgDIhMI1uPzk5qsgQMVY071BR1sNwJc', 'addToPlaylistCommand': {'openMiniplayer': True, 'videoId': 'WH692kPmQfQ', 'listType': 'PLAYLIST_EDIT_LIST_TYPE_QUEUE', 'onCreateListCommand': {'clickTrackingParams': 'CO4BEMfsBBgDIhMI1uPzk5qsgQMVY071BR1sNwJc', 'commandMetadata': {'webCommandMetadata': {'sendPost': True, 'apiUrl': '/youtubei/v1/playlist/create'}}, 'createPlaylistServiceEndpoint': {'videoIds': ['WH692kPmQfQ'], 'params': 'CAQ%3D'}}, 'videoIds': ['WH692kPmQfQ']}}]}}, 'untoggledAccessibility': {'accessibilityData': 
{'label': 'Add to queue'}}, 'toggledAccessibility': {'accessibilityData': {'label': 'Added'}}, 'trackingParams': 'CO4BEMfsBBgDIhMI1uPzk5qsgQMVY071BR1sNwJc'}}, {'thumbnailOverlayNowPlayingRenderer': {'text': {'runs': [{'text': 'Now playing'}]}}}], 'richThumbnail': {'movingThumbnailRenderer': {'movingThumbnailDetails': {'thumbnails': [{'url': 'https://i.ytimg.com/an_webp/WH692kPmQfQ/mqdefault_6s.webp?du=3000&sqp=CIDuj6gG&rs=AOn4CLD3RhdFDTEnfUy23ZcDvsbpDMBPSQ', 'width': 320, 'height': 180}], 'logAsMovingThumbnail': True}, 'enableHoveredLogging': True, 'enableOverlay': True}}}]
Can you help me? :)
After the latest update I can't get videos from a channel.
video_generator = scrapetube.get_channel(channel_id=channel_id, limit=5, sort_by="newest")
The generator gives an empty array
Hi,
I'm using the latest version 2.5.0. It worked a few days ago, but now it doesn't work anymore; it has the same error it had last year:
for video in videos:
File "/usr/local/lib/python3.10/dist-packages/scrapetube/scrapetube.py", line 75, in get_channel
for video in videos:
File "/usr/local/lib/python3.10/dist-packages/scrapetube/scrapetube.py", line 199, in get_videos
client = json.loads(
File "/usr/lib/python3.10/json/init.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
For example, one channel has 2,000 videos and another has 5,000. If no limit is set, will all of them be collected, and will collection stop automatically once everything has been gathered?
I tried using scrapetube.get_channel("@MrBeast") and scrapetube.get_channel("MrBeast"), but both of these throw an error.
How can I get the videos using the channel's handle?
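One thing worth trying, based on the channel_username parameter that appears in another snippet on this page: pass the handle without the leading "@". Whether this parameter exists depends on the installed version:

import scrapetube

# channel_username takes the handle without the "@" prefix
videos = scrapetube.get_channel(channel_username="MrBeast", limit=5)
for video in videos:
    print(video["videoId"])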
I have tried the following code, but it kept on searching on its own. I am not good at Python, so I am not sure how that is possible, but my guess is that get_search keeps searching asynchronously, and Python's for statement keeps monitoring videos for newly added elements and continues looping?
import scrapetube
videos = scrapetube.get_search("what to search", sort_by="upload_date")
for video in videos:
print(video["title"]["runs"][0]["text"])
But what I want is to manually continue the search when I choose, in the traditional way: a "more" button that shows more search results when the user presses it. Is that possible with this library?
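get_search returns a lazy generator: nothing is fetched until you iterate, which is why the loop appears to keep searching on its own. Pulling a fixed-size slice at a time reproduces the "more" button behavior:

from itertools import islice

import scrapetube

videos = scrapetube.get_search("what to search", sort_by="upload_date")

def next_page(generator, page_size=20):
    # Consume only page_size results; the generator keeps its position,
    # so each call returns the next batch.
    return list(islice(generator, page_size))

first_page = next_page(videos)   # first 20 results
second_page = next_page(videos)  # next 20, e.g. after a "more" click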
Please,
live video information is essential. Please update to make it work.
Currently, using the code below:
scrapetube.get_channel(channel_id)
only normal video IDs are included; live videos are not.
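For what it's worth, another snippet on this page passes content_type="streams" to get_channel, which may be what is needed here; a minimal sketch, with a placeholder channel id:

import scrapetube

# content_type="streams" lists a channel's live-stream uploads
videos = scrapetube.get_channel(
    channel_id="UCCezIgC97PvUuR4_gbFUs5g",  # placeholder channel id
    content_type="streams",
    limit=5,
)
for video in videos:
    print(video["videoId"])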
The get_search method does not have the "UPLOAD DATE" filter which is available on YouTube's website. The thing is, when "SORT BY" is set to "Upload date", the results differ depending on whether "UPLOAD DATE" is set to "Today" or not set at all. From my experiments, get_search returns the same results as searching YouTube's website without setting "UPLOAD DATE".
I need this "UPLOAD DATE" argument, because without it the search does not return many of the new videos.
If adding this argument is technically infeasible, due to YouTube's encryption, obfuscation, or something like that, please close this issue.
Searching for "puppies", with "SORT BY" = "Upload date", "UPLOAD DATE" = "Last hour"
How do I scrape a channel with @ in the URL?
I tried channel_url="https://www.youtube.com/@Danidev" and channel_id="@Danidev", but neither worked.
Any suggestion or help would be appreciated.