tombulled / innertube

Python Client for Google's Private InnerTube API. Works with YouTube, YouTube Music and more!

Home Page: https://pypi.org/project/innertube/

License: MIT License

Python 100.00%

google youtube innertube python api

innertube's Introduction

innertube

Python Client for Google's Private InnerTube API. Works with YouTube, YouTube Music, YouTube Kids, YouTube Studio and more!

About

This library handles low-level interactions with the underlying InnerTube API used by each of the YouTube services.

Here are a few articles available online relating to the InnerTube API:

Installation

innertube uses Poetry under the hood and can easily be installed from source or from PyPI using pip.

Latest Release

pip install innertube

Bleeding Edge

pip install git+https://github.com/tombulled/innertube@develop

Usage

>>> import innertube
>>>
>>> # Construct a client
>>> client = innertube.InnerTube("WEB")
>>>
>>> # Get some data!
>>> data = client.search(query="foo fighters")
>>>
>>> # Power user? No problem, dispatch requests yourself
>>> data = client("browse", body={"browseId": "FEwhat_to_watch"})
>>>
>>> # The core endpoints are implemented, so the above is equivalent to:
>>> data = client.browse("FEwhat_to_watch")

Comparison with the YouTube Data API

The InnerTube API provides access to data you can't get from the Data API; however, this comes at a cost (explained below).

	This Library	YouTube Data API
Google account required	No	Yes
Request limit	No	Yes
Clean data	No	Yes

The InnerTube API is used by a variety of YouTube services and is not designed for consumption by users. Therefore, the data returned by the InnerTube API will need to be parsed and sanitised to extract data of interest.
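
As a rough illustration of the kind of parsing this implies, the sketch below walks a nested response and collects every value stored under a given key. The helper name find_values and the sample payload are made up for this example and are not part of innertube; real responses are far larger and their layout varies by client and endpoint.

```python
from typing import Any, Iterator


def find_values(data: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any depth, in document order."""
    if isinstance(data, dict):
        for name, value in data.items():
            if name == key:
                yield value
            # Keep descending: the same key may also appear deeper down.
            yield from find_values(value, key)
    elif isinstance(data, list):
        for item in data:
            yield from find_values(item, key)


# Hypothetical, heavily trimmed stand-in for a search response.
sample = {
    "contents": [
        {"videoRenderer": {"videoId": "abc123", "title": {"runs": [{"text": "First"}]}}},
        {"videoRenderer": {"videoId": "def456", "title": {"runs": [{"text": "Second"}]}}},
    ]
}

video_ids = list(find_values(sample, "videoId"))
print(video_ids)
```

A generic walker like this is only a starting point; for anything stable you would likely want explicit paths per renderer type.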

Endpoints

Currently only the following core, unauthenticated endpoints are implemented:

	YouTube	YouTubeMusic	YouTubeKids	YouTubeStudio
config	✓	✓	✓	✓
browse	✓	✓	✓	✓
player	✓	✓	✓	✓
next	✓	✓	✓
search	✓	✓	✓
guide	✓	✓
get_transcript	✓
music/get_search_suggestions		✓
music/get_queue		✓

Authentication

The InnerTube API uses OAuth2, but this has not yet been implemented, so this library currently provides unauthenticated API access only.

innertube's Issues

Android & IOS Client Versions need bumping

In config.py, the client_version for ANDROID and IOS can be bumped to client_version="18.11.34" to avoid an HTTP 400 error.

Existing versions:

client_name="ANDROID",
client_version="17.13.3",

client_name="IOS",
client_version="17.14.2",

WEB_REMIX client version update

Good project! This project is very useful for research.

The current WEB_REMIX client version is 0.1, but I found that a newer version, 1.20220606.03.00, is available. The response JSON structure has changed for the better. Can you make an update?

:bug: Continuation for browse method not working

Maybe I'm doing it wrong. I used the client.browse method to get a channel's videos, and it returned the first 30 videos. I then passed the continuation token from the first response as the continuation argument to get the next 30 or so videos, but got the same first response back. I also passed the params argument (to navigate to the channel's videos page first).
Can you please provide an example of this?
Also, what does the index parameter of the .next method do? I tried to go through the codebase but couldn't work it out.
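
For readers hitting the same problem, the general shape of continuation-based paging can be sketched independently of the library. In the sketch below, fetch_page and extract_token are hypothetical callables supplied by the caller (for example, a wrapper around client.browse and a function that digs the token out of the response). Neither is part of innertube, and the fake pages exist only to make the example runnable.

```python
from typing import Callable, Iterator, Optional


def paginate(
    fetch_page: Callable[[Optional[str]], dict],
    extract_token: Callable[[dict], Optional[str]],
    max_pages: int = 10,
) -> Iterator[dict]:
    """Yield pages until no continuation token is returned or max_pages is hit."""
    token: Optional[str] = None
    seen = set()
    for _ in range(max_pages):
        page = fetch_page(token)
        yield page
        token = extract_token(page)
        # Stop on a missing token, or on a repeated one (which would loop forever).
        if token is None or token in seen:
            break
        seen.add(token)


# Fake three-page backend keyed by continuation token (None = first page).
FAKE_PAGES = {
    None: {"items": [1, 2], "continuation": "t1"},
    "t1": {"items": [3, 4], "continuation": "t2"},
    "t2": {"items": [5]},
}

pages = list(paginate(FAKE_PAGES.__getitem__, lambda page: page.get("continuation")))
items = [item for page in pages for item in page["items"]]
print(items)
```

The repeated-token check guards against exactly the symptom described here, where a follow-up request keeps returning the first page.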

Receiving Request Error 400 on API Call

I was using this API to normalize video volume by extracting Youtube's loudness info. I ended up simplifying everything down to match the example provided in the readme and still ended up receiving the following error when trying to call any specific video:

import innertube
client = innertube.InnerTube("WEB")
data = client.browse("-3qAhp6MWYI")
print(data)

Traceback (most recent call last):
  File "C:\Users...\main.py", line 5, in <module>
    data = client.browse("-3qAhp6MWYI")
  File "C:\Users...\lib\site-packages\innertube\clients.py", line 107, in browse
    return self(
  File "C:\Users...\lib\site-packages\innertube\clients.py", line 32, in __call__
    self.adaptor.dispatch(endpoint, params=params, body=body)
  File "C:\Users...\lib\site-packages\innertube\adaptor.py", line 65, in dispatch
    raise RequestError(api.error(error))
innertube.errors.RequestError: 400 Bad Request: Request contains an invalid argument.

I didn't notice it at the time but I suppose it hasn't been working properly for the past few weeks. Also I am not too proficient in coding so I apologize if this is something very obvious.

Can you provide some documentation on how the API actually works?

You obviously coded this software to work with the API's specification. However, I'm more interested in the documentation itself. For example: what are the API endpoint URLs? What request method does it use (GET or POST)? Are the request parameters sent in the request body, or in the URL query string after the question mark? I assume you had to reverse engineer the API and document it before you could implement it as you have. Could you please post your actual documentation here?
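
Some of this can be pieced together from other parts of this page: the README dispatches browse with a body of {"browseId": ...}, the comments issue mentions a URL of the form https://www.youtube.com/youtubei/v1/next?key={key}, and a traceback elsewhere shows the client calling session.post. The sketch below only builds such a request without sending it. The context layout (context.client.clientName and clientVersion), the function name build_request and the placeholder key are assumptions made for illustration, not confirmed documentation.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.youtube.com/youtubei/v1/"


def build_request(endpoint, body, client_name, client_version, api_key=None):
    """Build (but do not send) a POST request for an InnerTube-style endpoint."""
    url = BASE_URL + endpoint
    if api_key is not None:
        url += "?" + urllib.parse.urlencode({"key": api_key})
    payload = {
        # Assumed context layout; not confirmed by this document.
        "context": {"client": {"clientName": client_name, "clientVersion": client_version}},
        **body,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    "browse",
    {"browseId": "FEwhat_to_watch"},
    client_name="WEB",
    client_version="2.20230920.00.00",
    api_key="PLACEHOLDER_KEY",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```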

[Question] How can I get a video's download link, like yt-dlp or youtube-dl do?

How do you get the number of likes?

Is it even possible to do this through the api without loading the page itself? I mean the full number

from innertube import InnerTube, Client
from pprint import pprint

video_id = 'zL3wWykAKfs'
web_remix = InnerTube("WEB_REMIX")
data = web_remix.music_get_queue(video_ids=[video_id])

print(data['queueDatas'][0]['content']['playlistPanelVideoRenderer']['longBylineText']['runs'][-1]['text'])
# 262K likes

TypeError: 'module' object is not callable

Whenever I import innertube, it tries to import export, but failed to do so with the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/innertube/__init__.py", line 1, in <module>
    from .apis import InnerTube
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/innertube/apis.py", line 5, in <module>
    import useragent
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/useragent/__init__.py", line 2, in <module>
    from .enums import ProductName
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/useragent/enums.py", line 1, in <module>
    import enumb
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/enumb/__init__.py", line 1, in <module>
    from .bases import *
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/enumb/bases.py", line 5, in <module>
    from . import generators
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/enumb/generators.py", line 6, in <module>
    from . import models
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/enumb/models.py", line 8, in <module>
    class Arguments:
TypeError: 'module' object is not callable

This happens on Python 3.8.7 on macOS 10.14. I'm not sure about other operating systems.
Before this, it complained that exports was not available, so I installed it with python3 -m pip install exports

AttributeError: 'InnerTubeSession' object has no attribute 'headers'

When using example code on readme:

import innertube
client = innertube.InnerTube(innertube.Client.WEB)

gives

Traceback (most recent call last):
  File "/home/colethedj/dev/innertube-play/main.py", line 2, in <module>
    client = innertube.InnerTube(innertube.Client.WEB)
  File "<attrs generated init innertube.apis.InnerTube>", line 6, in __init__
  File "/home/colethedj/dev/innertube-play/virtpy/lib/python3.9/site-packages/innertube/apis.py", line 53, in __attrs_post_init__
    self.session.headers.update(adaptor.headers)
AttributeError: 'InnerTubeSession' object has no attribute 'headers'

Installed with pip install git+https://github.com/tombulled/innertube for reference.
Thanks

Support Parsing

innertube previously had support for parsing. An example implementation is shown below:

import dataclasses
from typing import Callable, Iterable, Optional, Set

import innertube
import roster
from innertube.models import ResponseFingerprint


@dataclasses.dataclass
class Target:
    request: Set[str] = dataclasses.field(default_factory=set)
    function: Set[str] = dataclasses.field(default_factory=set)
    browse_id: Set[str] = dataclasses.field(default_factory=set)
    context: Set[str] = dataclasses.field(default_factory=set)
    client: Set[str] = dataclasses.field(default_factory=set)

    @classmethod
    def from_response_fingerprints(cls, *response_fingerprints: ResponseFingerprint):
        parser = cls()

        response_fingerprint: ResponseFingerprint
        for response_fingerprint in response_fingerprints:
            key: str
            value: str
            for key, value in dataclasses.asdict(response_fingerprint).items():
                if value is not None:
                    getattr(parser, key).add(value)

        return parser

    def keys(self) -> Iterable[str]:
        return dataclasses.asdict(self).keys()

    def values(self) -> Iterable[str]:
        return dataclasses.asdict(self).values()

    def any(self) -> bool:
        return any(self.values())

    def all(self) -> bool:
        return all(self.values())

    def intersect(self, rhs):
        parser = type(self)()

        key: str
        values: Set[str]
        for key, values in dataclasses.asdict(self).items():
            # Keep only the entries present on both sides for this field
            getattr(parser, key).update(values & getattr(rhs, key))

        return parser

    def union(self, rhs):
        parser = type(self)()

        for child in (self, rhs):
            key: str
            values: Set[str]
            for key, values in dataclasses.asdict(child).items():
                # Merge every entry of the field's set (add() would fail on a set)
                getattr(parser, key).update(values)

        return parser


parsers: roster.Register[Callable[[dict], dict], Target] = roster.Register()

parser: Callable = parsers.value(Target)


@parser(request=innertube.Request.CONFIG)
def parse_config(data: dict) -> dict:
    print("PARSING:", data)

    return data


client: innertube.Client = innertube.Client(
    adaptor=innertube.InnerTubeAdaptor(
        context=innertube.ClientContext("WEB_REMIX", "0.1")
    )
)


@client.middleware
def parse(call_next, data):
    fingerprint: Optional[innertube.ResponseFingerprint] = innertube.fingerprint(data)

    if fingerprint is None:
        raise Exception("Can't fingerprint data")

    fingerprint_target: Target = Target.from_response_fingerprints(fingerprint)

    parse: Callable[[dict], dict]
    target: Target
    for parse, target in reversed(parsers.items()):
        if not target.any() or target.intersect(fingerprint_target).any():
            return call_next(parse(data))

    raise Exception(f"No parser found for response with fingerprint: {fingerprint!r}")


data: dict = client(innertube.Endpoint.CONFIG)

It'd be awesome to re-implement this functionality

ANDROID_LITE

where did this come from?

innertube/innertube/config.py

Lines 174 to 179 in e25abdd

ClientContext(
    client_id=38,
    client_name="ANDROID_LITE",
    client_version="3.26.1",
    user_agent=USER_AGENT_ANDROID,
),

How do you filter searches using params?

🧵 Async Client for InnerTube

Hi, thanks for this lib, it's amazing.
I want to write a wrapper around this lib for my custom SDK, but I noticed that innertube does not have an asynchronous interface.
I looked through the code and saw that httpx supports an async interface via AsyncClient. After a small modification (which does not affect the main codebase), I wrote an async interface using AsyncClient, which allows the library to be used asynchronously.

For example:

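The sync client used from async code:

import asyncio
import datetime

from innertube import InnerTube

# PARAMS_TYPE_PLAYLIST (a search filter params string) is assumed to be defined elsewhere.


async def task(client: InnerTube, i):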
    print(f'task {i} start', 'sleep s', 5)
    await asyncio.sleep(5)
    client.search("arctic monkeys", params=PARAMS_TYPE_PLAYLIST)
    print(f'task {i} end')


async def main() -> None:
    start = datetime.datetime.now().timestamp()
    client = InnerTube("WEB", "2.20230920.00.00")
    tasks = []
    for i in range(10):
        tasks.append(asyncio.ensure_future(task(client, i)))
    await asyncio.gather(*tasks)
    end = datetime.datetime.now().timestamp()
    print(end-start)


asyncio.run(main())

and result

task 0 start sleep s 5
task 1 start sleep s 5
task 2 start sleep s 5
task 3 start sleep s 5
task 4 start sleep s 5
task 5 start sleep s 5
task 6 start sleep s 5
task 7 start sleep s 5
task 8 start sleep s 5
task 9 start sleep s 5
task 0 end
task 2 end
task 6 end
task 9 end
task 8 end
task 5 end
task 7 end
task 4 end
task 1 end
task 3 end
8.483278036117554

Here we see that the event loop is blocked on every request, because every request is synchronous.

And an example with the async client:

async def task(client: InnerTube, i):
    print(f'task {i} start','sleep s', 5)
    await asyncio.sleep(5)
    await client.search("arctic monkeys", params=PARAMS_TYPE_PLAYLIST)
    print(f'task {i} end')


async def main() -> None:
    start = datetime.datetime.now().timestamp()
    client = InnerTube("WEB", "2.20230920.00.00")
    tasks = []
    for i in range(10):
        tasks.append(asyncio.ensure_future(task(client, i)))
    await asyncio.gather(*tasks)
    end = datetime.datetime.now().timestamp()
    print(end-start)


asyncio.run(main())

and result

task 0 start sleep s 5
task 1 start sleep s 5
task 2 start sleep s 5
task 3 start sleep s 5
task 4 start sleep s 5
task 5 start sleep s 5
task 6 start sleep s 5
task 7 start sleep s 5
task 8 start sleep s 5
task 9 start sleep s 5
task 2 end
task 5 end
task 1 end
task 9 end
task 0 end
task 4 end
task 7 end
task 3 end
task 6 end
task 8 end
5.66729998588562

So, what do you think? If you're interested, I can create a PR that adds the async interface.
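
Until a native async interface exists, a different and simpler workaround (not the AsyncClient approach proposed in this issue) is to push each blocking call onto a worker thread with asyncio.to_thread, available from Python 3.9. The sketch below uses a made-up blocking_search function as a stand-in for a synchronous call such as client.search, so it runs without the library or network access.

```python
import asyncio
import time


def blocking_search(query: str) -> str:
    """Stand-in for a synchronous call such as client.search(query)."""
    time.sleep(0.2)  # simulates network latency while blocking the calling thread
    return f"results for {query}"


async def main() -> list:
    queries = [f"query {i}" for i in range(5)]
    # Each blocking call runs in a worker thread, so the event loop stays free.
    return await asyncio.gather(*(asyncio.to_thread(blocking_search, q) for q in queries))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, round(elapsed, 2))
```

This keeps the event loop responsive but still spends one thread per in-flight request, so a true async client remains the better fit for high concurrency.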

sets dependency missing

It appears that the dependency sets is supposed to be at tombulled/sets, but that is a 404. This means the program cannot be installed or used. Is this a mistake?

What if one day YouTube restricts the API key used in this project?

I have a concern about the API key being fixed. Where can I find a replacement? I tried using my own API key, which has the YouTube Data API enabled, but it doesn't work.

YouTube Comments

Hello, I notice in #17 it's stated that getting comments is not part of the InnerTube API. I'm not sure if things have changed or if I am misunderstanding what constitutes as part of the InnerTube API, but by doing the following I have managed to get the comments:

1. Send a next request to https://www.youtube.com/youtubei/v1/next?key={key} with the specified video ID in the data.
2. Extract the continuation token. There's a default, a "Top" sort, and a "New" sort. I've only tried the default.
3. Send a second next request without specifying the video ID, instead specifying the continuation in the data block.
This should return the first 20 or so comments in a very ugly nested way.
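
The steps above can be sketched as follows. The post callable is a hypothetical stand-in for whatever sends a body to the next endpoint, and the assumption that tokens live under a continuationCommand.token key is mine rather than something this page confirms. The sketch simply takes the first token it finds, which may not correspond to the comment section or to a particular sort order.

```python
from typing import Any, Callable, Iterator, Optional


def iter_tokens(data: Any) -> Iterator[str]:
    """Yield every continuationCommand token found anywhere in the response."""
    if isinstance(data, dict):
        command = data.get("continuationCommand")
        if isinstance(command, dict) and "token" in command:
            yield command["token"]
        for value in data.values():
            yield from iter_tokens(value)
    elif isinstance(data, list):
        for item in data:
            yield from iter_tokens(item)


def fetch_first_comments(post: Callable[[dict], dict], video_id: str) -> Optional[dict]:
    """Run the two `next` requests described above; None if no token is found."""
    first = post({"videoId": video_id})
    token = next(iter_tokens(first), None)
    if token is None:
        return None
    return post({"continuation": token})


def fake_post(body: dict) -> dict:
    """Fake transport so the example runs offline."""
    if "videoId" in body:
        return {"contents": [{"continuationEndpoint": {"continuationCommand": {"token": "TOKEN_A"}}}]}
    return {"comments": ["first!", "nice video"], "echo": body["continuation"]}


page = fetch_first_comments(fake_post, "VIDEO_ID")
print(page)
```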

Something I've yet to figure out is how to get a highlighted comment to appear at the top of the json list. If you click on a YouTube comment's date, it will open a link with a "&lc=" param that has the comment's ID. And in the comments it will appear at the top as "Highlighted".

If I use the continuation token for the second request from the dev tools inspector when loading the highlighted comment link in the browser then the second next request properly returns the highlighted comment at the top of the json list.

However, if I try using the continuation retrieved from the first next request programmatically then it always returns the comments without the highlighted comment at the top, so it can be assumed the highlighted comment is tied to the continuation token which seems to be generated outside of the scope of the next endpoint, unless I've simply not found the correct way yet.

Requests cause AttributeError

error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.10/site-packages/innertube/clients.py", line 87, in player
    return self(
  File "/usr/lib/python3.10/site-packages/innertube/clients.py", line 56, in __call__
    response: requests.Response = self.session.post(*args, **kwargs)
  File "/usr/lib/python3.10/site-packages/requests/sessions.py", line 577, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/lib/python3.10/site-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3.10/site-packages/innertube/sessions.py", line 89, in send
    response: requests.Response = super().send(request, **kwargs)
  File "/usr/lib/python3.10/site-packages/innertube/sessions.py", line 48, in send
    if content_type.subtype != mediatype.MediaTypeSubtype.JSON:
AttributeError: 'dict' object has no attribute 'subtype'

code:

>>> import innertube as it
>>> client = it.InnerTube(it.Client.ANDROID_MUSIC)
>>> client.player(video_id="ccIdf7kcH7g")
# Error

How to get information about a video?

Is there a way to get information about a particular video, such as its likes, comment count, or the comments themselves?

Is it possible to get a home page feed for the logged in user.

How do you use the get_transcript method?

I don't see any useful documentation or examples for get_transcript. The one example in the original PR used a very long param and I can't find anything similar from the next or player methods. Where did it come from? And what about varying languages? Please explain, thank you.

How can I select which captions should be downloaded in get_transcript()?

If have a caption name string (example: "English (auto-generated)"), how can I select this specific caption to get downloaded?
How can I get the transcript's name from the API?

Thank you in advance!

:sparkles: Port `innertube` to Dart

Is it possible for you to port this library to Dart so that it could be used in Flutter?

Can I search only YouTube Kids? Or how do I know whether a videoId in the results is for YouTube or YouTube Kids?

How to play the link obtained from YT Music

Please explain how to use this lib for YT Music. I want to get the music URL.

I got this URL:

"streamingData": {
    "expiresInSeconds": "21540",
    "formats": [
      {
        "itag": 18,
        "mimeType": "video/mp4; codecs=\"avc1.42001E, mp4a.40.2\"",
        "bitrate": 385054,
        "width": 640,
        "height": 360,
        "lastModified": "1679177037392938",
        "quality": "medium",
        "fps": 25,
        "qualityLabel": "360p",
        "projectionType": "RECTANGULAR",
        "audioQuality": "AUDIO_QUALITY_LOW",
        "approxDurationMs": "312099",
        "audioSampleRate": "44100",
        "audioChannels": 2,
        "signatureCipher": "s=DQng-n%3DM3r_wNoI4iPeZbVvXMDff5jLP00x-pAU1b6TrAEiAp0dlW5Qn5OKADxYz82wDiijP36LbJcmfZGSdwCK9fRIAhIgRwAgewLNAZAZ&sp=sig&url=https://rr3---sn-cnoa-jv3l.googlevideo.com/videoplayback%3Fexpire%3D1699905706%26ei%3DSixSZcruG_zd4-EPieyh4A0%26ip%3D117.230.27.134%26id%3Do-AHlq_5mSFdDDumCYuy8Xjg5LyeELW7BIiE9naezMfuvh%26itag%3D18%26source%3Dyoutube%26requiressl%3Dyes%26mh%3Drj%26mm%3D31%252C29%26mn%3Dsn-cnoa-jv3l%252Csn-h557snsl%26ms%3Dau%252Crdu%26mv%3Dm%26mvi%3D3%26pl%3D23%26initcwndbps%3D93750%26vprv%3D1%26mime%3Dvideo%252Fmp4%26ns%3DWxFiRiu_A85I_3zbvD1k-LYP%26cnr%3D14%26ratebypass%3Dyes%26dur%3D312.099%26lmt%3D1679177037392938%26mt%3D1699883593%26fvip%3D2%26fexp%3D24007246%26beids%3D24350018%26c%3DWEB_MUSIC%26txp%3D5538434%26n%3D3V3z5f-VGEaUEV6%26sparams%3Dexpire%252Cei%252Cip%252Cid%252Citag%252Csource%252Crequiressl%252Cvprv%252Cmime%252Cns%252Ccnr%252Cratebypass%252Cdur%252Clmt%26lsparams%3Dmh%252Cmm%252Cmn%252Cms%252Cmv%252Cmvi%252Cpl%252Cinitcwndbps%26lsig%3DAM8Gb2swRQIhAIHiYEYuNk0HSCdtBAY-RL5SrAMZ8MjxWoRSJ-MNoPB8AiAoAXG6AjN9pk6T93lHKIKZHAK0pZucXPSwFZLMUWEfqA%253D%253D"
      }
    ]

but the link from this is not playable,
and the link after sig&url= is not working either.

How do I play the music?
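
For context, the signatureCipher value is itself a URL-encoded query string with three fields: s, sp and url. A minimal sketch of splitting it apart with the standard library is shown below, using a shortened made-up cipher rather than the real one. To my knowledge the s value is scrambled and has to be transformed by a routine found in the player's JavaScript before being appended to url under the parameter named by sp, which is what tools such as yt-dlp do. That deciphering step is not implemented here, and the helper name split_signature_cipher is invented for this example.

```python
from urllib.parse import parse_qs


def split_signature_cipher(cipher: str) -> dict:
    """Split a signatureCipher string into its decoded s, sp and url fields."""
    fields = parse_qs(cipher)
    # parse_qs returns a list per field; each field appears once here.
    return {name: values[0] for name, values in fields.items()}


# Shortened, made-up cipher with the same three-field shape.
cipher = "s=ABC%3DDEF&sp=sig&url=https://example.com/videoplayback%3Fexpire%3D1%26itag%3D18"
parts = split_signature_cipher(cipher)
print(parts["sp"])
print(parts["url"])
```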

:sparkles: Support Authentication

Not sure if you already know about this, but from my testing you can make authenticated requests by using cookies. (I'm not sure how long this lasts before the process needs repeating.)

Steps to do

1. Open an incognito browser window and open the developer console.
2. Go to the Network tab and enable Preserve log.
3. Type set_registration in the network search bar.
4. Go to https://www.youtube.com/ and log in as usual.
5. Select the 2nd result and copy the whole cookie from the request headers.
6. The cookie values needed are SID, HSID, SSID, APISID and SAPISID. You can delete the remaining values or just leave them.
7. Generate the SAPISIDHASH by passing the SAPISID value obtained above into the function below.

import hashlib
import time


def hash_string(value):
    # SHA-1 hex digest of the input string
    return hashlib.sha1(value.encode()).hexdigest()


sapisid = "<SAPISID cookie value>"  # placeholder: paste the value copied from the cookie
current_time = str(int(time.time()))
origin = "https://www.youtube.com"
sapisidhash = hash_string(f"{current_time} {sapisid} {origin}")

Pass the headers below when making requests

headers = {
    "Authorization": f"SAPISIDHASH {current_time}_{sapisidhash}",
    "Cookie": cookie_string_from_step_6,
    "x-origin": origin
}

Your requests are now authenticated requests.

Notes

The reason incognito is required is that, if you are already logged in, the SID, HSID and SSID cookies for some reason aren't included in the request headers; the request relies on the x-goog-authuser header value instead. Clearing the cache doesn't seem to make them appear either.

I'm not really good at implementing things, so I'd be grateful if you could add this feature to the existing code.

YouTube Stats for Nerds

Hello,
is there a way to extract the stats for nerds from a youtube video while you're watching it?
I am very interested in watching a video and messing around with my network parameters, introducing things like lag and packet loss to it and see exactly how the stats change there. Being able to extract those stats would be a great help.

🐛 Continuation for search method not working

I tried the client.search method to get YouTube search results. I could get the first 20 videos from the first request; however, I could not get the next videos using the continuation returned by the first request.

I found the issue, I am guessing it might be a similar problem.

update:

modified code as below works for me.

Endpoint.SEARCH, 
# params=utils.filter( 
#     dict( 
#         continuation=continuation, 
#         ctoken=continuation, 
#     ) 
# ), 
body=utils.filter(
     dict( 
         query=query or "",
         params=params,
         continuation=continuation,
     ) 
),
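
To make the effect of this change concrete, the sketch below rebuilds the request body outside the library. It assumes that utils.filter simply drops entries whose value is None, which this page does not confirm. The names drop_none and build_search_body are invented for the example.

```python
from typing import Any, Dict, Optional


def drop_none(mapping: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `mapping` without the entries whose value is None."""
    return {key: value for key, value in mapping.items() if value is not None}


def build_search_body(
    query: Optional[str] = None,
    params: Optional[str] = None,
    continuation: Optional[str] = None,
) -> Dict[str, Any]:
    """Mirror the modified call: the continuation travels in the body, not the URL."""
    return drop_none(dict(query=query or "", params=params, continuation=continuation))


first = build_search_body(query="foo fighters")
follow_up = build_search_body(continuation="TOKEN")
print(first)
print(follow_up)
```

With this shape, the first request carries only the query, while the follow-up carries the continuation token alongside an empty query.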
