qut-digital-observatory / youte

Command-line utility to help researchers collect video metadata from Youtube API

Home Page: https://youte.readthedocs.io

License: MIT License

Python 100.00%

social-media-analysis social-science cli youtube youtube-api-v3 youtube-data-api youtube-data-scraping

youte's People

jwyg yunusergen bdmarcus mathiasfls adam-s-tech

youte's Issues

Suggestion for docs: Explanation for NA values in channels.like_count and videos.like_count

I noticed that the like_count for some channels and videos was NA, and it turns out this is correct (channels have like count hidden, and videos don't seem to have like count showing on YouTube), but it might confuse people. I have a minor suggestion to put a note about this in the documentation just so people understand what's happening 😸

Search videos based on language and location

Allow the ability to filter searches by language and location

Suggestion: instead of history.db use <output>_history.db

Make it easier to know which history db is associated with which file by default, to make resuming clearer.
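
A minimal sketch of how such a default could be derived. The helper name `default_history_path` is hypothetical and not part of youte; it only shows building `<output>_history.db` from the output file name.

```python
from pathlib import Path


def default_history_path(output: str) -> Path:
    """Return '<output stem>_history.db' in the same folder as the output file.

    Hypothetical helper for illustration; youte may name things differently.
    """
    out = Path(output)
    # with_name keeps the parent directory and swaps only the file name.
    return out.with_name(f"{out.stem}_history.db")


print(default_history_path("results/knitting.json"))
```

Deriving the name from the output file means two collections run in the same folder would not share one history database.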

Requirements list update

Requirements list in setup.cfg is out of date, and in trying to find and install the requirements, I've discovered there's a problem with the dotenv package, which is sadly no longer installable and seems to be unmaintained?

Here are some things I'd recommend to sort through this and make similar issues easier to catch in future:

Update the requirements list in setup.cfg
Set a minimum version for each requirement in setup.cfg (the format is youtupy >= 0.0.1); this helps with troubleshooting. Just start with the versions you currently have installed.
We may have to find an alternative for dotenv. Maybe someone has forked the project in the past, as it seems to have been unmaintained for a while? We can give it a few more days to see if the maintainer responds to the above issue; if they do respond, they might be willing to accept community fixes for the issue.
I think this is a good time to set up nox! Even if you only have one test written that does nothing, set up nox now and get into the habit of running a nox test session after making changes - it will test the installation process as well as running the tests as you add them. You can look at the noxfile.py in any of our libraries and just copy from them. Happy to help with this. Seeing as you've been reading up on a bunch of clean code stuff, I can also show you the linting setup I've been using, that also goes through nox - I'd be interested in hearing any other ideas or practices you've encountered in your research!

Handle video IDs that start with dashes (related-to option)

First, thank you for all your incredible work on youte - it's been really useful.

I wanted to flag a minor issue with video IDs that start with a dash (for example, -Q7G5zfSal8). I would like to collect videos related to this video, but youte interprets the ID as an option. I tried adding another dash or using quotation marks but could not get it to be treated as the value of the first positional argument. Instead, I always get the error Error: No such option: -Q.

Any help or advice would be much appreciated!
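
Many command-line parsers treat a bare `--` as the end of options, so everything after it is read as a positional value even if it starts with a dash. Whether youte's own CLI honors this is an assumption that is not confirmed here. The sketch below uses Python's standard `argparse` with a hypothetical stand-in parser (`related-demo`), not youte's real command definition, to show the convention.

```python
import argparse

# Hypothetical stand-in parser; this is NOT youte's real CLI definition.
parser = argparse.ArgumentParser(prog="related-demo")
parser.add_argument("video_id")
parser.add_argument("-o", "--output")

# Everything after "--" is treated as positional, even if it starts with "-".
args = parser.parse_args(["-o", "related.json", "--", "-Q7G5zfSal8"])
print(args.video_id)
```

On a shell, the equivalent would be placing `--` before the ID. Quoting alone does not help, because the shell strips the quotes before the program sees the argument.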

Handle duplicated API key in config file

Raise a warning or error if a duplicate API key is added to config
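
A minimal sketch of the check, assuming the stored keys can be viewed as a plain name-to-key mapping. The function `add_api_key` and the dict layout are hypothetical; youte's real config format may differ.

```python
import warnings


def add_api_key(config: dict, name: str, key: str) -> bool:
    """Store key under name; warn and skip if the same key is already stored.

    Hypothetical helper: `config` is assumed to map a key name to the key string.
    """
    if key in config.values():
        warnings.warn(f"API key for '{name}' is already stored; skipping.")
        return False
    config[name] = key
    return True


config = {}
add_api_key(config, "main", "abc123")
```

Returning a boolean lets the caller decide whether a duplicate should be a warning only or a hard error.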

Add logging configuration

Add the ability to input item IDs directly in the CLI as an alternative to using a text file

list-comments and hydrate currently require item IDs to be in a text file. Making it possible to type the IDs straight into the terminal would make it easier to quickly hydrate items and list comments without first putting the IDs in a text file.

update docs to include using youte as a library

Have `--get-id` flag of `search` produce `.csv` or `.txt` file

Currently, when using the --get-id flag of the search function to retrieve IDs only, a .json or .jsonl file is returned. It would be great to have the IDs exported as .txt or .csv so that the file could be used later with the hydrate function.
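
Until such an option exists, the IDs could be pulled out of a .jsonl file with a few lines of Python. This is a sketch under a stated assumption: each line is a JSON object shaped like a YouTube Data API search result, with the ID under `id.videoId`. The file produced by `--get-id` may be shaped differently, and `extract_video_ids` is a hypothetical helper.

```python
import json
from typing import Iterable, List


def extract_video_ids(jsonl_lines: Iterable[str]) -> List[str]:
    """Collect video IDs from JSON lines shaped like API search results (assumed)."""
    ids = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)
        item_id = item.get("id")
        # Search results nest the ID in an object; skip anything else.
        if isinstance(item_id, dict) and item_id.get("videoId"):
            ids.append(item_id["videoId"])
    return ids


sample = [
    '{"id": {"kind": "youtube#video", "videoId": "1BCmx_ICbRU"}}',
    '{"id": {"kind": "youtube#channel", "channelId": "UC123"}}',
]
# One ID per line, ready to be written to a .txt file.
text = "\n".join(extract_video_ids(sample))
print(text)
```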

Add export functionality

CSV, Excel, TSV

Add an option to export each of processed tables, or join them together

Request user to input YOUTUBE_API_KEY if they don't have one already

Ask user to input YOUTUBE_API_KEY if there isn't one already
Storing multiple YOUTUBE_API_KEYs

related command

The relatedToVideoId parameter retrieves a list of videos that are related to the video that the parameter value identifies. The parameter value must be set to a YouTube video ID and, if you are using this parameter, the type parameter must be set to video.

Note that if the relatedToVideoId parameter is set, the only other supported parameters are part, maxResults, pageToken, regionCode, relevanceLanguage, safeSearch, type (which must be set to video), and fields.
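
The constraints above can be sketched as a small parameter builder. The function name `build_related_params` and the default `part="snippet"` are assumptions for illustration; only the parameter names come from the description above.

```python
# Parameters that may accompany relatedToVideoId, per the description above.
SUPPORTED = {
    "part", "maxResults", "pageToken", "regionCode",
    "relevanceLanguage", "safeSearch", "type", "fields",
}


def build_related_params(video_id: str, **extra) -> dict:
    """Build search parameters for a related-videos request (hypothetical helper)."""
    unsupported = set(extra) - SUPPORTED
    if unsupported:
        raise ValueError(f"Not supported with relatedToVideoId: {sorted(unsupported)}")
    if extra.get("type", "video") != "video":
        raise ValueError("type must be 'video' when relatedToVideoId is set")
    # part="snippet" is an assumed default; callers can override it.
    params = {"part": "snippet", "relatedToVideoId": video_id, "type": "video"}
    params.update(extra)
    return params


print(build_related_params("1BCmx_ICbRU", maxResults=50, regionCode="AU"))
```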

a different way to save progress than using config file

Limit search by pages

Add a --limit to specify how many pages of search results to retrieve
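
One simple way to cap the number of pages, sketched with a stand-in page generator. `fake_pages` and `limited` are hypothetical names; a real implementation would yield API responses instead of dummy dicts.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def fake_pages() -> Iterator[dict]:
    """Stand-in for an endless stream of result pages."""
    page_number = 1
    while True:
        yield {"page": page_number}
        page_number += 1


def limited(pages: Iterable[dict], limit: Optional[int]) -> Iterator[dict]:
    """Yield at most `limit` pages; None means no limit."""
    return islice(pages, limit)


for page in limited(fake_pages(), 3):
    print(page["page"])
```

Because the generator is lazy, pages beyond the limit are never requested, which matters when each page costs API quota.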

add full archive workflow

something like archive to extract search results, video and channel metadata, and comments, and put all in an SQL database

add function to extract transcript

A way to persist quota data across sessions instead of sqlite db

Add `list-most-popular` videos command

The chart parameter identifies the chart that you want to retrieve.

Acceptable values are:

mostPopular – Return the most popular videos for the specified content region and video category.

add batching to increase results size

especially when the date range is large and there are potentially more results than can be contained in YouTube's standard 13 result pages
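
A sketch of the batching idea: split the overall date range into smaller windows and run one search per window. The helper `split_date_range` and the window size are hypothetical; how the windows map onto the search's date options is left open.

```python
from datetime import date, timedelta
from typing import List, Tuple


def split_date_range(start: date, end: date, days: int) -> List[Tuple[date, date]]:
    """Split [start, end) into consecutive windows of at most `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    windows = []
    cursor = start
    step = timedelta(days=days)
    while cursor < end:
        # The last window is clipped so it never runs past `end`.
        window_end = min(cursor + step, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


for window_start, window_end in split_date_range(date(2022, 1, 1), date(2022, 1, 10), 4):
    print(window_start, window_end)
```

Each window is half-open, so the end of one window is the start of the next and no day is searched twice.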

List-comments keeps retrieving the same page of comments in a loop

I have a video.id text file with the following ID as the only line: 1BCmx_ICbRU, corresponding to a video with 163 comments: https://www.youtube.com/watch?v=1BCmx_ICbRU

If I run the following command, I just get the same page of 100 comments retrieved over and over again, with no termination:

youtupy list-comments video.id knitting_1BCmx_ICbRU.json -v

I would expect this command to return very quickly as there should only be 2 pages of results in total.
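
For reference, the expected behavior can be sketched with a simulated API. With 163 comments and 100 per page, the loop should fetch two pages and stop once a response carries no `nextPageToken`. One plausible cause of the endless loop (an assumption, not confirmed from youte's code) is that the token from the previous response is not passed to the next request. `fetch_page` below is a stand-in, not a real API call.

```python
from typing import List, Optional, Tuple

COMMENTS = [f"comment {i}" for i in range(163)]
PAGE_SIZE = 100


def fetch_page(page_token: Optional[str]) -> dict:
    """Simulated API: returns one page and a nextPageToken only if more remain."""
    start = int(page_token) if page_token else 0
    end = start + PAGE_SIZE
    page = {"items": COMMENTS[start:end]}
    if end < len(COMMENTS):
        page["nextPageToken"] = str(end)
    return page


def list_all_comments() -> Tuple[List[str], int]:
    """Follow nextPageToken until it is absent; return comments and page count."""
    collected: List[str] = []
    pages = 0
    token: Optional[str] = None
    while True:
        page = fetch_page(token)  # the token from the previous page is passed on
        pages += 1
        collected.extend(page["items"])
        token = page.get("nextPageToken")
        if token is None:  # last page reached
            break
    return collected, pages


comments, pages = list_all_comments()
print(len(comments), pages)
```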

Add list-comments by channel IDs and all threads related to channel IDs

The allThreadsRelatedToChannelId parameter instructs the API to return all comment threads associated with the specified channel. The response can include comments about the channel or about the channel's videos.

The channelId parameter instructs the API to return comment threads containing comments about the specified channel. (The response will not include comments left on videos that the channel uploaded.)

pydantic, orm_mode error ? API ?

I'm getting an error, so I tried refreshing with a new API (v3).

.../pipx/venvs/youte/lib/python3.11/site-packages/pydantic/_internal/_config.py:317: UserWarning: Valid config keys have changed in V2:
* 'orm_mode' has been renamed to 'from_attributes'
  warnings.warn(message, UserWarning)
INFO | Getting API key from config file.
INFO | Getting page 1
^C
Aborted!

Do I need to upgrade pydantic?

My youte version...

youte --version 
youte, version 2.4.1

Thanks
