drkain / scrape-youtube

A lightning fast package to scrape YouTube search results

License: MIT License

JavaScript 52.48% TypeScript 47.52%
youtube scrape npm search playlist video movie nodejs discord bot

scrape-youtube's Introduction

I'm very sick at the moment and will not be very active on GitHub. I'm sorry for any inconvenience this may cause for people using my tools/libraries. If you need an urgent change, the best option might be to fork the repo.

I abhor social media. You will not find me on Facebook, Instagram, TikTok or whatever is popular these days. The best and fastest way to contact me is through Discord (tag below) or email (on the left for logged-in users).

♡ Quick Info

  • 📫 How to reach me: Discord → drkain
  • 🔭 I'm currently working on -
  • 🌱 I'm currently learning -
  • 👯 I'm looking to collaborate on -
  • 🤔 I'm looking for help with tidy-url
  • ⚡ I'm a fan of data hoarding and music
  • 🎶 Favorite Band: Poets of the Fall

scrape-youtube's People

Contributors

damankarora, dependabot[bot], drkain, ggresillion, nicoeg, tryhardhusky

scrape-youtube's Issues

No results for certain short search terms

Describe the bug
On a UK-based server, the search term 'duggee' returns no results. The same happens for many other terms, such as 'lego', 'cats', and 'kids toys play'.

Note: safe search is enabled through the header options: { Cookie: "PREF=f2=8000000" }

To Reproduce
Steps to reproduce the behavior:

  1. On UK server, search 'duggee'
  2. No returned results available

Expected behavior
A search for 'duggee' should return the 20 results listed here: https://www.youtube.com/results?search_query=duggee

Versions:

  • Package: 0.2.6
  • Node: 12.20.1

Additional context
Results for 'duggee' are returned on a machine located in Pakistan.

TypeError: Cannot read property 'simpleText' of undefined

Describe the bug
Hi there! My debugger shows this error when searching for certain channels: TypeError: Cannot read property 'simpleText' of undefined.
It happens when the channel hides its subscriber count.

I think this function needs a try/catch or a guard for the missing field. Thank you.

const convertSubs = (channel: any): number => {
    // Throws when subscriberCountText is missing (the bug reported here).
    const count = channel.subscriberCountText.simpleText.split(' ').shift();

    // If there's no K, M or B at the end.
    if (!isNaN(+count)) return +count;

    const char = count.slice(-1);
    let slicedCount = Number(count.slice(0, -1));

    switch (char.toLowerCase()) {
        case 'k':
            slicedCount *= 1000;
            break;
        case 'm': // was a duplicate 'k' case, so millions were never scaled
            slicedCount *= 1e6;
            break;
        case 'b':
            slicedCount *= 1e9;
            break;
    }

    // Math.trunc instead of ~~, which wraps around above 2^31 - 1.
    return Math.trunc(slicedCount);
};

To Reproduce
Steps to reproduce the behavior:

  1. Search for UCadOPMAkX21lbuA1yj0JfCQ
  2. See error

Expected behavior
It should return the channel's information with subscriberCount set to zero or just '-'.
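
A guarded variant of the function quoted above might look like the sketch below. It is not the library's code: the name convertSubsSafe, the MULTIPLIERS table, and the choice to return 0 for a hidden count are all assumptions made for illustration.

```javascript
// Sketch only: convertSubsSafe and MULTIPLIERS are hypothetical names.
const MULTIPLIERS = { k: 1e3, m: 1e6, b: 1e9 };

const convertSubsSafe = (channel) => {
    // Optional chaining avoids the TypeError when subscriberCountText is missing.
    const text = channel?.subscriberCountText?.simpleText;
    if (typeof text !== 'string') return 0; // assumption: hidden count becomes 0

    const count = text.split(' ').shift();
    if (count === '') return 0;

    // Plain number with no K, M or B suffix.
    if (!isNaN(+count)) return +count;

    const multiplier = MULTIPLIERS[count.slice(-1).toLowerCase()];
    const value = Number(count.slice(0, -1));
    // Anything unrecognised (for example a channel handle) is treated as unknown.
    if (multiplier === undefined || isNaN(value)) return 0;

    return Math.round(value * multiplier);
};
```

Returning 0 keeps the return type numeric; a caller that needs to tell "hidden" apart from "zero" would probably want null instead.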

Versions:

  • Package: 2.1.14
  • Node: 10.21.0

Search Pagination

Hi, it would be nice to have a pagination option in this package.

Invalid subscriber count / video count

For some reason YouTube is returning incorrectly labelled data.
The videoCountText contains subscriber count and subscriberCountText contains the channel handle.
I don't modify this data so this is what's being returned by YouTube.

  videoCountText: {
    accessibility: { accessibilityData: [Object] },
    simpleText: '21 subscribers'
  },
  subscriptionButton: { subscribed: false },
  subscriberCountText: { simpleText: '@lespetitesmouettes8996' },
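
One way to cope with the swapped fields is to check whether subscriberCountText looks like a handle before trusting it. The sketch below is a guess at such a check, not what the library does: readChannelTexts is a hypothetical helper, and the "starts with @" test is an assumption based only on the payload quoted above.

```javascript
// Hypothetical helper; field names are taken from the payload quoted above.
const readChannelTexts = (channel) => {
    const subsText = channel?.subscriberCountText?.simpleText ?? null;
    const videoText = channel?.videoCountText?.simpleText ?? null;

    // Swapped layout: the handle sits in subscriberCountText and the
    // subscriber count in videoCountText. No video count is available.
    if (typeof subsText === 'string' && subsText.startsWith('@')) {
        return { handle: subsText, subscribers: videoText, videos: null };
    }

    // Original layout: the fields mean what their names say.
    return { handle: null, subscribers: subsText, videos: videoText };
};
```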

Certain searches yield no results

Describe the bug
Hello, following on from closed issue #33: the update seemed to work, but now I'm getting no results again.
'duggee' never works; 'lego' and 'cats' are intermittent.
UK server, using safe search.

To Reproduce
App code here:

const express = require("express");
const { default: youtube } = require("scrape-youtube");

const cache = require("../helpers/cache");

const Router = express.Router();

Router.get("/search", async (req, res) => {
 try {
   let { q, cached } = req.query;
   const allowCached = cached !== "false";
   if (!q) return res.send({ error: "Invalid query!" });
   q = q.trim();
   if (allowCached) {
     const cachedResults = await cache.getResults(q);
     if (cachedResults && cachedResults.length !== 0) {
       return res.send({ results: cachedResults, cached: true });
     }
   }
   let freshResults = await youtube.search(
     q,
     { safeSearch: true },
     { safeSearch: true, headers: { Cookie: "PREF=f2=8000000" } }
   );
   freshResults = freshResults.videos.map((video) => ({
     id: video.id,
     title: video.title,
   }));
   res.send({ results: freshResults, length: freshResults.length });
   if (freshResults.length > 0) await cache.saveResults(q, freshResults);
 } catch (error) {
   console.error(error);
   res.send({ error: "Backend error." });
 }
});

module.exports = Router;

Please let me know if you require any further info.
Originally posted by @accesstechnology-mike in #33 (comment)

Versions:

  • Package: 2.0.9
  • Node: 12.20.1

Error: Failed to extract InitialData. The request may have been blocked.

Describe the bug
When searching for a video on YouTube it returns the below error:

Error: Failed to extract InitialData. The request may have been blocked.
    at /usr/src/app/node_modules/scrape-youtube/lib/index.js:252:27
    at new Promise (<anonymous>)
    at Youtube.extractInitialData (/usr/src/app/node_modules/scrape-youtube/lib/index.js:228:16)
    at Youtube.<anonymous> (/usr/src/app/node_modules/scrape-youtube/lib/index.js:187:55)
    at step (/usr/src/app/node_modules/scrape-youtube/lib/index.js:44:23)
    at Object.next (/usr/src/app/node_modules/scrape-youtube/lib/index.js:25:53)
    at /usr/src/app/node_modules/scrape-youtube/lib/index.js:19:71
    at new Promise (<anonymous>)
    at __awaiter (/usr/src/app/node_modules/scrape-youtube/lib/index.js:15:12)
    at Request._callback (/usr/src/app/node_modules/scrape-youtube/lib/index.js:174:112)
    at Request.self.callback (/usr/src/app/node_modules/request/request.js:185:22)
    at Request.emit (events.js:314:20)
    at Request.<anonymous> (/usr/src/app/node_modules/request/request.js:1154:10)
    at Request.emit (events.js:314:20)
    at IncomingMessage.<anonymous> (/usr/src/app/node_modules/request/request.js:1076:12)
    at Object.onceWrapper (events.js:420:28)
    at IncomingMessage.emit (events.js:326:22)
    at endReadableNT (_stream_readable.js:1252:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)

Previously, I could work around this by retrying the search.

To Reproduce
Use the package as you would normally.

Expected behavior
The package should not fail permanently on such requests.
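
Until the package handles this itself, a caller-side retry is one possible workaround. The sketch below assumes nothing about the library beyond "search returns a promise"; searchWithRetry, sleep, and the retries/delayMs options are hypothetical names, and retrying will not help if the IP is blocked outright.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// searchFn is any function returning a promise,
// e.g. () => youtube.search('some query').
const searchWithRetry = async (searchFn, { retries = 3, delayMs = 500 } = {}) => {
    let lastError;
    for (let attempt = 0; attempt <= retries; attempt++) {
        try {
            return await searchFn();
        } catch (error) {
            lastError = error;
            // Wait a little longer after each failed attempt.
            if (attempt < retries) await sleep(delayMs * (attempt + 1));
        }
    }
    // Every attempt failed: surface the last error to the caller.
    throw lastError;
};
```

Passing a function rather than a promise matters here, because each retry needs to start a fresh request.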

Screenshots
https://prnt.sc/v7ysxj

Versions:

  • Package: 0.2.4
  • Node: 14.11.0

Additional context
none

TypeError when using exported search function

const youtube = require('scrape-youtube');

Perhaps obvious, but youtube.search is no longer a function when the package is imported this way. Could the example be updated to help others? I do enjoy this package quite a bit! :)

const youtube = require('scrape-youtube').youtube;
youtube.search("hello-world").then((results) => {
    console.log(results);
});

Filter videos by hashtag

I know that you can filter by whether the video is live and so on, but it would be helpful to add a filter for Shorts only too.

It should be easy, since Shorts videos have "shorts" in their URL.
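
The URL check suggested above could be sketched as follows. isShortsUrl is a hypothetical helper, and it only inspects a URL string; whether the link field returned by this package ever carries the /shorts/ path is not confirmed here (the playlist output elsewhere on this page shows youtu.be links).

```javascript
// Hypothetical helper: true for URLs of the form https://www.youtube.com/shorts/<id>.
const isShortsUrl = (url) => {
    try {
        const { hostname, pathname } = new URL(url);
        const isYouTube = /(^|\.)youtube\.com$/.test(hostname);
        return isYouTube && pathname.startsWith('/shorts/');
    } catch {
        return false; // not a parseable URL
    }
};
```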

proxy

Hi,
thanks for this source code.
How can I use a proxy if my IP is blocked?

Convert subscriber count to number

Is your feature request related to a problem? Please describe.
The channel subscriber count is displayed as an abbreviated string such as 381K in the search results. Some smaller channels will have the full number, e.g. 127.
This isn't a huge problem, but it would be a nice feature to have.

Describe the solution you'd like
The string should be converted to a number. This should be included along with the original value in the results.

1.2M -> 1200000
381K -> 381000
1.5K -> 1500

Additional context
Example response:

{
    "id": "UC0hNui8bT7yV0Xb8w8YxjHw",
    "name": "Poets of the Fall (Official)",
    "link": "https://www.youtube.com/channel/UC0hNui8bT7yV0Xb8w8YxjHw",
    "verified": true,
    "thumbnail": "https://yt3.ggpht.com/ytc/AAUvwniodp2yJktm7UlbHChoA_yqHNDEAUUZlJOKj6Ltxw=s0?imgmax=0",
    "description": "Welcome to the official Poets of the Fall YouTube channel! Finnish rockers Poets of the Fall - singer Marko, guitarist Olli and ...",
    "videoCount": 98,
    "subscribers": "381K",
    "subscriberCount": 381000
}

Returns empty array instead of results

Describe the bug
When search is called multiple times with any query, it sometimes returns an empty array.

To Reproduce
Steps to reproduce the behavior:

  1. Try calling the example in readme multiple times

Add proxy support

Simply to bypass limits or blocks.

Proxied requests should be optional.
Options should be passed as arguments while searching, something like:

search('OK Go', { limit : 5, proxy : { ... } })

Video descriptions are always blank

Video description snippets are always blank. It appears the render data now uses detailedMetadataSnippets instead of descriptionSnippet
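
A fallback that reads either field might look like the sketch below. The nested shapes (descriptionSnippet.runs[].text and detailedMetadataSnippets[].snippetText.runs[].text) are assumptions about YouTube's render data, and getDescription and joinRuns are hypothetical names rather than functions from this package.

```javascript
// Joins the text of a "runs" array; returns '' when the array is missing.
const joinRuns = (runs) =>
    Array.isArray(runs) ? runs.map((run) => run?.text ?? '').join('') : '';

// Prefer the old field when present, otherwise fall back to the newer one.
const getDescription = (videoRenderer) => {
    const legacy = joinRuns(videoRenderer?.descriptionSnippet?.runs);
    if (legacy) return legacy;
    return joinRuns(videoRenderer?.detailedMetadataSnippets?.[0]?.snippetText?.runs);
};
```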

Versions:

  • Package: 2.1.13

Search erroring out

Describe the bug
Every time I want to search something, it just says: 'Failed to extract video data. The request may have been blocked'

To Reproduce
Steps to reproduce the behavior:
Just do any search with any string

Versions:

  • Package: 1.1.0 or 2.0.1
  • Node: lts

Country filter

How can I apply a country filter to a search query so that it returns only videos from that particular country?

URL Support

Is your feature request related to a problem? Please describe.

I'm always frustrated when I pass a URL as the query.

Describe the solution you'd like
Add URL support for the query, so that a video can be looked up by its URL.
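
In the meantime, a caller could pull the video ID out of a URL and search for that instead. The sketch below uses only the standard URL class; extractVideoId is a hypothetical helper and covers just the youtu.be and watch?v= forms.

```javascript
// Returns the video ID for common YouTube URLs, or null for anything else
// (including plain search text, which is not a valid URL).
const extractVideoId = (input) => {
    let url;
    try {
        url = new URL(input);
    } catch {
        return null;
    }

    const host = url.hostname.replace(/^www\./, '');
    if (host === 'youtu.be') return url.pathname.slice(1) || null;
    if (host === 'youtube.com' || host === 'm.youtube.com') {
        return url.searchParams.get('v');
    }
    return null;
};
```

A caller could then fall back to the raw text when null is returned, for example `extractVideoId(q) ?? q`.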

Can I search by channel?

I want to use this to scrape all the videos from a channel, but there doesn't seem to be any option to select a channel. Searching by the channel name doesn't return all videos on that channel.

Invalid JSON returned on empty searches where itemSectionRenderer does not exist

Describe the bug
Hi there,

This seems to happen when no results are returned:
eg. https://www.youtube.com/results?search_query=Benzinga+Live&sp=EgJAAQ%253D%253D

To Reproduce
Steps to reproduce the behavior:

  1. Make a search for which no results are returned
  2. See the error TypeError: Cannot read property 'itemSectionRenderer' of undefined

Expected behavior
The search should return empty
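
A defensive lookup that returns an empty list instead of throwing could be sketched like this. The property path into YouTube's initial data is an assumption and may not match what the package actually reads; getResultItems is a hypothetical name.

```javascript
// Assumed path: contents.twoColumnSearchResultsRenderer.primaryContents
//               .sectionListRenderer.contents[].itemSectionRenderer.contents
const getResultItems = (initialData) => {
    const sections =
        initialData?.contents?.twoColumnSearchResultsRenderer?.primaryContents
            ?.sectionListRenderer?.contents;
    if (!Array.isArray(sections)) return [];

    // Use the first section that actually has an itemSectionRenderer.
    const section = sections.find((entry) => entry?.itemSectionRenderer);
    return section?.itemSectionRenderer?.contents ?? [];
};
```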

Screenshots
The Benzinga search is the one that isn't returning results at this time, and this is where the error is thrown.

Versions:

  • Package: 2.1.3
  • Node: 14.16

Additional context
I guess YouTube changed their styling 😢

Explicit Content Filter

Adult Content Filter
I looked through the site and the docs but was unable to find a filter for explicit content.

Please consider adding this feature.

TypeError: Cannot read property 'contents' of undefined

Describe the bug
About one in four of my searches returns "Failed to extract video data. Please report this issue on GitHub so it can be fixed."

To Reproduce
The function I'm running this in, and the search terms are below:

const search = async function (searchTerm, link, viewers, channel) {
    return youtube.search(searchTerm, { type: 'live' }).then(results => {
        // IDs of streams that cannot be embedded
        const unembeddable = [
            'w_Ma8oQLmSM' //ABC
        ];
        let verified = results.streams.filter(result => result.channel.link == link);
        let popular = verified.filter(stream => stream.watching > viewers);
        // Tag each stream with the channel label used in the grid
        popular.forEach(item => { item.gridChannel = channel; });
        // Keep only streams whose ID is not in the unembeddable list
        let embeddable = popular.filter(item => !unembeddable.includes(item.id));
        return embeddable;
    });
};

let sky = search('Sky News live', 'https://www.youtube.com/user/skynews', 100, 'Sky News');
let cnn = search('CNN live', 'https://www.youtube.com/user/CNN', 100, 'CNN');
let euronews = search('Euronews live', 'https://www.youtube.com/user/Euronews', 100, 'Euronews');
let abcUS = search('ABC News live', 'https://www.youtube.com/user/ABCNews', 100, 'ABC News USA');
let cnaSingapore = search('CNA', 'https://www.youtube.com/user/channelnewsasia', 100, 'CNA');
let abcAUS = search('ABC News', 'https://www.youtube.com/channel/UCVgO39Bk5sMo66-6o6Spn6Q', 100, 'ABC News AUS');
let foxnews = search('Fox News live', 'https://www.youtube.com/user/FoxNewsChannel', 100, 'Fox News Channel');
let dw = search('DW News live', 'https://www.youtube.com/channel/UCknLrEdhRCp1aegoMqRaCZg', 100, 'DW');
let msnbc = search('msnbc live', 'https://www.youtube.com/user/msnbcleanforward', 100, 'MSNBC');
let aje = search('Al Jazeera live', 'https://www.youtube.com/user/AlJazeeraEnglish', 100, 'Al Jazeera English');
let france24 = search('France 24 live', 'https://www.youtube.com/user/france24english', 100, 'France24');
let nbcnews = search('NBC News live', 'https://www.youtube.com/user/NBCNews', 100, 'NBC News');
let cbsnews = search('CBS News live', 'https://www.youtube.com/user/CBSNewsOnline', 100, 'CBS News');
let pbsnews = search('PBS News', 'https://www.youtube.com/user/PBSNewsHour', 100, 'PBS Newshour');

Expected behavior
The data is returned without the error 😄

Versions:

  • Package: 2.0.7
  • Node: 12.18.3

Additional context
Happy to help reproduce this or log out better data to help 👍

YouTube Playlist Search always only results in two videos

Describe the bug
When I search for a playlist by its ID, the right playlist is found, but it only ever contains two videos although it should have 15.

YT-Playlist I used: PLcpBBg8UA1Oxz7973OVcCzyBR-caH7hCQ
Result:

{
  id: 'PLcpBBg8UA1Oxz7973OVcCzyBR-caH7hCQ',
  title: "Quentin Tarantino's DJANGO UNCHAINED Official Soundtrack",
  link: 'https://www.youtube.com/playlist?list=PLcpBBg8UA1Oxz7973OVcCzyBR-caH7hCQ',
  thumbnail: 'https://i.ytimg.com/vi/OhlVBpEnjig/hqdefault.jpg',
  channel: {
    name: 'Django Unchained OST',
    link: 'https://www.youtube.com/user/UnchainedSoundtrack',
    verified: false,
    thumbnail: 'https://www.gstatic.com/youtube/img/originals/promo/ytr-logo-for-search_160x160.png'
  },
  videoCount: 15,
  videos: [
    {
      id: 'OhlVBpEnjig',
      title: 'Django (Luis Bacalov)',
      link: 'https://youtu.be/OhlVBpEnjig',
      duration: 174,
      thumbnail: 'https://i.ytimg.com/vi/OhlVBpEnjig/hqdefault.jpg'
    },
    {
      id: 'XAWpIQATcRE',
      title: 'The Braying Mule (Ennio Morricone)',
      link: 'https://youtu.be/XAWpIQATcRE',
      duration: 154,
      thumbnail: 'https://i.ytimg.com/vi/XAWpIQATcRE/hqdefault.jpg'
    }
  ]
}

Nevertheless, great library by the way! I really like it 👍

CORS Error

Describe the bug
I always get a CORS error when trying to scrape youtube from a frontend project

To Reproduce
Steps to reproduce the behavior:

  1. Use scrape-youtube in a react project (any frontend project should work)
  2. try to scrape the results for a search term
  3. See error in console:

Versions:

  • Package: Newest
  • Node: v14.3.0

Maybe try adding https://cors-anywhere.herokuapp.com?
I already tried that but I somehow still get the CORS errors.

Unconfirmed CPU spikes

Received report of extremely high CPU spikes from a single search. This will need to be confirmed before a fix is pushed to ensure the issue is resolved.

On the AWS free-tier EC2 instance it spikes to over 80%,
and about the same on my 6700K.

Sometimes search incorrectly returns no results

Describe the bug
Hi, I don't know if this is really a bug or if there is a reason for it, but sometimes the result contains no videos even though the same search on YouTube returns results.

To Reproduce
Steps to reproduce the behavior:

youtube.search("l'enfant sauvage").then(results => {
    console.log(results)
});

The expected result is something like 20 videos,
but I got {videos: Array(0), playlists: Array(0), streams: Array(0)}

Most of my searches work without any problem, but some queries return nothing, for example:
l'enfant sauvage
Hero Of War
Bombtrack
Incense
Thunderstruck

Versions:

  • Package: 7.0.8 (npm -v youtube-scraper)
  • Node: 15.2.1

Google ads are being scraped

Describe the bug
Google ads are sometimes scraped causing an incorrect search result. This is tricky to reproduce because the response is not consistent. A sample is required to properly filter out the results.

Expected behavior
Google ads should not be scraped.

Versions:

  • Package: 0.1.7
  • Node: v13.8.0

TypeError: Cannot read property 'itemSectionRenderer' of undefined

Describe the bug
Running v2.0.5 but getting this error
Failed to extract video data. Please report this issue on GitHub so it can be fixed.

To Reproduce
Steps to reproduce the behavior:
yarn add scrape-youtube
include using promises:

    let output = await youtube.search(req.query.q, {
      page: req.query.page || 1
    })

yarn start

Results: Failed to extract video data. Please report this issue on GitHub so it can be fixed.

TypeError: The "listener" argument must be of type Function. Received type object

Getting error TypeError: The "listener" argument must be of type Function. Received type object

Any idea why?

The code is in React, with a contentEditable div that checks for a keyboard escape command this way:

console.log(term) // correct value typed inside the contentEditable field

const checkReturns = (event) => {
        if(event.key === 'Enter'){

            const options = {
                requestOptions: {
                    headers: {
                        Cookie: 'PREF=f2=8000000'
                    }
                }
            };

            youtube.search(term, options).then(results => {   // this throws the error at .search
                console.log(results.videos)
            });
        }
    }

The React return is as follows:

return (
        <div id="namebox" className="areatext list-group">
            <div
                ref={ref}
                id="first_line"
                contentEditable="true"
                style={{ minHeight: '26px' }}
                onInput={(e) => setTerm(e.currentTarget.textContent)}
                onKeyPress={(event) => checkReturns(event)}>
            </div>
        </div>
    )

Get the ID of a youtube channel

Hello. I'm writing here because I don't know exactly where to do it.

I'm trying to find a way to get the ID of a YouTube channel, through your "npm".

I know that your "npm" is used to obtain the information of the videos, streams, and playlists of a youtube channel, but I don't know if it is possible to get this type of information

No results for many short terms

These terms are failing regularly on my app:

'cat'
'cats'
'hey duggee'
'duggee'
'kids toys play'
'toys'
'roblox'
'lego'

Any idea why these might fail, while most requests work fine?

Learn semantic versioning please

Is your feature request related to a problem? Please describe.
I'm always frustrated when developers increment the minor version but change how the API works.

Describe the solution you'd like
The change from 8 days ago should have been a 2.0.0 release, because it breaks programs written against the previous version.

Additional context
https://semver.org

[Feature request] Support for new channel handles

Currently the old channel ID/URL is returned (still working) but YouTube is using @ handles.
This should be included in the 'channel' object along with the old ID.
Example:

{
    "id": "UCYDD7WruLEgEBfjxeor48aw",
    "name": "The Heavy",
    "link": "https://www.youtube.com/channel/UCYDD7WruLEgEBfjxeor48aw",
    "verified": true,
    "thumbnail": "https://yt3.ggpht.com/oMj9hZlJsbS-JI4MUZnwPDcfwIW36KhYeA5CnOG3_5lwHPnmaY_FL5hjBuUFBo32esrBmoOiZKI=s0?imgmax=0"
}

...will become:

{
    "id": "UCYDD7WruLEgEBfjxeor48aw",
    "name": "The Heavy",
    "handle": "@TheHeavyOfficial",
    "link": "https://www.youtube.com/@TheHeavyOfficial",
    "verified": true,
    "thumbnail": "https://yt3.ggpht.com/oMj9hZlJsbS-JI4MUZnwPDcfwIW36KhYeA5CnOG3_5lwHPnmaY_FL5hjBuUFBo32esrBmoOiZKI=s0?imgmax=0"
}

ID should remain as it is unique/locked, unlike handles.

Feature: Test runner

Preferably something like ava. Doesn't need to be too complex, just a quick search of each type to verify the response is correct.

This issue is open for anyone to add if they feel like contributing, otherwise I'll add it when I get the time.

Reintroduce support for playlists, live streams, etc.

Support will need to be added to search for channels, playlists, movies and live streams.
Each type should have its own interface in the interfaces file, extending the default Result interface when needed.

Channels

  • name
  • link
  • verified
    • If the channel has a verified badge
  • thumbnail

Live Streams

  • watching
    • Number of users watching the stream
  • duration
    • How long (in seconds) the stream has been running for

Playlists

  • videoCount
    • The number of videos in the playlist
  • videos
    • An array of videos in the playlist. If possible the Video interface should be used.

Movies

These are just the Result interface, but the channel data should be static. All YouTube movies have the same channel information.

Progress:

  • Channels
  • Playlists
  • Live Streams
  • Movies

[Question] Fetching Video ID

Is there any way to fetch the video ID of the first result? I just want to log the ID of the first video result.
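
Going by other snippets on this page, search results expose a videos array whose entries have an id field, so something like the sketch below should work. firstVideoId is a hypothetical helper, not part of the package.

```javascript
// Returns the ID of the first video, or null when there are no video results.
const firstVideoId = (results) => results?.videos?.[0]?.id ?? null;

// Possible usage, assuming youtube.search resolves to { videos: [...] }:
// youtube.search('some query').then((results) => console.log(firstVideoId(results)));
```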

Language

Is your feature request related to a problem? Please describe.
Apparently the json I'm getting now is in Ukrainian and I don't know if I can change it to English

Describe the solution you'd like
Changing the language according to preferences

URL Support

Add URL Support
My friend said this module doesn't support URL queries, so please add support for searching by URL.

youtube.search is not a function

Describe the bug
When trying to use search it returns with youtube.search is not a function.

Here is the code below

import youtube from 'scrape-youtube';
const getYouTubeDetails = await youtube.search(`${useSong.artist} - ${useSong.title}`);
console.log(getYouTubeDetails);

Returns with error:

UnhandledPromiseRejectionWarning: TypeError: youtube.search is not a function
    at Timeout._onTimeout (file:///C:/Users/ctouc/Documents/GitHub/billboard-scraper/youtube.js:25:53)
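
Reports on this page show the client being reached as the module itself, as a `.youtube` property, and as a `default` export, depending on the version and import style. A caller that cannot pin a version could probe for whichever one has search(), as in the sketch below. resolveClient is a hypothetical helper, and the three candidate shapes are taken only from the snippets on this page.

```javascript
// mod is whatever require('scrape-youtube') (or a namespace import) returned.
const resolveClient = (mod) => {
    const candidates = [mod, mod?.youtube, mod?.default];
    const client = candidates.find(
        (candidate) => candidate && typeof candidate.search === 'function'
    );
    if (!client) {
        throw new TypeError('No export with a search() function was found');
    }
    return client;
};

// Possible usage:
// const youtube = resolveClient(require('scrape-youtube'));
```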

search is not a function

In the latest version (2.1.10) the "search" function is not working; when I install the earlier version (2.1.9), it works.
