A crawler for related YouTube channels. Why? Just playing with the data; maybe it's nothing major (or maybe it's something very major).
It works by performing a breadth-first search (BFS) over YouTube channels, starting from the TotalBiscuit channel. The crawl covers ~3 million channels.
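The BFS crawl can be sketched as a queue of channel IDs plus a "seen" set, so that each channel is fetched only once. This is a minimal sketch, not the crawler's actual code: `bfs_crawl` and the `fetch_related` callback (which would scrape or query the related channels of one channel) are hypothetical names, and the toy graph stands in for real YouTube data.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def bfs_crawl(
    start_id: str,
    fetch_related: Callable[[str], Iterable[str]],
    max_channels: int = 3_000_000,
) -> Dict[str, List[str]]:
    """Breadth-first crawl of the related-channels graph.

    fetch_related is a hypothetical callback returning the related channel
    IDs of one channel. Returns a map: channel ID -> related channel IDs,
    in the order channels were visited.
    """
    visited: Dict[str, List[str]] = {}
    queue = deque([start_id])
    seen = {start_id}  # IDs already enqueued, so nothing is fetched twice
    while queue and len(visited) < max_channels:
        channel_id = queue.popleft()  # FIFO order is what makes this BFS
        related = list(fetch_related(channel_id))
        visited[channel_id] = related
        for other in related:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return visited


# Toy usage: a tiny in-memory graph instead of real YouTube pages.
graph = {"A": ["B", "C"], "B": ["C", "D"], "C": ["A"], "D": []}
result = bfs_crawl("A", lambda cid: graph.get(cid, []))
print(list(result))  # ['A', 'B', 'C', 'D']
```

Using a FIFO queue means channels closest to the starting channel are visited first, so stopping early (via the hypothetical `max_channels` cap) still yields the neighborhood nearest to the seed.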
Once crawling is done, the final file is written to `youtube-user.json`, which takes ~700MB on disk. Each record has the following format:
```json
{
  "id": "UCy1Ms_5qBTawC-k7PVjHXKQ",
  "title": "TotalBiscuit, The Cynical Brit",
  "related": [
    "UC3kJdy9_bXFg8flaSY3RAcQ",
    "UCSvQyDawUyfXzrSq4dTVraQ",
    "UCvk51_ZhXooIukEKlbw08rA",
    "UCqZ0rqkoUeYlcxlUyqSgpdg",
    "UCCbfB3cQtkEAiKfdRQnfQvw",
    "UC_ufxdQbKBrrMOiZ4LzrUyA",
    "UC90ThxjTNaHaqyPVtfyZ4hw",
    "UCWCw2Sd7RlYJ2yuNVHDWNOA",
    "UCS2OAdHoLt-9T6cG9A2H49Q"
  ],
  "subscribers": "2,104,917",
  "relatedTitle": "Check out our colleagues"
}
```
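Since the output is ~700MB, it is usually better to stream records than to load the whole file at once. The sketch below assumes one JSON record per line; that layout is an assumption, so check how `youtube-user.json` is actually laid out before relying on it. It also converts the `subscribers` display string (e.g. `"2,104,917"`) into an integer stored under `subscriberCount`, a hypothetical field name added only for illustration.

```python
import json
from typing import Any, Dict, Iterator


def parse_record(line: str) -> Dict[str, Any]:
    """Parse one channel record and add a numeric subscriber count."""
    record = json.loads(line)
    # "subscribers" is a formatted string such as "2,104,917"; strip the commas.
    raw = record.get("subscribers") or ""
    digits = raw.replace(",", "")
    # Hypothetical field: None when the count is missing or not purely numeric.
    record["subscriberCount"] = int(digits) if digits.isdigit() else None
    return record


def iter_records(path: str) -> Iterator[Dict[str, Any]]:
    """Stream records, assuming one JSON object per line (unverified layout)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield parse_record(line)


sample = (
    '{"id": "UCy1Ms_5qBTawC-k7PVjHXKQ", "title": "TotalBiscuit, The Cynical Brit", '
    '"related": ["UC3kJdy9_bXFg8flaSY3RAcQ"], "subscribers": "2,104,917", '
    '"relatedTitle": "Check out our colleagues"}'
)
rec = parse_record(sample)
print(rec["subscriberCount"])  # 2104917
```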
Note: a YouTube channel can have a couple of related-channel sections. By default, this crawler takes only the first one (most of the time, the first section is created by the channel owner rather than generated automatically).
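The "first section only" rule can be illustrated with a small helper. This is a sketch under assumptions: the `(title, channel IDs)` section shape, the `pick_related` name and the `all_sections` switch are hypothetical and not part of the crawler as described; the switch only shows what merging every section (deduplicated, in page order) would look like instead.

```python
from typing import List, Sequence, Tuple

# Hypothetical shape: each section is (section title, channel IDs),
# listed in the order the sections appear on the channel page.
Section = Tuple[str, List[str]]


def pick_related(sections: Sequence[Section], all_sections: bool = False) -> List[str]:
    """Return related channel IDs for one channel.

    Default: only the first section, matching the crawler's described behavior.
    all_sections=True (hypothetical) merges every section, keeping the first
    occurrence of each ID.
    """
    if not sections:
        return []
    if not all_sections:
        return list(sections[0][1])
    merged: List[str] = []
    seen = set()
    for _, ids in sections:
        for channel_id in ids:
            if channel_id not in seen:
                seen.add(channel_id)
                merged.append(channel_id)
    return merged


sections = [
    ("Check out our colleagues", ["A", "B"]),
    ("Popular channels", ["B", "C"]),
]
print(pick_related(sections))                     # ['A', 'B']
print(pick_related(sections, all_sections=True))  # ['A', 'B', 'C']
```

Keeping only the first section keeps the graph focused on links chosen by people, at the cost of ignoring any additional sections.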
License: MIT