rsaved

A Python utility for mirroring personal reddit.com/saved feeds, and the content on these feeds. This project is in early development, and being modified in wild, major, inconsistent ways.

Requires youtube-dl, requests, ffmpeg, and Python 3.6 or newer. bottle.py is also required, but it is bundled with this repo.

It is in this author's interest not to force any non-trivial effort on the user just to run this program, to the point of forgoing official APIs or anything that requires a specially generated API key. If you can access it in your web browser, you should be able to access it locally without any extra effort.

How to use

(Hopefully)

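Clone the repo, then create a user for your reddit account using the private saved.json feed URL listed on your reddit account's RSS feed preferences page. Put the URL in quotes, because it contains an & that the shell would otherwise treat as a command separator.

$ python3 create_user.py 'https://www.reddit.com/saved.json?feed=558862fc6069139f1b02bbb226a9cfcdaa0207cf&user=saucecode'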

If done correctly, this will create some folders under your username in the user folder. Next, you need to make a local copy of all your saved posts.

$ python3 download_user.py [your reddit username]

This will start downloading all your saved reddit posts (the posts themselves, but not the content they link to). It takes me around 15 seconds to pull close to 1000 of them. You should see some new files appear in your user folder. Once this is done, you can run

$ python3 review_user.py [your reddit username]

This (for now) creates a file index_review.txt in your user folder. If it shows an approximate view of what your own reddit.com/saved page looks like, then you know it's done its job.
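How download_user.py pages through the feed isn't documented here, but reddit serves JSON feeds like saved.json as "Listing" objects: the items sit under `data.children`, and `data.after` names the next page (null on the last one). The sketch below walks such a feed, assuming the private feed URL accepts reddit's usual `limit` and `after` query parameters. `page_url`, `iter_saved`, `fetch_json` and the User-Agent string are hypothetical, not part of rsaved.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(feed_url: str, after: Optional[str] = None, limit: int = 100) -> str:
    """Return feed_url with limit/after set, keeping its feed= and user= params."""
    parts = urlsplit(feed_url)
    query = dict(parse_qsl(parts.query))
    query["limit"] = str(limit)
    if after:
        query["after"] = after
    else:
        query.pop("after", None)
    return urlunsplit(parts._replace(query=urlencode(query)))


def iter_saved(feed_url: str, fetch_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every saved item, following the listing's 'after' token page by page."""
    after = None
    while True:
        listing = fetch_json(page_url(feed_url, after))
        yield from listing["data"]["children"]
        after = listing["data"].get("after")
        if not after:  # reddit sets 'after' to null on the last page
            break


def fetch_json(url: str) -> dict:
    """Fetch one page with requests (hypothetical User-Agent string)."""
    import requests  # imported lazily so the helpers above work without it

    response = requests.get(url, headers={"User-Agent": "rsaved-sketch/0.1"}, timeout=30)
    response.raise_for_status()
    return response.json()
```

Usage: `posts = list(iter_saved(feed_url, fetch_json))`. Passing the fetch function in as a parameter keeps the paging logic testable without touching the network.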

If that all worked, you're all set to start downloading the actual pictures and videos. Beware: this can take some time, especially if you save a lot of videos.

$ python3 scrape_for_user.py [your reddit username]

You can configure a few aspects of this process in the rsaved.json and config.json files created in your user's folder. Not everything is implemented.
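The exact rules scrape_for_user.py uses aren't documented. One plausible split, given that the project depends on both requests and youtube-dl: links that already point at a media file are fetched with a plain HTTP download, and everything else (video hosts and the like) is handed to youtube-dl, which can use ffmpeg to merge separate audio and video streams. The extension list, the `choose_downloader` and `youtube_dl_command` names, and the output filename template are all hypothetical.

```python
import os
from typing import List
from urllib.parse import urlsplit

# Hypothetical: extensions treated as direct media that requests can fetch as-is.
DIRECT_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".mp4"}


def choose_downloader(url: str) -> str:
    """Return 'requests' for direct media links, 'youtube-dl' for everything else."""
    path = urlsplit(url).path  # ignores any ?query or #fragment
    _, ext = os.path.splitext(path)
    return "requests" if ext.lower() in DIRECT_EXTENSIONS else "youtube-dl"


def youtube_dl_command(url: str, out_dir: str) -> List[str]:
    """Build a youtube-dl command line; -o sets youtube-dl's output template."""
    template = os.path.join(out_dir, "%(id)s.%(ext)s")
    return ["youtube-dl", "-o", template, url]
```

Usage: `subprocess.run(youtube_dl_command(url, out_dir), check=True)`, where `out_dir` is whatever media folder you choose.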

You can now view your local mirror by running the built-in web server! Just run

$ python3 server.py

to launch a bottle.py server on port 8080. You can then go to http://localhost:8080/ and start browsing!

Configuration Files

Every user gets two configuration files: rsaved.json and config.json.

rsaved.json controls what you end up downloading. config.json controls how you download it. In config.json you can set a custom User-Agent and specify a proxy (only SOCKS5 has been tested; HTTP/HTTPS will probably work).
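The key names inside config.json aren't listed here, so `user_agent` and `proxy` below are hypothetical. The sketch only shows how such settings map onto the real `headers` and `proxies` arguments of requests. For SOCKS5, requests needs its optional SOCKS extra (`pip install requests[socks]`) and a `socks5://` or `socks5h://` proxy URL; with `socks5h`, DNS lookups also go through the proxy.

```python
import json
from typing import Any, Dict


def request_options(config: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a parsed config.json into keyword arguments for requests.get()."""
    options: Dict[str, Any] = {"timeout": 30}
    user_agent = config.get("user_agent")  # hypothetical key name
    if user_agent:
        options["headers"] = {"User-Agent": user_agent}
    proxy = config.get("proxy")  # hypothetical key, e.g. "socks5h://127.0.0.1:9050"
    if proxy:
        # requests picks a proxy by the target URL's scheme, so set both.
        options["proxies"] = {"http": proxy, "https": proxy}
    return options


def load_request_options(path: str) -> Dict[str, Any]:
    """Read config.json from disk and convert it with request_options()."""
    with open(path, encoding="utf-8") as f:
        return request_options(json.load(f))
```

Usage: `requests.get(url, **load_request_options(config_path))`, with `config_path` pointing at the config.json in your user's folder.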

index.pickle.gz

This is where a lot of the magic happens: this file (once updated) contains the information about every saved post for this user. Let me tell you how to use it.

import json
import rsaved

index = rsaved.load_index('your_username')  # returns the contents of index.pickle.gz

# print the URL of all the saved posts from /r/aww
for item in index:
    url = item['data'].get('url')  # saved comments typically have no 'url' field
    if item['data']['subreddit'] == 'aww' and url:
        print(url)

# if you're not yet familiar with the reddit object structure, familiarize yourself now
print(json.dumps(index[0], indent=4))
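To read the index without importing rsaved (from another script, say), note that the filename suggests a gzip-compressed pickle. Assuming it holds a list of reddit objects like the one printed above, the standard library can load and search it. `read_index`, `search_titles` and `count_by_subreddit` are hypothetical helpers. Saved posts (kind `t3`) carry a `title`, while saved comments (kind `t1`) carry a `body`, so the search checks both. Only unpickle files you created yourself, because loading a pickle can execute arbitrary code.

```python
import gzip
import pickle
from collections import Counter
from typing import Dict, List


def read_index(path: str) -> List[dict]:
    """Load a gzip-compressed pickle (assumed to be a list of reddit objects)."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


def search_titles(index: List[dict], term: str) -> List[dict]:
    """Case-insensitive search over post titles and comment bodies."""
    term = term.lower()
    hits = []
    for item in index:
        data = item["data"]
        text = data.get("title") or data.get("body") or ""
        if term in text.lower():
            hits.append(item)
    return hits


def count_by_subreddit(index: List[dict]) -> Dict[str, int]:
    """Count saved items per subreddit."""
    return dict(Counter(item["data"]["subreddit"] for item in index))
```

Usage: `index = read_index(path_to_index)` followed by, for example, `search_titles(index, "python")`, where `path_to_index` points at the index.pickle.gz in your user's folder.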

Why?

  • Because reddit won't display more than 1000 posts from your saved feed. Ever.
  • So that it can be searched, filtered, and analysed.
  • So that you can rip content which may one day be deleted.
  • And most importantly, so that it can be searched.