eben0 / snooshift

JavaScript wrapper library for Pushshift with Snoowrap support.

License: MIT License

TypeScript 69.36% JavaScript 30.64%
reddit-api pushshift snoowrap

snooshift's Introduction

SnooShift

JavaScript wrapper library for Pushshift with Snoowrap support.

Install

npm i -S snooshift

Searching Comments

import { SnooShift } from "snooshift";

// create new object
const snoo = new SnooShift();

// search parameters
// https://github.com/pushshift/api#search-parameters-for-comments

// search comments by author
const searchParams = {
  author: "eben0",
};

// send request
snoo.searchComments(searchParams).then((comments) => {
  console.log(comments);
});

Get Single Comment

// get single comment by id
snoo.getComment("gof4uys").then((comment) => {
  console.log(comment);
});

Searching Submissions

// search parameters
// https://github.com/pushshift/api#search-parameters-for-submissions

// search submissions by author
const searchParams = {
  author: "eben0",
};
snoo.searchSubmissions(searchParams).then((submissions) => {
  console.log(submissions);
});
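
Both search calls take the Pushshift search parameters linked above, not only author. The following is a minimal sketch of building a time-windowed parameter object, assuming the Pushshift `after` and `before` parameters are given as epoch seconds (as in the scraper code quoted in the issues further down). `buildWindowParams` and `toEpochSeconds` are hypothetical helper names for illustration and are not part of snooshift.

```typescript
// Hypothetical parameter shape; field names follow the Pushshift search parameters.
interface WindowParams {
  subreddit: string;
  after: number; // lower bound, epoch seconds
  before: number; // upper bound, epoch seconds
  size: number;
}

// Convert a Date to whole epoch seconds.
function toEpochSeconds(date: Date): number {
  return Math.floor(date.getTime() / 1000);
}

// Build search parameters covering the window [from, to).
function buildWindowParams(subreddit: string, from: Date, to: Date, size = 100): WindowParams {
  if (from.getTime() >= to.getTime()) {
    throw new RangeError("`from` must be earlier than `to`");
  }
  return { subreddit, after: toEpochSeconds(from), before: toEpochSeconds(to), size };
}

// Example: one day of r/askscience submissions.
const windowParams = buildWindowParams(
  "askscience",
  new Date("2021-03-01T00:00:00Z"),
  new Date("2021-03-02T00:00:00Z")
);
console.log(windowParams);
```

The resulting object can be passed as-is to snoo.searchSubmissions or snoo.searchComments.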

Get Single Submission

// get single submission by id, using the Pushshift `ids` search parameter
snoo.searchSubmissions({ ids: "lrufxe" }).then((submissions) => {
  console.log(submissions[0]);
});

Interacting with Reddit

You can reply, upvote, and otherwise interact with Reddit through the Snoowrap object. To do so, you must set up your Reddit API credentials.

import { SnooShift } from "snooshift";

// list of supported credentials:
// https://github.com/not-an-aardvark/snoowrap#examples
const credentials = {
  userAgent: "put your user-agent string here",
  clientId: "put your client id here",
  clientSecret: "put your client secret here",
  refreshToken: "put your refresh token here",
};

const snoo = new SnooShift(credentials);

// get comment and reply/upvote/etc...
snoo.getComment("gof4uys").then((comment) => {
  comment.reply("My awesome reply").then((reply) => console.log(reply));
  comment.upvote().then(() => console.log("upvoted"));
  comment.delete().then(() => console.log("deleted"));
});

Querying Elasticsearch

If you are familiar with the Elasticsearch query syntax, you can query the Elasticsearch server directly.

import { SnooShift } from "snooshift";

const snoo = new SnooShift();

// elasticsearch query
// this query matches everything by the author, sorted by created_utc (newest first)
const query = {
  query: {
    term: { author: "eben0" },
  },
  sort: {
    created_utc: "desc",
  }
};

// searches for author's comments
snoo.elasticComments(query).then((result) => {
  console.log(result.hits.hits[0]._source);
});

// searches for author's submissions
snoo.elasticSubmissions(query).then((result) => {
  console.log(result.hits.hits[0]._source);
});
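
Indexing `result.hits.hits[0]` directly throws when the query matches nothing. The sketch below collects every hit's `_source` and returns an empty array when there are no hits. The response shape is an assumption inferred from the `result.hits.hits[0]._source` access above, and `extractSources` is a hypothetical helper, not part of snooshift.

```typescript
// Assumed response shape, inferred from `result.hits.hits[0]._source`.
interface ElasticHit<T> {
  _source: T;
}
interface ElasticResult<T> {
  hits?: { hits?: ElasticHit<T>[] };
}

// Hypothetical helper: return the _source of every hit, or [] when nothing matched.
function extractSources<T>(result: ElasticResult<T>): T[] {
  return (result.hits?.hits ?? []).map((hit) => hit._source);
}

// Example with a hand-written response object.
const sampleResult: ElasticResult<{ author: string; created_utc: number }> = {
  hits: { hits: [{ _source: { author: "eben0", created_utc: 1614000000 } }] },
};
console.log(extractSources(sampleResult));
console.log(extractSources({ hits: { hits: [] } }));
```

With a helper like this, the callbacks above could log `extractSources(result)` instead of reading the first hit.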

snooshift's People

Contributors

eben0, raphael0010, wasserholz

snooshift's Issues

Cannot read sort_type `new` posts for `/r/askscience`

Thanks for making a useful wrapper!

I noticed when I try to get new posts on some subreddits, including /r/askscience and others, I get zero posts:

const searchParams = {
	size: 25,
	sort_type: 'new',
	subreddit: 'askscience'
};
const res = await snoo.searchSubmissions(searchParams); // zero results

I have also tried setting sort_type to created_utc.

Any insight on why this would be the case?

Incorrect upvote_ratio and score on submissions

The title sums up the issue.

My use case: I am scraping meme subreddits and using upvotes/downvotes to create a way to bet on whether a meme is dank or not (at least according to the subreddit it was posted in). I have a fully functional system for this in Python, and I wanted to move it to TS for the many quality-of-life improvements. Sadly, the submissions returned by snooshift have incorrect upvote_ratio and score values: 99% of the time both are 1, which makes the computed downvotes 0. The properties on snooshift's Submission interface are nearly identical to those of the Submission class in the Python praw package, and I use the same property names, upvote_ratio and score.

To reproduce, simply call searchSubmissions and inspect upvote_ratio and score across several submissions.

My code snippet is attached below. I would REALLY LOVE to be able to use snooshift for this instead of Python. It is much more performant and clean.

@Injectable()
export class WebScraperService {
  private readonly logger = new Logger(WebScraperService.name);
  private readonly subreddits = ["dankmemes", "memes"];
  private readonly exts = [".jpg", ".jpeg", ".png"];
  private readonly snoo = new SnooShift();
  private redditScraperMutex: boolean = false;
  private imgflipScraperMutex: boolean = false;

  constructor(
    private readonly redditMemeService: RedditMemeService,
    private readonly redditorService: RedditorService,
    private readonly imgflipTemplateService: ImgflipTemplateService
  ) {}

  // @Cron(CRON_SCHEDULES[ECronJobRegistry.RedditMemeScrapper], { name: ECronJobRegistry.RedditMemeScrapper })
  async redditorMemeScrapper() {
    if (this.redditScraperMutex) return;
    else this.redditScraperMutex = true;
    for (const subreddit of this.subreddits) {
      this.logger.log(`RUNNING REDDIT MEME SCRAPPER: r/${subreddit}`);
      try {
        await this.scrapeSubReddit({ subreddit });
      } catch (error) {
        this.logger.error(error.message, error.stack);
      }
    }
    this.logger.log("DONE REDDIT MEME SCRAPPER");
    this.redditScraperMutex = false;
  }

  private async scrapeSubReddit({ subreddit, gracePeriod = 7 }: { subreddit: string; gracePeriod?: number }) {
    const endAt = dayjs().startOf("h").subtract(gracePeriod, "d");
    const maxCreatedAt = await this.redditMemeService.repo.max("createdAt");
    let startAt = maxCreatedAt && maxCreatedAt.result ? dayjs(maxCreatedAt.result) : dayjs().startOf("d").subtract(62, "day");
    while (startAt < endAt) {
      this.logger.log(`SCRAPPING STARTING AT ${startAt}`);
      const after = startAt.unix(),
        before = endAt.unix();
      const unfilteredSubmissions = (await this.snoo.searchSubmissions({
        subreddit,
        after,
        before,
        size: 100,
        stickied: false,
      })) as Submission[];
      // console.log("unfilteredSubmissions", unfilteredSubmissions);
      // throw new Error("check");
      const submissions = unfilteredSubmissions.filter(({ url }) => this.exts.some((ext) => url.endsWith(ext)));
      const usernames = submissions.map(({ author_fullname }) => author_fullname);
      const redditors = await this.redditorService.repo.find({ where: { username: In(usernames) } });
      const usernameToOldRedditor = redditors.reduce<Record<string, RedditorEntity>>(
        (prev, redditor) => ({ [redditor.username]: redditor, ...prev }),
        {}
      );
      const urls = submissions.map(({ url }) => url);
      const redditMemes = await this.redditMemeService.repo.find({ select: ["url"], where: { url: In(urls) } });
      const urlSet = new Set(redditMemes.map(({ url }) => url));
      const dedupSubmissions = submissions.filter(({ url }) => url && !urlSet.has(url));
      const usernameToNewRedditor = dedupSubmissions
        .filter(({ author_fullname }) => !usernameToOldRedditor[author_fullname])
        .reduce<Record<string, RedditorEntity>>(
          (prev, { author_fullname }) => ({ [author_fullname]: this.redditorService.repo.create({ username: author_fullname }), ...prev }),
          {}
        );
      await this.redditorService.repo.save(Object.values(usernameToNewRedditor));
      const usernameToRedditor = { ...usernameToOldRedditor, ...usernameToNewRedditor };

      const urlToNewRedditMeme = dedupSubmissions.reduce<Record<string, RedditMemeEntity>>(
        (prev, { id, num_comments, title, score, created_utc, upvote_ratio, url, author_fullname }) => ({
          [url]: this.redditMemeService.repo.create({
            redditId: id,
            numComments: num_comments,
            upvotes: score,
            createdAt: dayjs(created_utc * 1000).toDate(),
            downvotes: Math.round(score / upvote_ratio) - score,
            title,
            url,
            upvoteRatio: upvote_ratio,
            redditorId: usernameToRedditor[author_fullname].id,
            subreddit,
          }),
          ...prev,
        }),
        {}
      );
      await this.redditMemeService.repo.save(Object.values(urlToNewRedditMeme));
      startAt = dayjs(1000 * Math.max(...submissions.map(({ created_utc }) => created_utc)));
      await new Promise((r) => setTimeout(r, 5000));
    }
  }
}
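
A note on the downvote arithmetic in the snippet: `Math.round(score / upvote_ratio) - score` treats score as the upvote count. If instead score is the net value (upvotes minus downvotes) and upvote_ratio is upvotes divided by total votes, which is how Reddit's API describes these fields, then total = score / (2 * upvote_ratio - 1). The sketch below works under exactly those two assumptions; `estimateVotes` is a hypothetical helper, and Reddit fuzzes vote counts, so the result is only an estimate.

```typescript
interface VoteEstimate {
  upvotes: number;
  downvotes: number;
}

// Assumes score = upvotes - downvotes and upvoteRatio = upvotes / (upvotes + downvotes).
// Then score = (2 * upvoteRatio - 1) * total, which is solved for total below.
function estimateVotes(score: number, upvoteRatio: number): VoteEstimate | null {
  // Reject ratios outside [0, 1] (this also rejects NaN).
  if (!(upvoteRatio >= 0 && upvoteRatio <= 1)) return null;
  const denominator = 2 * upvoteRatio - 1;
  // At a ratio of exactly 0.5 the net score is 0 for any vote total, so it cannot be recovered.
  if (denominator === 0) return null;
  const total = Math.round(score / denominator);
  const upvotes = Math.round(upvoteRatio * total);
  return { upvotes, downvotes: total - upvotes };
}

console.log(estimateVotes(60, 0.8));
console.log(estimateVotes(1, 1));
```

With score 1 and upvote_ratio 1, both formulas give zero downvotes, so the arithmetic does not account for the values reported in this issue; the stored score and upvote_ratio themselves would have to be refreshed from Reddit.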

script tag usage?

Is there a way to load this package from jsDelivr or unpkg with a <script> tag on the page, instead of having to install it from npm?
