

TrufflePig

A Steemit Curation Bot based on Natural Language Processing and Machine Learning


Steemit can be a tough place for minnows, as new users are often called. I had to learn this myself. Due to the incredible number of new posts published every minute, it is exceptionally difficult to stand apart from the crowd. Nice, well-researched, and well-crafted posts from minnows are often overlooked. Minnows do not benefit from influential followers to upvote their high-quality posts. Their contributions are lost long before any whale may notice them and turn these posts into trending topics.

User-based curation does have merit, and posts can receive the traction and recognition they deserve. Still, I believe there is a way to support the Steemit content curators so that high-quality content no longer goes unnoticed. I have developed a curation bot called TrufflePig to do exactly this using Natural Language Processing and Machine Learning. The deployed bot can be found here: https://steemit.com/@trufflepig

The Concept

The idea is to use well-received posts as training examples to teach a Machine Learning Regressor (MLR) what high-quality Steemit content looks like. Once trained, the MLR is used to identify high-quality posts that were missed by the curation community. These posts, which receive less payment than they deserve, are dubbed truffles.

The general idea of the system is as follows:

  1. I train the Machine Learning Regressor (MLR) using Steemit posts as inputs and the corresponding Steem Dollar (SBD) rewards and numbers of votes as outputs.

  2. The MLR learns to predict potential payouts for new Steemit posts.

  3. I compare the predicted payout with the actual payout of recent Steemit posts (between 2 and 26 hours old). When the model predicts a high reward that the post did not actually receive, I classify the post as an overlooked truffle.
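The comparison in step 3 can be sketched as follows. This is a minimal illustration, not the bot's actual code: the `Post` record, the `find_truffles` helper, and the thresholds `min_predicted_reward`, `min_ratio`, and `top_n` are hypothetical, and only the reward output of the MLR (not the vote count) is used here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Post:
    # Hypothetical minimal record; the real bot works on richer scraped data.
    author: str
    permalink: str
    actual_reward: float     # SBD the post actually received
    predicted_reward: float  # SBD the trained regressor predicts


def find_truffles(posts: List[Post],
                  min_predicted_reward: float = 10.0,
                  min_ratio: float = 5.0,
                  top_n: int = 10) -> List[Post]:
    """Return posts with a high predicted reward that was not actually assigned."""
    candidates = [
        p for p in posts
        if p.predicted_reward >= min_predicted_reward
        # Guard against division-like blow-ups for posts that earned nothing.
        and p.predicted_reward >= min_ratio * max(p.actual_reward, 0.01)
    ]
    # Rank by the gap between what the model expected and what was paid.
    candidates.sort(key=lambda p: p.predicted_reward - p.actual_reward, reverse=True)
    return candidates[:top_n]
```

With these thresholds, a post predicted at 20 SBD that earned 0.5 SBD qualifies, while one predicted at 25 SBD that already earned 30 SBD does not.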

The Implementation

The Machine Learning Regression model is trained on posts older than 7 days which have already been paid. Features include spelling errors, post length, and readability scores. A post's content is modelled as a Latent Semantic Indexing projection. The final regressor is a multi-output Random Forest.
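The hand-crafted features can be illustrated with a small sketch. The README does not say which readability score is used, so the Flesch reading ease below is an assumption chosen as a well-known example; the syllable counter is a crude vowel-group heuristic, and spelling errors are counted against a caller-supplied vocabulary rather than a real spell checker. The helper names are hypothetical, and the LSI projection and Random Forest are not shown.

```python
import re
from typing import Dict, Set

WORD_RE = re.compile(r"[A-Za-z']+")


def count_syllables(word: str) -> int:
    """Crude heuristic: count groups of consecutive vowels (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def extract_features(body: str, vocabulary: Set[str]) -> Dict[str, float]:
    """Compute simple length, spelling, and readability features for a post body."""
    words = WORD_RE.findall(body)
    sentences = [s for s in re.split(r"[.!?]+", body) if s.strip()]
    n_words = max(1, len(words))          # avoid division by zero
    n_sentences = max(1, len(sentences))
    syllables = sum(count_syllables(w) for w in words)
    misspelled = sum(1 for w in words if w.lower() not in vocabulary)
    # Flesch reading ease (assumed example of a readability score).
    flesch = 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (syllables / n_words)
    return {
        "body_length": float(len(body)),
        "num_words": float(len(words)),
        "num_sentences": float(len(sentences)),
        "spelling_error_ratio": misspelled / n_words,
        "flesch_reading_ease": flesch,
    }
```

In a full pipeline, vectors like these would be concatenated with the LSI topic weights before being fed to the regressor.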

The bot uses the official Steem Python library to scrape data from the Steem blockchain and to post a daily toplist of the truffles found by its trained model.

The bot works as follows:

  1. Older data is scraped from the blockchain (see bchain.getdata.py) or loaded from disk if possible.

  2. The scraped posts are filtered and preprocessed (see preprocessing.py).

  3. A model is trained on the processed data if one does not yet exist (see model.py) or is otherwise loaded from disk.

  4. More recent data is scraped and checked for truffles using this trained model.

  5. The bot publishes a toplist of truffles and both upvotes and comments on the listed posts (see bchain.postdata.py).
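Steps 1 and 3 follow a "load from disk if possible, otherwise build and cache" pattern. A minimal sketch using `pickle`, assuming a hypothetical `load_or_build` helper; the repository may store its data and model in a different format.

```python
import os
import pickle
from typing import Any, Callable


def load_or_build(path: str, build: Callable[[], Any]) -> Any:
    """Load a pickled object from `path` if it exists; otherwise build and store it."""
    if os.path.isfile(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    obj = build()
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)
    return obj

# Hypothetical usage (scraper, preprocessing, and training functions not shown):
#   raw = load_or_build("cache/raw_posts.pkl", scrape_old_posts)
#   model = load_or_build("cache/model.pkl", lambda: train_model(preprocess(raw)))
```

Only pickle files you created yourself should be loaded this way, since unpickling untrusted data can execute arbitrary code.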

Installation and Execution

Clone the project directory:

$ git clone https://github.com/SmokinCaterpillar/TrufflePig.git

Add the project directory to your PYTHONPATH, e.g.

$ export PYTHONPATH=$PYTHONPATH:<path_to_project>

Start the bot using the provided main.py driver:

$ python main.py

You can manually set the time the bot considers as now via the --now flag, for example:

--now='2018-01-01-11:42:42'

By default, the bot will not post to the blockchain. To enable posting use the --broadcast flag.
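How the two flags could be parsed is sketched below with `argparse`. This is an illustration, not the code in main.py: the parser, the `recent_window` helper, and the assumption that `--now` uses the format `%Y-%m-%d-%H:%M:%S` (inferred from the example above) are hypothetical. The window reproduces the "between 2 and 26 hours old" range from the concept section.

```python
import argparse
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

# Assumed format, inferred from the README example '2018-01-01-11:42:42'.
NOW_FORMAT = "%Y-%m-%d-%H:%M:%S"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="TrufflePig driver (sketch)")
    parser.add_argument("--now", type=lambda s: datetime.strptime(s, NOW_FORMAT),
                        default=None, help="time the bot treats as 'now'")
    parser.add_argument("--broadcast", action="store_true",
                        help="actually post, vote, and comment on the blockchain")
    return parser.parse_args(argv)


def recent_window(now: datetime, min_age_h: int = 2,
                  max_age_h: int = 26) -> Tuple[datetime, datetime]:
    """Creation-time range of posts that are between min_age_h and max_age_h hours old."""
    return now - timedelta(hours=max_age_h), now - timedelta(hours=min_age_h)


if __name__ == "__main__":
    args = parse_args()
    # Fall back to the current UTC time as a naive datetime.
    now = args.now or datetime.now(timezone.utc).replace(tzinfo=None)
    start, end = recent_window(now)
    print(f"Checking posts created between {start} and {end}; broadcast={args.broadcast}")
```

Keeping `--broadcast` as an opt-in `store_true` flag means a forgotten argument results in a dry run rather than unintended votes.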

To configure the bot's account, set the environment variables STEEM_ACCOUNT, STEEM_POSTING_KEY, and STEEM_PASSWORD. STEEM_PASSWORD is optional and used only to encrypt the wallet file. This password should not be your Steemit master password.
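Reading these variables can be sketched as follows. The `AccountConfig` class and `load_account_config` helper are hypothetical and only illustrate the required/optional split described above; they are not taken from the repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class AccountConfig:
    account: str
    posting_key: str
    password: Optional[str]  # optional; only used to encrypt the wallet file


def load_account_config(env: Mapping[str, str] = os.environ) -> AccountConfig:
    """Read account settings, failing early if a required variable is missing."""
    missing = [name for name in ("STEEM_ACCOUNT", "STEEM_POSTING_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return AccountConfig(
        account=env["STEEM_ACCOUNT"],
        posting_key=env["STEEM_POSTING_KEY"],
        password=env.get("STEEM_PASSWORD"),  # None when not set
    )
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to test without touching the real environment.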

Open Source Usage

The bot is open source and can be freely used for non-commercial (!) purposes. Please check the LICENSE file.


(The bot's avatar has been created using https://robohash.org/)
