
archivers's Introduction

ARchivers - repository for TwittAR and ARticle, two tools which record tweets and articles respectively, permanently storing them on Arweave via Bundlr.

To run either TwittAR or ARticle, you need an Arweave wallet - more specifically, an Arweave wallet keyfile. Copy the contents of this keyfile into the "arweave" section of the example wallet file (example.wallet.json).
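
For reference, an Arweave keyfile is a standard RSA JWK, so after copying it in, the "arweave" section of example.wallet.json ends up shaped roughly like the sketch below (key material redacted; any other sections of the example file are left untouched):

    {
      "arweave": {
        "kty": "RSA",
        "e": "AQAB",
        "n": "<redacted>",
        "d": "<redacted>",
        "p": "<redacted>",
        "q": "<redacted>",
        "dp": "<redacted>",
        "dq": "<redacted>",
        "qi": "<redacted>"
      }
    }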

This assumes you're running Chromium (browserless) on port 3000 - see "Running Chromium with Docker" below.

Run yarn to install dependencies.

TwittAR

To run TwittAR you need Twitter API keys, which you can get via a Twitter developer account; you will also need elevated API access. Follow this answer to obtain the keys: https://stackoverflow.com/a/6875024/18012461. Then fill in the relevant fields in example.wallet.json and rename it to wallet.json.
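
The exact key names are defined by example.wallet.json itself; purely as an illustration (the field names below are hypothetical, not the repo's), a filled-in Twitter section might look like:

    {
      "twitter": {
        "apiKey": "<API key from the developer portal - field names illustrative>",
        "apiSecret": "<API secret>",
        "accessToken": "<access token>",
        "accessSecret": "<access token secret>"
      }
    }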

Then in the developer portal, request elevated access - this should be approved almost immediately.

ARticle

For ARticle, you need a NewsAPI API key, which you can get at https://newsapi.org.
Add this to your wallet.json (or to example.wallet.json, then rename it to wallet.json).
(ARticle can be run without one, e.g. when used as an external import - just don't invoke updateNewsApi.)
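
Again, use whatever field example.wallet.json already provides for this; as a hypothetical illustration only, the addition might look like:

    {
      "newsapi": "<your NewsAPI key - field name illustrative>"
    }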

Tweak config.json as required, adding your keyterms, and set instances to about 80% of your MAX_CONCURRENT_SESSIONS value.

If you notice too many re-uploads of unchanged data, or the system not responding to changes, adjust the difference value in the config - lower means more sensitive to changes.
Remember to change the queryID value in the configuration to distinguish your collection from others - you can always filter by owner address, but this allows for more fine-grained control.
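
Pulling those settings together, a config.json might look roughly like the following. The four field names (queryID, keyterms, instances, difference) come from this README, but the values and the flat layout are assumptions - instances of 48 assumes the MAX_CONCURRENT_SESSIONS=60 used in the Docker command below:

    {
      "queryID": "my-archive-001",
      "keyterms": ["example keyterm", "another keyterm"],
      "instances": 48,
      "difference": 0.05
    }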

Running

Install PM2 globally (use an elevated terminal):

yarn global add pm2

Build the project:

yarn build

Start the project (TwittAR and ARticle):

pm2 start ARchiver.ecosystem.config.js
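
The repository ships its own ARchiver.ecosystem.config.js; the sketch below only illustrates the general shape of a PM2 ecosystem file with two apps - the names and script paths are assumptions, not the actual contents of that file:

    // Illustrative sketch only - defer to the ARchiver.ecosystem.config.js in the repo.
    module.exports = {
      apps: [
        { name: "TwittAR", script: "./dist/twittar.js" }, // hypothetical entry point
        { name: "ARticle", script: "./dist/article.js" }  // hypothetical entry point
      ]
    };

Once started, pm2 status and pm2 logs are handy for checking that both processes stay up.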

Running Chromium with Docker

Docker command to create the headless Chrome (browserless) host:

docker run --shm-size=4096m \
  -e KEEP_ALIVE=true \
  -e MAX_CONCURRENT_SESSIONS=60 \
  -e MAX_QUEUE_LENGTH=400 \
  -e CONNECTION_TIMEOUT=180000 \
  -p 3000:3000 \
  --restart always \
  -d --name bc \
  browserless/chrome

Tweak the MAX_CONCURRENT_SESSIONS value as required - higher = more load but a higher chance of content being archived (download requests are dropped if the queue gets too full).
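
A quick way to confirm the container is reachable is to connect to its WebSocket endpoint with puppeteer-core - this assumes (it is not stated in this README) that the archivers talk to browserless in the same way:

    // Minimal connectivity check against the browserless container started above.
    const puppeteer = require("puppeteer-core");

    (async () => {
      const browser = await puppeteer.connect({
        browserWSEndpoint: "ws://localhost:3000", // the port mapped in the docker run command
      });
      const page = await browser.newPage();
      await page.goto("https://example.com");
      console.log(await page.title()); // should print "Example Domain"
      await page.close();
      await browser.disconnect();
    })();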

archivers's People

Contributors

jessetherobot, joshbenaron


archivers's Issues

Archive Weibo content

Regarding archiving Weibo: Weibo is not as open as Twitter and only offers a small number of APIs, so building on those APIs or writing a web crawler would likely be hard.

A better approach might be to parse the Twitter accounts that manually repost content from Weibo and WeChat, such as @weibo_read and @TGTM_Official.

Their content may be somewhat "biased", since they manually select what to repost, but parsing them should be very easy: it only requires adding their user IDs to the Twitter archive config file and archiving everything they post.
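
If TwittAR's keyterms are passed straight through to Twitter search (an assumption, not something this README confirms), those repost accounts could in principle be targeted with the standard from: search operator:

    {
      "keyterms": ["from:weibo_read", "from:TGTM_Official"]
    }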
