
archivers's Introduction

ARchivers - repository for TwittAR and ARticle, two tools which record tweets and articles respectively, permanently storing them on Arweave via Bundlr.

To run either TwittAR or ARticle, you need an Arweave wallet - more specifically, an Arweave wallet keyfile. Copy the contents of this keyfile into the "arweave" section of the example wallet file (example.wallet.json).
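
For reference, an Arweave keyfile is a standard RSA JWK, so after copying it in, the "arweave" section of example.wallet.json ends up shaped roughly like the sketch below (key material redacted; any other sections of the example file are left untouched):

    {
      "arweave": {
        "kty": "RSA",
        "e": "AQAB",
        "n": "<redacted>",
        "d": "<redacted>",
        "p": "<redacted>",
        "q": "<redacted>",
        "dp": "<redacted>",
        "dq": "<redacted>",
        "qi": "<redacted>"
      }
    }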

This assumes you're running Chromium (browserless) on port 3000 - see "Running Chromium with Docker" below.

Run yarn to install dependencies.

TwittAR

To run TwittAR you need Twitter API keys, which you can get via a Twitter developer account; you will also need elevated API access. Follow this answer to obtain the keys: https://stackoverflow.com/a/6875024/18012461. Then fill in the relevant fields in example.wallet.json and rename it to wallet.json.
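
The exact key names are defined by example.wallet.json itself; purely as an illustration (the field names below are hypothetical, not the repo's), a filled-in Twitter section might look like:

    {
      "twitter": {
        "apiKey": "<API key from the developer portal - field names illustrative>",
        "apiSecret": "<API secret>",
        "accessToken": "<access token>",
        "accessSecret": "<access token secret>"
      }
    }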

Then in the developer portal, request elevated access - this should be approved almost immediately.

ARticle

For ARticle, you need a NewsAPI API key, which you can get at https://newsapi.org.
Add this to your wallet.json (or to example.wallet.json, then rename it to wallet.json).
(ARticle can be run without one, e.g. when used as an external import - just don't invoke updateNewsApi.)
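
Again, use whatever field example.wallet.json already provides for this; as a hypothetical illustration only, the addition might look like:

    {
      "newsapi": "<your NewsAPI key - field name illustrative>"
    }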

Tweak config.json as required, adding your keyterms, and set instances to about 80% of your MAX_CONCURRENT_SESSIONS value.

If you notice too many re-uploads of unchanged data, or the system not responding to changes, adjust the difference value in the config - lower means more sensitive to changes.
Remember to change the queryID value in the configuration to distinguish your collection from others - you can always filter by owner address, but this allows for more fine-grained control.
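
Pulling those settings together, a config.json might look roughly like the following. The four field names (queryID, keyterms, instances, difference) come from this README, but the values and the flat layout are assumptions - instances of 48 assumes the MAX_CONCURRENT_SESSIONS=60 used in the Docker command below:

    {
      "queryID": "my-archive-001",
      "keyterms": ["example keyterm", "another keyterm"],
      "instances": 48,
      "difference": 0.05
    }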

Running

Install PM2 globally (use an elevated terminal):

yarn global add pm2

Build the project:

yarn build

Start the project (TwittAR and ARticle):

pm2 start ARchiver.ecosystem.config.js
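
The repository ships its own ARchiver.ecosystem.config.js; the sketch below only illustrates the general shape of a PM2 ecosystem file with two apps - the names and script paths are assumptions, not the actual contents of that file:

    // Illustrative sketch only - defer to the ARchiver.ecosystem.config.js in the repo.
    module.exports = {
      apps: [
        { name: "TwittAR", script: "./dist/twittar.js" }, // hypothetical entry point
        { name: "ARticle", script: "./dist/article.js" }  // hypothetical entry point
      ]
    };

Once started, pm2 status and pm2 logs are handy for checking that both processes stay up.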

Running Chromium with Docker

Docker command to create the headless Chrome (browserless) host:

docker run --shm-size=4096m \
  -e KEEP_ALIVE=true \
  -e MAX_CONCURRENT_SESSIONS=60 \
  -e MAX_QUEUE_LENGTH=400 \
  -e CONNECTION_TIMEOUT=180000 \
  -p 3000:3000 \
  --restart always \
  -d --name bc \
  browserless/chrome

Tweak the MAX_CONCURRENT_SESSIONS value as required - higher = more load but a higher chance of content being archived (download requests are dropped if the queue gets too full).
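
A quick way to confirm the container is reachable is to connect to its WebSocket endpoint with puppeteer-core - this assumes (it is not stated in this README) that the archivers talk to browserless in the same way:

    // Minimal connectivity check against the browserless container started above.
    const puppeteer = require("puppeteer-core");

    (async () => {
      const browser = await puppeteer.connect({
        browserWSEndpoint: "ws://localhost:3000", // the port mapped in the docker run command
      });
      const page = await browser.newPage();
      await page.goto("https://example.com");
      console.log(await page.title()); // should print "Example Domain"
      await page.close();
      await browser.disconnect();
    })();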

archivers's People

Contributors

jessetherobot, joshbenaron


archivers's Issues

Archive Weibo content

Regarding archiving Weibo: Weibo is not as open as Twitter and only offers a small number of APIs, so building on those APIs or writing a web crawler would likely be hard.

A better approach might be to parse the Twitter accounts that manually repost content from Weibo and WeChat, such as @weibo_read and @TGTM_Official.

Their content may be somewhat "biased", since they manually select what to repost, but parsing them should be very easy: it only requires adding their user IDs to the Twitter archive config file and archiving everything they post.
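
If TwittAR's keyterms are passed straight through to Twitter search (an assumption, not something this README confirms), those repost accounts could in principle be targeted with the standard from: search operator:

    {
      "keyterms": ["from:weibo_read", "from:TGTM_Official"]
    }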
