bbepis / hayden
Ultra-low resource 4chan/altchan thread and board archiver
License: MIT License
As in bibanon/eve#27, hayden doesn't check for and create the index_counters
table on startup.
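A minimal sketch of the kind of startup check being asked for, written in Python against an in-memory SQLite database so it runs anywhere. The column layout (`id`, `val`) is an assumption based on the Asagi-style schema and should be checked against a real database; `ensure_index_counters` and `table_exists` are hypothetical helper names, not part of Hayden.

```python
import sqlite3

# Assumed Asagi-style layout for index_counters; verify against your own schema.
CREATE_INDEX_COUNTERS = """
CREATE TABLE IF NOT EXISTS index_counters (
    id  VARCHAR(50) NOT NULL PRIMARY KEY,
    val INTEGER     NOT NULL
)
"""

def ensure_index_counters(conn):
    """Create index_counters if it is missing; do nothing if it already exists."""
    conn.execute(CREATE_INDEX_COUNTERS)
    conn.commit()

def table_exists(conn, name):
    """Return True if a table with this name exists (SQLite-specific lookup)."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None

conn = sqlite3.connect(":memory:")
ensure_index_counters(conn)
ensure_index_counters(conn)  # a second call is harmless
print(table_exists(conn, "index_counters"))
```

On MySQL the same `CREATE TABLE IF NOT EXISTS` idea applies, but the existence lookup would go through `information_schema` rather than `sqlite_master`.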
Hi, could you release a compiled version? I don't know how to computer, but I would like to archive some boards. Thanks!
Looking to archive some boards using jschan (https://gitgud.io/fatchan/jschan/), any chance support for it could be added?
Example sites: https://zzzchan.xyz/ and https://94chan.org/
API examples:
https://zzzchan.xyz/v/catalog.json
https://zzzchan.xyz/v/thread/204576.json
Thanks!
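For reference, the two example URLs above suggest the following endpoint pattern. This is only a sketch inferred from those two links, not from the jschan documentation, and `catalog_url` and `thread_url` are hypothetical helper names.

```python
def catalog_url(base, board):
    """Build the catalog endpoint for a board, e.g. <base>/v/catalog.json."""
    return f"{base.rstrip('/')}/{board}/catalog.json"

def thread_url(base, board, thread_id):
    """Build the thread endpoint, e.g. <base>/v/thread/204576.json."""
    return f"{base.rstrip('/')}/{board}/thread/{thread_id}.json"

print(catalog_url("https://zzzchan.xyz/", "v"))
print(thread_url("https://zzzchan.xyz/", "v", 204576))
```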
There is very little documentation on how to run it, which has made me unable to do so properly.
Right now there's no way to view reports from the frontend.
Even a really barebones page would be better than manually checking the database from a CLI, which is what I do right now.
A feature I'd like is a configurable minimum age threshold that posts have to surpass before being archived.
Setting this to something like 6 hours would essentially remove the need to manually moderate an archive since it gives jannies enough time to delete illegal content before it's archived.
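A rough sketch of what such a threshold could look like. It assumes each post carries a Unix timestamp in a `time` field and a post number in `no`, as in the 4chan API; other imageboards may name these differently. `old_enough` is a hypothetical helper, not an existing Hayden option.

```python
import time

SIX_HOURS = 6 * 60 * 60  # threshold in seconds

def old_enough(post_time, min_age_seconds, now=None):
    """Return True once a post is at least min_age_seconds old."""
    if now is None:
        now = time.time()
    return now - post_time >= min_age_seconds

# Two example posts; the timestamps are made up for illustration.
posts = [{"no": 1, "time": 1_000_000}, {"no": 2, "time": 1_020_000}]
now = 1_000_000 + SIX_HOURS

# Only posts past the threshold are handed to the archiver.
ready = [p["no"] for p in posts if old_enough(p["time"], SIX_HOURS, now)]
print(ready)
```

Posts that are still too young would simply be checked again on the next scrape; if they were deleted in the meantime, they never reach the archive.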
Hello, I have recently been trying to archive 2chen's /tv/ board. I had to restart the program because it got stuck while archiving a thread, and now it prints output like this:
[9/9/2022 3:31:01 AM] [Image] [10/22043]
[9/9/2022 3:31:48 AM] [Image] [20/22033]
[9/9/2022 3:32:20 AM] [Image] [30/22023]
This is taking a really long time. Is there a way to skip this and just continue archiving the board? At the rate it's going now, it's not going to be finished for at least a day.
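The "at least a day" estimate can be checked against the log lines themselves. The sketch below assumes the second number in `[10/22043]` is the count of images still queued (it drops by 10 per line while the first number rises by 10); that reading is a guess from the three lines shown, not something confirmed by Hayden's source.

```python
import re
from datetime import datetime

# Matches lines such as "[9/9/2022 3:31:01 AM] [Image] [10/22043]".
LINE_RE = re.compile(r"\[(.+?)\] \[Image\] \[(\d+)/(\d+)\]")

def parse(line):
    """Return (timestamp, remaining) where remaining is the assumed queue size."""
    m = LINE_RE.search(line)
    stamp = datetime.strptime(m.group(1), "%m/%d/%Y %I:%M:%S %p")
    return stamp, int(m.group(3))

def eta_hours(first_line, last_line):
    """Estimate the hours left from the download rate between two log lines."""
    t0, remaining0 = parse(first_line)
    t1, remaining1 = parse(last_line)
    elapsed = (t1 - t0).total_seconds()
    rate = (remaining0 - remaining1) / elapsed  # images per second
    return remaining1 / rate / 3600

hours = eta_hours(
    "[9/9/2022 3:31:01 AM] [Image] [10/22043]",
    "[9/9/2022 3:32:20 AM] [Image] [30/22023]",
)
print(f"{hours:.1f} hours remaining")
```

With 20 images in 79 seconds and 22023 left, this works out to roughly 24 hours, which matches the estimate above.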
Hi @bbepis, I have just started using the Hayden Scraper again for archiving select 4chan threads with the Hayden database schema. It is working very well. Thank you for this great tool!
I wanted to share some information which could help other users (and myself) in the future.
Hayden Version: (not sure where to find this)
OS: Ubuntu Server 22 LTS
MySQL: 8.0.34
.NETCore: 6.0.18
systemd unit file:
[Unit]
Description=Hayden Scraper
After=network-online.target mysql.service
[Service]
Type=simple
ExecStart=/mnt/hayden_asagi/Hayden scrape /mnt/hayden_asagi/config.json
WorkingDirectory=/mnt/hayden_asagi
User=m
Group=www-data
Restart=always
RestartSec=600
# To log stdout instead, use: StandardOutput=append:/home/user/hayden_info.log
StandardOutput=null
StandardError=append:/home/user/hayden_error.log
SyslogIdentifier=hayden
[Install]
WantedBy=multi-user.target
config.json (an empty board object, as for "ck", downloads everything):
{
    "source": {
        "type": "4chan",
        "boards": {
            "g": {
                "AnyFilter": "battlestation",
                "AnyBlacklist": "stable diff|dall.*e.*3"
            },
            "ck": {}
        },
        "apiDelay": 5.5,
        "boardScrapeDelay": 45
    },
    "readArchive": false,
    "proxies": [],
    "consumer": {
        "type": "Asagi",
        "databaseType": "MySQL",
        "connectionString": "Server=127.0.0.1;Port=3306;Database=hayden;Uid=USER;Pwd=PASSWORD;",
        "downloadLocation": "/mnt/ayase_quart/src/static/hayden_asagi",
        "fullImagesEnabled": true,
        "thumbnailsEnabled": true
    }
}
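To make the `AnyFilter` and `AnyBlacklist` patterns for /g/ more concrete, here is a small Python sketch of how I understand them: a thread is kept when the filter matches and the blacklist does not, and a board with no filter keeps everything. The case-insensitive matching and the order of the checks are my assumptions; Hayden's actual matching rules may differ. `thread_wanted` is a hypothetical helper.

```python
import re

def thread_wanted(text, any_filter=None, any_blacklist=None):
    """Decide whether a thread's text passes the filter and blacklist patterns."""
    # Assumption: a blacklist hit always excludes the thread.
    if any_blacklist and re.search(any_blacklist, text, re.IGNORECASE):
        return False
    # Assumption: when a filter is set, only matching threads are kept.
    if any_filter:
        return re.search(any_filter, text, re.IGNORECASE) is not None
    # No filter configured (like "ck": {}): keep everything.
    return True

G_FILTER = "battlestation"
G_BLACKLIST = "stable diff|dall.*e.*3"

print(thread_wanted("/bst/ Battlestation thread", G_FILTER, G_BLACKLIST))
print(thread_wanted("Stable Diffusion battlestation", G_FILTER, G_BLACKLIST))
print(thread_wanted("What did you cook today?"))
```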
My Hayden Scraper instance has run for 3 days now. I can confirm that it will continue archiving existing threads after several hours downtime -- restarting the Hayden Scraper service is no issue.
I gtg now, but I plan to add to this blogpost. I've also added some Hayden Scraper instructions at https://github.com/sky-cake/ayase-quart#hayden. Let me know what other information I should include here.
It would be sweet if Hayden could support options to download thumbnails and/or full media per board, rather than per archive instance.
For example, have thumbnails only be downloaded from /g/, full media only downloaded from /p/, and both thumbnails and full media downloaded from /ck/.
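One way this could work is a per-board override that falls back to the archive-wide setting. The sketch below is only an illustration of the request: per-board `fullImagesEnabled` and `thumbnailsEnabled` keys are hypothetical and do not exist in Hayden today, and `media_settings` is a made-up helper.

```python
def media_settings(consumer, board_options):
    """Resolve media flags for one board, falling back to the consumer defaults."""
    return {
        key: board_options.get(key, consumer[key])
        for key in ("fullImagesEnabled", "thumbnailsEnabled")
    }

# Archive-wide defaults, as in the existing consumer section.
consumer = {"fullImagesEnabled": True, "thumbnailsEnabled": True}

# Hypothetical per-board overrides matching the example above.
boards = {
    "g": {"fullImagesEnabled": False},   # thumbnails only
    "p": {"thumbnailsEnabled": False},   # full media only
    "ck": {},                            # both, inherited from the defaults
}

for name, options in boards.items():
    print(name, media_settings(consumer, options))
```

Keeping the global flags as defaults means existing configs would keep working unchanged.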
This repo was put in front of me fairly recently.
The documentation mentions that it has a front-end, but is this more of an archival engine?
Are there any websites or examples of it running somewhere?
How close is this to a production ready state?