bbepis / hayden

Ultra-low resource 4chan/altchan thread and board archiver

License: MIT License

Languages: C# 70.06%, HTML 19.95%, CSS 0.87%, JavaScript 0.75%, Svelte 6.68%, TypeScript 1.10%, Shell 0.26%, Batchfile 0.02%, PowerShell 0.32%
Topics: imageboard archiver 4chan lynxchan vichan

hayden's Issues

Release

Hi, could you release a compiled version? I don't know how to computer, but I would like to archive some boards. Thanks!

Report queue

Right now there's no way to view reports from the frontend.
Even a really barebones page would be better than having to manually check the database from a CLI, which is what I do right now.

Minimum post age for scraper

A feature I'd like is a configurable minimum age threshold that posts have to surpass before being archived.
Setting this to something like 6 hours would essentially remove the need to manually moderate an archive since it gives jannies enough time to delete illegal content before it's archived.
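
The requested threshold could be sketched roughly as follows. This is an illustration only, not Hayden's code: `MIN_POST_AGE`, `old_enough`, `posts_to_archive` and the post dictionaries are all hypothetical names made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical setting; this is the option being requested, not an existing one.
MIN_POST_AGE = timedelta(hours=6)


def old_enough(posted_at, now, min_age=MIN_POST_AGE):
    """Return True once a post has existed for at least min_age."""
    return now - posted_at >= min_age


def posts_to_archive(posts, now):
    """Keep only the posts that have passed the age threshold."""
    return [p for p in posts if old_enough(p["time"], now)]


now = datetime(2022, 9, 9, 12, 0, tzinfo=timezone.utc)
posts = [
    {"no": 1, "time": now - timedelta(hours=7)},  # past the threshold
    {"no": 2, "time": now - timedelta(hours=1)},  # still too new
]
ready = posts_to_archive(posts, now)
print([p["no"] for p in ready])
```

One trade-off with this approach: on fast boards a thread could be pruned before its posts reach the threshold, so a real implementation would probably need to handle that case.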

Slow archiving after starting again

Hello, I have recently been trying to archive 2chen's /tv/ board. I had to restart the program because it got stuck while archiving a thread, and now its output looks like this:

[9/9/2022 3:31:01 AM] [Image] [10/22043]
[9/9/2022 3:31:48 AM] [Image] [20/22033]
[9/9/2022 3:32:20 AM] [Image] [30/22023]

This is taking a really long time. Is there a way to skip this and just continue archiving the board? At the rate it's going now, it's not going to be finished for at least a day.
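
The "at least a day" estimate can be checked against the three log lines above with a small script. This is a rough sketch that assumes the bracketed pair means [images done / images remaining] (the two numbers always sum to the same total) and that the timestamps are month/day/year. The helper names `parse` and `eta_hours` are made up for the example.

```python
import re
from datetime import datetime

LOG = """\
[9/9/2022 3:31:01 AM] [Image] [10/22043]
[9/9/2022 3:31:48 AM] [Image] [20/22033]
[9/9/2022 3:32:20 AM] [Image] [30/22023]
"""

# Captures: timestamp, images done, images remaining (assumed meaning).
LINE = re.compile(r"\[(.+?)\] \[Image\] \[(\d+)/(\d+)\]")


def parse(log):
    """Turn log text into (timestamp, done, remaining) tuples."""
    rows = []
    for m in LINE.finditer(log):
        ts = datetime.strptime(m.group(1), "%m/%d/%Y %I:%M:%S %p")
        rows.append((ts, int(m.group(2)), int(m.group(3))))
    return rows


def eta_hours(rows):
    """Extrapolate the average pace between the first and last line."""
    (t0, done0, _), (t1, done1, remaining) = rows[0], rows[-1]
    seconds_per_image = (t1 - t0).total_seconds() / (done1 - done0)
    return remaining * seconds_per_image / 3600


rows = parse(LOG)
hours = eta_hours(rows)
print(f"about {hours:.1f} hours remaining")
```

With only three samples this is a crude extrapolation, but 20 images in 79 seconds with about 22,000 left does work out to roughly a day.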

Hayden User Blog

Hi @bbepis, I have just started using the Hayden Scraper again for archiving select 4chan threads with the Hayden database schema. It is working very well 😄 Thank you for this great tool!

I wanted to share some information which could help other users (and myself) in the future.

Hayden Version: (not sure where to find this)
OS: Ubuntu Server 22 LTS
MySQL: 8.0.34
.NETCore: 6.0.18

/etc/systemd/system/hayden.service

[Unit]
Description=Hayden Scraper
After=network-online.target mysql.service

[Service]
Type=simple
ExecStart=/mnt/hayden_asagi/Hayden scrape /mnt/hayden_asagi/config.json
WorkingDirectory=/mnt/hayden_asagi
User=m
Group=www-data
Restart=always
RestartSec=600
# To log stdout to a file instead, use: StandardOutput=append:/home/user/hayden_info.log
StandardOutput=null
StandardError=append:/home/user/hayden_error.log
SyslogIdentifier=hayden

[Install]
WantedBy=multi-user.target

/mnt/hayden_asagi/config.json (the empty "ck" entry downloads everything from that board)

{
	"source": {
		"type": "4chan",
		"boards": {
			"g": {
				"AnyFilter": "battlestation",
				"AnyBlacklist": "stable diff|dall.*e.*3"
			},
			"ck": {}
		},
		"apiDelay": 5.5,
		"boardScrapeDelay": 45
	},
	"readArchive": false,
	"proxies": [],
	"consumer": {
		"type": "Asagi",
		"databaseType": "MySQL",
		"connectionString": "Server=127.0.0.1;Port=3306;Database=hayden;Uid=USER;Pwd=PASSWORD;",
		"downloadLocation": "/mnt/ayase_quart/src/static/hayden_asagi",
		"fullImagesEnabled": true,
		"thumbnailsEnabled": true
	}
}
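
Before deploying filter patterns like the ones for /g/ above, it can help to try them against a few sample thread titles. The sketch below is not Hayden's matching code. It assumes the patterns are regular expressions, that matching is case-insensitive, and that a blacklist hit overrides a filter hit. It also uses Python's `re` rather than .NET regex, which should behave the same for patterns this simple. `would_archive` and the sample titles are invented for the example.

```python
import re

# Patterns copied from the "g" board entry in the config above.
ANY_FILTER = "battlestation"
ANY_BLACKLIST = "stable diff|dall.*e.*3"


def would_archive(text, any_filter=ANY_FILTER, any_blacklist=ANY_BLACKLIST):
    """Assumed semantics: keep a thread when the filter matches and the blacklist does not."""
    flags = re.IGNORECASE  # assumption: matching ignores case
    if any_blacklist and re.search(any_blacklist, text, flags):
        return False
    if any_filter:
        return re.search(any_filter, text, flags) is not None
    return True  # no filter configured, as with the empty "ck" entry


samples = [
    "Battlestation thread: post your desks",
    "Stable Diffusion general",
    "battlestation rendered with DALL-E 3",
    "Linux desktop thread",
]
results = {title: would_archive(title) for title in samples}
for title, keep in results.items():
    print("keep" if keep else "skip", "-", title)
```

Under these assumptions only the first sample is kept. The second and fourth never match the filter, and the third matches the filter but is rejected by the blacklist.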

My Hayden Scraper instance has run for 3 days now. I can confirm that it will continue archiving existing threads after several hours of downtime -- restarting the Hayden Scraper service is no issue.

I have to go now, but I plan to add to this blog post. I've also added some Hayden Scraper instructions at https://github.com/sky-cake/ayase-quart#hayden. Let me know what other information I should include here.

Interesting

This repo was put in front of me fairly recently.

The documentation mentions that it has a front-end, but is this more of an archival engine?
Are there any websites or examples of it running somewhere?
How close is this to a production-ready state?
