ccloli / e-hentai-db

Just another E-Hentai metadata database

Home Page: https://イー変態.ロリ.みんな

License: GNU General Public License v3.0

E-Hentai DB

Requirements

  • Node.js 8+

  • MySQL 5.3+ / MariaDB 10+

Setup & Start Up

If you only want to browse the data from gdata.json, use 0.1.x; if you want to keep your galleries up to date, use 0.2.x. The master branch and 0.3.x and later include more features, such as torrent hashes, and syncing may take a long time.

  1. git clone the repo

  2. Run npm i --production in the repo directory to install dependencies

    • If you want to build the Web UI, run npm i instead (which also installs dev dependencies), then run npm run build; the static Web UI files will be in the /dist directory
  3. Download gdata.json from the E-Hentai Forums and place it in the repo directory

  4. Import struct.sql into a MySQL / MariaDB database

  5. Edit config.js to set the database username, password, database name, etc.

  6. Run npm run import [file=gdata.json] to import the JSON file into your database

    • If you want to update to the latest galleries, run npm run sync [host=e-hentai.org] [timestampOffset=0]
    • If you want to resync gallery metadata from the last few hours, run npm run resync [hour=24]
    • If you want to mark all replaced galleries, run npm run mark-replaced (newly added galleries handle this automatically)
    • If you want to get torrents from all galleries, run npm run torrent-import [host=e-hentai.org] (USE AT YOUR OWN RISK)
    • If you want to update torrents from the torrent list, run npm run torrent-sync [host=e-hentai.org]
    • If you want to manually fetch some galleries, run npm run fetch {gid}/{token} {gid}/{token} ... or npm run fetch [filename]
  7. Wait a few minutes for the import to finish, as there are about 800,000 records (it takes 260 s on my PC and 850 s on my server)

  8. Run npm start; with the default config, the server runs on port 8880

Available APIs

All params can be passed either as part of the URL path or in the query string. For example, /api/gallery/:gid/:token can be called as /api/gallery/123456/abcdef1234 or /api/gallery?gid=123456&token=abcdef1234.
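The two call styles above can be sketched as a pair of small client-side helpers. This is only an illustration: pathUrl and queryUrl are hypothetical names, http://localhost:8880 assumes the default port on the local machine, and the path form relies on the params object listing its values in the same order as the route (gid before token). It uses the URLSearchParams global (Node.js 10+ or a browser).

```javascript
// Path form: each value becomes one percent-encoded path segment, in order.
// Assumes `params` lists values in the route's order (e.g. gid, then token).
function pathUrl(baseUrl, endpoint, params) {
  const segments = Object.values(params).map((v) => encodeURIComponent(String(v)));
  return `${baseUrl}/api/${endpoint}/${segments.join("/")}`;
}

// Query form: the same values passed as named query-string parameters.
function queryUrl(baseUrl, endpoint, params) {
  const qs = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) qs.append(key, String(value));
  return `${baseUrl}/api/${endpoint}?${qs.toString()}`;
}

const base = "http://localhost:8880"; // hypothetical local server
console.log(pathUrl(base, "gallery", { gid: 123456, token: "abcdef1234" }));
// http://localhost:8880/api/gallery/123456/abcdef1234
console.log(queryUrl(base, "gallery", { gid: 123456, token: "abcdef1234" }));
// http://localhost:8880/api/gallery?gid=123456&token=abcdef1234
```

The query form is less fragile when an endpoint has optional params, since nothing depends on argument order.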

All APIs respond with JSON in the following format:

{
    "code": 200,          // 200 = success
    "data": {...},        // response data
    "message": "success", // error message
    "total": 100          // result counts (if `data` is a list)
}

data is normally a metadata object, a list of metadata objects, or null if an error occurs. The metadata format is based on E-Hentai's official gallery JSON API, which is documented on EHWiki, but some data types differ slightly from the official API; for example, posted and filecount are integers instead of strings.
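A client can check the envelope before touching data. The sketch below assumes, as the comments in the format above state, that code 200 means success; unwrap and getJson are hypothetical helpers, and getJson defaults to the global fetch of Node.js 18+ (any fetch-compatible function can be passed in instead, for example a stub for offline testing).

```javascript
// Unwrap the documented envelope { code, data, message, total }.
// Assumes code 200 means success; any other code becomes a thrown Error.
function unwrap(body) {
  if (body.code !== 200) {
    throw new Error(`API error ${body.code}: ${body.message}`);
  }
  // `total` is only meaningful when `data` is a list; it may be undefined.
  return { data: body.data, total: body.total };
}

// Hypothetical helper: GET a URL, parse the JSON body, and unwrap it.
// Defaults to the global fetch (Node.js 18+); any fetch-like function works.
async function getJson(url, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url);
  return unwrap(await res.json());
}

// Usage (requires a running server):
// getJson("http://localhost:8880/api/gallery/123456/abcdef1234")
//   .then(({ data }) => console.log(data.title))
//   .catch((err) => console.error(err.message));
```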

{
    "gid": 592178,
    "token": "41cc263dc7",
    "archiver_key": "434486--1617c38d90630b5e399e730d62dea241363cdce6",
    "title": "(Shota Scratch 5) [Studio Zealot (Various)] Bokutachi! Shotappuru!! (Boku no Pico)",
    "title_jpn": "(ショタスクラッチ5) [Studio Zealot (よろず)] ぼくたち!しょたっぷる!! (ぼくのぴこ)",
    "category": "Doujinshi",
    "thumb": "https://ehgt.org/4c/6a/4c6ad39fffcdefcb2cd35218a95395af2e5ad74d-1854978-2118-3000-jpg_l.jpg",
    "uploader": "tooecchi",
    "posted": 1368418878,
    "filecount": 63,
    "filesize": 75630519,
    "expunged": 0,
    "removed": 0,
    "replaced": 0,
    "rating": "4.54",
    "torrentcount": 1, // useless, count it by `torrents` instead
    "root_gid": 592178,
    "tags": [
        "male:crossdressing",
        "male:shotacon",
        "male:tomgirl",
        "male:yaoi",
        "artist:tower",
        "artist:mokkouyou bond",
        "male:anal",
        "male:schoolgirl uniform",
        "male:catboy",
        "artist:murasaki nyaa",
        "artist:po-ju",
        "artist:rustle",
        "artist:miyakawa hajime",
        "artist:fujinomiya yuu",
        "artist:tanuma yuuichirou",
        "male:school swimsuit",
        "artist:mikami hokuto",
        "artist:azuma kyouto",
        "male:josou seme",
        "parody:boku no pico",
        "male:frottage",
        "male:bloomers",
        "artist:nemunemu",
        "group:studio zealot",
        "artist:aoi madoka"
    ],
    "torrents": [
        {
            "id": 632947,
            "name": "(Shota Scratch 5) [Studio Zealot (Various)] Bokutachi! Shotappuru!! (Boku no Pico)",
            "hash": "2a4641feba9943b0e028927879ff6567e74bf0ae",
            "addedstr": "2019-02-28 00:39",
            "fsizestr": "72.13 MB",
            "uploader": "Hyenacub"
        }
    ]
}

/api/gallery/:gid/:token

Alias: /api/g/:gid/:token

Get gallery metadata.

Query params:

  • gid: Gallery ID (required)
  • token: Gallery token (required)

Returns: metadata
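Gallery IDs and tokens usually come from gallery page URLs. The sketch below assumes the common E-Hentai URL shape https://e-hentai.org/g/{gid}/{token}/ (or the exhentai.org equivalent) and a 10-character lowercase hex token, like the examples in this README; parseGalleryUrl and galleryApiPath are hypothetical names. The same {gid}/{token} pair is what npm run fetch accepts.

```javascript
// Extract gid and token from a gallery URL (ASSUMED shape: /g/{gid}/{token}/).
function parseGalleryUrl(url) {
  const { hostname, pathname } = new URL(url);
  if (!/(^|\.)(e-hentai|exhentai)\.org$/.test(hostname)) {
    throw new Error(`Unexpected host: ${hostname}`);
  }
  // Assumed token format: exactly 10 lowercase hex characters.
  const match = /^\/g\/(\d+)\/([0-9a-f]{10})\/?$/.exec(pathname);
  if (!match) throw new Error(`Not a gallery URL: ${url}`);
  return { gid: Number(match[1]), token: match[2] };
}

// Build the /api/gallery/:gid/:token path for a local e-hentai-db server.
function galleryApiPath({ gid, token }) {
  return `/api/gallery/${gid}/${token}`;
}

console.log(galleryApiPath(parseGalleryUrl("https://e-hentai.org/g/123456/abcdef1234/")));
// /api/gallery/123456/abcdef1234
```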

/api/list

Get a list of galleries.

Query params:

  • page: Page number (default: 1)
  • limit: Number of galleries per page (default: 10, <= 25)

Returns: metadata[]
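When walking through /api/list page by page, the total field of the response can be combined with the documented page and limit defaults. A sketch, assuming the client should cap limit at 25 itself rather than rely on how the server treats larger values; normalizeLimit, pageCount, and listPath are hypothetical helpers.

```javascript
// Documented defaults: page = 1, limit = 10, and limit <= 25.
const DEFAULT_LIMIT = 10;
const MAX_LIMIT = 25;

// Fall back to the default for missing or invalid values, then cap at 25.
function normalizeLimit(limit) {
  const n = Number.isInteger(limit) && limit > 0 ? limit : DEFAULT_LIMIT;
  return Math.min(n, MAX_LIMIT);
}

// Number of pages needed to cover `total` results at the given limit.
function pageCount(total, limit) {
  return Math.ceil(total / normalizeLimit(limit));
}

function listPath(page = 1, limit = DEFAULT_LIMIT) {
  return `/api/list?page=${page}&limit=${normalizeLimit(limit)}`;
}
```

For example, a response with total: 101 fetched with limit=25 spans pageCount(101, 25) pages, i.e. 5.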

/api/category/:category

Alias: /api/cat/:category?page={page=1}&limit={limit=10}

Get a list of galleries matching any of the specified categories. category can be a comma-separated (,) list, in which case galleries matching any listed category are returned.

category can be a list of category names or a single number combining the bit values below. To exclude categories, use a negative number. For example, to get a list of Non-H galleries, category can be any of Non-H, 256, or -767 (767 = 1023 XOR 256, i.e. every category except Non-H).

Category            Value       Bit
Misc                1           (1 << 0)
Doujinshi           2           (1 << 1)
Manga               4           (1 << 2)
Artist CG           8           (1 << 3)
Game CG             16          (1 << 4)
Image Set           32          (1 << 5)
Cosplay             64          (1 << 6)
Asian Porn          128         (1 << 7)
Non-H               256         (1 << 8)
Western             512         (1 << 9)

Query params:

  • category: Gallery category (required)
  • page: Page number (default: 1)
  • limit: Number of galleries per page (default: 10, <= 25)

Returns: metadata[]
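The numeric form of category can be computed from the table. This sketch assumes, based on the -767 example, that a positive number selects the listed bits and a negative number excludes them; includeMask, excludeMask, and onlyViaExclusion are hypothetical names.

```javascript
// Bit values from the category table.
const CATEGORY_BITS = {
  "Misc": 1 << 0,
  "Doujinshi": 1 << 1,
  "Manga": 1 << 2,
  "Artist CG": 1 << 3,
  "Game CG": 1 << 4,
  "Image Set": 1 << 5,
  "Cosplay": 1 << 6,
  "Asian Porn": 1 << 7,
  "Non-H": 1 << 8,
  "Western": 1 << 9,
};
const ALL_CATEGORIES = 1023; // all ten bits set (2^10 - 1)

// Positive value: galleries in ANY of the named categories.
function includeMask(names) {
  return names.reduce((mask, name) => {
    const bit = CATEGORY_BITS[name];
    if (bit === undefined) throw new Error(`Unknown category: ${name}`);
    return mask | bit;
  }, 0);
}

// Negative value: galleries in NONE of the named categories (assumed semantics).
function excludeMask(names) {
  return -includeMask(names);
}

// Exclude everything except the named categories via XOR with all bits.
function onlyViaExclusion(names) {
  return -(ALL_CATEGORIES ^ includeMask(names));
}

console.log(includeMask(["Non-H"]));      // 256
console.log(onlyViaExclusion(["Non-H"])); // -767
```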

/api/tag/:tag

Get a list of galleries matching ALL of the specified tags. tag can be a comma-separated (,) list.

Each tag should include its namespace (the tag's category type). For example, to search for full-color, Chinese-translated furry galleries tagged with male fox, try /api/tag/language:chinese,male:furry,male:fox,full%20color.

Query params:

  • tag: Tags (required)
  • page: Page number (default: 1)
  • limit: Number of galleries per page (default: 10, <= 25)

Returns: metadata[]
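Tags containing spaces need percent-encoding when placed in the path. A minimal sketch (tagPath is a hypothetical helper) that encodes each tag with encodeURIComponent, leaves : readable since it is allowed in a URL path segment, and joins the tags with commas as in the example above:

```javascript
// Build a /api/tag/:tag path from "namespace:name" tags.
// Spaces become %20; encoded colons (%3A) are turned back into ":".
function tagPath(tags) {
  const encoded = tags.map((tag) => encodeURIComponent(tag).replace(/%3A/gi, ":"));
  return `/api/tag/${encoded.join(",")}`;
}

console.log(tagPath(["language:english", "full color"]));
// /api/tag/language:english,full%20color
```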

/api/uploader/:uploader

Get a list of galleries uploaded by a specific user.

Query params:

  • uploader: Uploader (required)
  • page: Page number (default: 1)
  • limit: Number of galleries per page (default: 10, <= 25)

Returns: metadata[]

/api/search

Get a list of galleries matching all of the query conditions.

The keyword syntax supports most of E-Hentai's search operators:

  • Search for gallery title and Japanese title
  • Exact terms (" ") with spaces
    • Underscore (_) is not supported (use quotation marks " " instead)
  • Wildcard (*/%) at the end of the pattern (though the query appends % by default)
  • Exclude (-) specific terms
  • Or (~), matching any one of them [v0.3.1]
  • Colon namespaces (:) for tags
    • Supports a subset of qualifiers: tag:, uploader:, gid: [v0.3.1]
    • Terms without : are treated as title keywords (probably similar to title:)
  • Exact match for tags ($)
    • Tags without $ are matched by prefix [v0.3.1]
  • Shortened tag namespaces (character: -> char: / c:) [v0.3.1]

For usage examples, see EHWiki.

Before v0.3.1:
  • To search by uploader, use uploader:{uploader}
  • To search for a tag, use {tagType}:{tagName}$; if tagName contains a space, quote it together with the $, like {tagType}:"{tagName}$"
  • To search for a word, just enter it; if it contains a space, quote it like "{keyword}"

You can use multiple keywords separated by spaces (%20). All keywords are combined with AND (except uploader:, which uses OR), so in theory more keywords give more accurate results.
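The quoting rules above can be captured in a small keyword builder. This sketch follows the pre-v0.3.1 forms (a trailing $ for exact tags, quotes around multi-word terms), which it assumes later versions still accept; quoting an uploader name that contains spaces is also an assumption. buildKeyword and its option names are hypothetical.

```javascript
// Quote a term if it contains whitespace, e.g. summer vacation -> "summer vacation".
function quoteIfNeeded(term) {
  return /\s/.test(term) ? `"${term}"` : term;
}

// Exact tag match: the $ goes inside the quotes when the name has a space.
function exactTag(namespace, name) {
  return /\s/.test(name) ? `${namespace}:"${name}$"` : `${namespace}:${name}$`;
}

// Assemble title words, exact tags ([namespace, name] pairs), excluded terms,
// and an optional uploader into one space-separated keyword string.
function buildKeyword({ words = [], tags = [], exclude = [], uploader } = {}) {
  const parts = [
    ...words.map(quoteIfNeeded),
    ...tags.map(([namespace, name]) => exactTag(namespace, name)),
    ...exclude.map((term) => `-${quoteIfNeeded(term)}`),
  ];
  if (uploader) parts.push(`uploader:${quoteIfNeeded(uploader)}`); // quoting is assumed
  return parts.join(" ");
}

console.log(buildKeyword({
  words: ["summer vacation"],
  tags: [["language", "english"], ["other", "full color"]],
  exclude: ["sketch"],
  uploader: "someone",
}));
// "summer vacation" language:english$ other:"full color$" -sketch uploader:someone
```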

Query params:

  • keyword: Search keywords, separated by spaces (%20)
  • category: Gallery category, same as /api/category
  • expunged: Include expunged galleries (default: 0)
  • removed: Include removed galleries (default: 0)
  • replaced: Include replaced galleries (default: 0)
  • minpage: Only show galleries with more pages than this (default: 0)
  • maxpage: Only show galleries with fewer pages than this (default: 0)
  • minrating: Only show galleries with at least this many stars, including ratings up to half a star lower (default: 0, <= 5)
  • page: Page number (default: 1)
  • limit: Number of galleries per page (default: 10, <= 25)

Returns: metadata[]
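Putting the parameters together, here is a sketch of a search URL builder. It omits unset params so the documented defaults apply, checks limit and minrating on the client because the server's handling of out-of-range values isn't documented here, and rewrites the + that URLSearchParams uses for spaces as %20 to match the separator described above. searchUrl is a hypothetical name.

```javascript
// Build a /api/search URL from the documented query params.
function searchUrl(baseUrl, params) {
  const { limit, minrating } = params;
  // Client-side range checks (documented: limit <= 25, minrating <= 5).
  if (limit !== undefined && (limit < 1 || limit > 25)) {
    throw new RangeError("limit must be between 1 and 25");
  }
  if (minrating !== undefined && (minrating < 0 || minrating > 5)) {
    throw new RangeError("minrating must be between 0 and 5");
  }
  const qs = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) qs.append(key, String(value)); // skip unset params
  }
  // URLSearchParams encodes spaces as "+"; literal "+" is already %2B,
  // so every remaining "+" is a space and can safely become %20.
  return `${baseUrl}/api/search?${qs.toString().replace(/\+/g, "%20")}`;
}

console.log(searchUrl("http://localhost:8880", { keyword: "full color", category: 256, limit: 25 }));
// http://localhost:8880/api/search?keyword=full%20color&category=256&limit=25
```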

Notes

It eats my memory when importing

The import script loads the WHOLE JSON file into memory (I prefer to insert the older galleries first, so I didn't import it by reading the file in chunks). As a result, importing may use 1 GB of RAM or more, so make sure you've set up a swap file on your server:

dd if=/dev/zero of=swapfile bs=1M count=2048
chmod 0600 swapfile
mkswap swapfile
swapon swapfile

I got duplicate records when re-importing

Previously, you could not cancel an import in progress, as the import script didn't support resuming; you had to truncate all tables, or drop them and create new ones.

The import script now supports resuming: you can cancel an import and run npm run import again at any time, and it will continue from the last imported record.

The query speed is still too slow when querying multiple tags

Try adding indexes if you want

ALTER TABLE `gid_tid` ADD UNIQUE(`gid`, `tid`);
ALTER TABLE `gid_tid` ADD INDEX(`tid`);
ALTER TABLE `tag` ADD UNIQUE(`name`);
ALTER TABLE `gallery` ADD INDEX(`category`);
ALTER TABLE `gallery` ADD INDEX(`uploader`);

Adding all of these indexes increases the database size from about 330 MB to about 500 MB.
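For context on why these indexes help, here is one common way to find galleries that have ALL of several tags: join gid_tid to tag, filter by tag name, and keep the gids whose match count equals the number of tags. This is a generic GROUP BY / HAVING sketch, not necessarily the query e-hentai-db runs. It assumes tag.name stores full namespace:name strings, that tag names are unique (as the UNIQUE(name) index above enforces), and that the tag table's key column is named id (check struct.sql). allTagsQuery is a hypothetical helper returning SQL with ? placeholders for a parameterized MySQL query.

```javascript
// Generate a parameterized "galleries having ALL of these tags" query.
// ASSUMPTION: tag's primary key column is `id`; verify against struct.sql.
function allTagsQuery(tagNames) {
  const names = [...new Set(tagNames)]; // duplicates would break the HAVING count
  if (names.length === 0) throw new Error("at least one tag is required");
  const placeholders = names.map(() => "?").join(", ");
  const sql =
    "SELECT gt.gid FROM gid_tid AS gt " +
    "JOIN tag AS t ON t.id = gt.tid " +
    `WHERE t.name IN (${placeholders}) ` +
    "GROUP BY gt.gid " +
    "HAVING COUNT(DISTINCT gt.tid) = ?";
  // Values for the ? placeholders, in order: the tag names, then the count.
  return { sql, values: [...names, names.length] };
}
```

With the indexes above, the IN lookup can use the unique index on tag.name and the join can use the index on gid_tid.tid; pass sql and values to your MySQL driver's parameterized query call.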

No primary key in table gid_tid

I'm not sure whether I should add an id column, as I don't use it in queries. If you want one, try the following SQL; the column takes about 110 MB.

ALTER TABLE `gid_tid` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

Why MyISAM?

I have little knowledge of databases; feel free to change struct.sql to use InnoDB or any other engine you prefer.

The server quits when I exit the terminal

Try npm start &, or use PM2 or forever to keep it running in the background.

Web UI is not included in the git repository

The built files may be available on the GitHub releases page. If they aren't, you can build them yourself: run npm i, then npm run build, and set webui to true in config.js.

Why React, React Router, Moment.js ... are in devDependencies?

I consider this primarily a Node.js project, and the Web UI is just an optional feature; you can also grab the prebuilt Web UI files without building them. Whether or not you need the Web UI, the front-end libraries are not needed when setting up the server, as they are already bundled into the distributed files.

Todos (or not to do)

  • Advanced search (tags, category, uploader, keyword in one search)

  • Web UI

  • Torrent hashes

  • Update to latest galleries

Thanks

  • Sachia Lanlus, who collected almost all gallery metadata before ExHentai went down and shared gdata.json

  • Tlaster / ehdb, whose SQLite database the table structures are based on, as I had almost forgotten how to handle a gallery's tag list

  • StackOverflow/11694761#21408164, whose answer helped me handle multi-tag searches; searching with 3 tags went from 60 s down to 1.7 s on my PC

  • Tenboro, the god who creates the world

  • The community that helped E-Hentai pull through (YAY, it's alive!)

License

GPLv3

e-hentai-db's Issues

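The program stopped working after the E-Hentai update

As the title says, I can't be sure whether this was caused by the E-Hentai update, but judging by when it stopped, it did stop around that time. The error is as follows: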

[
'/www/server/nodejs/v14.17.6/bin/node',
'/root/ehdb/e-hentai-db/scripts/sync.js',
'exhentai.org'
]
got last posted = 1667489794
requesting page 0...
TypeError: Cannot read property 'map' of null
at IncomingMessage. (/root/ehdb/e-hentai-db/scripts/sync.js:88:21)
at IncomingMessage.emit (events.js:412:35)
at endReadableNT (internal/streams/readable.js:1317:12)
at processTicksAndRejections (internal/process/task_queues.js:82:21)

Database dump refresh

I found the gdata.json from a few years back and your June 06 2022 database dump, and I really appreciate them. Would it be possible to get a more recent database dump?

I'm writing a cli tool to add e-hentai metadata to my local book library and while I succeeded in web scraping the website search and using E-H API to get the metadata, the rate limiting is holding me back. Having a local metadata database would enable me to go much faster.

About eh torrent pages limit.

Hi ccloli,
It seems that E-Hentai now only allows viewing 100 pages of the torrent list. Since the latest SQL file you uploaded is from early 2022, there is no way for me to re-synchronize the earlier torrents.
Would it be convenient for you to upload a newer version of the SQL file? Or maybe just the torrent table.
Many thanks.

Database dump

Hello,
If possible could you please upload the latest database dump? I could not sync due to API limitations.

IP ban during sync

Hello, I triggered an IP ban error while fetching gallery metadata (I think it had only got from ID 2.0M down to 1.9M at that point).
Is there any solution other than syncing the whole huge thing? I'm also wondering about maintaining this. (If we need to talk about this privately, let me know as well, thanks!)

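A question for the maintainer:

This site hasn't had any activity for a long time; it seems the new server can only be used once the data migration is finished.
Will the site be updated again? Some Chinese-language galleries that were taken down over copyright aren't indexed on pandachika.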
