This issue is an offshoot of the discussion that began in <a href="https://www.yacread


Support for xml based third party metadata formats about yacreader HOT 7 OPEN

yacreader commented on May 19, 2024

Support for xml based third party metadata formats

from yacreader.

Comments (7)

NetherKing1357 commented on May 19, 2024

Based on your response in the forum, I guess we could start with support for the .xml files stored within CBZ and CB7 files.
I've attached a zip file containing a CBZ. This is a comic file with every entry in the CR metadata editor filled in.

peppercarrot_episode01.zip

The following entries have no information stored in the .xml file:

Rating
Community Rating
Series Complete
Proposed Values
Tags
Review
Characters

This is the content of the .xml file:

<?xml version="1.0"?>
<ComicInfo xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Title>Episode 1</Title>
  <Series>Pepper and Carrot</Series>
  <Number>1</Number>
  <Count>23</Count>
  <Volume>1</Volume>
  <AlternateSeries>Pepper and Carrot</AlternateSeries>
  <AlternateNumber>1</AlternateNumber>
  <StoryArc>None</StoryArc>
  <SeriesGroup>Pepper and Carrot</SeriesGroup>
  <AlternateCount>23</AlternateCount>
  <Summary>This is an open source comic. I have added this information to understand how ComicRack adds metadata to comic files.</Summary>
  <Notes>This is an open source comic. I have added this information to understand how ComicRack adds metadata to comic files.</Notes>
  <Year>2017</Year>
  <Month>3</Month>
  <Day>6</Day>
  <Writer>David Revoy</Writer>
  <Penciller>David Revoy</Penciller>
  <Inker>David Revoy</Inker>
  <Colorist>David Revoy</Colorist>
  <Letterer>David Revoy</Letterer>
  <CoverArtist>David Revoy</CoverArtist>
  <Editor>David Revoy</Editor>
  <Publisher>David Revoy</Publisher>
  <Imprint>David Revoy</Imprint>
  <Genre>Web Comic</Genre>
  <Web>https://archive.org/details/peppercarrot-en</Web>
  <PageCount>4</PageCount>
  <LanguageISO>en</LanguageISO>
  <Format>Web Comic</Format>
  <AgeRating>Everyone</AgeRating>
  <BlackAndWhite>No</BlackAndWhite>
  <Manga>No</Manga>
  <Characters>Pepper, Carrot</Characters>
  <Teams>Pepper and Carrot</Teams>
  <Locations>Carrotland</Locations>
  <ScanInformation>Internet Archive HTML5 Uploader 1.6.3</ScanInformation>
  <Pages>
    <Page Image="0" ImageSize="346512" ImageWidth="992" ImageHeight="1373" Type="FrontCover" />
    <Page Image="1" ImageSize="348534" ImageWidth="992" ImageHeight="1373" />
    <Page Image="2" ImageSize="244617" ImageWidth="992" ImageHeight="1373" />
    <Page Image="3" ImageSize="184320" ImageWidth="720" ImageHeight="177" />
  </Pages>
</ComicInfo>
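
For anyone who wants to experiment with the sample, here is a minimal Python sketch of how the embedded file could be read. It is only an illustration, not YACReader code. It assumes the metadata entry is named ComicInfo.xml and that the CBZ is a plain zip archive; the function name `read_comicinfo` is made up for this example. CB7 files are not covered, since they would need a 7z library.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_comicinfo(cbz):
    """Return the text-only ComicInfo fields of a CBZ as a dict.

    `cbz` is a path or a file-like object. Returns None when the
    archive has no ComicInfo.xml entry.
    """
    with zipfile.ZipFile(cbz) as archive:
        # Assumption: the entry is called ComicInfo.xml. Match it
        # case-insensitively and in any folder inside the archive.
        name = next(
            (n for n in archive.namelist()
             if n.rsplit("/", 1)[-1].lower() == "comicinfo.xml"),
            None,
        )
        if name is None:
            return None
        root = ET.fromstring(archive.read(name))

    fields = {}
    for child in root:
        # Skip containers such as <Pages>, which hold child elements
        # instead of text.
        if len(child) == 0 and child.text and child.text.strip():
            fields[child.tag] = child.text.strip()
    return fields
```

Calling `read_comicinfo("peppercarrot_episode01.cbz")` (file name assumed) should give a dict keyed by tag name, with every value left as a string. The `<Pages>` block is skipped on purpose to keep the sketch short.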

Below are screenshots of the editor itself with all entries filled in. Only Web was filled in later, and it has an entry in the .xml file.

Every file scraped by cbanack's ComicVine scraper for ComicRack has the following information appended:

Web has a link to the ComicVine entry for that issue
Either Tags or Notes has this message: Scraped metadata from ComicVine [CVDBxxxxxx].

Example: If Immortal Hulk, issue 14 were scraped:

<Notes>Scraped metadata from ComicVine [CVDB702466].</Notes>
<Web>https://comicvine.gamespot.com/the-immortal-hulk-14-we-only-meet-at-funerals/4000-702466/</Web>

If all else fails, we can use this information to recursively run the YAC scraper for all the files.
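
As a rough illustration of that fallback, the ComicVine issue ID could be pulled out of either field like this. This is a sketch only: the function name is hypothetical, and the patterns are based purely on the two examples above (the `[CVDBxxxxxx]` note and the `4000-xxxxxx` URL segment), so other URL shapes may exist that it does not cover.

```python
import re

# Matches the scraper note shown above, e.g. "[CVDB702466]".
_CVDB_NOTE = re.compile(r"\[CVDB(\d+)\]")
# Matches the trailing "4000-<id>" segment of the ComicVine issue URL.
_CVDB_URL = re.compile(r"/4000-(\d+)/?$")


def comicvine_issue_id(notes=None, tags=None, web=None):
    """Return the ComicVine issue ID as a string, or None if not found."""
    # The message may be in either Notes or Tags, so check both.
    for text in (notes, tags):
        if text:
            match = _CVDB_NOTE.search(text)
            if match:
                return match.group(1)
    # Fall back to the Web link.
    if web:
        match = _CVDB_URL.search(web.strip())
        if match:
            return match.group(1)
    return None
```

The ID is kept as a string so it can be passed on unchanged to whatever scraper call ends up using it.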

I would need some documentation on the way YACReader stores metadata info to compile a map of CR to YAC tags. Could anyone point me in that direction?


NetherKing1357 commented on May 19, 2024

I've done a basic mapping. Please take a look and let me know if I've got anything wrong.

mapping.xlsx


NetherKing1357 commented on May 19, 2024

Some relevant comments on the forum:

[quote="matthew" post=2058]
Luis, here are the XML tags currently supported by ComicRack:

<ComicInfo xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<Title>Hope And Glory - Part II: Bitter Beginnings</Title>
	<Series>Ninjak</Series>
	<Number>3</Number>
	<Count>6</Count>
	<Volume>1994</Volume>
	<StoryArc>Arthur</StoryArc>
	<SeriesGroup>Islands</SeriesGroup>
	<Summary>The secret origin of Ninjak continues!</Summary>
	<Notes>Scraped metadata from ComicVine [CVDB141693].</Notes>
	<Year>1995</Year>
	<Month>6</Month>
	<Day>24</Day>
	<Writer>Mark Moretti</Writer>
	<Penciller>Bob McLeod, Mark Moretti</Penciller>
	<Inker>Bob McLeod, Dick Giordano</Inker>
	<Colorist>Kathryn Bolinger</Colorist>
	<Letterer>Bob McLeod, Dick Giordano</Letterer>
	<CoverArtist>Bob McLeod, Kathryn Bolinger, Mark Moretti</CoverArtist>
	<Editor>Bob Layton</Editor>
	<Publisher>Valiant</Publisher>
	<Imprint>Aircel Publishing</Imprint>
	<Genre>Action, Fantasy</Genre>
	<Web>http://www.comicvine.com/ninjak-00-hope-and-glory-part-ii-bitter-beginnings/4000-141693/</Web>
	<PageCount>35</PageCount>
	<LanguageISO>en</LanguageISO>
	<Format>Director's Cut</Format>
	<AgeRating>Mature 17+</AgeRating>
	<BlackAndWhite>No</BlackAndWhite>
	<Manga>No</Manga>
	<Characters>Crimson Dragon, Dr. Silk, Fitzhugh, Iwatsu, Michiko Okubo, Neville Alcott, Ninjak, Senator Yusaku Okubo</Characters>
	<Teams>X-Men</Teams>
	<Locations>California, England, Japan, London, Tokyo</Locations>
	<Pages>
		<Page Image="0" ImageSize="568730" ImageWidth="1280" ImageHeight="1977" Type="FrontCover" />
		<Page Image="1" ImageSize="709786" ImageWidth="1280" ImageHeight="1995" />
	</Pages>
</ComicInfo>

[/quote]
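
One detail visible in this sample: credits, genres, characters and locations hold several values in a single comma-separated string (e.g. `Bob McLeod, Mark Moretti`). A small sketch of how an importer might split them, assuming the comma is always the separator; the helper name is made up for this example:

```python
def split_values(text):
    """Split a comma-separated ComicInfo value into a list of trimmed names."""
    if not text:
        return []
    # Drop empty pieces caused by stray or trailing commas.
    return [part.strip() for part in text.split(",") if part.strip()]
```

A value that itself contains a comma would be split incorrectly by this, so that is one of the edge cases that would still need a decision.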

[quote="selmf" post=4883]
Since this is requested regularly I'd like to point out a few things that can be done to speed things up a little. If we want to implement metadata import, we roughly have this todo list:

[ol]
[li]Research the format specification for all metadata files we want to support[/li]
[li]Compare the available metadata entries with YACReader's available database entries[/li]
[li]Map foreign metadata to YACReader's metadata, decide what to do with edge cases[/li]
[li]Acquire a set of example files that are [b]fully tagged[/b] in [u]all[/u] metadata formats and are legal (not pirated!!!) comics[/li]
[li]Add metadata detection to our library and comic routines[/li]
[li]Run tests to make sure it is working correctly[/li]
[li]Write some basic import routines for the most important tags[/li]
[li]Add logic to handle edge cases like multiple metadata files present and other stuff[/li]
[li]Fine-tune our import dialog to make all options available[/li]
[/ol]

As you can see, this is not a feature that can be implemented quickly. If you want to help out, you can create a bug on our GitHub page and start collecting the info that is needed to actually start the task.

[/quote]

[quote="Luis Ángel" post=4884]
To that list I would add an option to re-scan the comics in a library for metadata (possibly with an option to do it for a single folder or a specific file). Once this is implemented, people will want the metadata available for the comics already in the library.

Some help with this would be great, anyone?
[/quote]
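
To give an idea of what such a re-scan would have to do, here is a hedged Python sketch that walks a folder and lists the CBZ files carrying embedded metadata. It is not how YACReaderLibrary scans libraries. It assumes the entry is named ComicInfo.xml and only looks at zip-based .cbz files; `find_tagged_cbz` is a name invented for this example.

```python
import os
import zipfile


def find_tagged_cbz(root_dir):
    """Yield paths of .cbz files under root_dir that contain a ComicInfo.xml."""
    for folder, _subfolders, files in os.walk(root_dir):
        for filename in sorted(files):
            if not filename.lower().endswith(".cbz"):
                continue
            path = os.path.join(folder, filename)
            try:
                with zipfile.ZipFile(path) as archive:
                    names = archive.namelist()
            except zipfile.BadZipFile:
                continue  # not a readable zip archive, skip it
            # Assumption: the metadata entry is called ComicInfo.xml.
            if any(n.rsplit("/", 1)[-1].lower() == "comicinfo.xml" for n in names):
                yield path
```

Passing a single folder instead of the library root covers the per-folder case. A per-file re-scan would only need the check inside the loop.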


selmf commented on May 19, 2024

The first issue I see is that we manage libraries by placing our data in a hidden directory in the root directory of the collection in question. That does not align very well with the concept of a central xml file to "rule them all", so we will have to think about how to handle this, or whether we are going to handle it at all.
There is also no info on the structure of this database, other than "xml snippets" or "one huge xml file".

Another issue is that the way per-file metadata is stored is not consistent. Sometimes it is in the archives, sometimes not, it might even be "hidden" using special NTFS filesystem features. Supporting all of these variants probably doesn't make sense.

Metadata format seems to be roughly what ComicVine is giving us (@luisangelsm is that more or less correct?) so mapping should be possible.

We also still need some test files. If anyone is interested, Pepper and Carrot is a great open source web comic we have used for testing and showcase purposes in the past, so you could grab a cbz of it and tag it via ComicRack.


selmf commented on May 19, 2024

YACReaderLibrary stores its metadata in a hidden directory called .yacreaderlibrary which contains a directory with covers and a database file called library.db.
You can use https://sqlitebrowser.org/ to open this file and inspect the entries. For any questions related to the format in general, you will need to ask @luisangelsm - the database is his ~~mess~~ speciality and I have successfully avoided working on it until now.
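
If you prefer a script over a GUI, the table and column names can also be dumped with Python's built-in sqlite3 module. This is a generic sketch that makes no assumption about the actual schema of library.db; the function name is made up for this example.

```python
import sqlite3


def describe_database(db_path):
    """Return {table_name: [column_name, ...]} for every table in the file."""
    connection = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        schema = {}
        for table in tables:
            # PRAGMA table_info returns one row per column;
            # index 1 of each row is the column name.
            quoted = table.replace('"', '""')
            rows = connection.execute(f'PRAGMA table_info("{quoted}")').fetchall()
            schema[table] = [row[1] for row in rows]
        return schema
    finally:
        connection.close()
```

For example, `describe_database(".yacreaderlibrary/library.db")` run from the library's root folder. Note that `sqlite3.connect` creates an empty file if the path does not exist, so double-check the path, and work on a copy of the database to be safe.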


selmf commented on May 19, 2024

Thanks for taking the time to do this. This should be enough for me to write a first draft of an importer. I still need to investigate which technical option we should choose for supporting XML in general, and I will need to discuss that decision with @luisangelsm to get his input and OK on it.
We might also use this opportunity to take a closer look at our own library metadata and maybe do some improvements on it.


luisangelsm commented on May 19, 2024

@NetherKing1357 Thanks for all the resources and research, it has been really useful.

It still needs some work, but it is looking good so far.
