
jocketf / fimfarchive


Preserves stories from Fimfiction

Home Page: https://www.fimfiction.net/user/116950/

License: GNU General Public License v3.0

Python 99.59% HTML 0.29% Dockerfile 0.12%
mylittlepony archiving fanfiction fimfiction pony python

fimfarchive's Issues

How to run?

Could you please add some documentation on how to get this up and running? It should cover all the commands, from cloning the repo (I already figured that part out :P, but it's good to include for completeness) to creating the final zip.

Status visible, old, and deleted fics

Hello,
I noticed that every fic in the index.json has 'status': 'visible', even the ones that have been deleted or that are not accessible publicly.
The submitted and published fields are also always true as far as I can tell.

But the data for some of those stories is pretty inconsistent with the rest of the archive. I think most of those are deleted stories, but I have no way to exclude them since every story is marked visible and published in the archive.
Some of the problems with not being able to exclude deleted stories:

  • Stories on the site all have a 'series' tag (e.g. MLP-FiM or EQG), but there are ~45k fics in the archive that don't have one. Many of those seem to be old deleted stories, but there's no reliable way to know.
    • Note that there are also non-deleted stories that have inconsistent tags! Story 31718 has the MLP:FiM tag on the site, but not in the archive.
  • The non-story data won't be up to date: the author object will be full of NULL values on some stories, but not others (even though the author's account is still active).
  • It makes it harder to use fimfarchive as a data source in general. For example I saw the search GUI that works offline, but if I wanted something like this as a webpage that links to the real site, I'd need a way to filter dead links.

So, is it intended that status, submitted and published are always truthy?
Is there a way to filter out deleted fics that I missed, and is it normal that some of the non-deleted fics' tags don't match what's on the site?
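
Until there is a reliable status field, the symptoms described above could be used as a rough heuristic for flagging suspect entries. The following is only a sketch, not part of fimfarchive. It assumes the index is a mapping from story ID to a story object, that each story has a `tags` list whose items carry a `type` field, and that each story has an `author` object. The helper names and the "more than half null" threshold are made up for illustration. As noted above, some non-deleted stories also lack a series tag, so this will produce false positives.

```python
from typing import Any, Dict, Iterator, Tuple

Story = Dict[str, Any]


def has_series_tag(story: Story) -> bool:
    """Return True if any tag has type 'series' (assumed field names)."""
    tags = story.get("tags") or []
    return any(tag.get("type") == "series" for tag in tags)


def author_mostly_null(story: Story) -> bool:
    """Return True if the author object is missing or mostly null values."""
    author = story.get("author") or {}
    if not author:
        return True
    nulls = sum(1 for value in author.values() if value is None)
    # Hypothetical threshold: more than half of the fields are null.
    return nulls > len(author) / 2


def suspicious_stories(index: Dict[str, Story]) -> Iterator[Tuple[str, Story]]:
    """Yield (key, story) pairs that show either symptom."""
    for key, story in index.items():
        if not has_series_tag(story) or author_mostly_null(story):
            yield key, story


# Tiny made-up index, shaped like the assumed structure.
sample_index = {
    "1": {
        "tags": [{"type": "series", "name": "MLP"}],
        "author": {"id": 1, "name": "a", "bio": None},
    },
    "2": {
        "tags": [{"type": "genre", "name": "Comedy"}],
        "author": {"id": 2, "name": "b", "bio": "x"},
    },
    "3": {
        "tags": [{"type": "series", "name": "MLP"}],
        "author": {"id": 3, "name": None, "bio": None},
    },
}

flagged = [key for key, _ in suspicious_stories(sample_index)]
print(flagged)
```

For the real archive, the same functions could be applied to the parsed contents of index.json, provided its layout matches the assumptions above.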

I made a browser-based search for fimfarchive

Hello!

I wanted to let you know that I made a browser-based search for fimfarchive. It's not a full-text search, but it is content-based. It relies on text embeddings produced by the SPLADE v2 neural network.

You can find it at https://a0346f102085fe9f.github.io/IAS2/

It can be hosted locally too.

You can add it to the "third party projects" of the fimfarchive page on fimfiction if you see fit.
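
For readers unfamiliar with this kind of search: SPLADE represents each text as a sparse vector of term weights, and relevance between a query and a document is scored with a dot product over the terms they share. The sketch below illustrates only that scoring step with hand-written weights. It is not the code behind the linked page, and the function names, story IDs, and weights are all hypothetical.

```python
from typing import Dict, List, Tuple

SparseVector = Dict[str, float]


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product of two sparse vectors stored as term -> weight dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


def rank(
    query: SparseVector, docs: Dict[str, SparseVector], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, highest score first."""
    scored = [(doc_id, sparse_dot(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up weights; a real model would produce these from the story text.
docs = {
    "story_a": {"pony": 1.5, "adventure": 0.5},
    "story_b": {"dragon": 2.0, "adventure": 1.0},
}
query = {"adventure": 1.0, "dragon": 0.5}

results = rank(query, docs)
print(results)
```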
