GithubHelp home page GithubHelp logo

caption-synthesize's Introduction

Synthesizing Captions for booru-tag style database

Recently, LAION type dataset research have discovered that it has insufficient information, alttexts were disasterously lacking information about actual images.

Rather we can call it miracle that models which were dependent at the open dataset, working despite of those discrepencies.

Google, OpenAi has discovered that synthetic captions are far beneficial for dependent task, and CapsFusion project tries to annotate the large scale dataset with synthetic way.

Meta also tries to make a high-quality refined dataset, with hierarchical way.

Unfortunately, locally we don't have enough resources to process all the large database.

But we can cover it in several way,

  1. Tag-retrieval
  2. Focal crop - Tag - Grouping
  3. Tag relevance based reordering.

Which should help understanding what the tags actually belongs to.

extract-exif.py

The file supports Gradio Demo to to extract Stealth-PNGInfo type image metadata.

query-gpt4.py

The file is example template to query GPT-4V API to get annnotation, based on image and tag.

In the directory, the image should have same name with tag .txt file. image

The txt file format:


copyright: 
character: erica_blandelli
general tags: 1girl arm_up blonde_hair blue_eyes breasts choker cleavage closed_mouth day full_body high_heels long_hair looking_at_viewer miniskirt outdoors pleated_skirt red_skirt sitting skirt smile solo

annotate.py

Gradio Demo of the annotation. Hooman should refine the GPT4V annotation. The sanitize cell, which shows the 'unused tags' will be added soon.

Note that GPT-4V DOES NOT ACCEPT ANY TYPES OF NSFW CONTENTS

For those work, one might need company contact, or fair-use agreement for those annotations.

Unfortunately, most of the open source / crawled dataset has potential risk to contain unclassified data.

And since Data Poisoning is being a severe issue, the problem will be important soon ™️.

If booru database supported 'where' is related with specific tag, then it could have been a novel dataset, which also have semantic information too.

caption-synthesize's People

Contributors

aria1th avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.