
ravn-tech / hypertag

HyperTag - Intuitive Knowledge Management WebApp & CLI for Humans using Deep Learning & Tags

License: Other

Python 83.10% CSS 3.31% HTML 3.33% JavaScript 10.26%
tags tagging filesystem file organization semantic-similarity search-text pdf search search-engine

hypertag's Issues

Support relative file paths

This will make it possible to sync the hypertag.db across different machines / devices while still working with relative file paths.

Add automatic file tagging by file type

  1. Auto-tag files with their extension (type), e.g. JPG, PNG, TXT, PDF, PY, JS
  2. Auto-tag files with their group, e.g. Image (JPG, PNG), Document (TXT, PDF), Source (PY, JS)
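A minimal sketch of both steps, assuming tags are plain strings and the extension is taken from the last suffix of the path. The group table only contains the examples listed above; `FILE_GROUPS` and `auto_tags` are hypothetical names, not part of HyperTag.

```python
from pathlib import Path
from typing import Dict, List, Set

# Hypothetical extension -> group mapping, seeded with the examples from this issue.
FILE_GROUPS: Dict[str, Set[str]] = {
    "Image": {"jpg", "png"},
    "Document": {"txt", "pdf"},
    "Source": {"py", "js"},
}


def auto_tags(path: str) -> List[str]:
    """Return [EXTENSION, Group...] tags for a file path; empty if it has no extension."""
    ext = Path(path).suffix.lstrip(".")  # only the last suffix: "a.tar.gz" -> "gz"
    if not ext:
        return []
    tags = [ext.upper()]  # step 1: type tag, e.g. "JPG"
    for group, extensions in FILE_GROUPS.items():
        if ext.lower() in extensions:  # step 2: group tag, e.g. "Image"
            tags.append(group)
    return tags
```

Unknown extensions still get their type tag but no group tag, so the group table can grow over time without breaking existing files.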

Add CPU / GPU toggle option

Currently HyperTag stops working if no CUDA GPU is available. Make CUDA optional (allow CPU-only usage). It looks like CLIP does not work without CUDA...
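One way to make CUDA optional is to resolve the device once, from a user toggle plus a runtime check, and pass it to every model. This sketch assumes PyTorch is the backend; `select_device` and the `use_gpu` toggle are hypothetical names, and the fallback to CPU when `torch` is missing is an assumption of the sketch.

```python
def select_device(use_gpu: bool = True) -> str:
    """Return "cuda" only if the user wants the GPU and PyTorch can see one."""
    if not use_gpu:
        return "cpu"  # user explicitly chose CPU-only mode
    try:
        import torch  # optional dependency in this sketch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

If HyperTag uses the openai/CLIP package (not confirmed here), its loader accepts the device directly, e.g. `model, preprocess = clip.load("ViT-B/32", device=select_device(use_gpu))`, so the same string can be threaded through all model loads instead of hard-coding `"cuda"`.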

Add test cases

Test basic functions that are unlikely to change behavior:

  • add file
  • import directory
  • add tag
  • add metatag
  • query
  • index (check that text cleaning works on challenging example files, e.g. PDF, HTML)

Identify file duplicates

Add hash and size columns to files table.
On add: compute hash and size -> Ignore duplicates.
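A minimal sketch of the add path with SQLite, assuming SHA-256 as the hash (the issue does not name one) and a simplified `files` table; the `file_id` and `path` columns and the helper names are hypothetical and do not reflect HyperTag's actual schema.

```python
import hashlib
import os
import sqlite3

CHUNK_SIZE = 1 << 20  # hash in 1 MiB chunks so large files are not loaded into memory


def file_hash(path: str) -> str:
    """SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def init_db(conn: sqlite3.Connection) -> None:
    # UNIQUE(hash, size) lets the database itself reject duplicate contents.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " file_id INTEGER PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " hash TEXT NOT NULL,"
        " size INTEGER NOT NULL,"
        " UNIQUE (hash, size))"
    )


def add_file(conn: sqlite3.Connection, path: str) -> bool:
    """Insert a file unless identical contents are already stored; return True if added."""
    size = os.path.getsize(path)
    digest = file_hash(path)
    exists = conn.execute(
        "SELECT 1 FROM files WHERE hash = ? AND size = ?", (digest, size)
    ).fetchone()
    if exists:
        return False  # duplicate: same hash and size already in the table
    conn.execute(
        "INSERT INTO files (path, hash, size) VALUES (?, ?, ?)", (path, digest, size)
    )
    conn.commit()
    return True
```

Storing the size alongside the hash is cheap and makes accidental hash collisions between files of different lengths irrelevant.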

Update HyperTagFS dir lazily

Right now the whole HyperTagFS directory gets rebuilt on every tag-changing operation. Instead, apply only partial updates.
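A sketch of the partial update, assuming HyperTagFS is laid out as one directory per tag containing symlinks to the tagged files (an assumption; adapt to the real layout). It diffs the tag-to-files mapping before and after an operation and only touches the affected links. All names are hypothetical, and files with the same basename under one tag would collide in this simplified layout.

```python
import os
from typing import Dict, List, Set, Tuple

TagMap = Dict[str, Set[str]]  # tag name -> set of absolute file paths
LinkList = List[Tuple[str, str]]  # (tag, file path) pairs


def diff_tag_maps(old: TagMap, new: TagMap) -> Tuple[LinkList, LinkList]:
    """Return (links to create, links to remove) needed to turn `old` into `new`."""
    to_add: LinkList = []
    to_remove: LinkList = []
    for tag in old.keys() | new.keys():
        before = old.get(tag, set())
        after = new.get(tag, set())
        to_add += [(tag, f) for f in after - before]
        to_remove += [(tag, f) for f in before - after]
    return sorted(to_add), sorted(to_remove)


def apply_diff(root: str, to_add: LinkList, to_remove: LinkList) -> None:
    """Create/remove only the symlinks that changed under `root`."""
    for tag, path in to_remove:
        link = os.path.join(root, tag, os.path.basename(path))
        if os.path.islink(link):
            os.remove(link)
    for tag, path in to_add:
        tag_dir = os.path.join(root, tag)
        os.makedirs(tag_dir, exist_ok=True)
        link = os.path.join(tag_dir, os.path.basename(path))
        if not os.path.lexists(link):
            os.symlink(path, link)
```

The cost of an update is then proportional to the number of changed (tag, file) pairs rather than to the size of the whole collection.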

Add image search to HyperTagFS

Create a dedicated directory called "Search Images". All directory names created in "Search Images" are interpreted as search queries for image files, and the directories are populated with the matching results.

Add text search to HyperTagFS

Create a dedicated directory called "Search Texts". All directory names created in "Search Texts" are interpreted as search queries for text documents, and the directories are populated with the matching results.
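Both search directories can share one routine: scan the search root, treat each subdirectory name as a query, and fill it with symlinks to the hits. This sketch assumes a hypothetical `search_fn(query)` that returns file paths ranked by relevance (the image or text search backend); the `top_k` limit and the rank prefix on link names are design choices of the sketch, not of HyperTag.

```python
import os
from typing import Callable, Dict, Iterable, List


def populate_search_dirs(
    search_root: str,
    search_fn: Callable[[str], Iterable[str]],
    top_k: int = 10,
) -> Dict[str, List[str]]:
    """Fill every query directory under `search_root` with symlinks to its results."""
    results: Dict[str, List[str]] = {}
    for entry in os.scandir(search_root):
        if not entry.is_dir(follow_symlinks=False):
            continue  # only real directories are treated as queries
        query = entry.name  # the directory name *is* the search query
        hits = list(search_fn(query))[:top_k]
        for rank, path in enumerate(hits, start=1):
            # Rank prefix keeps results ordered by relevance in a file manager.
            link = os.path.join(entry.path, f"{rank:02d}_{os.path.basename(path)}")
            if not os.path.lexists(link):
                os.symlink(path, link)
        results[query] = hits
    return results
```

Calling it with the image search for "Search Images" and the text search for "Search Texts" covers both features.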

Add semantic video search

First basic version: Partition the video into e.g. 16 uniformly spaced (by time) sections and take a screenshot of each. Embed each screenshot and use the average as the video embedding.
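A sketch of the basic version, assuming each screenshot is taken at the midpoint of its section. `grab_frame(t)` and `embed_image(frame)` are hypothetical callables standing in for a frame grabber (e.g. OpenCV seeking via `cv2.CAP_PROP_POS_MSEC`) and an image encoder such as CLIP.

```python
from typing import Any, Callable, List, Sequence

Vector = List[float]


def section_timestamps(duration_s: float, n_sections: int = 16) -> List[float]:
    """Midpoint time (seconds) of each of `n_sections` equal-length sections."""
    step = duration_s / n_sections
    return [(i + 0.5) * step for i in range(n_sections)]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def video_embedding(
    duration_s: float,
    grab_frame: Callable[[float], Any],
    embed_image: Callable[[Any], Vector],
    n_sections: int = 16,
) -> Vector:
    """Embed one screenshot per section and average them into a single vector."""
    frames = [grab_frame(t) for t in section_timestamps(duration_s, n_sections)]
    return mean_vector([embed_image(f) for f in frames])
```

Using midpoints avoids sampling the very first and last frames, which are often black or title cards.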

Advanced: Partition the video with higher granularity and extract frames, e.g. every 5 seconds or a fixed high number (100+). Compute an embedding for every extracted frame. Compute the distances between consecutive frames in embedding space to infer semantically coherent video sections (runs of similar frames). Embed each section as the average of its coherent frames (those whose consecutive distances stay below a threshold). The resulting list of section embeddings should be a pretty good representation of the video and comes with section start & end metadata.
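A sketch of the segmentation step, assuming cosine distance as the frame distance and frames already extracted as `(timestamp_s, embedding)` pairs sorted by time. A new section starts whenever the distance to the previous frame exceeds `threshold`; the default value of 0.2 is an arbitrary placeholder, and each section's end is the timestamp of its last sampled frame.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Section = Tuple[float, float, Vector]  # (start_s, end_s, mean embedding)


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # treat zero vectors as maximally different
    return 1.0 - dot / (na * nb)


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def segment_video(
    frames: Sequence[Tuple[float, Vector]], threshold: float = 0.2
) -> List[Section]:
    """Split frames into coherent sections and embed each as its mean frame embedding."""
    sections: List[Section] = []
    start = 0
    for i in range(1, len(frames) + 1):
        at_end = i == len(frames)
        if at_end or cosine_distance(frames[i - 1][1], frames[i][1]) > threshold:
            chunk = frames[start:i]  # consecutive frames below the threshold
            sections.append(
                (chunk[0][0], chunk[-1][0], mean_vector([e for _, e in chunk]))
            )
            start = i
    return sections
```

The output list doubles as the video index: each entry can be matched against a query embedding and reported with its start and end time.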

Improve query UX using synonym detection

Match semantically very similar words. For example, if files are tagged with "science" and "research" is queried, they should match. Definitely add a toggle to turn this feature off, as some users may find it confusing.

Related to #9
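A sketch of query expansion with an on/off toggle, assuming a hypothetical `embed(word)` function that returns a word or sentence embedding (any model would do) and cosine similarity with an arbitrary placeholder threshold of 0.8. With the toggle off, only exact tag matches are returned.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expand_query(
    term: str,
    tags: Sequence[str],
    embed: Callable[[str], Vector],
    threshold: float = 0.8,
    enabled: bool = True,
) -> List[str]:
    """Return the existing tags that a query term should match."""
    if not enabled:
        return [t for t in tags if t == term]  # synonym matching switched off
    q = embed(term)
    return [t for t in tags if t == term or cosine_similarity(q, embed(t)) >= threshold]
```

Usage with toy vectors: `expand_query("research", ["science", "research", "cooking"], vectors.__getitem__)` returns the "science" and "research" tags when `vectors` places those two close together, and only "research" with `enabled=False`.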