Related Work

Related to CLIP (from OpenAI)

CLIP itself: openai/CLIP: Contrastive Language-Image Pretraining [github code] [arXiv abstract] [pdf]

Inspiration:

[Github Awesome CLIP]

[Github Awesome Video Text Retrieval]

[Papers with Code: Action Recognition]

Popular Downstream Tasks for Video Representation Learning | by Madeline Schiappa | Towards Data Science [towards data science]

Train-CLIP: A PyTorch Lightning solution to training OpenAI's CLIP from scratch [github code]

also includes Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation [arXiv abstract] [pdf]
OpenCLIP: An open source implementation of CLIP [github code]
KaiyangZhou/CoOp: Prompt Learning for Vision-Language Models [github code]
- 1st CoOp: Learning to Prompt for Vision-Language Models - [arXiv abstract] [pdf], arXiv, 2021.
- 2nd CoCoOp: Conditional Prompt Learning for Vision-Language Models - [arXiv abstract] [pdf], in CVPR, 2022.
gaopengcuhk/CLIP-Adapter [github code] [arXiv abstract] [pdf]
gaopengcuhk/Tip-Adapter [github code] [arXiv abstract] [pdf]
CyCLIP: Cyclic Contrastive Language-Image Pretraining [arXiv abstract] [pdf] [github code]
ZrrSkywalker/PointCLIP: [CVPR 2022] PointCLIP: Point Cloud Understanding by CLIP [github code] [arXiv abstract] [pdf]
sallymmx/ActionCLIP: This is the official implement of paper "ActionCLIP: A New Paradigm for Action Recognition" [github code] [arXiv abstract] [pdf]
Align: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [arXiv abstract] [pdf]
ClipBERT: Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling [github code] [arXiv abstract] [pdf]
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval [github code] [arXiv abstract] [pdf]

Also present in the Towhee framework
CenterCLIP: Token Clustering for Efficient Text-Video Retrieval [github code] [arXiv abstract] [pdf]
Multilingual-CLIP: OpenAI CLIP text encoders for multiple languages! [github code] (also mentioned with M-CLIP)
[2207.07285] X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval [github code] [arXiv abstract] [pdf] [hugging face]
antoine77340/S3D_HowTo100M: S3D Text-Video model trained on HowTo100M using MIL-NCE [github code]
DRL: Disentangled Representation Learning for Text-Video Retrieval [github code] [arXiv abstract] [pdf]

Available as a [Towhee operator]
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [github code] [arXiv abstract] [pdf]

Available as a [Towhee operator]
SPOT: Semi-Supervised Temporal Action Detection with Proposal-Free Masking [github code] [arXiv abstract] [pdf]

Also with Temporal Action Localization

Ranked #1
STALE: Zero-Shot Temporal Action Detection via Vision-Language Prompting [github code] [arXiv abstract] [pdf]
singularity: Revealing Single Frame Bias for Video-and-Language Learning [github code] [arXiv abstract] [pdf]
SlowFast: video understanding codebase from FAIR for reproducing [github code] [arXiv abstract] [pdf] [arXiv abstract X3D: Progressive Network Expansion for Efficient Video Recognition]
AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition [github code] [arXiv abstract] [pdf] [weights gdrive]
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection [github code] [arXiv abstract] [pdf]
ViViT: A Video Vision Transformer [github code by Scenic] [arXiv abstract] [pdf]
DETR: End-to-End Object Detection with Transformers [github code] [arXiv abstract] [pdf]
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding [github code] [arXiv abstract] [pdf]
FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks [presentation + video] [online pdf] [pdf] [github code]
X-CLIP: Expanding Language-Image Pretrained Models for General Video Recognition [arXiv abstract] [pdf] [github code]
BridgeFormer: Bridging Video-text Retrieval with Multiple Choice Questions [arXiv abstract] [pdf] [github code]

It is also available as a Towhee operator
Text4Vis: Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [arXiv abstract] [pdf] [github code]
BIKE: Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models [arXiv abstract] [pdf] [github code]

Captioning with CLIP

Fine-grained Image Captioning with CLIP Reward [arXiv abstract] [pdf] [github code]
ClipCap: CLIP Prefix for Image Captioning [arXiv abstract] [pdf] [github code (CLIP prefix captioning)]

Could produce captions from CLIP-encoded images (may also work for videos)
salesforce/BLIP: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation [arXiv abstract] [pdf] [github code]
- Also does action classification

Object Detection

roboflow: Object tracking implemented with the Roboflow Inference API, DeepSort, and OpenAI CLIP. [github code]
ViLD: Open-vocabulary Object Detection via Vision and Language Knowledge Distillation [arXiv abstract] [pdf] [github code part of tensorflow/tpu]
Crop-CLIP: Crop using CLIP [github code]
Detic: Detecting Twenty-thousand Classes using Image-level Supervision [arXiv abstract] [pdf] [github code]
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks [arXiv abstract] [pdf]
SLIP: Self-supervision meets Language-Image Pre-training [arXiv abstract] [pdf] [github code]
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension [arXiv abstract] [pdf] [github code]

Action Recognition

MoViNets: Mobile Video Networks for Efficient Video Recognition [arXiv abstract] [pdf]

Text -> Image: Query Search

johanmodin/clifs: Contrastive Language-Image Forensic Search allows free text searching through videos using OpenAI's machine-learning model CLIP [github code]
clip-retrieval: Easily compute clip embeddings and build a clip retrieval system with them [github code]
natural-language-image-search: Search photos on Unsplash using natural language [github code]
natural-language-youtube-search: Search inside YouTube videos using natural language [github code]

Temporal localization

See dedicated subfolder: ./temporal_localization

Prompt Engineering

LAMA: Language Models as Knowledge Bases? [arXiv abstract] [pdf] [github code]
- Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly [arXiv abstract] [pdf]
AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts [arXiv abstract] [pdf] [github code]

Others

ResNet: Deep Residual Learning for Image Recognition [arXiv abstract] [pdf]
Transformer: Attention Is All You Need [arXiv abstract] [pdf]
ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [arXiv abstract] [pdf]
MoViNets: Mobile Video Networks for Efficient Video Recognition [arXiv abstract] [pdf] [github code]
Tree-Structured Parzen Estimator: Understanding Its Algorithm Components and Their Roles for Better Empirical Performance [arXiv abstract] [pdf]
Robust fine-tuning of zero-shot models (by ML Foundations) [github code] [arXiv abstract] [pdf]
🏄 Embed/reason/rank images and sentences with CLIP models [github code]
t-SNE clearly explained. An intuitive explanation of t-SNE… | by Kemal Erdem (burnpiro) | Towards Data Science [towards data science]
All About ML — Part 8: Understanding Principal Component Analysis — PCA | by Dharani J | All About ML [Medium]

... data sets that have more than 20 features or high dimensional data. To check the correlation between them, we might have to visualize 20C2 = 190 2D scatter plots! That’s a lot to visualize. On top of that, most of them will not be informative. Clearly if we have many features it gets clumsy to analyze the features and understand their relations. Rather than analyzing each pair from many, if we can try to reduce the dimension to a small range by capturing all the information then we can effortlessly get insights from data.
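The pair count in the excerpt above can be checked directly: n features give n(n-1)/2 distinct 2D scatter plots. A minimal sketch follows; the function name is illustrative and does not come from the article.

```python
from math import comb


def pairwise_plot_count(n_features: int) -> int:
    """Number of distinct feature pairs, i.e. 2D scatter plots, for n features."""
    return comb(n_features, 2)


# 20 features, as in the excerpt (20C2)
print(pairwise_plot_count(20))
```

The count grows quadratically with the number of features, which is the motivation the excerpt gives for reducing dimensionality first.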
Vision optimization:
- A Simple Cache Model for Image Recognition [arXiv abstract] [pdf] [github code]
- KD-Lib: A PyTorch library for Knowledge Distillation, Pruning and Quantization [arXiv abstract] [pdf] [github code]
Feature Pyramid Networks for Object Detection [arXiv abstract] [pdf]
lucidrains/discrete-key-value-bottleneck-pytorch: Implementation of Discrete Key / Value Bottleneck, in Pytorch [github code]
Using ffprobe to get info from a file in a nice JSON format [gist]

ffprobe -v quiet -print_format json -show_format -show_streams "lolwut.mp4" > "lolwut.mp4.json"
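The JSON written by the command above can then be read from Python. The sketch below assumes the usual top-level `format` and `streams` keys that `-show_format` and `-show_streams` produce; the helper name `summarize_ffprobe` and the sample values are hypothetical and only illustrate the shape of the data.

```python
import json


def summarize_ffprobe(json_text: str) -> dict:
    """Extract a few commonly used fields from ffprobe's JSON output."""
    info = json.loads(json_text)
    fmt = info.get("format", {})
    duration = fmt.get("duration")  # ffprobe reports durations as strings
    return {
        "container": fmt.get("format_name"),
        "duration_s": float(duration) if duration is not None else None,
        "streams": [
            {
                "type": s.get("codec_type"),
                "codec": s.get("codec_name"),
                "width": s.get("width"),    # None for audio streams
                "height": s.get("height"),  # None for audio streams
            }
            for s in info.get("streams", [])
        ],
    }


# Hand-written sample in the assumed shape; the values are illustrative only.
sample = """
{
  "streams": [
    {"codec_type": "video", "codec_name": "h264", "width": 1280, "height": 720},
    {"codec_type": "audio", "codec_name": "aac"}
  ],
  "format": {"format_name": "mov,mp4,m4a,3gp,3g2,mj2", "duration": "12.500000"}
}
"""

summary = summarize_ffprobe(sample)
print(summary["duration_s"], [s["type"] for s in summary["streams"]])
```

In practice the text would come from the file the command wrote, for example `open("lolwut.mp4.json").read()`.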
deepdraw/deepdraw.ipynb at master · auduno/deepdraw [github code]
Mean-Average-Precision (mAP)
- Mean-Average-Precision (mAP) — PyTorch-Metrics 0.9.3 documentation [readthedocs]
- mAP (mean Average Precision) for Object Detection | by Jonathan Hui [Medium]
GPT3: Language Models are Few-Shot Learners [arXiv abstract] [pdf] [github code]

Datasets

Security and i-LIDS Dataset topics are presented in the /i-LIDS subfolder

VIRAT Video Data
Jean-Marc Odobez - Home Page - IDIAP Research Institute: Traffic Junction
ImageNet Large Scale Visual Recognition Challenge [arXiv abstract] [pdf]
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs [arXiv abstract] [pdf]
YFCC100M: The New Data in Multimedia Research [arXiv abstract] [pdf]
Microsoft COCO: Common Objects in Context [arXiv abstract] [pdf]
- Github: nightrome/cocostuff: The official homepage of the COCO-Stuff dataset
- [labels readme] [labels txt]
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild [arXiv abstract] [pdf]
The Kinetics Human Action Video Dataset [arXiv abstract] [pdf]
- A Short Note about Kinetics-600 [arXiv abstract] [pdf]
The THUMOS Challenge on Action Recognition for Videos "in the Wild" [arXiv abstract] [pdf]
FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age [arXiv abstract] [pdf] [github code]

MLOps

MLOps Toys | A Curated List of Machine Learning Projects

Aim: easy-to-use and performant open-source ML experiment tracker. [official website]

Open Source
BentoML: A faster way to ship your models to production [official website]

Open Source
Data Version Control · DVC [official website]

Open Source

by iterative.ai
Home | MLEM [official website]

Open Source

by iterative.ai

Open-source tool to simplify ML model deployment: save your ML model with a Python call, capture model metadata automatically, deploy models anywhere you want, and make Git a model registry.
Weights & Biases – Developer tools for ML [official website]

The developer-first MLOps platform

Build better models faster with experiment tracking, dataset versioning, and model management
Home - neptune.ai [official website]

Track experiments. Register models. Integrate with any MLOps stack.
Aporia - Cloud Native ML Observability | Monitor your Models [official website]
Blog posts:
- Machine Learning Model Management: What It Is, Why You Should Care, and How to Implement It - neptune.ai [📝 blog]
- ML Experiment Tracking: What It Is, Why It Matters, and How to Implement It - neptune.ai [📝 blog]
- Model Deployment Challenges: 6 Lessons From 6 ML Engineers - neptune.ai [📝 blog]
- ML Metadata Store: What It Is, Why It Matters, and How to Implement It - neptune.ai [📝 blog]

Tools

MayaData | Data Agility Delivered [official website]

Bring Your Data to Kubernetes

Don’t let an outdated data layer be your bottleneck. Run stateful workloads on Kubernetes, save money and move faster.

We make leading open source, high performance, and cloud native solutions.
Milvus
- 2021SIGMOD-Milvus
- Vector database - Milvus [official website]
- milvus-io/milvus: Vector database for scalable similarity search and AI applications. [github code]

Model monitoring

What is the difference between outlier detection and data drift detection? | by Elena Samuylova | Towards Data Science [towards data science]
Drift in Machine Learning. Why is it hard and what to do about it? | by Piotr (Peter) Mardziel | Towards Data Science [towards data science]

Learning

Optimization

Optuna - A hyperparameter optimization framework [github code]
Mixed precision | TensorFlow Core
Understanding Mixed Precision Training | by Jonathan Davis | Towards Data Science [towards data science]
Mixed precision: What is mixed precision training?
A Survey of Quantization Methods for Efficient Neural Network Inference [arXiv abstract] [pdf]
PyTorch Model Inference using ONNX and Caffe2 | LearnOpenCV
Ki6an/fastT5: ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x. [github code]
peterliht/knowledge-distillation-pytorch: A PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments with flexibility [github code]
zhutmost/lsq-net: Unofficial implementation of LSQ-Net, a neural network quantization framework [github code]
Introduction to PyTorch Model Compression Through Teacher-Student Knowledge Distillation | by Moussa Taifi PhD | Towards Data Science [towards data science]

IO

Comparison between LMDB and RocksDB
- LevelDB vs. LMDB vs. RocksDB Comparison
- Benchmarking LevelDB vs. RocksDB vs. HyperLevelDB vs. LMDB Performance for InfluxDB | InfluxData
- [LMDB oriented] Scalable Deep Learning via I/O Analysis and Optimization [online pdf] [pdf]
- [LMDB oriented] Efficiently processing large image datasets in Python [📝 blog]
jnwatson/py-lmdb: Universal Python binding for the LMDB 'Lightning' Database [github code]
facebook/rocksdb: A library that provides an embeddable, persistent key-value store for fast storage. [github code]
- RocksDB - Database of Databases
- nni/RocksdbExamples.rst at v2.0 · microsoft/nni [NNI github readme]

Libs

dmlc/decord: An efficient video loader for deep learning with smart shuffling that's super easy to digest [github code]
Welcome to ⚡ PyTorch Lightning — PyTorch Lightning 1.8.0dev documentation [readthedocs]
- Build a Model — PyTorch Lightning 1.8.0dev documentation [readthedocs]
Towhee | Home - Towhee
- action-classification/movinet - movinet - Towhee
- Towhee | Operator Task Detail - Towhee
  - video-text-embedding/clip4clip - clip4clip - Towhee
Welcome to TorchMetrics — PyTorch-Metrics 0.9.3 documentation [readthedocs]
streamlit/streamlit: Streamlit — The fastest way to build data web apps in Python [github code]
msamogh/nonechucks: Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more! [github code]
Altair: Declarative Visualization in Python — Altair 4.2.0 documentation
open-mmlab/mmaction2: OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark [github code]

Other Computer Vision tasks/concepts

YOLO: Real-Time Object Detection
- Introduction to YOLO Algorithm for Object Detection | Engineering Education (EngEd) Program | Section
- YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors [arXiv abstract] [pdf] [github code]
Optical flow - Wikipedia
Introduction to Motion Estimation with Optical Flow
OpenCV: Optical Flow
facebookresearch/detectron2: Detectron2 is a platform for object detection, segmentation and other visual recognition tasks. [github code]
Prismer: A Vision-Language Model with An Ensemble of Experts [github code] [arXiv abstract] [pdf] [hugging face space]
Image derivative. Analysis of the first derivative of an… | by Giuseppe Pio Cannata | Towards Data Science [towards data science]

schallerala / master-thesis-bib-links
