Ikko Eltociear Ashimine's Projects
Utilities for converting deep representations (like sentence embeddings) back to text
VMAS is a vectorized framework designed for efficient Multi-Agent Reinforcement Learning benchmarking. It comprises a vectorized 2D physics engine written in PyTorch and a set of challenging multi-robot scenarios. Additional scenarios can be implemented through a simple and modular interface.
Venom is the most complete JavaScript library for WhatsApp, 100% Open Source.
A new bootable USB solution.
Develop. Preview. Ship.
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model, 2022
The open big data serving engine. https://vespa.ai
Official PyTorch codes for the paper: "ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation"
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation.
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Video.js - open source HTML5 & Flash video player
Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model
A Toolkit for Text-to-Video Generation and Editing
The official Vim repository
Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"
Place to store our documentation, code samples, etc., for public consumption.
VisualChatGPT
Something like visual-chatgpt, an open-source version of ERNIE Bot (文心一言)
Chinese and English multimodal conversational language model
VSDocs Public Repo
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
ACG Text-to-Speech
This repo is a VITS fine-tuning pipeline for fast speaker-adaptation TTS and many-to-many voice conversion
📖 The power of CSS typesetting, right at your fingertips.
Python-based research interface for blackbox and hyperparameter optimization, based on the internal Google Vizier Service.
A high-throughput and memory-efficient inference and serving engine for LLMs