
vignetteapp / vignette


The open source VTuber software. ❤

Home Page: https://www.vignetteapp.org

License: Other

Languages: C# 99.14%, Dockerfile 0.86%
Topics: live2d, face-recognition, vignette, vtuber, hacktoberfest

vignette's Introduction

Vignette


The open-source VTuber toolkit.

Getting Started

Building

Please make sure you meet the following prerequisites:

  • A desktop platform with .NET 7 SDK or above installed.
  • Access to GitHub Packages (a sample NuGet source setup is shown below).
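For the GitHub Packages prerequisite, the feed has to be registered with credentials. A typical way to do this (the feed owner vignetteapp is an assumption, and the token must be a personal access token with the read:packages scope) is:

    dotnet nuget add source https://nuget.pkg.github.com/vignetteapp/index.json --name github --username YOUR_GITHUB_USERNAME --password YOUR_PERSONAL_ACCESS_TOKEN --store-password-in-clear-text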

License

Vignette is Copyright © 2023 Cosyne, licensed under GNU General Public License v3.0 with SDK exception. For the full license text please see the LICENSE file in this repository.

vignette's People

Contributors

LeNitrous


vignette's Issues

Use OpenCV directly instead for face recognition

Because of how scuffed DlibDotNet is (#140), we will be ditching it in favour of real-time OpenCV detection and feeding the result to a compatible neural network. The work items are as follows:

  • Remove DlibDotNet on upstream
  • Implement Face Recognition in-app, then export the face data as a PyTorch Tensor like the one in this example.

Of course, we will be locked to PyTorch tensors for now, but I suppose third-party integrators can figure out how to convert their models to accept PyTorch instead.
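As a rough illustration of the direction (not the final pipeline), a minimal sketch using the OpenCvSharp binding and a bundled Haar cascade, both of which are assumptions, could look like this:

    using OpenCvSharp;

    // Sketch only: capture frames and detect faces with a Haar cascade.
    // The detected regions would later be handed to a compatible neural network.
    using var capture = new VideoCapture(0);
    using var cascade = new CascadeClassifier("haarcascade_frontalface_default.xml");
    using var frame = new Mat();

    while (capture.Read(frame) && !frame.Empty())
    {
        using var gray = new Mat();
        Cv2.CvtColor(frame, gray, ColorConversionCodes.BGR2GRAY);

        // Candidate face regions for this frame.
        Rect[] faces = cascade.DetectMultiScale(gray, scaleFactor: 1.1, minNeighbors: 5);

        foreach (var face in faces)
            Cv2.Rectangle(frame, face, Scalar.Red, thickness: 2);

        Cv2.ImShow("preview", frame);
        if (Cv2.WaitKey(1) == 'q')
            break;
    }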

Lag Compensation for Prediction Data to Live2D

As part of #28, we have discussed how raw data would result in jittery, rough output, even if the neural network used were theoretically as precise as a human eye at predicting the facial movements of the subject. To compensate for jittery input, we will implement a form of lag-compensation algorithm.

Background

John Carmack's work on latency mitigation for virtual reality devices (source) explains that the latency between the user's physical head movement and the updated image reaching their eyes is critical to the experience. While the document is aimed mainly at virtual reality, one can argue that the methodologies used to provide a seamless virtual reality experience can be applied to a face-tracking application, since face tracking, like HMDs, is also a very demanding "human-in-the-loop" interface.

Byeong-Doo Choi, et al.'s work on frame interpolation uses a novel motion-prediction algorithm, adaptive OBMC, to enhance a target video's temporal resolution. According to the paper, this interpolation technique gives better results than the algorithms currently used for frame interpolation in the market.

Strategy

As stated in the background, there are many ways we could perform lag compensation on the raw, jittery prediction data coming from the neural network; we are limiting it to these two strategies:

Frame Interpolation by Motion Prediction

Byeong-Doo Choi, et al. achieve frame interpolation as follows:

First, we propose the bilateral motion estimation scheme to obtain the motion field of an interpolated frame without yielding the hole and overlapping problems. Then, we partition a frame into several object regions by clustering motion vectors. We apply the variable-size block MC (VS-BMC) algorithm to object boundaries in order to reconstruct edge information with a higher quality. Finally, we use the adaptive overlapped block MC (OBMC), which adjusts the coefficients of overlapped windows based on the reliabilities of neighboring motion vectors. The adaptive OBMC (AOBMC) can overcome the limitations of the conventional OBMC, such as over-smoothing and poor de-blocking

According to their experiments, this method produces better image quality for the interpolated frames, which is helpful for prediction in our neural network. However, it comes at the cost of having to process the video at runtime, since their experiments were only done on pre-rendered video frames.

View Bypass/Time Warping

John Carmack's work on reducing input latency for VR HMDs suggests several methods. One of them is View Bypass, a method achieved by taking a newer sample of the input.

To achieve this, the input is sampled once but used by both the simulation and the rendering task, reducing latency for both. However, the input and the game thread must run in parallel, and the programmer must be careful not to reference the game state, otherwise it would cause a race condition.

Another method mentioned by Carmack is Time Warping, about which he states:

After drawing a frame with the best information at your disposal, possibly with bypassed view parameters, instead of displaying it directly, fetch the latest user input, generate updated view parameters, and calculate a transformation that warps the rendered image into a position that approximates where it would be with the updated parameters. Using that transform, warp the rendered image into an updated form on screen that reflects the new input. If there are two dimensional overlays present on the screen that need to remain fixed, they must be drawn or composited in after the warp operation, to prevent them from incorrectly moving as the view parameters change.

There are two kinds of warping, forward warping and reverse warping, and both can be used together with View Bypass. The added complexity of sampling input concurrently with the main loop is manageable, since the input loop is entirely independent of the game state.

Conclusion

The strategies mentioned above would allow us to have a smoother experience; however, based on my analysis, Carmack's solutions would be more feasible for a project of our scale. We simply don't have the team and the technical resources to do from-camera video interpolation, as it would be too computationally expensive to implement with minimal overhead.
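As a starting point for the Carmack-style approach, a minimal sketch of "use the freshest sample and extrapolate" smoothing over tracker output is below; the constant-velocity assumption and all names are purely illustrative:

    // Illustrative only: extrapolate the latest tracker sample forward by the
    // render-loop latency, assuming roughly constant velocity between samples.
    public class LagCompensator
    {
        private float previousValue;
        private double previousTime;
        private float velocity;

        public void Submit(float value, double timeSeconds)
        {
            double dt = timeSeconds - previousTime;
            if (dt > 0)
                velocity = (float)((value - previousValue) / dt);

            previousValue = value;
            previousTime = timeSeconds;
        }

        // Predict where the parameter should be "now", slightly ahead of the last
        // sample, before driving the corresponding Live2D parameter.
        public float Predict(double nowSeconds)
        {
            double ahead = nowSeconds - previousTime;
            return previousValue + velocity * (float)ahead;
        }
    }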

Settings Implementation

Pulling from #4, we have to decide which settings to add to the UI. We already know which ones we would like to implement, but for now I think we should focus on the areas below (a rough settings model sketch follows the list):

  • Model Selection

    • Users have the ability to choose from our presets at HoloTrack.Resources or import their own. (Refer to #11).
  • Background Selection

    • Users can choose between a Green Screen, some presupplied backgrounds, or their own background (Refer to #11).
  • Live2D parameters

    • Select which parameters can be tracked and which part appears in the viewport
    • In essence, anything that refers to the face is automatically marked as active (similar to Wakaru's behavior).
  • Hotkeys

    • These refer to hotkeys that manipulate the UI or the model. For example, we may have Ctrl+H as our default binding for hiding the Settings side panel.
  • Model Position

    • Model position is controlled by X, Y, and its scale. We should provide default presets as well so the user has an idea of which setting moves what.
  • Camera Selection

    • Selects which Camera we would like to get input from. This is already implemented at HoloTrack.Vision.
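A very rough sketch of how these areas could map onto a settings model (all names are placeholders, not the final API):

    using System.Collections.Generic;

    // Placeholder settings model covering the areas above; names are illustrative.
    public class VignetteSettings
    {
        public string ModelPath { get; set; } = "Resources/Haru";    // Model Selection
        public string Background { get; set; } = "greenscreen";      // Background Selection
        public List<string> TrackedParameters { get; set; } = new(); // Live2D parameters
        public Dictionary<string, string> Hotkeys { get; set; } = new()
        {
            ["ToggleSettingsPanel"] = "Ctrl+H",
        };
        public float ModelX { get; set; }                             // Model Position
        public float ModelY { get; set; }
        public float ModelScale { get; set; } = 1.0f;
        public int CameraIndex { get; set; }                          // Camera Selection
    }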

File Importing

In osu!lazer, when we drag and drop a skin or a beatmap, it gets automatically imported. We might want to implement the same behaviour here, since there will be a need for it at some point.

This isn't a high priority however, since anything below this issue is much more important.

Make osu!framework use the ECS Paradigm

As it stands, there are some things that are fundamentally broken and are a major roadblock for Vignette. We're going to fix one of them: the draw hierarchy, because right now the inheritance in the hierarchy doesn't make sense.

Bugs?

After cloning the repo I had to manually install Humanizer and System.Collections.Immutable due to some Unauthorized errors. Now I'm stuck with a green screen. What am I supposed to do?

Reduce LoC for HoloTrack.Vision

Right now I still have my code in there for camera enumeration, so I might as well get rid of it along with DirectShow.

Update UX to reflect branding changes

The current UX feels cramped and unintuitive. In this iteration, we'll focus on visual clarity as well as following the branding changes.

Concept:
image

User Interface

We want to customize the Layout, and to do that we need to do the following:

  • Make the Live2D a draggable component
  • Custom Backgrounds (Green Screen default, white default background, or Image).
  • Persist this layout into a format (YAML, perhaps?). A serialization sketch is at the end of this issue.

Todo

  • Draggable and resizable Live2D container.
  • Backgrounds support (White background, Green background, user-defined).

Essentially, we're going to have a layout similar to this:

image
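If YAML is chosen, one possible way to persist the layout, using YamlDotNet with placeholder field names, could be:

    using YamlDotNet.Serialization;

    // Placeholder layout model so the draggable Live2D container and the
    // background choice survive restarts; field names are illustrative.
    public class LayoutConfig
    {
        public float PuppetX { get; set; }
        public float PuppetY { get; set; }
        public float PuppetScale { get; set; } = 1.0f;
        public string Background { get; set; } = "greenscreen";
    }

    public static class LayoutStore
    {
        public static string Save(LayoutConfig layout)
            => new SerializerBuilder().Build().Serialize(layout);

        public static LayoutConfig Load(string yaml)
            => new DeserializerBuilder().Build().Deserialize<LayoutConfig>(yaml);
    }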

Restrict Canary Builds only on changes to Source Code

Currently, the lint and build action runs on every push. We only want this to happen on the master branch and when there are changes to the source code, as it doesn't make sense to produce a build artifact if there aren't any actual changes to the application itself.
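A hedged sketch of what the workflow trigger could look like; the branch name and file globs are assumptions about the repository layout:

    on:
      push:
        branches:
          - master
        paths:
          - '**.cs'
          - '**.csproj'
          - '**.props'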

Linux CI keeps failing

For some reason our Linux CI keeps failing, so we need to investigate further.

Set up Downstream Fork Package Publishing

To accelerate development, we will need to deploy our own version of osu.Framework, preferably under our GHPR repos, since in the future we will be integrating changes that will be impossible to mainline into ppy.osu.Framework. While we want to stay as close to upstream as possible, we would also love to do our development without waiting too much on upstream, especially for our own changes, so the following groundwork will be done:

  • All packages relating to osu-framework should be published under Vignette.osu.Framework.
  • Distinguish osu-framework versions from Vignette-specific versions (preferably a -vg_<commit_hash> suffix).
  • Create an empty repository with only the workflow needed to deploy the GHPR packages.

Export Camera OpenCV Mat to a Video Stream

Currently, our camera implementation exports frames into an OpenCV Mat. However, we must convert the Mat into a video stream in order for it to be usable at all for our DLib classification.
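One possible direction (not necessarily the final one) is to encode each Mat into an in-memory image buffer and hand those buffers to the consumer as a stream. A minimal OpenCvSharp sketch, with the encoding choice and consumer left as assumptions:

    using System.IO;
    using OpenCvSharp;

    // Sketch: turn a single OpenCV Mat into a readable stream of encoded bytes.
    public static class MatStreaming
    {
        public static Stream ToFrameStream(Mat frame)
        {
            Cv2.ImEncode(".png", frame, out byte[] buffer);
            return new MemoryStream(buffer, writable: false);
        }
    }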

Hook up Tracking Worker to Live2D

As the final task for Milestone 1, we're going to hook up the tracking worker to Live2D and see if we can spot some bugs before we turn in our release.

Investigate Project Reunion Integration

Next week we will be testing Windows 11 so we can try out our integration with it early. We should investigate how we can publish to the Store. As part of the new Windows App SDK, we can use the MSIX format for Windows and distribute through the MS Store, then Steam for other platforms. This should allow us to be more flexible with our deployment pattern.

Incorrect Default Theme

On a fresh install, the default theme is incorrect and fails to load, so the entire user interface is coloured white.

Multiple Cameras will show incorrect names

Quoting @LeNitrous from #31 :

Currently there is a bug when there are more than 1 camera where it'll show the wrong name for the camera currently being used. It'll be fixed in a later PR.

We will be assigning this to AB2 since it needs to be fixed along with finalizing the rest of the basic settings UI.

Hide additional arms in Sample Live2D models

The reason some of these Live2D sample models have additional arm parts is that they are required by their baked motions. We shouldn't make these additional arm parts visible if they're not being used.

Support Other Puppet Rendering Systems

It has become apparent that we need to support multiple rendering systems in order to reach a wide audience. To make this possible, we need to make the puppet implementation abstract and modular so end users can use whatever they see fit for their needs. Listed below is what we'd want to support initially (a sketch of the abstraction follows the list).

  • Live2D
    • Has become a de facto standard, used in the workflows of many VTubers, whether indie or company-backed.
  • Rive
    • The runtime is a free alternative; its editor, however, is paid.
  • Inochi2D
    • A fully free alternative, from its editor to its runtime. It is at a highly experimental stage, however.
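A bare-bones sketch of what the abstraction could look like (type and member names are placeholders):

    using System.Collections.Generic;

    // Placeholder abstraction so Live2D, Rive, and Inochi2D backends can be swapped.
    // Concrete implementations (e.g. a Live2D-backed puppet) would live in their own modules.
    public abstract class Puppet
    {
        // Load a puppet from a user-supplied file.
        public abstract void Load(string path);

        // Apply one frame of tracking data (parameter name -> value).
        public abstract void ApplyTracking(IReadOnlyDictionary<string, float> parameters);

        // Advance any internal animation or physics state.
        public abstract void Update(double deltaTime);
    }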

Move anything osu! specific out of HoloTrack.Vision

Some parties have raised interest in using our computer vision implementation, and out of courtesy, I will be providing an experimental NuGet package in the near future. To do so, I will move anything that involves o!f out of HoloTrack.Vision.

IPC Architecture

Since osu!framework does not support multiple windows, we'll have to "write" our way out of the problem. A solution proposed by peppy in our discussion on Discord was this:

image

Honestly, this sounds more modular in structure, so I think we'll go with this approach.

Todo

Right now, the way we can do this is the following (a minimal IPC sketch follows the list):

  • Split the Live2D Window from the "Manager" Window - we do this by having HoloTrack.Desktop launch the Live2D Client (probably put it over at HoloTrack.Live2D so we don't make another package?)

  • HoloTrack.Live2D is a library; we just tell HoloTrack.Desktop to launch it.

  • HoloTrack.Live2D will have to ask HoloTrack.Desktop for the current user configuration and use that user configuration to build its scene.
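For the configuration handshake specifically, a minimal named-pipe sketch could look like this, with HoloTrack.Desktop acting as the server; the pipe name and the plain-JSON payload are assumptions:

    using System.IO;
    using System.IO.Pipes;

    // Sketch only: HoloTrack.Desktop serves the current user configuration over a
    // named pipe; HoloTrack.Live2D connects and reads it to build its scene.
    public static class ConfigIpc
    {
        public static void Serve(string configJson)
        {
            using var server = new NamedPipeServerStream("holotrack-config", PipeDirection.Out);
            server.WaitForConnection();
            using var writer = new StreamWriter(server);
            writer.Write(configJson);
        }

        public static string Request()
        {
            using var client = new NamedPipeClientStream(".", "holotrack-config", PipeDirection.In);
            client.Connect(timeout: 5000);
            using var reader = new StreamReader(client);
            return reader.ReadToEnd();
        }
    }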

Potentially misleading claim

The term "completely open source" is potentially misleading since both Live2D and Bass (used by osu!framework as the audio backend) are not open source.

Face Recognition might get noisy output

We only want to target a single subject at the moment, but currently Face.GetLandmarks() ends up not using the landmark we provided and focuses on multiple targets. We need to fix that.
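Independent of the fix inside Face.GetLandmarks(), a simple way to constrain tracking to one subject is to keep only the largest detection. A library-agnostic sketch, where the DetectedFace type is hypothetical:

    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical detection type; stands in for whatever the tracker returns.
    public record DetectedFace(int Left, int Top, int Right, int Bottom)
    {
        public int Area => (Right - Left) * (Bottom - Top);
    }

    public static class FaceSelection
    {
        // Keep only the most prominent (largest) face so downstream landmark
        // extraction has a single, stable target.
        public static DetectedFace PickPrimary(IEnumerable<DetectedFace> faces)
            => faces.OrderByDescending(f => f.Area).FirstOrDefault();
    }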

3D support

Since we did the rewrite in #210, we might as well start on 3D support for osu-framework. Still debating whether we should make it an out-of-tree fork or a plugin, but it should be something reproducible later on.

TODO for this task is the following:

  • Implement the OpenGL Aquarium in o!f
  • Graphics3 namespace (probably won't happen but we'll see).

Rename to Vignette

The finalized name for our project is now Vignette. However, we have to reflect it across our repositories and the projects linked with us. GitHub currently redirects holotrack/holotrack to our new name, but that redirect won't last forever.

Documentation Tasks

We'll have to document the more significant parts at some point. We'd want contributors to have an idea of how everything works in the back end, after all.

For now we can direct them to osu!framework's Getting Started wiki pages.

Model parsing

Since we want to be able to parse models other than the ones stored in HoloTrack.Resources, I think the best way here is the following (a tiny sketch follows the list):

  • If we look at #4, we have an import model, but we don't actually import! We just add a reference to the path of the model file. This should lessen the need for us to implement a model interchange format.

  • The same idea goes for background parsing; we just need to handle references.

  • If the reference returns an error (i.e. the file was moved or deleted), we can give the user the option to remove the reference.
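A tiny sketch of the reference-instead-of-import idea, with a hypothetical ModelReference type:

    using System.IO;

    // Hypothetical reference record: we only remember where the model lives.
    public record ModelReference(string Path)
    {
        // If the file was moved or deleted, the UI can offer to remove the reference.
        public bool IsValid => File.Exists(Path);
    }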

Fix how components are scheduled

Even if we fulfill #210, there are still some problems in the engine, since there's no clear indication of how to schedule Components. A Component is by definition implementation-agnostic, hence we want it to be schedulable to any thread area the game offers, making it a perfect primitive for the entire game.

Internationalization Support (i18n)

We'll have to support multiple languages. A good start is looking at Crowdin as a source. We'll support languages by demand, but for starters I think we'll support English, Japanese, and Chinese (Simplified and Traditional), given we have people proficient in those languages.

As for the implementation, that would be the second part of the investigation.

Extension System

This has been requested by the community; however, it is fairly low priority as we focus on the core components. The way this works is the following:

  • Extensions can expose their settings in MainMenu.
  • They will strictly conform to the o!f model in order to load properly. This is considered the bare minimum of what people require to make an extension.
  • They will be packaged as either a .dll or a .nupkg which the program can "extract" or "compile" into a DLL, something we can do once we have a better idea of how to dynamically load assemblies (a loading sketch follows below).

Anyone can propose a better design here since this is an RFC; we appreciate alternative approaches.
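A bare sketch of the .dll loading half; the IExtension interface is hypothetical and not the final contract:

    using System;
    using System.Linq;
    using System.Reflection;

    // Hypothetical extension contract; the real one would conform to the o!f model.
    public interface IExtension
    {
        string Name { get; }
        void Initialize();
    }

    public static class ExtensionLoader
    {
        // Load one extension assembly and instantiate every IExtension it contains.
        public static IExtension[] LoadFrom(string dllPath)
        {
            var assembly = Assembly.LoadFrom(dllPath);

            return assembly.GetTypes()
                .Where(t => typeof(IExtension).IsAssignableFrom(t) && !t.IsAbstract)
                .Select(t => (IExtension)Activator.CreateInstance(t))
                .ToArray();
        }
    }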

New User Experience

We should introduce new users to the application on first boot, with an option to skip everything, since it isn't clear to new users how to use the app for the first time.

Shader errors on AMD GCN4 cards

Apparently we're getting shader errors on AMD GCN4 graphics cards. This will affect Zen+ microprocessors as well, since they carry Vega 8 to Vega 11 graphics.

Allow osu!framework to not block compositing

Desktop effects are killed globally when Vignette is running. Some parts, like disabling decorations, are fine, but transparency, wobbly windows, smooth animations for actions, etc. are all disabled as long as Vignette is running.

Figure out Camera Switching

https://github.com/holotrack/holotrack/blob/22044e93eccb8462978da4a5d9bd0147814735b6/HoloTrack.Vision/Camera.cs#L21-34

We exposed this function, which basically opens a VideoCapture; however, int index seems ambiguous. If it does refer to the camera index, then we can use this.

CC @LeNitrous, can you check whether it's actually an index that refers to a camera device? Even if we can use it, I don't think we can enumerate video capture devices, which is kind of a hassle :/
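If the index does map to a camera device, a crude way to enumerate devices without DirectShow is to probe indices until VideoCapture fails to open. A hedged OpenCvSharp sketch (the upper bound is arbitrary):

    using System.Collections.Generic;
    using OpenCvSharp;

    // Sketch: probe VideoCapture indices to discover usable cameras.
    public static class CameraProbe
    {
        public static List<int> FindCameras(int maxIndex = 10)
        {
            var found = new List<int>();

            for (int index = 0; index < maxIndex; index++)
            {
                using var capture = new VideoCapture(index);
                if (capture.IsOpened())
                    found.Add(index);
            }

            return found;
        }
    }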

Virtual Camera Device

Register a new virtual camera device so that applications such as Discord, Zoom, and OBS (using an option other than Game, Desktop, or Window Capture) can use it. This way, we can also control what is sent to that virtual device, preventing the HUD from being seen at all.

Add linting and analyzers

This should help the repository maintain a uniform coding style and let changes be tested before being merged into master.

Model Importing

Optional tasks linked to this: #10

We want our users to be able to select from the predefined models we have at Resources/ or import their own and save it to a directory. Currently we're forcing Haru as our reference model.

(PS: I have nothing against Haru, she's cute)

Make FaceTracker abstract

This would give us breathing room when we want to implement other libraries for facial recognition. The hierarchy will be (a minimal sketch follows the list):

  • FaceTracker
  • FaceRecognitionDotNetFaceTracker : FaceTracker
  • TensorFlowFaceTracker : FaceTracker
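A minimal sketch of that hierarchy; the member signatures are placeholders, not the final API:

    // Placeholder hierarchy; members are illustrative.
    public abstract class FaceTracker
    {
        // Run detection and landmark extraction on one frame; returns whether a face was found.
        public abstract bool Track(byte[] frame, int width, int height);
    }

    public class FaceRecognitionDotNetFaceTracker : FaceTracker
    {
        public override bool Track(byte[] frame, int width, int height)
            => throw new System.NotImplementedException(); // backed by FaceRecognitionDotNet
    }

    public class TensorFlowFaceTracker : FaceTracker
    {
        public override bool Track(byte[] frame, int width, int height)
            => throw new System.NotImplementedException(); // backed by TensorFlow
    }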

Evaluate CNTK or Tensorflow for Tracking Backend

Unfortunately, our tracking backend, FaceRecognitionDotNet (which uses DLib and OpenCV), didn't turn out as performant as expected. The delta is too high to produce meaningful data, and the models currently perform poorly. In light of that, I will have to make a backend we can control directly instead of relying on others' work whose quality we can't verify.

Right now we're looking at CNTK and TensorFlow. While CNTK is from Microsoft, there is more groundwork available for TensorFlow, so we'll have to decide on this.

Implement HoloTrack Packages

HoloTrack packages are just ZIP files that we can unpack; however, there is a strict layout to follow, similar to how .osz files were implemented (a minimal unpacking sketch is at the end of this issue).

  • Implement it over at HoloTrack.Live2D.
  • Drag and Drop support.
  • Assertion and Test cases.

Idea credit: @LeNitrous
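A minimal unpacking-and-validation sketch using System.IO.Compression; the expected manifest entry name is an assumption about the package layout:

    using System.IO.Compression;
    using System.Linq;

    // Sketch: open a HoloTrack package (a ZIP under a different extension),
    // check for an assumed manifest entry, then extract it.
    public static class HoloTrackPackage
    {
        public static void Import(string packagePath, string targetDirectory)
        {
            using var archive = ZipFile.OpenRead(packagePath);

            // "package.json" is an assumed manifest name, not a finalized layout.
            if (!archive.Entries.Any(e => e.FullName == "package.json"))
                throw new System.IO.InvalidDataException("Not a valid HoloTrack package.");

            archive.ExtractToDirectory(targetDirectory);
        }
    }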
