himanshu-dutta / remixart

Code for a customized implementation of a stacked GAN model, used for text- and audio-to-image synthesis.

Python 76.82% HTML 6.49% JavaScript 14.61% CSS 1.50% Shell 0.59%
pytorch audio gans

RemixArt

Whether it is a book cover, album art, or just a template for a simple project, we are continually searching the web for designs. Even with a rough idea in mind, we often fail to find what we need. Our project aims to develop a model that takes input in the form of text, images, and even audio and attempts to produce a picture or work of art that is as photorealistic as possible. To illustrate, we work with a dataset of songs, album covers, artist images, and song lyrics to generate close-to-real artwork. The idea can then be applied in other domains where plenty of information is available in multiple formats. We use GANs, modified to incorporate inputs from three distinct channels, and train the model to learn an embedding for each of the three channels.

Notes and changes related to the project are added here to keep track of them.

Generating album art using 2 (previously 3) channels of input: audio, images, and/or text.
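
The README does not say how the input channels are combined. One common approach, shown here only as a hedged sketch, is to concatenate the per-channel embeddings into a single conditioning vector and fill any missing channel with zeros so the "and/or" case still yields a fixed-length vector. The function name `fuse_embeddings`, the choice of audio and text as the two channels, and the embedding sizes are all assumptions for illustration, not code from this repository. Plain Python lists are used so the sketch runs without PyTorch.

```python
def fuse_embeddings(channels, dims, order=("audio", "text")):
    """Concatenate per-channel embeddings into one conditioning vector.

    A channel missing from `channels` is filled with zeros so the output
    length stays fixed regardless of which inputs are supplied.
    (Hypothetical helper, not part of the RemixArt code base.)
    """
    fused = []
    for name in order:
        vec = channels.get(name)
        if vec is None:
            # Missing channel: substitute a zero vector of the expected size.
            vec = [0.0] * dims[name]
        elif len(vec) != dims[name]:
            raise ValueError(
                f"{name} embedding has length {len(vec)}, expected {dims[name]}"
            )
        fused.extend(vec)
    return fused


dims = {"audio": 4, "text": 3}  # hypothetical embedding sizes
both = fuse_embeddings(
    {"audio": [0.1, 0.2, 0.3, 0.4], "text": [1.0, 2.0, 3.0]}, dims
)
text_only = fuse_embeddings({"text": [1.0, 2.0, 3.0]}, dims)
print(len(both), len(text_only))  # both vectors have length 7
```

Zero-filling keeps the generator's input size constant, at the cost of the model having to learn that an all-zero block means "channel absent".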

Himanshu's:

  • Model Selection and Learning to Apply It
  • PyTorch
  • Applying the model in PyTorch
  • Figuring Out How to Actually Transfer the Workflow to GCloud

Archita's:

  • Data Scraping and Collection
  • Deciding on the Data Source
  • Storage and Retrieval for efficient processing, locally or over cloud buckets.
  • Choice of Database that would work well with the project.

Citation:

We built on the PyTorch implementation of the StackGAN architecture, updated to work with a recent version of PyTorch. We made a fair share of modifications, both to the procedure the original model followed and to the conditioning augmentation technique and the embedding representation, which uses a vanilla model consisting of one dense layer and a ReLU unit.

@inproceedings{han2017stackgan,
  author = {Han Zhang and Tao Xu and Hongsheng Li and Shaoting Zhang and Xiaogang Wang and Xiaolei Huang and Dimitris Metaxas},
  title = {StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks},
  year = {2017},
  booktitle = {{ICCV}},
}
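
The citation note above mentions conditioning augmentation and an embedding model with one dense layer and a ReLU unit. As a rough, framework-free sketch of that idea (plain Python rather than PyTorch, so it runs without extra dependencies), the block below passes an embedding through one dense layer and ReLU, splits the result into a mean and a log-variance, and samples a conditioning vector with the reparameterization trick, in the spirit of the Conditioning Augmentation described in the StackGAN paper. The split into two halves, the layer sizes, the random weights, and all function names are assumptions made for illustration; they are not taken from this repository.

```python
import math
import random


def dense_relu(x, weights, bias):
    """One dense layer followed by ReLU: max(0, W x + b)."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def conditioning_augmentation(embedding, weights, bias, rng):
    """Sample c = mu + sigma * eps with eps ~ N(0, I).

    Assumption: the dense+ReLU output is split in half into the mean
    and the log-variance of a diagonal Gaussian.
    """
    hidden = dense_relu(embedding, weights, bias)
    half = len(hidden) // 2
    mu, logvar = hidden[:half], hidden[half:]
    c = [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, logvar)
    ]
    return c, mu, logvar


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over dimensions."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar)
    )


rng = random.Random(0)  # fixed seed so the sketch is repeatable
embed_dim, cond_dim = 6, 3  # hypothetical sizes, chosen only for illustration
embedding = [rng.uniform(-1.0, 1.0) for _ in range(embed_dim)]
weights = [
    [rng.uniform(-0.5, 0.5) for _ in range(embed_dim)]
    for _ in range(2 * cond_dim)
]
bias = [0.0] * (2 * cond_dim)

c, mu, logvar = conditioning_augmentation(embedding, weights, bias, rng)
print("conditioning vector:", c)
print("KL regularizer:", kl_to_standard_normal(mu, logvar))
```

In the StackGAN paper, this KL term is added to the generator loss as a regularizer that keeps the conditioning distribution close to a standard normal; sampling `c` instead of using the embedding directly gives the generator slightly different conditioning vectors for the same input.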

Contributors

himanshu-dutta, architavasuki
