hardmaru / sketch-rnn

Multilayer LSTM and Mixture Density Network for modelling path-level SVG Vector Graphics data in TensorFlow

Languages: Python 99.64%, Shell 0.36%

sketch-rnn's Introduction

Deprecated

This version of sketch-rnn has been deprecated. Please see an updated version of sketch-rnn, which is a full generative model for vector drawings.

sketch-rnn

Implementation of a multi-layer recurrent neural network (RNN, LSTM, GRU) used to model and generate sketches stored in .svg vector graphics files. The approach combines Mixture Density Networks with an RNN, and additionally models dynamic end-of-stroke and end-of-content probabilities learned from a large corpus of similar .svg files, to generate drawings that are similar to the vector training data.
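To make the mixture density idea concrete, here is a minimal numpy sketch (not the repository's model.py) of how the RNN output at one timestep could be unpacked into mixture weights, bivariate Gaussian parameters, and an end-of-stroke probability, following Graves' handwriting model; the layout (six values per mixture plus one end-of-stroke logit) is an assumption for illustration, and the separate end-of-content probability is omitted.

# Illustrative sketch only: unpack a flat MDN output vector into pen-stroke
# mixture parameters. The layout (6 values per mixture + 1 eos logit) is
# assumed for this example and is not necessarily the repo's exact layout.
import numpy as np

def split_mdn_params(output, num_mixture=24):
    eos = 1.0 / (1.0 + np.exp(-output[0]))        # end-of-stroke probability (sigmoid)
    pi, mu_x, mu_y, s_x, s_y, rho = np.split(output[1:], 6)
    pi = np.exp(pi) / np.sum(np.exp(pi))          # softmax: mixture weights sum to 1
    s_x, s_y = np.exp(s_x), np.exp(s_y)           # standard deviations must be positive
    rho = np.tanh(rho)                            # x-y correlation constrained to (-1, 1)
    return eos, pi, mu_x, mu_y, s_x, s_y, rho

# example: unpack a random output vector for 24 mixtures
eos, pi, mu_x, mu_y, s_x, s_y, rho = split_mdn_params(np.random.randn(6 * 24 + 1))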

See my blog post at blog.otoro.net for a detailed description of applying sketch-rnn to learn to generate fake Chinese characters in vector format.

Example Training Sketches (20 randomly chosen from the 11,000-sketch KanjiVG dataset):

Example Training Sketches

Generated Sketches (Temperature = 0.1):

Generated Sketches

Basic Usage

I tested the implementation on TensorFlow 0.50. I also used the following libraries to help:

svgwrite
IPython.display.SVG
IPython.display.display
xml.etree.ElementTree
argparse
cPickle
svg.path

Loading in Training Data

The training data is located inside the data subdirectory. In this repo, I've included kanji.cpkl, which is a preprocessed array of KanjiVG characters.
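For a quick sanity check of the bundled data, something along these lines should work under Python 2 (the path and the internal structure of the pickled array are assumptions on my part):

# Exploratory sketch, assuming Python 2 and that the pickle lives at data/kanji.cpkl.
import cPickle
with open('data/kanji.cpkl', 'rb') as f:
    raw = cPickle.load(f)
print(len(raw))   # number of preprocessed sketches in the array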

To add a new set of training data, for example from the TU Berlin Sketch Database, create a subdirectory, say tuberlin, inside the data directory, and also create a directory of the same name inside the save directory. You end up with data/tuberlin/ and save/tuberlin/, where tuberlin is the name passed via the dataset_name flag to the training and sampling programs later on, and save/tuberlin/ will contain the checkpointed trained models.

Now, put a large collection of .svg files into data/tuberlin/. You can even create subdirectories within data/tuberlin/ and it will still work, since the SketchLoader class scans the entire subdirectory tree.
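As a rough illustration of what that recursive scan amounts to (this is not the actual SketchLoader code), the following lists the .svg files that would be picked up:

# Illustration only: recursively collect .svg files under data/tuberlin/.
import os
svg_files = []
for root, dirs, files in os.walk('data/tuberlin'):
    svg_files += [os.path.join(root, f) for f in files if f.endswith('.svg')]
print('found %d svg files' % len(svg_files))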

Currently, sketch-rnn only processes path elements inside .svg files, and within those path elements it only handles lines and Bézier curves. I found this sufficient for the TU Berlin and KanjiVG databases, although it wouldn't be difficult to extend the parser to handle the other curve elements, or even shape elements, in the future.
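For reference, here is a rough sketch of that kind of parsing using xml.etree.ElementTree and svg.path (this is not the repo's loader code, and the file name is hypothetical):

# Illustrative sketch: pull line and cubic Bezier segments out of the path
# elements of one .svg file; other segment and shape types are ignored.
import xml.etree.ElementTree as ET
from svg.path import parse_path, Line, CubicBezier

tree = ET.parse('data/tuberlin/example.svg')   # hypothetical file
for elem in tree.getroot().iter('{http://www.w3.org/2000/svg}path'):
    path = parse_path(elem.get('d'))
    for seg in path:
        if isinstance(seg, Line):
            print(seg.start, seg.end)          # endpoints as complex numbers
        elif isinstance(seg, CubicBezier):
            # approximate the curve by sampling a few points along it
            points = [seg.point(t / 10.0) for t in range(11)]
            print(points[0], points[-1])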

You can use utils.py to plot some random training data after the .svg files have been copied in:

%run -i utils.py
loader = SketchLoader(data_filename = 'tuberlin')
draw_stroke_color(random.choice(loader.raw_data))

Example Elephant from TU Berlin database

For this algorithm to work, I recommend that the data be similar in size and similar in style / content. For example, if we trained on bananas, buildings, elephants, rockets, and insects of varying shapes and sizes all at once, the model would most likely just produce gibberish.

Training the Model

After the data is loaded (continuing with the tuberlin example), you can run python train.py --dataset_name tuberlin to train the model.

A number of flags can be set for training if you wish to experiment with the parameters. You will probably want to change some of them, especially the scaling factors, to better suit the size of your .svg data.

The default values (shown in parentheses below) are defined in train.py; an example invocation that overrides a few of them follows the list.

--rnn_size RNN_SIZE             size of RNN hidden state (256)
--num_layers NUM_LAYERS         number of layers in the RNN (2)
--model MODEL                   rnn, gru, or lstm (lstm)
--batch_size BATCH_SIZE         minibatch size (100)
--seq_length SEQ_LENGTH         RNN sequence length (300)
--num_epochs NUM_EPOCHS         number of epochs (500)
--save_every SAVE_EVERY         save frequency (250)
--grad_clip GRAD_CLIP           clip gradients at this value (5.0)
--learning_rate LEARNING_RATE   learning rate (0.005)
--decay_rate DECAY_RATE         decay rate after each epoch (adam is used) (0.99)
--num_mixture NUM_MIXTURE       number of gaussian mixtures (24)
--data_scale DATA_SCALE         factor to scale raw data down by (15.0)
--keep_prob KEEP_PROB           dropout keep probability (0.8)
--stroke_importance_factor F    gradient boosting of sketch-finish event (200.0)
--dataset_name DATASET_NAME     name of directory containing training data (kanji)
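For example, to train on the tuberlin data with a larger hidden state and a smaller scaling factor (the specific values here are only illustrative, using the flags documented above):

python train.py --dataset_name tuberlin --rnn_size 512 --data_scale 10.0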

Sampling a Sketch

I've included a pretrained model in /save, so sampling should work out of the box. Running python sample.py --filename output --num_picture 10 --dataset_name kanji will generate an .svg file containing 10 fake Kanji characters using the pretrained model. Run python sample.py --help to see the extra flags, which control things like the number of sketches per row.

It should be straightforward to adapt sample.py to generate sketches interactively from an IPython prompt rather than from the command line. Running %run -i sample.py in an IPython interactive session will display the generated sketches in the IPython interface as well as writing an .svg output file.
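For example, inside an IPython session, the following uses the same flags documented for sample.py above (the values are illustrative):

%run -i sample.py --dataset_name kanji --num_picture 4 --filename interactive_test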

More useful links, pointers, and datasets

  • Alex Graves' paper on text sequence and handwriting generation.

  • Karpathy's char-rnn tool, motivation for creating sketch-rnn.

  • KanjiVG. Fantastic Database of Kanji Stroke Order.

  • Very clean TensorFlow implementation of char-rnn, written by Sherjil Ozair, on which I based the skeleton of this code.

  • svg.path. I used this well-written tool to help convert path data into line data.

  • CASIA Online and Offline Chinese Handwriting Databases. Download stroke data for cursive Simplified Chinese handwriting.

  • How Do Humans Sketch Objects? TU Berlin Sketch Database. It would be interesting to extend this work to generate random vector art of real-life objects.

  • Doraemon in SVG format.

  • Potrace. Beautiful looking tool to convert raster bitmapped drawings into SVG for potentially scaling up resolution of drawings. Could potentially apply this to generate large amounts of training data.

  • Rendering Bézier Curve Codes. I used this very useful code to convert Bézier curves into line segments.

License

MIT

sketch-rnn's People

Contributors: dribnet, hardmaru, polm

sketch-rnn's Issues

An error when I run sample.py

When I run sample.py, the following error occurs:
Traceback (most recent call last):
File "sample.py", line 57, in
model = Model(saved_args, True)
File "C:\YMJ\download_pro\magenta-master\magenta\models\sketch_rnn\sketch-rnn-master\model.py", line 93, in init
outputs, states = custom_rnn_autodecoder(inputs, self.initial_input, self.initial_state, cell, scope='rnn_mdn')
File "C:\YMJ\download_pro\magenta-master\magenta\models\sketch_rnn\sketch-rnn-master\model.py", line 71, in custom_rnn_autodecoder
output, new_state = cell(inp, states[-1])
File "C:\ProgramData\Anaconda3\envs\magenta\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 183, in call
return super(RNNCell, self).call(inputs, state)
File "C:\ProgramData\Anaconda3\envs\magenta\lib\site-packages\tensorflow\python\layers\base.py", line 575, in call
outputs = self.call(inputs, *args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\magenta\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 1066, in call
cur_inp, new_state = cell(cur_inp, cur_state)
File "C:\ProgramData\Anaconda3\envs\magenta\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 183, in call
return super(RNNCell, self).call(inputs, state)
File "C:\ProgramData\Anaconda3\envs\magenta\lib\site-packages\tensorflow\python\layers\base.py", line 575, in call
outputs = self.call(inputs, *args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\magenta\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 441, in call
value=self._linear([inputs, h]), num_or_size_splits=4, axis=1)
File "C:\ProgramData\Anaconda3\envs\magenta\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 1189, in call
res = math_ops.matmul(array_ops.concat(args, 1), self._weights)
File "C:\ProgramData\Anaconda3\envs\magenta\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1891, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "C:\ProgramData\Anaconda3\envs\magenta\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 2436, in _mat_mul
name=name)
File "C:\ProgramData\Anaconda3\envs\magenta\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\ProgramData\Anaconda3\envs\magenta\lib\site-packages\tensorflow\python\framework\ops.py", line 2958, in create_op
set_shapes_for_outputs(ret)
File "C:\ProgramData\Anaconda3\envs\magenta\lib\site-packages\tensorflow\python\framework\ops.py", line 2209, in set_shapes_for_outputs
shapes = shape_func(op)
File "C:\ProgramData\Anaconda3\envs\magenta\lib\site-packages\tensorflow\python\framework\ops.py", line 2159, in call_with_requiring
return call_cpp_shape_fn(op, require_shape_fn=True)
File "C:\ProgramData\Anaconda3\envs\magenta\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 627, in call_cpp_shape_fn
require_shape_fn)
File "C:\ProgramData\Anaconda3\envs\magenta\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 691, in _call_cpp_shape_fn_impl
raise ValueError(err.message)
ValueError: Dimensions must be equal, but are 512 and 261 for 'rnn_mdn_1/rnn_mdn/multi_rnn_cell/cell_0/cell_0/basic_lstm_cell/MatMul_1' (op: 'MatMul') with input shapes: [1,512], [261,1024].
Could you tell me how to solve it? Thank you very much!

svg parsing problems

I'm running into issues with SVG parsing using my own dataset. The parsed representation has extra lines where there shouldn't be any. Maybe the parser is not correctly handling moveto commands in some instances?

output of draw_stroke_color:
screen shot 2017-02-08 at 1 52 45 am

actual svg (filled):
screen shot 2017-02-08 at 1 52 53 am

The test problem

I saw your code on GitHub and trained with the aaron_sheep dataset. I got some files like "vector-500.meta", but I don't know how to test the model. Forgive my ignorance, but I hope you can tell me how to test.

Also, from the training output I found that "kL" was still larger than 0.2 at the end, and "recon" became negative. Is there a problem? How should it be resolved?

A question about sequence length

Hello,

It looks like the sequence length in your code is fixed, but the length of the CASIA handwriting data varies over a wide range. I guess the handwriting data generated by your code may end up with many [0, 0, 0, 0, 1] entries for a short Chinese character like '一'. I tried using buckets to group the data by sequence length; classification on the CASIA handwriting data works well, but generation doesn't. Can you give me some information about how the training loss decays in your code?

Thanks a lot.

There are a lot of errors when I pickle.load a .pkl file

1. open(file) should be replaced by open(file, 'rb') in Python 3.x.
2. Error in pickle.load(f): ModuleNotFoundError: No module named 'copy_reg\r'.
Why is that? Please help.
