char-rnn-text-generation by yxtay

Non-ASCII symbols support

I'm trying to use this project to generate text with non-ASCII characters (Cyrillic), but I keep getting only spaces, tabs and commas and no actual text. Is this related to string.printable? Which part should I modify to support non-ASCII characters?

Thanks!

Output

python tf_model.py train --text-path Vojna_i_mir._Kniga_1.txt --checkpoint-path /home/norn/src/char-rnn-text-generation/tf_checkpoint/chk
2017-12-16 15:10:56,234 - main - INFO - corpus length: 1433026.
2017-12-16 15:10:56,235 - main - INFO - building model: {'clip_norm': 5.0, 'batch_size': 64, 'num_layers': 2, 'vocab_size': 98, 'rnn_size': 128, 'p_keep': 1.0, 'learning_rate': 0.001, 'embedding_size': 32}.
2017-12-16 15:10:57.231924: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2017-12-16 15:10:57,523 - main - INFO - model saved: /home/norn/src/char-rnn-text-generation/tf_checkpoint/chk.
2017-12-16 15:10:57,739 - main - INFO - tensorboard set up.
2017-12-16 15:10:57,740 - main - INFO - building model: {'clip_norm': 5.0, 'batch_size': 1, 'num_layers': 2, 'vocab_size': 98, 'rnn_size': 128, 'p_keep': 1.0, 'learning_rate': 0.001, 'embedding_size': 32}.
2017-12-16 15:10:57,996 - main - INFO - inference model loaded: /home/norn/src/char-rnn-text-generation/tf_checkpoint/chk.
2017-12-16 15:10:58,530 - main - INFO - start of training.
epoch 1/32: 0%| | 0/349 [00:00<?, ?it/s]
2017-12-16 15:10:58,532 - utils - INFO - number of batches: 349.
2017-12-16 15:10:58,532 - utils - INFO - effective text length: 1429504.
2017-12-16 15:10:58,532 - utils - INFO - x shape: (64, 22336).
2017-12-16 15:10:58,532 - utils - INFO - y shape: (64, 22336).
epoch 1/32: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 349/349 [02:53<00:00, 1.97it/s]
2017-12-16 15:13:51,930 - main - INFO - epoch: 1, duration: 173s, loss: 0.93437.
2017-12-16 15:13:52,082 - main - INFO - model saved: /home/norn/src/char-rnn-text-generation/tf_checkpoint/chk.
2017-12-16 15:13:52,114 - main - INFO - generating 512 characters from top 10 choices.
2017-12-16 15:13:52,115 - main - INFO - generating with seed: "а помощь к брату, кто бы он ни б".
2017-12-16 15:13:53,027 - main - INFO - generated text:
а помощь к брату, кто бы он ни б , , ,
, ,-
, , .
, . ,

epoch 2/32:
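The log reports vocab_size: 98 even though the corpus is Cyrillic, which suggests a fixed, ASCII-sized vocabulary. Assuming the vocabulary is built from string.printable as the question suggests, every character outside that set would collapse into a single reserved id, leaving only spaces and punctuation for the model to learn, which matches the generated text above. The sketch below is hypothetical (printable_vocab, corpus_vocab, encode and load_corpus are illustrative names, not the project's functions). It contrasts an ASCII-only vocabulary with one built from the characters that actually occur in the corpus.

```python
import string


def printable_vocab():
    """Hypothetical ASCII-only vocabulary: ids 1..N for string.printable, 0 reserved."""
    return {ch: i for i, ch in enumerate(sorted(set(string.printable)), start=1)}


def corpus_vocab(text):
    """Hypothetical vocabulary built from the characters that occur in the corpus."""
    return {ch: i for i, ch in enumerate(sorted(set(text)), start=1)}


def encode(text, char2id, unknown_id=0):
    """Map each character to its id; characters outside the vocabulary get unknown_id."""
    return [char2id.get(ch, unknown_id) for ch in text]


def load_corpus(path):
    """Read the training text as UTF-8 so Cyrillic characters decode correctly."""
    with open(path, encoding="utf-8") as f:
        return f.read()


seed = "а помощь к брату"
ascii_ids = encode(seed, printable_vocab())
corpus_ids = encode(seed, corpus_vocab(seed))

# With the ASCII vocabulary only the 3 spaces keep a real id; every Cyrillic letter becomes 0.
print(sum(1 for i in ascii_ids if i != 0))  # 3
# With the corpus vocabulary every character has its own id.
print(0 in corpus_ids)  # False
```

If you switch to a corpus-derived vocabulary, the model's vocab_size would need to be len(char2id) + 1 to cover the reserved id 0. Where exactly to make that change depends on the project's code.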

mask_zero should be set to True in the embedding layer

char-rnn-text-generation/keras_model.py, line 32 in commit 216f368:

model.add(Embedding(vocab_size, embedding_size,
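For context, the Keras Embedding layer accepts a mask_zero argument. When it is True, id 0 is treated as padding and a mask is propagated to downstream layers that support masking (such as RNN layers), so those timesteps are skipped. Per the Keras documentation, id 0 then cannot be used for a real token. The pure-Python sketch below is not the project's code and does not use Keras. It only illustrates the intended effect on a loss. masked_mean_nll and unmasked_mean_nll are hypothetical helpers, and the probabilities are made-up toy values.

```python
import math


def unmasked_mean_nll(probs, targets):
    """Mean negative log-likelihood over every position, padding included."""
    losses = [-math.log(p[y]) for p, y in zip(probs, targets)]
    return sum(losses) / len(losses)


def masked_mean_nll(probs, targets, pad_id=0):
    """Mean negative log-likelihood over positions whose target is not pad_id."""
    losses = [-math.log(p[y]) for p, y in zip(probs, targets) if y != pad_id]
    return sum(losses) / len(losses)


# Toy sequence: two real targets (ids 2 and 1) followed by two padding steps (id 0).
# probs[t][k] is the predicted probability of id k at step t.
targets = [2, 1, 0, 0]
probs = [
    [0.1, 0.4, 0.5],
    [0.1, 0.5, 0.4],
    [0.9, 0.05, 0.05],
    [0.9, 0.05, 0.05],
]

print(round(masked_mean_nll(probs, targets), 4))    # 0.6931, real steps only
print(round(unmasked_mean_nll(probs, targets), 4))  # 0.3993, padding pulls it down
```

The unmasked loss looks better only because the model is rewarded for predicting padding. That distortion is one reason to mask id 0.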

Question on how model selects the best number of epochs

Hi there!

I'm a noob exploring your text generation project. It's been excellent so far, but I was hoping you could help me with one question: how do you pick the best number of epochs? For example, if I set an arbitrary number of epochs when I run the script, does the model stop on its own once it can no longer learn?
Is there a way to find out which number of epochs is best, even if I have already trained for a large number of epochs?
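A common approach to both questions is early stopping on a held-out validation loss. Training loss, like the per-epoch loss in the log above, usually keeps falling, so it is a poor signal on its own. The idea is to record the validation loss after every epoch, remember the epoch with the lowest value, and stop once it has not improved for a set number of epochs (the patience). The helper below is a hypothetical sketch, not something the project is confirmed to provide. Keras offers the same idea as its EarlyStopping callback (monitor, patience, min_delta). If you have already trained for many epochs, running the logged per-epoch validation losses through the same function tells you which epoch was best, provided a checkpoint from that epoch was kept.

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    best_epoch is the epoch with the lowest validation loss seen so far.
    Training stops once `patience` epochs pass without an improvement of
    more than min_delta, or when the losses run out.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses)


# Hypothetical per-epoch validation losses.
losses = [1.20, 0.95, 0.90, 0.91, 0.92, 0.93, 0.89]
print(early_stopping(losses, patience=3))  # (3, 6)
print(early_stopping(losses, patience=5))  # (7, 7)
```

With patience=3 the run stops at epoch 6 and never reaches the 0.89 at epoch 7. Larger patience costs more epochs but is less likely to stop during a temporary plateau.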

Regarding common words

Hey,
Nice work!

I just have one question.
I am working on character-level text generation using novels as training data.

Novels contain character names that occur very often. Will that affect my model when it generates text?

Thanks in advance!
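A character-level model never sees words as units. However, a name that recurs thousands of times becomes a very common character sequence, so the model is likely to learn it well and reproduce it often. One rough way to gauge how much this matters is to measure what share of the corpus the most frequent words cover. The helper below is a hypothetical sketch, and its regex tokenization is a simplification.

```python
import re
from collections import Counter


def top_word_share(text, top_n=5):
    """Return (word, count, share) for the top_n most frequent words.

    share is the fraction of all characters in text covered by that word's
    occurrences. The word pattern matches Unicode letters, so Cyrillic works too.
    """
    counts = Counter(re.findall(r"\w+", text))
    total = len(text)
    return [(w, c, c * len(w) / total) for w, c in counts.most_common(top_n)]


sample = "Anna said hi. Anna left. Anna"
for word, count, share in top_word_share(sample, top_n=1):
    print(word, count, round(share, 3))  # Anna 3 0.414
```

If names turn out to dominate, one option is to replace them with a shorter placeholder before training and substitute names back after generation. Whether that helps depends on what you want the generated text to look like.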
