balisujohn / tortoise.cpp

A ggml (C++) re-implementation of tortoise-tts

License: MIT License

CMake 0.06% C 35.49% C++ 64.45%

ggml text-to-speech tortoise-tts tts speech text to local

tortoise.cpp's Introduction

tortoise.cpp: GGML implementation of tortoise-tts (Support Development Here!)

Downloading

Clone the repository, including its submodules, with the following command:

git clone --recursive https://github.com/balisujohn/tortoise.cpp.git

Compiling

CPU and CUDA builds are supported for now; Metal is a work in progress. To compile:

Compile for CPU (works on Linux x86 and Mac ARM)

mkdir build
cd build
cmake .. 
make

This has been tested on macOS (ARM).

Compile for CUDA

mkdir build
cd build
cmake .. -DGGML_CUBLAS=ON
make

This has been tested on Ubuntu 22.04 with CUDA 12.0 and a GTX 1070 Ti.

Compile for macOS with Metal (work in progress)

mkdir build
cd build
cmake .. -DGGML_METAL=ON
make

Running

Only lowercase letters, spaces, and punctuation are supported in the prompt.
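
A prompt can be checked against this restriction before it is passed to --message. The sketch below is only an illustration: `unsupported_characters` is a hypothetical helper, not part of tortoise.cpp, and treating "punctuation" as all of ASCII punctuation is an assumption, since the exact set the tokenizer accepts is not listed here.

```python
import string

# Assumption: "punctuation" means ASCII punctuation; the tokenizer's real
# character set may be narrower.
ALLOWED = set(string.ascii_lowercase) | {" "} | set(string.punctuation)


def unsupported_characters(prompt):
    """Return the distinct characters outside ALLOWED, in order of first appearance."""
    seen = []
    for ch in prompt:
        if ch not in ALLOWED and ch not in seen:
            seen.append(ch)
    return seen


print(unsupported_characters("based... dr freeman?"))
print(unsupported_characters("Hello World 42"))
```

An empty list means the prompt stays within the assumed character set; uppercase letters, digits, and newlines are reported so they can be rewritten by hand.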

You will need to place ggml-model.bin, ggml-vocoder-model.bin and ggml-diffusion-model.bin in the models directory to run tortoise.cpp. You can download them from https://huggingface.co/balisujohn/tortoise-ggml . I will release scripts for generating these files from tortoise-tts.
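
Several of the issues further down come from missing model files, so a quick pre-flight check may help. This is a minimal sketch: `missing_models` is a hypothetical helper, and the relative path `models` assumes the script is run from the repository root.

```python
from pathlib import Path

# File names taken from the paragraph above.
REQUIRED_MODELS = (
    "ggml-model.bin",
    "ggml-vocoder-model.bin",
    "ggml-diffusion-model.bin",
)


def missing_models(models_dir):
    """Return the required model file names that are absent from models_dir."""
    root = Path(models_dir)
    return [name for name in REQUIRED_MODELS if not (root / name).is_file()]


# Assumption: run from the repository root, where the models directory lives.
missing = missing_models("models")
print("missing:", missing if missing else "none")
```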

From the build directory, run:

./tortoise

Here's an example that should work out of the box:

./tortoise --message "based... dr freeman?" --voice "../models/mouse.bin" --seed 0 --output "based?.wav"

All command-line arguments are optional:

arguments:
  --message           Specifies the message to generate, lowercase letters, spaces, and punctuation only. (default: "this is a test message." )
  --voice             Specifies the path to the voice file to use to determine the speaker's voice.  (default: "../models/mol.bin" )
  --output            Specifies the path where the generated wav file will be saved.                 (default: "./output.wav")
  --seed              Specifies the seed for pseudorandom number generation, used in autoregressive sampling and diffusion sampling (default: system time seed)
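
When tortoise is driven from a script, the argument list can be assembled programmatically. The sketch below uses only the four flags documented above; `build_tortoise_command` is a hypothetical wrapper, and the `./tortoise` path assumes the script runs from the build directory.

```python
import subprocess


def build_tortoise_command(message=None, voice=None, output=None, seed=None,
                           binary="./tortoise"):
    """Build an argv list; options left as None fall back to the program's defaults."""
    command = [binary]
    for flag, value in (("--message", message), ("--voice", voice),
                        ("--output", output), ("--seed", seed)):
        if value is not None:
            command += [flag, str(value)]
    return command


command = build_tortoise_command(message="based... dr freeman?",
                                 voice="../models/mouse.bin",
                                 seed=0, output="based?.wav")
print(command)
# To actually run it from the build directory:
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with messages that contain quotes or question marks.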

How to add voices

Set up the original tortoise-tts. To export a voice, run it with that voice once the following change is in place. Immediately after this line: https://github.com/neonbjb/tortoise-tts/blob/e2d9fba0bb5c4376d0d142efea47a448f97c4d90/tortoise/api.py#L401

add this code:

import numpy as np  # harmless if the module already imports numpy

numpy_array = auto_conditioning.to("cpu").numpy().astype(np.float32)  # Ensure float32 for binary format

# Define the file path
file_path = 'auto_conditioning.bin'

# Save NumPy array as binary file
numpy_array.tofile(file_path)

print("saved auto conditioning")
exit()

Then rename auto_conditioning.bin to the speaker name and put the file in your models folder to use it like any other voice. This works with voices cloned with tortoise-tts.
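
Because the export above writes raw float32 values with no header, a voice file can be sanity-checked by reading it back. This is a sketch only: `read_float32_file` is a hypothetical helper, and the number of values a valid voice file should contain is not stated here, so the check is limited to the byte count being a multiple of 4.

```python
import array
import os


def read_float32_file(path):
    """Read a headerless float32 file (native byte order), as written by numpy's tofile."""
    size = os.path.getsize(path)
    if size % 4 != 0:
        raise ValueError(f"{path}: size {size} is not a multiple of 4 bytes")
    values = array.array("f")
    with open(path, "rb") as handle:
        values.fromfile(handle, size // 4)
    return values


# Only runs if an exported file is present in the current directory.
if os.path.exists("auto_conditioning.bin"):
    values = read_float32_file("auto_conditioning.bin")
    print(len(values), "float32 values")
```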

Contributing

If you want to contribute, please open an issue stating what you want to work on. DM me on Twitter if you want a link to join the dev Discord, or if you have questions. I am happy to help people get started with contributing!

I am also making available a fork of tortoise-tts which has my reverse engineering annotations, and also the export script for the autoregressive model.

License

This is released with an MIT License.

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Derived from tortoise-tts and ggml.

tortoise-tts:

Apache 2.0 License James Betker https://github.com/neonbjb/tortoise-tts/blob/main/LICENSE

GGML

MIT License

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

tortoise.cpp's Issues

Can't find ggml-vocoder-model.bin

After installing and building, running from the command line exits with an error saying that ggml-vocoder-model.bin was not found. Where can I download this file?

Add test for gpt-2 module to prevent regressions

This project is a fork of ggml so it inherits ggml's tests. They are unfortunately somewhat out of date with ggml's current test battery.

You can run the tests with the following commands

cd build
make test

This task is to add a test that checks that the gpt-2 model still produces its current output for a fixed seed and input phrase, so that later edits to the forward pass cannot silently introduce regressions.

The input is the tokens:

 std::vector<gpt_vocab::id> tokens = ::parse_tokens_from_string("255,42,2,97,60,49,2,63,48,61,2,26,163,2,149,2,68,161,33,2,42,2,33,77,78,16,32,2,58,2,42,2,50,18,125,9,2,80,43,32,2,127,2,106,29,57,33,159,7,2,55,2,204,32,9,2,16,31,54,54,2,26,213,61,2,60,2,136,26,29,242,2,51,2,22,20,95,46,2,42,2,36,54,18,2,46,63,31,137,192,2,73,2,26,245,2,26,50,2,19,46,18,9,2,0,0", ',');

and there should be a batch of 4 resulting token sequences that look like

[8192, 8, 964, 2961, 896, 978, 4542, 4134, 887, 878, 7051, 1143, 2608, 670, 3014, 99, 795, 1265, 2228, 6370, 611, 329, 5529, 1813, 616, 4153, 157, 917, 2932, 1919, 2967, 2256, 1815, 37, 6542, 2681, 725, 7135, 6893, 1901, 226, 4182, 3484, 2231, 4191, 6344, 2099, 7415, 6893, 2528, 1265, 3546, 4137, 380, 2770, 5560, 1548, 3020, 2362, 7474, 4062, 6921, 6268, 3225, 6693, 5047, 5805, 1613, 2081, 83, 45, 8, 7406, 1134, 4769, 1702, 2813, 70, 5576, 989, 1730, 184, 369, 4387, 3690, 2617, 500, 2978, 5902, 5478, 2797, 2825, 1209, 315, 5033, 1580, 20, 45, 7406, 6560, 6842, 4518, 804, 6288, 6041, 6490, 7677, 3894, 1227, 2489, 936, 1613, 3415, 2214, 716, 1580, 20, 83, 45, 7005, 964, 4074, 184, 2662, 2731, 670, 6433, 4767, 1488, 2296, 4411, 5759, 1265, 3455, 826, 8142, 7014, 6893, 4084, 8158, 1369, 555, 1539, 2636, 1460, 2930, 2893, 1527, 4136, 4961, 3888, 6009, 2127, 1322, 2770, 2775, 2154, 3879, 4447, 579, 1715, 1875, 7229, 2075, 3996, 1940, 4259, 211, 1660, 7886, 224, 226, 1272, 3734, 1298, 4669, 2371, 2735, 779, 19, 4336, 6964, 5406, 1364, 4062, 3633, 1539, 2976, 186, 4277, 1170, 2286, 2797, 1516, 388, 937, 1425, 3323, 3300, 3894, 8193, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 
83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 45, 45, 248, 8193]
[8192, 45, 913, 2261, 448, 4244, 3570, 1451, 799, 2080, 5823, 3188, 37, 2919, 372, 8075, 1265, 4879, 7456, 6893, 1554, 6079, 1328, 3213, 6563, 22, 4036, 485, 2075, 323, 2926, 4989, 907, 2015, 2218, 1032, 4039, 6893, 7992, 6871, 489, 3085, 6456, 5961, 6084, 837, 849, 7815, 1539, 4718, 3415, 4627, 3447, 991, 3020, 7382, 7856, 6659, 3635, 6490, 2290, 3225, 3894, 5047, 5805, 152, 5090, 20, 45, 299, 897, 3419, 2524, 2465, 2954, 6680, 8136, 2099, 2978, 2439, 3214, 3387, 4257, 1277, 6, 184, 7251, 5446, 4090, 5, 7403, 3689, 1580, 7005, 212, 7474, 1692, 5116, 5216, 5874, 5226, 7677, 4754, 1227, 2324, 2354, 134, 432, 1580, 20, 45, 83, 45, 7005, 3005, 4074, 1212, 887, 1588, 3610, 1244, 564, 8051, 456, 3958, 2882, 944, 5960, 1325, 2114, 3099, 4275, 6893, 3250, 6784, 918, 1558, 3546, 37, 6407, 4120, 3385, 2546, 2724, 6599, 3947, 1728, 2247, 4475, 951, 3224, 1174, 16, 758, 4277, 5623, 4221, 3291, 1746, 1215, 2075, 364, 1729, 8100, 224, 2429, 1326, 3628, 134, 283, 7410, 1499, 5199, 3487, 5114, 99, 134, 372, 6106, 5860, 7079, 1364, 4837, 3633, 1820, 611, 3327, 6772, 6848, 645, 2797, 283, 303, 2098, 787, 630, 3300, 3905, 8193, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 
83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 45, 45, 248, 8193]
[8192, 20, 11, 7571, 2416, 2524, 2437, 6011, 616, 1451, 4889, 560, 2156, 456, 826, 4967, 2369, 2228, 4080, 3606, 1067, 5191, 4098, 5056, 438, 1363, 442, 2490, 1624, 6672, 1508, 837, 7921, 3761, 7229, 7394, 37, 964, 523, 1539, 2495, 5375, 8072, 942, 6084, 849, 824, 7136, 4264, 2724, 329, 4928, 6284, 350, 5046, 4120, 7020, 6046, 3741, 7607, 4099, 3894, 3905, 297, 5311, 11, 1580, 20, 83, 45, 7406, 282, 3646, 360, 1105, 1813, 7258, 2932, 492, 107, 7464, 4264, 6661, 1181, 3302, 1730, 5902, 4275, 1640, 3616, 1227, 913, 6996, 3848, 4685, 3412, 5961, 1692, 6142, 4940, 748, 6204, 5329, 6490, 6131, 2290, 4099, 3225, 4724, 913, 670, 1356, 1580, 20, 45, 83, 45, 7005, 1906, 4939, 295, 1601, 5606, 1982, 1940, 5336, 1244, 412, 2015, 2061, 1600, 826, 5293, 8022, 6893, 5276, 7965, 4025, 2150, 4992, 5617, 7282, 4120, 3385, 4391, 1753, 4629, 4518, 2247, 2282, 3635, 1670, 3522, 3670, 4374, 6945, 2797, 378, 4091, 1814, 6490, 8100, 22, 2168, 630, 2478, 119, 913, 6308, 1499, 4837, 6834, 2737, 361, 7403, 6570, 8082, 627, 7020, 3633, 2247, 647, 897, 4431, 6511, 7020, 2797, 3223, 388, 283, 5497, 2351, 4389, 297, 3725, 8193, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 
83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 45, 45, 248, 8193]
[8192, 8, 913, 2012, 1105, 2465, 6011, 416, 2662, 3613, 4335, 4726, 754, 1488, 372, 7642, 2020, 2853, 4626, 611, 1554, 2279, 1326, 8184, 456, 5288, 3322, 2490, 460, 6534, 2793, 1349, 7039, 2681, 6536, 6111, 5952, 7261, 1763, 3607, 5954, 6084, 1995, 1539, 6893, 6523, 2362, 1355, 826, 7910, 5859, 458, 1460, 6834, 7224, 5085, 4277, 1907, 7020, 2797, 5502, 4389, 6693, 5311, 7005, 45, 83, 45, 299, 2564, 5837, 634, 4432, 5078, 6887, 1236, 3780, 1442, 2704, 5444, 3455, 1833, 4962, 6637, 2672, 6652, 6852, 7356, 7322, 2801, 5952, 1580, 20, 7005, 1059, 5085, 1692, 5625, 7596, 7224, 6490, 6842, 5864, 4099, 5695, 4099, 2112, 134, 2214, 20, 45, 83, 45, 7005, 824, 1833, 1442, 1541, 2219, 3610, 2682, 4767, 6008, 456, 5058, 2301, 944, 5418, 2117, 4928, 6022, 837, 2484, 6233, 2662, 3309, 4264, 1290, 1988, 2131, 555, 2976, 3921, 2138, 3968, 5331, 4810, 558, 1167, 2190, 364, 7576, 2895, 984, 3283, 1174, 2530, 2075, 1601, 7578, 265, 6458, 4547, 670, 1953, 647, 7010, 1351, 4062, 2957, 3828, 913, 19, 2460, 6596, 7079, 3895, 4837, 3633, 4039, 861, 23, 4277, 6511, 4526, 2797, 3123, 1126, 937, 787, 900, 7020, 3300, 4169, 8193, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 
83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 83, 45, 45, 248, 8193]

and the autoregressive latent tensor should match the values it currently produces (the first and last 3 values are printed out here)

NAME:
cur
TYPE
0
SHAPE:
1024
500
4
1
DATA:
0.703938
0.134067
0.577487
-0.891981
-0.0644716
0.370529

So this test will require saving the 1024 * 500 * 4 tensor to disk and uploading it to the repo.
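
The size of that reference file, and the kind of comparison the test would make, can be sketched as follows. The shape comes from the SHAPE lines above; `first_mismatch` and the `1e-4` tolerance are assumptions for illustration, since the issue does not say whether the comparison should be exact or approximate.

```python
import math

SHAPE = (1024, 500, 4)        # from the SHAPE lines above (trailing 1 dropped)
ELEMENTS = math.prod(SHAPE)   # number of float32 values in the tensor
BYTES = ELEMENTS * 4          # size of the reference file on disk


def first_mismatch(actual, expected, tolerance=1e-4):
    """Return the index of the first element differing by more than tolerance, or None."""
    if len(actual) != len(expected):
        raise ValueError("length mismatch")
    for index, (a, e) in enumerate(zip(actual, expected)):
        if abs(a - e) > tolerance:
            return index
    return None


# First three values printed in the issue, compared against a perturbed copy.
reference = [0.703938, 0.134067, 0.577487]
print(ELEMENTS, BYTES)
print(first_mismatch(reference, [0.703938, 0.134067, 0.577487]))
print(first_mismatch(reference, [0.703938, 0.134500, 0.577487]))
```

The reference file would hold 2,048,000 values, or 8,192,000 bytes. A tolerance-based check is a design choice worth discussing on the issue: floating-point results often differ slightly between backends, so an exact comparison may only be reliable on a single backend.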

Feel free to ask any questions on this issue or in the discord!

Profile the gpt-2 module and the tortoise-tts gpt-2 module and try to improve the gpt-2 module's performance

This task is blocked by the gpt-2 forward pass test being added since this could introduce regressions. #5

The task is as follows:

Measure the runtime of the autoregressive model in tortoise.cpp, from inputs to the full sequence of tokens and last-layer latents (as checked by the test in #5), and compare it with the time tortoise-tts takes to produce the corresponding batch of 4 token sequences and final-layer latents.

Then, try improving the efficiency of the tortoise.cpp forward pass. Some suggestions are as follows:

try removing seemingly redundant ops
change ops to in place where possible
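
For the measurement step, a small timing harness keeps before/after numbers comparable. This is a generic sketch, not tied to either code base: `time_call` is a hypothetical helper, and the workload shown is a stand-in for whatever callable launches the model under test.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Run fn() repeats times and return (median_seconds, all_timings)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings), timings


# Stand-in workload; replace with a callable that runs the autoregressive model.
median, timings = time_call(lambda: sum(range(100_000)), repeats=3)
print(f"median {median:.6f}s over {len(timings)} runs")
```

Reporting the median rather than a single run reduces the influence of one-off effects such as model loading or cache warm-up.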

Feel free to ask questions here or in the discord.

Make the tokenizer match the tortoise-tts Tokenizer exactly

If people are interested in contributing to tortoise.cpp, a great first task would be getting the tokenizer to always match the tokenization tortoise-tts uses. The tokenizer in tortoise.cpp can load the tokenizer vocab, but its regex had issues with some special characters, which I patched at least for spaces. More perplexingly, the tortoise-tts tokenizer is not greedy (it does not always choose the longest possible next token), while the default tokenizer I copied from ggml gpt-2 seems to be greedy. The task is to study the tokenizer tortoise-tts uses and modify the tokenizer in tortoise.cpp to match its behavior exactly. Please reply here to claim this task, and feel free to ask any questions here if you need help getting set up with development.

Part of solving this would be coming up with a number of test phrases to make sure the tortoise.cpp tokenizer finds matching tokenizations for all of them.
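
One way the two behaviors can diverge is shown below with a toy vocabulary. This is a sketch under a stated assumption: it supposes the tortoise-tts tokenizer applies BPE merges in rank order, which this issue does not confirm. The vocabulary, merge list, and both functions are invented for illustration.

```python
def greedy_tokenize(text, vocab):
    """Always take the longest vocabulary entry that matches at the current position."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token matches at position {i}")
    return tokens


def bpe_tokenize(text, merges):
    """Simplified BPE: repeatedly merge the adjacent pair with the lowest merge rank."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(text)
    while len(symbols) > 1:
        candidates = [(ranks[(a, b)], i)
                      for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
                      if (a, b) in ranks]
        if not candidates:
            break
        _, i = min(candidates)
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Toy data (hypothetical): "bc" is learned before "ab".
vocab = {"a", "b", "c", "ab", "bc"}
merges = [("b", "c"), ("a", "b")]
print(greedy_tokenize("abc", vocab))  # ['ab', 'c']
print(bpe_tokenize("abc", merges))    # ['a', 'bc']
```

Pairs like this, where the two strategies disagree, would make good entries for the list of test phrases.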

Optimize GPT2 inference: Remove redundant `autoregressive_latent_graph` and enable streaming output

Thank you for this excellent implementation. I'd like to suggest an optimization that could significantly speed up inference and enable streaming output.

Currently, there are two GPT2 graphs:

autoregressive: Generates speech codes (originally for CLVP to select the best result)
autoregressive_latent_graph: Generates latents based on the best result

Since CLVP has been removed, we can streamline this to a single GPT2 graph that directly generates latents. I've implemented this with minimal changes:

In autoregressive_graph, add after cur = ggml_add(ctx0, cur, model.language_model_head_layer_norm_bias);:

// Output latents
ggml_tensor *final_output_2 = ggml_cont(
    ctx0, ggml_view_4d(ctx0, cur, 1024, 1, batch_size, 1,
                       cur->nb[1], cur->nb[2], cur->nb[3],
                       (test_dimension - 1) * sizeof(float) * 1024));

ggml_set_name(final_output_2, "output_latents");
ggml_set_output(final_output_2);
ggml_build_forward_expand(gf, final_output_2);

In the main inference loop, extract the latent:

extract_tensor_to_vector(ggml_graph_get_tensor(gf, "output_latents"), latent);

Benefits:

Faster inference by eliminating redundant GPT2 runs
Enables potential streaming output of latents
Simplifies code structure

This optimization could significantly benefit users looking to speed up inference or implement streaming latent generation.
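
The byte offset in the ggml_view_4d call above selects the last sequence position of each batch entry. The arithmetic can be sketched in plain Python, assuming that `test_dimension` is the number of sequence positions in `cur` and that the tensor is contiguous with the 1024-wide axis varying fastest, then position, then batch. Both functions are hypothetical illustrations, not part of tortoise.cpp.

```python
def last_position_offset_bytes(sequence_length, width=1024, float_size=4):
    """Byte offset of the final sequence position, mirroring the view's offset argument."""
    return (sequence_length - 1) * float_size * width


def last_position_latents(flat, sequence_length, batch_size, width=1024):
    """Slice the last position's width-sized vector for each batch entry.

    Assumes a contiguous layout: width varies fastest, then position, then batch.
    """
    per_batch = width * sequence_length
    start = (sequence_length - 1) * width
    return [flat[b * per_batch + start: b * per_batch + start + width]
            for b in range(batch_size)]


# Tiny example: width 2, 3 positions, batch of 2.
flat = list(range(12))
print(last_position_offset_bytes(3, width=2))
print(last_position_latents(flat, sequence_length=3, batch_size=2, width=2))
```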

Thank you very much for this great app. Could you compile and release binaries for Windows, please?

...

cloning fails

First of all, thank you for developing this project. I wanted to play with it but cloning fails:

      git clone --recurse-submodules https://github.com/balisujohn/tortoise.cpp
      Cloning into 'tortoise.cpp'...
      remote: Enumerating objects: 3615, done.
      remote: Counting objects: 100% (472/472), done.
      remote: Compressing objects: 100% (165/165), done.
      remote: Total 3615 (delta 313), reused 449 (delta 298), pack-reused 3143
      Receiving objects: 100% (3615/3615), 42.77 MiB | 3.71 MiB/s, done.
      Resolving deltas: 100% (2452/2452), done.
      Submodule 'ggml' (git@github.com:ggerganov/ggml.git) registered for path 'ggml'
      Cloning into '/home/***/tortoise.cpp/ggml'...
      The authenticity of host 'github.com (140.82.121.3)' can't be established.
      ED25519 key fingerprint is SHA256:+DiY3wvvV6TuJJhbpZisF/zLDA0zPMSvHdkr4UvCOqU.
      This key is not known by any other names
      Are you sure you want to continue connecting (yes/no/[fingerprint])?
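
The host-key prompt suggests that the ggml submodule is being fetched over SSH, which stalls on machines without GitHub SSH access set up. The sketch below shows how an scp-style URL maps to its HTTPS form. `ssh_to_https` is a hypothetical helper, and the example URL is the presumed form of the submodule address.

```python
import re

# Matches scp-style (git@host:path) and ssh://git@host/path URLs.
SSH_URL = re.compile(r"^(?:ssh://)?git@([^:/]+)[:/](.+)$")


def ssh_to_https(url):
    """Rewrite an SSH git URL to HTTPS; URLs in any other form are returned unchanged."""
    match = SSH_URL.match(url)
    if match is None:
        return url
    host, path = match.groups()
    return f"https://{host}/{path}"


print(ssh_to_https("git@github.com:ggerganov/ggml.git"))
```

Git can apply the same rewrite itself through its `url.<base>.insteadOf` configuration, which avoids editing .gitmodules.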

Implement the CLVP module

CLVP is a non-essential component of tortoise-tts that filters for latents that will yield good generation quality when given to the diffusion model.

This task is to export CLVP from tortoise-tts to a ggml format, then load it in tortoise.cpp and reconstruct the forward pass.

This will be relatively involved; ask in the Discord if you want to attempt this and I'll go into more detail.

Wrong models files at huggingface

With the latest code + latest models from huggingface

(@venv) dur-randir@orthanc:models/ (master ✗…) $ sha256sum ggml-diffusion-model.bin
1506f93fb6b37b424cfb34d0e9047101ec0dd99c6681d434de13de7617b7070e  ggml-diffusion-model.bin
(@venv) dur-randir@orthanc:models/ (master ✗…) $ sha256sum ggml-model.bin
6d9650f2792b6f1467a24e77c28f846120a183dadc4ca3203bdb4c0a13eda7e3  ggml-model.bin
(@venv) dur-randir@orthanc:models/ (master ✗…) $ sha256sum ggml-vocoder-model.bin
2ac177cbb5f40a9659332b09d90ec6208a0cbe46818e3f9768ebaf239c8e2984  ggml-vocoder-model.bin
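
For anyone comparing their downloads against the checksums above on a system without sha256sum, the same digests can be computed with the Python standard library. A minimal sketch: `sha256_of_file` is a hypothetical helper, and the files are assumed to sit in the current directory.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large model files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


for name in ("ggml-diffusion-model.bin", "ggml-model.bin", "ggml-vocoder-model.bin"):
    if os.path.exists(name):
        print(sha256_of_file(name), name)
```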

I get the following when trying to load model

./tortoise --message "based... dr freeman?" --voice "../models/mouse.bin" --seed 0 --output "based?.wav"
gpt_vocab_init: loading vocab from '../models/tokenizer.json'
gpt_vocab_init: vocab size = 255
autoregressive_model_load: loading model from '../models/ggml-model.bin'
autoregressive_model_load: ggml tensor size    = 368 bytes
autoregressive_model_load: backend buffer size = 1889.29 MB
autoregressive_model_load: using CPU backend
autoregressive_model_load: tensor 'inference_model.transformer.h.0.ln_1.weight' has wrong shape in model file: got [1024, 1], expected [1024, 0]
autoregressive: failed to load model from '../models/ggml-model.bin'

That's a turtle, not a tortoise

Everybody realises that's a picture of a turtle? right? :)

Mac OS: Bringing metal support

Work is underway to get metal working for this project. Keen to get help from anyone interested.

TODO:

add conv_transpose_1d for metal (already has cpu implementation)
modify the ggml_pad backend and add ggml_pad_ext for metal
add pad_reflect_1d to metal (already has cpu implementation)
add unfold_1d to metal (already has cpu implementation)
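
When porting an op, a tiny reference implementation is handy for checking kernel output. The sketch below shows reflect padding in one dimension, assuming pad_reflect_1d follows the usual "reflect" convention in which the edge element itself is not repeated. That matches PyTorch's reflect mode, but it is not confirmed here for the ggml op.

```python
def pad_reflect_1d(values, left, right):
    """Reflect-pad a 1D sequence without repeating the edge elements."""
    n = len(values)
    if left >= n or right >= n:
        raise ValueError("padding must be smaller than the input length")
    head = [values[i] for i in range(left, 0, -1)]    # mirror around index 0
    tail = [values[n - 2 - i] for i in range(right)]  # mirror around index n - 1
    return head + list(values) + tail


print(pad_reflect_1d([1, 2, 3, 4], 2, 2))  # [3, 2, 1, 2, 3, 4, 3, 2]
```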

The existing PRs from @balisujohn are a decent example to follow, since they cover the changes already made for CUDA:
https://github.com/ggerganov/ggml/pulls/balisujohn

Converting .pth files

Hello,
I found a fine-tuned model to handle French : https://huggingface.co/Snowad/French-Tortoise , but the files are in Torch format. Is there any way to convert it to ggml's format ?

Weights

Note that the pre loaded ggml-model.bin saved in this repository does not have all the tensors currently loaded by the program, since the version with a lot of tensors is too big to put in version control. Feel free to make an issue asking for the conversion script if you want me to hurry up in posting it.

Hi,
Thank you so much for working on this! I was wondering if you might be able to post the conversion scripts, and also to post the full weights on Hugging Face.
Thank you!

Add HTTP server into build

Given the use case of this project, a server would be very useful. Projects like whisper.cpp should provide a decent example of how we might add this.

Ideal features

Streaming
FFMPEG conversion (with flag)

Segfault with "IOT instruction (core dumped)"

The following invocation

./tortoise --message 'So these models are trained 5.4 billion annotations across 126 million images. The number of images is a lot less, but maybe there are more (or better) annotation across those images.

It might also be the case that with Flamingo there was only an alt-text for the whole image, while the FLD-5B dataset used in Florence-2 has multiple annotations per image (segment).

But look at the table on the HuggingFace page. These models are beating most of the multi-billion models on most of the benchmarks.' --voice "../models/mouse.bin" --seed 0 --output "based?.wav"

results in

[2]    342702 IOT instruction (core dumped)  ./tortoise --message  --voice "../models/mouse.bin" --seed 0 --output

EXC_BAD_ACCESS on run on macOS CPU only

Seems to crash on run:

43008 diffusion_model_load: loading model from '../models/ggml-diffusion-model.bin'
diffusion_model_load: ggml tensor size = 368 bytes
diffusion_model_load: backend buffer size = 689.28 MB
diffusion_model_load: using CPU backend

Process 61390 stopped

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x26feb72f8)
    frame #0: 0x00000001843f3b80 libc++.1.dylib`std::__1::ios_base::clear(unsigned int) + 28
libc++.1.dylib`std::__1::ios_base::clear:
->  0x1843f3b80 <+28>: str    w8, [x0, #0x20]
    0x1843f3b84 <+32>: ldr    w9, [x0, #0x24]
    0x1843f3b88 <+36>: tst    w9, w8
    0x1843f3b8c <+40>: b.ne   0x1843f3b98               ; <+52>
Target 0: (tortoise) stopped.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x26feb72f8)
  * frame #0: 0x00000001843f3b80 libc++.1.dylib`std::__1::ios_base::clear(unsigned int) + 28
    frame #1: 0x00000001843f7a90 libc++.1.dylib`std::__1::basic_istream<char, std::__1::char_traits<char>>::read(char*, long) + 204
    frame #2: 0x000000010000e774 tortoise`diffusion_model_load(fname="../models/ggml-diffusion-model.bin", model=0x000000016fdfe9b8) at main.cpp:1564:13
    frame #3: 0x0000000100021e08 tortoise`diffusion(trimmed_latents=size=43008) at main.cpp:5631:10
    frame #4: 0x0000000100026650 tortoise`main(argc=1, argv=0x000000016fdff370) at main.cpp:6581:28
    frame #5: 0x00000001841320e0 dyld`start + 2360

build error

/home/dumball/code/tortoise.cpp/main.cpp:1574:44: warning: ‘double ggml_type_sizef(ggml_type)’ is deprecated: use ggml_row_size() instead [-Wdeprecated-declarations]
 1574 |         buffer_size += 32 * ggml_type_sizef(GGML_TYPE_F32); // convt_pre.1.bias
      |                             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:741:21: note: declared here
  741 |     GGML_API double ggml_type_sizef(enum ggml_type type), // ggml_type_size()/ggml_blck_size() as float
      |                     ^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:202:41: note: in definition of macro ‘GGML_DEPRECATED’
  202 | #    define GGML_DEPRECATED(func, hint) func __attribute__((deprecated(hint)))
      |                                         ^~~~
/home/dumball/code/tortoise.cpp/main.cpp:1575:53: warning: ‘double ggml_type_sizef(ggml_type)’ is deprecated: use ggml_row_size() instead [-Wdeprecated-declarations]
 1575 |         buffer_size += 32 * 32 * 3 * ggml_type_sizef(GGML_TYPE_F32); // conv_blocks.0.1.weight
      |                                      ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:741:21: note: declared here
  741 |     GGML_API double ggml_type_sizef(enum ggml_type type), // ggml_type_size()/ggml_blck_size() as float
      |                     ^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:202:41: note: in definition of macro ‘GGML_DEPRECATED’
  202 | #    define GGML_DEPRECATED(func, hint) func __attribute__((deprecated(hint)))
      |                                         ^~~~
/home/dumball/code/tortoise.cpp/main.cpp:1576:44: warning: ‘double ggml_type_sizef(ggml_type)’ is deprecated: use ggml_row_size() instead [-Wdeprecated-declarations]
 1576 |         buffer_size += 32 * ggml_type_sizef(GGML_TYPE_F32); // conv_blocks.0.1.bias
      |                             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:741:21: note: declared here
  741 |     GGML_API double ggml_type_sizef(enum ggml_type type), // ggml_type_size()/ggml_blck_size() as float
      |                     ^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:202:41: note: in definition of macro ‘GGML_DEPRECATED’
  202 | #    define GGML_DEPRECATED(func, hint) func __attribute__((deprecated(hint)))
      |                                         ^~~~
/home/dumball/code/tortoise.cpp/main.cpp:1577:53: warning: ‘double ggml_type_sizef(ggml_type)’ is deprecated: use ggml_row_size() instead [-Wdeprecated-declarations]
 1577 |         buffer_size += 32 * 32 * 3 * ggml_type_sizef(GGML_TYPE_F32); // conv_blocks.1.1.weight
      |                                      ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:741:21: note: declared here
  741 |     GGML_API double ggml_type_sizef(enum ggml_type type), // ggml_type_size()/ggml_blck_size() as float
      |                     ^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:202:41: note: in definition of macro ‘GGML_DEPRECATED’
  202 | #    define GGML_DEPRECATED(func, hint) func __attribute__((deprecated(hint)))
      |                                         ^~~~
/home/dumball/code/tortoise.cpp/main.cpp:1578:44: warning: ‘double ggml_type_sizef(ggml_type)’ is deprecated: use ggml_row_size() instead [-Wdeprecated-declarations]
 1578 |         buffer_size += 32 * ggml_type_sizef(GGML_TYPE_F32); // conv_blocks.1.1.bias
      |                             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:741:21: note: declared here
  741 |     GGML_API double ggml_type_sizef(enum ggml_type type), // ggml_type_size()/ggml_blck_size() as float
      |                     ^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:202:41: note: in definition of macro ‘GGML_DEPRECATED’
  202 | #    define GGML_DEPRECATED(func, hint) func __attribute__((deprecated(hint)))
      |                                         ^~~~
/home/dumball/code/tortoise.cpp/main.cpp:1579:53: warning: ‘double ggml_type_sizef(ggml_type)’ is deprecated: use ggml_row_size() instead [-Wdeprecated-declarations]
 1579 |         buffer_size += 32 * 32 * 3 * ggml_type_sizef(GGML_TYPE_F32); // conv_blocks.2.1.weight
      |                                      ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:741:21: note: declared here
  741 |     GGML_API double ggml_type_sizef(enum ggml_type type), // ggml_type_size()/ggml_blck_size() as float
      |                     ^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:202:41: note: in definition of macro ‘GGML_DEPRECATED’
  202 | #    define GGML_DEPRECATED(func, hint) func __attribute__((deprecated(hint)))
      |                                         ^~~~
/home/dumball/code/tortoise.cpp/main.cpp:1580:44: warning: ‘double ggml_type_sizef(ggml_type)’ is deprecated: use ggml_row_size() instead [-Wdeprecated-declarations]
 1580 |         buffer_size += 32 * ggml_type_sizef(GGML_TYPE_F32); // conv_blocks.2.1.bias
      |                             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:741:21: note: declared here
  741 |     GGML_API double ggml_type_sizef(enum ggml_type type), // ggml_type_size()/ggml_blck_size() as float
      |                     ^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:202:41: note: in definition of macro ‘GGML_DEPRECATED’
  202 | #    define GGML_DEPRECATED(func, hint) func __attribute__((deprecated(hint)))
      |                                         ^~~~
/home/dumball/code/tortoise.cpp/main.cpp:1581:53: warning: ‘double ggml_type_sizef(ggml_type)’ is deprecated: use ggml_row_size() instead [-Wdeprecated-declarations]
 1581 |         buffer_size += 32 * 32 * 3 * ggml_type_sizef(GGML_TYPE_F32); // conv_blocks.3.1.weight
      |                                      ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:741:21: note: declared here
  741 |     GGML_API double ggml_type_sizef(enum ggml_type type), // ggml_type_size()/ggml_blck_size() as float
      |                     ^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:202:41: note: in definition of macro ‘GGML_DEPRECATED’
  202 | #    define GGML_DEPRECATED(func, hint) func __attribute__((deprecated(hint)))
      |                                         ^~~~
/home/dumball/code/tortoise.cpp/main.cpp:1582:44: warning: ‘double ggml_type_sizef(ggml_type)’ is deprecated: use ggml_row_size() instead [-Wdeprecated-declarations]
 1582 |         buffer_size += 32 * ggml_type_sizef(GGML_TYPE_F32); // conv_blocks.3.1.bias
      |                             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:741:21: note: declared here
  741 |     GGML_API double ggml_type_sizef(enum ggml_type type), // ggml_type_size()/ggml_blck_size() as float
      |                     ^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:202:41: note: in definition of macro ‘GGML_DEPRECATED’
  202 | #    define GGML_DEPRECATED(func, hint) func __attribute__((deprecated(hint)))
      |                                         ^~~~
/home/dumball/code/tortoise.cpp/main.cpp:1587:35: warning: ‘double ggml_type_sizef(ggml_type)’ is deprecated: use ggml_row_size() instead [-Wdeprecated-declarations]
 1587 |     buffer_size += ggml_type_sizef(GGML_TYPE_F32); // post convolution bias
      |                    ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:741:21: note: declared here
  741 |     GGML_API double ggml_type_sizef(enum ggml_type type), // ggml_type_size()/ggml_blck_size() as float
      |                     ^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:202:41: note: in definition of macro ‘GGML_DEPRECATED’
  202 | #    define GGML_DEPRECATED(func, hint) func __attribute__((deprecated(hint)))
      |                                         ^~~~
/home/dumball/code/tortoise.cpp/main.cpp:1588:44: warning: ‘double ggml_type_sizef(ggml_type)’ is deprecated: use ggml_row_size() instead [-Wdeprecated-declarations]
 1588 |     buffer_size += 32 * 7 * ggml_type_sizef(GGML_TYPE_F32); // post convolution weight
      |                             ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:741:21: note: declared here
  741 |     GGML_API double ggml_type_sizef(enum ggml_type type), // ggml_type_size()/ggml_blck_size() as float
      |                     ^~~~~~~~~~~~~~~
/home/dumball/code/tortoise.cpp/ggml/src/../include/ggml/ggml.h:202:41: note: in definition of macro ‘GGML_DEPRECATED’
  202 | #    define GGML_DEPRECATED(func, hint) func __attribute__((deprecated(hint)))
      |                                         ^~~~
/home/dumball/code/tortoise.cpp/main.cpp: In function ‘ggml_cgraph* vocoder_graph(vocoder_model&, std::vector<float>&, std::vector<float>&)’:
/home/dumball/code/tortoise.cpp/main.cpp:3820:25: error: ‘ggml_pad_reflect_1d’ was not declared in this scope
 3820 |     ggml_tensor * cur = ggml_pad_reflect_1d(ctx0, vocoder_noise, 3,3);
      |                         ^~~~~~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/tortoise.dir/build.make:76: CMakeFiles/tortoise.dir/main.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:115: CMakeFiles/tortoise.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

can u mention the version of ggml used here

why not vulkan

that way you dont waste time implementing cpu,cuda,amd,metal,intel

AMD GPUs

Hi,
I saw this project currently only supports Cuda. I was wondering if it might be possible to use HIPIFY to make it work on AMD GPUs. Do you know if this would be possible?
Thank you!