suno-ai / bark Goto Github PK

View Code? Open in Web Editor NEW

32.7K 32.7K 3.9K 2.58 MB

🔊 Text-Prompted Generative Audio Model

License: MIT License

Python 41.92% Jupyter Notebook 58.08%

bark's People

Contributors

Stargazers

Watchers

Forkers

yuan-manx whitefu ishine deyituo atuxhe entn-at paperwave zhangsanfeng86 kustomzone thanhpham1987 skyyap silyfox chenchy maxmax2016 amorjnyh goodopenrepo tramphero zth9730 xjia520 donstang vpegasus xymfei macroustc lym0302 techthiyanes wotulong 786440445 t0nych3n ethanyhzhang cellinlab johndpope shaun95 talipturkmen aboal3bas peterguoruc 00-00-00-11 toitek justinjohn0306 sorokinvld rw1s shevavm nn1985 blankpromptventures prizrak2033 alyxdow therockstardba aneurycasado kubes kokizzu vaionicle douglas-larocca shackleslayer zygi antimora mantovani huksley ssteo drmachiavel ml-lab ttulttul l-fifa-l gchenfly chrisvnicholson mutualmate touristshaun asagix miketout kbimplis darus90 c0debrain kac487 cat-stack-boop kelenam jonathanfly ascv saucam hyojunguy adamfils2 intellectzproductions robrita fictiverse scrapnode toslim84 prog-ape michaelhuffman meai guoyang94 monad-one h23120 xeerox666 dhrubasumatary hercules261188 klei22 jorik041 shuipi100 stracerxx deriksso vn-os xcytxs baptistzaire

bark's Issues

speeding up inference

stateful load models once and then generate again
use torch compile?

Apple Silicon support

Hey guys, thanks for releasing this as open-source!

Is there any plan to add Apple Silicon support and use MPS with PyTorch if available or is CUDA a "strict" requirement?

Hi, thank you for creating this amazing project. I wonder if it is possible to run it on AMD GPUs using the ROCm version of PyTorch, like Stable Diffusion does. I would really appreciate your answer. Thanks again

Can a new language be added?

Can I add my language? My lang is endangered languages

Tracking issue for Mac M1/M2 support

torch.device("mps") might work out of the box so I will try it...

Reason for sounding robotic

Hi,
I am interested in knowing why the voice output sounds so robotic.... is it because it only uses 24khz or what is causing this?

code-switched no accent

For code-switched text, is it possible for BARK to not employ the native accent for each respective language in same voice?

Finnish language support

This would be appreciated.

Inquiry on Maximum input character text prompt Length

I want to know about max text_prompt length supported by model
and best practice or method to divide the big text into chunks to trained on this model

Chinese prompt generate failed.

how to input chinese?

How to download all models at once at the beginning on colab?

There are many models downloaded during inference which is really annoying, can't we just have a download of all models at the beginning?

How can I generate sound effects?

The documentation mentions being able to generate simple sound effects, but I don't see any examples of how to do this. If I put in a prompt such as "sound effect of a door shutting", I just get the voice of someone saying that, which doesn't have quite the same effect.

CUDA out of memory, running on RTX 3050ti, how to fix?

Exception has occurred: OutOfMemoryError
CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 4.00 GiB total capacity; 3.46 GiB already allocated; 0 bytes free; 3.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
File "C:\Users\smast\OneDrive\Desktop\Code Projects\Johnny Five\audio test.py", line 8, in
audio_array = generate_audio(text_prompt)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 4.00 GiB total capacity; 3.46 GiB already allocated; 0 bytes free; 3.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How to train this model

This work showed us a new idea of TTS,can you tell us the method of training this model?

Multi-lingual voice cloning

Hi, Thanks for the great work :)
I'm interested in multilingual voice cloning and seems like there is valle-x.
https://vallex-demo.github.io/

Is this something Bark can handle (maybe in the future)?

How can I save it as a wav file

text_prompt = """
Hello, my name is Suno. And, uh — and I like pizza. [laughs]
But I also have other interests such as playing tic tac toe.
"""
audio_array = generate_audio(text_prompt)
Audio(audio_array, rate=SAMPLE_RATE)

Swedish language support!

Just wanted to give my two cents and kindly ask for Swedish language support. Much love.

Can I add assets prompts maded by myself?

If I want to add new assets prompts, how can I fine-tune them? Or import new prompts files generated from other libraries?

Arbitrarily long text

Is there a way to run on arbitrarily long text for example breaking up by max token (not splitting words)?

CUDA out of memory. Tried to allocate 508.00 MiB. GPU 0 has a total capacty of 6.00 GiB of which 0 bytes is free

No GPU being used

I have this message.
No GPU being used. Careful

But my is Geforce 1660 Super
What's wrong?

Driver Version: 472.12
CUDA Version: 11.4
Win10x64

install error

Looking in indexes: http://mirrors.gwm.cn/pypi/web/simple, https://pypi.tuna.tsinghua.edu.cn/simple, http://mirrors.aliyun.com/pypi/simple/, https://pypi.mirrors.ustc.edu.cn/simple/, http://pypi.hustunique.com/, http://pypi.sdutlinux.org, http://pypi.douban.com/simple/, https://mirror.baidu.com/pypi/simple
Collecting git+https://github.com/suno-ai/bark.git
  Cloning https://github.com/suno-ai/bark.git to /tmp/pip-req-build-84uue3vz
  Running command git clone -q https://github.com/suno-ai/bark.git /tmp/pip-req-build-84uue3vz
  Resolved https://github.com/suno-ai/bark.git to commit 905c38b8bba2377c1bddd8060b81aea6d8a1c6d6
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... error
  ERROR: Command errored out with exit status 1:
   command: /home/ybZhang/miniconda3/envs/bark/bin/python3.8 /tmp/pip-standalone-pip-1qc0awh2/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-d5go4i6j/normal --no-warn-script-location --no-binary :none: --only-binary :none: -i http://mirrors.gwm.cn/pypi/web/simple --extra-index-url https://pypi.tuna.tsinghua.edu.cn/simple --extra-index-url http://mirrors.aliyun.com/pypi/simple/ --extra-index-url https://pypi.mirrors.ustc.edu.cn/simple/ --extra-index-url http://pypi.hustunique.com/ --extra-index-url http://pypi.sdutlinux.org --extra-index-url http://pypi.douban.com/simple/ --extra-index-url https://mirror.baidu.com/pypi/simple --trusted-host mirrors.gwm.cn --trusted-host pypi.tuna.tsinghua.edu.cn --trusted-host mirrors.aliyun.com --trusted-host pypi.mirrors.ustc.edu.cn --trusted-host pypi.hustunique.com --trusted-host pypi.sdutlinux.org --trusted-host pypi.douban.com --trusted-host mirror.baidu.com -- wheel
       cwd: None
  Complete output (3 lines):
  Looking in indexes: http://mirrors.gwm.cn/pypi/web/simple, https://pypi.tuna.tsinghua.edu.cn/simple, http://mirrors.aliyun.com/pypi/simple/, https://pypi.mirrors.ustc.edu.cn/simple/, http://pypi.hustunique.com/, http://pypi.sdutlinux.org, http://pypi.douban.com/simple/, https://mirror.baidu.com/pypi/simple, https://pypi.tuna.tsinghua.edu.cn/simple, http://mirrors.aliyun.com/pypi/simple/, https://pypi.mirrors.ustc.edu.cn/simple/, http://pypi.hustunique.com/, http://pypi.sdutlinux.org, http://pypi.douban.com/simple/, https://mirror.baidu.com/pypi/simple
  ERROR: Could not install packages due to an OSError: ('Received response with content-encoding: br, but failed to decode it.', Error("Decompression error: b'CL_SPACE'"))

  ----------------------------------------
WARNING: Discarding git+https://github.com/suno-ai/bark.git. Command errored out with exit status 1: /home/ybZhang/miniconda3/envs/bark/bin/python3.8 /tmp/pip-standalone-pip-1qc0awh2/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-d5go4i6j/normal --no-warn-script-location --no-binary :none: --only-binary :none: -i http://mirrors.gwm.cn/pypi/web/simple --extra-index-url https://pypi.tuna.tsinghua.edu.cn/simple --extra-index-url http://mirrors.aliyun.com/pypi/simple/ --extra-index-url https://pypi.mirrors.ustc.edu.cn/simple/ --extra-index-url http://pypi.hustunique.com/ --extra-index-url http://pypi.sdutlinux.org --extra-index-url http://pypi.douban.com/simple/ --extra-index-url https://mirror.baidu.com/pypi/simple --trusted-host mirrors.gwm.cn --trusted-host pypi.tuna.tsinghua.edu.cn --trusted-host mirrors.aliyun.com --trusted-host pypi.mirrors.ustc.edu.cn --trusted-host pypi.hustunique.com --trusted-host pypi.sdutlinux.org --trusted-host pypi.douban.com --trusted-host mirror.baidu.com -- wheel Check the logs for full command output.
ERROR: Command errored out with exit status 1: /home/ybZhang/miniconda3/envs/bark/bin/python3.8 /tmp/pip-standalone-pip-1qc0awh2/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-d5go4i6j/normal --no-warn-script-location --no-binary :none: --only-binary :none: -i http://mirrors.gwm.cn/pypi/web/simple --extra-index-url https://pypi.tuna.tsinghua.edu.cn/simple --extra-index-url http://mirrors.aliyun.com/pypi/simple/ --extra-index-url https://pypi.mirrors.ustc.edu.cn/simple/ --extra-index-url http://pypi.hustunique.com/ --extra-index-url http://pypi.sdutlinux.org --extra-index-url http://pypi.douban.com/simple/ --extra-index-url https://mirror.baidu.com/pypi/simple --trusted-host mirrors.gwm.cn --trusted-host pypi.tuna.tsinghua.edu.cn --trusted-host mirrors.aliyun.com --trusted-host pypi.mirrors.ustc.edu.cn --trusted-host pypi.hustunique.com --trusted-host pypi.sdutlinux.org --trusted-host pypi.douban.com --trusted-host mirror.baidu.com -- wheel Check the logs for full command output.

Location of models is ambiguous

Personally I like to know where external files are stored on my system and even though I'm trying bark within a VENV it is not clear where the models are downloaded to.

It would be "nice" to have models stored inside a models/ folder within the root of the project, rather than some black hole location that is created from the S3 download.

I see there is an option to set ENV variables for the paths to the models, but that is not documented in your README, and one has to be dissecting your code to find their references.

AttributeError: module 'torch.cuda' has no attribute 'is_bf16_supported'

Traceback (most recent call last):
File "D:\5118\movielearning\testbark\test.py", line 1, in
from bark import SAMPLE_RATE, generate_audio
File "C:\ProgramData\Anaconda3\envs\movielearning\lib\site-packages\bark_init_.py", line 1, in
from .api import generate_audio, text_to_semantic, semantic_to_waveform
File "C:\ProgramData\Anaconda3\envs\movielearning\lib\site-packages\bark\api.py", line 5, in
from .generation import codec_decode, generate_coarse, generate_fine, generate_text_semantic
File "C:\ProgramData\Anaconda3\envs\movielearning\lib\site-packages\bark\generation.py", line 24, in
torch.cuda.is_bf16_supported()
AttributeError: module 'torch.cuda' has no attribute 'is_bf16_supported'

about the prompt

How can I generate the .npz file? Is it from soundStream?

How can I install bark in win10 correctly?

C:\Users\winner\Desktop>pip install git+https://github.com/suno-ai/bark.git
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting git+https://github.com/suno-ai/bark.git
Cloning https://github.com/suno-ai/bark.git to c:\users\winner\appdata\local\temp\pip-req-build-uky214xt
Running command git clone --filter=blob:none -q https://github.com/suno-ai/bark.git 'C:\Users\winner\AppData\Local\Temp\pip-req-build-uky214xt'
Resolved https://github.com/suno-ai/bark.git to commit 2a602ce
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: UNKNOWN
Building wheel for UNKNOWN (pyproject.toml) ... done
Created wheel for UNKNOWN: filename=UNKNOWN-0.0.0-py3-none-any.whl size=7318 sha256=b171442666007d18d81628603548eb93bc889aa3fbcfdc011c1b9c3e7feb3e83
Stored in directory: C:\Users\winner\AppData\Local\Temp\pip-ephem-wheel-cache-fmy74ei2\wheels\5d\50\6d\04e99a146c274ebc61149dfd86e7f046aa2772170a0bc978d3
Successfully built UNKNOWN
Installing collected packages: UNKNOWN
Successfully installed UNKNOWN-0.0.0

system requirements

Amazing work! Thank you for publishing your project.
I have Lenovo IdeaPad 3 15ALC6 Ryzen 5500u 8GB RAM with no external GPU. I tried to test the examples. Unfortunately, it's extremly slow.
After hours of struggling, I managed to finish the following code:

      from bark import SAMPLE_RATE, generate_audio
      from IPython.display import Audio
      
      text_prompt = """
           Hello.
      """
      audio_array = generate_audio(text_prompt)
      Audio(audio_array, rate=SAMPLE_RATE)

but it gave me no audio file.

Another point is that I like voice quality of Turkish speech model. I know there are legal issues but I need that Turkish speech model.

I hope you publish your speech model and instructions of how to build them.
All the best.

Output doesn't match demos

Hi I ran this locally

from bark import SAMPLE_RATE, generate_audio
from IPython.display import Audio

text_prompt = """
     ♪ In the jungle, the mighty jungle, the lion barks tonight ♪
"""
audio_array = generate_audio(text_prompt)
Audio(audio_array, rate=SAMPLE_RATE)

Play the file:
https://user-images.githubusercontent.com/7272343/233417746-dbf0ab65-49c7-477c-9373-1b4f87bdfb5e.mp4

Very odd sound any ideas?

How can we specify a smaller batch size for GPU with 8GB memory or less?

Hi Team,
Thanks for the great software. Is it possible to have batch size as a parameter?

I am trying to run the example with a NVIDIA GeForce GTX 1080.
It is a rather old GPU so it is not as powerful. When running the example code, it always fail with the following error:

---------------------------------------------------------------------------
OutOfMemoryError                          Traceback (most recent call last)
Cell In[8], line 8
      2 from IPython.display import Audio
      4 text_prompt = """
      5      Hello, my name is Suno. And, uh — and I like pizza. [laughs] 
      6      But I also have other interests such as playing tic tac toe.
      7 """
----> 8 audio_array = generate_audio(text_prompt)
      9 Audio(audio_array, rate=SAMPLE_RATE)

File ~\workspace\bark\bark\api.py:77, in generate_audio(text, history_prompt, text_temp, waveform_temp)
     60 def generate_audio(
     61     text: str,
     62     history_prompt: Optional[str] = None,
     63     text_temp: float = 0.7,
     64     waveform_temp: float = 0.7,
     65 ):
     66     """Generate audio array from input text.
     67 
     68     Args:
   (...)
     75         numpy audio array at sample frequency 24khz
     76     """
---> 77     x_semantic = text_to_semantic(text, history_prompt=history_prompt, temp=text_temp)
     78     audio_arr = semantic_to_waveform(x_semantic, history_prompt=history_prompt, temp=waveform_temp)
     79     return audio_arr

File ~\workspace\bark\bark\api.py:23, in text_to_semantic(text, history_prompt, temp)
      8 def text_to_semantic(
      9     text: str,
     10     history_prompt: Optional[str] = None,
     11     temp: float = 0.7,
     12 ):
     13     """Generate semantic array from text.
     14 
     15     Args:
   (...)
     21         numpy semantic array to be fed into `semantic_to_waveform`
     22     """
---> 23     x_semantic = generate_text_semantic(
     24         text,
     25         history_prompt=history_prompt,
     26         temp=temp,
     27     )
     28     return x_semantic

File ~\workspace\bark\bark\generation.py:404, in generate_text_semantic(text, history_prompt, temp, top_k, top_p, use_gpu, silent, min_eos_p, max_gen_duration_s, allow_early_stop, model)
    402 tot_generated_duration_s = 0
    403 for n in range(n_tot_steps):
--> 404     logits = model(x, merge_context=True)
    405     relevant_logits = logits[0, 0, :SEMANTIC_VOCAB_SIZE]
    406     if allow_early_stop:

File ~\workspace\bark\venv\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~\workspace\bark\bark\model.py:168, in GPT.forward(self, idx, merge_context)
    166 x = self.transformer.drop(tok_emb + pos_emb)
    167 for block in self.transformer.h:
--> 168     x = block(x)
    169 x = self.transformer.ln_f(x)
    171 # inference-time mini-optimization: only forward the lm_head on the very last position

File ~\workspace\bark\venv\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~\workspace\bark\bark\model.py:100, in Block.forward(self, x)
     98 def forward(self, x):
     99     x = x + self.attn(self.ln_1(x))
--> 100     x = x + self.mlp(self.ln_2(x))
    101     return x

File ~\workspace\bark\venv\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~\workspace\bark\bark\model.py:82, in MLP.forward(self, x)
     81 def forward(self, x):
---> 82     x = self.c_fc(x)
     83     x = self.gelu(x)
     84     x = self.c_proj(x)

File ~\workspace\bark\venv\lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~\workspace\bark\venv\lib\site-packages\torch\nn\modules\linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 7.33 GiB already allocated; 0 bytes free; 7.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Can not find bark anywhere after install

I installed using the single line command after opening cmd in E/bark:
pip install git+https://github.com/suno-ai/bark.git
and then the directory I used E/bark was empty. Where is it?

Download the models manually

Hi, the download of the models is slow and unstable from my location,

This download takes more than 10 hours, and it does not support "resume download". I have tried several times, but it still cannot be successfully executed

Can you please provide the publicly accessible URL for these models so I can download them using a download tool and manually place them in the CACHE folder?

Allowing to ignore GPU

I get the error torch.cuda.OutOfMemoryError: CUDA out of memory. So I'd like to run on CPU. But there isn't a setting for that, even though the readme talks about being able to run on both CPU and GPU. It would be great if there was a setting to ignore the GPU to be able to avoid any errors relating to an insufficient GPU.

If technically applicable: If running on CPU wouldn't utilize all logical CPU cores by default, there should also be a setting for the number of threads as in llama.cpp, so one can get CPU utilization up to 100% to maximize speed.

Need written documentation?

I've observed many members want documentations on Suno-ai.

Please comment your requirements below, I'll try to write to the best of my knowledge.

ModuleNotFoundError: No module named 'IPython'

The title says it all :-)

Questions about dataset and training processs

Hello, this project is amazing, I want to reproduce your research and improve on it, can you describe in detail the data set used etc.? Or can you provide the training code? Thanks

Installation error: Something went wrong

I am getting this same error on my local system as well as here in collab notebook.

Installation fails on Ubuntu 22.04 LTS

When trying to install with the instructions on the README, the project will not install

user@host:~/p/bark-test$ pip3 install git+https://github.com/suno-ai/bark.git
Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/suno-ai/bark.git
  Cloning https://github.com/suno-ai/bark.git to /tmp/pip-req-build-_xf6oh0i
  Running command git clone --filter=blob:none --quiet https://github.com/suno-ai/bark.git /tmp/pip-req-build-_xf6oh0i
  Resolved https://github.com/suno-ai/bark.git to commit 4b3462d5f5efc93bafa30bd82492c68a9bd161ac
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: UNKNOWN
  Building wheel for UNKNOWN (pyproject.toml) ... done
  Created wheel for UNKNOWN: filename=UNKNOWN-0.0.0-py3-none-any.whl size=7276 sha256=7bc0c157340f7c229f1253fc7eff09bf224401a9dd068ccdecd9bd56dce59a99
  Stored in directory: /tmp/pip-ephem-wheel-cache-xjhfjbns/wheels/e6/6d/c2/107ed849afe600f905bb4049a026df3c7c5aa75d86c2721ec7
Successfully built UNKNOWN
Installing collected packages: UNKNOWN
Successfully installed UNKNOWN-0.0.0

Am I missing something? This installs just fine on my MacOS laptop with the same command.

Add my/your voices to the dataset

I would like to enrich the dataset by adding my voice to the dataset, I would appreciate it if you provide information on how to participate in that.

Support for Portuguese (pt)

The examples provided are in pt-br Portuguese from Brazil and not in Portuguese from Portugal. I suggest replacing the Portuguese flag with a Brazilian one in the documents and adding pt-br instead of pt.

Would be much appreciated to have support for Portuguese from Portugal.

Training time GPU hours

Hi,
Can you provide some information about the training time that was required and the input data?
How many A100 hours would be required to train a model like this?

Generation inconsistencies

Hi! Congratulations on the awesome product!

I tried generating the same prompts as from the demo, and was met with a few odd results that differed from the previous generated results. Using the same colab notebook, I got mostly silence for the spanish text, and some harsh screeching interspersed throughout the other prompts as well.

Here is the link to the colab notebook with the generated sounds:
https://colab.research.google.com/drive/1iJtfgTCs3WgE0kfSQYEY1-XCy9G-TAt3#scrollTo=8KV3klnr-lvo

Can someone help me understand how to create inference?

Anyway, I installed bark in WSL ubuntu in a conda env, I don't get how I'm supposed to do inference.
These commands don't work

from bark import SAMPLE_RATE, generate_audio
from IPython.display import Audio

How to pass custom speaker prompt?

Hi,
As I see you have provided some speaker prompt for the model, I want to send my voice as prompt rather than given what should I do that to convert my voice to prompt.

Is there any documentation? :D

,

Language support for Tamil - World's widely spoken classic language

Kindly add the support for Tamil language as well.

There are so much of speech datasets available online from OpenSLR MILE, IIT Madras etc.

Thanks,
Vasanth

Some questions

@gkucsko Thanks for such an amazing work！
Could you please share some data examples (like 5 items) to show how you construct the dataset? I am quite curious how you manipulate with the [laughs], [humm] tokens or music descriptors. Thanks in advance.
would you mind writing a more specific technical report?
Also the audio generated by the notebook is not as good as the demo shows. Do you have a larger pretrained model?

Possible missing deps in installation instructions?

The Issue

I believe the installation instructions may not fully describe the dependencies required to install the library. My guess is that there could be some common dependencies that many Python developers use so frequently that they were unintentionally omitted from the installation instructions.

I attempted to run the example script in the README.md on both a Windows and Ubuntu machine. Unfortunately, it failed both times due to missing dependencies.

For some background, I don't usually use Python or Pip for my day-to-day development work. I installed both from scratch and followed the installation instructions word-for-word since I'm not typically a Python developer.

The example failed to load on both Windows and Ubuntu.

Steps to Reproduce

I'll run the repro steps in a Docker container because it makes it easier for others to reproduce the steps on their local machine. Although I don't plan to actually run Bark in Docker, it's useful for creating a reproduction of the error.

Provision a Machine to Test Things On

First, let's get an Ubuntu 22 machine running (I ran this in Fish, Bash users might need to adjust the script):

docker run --rm -it -v (pwd):/data ubuntu bash

Then we can verify the Ubuntu version:

root@9269e48a1db8:/# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"

Install Deps

Now we'll install Python, Pip, Git, and Bark. I'm also going to install nano to make it easier to copy and paste the example from the README into the container.

apt update
apt install python-is-python3 python3-pip git nano --yes

Install Bark

OPTION 1: PIP Installation

pip install git+https://github.com/suno-ai/bark.git

Yields:

Collecting git+https://github.com/suno-ai/bark.git
  Cloning https://github.com/suno-ai/bark.git to /tmp/pip-req-build-3d5bs4cx
  Running command git clone --filter=blob:none --quiet https://github.com/suno-ai/bark.git /tmp/pip-req-build-3d5bs4cx
  Resolved https://github.com/suno-ai/bark.git to commit 874af1bae9a74324b1fff5573963373c0016f0e0
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: UNKNOWN
  Building wheel for UNKNOWN (pyproject.toml) ... done
  Created wheel for UNKNOWN: filename=UNKNOWN-0.0.0-py3-none-any.whl size=7276 sha256=dfc2d55c1364d743af2968153c439788ee12364a281e9c354d5a9e84870d99e4
  Stored in directory: /tmp/pip-ephem-wheel-cache-hvvpy6mf/wheels/e6/6d/c2/107ed849afe600f905bb4049a026df3c7c5aa75d86c2721ec7
Successfully built UNKNOWN
Installing collected packages: UNKNOWN
Successfully installed UNKNOWN-0.0.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

OPTION 2: Git Clone Installation

git clone https://github.com/suno-ai/bark
cd bark && pip install .

Yields:

Cloning into 'bark'...
remote: Enumerating objects: 280, done.
remote: Counting objects: 100% (61/61), done.
remote: Compressing objects: 100% (42/42), done.
remote: Total 280 (delta 42), reused 28 (delta 19), pack-reused 219
Receiving objects: 100% (280/280), 1.34 MiB | 4.02 MiB/s, done.
Resolving deltas: 100% (70/70), done.
Processing /bark
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: UNKNOWN
  Building wheel for UNKNOWN (pyproject.toml) ... done
  Created wheel for UNKNOWN: filename=UNKNOWN-0.0.0-py3-none-any.whl size=7276 sha256=f8d1e0b5666bfda15fc921b10a2169365a43918a66adf1dbb8514119992c0855
  Stored in directory: /tmp/pip-ephem-wheel-cache-ntwrhv92/wheels/de/02/45/2e72ff30ce0400df4bc80201420b614232aa3ff723e67fc622
Successfully built UNKNOWN
Installing collected packages: UNKNOWN
Successfully installed UNKNOWN-0.0.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

Run Example

touch example.py
nano example.py # Paste the example here.
python example.py

The Error

I had different issues on Windows, but I unfortunately did not save the results (my Windows machine also had a fresh Pip/Python install).

Here is the error I got when building from Git cloned source:

root@42717c6cb118:/bark# python example.py 
Traceback (most recent call last):
  File "/bark/example.py", line 1, in <module>
    from bark import SAMPLE_RATE, generate_audio
  File "/bark/bark/__init__.py", line 1, in <module>
    from .api import generate_audio, text_to_semantic, semantic_to_waveform, save_as_prompt
  File "/bark/bark/api.py", line 3, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

And the error when installing via the pip install git+... method:

Traceback (most recent call last):
  File "//example.py", line 1, in <module>
    from bark import SAMPLE_RATE, generate_audio
ModuleNotFoundError: No module named 'bark'

Conclusion

It appears that there are some missing steps or dependencies in the installation instructions. Please let me know if there's any other information I can provide to help find a resolution to this issue.

How semantic tokens are generated for training?

Are you using a pretrained model? Where is the model?

the audio is successfully generated, but the sound sounds a little strange.

I am setting up a bark environment in wsl (ubuntu22) of window11, and the audio is successfully generated from the text, but the sound sounds a little strange.
Is there a setting I'm missing?

audio.zip