ovh / ai-training-examples Goto Github PK

View Code? Open in Web Editor NEW

129.0 129.0 68.0 160.82 MB

License: Apache License 2.0

Jupyter Notebook 99.53% Dockerfile 0.02% Python 0.43% Shell 0.01% CSS 0.01% HTML 0.02%

ai-training-examples's People

Contributors

Stargazers

Watchers

ai-training-examples's Issues

streamlit/speach-to-text silently failing

I am trying diarization of wav file (on my Macos machine)
My steps:
docker build . -t streamlit_app:latest
docker run --rm -it -p 8501:8501 --user=42420:42420 streamlit_app:latest
set token in UI
Provide wav file
select "differentiate speakers" option
Result:
silently fail after downloading models

SampleOrderTakingCustomerSupportPhilippines.wav.zip

error from matplotlib

when i try to use an example then i get an error from the packages matplotlib that says module 'matplotlib' has no attribute 'axes' and i also have a error that says module 'matplotlib' has no attribute 'pyplot'

can we pass video , it is possible or not ?

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\acer/.cache\\torch\\hub\\master.zip'

when I run app.py I'm getting this error .can you help me to fix this?

Example DockerFile

Hello,

Great resources here !

Could you share the DockerFile you used for this example please ?

Or maybe to generate this image ?

Thank you in advance !
Manu

llama2-fine-tuning/llama_2_finetuning.ipynb run into errors

I tried to run the notebook on my Local server with a 3060 GPU. It run into the following error

`---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Cell In[15], line 4
2 model_name = "meta-llama/Llama-2-7b-hf"
3 bnb_config = create_bnb_config()
----> 4 model, tokenizer = load_model(model_name, bnb_config)

ImportError: Using load_in_8bit=True requires Accelerate: pip install accelerate and the latest version of bitsandbytes pip install -i https://test.pypi.org/simple/ bitsandbytes or pip install bitsandbytes

Fine-Tuning LLaMA 2 Models Errors When Saving Merged Model

When saving models at the last cell:
model.save_pretrained(output_merged_dir, safe_serialization=True)
I got this error:

RuntimeError:
Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'model.layers.11.self_attn.q_proj.weight', 'model.layers.11.mlp.gate_proj.weight', 'model.layers.11.input_layernorm.weight', 'lm_head.weight'}].
A potential way to correctly save your model is to use save_model.
More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

How to deal with this problem?

AttributeError: 'NoneType' object has no attribute 'cquantize_blockwise_fp16_nf4'

While running
model, tokenizer = load_model(model_name, bnb_config)

I am getting the following error,

AttributeError Traceback (most recent call last)
Cell In[33], line 4
2 model_name = "meta-llama/Llama-2-7b-hf"
3 bnb_config = create_bnb_config()
----> 4 model, tokenizer = load_model(model_name, bnb_config)

Cell In[4], line 5, in load_model(model_name, bnb_config)
2 n_gpus = torch.cuda.device_count()
3 max_memory = f'{40960}MB'
----> 5 model = AutoModelForCausalLM.from_pretrained(
6 model_name,
7 quantization_config=bnb_config,
8 device_map="auto", # dispatch efficiently the model on the available ressources
9 max_memory = {i: max_memory for i in range(n_gpus)},
10 )
11 tokenizer = AutoTokenizer.from_pretrained(model_name, use_auth_token=True)
13 # Needed for LLaMA tokenizer

File /opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:566, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
564 elif type(config) in cls._model_mapping.keys():
565 model_class = _get_model_class(config, cls._model_mapping)
--> 566 return model_class.from_pretrained(
567 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
568 )
569 raise ValueError(
570 f"Unrecognized configuration class {config.class} for this kind of AutoModel: {cls.name}.\n"
571 f"Model type should be one of {', '.join(c.name for c in cls._model_mapping.keys())}."
572 )

File /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:3480, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
3471 if dtype_orig is not None:
3472 torch.set_default_dtype(dtype_orig)
3473 (
3474 model,
3475 missing_keys,
3476 unexpected_keys,
3477 mismatched_keys,
3478 offload_index,
3479 error_msgs,
-> 3480 ) = cls._load_pretrained_model(
3481 model,
3482 state_dict,
3483 loaded_state_dict_keys, # XXX: rename?
3484 resolved_archive_file,
3485 pretrained_model_name_or_path,
3486 ignore_mismatched_sizes=ignore_mismatched_sizes,
3487 sharded_metadata=sharded_metadata,
3488 _fast_init=_fast_init,
3489 low_cpu_mem_usage=low_cpu_mem_usage,
3490 device_map=device_map,
3491 offload_folder=offload_folder,
3492 offload_state_dict=offload_state_dict,
3493 dtype=torch_dtype,
3494 is_quantized=(getattr(model, "quantization_method", None) == QuantizationMethod.BITS_AND_BYTES),
3495 keep_in_fp32_modules=keep_in_fp32_modules,
3496 )
3498 model.is_loaded_in_4bit = load_in_4bit
3499 model.is_loaded_in_8bit = load_in_8bit

File /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:3870, in PreTrainedModel._load_pretrained_model(cls, model, state_dict, loaded_keys, resolved_archive_file, pretrained_model_name_or_path, ignore_mismatched_sizes, sharded_metadata, _fast_init, low_cpu_mem_usage, device_map, offload_folder, offload_state_dict, dtype, is_quantized, keep_in_fp32_modules)
3868 if low_cpu_mem_usage:
3869 if not is_fsdp_enabled() or is_fsdp_enabled_and_dist_rank_0():
-> 3870 new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
3871 model_to_load,
3872 state_dict,
3873 loaded_keys,
3874 start_prefix,
3875 expected_keys,
3876 device_map=device_map,
3877 offload_folder=offload_folder,
3878 offload_index=offload_index,
3879 state_dict_folder=state_dict_folder,
3880 state_dict_index=state_dict_index,
3881 dtype=dtype,
3882 is_quantized=is_quantized,
3883 is_safetensors=is_safetensors,
3884 keep_in_fp32_modules=keep_in_fp32_modules,
3885 )
3886 error_msgs += new_error_msgs
3887 else:

File /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:751, in _load_state_dict_into_meta_model(model, state_dict, loaded_state_dict_keys, start_prefix, expected_keys, device_map, offload_folder, offload_index, state_dict_folder, state_dict_index, dtype, is_quantized, is_safetensors, keep_in_fp32_modules)
748 fp16_statistics = None
750 if "SCB" not in param_name:
--> 751 set_module_quantized_tensor_to_device(
752 model, param_name, param_device, value=param, fp16_statistics=fp16_statistics
753 )
755 return error_msgs, offload_index, state_dict_index

File /opt/conda/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py:98, in set_module_quantized_tensor_to_device(module, tensor_name, device, value, fp16_statistics)
96 new_value = bnb.nn.Int8Params(new_value, requires_grad=False, **kwargs).to(device)
97 elif is_4bit:
---> 98 new_value = bnb.nn.Params4bit(new_value, requires_grad=False, **kwargs).to(device)
100 module._parameters[tensor_name] = new_value
101 if fp16_statistics is not None:

File /opt/conda/lib/python3.10/site-packages/bitsandbytes/nn/modules.py:179, in Params4bit.to(self, *args, **kwargs)
176 device, dtype, non_blocking, convert_to_format = torch._C._nn._parse_to(*args, **kwargs)
178 if (device is not None and device.type == "cuda" and self.data.device.type == "cpu"):
--> 179 return self.cuda(device)
180 else:
181 s = self.quant_state

File /opt/conda/lib/python3.10/site-packages/bitsandbytes/nn/modules.py:157, in Params4bit.cuda(self, device)
155 def cuda(self, device):
156 w = self.data.contiguous().half().cuda(device)
--> 157 w_4bit, quant_state = bnb.functional.quantize_4bit(w, blocksize=self.blocksize, compress_statistics=self.compress_statistics, quant_type=self.quant_type)
158 self.data = w_4bit
159 self.quant_state = quant_state

File /opt/conda/lib/python3.10/site-packages/bitsandbytes/functional.py:832, in quantize_4bit(A, absmax, out, blocksize, compress_statistics, quant_type)
830 lib.cquantize_blockwise_fp16_fp4(get_ptr(None), get_ptr(A), get_ptr(absmax), get_ptr(out), ct.c_int32(blocksize), ct.c_int(n))
831 else:
--> 832 lib.cquantize_blockwise_fp16_nf4(get_ptr(None), get_ptr(A), get_ptr(absmax), get_ptr(out), ct.c_int32(blocksize), ct.c_int(n))
833 elif A.dtype == torch.bfloat16:
834 if quant_type == 'fp4':

AttributeError: 'NoneType' object has no attribute 'cquantize_blockwise_fp16_nf4'

docker private registry not working in AI training job

USERNAME=hellosivi
REPOSITORY=library
IMAGE_NAME=clr
TAG=v1
IMAGE_URI=$USERNAME/$REPOSITORY/$IMAGE_NAME:$TAG
IMAGE_URI_NEW=registry-1.docker.io/hellosivi/library/$IMAGE_NAME:$TAG

ovhai job run $IMAGE_URI_NEW \
--timeout "10d" \
--gpu 1 \
--name "sample-training-docker" \
--flavor ai1-1-gpu

We added a private registry in dashboard and it created an url :: registry-1.docker.io/hellosivi/library
We tried many combinations of the path. No luck.

Do you support private registries?

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.

Jobs

Jooble