Hello, I'm wondering about the minimum GPU memory required for training. Could you pro

【GPU Memory】 about geochat HOT 4 CLOSED

mbzuai-oryx commented on September 14, 2024 2

【GPU Memory】

from geochat.

Comments (4)

KjAeRsTuIsK commented on September 14, 2024

Hi @Luo-Z13, thank you for your interest. We trained the model on 4 A100 40 GB GPUs. You can also train on one A100 80 GB, or on a single A100 40 GB by using the quantised models in 4 or 8 bit.
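To put the 4-bit and 8-bit suggestion in numbers, here is a rough sketch of the memory taken by the model weights alone at different precisions. It assumes a 7-billion-parameter language model (the launch script later in this thread uses a 7B base) and ignores activations, optimizer state, LoRA adapters and the vision tower, so real usage will be higher. The helper name `weight_gb` is made up for this illustration.

```shell
# Weight-only memory in GB (1e9 bytes) for a model with the given
# parameter count (in billions) stored at the given bit width.
# Assumption: this counts weights only, not activations or optimizer state.
weight_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f", p * b / 8 }'
}

# Compare 16-bit, 8-bit and 4-bit storage for an assumed 7B-parameter model.
for bits in 16 8 4; do
  echo "${bits}-bit weights: ~$(weight_gb 7 "$bits") GB"
done
```

Going from 16-bit to 4-bit cuts the weight footprint to a quarter, which is why a quantised model leaves more headroom on a single 40 GB card.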

vvuonghn commented on September 14, 2024

How long did your model training take?

KjAeRsTuIsK commented on September 14, 2024

Hi @vvuonghn, we fine-tuned the model for around 10 hours on the complete dataset, and then fine-tuned it further for 4-5 hours on the grounding part of the dataset. Please let me know if you have any further queries.

Amazingren commented on September 14, 2024

Hi @KjAeRsTuIsK,

Thanks for your nice work.

May I ask how to fine-tune the model on the grounding part of the dataset?

I already fine-tuned it with this:

################## VICUNA ##################
PROMPT_VERSION=v1
MODEL_VERSION="vicuna-v1.5-7b"
gpu_ids=0,1,2,3
################## VICUNA ##################

deepspeed --master_port=$((RANDOM + 10000)) --include localhost:$gpu_ids geochat/train/train_mem.py \
    --deepspeed ./scripts/zero2.json \
    --lora_enable True \
    --model_name_or_path /data/.../geochat/llava-v1.5-7b \
    --version $PROMPT_VERSION \
    --data_path /data/.../geochat/GeoChat_Instruct.json \
    --image_folder /data/.../geochat/final_images_llava  \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --pretrain_mm_mlp_adapter /data/.../geochat/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/mm_projector.bin \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --bf16 True \
    --output_dir /data/.../geochat/outckpts/geochat_reproduce \
    --num_train_epochs 1 \
    --per_device_train_batch_size 18 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "epoch" \
    --save_steps 7000 \
    --save_total_limit 1 \
    --learning_rate 2e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --dataloader_num_workers 16 \
    --report_to wandb

What should I do next to fine-tune it on the grounding part of the dataset?

I am not very familiar with fine-tuning LLaVA. Could you give me more detailed instructions when you have time?

Best
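As a side note for anyone adapting the script above to fewer GPUs: under data-parallel training the effective global batch size is the number of GPUs times the per-device batch size times the gradient accumulation steps. The sketch below only redoes that arithmetic with the values from the script; the variable names `num_gpus` and `effective_batch` are introduced here for illustration.

```shell
# Values copied from the launch script above.
gpu_ids=0,1,2,3
per_device_train_batch_size=18
gradient_accumulation_steps=2

# Count GPUs as the number of comma-separated fields in the id list.
num_gpus=$(printf '%s\n' "$gpu_ids" | awk -F',' '{ print NF }')

# Effective global batch size under data parallelism.
effective_batch=$(( num_gpus * per_device_train_batch_size * gradient_accumulation_steps ))
echo "GPUs: $num_gpus, effective batch size: $effective_batch"
```

With the settings above this comes to 144. On a single GPU, the same effective batch size could in principle be kept by raising the gradient accumulation steps to 8 at a per-device batch size of 18, memory permitting.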
