Hi, thanks for releasing these models! It's great to see more open source LLMs, especia


Model-level Parallelism about codegen (CLOSED)

salesforce commented on April 16, 2024


from codegen.

Comments (6)

rooa commented on April 16, 2024

@VHellendoorn @boblee22 I changed this in dec5101 for better readability; let me know if that clears up the confusion.


rooa commented on April 16, 2024

Hi Vincent, the 16B model did not fit on a GPU with enough memory because sampling was not running in half precision. An earlier change to the sampling code turned half precision off by default. We just pushed a small change (139825f) that (1) turns half precision on by default and (2) forces half precision for the 16B models. Could you pull and try sampling again?
On my end, the model occupies about 33GB during sampling, which fits on a single NVIDIA A100 with 40GB of memory.
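The memory numbers in this exchange follow from simple per-parameter arithmetic. A rough sketch, assuming a round 16 × 10^9 parameters (the exact parameter count of the 16B checkpoints may differ) and counting weights only, so activations, the KV cache, and framework overhead are ignored:

```python
# Rough weight-only memory estimate for a ~16B-parameter model.
# Assumption: exactly 16e9 parameters; real checkpoints differ slightly,
# and activations, KV cache, and CUDA context add to the real total.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Return the memory needed just to hold the weights, in decimal GB."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

N_PARAMS = 16e9
GPU_MEMORY_GB = 40  # a single 40GB A100, as mentioned in the thread

for dtype in ("fp32", "fp16"):
    gb = weight_memory_gb(N_PARAMS, dtype)
    fits = gb < GPU_MEMORY_GB
    print(f"{dtype}: {gb:.0f} GB of weights, fits on a 40 GB GPU: {fits}")
```

In fp32 the weights alone come to about 64 GB, well over a 40 GB card, while in fp16 they take about 32 GB, which is roughly in line with the ~33GB reported once runtime overhead is added.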


VHellendoorn commented on April 16, 2024

Thanks, that works! I was wondering why it needed more than 3 bytes per weight; that explains it :) The memory footprint now matches what you report. Thanks again for freely sharing these models.


boblee22 commented on April 16, 2024

Hi, could you please clarify what you mean by forcing half precision on 16B models?
If I understand correctly, when the model name starts with "codegen-16B", no_fp16 is set to True, so wouldn't fp16 be disabled?


rooa commented on April 16, 2024

@boblee22 As you can see, the --no_fp16 argument uses store_false, which sets the value to False when the flag is specified. From the script user's perspective this makes sense (you pass the "no" flag when you don't want fp16 on), hence the name of the variable. However, it causes exactly the confusion in your question: the variable's name means the opposite of how it is used.
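A minimal sketch of the store_false behaviour described here, using Python's standard `argparse`. The `--model` flag, its default, and the helper names are hypothetical stand-ins; only `--no_fp16` with store_false and the "codegen-16B" prefix check come from the thread, and this is not the repository's actual sampling script:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag standing in for however the script selects a model.
    parser.add_argument("--model", default="codegen-350M-mono")
    # store_false: the value defaults to True and becomes False when the flag is given.
    parser.add_argument("--no_fp16", action="store_false")
    return parser

def resolve_args(argv):
    args = build_parser().parse_args(argv)
    # Force half precision for 16B models. Because of store_false,
    # True here means "fp16 on", despite the variable's name.
    if args.model.startswith("codegen-16B"):
        args.no_fp16 = True
    return args

print(resolve_args([]).no_fp16)             # True  -> fp16 on
print(resolve_args(["--no_fp16"]).no_fp16)  # False -> fp16 off
print(resolve_args(["--model", "codegen-16B-mono", "--no_fp16"]).no_fp16)  # True -> forced on
```

So setting no_fp16 = True for 16B models turns fp16 on, which is why the name reads backwards.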

We could mitigate this by switching the flag to store_true and then declaring a new variable such as use_fp16 = (not args.no_fp16). In my opinion, though, that is just as confusing as the current code.
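The alternative in this comment would look roughly like the following sketch with `argparse`; `use_fp16_from` is a hypothetical helper name used only for illustration:

```python
import argparse

parser = argparse.ArgumentParser()
# store_true: the value defaults to False and becomes True when the flag is given.
parser.add_argument("--no_fp16", action="store_true")

def use_fp16_from(argv) -> bool:
    args = parser.parse_args(argv)
    # Invert once, then use the positively named variable everywhere else.
    use_fp16 = not args.no_fp16
    return use_fp16

print(use_fp16_from([]))             # True
print(use_fp16_from(["--no_fp16"]))  # False
```

The behaviour is identical to the store_false version; the inversion just moves from argparse into the script, which is the extra indirection the comment considers equally confusing.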


VHellendoorn commented on April 16, 2024

To add, I agree that the current naming is confusing. Although the argument is called "no fp16", its boolean is passed without inversion to "fp16" here, so it already acts as "use fp16". It might be worth renaming it to use_fp16 without changing anything else.
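One way to do that rename with `argparse` is its `dest` parameter, which keeps the command-line flag unchanged but stores the value under a different attribute name. A minimal sketch; `use_fp16` is the name suggested in this comment, not necessarily what the repository adopted:

```python
import argparse

parser = argparse.ArgumentParser()
# Keep the user-facing --no_fp16 flag and its store_false behaviour,
# but store the value under a name that matches what it means.
parser.add_argument("--no_fp16", dest="use_fp16", action="store_false")

print(parser.parse_args([]).use_fp16)             # True  (fp16 on by default)
print(parser.parse_args(["--no_fp16"]).use_fp16)  # False (user opted out)
```

Users still type `--no_fp16`, while code reading `args.use_fp16` gets a name that agrees with its value.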

