Comments (6)

lilux618 commented on May 13, 2024

I'm confused too. 3 possibilities you can check for:

  • do you have the full GPU allocated for the job, or is it split in 2 instances and you only use 1?
    Yes, I have the full GPU allocated for this job, and it is not split into 2 instances, as the screenshot I posted shows.
  • hardware issue like insufficient cooling (maybe) or bad power delivery (unlikely)
    I don't think cooling is the problem. I have tested this A100 with the HPL and HPCG benchmarks, and the results are consistent with published results.
  • check with older 470 driver, maybe the compiler on 510 was changed and gets confused (unlikely)
    I haven't tested this.

from fluidx3d.

ProjectPhysX commented on May 13, 2024

Strange... I just re-tested on our A100 40GB PCIe, and I get consistent results.
Our system uses Nvidia driver 470.103.01, ECC is enabled on the A100. I also checked on 2 other systems with A100 40GB SXM4 and results are the same.
FP16C is very heavy on compute, so you might get thermal throttling. Did you check for sufficient cooling?

image

lilux618 commented on May 13, 2024

image

image

I am very confused

ProjectPhysX commented on May 13, 2024

I'm confused too. 3 possibilities you can check for:

  • do you have the full GPU allocated for the job, or is it split in 2 instances and you only use 1?
  • hardware issue like insufficient cooling (maybe) or bad power delivery (unlikely)
  • check with older 470 driver, maybe the compiler on 510 was changed and gets confused (unlikely)

lilux618 commented on May 13, 2024

I have another question: even in your results list, the MLUPS with FP16C is lower than with FP16S. So in which cases should we choose FP16C instead of FP16S?

ProjectPhysX commented on May 13, 2024

FP16S is memory compression to the hardware-supported IEEE-754 FP16 format, with 1 bit for sign, 5 bits for exponent and 10 bits for mantissa. The conversion is done in hardware, so it only doubles FLOPs/Byte compared to FP32: it halves the transferred Bytes but does not need significantly more FLOPs for the conversion.

FP16C is a custom floating-point format with 1 bit for sign, 4 bits for exponent and 11 bits for mantissa. This halves the truncation error compared to FP16S, so it's more accurate, though the difference is only visible in edge-case scenarios. But the conversion is not supported in hardware and has to be emulated in software, which increases FLOPs/Byte by a factor of ~8 compared to FP32. Hardware with very fast memory and, at the same time, low compute power struggles with that.
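
To make the "halves the truncation error" point concrete, here is a minimal Python sketch that rounds a value to a given number of explicit mantissa bits and measures the worst relative error. It is not FluidX3D's actual conversion code: `round_mantissa` and `max_rel_error` are hypothetical helpers, and the exponent range (overflow, denormals, the 4-bit vs. 5-bit exponent) is deliberately ignored, so only the mantissa precision is modeled.

```python
import math

def round_mantissa(x: float, man_bits: int) -> float:
    """Round x to the nearest value with `man_bits` explicit mantissa bits.

    Assumption: the exponent range is unlimited (no overflow, no denormals),
    so this models only the precision of the format, not its range.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)            # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** (man_bits + 1)     # implicit leading bit + explicit bits
    # Python's round() rounds halves to even, like IEEE round-to-nearest-even.
    return math.ldexp(round(m * scale) / scale, e)

def max_rel_error(man_bits: int, samples: int = 10_000) -> float:
    """Largest relative rounding error over evenly spaced values in [1, 2)."""
    worst = 0.0
    for i in range(samples):
        x = 1.0 + i / samples
        worst = max(worst, abs(round_mantissa(x, man_bits) - x) / x)
    return worst

err_10 = max_rel_error(10)  # mantissa width of FP16S (1-5-10)
err_11 = max_rel_error(11)  # mantissa width of FP16C (1-4-11)
print(f"10 mantissa bits: max relative error {err_10:.2e} (bound 2**-11)")
print(f"11 mantissa bits: max relative error {err_11:.2e} (bound 2**-12)")
```

Each extra mantissa bit halves the spacing between neighbouring representable values, so the worst-case relative error drops from at most 2^-11 to at most 2^-12. The price FP16C pays is the smaller exponent range, which this sketch does not model.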

Bottom line, use

  • FP32 when accuracy is the main constraint
  • FP16C when both memory and accuracy are the main constraints
  • FP16S when both memory and compute time are the main constraints

For more details, see this paper.
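
The trade-off above can also be sketched with a simple roofline-style estimate: a cell update takes as long as the slower of its memory transfer and its arithmetic. This model is an illustration added here, not something taken from FluidX3D. The per-cell baseline (`bytes_fp32`, `flops_fp32`) and both `Device` configurations are hypothetical placeholders. Only the relative factors come from the explanation above: both FP16 formats halve the transferred bytes, and an ~8x increase in FLOPs/Byte at half the bytes means roughly 4x the FLOPs for FP16C.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Device:
    bandwidth_gbs: float    # memory bandwidth in GB/s (hypothetical value)
    compute_gflops: float   # usable compute in GFLOP/s (hypothetical value)

# (bytes factor, FLOPs factor) per cell update, relative to FP32.
# FP16C: FLOPs/Byte ~8x at half the bytes, i.e. roughly 4x the FLOPs.
FORMATS = {"FP32": (1.0, 1.0), "FP16S": (0.5, 1.0), "FP16C": (0.5, 4.0)}

def mlups(dev: Device, fmt: str,
          bytes_fp32: float = 152.0, flops_fp32: float = 300.0) -> float:
    """Estimated million lattice updates per second under a roofline model.

    bytes_fp32 and flops_fp32 are placeholder per-cell costs, not measurements.
    """
    byte_factor, flop_factor = FORMATS[fmt]
    t_mem = bytes_fp32 * byte_factor / (dev.bandwidth_gbs * 1e9)
    t_cmp = flops_fp32 * flop_factor / (dev.compute_gflops * 1e9)
    return 1e-6 / max(t_mem, t_cmp)   # the slower of the two limits throughput

compute_rich = Device(bandwidth_gbs=1000.0, compute_gflops=20000.0)
compute_poor = Device(bandwidth_gbs=1000.0, compute_gflops=8000.0)

for name, dev in (("compute-rich", compute_rich), ("compute-poor", compute_poor)):
    row = ", ".join(f"{fmt}: {mlups(dev, fmt):.0f}" for fmt in FORMATS)
    print(f"{name}: {row}")
```

In this sketch the compute-rich device stays memory-bound for all three formats, so FP16C matches FP16S. The compute-poor device becomes compute-bound with FP16C and falls behind FP16S, which is the "fast memory, low compute power" case described above. Real results also depend on factors this model leaves out, such as clocks, throttling and compiler output.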
