
Comments (3)

nathanielsimard commented on May 28, 2024

Hmm, I can't reproduce the problem on my nvidia card. Could you run the wgpu test suite on your GPU to see if an operation fails? You can simply run cargo test in the burn-wgpu directory.
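
For context, most of the suite's tensor assertions compare results element-wise against a small absolute tolerance and report the offending positions; that is where the "Tensors are not approx eq" messages in the next comment come from. A rough, hypothetical stand-in for that kind of check (not burn's actual helper):

    /// Hypothetical element-wise tolerance check, mimicking the failure
    /// output of the real test suite quoted below.
    fn assert_approx_eq(left: &[f32], right: &[f32], tolerance: f32) {
        let mut errors = Vec::new();
        for (i, (l, r)) in left.iter().zip(right).enumerate() {
            let diff = (l - r).abs();
            if diff > tolerance {
                errors.push(format!(
                    "=> Position {i}: {l} != {r} | difference {diff} > tolerance {tolerance}"
                ));
            }
        }
        assert!(
            errors.is_empty(),
            "Tensors are not approx eq:\n{}",
            errors.join("\n")
        );
    }

    fn main() {
        // Within a 1e-3 tolerance, so this passes.
        assert_approx_eq(&[1.0, 4.0], &[0.99999994, 3.9999998], 1e-3);
    }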


J-F-Liu commented on May 28, 2024

Yes, here is the result:

failures:

---- fusion::base::tests::maxmin::tests::test_mean_dim_2d stdout ----
thread 'fusion::base::tests::maxmin::tests::test_mean_dim_2d' panicked at burn-wgpu\src\fusion\base.rs:187:5:
assertion `left == right` failed
  left: Data { value: [1.0, 4.0], shape: Shape { dims: [2, 1] } }
 right: Data { value: [0.99999994, 3.9999998], shape: Shape { dims: [2, 1] } }

---- kernel::matmul::tiling2d::unpadded::tests::test_matmul_irregular_shape stdout ----
thread 'kernel::matmul::tiling2d::unpadded::tests::test_matmul_irregular_shape' panicked at burn-wgpu\src\kernel\matmul\utils.rs:65:33:
Tensors are not approx eq:
  => Position 22372: 4.402349472045898 != 4.3502044677734375 | difference 0.05214500427246094 > tolerance 0.0010000000000000002
  => Position 22373: -0.9585940837860107 != -1.0070807933807373 | difference 0.04848670959472656 > tolerance 0.0010000000000000002
  => Position 22374: -8.618410110473633 != -9.033252716064453 | difference 0.4148426055908203 > tolerance 0.0010000000000000002
  => Position 22375: 4.302424907684326 != 4.226462364196777 | difference 0.07596254348754883 > tolerance 0.0010000000000000002
  => Position 22376: 5.406569004058838 != 5.009387016296387 | difference 0.39718198776245117 > tolerance 0.0010000000000000002
11085 more errors...

---- kernel::prng::normal::tests::empirical_mean_close_to_expectation stdout ----
thread 'kernel::prng::normal::tests::empirical_mean_close_to_expectation' panicked at burn-wgpu\src\kernel\prng\normal.rs:93:24:
Tensors are not approx eq:
  => Position 0: 8.946138381958008 != 10 | difference 1.0538616180419922 > tolerance 0.1

---- kernel::reduce::reduction_shared_memory::tests::reduction_sum_dim_shared_memory_small stdout ----
thread 'kernel::reduce::reduction_shared_memory::tests::reduction_sum_dim_shared_memory_small' panicked at burn-wgpu\src\kernel\reduce\reduction_shared_memory.rs:136:29:
Tensors are not approx eq:
  => Position 0: 351.03289794921875 != 288.3531799316406 | difference 62.679718017578125 > tolerance 0.0010000000000000002

---- kernel::reduce::reduction_shared_memory::tests::reduction_sum_dim_shared_memory_large stdout ----
thread 'kernel::reduce::reduction_shared_memory::tests::reduction_sum_dim_shared_memory_large' panicked at burn-wgpu\src\kernel\reduce\reduction_shared_memory.rs:177:29:
Tensors are not approx eq:
  => Position 684: 22.973115921020508 != 17.27593421936035 | difference 5.697181701660156 > tolerance 0.0010000000000000002
  => Position 685: 25.75684928894043 != 17.05587387084961 | difference 8.70097541809082 > tolerance 0.0010000000000000002
  => Position 686: 24.88041114807129 != 21.817140579223633 | difference 3.0632705688476563 > tolerance 0.0010000000000000002
  => Position 687: 25.581012725830078 != 21.639711380004883 | difference 3.9413013458251953 > tolerance 0.0010000000000000002
  => Position 688: 24.266672134399414 != 23.075439453125 | difference 1.191232681274414 > tolerance 0.0010000000000000002
20 more errors...

---- kernel::reduce::reduction::tests::reduction_sum_should_work_with_multiple_invocations stdout ----
thread 'kernel::reduce::reduction::tests::reduction_sum_should_work_with_multiple_invocations' panicked at burn-wgpu\src\kernel\reduce\reduction.rs:193:29:
Tensors are not approx eq:
  => Position 0: 763.541748046875 != 634.2994384765625 | difference 129.2423095703125 > tolerance 0.0010000000000000002

---- tests::maxmin::tests::test_mean_dim_2d stdout ----
thread 'tests::maxmin::tests::test_mean_dim_2d' panicked at burn-wgpu\src\lib.rs:49:5:
assertion `left == right` failed
  left: Data { value: [1.0, 4.0], shape: Shape { dims: [2, 1] } }
 right: Data { value: [0.99999994, 3.9999998], shape: Shape { dims: [2, 1] } }


failures:
    fusion::base::tests::maxmin::tests::test_mean_dim_2d
    kernel::matmul::tiling2d::unpadded::tests::test_matmul_irregular_shape
    kernel::prng::normal::tests::empirical_mean_close_to_expectation
    kernel::reduce::reduction::tests::reduction_sum_should_work_with_multiple_invocations
    kernel::reduce::reduction_shared_memory::tests::reduction_sum_dim_shared_memory_large
    kernel::reduce::reduction_shared_memory::tests::reduction_sum_dim_shared_memory_small
    tests::maxmin::tests::test_mean_dim_2d

test result: FAILED. 1241 passed; 7 failed; 0 ignored; 0 measured; 0 filtered out; finished in 27.42s
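
A note on the two test_mean_dim_2d failures above: they assert exact equality, and the GPU results (0.99999994, 3.9999998) are each one f32 rounding step (ULP) below the expected 1.0 and 4.0. Floating-point addition is not associative, so a reduction that accumulates terms in a different order than the CPU reference can legitimately drift like this. A minimal, self-contained illustration (plain Rust, not burn code):

    // f32 addition is not associative: grouping the terms differently, as a
    // GPU workgroup reduction may, can change the rounded result.
    fn main() {
        let xs = [1.0e8_f32, 1.0, -1.0e8, 1.0];

        // Left-to-right: 1e8 + 1.0 rounds back to 1e8, so one 1.0 is lost.
        let sequential: f32 = xs.iter().copied().fold(0.0, |acc, x| acc + x);

        // Pairing the large terms first keeps both 1.0 terms.
        let regrouped = (xs[0] + xs[2]) + (xs[1] + xs[3]);

        println!("sequential = {sequential}, regrouped = {regrouped}"); // 1 vs 2
    }

The much larger differences in the matmul and shared-memory reduction tests look too big to be explained by accumulation order alone, so those may point at an actual kernel or driver issue on this GPU.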


bionicles commented on May 28, 2024

Noticed while testing the MNIST example: I can't seem to get the wgpu backend to use the GPU at all:

[screenshot: CPU/GPU utilization while running 'cargo test']

I ran this test 3x, and there seem to be only 3 CPU spikes; the earlier GPU spike seems unrelated to invoking 'cargo test' within 'burn/crates/burn-wgpu'.

System: i9-13900K CPU, 64 GB RAM
lsb_release: Ubuntu 22.04.3 LTS

nvidia-smi
Mon Mar 11 18:33:57 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01              Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:01:00.0  On |                  Off |
|  0%   47C    P5              62W / 450W |     1173MiB / 24564MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                               |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                              |
+-----------------------------------------------------------------------------------------+

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0

Any ideas why wgpu wouldn't use the GPU?
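
One way to narrow it down (a sketch, not burn code): enumerate the adapters wgpu can see and check whether the RTX 4090 shows up, and with which backend and device type. The snippet assumes a standalone binary with the wgpu crate as a dependency; the exact Instance/enumerate_adapters signatures vary a bit between wgpu versions.

    // List every adapter wgpu can see, to tell whether the RTX 4090 is
    // exposed (e.g. via Vulkan) or whether only a software/CPU adapter
    // is available.
    fn main() {
        let instance = wgpu::Instance::default();
        for adapter in instance.enumerate_adapters(wgpu::Backends::all()) {
            let info = adapter.get_info();
            println!("{:?} | {:?} | {}", info.backend, info.device_type, info.name);
        }
    }

If only a software/CPU adapter is listed, the problem is at the Vulkan/driver layer rather than in burn; if the 4090 is listed, it may be worth checking how the device is chosen on the burn side (burn-wgpu exposes a WgpuDevice type for selecting a specific adapter, if I remember correctly — worth double-checking against the docs).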

