Comments (3)
Hmm, I can't reproduce the problem on my NVIDIA card. Could you run the wgpu test suite on your GPU to see which operations fail? You can simply run cargo test in the burn-wgpu directory.
from burn.
Yes, here is the result:
failures:
---- fusion::base::tests::maxmin::tests::test_mean_dim_2d stdout ----
thread 'fusion::base::tests::maxmin::tests::test_mean_dim_2d' panicked at burn-wgpu\src\fusion\base.rs:187:5:
assertion `left == right` failed
left: Data { value: [1.0, 4.0], shape: Shape { dims: [2, 1] } }
right: Data { value: [0.99999994, 3.9999998], shape: Shape { dims: [2, 1] } }
---- kernel::matmul::tiling2d::unpadded::tests::test_matmul_irregular_shape stdout ----
thread 'kernel::matmul::tiling2d::unpadded::tests::test_matmul_irregular_shape' panicked at burn-wgpu\src\kernel\matmul\utils.rs:65:33:
Tensors are not approx eq:
=> Position 22372: 4.402349472045898 != 4.3502044677734375 | difference 0.05214500427246094 > tolerance 0.0010000000000000002
=> Position 22373: -0.9585940837860107 != -1.0070807933807373 | difference 0.04848670959472656 > tolerance 0.0010000000000000002
=> Position 22374: -8.618410110473633 != -9.033252716064453 | difference 0.4148426055908203 > tolerance 0.0010000000000000002
=> Position 22375: 4.302424907684326 != 4.226462364196777 | difference 0.07596254348754883 > tolerance 0.0010000000000000002
=> Position 22376: 5.406569004058838 != 5.009387016296387 | difference 0.39718198776245117 > tolerance 0.0010000000000000002
11085 more errors...
---- kernel::prng::normal::tests::empirical_mean_close_to_expectation stdout ----
thread 'kernel::prng::normal::tests::empirical_mean_close_to_expectation' panicked at burn-wgpu\src\kernel\prng\normal.rs:93:24:
Tensors are not approx eq:
=> Position 0: 8.946138381958008 != 10 | difference 1.0538616180419922 > tolerance 0.1
---- kernel::reduce::reduction_shared_memory::tests::reduction_sum_dim_shared_memory_small stdout ----
thread 'kernel::reduce::reduction_shared_memory::tests::reduction_sum_dim_shared_memory_small' panicked at burn-wgpu\src\kernel\reduce\reduction_shared_memory.rs:136:29:
Tensors are not approx eq:
=> Position 0: 351.03289794921875 != 288.3531799316406 | difference 62.679718017578125 > tolerance 0.0010000000000000002
---- kernel::reduce::reduction_shared_memory::tests::reduction_sum_dim_shared_memory_large stdout ----
thread 'kernel::reduce::reduction_shared_memory::tests::reduction_sum_dim_shared_memory_large' panicked at burn-wgpu\src\kernel\reduce\reduction_shared_memory.rs:177:29:
Tensors are not approx eq:
=> Position 684: 22.973115921020508 != 17.27593421936035 | difference 5.697181701660156 > tolerance 0.0010000000000000002
=> Position 685: 25.75684928894043 != 17.05587387084961 | difference 8.70097541809082 > tolerance 0.0010000000000000002
=> Position 686: 24.88041114807129 != 21.817140579223633 | difference 3.0632705688476563 > tolerance 0.0010000000000000002
=> Position 687: 25.581012725830078 != 21.639711380004883 | difference 3.9413013458251953 > tolerance 0.0010000000000000002
=> Position 688: 24.266672134399414 != 23.075439453125 | difference 1.191232681274414 > tolerance 0.0010000000000000002
20 more errors...
---- kernel::reduce::reduction::tests::reduction_sum_should_work_with_multiple_invocations stdout ----
thread 'kernel::reduce::reduction::tests::reduction_sum_should_work_with_multiple_invocations' panicked at burn-wgpu\src\kernel\reduce\reduction.rs:193:29:
Tensors are not approx eq:
=> Position 0: 763.541748046875 != 634.2994384765625 | difference 129.2423095703125 > tolerance 0.0010000000000000002
---- tests::maxmin::tests::test_mean_dim_2d stdout ----
thread 'tests::maxmin::tests::test_mean_dim_2d' panicked at burn-wgpu\src\lib.rs:49:5:
assertion `left == right` failed
left: Data { value: [1.0, 4.0], shape: Shape { dims: [2, 1] } }
right: Data { value: [0.99999994, 3.9999998], shape: Shape { dims: [2, 1] } }
failures:
fusion::base::tests::maxmin::tests::test_mean_dim_2d
kernel::matmul::tiling2d::unpadded::tests::test_matmul_irregular_shape
kernel::prng::normal::tests::empirical_mean_close_to_expectation
kernel::reduce::reduction::tests::reduction_sum_should_work_with_multiple_invocations
kernel::reduce::reduction_shared_memory::tests::reduction_sum_dim_shared_memory_large
kernel::reduce::reduction_shared_memory::tests::reduction_sum_dim_shared_memory_small
tests::maxmin::tests::test_mean_dim_2d
test result: FAILED. 1241 passed; 7 failed; 0 ignored; 0 measured; 0 filtered out; finished in 27.42s
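A note on the failures above: the test_mean_dim_2d mismatches (0.99999994 vs 1.0) are one-ULP differences, which happen whenever f32 sums are reordered, since floating-point addition is not associative; GPU reductions (especially shared-memory tree reductions) sum in a different order than the CPU reference, so such tests normally compare with a tolerance rather than exact equality. The much larger reduction errors (e.g. 129.24 against a 0.001 tolerance) point to a genuine kernel or driver issue rather than rounding. A minimal stdlib-only Rust sketch of the rounding point (the approx_eq helper and tolerance here are illustrative, not Burn's actual test API):

```rust
// Illustrative only: why exact f32 equality fails after reordered sums.
fn approx_eq(a: f32, b: f32, tol: f32) -> bool {
    (a - b).abs() <= tol
}

fn main() {
    // Ten sequential additions of 0.1f32 accumulate rounding error:
    // the result is close to, but not exactly, 1.0.
    let sum: f32 = std::iter::repeat(0.1f32).take(10).sum();
    assert!(sum != 1.0);                // exact comparison fails
    assert!(approx_eq(sum, 1.0, 1e-6)); // tolerance-based comparison passes
    println!("sum = {sum}");
}
```

This is why a one-ULP difference on a fast GPU path is usually a test-tolerance problem, while a difference thousands of times larger than the tolerance is not.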
Noticed while testing the mnist example: I can't seem to get the wgpu backend to use the GPU at all.
I ran this test 3x, and there were only 3 CPU spikes; the earlier GPU spike seems unrelated to invoking 'cargo test' within 'burn/crates/burn-wgpu'.
System: i9-13900K CPU, 64 GB RAM
Ubuntu 22.04.3 LTS
nvidia-smi
Mon Mar 11 18:33:57 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01 Driver Version: 551.76 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 On | 00000000:01:00.0 On | Off |
| 0% 47C P5 62W / 450W | 1173MiB / 24564MiB | 1% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
Any ideas why wgpu wouldn't use the GPU?
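One thing worth checking: on Linux, wgpu can silently fall back to the GL backend or a software rasterizer (llvmpipe) when it can't initialize a Vulkan adapter, which would explain CPU-only spikes. To my knowledge, wgpu's adapter-selection helpers honor a few environment variables (WGPU_BACKEND, WGPU_POWER_PREF, WGPU_ADAPTER_NAME); the exact names and accepted values should be verified against the wgpu version Burn pins. A hedged stdlib-only sketch of setting them before the tests run:

```rust
use std::env;

fn main() {
    // Assumption: these variable names follow wgpu's documented helpers;
    // verify against the wgpu version in your lockfile. Setting them can
    // force a hardware adapter instead of a software fallback (llvmpipe).
    env::set_var("WGPU_BACKEND", "vulkan"); // force the Vulkan backend
    env::set_var("WGPU_POWER_PREF", "high"); // prefer the discrete GPU
    for key in ["WGPU_BACKEND", "WGPU_POWER_PREF"] {
        println!("{key}={}", env::var(key).unwrap());
    }
}
```

Exporting the same variables in the shell before running cargo test has the same effect; if the tests then light up the GPU, the original run was picking a non-Vulkan or software adapter.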