gpuweb / cts

WebGPU Conformance Test Suite

Home Page: https://gpuweb.github.io/cts/

License: BSD 3-Clause "New" or "Revised" License


cts's Introduction

W3C GPU for the Web Community Group


This is the repository for the W3C GPU for the Web Community Group WebGPU API and WebGPU Shading Language (WGSL) specifications. These specifications are formally standardized by the W3C GPU for the Web Working Group.

We use the wiki and issue tracker as the main sources of information related to the work. This repository will hold the actual specification, examples, etc.

Work-in-progress specification: https://gpuweb.github.io/gpuweb/

Work-in-progress WGSL specification: https://gpuweb.github.io/gpuweb/wgsl/

Charter

The charter for this group is maintained in a separate repository.

Membership

Membership in the Community Group is open to anyone. We especially encourage hardware vendors, browser engine developers, 3D software engineers, and web developers with expertise in graphics to participate. You'll need a W3C account to join, and if you're affiliated with a W3C member, your W3C representative will confirm your participation. If you're not a W3C member, you're still welcome. All participants are required to agree to the Contributor License Agreement.

Contributions

You are not required to be a member of the Community Group or Working Group to file issues, report errors, propose fixes, or make suggestions. Anyone with a GitHub account can do so.

In order to assure that WebGPU specifications can be implemented on a Royalty-Free (RF) basis, all significant contributions need to be made with RF commitments. Members of the Working Group, and members of the Community Group who have signed the Final Specification Agreement have already committed to the terms of the W3C Patent Policy. Non-members will be requested to provide an RF commitment under terms similar to the W3C Patent Policy.

All contributions must comply with the group's contribution guidelines.

See CONTRIBUTING.md for technical guidance on contributing.

Code of Conduct

This group operates under W3C's Code of Conduct Policy.

Communication

Our primary public chat channel is on Matrix (what is Matrix?) at #WebGPU:matrix.org.

For asynchronous concerns, we use GitHub for both our issue tracker and our discussions forum.

Both the Community Group and the Working Group have W3C email lists as well, though these are largely administrative.

cts's People

Contributors

alan-baker, amaiorano, austineng, beaufortfrancois, ben-clayton, bjjones, dj2, dneto0, egalli, greggman, gyuyoung, haoxli, jchen10, jiawei-shao, jrprice, jzm-intel, kainino0x, kangz, krockot, lokokung, mehmetoguzderin, richard-yunchao, sagudev, sarahm0, shaoboyan, shrekshao, szatkus, takahirox, toji, zoddicus


cts's Issues

Consider "subcases"

Right now we have .params() which generates "cases" from a "test" (runs the "test function" once for each case).

Each test case gets a unique identifier and shows up individually in test results (in /standalone/, cmdline, wpt, and the JSON result format). However, sometimes this isn't really necessary: it's not important which case is failing unless you're actually digging deeply into it. For example, cases that differ only in buffer ranges.

Consider renaming .params() to .cases(), and adding a .subcases()* (e.g. g.test('...').desc('...').cases([...]).subcases(p => [...])), which is basically like a for (...) { t.debug(subcase_params); fn(subcase_params); }. The result of the case would be the combined result of all of the subcases.

Ideally, it would still be possible to manually drill down into a subcase by specifying the subcase params (as printed by t.debug()), for developers fixing a bug in an implementation.

*Note: .subcases() needs to be able to receive the parameterization from .cases() because sometimes it might want to depend on it (similar to .expand()).
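
To make the proposal concrete, here is a rough sketch of how the builder chain could look. The .cases()/.subcases() names are from this issue; everything else (test name, params) is hypothetical:

// Hypothetical sketch of the proposed API; exact shapes are not final.
g.test('mapping_ranges')
  .desc('Check mapping at various offsets and sizes.')
  .cases([{ mode: 'read' }, { mode: 'write' }]) // each case reported individually
  .subcases(p => [
    // Subcases can depend on the case params `p`, similar to .expand().
    { offset: 0, size: 4 },
    { offset: 4, size: 8 },
  ])
  .fn(t => {
    // Conceptually the harness runs, per case:
    //   for (const sub of subcases) { t.debug(sub); fn({ ...caseParams, ...sub }); }
    // and folds all subcase results into a single case result.
  });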

Lazy-load test cases on /standalone/ page

The big nested DOM tree on the /standalone/ runner page needs to lazily load subtrees instead of loading everything at once. The only level that really needs to be lazily loaded is the "case" subtrees; it should be okay to greedily load everything up to the "test" (g.test()), because every "test" is handwritten, while the list of cases is generated by the ParamsBuilder and can explode combinatorially. In other words, the subtrees underneath DOM nodes corresponding to TestQueryLevel 1 and 2 should be greedily loaded, but 3 and 4 should be lazily loaded.

The slightly tricky part is what it means to "lazily load" those subtrees. The way test loading currently works, there is no way to lazily load the underlying data (the TestTree object). However, I'm pretty sure that isn't very slow right now, so hopefully we can keep it for now; all that actually needs to be lazy is the creation of the DOM elements, which happens in makeSubtreeHTML (makeSubtreeChildrenHTML).
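
A minimal sketch of that laziness, assuming a <details>-based tree and the existing makeSubtreeChildrenHTML helper (makeNodeHeaderHTML and the node type are hypothetical):

function makeSubtreeHTML(node: TestTreeNode, level: number): HTMLElement {
  const details = document.createElement('details');
  details.appendChild(makeNodeHeaderHTML(node)); // hypothetical header helper

  if (level <= 2) {
    // File/test levels are handwritten and small: build children greedily.
    details.appendChild(makeSubtreeChildrenHTML(node, level + 1));
  } else {
    // Case levels can explode combinatorially: build children on first expansion.
    let loaded = false;
    details.addEventListener('toggle', () => {
      if (details.open && !loaded) {
        loaded = true;
        details.appendChild(makeSubtreeChildrenHTML(node, level + 1));
      }
    });
  }
  return details;
}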

Implement RenderPass StoreOp Tests

Coverage needed for testing render pass store operations; a sketch of the first validation entry appears after the lists below.

/api/validation/

  • Test that when depthReadOnly is true, depthStoreOp must be store
  • Test that when stencilReadOnly is true, stencilStoreOp must be store

/api/operation/

  • Test that a render pass with colorAttachment storeOp set to clear has correct output for:
    • All color renderable formats
    • All color attachments, from 0 to MAX_COLOR_ATTACHMENTS
  • Test that a render pass with colorAttachment storeOp set to store has correct output for:
    • All color renderable formats
    • All color attachments, from 0 to MAX_COLOR_ATTACHMENTS
  • Test that a render pass with depthStoreOp set to clear has correct output for:
    • All depth/stencil formats
  • Test that a render pass with depthStoreOp set to store has correct output for:
    • All depth/stencil formats
  • Test that a render pass with stencilStoreOp set to clear has correct output for:
    • All depth/stencil formats with a stencil component
  • Test that a render pass with stencilStoreOp set to store has correct output for:
    • All depth/stencil formats with a stencil component

/idl/

  • Test that a color attachment's storeOp is store by default
  • Test that a depth/stencil attachment's depthStoreOp is store by default
  • Test that a depth/stencil attachment's stencilStoreOp is store by default
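
As a flavor of the first /api/validation/ entry above, a rough sketch in the usual CTS style. The test name, params, and depth-view helper are illustrative, and attachment fields not relevant here are elided:

g.test('readonly_depth_requires_storeop_store')
  .params([
    { storeOp: 'store', _valid: true },
    { storeOp: 'clear', _valid: false },
  ])
  .fn(async t => {
    const { storeOp, _valid } = t.params;
    const encoder = t.device.createCommandEncoder();
    const pass = encoder.beginRenderPass({
      colorAttachments: [],
      depthStencilAttachment: {
        attachment: t.makeDepthTextureView(), // hypothetical helper
        depthReadOnly: true,
        stencilReadOnly: true,
        depthStoreOp: storeOp,
        stencilStoreOp: 'store',
        // load ops elided for brevity
      },
    });
    pass.endPass();
    t.expectValidationError(() => {
      encoder.finish();
    }, !_valid);
  });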

Flaky EXCEPTION: No error scopes to pop.

Flaky "EXCEPTION: No error scopes to pop." occurs on:

cts:validation/createTexture:it_is_invalid_to_submit_a_destroyed_texture_before_and_after_encode={"destroyBeforeEncode":false,"destroyAfterEncode":false}
cts:validation/createView:Using_defaults_validates_the_same_as_setting_values_for_only_1_array_layer={"format":"rgba8unorm"}
cts:validation/createView:Using_defaults_validates_the_same_as_setting_values_for_only_1_array_layer={"arrayLayerCount":2}
cts:validation/createView:Using_defaults_validates_the_same_as_setting_values_for_only_1_array_layer={"mipLevelCount":1}
cts:validation/createView:creating_cube_map_texture_view={"dimension":"cube","arrayLayerCount":6}
cts:validation/createView:creating_cube_map_texture_view={"dimension":"cube","arrayLayerCount":7}
cts:validation/error_scope:errors_bubble_to_the_parent_scope_if_not_handled_by_the_current_scope=
cts:validation/error_scope:if_an_error_scope_matches_an_error_it_does_not_bubble_to_the_parent_scope=
cts:validation/fences:increasing_fence_value_by_more_than_1_succeeds~
cts:validation/queue_submit:submitting_with_a_mapped_buffer_is_disallowed=

Might be a Chrome bug, unclear.

Make /standalone/ run without WebGPU support

Currently /standalone/ will crash if the WebGPU constants (GPUBufferUsage.*, GPUShaderStage.*, etc.) aren't defined. Now that the test plan is becoming embedded in the CTS, it is annoying that you can't load it in a release browser. This shouldn't be too hard to fix; it's basically a revert of a change I made a while ago.
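
A sketch of the obvious fallback: define the bit-flag constants ourselves when the browser doesn't expose them. The values match the WebGPU spec; only a few are shown:

if (typeof GPUBufferUsage === 'undefined') {
  (globalThis as any).GPUBufferUsage = {
    MAP_READ: 0x0001,
    MAP_WRITE: 0x0002,
    COPY_SRC: 0x0004,
    COPY_DST: 0x0008,
    // ... remaining flags elided
  };
}
if (typeof GPUShaderStage === 'undefined') {
  (globalThis as any).GPUShaderStage = { VERTEX: 0x1, FRAGMENT: 0x2, COMPUTE: 0x4 };
}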

Consider displaying the params builder in standalone

Perhaps it would be helpful to be able to see how a list of cases was built up, in addition to the fully-expanded combinatorial subtree. This would make using the standalone runner as a "test plan viewer" easier, as you could actually read what the test params are.

Just an idea. This is not super important because you can always see this in the code.

Consider explicit_timeout in WPT

When these tests are run in WPT, there's a timeout of 10 seconds if the test doesn't report a result.

Sometimes our WPT test variants might include so many CTS test cases that they take longer than that. We can fix this by disabling the page-level timeout and implementing our own timeout in wpt.ts (or possibly using step_wait?).

This is low priority. Chromium, at least, doesn't care about this timeout. Instead of relying on pages to time themselves out via testharness.js, Chromium's harness kills the entire browser on an externally-implemented timeout. And for local developers, we expect them to use the /standalone/ runner instead of WPT. So it's not likely that fixing this matters to anyone right now.
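
For reference, a rough sketch of the wpt.ts side, using testharness.js's documented explicit_timeout setup option plus a hand-rolled deadline (the budget constant is an assumption):

declare function setup(options: { explicit_timeout?: boolean }): void;

setup({ explicit_timeout: true }); // testharness.js no longer times the page out

const kCaseTimeoutMS = 10_000; // assumed per-case budget; tune as needed
function raceWithTimeout<T>(p: Promise<T>): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error('CTS case timed out')), kCaseTimeoutMS)
    ),
  ]);
}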

webgpu:api,operation,buffers,map_oom: only checks for a JS exception

This test checks for a JS exception, expecting that mapping an enormous range would cause a RangeError due to failure to allocate an ArrayBuffer that large. On some platforms, this allocation actually succeeds, and then the test fails with an OOM error when the implementation attempts to map such a large GPU buffer.

I suggest this test be updated to allow a GPU OOM error if there is no JS exception.
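
A rough sketch of that acceptance logic (the exact mapping entry point has changed across API revisions, so treat this as schematic):

device.pushErrorScope('out-of-memory');
let sawRangeError = false;
try {
  await buffer.mapAsync(GPUMapMode.READ);
  buffer.getMappedRange(); // allocating a huge ArrayBuffer may throw RangeError
} catch (ex) {
  sawRangeError = ex instanceof RangeError;
}
const gpuError = await device.popErrorScope(); // non-null on GPU out-of-memory
if (!sawRangeError && gpuError === null) {
  throw new Error('expected a RangeError or a GPU out-of-memory error');
}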

Consider tweaks to test query format

Right now the format is

webgpu:examples:gpu,with_texture_compression,bc:textureCompressionBC=false;*
webgpu:examples:gpu,with_texture_compression,bc:textureCompressionBC=false

Based on some feedback and some other work I did, I've been considering changing this to:

webgpu:examples:gpu.with_texture_compression.bc:textureCompressionBC=false
webgpu:examples:gpu.with_texture_compression.bc:textureCompressionBC=false;$

in which:

  • kPathSeparator changes from , to .
    • to make it more intuitive
  • kWildcard * is inverted and replaced with kTerminator $ (or something)
    • to make it so that * is not used (it's considered a wildcard at least in some treatments of WPT test paths)
    • to make it so that subqueries are always substrings

Condense capability info tables to be more readable/authorable

Originally posted by @kainino0x in #352 (comment)


I don't think it makes sense to split it here in such a way that it affects other files, just for the sake of making this file more editable. I think it's okay if this file is a little special (you may have to zoom out in your editor).

Maybe we could come up with a little table-making helper so we don't have to repeat "renderable:", "color:", etc. on every line? (One possible shape is sketched below.)


componentType can be computed from dataType, right? So we could separate that into its own table. Or use ...kDataTypeUnorm, where const kDataTypeUnorm = { dataType: 'unorm', componentType: 'float' };.

Let's consider both of these as followups.
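
One possible shape for that helper, sketched: declare the column keys once, then write each row as a compact tuple. All names here are hypothetical:

function makeTable<K extends readonly string[]>(
  keys: K,
  rows: Record<string, readonly unknown[]>
): Record<string, Record<K[number], unknown>> {
  const out: Record<string, Record<string, unknown>> = {};
  for (const [name, row] of Object.entries(rows)) {
    out[name] = Object.fromEntries(keys.map((k, i) => [k, row[i]] as [string, unknown]));
  }
  return out as Record<string, Record<K[number], unknown>>;
}

// Usage sketch:
const kFormatInfo = makeTable(
  ['renderable', 'color', 'dataType', 'componentType'] as const,
  {
    rgba8unorm: [true, true, 'unorm', 'float'],
    rgba8sint:  [true, true, 'sint',  'sint'],
  }
);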

Transpile TypeScript at runtime

Item extracted from #3.

Doing this would remove the build step and allow the project source to live directly in WPT. It would also significantly simplify development.

I think it should be done with Babel, because I imagine it's lighter-weight/faster than tsc, and we're already using it for the regular build, so we know what the right config is. The hard part is that we need to somehow hook module loading to run through the transpiler.
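
On the serving side, a minimal sketch with @babel/standalone, assuming a dev server that intercepts requests for .ts files (the browser-side module-loader hook is the unsolved part):

import * as Babel from '@babel/standalone';
import * as fs from 'fs';

// Transpile a .ts file on demand before serving it as JavaScript.
function serveTS(path: string): string {
  const source = fs.readFileSync(path, 'utf8');
  const result = Babel.transform(source, {
    filename: path,
    presets: ['typescript'], // strip types only; no downleveling needed
  });
  return result.code!;
}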

Different device Queue and Fence tests send validation error to the wrong receiver

In the following tests:

webgpu:api,validation,fences:signal_a_fence_on_a_different_device_than_it_was_created_on_is_invalid:
webgpu:api,validation,fences:signal_a_fence_on_a_different_device_does_not_update_fence_signaled_value:

A fence is signaled on a queue from a different device than the fence. The test expects a validation error on the fence's device. This is wrong; the error should be sent to the "receiver" object, the queue (that is, the queue's device).
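
A sketch of the corrected expectation, using the fence API of the time (the deviceA/deviceB setup is assumed to come from a helper):

const fence = deviceA.defaultQueue.createFence(); // fence belongs to deviceA
deviceB.pushErrorScope('validation');
deviceB.defaultQueue.signal(fence, 2); // signal on the "receiver": deviceB's queue
const error = await deviceB.popErrorScope();
if (error === null) {
  throw new Error('validation error should surface on the receiver device');
}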

Add a "placeholder" webgpu stub that implements all of the validation rules?

I used to have a "dummy" implementation of the WebGPU interface (.d.ts) that could be used to run the tests without a browser implementation.

I'm thinking of resurrecting it, and having it implement all of the exceptions and validation rules from the API. I suspect this is pretty tractable in most cases (shader reflection being one exception). It would also be an easy target to run code coverage on, to make sure the exception and validation tests hit every validation check at least in a dummy implementation.

Perhaps it could also be structured to warn or error if a single input would get caught on more than one validation error (e.g. alignment is wrong AND it goes out of bounds), since we generally want to avoid that in tests.
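
A minimal taste of what such a stub could look like; the rules shown are illustrative, not the real createBuffer rule set:

class StubDevice {
  private pendingErrors: string[] = [];

  createBuffer(desc: { size: number; usage: number }): void {
    const violations: string[] = [];
    if (desc.usage === 0) violations.push('usage must not be 0');   // illustrative
    if (desc.size > 2 ** 32) violations.push('size is too large');  // illustrative
    if (violations.length > 1) {
      // Tests generally shouldn't trip more than one rule at once; flag it.
      console.warn('input hits multiple validation rules:', violations);
    }
    this.pendingErrors.push(...violations); // surfaced later via popErrorScope()
  }
}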

Roadmap

(i.e. to-do list)

Short term

  • add license
  • delete dawn-node code for now
  • probably delete suites/demos
  • try to clean up the mess of interfaces/typedefs that are all over the place, esp naming/terminology
    • document them
    • document params interfaces
  • add presubmit to make sure imports end in .js
  • move JS from .html to .ts files so it can be checked

Medium term

  • write high-level documentation (in particular about test queries and test expectations)
  • write more comprehensive documentation with examples of writing tests, etc.
  • comment code more
  • any TODOs in the code
  • check whether cts works on Windows (may have path issues during build)
  • update @webgpu/types to match current idl
    • fill out package.json fields
    • update again later
    • and (probably) delete framework/gpu/dummy.ts
  • shaderc (glslang?)
    • fix kainino0x/-webgpu-shaderc#1
    • figure out how to package .wasm instead of base64 (or maybe both)
    • fix missing error messages and crash on 310 es
    • publish a smaller build
    • fill out package.json fields
  • sort spec listings (index.js) before generating so they don't get random orders every time
  • implement per-platform test expectations and skips
    • add negative queries (?q=abc:&not=abc:{} or something)
    • write code to take a list of sub-group queries + full list of groups and produce a complete list of queries which splits each group into parts
      • (allows for each part to have different expectations)
    • generate <meta name="variant"> list from that
    • make TestExpectations understand GPU-related tags
    • hook it up with TestExpectations
    • get chrome to run these tests via wpt (but only on gpu bots)
  • do we need to be able to disable tests inside the harness? (probably not?) no
  • (possibly) change results json format to use query strings instead of nesting?
  • add values that get passed into t.params but are not part of the case name #66
  • allow parameterizations that aren't just number/string/boolean? Probably should do this by adding a parameterizer which takes a js dictionary, uses the keys for case naming, and passes the values into the test function
  • make standalone runner generate a fully nested tree #73
  • run tests on workers #79

Longer term

  • improve standalone runner (Vue? + better styling)

Maybe consider someday

  • make testing on node+dawn work for real
  • maybe?? go semi-buildless by adding a service worker that compiles typescript on the fly

Make expectContents take a "validator" function

Update expectContents (and expectTextureContents?) to take a validator function of the form (index, value) -> boolean or (x, y, z, value) -> boolean.

For expectTextureContents, it would probably be called once per texel. But for expectContents, it might be necessary to also tell it the chunk size (e.g. you might want bytes, or floats, or float4s).

This is a much more general way of testing buffer/texture contents than providing a literal ArrayBuffer as we do now.

  • Handles expecting multiple possible values
  • Avoids having to write expected values out into an ArrayBuffer
  • Makes a lot more sense for texture expectations than writing strided values into an ArrayBuffer

Credit: @Kangz in #384 (comment)
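
Sketched signatures for the proposal, matching the shapes described above (the option name is hypothetical):

type BufferValidator = (index: number, value: number) => boolean;
type TextureValidator = (x: number, y: number, z: number, value: number) => boolean;

// chunkByteSize picks how buffer bytes are grouped before validation,
// e.g. 1 = bytes, 4 = floats, 16 = float4s.
declare function expectContents(
  src: GPUBuffer,
  validator: BufferValidator,
  options?: { chunkByteSize?: number }
): void;

declare function expectTextureContents(
  src: GPUTexture,
  validator: TextureValidator // called once per texel
): void;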

Remove descriptions from listing.js?

When listing.js is generated, it reads the description from each .spec.ts file. This was done so that the file list with descriptions could be loaded without loading the .spec.ts files.

I think this is not really a necessary optimization. Saving /standalone/ from loading every .spec.ts file in the suite probably isn't worth having duplicated info in the build.

Removing the descriptions would also fix a problem where the description can get out of date in the generated listing.js, both in builds and in the dev_server.

Originally posted by @kainino0x in #386 (comment)

Should overlapping vertex attributes be allowed?

Currently api,validation,vertex_state:check_two_attributes_overlapping fails on Chrome Canary and perhaps other implementations because the test expects that overlapping vertex attributes are a validation error.

I couldn't find any detail about this in the WebGPU spec, and I think it should be allowed. I suggest we reverse the test to check that overlapping attributes are allowed.

I tested this in Dawn: a test case with overlapping vertex attributes does indeed pass. It's even allowed to have mixed types aliasing the same data (not sure of the use case, but it can be done).
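
For reference, a sketch of a vertex buffer layout with two attributes aliasing the same bytes, including mixed types over the same data (format names per the current spec):

const buffers: GPUVertexBufferLayout[] = [{
  arrayStride: 16,
  attributes: [
    { format: 'float32x2', offset: 0, shaderLocation: 0 },
    { format: 'uint32x2',  offset: 0, shaderLocation: 1 }, // overlaps attribute 0
  ],
}];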

Update to @webgpu/glslang 0.0.7

When using @webgpu/glslang 0.0.7, the code below is needed in glsl.macro.js. However, this doesn't work: babel-plugin-macros expects the macro to run synchronously, so the compiled GLSL ends up undefined.

// glslangModule() already returns a Promise, so the manual wrapper is unnecessary:
const glslang = await glslangModule();
// This `await` is the root problem: babel-plugin-macros runs macros
// synchronously, so the async initialization can't complete before the
// macro's result is needed, and `code` ends up undefined.
const code = glslang.compileGLSL(source, stage, false);

How can we solve this, @kainino0x?

Performance Tests

(migrated from test plan hackmd)

There's currently no mechanism for performance test results except for test execution time (maybe that's enough). Tests could fail or warn if one benchmark is "too much slower" than another; this could be useful for testing emulated paths.
Add informal notes here on possible performance tests.

Most of these tests would be inherently flaky, and might just be "manual" tests. However, we separate them from "stress" tests (which may also be manual), which try to get bad things to happen to the system (like OOM, huge draw calls, etc.).

  • worker thread work (like pipeline or resource creation) shouldn’t block other work
  • ?

TODO: look at dEQP (OpenGL ES and Vulkan) and WebGL for inspiration here.

Node integration

Hey,

I'd like to use this for the Node bindings. @kainino0x, could you take a look at how the CTS could be integrated? I need some initial steps to get started.

Thanks

subqueriesToExpand behavior is order dependent

For two subqueries A and B, where A is a superset of B (for example, A = webgpu:api,validation,* and B = webgpu:api,validation,createView,*), the following error occurs only if B comes before A:

subqueriesToExpand entry did not match anything (can happen if the subquery was larger than one file, or due to overlap with another subquery)

Automatically add TODO to file description if test group has no tests

Currently, I've tried to add an explicit TODO to the description of every file that's totally unimplemented.
This could be done automatically... somehow. It's not immediately clear how, as the test group doesn't have access to the description; the description could be passed into makeTestGroup instead of exported, perhaps.

How to make sure a WGSL test is failing for the right reasons?

For example, the file recursion-fail.wgsl has a recursive function and is a test for validation rule v-0004: Recursion is not allowed. Consider this scenario: a PR implements a syntax change and accidentally breaks the recursion detection algorithm. This PR could pass the CTS, because recursion-fail.wgsl could now fail due to using outdated syntax instead of failing for using recursion.

Possible solutions:

  • Encode validation rule numbers in the test names: v-0004-recursion-fail.wgsl
    Implementation: WGSL has IDs for all validation messages, so we can map v-0004 back to the "recursion is not allowed" validation error in the WGSL spec. A test runner should verify that at least one reported v-xxxx validation error matches the prefix of the test file name, in this case v-0004.
  • Have markers in the test files so that two test cases, a positive and a negative one with minimal differences, can be extracted from one test file (see the sketch after this list).
    Implementation: Use markers like #if error/success ... #endif. Test runners would then be required to process the test file and generate the two test cases. This way, an outdated syntax construct makes both tests fail, so the breakage is caught by the positive test as well.
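
A tiny sketch of the marker idea, deriving both variants from one source file (marker syntax illustrative):

// Return the test source with the error-only block kept or dropped.
function variant(source: string, keepError: boolean): string {
  const out: string[] = [];
  let inErrorBlock = false;
  for (const line of source.split('\n')) {
    if (line.trim() === '#if error') { inErrorBlock = true; continue; }
    if (line.trim() === '#endif') { inErrorBlock = false; continue; }
    if (!inErrorBlock || keepError) out.push(line);
  }
  return out.join('\n');
}

// variant(src, false) is the positive test; variant(src, true) the negative one.
// Both variants share all other syntax, so an outdated construct breaks both alike.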

val: Update validation tests to enforce per-stage and per-pipeline-layout limits on BGL creation

Following resolution at the F2F gpuweb/gpuweb#409 (comment), the following tests need to be updated to enforce per-stage and per-pipeline-layout limits on BGL creation:

  • webgpu:api,validation,createBindGroupLayout:max_resources_per_stage,in_bind_group_layout,*
  • webgpu:api,validation,createBindGroupLayout:number_of_dynamic_buffers_exceeds_the_maximum_value,*
  • webgpu:api,validation,createPipelineLayout:number_of_dynamic_buffers_exceeds_the_maximum_value:*
