data-apis / array-api-tests


Test suite for the PyData Array APIs standard

Home Page: https://data-apis.org/array-api-tests/

License: MIT License

Python 100.00%

array-api-tests's People

Contributors

asmeurer, djl11, honno, jakevdp, mtsokol, pmeier, rgommers, simonetgordon, tomwhite, tylerjereddy


array-api-tests's Issues

Make special case tests only use relevant arrays

Right now we generate arrays irrelevant to the test cases, e.g. arrays with no NaNs in a test case about NaN behaviour. A low-priority improvement would be to generate arrays that are relevant to each test case (either via xps.arrays(..., elements={...}) or xps.arrays(...).filter(...)). This probably requires some significant reworking of generate_stubs.py.

To quote myself from #38:

It seems like the only problem with numpy.array_api (when running 2000 examples) is in-place addition.

>>> x1 = xp.asarray(-0.0)
>>> x1
Array(-0., dtype=float64)
>>> x2 = xp.asarray(-0.0)
>>> xp.add(x1, x2)
Array(-0., dtype=float64)
>>> x1 + x2
Array(-0., dtype=float64)
>>> x1 += x2
>>> x1
Array(0., dtype=float64)  # not negative 0, like it is with xp.add() and __add__()

Interestingly I had to run like 200 examples to get this error consistently. Something for the future would be to filter strategies from the @given() level so that only arrays with NaNs/infs/whatever property we're testing are generated, which would mitigate this problem.
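A minimal sketch of how such a constraint could look, assuming numpy.array_api as the module under test and an xps namespace built with hypothesis.extra.array_api (and assuming the elements mapping is forwarded to xps.from_dtype(), as the issue text suggests):

import numpy.array_api as xp
from hypothesis.extra.array_api import make_strategies_namespace

xps = make_strategies_namespace(xp)

# Allow NaNs during element generation, then filter so every drawn array is
# actually relevant to a NaN special case.
nan_capable = xps.arrays(
    dtype=xp.float64,
    shape=xps.array_shapes(min_side=1),
    elements={"allow_nan": True},  # kwargs forwarded to xps.from_dtype()
)
nan_arrays = nan_capable.filter(lambda x: bool(xp.any(xp.isnan(x))))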

Type promotion tests

Here is the spec for type promotion rules: https://github.com/data-apis/array-api/blob/master/spec/API_specification/type_promotion.md

Questions:

  • As with #1, how do I create inputs for the tests? I didn't see any specification on how to create arrays or how to specify dtypes.

  • Do I read the document correctly in that all the listed dtypes are required to be implemented?

  • The bullet points at the bottom specify different semantics for zero-dimensional arrays and scalars. Is the distinction between these two spelled out somewhere? As far as I know, NumPy distinguishes these types but they mostly behave the same (and they seem to both participate in type promotion).

Use equality instead of hashing for dtype helpers

@asmeurer noted in #110 (comment) that NumPy proper (i.e. numpy, as opposed to numpy.array_api) wasn't working with the test suite due to the following code. The problem was that the dtype attribute of NumPy-proper arrays was being used in conjunction with namespaced dtypes—these are different objects and thus have different hashes, so they look like different keys in a dict.

>>> import numpy as np
>>> dtypes_map = {np.int64: "foo"}
>>> dtypes_map[np.asarray(0).dtype]
KeyError

This behaviour isn't specifically ruled out in the spec, as the spec only says dtypes need to support equality, which NumPy proper does conform to.

>>> np.asarray(0).dtype == np.int64
True

So the test suite should stop assuming that namespaced dtypes and array dtypes share the same hash, and instead rely on equality.
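A minimal sketch of an equality-based lookup helper (the name is hypothetical) that the dtype helpers could use instead of plain dict indexing:

def dtype_lookup(mapping, dtype):
    """Look up `dtype` in `mapping` using == rather than hashing."""
    for key, value in mapping.items():
        if key == dtype:
            return value
    raise KeyError(dtype)

With the example above, dtype_lookup(dtypes_map, np.asarray(0).dtype) returns "foo" even though the hash-based lookup raises KeyError.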

Don't require functions to raise a certain error for an unsupported number of inputs

raises((TypeError, ValueError), lambda: mod_func(*args[:n]), f"{name}() should not accept {n} positional arguments")

requires a function to raise either a TypeError or a ValueError in case it is called with an unsupported number of positional arguments. This is not true for PyTorch:

>>> import torch
>>> torch.Tensor.__add__()
NotImplemented

There is nothing about this in the specification.

Extend some type promotion tests

Currently result_type, meshgrid, tensordot and vecdot have type promotion tests, but I should make them standalone and test the output generally. Also it seems I accidentally removed the meshgrid test I wrote before 😅

Table of tests coverage

We need to document the test coverage.

I have pushed a page tests-coverage.md to master and enabled GitHub Pages so that it shows at https://data-apis.org/array-api-tests/tests-coverage.html (it also created a page for the README). I started a basic table with statistical functions, based on #39. It's not yet completely filled out.

@honno Let me know what you think of the general styling and the categories. My thinking is each cell will have "Yes", "No", or "N/A", and if there is more to say about something we can add a footnote. The idea is based on tables you often see on Wikipedia (e.g., this one).

Feel free to edit it and to fill it out more. You can push directly to master if you want. I don't know if there is any other way to see a preview of the page without pushing to master and letting GitHub render it. You can also change the theme if you want (see https://github.com/data-apis/array-api-tests/settings/pages). I was also thinking of adding some Javascript that colors the cells green if they have "Yes" in them and red otherwise.

By the way, if you use emacs, the markdown-mode makes it really easy to edit the table. You can just press tab and enter to move between cells, and it automatically realigns them. I don't know if there are similar plugins for other editors.

Improve indexing coverage

  • Merge the current getitem indexing tests together
    • xps.indices() has come a long way in covering everything in the spec (and no more!)—I've experimented and am fairly confident that it can now supplant the current custom indexing strategies (see the sketch after this list)
  • Test getitem indexing elements (tricky!)
  • Test __setitem__
  • Test boolean indexing
  • Test non-set elements remain the same in __setitem__
  • Test non-0d arrays in __setitem__
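A sketch of what a merged __getitem__ test built on xps.indices() might look like, assuming numpy.array_api as the module under test; the property checked here is deliberately minimal and illustrative:

from hypothesis import given, strategies as st
from hypothesis.extra.array_api import make_strategies_namespace
import numpy.array_api as xp

xps = make_strategies_namespace(xp)

@given(data=st.data())
def test_getitem_sketch(data):
    x = data.draw(xps.arrays(dtype=xp.int64, shape=xps.array_shapes()), label="x")
    idx = data.draw(xps.indices(x.shape), label="idx")
    out = x[idx]
    # Minimal property: indexing never changes the dtype.
    assert out.dtype == x.dtype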

Clear up CI

IMO it's not worth touching CI while the first spec release (and NumPy's adoption) is so close, but here's an issue to keep us accountable when the time comes :) Well, it's 2022 now, so:

  • Update stubs
    • Create a custom pre-commit hook to check/fix outdated stubs (done something similar)
  • Identify remaining tests to xfail for the NumPy job

Subnormals make testing flush-to-zero libs/builds impractical

Running the test suite against cupy.array_api I'm seeing a lot of the following error:

self = <hypothesis.extra.array_api.ArrayStrategy object at 0x7fd585346f40>, val = 5.877471754111438e-39, val_0d = Array(0., dtype=float32)
strategy = floats(width=32)

    def check_set_value(self, val, val_0d, strategy):
        finite = self.builtin is bool or self.xp.isfinite(val_0d)
        if finite and self.builtin(val_0d) != val:
>           raise InvalidArgument(
                f"Generated array element {val!r} from strategy {strategy} "
                f"cannot be represented with dtype {self.dtype}. "
                f"Array module {self.xp.__name__} instead "
                f"represents the element as {val_0d}. "
                "Consider using a more precise elements strategy, "
                "for example passing the width argument to floats()."
            )
E           hypothesis.errors.InvalidArgument: Generated array element 5.877471754111438e-39 from strategy floats(width=32) cannot be represented with dtype float32. Array module cupy.array_api instead represents the element as 0.0. Consider using a more precise elements strategy, for example passing the width argument to floats().

I think the comparison self.builtin(val_0d) != val should be changed to isclose(val_0d, val).
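A sketch of the kind of relaxed check I mean, treating a subnormal that the array module flushes to zero as acceptable (a standalone helper, not Hypothesis's actual internals; the fallback tolerance is illustrative):

import math

# Smallest positive normal values per IEEE 754.
SMALLEST_NORMAL = {"float32": 2.0 ** -126, "float64": 2.0 ** -1022}

def values_match(val, val_0d, dtype_name):
    """True if the module's 0-d value is an acceptable representation of `val`."""
    if val == val_0d:
        return True
    # Tolerate flush-to-zero: a subnormal input may legitimately come back as 0.
    if val_0d == 0.0 and abs(val) < SMALLEST_NORMAL[dtype_name]:
        return True
    return math.isclose(val, val_0d, rel_tol=1e-6)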

Compare numerical functions against an arbitrary precision library

See data-apis/array-api#29.

We could compare numerical functions against an arbitrary precision library. The question is how off something should be for it to be a failure, but at least we can report the largest deviation (hypothesis makes this straightforward).

mpmath is a good option, as it is pure Python and well tested as it is used inside of SymPy.

One technical issue is that mpmath's arbitrary precision floats have infinite range, unlike machine floats which overflow and underflow. As far as I can remember, that is the main difference between an mpmath.mpf with dps=15 and a machine float, but there may be other differences as well that I'm not remembering.
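A minimal sketch of the comparison, assuming numpy.array_api as the module under test (the function and input values are illustrative):

import math
import mpmath
import numpy.array_api as xp

mpmath.mp.dps = 30  # work at much higher precision than float64

def max_abs_error(xp_func, mp_func, values):
    """Largest absolute deviation of xp_func from the mpmath reference."""
    worst = 0.0
    for v in values:
        got = float(xp_func(xp.asarray(v, dtype=xp.float64)))
        ref = float(mp_func(mpmath.mpf(v)))
        if math.isfinite(got) and math.isfinite(ref):
            worst = max(worst, abs(got - ref))
    return worst

print(max_abs_error(xp.sin, mpmath.sin, [i / 10 for i in range(100)]))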

Split the test suite for spec versions

The 2021 version of the spec is going to be finalized soon. The test suite isn't finished yet. However, we want to have a version of the test suite that runs only against 2021, so that people can run the suite without having to worry about features that are in the 2022 draft.

We should figure out some way to enable or disable test suite features. It would also be nice to be able to do this for the extensions, so that people can easily run the test suite without the linear algebra extension, for instance. Perhaps the best way to do this is using pytest marks? Another idea would be to split the parts of the suite into submodules so that people can run pytest array_api_tests/2021 or pytest array_api_tests/2021/extensions/linear_algebra. For vendoring people could remove any submodules that aren't relevant.

I'm guessing pytest marks will be better, but I need to investigate them to see how well suited they are for this.
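For reference, a sketch of how the marks approach might look in a conftest.py; the mark and environment variable names here are hypothetical:

import os
import pytest

SPEC_VERSION = os.environ.get("ARRAY_API_TESTS_VERSION", "2021")  # hypothetical variable

def pytest_collection_modifyitems(config, items):
    skip_2022 = pytest.mark.skip(reason="2022 draft feature; running against the 2021 spec")
    for item in items:
        if SPEC_VERSION == "2021" and "min_version_2022" in item.keywords:
            item.add_marker(skip_2022)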

Skip rather than xfail tests in workflows

Currently our NumPy workflow has a flaky test in test_creation_functions.py::test_linspace, which means sometimes we'll get an XFAIL and sometimes an XPASS. The random-ish nature of Hypothesis means this is often a possibility for bugs the test suite identifies. Therefore I think instead of xfailing tests for a workflow, we should skip them completely. We could also mix-and-match, but I think it's best to set a precedent that's simple for outsiders to understand and use.

Specify dependency of each test case

The test suite is mostly made up of test methods for each function (or array object method) in the spec. The majority of these tests require other functions to do everything we want, which is problematic when an underlying function does not work—you'll get a lot of spammy errors that actually relate to a single fundamental problem... or even false positives.

So I think it would be a good idea if we declared the dependencies of each test method, i.e. the functions we use and assume have correct behaviour. We could use these declarations to create a dependency graph, which could be hooked into pytest to prioritise zero/low-dependency tests first and, by default, skip tests that use functions we've deemed incorrect. This would benefit:

  1. Array API adopters, who would much more easily see what they should prioritise developing
  2. Us, who'd be able to see areas where we can try to cut down the functions we depend on

Ideas for declaring dependencies per test method:

  • Decorator which lists function and method names (a sketch follows these lists)
  • Automatically infer dependencies via looking at function code (either searching the code as string, or using the AST)

General challenges would be:

  • How to get pytest's collection hook to do this
  • Supporting array object methods alongside the top-level functions
    • As well as operator symbols

The dependency graph + auto skipping would also allow us to:

  • Remove some mirrored functions in array_helpers
  • Remove the module stubbing mechanism
  • Check that no test uses the function it is testing for its own assertions
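A sketch of the decorator idea; the mark name is hypothetical and would need registering in the pytest config:

import pytest

def depends_on(*func_names):
    """Declare the spec functions a test relies on, so a collection hook could
    skip it whenever any of those functions is known to be broken."""
    return pytest.mark.depends_on(*func_names)

@depends_on("asarray", "all", "equal")
def test_add_sketch():
    ...  # assertions here would only use xp.asarray / xp.all / xp.equal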

Unable to disable / deselect currently unsupported dtypes

I'm currently running this test suite against PyTorch. For now uint16, uint32, and uint64 are not supported. Unfortunately, this marks a lot of tests as failing completely instead of only failing for these dtypes. For example in

@pytest.mark.parametrize('func_name',
                         [i for i in elementwise_functions.__all__ if nargs(i) > 1])
@given(shape1=shapes, shape2=shapes, dtype=data())
def test_broadcasting_hypothesis(func_name, shape1, shape2, dtype):

only the function name is treated as a parameter, whereas the shapes and the dtype are handled within the test.

To make this easier it should be possible to

  • disable unsupported dtypes in one central place, or
  • make the dtype a parameter so it can be deselected with pytest's -k flag (sketched below).
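A sketch of the second option, where the dtype becomes a visible pytest parameter; the dtype-name list here is illustrative and would come from a central dtype helper in practice:

import pytest
from hypothesis import given, strategies as st

DTYPE_NAMES = ["int8", "int16", "int32", "int64", "uint8", "uint16", "uint32",
               "uint64", "float32", "float64"]

@pytest.mark.parametrize("dtype_name", DTYPE_NAMES)
@given(data=st.data())
def test_broadcasting_sketch(dtype_name, data):
    # Because the dtype is now a parameter, unsupported dtypes can be deselected
    # with e.g. `pytest -k "not (uint16 or uint32 or uint64)"`.
    ...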

Broadcasting tests

Here is the spec for broadcasting https://github.com/data-apis/array-api/blob/master/spec/API_specification/broadcasting.md.

Here are some questions about it:

  • How do I create input arrays for the test? The array object document is empty https://github.com/data-apis/array-api/blob/master/spec/API_specification/array_object.md. Do we have at least enough of an idea of what it will look like so I can create tests?

  • What is the best way to test broadcasting? The simplest would be to use a function like numpy.broadcast_arrays or numpy.broadcast_to, but these aren't listed in the spec. And even NumPy doesn't have a function that directly implements the shape broadcasting algorithm—it can only be applied to explicit arrays (a shape-only sketch follows this list). The spec says broadcasting should apply to all elementwise operations. What is a good elementwise operation that we can use to test only the broadcasting semantics? Or should we make sure to test all of them?

  • The spec doesn't actually specify how the resulting broadcast array should look, only what its shape is. Is this intentional? Should we test this? If not, it means we don't actually test the result of a broadcast operation, only that the shape/errors are correct.

  • As I understand it, "potentially enable more memory-efficient element-wise operations" means that broadcasting does not necessarily need to be done in a memory-efficient way, i.e., libraries are free to copy axes across broadcast dimensions rather than using something like a stride trick.
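Regarding the second point, the shape-only broadcasting algorithm from the spec is simple enough to implement directly as a test oracle; a sketch:

from itertools import zip_longest

def broadcast_shapes(shape1, shape2):
    """Shape-only broadcasting, following the spec's right-to-left algorithm."""
    result = []
    for d1, d2 in zip_longest(reversed(shape1), reversed(shape2), fillvalue=1):
        if d1 == 1:
            result.append(d2)
        elif d2 == 1 or d1 == d2:
            result.append(d1)
        else:
            raise ValueError(f"shapes {shape1} and {shape2} are not broadcastable")
    return tuple(reversed(result))

assert broadcast_shapes((3, 1), (2, 1, 4)) == (2, 3, 4)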

Newer versions of PyCharm obfuscate errors found by Hypothesis

I frequently get stack traces such as the following when running tests. These traces don't include the failing lines of code, which is preventing me from being able to find the problems.

ivy_tests/test_array_api/array_api_tests/test_operators_and_elementwise_functions.py::test_isfinite FAILED [100%]
ivy_tests/test_array_api/array_api_tests/test_operators_and_elementwise_functions.py:908 (test_isfinite)
@given(xps.arrays(dtype=xps.numeric_dtypes(), shape=hh.shapes()))
def test_isfinite(x):
E hypothesis.errors.MultipleFailures: Hypothesis found 3 distinct failures.
ivy_tests/test_array_api/array_api_tests/test_operators_and_elementwise_functions.py:910: MultipleFailures

This issue may be better suited in the Hypothesis repo, but thought I would add here in case others using the array api tests run into the same issue when trying to debug the tests. Is there a good way to find the problematic code from this stack trace?

Avoid using boolean array indexing in special case tests

Boolean array indexing is optional in the spec since the output shape is data dependent. Libraries like Dask create output arrays where the shape is unknown, which causes the special case tests to fail since they use masking in their implementation. It would be good if these tests could avoid using boolean array indexing so the special cases can be checked in Dask.

Update dtype strategies for elementwise/operators tests

Many of the elementwise/operator tests were written a while back, and there have been a lot of updates since regarding which dtypes are allowed. This means we currently don't test all possible inputs!

A bug found today by @IvanYashchuk was that numpy.array_api.pow() doesn't work for integer arrays, probably because the test suite was not updated alongside the spec.

So I'll need to review all the test methods to identify which aren't using all valid dtypes, and subsequently update them.

Clean way to optionally enable/disable extensions

Now that we are adding more tests for the linalg extension, and we will soon have other extensions as well, we need to have a clean way for modules to not implement these extensions and not fail the test suite. Some test files mix extension and non-extension tests, e.g., the signature tests currently test all signatures.

This is closely related to #20.
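A sketch of one possible mechanism, a command-line option handled in conftest.py; the option and marker names are hypothetical:

import pytest

def pytest_addoption(parser):
    parser.addoption("--disable-extension", action="append", default=[],
                     help="skip tests for the named extension (e.g. linalg)")

def pytest_collection_modifyitems(config, items):
    disabled = set(config.getoption("--disable-extension"))
    for item in items:
        for ext in disabled:
            if item.get_closest_marker(f"xp_extension_{ext}"):
                item.add_marker(pytest.mark.skip(reason=f"{ext} extension disabled"))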

Fix `descending=True` logic in `test_argsort`

Currently NumPy just flips the result, which is erroneous when input arrays have duplicate elements (in stable sort scenarios at least). I followed the behaviour when implementing test_argsort, and thus the bug has carried over 😅

>>> x = [0, 1, 0]
>>> from numpy import array_api as xp
>>> xp.argsort(xp.asarray(x))
Array([0, 2, 1], dtype=int64)
>>> xp.argsort(xp.asarray(x), descending=True)
Array([1, 2, 0], dtype=int64)  # should be [1, 0, 2], but we incorrectly say this is okay

cc @pmeier
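For reference, the expected stable descending result can be computed directly with Python's sort, which test_argsort could use as its oracle (a sketch; Python's sort is stable even with reverse=True, so ties keep their original order):

x = [0, 1, 0]
expected = sorted(range(len(x)), key=x.__getitem__, reverse=True)
assert expected == [1, 0, 2]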

Add type promotion tests for non-elementwise functions

Currently the type promotion tests only test elementwise functions and array operators. But there are several other functions in the spec that also participate in type promotion. We should add them to the tests. This should be a question of just going through the parameterizations and adding any function not in the elementwise section that should participate in type promotion. We will need to make some minor generalizations to the tests as some functions only type promote on certain arguments (for example where has three arguments but only type promotes on the second and third). It may be a good idea to look at #21 before or concurrently with this.

Testing keyword argument defaults

We need to make sure all the function tests test the default values for keyword arguments, that is, calling the function without passing the keyword argument. This explicitly isn't tested in the signature tests because those tests aren't sophisticated enough to check that the correct default value is used.

I think we can abstract this somewhat in the hypothesis helpers. Something like

from hypothesis.strategies import just, one_of

class DefaultValue:
    """
    Stand-in value for a keyword argument not being passed
    """

def kwarg(default, kwarg_values):
    # Draw the sentinel (omit the kwarg), the default explicitly, or an arbitrary value.
    return one_of(just(DefaultValue), just(default), kwarg_values)

with a strategy helper kwarg that would work like

@given(x=arrays(), offset=kwarg(0, integers()))
def test_diagonal(x, offset):
    if offset is DefaultValue:
        res = diagonal(x)
        offset = 0
    else:
        res = diagonal(x, offset=offset)
 
    # Test behavior here

Maybe we could even abstract this further by generating the kwargs from the function stubs. I'm not sure. The stubs won't contain enough information to infer what strategies values should be drawn from (even with type hints, they won't contain things like bounds or specific sets of dtypes), but they do have the default values.

Test unstable sorts correctly

@pmeier brought to my attention that test_argsort is too strict for unstable sorts. This also goes for test_sort, although there's actually no way to distinguish equal elements after they've been sorted anyway. The suite should instead test that the indices produced by an unstable sort belong to the set of all valid orderings.
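A sketch of the looser property, in plain Python for clarity:

def check_argsort_result(x, out, descending=False):
    """For an unstable sort, only require that `out` is a permutation of the
    indices and that gathering with it yields a correctly ordered sequence."""
    assert sorted(out) == list(range(len(x))), "out is not a permutation of the indices"
    gathered = [x[i] for i in out]
    assert gathered == sorted(x, reverse=descending), "gathering with out is not sorted"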

test_prod failing for PyTorch due to unsupported dtypes

Despite the useful updates referenced in this issue, the prod test is still failing on this line with a PyTorch backend, even with FILTER_UNDEFINED_DTYPES = True and flag -k "not (uint16 or uint32 or uint64)"

The error is: AssertionError: out.dtype=uint8, but should be uint32 [prod(uint8)]

A PR, which had failing CI as a result, is here.

A short discussion on this topic occurred here.

Make two_mutual_arrays accept a dtypes strategy as input

two_mutual_arrays accepts a list of dtype objects rather than a strategy for generating dtypes. This is somewhat annoying, as everything else uses the dtypes strategies.

An issue with this is that it currently ignores the filtering that is done in the dtype strategies. This makes any test using it fail in PyTorch, which doesn't have uint8.

If we can make it accept a dtypes strategy, that would be great. We may just need to have an internal mapping of dtype strategies to promotable dtype strategies. Or maybe we can just draw a dtype, then draw another dtype that is promotion-compatible with it.
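A sketch of the "draw a dtype, then a promotable partner" idea, assuming a promotion_table mapping each dtype to the dtypes it can promote with (a hypothetical helper):

from hypothesis import strategies as st

def mutually_promotable_dtypes(dtype_strategy, promotion_table):
    return dtype_strategy.flatmap(
        lambda d: st.tuples(st.just(d), st.sampled_from(promotion_table[d]))
    )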

`test_concat` is wrong

While running the test suite against PyTorch, I got this error:

Falsifying example: test_concat(
    dtypes=(torch.int16, torch.int8), kw={'axis': -2}, data=data(...),
)
Draw 1 (x1): tensor([[1],
        [1]], dtype=torch.int16)
Draw 2 (x2): tensor([[0]], dtype=torch.int8)
array_api_tests/test_manipulation_functions.py:112: in test_concat
    ph.assert_0d_equals(
array_api_tests/pytest_helpers.py:197: in assert_0d_equals
    assert x_val == out_val, msg
E   AssertionError: out[(1, 0)]=1, should be x2[0, :][(0,)]=0 [concat(axis=-2)]

Repro:

import torch

x1 = torch.tensor([[1], [1]], dtype=torch.int16)
print(x1)
x2 = torch.tensor([[0]], dtype=torch.int8)
print(x2)
out = torch.concat([x1, x2], dim=-2)
print(out)

This prints:

tensor([[1],
        [1]], dtype=torch.int16)
tensor([[0]], dtype=torch.int8)
tensor([[1],
        [1],
        [0]], dtype=torch.int16)

The error happens because the test incorrectly checks x2[0] against out[1], when it should check x1[1], since x1 has two elements.

Support vendoring this test suite by removing need for `array-api` submodule

In #104 we introduce a git submodule of the array-api repo, which going forward is required for the test suite to function (e.g. special case tests). This is not ideal for some vendoring use cases where array libraries such as NumPy might like to ship the test suite in their own repo, and a git submodule would be a big nuisance to that end.

There are two current options I see:

  1. Hard-code the signature files à la the ones we generated with generate_stubs.py. A pre-commit hook could somewhat alleviate the pain of an out-of-sync repo, which was a common problem before.
  2. Allow the user to specify somewhere (variable? env? package?) where the signatures folder/package is, so someone vendoring can vendor the spec too and use that.

Parameterizing the tests across different modules

The test suite itself should be something that works for any module. So we need to be able to parameterize the module.

@rgommers's suggestion at https://github.com/Quansight/pydata-apis/issues/13:

So something like from ._module_under_test.py import mod, then use mod.<func_name> for everything.

I think we also want some way to specify it without editing a file, as that will make things easier for local development of the test suite. Maybe by specifying some environment variable?
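A sketch of the environment-variable approach (the variable name is illustrative):

import os
from importlib import import_module

# e.g. ARRAY_API_TESTS_MODULE=numpy.array_api pytest
mod = import_module(os.environ.get("ARRAY_API_TESTS_MODULE", "numpy.array_api"))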

Testing function signatures

For example, for https://github.com/data-apis/array-api/blob/master/spec/API_specification/elementwise_functions.md. The spec specifies:

  • Positional parameters must be positional-only parameters. Positional-only parameters have no externally-usable name. When a function accepting positional-only parameters is called, positional arguments are mapped to these parameters based solely on their order.
  • Optional parameters must be keyword-only arguments.

It also specifies a signature for each function.

However, I don't know how to test these things. inspect.signature doesn't work for NumPy ufuncs, and presumably won't work for many other objects implemented in C. Such things could probably make themselves work with it by defining __signature__, but do we want to require this?

Otherwise, I don't think positional-only parameters can be tested. I can probably test other aspects of the signature, like the number of positional arguments, by checking that the wrong number of arguments leads to a TypeError.

(also as an aside, out is not keyword-only for NumPy functions)
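A sketch of a behavioural check that avoids inspect.signature entirely: a positional-only parameter should be rejected when passed by keyword (the helper and parameter names are illustrative):

import pytest

def check_positional_only(func, value, param_name):
    func(value)  # positional call must work
    with pytest.raises(TypeError):
        func(**{param_name: value})  # keyword call should be rejected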

Error messages should print arrays

I found for sort tests that out[<idx>]=foo is useful, but it'd be nicer to also see the input and out arrays after a newline. I should review the pytest helpers so we display this information for all tests.

Setup some CI

It would be good to have some kind of CI here. For starters, we should check that the stub generation works. The CI will need access to clone the private spec repo. What is the best option for this?

Alternately, if the repo will go public soon, we can wait for that if it will make things easier.

Installable as a package?

I don't know if it would be beneficial to have the test suite be installable as a Python package (and to release it on PyPI).

For now, it is not installable. You can run the suite by cloning the repo and running pytest from the repo root. If you are an array library maintainer and having the test suite installable would make things easier for you, please let us know.

Pretty print indices

Right now indices are usually printed as tuples e.g. out[(1, 3)], but it'd be nice if they were reduced e.g. out[1, 3]. I should review pytest_helpers.py at least to do this with a new util. This goes for ellipsis and slices too.
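A sketch of such a util (a hypothetical helper for pytest_helpers.py):

def fmt_idx(name, idx):
    """Format ('out', (1, 3)) as 'out[1, 3]', handling Ellipsis and slices too."""
    def fmt_one(i):
        if i is Ellipsis:
            return "..."
        if isinstance(i, slice):
            start = "" if i.start is None else i.start
            stop = "" if i.stop is None else i.stop
            step = "" if i.step is None else f":{i.step}"
            return f"{start}:{stop}{step}"
        return str(i)
    if not isinstance(idx, tuple):
        idx = (idx,)
    return f"{name}[{', '.join(fmt_one(i) for i in idx)}]"

fmt_idx("out", (1, 3))                      # 'out[1, 3]'
fmt_idx("out", (Ellipsis, slice(None, 2)))  # 'out[..., :2]'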

Add simple example of running with different frameworks in the README

While the NumPy tests need to be run like so

import numpy.array_api as array_module

array_api is not an attribute of other modules such as torch or tensorflow. Presumably these should be run like so:

import torch as array_module

import tensorflow as array_module

A couple of sentences stating this in the README might be helpful? Or, if other frameworks are not officially supported yet (I noticed there is only a GitHub workflow for NumPy), then stating that in the README would also be helpful. It isn't immediately obvious to me how I should run these tests for different backend frameworks. Any help appreciated!

Clean up the data in test_type_promotion.py

Everything that's not a test in test_type_promotion.py is a mess right now. There are tons of data dictionaries and helper functions. Quite a few of them are needed throughout the test suite, so they should be moved to a more central location.

Furthermore, the data has a lot of redundancies, and even so, to use it you often have to do tons of dict lookups to get at what you want. I think the biggest problem here is that I used the string values of the dtypes as the main key in the dictionaries, meaning you always have to translate those to actual dtype objects to use them. We should just use the actual dtype objects instead. The only concern here is that the dtype objects might not be hashable, but I don't think this is an actual problem in practice. If it somehow does become a problem for some library, we can worry about it then, and whether it should be worked around in the suite or if we should just require hashability in the spec (we already require equality).

Finally, the parameterization logic for the tests in test_type_promotion.py is a bunch of hard-to-read list comprehensions. They may become easier to read once we refactor the data. If not, we should clean them up as well. So, to summarize:

  • Move the reusable data and helper functions to a central location (I'm thinking a new file, like dtype_helpers.py). The input and output type mappings should also be moved, as they are also useful outside of just these tests.
  • Change the data to be keyed on the dtype objects themselves. I don't actually care if we keep the "i4" type names anywhere. Those are used in the spec, but they aren't actually that clear. I would rather we just use the dtype name itself, like "int32", everywhere, even in the places where the string names are currently used (I think it's only in the type promotion parameterization keys).
  • Other refactors like renaming variables to better names are also welcome.
  • Clean up the parameterization logic in test_type_promotion.py

Test explicitly that constants are Python scalars

data-apis/array-api#169 clarified that the constants e, inf, nan, and pi should be Python scalars and not just behave like one. Our current tests

def test_e():
    # Check that e acts as a scalar
    E = full((1,), e, dtype=float64)
    # We don't require any accuracy. This is just a smoke test to check that
    # 'e' is actually the constant e.
    assert all(less(abs(E - 2.71), one((1,), dtype=float64))), "e is not the constant e"

don't account for that.
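An explicit check could look something like this (a sketch, assuming the module under test is bound to xp, here numpy.array_api):

import math
import numpy.array_api as xp

def test_e_is_a_python_scalar():
    assert isinstance(xp.e, float), "e should be a Python float, not an array"
    assert math.isclose(xp.e, math.e, rel_tol=1e-9)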

Tests for operators

Every __operator__ method has a corresponding function which should have the same behavior, meaning the tests for those operators should be able to just reuse the test for the function. We should figure out a clean way to do this. Maybe it's just a question of parameterizing over the function so that something like test_add will parameterize over add(), x1.__add__ and x1.__radd__. There's also the complication of the in-place operators (x1.__iadd__), which work just like the operators except the input shape and dtypes are more restricted (the result shape and dtype must be the same as the lhs). This might just be a case of only testing op= when the inputs would be legal.

These are already split out in the type promotion tests (which are currently the only tests that test the operator methods). I don't know if those tests also need to be merged. If they can be merged simply they can be, but otherwise it is fine to keep them separate as they are now.
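A sketch of the parameterization idea for add; the lambdas and ids are illustrative:

import pytest

add_variants = [
    pytest.param(lambda xp, x1, x2: xp.add(x1, x2), id="add"),
    pytest.param(lambda xp, x1, x2: x1.__add__(x2), id="__add__"),
    pytest.param(lambda xp, x1, x2: x2.__radd__(x1), id="__radd__"),
]

@pytest.mark.parametrize("do_add", add_variants)
def test_add_sketch(do_add):
    ...  # generate x1/x2 with Hypothesis and assert the shared add() behaviour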

Test type promotion in elementwise/operator tests

@asmeurer has already said to me that he found the granular type promotion tests useful when developing numpy.array_api, but I think it'd be nice if the primary tests also did this anyway.

  • An implementer might rely on the primary test when developing an elwise/op function, so end up missing the type promotion errors.
  • xfailing test_type_promotion.py tests is annoying by way of how it parametrizes (e.g. I'm finding this for #49), so it'd be nice if we could have a CI job that only runs the primary tests (as they should implicitly cover the type promotion and signature tests).

Meta tests

Given the complexity of some of the tests, it would be a good idea to test that they are actually testing what we expect. This entails two things

  • Faking out modules with all the known errors and making sure that the corresponding test fails as expected
  • Faking out a module that doesn't give any errors

I don't actually know how to do the first one. Can pytest be used as a library, to run just a single test?

For the second one, I don't know if we can actually do it. It would amount to writing a module that actually conforms to the spec. Such an endeavor might be out of scope for this project.
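On the first question: pytest can be used as a library via pytest.main(), so running one test against a deliberately broken fake module could look roughly like this (the module name, test id, and environment variable are illustrative):

import os
import pytest

os.environ["ARRAY_API_TESTS_MODULE"] = "fake_broken_module"  # hypothetical selection mechanism
exit_code = pytest.main(["array_api_tests/test_creation_functions.py::test_linspace", "-x"])
assert exit_code != 0, "expected the test to fail against the broken module"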

symmetric_matrices function not creating matrices as it should

I was implementing the eigvalsh function in a repository following the Array API standard and had some issues when testing, because of the symmetric_matrices function.

The first thing is that infinite values are being allowed on the creation of the matrices. This is really weird because the elements dictionary is being constructed with allow_infinity : False and passed as a parameter to the creation function, but the resulting matrices are getting infinite values nonetheless.

The other thing is that the current logic for producing a symmetric matrix is to take the upper triangle (diagonal=0) of a random matrix, take the upper triangle with diagonal=1, and add the first to the transpose of the second. This creates an issue when there are more than two dimensions: the transpose operation takes all dimensions into account, not only the last two, so at the end of the function it can attempt to create a matrix that is not of shape (..., N, N). One example I came across was a tensor of shape (0, 2, 2); after transposing, its shape became (2, 2, 0), and returning the sum of the two crashed with a broadcasting error.
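For reference, a sketch of building stacked symmetric matrices that only transposes the last two axes (assuming numpy.array_api, where matrix_transpose is available in the main namespace):

import numpy.array_api as xp

def symmetrize(a):
    """Symmetrize a stack of square matrices without touching batch dimensions."""
    upper = xp.triu(a)              # upper triangle including the diagonal
    strict_upper = xp.triu(a, k=1)  # upper triangle excluding the diagonal
    return upper + xp.matrix_transpose(strict_upper)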

Inconsistent arguments in some helper functions

For example, in assert_dtype the in_dtype comes before out_dtype:

def assert_dtype(
    func_name: str,
    in_dtype: Union[DataType, Sequence[DataType]],
    out_dtype: DataType,
    expected: Optional[DataType] = None,
    *,
    repr_name: str = "out.dtype",
):

whereas in assert_keepdimable_shape the in_shape comes after out_shape:

def assert_keepdimable_shape(
    func_name: str,
    out_shape: Shape,
    in_shape: Shape,
    axes: Tuple[int, ...],
    keepdims: bool,
    /,
    **kw,
):

By the way, it would be useful to have small docstrings for the various helper functions.

Linalg stacking tests should not test exact equality

The following two failures show that the exact equality test in _test_stacks should be relaxed, as the results agree within some tolerance that's both hardware and software dependent:

__________________________________________________________________ test_eigh __________________________________________________________________

>   ???

test_linalg.py:217: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_linalg.py:232: in test_eigh
    _test_stacks(lambda x: linalg.eigh(x).eigenvalues, x,
test_linalg.py:61: in _test_stacks
    assert_exactly_equal(res_stack, decomp_res_stack)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

x = Array([0.       , 1.9999999], dtype=float32), y = Array([0., 2.], dtype=float32)

    def assert_exactly_equal(x, y):
        """
        Test that the arrays x and y are exactly equal.
    
        If x and y do not have the same shape and dtype, they are not considered
        equal.
    
        """
        assert x.shape == y.shape, f"The input arrays do not have the same shapes ({x.shape} != {y.shape})"
    
        assert x.dtype == y.dtype, f"The input arrays do not have the same dtype ({x.dtype} != {y.dtype})"
    
>       assert all(exactly_equal(x, y)), "The input arrays have different values"
E       AssertionError: The input arrays have different values

array_helpers.py:181: AssertionError
----------------------------------------------------------------- Hypothesis ------------------------------------------------------------------
Falsifying example: test_eigh(
    x=Array([[[1., 1.],
            [1., 1.]]], dtype=float32),
)
________________________________________________________________ test_eigvalsh ________________________________________________________________

>   ???

test_linalg.py:241: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_linalg.py:248: in test_eigvalsh
    _test_stacks(linalg.eigvalsh, x, res=res, dims=1)
test_linalg.py:61: in _test_stacks
    assert_exactly_equal(res_stack, decomp_res_stack)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

x = Array([0.       , 1.9999999], dtype=float32), y = Array([0., 2.], dtype=float32)

    def assert_exactly_equal(x, y):
        """
        Test that the arrays x and y are exactly equal.
    
        If x and y do not have the same shape and dtype, they are not considered
        equal.
    
        """
        assert x.shape == y.shape, f"The input arrays do not have the same shapes ({x.shape} != {y.shape})"
    
        assert x.dtype == y.dtype, f"The input arrays do not have the same dtype ({x.dtype} != {y.dtype})"
    
>       assert all(exactly_equal(x, y)), "The input arrays have different values"
E       AssertionError: The input arrays have different values

array_helpers.py:181: AssertionError
----------------------------------------------------------------- Hypothesis ------------------------------------------------------------------
Falsifying example: test_eigvalsh(
    x=Array([[[1., 1.],
            [1., 1.]]], dtype=float32),
)
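A sketch of a tolerance-based replacement for the exact check on the 1-D eigenvalue stacks (the tolerances are illustrative and would likely need to depend on the dtype):

import math

def assert_close_1d(x, y, rel_tol=1e-5, abs_tol=1e-6):
    assert x.shape == y.shape, f"shapes differ ({x.shape} != {y.shape})"
    assert x.dtype == y.dtype, f"dtypes differ ({x.dtype} != {y.dtype})"
    for i in range(x.shape[0]):
        a, b = float(x[i]), float(y[i])
        assert math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol), f"{a} != {b} at index {i}"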
