data-apis / array-api

RFC document, tooling and other content related to the array API standard

Home Page: https://data-apis.github.io/array-api/latest/

License: MIT License

Makefile 0.07% Python 99.61% HTML 0.02% JavaScript 0.30% CSS 0.01%

pydata standard spec

array-api's Issues

Should finfo have smallest normal/subnormal attributes?

Following up on gh-129: we removed the tiny attribute from finfo because it's badly named. In the discussion @kgryte explained its purpose and proposed instead to add two new attributes: smallest_normal and smallest_subnormal. That does sound like a good idea. However, it'd be good to propose adding those attributes to numpy.finfo first: they are not existing API, and if NumPy doesn't want to extend finfo, then we're probably better off leaving them out.
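For intuition, here is what the two proposed attributes would hold for IEEE 754 binary64 (Python's `float`). This is a standard-library sketch that assumes Python 3.9+ for `math.ulp`. The variable names only mirror the proposed attribute names.

```python
import math
import sys

# IEEE 754 binary64 (Python's float): emin = -1022, 52 fraction bits.
smallest_normal = sys.float_info.min   # 2**-1022, the value finfo.tiny reports
smallest_subnormal = math.ulp(0.0)     # 2**-1074, spacing of floats just above zero

print(smallest_normal)     # 2.2250738585072014e-308
print(smallest_subnormal)  # 5e-324
```

Subnormals fill the gap between zero and `smallest_normal`, with progressively less precision. That is why a single, ambiguously named `tiny` is hard to interpret.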

Some issues with type annotations

Some issues with type annotations that were brought up by @BvB93 at numpy/numpy#18585

Relevant array/tensor libraries

This issue is meant to collect libraries that we should be aware of and perhaps take into account (data on how their API looks, impact of choices on those libraries, etc.).

Array and tensor libraries (TODO: classify main characteristics):

And related projects (accelerators, runtimes, compiler infrastructure, etc.):

Arguments need not be strictly positional or keyword-only for creation and manipulation functions

It makes sense to require positional only arguments for functions like add() where there are no meaningful names, and likewise to require keyword arguments for true options.

However, this is less useful for most creation and manipulation functions. For example, arange currently has the signature arange(start, /, *, stop=None, step=1, dtype=None), but readable code could pass both start and stop as either positional or keyword arguments, e.g., np.arange(start, stop) and np.arange(start=0, stop=10).

I would suggest revisiting all of these functions and allowing arguments to be positional and keyword based when appropriate. Here are my suggestions off-hand:

arange(start, /, *, stop=None, step=1, dtype=None) -> arange(start, stop=None, *, step=1, dtype=None)
empty(shape, /, *, dtype=None) -> empty(shape, dtype=None)
full(shape, fill_value, /, *, dtype=None) -> full(shape, fill_value, *, dtype=None)
linspace(start, stop, num, /, *, dtype=None, endpoint=True) -> linspace(start, stop, num, *, dtype=None, endpoint=True)
ones and zeros should match empty
expand_dims(x, axis, /) -> expand_dims(x, /, axis) or even expand_dims(x, /, *, axis) to match other manipulation functions
reshape(x, shape, /) -> reshape(x, /, shape)
roll(x, shift, /, *, axis=None) -> roll(x, /, shift, *, axis=None)
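Plain Python functions show the practical difference between the two signature styles. `arange_current` and `arange_proposed` are hypothetical stand-ins that only echo their arguments. They mirror the current and proposed `arange` signatures listed above.

```python
# Hypothetical stand-ins: the bodies just echo their arguments so the
# calling conventions can be compared.
def arange_current(start, /, *, stop=None, step=1, dtype=None):
    return (start, stop, step, dtype)

def arange_proposed(start, stop=None, *, step=1, dtype=None):
    return (start, stop, step, dtype)

# Both call styles from the issue work with the proposed signature.
print(arange_proposed(0, 10))
print(arange_proposed(start=0, stop=10))

# The current signature rejects both: stop is keyword-only, and start is
# positional-only.
for call in (lambda: arange_current(0, 10),
             lambda: arange_current(start=0, stop=10)):
    try:
        call()
    except TypeError as exc:
        print("TypeError:", exc)
```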

update Sphinx theme colors to new logo/colorscheme

May as well do this before making things public, should be very quick. The green and dark blue will look better than the bright purple that's used now.

device support

For array creation functions, device support will be needed, unless we intend to only support operations on the default device. Otherwise, any function that creates a new array (e.g., creating the output array with empty() before filling it with the results of some computation) will place that array on the default device, and an exception will be raised if an input array is on a non-default device.

We discussed this in the Aug 27th call, and the preference was to do something PyTorch-like, perhaps a simplified version to start with (we may not need the context manager part), as the most robust option. Summary of some points that were made:

TensorFlow has an issue where its .shape attribute is also a tensor, and that interacts badly with its context manager approach to specifying devices - because metadata like .shape typically should live on the host, not on an accelerator.
PyTorch uses a mix of a default device, a context manager, and device= keywords
JAX also has a context manager-like approach; it has a global default that can be set, and then pmaps can be decorated to override that. The difference from other libraries that use a context is that JAX is fairly (too) liberal about implicit device copies.
It'd be best for operations where data is not all on the same device to raise an exception. Implicit device transfers are making it very hard to get a good performance story.
Propagating device assignments through operations is important.
Control over where operations get executed is important; trying to be fully implicit doesn't scale to situations with multiple GPUs.
It may not make sense to add syntax for device support for libraries that only support a single device (i.e., CPU).
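The toy sketch below makes three of the points above concrete:

- creation functions take a `device=` keyword;
- outputs inherit the device of their inputs;
- mixing devices raises an error instead of copying implicitly.

`ToyArray`, `empty`, `add` and the device strings are hypothetical. They do not reflect PyTorch's or any other library's actual API.

```python
# Hypothetical toy library: all names here are illustrative only.
DEFAULT_DEVICE = "cpu"

class ToyArray:
    def __init__(self, data, device=DEFAULT_DEVICE):
        self.data = list(data)
        self.device = device

def empty(n, *, device=None):
    # Creation functions accept device=; without it they use the default device.
    return ToyArray([0] * n, device=device or DEFAULT_DEVICE)

def add(x, y):
    # No implicit transfers: inputs on different devices are an error.
    if x.device != y.device:
        raise ValueError(f"arrays on different devices: {x.device!r} vs {y.device!r}")
    # Allocate the output on the inputs' device (device propagation), not on
    # the default device, which is exactly the empty() pitfall described above.
    out = empty(len(x.data), device=x.device)
    out.data = [a + b for a, b in zip(x.data, y.data)]
    return out
```

With `a` and `b` both on `"gpu:0"`, `add(a, b).device` is `"gpu:0"`. Adding `a` to a default-device array raises `ValueError`.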

Links to the relevant docs for each library:

Next step should be to write up a proposal for something PyTorch-like.

Extra space after every inline code

In the rendered spec, there is a weird formatting issue. Every piece of inline code includes an extra space after it. For example, here:

It looks like:

If x_i is NaN , the result is NaN .

instead of

If x_i is NaN, the result is NaN.

If there is already a space after the code, it is rendered normally; if there isn't, one is added. The added spaces are also present when you copy and paste the text.

I'm not sure why this is happening, or whether it is an issue with MyST, our theme, or something else.

keepdims argument to argmin() and argmax()

The argmin and argmax functions require the keepdims keyword argument, same as min and max. However, in NumPy, min and max have this keyword argument, but argmin and argmax do not. We should confirm whether this keyword argument actually makes sense for these functions, or whether it was just added to these functions by mistake because they are also in the non-arg variants.

The description for keepdims is as follows:

keepdims : bool

If True, the reduced axes (dimensions) must be included in the result as singleton dimensions, and, accordingly, the result must be compatible with the input array (see Broadcasting). Otherwise, if False, the reduced axes (dimensions) must not be included in the result. Default: False.

Perhaps the reason NumPy doesn't implement this for the arg* functions is that they return indices, so maintaining broadcastability is not important. The documentation for max says (emphasis added):

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
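One possible reason to keep keepdims on the arg* functions: indices that keep the reduced axis as a singleton line up with the input. They can then be passed straight to NumPy's `take_along_axis`. The sketch below emulates `keepdims=True` for argmax with `expand_dims`, so it works whether or not a given NumPy version accepts `keepdims` in argmax.

```python
import numpy as np

x = np.array([[1, 5, 2],
              [7, 0, 3]])

# Reduce along axis 1, then re-insert the reduced axis as a singleton
# dimension: the shape keepdims=True would produce for argmax.
idx = np.expand_dims(np.argmax(x, axis=1), axis=1)   # shape (2, 1)

# With the singleton axis in place, the indices broadcast against x and can
# be used directly to gather the maxima.
maxima = np.take_along_axis(x, idx, axis=1)          # shape (2, 1)
print(idx.ravel(), maxima.ravel())  # [1 0] [5 7]
```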

Dtype / checking standardization

In the current specification, there are no standardized data type objects, and no specification of what a data type object needs to implement: https://data-apis.org/array-api/latest/API_specification/data_types.html

Additionally, there are no APIs in the specification for checking dtypes, i.e., something like np.issubdtype, which is somewhat commonly used in code that looks something like:

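import numpy as np

def dispatch_based_on_dtype(array):
    if np.issubdtype(array.dtype, np.integer):
        return ...  # integer-specific code path
    elif np.issubdtype(array.dtype, np.floating):
        return ...  # floating-point-specific code path
    else:
        return ...  # fallback for all other dtypes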

cc @jakirkham as I believe this pattern is used quite a bit in Dask which would presumably want to be able to target arbitrary array objects under the hood
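Without a standardized check, a library targeting arbitrary array objects could fall back on the dtype objects exposed by the array's namespace. The sketch below is one way to do that. `dtype_kind` and `_KINDS` are hypothetical. The sketch assumes the namespace exposes dtype objects under the names used in the spec's data type list, and that those objects compare with `==`.

```python
import numpy as np

# Hypothetical helper: names follow the spec's data type list; we assume the
# namespace exposes these dtype objects and that they compare with ==.
_KINDS = {
    "integer": ("int8", "int16", "int32", "int64",
                "uint8", "uint16", "uint32", "uint64"),
    "floating": ("float32", "float64"),
}

def dtype_kind(xp, dtype):
    """Classify dtype using only dtype objects found on the namespace xp."""
    for kind, names in _KINDS.items():
        for name in names:
            candidate = getattr(xp, name, None)
            if candidate is not None and dtype == candidate:
                return kind
    return "other"

# NumPy's top-level module happens to expose these names, so it can stand in for xp.
print(dtype_kind(np, np.arange(3).dtype))       # integer
print(dtype_kind(np, np.ones(2).dtype))         # floating
print(dtype_kind(np, np.array([True]).dtype))   # other
```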

tracking issue: array API standard document

Timeline (high-level)

Aug 31: array RFC draft complete enough for final internal review
Sep 15: array RFC published + array-api repo public

Array API standard document intermediate steps:

Aug 10:

Broadcasting (status: merged)
Mutability (status: pending discussion, see #24)
Dtypes and casting (status: merged)
API signatures: statistical reductions (status: merged)

Aug 17:

Purpose and scope (status: scope part (gh-22) merged, "how to adopt" (gh-33) under review)
Type signatures (status: merged)
Static typing (status: gh-62 for review)
Indexing (status: under review)
API signatures: linear algebra (status: first set of functions merged, others like svd and qr TODO)
API signatures: sorting (status: under review)
API signatures: searching (status: merged)
Accuracy (status: under review)
Methodology: data on existing API design & usage (status: under review)

Aug 24:

~~Array duck typing~~
API signatures: array creation (status: gh-35 merged, a few functions (mainly meshgrid) to be added)
API signatures: array manipulation (status: under review)
API signatures: set functions (status: under review)
API signatures: logical functions (status: merged)
C API (status: WIP content in issue gh-51)
Use cases (status: merged)
Assumptions (status: merged)
API signatures: math, non element-wise

Aug 31:

Array object (status: gh-53 for review)
Future API standard evolution (status: under review, see gh-37; for the actual process see https://github.com/data-apis/workgroup/issues/22)
Parallelism (status: gh-61 for review)
Data interchange mechanisms (see gh-34)
Verification - test suite (status: merged)
API: constants (status: merged)
Versioning (status: under review, see gh-37)

Added later:

Python operators, and equivalent dunder methods and library functions/methods (see gh-9)
Device support (see gh-39)

Specifying a sort order when returning unique values

As discussed in #25 (comment), we need to resolve whether, and how, we should specify a sort order when returning unique values.

A corollary issue is, if we support an optional keyword to return sorted unique values, whether we should also support specifying the sort direction (ascending vs descending).

We could limit sorting to ascending order. However, the fact that this may be considered an arbitrary restriction may lend support to not returning sorted output (ascending or descending) at all, and instead punting sorting to userland, where the sort order can be specified via sort(). On the other hand, as discussed in the OP, combining unique/sort may allow implementation performance optimizations that cannot be replicated when the two are performed as separate steps.
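For comparison, here is the two-step userland approach in NumPy. NumPy's `unique()` always returns its values in ascending order, so a descending result needs a separate step. This only illustrates the two-step pattern. It says nothing about the performance of a fused implementation.

```python
import numpy as np

x = np.array([3, 1, 2, 3, 1])

# np.unique already returns its values in ascending order ...
ascending = np.unique(x)          # [1 2 3]
# ... so descending order is a second, separate step in userland.
descending = np.flip(ascending)   # [3 2 1]
```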

Top-level https://data-apis.github.io/ URLs do not work

The top-level URLs https://data-apis.github.io/ and https://data-apis.github.io/array-api/ return a 404. It should be straightforward to make these work. The first can be done by adding a data-apis.github.io repo to this org with a simple index.html page that redirects to the main page. The second can be done similarly with an index.html on the gh-pages branch of this repo.

API specification seems to assume all arrays are floats

The description of __abs__ and especially __add__ seems to try to go into great detail about how floating point addition / abs works.

I'd argue this obscures the intent behind what is ultimately a container API, and it would be better to refer to the IEEE 754 spec (or perhaps add a separate page summarizing IEEE 754)

Generating a NumPy Array API from library usage

I have been working on auto-generating a version of the NumPy API based on its usage in downstream libraries. I am far enough along to present some end-to-end results, but I still need to run it on more examples for the results to be truly meaningful.

Here is the generated numpy module, based on running the skimage, xarray, and sklearn test suites.

Next steps

I would appreciate any feedback on the end result or the process. My next steps are to start looking for more codebases to run and analyze. If you wanna take it for a spin, please feel free to clone the repo and run it on your own codebase, and upload the results as well. I will work on adding some more instructions, but the Makefile should get you started.

Also, it would be nice to match it against the documentation data or other more curated resources. We could also experiment with hand writing a list of included functions/classes, and letting this generate signatures for us.

Broadly speaking, this can help us get a sense of what the current API usage looks like for different array libraries and so could help form the base of a proposed API spec. The JSON format is a bit verbose, but does work at describing the different forms of the APIs.

Any other ideas on where to move with this would be appreciated. Or better yet, download the data and tools yourself and see if it's useful.

How?

That prettier form is generated from a structured JSON file, which in turn is generated from the various traces of running the different test suites.

It works by using the setprofile hook to intercept every bytecode execution and peek at the stack to see whether it's a function call and, if so, what the function and arguments are. It then saves calls from some particular modules (xarray and skimage in this case) to some particular module (NumPy), ignoring the rest.
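The sketch below is a much-simplified version of the recording idea, not the tool's actual code. It relies only on the `call` events that `sys.setprofile` reports for Python-level functions. C functions arrive as separate `c_call` events and are ignored here. It also records only the callee's module and name, not argument values. `record_calls` is a hypothetical helper.

```python
import sys
from collections import Counter

def record_calls(fn, caller_module, callee_prefix):
    """Run fn() and count Python-level calls made from code in caller_module
    into modules whose name starts with callee_prefix."""
    counts = Counter()

    def profiler(frame, event, arg):
        if event != "call":          # ignore returns and C-level events
            return
        caller = frame.f_back
        if caller is None:
            return
        callee_mod = frame.f_globals.get("__name__", "")
        caller_mod = caller.f_globals.get("__name__", "")
        if caller_mod == caller_module and callee_mod.startswith(callee_prefix):
            counts[f"{callee_mod}.{frame.f_code.co_name}"] += 1

    sys.setprofile(profiler)
    try:
        fn()
    finally:
        sys.setprofile(None)         # always uninstall the hook
    return counts
```

In the real use case, `caller_module` would be a package such as xarray and `callee_prefix` would be `"numpy"`.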

For the API generation, it tries to take the union of the various types and call signatures to come up with a single signature for each function.
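The union step might look roughly like this. The sketch assumes each recorded call is stored as a tuple of positional argument values plus a dict of keyword arguments. `union_signature` and the `argN` parameter naming are hypothetical.

```python
from collections import defaultdict

def union_signature(observed_calls):
    """Merge observed calls into one signature string: for each positional
    slot and each keyword, list the union of argument type names seen."""
    positional = defaultdict(set)
    keyword = defaultdict(set)
    for args, kwargs in observed_calls:
        for i, value in enumerate(args):
            positional[i].add(type(value).__name__)
        for name, value in kwargs.items():
            keyword[name].add(type(value).__name__)
    params = [f"arg{i}: {' | '.join(sorted(positional[i]))}" for i in sorted(positional)]
    params += [f"{name}: {' | '.join(sorted(keyword[name]))}" for name in sorted(keyword)]
    return "(" + ", ".join(params) + ")"

calls = [((1,), {}), ((1.5, 2), {"dtype": "float64"})]
print(union_signature(calls))  # (arg0: float | int, arg1: int, dtype: str)
```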

Lots of limitations here, but it's a start. Again, any feedback would be much appreciated.

Boolean array indexing is not compatible with static memory allocation

JAX is built on top of XLA which (currently) requires static memory allocation for operations. This means it is not possible to express operations like x[y] where y is a boolean, because the resulting array has value dependent size: https://data-apis.github.io/array-api/latest/API_specification/indexing.html#boolean-array-indexing

It's definitely possible that the static memory allocation requirement could be relaxed in the future, but dynamic memory allocation is always going to be harder to implement in a performant way. For example, I would guess Numba also struggles with this sort of operation. I don't think we should require it for array libraries implementing our standard, since it isn't needed for the majority of array operations.
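The shape problem is easy to see with NumPy standing in for any array library. Boolean indexing produces an output whose length depends on the data. A masked `where` keeps the input's shape, so its output size is known before the data is. `where` is one common static-shape workaround, not a drop-in replacement for every use of `x[y]`.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0, -4.0])
mask = x > 0

# Boolean indexing: the output length equals the number of True entries,
# which is only known once the data is available.
selected = x[mask]                  # shape (2,)

# Static-shape alternative: same shape as x, rejected entries replaced by 0.
masked = np.where(mask, x, 0.0)     # shape (4,)

# Reductions often don't need the dynamic shape at all.
print(selected.sum(), masked.sum())  # 4.0 4.0
```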

Explicit array conversion (e.g., array(), asarray())

Reading through the standard, it appears that we may have missed an important feature: the ability to explicitly coerce objects into a desired array type, either from builtin Python types like float/list or from other array libraries. In other words, we need something like NumPy's array() and/or asarray() functions.
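For reference, this is the NumPy behavior the two functions are known for. `array()` copies by default. `asarray()` passes an existing ndarray through unchanged when no conversion is needed. Both coerce builtin Python containers.

```python
import numpy as np

a = np.arange(3)

b = np.array(a)     # copies by default: always a new object
c = np.asarray(a)   # already an ndarray with a matching dtype: no copy

print(b is a, c is a)   # False True

# Both also coerce builtin Python containers into arrays.
d = np.asarray([1.0, 2.0])
```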

Summarize integer type promotion rules with a lattice?

@jakevdp suggested that a nice way to summarize the signed/unsigned integer type promotion rules would be with a lattice, e.g.,

i* denotes a Python int (with unspecified precision).

This is a subset of the full type promotion lattice from the JAX docs:
https://jax.readthedocs.io/en/latest/type_promotion.html

The lattice for floats would just be f* -> f4 -> f8.
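With a lattice, the promoted type of two inputs is their join (least upper bound). The sketch below encodes the integer lattice from the JAX docs, restricted to signed and unsigned integers. It uses byte-size names to match the f4/f8 notation above, and adds the float chain. The edges are an assumption based on those docs, since the lattice figure itself is not reproduced here. `promote` returns None where the two types have no common upper bound, i.e. where promotion is left undefined.

```python
# Assumed lattice edges: "i*" is a Python int, "f*" a Python float.
EDGES = {
    "i*": ["u1", "i1"],
    "u1": ["u2", "i2"],
    "u2": ["u4", "i4"],
    "u4": ["u8", "i8"],
    "u8": [],
    "i1": ["i2"],
    "i2": ["i4"],
    "i4": ["i8"],
    "i8": [],
    "f*": ["f4"],
    "f4": ["f8"],
    "f8": [],
}

def upper_bounds(t):
    """Every type reachable from t by following edges, including t itself."""
    seen, stack = set(), [t]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(EDGES[node])
    return seen

def promote(a, b):
    """Join of a and b: the common upper bound lying below all the others."""
    common = upper_bounds(a) & upper_bounds(b)
    for candidate in common:
        if common <= upper_bounds(candidate):
            return candidate
    return None  # no common upper bound: promotion is undefined

print(promote("i1", "u1"))  # i2
print(promote("u8", "i8"))  # None
```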

Proposal to standardize element-wise arithmetic operations

Based on the analysis of array library APIs, we know that performing element-wise arithmetic operations is both universally implemented and commonly used. Accordingly, this issue proposes to standardize the following arithmetic operations:

Arithmetic Operations

add
subtract
multiply (mul)
divide (div)

Criterion

Commonly implemented across array libraries.
Commonly used by array library consumers.
Operates on two arrays.

Questions

Naming conventions? Currently, this proposal is biased toward verbose API names following NumPy.
Are there any APIs listed above which should not be standardized?
Are there basic arithmetic operations not listed above which should be standardized? Preferably, any additions should be supported by usage data.

Data dependent/unknown shapes

Some libraries, particularly those with a graph-based computational model (e.g., Dask and TensorFlow), have support for "unknown" or "data dependent" shapes, e.g., due to boolean indexing such as x[y > 0] (#84). Other libraries (e.g., JAX and Dask in some cases) do not support some operations because they would produce such data dependent shapes.

We should consider a standard way to represent these shapes in shape attributes, ideally some extension of the "tuple of integer" format used for fully known shapes. For example, TensorFlow and Dask currently use different representations:

TensorFlow uses a custom TensorShape object (which acts very similarly to tuple), where some values may be None
Dask uses tuples, where some values may be nan instead of integers
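Until there is a standard representation, consumers have to reconcile the two conventions themselves. The sketch below maps both markers to a single one. `normalize_shape` is hypothetical, and choosing `None` as the common marker is an assumption for illustration, not something this issue has decided.

```python
import math
from typing import Optional, Sequence, Tuple

def normalize_shape(shape: Sequence) -> Tuple[Optional[int], ...]:
    """Map TensorFlow-style (None) and Dask-style (nan) unknown dimensions
    to None; known dimensions become plain ints."""
    out = []
    for dim in shape:
        if dim is None or (isinstance(dim, float) and math.isnan(dim)):
            out.append(None)
        else:
            out.append(int(dim))
    return tuple(out)

def is_fully_known(shape) -> bool:
    return None not in normalize_shape(shape)
```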

Boolean dtypes need not specify storage in bytes

The current specification is "Boolean (True or False) stored as a byte."
https://data-apis.github.io/array-api/latest/API_specification/data_types.html#bool

In my view, storing the data as a byte is an implementation detail that is not appropriate to include in the specification, particularly given our goal to be hardware agnostic. For example, it should be possible to make a compliant array library that stores each boolean in a single bit.
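NumPy's own `bool` dtype uses one byte per element. Its `packbits`/`unpackbits` utilities show that the same information fits in one bit per element. A library could use such a layout natively.

```python
import numpy as np

mask = np.array([True, False, True, True, False, False, True, False])
print(mask.nbytes)            # 8: one byte per boolean

packed = np.packbits(mask)    # a single uint8 holding all 8 flags
print(packed.nbytes)          # 1

# Unpacking recovers the original values.
restored = np.unpackbits(packed)[: mask.size].astype(bool)
```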

Common APIs across array libraries

Overview

To help further the discussion of what array APIs should be included in the standard, I've compiled a (WIP) list of common APIs across various array libraries.

This list should indicate which APIs library developers consider important (as reflected in what they have chosen to curate and implement), and should summarize current practice.

Goal

To standardize a common set of core APIs and minimal signatures (i.e., argument order and keyword arguments) that every array library should implement in order to be compliant with the array specification.

Method

I compiled the list by doing the following:

Generating a list of APIs based on publicly documented array APIs (e.g., by scraping website documentation).
Computing the intersection across the individual datasets.
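The two steps above amount to a set intersection. The intersection only works once library-specific names have been mapped onto NumPy's naming conventions. The inputs below are hypothetical and deliberately tiny. `lib_a`/`lib_b`/`lib_c` and the alias table are illustrative, not scraped data.

```python
# Hypothetical inputs: in practice each set comes from one library's docs.
apis = {
    "lib_a": {"arange", "concatenate", "einsum", "sum", "nanmean"},
    "lib_b": {"arange", "cat", "einsum", "sum"},
    "lib_c": {"concat", "einsum", "reduce_sum"},
}

# Map library-specific names onto NumPy's names before intersecting.
aliases = {"cat": "concatenate", "concat": "concatenate"}
normalized = {lib: {aliases.get(n, n) for n in names} for lib, names in apis.items()}

common = set.intersection(*normalized.values())
print(sorted(common))  # ['concatenate', 'einsum']
```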

The following libraries were analyzed:

numpy
cupy
dask.array
jax
mxnet
pytorch
tensorflow

APIs

The following APIs were found to be common across the above libraries (using NumPy's naming conventions):

angle
arange
arccos
arcsin
arctan
arctan2
argmax
argmin
array
ceil
concatenate
conj
cos
cosh
cumprod
cumsum
einsum
exp
expm1
eye
flip
floor
full
imag
linalg.cholesky
linalg.inv
linalg.norm
linalg.qr
linalg.solve
linalg.svd
linspace
log
log1p
logaddexp
matmul
maximum
mean
meshgrid
minimum
ones
ones_like
prod
real
reshape
roll
sign
sin
sinh
sqrt
square
squeeze
stack
std
sum
tan
tanh
tensordot
trace
transpose
trunc
var
where
zeros
zeros_like

We can split these APIs into various categories as follows...

Array Creation

arange
array
eye
full
linspace
meshgrid
ones
ones_like
zeros
zeros_like

Array Manipulation

concatenate
flip
reshape
roll
squeeze
stack

Special Functions

ceil
exp
expm1
floor
log
log1p
logaddexp
maximum
minimum
sign
square
sqrt
trunc

Trigonometry

arccos
arcsin
arctan
arctan2
cos
cosh
sin
sinh
tan
tanh

Complex Numbers

angle
conj
imag
real

Reductions

cumprod
cumsum
mean
prod
std
sum
var

Linear Algebra

einsum
linalg.cholesky
linalg.inv
linalg.norm
linalg.qr
linalg.solve
linalg.svd
matmul
tensordot
trace
transpose

Indexing

argmax
argmin
where

Next Steps

Provide the intersection of keyword arguments for each of the above APIs.
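For functions with introspectable signatures, the keyword intersection can be computed with the standard library's `inspect` module. `common_keywords`, `sum_a` and `sum_b` are hypothetical. Some real library functions (for example C-implemented ones) may not expose a signature `inspect` can read, so scraped documentation may still be needed.

```python
import inspect

def common_keywords(*funcs):
    """Names of parameters that can be passed by keyword in every function."""
    keyword_kinds = (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                     inspect.Parameter.KEYWORD_ONLY)
    sets = [
        {name for name, p in inspect.signature(f).parameters.items()
         if p.kind in keyword_kinds}
        for f in funcs
    ]
    return set.intersection(*sets)

# Hypothetical stand-ins for two libraries' versions of `sum`.
def sum_a(x, axis=None, dtype=None, keepdims=False): ...
def sum_b(x, /, *, axis=None, keepdims=False, out=None): ...

print(sorted(common_keywords(sum_a, sum_b)))  # ['axis', 'keepdims']
```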

Questions

While the above uses NumPy naming conventions, some of the above libraries have chosen to deviate from NumPy conventions (absolute vs abs). Are there APIs which should be aliased differently?
How to handle/encode missing data in element-wise functions (ufuncs) and reductions?
Can we standardize a core subset of the above APIs in terms of method names and a limited set of keyword arguments?
To allow for API extensibility, can we specify a common API for arbitrary element-wise and/or axis-wise operations (e.g., apply, reduce)?

Feedback is welcome. :)

Add CI that auto-deploys Sphinx docs after merges to master

It should do something like the following (which I'm currently doing locally):

# First install the packages in requirements.txt (e.g., pip install -r requirements.txt)
git clean -xdf
pushd spec
make html
popd
(git checkout --orphan tmp && git branch -D gh-pages || true)
git checkout --orphan gh-pages
git reset --hard
cp -R spec/_build/html/ latest
touch .nojekyll
git add .nojekyll
git add latest
git commit -m "Commit API Standard doc build";
git push --set-upstream origin gh-pages --force

Put the above in a file deploy-ghpages.sh in the parent directory of the repo root (i.e., ../deploy-ghpages.sh), then run (from master, in the repo root):

source ../deploy-ghpages.sh

Parallelism - what do libraries offer, and is there an API aspect to it

Several people have expressed a strong interest in talking about and working on (auto-)parallelization. Here is an attempt at summarizing this topic.

current status
auto-parallelization and nested parallelism
limitations due to Python package distribution mechanisms
The need for a better API pattern or library

Current status

Linear algebra libraries

The main accelerated linear algebra libraries that are in use (for CPU based
code) are OpenBLAS and
MKL.
Both of those libraries auto-parallelize function calls.

OpenBLAS can be built with either its own pthreads-based thread pool, or with
OpenMP support. The number of threads can be controlled with an environment
variable (OPENBLAS_NUM_THREADS or OMP_NUM_THREADS), or from Python via
threadpoolctl. The conda-forge
OpenBLAS package uses OpenMP; the OpenBLAS builds linked into NumPy and SciPy
wheels on PyPI use pthreads.

MKL supports OpenMP and Intel TBB as the threading control mechanisms. The
number of threads can be controlled with an environment variable
(MKL_NUM_THREADS or OMP_NUM_THREADS), or from Python with threadpoolctl.

NumPy

NumPy does not provide parallelization, with the exception of linear algebra
routines which inherit the auto-parallelization of the underlying library
(OpenBLAS or MKL typically). NumPy does however release the GIL consistently
where it can.

Scikit-learn

Scikit-learn provides a keyword n_jobs=1 in many estimators and other
functions to let users enable parallel execution. This is done via the
joblib library, which provides both
multiprocessing (default) and threading backends that can be selected with a
context manager.

Scikit-learn also contains C and Cython code that uses OpenMP. OpenMP is
enabled in both wheels on PyPI and in conda-forge packages. The number of
threads used can be controlled with the OMP_NUM_THREADS environment variable.

Scikit-learn has good documentation on parallelism and resource management.

SciPy

SciPy provides a workers=1 keyword in a (still limited) number of functions
to let users enable parallel execution. It is similar to scikit-learn's
n_jobs keyword, except that it also accepts a map-like callable (e.g.,
multiprocessing.Pool.map) to allow using a custom pool. C++ code in SciPy uses
pthreads; the use of OpenMP was
discussed and rejected.
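A minimal sketch of a function following this `workers` convention is shown below. An int of 1 runs serially, other ints use a pool (with -1 meaning "as many as the pool allows"), and a map-like callable is used as-is. `apply_elementwise` is hypothetical. It uses a thread pool for simplicity, whereas SciPy's int handling relies on processes.

```python
from concurrent.futures import ThreadPoolExecutor

def apply_elementwise(func, items, workers=1):
    """Hypothetical function using a SciPy-style `workers` argument."""
    items = list(items)
    if callable(workers):
        # A user-supplied map-like callable, e.g. multiprocessing.Pool(4).map.
        return list(workers(func, items))
    if workers == 1:
        return list(map(func, items))              # serial default
    max_workers = None if workers == -1 else workers  # None: executor's default size
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        return list(ex.map(func, items))
```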

scipy.linalg also provides a Cython API for BLAS and LAPACK. This lets other
libraries use linear algebra routines without having to ship or build against
an accelerated linear algebra library directly. Scikit-learn, statsmodels and
other libraries do this - thereby again inheriting the auto-parallelization
behavior from OpenBLAS or MKL.

Deep learning frameworks

TensorFlow, PyTorch, MXNet and JAX all have auto-parallelization behavior.
Furthermore they provide support for distributed computing (with the exception
of JAX). These frameworks are very performance-focused, and aim to optimally
use all available hardware. They typically allow building with different
backends like NCCL or GLOO for GPU support, and use OpenMP, MPI, gRPC and more.

The advantage these frameworks have is that users typically only use this one
framework for their whole program, so the parallelism used can be optimized
without having to play well with other Python packages that also execute code
in parallel.

Dask

Dask provides parallel arrays, dataframes and machine learning algorithms with
APIs that match NumPy, Pandas and scikit-learn as much as possible. Dask is a
pure Python library and uses blocked algorithms; each block contains a single
NumPy array or Pandas dataframe. Scaling to hundreds of nodes is possible; Dask
is a good solution to obtain distributed arrays. When used as a method to
obtain parallelism on a single node however, it is not very efficient.

Auto-parallelization and nested parallelism

Some libraries, like the deep learning frameworks, do auto-parallelization.
Most non-deep-learning libraries do not do this. When a single library or
framework is used to execute an end user program, auto-parallelization is
usually a good thing to have. It uses all available hardware resources in an
optimal fashion.

Problems can occur when multiple libraries are involved. What often happens is
oversubscription of resources. For example, if an end user writes code
using scikit-learn with n_jobs=-1, and NumPy auto-parallelizes
operations, then scikit-learn will use N processes (on an N-core machine)
and NumPy will use N threads per process - leading to N^2 threads being
used. On machines with a large number of cores, the overhead of this quickly
becomes problematic. Given that NumPy uses OpenBLAS or MKL, this problem
already occurs today. For a while Anaconda and Intel shipped a modified NumPy
version that had auto-parallelization behavior for functions other than linear
algebra - and the problem occurred more frequently.

The paper Composable Multi-Threading and Multi-Processing for Numeric
Libraries
from Malakhov et al. contains a good overview with examples and comparisons
between different parallelization methods. It uses NumPy, SciPy, Dask, and
Numba, and uses multiprocessing, concurrent.futures, OpenMP, Intel TBB
(Threading Building Blocks), and a custom library SMP (symmetric
multi-processing).

Limitations due to Python package distribution mechanisms

When one wants to use auto-parallelization, it's important to have control over
the complete set of packages that a user gets installed on their machine. That
way one can ensure there's a single linear algebra library installed, and a
single OpenMP runtime is used.

That control over the full set of packages is common in HPC type situations,
where admins need to deal with build and install requirements to make libraries
work well together. Both package managers (e.g., Apt in Debian) and Conda have
the ability to do this right as well - both because of dependency resolution
and because of a common build infrastructure.

A large fraction of Python users install packages from PyPI with Pip however.
The binary installers (wheels) on PyPI are not built on a common
infrastructure, and because there's no real support for non-Python
dependencies, libraries like OpenMP and OpenBLAS are bundled into the wheels
and installed into end user environments multiple times. This makes it
very difficult to reliably use, e.g., OpenMP. For this reason SciPy uses custom
pthreads thread pools rather than OpenMP.

The need for a better API pattern or library

The default behavior for libraries like NumPy and SciPy, given the state of the
ecosystem today, should be single-threaded; otherwise it composes badly
with multiprocessing, scikit-learn (joblib), Dask, etc. However, there's
room for improvement here. Two things that could help improve the coordination
of parallelization behavior in a stack of Python libraries are:

A common API pattern for enabling parallelism
A common library providing a parallelization layer

A common API pattern is the simpler of the two options. It could be a keyword
like n_jobs or workers that gets used consistently between libraries, or a
context manager to achieve the same level of per-function or per-code-block
control.

A common library would be more powerful and enable auto-parallelization rather
than giving the user control (which is what the API pattern does). From a
performance perspective, having arrays and dataframes auto-parallelize their
functions as much as possible over all cores on a single node, and then letting
a separate library like Dask deal with multi-node coordination, seems optimal.
Introducing a new dependency into multiple libraries at the core of the PyData
ecosystem is a nontrivial exercise however.

The above attempts to summarize the state of affairs today. The topic of
parallelization is largely an implementation rather than an API question;
however, there is an API component to it with the first option above (a common
API pattern). How to move forward here is worth discussing.

Note: there's also a lot of room left in NumPy for optimizing
single-threaded performance. There's ongoing work on making better use of
intrinsics (a large effort), and using SLEEF for vector math has been
discussed in the past (no one is working on it).

List of APIs currently in (or proposed to be included in) the specification

The following is a list of APIs which are currently included in the specification and/or are proposed to be included in the specification. This is intended to provide a checkpoint to determine any glaring omissions.

arange
empty
empty_like
eye
full
full_like
linspace
ones
ones_like
zeros
zeros_like

concat
expand_dims
flip
reshape
roll
squeeze
stack

abs
acos
acosh
add
asin
asinh
atan
atanh
bitwise_and
bitwise_invert
bitwise_left_shift
bitwise_or
bitwise_right_shift
bitwise_xor
ceil
cos
cosh
divide
equal
exp
expm1
floor
floor_divide
greater
greater_equal
isfinite
isnan
isinf
less
less_equal
log
log1p
log2
log10
logical_and
logical_not
logical_or
logical_xor
multiply
negative
not_equal
positive
pow
remainder
round
sign
sin
sinh
square
sqrt
subtract
tan
tanh
trunc

cross
det
diagonal
inv
norm
outer
trace
transpose

max
mean
min
prod
std
sum
var

unique

argmax
argmin
nonzero
where

argsort
sort

all
any

Add links to rendered docs and blog post in README

Proposal to standardize element-wise elementary mathematical functions

Based on the analysis of array library APIs, we know that evaluating element-wise elementary mathematical functions is both universally implemented and commonly used. Accordingly, this issue proposes to standardize the following elementary mathematical functions:

Special Functions

abs (absolute)
exp
log
sqrt

Rounding

ceil
floor
trunc
round

Trigonometry

sin
cos
tan
asin (arcsin)
acos (arccos)
atan (arctan)
sinh
cosh
tanh
asinh (arcsinh)
acosh (arccosh)
atanh (arctanh)

Criterion

Commonly implemented across array libraries.
Commonly used by array library consumers.
Operates on a single array (e.g., this criterion excludes atan2).

Questions

Naming conventions? The above is biased toward C naming conventions (e.g., atan vs arctan).
Are there any APIs listed above which should not be standardized?
Are there elementary mathematical functions not listed above which should be standardized? Preferably, any additions should be supported by usage data.
Should the standard mandate a minimum precision to ensure portability? (e.g., JavaScript's lack of minimum precision specification offers a cautionary tale)
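A minimum-precision requirement would most likely be phrased as a maximum error in units in the last place (ULPs). A test suite could measure this roughly as shown below. `ulp_error` is hypothetical and assumes Python 3.9+ for `math.ulp`. The reference value would have to come from a higher-precision computation.

```python
import math

def ulp_error(computed: float, reference: float) -> float:
    """Error of `computed` in units in the last place (ULPs) of `reference`."""
    return abs(computed - reference) / math.ulp(reference)

# The next float after 1.0 is exactly one ULP away from it.
print(ulp_error(1.0 + 2.0**-52, 1.0))  # 1.0
```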

design for checking something is an array

This is a question that we may have covered but not clearly enough because it keeps coming up in conversations: "how do I check if something is a compliant array object?" People have asked me, and it has also come up in the Dask tracking issue and the NEP 47 review:

The current answer is:

we do not have/want a "reference library" that people need to depend on, hence we cannot do something like create an array ABC that libraries can inherit from to enable an isinstance check.
the one distinguishing feature of a compliant array object is: it has an __array_namespace__ attribute, so at runtime the check to do is hasattr(x, '__array_namespace__').
each library that wants to do this check should probably implement a standard function like:

def is_arrayobj(x):
    return hasattr(x, '__array_namespace__')

we can recommend this somewhere, and give an implementation for people to vendor.
for static type checking we can do better, design an Array typing Protocol: https://data-apis.org/array-api/latest/design_topics/static_typing.html. This is still to be implemented, but isn't hard to do. Again people should vendor this.

Note that in NEP 47 we (I) messed something up: the initial thought was to make numpy.ndarray the object that was the standards-compliant one - and therefore have a __array_namespace__ attribute. However we decided later to create a separate array object. numpy.ndarray is not compliant, and therefore should not have that attribute. However, that also means there is no way to retrieve the compliant namespace from a regular numpy ndarray instance. Hence it should probably be special-cased:

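def is_arrayobj_or_ndarray(x):
    # This is just a sketch. Returns (is_array, x_to_use) in every branch.
    try:
        import numpy as np
    except ImportError:
        np = None

    if np is not None and isinstance(x, np.ndarray):
        # numpy.array_api (experimental in NumPy 1.22-1.26) holds the compliant array object
        import numpy.array_api as npx
        x_new = npx.asarray(x)
        # What the caller will need to do is convert back to ndarray at the end ....
        return True, x_new

    return hasattr(x, '__array_namespace__'), x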

The alternative would be two separate checks; either way we need a design pattern that does (at least for existing libraries like SciPy, which must also support numpy.ndarray):

check for compliant array object
check for numpy.ndarray instance
if ndarray instance, convert to compliant array object
retrieve namespace
(do stuff)
if it was an ndarray instance, convert back to ndarray at the end
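The six steps above can be sketched as one helper. This is a hypothetical design, not an existing API: the hooks `is_legacy`, `to_compliant` and `from_compliant` are assumptions that let a library such as SciPy plug in its own `numpy.ndarray` handling, so the sketch itself does not import NumPy. The demo uses stand-in types only.

```python
from typing import Any, Callable, Optional


def call_with_namespace(
    func: Callable[[Any, Any], Any],
    x: Any,
    *,
    is_legacy: Optional[Callable[[Any], bool]] = None,
    to_compliant: Optional[Callable[[Any], Any]] = None,
    from_compliant: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Run func(xp, x) following the six steps above (hypothetical helper)."""
    is_compliant = hasattr(x, "__array_namespace__")            # 1. compliant object?
    was_legacy = (not is_compliant and is_legacy is not None
                  and is_legacy(x))                              # 2. e.g. numpy.ndarray?
    if was_legacy:
        x = to_compliant(x)                                      # 3. convert to compliant
    elif not is_compliant:
        raise TypeError(f"unsupported array type: {type(x).__name__}")

    xp = x.__array_namespace__()                                 # 4. retrieve namespace
    result = func(xp, x)                                         # 5. do stuff
    if was_legacy:
        result = from_compliant(result)                          # 6. convert back
    return result


# --- Demo with stand-in types (no real array library involved) ---
class DemoArray:
    def __init__(self, data):
        self.data = list(data)

    def __array_namespace__(self, api_version=None):
        return DemoNamespace


class DemoNamespace:
    @staticmethod
    def double(a):
        return DemoArray([2 * v for v in a.data])


def double(xp, a):
    return xp.double(a)


print(call_with_namespace(double, DemoArray([1, 2])).data)  # [2, 4]

# A plain list plays the role of numpy.ndarray here.
print(call_with_namespace(
    double, [1, 2],
    is_legacy=lambda v: isinstance(v, list),
    to_compliant=DemoArray,
    from_compliant=lambda a: a.data,
))  # [2, 4]
```

Passing the conversions in as hooks keeps the pattern independent of any one "legacy" array type; a real library would hard-code its `numpy.ndarray` handling instead.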

WIP: strategies for C/Cython API usage

TODO:

update for feedback received in the call today
fill in some blanks in the text below

Strategies for C/Cython API usage

Use of a C API is out of scope for this array API, as mentioned in :ref:`Scope`.
There are a lot of libraries that do use such an API - in particular via Cython code
or via direct usage of the NumPy C API. When the maintainers of such libraries
want to use this array API standard to support multiple types of arrays, they
need a way to deal with that issue. This section aims to provide some guidance.

The assumption in the rest of this section is that performance matters for the library,
and hence the goal is to make other array types work without converting to a
numpy.ndarray or another particular array type. If that's not the case (e.g. for a
visualization package), then other array types can simply be handled by converting
to the supported array type.

.. note::

Often a zero-copy conversion to `numpy.ndarray` is possible, at least for CPU arrays.
If that's the case, this may be a good way to support other array types.
The main difficulty in that case will be getting the return array type right - however,
we do have a Python-level API for that.

Example situations for C/Cython usage

Situation 1: a Python package that is mostly pure Python, with a limited number of Cython extensions

.. note::

Projects in this situation include Statsmodels, scikit-bio and QuTiP

Main strategy: documentation. The functionality using Cython code will not support other array types (or only with conversion to/from numpy.ndarray), which can be documented per function.

Situation 2: a Python package that contains a lot of Cython code

.. note::

Projects in this situation include scikit-learn and scikit-image

Main strategy: add support for other array types per submodule. This keeps it manageable to explain to the user which functionality does and doesn't have support.

Longer term: specific support for particular array types (e.g. cupy.ndarray can be supported with Python-only code via cupy.ElementwiseKernel).

Situation 3: a Python package that uses the NumPy or Python C API directly

.. note::

Projects in this situation include SciPy and Astropy

Strategy: similar to situation 2, but the number of submodules that can support all array types may be limited.

Device support

Supporting non-CPU array types in code using the C API or Cython seems problematic;
this will almost inevitably require custom device-specific code (e.g., CUDA, ROCm) or
something like JIT compilation with Numba.

Further Python API standardization

There may be cases where it makes sense to standardize additional sets of functions, because they're important enough that array libraries tend to reimplement them. An example of this may be special functions, as provided by scipy.special. Bessel and gamma functions for example are commonly reimplemented by array libraries.

HPy

HPy is a new project that will provide a higher-level
C API and ABI than CPython offers. A Cython backend targeting HPy will be provided as well.

Better PyPy support
Universal ABI - single binary for all supported Python versions
Cython backend generating HPy rather than CPython code

Confusing example in "Copy-view behaviour and mutability"

I find this example very confusing:

This happens when views are combined with mutating operations. This simple example illustrates that:
x = ones(1)
x += 2
y = x   # `y` *may* be a view
y -= 1  # if `y` is a view, this modifies `x`

The semantics of Python mean that y = x always makes y a "view" of x - x is y by definition.

I think the point this example is trying to make is that "if -= operates in place, this modifies x" - but the comments tell a different and incorrect story.

Either that, or the example should include a slice operation of some kind, where the library actually has a chance to decide whether to make a copy or a view.
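A variant along those lines, using NumPy purely as an example library (an assumption; a library with immutable arrays could return a copy for the same slice). Here the indexing operation, not `y = x`, is where the copy-or-view choice happens:

```python
import numpy as np

x1 = np.ones(3)
y1 = x1[:2]      # basic slice: NumPy returns a view here
y1 -= 1          # the in-place op on the view also modifies x1
print(x1)        # [0. 0. 1.]

x2 = np.ones(3)
z2 = x2[[0, 1]]  # integer-array ("fancy") index: NumPy returns a copy
z2 -= 1          # x2 is unchanged
print(x2)        # [1. 1. 1.]
```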

Type of return value from indexing

The current spec is silent on what the return value from an indexing operation should be: https://data-apis.github.io/array-api/latest/API_specification/indexing.html

I propose that the return value should always be another array object, i.e., so __getitem__ could be type annotated as __getitem__(self: Array, key: IndexKey) -> Array. However, this is inconsistent with NumPy, which returns NumPy scalars in some cases when indexing with integers.
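The NumPy inconsistency can be seen directly (assuming NumPy is installed). Under the proposal, all three results below would have to be array objects:

```python
import numpy as np

x = np.arange(3.0)

scalar = x[0]        # integer index -> NumPy scalar (numpy.float64), not an ndarray
sliced = x[0:1]      # slice -> ndarray of shape (1,)
zero_d = x[0, ...]   # integer plus Ellipsis -> 0-d ndarray of shape ()

print(type(scalar).__name__, isinstance(scalar, np.ndarray))  # float64 False
print(type(zero_d).__name__, zero_d.shape)                   # ndarray ()
```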

Issues with array creation functions

Some issues I noticed in the array creation functions from adding tests to the test suite:

arange

stop and step should not be keyword-only (this was also mentioned at #85)
Does not specify the behavior if stop or start are out of range for the dtype
Says "If dtype is None, the output array data type must be the default floating-point data type." I think the default for arange should be int if all the arguments are integers.
step cannot be 0.
The function is only defined for numeric dtypes (#98)
"The length of the output array must be ceil((stop-start)/step)" should be caveated (stop and step provided, stop >= start for step > 0 and stop <= start for step < 0)

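One way to state the caveated length rule precisely is `max(0, ceil((stop - start) / step))`. The sketch below (a hypothetical helper, checked against Python's built-in `range` for integer arguments) shows that the `max(0, ...)` clamp covers the cases where `stop` lies on the "wrong side" of `start`:

```python
import math


def arange_length(start, stop=None, step=1):
    """Number of elements arange should produce (sketch of the caveated rule)."""
    if stop is None:            # arange(stop) form: start defaults to 0
        start, stop = 0, start
    if step == 0:
        raise ValueError("step must be nonzero")
    # ceil((stop - start) / step) is only meaningful when stop lies in the
    # direction of step; otherwise the output is empty, hence max(0, ...).
    return max(0, math.ceil((stop - start) / step))


for args in [(10,), (2, 10, 3), (5, 0), (5, 0, -2)]:
    print(args, arange_length(*args), len(range(*args)))  # the two counts agree
```

For very large integers, the true division in this formula is itself done in floating point and can round, which is the same precision issue raised for `linspace` below.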
eye

May be worth explicitly saying elements with index i, j should be 1 if j - i = k and 0 otherwise.
The function is only defined for numeric dtypes (#98)
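The suggested wording for `eye` translates into a one-line reference implementation, which a test suite could compare against (the function name is hypothetical):

```python
def eye_reference(n_rows, n_cols=None, k=0):
    """Reference semantics for eye: element (i, j) is 1 if j - i == k, else 0."""
    n_cols = n_rows if n_cols is None else n_cols
    return [[1 if j - i == k else 0 for j in range(n_cols)] for i in range(n_rows)]


print(eye_reference(3, k=1))      # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(eye_reference(2, 3, k=-1))  # [[0, 0, 0], [1, 0, 0]]
```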

full (and full_like)

Says "If dtype is None, the output array data type must be the default floating-point data type." I think the default should be a corresponding dtype to the input value (we don't have a notion of a "default" integer dtype).
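A sketch of the proposed rule, under the assumption that only the Python type of `fill_value` drives the choice. It returns a dtype *kind*, since mapping a kind to a concrete dtype is left to the library. Note that `bool` has to be checked before `int`, because `bool` is a subclass of `int` in Python:

```python
def default_full_dtype_kind(fill_value):
    """Hypothetical rule: pick the output dtype kind from the type of fill_value."""
    if isinstance(fill_value, bool):      # must come first: bool subclasses int
        return "bool"
    if isinstance(fill_value, int):
        return "integer"
    if isinstance(fill_value, float):
        return "floating-point"
    raise TypeError(f"unsupported fill_value type: {type(fill_value).__name__}")


print(default_full_dtype_kind(True), default_full_dtype_kind(3), default_full_dtype_kind(1.5))
# bool integer floating-point
```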

linspace

It's a bit ambiguous whether it actually says this right now, but I think the stop value should not be required to be included (when endpoint=True). Consider

>>> np.linspace(0, 9288674231451855, 2, dtype=np.int64)
array([               0, 9288674231451856])

The stop value is different from what is given because of floating point loss of precision when computing the linspace.
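The rounding can be reproduced without NumPy, assuming `linspace` computes in float64 internally. float64 has a 53-bit significand, so above 2**53 not every integer is representable:

```python
stop = 9288674231451855

print(stop > 2**53)        # True: beyond the range of exactly representable integers
print(int(float(stop)))    # 9288674231451856: the nearest float64 is 1 larger
```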

API for variable number of returns in linalg

Feedback from @mruberry: it may be nice to support always returning the same type from functions like Cholesky, with tensors accessed by names. That way frameworks could return additional tensors and there wouldn't be BC-compatibility issues if more tensor returns are added later. This suggestion, however, comes from a place of "it'd be nice if people wanted to use this API", as opposed to the perspective of, "this API is a lowest common denominator only intended for library writers."

cholesky always returns a single array, so I thought it'd be fine, but the comment may be related to pytorch/pytorch#47608 (the desire to return an error code rather than raise a RuntimeError for failures). For qr, svd and perhaps some more functions, there's the related issue of different returns based on keywords. Using a class to hold all return values is a common design pattern, e.g., in scipy.optimize. I'm not convinced it's a good idea for the linalg functions here, but it's at least worth considering.
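A middle ground is a named tuple, sketched below with a hypothetical `QRResult` class: callers can unpack positionally as today or access results by name.

```python
from typing import Any, NamedTuple


class QRResult(NamedTuple):
    """Hypothetical named result for qr: fields reachable by name or position."""
    Q: Any
    R: Any


res = QRResult(Q="q-array", R="r-array")
q, r = res       # positional unpacking still works...
print(res.R)     # ...and so does access by name: r-array
```

The trade-off: because this is still a tuple, adding a third field later would break `q, r = qr(x)` unpacking. A non-tuple result object (the scipy.optimize style) avoids that, at the cost of giving up unpacking.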

Type promotion rules

This issue seeks to come to a consensus on a subset of type promotion rules (i.e., the rules governing the common result type for two array operands during an arithmetic operation) suitable for specification.

As initially discussed in #13, a universal set of type promotion rules can be difficult to standardize due to the needs/constraints of particular runtime environments. However, we should be able to specify a minimal set of type promotion rules which all specification conforming array libraries can, and should, support.

Prior Art

NumPy: promotion rules follow a type hierarchy (where complex > floating > integral > boolean). See promote_types and result_type APIs and source [1, 2].
CuPy: follows NumPy's rules, except for zero-dimension arrays.
Dask: follows NumPy's rules.
JAX: type promotion table (and source) which deviates from NumPy's promotion rules in two ways: (1) biased toward half- and single-precision floating-point numbers and (2) support for a non-standard floating-point type.
PyTorch: promotion rules follow a type hierarchy (where complex > floating > integral > boolean) without inspection of value magnitude.
TensorFlow: requires explicit casting.

Proposal

This issue proposes to specify that all specification conforming array libraries must, at minimum, support the following type promotions:

floating-point type promotion table:

    f2  f4  f8
f2  f2  f4  f8
f4  f4  f4  f8
f8  f8  f8  f8

where
- f2: half-precision (16-bit) floating-point number
- f4: single-precision (32-bit) floating-point number
- f8: double-precision (64-bit) floating-point number

unsigned integer type promotion table:

    u1  u2  u4  u8
u1  u1  u2  u4  u8
u2  u2  u2  u4  u8
u4  u4  u4  u4  u8
u8  u8  u8  u8  u8

where
- u1: 8-bit unsigned integer
- u2: 16-bit unsigned integer
- u4: 32-bit unsigned integer
- u8: 64-bit unsigned integer

signed integer type promotion table:

    i1  i2  i4  i8
i1  i1  i2  i4  i8
i2  i2  i2  i4  i8
i4  i4  i4  i4  i8
i8  i8  i8  i8  i8

where
- i1: 8-bit signed integer
- i2: 16-bit signed integer
- i4: 32-bit signed integer
- i8: 64-bit signed integer

mixed unsigned and signed integer type promotion table:

    u1  u2  u4
i1  i2  i4  i8
i2  i2  i4  i8
i4  i4  i4  i8
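As a cross-check, the four tables can be reproduced with a few rules. This is a sketch using the dtype codes from the `where` lists. Anything the tables do not cover (mixed kinds, and the i8 row and u8 column missing from the mixed table) is returned as `None`, meaning "undefined by this proposal":

```python
from typing import Optional


def promote(a: str, b: str) -> Optional[str]:
    """Result type for two dtype codes such as 'i2' or 'f4' under the minimal
    tables above; None means the proposal leaves the combination undefined."""
    kind_a, size_a = a[0], int(a[1:])
    kind_b, size_b = b[0], int(b[1:])
    if kind_a == kind_b:                    # same kind: the larger type wins
        return f"{kind_a}{max(size_a, size_b)}"
    if {kind_a, kind_b} == {"i", "u"}:      # mixed signed/unsigned integers
        i_size, u_size = (size_a, size_b) if kind_a == "i" else (size_b, size_a)
        if i_size == 8 or u_size == 8:      # not covered by the mixed table
            return None
        # Smallest signed type holding both: twice the unsigned width, at least i_size.
        return f"i{max(2 * u_size, i_size)}"
    return None                             # mixed kinds (e.g. int vs float)


print(promote("i1", "u1"), promote("u4", "i2"), promote("f2", "f8"))  # i2 i8 f8
```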

Notes

The minimal set of type promotions outlined above explicitly does not define promotions between types which are not of the same kind (i.e., floating-point versus integer). When converting between types of different kinds, libraries tend to support C type promotion semantics, where floating-point, regardless of precision, has a higher rank/precedence than all integer types; however, they differ in the promoted floating-point precision (e.g., JAX promotes (i8, f2) to f2, while NumPy promotes (i8, f2) to f8). The reason for the discrepancy stems from the particular needs/constraints of accelerator devices, and, thus, by omitting specification here, we allow for implementation flexibility and avoid imposing undue burden.
Omitted from the above tables are "unsafe" promotions. Notably, not included are promotion rules for mixed signed/unsigned 64-bit integers i8 and u8. NumPy and JAX both promote (i8, u8) to f8 which is explicitly undefined via the aforementioned note regarding conversions between kinds and also raises questions regarding inexact rounding when converting from a 64-bit integer to double-precision floating-point.
This proposal addresses type promotion among array operands, including zero-dimensional arrays. It remains to be decided whether "scalars" (i.e., non-array operands) should directly participate in type promotion.
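Assuming NumPy is installed, the NumPy behaviour described in these notes can be checked directly. This says nothing about JAX, which the notes say differs for `(i8, f2)`:

```python
import numpy as np

print(np.promote_types(np.int64, np.float16))  # float64
print(np.promote_types(np.int64, np.uint64))   # float64

# float64 cannot represent every 64-bit integer, so (i8, u8) -> f8 can round:
big = 2**63 + 1                 # fits in u8, but not exactly in f8
print(int(float(big)) == big)   # False
```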

Questions related to adoption

Here's a list of questions that we need to think about when driving adoption in both array/tensor libraries, and further downstream.

Do we want array libraries to adopt this into a public or private namespace? I.e., should it only be accessible via arrayobject.__array_namespace__(), or additionally via a direct import? If the latter, what should it be named? (related to gh-16)
What is the design pattern to work around np.asarray in downstream libraries?
Now that we have some features that are kind of optional (e.g., boolean indexing), can we define and should we recommend a testing strategy to figure out how portable some piece of code really is? Also relevant if libraries add extra things into the standard module because it's too hard to remove (e.g., methods on array object).
Should we have a central place to track adoption and compliance level?
Can downstream libraries sanely use type annotations when supporting multiple array libraries? (related to the Protocol mentioned in the Static typing section)

Please add more questions if you can think of them.

Let's change master to main

Updating the branch now before stuff grows will make things easier.

Issues with "Mixing arrays and Python scalars" section

A few issues with the mixing arrays and Python scalars section that came up from testing:

The matmul operator (@) does not (and should not) support scalar types, so it should be excluded.
Should boolean arrays be required to work with arithmetic operators? NumPy gives errors in some cases, for example:

>>> import numpy as np
>>> np.array(False) - False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.

How should casting work? For example, int8 array + large Python int: should it cast the int, or give an error? Or should this be unspecified?
The above question also applies for Python float -> float32 casting, which is implicitly supported by the text that is currently there, but is unsafe in general (is there a "safe" variant of float64 -> float32, or is it always unsafe? Either way, this would be value based casting, which I thought we were trying to avoid).

result_type for explicit promotion

Currently the recommendation for code that needs to promote is:

Therefore we had to make the choice to leave the rules for “mixed kind dtype casting” undefined - when users want to write portable code, they should avoid this situation or use explicit casts to obtain the same results from different array libraries.

In eager mode frameworks, making operations portable in this way also can result in extra memory usage, as you have to first do the cast, and then do the operation, whereas the cast might have been fused internally in the framework.

However, it's not clear how important this actually is in practice, since at least in PyTorch, a lot of promotion is in fact implemented by first doing a cast and then doing a normal operation (instead of implementing quadratically many kernels). I don't know off the top of my head which operations this would be relevant for.
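The explicit-cast style, using NumPy as the example library (an assumption; the spelling of the cast can differ between libraries). The caller picks the result dtype, casts both operands, then operates. Each cast materializes a temporary array, which is the extra memory cost discussed above:

```python
import numpy as np

a = np.arange(3, dtype=np.int32)
b = np.ones(3, dtype=np.float32)

dtype = np.float64                        # chosen explicitly by the caller
out = a.astype(dtype) + b.astype(dtype)   # two temporaries, then the addition

print(np.result_type(a, b))               # float64: what NumPy would pick implicitly
print(out.dtype, out.tolist())            # float64 [1.0, 2.0, 3.0]
```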

add a CONTRIBUTING.md file and issue/PR templates

To do before public release, to ensure people who want to contribute get the right guidance. The scope and the process for adding new features should be linked, to let people assess whether the PR they're about to make is the right thing to work on or not.

Consistent default integer and floating-point types

In today's call we discussed that the current Array API standard does not regulate the default integer and floating-point types across all implementations, only that each implementation should pick one, document clearly and stick to it. However, this is not strong enough, as there could be cross-platform/portability issues.

For example, NumPy is inconsistent in handling the Python integers between Windows and Linux:

import numpy as np
a = np.arange(10, dtype=int)
a.dtype  # np.int32 on Windows, np.int64 on Linux

Similar issues can be found in other APIs.

It's worth noting that as pointed out by @kgryte, the standard does not permit passing dtype=int to most of the functions, so this could eliminate a large class of such inconsistencies. But it's still good to be explicit in the standard to ensure portability.

How to expose API to downstream libraries?

I wanted to open a discussion on how the Array API (and potentially the dataframe API) will be exposed to downstream libraries.

For example, let's say I am the author of scikit-learn. How do I get access to an "Array compatible API"? Or let's say I am a downstream user, using scikit-learn in a notebook. How can I tell it to use TensorFlow over NumPy?

Options

I present three options here, but I would appreciate any suggestions on further ideas:

Manual

The default option is the current status quo where there is no standard way to get access to some array conformant API backend.

Different downstream libraries, like scikit-learn, could introduce their own mechanisms, like a backend kwarg to functions, if they wanted to support different backends.

Local Dispatch

Another approach would be to provide access to the related module from particular instances of the objects, which is the approach taken by NEP 37.

In this case, scikit-learn would either call some x.__array_module__() method on its inputs, or we would provide an array-api Python package with a helper function like get_array_module(x), similar to the NEP.

There is an open PR in scikit-learn (scikit-learn/scikit-learn#16574) to add support for NEP 37.

Global Dispatch

Instead of requiring an object to inspect, we could instead rely on a global context to store the "active array API" and provide ways of getting and setting it. Some form of this is implemented by SciPy, with scipy.fft.set_backend, which uses uarray.

This would probably be heavier weight than we need, but it does illustrate the general concept. I think if we implemented this, we could use context variables like Python's built-in decimal module does, i.e., something like this:

from array_api import set_backend, get_backend  # hypothetical package

import cupy

def some_fn():
    np = get_backend()
    return np.arange(10)

with set_backend(cupy):
    some_fn()

The advantage of using a global dispatch is then you don't need to rely on passing in some custom instance class to set the backend.
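A minimal sketch of the hypothetical `set_backend`/`get_backend` pair built on `contextvars`. Nesting, threads and asyncio tasks each see their own active backend. A `SimpleNamespace` stands in for cupy so the sketch runs anywhere:

```python
import contextlib
import contextvars
import types

_active_backend = contextvars.ContextVar("active_array_backend", default=None)


@contextlib.contextmanager
def set_backend(module):
    """Make `module` the active array namespace inside the with-block."""
    token = _active_backend.set(module)
    try:
        yield module
    finally:
        _active_backend.reset(token)   # restore whatever was active before


def get_backend():
    module = _active_backend.get()
    if module is None:
        raise RuntimeError("no array backend active; use `with set_backend(...)`")
    return module


def some_fn():
    np = get_backend()
    return np.arange(10)


# Stand-in namespace so the sketch runs without cupy installed.
fake_xp = types.SimpleNamespace(arange=lambda n: list(range(n)))

with set_backend(fake_xp):
    print(some_fn())   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Raising when no backend is active is one design choice. A real package might instead fall back to a default namespace.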

Static Typing

This is slightly tangential, but one question that comes up for me is how we could properly statically type options 2 or 3. It seems like what we need is a typing.Protocol but for modules. I raised this as a discussion point on the typing-sig mailing list.

Type signatures for standardized functions

I think standardized functions should include signatures that are at least in principle compatible with static type checking tools such as mypy, ideally treating dtype and shape as an implicit part of array types. This would eventually allow for static checking of code using these array APIs, which would make using these APIs much safer.

This would entail standardizing a number of implementation details beyond those in the current draft specs, e.g.,

Python types: this is very basic, but we should specify exactly which Python types are valid for various arguments, e.g., axis must use a Python int or tuple[int, ...].
array types: are operations on an array of some type required to return an array of the same type? Or merely another array (of any type) that also satisfies the standard array API (whatever we eventually define that to be)?
array shapes: do we require implementing the full version of NumPy broadcasting? Or do we define behavior in simpler cases, e.g., where all shapes match? It might also make sense to only require a more explicit subset of broadcasting behavior, without automatic rank promotion.
array dtypes: do we include dtype promotion like what NumPy uses? Or do we only define operations on a predefined set of "safe" dtypes, e.g., add(float64, float64) -> float64? One argument against codifying dtype promotion is that the details of dtype promotion are rather tricky, and NumPy's rules are unsuitable for accelerators (JAX and PyTorch implement incompatible semantics). On the other hand, requiring explicit dtype casting (like in TensorFlow) is rather annoying for users (maybe less of a problem for library code).
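A sketch of how the first two points (Python types for arguments, and "same array type in, same array type out") could be written down with a Protocol and a bound TypeVar. Dtype and shape are not encoded here. All names (`Array`, `Axis`, `mean`, `Vec`) are hypothetical:

```python
from typing import Optional, Protocol, Tuple, TypeVar, Union, runtime_checkable

Axis = Union[int, Tuple[int, ...]]      # "a Python int or tuple[int, ...]"
T = TypeVar("T", bound="Array")


@runtime_checkable
class Array(Protocol):
    """Hypothetical minimal array Protocol, for illustration only."""

    @property
    def shape(self) -> Tuple[int, ...]: ...

    # Tying `other` and the result to T expresses "same array type in and out".
    def __add__(self: T, other: T, /) -> T: ...


def mean(x: T, /, *, axis: Optional[Axis] = None, keepdims: bool = False) -> T:
    """Signature-only sketch: the return type follows the input array type."""
    raise NotImplementedError


class Vec:
    """Toy 1-D array type that satisfies the Protocol structurally."""

    def __init__(self, *xs):
        self.xs = xs

    @property
    def shape(self):
        return (len(self.xs),)

    def __add__(self, other):
        return Vec(*(a + b for a, b in zip(self.xs, other.xs)))


print(isinstance(Vec(1, 2), Array))   # True (the runtime check only looks at names)
print(isinstance([1, 2], Array))      # False: list has __add__ but no shape
```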

Linear Algebra design overview

The intent of this issue is to provide a bird's eye view of linear algebra APIs in order to extract a consistent set of design principles for current and future spec evolution.

Unary APIs

matrix_rank
qr
pinv
trace
transpose
norm
inv
det
diagonal
svd
matrix_power
slogdet
cholesky

Binary APIs:

vecdot
tensordot
matmul
lstsq
solve
outer
cross

Support stacks (batching)

Main idea is that, if an operation is explicitly defined in terms of matrices (i.e., 2D arrays), then an API should support stacks of matrices (aka, batching).

Unary:

matrix_rank
qr
pinv
inv
det
svd
matrix_power
slogdet
cholesky
norm (when axis is 2-tuple containing last two dimensions)
trace (when specify last two axes via axis1 and axis2)
diagonal (when specify last two axes via axis1 and axis2)
transpose (when specify an axis permutation in which only the last two axes are swapped)

Binary:

vecdot
matmul
lstsq
solve
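What "supporting stacks" means in practice, using NumPy's `inv` as an example (assuming NumPy is installed): the last two axes hold the matrices, and the result equals looping over the leading (batch) dimension:

```python
import numpy as np

# A stack of two 2x2 matrices: shape (2, 2, 2).
stack = np.stack([2.0 * np.eye(2), 4.0 * np.eye(2)])

inv = np.linalg.inv(stack)                     # inverts each matrix in the stack
print(inv.shape)                               # (2, 2, 2)
print(np.allclose(inv[1], 0.25 * np.eye(2)))   # True

looped = np.stack([np.linalg.inv(m) for m in stack])
print(np.allclose(inv, looped))                # True
```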

No stack (batching) support

Binary:

tensordot
outer (vectors)
cross (vectors)

Support tolerances

matrix_rank (rtol*largest_singular_value)
lstsq (rtol*largest_singular_value)
pinv (rtol*largest_singular_value)
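A sketch of the `rtol*largest_singular_value` convention for a stacked `matrix_rank`, built on NumPy's SVD. The default `rtol` of `max(M, N) * eps` mirrors NumPy's own `matrix_rank` default and is an assumption here, not something this issue specifies. `rtol` is broadcast against the stack shape, matching the "Broadcasting" entry below:

```python
import numpy as np


def matrix_rank_sketch(x, rtol=None):
    """Count singular values above rtol * largest_singular_value, per matrix."""
    s = np.linalg.svd(x, compute_uv=False)         # shape (..., min(M, N))
    if rtol is None:
        rtol = max(x.shape[-2:]) * np.finfo(s.dtype).eps
    # Broadcast rtol against the stack, then scale by each matrix's largest value.
    tol = np.asarray(rtol)[..., np.newaxis] * s.max(axis=-1, keepdims=True)
    return np.count_nonzero(s > tol, axis=-1)


stack = np.array([[[1.0, 2.0], [2.0, 4.0]],    # rank 1: second row = 2 * first
                  [[1.0, 0.0], [0.0, 1.0]]])   # rank 2
print(matrix_rank_sketch(stack))                         # [1 2]
print(matrix_rank_sketch(stack, rtol=np.array([1e-3, 2.0])))  # [1 0]
```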

Supported dtypes

Main idea here is that we should avoid undefined/ambiguous behavior. For example, when type promotion rules cannot capture behavior (e.g., a function accepts int64 input but needs to return float64), how would casting work? Because the type promotion rules only address same-kind promotion, the result would be up to the implementation, and thus ill-defined. To ensure defined behavior, if a function needs to return floating-point, it should require floating-point input.

Numeric:

vecdot: numeric (mults, sums)
tensordot: numeric (mults, sums)
matmul: numeric (mults, sums)
trace: numeric (sums)
cross: numeric (mults, sums)
outer: numeric (mults)

Floating:

matrix_rank: floating
det: floating
qr: floating
lstsq: floating
pinv: floating
solve: floating
norm: floating
inv: floating
svd: floating
slogdet: floating (due to nat log)
cholesky: floating
matrix_power: floating (exponent can be negative)

Any:

transpose: any
diagonal: any

Output values

Array:

vecdot: array
tensordot: array
matmul: array
matrix_rank: array
trace: array
transpose: array
norm: array
outer: array
inv: array
cross: array
det: array
diagonal: array
pinv: array
matrix_power: array
solve: array
cholesky: array

Tuple:

qr: Tuple[ array, array ]
lstsq: Tuple[ array, array, array, array ]
svd: array OR Tuple[ array, array, array ] (based on keyword arg)
- should consider splitting into svd and svdvals (similar to eig/eigvals)
slogdet: Tuple[ array, array ]

Note: only SVD is polymorphic in output (compute_uv keyword)

Reduced output dims

norm: supports keepdims arg
vecdot: no keepdims
matrix_rank: no keepdims
lstsq (rank): no keepdims
trace: no keepdims
det: no keepdims
diagonal: no keepdims

Conclusion: only norm is unique here in allowing the output array rank to remain the same as that of the input array.

Broadcasting

vecdot: yes
tensordot: yes
matmul: yes
lstsq: yes (first ndims-1 dimensions)
solve: yes (first ndims-1 dimensions)
pinv: yes (rtol)
matrix_rank: yes (rtol)
outer: no (1d vectors)
cross: no (same shape)

Specialized behavior

cholesky: upper
svd: compute_uv and full_matrices
norm: ord and keepdims
qr: mode
tensordot: axes

Proposal to standardize basic statistical functions

Based on the analysis of array library APIs, we know that performing basic statistical functions is both universally implemented and commonly used. Accordingly, this issue proposes to standardize the following functions:

Functions

mean
prod
std
sum
var
min (minimum)
max (maximum)

Criterion

Commonly implemented across array libraries.
Commonly used by array library consumers.
Operates on one array.

Questions

Are there any APIs listed above which should not be standardized?
Are there basic statistical functions not listed above which should be standardized? Preferably, any additions should be supported by usage data.
Should the standard recommend algorithms for increased portability? For reductions, mandating minimum precision requirements is more fraught than for the evaluation of elementary mathematical functions.
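Why reductions are harder to pin down: the result depends on the algorithm, not just on per-operation rounding. Here is a standard-library illustration with two summation strategies and two algebraically equivalent variance formulas:

```python
import math

values = [0.1] * 10
print(sum(values))        # 0.9999999999999999: naive left-to-right summation
print(math.fsum(values))  # 1.0: exactly rounded summation


def var_two_pass(xs):
    """Population variance from deviations about the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def var_one_pass(xs):
    """E[x^2] - E[x]^2: algebraically equal, numerically prone to cancellation."""
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2


data = [1e8 + 1, 1e8 + 2, 1e8 + 3]   # exact population variance is 2/3
print(var_two_pass(data))
print(var_one_pass(data))
```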

Use case: add distributed and GPU support to SciPy

When surveying a representative set of advanced users and research software engineers in 2019 (for this NSF proposal), the single most common pain point brought up about SciPy was performance.

SciPy heavily relies on NumPy (its only non-optional runtime dependency). NumPy provides an array implementation that's in-memory, CPU-only and single-threaded. Common things users ask for are:

parallel algorithms (multi-threaded or multiprocessing based)
support for distributed arrays (with Dask in particular)
support for GPUs

Some parallelism, in particular via multiprocessing, can be supported, however SciPy itself will not directly start depending on a GPU or distributed array implementation, or contain (e.g.) CUDA code - that's not maintainable given the resources for development. However, there is a way to provide distributed or GPU support. Part of the solution is provided by NumPy's "array protocols" (see gh-1), which allow dispatching to other array implementations. The main problem then becomes how to know whether this will work with a particular distributed or GPU array implementation - given that there are zero other array implementations that are even close to providing full NumPy compatibility - without adding it as a hard dependency.

It's clear that SciPy functionality that relies on compiled extensions (C, C++, Cython, Fortran) directly won't work. Pure Python code can work though. There's two main possibilities:

Testing with another package, manually or in CI, and simply provide a list of functionality that is found to work. Then make ad-hoc fixes to expand the set that works.
Start relying on a well-defined subset of the NumPy API (or a new NumPy-like API), for which compatibility is guaranteed.

Option (2) seems strongly preferable, and that "well-defined subset" is what an API standard should provide. Testing will still be needed, to ensure there are no critical corner cases or bugs between array implementations, however that's then a very tractable task.

Store function metadata in a machine readable format

It would be useful for the test suite to have the function metadata stored in a machine readable format. Currently I am parsing the function signatures from the spec files using some regular expressions, and I will probably end up parsing some other information such as types as well. This works fine for now, but it would be cleaner if this data were stored in a machine readable format, say in JSON, and the relevant parts of the spec documents generated from that automatically.

To be sure, not everything in the spec needs to be in JSON, just the parts that will need to be extracted for other things as well, such as the test suite. There should still be a lot of plain English descriptions of behavior.

This is likely too much work for version 1 given that we already have things inline in the Markdown, but it's something to consider for future iterations.
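A sketch of how a machine-readable record could feed both the docs and the test suite. The JSON schema is an assumption, not an existing file. The record describes the `arange` signature quoted in the earlier issue, and `inspect.Signature` renders it back as text:

```python
import inspect
import json

# Hypothetical metadata record; the schema is invented for this sketch.
ARANGE_JSON = """
{
  "name": "arange",
  "parameters": [
    {"name": "start", "kind": "POSITIONAL_ONLY"},
    {"name": "stop",  "kind": "KEYWORD_ONLY", "default": null},
    {"name": "step",  "kind": "KEYWORD_ONLY", "default": 1},
    {"name": "dtype", "kind": "KEYWORD_ONLY", "default": null}
  ]
}
"""


def render_signature(record: dict) -> str:
    """Build an inspect.Signature from a metadata record and render it as text."""
    params = [
        inspect.Parameter(
            p["name"],
            getattr(inspect.Parameter, p["kind"]),
            # A missing "default" key means "no default"; JSON null maps to None.
            default=p["default"] if "default" in p else inspect.Parameter.empty,
        )
        for p in record["parameters"]
    ]
    return record["name"] + str(inspect.Signature(params))


print(render_signature(json.loads(ARANGE_JSON)))
# arange(start, /, *, stop=None, step=1, dtype=None)
```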

Precedence of selecting from multiple exchange protocols?

I can imagine that a few libraries may end up in a situation where more than one protocol is supported, for example:

__array_interface__
__cuda_array_interface__
Python buffer protocol
DLPack

Does it matter which protocol an array implementation should try first? Is it up to the library implementors?

Question: complex number support in Array API?

I am wondering why complex numbers are not considered in the Array API, and whether we could give a second thought to making them native dtypes in the API.

The Dataframe API is not considered in the rest of this issue 🙂

I spent quite some time on making sure complex numbers are first-class citizens in CuPy, as many scientific computing applications require using complex numbers. In quantum mechanics, for example, complex numbers are the cornerstones and we can't live without them. Even in some machine learning / deep learning works that we do, either classical or quantum (yes, for those who don't know already there is quantum machine learning 😁), we also need complex numbers in various places like building tensors or communicating with simulations, especially those applying physics-aware neural networks, so it is a great pain to us not being able to build and operate on complex numbers natively.

To date, complex numbers are also an integral part of mainstream programming languages. For example, C has had them since C99, and so has C++ (std::complex). Our beloved Python has complex too, so it is just so weird IMHO that they are excluded when we talk about native dtypes.

As for language extensions to support GPUs, in CUDA we have thrust::complex (which currently supports complex64/complex128) as a clone of std::complex and it is likely that libcu++ will replace Thrust on this aspect, and in ROCm there's also a Thrust clone and native support in HIP, so at least on NVIDIA/AMD GPUs we are good.

Turning to library support, as far as I know

NumPy supports complex64/complex128, but not complex32 (numpy/numpy#14753)
CuPy supports complex64/complex128, and complex32 is being evaluated (ex: cupy/cupy#4454)
PyTorch's support for complex32/complex64/complex128 is catching up (I am unaware of any meta-issue summarizing the status quo, but the label module: complex is a good reference)
SciPy / cupyx.scipy has many components supporting complex numbers, the most recent prominent case being the extensive ndimage overhaul (ex: scipy/scipy#12725) done by @grlee77 for image processing (yes, image processing also needs complex numbers!)

The reason I also mention complex32 above is that CUDA now provides complex32 support in some CUDA libraries like cuBLAS and cuFFT. With special hardware acceleration for float16, complex32 is expected to benefit as well; see the preliminary FFT test being done in cupy/cupy#4407. Hopefully, with complex number support in ML/DL frameworks (complex64 and complex128 are enough to start), many more applications can benefit as well.

I am aware that Array API picks DLPack as the primary protocol for zero-copy data exchange, and that it currently lacks complex number support. This is one of the reasons I do not like DLPack. While I will create a separate issue to discuss alternatives to DLPack, I think revising DLPack's format is fairly straightforward (and should be done ASAP regardless of the Array API standardization, due to the needs of ML/DL libraries).

Disclaimer: This issue is merely for my research interests (relevant to my and other colleagues' work) and is not driven by CuPy, one of the Array API stakeholders I will represent.

Copy-view behaviour and mutating arrays

Context:

in #20 (comment) we discussed the differences between libraries in returning a copy vs. a view from function calls.
in #8 we were discussing how to deal with mutability

That issue and PR were about unrelated topics, so I'll try to summarize the copy-view and mutation topic here and we can continue the discussion.

Note that the two topics are fairly coupled, because copy/view differences only matter (for semantics, not for performance) when mixed with mutation.

Mutating arrays

There are a number of things that may rely on mutation:

In-place operators like +=, *=
The out= keyword argument
Element and slice assignment with __setitem__

Summary of the issue with mutation by @shoyer was: Mutation can be challenging to support in some execution models (at least without another layer of indirection), which is why several projects currently don't support it (TensorFlow and JAX) or only support it half-heartedly (e.g., Dask). The commonality between these libraries is that they build up abstract computations, which are then transformed (e.g., for autodiff) and/or executed in parallel. Even NumPy has "read only" arrays. I'm particularly concerned about new projects that implement this API, which might find the need to support mutation burdensome.

@alextp said: TensorFlow was planning to add mutability and didn't see a real issue with supporting out=.

@shoyer said: It's definitely always possible to support mutation at the Python level via some sort of wrapper layer.

dask.array is perhaps a good example of this. It supports mutating operations and out in some cases, but its support for mutation is still rather limited. For example, it doesn't support assignment like x[:2, :] = some_other_array.

The lack of support for mutation can usually be worked around in one of two ways:

1. Use where for selection, e.g., where(arange(4) == 2, 1, 0)
2. Calculate the "inverse" of the assignment operator in terms of indexing, e.g., y = array([0, 1]); x = y[[0, 0, 1, 0]] in this case

Some version of (2) always works, though it can be tricky to work out (especially with current APIs). The duality between indexing and assignment is the difference between specifying where elements come from or where they end up.
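To make the two workarounds concrete, here is a minimal NumPy sketch (NumPy is used only because it supports both the mutating and non-mutating styles). It assumes the original assignment was x = zeros(4); x[2] = 1, which is the result both examples in the list produce, and shows that where-based selection and the indexing "inverse" give the same array without mutating anything.

```python
import numpy as np

# Mutating version (assumed starting point): allocate zeros, then assign one element.
x_mut = np.zeros(4, dtype=np.int64)
x_mut[2] = 1

# Workaround 1: select values with where instead of assigning into an array.
x_where = np.where(np.arange(4) == 2, 1, 0)

# Workaround 2: "invert" the assignment into a gather. y holds the candidate
# values and the index array says where each output element comes FROM.
y = np.array([0, 1])
x_gather = y[[0, 0, 1, 0]]

print(x_mut.tolist(), x_where.tolist(), x_gather.tolist())
# [0, 0, 1, 0] [0, 0, 1, 0] [0, 0, 1, 0]
```

The gather form is the "where elements come from" view of the same operation that assignment expresses as "where elements end up".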

JAX's syntax for slice assignment is x.at[idx].set(y), where NumPy would use x[idx] = y.

One advantage of the non-mutating version is that JAX can have reliable assigning arithmetic on array slices with x.at[idx].add(y) (x[idx] += y doesn't work if x[idx] returns a copy).
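The semantics of the non-mutating style can be emulated in plain NumPy. In this sketch, at_set and at_add are hypothetical helpers (not JAX's implementation, and not a NumPy API) that return a new array and leave the input untouched. at_add uses np.add.at, which is unbuffered, so repeated indices accumulate; plain NumPy out[idx] += y applies each repeated index only once.

```python
import numpy as np

def at_set(x, idx, y):
    """Hypothetical functional analogue of x.at[idx].set(y): returns a new array."""
    out = x.copy()       # full copy, so the caller's array is never mutated
    out[idx] = y
    return out

def at_add(x, idx, y):
    """Hypothetical functional analogue of x.at[idx].add(y)."""
    out = x.copy()
    np.add.at(out, idx, y)   # unbuffered: index 0 below is incremented twice
    return out

x = np.zeros(4)
x2 = at_set(x, 1, 5.0)
x3 = at_add(x, [0, 0, 3], 1.0)
print(x.tolist(), x2.tolist(), x3.tolist())
# [0.0, 0.0, 0.0, 0.0] [0.0, 5.0, 0.0, 0.0] [2.0, 0.0, 0.0, 1.0]
```

Note the x.copy() in each helper: without a compiler that can reuse the buffer, every functional update costs a full copy.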

A disadvantage is that doing this sort of thing inside a loop is almost always a bad idea unless you have a JIT compiler, because every indexed assignment makes a full copy. So the naive translation of an efficient Python loop that fills out an array row by row would now make a copy at each step. Instead, you'd have to rewrite that loop to use something like concatenate (which in my experience is already about as efficient as using indexed assignment).
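A small NumPy sketch of the three loop shapes discussed above; make_row is a hypothetical stand-in for whatever computes each row. The naive non-mutating translation copies the whole array on every iteration, while the rewrite collects the rows and calls concatenate once.

```python
import numpy as np

n_rows, n_cols = 3, 4

def make_row(i):
    # Hypothetical per-row computation.
    return np.arange(n_cols) * i

# 1. Mutating version: preallocate, then fill one row per iteration.
filled = np.empty((n_rows, n_cols), dtype=np.int64)
for i in range(n_rows):
    filled[i, :] = make_row(i)

# 2. Naive non-mutating translation: each step copies the full array
#    (this stands in for x.at[i].set(row)), so total work grows quadratically.
naive = np.zeros((n_rows, n_cols), dtype=np.int64)
for i in range(n_rows):
    nxt = naive.copy()
    nxt[i, :] = make_row(i)
    naive = nxt

# 3. Rewrite without mutation: build all rows, then join them once.
stacked = np.concatenate([make_row(i)[None, :] for i in range(n_rows)], axis=0)
```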

Copy-view behaviour

Libraries like NumPy and PyTorch return views from function calls where possible. It's sometimes hard to predict when a view will be returned and when a copy will be: this depends not only on the function in question, but also on whether the input array is contiguous, and sometimes even on the input dtype.
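For example, in NumPy the same reshape call returns a view or a copy depending only on the memory layout of its input. np.shares_memory reports which one was returned.

```python
import numpy as np

a = np.zeros((2, 3))      # C-contiguous array that owns its data

r1 = a.reshape(6)         # contiguous input: NumPy returns a view
r2 = a.T.reshape(6)       # transposed, non-contiguous input: NumPy must copy

print(np.shares_memory(a, r1))  # True
print(np.shares_memory(a, r2))  # False
```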

This is one place where it's hard to avoid implementation choices leaking into the API:

Static-graph-based implementations like TensorFlow and MXNet, or a functional implementation like JAX with immutable arrays, will return a copy for a function like transpose().
Implementations which support strides and/or use a dynamic graph are able to, and therefore often will, return a view when they can (which is the case for transpose()).

The above copy vs. view difference starts leaking into the API (i.e., the same code starts giving different results for different implementations) when it is combined with an operation that mutates an array in place, either the base array or the view on it. In the absence of that combination, views are simply a performance optimization that's invisible to the user.
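The following sketch shows that leak. The same code mutates the result of transpose(); with NumPy's view semantics the change reaches the original array, while a library whose transpose copies (emulated here with an explicit .copy()) leaves it untouched.

```python
import numpy as np

def run(transpose):
    x = np.zeros((2, 2))
    y = transpose(x)
    y[0, 1] = 1.0           # mutate the result of transpose()
    return float(x[1, 0])   # did the write reach the original array?

view_semantics = run(lambda a: a.T)          # NumPy: transpose returns a view
copy_semantics = run(lambda a: a.T.copy())   # emulates a library that copies

print(view_semantics, copy_semantics)  # 1.0 0.0
```

Without the y[0, 1] = 1.0 line, both runs return 0.0, which is why views alone are invisible to the user.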

The question is whether copy-view differences should be allowed, and if so how to deal with the semantics that vary between libraries.

To decide whether it should be allowed, let's first ask how often the combination of views and mutation is used. A few observations:

It is normally considered a bug if a library function (e.g., a SciPy or scikit-learn one) mutates any of its input arguments, unless the function is explicitly documented as doing so, which is rare. So the main concern is mutation inside functions, of arrays that are either created inside the function or are copies of input arrays.
A search for patterns like *=, += and ] = in SciPy and scikit-learn .py files shows that in-place mutation inside functions is heavily used.
There's a significant difference between mutating a complete array (e.g. with += 1) and mutating part of an array (e.g. with x[:, :2] = y). The former is a lot easier to support for array libraries employing static graphs or a JIT than the latter. See the discussion at #8 (comment) for details.
It's harder to figure out how often mutating part of an array affects a view. This could be tested, though, by patching NumPy to raise an exception on mutations that affect a view and then running the test suites of downstream libraries.

Options for how to standardize

In #8 @shoyer listed the following options for how to deal with mutability:

Require support for in-place operations. Libraries that don't support mutation fully will need to write a wrapper layer, even if it would be inefficient.
Make support for in-place operations optional. Arrays can indicate whether they support mutation via some standard API, e.g., like NumPy's ndarray.flags.writeable. (See the later discussion in #8 (comment) for what this implies for users of the API.)
Don't include support for in-place operations in the spec. This is a conservative choice, one which might have negative performance consequences (but it's a little hard to say without looking carefully). At the very least, it might require a library like SciPy to retain a special path for numpy.ndarray objects.
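Under the optional-support option, code written against the API would have to check a capability flag before mutating. This sketch uses NumPy's real ndarray.flags.writeable as that flag; add_one is a hypothetical function that assumes its argument is a scratch array it is allowed to modify (not a caller's input).

```python
import numpy as np

def add_one(x):
    """Hypothetical helper: mutate in place only when the array permits it."""
    if x.flags.writeable:
        x += 1          # in-place path
        return x
    return x + 1        # out-of-place fallback for read-only arrays

a = np.zeros(3)
b = np.zeros(3)
b.flags.writeable = False

add_one(a)              # a is modified in place
c = add_one(b)          # b is untouched; c is a new array

write_refused = False
try:
    b[0] = 5.0          # NumPy raises ValueError for read-only arrays
except ValueError:
    write_refused = True
```

Every such function needs two code paths, which is the burden on API users that the optional-support option implies.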

To that I'd like to add a more granular option:

Require support for in-place operations that are unambiguous, and require raising an exception in case a view is mutated.

Rationale:

(a) This would require libraries that don't support mutation to write a wrapper layer, but the behaviour would be unambiguous and in most cases the wrapper would not be inefficient.
(b) If inefficient mutation is detected (e.g., mutating a large array row by row in a loop), a warning may be emitted.
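A rough sketch of the "raise if a view is mutated" rule, using NumPy only for illustration. checked_setitem is hypothetical, and treating a non-None x.base as "x is a view" is a crude heuristic. It only catches writes made through a view, not writes to a base array that still has live views, and it would also flag arrays such as np.arange(6).reshape(2, 3) whose base nobody else references.

```python
import numpy as np

def checked_setitem(x, key, value):
    """Hypothetical guard: refuse element/slice assignment through a view."""
    if x.base is not None:      # NumPy sets .base on arrays that view another array
        raise ValueError("mutating a view is not allowed")
    x[key] = value

a = np.zeros((2, 3))
checked_setitem(a, (0, 0), 1.0)   # allowed: a owns its data

v = a[:, :2]                      # basic slicing returns a view of a
refused = False
try:
    checked_setitem(v, (0, 1), 2.0)
except ValueError as exc:
    refused = True
    print(exc)
```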

A variant of this option would be:

Require support for in-place operations that are unambiguous and mutate the whole array at once (i.e. += and out= must be supported, element/slice assignment must raise an exception), and require raising an exception in case a view is mutated.

The trade-off here is between ease of implementation for libraries like Dask and JAX on one side, and a rewrite burden on SciPy et al. plus a usability burden on end users on the other (the alternatives to element/slice assignment are unintuitive).
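To show why whole-array mutation is the easier case, here is a hypothetical wrapper (WholeArrayMutable is illustrative only, not from any library) that supports += by rebinding its internal buffer to a fresh result, with no partial writes, while refusing element/slice assignment as the variant requires. A library that builds abstract computations could implement += the same way, by swapping in a new node.

```python
import numpy as np

class WholeArrayMutable:
    """Hypothetical wrapper: whole-array in-place ops work, slice assignment raises."""

    def __init__(self, data):
        self._data = np.array(data)        # private copy, never exposed as a view

    def __iadd__(self, other):
        self._data = self._data + other    # replace the whole buffer at once
        return self

    def __setitem__(self, key, value):
        raise TypeError("element/slice assignment is not supported")

    def to_numpy(self):
        return self._data.copy()

x = WholeArrayMutable([1.0, 2.0, 3.0])
x += 1
print(x.to_numpy().tolist())   # [2.0, 3.0, 4.0]

slice_refused = False
try:
    x[0] = 10.0
except TypeError:
    slice_refused = True
```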

Supported data types

This issue seeks to come to a consensus on the minimum set of data types an array library must support in order to conform to the specification.

Prior Art

Supported data types across array libraries...

NumPy

bool_
bool8
byte
short
intc
int_
longlong
intp
int8
int16
int32
int64
ubyte
ushort
uintc
uint
ulonglong
uintp
uint8
uint16
uint32
uint64
half
single
double
float_
longfloat
float16
float32
float64
float96
float128
csingle
complex_
clongfloat
complex64
complex128
complex192
complex256
object_
bytes_
unicode_
void

PyTorch

bfloat16
bool
complex64
complex128
float16
float32
float64
int8
int16
int32
int64
uint8

Tensorflow

bool
bfloat16
complex64
complex128
float16
float32
float64
int16
int32
int64
qint8
qint16
qint32
quint8
quint16
string
uint8
uint16
uint32
uint64

bool
bfloat16
complex64
complex128
float16
float32
float64
int8
int16
int32
int64
uint8
uint16
uint32
uint64

CuPy

bool_
complex64
complex128
float16
float32
float64
int8
int16
int32
int64
uint8
uint16
uint32
uint64

Dask (see NumPy)
MXNet (see NumPy)
PyData/Sparse (see NumPy)

Proposal

This issue proposes to specify that all specification conforming array libraries must, at minimum, support the following data types:

bool

int8
int16
int32
int64

uint8
uint16
uint32
uint64

float32
float64

The above data types are common across all array libraries considered in the prior art, with the exception of PyTorch, which supports only uint8 among the unsigned integer types.
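A conformance check for this minimum set could be as simple as trying to resolve each name. In this sketch, missing_dtypes and torch_like_lookup are hypothetical. np.dtype serves as a real lookup, and the PyTorch list from the prior art above (which may be out of date) is encoded as a set to show which names it would miss.

```python
import numpy as np

# The minimum set of data types proposed above.
REQUIRED_DTYPES = [
    "bool",
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float32", "float64",
]

def missing_dtypes(lookup):
    """Hypothetical helper: return the required names that `lookup` cannot resolve."""
    missing = []
    for name in REQUIRED_DTYPES:
        try:
            lookup(name)
        except (TypeError, AttributeError):
            missing.append(name)
    return missing

# The PyTorch list from the prior art above, as a set of names.
TORCH_LIKE = {"bfloat16", "bool", "complex64", "complex128", "float16", "float32",
              "float64", "int8", "int16", "int32", "int64", "uint8"}

def torch_like_lookup(name):
    if name not in TORCH_LIKE:
        raise TypeError(f"unsupported dtype: {name}")
    return name

print(missing_dtypes(np.dtype))           # []
print(missing_dtypes(torch_like_lookup))  # ['uint16', 'uint32', 'uint64']
```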

Notes

complex64 and complex128 are currently omitted from this proposal, as I'd like to defer consideration of some of the thornier aspects of how complex numbers are handled to future specification iterations. The proposed types, by contrast, have considerable prior art and are well-established, and, when questions arise regarding their behavior, normative references, such as IEEE 754 for floating-point arithmetic, are available.

