data-apis / array-api
RFC document, tooling and other content related to the array API standard
Home Page: https://data-apis.github.io/array-api/latest/
License: MIT License
Following up on gh-129: we removed the tiny attribute from finfo because it's badly named. In the discussion @kgryte explained its purpose and proposed instead to add two new attributes: smallest_normal and smallest_subnormal. That does sound like a good idea; however, it'd be good to propose adding those attributes to numpy.finfo first, because it's not existing API, and if NumPy doesn't want to extend finfo then we're probably better off leaving it out.
Some issues with type annotations that were brought up by @BvB93 at numpy/numpy#18585:
- For asarray, it isn't clear how to define NestedSequence (typing annotations don't support recursive types).
- asarray needs to also include array as a valid type for obj (array should not be considered a subtype of NestedSequence).
- [...] NestedSequence type in asarray)
- For stack and concat, the type for arrays should be Tuple[array, ...] instead of Tuple[array].
- For stack and concat (and possibly others), Tuple is too restrictive vs. Sequence.
- unique can be implemented as an @overload (see numpy/numpy#18585 (comment)). I don't know if this needs to be changed in the spec.
- [...] finfo as finfo doesn't make sense if finfo is a function (see numpy/numpy#18585 (comment)).
- __getitem__ and __setitem__ should include the ellipsis type (numpy/numpy#18585 (comment)).
- __len__ should return int.
- [...] __len__ is slated for removal from the spec; see #289)
- shape is currently Union[ Tuple[ int, …], <shape> ], which needs to be addressed (there is a TODO in the spec for this). (update: see #289)
- [...] <...> types. We should make all the type annotations be valid Python/mypy that can be used in the annotations in the signature.
- [...] typing.Protocol (numpy/numpy#18585 (comment))? (I don't know if this is relevant for the spec).
- [...] __add__, and so on) should include int, float, (and bool as appropriate) as allowed types for other.

This issue is meant to collect libraries that we should be aware of and perhaps take into account (data on how their API looks, impact of choices on those libraries, etc.).
Array and tensor libraries (TODO: classify main characteristics):
And related projects (accelerators, runtimes, compiler infrastructure, etc.):
It makes sense to require positional-only arguments for functions like add() where there are no meaningful names, and likewise to require keyword arguments for true options.
However, this is less useful for most creation and manipulation functions. For example, arange currently has the signature arange(start, /, *, stop=None, step=1, dtype=None), but readable code could pass both start and stop as either positional or keyword arguments, e.g., np.arange(start, stop) and np.arange(start=0, stop=10).
I would suggest revisiting all of these functions and allowing arguments to be positional and keyword based when appropriate. Here are my suggestions off-hand:
- arange(start, /, *, stop=None, step=1, dtype=None) -> arange(start, stop=None, *, step=1, dtype=None)
- empty(shape, /, *, dtype=None) -> empty(shape, dtype=None)
- full(shape, fill_value, /, *, dtype=None) -> full(shape, fill_value, *, dtype=None)
- linspace(start, stop, num, /, *, dtype=None, endpoint=True) -> linspace(start, stop, num, *, dtype=None, endpoint=True)
- ones and zeros should match empty
- expand_dims(x, axis, /) -> expand_dims(x, /, axis) or even expand_dims(x, /, *, axis) to match other manipulation functions
- reshape(x, shape, /) -> reshape(x, /, shape)
- roll(x, shift, /, *, axis=None) -> roll(x, /, shift, *, axis=None)
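To make the difference between the two arange signatures concrete, here is a minimal sketch in plain Python. The function names arange_current and arange_proposed, the helper accepts, and the list-based return value are assumptions made for illustration; they are not part of the spec or of any library.

```python
def _as_range(start, stop, step):
    # NumPy-style convention: a lone `start` is treated as the stop value.
    if stop is None:
        start, stop = 0, start
    return list(range(start, stop, step))


def arange_current(start, /, *, stop=None, step=1, dtype=None):
    # Signature as currently written: `start` is positional-only and
    # everything else is keyword-only. `dtype` is ignored in this toy.
    return _as_range(start, stop, step)


def arange_proposed(start, stop=None, *, step=1, dtype=None):
    # Suggested signature: `start` and `stop` may be passed either way.
    return _as_range(start, stop, step)


def accepts(func, *args, **kwargs):
    # True if the call binds to the signature, False on a TypeError.
    try:
        func(*args, **kwargs)
    except TypeError:
        return False
    return True


print(accepts(arange_current, 0, 10))               # False
print(accepts(arange_current, start=0, stop=10))    # False
print(accepts(arange_proposed, 0, 10))              # True
print(accepts(arange_proposed, start=0, stop=10))   # True
```

With the current signature the only accepted spelling is arange_current(0, stop=10); the proposed one admits both of the readable forms mentioned above.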
May as well do this before making things public, should be very quick. The green and dark blue will look better than the bright purple that's used now.
For array creation functions, device support will be needed, unless we intend to only support operations on the default device. Otherwise, any function that creates a new array (e.g., creating the output array with empty() before filling it with the results of some computation) will place that array on the default device, and an exception will be raised if an input array is on a non-default device.
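The failure mode can be sketched with a toy array type. Everything below (the Array class, the device strings, and the full and add helpers) is hypothetical and only stands in for a real library; the point is that a creation function without a device argument silently lands on the default device.

```python
from dataclasses import dataclass

DEFAULT_DEVICE = "cpu"  # hypothetical default device name


@dataclass
class Array:
    data: list
    device: str = DEFAULT_DEVICE


def full(n, fill_value, *, device=None):
    # Creation function with an optional `device` keyword.
    return Array([fill_value] * n, DEFAULT_DEVICE if device is None else device)


def add(x, y):
    # Like most libraries, refuse to mix devices implicitly.
    if x.device != y.device:
        raise ValueError(f"device mismatch: {x.device} vs {y.device}")
    return Array([a + b for a, b in zip(x.data, y.data)], x.device)


def add_one_naive(x):
    # The helper array is created on the default device.
    return add(x, full(len(x.data), 1.0))


def add_one_device_aware(x):
    # The helper array is created on the same device as the input.
    return add(x, full(len(x.data), 1.0, device=x.device))


gpu = Array([1.0, 2.0], device="gpu:0")
print(add_one_device_aware(gpu))  # works, result stays on "gpu:0"
# add_one_naive(gpu) raises ValueError because of the device mismatch
```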
We discussed this in the Aug 27th call, and the preference was to do something PyTorch-like, perhaps a simplified version to start with (we may not need the context manager part), as the most robust option. Summary of some points that were made:
- [...] .shape attribute is also a tensor, and that interacts badly with its context manager approach to specifying devices, because metadata like .shape typically should live on the host, not on an accelerator.
- [...] device= keywords
- [...] pmaps can be decorated to override that. The difference with other libraries that use a context is that JAX is fairly (too) liberal about implicit device copies.

Links to the relevant docs for each library:
Next step should be to write up a proposal for something PyTorch-like.
In the rendered spec, there is a weird formatting issue. Every piece of inline code includes an extra space after it. For example, here:
It looks like:
x_i is NaN , the result is NaN .
instead of
x_i is NaN, the result is NaN.
If there is already a space after the code, it is rendered as code. If there isn't, one is added. The added spaces are also there if you copy-paste the text.
I'm not sure why this is happening, if it is some issue with Myst or our theme or something else.
The argmin and argmax functions require the keepdims keyword argument, same as min and max. However, in NumPy, min and max have this keyword argument, but argmin and argmax do not. We should confirm whether this keyword argument actually makes sense for these functions, or whether it was just added to these functions by mistake because it is also in the non-arg variants.
The description for keepdims is as follows:
keepdims : bool
If True , the reduced axes (dimensions) must be included in the result as singleton dimensions, and, accordingly, the result must be compatible with the input array (see Broadcasting ). Otherwise, if False , the reduced axes (dimensions) must not be included in the result. Default: False .
Perhaps the reason NumPy doesn't implement this for the arg* functions is that they return indices, so maintaining broadcastability is not important. The documentation for max says (emphasis added):
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
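As a small illustration of the broadcasting point, the sketch below uses NumPy's max with keepdims=True, and then emulates a kept dimension for argmax with expand_dims rather than assuming argmax itself accepts keepdims. The example data and variable names are made up.

```python
import numpy as np

x = np.array([[1, 5, 2],
              [7, 3, 4]])

# With keepdims=True the reduced axis stays as a size-1 dimension,
# so the result broadcasts against the input.
row_max = np.max(x, axis=1, keepdims=True)   # shape (2, 1)
centered = x - row_max                       # shape (2, 3)

# argmax returns indices; adding the reduced axis back by hand gives
# an index array that can be fed to take_along_axis.
idx = np.argmax(x, axis=1)                   # shape (2,)
idx_kept = np.expand_dims(idx, axis=1)       # shape (2, 1)
picked = np.take_along_axis(x, idx_kept, axis=1)

print(row_max.shape, idx.shape, idx_kept.shape)
```

So a kept dimension is not about broadcasting the indices against the data, but it can still be convenient when the indices are used to gather values along the same axis.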
In the current specification, there are no standardized data type objects, nor a specification as to what a data type object needs to implement: https://data-apis.org/array-api/latest/API_specification/data_types.html
Additionally, there are no APIs in the specification for checking dtypes, i.e. something like np.issubdtype that is somewhat commonly used in code that would look something like:
import numpy as np

def dispatch_based_on_dtype(array):
    # Branch on the kind of dtype rather than on a specific dtype.
    if np.issubdtype(array.dtype, np.integer):
        return ...
    elif np.issubdtype(array.dtype, np.floating):
        return ...
    else:
        return ...
cc @jakirkham as I believe this pattern is used quite a bit in Dask which would presumably want to be able to target arbitrary array objects under the hood
Timeline (high-level)
Array API standard document intermediate steps:
Aug 10:
Aug 17:
[...] svd and qr TODO)
Aug 24:
[...] meshgrid) to be added)
Aug 31:
Added later:
As discussed in #25 (comment), we need to resolve if, and how, we should specify a sort order when returning unique values.
A corollary issue is, if we support an optional keyword to return sorted unique values, whether we should also support specifying the sort direction (ascending vs descending).
We could limit sorting to ascending order, but that may be considered an arbitrary restriction, which in turn lends support to the argument for not returning sorted output at all and instead punting sorting to userland, where sort order can be specified via sort(). However, as discussed in the OP, combining unique/sort may allow implementation perf optimizations which cannot be replicated when performed as two separate steps.
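The two designs can be contrasted with a plain-Python sketch. The function names and the descending keyword are hypothetical, and lists stand in for arrays; nothing here reflects what the spec ends up choosing.

```python
def unique_unsorted(values):
    # Deduplicate while keeping first-occurrence order; sorting is left
    # to the caller ("punting to userland").
    return list(dict.fromkeys(values))


def unique_sorted(values, *, descending=False):
    # Deduplicate and sort in one call, with an optional direction.
    return sorted(set(values), reverse=descending)


data = [3, 1, 3, 2, 1]
print(unique_unsorted(data))                 # [3, 1, 2]
print(sorted(unique_unsorted(data)))         # [1, 2, 3]
print(unique_sorted(data, descending=True))  # [3, 2, 1]
```

The two-step form gives the same answer as the combined one; the open question above is only whether a combined call lets implementations do the work more cheaply.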
The top level URLs https://data-apis.github.io/ and https://data-apis.github.io/array-api/ give 404. It should be straightforward to make these work. The first can be done by adding a data-apis.github.io repo on this org with a simple index.html page that redirects to the main page. The other can be done similarly with an index.html on the gh-pages branch of this repo.
The description of __abs__ and especially __add__ seems to try to go into great detail about how floating point addition / abs works.
I'd argue this obscures the intent behind what is ultimately a container API, and it would be better to refer to the IEEE 754 spec (or perhaps add a separate page summarizing IEEE 754).
I have been working on attempting to auto-generate a version of the NumPy API based on its usage from downstream libraries. I am far enough along to present some end to end results, but I still need to run it with more examples for it to be that meaningful.
Here is the generating numpy module, based on running the skimage, xarray, and sklearn test suites.
I would appreciate any feedback on the end result or the process. My next steps are to start looking for more codebases to run and analyze. If you wanna take it for a spin, please feel free to clone the repo and run it on your own codebase, and upload the results as well. I will work on adding some more instructions, but the Makefile should get you started.
Also, it would be nice to match it against the documentation data or other more curated resources. We could also experiment with hand writing a list of included functions/classes, and letting this generate signatures for us.
Broadly speaking, this can help us get a sense of what the current API usage looks like for different array libraries and so could help form the base of a proposed API spec. The JSON format is a bit verbose, but does work at describing the different forms of the APIs.
Any other ideas on where to move with this would be appreciated. Or better yet, download the data and tools yourself and see if it's useful.
That prettier form is generated from a structured JSON file, which in turn is generated from the various traces of running the different test suites.
It works by using the setprofile hook to intercept every bytecode execution and peek at the stack to see whether it's a function call and, if so, what the function and arguments are. It then saves calls from some particular module (xarray and skimage in this case) to some particular module (numpy), ignoring the rest.
For the API generation, it tries to take the union of the various types and call signatures to come up with a single signature for each function.
Lots of limitations here, but it gives a start. Again, any feedback would be much appreciated.
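For readers unfamiliar with the hook, here is a much-simplified sketch of the idea using sys.setprofile. It is not the tool described above: it only counts calls into C-implemented functions of one target module (math is used as a stand-in for numpy), it does not inspect bytecode or arguments, and the names record_calls and workload are made up.

```python
import math
import sys
from collections import Counter


def record_calls(func, target_module):
    """Run `func` and count calls it makes into `target_module`."""
    counts = Counter()

    def profiler(frame, event, arg):
        # For "c_call" events, `arg` is the C function being called.
        if event == "c_call" and getattr(arg, "__module__", None) == target_module:
            counts[arg.__name__] += 1

    sys.setprofile(profiler)
    try:
        func()
    finally:
        sys.setprofile(None)  # always remove the hook again
    return counts


def workload():
    # Stand-in for a downstream test suite.
    math.sqrt(4.0)
    math.sqrt(9.0)
    math.floor(2.5)


counts = record_calls(workload, "math")
print(counts)
```

A real recorder would also need to capture argument types and filter by the calling module, which is where most of the complexity mentioned above comes from.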
JAX is built on top of XLA which (currently) requires static memory allocation for operations. This means it is not possible to express operations like x[y] where y is a boolean array, because the resulting array has a value-dependent size: https://data-apis.github.io/array-api/latest/API_specification/indexing.html#boolean-array-indexing
It's definitely possible that the static memory allocation requirement could be relaxed in the future, but dynamic memory allocation is always going to be harder to implement in a performant way. For example, I would guess Numba also struggles with this sort of operation. I don't think we should require it for array libraries implementing our standard, since it isn't needed for the majority of array operations.
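The distinction between a value-dependent and a static output size can be shown with plain Python lists standing in for arrays. The helper names boolean_select and masked_fill are hypothetical; masked_fill plays the role of a where()-style operation whose output size is known from the input size alone.

```python
def boolean_select(x, mask):
    # Analogue of x[mask]: the output length depends on the *values*
    # in `mask`, so it cannot be known before looking at the data.
    return [v for v, keep in zip(x, mask) if keep]


def masked_fill(x, mask, fill=0):
    # Analogue of where(mask, x, fill): the output length always
    # equals len(x), whatever the mask contains.
    return [v if keep else fill for v, keep in zip(x, mask)]


x = [10, 20, 30, 40]
print(len(boolean_select(x, [True, False, True, False])))  # 2
print(len(boolean_select(x, [True, True, True, False])))   # 3
print(len(masked_fill(x, [True, False, True, False])))     # 4
print(len(masked_fill(x, [True, True, True, False])))      # 4
```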
Reading through the standard, it appears that we may have missed an important feature: the ability to explicitly coerce objects into a desired array type, either from builtin Python types like float/list or from other array libraries. In other words, we need something like NumPy's array() and/or asarray() functions.
@jakevdp suggested that a nice way to summarize the signed/unsigned integer type promotion rules would be with a lattice, e.g.,
i* denotes a Python int (with unspecified precision).
This is a subset of the full type promotion lattice from the JAX docs:
https://jax.readthedocs.io/en/latest/type_promotion.html
The lattice for floats would just be f* -> f4 -> f8.
Based on the analysis of array library APIs, we know that performing element-wise arithmetic operations is both universally implemented and commonly used. Accordingly, this issue proposes to standardize the following arithmetic operations:
Some libraries, particularly those with a graph-based computational model (e.g., Dask and TensorFlow), have support for "unknown" or "data dependent" shapes, e.g., due to boolean indexing such as x[y > 0] (#84). Other libraries (e.g., JAX and Dask in some cases) do not support some operations because they would produce such data dependent shapes.
We should consider a standard way to represent these shapes in shape attributes, ideally some extension of the "tuple of integer" format used for fully known shapes. For example, TensorFlow and Dask currently use different representations:
- [...] TensorShape object (which acts very similarly to tuple), where some values may be None
- [...] nan integer of integers

The current specification is "Boolean (True or False) stored as a byte."
https://data-apis.github.io/array-api/latest/API_specification/data_types.html#bool
In my view, storing the data as a byte is an implementation detail not appropriate to include in the specification, particularly given our goal to be hardware agnostic. For example, it should be possible to make a compliant array library that stores booleans in a single bit each.
To help further the discussion of what array APIs should be included in the standard, I've compiled a (WIP) list of common APIs across various array libraries.
This list should provide some indication as to API importance from the library development perspective based on API curation and need and should summarize current existing practice.
To standardize a common set of core APIs and minimal signatures (i.e., argument order and keyword arguments) that every array API should implement in order to be array specification compliant.
I compiled the list by doing the following:
The following libraries were analyzed:
The following APIs were found to be common across the above libraries (using NumPy's naming conventions):
angle
arange
arccos
arcsin
arctan
arctan2
argmax
argmin
array
ceil
concatenate
conj
cos
cosh
cumprod
cumsum
einsum
exp
expm1
eye
flip
floor
full
imag
linalg.cholesky
linalg.inv
linalg.norm
linalg.qr
linalg.solve
linalg.svd
linspace
log
log1p
logaddexp
matmul
maximum
mean
meshgrid
minimum
ones
ones_like
prod
real
reshape
roll
sign
sin
sinh
sqrt
square
squeeze
stack
std
sum
tan
tanh
tensordot
trace
transpose
trunc
var
where
zeros
zeros_like
We can split these APIs into various categories as follows...
arange
array
eye
full
linspace
meshgrid
ones
ones_like
zeros
zeros_like
concatenate
flip
reshape
roll
squeeze
stack
ceil
exp
expm1
floor
log
log1p
logaddexp
maximum
minimum
sign
square
sqrt
trunc
arccos
arcsin
arctan
arctan2
cos
cosh
sin
sinh
tan
tanh
angle
conj
imag
real
cumprod
cumsum
mean
prod
std
sum
var
einsum
linalg.cholesky
linalg.inv
linalg.norm
linalg.qr
linalg.solve
linalg.svd
matmul
tensordot
trace
transpose
argmax
argmin
where
- [...] absolute vs abs). Are there APIs which should be aliased differently?
- [...] ufuncs) and reductions?
- [...] apply, reduce)?

Feedback is welcome. :)
It should do something like (now doing locally)
# Install packages in requirements.txt
git clean -xdf
pushd spec
make html
popd
(git checkout --orphan tmp && git branch -D gh-pages || true)
git checkout --orphan gh-pages
git reset --hard
cp -R spec/_build/html/ latest
touch .nojekyll
git add .nojekyll
git add latest
git commit -m "Commit API Standard doc build";
git push --set-upstream origin gh-pages --force
Put the above in a file deploy-ghpages.sh at .. from the repo root, then run (from master, in the repo root):
source ../deploy-ghpages.sh
Several people have expressed a strong interest in talking about and working on (auto-)parallelization. Here is an attempt at summarizing this topic.
The main accelerated linear algebra libraries that are in use (for CPU based
code) are OpenBLAS and
MKL.
Both of those libraries auto-parallelize function calls.
OpenBLAS can be built with either its own pthreads-based thread pool, or with
OpenMP support. The number of threads can be controlled with an environment
variable (OPENBLAS_NUM_THREADS or OMP_NUM_THREADS), or from Python via
threadpoolctl. The conda-forge
OpenBLAS package uses OpenMP; the OpenBLAS builds linked into NumPy and SciPy
wheels on PyPI use pthreads.
MKL supports OpenMP and Intel TBB as the threading control mechanisms. The
number of threads can be controlled with an environment variable
(MKL_NUM_THREADS or OMP_NUM_THREADS), or from Python with threadpoolctl.
NumPy does not provide parallelization, with the exception of linear algebra
routines which inherit the auto-parallelization of the underlying library
(OpenBLAS or MKL typically). NumPy does however release the GIL consistently
where it can.
Scikit-learn provides a keyword n_jobs=1 in many estimators and other
functions to let users enable parallel execution. This is done via the
joblib library, which provides both
multiprocessing (default) and threading backends that can be selected with a
context manager.
Scikit-learn also contains C and Cython code that uses OpenMP. OpenMP is
enabled in both wheels on PyPI and in conda-forge packages. The number of
threads used can be controlled with the OMP_NUM_THREADS environment variable.
Scikit-learn has good documentation on parallelism and resource management.
SciPy provides a workers=1 keyword in a (still limited) number of functions
to let users enable parallel execution. It is similar to scikit-learn's
n_jobs keyword, except that it also accepts a map-like callable (e.g.
multiprocess.Pool.map) to allow using a custom pool. C++ code in SciPy uses
pthreads; the use of OpenMP was discussed and rejected.
scipy.linalg also provides a Cython API for BLAS and LAPACK. This lets other
libraries use linear algebra routines without having to ship or build against
an accelerated linear algebra library directly. Scikit-learn, statsmodels and
other libraries do this - thereby again inheriting the auto-parallelization
behavior from OpenBLAS or MKL.
TensorFlow, PyTorch, MXNet and JAX all have auto-parallelization behavior.
Furthermore they provide support for distributed computing (with the exception
of JAX). These frameworks are very performance-focused, and aim to optimally
use all available hardware. They typically allow building with different
backends like NCCL or GLOO for GPU support, and use OpenMP, MPI, gRPC and more.
The advantage these frameworks have is that users typically only use this one
framework for their whole program, so the parallelism used can be optimized
without having to play well with other Python packages that also execute code
in parallel.
Dask provides parallel arrays, dataframes and machine learning algorithms with
APIs that match NumPy, Pandas and scikit-learn as much as possible. Dask is a
pure Python library and uses blocked algorithms; each block contains a single
NumPy array or Pandas dataframe. Scaling to hundreds of nodes is possible; Dask
is a good solution to obtain distributed arrays. When used as a method to
obtain parallelism on a single node however, it is not very efficient.
Some libraries, like the deep learning frameworks, do auto-parallelization.
Most non deep learning libraries do not do this. When a single library or
framework is used to execute an end user program, auto-parallelization is
usually a good thing to have. It uses all available hardware resources in an
optimal fashion.
Problems can occur when multiple libraries are involved. What often happens is
oversubscription of resources. For example, if an end user would write code
using scikit-learn with n_jobs=-1, and NumPy would auto-parallelize
operations, then scikit-learn will use N processes (on an N-core machine)
and NumPy will use N threads per process - leading to N^2 threads being
used. On machines with a large number of cores, the overhead of this quickly
becomes problematic. Given that NumPy uses OpenBLAS or MKL, this problem
already occurs today. For a while Anaconda and Intel shipped a modified NumPy
version that had auto-parallelization behavior for functions other than linear
algebra - and the problem occurred more frequently.
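The arithmetic behind oversubscription, and one common way to avoid it, can be sketched as follows. The helper names are hypothetical; the only real-world detail relied on is the OMP_NUM_THREADS environment variable mentioned earlier, which a parent process could set for each worker it spawns.

```python
import os


def total_threads(n_processes, threads_per_process):
    # Every process brings its own thread pool.
    return n_processes * threads_per_process


def threads_per_worker(n_cores, n_workers):
    # Split the cores evenly so that the total stays at about n_cores.
    return max(1, n_cores // n_workers)


def worker_env(n_cores, n_workers):
    # Environment for a worker process with its thread count capped.
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(threads_per_worker(n_cores, n_workers))
    return env


n = 16
print(total_threads(n, n))                         # 256 threads on 16 cores
print(total_threads(n, threads_per_worker(n, n)))  # 16 threads on 16 cores
print(worker_env(n, 4)["OMP_NUM_THREADS"])         # "4"
```

This only helps when a single component is in charge of spawning the workers, which is the coordination problem the rest of this section is about.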
The paper Composable Multi-Threading and Multi-Processing for Numeric
Libraries from Malakhov et al. contains a good overview with examples and
comparisons between different parallelization methods. It uses NumPy, SciPy,
Dask, and Numba, and uses multiprocessing, concurrent.futures, OpenMP, Intel
TBB (Threading Building Blocks), and a custom library SMP (symmetric
multi-processing).
When one wants to use auto-parallelization, it's important to have control over
the complete set of packages that a user gets installed on their machine. That
way one can ensure there's a single linear algebra library installed, and a
single OpenMP runtime is used.
That control over the full set of packages is common in HPC type situations,
where admins need to deal with build and install requirements to make libraries
work well together. Both package managers (e.g. Apt in Debian) and Conda have
the ability to do this right as well - both because of dependency resolution
and because of a common build infrastructure.
A large fraction of Python users install packages from PyPI with Pip however.
The binary installers (wheels) on PyPI are not built on a common
infrastructure, and because there's no real support for non-Python
dependencies, libraries like OpenMP and OpenBLAS are bundled into the wheels
and installed into end user environments multiple times. This makes it
very difficult to reliably use, e.g., OpenMP. For this reason SciPy uses custom
pthreads thread pools rather than OpenMP.
The default behavior for libraries like NumPy and SciPy given the status of the
ecosystem today should be to be single-threaded, otherwise it composes badly
with multiprocessing, scikit-learn (joblib), Dask, etc. However, there's
room for improvement here. Two things that could help improve the coordination
of parallelization behavior in a stack of Python libraries are:
A common API pattern is the simpler of the two options. It could be a keyword
like n_jobs or workers that gets used consistently between libraries, or a
context manager to achieve the same level of per-function or per-code-block
control.
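As a rough sketch of what such a keyword could look like, the function below accepts either an integer or a map-like callable, in the spirit of the SciPy workers keyword described earlier. The name apply_chunks and the use of a thread pool are assumptions for illustration, not an existing API.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_chunks(func, chunks, *, workers=1):
    """Apply `func` to every chunk, optionally in parallel.

    `workers` is either an int (number of threads; 1 means serial)
    or a map-like callable supplied by the caller, e.g. `pool.map`.
    """
    if callable(workers):
        # Caller-provided pool: the library does not create threads.
        return list(workers(func, chunks))
    if workers == 1:
        return [func(chunk) for chunk in chunks]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, chunks))


chunks = [[1, 2], [3, 4], [5, 6]]
print(apply_chunks(sum, chunks))             # serial default
print(apply_chunks(sum, chunks, workers=2))  # library-managed pool
print(apply_chunks(sum, chunks, workers=map))  # caller-managed "pool"
```

Defaulting to serial execution keeps the function composable with an outer layer of parallelism, while the callable form lets one pool be shared across libraries.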
A common library would be more powerful and enable auto-parallelization rather
than giving the user control (which is what the API pattern does). From a
performance perspective, having arrays and dataframes auto-parallelize their
functions as much as possible over all cores on a single node, and then letting
a separate library like Dask deal with multi-node coordination, seems optimal.
Introducing a new dependency into multiple libraries at the core of the PyData
ecosystem is a nontrivial exercise however.
The above attempts to summarize the state of affairs today. The topic of
parallelization is largely an implementation rather than an API question,
however there is an API component to it with option (1) above. How to move
forward here is worth discussing.
Note: there's also a lot of room left in NumPy for optimizing
single-threaded performance. There's ongoing work on making better use of
intrinsics (this is a large effort, ongoing), or using SLEEF for vector math
(discussed in the past, no one is working on it).
The following is a list of APIs which are currently included in the specification and/or are proposed to be included in the specification. This is intended to provide a checkpoint to determine any glaring omissions.
arange
empty
empty_like
eye
full
full_like
linspace
ones
ones_like
zeros
zeros_like
concat
expand_dims
flip
reshape
roll
squeeze
stack
abs
acos
acosh
add
asin
asinh
atan
atanh
bitwise_and
bitwise_invert
bitwise_left_shift
bitwise_or
bitwise_right_shift
bitwise_xor
ceil
cos
cosh
divide
equal
exp
expm1
floor
floor_divide
greater
greater_equal
isfinite
isnan
isinf
less
less_equal
log
log1p
log2
log10
logical_and
logical_not
logical_or
logical_xor
multiply
negative
not_equal
positive
pow
remainder
round
sign
sin
sinh
square
sqrt
subtract
tan
tanh
trunc
cross
det
diagonal
inv
norm
outer
trace
transpose
max
mean
min
prod
std
sum
var
unique
argmax
argmin
nonzero
where
argsort
sort
all
any
Based on the analysis of array library APIs, we know that evaluating element-wise elementary mathematical functions is both universally implemented and commonly used. Accordingly, this issue proposes to standardize the following elementary mathematical functions:
- [...] atan2).
- [...] atan vs arctan).

This is a question that we may have covered but not clearly enough because it keeps coming up in conversations: "how do I check if something is a compliant array object?" People have asked me, and it has also come up in the Dask tracking issue and the NEP 47 review:
The current answer is:
- [...] isinstance check.
- [...] __array_namespace__ attribute, so at runtime the check to do is hasattr(x, '__array_namespace__').
def is_arrayobj(x):
    return hasattr(x, '__array_namespace__')
- [...] Array typing Protocol: https://data-apis.org/array-api/latest/design_topics/static_typing.html. This is still to be implemented, but isn't hard to do. Again people should vendor this.

Note that in NEP 47 we (I) messed something up: the initial thought was to make numpy.ndarray the object that was the standards-compliant one, and therefore have an __array_namespace__ attribute. However, we decided later to create a separate array object. numpy.ndarray is not compliant, and therefore should not have that attribute. However, that also means there is no way to retrieve the compliant namespace from a regular numpy ndarray instance. Hence it should probably be special-cased:
def is_arrayobj_or_ndarray(x):
    # This is just a sketch
    try:
        import numpy as np
        if isinstance(x, np.ndarray):
            x_new = np.array_api.asarray(x)
            # What the caller will need to do is convert back to ndarray at the end ....
            return True, x_new
    except ImportError:
        pass
    return hasattr(x, '__array_namespace__')
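A self-contained variant of this idea, which keeps the check, the conversion, and the conversion back in one place, might look like the sketch below. The classes CompliantArray and LegacyArray are hypothetical stand-ins (LegacyArray plays the role of numpy.ndarray) so that the example runs without any array library installed.

```python
class CompliantArray:
    """Hypothetical standard-compliant array (illustration only)."""

    def __init__(self, data):
        self.data = list(data)

    def __array_namespace__(self, api_version=None):
        return None  # a real library would return its namespace here


class LegacyArray:
    """Hypothetical stand-in for numpy.ndarray: no __array_namespace__."""

    def __init__(self, data):
        self.data = list(data)


def is_arrayobj(x):
    return hasattr(x, "__array_namespace__")


def double(x):
    # 1. Remember whether the input was the special-cased legacy type.
    was_legacy = isinstance(x, LegacyArray)
    # 2. Convert it to a compliant array, or check compliance otherwise.
    if was_legacy:
        x = CompliantArray(x.data)
    elif not is_arrayobj(x):
        raise TypeError("expected an array object")
    # 3. Do the work on the compliant array only.
    result = CompliantArray(v * 2 for v in x.data)
    # 4. Convert back so the caller gets the type they passed in.
    return LegacyArray(result.data) if was_legacy else result


print(type(double(LegacyArray([1, 2]))).__name__)     # LegacyArray
print(type(double(CompliantArray([1, 2]))).__name__)  # CompliantArray
```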
The alternative would be two separate checks; either way we need a design pattern that does the following (at least for existing libraries like SciPy, which must also support numpy.ndarray):
- [...] numpy.ndarray instance
- [...] ndarray instance, convert to compliant array object
- [...] ndarray instance, convert back to ndarray at the end

TODO:
Use of a C API is out of scope for this array API, as mentioned in :ref:`Scope`.
There are a lot of libraries that do use such an API - in particular via Cython code
or via direct usage of the NumPy C API. When the maintainers of such libraries
want to use this array API standard to support multiple types of arrays, they
need a way to deal with that issue. This section aims to provide some guidance.
The assumption in the rest of this section is that performance matters for the library,
and hence the goal is to make other array types work without converting to a
numpy.ndarray or another particular array type. If that's not the case (e.g. for a
visualization package), then other array types can simply be handled by converting
to the supported array type.
.. note::

   Often a zero-copy conversion to `numpy.ndarray` is possible, at least for CPU arrays.
   If that's the case, this may be a good way to support other array types.
   The main difficulty in that case will be getting the return array type right - however,
   we do have a Python-level API for that.

.. note::

   Projects in this situation include Statsmodels, scikit-bio and QuTiP

Main strategy: documentation. The functionality using Cython code will not support other array types (or only with conversion to/from numpy.ndarray), which can be documented per function.

.. note::

   Projects in this situation include scikit-learn and scikit-image

Main strategy: add support for other array types per submodule. This keeps it manageable to explain to the user which functionality does and doesn't have support.
Longer term: specific support for particular array types (e.g. cupy.ndarray can be supported with Python-only code via cupy.ElementwiseKernel).

.. note::

   Projects in this situation include SciPy and Astropy

Strategy: similar to situation 2, but the number of submodules that can support all array types may be limited.
Supporting non-CPU array types in code using the C API or Cython seems problematic;
this almost inevitably will require custom device-specific code (e.g., CUDA, ROCm) or
something like JIT compilation with Numba.
There may be cases where it makes sense to standardize additional sets of functions, because they're important enough that array libraries tend to reimplement them. An example of this may be special functions, as provided by scipy.special. Bessel and gamma functions for example are commonly reimplemented by array libraries.
HPy is a new project that will provide a higher-level
C API and ABI than CPython offers. A Cython backend targeting HPy will be provided as well.
I find this example very confusing:
This happens when views are combined with mutating operations. This simple example illustrates that:
x = ones(1)
x += 2
y = x  # `y` *may* be a view
y -= 1  # if `y` is a view, this modifies `x`
The semantics of Python mean that y = x always makes y a "view" of x - x is y by definition.
I think the point this example is trying to make is that "if -= operates in place, this modifies x" - but the comments tell a different and incorrect story.
Either that, or the example should include a slice operation of some kind, where the library actually does have a chance to decide whether to make a copy or view.
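The distinction can be demonstrated with built-in lists, used here purely as a stand-in for arrays. For lists, += happens to operate in place and slicing happens to copy; an array library is free to choose differently for slices, which is exactly where the copy-or-view question arises.

```python
x = [1, 2, 3]

# Plain assignment never copies: both names refer to one object.
y = x
same_object = y is x

# list.__iadd__ mutates in place, so the change is visible through `x`.
y += [4]
x_after_inplace = list(x)

# Slicing a list makes a copy, so in-place changes to `z` leave `x` alone.
z = x[:]
z += [5]
x_after_slice_copy = list(x)

# An out-of-place operation rebinds `w` to a new object instead.
w = x
w = w + [6]

print(same_object, x_after_inplace, x_after_slice_copy, w)
```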
The current spec is silent on what the return value from an indexing operation should be: https://data-apis.github.io/array-api/latest/API_specification/indexing.html
I propose that the return value should always be another array object, i.e., so __getitem__ could be type annotated as __getitem__(self: Array, key: IndexKey) -> Array. However, this is inconsistent with NumPy, which returns NumPy scalars in some cases when indexing with integers.
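The NumPy behavior referred to here can be checked directly. This is a small sketch with made-up example data; it only shows what NumPy returns today and is not a statement about what the spec requires.

```python
import numpy as np

x = np.array([1, 2, 3])

item = x[0]         # integer index: a NumPy scalar, not an ndarray
sub = x[0:1]        # slice: a 1-D ndarray
zero_d = x[0, ...]  # integer plus Ellipsis: a 0-D ndarray

print(type(item).__name__, isinstance(item, np.ndarray))
print(type(sub).__name__, sub.shape)
print(type(zero_d).__name__, zero_d.shape)
```

Under the proposal above, the first case would instead return a zero-dimensional array like the third.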
Some issues I noticed in the array creation functions from adding tests to the test suite:
- [...] dtype is None, the output array data type must be the default floating-point data type." I think the default for arange should be int if all the arguments are integers.
- [...] step cannot be 0.
- [...] ceil((stop-start)/step)" should be caveated (stop and step provided, stop >= start for step > 0 and stop <= start for step < 0)
- [...] dtype is None, the output array data type must be the default floating-point data type." I think the default should be a corresponding dtype to the input value (we don't have a notion of a "default" integer dtype).
>>> np.linspace(0, 9288674231451855, 2, dtype=np.int64)
array([ 0, 9288674231451856])
The stop value is different from what is given because of floating point loss of precision when computing the linspace.
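The same loss of precision can be reproduced without NumPy, assuming the intermediate computation goes through a 64-bit float: the stop value lies above 2**53, where consecutive doubles are 2 apart, so an odd integer cannot be represented exactly.

```python
import math

stop = 9288674231451855

as_float = float(stop)      # nearest representable double
round_trip = int(as_float)  # what comes back after the conversion

print(stop > 2**53)         # True: beyond exact integer range of a double
print(math.ulp(as_float))   # 2.0: spacing between neighbouring doubles here
print(round_trip)           # 9288674231451856
```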
Feedback from @mruberry: it may be nice to support always returning the same type from functions like Cholesky, with tensors accessed by names. That way frameworks could return additional tensors and there wouldn't be BC-compatibility issues if more tensor returns are added later. This suggestion, however, comes from a place of "it'd be nice if people wanted to use this API", as opposed to the perspective of, "this API is a lowest common denominator only intended for library writers."
cholesky always return a single array, so I thought it'd be fine - but the comment may be related to pytorch/pytorch#47608 (desire to return an error code rather than raise a RuntimeError for failures). For qr, svd and perhaps some more functions there's the related issue of different returns based on keywords. Using a class to stuff all return values in is a common design pattern for, e.g., scipy.optimize. I'm not convinced it's a good idea for the linalg functions here, but worth considering at least.
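To illustrate the "always return the same type, with tensors accessed by names" suggestion, here is a minimal pure-Python sketch. CholeskyResult, its info field and this toy cholesky function are all hypothetical and are not part of any proposed API; the point is only that a named result can grow extra fields later without breaking callers that access fields by name.

```python
import math
from typing import List, NamedTuple


class CholeskyResult(NamedTuple):
    L: List[List[float]]  # lower-triangular factor
    info: int             # hypothetical status: 0 on success, else failing row (1-based)


def cholesky(a):
    """Toy Cholesky factorization on nested lists, returning a named result."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    # Report failure through the result instead of raising.
                    return CholeskyResult(L, i + 1)
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return CholeskyResult(L, 0)


ok = cholesky([[4.0, 2.0], [2.0, 3.0]])
bad = cholesky([[1.0, 2.0], [2.0, 1.0]])
print(ok.info, ok.L[0][0], ok.L[1][0])  # 0 2.0 1.0
print(bad.info)                         # 2
```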
This issue seeks to come to a consensus on a subset of type promotion rules (i.e., the rules governing the common result type for two array operands during an arithmetic operation) suitable for specification.
As initially discussed in #13, a universal set of type promotion rules can be difficult to standardize due to the needs/constraints of particular runtime environments. However, we should be able to specify a minimal set of type promotion rules which all specification conforming array libraries can, and should, support.
promote_types and result_type APIs and source [1, 2].
This issue proposes to specify that all specification conforming array libraries must, at minimum, support the following type promotions:
floating-point type promotion table:
| | f2 | f4 | f8 |
|---|---|---|---|
| f2 | f2 | f4 | f8 |
| f4 | f4 | f4 | f8 |
| f8 | f8 | f8 | f8 |
where
unsigned integer type promotion table:
| | u1 | u2 | u4 | u8 |
|---|---|---|---|---|
| u1 | u1 | u2 | u4 | u8 |
| u2 | u2 | u2 | u4 | u8 |
| u4 | u4 | u4 | u4 | u8 |
| u8 | u8 | u8 | u8 | u8 |
where
signed integer type promotion table:
| | i1 | i2 | i4 | i8 |
|---|---|---|---|---|
| i1 | i1 | i2 | i4 | i8 |
| i2 | i2 | i2 | i4 | i8 |
| i4 | i4 | i4 | i4 | i8 |
| i8 | i8 | i8 | i8 | i8 |
where
mixed unsigned and signed integer type promotion table:
| | u1 | u2 | u4 |
|---|---|---|---|
| i1 | i2 | i4 | i8 |
| i2 | i2 | i4 | i8 |
| i4 | i4 | i4 | i8 |
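The four tables above can be encoded compactly. The following is a minimal sketch, assuming dtypes are written as the short codes used in the tables (f4, u2, i8, ...); the function name promote is hypothetical, and anything the tables do not cover raises TypeError rather than guessing.

```python
_SIZES = {"f": (2, 4, 8), "u": (1, 2, 4, 8), "i": (1, 2, 4, 8)}


def _parse(code):
    kind, size = code[0], int(code[1:])
    if kind not in _SIZES or size not in _SIZES[kind]:
        raise TypeError(f"unknown dtype code: {code!r}")
    return kind, size


def promote(a, b):
    """Look up the promoted dtype code for (a, b) in the proposed tables."""
    (ka, sa), (kb, sb) = _parse(a), _parse(b)
    if ka == kb:
        # Same-kind tables: the wider of the two types wins.
        return f"{ka}{max(sa, sb)}"
    if {ka, kb} == {"i", "u"}:
        signed, unsigned = (sa, sb) if ka == "i" else (sb, sa)
        # Mixed table only lists i1/i2/i4 against u1/u2/u4: the result must
        # be a signed type wide enough to hold every unsigned value.
        if signed <= 4 and unsigned <= 4:
            return f"i{max(signed, 2 * unsigned)}"
    raise TypeError(f"promotion of ({a}, {b}) is not covered by the tables")


print(promote("f2", "f4"))  # f4
print(promote("i1", "u4"))  # i8
print(promote("u2", "i2"))  # i4
```

Calls such as promote("i8", "u8") or promote("i8", "f2") raise TypeError in this sketch, matching the intent that those cases are left unspecified.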
(i8, f2) to f2, while NumPy promotes (i8, f2) to f8). The reason for the discrepancy stems from the particular needs/constraints of accelerator devices, and, thus, by omitting specification here, we allow for implementation flexibility and avoid imposing undue burden.
i8 and u8. NumPy and JAX both promote (i8, u8) to f8 which is explicitly undefined via the aforementioned note regarding conversions between kinds and also raises questions regarding inexact rounding when converting from a 64-bit integer to double-precision floating-point.
Here's a list of questions that we need to think about when driving adoption in both array/tensor libraries, and further downstream.
arrayobject.__array_namespace__() or additionally via a direct import? If the latter, what should it be named? related to gh-16
np.asarray in downstream libraries?
Protocol mentioned in the Static typing section)
Please add more questions if you can think of them.
Updating the branch now before stuff grows will make things easier.
A few issues with the mixing arrays and Python scalars section that came up from testing:
@) does not (and should not) support scalar types, so it should be excluded.
>>> import numpy as np
>>> np.array(False) - False
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
int8 array + large Python int. Should it cast the int, or give an error? Or should this be unspecified?
Currently the recommendation for code that needs to promote is:
Therefore we had to make the choice to leave the rules for “mixed kind dtype casting” undefined - when users want to write portable code, they should avoid this situation or use explicit casts to obtain the same results from different array libraries.
In eager mode frameworks, making operations portable in this way also can result in extra memory usage, as you have to first do the cast, and then do the operation, whereas the cast might have been fused internally in the framework.
However, it's not clear how important this actually is in practice, since at least in PyTorch, a lot of promotion is in fact implemented by first doing a cast and then doing a normal operation (instead of implementing quadratically many kernels). I don't know off the top of my head which operations this would be relevant for.
To do before public release, to ensure people who want to contribute get the right guidance. The scope and the process for adding new features should be linked, to let people assess whether the PR they're about to make is the right thing to work on or not.
This issue is meant to summarize the current status and likely future direction of the NumPy array protocols, and their relevance to the array API standard.
What are these array protocols?
In summary, they are dispatching mechanisms that allow calling the public NumPy API with other numpy.ndarray-like arrays (e.g. CuPy or Dask arrays, or any other array that implements the protocols) and have the function call dispatch to that library. There are two protocols, __array_ufunc__ and __array_function__, that are very similar - the difference is that with __array_ufunc__ the library being dispatched to knows it's getting a ufunc and it can therefore make use of some properties all ufuncs have. The dispatching works the same for both protocols though.
Why were they created?
__array_ufunc__ was created first, the original driver was to be able to call numpy ufuncs on scipy.sparse matrices. __array_function__ was created later, to be able to cover most of the NumPy API (every function that takes an array as input) and use the NumPy API with other array/tensor implementations:
What is the current status?
The protocols have been adopted by:
They have not (or not yet) been adopted by:
scipy.sparse (semantics not compatible)
The RAPIDS ecosystem, which builds on Dask and CuPy, has been particularly happy with these protocols, and uses them heavily. There they've also run into some of the limitations, the most painful one being that array creation functions cannot be dispatched on.
What is likely to change in the near future?
There is still active exploration of new ideas and design alternatives (or additions to) the array protocols. There's 3 main "contenders":
__duckarray__) + NEP 35 (like=).
__array_module__)
unumpy)
At the moment, the most likely outcome is doing both (1) and (2). It needs prototyping and testing though - any solution should only be accepted when it's clear that it not only solves the immediate pain points RAPIDS ran into, but also that libraries like scikit-learn and SciPy can then adopt it.
What is the relationship of the array protocols with an API standard?
There's several connections:
__array_function__ (figure above) doesn't require an API that's the same as the NumPy one, but in practice the protocols can only be adopted when there's an API with matching signatures and semantics.
__array_module__, unumpy) provide a good opportunity to introduce a new API standard once that's agreed on.
References
__array_function__
In today's call we discussed that the current Array API standard does not regulate the default integer and floating-point types across all implementations, only that each implementation should pick one, document clearly and stick to it. However, this is not strong enough, as there could be cross-platform/portability issues.
For example, NumPy is inconsistent in handling the Python integers between Windows and Linux:
import numpy as np
a = np.arange(10, dtype=int)
a.dtype # np.int32 on Windows, np.int64 on Linux
Similar issues can be found in other APIs.
It's worth noting that as pointed out by @kgryte, the standard does not permit passing dtype=int to most of the functions, so this could eliminate a large class of such inconsistencies. But it's still good to be explicit in the standard to ensure portability.
I wanted to open a discussion on how the Array API (and potentially the dataframe API) will be exposed to downstream libraries.
For example, let's say I am the author of scikit-learn. How do I get access to an "Array compatible API"? Or let's say I am a downstream user, using scikit-learn in a notebook. How can I tell it to use Tensorflow over NumPy?
I present three options here, but I would appreciate any suggestions on further ideas:
The default option is the current status quo where there is no standard way to get access to some array conformant API backend.
Different downstream libraries, like scikit-learn, could introduce their own mechanisms, like a backend kwarg to functions, if they wanted to support different backends.
Another approach would be to provide access to the related module from particular instances of the objects, which is the one taken by NEP 37.
In this case, scikit-learn would either call some x.__array_module__() method on its inputs or we would provide an array-api Python package that would have a helper function like get_array_module(x), similar to the NEP.
There is an open PR in scikit-learn (scikit-learn/scikit-learn#16574) to add support for NEP 37.
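A rough sketch of what such a helper could look like follows. It is deliberately simplified and is not NEP 37's actual protocol: the zero-argument __array_module__() call follows the wording above, while ToyArray, toy_namespace and the default argument are hypothetical stand-ins.

```python
from types import SimpleNamespace

# Hypothetical namespace object standing in for an array library module.
toy_namespace = SimpleNamespace(name="toy")


class ToyArray:
    def __init__(self, data):
        self.data = list(data)

    def __array_module__(self):
        return toy_namespace


def get_array_module(*arrays, default=None):
    """Return the single namespace shared by all inputs that advertise one."""
    found = []
    for a in arrays:
        hook = getattr(type(a), "__array_module__", None)
        if hook is None:
            continue  # plain Python objects do not take part
        module = hook(a)
        if not any(module is m for m in found):
            found.append(module)
    if not found:
        if default is None:
            raise TypeError("no input provides __array_module__")
        return default
    if len(found) > 1:
        raise TypeError("inputs come from different array libraries")
    return found[0]


xp = get_array_module(ToyArray([1, 2]), [3, 4])
print(xp.name)  # toy
```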
Instead of requiring an object to inspect, we could instead rely on a global context to store the "active array api" and provide ways of getting and setting this. Some form of this is implemented by scipy, with their scipy.fft.set_backend, which uses uarray.
This would be heavier weight than we would need, probably, but does illustrate the general concept. I think if we implemented this, we could use Context Variables like python's built in decimal module does, i.e. something like this:
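# array_api, set_backend and get_backend are hypothetical here
from array_api import set_backend, get_backend
import cupy

def some_fn():
    # look up whichever backend is active in the current context
    np = get_backend()
    return np.arange(10)

with set_backend(cupy):
    some_fn()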
The advantage of using a global dispatch is then you don't need to rely on passing in some custom instance class to set the backend.
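One possible way to build set_backend and get_backend on top of the standard library's contextvars module is sketched below. No array_api package with these functions is assumed to exist; the stdlib math and cmath modules are used purely as stand-in "backends" so that the example runs without any array library installed.

```python
import cmath
import contextlib
import contextvars
import math

# The active backend lives in a context variable; math is an arbitrary default.
_backend = contextvars.ContextVar("array_backend", default=math)


def get_backend():
    return _backend.get()


@contextlib.contextmanager
def set_backend(module):
    token = _backend.set(module)
    try:
        yield module
    finally:
        # Restore whatever backend was active before entering the block.
        _backend.reset(token)


def some_fn():
    xp = get_backend()
    return xp.sqrt(4)


print(some_fn())          # 2.0, computed by math
with set_backend(cmath):
    print(some_fn())      # (2+0j), computed by cmath
print(get_backend() is math)  # True
```

Because the state is a context variable rather than a plain global, each thread and each asyncio task sees its own active backend.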
This is slightly tangential, but one question that comes up for me is how we could properly statically type options 2 or 3. It seems like what we need is a typing.Protocol but for modules. I raised this as a discussion point on the typing-sig mailing list.
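As a hedged illustration of the "Protocol but for modules" idea: PEP 544 describes modules as possible implementations of a protocol, so a namespace can be described by a Protocol class whose methods correspond to module-level functions. SqrtNamespace and hypotenuse are hypothetical names, and the stdlib math module stands in for an array library.

```python
import math
from typing import Protocol, runtime_checkable


@runtime_checkable
class SqrtNamespace(Protocol):
    """Structural type for any namespace that offers a sqrt function."""

    def sqrt(self, x: float, /) -> float: ...


def hypotenuse(xp: SqrtNamespace, a: float, b: float) -> float:
    # Only relies on what the protocol promises, not on a specific library.
    return xp.sqrt(a * a + b * b)


print(isinstance(math, SqrtNamespace))  # True (runtime check is attribute-based)
print(hypotenuse(math, 3.0, 4.0))       # 5.0
```

The runtime isinstance check only verifies that the attribute exists; checking signatures is left to a static type checker.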
I think standardized functions should include signatures that are at least in principle compatible with static type checking tools such as mypy, ideally treating dtype and shape as an implicit part of array types. This would eventually allow for static checking of code using these array APIs, which would make using these APIs much safer.
This would entail standardizing a number of implementation details beyond those in the current draft specs, e.g.,
axis must use a Python int or tuple[int, ...].
add(float64, float64) -> float64? One argument against codifying dtype promotion is that the details of dtype promotion are rather tricky, and NumPy's rules are unsuitable for accelerators (JAX and PyTorch implement incompatible semantics). On the other hand, requiring explicit dtype casting (like in TensorFlow) is rather annoying for users (maybe less of a problem for library code).
The intent of this issue is to provide a bird's eye view of linear algebra APIs in order to extract a consistent set of design principles for current and future spec evolution.
Main idea is that, if an operation is explicitly defined in terms of matrices (i.e., 2D arrays), then an API should support stacks of matrices (aka, batching).
Unary:
axis is 2-tuple containing last two dimensions)
axis1 and axis2)
axis1 and axis2)
Binary:
Binary:
rtol*largest_singular_value)
rtol*largest_singular_value)
rtol*largest_singular_value)
Main idea here is that we should avoid undefined/ambiguous behavior. For example, when type promotion rules cannot capture behavior (e.g., if accept int64, but need to return as float64), how would casting work? Based on type promotion rules only addressing same-kind promotion, would be up to the implementation, and thus ill-defined. To ensure defined behavior, if need to return floating-point, require floating-point input.
Numeric:
Floating:
Any:
Array:
Tuple:
svd and svdvals (similar to eig/eigvals)
Note: only SVD is polymorphic in output (compute_uv keyword)
keepdims arg
keepdims
keepdims
keepdims
keepdims
keepdims
keepdims
Conclusion: only norm is unique here in allowing the output array rank to remain the same as that of the input array.
ndims-1 dimensions)
ndims-1 dimensions)
rtol)
rtol)
upper
compute_uv and full_matrices
ord and keepdims
mode
axes
Based on the analysis of array library APIs, we know that performing basic statistical functions is both universally implemented and commonly used. Accordingly, this issue proposes to standardize the following functions:
When surveying a representative set of advanced users and research software engineers in 2019 (for this NSF proposal), the single most common pain point brought up about SciPy was performance.
SciPy heavily relies on NumPy (its only non-optional runtime dependency). NumPy provides an array implementation that's in-memory, CPU-only and single-threaded. Common things users ask for are:
Some parallelism, in particular via multiprocessing, can be supported, however SciPy itself will not directly start depending on a GPU or distributed array implementation, or contain (e.g.) CUDA code - that's not maintainable given the resources for development. However, there is a way to provide distributed or GPU support. Part of the solution is provided by NumPy's "array protocols" (see gh-1), that allow dispatching to other array implementations. The main problem then becomes how to know whether this will work with a particular distributed or GPU array implementation - given that there are zero other array implementations that are even close to providing full NumPy compatibility - without adding it as a hard dependency.
It's clear that SciPy functionality that relies on compiled extensions (C, C++, Cython, Fortran) directly won't work. Pure Python code can work though. There's two main possibilities:
Option (2) seems strongly preferable, and that "well-defined subset" is what an API standard should provide. Testing will still be needed, to ensure there are no critical corner cases or bugs between array implementations, however that's then a very tractable task.
It would be useful for the test suite to have the function metadata stored in a machine readable format. Currently I am parsing the function signatures from the spec files using some regular expressions, and I will probably end up parsing some other information such as types as well. This works fine for now, but it would be cleaner if this data were stored in a machine readable format, say in JSON, and the relevant parts of the spec documents generated from that automatically.
To be sure, not everything in the spec needs to be in JSON, just the parts that will need to be extracted for other things as well, such as the test suite. There should still be a lot of plain English descriptions of behavior.
This is likely too much work for version 1 given that we already have things inline in the Markdown, but it's something to consider for future iterations.
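To make the idea more tangible, here is a minimal sketch of what machine-readable function metadata and a consumer of it might look like. The JSON schema (name, parameters, default) is invented for illustration, and the arange parameter list is illustrative rather than copied from the spec.

```python
import json

# Hypothetical metadata format; nothing like this exists in the spec today.
SPEC_JSON = """
[
  {
    "name": "arange",
    "parameters": [
      {"name": "start"},
      {"name": "stop", "default": "None"},
      {"name": "step", "default": "1"},
      {"name": "dtype", "default": "None"}
    ]
  }
]
"""


def render_signature(entry):
    """Build a human-readable signature string from one metadata entry."""
    parts = []
    for param in entry["parameters"]:
        if "default" in param:
            parts.append(f"{param['name']}={param['default']}")
        else:
            parts.append(param["name"])
    return f"{entry['name']}({', '.join(parts)})"


functions = json.loads(SPEC_JSON)
signatures = [render_signature(f) for f in functions]
print(signatures[0])  # arange(start, stop=None, step=1, dtype=None)
```

Both the spec documents and the test suite could then read the same data instead of the test suite parsing Markdown with regular expressions.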
I conceive that a few libraries may end up in a situation where more than one protocol is supported, for example,
__array_interface__
__cuda_array_interface__
Does it matter which protocol an array implementation should try first? Is it up to the library implementors?
I am wondering why complex numbers are not considered in the Array API, and whether we could reconsider making them native dtypes in the API.
The Dataframe API is not considered in the rest of this issue 🙂
I spent quite some time on making sure complex numbers are first-class citizens in CuPy, as many scientific computing applications require using complex numbers. In quantum mechanics, for example, complex numbers are the cornerstones and we can't live without them. Even in some machine learning / deep learning works that we do, either classical or quantum (yes, for those who don't know already there is quantum machine learning 😁), we also need complex numbers in various places like building tensors or communicating with simulations, especially those applying physics-aware neural networks, so it is a great pain to us not being able to build and operate on complex numbers natively.
To date, complex numbers are also an integral part of mainstream programming languages. For example, C has had them since C99, and C++ has them too (std::complex). Our beloved Python has complex too, so it is just so weird IMHO that when we talk about native dtypes they're being excluded.
As for language extensions to support GPUs, in CUDA we have thrust::complex (which currently supports complex64/complex128) as a clone of std::complex and it is likely that libcu++ will replace Thrust on this aspect, and in ROCm there's also a Thrust clone and native support in HIP, so at least on NVIDIA/AMD GPUs we are good.
Turning to library support, as far as I know
complex64/complex128, but not complex32 (numpy/numpy#14753)
complex64/complex128, and complex32 is being evaluated (ex: cupy/cupy#4454)
complex32/complex64/complex128 is catching up (I am unaware of any meta-issue summarizing the status quo, but the label module: complex is a good reference)
cupyx.scipy has many components supporting complex numbers, the most recent prominent case being the extensive ndimage overhaul (ex: scipy/scipy#12725) done by @grlee77 for image processing (yes, image processing also needs complex numbers!)
The reason I also mention complex32 above is because CUDA now provides complex32 support in some CUDA libraries like cuBLAS and cuFFT. With special hardware acceleration over float16, it is expected that complex32 can also benefit, see the preliminary FFT test being done in cupy/cupy#4407. Hopefully by having complex number support in ML/DL frameworks (complex64 and complex128 are enough to start) many more applications can benefit as well.
I am aware that Array API picks DLPack as the primary protocol for zero-copy data exchange, and that it currently lacks complex number support. This is one of the reasons I do not like DLPack. While I will create a separate issue to discuss alternatives to DLPack, I think revising DLPack's format is fairly straightforward (and should be done asap regardless of the Array API standardization due to the need of ML/DL libraries).
Disclaimer: This issue is merely for my research interests (relevant to my and other colleagues' work) and is not driven by CuPy, one of the Array API stakeholders I will represent.
Context:
That issue and PR were about unrelated topics, so I'll try to summarize the copy-view and mutation topic here and we can continue the discussion.
Note that the two topics are fairly coupled, because copy/view differences only matter (for semantics, not for performance) when mixed with mutation.
There's a number of things that may rely on mutation:
+=, *=
out= keyword argument
__setitem__
Summary of the issue with mutation by @shoyer was: Mutation can be challenging to support in some execution models (at least without another layer of indirection), which is why several projects currently don't support it (TensorFlow and JAX) or only support it half-heartedly (e.g., Dask). The commonality between these libraries is that they build up abstract computations, which are then transformed (e.g., for autodiff) and/or executed in parallel. Even NumPy has "read only" arrays. I'm particularly concerned about new projects that implement this API, which might find the need to support mutation burdensome.
@alextp said: TensorFlow was planning to add mutability and didn't see a real issue with supporting out=.
@shoyer said: It's definitely always possible to support mutation at the Python level via some sort of wrapper layer.
dask.array is perhaps a good example of this. It supports mutating operations and out in some cases, but its support for mutation is still rather limited. For example, it doesn't support assignment like x[:2, :] = some_other_array.
Working around limitations of no support for mutation can usually be done by one of:
where for selection, e.g., where(arange(4) == 2, 1, 0)
y = array([0, 1]); x = y[[0, 0, 1, 0]] in this case
Some version of (2) always works, though it can be tricky to work out (especially with current APIs). The duality between indexing and assignment is the difference between specifying where elements come from or where they end up.
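The duality can be seen in a small pure-Python sketch, using plain lists as stand-ins for arrays (no array library is assumed). All three snippets build the same result: the first says where a value ends up, the other two say where each output element comes from.

```python
# 1. Mutation: say where the value ends up.
x_mutated = [0] * 4
x_mutated[2] = 1

# 2. Selection, in the spirit of where(arange(4) == 2, 1, 0).
x_selected = [1 if i == 2 else 0 for i in range(4)]

# 3. Indexing, in the spirit of y[[0, 0, 1, 0]]: say where each element comes from.
y = [0, 1]
x_gathered = [y[i] for i in [0, 0, 1, 0]]

print(x_mutated, x_selected, x_gathered)
```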
The JAX syntax for slice assignment is: x.at[idx].set(y) vs x[idx] = y
One advantage of the non-mutating version is that JAX can have reliable assigning arithmetic on array slices with x.at[idx].add(y) (x[idx] += y doesn't work if x[idx] returns a copy).
A disadvantage is that doing this sort of thing inside a loop is almost always a bad idea unless you have a JIT compiler, because every indexing assignment operation makes a full copy. So the naive translation of an efficient Python loop that fills out an array row by row would now make a copy in each step. Instead, you'd have to rewrite that loop to use something like concatenate instead (which in my experience is already about as efficient as using indexing assignment).
Libraries like NumPy and PyTorch return views where possible from function calls. It's sometimes hard to predict when a view will be returned vs. when a copy - it not only depends on the function in question, but also on whether the input array is contiguous, and sometimes even on input dtype.
This is one place where it's hard to avoid implementation choices leaking into the API:
transpose().
transpose()).
The above copy vs. view difference starts leaking into the API - i.e., the same code starts giving different results for different implementations - when it is combined with an operation that performs in-place mutation of an array (either the base array or the view on it). In the absence of that combination, views are simply a performance optimization that's invisible to the user.
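The leak can be reproduced with the standard library alone, as an analogy rather than with any array library: slicing a bytearray returns a copy, while slicing a memoryview of it returns a view. The helper zero_first is hypothetical; the point is that the same function has a different effect on the base buffer depending on which kind of slice it receives.

```python
def zero_first(prefix):
    # In-place mutation of whatever slice object was handed in.
    prefix[0] = 0


base_copy = bytearray([1, 2, 3, 4])
zero_first(base_copy[:2])              # bytearray slicing makes a copy

base_view = bytearray([1, 2, 3, 4])
zero_first(memoryview(base_view)[:2])  # memoryview slicing makes a view

print(list(base_copy))  # [1, 2, 3, 4]  base untouched
print(list(base_view))  # [0, 2, 3, 4]  base modified through the view
```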
The question is whether copy-view differences should be allowed, and if so how to deal with the semantics that vary between libraries.
To answer whether it should be allowed, let's first ask how often the combination of views and mutation is used. A few observations:
*=, += and ] = in SciPy and scikit-learn .py files shows that in-place mutation inside functions is heavily used.
+= 1) and mutating part of an array (e.g. with x[:, :2] = y). The former is a lot easier to support for array libraries employing static graphs or a JIT than the latter. See the discussion at #8 (comment) for details.
In #8 @shoyer listed the following options for how to deal with mutability:
ndarray.flags.writeable. (From later discussion, see #8 (comment) for the implication of that for users of the API).
To that I'd like to add a more granular option:
Require support for in-place operations that are unambiguous, and require raising an exception in case a view is mutated.
Rationale:
(a) This would require libraries that don't support mutation to write a wrapper layer, but the behaviour would be unambiguous and in most cases the wrapper would not be inefficient.
(b) In case inefficient mutation is detected (e.g. mutating a large array row-by-row in a loop), a warning may be emitted.
A variant of this option would be:
Require support for in-place operations that are unambiguous and mutate the whole array at once (i.e. += and out= must be supported, element/slice assignment must raise an exception), and require raising an exception in case a view is mutated.
Trade-off here is ease of implementation for libraries like Dask and JAX vs. putting a rewrite burden on SciPy et al. and a usability burden on end users (the alternative to element/slice assignment is unintuitive).
This issue seeks to come to a consensus on the minimum set of data types an array library must support in order to conform to the specification.
Supported data types across array libraries...
bool_
bool8
byte
short
intc
int_
longlong
intp
int8
int16
int32
int64
ubyte
ushort
uintc
uint
ulonglong
uintp
uint8
uint16
uint32
uint64
half
single
double
float_
longfloat
float16
float32
float64
float96
float128
csingle
complex_
clongfloat
complex64
complex128
complex192
complex256
object_
bytes_
unicode_
void
bfloat16
bool
complex64
complex128
float16
float32
float64
int8
int16
int32
int64
uint8
bool
bfloat16
complex64
complex128
float16
float32
float64
int16
int32
int64
qint8
qint16
qint32
quint8
quint16
string
uint8
uint16
uint32
uint64
bool
bfloat16
complex64
complex128
float16
float32
float64
int8
int16
int32
int64
uint8
uint16
uint32
uint64
bool_
complex64
complex128
float16
float32
float64
int8
int16
int32
int64
uint8
uint16
uint32
uint64
Dask (see NumPy)
MXNet (see NumPy)
PyData/Sparse (see NumPy)
This issue proposes to specify that all specification conforming array libraries must, at minimum, support the following data types:
bool
int8
int16
int32
int64
uint8
uint16
uint32
uint64
float32
float64
The above data types are common across all array libraries considered in prior art (with PyTorch being the exception).
complex64 and complex128 are currently omitted from this proposal, as I'd like to defer consideration of some of the thornier aspects of how complex numbers are handled for future specification iterations. The proposed types have considerable prior art and are well-established, and, when questions arise regarding their behavior, normative references, such as IEEE 754 for floating-point arithmetic, are available.