patrick-kidger / torchtyping

Type annotations and dynamic checking for a tensor's shape, dtype, names, etc.

License: Apache License 2.0

tensors named-tensors shape pytorch typing python-typing

torchtyping's Introduction

torchtyping

Type annotations for a tensor's shape, dtype, names, ...

Welcome! For new projects I now strongly recommend using my newer jaxtyping project instead. It supports PyTorch, doesn't actually depend on JAX, and unlike TorchTyping it is compatible with static type checkers. :)


Turn this:

def batch_outer_product(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # x has shape (batch, x_channels)
    # y has shape (batch, y_channels)
    # return has shape (batch, x_channels, y_channels)

    return x.unsqueeze(-1) * y.unsqueeze(-2)

into this:

def batch_outer_product(x:   TensorType["batch", "x_channels"],
                        y:   TensorType["batch", "y_channels"]
                        ) -> TensorType["batch", "x_channels", "y_channels"]:

    return x.unsqueeze(-1) * y.unsqueeze(-2)

with programmatic checking that the shape (dtype, ...) specification is met.

Bye-bye bugs! Say hello to enforced, clear documentation of your code.

If (like me) you find yourself littering your code with comments like # x has shape (batch, hidden_state) or statements like assert x.shape == y.shape, just to keep track of what shape everything is, then this is for you.


Installation

pip install torchtyping

Requires Python >=3.7 and PyTorch >=1.7.0.

If using typeguard then it must be a version <3.0.0.

Usage

torchtyping allows for type annotating:

  • shape: size, number of dimensions;
  • dtype (float, integer, etc.);
  • layout (dense, sparse);
  • names of dimensions as per named tensors;
  • arbitrary number of batch dimensions with ...;
  • ...plus anything else you like, as torchtyping is highly extensible.

If typeguard is (optionally) installed then at runtime the types can be checked to ensure that the tensors really are of the advertised shape, dtype, etc.

# EXAMPLE

from torch import rand
from torchtyping import TensorType, patch_typeguard
from typeguard import typechecked

patch_typeguard()  # use before @typechecked

@typechecked
def func(x: TensorType["batch"],
         y: TensorType["batch"]) -> TensorType["batch"]:
    return x + y

func(rand(3), rand(3))  # works
func(rand(3), rand(1))
# TypeError: Dimension 'batch' of inconsistent size. Got both 1 and 3.

typeguard also has an import hook that can be used to automatically test an entire module, without needing to manually add @typeguard.typechecked decorators.

If you're not using typeguard then torchtyping.patch_typeguard() can be omitted altogether, and torchtyping just used for documentation purposes. If you're not already using typeguard for your regular Python programming, then strongly consider using it. It's a great way to squash bugs. Both typeguard and torchtyping also integrate with pytest, so if you're concerned about any performance penalty then they can be enabled during tests only.

API

torchtyping.TensorType[shape, dtype, layout, details]

The core of the library.

Each of shape, dtype, layout, details are optional.

  • The shape argument can be any of:
    • An int: the dimension must be of exactly this size. If it is -1 then any size is allowed.
    • A str: the size of the dimension passed at runtime will be bound to this name, and all tensors checked that the sizes are consistent.
    • A ...: An arbitrary number of dimensions of any sizes.
    • A str: int pair (technically it's a slice), combining both str and int behaviour. (Just a str on its own is equivalent to str: -1.)
    • A str: str pair, in which case the size of the dimension passed at runtime will be bound to both names, and all dimensions with either name must have the same size. (Some people like to use this as a way to associate multiple names with a dimension, for extra documentation purposes.)
    • A str: ... pair, in which case the multiple dimensions corresponding to ... will be bound to the name specified by str, and again checked for consistency between arguments.
    • None, which when used in conjunction with is_named below, indicates a dimension that must not have a name in the sense of named tensors.
    • A None: int pair, combining both None and int behaviour. (Just a None on its own is equivalent to None: -1.)
    • A None: str pair, combining both None and str behaviour. (That is, it must not have a named dimension, but must be of a size consistent with other uses of the string.)
    • A typing.Any: Any size is allowed for this dimension (equivalent to -1).
    • Any tuple of the above. For example, TensorType["batch": ..., "length": 10, "channels", -1]. If you just want to specify the number of dimensions then use for example TensorType[-1, -1, -1] for a three-dimensional tensor.
  • The dtype argument can be any of:
    • torch.float32, torch.float64 etc.
    • int, bool, float, which are converted to their corresponding PyTorch types. float is specifically interpreted as torch.get_default_dtype(), which is usually float32.
  • The layout argument can be either torch.strided or torch.sparse_coo, for dense and sparse tensors respectively.
  • The details argument offers a way to pass an arbitrary number of additional flags that customise and extend torchtyping. Two flags are built-in by default. torchtyping.is_named causes the names of tensor dimensions to be checked, and torchtyping.is_float can be used to check that arbitrary floating point types are passed in. (Rather than just a specific one as with e.g. TensorType[torch.float32].) For discussion on how to customise torchtyping with your own details, see the further documentation.
  • Check multiple things at once by just putting them all together inside a single []. For example TensorType["batch": ..., "length", "channels", float, is_named].
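
A small runnable sketch combining several of the options above (it uses typeguard as described in the Usage section; the function and shapes are just illustrative):

import torch
from torchtyping import TensorType, patch_typeguard
from typeguard import typechecked

patch_typeguard()

@typechecked
def pool(x: TensorType["batch": ..., "length": 10, "channels", float]
         ) -> TensorType["batch": ..., "channels"]:
    # Average over the fixed-size "length" dimension.
    return x.mean(dim=-2)

pool(torch.rand(2, 10, 3))  # works: batch dims (2,), length 10, channels 3
pool(torch.rand(2, 5, 3))   # fails: the "length" dimension must be exactly 10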
torchtyping.patch_typeguard()

torchtyping integrates with typeguard to perform runtime type checking. torchtyping.patch_typeguard() should be called at the global level, and will patch typeguard to check TensorTypes.

This function is safe to run multiple times. (It does nothing after the first run).

  • If using @typeguard.typechecked, then torchtyping.patch_typeguard() should be called any time before using @typeguard.typechecked. For example you could call it at the start of each file using torchtyping.
  • If using typeguard.importhook.install_import_hook, then torchtyping.patch_typeguard() should be called any time before defining the functions you want checked. For example you could call torchtyping.patch_typeguard() just once, at the same time as the typeguard import hook. (The order of the hook and the patch doesn't matter.)
  • If you're not using typeguard then torchtyping.patch_typeguard() can be omitted altogether, and torchtyping just used for documentation purposes.
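
For instance, here is one possible ordering when using the import hook mentioned above ("my_package" is a placeholder for your own package):

from typeguard.importhook import install_import_hook
from torchtyping import patch_typeguard

patch_typeguard()                  # order relative to the hook doesn't matter
install_import_hook("my_package")  # functions in my_package will be checked

import my_package                  # import *after* installing the hook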
pytest --torchtyping-patch-typeguard

torchtyping offers a pytest plugin to automatically run torchtyping.patch_typeguard() before your tests. pytest will automatically discover the plugin; you just need to pass the --torchtyping-patch-typeguard flag to enable it. Packages can then be passed to typeguard as normal, either by using @typeguard.typechecked, typeguard's import hook, or the pytest flag --typeguard-packages="your_package_here".
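
For example, to enable both the torchtyping patch and typeguard's own checking for a package during tests only:

pytest --torchtyping-patch-typeguard --typeguard-packages="your_package_here"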

Further documentation

See the further documentation for:

  • FAQ;
    • Including flake8 and mypy compatibility;
  • How to write custom extensions to torchtyping;
  • Resources and links to other libraries and materials on this topic;
  • More examples.

torchtyping's People

Contributors

adilzouitine, alband, anivegesana, gkorepanov, milescranmer, olliethomas, patrick-kidger, teichert


torchtyping's Issues

Is it possible to use torchtyping with pydantic?

Hi! Is it an option to use the torchtyping annotations with other runtime type checkers instead of typeguard?

It's not directly related, but the question came up from the following case: for batches I use a dataclass, and typeguard doesn't check the dataclass constructor at runtime. Maybe there is a smarter choice of base class for the batch?

Support functions in tensor shapes.

e.g. cat(a: TensorType["x"], b: TensorType["y"]) -> TensorType["x + y"].
Given that torchtyping operates at runtime, arbitrary Python expressions should be possible. Essentially just call eval with the appropriate constants x, y etc. bound in. Ignore any errors, so that e.g. func(x: TensorType["x + y"]) remains valid, but if the expression evaluates then compare its result against the value inferred.

This should happen as an additional round of checking, at the end of the current _check_memo function.
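
A rough sketch of that extra round of checking (illustrative only, not torchtyping's actual implementation; the names are made up):

def check_shape_expressions(bound_sizes, observed):
    # bound_sizes: sizes already inferred for plain names, e.g. {"x": 2, "y": 3}
    # observed: expression annotations mapped to the sizes actually seen,
    #           e.g. {"x + y": 5}
    for expression, actual in observed.items():
        try:
            expected = eval(expression, {}, bound_sizes)
        except Exception:
            continue  # e.g. an unbound name: silently ignore, as described above
        if expected != actual:
            raise TypeError(
                f"Dimension '{expression}' expected size {expected}, got {actual}."
            )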

Basic mypy integration tests don't work?

Since the documentation suggests that mypy integration mostly works, I would have expected the following basic things to work.

import torch
from torchtyping import TensorType  # type: ignore


# I'd expect that mixing TensorType with torch.Tensor works. However in
# strict mode this errors with:
# error: Returning Any from function declared to return "Tensor"
def simple_test_a(x: TensorType[(), float]) -> torch.Tensor:
    return x


# I'd expect that .item() has the type in the annotation, and that e.g.
# annotating the function with `-> str` would be a type error. However
# in strict mode this errors with:
# error: Returning Any from function declared to return "float"
def simple_test_b(x: TensorType[(), float]) -> float:
    return x.item()


# I'd expect that mypy is aware of the actual methods of a tensor and
# gives a type check error when calling garbage.
def simple_test_c(x: TensorType[(), float]) -> None:
    x.asdfasdfasdf()

Note that the latter two work when using the native torch.Tensor as a type annotation.

From what I can tell so far, mypy doesn't actually understand the meaning of TensorType at all. I thought that "mostly works" meant I could expect mypy to produce meaningful type check errors based on the annotations (operations with illegal shape or dtype combinations etc.).

Am I doing something wrong here, or does "mostly works" mean that it simply ignores the type annotations completely? That is kind of unexpected, considering the primary motivation of type annotations is to be used in type checking.

Support for an or condition, or other way to accomplish this pattern?

n00b to this very cool project, looking to enforce a broadcast-ability pattern where a dimension in one tensor either matches or can be broadcast to (i.e. equals 1) a dimension in another tensor.

import torch
import torchtyping
import typeguard

torchtyping.patch_typeguard()

@typeguard.typechecked
def mwe(
    x: torchtyping.TensorType[
        ...,
        "foo",
        "bar",  # How do we make this "match bar from y or equal 1"?
    ],
    y: torchtyping.TensorType[
        "bar",
    ]) -> torchtyping.TensorType[..., "foo", "bar"]:
    return x * y

Am I missing an existing way to do this in torchtyping out of the box? Would this need an extension?

mypy not compatible with any named axes?

When I specify a type like TensorType["batch_size", "num_channels", "x", "y"], I get a mypy error like error: Name "batch_size" is not defined for each of the named axes. Is this expected? Am I doing something wrong? This is with the most recent mypy, 0.950.

Support enums

For larger projects, TensorType[T.BATCH] (where T is an enum) can be better than TensorType["batch"], as it enables standard naming conventions & additional code hints.

It would be great to support this out of the box.
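
For illustration, the kind of usage being requested might look like this (not currently supported by torchtyping; the enum is hypothetical):

from enum import Enum
from torchtyping import TensorType

class T(str, Enum):
    BATCH = "batch"
    CHANNELS = "channels"

# Desired: TensorType[T.BATCH, T.CHANNELS] instead of TensorType["batch", "channels"]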

Pyright reports an error with named axis

Setup

  • pyright: 1.1.263
  • pytorch: 1.12.0+cu113
  • torchtyping: 0.1.4

Code Example

from torchtyping import TensorType

def example(foo: TensorType["batch"]):
    pass

Problem

Pyright reports the following error: "batch" is not defined

Related issue

The same error is reported by mypy when -1 is omitted: #35

The loss of declaration track in Pycharm

The type checking is really useful; however, I found that I can no longer jump to the torch source code in PyCharm. It seems the declaration tracking is lost. Below I give two quick examples.

Everything is ok if the typing is set to torch.Tensor
[screenshot]

The track is lost when set to TensorType in torchtyping
[screenshot]

There is one workaround, by using Union in typing to include them both. But I think it's a bit ugly.

Subclassing `torch._C._TensorBase` is not supported

When using the latest version of PyTorch, users are seeing the following error: "RuntimeError: Cannot subclass _TensorBase directly"

pytorch/pytorch#131463

This happens because PyTorch recently started forbidding subclassing the raw C++ _TensorBase type, in favour of subclassing torch.Tensor directly. Subclassing the base type is very unsafe, as many of its methods assume attributes from Tensor are present, so it is not really a valid class to use.

Unfortunately, the trick used in

class _TensorTypeMeta(type(torch.Tensor)):
to create the mixin based on type(torch.Tensor) makes it so that the metatype sees a new class of that metatype being created without torch.Tensor being the base class, hence triggering the error.

I'm not sure why we need the extra step of creating the mixin and then having the final class inherit from Tensor here

class TensorType(torch.Tensor, TensorTypeMixin):
but avoiding the mixin and having the base class be class TensorType(torch.Tensor, metaclass=_TensorTypeMeta): would avoid this issue.
Not sure if that is a proper fix though.

mypy integration

torchtyping currently integrates with typeguard. Integration with mypy would be good to have as well.

It certainly won't be as strong -- there's no way that mypy is powerful enough to catch shape/dtype errors -- but it's worth thinking about.

TensorType detail: grad_enabled

Is it possible to perform type checking for tensors with grad enabled? I myself am not sure of all the cases necessary to test against to confirm this as I don't fully understand how runtime type checking operates.

import torch
from torchtyping import TensorDetail

class _AutoGradTensorDetail(TensorDetail):
    def check(self, tensor: torch.Tensor) -> bool:
        return tensor.requires_grad  # requires_grad is an attribute, not a method
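
If I understand the custom-detail extension API from the further documentation correctly, the detail would then be instantiated inside the brackets, something like this (a sketch, not verified):

def step(x: TensorType["batch", float, _AutoGradTensorDetail()]
         ) -> TensorType["batch", float]:
    ...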

how does pep-646 affect this project?

hi!
how does the inclusion of variadic generics (pep-646) in python 3.11 affect this project?

will this allow mypy to statically check tensor shapes? maybe a plugin will be required?

just curious!
cheers
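
For context, PEP 646 variadic generics look roughly like this in isolation (Python 3.11+; this is standard typing machinery, not anything torchtyping does today):

from typing import Generic, NewType, TypeVarTuple, Unpack

Shape = TypeVarTuple("Shape")
Batch = NewType("Batch", int)
Height = NewType("Height", int)
Width = NewType("Width", int)

class Array(Generic[Unpack[Shape]]):
    ...

def process(x: Array[Batch, Height, Width]) -> Array[Batch, Height, Width]:
    return x

Whether static checkers can actually use this to catch real tensor shape errors is exactly the open question here.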

Showing custom types in Sphinx Docs

Hi @patrick-kidger, thanks for maintaining such a great project! :D

I'm running into some behaviour I'd like to change when rendering docs with Sphinx.

When I define a type it is rendered as a torch.Tensor in the docs e.g.:

#types.py

from torchtyping import TensorType

CoordTensor = TensorType[-1, 3]
# function.py

from .types import CoordTensor


def func(x: CoordTensor) -> CoordTensor:
    """Returns the input.

    :param x: tensor
    :type x: CoordTensor
    """"
    return x

[screenshot of the rendered docs]

Do you know if there is a way to get Sphinx to show the type name I have defined (and possibly make it clickable so users can jump to the type definition in the documentation?)

allow named sizes for named dimensions

(Thanks for this awesome work!)

I've often seen applications in which multiple dimensions of the same tensor will be the same size (which size is only known at runtime). It is useful to be able to separately document that those sizes must be the same while acknowledging the differing purposes of these dimensions.

The following example demonstrates the idea---showing three related features that already work in torchtyping as well as a proposed feature:

def func(feats: TensorType["b": ..., "annotator": 3, "word": "words", "feature"],
         predicates: TensorType["b": ..., "annotator": 3, "predicate": "words", "feature"],
         pred_arg_pairs: TensorType["b": ..., "annotator": 3, "predicate": "words", "argument": "words"]):
    # feats has shape (..., 3, words, features)
    # predicates has shape (..., 3, words, features)
    # pred_arg_pairs has shape (..., 3, words, words)
    # the ... b dimensions are checked to be of the same size.

Things that already work:

  • dimensions that share names (like "feature") are enforced to share the same dimension size
  • named dimensions (like "annotator") can specify a specific dimension size which is enforced
  • named ellipses (like "b") can be used to represent a fixed (but only known at runtime) set of dimensions and corresponding sizes [this is very close to what I want, but (1) I would prefer not to have the extra power of matching an unspecified number of dimensions and (2) as I understand it, ellipses only represent a single variable set of dimensions, but I want to be able to separately constrain multiple sets of dimensions to share respective sizes]

Proposed:

  • named dimensions (like "word", "predicate", and "argument") should be able to declare a shared-but-unspecified dimension size given by name ("words" in this example)

Additionally, you would probably want to enforce that, if the specified "size name" matches the name of another dimension (like "word" in the following example), then the sizes of those dimensions should be the same:

def func(feats: TensorType["b": ..., "annotator": 3, "word", "feature"],
         predicates: TensorType["b": ..., "annotator": 3, "predicate": "word", "feature"],
         pred_arg_pairs: TensorType["b": ..., "annotator": 3, "predicate": "word", "argument": "word"]):
    # feats has shape (..., 3, words, features)
    # predicates has shape (..., 3, words, features)
    # pred_arg_pairs has shape (..., 3, words, words)
    # the ... b dimensions are checked to be of the same size.

Thoughts?

Support tensor-likes

That is, classes supporting the __torch_function__ protocol.

This shouldn't be too difficult -- most of the necessary work has already been done.

  • There's some places where we have torch.Tensor hardcoded, for example in instance checks and some type annotations, that would need adjusting to accept tensor-likes.
  • TensorTypeMixin would need exposing as a public part of the interface.
  • The documentation needs updating to show how this is possible. Once the above changes are made it should just be:
from torchtyping import TensorTypeMixin

class TensorLike:
    ...

class TensorLikeType(TensorLike, TensorTypeMixin):
    base_cls = TensorLike

Checking the first dimensions of a tensor

Hi!

I just found torchtyping a few days ago, and I am enjoying it so far. However, I am a bit confused when it comes to one particular use-case: checking if the arguments of a function share the same first dimensions.

For example, if I try to write a function such as batch-wise scalar multiplication:

def batchwise_multiply(data: TensorType['B', ...], weights: TensorType['B']):
    pass

I get a NotImplementedError: Having dimensions to the left of ... is not currently supported.

Why is such a behaviour not implemented? What is the difference from performing the same operation on the right?
While I haven't checked the code, to the best of my understanding, if TensorType[..., 'B'] is supported, then when you detect a situation like TensorType['B', ...] you should be able to reuse the same code but read the tensor's shape backwards, shouldn't you?

I feel this feature would be huge for the library. At least with my programming conventions, I tend to put common dimensions in leading positions so that later I can unpack tensors using the * operator.

Restrict valid identifiers in names.

Specifically to standard alphanumeric characters only. This will be very useful for forward compatibility when #2 is introduced, so we can easily detect which names are composites of others.

Empty tensor support

How can I type check an empty tensor? E.g.

import torch
from torchtyping import TensorType, patch_typeguard
from typeguard import typechecked
patch_typeguard()


a = torch.zeros(3, 4, 0)

@typechecked
def my_fn(a: TensorType["batch", "temporal", -1]):
    return torch.cat((a, torch.zeros(3, 4, 10)), dim=-1)

Using None, -1, or 0 for the last dimension gives me

TypeError: argument "a" must be of type TensorType['batch', 'temporal', -1], got type NoneType instead.

NameError encountered in tutorial

I'm trying out the example in the readme; in particular, I am running

from torch import rand
from torchtyping import TensorType, patch_typeguard
from typeguard import typechecked

patch_typeguard()  # use before @typechecked

@typechecked
def func(x: TensorType["batch"],
         y: TensorType["batch"]) -> TensorType["batch"]:
    return x + y

func(rand(3), rand(3))  # works
func(rand(3), rand(1))

However, after executing func(rand(3), rand(3)) (which is supposed to work), I get

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/shashanksule/miniforge3/envs/pr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/shashanksule/miniforge3/envs/pr/lib/python3.9/site-packages/torch_pesq/loss.py", line 320, in forward
    d_symm, d_asymm = self.raw(ref, deg)
  File "/Users/shashanksule/miniforge3/envs/pr/lib/python3.9/site-packages/torch_pesq/loss.py", line 174, in raw
    ) -> Tuple[TensorType["batch", "sample"], TensorType["batch", "sample"]]:
NameError: name 'batch' is not defined

I get the same error even if I define named tensors with names = ("batch",) and enter them into func. What's going wrong here?

Dimension size as expression of another dimension

Hi,

I'm new to using torchtyping, and my usecase is the following: I have a data structure that has tensor of size N and another one of size N - 1. Is there a way to specify this? Naive attempt of using TensorType["time" - 1] failed.

Thanks!

pycharm shows an incorrect dtype when assigning torch.randn to a variable

Hi,

I have recently started using this library, so i might be using it incorrectly, but linting seems to fail in pycharm when assigning the result of torch.randn to TensorType with a float dtype.

Here is an example:

Matrix = TensorType['h', 'w', float]
x: Matrix = torch.randn(5, 3)

The second line gets underlined with the following error:

Expected type 'TensorType[Any, Any, float]', got 'Tensor' instead

If i modify the second line to:

x: Matrix = torch.randn(5, 3).float()

The error goes away, but I would rather not do that, as one of the plus sides of this library is removing extra typing-related code from my main logic. Having to add an explicit .float() defeats the purpose of this library IMO.

From reading the docs this should work: torch.randn returns a tensor of the default dtype, and TensorTypes that have float in them should be of the default type.

vscode/pylance/pyright don't consider a Tensor to be compatible with TensorType

Using torchtyping in vscode, I've found that passing a Tensor to a TorchType generates an error in the type checker:

[screenshot]

Tagging the TensorType import with type: ignore as recommended in the FAQ for mypy compatibility doesn't help. Is there any other way to suppress these errors short of tagging every use of a tensor with a tensortype'd sig with type: ignore?

Reproduction

vscode's Pylance language server backs onto the pyright project, and so we can get an easier to examine reproduction by using pyright directly.

Here's a quick script to set up an empty conda env with just torch and torchtyping

mkdir tmp
cd tmp
conda create -p ./.env 
conda activate ./.env
pip install torch==1.9.0 torchtyping==0.1.3

and one more command to install pyright

sudo npm install -g pyright

Then create two files, pyrightconfig.json with contents

{
    "useLibraryCodeForTypes": true,
    "exclude": [".env"]
}

and test.py with contents

import torch
from torchtyping import TensorType

def f(a: TensorType):
    pass

f(torch.zeros())

With that all done, running pyright test.py will give the error:

Loading configuration file at /Users/andy/code/tmp/pyrightconfig.json
Assuming Python version 3.9
Assuming Python platform Darwin
stubPath /Users/andy/code/tmp/typings is not a valid directory.
Searching for source files
Found 1 source file
/Users/andy/code/tmp/test.py
  /Users/andy/code/tmp/test.py:7:3 - error: Argument of type "Tensor" cannot be assigned to parameter "a" of type "TensorType" in function "f"
    "Tensor" is incompatible with "TensorType" (reportGeneralTypeIssues)
1 error, 0 warnings, 0 infos 
Completed in 0.715sec

Generalizing the Library

Hello and thanks for publishing this library! I've really enjoyed reading the design and discussion documents you have posted. However, I am now trying to apply this library in a somewhat broader context. Essentially, I am hoping to use it to improve the linear_operator library. The idea of that library is to abstract how tensors are stored, to be able to perform matrix operations much more efficiently. I'd really like to use torchtyping to add dimensional and storage type checks to help squash bugs in this code. Unfortunately, torchtyping is configured to run exactly on torch.Tensor objects. My first attempt was just to hack the library to pull out a few class checks. But, doing more reading, I feel like torchtyping could be cleanly improved by using protocols (PEP 544 – Protocols: Structural subtyping (static duck typing)). The idea would be to have the library use an abstract tensor protocol rather than Tensor directly. This would make the library much more general, and I think it could help clean up the code by making it explicit which tensor fields are being used. What do you think / do you have any suggestions on how to add this?
@dannyfriar @m4rs-mt
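
A rough illustration of the PEP 544 idea (hypothetical; not torchtyping's current behaviour): checks would be written against a structural "tensor-like" interface instead of torch.Tensor itself.

from typing import Protocol, runtime_checkable

import torch

@runtime_checkable
class TensorLike(Protocol):
    def size(self) -> torch.Size: ...
    def dim(self) -> int: ...

print(isinstance(torch.randn(2, 3), TensorLike))  # True: structural match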

Using `str: int` pairs in annotations does not verify names at runtime

First of all, thanks for your work, this library is very useful.

When name constraints (str) and size constraints (int) are combined, the runtime type checks seem to stop working. This example does not throw an error:

@typechecked
def f(x: TensorType["B" : 4, "F"]) -> TensorType["B"]:
    return torch.randn(3)

If the : 4 is removed from the example, the correct behaviour is obtained ("TypeError: Dimension 'B' of inconsistent size."). I have only tried this with Python 3.9.

Support for list of variable length tensors

Currently, list of tensors with different sizes (but same number of dimensions) is not supported:

import torch
from torchtyping import TensorType, patch_typeguard
from typeguard import typechecked
from typing import List

patch_typeguard()

TIME = ""

@typechecked
def noop(
    list_of_seqs: List[TensorType["TIME"]],
):
    return None

if __name__ == "__main__":
    x = [torch.randn(5), torch.randn(10)]
    y = noop(x)

returns the error TypeError: Dimension 'TIME' of inconsistent size. Got both 10 and 5.
since not all tensors in the input list are of the same length.

However, ideally we want to (1) be able to check that each tensor in list_of_seqs is a 1D tensor and (2) signify that each tensor is indexed by the same "dimension type" named "TIME". Is there a way to get torchtyping to support this?

If we could support this feature, a related thought is that we don't want to bind a fixed size to the variable-size dimension, since we may want to do something like

import torch
from torchtyping import TensorType, patch_typeguard
from typeguard import typechecked
from typing import List

patch_typeguard()

TIME = ""
BATCH = ""


@typechecked
def to_dense(
    list_of_seqs: List[TensorType["TIME"]],
) -> TensorType["BATCH", "TIME"]:
    return torch.stack([seq[:3] for seq in list_of_seqs])


if __name__ == "__main__":
    x = [torch.randn(5), torch.randn(10)]
    y = to_dense(x)

where we still want to signal that the output is indexed by "TIME" but possibly with a different size than the "TIME" dimensions of tensors in list_of_seqs. This happens for example when padding list of tensors to form a single dense tensor.

pycharm warning of Unresolved reference

Hi, I really like this project! It's mind blowing for me!!

I have a question. When coding in pycharm using TensorType with a string dimension name, I got a warning about unresolved reference

[screenshot]

I am not sure if this is a limitation of pycharm. I am using pycharm 2021.1.1, python 3.7.10 and torchtyping 0.1.4.

Thanks!

Support checks via docstrings instead of type annotations

First off - I've been thinking of almost the same idea as this library for a while because I see runtime errors from tensor shape/dtype mismatches all the time, so glad that there's already something in place!

My initial approach was going to be parsing docstrings of various formats (with an existing library like docstring_parser) and performing validation on these, rather than type annotations. Is that a feature you'd consider accepting into your library? I'd be interested in writing a PR for it with some guidance

For example, I currently write Sphinx style docstrings like this

def forward(self, imgs, tokens):
    """Combine multimodal input features

    :param torch.FloatTensor[N, C, H, W] imgs: Batch of image pixels
        normalized in range of 0-1
    :param torch.LongTensor[N, L] tokens: Vocabulary tokens in sequence
        of length ``L``
    :return torch.FloatTensor[N, C] pred: Predicted probabilities for
        each class
    """

I realize the [N, C, H, W] notation is not quite as rigid as what this project proposes, and that's one reason I've been looking for a more structured approach. But regardless, I do find it nice sometimes to have this information in the docstrings instead of type annotations, particularly for functions with many parameters
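
A rough sketch of the idea (hypothetical; not an existing torchtyping feature, and much simpler than what docstring_parser would give you): pull the type[dims] entries out of a Sphinx-style docstring so they can be checked against the arguments at call time.

import inspect
import re

PARAM_RE = re.compile(r":param\s+torch\.(\w+)\[([^\]]+)\]\s+(\w+):")

def extract_shape_specs(func):
    """Return {arg_name: (tensor type, [dim names])} parsed from the docstring."""
    doc = inspect.getdoc(func) or ""
    return {
        name: (tensor_type, [d.strip() for d in dims.split(",")])
        for tensor_type, dims, name in PARAM_RE.findall(doc)
    }

# Applied to the forward method above, this would give:
# {'imgs': ('FloatTensor', ['N', 'C', 'H', 'W']), 'tokens': ('LongTensor', ['N', 'L'])}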

Support distributions

Hi,

Thank you for this great library! It has been very helpful. I was wondering if you've considered supporting torch.Distribution datatypes. The idea being that you could do something like func() -> Distribution["state_dim"], so that it is clear that the output is a distribution over vectors of size "state_dim". What do you think?

TorchScript compatibility?

Hi all,

This library looks very nice :)

Is TensorType compatible with the TorchScript compiler? As in, are the annotations transparently converted to torch.Tensor as far as torch.jit.script is concerned, allowing annotated modules/functions to be compiled? (I'm not worried about whether the type checking is applied in TorchScript, just whether an annotated program that gets shape-checked in Python can be compiled down to TorchScript.)

Thanks!

Support Any for shape

Hi, I would like to thank you for this cool library.
I was disappointed not to find shape typing for PyTorch, and I had planned to code it myself if it didn't exist.

I think your api is great; however, I find that specifying a dimension of any size as -1 is not very intuitive (I saw that you have many other ways to declare it). One idea is to declare a dimension of any size using typing.Any, as the nptyping library does:

from typing import Any
import numpy as np 
from nptyping import NDArray
NDArray[(3, 3, Any), np.float32]

In this case we have typed our array with no constraints on the last dimension.
If we apply this modification to your library:

from typing import Any
from torchtyping import TensorType
import torch

TensorType[3, 3, Any, torch.float32]
# Instead of 
TensorType[3, 3, -1, torch.float32]

What do you think of this? If you're interested I can try to make a pull request!

I thank you again for developing this wonderful library.

Arbitrary number of dimensions - but check they are same over the argument tensors

Consider this function

@typechecked
def mean_squared_error(input: TensorType["batch"], target: TensorType["batch"]):
    d = input - target
    d = d * d
    return torch.mean(d)

The above only allows batches to contain 1-element values (i.e. scalars).

I would like to ensure that the shape of items in the input batch is the same as the shape of items in the target batch, i.e input.shape[1:] == target.shape[1:].

I don't want to hardcode the number of dimensions like for example a batch containing images: input: TensorType["batch", "c", "h", "w"].

Is this currently possible?
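
Based on the str: ... pairs documented in the README's API section, one possible way to express this (a sketch only, not a confirmed answer):

@typechecked
def mean_squared_error(input: TensorType["batch", "item": ...],
                       target: TensorType["batch", "item": ...]):
    # "item": ... should bind the remaining dimensions and require them to be
    # consistent between input and target, without fixing how many there are.
    d = input - target
    return torch.mean(d * d)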

Type checking based on names

Hello

I'd like to know if there's an easy way to check tensors by name:

import torch
from torch import rand
from torchtyping import TensorType, patch_typeguard, is_named
from typeguard import typechecked

patch_typeguard()  # use before @typechecked


def test():
    t = Test()
    b = t.return_batch()
    o = t.return_other()
    v = t.func(b, o)
    u = t.func(o, b)  # can we have it raise TypeError?


class Test:
    def __init__(self):
        pass

    @typechecked
    def func(
        self,
        x: TensorType["batch"],
        y: TensorType["other"],
    ) -> TensorType["batch", "other"]:
        return torch.outer(x, y)

    def return_batch(self) -> TensorType["batch"]:
        return rand(4)

    def return_other(self) -> TensorType["other"]:
        return rand(3)


test()

Right now, IIUC only dimensions are checked, so in this example there is no error...

I think that I could use is_named in TensorType, but it gets very cumbersome because we also need to use names=... everytime we declare a tensor. This could be OK... but it can get even worse because some pytorch operations don't seem to work with named tensors (outer here! at least with 1.9.1) so we need to rename tensors every 2 lines...

Is it doable to have patch_typeguard(name_check=True), or would it be too complicated to implement? (I think basically I want nominal typing instead of structural typing.)

Thanks for your work!

Optionally check consistency across function calls within a checked function.

Essentially:

typeguard uses a "memo" object as a place to store information over the duration of a function call. At the moment, the checking is primarily performed by:

  • Storing extra information in the memo (specifically the pairs of TensorTypes and the corresponding actual tensors passed as arguments);
  • Then parsing all of these to perform the extra checking.

It should be possible to extend this to check consistency across nested function calls, e.g.

def f() -> TensorType["x", "y"]:
    return torch.rand(2, 3)

def g(tensor: TensorType["y", "x"]):
    pass

g(f())

should raise an error, as "x" and "y" get both 2 and 3 passed as sizes.

The solution should be to:

  • Create an additional thread-local storage.
  • For the duration of a memo's (=function call) existence, have it register itself in the storage.
  • Have each memo compare the inferred sizes of dimensions "x", "y" etc. against those of other memos in the storage.
  • Raise an error if they don't match.
  • Have the memo only perform the checking for function calls in the same file, to avoid incompatibility between uses of torchtyping in library code and user code.

Will need to think about how to do this optimally. A naive implementation involves every call to _check_memo looking in the storage and doing this extra checking, but that will be O(n^2) in the depth of the call stack n.
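
A rough sketch of the proposed mechanism (the names here are illustrative, not torchtyping's actual internals): each live memo registers itself in a thread-local registry and compares its inferred dimension sizes with the other registered memos from the same file.

import threading

_local = threading.local()

class _ShapeMemo:
    def __init__(self, filename):
        self.filename = filename
        self.sizes = {}  # e.g. {"x": 2, "y": 3}, filled in during checking
        if not hasattr(_local, "stack"):
            _local.stack = []
        _local.stack.append(self)  # register for the duration of the call

    def check_against_others(self):
        # Compare against other live memos from the same file only.
        for other in _local.stack:
            if other is self or other.filename != self.filename:
                continue
            for name, size in self.sizes.items():
                if name in other.sizes and other.sizes[name] != size:
                    raise TypeError(
                        f"Dimension '{name}' of inconsistent size. "
                        f"Got both {size} and {other.sizes[name]}."
                    )

    def close(self):
        _local.stack.remove(self)  # deregister when the function call ends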
