microsoft / yardl
Tooling for streaming instrument data
Home Page: https://microsoft.github.io/yardl/
License: MIT License
Dear Yardl developers,
would it be (in principle) possible to use numpy.array_api instead of numpy
as array backend for the generated python code?
Note that numpy.array_api is a reference implementation of the array API standard.
By doing so, Yardl would be compliant with the python array api and as such more agnostic to the specific array backend which would potentially allow using other compliant array backends (e.g. cupy or pytorch) in the future.
Georg
PS: @johnstairs @hansenms thanks for the support of the 1st ETSI Hackathon
It'd make sense to be able to install the generated Python code.
If:
Using 28aa4af and the following model:
EmptyTest: !protocol
  sequence:
    strings: !stream
      items: string
and the following demonstration program:
#include <cassert>
#include <string>
#include <vector>
// ...plus the generated binary protocol header for EmptyTest

int main(void) {
  ::binary::EmptyTestWriter w("test.bin");
  std::vector<std::string> strings;
  w.WriteStrings(strings);  // write an empty batch
  w.EndStrings();
  w.Close();

  ::binary::EmptyTestReader r("test.bin");
  int count = 0;
  strings.reserve(10);
  while (r.ReadStrings(strings)) {
    for (auto const& s : strings) {
      (void)(s);
      count++;
    }
  }
  assert(count == 0);
  r.Close();  // throws here
  return 0;
}
the call to r.Close() throws the following error:
terminate called after throwing an instance of 'std::runtime_error'
what(): Expected call to ReadStrings() but received call to Close() instead.
I suggest adding
option(${prefix}_HDF5_SUPPORT "Add HDF5 protocol" ON)
or similar, and likewise for NDJSON. These could be advanced options. This would allow advanced users to switch off support they don't need.
Creating a separate issue based on #20 opened by @KrisThielemans
Also, somewhere in the doc we'll need a description of mappings between yardl types and C++ and other target languages. In particular, I believe you generate your own multi-dim array type as there still doesn't seem to be an std container sadly.
It could be useful to support a few existing multi-dim arrays to avoid copies in client-code (Boost.MultiArray and https://amypad.github.io/CuVec/ come to mind), but I can see that becoming very difficult. (If a mapping to a flat array is exposed somewhere, it'd need to be stated if row-major or column-major order is used).
These are good points. We currently use xtensor types for multidimensional arrays, which we alias in yardl. These have a .data() method that exposes the raw flat array.
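For reference, the two conventions differ only in which index varies fastest in the flat buffer. Here is a minimal sketch in plain Python. The helper names are hypothetical and are not part of yardl or xtensor.

```python
def row_major_offset(index: tuple[int, ...], shape: tuple[int, ...]) -> int:
    """Flat offset of `index` in a row-major (C-order) buffer: the last index varies fastest."""
    if len(index) != len(shape) or any(not 0 <= i < n for i, n in zip(index, shape)):
        raise IndexError(f"index {index} out of bounds for shape {shape}")
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def column_major_offset(index: tuple[int, ...], shape: tuple[int, ...]) -> int:
    """Flat offset of `index` in a column-major (Fortran-order) buffer: the first index varies fastest."""
    # Column-major order is row-major order with the dimensions reversed.
    return row_major_offset(tuple(reversed(index)), tuple(reversed(shape)))


if __name__ == "__main__":
    # Element [1, 0] of a 2x3 array lives at a different flat position under each convention.
    print(row_major_offset((1, 0), (2, 3)))     # 3
    print(column_major_offset((1, 0), (2, 3)))  # 1
```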
I think we have some choices for this problem:
_package.yaml
Related problem: in some instances, perhaps the memory should be allocated on the GPU. Should this be a property on the !array in yardl?
The C++ types for multi-dimensional arrays (yardl::FixedNDArray, yardl::NDArray, and yardl::DynamicNDArray) are all locked to row-major layout at compile-time.
We will eventually support languages that default to column-major ordering. HDF5 requires data to be written in row-major order, so we will need to convert. For the binary format, we could do the same, or we could prefix each array with a byte indicating the layout. This could avoid expensive permutations if readers and writers are both working with column-major ordering.
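Converting a column-major buffer for HDF5 amounts to permuting the flat data. That permutation is the cost a layout byte in the binary format could let readers and writers skip. Here is a minimal sketch in plain Python using a hypothetical helper. A real implementation would work on typed buffers rather than lists.

```python
import itertools
import math


def column_major_to_row_major(data: list, shape: tuple[int, ...]) -> list:
    """Reorder a flat column-major buffer into row-major order for the same logical array."""
    if len(data) != math.prod(shape):
        raise ValueError(f"buffer has {len(data)} elements, shape {shape} needs {math.prod(shape)}")
    # Column-major strides (in elements): the first dimension is contiguous.
    strides = []
    stride = 1
    for n in shape:
        strides.append(stride)
        stride *= n
    # itertools.product enumerates multi-indices in row-major order (last index fastest),
    # so gathering each element from its column-major position yields a row-major buffer.
    return [data[sum(i * s for i, s in zip(index, strides))]
            for index in itertools.product(*(range(n) for n in shape))]
```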
We will look into whether to wrap the C++ generated code with pybind11 or to keep the Python implementation completely separate.
https://github.com/microsoft/yardl/blob/main/docs/docs.md#computed-fields states
MyRec: !record
  fields:
    arrayField: !array
      items: int
      dimensions: [x, y]
  computedFields:
    accessArrayElementByName: arrayField[y:1, x:0]
This swaps the order between the dimensions declaration and the element access. Is this intentional? It'd be very confusing!
By using varints, strings, etc., it's possible that data is not aligned to a 32-bit (or any other) boundary. It doesn't seem to be documented whether the binary format pads the gaps or not. This certainly needs to be documented for Records and Streams.
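To see why alignment cannot be assumed, consider a varint encoding. The sketch below assumes a LEB128-style scheme, with 7 payload bits per byte and the high bit as a continuation flag. It is my understanding that yardl's binary format uses something like this for integers, but the binary reference is authoritative. The helper name is hypothetical.

```python
def encode_unsigned_varint(value: int) -> bytes:
    """Encode a non-negative integer as a LEB128-style varint (7 bits per byte, low bits first)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


if __name__ == "__main__":
    # The encoded size depends on the value, so the offset of whatever
    # follows a varint cannot be assumed to sit on a 4-byte boundary.
    for v in (1, 300, 70000):
        print(v, len(encode_unsigned_varint(v)))
```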
For whatever reason, my conda install got xtensor=0.21.10. The generated code fails to compile, though, as xtensor_container doesn't have the flat member. xtensor-stack/xtensor@50e3d42 says this means at least 0.23.10 is required.
Ideally, this minimum version should be added to the generated CMakeLists.txt.
Of course, the same holds for other dependencies.
https://microsoft.github.io/yardl/reference/binary.html#enums-and-flags has a typo ("properly"), but it is also unclear how the base type would be defined. A link to somewhere else?
When you define an enum with a size base, code generation succeeds, but the generated code fails to compile:
MyEnum: !enum
  base: size
  values:
    a: 1
    b: 2
    c: 3
Using yardl v0.4.0, if you add an aliased nullable union to a Protocol sequence, the NDJSON reader will crash if the value of that step is None.
Example:
GenericNullableUnion2<T1, T2>: [null, T1, T2]
RecordWithUnions: !record
  fields:
    value: [null, int, string]
    aliasedValue: GenericNullableUnion2<int, string>
Then, using the following code to convert an instance of RecordWithUnions to JSON and back again:
import yay
converter = yay.ndjson.RecordWithUnionsConverter()
json = converter.to_json(yay.RecordWithUnions())
r = converter.from_json(json)
The last line throws:
Traceback (most recent call last):
File "/workspaces/yardl/joe/issue-#113/python/test.py", line 7, in <module>
r = converter.from_json(json)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/yardl/joe/issue-#113/python/yay/ndjson.py", line 58, in from_json
aliased_value=self._aliased_value_converter.from_json(json_object["aliasedValue"],),
~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 'aliasedValue'
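The KeyError suggests an asymmetry. to_json appears to omit the aliasedValue key when the value is None, but from_json indexes that key unconditionally. Below is a minimal sketch of symmetric handling with a hypothetical converter, not the generated yay.ndjson code. Yardl's real union encoding is more involved and is left out here.

```python
import json
import typing

Value = typing.Optional[typing.Union[int, str]]


class NullableUnionRecordConverter:
    """Hypothetical converter for a record with two nullable union fields."""

    def to_json(self, value: Value, aliased_value: Value) -> str:
        obj: dict = {}
        # None is represented by omitting the key entirely.
        if value is not None:
            obj["value"] = value
        if aliased_value is not None:
            obj["aliasedValue"] = aliased_value
        return json.dumps(obj)

    def from_json(self, text: str) -> dict:
        obj = json.loads(text)
        # .get() turns an omitted key back into None instead of raising KeyError.
        return {"value": obj.get("value"), "aliased_value": obj.get("aliasedValue")}
```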
Given the following model:
GenericUnionsRecord<T, U>: !record
  fields:
    a: !union
      tv: T*
      t: T
    b: !union
      tm: T->U
      t: T
yardl v0.4.0 generates invalid constructor code for both inner unions:
class GenericUnionsRecord(typing.Generic[T, T_NP, U]):
    ...
    def __init__(self, *, ...):
        self.a = a if a is not None else TvOrT.Tv([]())
        self.b = b if b is not None else TmOrT.Tm({}())
Warnings on import (these are TypeErrors at runtime):
/workspaces/yardl/joe/issue-#112/python/odd/types.py:58: SyntaxWarning: 'list' object is not callable; perhaps you missed a comma?
self.a = a if a is not None else TvOrT.Tv([]())
/workspaces/yardl/joe/issue-#112/python/odd/types.py:59: SyntaxWarning: 'dict' object is not callable; perhaps you missed a comma?
self.b = b if b is not None else TmOrT.Tm({}())
This issue occurs when the first type in the union resolves to a Python list or dict.
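The output looks like a default-value template that always appends "()" to the type's default expression. That works for callable types but not for the literals [] and {}. Here is a minimal sketch of the distinction as a hypothetical helper. It is not yardl's actual Go code generator.

```python
def default_expression(python_type: str) -> str:
    """Python source for a field's default value (hypothetical codegen helper)."""
    literal_defaults = {"list": "[]", "dict": "{}"}
    if python_type in literal_defaults:
        # Containers default to a literal; appending "()" would produce "[]()", which calls a list.
        return literal_defaults[python_type]
    # Scalars and record classes are default-constructed by calling the type.
    return f"{python_type}()"


def union_case_default(case_class: str, python_type: str) -> str:
    """Default value for a union whose first case wraps `python_type`."""
    return f"{case_class}({default_expression(python_type)})"
```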
We should support maps/dictionaries as a first-class datatype. Syntax could be something like:
x: !map
  keys: string
  values: int
Keys can only be primitive scalar types.
Shorthand syntax could look like:
string->int
We should also make sure maps can be used in computed fields.
Using yardl v0.4.0, I expect to be able to use a computed field switch expression to produce a string value:
RecordWithComputedFields: !record
  fields:
    myField: [null, string, float]
  computedFields:
    myResult:
      !switch myField:
        null: "null"
        string: "string"
        float: "float"
But yardl complains with:
❌ /workspaces/yardl/joe/switch-case-string/model/model.yml:6:14: there is no variable in scope with the name 'null' nor does the record 'RecordWithComputedFields' does not have a field or computed field named 'null'
❌ /workspaces/yardl/joe/switch-case-string/model/model.yml:7:18: there is no variable in scope with the name 'string' nor does the record 'RecordWithComputedFields' does not have a field or computed field named 'string'
❌ /workspaces/yardl/joe/switch-case-string/model/model.yml:8:17: there is no variable in scope with the name 'float' nor does the record 'RecordWithComputedFields' does not have a field or computed field named 'float'
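For reference, the intended result is a simple dispatch on the union case. The sketch below is a plain-Python equivalent of myResult. It is hypothetical and is not the code yardl generates.

```python
import typing


def my_result(my_field: typing.Union[None, str, float]) -> str:
    """Hypothetical Python equivalent of the myResult computed field."""
    if my_field is None:
        return "null"
    if isinstance(my_field, str):
        return "string"
    if isinstance(my_field, float):
        return "float"
    raise TypeError(f"unexpected union case: {type(my_field).__name__}")
```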
Does anyone have suggestions for how to handle HDF5 versions in CMake? I built STIR with one version of HDF5 and, accidentally, my yardl code with another. Result: a crash at start-up.
For instance, on https://github.com/ETSInitiative/PRDdefinition/tree/main/python
$ mypy prd_generator.py
prd/yardl_types.py:270: error: No overload variant of "zip" matches argument types "void", "void" [call-overload]
prd/yardl_types.py:270: note: Possible overload variants:
prd/yardl_types.py:270: note: def [_T_co, _T1] __new__(cls, Iterable[_T1], /, *, strict: bool = ...) -> zip[tuple[_T1]]
prd/yardl_types.py:270: note: def [_T_co, _T1, _T2] __new__(cls, Iterable[_T1], Iterable[_T2], /, *, strict: bool = ...) -> zip[tuple[_T1, _T2]]
prd/yardl_types.py:270: note: def [_T_co, _T1, _T2, _T3] __new__(cls, Iterable[_T1], Iterable[_T2], Iterable[_T3], /, *, strict: bool = ...) -> zip[tuple[_T1, _T2, _T3]]
prd/yardl_types.py:270: note: def [_T_co, _T1, _T2, _T3, _T4] __new__(cls, Iterable[_T1], Iterable[_T2], Iterable[_T3], Iterable[_T4], /, *, strict: bool = ...) -> zip[tuple[_T1, _T2, _T3, _T4]]
prd/yardl_types.py:270: note: def [_T_co, _T1, _T2, _T3, _T4, _T5] __new__(cls, Iterable[_T1], Iterable[_T2], Iterable[_T3], Iterable[_T4], Iterable[_T5], /, *, strict: bool = ...) -> zip[tuple[_T1, _T2, _T3, _T4, _T5]]
prd/yardl_types.py:270: note: def [_T_co] __new__(cls, Iterable[Any], Iterable[Any], Iterable[Any], Iterable[Any], Iterable[Any], Iterable[Any], /, *iterables: Iterable[Any], strict: bool = ...) -> zip[tuple[Any, ...]]
prd/yardl_types.py:299: error: "object" has no attribute "value" [attr-defined]
prd/_ndjson.py:48: error: Incompatible types in assignment (expression has type "TextIO", variable has type "TextIOWrapper") [assignment]
prd/_ndjson.py:86: error: Incompatible types in assignment (expression has type "BufferedReader | TextIO", variable has type "TextIOWrapper") [assignment]
prd/_ndjson.py:940: error: <nothing> has no attribute "to_json" [attr-defined]
prd/_ndjson.py:958: error: <nothing> has no attribute "from_json" [attr-defined]
prd/_ndjson.py:993: error: Incompatible types in assignment (expression has type "None", variable has type "tuple[int, ...]") [assignment]
prd/_ndjson.py:1024: error: Need type annotation for "result" [var-annotated]
prd/_binary.py:1071: error: Incompatible types in assignment (expression has type "None", variable has type "tuple[int, ...]") [assignment]
prd/_binary.py:1076: error: <nothing> has no attribute "_element_serializer" [attr-defined]
prd/_binary.py:1115: error: Need type annotation for "result" [var-annotated]
It'd be nice to be able to do some basic manipulations in a computed field, e.g. subtracting 1:
ScannerInformation: !record
  fields:
    # edge information for TOF bins in mm (e.g. start,edge1, ... end)
    tofBinEdges: float*
  computedFields:
    numberOfTOFBins: size(tofBinEdges)-1
Using the current test model, yardl throws a RuntimeError: Cannot find dtype for each of the following assertions, all of which should hold:
import numpy as np
import test_model as tm
assert tm.get_dtype(tm.AliasedGenericVector[int]) == np.object_
assert tm.get_dtype(tm.AliasedGenericFixedVector[int]) == np.int32
assert tm.get_dtype(tm.AliasedGenericDynamicArray[int]) == np.object_
assert tm.get_dtype(tm.AliasedGenericFixedArray[int]) == np.int32
assert tm.get_dtype(tm.basic_types.AliasedMap[str, int]) == np.object_
Using yardl 2d61ba3 with the following minimal model:
MyRecord: !record
  fields:
    myField: RecordWithGenericOptional<string>
RecordWithGenericOptional<T>: !record
  fields:
    value: T?
The generated types.py does not properly initialize the inner record class. See the generated classes below:
class RecordWithGenericOptional(typing.Generic[T]):
    value: typing.Optional[T]
    def __init__(self, *,
        value: typing.Optional[T],
    ):
        self.value = value
    ...
class MyRecord:
    my_field: RecordWithGenericOptional[str]
    def __init__(self, *,
        my_field: typing.Optional[RecordWithGenericOptional[str]] = None,
    ):
        self.my_field = my_field if my_field is not None else RecordWithGenericOptional()
    ...
Python throws a TypeError when creating an instance of MyRecord:
In [1]: import combined
In [2]: combined.MyRecord()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[2], line 1
----> 1 combined.MyRecord()
File /workspaces/yardl/joe/issue-switch-over-union/python/combined/types.py:45, in MyRecord.__init__(self, my_field)
42 def __init__(self, *,
43 my_field: typing.Optional[RecordWithGenericOptional[str]] = None,
44 ):
---> 45 self.my_field = my_field if my_field is not None else RecordWithGenericOptional()
TypeError: RecordWithGenericOptional.__init__() missing 1 required keyword-only argument: 'value'
In RecordWithGenericOptional.__init__, value should have a default value of None because it is Optional.
However, yardl currently omits default values for all generic types:
yardl/tooling/internal/python/types/types.go
Lines 181 to 187 in 2d61ba3
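Here is a sketch of what the two generated classes could look like with the fix, trimmed to the relevant members. T? is Optional whatever T is, so a default of None is valid for every instantiation of the generic record.

```python
import typing

T = typing.TypeVar("T")


class RecordWithGenericOptional(typing.Generic[T]):
    value: typing.Optional[T]

    def __init__(self, *,
        value: typing.Optional[T] = None,  # optional fields default to None, even when generic
    ):
        self.value = value


class MyRecord:
    my_field: RecordWithGenericOptional[str]

    def __init__(self, *,
        my_field: typing.Optional[RecordWithGenericOptional[str]] = None,
    ):
        self.my_field = my_field if my_field is not None else RecordWithGenericOptional()
```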
In Python, the binary protocol Writer is meant to be used as a context manager, e.g.
with MyProtocolWriter(filename) as w:
    w.write...
It is also possible to use the class directly and manually call its .close() method when finished, e.g.
w = MyProtocolWriter(filename)
w.write...
w.close()
However, when the writer is used in this form, the zero byte normally written to terminate the stream is not written at all. This causes an unexpected error when reading the stream later (either an early EOF or an unexpected call to read a different protocol step).
Model:
MyProtocol: !protocol
  sequence:
    xs: !stream
      items: int
Example:
from issue.binary import BinaryMyProtocolWriter, BinaryMyProtocolReader
w = BinaryMyProtocolWriter("test.bin")
w.write_xs(list(range(42)))
w.close()
r = BinaryMyProtocolReader("test.bin")
xs = r.read_xs()
assert len(list(xs)) == 42
r.close()
Run it:
Traceback (most recent call last):
File "/workspaces/yardl/joe/issue-#137/python/test.py", line 9, in <module>
assert len(list(xs)) == 42
^^^^^^^^
File "/workspaces/yardl/joe/issue-#137/python/issue/protocols.py", line 118, in _wrap_iterable
yield from iterable
File "/workspaces/yardl/joe/issue-#137/python/issue/_binary.py", line 971, in read
while (i := stream.read_unsigned_varint()) > 0:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/yardl/joe/issue-#137/python/issue/_binary.py", line 228, in read_unsigned_varint
self._fill_buffer(1)
File "/workspaces/yardl/joe/issue-#137/python/issue/_binary.py", line 299, in _fill_buffer
raise EOFError("Unexpected EOF")
EOFError: Unexpected EOF
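One way to make both usage styles behave the same is to make close() itself responsible for the terminator and have __exit__ simply call close(). The sketch below uses a hypothetical class with a toy encoding. It is not the generated BinaryMyProtocolWriter.

```python
import io
import typing


class StreamWriterSketch:
    """Hypothetical binary writer for a protocol with a single stream step."""

    def __init__(self, stream: typing.BinaryIO) -> None:
        self._stream = stream
        self._closed = False

    def write_xs(self, values: typing.Iterable[int]) -> None:
        batch = list(values)
        if batch:
            # Toy encoding: a one-byte count followed by one byte per value (small values only).
            self._stream.write(bytes([len(batch)]) + bytes(batch))

    def close(self) -> None:
        # The terminator lives in close(), so it is written however the writer is used.
        if not self._closed:
            self._stream.write(b"\x00")
            self._closed = True

    def __enter__(self) -> "StreamWriterSketch":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()
```

With this structure, the explicit close() and the with block write identical bytes, and calling close() twice is harmless.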
Using 28aa4af and the following model:
UnionOfAlias: !protocol
  sequence:
    variant: [int, string]
    variantAlias: [AliasedInt, string]
produces the following compiler error for the C++ NDJSON serialization:
/workspaces/yardl/joe/quickcheck/cpp/generated/ndjson/protocols.cc:33:8: error: redefinition of 'struct nlohmann::json_abi_v3_11_2::adl_serializer<std::variant<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >'
33 | struct adl_serializer<std::variant<check::AliasedInt, std::string>> {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/workspaces/yardl/joe/quickcheck/cpp/generated/ndjson/protocols.cc:14:8: note: previous definition of 'struct nlohmann::json_abi_v3_11_2::adl_serializer<std::variant<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >'
14 | struct adl_serializer<std::variant<int32_t, std::string>> {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Currently, yardl permits an empty Protocol (one with zero steps in its sequence).
In C++ codegen, an empty Protocol results in an unused writer parameter in the protocol reader's CopyTo method.
Model:
MyProtocol: !protocol
  sequence:
Generated CopyTo method:
void MyProtocolReaderBase::CopyTo(MyProtocolWriterBase& writer) {
}
Compiler error:
[2/7] Building CXX object generated/CMakeFiles/issue_generated.dir/protocols.cc.o
FAILED: generated/CMakeFiles/issue_generated.dir/protocols.cc.o
...
.../cpp/generated/protocols.cc: In member function 'void issue::MyProtocolReaderBase::CopyTo(issue::MyProtocolWriterBase&)':
/workspaces/yardl/joe/issue-#ddd/cpp/generated/protocols.cc:73:57: error: unused parameter 'writer' [-Werror=unused-parameter]
73 | void MyProtocolReaderBase::CopyTo(MyProtocolWriterBase& writer) {
| ~~~~~~~~~~~~~~~~~~~~~~^~~~~~
cc1plus: all warnings being treated as errors
Either empty Protocols should not be permitted, or I'll add a cast to void to silence the compiler.
When writing a protocol to an HDF5 file, we create a group with the protocol's name. If you wanted to store multiple experiments with the same protocol in the same file, we could have an optional path parameter that specifies the group to put the protocol in.
We could support some syntactic sugar for !vector and !array. Perhaps something like:
int* # a vector of int of unknown length
int*3 # a vector of ints of length 3
int[] # an array of ints with an unknown number of dimensions
int[,] # an array of ints with two dimensions
int[x,y] # an array of ints with two named dimensions
int[3,4] # an array of ints with two fixed dimensions
int[x:3, y:4] # an array of ints with two named and fixed dimensions
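A minimal parser is a quick way to check that the proposed notation is unambiguous. The sketch below is in plain Python. The function and the dict representation are hypothetical, and the parser only handles simple item type names such as int.

```python
import re

_VECTOR = re.compile(r"^(?P<items>\w+)\*(?P<length>\d+)?$")
_ARRAY = re.compile(r"^(?P<items>\w+)\[(?P<dims>[^\]]*)\]$")


def parse_shorthand(text: str) -> dict:
    """Parse the proposed !vector / !array shorthand into a plain dict (hypothetical representation)."""
    text = text.strip()
    if m := _VECTOR.match(text):
        length = m.group("length")
        return {"kind": "vector", "items": m.group("items"),
                "length": int(length) if length else None}
    if m := _ARRAY.match(text):
        body = m.group("dims")
        if body.strip() == "":
            # int[]: the number of dimensions is unknown
            return {"kind": "array", "items": m.group("items"), "dimensions": None}
        dims = []
        for part in body.split(","):
            part = part.strip()
            name, length = None, None
            if ":" in part:
                # int[x:3, y:4]: named and fixed
                name, size = (s.strip() for s in part.split(":", 1))
                length = int(size)
            elif part.isdigit():
                length = int(part)  # int[3,4]: fixed only
            elif part:
                name = part  # int[x,y]: named only
            dims.append({"name": name, "length": length})
        return {"kind": "array", "items": m.group("items"), "dimensions": dims}
    raise ValueError(f"not a vector/array shorthand: {text!r}")
```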
When defining a union-type field within a record, there is validation that ensures all type cases in a union are distinct. However, user-defined aliases are not checked. This leads to generated code that fails to compile, because the variants contain the same underlying type more than once. There also appears to be a similar conflict between size and uint64.
MyIntType: uint64
MyRecord: !record
  fields:
    one: [uint64, MyIntType]
MyRecord: !record
  fields:
    one: [uint64, size]
Records defined as such succeed in code generation, but that code cannot be compiled.
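One way to catch these cases is to resolve every alias to its underlying type before checking for duplicates. The sketch below does this in plain Python with a hypothetical helper. Treating size as an alias of uint64 is an assumption for illustration, based on size_t typically matching uint64_t on 64-bit C++ platforms.

```python
def find_duplicate_cases(cases: list[str], aliases: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of union cases that resolve to the same underlying type."""

    def resolve(name: str) -> str:
        seen = set()
        # Follow alias chains (e.g. MyIntType -> uint64) until reaching a non-alias.
        while name in aliases:
            if name in seen:
                raise ValueError(f"cyclic alias: {name}")
            seen.add(name)
            name = aliases[name]
        return name

    first_seen: dict[str, str] = {}
    duplicates = []
    for case in cases:
        underlying = resolve(case)
        if underlying in first_seen:
            duplicates.append((first_seen[underlying], case))
        else:
            first_seen[underlying] = case
    return duplicates
```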
Using the following model:
GenericUnion<T>: !union
  t: T
  tv: T*
  tvf: T[]
yardl v0.4.0 generates invalid Python:
class GenericUnion(typing.Generic[T, T_NP, T, T_NP, T, T_NP]):
Error message on import:
TypeError: Parameters to Generic[...] must all be unique
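The generator appears to collect type parameters once per union case. Deduplicating them, while keeping first-seen order, before emitting Generic[...] avoids the TypeError. In the sketch below, T_NP only mirrors the name in the generated code, and unique_params is a hypothetical helper.

```python
import typing

T = typing.TypeVar("T")
T_NP = typing.TypeVar("T_NP")


def unique_params(*params):
    """Drop repeated type variables while preserving first-seen order."""
    return tuple(dict.fromkeys(params))


# What is emitted for GenericUnion<T> today: one (T, T_NP) pair per union case.
emitted = (T, T_NP, T, T_NP, T, T_NP)


class GenericUnion(typing.Generic[unique_params(*emitted)]):
    pass
```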
If the user writes an empty iterable to a binary stream, the underlying StreamSerializer should not write a 0 byte. The 0 byte used to terminate a serialized stream is written elsewhere. Currently, yardl does this:
yardl/tooling/internal/python/static_files/_binary.py
Lines 958 to 959 in 7a0ab26
How to reproduce:
Change the Simple protocol round trip test to write a mixture of empty and non-empty streams. It currently writes values to each stream in the first part of the test, then writes only "empty" iterables in the second part:
yardl/python/tests/test_protocol_roundtrip.py
Lines 584 to 608 in 7a0ab26
# mixed empty and non-empty streams
with c() as w:
    w.write_int_data(range(0))
    w.write_optional_int_data([1, 2, None, 4, 5, None, 7, 8, 9, 10])
    w.write_record_with_optional_vector_data([])
    w.write_fixed_vector(([1, 2, 3] for _ in range(4)))
The test will fail.
Note: Adding this validation to test_simple_streams uncovered another, unrelated bug in NDJsonProtocolReader._read_json_line. Separate issue.
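The intended behavior is that an empty batch writes nothing at all, and only the end-of-stream call writes the 0 count. A 0 count mid-stream would be read back as the end of the stream. Below is a minimal sketch with a hypothetical class and a toy one-byte-count encoding. It is not the generated StreamSerializer.

```python
import io
import typing


class StreamSerializerSketch:
    """Hypothetical stream serializer: empty batches write nothing; only end() writes the 0 terminator."""

    def __init__(self, out: typing.BinaryIO) -> None:
        self._out = out

    def write_batch(self, items: typing.Iterable[int]) -> None:
        batch = list(items)
        if not batch:
            # A count of 0 here would be indistinguishable from the end of the stream.
            return
        # Toy encoding: one count byte, then one byte per item (small counts and values only).
        self._out.write(bytes([len(batch)]) + bytes(batch))

    def end(self) -> None:
        self._out.write(b"\x00")
```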
Implement MATLAB codegen.
Given the following model on commit 42be458:
X: [null, int, float]
MyRec: !record
  fields:
    a: X
We get the following exception when importing the generated code:
Traceback (most recent call last):
File "/workspaces/yardl/python/run_sandbox.py", line 5, in <module>
import sandbox
File "/workspaces/yardl/python/sandbox/__init__.py", line 21, in <module>
from .types import (
File "/workspaces/yardl/python/sandbox/types.py", line 121, in <module>
get_dtype = _mk_get_dtype()
^^^^^^^^^^^^^^^
File "/workspaces/yardl/python/sandbox/types.py", line 117, in _mk_get_dtype
dtype_map.setdefault(MyRec, np.dtype([('a', get_dtype(typing.Optional[X]))], align=True))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/yardl/python/sandbox/_dtypes.py", line 87, in <lambda>
return lambda t: get_dtype_impl(dtype_map, t)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/yardl/python/sandbox/_dtypes.py", line 60, in get_dtype_impl
return _get_union_dtype(get_args(t))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/yardl/python/sandbox/_dtypes.py", line 81, in _get_union_dtype
inner_type = get_dtype_impl(dtype_map, args[0])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/yardl/python/sandbox/_dtypes.py", line 76, in get_dtype_impl
raise RuntimeError(f"Cannot find dtype for {t}")
RuntimeError: Cannot find dtype for <class 'sandbox.types.X'>
Another problem is that the dtype for [null, int, float] should be np.object_, instead of {"has_value": np.bool, "value": np.object_}.
Another problem is that generated union classes that do not have a named type (e.g. Int32OrString) are not recognized by get_dtype(), which raises an error.
Vectors of booleans:
V: !vector
  items: bool
are not handled because the std::vector<bool> specialization does not provide a .data() method. Additionally, binary serialization should write out a bitstream where each value is a bit rather than a byte.
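The bitstream idea amounts to packing eight booleans per byte. Here is a minimal sketch in plain Python with hypothetical helpers. Least-significant-bit-first order is an arbitrary choice here, and a real format would have to specify it.

```python
def pack_bools(values: list[bool]) -> bytes:
    """Pack booleans into bytes, 8 per byte, least-significant bit first."""
    out = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_bools(data: bytes, count: int) -> list[bool]:
    """Inverse of pack_bools; `count` is needed because the last byte may be partly unused."""
    return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(count)]
```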
If the model has any errors, yardl generate --watch exits instead of entering the watch loop. This appears to have been introduced in #94.
https://microsoft.github.io/yardl/reference/binary.html#dates-times-and-datetimes isn't clear on time zone handling. DateTimes is clear enough (although a reference explaining what "since epoch" means would be good), but the docs on Dates and Times should presumably say that these are in UTC. That might be undesirable or confusing, though, so maybe it's better to support a time zone specification.
Also, I'm assuming the types are actually singular, e.g. DateTime. I think a different font should be used for the actual type names (like you use for float etc.).
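Writing the conversions out is one way to pin down the semantics. The sketch below assumes Date is stored as days since 1970-01-01 and Time as nanoseconds since midnight. It also assumes DateTime is stored as nanoseconds since the Unix epoch (1970-01-01T00:00:00 UTC). That is my reading of the binary reference, not a confirmed spec, and the helper names are hypothetical.

```python
import datetime

_EPOCH_DATE = datetime.date(1970, 1, 1)
_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def date_to_days(d: datetime.date) -> int:
    """Days since 1970-01-01; a calendar date carries no time zone."""
    return (d - _EPOCH_DATE).days


def time_to_nanoseconds(t: datetime.time) -> int:
    """Nanoseconds since midnight (Python's time only has microsecond precision)."""
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000_000 + t.microsecond * 1000


def datetime_to_nanoseconds(dt: datetime.datetime) -> int:
    """Nanoseconds since 1970-01-01T00:00:00Z; naive datetimes are rejected so the UTC reference is explicit."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a time zone (e.g. UTC) first")
    delta = dt - _EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1000
```

Note that the Time helper silently ignores any tzinfo, which is exactly the ambiguity raised above.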
There is some renaming of members going on in the generated code, but it is not consistent:
ScannerInformation: !record
  fields:
    tofBinEdges: !array
  computedFields:
    numberOfTOFBins: size(tofBinEdges)-1
leads to a tof_bin_edges member in both C++ and Python, but NumberOfTOFBins() (note the capital N) in C++ versus number_of_tof_bins() in Python.
Personally I'd try to avoid any renaming, but maybe that is difficult when covering multiple languages. We could enforce naming in the yardl model?
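The observed names fit per-language case conventions: snake_case for fields in both languages and for Python methods, and PascalCase for C++ computed-field methods. The sketch below shows such conversions with hypothetical helpers. It is not yardl's implementation.

```python
import re


def to_snake_case(name: str) -> str:
    """camelCase / PascalCase -> snake_case, keeping acronyms such as TOF together."""
    # Split an acronym from a following capitalized word: "TOFBins" -> "TOF_Bins".
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Split a lower-case letter or digit from a following capital: "numberOf" -> "number_Of".
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return name.lower()


def to_pascal_case(name: str) -> str:
    """camelCase -> PascalCase by capitalizing only the first character."""
    return name[:1].upper() + name[1:]
```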
There is another issue.
MyRec<T>: !record
  fields:
    a: T*
does not give the field a default value, whereas this does:
MyRec: !record
  fields:
    a: int*
(An existing issue, not a regression)
Originally posted by @johnstairs in #96 (comment)
Computed fields are currently an embedded expression language within a YAML file. switch expressions (to work with unions and optional types) are not expressed in this language, but rather as YAML nodes:
optionalNamedArrayLength:         # YAML
  !switch optionalNamedArray:     # YAML
    NamedNDArray arr: size(arr)   # YAML-type-expression hybrid
    null: 0                       # YAML-type-expression hybrid
This does not allow switch expressions to be used as part of larger expressions (type conversions, a function call argument, etc).
Instead, we should consider making switch part of the expression language. The example above might then look like:
optionalNamedArrayLength: |
  switch(optionalNamedArray) {
    NamedNDArray arr: size(arr)
    null: 0
  }
On the other hand, this syntax introduces curly braces within a YAML document, where indentation is usually favoured.
CMAKE_CXX_STANDARD (what's the current minimum?)
target_include_directories
find_package(HDF5)? (it'll be overwritten, no?)
if(VCPKG_TARGET_TRIPLET)
  set(HDF5_CXX_LIBRARIES hdf5::hdf5_cpp-shared)
else()
  set(HDF5_CXX_LIBRARIES hdf5::hdf5_cpp)
endif()
At present, I believe the user has to know if the stored data is binary or HDF5, and instantiate the corresponding class. That's efficient but also very inconvenient. It would certainly be nice to be able to write client code that does not depend on the container type. (Edit: I see that there are abstract classes in protocols.h already, so possibly the only thing that's necessary is a factory that determines the container type given a filename.)
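Such a factory could sniff the first bytes of the file. The sketch below uses hypothetical function names. The only signature it relies on is HDF5's standard 8-byte format signature, which sits at offset 0 when the file has no user block. Treating a leading "{" as NDJSON and everything else as binary are assumptions.

```python
# Standard HDF5 format signature (at offset 0 unless the file has a user block).
_HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def detect_container(head: bytes) -> str:
    """Guess 'hdf5', 'ndjson' or 'binary' from the first bytes of a file (heuristic sketch)."""
    if head[:8] == _HDF5_SIGNATURE:
        return "hdf5"
    # Assumption: an NDJSON file starts with a JSON object on its first line.
    if head.lstrip()[:1] == b"{":
        return "ndjson"
    # Assumption: anything else is the compact binary format.
    return "binary"


def detect_container_of_file(path: str) -> str:
    with open(path, "rb") as f:
        return detect_container(f.read(8))
```

A factory would then map the returned string to the matching reader class behind the abstract base class in protocols.h.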
Using yardl commit ae9b826 with the following model:
GenericRecord<T>: !record
  fields:
    v: T
AliasedRecord<T>: GenericRecord<T>
AliasedOpenGeneric<T>: AliasedRecord<T>
AliasedClosedGeneric: AliasedRecord<string>
To reproduce, generate Python for this model, then import the generated Python module.
The module won't import, failing with the following error:
Traceback (most recent call last):
File "/workspaces/yardl/joe/models/bug/python/test.py", line 1, in <module>
import bug
File "/workspaces/yardl/joe/models/bug/python/bug/__init__.py", line 21, in <module>
from .types import (
File "/workspaces/yardl/joe/models/bug/python/bug/types.py", line 56, in <module>
get_dtype = _mk_get_dtype()
^^^^^^^^^^^^^^^
File "/workspaces/yardl/joe/models/bug/python/bug/types.py", line 52, in _mk_get_dtype
dtype_map[AliasedClosedGeneric] = get_dtype(types.GenericAlias(AliasedRecord, (str,)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/yardl/joe/models/bug/python/bug/_dtypes.py", line 107, in <lambda>
return lambda t: get_dtype_impl(dtype_map, t)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/yardl/joe/models/bug/python/bug/_dtypes.py", line 90, in get_dtype_impl
return res(get_args(t))
^^^^^^^^^^^^^^^^
File "/workspaces/yardl/joe/models/bug/python/bug/types.py", line 51, in <lambda>
dtype_map[AliasedOpenGeneric] = lambda type_args: get_dtype(types.GenericAlias(AliasedRecord, (type_args[0],)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
RecursionError: maximum recursion depth exceeded in comparison
For this particular model, reordering the aliases as shown below can eliminate the error, but this is a confusing limitation for the user.
AliasedClosedGeneric: AliasedRecord<string>
AliasedOpenGeneric<T>: AliasedRecord<T>
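Reordering the aliases avoids the error, which suggests the generator should emit type definitions in dependency order whatever their order in the model. The sketch below uses Python's graphlib (3.9+), with dependencies written out by hand. Extracting them from the model is not shown, and definition_order is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def definition_order(deps: dict[str, set[str]]) -> list[str]:
    """Order type definitions so that every type comes after the types it refers to."""
    # TopologicalSorter takes node -> predecessors and raises CycleError on cycles.
    return list(TopologicalSorter(deps).static_order())


model = {
    "GenericRecord": set(),
    "AliasedRecord": {"GenericRecord"},
    "AliasedOpenGeneric": {"AliasedRecord"},
    "AliasedClosedGeneric": {"AliasedRecord"},
}
order = definition_order(model)
```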
This is a very minor bug, but it led to a necessary review of how the generated Python get_dtype function should work.
Given a model with an aliased, generic type:
GenericRecord<T>: !record
  fields:
    v: T
AliasedRecord<T>: GenericRecord<T>
If I call get_dtype on GenericRecord without specifying type arguments, I get a useful error message:
m.get_dtype(m.GenericRecord)
...
RuntimeError: Generic type arguments not provided for <class 'm.types.GenericRecord'>
But if I do the same for the aliased type, I do not get the same, expected error message:
m.get_dtype(m.AliasedRecord)
...
RuntimeError: Cannot find dtype for ~T
The user does not know what ~T is.
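A descriptive error for the aliased case could come from checking for unbound type variables before looking up a dtype. In the sketch below, check_type_args is a hypothetical helper. Modeling the alias as GenericRecord[T] is an assumption for illustration and may not match the generated module.

```python
import typing

T = typing.TypeVar("T")


class GenericRecord(typing.Generic[T]):
    pass


# Hypothetical stand-in for the generated alias of GenericRecord<T>.
AliasedRecord = GenericRecord[T]


def check_type_args(t: object) -> None:
    """Raise a descriptive error if `t`, or any of its type arguments, is an unbound type variable."""
    unbound = [a for a in (t, *typing.get_args(t)) if isinstance(a, typing.TypeVar)]
    if unbound:
        names = ", ".join(v.__name__ for v in unbound)
        raise RuntimeError(f"Generic type arguments not provided for {t} (unbound: {names})")
```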
We should have a special kind of enum for flags that are meant to be bitwise ORed together:
!flags
  values:
    - none
    - red
    - green
    - blue
The first value will always be 0.
As with enums, you can specify the base type and integer values:
!flags
  values:
    none: 0
    red: 1
    green: 2
    blue: 4
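For comparison, Python's standard enum.IntFlag already has these semantics, which suggests a natural target for the Python codegen. The sketch below shows that. The flag_values helper is hypothetical, and it assumes that, for the list form, the values after the leading 0 are successive powers of two.

```python
import enum


class Color(enum.IntFlag):
    """Python analogue of the proposed !flags type; values are powers of two so they can be ORed."""
    NONE = 0
    RED = 1
    GREEN = 2
    BLUE = 4


def flag_values(names: list[str]) -> dict[str, int]:
    """Values for the list form: the first name gets 0, the rest 1, 2, 4, ... (assumed)."""
    return {name: 0 if i == 0 else 1 << (i - 1) for i, name in enumerate(names)}


purple = Color.RED | Color.BLUE
```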
Using yardl commit ab1e2b with the following model:
GenericRecord<T>: !record
  fields:
    v: T
AliasedRecord<T>: GenericRecord<T>
MyRecord: !record
  fields:
    myField: AliasedRecord<int>
To reproduce, generate Python for this model, then import the generated Python module and create an instance of MyRecord with no arguments.
Python will complain that MyRecord.__init__() is missing the keyword argument for my_field:
Python 3.11.3 | packaged by conda-forge | (main, Apr 6 2023, 08:57:19) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import issue_082
>>> r = issue_082.MyRecord()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: MyRecord.__init__() missing 1 required keyword-only argument: 'my_field'
The generated code for MyRecord looks like this:
class MyRecord:
    my_field: AliasedRecord[yardl.Int32]
    def __init__(self, *,
        my_field: AliasedRecord[yardl.Int32],
    ):
        self.my_field = my_field
If I remove the AliasedRecord from the model and use GenericRecord directly, I get the expected class definition for MyRecord, and it works:
class MyRecord:
    my_field: GenericRecord[yardl.Int32]
    def __init__(self, *,
        my_field: typing.Optional[GenericRecord[yardl.Int32]] = None,
    ):
        self.my_field = my_field if my_field is not None else GenericRecord(v=0)
The relevant code is
yardl/tooling/internal/python/types/types.go
Lines 179 to 195 in ae9b826
From @hansenms's email:
It seems useful to have a RelativeTime, i.e. an offset w.r.t. some defined DateTime, such as the scan start. This would be quite useful for de-identifying some data. In some cases, the time of the scan needs to be removed from the data, but it'd be painful to have to adjust all times in the file.
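The de-identification benefit comes from storing each time only as an offset, so just the single reference DateTime has to change. The sketch below is in plain Python. The RelativeTime class is hypothetical and does not describe a proposed yardl API.

```python
import datetime


class RelativeTime:
    """Hypothetical RelativeTime: an offset from a reference DateTime such as the scan start."""

    def __init__(self, offset: datetime.timedelta) -> None:
        self.offset = offset

    def resolve(self, reference: datetime.datetime) -> datetime.datetime:
        """Absolute time, given the reference it is relative to."""
        return reference + self.offset


utc = datetime.timezone.utc
scan_start = datetime.datetime(2023, 5, 1, 9, 30, tzinfo=utc)
events = [RelativeTime(datetime.timedelta(seconds=s)) for s in (0, 12.5, 60)]

# De-identification replaces only the reference; the stored offsets are untouched.
deidentified_start = datetime.datetime(1970, 1, 1, tzinfo=utc)
```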
When a generic type's type parameter is unused, we should raise an error or warning.
MyUnion<T, U>: [T, int]
Here, U is unused.
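A check like this can compare the declared parameters with the identifiers used in the definition's right-hand side. The sketch below is a textual, hypothetical helper that only handles one-line definitions such as the one above. A real check would walk the parsed model instead.

```python
import re


def unused_type_parameters(definition: str) -> set[str]:
    """Return type parameters declared in `Name<...>:` that never appear on the right-hand side."""
    head, _, body = definition.partition(":")
    match = re.search(r"<([^>]*)>", head)
    if match is None:
        return set()  # not a generic definition
    params = {p.strip() for p in match.group(1).split(",") if p.strip()}
    # All identifiers used on the right-hand side (type names and parameters alike).
    used = set(re.findall(r"[A-Za-z_]\w*", body))
    return params - used
```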
On Windows, from PowerShell:
mamba env create --file environment.yml
Looking for: ['bash-completion=2.11', 'ccache=4.5.1', 'clang-format=14.0.4', 'cmake=3.21.3', 'fmt=8.1.1', 'gcc_linux-64]
Could not solve for environment specs
Encountered problems while solving:
- nothing provides requested bash-completion 2.11**
- nothing provides requested gcc_linux-64 11.2.0**
- nothing provides requested gdb 11.2**
- nothing provides requested gxx_linux-64 11.2.0**
- nothing provides requested valgrind 3.18.1**
The environment can't be solved, aborting the operation
I guess we should remove valgrind and gdb? Even bash-completion and ccache. Maybe even clang-format.
Of course, the justfile is bash/Linux-specific as well, and I guess Windows support is for later.
PS: Is pinning the compiler version etc. best practice? Maybe some of these could be >=?
Soon we'll need to actually version yardl, which may include:
cmd/yardl.go
yardl/tooling/cmd/yardl/main.go
Lines 10 to 13 in 76ea21d
The ability to write a protocol out as JSON could be useful for debugging, even if it is not well-suited to large streams of scientific data.