
kaze's Introduction

kaze

An HDL embedded in Rust.


kaze provides an API to describe Modules composed of Signals, which can then be used to generate Rust simulator code or Verilog modules.

kaze's API is designed to be as minimal as possible while still being expressive. It's also designed to prevent the user from describing buggy or incorrect hardware wherever possible. This enables a user to hack on designs fearlessly, while the API and generators ensure that these designs are sound.

Usage

[dependencies]
kaze = "0.1"

Example

use kaze::*;

fn main() -> std::io::Result<()> {
    // Create a context, which will contain our module(s)
    let c = Context::new();

    // Create a module
    let inverter = c.module("Inverter");
    let i = inverter.input("i", 1); // 1-bit input
    inverter.output("o", !i); // Output inverted input

    // Generate Rust simulator code
    sim::generate(inverter, sim::GenerationOptions::default(), std::io::stdout())?;

    // Generate Verilog code
    verilog::generate(inverter, std::io::stdout())?;

    Ok(())
}

Releases

See changelog for release information.

License

Licensed under either of

  • Apache License, Version 2.0
  • MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.


kaze's Issues

Modules with different clock domains?

It seems this crate has the restriction that all modules always operate in the same clock domain.
Are there any plans to remove this restriction? :)

Btw, do you have a recommendation for which crate or HDL to use for designing an asynchronous manycore CPU, where each core is self-clocked?

Don't derive `Default` trait to help generate module initialization code

I believe this was OK before I implemented Mem, since new and default were always the same, and it saved some code in the sim generator by not having to default-initialize everything. Now that Mem instances add extra initialization code in new, though, these two initialization functions have diverged. It's possible to call default instead of new (the generated struct does indeed implement it, and a user will rightfully expect it to construct a valid instance) and later get an indexing error during sim, due to indexing into a zero-length buffer (created by the rustc-generated default fn) instead of one of the expected length.

Yes, I myself was bitten by this :)
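A hypothetical sketch of the divergence (the struct shape below is illustrative only, not kaze's actual generated code):

// Illustrative only: a stand-in for a generated sim struct that owns Mem storage.
#[derive(Default)]
pub struct WithMem {
    mem: Vec<u32>, // backing buffer for a Mem
}

impl WithMem {
    pub fn new() -> WithMem {
        WithMem {
            mem: vec![0; 256], // new() allocates the expected buffer length...
        }
    }
}

fn main() {
    let good = WithMem::new();
    let bad = WithMem::default(); // ...but the derived Default leaves it zero-length,
    assert_eq!(good.mem.len(), 256);
    assert_eq!(bad.mem.len(), 0); // so indexing into it later panics during sim.
}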

Explore async/await for co-sim threads

I have no idea if this could work or how it would look, but it could be a very powerful idea.

Quite often, a kaze module is developed against one or more simulated peripherals, written in software. For example, a CPU design is tested against a simulated memory, or even an entire system bus with multiple peripheral simulations.

With just one or two peripheral simulations, managing the simulations and how they interact with the kaze-generated design simulation isn't terribly difficult, but this complexity grows very quickly as the number of peripherals increases. The same is true as the ways in which the peripherals interact become more complex.

If each simulated peripheral were to be considered in isolation, this is more or less trivial in most cases. A simulation could be implemented as a coroutine that may interact with the kaze generated sim, including stepping cycles and peeking/poking ports.

In order to sync the various simulations, it could make sense to code each one in a separate coroutine, with each cycle step represented by yielding. Some underlying system could manage the various continuations and perhaps propagating signals in between.

These ideas might be able to play nicely with rust’s async/await features somehow. I think this should be explored, as it would make writing and maintaining reusable software simulations much, much nicer, which in turn makes kaze even more effective.

Note that it may make more sense to explore this idea as a separate concept/library from kaze entirely, that kaze could itself integrate well with; I imagine these kinds of software simulations have broader applicability than just supporting kaze design development.

Write verilog::generate() output to a file

Right now, the only way (which I found, I might be wrong) to generate verilog code is to print it to the console and copy it from there. Is there a way, already existing in the crate, to write the output to a file, similarly to how sim::generate() works? Or could this be implemented in the crate?

Anyways, great work! I've begun to use your crate recently (today xD), and it's a tremendous productivity booster!
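Since the README example passes std::io::stdout() as the output sink, the writer parameter presumably accepts any std::io::Write; under that assumption, writing to a file is just a matter of passing a File (a sketch, not verified against the crate):

use kaze::*;
use std::fs::File;

fn main() -> std::io::Result<()> {
    let c = Context::new();
    let inverter = c.module("Inverter");
    let i = inverter.input("i", 1);
    inverter.output("o", !i);

    // Assuming verilog::generate is generic over io::Write, a File works like stdout.
    verilog::generate(inverter, File::create("inverter.v")?)?;

    Ok(())
}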

Tracing improvements

Some sharp edges I've noticed while using it (in addition to some of the notes in #4):

  • I miss inner module inputs/outputs. These are essentially ignored when flattening, but perhaps we should include them when tracing is enabled.
  • It would be nice to make the Trace instance optional, even when tracing is enabled, perhaps by taking an Option<T: Trace> in the ctor. This may have issues with generics, though, since you'd still have to provide a Trace type for the None case.
  • Owning the Trace instance may not be the best thing, since it means the module can no longer implement Clone / Copy trivially (without the underlying Trace impl also implementing these, which for Copy probably isn't feasible and doesn't seem very clean in the first place).
  • Performance seems surprisingly bad. I haven't measured or pinpointed where things are going awry, but it's surprising, so it should be investigated.
  • Due to the current lack of name uniqueness checks in the compiler (which should be fixed!), tracing sometimes ends up with duplicated signals if instances/regs are reused, which seems surprisingly easy to do by accident when writing generators. This should lead to invalid verilog codegen, but the rust sim compilation (due to graph flattening) still succeeds, even though the trace isn't quite right.

Cannot reproduce: Validation error if a module has no inputs/outputs

I found this awesome project and decided to fix some TODOs (PRs incoming). But I could not see how to reproduce the todo which is currently directly on the Module struct:

// TODO: Validation error if a module has no inputs/outputs
pub struct Module<'a> { ... }

Do you know how to create this bug, or should the TODO be removed?
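For reference, the most direct attempt to hit that path (a sketch based on the README's API; whether this should actually be a validation error is exactly what the TODO asks):

use kaze::*;

fn main() -> std::io::Result<()> {
    let c = Context::new();
    // A module with no inputs or outputs at all.
    let empty = c.module("Empty");
    verilog::generate(empty, std::io::stdout())?;
    Ok(())
}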

match/switch construct

Needs some design work, but is probably relatively straightforward. The biggest benefit here (besides more terse/clear code) is additional error checking possibilities for the compiler.

Finalize `if` syntax sugar

In particular syntax for if-like expressions.

I went through a few rounds with this, from various kinds of macros, to what I have now. The primary benefits to this kind of approach are:

  • We don't overload normal rust syntax for conditionals. I think this is actually crucial, since we want to be sure we can differentiate between RTL and rust/meta code clearly and easily.
  • if_ expressions work just like other expressions and are predictable in their behavior. You can also chain them with rust code to add as many else_if expressions as desired; it plays well with metaprogramming.

Some downsides:

  • The naming convention isn't very pretty, but I want it to be obvious and light, and I think this is achieved currently, but there may be a better solution still.
  • rustfmt doesn't handle the extra scopes too well and likes to screw up formatting. Ideally there'd be a solution that works better with this, especially one that wouldn't require custom rustfmt configuration/annotation.
  • It can certainly be annoying to pack up all the alternatives in tuples for cases where you want to handle multiple Signals at a time, especially as this number grows (not to mention explicitly having to implement these cases in the lib itself, though I'm sure there's a macro-based solution for this that would make it much more manageable).

It's certainly not the prettiest but I like it more than other alternatives so far - though I want to continue considering other possibilities.

Document typical use cases

Beyond basic examples, it would be useful to have docs that describe intended usage (eg. generating rust code in a build.rs script and using rust tests to unit-test the resulting sim code).
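A sketch of that pattern, assuming a build.rs like the following (kaze listed under [build-dependencies]; the Inverter module and the modules.rs file name are just placeholders):

// build.rs
use kaze::*;
use std::env;
use std::fs::File;
use std::path::Path;

fn main() -> std::io::Result<()> {
    let out_dir = env::var("OUT_DIR").unwrap();
    let dest = Path::new(&out_dir).join("modules.rs");

    let c = Context::new();
    let inverter = c.module("Inverter");
    let i = inverter.input("i", 1);
    inverter.output("o", !i);

    // Write the generated simulator into OUT_DIR so tests can include!() it.
    sim::generate(inverter, sim::GenerationOptions::default(), File::create(dest)?)?;

    Ok(())
}

A test module could then pull the generated code in with include!(concat!(env!("OUT_DIR"), "/modules.rs")); and drive it from ordinary #[test] fns.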

Consider growing stacks in compiler instead of iterative lowering

I was recently made aware of this, unfortunately after converting lowering to iterative rather than recursive code!

Ofc both approaches have tradeoffs, but the code is probably overall simpler if we conditionally grow stacks, which is very important - so we may want to consider going back to the older pattern instead.

Consider writing book/tutorial

This is significant effort for arguably not much payoff at this point, but I'd still like to track it and have this ticket as a place where I can dump ideas etc.

Complete add/sub overloads

Currently, I've implemented add / sub for signals emulating std's primitive types' wrapping_add / wrapping_sub functionality, which I think is the most ergonomic in most cases.

However, there are cases where other behavior is desired - particularly if we want to extract the carry bit on overflow. std provides overflowing_add / overflowing_sub for these purposes, so I think we should:

  • Expose wrapping_add / wrapping_sub explicitly, and document that add / sub ops alias these
  • Expose overflowing_add / overflowing_sub which also returns the overflow/carry bit in a tuple
  • Consider whether we should come up with a third option which includes the overflow/carry bit concat'd with the lower bits, as I foresee this being useful often

Note that all of this can be emulated today with the current API - a user can concat low bits to lhs and rhs before add / sub, and they're free to pull apart the resulting Signal however they want.
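A sketch of that emulation; the lit, concat, bit and bits calls below are assumptions about the current API (names and argument order may differ):

use kaze::*;

fn overflowing_adder<'a>(c: &'a Context<'a>) -> &Module<'a> {
    let m = c.module("OverflowingAdder");
    let lhs = m.input("lhs", 8);
    let rhs = m.input("rhs", 8);
    let zero = m.lit(0u32, 1);

    // Widen both operands by one bit so the add can't wrap
    // (assuming concat places the left operand in the high bits)...
    let sum = zero.concat(lhs) + zero.concat(rhs);
    // ...then pull the carry and the low 8 bits back apart.
    m.output("carry", sum.bit(8));
    m.output("sum", sum.bits(7, 0));

    m
}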

I still want to handle mul separately; I believe its current behavior is correct/least surprising (but I'm open to feedback on this).

Consider higher-level abstractions for signed/unsigned signals

Currently all Signals are assumed to be unsigned buckets of bits, and signed-ness only really exists for some operations where it will actually produce a different result (eg mul vs mul_signed). While I believe this is good for the graph API in general (since the compilers/code generators don't ever have to care about it), it leaves some possible useful explicitness in user code and error checking on the table.

For example, one idea I've had in my head for a long time now is to augment the API with higher-level Int/UInt (or similar) constructs that would be normal rust structs containing a Signal, but would represent signed-ness in rust's type system. All operators and other API entry points that exist on Signal today would then be exposed on the higher-level constructs as well, so a user would typically use those where they use Signal today. Operators like Mul would be implemented by using mul for UInt and mul_signed for Int transparently, and rust's type system would catch errors if, for example, a user tried to multiply a signed signal (Int) with an unsigned one (UInt). Explicit cast operators can also be added for these types to explicitly change between them (possibly in the form of From impls, so a user could use .into() just like they would with other rust types).
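A rough mockup of that idea (none of this exists today; the `*` operator and mul_signed usage below are assumptions about Signal's current API):

use kaze::*;
use std::ops::Mul;

pub struct UInt<'a>(pub &'a Signal<'a>);
pub struct Int<'a>(pub &'a Signal<'a>);

impl<'a> Mul for UInt<'a> {
    type Output = UInt<'a>;
    fn mul(self, rhs: UInt<'a>) -> UInt<'a> {
        UInt(self.0 * rhs.0) // unsigned multiply under the hood
    }
}

impl<'a> Mul for Int<'a> {
    type Output = Int<'a>;
    fn mul(self, rhs: Int<'a>) -> Int<'a> {
        Int(self.0.mul_signed(rhs.0)) // signed multiply under the hood
    }
}

// Multiplying an Int by a UInt is now a rust type error caught in the editor,
// and .into()-style casts could make the conversion explicit where it's intended.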

This is potentially a lot of work and testing overhead so I haven't jumped in, but I wanted to document somewhere as I think it's ultimately useful.

This could perhaps be an opt-in feature for the lib as well, in case a user doesn't care and wants to use Signals directly (which I think should always be allowed anyways).

Some unresolved things:

  • How should Mem and Module input/output APIs change? Ideally a user wants to create eg. a Mem with signed elements, and that should be specified somehow. Perhaps UIntMem or IntMem or similar? The same goes for Module inputs/outputs, where a user wants a given input to only accept signed/unsigned signals.
  • How should module inputs/outputs change? Should they use signed types?
  • Should signed signals propagate signed values to tracing? Can this be used in relevant formats?

Possible logic errors: Mutable key types

Running cargo clippy (the linter) on the project suggests that some key types in hashmaps which implement Ord and Eq have interior-mutability (Cell/RefCell) members. This means it's theoretically possible to mutate a key without updating the hash, leading to a logic error. This is worth checking.
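The concern, illustrated with plain std types rather than kaze's internals (this is what clippy's mutable_key_type lint is about):

use std::cell::Cell;
use std::collections::BTreeMap;

// A key type whose Ord depends on an interior-mutable field.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct Key {
    id: u32,
    dirty: Cell<bool>,
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(Key { id: 1, dirty: Cell::new(false) }, "a");
    map.insert(Key { id: 2, dirty: Cell::new(false) }, "b");

    // The key can be mutated in place after insertion, without the map re-sorting
    // or re-hashing it - which is exactly the potential logic error clippy flags.
    for key in map.keys() {
        key.dirty.set(true);
    }
}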

Come up with tracing solution for generated sim modules

In theory this is as simple as capturing all signals and dumping a vcd (or some other format), but is potentially more complex for a few reasons:

  • Performance for generated simulators is paramount for effectiveness in testing and verification.
  • Tracing something non-trivial can easily produce hundreds of gigabytes' worth of data (or more!), which in addition to taking up loads of disk space to store the results, can make a simulation prohibitively slow due to IO overhead.
  • As it is today, the generated simulators have zero dependencies besides basic runtime functionality (eg. allocating a std::Vec for Mem storage). I'd very much like to keep it that way (this makes bootstrapping test projects etc trivial), but we don't necessarily want to also generate a bunch of extra boilerplate to output a certain format when relevant libs are readily available.
  • I can think of at least two formats that I would like to dump traces to off of the top of my head, and ideally, users wouldn't be bound by whatever I've chosen to implement.
  • Module hierarchy should be available to the trace solution, even though the sim graph has been flattened. This is supported by vcd and without it, traces will get unwieldy.

So, this adds some important constraints on such a solution:

  • Tracing should be optional. Whether or not this means optionally generating tracing code in the first place and/or generating tracing code that can be conditionally enabled/disabled at runtime I'm not entirely sure about yet; that's a decision that has to be considered as well.
  • Enabling tracing should not create additional library dependencies for the generated code, unless perhaps a user asks for it - for example, if a module accepts a handle to a generic trace object, we can implement that with several backends, or not, if a user wants to add in their own (which I can imagine being very useful). Note that if such a generic approach is implemented, performance should still be paramount, which means we should prefer generics to dynamic dispatch (i.e. a generated module sim is generic over a generic trace type; see the sketch after this list).
  • There should be at least one default trace implementation provided by kaze that's easy to use. vcd is the obvious choice here.
  • It should be easy(/automatic?) to get trace output for a failing rust unit test that's based on a generated simulator. This is a very common use case for verification (see these sim tests and even kaze's own sim test for verification!) and ideally a user would be able to pull up a trace dumped from a failing test in order to better understand how it failed.
  • Module hierarchy should be available to the trace API.
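To illustrate the generics-over-dynamic-dispatch point, a hypothetical shape for the generated code (the Trace trait and InverterSim struct here are made up for illustration):

// Hypothetical only: what "generic over a trace type" could look like in generated code.
pub trait Trace {
    fn value(&mut self, name: &str, bit_width: u32, value: u64);
}

// A zero-cost default for when tracing is disabled.
pub struct NullTrace;

impl Trace for NullTrace {
    fn value(&mut self, _name: &str, _bit_width: u32, _value: u64) {}
}

pub struct InverterSim<T: Trace> {
    pub i: bool,
    pub o: bool,
    trace: T,
}

impl<T: Trace> InverterSim<T> {
    pub fn prop(&mut self) {
        self.o = !self.i;
        // Monomorphization means these calls compile away entirely for NullTrace.
        self.trace.value("o", 1, self.o as u64);
    }
}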

Some loose ends/other ideas that come to mind:

  • It might make sense to introduce some kind of "compilation profile" that can allow generation to optionally insert/omit extra signals, i.e. debug outputs for a certain module, that otherwise wouldn't be used in a design. Note that this can be done manually today at a higher level when generating modules with simple conditionals, but perhaps language support is something that could make sense.
  • Note also that basically all of this can be done manually as it is today, either by manually inspecting the module (optionally via some additional debug outputs added by the user to the design to expose internal state/signals) or by using a traditional RTL sim/compiler on verilog output from kaze. While I've already done this a couple times when working on the xenowing console, I don't believe this is a satisfactory solution because it requires an unreasonable amount of extra effort to bootstrap, and there ends up being a lot of duplication of information about which signals are available, which is already known by the kaze compiler.
  • Is signed-ness something that manifests in traces, and should the compiler actually be made aware of this for this purpose?

LLHD as Kaze backend

Recently, an LLHD project was announced.

It's interesting to me whether LLHD can be used as a backend to translate Rust into a netlist to be executed on an FPGA, or at least serve as an intermediate backend for kaze.

Allow signed values for `Constant`

Now that we've started adding signed ops (eg. mul_signed) it's natural to allow specifying signed Constant literals in addition to unsigned bit patterns. Note that signed literals would still need to be range-checked.

This would still represent these Signals in the same unsigned way internally (there's currently no concept of signed-ness in the graph API today; see #7 for potential future improvements in this area), but it would be much more convenient when specifying signed literals. Further, in the absence of additional type constraints, rust will infer numeric constants to be of signed types (eg. i32), so today calling eg. lit(1, 1) on a Module will actually raise an error even though 1 would have fit as an unsigned type; it needs to be annotated, eg. 1u32, to specify an unsigned literal explicitly. This case would be naturally resolved if we allowed signed types.
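Concretely, the workaround needed today looks like this (a sketch, assuming lit's arguments as written above and that it returns a Signal reference):

use kaze::*;

fn example<'a>(m: &'a Module<'a>) -> &Signal<'a> {
    // m.lit(1, 1)   // error: rust infers `1` as i32, which lit rejects today
    m.lit(1u32, 1)   // ok: the literal is explicitly unsigned
}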

Bit selection of vector signal is missing in generated verilog code

Hi @yupferris
Thanks for developing the kaze project. It is so interesting to me!
Now I'm trying to write a tiny 4-bit CPU with kaze for practice. The simulation with the generated Rust code works very well, but the generated Verilog code doesn't. I think there is a bug in how the Bits operation is handled.

The following Rust code is a minimal reproduction of this failure.

use std::io::stdout;

use kaze::*;

fn main() {
    let c = Context::new();
    let m = c.module("bits");
    let input_vec = m.input("input_vec", 3);
    let input_scalar = m.input("input_scalar", 1);
    let xor_vec = input_vec.bit(0) ^ input_vec.bit(1) ^ input_vec.bit(2);
    m.output("xor_vec", xor_vec);
    m.output("bits_scalar", input_scalar.bit(0));
    m.output("pass_through_scalar", input_scalar);

    verilog::generate(m, stdout());
}

And the output of the above code with the latest kaze is the following.

module bits(
    input wire reset_n,
    input wire clk,
    input wire input_scalar,
    input wire [2:0] input_vec,
    output wire bits_scalar,
    output wire pass_through_scalar,
    output wire xor_vec
    );
    wire __temp_0;
    wire __temp_1;
    assign bits_scalar = input_scalar;
    assign pass_through_scalar = input_scalar;
    assign __temp_0 = input_vec ^ input_vec;
    assign __temp_1 = __temp_0 ^ input_vec;
    assign xor_vec = __temp_1;
endmodule

It seems that the bit selection operations on input_vec are missing.
I think this failure occurs because the Bits result's bit width, rather than the source signal's bit width, is used to detect a scalar signal.
So I modified compiler.rs as follows.

diff --git a/kaze/src/verilog/compiler.rs b/kaze/src/verilog/compiler.rs
index 9797948..6fa912f 100644
--- a/kaze/src/verilog/compiler.rs
+++ b/kaze/src/verilog/compiler.rs
@@ -296,12 +296,14 @@ impl<'graph> Compiler<'graph> {
                         graph::SignalData::Bits {
                             range_high,
                             range_low,
+                            source: source_signal,
                             ..
                         } => {
+                            let source_bit_width = source_signal.bit_width();
                             let bit_width = signal.bit_width();
                             let source = results.pop().unwrap();
                             // Verilog doesn't allow indexing scalars
-                            Some(if bit_width == 1 {
+                            Some(if source_bit_width == 1 {
                                 source
                             } else {
                                 a.gen_temp(

And with this change I get the following result.

module bits(
    input wire reset_n,
    input wire clk,

    input wire input_scalar,
    input wire [2:0] input_vec,
    output wire bits_scalar,
    output wire pass_through_scalar,
    output wire xor_vec
    );

    wire __temp_0;
    wire __temp_1;
    wire __temp_2;
    wire __temp_3;
    wire __temp_4;

    assign bits_scalar = input_scalar;
    assign pass_through_scalar = input_scalar;
    assign __temp_0 = input_vec[2];
    assign __temp_1 = input_vec[1];
    assign __temp_2 = input_vec[0];
    assign __temp_3 = __temp_2 ^ __temp_1;
    assign __temp_4 = __temp_3 ^ __temp_0;
    assign xor_vec = __temp_4;

endmodule

I think input_vec is handled correctly now. What do you think?
If these modifications look good, can I send a PR to this project?

FYI
These modifications are pushed to my fork.
https://github.com/ar90n/kaze/tree/fix-scalar-signal-detection

test trace_test_module_2 failed: panicked at 'assertion failed: `(left == right)`

Thanks for making this crate; it seems very useful for getting into hardware design.

I just cloned this repo and ran the tests; one test failed:

test tests::trace_test_module_2 ... FAILED

failures:

---- tests::trace_test_module_2 stdout ----
thread 'tests::trace_test_module_2' panicked at 'assertion failed: `(left == right)`
  left: `module trace_test_module_2:
    children:
        module inner1:
            children:
            signals:
                i1: 32 bit(s) (U32)
                    0: U32(0)
                    0: U32(4294967295)
                    1: U32(4294967295)
                    2: U32(4294967295)
                i2: 32 bit(s) (U32)
                    0: U32(0)
                    0: U32(4294901760)
                    1: U32(4294901760)
                    2: U32(4294901760)
                o: 32 bit(s) (U32)
                    0: U32(0)
                    0: U32(0)
                    1: U32(4294901760)
                    2: U32(4294901760)
                r: 32 bit(s) (U32)
                    0: U32(0)
                    0: U32(0)
                    1: U32(4294901760)
                    2: U32(4294901760)
        module inner2:
            children:
            signals:
                i1: 32 bit(s) (U32)
                    0: U32(0)
                    0: U32(16711680)
                    1: U32(16711680)
                    2: U32(16711680)
                i2: 32 bit(s) (U32)
                    0: U32(0)
                    0: U32(983040)
                    1: U32(983040)
                    2: U32(983040)
                o: 32 bit(s) (U32)
                    0: U32(0)
                    0: U32(0)
                    1: U32(983040)
                    2: U32(983040)
                r: 32 bit(s) (U32)
                    0: U32(0)
                    0: U32(0)
                    1: U32(983040)
                    2: U32(983040)
        module inner3:
            children:
            signals:
                i1: 32 bit(s) (U32)
                    0: U32(0)
                    0: U32(0)
                    1: U32(4294901760)
                    2: U32(4294901760)
                i2: 32 bit(s) (U32)
                    0: U32(0)
                    0: U32(0)
                    1: U32(983040)
                    2: U32(983040)
                o: 32 bit(s) (U32)
                    0: U32(0)
                    0: U32(0)
                    1: U32(0)
                    2: U32(983040)
                r: 32 bit(s) (U32)
                    0: U32(0)
                    0: U32(0)
                    1: U32(0)
                    2: U32(983040)
    signals:
        i1: 32 bit(s) (U32)
            0: U32(0)
            0: U32(4294967295)
            1: U32(4294967295)
            2: U32(4294967295)
        i2: 32 bit(s) (U32)
            0: U32(0)
            0: U32(4294901760)
            1: U32(4294901760)
            2: U32(4294901760)
        i3: 32 bit(s) (U32)
            0: U32(0)
            0: U32(16711680)
            1: U32(16711680)
            2: U32(16711680)
        i4: 32 bit(s) (U32)
            0: U32(0)
            0: U32(983040)
            1: U32(983040)
            2: U32(983040)
        o: 32 bit(s) (U32)
            0: U32(0)
            0: U32(0)
            1: U32(0)
            2: U32(983040)
`,
 right: `module trace_test_module_2:
    children:
        module inner1:
            children:
            signals:
                r: 32 bit(s) (U32)
                    0: U32(0)
                    0: U32(0)
                    1: U32(4294901760)
                    2: U32(4294901760)
        module inner2:
            children:
            signals:
                r: 32 bit(s) (U32)
                    0: U32(0)
                    0: U32(0)
                    1: U32(983040)
                    2: U32(983040)
        module inner3:
            children:
            signals:
                r: 32 bit(s) (U32)
                    0: U32(0)
                    0: U32(0)
                    1: U32(0)
                    2: U32(983040)
    signals:
        i1: 32 bit(s) (U32)
            0: U32(0)
            0: U32(4294967295)
            1: U32(4294967295)
            2: U32(4294967295)
        i2: 32 bit(s) (U32)
            0: U32(0)
            0: U32(4294901760)
            1: U32(4294901760)
            2: U32(4294901760)
        i3: 32 bit(s) (U32)
            0: U32(0)
            0: U32(16711680)
            1: U32(16711680)
            2: U32(16711680)
        i4: 32 bit(s) (U32)
            0: U32(0)
            0: U32(983040)
            1: U32(983040)
            2: U32(983040)
        o: 32 bit(s) (U32)
            0: U32(0)
            0: U32(0)
            1: U32(0)
            2: U32(983040)
`', sim-tests\src\lib.rs:2378:9
stack backtrace:
   0: std::panicking::begin_panic_handler
             at /rustc/ec2f40c6b04f0e9850dd1f454e8639d319f4ed9b/library\std\src\panicking.rs:577
   1: core::panicking::panic_fmt
             at /rustc/ec2f40c6b04f0e9850dd1f454e8639d319f4ed9b/library\core\src\panicking.rs:67
   2: core::fmt::Arguments::new_v1
             at /rustc/ec2f40c6b04f0e9850dd1f454e8639d319f4ed9b/library\core\src\fmt\mod.rs:416
   3: core::panicking::assert_failed_inner
             at /rustc/ec2f40c6b04f0e9850dd1f454e8639d319f4ed9b/library\core\src\panicking.rs:260
   4: core::panicking::assert_failed<sim_tests::tests::Capture,sim_tests::tests::Capture>
             at /rustc/ec2f40c6b04f0e9850dd1f454e8639d319f4ed9b\library\core\src\panicking.rs:214
   5: sim_tests::tests::trace_test_module_2
             at .\src\lib.rs:2378
   6: sim_tests::tests::trace_test_module_2::closure$0
             at .\src\lib.rs:2346
   7: core::ops::function::FnOnce::call_once<sim_tests::tests::trace_test_module_2::closure_env$0,tuple$<> >
             at /rustc/ec2f40c6b04f0e9850dd1f454e8639d319f4ed9b\library\core\src\ops\function.rs:250
   8: core::ops::function::FnOnce::call_once
             at /rustc/ec2f40c6b04f0e9850dd1f454e8639d319f4ed9b/library\core\src\ops\function.rs:250
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


failures:
    tests::trace_test_module_2

test result: FAILED. 44 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.03s

error: test failed, to rerun pass `--lib`

The failing assertion is the assert_eq! at sim-tests\src\lib.rs:2378.

Diff between lhs and rhs of the assert:
https://www.diffchecker.com/Oh4K0ECz/

Non-Recursive Verilog Generator

We used Kaze as part of our intern's hardware fuzzing project to generate lock-like structures--state machines that require a sequence of clocked inputs to reach a goal state. If you take a look at how the linked generator works, it builds mux chains to construct an FSM, specifically https://github.com/googleinterns/hw-fuzzing/blob/master/hw/lock/hdl_generator/locksmith/src/main.rs#L78. When lowering to Verilog, the approach Kaze currently takes is to recursively generate expressions. Since this is a deeply nested expression, it ends up running out of stack if the depth is too large.

Now, this isn't really an issue for the hw-fuzzing project, as the shallow locks are more than sufficient to prove the approach, but it did get me thinking about possible solutions. The recursive generation is very easy to understand, so maybe it makes more sense to take a page from nmigen and expose some sort of FSM construct: https://github.com/nmigen/nmigen/blob/master/examples/basic/fsm.py. Lowering can then be iterative over the conditions.

Module-level iterative lowering also seems like it could be an approach, but that seems like a lot of hassle. Either way, if you have opinions I'm happy to spend some cycles implementing.

Thanks for the Kaze project!

Consider redesigning how `Module` (hierarchies) work

Sketch/braindump:

Currently, a module hierarchy in kaze might look like this (taken/adapted from cases in sim tests):

fn test<'a>(c: &'a Context<'a>) -> &Module<'a> {
    test_inner(c);

    let m = c.module("Test");
    let i1 = m.instance("inner1", "TestInner");
    i1.drive_input("i1", m.input("i1", 32));
    i1.drive_input("i2", m.input("i2", 32));
    let i2 = m.instance("inner2", "TestInner");
    i2.drive_input("i1", m.input("i3", 32));
    i2.drive_input("i2", m.input("i4", 32));
    let i3 = m.instance("inner3", "TestInner");
    i3.drive_input("i1", i1.output("o"));
    i3.drive_input("i2", i2.output("o"));
    m.output("o", i3.output("o"));

    m
}

fn test_inner<'a>(c: &'a Context<'a>) -> &Module<'a> {
    let m = c.module("TestInner");

    let i1 = m.input("i1", 32);
    let i2 = m.input("i2", 32);
    m.output("o", i1 & i2);

    m
}

This fn creates a simple inner module ("TestInner") instantiated 3 times inside the top-level module ("Test").

While this is easy enough to understand, the code itself has a few problems:

  • It's quite verbose, especially horizontally. Conceptually, we're not doing anything very complicated, yet it still takes quite a few characters to express. Not only does that make it annoying to type/put together, it also makes the code quite dense and a bit hard to follow as things get more complicated.
  • Input/output names are specified as strings in several different places. While kaze validates module hierarchies internally before generating code (which ensures that string names are checked/validated at some point and keeps things safe), it would be nice for as many of these errors as possible to be caught at rust compile time and displayed in the editor (eg. referring to a port that doesn't exist).
  • It can be difficult to get an overview of the top-level ports of a module. This is especially important when browsing unfamiliar RTL code (which includes code you wrote but haven't looked at in a while!)

Additionally, there are some more general loose ends with the current pattern for creating/instantiating modules:

  • It's kind of nice to constrain module generation to a fn, but what does that really get us?
  • Why do we return the Module at the end? This makes it easy to generate code for the top-level module, but the more common case is actually to call a module generation function, ignore the return value, and then instantiate the module afterwards (by string name, not the reference we just got back!), as is done for TestInner above. Feels a bit janky to say the least.
  • Currently, verilog gen requires the user to explicitly generate code for each Module that's been instantiated in their design, but other than providing a way to iterate over all Modules in a Context (which is a hack) there's no useful way to query which modules are instantiated (so we don't necessarily know which modules are/aren't important and the only conservative thing is to always generate all modules).
  • Specialization for modules with different parameters has to be done by hand. This leads to many Modules with hand-generated mangled names. If we're already mangling names, why not try to make this automatic somehow and save some typing/decision making?

Generally, module generation/instantiation feels a bit half-baked, and I'd really like to improve things. I feel it's quite important to have a good pattern for this - we want to encourage the use of several decoupled modules in designs, and if it's too hairy to do this meaningfully in a lot of cases, then users will tend to create larger modules with much more logic, which are harder to build and maintain (I find myself doing this more often than I would like).

One thought I've had after looking a bit at nMigen is for a Module to not represent a verilog module 1:1, but instead have it represent a specific instance (which is currently represented by Instance in kaze). Then the convention might be to wrap each Module (instance!) in a wrapper struct that exposes the inputs/outputs as Signals (or relevant wrappers) as fields. This way, adding a new module into the Context and instantiating it (which are very commonly done one right after another) are merged into one action.

For codegen, we can make verilog and rust sim gen more symmetric by always specifying a top-level module and having kaze always output the whole hierarchy (with mangled names for inner modules in the verilog case, and still flattening the graph for rust sim). This then greatly simplifies codegen for the verilog case, since we only write code to output top-level modules.

Mockup:

struct Test<'a> {
    pub i1: &'a Input<'a>,
    pub i2: &'a Input<'a>,
    pub i3: &'a Input<'a>,
    pub i4: &'a Input<'a>,

    pub o: &'a Output<'a>,
}

impl<'a> Test<'a> {
    // TODO: Optionally make this a submodule
    // TODO: How do we get the inner module again in order to generate code?
    pub fn new<S: Into<String>>(c: &'a Context<'a>, instance_name: S) -> Test<'a> {
        let m = c.module(instance_name, "Test");

        let i1 = m.input("i1", 32);
        let i2 = m.input("i2", 32);
        let i3 = m.input("i3", 32);
        let i4 = m.input("i4", 32);

        let inner1 = TestInner::new(c, "inner1");
        inner1.i1.drive(i1);
        inner1.i2.drive(i2);
        let inner2 = TestInner::new(c, "inner2");
        inner2.i1.drive(i3);
        inner2.i2.drive(i4);
        let inner3 = TestInner::new(c, "inner3");
        inner3.i1.drive(inner1.o);
        inner3.i2.drive(inner2.o);
        let o = m.output("o", inner3.o);

        Test {
            i1,
            i2,
            i3,
            i4,

            o,
        }
    }
}

struct TestInner<'a> {
    pub i1: &'a Input<'a>,
    pub i2: &'a Input<'a>,
    pub o: &'a Output<'a>,
}

impl<'a> TestInner<'a> {
    // TODO: Optionally make this a submodule
    // TODO: How do we get the inner module again in order to generate code?
    pub fn new<S: Into<String>>(c: &'a Context<'a>, instance_name: S) -> TestInner<'a> {
        let m = c.module(instance_name, "TestInner");

        let i1 = m.input("i1", 32);
        let i2 = m.input("i2", 32);
        let o = m.output("o", i1 & i2);

        TestInner {
            i1,
            i2,
            o,
        }
    }
}

Overall, I think this will make modules a bit more verbose (mostly vertically due to struct wrapping etc), but I think the verbosity adds clarity, and the code that actually constructs Modules by combining Signals should get a bit lighter. Further, I think it's very useful to have instance signals exposed on a struct, as this could be extended by having other patterns - for example, we can group some signals with "bus port" structs, and add convenience functions to bind compatible bus ports to one another. Abstractions like this would be made ad-hoc and should reduce a lot of boilerplate in larger designs, especially where certain buses are used a lot (eg. xenowing). These abstractions can also potentially provide better semantic errors (eg. non-matching bus widths) before actually connecting signals (and deferring errors to lower-level kaze signal errors).

A small caveat: kaze currently treats all cases where Signals belong to different Modules as errors (they cannot be combined), but if inputs/outputs were to represent inputs/outputs on the submodule instance, we'd need to make an exception for them. It probably also makes sense to continue to disallow inputs/outputs for any Module that isn't a submodule of the current Module.

It might also be possible/desirable to hide some of the boilerplate with proc macros, but I'm always hesitant about magic syntax like this. If this is a common pattern that works well it might be worth it tho.

Whatever pattern(s) we end up with, even if we support them with proc macros for syntactic convenience, I think it's crucial that they be easy to understand/expressible in plain code. Further, a user can choose to forego this pattern entirely!

Some unresolved issues include:

  • How do we get the inner module from each wrapper struct in order to generate code for it? In nMigen, a new module is a new class that derives from a common base (Elaboratable I believe it's called), so the new module is the same object we'd generate code for, rather than a wrapper. Do we want to use traits/inheritance of some kind to mimic this? If we eventually move to an API where we separate unvalidated/validated graphs and we produce a validated graph by consuming an unvalidated graph (and transitively, all of the references to nodes in the graph, which includes modules!), how do we get ahold of the module(s) again in order to specify the top-level one(s) for codegen? Do we need to do that, or does it makes sense to "generate all top-level modules in this validated graph" always?
  • How do we get modules to either represent top-level modules or submodules in a nice way?
  • How do we specify instance names in a nice way? Do we always need instance names then (eg for top level modules, where this doesn't actually make sense in most cases)?
  • How can we use Inputs transparently as Signals and as sinks for Signals depending on context? Does it make sense to do Into<&'a Signal<'a>> everywhere instead of &'a Signal<'a> directly? Can this extend to Register as well so we don't have to use Register::value (which I tend to forget a lot at least)?

TraceValue

TODO here:

// TODO: Do we want to re-use graph::Constant for this? They're equivalent but currently distinct in their usage, so I'm not sure it's the right API design decision.

How about a newtype?

struct TraceValue(graph::Constant);

Consider multiple clock domains

I've always liked the idea of lifting some kind of clock domain identifier(s) into the type system, so most logic is constrained to only interact with other logic in the same domain, but with some types able to bridge the gap by implementing CDC (eg. special FIFOs), which ensure robust/safe behavior between domains.
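A hypothetical sketch of the type-system idea (nothing like this exists in kaze; the & operator usage on Signal is the only part taken from the current API):

use kaze::*;
use std::marker::PhantomData;

// Zero-sized marker types, one per clock domain.
pub struct ClkA;
pub struct ClkB;

pub struct Domain<'a, D> {
    pub signal: &'a Signal<'a>,
    _marker: PhantomData<D>,
}

impl<'a, D> Domain<'a, D> {
    // Ordinary logic can only combine signals tagged with the same domain D.
    pub fn and(self, other: Domain<'a, D>) -> Domain<'a, D> {
        Domain {
            signal: self.signal & other.signal,
            _marker: PhantomData,
        }
    }
}

// Only dedicated CDC constructs (eg. a special FIFO) would be allowed to
// take a Domain<'a, ClkA> in and hand a Domain<'a, ClkB> back out.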

I'm not entirely sure how to handle this in sim, though. Perhaps exposing multiple posedge_clk fn's (one per enumerated domain?) or having that function take the domain to transition on as a parameter or something is sufficient? This ends up putting yet more sim scheduling burden on the user, but might make sense. This also puts more pressure on sim efficiency, since more prop calls are likely required per simulated unit of time in order to correctly propagate signals between domains.

Definitely needs more thought/experimentation!

Document error detection/handling philosophy

kaze can only detect certain kinds of errors at certain times. Some examples:

  • We can only combine Signals with other Signals. This is expressed in rust's type system, and is thus detectable/reported during compilation of the rust kaze code.
  • We can only combine Signals of a certain bit_width with other Signals of the same bit_width for certain operations, eg. x & y. Since a Signal's bit_width cannot (yet) be described in rust's type system, but checking for this error is trivial, we detect and report this error during execution of the rust kaze code as the graph is being constructed (as part of the & operator impl in this case; see the sketch after this list).
  • We can't generate correct code if a Module instance has unconnected inputs, but we must allow unconnected inputs during graph creation and expect them to be resolved before code is generated. In this case, we defer error checking/reporting until codegen time.
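A sketch of that second case, based on the behavior described above:

use kaze::*;

fn main() {
    let c = Context::new();
    let m = c.module("WidthMismatch");
    let x = m.input("x", 2);
    let y = m.input("y", 3);
    // Detected while the graph is being constructed: combining a 2-bit Signal
    // with a 3-bit Signal panics here with a bit width mismatch error.
    let _ = x & y;
}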

Further, a somewhat unsatisfying decision that I've made is that certain classes of errors are reported in different ways. Obviously, errors that are checked by rust's type checker are reported as rust type errors. However, other types of errors are reported as panics. The reasoning behind this is to naturally be able to provide where in the rust code the error occurred (via the stack trace), similar to using a non-embedded language. This choice also leads to more readable user/API code, since the types aren't conflated with error handling details that would have to be used on every Signal parameter and return type.

It's admittedly somewhat indirect from a user's perspective, though, since a stack trace must be used to obtain this error information, as rust doesn't provide any way to get information about a function's caller out-of-the-box (this could potentially be remedied by having macro wrappers for all API entry points that expand to larger calls with additional file/line info, but this also gets messy for existing Op impls which don't allow the API to be extended).

This gets a bit messy for codegen as well, where it's natural for the code generator APIs to return a Result (which they actually do currently), but we currently still report those errors as panics and only use Result to report IO errors. This is arguably consistent with other runtime errors (which also panic), and this is why I made this choice, but it's also perfectly reasonable to say that this is inconsistent with its own API, which returns an io::Result (though I chose io::Result specifically to communicate that only io-related errors are produced this way).

One thing we could possibly explore is to use Result<Signal<..>> for all API entry points instead of Signal directly and have all ops propagate errors in addition to the logic they already do. This would mean that this kind of propagation also needs to be tested for all ops and I think the code would generally be a lot less readable; perhaps there's a good pattern for type aliasing that would help in this regard. This is especially important for higher-order user code that can be used to generate constructs more abstractly which usually already carries additional mental weight, and shouldn't be further complicated by additional type information.

Generally this is kind of unsatisfying and probably surprising, so at the very least it should be documented as part of how to use kaze generally, and other strategies may need to be investigated for future library versions.

Consider dedicated structure construct

I was thinking about caches today. I currently implement them with 3 separate memories - one for valid bits, one for tag words, and one for data words. But it might be advantageous to merge these into a single memory (eg. if it helps implementation later in the pipeline). In this case, we would want to store all of this information in a single memory word, and ideally we'd like this "packing" and "unpacking" to be as safe and comfortable to use as possible. Note that this may put pressure on raising MAX_SIGNAL_BIT_WIDTH, which is ofc possible, but potentially a lot of effort.

I’m not entirely sure what this might look like.

In-process simulator backend

Mostly to simplify testing; such a simulator would likely be at least an order of magnitude slower than generated Rust code, but probably much more convenient to use, especially for smaller modules and quick prototypes/tests.

My current thinking is that this would be similar to Rust sim gen with module hierarchy flattening, but could generate custom commands that would either be interpreted (simpler) or JIT'd (faster, more fun).
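A rough illustration of the interpreted-commands idea (none of this exists in kaze; it just shows the shape of a flattened command list being evaluated in place of generated Rust, with bit-width masking elided):

// Purely illustrative command set for a flattened, in-process simulator.
enum Command {
    Const { dst: usize, value: u64 },
    Not { dst: usize, src: usize },
    And { dst: usize, lhs: usize, rhs: usize },
}

fn interpret(commands: &[Command], values: &mut [u64]) {
    for command in commands {
        match *command {
            Command::Const { dst, value } => values[dst] = value,
            Command::Not { dst, src } => values[dst] = !values[src],
            Command::And { dst, lhs, rhs } => values[dst] = values[lhs] & values[rhs],
        }
    }
}

fn main() {
    let program = [
        Command::Const { dst: 0, value: 0b1 },
        Command::Not { dst: 1, src: 0 },
        Command::And { dst: 2, lhs: 0, rhs: 1 },
    ];
    let mut values = [0u64; 3];
    interpret(&program, &mut values);
    assert_eq!(values[2], 0); // x & !x is always 0
}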

Initial bootstrap/TODO

  • Feature parity with python impl
    • Remaining bin ops
      • ==
      • !=
      • ^
      • +
      • <
      • <=
      • >
      • >=
      • lt_signed
      • le_signed
      • gt_signed
      • ge_signed
    • if syntactic sugar
    • Verilog gen
  • Document remaining items
    • Signal
    • Module
    • Instance
    • registers
    • sugar
    • sim
    • sim tracing
    • verilog
    • top-level docs
    • Pass to find anything we may have missed!
  • Module instantiation
  • Replace mod gen in xenowing with new kaze impl
