I'd like to propose an architecture that makes this tool as versatile, extensible, and testable as possible. This means decoupling the input language from the class hierarchy and its transformations, and decoupling the output format from both.
The components would be:
- Core library: responsible for modeling the inheritance tree and providing model transformations
- Input language library/libraries: responsible for building up the inheritance tree and transformations from some input language (YAML, JSON, ...)
- Templating library: wraps the core library's tree model (much like red-green trees do) to provide a richer viewmodel for a templating engine such as Scriban
- Output language library/libraries: generates output for a specific language from the viewmodel provided by the templating library
- CLI: a command-line interface that drives all of this as an easy-to-invoke .NET tool
In the following sections I'd like to describe these components in slightly more detail.
Core library
Class hierarchy
The core could provide the model for the inheritance tree. It could look something like this (just a sketch):
// Describes a compiler pass
record CompilerPass(
    string Name,
    string Documentation,
    // Transformations to apply to get the tree based on the previous pass
    IList<ITreeTransformer> Transformations,
    TreeHierarchy Tree,
    CompilerPass? PreviousPass,
    CompilerPass? NextPass
);
// Wraps up an entire hierarchy
// Not necessary, but makes the API nicer
record TreeHierarchy(
    IDictionary<string, TreeNodeClass> Nodes
);
// Describes a single class in a hierarchy
record TreeNodeClass(
    string Name,
    string Documentation,
    // Null for a root class that has no parent
    TreeNodeClass? Parent,
    IDictionary<string, TreeNodeMember> Members,
    // Language-specific things could be here
    // Sealed? Abstract? Some applied attribute for Python?
    ISet<object> Attributes
);
// Describes a single member/property in a class
record TreeNodeMember(
    string Name,
    string Documentation,
    // Dynamic languages might not have a type
    string? Type,
    // Language-specific things could be here
    // Public? Apply some attribute? Leave out from pretty-printing?
    ISet<object> Attributes
);
Something like this is not too language-specific, yet not so general as to be practically useless. Things like the type specification could be elaborated further if needed. Read-write properties would also be nicer for such an API; I only used records for their concise syntax.
Tree transformation
The key operation the core would provide is tree transformation. It takes a tree hierarchy as input and applies a transformation, producing a new tree hierarchy. This is how the passes would build up their trees. Transformations could optionally be restricted to nodes matching a certain pattern. A possible API:
interface ITreeNodePattern
{
    public bool IsRecursive { get; }
    public bool Matches(TreeNodeClass c);
}
interface ITreeTransformer
{
    public ITreeNodePattern? Pattern { get; }
    public TreeHierarchy Apply(TreeHierarchy h);
}
Built-in transformations we could provide (and we could extend later):
- Add a node
- Remove a node
- Add a member to node
- Remove a member from node
Built-in patterns we could provide (and we could extend later):
- Node with given name
- Node with name matching a regex
- Node with given member(s)
- All nodes
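To make the pattern/transformation split more concrete, here is a minimal sketch of one pattern and one transformation. It assumes trimmed-down versions of the model records (no documentation, attributes, or `IsRecursive`), and the names `NameEndsWith` and `AddMember` are hypothetical, chosen only for illustration. I also assume that a null pattern means "apply to all nodes", which the API above leaves open.

```csharp
using System;
using System.Collections.Generic;

// Usage: add a "Span" member to every node whose name ends in "Expr".
var tree = new TreeHierarchy(new Dictionary<string, TreeNodeClass>
{
    ["BinaryExpr"] = new("BinaryExpr", new Dictionary<string, TreeNodeMember>()),
    ["Block"] = new("Block", new Dictionary<string, TreeNodeMember>()),
});
var addSpan = new AddMember(new NameEndsWith("Expr"), new TreeNodeMember("Span", "Range"));
var result = addSpan.Apply(tree);

Console.WriteLine(result.Nodes["BinaryExpr"].Members.ContainsKey("Span")); // True
Console.WriteLine(result.Nodes["Block"].Members.ContainsKey("Span"));      // False
Console.WriteLine(tree.Nodes["BinaryExpr"].Members.Count);                 // 0 (input untouched)

// Trimmed-down versions of the model records from the proposal.
record TreeNodeMember(string Name, string? Type);
record TreeNodeClass(string Name, IDictionary<string, TreeNodeMember> Members);
record TreeHierarchy(IDictionary<string, TreeNodeClass> Nodes);

interface ITreeNodePattern
{
    bool Matches(TreeNodeClass c);
}

interface ITreeTransformer
{
    ITreeNodePattern? Pattern { get; }
    TreeHierarchy Apply(TreeHierarchy h);
}

// Hypothetical pattern: matches nodes whose name ends with a given suffix.
sealed record NameEndsWith(string Suffix) : ITreeNodePattern
{
    public bool Matches(TreeNodeClass c) => c.Name.EndsWith(Suffix, StringComparison.Ordinal);
}

// Hypothetical transformation: adds a member to every matching node.
sealed record AddMember(ITreeNodePattern? Pattern, TreeNodeMember Member) : ITreeTransformer
{
    public TreeHierarchy Apply(TreeHierarchy h)
    {
        // Build a new node table instead of mutating the input hierarchy.
        var nodes = new Dictionary<string, TreeNodeClass>();
        foreach (var (name, node) in h.Nodes)
        {
            // Assumption: a null pattern means "apply to all nodes".
            if (Pattern is null || Pattern.Matches(node))
            {
                var members = new Dictionary<string, TreeNodeMember>(node.Members)
                {
                    [Member.Name] = Member,
                };
                nodes[name] = node with { Members = members };
            }
            else
            {
                nodes[name] = node;
            }
        }
        return new TreeHierarchy(nodes);
    }
}
```

Because `Apply` returns a fresh hierarchy and leaves its input alone, each pass can keep its own tree while the next pass derives from it, and every pattern/transformation pair can be unit-tested in isolation.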
Rationale for the scope
I believe this is a well-testable and easily extensible component. The remaining components deal with input and output, so they will likely be covered mostly by integration and end-to-end tests. This component, by contrast, can be unit-tested exhaustively across all the patterns and transformations.
Input language libraries
These would be the less interesting libraries: they take an input language and translate it into the core library's representation, describing the passes. Most likely they would invoke an existing parser, for example for YAML or JSON, but the input could also be some custom notation. I wouldn't focus on developing many of these "front-ends" until the core has a stable enough API. Note that the input languages don't have to expose 100% of the core features. It's perfectly fine to only support the necessities.
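As a rough illustration of how thin such a front-end could be, here is a sketch of a JSON reader built on `System.Text.Json`. The input schema (`nodes`, `name`, `parent`, `members`, `type`) and the `JsonFrontEnd` class are assumptions made up for this example, and the model records are trimmed down. Parents are kept as plain names here; resolving them to node references would be a later step. The raw string literal assumes C# 11 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical input schema: an object with a "nodes" array.
const string input = """
{
  "nodes": [
    { "name": "Expr", "members": [ { "name": "Span", "type": "Range" } ] },
    { "name": "BinaryExpr", "parent": "Expr", "members": [ { "name": "Left", "type": "Expr" } ] }
  ]
}
""";

var tree = JsonFrontEnd.Parse(input);
Console.WriteLine(tree.Nodes["BinaryExpr"].Parent);         // Expr
Console.WriteLine(tree.Nodes["Expr"].Members["Span"].Type); // Range

// Trimmed-down model records; Parent is a name rather than a reference.
record TreeNodeMember(string Name, string? Type);
record TreeNodeClass(string Name, string? Parent, IDictionary<string, TreeNodeMember> Members);
record TreeHierarchy(IDictionary<string, TreeNodeClass> Nodes);

static class JsonFrontEnd
{
    public static TreeHierarchy Parse(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var nodes = new Dictionary<string, TreeNodeClass>();
        foreach (var n in doc.RootElement.GetProperty("nodes").EnumerateArray())
        {
            var name = n.GetProperty("name").GetString()
                ?? throw new FormatException("Node name must be a string.");
            // "parent" is optional; root classes omit it.
            var parent = n.TryGetProperty("parent", out var p) ? p.GetString() : null;
            var members = new Dictionary<string, TreeNodeMember>();
            if (n.TryGetProperty("members", out var ms))
            {
                foreach (var m in ms.EnumerateArray())
                {
                    var memberName = m.GetProperty("name").GetString()
                        ?? throw new FormatException("Member name must be a string.");
                    // "type" is optional, since dynamic languages might not have one.
                    var type = m.TryGetProperty("type", out var t) ? t.GetString() : null;
                    members[memberName] = new TreeNodeMember(memberName, type);
                }
            }
            nodes[name] = new TreeNodeClass(name, parent, members);
        }
        return new TreeHierarchy(nodes);
    }
}
```

This front-end deliberately exposes only class names, parents, and members, in line with the point above that an input language only needs to support the necessities.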
Templating library
The templating library would wrap the tree in a more redundant data structure that is easier for template engines to consume. For example, these node wrappers would provide navigation to both the parent and the children, or they could list all members, including the inherited ones. To stay language-agnostic, these should be generic wrappers that the language-specific wrappers can reuse. For example, this library could ship a node wrapper something like this:
abstract class TreeNodeClassView<TSelf>
    where TSelf : TreeNodeClassView<TSelf>
{
    private readonly TreeNodeClass underlying;
    protected virtual bool HasAttribute(object attr) => underlying.Attributes.Contains(attr);
    public TSelf Parent => /* wrap up the parent in this type */;
    public IEnumerable<TSelf> Derived => /* wrap up the derived classes in this type */;
    // ...
}
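One way the elided parts could be filled in is sketched below, under a few assumptions of mine: the view receives the full list of nodes so it can find derived classes, each concrete view supplies a hypothetical `Wrap` factory method, and members are reduced to plain names. `PlainView` is just a stand-in for a language-specific view.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var expr = new TreeNodeClass("Expr", null, new[] { "Span" });
var binary = new TreeNodeClass("BinaryExpr", expr, new[] { "Left", "Right" });
var all = new[] { expr, binary };

var view = new PlainView(binary, all);
Console.WriteLine(string.Join(", ", view.AllMembers));                          // Span, Left, Right
Console.WriteLine(view.Parent?.Name);                                           // Expr
Console.WriteLine(string.Join(", ", view.Parent!.Derived.Select(d => d.Name))); // BinaryExpr

// Trimmed-down model: members are just names here.
record TreeNodeClass(string Name, TreeNodeClass? Parent, IReadOnlyList<string> Members);

abstract class TreeNodeClassView<TSelf>
    where TSelf : TreeNodeClassView<TSelf>
{
    protected readonly TreeNodeClass Underlying;
    protected readonly IReadOnlyCollection<TreeNodeClass> AllNodes;

    protected TreeNodeClassView(TreeNodeClass underlying, IReadOnlyCollection<TreeNodeClass> allNodes)
    {
        Underlying = underlying;
        AllNodes = allNodes;
    }

    // Hypothetical hook: each language-specific view knows how to construct itself.
    protected abstract TSelf Wrap(TreeNodeClass node);

    public string Name => Underlying.Name;

    // Null for a root class.
    public TSelf? Parent => Underlying.Parent is null ? null : Wrap(Underlying.Parent);

    // Derived classes are found by scanning for nodes whose parent is this node.
    public IEnumerable<TSelf> Derived =>
        AllNodes.Where(n => ReferenceEquals(n.Parent, Underlying)).Select(n => Wrap(n));

    // Inherited members first, then the members declared on this class.
    public IEnumerable<string> AllMembers =>
        (Parent?.AllMembers ?? Enumerable.Empty<string>()).Concat(Underlying.Members);
}

sealed class PlainView : TreeNodeClassView<PlainView>
{
    public PlainView(TreeNodeClass node, IReadOnlyCollection<TreeNodeClass> all) : base(node, all) { }

    protected override PlainView Wrap(TreeNodeClass node) => new(node, AllNodes);
}
```

The redundancy is intentional: the template gets `Parent`, `Derived`, and `AllMembers` as simple properties and never has to walk the model itself. A real implementation would probably cache the wrappers instead of creating a new one on every access.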
Output language libraries
The output language libraries would adapt the wrappers in the templating library to the destination language (this is why the wrappers are abstract and generic). For example, adapting it to C#:
class CSharpTreeNodeClassView : TreeNodeClassView<CSharpTreeNodeClassView>
{
    // Specialize things like attributes to be specific to C#
    public bool IsSealed => HasAttribute(CSharpAttribs.Sealed);
    public bool IsAbstract => HasAttribute(CSharpAttribs.Abstract);
    // ...
}
The libraries would ship the required templates:
- A template for generating a class hierarchy
- A template for generating a visitor base class
Optionally, the library could ship a language formatter, or know how to invoke a pre-installed one.