radeusgd / quotedpatternmatchingproof
A mechanized proof of soundness of the calculus defined in A Theory of Quoted Code Patterns, which is a formalization of the pattern matching on code available in Scala 3 as part of its new macro system.

Coq 35.91% Shell 0.13% TeX 63.96%
dotty coq formalization lambda-calculus scala

quotedpatternmatchingproof's Issues

Ids instance for explicitly typed terms

I base the issue on the following assumption:
Assumption 1: In the explicitly typed calculus, terms that are contained within other terms are also explicitly typed.

Example: ((fun: A -> B) (arg: A)): B - in this instance of applying fun to arg, not only is the whole Application annotated with types but also the terms that are "inside" are in their annotated versions.
An example contradicting Assumption 1: (fun arg): B - in this instance the "inner" terms are still untyped.

Am I right that for the explicitly-typed variant of the calculus, Assumption 1 should hold?

Conjecture 1: It is impossible to derive a meaningful instance of the Ids typeclass for explicitly typed terms.

Context: Ids (termType: Type) is a typeclass that requires providing a function ids : var -> termType.
If I understand correctly, that function should be an 'identity substitution' which is equivalent to a variable constructor (ref: page 2).

Given Assumption 1, to create a proper Var we need not only its index but also its target type, which is not available in the given context.

We could realize the function with ids (x: var) := Var x TNat, which would assign the type Nat to all variables by default. But that seems simply wrong.

However, I'm not exactly sure how the Ids instance is actually used; I think it may still work out for closed terms.
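To make the problem concrete, here is a minimal sketch, assuming an autosubst-style single-field Ids class and a hypothetical explicitly typed term type ETerm (not the repository's actual definitions):

```coq
(* Hypothetical sketch of the Ids problem for explicitly typed terms. *)
Class Ids (term : Type) := ids : nat -> term.

Inductive TType := TNat | TArrow : TType -> TType -> TType.

Inductive ETerm :=
| EVar : nat -> TType -> ETerm        (* a variable carries its type *)
| EApp : ETerm -> ETerm -> ETerm.

(* Any total instance must invent a fixed type for every index,
   e.g. TNat -- exactly the "wrong thing to do" described above: *)
Instance Ids_ETerm : Ids ETerm := fun x => EVar x TNat.
```

The instance typechecks, but the TNat annotation is arbitrary: nothing in the index x determines the variable's type, which is the core of Conjecture 1.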

Non-trivially recursive functions termination

Coq requires a Fixpoint's recursion to be structural in order to guarantee termination.

When writing substitution with normal names (not DeBruijn indices), I needed to rename variables bound by lambdas to avoid capture of free variables during the substitution.

I tried the following code:

Fixpoint fresh (term: Term) : label :=
  match term with
  | Val (Lit n) => 0
  | Val (Lam x T t) => max x (fresh t + 1)
  | Var x => x + 1
  | App t1 t2 => max (fresh t1) (fresh t2)
  end.

Definition fresh2 (t1: Term) (t2: Term) := (max (fresh t1) (fresh t2)) + 1.

Fixpoint renameVars (term: Term) (old: label) (new: label): Term :=
  match term with
  | Val (Lit n) => term
  | Val (Lam x T t) =>
    if Nat.eq_dec x old then Val (Lam new T (renameVars t old new))
    else Val (Lam x T (renameVars t old new))
  | Var x =>
    if Nat.eq_dec x old then Var new
    else term
  | App t1 t2 => App (renameVars t1 old new) (renameVars t2 old new)
end.

Program Fixpoint substitute (term: Term) (varname: label) (varterm: Term) : Term :=
  match term with
  | Val (Lit n) => term
  | Val (Lam x T t) =>
    if Nat.eq_dec x varname then term
    else
      let freshx := fresh2 varterm t in
      let t' := renameVars t x freshx in
      Val (Lam freshx T (substitute t' varname varterm))
  | Var x =>
    if Nat.eq_dec x varname then varterm
    else term
  | App t1 t2 => App (substitute t1 varname varterm) (substitute t2 varname varterm)
  end.

Unfortunately Coq was unable to prove termination, because substitute is called recursively on the result of renameVars.
renameVars returns a term with the same structure (and thus the same size), but that is not a trivial observation: Coq can unfold simple definitions, but it cannot prove such an invariant of a recursive function automatically.
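One common workaround, sketched below using the Term, label, fresh2 and renameVars definitions above, is to thread an explicit fuel argument so the recursion is structural on fuel; substitute_fuel is a hypothetical name, and the function returns None only when the fuel runs out:

```coq
(* Fuel-indexed substitution: structurally recursive on the fuel counter. *)
Fixpoint substitute_fuel (fuel : nat) (term : Term)
         (varname : label) (varterm : Term) : option Term :=
  match fuel with
  | 0 => None  (* out of fuel *)
  | S fuel' =>
    match term with
    | Val (Lit _) => Some term
    | Val (Lam x T t) =>
      if Nat.eq_dec x varname then Some term
      else
        let freshx := fresh2 varterm t in
        let t' := renameVars t x freshx in
        match substitute_fuel fuel' t' varname varterm with
        | Some t'' => Some (Val (Lam freshx T t''))
        | None => None
        end
    | Var x =>
      if Nat.eq_dec x varname then Some varterm else Some term
    | App t1 t2 =>
      match substitute_fuel fuel' t1 varname varterm,
            substitute_fuel fuel' t2 varname varterm with
      | Some t1', Some t2' => Some (App t1' t2')
      | _, _ => None
      end
    end
  end.
```

Since renameVars preserves the size of the term, a fuel of (size of the term + 1) always suffices; alternatively, one can prove a renameVars-preserves-size lemma and use Program Fixpoint with a {measure} annotation.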

Hints for tactics

It seems useful to add constructors as 'hints' to the prover, so that auto and other tactics can use them.

For example:

Inductive is_free_in : label -> Term -> Prop :=
| fv_var : forall x, is_free_in x (Var x)
| fv_app1 : forall x t1 t2, is_free_in x t1 -> is_free_in x (App t1 t2)
| fv_app2 : forall x t1 t2, is_free_in x t2 -> is_free_in x (App t1 t2)
| fv_lam : forall x y t T, is_free_in x t -> x <> y -> is_free_in x (Lam y T t)
.
Hint Constructors is_free_in.

When I have a goal is_free_in x (Var x), normally I'd have to write apply fv_var, but after invoking the Hint command I can just call auto.
This is useful because I can write, for example, induction something; auto and have all the trivial branches dealt with in one line, instead of looking up each rule separately.
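As a small illustration (fv_example is a hypothetical lemma name), once the hints are registered, auto chains several constructors by itself:

```coq
(* auto finds fv_app1 followed by fv_var via the registered hints. *)
Lemma fv_example : forall x t, is_free_in x (App (Var x) t).
Proof. auto. Qed.
```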

Inductive predicates vs Fixpoints

The first intuition when defining some property, like 'isvalue' is to use an Inductive predicate.

For example:

Inductive isvalue : typedterm -> Prop :=
| Val_Nat : forall n T, isvalue (Nat n : T)
| Val_Lam : forall t T1 T, isvalue (Lam T1 t : T)
| Val_Box : forall t T, isplain t -> isvalue (Quote t : T).

This makes it easy to use in theorem statements (just use isvalue t) and looks nice.
However, it proves troublesome when we need to prove or assume a negation. I haven't analysed the core reasons deeply, but I think it is because Coq uses constructive/intuitionistic logic, in which negation sometimes has unusual properties.

So for such predicates it may be better to use a Fixpoint definition (it's not always possible, but it works well for simple properties): its advantage is that we get a boolean value, which works with negation without issues.

For example:

Fixpoint decide_isvalue (t : typedterm) : bool :=
  match t with
  | TypedTerm t' _ =>
    match t' with
    | Nat _ => true
    | Lam _ ebody => true
    | Quote t => decide_isplain t
    | _ => false
    end
  end.

A disadvantage is that some proofs that were trivial with the inductive predicate (using the inversion tactic) may no longer work, but usually they can be fixed with clever use of the cbn and cbv tactics.
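A middle ground is to keep both definitions and relate them with a reflection lemma, so each proof can pick whichever view is convenient. A sketch (statement only, proof omitted), assuming the isvalue and decide_isvalue definitions above:

```coq
(* Switch between the inductive and the boolean view as needed. *)
Lemma isvalue_reflect : forall t : typedterm,
  isvalue t <-> decide_isvalue t = true.
Proof.
  (* By case analysis on t; omitted in this sketch. *)
Admitted.
```

With this lemma, a negative hypothesis ~ isvalue t can be rewritten into the boolean fact decide_isvalue t = false, which behaves well under case analysis.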

Handling names / binders

While proving lambda calculus properties it's important to somehow handle binding names of variables to their definition.

The first approach was to simply use nats as names and handle capture-avoiding substitution by generating fresh names when needed, but this had other issues, as described in #1.

The second approach is to use De Bruijn indices, where a variable is a number specifying how many lambdas lie between the variable and the lambda that binds it.

This approach is harder to grasp conceptually, because we are not used to it in programming, but it makes things like substitution much easier and rules out naming conflicts essentially by definition.
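For illustration, here is a minimal De Bruijn term type (DTerm is a hypothetical name, not the repository's actual definition) and a familiar term encoded with indices instead of names:

```coq
Inductive DTerm :=
| DVar : nat -> DTerm
| DLam : DTerm -> DTerm             (* binds index 0 in its body *)
| DApp : DTerm -> DTerm -> DTerm.

(* fun x => fun y => x y  becomes: x is one lambda away (index 1),
   y is the nearest binder (index 0). *)
Definition example := DLam (DLam (DApp (DVar 1) (DVar 0))).
```

Alpha-equivalent terms get the same encoding, which is why naming conflicts disappear by construction.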

Induction over mutually recursive types

With the following definition of term type:

Inductive Term :=
| Val : Value -> Term
| Var : label -> Term
| App : Term -> Term -> Term
with Value :=
| Lit : nat -> Value
| Lam : label -> TType -> Term -> Value.

When doing induction on the structure of a term, the Val case does not get an induction hypothesis for the body of Lam.

This makes some theorems impossible to prove with the default induction principle.
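Coq can generate a suitable mutual induction principle with the Scheme command (and Combined Scheme to bundle the two principles), using the Term and Value definitions above; Term_mut, Value_mut and Term_Value_mut are names chosen here for illustration:

```coq
(* Mutual induction principles for Term and Value. *)
Scheme Term_mut := Induction for Term Sort Prop
  with Value_mut := Induction for Value Sort Prop.
Combined Scheme Term_Value_mut from Term_mut, Value_mut.
```

One can then write induction t using Term_mut, supplying a property of values for the auxiliary motive, so the Val case does receive an induction hypothesis for the body of Lam.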

Notation

One of the first things I wanted to learn to be able to specify the theorems in a more readable way was Coq notation mechanism.
By looking at examples and some trial and error I found how to define constants and simple notational shortcuts.

Example constant definition: Notation "∅" := emptyEnvironment., which allows using the Unicode ∅ for the empty environment.

One can also have arguments in the defined notation, so a function call can be rendered as a pretty infix operator, as I did for extending the environment:

Notation "G ';' x ':' T" := (Textend G x T) (at level 40, x at level 59).

A more complex example is defining typing judgements using the well-known syntax instead of the traditional inductive type syntax.
First I have to "reserve" the notation; then I can use it in the definition, "explaining" at the end how the new notation relates to the original inductive type:

Reserved Notation "G '⊢' t ':' T" (at level 40, t at level 59).
Inductive term_typing : TypCtx -> Term -> TType -> Prop :=
| ty_lit : forall G n, G ⊢ (Lit n) : TNat
| ty_var : forall G x t, Tcontains G x t -> G ⊢ Var x : t
| ty_lam : forall G arg argT body bodyT, (G; arg : argT) ⊢ body : bodyT -> G ⊢ (Lam arg argT body) : (TLam argT bodyT)
| ty_app : forall G f arg argT retT, G ⊢ f : (TLam argT retT) -> G ⊢ arg : argT -> G ⊢ (App f arg) : retT
where "G '⊢' t ':' T" := (term_typing G t T).

(the where at the end of the definition is what actually binds the notation to the internal construct).

For now I haven't learned how to use the levels correctly. I know that they relate to operator precedence, but I haven't gone into detail there; I just try to set them intuitively and prefer parentheses in unclear situations to avoid ambiguity.

Multiple binders in patterns

Patterns may contain multiple binders, for example we may have a pattern (x1 x2) (matching an application and extracting each term into x1 and x2 respectively).

A pattern can contain an arbitrary number of binders (from 0 upwards).

When using DeBruijn indices for encoding binders, we need to know how many binders are added in the inner term so that when we refer to outer variables we know which index to use.

Another thing is order of the binders - we need to have some notion of order to be able to address binders in the pattern by an index (but this is not a big issue as probably something like index of inorder traversal of the pattern tree could be used).

The issue is, however, that autosubst seems to handle only single binders, as in a lambda that binds one argument. I'm not sure whether we can directly get multiple binders at a fixed arity, like a lambda taking 2 arguments at once.
Binders parametrized by an arbitrary natural number (which seems required for handling patterns) seem completely out of the question.

This is however based on my current view of autosubst and I may be wrong about it. I couldn't find any information in the manual or examples hinting that this was possible.

Quickly applying a hypothesis with a quantifier

It often happens (for example when doing induction), that I have a complex hypothesis with a forall quantifier, like

  IHt1 : forall T : type,
         ∅ ⊢ t1 : T ->
         isValue t1 \/ (exists t' : term, t1 --> t')

I have some T that I know will fit.
The hypothesis' conclusion is not directly applicable to my current goal, so I cannot do apply IHt1 with SomeT nor eapply IHt1.

What I do is

    assert (isValue t1 \/ (exists t' : term, t1 --> t')).
    apply IHt1 with SomeT. trivial.

I'm wondering, however, whether it's possible to do this more concisely (like writing the assert without spelling out the full conclusion, since it can be inferred from the IH).

I'm mostly thinking of the Don't Repeat Yourself principle: the conclusion is already in the IH, so copy-pasting it into the assert sounds risky (if I change the IH slightly, the assert will no longer match).
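The specialize tactic does exactly this: it instantiates the quantifier (and optionally discharges premises) in place, without restating the conclusion. A self-contained demonstration with hypothetical predicates P and Q:

```coq
Lemma specialize_demo :
  forall (P Q : nat -> Prop),
  (forall n : nat, P n -> Q n) -> P 3 -> Q 3.
Proof.
  intros P Q H HP3.
  specialize (H 3 HP3).  (* H is now Q 3; no assert, no copy-paste *)
  exact H.
Qed.
```

In the situation above, specialize (IHt1 SomeT) would similarly turn the quantified IH into its instantiated form directly in the context.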

Case analysis of equality

It often comes up in proofs that I have 2 values (for example nats) and the proof continues differently depending on whether they are equal or not.

One trick to deal with that is to use the following code (assume we have x y: nat):

remember (x =? y) as xeqy.
destruct xeqy.
* assert (x = y). apply beq_nat_true. auto.
  [part of proof with the H: x = y]
* assert (x <> y). apply beq_nat_false. auto.
  [part of proof with the H: x <> y]

This becomes tedious to write when there are multiple cases to analyse and can make the proof less readable.
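A more direct alternative is to destruct Nat.eq_dec, which is a decision procedure returning a sumbool, so each branch immediately gets the propositional fact x = y or x <> y with no conversion lemmas. A self-contained demonstration:

```coq
Require Import Arith.

Lemma eq_dec_demo : forall x y : nat, x = y \/ x <> y.
Proof.
  intros x y.
  destruct (Nat.eq_dec x y) as [Heq | Hneq].
  - left. exact Heq.    (* branch with Heq : x = y *)
  - right. exact Hneq.  (* branch with Hneq : x <> y *)
Qed.
```

The as [Heq | Hneq] pattern names the hypotheses up front, which keeps multi-case proofs readable.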

Order of quantifiers in theorems proven by induction

This is a rather simple observation, but it's something we don't notice often when doing proofs on paper.

Suppose we have

Inductive B := B0 | B1 : A -> B -> B.

and we formulate the theorem as forall (a: A) (b: B), P a -> Q b -> R a b. If we do induction on the structure of b, in the B1 case we get an induction hypothesis of the form Q b -> R a b, where a is the constant a we got from intros.

For some theorems this may not be enough, so instead we should change the order of quantifiers to forall (b: B) (a: A), P a -> Q b -> R a b. Now we can intro only b first, do induction on it, and intro a only later, so that in the B1 case we get an induction hypothesis of the form forall a : A, P a -> Q b -> R a b, which is more flexible (we can instantiate a arbitrarily).

This was important, for example, when proving the Substitution lemma:

Lemma Substitution : forall t2 G t1 T1 x T2, ∅ ⊢ t1 : T1 /\ G ; x : T1 ⊢ t2 : T2 -> G ⊢ substitute t2 x t1 : T2.

We did induction on the term t2 and needed to be able to instantiate G with values other than the one introduced at the beginning.
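If the statement is already written with the less convenient quantifier order, the generalize dependent tactic can restore the quantifier just before the induction, instead of reordering the theorem. A sketch with abstract A, B and hypothetical P, Q, R:

```coq
Section GeneralizeDemo.
  Variables (A B : Type) (P : A -> Prop) (Q : B -> Prop)
            (R : A -> B -> Prop).

  Lemma demo : forall (a : A) (b : B), P a -> Q b -> R a b.
  Proof.
    intros a b.
    generalize dependent a.
    (* Goal is now: forall a : A, P a -> Q b -> R a b,
       so an induction on b would yield an IH quantified over a. *)
  Abort.
End GeneralizeDemo.
```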
