kawu / feature-structure

License: BSD 2-Clause "Simplified" License

Haskell 100.00%

feature-structure's Issues

List pattern matching

It is possible to encode a list as a feature structure using the Tree.list function. There should also be symmetric functions for pattern matching on lists encoded this way.
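
A minimal sketch of what such a symmetric pair could look like. The `FT` type, the feature names `first`/`rest`, and the `nil` terminator are assumptions made for illustration; they are not taken from the library, whose actual `Tree.list` encoding may differ.

```haskell
import qualified Data.Map.Strict as M

-- | Hypothetical, simplified feature tree (not the library's actual type).
data FT
  = Atom String             -- atomic value
  | Node (M.Map String FT)  -- attribute-value node
  deriving (Show, Eq)

-- | Encode a list as nested first/rest nodes, terminated by the atom "nil".
-- The feature names and the terminator are assumptions of this sketch.
list :: [FT] -> FT
list = foldr cons (Atom "nil")
  where
    cons x xs = Node (M.fromList [("first", x), ("rest", xs)])

-- | Inverse of 'list': succeeds only on well-formed list encodings.
unList :: FT -> Maybe [FT]
unList (Atom "nil") = Just []
unList (Node m)
  | M.size m == 2
  , Just x  <- M.lookup "first" m
  , Just xs <- M.lookup "rest" m = (x :) <$> unList xs
unList _ = Nothing

-- | Match a non-empty encoded list against a head/tail pattern.
unCons :: FT -> Maybe (FT, FT)
unCons (Node m) = (,) <$> M.lookup "first" m <*> M.lookup "rest" m
unCons _        = Nothing
```

With these definitions, `unList (list xs)` gives back `Just xs` for any `xs`, and `unCons` can be used in a pattern guard or view pattern to destructure an encoded list.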

Right now the library provides only a unification function which assumes that the sets of node identifiers of the input graphs are disjoint. It should also provide a more convenient and safer interface for unification (perhaps a monadic one, given that external identifiers may need to be changed as well).

Allow specifying an order on the elements of a rule's body

It seems that it is possible to specify an order on the elements of a phrase-structure rule in LFG* (but it would be prudent to double-check this statement).

If so, it would be useful to have this functionality in our implementation as well.

*Based on: LFG Syntactic Theory, Winter Semester 2009/2010, Antske Fokkens

Two frontier nodes with the same value?

A follow-up to issue #27: a graph can be constructed which has two different frontier nodes (with different IDs) carrying the same atomic value. As a result, two graphs which are apparently equal are not considered as such.

Parsing & unification: can we guarantee that nodes of the input graphs are disjoint?

Motivation: if it's not possible to make this assumption, we will generally have to rename (or prefix) nodes in both input graphs before performing unification. Edges will need to be modified as well. But that's not all -- apart from graphs, there are also accompanying structures which hold pointers (identifiers) to graph nodes and which take part in unification. These structures will have to be modified too. All in all, there's a lot of work (which probably also raises the computational complexity) which could be avoided if only we could assume "disjointness".

Proposition: all feature structures taking part in the parsing algorithm are introduced at the preprocessing stage. During the proper parsing phase no new feature structures are introduced.

Conclusion: if the proposition above is true, we can re-identify all feature structures in the preprocessing stage and therefore ensure that the assumption (disjointness of graph nodes) will hold during the entire algorithm.
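
A rough sketch of such a preprocessing step, assuming a hypothetical graph representation (a map from integer identifiers to nodes) and non-negative identifiers. None of the names below come from the library.

```haskell
import qualified Data.Map.Strict as M

type ID = Int

-- | Hypothetical minimal feature graph: a node is atomic or has labelled edges.
data Node = Leaf String | Inner (M.Map String ID)
  deriving (Show, Eq)

type Graph = M.Map ID Node

-- | Shift every identifier in a graph (keys and edge targets) by an offset.
shift :: Int -> Graph -> Graph
shift k = M.mapKeysMonotonic (+ k) . M.map onNode
  where
    onNode (Inner es) = Inner (M.map (+ k) es)
    onNode leaf       = leaf

-- | Re-identify a list of graphs so that their node sets are pairwise
-- disjoint. Assumes non-negative identifiers.
reidentify :: [Graph] -> [Graph]
reidentify = go 0
  where
    go _ []       = []
    go k (g : gs) =
      let g'   = shift k g
          -- the next graph starts just above the largest identifier used so far
          next = maybe k ((+ 1) . fst) (M.lookupMax g')
      in  g' : go next gs
```

Any accompanying structure that holds pointers into a graph would have to be shifted by the same offset as that graph, which is why doing this once, before parsing starts, is attractive.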

Is there any particular reason why `Data.Tree` doesn't provide an `Ord` instance?

Functional node joining interface?

Perhaps it would be better to replace the monadic interface for joining nodes in a feature graph with a functional one. Right now, we are using a monadic interface mainly because:

We are using a pipe for logging,
There's an underlying state consisting of a disjoint-set of node identifiers, a queue of node pairs, and the feature graph itself.

Logging is not that important, especially now that we know what we are doing.

The queue is only valid within the scope of one join operation, because every time we join two nodes we want to run the iterative node-merging process until the queue is empty.

The feature graph can be an explicit argument and result of the join operation.

What remains is the disjoint-set over graph nodes. It is a by-product of the join operation as well. Otherwise, we would have to update all external pointers to graph nodes (as well as internal identifiers kept in edges), taking into account that some of the nodes might have been merged with others and subsequently removed.

If we don't want to update external identifiers, we need to make sure that all identifiers valid before the join are also valid after the operation. They may point to different nodes, but still -- they need to be valid. The only reasonable solution seems to be to represent the graph and the disjoint-set jointly, in one data structure.
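
One way this joint representation could look, as a sketch only. The `FG` record, its field names, and `redirect` are hypothetical; the actual join would additionally unify node contents and process the queue of node pairs, which is left out here.

```haskell
import qualified Data.Map.Strict as M

type ID = Int

-- | Hypothetical node content: an atom or a set of labelled edges.
data Node = Leaf String | Inner (M.Map String ID)
  deriving (Show, Eq)

-- | Graph and disjoint-set kept together. 'links' maps a merged-away
-- identifier to the identifier it was merged into; 'nodes' stores content
-- for representatives only.
data FG = FG
  { links :: M.Map ID ID
  , nodes :: M.Map ID Node
  } deriving (Show, Eq)

-- | Follow links to the representative of an identifier.
repr :: FG -> ID -> ID
repr g i = maybe i (repr g) (M.lookup i (links g))

-- | Look up a node through any identifier that was ever valid.
node :: FG -> ID -> Maybe Node
node g i = M.lookup (repr g i) (nodes g)

-- | Redirect @i@ to @j@ and drop @i@'s content. A real join would also
-- unify the two contents and enqueue child pairs; that part is omitted.
redirect :: ID -> ID -> FG -> FG
redirect i j g
  | ri == rj  = g
  | otherwise = g { links = M.insert ri rj (links g)
                  , nodes = M.delete ri (nodes g) }
  where
    ri = repr g i
    rj = repr g j
```

Because every lookup goes through `repr`, an external pointer that referred to a node before `redirect` still resolves afterwards, possibly to a different node, without being rewritten.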

Enforce type-level relation between graphs and IDs, if possible

Try to enforce, at the type level, the relation between a graph and its identifiers, so that foreign identifiers cannot be used with a given graph; only IDs dedicated to that graph should be accepted.
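
A possible approach, sketched under the assumption that a phantom type parameter in the style of `runST` is acceptable. `NodeId`, `Graph`, `value`, `nodeIds`, and `withGraph` are illustrative names, and graph contents are reduced to atomic values to keep the example short.

```haskell
{-# LANGUAGE RankNTypes #-}

import qualified Data.Map.Strict as M

-- | An identifier tagged with a phantom type @s@ naming its graph.
newtype NodeId s = NodeId Int
  deriving (Eq, Show)

-- | A graph tagged with the same phantom type (contents simplified to
-- atomic values for this sketch).
newtype Graph s = Graph (M.Map Int String)

-- | Look up a node; only identifiers carrying the graph's own tag type-check.
value :: Graph s -> NodeId s -> Maybe String
value (Graph m) (NodeId i) = M.lookup i m

-- | All identifiers of a graph.
nodeIds :: Graph s -> [NodeId s]
nodeIds (Graph m) = map NodeId (M.keys m)

-- | Open a graph with a fresh, universally quantified tag. Identifiers
-- tagged with @s@ cannot escape the continuation, because @s@ may not
-- appear in the result type @r@.
withGraph :: [(Int, String)] -> (forall s. Graph s -> r) -> r
withGraph xs k = k (Graph (M.fromList xs))
```

The guarantee only holds if the `NodeId` and `Graph` constructors are not exported; otherwise a caller could forge an identifier with any tag.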

AVM printing using diagrams framework

Use graphviz for feature structure visualization

AVM interface

The library should provide an interface for defining feature graphs using a notation similar to attribute-value matrices.

Type of `runJoinIO`

Do we gain anything by specifying that the function can run in any MonadIO context? Perhaps it would be just as well to make it run directly in IO?

Implement feature graph Ord instance

A follow-up to issue #18.

Handle unbounded chains of rule applications

Right now, for a given span [i, j), we allow at most one application of any given rule from the grammar. We should handle unbounded chains of rule applications.
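
One generic way to handle such chains is to close the set of items on a span under rule application with a worklist. The sketch below assumes a hypothetical `step` function that returns the items derivable from a single item, and it terminates only if finitely many distinct items are reachable; whether that holds for feature-structure-decorated rules is not established here.

```haskell
import qualified Data.Set as S

-- | Close a set of items under repeated application of @step@.
-- Each item is expanded at most once, so cyclic chains do not loop.
closure :: Ord a => (a -> [a]) -> [a] -> S.Set a
closure step = go S.empty
  where
    go done []       = done
    go done (x : xs)
      | x `S.member` done = go done xs
      | otherwise         = go (S.insert x done) (step x ++ xs)
```

Note that the `Ord` constraint on items is what lets already-seen items be detected, which connects this to the question of an `Ord` instance for feature graphs.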

Implement alternative

At the moment it is not possible to represent an alternative (e.g. that a given feature/attribute has one of two possible values).

Wrap up graph IDs in a newtype and make them incomparable?

Clean up the code

Remove old versions of certain modules, add explicit exports.

Graph equality doesn't check homomorphism

Two graphs which are not homomorphic are currently recognized as equal if they represent the same structure from the unification point of view. This situation happens if in (at least) one of the graphs, one of its subgraphs is duplicated (e.g. there are two leaves which have exactly the same atomic value).

This stems from the fact that feature graphs are not required to be minimal.

Two questions arise:

Would it make sense to require feature graphs to be minimal?
Should equality verify that graphs are homomorphic?

`IsString` instance of the feature tree

Think of a simple and generic solution to represent simple instances of feature trees -- atom, empty and label -- with a String.
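
A sketch of one possible convention. Both the reduced `FT` type and the rule that a leading `?` marks a label are assumptions made for this example, not something the library defines.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))

-- | Hypothetical, simplified feature tree with only the three simple forms.
data FT
  = Empty         -- no information
  | Label String  -- a variable / reentrancy label
  | Atom String   -- an atomic value
  deriving (Show, Eq)

-- | Convention assumed by this sketch (not taken from the library):
-- @""@ is the empty tree, @"?x"@ is the label @x@, anything else is an atom.
instance IsString FT where
  fromString ""        = Empty
  fromString ('?' : x) = Label x
  fromString s         = Atom s
```

With `OverloadedStrings` enabled, a literal such as `"sg"` can then be written wherever an `FT` is expected. The drawback of any such convention is that atoms starting with the marker character need an escape or the explicit constructor.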

QuickCheck tests

We could check correctness of the unification algorithm w.r.t. the mathematical representation of feature structures. The result of unifying u with v is their least upper bound w.r.t. the partial order specified by the subsumption relation. While the subsumption relation should be easy to implement, it can be harder to find the least upper bound.

At least, we could check if the result of the unification is an upper bound (i.e. if both u and v subsume it). The question is whether such a test will be very useful.
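
The upper-bound check could be phrased as a plain Boolean property, as sketched below on a hypothetical tree-only feature structure without reentrancy. `FS`, `subsumes`, `unify`, and `prop_upperBound` are illustrative stand-ins, not the library's definitions; running the property under QuickCheck would additionally require an `Arbitrary` instance, which is not shown.

```haskell
import qualified Data.Map.Strict as M

-- | Hypothetical feature structure without reentrancy (trees only).
-- An empty 'Node' carries no information.
data FS = Atom String | Node (M.Map String FS)
  deriving (Show, Eq)

-- | @subsumes u w@: every piece of information in @u@ is also in @w@.
subsumes :: FS -> FS -> Bool
subsumes (Atom a) (Atom b) = a == b
subsumes (Node m) (Node n) =
  and [ maybe False (subsumes v) (M.lookup k n) | (k, v) <- M.toList m ]
subsumes (Node m) (Atom _) = M.null m
subsumes (Atom _) (Node _) = False

-- | Unification of two tree-shaped structures; 'Nothing' on a clash.
unify :: FS -> FS -> Maybe FS
unify (Atom a) (Atom b)
  | a == b    = Just (Atom a)
  | otherwise = Nothing
unify (Node m) (Node n) =
  Node <$> sequenceA (M.unionWith both (M.map Just m) (M.map Just n))
  where
    both x y = do
      a <- x
      b <- y
      unify a b
unify (Node m) a@(Atom _) | M.null m = Just a
unify a@(Atom _) (Node m) | M.null m = Just a
unify _ _ = Nothing

-- | If unification succeeds, both arguments subsume the result.
prop_upperBound :: FS -> FS -> Bool
prop_upperBound u v = case unify u v of
  Nothing -> True
  Just w  -> u `subsumes` w && v `subsumes` w
```

This property does not establish that the result is the least upper bound, only that it is an upper bound, which matches the weaker check discussed above.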

Combining two not-fully processed rules

How would the behavior of the parsing algorithm change if we allowed combining two not-fully processed rules? Note that, at the moment, only a combination of a not-fully processed rule with a fully processed rule (which will be unified with the first unprocessed body element of the first rule) is allowed.

Towards LFG

Features

To describe LFG grammars with our formalism the following features would be needed:

Representing sets with feature structures.
"We require the f-structure solution for a particular f-description to be the minimal solution to the f-description" <- does it have any implications for the unification process?
Distinction between defining and constraining equations. In particular, it seems that implementing constraining equations would require extending the current unification mechanism.
Other types of defining/constraining equations: negative equation, existential constraint, etc.
Boolean expressions over (defining/constraining) equations.
Regular expressions (there's an example with the Kleene star in [1], page 17) over paths in equations. The so-called functional uncertainty.
Inside-out function application.
Off-path constraints.

Off-path constraints

Hypothesis: off-path constraints provide means to describe tree-structured constraints.
Question 1: is it possible to describe graph-structured constraints?
Question 2: is it possible to describe tree-structured constraints with several roots?

NOTE: the hypothesis is not quite true: to represent a tree of constraints we can just use a set of path constraints. The problem is the semantics of regular and off-path constraints with respect to feature sets and quantification scope (at least, such an issue seems to exist in LFG).

References

[1] Lexical Functional Grammar, Mary Dalrymple

Trim feature graph

Driven by kawu/ltag#15: the user should be able to remove nodes unreachable from the roots.
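
A sketch of such a trimming operation, assuming a hypothetical map-based graph representation; the types and function names are not the library's.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type ID = Int

-- | Hypothetical minimal feature graph: a node is atomic or has labelled edges.
data Node = Leaf String | Inner (M.Map String ID)
  deriving (Show, Eq)

type Graph = M.Map ID Node

-- | Identifiers reachable from the given roots by following edges.
reachable :: Graph -> [ID] -> S.Set ID
reachable g = go S.empty
  where
    go seen []       = seen
    go seen (i : is)
      | i `S.member` seen = go seen is
      | otherwise         = go (S.insert i seen) (children i ++ is)
    children i = case M.lookup i g of
      Just (Inner es) -> M.elems es
      _               -> []

-- | Keep only the nodes reachable from the roots.
trim :: [ID] -> Graph -> Graph
trim roots g = M.restrictKeys g (reachable g roots)
```

The roots are passed explicitly here; in a setting where external pointers into the graph exist, all of them would have to be treated as roots.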

Provide a convenient monadic interface for converting feature trees to feature graphs

Divide the `Uni` class into subclasses

In its current form, the class leads to a redundant assumption that, e.g., the identifiers of the unify arguments must be of the same type.

Grammar development: get rid of global-level identifiers

The library should provide functions which would make it possible to define a grammar without resorting to global-level identifiers (features, values). Such identifiers are not statically checked and they make the grammar development error-prone.

Improve disjoint-set implementation

Right now the implementation doesn't use path compression, for example. This is enough for presentation purposes, but it should be improved in the final implementation.
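
For reference, path compression in a purely functional setting could be sketched as follows. The map-based `DisjSet` is an assumption made for illustration, and union by rank (the other standard optimization) is deliberately left out.

```haskell
import qualified Data.Map.Strict as M

-- | Parent pointers; an identifier absent from the map is its own root.
type DisjSet = M.Map Int Int

-- | Find the representative of @x@, re-pointing every node visited on the
-- way directly at the root (path compression).
find :: Int -> DisjSet -> (Int, DisjSet)
find x ds = case M.lookup x ds of
  Nothing -> (x, ds)
  Just p  ->
    let (r, ds') = find p ds
    in  (r, M.insert x r ds')

-- | Merge the sets of @x@ and @y@, making @y@'s root the representative.
union :: Int -> Int -> DisjSet -> DisjSet
union x y ds0
  | rx == ry  = ds2
  | otherwise = M.insert rx ry ds2
  where
    (rx, ds1) = find x ds0
    (ry, ds2) = find y ds1
```

Since `find` returns an updated structure, callers have to thread it through, which fits naturally into a state monad or into a joint graph/disjoint-set value.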

Implement subsumption

Implement the subsumption operation.

Use it to solve #1 afterwards.

Implement feature graph equality

At the moment, two graphs which represent the same feature structure are not necessarily equal, because their internal identifiers may (and in practice do) differ between FG instances.
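
One way to compare graphs independently of their identifiers is to traverse both from their roots in parallel while building a one-to-one correspondence between identifiers. The sketch below assumes a hypothetical map-based representation with explicit roots. It checks structural identity up to renaming, so a graph that duplicates a leaf is not equal to one that shares it, which may or may not be the desired notion of equality.

```haskell
import Control.Monad (foldM)
import Data.Maybe (isJust)
import qualified Data.Map.Strict as M

type ID = Int

-- | Hypothetical minimal feature graph: a node is atomic or has labelled edges.
data Node = Leaf String | Inner (M.Map String ID)
  deriving (Show, Eq)

type Graph = M.Map ID Node

-- | Correspondence built so far: left-to-right and right-to-left.
type Corr = (M.Map ID ID, M.Map ID ID)

-- | Check that the subgraphs rooted at @i@ (in @g@) and @j@ (in @h@) are
-- equal up to a one-to-one renaming of identifiers, extending the
-- correspondence along the way.
match :: Graph -> Graph -> ID -> ID -> Corr -> Maybe Corr
match g h i j (l, r) =
  case (M.lookup i l, M.lookup j r) of
    -- both already paired, and with each other
    (Just j', Just i') | j' == j && i' == i -> Just (l, r)
    -- neither seen yet: pair them and compare their contents
    (Nothing, Nothing) -> do
      x <- M.lookup i g
      y <- M.lookup j h
      let c = (M.insert i j l, M.insert j i r)
      case (x, y) of
        (Leaf a, Leaf b) | a == b -> Just c
        (Inner es, Inner fs) | M.keys es == M.keys fs ->
          foldM (\c' (a, b) -> match g h a b c') c
                (zip (M.elems es) (M.elems fs))
        _ -> Nothing
    -- one side is paired with something else
    _ -> Nothing

-- | Equality of two rooted graphs, ignoring the concrete identifiers.
equalFrom :: (Graph, ID) -> (Graph, ID) -> Bool
equalFrom (g, i) (h, j) = isJust (match g h i j (M.empty, M.empty))
```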

A property to test

Does unification satisfy the following property:

compare x y = compare (unify z x) (unify z y)

if it succeeds?

Represent feature graph and disjoint-set together

Following #7, it seems the right thing to do regardless of the interface of the joining module.

Motivation: if there are any external identifiers pointing to nodes of the feature graph, we will need to update them after performing node-merging. Actually, it concerns internal identifiers as well, which are kept in graph edges.

Concern: it will complicate the implementation of the feature graph and make some of the operations less efficient, which is a bad thing in cases where we don't really need it. But: even when we don't care about external pointers, we still need to update the identifiers kept in edges if we keep the two structures (graph and disjoint-set) separate. What's more, the situation in which we really care about the efficiency of the joining process is parsing & unification, which involves handling structures of external pointers.
