feature-structure's People

Contributors

kawu

feature-structure's Issues

List pattern matching

It is possible to encode a list as a feature structure using the Tree.list function. There should also be symmetric functions for pattern matching on lists encoded this way.
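A minimal sketch of what such symmetric functions might look like. Everything here is an assumption made for illustration: the `FT` type, the `first`/`rest`/`nil` encoding, and the names `encodeList`, `matchList` and `matchCons` are hypothetical and are not taken from the library, whose `Tree.list` may encode lists differently.

```haskell
-- Hypothetical tree-shaped feature structure (NOT the library's Tree type).
data FT
  = Atom String          -- atomic value
  | Node [(String, FT)]  -- features with their sub-structures
  deriving (Show, Eq)

-- Assumed encoding: a list is a chain of first/rest cells ending in "nil".
encodeList :: [FT] -> FT
encodeList = foldr cons (Atom "nil")
  where
    cons x xs = Node [("first", x), ("rest", xs)]

-- Match a single cons cell, returning its head and its (still encoded) tail.
matchCons :: FT -> Maybe (FT, FT)
matchCons (Node fs) = (,) <$> lookup "first" fs <*> lookup "rest" fs
matchCons _         = Nothing

-- Match a whole encoded list; fails if the structure is not a proper list.
matchList :: FT -> Maybe [FT]
matchList (Atom "nil") = Just []
matchList ft = do
  (x, rest) <- matchCons ft
  xs        <- matchList rest
  Just (x : xs)
```

Under these assumptions, `matchList` is the inverse of `encodeList`, and `matchCons` allows matching on the head of a list without decoding the rest.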

Safe unification

Right now the library provides only a unification function which assumes that the sets of node identifiers of the graphs being unified are disjoint. It should also provide a more convenient and safer interface for unification (perhaps monadic, given that external identifiers may need to be changed as well).

Allow to specify order on the elements of the rule's body

It seems that it is possible to specify an order on the elements of a phrase-structure rule in LFG* (but it would be prudent to double-check this statement).

If so, it would be useful to have this functionality in our implementation as well.

*Based on: LFG Syntactic Theory, Winter Semester 2009/2010, Antske Fokkens

Two frontier nodes with the same value?

A follow-up to issue #27: a graph can be constructed which has two different frontier nodes (with different IDs) holding the same atomic value. As a result, two graphs which appear to be equal are not considered equal.

Parsing & unification: can we guarantee that nodes of the input graphs are disjoint?

Motivation: if it's not possible to make this assumption, we will generally have to rename (or prefix) nodes in both input graphs before performing unification. Edges will need to be modified as well. But that's not all -- apart from graphs, there are also accompanying structures which hold pointers (identifiers) to graph nodes and which take part in unification. These structures will have to be modified too. All in all, there's a lot of work (which probably lifts complexity to a higher level) which could be avoided if only we could assume "disjointness".

Proposition: all feature structures taking part in the parsing algorithm are introduced at the preprocessing stage. During the proper parsing phase no new feature structures are introduced.

Conclusion: if the proposition above is true, we can re-identify all feature structures in the preprocessing stage and therefore ensure that the assumption (disjointness of graph nodes) will hold during the entire algorithm.
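The re-identification step could be sketched as follows. The `Graph` type below is a simplified, hypothetical stand-in for the library's feature graph, and `shift` and `reidentify` are illustrative names, not existing functions. Each graph is returned together with the offset applied to it, so that external pointers into that graph can be shifted by the same amount.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical, simplified feature graph (NOT the library's actual type).
type NodeID = Int
type Feat   = String

data Node
  = Interior (M.Map Feat NodeID)  -- outgoing edges labelled with features
  | Frontier String               -- atomic value
  deriving (Show, Eq)

type Graph = M.Map NodeID Node

-- Shift every identifier (node keys and edge targets) by a constant offset.
shift :: Int -> Graph -> Graph
shift k = M.mapKeysMonotonic (+ k) . M.map shiftNode
  where
    shiftNode (Interior es) = Interior (M.map (+ k) es)
    shiftNode n             = n

-- Re-identify graphs so that their node sets are pairwise disjoint.
-- The first component of each result is the offset that was applied.
reidentify :: [Graph] -> [(Int, Graph)]
reidentify = go 0
  where
    go _ [] = []
    go next (g : gs) =
      let k     = if M.null g then 0 else next - fst (M.findMin g)
          g'    = shift k g
          next' = if M.null g' then next else fst (M.findMax g') + 1
      in  (k, g') : go next' gs
```

If the proposition holds, this would only need to run once, over all feature structures of the grammar, before parsing starts.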

Functional node joining interface?

Perhaps it would be better to replace the monadic interface for joining nodes in a feature graph with a functional one. Right now, we are using a monadic interface mainly because:

  • We are using a pipe for logging,
  • There's an underlying state consisting of a disjoint-set of node identifiers, a queue of node pairs, and the feature graph itself.

Logging is not that important, especially now that we know what we are doing.

The queue is only valid within the scope of one join operation, because every time we join two nodes we want to perform the iterative node-merging process as long as the queue is not empty.

The feature graph can be an explicit argument and result of the join operation.

What remains is the disjoint-set over graph nodes. It is a by-product of the join operation as well. Otherwise, we would have to update all external pointers to graph nodes (as well as internal identifiers kept in edges), taking into account that some of the nodes might have been merged with others and subsequently removed.

If we don't want to update external identifiers, we need to make sure that all identifiers valid before the join are also valid after the operation. They may point to different nodes, but still -- they need to be valid. The only reasonable solution seems to be to represent the graph and the disjoint-set jointly, in one data structure.
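One possible shape for such a joint structure, as a rough sketch only. The types and the names `Slot`, `resolve` and `redirect` are hypothetical. The idea is that every identifier stays in the map forever: it either carries a node or links to the identifier it was merged into. Merging the contents of the two nodes (the actual join) is deliberately left out.

```haskell
import qualified Data.Map.Strict as M

type NodeID = Int
type Feat   = String

data Node
  = Interior (M.Map Feat NodeID)
  | Frontier String
  deriving (Show, Eq)

-- An identifier is either a representative carrying a node,
-- or a link to the identifier it has been merged into.
data Slot = Rep Node | Link NodeID
  deriving (Show, Eq)

newtype Graph = Graph (M.Map NodeID Slot)
  deriving (Show, Eq)

-- Follow links until a representative is reached.
resolve :: Graph -> NodeID -> Maybe (NodeID, Node)
resolve gr@(Graph m) i = case M.lookup i m of
  Nothing       -> Nothing
  Just (Rep n)  -> Just (i, n)
  Just (Link j) -> resolve gr j

-- Make the representative of `from` point to the representative of `to`.
-- Old identifiers remain valid; they simply resolve to a different node.
redirect :: NodeID -> NodeID -> Graph -> Maybe Graph
redirect from to gr@(Graph m) = do
  (f, _) <- resolve gr from
  (t, _) <- resolve gr to
  pure $ if f == t then gr else Graph (M.insert f (Link t) m)
```

With this layout neither external pointers nor edge targets need rewriting after a merge, at the cost of an extra `resolve` step on each access.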

AVM interface

The library should provide an interface for defining feature graphs using a notation similar to attribute-value matrices.

Type of `runJoinIO`

Do we gain anything by specifying that the function can run in any MonadIO context? Perhaps it would be just as well to make it run directly in IO?

Implement alternative

At the moment it is not possible to represent an alternative (e.g. that a given feature/attribute has one of two possible values).

Graph equality doesn't check homomorphism

Two graphs which are not homomorphic are currently recognized as equal if they represent the same structure from the unification point of view. This situation happens if in (at least) one of the graphs, one of its subgraphs is duplicated (e.g. there are two leaves which have exactly the same atomic value).

This stems from the fact that feature graphs are not required to be minimal.

Two questions arise:

  • Would it make sense to require feature graphs to be minimal?
  • Should equality verify that graphs are homomorphic?

QuickCheck tests

We could check correctness of the unification algorithm w.r.t. the mathematical representation of feature structures. The result of unifying u with v is their least upper bound w.r.t. the partial order specified by the subsumption relation. While the subsumption relation should be easy to implement, it can be harder to find the least upper bound.

At least, we could check if the result of the unification is an upper bound (i.e. if both u and v subsume it). The question is whether such a test will be very useful.
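The upper-bound check can be illustrated on a deliberately simplified model: flat feature structures with atomic values only, no nesting and no reentrancy. The type `FS` and the functions `subsumes`, `unify` and `prop_upperBound` are assumptions made for this sketch and do not correspond to the library's API.

```haskell
import qualified Data.Map.Strict as M

-- Flat feature structure: each feature has an atomic value.
type FS = M.Map String String

-- u subsumes w if every feature/value pair of u also occurs in w.
subsumes :: FS -> FS -> Bool
subsumes = M.isSubmapOf

-- Unification fails if the two structures disagree on a shared feature.
unify :: FS -> FS -> Maybe FS
unify u v
  | and (M.elems (M.intersectionWith (==) u v)) = Just (M.union u v)
  | otherwise                                   = Nothing

-- If unification succeeds, both arguments must subsume the result.
prop_upperBound :: FS -> FS -> Bool
prop_upperBound u v = case unify u v of
  Nothing -> True
  Just w  -> u `subsumes` w && v `subsumes` w
```

A property of this shape could be handed to QuickCheck once a generator for the real feature graphs exists. Note that it only checks that the result is an upper bound, not that it is the least one.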

Combining two not-fully processed rules

How would the behavior of the parsing algorithm change if we allowed combining two not fully processed rules? Note that, at the moment, only the combination of a not fully processed rule with a fully processed rule (which will be unified with the first unprocessed body element of the first rule) is allowed.

Towards LFG

Features

To describe LFG grammars with our formalism the following features would be needed:

  • Representing sets with feature structures.
  • "We require the f-structure solution for a particular f-description to be the minimal solution to the f-description" <- does it have any implications for the unification process?
  • Distinction between defining and constraining equations. In particular, it seems that implementing constraining equations would require extending the current unification mechanism.
  • Other types of defining/constraining equations: negative equation, existential constraint, etc.
  • Boolean expressions over (defining/constraining) equations.
  • Regular expressions (there's an example with the Kleene star in [1], page 17) over paths in equations. The so-called functional uncertainty.
  • Inside-out function application.
  • Off-path constraints.

See also http://www2.parc.com/isl/groups/nltt/xle/doc/notations.html.

Off-path constraints

Hypothesis: off-path constraints provide means to describe tree-structured constraints.
Question 1: is it possible to describe graph-structured constraints?
Question 2: is it possible to describe tree-structured constraints with several roots?

NOTE: the hypothesis is not quite true: to represent a tree of constraints we can just use a set of path constraints. The problem is the semantics of regular and off-path constraints with respect to feature sets and quantification scope (at least, such an issue seems to exist in LFG).

References

[1] Lexical Functional Grammar, Mary Dalrymple

Grammar development: get rid of global-level identifiers

The library should provide functions which would make it possible to define a grammar without resorting to global-level identifiers (features, values). Such identifiers are not statically checked and they make the grammar development error-prone.

Improve disjoint-set implementation

Right now the implementation doesn't use path compression, for example. This is enough for presentation purposes, but should be improved in the final implementation.
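For reference, a purely functional disjoint-set with path compression might look roughly like this. It is a sketch and not the library's current implementation; the names `DisjSet`, `emptyDS`, `findRep` and `unionDS` are made up for the example, and union by rank is left out for brevity.

```haskell
import qualified Data.Map.Strict as M

-- Parent pointers. A key absent from the map is its own representative.
newtype DisjSet = DisjSet (M.Map Int Int)
  deriving (Show)

emptyDS :: DisjSet
emptyDS = DisjSet M.empty

-- Find the representative of x and compress the path:
-- every node visited on the way is re-pointed directly at the root.
findRep :: Int -> DisjSet -> (Int, DisjSet)
findRep x ds@(DisjSet m) = case M.lookup x m of
  Nothing -> (x, ds)
  Just p  ->
    let (r, DisjSet m') = findRep p ds
    in  (r, DisjSet (M.insert x r m'))

-- Merge the sets containing x and y; the root of x is attached to the root of y.
unionDS :: Int -> Int -> DisjSet -> DisjSet
unionDS x y ds0 =
  let (rx, ds1)       = findRep x ds0
      (ry, DisjSet m) = findRep y ds1
  in  if rx == ry then DisjSet m else DisjSet (M.insert rx ry m)
```

Because `findRep` returns an updated structure, callers have to thread the new `DisjSet` through to benefit from the compression.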

Implement feature graph equality

At this moment two graphs which represent the same feature structure are not necessarily equal because of internal identifiers which may (and in practice do) differ between different FG instances.
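One way to compare graphs independently of their internal identifiers is to walk both from their roots in parallel while building a one-to-one correspondence between node IDs. The sketch below assumes a simplified, hypothetical graph type with an explicit root; `equalUpToIDs` is an illustrative name. Note that this is a strict notion of equality: a graph with one shared frontier node is not equal to a graph with two frontier nodes holding the same value.

```haskell
import           Control.Monad   (foldM)
import qualified Data.Map.Strict as M
import           Data.Maybe      (isJust)

type NodeID = Int

data Node
  = Interior (M.Map String NodeID)
  | Frontier String
  deriving (Show, Eq)

-- Hypothetical rooted feature graph (NOT the library's actual type).
data Graph = Graph { root :: NodeID, nodes :: M.Map NodeID Node }
  deriving (Show)

-- True if the parts reachable from the roots are identical up to a
-- renaming of node identifiers.
equalUpToIDs :: Graph -> Graph -> Bool
equalUpToIDs g h = isJust (go (root g) (root h) (M.empty, M.empty))
  where
    -- fwd maps IDs of g to IDs of h, bwd is the inverse mapping.
    go i j (fwd, bwd) = case (M.lookup i fwd, M.lookup j bwd) of
      (Just j', Just i')
        | j' == j && i' == i -> Just (fwd, bwd)  -- pair already established
      (Nothing, Nothing) -> do
        n <- M.lookup i (nodes g)
        m <- M.lookup j (nodes h)
        let acc = (M.insert i j fwd, M.insert j i bwd)
        case (n, m) of
          (Frontier a, Frontier b)
            | a == b -> Just acc
          (Interior es, Interior fs)
            | M.keys es == M.keys fs ->
                foldM (\s (a, b) -> go a b s) acc
                      (zip (M.elems es) (M.elems fs))
          _ -> Nothing
      _ -> Nothing  -- one of the nodes is already paired with another node
```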

A property to test

Does unification satisfy the following property:

  • compare x y = compare (unify z x) (unify z y)

if it succeeds?

Represent feature graph and disjoint-set together

Following #7, it seems the right thing to do regardless of the interface of the joining module.

Motivation: if there are any external identifiers pointing to nodes of the feature graph, we will need to update them after performing node-merging. Actually, it concerns internal identifiers as well, which are kept in graph edges.

Concern: it will complicate the implementation of the feature graph and make some of the operations less efficient, which is a bad thing when we really don't need it. But: even when we don't care about external pointers, we still need to update identifiers kept in edges if we keep the two structures (graph and disjoint-set) separate. What's more, the situation in which we really care about the efficiency of the joining process is parsing & unification, which involves handling structures of external pointers.
