Comments (4)

unhammer commented on May 30, 2024

Here's a minimal lsx rule that recreates the situation (the actual rules use pardefs to capture full adverbials of different kinds; I've just expanded them to exactly what matches):

    <e>
      <i><w/><s n="pr"/><t/><d/></i>
      <i><w/><s n="adj"/><t/><d/></i>
      <i><w/><s n="n"/><t/><d/></i>
      <p><l>foreligge</l>          <r>ligge</r></p>  <i><t/><d/></i>
      <i><w/><s n="adv"/><t/><d/></i>
      <i><w/><s n="adj"/><t/><d/></i>
      <i><w/><s n="n"/><t/><d/></i>
      <p><l></l>                   <r>fore<s n="adv"/><d/></r></p>
    </e>

The part before the verb is there to create a longer match, since one might want different idioms in different contexts. But you'd get the same behaviour with a simpler rule and marking a different word:

    <e>
      <p><l>foreligge</l>          <r>ligge</r></p>  <i><t/><d/></i>
      <i><w/><s n="adv"/><t/><d/></i>
      <i><w/><s n="adj"/><t/><d/></i>
      <i><w/><s n="n"/><t/><d/></i>
      <p><l></l>                   <r>fore<s n="adv"/><d/></r></p>
    </e>

    $ echo 'Hos personer med atopisk eksem foreligger ofte <a class="crossref" href="https://sml.snl.no/arv">arvelige</a> faktorer' | apertium -u -f html-noent -d . nob-nno_e
    Hos personar <a class="crossref" href="https://sml.snl.no/arv">med atopisk eksem ligg ofte arvelege faktorar føre</a>

This is exactly how lsx is meant to be used, so if we want it not to spread tags that way I guess we have to do something to lsx. @mr-martian any ideas?

from apertium-separable.

mr-martian commented on May 30, 2024

It would be possible (annoying with pardefs but possible) to detect which LUs are unchanged and mark those as such somehow in the FST, and then any with such a mark don't get wblank spreading. The only question then is how to represent this in the FST itself.

unhammer commented on May 30, 2024

Would it be easier to detect it in the compiled FST and do some FST postprocessing? I.e., find paths where it's all identities before the <d/>.

Marking whole pardefs as identity-only wouldn't help if there's a pardef that's got one <e><i>a</i></e> and one <e><p><l>a</l><r>b</r></p></e>: only when matching the first one do we want to avoid tag-spreading.
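
To illustrate why the check has to be per path rather than per pardef, here is a rough Python sketch. It is not lsx code: `Pair`, `Pardef`, and `identity_flags` are hypothetical names, and each pardef is modelled simply as a list of (left, right) alternatives for one LU slot.

```python
from itertools import product
from typing import List, Tuple

Pair = Tuple[str, str]   # hypothetical: (left, right) side of one LU slot
Pardef = List[Pair]      # hypothetical: the alternatives a pardef offers for that slot


def identity_flags(slots: List[Pardef]) -> List[Tuple[Tuple[Pair, ...], Tuple[bool, ...]]]:
    """Expand every path through the slots and flag, per slot, whether
    left == right on that particular path."""
    return [
        (path, tuple(left == right for left, right in path))
        for path in product(*slots)
    ]


# The first slot mirrors the pardef above: one identity entry, one a -> b entry.
paths = identity_flags([[("a", "a"), ("a", "b")], [("c", "c")]])
for path, flags in paths:
    print(path, flags)
```

The same pardef yields an identity on one path and a change on the other, so any "unchanged" mark would have to live on the path, not on the pardef.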

(Alternatively, is this something that could be more easily checked at runtime?)

mr-martian commented on May 30, 2024

Runtime might not actually be a bad idea. We could check if any of the output words are identical to any of the input words, and if they are, then they only get the wblank of that input word. This only becomes a problem if it turns out to be noticeably slower or if the input contains multiple identical words. Arguably there's an issue if the rule deletes a word and creates another one that happens to be identical, but I think we can just count that as moving it.
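
A minimal Python sketch of that runtime check, under several assumptions that the thread does not settle: LUs are modelled as plain `(wblank, text)` tuples, duplicate words are paired left to right, and output words with no identical input receive the merged wblanks of the inputs that were changed or deleted. `assign_wblanks` and the wblank strings are hypothetical, not part of lsx.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional, Tuple

LU = Tuple[Optional[str], str]  # hypothetical model: (wblank or None, LU text)


def assign_wblanks(inputs: List[LU], outputs: List[str]) -> List[LU]:
    # Queue input positions per LU text so duplicates pair up left to right.
    positions: Dict[str, Deque[int]] = defaultdict(deque)
    for i, (_, text) in enumerate(inputs):
        positions[text].append(i)

    # Pair each output with an identical, not yet used input, if there is one.
    paired: List[Optional[int]] = []
    for text in outputs:
        queue = positions.get(text)
        paired.append(queue.popleft() if queue else None)

    # Assumption: only inputs without an identical output donate to the spread.
    used = {i for i in paired if i is not None}
    leftover: List[str] = []
    for i, (wblank, _) in enumerate(inputs):
        if i not in used and wblank is not None and wblank not in leftover:
            leftover.append(wblank)
    merged = "; ".join(leftover) if leftover else None

    # Unchanged words keep their own wblank; the rest get the merged leftovers.
    return [
        (inputs[i][0] if i is not None else merged, text)
        for i, text in zip(paired, outputs)
    ]


# Rough analogue of the example above; "t:a:x" stands in for the link's wblank.
result = assign_wblanks(
    [(None, "foreligge<vblex>"), (None, "ofte<adv>"),
     ("t:a:x", "arveleg<adj>"), (None, "faktor<n>")],
    ["ligge<vblex>", "ofte<adv>", "arveleg<adj>", "faktor<n>", "fore<adv>"],
)
print(result)
```

In this sketch only `arveleg<adj>` keeps the link. A word that the rule deletes and recreates identically is treated as moved, in line with the comment above.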
