Here's a minimal lsx rule that recreates the situation (the actual rules use pardefs to capture full adverbials of different kinds; I just expanded them to exactly what matches):
```xml
<e>
<i><w/><s n="pr"/><t/><d/></i>
<i><w/><s n="adj"/><t/><d/></i>
<i><w/><s n="n"/><t/><d/></i>
<p><l>foreligge</l> <r>ligge</r></p> <i><t/><d/></i>
<i><w/><s n="adv"/><t/><d/></i>
<i><w/><s n="adj"/><t/><d/></i>
<i><w/><s n="n"/><t/><d/></i>
<p><l></l> <r>fore<s n="adv"/><d/></r></p>
</e>
```
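To make the rule's effect concrete, here is a rough LU-level model in Python of what it matches and what it outputs. This is a hypothetical sketch, not how lsx-comp works (it compiles rules into an FST); `first_tag`, `apply_rule` and the tags on the example LUs are illustrative assumptions.

```python
import re
from typing import List, Optional

# Hypothetical LU-level model of the minimal rule above.
# Each slot gives the required first tag; None marks the "foreligge" slot.
PATTERN: List[Optional[str]] = ["pr", "adj", "n", None, "adv", "adj", "n"]


def first_tag(lu: str) -> Optional[str]:
    """Return the first tag of an LU such as 'arvelig<adj><pl>' ('adj')."""
    m = re.search(r"<([^>]+)>", lu)
    return m.group(1) if m else None


def apply_rule(lus: List[str]) -> Optional[List[str]]:
    """Rewrite a matching LU sequence, or return None if the rule doesn't match."""
    if len(lus) != len(PATTERN):
        return None
    out: List[str] = []
    for lu, tag in zip(lus, PATTERN):
        if tag is None:
            # <p><l>foreligge</l><r>ligge</r></p><i><t/><d/></i>: swap lemma, keep tags
            if not lu.startswith("foreligge<"):
                return None
            out.append("ligge" + lu[len("foreligge"):])
        elif first_tag(lu) == tag:
            # <i><w/><s n="..."/><t/><d/></i>: copied through unchanged
            out.append(lu)
        else:
            return None
    # <p><l></l><r>fore<s n="adv"/><d/></r></p>: a new LU added at the end
    return out + ["fore<adv>"]


sentence = ["med<pr>", "atopisk<adj>", "eksem<n>", "foreligge<vblex><pres>",
            "ofte<adv>", "arvelig<adj><pl>", "faktor<n><pl>"]
print(apply_rule(sentence))
```

Everything before and after the verb is copied through untouched; only the verb and the inserted `fore` are new.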
The part before the verb is there to create a longer match, since one might want different idioms in different contexts. But you'd get the same behaviour with a simpler rule and a different word marked up:
```xml
<e>
<p><l>foreligge</l> <r>ligge</r></p> <i><t/><d/></i>
<i><w/><s n="adv"/><t/><d/></i>
<i><w/><s n="adj"/><t/><d/></i>
<i><w/><s n="n"/><t/><d/></i>
<p><l></l> <r>fore<s n="adv"/><d/></r></p>
</e>
```
```
echo 'Hos personer med atopisk eksem foreligger ofte <a class="crossref" href="https://sml.snl.no/arv">arvelige</a> faktorer' | apertium -u -f html-noent -d . nob-nno_e
Hos personar <a class="crossref" href="https://sml.snl.no/arv">med atopisk eksem ligg ofte arvelege faktorar føre</a>
```
This is exactly how lsx is meant to be used, so if we don't want it to spread tags that way, I guess we have to change something in lsx. @mr-martian, any ideas?
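A simplified model of the behaviour above, assuming lsx-proc gives every output LU of a match the combined wblanks of all the input LUs it consumed. The `Token` type, the `a:arv` blank label and `spread_wblanks` are hypothetical; real wblanks carry the actual formatting information.

```python
from typing import List, Optional, Tuple

# (wblank, LU): the wblank stands for the inline markup on a word, e.g. the
# <a class="crossref"> link on "arvelige"; None means no markup.
# Hypothetical simplification, not lsx-proc's actual code.
Token = Tuple[Optional[str], str]


def spread_wblanks(matched: List[Token], outputs: List[str]) -> List[Token]:
    """Give every output LU the combined wblanks of all matched input LUs."""
    blanks = [b for b, _ in matched if b is not None]
    merged = ";".join(dict.fromkeys(blanks)) or None  # dedupe, keep order
    return [(merged, lu) for lu in outputs]


matched = [(None, "foreligge<vblex><pres>"), (None, "ofte<adv>"),
           ("a:arv", "arvelig<adj><pl>"), (None, "faktor<n><pl>")]
outputs = ["ligge<vblex><pres>", "ofte<adv>", "arvelig<adj><pl>",
           "faktor<n><pl>", "fore<adv>"]
for blank, lu in spread_wblanks(matched, outputs):
    print(blank, lu)
```

Since every output LU then carries the same markup, the HTML output plausibly ends up with one link around the whole match, which is the pattern in the output above.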
from apertium-separable.
It would be possible (annoying with pardefs, but possible) to detect which LUs are unchanged and somehow mark them as such in the FST, so that any LU carrying that mark doesn't get wblank spreading. The only open question is how to represent the mark in the FST itself.
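The detection step could look roughly like this: split a rule (with pardefs already expanded) at each `<d/>`, and flag an LU as unchanged when every segment is an `<i>` identity or a `<p>` whose two sides are equal. `Ident`, `Pair` and `unchanged_lus` are hypothetical names; how the flag would be stored in the FST is left open, as in the comment.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Ident:
    """<i>...</i>: the same material on both sides."""
    text: str


@dataclass(frozen=True)
class Pair:
    """<p><l>...</l><r>...</r></p>: left side rewritten to right side."""
    left: str
    right: str


Segment = Union[Ident, Pair]


def unchanged_lus(rule: List[List[Segment]]) -> List[bool]:
    """One flag per output LU (the segments up to each <d/>): True when
    every segment copies its input through unchanged."""
    return [
        all(isinstance(s, Ident) or s.left == s.right for s in lu)
        for lu in rule
    ]


# The simpler rule from the issue, already split at each <d/>.
rule = [
    [Pair("foreligge", "ligge"), Ident("<t/>")],
    [Ident('<w/><s n="adv"/><t/>')],
    [Ident('<w/><s n="adj"/><t/>')],
    [Ident('<w/><s n="n"/><t/>')],
    [Pair("", 'fore<s n="adv"/>')],
]
print(unchanged_lus(rule))  # [False, True, True, True, False]
```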
Would it be easier to detect this in the compiled FST and do some FST post-processing? I.e., find paths where everything before the <d/> is an identity.
Marking whole pardefs as identity-only wouldn't help if a pardef has one entry <e><i>a</i></e> and another <e><p><l>a</l><r>b</r></p></e>: only when matching the first one do we want to avoid tag-spreading.
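A sketch of the post-processing idea on a toy transducer with one symbol per transition (a hypothetical representation, not lttoolbox's). It explores (state, identical-so-far) pairs and reports each output `<d>` transition together with whether the LU it closes was pure identity. The second example illustrates the pardef problem: if the identity entry and the rewriting entry lead into the same state (as they may after minimisation), the same `<d>` transition is reached both ways, so a mark on that transition alone can't distinguish them without splitting states.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical toy transducer: state -> [(input, output, target)].
# "" is epsilon, "<d>" the LU boundary.
FST = Dict[int, List[Tuple[str, str, int]]]


def boundary_flags(fst: FST, start: int) -> Set[Tuple[int, int, bool]]:
    """Report every output <d> transition as (source, target, identical)."""
    seen: Set[Tuple[int, bool]] = set()
    queue = deque([(start, True)])
    found: Set[Tuple[int, int, bool]] = set()
    while queue:
        state, ident = queue.popleft()
        if (state, ident) in seen:
            continue
        seen.add((state, ident))
        for inp, out, target in fst.get(state, []):
            if out == "<d>":
                found.add((state, target, ident and inp == out))
                queue.append((target, True))  # the next LU starts fresh
            else:
                queue.append((target, ident and inp == out))
    return found


# The simpler rule, heavily abbreviated: one symbol stands for a whole word.
rule_fst: FST = {
    0: [("foreligge", "ligge", 1)],
    1: [("<t>", "<t>", 2)],
    2: [("<d>", "<d>", 3)],
    3: [("adv", "adv", 4)],
    4: [("<d>", "<d>", 5)],
    5: [("", "fore", 6)],
    6: [("", "<d>", 7)],
}
# Pardef with <e><i>a</i></e> and <e><p><l>a</l><r>b</r></p></e>.
pardef_fst: FST = {
    0: [("a", "a", 1), ("a", "b", 1)],
    1: [("<d>", "<d>", 2)],
}
print(sorted(boundary_flags(rule_fst, 0)))
print(sorted(boundary_flags(pardef_fst, 0)))
```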
(Alternatively, is this something that could be more easily checked at runtime?)
Runtime might actually not be a bad idea. We could check whether any of the output words are identical to any of the input words; if they are, they get only the wblank of that input word. This only becomes a problem if it turns out to be noticeably slower, or if the input contains multiple identical words. Arguably there's an issue if the rule deletes a word and creates another one that happens to be identical, but I think we can just count that as moving it.
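A sketch of the runtime check in the same simplified model (hypothetical `Token` type and `assign_wblanks` name). Two choices here are assumptions the thread doesn't settle: identical duplicates are paired leftmost-first, and rewritten LUs receive only the wblanks of inputs that no identical output claimed, so the link on *arvelige* no longer spreads.

```python
from typing import List, Optional, Tuple

# (wblank, LU); hypothetical simplification of the stream.
Token = Tuple[Optional[str], str]


def assign_wblanks(matched: List[Token], outputs: List[str]) -> List[Token]:
    """An output LU identical to a matched input LU keeps only that input's
    wblank. Each input is claimed at most once, leftmost first (assumption)."""
    unclaimed = list(range(len(matched)))
    claims: List[Optional[int]] = []
    for lu in outputs:
        idx = next((i for i in unclaimed if matched[i][1] == lu), None)
        if idx is not None:
            unclaimed.remove(idx)
        claims.append(idx)
    # Assumption: rewritten LUs get the combined wblanks of the inputs that no
    # identical output claimed, instead of every wblank in the match.
    rest = [matched[i][0] for i in unclaimed if matched[i][0] is not None]
    merged = ";".join(dict.fromkeys(rest)) or None
    return [(merged, lu) if i is None else (matched[i][0], lu)
            for i, lu in zip(claims, outputs)]


matched = [(None, "foreligge<vblex><pres>"), (None, "ofte<adv>"),
           ("a:arv", "arvelig<adj><pl>"), (None, "faktor<n><pl>")]
outputs = ["ligge<vblex><pres>", "ofte<adv>", "arvelig<adj><pl>",
           "faktor<n><pl>", "fore<adv>"]
for blank, lu in assign_wblanks(matched, outputs):
    print(blank, lu)
```

Because pairing ignores position, a word that the rule deletes and re-creates identically is simply treated as moved.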