Comments (7)
Proposal:
As discussed in #11, it would be nice if `<j/>` could be `+` as elsewhere, so add `<d/>` as a word boundary. For backwards-compatibility, we could put the new behavior of `<j/>` behind a command-line flag or just regex-replace all the existing files.
To solve both this issue and #11, I propose the following:
| XML symbol | transducer symbol | behavior |
|---|---|---|
| `<d space="auto"/>` or `<d/>` | `<$>` | use the next entry (possibly empty) in the blank queue (the current behavior of `<j/>`) |
| `<d space="no"/>` | `<$->` | do not pop a blank off the queue |
| `<d space="yes"/>` | `<$_>` | pop a blank off the queue and replace it by a space if it's empty |
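
To make the three behaviors concrete, here is a toy model in Python. It is only a sketch of the semantics proposed in the table, not how lsx-proc is implemented: the function name `boundary_blank`, the use of a `deque` for the blank queue, and the choice to treat an exhausted queue like an empty blank are all assumptions made for illustration.

```python
from collections import deque


def boundary_blank(mode, blanks):
    """Return the text to print at a word boundary (toy model of the proposal).

    mode   -- value of the proposed space attribute: "auto", "no" or "yes"
    blanks -- deque of blanks read from the input, oldest first
    """
    if mode not in ("auto", "no", "yes"):
        raise ValueError(f"unknown space mode: {mode!r}")
    if mode == "no":
        # <$->: print nothing and leave the queue untouched.
        return ""
    # Assumption: an exhausted queue behaves like an empty blank.
    blank = blanks.popleft() if blanks else ""
    if mode == "yes" and blank == "":
        # <$_>: an empty blank is replaced by a single space.
        return " "
    # <$>, or <$_> with a non-empty blank: use the queued blank as-is.
    return blank


queue = deque([" ", "", "[<b>]"])
print(repr(boundary_blank("auto", queue)))  # pops the first blank
print(repr(boundary_blank("no", queue)))    # queue is not touched
print(repr(boundary_blank("yes", queue)))   # pops the empty blank, prints a space
```

Note that in this model `space="no"` leaves the unused blank in the queue for a later boundary; whether that is the desired behavior is part of what the proposal would need to settle.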
@ftyers, @jonorthwash, @unhammer - does this seem reasonable?
If this sounds good to everybody, it shouldn't take more than an hour or so to set up.
from apertium-separable.
Would work for me :-)
I'd prefer regex-replace, seems like something one can fairly safely do.
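
A minimal sketch of what such a regex-replace could look like, assuming the migration is simply rewriting every existing `<j/>` to `<d/>` so that the old word-boundary meaning is preserved. The helper name `migrate_j_to_d` is hypothetical, and the pattern is deliberately naive: it would also rewrite a `<j/>` that appears inside an XML comment.

```python
import re

# Matches <j/> with optional whitespace before the slash, e.g. "<j />".
J_TAG = re.compile(r"<j\s*/>")


def migrate_j_to_d(xml_text):
    """Rewrite every <j/> in an lsx source string to <d/> (hypothetical helper)."""
    return J_TAG.sub("<d/>", xml_text)


print(migrate_j_to_d('<l>take<s n="vblex"/><j/>out<s n="adv"/></l>'))
```

Running this over each existing `.lsx` file before the new meaning of `<j/>` lands would keep old dictionaries behaving as they do now.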
Should we take three weeks of silence as consent then? :) would be nice to have a solution for this seemingly trivial issue …
… what if I just want a space without `^$`? If I do
```xml
<e>
  <p> <l>,<s n="cm"/><d/></l> <r></r></p>
  <i>og<s n="cnjcoo"/><s n="clb"/><d/></i>
</e>
```
to delete a comma, I get
```
$ lsx-comp lr apertium-dan-nor.nor-dan.lsx nor-dan.autoseq.bin && echo '^*foo$^,<cm>$ ^og<cnjcoo><clb>$' | lsx-proc nor-dan.autoseq.bin
main@standard 10 9
^*foo$^og<cnjcoo><clb>$
```
and the exact same whether I specify space=yes/no or not at all.
I can put `<d space="yes"/>` in the `<r>`, but then it of course outputs an empty `^$` (rtx-proc seems to drop it, but I'm guessing it could get in the way of the parse).
So the issue here is the handling of blanks at the boundaries of matches.
The hacky solution is just to make your lsx patterns 1 token longer.
The less-hacky solution is to come up with a way of representing the desired status of blanks in the FST.
And perhaps as a third level, we could make apertium-posttransfer more capable and have it redo spacing more generally.
On a related note: Is there a good solution for enforcing no-final-space on a `<par` usage without duplicating the whole pardef tree? E.g. I have a rule that inserts a comma (which should have no space in front of it) after a noun phrase. I have a noun phrase pardef that matches `(adj)*(ncmp)*n|prn|(ncmp)*np|det.something` via a multitude of other pardefs which all end in `<d/>` (so I don't have to add a final `<d/>` after each usage in a rule). Maybe I should just never end pardefs in the final `<d/>`, and explicitly add d's in the rule? Or is there a better way?
I can't think of a better way, though one thing you could do is
```xml
<pardef n="NP_space">
  <e><par n="NP"/><i><d/></i></e>
</pardef>
<pardef n="NP_nospace">
  <e><par n="NP"/><i><d space="no"/></i></e>
</pardef>
```
That would still require you to remove `<d/>` from a lot of pardefs, but without making you add it to all the rules.
For initial space, it occurs to me that I could special-case `<d/>` at runtime so that if it appears as the first symbol of the output it will manipulate spaces without creating an empty LU ... but then lsx-proc treats blanks as being attached to the preceding word. Drat.