
Comments (7)

mr-martian commented on May 29, 2024

Proposal:
As discussed in #11, it would be nice if <j/> could be + as elsewhere, so add <d/> as a word boundary. For backwards-compatibility, we could put the new behavior of <j/> behind a command-line flag or just regex-replace all the existing files.

To solve both this issue and #11, I propose the following:

| XML symbol | transducer symbol | behavior |
|---|---|---|
| <d space="auto"/> or <d/> | <$> | use the next entry (possibly empty) in the blank queue (the current behavior of <j/>) |
| <d space="no"/> | <$-> | do not pop a blank off the queue |
| <d space="yes"/> | <$_> | pop a blank off the queue and replace it by a space if it's empty |
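
To make the three proposed behaviors concrete, here is a minimal Python sketch of how a word boundary might consume the blank queue. It is only an illustration, not lsx-proc's actual code: the function name `emit_boundary`, the mode strings, and the choice to treat an exhausted queue like an empty blank are all assumptions.

```python
from collections import deque


def emit_boundary(mode, blanks):
    """Return the blank to print at a <d/> boundary (hypothetical helper).

    `blanks` is a deque of pending blank strings and is mutated in place.
    Assumption: an exhausted queue behaves like an empty blank.
    """
    if mode == "auto":  # <d/> or <d space="auto"/>
        # Use the next queued blank, which may itself be empty.
        return blanks.popleft() if blanks else ""
    if mode == "no":  # <d space="no"/>
        # Leave the queue untouched and print nothing.
        return ""
    if mode == "yes":  # <d space="yes"/>
        # Pop a blank, but fall back to a single space if it is empty.
        blank = blanks.popleft() if blanks else ""
        return blank or " "
    raise ValueError(f"unknown space mode: {mode!r}")


queue = deque(["", " "])
first = emit_boundary("yes", queue)    # empty blank popped, replaced by " "
second = emit_boundary("no", queue)    # nothing popped, nothing printed
third = emit_boundary("auto", queue)   # pops the remaining " "
```

With this model, space="no" is the only mode that leaves the queue the same length, which is what lets a later boundary still pick up the original blank.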

@ftyers, @jonorthwash, @unhammer - does this seem reasonable?

If this sounds good to everybody, it shouldn't take more than an hour or so to set up.

from apertium-separable.

unhammer commented on May 29, 2024

Would work for me :-)

I'd prefer regex-replace, seems like something one can fairly safely do.
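
For what it's worth, the regex-replace could be as small as the following Python sketch. The helper name `migrate_j_to_d` is made up for illustration, and the pattern assumes <j/> is always written as an attribute-free self-closing tag (optionally with whitespace before the slash); it would also rewrite occurrences inside XML comments.

```python
import re

# Matches <j/> and <j /> only; assumes the tag never carries attributes.
J_TAG = re.compile(r"<j\s*/>")


def migrate_j_to_d(text: str) -> str:
    """Rewrite every old-style <j/> word boundary as <d/> (hypothetical helper)."""
    return J_TAG.sub("<d/>", text)


sample = '<l>take<s n="vblex"/><j/>out<s n="adv"/><j /></l>'
migrated = migrate_j_to_d(sample)
```

Applying it to each existing .lsx file and reviewing the diff before committing should be enough to catch any unintended matches.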


unhammer commented on May 29, 2024

Should we take three weeks of silence as consent then? :) would be nice to have a solution for this seemingly trivial issue …


unhammer commented on May 29, 2024

… what if I just want a space without ^$? If I do


    <e>
      <p> <l>,<s n="cm"/><d/></l> <r></r></p>
      <i>og<s n="cnjcoo"/><s n="clb"/><d/></i>
    </e>

to delete a comma, I get

    $ lsx-comp lr apertium-dan-nor.nor-dan.lsx nor-dan.autoseq.bin && echo '^*foo$^,<cm>$ ^og<cnjcoo><clb>$' | lsx-proc nor-dan.autoseq.bin
    main@standard 10 9
    ^*foo$^og<cnjcoo><clb>$

and I get exactly the same output whether I specify space="yes", space="no", or nothing at all.

I can put <d space="yes"/> in the <r>, but then it of course outputs an empty ^$ (rtx-proc seems to drop it, but I'm guessing it could get in the way of the parse)


mr-martian commented on May 29, 2024

So the issue here is the handling of blanks at the boundaries of matches.

The hacky solution is just to make your lsx patterns 1 token longer.

The less-hacky solution is to come up with a way of representing the desired status of blanks in the FST.

And perhaps as a third level, we could make apertium-posttransfer more capable and have it redo spacing more generally.


unhammer commented on May 29, 2024

On a related note: is there a good solution for enforcing no-final-space on a <par usage without duplicating the whole pardef tree? E.g. I have a rule that inserts a comma (which should have no space in front of it) after a noun phrase. I have a noun phrase pardef that matches (adj)*(ncmp)*n|prn|(ncmp)*np|det.something via a multitude of other pardefs which all end in <d/> (so I don't have to add a final <d/> after each usage in a rule). Maybe I should just never end pardefs in the final <d/>, and explicitly add d's in the rule? Or is there a better way?


mr-martian commented on May 29, 2024

I can't think of a better way, though one thing you could do is

    <pardef n="NP_space">
      <e><par n="NP"/><i><d/></i></e>
    </pardef>
    <pardef n="NP_nospace">
      <e><par n="NP"/><i><d space="no"/></i></e>
    </pardef>

That would still require you to remove <d/> from a lot of pardefs, but without making you add it to all the rules.

For initial space, it occurs to me that I could special-case <d/> at runtime so that if it appears as the first symbol of the output it will manipulate spaces without creating an empty LU ... but then lsx-proc treats blanks as being attached to the preceding word. Drat.
