Comments (14)

unhammer commented on June 9, 2024

If we combine it with apertium/lttoolbox#104 we'll be back to zero =D

from apertium-separable.

khannatanmai commented on June 9, 2024

$ echo "rumal rech che" | apertium -d . quc-spa-separable
^rumal rech<cnjadv>$ ^chi<pr>$ ^ech<n><rel><px3sg>$^.<sent>$

Seems to work fine for me.

MarcRiera commented on June 9, 2024

Happening on my side as well, with a different pair:

$ echo "hello" | apertium -d . eng-cat-autoseq
^hello<n><sg>$^.<sent>$
?

hectoralos commented on June 9, 2024

It is happening in both apertium-fra-cat and apertium-fra-frp. For instance, in the first one:

$ echo "^maison<n><f><sg>$" | od -An -vtu1
  94 109  97 105 115 111 110  60 110  62  60 102  62  60 115 103
  62  36  10
$ echo "^maison<n><f><sg>$" | lsx-proc fra-cat.autosep.bin | od -An -vtu1
  94 109  97 105 115 111 110  60 110  62  60 102  62  60 115 103
  62  36  10  63

(apertium-separable has nothing to do with "maison" and should simply pass it through to the output; instead it appends an extra byte, 63, which is "?".)
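
To read the two `od` dumps side by side, the decimal bytes can be turned back into text. This is only an illustrative sketch; `decode_od_bytes` is a hypothetical helper and not part of any Apertium tool. It assumes the dump contains whitespace-separated decimal byte values, as printed by `od -An -vtu1`.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: turn whitespace-separated decimal byte values
// (the format printed by `od -An -vtu1`) back into the original bytes.
std::string decode_od_bytes(const std::string& dump)
{
  std::istringstream in(dump);
  std::string out;
  int value = 0;
  while (in >> value) {
    // Every value in such a dump is a single unsigned byte.
    assert(value >= 0 && value <= 255);
    out.push_back(static_cast<char>(value));
  }
  return out;
}
```

Applied to the second dump, the result is the original `^maison<n><f><sg>$` line, its newline (10), and one trailing `?` (63) that the first dump does not have.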

unhammer commented on June 9, 2024

I'm getting the ? if I use a locale other than C.UTF-8 (even utf8 locales):

$ locale -a|while read -r LANG; do echo "$LANG"; echo '^.<sent>$[][\n]' |  lsx-proc nob-nno.autoseq.bin | grep -c '?' ;done
C
1
C.UTF-8
0
POSIX
1
en_AG
1
en_GB.utf8
1
en_US.utf8
1
nn_NO.utf8
1

(since the ? is at the very end of the output without any trailing newline, my terminal doesn't always show it, but grep gives the correct answer)

unhammer commented on June 9, 2024

https://github.com/apertium/apertium-separable/blob/master/src/lsx_processor.cc#L164 pushes the ? into the blank queue, while
https://github.com/apertium/apertium-separable/blob/master/src/lsx_processor.cc#L157 puts it into parts[0]

unhammer commented on June 9, 2024

So it seems like it's a -1? If I do

diff --git a/src/lsx_processor.cc b/src/lsx_processor.cc
index 5c9aec7..555a47b 100644
--- a/src/lsx_processor.cc
+++ b/src/lsx_processor.cc
@@ -172,6 +172,12 @@ LSXProcessor::processWord(FILE* input, FILE* output)
   if(lu_queue.size() == 0)
   {
     readNextLU(input);
+    while(!feof(input)) {
+        wchar_t c = fgetwc_unlocked(input);
+        wprintf(L"{0x%04x}", c);
+        fputwc_unlocked((int)c, output);
+        fputwc_unlocked(L'\n', output);
+    }
   }
   if(at_end && lu_queue.size() == 1 && lu_queue.back().size() == 0)
   {

and echo '^.<sent>$' | src/lsx-proc nob-nno.autoseq.bin, I see

{0xffffffff}?
^.<sent>$
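
The extra value can be reproduced outside lsx-proc. The sketch below is not taken from apertium-separable: `make_input` and `read_with_feof_loop` are hypothetical helpers, and it assumes the default "C" locale and ASCII input. It shows that a `while(!feof(...))` loop calls `fgetwc` once more after the last character, because the end-of-file flag is only set by the read that fails, and that this last call returns WEOF.

```cpp
#include <cassert>
#include <cstdio>
#include <cwchar>
#include <vector>

// Hypothetical helper: write `text` to a temporary wide-oriented
// stream and rewind it so it can be read back.
static FILE* make_input(const wchar_t* text)
{
  FILE* f = std::tmpfile();
  assert(f != nullptr && "tmpfile() failed");
  std::fputws(text, f);
  std::rewind(f);
  return f;
}

// Read with the `while(!feof(...))` pattern and return every value
// that fgetwc() handed back, including the final failed read.
std::vector<std::wint_t> read_with_feof_loop(const wchar_t* text)
{
  std::vector<std::wint_t> seen;
  FILE* f = make_input(text);
  while (!std::feof(f)) {
    // feof() is still false after the last real character;
    // it only becomes true once a read has hit end of file.
    seen.push_back(std::fgetwc(f));
  }
  std::fclose(f);
  return seen;
}
```

For a two-character input such as `L"ab"`, the returned vector holds three values and the last one is WEOF. On glibc WEOF is 0xffffffff, which matches the `{0xffffffff}` in the debug output.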

unhammer commented on June 9, 2024

I notice NUL handling in separable is different from the other tools; not sure if that's relevant. Most of the other tools use

  while(!feof(input) && val != 0)
  {
    val = fgetwc_unlocked(input);

while separable uses

  while(!feof(input))
  {
    wchar_t c = fgetwc_unlocked(input);
    if(null_flush && c == L'\0')
    {
      at_end = true;
      at_null = true;
      break;
    }
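
For readers unfamiliar with null-flush mode, here is a minimal, self-contained sketch of a reader in the style of the second excerpt. It is an illustration under assumptions, not the actual lsx-proc code: `ReadResult` and `read_chunk` are hypothetical names, only the `null_flush`, `at_end` and `at_null` flags are borrowed from the excerpt, and the loop tests the value returned by `fgetwc` rather than `feof`.

```cpp
#include <cassert>
#include <cstdio>
#include <cwchar>
#include <string>

// Hypothetical result type for one read pass.
struct ReadResult {
  std::wstring text;     // characters read before stopping
  bool at_end = false;   // stopped at NUL (null-flush) or at end of input
  bool at_null = false;  // stopped specifically at a NUL character
};

// Read characters until end of input or, when null_flush is set,
// until a NUL character that marks the end of one chunk.
ReadResult read_chunk(FILE* input, bool null_flush)
{
  assert(input != nullptr);
  ReadResult r;
  while (true) {
    std::wint_t c = std::fgetwc(input);
    if (c == WEOF) {            // real end of input (or a read error)
      r.at_end = true;
      break;
    }
    if (null_flush && c == L'\0') {
      r.at_end = true;          // same flags as in the excerpt above
      r.at_null = true;
      break;
    }
    r.text.push_back(static_cast<wchar_t>(c));
  }
  return r;
}
```

Calling `read_chunk` repeatedly on the same stream yields one chunk per NUL-terminated segment; a final call that stops with `at_null == false` means the input itself is exhausted.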

mr-martian commented on June 9, 2024

That difference is there because when I wrote it I was copying the structure from -recursive. I don't remember whether the original reason was that -recursive is more complicated or just that I liked that structure better.

Oh, is the -1 WEOF?

If you insert this in the loop, does it change anything? (line 77-ish)

if (c == WEOF) { break; }
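
As a rough sketch of what that check does to a read loop (this is not the actual patch that went into lsx-proc, and `read_all_checked` is a hypothetical name), the loop below stops as soon as `fgetwc` returns WEOF, so the failed read is never stored or written out. It assumes a stream that can be read with wide-character functions.

```cpp
#include <cassert>
#include <cstdio>
#include <cwchar>
#include <string>

// Read every wide character from `input`, stopping at the first WEOF.
// The return value of fgetwc() is checked before it is used, so the
// end-of-file marker never ends up in the collected text.
std::wstring read_all_checked(FILE* input)
{
  assert(input != nullptr);
  std::wstring out;
  while (true) {
    std::wint_t c = std::fgetwc(input);
    if (c == WEOF) { break; }  // end of input or read error
    out.push_back(static_cast<wchar_t>(c));
  }
  return out;
}
```

With this shape the `feof` test is no longer needed to terminate the loop; it can still be called afterwards to tell end of input apart from a read error.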

mr-martian commented on June 9, 2024

Also, inserting an extra EOF character into the stream could explain the difference in behavior between a pipe and a single tool. If lt-proc -b has a similar issue and lsx-proc is inserting an extra EOF into the stream here, then lt-proc could be reading EOF, seeing that feof(input) is false, trying to output it anyway, and having it cast to ? because -1 isn't an actual character.

unhammer commented on June 9, 2024

@ftyers @MarcRiera @hectoralos is it fixed for you in the newest version?

MarcRiera commented on June 9, 2024

> @ftyers @MarcRiera @hectoralos is it fixed for you in the newest version?

Yes, after building from source I no longer get the extra ?. Thanks!

TinoDidriksen commented on June 9, 2024

Don't need to build from source - it's already in nightly.

hectoralos commented on June 9, 2024

> @ftyers @MarcRiera @hectoralos is it fixed for you in the newest version?

Yes, there's no extra "?" now. Thanks a lot, @unhammer!
