Comments (14)
If we combine it with apertium/lttoolbox#104 we'll be back to zero =D
from apertium-separable.
$ echo "rumal rech che" | apertium -d . quc-spa-separable
^rumal rech<cnjadv>$ ^chi<pr>$ ^ech<n><rel><px3sg>$^.<sent>$
Seems to work fine for me.
from apertium-separable.
Happening on my side as well, with a different pair:
$ echo "hello" | apertium -d . eng-cat-autoseq
^hello<n><sg>$^.<sent>$
?
from apertium-separable.
It is happening in both apertium-fra-cat and apertium-fra-frp. For instance, in the first one:
$ echo "^maison<n><f><sg>$" | od -An -vtu1
94 109 97 105 115 111 110 60 110 62 60 102 62 60 115 103
62 36 10
$ echo "^maison<n><f><sg>$" | lsx-proc fra-cat.autosep.bin | od -An -vtu1
94 109 97 105 115 111 110 60 110 62 60 102 62 60 115 103
62 36 10 63
(apertium-separable has nothing to do with "maison" and should simply pass it through to the output; instead, an extra trailing byte 63, i.e. "?", appears.)
from apertium-separable.
I'm getting the ? if I use a locale other than C.UTF-8 (even utf8 locales):
$ locale -a|while read -r LANG; do echo "$LANG"; echo '^.<sent>$[][\n]' | lsx-proc nob-nno.autoseq.bin | grep -c '?' ;done
C
1
C.UTF-8
0
POSIX
1
en_AG
1
en_GB.utf8
1
en_US.utf8
1
nn_NO.utf8
1
(since the ? is at the very end of the output without any trailing newline, my terminal doesn't always show it, but grep gives the correct answer)
from apertium-separable.
https://github.com/apertium/apertium-separable/blob/master/src/lsx_processor.cc#L164 pushes the ? into blank queue while
https://github.com/apertium/apertium-separable/blob/master/src/lsx_processor.cc#L157 puts it into parts[0]
from apertium-separable.
So it seems like it's a -1? If I do
diff --git a/src/lsx_processor.cc b/src/lsx_processor.cc
index 5c9aec7..555a47b 100644
--- a/src/lsx_processor.cc
+++ b/src/lsx_processor.cc
@@ -172,6 +172,12 @@ LSXProcessor::processWord(FILE* input, FILE* output)
   if(lu_queue.size() == 0)
   {
     readNextLU(input);
+    while(!feof(input)) {
+      wchar_t c = fgetwc_unlocked(input);
+      wprintf(L"{0x%04x}", c);
+      fputwc_unlocked((int)c, output);
+      fputwc_unlocked(L'\n', output);
+    }
   }
   if(at_end && lu_queue.size() == 1 && lu_queue.back().size() == 0)
   {
and echo '^.<sent>$' | src/lsx-proc nob-nno.autoseq.bin, I see
{0xffffffff}?
^.<sent>$
from apertium-separable.
I notice NUL handling is different in separable from other tools, not sure if that's relevant; most of the other tools use
  while(!feof(input) && val != 0)
  {
    val = fgetwc_unlocked(input);
while separable uses
  while(!feof(input))
  {
    wchar_t c = fgetwc_unlocked(input);
    if(null_flush && c == L'\0')
    {
      at_end = true;
      at_null = true;
      break;
    }
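For comparison, here is a minimal sketch of a reader that handles both stop conditions (a NUL flush marker and the end of the stream) by inspecting the return value of fgetwc instead of relying on feof. The names Stop, read_segment and demo_segments are hypothetical and are not taken from lsx_processor.cc or any other Apertium tool; only the standard C wide-character calls are real.

```cpp
#include <cassert>
#include <cstdio>
#include <cwchar>
#include <string>

// Hypothetical names for illustration only; not part of apertium-separable.
enum class Stop { Nul, End };

// Reads wide characters into `out` until a NUL (when null_flush is set)
// or the end of the stream, and reports which of the two stopped the read.
Stop read_segment(std::FILE* in, bool null_flush, std::wstring& out) {
  for (;;) {
    std::wint_t c = std::fgetwc(in);
    if (c == WEOF) {
      return Stop::End;  // end of stream (or read error): nothing to append
    }
    if (null_flush && c == static_cast<std::wint_t>(L'\0')) {
      return Stop::Nul;  // flush marker: the segment is complete
    }
    out.push_back(static_cast<wchar_t>(c));
  }
}

// Small self-check: the stream holds "a", a NUL, then "b".
void demo_segments() {
  std::FILE* f = std::tmpfile();
  assert(f != nullptr);
  std::fputwc(L'a', f);
  std::fputwc(L'\0', f);
  std::fputwc(L'b', f);
  std::rewind(f);

  std::wstring first, second;
  assert(read_segment(f, true, first) == Stop::Nul && first == L"a");
  assert(read_segment(f, true, second) == Stop::End && second == L"b");
  std::fclose(f);
}
```

Calling demo_segments() from a main function exercises both exits. Because the return value is checked before anything is appended, neither the NUL marker nor WEOF can end up in the output.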
from apertium-separable.
That difference is because when I wrote it I was copying the structure from -recursive; I don't remember whether the original reason was that -recursive is more complicated or just that I liked that structure better.
Oh, is the -1 WEOF?
If you insert this in the loop, does it change anything? (line 77-ish)
if (c == WEOF) { break; }
from apertium-separable.
Also, inserting an extra EOF character into the stream could explain the difference in behavior between a pipe and a single tool. If lt-proc -b has a similar issue and lsx-proc is here inserting an extra EOF into the stream, then lt-proc could be reading EOF but seeing that feof(input) is false, and so trying to output it and having it cast to ? because -1 isn't an actual character.
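The general pitfall can be reproduced outside Apertium. The sketch below is not the lsx-proc code; make_input, read_all_feof, read_all_weof and demo_extra_char are hypothetical names, and the input is a temporary file rather than a pipe. It relies only on standard behavior: the end-of-file indicator is set by a failed read, so a loop guarded by feof alone performs one extra fgetwc call that returns WEOF.

```cpp
#include <cassert>
#include <cstdio>
#include <cwchar>
#include <string>

// Hypothetical helper: a temporary wide-oriented stream holding `text`.
std::FILE* make_input(const wchar_t* text) {
  std::FILE* f = std::tmpfile();
  if (f != nullptr) {
    std::fputws(text, f);  // wide write, so later wide reads match the orientation
    std::rewind(f);
  }
  return f;
}

// Loop guarded only by feof: the final fgetwc returns WEOF and is still stored.
std::wstring read_all_feof(std::FILE* in) {
  std::wstring out;
  while (!std::feof(in)) {
    std::wint_t c = std::fgetwc(in);
    out.push_back(static_cast<wchar_t>(c));
  }
  return out;
}

// Same loop with the return value checked before it is used.
std::wstring read_all_weof(std::FILE* in) {
  std::wstring out;
  for (;;) {
    std::wint_t c = std::fgetwc(in);
    if (c == WEOF) {
      break;
    }
    out.push_back(static_cast<wchar_t>(c));
  }
  return out;
}

void demo_extra_char() {
  std::FILE* a = make_input(L"ab");
  assert(a != nullptr);
  std::wstring with_feof = read_all_feof(a);
  std::fclose(a);
  assert(with_feof.size() == 3);  // 'a', 'b', plus the stray WEOF value
  assert(with_feof[2] == static_cast<wchar_t>(WEOF));

  std::FILE* b = make_input(L"ab");
  assert(b != nullptr);
  assert(read_all_weof(b) == L"ab");
  std::fclose(b);
}
```

Writing that stray value to an output stream is where a "?" could come from, since it does not correspond to a real character. How it is rendered depends on the platform and locale, so the sketch only checks that the extra element exists.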
from apertium-separable.
@ftyers @MarcRiera @hectoralos is it fixed for you in the newest version?
from apertium-separable.
@ftyers @MarcRiera @hectoralos is it fixed for you in the newest version?
Yes, after building from source I no longer get the extra ?. Thanks!
from apertium-separable.
Don't need to build from source - it's already in nightly.
from apertium-separable.
@ftyers @MarcRiera @hectoralos is it fixed for you in the newest version?
Yes, there's no extra "?" now. Thanks a lot, @unhammer!
from apertium-separable.