Comments (7)
@andy-noisyduck
I'm bit confused when you say
"It also looks like strings are not escaped when generating POT files from the extractors"
and
"I am not using the extractors"
i.e. referenced change was only targeting the extractor, not anything in catalog/GetString*.
do you have a sample string which you are trying to extract and/or translate?
Thanks!
from gettext.net.
I think there might be 2 closely related issues here. Maybe I should have made 2 tickets to be clearer.
First... translations with escaped strings in .mo files are not being unescaped when parsed. I.e. if you have an escaped single quote \'
it will be returned including the backslash, not just the single quote. If you have an escaped newline \n
in your translation string it won't be converted to an actual new-line etc.
The GetText spec indicates that any C style escape string is valid in a translation string within a .po file. My understanding was that strings in .mo files follow the same format and should thus be un-escaped either when parsing or when returned through GetString
- although I concede there is scope for ambiguity here, as the .mo file format spec just refers to them as "translation strings" and doesn't explicitly document their format.
The existing tooling we use escapes new-line in .mo files, so I'm leaning towards unescaping them being the correct behaviour.
Secondly, a similar issue but on the input rather than output side. It looks like the extractors only escape double quotes now, and don't correctly handle new-lines, tabs etc. I wanted to caveat that we don't really use the extractors, and was something I only noticed when going through the source. It's possible I missed some handling of escape stings elsewhere.
from gettext.net.
as for the second issue, the extractor escapes non-printable characters here:
GetText.NET/src/GetText.Extractor/Engine/ParserBase.cs
Lines 148 to 151 in 146090a
however that doesn't work for double quotes, hence the fix in 85eb836. The removed code was stale copy&paste leftover from an xgettext implementation.
This should work correctly, but if you have some sample where this fails, please provide!
For the first issues, looking into it. What tooling are you using? The scenario is primarily tested with the extractor and PoEdit, from other issues I've seen that some of the older GNU commandline utilities are more peculiar and not as relaxed as Windows-based tools. As far reasonable I'm open to adapt/update.
from gettext.net.
For the first issues, looking into it. What tooling are you using? The scenario is primarily tested with the extractor and PoEdit, from other issues I've seen that some of the older GNU commandline utilities are more peculiar and not as relaxed as Windows-based tools. As far reasonable I'm open to adapt/update.
We manage translations through Phrase.com, and we use it to generate our .mo
files.
It's unclear where the problem is, whether its the generating of the .mo file or the parsing of them. Should the sequence \n
in an .mo file be a newline or the literal \n
? The somewhat ambiguous GetText spec doesn't really help here.
We can work around it in most cases by not having escape strings in the source content (.pot files), but this doesn't work for newlines or double quotes which must always be escaped. Right now we are unescaping them manually after we call GetString
but that feels a bit clunky.
from gettext.net.
this really gets tricky, and the GetText spec isn't of particular help either, but it seems that Phrase is handling escaped strings differently at least.
assuming a string like this
string test = catalog.GetString("This is a long string with \nline break");
extracting this, ie using xgettext, PoEdit, or GetText.Extractor, will become like this in the pot template file
#: Form1.cs:34
#, csharp-format
msgid ""
"This is a long string with \n"
"line break"
msgstr ""
it's mentioned as convention but not required in the GetText documentation that line breaks in the strings, will also induce a line break in the msgid format.
#: Form1.cs:34
#, csharp-format
msgid ""
"This is a long string with \nline break"
msgstr ""
in PoEdit, looks like this
So I've created an test account at Phrase.com, and uploaded the generated pot file there - and oops - the line break is gone?!
Now the even stranger part. While a generated po-translation file from phrase looks ok (except it does not break the strings itself, but as mentioned above that's optional in the documentation, it does escape the strings when compiling to mo:
vs. (using PoEdit)
with that escaping, the translation will not even be found, because the MessageId (original string to look for) was "This is a long string with \nline break" but in the compiled catalog only find "This is a long string with \nline break".
While I originally would have no opinion which behavior is correct, I think the phrase.com output is just wrong by escaping the strings both for the message id (as it breaks the whole lookup) as well the translation as it breaks formatting behavior.
If you have a business relationship with phrase, I'd ask you to open an support issue with them, happy to join the conversation if needed.
Otherwise I see limited option here to update the code, if the same input produces different output from different translation engines, how should GetText.Net else handle this?
from gettext.net.
Thanks for investigating this.
The missing linebreak in Phrase is actually correct. The list on the left is a preview list of all your translated strings, similar to how Poedit displays the text without breaks in the list at the top. It's probably just not obvious with only one string. Phrase then shows the full string (including linebreaks) in the middle section when translating. I'm not sure why your example doesn't have the text here - it might be that Phrase's default mode of operation is to use keys for strings and not the source text (as Phrase supports a bunch of other non-GetText file formats where this makes sense), and the option for this wasn't set when parsing the po/pot file.
That said, Phrase is clearly mishandling .mo file output.
I probably should have done what you did, and compared the output from Phrase with the GetText command line tools. Whilst the GetText spec is a bit vague on .mo file format, using the msgfmt
tool seems a reasonable choice here as an authoritative answer. When writing the binary .mo file format msgfmt
takes any escaped strings in the .po
source and converts them to their binary equivalent (i..e the 2 character literal backslash + n \n
becomes just the newline character). I'm going to assume that's the correct behaviour, and raise this as a bug with Phrase.
That said, what is also strange is that msgfmt
doesn't correctly handle the escape sequence for a single quote. A single quote '
is a valid escape sequence when used as \'
, and although the GetText docs explicitly mentioning this as valid, xgettext will unescape them when parsing. If the 2 char sequence \'
is in the text, msgfmt
will actually return invalid control sequence
when reading the po file. When using the 4-char hex escape sequence for the the quote \x27
it parses it correctly. That seems like a bug in the GetText tooling to me.
This issue can probably be closed?
from gettext.net.
thanks for confirming my research!
I'll do some more testing around the \'
sequence, but if this is purely affecting msgfmt I assume (or hope :) ) GetText.NET is handling it correctly.
In the meanwhile closing this issue.
from gettext.net.
Related Issues (20)
- Another control properties HOT 5
- ObjectListView columns' header title doesn't localize HOT 4
- Non-localizable controls
- Language headers are not parsed HOT 8
- ContextMenuStrip is not localized HOT 3
- Extractor fails when enum description not applied on all members
- GetText.Extractor: bug with ternary operator ?: HOT 6
- Non-localizable control-content
- Unhandled exception: System.ArgumentException: The SetHandler call for command 'GetText.Extractor' is missing an Argument or Option for the parameter at position 4 HOT 2
- PathExtension requires Windows dll import HOT 6
- Bug (?) with aliases requiring member syntax HOT 2
- GetText.Extractor output is nondetererministic HOT 2
- Extractor wrongly extracts interpolated strings with strings inside of them HOT 3
- Refactor commandline options for CommandLine package upgrade limiting options to 8
- Upgrade (Extractor) to .NET 6 HOT 2
- Strong name signing assemblies HOT 2
- Wrong text extraction HOT 6
- Ignore wrongly extracted string HOT 3
- This can work with GtkSharp and glade file?
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from gettext.net.