lucker6666 / pseudolocalization-tool
Automatically exported from code.google.com/p/pseudolocalization-tool
License: Apache License 2.0
This library provides a tool and an API to perform pseudolocalization. Pseudolocalization generates a fake translation of a program's messages, which helps to highlight weaknesses and bugs in the original program regarding localization.

The library includes a structured message API to allow it to be used for complex multi-part messages, and includes the following pseudolocalization methods:

- accenter: replaces US-ASCII characters with accented versions, to make it obvious if parts of the output are hard-coded in the program and can't be localized.
- brackets: adds [brackets] around each message, to show where messages have been concatenated together. This is a localization problem because some languages may need to reorder phrases, or the translation may change depending on what is around it.
- expander: makes each message longer, to show where the UI doesn't give enough space for languages that result in longer strings, and either wraps awkwardly or truncates.
- fakebidi: produces fake Right-to-Left text, using the original source text and wrapping LTR text with RTL markers, so that it renders as if it were RTL text but is still mostly readable to someone who doesn't speak Arabic or Hebrew.

These methods can be combined in any order and with user-written methods. In addition, HTML tags can optionally be preserved (it is not recommended to give them to the translator, but simple tags in particular show up in translatable text frequently).

These can also be accessed via locale variant subtags, which we hope to get standardized. A variant subtag of psaccent corresponds to accenter, expander, and brackets (in that order), and a variant subtag of psbidi corresponds to fakebidi.

Initially this project consists of just a library to be used by other tools, but eventually it will provide a command-line tool for generating pseudolocalized message files that can be used just like real translated files in your build process. Integration with GWT is also planned.
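To make the psaccent combination described above concrete, here is a minimal Python sketch of accenter, expander, and brackets applied in that order. It is not the library's code: the accent table, the number-word padding, and the 50% growth ratio are all assumptions chosen for illustration.

```python
# Hypothetical accent table: a tiny subset, for illustration only.
ACCENTS = str.maketrans("aeiouAEIOU", "åéîöûÅÉÎÖÛ")

# Hypothetical padding words used by the sketch expander.
NUMBER_WORDS = ["one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]


def accenter(text: str) -> str:
    """Replace some US-ASCII letters with accented look-alikes."""
    return text.translate(ACCENTS)


def expander(text: str, ratio: float = 0.5) -> str:
    """Pad text with number words until it has grown by `ratio` (assumed rule)."""
    target = len(text) * (1 + ratio)
    out = text
    for word in NUMBER_WORDS:
        if len(out) >= target:
            break
        out += " " + word
    return out


def brackets(text: str) -> str:
    """Mark the message boundaries so concatenation becomes visible."""
    return "[" + text + "]"


def psaccent(text: str) -> str:
    # Order stated in the README: accenter, then expander, then brackets.
    return brackets(expander(accenter(text)))


print(psaccent("Save"))  # [Såvé one]
```

Running the accenter before the expander keeps the padding unaccented, so the padding is easy to tell apart from the original message.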
Dependencies:
=============
This project needs JUnit and htmlparser -- compatible versions are included in the lib directory, or you can supply your own version.

Additional Credits:
===================
The original implementations this library is based upon were written by Jerome Flesch while an intern at Google.
What steps will reproduce the problem?
1. Pseudolocalize the string "Hi I don't want this {place_holder} to be psesudolocalized".
What is the expected output? What do you see instead?
Original text: Hi I don't want this {place_holder} to be psesudolocalized
Text modified by psaccent: [?î?Î?ðöñ´???åñ????îš?(þ?åçé??ö?ðé?)??ö??é?þšéšûðö?öçå?îžéð? one two three four five six seven]
Text modified by psbidi: ?Hi? ?I? ?don?'?t? ?want? ?this? {?place?_?holder?} ?to? ?be? ?psesudolocalized?
Expected output: ?Hi? ?I? ?don?'?t? ?want? ?this? {place_holder} ?to? ?be? ?psesudolocalized?
What version of the product are you using? On what operating system?
Latest
Original issue reported on code.google.com
on 10 Oct 2013 at 5:37
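The behaviour requested in this report can be sketched in Python as follows. This is only an illustration under stated assumptions, not the tool's implementation: it treats any `{...}` run without nested braces as a placeholder, and the function name is hypothetical.

```python
import re

RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

# Assumption: a placeholder is any {...} run without nested braces.
PLACEHOLDER = re.compile(r"(\{[^{}]*\})")
WORD = re.compile(r"[A-Za-z]+")


def fake_bidi_keep_placeholders(message: str) -> str:
    """Wrap each run of ASCII letters in RLO...PDF, leaving placeholders untouched."""
    # re.split with one capturing group puts the placeholders at odd indices.
    parts = PLACEHOLDER.split(message)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            out.append(part)  # placeholder: copy verbatim
        else:
            out.append(WORD.sub(lambda m: RLO + m.group(0) + PDF, part))
    return "".join(out)


result = fake_bidi_keep_placeholders("this {place_holder} to")
print(result.encode("unicode_escape").decode("ascii"))
```

The same split-then-transform approach would also keep the placeholder out of the accenter, since only the even-indexed parts are ever modified.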
In addition to the API, we need to provide a command-line tool to allow easy
pseudolocalization of existing files, such as Java property files.
Original issue reported on code.google.com
on 14 Jun 2011 at 9:34
Our app has a lot of Ext JS based UI and I would like to test its localization.
Right now we use Selenium with a functional-test approach, which I feel is too
heavyweight for the kind of test cases that we run. How do I go about getting
started? Say, with a simple login page that has a username field, a password
field, and a login button, with its own CSS files?
Original issue reported on code.google.com
on 13 Feb 2013 at 6:22
What about supporting digits here?
AFAIK this is also used for Android's zz_ZZ locale, and AFAICS there are no
accented digits or anything similar, but IMO they are needed. On Android,
native digits are generally generated via String.format().
Original issue reported on code.google.com
on 15 Feb 2014 at 8:32
HtmlPreserver currently assumes all HTML tags and their attributes are
non-localizable. However, tags like <input type="submit" value=" Submit "/>
have localizable text in attributes.
Fixing this will probably require some table of tag/attributes that should be
considered localizable, and may complicate generating the non-localizable text
fragments.
Original issue reported on code.google.com
on 14 Jun 2011 at 9:36
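The tag/attribute table suggested in this report might look like the following Python sketch. The table contents, the rule for `<input value=...>`, and all names are assumptions made for illustration; they are not taken from HtmlPreserver.

```python
from html.parser import HTMLParser

# Hypothetical table of (tag, attribute) pairs whose values are translatable.
# "*" stands for "any tag".
LOCALIZABLE_ATTRS = {
    ("img", "alt"),
    ("input", "placeholder"),
    ("input", "value"),
    ("*", "title"),
}

# Assumption: for <input value=...>, only button-like types show the value as text.
BUTTON_TYPES = {"submit", "button", "reset"}


def is_localizable(tag, attr, attrs):
    """Decide whether attribute `attr` of `tag` carries translatable text."""
    if (tag, attr) not in LOCALIZABLE_ATTRS and ("*", attr) not in LOCALIZABLE_ATTRS:
        return False
    if tag == "input" and attr == "value":
        return (attrs.get("type") or "").lower() in BUTTON_TYPES
    return True


class LocalizableAttrCollector(HTMLParser):
    """Collect (tag, attribute, value) triples that a translator should see."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        for name, value in attrs:
            if value is not None and is_localizable(tag, name, attr_map):
                self.found.append((tag, name, value))


def localizable_attributes(html):
    collector = LocalizableAttrCollector()
    collector.feed(html)
    collector.close()
    return collector.found


print(localizable_attributes('<input type="submit" value=" Submit "/>'))
```

A preserver built on such a table would still have to emit the surrounding markup as non-localizable fragments, which is the complication the report anticipates.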
The fake bidi method can produce output that even more closely resembles real
RTL text by adding an RLM before each RLO and after each PDF. For example,
where currently for "hello world" it produces "\u202Ehello\u202C
\u202Eworld\u202C", it would now produce "\u200F\u202Ehello\u202C\u200F
\u200F\u202Eworld\u202C\u200F".
While most of the time the visual output would be identical, adding the RLMs
has two advantages:
1. The first-strong directionality estimation method, as specified in the
Unicode Bidirectional Algorithm's rules P2 and P3
(http://www.unicode.org/reports/tr9/#P2), would then decide that fake bidi text
is RTL; currently it decides that it is LTR. As a result, fake bidi text
currently does not behave in the same way as real RTL text (e.g. Hebrew or
Arabic) in contexts like Android TextViews and HTML's dir="auto" attribute,
which use the first-strong algorithm. Adding the RLM would fix this discrepancy.
2. When a message contains a placeholder followed by a localizable text
fragment that begins with a strong character (not a neutral character like a
space or punctuation), and the placeholder ends in a number, the visual
ordering that currently results for fake bidi localization is not equivalent to
that resulting for a real RTL translation: in an RTL context, with fake bidi,
the number appears to the left of the text fragment; with real RTL text, the
number appears to the right. For example, let's say that the placeholder value
is "12" and the localizable text fragment is "hello". Then, when fake bidi
changes the "hello" into "\u202Ehello\u202C", the overall output is
"12\u202Ehello\u202C". You can see the visual ordering specified for that by
the Unicode Bidi Algorithm in an RTL paragraph here:
http://unicode.org/cldr/utility/bidi.jsp?a=12%E2%80%AEhello%E2%80%AC&p=RTL; the
number is on the left. However, if the text fragment were the Hebrew character
alef, "\u05D0", and thus the whole string were "12\u05D0", the number would
come out on the right:
http://unicode.org/cldr/utility/bidi.jsp?a=12%D7%90&p=RTL. This is fixed by
adding the RLMs to fake bidi: "12\u200F\u202Ehello\u202C\u200F" is displayed
with the number on the right, as with real RTL text
(http://unicode.org/cldr/utility/bidi.jsp?a=12%E2%80%8F%E2%80%AEhello%E2%80%AC%E2%80%8F&p=RTL).
The same issue occurs when a placeholder follows a localizable
text fragment that ends in a strong character; this is why I am suggesting not
only to put an RLM before the RLO, but also to put an RLM after the PDF. One
may think that it is strange to have a placeholder come immediately before or
after strong text, not a neutral like a space or punctuation; text like "hello:
12" or "12: hello" is a lot more common than "hello12" or "12hello". However,
the same issue occurs (and is fixed by the RLMs) when between the placeholder
and the localizable text fragment is a nonlocalizable text fragment containing
markup that introduces a space between the two, e.g. "<span style='padding:
5px'>", and this is unfortunately a fairly common practice in HTML.
Original issue reported on code.google.com
on 7 Aug 2014 at 8:53
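The first advantage described in this report can be checked with a short Python sketch. The `fake_bidi` function below only mimics the word-by-word wrapping quoted in the report, and `first_strong_direction` is a simplified reading of rules P2 and P3 that ignores directional isolates; both names are hypothetical.

```python
import re
import unicodedata

RLM = "\u200F"  # RIGHT-TO-LEFT MARK
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

WORD = re.compile(r"[A-Za-z]+")


def fake_bidi(text, add_rlm=False):
    """Wrap each run of ASCII letters in RLO...PDF, optionally framed by RLMs."""
    prefix = (RLM if add_rlm else "") + RLO
    suffix = PDF + (RLM if add_rlm else "")
    return WORD.sub(lambda m: prefix + m.group(0) + suffix, text)


def first_strong_direction(text):
    """Return 'ltr' or 'rtl' from the first strong character, or None.

    Simplification: isolate initiators and their contents are not skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


print(first_strong_direction(fake_bidi("hello world")))                # ltr
print(first_strong_direction(fake_bidi("hello world", add_rlm=True)))  # rtl
print(first_strong_direction("12\u05D0"))                              # rtl
```

RLO and PDF are not strong characters, so without the RLM the first strong character is the Latin letter inside the override, which is why the estimate comes out as LTR.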
Suppose you have a string like this:
duplicatesRemovedFragment={0,plural,one{{0} duplicate removed}other{{0} duplicates removed}}
In version 0.2 it gets mangled to this:
duplicatesRemovedFragment={0,plural,one{{0} \u202Eduplicate\u202C \u202Eremoved\u202C}\u202Eother\u202C{{0} \u202Eduplicates\u202C \u202Eremoved\u202C}}
Oddly, the "one" keyword remains untouched (suggesting that the tool does
somehow understand that it's a special keyword) yet the "other" keyword has
been mangled, so at runtime, you get this error:
Missing 'other' keyword in plural pattern in "{0,plural,one{{0} du ..."
Original issue reported on code.google.com by trejkaz
on 30 Jul 2014 at 4:28
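One way to avoid mangling the selector keywords is sketched below in Python. It is a regex heuristic written for this report's example, not a real message-format parser and not how the tool works: it assumes a selector keyword is a word immediately followed by `{`, and that an argument starts with `{name` or `{name,type,`.

```python
import re

RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

TOKEN = re.compile(
    r"(?P<arg>\{\w+(?:,\s*\w+\s*,)?)"  # argument start: "{0" or "{0,plural,"
    r"|(?P<selector>\w+(?=\{))"        # keyword directly before "{": one, other
    r"|(?P<word>[A-Za-z]+)"            # ordinary translatable word
)


def fake_bidi_message(pattern: str) -> str:
    """Wrap translatable words in RLO...PDF, leaving syntax tokens untouched."""
    def replace(match):
        if match.lastgroup == "word":
            return RLO + match.group(0) + PDF
        return match.group(0)  # argument start or selector keyword
    return TOKEN.sub(replace, pattern)


source = "{0,plural,one{{0} duplicate removed}other{{0} duplicates removed}}"
print(fake_bidi_message(source).encode("unicode_escape").decode("ascii"))
```

The heuristic has known gaps: a word written directly against a placeholder with no space (such as `text{0}`) would be mistaken for a selector, and style arguments such as `{0,number,integer}` are not handled. A structured parse of the pattern would be more robust.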
Expander currently adds up the entire length of all message fragments it sees.
However, some of these are alternate forms of part or all of the message under
a VariantFragment, which leads to the expansion being overly large.
Instead, it should keep track of the VariantFragment structure and count only
the longest alternative when computing the total length.
Original issue reported on code.google.com
on 14 Jun 2011 at 10:22
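The difference between the two length calculations can be sketched in Python. The classes below are a hypothetical model of the message structure (only the name VariantFragment comes from the report), so this shows the idea rather than the library's API.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextFragment:
    text: str


@dataclass
class VariantFragment:
    # Each alternative is a list of fragments; only one is displayed at runtime.
    alternatives: List[List["Fragment"]] = field(default_factory=list)


Fragment = Union[TextFragment, VariantFragment]


def naive_length(fragments: List[Fragment]) -> int:
    """Add up every fragment, including all alternatives (the reported behaviour)."""
    total = 0
    for fragment in fragments:
        if isinstance(fragment, VariantFragment):
            total += sum(naive_length(alt) for alt in fragment.alternatives)
        else:
            total += len(fragment.text)
    return total


def effective_length(fragments: List[Fragment]) -> int:
    """Count only the longest alternative of each VariantFragment."""
    total = 0
    for fragment in fragments:
        if isinstance(fragment, VariantFragment):
            total += max((effective_length(alt) for alt in fragment.alternatives),
                         default=0)
        else:
            total += len(fragment.text)
    return total


message = [VariantFragment([
    [TextFragment("{0} duplicate removed")],
    [TextFragment("{0} duplicates removed")],
])]
print(naive_length(message), effective_length(message))
```

For this two-form plural message the naive total is the sum of both forms, while the effective length is that of the longer form alone, so an expansion based on the naive total would roughly double the padding.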
The replacements used for & (0x26), * (0x2A), ; (0x3B), and @ (0x40) are either
problematic on some browser/OS combinations or unrecognizable as replacements
of the original character.
Original issue reported on code.google.com
on 14 Jun 2011 at 10:02
Using pseudolocalization-tool (actually, my own ruby-based clone of the tool
which I had to write because of issue #8, but which generates exactly the same
output) I created pseudolocalised copies of all of our properties files.
The result is kind of odd though. The colons attached to strings for label text
don't move to the left, but rather float on the right, so you end up with
fields like "emaN:".
Looking at your output, you have this:
Pane.nameLabel.text=\u202EName\u202C\:
You should probably move the PDF so that the RLO covers the \: as well:
Pane.nameLabel.text=\u202EName\:\u202C
Doing so makes the colon appear in the proper location.
Original issue reported on code.google.com by trejkaz
on 1 Aug 2014 at 3:18
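The suggestion in this report amounts to letting trailing punctuation stay inside the override. A minimal Python sketch follows; the punctuation set and the function name are assumptions. It works on the in-memory string, where the properties-file escape `\:` is just `:`.

```python
import re

RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

# Assumption: punctuation directly attached to a word belongs to that word.
WORD_WITH_PUNCTUATION = re.compile(r"[A-Za-z]+[:;,.!?]*")


def fake_bidi_with_punctuation(text: str) -> str:
    """Wrap each word together with its trailing punctuation in RLO...PDF."""
    return WORD_WITH_PUNCTUATION.sub(lambda m: RLO + m.group(0) + PDF, text)


print(fake_bidi_with_punctuation("Name:").encode("unicode_escape").decode("ascii"))
```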