lucker6666 / pseudolocalization-tool

Automatically exported from code.google.com/p/pseudolocalization-tool

License: Apache License 2.0

Java 94.29% HTML 5.52% Shell 0.18%

pseudolocalization-tool's Introduction

This library provides a tool and an API to perform pseudolocalization.
Pseudolocalization generates a fake translation of messages of a program,
which helps to highlight weaknesses and bugs in the original program regarding
localization.

The library includes a structured message API to allow it to be used for
complex multi-part messages, and includes the following pseudolocalization
methods:

  - accenter: replaces US-ASCII characters with accented versions, to make
    it obvious if parts of the output are hard-coded in the program and can't
    be localized
  - brackets: adds [brackets] around each message, to show where messages
    have been concatenated together. This is a localization problem because
    some languages may need to reorder phrases or the translation may change
    depending on what is around it.
  - expander: makes each message longer, to show where the UI doesn't give
    enough space for languages that result in longer strings, and either
    wraps awkwardly or truncates.
  - fakebidi: produces fake Right-to-Left text, using the original source
    text and wrapping LTR text with RTL markers, so that it renders as if it
    were RTL text but is still mostly readable to someone who doesn't speak
    Arabic or Hebrew.
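
As a rough illustration of the first three methods, here is a minimal Java
sketch. It is not the library's API: the class name, the method signatures,
the five-entry accent table, and the "grow by about half" expansion rule are
all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for accenter, brackets and expander (not the real API).
class PseudoMethodsSketch {
    // Tiny illustrative accent table; a real one would cover far more of US-ASCII.
    private static final Map<Character, Character> ACCENTS = new HashMap<>();
    static {
        ACCENTS.put('a', '\u00E5');
        ACCENTS.put('e', '\u00E9');
        ACCENTS.put('i', '\u00EE');
        ACCENTS.put('o', '\u00F6');
        ACCENTS.put('u', '\u00FB');
    }

    // Replace each character that has an entry in the table; keep the rest.
    static String accenter(String message) {
        StringBuilder out = new StringBuilder();
        for (char c : message.toCharArray()) {
            char mapped = ACCENTS.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    // Mark the message boundaries so concatenated messages become visible.
    static String brackets(String message) {
        return "[" + message + "]";
    }

    // Append number words until the message is at least 1.5x its original length.
    static String expander(String message) {
        String[] words = {"one", "two", "three", "four", "five", "six", "seven"};
        StringBuilder out = new StringBuilder(message);
        int target = message.length() + message.length() / 2;
        for (int i = 0; out.length() < target; i++) {
            out.append(' ').append(words[i % words.length]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(accenter("Save"));
        System.out.println(brackets("Save"));
        System.out.println(expander("Save"));
    }
}
```

Text that still shows up unaccented and unbracketed in the running program
after such a transformation was probably hard-coded rather than loaded from a
message file.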

These methods can be combined in any order and with user-written methods. In
addition, HTML tags can optionally be preserved (giving tags to translators is
not recommended, but simple tags in particular show up frequently in
translatable text).

These can also be accessed via locale variant subtags, which we hope to get
standardized. A variant subtag of psaccent corresponds to accenter, expander,
and brackets (in that order), and a variant subtag of psbidi corresponds to
fakebidi.
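
The mapping from variant subtag to methods could be looked up roughly as
follows. This is only a sketch: the class and method names are hypothetical,
and parsing the tag with java.util.Locale is an assumption of this example,
not necessarily how the library resolves variants.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical lookup from a language tag to an ordered list of method names.
class VariantMethodsSketch {
    static List<String> methodsFor(String languageTag) {
        // Extract the variant subtag, e.g. "psaccent" from "en-psaccent".
        String variant = Locale.forLanguageTag(languageTag)
                .getVariant()
                .toLowerCase(Locale.ROOT);
        switch (variant) {
            case "psaccent":
                // Order matters: accenter, then expander, then brackets.
                return Arrays.asList("accenter", "expander", "brackets");
            case "psbidi":
                return Collections.singletonList("fakebidi");
            default:
                // No pseudolocalization requested.
                return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        System.out.println(methodsFor("en-psaccent"));
        System.out.println(methodsFor("en-psbidi"));
    }
}
```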

Initially this project consists of just a library to be used by other tools,
but eventually it will provide a command-line tool for generating
pseudolocalized message files that can be used just like real translated files
in your build process. Integration with GWT is also planned.


Dependencies:
=============
This project needs JUnit and htmlparser -- compatible versions are included
in the lib directory, or you can supply your own version.


Additional Credits:
===================
The original implementations this library is based upon were written by
Jerome Flesch while an intern at Google.

pseudolocalization-tool's Issues

Are there any escape characters available if I want selected text (e.g. a placeholder) not to be pseudolocalized?

What steps will reproduce the problem?
1. Pseudolocalize "Hi I don't want this {place_holder} to be psesudolocalized"

What is the expected output? What do you see instead?

Original text: Hi I don't want this {place_holder} to be psesudolocalized
Text modified by psaccent:
[?î?Î?ðöñ´???åñ????îš?(þ?åçé??ö?ðé?)??ö??é?þšéšûðö?ö
çå?îžéð? one two three four five six seven]
Text modified by psbidi:
?Hi? ?I? ?don?'?t? ?want? ?this? {?place?_?holder?} ?to? ?be? ?psesudolocalized?

Expected output:

?Hi? ?I? ?don?'?t? ?want? ?this? {place_holder} ?to? ?be? ?psesudolocalized?


What version of the product are you using? On what operating system?

Latest

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 10 Oct 2013 at 5:37

Need to provide command-line tool

In addition to the API, we need to provide a command-line tool to allow easy 
pseudolocalization of existing files, such as Java property files.

Original issue reported on code.google.com by [email protected] on 14 Jun 2011 at 9:34

Document how to use pseudolocalizer for ExtJS apps

Please provide any additional information below.
Our app has a lot of Ext JS based UI, and I would like to test its
localization. Right now we use Selenium with a functional-testing approach,
which I feel is too heavyweight for the kind of test cases that we run. How do
I get started? Say, with a simple login page that has a username field, a
password field, and a login button, with its own CSS files?

Original issue reported on code.google.com by [email protected] on 13 Feb 2013 at 6:22

Digits?

What about supporting digits here?

AFAIK this is also used on Android for the zz_ZZ locale, and AFAICS there are
no accented digits or anything like that, but IMO they are needed. On Android,
native digits are generally generated via String.format().

Original issue reported on code.google.com by [email protected] on 15 Feb 2014 at 8:32

Improve localizable text detection in HtmlPreserver

HtmlPreserver currently assumes all HTML tags and their attributes are 
non-localizable.  However, tags like <input type="submit" value=" Submit "/> 
have localizable text in attributes.

Fixing this will probably require some table of tag/attributes that should be 
considered localizable, and may complicate generating the non-localizable text 
fragments.
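
One possible shape for such a table is sketched below in Java. The class name,
the method name, and the table entries are hypothetical choices for this
illustration and are not part of HtmlPreserver.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical table of tag/attribute pairs whose values hold localizable text.
class LocalizableAttributesSketch {
    private static final Map<String, Set<String>> TABLE = new HashMap<>();
    static {
        // Illustrative entries only; a real table would need careful review.
        TABLE.put("input", new HashSet<>(Arrays.asList("value", "placeholder")));
        TABLE.put("img", new HashSet<>(Arrays.asList("alt")));
        // "*" lists attributes treated as localizable on every tag.
        TABLE.put("*", new HashSet<>(Arrays.asList("title")));
    }

    // Case-insensitive lookup, since HTML tag and attribute names are.
    static boolean isLocalizable(String tag, String attribute) {
        String t = tag.toLowerCase(Locale.ROOT);
        String a = attribute.toLowerCase(Locale.ROOT);
        Set<String> forTag = TABLE.getOrDefault(t, Collections.<String>emptySet());
        return forTag.contains(a) || TABLE.get("*").contains(a);
    }

    public static void main(String[] args) {
        System.out.println(isLocalizable("input", "value"));
        System.out.println(isLocalizable("input", "type"));
    }
}
```

A table keyed only on tag and attribute is a simplification: for example, the
value attribute of an input element is user-visible text for type="submit"
but not for every input type.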

Original issue reported on code.google.com by [email protected] on 14 Jun 2011 at 9:36

fake bidi method can be improved by adding RLMs

The fake bidi method can produce output that even more closely resembles real 
RTL text by adding an RLM before each RLO and after each PDF. For example, 
where currently for "hello world" it produces "\u202Ehello\u202C 
\u202Eworld\u202C", it would now produce "\u200F\u202Ehello\u202C\u200F 
\u200F\u202Eworld\u202C\u200F".
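
The two outputs quoted above can be reproduced with a short Java sketch. It
assumes that each run of letters is wrapped separately, which matches the
"hello world" example but may differ from how the tool actually splits text;
the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical fake-bidi wrapper, with and without the proposed RLMs.
class FakeBidiSketch {
    private static final String RLO = "\u202E"; // right-to-left override
    private static final String PDF = "\u202C"; // pop directional formatting
    private static final String RLM = "\u200F"; // right-to-left mark

    // Assumption: a "word" is any run of letters.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    static String fakeBidi(String text, boolean addRlm) {
        String prefix = addRlm ? RLM + RLO : RLO;
        String suffix = addRlm ? PDF + RLM : PDF;
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement guards against '$' and '\' in the matched text.
            m.appendReplacement(out, Matcher.quoteReplacement(prefix + m.group() + suffix));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fakeBidi("hello world", false));
        System.out.println(fakeBidi("hello world", true));
    }
}
```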

While most of the time the visual output would be identical, adding the RLMs 
has two advantages:

1. The first-strong directionality estimation method, as specified in the 
Unicode Bidirectional Algorithm's rules P2 and P3 
(http://www.unicode.org/reports/tr9/#P2), would then decide that fake bidi text 
is RTL; currently it decides that it is LTR. As a result, fake bidi text 
currently does not behave in the same way as real RTL text (e.g. Hebrew or 
Arabic) in contexts like Android TextViews and HTML's dir="auto" attribute, 
which use the first-strong algorithm. Adding the RLM would fix this discrepancy.

2. When a message contains a placeholder followed by a localizable text 
fragment that begins with a strong character (not a neutral character like a 
space or punctuation), and the placeholder ends in a number, the visual 
ordering that currently results for fake bidi localization is not equivalent to 
that resulting for a real RTL translation: in an RTL context, with fake bidi, 
the number appears to the left of the text fragment; with real RTL text, the 
number appears to the right.

For example, let's say that the placeholder value is "12" and the localizable 
text fragment is "hello". Then, when fake bidi changes the "hello" into 
"\u202Ehello\u202C", the overall output is "12\u202Ehello\u202C". You can see 
the visual ordering specified for that by the Unicode Bidi Algorithm in an RTL 
paragraph here: 
http://unicode.org/cldr/utility/bidi.jsp?a=12%E2%80%AEhello%E2%80%AC&p=RTL; the 
number is on the left. However, if the text fragment were the Hebrew character 
alef, "\u05D0", and thus the whole string were "12\u05D0", the number would 
come out on the right: 
http://unicode.org/cldr/utility/bidi.jsp?a=12%D7%90&p=RTL. This is fixed by 
adding the RLMs to fake bidi: "12\u200F\u202Ehello\u202C\u200F" is displayed 
with the number on the right, as with real RTL text 
(http://unicode.org/cldr/utility/bidi.jsp?a=12%E2%80%8F%E2%80%AEhello%E2%80%AC%E2%80%8F&p=RTL).

The same issue occurs when a placeholder follows a localizable text fragment 
that ends in a strong character; this is why I am suggesting not only to put 
an RLM before the RLO, but also to put an RLM after the PDF.

One may think that it is strange to have a placeholder come immediately before 
or after strong text, not a neutral like a space or punctuation; text like 
"hello: 12" or "12: hello" is a lot more common than "hello12" or "12hello". 
However, the same issue occurs (and is fixed by the RLMs) when between the 
placeholder and the localizable text fragment is a nonlocalizable text fragment 
containing markup that introduces a space between the two, e.g. 
"<span style='padding: 5px'>", and this is unfortunately a fairly common 
practice in HTML.
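
The first advantage can be checked with a simplified first-strong estimator.
The Java sketch below follows the spirit of rules P2 and P3 but ignores
directional isolates, so it is an approximation rather than a full
implementation of the Unicode Bidirectional Algorithm; the class and method
names are hypothetical.

```java
// Simplified first-strong direction estimation (rules P2/P3, without isolates).
class FirstStrongSketch {
    // Returns true if the first strong character is of bidi class R or AL.
    static boolean isRtlByFirstStrong(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            switch (Character.getDirectionality(cp)) {
                case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
                    return false;
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
                case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:
                    return true;
                default:
                    // Neutrals, digits and explicit overrides such as RLO
                    // are not strong characters here; keep scanning.
                    break;
            }
            i += Character.charCount(cp);
        }
        // No strong character found: fall back to LTR.
        return false;
    }

    public static void main(String[] args) {
        // Current fake bidi output: RLO is skipped, 'h' decides the direction.
        System.out.println(isRtlByFirstStrong("\u202Ehello\u202C"));
        // With a leading RLM, the RLM is the first strong character.
        System.out.println(isRtlByFirstStrong("\u200F\u202Ehello\u202C\u200F"));
    }
}
```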

Original issue reported on code.google.com by [email protected] on 7 Aug 2014 at 8:53

Tool mangles more than desired with ICU plural patterns

Suppose you have a string like this:

    duplicatesRemovedFragment={0,plural,one{{0} duplicate removed}other{{0} duplicates removed}}

In version 0.2 it gets mangled to this:

    duplicatesRemovedFragment={0,plural,one{{0} \u202Eduplicate\u202C \u202Eremoved\u202C}\u202Eother\u202C{{0} \u202Eduplicates\u202C \u202Eremoved\u202C}}

Oddly, the "one" keyword remains untouched (suggesting that the tool does 
somehow understand that it's a special keyword) yet the "other" keyword has 
been mangled, so at runtime, you get this error:

    Missing 'other' keyword in plural pattern in "{0,plural,one{{0} du ..."
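
One way to avoid touching keywords is to track brace depth: in a pattern like
the one above, text at even depth is message text, while text at odd depth is
syntax (argument names, types, and selector keywords such as "one" and
"other"). The Java sketch below applies a transformation only to the message
text. It is an illustration under that assumption, it ignores
apostrophe quoting, and its class and method names are hypothetical; a more
robust approach would likely rely on a real pattern parser such as ICU4J's
MessagePattern.

```java
import java.util.function.UnaryOperator;

// Hypothetical helper: transform only the literal text of an ICU-style pattern.
class IcuPatternSketch {
    static String transformLiterals(String pattern, UnaryOperator<String> transform) {
        StringBuilder out = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        int depth = 0;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '{' || c == '}') {
                // A brace ends the current literal run, if any.
                flush(literal, out, transform);
                out.append(c);
                depth += (c == '{') ? 1 : -1;
            } else if (depth % 2 == 0) {
                // Even depth: translatable message text.
                literal.append(c);
            } else {
                // Odd depth: argument name, type or keyword; copy unchanged.
                out.append(c);
            }
        }
        flush(literal, out, transform);
        return out.toString();
    }

    private static void flush(StringBuilder literal, StringBuilder out,
                              UnaryOperator<String> transform) {
        if (literal.length() > 0) {
            out.append(transform.apply(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        String p = "{0,plural,one{{0} duplicate removed}other{{0} duplicates removed}}";
        // Upper-casing stands in for a real pseudolocalization method.
        System.out.println(transformLiterals(p, s -> s.toUpperCase()));
    }
}
```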

Original issue reported on code.google.com by trejkaz on 30 Jul 2014 at 4:28

Expander should count only the longest variant message rather than all variants

Expander currently adds up the entire length of all message fragments it sees.  
However, some of these are alternate forms of part/all of the message under a 
VariantFragment, which leads to the expansion being overly large.

Instead, it should keep track of the VariantFragment structure and count only 
the longest variant when computing the total length.
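
The proposed length computation might look like the following Java sketch. The
Fragment, Text, Sequence, and Variants types here are hypothetical stand-ins
written for this example, not the library's message fragment classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical fragment model for computing the length the expander should use.
class ExpanderLengthSketch {
    interface Fragment {
        int effectiveLength();
    }

    // Plain text: its own length.
    static final class Text implements Fragment {
        final String text;
        Text(String text) { this.text = text; }
        public int effectiveLength() { return text.length(); }
    }

    // Parts shown one after another: lengths add up.
    static final class Sequence implements Fragment {
        final List<Fragment> parts;
        Sequence(Fragment... parts) { this.parts = Arrays.asList(parts); }
        public int effectiveLength() {
            int sum = 0;
            for (Fragment f : parts) {
                sum += f.effectiveLength();
            }
            return sum;
        }
    }

    // Alternatives of which only one is shown: count the longest.
    static final class Variants implements Fragment {
        final List<Fragment> alternatives;
        Variants(Fragment... alternatives) { this.alternatives = Arrays.asList(alternatives); }
        public int effectiveLength() {
            int max = 0;
            for (Fragment f : alternatives) {
                max = Math.max(max, f.effectiveLength());
            }
            return max;
        }
    }

    public static void main(String[] args) {
        Fragment message = new Sequence(
                new Text("You have "),
                new Variants(new Text("one message"), new Text("several messages")));
        System.out.println(message.effectiveLength());
    }
}
```

In this example, summing every fragment would give 36 characters, whereas
counting only the longest variant gives 25.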

Original issue reported on code.google.com by [email protected] on 14 Jun 2011 at 10:22

Accenter needs better choices for a few characters

The replacements used for & (0x26), * (0x2A), ; (0x3B), and @ (0x40) are either 
problematic on some browser/OS combinations or unrecognizable as replacements 
of the original character.

Original issue reported on code.google.com by [email protected] on 14 Jun 2011 at 10:02

Right-to-left override should include the colon

Using pseudolocalization-tool (actually, my own ruby-based clone of the tool 
which I had to write because of issue #8, but which generates exactly the same 
output) I created pseudolocalised copies of all of our properties files.

The result is kind of odd though. The colons attached to strings for label text 
don't move to the left, but rather float on the right, so you end up with 
fields like "emaN:".

Looking at your output, you have this:

  Pane.nameLabel.text=\u202EName\u202C\:

You should probably extend the RLO span to cover the \: as well:

  Pane.nameLabel.text=\u202EName\:\u202C

Doing so makes the colon appear in the proper location.
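
A narrow way to express this suggestion is to let the wrapped run include a
colon attached directly to the word. The Java sketch below does that; treating
a word as a run of letters plus an optional trailing colon is an assumption of
this example, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of fake bidi that keeps an attached colon inside the override.
class LabelWrapSketch {
    private static final String RLO = "\u202E"; // right-to-left override
    private static final String PDF = "\u202C"; // pop directional formatting

    // A run of letters, optionally followed by a directly attached colon.
    private static final Pattern WORD_WITH_COLON = Pattern.compile("\\p{L}+:?");

    static String wrap(String text) {
        Matcher m = WORD_WITH_COLON.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(RLO + m.group() + PDF));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrap("Name:"));
    }
}
```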

Original issue reported on code.google.com by trejkaz on 1 Aug 2014 at 3:18
