Comments (7)
This is a comment for GSoC students trying to solve this bug.
To solve any debugging task, I suggest that you:
- find some webpages where this issue can be reproduced, and run MindTheWord on these pages to test it.
- debug the extension by using Google Chrome's JavaScript Console. (You can use "console.log(...)", "console.debug(...)" and "console.error(...)" to log messages while MindTheWord is executing, and you can try to find the bug by analyzing the messages that you leave for yourself.)
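As a rough illustration of the logging approach, here is a minimal sketch. The helper `withLogging` and the stand-in `invertMapSketch` are hypothetical names made up for this example; they are not part of MindTheWord, and the real "invertMap" may look quite different.

```javascript
// Hypothetical helper (not part of MindTheWord): wraps a function so that
// every call logs its arguments, its return value, and any thrown error.
function withLogging(name, fn) {
  return function (...args) {
    console.debug(`${name} called with`, args);
    try {
      const result = fn.apply(this, args);
      console.log(`${name} returned`, result);
      return result;
    } catch (err) {
      console.error(`${name} threw`, err);
      throw err;
    }
  };
}

// Stand-in for a function under investigation; it simply swaps keys and values.
// This is an assumption for illustration, not the extension's actual code.
function invertMapSketch(map) {
  const inverted = {};
  for (const [source, target] of Object.entries(map)) {
    inverted[target] = source;
  }
  return inverted;
}

const tracedInvert = withLogging('invertMapSketch', invertMapSketch);
const inverted = tracedInvert({ the: 'los' });
```

Wrapping a suspect function this way leaves its behaviour unchanged while the console shows exactly what went in and what came out.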
For this particular issue, the bug could be in the "invertMap" function, which is responsible for wrapping a span with the "mtwTranslatedWord" CSS style around translated words. It seems that, for unknown reasons, not all words are being wrapped. Or maybe the bug is in the "processTranslations" function, where the "invertMap" function is called.
from mindtheword.
Here is another example screenshot provided by another user:
from mindtheword.
Strangely, both examples above happen when a definite article (e.g. "a" or "los") is translated just before the phrase that is translated but not highlighted. In the second example, this happens several times, almost always after "los" (e.g. "los cerebros", "los investigadores", "los animales", "los fallecidos", "los atletas").
from mindtheword.
It would be useful to know if the users were using the advanced feature that allows the translation of sequences of words. If they are, a possible explanation could be the following:
- MTW translates {"the" --> "los", "the animals" --> "los animales"}.
- Then it searches and back-translates/highlights the translated words using the inverted translation map {"los" --> "the", "los animales" --> "the animals"}. Consequently, the "los" in "los animales" will be highlighted (and wrapped by "...") and the whole sequence "los animales" will not be found anymore.
from mindtheword.
Hi, I'm a GSOC applicant. I believe I've found the issue and will be submitting a PR to resolve it.
1. Environment
I forked and cloned the repo and installed it in developer mode in Chrome. I added a Yandex API key and made no further modifications. I set English to Spanish translation at 75%.
2. Reproducing the bug:
I went to the same page as the screenshot above: https://www.technologyreview.com/s/600691/new-collar-promises-to-keep-athletes-brains-from-sloshing-during-impact/. I noticed a recurring issue at the excerpt:
both of which tolerate repetitive, high-impact blows to the head
It was getting translated to:
tanto of que tolerate repetitive, de alto impacto blows to the la cabeza
Here, Yandex translates `high-impact` to `de alto impacto`. Not a perfect translation, but that's not the issue at hand here.
But in `de alto impacto`, only `de` is highlighted:
3. Investigating
I'm new to the codebase, so I started tracing through the functions `replaceAll`, `invertMap`, `processTranslations`, and `deepHTMLReplacement`. I found the following was happening for the paragraph in question.
At line 38 in `MindTheWord.js`, `replaceAll` is called with the paragraph text (with English->Spanish replacements already made), and the inverted translation map `iTMap`. `iTMap` contains a key for `de` and `de alto impacto`.
Because of the way `rExp` is constructed, the regular expression for `de` will come before the regular expression for `de alto impacto`.
This means the replacement at line 73 in `MindTheWord.js` occurs first for `de` and then for `de alto impacto`. When the replacement for `de` is made, `de alto impacto` turns into `<span ... >de</span> alto impacto`. So the replacement for `de alto impacto` then no longer finds a match.
In more general terms: say you have a string `aaabbbccc` and you need to match/replace both `bbb` and `abbbc`. If you do the replacement for `bbb` first, then you can no longer match `abbbc`.
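The general case can be reproduced in a few lines. This is only a sketch of sequential replacement under my own assumptions: `replaceInOrder` and the bracket-style `mark` helper are hypothetical and stand in for the real `replaceAll` and its span wrapping.

```javascript
// Hypothetical marker; the extension wraps matches in a span instead.
function mark(s) {
  return `[${s}]`;
}

// Replaces every literal occurrence of each key, one key at a time,
// in the order given. Not a copy of replaceAll, just an illustration.
function replaceInOrder(text, keys) {
  let result = text;
  for (const key of keys) {
    result = result.split(key).join(mark(key));
  }
  return result;
}

// The shorter key is processed first, as in the bug described above.
const shortFirst = replaceInOrder('aaabbbccc', ['bbb', 'abbbc']);
console.log(shortFirst);
```

Once `bbb` has been marked, the string no longer contains the contiguous text `abbbc`, so the second replacement has nothing to match.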
4. Solution
It's arguable that the `replaceAll` function can be improved to handle this scenario in a different way. However, I found that a straightforward solution to this problem is to sort the source words in descending order by length in the `replaceAll` function before the `rExp` string is constructed. This will ensure that `de alto impacto` is replaced before `de` is replaced.
The PR I am submitting does exactly that, and at least for this case, the resulting output is correct:
and it toggles the phrase correctly:
5. Other thoughts
- There are presumably other problems with `replaceAll` that will need to be resolved (like the whitespace hack), and this fix does not take those into account.
- This of course is computationally more expensive. I would say that it will only be noticeable as the number of words becomes extremely large.
- The unit tests pass for my PR, but I assume unit tests will have to be written for this specific case.
TLDR: sort the keys from `translationMap` in descending order by their length in the `replaceAll` function to avoid issues with substrings of longer strings preventing the longer strings from being formatted.
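For illustration, the sorting idea can be sketched as follows. This is not the code from the PR: `highlight` and `escapeRegExp` are hypothetical helpers, the single alternation regex is my assumption about one way to build the pattern, and the `class` attribute is assumed from the `mtwTranslatedWord` style name.

```javascript
// Escape regex metacharacters so that keys are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical stand-in for the highlighting step (not the real replaceAll).
function highlight(text, keys) {
  // Longest keys first, so a phrase wins over any shorter key it contains.
  const sorted = [...keys].sort((a, b) => b.length - a.length);
  if (sorted.length === 0) return text;
  // Alternation tries alternatives left to right at each position.
  const pattern = new RegExp(sorted.map(escapeRegExp).join('|'), 'g');
  return text.replace(
    pattern,
    (match) => `<span class="mtwTranslatedWord">${match}</span>`
  );
}

const out = highlight('repetitive, de alto impacto blows', ['de', 'de alto impacto']);
console.log(out);
```

Because the whole text is processed in a single pass, an already wrapped phrase is not scanned again for the shorter key. The sketch ignores word boundaries and the whitespace handling mentioned above.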
from mindtheword.
Excellent explanation and elegant solution!
Do you have any idea how to improve the whitespace hack?
from mindtheword.
@ceilican Thanks! I have some thoughts but for now I am writing the GSOC application so I will look after I send you my first draft.
from mindtheword.