Comments (15)
Original comment by nomeata (Bitbucket: nomeata, GitHub: nomeata).
arbtt should handle Unicode properly, and output it in whatever locale your system is running. On Linux, I’d say “make sure that LANG is set to a UTF-8 locale”...
Do you get the weirdness only when piping the output to a file or program, or also when you run it as it is?
from arbtt.
Original comment by amenthes (Bitbucket: amenthes, GitHub: amenthes).
It would appear that the output is ISO-8859-1 on my Windows machine when piped to another command or file. Currently I have to detect the encoding at runtime and convert to UTF-8.
I guess I now have two conversions: one by arbtt-stats (internal to ISO-8859-1) and one by my script (ISO-8859-1 to UTF-8). The conversion to ISO-8859-1 will probably be lossy, since there are a bunch of characters it can't represent. I'd love to request a mode where I can force arbtt-stats to output UTF-8 regardless of locale and other environment settings.
Original comment by nomeata (Bitbucket: nomeata, GitHub: nomeata).
If it is non-trivial to set it via environment variables, I might add a command line flag... but I’m surprised this is so hard.
Have you tried issuing chcp 65001 before running arbtt? According to http://stackoverflow.com/a/388500/946226 this should set the code page to UTF-8.
Original comment by amenthes (Bitbucket: amenthes, GitHub: amenthes).
chcp does not seem to have an effect. My terminal happily tells me that I'm on that code page now, but it still outputs ü as 0xFC (ISO-8859-1 or Windows-1252, as both are identical in that range).
produces this byte sequence:
Original comment by amenthes (Bitbucket: amenthes, GitHub: amenthes).
I am now able to convert this in the receiving script. I'm auto-detecting the encoding and always converting to UTF-8. This way I was able to import ~10,000 window titles, ~400 of which contained German umlauts. Still, I think a forced UTF-8 mode would make a nice addition, especially when using arbtt-stats as a stepping stone in a custom chain of tools.
The current handling works very well on the command line. I have never had a problem with that, and I do not want it to change.
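The receiving script itself is not shown in this thread. A minimal Python sketch of such a conversion, assuming the input is either valid UTF-8 or a single-byte Windows encoding, could look like the following. The function name `to_utf8` and the choice of `cp1252` as fallback are illustrative assumptions, not part of arbtt.

```python
def to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return `raw` re-encoded as UTF-8, guessing the input encoding.

    Hypothetical helper: tries strict UTF-8 first and falls back to a
    single-byte code page (an assumption) if that fails.
    """
    try:
        # Strict decoding: raises if the bytes are not valid UTF-8.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # cp1252 leaves a few bytes undefined, so substitute U+FFFD for those.
        text = raw.decode(fallback, errors="replace")
    return text.encode("utf-8")
```

This heuristic can misfire when single-byte input happens to form a valid UTF-8 sequence, which is rare for German text but not impossible.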
Original comment by nomeata (Bitbucket: nomeata, GitHub: nomeata).
Of course, the first question is: does arbtt actually save it correctly internally? It could well be that the screen capture is wrong...
On the other hand, that’s unlikely, as it would then cause mojibake when printing.
Maybe the problem disappears when I manage to make a new Windows release that is built with a new version of GHC and the base libraries.
Original comment by amenthes (Bitbucket: amenthes, GitHub: amenthes).
I was using a build from the current head (7e3b5a7) and used
dist\build\arbtt-capture\arbtt-capture.exe -f unicode.stuff
to capture the window title of this website in firefox: https://www.qnap.com/i/de/news/con_show.php?op=showone&cid=416 which reads "QNAP unterstützt Kodi – ehemals XBMC - zur Multimedia-Wiedergabe"
Both arbtt-dump and arbtt-stats (same build) have problems with this:
> dist\build\arbtt-dump\arbtt-dump.exe -f unicode.stuff
2015-10-14 19:57:44 (0ms inactive):
( ) [redacted for privacy reasons]
( ) \Device\HarddiskVolume2\Program Files (x86)\Mozilla Firefox\firefox.exe: QNAP unterstützt Kodi arbtt-dump.exe: <stdout>: commitBuffer: invalid argument (invalid character)
The output stops there. No further lines are dumped.
Please note that the title reads just fine in the terminal up to that point. When I write the same output to a file, this happens:
> dist\build\arbtt-dump\arbtt-dump.exe -f unicode.stuff > unicode.stuff.dump.txt
arbtt-dump.exe: <stdout>: commitBuffer: invalid argument (invalid character)
(same error and termination of program)
The "ü" is converted to 81, which is valid in code page 850. This is also what my terminal is set to.
If I switch my terminal with chcp 65001, the "ü" becomes c3bc, which is actually valid UTF-8. The dump will run through as expected. So in that case, everything is well.
arbtt-stats is also working after issuing chcp 65001. Interestingly, it does not have the code page 850 problem. It will work correctly in both cases!
So there's a small caveat: running arbtt-dump from a plain and simple terminal does not work. One has to issue chcp 65001 first. I am not sure if this can be fixed; I guess many non-programmer users would find this unnerving.
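The byte values mentioned in this thread can be checked independently of arbtt. The following Python sketch encodes "ü" under each of the code pages discussed; the helper name `encode_hex` is made up for illustration.

```python
def encode_hex(text: str, codec: str) -> str:
    """Encode `text` with `codec` and return the bytes as a hex string."""
    return text.encode(codec).hex()


# Code pages named in the thread: the terminal default (850), the two
# candidates for the piped output, and UTF-8 (chcp 65001).
for codec in ("cp850", "latin-1", "cp1252", "utf-8"):
    print(codec, encode_hex("ü", codec))
```

This prints 81 for cp850, fc for both latin-1 and cp1252, and c3bc for utf-8, matching the values reported above.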
Original comment by amenthes (Bitbucket: amenthes, GitHub: amenthes).
(tiny correction to the post above)
Original comment by amenthes (Bitbucket: amenthes, GitHub: amenthes).
There also appears to be an issue with old files created with 0.6: it seems the encoding in the existing legacy logfile might confuse the newer arbtt-stats. I'm investigating. But this also only happens on code page 850, so a user with that problem can work around it easily. I had no problems with a mixed legacy logfile (mixed in the sense that it was written to by both arbtt-capture 0.6 and 0.9).
Original comment by nomeata (Bitbucket: nomeata, GitHub: nomeata).
Hmm. I am pretty confident that the log files are fixed to UTF-8, and have been like that since then, so I would hope that reading both old and new files is not a problem.
Otherwise the behaviour is somewhat expected: The program tries to print according to the current locale (i.e. codepage), and prefers to abort rather than print invalid characters.
Is it correct that everything works fine as long as your codepage is 65001?
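The abort-on-unencodable-character behaviour can be illustrated outside of arbtt. The Python sketch below is an analogy, not arbtt's Haskell code: it encodes the window title from earlier in the thread strictly and reports the first character the target code page cannot represent. The helper name `encode_or_report` is hypothetical. Under cp850 the "ü" encodes fine, but the en dash after "Kodi" does not, which would be consistent with the dump stopping right after "Kodi".

```python
title = "QNAP unterstützt Kodi – ehemals XBMC - zur Multimedia-Wiedergabe"


def encode_or_report(text: str, codec: str) -> str:
    """Strictly encode `text`; on failure, name the offending character."""
    try:
        return text.encode(codec).hex()
    except UnicodeEncodeError as err:
        # err.start is the index of the first character that cannot be encoded.
        return f"{codec}: cannot encode {ascii(text[err.start])}"


print(encode_or_report(title, "cp850"))   # strict encoding fails on U+2013
print(encode_or_report(title, "utf-8"))   # UTF-8 can represent every character

# The lossy alternative: substitute "?" instead of aborting.
print(title.encode("cp850", errors="replace"))
```

Replacing unencodable characters keeps the output flowing but silently loses information, which is the trade-off behind preferring to abort.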
Original comment by amenthes (Bitbucket: amenthes, GitHub: amenthes).
Yes, in CP65001, everything is fine.
Original comment by nomeata (Bitbucket: nomeata, GitHub: nomeata).
Ok. I’m inclined to close this, with the argument that if you want to use unicode, you need to use a unicode-aware codepage. Do you agree?
Original comment by amenthes (Bitbucket: amenthes, GitHub: amenthes).
I'm fine with that, but I'd top it off with a note in the Windows section of the README. Once I understand how packaging an installer works, I might be able to contribute one. But I can't promise when I'll get around to doing that.
Original comment by nomeata (Bitbucket: nomeata, GitHub: nomeata).
Mention codepage in the windows readme.
Suggestions to improve this notice and make it easier to follow for “normal”
users are welcome. This fixes #32.
Original comment by nomeata (Bitbucket: nomeata, GitHub: nomeata).
Heh, when trying to run the test suite under Wine I am now stuck with the same problem, and here I don’t even have chcp available. I hope someone can help me at http://stackoverflow.com/questions/33156758/get-haskell-programs-to-assume-a-utf8-locale-under-wine.