Comments (18)
Original reporter: ross@
To handle non-ASCII characters in the source, you need to decide which encoding it is in. There is the encoding-independent workaround of using &#nnn; in the source.
The generated HTML doesn't need encoding, as non-ASCII characters are rendered as numeric entities by stringToHtmlString.
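For readers unfamiliar with the numeric-entity approach, here is a minimal sketch of the idea. The function name escapeNonAscii is hypothetical and this is not Haddock's actual stringToHtmlString; it only illustrates how a non-ASCII character can be turned into &#nnn; so that the output stays pure ASCII whatever the page encoding.

```haskell
import Data.Char (isAscii, ord)

-- Hypothetical helper (not Haddock's real code): replace every
-- non-ASCII character with a decimal numeric character reference.
-- Note: a real HTML escaper would also escape '<', '>', '&' and '"'.
escapeNonAscii :: String -> String
escapeNonAscii = concatMap esc
  where
    esc c
      | isAscii c = [c]
      | otherwise = "&#" ++ show (ord c) ++ ";"
```

The same mapping is what an author has to do by hand when applying the workaround in the source, which is why it scales badly for comments written entirely in a non-Latin script.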
from haddock.
Original reporter: dav.vire+haskell@
The workaround of using &#nnn; in the source is not usable. The comment becomes totally unreadable; for comments in a foreign language it's a real problem.
Haddock should at least be able to handle UTF-8 encoding of the source file without mangling the HTML output.
Original reporter: yuriks.br@
This is a major pain in the ass for anyone who isn't coding in English; I'm bumping this up.
Original reporter: leonelfl@
Non-English programmers need this.
While not related to the Haskell language's capabilities, having tools that work universally gives credibility to the whole platform.
UTF-8 support is necessary. It must be stressed that other people are programming, explaining their programs, and building interfaces in languages other than English. They do this naturally and expect to do so without any inconvenience.
Let's stop thinking that Haskell is just for Haskell programmers who program for fun and are willing to show each other their results in English (the lingua franca). Haskell platform components need to be usable in environments whose purpose is not Haskell itself.
Original reporter: ppavel
I vote for this. I'm willing to hack but will need some directions to get started.
Original reporter: david.waern@
Replying to [comment:5 ppavel]:
I vote for this. I'm willing to hack but will need some directions to get started.
Hi Pavel,
I've looked at this briefly and I think it could be related to the fact that we use alexGetChar in the GHC lexer where we should use alexGetChar' instead. You could try changing that and see if it helps.
The lexer is in compiler/parser/Lexer.x in the GHC source tree. Look for functions that read Haddock comments, such as multiline_doc_comment, nested_doc_comment, etc.
Original reporter: ddssff@
I don't think alexGetChar' exists any more.
Original reporter: pho@
I vote for this too.
Personally I stick to using English in docs even though my native language is Japanese, but I'm really fond of UnicodeSyntax. I want to use UnicodeSyntax in code examples, not only in the code itself.
Original reporter: marlowsd@
Alex 3 can lex UTF-8 directly, which might make this easier. I made the changes to Haddock to make it work with Alex 3, but I didn't add Unicode support at the time, because I wanted to keep it working with Alex 2.
Original reporter: david.waern@
Simon,
I made modifications to the GHC lexer so that Unicode characters are preserved in the comments fed to the Haddock lexer. I then tested with a simple Unicode comment and I can see that it appears in the documentation without getting mangled by the Haddock lexer.
However I'm assuming by your last comment that something still needs to be done in the Haddock lexer for this to work 100%. Do you think we could drop compatibility with Alex 2 by now, and if so could you explain what needs to be done in the lexer?
Original reporter: marlowsd@
The comments from GHC are lexed again by Haddock using an Alex lexer, and I would expect that step to mangle the Unicode. From src\Lex.x:
alexGetByte :: AlexInput -> Maybe (Word8,AlexInput)
alexGetByte (p,c,[]) = Nothing
alexGetByte (p,_,(c:s)) = let p' = alexMove p c
                          in p' `seq` Just (fromIntegral (ord c), (p', c, s))
-- for compat with Alex 2.x:
alexGetChar :: AlexInput -> Maybe (Char,AlexInput)
alexGetChar i = case alexGetByte i of
  Nothing -> Nothing
  Just (b,i') -> Just (chr (fromIntegral b), i')
You can see we apply ord in alexGetByte and chr again in alexGetChar, so Unicode should be squashed to the low 8 bits.
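To make the squashing concrete, here is a small standalone sketch of the same ord / Word8 / chr round trip, outside of any lexer. The name roundTrip is made up for illustration; only the conversions mirror the excerpt above.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: push a Char through a Word8 the way the
-- alexGetByte / alexGetChar pair does. fromIntegral to Word8 wraps
-- modulo 256, so only the low 8 bits of the code point survive.
roundTrip :: Char -> Char
roundTrip c = chr (fromIntegral (fromIntegral (ord c) :: Word8))
```

With this, roundTrip 'a' is still 'a', and a Latin-1 character such as U+00E9 also survives, but a Cyrillic letter such as U+042D comes back as U+002D ('-'), because 0x042D truncated to 8 bits is 0x2D.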
Original reporter: selinger@
I agree that this should be fixed. It would be better to assume that all files are UTF8 than to assume all files are ASCII.
Either way, users that use another encoding first have to do an offline conversion before invoking Haddock. But conversion from, say, Latin1 to UTF8 is trivial to do, whereas conversion from Latin1 to ASCII with HTML entities requires offline parsing: non-ASCII characters in Haddock comments must be converted to HTML entities, and non-ASCII characters in the code itself must be converted to something else (UTF8?), because Haddock will croak if it encounters an HTML entity in the code itself.
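As a rough illustration of how trivial the Latin1 to UTF8 direction is, here is a sketch using only System.IO from base. The function name convertLatin1ToUtf8 and the file paths are assumptions for the example; it is not part of Haddock or any existing tool.

```haskell
import System.IO

-- Hypothetical helper: re-encode a Latin-1 file as UTF-8.
-- Decoding Latin-1 maps each byte to the code point with the same
-- value; the output handle then encodes those code points as UTF-8.
convertLatin1ToUtf8 :: FilePath -> FilePath -> IO ()
convertLatin1ToUtf8 src dst =
  withFile src ReadMode $ \hIn ->
    withFile dst WriteMode $ \hOut -> do
      hSetEncoding hIn latin1
      hSetEncoding hOut utf8
      -- hPutStr consumes the lazy input fully before the handles close.
      hGetContents hIn >>= hPutStr hOut
```

No parsing of the Haskell source is needed here, which is the contrast with the HTML-entity route: that one has to tell comments apart from code.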
Moreover, the current HTML entities encoding does not even work correctly; see bug #191.
Original reporter: sol@
I can reproduce this with Haddock 2.9.2. The version of Haddock that ships with GHC 7.4.0.20111219 produces proper HTML entities for code points outside the ASCII range.
Are there still any issues left? And if so, what would a minimal test case look like?
Original reporter: alex-voikov@
Replying to [comment:14 SimonHengel]:
I can reproduce this with Haddock 2.9.2. The version of Haddock that ships with GHC 7.4.0.20111219 produces proper HTML entities for code points outside the ASCII range.
Are there still any issues left? And if so, what would a minimal test case look like?
Haddock version 2.12.0
-- | Это модуль mytime
module MyTime (Time(..),testFunc) where
-- ^ Тип данных время
data Time = Time{ hour :: Int -- ^ Часы
                , mins :: Int -- ^ Минуты
                }
            deriving(Show)
-- |Тестовая функция, которая всегда возвращает 42
testFunc :: String -- ^ строка
         -> Int -- ^ возвращает число
testFunc x = 42
$ haddock 3.hs -html
Haddock coverage:
doc comment parse failed: Тип данных время
doc comment parse failed: Тестовая функция, которая всегда возвращает 42
doc comment parse failed: строка
doc comment parse failed: возвращает число
33% ( 1 / 3) in 'MyTime'
Original reporter: batterseapower
These patches implement support for this in Haddock by using Alex 3's native Unicode support.
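For context, a byte-oriented lexer such as Alex 3 needs each Char expanded into its UTF-8 bytes instead of being truncated to one byte. The sketch below shows that expansion in isolation; it follows the standard UTF-8 bit layout and is not taken from the attached patches. The name utf8Encode is chosen for illustration.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word8)

-- Illustrative sketch: encode one Char as its UTF-8 byte sequence
-- (1 to 4 bytes, depending on the code point).
utf8Encode :: Char -> [Word8]
utf8Encode = map fromIntegral . go . ord
  where
    go :: Int -> [Int]
    go oc
      | oc <= 0x7f   = [oc]
      | oc <= 0x7ff  = [ 0xc0 + (oc `shiftR` 6)
                       , 0x80 + (oc .&. 0x3f) ]
      | oc <= 0xffff = [ 0xe0 + (oc `shiftR` 12)
                       , 0x80 + ((oc `shiftR` 6) .&. 0x3f)
                       , 0x80 + (oc .&. 0x3f) ]
      | otherwise    = [ 0xf0 + (oc `shiftR` 18)
                       , 0x80 + ((oc `shiftR` 12) .&. 0x3f)
                       , 0x80 + ((oc `shiftR` 6) .&. 0x3f)
                       , 0x80 + (oc .&. 0x3f) ]
```

An alexGetByte built on this idea would hand the lexer these bytes one at a time and keep the original Char around, so nothing is lost when the token text is reassembled.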
from haddock.
Sorry if this is the wrong bug to report in, but this is what GitHub came up with for search results.
Using the Haddock that ships with GHC 7.8.2, if I build with LANG=C (set by my distribution's package manager), I still get issues like this: UnkindPartition/tasty-golden#10
from haddock.
FWIW, I believe this is fixed in GHC HEAD.
from haddock.
As noted on haskell/cabal#1721, from Haddock 2.15.0 cabal and Haddock will enforce UTF-8.
If absolutely necessary, this could be backported into 2.14.3, but as it requires co-ordination with cabal and backporting is a pain, I'd rather not.