davideuler / chm2pdf

Automatically exported from code.google.com/p/chm2pdf

License: GNU General Public License v2.0

Python 100.00%

chm2pdf's Introduction

CHM2PDF

(c) 2007 Massimo Sandal
(c) 2007-2008 Chris Karakas <http://www.karakas-online.de>

A Python script that converts a CHM file into a single PDF file.

Usage: 
chm2pdf [options] input_filename [output_filename]

See

chm2pdf --help

for all options.

RECOMMENDED READING:
    - http://www.karakas-online.de/forum/viewtopic.php?t=10275
    - http://www.karakas-online.de/forum/viewtopic.php?t=10969


Installation:
- download the .tar.gz
- unpack it: "tar -xzvf chm2pdf-a.b.c.tar.gz"
- enter the newly created directory
- acquire root privileges
- type "python setup.py install"

Requires:
    - python
    - chmlib 
      NOTE: chmlib *must* be configured with ./configure --enable-examples
    - pychm
    - htmldoc

Optional:
    - BeautifulSoup
    
All of these should be in your Linux/Unix distribution repository :)

To contact Massimo: [email protected]
To contact Chris: [email protected]

chm2pdf's Issues

Images are not being included in PDF

What steps will reproduce the problem?
1. Get a CHM with images, e.g. a book
2. Run chm2pdf --book file.chm

Images are not included. The problem is in the image names. I created a
patch earlier and sent it to the group.

What version of the product are you using? On what operating system?
0.9 running on Ubuntu Hardy

Original issue reported on code.google.com by [email protected] on 19 May 2008 at 3:10

Attachments:

chm2pdf.patch

Chm file path cannot use spaces.

What steps will reproduce the problem?
1. Install the chm2pdf package(v 0.9) and all dependencies from Synaptic,
on Ubuntu
2. chm2pdf --book mybook.chm

The error is caused because the chm file path includes spaces.
Example: /home/myuser/Docs/Some books to read/mychm.chm

Please try to fix that.

What is the expected output? What do you see instead?
The expected output is a PDF file. I just see a fatal error.

What version of the product are you using? On what operating system?
Version 0.9, Ubuntu package.

Please provide any additional information below.
Here is the error:

CHM2PDF_WORK_DIR = /tmp/chm2pdf/work/pbp
CHM2PDF_ORIG_DIR = /tmp/chm2pdf/orig/pbp
Removing any previous temporary files
rm: cannot remove «/tmp/chm2pdf/orig/pbp/*»: No such file or directory
rm: cannot remove «/tmp/chm2pdf/work/pbp/*»: No such file or directory
failed to open /home/hexbase/Escritorio/Almost
sh: cannot create /tmp/chm2pdf/work/pbp/urlslist.txt: Directory nonexistent
Traceback (most recent call last):
  File "/usr/bin/chm2pdf", line 887, in <module>
    main(sys.argv)
  File "/usr/bin/chm2pdf", line 883, in main
    convert_to_pdf(cfile, filename, outputfilename, options)
  File "/usr/bin/chm2pdf", line 180, in convert_to_pdf
    objective_urls=get_objective_urls_list(filename)
  File "/usr/bin/chm2pdf", line 98, in get_objective_urls_list
    flist=open(CHM2PDF_WORK_DIR+'/urlslist.txt','r')
IOError: [Errno 2] No such file or directory:
'/tmp/chm2pdf/work/pbp/urlslist.txt'
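
The "failed to open /home/hexbase/Escritorio/Almost" line suggests that the path was cut at its first space, which is what typically happens when a command line is built by plain string concatenation and handed to a shell. The following is a minimal Python 3 sketch of that effect and of two common ways around it. It is not chm2pdf's actual code, and `some_tool` is a hypothetical command name used only for illustration.

```python
import shlex

chm_path = "/home/myuser/Docs/Some books to read/mychm.chm"

# Unquoted concatenation: a shell splits the path at every space.
naive = "some_tool " + chm_path          # some_tool is a hypothetical command
print(shlex.split(naive))

# Option 1: quote the path before placing it in a shell command string.
quoted = "some_tool " + shlex.quote(chm_path)
print(shlex.split(quoted))

# Option 2: avoid the shell and pass an argument list,
# for example to subprocess.run(argv).
argv = ["some_tool", chm_path]
```

Quoting also protects other shell metacharacters in file names, such as parentheses and commas.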

Original issue reported on code.google.com by [email protected] on 13 Dec 2008 at 2:37

Weak file path handling

What steps will reproduce the problem?
1. Look at source
2. Find lines like "CHM2PDF_WORK_DIR = CHM2PDF_TEMP_WORK_DIR + os.sep + 
basename"
3. Replace with "CHM2PDF_WORK_DIR = os.path.join(CHM2PDF_TEMP_WORK_DIR, 
basename)"
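
A small sketch of the difference, assuming for illustration that the temporary directory string already ends with a separator (the values below are made up, not taken from the script):

```python
import os

temp_work_dir = "/tmp/chm2pdf/work/"   # hypothetical value with a trailing separator
basename = "mybook"

# Manual concatenation adds a second separator after the existing one.
manual = temp_work_dir + os.sep + basename

# os.path.join inserts a separator only where one is needed.
joined = os.path.join(temp_work_dir, basename)

print(manual)
print(joined)
```

Both forms usually work on Linux, since repeated separators are tolerated there, but `os.path.join` keeps paths tidy and is easier to read.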

Original issue reported on code.google.com by [email protected] on 19 Jul 2009 at 6:29

orig and work directories destroyed after chm2pdf completes

What steps will reproduce the problem?
1. Run chm2pdf --verbose --extract-only <somefile.chm>
2. View output directories created
3. Run ls on the /tmp directory to check that the working directory and its images 
are there and can be opened in a viewer

What is the expected output? tmp directories are there for viewing
What do you see instead? No tmp directories exist with names given in the 
output for CHM2PDF_WORK_DIR variable.  



What version of the product are you using? 0.9.1 (from .deb file)

On what operating system? Ubuntu Linux (10.04 LTS- the Lucid Lynx)


Please provide any additional information below.

I am having trouble getting images to display in a PDF converted from a CHM file.  
Per the article at <http://www.karakas-online.de/forum/viewtopic.php?t=11078>, 
I tried the --extract-only and --verbose options to get at the HTML files 
and see what the problem is.  While chm2pdf is running, I can see the directories 
being created, but once it ends, the directories disappear.  

The permissions for /tmp (via ls -l) are drwxrwxrwt (not sure what the 't' flag is).


Output for chm2pdf and directory listings below.  'ls|wc -w' was called during 
and then after the chm2pdf run (ls output truncated, and filename changed).

steve@steve-laptop:~/reading_material/tmp$ chm2pdf --verbose --extract-only 
somefile.chm
CHM2PDF_WORK_DIR = /tmp/tmpJ6BPBT/somefile
CHM2PDF_ORIG_DIR = /tmp/tmpsYO78R/somefile
Correcting links in the HTML files...
steve@steve-laptop:~/reading_material/tmp$ 

<from another terminal>
steve@steve-laptop:/tmp$ ls (results truncated for better viewing)
tmp4rlrQd  tmpmDcthS  tmpuyKHhB  tmpFczhlo  tmpnr1psy  tmpZrUI2e
tmp1gXghc  tmpFI_6VZ  tmpPUEFa tmp41QHFI  tmpgnT0IT  tmpt3XD9o  
steve@steve-laptop:/tmp$ 
steve@steve-laptop:/tmp$ ls|wc -w  (chm2pdf is running)
32
steve@steve-laptop:/tmp$ ls|wc -w  (chm2pdf terminated)
30

Original issue reported on code.google.com by [email protected] on 12 Sep 2010 at 8:20

Filename with space gives error

Command:

chm2pdf --book "Filename with spaces.chm"

Log:

failed to open Filename
Traceback (most recent call last):
  File "/usr/bin/chm2pdf", line 1098, in <module>
    main(sys.argv)
  File "/usr/bin/chm2pdf", line 1092, in main
    convert_to_pdf(cfile, filename, outputfilename, options)
  File "/usr/bin/chm2pdf", line 318, in convert_to_pdf
    objective_urls=get_objective_urls_list(filename)
  File "/usr/bin/chm2pdf", line 116, in get_objective_urls_list
    flist=open(CHM2PDF_WORK_DIR+'/urlslist.txt','rU')
IOError: [Errno 2] No such file or directory: '/tmp/tmpowiRNl/Filename with
spaces/urlslist.txt'

Using Ubuntu Jaunty 9.04.
/usr/bin/chm2pdf version 0.9.1

Thanks

Original issue reported on code.google.com by [email protected] on 20 Aug 2009 at 5:59

Failing if file name is more than one word

Let's say I have a file "Cool Scripts.chm". Trying to convert it
to PDF will fail.
> 
> $chm2pdf "Cool Scripts.chm"
> failed to open Cool
> failed to open Cool
> Converting individual HTML pages in PDF...
> Traceback (most recent call last):
>   File "/usr/bin/chm2pdf", line 176, in <module>
>     main(sys.argv)
>   File "/usr/bin/chm2pdf", line 172, in main
>     convert_to_pdf(cfile, filename, outputfilename)
>   File "/usr/bin/chm2pdf", line 106, in convert_to_pdf
>     pf=open(page_filename,'r')
> IOError: [Errno 2] No such file or directory:
'../tempout//8015final/toc.html'
> 


To work around this bug, do the following (for your file):

$cp "Cool Scripts.chm" CS.chm
$chm2pdf CS.chm

Original issue reported on code.google.com by [email protected] on 2 Nov 2007 at 5:37

Option --book does not work

What steps will reproduce the problem?
1. Take a .chm file (don't know if it's reproducible with any .chm)
2. Run the 'chm2pdf' command using '--book' option

What is the expected output? What do you see instead?

I was expecting the PDF book :-). However, I see this error message:

sergio@miki ~/media/livros/understanding_llinux_kernel $ chm2pdf --book 
ULK.chm ULK.pdf
ERR002: Error: no pages generated! (did you remember to use webpage mode?
Something wrong happened when launching htmldoc.
exit value:  256
Check if output exists or if it is good.
Done.

What version of the product are you using? On what operating system?

sergio@miki ~/media/livros/understanding_llinux_kernel $ chm2pdf --version
/usr/bin/chm2pdf version 0.9.1

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 22 Nov 2008 at 9:40

exits without error, no pdf generated (hides htmldoc Segfault)

Program "chm2pdf" ignores errors produced during the execution of
"htmldoc", thus
returning the message "file.pdf written. Done" even when no pdf is created.

After examining the program with "strace", it seems that "htmldoc" segfaults
at some point, and this error is not captured by "chm2pdf". So they are
really two bugs.
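
A minimal Python 3 sketch of how a wrapper could notice that a child process failed or was killed by a signal. This is not chm2pdf's code: `run_checked` is a hypothetical helper, and the child commands below are stand-ins that use the Python interpreter so the example runs without htmldoc installed.

```python
import subprocess
import sys

def run_checked(argv):
    """Run argv; return True only if the child exited with status 0."""
    result = subprocess.run(argv)
    if result.returncode < 0:
        # On POSIX a negative value means the child was killed by a signal
        # (for example -11 for SIGSEGV).
        print("child killed by signal", -result.returncode)
        return False
    if result.returncode != 0:
        print("child failed with exit status", result.returncode)
        return False
    return True

# Stand-in children: the first succeeds, the second fails.
print(run_checked([sys.executable, "-c", "pass"]))
print(run_checked([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

A wrapper built this way would print "file.pdf written" only when the call returns True. Whether every non-zero htmldoc exit status should count as fatal is a separate decision that this sketch does not settle.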

I'll send you the concrete .CHM if it helps.


mremap(0xb76be000, 282624, 286720, MREMAP_MAYMOVE) = 0xb76be000
brk(0xa6a8000)                          = 0xa6a8000
brk(0xa6c9000)                          = 0xa6c9000
brk(0xa6ea000)                          = 0xa6ea000
mremap(0xb76be000, 286720, 290816, MREMAP_MAYMOVE) = 0xb76be000
--- SIGSEGV (Segmentation fault) @ 0 (0) ---

[...]

mmap2(NULL, 120802, PROT_READ, MAP_PRIVATE, 6, 0) = 0xb7c58000
close(6)                                = 0
write(2, "sh: line 1: 22511 Violaci\363n de segmento   htmldoc --duplex
--format \'pdf14\' --jpeg=\'100\' --linkcolor \'blue\' --header \'c C\'
--size \'a4\' --linkstyle \'plain\' --embedfonts --book --footer \'c C\' [...]


-- System Information:
Debian Release: lenny/sid
 APT prefers unstable
 APT policy: (500, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.24 (PREEMPT)
Locale: LANG=es_ES.UTF-8, LC_CTYPE=es_ES.UTF-8 (charmap=ISO-8859-1)
(ignored: LC_ALL set to es_ES)
Shell: /bin/sh linked to /bin/bash

Versions of packages chm2pdf depends on:
ii  htmldoc                     1.8.27-3     HTML processor that generates inde
ii  libchm-bin                  2:0.39-7     library for dealing with Microsoft
ii  python                      2.5.2-1      An interactive high-level object-o
ii  python-chm                  0.8.4-0.1+b1 Python binding for CHMLIB
ii  python-support              0.7.7        automated rebuilding support for P

chm2pdf recommends no packages.

-- no debconf information

Original issue reported on code.google.com by [email protected] on 23 Apr 2008 at 4:51

spaces on .chm filenames are not properly escaped


spaces on .chm filenames are not properly escaped, e.g:
chm2pdf --book "file with spaces.chm"


-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.25 (PREEMPT)
Locale: LANG=es_ES.UTF-8, LC_CTYPE=es_ES.UTF-8 (charmap=ISO-8859-1)
(ignored: LC_ALL set to es_ES)
Shell: /bin/sh linked to /bin/bash

Versions of packages chm2pdf depends on:
ii  htmldoc                     1.8.27-3     HTML processor that generates inde
ii  libchm-bin                  2:0.39-9     library for dealing with Microsoft
ii  python                      2.5.2-1      An interactive high-level object-o
ii  python-chm                  0.8.4-0.1+b1 Python binding for CHMLIB
ii  python-support              0.8.1        automated rebuilding support for P

chm2pdf recommends no packages.

-- no debconf information

Original issue reported on code.google.com by [email protected] on 7 Jul 2008 at 2:19

some defensive safety net for string parsing in get_objective_urls_list(filename)


Hi,

 chm2pdf was crashing on some files I had. The problem was with the files themselves: chm2pdf
was aborting midway through generating the URLs in urlslist.txt, and an error 
message was being added 
to urlslist.txt too.

The error message appended to urlslist.txt was causing chm2pdf to 
crash later on, because at

line: 119 (of trunk) 

  spline[5] wouldn't work.

Adding a simple check 

 if len(spline) > 5: urls_list.append(spline[5])

takes care of it. Now it exits more gracefully, having generated as much of 
the PDF as it could.

Thanks and regards

 -- sreangsu

Original issue reported on code.google.com by [email protected] on 21 Feb 2010 at 8:12

Insecure temporary file creation

Please refer to this for details. A patch can be found in the Debian source
package, please merge this into your repository.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=501959
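
For context, a generic Python 3 sketch of creating a private temporary directory with the standard `tempfile` module instead of a fixed, predictable path under /tmp. This is not the Debian patch, which is not reproduced here; the prefix and file name are illustrative.

```python
import os
import shutil
import tempfile

# mkdtemp creates a directory with a random name that is readable and
# writable only by the current user, so another user cannot pre-create it.
work_dir = tempfile.mkdtemp(prefix="chm2pdf-work-")   # prefix is illustrative
try:
    list_path = os.path.join(work_dir, "urlslist.txt")
    with open(list_path, "w") as handle:
        handle.write("example\n")
    print(os.path.isdir(work_dir))
finally:
    # Remove the whole tree when done.
    shutil.rmtree(work_dir)
```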

Original issue reported on code.google.com by [email protected] on 22 Nov 2008 at 5:26

HTML Error

What steps will reproduce the problem?
1. converting to PDF
2.
3.

What is the expected output? What do you see instead?
PDF version of CHM file

What version of the product are you using? On what operating system?
latest on Ubuntu Linux 7.10

Please provide any additional information below.
Upon running > sudo /usr/bin/chm2pdf --book myChm.chm newPdf.pdf, I get
the error > ERR011: Unable to parse HTML element on line 512!
ERR002: Error: no pages generated! (did you remember to use webpage mode?
This is odd because the CHM in question is a BOOK, and not a webpage...

Original issue reported on code.google.com by [email protected] on 13 Feb 2008 at 7:33

IOError: [Errno 21] Is a directory

What steps will reproduce the problem?
1. Normal --book conversion
2.
3.

What is the expected output? What do you see instead?
Converted PDF

What version of the product are you using? On what operating system?
0.9.1-1.1ubuntu1, ubuntu linux

Please provide any additional information below.

chm2pdf --book haha.chm 
Traceback (most recent call last):
  File "/usr/bin/chm2pdf", line 1098, in <module>
    main(sys.argv)
  File "/usr/bin/chm2pdf", line 1092, in main
    convert_to_pdf(cfile, filename, outputfilename, options)
  File "/usr/bin/chm2pdf", line 386, in convert_to_pdf
    correct_file(page_filename, htmlout_filename, html_list,
objective_urls, options)
  File "/usr/bin/chm2pdf", line 131, in correct_file
    pf=open(input_file,'rU')
IOError: [Errno 21] Is a directory: '/tmp/tmpr7s2RQ/haha/'

Original issue reported on code.google.com by [email protected] on 21 Oct 2009 at 10:06

Images not rendered in PDF due to upper/lower case spelling error

I have a CHM file with images, and some of them do not appear in the PDF. The 
reason is (again) that in Windows paths and names are not case sensitive, but 
in Linux they are. So basically the problem is this: a mismatch in upper/lower 
case somewhere in the CHM is enough. The CHM will display correctly in Windows 
but cannot be converted completely with chm2pdf.
The curious part is that in my case, the images not displayed were written 
correctly, but they were in the same subdirectory as other images from other 
pages, and on one of those other pages the subdirectory was written in lower case. 
So the page where images are missing in the PDF is not necessarily the page where 
the misspelled upper/lower case is; it can be any other page. Probably what 
counts is how the path is spelled the first time it is encountered when generating 
the CHM source file...
Does anyone have ideas on how this could be solved automagically in chm2pdf?
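
One possible approach, sketched in Python 3 under the assumption that the extracted files sit in a directory tree on a case-sensitive filesystem: look up each path component by comparing lower-cased names. `resolve_case_insensitive` is a hypothetical helper, not part of chm2pdf.

```python
import os
import tempfile

def resolve_case_insensitive(root, relpath):
    """Return the on-disk path under root that matches relpath ignoring case, or None."""
    current = root
    for part in relpath.replace("\\", "/").split("/"):
        if part in ("", "."):
            continue
        if not os.path.isdir(current):
            return None
        wanted = part.lower()
        matches = [name for name in os.listdir(current) if name.lower() == wanted]
        if not matches:
            return None
        # If several names differ only by case, the first one found is used.
        current = os.path.join(current, matches[0])
    return current

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "doc", "Images"))
    open(os.path.join(root, "doc", "Images", "param_MS.png"), "w").close()
    found = resolve_case_insensitive(root, "DOC/images/PARAM_ms.PNG")
    print(os.path.relpath(found, root))
```

The sketch does not handle ".." components, and it picks an arbitrary match when two names differ only by case, so it is a starting point rather than a complete fix.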

Original issue reported on code.google.com by [email protected] on 18 Nov 2011 at 6:05

Errors

I followed the steps at this link 
 "http://www.karakas-online.de/forum/viewtopic.php?t=10275" but I
am facing the error below: 
1>chm2pdf --book my-file.chm 

root@AmSi:/home/amaresh/Desktop# chm2pdf --book Glass\,Ables\ -\ Linux\
for\ Programmers\ and\ Users\ \(Prentice\,\ 2006\).chm 
CHM2PDF_WORK_DIR = /tmp/chm2pdf/work/Glass,Ables - Linux for Programmers
and Users (Prentice, 2006)
CHM2PDF_ORIG_DIR = /tmp/chm2pdf/orig/Glass,Ables - Linux for Programmers
and Users (Prentice, 2006)
Removing any previous temporary files
sh: Syntax error: "(" unexpected
sh: Syntax error: "(" unexpected
sh: Syntax error: "(" unexpected
sh: Syntax error: "(" unexpected
Traceback (most recent call last):
  File "/usr/bin/chm2pdf", line 887, in <module>
    main(sys.argv)
  File "/usr/bin/chm2pdf", line 883, in main
    convert_to_pdf(cfile, filename, outputfilename, options)
  File "/usr/bin/chm2pdf", line 180, in convert_to_pdf
    objective_urls=get_objective_urls_list(filename)
  File "/usr/bin/chm2pdf", line 98, in get_objective_urls_list
    flist=open(CHM2PDF_WORK_DIR+'/urlslist.txt','r')
IOError: [Errno 2] No such file or directory:
'/tmp/chm2pdf/work/Glass,Ables - Linux for Programmers and Users (Prentice,
2006)/urlslist.txt'

Waiting for solution, 

2>chm2pdf --book --title my-file.chm 
------------------
root@AmSi:/home/amaresh/Desktop# chm2pdf --book --title Glass\,Ables\ -\
Linux\ for\ Programmers\ and\ Users\ \(Prentice\,\ 2006\).chm 
CHM2PDF_WORK_DIR = /tmp/chm2pdf/work/Glass,Ables - Linux for Programmers
and Users (Prentice, 2006)
CHM2PDF_ORIG_DIR = /tmp/chm2pdf/orig/Glass,Ables - Linux for Programmers
and Users (Prentice, 2006)
Removing any previous temporary files
sh: Syntax error: "(" unexpected
sh: Syntax error: "(" unexpected
sh: Syntax error: "(" unexpected
sh: Syntax error: "(" unexpected
Traceback (most recent call last):
  File "/usr/bin/chm2pdf", line 887, in <module>
    main(sys.argv)
  File "/usr/bin/chm2pdf", line 883, in main
    convert_to_pdf(cfile, filename, outputfilename, options)
  File "/usr/bin/chm2pdf", line 180, in convert_to_pdf
    objective_urls=get_objective_urls_list(filename)
  File "/usr/bin/chm2pdf", line 98, in get_objective_urls_list
    flist=open(CHM2PDF_WORK_DIR+'/urlslist.txt','r')
IOError: [Errno 2] No such file or directory:
'/tmp/chm2pdf/work/Glass,Ables - Linux for Programmers and Users (Prentice,
2006)/urlslist.txt'
--------------------------

Original issue reported on code.google.com by amareshchandradas2005 on 7 May 2009 at 9:24

No effort is made in chm2pdf to delete JavaScript

I was able to eliminate one of my ERR011: Unable to parse HTML element
on line xx! errors.

My CHM file contained some JavaScript, but chm2pdf makes no effort
to delete JavaScript (some other unwanted stuff is deleted
before everything is passed to the htmldoc part).

I am no regex expert, so the following may not be a good solution,
but at least in my case one ERR011 is gone!

    # Delete javascript (<script type='text/javascript'>...</script>)
    page=re.sub('(?i)<script type=("|\')text/javascript("|\')(.*?)>(.*?)</script>','', page, flags=re.DOTALL|re.MULTILINE)

Original issue reported on code.google.com by [email protected] on 14 Nov 2011 at 9:49

TOC output problem: all headings with one word on each line

What steps will reproduce the problem?
1. Get a .chm with a TOC
2. Do a chm2pdf --webpage on it
3. TOC has links to the right places but all whitespace has been replaced with 
newlines.

What is the expected output? What do you see instead?
The TOC should transfer as it is. Instead it transfers with all words in a 
heading on a separate 
line.

What version of the product are you using? On what operating system?
chm2pdf 0.9, OS is MacOS X 10.5.2 on Intel x86.

Please provide any additional information below.

 It is uncertain at this point which .chm files produce this broken TOC output. All the files I have 
been able to get my hands on break when converted.

Original issue reported on code.google.com by [email protected] on 4 Apr 2008 at 8:56

Centos 5.1 install convert error

What steps will reproduce the problem?
1.  Install on Centos 5.1
2. chm2pdf --book RHCEStudy.chm 

3.

What is the expected output? What do you see instead?
rm: cannot remove `/tmp/chm2pdf/orig/RHCEStudy/*': No such file or directory
rm: cannot remove `/tmp/chm2pdf/work/RHCEStudy/*': No such file or directory
sh: /tmp/chm2pdf/work/RHCEStudy/urlslist.txt: No such file or directory
Traceback (most recent call last):
  File "/usr/bin/chm2pdf", line 1111, in ?
    main(sys.argv)
  File "/usr/bin/chm2pdf", line 1107, in main
    convert_to_pdf(cfile, filename, outputfilename, options)
  File "/usr/bin/chm2pdf", line 326, in convert_to_pdf
    objective_urls=get_objective_urls_list(filename)
  File "/usr/bin/chm2pdf", line 114, in get_objective_urls_list
    flist=open(CHM2PDF_WORK_DIR+'/urlslist.txt','rU')
IOError: [Errno 2] No such file or directory: 
'/tmp/chm2pdf/work/RHCEStudy/urlslist.txt'


What version of the product are you using? On what operating system?
0.9.1  Centos 5.1

Please provide any additional information below.

 Trying to convert a .chm file to .pdf

Original issue reported on code.google.com by [email protected] on 9 Aug 2008 at 5:28

--book option fail with structured chm book

What steps will reproduce the problem?
1. run command on chm book
2.
3.

What is the expected output? What do you see instead?
I expect to get a structured pdf file.
Instead I get
ERR002: Error: no pages generated! (did you remember to use webpage mode?
Something wrong happened when launching htmldoc.
exit value:  256
Check if output exists or if it is good.
Done.


What version of the product are you using? On what operating system?
I am running v. 9.1 on Ubuntu Hardy Heron

Please provide any additional information below.
I saw that a similar bug was posted, but tagged invalid after instructions
to read the man page were given.  I read the man page, and this seems to be
a bug, or an issue with certain chm format books.  The chm files in
question have navigation capabilities.  Unless I am mistaken, that means
they are structured.

Original issue reported on code.google.com by [email protected] on 23 Feb 2009 at 9:18

Missing pages in some case

Hi,
I'm trying to convert a CHM book to PDF, but only the first 3 pages get
converted. After some debugging, I think the problem is in the PageLister class.

In the start_param() method, changing line 62 from:

if key=='name' and value=='Local'

to:

if key=='name' and value.lower()=='local'

solved my problem. Apparently some of the fields are named 'local' rather than 'Local'.
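
The same idea as a self-contained Python 3 sketch. It uses the standard `html.parser.HTMLParser` rather than the sgmllib-based PageLister from the script, and `LocalCollector` is a hypothetical class written only to illustrate the case-insensitive comparison.

```python
from html.parser import HTMLParser

class LocalCollector(HTMLParser):
    """Collect the value of every <param name="Local" ...>, ignoring case."""

    def __init__(self):
        super().__init__()
        self.pages = []

    def handle_starttag(self, tag, attrs):
        if tag != "param":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        # Compare in lower case so that 'Local' and 'local' both match.
        if name.lower() == "local" and attributes.get("value"):
            self.pages.append(attributes["value"])

collector = LocalCollector()
collector.feed('<param name="Local" value="a.htm"><param name="local" value="b.htm">')
print(collector.pages)
```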

Original issue reported on code.google.com by [email protected] on 10 Jan 2009 at 10:10

SGMLParseError

What steps will reproduce the problem?
1. Install dependencies on Debian
2. Run chm2pdf --book MyFile.chm MyFile.pdf

What is the expected output? What do you see instead?
Traceback (most recent call last):
  File "/usr/bin/chm2pdf", line 887, in ?
    main(sys.argv)
  File "/usr/bin/chm2pdf", line 883, in main
    convert_to_pdf(cfile, filename, outputfilename, options)
  File "/usr/bin/chm2pdf", line 242, in convert_to_pdf
    correct_file(page_filename, htmlout_filename, html_list, objective_urls)
  File "/usr/bin/chm2pdf", line 118, in correct_file
    image_catcher.feed(page)
  File "/usr/lib/python2.4/sgmllib.py", line 95, in feed
    self.goahead(0)
  File "/usr/lib/python2.4/sgmllib.py", line 165, in goahead
    k = self.parse_declaration(i)
  File "/usr/lib/python2.4/markupbase.py", line 95, in parse_declaration
    decltype, j = self._scan_name(j, i)
  File "/usr/lib/python2.4/markupbase.py", line 384, in _scan_name
    self.error("expected name token at %r"
  File "/usr/lib/python2.4/sgmllib.py", line 102, in error
    raise SGMLParseError(message)
sgmllib.SGMLParseError: expected name token at
'<!\xaf\xb6\x8f\x83|(F?\xe1\x1c\xd2\xbf\xf0\x15?\xc2\x9a\xde'


What version of the product are you using? On what operating system?
chm2pdf-0.9, GNU/Linux Debian Etch

Please provide any additional information below.
None

Regards

Original issue reported on code.google.com by [email protected] on 27 Apr 2008 at 8:28

RFE: Fix Links

It'd be nice if internal links worked in the PDF.  I believe they break
because of the page-by-page approach (using 'pdftk cat') to pdf production.

Perhaps it is better to extract all pages & to use a file list with htmldoc.

See:
http://www.mobileread.com/forums/attachment.php?attachmentid=1794&d=1160611136
discussed at:
http://www.mobileread.com/forums/showthread.php?t=7999

for a system that does preserve links.

Original issue reported on code.google.com by [email protected] on 7 Sep 2007 at 1:13

Multiple pages generated instead of one


I have a CHM with 4 pages of less than 10 lines of text and cross-links (see 
example in the attachment generated expressly to reproduce problem). The PDF 
generated by chm2pdf is of 15 pages.

The CHM was made with the Microsoft HTML Help Workshop 4.74.8702.0 (the latest one). 
The script is CHM2PDF 0.9.1.1ubuntu5 on the latest Ubuntu 11.10.

see --verbose output:

Example.chm:
--> /#IDXHDR
--> /#ITBITS
--> /#IVB
--> /#STRINGS
--> /#SYSTEM
--> /#TOPICS
--> /#URLSTR
--> /#URLTBL
--> /$FIftiMain
--> /$OBJINST
--> /$WWAssociativeLinks/Property
--> /$WWKeywordLinks/Property
--> /doc/Images/Param_COMP_htm_5b06edf4.bmp
--> /doc/Images/Param_COMP_htm_5b06edf4.GIF
--> /doc/Images/Param_COMP_htm_5b06edf4.PNG
--> /doc/Images/param_MS.png
--> /doc/Index.hhk
--> /doc/P1.htm
--> /doc/P2.htm
--> /doc/P3.htm
--> /doc/P4.htm
--> /toc.hhc
Correcting /tmp/tmpr5QJmW/Example/doc/P1.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P2.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P2.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P2.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P3.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P3.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P3.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P3.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P4.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P4.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P4.htm
############### 1st pass ###############
match P1\.htm and replace it with temp0001_html
match P2\.htm and replace it with temp0002_html
match P2\.htm and replace it with temp0002_html
match P2\.htm and replace it with temp0002_html
match P3\.htm and replace it with temp0005_html
match P3\.htm and replace it with temp0005_html
match P3\.htm and replace it with temp0005_html
match P3\.htm and replace it with temp0005_html
match P4\.htm and replace it with temp0009_html
match P4\.htm and replace it with temp0009_html
match P4\.htm and replace it with temp0009_html

############### 2nd pass ###############
match temp0001_html and replace it with temp0001.html
match temp0002_html and replace it with temp0002.html
match temp0003_html and replace it with temp0003.html
match temp0004_html and replace it with temp0004.html
match temp0005_html and replace it with temp0005.html
match temp0006_html and replace it with temp0006.html
match temp0007_html and replace it with temp0007.html
match temp0008_html and replace it with temp0008.html
match temp0009_html and replace it with temp0009.html
match temp0010_html and replace it with temp0010.html
match temp0011_html and replace it with temp0011.html

htmldoc --webpage --duplex --format 'pdf14' --jpeg='100' --linkcolor 'blue' 
--header 'c C' --size 'a4' --no-duplex --linkstyle 'plain' --embedfonts 
--bodyfont times --footer 'c C'  "/tmp/tmpz5hkxw/Example/temp0001.html" 
"/tmp/tmpz5hkxw/Example/temp0002.html" "/tmp/tmpz5hkxw/Example/temp0003.html" 
"/tmp/tmpz5hkxw/Example/temp0004.html" "/tmp/tmpz5hkxw/Example/temp0005.html" 
"/tmp/tmpz5hkxw/Example/temp0006.html" "/tmp/tmpz5hkxw/Example/temp0007.html" 
"/tmp/tmpz5hkxw/Example/temp0008.html" "/tmp/tmpz5hkxw/Example/temp0009.html" 
"/tmp/tmpz5hkxw/Example/temp0010.html" "/tmp/tmpz5hkxw/Example/temp0011.html" 
-f example.pdf > /dev/null
PAGES: 15
BYTES: 211921                                                                  
Written file example.pdf
Done.

Original issue reported on code.google.com by [email protected] on 10 Nov 2011 at 8:21

Attachments:

CHM2PDF_Example.zip

Last page of CHM incompletely rendered or missing

I have some trouble with the last pages of my documents.

In one case, I get some "ERR011: Unable to parse HTML element on line 49!" errors from 
htmldoc on the last pages.
The strange thing is that if I re-run the very same htmldoc command displayed 
with the --verbose --verbosity high level (obviously without deleting the temporary 
files), the PDF is complete and no ERR011 error is raised.
It's as if the last file written by the conversion before invoking HTMLDOC is 
still open or not completely written!

In another case, the last page is simply missing completely, without any error 
message.

But the code looks just fine to me:

            pf=open(filename,'w')
            pf.write(page)
            pf.close

What could be wrong??

Writing a dummy file afterwards, before invoking HTMLDOC, resolves the problem, 
but this seems quite an ugly hack to me (I am not a programmer).

Then I found the command
           pf.flush()

This also solves the problem. But why is this necessary??
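
A likely explanation: in Python, `pf.close` without parentheses only refers to the method and does not call it, so the file is never explicitly closed and buffered data may still be unwritten when htmldoc starts. `pf.flush()` helps because it is an actual call. A minimal Python 3 sketch of a safer pattern follows; `write_page` is a hypothetical helper, not a function from the script.

```python
import os
import tempfile

def write_page(filename, page):
    """Write page to filename; the with-block closes the file on exit."""
    with open(filename, "w") as pf:
        pf.write(page)
    # At this point the file is closed, so its contents have been flushed
    # and are visible to other processes.

target = os.path.join(tempfile.mkdtemp(), "temp0001.html")
write_page(target, "<html><body>last page</body></html>")
with open(target) as check:
    print(check.read())
```

Writing `pf.close()` with parentheses in the quoted snippet would have the same effect as the `with` block.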

Original issue reported on code.google.com by [email protected] on 26 Nov 2011 at 9:27

Attachments:

chm2pdf_flush.diff

chm2pdf deletes data directories when --extract-only is used

What steps will reproduce the problem?
1. run chm2pdf --extract-only <filename.chm>
2.
3.

What is the expected output? What do you see instead?
This should produce a data directory containing html files.  Instead, data 
directory is deleted.

What version of the product are you using? On what operating system?
chm2pdf v 0.9.1

Please provide any additional information below.

Program source code (with line numbers): 

1069    CHM2PDF_WORK_DIR = CHM2PDF_TEMP_WORK_DIR + os.sep + basename
1070    CHM2PDF_ORIG_DIR = CHM2PDF_TEMP_ORIG_DIR + os.sep + basename

...

1102     convert_to_pdf(cfile, filename, outputfilename, options)
1103     shutil.rmtree(CHM2PDF_TEMP_WORK_DIR)
1104     shutil.rmtree(CHM2PDF_TEMP_ORIG_DIR)


This shows that WORK_DIR and ORIG_DIR are *below* TEMP_WORK_DIR and 
TEMP_ORIG_DIR, and so are deleted at lines 1103, 1104.

Program needs to test for options['extract-only']=='' before calling 
shutil.rmtree(CHM2PDF_TEMP_WORK_DIR).
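
A sketch of the suggested guard in Python 3. How the script stores the option is only known from the `options['extract-only']` expression above, so the boolean `extract_only` parameter and the `cleanup` helper below are assumptions made for illustration.

```python
import os
import shutil
import tempfile

def cleanup(temp_dirs, extract_only):
    """Remove the temporary trees unless the user asked to keep the extracted files."""
    if extract_only:
        return
    for path in temp_dirs:
        shutil.rmtree(path, ignore_errors=True)

kept = tempfile.mkdtemp()
removed = tempfile.mkdtemp()
cleanup([kept], extract_only=True)       # directory survives
cleanup([removed], extract_only=False)   # directory is deleted
print(os.path.isdir(kept), os.path.isdir(removed))
shutil.rmtree(kept)
```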

Original issue reported on code.google.com by [email protected] on 6 Aug 2011 at 11:26

PDF file contains TOC only, main text is missing

chm2pdf 0.9

Bug submission following discussion in Google group chm2pdf.

When converting the NSIS User's Manual NSIS.chm, the resulting
PDF file contains only a table of content. The main text is missing.

CHM file is attached to this report and can also be downloaded
from http://nsis.sourceforge.net/

In the Google group discussion, Chris Karakas mentions that he
reproduces the problem with his development 0.9.1 version and asked
for a bug submission. Here it is...

> I tried both 0.9 and my "development" 0.9.1 version. The problem exists
> in both. It comes from the fact that the CHM file "says" that it contains
> files with names like "SectionF.21.html#F.21.1.2", but actually it
> contains files like "SectionF.21.html". That is, we have to take away the
> "anchor information" (the "#F.21.1.2" part), before dealing with the
> files in chm2pdf.
>
> This seems to be a bug, so please be so kind and open one. :-)
> 
> Chris
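
A minimal Python 3 sketch of removing the anchor part from such names with the standard library. `strip_anchors` is a hypothetical helper, and dropping the duplicates that result is an added assumption, not something stated in the discussion.

```python
from urllib.parse import urldefrag

def strip_anchors(urls):
    """Drop the '#fragment' part of each URL and remove resulting duplicates, keeping order."""
    seen = set()
    result = []
    for url in urls:
        base = urldefrag(url).url
        if base not in seen:
            seen.add(base)
            result.append(base)
    return result

print(strip_anchors([
    "SectionF.21.html#F.21.1.2",
    "SectionF.21.html",
    "SectionF.22.html#F.22.1",
]))
```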

Original issue reported on code.google.com by [email protected] on 13 Mar 2008 at 4:17

Attachments:

NSIS.chm

It is no longer necessary to compile your own chmlib, wiki update requested.

Greetings, thank you for this great program. I've created a python 2.6 portfile 
for macports.

I just wanted to note that it is no longer necessary to compile your own chmlib 
as the macports 
chmlib is configured with --enable-examples.

-james

Original issue reported on code.google.com by [email protected] on 9 Sep 2009 at 3:54

chm2pdf crashes on CHM file

What steps will reproduce the problem?
1. Running chm2pdf --book <chm file>
2.
3.

What is the expected output? What do you see instead?
Traceback (most recent call last):
  File "/usr/local/bin/chm2pdf", line 887, in <module>
    main(sys.argv)
  File "/usr/local/bin/chm2pdf", line 883, in main
    convert_to_pdf(cfile, filename, outputfilename, options)
  File "/usr/local/bin/chm2pdf", line 179, in convert_to_pdf
    html_list=get_html_list(cfile)
  File "/usr/local/bin/chm2pdf", line 88, in get_html_list
    lister.feed(topicstree)
  File "/usr/lib/python2.7/sgmllib.py", line 103, in feed
    self.rawdata = self.rawdata + data
TypeError: cannot concatenate 'str' and 'NoneType' objects


What version of the product are you using? On what operating system?
0.9.1-1.1ubuntu4

Please provide any additional information below.
I can provide the CHM file by email. Mail me at [email protected]

Original issue reported on code.google.com by [email protected] on 5 Aug 2011 at 5:24

links not working in the PDF with upper/lower case spelling error

In my application some links do not work in the PDF because the links contain 
upper/lower case errors. Since CHM is "Windows stuff", this doesn't matter 
there, but "here" it does!

So how about making the first-pass matching case insensitive by adding the 
(?i) modifier to the regular expression?
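
A minimal sketch of case-insensitive link matching, assuming Python 3. The helper `replace_link` and the sample page are hypothetical; the actual first-pass expression in chm2pdf is not shown in this report. `re.IGNORECASE` has the same effect as a leading `(?i)` in the pattern.

```python
import re


def replace_link(page, target, replacement):
    """Replace target in page, ignoring case (hypothetical helper)."""
    pattern = re.compile(re.escape(target), re.IGNORECASE)
    # A function replacement keeps backslashes in `replacement` literal.
    return pattern.sub(lambda _match: replacement, page)


page = '<a href="Intro.HTM">intro</a>'
print(replace_link(page, "intro.htm", "temp0001.html"))
```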

Original issue reported on code.google.com by [email protected] on 18 Nov 2011 at 6:05

No tables in output file

What steps will reproduce the problem?
1. chm2pdf --book filename

Output:
PAGES: 142
BYTES: 824607                                                                  
Written file tdd.pdf
Done.


What is the expected output? What do you see instead?
The CHM contains tables, but the PDF has no tables at all; the text
that should be in them has disappeared.

What version of the product are you using? On what operating system?


Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 11 Nov 2009 at 1:30

SGML parser error "unexpected '\xbd' char in declaration" while converting a chm to pdf

What steps will reproduce the problem?
1. /usr/bin/python /usr/bin/chm2pdf --book book.chm book.pdf

What is the expected output? What do you see instead?
Expected: the CHM is successfully converted to PDF. Instead, chm2pdf crashes with this error:
sgmllib.py:111:error:SGMLParseError: unexpected '\xbd' char in declaration

What version of the product are you using? On what operating system?
chm2pdf-0.9.1 on Fedora 12.

Please provide any additional information below.

backtrace
-----
sgmllib.py:111:error:SGMLParseError: unexpected '\xbd' char in declaration

Traceback (most recent call last):
  File "/usr/bin/chm2pdf", line 1111, in <module>
    main(sys.argv)
  File "/usr/bin/chm2pdf", line 1107, in main
    convert_to_pdf(cfile, filename, outputfilename, options)
  File "/usr/bin/chm2pdf", line 394, in convert_to_pdf
    correct_file(page_filename, htmlout_filename, html_list, objective_urls,
options)
  File "/usr/bin/chm2pdf", line 140, in correct_file
    image_catcher.feed(page)
  File "/usr/lib64/python2.6/sgmllib.py", line 104, in feed
    self.goahead(0)
  File "/usr/lib64/python2.6/sgmllib.py", line 174, in goahead
    k = self.parse_declaration(i)
  File "/usr/lib64/python2.6/markupbase.py", line 136, in parse_declaration
    "unexpected %r char in declaration" % rawdata[j])
  File "/usr/lib64/python2.6/sgmllib.py", line 111, in error
    raise SGMLParseError(message)
SGMLParseError: unexpected '\xbd' char in declaration

Local variables in innermost frame:
message: "unexpected '\\xbd' char in declaration"
self: <__main__.ImageCatcher instance at 0x7fa85d2bdef0>

Bugzilla bug at https://bugzilla.redhat.com/show_bug.cgi?id=629659

Original issue reported on code.google.com by lakshminaras2002 on 14 May 2011 at 11:13

Error when chm2pdf starts

What steps will reproduce the problem?
> chm2pdf

What is the expected output? What do you see instead?

Traceback (most recent call last):
File "/usr/local/bin/chm2pdf", line 24, in <module>
import chm.chm as chm
ImportError: No module named chm.chm
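
A minimal sketch for checking whether the `chm` package (provided by pychm) is visible to the interpreter that runs chm2pdf, assuming Python 3. The function name `pychm_available` is hypothetical.

```python
import importlib.util


def pychm_available():
    """Return True if a top-level 'chm' package can be found."""
    return importlib.util.find_spec("chm") is not None


if not pychm_available():
    print("The 'chm' package (pychm) was not found for this interpreter.")
```

Running this with the same python binary that the chm2pdf script uses shows whether pychm was installed for a different Python installation.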

What version of the product are you using? On what operating system?
I use 9.1 version of chm2pdf on linux 2.6.38-11-generic

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 12 Sep 2011 at 12:58

exit value: 16640

Something wrong happened when launching htmldoc.
exit value:  16640
ERR011: Unable to read image file "/tmp/tmpVEDMGt/UNIX\ System\ Programming/"!

Original issue reported on code.google.com by [email protected] on 3 Dec 2010 at 3:33

Fails if the file name contains a single quote (')

What steps will reproduce the problem?
1. rename a legitimate chm file into abc\'xyz 
2. chm2pdf --book abc\'xyz

What is the expected output? What do you see instead?

Got an error saying a tmp folder doesn't exist.

sh: Syntax error: Unterminated quoted string
Traceback (most recent call last):
  File "/usr/bin/chm2pdf", line 1108, in <module>
    main(sys.argv)
  File "/usr/bin/chm2pdf", line 1102, in main
    convert_to_pdf(cfile, filename, outputfilename, options)
  File "/usr/bin/chm2pdf", line 318, in convert_to_pdf
    objective_urls=get_objective_urls_list(filename)
  File "/usr/bin/chm2pdf", line 116, in get_objective_urls_list
    flist=open(CHM2PDF_WORK_DIR+'/urlslist.txt','rU')
IOError: [Errno 2] No such file or directory: "/tmp/tmp9iuwl8/abc'xyz/urlslist.txt"


What version of the product are you using? On what operating system?

Version 0.9.1
Ubuntu 10.04

Please provide any additional information below.

The problem can be fixed by adding the following lines.

1069d1068
<     basename = '_' + re.sub(r'[^\w]', '', basename)
1091d1089
<     filename = filename.replace("\'", "\\\'") 
1093d1090
<     outputfilename = outputfilename.replace("\'", "\\\'")
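
The "Unterminated quoted string" message from sh suggests the file name is pasted into a shell command string. As an alternative to escaping quotes by hand, here is a minimal sketch using the standard library, assuming Python 3. It is not the patch above; the child command is a stand-in that only echoes its argument, since the external commands chm2pdf runs are not shown in this report.

```python
import shlex
import subprocess
import sys

filename = "abc'xyz.chm"

# Option 1: quote the name before embedding it in a shell command string.
quoted = shlex.quote(filename)

# Option 2: avoid the shell entirely by passing an argument list.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

The argument-list form also handles spaces in paths, since no shell ever splits the name.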

Original issue reported on code.google.com by [email protected] on 21 Sep 2012 at 5:36

Error with a huge chm

What steps will reproduce the problem?
1. executing chm2pdf with a "huge" chm (1054 pages)

What is the expected output? What do you see instead?
It gives an error at the re.sub call on line 159.

What version of the product are you using? On what operating system?
version 0.9.1

Please provide any additional information below.
Bug solved using: page=re.sub(re.escape(iurl),img_filename,page)
The iurl probably contains some special characters.
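
A minimal sketch of why `re.escape` helps, assuming Python 3. The values of `page`, `iurl` and `img_filename` are invented for illustration; only the variable names come from the fix above.

```python
import re

page = '<img src="images/fig(1.png">'
iurl = "images/fig(1.png"
img_filename = "img0001.png"

# Unescaped, the "(" opens a regex group that is never closed.
try:
    re.sub(iurl, img_filename, page)
except re.error as exc:
    print("unescaped pattern failed:", exc)

# Escaped, the URL is matched as literal text.
fixed = re.sub(re.escape(iurl), img_filename, page)
print(fixed)
```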

Original issue reported on code.google.com by [email protected] on 11 Mar 2009 at 8:33

"List index out of range" error when CHM contain spaces in internal file structure

What steps will reproduce the problem?

1.  Execute chm2pdf on a CHM file that contains spaces in its internal file 
structure.

What is the expected output?

A shiny, new PDF.

What do you see instead?

user@computer ~ $ chm2pdf --book temp.chm
Traceback (most recent call last):
  File "/usr/bin/chm2pdf", line 1098, in <module>
    main(sys.argv)
  File "/usr/bin/chm2pdf", line 1092, in main
    convert_to_pdf(cfile, filename, outputfilename, options)
  File "/usr/bin/chm2pdf", line 318, in convert_to_pdf
    objective_urls=get_objective_urls_list(filename)
  File "/usr/bin/chm2pdf", line 121, in get_objective_urls_list
    urls_list.append(spline[5])
IndexError: list index out of range

What version of the product are you using? On what operating system?

0.9.1 on Linux Mint Debian Edition.

Please provide any additional information below.

A kind fellow named Reto has posted a solution here:

https://groups.google.com/forum/#!topic/chm2pdf/859fW7pSMWA

In get_objective_urls_list of the main script, change the contents of the for 
loop to the following:

for line in flist.readlines()[3:]:
    spline= re.sub(r".*?normal file\s*(.*?)\n$", "\\1", line)
    if spline[0]=="/":
        urls_list.append( spline)
flist.close()
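
A self-contained sketch of the same idea, assuming Python 3. It reuses the regular expression from the fix above but uses `match` instead of `sub`, so lines that do not fit the pattern are skipped. The sample lines are invented; the real layout of urlslist.txt is not shown in this report, and the function name is hypothetical.

```python
import re

# Same pattern as in the fix: capture everything after "normal file".
LINE_RE = re.compile(r".*?normal file\s*(.*?)\n$")


def objective_urls(lines):
    """Collect paths of normal files, keeping any spaces inside them."""
    urls = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(1).startswith("/"):
            urls.append(match.group(1))
    return urls


sample = [
    "1 0 1024 normal file /My Book/chapter one.html\n",
    "1 0 0 normal dir /My Book/\n",
]
print(objective_urls(sample))
```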

I know little Python and even less CHM, but the fix worked like a charm for me. 
 Now let's see if I can't do something about removing those annoying footers in 
the original CHM...

Thanks for all the time and work that's gone into chm2pdf.  It has already 
helped me out of a tight spot.

Original issue reported on code.google.com by [email protected] on 26 May 2013 at 1:29

"ImportError: No module named chm.chm"


sander@athlon64:~/chm2pdf-0.0.2$ chm2pdf ~/Azureus\ Downloads/Learning\ Something/Learning\ Something.chm
Traceback (most recent call last):
  File "/usr/bin/chm2pdf", line 11, in <module>
    import chm.chm as chm
ImportError: No module named chm.chm
sander@athlon64:~/chm2pdf-0.0.2$

Original issue reported on code.google.com by [email protected] on 19 Aug 2007 at 12:21

ImportError: No module named chm.chm

On openSUSE 10.3, I installed chm2pdf, chmlib (from source with
--enable-example), pychm (I extracted the tarball in /) and htmldoc (from
the SUSE repository), but this appears:

# chm2pdf
Traceback (most recent call last):
  File "/usr/local/bin/chm2pdf", line 24, in <module>
    import chm.chm as chm
ImportError: No module named chm.chm

Thanks

Original issue reported on code.google.com by [email protected] on 7 Dec 2007 at 8:36

Learn hardware-parts! Python too.

What steps will reproduce the problem?
1. download source
2. unpack to disk
3. look at source script, line 146

What is the expected output? What do you see instead?
144    f=open(output_file,'w')
145    f.write(page)
146    f.close    # BUG! <=========== MISSED "()"
147    #hack to guarantee that the file has been wholly written
148    f=open(output_file,'r')
149    while len(f.read()) < len(page):
150        pass
151    f.close()

INSTEAD:
146    f.close()

A method is not called without parentheses :)
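
Without the parentheses, `f.close` only refers to the method and never runs it, so the file stays open. A minimal sketch of an alternative using a `with` block, assuming Python 3; the function name `write_page` and the temporary path are hypothetical.

```python
import os
import tempfile


def write_page(output_file, page):
    """Write page to output_file; the with block closes the file on exit."""
    with open(output_file, "w") as f:
        f.write(page)


path = os.path.join(tempfile.mkdtemp(), "page.html")
write_page(path, "<html><body>hello</body></html>")

# The file is already closed and flushed here, so it can be read back at once.
with open(path) as f:
    print(f.read())
```

With the file closed properly, the read-back loop on lines 148 to 151 should no longer be needed.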


What version of the product are you using? On what operating system?
No matter.


Please provide any additional information below.
Use pylint or another tool to verify the source code. 
Don't do such stupid hacks in a language as clear as Python :D

Original issue reported on code.google.com by [email protected] on 4 Feb 2008 at 12:42

crash on debian squeeze

What steps will reproduce the problem?
1.  chm2pdf somefile.chm

What is the expected output? What do you see instead?

Segmentation fault.


What version of the product are you using? On what operating system?

os is debian 6. apt-get install chm2pdf


Please provide any additional information below.


I think this is an htmldoc error, so why not use wkhtmltopdf?

Original issue reported on code.google.com by huangmingyou on 25 Feb 2011 at 8:21

E: Couldn't find package pychm

What steps will reproduce the problem?
1. While installing through apt-get install <package>, I am getting this error
2. root@AmSi:/home/amaresh# apt-get install chmlib
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Package chmlib is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
E: Package chmlib has no installation candidate

3. root@AmSi:/home/amaresh# apt-get install pychm
Reading package lists... Done
Building dependency tree       
Reading state information... Done
E: Couldn't find package pychm

4. apt-get install htmldoc
Reading package lists... Done
Building dependency tree       
Reading state information... Done
htmldoc is already the newest version.
The following packages were automatically installed and are no longer required:
  rwhod libdb4.5
Use 'apt-get autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 10 not upgraded.


What is the expected output? What do you see instead?
E: Couldn't find package <name>

What version of the product are you using? On what operating system?
Ubuntu 8.10 ,Linux

Please provide any additional information below.

Original issue reported on code.google.com by amareshchandradas2005 on 7 May 2009 at 7:49

First color in form #00ff00 removed after a link

The first color information after a link is removed.

E.g this in orig:
<table>
<tr>
<td bgcolor="#00ff00">row 1, col1</td>
<td bgcolor="#00ff00">row 1, col2 <a href="P1.htm"> here a link</a></td>
<td bgcolor="#00ff00">row 1, col3</td>
</tr>
</table> 

Becomes this in work:
<table>
<tr>
<td bgcolor="#00ff00">row 1, col1</td>
<td bgcolor="#00ff00">row 1, col2 <a href="temp0001.html"> here a link</a></td>
<td bgcolor="">row 1, col3</td>
</tr>
</table>

Original issue reported on code.google.com by [email protected] on 13 Nov 2011 at 4:40

The page numbers of pdf aren't continuous

The page numbers of the created PDF restart from 1 for every section of
the CHM file.

Original issue reported on code.google.com by [email protected] on 2 Nov 2007 at 5:54
