jesparza / peepdf
Powerful Python tool to analyze PDF documents
Home Page: http://peepdf.eternal-todo.com
License: GNU General Public License v3.0
Due to the migration to GitHub, the update process in peepdf needs to be modified. It used to contact Google Code to retrieve the latest version of the files, but it should query GitHub now.
Hi,
I am trying to add a txt file to an existing PDF document.
#peepdf -i
ppdf>open my file
ppdf> embed /root/Bureau/share/file.txt text/plain
PPDF> *** Error: Exception not handled using the interactive console!! Please, report it to the author!!
Any idea, please?
Thank you.
I am trying to get a fully featured peepdf.py running on a current Debian Jessie (as well as on Mac OS X, but I won't go into depth on that OS here). Problem: _getting the dependencies on libemu and PyV8 installed._
Installing the libemu and lxml dependencies worked like this:
sudo apt-get install libemu2 python-libemu python-lxml libxml2
Getting the PyV8 dependency was only partially successful. A V8 package is available:
sudo apt-get install libv8-3.14.5
However, PyV8 is no longer maintained on Google Code (http://code.google.com/p/pyv8/). The latest prebuilt (non-Windows) binaries there are from 2010 and are based on Python 2.6 (while Debian Jessie now uses Python 2.7).
Trying it with
sudo pip install -v pyv8
leads to the following error message:
src/Wrapper.cpp: In static member function ‘static void CPythonObject::SetupObjectTemplate(v8::Handle<v8::ObjectTemplate>)’:
src/Wrapper.cpp:311:84: error: invalid conversion from ‘v8::Handle<v8::Boolean> (*)(v8::Local<v8::String>, const v8::AccessorInfo&)’ to ‘v8::NamedPropertyQuery {aka v8::Handle<v8::Integer> (*)(v8::Local<v8::String>, const v8::AccessorInfo&)}’ [-fpermissive]
clazz->SetNamedPropertyHandler(NamedGetter, NamedSetter, NamedQuery, NamedDeleter);
^
In file included from src/Exception.h:6:0,
from src/Wrapper.h:8,
from src/Wrapper.cpp:1:
/usr/include/v8.h:2414:8: note: initializing argument 3 of ‘void v8::ObjectTemplate::SetNamedPropertyHandler(v8::NamedPropertyGetter, v8::NamedPropertySetter, v8::NamedPropertyQuery, v8::NamedPropertyDeleter, v8::NamedPropertyEnumerator, v8::Handle<v8::Value>)’
void SetNamedPropertyHandler(NamedPropertyGetter getter,
^
src/Wrapper.cpp:312:94: error: invalid conversion from ‘v8::Handle<v8::Boolean> (*)(uint32_t, const v8::AccessorInfo&) {aka v8::Handle<v8::Boolean> (*)(unsigned int, const v8::AccessorInfo&)}’ to ‘v8::IndexedPropertyQuery {aka v8::Handle<v8::Integer> (*)(unsigned int, const v8::AccessorInfo&)}’ [-fpermissive]
clazz->SetIndexedPropertyHandler(IndexedGetter, IndexedSetter, IndexedQuery, IndexedDeleter);
^
In file included from src/Exception.h:6:0,
from src/Wrapper.h:8,
from src/Wrapper.cpp:1:
/usr/include/v8.h:2437:8: note: initializing argument 3 of ‘void v8::ObjectTemplate::SetIndexedPropertyHandler(v8::IndexedPropertyGetter, v8::IndexedPropertySetter, v8::IndexedPropertyQuery, v8::IndexedPropertyDeleter, v8::IndexedPropertyEnumerator, v8::Handle<v8::Value>)’
void SetIndexedPropertyHandler(IndexedPropertyGetter getter,
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
Cleaning up...
Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-8XX3Id/pyv8/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-DQ9h7E-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip-build-8XX3Id/pyv8
Traceback (most recent call last):
File "/usr/bin/pip", line 9, in <module>
load_entry_point('pip==1.5.6', 'console_scripts', 'pip')()
File "/usr/lib/python2.7/dist-packages/pip/__init__.py", line 248, in main
return command.main(cmd_args)
File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 161, in main
text = '\n'.join(complete_log)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 42: ordinal not in range(128)
Debugging and possibly solving this is beyond my capabilities.
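A note on the last line of that log: the UnicodeDecodeError looks like a secondary failure. Byte 0xe2 is the first byte of the UTF-8 encoding of the curly quote ‘ that gcc prints around identifiers, and pip appears to choke when it joins that output as ASCII. The real problem is still the compile error against the installed V8 headers. A minimal sketch of the decoding effect, written for Python 3 with a made-up log line (this is not pip's actual code):

```python
# A gcc-style diagnostic with curly quotes, as raw UTF-8 bytes (made-up sample).
log_line = "error: invalid conversion from ‘v8::Handle<v8::Boolean>’".encode("utf-8")


def decode_log(raw, encoding):
    """Return the decoded text, or the UnicodeDecodeError that was raised."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


ascii_result = decode_log(log_line, "ascii")  # fails on the first 0xe2 byte
utf8_result = decode_log(log_line, "utf-8")   # succeeds
print(type(ascii_result).__name__, "/", utf8_result)
```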
The next alternative, using the pyv8 sources from Google Code with
python setup.py build
sudo python setup.py install
completes without obvious error, but does not lead to final success either: when running peepdf.py, there is still the message
Warning: PyV8 is not installed!!
I came across this issue when playing with the current PeepDF code from the Git repo....
You can use this sample PDF file (handcoded by me) to reproduce the issue of this report:
The following is from an interactive peepdf session (I know I haven't installed PyV8 and pylibemu on this system, but this shouldn't matter for this issue):
kp@mbp:mrmcd15> peepdf.py -fli manuallycoded.pdf
Warning: PyV8 is not installed!!
Warning: pylibemu is not installed!!
File: manuallycoded.pdf
MD5: 758869f79d4fe30496db43b3bef8b708
SHA1: 4821d74bf84b6a0b757fd6c152c2e0a6d5fd3fa9
SHA256: 73016c523779e344ad45ee07dadd6812c14bde2015e6ba2e8e2638ef730c870e
Size: 2656 bytes
Version: 1.0
Binary: False
Linearized: False
Encrypted: False
Updates: 0
Objects: 9
Streams: 2
Comments: 0
Errors: 1
Version 0:
Catalog: 1
Info: 2
Objects (9): [1, 2, 3, 4, 5, 7, 8, 9, 10]
Streams (2): [8, 9]
Encoded (1): [9]
PPDF> object 9
<< /Length 503
/Filter [ /ASCIIHexDecode /ASCIIHexDecode /FlateDecode /LZWDecode /ASCIIHexDecode ] >>
stream
BT
/F1 60 Tf
30 400 Td
1 0 0 rg
(Hallo, MRMCD 2015) Tj
ET
endstream
PPDF> filters 9
[ /ASCIIHexDecode /ASCIIHexDecode /FlateDecode /LZWDecode /ASCIIHexDecode ]
PPDF> filters 9 ahx
<< /Length 228
/Filter /ASCIIHexDecode >>
stream
42540a202020202f4631203630202020202020202020202020202054660a20202020333020203430302020202020202020202020202054640a20202020312030203020202020202020202020202020202072670a202020202848616c6c6f2c204d524d434420323031352920546a0a45540a
endstream
PPDF> filters 9 ahx > stream9.ahx
PPDF> quit
Leaving the Peepdf interactive console...Bye! ;)
Now look at the file(s) named stream9.ahx created by the last PeepDF command with its output re-directed:
kp@mbp:mrmcd15> ls -l stream9.ahx_* | wc -l
286
kp@mbp:mrmcd15> ls stream9.ahx_*
stream9.ahx_1 stream9.ahx_118 stream9.ahx_137 stream9.ahx_156 stream9.ahx_175 stream9.ahx_194 stream9.ahx_212 stream9.ahx_231 stream9.ahx_250 stream9.ahx_27 stream9.ahx_30 stream9.ahx_5 stream9.ahx_69 stream9.ahx_88
stream9.ahx_10 stream9.ahx_119 stream9.ahx_138 stream9.ahx_157 stream9.ahx_176 stream9.ahx_195 stream9.ahx_213 stream9.ahx_232 stream9.ahx_251 stream9.ahx_270 stream9.ahx_31 stream9.ahx_50 stream9.ahx_7 stream9.ahx_89
stream9.ahx_100 stream9.ahx_12 stream9.ahx_139 stream9.ahx_158 stream9.ahx_177 stream9.ahx_196 stream9.ahx_214 stream9.ahx_233 stream9.ahx_252 stream9.ahx_271 stream9.ahx_32 stream9.ahx_51 stream9.ahx_70 stream9.ahx_9
stream9.ahx_101 stream9.ahx_120 stream9.ahx_14 stream9.ahx_159 stream9.ahx_178 stream9.ahx_197 stream9.ahx_215 stream9.ahx_234 stream9.ahx_253 stream9.ahx_272 stream9.ahx_33 stream9.ahx_52 stream9.ahx_71 stream9.ahx_90
stream9.ahx_102 stream9.ahx_121 stream9.ahx_140 stream9.ahx_16 stream9.ahx_179 stream9.ahx_198 stream9.ahx_216 stream9.ahx_235 stream9.ahx_254 stream9.ahx_273 stream9.ahx_34 stream9.ahx_53 stream9.ahx_72 stream9.ahx_91
stream9.ahx_103 stream9.ahx_122 stream9.ahx_141 stream9.ahx_160 stream9.ahx_18 stream9.ahx_199 stream9.ahx_217 stream9.ahx_236 stream9.ahx_255 stream9.ahx_274 stream9.ahx_35 stream9.ahx_54 stream9.ahx_73 stream9.ahx_92
stream9.ahx_104 stream9.ahx_123 stream9.ahx_142 stream9.ahx_161 stream9.ahx_180 stream9.ahx_2 stream9.ahx_218 stream9.ahx_237 stream9.ahx_256 stream9.ahx_275 stream9.ahx_36 stream9.ahx_55 stream9.ahx_74 stream9.ahx_93
stream9.ahx_105 stream9.ahx_124 stream9.ahx_143 stream9.ahx_162 stream9.ahx_181 stream9.ahx_20 stream9.ahx_219 stream9.ahx_238 stream9.ahx_257 stream9.ahx_276 stream9.ahx_37 stream9.ahx_56 stream9.ahx_75 stream9.ahx_94
stream9.ahx_106 stream9.ahx_125 stream9.ahx_144 stream9.ahx_163 stream9.ahx_182 stream9.ahx_200 stream9.ahx_22 stream9.ahx_239 stream9.ahx_258 stream9.ahx_277 stream9.ahx_38 stream9.ahx_57 stream9.ahx_76 stream9.ahx_95
stream9.ahx_107 stream9.ahx_126 stream9.ahx_145 stream9.ahx_164 stream9.ahx_183 stream9.ahx_201 stream9.ahx_220 stream9.ahx_24 stream9.ahx_259 stream9.ahx_278 stream9.ahx_39 stream9.ahx_58 stream9.ahx_77 stream9.ahx_96
stream9.ahx_108 stream9.ahx_127 stream9.ahx_146 stream9.ahx_165 stream9.ahx_184 stream9.ahx_202 stream9.ahx_221 stream9.ahx_240 stream9.ahx_26 stream9.ahx_279 stream9.ahx_4 stream9.ahx_59 stream9.ahx_78 stream9.ahx_97
stream9.ahx_109 stream9.ahx_128 stream9.ahx_147 stream9.ahx_166 stream9.ahx_185 stream9.ahx_203 stream9.ahx_222 stream9.ahx_241 stream9.ahx_260 stream9.ahx_28 stream9.ahx_40 stream9.ahx_6 stream9.ahx_79 stream9.ahx_98
stream9.ahx_11 stream9.ahx_129 stream9.ahx_148 stream9.ahx_167 stream9.ahx_186 stream9.ahx_204 stream9.ahx_223 stream9.ahx_242 stream9.ahx_261 stream9.ahx_280 stream9.ahx_41 stream9.ahx_60 stream9.ahx_8 stream9.ahx_99
stream9.ahx_110 stream9.ahx_13 stream9.ahx_149 stream9.ahx_168 stream9.ahx_187 stream9.ahx_205 stream9.ahx_224 stream9.ahx_243 stream9.ahx_262 stream9.ahx_281 stream9.ahx_42 stream9.ahx_61 stream9.ahx_80
stream9.ahx_111 stream9.ahx_130 stream9.ahx_15 stream9.ahx_169 stream9.ahx_188 stream9.ahx_206 stream9.ahx_225 stream9.ahx_244 stream9.ahx_263 stream9.ahx_282 stream9.ahx_43 stream9.ahx_62 stream9.ahx_81
stream9.ahx_112 stream9.ahx_131 stream9.ahx_150 stream9.ahx_17 stream9.ahx_189 stream9.ahx_207 stream9.ahx_226 stream9.ahx_245 stream9.ahx_264 stream9.ahx_283 stream9.ahx_44 stream9.ahx_63 stream9.ahx_82
stream9.ahx_113 stream9.ahx_132 stream9.ahx_151 stream9.ahx_170 stream9.ahx_19 stream9.ahx_208 stream9.ahx_227 stream9.ahx_246 stream9.ahx_265 stream9.ahx_284 stream9.ahx_45 stream9.ahx_64 stream9.ahx_83
stream9.ahx_114 stream9.ahx_133 stream9.ahx_152 stream9.ahx_171 stream9.ahx_190 stream9.ahx_209 stream9.ahx_228 stream9.ahx_247 stream9.ahx_266 stream9.ahx_285 stream9.ahx_46 stream9.ahx_65 stream9.ahx_84
stream9.ahx_115 stream9.ahx_134 stream9.ahx_153 stream9.ahx_172 stream9.ahx_191 stream9.ahx_21 stream9.ahx_229 stream9.ahx_248 stream9.ahx_267 stream9.ahx_286 stream9.ahx_47 stream9.ahx_66 stream9.ahx_85
stream9.ahx_116 stream9.ahx_135 stream9.ahx_154 stream9.ahx_173 stream9.ahx_192 stream9.ahx_210 stream9.ahx_23 stream9.ahx_249 stream9.ahx_268 stream9.ahx_29 stream9.ahx_48 stream9.ahx_67 stream9.ahx_86
stream9.ahx_117 stream9.ahx_136 stream9.ahx_155 stream9.ahx_174 stream9.ahx_193 stream9.ahx_211 stream9.ahx_230 stream9.ahx_25 stream9.ahx_269 stream9.ahx_3 stream9.ahx_49 stream9.ahx_68 stream9.ahx_87
Each output file contains exactly 1 byte. (Concatenating these files in the correct order gives the same output as seen in the interactive PeepDF session without re-directing the output.)
I also tried this command variation for re-directing the output: filters 9 ahx >> stream9.ahx. But it doesn't make a difference.
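Until the redirection is fixed, the one-byte fragments can be stitched back together outside peepdf. The following is only a sketch of that workaround: `join_fragments` is a hypothetical helper name, and it assumes the fragments are named `<prefix>_<number>` as in the listing above. The sort has to be numeric, because a plain `ls` order puts stream9.ahx_10 before stream9.ahx_2.

```python
import glob
import os
import re


def join_fragments(prefix, output_path):
    """Concatenate prefix_1, prefix_2, ... in numeric order into output_path."""
    pattern = re.compile(re.escape(prefix) + r"_(\d+)$")
    numbered = []
    for path in glob.glob(glob.escape(prefix) + "_*"):
        match = pattern.search(path)
        if match:
            numbered.append((int(match.group(1)), path))
    numbered.sort()  # numeric sort: _2 comes before _10
    with open(output_path, "wb") as out:
        for _, path in numbered:
            with open(path, "rb") as fragment:
                out.write(fragment.read())
    return len(numbered)
```

Called as `join_fragments("stream9.ahx", "stream9.joined")`, it returns the number of fragments it found.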
The file https://www.virustotal.com/file/c7e31b77e7a4df74515bbac25a3f641598050e3fe1a9c3545efa72f0175f2323/analysis/1528284919/ contains a URI within object 3, but peepdf says:
URIs: 0
Can share the file if wanted.
Hi,
Service code.google.com is closing soon. Do you have plans to migrate to GitHub?
Original issue reported on code.google.com by [email protected] on 26 Mar 2015 at 3:00
Any idea what is causing this error? I tried python2 peepdf.py --update and the code is up to date. This is happening on Linux.
peepdf.py", line 626, in
stats += beforeStaticLabel + 'URIs: ' + resetColor + statsDict['URIs'] + newLine
KeyError: 'URIs'
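The traceback shows that statsDict has no 'URIs' entry at that point. One guess is that peepdf.py and PDFCore.py come from different revisions. Independent of the root cause, a defensive lookup would avoid the crash. A minimal sketch, where `format_stat` and its default value are assumptions for illustration and not peepdf code:

```python
def format_stat(stats, key, label=None, default="0"):
    """Build one 'Label: value' line, tolerating a missing key."""
    value = stats.get(key, default)  # .get() avoids KeyError when the key is absent
    return "%s: %s" % (label or key, value)


stats_without_uris = {"Objects": "9", "Streams": "2"}
print(format_stat(stats_without_uris, "URIs"))
```

This only hides the symptom; it does not explain why the key is missing.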
There are a number of places that need to be updated for this to work with Python 3, in particular the print statements. Currently all prints are of the form print 'stuff', which does not work in Python 3. Convert all of them to print('stuff').
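To get a rough idea of how much work that is, the simplest one-line print statements can be rewritten mechanically. The sketch below is deliberately naive: `convert_print` is a hypothetical helper, and it does not handle `print >>stream, x`, trailing commas or multi-line statements. A proper conversion tool is the better choice for the real job.

```python
import re

# Matches a Python 2 print statement: "print", whitespace, then anything but "(".
PRINT_STATEMENT = re.compile(r"^(\s*)print\s+(?!\()(.+)$")


def convert_print(line):
    """Rewrite a simple one-line print statement as a print() call."""
    match = PRINT_STATEMENT.match(line)
    if not match:
        return line  # already a call, or not a print statement at all
    indent, args = match.groups()
    return "%sprint(%s)" % (indent, args.rstrip())


print(convert_print("print 'stuff'"))
```

Code that must keep running on Python 2.6+ during the transition can also put `from __future__ import print_function` at the top of each module, so that the function form works on both versions.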
@jesparza As this is a Python tool, can you tell me whether I can use its commands from Python code by importing peepdf as a package?
I have analysed my PDF and looked at all of its objects by executing console commands like object 1, object 2, etc. Now my goal is to replace the content of object number 24. Is that possible with this?
Please suggest.
Add the ASCII85Decode filter to peepdf, using the decoder from pdfminer.
Original issue reported on code.google.com by [email protected] on 30 Nov 2012 at 2:49
Attachments:
I needed to dump streams directly to file, e.g. extracting fonts from a PDF.
Attached is a patch which duplicates the 'stream' command, but accepts a
filename to output to rather than the console.
Original issue reported on code.google.com by [email protected] on 11 Nov 2012 at 5:07
Attachments:
When processing certain files, peepdf crashes with the following error:
UnboundLocalError: local variable 'ret' referenced before assignment
The bug lies in the PDFFilters.py file in the decodeStream() function, line 92:
{{{
Traceback (most recent call last):
File "my_script.py", line 45, in <module>
ret, pdf = PDFCore.PDFParser().parse(filepath, True, True)
File "/home/travesti/peepdf_0.2/PDFCore.py", line 6727, in parse
ret = body.updateObjects()
File "/home/travesti/peepdf_0.2/PDFCore.py", line 4126, in updateObjects
object.resolveReferences()
File "/home/travesti/peepdf_0.2/PDFCore.py", line 2470, in resolveReferences
ret = self.decode()
File "/home/travesti/peepdf_0.2/PDFCore.py", line 2001, in decode
ret = decodeStream(self.encodedStream, self.filter.getValue(), self.filterParams)
File "/home/travesti/peepdf_0.2/PDFFilters.py", line 92, in decodeStream
return ret
UnboundLocalError: local variable 'ret' referenced before assignment
}}}
The exception is raised because the "ret" variable is never assigned before
the return in the decodeStream() function. If none of the conditions is true,
"ret" never gets a value, the return statement is reached and Python raises
the UnboundLocalError exception.
I patched the function by adding the following line at the beginning of the
decodeStream() function:
{{{
ret = (-1, "")
}}}
But it keeps raising errors in other modules :(
Original issue reported on code.google.com by [email protected] on 8 Mar 2014 at 3:11
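For readers who have not run into this exception before, here is the pattern in isolation. This is an illustrative sketch only: `decode_stream` is a simplified stand-in that knows a single filter, not peepdf's decodeStream(). The default assignment at the top is what keeps the final return from failing when no branch matches.

```python
def decode_stream(stream, filter_name):
    """Return (status, data); status -1 signals an error, as in the report."""
    # Default result, so an unrecognized filter cannot leave `ret` unbound.
    ret = (-1, "Unknown filter: %s" % filter_name)
    if filter_name == "/ASCIIHexDecode":
        try:
            # Drop the optional '>' end-of-data marker before decoding.
            ret = (0, bytes.fromhex(stream.rstrip(">").strip()))
        except ValueError as exc:
            ret = (-1, str(exc))
    return ret


print(decode_stream("42540a>", "/ASCIIHexDecode"))
print(decode_stream("42540a>", "/SomeOtherFilter"))
```

Returning an explicit error message instead of an empty string makes it easier to see which filter was not handled.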
peepdf raises an exception when opening the attached sample.pdf because it
does not handle the P key in the standard encryption dictionary properly. The
attached rc4.patch fixes this problem.
Original issue reported on code.google.com by czchen on 21 Oct 2011 at 1:10
Attachments:
It would be useful to be able to batch dump all JavaScript code from the command line, and not only to use the interactive mode for manual checking.
Use case: many documents (possibly hundreds) need to be inspected, so interactive inspection would take too long.
Perhaps I am missing a rather simple mechanism to do this?
What steps will reproduce the problem?
1. ./peepdf -i
2. create pdf
3. embed file
4. filters 4 lzw
5. save test.pdf
6. exit
7. ./peepdf -i test.pdf
8. peepdf shows decode error in object 4
What is the expected output? What do you see instead?
Peepdf shall encode/decode LZW filter successfully.
What version of the product are you using? On what operating system?
The peepdf version is r45
The python version is 2.7.2+
The operating system is ubuntu 11.10 x86_64
Please provide any additional information below.
The test.pdf cannot be decoded by other PDF tools such as origami-pdf.
Original issue reported on code.google.com by czchen on 27 Oct 2011 at 12:38
I wanted to replace the stream with content from the file. I performed the following operations:
modify stream 45 cidset.dat
save sample_fixed.pdf
peepdf creates a broken PDF: a quick look reveals that the document has no trailer after the xref table and no %%EOF marker, while Adobe Preflight complains with the following errors:
This is a low priority issue.
Here are the steps:
It seems to me that peepdf deletes newlines after any double angle brackets (>> or <<).
I am receiving this error:
Error: An error has occurred while parsing an indirect object!!
The error log:
, in parse
ret = body.updateObjects()
peepdf2/PDFCore.py", line 4283, in updateObjects
object.resolveReferences()
File "PDFCore.py", line 3243, in resolveReferences
ret = PDFParser.readObject(objectsSection[offset:])
TypeError: slice indices must be integers or None or have an index method
Traceback (most recent call last):
File "peepdf.py", line 494, in
ret, pdf = pdfParser.parse(fileName, options.isForceMode, options.isLooseMode, options.isManualAnalysis)
File "PDFCore.py", line 7064, in parse
ret = body.updateObjects()
File "PDFCore.py", line 4283, in updateObjects
object.resolveReferences()
File "PDFCore.py", line 3243, in resolveReferences
ret = PDFParser.readObject(objectsSection[offset:])
This is the numbers array:
<type 'list'>: ['14', '0', '15', '165', '17', '332']
If I change it to int, I get:
PDFParser.readObject(objectsSection[offset:])
{TypeError}unbound method readObject() must be called with PDFParser instance as first argument (got str instance instead)
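Both errors look like Python 2 typing issues rather than a problem with the PDF itself. The numbers array holds strings, which look like (object id, offset) pairs, so slicing with them fails. The second message says that readObject() was called on the class instead of on an instance. A self-contained sketch of both fixes, where `Reader` is a hypothetical stand-in and not the real PDFParser:

```python
def parse_pairs(numbers):
    """Turn ['14', '0', '15', '165', ...] into [(14, 0), (15, 165), ...]."""
    values = [int(n) for n in numbers]
    return list(zip(values[0::2], values[1::2]))


class Reader(object):
    """Hypothetical stand-in for a parser class with an instance method."""

    def read_object(self, content):
        return content.split()[0]


section = "a" * 165 + "dict-15 rest"
pairs = parse_pairs(['14', '0', '15', '165', '17', '332'])
obj_id, offset = pairs[1]
# Call the method on an instance, and slice with an integer offset.
first_token = Reader().read_object(section[offset:])
print(obj_id, offset, first_token)
```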
https://github.com/facebook/pyre2
I have a file for which PDF parsing takes too long.
Traceback (most recent call last):
File "/home/soft/HawkEye/utils/../lib/hawkeye/core/plugins.py", line 230, in process
data = current.run()
File "/home/soft/HawkEye/utils/../modules/processing/static.py", line 1860, in run
static = PDF(self.file_path).run()
File "/home/soft/HawkEye/utils/../modules/processing/static.py", line 1080, in run
results = self._parse(self.file_path)
File "/home/soft/HawkEye/utils/../modules/processing/static.py", line 882, in _parse
ret, self.pdf = PDF_parser.parse(filepath, forceMode=True, looseMode=True, manualAnalysis=True)
File "/usr/lib/python2.7/site-packages/peepdf/PDFCore.py", line 7035, in parse
rawIndirectObjects = self.getIndirectObjects(bodyContent, looseMode)
File "/usr/lib/python2.7/site-packages/peepdf/PDFCore.py", line 7792, in getIndirectObjects
matchingObjectsAux = regExp.findall(content)
KeyboardInterrupt
I found that this may be a problem with the re module, so why not use re2 to replace re?
After I replaced it, parsing ran very fast!
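If re2 were adopted, an optional import with a fallback would keep peepdf usable where the binding is not installed. A sketch under assumptions: the binding is taken to expose a module named `re2` with an `re`-compatible `compile`/`findall`, and the pattern below is purely illustrative, not the expression peepdf uses in getIndirectObjects().

```python
try:
    import re2 as regex_engine  # assumption: a pyre2 build exposing module "re2"
except ImportError:
    import re as regex_engine   # standard library fallback

# Illustrative pattern only; it matches "N G obj" headers.
INDIRECT_OBJECT = regex_engine.compile(r"(\d+)\s+(\d+)\s+obj")


def find_object_ids(content):
    """Return (id, generation) pairs for every 'N G obj' header found."""
    return [(int(a), int(b)) for a, b in INDIRECT_OBJECT.findall(content)]


print(find_object_ids("1 0 obj\n<<>>\nendobj\n12 3 obj"))
```

One caveat: RE2 does not support backreferences or lookaround, so any existing pattern that relies on those would need to be rewritten or kept on `re`.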
When I tried to open the file using
Open /sdcard/file.pdf
*** Error: Exception not handled using the interactive console!! Please, report it to the author!!
What steps will reproduce the problem?
1. Have a PDF with /AAPL:Keywords and it will get flagged as /AA based on line
43 of PDFCore.py. By adding a space after each of the items on lines 43-45,
i.e. '/AA ', you will still receive hits for legitimate Additional Actions,
but you won't receive false positive hits just because something else
contains _part_ of the string being matched.
What is the expected output? What do you see instead?
Expected to flag only on the correct Event/Action/Element names but instead you
may receive false hits.
What version of the product are you using? On what operating system?
Version included in REMnux - checked the latest trunk version and it should
still be the same.
Please provide any additional information below.
pdfxray_lite also has this issue since it uses peepdf on the back end. However,
since it uses its own copy of PDFCore.py, that owner will be contacted
separately if this issue is accepted, as it will also need the slight change.
Original issue reported on code.google.com by [email protected] on 11 Jun 2012 at 10:56
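A trailing space would miss names that are followed directly by a delimiter, e.g. /AA<< or /AA at the end of a line. A PDF name ends at whitespace or at one of the delimiter characters, so a lookahead may be more robust than a literal space. The following is a sketch of that idea, not the matching code in PDFCore.py; `count_name` is a hypothetical helper, and names obfuscated with #xx escapes are not handled.

```python
import re

# PDF names end at whitespace or one of the delimiter characters ()<>[]{}/%.
NAME_END = r"(?=[\s()<>\[\]{}/%]|$)"


def count_name(data, name):
    """Count occurrences of a PDF name such as '/AA' as a whole token."""
    return len(re.findall(re.escape(name) + NAME_END, data))


sample = "<< /AAPL:Keywords [ (x) ] /AA<< /O 5 0 R >> /AA 7 0 R >>"
print(sample.count("/AA"), count_name(sample, "/AA"))
```

The plain substring count includes the /AAPL:Keywords hit, while the token-aware count does not.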
Trying PeePDF on one of the small, handcoded, extensively commented demo files from our TROOPERS15 workshop, _114_incrementally-updated.pdf_, which uses the incremental update feature....
The file contains two versions (2 xref sections and 2 %%EOF markers).
PeePDF isn’t sure about the number of versions. The tree command returns 3 (Versions 1-3), the info command returns 4 (Versions 0-3 and Updates: 3). The rawobject xref 0 and rawobject xref 1 commands return an error message, while rawobject xref 2 and rawobject xref 3 print the correct info (apart from the version number). offsets also reports Versions 1-3. (I’m not sure if metadata 2 and metadata 3 should work; a simple metadata returns Info Object in version 2: [....])
The file contains many commented lines, so may this be a cause for PeePDF to choke on it?
I have to use peepdf.py -fi to force it to parse the file. Without the -f it returns a message only: “Error: PDF sections not found!!” What type of PDF sections is it talking about?!?
The info command also reports: Errors: 4. But I can’t find what exactly makes it think there are 4 errors.
Here is the complete output:
PPDF> info
File: 114_incrementally-updated.pdf
MD5: 24d635efd52bf29ad2d36421094be5a2
SHA1: f0b30334e111833c1bf898185d2e38135f0f88cc
Size: 8527 bytes
Version: 1.4
Binary: True
Linearized: False
Encrypted: False
Updates: 3
Objects: 7
Streams: 1
Comments: 0
Errors: 4
Version 0:
Catalog: No
Info: No
Objects (0): []
Streams (0): []
Version 1:
Catalog: No
Info: No
Objects (0): []
Streams (0): []
Version 2:
Catalog: 1
Info: 2
Objects (7): [1, 2, 3, 4, 5, 6, 7]
Streams (1): [5]
Encoded (1): [5]
Version 3:
Catalog: 1
Info: 2
Objects (0): []
Streams (0): []
PPDF> tree
Version 1:
Version 2:
/Catalog (1)
/Pages (3)
/Page (4)
stream (5)
/R8 (7)
/Font (6)
/Pages (3)
/Info (2)
Version 3:
PPDF> rawobject xref 3
xref
0 1
0000000000 65535 f
5 1
0000006923 00000 n
PPDF> rawobject xref 2
xref
0 8
0000000000 65535 f
0000004019 00000 n
0000004072 00000 n
0000004343 00000 n
0000004408 00000 n
0000004623 00000 n
0000006488 00000 n
0000006567 00000 n
PPDF> rawobject xref 1
*** Error: xref section not found!!
PPDF> metadata
Info Object in version 2:
<< /ModDate D:20131107003857+01'00'
/CreationDate D:20131107003857+01'00'
/Producer Text Editor, Brain & PDF-1.7 Specification ISO 32000-1:2008
/Title Vim- + Brain-Output
/Creator Kurt Pfeifle
/Author Kurt Pfeifle >>
PPDF> offsets
0 Header
Version 1:
Version 2:
1502 Object 1 (51) 1552
1555 Object 2 (269) 1823
1826 Object 3 (63) 1888
1891 Object 4 (213) 2103
2106 Object 5 (1863) 3968
3971 Object 6 (77) 4047
4050 Object 7 (31) 4080
4084 Xref Section (168) 4251
4253 Trailer (146) 4398
4399 EOF
Version 3:
5693 Xref Section (52) 5744
5746 Trailer (159) 5904
5905 EOF
PPDF> errors
PDF sections not found
No indirect objects found in the body
Unspecified parsing error
Error parsing object: 5 0 obj (Unspecified parsing error)
PPDF>
Lastly, when I attempted to write out save_version 1 114a.pdf it created a file with only the 2 header lines.
I also created a version of the file which has removed all the commented lines.
With this version PeePDF does not have any problems.
It seems that commented lines can cause PeePDF to wrongly parse a PDF file.
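As an independent sanity check on the number of revisions, one can count the lines that consist of nothing but the %%EOF marker. This is a deliberately naive sketch and not how peepdf splits a file: `count_revisions` is a hypothetical helper, it assumes LF or CRLF line endings, and it ignores markers that sit inside stream data.

```python
import re


def count_revisions(data):
    """Count lines that consist of the %%EOF marker (one per revision)."""
    return len(re.findall(rb"^%%EOF[ \t]*\r?$", data, flags=re.MULTILINE))


sample = (
    b"%PDF-1.4\n"
    b"% a comment that merely mentions %%EOF in its text\n"
    b"1 0 obj\nendobj\nxref\ntrailer\nstartxref\n9\n%%EOF\n"
    b"% second revision\nxref\ntrailer\nstartxref\n99\n%%EOF\n"
)
print(count_revisions(sample))
```

A comment line that only mentions the marker is not counted, because the marker has to start the line.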
Attached is an example PDF file containing JavaScript which I used for testing:
The new extract js command fails to extract the complete set of JavaScript fragments.
The info sub-command lists for "Suspicious elements":
/JS
/JavaScript
Here it agrees with the numbers listed by Didier Stevens' pdfid.py tool.
However, when it comes to listing the number of "Objects with JS code", it only lists 5 of these: 73, 74, 75, 76 and 77.
Manually checking the source code of my file shows that there is more JavaScript code in objects 32, 86, 87, 92, 94, 96, 98, 101, 104, 107 and 109.
The reason seems to me to be two-fold:
/Next key in object 77 pointing to object 109.
/AA ("additional actions") dictionaries.
The most recent version of pdfinfo -js is producing a more complete result (even though most of the /JS and /JavaScript name tokens are obfuscated).
function Motion(msg, n) {
var f = new String(msg);
return f.substr(n) + f.substr(0, n);
}
function checkField(aField) {
if (aField.value == "") { // empty
var msg = "No fields can be left empty!";
app.alert(msg);
return 0;
}
}
function goNext(item, event, cName) {
AFNumber_Keystroke(0, 0, 0, 0, "", true);
if (event.rc && AFMergeChange(event).length == event.target.charLimit) item.getField(cName).setFocus();
}
var f = this.getField("message.1");
if (global.ttIsRunning == 1) {
app.clearInterval(global.run);
global.ttIsRunning = 0;
}
var f = this.getField("message.1");
var code = new String("this.getField('message.1').value = Motion(this.getField('message.1').value,2);");
global.ttIsRunning = 1;
//global.run = app.setInterval(code,50);
'pdfinfo -js' (from Poppler version 0.41.0)
//////Name Dictionary "Motion":
function Motion(msg,n)
{
var f = new String(msg);
return f.substr(n)+f.substr(0,n);
}
//////Name Dictionary "checkField":
function checkField( aField )
{
if ( aField.value == "" ) { // empty
var msg = "No fields can be left empty!";
app.alert( msg );
return 0;
}
}
////Name Dictionary "goNext":
function goNext( item, event, cName )
{
AFNumber_Keystroke(0,0,0,0, "", true );
if ( event.rc && AFMergeChange(event).length == event.target.charLimit )
item.getField( cName ).setFocus();
}
////Field Activated:
app.alert( "You are running version " + app.viewerVersion + " of Adobe Acrobat " + app.viewerType + " on the "+ app.platform + " platform.")
////Field Activated:
if (typeof(app.viewerType)!="undefined")
if(app.viewerVersion < 5.0)
{
var msg = "Executing this script requires Acrobat 5.0.";
app.alert(msg);
}
else
{
var n = this.getField ("65name");
var annot = this.addAnnot ({
page: 0,
type: "Text",
author: n.value,
point: [462, 475, 810, 814],
strokeColor: color.blue,
popupOpen: true,
contents: "If you can read this, you are too close!"
});
}
////Field Activated:
if (typeof(app.viewerType)!="undefined")
if(app.viewerVersion < 5.0)
{
var msg = "Executing this script requires Acrobat 5.0.";
app.alert(msg);
}
else
{
var name = this.getField("66name");
var annot = this.addAnnot
({
page: 0,
type: "FreeText",
textFont: "Helvetica",
textSize: 18,
alignment: 1,
rect: [570, 450, 400, 400],
fillColor: ["RGB", 1, 1, 0],
strokeColor: color.blue,
name: "FreeText Note",
contents: "For something with more formatting control, a FreeText Annotation works nicely."
})
annot.author = name.value;
}
////Field Activated:
// the procedure begins here
var okToSubmit = true;
// loop over all fields:
for (var j = 0; j < this.numFields; j++)
{
var fieldname = this.getNthFieldName(j);
var theField = this.getField("72field");
if (theField.type != 'text')
continue; // get past buttonfields
var valid = checkField(theField);
if (!valid) // valid == 0? Halt!
{
okToSubmit = false; // set flag
break; // exit loop prematurely
}
}
////Field Activated:
if ( typeof( app.viewerVersion ) != undefined ) { // are we running in a known viewer?
if ( app.viewerVersion < 5.0 ) { // what version?
var ourPath = this.path;
var ourName = ourPath.split("/").pop();
this.getField("message.2")= ourName;
} else {
var ourURL = this.URL;
var ourName = ourURL.split("/").pop();
this.getField("message.2")= ourName;
}
}
////Page Open:
var f = this.getField("message.1");
var code = new String("this.getField('message.1').value = Motion(this.getField('message.1').value,2);");
global.ttIsRunning = 1;
//global.run = app.setInterval(code,50);
////Page Close:
var f = this.getField("message.1");
if (global.ttIsRunning == 1) {
app.clearInterval(global.run);
global.ttIsRunning = 0;
}
////Widget Annotation Activated:
app.alert( "You are running version " + app.viewerVersion + " of Adobe Acrobat " + app.viewerType + " on the "+ app.platform + " platform.")
////Widget Annotation Cursor Enter:
var f = this.getField("tipMessage.1");
f.hidden = false;
////Widget Annotation Cursor Leave:
var f = this.getField("tipMessage.1");
f.hidden = true;
////Widget Annotation Activated:
if (typeof(app.viewerType)!="undefined")
if(app.viewerVersion < 5.0)
{
var msg = "Executing this script requires Acrobat 5.0.";
app.alert(msg);
}
else
{
var n = this.getField ("65name");
var annot = this.addAnnot ({
page: 0,
type: "Text",
author: n.value,
point: [462, 475, 810, 814],
strokeColor: color.blue,
popupOpen: true,
contents: "If you can read this, you are too close!"
});
}
////Widget Annotation Activated:
if (typeof(app.viewerType)!="undefined")
if(app.viewerVersion < 5.0)
{
var msg = "Executing this script requires Acrobat 5.0.";
app.alert(msg);
}
else
{
var name = this.getField("66name");
var annot = this.addAnnot
({
page: 0,
type: "FreeText",
textFont: "Helvetica",
textSize: 18,
alignment: 1,
rect: [570, 450, 400, 400],
fillColor: ["RGB", 1, 1, 0],
strokeColor: color.blue,
name: "FreeText Note",
contents: "For something with more formatting control, a FreeText Annotation works nicely."
})
annot.author = name.value;
}
////Widget Annotation Activated:
// the procedure begins here
var okToSubmit = true;
// loop over all fields:
for (var j = 0; j < this.numFields; j++)
{
var fieldname = this.getNthFieldName(j);
var theField = this.getField("72field");
if (theField.type != 'text')
continue; // get past buttonfields
var valid = checkField(theField);
if (!valid) // valid == 0? Halt!
{
okToSubmit = false; // set flag
break; // exit loop prematurely
}
}
////Widget Annotation Activated:
if ( typeof( app.viewerVersion ) != undefined ) { // are we running in a known viewer?
if ( app.viewerVersion < 5.0 ) { // what version?
var ourPath = this.path;
var ourName = ourPath.split("/").pop();
this.getField("message.2")= ourName;
} else {
var ourURL = this.URL;
var ourName = ourURL.split("/").pop();
this.getField("message.2")= ourName;
}
}
It appears Acrobat will render PDF files properly even when an object/stream definition comes after %%EOF; peepdf, however, discards that content because it stops at %%EOF.
e.g. the recent hot PDF exploit, bd23ad33accef14684d42c32769092a0:
0000023515 00000 n
0000024187 00000 n
0000024261 00000 n
trailer
<<
/Size 67
/Root 10 0 R
>>
startxref
24613
%%EOF
1 0 obj
<<
/Length 56305
/Filter /FlateDecode
>>
stream
....
The current peepdf fails to parse this file and throws an exception.
The following tries to fix the problem.
diff --git a/PDFCore.py b/PDFCore.py
index 3b2fe00..33cf5a4 100644
--- a/PDFCore.py
+++ b/PDFCore.py
@@ -4315,7 +4315,7 @@ class PDFBody :
self.setObject(compressedId, compressedObject, offset)
del(compressedObjectsDict)
for id in self.referencedJSObjects:
- if id not in self.containingJS:
+ if (len(self.containingJS) and id not in self.containingJS):
object = self.objects[id].getObject()
if object == None:
errorMessage = 'Object is None'
@@ -6941,6 +6941,9 @@ class PDFParser :
self.fileParts.append(fileContent)
else:
sys.exit(errorMessage)
+ # append anything behind %%EOF
+ if fileContent:
+ self.fileParts.append(fileContent)
pdfFile.setUpdates(len(self.fileParts) - 1)
# Getting the body, cross reference table and trailer of each part of the file
With the change applied, the file parses without issues:
Version 0:
Catalog: 10
Info: No
Objects (50): [6, 7, 9, 10, 11, 12, 14, 15, 17, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 61, 62, 63, 64, 65, 66]
Errors (1): [33]
Streams (14): [14, 15, 17, 25, 31, 32, 33, 34, 49, 51, 55, 56, 57, 62]
Encoded (11): [14, 15, 17, 25, 31, 32, 33, 49, 51, 55, 56]
Decoding errors (1): [33]
Suspicious elements:
/AcroForm (1): [10]
/OpenAction (1): [10]
/JS (1): [11]
/JavaScript (1): [11]
Version 1:
Catalog: No
Info: No
Objects (1): [1]
Streams (1): [1]
Encoded (1): [1]
Objects with JS code (1): [1]
PPDF> object 1
<< /Length 56305
/Filter /FlateDecode >>
stream
var dlldata= [0x81ec8b55,0x000498ec,0xf4458900 ....
It's a quick fix, you may refactor the logic a bit...
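The idea behind the second hunk, shown in isolation: when splitting the raw file at each %%EOF, whatever follows the last marker is kept as one more part instead of being dropped. This is only a sketch of the approach with a hypothetical `split_revisions` helper, not the code in PDFParser.parse().

```python
def split_revisions(data, marker=b"%%EOF"):
    """Split raw PDF bytes after each marker, keeping any trailing content."""
    parts = []
    start = 0
    while True:
        pos = data.find(marker, start)
        if pos == -1:
            break
        end = pos + len(marker)
        parts.append(data[start:end])
        start = end
    tail = data[start:]
    if tail.strip():  # objects defined after the last %%EOF
        parts.append(tail)
    return parts


sample = (
    b"%PDF-1.4\ntrailer\n<< /Size 67 >>\nstartxref\n24613\n%%EOF\n"
    b"1 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\n"
)
print(len(split_revisions(sample)))
```

A tail that contains only whitespace is ignored, so a normal file that ends in %%EOF plus a newline still yields a single part.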
What steps will reproduce the problem?
1. Don't install PyV8
2. try to run peepdf.py on any pdf w/ js
What is the expected output? What do you see instead?
For the python to load.
Instead presented with this:
Traceback (most recent call last):
File "peepdf.py", line 32, in <module>
from PDFCore import PDFParser, vulnsDict
File "/Users/tross/Code/satori/peepdf_service/peepdf-svn/PDFCore.py", line 31, in <module>
from JSAnalysis import *
File "/Users/tross/Code/satori/peepdf_service/peepdf-svn/JSAnalysis.py", line 36, in <module>
class Global(PyV8.JSClass):
NameError: name 'PyV8' is not defined
What version of the product are you using? On what operating system?
any
Please provide any additional information below.
placing the global class in the try block will fix it... probably a better fix.
try:
    import PyV8
    JS_MODULE = True
    class Global(PyV8.JSClass):
        evalCode = ''
        def evalOverride(self, expression):
            self.evalCode += '\n\n// New evaluated code\n' + expression
            return
except:
    JS_MODULE = False
Original issue reported on code.google.com by [email protected]
on 5 Sep 2013 at 3:18
Currently one can run peepdf.py -s script my.pdf and have peepdf.py execute the commands listed in the script file without a need to start it in interactive mode.
It would be nice if we had a more direct way to execute one or more small commands like this:
peepdf.py -C "tree,offsets,filters 9" my.pdf
peepdf.py --commands "tree,offsets,filters 9" my.pdf
This should then behave the same as running peepdf.py -s script where the contents of script were:
tree
offsets
filters 9
This feature would save us from the sometimes long-winded path of first creating or editing/modifying a script file.
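The requested behaviour could be sketched roughly like this, assuming the option value is a single comma-separated string. The function name `parse_commands` is made up for illustration, and the sketch does not handle commands that themselves contain commas.

```python
def parse_commands(option_value):
    """Turn 'tree,offsets,filters 9' into a list of console commands.

    Hypothetical helper: splits on commas and drops empty entries, so the
    result is equivalent to the lines of a script file.
    """
    return [part.strip() for part in option_value.split(",") if part.strip()]


commands = parse_commands("tree,offsets,filters 9")
# Each entry could then be fed to the console exactly like one script line.
```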
Hi, from a project using peepdf.
spender-sandbox/cuckoo-modified#54
Some samples are here
Hi
I need to be able to analyze PDF files in order to find the color spaces used for each object (for press preflight purposes, mainly to find out if there are objects from color spaces other than CMYK and if there are any color profiles attached). I am having difficulties finding an open source tool to do that.
Would it be possible with peepdf?
Could peepdf be used at least as some intermediate step to achieve the task?
My aim is to create a command-line tool for verifying PDF files in terms of color space.
The current output of the offsets command is buggy insofar as it reports the offset to the end of an indirect object as an integer that is off by one. Take this as an example PDF (hand-coded, no binary bytes, so it can be examined easily in a text editor):
The offsets command reports this output for the file (I put in additional comments about what would be the correct values):
0 Header
74
Object 1 (89)
162 ### 163
166
Object 2 (236)
401 ### 402
405
Object 3 (127)
531 ### 532
534
Object 4 (208)
741 ### 742
745
Object 5 (42)
786 ### 787
807
Object 7 (92)
898 ### 899
902
Object 8 (410)
1311 ### 1312
1315
Object 9 (726)
2040 ### 2041
2044
Object 10 (209)
2252 ### 2253
2317
Xref Section (240)
2556 ### 2557
2558
Trailer (92)
2649 ### 2649, correct!
2650 EOF
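The two candidate values in the listing above differ only in whether the end offset is inclusive or exclusive. A small sketch of the arithmetic, assuming the number in parentheses is the element's size in bytes (the function name `end_offsets` is made up for illustration):

```python
def end_offsets(start, size):
    """Return (inclusive_end, exclusive_end) for an element.

    inclusive_end is the offset of the element's last byte;
    exclusive_end is the offset of the first byte after it.
    """
    inclusive_end = start + size - 1
    exclusive_end = start + size
    return inclusive_end, exclusive_end


# Object 1 starts at 74 with size 89.
print(end_offsets(74, 89))
# The trailer starts at 2558 with size 92.
print(end_offsets(2558, 92))
```

Under that assumption the reported values match the inclusive form, while the values in the comments match the exclusive form for every entry except the trailer.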
More importantly, it could be improved greatly by adding the following info:
%Trailer comment in the line 148 of the above linked sample PDF)
/Length NMNM for the given indirect object states _as well as_ what PeepDF itself calculates (as you know, there may be mismatches).
peepdf crashes with a TypeError if some PDFs are analyzed in force parsing mode and PDFObjectStream.resolveReferences() is invoked.
Traceback (most recent call last):
File "/home/sdeiss/Developer/bin/virtualenv/peekaboo/local/lib/python2.7/site-packages/peepdf/main.py", line 409, in main
ret, pdf = pdfParser.parse(fileName, options.isForceMode, options.isLooseMode, options.isManualAnalysis)
File "/home/sdeiss/Developer/bin/virtualenv/peekaboo/local/lib/python2.7/site-packages/peepdf/PDFCore.py", line 7098, in parse
ret = body.updateObjects()
File "/home/sdeiss/Developer/bin/virtualenv/peekaboo/local/lib/python2.7/site-packages/peepdf/PDFCore.py", line 4288, in updateObjects
object.resolveReferences()
File "/home/sdeiss/Developer/bin/virtualenv/peekaboo/local/lib/python2.7/site-packages/peepdf/PDFCore.py", line 3253, in resolveReferences
ret = PDFParser.readObject(objectsSection[offset:])
TypeError: slice indices must be integers or None or have an __index__ method
If I fix that TypeError by converting offset at PDFCore.py:3243 to an int object I get another one:
Traceback (most recent call last):
File "/home/sdeiss/Developer/bin/virtualenv/peekaboo/local/lib/python2.7/site-packages/peepdf/main.py", line 409, in main
ret, pdf = pdfParser.parse(fileName, options.isForceMode, options.isLooseMode, options.isManualAnalysis)
File "/home/sdeiss/Developer/bin/virtualenv/peekaboo/local/lib/python2.7/site-packages/peepdf/PDFCore.py", line 7098, in parse
ret = body.updateObjects()
File "/home/sdeiss/Developer/bin/virtualenv/peekaboo/local/lib/python2.7/site-packages/peepdf/PDFCore.py", line 4288, in updateObjects
object.resolveReferences()
File "/home/sdeiss/Developer/bin/virtualenv/peekaboo/local/lib/python2.7/site-packages/peepdf/PDFCore.py", line 3253, in resolveReferences
ret = PDFParser.readObject(objectsSection[offset:])
TypeError: unbound method readObject() must be called with PDFParser instance as first argument (got str instance instead)
A possible solution would be to supply the PDFParser object to PDFObjectStream when creating that instance and then use the supplied PDFParser instance for readObject().
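A minimal sketch of that suggestion, using hypothetical stand-in classes rather than peepdf's real `PDFParser` and `PDFObjectStream` (names, signatures and return values here are assumptions for illustration only):

```python
class Parser(object):
    """Hypothetical stand-in for a parser with an instance method."""

    def read_object(self, content):
        # Return a (status, value) pair; the real method does far more.
        return 0, content.strip()


class ObjectStream(object):
    """Hypothetical stand-in for an object stream holding a parser."""

    def __init__(self, parser, objects_section, offsets):
        self.parser = parser                # parser instance supplied by the caller
        self.objects_section = objects_section
        self.offsets = offsets

    def resolve_references(self):
        results = []
        for offset in self.offsets:
            # Offsets read from text may arrive as str or float; slicing needs int.
            offset = int(offset)
            # Call the method on the instance, not on the class.
            results.append(self.parser.read_object(self.objects_section[offset:]))
        return results
```

Passing the parser in keeps the object stream from having to create or locate a parser on its own.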
UnboundLocalError: local variable 'userPass' referenced before assignment on line 260 of PDFCrypto.py
I need to process a batch of PDF files and save the JavaScript of each PDF file. For that I need access to the file name of the PDF. I am using the -s option (command file):
peepdf -fl -s command pdffile.pdf
In the command file I have:
extract js > abc.js
If there were a pdf_file_name variable in interactive mode, I could have used $filename.js instead of abc.js.
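Until such a variable exists, one workaround is to generate the command file per PDF from a wrapper script. A rough sketch, where `script_for` is a hypothetical helper and only the `extract js > file` command quoted above is assumed:

```python
import os


def script_for(pdf_path):
    """Build the command-file text for one PDF.

    Hypothetical helper: derives the output name from the PDF's base name,
    so sample.pdf yields 'extract js > sample.js'.
    """
    base = os.path.splitext(os.path.basename(pdf_path))[0]
    return "extract js > %s.js\n" % base
```

The wrapper would write this text to a temporary file and pass it to `peepdf -fl -s` for each PDF in the batch.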
What steps will reproduce the problem?
1. Run "./peepdf.py -i"
2. Run "create pdf" in peepdf console
3. Run "save 'test.pdf'" in peepdf console
What is the expected output? What do you see instead?
The following content is the cross-reference table and trailer of test.pdf. The
size of cross-reference table is 4, however, there are 5 entries in table.
There is a useless entry in cross-reference table which does not point to an
object.
xref
0 4
0000000000 65535 f
0000000009 00000 n
0000000059 00000 n
0000000118 00000 n
0000000119 00000 n
trailer
<< /Size 4
/Root 1 0 R >>
startxref
210
%%EOF
What version of the product are you using? On what operating system?
The version of peepdf is r42. The operating system is ubuntu-11.10 x86_64.
Please provide any additional information below.
Original issue reported on code.google.com by czchen
on 24 Oct 2011 at 11:50
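The mismatch can be checked mechanically. The following sketch assumes a single-subsection table laid out like the one above and compares the declared entry count with the number of entry lines; `count_xref_entries` is a made-up name, not part of peepdf.

```python
SAMPLE = """xref
0 4
0000000000 65535 f
0000000009 00000 n
0000000059 00000 n
0000000118 00000 n
0000000119 00000 n
trailer
"""


def count_xref_entries(xref_text):
    """Return (declared, actual) entry counts for a one-subsection xref table."""
    lines = [line.strip() for line in xref_text.strip().splitlines()]
    if lines[0] != "xref":
        raise ValueError("expected the 'xref' keyword on the first line")
    # The subsection header holds the first object number and the entry count.
    _first, declared = (int(value) for value in lines[1].split())
    actual = 0
    for line in lines[2:]:
        fields = line.split()
        # An entry is: offset, generation, and 'f' (free) or 'n' (in use).
        if len(fields) == 3 and fields[2] in ("f", "n"):
            actual += 1
        else:
            break
    return declared, actual


declared, actual = count_xref_entries(SAMPLE)
print(declared, actual)
```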
Current source from GitHub is not functional. I don't know if it is supposed to be (in the past, after all new commits, it was...)
(I'm aware I don't have pylibemu and don't have PyV8 installed -- but this shouldn't matter here.)
### Check current Git log:
kp@mbp:git.peepdf.trunk > git log | head -n 12
commit c550c6d1e8b4cb507018deb73392a0487d5d96b4
Author: Jose Miguel Esparza <[email protected]>
Date: Fri Jul 31 01:27:46 2015 +0200
Added /Flash as element to monitor
commit 79d0534981a98a9c553cc68f2b13a62f5afd5c5a
Author: Jose Miguel Esparza <[email protected]>
Date: Mon Jul 27 23:42:55 2015 +0200
Added some PEP8 magic and modified the limit output to 500 lines instead of 1000
### Create an empty, dummy PDF with Ghostscript:
kp@mbp:git.peepdf.trunk > gs -q -o empty-dummy.pdf -sDEVICE=pdfwrite -c showpage
### Check the newly created PDF with `pdfinfo`:
kp@mbp:git.peepdf.trunk > pdfinfo empty-dummy.pdf
Producer: GPL Ghostscript GIT PRERELEASE 9.18
CreationDate: Fri Jul 31 11:42:22 2015
ModDate: Fri Jul 31 11:42:22 2015
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 1
Encrypted: no
Page size: 612 x 792 pts (letter)
Page rot: 0
File size: 2383 bytes
Optimized: no
PDF version: 1.5
### Run peepdf.py:
kp@mbp:git.peepdf.trunk > ./peepdf.py -fil empty-dummy.pdf
Warning: PyV8 is not installed!!
Warning: pylibemu is not installed!!
File: empty-dummy.pdf
MD5: fc9ef463e4de46cdb87805be0f0edc7b
SHA1: ddaa418db6e95360273b13506d592d54dc49a311
SHA256: c36758aa347e1e526addf50b8a795abe6c5cbd79d8d6c30cf86aebea67250ebd
Size: 2383 bytes
Version: 1.5
Binary: True
Linearized: False
Encrypted: False
Updates: 0
Objects: 9
Streams: 2
Comments: 0
Errors: 0
Version 0:
Catalog: 1
Info: 2
Objects (9): [1, 2, 3, 4, 5, 6, 7, 8, 9]
Streams (2): [9, 5]
Encoded (1): [5]
*** Error: Exception not handled!!
Please, don't forget to report the errors found:
- Sending the file "$(pwd)/errors.txt" to the author (mailto:[email protected])
- And/Or creating an issue on the project webpage (https://github.com/jesparza/peepdf/issues)
Hi,
I have found an issue when I try to add/modify string objects with Windows-formatted IP addresses (such as \127.0.0.1).
peepdf treats these IP addresses as if they were octal escape sequences (\ddd). If the IP address contains digits greater than 7, an exception occurs in the conversion from octal.
Please, specify the string object content:
\\192.168.1.1
*** Error: The object has not been modified!!
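The failure mode can be reproduced outside peepdf: `int("192", 8)` raises `ValueError` because 9 is not an octal digit. One possible way around it, sketched here with a hypothetical helper that is not peepdf's code, is to treat an escaped backslash first and to accept only the digits 0 to 7 as part of an octal escape.

```python
import re

# A backslash followed by either another backslash or 1-3 octal digits.
ESCAPE = re.compile(r"\\(\\|[0-7]{1,3})")


def decode_escapes(text):
    """Decode '\\\\' and '\\ddd' escapes in a PDF-style literal string.

    Hypothetical helper for illustration; other escapes are left untouched.
    """
    def replace(match):
        body = match.group(1)
        if body == "\\":
            return "\\"                      # escaped backslash
        return chr(int(body, 8) & 0xFF)      # octal character code

    return ESCAPE.sub(replace, text)


print(decode_escapes(r"\\192.168.1.1"))
```

Because the escaped backslash is consumed first, the digits that follow it are never handed to the octal conversion.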
Hi
I just managed to get DCTFilter to work on my Ubuntu box. It turned out that PIL removed the tostring() method.
Exception: tostring() has been removed. Please call tobytes() instead.
Regards
Piotr
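A small compatibility shim along these lines may help with both older and newer PIL/Pillow versions. It is only a sketch: `image_to_bytes` is a made-up name, and the function relies on duck typing rather than importing PIL.

```python
def image_to_bytes(image):
    """Return the raw bytes of an image object.

    Prefers tobytes() and falls back to tostring() only when tobytes()
    does not exist on the object.
    """
    to_bytes = getattr(image, "tobytes", None)
    if to_bytes is None:
        to_bytes = image.tostring
    return to_bytes()
```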
It seems that v8 is no longer maintained.
Do you have any plans to change peepdf javascript engine?
For example I cannot install v8 on my Mac OS X (see https://code.google.com/p/pyv8/issues/detail?id=246) so I cannot use the function js_analyse.
I'm trying to use peepdf for the changelog feature, but I can't make it work and would appreciate some help.
I first tried running the program in the interactive console, but when I call the "open" command I get the error:
*** Error: Exception not handled using the interactive console!! Please, report it to the author!!
And the error.txt file contains:
Traceback (most recent call last):
File "C:\user\pdf\peepdf2\peepdf.py", line 727, in <module>
console.cmdloop()
File "C:\Python27\lib\cmd.py", line 142, in cmdloop
stop = self.onecmd(line)
File "C:\Python27\lib\cmd.py", line 221, in onecmd
return func(arg)
File "C:\user\pdf\peepdf2\PDFConsole.py", line 2858, in do_open
ret = pdfParser.parse(fileName, forceMode, looseMode)
File "C:\user\pdf\peepdf2\PDFCore.py", line 7054, in parse
sys.exit('Error: An error has occurred while parsing an indirect object!!')
SystemExit: Error: An error has occurred while parsing an indirect object!!
Then I tried running the command directly through parameters:
python.exe "C:\user\pdf\peepdf2\peepdf.py" -C changelog -f "C:\user\pdf\pdf_test.pdf"
But another error occurred:
Error: Exception not handled!!
errors.txt:
Traceback (most recent call last):
File "C:\user\pdf\peepdf2\peepdf.py", line 494, in <module>
ret, pdf = pdfParser.parse(fileName, options.isForceMode, options.isLooseMode, options.isManualAnalysis)
File "C:\user\pdf\peepdf2\PDFCore.py", line 7061, in parse
ret = body.updateObjects()
File "C:\user\pdf\peepdf2\PDFCore.py", line 4283, in updateObjects
object.resolveReferences()
File "C:\user\pdf\peepdf2\PDFCore.py", line 3243, in resolveReferences
ret = PDFParser.readObject(objectsSection[offset:])
TypeError: slice indices must be integers or None or have an __index__ method
What could be causing those issues?
What steps will reproduce the problem?
1. Get this specially forged PDF:
https://www.virustotal.com/en-gb/file/be9c0025b99f0f8c55f448ba619ba303fc65eba862cac65a00ea83d480e5efec/analysis/
2. run peepdf -fi filename
3. run js_analysis object 6
What is the expected output? What do you see instead?
Run the JS code with PyV8.
Because there are XFA tags opening and closing, js emulation fails:
*** Error analysing Javascript: SyntaxError: Unexpected token < ( @ 1 : 0 )
-> <? xml version = "1.0"
What version of the product are you using? On what operating system?
Version: peepdf 0.2 r203
Ubuntu 12.10
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 18 Oct 2013 at 2:53
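One conceivable approach is to pull the script text out of the XFA XML before handing it to the JS engine. The sketch below uses only the standard library and assumes the XFA content is well-formed XML with the code inside `script` elements; `extract_xfa_scripts` is a hypothetical name, and malformed samples would need a more tolerant parser.

```python
import xml.etree.ElementTree as ET


def extract_xfa_scripts(xfa_xml):
    """Collect the text of every <script> element, ignoring XML namespaces."""
    root = ET.fromstring(xfa_xml)
    scripts = []
    for element in root.iter():
        # Tags look like '{namespace}script' when a namespace is present.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "script" and element.text:
            scripts.append(element.text.strip())
    return scripts
```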
This is the error.log:
Traceback (most recent call last):
File "./peepdf.py", line 541, in <module>
console.cmdloop()
File "/usr/lib/python2.7/cmd.py", line 142, in cmdloop
stop = self.onecmd(line)
File "/usr/lib/python2.7/cmd.py", line 219, in onecmd
return func(arg)
File "/usr/local/peepdf/PDFConsole.py", line 2721, in do_open
ret = pdfParser.parse(fileName, forceMode, looseMode)
File "/usr/local/peepdf/PDFCore.py", line 6838, in parse
sys.exit('Error: An error has occurred while parsing an indirect object!!')
SystemExit: Error: An error has occurred while parsing an indirect object!!
Do you need other info?
Thanks a lot
Original issue reported on code.google.com by [email protected]
on 23 Jun 2014 at 3:26
When using PDFs containing PNG images with prediction > 10, the current
implementation only decodes part of the image (1/3 of each row of the image).
Luckily, I already found the problem and I will attach a patch with a possible
solution :)
Original issue reported on code.google.com by [email protected]
on 17 Sep 2013 at 9:55
Attachments:
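The patch itself is not reproduced here. As background, the length of one PNG-predicted row depends on the /Colors and /BitsPerComponent parameters as well as /Columns, plus one leading filter-type byte per row. Decoding only a third of each row would be consistent with a 3-component image being treated as 1-component, but that reading is an assumption. A sketch of the row-length arithmetic (the function name is made up):

```python
def png_row_length(columns, colors=1, bits_per_component=8):
    """Bytes per predicted row, including the leading filter-type byte.

    Defaults follow the PDF defaults for /Colors and /BitsPerComponent.
    """
    data_bytes = (columns * colors * bits_per_component + 7) // 8
    return data_bytes + 1


# A 10-pixel-wide RGB row is three times as long as a grayscale one
# (not counting the filter byte).
print(png_row_length(10, colors=3), png_row_length(10))
```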
What steps will reproduce the problem?
1. https://www.virustotal.com/en/file/784d1ebd1faccec27f98970cc266859eaf5676da1c451e3304fb55435d8c8473/analysis/
2. run peepdf.py -f vtfile
What is the expected output? What do you see instead?
#Expected:
Warning: PyV8 is not installed!!
Warning: pylibemu is not installed!!
Decryption error: Bad format for /O!!
Decryption error: Bad format for /U!!
Decryption error: Default user password not working here!!
File: tp_22340_utf8_88292d7181514fda5390292d73da28d4
MD5: 88292d7181514fda5390292d73da28d4
SHA1: fbc3856fd689e1ac0f8fb56bbd7d0a2b8332a928
Size: 807079 bytes
Version: 1.4
Binary: True
Linearized: False
Encrypted: True (RC4 40 bits)
Updates: 0
Objects: 7
Streams: 1
Comments: 0
Errors: 5
Version 0:
Catalog: 1
Info: No
Objects (7): [1, 2, 3, 4, 5, 8, 9]
Errors (1): [5]
Streams (1): [5]
Encoded (1): [5]
Decoding errors (1): [5]
Suspicious elements:
/AcroForm: [1]
/OpenAction: [1]
/JS: [1]
/JavaScript: [1]
#Instead see:
Traceback (most recent call last):
File "peepdf.py", line 352, in <module>
ret,pdf = pdfParser.parse(fileName, options.isForceMode, options.isLooseMode, options.isManualAnalysis)
File "/Users/tross/Code/satori/peepdf_service/peepdf-svn/PDFCore.py", line 6822, in parse
ret = pdfFile.decrypt()
File "/Users/tross/Code/satori/peepdf_service/peepdf-svn/PDFCore.py", line 5179, in decrypt
ret = computeUserPass(password, dictO, fileId, perm, keyLength, revision, encryptMetadata)
File "/Users/tross/Code/satori/peepdf_service/peepdf-svn/PDFCrypto.py", line 164, in computeUserPass
ret = computeEncryptionKey(userPassString, dictO, dictU, dictOE, dictUE, fileID, pElement, keyLength, revision, encryptMetadata)
File "/Users/tross/Code/satori/peepdf_service/peepdf-svn/PDFCrypto.py", line 58, in computeEncryptionKey
md5input = password + dictOwnerPass + struct.pack('<I',abs(int(pElement))) + fileID
TypeError: cannot concatenate 'str' and 'instance' objects
What version of the product are you using? On what operating system?
latest version from svn, any os
Please provide any additional information below.
When parsing in force mode and encountering errors, the dictO/dictOwnerPass object doesn't resolve to a simple string, which hinders further execution.
Original issue reported on code.google.com by [email protected]
on 5 Sep 2013 at 3:35
Attachments:
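A defensive check before the concatenation would turn this into a clearer error. The sketch below is a hypothetical Python 3 helper working on bytes, not the code in PDFCrypto.py; only the `struct.pack('<I', abs(int(...)))` expression is taken from the traceback.

```python
import struct


def build_md5_input(password, owner_pass, p_element, file_id):
    """Concatenate the key-derivation inputs after validating their types.

    Hypothetical helper: rejects values that did not resolve to raw bytes,
    so the caller can report a decryption error instead of crashing.
    """
    for name, value in (("password", password),
                        ("owner_pass", owner_pass),
                        ("file_id", file_id)):
        if not isinstance(value, bytes):
            raise TypeError("%s must be bytes, got %s" % (name, type(value).__name__))
    return password + owner_pass + struct.pack('<I', abs(int(p_element))) + file_id
```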
What steps will reproduce the problem?
1. running metadata in the console on a malformed PDF
What is the expected output? What do you see instead?
The program crashed with:
Traceback (most recent call last):
File "/home/.../bin/peepdf.py", line 465, in <module>
console.cmdloop(stats + newLine)
File "/usr/lib64/python2.6/cmd.py", line 142, in cmdloop
stop = self.onecmd(line)
File "/usr/lib64/python2.6/cmd.py", line 219, in onecmd
return func(arg)
File "/home/.../src/svn/sec/peepdf-read-only/PDFConsole.py", line 2290, in do_metadata
type = object.getElementByName('/Type').getValue()
AttributeError: 'list' object has no attribute 'getValue'
What version of the product are you using? On what operating system?
r158 from svn
Please provide any additional information below.
I don't know if the patch is the right long-term solution, but it solved my
crash.
Maybe every interactive command should be in a try/except block, so the program
does not crash on the user?
Original issue reported on code.google.com by [email protected]
on 30 Nov 2012 at 3:27
Attachments:
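The try/except idea could be applied once for all commands by overriding `onecmd` in a `cmd.Cmd` subclass. This is a sketch with a hypothetical `SafeConsole` class, not peepdf's `PDFConsole`; the `do_metadata` body only simulates the crash reported above.

```python
import cmd


class SafeConsole(cmd.Cmd):
    """Console that reports command errors instead of terminating."""

    def onecmd(self, line):
        try:
            return cmd.Cmd.onecmd(self, line)
        except Exception as exc:
            # Report the problem and keep the console loop running.
            self.stdout.write("*** Error: %s\n" % exc)
            return False

    def do_metadata(self, arg):
        # Simulates the failure described in this issue.
        raise AttributeError("'list' object has no attribute 'getValue'")
```

Catching everything in one place is a blunt instrument, so logging the full traceback alongside the short message would still be worthwhile.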
PyV8 is no longer hosted on Google Code or supported. None of the old mirrors people have made on GitHub will build. Maybe switch to a new library?
I installed PyV8 from https://github.com/buffer/pyv8. The installation succeeded; however, the following test fails to interpret the JavaScript code in the variable jscode:
PPDF> set jscode "var a = 8; a = a + 2; print('The content of the variable is '+a);"
PPDF> js_eval variable jscode
*** Error: ReferenceError: print is not defined ( @ 1 : 22 ) -> var a = 8; a = a + 2; print('The content of the variable is '+a);
I get similar messages for PDFs that contain JavaScript when I execute js_eval.
I ran this test on Debian 7.11 and on REMnux 6 with the latest version of peepdf found on GitHub. I would appreciate it if you could let me know whether I am missing anything else needed to get peepdf to interpret JavaScript code.
I would like to have a functionality in PeepDF that allows encoding _streams_. Currently, it is only possible to filter/encode _variables_, _files_ and _raw byte ranges_:
PPDF> encode help
Usage: encode variable $var_name $filter1 [$filter2 ...]
Usage: encode file $file_name $filter1 [$filter2 ...]
Usage: encode raw $offset $num_bytes $filter1 [$filter2 ...]
Encodes the content of the specified variable, file or raw bytes using the following
filters or algorithms:
[....]
So it would be nice to have this:
PPDF> encode help
Usage: encode variable $var_name $filter1 [$filter2 ...]
Usage: encode file $file_name $filter1 [$filter2 ...]
Usage: encode raw $offset $num_bytes $filter1 [$filter2 ...]
Usage: encode stream $object $filter1 [$filter2 ...]
Encodes the content of the specified variable, file, raw bytes or stream from $object
using the following filters or algorithms:
[....]
Of course, re-directing the output of that function to a file should also work:
PPDF> encode stream 9 lzw ahx > stream9-lzw-ahx.txt
I know I can work around this by using encode variable, encode file or encode raw. But this requires going the troublesome path of first putting the current stream content into a file or variable, or of calculating the offset to the stream and its length beforehand....
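The chaining itself is simple to picture. Here is a standalone sketch that applies filters in order to a byte string, using Flate and ASCIIHex because both are available in the Python standard library (LZW is not). The `FILTERS` table and `encode` function are hypothetical and unrelated to peepdf's internals.

```python
import binascii
import zlib

# Filter name -> encoder. ASCIIHex output ends with the '>' end-of-data marker.
FILTERS = {
    "flate": zlib.compress,
    "ahx": lambda data: binascii.hexlify(data) + b">",
}


def encode(data, filter_names):
    """Apply the named filters to data, left to right."""
    for name in filter_names:
        data = FILTERS[name](data)
    return data


encoded = encode(b"stream content", ["flate", "ahx"])
```

An `encode stream` command would fetch the decoded stream of the given object and pass it through the same kind of chain.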
CVE-2013-3346 pdf samples have obfuscated Javascript code using jjencode
(http://utf-8.jp/public/jjencode.html). It would be nice to have a jjdecoder in
peepdf to quickly deobfuscate the code.
Sample jjdecoder written in Javascript can be found here:
http://csc.cs.utm.my/syed/images/files/jjdecode/jjdecode.html
Some explanation about how a jjdecoder works can be found here:
http://corkami.googlecode.com/svn-history/r399/trunk/misc/jjencode.txt
Original issue reported on code.google.com by [email protected]
on 12 Dec 2013 at 12:28
We have an automated malware analysis system that runs a variety of scans in
memory on input files. We patched PDFCore.py to enable string input of file
contents, rather than a filename. It is attached, in case anyone finds it
useful.
Original issue reported on code.google.com by [email protected]
on 22 Mar 2012 at 2:48
Attachments:
Hi,
This is an amazing tool to analyze malicious PDFs. Is there a timeline to add support for the JBIG2 filter?
Thanks
*** Error: Exception not handled using the interactive console!! Please, report it to the author!!