runassudo / gfx2gfx-pdftext
A fork of SWFTools' gfx2gfx which preserves text, rather than converting to shapes.
License: GNU General Public License v2.0
Because the letters are positioned manually, PDF readers do not recognise contiguous blocks of text as such. As a result, copy-pasting does not work well, and neither does in-text search.
This may be related to the issue that programs like qpdfview and evince interpret the text back-to-front, which in turn may stem from the topdown setting of PDFlib.
Hi, I've successfully compiled the gfx2gfx-pdftext v0.9.2 build 8d5a70b code (and the same is true for SWFTools v0.9.2) under Mac OS X 10.11.6, but I had to fix the jpeg.c file (as I described in the thread https://github.com/matthiaskramm/swftools/issues/37) because I got some compile errors.
gfx2gfx itself compiled without errors, but if I run it with the command:
gfx2gfx test.swf -o test.pdf
I get the following errors (swf file zipped and attached test.swf.zip):
Error: ID 142 unknown
Error: ID 145 unknown
Error: ID 148 unknown
Error: ID 151 unknown
Error: ID 154 unknown
Error: ID 157 unknown
Error: ID 160 unknown
Error: ID 163 unknown
Error: ID 166 unknown
Error: ID 169 unknown
Error: ID 172 unknown
Error: ID 175 unknown
Error: ID 178 unknown
Error: ID 181 unknown
Error: ID 184 unknown
Error: ID 187 unknown
Error: ID 190 unknown
Error: ID 193 unknown
Error: ID 196 unknown
Error: ID 199 unknown
Error: ID 202 unknown
Error: ID 205 unknown
Error: ID 208 unknown
Error: ID 211 unknown
Error: ID 214 unknown
Error: ID 217 unknown
Error: ID 220 unknown
Error: ID 223 unknown
Error: ID 226 unknown
Error: ID 229 unknown
Error: ID 232 unknown
Error: ID 235 unknown
Error: ID 238 unknown
Error: ID 241 unknown
Error: ID 244 unknown
Error: ID 247 unknown
Error: ID 250 unknown
Error: ID 253 unknown
Error: ID 256 unknown
Error: ID 259 unknown
Error: ID 262 unknown
Error: ID 265 unknown
Error: ID 268 unknown
Error: ID 271 unknown
Error: ID 274 unknown
Error: ID 277 unknown
Error: ID 280 unknown
Error: ID 283 unknown
Error: ID 286 unknown
Error: ID 289 unknown
Error: ID 292 unknown
Error: ID 295 unknown
Error: ID 298 unknown
Do you have any idea what this is? How can I fix it?
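The unknown IDs in the output above rise in steps of 3, from 142 to 298. As a rough diagnostic aid (not a fix), the following sketch condenses such output into a count, a range, and the step sizes between consecutive IDs. It assumes the gfx2gfx output has been captured as text; the function names `unknown_ids` and `summarize` are made up for this illustration and are not part of gfx2gfx or SWFTools.

```python
import re

# Matches lines of the form "Error: ID 142 unknown", as in the output above.
UNKNOWN_ID = re.compile(r"^Error: ID (\d+) unknown$")


def unknown_ids(log_text):
    """Return the ID from every 'Error: ID N unknown' line, in order of appearance."""
    ids = []
    for line in log_text.splitlines():
        match = UNKNOWN_ID.match(line.strip())
        if match:
            ids.append(int(match.group(1)))
    return ids


def summarize(ids):
    """Describe the IDs as a count, a range, and the distinct gaps between neighbours."""
    if not ids:
        return "no unknown IDs"
    steps = sorted({b - a for a, b in zip(ids, ids[1:])})
    return f"{len(ids)} unknown IDs from {ids[0]} to {ids[-1]}, step(s) {steps}"


# Hypothetical sample: the first three lines of the output quoted above.
sample = "Error: ID 142 unknown\nError: ID 145 unknown\nError: ID 148 unknown\n"
print(summarize(unknown_ids(sample)))
# prints: 3 unknown IDs from 142 to 148, step(s) [3]
```

A regular step like this suggests the errors follow a repeating pattern in the SWF rather than being scattered, which may help narrow down where to look.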
I would like to be able to exclude specific strings from the generated pdf.
Is there any file I could modify to do so?
I can't find the responsible method in the text.c or the gfx2gfx.c file.
Thanks for your help.
As reported by @sonst-was.
e.g. page_3.zip
In pdf.js, the letters are huge and reversed. Adobe Reader (acroread) on Linux displays some glyphs correctly, but raises a bunch of errors and fails to display most.
e.g. U+239f, which is erroneously placed at the left of the bounding box rather than the right; italic f, which is placed too close to the next letter.
e.g. U+239b, U+23aa
gfx2gfx-pdf2text - part of swftools 0.9.2 (build )
missing build is 8d5a70b
Compiled under Ubuntu 17.04; conversion of SWF and GAU files is typically flawless. This particular page (originally 694.gau) converted with no errors under the -r0 option, but caused Acrobat 8.1 to crash at this point when combining pages 654-714. The output contains garbled graphics, although all 693 prior pages are fine. Single-page conversions with -r0 and -r300 are attached.
Using gfx2gfx compiled on Windows with mingw. Unknown if this is related. Issue also occurs with upstream non-pdftext version.
Example of offending SWF is on file with @RunasSudo
Thanks for your fancy project. I'm not sure whether you can read Chinese or not (I ran into problems while using this tool to convert some SWF files in Chinese). I came across 2 problems:
version: gfx2gfx-pdf2text - part of swftools 0.9.2 (build 8d5a70b)
The 2nd problem is easy to fix (I don't know whether this fix is right or not): I just added 2 lines of code after
gfx2gfx-pdftext/lib/devices/pdf.c
Line 392 in 8d5a70b
if(gt7bits>=128)
gt7bits=0;
But the 1st problem still exists: some characters are missing after conversion.
This seems to be a compatibility issue between viewers. In some viewers, including Adobe Reader, MuPDF, xpdf and Firefox's pdf.js, text does not display. qpdf, evince, okular and Google Drive view the PDFs correctly.
A temporary fix appears to be to post-process the PDF with Ghostscript or Poppler:
gs -o output.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress input.pdf
or
pdftocairo -pdf input.pdf output.pdf
This produces a PDF which appears to be readable in all the above applications.
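When many converted pages need this workaround, the two commands above can be applied to a whole directory. The following is a minimal sketch under stated assumptions: `gs` or `pdftocairo` is on the PATH, and the helper names (`gs_command`, `pdftocairo_command`, `repair_all`) are invented for this illustration. The command-line flags are exactly those quoted above.

```python
import subprocess
from pathlib import Path


def gs_command(src, dst):
    """Build the Ghostscript command quoted above for one input/output pair."""
    return ["gs", "-o", str(dst), "-sDEVICE=pdfwrite",
            "-dPDFSETTINGS=/prepress", str(src)]


def pdftocairo_command(src, dst):
    """Build the Poppler (pdftocairo) command quoted above."""
    return ["pdftocairo", "-pdf", str(src), str(dst)]


def repair_all(in_dir, out_dir, build=gs_command):
    """Rewrite every *.pdf in in_dir into out_dir and return the output paths.

    The originals are left untouched; check=True makes a failing tool
    raise subprocess.CalledProcessError instead of passing silently.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_dir.glob("*.pdf")):
        dst = out_dir / src.name
        subprocess.run(build(src, dst), check=True)
        written.append(dst)
    return written
```

For example, `repair_all("converted", "converted-fixed")` would use Ghostscript, and `repair_all("converted", "converted-fixed", build=pdftocairo_command)` would use Poppler instead; both directory names are placeholders.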
cd src;make all
fatal: Not a git repository (or any parent up to mount point /media/stark/afebc556-6185-4b48-81b3-bc81f3987dd8)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
make[1]: Entering directory `/media/stark/afebc556-6185-4b48-81b3-bc81f3987dd8/kiran/gfx2gfx-pdftext-master/src'
gcc -c -DHAVE_CONFIG_H -DGIT_VERSION="" -I/media/stark/afebc556-6185-4b48-81b3-bc81f3987dd8/kiran/PDFlib-Lite-7.0.5p3/libs/pdflib/ -Ilame -Ilib/lame -fPIC -Wimplicit -Wreturn-type -Wno-write-strings -Wformat -O -fomit-frame-pointer -g -O0 gfx2gfx.c -o gfx2gfx.o
cd ../lib;make libgfxpdf.a;cd -
make[2]: Entering directory `/media/stark/afebc556-6185-4b48-81b3-bc81f3987dd8/kiran/gfx2gfx-pdftext-master/lib'
cd pdf;make libgfxpdf
make[3]: Entering directory `/media/stark/afebc556-6185-4b48-81b3-bc81f3987dd8/kiran/gfx2gfx-pdftext-master/lib/pdf'
make[3]: Nothing to be done for `libgfxpdf'.
make[3]: Leaving directory `/media/stark/afebc556-6185-4b48-81b3-bc81f3987dd8/kiran/gfx2gfx-pdftext-master/lib/pdf'
make[2]: Leaving directory `/media/stark/afebc556-6185-4b48-81b3-bc81f3987dd8/kiran/gfx2gfx-pdftext-master/lib'
/media/stark/afebc556-6185-4b48-81b3-bc81f3987dd8/kiran/gfx2gfx-pdftext-master/src
fatal: Not a git repository (or any parent up to mount point /media/stark/afebc556-6185-4b48-81b3-bc81f3987dd8)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
g++ -DHAVE_CONFIG_H -DGIT_VERSION="" gfx2gfx.o -o gfx2gfx ../lib/libgfxswf.a ../lib/librfxswf.a ../lib/libgfxpdf.a ../lib/libgfx.a ../lib/libbase.a -L/usr/local/lib -ljpeg -lz -lm -lstdc++
g++: error: ../lib/libgfxpdf.a: No such file or directory
make[1]: *** [gfx2gfx] Error 1
make[1]: Leaving directory `/media/stark/afebc556-6185-4b48-81b3-bc81f3987dd8/kiran/gfx2gfx-pdftext-master/src'
make: *** [all] Error 2
➜ gfx2gfx-pdftext-master
1.swf.zip
However, the above is the test file.
After converting it to PDF, the result is different 😄
After generating some PDFs, I ran into a problem with text copying. Before the invisible-text fix, text could still be copied and would paste properly. Currently, trying to highlight text in the generated PDFs doesn't work properly.
I only highlighted the word 'dream' but, as the picture shows, more words are highlighted. Pasting this results in an incoherent mess:
Additional details:
I used parameter -r 300
Tested in Adobe Reader and Google Drive