unidoc / unipdf-examples

Examples for creating and processing PDF files with UniPDF https://github.com/unidoc/unipdf

Home Page: https://unidoc.io

Go 81.04% Shell 0.04% Smarty 18.92%

unipdf-examples's Introduction

Examples

This example repository demonstrates many use cases for UniDoc's UniPDF library. Example code should make it easy for users to get started with UniPDF. Feel free to add to this by submitting a pull request.

While the majority of examples are written in pure Go, a few demonstrate additional functionality that requires CGO and external dependencies. Those examples are identified by the filename suffix "_cgo.go".

License codes

UniPDF requires a license code to operate. There are two options:

  • Metered License API key: most of the examples load it from the environment variable UNIDOC_LICENSE_API_KEY.

  • Offline Perpetual License Key: examples of loading it can be found in the license subdirectory.
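
As a minimal sketch of the environment-variable approach, the program below reads UNIDOC_LICENSE_API_KEY and reports a clear error when it is missing. The helper name licenseKeyFromEnv is hypothetical and not part of UniPDF; the sketch uses only the standard library and stops before the library call that would receive the key.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// licenseKeyFromEnv is a hypothetical helper (not part of UniPDF).
// It returns the trimmed value of the named environment variable,
// or an error when the variable is unset or empty.
func licenseKeyFromEnv(name string) (string, error) {
	key := strings.TrimSpace(os.Getenv(name))
	if key == "" {
		return "", errors.New(name + " is not set")
	}
	return key, nil
}

func main() {
	key, err := licenseKeyFromEnv("UNIDOC_LICENSE_API_KEY")
	if err != nil {
		fmt.Fprintln(os.Stderr, "license:", err)
		return
	}
	// At this point the key would be handed to the library's license
	// package, as the examples in this repository do.
	fmt.Printf("loaded a license key of %d characters\n", len(key))
}
```

Failing early with an explicit message avoids a less obvious licensing error later in the program.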

Build all examples

Building with Go modules:

Run the build script, which builds all the binaries into the bin/ subfolder:

$ ./build_examples.sh

Building with GOPATH:

Building with GOPATH requires a slightly different approach because of the /v3 semantic import portion of the unipdf import paths. Both options below start with:

  • go get github.com/unidoc/unipdf/... to download the packages

Then choose one of the two options:

  1. Remove the /v3/ in the unipdf import paths, e.g. use github.com/unidoc/unipdf/core instead of github.com/unidoc/unipdf/v3/core
  2. Alternatively create a symbolic link from the v3 subdirectory of unipdf to the unipdf repository, i.e.
ln -s $GOPATH/src/github.com/unidoc/unipdf $GOPATH/src/github.com/unidoc/unipdf/v3

or move/copy the unipdf folder to unipdf/v3 if symbolic links are not an option.

Once this is done, the examples can be built with the build script as before:

$ ./build_examples.sh

or individual examples can be built as desired.

unipdf-examples's People

Contributors

3ace, a5i, abouroumine, adrg, alexander-deniskin, anovik, daniel-orlov, danishyasin33, dependabot[bot], dikyridhlo, fizzdi, grigortovmasian, gunnsth, kellemnegasi, kelogs, kucjac, moritamori, peterwilliams97, sampila, tkrajina

unipdf-examples's Issues

Cloud Providers

Hi,

I'm planning on using the licensed version in the future. I'm interested in the digital signatures functionality.

However, I don't see examples for different cloud providers. Is it possible to add an example integrating with AWS KMS and Google Cloud KMS?

Tried the external example but it seems it doesn't do the trick.

Replaced text looks incorrect

An issue was received via email.

I checked text replacement with our document. The searched text is replaced, but some symbols are displayed incorrectly. For example, when I replace the word TYPE with A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, the symbols J, K, V and Y are not displayed; I get empty space instead. The same happens with lower case. All special symbols are also displayed incorrectly: !@#$%^&*(){}_-+=[]

This applies to the advanced text replacement example.
An example PDF template was provided.

example to flatten form-filled pdf to remove fillable fields?

Hi there -

Nice library! Is there an example that shows how to flatten a form-filled PDF?

I think I figured out how to fill forms (using acroForm.Fields and the field.V=...), although when the pdf is saved, it's saved with fillable fields (with my values). It would be best to somehow save form filled pdf into a file without any form fields present, with just values (flatten the final pdf in other words).

Thanks!

forms/pdf_form_fill_json.go debug logs sometimes pollute JSON output

One of the example steps in forms/pdf_form_fill_json.go is to run:

go run pdf_form_fill_json.go input.pdf > formdata.json

Because the example enables debug logging to os.Stdout, any debug messages end up in the output JSON file and break it.

For example, one of my PDFs produces this output:

$ go run pdf_form_fill_json.go in.pdf > data.json
$ head -6 data.json
[DEBUG]  parser.go:754 Pdf version 1.3
[
    {
        "name": "MyField",
        "value": "foo123"
    },

Perhaps logging should be disabled? Alternatively, the loggers in "github.com/unidoc/unipdf/v3/common" could be changed to write to os.Stderr (instead of os.Stdout), so that a caller can redirect only the stdout stream to the JSON file.
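
The separation suggested above can be sketched with the standard library alone: diagnostics go to standard error and only the JSON payload goes to standard output. The fieldValue type and encodeFormData helper are hypothetical stand-ins for the example's own data structures, not code from the repository.

```go
package main

import (
	"encoding/json"
	"log"
	"os"
)

// fieldValue mirrors the name/value pairs shown in the JSON output above.
// It is a hypothetical stand-in for the example's own type.
type fieldValue struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// encodeFormData renders the fields as indented JSON.
func encodeFormData(fields []fieldValue) ([]byte, error) {
	return json.MarshalIndent(fields, "", "    ")
}

func main() {
	// Diagnostics are written to stderr so they never mix with the payload.
	debug := log.New(os.Stderr, "[DEBUG] ", 0)

	fields := []fieldValue{{Name: "MyField", Value: "foo123"}}
	debug.Printf("exporting %d field(s)", len(fields))

	out, err := encodeFormData(fields)
	if err != nil {
		debug.Fatalf("encoding failed: %v", err)
	}
	// Only the JSON document reaches stdout.
	os.Stdout.Write(append(out, '\n'))
}
```

With this split, `go run main.go > formdata.json` captures only the JSON, while the debug line still appears on the terminal.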

Page rotation flattening example

Create an example for rotation flattening.
If a page's rotation is not 0, the viewer rotates the page, yet the coordinate system is unchanged.
In some cases it is desirable to work with the PDF as it appears in the viewer, i.e. with point (0,0) at the same upper-left corner as shown in the viewer.

The rotation flattening example should analyze each page and, if the Rotate value is not 0, add code to the content stream (via creator) to rotate the page contents and then set the page's Rotate to 0. There should be no visual difference when opening the PDF file in the viewer.
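
The geometry behind this request can be sketched without any PDF library. The code below computes the transformation matrix that would have to be prepended to the page contents so that the page looks the same once Rotate is set to 0. It assumes the default PDF user space (origin at the lower-left corner), a page box starting at (0,0), and Rotate given in clockwise degrees; the names matrix and flattenRotation are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// matrix holds the six operands [a b c d e f] of a PDF "cm" operator:
// x' = a*x + c*y + e, y' = b*x + d*y + f.
type matrix [6]float64

// apply transforms a point given in the original (unrotated) user space.
func (m matrix) apply(x, y float64) (float64, float64) {
	return m[0]*x + m[2]*y + m[4], m[1]*x + m[3]*y + m[5]
}

// flattenRotation returns the matrix that bakes a clockwise page rotation
// into the contents, plus the page size to use once Rotate is reset to 0.
// w and h are the width and height of the unrotated page.
func flattenRotation(rotate int, w, h float64) (matrix, float64, float64, error) {
	switch ((rotate % 360) + 360) % 360 {
	case 0:
		return matrix{1, 0, 0, 1, 0, 0}, w, h, nil
	case 90:
		return matrix{0, -1, 1, 0, 0, w}, h, w, nil
	case 180:
		return matrix{-1, 0, 0, -1, w, h}, w, h, nil
	case 270:
		return matrix{0, 1, -1, 0, h, 0}, h, w, nil
	}
	return matrix{}, 0, 0, errors.New("rotate must be a multiple of 90")
}

func main() {
	m, w, h, err := flattenRotation(90, 612, 792)
	if err != nil {
		fmt.Println(err)
		return
	}
	x, y := m.apply(0, 0)
	fmt.Printf("cm operands: %v, new page size: %g x %g\n", m, w, h)
	fmt.Printf("original (0,0) lands at (%g,%g)\n", x, y)
}
```

For a 90 degree rotation the original lower-left corner ends up at the upper-left corner of the flattened page, and width and height swap.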

example of text search and replace

I can't see how to do this with the current text examples.

My use case is that I have a PDF that is a CAD output with text in it.
It's in English and I want to replace the text with German text.
The English-to-German translation uses the Google Translate API.

I can see how to loop over the content streams from the examples:


	fmt.Printf("--------------------\n")
	fmt.Printf("PDF to text extraction:\n")
	fmt.Printf("--------------------\n")
	for i := 0; i < numPages; i++ {
		pageNum := i + 1

		page, err := pdfReader.GetPage(pageNum)
		if err != nil {
			return err
		}

		contentStreams, err := page.GetContentStreams()
		if err != nil {
			return err
		}

		// If the value is an array, the effect shall be as if all of the streams in the array were concatenated,
		// in order, to form a single stream.
		pageContentStr := ""
		for _, cstream := range contentStreams {
			pageContentStr += cstream
		}

		fmt.Printf("Page %d - content streams %d:\n", pageNum, len(contentStreams))
		cstreamParser := pdfcontent.NewContentStreamParser(pageContentStr)
		txt, err := cstreamParser.ExtractText()

		if err != nil {
			return err
		}
		fmt.Printf("\"%s\"\n", txt)
	}

But the insert-text example cannot actually replace existing text:
https://github.com/unidoc/unidoc-examples/blob/master/pdf/text/pdf_insert_text.go

Please help...

Problem filling field choice

Hi,

I am trying to fill a PDF that has a select/choice field using pdf_form_fill_json.go, but the field is not filled.

I don't get any error, just a log message:
[DEBUG] logging.go:125 Unexpected string for button/choice field. Converting to name: 'RJ'

The PDF and JSON I am using are attached:
teste.pdf
teste.json.txt

PDF Replace text example that works independently of encoding.

The example
https://github.com/unidoc/unipdf-examples/blob/master/text/pdf_search_replace.go

assumes that the character codes are not encoded and operates on them directly, without accounting for the encoding of the search or replacement text.

The problem is that text is frequently encoded, and in that case the example fails.

Two proposals are put forward below. The most immediate is probably the first option. The second option could become feasible in the future, when we have a more complete set of graphics extractors.

Proposal option 1:

Take encoding into account when decoding text and encoding replacement.
Need to access the font that generated the text and its encoding properties.
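
To illustrate why the font's encoding matters for option 1, here is a toy sketch that encodes replacement text through a rune-to-charcode table and reports runes the font cannot represent. The table, the single-byte charcodes and the encodeText helper are all assumptions made for illustration; real fonts may use multi-byte codes, and this is not the UniPDF encoder.

```go
package main

import "fmt"

// encodeText maps each rune of text to a character code using the given
// table. Runes without an entry are collected in missing instead of being
// silently dropped, so the caller can decide how to handle them.
func encodeText(text string, runeToCode map[rune]byte) (codes []byte, missing []rune) {
	for _, r := range text {
		code, ok := runeToCode[r]
		if !ok {
			missing = append(missing, r)
			continue
		}
		codes = append(codes, code)
	}
	return codes, missing
}

func main() {
	// Hypothetical subset font that only contains the glyphs T, Y, P and E.
	table := map[rune]byte{'T': 1, 'Y': 2, 'P': 3, 'E': 4}

	codes, missing := encodeText("PETK", table)
	fmt.Println("charcodes:", codes)
	fmt.Println("not encodable with this font:", string(missing))
}
```

A subset font of this kind is one plausible reason for replacement characters showing up as empty space: the replacement text contains runes for which the original font has no code.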

Proposal option 2:

Implement using the extractor and TextMarks.

  1. Load the text via TextMarks
    See
    https://github.com/unidoc/unipdf-examples/blob/f523c3497ec967013956e0c62bca09192b604231/text/pdf_to_csv.go
    https://github.com/unidoc/unipdf-examples/blob/master/text/pdf_text_locations.go

  2. May need to do grouping and such in case glyphs are individually displayed (proximity analysis). Then match against the search text and replace with the target. Take care that the target is encoded and displayed with same font as original.

We could make it optional to shift text if there is extra space, to make the result look more natural, but by default that is not supported. We need some use cases/examples before adding that. Text could shift in a complicated manner (across lines and even pages), so it's not a trivial problem.

  3. Add the text to the page with Paragraph (creator).

  4. Add non-textual content to the page straight from the original page.
    It might make sense to start by filtering the original page contents to remove text, apply that straight to the page, and then draw the text on top of it.
    This can potentially lead to some unintended z-index issues (i.e. text above/under some graphic that it should not be).

At the moment this is tricky because the extractor only supports TextMarks and not extraction of other graphics. Ideally would extract all Graphics (including TextMarks), filter them and then regenerate the output.

Related:

unidoc/unipdf#83
unidoc/unipdf#9

4up pages example

Create an example demonstrating how to load N pages, scale them and place on one page. Similar to creating handouts, i.e. 4 pages scaled down and put on 1 page with a box around each.

This would demonstrate the capability to easily work with and manipulate page contents with the creator package.
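
The layout arithmetic for such a handout can be sketched independently of the creator package. The code below computes, for a grid of cells on one sheet, the uniform scale and the position of each source page so that it is centered in its cell. Positions are measured from the top-left corner of the sheet, which is a convention chosen for this sketch; the names placement and nUpLayout are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// placement describes where one scaled source page goes on the sheet.
type placement struct {
	X, Y  float64 // top-left corner of the scaled page on the sheet
	Scale float64 // uniform scale factor applied to the source page
}

// nUpLayout divides the sheet into cols x rows cells and returns one
// placement per cell, in row-major order. Each page is scaled to fit its
// cell minus the margin on every side, keeping the aspect ratio.
func nUpLayout(sheetW, sheetH, pageW, pageH, margin float64, cols, rows int) []placement {
	cellW := sheetW / float64(cols)
	cellH := sheetH / float64(rows)
	scale := math.Min((cellW-2*margin)/pageW, (cellH-2*margin)/pageH)

	out := make([]placement, 0, cols*rows)
	for r := 0; r < rows; r++ {
		for c := 0; c < cols; c++ {
			out = append(out, placement{
				X:     float64(c)*cellW + (cellW-pageW*scale)/2,
				Y:     float64(r)*cellH + (cellH-pageH*scale)/2,
				Scale: scale,
			})
		}
	}
	return out
}

func main() {
	// Four US Letter pages on one US Letter sheet with a 10 pt margin.
	for i, p := range nUpLayout(612, 792, 612, 792, 10, 2, 2) {
		fmt.Printf("page %d: x=%.1f y=%.1f scale=%.3f\n", i+1, p.X, p.Y, p.Scale)
	}
}
```

The box around each page mentioned above would use the same position with the scaled width and height.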

Large output sizes from split PDF

I am currently testing this with a 5-page PDF. I am doing a very simple task in Go:

First, I split each page into its own PDF. Second, I take the split single-page PDFs and merge them all into another PDF.

When I compare the file size of the original PDF (200 KB) with that of the merged PDF (500 KB), the merged PDF is larger.

Is there a way to reduce the size of the merged PDF?

Fill form by JSON, but English letters and digits are not displayed

We created a PDF with Chinese characters that contains forms. We used pdf_form_fill_json.go to fill the PDF forms, but the filled PDF does not display English letters and digits.

How can we resolve this? It is urgent.

debug info
[DEBUG] encoder.go:72 Failed to map rune to charcode. rune='1'
[DEBUG] encoder.go:72 Failed to map rune to charcode. rune='1'
[DEBUG] encoder.go:72 Failed to map rune to charcode. rune='1'
[DEBUG] encoder.go:72 Failed to map rune to charcode. rune='a'
[DEBUG] encoder.go:72 Failed to map rune to charcode. rune='s'
[DEBUG] encoder.go:72 Failed to map rune to charcode. rune='d'
[DEBUG] encoder.go:72 Failed to map rune to charcode. rune='f'
[DEBUG] encoder.go:72 Failed to map rune to charcode. rune='a'
[DEBUG] encoder.go:72 Failed to map rune to charcode. rune='s'
[DEBUG] encoder.go:72 Failed to map rune to charcode. rune='d'
[DEBUG] encoder.go:72 Failed to map rune to charcode. rune='f'
[DEBUG] encoder.go:72 Failed to map rune to charcode. rune='['
[DEBUG] encoder.go:72 Failed to map rune to charcode. rune=']'
[DEBUG] encoder.go:72 Failed to map rune to charcode. rune=' '

fill.json
[
{
"name": "合同",
"value": "合同11111111asdfasdf[] () ()【】。."
}
]

test.pdf

Grayscale conversion: FailingPDF.pdf - Range check error - Pattern colorspace issue

The following operation fails:
[0 0 0 /P1] scn

Output from grayscale_convert_bench
$ go run pdf_grayscale_convert_bench.go -d -g /tmp/mybla15 /Users/ghall/pdfdb_small/FailingPDF.pdf
compDir=compare.pdfs/dir.000
0 of 1 FailingPDF.pdf (236279->[DEBUG] parser.go:677 Pdf version 1.3
[DEBUG] pdf_grayscale_convert_bench.go:322 ^^^^page 1
[DEBUG] pdf_grayscale_convert_bench.go:757 Name=(*core.PdfObjectName)(0xc42030a6a0)=Im1
[DEBUG] pdf_grayscale_convert_bench.go:767 xtype=1 pdf.XObjectTypeImage=1
[DEBUG] pdf_grayscale_convert_bench.go:757 Name=(*core.PdfObjectName)(0xc42032a2c0)=Im2
[DEBUG] pdf_grayscale_convert_bench.go:767 xtype=1 pdf.XObjectTypeImage=1
[DEBUG] colorspace.go:2020 ERROR: Unable to convert color via underlying cs: Range check
[DEBUG] processor.go:414 ERROR: Fail to get color from params: [0 0 0 P1] (CS is Pattern)
[DEBUG] processor.go:241 Processor handling error (scn): Range check
[DEBUG] processor.go:242 Operand: "scn"
[ERROR] pdf_grayscale_convert_bench.go:883 processor.Process returned: err=Range check
[ERROR] pdf_grayscale_convert_bench.go:363 transformContentStreamToGrayscale failed. err=Range check
[ERROR] pdf_grayscale_convert_bench.go:172 transformPdfFile failed. err=Range check
, bad
1 files 1 bad 0 pass 1 fail

The problem is in the code
	if patternCS.UnderlyingCS != nil {
		// Swap out for a gray colorspace.
		patternCS.UnderlyingCS = pdf.NewPdfColorspaceDeviceGray()
	}

To handle properly, need to use the actual underlying colorspace...

Add table reporting examples

Create examples showcasing the powerful table functionality

  • pdf/report/pdf_tables.go showing basic tables with a header wrapping across columns and an image inside the table (based off TestTableWithImage, TestTableHeaderTest)

Could maybe split into chapters and show different examples in each chapter.

  • pdf/report/pdf_subtables.go showcasing more complex tables using subtables (based off TestTableSubtables)

https://github.com/unidoc/unidoc/blob/0d2e2fa2cda8451aedaf0c3cd519feb91ddf173d/pdf/creator/table_test.go#L410

Support for annotations in grayscale conversions

An annotation's appearance can be defined either via appearance streams or left to the PDF viewer's interpretation and rendering of the annotation's contents.

For reliability, appearance streams are preferred.

The most robust way would be to convert the appearance streams to grayscale, as well as any colors that are defined within the annotation, which requires handling for each type of annotation.

position information of Image and Text

Hey,

I see the examples can help us quickly get all the text and images from a PDF, but how can I get the position (BBox) information for each image and character?
The font information may also be needed to analyse the text :)

Thanks a lot!

Digital signature examples for v3

Create examples for signatures. Already started with a draft in #35. The example cases should be:

  1. Basic example with private/public key in PKCS12 (.p12/.pfx file). e.g. ./pdf_sign_pkcs12 file.p12 input.pdf input_signed.pdf
  2. Example of signing with a blank signature first and then replacing blank signature with actual signature contents.
  3. Example of signing via PKCS11 / HSM such as already in #35
  4. Example of signature appearances
  5. Example for signature validation, e.g. ./pdf_sign_validate file.pdf prints out signature info and validation

Advanced search/replace issue

First, this example was a more elegant solution to the problem as I described in unidoc/unipdf#267 . Thank you!

For the documents I'm interested in, here are some attributes from them since I can't share the originals nor can I create my own documents:

  • Created with Adobe PDF Library 9.0
  • Acrobat PDFMaker 9.0 for Word
  • PDF 1.5
  • Optimized

The advanced search/replace almost works, which is great.

The problem I have is that my letters are segmented to an individual letter per text object, which means this example puts all of the text in the first object, and sets all the other objects to empty. Unfortunately, my pdf viewer moves all the text over to the left, so the new text overlaps with the text next to the old text by quite a bit. Not good.

The solution is to walk each text segment, and in each segment replace the existing characters with the corresponding characters from the replacement text. There are some visible problems when the search text is larger than the replace text, but it's pretty good otherwise.

I've created a modified version of the example, but it was pretty straightforward to do and I don't want to sign the CLA at this time. The fix involves rewriting the inner loop of the 'replace' function and modifying the way the chunks are changed. Pseudocode is as follows:


// loop 1
  // loop 2
    // loop 3
      .. existing 'continue' code
      // chunkOffset += 1

      // keep track of consumed characters

      // chunkoffset loop
        // ensure the first chunk retains some of the original content

        // middle chunks: replace all of this chunks content, increment chunkOffset

        // last chunk: append any remaining content
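
The per-chunk idea in the pseudocode can be sketched in isolation as follows. This is not the modified example from the issue: it assumes the match covers every chunk completely (so no original content has to be retained in the first chunk), and the distribute helper is hypothetical.

```go
package main

import "fmt"

// distribute spreads the replacement text over the matched chunks. Each
// chunk receives as many replacement characters as it originally held;
// the last chunk receives whatever is left. If the replacement is shorter
// than the original, trailing chunks become empty.
func distribute(chunks []string, replacement string) []string {
	repl := []rune(replacement)
	out := make([]string, len(chunks))
	pos := 0 // number of replacement characters consumed so far
	for i, chunk := range chunks {
		n := len([]rune(chunk))
		if i == len(chunks)-1 || n > len(repl)-pos {
			n = len(repl) - pos
		}
		out[i] = string(repl[pos : pos+n])
		pos += n
	}
	return out
}

func main() {
	// One letter per text object, as in the documents described above.
	fmt.Printf("%q\n", distribute([]string{"T", "Y", "P", "E"}, "ABCDEF"))
	fmt.Printf("%q\n", distribute([]string{"T", "Y", "P", "E"}, "AB"))
}
```

Keeping each character in its original text object preserves the per-object positioning, which is why only the last chunk can overflow when the replacement is longer than the search text.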

pdf_grayscale_convert_bench.go: Grayscale conversion not working on specific shading dictionary

The grayscale conversion failed on the file: GLType_Stats-Large.pdf.

There was no unidoc error, but color analysis revealed that the output was colored.

go run pdf_grayscale_convert_bench.go -g /tmp/bla1 -d ~/pdfdb_small/GLType_Stats-Large.pdf
compDir=compare.pdfs/dir.000
0 of 1 GLType_Stats-Large.pdf ( 12116->[DEBUG] parser.go:677 Pdf version 1.3
[DEBUG] pdf_grayscale_convert_bench.go:290 ^^^^page 1
[DEBUG] pdf_grayscale_convert_bench.go:723 Name=(*core.PdfObjectName)(0xc42023ac10)=Fm1
[DEBUG] pdf_grayscale_convert_bench.go:733 xtype=2 pdf.XObjectTypeImage=1
[DEBUG] pdf_grayscale_convert_bench.go:809 XObject Form: Fm1
[DEBUG] pdf_grayscale_convert_bench.go:905 Converting shading to gray - cs: Separation
[DEBUG] pdf_grayscale_convert_bench.go:908 Already 1 component - no action
[DEBUG] pdf_grayscale_convert_bench.go:723 Name=(*core.PdfObjectName)(0xc420291ec0)=Fm2
[DEBUG] pdf_grayscale_convert_bench.go:733 xtype=2 pdf.XObjectTypeImage=1
[DEBUG] pdf_grayscale_convert_bench.go:809 XObject Form: Fm2
21503 177%) 1 pages 0.005 sec => /tmp/bla1/GLType_Stats-Large.pdf[ERROR] pdf_grayscale_convert_bench.go:200 isPdfColor: 1 Color pages
, fail
1 files 0 bad 0 pass 1 fail
total duration (everything): 0 seconds
0 bad
0 pass
1 fail
0 /Users/ghall/pdfdb_small/GLType_Stats-Large.pdf - color fail: 1 color pages / 1 total

Digital signing lags when opening pdf with Adobe Reader

I just self-signed a document using your library, and it has problems in Adobe Reader. It loads extremely slowly on Linux, and on Windows and OS X it gets stuck or even crashes. Do you know whether this is a problem with the unipdf lib or with Adobe Reader?

Grayscale conversion: Handle 1 component colorspaces generally (not assume is gray)

1-component colorspaces should not simply be ignored. Indexed colorspaces often have 1 component and need to be handled more generally.

Multiple failures in pdf_grayscale_convert_bench.go are due to this check:
	if ximg.ColorSpace.GetNumComponents() == 1 {
		return nil
	}

Example file that was failing due to this is kdchart-1, which was using an Indexed colorspace for image data.
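
To show why a 1-component image is not necessarily gray, the sketch below converts the lookup table of an indexed image to gray values. It assumes a palette of RGB triples and uses the common 0.299/0.587/0.114 luma weights; which formula the benchmark itself applies is not stated here, and paletteToGray is a hypothetical helper.

```go
package main

import "fmt"

// paletteToGray converts a palette of packed RGB triples into one gray
// value per entry, using integer luma weights (299, 587, 114 per 1000)
// with rounding. Incomplete trailing bytes are ignored.
func paletteToGray(rgb []byte) []byte {
	gray := make([]byte, 0, len(rgb)/3)
	for i := 0; i+2 < len(rgb); i += 3 {
		r, g, b := int(rgb[i]), int(rgb[i+1]), int(rgb[i+2])
		gray = append(gray, byte((299*r+587*g+114*b+500)/1000))
	}
	return gray
}

func main() {
	// Palette entries: white, black, pure red. Each pixel of an indexed
	// image stores only the index of one of these entries.
	palette := []byte{255, 255, 255, 0, 0, 0, 255, 0, 0}
	fmt.Println(paletteToGray(palette)) // prints [255 0 76]
}
```

The pixel data stays a single component per sample, but the colors it refers to are only gray after the palette has been converted.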

digital signature filled with image

I want to fill the signature field with an image when I digitally sign. At the moment I only see:


		field, err := annotator.NewSignatureField(
			signature,
			[]*annotator.SignatureLine{
				annotator.NewSignatureLine("Name", "John Doe11"),
				annotator.NewSignatureLine("Date", "2019.15.03"),
				annotator.NewSignatureLine("Reason", "External signature test"),
			},
			opts,
		)
		field.T = core.MakeString("External signature")

Adding "metadata" or something similar to a PDF

Hi guys,

Awesome library. I was wondering if there's a function to allow adding something like metadata to a PDF file. For example, I'd like to tag/label a certain file, but not as part of the file name and in no place that is visible to the user.

Does this exist in PDF? Or is it something that must be implemented in the filesystem level?

Thanks

Emojis break text input

When an emoji is passed to the unidoc NewParagraph function, the text is not displayed correctly (see screenshots).

When entering this character sequence:

p = creator.NewParagraph(text)

Ç % { < a@E£@!&%@&^!@($**))_))!@)(£&^%^!
Without the emoji, all the characters are printed.
image

Ç % { < a@E£@!&%@&^!@($**))_))!@)(£&^%^! /😀
With the emoji at the end, the text is laid out vertically and the emoji is not printed.

image
