empira / pdfsharp-1.5
A .NET library for processing PDF
License: MIT License
Once GlobalFontSettings.FontResolver has been set, it can't be changed.
I have a project that gets inputs from different sources, which can bring their own fonts.
Because different sources can use the same name for different fonts, the easiest approach would be to create a font resolver for each input and then switch GlobalFontSettings.FontResolver. The library would then automatically invalidate all caches.
I currently see only two workarounds:
Unfortunately, I can't use option 2, and option 1 will eventually run out of memory, because the internal cache only grows and never shrinks until the process is terminated.
Hello,
Your project is very useful and clearly written. Thanks a lot!
Do you have any plans to complete the PDF Core Build so that the libraries work on Linux with Mono?
Hi there,
this bug is still active!
https://stackoverflow.com/questions/38733190/pdfsharp-losing-annotations
Best regards,
Maurice
There appears to be an issue with bookmarks when a PDF document is encrypted via PdfSecuritySettings.
Take the example of adding a bookmark to a newly built PDF (via the AddBookmark() method) using MigraDoc. If, after the PDF is rendered (via the PdfDocumentRenderer.RenderDocument() method), an owner password is set in the OwnerPassword property of the PdfSecuritySettings object to secure the PDF (with the DocumentSecurityLevel property set to PdfDocumentSecurityLevel.Encrypted128Bit), then all bookmark text appears encrypted in the final, saved PDF when viewed in Adobe Acrobat. If a MigraDoc-constructed PDF is not secured, the bookmark text appears in clear text in Adobe Acrobat.
As a control, if a PDF is secured manually in Adobe Acrobat (for example, by adding an owner password), the bookmark text in that PDF remains readable (i.e., it appears in clear text in Adobe Acrobat).
I need the function
public OpenTypeFontface CreateFontSubSet(Dictionary<int, object> glyphs, bool cidFont)
from OpenType\Fonts.OpenType\OpenTypeFontface.cs in another project, together with XFontMetrics.
I've isolated the OpenType code for my purposes here:
https://github.com/ststeiger/PdfSharpNetStandard/tree/master/OpenType
However, it is intertwined with the PdfSharp code (XFont, XFontSource, XGlyphTypeface, XPrivateFontCollection, FontFactory, some enums, etc).
Would it be possible to refactor and isolate the OpenType code so that it doesn't depend on PdfSharp?
Extra bonus points if GDI components could be avoided as well (GDI doesn't work on Azure Web App).
I'm trying to fill a PDF AcroForm. Everything works, but I have a problem setting the value of PdfCheckBoxField fields that share the same name and are used as multiple options.
See the "Controllo impianto istallazione interna II" field in the attached PDF: there are 3 fields with the values 1/2/3.
I looked at the PDFsharp source code, but it seems to handle only the specific case of two options.
Any idea how to work around the problem?
Thank you.
Davide.
I get the following error when I try to open the attached PDF with this call (I want to combine it with other files):
PdfReader.Open(FileName, PdfDocumentOpenMode.Import)
Object already in table
Thanks.
The official project web site:
http://pdfsharp.net/
The official peer-to-peer support forum:
http://forum.pdfsharp.net/
We strongly recommend using the IssueSubmissionTemplate to make sure we can replicate the issue.
http://www.pdfsharp.net/wiki/IssueSubmissions.ashx
Form flattening has a specific meaning in PDF: the form fields are removed as a result of the process. In many cases, flattening is used just to disable editing, but it has other side effects, such as changing the document structure and reducing the file size, that some use cases depend on. Form flattening is non-trivial to implement, and while this workaround achieves the desired effect in some use cases, it introduces confusing bugs when the user performs actions that depend on the form being truly "flattened". In my opinion, this function is useful but should be given a different name, such as MakeReadOnly, ReadOnly or DisableEditing, that doesn't conflict with the common meaning of flatten.
Expected behavior: The form fields are removed and replaced with their contents as regular markup objects.
Actual behavior: The form fields are set to read-only mode.
This is a great project that has been extremely useful for me, and I'm definitely nitpicking here but I think it's worth considering so that other consumers of the library don't end up down the same confusing debugging road I did. Thanks for the great work!
I discovered a bug in the PdfDocument.Save functions.
Expected behavior: Saving without exception.
Actual behavior: An InvalidOperationException("Cannot save a PDF document with no pages.") is thrown on a freshly opened PdfDocument, even if it contains pages.
This behaviour is only triggered in the compiled programme or during uninterrupted code runs in the debugger (I am using VS 2017). When using step by step debugging (F10), the exception is not triggered.
Calling the following Test() function with a valid PDF byte array triggers the InvalidOperationException when reaching the pdfDoc.Save(pdfFilePath) instruction. All the other checks before are passed successfully.
public static void Test(this byte[] pdf, string pdfFilePath)
{
if (pdf == null) { throw new ArgumentNullException(nameof(pdf)); }
PdfDocument pdfDoc;
try
{
pdfDoc = PdfReader.Open(pdf.ToPdfMemoryStream(), PdfDocumentOpenMode.Import);
}
catch (FormatException)
{
MessageBox.Show("Error: Invalid or empty PDF.");
return;
}
pdfDoc.Save(pdfFilePath);
}
public static MemoryStream ToPdfMemoryStream(this byte[] pdf)
{
if (pdf == null) { throw new ArgumentNullException(nameof(pdf)); }
PdfDocument outputDocument = new PdfDocument();
using (MemoryStream inputStream = new MemoryStream(pdf))
{
try
{
PdfDocument inputDocument = PdfReader.Open(inputStream, PdfDocumentOpenMode.Import);
foreach (PdfPage page in inputDocument.Pages)
{
outputDocument.AddPage(page);
}
}
catch (InvalidOperationException)
{
throw new FormatException("Not a valid PDF document.");
}
}
if (outputDocument.PageCount == 0) { throw new FormatException("PDF is empty"); }
MemoryStream outputStream = new MemoryStream();
outputDocument.Save(outputStream);
return outputStream;
}
Workaround: insert the following line before pdfDoc.Save(pdfFilePath):
if (pdfDoc.PageCount == 0) { throw new FormatException("PDF is empty"); }
The exception is thrown when the following condition is met in the void DoSave(PdfWriter writer) method of the PdfDocument class:
if (_pages == null || _pages.Count == 0)
The bug is caused by the _pages variable not being initialised. This also explains the workaround: calling PdfDocument.PageCount triggers the initialisation of _pages.
A possible fix is to replace _pages with Pages in the offending line, which triggers the initialisation. Here is a compact version of the fix that avoids attempting the initialisation twice:
if ((Pages?.Count ?? 0) == 0)
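The failure mode can be reproduced in isolation. The sketch below uses a hypothetical LazyDocument class (not PDFsharp's PdfDocument) whose _pages field is only created by the Pages getter, so a check against the field reports "no pages" until something reads the property. This would also plausibly explain why stepping with F10 hides the bug: the debugger's Locals/Autos windows evaluate properties such as Pages, which initialises _pages as a side effect.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for PdfDocument: the page list is created lazily.
public class LazyDocument
{
    private List<string> _pages;        // stays null until Pages is first read
    private readonly int _pagesInFile;  // pages that exist in the underlying file

    public LazyDocument(int pagesInFile)
    {
        _pagesInFile = pagesInFile;
    }

    public List<string> Pages
    {
        get
        {
            if (_pages == null)
            {
                // Simulates reading the page tree on first access.
                _pages = new List<string>();
                for (int i = 1; i <= _pagesInFile; i++)
                    _pages.Add("Page " + i);
            }
            return _pages;
        }
    }

    public int PageCount => Pages.Count;

    // Mirrors the reported check in DoSave: it reads the backing field directly.
    public bool HasNoPagesBuggy() => _pages == null || _pages.Count == 0;

    // Mirrors the proposed fix: going through the property forces initialisation.
    public bool HasNoPagesFixed() => (Pages?.Count ?? 0) == 0;
}
```

On a fresh new LazyDocument(3), HasNoPagesBuggy() returns true until Pages or PageCount has been read, while HasNoPagesFixed() returns false right away.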
Greetings,
In testing the PdfSharp package, we found that certain characters were not rendered properly. For example, the character ප්ර is rendered as ප් ර. It appears to be linked to characters that use a zero-width joiner in their combination.
I've attached an example to demonstrate the issue. Thank you for any support you can provide.
Regards.
Hi guys,
I'm currently getting a NullReferenceException when calling the static method
PdfReader.Open(Stream stream, PdfDocumentOpenMode mode).
In my case, the mode is PdfDocumentOpenMode.Import.
Have you encountered this error before? Or did I miss something?
Thanks.
Whenever I edit an AcroForm (for example, change text stream properties, use checkboxes, or adjust list boxes), a warning pops up when I open it in Adobe Acrobat. When I open it in Foxit, I don't seem to have these issues.
Expected behavior: I can open the adjusted file in Adobe Acrobat and see the changes made.
Actual behavior: The warning shown above pops up in Adobe Acrobat and the form reverts to its original configuration. The changes seem to be fine in Foxit.
Steps to reproduce: Adjust the properties of an AcroForm's elements, for example a radio button's elements["V"] property.
Surrogate characters (characters that don't fit in 2 bytes, i.e. outside the Basic Multilingual Plane) are not drawn correctly.
Expected behavior: Drawing a string with surrogate characters (e.g. 🅐) should draw the correct glyph.
Actual behavior: Two unrecognizable characters are printed. The surrogate pair is interpreted as two separate characters.
You can reproduce this with the minimal sample repository
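For reference, the expected handling is to combine each surrogate pair into one code point before looking up the glyph. The sketch below is plain .NET, not PDFsharp code, and the CodePoints class is illustrative. The resulting value (U+1F150 for 🅐) lies outside the 16-bit range, so the font would typically have to be queried through a cmap subtable that supports such code points (such as format 12) rather than format 4.

```csharp
using System.Collections.Generic;

public static class CodePoints
{
    // Splits a UTF-16 string into Unicode code points. A surrogate pair such as
    // "\uD83C\uDD50" (U+1F150) becomes one value instead of two separate chars.
    public static List<int> Enumerate(string text)
    {
        var result = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsHighSurrogate(text[i]) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                result.Add(char.ConvertToUtf32(text[i], text[i + 1]));
                i++; // the low surrogate has been consumed as part of this code point
            }
            else
            {
                result.Add(text[i]); // BMP character, or an unpaired surrogate passed through as-is
            }
        }
        return result;
    }
}
```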
Expected behavior: Relative coordinates are drawn normally.
Actual behavior: Relative coordinates are drawn at increased actual coordinate values.
using System;
using System.Diagnostics;
using System.IO;
using PdfSharp.Drawing;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;
namespace DrawRectangle
{
class Program
{
static void Main()
{
var srcPdf = @"C:\Users\Sergey Kuchuk\Desktop\TEST.pdf";
var tmpPdf = @"C:\Users\Sergey Kuchuk\Desktop\TEST-REZ.pdf";
if (File.Exists(tmpPdf))
File.Delete(tmpPdf);
File.Copy(srcPdf, tmpPdf);
var inputPdfDocument = PdfReader.Open(tmpPdf, PdfDocumentOpenMode.Import);
var pdfDocument = new PdfDocument();
foreach (var page in inputPdfDocument.Pages)
pdfDocument.AddPage(page);
inputPdfDocument.Close();
double x = 0.18900810339655708;
double y = 0.0638677225513861;
double w = 0.21725069355926102;
double h = 0.028283526804211694;
using (var pageGraphics = XGraphics.FromPdfPage(pdfDocument.Pages[0]))
{
Console.WriteLine("Width (points) {0}; Height: {1}", pdfDocument.Pages[0].Width.Point, pdfDocument.Pages[0].Height.Point);
DrawRectangle(pageGraphics, XColors.Green, 1,
x * pdfDocument.Pages[0].Width, y * pdfDocument.Pages[0].Height,
w * pdfDocument.Pages[0].Width, h * pdfDocument.Pages[0].Height);
}
pdfDocument.Save(tmpPdf);
Process.Start(tmpPdf);
}
public static void DrawRectangle(XGraphics pageGraphics, XColor color, double penWidth, double x, double y, double width, double height)
{
var pen = new XPen(color, penWidth)
{
LineCap = XLineCap.Round,
LineJoin = XLineJoin.Bevel
};
Console.WriteLine(pageGraphics.PageUnit);
pageGraphics.DrawRectangle(pen, new XRect(x, y, width, height));
}
}
}
When testing the ConcatenateDocuments sample, I get a null reference exception on opening HelloWorld.pdf.
(I switched from .NET 2.0 to .NET 4 before running the sample)
I downloaded the PDFsharp source code from SourceForge. When I open the solution 'BuildAll-PdfSharp.sln' in Visual Studio and try to build it, I get the following build errors:
Error 14 Metadata file 'D:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp-gdi\bin\Debug\PdfSharp-gdi.dll' could not be found D:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp.Charting-gdi\CSC PdfSharp.Charting-gdi
Error 15 Metadata file 'D:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp-wpf\bin\Debug\PdfSharp-wpf.dll' could not be found D:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp.Charting-wpf\CSC PdfSharp.Charting-wpf
Error 13 Metadata file 'D:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp\bin\Debug\PdfSharp.dll' could not be found D:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp.Charting\CSC PdfSharp.Charting
Error 1 The name 'nameof' does not exist in the current context D:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp\Pdf.Advanced\PdfPageInheritableObjects.cs 67 88 PDFsharp
Error 2 The name 'nameof' does not exist in the current context D:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp\Drawing\XUnit.cs 71 78 PDFsharp
Error 3 The name 'nameof' does not exist in the current context D:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp\Pdf.Advanced\PdfContents.cs 131 49 PDFsharp
Error 4 The name 'nameof' does not exist in the current context D:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp\Pdf.Content.Objects\CObjects.cs 775 53 PDFsharp
Error 5 The name 'nameof' does not exist in the current context d:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp\Pdf.Advanced\PdfPageInheritableObjects.cs 67 88 PdfSharp-gdi
Error 6 The name 'nameof' does not exist in the current context d:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp\Drawing\XUnit.cs 71 78 PdfSharp-gdi
Error 7 The name 'nameof' does not exist in the current context d:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp\Pdf.Advanced\PdfContents.cs 131 49 PdfSharp-gdi
Error 8 The name 'nameof' does not exist in the current context d:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp\Pdf.Content.Objects\CObjects.cs 775 53 PdfSharp-gdi
Error 9 The name 'nameof' does not exist in the current context d:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp\Pdf.Advanced\PdfPageInheritableObjects.cs 67 88 PdfSharp-wpf
Error 10 The name 'nameof' does not exist in the current context d:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp\Drawing\XUnit.cs 71 78 PdfSharp-wpf
Error 11 The name 'nameof' does not exist in the current context d:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp\Pdf.Advanced\PdfContents.cs 131 49 PdfSharp-wpf
Error 12 The name 'nameof' does not exist in the current context d:\eProof\testProjects\PDFSharp_Github\PDFsharp\src\PdfSharp\Pdf.Content.Objects\CObjects.cs 775 53 PdfSharp-wpf
Can anyone help me out with building this?
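The nameof operator was introduced in C# 6 (Visual Studio 2015), so these errors are consistent with building the solution with an older compiler; the "Metadata file ... could not be found" errors are most likely follow-up errors, because the Charting projects reference PdfSharp DLLs that failed to build. As a quick check, the hypothetical class below only compiles with a C# 6 or later compiler:

```csharp
public static class NameofCheck
{
    // Both nameof and expression-bodied members require C# 6 or later;
    // an older compiler reports "The name 'nameof' does not exist in the current context".
    public static string ParameterName(int width) => nameof(width);
}
```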
Issue Submission Code
PNM PDFSharp Issue Submission 2018-04-18.zip
We have a PDF with a crop box that doesn't match the media box and we want to create a new PDF without any cropping but whose appearance matches the original when opened in Acrobat or any other reader.
Expected behavior is that XPdfForm.PointWidth and XPdfForm.PointHeight should take the crop box into account, and that XGraphics.DrawImage, when the source image is an XPdfForm, should take the crop box into account … srcRect for XGraphics.DrawImage …
… srcRect for XGraphics.DrawImage, the image is squished. See the issue submission template for code.
… XPdfForm.FromStream to open the source PDF and then set the page index. … XPdfForm object as the source image in a call to XGraphics.DrawImage … srcRect
I tried this with both the current stable version and the new version (PDFSharp-GDI), both from NuGet. In the attached code, the WPF project uses the new PDFSharp-GDI; the other projects use the current stable version. So this is still an issue in the current release candidate.
Thanks!
I've been porting the code to NetStandard.
https://github.com/ststeiger/PdfSharpNetStandard
Could you move the code in PdfSharp.Forms and PdfSharp.Windows into a separate shared project?
https://github.com/ststeiger/PdfSharpNetStandard/tree/master/PdfSharp_Removed
Also, same thing with Rendering.Forms and Rendering.Windows in MigraDoc.Rendering
https://github.com/ststeiger/PdfSharpNetStandard/tree/master/MigraDoc_Rendering_Removed
Then it would be very simple to have a NetStandard-Version.
Also, if you used partial classes in conjunction with a shared project for GDI and WPF, you could get rid of all the #ifs that make the project unreadable, and you also wouldn't need to symlink files.
The issue is pretty simple to reproduce: add an image and enable encryption for the document. The result is a broken document (Adobe Reader shows an alert) and no image is displayed. Other objects seem to render fine, though.
What I found is that stream objects are RC4-encrypted twice (which amounts to no encryption at all). After removing one of the two calls, the resulting document seems okay.
The first encryption is performed here: https://github.com/empira/PDFsharp/blob/b84018e1ef6c646a4062c7bb4f53561c4027d48f/src/PdfSharp/Pdf.Security/PdfStandardSecurityHandler.cs#L170
The second one is here, during document saving (the Save method to a file): https://github.com/empira/PDFsharp/blob/b84018e1ef6c646a4062c7bb4f53561c4027d48f/src/PdfSharp/Pdf.IO/PdfWriter.cs#L428
So the question is: which one is better to remove?
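Why a double pass breaks the output: RC4 encrypts by XOR-ing the data with a keystream that depends only on the key, so running it twice with the same key restores the original bytes. If both calls use the same object key, as the report implies, the stream is written in plain form while the document declares it encrypted, and the reader's decryption then produces garbage. The stand-alone RC4 sketch below (not PDFsharp's implementation) demonstrates that round trip:

```csharp
public static class Rc4
{
    // Plain RC4 (key scheduling + keystream generation). Encryption and decryption
    // are the same operation, because the data is only XOR-ed with the keystream.
    public static byte[] Apply(byte[] key, byte[] data)
    {
        var s = new byte[256];
        for (int i = 0; i < 256; i++)
            s[i] = (byte)i;

        // Key-scheduling algorithm (KSA)
        for (int i = 0, j = 0; i < 256; i++)
        {
            j = (j + s[i] + key[i % key.Length]) & 0xFF;
            (s[i], s[j]) = (s[j], s[i]);
        }

        // Pseudo-random generation (PRGA): XOR each byte with the next keystream byte
        var output = new byte[data.Length];
        for (int n = 0, i = 0, j = 0; n < data.Length; n++)
        {
            i = (i + 1) & 0xFF;
            j = (j + s[i]) & 0xFF;
            (s[i], s[j]) = (s[j], s[i]);
            output[n] = (byte)(data[n] ^ s[(s[i] + s[j]) & 0xFF]);
        }
        return output;
    }
}
```

Applying Rc4.Apply twice with the same key returns the original bytes, which matches the "double RC4, thus no encryption" observation; removing either call restores a single, effective encryption pass.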
Not all fonts have the same glyphs. It would be nice to be able to check, via a method on an XFont object, whether a specific character has a glyph in the font, so that calling libraries can switch to a fallback font.
The best method I've found was CharCodeToGlyphIndex. If it returns 0, no glyph was found. But this method is only accessible internally, so calling libraries can't use it.
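Assuming such a public check existed, fallback selection could be layered on top of it roughly as follows. The hasGlyph delegate is a hypothetical placeholder for the requested API (CharCodeToGlyphIndex itself is internal), and FontFallback is likewise an illustrative name, not part of PDFsharp:

```csharp
using System;
using System.Collections.Generic;

public static class FontFallback
{
    // Assigns each code point of 'text' to the first font (in priority order) that has a glyph for it.
    // 'hasGlyph' is a hypothetical placeholder for the requested public check, e.g. a wrapper
    // that returns true when the glyph index for the code point is non-zero.
    public static List<(int CodePoint, string Font)> Assign(
        string text, IReadOnlyList<string> fonts, Func<string, int, bool> hasGlyph)
    {
        var result = new List<(int CodePoint, string Font)>();
        for (int i = 0; i < text.Length; i++)
        {
            // Reads a full code point; throws on an unpaired surrogate.
            int codePoint = char.ConvertToUtf32(text, i);
            if (char.IsHighSurrogate(text[i]))
                i++; // a surrogate pair occupies two chars

            string chosen = fonts[0]; // keep the primary font if no font has the glyph
            foreach (string font in fonts)
            {
                if (hasGlyph(font, codePoint))
                {
                    chosen = font;
                    break;
                }
            }
            result.Add((codePoint, chosen));
        }
        return result;
    }
}
```

The caller would then draw consecutive runs that share the same font with one DrawString call each.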
I'm trying to put a QR code on a PDF file. Here's my code:
PdfDocument document = new PdfDocument();
PdfPage page = document.AddPage();
page.Orientation = PdfSharp.PageOrientation.Portrait;
page.Width = XUnit.FromInch(8.5);
page.Height = XUnit.FromInch(11);
XGraphics gfx = XGraphics.FromPdfPage(page);
XImage xImage = XImage.FromGdiPlusImage(image);
gfx.DrawImage(xImage, 10, 10, 290, 290);
document.Save("test.pdf");
I'm getting a null reference exception, here's the stacktrace:
at PdfSharp.Pdf.Advanced.PdfImage.ReadIndexedMemoryBitmap(Int32 bits) at PdfSharp.Pdf.Advanced.PdfImage..ctor(PdfDocument document, XImage image) at PdfSharp.Pdf.Advanced.PdfImageTable.GetImage(XImage image) at PdfSharp.Pdf.PdfPage.GetImageName(XImage image) at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.Realize(XImage image) at PdfSharp.Drawing.Pdf.XGraphicsPdfRenderer.DrawImage(XImage image, Double x, Double y, Double width, Double height) at PdfSharp.Drawing.XGraphics.DrawImage(XImage image, Double x, Double y, Double width, Double height) at UniWallet.Services.ApiApplication.Extensions.ImageExtensions.ToPDFFileByteArray(Image image) in C:\CODES\UniWallet\UniWallet_Dev\UniWallet.Services\Api\UniWallet.Services.ApiApplication\Extensions\ImageExtensions.cs:line 48 at UniWallet.Services.ApiApplication.Test.Extensions.ImageExtensionsTest.Create() in C:\CODES\UniWallet\UniWallet_Dev\UniWallet.Services\Api\UniWallet.Services.ApiApplication.Test\Extensions\ImageExtensionsTest.cs:line 36
I tried to follow the stack trace, and here's where I ended up:
PDFImage.cs
case PixelFormat.Format1bppIndexed: ReadIndexedMemoryBitmap(1/*, ref hasMask*/); break;
I tried images with other pixel formats, i.e. pictures with some color, and the code worked. It looks like something goes wrong when an image with only two colors (black and white, Format1bppIndexed) is used.
Thanks.
Document - pdf.pdf
PdfSharp.Pdf.IO.PdfReaderException was unhandled
HResult=-2146233088
Message=Unexpected character '0x0017' in PDF stream. The file may be corrupted. If you think this is a bug in PDFsharp, please send us your PDF file.
Source=PdfSharp-gdi
StackTrace:
at PdfSharp.Internal.ParserDiagnostics.HandleUnexpectedCharacter(Char ch)
at PdfSharp.Pdf.IO.Lexer.ScanNextToken()
at PdfSharp.Pdf.IO.Parser.ReadInteger(Boolean canBeIndirect)
at PdfSharp.Pdf.IO.Parser.ReadObjectNumber(Int32 position)
at PdfSharp.Pdf.IO.Parser.ReadXRefStream(PdfCrossReferenceTable xrefTable)
at PdfSharp.Pdf.IO.Parser.ReadXRefTableAndTrailer(PdfCrossReferenceTable xrefTable)
at PdfSharp.Pdf.IO.Parser.ReadTrailer()
at PdfSharp.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider passwordProvider)
at PdfSharp.Pdf.IO.PdfReader.Open(Stream stream, PdfDocumentOpenMode openmode)
Install-Package PDFsharp-gdi -Version 1.50.4619-beta4c
Operation: Save
Expected behavior: The document should be saved and protected with a password.
System.NullReferenceException: Object reference not set to an instance of an object.
at PdfSharp.Pdf.Security.PdfStandardSecurityHandler.PrepareRC4Key(Byte[] key, Int32 offset, Int32 length)
at PdfSharp.Pdf.Internal.PdfEncoders.FormatStringLiteral(Byte[] bytes, Boolean unicode, Boolean prefix, Boolean hex, PdfStandardSecurityHandler securityHandler)
at PdfSharp.Pdf.IO.PdfWriter.WriteDocString(String text)
at PdfSharp.Pdf.PdfDate.WriteObject(PdfWriter writer)
at PdfSharp.Pdf.PdfDictionary.WriteDictionaryElement(PdfWriter writer, PdfName key)
at PdfSharp.Pdf.PdfDictionary.WriteObject(PdfWriter writer)
at PdfSharp.Pdf.PdfDocument.DoSave(PdfWriter writer)
at PdfSharp.Pdf.PdfDocument.Save(Stream stream, Boolean closeStream)
at PdfSharp.Pdf.PdfDocument.Save(String path)
static void Main()
{
// Create a new PDF document
PdfDocument document = new PdfDocument();
document.Info.Title = "Created with PDFsharp";
// Create an empty page
PdfPage page = document.AddPage();
// Get an XGraphics object for drawing
XGraphics gfx = XGraphics.FromPdfPage(page);
// Create a font
XFont font = new XFont("Times New Roman", 20, XFontStyle.BoldItalic);
// Draw the text
gfx.DrawString("Hello, World!", font, XBrushes.Black,
new XRect(0, 0, page.Width, page.Height),
XStringFormats.Center);
// Set the password(s):
document.SecuritySettings.OwnerPassword = "password";
document.SecuritySettings.UserPassword = "password";
// Save the document...
const string filename = "HelloWorld_tempfile.pdf";
document.Save(filename);
// ...and start a viewer.
Process.Start(filename);
}
Hello,
Can i use this library to sign an existing pdf file with a certificate?
Thanks,
Frederico
I have an issue with PDF documents that have an owner password and/or some odd content compression, like this one:
https://www.swissfunddata.ch/sfdpub/docs/kid-8059_08_05-20180208-en.pdf
PDFsharp fails to open them (and also crashes when looking up the exception message). I'm not entirely sure what goes wrong and whether it would all be fine if the owner password was available.
Btw, my use case is that I need to combine multiple documents like this one into one big document which can be printed more easily (i.e. the user doesn't have to open and print them one by one).
(MigraDoc) Would it be possible to add TextFormat.StrikeThrough? Thanks!
Please have a look at this commit, which fixes places where the order of the arguments for ArgumentException is switched, places where a variable is used before a null check, and the use of the wrong variable in an Equals method.
jnyrup@7d219d2
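The commit itself is not reproduced here. As a reminder of why the first kind of mistake is easy to make, the standard exception constructors order their arguments differently (ArgumentChecks is an illustrative class):

```csharp
using System;

public static class ArgumentChecks
{
    // ArgumentException takes (message, paramName) ...
    public static void RequirePositive(int count)
    {
        if (count <= 0)
            throw new ArgumentException("Count must be positive.", nameof(count));
    }

    // ... while ArgumentNullException takes (paramName, message), so the two are easy to mix up.
    public static void RequireNotNull(object source)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source), "Source must not be null.");
    }
}
```

Swapping the arguments still compiles, but ParamName then contains the message text, which is why such mistakes tend to go unnoticed.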
Hi
First, I would like to thank you for the 2 great libraries PDFSharp and MigraDoc.
I have a suggestion.
I would like to use these libraries in ASP.NET Core, cross-platform, so that any ASP.NET Core application deployed anywhere can create PDF documents on the fly with good performance.
I suggest creating a new repository which will implement this and port the following:
@ststeiger @YetaWF have also created .NET Standard 2.0 partial ports, maybe they could help.
What do you think? Would you be interested; do you see this as a good thing? Maybe we could all do it together.
@ststeiger @YetaWF @JimBobSquarePants
Greetings Damien
Hi,
I would like to see a feature that extracts all text into a string, like PDFBox:
PDFTextStripper pdfTextStripper = new PDFTextStripper();
String contentOfAllPages = pdfTextStripper.getText(PDDocument.load(pdfFileName));
I have already found some code to generate content from pages, but the result has too many line breaks. In particular, words are sometimes split across multiple lines.
I was taking a look at your software and I must say it's very good.
So I decided to give it a try...
I have one of these PDFs that contains a vector image inside.
I managed to extract the specific stream for the vector content:
This is what I got:
q
1 0 0 1 340.9799957 298.8000031 cm
1 g
0 0 m
20.04 0 l
20.04 -11.46 l
0 -11.46 l
0 0 l
h
f*
Q
BT
/C2_0 10.121 Tf
-0.175 Tc 342.06 289.56 Td
<0004000500060004>Tj
ET
(...)
I'm able to render it.
But I'm having some difficulties rendering the text.
For example:
BT
/C2_0 10.121 Tf
-0.175 Tc 342.06 289.56 Td
<0004000500060004>Tj
ET
The hex string does not seem to decode to a valid text string.
My guess is that it's an index into the font's code page, in this case the font referred to by /C2_0:
0004000500060004
=> 0004 0005 0006 0004
Depending on the representation I'm assuming 2 bytes per code.
I don't know where to check that information (I know simple fonts only use one byte per code).
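The splitting step itself is straightforward once the code width is known. In PDF, that width comes from the font: simple fonts use one-byte codes, while for a Type0 (composite) font it is defined by the font's CMap; the common Identity-H CMap uses two-byte codes, which would match the guess above. Turning the codes into text then requires the font's /ToUnicode CMap, if the font has one. A small sketch of the splitting step, using a hypothetical HexStringCodes helper:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class HexStringCodes
{
    // Splits the body of a PDF hex string (e.g. "0004000500060004") into character codes
    // of 'bytesPerCode' bytes each. The code width is defined by the font, not by the string.
    public static List<int> Split(string hex, int bytesPerCode)
    {
        // White space inside a hex string is ignored.
        hex = string.Concat(hex.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
        // An odd number of digits means the final digit is assumed to be 0.
        if (hex.Length % 2 == 1)
            hex += "0";

        int digitsPerCode = bytesPerCode * 2;
        var codes = new List<int>();
        for (int i = 0; i + digitsPerCode <= hex.Length; i += digitsPerCode)
            codes.Add(int.Parse(hex.Substring(i, digitsPerCode), NumberStyles.HexNumber));
        return codes; // leftover digits that do not form a full code are ignored
    }
}
```

With two bytes per code, "0004000500060004" yields the codes 4, 5, 6, 4; looking those up in the /ToUnicode CMap would give the actual characters.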
The question is how can I have access to the font and respective code page information to extract the text.
Or better yet if there's a simpler way to get all of this without me having to parse the vector data myself.
Getting the objects directly, for example PdfLine, PdfText, PdfCircle, etc.
Thanks.
I have a replacement for GetFontData, which works on all platforms.
https://gist.github.com/ststeiger/273341aebd29009f2b272b822b69563f
This uses the C# FreeType wrapper from
https://github.com/Robmaister/SharpFont
SharpFont doesn't work on Windows x64 out of the box; this correction is required:
https://gist.github.com/ststeiger/9e2eb98e29a3c987aca739045af1d2ce
I think you have been working with OpenType before.
I'm pretty sure it would be possible to remove the dependency on FreeType.
When iterating through a PDF file, a common approach is to write some kind of recursive method.
A visitor class would make this simpler, something like this:
public class PdfCObjectVisitor
{
// the CObject class should contain a virtual Accept method
public void Accept(CObject @object) => VisitObject(@object);
public virtual void VisitName(CName name)
{
}
public virtual void VisitString(CString @string)
{
}
public virtual void VisitOperator(COperator @operator)
{
VisitSequence(@operator.Operands);
}
public virtual void VisitComment(CComment comment)
{
}
public virtual void VisitArray(CArray array)
{
foreach (var @object in array)
{
VisitObject(@object);
}
}
public virtual void VisitInteger(CInteger integer)
{
}
public virtual void VisitReal(CReal real)
{
}
public virtual void VisitNumber(CNumber number)
{
}
public virtual void VisitSequence(CSequence sequence)
{
foreach (var @object in sequence)
{
VisitObject(@object);
}
}
public virtual void VisitObject(CObject @object)
{
switch (@object)
{
case CName name:
VisitName(name);
break;
case CString @string:
VisitString(@string);
break;
case COperator @operator:
VisitOperator(@operator);
break;
case CComment comment:
VisitComment(comment);
break;
case CArray array:
VisitArray(array);
break;
case CInteger integer:
VisitInteger(integer);
break;
case CReal real:
VisitReal(real);
break;
case CNumber number:
VisitNumber(number);
break;
case CSequence sequence:
VisitSequence(sequence);
break;
}
}
}
Then writing a class that extracts all of the text from a PDF is really simple:
public class TextExtractorPdfVisitor : PdfCObjectVisitor
{
public StringBuilder Builder { get; } = new StringBuilder();
public override void VisitOperator(COperator @operator)
{
if (@operator.OpCode.OpCodeName != OpCodeName.TJ
&& @operator.OpCode.OpCodeName != OpCodeName.Tj)
{
return;
}
base.VisitOperator(@operator);
}
public override void VisitString(CString @string)
{
Builder.Append(@string.Value);
}
}
Some characters are not extracted correctly (e.g. the Unicode dash variant U+2013 EN DASH, UTF-8 bytes E2 80 93).
The CString seems to be empty.
For the summary on XGraphicsPdfPageOptions, shouldn't Append's summary tag read "The new content is appended in front of the old content and any subsequent drawing is done above the existing graphic."?
It threw me for a loop when I read The new content is inserted behind the old content and any subsequent drawing in done above the existing graphic. because that would be the exact opposite of an append.
For cross-platform applications (e.g. .NET Core 2.0), I'd suggest adding support for ImageSharp as a graphics interface.
I've been able to make it work, but only because someone else made it work before me!
My current implementation is meant as a temporary solution until a better one is available. Let me know if you're interested in my attempt.
When opening a PDF/A document with the following code, I get a NullReferenceException:
PdfReader.Open(pdfFileName, PdfDocumentOpenMode.ReadOnly);
The exception is:
System.NullReferenceException : Object reference not set to an instance of an object.
at PdfSharp.Pdf.Security.PdfStandardSecurityHandler.PrepareRC4Key(Byte[] key, Int32 offset, Int32 length)
at PdfSharp.Pdf.Security.PdfStandardSecurityHandler.PrepareKey()
at PdfSharp.Pdf.Security.PdfStandardSecurityHandler.EncryptString(PdfString value)
at PdfSharp.Pdf.Security.PdfStandardSecurityHandler.EncryptDictionary(PdfDictionary dict)
at PdfSharp.Pdf.Security.PdfStandardSecurityHandler.EncryptArray(PdfArray array)
at PdfSharp.Pdf.Security.PdfStandardSecurityHandler.EncryptDictionary(PdfDictionary dict)
at PdfSharp.Pdf.Security.PdfStandardSecurityHandler.EncryptObject(PdfObject value)
at PdfSharp.Pdf.Security.PdfStandardSecurityHandler.EncryptDocument()
at PdfSharp.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider passwordProvider)
at PdfSharp.Pdf.IO.PdfReader.Open(String path, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider provider)
at PdfSharp.Pdf.IO.PdfReader.Open(String path, PdfDocumentOpenMode openmode)
at Garaio.REM.Business.Printing.PdfAppenderFixture.PageCountOf(String pdfFileName) in C:\projects\garaio\REM\04_Development\Garaio.REM.REWE.Tester\Garaio.REM.Business\Printing\PdfAppenderFixture.cs:line 29
I am writing a PDF file using PDFsharp. For some reason, the value of a boolean object is written as 'False' instead of 'false' (note the uppercase 'F').
As a result, when I read the file again, I get the following error in PDFsharp:
"Unexpected token 'False' in PDF stream. The file may be corrupted. If you think this is a bug in PDFsharp, please send us your PDF file."
PDFSharp version: Assembly PdfSharp.dll, v1.50.4740.0
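PDF's boolean keywords are case-sensitive and lowercase, so 'False' is an unknown token to a conforming parser. One plausible way such output arises (an assumption, not verified against the PDFsharp source) is formatting a .NET bool with ToString(), which yields "True"/"False". A minimal sketch of explicit serialization, using a hypothetical PdfBoolean helper:

```csharp
public static class PdfBoolean
{
    // PDF boolean objects are the lowercase keywords true and false.
    // Note that bool.ToString() returns "True"/"False", which is not valid PDF syntax.
    public static string ToPdfToken(bool value) => value ? "true" : "false";

    // Strict, case-sensitive parse, as a PDF lexer would do it.
    public static bool TryParse(string token, out bool value)
    {
        value = token == "true";
        return token == "true" || token == "false";
    }
}
```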
I love creating diagrams with PSTricks (a library of PostScript instructions wrapped for TeX users).
I also love C#.
I am a newbie in PDFSharp. My question is
Is it possible to use PDFSharp for drawing diagrams that PSTricks can produce?
Please navigate to this link to know what PSTricks is.
When I try to add a custom font using PDFSharp.Drawing.XPrivateFontCollection.AddFont(), I get a warning that says the method is deprecated, and I should use Add(). There is no Add method. When I run it I get an exception.
I downloaded the code, and I can see in the XPrivateFontCollection.AddFont() method where the first line throws an exception. I also see the Add() method, but it is commented out.
I see you have just put RC1 out. It would be great if custom font support would work in the next release.
For a lot of PDF pages I get a SharpZipBaseException when calling XGraphics.FromPdfPage.
The exception is thrown in InflaterInputStream.Fill() with message "Unexpected EOF".
I can 'fix' this problem using a hack in InflaterInputStream.Read:
public override int Read(byte[] buffer, int offset, int count)
....
if (inf.IsNeedingInput)
{
try
{
Fill();
}
catch(SharpZipBaseException)
{ // WB! early EOF: apparently not a big deal for some PDF pages: break out of the loop.
break;
}
}
...
If a PDF has a custom property whose name contains square brackets the file is corrupted when it is saved by PDFsharp. On the next open, PDFsharp gives an UnexpectedToken error and is unable to open the file. One is able to open the file in Acrobat Reader, but not able to view the file's properties.
I have included an example solution.
CustomPropertyIssue.zip
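For context: in PDF syntax, '[' and ']' are delimiters that start and end arrays, so inside a name object they have to be written as #5B and #5D. An unescaped '[' would end the name and be read as the start of an array, which would fit the UnexpectedToken error on reopening. The sketch below shows that escaping with a hypothetical PdfNameWriter helper; it is not taken from PDFsharp:

```csharp
using System.Text;

public static class PdfNameWriter
{
    // Serialises a PDF name object. Delimiters ( ) < > [ ] { } / %, the '#' escape character,
    // white space and any byte outside 0x21..0x7E are written as #xx (two hex digits).
    public static string Escape(string name)
    {
        var sb = new StringBuilder("/");
        foreach (byte b in Encoding.UTF8.GetBytes(name))
        {
            bool regular = b >= 0x21 && b <= 0x7E && "()<>[]{}/%#".IndexOf((char)b) < 0;
            if (regular)
                sb.Append((char)b);
            else
                sb.Append('#').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
}
```

For example, a custom property named My[Prop] would be written as /My#5BProp#5D.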
I am creating a new class, PdfHighlightAnnotation, for rendering a transparent LightYellow rectangle in order to highlight text in an existing PDF.
I have no idea what values to set in the following methods. For example, what should the value of Keys.Subtype be for highlighting text? And what values can I set for the Open and Name string constants of the Keys class?
This code is from PdfTextAnnotation.cs
void Initialize()
{
Elements.SetName(Keys.Subtype, "/Text");
// By default make a yellow comment.
Icon = PdfTextAnnotationIcon.Comment;
//Color = XColors.Yellow;
}
internal new class Keys : PdfAnnotation.Keys
{
[KeyInfo(KeyType.Boolean | KeyType.Optional)]
public const string Open = "/Open";
[KeyInfo(KeyType.Name | KeyType.Optional)]
public const string Name = "/Name";
public static DictionaryMeta Meta
{
get { return _meta ?? (_meta = CreateMeta(typeof(Keys))); }
}
static DictionaryMeta _meta;
}
Can anyone help?
Hi,
Thank you for sharing this wonderful library.
My company has added signing functionality to it and would like to contribute it to your project. Before creating a pull request, I need some information:
Best,
Paul
I'm starting to use your library to parse PDFs and extract data from them. This works perfectly on Windows; however, I can't even compile with .NET Core because of all the GUI library dependencies (such as Silverlight or WinForms) that the base library has.
Do you think it would be possible to have a base PdfSharp.Core library that only contains PDF parsing and creation?
I would be happy to help if you need it.
If you try to port the sources to .NET Core (.NET Standard 2.0), you'll get an exception when an external OpenType font is loaded.
The exception is a NotSupportedException, thrown by the Encoding.GetEncoding(1252) call. The reason is well explained here.
namespace PdfSharp.Pdf.Internal
{
    /// <summary>
    /// Groups a set of static encoding helper functions.
    /// </summary>
    internal static class PdfEncoders
    {
        ...
        /// <summary>
        /// Gets the Windows 1252 (ANSI) encoding.
        /// </summary>
        public static Encoding WinAnsiEncoding
        {
            get
            {
                if (_winAnsiEncoding == null)
                {
#if !SILVERLIGHT && !NETFX_CORE && !UWP
                    // Use .net encoder if available.
                    _winAnsiEncoding = CodePagesEncodingProvider.Instance.GetEncoding(1252);
                    //_winAnsiEncoding = Encoding.GetEncoding(1252);
#else
                    // Use own implementation in Silverlight and WinRT
                    _winAnsiEncoding = new AnsiEncoding();
#endif
                }
                return _winAnsiEncoding;
            }
        }
        ...
    }
}
By following the suggestion here, the problem seems solved.
However, I believe it should be made active via a proper conditional switch.
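One way to put this behind a conditional switch is to register the code-pages provider once, only on the target frameworks that need it, and keep calling Encoding.GetEncoding(1252) everywhere else. The following is a minimal sketch, assuming an SDK-style project (which defines NETSTANDARD2_0 and NETCOREAPP automatically) and, where the target framework does not already include it, a reference to the System.Text.Encoding.CodePages package. `WinAnsi` is a hypothetical class, not PDFsharp's PdfEncoders:

```csharp
using System.Text;

static class WinAnsi
{
    // Created once by the type initializer, which is thread-safe.
    public static readonly Encoding WinAnsiEncoding = Create();

    static Encoding Create()
    {
#if NETSTANDARD2_0 || NETCOREAPP
        // .NET Core / .NET Standard only provide ASCII, Latin-1 and the UTF
        // encodings by default; registering the provider adds code page 1252.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
#endif
        // On .NET Framework, code page 1252 is available without registration.
        return Encoding.GetEncoding(1252);
    }
}
```

With this in place, `WinAnsi.WinAnsiEncoding.GetBytes("€")` returns the single byte 0x80, as Windows-1252 specifies.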
I didn't find any documentation on how to implement page numbering in the footer.
I get an "Object already in table" error. I've been able to manipulate a lot of PDFs, but all of a sudden this issue arose for one specific PDF.
It looks like the parser thinks it's not done yet?
I'm assuming that after the first loop the variable "prev" is supposed to be zero, so that it breaks out of the loop?
Is there something specific I should look for in the PDF itself?
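If "prev" here is the /Prev entry of a trailer, one possible (unconfirmed) cause is a damaged file whose /Prev chain points back to a cross-reference section that was already read, so the same objects would be added twice. A quick way to check a file for that is to walk the chain and stop when an offset repeats. The following is a minimal sketch under that assumption. `XRefChain.Walk` and the `readPrev` callback are hypothetical; a real check would have to parse the trailer at each offset to obtain its /Prev value:

```csharp
using System;
using System.Collections.Generic;

static class XRefChain
{
    // Follows /Prev links starting at the startxref offset. 'readPrev' returns
    // the /Prev value of the trailer at an offset, or 0 when there is none.
    // Returns the offsets in visiting order and reports whether the chain
    // loops back to an offset that was already visited.
    public static List<long> Walk(long startxref, Func<long, long> readPrev, out bool hasCycle)
    {
        var visited = new HashSet<long>();
        var order = new List<long>();
        hasCycle = false;
        long offset = startxref;
        while (offset != 0)
        {
            if (!visited.Add(offset))
            {
                hasCycle = true; // same section again: stop instead of re-reading it
                break;
            }
            order.Add(offset);
            offset = readPrev(offset);
        }
        return order;
    }
}
```

A chain such as 900 → 500 → 100 → 0 is walked normally, while 900 → 500 → 900 is reported as a cycle after two sections.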
After some research and several clicks, I found that:
On the site https://archive.codeplex.com/?p=pdfsharp there is a note that the project was migrated to GitHub.
That page also links to the project homepage http://www.pdfsharp.net/, where I found that the project is licensed under MIT.
Could the license information and the project homepage be added to the GitHub README.md?
It would be very helpful to have that information close to the code.
Benefits:
GitHub search filters will work better when searching by license.
Fewer clicks are needed to find the license information.
It will be much clearer to GitHub users which license this project uses.
Parser.cs/ReadXRefStream(): Is the Debug.Assert(generation == 0) at line 1189 really necessary?
In my PDF file there is an XRef stream object with generation 1.
168 1 obj
<</DecodeParms <</Columns 5/Predictor 12>>/Filter /FlateDecode/ID [(K\267M\221U\321\254>\315\377,Q\372\207KL) (K\267M\221U\321\254>\315\377,Q\372\207KL)]/Info 1 0 R/Length 329/Root 2 0 R/Size 176/Type /XRef/W [1 3 1]>>
stream
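The dictionary above describes how the stream data is laid out. After FlateDecode, the PNG predictor (/Predictor 12, /Columns 5) has to be undone, and each 5-byte row is then split according to /W [1 3 1] into a type field, a 3-byte field, and a 1-byte field. The following is a minimal sketch of those two steps, starting from already-inflated bytes. `XRefEntry` and `XRefStreamDecoder` are hypothetical names, not PDFsharp's parser. It also assumes only the PNG row filters None and Up occur, which is what producers writing /Predictor 12 typically emit:

```csharp
using System;
using System.Collections.Generic;

// One cross-reference stream entry: Type 0 = free object, 1 = in use at byte
// offset Field2 with generation Field3, 2 = stored in object stream Field2 at
// index Field3.
readonly struct XRefEntry
{
    public readonly int Type;
    public readonly long Field2;
    public readonly long Field3;
    public XRefEntry(int type, long field2, long field3) { Type = type; Field2 = field2; Field3 = field3; }
}

static class XRefStreamDecoder
{
    // Undoes a PNG predictor for rows of 'columns' bytes. Each encoded row
    // starts with a filter-type byte; only None (0) and Up (2) are handled.
    public static byte[] UndoPngPredictor(byte[] data, int columns)
    {
        int rowLen = columns + 1;
        if (data.Length % rowLen != 0) throw new ArgumentException("incomplete row");
        int rows = data.Length / rowLen;
        var result = new byte[rows * columns];
        for (int r = 0; r < rows; r++)
        {
            byte filter = data[r * rowLen];
            for (int c = 0; c < columns; c++)
            {
                byte raw = data[r * rowLen + 1 + c];
                byte up = r > 0 ? result[(r - 1) * columns + c] : (byte)0;
                result[r * columns + c] = filter switch
                {
                    0 => raw,
                    2 => (byte)(raw + up), // Up: add the byte above, modulo 256
                    _ => throw new NotSupportedException($"PNG filter {filter}")
                };
            }
        }
        return result;
    }

    // Splits decoded bytes into entries using the /W field widths (big-endian).
    public static List<XRefEntry> ParseEntries(byte[] decoded, int[] w)
    {
        int entryLen = w[0] + w[1] + w[2];
        var entries = new List<XRefEntry>();
        for (int pos = 0; pos + entryLen <= decoded.Length; pos += entryLen)
        {
            var f = new long[3];
            int p = pos;
            for (int i = 0; i < 3; i++)
                for (int k = 0; k < w[i]; k++)
                    f[i] = (f[i] << 8) | decoded[p++];
            // A zero-width type field defaults to type 1.
            int type = w[0] == 0 ? 1 : (int)f[0];
            entries.Add(new XRefEntry(type, f[1], f[2]));
        }
        return entries;
    }
}
```

With /W [1 3 1], an entry of type 1 carries a byte offset in its 3-byte field, so offsets up to 16 MB can be addressed in this particular file.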