Explanation IMHO pypdf deserves t

According to <a class="issue-link js-issue-link" data-error-text="Failed to load title

<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-

ENH: Support detection of digital signatures about pypdf HOT 14 CLOSED

macdeport commented on July 17, 2024

ENH: Support detection of digital signatures

from pypdf.

Comments (14)

macdeport commented on July 17, 2024 1

Do all the test files in the pypdf project contain signed files?

Currently we have not signed document

The lack of test files (valid/invalid digital signature) will not help testing or short-term development.

So below 4 different signed documents (3 only with a valid signature) as test files:
la-grenouille-et-le-boeuf-vivant-signed-1.pdf
la-grenouille-et-le-boeuf-vivant-signed-2.pdf
la-grenouille-et-le-boeuf-vivant-signed-2-comp.pdf
la-grenouille-et-le-boeuf-vivant-signed_.pdf <= non valid

from pypdf.

stefan6419846 commented on July 17, 2024 1

According to MatthiasValvekens/pyHanko#127 and MatthiasValvekens/pyHanko#335 (comment), there are quite some changes which make it hard to update pypdf to the latest version on pyHanko. pyHanko itself embeds an own, heavily modified copy of some older PyPDF2 version. Upgrading will most likely be a large and tedious task, while there are more important aspects to work on in my opinion.

from pypdf.

stefan6419846 commented on July 17, 2024

You are of course invited to propose a corresponding PR with tests and being compatible with Python 3.7 as well.

from pypdf.

macdeport commented on July 17, 2024

Thanks for the invitation. But I don't have the necessary know-how for this PR and am not currently available to learn by doing ;-)

For a personal application, which I use several times a day in production, I have on the other hand carried out tests (signed and unsigned files) of very reasonable reliability, but cannot communicate these files as private.

Do all the test files in the pypdf project contain signed files?

from pypdf.

pubpub-zz commented on July 17, 2024

Encryption is not signauture. Also signature is a complex process for both signature and validation.
Also, the flag you're referecing just indicated that there is a signature field not indicating if signed or not

from pypdf.

pubpub-zz commented on July 17, 2024

Do all the test files in the pypdf project contain signed files?

Currently we have not signed document

from pypdf.

macdeport commented on July 17, 2024

Encryption is not signature. Also signature is a complex process for both signature and validation. Also, the flag you're referencing just indicated that there is a signature field not indicating if signed or not

@pubpub-zz Your are right.

Preliminary statement: in PDF format, I'm just an unfortunate self-taught man with his limits... Sharing these limitations often allows us to go further together ;-)

So can you suggest a realistic scenario (or better still show a file) where such a field would be present without the PDF containing a valid signature, or one that was or will be valid?

This proposal simply concerns the detection of digital signatures and not, of course, their validation.

The naming can lead to confusion: is_signed can become has_signature.

Furthermore, the proposed algorithm, here adapted to the pypdf context, has been used by James Barlow for his two pikepdf and OCRmyPDF projects and their many users for years: isn't that a great realistic cohort of testers?

This heuristic is undoubtedly not perfect, but better (at this time) than others on the web.

from pypdf.

MartinThoma commented on July 17, 2024

I vaguely remember that we decided against adding signature support as we wanted to focus on other parts + pyHanko exists: https://pypi.org/project/pyHanko/

from pypdf.

MartinThoma commented on July 17, 2024

#302
#653

from pypdf.

MartinThoma commented on July 17, 2024

Since we closed those issues, quite a lot of improvements were done. I would be open to discuss this topic again. We would need to clarify:

Scope: We need to define which capabilities we want / we feel confident to be able to maintain. What would be the value for the user? Do we have evidence that people look for this (especially when we say that we only support signatures partially this is questionable)
Examples for testing: Having a few example files we can test with - thank you, I'll add those to https://github.com/py-pdf/sample-files/ if that is fine to you @macdeport
PR: We would need to find somebody who is willing to create a PR. @macdeport , it's fine to not be an expert :-) Nobody here does PDF as their main job. And if you need help with finding where something fits in pypdf, I'm confident people will help :-)

from pypdf.

stefan6419846 commented on July 17, 2024

I personally do not think that we should really start with the signature stuff. We already have the pyHanko project which does this completely fine (although on an older copy of pypdf) - why should we re-implement this here while just increasing the maintenance load?

from pypdf.

pubpub-zz commented on July 17, 2024

As said earlier, signing is quite tough : In order to be efficient the solution needs to apply work with PKCS / PKI. I have not used yet pyHanko. It looks quite good, although I see 2 points : It stills uses pyPDF2 and uses some old interfaces (e.g. PdfFileReader) which prevents a proper work with some incremental PDFs (I've started to work on the PR which has been proposed although the work is quite tough). Maybe we should start to exchange with pyHanko to make it compatible with the latest pypdf?

from pypdf.

macdeport commented on July 17, 2024

#2655 (comment)
@MartinThoma Thank you for your kind and careful summary.

These PDFs have been produced especially for the test base of this project, which I appreciate both for its didactic contribution and as a user of the functions developed. Thank you to all those who are helping to bring it to life.

from pypdf.

pubpub-zz commented on July 17, 2024

based on exchanges I convert this thread into a discussion

from pypdf.

ENH: Support detection of digital signatures about pypdf HOT 14 CLOSED

Comments (14)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs