GithubHelp home page GithubHelp logo

Comments (14)

macdeport avatar macdeport commented on July 17, 2024 1

Do all the test files in the pypdf project contain signed files?

Currently we have not signed document

The lack of test files (valid/invalid digital signature) will not help testing or short-term development.

So below 4 different signed documents (3 only with a valid signature) as test files:
la-grenouille-et-le-boeuf-vivant-signed-1.pdf
la-grenouille-et-le-boeuf-vivant-signed-2.pdf
la-grenouille-et-le-boeuf-vivant-signed-2-comp.pdf
la-grenouille-et-le-boeuf-vivant-signed_.pdf <= non valid

from pypdf.

stefan6419846 avatar stefan6419846 commented on July 17, 2024 1

According to MatthiasValvekens/pyHanko#127 and MatthiasValvekens/pyHanko#335 (comment), there are quite some changes which make it hard to update pypdf to the latest version on pyHanko. pyHanko itself embeds an own, heavily modified copy of some older PyPDF2 version. Upgrading will most likely be a large and tedious task, while there are more important aspects to work on in my opinion.

from pypdf.

stefan6419846 avatar stefan6419846 commented on July 17, 2024

You are of course invited to propose a corresponding PR with tests and being compatible with Python 3.7 as well.

from pypdf.

macdeport avatar macdeport commented on July 17, 2024

Thanks for the invitation. But I don't have the necessary know-how for this PR and am not currently available to learn by doing ;-)

For a personal application, which I use several times a day in production, I have on the other hand carried out tests (signed and unsigned files) of very reasonable reliability, but cannot communicate these files as private.

Do all the test files in the pypdf project contain signed files?

from pypdf.

pubpub-zz avatar pubpub-zz commented on July 17, 2024

Encryption is not signauture. Also signature is a complex process for both signature and validation.
Also, the flag you're referecing just indicated that there is a signature field not indicating if signed or not

from pypdf.

pubpub-zz avatar pubpub-zz commented on July 17, 2024

Do all the test files in the pypdf project contain signed files?

Currently we have not signed document

from pypdf.

macdeport avatar macdeport commented on July 17, 2024

Encryption is not signature. Also signature is a complex process for both signature and validation. Also, the flag you're referencing just indicated that there is a signature field not indicating if signed or not

@pubpub-zz Your are right.

Preliminary statement: in PDF format, I'm just an unfortunate self-taught man with his limits... Sharing these limitations often allows us to go further together ;-)

So can you suggest a realistic scenario (or better still show a file) where such a field would be present without the PDF containing a valid signature, or one that was or will be valid?

This proposal simply concerns the detection of digital signatures and not, of course, their validation.

The naming can lead to confusion: is_signed can become has_signature.

Furthermore, the proposed algorithm, here adapted to the pypdf context, has been used by James Barlow for his two pikepdf and OCRmyPDF projects and their many users for years: isn't that a great realistic cohort of testers?

This heuristic is undoubtedly not perfect, but better (at this time) than others on the web.

from pypdf.

MartinThoma avatar MartinThoma commented on July 17, 2024

I vaguely remember that we decided against adding signature support as we wanted to focus on other parts + pyHanko exists: https://pypi.org/project/pyHanko/

from pypdf.

MartinThoma avatar MartinThoma commented on July 17, 2024

from pypdf.

MartinThoma avatar MartinThoma commented on July 17, 2024

Since we closed those issues, quite a lot of improvements were done. I would be open to discuss this topic again. We would need to clarify:

  1. Scope: We need to define which capabilities we want / we feel confident to be able to maintain. What would be the value for the user? Do we have evidence that people look for this (especially when we say that we only support signatures partially this is questionable)
  2. Examples for testing: Having a few example files we can test with - thank you, I'll add those to https://github.com/py-pdf/sample-files/ if that is fine to you @macdeport
  3. PR: We would need to find somebody who is willing to create a PR. @macdeport , it's fine to not be an expert :-) Nobody here does PDF as their main job. And if you need help with finding where something fits in pypdf, I'm confident people will help :-)

from pypdf.

stefan6419846 avatar stefan6419846 commented on July 17, 2024

I personally do not think that we should really start with the signature stuff. We already have the pyHanko project which does this completely fine (although on an older copy of pypdf) - why should we re-implement this here while just increasing the maintenance load?

from pypdf.

pubpub-zz avatar pubpub-zz commented on July 17, 2024

As said earlier, signing is quite tough : In order to be efficient the solution needs to apply work with PKCS / PKI. I have not used yet pyHanko. It looks quite good, although I see 2 points : It stills uses pyPDF2 and uses some old interfaces (e.g. PdfFileReader) which prevents a proper work with some incremental PDFs (I've started to work on the PR which has been proposed although the work is quite tough). Maybe we should start to exchange with pyHanko to make it compatible with the latest pypdf?

from pypdf.

macdeport avatar macdeport commented on July 17, 2024

#2655 (comment)
@MartinThoma Thank you for your kind and careful summary.

These PDFs have been produced especially for the test base of this project, which I appreciate both for its didactic contribution and as a user of the functions developed. Thank you to all those who are helping to bring it to life.

from pypdf.

pubpub-zz avatar pubpub-zz commented on July 17, 2024

based on exchanges I convert this thread into a discussion

from pypdf.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.