Comments (14)
Do all the test files in the pypdf project contain signed files?
Currently we have no signed documents.
The lack of test files (with valid and invalid digital signatures) will not help testing or short-term development.
So here are 4 different signed documents (only 3 of them with a valid signature) as test files:
la-grenouille-et-le-boeuf-vivant-signed-1.pdf
la-grenouille-et-le-boeuf-vivant-signed-2.pdf
la-grenouille-et-le-boeuf-vivant-signed-2-comp.pdf
la-grenouille-et-le-boeuf-vivant-signed_.pdf <= not valid
from pypdf.
According to MatthiasValvekens/pyHanko#127 and MatthiasValvekens/pyHanko#335 (comment), there are quite a few changes that make it hard to update the pypdf copy inside pyHanko to the latest version. pyHanko itself embeds its own, heavily modified copy of an older PyPDF2 version. Upgrading will most likely be a large and tedious task, while there are more important aspects to work on in my opinion.
from pypdf.
You are of course invited to propose a corresponding PR that includes tests and is compatible with Python 3.7 as well.
from pypdf.
Thanks for the invitation. But I don't have the necessary know-how for this PR and am not currently available to learn by doing ;-)
On the other hand, for a personal application that I use several times a day in production, I have carried out tests (on signed and unsigned files) with very reasonable reliability, but I cannot share these files because they are private.
Do all the test files in the pypdf project contain signed files?
from pypdf.
Encryption is not signature. Also, signing is a complex process, for both creating and validating a signature.
Also, the flag you're referencing just indicates that there is a signature field; it does not indicate whether the document is signed or not.
from pypdf.
Do all the test files in the pypdf project contain signed files?
Currently we have no signed documents.
from pypdf.
Encryption is not signature. Also, signing is a complex process, for both creating and validating a signature. Also, the flag you're referencing just indicates that there is a signature field; it does not indicate whether the document is signed or not.
@pubpub-zz You are right.
Preliminary statement: when it comes to the PDF format, I'm just a self-taught man with his limits... Sharing these limitations often allows us to go further together ;-)
So can you suggest a realistic scenario (or better still show a file) where such a field would be present without the PDF containing a valid signature, or one that was or will be valid?
This proposal simply concerns the detection of digital signatures and not, of course, their validation.
The naming can lead to confusion: is_signed can become has_signature.
Furthermore, the proposed algorithm, here adapted to the pypdf context, has been used by James Barlow for his two projects, pikepdf and OCRmyPDF, and by their many users for years: isn't that a great realistic cohort of testers?
This heuristic is undoubtedly not perfect, but better (at this time) than others on the web.
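To make the distinction discussed above concrete, here is a minimal sketch of the two checks, written against plain dictionaries rather than the pypdf API so that it runs on its own. It assumes the flag in question is the /SigFlags entry of the /AcroForm dictionary, whose first bit (SignaturesExist) is defined in the PDF specification; the function names and sample dictionaries are hypothetical and are not taken from pypdf, pikepdf or OCRmyPDF. One scenario it covers is a form that contains an empty signature field still waiting to be signed.

```python
from typing import Any, Mapping

# Bit position 1 of /SigFlags ("SignaturesExist") in the PDF specification.
SIGNATURES_EXIST = 1


def has_signature_flag(acro_form: Mapping[str, Any]) -> bool:
    """Return True if /SigFlags reports that signature fields exist.

    This only says a signature field is present, not that it was signed.
    """
    return bool(int(acro_form.get("/SigFlags", 0)) & SIGNATURES_EXIST)


def has_filled_signature_field(acro_form: Mapping[str, Any]) -> bool:
    """Return True if a top-level signature field carries a value (/V).

    Assumption: only top-level entries of /Fields are inspected; nested
    fields under /Kids are ignored in this sketch. Nothing is validated.
    """
    for field in acro_form.get("/Fields", []):
        if field.get("/FT") == "/Sig" and "/V" in field:
            return True
    return False


# Hypothetical AcroForm dictionaries, for illustration only.
unsigned_form = {"/SigFlags": 3, "/Fields": [{"/FT": "/Sig", "/T": "Signature1"}]}
signed_form = {
    "/SigFlags": 3,
    "/Fields": [{"/FT": "/Sig", "/T": "Signature1", "/V": {"/Type": "/Sig"}}],
}
plain_form = {"/Fields": [{"/FT": "/Tx", "/T": "Name"}]}
```

With these samples, the flag check alone answers True for both unsigned_form and signed_form, while the field check separates them. Neither check says anything about whether a signature is valid.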
from pypdf.
I vaguely remember that we decided against adding signature support as we wanted to focus on other parts + pyHanko exists: https://pypi.org/project/pyHanko/
from pypdf.
Since we closed those issues, quite a lot of improvements have been made. I would be open to discussing this topic again. We would need to clarify:
- Scope: We need to define which capabilities we want / feel confident we can maintain. What would be the value for the user? Do we have evidence that people are looking for this? (This is especially questionable if we say that we only support signatures partially.)
- Examples for testing: Having a few example files we can test with - thank you, I'll add those to https://github.com/py-pdf/sample-files/ if that is fine with you @macdeport
- PR: We would need to find somebody who is willing to create a PR. @macdeport, it's fine to not be an expert :-) Nobody here does PDF as their main job. And if you need help with finding where something fits in pypdf, I'm confident people will help :-)
from pypdf.
I personally do not think that we should really start with the signature stuff. We already have the pyHanko project which does this completely fine (although on an older copy of pypdf) - why should we re-implement this here while just increasing the maintenance load?
from pypdf.
As said earlier, signing is quite tough: to be effective, the solution needs to work with PKCS / PKI. I have not used pyHanko yet. It looks quite good, although I see 2 points: it still uses PyPDF2, and it uses some old interfaces (e.g. PdfFileReader), which prevents it from working properly with some incremental PDFs (I've started to work on the PR which has been proposed, although the work is quite tough). Maybe we should start to exchange with pyHanko to make it compatible with the latest pypdf?
from pypdf.
#2655 (comment)
@MartinThoma Thank you for your kind and careful summary.
These PDFs have been produced especially for the test base of this project, which I appreciate both for its didactic contribution and as a user of the functions developed. Thank you to all those who are helping to bring it to life.
from pypdf.
Based on these exchanges, I am converting this thread into a discussion.
from pypdf.