
Combined pdf? (remarks issue, closed)

lucasrla commented on May 27, 2024
Combined pdf?

from remarks.

Comments (9)

folofjc commented on May 27, 2024

Hi @lucasrla,

Thanks for the info. I have been using reMarkable's app, but my issue is that it "flattens" the PDF, so the annotations are no longer stored as annotations. When I open the file in Adobe Acrobat and similar readers, they do not appear as annotations. With remarks, they do show up as annotations (I tested on the individual page PDFs).

I read a lot of academic journal papers and mark them up. Zotero has the ability to parse an annotated pdf and pull out all your annotations so that you can quickly look at them. But I cannot do this with remarkable exported pdfs. Which is why I am still using my android tablet to read these papers, since I can make annotations as true annotations in a pdf reader.

I agree that I would like the ToC to still work (but that is personally less of a priority for me since I still keep the original pdf).

I do not know PyMuPDF at all so I am not sure how much help I could be, but perhaps I will look into it since it is the only thing lacking!


lucasrla commented on May 27, 2024

Nice. Welcome aboard!

I see your point about doing the resize only if necessary. If a PDF has only "well-behaved" highlights on it, then that seems like a viable path for using Document.insertPDF() and keeping the links. If there are scribbles on the margins, unfortunately resizing is almost surely necessary.

The need for resizing when there are annotations on the margins is due to differences between the aspect ratio of the reMarkable (0.75) and the ones of common paper sizes (e.g. ~0.70 for A4). That is, the device itself already resizes PDFs while displaying most documents. If we then annotate on the margins of a page, we make the resizing "definitive" for that page.
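To make the aspect ratio argument concrete, here is a minimal sketch of the arithmetic, not code from remarks itself. The 1404x1872 pixel canvas and the A4 size in points (595x842) are assumptions used for illustration, and `fit_page_to_device` is a hypothetical helper name.

```python
# Assumed device canvas in pixels; 1404 / 1872 gives the 0.75 ratio mentioned above.
RM_WIDTH, RM_HEIGHT = 1404, 1872


def aspect_ratio(width: float, height: float) -> float:
    """Width divided by height, so portrait pages give values below 1."""
    return width / height


def fit_page_to_device(page_w: float, page_h: float,
                       dev_w: float = RM_WIDTH, dev_h: float = RM_HEIGHT):
    """Scale a page uniformly so it fits entirely on the device canvas.

    Returns (scale, extra_w, extra_h), where extra_w and extra_h are the
    device pixels left uncovered by the scaled page in each direction.
    """
    scale = min(dev_w / page_w, dev_h / page_h)
    extra_w = dev_w - page_w * scale
    extra_h = dev_h - page_h * scale
    return scale, extra_w, extra_h


# A4 in PDF points (assumed 595 x 842): narrower than the device canvas.
a4_ratio = aspect_ratio(595, 842)
scale, extra_w, extra_h = fit_page_to_device(595, 842)
print(f"A4 ratio: {a4_ratio:.3f}, scale: {scale:.3f}, "
      f"spare width: {extra_w:.1f}px, spare height: {extra_h:.1f}px")
```

Under these assumptions an A4 page is limited by the device height, which leaves a strip of spare width. Scribbles made in that strip fall outside the original page box, which is why the page would have to be resized to keep them.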

If/when I find some time in the upcoming days, I might take a shot at implementing this. I will let you know.


folofjc commented on May 27, 2024

Based upon that commit and the discussion thread, I think this issue can be closed. Thanks again!


lucasrla commented on May 27, 2024

Hey @folofjc,

I know there is at least one convenient alternative for exporting entire PDFs with annotations: using reMarkable's official desktop app. It has served me well on the Mac (there is a Windows version as well): https://support.remarkable.com/hc/en-us/articles/360002665378-Desktop-app

As you already noticed (by reading the comments), I ran into issues while trying to implement that feature with PyMuPDF. The deal breaker for me at the time was the difference in page sizes (but ToCs did not work either).

The ToC issue likely requires help from PyMuPDF upstream.

On the other hand, it should be possible to fix the page size within remarks. It is simply a matter of time to investigate the resizing/cropping process more carefully. If you are willing to help, pull requests are very welcome!


lucasrla commented on May 27, 2024

Hey @folofjc,

I have just pushed a commit that adds a "combined_pdf" feature. Could you please pull to origin master/HEAD and test it out?

Also, I am now mentioning your use of remarks together with Zotero in the README file. I hope you don't mind.

Thanks


folofjc commented on May 27, 2024

Hey @lucasrla,

I tried it out on a couple and it looks good, thanks! A few issues:

  • I think that the filename has an error; it puts a space between the original name and the "_remarks". I haven't looked through the code enough to try to find it.
  • The combined pdf is at the top level. Would it make more sense to put it in the original directory structure with the individual pages?
  • I get a lot of errors like "Found highlighted text but couldn't create markdown from page #7" from remarks.py, as well as from mupdf: "mupdf: expected object number" and "mupdf: kid not found in parent's kids array". Sometimes I get just one of them, sometimes the other. Is this a problem with my original pdf? The ones about highlighted text only appear when I want markdown output and go away if I only care about pdf, so it looks like getting the highlights is okay but forming markdown is not. However, even in pdf mode, I still get the mupdf errors.
  • Links are gone on the annotated pdf. I think this is from making the pdf page again from scratch. However, on non-annotated pages the links within the pdf still work. I guess it is not possible to apply the annotations on top of the original pdf because of the page size difference? This isn't a huge problem, since I still have the original pdf. But for workflows where the original pdf is overwritten, this would be problematic.


lucasrla commented on May 27, 2024

Glad to hear that it looks good!

Answering your points:

  1. The space between the original file name and " _remarks" was intentional. You can trim it in your local copy at this line: https://github.com/lucasrla/remarks/blob/master/remarks/remarks.py#L166

  2. The combined pdf at top level was intentional as well. You can tweak that same line (L166) to save the file anywhere else.

  3. I have been using remarks for a few months now and have never experienced any mupdf error. Your issues seem to be due to either a malformed PDF or a bug in PyMuPDF/MuPDF. Try googling the error messages, searching their repo, etc. Regarding the text extraction for Markdown, many things could go wrong there... If you don't have OCRmyPDF yet, I recommend installing it. But please also keep in mind that it does not address all the potential edge cases that can happen with text/OCR in PDFs.

  4. Yes, the links are gone because I am recreating each annotated PDF page with adjusted dimensions via Page.showPDFpage(). There is the following note in its documentation:

"In contrast to method Document.insertPDF(), this method does not copy annotations or links, so they are not shown. But all its other resources (text, images, fonts, etc.) will be imported into the current PDF."

I haven't tested it extensively, but it seems from the documentation that Document.insertPDF() does not resize pages.

If Document.insertPDF() does not do resizing, then an alternative would be to manually recreate the links (similarly to what I am doing with annotations). See, for instance, Notes on Supporting Links. If you have the appetite, contributions are welcome!


Given that the combined PDF issue is now solved, I will go ahead and close this issue for now.


folofjc commented on May 27, 2024

Okay, thanks. I wonder whether, if the page size is the same, we could just use Document.insertPDF() and only recreate the page if the dimensions are different. Like I said, this is not a huge issue for me because Zotero keeps both the original and the annotated one. Thanks again for looking into this, it really makes the rM actually useful for me!
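The "only recreate if the dimensions are different" idea could be sketched roughly as below. This is an illustration under stated assumptions, not how remarks implements it: `needs_recreation` and `plan_pages` are hypothetical names, the 0.75 device ratio is taken from this thread, and the 1% tolerance is an arbitrary choice. The sketch only decides which strategy applies per page; the actual copying or recreating with PyMuPDF is left out.

```python
import math

DEVICE_RATIO = 0.75  # reMarkable width/height ratio, as quoted in this thread


def needs_recreation(page_w: float, page_h: float,
                     device_ratio: float = DEVICE_RATIO,
                     rel_tol: float = 0.01) -> bool:
    """True when the page's aspect ratio differs from the device's.

    rel_tol is an assumed tolerance; pages within it are treated as
    matching the device and can be copied without resizing.
    """
    return not math.isclose(page_w / page_h, device_ratio, rel_tol=rel_tol)


def plan_pages(page_sizes):
    """Map each (width, height) to 'copy' (keep links) or 'recreate'."""
    return ["recreate" if needs_recreation(w, h) else "copy"
            for w, h in page_sizes]


# First page is A4 in points (assumed 595 x 842), second already matches 0.75.
plan = plan_pages([(595, 842), (600, 800)])
print(plan)
```

A fuller version would presumably also check whether any annotation on the page falls outside the original page box, since a mismatched ratio only matters when something was drawn in the extra space.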


lucasrla commented on May 27, 2024

Hey @folofjc, I have just pushed through a new commit that should preserve the links in the original PDF.

Can you please pull to the most recent commit (99ec38) and report whether it is working well for you in the new discussion thread that I have started just for that?

Thanks

