Comments (9)
Okay, I did some digging around in the new highlights JSON file. It looks pretty helpful: it even gives you the highlighted text! It might require some rewriting, but it looks like you could keep the old highlighting part and just add the new one. The new one should mostly be done already; I think all you would have to do is take the rect with its height and width and just apply it.
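A minimal sketch of what "take the rect and apply it" could look like. The JSON layout assumed here (a top-level "highlights" list of per-layer lists, where each entry carries "text" and a "rects" list with x, y, width, and height) and the 1404 x 1872 device canvas are assumptions, not something confirmed in this thread. scale_rect and load_highlight_rects are hypothetical helper names.

```python
import json

# Assumed reMarkable canvas size in device pixels (not confirmed in this thread).
RM_WIDTH, RM_HEIGHT = 1404, 1872


def scale_rect(rect, page_width, page_height):
    """Map a device-space rect dict to (x0, y0, x1, y1) in PDF page units."""
    sx = page_width / RM_WIDTH
    sy = page_height / RM_HEIGHT
    x0 = rect["x"] * sx
    y0 = rect["y"] * sy
    return (x0, y0, x0 + rect["width"] * sx, y0 + rect["height"] * sy)


def load_highlight_rects(json_text, page_width, page_height):
    """Return a list of (text, [scaled rects]) from a highlights JSON string."""
    data = json.loads(json_text)
    out = []
    for layer in data.get("highlights", []):  # assumed: one list per layer
        for h in layer:
            rects = [scale_rect(r, page_width, page_height) for r in h.get("rects", [])]
            out.append((h.get("text", ""), rects))
    return out
```

This scales x and y independently, which is the simplest possible mapping; a real page may need an offset or a uniform scale if its aspect ratio differs from the device canvas.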
from remarks.
I created a pull request with my attempt to fix this. I also tested importing into Zotero and it works like it should! Zotero is actually worse at identifying the annotations than rM, haha. Since rM parses the text under the highlight, I wonder if there is a way to "store" the text with the annotation rectangle?
@folofjc Thanks so much for this fix. It works great for me. I had to change one line in your code in remarks.py:
rm_highlight_file = pathlib.Path(f"{input_dir}/{path.stem}.highlights/{rm_file.stem}.json")
For me, rm_file.stem is a number, while the highlights file is named after the page ID, so I replaced that line with:
h_fname = pages[page_idx]
rm_highlight_file = pathlib.Path(f"{input_dir}/{path.stem}.highlights/{h_fname}.json")
Hi @czarrar. That is interesting. I don't have that issue. Are you on 2.12? I am still on 2.11 and it still works great. I have added a few protections to my code: previously it crashed if you tried to extract highlights from a file with no highlights, and also if a highlight was too small. (I had a highlight that rM made that was one-dimensional; the rect that rM set had the same vertices.) So I will check this out and maybe push another update to my master.
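A rough sketch of the kind of guard described above, not the actual code from the pull request. It assumes each rect is a dict with "width" and "height" keys; is_degenerate and usable_rects are hypothetical names used only for illustration.

```python
def is_degenerate(rect, min_size=1e-6):
    """True if a rect dict has (near-)zero width or height."""
    return rect.get("width", 0) <= min_size or rect.get("height", 0) <= min_size


def usable_rects(rects):
    """Drop rects too small to draw; a missing or empty list yields []."""
    return [r for r in (rects or []) if not is_degenerate(r)]
```

Filtering before drawing means a page with no highlights, or with only collapsed rects, simply produces no annotations instead of raising.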
Hi @czarrar. I just checked, and the original code still works for me. I still have {path.stem}.highlights/{rm_file.stem}.json as the file with all the highlights.
What is the value of your h_fname? Is it different from rm_file.stem? rm_file.stem should be the UUID of the page in the PDF.
@folofjc For the few files I tried, rm_file.stem is a number like 0 or 1, while my actual highlights JSON files are named with UUIDs, like 1d111126-d36a-42b7-b35e-bc4ef80f3711.json.
To work for both our cases, I can make the following change instead:
try:  # line 86
    page_idx = pages.index(rm_file.stem)
    rm_highlight_file = pathlib.Path(f"{input_dir}/{path.stem}.highlights/{rm_file.stem}.json")  # added
except ValueError:  # rm_file.stem is a page number, not a UUID
    page_idx = int(rm_file.stem)
    h_fname = pages[page_idx]  # added
    rm_highlight_file = pathlib.Path(f"{input_dir}/{path.stem}.highlights/{h_fname}.json")  # added
pages seems to be a list with the UUID for each page. In your case, the try block should work. In my case it will run the except block. I'm not sure why the stems can come in these two forms.
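The same idea can be pulled into a standalone helper, sketched here under the assumption that pages is the list of page UUIDs and that a non-UUID stem is always a valid index into it. resolve_highlight_file is a hypothetical name and not part of remarks.

```python
import pathlib


def resolve_highlight_file(input_dir, doc_stem, rm_stem, pages):
    """Return the highlights JSON path for one .rm file.

    pages is the list of page UUIDs for the document.
    rm_stem is either a page UUID or a page index such as "0".
    """
    if rm_stem in pages:
        page_uuid = rm_stem
    else:
        # Fall back to treating the stem as an index into pages.
        page_uuid = pages[int(rm_stem)]
    return pathlib.Path(input_dir) / f"{doc_stem}.highlights" / f"{page_uuid}.json"
```

Testing membership first avoids a bare except, so an unrelated error (for example a stem that is neither a UUID in pages nor a number) still surfaces as a ValueError or IndexError instead of being swallowed.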
@czarrar Huh. I do not get that at all. How many rm_files do you have? It looks like the problem is that the for loop is giving you the index of the loop instead of the value of the item in the list (which Python is not supposed to do). What Python version are you using? Can you give me the value of rm_files as well?
It's confusing to me too @folofjc. Sorry I didn't respond to your earlier message. I have 2.12 and Python 3.8.3, and thanks for those new additions.
My rm_files are 0.rm, 1.rm, 2.rm, 6.rm, and 9.rm (so 5 of them). My highlight files are 2fa55ecf-b917-450c-94bc-5dfc71246750.json, 6da0c15c-e1dc-4978-adc3-6f14da8a0761.json, 9abb565d-1ee0-4668-94e1-1053493feffd.json, 8860e3fc-5dfb-48ce-8b2a-f5d9f3f06c4b.json, f5c597fa-5620-4710-b48c-494335c934b2.json. Here are my files for this one document if you want to take a look: https://www.dropbox.com/s/coskv9skfb4vrti/zarrar_remarks_demo.zip?dl=0.
Oh, wow. So your rM is actually storing the pages with those numbers? So it isn't Python, it is the rM. Did you use rsync to get the files? So the problem is that list_pages_uuid is getting the pages from the 3b289986-47e4-472f-94fc-377350e8d2f6.content file, which has the pages as UUIDs, which is the same as in your .highlights directory. However, the pages in the folder 3b289986-47e4-472f-94fc-377350e8d2f6 are numbered as page numbers like 0, 1, 6, etc. That is really odd.
In my files, the contents of that folder have the same UUIDs for the pages as in the .highlights folder.
So since @lucasrla put that except clause in there originally, it must have been for this reason. My question is why one rM would store the files with the UUIDs and another would store them with page numbers?
Can you ssh into your rM and see if that folder stores them with the page numbers on the rM itself?
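One quick way to tell which of the two layouts a folder uses is to classify the file stems, sketched below. naming_scheme and looks_like_uuid are hypothetical helpers written for this comparison; they are not part of remarks or the rM software.

```python
import uuid


def looks_like_uuid(stem):
    """True if the stem parses as a UUID string."""
    try:
        uuid.UUID(stem)
        return True
    except ValueError:
        return False


def naming_scheme(stems):
    """Return "uuid", "index", "other", "mixed", or "empty" for a list of file stems."""
    if not stems:
        return "empty"
    kinds = {
        "uuid" if looks_like_uuid(s) else "index" if s.isdigit() else "other"
        for s in stems
    }
    return kinds.pop() if len(kinds) == 1 else "mixed"
```

Running it on the stems of the .rm files and on the stems of the .highlights JSON files would show directly whether the two folders disagree, as they do in the files described above.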