In the last version of the rM software, they added a functionality where they will att…


Upgrade to rM 2.8 (remarks, 9 comments, closed)

lucasrla commented on May 23, 2024

Upgrade to rM 2.8

Comments (9)

folofjc commented on May 23, 2024

Okay, I did some digging around in the new highlights JSON file. It looks pretty helpful: it even gives you the highlighted text! So it might require some rewriting, but it looks like you could keep the old highlighting code and just add the new one. The new one should be nearly done already; I think all you would have to do is take each rect, with its height and width, and apply it.
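As a rough illustration of reading the new file, here is a minimal sketch. It assumes the highlights JSON has a top-level `"highlights"` key holding a list of lists of entries, each with a `"text"` field and a `"rects"` list of `{"x", "y", "width", "height"}` objects. That layout is an assumption, not a documented schema, and `parse_highlights` / `load_highlights` are hypothetical helper names.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class Rect:
    x: float
    y: float
    width: float
    height: float


@dataclass
class Highlight:
    text: str
    rects: List[Rect]


def parse_highlights(data: Dict[str, Any]) -> List[Highlight]:
    """Flatten the (assumed) nested highlight lists into Highlight objects."""
    result: List[Highlight] = []
    # Assumption: "highlights" is a list of groups, each a list of entries.
    for group in data.get("highlights", []):
        for entry in group:
            rects = [
                Rect(r["x"], r["y"], r["width"], r["height"])
                for r in entry.get("rects", [])
            ]
            result.append(Highlight(entry.get("text", ""), rects))
    return result


def load_highlights(path: Path) -> List[Highlight]:
    """Read one page's highlights JSON file from disk."""
    return parse_highlights(json.loads(path.read_text(encoding="utf-8")))
```

Keeping the parsing separate from the file I/O makes it easy to check against a sample dict before pointing it at real device files.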

folofjc commented on May 23, 2024

I created a pull request with my attempt to fix this. I also tested importing into Zotero, and it works like it should! Zotero is actually worse than rM at identifying the text under the annotations, haha. Since rM parses the text under the highlight, I wonder if there is a way to "store" the text with the annotation rectangle??

czarrar commented on May 23, 2024

@folofjc Thanks so much for this fix. It works great for me. I had to change one line of your code in remarks.py:

```python
rm_highlight_file = pathlib.Path(f"{input_dir}/{path.stem}.highlights/{rm_file.stem}.json")
```

For me, rm_file.stem is a number, while the highlights file name seems to be the page ID, so I changed it to:

```python
h_fname = pages[page_idx]
rm_highlight_file = pathlib.Path(f"{input_dir}/{path.stem}.highlights/{h_fname}.json")
```

folofjc commented on May 23, 2024

Hi @czarrar. That is interesting; I don't have that issue. Are you on 2.12? I am still on 2.11, and it still works great. I have added a few protections to my code: extracting highlights from a file with no highlights used to crash, and so did highlights that were too small (rM made one highlight that was one-dimensional, because the rect it set had coinciding vertices). I will check this out and maybe push another update to my master.
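The two crashes described above suggest a pair of defensive checks. This is a minimal sketch, assuming each rect is given as `(x, y, width, height)`: skip pages that have no highlights file, and drop rects whose width or height is (near) zero, since a one-dimensional rect has no area to draw. The helper names, the tuple representation, and the `min_size` threshold are all hypothetical.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical plain-tuple rect: (x, y, width, height).
RectTuple = Tuple[float, float, float, float]


def usable_rects(rects: List[RectTuple], min_size: float = 1e-6) -> List[RectTuple]:
    """Drop degenerate rects whose width or height is zero or near zero."""
    return [r for r in rects if abs(r[2]) > min_size and abs(r[3]) > min_size]


def highlights_file_or_none(path: Path) -> Optional[Path]:
    """Return the path only if the page actually has a highlights file."""
    return path if path.is_file() else None
```

A caller would skip the page when `highlights_file_or_none` returns `None` and only draw the rects that survive `usable_rects`.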

folofjc commented on May 23, 2024

Hi @czarrar. I just checked, and the original code still works for me. I still have {path.stem}.highlights/{rm_file.stem}.json as the file with all the highlights.

What is the value of your h_fname? Is it different from rm_file.stem? rm_file.stem should be the UUID of the PDF.

czarrar commented on May 23, 2024

@folofjc For the few files I tried, rm_file.stem is a number like 0 or 1, while my actual highlights JSON files are named by UUID, like 1d111126-d36a-42b7-b35e-bc4ef80f3711.json.

To handle both of our cases, I can make the following change instead:

```python
try:  # line 86
    # Case 1: .rm files are named by page UUID.
    page_idx = pages.index(rm_file.stem)
    rm_highlight_file = pathlib.Path(f"{input_dir}/{path.stem}.highlights/{rm_file.stem}.json")  # added
except ValueError:
    # Case 2: .rm files are named by page number; look up the page UUID.
    page_idx = int(rm_file.stem)
    h_fname = pages[page_idx]  # added
    rm_highlight_file = pathlib.Path(f"{input_dir}/{path.stem}.highlights/{h_fname}.json")  # added
```
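The same try/except logic can be pulled into a small helper that names the two layouts explicitly. This is a sketch; `resolve_highlight_file` is a hypothetical name, and it assumes, as described in this thread, that `pages` is the list of page UUIDs in document order.

```python
from pathlib import Path
from typing import List


def resolve_highlight_file(input_dir: str, doc_stem: str, rm_stem: str, pages: List[str]) -> Path:
    """Return the highlights JSON path for one .rm page file.

    Handles both layouts seen in this thread:
    - .rm files named by page UUID (rm_stem is already in `pages`), and
    - .rm files named by page number (rm_stem like "0" or "6"), where
      the page UUID has to be looked up in `pages`.
    """
    if rm_stem in pages:
        page_uuid = rm_stem
    else:
        # int() raises ValueError for a stem that is neither a known UUID
        # nor a number; pages[...] raises IndexError if the number is out of range.
        page_uuid = pages[int(rm_stem)]
    return Path(input_dir) / f"{doc_stem}.highlights" / f"{page_uuid}.json"
```

Checking membership first, instead of relying on a broad exception handler, means an unexpected stem surfaces as an error rather than being silently reinterpreted.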

pages seems to be a list of the UUIDs of each page. In your case, the try block should succeed; in my case, the except block runs. I'm not sure why the stems can come in these two forms.

folofjc commented on May 23, 2024

@czarrar Huh. I do not get that at all. How many rm_files do you have? It looks like the for loop is giving you the loop index instead of the value of the item in the list (which Python is not supposed to do). Which Python version are you using? Can you give me the value of rm_files as well?
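For reference, a Python `for` loop over a list yields the items themselves; the index only appears if you ask for it with `enumerate`. A quick check, using made-up file names:

```python
from pathlib import Path

rm_files = [Path("doc/0.rm"), Path("doc/1.rm"), Path("doc/6.rm")]

# Iterating yields the Path objects themselves, not loop indexes.
stems = [rm_file.stem for rm_file in rm_files]

# enumerate() is what pairs each item with its position in the list.
indexed = [(i, rm_file.stem) for i, rm_file in enumerate(rm_files)]

print(stems)    # ['0', '1', '6']
print(indexed)  # [(0, '0'), (1, '1'), (2, '6')]
```

Because `6.rm` yields the stem `'6'` at loop position 2, numeric stems like these come from the file names on disk, not from the loop.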

czarrar commented on May 23, 2024

It's confusing to me too, @folofjc. Sorry I didn't respond to your earlier message. I'm on 2.12 with Python 3.8.3, and thanks for those new additions.

My rm_files are 0.rm, 1.rm, 2.rm, 6.rm, and 9.rm (so 5 of them). My highlight files are 2fa55ecf-b917-450c-94bc-5dfc71246750.json, 6da0c15c-e1dc-4978-adc3-6f14da8a0761.json, 9abb565d-1ee0-4668-94e1-1053493feffd.json, 8860e3fc-5dfb-48ce-8b2a-f5d9f3f06c4b.json, f5c597fa-5620-4710-b48c-494335c934b2.json. Here are my files for this one document if you want to take a look: https://www.dropbox.com/s/coskv9skfb4vrti/zarrar_remarks_demo.zip?dl=0.

folofjc commented on May 23, 2024

Oh, wow. So your rM is actually storing the pages with those numbers? So it isn't Python, it's the rM. Did you use rsync to get the files? The problem is that list_pages_uuid gets the pages from the 3b289986-47e4-472f-94fc-377350e8d2f6.content file, which lists the pages as UUIDs, matching the names in your .highlights directory. However, the pages in the folder 3b289986-47e4-472f-94fc-377350e8d2f6 are named by page number (0, 1, 6, etc.). That is really odd.
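To check which layout a given document uses, one could compare the `.rm` file names in the document folder with the page list in its `.content` file. This is a minimal sketch, assuming the `.content` file is JSON with a top-level `"pages"` list of page UUIDs (the list that list_pages_uuid is described as reading above); `rm_naming_scheme` is a hypothetical helper, and other firmware versions may store the page list differently.

```python
import json
from pathlib import Path


def rm_naming_scheme(input_dir: Path, doc_uuid: str) -> str:
    """Report whether a document's .rm files are named by page UUID or page number."""
    content_file = input_dir / f"{doc_uuid}.content"
    content = json.loads(content_file.read_text(encoding="utf-8"))
    pages = set(content.get("pages", []))  # assumed: list of page UUIDs
    stems = [p.stem for p in (input_dir / doc_uuid).glob("*.rm")]
    if not stems:
        return "no .rm files"
    if all(s in pages for s in stems):
        return "uuid"
    if all(s.isdigit() for s in stems):
        return "index"
    return "mixed"
```

Running it on both a UUID-named and a number-named copy of the same document would show whether the difference comes from the device or from how the files were copied off it.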

In my files, the pages in that folder use the same UUIDs as the files in the .highlights folder.

So @lucasrla must have originally put that except clause in for this reason. My question is: why would one rM store the files by UUID and another by page number???

Can you SSH into your rM and check whether that folder stores them by page number on the rM itself?
