Upgrade to rM 2.8 about remarks HOT 9 CLOSED

lucasrla commented on May 23, 2024
Upgrade to rM 2.8

from remarks.

Comments (9)

folofjc commented on May 23, 2024

Okay, I did some digging around in the new highlights JSON file. It looks pretty helpful: it even gives you the highlighted text! It might require some rewriting, but it looks like you could keep the old highlighting code and just add the new format alongside it. Most of the work for the new one is already done for you; I think all you would have to do is take each rect, with its height and width, and apply it.
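
As a rough illustration of the "take the rect and apply it" idea, here is a minimal sketch. The JSON layout (a top-level "highlights" list of layers, each highlight carrying "text" and a "rects" list with x, y, width and height), the function name rects_from_highlights, and the scale parameter are all assumptions made for illustration; none of them is confirmed in this thread or taken from the remarks code.

```python
import json


def rects_from_highlights(json_text, scale=1.0):
    """Return (text, (x0, y0, x1, y1)) for every rect in a highlights JSON string.

    Assumed layout: {"highlights": [[{"text": ..., "rects": [{x, y, width, height}]}]]}
    """
    data = json.loads(json_text)
    results = []
    for layer in data.get("highlights", []):
        for highlight in layer:
            for rect in highlight.get("rects", []):
                # Convert origin + size into two corner points, optionally scaled.
                x0 = rect["x"] * scale
                y0 = rect["y"] * scale
                x1 = (rect["x"] + rect["width"]) * scale
                y1 = (rect["y"] + rect["height"]) * scale
                results.append((highlight.get("text", ""), (x0, y0, x1, y1)))
    return results


# Hypothetical sample data, not taken from a real device.
sample = json.dumps(
    {"highlights": [[{"text": "hello", "rects": [{"x": 10, "y": 20, "width": 100, "height": 30}]}]]}
)
print(rects_from_highlights(sample))
```

Returning the text together with the rectangle keeps the two linked, so whatever draws the annotation can also store the highlighted text if the target format allows it.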

folofjc commented on May 23, 2024

I created a pull request with my attempt to fix this. I also tested importing into Zotero and it works like it should! Zotero is actually worse at identifying the annotations than rM, haha. Since rM parses the text under the highlight, I wonder if there is a way to "store" the text with the annotation rectangle?

czarrar commented on May 23, 2024

@folofjc Thanks so much for this fix. It works great for me. I had to change one line in your code in remarks.py:

rm_highlight_file = pathlib.Path(f"{input_dir}/{path.stem}.highlights/{rm_file.stem}.json")

For me, rm_file.stem is a number, while the highlights file is named after the page ID, so I look the name up in pages instead:

h_fname = pages[page_idx]
rm_highlight_file = pathlib.Path(f"{input_dir}/{path.stem}.highlights/{h_fname}.json")

folofjc commented on May 23, 2024

Hi @czarrar. That is interesting. I don't have that issue. Are you on 2.12? I am still on 2.11 and it still works great. I have added a few protections to my code: previously it crashed if you tried to extract highlights from a file with no highlights, and also if a highlight was too small. (I had a highlight from rM that was one-dimensional; the rect that rM set had coinciding vertices.) So I will check this out and maybe push another update to my master.
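
A minimal sketch of the two protections described above, assuming the same hypothetical JSON layout as before ("highlights" as a list of layers, each highlight with a "rects" list). The name load_valid_rects and the min_size threshold are made up for illustration and are not the code from the pull request.

```python
import json
import pathlib
import tempfile


def load_valid_rects(highlight_file, min_size=1e-6):
    """Return usable rects from a highlights file, or [] if the file is missing."""
    path = pathlib.Path(highlight_file)
    if not path.is_file():
        # Protection 1: a page without highlights has no JSON file at all.
        return []
    data = json.loads(path.read_text())
    rects = []
    for layer in data.get("highlights", []):
        for highlight in layer:
            for rect in highlight.get("rects", []):
                # Protection 2: skip degenerate rects with no width or height.
                if rect.get("width", 0) > min_size and rect.get("height", 0) > min_size:
                    rects.append(rect)
    return rects


# Hypothetical demo data written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo = pathlib.Path(tmp) / "page.json"
    demo.write_text(json.dumps({"highlights": [[{"rects": [
        {"x": 1, "y": 1, "width": 0, "height": 0},
        {"x": 1, "y": 1, "width": 5, "height": 2},
    ]}]]}))
    print(load_valid_rects(demo))
    print(load_valid_rects(pathlib.Path(tmp) / "missing.json"))
```

Filtering before drawing means the annotation step never sees an empty file or a zero-area rectangle, which are the two crash cases mentioned.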

folofjc commented on May 23, 2024

Hi @czarrar. I just checked, and the original code still works for me. I still have {path.stem}.highlights/{rm_file.stem}.json as the file with all the highlights.

What is the value of your h_fname? Is it different than rm_file.stem? rm_file.stem should be the UUID of the pdf.

czarrar commented on May 23, 2024

@folofjc For the few files I tried, rm_file.stem is a number like 0 or 1, while my actual highlights JSON files are named with UUIDs, like 1d111126-d36a-42b7-b35e-bc4ef80f3711.json.

To work for both our cases, I can make the following change instead:

try:  # line 86
    # Case 1: the .rm file is named after the page UUID
    page_idx = pages.index(rm_file.stem)
    h_fname = rm_file.stem  # added
except ValueError:
    # Case 2: the .rm file is named after the page number (0, 1, ...)
    page_idx = int(rm_file.stem)
    h_fname = pages[page_idx]  # added
rm_highlight_file = pathlib.Path(f"{input_dir}/{path.stem}.highlights/{h_fname}.json")  # added

pages seems to be a list with the UUID for each page. In your case, the try branch should work. In my case it will run the except branch. I'm not sure why the file names can take these two forms.

folofjc commented on May 23, 2024

@czarrar Huh. I do not get that at all. How many rm_files do you have? It looks like the problem is that the for loop is giving you the index of the loop instead of the value of the item in the list (which Python is not supposed to do). Which Python version are you using? Can you give me the value of rm_files as well?

czarrar commented on May 23, 2024

It's confusing to me too @folofjc. Sorry I didn't respond to your earlier message. I have 2.12 and Python 3.8.3, and thanks for those new additions.

My rm_files are 0.rm, 1.rm, 2.rm, 6.rm, and 9.rm (so 5 of them). My highlight files are 2fa55ecf-b917-450c-94bc-5dfc71246750.json, 6da0c15c-e1dc-4978-adc3-6f14da8a0761.json, 9abb565d-1ee0-4668-94e1-1053493feffd.json, 8860e3fc-5dfb-48ce-8b2a-f5d9f3f06c4b.json, f5c597fa-5620-4710-b48c-494335c934b2.json. Here are my files for this one document if you want to take a look: https://www.dropbox.com/s/coskv9skfb4vrti/zarrar_remarks_demo.zip?dl=0.

folofjc commented on May 23, 2024

Oh, wow. So your rM is actually storing the pages with those numbers? So it isn't Python, it is the rM. Did you use rsync to get the files? The problem is that list_pages_uuid gets the pages from the 3b289986-47e4-472f-94fc-377350e8d2f6.content file, which lists the pages as UUIDs, matching the names in your .highlights directory. However, the files in the folder 3b289986-47e4-472f-94fc-377350e8d2f6 are named with page numbers like 0, 1, 6, etc. That is really odd.

In my files, the contents of that folder use the same page UUIDs as the .highlights folder.
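
To make the two naming schemes concrete, here is a small sketch that maps either kind of .rm file stem to the page UUID used in the .highlights folder. The helper names page_uuid_for and highlight_path are hypothetical, and pages is assumed to be the ordered list of page UUIDs that list_pages_uuid reads from the .content file; the placeholder IDs in the demo are invented.

```python
import pathlib


def page_uuid_for(rm_stem, pages):
    """Map an .rm file stem (page UUID or page number) to the page UUID."""
    if rm_stem in pages:
        # Scheme A: the .rm file is already named after the page UUID.
        return rm_stem
    if rm_stem.isdigit() and int(rm_stem) < len(pages):
        # Scheme B: the .rm file is named after the page index (0, 1, 6, ...).
        return pages[int(rm_stem)]
    raise ValueError(f"cannot match {rm_stem!r} to a page")


def highlight_path(input_dir, doc_stem, rm_stem, pages):
    """Build the path of the highlights JSON for one page."""
    uuid = page_uuid_for(rm_stem, pages)
    return pathlib.Path(input_dir) / f"{doc_stem}.highlights" / f"{uuid}.json"


# Placeholder page IDs, not real UUIDs from either device.
pages = ["uuid-a", "uuid-b", "uuid-c"]
print(highlight_path("in", "doc", "uuid-b", pages))
print(highlight_path("in", "doc", "2", pages))
```

Checking membership first and falling back to the numeric index avoids relying on an exception to tell the two schemes apart, and an unknown stem fails with a clear message.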

So since @lucasrla put that except clause in there originally, it must have been for this reason. My question is why one rM would store the files with UUIDs and another would store them with page numbers?

Can you ssh into your rM and see if that folder stores them with the page numbers on the rM itself?
