I noticed you were experimenting with drag-and-drop in a blog post you wrote, John. W

drag and drop PDF to add to bibliography / notes (org-ref issue, closed)

jkitchin commented on September 18, 2024

drag and drop PDF to add to bibliography / notes

from org-ref.

Comments (10)

jkitchin commented on September 18, 2024

That is a nice idea. Do you know how this information is extracted from the pdf? Could you send me an example of a PDF you know has this metadata in it?

edgimar commented on September 18, 2024

Sometimes I believe metadata can be embedded in the PDF directly, but I'm talking more about tools that read the PDF, and guess (e.g. based on the title, authors, etc.) what paper it is by searching for it. A nice tool I've used that does renaming in this manner is gscholar.

There appears to be some elisp code that semi-automates the pulling of google-scholar (and other) source data and constructing a bibtex entry from it -- see gscholar-bibtex.

As for metadata directly embedded in the PDF, there seems to be some older information on this here and here.

Lastly, the pdf-tools emacs package seems like it is able to extract (and edit!) annotations in a PDF file.

jkitchin commented on September 18, 2024

Thanks for these links. I will take a look at them. I actually tried the Python one, and after the third use or so Google blocked me! But it looks like a lot of the work is done in gscholar-bibtex already.

jkitchin commented on September 18, 2024

Update on this issue. I added https://github.com/jkitchin/org-ref/blob/master/org-ref-url-utils.el, which provides some support to drag a webpage onto a bibtex file to add a bibtex entry.

llcc commented on September 18, 2024

I implemented a rough method to add a bibtex entry by dragging and dropping a PDF onto Emacs, provided the DOI is embedded in the file.

(require 'doi-utils) ; provides `doi-utils-add-bibtex-entry-from-doi'

(defun extract-metadata-from-pdf (event)
  "Add a bibtex entry for the PDF dropped in EVENT, using a DOI found in its text."
  (interactive "e")
  (goto-char (nth 1 (event-start event)))
  (x-focus-frame nil)
  (let* ((payload (car (last event)))
         ;; Normalize Windows backslashes and expand "~" so pdftotext gets a real path.
         (pdf-file (expand-file-name
                    (replace-regexp-in-string "\\\\" "/" (car payload))))
         doi)
    (save-excursion
      (with-temp-buffer
        ;; A "-" output argument makes pdftotext write to stdout, so no
        ;; temporary text file has to be created and deleted.
        (call-process "pdftotext" nil t nil pdf-file "-")
        (goto-char (point-min))
        ;; NOERROR is t so that a failed search reaches the `user-error' branch.
        (if (re-search-forward "https?://dx\\.doi\\.org/\\(10\\.[^ \t\n]+\\)" nil t)
            (setq doi (match-string 1))
          (user-error "No DOI can be found in the PDF file")))
      (doi-utils-add-bibtex-entry-from-doi doi (car org-ref-default-bibliography)))))

(bind-key "<drag-n-drop>" #'extract-metadata-from-pdf)

I used the pdftotext command from git, which converts a PDF to plain text. If the DOI of the current file is embedded in it, we can search for the DOI and then use it to add a bibtex entry to the default bibliography file.

This is just a rough idea. I tried to extract the title from the text output, but the title is just plain text with nothing to distinguish it. So I think a better solution would be to find another application that can extract the right metadata, instead of pdftotext.

Any improvements or advice on this are appreciated!

jkitchin commented on September 18, 2024

This is a good start. I will test it this weekend. I think we could set up a series of functions to try in order. First, if metadata exists we should use it, since it is the most reliable. Second, we could try this approach. The only risk is that it takes the first DOI link, which we have to assume is for the article. If that fails, then a Google/Crossref search on some text from the PDF might be the last resort before giving up.
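The "series of functions to try" could be wired up roughly as below. This is only a sketch: the `my/` names and the three finder functions are hypothetical placeholders (none of them exist in org-ref), and each finder is assumed to take a PDF path and return a DOI string or nil.

```elisp
(require 'seq)

;; Hypothetical placeholder finders. Real versions would read embedded
;; metadata, scan the pdftotext output, or query Google/Crossref.
(defun my/pdf-doi-from-metadata (_pdf-file) nil)
(defun my/pdf-doi-from-text (_pdf-file) nil)
(defun my/pdf-doi-from-search (_pdf-file) nil)

(defvar my/pdf-doi-finders
  '(my/pdf-doi-from-metadata   ; most reliable, so tried first
    my/pdf-doi-from-text       ; first DOI link found in the text
    my/pdf-doi-from-search)    ; last resort before giving up
  "Functions tried in order; each takes a PDF path and returns a DOI or nil.")

(defun my/pdf-find-doi (pdf-file &optional finders)
  "Return the first non-nil DOI that FINDERS produce for PDF-FILE.
FINDERS defaults to `my/pdf-doi-finders'.  Return nil if all of them fail."
  (seq-some (lambda (finder) (funcall finder pdf-file))
            (or finders my/pdf-doi-finders)))
```

Because `seq-some` stops at the first non-nil result, the slower search-based fallback would only run when the cheaper methods find nothing.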

llcc commented on September 18, 2024

pdftotext accepts arguments that make it generate an HTML file containing metadata. We can then use that to get the title, and even the authors and other fields. I updated the function above and pushed it to https://github.com/llcc/org-ref-extraction-metadata-from-pdf, please have a look (sorry, I gave it a name starting with org-ref. Please tell me if that is not OK).

It still needs some fixes, but the basic idea is there.
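For illustration, here is a minimal sketch of reading that metadata, assuming the Poppler `pdftotext` with its `-htmlmeta` option, and assuming the output puts the title in a `<title>` element and other fields in `<meta name="..." content="..."/>` tags. The `my/` function names are hypothetical and this is not the code in the repository linked above.

```elisp
(require 'subr-x) ; for `string-trim' on older Emacs versions

(defun my/pdf-htmlmeta-parse (html)
  "Return an alist of (NAME . VALUE) metadata pairs found in HTML.
HTML is assumed to be the output of `pdftotext -htmlmeta'.
HTML entities in the values are left undecoded."
  (let ((case-fold-search t)
        (start 0)
        result)
    (when (string-match "<title>\\([^<]*\\)</title>" html)
      (push (cons "Title" (string-trim (match-string 1 html))) result))
    (while (string-match
            "<meta name=\"\\([^\"]+\\)\" content=\"\\([^\"]*\\)\"" html start)
      (push (cons (match-string 1 html) (match-string 2 html)) result)
      (setq start (match-end 0)))
    (nreverse result)))

(defun my/pdf-htmlmeta (pdf-file)
  "Run pdftotext with -htmlmeta on PDF-FILE and return the parsed metadata."
  (with-temp-buffer
    ;; "-l 1" stops after the first page; "-" writes to stdout.
    (call-process "pdftotext" nil t nil "-htmlmeta" "-l" "1"
                  (expand-file-name pdf-file) "-")
    (my/pdf-htmlmeta-parse (buffer-string))))
```

Something like `(cdr (assoc "Title" (my/pdf-htmlmeta "paper.pdf")))` would then give the title, when the PDF actually carries one.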

llcc commented on September 18, 2024

@jkitchin can you create a file for PDF metadata extraction in the org-ref repository? That would make it easier for us to submit changes. If so, I will merge the function into org-ref. Thanks!

jkitchin commented on September 18, 2024

I committed a draft file here https://github.com/jkitchin/org-ref/blob/master/org-ref-pdf.el

It matches two types of DOIs in the pdftotext output. If one DOI is found, it adds it as a bibtex entry. If two are found, it offers a helm selection menu for which one you want.
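The find-then-choose logic can be sketched as follows. This is not the code in org-ref-pdf.el: the single regexp is a hypothetical heuristic rather than the two patterns used there, the `my/` names are made up for illustration, and plain `completing-read` stands in for the helm menu.

```elisp
(defvar my/doi-regexp "\\b10\\.[0-9]\\{4,9\\}/[^][ \t\n\"<>]+"
  "Hypothetical heuristic for a DOI: \"10.\", a registrant code, \"/\", a suffix.")

(defun my/dois-in-string (text)
  "Return the unique DOIs found in TEXT, in order of first appearance."
  (let ((start 0)
        dois)
    (while (string-match my/doi-regexp text start)
      (setq start (match-end 0))
      ;; Drop trailing sentence punctuation that is unlikely to belong to the DOI.
      (push (replace-regexp-in-string "[.,;)]+\\'" "" (match-string 0 text))
            dois))
    (delete-dups (nreverse dois))))

(defun my/choose-doi (dois)
  "Return the only DOI in DOIS, or prompt for one when there are several.
Return nil when DOIS is empty."
  (cond ((null dois) nil)
        ((null (cdr dois)) (car dois))
        (t (completing-read "DOI: " dois nil t))))
```

Stripping trailing punctuation is a trade-off: it avoids picking up the period that ends a sentence, at the cost of mangling the rare DOI that really ends in one of those characters.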

I like your idea of getting more structured metadata, but on some tests of about 100 pdfs, I didn't find any useful information in them. This still needs a lot of testing, so PRs are welcome for improvements!

jkitchin commented on September 18, 2024

I am going to close this. Most of the functionality described above has been implemented now. Thanks for the idea! The second idea, about extracting annotations, is outside the scope of org-ref for now, I think. The only other thing not done is renaming the PDF. That might be a good thing to do at some point.
