This project forked from ropensci/jstor


Import journal data from DfR (JSTOR)

Home Page: https://ropensci.github.io/jstor

License: GNU General Public License v3.0


jstor: Import and Analyse Data from Scientific Articles

Author: Thomas Klebel
License: GPL v3.0


The tool Data for Research (DfR) by JSTOR is a valuable source for citation analysis and text mining. jstor provides functions and suggests workflows for importing datasets from DfR. It was developed to deal with very large datasets which require an agreement, but can be used with smaller ones as well.

The most important set of functions is a group of jst_get_* functions:

  • jst_get_article
  • jst_get_authors
  • jst_get_references
  • jst_get_footnotes
  • jst_get_book
  • jst_get_chapters
  • jst_get_full_text
  • jst_get_ngram

All functions that deal with metadata (that is, all except jst_get_full_text and jst_get_ngram) operate along the same lines:

  1. The file is read with xml2::read_xml().
  2. Content of the file is extracted via XPath or CSS expressions.
  3. The resulting data is returned in a tibble.
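
The three steps above can be sketched with xml2 and tibble directly. This is only an illustration of the read, extract, return pattern, not the package's actual implementation: the inline XML is a heavily simplified, hypothetical stand-in for a DfR file, and the helper `extract_first()` and the chosen XPath expressions are assumptions made for this example.

```r
library(xml2)
library(tibble)

# Hypothetical, heavily simplified stand-in for a DfR article file.
sample_xml <- '<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>On the Protozoa Parasitic in Frogs</article-title>
      </title-group>
      <volume>41</volume>
      <issue>2</issue>
    </article-meta>
  </front>
</article>'

# Step 1: read the file (here: a literal XML string instead of a path).
doc <- read_xml(sample_xml)

# Step 2: extract content via XPath.
# Hypothetical helper: returns the text of the first match, or NA if the
# element is absent from the document.
extract_first <- function(doc, xpath) {
  xml_text(xml_find_first(doc, xpath))
}

# Step 3: return the extracted fields as a one-row tibble.
result <- tibble(
  article_title = extract_first(doc, ".//article-title"),
  volume        = extract_first(doc, ".//volume"),
  issue         = extract_first(doc, ".//issue"),
  page_range    = extract_first(doc, ".//page-range")  # not in the sample
)

print(result)
```

Fields that are missing from a file come back as NA rather than raising an error, which matches the NA entries visible in the example output under Usage.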

Installation

To install the package use:

install.packages("jstor")

You can install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("ropensci/jstor")

Usage

In order to use jstor, you first need to load it:

library(jstor)
library(magrittr)

The basic usage is simple: supply one of the jst_get_*-functions with a path and it will return a tibble with the extracted information.

jst_get_article(jst_example("article_with_references.xml")) %>% knitr::kable()

| file_name | journal_doi | journal_jcode | journal_pub_id | journal_title | article_doi | article_pub_id | article_jcode | article_type | article_title | volume | issue | language | pub_day | pub_month | pub_year | first_page | last_page | page_range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| article_with_references | NA | tranamermicrsoci | NA | Transactions of the American Microscopical Society | 10.2307/3221896 | NA | NA | research-article | On the Protozoa Parasitic in Frogs | 41 | 2 | eng | 1 | 4 | 1922 | 59 | 76 | 59-76 |

jst_get_authors(jst_example("article_with_references.xml")) %>% knitr::kable()

| file_name | prefix | given_name | surname | string_name | suffix | author_number |
|---|---|---|---|---|---|---|
| article_with_references | NA | R. | Kudo | NA | NA | 1 |

Further explanations, especially on how to use jstor’s functions for importing many files, can be found in the vignettes.
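
As a rough idea of what importing several files can look like, the sketch below applies jst_get_article to a vector of paths and stacks the results. It is a minimal illustration under stated assumptions, not the workflow the vignettes recommend for large datasets: the same bundled example file is used twice as a stand-in for real DfR files, and dplyr::bind_rows is an illustrative choice for combining the tibbles.

```r
library(jstor)
library(dplyr)

# Stand-in for a vector of real file paths; the bundled example file is
# simply repeated here for illustration.
paths <- c(
  jst_example("article_with_references.xml"),
  jst_example("article_with_references.xml")
)

# Parse each file into a one-row tibble, then stack the rows.
articles <- lapply(paths, jst_get_article)
combined <- bind_rows(articles)

print(combined[, c("file_name", "article_title", "pub_year")])
```

For very large collections a plain loop like this can become slow and fragile, which is why the vignettes are the better reference for batch imports.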

Getting started

In order to use jstor, you need some data from DfR. From the main page you can create a dataset by searching for terms and restricting the search by time, subject and content type. Once you have created an account, you can download your selection. Alternatively, you can download sample datasets with documents from before 1923 for the US, and before 1870 for all other countries.

Supported Elements

In their technical specifications, DfR lists fields which should be reliably present in all articles and books.

The following tables give an overview of which elements are supported by jstor.

Articles

xml-field reliably present supported in jstor
journal-id (type=“jstor”) x x
journal-id (type=“publisher-id”) x x
journal-id (type=“doi”) x
issn x
journal-title x x
publisher-name x
article-id (type=“doi”) x x
article-id (type=“jstor”) x x
article-id (type=“publisher-id”) x
article-type x
volume x
issue x
article-categories x
article-title x x
contrib-group x x
pub-date x x
fpage x x
lpage x
page-range x
product x
self-uri x
kwd-group x
custom-meta-group x x
fn-group (footnotes) x
ref-list (references) x

Books

xml-field reliably present supported in jstor
book-id (type=“jstor”) x x
discipline x x
call-number x
lcsh x
book-title x x
book-subtitle x
contrib-group x x
pub-date x x
isbn x x
publisher-name x x
publisher-loc x x
permissions x
self-uri x
counts x x
custom-meta-group x x

Book Chapters

xml-field reliably present supported in jstor
book-id (type=“jstor”) x x
part_id x x
part_label x x
part-title x x
part-subtitle x
contrib-group x x
fpage x x
abstract x x

Code of conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Citation

To cite jstor, please refer to citation(package = "jstor"):

Klebel (2018). jstor: Import and Analyse Data from Scientific Texts. Journal of 
Open Source Software, 3(28), 883, https://doi.org/10.21105/joss.00883

Acknowledgements

Work on jstor benefited from financial support for the project “Academic Super-Elites in Sociology and Economics” by the Austrian Science Fund (FWF), project number “P 29211 Einzelprojekte”.

Some internal functions regarding file paths and example files were adapted from the package readr.
