jstor: Import and Analyse Data from Scientific Articles

Author: Thomas Klebel
License: GPL v3.0

The tool Data for Research (DfR) by JSTOR is a valuable source for citation analysis and text mining. jstor provides functions and suggests workflows for importing datasets from DfR. It was developed to deal with very large datasets which require an agreement, but can be used with smaller ones as well.

The most important set of functions is a group of jst_get_* functions:

jst_get_article
jst_get_authors
jst_get_references
jst_get_footnotes
jst_get_book
jst_get_chapters
jst_get_full_text
jst_get_ngram

All functions concerned with metadata (that is, all except jst_get_full_text and jst_get_ngram) operate along the same lines:

The file is read with xml2::read_xml().
The content of the file is extracted via XPath or CSS expressions.
The resulting data is returned in a tibble.
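
The three steps above can be illustrated with a minimal sketch. It is not the package's internal code: the XML snippet is a made-up, heavily shortened record, and the helper extract_first() is a hypothetical name introduced only for this illustration.

```r
library(xml2)
library(tibble)

# A made-up, minimal article record (assumption); real DfR files
# contain many more fields.
xml_input <- '<article>
  <front>
    <journal-meta><journal-title>Example Journal</journal-title></journal-meta>
    <article-meta>
      <title-group><article-title>An Example Title</article-title></title-group>
      <volume>41</volume>
    </article-meta>
  </front>
</article>'

# Step 1: read the file (here: a literal XML string) with xml2.
doc <- read_xml(xml_input)

# Hypothetical helper: text of the first node matching an XPath
# expression, or NA if no such node exists.
extract_first <- function(doc, xpath) {
  xml_text(xml_find_first(doc, xpath))
}

# Steps 2 and 3: extract fields via XPath and return them in a tibble.
result <- tibble(
  journal_title = extract_first(doc, ".//journal-title"),
  article_title = extract_first(doc, ".//article-title"),
  volume        = extract_first(doc, ".//volume"),
  issue         = extract_first(doc, ".//issue")  # absent in the snippet
)

print(result)
```

Fields that are missing from a file, such as issue in this snippet, end up as NA, which matches the NA entries visible in the example output further down.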

Installation

To install the package use:

install.packages("jstor")

You can install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("ropensci/jstor")

Usage

In order to use jstor, you first need to load it:

library(jstor)
library(magrittr)

The basic usage is simple: supply one of the jst_get_*-functions with a path and it will return a tibble with the extracted information.

jst_get_article(jst_example("article_with_references.xml")) %>% knitr::kable()

file_name	journal_doi	journal_jcode	journal_pub_id	journal_title	article_doi	article_pub_id	article_jcode	article_type	article_title	volume	issue	language	pub_day	pub_month	pub_year	first_page	last_page	page_range
article_with_references	NA	tranamermicrsoci	NA	Transactions of the American Microscopical Society	10.2307/3221896	NA	NA	research-article	On the Protozoa Parasitic in Frogs	41	2	eng	1	4	1922	59	76	59-76

jst_get_authors(jst_example("article_with_references.xml")) %>% knitr::kable()

file_name	prefix	given_name	surname	string_name	suffix	author_number
article_with_references	NA	R.	Kudo	NA	NA	1

Further explanations, especially on how to use jstor’s functions for importing many files, can be found in the vignettes.
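
As a rough idea of what importing several files can look like, the following sketch applies jst_get_article to a vector of paths and combines the results. It uses only base R around the function shown above and is not the workflow recommended in the vignettes. To run without a DfR download, the vector of paths simply reuses the bundled example file, which is an assumption made for illustration.

```r
library(jstor)

# Stand-in for a real set of files (assumption): the bundled example,
# repeated twice. With real data, list.files() on the download folder
# would supply the paths.
paths <- rep(jst_example("article_with_references.xml"), 2)

# Parse each file; a file that fails to parse yields NULL instead of
# stopping the whole run.
results <- lapply(paths, function(path) {
  tryCatch(jst_get_article(path), error = function(e) NULL)
})

# Drop failed files and stack the per-file tibbles into one table.
articles <- do.call(rbind, Filter(Negate(is.null), results))

print(articles[, c("file_name", "article_title", "pub_year")])
```

For large datasets, a plain loop like this keeps everything in memory and gives no progress feedback, which is why the vignettes are the better guide.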

Getting started

In order to use jstor, you need some data from DfR. From the main page you can create a dataset by searching for terms and restricting the search by time, subject and content type. After creating an account, you can download your selection. Alternatively, you can download sample datasets with documents from before 1923 for the US, and before 1870 for all other countries.

Supported Elements

In its technical specifications, DfR lists fields which should be reliably present in all articles and books.

The following tables give an overview of which elements are supported by jstor.

Articles

`xml`-field	reliably present	supported in `jstor`
journal-id (type=“jstor”)	x	x
journal-id (type=“publisher-id”)	x	x
journal-id (type=“doi”)		x
issn	x
journal-title	x	x
publisher-name	x
article-id (type=“doi”)	x	x
article-id (type=“jstor”)	x	x
article-id (type=“publisher-id”)		x
article-type		x
volume		x
issue		x
article-categories	x
article-title	x	x
contrib-group	x	x
pub-date	x	x
fpage	x	x
lpage		x
page-range		x
product	x
self-uri	x
kwd-group	x
custom-meta-group	x	x
fn-group (footnotes)		x
ref-list (references)		x

Books

`xml`-field	reliably present	supported in `jstor`
book-id (type=“jstor”)	x	x
discipline	x	x
call-number	x
lcsh	x
book-title	x	x
book-subtitle		x
contrib-group	x	x
pub-date	x	x
isbn	x	x
publisher-name	x	x
publisher-loc	x	x
permissions	x
self-uri	x
counts	x	x
custom-meta-group	x	x

Book Chapters

`xml`-field	reliably present	supported in `jstor`
book-id (type=“jstor”)	x	x
part_id	x	x
part_label	x	x
part-title	x	x
part-subtitle		x
contrib-group	x	x
fpage	x	x
abstract	x	x

Code of conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Citation

To cite jstor, please refer to citation(package = "jstor"):

Klebel (2018). jstor: Import and Analyse Data from Scientific Texts. Journal of 
Open Source Software, 3(28), 883, https://doi.org/10.21105/joss.00883

Acknowledgements

Work on jstor benefited from financial support for the project “Academic Super-Elites in Sociology and Economics” by the Austrian Science Fund (FWF), project number “P 29211 Einzelprojekte”.

Some internal functions regarding file paths and example files were adapted from the package readr.
