ε-index

R function to calculate the ε-index of a researcher's relative citation performance

Prof Corey J. A. Bradshaw
Global Ecology, Flinders University, Adelaide, Australia
September 2021
e-mail

Existing citation-based indices used to rank research performance do not permit a fair comparison of researchers among career stages or disciplines, nor do they treat women and men equally. We designed the ε-index, which is simple to calculate, based on open-access data, corrects for disciplinary variation, can be adjusted for career breaks, and sets a sample-specific threshold above and below which a researcher is deemed to be performing above or below expectation.

Code accompanies the article:

BRADSHAW, CJA, JM CHALKER, SA CRABTREE, BA EIJKELKAMP, JA LONG, JR SMITH, K TRINAJSTIC, V WEISBECKER. 2021. A fairer way to compare researchers at any career stage and in any discipline using open-access citation data. PLoS One 16(9): e0257141. doi:10.1371/journal.pone.0257141

--
DIRECTIONS

Create a .csv file of exactly the same format as the example file in this repository ('datasample.csv'):

COLUMN 1: personID — any character identification of an individual researcher (can be a name)
COLUMN 2: gender — researcher's gender ("F" or "M")
COLUMN 3: i10 — researcher's i10 index (# papers with ≥ 10 citations); must be > 0
COLUMN 4: h — researcher's h-index
COLUMN 5: maxcit — number of citations of researcher's most cited peer-reviewed paper
COLUMN 6: firstyrpub — the year of the researcher's first published peer-reviewed paper

Import the sample .csv file, or your own following the format indicated above (make sure first to specify the directory in which 'datasample.csv' resides using the 'setwd()' command):
```
 setwd("/path") # where /path is the directory path on your machine
 example.dat <- read.csv("datasample.csv", header=TRUE)
```
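Before running the function, it can help to confirm that the imported data frame matches the documented format. The following is a minimal sketch, not part of the repository: 'checkInput' is a hypothetical helper, the data frame is an invented stand-in for the imported file, and it assumes the .csv header uses exactly the column names listed above.

```r
# Invented stand-in for read.csv("datasample.csv"); all values are made up.
example.dat <- data.frame(
  personID   = c("A", "B", "C"),
  gender     = c("F", "M", "F"),
  i10        = c(12, 40, 3),
  h          = c(10, 25, 4),
  maxcit     = c(150, 900, 35),
  firstyrpub = c(2012, 1998, 2017),
  stringsAsFactors = FALSE
)

# checkInput is an illustrative helper (hypothetical, not in the repository).
# It returns a character vector of problems; an empty vector means no issue found.
checkInput <- function(dat) {
  expected <- c("personID", "gender", "i10", "h", "maxcit", "firstyrpub")
  problems <- character(0)
  if (!identical(names(dat), expected)) {
    # Assumes the header names match the column list in the directions.
    problems <- c(problems, "column names or order differ from the documented format")
  } else {
    if (!all(dat$gender %in% c("F", "M"))) {
      problems <- c(problems, "gender must be 'F' or 'M'")
    }
    if (any(is.na(dat$i10)) || any(dat$i10 <= 0)) {
      problems <- c(problems, "i10 must be > 0")
    }
  }
  problems
}

checkInput(example.dat)
```

An empty result means none of the documented constraints were violated; any message returned points to the column that needs attention.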
Alternatively, you can automatically harvest the necessary citation data from Google Scholar using the 'get.profile.func.R' function, which produces a file that can be called directly by the 'epsilon.index.func.R':

i. Predefine a Google Scholar ids vector (12-character user ID from scholar.google.com), e.g.,
```
  ids <- c("1sO0O3wAAAAJ","ZBUju2QAAAAJ","oGAui-IAAAAJ","cpJnEYIAAAAJ","ptDEg44AAAAJ","PJYrOvQAAAAJ","4UxbBYIAAAAJ") 
```
ii. Then define a 'genders' vector of the same length, e.g.,
```
  genders <- c("M","M","F","M","M","F","F")
```
iii. Load get.profile.func

iv. Define an input file that the epsilon.index.func will use, e.g.,
```
  example.dat <- getProfiledatFunc(ids, genders)
```
Note: The estimation of the first year of publication (Y₁) can return errors because the function does not differentiate peer-reviewed and non-peer-reviewed entries in Google Scholar, nor can it avoid clearly erroneous entries in a researcher's publication history. We recommend that all harvested values for the year of first publication be checked manually for each researcher in the sample. A case in point is id=ptDEg44AAAAJ that returns Y₁ = 1791, but the true year of first publication for this researcher is 1982.
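
The manual check recommended above can be narrowed down by first flagging implausible years. This is a rough sketch under stated assumptions: 'flagFirstYear' is a hypothetical helper, the 60-year career cut-off is arbitrary, the harvested data frame is assumed to carry the 'personID' and 'firstyrpub' columns, and all values except the 1791 case are invented.

```r
# Invented harvested values; only the 1791 -> 1982 case comes from the note above.
harvested <- data.frame(
  personID   = c("ptDEg44AAAAJ", "researcherB", "researcherC"),
  firstyrpub = c(1791, 2005, 1996),
  stringsAsFactors = FALSE
)

# flagFirstYear is an illustrative helper (hypothetical, not in the repository).
# max.career is an assumed cut-off for a plausible publishing career, in years.
flagFirstYear <- function(dat, ref.year, max.career = 60) {
  span <- ref.year - dat$firstyrpub
  # Flag missing years, years after ref.year, and implausibly long careers.
  dat[is.na(span) | span < 0 | span > max.career, , drop = FALSE]
}

suspect <- flagFirstYear(harvested, ref.year = 2021)
print(suspect)

# After checking the publication record manually, overwrite the erroneous value.
harvested$firstyrpub[harvested$personID == "ptDEg44AAAAJ"] <- 1982
```

Flagging only catches gross errors; a wrong but plausible year (for example, a non-peer-reviewed entry a few years early) still needs the manual check.
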

Load the function ('epsilon.index.func') in R by submitting the entire function code (lines 20 to 212) to the R console.

Simply run the function as follows:

 epsilonIndexFunc(dat.samp=example.dat, bygender=c('no','yes'), sort.index=c('e', 'd', 'ep', 'dp'))

where 'bygender' indicates whether to calculate the gender-debiased index ('yes' or 'no'), and 'sort.index' sets the index used to sort the final results table (default = 'e').

Possible values of 'sort.index': 'e' = pooled; 'ep' = normalised; 'd' = gender-debiased; 'dp' = normalised gender-debiased

If there are insufficient individuals per gender to estimate a gender-specific index, we recommend selecting bygender='no' and not using or sorting based on the gender-debiased index (option 'd'). If the individuals in the sample are not all in the same approximate discipline, we recommend not using or sorting based on either of the two normalised indices (options 'ep' or 'dp').
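
One way to apply the first recommendation is to count individuals per gender before choosing 'bygender'. The sketch below is illustrative only: 'chooseBygender' is a hypothetical helper, and the minimum of 5 per gender is an assumed placeholder, since no threshold for "insufficient" is given here.

```r
# Invented sample: 2 women and 5 men.
example.dat <- data.frame(
  personID = paste0("r", 1:7),
  gender   = c("F", "M", "M", "M", "M", "F", "M"),
  stringsAsFactors = FALSE
)

# chooseBygender is an illustrative helper (hypothetical, not in the repository).
# min.per.gender is an assumed threshold; choose a value suited to your sample.
chooseBygender <- function(dat, min.per.gender = 5) {
  # factor() with fixed levels keeps a zero count for an absent gender.
  counts <- table(factor(dat$gender, levels = c("F", "M")))
  if (all(counts >= min.per.gender)) "yes" else "no"
}

bygender.choice <- chooseBygender(example.dat)
print(bygender.choice)
```

The returned string can then be passed as the 'bygender' argument.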

The output includes the following columns:

person: researcher's ID (specified by user)
gender: F=female; M=male
yrs.publ: number of years since first peer-reviewed article
gender.eindex: ε-index relative to others of the same gender in the sample
expectation: whether above or below expectation based on chosen index (default is 'e' = pooled index)
m-quotient: h-index ÷ yrs.publ
h-index: h-index
debiased.e.prime.index: scaled gender.eindex (gender ε′-index)
gender.rank: rank from gender.eindex (1 = highest)
rnk.debiased: gender-debiased rank (1 = highest)
pooled.eindex: ε-index generated from the entire sample (not gender-specific)
e.prime.index: scaled pooled.eindex (ε′-index)
pooled.rnk: rank from pooled.eindex (1 = highest)

and

if sort.index = 'ep':

eprime.rnk: rank from scaled pooled.eindex (ε′-index)

or if sort.index = 'dp':

eprime.debiased.rnk: rank from scaled gender.eindex (gender ε′-index)
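
As a worked illustration of two of the simpler output columns, the sketch below computes 'yrs.publ' and the m-quotient by hand. It assumes 'yrs.publ' is the reference year minus the year of first publication (the function's exact convention, such as whether the first year is counted inclusively, is not stated here), and it uses invented values with 2021 as the reference year.

```r
# Invented values; ref.year is an assumed reference year.
ref.year <- 2021
dat <- data.frame(
  personID   = c("A", "B"),
  h          = c(30, 12),
  firstyrpub = c(2001, 2015),
  stringsAsFactors = FALSE
)

# Years since first peer-reviewed article (assumed: simple difference).
dat$yrs.publ <- ref.year - dat$firstyrpub

# m-quotient: h-index divided by years publishing.
dat$m.quotient <- dat$h / dat$yrs.publ

print(dat)
```

Under these assumptions, researcher A has an m-quotient of 30 ÷ 20 = 1.5 and researcher B has 12 ÷ 6 = 2.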

You can easily export the output to a file like this:

 out <- epsilonIndexFunc(dat.samp=example.dat, sort.index=c('e', 'd', 'ep', 'dp'))
 write.table(out, file="rank.output.csv", sep=",", dec=".", row.names=FALSE, col.names=TRUE)
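
A quick way to confirm the export worked is to read the file back in. The sketch below uses an invented stand-in for the results table and writes to a temporary directory so that nothing in the working directory is overwritten; 'write.csv' is used here as a shorthand for 'write.table' with a comma separator.

```r
# Invented stand-in for the results table returned by the function.
out <- data.frame(
  person     = c("A", "B"),
  pooled.rnk = c(1, 2),
  stringsAsFactors = FALSE
)

# Write to a temporary location, then read the file back to inspect it.
out.file <- file.path(tempdir(), "rank.output.csv")
write.csv(out, file = out.file, row.names = FALSE)

check <- read.csv(out.file, header = TRUE)
print(check)
```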
