
rscopus's Introduction

R Package to interface with Elsevier and Scopus APIs


rscopus

The goal of rscopus is to provide an R Scopus Database ‘API’ Interface.

Installation

You can install rscopus from GitHub with:

# install.packages("devtools")
devtools::install_github("muschellij2/rscopus")

Steps to get API key

In order to use this package, you need an API key from https://dev.elsevier.com/sc_apis.html. Log in via your institution and go to Create API Key. You need to provide a label and a website URL (your personal website is fine), and agree to the terms of service.

  1. Go to https://dev.elsevier.com/user/login. Login or create a free account.
  2. Click “Create API Key”. Put in a label, such as rscopus key. Add a website. http://example.com is fine if you do not have a site.
  3. Read and agree to the TOS if you do indeed agree.
  4. Add Elsevier_API = "API KEY GOES HERE" to your ~/.Renviron file, or add export Elsevier_API="API KEY GOES HERE" to your ~/.bash_profile.

Alternatively, you can set the API key with rscopus::set_api_key or with options("elsevier_api_key" = api_key). You can access the API key using rscopus::get_api_key.

You should be able to test out the API key using the interactive Scopus APIs.
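
To confirm the key is wired up, a minimal check (assuming the key was set by one of the methods above):

library(rscopus)
have_api_key()            # TRUE if rscopus can find an API key
is_elsevier_authorized()  # TRUE if the key and your IP address are accepted by Elsevier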

A note about API keys and IP addresses

The API key is bound to a set of IP addresses, usually those of your institution. Therefore, if you are using this for a Shiny application, you must host the Shiny application on your institution's servers in some way. Also, you cannot access the Scopus API with this key if you are offsite; you must VPN into your institution's network or use a computing cluster with an institutional IP.

See https://dev.elsevier.com/tecdoc_api_authentication.html

Example

This is a basic example which shows you how to solve a common problem:

library(rscopus)
library(dplyr)
if (rscopus::is_elsevier_authorized()) {
  res = author_df(last_name = "Muschelli", first_name = "John", verbose = FALSE, general = FALSE)
  names(res)
  head(res[, c("title", "journal", "description")])
  unique(res$au_id)
  unique(as.character(res$affilname_1))
  
  all_dat = author_data(last_name = "Muschelli", 
                        first_name = "John", verbose = FALSE, general = TRUE)
  res2 = all_dat$df
  res2 = res2 %>% 
    rename(journal = `prism:publicationName`,
           title = `dc:title`,
           description = `dc:description`)
  head(res2[, c("title", "journal", "description")])
}

Using an Institution Token

As per https://dev.elsevier.com/tecdoc_api_authentication.html, authentication can use "a proprietary token (an 'Institutional Token') created for you by our integration support team", so you need to contact Scopus to get one. If you have one, located in an object called token, you should be able to use it as:

# token is from Scopus dev
hdr = inst_token_header(token)
res = author_df(last_name = "Muschelli", first_name = "John", verbose = FALSE, general = FALSE, headers = hdr)

but I have not tried it extensively.

rscopus's People

Contributors

muschellij2, quinnasena


rscopus's Issues

Get more than 25 articles from ISSN

When I execute
x <- generic_elsevier_api(query = "ISSN(0004-3702) AND YEAR(2001)", search_type = "scopus") x$content$search-results$opensearch:totalResults``
I get 66 total results, but I can't see more than 25, which are just the first page. How can I get all the results?

Thanks for this package!!
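
A minimal workaround sketch, assuming the scopus_search() pagination shown elsewhere on this page: count is the page size and max_count caps the total number of records retrieved.

library(rscopus)
res <- scopus_search(query = "ISSN(0004-3702) AND YEAR(2001)",
                     count = 25, max_count = 100)
df <- gen_entries_to_df(res$entries)$df
nrow(df)  # should now cover all 66 results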

author_df for a list or vector of authors

I have a list of more than 9000 authors for which I would like to obtain their publications, the date of such publications and their Author ID to then run other queries. The following:

x = author_df(last_name = "Muschelli", first_name = "John", verbose = FALSE)
Returns the information I am looking for. How can I run the query for a list of authors that I have in a data frame such as:

last <- c("Cho", "Mansury", "Ye", "Florida", "Mellander")
first <- c("Jae Beum", "Yuri S.", "Xinyue", "Richard", "Charlotta")
db <- cbind(last, first)

As an example,

x = author_df(last_name = last, first_name = first, verbose = FALSE)
returns

Error in `$<-.data.frame`(`*tmp*`, "first_name", value = c("Jae Beum",  : 
  replacement has 5 rows, data has 1

I suspect I have to somehow loop through the values of my variables and/or append the results of the query somehow, but I have not been successful so far.

Thanks,

Marco
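
One workaround sketch: query one author at a time and stack the results. This assumes author_df() succeeds for each pair; Map and dplyr::bind_rows are standard R/dplyr.

library(rscopus)
library(dplyr)

last  <- c("Cho", "Mansury", "Ye", "Florida", "Mellander")
first <- c("Jae Beum", "Yuri S.", "Xinyue", "Richard", "Charlotta")

res_list <- Map(function(ln, fn) {
  tryCatch(
    author_df(last_name = ln, first_name = fn, verbose = FALSE),
    error = function(e) NULL  # skip author pairs that error out
  )
}, last, first)

all_pubs <- bind_rows(res_list)  # NULL elements are dropped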

HTTP 401

code

library(rscopus)

author_search(au_id = "57195546540",
api_key = "1b6211270eebf79d93bc47554981eb03",
http = "https://api.elsevier.com/content/search/author",
count = 25,
start = 0,
verbose = TRUE,
facets = "subjarea(sort=fd,count=350)",
searcher = "AU-ID",
view = "STANDARD",
add_query = NULL)

Error

The query list is:
list(query = "AU-ID(57195546540)", count = 25, start = 0, view = "STANDARD",
facets = "subjarea(sort=fd,count=350)")
$query
[1] "AU-ID(57195546540)"

$count
[1] 25

$start
[1] 0

$view
[1] "STANDARD"

$facets
[1] "subjarea(sort=fd,count=350)"

Response [https://api.elsevier.com/content/search/author?query=AU-ID%2857195546540%29&count=25&start=0&view=STANDARD&facets=subjarea%28sort%3Dfd%2Ccount%3D350%29]
Date: 2018-09-21 05:45
Status: 401
Content-Type: application/json;charset=UTF-8
Size: 167 B

Error in get_results(au_id, start = init_start, count = count, facets = facets, :
Unauthorized (HTTP 401).

Problems with entries_to_df and entries_to_affil_list

Many thanks for a great package!

However, I have encountered an issue when using entries_to_df and entries_to_affil_list

water <- scopus_search(query = 'TITLE-ABS-KEY ( water AND ( "academy" OR "access" OR "access to sanitation" )) AND AFFILCOUNTRY ( belgium ) AND PUBYEAR > 2008')
water_df <- entries_to_df(water$entries)
water_aff <- entries_to_affil_list(water$entries)
The error with entries_to_df is:

  |                                        | 0%
Error in `$<-.data.frame`(`*tmp*`, "n_auth", value = -Inf) :
  replacement has 1 row, data has 0
In addition: Warning message:
In max(as.numeric(res$seq)) : no non-missing arguments to max; returning -Inf

With entries_to_affil_list, it returns a list containing only NULL elements.

Please help me with this! Thank you very much!

Installation failed: Command failed (1)

devtools::install_github("muschellij2/rscopus")
x <- generic_elsevier_api(
query = "ISSN(0004-3702) AND YEAR(2001)",
search_type = "scopus")
x$content$search-results$opensearch:totalResults

error:

  • installing source package 'rscopus' ...
    ** R
    ** tests
    ** byte-compile and prepare package for lazy loading
    Error in library(rvest) : there is no package called 'rvest'
    Error : unable to load R code in package 'rscopus'
    ERROR: lazy loading failed for package 'rscopus'
  • removing 'C:/Users/Myname/Documents/R/win-library/3.5/rscopus'
  • restoring previous 'C:/Users/Myname/Documents/R/win-library/3.5/rscopus'
    In R CMD INSTALL
    Installation failed: Command failed (1)

Non-character argument error applying bibtex_core_data()

Hi -

I get the following error when applying bibtex_core_data() to the output of article_retrieval().

"Error in strsplit(title, " ") : non-character argument."

article_retrieval() itself appears to have executed successfully.

Let me know what I can provide to help and appreciate any attention you might have for this!

Retrieving text

Can you please tell me how I can use this package to retrieve the full text of an Elsevier paper, divided into its sections (introduction, methodology, conclusion, etc.)?
Thanks in advance

Paper ID for merging

Would it be possible to add the Scopus paper ID to the output data.frame so that the information can be easily merged with other sources?

Query on multiple subject areas

The scopus_search() function seems to ignore queries including LIMIT-TO() or EXCLUDE() with regard to SUBJAREA. As an example:

 completeArticle <- scopus_search(
    query = 'TITLE-ABS ("gender stereotypes") AND (SUBJAREA("PSYC") OR SUBJAREA ("MATH")) AND PUBYEAR = 2002',
    view = "COMPLETE", 
    count = 200)

## Total Entries are 36
## 3 runs need to be sent with current count
## |======================================================================================================================================| 100%
## Number of Output Entries are 36

returns 36 papers.

If I however copy the query from the scopus.com advanced search functionality,

 completeArticle <- scopus_search(
    query = '(LIMIT-TO ( SUBJAREA , "PSYC" ) OR LIMIT-TO ( SUBJAREA , "MATH"))',
    view = "COMPLETE", 
    count = 200)
## Total Entries are 55
## 3 runs need to be sent with current count
## |======================================================================================================================================| 100%
## Number of Output Entries are 55

the result contains 55 entries, which is the same as if no filter on the subject area were applied:

  completeArticle <- scopus_search(
    query = 'TITLE-ABS ("gender stereotypes") AND PUBYEAR = 2002',
    view = "COMPLETE", 
    count = 200)
## Total Entries are 55
## 3 runs need to be sent with current count
## |======================================================================================================================================| 100%
## Number of Output Entries are 55

Basic scopus search

Hi, thanks for your package, which could be really helpful if functioning properly ... I want to do a basic scopus search and directly have all the results as a dataframe in R. In the end, it should be a simple scopus export directly in R:

(screenshot: Scopus web export)

So when using rscopus I guess I need to take the scopus_search() function. Could you please assist me in how to use it?

  • for the query, I want to use the online query term (TITLE-ABS-KEY("dog*") OR TITLE-ABS-KEY("cat*") OR TITLE-ABS-KEY("bird*") OR TITLE-ABS-KEY("elephat*")) AND ( LIMIT-TO ( PUBYEAR,2019) OR LIMIT-TO ( PUBYEAR,2018) OR LIMIT-TO ( PUBYEAR,2017) OR LIMIT-TO ( PUBYEAR,2016) OR LIMIT-TO ( PUBYEAR,2015) OR LIMIT-TO ( PUBYEAR,2014) OR LIMIT-TO ( PUBYEAR,2013) OR LIMIT-TO ( PUBYEAR,2012) OR LIMIT-TO ( PUBYEAR,2011) OR LIMIT-TO ( PUBYEAR,2010) ) => how to use that in the function?
  • for the count, I guess I have to loop through my more than 40000 results, right?

Thanks for your help!
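
A minimal sketch, assuming the LIMIT-TO year filters can be rewritten as a PUBYEAR range; scopus_search() then pages through the 40,000+ hits up to max_count:

library(rscopus)
q <- 'TITLE-ABS-KEY("dog*" OR "cat*" OR "bird*" OR "elephat*") AND PUBYEAR > 2009 AND PUBYEAR < 2020'
res <- scopus_search(query = q, view = "COMPLETE", count = 25,
                     max_count = 45000, wait_time = 1)
df <- gen_entries_to_df(res$entries)$df  # one row per record, like the web export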

Use of an institution token

I wonder how I use my Scopus institution token in your package. The context is I am setting up a shiny app using your package.

bibtex_core_data no comma after doi field

Hi. I picked up an issue: the bibtex_core_data function appears to leave out the comma after the doi field. I create an entry using the scopus_id as my object identifier in abstract_retrieval.

argument inconsistency in citation_retrieval compared to article_retrieval and abstract_retrieval

I struggled quite a bit with citation_retrieval(), until I realized that the logic of the arguments is completely different:

in these two, you specify the type of identifier as a separate argument:

article_retrieval('10.1287/mnsc.2021.3997', identifier='doi')
abstract_retrieval('10.1287/mnsc.2021.3997', identifier='doi')

but in citation_retrieval, you have to use a named argument for the doi, for example:

citation_retrieval(doi='10.1287/mnsc.2021.3997')

This means that if you try to use the same logic as in the other two:

citation_retrieval('10.1287/mnsc.2021.3997', identifier='doi')

it won't fail, but it will try to use the DOI as an eid and get an empty result back. This is very confusing.

Panel data for authors?

Does rscopus include a function for gathering time-varying information on authors and/or papers? I can gather contemporary h-index and citations, for example, but I see that Scopus has annual data on these metrics.

Feasibility of Metrics for All Co-authors

I read the manual and didn't find a suitable function, but I am wondering if it is possible to get the results of a search of several author IDs and determine all of the coauthors of those senior researchers to calculate the h-index and number of publications of every coauthor of theirs. The coauthors list would be long and I'm trying to avoid manually typing it in and maintaining it. Is it achievable?

Error when no entries are returned

I am currently using scopus_search(), which is quite convenient with advanced search. However, when the number of output entries is 0, an error is thrown. This is inconvenient when used in a loop. Would you please add some code to handle this error so that it returns a data frame with 0 rows and the same column names? If the missing searches could be recorded, that would be even better. Thanks.
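
Until that lands, a minimal workaround sketch: catch the error and substitute an empty data frame.

safe_search <- function(q, ...) {
  tryCatch(
    gen_entries_to_df(scopus_search(query = q, ...)$entries)$df,
    error = function(e) data.frame()  # zero-row fallback when nothing matches
  )
}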

error while using the function process_object_retrieval()

Hello team,

I would like to get the objects of an article that are retrieved using the object_retrieval function. I am using the following code:
art <- article_retrieval(id = doi, identifier = "doi", verbose = F)
x <- process_object_retrieval(art)

And getting the below error:
Error in `colnames<-`(`*tmp*`, value = cn) :
  attempt to set 'colnames' on an object with less than two dimensions

It would be really helpful if someone could look into the issue. Let me know if any details are required from my side.

Exporting query results to bibtex

Sorry to pester you with a feature request, but it would be fantastic if there could be built-in functionality that makes it easier to generate bibtex files based on the query results. My experience using R is somewhat limited, and I'm having trouble looping through the results and the entries of those results, to generate bibtex files containing records for each entry resulting from a given query. Any help with this would be very appreciated!
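
A minimal sketch of such a loop, assuming each hit's Scopus ID can be fed through abstract_retrieval() and bibtex_core_data() (both shown elsewhere on this page) and that the latter returns a character record; the query and file name are illustrative:

library(rscopus)
res <- scopus_search(query = 'TITLE("metabolomics")', max_count = 10)
df  <- gen_entries_to_df(res$entries)$df
ids <- gsub("SCOPUS_ID:", "", df$`dc:identifier`)

bibs <- lapply(ids, function(id) {
  x <- abstract_retrieval(id, identifier = "scopus_id", verbose = FALSE)
  bibtex_core_data(x)
})
writeLines(unlist(bibs), "query_results.bib")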

Unable to generate valid bibtex

The bibtex_core_data function does not appear to generate a valid bibtex file. That is, some applications will throw an error when opening the resulting bibtex (not all though).

This appears to be due to a missing comma separator between the doi and the abstract fields.

This comma is missing in this section of code:

paste(" <key>,", " author = {<auth>},", " address = {<address>},", " title = {<title>},", " journal = {<jour>},", " year = {<year>},", " volume = {<vol>},", " number = {<number>},", " pages = {<pages>},", " doi = {<doi>}", " abstract = {<abstract>}", sep = "\n")

Error Installing rscopus

After installing the devtools and Rtools packages, when I run:
devtools::install_github("muschellij2/rscopus")
It retrieves the following:
Error: Failed to install 'rscopus' from GitHub: (converted from warning) cannot remove prior installation of package ‘curl’

The first part of the installation asks whether you want to update some packages or not. In this example, I did not update any. It seems that curl is installed by default.

Get to know the query size before download

In my experience, the API could never download more than 5000 entries at a time, but a query might yield more than 5000. Is there a way to get the size of the query before the download? For your reference: https://pybliometrics.readthedocs.io/en/stable/examples/ScopusSearch.html. In that reference, they use download = False to suppress the download and then use get_results_size. Of course, if you could adjust the code to download any number of entries, that would be best.

Thank you for the R API of scopus, which leads us to open science in Scientometrics.
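
A minimal sketch of one way to peek at the size first, assuming scopus_search() reports total_results even when max_count is tiny (q is your query string):

res <- scopus_search(query = q, count = 1, max_count = 1, verbose = FALSE)
res$total_results  # total hits, before committing to a full download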

Extracting Abstracts

My hope is to develop a script that extracts just the abstract information into a vector of character strings from an rscopus search. To start, I have created an if statement that runs the search and extracts a list of just the Scopus ID numbers:

library(rscopus)
library(stringr)  # needed for str_remove() below

if(have_api_key()) {
  #First, the scopus search, with a high max_count. If I increase the count, I usually get an error.
  res = scopus_search(query = "conservation AND translocation", 
                      max_count = 20, count = 10)
  
  #Then, list some data frames from these entries
  df = gen_entries_to_df(res$entries)
  head(df$df)
  
  #Just extract the scopus identifier
  Scopus_IDs <- df$df$`dc:identifier`
  head(Scopus_IDs)
  
  #Take SCOPUS_ID: off these numbers, so we just have the ID numbers. 
  Scopus_IDs_Clean <- str_remove(Scopus_IDs, "SCOPUS_ID:")
  head(Scopus_IDs_Clean)
}

Before I write a loop that can take each scopus ID and extract just the abstract, I want to test it with just one entry to make sure the function can provide just one abstract:

#Call just one abstract:
  x = abstract_retrieval("85081719894", identifier = "scopus_id")
  data = jsonlite::fromJSON(httr::content(x$get_statement, as = "text"), flatten=TRUE)
  
  data = data$`abstracts-retrieval-response`
  names(data)
  data$coredata
  data$coredata$`dc:description`

However, when I use this code, the abstract information comes back NULL.

> x = abstract_retrieval("85081719894", identifier = "scopus_id")
HTTP specified is:https://api.elsevier.com/content/abstract/scopus_id/85081719894

>   data = jsonlite::fromJSON(httr::content(x$get_statement, as = "text"), flatten=TRUE)
>   
>   data = data$`abstracts-retrieval-response`
>   names(data)
[1] "affiliation" "coredata"   
>   data$coredata
$srctype
[1] "j"

$`prism:issueIdentifier`
[1] "1"

$eid
[1] "2-s2.0-85081719894"

$`pubmed-id`
[1] "32165659"

$`prism:coverDate`
[1] "2020-12-01"

$`prism:aggregationType`
[1] "Journal"

$`prism:url`
[1] "https://api.elsevier.com/content/abstract/scopus_id/85081719894"

$subtypeDescription
[1] "Article"

$`dc:creator`
$`dc:creator`$author
  ce:given-name @seq ce:initials @_fa ce:surname      @auid
1        Wricha    1          W. true      Tyagi 8675093800
                                                    author-url ce:indexed-name
1 https://api.elsevier.com/content/author/author_id/8675093800        Tyagi W.
  preferred-name.ce:given-name preferred-name.ce:initials preferred-name.ce:surname
1                       Wricha                         W.                     Tyagi
  preferred-name.ce:indexed-name affiliation.@id
1                       Tyagi W.       109874624
                                                      affiliation.@href
1 https://api.elsevier.com/content/affiliation/affiliation_id/109874624

$link
  @_fa           @rel
1 true           self
2 true         scopus
3 true scopus-citedby
                                                                                       @href
1                            https://api.elsevier.com/content/abstract/scopus_id/85081719894
2  https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85081719894&origin=inward
3 https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85081719894&origin=inward

$`prism:publicationName`
[1] "Scientific Reports"

$`source-id`
[1] "21100200805"

$`citedby-count`
[1] "0"

$`prism:volume`
[1] "10"

$subtype
[1] "ar"

$`dc:title`
[1] "Root transcriptome reveals efficient cell signaling and energy conservation key to aluminum toxicity tolerance in acidic soil adapted rice genotype"

$openaccess
[1] "1"

$openaccessFlag
[1] "true"

$`prism:doi`
[1] "10.1038/s41598-020-61305-7"

$`prism:issn`
[1] "20452322"

$`article-number`
[1] "4580"

$`dc:identifier`
[1] "SCOPUS_ID:85081719894"

$`dc:publisher`
[1] "Nature Research"

>   data$coredata$`dc:description`
NULL

It should be stated that names(data) only retrieves the fields 'affiliation' and 'coredata'. Is there something that I have missed? I have attached Session_Info_9June2020.txt for more detail.

FWCI extraction

Hi, is there a way to extract the Field-Weighted Citation Impact using this package?

automatic proxy configuration

Hello

Thanks for rscopus.

I don't know whether this is general or not, but to access articles I require two settings for manual browser-based download:

https://www.lrz.de/services/netz/mobil/vpn/
https://www.lrz.de/services/netzdienste/proxy/browser-config/

After setup, interactive article retrieval works great in the browser, but not from RStudio.
I am guessing it's related to this automatic proxy configuration, and I am clueless about how to set this up in R.

`author_df` returns `chartr` error

Not sure what's going on, as this is just like the vignette:

> author_df(last_name = authorsFilter$lastName[i],
+                               first_name = authorsFilter$firstName[i],
+                              verbose=FALSE)
Error in chartr(paste(names(unwanted_array), collapse = ""), paste(unwanted_array,  : 
  'old' is longer than 'new'

> x =get_complete_author_info(last_name = authorsFilter$lastName[i],
+                               first_name = authorsFilter$firstName[i],
+                              verbose=FALSE)

I guess the example doesn't work anymore either:

> res = author_df(last_name = "Muschelli", first_name = "John", verbose = FALSE)
Error in chartr(paste(names(unwanted_array), collapse = ""), paste(unwanted_array,  : 
  'old' is longer than 'new'

here is the sessionInfo()

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rscopus_0.5.8.9000 jaffelab_0.99.17   rafalib_1.0.0      RColorBrewer_1.1-2 igraph_1.1.2       stringr_1.2.0      readxl_1.0.0      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15           rstudioapi_0.7         XVector_0.18.0         magrittr_1.5           GenomicRanges_1.30.0  
 [6] BiocGenerics_0.24.0    zlibbioc_1.24.0        IRanges_2.12.0         R6_2.2.2               rlang_0.1.6           
[11] plyr_1.8.4             httr_1.3.1             GenomeInfoDb_1.14.0    tools_3.4.3            parallel_3.4.3        
[16] utf8_1.1.3             cli_1.0.0              assertthat_0.2.0       tibble_1.4.1           crayon_1.3.4          
[21] GenomeInfoDbData_1.0.0 S4Vectors_0.16.0       bitops_1.0-6           curl_3.1               segmented_0.5-3.0     
[26] RCurl_1.95-4.10        limma_3.34.5           stringi_1.1.6          pillar_1.1.0           compiler_3.4.3        
[31] cellranger_1.1.0       stats4_3.4.3           jsonlite_1.5           pkgconfig_2.0.1       

[1] "AUTHORIZATION_ERROR"

Please help me. Here is my code:

library(rscopus)
mykey = "78e80f8644fb94c3fef909dc2e12e243"
author_retrieval(au_id = "57195546540",view = c("STANDARD"),
self_cite = c("include"),
verbose = TRUE, api_key = mykey)

error:

$get_statement
Response [https://api.elsevier.com/content/author/author_id/57195546540?view=METRICS&self-citation=exclude&apiKey=78e80f8644fb94c3fef909dc2e12e243]
Date: 2018-09-02 13:40
Status: 401
Content-Type: application/json;charset=UTF-8
Size: 167 B

$content
$content$service-error
$content$service-error$status
$content$service-error$status$statusCode
[1] "AUTHORIZATION_ERROR"

$content$service-error$status$statusText
[1] "The requestor is not authorized to access the requested view or fields of the resource"

Scopus_search does not start from specified start entry

I want to download a lot of papers from Scopus. The first 4999 went well.
Now that I want to start later, I run into problems. The API does not seem to start at the specified start point.

Here is the example; note that it says "10 runs need to be sent with current count", even though it should be only 2:

res <- scopus_search(
  query = q,
  view = "COMPLETE",
  count = 1,      # number of records to retrieve at once; below 25 for view = COMPLETE
  max_count = 10, # maximum count of records to be returned
  start = 8,
  verbose = TRUE,
  wait_time = 1
  # api_key = rscopus_key
)

Total Entries are 12405, but starting at 8
Maximum Count is 10
10 runs need to be sent with current count
|======================================================================================================| 100%
Number of Output Entries are 10

The author count is not correct for scopus_search

I have found that when using scopus_search to get the COMPLETE view, the author count is not correct. It actually returns the number of distinct author-affiliation pairs, not the number of distinct authors.

Minor issue with bibtex_core_data for punctuated titles

Hi there, thanks for the very useful package.

I have encountered a small issue when generating a .bib file using bibtex_core_data. The problem occurs when article titles have punctuation attached to the first word:

e.g.:"Comprehensiveness, accuracy, quality, credibility and readability of online information about knee osteoarthritis"

Here, split_title = strsplit(title, " ")[[1]] in the function keeps the comma attached to the first word, i.e. the first value of split_title in this example is "Comprehensiveness,".

This affects the bib key later: there is a comma in it, which produced a read-in error for me.

I came across this while modifying your function slightly for my own use, and fixed it with:
split_title = gsub("[[:punct:]]", "", strsplit(title, " ")[[1]])

thanks in advance

Alternative authorization options

I am accessing Scopus from a subscriber institution (not a university), but my machine's IP address resets every eight days. As a result, I get this error (in this case, I was using get_author_info on myself):

$`service-error`
$`service-error`$status
$`service-error`$status$statusCode
[1] "AUTHENTICATION_ERROR"

$`service-error`$status$statusText
[1] "Client IP Address: xxx.xxx.xxx.xxx does not resolve to an account"

If anyone else has this problem, here is what integration support told me:

IP address xxx.xxx.xxx.xxx is not configured for your access to Scopus and that is why you will only obtain a non-subscriber's response from API.

  • If it should be set up for access, then your Elsevier representative needs to take care of that.
  • If it shouldn't be, but you still need institutional access to API, that can be solved with an institutional token which I can provide you with.

Since my client IP address changes every eight days, I need to gain authorization using the API Key and an institution token instead of by IP address. I am still a novice at R and programming in general, so I'm not likely to solve this problem by myself soon, but functionality for insttokens or authtokens would be deeply appreciated in the future.

retrieve email address of corresponding author

Thank you very much for your rscopus package, it's been very useful so far!

However, I am still struggling using it to retrieve corresponding authors email address.

This is one of the fields offered by Scopus within the bibliographical info and inside the field "correspondence address".
When using the Scopus web interface I can download this correspondence addresses in a BibTex file.
Then opening the BibTex file it appears as a non-bibtex field named "correspondence_address1" (see below an example of one of my publications), which is perfectly suitable for me :)

@ARTICLE{deCastro-Arrazola2018,
correspondence_address1={deCastro-Arrazola, I.; Department of Biogeography and Global Change, Spain; email: [email protected]},
}

However, using the function affiliation_retrieval doesn't seem to help :(
Any idea how to retrieve this email from within rscopus?
Is it an intended Elsevier limitation?

Thank you very very much in advance,

Always says HTTP specified is (without API key):

Every time I try to run any query I get this error.

I have tried setting the API key via the .Renviron file, directly in the command with api_key = 'key', and with the function set_api_key, and each time it gives the same error. The API key works when I use it in a web browser, but for some reason rscopus does not seem to be passing it. I can't figure it out.

extra `/` in `article_retrieval` and `abstract_retrieval` URLs

The functions article_retrieval and abstract_retrieval add an extra / to the URLs they generate.

Ex:

library(rscopus)
doi = '10.1109/TASE.2014.2368997'
set_api_key('put a real API key here')
abstract_retrieval(doi, identifier = 'doi')

Message output:

HTTP specified is:http://api.elsevier.com/content/article//doi/10.1109/TASE.2014.2368997

Scopus returns a service error for this. I haven't checked whether this happens for other article identifiers, but looking at the code it seems to be generic.

In both functions, the problem is an interaction between the line

ender = paste0("/", paste(identifier, id, sep = "/"))

which prepends one /, and the fact that, in generic_elsevier_api, the large switch statement for search_type returns NULL when type == 'abstract' or 'article'. Then the line

http = paste(root_http, type, search_type, sep = "/")

is equivalent to

http = paste('http://api.elsevier.com/content', 'abstract', NULL, sep = '/')

which returns 'http://api.elsevier.com/content/abstract/'.

I see two potential fixes.

  1. In abstract_retrieval and article_retrieval, replace the problem line with something like
ender = paste(identifier, id, sep = '/')

which no longer prepends one '/'.

  2. In generic_elsevier_api, use stringr::str_c in place of paste (https://cran.r-project.org/web/packages/stringr/index.html). This function ignores 0-length strings, including NULLs. Ex:
stringr::str_c('http://api.elsevier.com/content', 'abstract', NULL, sep = '/')

returns

"http://api.elsevier.com/content/abstract"

scopus_search returns $`author-count`$`$` incremented by 1

An example:
for eid="2-s2.0-85054194688"
author count shall be 4 while 5 is returned

The code to retrieve the record:

item = scopus_search(query = "eid(2-s2.0-85054194688)",
                     view = "COMPLETE",
                     verbose = TRUE)

item$entries[[1]]$`author-count`

Downloading more than the Scopus quotas

Thanks for developing this package; it has been functioning perfectly so far. However, I have the following issue. My current search via the Scopus search function indicates that there are about 80k hits. Since the quota for this API is 20,000 publications per week, I can't download them all at once. I was wondering if there is a way to continue the download next week (when Elsevier resets the quotas) from publication 20,001 to 40,000, and after waiting another week, download 40,001-60,000.
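
A minimal resume sketch, assuming the start offset is honored (a separate report on this page describes problems with start, so verify on a small query first):

# week 1: records 1..20000
res1 <- scopus_search(query = q, count = 25, max_count = 20000, start = 0)
# week 2, after the quota resets: records 20001..40000
res2 <- scopus_search(query = q, count = 25, max_count = 20000, start = 20000)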

gen_entries_to_df returns only the first affiliation for every author

I am currently running the following code, which returns me a dataframe with all publications, all authors, and all affiliations.

res=scopus_search(query="PMID(30431561)", view="COMPLETE", max_count = 1)
df = gen_entries_to_df(res$entries)
publications=df$df
affiliations=df$affiliation
authors=df$author

However, in the authors dataframe the author is only linked to the ID of his/her first affiliation, while they might also be linked to other affiliations. In this particular case, the 5 authors are all connected to 2 different institutions, but the authors dataframe only links them to the first.

Now my question is: am I using the wrong function? (I looked at the other *_to_df functions, but that did not solve the problem.) I am currently working around this with some basic code, but it would be nice if it could be incorporated into this function (I assume that was the intent, since right now the authors dataframe just shows duplicate rows).

bibtex scopus

library(rscopus)
library(bibliometrix)  # for convert2df()

if (!is.null(api_key)) {
  x = article_retrieval(fetch_file[[1]][2],
                        identifier = "scopus_id",
                        verbose = FALSE,
                        headers = inr)

  bib = bibtex_core_data(x)
  if (!is.null(bib)) {
    fileW = sprintf("%s.bib", a)
    writeLines(bib, fileW)  # assumes bibtex_core_data() returns a character record
  }
}

files = list.files(path = "LUT", pattern = "*.bib", full.names = TRUE)
M <- convert2df(files, dbsource = "scopus", format = "bibtex")

Converting your scopus collection into a bibliographic dataframe

Warning:
In your file, some mandatory metadata are missing. Bibliometrix functions may not work properly!

Please, take a look at the vignettes:

Missing fields: AU DE ID C1 CR
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 0, 56

Too many requests

I could not thank you enough for the powerful tool you have provided, and have cited your work in my new paper.

When I tried to automate the work, however, I received the error "Too Many Requests (RFC 6585) (HTTP 429)." After investigation, I learned that Scopus has a request rate limiter (https://harzing.com/resources/publish-or-perish/manual/reference/dialogs/preferences-scopus). I wonder if rscopus has considered this issue? Can we handle this gracefully?

Thanks.

FYI: I contacted the manager of Scopus afterwards but have received no response so far, and it seems that the API key does not work any more. Sad story.
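
A minimal retry sketch, assuming the 429 surfaces as an R error that can be caught; the backoff schedule is illustrative:

with_retry <- function(f, tries = 5, wait = 10) {
  for (i in seq_len(tries)) {
    out <- tryCatch(f(), error = function(e) e)
    if (!inherits(out, "error")) return(out)
    Sys.sleep(wait * i)  # back off a little longer each attempt
  }
  stop(out)
}
res <- with_retry(function() scopus_search(query = q, max_count = 100))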

Add a "sleep" parameter in scopus_search.R when getting more than 25/200 items

It would be useful if the user could specify a Sys.sleep time that would be applied between consecutive retrievals that are called internally from the scopus_search.R function.

In my case I try to download ~5000 items in a single scopus_search call, and I get a HTTP 400 (Bad Request) Response, somewhere in the middle of retrieving the items. I suspect the API has a limit of requests per second. Thus setting such a parameter would force the scopus_search to maintain a maximum request rate.

If there is no other workaround, I will try to modify the scopus_search.R and add this functionality.
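
For reference, other reports on this page pass a wait_time argument to scopus_search(), which may already provide this; a one-line sketch:

res <- scopus_search(query = q, count = 25, max_count = 5000, wait_time = 2)  # pause 2s between page requests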

Question regarding scopus_search()

Hi, thanks for an awesome package! Very excited to see this.

When I run scopus_search(), I seem to run into an error at the same point (around 65% of the way through searching all of the results) regardless of how I set the wait time, rate, etc.

Here is the query:

scopus_search(query = "ISSN(0022-0663)", max_count = 8000, count = 25, wait_time = 7)

Here is the error that is returned:

Error in get_results(query, start = init_start, count = count, verbose = verbose, : Bad Request (HTTP 400).

Any idea what is causing this? Is it possible to handle this error and to proceed to the rest of the articles?

Getting full author name from AU-ID

If you have the au_ids, you can get the author information, but for certain individuals only the initials are included rather than the full first names:
https://www.scopus.com/authid/detail.uri?authorId=22968535800

Using the current version of rscopus:

library(rscopus)
res_full = get_complete_author_info(au_id = "22968535800")
#> HTTP specified is:http://api.elsevier.com/content/search/author?query=AU-ID(22968535800)

full_names = res_full$content$`search-results`$entry[[1]]$`name-variant`
full_names = t(sapply(full_names, function(x) {
  unlist(x[c("given-name", "surname")])
}))

Using new functions/dev version

You need the dev version for this functionality:

devtools::install_github("muschellij2/rscopus")

Here I show how to do this with the new multi_author_info function:

au_ids = c("22968535800", "40462056100")
au_ids = paste(au_ids, collapse = ",")

info = multi_author_info(au_ids)
#> HTTP specified is:http://api.elsevier.com/content/author
lapply(info, `[[`, "other_names")
#> $`22968535800`
#>       doc-count initials indexed-name surname  
#>  [1,] "1"       "C.N.R." "Rao C."     "Rao"    
#>  [2,] "1"       "C.N.R." "Rao C."     "Rao"    
#>  [3,] "1484"    "C.N.R." "Rao C."     "Rao"    
#>  [4,] "28"      "C.N.R." "Rao C."     "Rao"    
#>  [5,] "3"       "C.N."   "Rao C."     "Rao"    
#>  [6,] "2"       "C.N.R." "Rao C."     "Rao"    
#>  [7,] "4"       NULL     "RAO CNR"    "RAO CNR"
#>  [8,] "1"       "C.N.R." "Rao C."     "Rao"    
#>  [9,] "1"       "C.N."   "Rao C."     "Rao"    
#> [10,] "1"       "C.N.R." "RAO C."     "RAO"    
#>       given-name                     
#>  [1,] "C. N.Ramchandra"              
#>  [2,] "Chintamani Nagesa Ramachandra"
#>  [3,] "C. N.R."                      
#>  [4,] "C. N.Ramachandra"             
#>  [5,] "C. N."                        
#>  [6,] "Chintamani N.R."              
#>  [7,] NULL                           
#>  [8,] "Cnr N.R."                     
#>  [9,] "Cnr N."                       
#> [10,] "C. N.R."                      
#> 
#> $`40462056100`
#>     
#> [1,]
