r-hub / cranlogs

Download Logs from the RStudio CRAN Mirror

Home Page: https://r-hub.github.io/cranlogs/

License: Other

R 99.23% Makefile 0.77%
r r-package rstats

cranlogs's People

Contributors

bryceroney, ericwatt, gaborcsardi, jbkunst, jeroen, lindbrook, maelle, patperry

cranlogs's Issues

Odd download counts recently

There has been a very large drop-off in downloads for pretty much every package. I'm not sure whether this is a cranlogs issue or something else 🤷‍♂️.

library(tidytable, warn.conflicts = FALSE)
library(data.table, warn.conflicts = FALSE)
library(cranlogs)
library(ggplot2)

cran_downloads(c("data.table", "dplyr"), from = "2020-01-01") %>%
  mutate.(year_month = yearmon(date),
          package = factor(package)) %>%
  summarize.(avg_daily_downloads = sum(count)/n(),
             .by = c(package, year_month)) %>%
  ggplot() +
  aes(x = year_month, y = avg_daily_downloads, color = package) +
  geom_point() +
  geom_line()

Percentiles

Feature request: I'm not sure how much work would be involved in implementing this, but I think it would be very useful to have a function to return percentiles for downloads, in order to be able to say things like "package X is in the top 10% of downloaded packages from CRAN".
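A minimal sketch of how such a percentile could be computed on the client side, assuming you already have a named vector of download totals for all packages of interest. The helper name `download_percentile` and the totals below are made up for illustration; nothing like this exists in cranlogs.

```r
# Hypothetical helper: percentile rank of one package among all packages.
# `counts` is a named numeric vector of download totals (names = package names).
download_percentile <- function(counts, pkg) {
  stopifnot(is.numeric(counts), !is.null(names(counts)), pkg %in% names(counts))
  # Share of packages whose total is at or below this package's total, in percent.
  100 * mean(counts <= counts[[pkg]])
}

# Made-up totals, for illustration only.
totals <- c(a = 10, b = 200, c = 3000, d = 40, e = 500)
download_percentile(totals, "e")
```

A package at the 90th percentile or above by this measure would be "in the top 10% of downloaded packages" among the packages included in `counts`.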

`cran_downloads` not working since `2020-01-01`

cranlogs::cran_downloads(
  packages = "dplyr",
  from = "2019-12-29",
  to = Sys.Date()
)
#>         date count package
#> 1 2019-12-29 20758   dplyr
#> 2 2019-12-30 28627   dplyr
#> 3 2019-12-31 23584   dplyr
#> 4 2020-01-01     0   dplyr
#> 5 2020-01-02     0   dplyr
#> 6 2020-01-03     0   dplyr

Created on 2020-01-03 by the reprex package (v0.3.0.9001)

Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.2 (2019-12-12)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       Europe/Berlin               
#>  date     2020-01-03                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version    date       lib source                           
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.0)                   
#>  backports     1.1.5      2019-10-02 [1] CRAN (R 3.6.1)                   
#>  cli           2.0.0.9000 2019-12-23 [1] Github (r-lib/cli@0293ae7)       
#>  cranlogs      2.1.1.9000 2019-05-09 [1] Github (r-hub/cranlogs@361c9d7)  
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.1)                   
#>  curl          4.3        2019-12-02 [1] CRAN (R 3.6.1)                   
#>  digest        0.6.23     2019-11-23 [1] CRAN (R 3.6.1)                   
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.0)                   
#>  fansi         0.4.0      2018-11-05 [1] Github (brodieG/fansi@ab11e9c)   
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.6.0)                   
#>  glue          1.3.1      2019-03-12 [1] CRAN (R 3.6.0)                   
#>  highr         0.8        2019-03-20 [1] CRAN (R 3.6.0)                   
#>  htmltools     0.4.0      2019-10-04 [1] CRAN (R 3.6.1)                   
#>  httr          1.4.1      2019-08-05 [1] CRAN (R 3.6.1)                   
#>  jsonlite      1.6        2018-12-07 [1] CRAN (R 3.6.0)                   
#>  knitr         1.26       2019-11-12 [1] CRAN (R 3.6.1)                   
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.5.1)                   
#>  pillar        1.4.3      2019-12-20 [1] CRAN (R 3.6.2)                   
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 3.6.1)                   
#>  purrr         0.3.3      2019-10-18 [1] CRAN (R 3.6.1)                   
#>  R6            2.4.1      2019-11-12 [1] CRAN (R 3.6.1)                   
#>  Rcpp          1.0.3      2019-11-08 [1] CRAN (R 3.6.1)                   
#>  reprex        0.3.0.9001 2019-12-30 [1] Github (tidyverse/reprex@27aa69a)
#>  rlang         0.4.2      2019-11-23 [1] CRAN (R 3.6.2)                   
#>  rmarkdown     2.0        2019-12-12 [1] CRAN (R 3.6.1)                   
#>  rstudioapi    0.10       2019-03-19 [1] CRAN (R 3.6.0)                   
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.0)                   
#>  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.6.0)                   
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.6.0)                   
#>  styler        1.2.0.9000 2020-01-02 [1] Github (r-lib/styler@ca3d2b0)    
#>  tibble        2.1.3      2019-06-06 [1] CRAN (R 3.6.2)                   
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.1)                   
#>  xfun          0.11       2019-11-12 [1] CRAN (R 3.6.1)                   
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.1)                   
#> 
#> [1] C:/Users/inp099/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.2/library

Option to count only current, unarchived packages

For convenience's sake, "archived" refers to either past versions of current, active packages or all versions of inactive packages found at https://cran.r-project.org/src/contrib/Archive/

While there's probably some interest in and use of "archived" versions (e.g., compatibility), my sense is that "archived" packages may be overrepresented in the CRAN download logs. I would think they would be rare and occasional rather than regular and frequent. While related to #45, in this case I'm referring to instances where "archived" packages are downloaded in their entirety. Below are two examples. Try other dates (e.g. "2019-12-06", "2019-12-08") and packages (you may have to look up the current version).

vars <- c("date", "time", "size", "version", "ip_id")
date <- "2019-12-04"

pkg <- "cranlogs"
current.ver <- "2.1.1" # published on 2019-04-29
sample_log <- packageRank::packageLog(pkg, date)
sample_log <- sample_log[order(sample_log$version, sample_log$ip_id), ]
sample_log[sample_log$version != current.ver & sample_log$size > 1000, vars]

              date     time  size version ip_id
5092411 2019-12-04 13:57:51 14099   2.0.0  1248
5092830 2019-12-04 13:57:51 14100   2.0.0  1248
3969680 2019-12-04 21:12:16  3559   2.0.0  2209
236723  2019-12-04 15:32:59 17450   2.1.0  1248
237393  2019-12-04 15:32:59 17449   2.1.0  1248
4168022 2019-12-04 14:29:30 17560   2.1.0  1248
4170377 2019-12-04 14:29:30 17561   2.1.0  1248
3581717 2019-12-04 05:19:04 19854   2.1.0  2209
3969681 2019-12-04 21:12:16  5645   2.1.0  2209
5063409 2019-12-04 05:04:01 19840   2.1.0  2209
3586015 2019-12-04 20:00:40 17096   2.1.0  3646

pkg <- "HistData"
current.ver <- "0.8-4" # published on 2018-04-04
sample_log <- packageRank::packageLog(pkg, date)
sample_log <- sample_log[order(sample_log$version, sample_log$ip_id), ]
sample_log[sample_log$version != current.ver & sample_log$size > 1000, vars]

              date     time   size version ip_id
3397012 2019-12-04 22:12:06 128701  0.6-11  2209
98322   2019-12-04 16:36:10 233746  0.6-12    44
3551489 2019-12-04 13:51:11 233690  0.6-12    44
4877876 2019-12-04 07:36:01 233685  0.6-12  1164
3397013 2019-12-04 22:12:06 137419  0.6-12  2209
3397014 2019-12-04 22:12:06 142693  0.6-13  2209
3397015 2019-12-04 22:12:06 144321  0.6-14  2209
3397016 2019-12-04 22:12:06 105128   0.6-4  2209
3397017 2019-12-04 22:12:06 108639   0.6-5  2209
3397018 2019-12-04 22:12:07 122814   0.6-7  2209
3397019 2019-12-04 22:12:07 123349   0.6-8  2209
4228171 2019-12-04 11:21:15 192318   0.6-9    44
3397020 2019-12-04 22:12:07 124259   0.6-9  2209
3397021 2019-12-04 22:12:07 144828   0.7-0  2209
2839810 2019-12-04 23:07:22 251000   0.7-3    44
397676  2019-12-04 09:04:26 250995   0.7-3  1164
3397022 2019-12-04 22:12:07 146243   0.7-3  2209
5099407 2019-12-04 13:59:16 251731   0.7-5  1248
5100200 2019-12-04 13:59:16 251732   0.7-5  1248
571097  2019-12-04 04:43:59 253220   0.7-5  2209
3222200 2019-12-04 04:54:47 253220   0.7-5  2209
3397023 2019-12-04 22:12:07 146560   0.7-5  2209
3397024 2019-12-04 22:12:07 148196   0.7-6  2209
3397025 2019-12-04 22:12:07 344875   0.7-8  2209
3765857 2019-12-04 05:07:05 450034   0.7-8  2209
3397026 2019-12-04 22:12:07 353161   0.8-0  2209
2194293 2019-12-04 05:23:06 476708   0.8-1  2209
3397027 2019-12-04 22:12:08 356170   0.8-1  2209
3413294 2019-12-04 01:47:43 356145   0.8-2    44
4092977 2019-12-04 15:37:57 356140   0.8-2  1248
4093061 2019-12-04 15:37:57 356139   0.8-2  1248
3397028 2019-12-04 22:12:08 236694   0.8-2  2209

aggregate counts by package over a period

We can get aggregate counts for all packages over a period and counts by date for specific packages. That is great but can we also get aggregate counts by package over a period? I know it's a simple aggregation we can do as well but it would save downloading a million records.

Something like:

cran_downloads(package = 'cranlogs', from='2020-01-01', to='2020-03-14', aggregate=TRUE)

This probably would require a change in the API for it to be an actual saving. Let me know what you think.
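Until something like that exists, the aggregation can be done on the client. The sketch below assumes a data frame with the `date`, `count` and `package` columns that `cran_downloads()` returns; the helper name `total_by_package` and the sample rows are made up. It does not reduce the amount of data transferred, which is the point of the request above.

```r
# Sum daily counts per package. `daily` needs the columns count and package.
total_by_package <- function(daily) {
  stopifnot(all(c("count", "package") %in% names(daily)))
  aggregate(count ~ package, data = daily, FUN = sum)
}

# Made-up rows in the shape returned by cran_downloads(), for illustration only.
daily <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02")),
  count = c(5, 7, 100, 50),
  package = c("pkgA", "pkgA", "pkgB", "pkgB")
)
total_by_package(daily)
```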

Since 2016, it does not work properly

Since 2016, it does not work properly:

cran_downloads(from = "2015-12-20", to = "2016-01-08", "dplyr")
         date count package
1  2015-12-20  1499   dplyr
2  2015-12-21  2724   dplyr
3  2015-12-22  2794   dplyr
4  2015-12-23  3250   dplyr
5  2015-12-24  1847   dplyr
6  2015-12-25   933   dplyr
7  2015-12-26  1628   dplyr
8  2015-12-27  1540   dplyr
9  2015-12-28  2196   dplyr
10 2015-12-29  2068   dplyr
11 2015-12-30  2047   dplyr
12 2015-12-31  1274   dplyr
13 2016-01-01     0   dplyr
14 2016-01-02     0   dplyr
15 2016-01-03     0   dplyr
16 2016-01-04     0   dplyr
17 2016-01-05     0   dplyr
18 2016-01-06     0   dplyr
19 2016-01-07     0   dplyr
20 2016-01-08     0   dplyr

row names for 0 counts

There are some occasions where the count is 0 but that seems implausible. Maybe this is due to the server being offline that day. Whatever caused the 0, the row name for that row is out of sequence (355 instead of 34 in the output below):

b <- cranlogs::cran_downloads(from="2016-01-01", to="2016-12-31", packages = c("ggplot2"))

head(b, 35)
          date count package
1   2016-01-01  2822 ggplot2
2   2016-01-02  3135 ggplot2
3   2016-01-03  3124 ggplot2
4   2016-01-04  7788 ggplot2
5   2016-01-05  7669 ggplot2
6   2016-01-06  8144 ggplot2
7   2016-01-07  8405 ggplot2
8   2016-01-08  7641 ggplot2
9   2016-01-09  4170 ggplot2
10  2016-01-10  4103 ggplot2
11  2016-01-11  8640 ggplot2
12  2016-01-12  9702 ggplot2
13  2016-01-13  9255 ggplot2
14  2016-01-14  8894 ggplot2
15  2016-01-15  8146 ggplot2
16  2016-01-16  5411 ggplot2
17  2016-01-17  4577 ggplot2
18  2016-01-18  8759 ggplot2
19  2016-01-19  9553 ggplot2
20  2016-01-20 10585 ggplot2
21  2016-01-21 10023 ggplot2
22  2016-01-22  8350 ggplot2
23  2016-01-23  4690 ggplot2
24  2016-01-24  5263 ggplot2
25  2016-01-25 10046 ggplot2
26  2016-01-26  9741 ggplot2
27  2016-01-27 10819 ggplot2
28  2016-01-28 10622 ggplot2
29  2016-01-29  9362 ggplot2
30  2016-01-30  4389 ggplot2
31  2016-01-31  4951 ggplot2
32  2016-02-01  9863 ggplot2
33  2016-02-02 10677 ggplot2
355 2016-02-03     0 ggplot2
34  2016-02-04 10036 ggplot2
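As a workaround on the user side, the row names can be reset after sorting. This is only a sketch with a made-up helper name (`renumber_rows`); it does not explain or fix the cause of the zero count.

```r
# Sort by date and drop the original row names so they run 1..n again.
renumber_rows <- function(df) {
  df <- df[order(df$date), , drop = FALSE]
  rownames(df) <- NULL
  df
}

# Three rows copied from the output above, with their original row names.
b_sample <- data.frame(
  date = as.Date(c("2016-02-02", "2016-02-03", "2016-02-04")),
  count = c(10677, 0, 10036),
  package = "ggplot2",
  row.names = c("33", "355", "34")
)
renumber_rows(b_sample)
```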

Distinguish NA and 0

I have no idea how hard this would be, but it would be nice to distinguish "package was not on CRAN" (e.g. NA) from "package was not downloaded" (e.g. 0)
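One way to approximate this on the user side is sketched below, assuming the caller already knows the package's first release date. cranlogs does not supply that date; `mark_pre_release_na` and its `first_release` argument are hypothetical.

```r
# Hypothetical helper: set counts to NA for dates before the package's first
# CRAN release. `first_release` must be supplied by the caller.
mark_pre_release_na <- function(daily, first_release) {
  first_release <- as.Date(first_release)
  daily$count[daily$date < first_release] <- NA
  daily
}

# Made-up counts, for illustration only.
daily <- data.frame(
  date = as.Date("2020-01-01") + 0:3,
  count = c(0, 0, 12, 0),
  package = "newpkg"
)
marked <- mark_pre_release_na(daily, first_release = "2020-01-03")
marked$count
```

Zeros on or after the release date are left as they are, so they still mean "no downloads recorded".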

Days with no CRAN downloads

There are 43 days when cranlogs::cran_downloads() reports that there were zero package downloads. I've checked a couple of logs at http://cran-logs.rstudio.com/; they seem to disagree.

dates <- as.Date(c("2018-01-05", "2018-02-09", "2018-02-10", "2018-02-23",
  "2018-02-24", "2018-05-06", "2018-05-12", "2018-05-19", "2018-05-27",
  "2018-07-07", "2018-07-08", "2018-07-28", "2018-08-31", "2018-10-21",
  "2017-01-12", "2017-07-16", "2017-09-01", "2017-09-02", "2016-02-03",
  "2016-06-02", "2016-06-12", "2016-07-12", "2016-07-24", "2016-08-04",
  "2016-08-11", "2016-08-13", "2016-08-14", "2016-08-20", "2016-09-02",
  "2016-09-09", "2015-08-23", "2015-09-07", "2015-09-09", "2015-10-18",
  "2015-10-26", "2015-10-31", "2015-11-01", "2015-11-15", "2014-01-01",
  "2014-11-17", "2012-12-29", "2012-12-30", "2012-12-31"))

dates <- sort(dates)

zero_downloads <- lapply(dates, function(x) {
  cranlogs::cran_downloads(from = x, to = x)
})

zero_downloads <- do.call(rbind, zero_downloads)

I'm guessing these will be fixed when you update the DB script (#45).
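For reference, zero-count days can also be picked out of a single result covering a whole date range rather than one request per day. A sketch, assuming a data frame with `date` and `count` columns; the helper name `zero_count_days` and the sample counts are made up.

```r
# Return the dates on which the reported count is exactly zero.
# `daily` needs the columns date and count.
zero_count_days <- function(daily) {
  daily$date[daily$count == 0]
}

# Made-up counts, for illustration only.
daily <- data.frame(
  date = as.Date("2018-01-03") + 0:3,
  count = c(1500000, 1400000, 0, 1300000)
)
zero_count_days(daily)
```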

FWIW

R downloads before 2015

I was trying to download the entire history of R downloads ('cranlogs' versions 2.1.1 and 2.1.1.9000, R version 3.6.1, macOS 10.14.6). I noticed that there appears to be a problem for years before 2015.

I think there are two types of problems: 1) valid logs not being "read" and 2) invalid logs (no observations).

valid logs

a) The code below computes the number of R downloads from 1 January 2015 to yesterday, and from 31 December 2014 to yesterday

# logs from 1 January 2015 to yesterday
test1 <- cranlogs::cran_downloads("R", from = "2015-01-01", to = Sys.Date() - 1)

# logs from 31 December 2014 to yesterday
test2 <- cranlogs::cran_downloads("R", from = "2014-12-31", to = Sys.Date() - 1)

As you can see, both have the same number of observations and the data for 31 December 2014 appears to be missing:

# same number of observations (rows)
identical(nrow(test1), nrow(test2))

# no data for 31 December 2014
head(test2[order(test2$date), ])
tail(test2[order(test2$date), ])

If you manually download the logs, you'll see that the log for 31 December 2014 looks OK and that its format (str) appears to be the same as that for 1 January 2015.

http://cran-logs.rstudio.com/2014/2014-12-31-r.csv.gz
http://cran-logs.rstudio.com/2015/2015-01-01-r.csv.gz

b) The code below computes the number of R downloads for the last week of 2014:

test3 <- cranlogs::cran_downloads("R", from = "2014-12-25", to = "2014-12-31")

Again, if you manually download the individual logs, they look fine:

http://cran-logs.rstudio.com/2014/2014-12-31-r.csv.gz
http://cran-logs.rstudio.com/2014/2014-12-30-r.csv.gz
http://cran-logs.rstudio.com/2014/2014-12-29-r.csv.gz
http://cran-logs.rstudio.com/2014/2014-12-28-r.csv.gz
http://cran-logs.rstudio.com/2014/2014-12-27-r.csv.gz
http://cran-logs.rstudio.com/2014/2014-12-26-r.csv.gz
http://cran-logs.rstudio.com/2014/2014-12-25-r.csv.gz

c) For what it's worth, here's the code for the calendar years 2012, 2013 and 2014:

test4 <- cranlogs::cran_downloads("R", from = "2014-01-01", to = "2014-12-31")
test5 <- cranlogs::cran_downloads("R", from = "2013-01-01", to = "2013-12-31")

# Note: the first log for 2012 begins on 01 October 2012
test6 <- cranlogs::cran_downloads("R", from = "2012-10-01", to = "2012-12-31")

# number of rows is 0 for each year:
vapply(list(test4, test5, test6), nrow, integer(1L))

invalid logs

A look at the first two logs, for 01 and 02 October 2012, shows that some logs appear to have no entries (rows). I'm not sure how widespread this is, but 2012 seems particularly problematic.

http://cran-logs.rstudio.com/2012/2012-10-01-r.csv.gz
http://cran-logs.rstudio.com/2012/2012-10-02-r.csv.gz

better error msg

Specifying a wrong date format (YYYY-DD-MM instead of YYYY-MM-DD) gives a misleading error message:

cranlogs::cran_downloads(from="2016-01-01", to="2016-31-12", packages = c("ggplot2"))
## No encoding supplied: defaulting to UTF-8.
## Error in res1$downloads : $ operator is invalid for atomic vectors
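A sketch of the kind of early check that would produce a clearer message. The helper name `check_iso_date` is hypothetical and this is not how cranlogs validates its arguments; it only covers explicit date strings, not shortcuts such as "last-week".

```r
# Hypothetical helper: fail early with a clear message if a date string is not
# a valid YYYY-MM-DD date.
check_iso_date <- function(x, arg = "date") {
  parsed <- as.Date(x, format = "%Y-%m-%d")
  if (is.na(parsed)) {
    stop(
      sprintf("`%s` must be a valid date in YYYY-MM-DD format, got \"%s\".", arg, x),
      call. = FALSE
    )
  }
  parsed
}

check_iso_date("2016-12-31", arg = "to")
# check_iso_date("2016-31-12", arg = "to") stops with an error, because there
# is no month 31.
```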

cranlogs not working?

cranlogs has stopped counting downloads since the 7th of August. Any ideas about what's happening?

cranlogs::cran_downloads(packages = "ggplot2", from = "2018-08-04", to = "2018-08-16")
#>          date count package
#> 1  2018-08-04  8056 ggplot2
#> 2  2018-08-05  8612 ggplot2
#> 3  2018-08-06 17220 ggplot2
#> 4  2018-08-07     0 ggplot2
#> 5  2018-08-08     0 ggplot2
#> 6  2018-08-09     0 ggplot2
#> 7  2018-08-10     0 ggplot2
#> 8  2018-08-11     0 ggplot2
#> 9  2018-08-12     0 ggplot2
#> 10 2018-08-13     0 ggplot2
#> 11 2018-08-14     0 ggplot2
#> 12 2018-08-15     0 ggplot2
#> 13 2018-08-16     0 ggplot2

Created on 2018-08-17 by the reprex package (v0.2.0.9000).

Compute number of unique IP addresses

Currently, cranlogs::cran_downloads() computes the number of downloads:

cran_downloads("HistData")
        date count  package
1 2019-09-27   165 HistData

Would it be possible to also compute the number of unique IP addresses per package? You'd get something like:

cran_downloads("HistData")
        date count ip_count  package
1 2019-09-27   165      107 HistData

(107 is correct by my computation)
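For anyone who wants to reproduce such a number from a raw daily log, here is a sketch. It assumes a data frame with the `package` and `ip_id` columns found in the RStudio CRAN mirror logs; the helper name `count_unique_ips` and the sample rows are made up. Note that `ip_id` is only comparable within a single day's log.

```r
# Count downloads and distinct ip_id values per package for one day's log.
# `log` needs the columns package and ip_id.
count_unique_ips <- function(log) {
  counts <- table(log$package)
  ips <- tapply(log$ip_id, log$package, function(id) length(unique(id)))
  data.frame(
    package = names(counts),
    count = as.integer(counts),
    ip_count = as.integer(ips[names(counts)]),
    row.names = NULL
  )
}

# Made-up log rows, for illustration only.
log <- data.frame(
  package = c("HistData", "HistData", "HistData", "cranlogs"),
  ip_id = c(44, 44, 2209, 1248)
)
count_unique_ips(log)
```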

No recent results

Similar to #14
Applies to all packages

test <- cran_downloads(from = "2016-09-17", to = "2016-09-22","dplyr")

test

        date count package
1 2016-09-17  3113   dplyr
2 2016-09-18     0   dplyr
3 2016-09-19     0   dplyr
4 2016-09-20     0   dplyr
5 2016-09-21     0   dplyr
6 2016-09-22     0   dplyr

cranlogs not working from 1st?

As can be seen here, there is no download count for 2019-01-01, and I am wondering whether there has been some issue with cranlogs since the new year.

cranlogs::cran_downloads(c("ggplot2", "dplyr"), "last-week")
#>          date count package
#> 1  2018-12-25  9010 ggplot2
#> 2  2018-12-26 12964 ggplot2
#> 3  2018-12-27 13846 ggplot2
#> 4  2018-12-28 12993 ggplot2
#> 5  2018-12-29  8865 ggplot2
#> 6  2018-12-30  8423 ggplot2
#> 7  2018-12-31  9523 ggplot2
#> 8  2018-12-25  8648   dplyr
#> 9  2018-12-26 12235   dplyr
#> 10 2018-12-27 14021   dplyr
#> 11 2018-12-28 12601   dplyr
#> 12 2018-12-29  8268   dplyr
#> 13 2018-12-30  8140   dplyr
#> 14 2018-12-31  9687   dplyr

Created on 2019-01-03 by the reprex package (v0.2.1)

Session info
devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value                                             
#>  version  R Under development (unstable) (2018-11-30 r75724)
#>  os       Windows 10 x64                                    
#>  system   x86_64, mingw32                                   
#>  ui       RTerm                                             
#>  language (EN)                                              
#>  collate  English_United States.1252                        
#>  ctype    English_United States.1252                        
#>  tz       Asia/Calcutta                                     
#>  date     2019-01-03                                        
#> 
#> - Packages --------------------------------------------------------------
#>  package     * version    date       lib
#>  assertthat    0.2.0      2017-04-11 [1]
#>  backports     1.1.3      2018-12-14 [1]
#>  callr         3.1.1      2018-12-21 [1]
#>  cli           1.0.1.9000 2018-10-30 [1]
#>  cranlogs      2.1.1      2018-10-24 [1]
#>  crayon        1.3.4      2017-09-16 [1]
#>  curl          3.2        2018-03-28 [1]
#>  desc          1.2.0      2018-10-30 [1]
#>  devtools      2.0.1      2018-10-26 [1]
#>  digest        0.6.18     2018-10-10 [1]
#>  evaluate      0.12       2018-10-09 [1]
#>  fs            1.2.6      2018-08-23 [1]
#>  glue          1.3.0      2018-07-17 [1]
#>  highr         0.7        2018-06-09 [1]
#>  htmltools     0.3.6      2017-04-28 [1]
#>  httr          1.4.0      2018-12-11 [1]
#>  jsonlite      1.6        2018-12-07 [1]
#>  knitr         1.21       2018-12-10 [1]
#>  magrittr      1.5        2014-11-22 [1]
#>  memoise       1.1.0      2017-04-21 [1]
#>  pkgbuild      1.0.2      2018-10-16 [1]
#>  pkgload       1.0.2      2018-10-29 [1]
#>  prettyunits   1.0.2      2015-07-13 [1]
#>  processx      3.2.1      2018-12-05 [1]
#>  ps            1.3.0      2018-12-21 [1]
#>  R6            2.3.0      2018-10-04 [1]
#>  Rcpp          1.0.0      2018-11-07 [1]
#>  remotes       2.0.2      2018-10-30 [1]
#>  rlang         0.3.0.1    2018-10-25 [1]
#>  rmarkdown     1.11       2018-12-08 [1]
#>  rprojroot     1.3-2      2018-01-03 [1]
#>  sessioninfo   1.1.1      2018-11-05 [1]
#>  stringi       1.2.4      2018-07-20 [1]
#>  stringr       1.3.1      2018-05-10 [1]
#>  testthat      2.0.1      2018-10-13 [1]
#>  usethis       1.4.0.9000 2018-12-12 [1]
#>  withr         2.1.2      2018-03-15 [1]
#>  xfun          0.4        2018-10-23 [1]
#>  yaml          2.2.0      2018-07-25 [1]
#>  source                            
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  Github (r-lib/cli@56538e3)        
#>  Github (metacran/cranlogs@554a99e)
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.5.1)                    
#>  Github (r-lib/desc@7c12d36)       
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.5.1)                    
#>  Github (r-lib/usethis@923dd75)    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.5.1)                    
#> 
#> [1] C:/Users/inp099/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-devel/library

Not Found (HTTP 404) in cran_downloads

Hello, I get an error with the following code:

library(cranlogs)

cran_downloads(from = Sys.Date()-15, to = Sys.Date(), packages = c("rgoogleads", "RAdwords"))
#> Error in cran_downloads(from = Sys.Date() - 15, to = Sys.Date(), packages = c("rgoogleads", : Not Found (HTTP 404).

Created on 2022-02-09 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> - Session info  --------------------------------------------------------------
#>  hash: face with symbols on mouth, lobster, man police officer: medium skin tone
#> 
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Windows Server x64 (build 14393)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.1252
#>  ctype    English_United States.1252
#>  tz       Europe/Helsinki
#>  date     2022-02-09
#>  pandoc   2.14.0.3 @ C:/Program Files/RStudio/bin/pandoc/ (via rmarkdown)
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date (UTC) lib source
#>  cli           3.1.0   2021-10-27 [1] CRAN (R 4.1.1)
#>  cranlogs    * 2.1.1   2019-04-29 [1] CRAN (R 4.1.1)
#>  curl          4.3.2   2021-06-23 [1] CRAN (R 4.1.1)
#>  digest        0.6.28  2021-09-23 [1] CRAN (R 4.1.1)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.1.1)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.1)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.1.1)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.1.1)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.1)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.1)
#>  httr          1.4.2   2020-07-20 [1] CRAN (R 4.1.1)
#>  jsonlite      1.7.2   2020-12-09 [1] CRAN (R 4.1.1)
#>  knitr         1.36    2021-09-29 [1] CRAN (R 4.1.1)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.1.1)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.1)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.1.1)
#>  rlang         0.4.12  2021-10-18 [1] CRAN (R 4.1.1)
#>  rmarkdown     2.11    2021-09-14 [1] CRAN (R 4.1.1)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.1)
#>  sessioninfo   1.2.1   2021-11-02 [1] CRAN (R 4.1.2)
#>  stringi       1.7.5   2021-10-04 [1] CRAN (R 4.1.1)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.1.1)
#>  withr         2.4.2   2021-04-18 [1] CRAN (R 4.1.1)
#>  xfun          0.27    2021-10-18 [1] CRAN (R 4.1.1)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.1.1)
#> 
#>  [1] C:/Users/Alsey/Documents/R/win-library/4.0
#>  [2] C:/Program Files/R/R-4.1.2/library
#> 
#> ------------------------------------------------------------------------------

CRAN release 2.1.1

(is this the right version number?)

Prepare for release:

  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Polish NEWS

Perform release:

  • Bump version (in DESCRIPTION and NEWS)
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Tag release
  • Bump dev version
  • Blog post and tweet

0 downloads since 2017-01-01

It seems that the cran_downloads function stopped working at the beginning of this year. I only get 0 for all tested packages since then.

package name errors in CRAN logs

There are 8 problem package names in RStudio's CRAN mirror download logs. They arguably shouldn't appear in the logs because they are misspellings or typos, or because the packages are no longer on CRAN or in the Archive.

pkg.name.errors <- c("calib", "clus", "ebayesthresh", "PARccs", "rcom",
"RcppTemplate", "rmosek", "tsp")

lapply(pkg.name.errors, function(x) cranlogs::cran_downloads(x, "last-month"))

'calib' is neither on CRAN nor in the archive (9 yrs old).
'clus' is probably some kind of typo.
'ebayesthresh' is an old name for 'EbayesThresh', which is on CRAN.
'PARccs' is an old name for 'pARccs', which is now archived.
'rcom' is neither on CRAN nor in the archive (7 yrs old).
'RcppTemplate' is neither on CRAN nor in the archive (10 yrs old).
'rmosek' is an old name for 'Rmosek', which is on CRAN.
'tsp' is an old name for 'TSP', which is on CRAN.
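Several of these differ from a current name only in letter case, so a case-insensitive lookup would catch them. A sketch with a hypothetical helper (`canonical_name`); the `known` vector here is a hand-picked subset, not a list fetched from CRAN, and case-only matching would not resolve a name like 'clus'.

```r
# Hypothetical helper: map logged names to canonical names by ignoring case.
# Returns NA where no known name matches.
canonical_name <- function(logged, known) {
  known[match(tolower(logged), tolower(known))]
}

known <- c("EbayesThresh", "Rmosek", "TSP")
canonical_name(c("ebayesthresh", "rmosek", "tsp", "clus"), known)
```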

`cranlogs` download counts are haphazardly 0

For example, for dates 2019-10-02 and 2019-10-17 here, the download count is zero.

cranlogs::cran_downloads(
  packages = "ggplot2",
  from = "2019-09-30",
  to = Sys.Date()
)

#>          date count package
#> 1  2019-09-30 40492 ggplot2
#> 2  2019-10-01 42182 ggplot2
#> 3  2019-10-02     0 ggplot2
#> 4  2019-10-03 38139 ggplot2
#> 5  2019-10-04 34742 ggplot2
#> 6  2019-10-05 21580 ggplot2
#> 7  2019-10-06 20301 ggplot2
#> 8  2019-10-07 39840 ggplot2
#> 9  2019-10-08 41505 ggplot2
#> 10 2019-10-09 41291 ggplot2
#> 11 2019-10-10 40191 ggplot2
#> 12 2019-10-11 34916 ggplot2
#> 13 2019-10-12 22149 ggplot2
#> 14 2019-10-13 21523 ggplot2
#> 15 2019-10-14 38852 ggplot2
#> 16 2019-10-15 41294 ggplot2
#> 17 2019-10-16 39713 ggplot2
#> 18 2019-10-17     0 ggplot2
#> 19 2019-10-18     0 ggplot2
#> 20 2019-10-19     0 ggplot2

Created on 2019-10-19 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       Europe/Berlin               
#>  date     2019-10-19                  
#> 
#> - Packages --------------------------------------------------------------
#>  package     * version    date       lib source                         
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.0)                 
#>  backports     1.1.5      2019-10-02 [1] CRAN (R 3.6.1)                 
#>  callr         3.3.2      2019-09-22 [1] CRAN (R 3.6.1)                 
#>  cli           1.1.0      2019-03-19 [1] CRAN (R 3.6.0)                 
#>  cranlogs      2.1.1.9000 2019-05-09 [1] Github (r-hub/cranlogs@361c9d7)
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.1)                 
#>  curl          4.2        2019-09-24 [1] CRAN (R 3.6.1)                 
#>  desc          1.2.0      2019-04-03 [1] Github (r-lib/desc@c860e7b)    
#>  devtools      2.2.1      2019-09-24 [1] CRAN (R 3.6.1)                 
#>  digest        0.6.21     2019-09-20 [1] CRAN (R 3.6.1)                 
#>  ellipsis      0.3.0      2019-09-20 [1] CRAN (R 3.6.1)                 
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.0)                 
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.6.0)                 
#>  glue          1.3.1      2019-03-12 [1] CRAN (R 3.6.0)                 
#>  highr         0.8        2019-03-20 [1] CRAN (R 3.6.0)                 
#>  htmltools     0.4.0      2019-10-04 [1] CRAN (R 3.6.1)                 
#>  httr          1.4.1      2019-08-05 [1] CRAN (R 3.6.1)                 
#>  jsonlite      1.6        2018-12-07 [1] CRAN (R 3.6.0)                 
#>  knitr         1.25       2019-09-18 [1] CRAN (R 3.6.1)                 
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.5.1)                 
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.0)                 
#>  pkgbuild      1.0.6      2019-10-09 [1] CRAN (R 3.6.1)                 
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.6.0)                 
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.5.1)                 
#>  processx      3.4.1      2019-07-18 [1] CRAN (R 3.6.1)                 
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.6.0)                 
#>  R6            2.4.0      2019-02-14 [1] CRAN (R 3.6.0)                 
#>  Rcpp          1.0.2      2019-07-25 [1] CRAN (R 3.6.1)                 
#>  remotes       2.1.0      2019-06-24 [1] CRAN (R 3.6.0)                 
#>  rlang         0.4.0      2019-06-25 [1] CRAN (R 3.6.1)                 
#>  rmarkdown     1.16       2019-10-01 [1] CRAN (R 3.6.1)                 
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.5.1)                 
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.0)                 
#>  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.6.0)                 
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.6.0)                 
#>  testthat      2.2.1      2019-07-25 [1] CRAN (R 3.6.1)                 
#>  usethis       1.5.1.9000 2019-10-18 [1] Github (r-lib/usethis@55cff6e) 
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.1)                 
#>  xfun          0.10       2019-10-01 [1] CRAN (R 3.6.1)                 
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.1)                 
#> 
#> [1] C:/Users/inp099/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.1/library

limit on number of packages as argument to cran_downloads

Hi,

I tried to get download counts for 8000 packages and ran into an HTTP 414 (Request-URI Too Long) error. After some trial and error, the limit seems to be 905 packages, reproducible with the following code:

cran_downloads(package = rep('cranlogs', 906))

I can split up the requests, but it would be nicer to have that done by the package. Also, the limit is not documented. Let me know if I'm doing something the package wasn't intended for.
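A sketch of the manual splitting, for reference. The wrapper name `downloads_in_chunks`, its `fetch` argument and the default `chunk_size` of 500 are assumptions made for illustration. The 905 figure above was found with one short package name, so the real limit presumably depends on the total length of the names in the request.

```r
# Hypothetical wrapper: query packages in chunks to keep each request URL short.
# `fetch` performs one request for a character vector of package names; in real
# use it could be function(p) cranlogs::cran_downloads(packages = p).
downloads_in_chunks <- function(packages, fetch, chunk_size = 500) {
  packages <- unique(packages)
  groups <- split(packages, ceiling(seq_along(packages) / chunk_size))
  out <- do.call(rbind, lapply(groups, fetch))
  rownames(out) <- NULL
  out
}

# Stand-in for the network call, so the sketch runs offline.
fake_fetch <- function(p) data.frame(package = p, count = nchar(p))
res <- downloads_in_chunks(paste0("pkg", 1:7), fake_fetch, chunk_size = 3)
res
```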

cranlogs - no download counts on 04-28-2019

The download count for 2019-04-28 was 0. Not sure if this was a problem for just that day or if it represents a more systematic issue that started on that date.

cranlogs::cran_downloads(
  packages = "ggplot2",
  from = "2019-04-20",
  to = Sys.Date()
)
#>          date count package
#> 1  2019-04-20 15283 ggplot2
#> 2  2019-04-21 13779 ggplot2
#> 3  2019-04-22 25061 ggplot2
#> 4  2019-04-23 31446 ggplot2
#> 5  2019-04-24 31365 ggplot2
#> 6  2019-04-25 29973 ggplot2
#> 7  2019-04-26 27867 ggplot2
#> 8  2019-04-27 16977 ggplot2
#> 9  2019-04-28     0 ggplot2
#> 10 2019-04-29     0 ggplot2
#> 11 2019-04-30     0 ggplot2

Created on 2019-04-30 by the reprex package (v0.2.1.9000)

Session info
devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value                                    
#>  version  R version 3.6.0 alpha (2019-03-29 r76300)
#>  os       Windows 10 x64                           
#>  system   x86_64, mingw32                          
#>  ui       RTerm                                    
#>  language (EN)                                     
#>  collate  English_United States.1252               
#>  ctype    English_United States.1252               
#>  tz       America/New_York                         
#>  date     2019-04-30                               
#> 
#> - Packages --------------------------------------------------------------
#>  package     * version    date       lib
#>  assertthat    0.2.1      2019-03-21 [1]
#>  backports     1.1.4      2019-04-10 [1]
#>  callr         3.2.0      2019-03-15 [1]
#>  cli           1.1.0      2019-03-19 [1]
#>  cranlogs      2.1.1.9000 2019-04-30 [1]
#>  crayon        1.3.4      2017-09-16 [1]
#>  curl          3.3        2019-01-10 [1]
#>  desc          1.2.0      2019-04-03 [1]
#>  devtools      2.0.2      2019-04-08 [1]
#>  digest        0.6.18     2018-10-10 [1]
#>  evaluate      0.13       2019-02-12 [1]
#>  fs            1.2.7      2019-03-19 [1]
#>  glue          1.3.1      2019-03-12 [1]
#>  highr         0.8        2019-03-20 [1]
#>  htmltools     0.3.6      2017-04-28 [1]
#>  httr          1.4.0      2018-12-11 [1]
#>  jsonlite      1.6        2018-12-07 [1]
#>  knitr         1.22.8     2019-04-13 [1]
#>  magrittr      1.5        2014-11-22 [1]
#>  memoise       1.1.0      2017-04-21 [1]
#>  pkgbuild      1.0.3      2019-03-20 [1]
#>  pkgload       1.0.2      2018-10-29 [1]
#>  prettyunits   1.0.2      2015-07-13 [1]
#>  processx      3.3.0      2019-03-10 [1]
#>  ps            1.3.0      2018-12-21 [1]
#>  R6            2.4.0      2019-02-14 [1]
#>  Rcpp          1.0.1      2019-03-17 [1]
#>  remotes       2.0.4      2019-04-10 [1]
#>  rlang         0.3.4      2019-04-07 [1]
#>  rmarkdown     1.12.3     2019-03-25 [1]
#>  rprojroot     1.3-2      2018-01-03 [1]
#>  sessioninfo   1.1.1      2018-11-05 [1]
#>  stringi       1.4.3      2019-03-12 [1]
#>  stringr       1.4.0      2019-02-10 [1]
#>  testthat      2.1.1      2019-04-23 [1]
#>  usethis       1.5.0      2019-04-07 [1]
#>  withr         2.1.2      2018-03-15 [1]
#>  xfun          0.6        2019-04-02 [1]
#>  yaml          2.2.0      2018-07-25 [1]
#>  source                            
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  Github (r-hub/cranlogs@509b780)   
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.6.0)                    
#>  Github (r-lib/desc@c860e7b)       
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  Github (yihui/knitr@cf3c219)      
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.5.3)                    
#>  Github (rstudio/rmarkdown@503cc5f)
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.5.1)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.5.1)                    
#> 
#> [1] C:/Users/inp099/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.0alpha/library

`cran_downloads()` throws an error

I am trying to use the cranlogs package, but currently cran_downloads() is throwing the following error

No encoding supplied: defaulting to UTF-8.
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE HTML PUBLIC "-//W3C//
                     (right here) ------^

Thanks
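
The parser error shows that the response began with an HTML doctype rather than JSON, which suggests the server returned an error page. The following is a minimal sketch of a guard that could run before parsing, assuming the response body is available as a character string; `looks_like_html()` is a hypothetical helper and the JSON sample is made up for illustration.

```r
# Hypothetical guard: TRUE when a response body starts like an HTML page
# instead of JSON.
looks_like_html <- function(body) {
  grepl("^\\s*<(!DOCTYPE|html)", body, ignore.case = TRUE)
}

html_body <- "<!DOCTYPE HTML PUBLIC \"-//W3C//"   # start of the body in the error above
json_body <- "[{\"downloads\": []}]"               # made-up JSON sample

looks_like_html(html_body)  # TRUE
looks_like_html(json_body)  # FALSE
```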

Option to not count downloads < 1000 bytes

To compute the number of package downloads for a given day, cranlogs::cran_downloads() counts the number of entries (number of rows) for that package in CRAN's download logs. Would it be possible to add an optional argument so that observations with sizes less than 1000 bytes do not count toward the number of package downloads?

Two reasons. First, it's hard to say that such observations really represent a package download. Second, while much of this may just be unsuccessful/aborted downloads, I think that some of this is more than random noise.

Using 2019-10-23 as an example, here's what I found.

My back-of-envelope estimate is that around 5% (233,722 / 5,097,912) of all downloads on 2019-10-23 were smaller than 1000 bytes, typically around 500 bytes.

Here's an example. On 2019-10-23 'rstan' was downloaded 2,574 times.

> cranlogs::cran_downloads("rstan", from = "2019-10-23", to = "2019-10-23")
        date count package
1 2019-10-23  2574   rstan

But if you look at the logs (RStudio's CRAN mirror at http://cran-logs.rstudio.com), you'll see that there are 40 entries smaller than 1000 bytes:

date time size package version country ip_id
1438000 2019-10-23 19:49:15 531 rstan 2.11.1 US 7
1438001 2019-10-23 19:49:15 537 rstan 2.12.1 US 7
1438002 2019-10-23 19:49:15 537 rstan 2.13.2 US 7
1438003 2019-10-23 19:49:15 531 rstan 2.14.1 US 7
1438004 2019-10-23 19:49:15 537 rstan 2.14.2 US 7
1438005 2019-10-23 19:49:15 537 rstan 2.15.1 US 7
1438006 2019-10-23 19:49:15 531 rstan 2.16.2 US 7
1438007 2019-10-23 19:49:15 537 rstan 2.17.2 US 7
1438008 2019-10-23 19:49:15 531 rstan 2.17.3 US 7
1438009 2019-10-23 19:49:15 537 rstan 2.17.4 US 7
1438010 2019-10-23 19:49:15 537 rstan 2.18.1 US 7
1438011 2019-10-23 19:49:15 537 rstan 2.18.2 US 7
1438012 2019-10-23 19:49:15 533 rstan 2.7.0-1 US 7
1438013 2019-10-23 19:49:15 533 rstan 2.8.0 US 7
1438014 2019-10-23 19:49:15 533 rstan 2.8.1 US 7
1438015 2019-10-23 19:49:15 539 rstan 2.8.2 US 7
1438016 2019-10-23 19:49:15 539 rstan 2.9.0-3 US 7
1438017 2019-10-23 19:49:15 539 rstan 2.9.0 US 7
1438121 2019-10-23 19:49:14 537 rstan 2.10.1 US 7
3607030 2019-10-23 10:59:51 534 rstan 2.19.2 <NA> 5
3702500 2019-10-23 20:16:37 537 rstan 2.10.1 US 7
3702501 2019-10-23 20:16:38 537 rstan 2.11.1 US 7
3702502 2019-10-23 20:16:38 537 rstan 2.12.1 US 7
3702503 2019-10-23 20:16:38 531 rstan 2.13.2 US 7
3702504 2019-10-23 20:16:38 531 rstan 2.14.1 US 7
3702505 2019-10-23 20:16:38 537 rstan 2.14.2 US 7
3702506 2019-10-23 20:16:38 537 rstan 2.15.1 US 7
3702507 2019-10-23 20:16:38 531 rstan 2.16.2 US 7
3702508 2019-10-23 20:16:38 537 rstan 2.17.2 US 7
3702509 2019-10-23 20:16:38 537 rstan 2.17.3 US 7
3702510 2019-10-23 20:16:38 531 rstan 2.17.4 US 7
3702511 2019-10-23 20:16:38 531 rstan 2.18.1 US 7
3702512 2019-10-23 20:16:38 537 rstan 2.18.2 US 7
3702513 2019-10-23 20:16:38 539 rstan 2.7.0-1 US 7
3702514 2019-10-23 20:16:39 539 rstan 2.8.0 US 7
3702515 2019-10-23 20:16:39 539 rstan 2.8.1 US 7
3702516 2019-10-23 20:16:39 539 rstan 2.8.2 US 7
3702517 2019-10-23 20:16:39 539 rstan 2.9.0-3 US 7
3702518 2019-10-23 20:16:39 533 rstan 2.9.0 US 7
4186380 2019-10-23 18:29:29 530 rstan 2.19.2 US 7

For what it's worth, here's the code for the above log data using packageLog() from the development version of 'packageRank' (v0.3.0.9000) on https://github.com/lindbrook/packageRank

rstan.log <- packageRank::packageLog("rstan", "2019-10-23")
vars <- c("date", "time", "size", "package", "version", "country", "ip_id")
rstan.log[rstan.log$size < 1000, vars]

While 40 of 2574 downloads is small percentage-wise (1.6%), you'll see that the overwhelming majority of these observations come from a single IP address that is "downloading" different (possibly all) versions of 'rstan'.

While people may, of course, be interested in previous versions of a package and while many people are using network address translation (NAT), this kind of activity is not an isolated event. You'll find it across many packages intermittently throughout the month.

It even extends to "archived" packages (those that are not included on CRAN's main listing). For example, we see that 'bim' was downloaded 12 times:

> cranlogs::cran_downloads("bim", from = "2019-10-23", to = "2019-10-23")
        date count package
1 2019-10-23    12     bim

But all 12 of those downloads look like this:

date time size package version country ip_id
1746102 2019-10-23 19:39:06 539 bim 0.92-3 US 7
1746103 2019-10-23 19:39:06 533 bim 0.93-1 US 7
1746105 2019-10-23 19:39:06 537 bim 1.01-1 US 7
1746107 2019-10-23 19:39:06 537 bim 1.01-3 US 7
1746108 2019-10-23 19:39:06 537 bim 1.01-4 US 7
1746109 2019-10-23 19:39:06 537 bim 1.01-5 US 7
1937591 2019-10-23 19:12:29 533 bim 0.92-3 US 7
1937592 2019-10-23 19:12:29 539 bim 0.93-1 US 7
1937593 2019-10-23 19:12:29 537 bim 1.01-1 US 7
1937594 2019-10-23 19:12:29 537 bim 1.01-3 US 7
1937595 2019-10-23 19:12:29 537 bim 1.01-4 US 7
1937596 2019-10-23 19:12:29 537 bim 1.01-5 US 7

bim.log <- packageRank::packageLog("bim", "2019-10-23")
vars <- c("date", "time", "size", "package", "version", "country", "ip_id")
bim.log[, vars]

In this case, it's hard to say that this package was really downloaded.
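
The proposed filter can be sketched as follows. This is not how cranlogs computes its counts: `count_downloads()` and its `min_size` argument are hypothetical names, the input is assumed to be a log data frame with the `package` and `size` columns shown above, and the 1000-byte threshold is the one suggested in this issue. The sample rows are made up.

```r
# Hypothetical helper: count log entries per package, ignoring entries
# smaller than `min_size` bytes.
count_downloads <- function(log, min_size = 1000) {
  keep <- !is.na(log$package) & !is.na(log$size) & log$size >= min_size
  counts <- table(log$package[keep])
  data.frame(package = as.character(names(counts)),
             count = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Made-up sample in the layout of the log excerpts above.
sample_log <- data.frame(
  package = c("rstan", "rstan", "rstan", "bim"),
  size    = c(531, 537, 5000000, 539),
  stringsAsFactors = FALSE
)

count_downloads(sample_log)                # only the full-size rstan entry remains
count_downloads(sample_log, min_size = 0)  # every entry counts, as in the current behavior
```

Packages whose entries all fall below the threshold, like 'bim' here, drop out of the result rather than appearing with a count of zero.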

Missing day when specifying "when" argument

When using cran_downloads() just now I get the following behavior:

cran_downloads(when = "last-day", package = "ggplot2")
#         date count package
# 1 2017-09-02     0 ggplot2

cran_downloads(when = "last-week", package = "ggplot2")
#         date count package
# 1 2017-08-25 10056 ggplot2
# 2 2017-08-26  5554 ggplot2
# 3 2017-08-27  4916 ggplot2
# 4 2017-08-28 12029 ggplot2
# 5 2017-08-29 13470 ggplot2
# 6 2017-08-30 14024 ggplot2
# 7 2017-08-31 14205 ggplot2

I was expecting "last-week" to include Sept 1, but it was skipped over. Is this the intended behavior or is it a bug?

`cran_downloads` returning 0s starting from 22.01.2020

Maybe this is a temporary issue, but raising the issue just in case this is pointing to something more systematic.

cranlogs::cran_downloads(
  packages = "ggplot2",
  from = "2020-01-20",
  to = Sys.Date()
)

#>         date count package
#> 1 2020-01-20 34508 ggplot2
#> 2 2020-01-21 41155 ggplot2
#> 3 2020-01-22     0 ggplot2
#> 4 2020-01-23     0 ggplot2
#> 5 2020-01-24     0 ggplot2
#> 6 2020-01-25     0 ggplot2
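
A trailing run of zeros like this one can be separated from the rest of the series before plotting or averaging. The following is a minimal sketch, assuming a single-package data frame in the `cran_downloads()` layout shown above. `trailing_zero_dates()` is a hypothetical helper, and reading trailing zeros as "not yet available" rather than true zeros is an assumption.

```r
# Hypothetical helper: dates in the run of zero counts at the end of a
# single-package series with columns `date` and `count`.
trailing_zero_dates <- function(x) {
  x <- x[order(x$date), ]
  nonzero <- which(x$count > 0)
  last_nonzero <- if (length(nonzero) > 0) max(nonzero) else 0L
  x$date[seq_len(nrow(x)) > last_nonzero]
}

# The counts reported above.
ggplot2_counts <- data.frame(
  date    = as.Date("2020-01-20") + 0:5,
  count   = c(34508, 41155, 0, 0, 0, 0),
  package = "ggplot2"
)

trailing_zero_dates(ggplot2_counts)  # 2020-01-22 through 2020-01-25
```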

`cran_downloads` not working?

Maybe this is a temporary problem, but I noticed that download counts are 0 for the 1st of Sept.

cranlogs::cran_downloads(
  packages = "ggplot2",
  from = "2019-08-30",
  to = "2019-09-01"
)
#>         date count package
#> 1 2019-08-30 27377 ggplot2
#> 2 2019-08-31 13531 ggplot2
#> 3 2019-09-01     0 ggplot2

Created on 2019-09-03 by the reprex package (v0.3.0)

502 error: Bad Gateway

Hi,

Trying the package with default settings gives the following error for me:

Error in cran_downloads() : Bad Gateway (HTTP 502)

Here is an example:

devtools::install_github("r-hub/cranlogs")
library("cranlogs")
cran_downloads()
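
A 502 usually comes from the server side and may clear on its own, so one workaround is to retry the call a few times. The following is a minimal sketch under that assumption; `with_retries()` is a hypothetical helper, not part of cranlogs, and the `attempts` and `wait` defaults are arbitrary.

```r
# Hypothetical helper: call `fetch(...)` up to `attempts` times, pausing
# longer after each failure, and re-raise the last error if all attempts fail.
with_retries <- function(fetch, ..., attempts = 3L, wait = 5) {
  stopifnot(attempts >= 1L)
  for (i in seq_len(attempts)) {
    result <- tryCatch(fetch(...), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (i < attempts) Sys.sleep(wait * i)
  }
  stop("Request failed after ", attempts, " attempts: ",
       conditionMessage(result), call. = FALSE)
}

# Usage (requires network access):
# downloads <- with_retries(cranlogs::cran_downloads, packages = "ggplot2")
```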

add user agent

"cranlogs R package".

not a parameter exposed to the user.

last-week, last-month vs. last 7 days, last 30 days

It seems that the values last-week and last-month of the when argument really mean the last 7 days and the last 30 days, respectively. Given the current names, one might expect to get the last full week (Mon - Sun) and the last full calendar month (28, 29, 30, or 31 days).

Thus, my suggestion is to introduce more explicit names (last-7-days, last-30-days), perhaps alongside the existing ones: keep last-week and last-month for backward compatibility, but discourage their use.
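
The two readings can be made explicit by passing `from` and `to` instead of `when`. The following is a minimal sketch: `last_n_days()` and `last_full_month()` are hypothetical helpers (neither exists in cranlogs), and the one-day `lag` is an assumption so that the range ends yesterday rather than today.

```r
# Hypothetical helper: the last `n` days, ending `lag` days before `today`.
last_n_days <- function(n, today = Sys.Date(), lag = 1L) {
  to <- as.Date(today) - lag
  list(from = to - (n - 1L), to = to)
}

# Hypothetical helper: the last complete calendar month before `today`.
last_full_month <- function(today = Sys.Date()) {
  first_of_this_month <- as.Date(format(as.Date(today), "%Y-%m-01"))
  to <- first_of_this_month - 1
  list(from = as.Date(format(to, "%Y-%m-01")), to = to)
}

rng <- last_n_days(7, today = as.Date("2017-09-02"))
# rng$from is 2017-08-26 and rng$to is 2017-09-01

# Usage (requires network access):
# cranlogs::cran_downloads("ggplot2", from = rng$from, to = rng$to)
```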

Change to new cran checks badge URL

👋🏽 I maintain the cran checks badges. Please change to the new cran checks badge URL (e.g., https://badges.cranchecks.info/worst/dplyr.svg). Old badges (e.g., https://cranchecks.info/badges/worst/dplyr) will be unavailable as of Jan 1st 2023.

default value of "last-day" = today, should point to yesterday

When I call cran_downloads() it gives me the data for today, which is always 0 downloads.

library(cranlogs)
cran_downloads()
#>         date count
#> 1 2017-02-08     0

This isn't a problem when I set the when argument to "last-week":

library(cranlogs)
cran_downloads(NULL, "last-week")
#>         date   count
#> 1 2017-02-01 1205129
#> 2 2017-02-02 1139361
#> 3 2017-02-03 1106105
#> 4 2017-02-04  570183
#> 5 2017-02-05  571141
#> 6 2017-02-06 1129596
#> 7 2017-02-07 1253377

Don't use sapply()

> cran_downloads("ggplot3")
 Error in as.Date.default(sapply(x$downloads, "[[", "day")) : 
  do not know how to convert 'sapply(x$downloads, "[[", "day")' to class “Date”
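
A likely cause, inferred from the error message rather than confirmed in this issue, is that no download records come back for a nonexistent package such as "ggplot3". `sapply()` over an empty list returns `list()` instead of a character vector, which `as.Date()` rejects. `vapply()` has a fixed return type, so the empty case stays a character vector. The `empty` and `records` objects below are made-up stand-ins for `x$downloads`.

```r
empty   <- list()
records <- list(list(day = "2017-02-07", downloads = 10L))

# sapply() cannot know the intended type when there is nothing to simplify.
bad <- sapply(empty, "[[", "day")
is.list(bad)   # TRUE: an empty list, not a character vector

# vapply() always returns the declared type, here character.
good <- vapply(empty, "[[", character(1), "day")
as.Date(good)  # a zero-length Date vector, no error

days <- as.Date(vapply(records, "[[", character(1), "day"))
```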

bioconductor stats

I realize this is cranlogs and not bioclogs, but is it possible to retrieve Bioconductor stats in addition to CRAN stats? If not, would it be difficult to add or is it just outside the scope of this project?

country variable typos in CRAN logs

I believe there are two "universal" typos in the country variable (probably) across all RStudio CRAN logs: "A1" (A + number one) and "A2" (A + number 2).

For what it's worth, using "2019-10-23" as an example (Wednesdays are usually high traffic days), here's the code I used:

ymd <- as.Date("2019-10-23")
year <- as.POSIXlt(ymd)$year + 1900
rstudio.url <- "http://cran-logs.rstudio.com/"
url <- paste0(rstudio.url, year, '/', ymd, ".csv.gz")
cran_log <- data.table::fread(url)
sort(unique(cran_log$country))

"A1" and "A2" are the first two elements of the vector. I haven't found any other country codes with numbers mixed in.
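
The check for codes with digits mixed in can be automated rather than done by eye. The following is a minimal sketch, assuming the `country` column holds two-letter uppercase codes or `NA`; `nonalpha_country_codes()` is a hypothetical helper and the sample vector is made up.

```r
# Hypothetical helper: distinct codes that are not exactly two capital letters.
nonalpha_country_codes <- function(codes) {
  codes <- unique(codes[!is.na(codes)])
  sort(codes[!grepl("^[A-Z]{2}$", codes)])
}

nonalpha_country_codes(c("US", "A1", "DE", NA, "A2", "US"))

# With the log loaded as above:
# nonalpha_country_codes(cran_log$country)
```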

cranlogs::cran_downloads() overcounts downloads on 8 days at end of 2012 and beginning of 2013

This is for posterity's sake but I hope it'll be fixed.

For eight days at the end of 2012 and the beginning of 2013, cranlogs::cran_downloads() returns counts that are double or even triple what they should be. I'm fairly confident of this conclusion because the numbers I get are derived by directly downloading the logs from RStudio and counting the number of log entries.

The code for my analysis:

library(cranlogs)
library(packageRank)

start.date <- "2012-10-01"
end.date <- "2013-01-05"

# The expression below uses 'cranlogs' to compute the total number of
# downloads for all of CRAN on the dates above:

cranlogs.data <- cranlogs::cran_downloads(from = start.date, to = end.date)

# The code below uses 'packageRank' and the "raw" RStudio logs to compute
# the total number of downloads for all CRAN packages on the dates above.

# There are two functions to note: fixDate_2012(), which is part of
# 'packageRank' but is not exported (not in namespace) and
# packageRank::fetchCranLog().

# fixDate_2012() fixes mis-labelled filenames/URL and duplicate logs

fixDate_2012 <- function(date = "2012-12-31") {
  if (class(date) != "Date") ymd <- as.Date(date)
  else ymd <- date
  if (format(ymd, "%Y") == "2012") {
    if (ymd %in% as.Date(c("2012-12-29", "2012-12-30", "2012-12-31"))) {
      stop("Log for ", ymd, " is missing/unavailable.", call. = FALSE)
    } else if (ymd >= as.Date("2012-10-13") & ymd <= as.Date("2012-12-28")) {
      ymd <- ymd + 3
    } else if (ymd %in% as.Date(c("2012-10-11", "2012-10-12"))) {
      if (identical(ymd, as.Date("2012-10-11"))) {
        ymd <- as.Date("2012-10-12")
      } else if (identical(ymd, as.Date("2012-10-12"))) {
        ymd <- as.Date("2012-10-14")
      }
    }
  }
  ymd
}

# packageRank::fetchCranLog(date, memoization = FALSE)
# retrieves logs by their "literal" or exact filename/URL

d <- seq(from = as.Date(start.date), to = as.Date(end.date), by = "day")

packageRank.data <- vapply(d, function(x) {
  tmp <- try(packageRank::fetchCranLog(fixDate_2012(x), TRUE), silent = TRUE)
  if (any(class(tmp) == "try-error")) 0L
  else nrow(tmp[!is.na(tmp$package), ])
}, integer(1L))

packageRank.data <- data.frame(date = d, count = packageRank.data)

# Merge the two data frames by calendar date:
cran.data <- merge(cranlogs.data, packageRank.data, by = "date")
names(cran.data)[-1] <- c("cranlogs", "packageRank")

# Compute the ratio of counts of 'cranlogs' to 'packageRank'
cran.data$ratio <- cran.data$cranlogs / cran.data$packageRank

# If you take a look at `cran.data`, you'll see that generally,
# you get the same exact results for both methods except for
# 8 discrepancies or errors:

errors <- cran.data[cran.data$cranlogs != cran.data$packageRank, ]

# > errors
#          date cranlogs packageRank    ratio
# 6  2012-10-06    13630        6815 2.000000
# 7  2012-10-07       50          25 2.000000
# 8  2012-10-08      170          85 2.000000
# 11 2012-10-11      388         194 2.000000
# 87 2012-12-26    80738       26910 3.000297
# 88 2012-12-27    49007       24501 2.000204
# 89 2012-12-28    21959       10979 2.000091
# 93 2013-01-01    21822       10911 2.000000

These ratios are generally whole numbers. This leads me to believe that there may be computational errors in 'cranlogs'.

  1. I'm not sure what's going on with "2012-10-06".

  2. I believe that the problems with "2012-10-07", "2012-10-08" and "2012-10-11" stem from the fact that those logs are actually duplicated in the RStudio logs.

Nominal          Actual log in file/URL
2012-10-07 ----- 2012-10-07
2012-10-11 ----- 2012-10-07

2012-10-08 ----- 2012-10-08
2012-10-13 ----- 2012-10-08

2012-10-12 ----- 2012-10-11
2012-10-15 ----- 2012-10-11

This overcounting makes sense because, as you wrote in issue #54, you rely on the data in the files and not the filenames/URLs. By doing so, you may have ended up double counting.

  3. I haven't sorted out what's going on with the 4 remaining dates ("2012-12-26", "2012-12-27", "2012-12-28", "2013-01-01") but I'm guessing it has something to do with the fact that they surround the 3 missing/lost RStudio logs ("2012-12-29", "2012-12-30", "2012-12-31").

Note that the ratios for the three December dates are not whole numbers. However, I did a sanity check using the top six packages for each of the three days; they all returned whole number multiples. If useful, I can provide more details.
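
The nominal-versus-actual comparison in the table above can be checked per file by comparing the date in the filename with the dates recorded inside the log. The following is a minimal sketch, assuming a log data frame with a `date` column as in the RStudio logs; `log_date_check()` is a hypothetical helper and the sample rows are made up.

```r
# Hypothetical helper: compare a log's nominal (filename) date with the
# dates actually recorded in its rows.
log_date_check <- function(log, nominal) {
  nominal <- as.Date(nominal)
  actual <- sort(unique(as.Date(log$date)))
  list(nominal = nominal,
       actual = actual,
       matches = length(actual) == 1L && actual == nominal)
}

# Made-up rows standing in for a file named 2012-10-11 that really
# contains the 2012-10-07 log.
sample_log <- data.frame(date = rep("2012-10-07", 3),
                         package = c("a", "b", "c"),
                         stringsAsFactors = FALSE)

log_date_check(sample_log, "2012-10-11")$matches  # FALSE
log_date_check(sample_log, "2012-10-07")$matches  # TRUE
```

A file whose contents belong to a different day would be counted twice by any tally that trusts the dates inside the file, which is consistent with the doubled counts reported above.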
