
Percentiles (cranlogs issue, open)

r-hub commented on July 17, 2024
Percentiles

from cranlogs.

Comments (5)

gaborcsardi commented on July 17, 2024

Good idea. I don't think it is difficult to implement. You want to help with it? :)

A new SQL (plpgsql) procedure is needed here: https://github.com/metacran/cranlogs.app/blob/master/db/proc.sql


pbreheny commented on July 17, 2024

Hmm...well, I'm not sure I know enough SQL/JSON to be of much help. Algorithmically, it would seem to require:

  1. Get names of all CRAN packages
  2. Run cran_downloads on that list
  3. Calculate quantiles

Steps 2 and 3 are straightforward. Step 1 is clearly possible, but I wouldn't know how to do it through the SQL/JSON interface. Or perhaps there's a more efficient approach than all this?
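
The three steps above can be sketched in R roughly as follows. This is only an illustration, not how cranlogs works: `download_percentiles()` is a hypothetical helper, and the counts are made-up numbers standing in for the result of step 2. For step 1, `rownames(utils::available.packages())` is one way to list current CRAN package names, although querying the download API for every package at once may be slow or run into request limits.

```r
# Hypothetical helper (not part of cranlogs): percentile rank of each package
# among all packages, given a data frame with columns `package` and `count`
# (total downloads over some period).
download_percentiles <- function(totals) {
  # ecdf() returns a function F where F(x) is the share of counts <= x
  rank_fun <- stats::ecdf(totals$count)
  totals$percentile <- 100 * rank_fun(totals$count)
  # Most-downloaded packages first
  totals[order(-totals$percentile), c("package", "count", "percentile")]
}

# Made-up counts standing in for step 2 (cran_downloads on all packages)
totals <- data.frame(
  package = c("pkgA", "pkgB", "pkgC", "pkgD", "pkgE"),
  count   = c(50000, 120, 8, 900, 35),
  stringsAsFactors = FALSE
)

result <- download_percentiles(totals)
print(result)
```

With real data, the daily rows returned by `cran_downloads()` would first need to be summed per package, for example with `aggregate(count ~ package, data = counts, FUN = sum)`.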


HenrikBengtsson commented on July 17, 2024

EDIT 2021-11-30: The answer below addresses a different question (download fraction, not percentile). I've updated it to say 'fraction' instead of 'quantile'.

Since you can get the total download count for all packages by passing packages = NULL ("... for a sum of downloads for all packages."), you could use that for your denominator. Here's the gist:

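cran_download_fraction <- function(packages, ...) {
  # Per-package daily counts, and the daily sum over all packages (packages = NULL)
  counts <- cranlogs::cran_downloads(packages = packages, ...)
  total <- cranlogs::cran_downloads(packages = NULL, ...)
  # For each day, divide each package's count by that day's overall total
  z <- lapply(total$date, FUN = function(.date) {
    x <- subset(counts, date == .date)
    y <- subset(total, date == .date)
    x$fraction <- x$count / y$count
    x[, c("date", "count", "fraction", "package")]
  })
  # Stack the per-day data frames into a single data frame
  z <- do.call(rbind, z)
  rownames(z) <- NULL
  z
}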

Example:

pkgs <- c("rlang", "digest")
stats <- cran_download_fraction(pkgs, from = "2021-11-10", to = "2021-11-12")
stats
#>         date count    fraction package
#> 1 2021-11-10 86060 0.010044005   rlang
#> 2 2021-11-10 36999 0.004318129  digest
#> 3 2021-11-11 86956 0.011273038   rlang
#> 4 2021-11-11 36907 0.004784650  digest
#> 5 2021-11-12 78391 0.011641753   rlang
#> 6 2021-11-12 32555 0.004834704  digest

stats <- cran_download_fraction(pkgs, when = "last-week")
head(stats)
#>         date count    fraction package
#> 1 2021-11-17 87119 0.011624874   rlang
#> 2 2021-11-17 36247 0.004836681  digest
#> 3 2021-11-18 86853 0.012107869   rlang
#> 4 2021-11-18 37356 0.005207668  digest
#> 5 2021-11-19 72217 0.011277519   rlang
#> 6 2021-11-19 30428 0.004751684  digest

Suggestion

Add an argument fraction = FALSE to cran_downloads() and perform the above calculations internally.

Maybe fraction = TRUE could even be the default?

Limitation: The above only gives the download fraction per day. Anyone who wishes to calculate the download fraction over a longer period, say per week or per month, will have to do something else.
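
One possible way around this limitation is to sum the counts over each period before dividing. The sketch below is an illustration under assumptions, not an existing cranlogs feature: `period_fraction()` is a hypothetical helper, and the two data frames are made-up stand-ins shaped like the output of `cran_downloads()` for specific packages (`date`, `count`, `package`) and for `packages = NULL` (`date`, `count`).

```r
# Hypothetical helper: download fraction per package per period (default: month).
# `counts` has columns date, count, package; `total` has columns date, count.
period_fraction <- function(counts, total, fmt = "%Y-%m") {
  # Label each day with the period it belongs to
  counts$period <- format(counts$date, fmt)
  total$period  <- format(total$date, fmt)
  # Sum downloads within each period, per package and overall
  pkg_sum <- aggregate(count ~ package + period, data = counts, FUN = sum)
  all_sum <- aggregate(count ~ period, data = total, FUN = sum)
  names(all_sum)[names(all_sum) == "count"] <- "total"
  # Attach the period total to each package row and divide
  out <- merge(pkg_sum, all_sum, by = "period")
  out$fraction <- out$count / out$total
  out[order(out$period, out$package), c("period", "package", "count", "fraction")]
}

# Made-up data spanning a month boundary
dates <- as.Date(c("2021-10-30", "2021-10-31", "2021-11-01"))
counts <- data.frame(
  date    = rep(dates, times = 2),
  count   = c(10, 20, 30, 1, 2, 3),
  package = rep(c("pkgA", "pkgB"), each = 3),
  stringsAsFactors = FALSE
)
total <- data.frame(date = dates, count = c(100, 200, 300))

res <- period_fraction(counts, total)
print(res)
```

Passing a week-based format such as `fmt = "%Y-%U"` would group by week instead, with the usual caveats for weeks that straddle a year boundary.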


pbreheny commented on July 17, 2024

Well, this isn't really returning quantiles (or at least, not what I had in mind). rlang might represent 1.2% of all downloads on 2021-11-17, but I would assume that places it in the 99th percentile among all CRAN packages.


HenrikBengtsson commented on July 17, 2024

Doh! Fair point. I have no idea what I was thinking. I've updated my comment to say 'fraction' instead of 'quantile'.

