r-hub / cransays

Creates an Overview of CRAN Incoming Submissions :mailbox_with_mail:

Home Page: https://r-hub.github.io/cransays/articles/dashboard.html

License: Other

Languages: R 100.0%

Topics: cran, cran-r, r, r-package, r-packages, rstats

cransays's People

Contributors

alexisderumigny, bbolker, bisaloo, dependabot[bot], gadenbuie, hadley, jeroen, jimhester, llrs, maelle, mitchelloharawild, olivroy, stephlocke


cransays's Issues

Function to analyze history branch/data

I know cransays is not really meant to deliver code, but I have some code that merges all the CSV files of the history branch, and I think it would be helpful to others (and myself) if it were documented here.
The code handles merging files with different headers efficiently (previous iterations took 30 minutes; now it runs in just one).

I think it has no dependencies and wouldn't need to be run or tested, but it could help others who want to analyze the data.

Let me know if it would be helpful/appropriate, and I will create a PR with the code.
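The merging step described above could be sketched like this (a hypothetical illustration, not the issue author's actual code; it assumes simple CSVs with no quoted commas, and takes the union of all headers):

```typescript
// Hypothetical sketch: merge CSV snapshots whose headers differ by
// taking the union of all columns. Rows from files that lack a column
// get an empty value for it. Assumes no quoted commas in fields.
function mergeCsvs(csvs: string[]): string {
  const allColumns: string[] = [];
  const rows: Record<string, string>[] = [];
  for (const csv of csvs) {
    const lines = csv.trim().split("\n");
    const header = lines[0].split(",");
    for (const col of header) {
      if (!allColumns.includes(col)) allColumns.push(col);
    }
    for (const line of lines.slice(1)) {
      const values = line.split(",");
      const row: Record<string, string> = {};
      header.forEach((col, i) => (row[col] = values[i] ?? ""));
      rows.push(row);
    }
  }
  const body = rows.map((row) =>
    allColumns.map((col) => row[col] ?? "").join(",")
  );
  return [allColumns.join(","), ...body].join("\n");
}
```

Reading the files from disk and writing the merged result back out is left out of the sketch.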

Only include tar.gz files

It looks like someone inadvertently added a pdf in their folder:


To prevent this, it would be useful to ensure only .tar.gz files are listed. I can submit a PR for this in the next couple of days if needed.
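The filtering itself is a one-liner; a sketch (the file names below are made up for illustration):

```typescript
// Hypothetical sketch: keep only submission tarballs, dropping stray
// files such as PDFs that end up in the incoming folders.
function onlyTarballs(files: string[]): string[] {
  return files.filter((f) => f.endsWith(".tar.gz"));
}

// e.g. onlyTarballs(["pkg_1.0.tar.gz", "notes.pdf"]) drops "notes.pdf"
```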

Show more information about the packages?

I'm still undecided, since it would mean unpacking the .tar.gz files to find the URL/BugReports fields. If we did that, we'd need to cache results somehow, so as not to unpack every .tar.gz every hour.

Then using the information:

  • make the package name clickable in the table
  • maybe show the maintainer's name?
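The caching idea could be sketched as follows (a hypothetical illustration; `extractMeta` is a stand-in for the real unpack-and-read-DESCRIPTION step, which is not shown):

```typescript
// Hypothetical sketch of the caching idea: remember metadata per tarball
// filename, so each .tar.gz only has to be unpacked once even though the
// dashboard rebuilds every hour.
type PkgMeta = { url?: string; bugReports?: string };

const metaCache = new Map<string, PkgMeta>();

function getMeta(
  tarball: string,
  extractMeta: (t: string) => PkgMeta // stand-in for the expensive unpack
): PkgMeta {
  const cached = metaCache.get(tarball);
  if (cached) return cached;         // hit: skip the expensive unpack
  const meta = extractMeta(tarball); // miss: unpack once and remember
  metaCache.set(tarball, meta);
  return meta;
}
```

Keying on the tarball filename works because a resubmission gets a new version number, and hence a new filename.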

Where to store historical data?

Opening this issue so we have a public & central place to discuss this matter.

Having the historical data on a branch seems suboptimal:

  • it is difficult to discover (although this could be changed by advertising it more in the README & the pkgdown website)
  • inevitable growth of the branch size will have ripple effects on all operations in the main branch (in particular clones & checkouts)

The cleanest option is probably to store this data in an actual database, hosted on an external service. This makes sense since we're not actually changing the file contents, just adding new files, and therefore don't need a Version Control System. But:

  • this costs money
  • it requires more maintenance / learning how to use a new service

Another simpler (albeit imperfect) option would be to store the historical data in a distinct GitHub repository. This uses tools we already know, and is free, public & easy to find.

Dashboard hasn't updated in a few days

Thank you for creating this wonderful service!

I wanted to let you know that the dashboard hasn't updated in 3 days (in case you weren't already aware).

🌴 🌞

Add an explanation for the different folders

Right now, what each folder means is a bit obscure to anyone unfamiliar with the submission process. If I'm a package maintainer, I can see that my package has moved to pretest, but what does that mean?

My understanding is that there is no official documentation about the meaning of each folder (it may even depend on each CRAN team member?), so this may be a bit difficult.

The diagram from https://github.com/edgararuiz/cran-stages could help here but I'm not sure under which license it's been released.

Add incoming time to history snapshots

Can you please add the incoming time to the history branch CSV snapshots?

Those currently only include snapshot_time, which is constant across all rows of a snapshot.

I am trying to use the GitHub API to build an alternative frontend for the data.

Thank you for your time.

Edit: For reference, here is what I am doing (TypeScript):

async function fetchCranSays() {

    // fetch the last commit by the actions bot on the history branch
    const reCommits: any[] = await fetch(
        "https://api.github.com/repos/lockedata/cransays/commits?sha=history&author=actions-user&per_page=1"
    ).then((response) => response.json());

    const commitSha = reCommits[0].sha;
    console.log(commitSha);

    // fetch the full commit (to get the CSV filename it touched)
    const reCommitExt: any = await fetch(
        "https://api.github.com/repos/lockedata/cransays/commits/" + commitSha
    ).then((response) => response.json());

    const csvFilename = reCommitExt.files[0].filename;
    console.log(csvFilename);

    // fetch the CSV itself (the contents API returns it base64-encoded)
    const reCsv: any = await fetch(
        "https://api.github.com/repos/lockedata/cransays/contents/" +
        csvFilename +
        "?ref=history"
    ).then((response) => response.json());

    const csv = atob(reCsv.content);
    console.log(csv);
}

Dashboard is out of date

Appears to be last updated 2019-12-13 14:46 UTC+0000.

Brilliant site, by the way; it looks great and is very insightful for submitters. Cheers!

Description for `pending` is wrong?

On the dashboard, it is written:

pending: the CRAN maintainers are waiting for an action on your side. You should check your emails!

Yet, for the lightr package, which is currently in pending, I got the following email:

Dear maintainer,

package lightr_1.2.tar.gz has been auto-processed and is pending a manual inspection. A CRAN team member will typically respond to you within the next 5 working days. For technical reasons you may receive a second copy of this message when a team member triggers a new check.

The GitHub Actions workflow to render dashboard may be deactivated

It seems like the dashboard is not being updated. A look at the GitHub Actions page indicates that it is due to an automated deactivation of the cron job. I hope you can fix it and have your great service up and running again. Thanks for providing this service to the community!

Improve the code in take_snapshot

  • remove repetition of code

  • have a better format for the human subfolders (DSok/ -> DS/ok)

  • for each line add a direct link to the corresponding folder
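The subfolder reformatting could be sketched like this (a hypothetical illustration; it assumes the prefix is always a two-letter uppercase initial, which may not hold for every human subfolder):

```typescript
// Hypothetical sketch: split a human subfolder name such as "DSok" into
// initials plus status ("DS/ok"). Assumes a two-letter uppercase prefix;
// names without one are passed through unchanged.
function splitHumanFolder(name: string): string {
  const match = name.match(/^([A-Z]{2})(.+)$/);
  return match ? `${match[1]}/${match[2]}` : name;
}
```

For example, splitHumanFolder("DSok") gives "DS/ok", while a name like "pending" passes through unchanged.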

Alternative approach

Hi,

great work on the dashboard.
I built a similar website, but with a different approach:

  • Tiny Rust server backend that lazily fetches data from the incoming FTP and caches it in memory (currently for 10 minutes)
  • Svelte frontend on GitHub Pages with auto-reload and search, and it lets you track individual packages.

Website:
https://nx10.github.io/cransubs/

Repos:
https://github.com/nx10/cransubs
https://github.com/nx10/cransubs-server

It would be great to hear what you think.
Feel free to close this issue anytime.

WISH: Increase poll frequency

Currently, the CRAN incoming FTP server is polled once an hour:

schedule:
- cron: '0 * * * *'

Have you considered increasing this to, say, two or four times an hour? I doubt it would make a big dent in the total amount of traffic that the CRAN server sees. It might even help decrease the traffic by moving someone who's tracking their package manually to looking at CRANsays instead - once an hour is not enough for such use.

UPDATE: I see that https://nx10.github.io/cransubs/ is updated once every ten minutes.
UPDATE 2: It's updated only when someone accesses it, and I guess at most once every 10 minutes.
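For instance, polling every 15 minutes would only change the cron expression in the workflow file (a sketch of the schedule block only; note that GitHub Actions does not guarantee exact cron timing, and scheduled runs can be delayed under load):

```yaml
on:
  schedule:
    # every 15 minutes instead of once an hour
    - cron: '*/15 * * * *'
```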

GitHub Actions setup

Hi! Many thanks for providing this useful report!
I also hope that the new tracking history will help provide some insight into the process. I have already set up a reminder to analyse it in 2021, when more data will have accumulated.

I attempted to replicate the idea, but with the currently available packages, at llrs/cranis, to provide a more complete view of the time between submission and appearance on CRAN (and of package removals and reappearances).

I am having trouble mimicking the GHA setup: llrs/cranis#1. Maybe someone could explain how it works or help with the setup. Thanks again!

Think about default ordering in table

At the moment, we don't provide a value for defaultSorted in the reactable() call.

This results in the default ordering being the data.frame ordering: results are ordered by folder first.

I'm not completely sure this matches visitors' expectations. It is particularly visible at the moment because the human folder has contained quite a high number of packages for many months. Users who want to check the status of their recently submitted package will have to scroll past it to get the info they need.

Publish website directly via GHA workflow

Background and upsides/downsides discussed here: r-lib/actions#597.

The idea to revive this proposal here comes from the realization that we are using a severely outdated version of JamesIves/github-pages-deploy-action.

The r-lib/actions maintainers identified that it was not a good fit for the pkgdown action because it doesn't play well with pkgdown's development: mode: auto setting, but I wonder if it would be a good fit here.

Do you see any issues with making the switch?

Dashboard status

This is an issue that will be automatically re-opened whenever dashboard updates fail. If this issue is currently open, our team has been notified and someone will try to fix it as soon as possible.

