r-hub / cransays

Creates an Overview of CRAN Incoming Submissions :mailbox_with_mail:

Home Page: https://r-hub.github.io/cransays/articles/dashboard.html

License: Other

Languages: R 100.0%

Topics: cran, cran-r, r, r-package, r-packages, rstats

cransays's People

Contributors

alexisderumigny, bbolker, bisaloo, dependabot[bot], gadenbuie, hadley, jeroen, jimhester, llrs, maelle, mitchelloharawild, olivroy, stephlocke


cransays's Issues

Function to analyze history branch/data

I know cransays is not really meant to deliver code, but I have some code that merges all the CSV files of the history branch, and I think it would be helpful to others (and myself) if it were documented here.
The code handles merging files with different headers efficiently (previous iterations took 30 minutes; now it runs in just one).

I think it has no dependencies and wouldn't need to be run or tested, but it could help others who want to analyze the data.

Let me know if it would be helpful/appropriate, and I will create a PR with the code.
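The merging step described above could be sketched like this (a hypothetical illustration, not the issue author's actual code; it assumes simple CSVs with no quoted commas, and takes the union of all headers):

```typescript
// Hypothetical sketch: merge CSV snapshots whose headers differ by
// taking the union of all columns. Rows from files that lack a column
// get an empty value for it. Assumes no quoted commas in fields.
function mergeCsvs(csvs: string[]): string {
  const allColumns: string[] = [];
  const rows: Record<string, string>[] = [];
  for (const csv of csvs) {
    const lines = csv.trim().split("\n");
    const header = lines[0].split(",");
    for (const col of header) {
      if (!allColumns.includes(col)) allColumns.push(col);
    }
    for (const line of lines.slice(1)) {
      const values = line.split(",");
      const row: Record<string, string> = {};
      header.forEach((col, i) => (row[col] = values[i] ?? ""));
      rows.push(row);
    }
  }
  const body = rows.map((row) =>
    allColumns.map((col) => row[col] ?? "").join(",")
  );
  return [allColumns.join(","), ...body].join("\n");
}
```

Reading the files from disk and writing the merged result back out is left out of the sketch.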

Only include tar.gz files

It looks like someone inadvertently added a pdf in their folder:


To prevent this, it would be useful to ensure only .tar.gz files are listed. I can submit a PR for this in the next couple of days if needed.
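The filtering itself is a one-liner; a sketch (the file names below are made up for illustration):

```typescript
// Hypothetical sketch: keep only submission tarballs, dropping stray
// files such as PDFs that end up in the incoming folders.
function onlyTarballs(files: string[]): string[] {
  return files.filter((f) => f.endsWith(".tar.gz"));
}

// e.g. onlyTarballs(["pkg_1.0.tar.gz", "notes.pdf"]) drops "notes.pdf"
```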

Show more information about the packages?

I'm still undecided, since it would mean unpacking the .tar.gz files to find the URL/BugReports fields. If we did that, we'd need to cache results somehow, so as not to unpack every .tar.gz every hour.

Then using the information:

  • make the package name clickable in the table
  • maybe show the maintainer's name?
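The caching idea could be sketched as follows (a hypothetical illustration; `extractMeta` is a stand-in for the real unpack-and-read-DESCRIPTION step, which is not shown):

```typescript
// Hypothetical sketch of the caching idea: remember metadata per tarball
// filename, so each .tar.gz only has to be unpacked once even though the
// dashboard rebuilds every hour.
type PkgMeta = { url?: string; bugReports?: string };

const metaCache = new Map<string, PkgMeta>();

function getMeta(
  tarball: string,
  extractMeta: (t: string) => PkgMeta // stand-in for the expensive unpack
): PkgMeta {
  const cached = metaCache.get(tarball);
  if (cached) return cached;         // hit: skip the expensive unpack
  const meta = extractMeta(tarball); // miss: unpack once and remember
  metaCache.set(tarball, meta);
  return meta;
}
```

Keying on the tarball filename works because a resubmission gets a new version number, and hence a new filename.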

Where to store historical data?

Opening this issue so we have a public & central place to discuss this matter.

Having the historical data on a branch seems suboptimal:

  • it is difficult to discover (although this could be changed by advertising it more in the README & the pkgdown website)
  • inevitable growth of the branch size will have ripple effects on all operations in the main branch (in particular clones & checkouts)

The cleanest option is probably to store this data in an actual database, hosted on an external service. This makes sense since we're not actually changing the file contents, just adding new files, and therefore don't need a Version Control System. But:

  • this costs money
  • it requires more maintenance / learning how to use a new service

Another simpler (albeit imperfect) option would be to store the historical data in a distinct GitHub repository. This uses tools we already know, and is free, public & easy to find.

Dashboard hasn't updated in a few days

Thank you for creating this wonderful service!

I wanted to let you know that the dashboard hasn't updated in 3 days (in case you weren't already aware).

🌴 🌞

Add an explanation for the different folders

Right now, what each folder means is a bit obscure to anyone unfamiliar with the submission process. If I'm a package maintainer, I can see that my package has moved to pretest, but what does that mean?

My understanding is that there is no official documentation about the meaning of each folder (it may even depend on each CRAN team member?), so this may be a bit difficult.

The diagram from https://github.com/edgararuiz/cran-stages could help here but I'm not sure under which license it's been released.

Add incoming time to history snapshots

Can you please add the incoming time to the history branch CSV snapshots?

Those currently only include snapshot_time, which is constant across all rows of a snapshot.

I am trying to use the GitHub API to build an alternative frontend for the data.

Thank you for your time.

Edit: For reference, here is what I am doing (TypeScript):

async function fetchCranSays() {

    // fetch the last commit by the actions bot on the history branch
    const reCommits: any[] = await fetch(
        "https://api.github.com/repos/lockedata/cransays/commits?sha=history&author=actions-user&per_page=1"
    ).then((response) => response.json());

    const commitSha = reCommits[0].sha;
    console.log(commitSha);

    // fetch the full commit (to get the CSV filename it touched)
    const reCommitExt: any = await fetch(
        "https://api.github.com/repos/lockedata/cransays/commits/" + commitSha
    ).then((response) => response.json());

    const csvFilename = reCommitExt.files[0].filename;
    console.log(csvFilename);

    // fetch the CSV itself (the contents API returns it base64-encoded)
    const reCsv: any = await fetch(
        "https://api.github.com/repos/lockedata/cransays/contents/" +
        csvFilename +
        "?ref=history"
    ).then((response) => response.json());

    const csv = atob(reCsv.content);
    console.log(csv);
}

Dashboard is out of date

Appears to be last updated 2019-12-13 14:46 UTC+0000.

Brilliant site, by the way; it looks great and is very insightful for submitters. Cheers!

Description for `pending` is wrong?

On the dashboard, it is written:

pending: the CRAN maintainers are waiting for an action on your side. You should check your emails!

Yet, for the lightr package, which is currently in pending, I got the following email:

Dear maintainer,

package lightr_1.2.tar.gz has been auto-processed and is pending a manual inspection. A CRAN team member will typically respond to you within the next 5 working days. For technical reasons you may receive a second copy of this message when a team member triggers a new check.

The GitHub Actions workflow to render dashboard may be deactivated

It seems like the dashboard is not being updated. A look at the GitHub Actions page indicates that it is due to an automated deactivation of the cron job. I hope you can fix it and have your great service up and running again. Thanks for providing this service to the community!

Improve the code in take_snapshot

  • remove repetition of code

  • have a better format for the human subfolders (DSok/ -> DS/ok)

  • for each line add a direct link to the corresponding folder
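The subfolder reformatting could be sketched like this (a hypothetical illustration; it assumes the prefix is always a two-letter uppercase initial, which may not hold for every human subfolder):

```typescript
// Hypothetical sketch: split a human subfolder name such as "DSok" into
// initials plus status ("DS/ok"). Assumes a two-letter uppercase prefix;
// names without one are passed through unchanged.
function splitHumanFolder(name: string): string {
  const match = name.match(/^([A-Z]{2})(.+)$/);
  return match ? `${match[1]}/${match[2]}` : name;
}
```

For example, splitHumanFolder("DSok") gives "DS/ok", while a name like "pending" passes through unchanged.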

Alternative approach

Hi,

great work on the dashboard.
I built a similar website, but with a different approach:

  • Tiny Rust server backend that lazily fetches data from the incoming FTP and caches it in memory (currently for 10 minutes)
  • Svelte frontend on GitHub Pages with auto-reload and search, and it lets you track individual packages.

Website:
https://nx10.github.io/cransubs/

Repos:
https://github.com/nx10/cransubs
https://github.com/nx10/cransubs-server

It would be great to hear what you think.
Feel free to close this issue anytime.

WISH: Increase poll frequency

Currently, the CRAN incoming FTP server is polled once an hour:

schedule:
- cron: '0 * * * *'

Have you considered increasing this to, say, two or four times an hour? I doubt it would make a big dent in the total amount of traffic that the CRAN server sees. It might even help decrease the traffic by moving someone who's tracking their package manually to looking at CRANsays instead - once an hour is not enough for such use.

UPDATE: I see that https://nx10.github.io/cransubs/ is updated once every ten minutes.
UPDATE 2: It's updated only when someone accesses it, and I guess at most once every 10 minutes.
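For instance, polling every 15 minutes would only change the cron expression in the workflow file (a sketch of the schedule block only; note that GitHub Actions does not guarantee exact cron timing, and scheduled runs can be delayed under load):

```yaml
on:
  schedule:
    # every 15 minutes instead of once an hour
    - cron: '*/15 * * * *'
```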

GitHub Actions setup

Hi! Many thanks for providing this useful report!
I also hope that the new tracking history will help provide some insight into the process. I have already set up a reminder to analyse it in 2021, when more data will have accumulated.

I attempted to replicate the idea, but with the currently available packages, at llrs/cranis, to provide a more complete view of the time between submission and appearance on CRAN (and of package removals and reappearances).

I am having trouble mimicking the GHA setup: llrs/cranis#1. Maybe someone could explain how it works or help with the setup. Thanks again!

Think about default ordering in table

At the moment, we don't provide a value for defaultSorted in the reactable() call.

This results in the default ordering being the data.frame ordering: results are ordered by folder first.

I'm not completely sure this matches visitors' expectations. It is particularly visible at the moment because the human folder has contained quite a high number of packages for many months. Users who want to check the status of their recently submitted package will have to scroll past it to get the info they need.

Publish website directly via GHA workflow

Background and upsides/downsides discussed here: r-lib/actions#597.

The idea to revive this proposal here comes from the realization that we are using a severely outdated version of JamesIves/github-pages-deploy-action.

The r-lib/actions maintainers identified that it was not a good fit for the pkgdown action because it doesn't play well with pkgdown's development: mode: auto setting, but I wonder if it would be a good fit here.

Do you see any issues with making the switch?

Dashboard status

This is an issue that will be automatically re-opened whenever dashboard updates fail. If this issue is currently open, our team has been notified and someone will try to fix it as soon as possible.

