Comments (5)

kevquirk commented on July 22, 2024

@JLO64 firstly, thanks for all the great work on this - yay for open source!

So, a few comments from me, and I'll also let @garritfra opine.

Broken sites

We definitely need to do something with these. We currently have the site checker, which spits out a list of every site that returns a status other than 200, and then we manually go through the list. We haven't run it for a while, which is probably why you're seeing lots of broken sites.

Having something automated to clean up would be better for obvious reasons, but I think we need some kind of safety net so that valid sites aren't accidentally deleted. Maybe a cron job that runs regularly and deletes a site only if it has been failing for, say, 7 days. I'm just spitballing ideas here, as this is all way out of my skillset.
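
A minimal sketch of that grace-period rule in Python, assuming each entry in sites.yml records a last_passed date (the date of its most recent successful check, as discussed later in this thread) in ISO format. The domain key, the 7-day default, and the helper names are illustrative only, not part of the existing site checker.

```python
from datetime import date, timedelta

# Hypothetical grace period, following the "say 7 days" suggestion.
GRACE_PERIOD = timedelta(days=7)


def should_remove(last_passed: str, today: date, grace: timedelta = GRACE_PERIOD) -> bool:
    """Return True if the site has not passed a check for longer than the grace period.

    Assumes last_passed is an ISO date string (YYYY-MM-DD).
    """
    return today - date.fromisoformat(last_passed) > grace


def sites_to_remove(sites: list[dict], today: date) -> list[str]:
    """Return the domains whose last successful check is older than the grace period."""
    return [s["domain"] for s in sites if should_remove(s["last_passed"], today)]


if __name__ == "__main__":
    sites = [
        {"domain": "example.com", "last_passed": "2024-07-20"},  # failing for 2 days: kept
        {"domain": "example.org", "last_passed": "2024-07-01"},  # failing for 21 days: removed
    ]
    print(sites_to_remove(sites, today=date(2024, 7, 22)))  # ['example.org']
```

Because the decision depends only on last_passed, a site that recovers within the window is kept automatically, which is the kind of safety net described above.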

If that can't be done easily, we will continue with the manual script and cleanup.

Cloudflare auth and .env file

I can set up an account so it's tied to this project. I wonder if we could use GitHub secrets instead of an .env file? That way we can work it all into GH Actions, as you mentioned.
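
If the script reads the token from an environment variable, the same code works locally (with the variable exported or loaded from an .env file) and in GitHub Actions, where a repository secret can be mapped to an environment variable through the workflow's env key. A minimal sketch; the variable name CLOUDFLARE_API_TOKEN and the helper get_api_token are hypothetical choices.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; a workflow step would map the GitHub secret onto it.
TOKEN_VAR = "CLOUDFLARE_API_TOKEN"


def get_api_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the Cloudflare API token from the environment, failing loudly if it is absent."""
    env = os.environ if env is None else env
    token = env.get(TOKEN_VAR, "").strip()
    if not token:
        raise RuntimeError(
            f"{TOKEN_VAR} is not set; add it as a GitHub secret or export it locally."
        )
    return token
```

In a workflow step this would pair with something like env: CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}, so the token never has to live in the repository.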

Markdown file output

I think that's fine - whatever makes it easiest to parse is good with me.

Thanks again,

Kev

from 512kb.club.

garritfra commented on July 22, 2024

No objections!

An idea regarding the automatic size check: Renovate, the Dependabot competitor, opens an issue in each repository it's active in and lists all dependencies that are out of date. Maybe this approach works here as well?

We could have an issue that an action regularly updates, listing all domains that could be removed, along with how long each has been down or above 512 kB. To avoid having to maintain a backend, the body of the issue should be in machine-readable form.

If we want, we could even add checkboxes next to the domains to schedule removal, or automatically remove any domain that has been unhealthy for n days (or n iterations of the check).
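
One possible machine-readable layout for that issue is a GitHub task list with one domain per line, which renders as checkboxes and also round-trips through a simple parser. The line format, field names, and max_days threshold below are assumptions for illustration; this is not an existing 512kb.club or Renovate format.

```python
import re

# Hypothetical line format, one domain per line:
# - [ ] example.com | reason: http 404 | failing_days: 9
LINE_RE = re.compile(
    r"^- \[(?P<mark>[ xX])\] (?P<domain>\S+) \| reason: (?P<reason>[^|]+?) \| failing_days: (?P<days>\d+)$"
)


def render_issue_body(entries: list[dict]) -> str:
    """Render unhealthy sites as a GitHub task list that a workflow can regenerate."""
    lines = ["<!-- generated by the site checker; only tick the checkboxes -->"]
    for e in entries:
        lines.append(f"- [ ] {e['domain']} | reason: {e['reason']} | failing_days: {e['days']}")
    return "\n".join(lines)


def parse_issue_body(body: str) -> list[dict]:
    """Parse the issue body back into entries, noting whether a maintainer ticked each box."""
    entries = []
    for line in body.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            entries.append({
                "domain": m["domain"],
                "reason": m["reason"],
                "days": int(m["days"]),
                "remove": m["mark"].lower() == "x",
            })
    return entries


def domains_to_remove(body: str, max_days: int) -> list[str]:
    """Domains ticked for removal, or unhealthy for at least max_days consecutive days."""
    return [e["domain"] for e in parse_issue_body(body) if e["remove"] or e["days"] >= max_days]
```

Keeping the body fully generated means the action can rewrite it on every run; preserving ticked boxes across runs would need the action to parse the previous body first.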

These are just my unfiltered thoughts; I'm totally open to remarks or alternative solutions.

JLO64 commented on July 22, 2024

Sorry for the lack of updates, I got really sick on Monday and am still recovering.

Quick update on the script: I tried running it overnight, but it errored out after ~250 entries. I'm not sure why, since when I reran it from the entry it failed at, it ran properly. I also think some form of rate limiting is going on.

This pattern repeats throughout the table: roughly every 60 entries, it fails to scan about 10 in a row. At most, the script queries the API twice every 20 seconds, which is well within the limits of both the overall API and that particular endpoint.
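
Since the failures come in bursts, one option is to retry a failed scan after a growing delay instead of recording it as a failure right away. A minimal exponential-backoff sketch: with_retries is a hypothetical helper, fn stands in for whatever function performs one scan, and the sleep parameter exists only so the delays can be tested without waiting. A real script would probably retry only on transient errors (for example an HTTP 429 response) rather than on every exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, waiting base_delay, 2*base_delay, 4*base_delay, ... between failed attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        try:
            return fn()
        except Exception:
            # Back off exponentially before the next attempt.
            sleep(base_delay * 2 ** attempt)
    return fn()  # final attempt: any exception propagates to the caller
```

With the defaults, a scan that keeps failing is retried after 5, 10, and 20 seconds, which should ride out a short burst of throttling without slowing down the normal case.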

Additionally, a number of sites fail seemingly at random despite being accessible via a browser and Cloudflare. This happens rarely, roughly once per 40 entries.

Thankfully, I can have the script retest these entries by sorting sites.yml by last_passed instead of last_checked. I'll do that once I've finished the entire list of sites.
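
Sorting by last_passed in ascending order puts the sites that have gone longest without a successful check at the front, so they get retested first. A sketch assuming entries are dicts whose last_passed value is either an ISO date string or a date object (PyYAML's safe_load turns unquoted YYYY-MM-DD values into date objects); treating entries that have never passed as the oldest is an assumption.

```python
from datetime import date


def _as_date(value) -> date:
    """Accept an ISO string or a date object; a missing value counts as the oldest date."""
    if not value:
        return date.min
    return value if isinstance(value, date) else date.fromisoformat(value)


def retest_order(sites: list[dict]) -> list[dict]:
    """Return entries sorted so the ones that have gone longest without passing come first."""
    return sorted(sites, key=lambda s: _as_date(s.get("last_passed")))
```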

In the long term, I don't think this will be an issue if the script runs periodically via GitHub Actions on just a couple of sites at a time, but it will be a problem if we ever have to rerun the entire list. For now I'll PR the script as it is. I haven't changed the API token handling, documentation, or comments yet; I'll do that in later PRs.
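
For the periodic run, each invocation could pick the few entries that were checked least recently, so the whole list is covered gradually without bursts of API calls. A sketch: next_batch and the batch size of 5 are hypothetical, and the workflow itself would be triggered by a schedule (cron) event in GitHub Actions.

```python
import heapq
from datetime import date


def _as_date(value) -> date:
    """Accept an ISO string or a date object; a never-checked entry counts as the oldest."""
    if not value:
        return date.min
    return value if isinstance(value, date) else date.fromisoformat(value)


def next_batch(sites: list[dict], batch_size: int = 5) -> list[dict]:
    """Pick the batch_size entries whose last_checked date is oldest (ties keep file order)."""
    return heapq.nsmallest(batch_size, sites, key=lambda s: _as_date(s.get("last_checked")))
```

Each run would then scan only that batch and write back last_checked (and last_passed on success), so the next run naturally moves on to the following stalest entries.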

JLO64 commented on July 22, 2024

Sorry for not responding sooner! I've had a hectic weekend and haven't been able to touch VS Code or GitHub, but I'll definitely be able to submit a PR based on the above comments soon.

I wish I could contribute more on the GitHub Actions side of things, but I've never used it before. I'm going to give myself a crash course on how it all works this week, but for now I'll abstain from commenting on that. That said, switching to GitHub Secrets seems like a good idea!

Regarding broken links and websites larger than 512KB, maybe new variables should be added to the YAML entries: last_checked and last_passed? (EDIT: I went to sleep late last night, so I didn't realize we already have a variable for the date last checked, lol.) These could also be used to filter links on the website by adding a Liquid if statement to index.md that checks that the two dates are the same. This could really improve QoL for someone just browsing the list.
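
The proposed filter boils down to a single comparison: a site is shown only if its most recent check was also a passing one. Here is the same check in Python, for example for a script that reports how many listed sites would be hidden; the Liquid version in index.md would compare the same two fields. Field names follow the proposal in the comment above; treating a missing date as not passing, and comparing dates via their string form, are assumptions.

```python
from datetime import date


def passed_latest_check(site: dict) -> bool:
    """True when last_checked equals last_passed; a missing date counts as not passing."""
    checked, passed = site.get("last_checked"), site.get("last_passed")
    if checked is None or passed is None:
        return False
    # str() normalizes date objects to YYYY-MM-DD so they compare equal to ISO strings.
    return str(checked) == str(passed)


def visible_sites(sites: list[dict]) -> list[dict]:
    """Entries that would remain listed after filtering."""
    return [s for s in sites if passed_latest_check(s)]
```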

kevquirk commented on July 22, 2024

This looks fantastic!
