
Comments (6)

ethn commented on May 30, 2024

No there isn't a current mechanism, but I have several related thoughts:

  1. Currently we track references (by links, nests, queries...) to other cards in a separate table (card_references). This is what makes it fast/easy to query card relationships. I have often wanted to track external references as well. That would be a good building block for a system like this, because you could go through all the links without re-parsing content.
  2. WikiRate.org uses external links heavily, and this is a problem for them as you mention. One part of the planned solution (currently slotted for attention in about 2 months) would involve storing copies of external sources. Those may become canonical (?), but regardless, it would clearly be useful to know whether external links are valid.
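A minimal sketch of what point 1 might look like: pulling external links out of card content into rows that can be stored and queried without re-parsing. All names here (`external_references`, the row keys) are illustrative, not Decko's actual `card_references` schema.

```ruby
require "uri"

# Hypothetical row extractor: scan card content for external URLs and
# emit one trackable record per unique link. The regex is a deliberately
# simple approximation of "external link".
EXTERNAL_URL = %r{https?://[^\s)\]"'<>]+}

def external_references(card_id, content)
  content.scan(EXTERNAL_URL).uniq.map do |url|
    { referer_id: card_id, referee_url: url, status: nil }
  end
end
```

With rows like these in a table, a link checker only needs to walk the table, never the rendered pages.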

I'll have to give more thought as to whether something like #1 above could come together quickly enough to make it a better solution for us than something like link-checker. On the one hand, #1 would clearly be much more efficient. decko sites can be pretty hard to crawl efficiently because of all the content reuse, and wikirate.org is especially complex. And we already have all the parsing mechanisms in place, so it wouldn't be a massive new undertaking for us. On the other hand, new development always takes time.

I guess part of the question may come down to the utility of making the references queryable. I can certainly imagine that having benefits for wikirate down the road.

from decko.

tukanos commented on May 30, 2024

Thank you for your answer.

> would involve storing copies of external sources

In my eyes, that is probably nearly impossible. Imagine all the content you would want to save every time the source gets updated. The source can contain plenty of JavaScript, AJAX, etc. Of course, it depends on what exactly you store and how much you plan to store. If your database grows, it will probably also take a toll on performance.

On the other hand, if the source is simple enough, it can make sense.


ethn commented on May 30, 2024

Our solution isn't super ambitious: it involves one static version of the source document as it was at citation time. In WikiRate's case, static is arguably preferable, because citations can have specific content references that will get lost in an update. But WikiRate is probably unusual in the need for cached source files; I wouldn't expect many other sites to borrow that functionality.

The shared functionality is what you proposed, the external link tracking. That's just a record of the link and its validity. In WikiRate's case, it makes sense to provide a link to the external source version so long as the reference is still valid. That will avail users of any dynamic (JS, etc) functionality, so long as the source is there.

Re resources, while our (cloud-based) file storage will undoubtedly grow, we won't be updating it with every source update. The only major database growth would be the external link tracking, which isn't storing much more than (1) referring card id, (2) referee uri, and (3) current http status. That should be manageable.
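A sketch of that three-field record plus a status refresh, with the HTTP check injectable so a crawl can be stubbed out. The names (`ExternalReference`, `refresh_status`) are illustrative assumptions, not Decko's schema; a HEAD request is assumed to keep crawl traffic minimal.

```ruby
require "net/http"
require "uri"

# Hypothetical record: referring card id, referee URI, current HTTP status.
ExternalReference = Struct.new(:referer_id, :referee_url, :http_status)

# Refresh a reference's status; `fetcher` is injectable for testing.
def refresh_status(ref, fetcher: method(:head_status))
  ref.http_status = fetcher.call(ref.referee_url)
  ref
end

# Issue a HEAD request and return the numeric status, or nil if the host
# is unreachable (no status to record).
def head_status(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri).code.to_i
  end
rescue StandardError
  nil
end
```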


tukanos commented on May 30, 2024

> Our solution isn't super ambitious: it involves one static version of the source document as it was at citation time. In WikiRate's case, static is arguably preferable, because citations can have specific content references that will get lost in an update. But WikiRate is probably unusual in the need for cached source files; I wouldn't expect many other sites to borrow that functionality.

I see. The static version would be quite nice to have. Some really important information gets lost to internet history.

> The shared functionality is what you proposed, the external link tracking. That's just a record of the link and its validity. In WikiRate's case, it makes sense to provide a link to the external source version so long as the reference is still valid. That will avail users of any dynamic (JS, etc) functionality, so long as the source is there.

Yes, that is exactly what I proposed: external link tracking that checks whether the site is still alive at the link. It would also be good to have some mass-update functionality for when a source has merely moved to a different link, e.g. domain.com/I_was_here is now at domain.com/new_site/old_information/I_m_here.
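The mass update described above could be as simple as a prefix rewrite over the tracked links, analogous to how a card rename updates inbound references. A hypothetical sketch (the function name and row shape are assumptions, not Decko API):

```ruby
# Rewrite every tracked external link that starts with `from_prefix` so
# it points at `to_prefix` instead, leaving other links untouched.
def bulk_move(refs, from_prefix, to_prefix)
  refs.each do |ref|
    next unless ref[:referee_url].start_with?(from_prefix)
    ref[:referee_url] = ref[:referee_url].sub(from_prefix, to_prefix)
  end
end
```

In a real system this would be a single SQL `UPDATE` over the tracking table rather than an in-memory loop, but the logic is the same.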

> Re resources, while our (cloud-based) file storage will undoubtedly grow, we won't be updating it with every source update. The only major database growth would be the external link tracking, which isn't storing much more than (1) referring card id, (2) referee uri, and (3) current http status. That should be manageable.

Yes, that is reasonable. I was talking more about the case of copying the source site into WikiRate. External link tracking alone should be manageable, even desirable.


ethn commented on May 30, 2024

I like the bulk update idea. It would be pretty similar to what happens when a card gets renamed, provided we have the link tracking.

I suppose we could also consider updating the link in the case of redirects, but that's not always desirable (e.g. when a more permanent link redirects to a more temporary one).
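One way to encode that caution is to auto-update only on redirects the server itself declares permanent (HTTP 301/308) and leave temporary redirects (302/307) alone. A hedged sketch of such a policy, with illustrative names:

```ruby
# Redirect statuses that the HTTP spec defines as permanent.
PERMANENT_REDIRECTS = [301, 308].freeze

# Return the URL the tracker should store: follow the Location header
# only when the redirect is permanent, otherwise keep the original.
def updated_url(url, status, location)
  PERMANENT_REDIRECTS.include?(status) ? location : url
end
```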


tukanos commented on May 30, 2024

> I like the bulk update idea. It would be pretty similar to what happens when a card gets renamed, provided we have the link tracking.

Yes, the logic would be similar.

I am still thinking about how to deal with the situation where a "dead" link is found. Would you keep the information and add a [deadlink] tag, or would you prefer to delete it completely? Maybe an option to copy it from the Internet Archive's Wayback Machine, or any other "backup" source, would also be nice.
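The Wayback Machine does expose a public availability endpoint (`https://archive.org/wayback/available?url=...`) that returns the closest archived snapshot as JSON, so a dead-link fallback could be sketched like this. The function names are illustrative, and the JSON parsing is kept separate from the network call so it can be tested offline:

```ruby
require "json"
require "net/http"
require "uri"

# Given a dead link, ask the Wayback availability API for the closest
# archived snapshot; return its URL, or nil if nothing is archived.
# `fetch` is injectable for testing.
def wayback_snapshot(url, fetch: method(:fetch_availability))
  closest = fetch.call(url).dig("archived_snapshots", "closest")
  closest && closest["available"] ? closest["url"] : nil
end

def fetch_availability(url)
  api = URI("https://archive.org/wayback/available?url=" \
            "#{URI.encode_www_form_component(url)}")
  JSON.parse(Net::HTTP.get(api))
end
```

A [deadlink] tag could then be replaced (or supplemented) with the snapshot URL whenever one exists.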

