artifacts's Introduction

Hypothesis

Web browsers let us access most of human knowledge, but they present it the way the site hosting a resource prescribes and do very little to capture relevant information into a local knowledge base. This experiment explores ways to view the web through different lenses, e.g. viewing a web page as an image catalog or a readable article, adding local annotations, and possibly more, on the assumption that captured web artifacts can seed ideas and help us identify connections.

Status: 💣 Experimental 💣

At the moment this is a research experiment that is unlikely to be useful. However, if you feel inclined to try it, you can use the Artifacts bookmarklet and run it on an arbitrary page to see what happens. The bookmarklet does not work on sites that use content security policies to block third-party scripts; in the future we plan to provide a web extension to overcome this limitation.

Design

When the bookmarklet is activated, it loads the bookmarklet host, a script that injects an iframe into the document. The iframe loads the bookmarklet client, which communicates with the host via the MessagePort API.

In the future we plan to load the host as a browser extension content script in order to overcome content security restrictions. Other than that, the design will remain the same.

Once loaded, the client issues requests to the host to scrape document metadata, archive the page, etc. The host fulfills each request and transfers the response back to the client, which then renders it in the UI.
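
The request/response exchange described above could look roughly like the following sketch. Only the use of MessagePort comes from the text; the message fields (`id`, `type`, `params`, `ok`, `value`, `error`) and the names `respond`, `serveRequests` and `createClient` are assumptions made for illustration, not the project's actual protocol.

```javascript
// Hypothetical protocol sketch: message fields and function names are assumptions.

// Host side: turn one request message into one response message.
// `handlers` maps a request type (e.g. "scrape") to a function.
const respond = async (handlers, { id, type, params }) => {
  const handler = handlers[type];
  if (!handler) {
    return { id, ok: false, error: `Unknown request type: ${type}` };
  }
  try {
    return { id, ok: true, value: await handler(params) };
  } catch (error) {
    return { id, ok: false, error: String(error) };
  }
};

// Host side: answer every request arriving on the port.
const serveRequests = (port, handlers) => {
  port.onmessage = async ({ data }) => {
    port.postMessage(await respond(handlers, data));
  };
};

// Client side: returns a `request(type, params)` function that resolves
// with the host's answer, matching responses to requests by `id`.
const createClient = (port) => {
  const pending = new Map();
  let nextId = 0;
  port.onmessage = ({ data: { id, ok, value, error } }) => {
    const entry = pending.get(id);
    if (!entry) return; // not a response to any request we sent
    pending.delete(id);
    if (ok) {
      entry.resolve(value);
    } else {
      entry.reject(new Error(error));
    }
  };
  return (type, params) =>
    new Promise((resolve, reject) => {
      const id = ++nextId;
      pending.set(id, { resolve, reject });
      port.postMessage({ id, type, params });
    });
};
```

In a browser, the host would presumably create a `MessageChannel`, keep one port for `serveRequests`, and hand the other port to the iframe through `iframe.contentWindow.postMessage(message, targetOrigin, [port])`, where the client wraps it with `createClient`.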

Preview cards

On the client's request, the host scrapes metadata from the document for the "preview card" (similar to the link previews in Twitter, Slack, Apple Messages, etc.).

The scraper attempts to extract the following information from the document:

  • URL
  • Hero images
  • Title
  • Summary
  • Site name

To accomplish this, it looks for the following information, to a varying extent.

If none of the above is found, it falls back to doing its best at guessing via a primitive algorithm inspired by the Mozilla Readability library.
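
As an illustration of this kind of scraping, here is a minimal sketch. The text above does not say which sources the scraper consults, so the Open Graph (`og:*`) and Twitter card (`twitter:*`) meta tags used here are an assumption based on common practice for preview cards. The fallback (first `h1`, `p` and `img`) is a deliberately naive stand-in, far simpler than anything inspired by Readability, and the names `meta`, `text` and `scrapeCard` are hypothetical.

```javascript
// Hypothetical sketch: the meta tag names below are assumed sources, and the
// fallback is a deliberately naive stand-in, not the project's algorithm.

// Read the `content` of the first matching <meta> tag, if any.
const meta = (document, ...names) => {
  for (const name of names) {
    const element = document.querySelector(
      `meta[property="${name}"], meta[name="${name}"]`
    );
    const content = element?.getAttribute("content")?.trim();
    if (content) return content;
  }
  return null;
};

// Trimmed text of the first element matching the selector, or null.
const text = (document, selector) =>
  document.querySelector(selector)?.textContent?.trim() || null;

// Collect the preview card fields, preferring explicit metadata and
// falling back to guesses from the page content.
const scrapeCard = (document) => ({
  url:
    document.querySelector('link[rel="canonical"]')?.getAttribute("href") ||
    meta(document, "og:url") ||
    document.URL,
  title:
    meta(document, "og:title", "twitter:title") ||
    document.title ||
    text(document, "h1"),
  summary:
    meta(document, "og:description", "twitter:description", "description") ||
    text(document, "p"),
  siteName: meta(document, "og:site_name") || new URL(document.URL).hostname,
  heroImages: [
    meta(document, "og:image", "twitter:image") ||
      document.querySelector("img")?.getAttribute("src"),
  ].filter(Boolean),
});
```

Passing the document in as a parameter, rather than reading the global, keeps the function usable from either a bookmarklet host or an extension content script.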

Web Archive (status: work in progress)

On the client's request, the host archives the page via a (custom fork of the) excellent freeze-dry library as a web bundle file containing all the linked resources, and transfers it back to the client as an ArrayBuffer of its content.
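
Handing the archive over as an ArrayBuffer has a practical benefit: the buffer can be listed as a transferable in `postMessage`, which moves it to the receiver instead of copying it. The sketch below shows only that step. `sendArchive`, `toArrayBuffer` and the message fields are assumptions, and the encoding helper merely stands in for whatever the freeze-dry fork actually produces.

```javascript
// Hypothetical sketch: `sendArchive`, `toArrayBuffer` and the message
// fields are assumptions made for illustration.

// Send archive bytes to the client, transferring (not copying) the buffer.
// After this call the sender can no longer read `buffer`: it is detached.
const sendArchive = (port, id, buffer) => {
  port.postMessage({ id, ok: true, value: buffer }, [buffer]);
};

// Stand-in for the archiving step: encode some HTML into an ArrayBuffer.
// The real archive would come from the freeze-dry fork instead.
const toArrayBuffer = (html) => {
  const bytes = new TextEncoder().encode(html);
  // Copy exactly the encoded bytes in case the view shares a larger buffer.
  return bytes.buffer.slice(bytes.byteOffset, bytes.byteOffset + bytes.byteLength);
};
```

Because the buffer is detached on the sending side, the host should finish any work that needs the bytes before calling `sendArchive`.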

There is no shortage of file formats for representing web bundles:

However, none is part of a web standard or widely supported by mainstream browsers; therefore, figuring out the right format for the task is part of this research.

The received web bundle then gets loaded into a special web bundle viewer.

Web Bundle Viewer (status: todo)

Given that no browser (except for Safari) supports viewing web bundles natively, for this research we create a custom viewer using a service worker registered at /webarchive/ and a sandboxed iframe.

This allows us to, e.g., access an archived web bundle via a URL like:

https://gozala.io/artifacts/webarchive/blob/dc265246-d4ca-f644-91a5-d4b33c4512fd

The service worker takes care of decoding the corresponding web bundle file and serving all the linked resources per request.
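
A minimal sketch of how such a service worker might map URLs to resources inside a decoded bundle follows. The viewer is marked as todo, so everything here is an assumption: the layout of the URL past the bundle id (`/webarchive/blob/<id>/<resource path>`), the in-memory `bundles` map, the `index.html` default, and the names `parseArchiveURL` and `serve`.

```javascript
// Hypothetical sketch: the URL layout past the bundle id, the `bundles` map,
// and the "index.html" default are assumptions.

const PREFIX = "/webarchive/blob/";

// Split an archive URL into bundle id and resource path, or return null
// when the URL does not point into the viewer.
const parseArchiveURL = (url) => {
  const { pathname } = new URL(url);
  const start = pathname.indexOf(PREFIX);
  if (start < 0) return null;
  const [bundleId, ...rest] = pathname.slice(start + PREFIX.length).split("/");
  if (!bundleId) return null;
  return { bundleId, resourcePath: rest.join("/") || "index.html" };
};

// Look up the requested resource in an already decoded bundle.
// `bundles` maps bundle id -> Map of resource path -> { type, body }.
const serve = (bundles, url) => {
  const route = parseArchiveURL(url);
  const resource = route && bundles.get(route.bundleId)?.get(route.resourcePath);
  return resource
    ? new Response(resource.body, { headers: { "Content-Type": resource.type } })
    : new Response("Not found", { status: 404 });
};

// Inside the service worker this would be wired up roughly as:
//   self.addEventListener("fetch", (event) => {
//     if (parseArchiveURL(event.request.url)) {
//       event.respondWith(serve(bundles, event.request.url));
//     }
//   });
```

Keeping the URL parsing separate from the fetch handler means the routing rules can be checked without a running service worker.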
