
Comments (9)

samuelli commented on May 22, 2024

Yep, this is all stuff I wanted to add but haven't quite had the time to work on yet.

Definitely planning a pretty complex caching layer. Given that rendering can be pretty slow, I'm thinking of some combination of parameters in the request (e.g. time before stale, time for refresh) that results in a queued render.

e.g. you request resource a (stale = 6 days, refresh = 1 day). If it was cached 2 days ago, you would be served the cached render, and a render would be queued to update the cache (since the requested 1-day refresh threshold has passed).
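The stale/refresh idea above can be sketched as a small decision function. This is only an illustration under assumptions: the names `CachePolicy`, `CacheDecision`, `decide`, `staleAfterMs` and `refreshAfterMs` are hypothetical and do not come from rendertron, and the behaviour for an entry older than the stale threshold (render immediately) is an assumed reading of "time before stale".

```typescript
// Hypothetical names throughout; the thread does not define an API.
type CacheDecision = "serve" | "serve-and-queue-refresh" | "render-now";

interface CachePolicy {
  staleAfterMs: number;   // at or past this age the cached render is not served (assumption)
  refreshAfterMs: number; // at or past this age it is served, but a re-render is queued
}

const DAY_MS = 24 * 60 * 60 * 1000;

function decide(
  cachedAtMs: number | undefined, // undefined means nothing is cached yet
  nowMs: number,
  policy: CachePolicy
): CacheDecision {
  if (cachedAtMs === undefined) return "render-now";
  const ageMs = nowMs - cachedAtMs;
  if (ageMs >= policy.staleAfterMs) return "render-now";
  if (ageMs >= policy.refreshAfterMs) return "serve-and-queue-refresh";
  return "serve";
}

// The example from the comment: stale = 6 days, refresh = 1 day, cached 2 days ago.
const examplePolicy: CachePolicy = { staleAfterMs: 6 * DAY_MS, refreshAfterMs: 1 * DAY_MS };
const exampleDecision = decide(0, 2 * DAY_MS, examplePolicy);
```

Here `exampleDecision` is `"serve-and-queue-refresh"`, matching the scenario described: the cached copy is returned and an update is queued in the background.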

No opinions on cloud/redis. What are the storage limits for values in both?

More generally, the whole bot-render needs to be architected properly to scale properly.

from rendertron.

samuelli commented on May 22, 2024

Memcached or datastore?
Memcached is a distributed in-memory object caching system which allows for rapid reads/writes. Google Cloud Datastore is a scalable NoSQL datastore.

  • Size: The gzipped size of a sample page on webcomponents.org was 30KB. Redis free tier only supports 30MB, which means we’re limited to storing 1000 pages in the cache. For any reasonable service size, this would result in a high cache miss rate.
  • Performance: memcached is an order of magnitude faster than Cloud Datastore. However, a cache miss is significantly more costly, since we’ll have to re-generate the result, which is pretty slow given the headless rendering times.

Datastore is the more appropriate fit for what we need, which is a large storage system where read performance is not critical.
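The size estimate above is simple arithmetic. A minimal sketch, assuming decimal units (1 MB = 1000 KB) to match the round figure quoted; the helper name `maxCachedPages` is hypothetical:

```typescript
// Decimal units, chosen to match the round numbers used in the comment.
const KB = 1000;
const MB = 1000 * KB;

// How many entries of a given size fit in a cache of a given capacity.
function maxCachedPages(capacityBytes: number, pageBytes: number): number {
  if (pageBytes <= 0) throw new Error("pageBytes must be positive");
  return Math.floor(capacityBytes / pageBytes);
}

// 30MB of cache at 30KB per gzipped page.
const pagesInFreeTier = maxCachedPages(30 * MB, 30 * KB); // 1000
```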


justinribeiro commented on May 22, 2024

The queued render is a good idea; there is a pretty clean path to this on the existing GCP side (cron/task queues), although we could architect something different. As you note, there's some complexity there via time before stale and whatnot, but it's doable. :-)

> No opinions on cloud/redis. What are the storage limits for values in both?

Text long has a 1MB limit in GAE datastore (non-indexed); String in Redis has a 512MB limit.

> More generally, the whole bot-render needs to be architected properly to scale properly.

If you've got a doc lying around, happy to contribute however I can.


samuelli commented on May 22, 2024

Unrelated, why is only returning head awesome? That's planned to be changed to return the whole (serialised) response. Ref: https://github.com/justinribeiro/pwa-firebase-functions-botrender/blob/master/functions/index.js#L67

1MB should be fine... if we use compression.

I sent you the doc of what I have so far, but it's mostly around what needs to be done and less about architectural design. I haven't spent too much time thinking about architecture, but I'm expecting it to be mostly pretty layered/modular.

Caching is one layer, rendering another, queues another etc. I'd want to extract/publish a few node modules as well. I'd also like to do some pretty heavy load testing too.
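One way to picture the layering described above is a cache layer that wraps a rendering layer behind the same interface. This is a hedged sketch, not rendertron's design: `Renderer`, `withCache` and `fakeRenderer` are hypothetical names, the renderer is a stand-in for a slow headless render, and the example is synchronous and in-memory purely to stay short.

```typescript
// Hypothetical interface shared by every layer.
interface Renderer {
  render(url: string): string;
}

// Stand-in for the rendering layer; counts calls so the effect of caching is visible.
let renderCalls = 0;
const fakeRenderer: Renderer = {
  render(url: string): string {
    renderCalls += 1;
    return "<html><!-- rendered " + url + " --></html>";
  },
};

// Caching layer: wraps any Renderer and only calls it on a cache miss.
function withCache(inner: Renderer): Renderer {
  const store: { [url: string]: string | undefined } = Object.create(null);
  return {
    render(url: string): string {
      const cached = store[url];
      if (cached !== undefined) return cached;
      const html = inner.render(url);
      store[url] = html;
      return html;
    },
  };
}

const cachedRenderer = withCache(fakeRenderer);
const firstResult = cachedRenderer.render("https://example.com/");
const secondResult = cachedRenderer.render("https://example.com/");
```

After the two calls, `renderCalls` is 1: the second request is answered from the cache layer without touching the renderer. A queue layer could wrap the renderer in the same way.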


justinribeiro commented on May 22, 2024

> Unrelated, why is only returning head awesome? That's planned to be changed to return the whole (serialised) response. Ref: https://github.com/justinribeiro/pwa-firebase-functions-botrender/blob/master/functions/index.js#L67

It's awesome because when I tested this last year with headless_shell and some other AppEngine python things, I found that you can actually return even less to most link sharing bots (they really just love those open graph tags). Cutting the response down to bare-bones data speeds things up wherever the item is being shared. Now, should you only send the <head>? Probably not. :-) I was basically playing "pffft who needs valid data", so I'm fine with the upcoming change (makes sense to me). I think I'll put a note in there about that.

> I sent you the doc of what I have so far, but it's mostly around what needs to be done and less about architectural design. I haven't spent too much time thinking about architecture, but I'm expecting it to be mostly pretty layered/modular.

Cool. I'll have a look and try not to pester you (I know you're busy).


yadvendar commented on May 22, 2024

I am sorry this is unrelated to the mentioned issue but I am curious and have a query.

> It's awesome because when I tested this last year with headless_shell and some other AppEngine python things, I found that you can actually return even less to most link sharing bots (they really just love those open graph tags).

@justinribeiro @samuelli Can we not have both the functionalities in bot-render?

  1. Send a limited amount of data for social sharing/link sharing bots (taking Open Graph tags into account).
  2. Dump complete HTML DOM for search engine bots.

Probably, control this by having an additional query param in the existing route.
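The query-param idea could look roughly like the sketch below. Everything here is an assumption for illustration: the parameter name `mode`, its value `head`, and the helpers `parseMode`, `extractHead` and `respond` are hypothetical and are not part of bot-render or rendertron. The regex-based `<head>` extraction is a simplification; a real implementation would more likely serialise the head from the rendered DOM.

```typescript
type RenderMode = "full" | "head";

// Hypothetical query parameter: ?mode=head asks for the reduced response.
// Anything else (or no parameter) falls back to the full document.
function parseMode(query: { [key: string]: string | undefined }): RenderMode {
  return query["mode"] === "head" ? "head" : "full";
}

// Returns only the <head>...</head> element, where the Open Graph tags live.
// "[\s>]" after "<head" avoids matching "<header>". Falls back to the full
// document if no head element is found.
function extractHead(html: string): string {
  const match = /<head[\s>][\s\S]*?<\/head>/i.exec(html);
  return match ? match[0] : html;
}

function respond(html: string, query: { [key: string]: string | undefined }): string {
  return parseMode(query) === "head" ? extractHead(html) : html;
}

const samplePage =
  '<html><head><meta property="og:title" content="Demo"></head><body><header>Hi</header></body></html>';
const forSharingBot = respond(samplePage, { mode: "head" });
const forSearchBot = respond(samplePage, {});
```

With this sketch, `forSharingBot` contains just the `<head>` element and `forSearchBot` is the complete document.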

Background: I landed on this repo looking for a solution to our use case: replacing PhantomJS with headless Chrome in a service responsible for serving the rendered HTML DOM of a provided URL. We use this service to serve our Single Page Application (AngularJS) URLs to search engine bots.


yadvendar commented on May 22, 2024

I have done the same for our use case here. Please let me know your views and if I need to modify anything to fit into your contributing guidelines.


samuelli commented on May 22, 2024

I've suggested this before, but the benefits seem questionable. Performance and size are not typically a concern with any of these bots. This solution already requires differential serving, so having to do another check for sharing/search adds a little bit of extra complexity.

My thoughts at this stage are to serve full content to all of these agents until there is a demonstrable case for why this would be needed. It seems nice in theory, but the practical implications are unclear at this stage.


samuelli commented on May 22, 2024

Closing this as there is a basic cache layer implemented now.
