I have a notion that I'd like to add a cache layer that checks to see if the site has

I have done the same for our use case <a href="https://github.com/yadvendar/bot-render

[thought] cache layer and return about rendertron HOT 9 CLOSED

googlechrome commented on May 22, 2024

[thought] cache layer and return

from rendertron.

Comments (9)

samuelli commented on May 22, 2024

Yep, this is all stuff I wanted to add but haven't quite had the time to work on yet.

Definitely planning a pretty complex caching layer. Given that rendering can be pretty slow, I'm thinking some combination of parameters in the request (eg. time before stale, time for refresh) that result in a queued render.

eg. you request resource A (stale = 6 days, refresh = 1 day). If it was cached 2 days ago, you would serve the cached render and queue a render to update the cache (since refresh was requested at 1 day).
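A minimal sketch of the stale/refresh decision described above, assuming the two values arrive as durations in milliseconds. The function name, the type, and the three outcomes are hypothetical illustrations, not rendertron's actual API.

```typescript
// Hypothetical names throughout; none of this is rendertron's actual API.
type CacheDecision = "serve-fresh" | "serve-and-requeue" | "render-now";

const DAY_MS = 24 * 60 * 60 * 1000;

// ageMs: how long ago the entry was cached, or null when nothing is cached.
// staleMs: maximum age at which a cached render may still be served.
// refreshMs: age after which a background re-render should be queued.
function decide(ageMs: number | null, staleMs: number, refreshMs: number): CacheDecision {
  if (ageMs === null || ageMs > staleMs) {
    return "render-now"; // nothing usable in the cache
  }
  if (ageMs > refreshMs) {
    return "serve-and-requeue"; // serve the cached copy, refresh in the background
  }
  return "serve-fresh";
}

// The example from the comment: stale = 6 days, refresh = 1 day, cached 2 days ago.
console.log(decide(2 * DAY_MS, 6 * DAY_MS, 1 * DAY_MS)); // "serve-and-requeue"
```

Keeping the decision separate from the queueing means the same function could sit in front of cron, task queues, or an in-process queue.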

No opinions on cloud/redis. What are the storage limits for values in both?

More generally, the whole bot-render needs to be architected properly to scale properly.


samuelli commented on May 22, 2024

Memcached or datastore?
Memcached is a distributed memory object caching system which allows for rapid reads/writes. Google Cloud Datastore is a NoSQL, scaling datastore.

Size: The gzipped size of a sample page on webcomponents.org was 30KB. The Redis free tier only supports 30MB, which means we’re limited to storing 1000 pages in the cache. For any reasonable service size, this would result in a high cache miss rate.
Performance: memcached is an order of magnitude faster than Cloud Datastore. However, a cache miss is significantly more costly, since we’ll have to regenerate the result, which is pretty slow given the headless rendering times.

Datastore is the more appropriate fit for what we need, which is a large storage system where read performance is not critical.
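The capacity figure above is simple arithmetic; a small sketch, assuming decimal units (1 MB = 1000 KB) and a hypothetical helper name:

```typescript
// Hypothetical helper; the figures are the ones quoted in the comment above.
function maxCachedPages(cacheLimitMb: number, pageSizeKb: number): number {
  // Assumes decimal units: 1 MB = 1000 KB.
  return Math.floor((cacheLimitMb * 1000) / pageSizeKb);
}

console.log(maxCachedPages(30, 30)); // 1000
```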


justinribeiro commented on May 22, 2024

The queued render is a good idea; there is a pretty clean path to this on the existing GCP side (cron/task queues), although we could architect something different. As you note, there's some complexity there via time before stale and whatnot, but it's doable. :-)

> No opinions on cloud/redis. What are the storage limits for values in both?

Text long has a 1MB limit in GAE datastore (non-indexed), String in Redis has a 512MB limit.

> More generally, the whole bot-render needs to be architected properly to scale properly.

If you've got a doc lying around, happy to contribute however I can.


samuelli commented on May 22, 2024

Unrelated, why is only returning head awesome? That's planned to be changed to return the whole (serialised) response. Ref: https://github.com/justinribeiro/pwa-firebase-functions-botrender/blob/master/functions/index.js#L67

1MB should be fine... if we use compression.

I sent you the doc of what I have so far, but it's mostly around what needs to be done and less about architectural design. I haven't spent too much time thinking about architecture, but I'm expecting it to be mostly pretty layered/modular.

Caching is one layer, rendering another, queues another etc. I'd want to extract/publish a few node modules as well. I'd also like to do some pretty heavy load testing too.
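One way the layering could look, as a rough sketch: each layer implements the same interface, so a cache can wrap a renderer without either knowing the other's internals. Every name here (`Renderer`, `FakeRenderer`, `CachingRenderer`) is hypothetical and not taken from the project.

```typescript
// All names are hypothetical; this is not rendertron's actual module layout.
interface Renderer {
  render(url: string): Promise<string>;
}

// Rendering layer stand-in: a real one would drive headless Chrome.
class FakeRenderer implements Renderer {
  calls = 0;
  async render(url: string): Promise<string> {
    this.calls += 1;
    return `<html><!-- rendered ${url} --></html>`;
  }
}

// Caching layer: wraps any Renderer and exposes the same interface.
class CachingRenderer implements Renderer {
  private readonly store = new Map<string, string>();
  constructor(private readonly inner: Renderer) {}

  async render(url: string): Promise<string> {
    const hit = this.store.get(url);
    if (hit !== undefined) {
      return hit; // cache hit: the inner renderer is not called
    }
    const html = await this.inner.render(url);
    this.store.set(url, html);
    return html;
  }
}

// Renders the same URL twice and reports how often the inner layer ran.
async function demo(): Promise<number> {
  const base = new FakeRenderer();
  const cached = new CachingRenderer(base);
  await cached.render("https://example.com/");
  await cached.render("https://example.com/");
  return base.calls;
}

demo().then((calls) => console.log(calls)); // 1
```

A queue layer could follow the same pattern by wrapping a `Renderer` and deferring the inner call.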


justinribeiro commented on May 22, 2024

> Unrelated, why is only returning head awesome? That's planned to be changed to return the whole (serialised) response. Ref: https://github.com/justinribeiro/pwa-firebase-functions-botrender/blob/master/functions/index.js#L67

It's awesome because when I tested this last year with headless_shell and some other AppEngine python things, I found that you can actually return even less to most link sharing bots (they really just love those open graph tags). Cut down the response to bare bones data, speed up where item is being shared. Now, should you only send the <head>? Probably not. :-) Basically playing "pffft who needs valid data" in which case the upcoming change I'm fine with (makes sense to me). I think I'll put a note in there about that.

> I sent you the doc of what I have so far, but it's mostly around what needs to be done and less about architectural design. I haven't spent too much time thinking about architecture, but I'm expecting it to be mostly pretty layered/modular.

Cool. I'll have a look and try not to pester you (I know you're busy).


yadvendar commented on May 22, 2024

I am sorry this is unrelated to the mentioned issue but I am curious and have a query.

> It's awesome because when I tested this last year with headless_shell and some other AppEngine python things, I found that you can actually return even less to most link sharing bots (they really just love those open graph tags).

@justinribeiro @samuelli Can we not have both the functionalities in bot-render?

Send limited amount of data for social sharing/link sharing bots (taking Open Graph tags into account).
Dump complete HTML DOM for search engine bots.

Probably, control this by having an additional query param in the existing route.
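A rough sketch of what such a switch might look like. The `mode` parameter name and its values are assumptions made up for illustration, not an existing bot-render option, and the regex-based `<head>` extraction is a simplification of real HTML handling.

```typescript
// Hypothetical parameter name and values; not an existing bot-render option.
type RenderMode = "full" | "head";

// Reads the mode from a query string such as "url=...&mode=head".
function parseMode(query: string): RenderMode {
  const match = /(?:^|&)mode=([^&]*)/.exec(query);
  return match !== null && match[1] === "head" ? "head" : "full";
}

// Returns either the whole serialised document or only its <head> element.
function selectBody(html: string, mode: RenderMode): string {
  if (mode === "full") {
    return html;
  }
  const head = /<head(?:\s[^>]*)?>[\s\S]*?<\/head>/i.exec(html);
  return head !== null ? head[0] : html; // fall back to the full document
}

const page =
  '<html><head><meta property="og:title" content="Demo"></head><body><p>Hi</p></body></html>';
console.log(selectBody(page, parseMode("mode=head")));
// <head><meta property="og:title" content="Demo"></head>
```

Defaulting to the full document when the parameter is absent or unrecognised keeps search engine bots on the complete DOM.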

Background: I landed on this repo looking for a solution to our use case - Replace PhantomJS by Chrome headless in a service responsible for serving a rendered HTML DOM of a provided URL. We use this service to serve our Single Page Application (AngularJS) URLs to search engine bots.


yadvendar commented on May 22, 2024

I have done the same for our use case here. Please let me know your views and if I need to modify anything to fit into your contributing guidelines.


samuelli commented on May 22, 2024

I've suggested this before, but the benefits seem questionable. Performance and size are not typically a concern with any of these bots. This solution already requires differential serving, so having to do another check for sharing/search adds a little bit of extra complexity.

My thoughts at this stage are to serve full content to all of these agents until there is a demonstrable case for why this would be needed. It seems nice in theory, but the practical implications are unclear at this stage.


samuelli commented on May 22, 2024

Closing this as there is a basic cache layer implemented now.

