Comments (9)
Yep, this is all stuff I wanted to add but haven't quite had the time to work on yet.
Definitely planning a pretty complex caching layer. Given that rendering can be pretty slow, I'm thinking of some combination of parameters in the request (e.g. time before stale, time before refresh) that results in a queued render.
E.g. you request resource A (stale = 6 days, refresh = 1 day), and if it was cached 2 days ago, you would be served the cached render and a render would be queued to update the cache (since a refresh was requested after 1 day).
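A minimal sketch of that decision, assuming the two request parameters are expressed as durations in milliseconds. All names here (`CachePolicy`, `decideCacheAction`, the action strings) are hypothetical and not taken from the project; treating a missing or stale entry as a blocking render is also an assumption.

```typescript
// Hypothetical names throughout; none of this is from the actual codebase.
type CacheAction = "serve-cached" | "serve-cached-and-queue-render" | "render-now";

interface CachePolicy {
  staleAfterMs: number;   // at or beyond this age the cached copy is not served
  refreshAfterMs: number; // at or beyond this age it is served, but a re-render is queued
}

const DAY_MS = 24 * 60 * 60 * 1000;

function decideCacheAction(
  cachedAtMs: number | undefined,
  nowMs: number,
  policy: CachePolicy
): CacheAction {
  // Nothing cached yet: assume we have to render before responding.
  if (cachedAtMs === undefined) return "render-now";
  const ageMs = nowMs - cachedAtMs;
  if (ageMs >= policy.staleAfterMs) return "render-now";
  if (ageMs >= policy.refreshAfterMs) return "serve-cached-and-queue-render";
  return "serve-cached";
}

// The example from the comment: stale = 6 days, refresh = 1 day, cached 2 days ago.
const policy: CachePolicy = { staleAfterMs: 6 * DAY_MS, refreshAfterMs: 1 * DAY_MS };
const now = Date.now();
console.log(decideCacheAction(now - 2 * DAY_MS, now, policy));
// prints "serve-cached-and-queue-render"
```

Keeping the decision as a pure function of timestamps would let the queueing mechanism (task queue, cron, in-process) be swapped without touching the policy.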
No opinions on cloud/redis. What are the storage limits for values in both?
More generally, the whole bot-render needs to be architected properly to scale properly.
from rendertron.
Memcached or datastore?
Memcached is a distributed memory object caching system which allows for rapid reads/writes. Google Cloud Datastore is a NoSQL, scaling datastore.
- Size: Gzip size of a sample page on webcomponents.org was 30KB. Redis free tier only supports 30MB which means we’re limited to storing 1000 pages in the cache. For any reasonable service size, this would result in a high cache miss rate.
- Performance: memcached is an order of magnitude faster than Cloud Datastore. However, a cache miss is significantly more costly since we’ll have to re-generate the result again which is pretty slow given the headless rendering times.
Datastore is the more appropriate fit for what we need, which is a large storage system where read performance is not critical.
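The capacity figure in the size bullet is simple division. As a quick check, assuming decimal units (1MB = 1000KB) and the 30KB sample page size quoted above; the helper name is made up for illustration:

```typescript
// Hypothetical helper; the figures come from the comparison above.
const KB = 1000;
const MB = 1000 * KB;

function pagesThatFit(cacheBytes: number, pageBytes: number): number {
  return Math.floor(cacheBytes / pageBytes);
}

// 30MB free tier, 30KB gzipped page.
console.log(pagesThatFit(30 * MB, 30 * KB)); // prints 1000
```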
from rendertron.
The queued render is a good idea; there is a pretty clean path to this on the existing GCP side (cron/task queues), although we could architect something different. As you note, there's some complexity there via time before stale and whatnot, but it's doable. :-)
> No opinions on cloud/redis. What are the storage limits for values in both?
Text long has a 1MB limit in GAE datastore (non-indexed); String in Redis has a 512MB limit.
> More generally, the whole bot-render needs to be architected properly to scale properly.
If you've got a doc lying around, happy to contribute however I can.
from rendertron.
Unrelated, why is only returning head awesome? That's planned to be changed to return the whole (serialised) response. Ref: https://github.com/justinribeiro/pwa-firebase-functions-botrender/blob/master/functions/index.js#L67
1MB should be fine... if we use compression.
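A rough sketch of what "use compression" could look like, assuming Node's built-in `zlib` and treating the 1MB figure from the thread as the per-value ceiling. The function names and the exact byte limit are assumptions for illustration, not the project's code.

```typescript
import { gzipSync, gunzipSync } from "zlib";

// Assumed ceiling based on the ~1MB figure quoted in the thread.
const MAX_VALUE_BYTES = 1_000_000;

// Returns the gzipped page, or null if it still exceeds the assumed limit.
function compressForCache(html: string): Buffer | null {
  const compressed = gzipSync(Buffer.from(html, "utf8"));
  return compressed.byteLength <= MAX_VALUE_BYTES ? compressed : null;
}

function decompressFromCache(value: Buffer): string {
  return gunzipSync(value).toString("utf8");
}

// A synthetic page that is well over 1MB uncompressed but highly repetitive.
const page =
  "<html><head><title>demo</title></head><body>" +
  "<p>hello</p>".repeat(200_000) +
  "</body></html>";
const stored = compressForCache(page);
console.log(Buffer.byteLength(page, "utf8"), stored ? stored.byteLength : "too large");
```

Real pages compress far less well than this synthetic one, so the null branch (skip caching or fall back to another store) still needs handling.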
I sent you the doc of what I have so far, but it's mostly around what needs to be done and less about architectural design. I haven't spent too much time thinking about architecture, but I'm expecting it to be mostly pretty layered/modular.
Caching is one layer, rendering another, queues another etc. I'd want to extract/publish a few node modules as well. I'd also like to do some pretty heavy load testing too.
from rendertron.
> Unrelated, why is only returning head awesome? That's planned to be changed to return the whole (serialised) response. Ref: https://github.com/justinribeiro/pwa-firebase-functions-botrender/blob/master/functions/index.js#L67
It's awesome because when I tested this last year with headless_shell and some other AppEngine python things, I found that you can actually return even less to most link sharing bots (they really just love those open graph tags). Cut down the response to bare bones data, speed up where item is being shared. Now, should you only send the <head>? Probably not. :-) Basically playing "pffft who needs valid data" in which case the upcoming change I'm fine with (makes sense to me). I think I'll put a note in there about that.
> I sent you the doc of what I have so far, but it's mostly around what needs to be done and less about architectural design. I haven't spent too much time thinking about architecture, but I'm expecting it to be mostly pretty layered/modular.
Cool. I'll have a look and try not to pester you (I know you're busy).
from rendertron.
I am sorry this is unrelated to the mentioned issue but I am curious and have a query.
> It's awesome because when I tested this last year with headless_shell and some other AppEngine python things, I found that you can actually return even less to most link sharing bots (they really just love those open graph tags).
@justinribeiro @samuelli Can we not have both the functionalities in bot-render?
- Send limited amount of data for social sharing/link sharing bots (taking Open Graph tags into account).
- Dump complete HTML DOM for search engine bots.
Probably, control this by having an additional query param in the existing route.
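To make the proposal concrete, here is a sketch of how such a switch might look. The query parameter name `mode`, its values, and both function names are invented for illustration; the thread only suggests "an additional query param". The regex-based `<head>` extraction is deliberately naive and would likely be replaced by a DOM-based approach in practice.

```typescript
// Hypothetical parameter and names; not part of any existing route.
type RenderMode = "full" | "head";

function parseRenderMode(requestUrl: string): RenderMode {
  const mode = new URL(requestUrl).searchParams.get("mode");
  // Default to the full DOM unless head-only output is explicitly requested.
  return mode === "head" ? "head" : "full";
}

function selectResponseBody(serializedHtml: string, mode: RenderMode): string {
  if (mode === "full") return serializedHtml;
  // Naive extraction of the first <head>...</head> element.
  const match = serializedHtml.match(/<head\b[\s\S]*?<\/head>/i);
  // Fall back to the full document if no <head> is found.
  return match ? match[0] : serializedHtml;
}

const rendered =
  '<html><head><meta property="og:title" content="Demo"></head>' +
  "<body><p>Full page content</p></body></html>";

console.log(selectResponseBody(rendered, parseRenderMode("https://example.com/render?url=x&mode=head")));
console.log(selectResponseBody(rendered, parseRenderMode("https://example.com/render?url=x")));
```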
Background: I landed on this repo looking for a solution to our use case - Replace PhantomJS by Chrome headless in a service responsible for serving a rendered HTML DOM of a provided URL. We use this service to serve our Single Page Application (AngularJS) URLs to search engine bots.
from rendertron.
I have done the same for our use case here. Please let me know your views and if I need to modify anything to fit into your contributing guidelines.
from rendertron.
I've suggested this before, but the benefit seems questionable. Performance and size are not typically a concern with any of these bots. This solution already requires differential serving, so having to do another check for sharing/search adds a little bit of extra complexity.
My thoughts at this stage are to serve full content to all of these agents until there is a demonstrable case for why this would be needed. It seems nice in theory, but the practical implications are unclear at this stage.
from rendertron.
Closing this as there is a basic cache layer implemented now.
from rendertron.