Comments (5)
@nehakansal blank pages that contain only a header with scripts are common when JavaScript execution is disabled, but that shouldn't be the case in undercrawler... I would check a few things:
- is this reproducible?
- do pages render fine in the browser?
- do pages render with a simple default splash script (you can use splash UI to test that)?
from undercrawler.
- Yes, it's reproducible, but from what I have seen it's not the same pages that are blank every time. I will double-check that.
- They render fine in a regular browser
- I will check on this.
Thanks for the pointers, @lopuhin .
from undercrawler.
- I double-checked, and it's not the same pages every time I run a crawl.
- I tried some of the URLs in the Splash UI and I see the same behavior: some URLs work and others don't. For the ones that don't, the rendered .png image is blank and the HTML string is incomplete.
Would you please try them in a Splash UI to see if you can spot a difference between a URL that works and one that doesn't? Or can you please guide me further on what to look for?
Here is a list of some of those URLs. Since the failures are intermittent, hopefully at least one of these will reproduce the problem if/when you test them.
- https://ada.com/conditions/hypertensive-retinopathy/
- https://ada.com/signs-of-burnout/
- https://ada.com/conditions/subluxation-and-dislocation-of-the-hip/
- https://ada.com/conditions/anal-fissure/
- https://ada.com/conditions/roseola-infantum/
- https://ada.com/conditions/fibromyalgia/
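To make "blank" less subjective when comparing results, here is a minimal sketch of a heuristic that flags HTML with almost no visible text (for example, a page with only a header full of scripts). The name `looks_blank` and the `min_chars` threshold of 50 are assumptions chosen for illustration; they are not part of undercrawler or Splash.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text that is not inside script, style, noscript, or title tags."""

    SKIP = {"script", "style", "noscript", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside the skipped tags.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def looks_blank(html, min_chars=50):
    """Return True if the page has fewer than min_chars of visible text.

    min_chars is an arbitrary illustrative threshold; tune it per site.
    """
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return sum(len(chunk) for chunk in parser.chunks) < min_chars
```

Running this over the HTML returned for each URL would give a simple pass/fail per page, which is easier to compare across crawls than screenshots.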
Thanks!
from undercrawler.
@nehakansal sure, to test the page in splash UI you have to do the following:
- go to the splash URL with your browser (if you don't have a readily accessible splash, use
docker run -p 8050:8050 scrapinghub/splash
and go to http://localhost:8050) - you'll see something like this
from undercrawler.
Thanks, @lopuhin. You might have misunderstood my previous message. I was able to test a few URLs in the Splash UI. I was asking you to run them on your end to see if you get the same behavior and if the Splash UI stats give you any clue.
When I tested them in the Splash UI earlier, I hadn't noticed that I could change the wait time, so I ran them all with the default of 0.5, and with that some worked and others didn't. After reading your message, I changed the wait time to 5.0, and still some work and others don't.
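For anyone who wants to repeat this comparison outside the UI, here is a rough sketch that requests the same page through Splash's `render.html` HTTP endpoint with both wait values. It assumes a local Splash at http://localhost:8050 (as started by the docker command above); the helper names `render_url` and `html_sizes` and the `timeout` of 30 seconds are my own choices for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Splash is running locally, e.g. via
# docker run -p 8050:8050 scrapinghub/splash
SPLASH = "http://localhost:8050"


def render_url(page_url, wait, timeout=30, splash=SPLASH):
    """Build a render.html request URL for one page and one wait value."""
    query = urlencode({"url": page_url, "wait": wait, "timeout": timeout})
    return f"{splash}/render.html?{query}"


def html_sizes(page_url, waits=(0.5, 5.0)):
    """Fetch the rendered HTML once per wait value and return its size in bytes.

    Needs a reachable Splash instance; a much smaller size for one run
    suggests an incomplete render.
    """
    sizes = {}
    for wait in waits:
        with urlopen(render_url(page_url, wait)) as resp:
            sizes[wait] = len(resp.read())
    return sizes
```

Calling `html_sizes("https://ada.com/signs-of-burnout/")` several times in a row should show whether the failures depend on the wait value or are intermittent regardless of it.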
from undercrawler.