Comments (5)

lopuhin avatar lopuhin commented on August 10, 2024

@nehakansal blank pages that contain only a header with scripts are common when JavaScript execution is disabled, but in undercrawler it seems it shouldn't be disabled... I would check a few things:

  • is this reproducible?
  • do pages render fine in the browser?
  • do pages render with a simple default splash script (you can use splash UI to test that)?

from undercrawler.

nehakansal avatar nehakansal commented on August 10, 2024
  • Yes, it's reproducible, but from what I have seen it's not the same pages that are blank every time. I will double check on that.
  • They render fine in a regular browser
  • I will check on this.

Thanks for the pointers, @lopuhin .

nehakansal avatar nehakansal commented on August 10, 2024
  • I double checked, and it's not the same pages every time I run a crawl.
  • I tried some of the URLs in the Splash UI and I see the same behavior: some URLs work and others don't. For the ones that don't, the rendered .png image is blank and the HTML string is incomplete.

Would you please try them on a Splash UI to see if you can spot a difference between a url that works and one that doesn't? Or can you please guide me further on what to look for?
Here is a list of some of those urls, hopefully at least one of these will work if/when you test them.

Thanks!

lopuhin avatar lopuhin commented on August 10, 2024

@nehakansal sure, to test the page in splash UI you have to do the following:

  • go to the splash URL with your browser (if you don't have a readily accessible splash, use docker run -p 8050:8050 scrapinghub/splash and go to http://localhost:8050)
  • you'll see something like this

screenshot 2018-12-12 at 10 10 27

  • here you have a small splash script; you can change the URL from google.com to one of the above URLs and click the "Render me!" button

I did that for https://ada.com/conditions/hypertensive-retinopathy/: with the default 0.5 s wait I first got a good page and then a blank page, and with a 2.5 s wait I got a normal page. So it means that the page renders fine in splash, and the issue is that either some page elements might take longer to download, or there is something in the headless horseman script used in undercrawler which is causing issues.

screenshot 2018-12-12 at 10 30 04
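
To compare wait times outside the UI, the same check can be scripted against Splash's HTTP API. This is only a minimal sketch: it assumes a local Splash at http://localhost:8050 (as started by the docker command above) and uses the render.html endpoint with its url and wait parameters. The helper names render_html_url and fetch_html are made up for illustration and are not part of undercrawler.

```python
import urllib.parse
import urllib.request

# Assumption: Splash is reachable locally, e.g. started with
# docker run -p 8050:8050 scrapinghub/splash
SPLASH = "http://localhost:8050"


def render_html_url(page_url, wait, splash=SPLASH):
    """Build a Splash render.html request URL for page_url with the given wait (seconds)."""
    query = urllib.parse.urlencode({"url": page_url, "wait": wait})
    return f"{splash}/render.html?{query}"


def fetch_html(page_url, wait, splash=SPLASH, timeout=60):
    """Ask Splash to render page_url and return the resulting HTML as text."""
    request_url = render_html_url(page_url, wait, splash)
    with urllib.request.urlopen(request_url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    url = "https://ada.com/conditions/hypertensive-retinopathy/"
    for wait in (0.5, 2.5):
        html = fetch_html(url, wait)
        # A much shorter result for one wait value hints at an incomplete render.
        print(f"wait={wait}: {len(html)} characters of HTML")
```

Comparing the HTML length per wait value is a crude signal, but it is usually enough to tell a header-only page from a fully rendered one.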

nehakansal avatar nehakansal commented on August 10, 2024

Thanks, @lopuhin. You might have misunderstood my previous message. I was able to test a few URLs in the Splash UI. I was asking you to run them on your end to see if you get the same behavior and if the Splash UI stats give you any clue.

When I tested them in the Splash UI earlier, I hadn't noticed that I could change the wait time, so I ran them all with the default 0.5 s; with that, some worked and others didn't. After reading your message, I changed the wait time to 5.0 s, and still some work and others don't.
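
Since a longer wait does not remove the flakiness, one way to quantify it is to render the same URL several times per wait value and count how many results look blank. The following is a rough sketch, not a definitive test: it assumes a local Splash at http://localhost:8050 and its render.html endpoint, and it treats a page as "blank" when it has fewer than 50 visible text characters outside script/style/title tags. That threshold and the helper names (looks_blank, count_blank) are arbitrary choices for illustration.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts non-whitespace text characters outside script-like tags."""

    SKIP = {"script", "style", "noscript", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chars += len("".join(data.split()))


def looks_blank(html, min_chars=50):
    """Heuristic: a page with almost no visible text counts as blank."""
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.chars < min_chars


def count_blank(fetch, url, wait, attempts=5):
    """Render url `attempts` times via fetch(url, wait) and count blank results."""
    return sum(looks_blank(fetch(url, wait)) for _ in range(attempts))


def splash_fetch(url, wait, splash="http://localhost:8050"):
    """Fetch rendered HTML from Splash's render.html endpoint (assumed local instance)."""
    query = urllib.parse.urlencode({"url": url, "wait": wait})
    with urllib.request.urlopen(f"{splash}/render.html?{query}", timeout=60) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    url = "https://ada.com/conditions/hypertensive-retinopathy/"
    for wait in (0.5, 5.0):
        blanks = count_blank(splash_fetch, url, wait)
        print(f"wait={wait}: {blanks} of 5 renders looked blank")
```

If the blank count stays similar for both wait values, the wait time alone is probably not the cause, which matches what was observed in the UI.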
