
Comments (2)

Kikobeats commented on June 2, 2024

Hello,

const result = await getHTML('https://www.bloomberg.com/', {
  headers: {
    'user-agent': 'googlebot'
  }
})

Bloomberg in particular validates that you are a human, so you need to figure out a way to bypass that validation 🙂

from html-get.

sidster-io commented on June 2, 2024

Hey @Kikobeats, that did not get the result I was looking for. The way I get around the human validation on Bloomberg is by adding .setUserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9"). Here is what my Puppeteer scraper looks like:

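    const puppeteer = require('puppeteer')

    // Function name is hypothetical: the header of the original snippet was cut off.
    async function scrapeWebpage(url) {
        console.log("🧹 Scrape Webpage " + url)
        let browser
        try {
            browser = await puppeteer.launch({ args: ['--no-sandbox'] })
            const page = await browser.newPage()
            // Present a desktop Safari user agent instead of Puppeteer's default one
            await page.setUserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9")
            await page.goto(url)
            // Await here so the browser is not closed before the content is read
            return await page.content()
        } catch (error) {
            // As in the original, the error is returned to the caller rather than thrown
            return error
        } finally {
            // Close the browser on both success and failure
            if (browser) await browser.close()
        }
    }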

I honestly don't know if this is the right way, but for some reason it works: if I remove the setUserAgent call, the request fails and I get asked for human validation. Try it out in your script when you get a chance. I also randomize the user agent so I don't get flagged 😄
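A minimal sketch of the user-agent randomization mentioned in the last comment, assuming a hand-maintained pool of strings. `USER_AGENTS` and `randomUserAgent` are hypothetical names, not part of html-get or Puppeteer; only the first pool entry comes from the thread, and the other two are illustrative strings that would need to be kept current. The picked string could then be passed to Puppeteer's `page.setUserAgent` or to the `headers` option of `getHTML` shown earlier.

```javascript
// Hypothetical pool of desktop user-agent strings. The first is the Safari
// string used in the thread; the others are illustrative examples only.
const USER_AGENTS = [
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
];

// Pick one entry uniformly at random. `rng` defaults to Math.random and must
// return a number in [0, 1); it can be replaced with a fixed function in tests.
function randomUserAgent(pool = USER_AGENTS, rng = Math.random) {
  if (pool.length === 0) {
    throw new Error("user-agent pool is empty");
  }
  const index = Math.floor(rng() * pool.length);
  return pool[index];
}

// Possible usage (not executed here):
//   await page.setUserAgent(randomUserAgent())
//   await getHTML(url, { headers: { 'user-agent': randomUserAgent() } })
```

Injecting `rng` keeps the selection deterministic under test. Keep in mind that this only changes one request header, so a site that checks other browser characteristics may still flag the session.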

