Comments (2)
Hello,
const result = await getHTML('https://www.bloomberg.com/', {
  headers: {
    'user-agent': 'googlebot'
  }
})
In particular, Bloomberg validates that you are a human, so you need to figure out a way to bypass the validation.
from html-get.
Hey @Kikobeats, that did not get the result I was looking for. The way I get around the human validation on Bloomberg is by calling .setUserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9") on the page. Here is how my Puppeteer scraper looks:
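const puppeteer = require('puppeteer')

// `scrape` is a placeholder name; the original function signature was not included in the comment.
async function scrape (url) {
  console.log("🧹 Scrape Webpage " + url)
  let browser
  try {
    browser = await puppeteer.launch({ args: ['--no-sandbox'] })
    const page = await browser.newPage()
    // Without a browser-like user agent, Bloomberg asks for human validation.
    await page.setUserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9")
    await page.goto(url)
    return await page.content()
  } catch (error) {
    // The error is returned rather than thrown, so callers must check the result type.
    return error
  } finally {
    // Close the browser even when navigation fails, so no Chromium process is left behind.
    if (browser) await browser.close()
  }
}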
I honestly don't know if this is the right way, but for some reason it works: if I remove the setUserAgent call, the request fails and Bloomberg asks for human validation. When you get a chance, try it out in your script. I also randomize the user agent so I don't get flagged.
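The comment mentions randomizing the user agent but does not show how. A minimal sketch of one way to do it follows. The `USER_AGENTS` pool and the `pickUserAgent` helper are hypothetical names, not part of html-get or Puppeteer, and only the first string in the pool comes from the comment above; the other two are illustrative assumptions.

```javascript
// Hypothetical pool of user-agent strings. The first entry is the one quoted
// in the comment; the remaining entries are illustrative assumptions.
const USER_AGENTS = [
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0'
]

// Pick one entry uniformly at random.
// `random` is injectable so the choice can be made deterministic in tests.
function pickUserAgent (agents = USER_AGENTS, random = Math.random) {
  if (agents.length === 0) throw new Error('agents must not be empty')
  return agents[Math.floor(random() * agents.length)]
}
```

In the scraper above, this would presumably replace the hard-coded string, for example `await page.setUserAgent(pickUserAgent())` before `page.goto(url)`. Whether rotation actually avoids being flagged depends on the target site and is not something this sketch guarantees.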