emadehsan / thal
Getting started with Puppeteer and Chrome Headless for Web Scraping
Home Page: https://emadehsan.com
License: MIT License
The page gets stuck after logging in to GitHub. After 30 seconds of waiting with nothing happening, I get this error message:
(node:24805) UnhandledPromiseRejectionWarning: TimeoutError: Navigation timeout of 30000 ms exceeded
at Promise.then (/home/loia5tqd001/Desktop/thal/node_modules/puppeteer/lib/LifecycleWatcher.js:142:21)
-- ASYNC --
at Frame.<anonymous> (/home/loia5tqd001/Desktop/thal/node_modules/puppeteer/lib/helper.js:111:15)
at Page.waitForNavigation (/home/loia5tqd001/Desktop/thal/node_modules/puppeteer/lib/Page.js:690:49)
at Page.<anonymous> (/home/loia5tqd001/Desktop/thal/node_modules/puppeteer/lib/helper.js:112:23)
at run (/home/loia5tqd001/Desktop/thal/index.js:30:14)
at process._tickCallback (internal/process/next_tick.js:68:7)
(node:24805) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:24805) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
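One common cause of this symptom (not confirmed in this thread) is calling page.waitForNavigation() only after the click that triggers the navigation. If the navigation finishes first, the wait never sees it and runs until the 30000 ms default. The sketch below starts the wait and the click together with Promise.all and catches the timeout instead of leaving it unhandled. The helper name clickAndWaitForNavigation and the 60000 ms default are assumptions; page.click(selector) and page.waitForNavigation({ timeout }) are standard Puppeteer Page methods.

```javascript
// Hypothetical helper: click `selector` and wait for the navigation it triggers.
// `page` is a Puppeteer Page (only page.click and page.waitForNavigation are used).
async function clickAndWaitForNavigation(page, selector, timeoutMs = 60000) {
  try {
    // waitForNavigation is called first, so its listener is in place
    // before the click fires and a fast navigation cannot be missed.
    await Promise.all([
      page.waitForNavigation({ timeout: timeoutMs }),
      page.click(selector),
    ]);
    return true;
  } catch (err) {
    // Report the failure instead of producing an unhandled rejection.
    console.error(`Navigation after clicking ${selector} failed: ${err.message}`);
    return false;
  }
}
```

In index.js this would replace a separate click followed by waitForNavigation after the login button is pressed, passing whatever button selector the script already uses.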
I cloned the repository, ran npm install, and then ran index.js with Code Runner (which is similar to running node index.js).
When I run index.js, I get this error:
node index.js
(node:2199) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Error: Navigation Timeout Exceeded: 30000ms exceeded
(node:2199) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
env:
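The UnhandledPromiseRejectionWarning in this output appears because the script's async entry function rejects and nothing catches the rejection. Independently of why the navigation times out, a sketch like the following attaches a handler to the top-level promise so the real error is printed and the process exits with a non-zero code. runSafely and its argument main are hypothetical names; in index.js you would pass the script's own async entry function.

```javascript
// Hypothetical wrapper: run an async entry point and never leave its
// promise unhandled. Resolves to the exit code it sets.
function runSafely(main) {
  return Promise.resolve()
    .then(main) // also catches a synchronous throw inside main
    .then(
      () => 0,
      (err) => {
        // Print the actual error instead of Node's UnhandledPromiseRejectionWarning.
        console.error('Scraper failed:', err instanceof Error ? err.message : err);
        return 1;
      }
    )
    .then((code) => {
      process.exitCode = code; // non-zero tells the shell the run failed
      return code;
    });
}
```

Setting process.exitCode rather than calling process.exit() lets pending work (for example closing the browser) finish before Node exits.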
Here is a Chinese version.
Hello,
We do e-commerce web scraping and found the library a perfect starting point. Could you tell us how we can integrate amazon.com, best.com and ebay.com with this scraper?
We would love to use it in production.
Thanks in advance,
Rahul
For example
let listLength = await page.evaluate((sel) => {
return document.getElementsByClassName(sel).length;
}, LENGTH_SELECTOR_CLASS);
Where does document come from?
I don't want to rely on implicit imports, only explicit ones...
Thanks!
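document needs no import because the function passed to page.evaluate never runs in Node. Puppeteer serializes it to source text and executes it inside the browser page, where document is the page's own global. This is also why the function cannot see variables from the surrounding Node scope, so values like LENGTH_SELECTOR_CLASS are passed in as arguments. The sketch below imitates that round trip with Node's built-in vm module; evaluateLikePuppeteer, fakeDocument and the class name are hypothetical stand-ins, and the real page.evaluate sends the function to Chrome over the DevTools protocol rather than using vm.

```javascript
const vm = require('vm');

// Imitates page.evaluate(fn, arg): the function travels as *source text*
// and runs in another global scope, so it only sees that scope's globals.
function evaluateLikePuppeteer(pageGlobals, fn, arg) {
  const source = `(${fn.toString()})(${JSON.stringify(arg)})`;
  return vm.runInNewContext(source, pageGlobals);
}

// Hypothetical stand-in for the browser's document: three matching elements.
const fakeDocument = {
  getElementsByClassName: (cls) => (cls === 'user-list-item' ? [{}, {}, {}] : []),
};

const LENGTH_SELECTOR_CLASS = 'user-list-item'; // assumed class name

// `document` here resolves inside the vm context, not in Node.
const listLength = evaluateLikePuppeteer(
  { document: fakeDocument },
  (sel) => document.getElementsByClassName(sel).length,
  LENGTH_SELECTOR_CLASS
);
console.log(listLength); // 3
```

Calling the arrow function directly in Node would throw ReferenceError: document is not defined, since Node has no such global.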
At the Extract Emails step of your tutorial, when I copy the username selector in DevTools, this is what I get:
#user_search_results > div.user-list > div:nth-child(1) > div.d-flex > div > a > em
Note the em at the end. If you use this selector, the loop doesn't work; you have to change it to #user_search_results > div.user-list > div:nth-child(1) > div.d-flex > div > a for it to run.
Do you know why this might be happening?
Great tutorial. Thank you.
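A likely explanation (not confirmed in the thread): GitHub's search results wrap the part of a username that matches the query in an em element for highlighting, so a selector copied from DevTools can end in > em. That element exists only in rows where the match falls in the username, and even there it holds only the highlighted fragment, so a per-row loop breaks. The sketch builds one selector per result row from the link-level selector in the question and defensively strips a trailing > em. The INDEX placeholder and the helper names stripHighlight and usernameSelectorFor are hypothetical.

```javascript
// Username-link selector for one result row; INDEX is replaced per row.
// Taken from the working selector in the question, without the trailing "> em".
const LIST_USERNAME_SELECTOR =
  '#user_search_results > div.user-list > div:nth-child(INDEX) > div.d-flex > div > a';

// Drop a trailing "> em" (the search-highlight element) copied from DevTools.
function stripHighlight(selector) {
  return selector.replace(/\s*>\s*em\s*$/, '');
}

// Build the selector for the i-th row (CSS :nth-child is 1-based).
function usernameSelectorFor(i, template = LIST_USERNAME_SELECTOR) {
  return stripHighlight(template).replace('INDEX', String(i));
}
```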
In ES6, it's idiomatic to use const when a variable binding doesn't change. Therefore, most let bindings in the README should be const, right?
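Yes, with one caveat: const only prevents rebinding the name. It does not make the value immutable, and a classic for-loop counter that is reassigned with i++ still needs let. A small sketch of those rules; the selector string and the email address are placeholders.

```javascript
const LENGTH_SELECTOR_CLASS = 'user-list-item'; // never reassigned, so const

// Reassigning a const binding throws a TypeError at runtime.
function tryReassign() {
  try {
    // eslint-disable-next-line no-const-assign
    LENGTH_SELECTOR_CLASS = 'other';
    return 'reassigned';
  } catch (err) {
    return err.constructor.name;
  }
}

// const does not freeze objects: their contents can still change.
const emails = [];
emails.push('someone@example.com');

// A classic for-loop counter is reassigned (i++), so it needs let.
const rows = [];
for (let i = 1; i <= 3; i++) {
  rows.push(i);
}
```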
Hi,
I'm a new user of your module. I correctly installed Node and MongoDB.
I did this:
git clone https://github.com/emadehsan/thal.git
cd thal
npm install
=> modules are correctly installed
I don't know how to run it. Can you tell me, please?
When I ran npm test, I got this error:
> [email protected] test /root/puppeteer/thal
> echo "Error: no test specified" && exit 1
Error: no test specified
npm ERR! Test failed. See above for more details.
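npm test only runs the "test" entry in package.json's scripts. The output above is the placeholder that npm init generates, which prints "Error: no test specified" and exits with code 1; it does not start the scraper. One option, assuming you want an npm shortcut (this start script is an addition, not something the output above shows the repository shipping), is a scripts section like this fragment:

```json
{
  "scripts": {
    "start": "node index.js",
    "test": "echo \"Error: no test specified\" && exit 1"
  }
}
```

With that in place, npm start runs the same command as node index.js.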
When I tried node index.js, I got this error:
module.js:491
throw err;
^
Error: Cannot find module './creds'
at Function.Module._resolveFilename (module.js:489:15)
at Function.Module._load (module.js:439:25)
at Module.require (module.js:517:17)
at require (internal/module.js:11:18)
at Object.<anonymous> (/root/puppeteer/thal/index.js:2:15)
at Module._compile (module.js:573:30)
at Object.Module._extensions..js (module.js:584:10)
at Module.load (module.js:507:32)
at tryModuleLoad (module.js:470:12)
at Function.Module._load (module.js:462:3)
Thanks
nodejs index.js
(node:13914) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Error: Evaluation failed: TypeError: Cannot read property 'innerHTML' of null
at :2:43
(node:13914) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
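"Cannot read property 'innerHTML' of null" means the selector used inside the evaluated function matched nothing on the page at that moment, for example because the page markup changed or had not finished loading. A defensive sketch, assuming a Puppeteer page: look the element up first with page.$, which resolves to null when nothing matches, and only then read innerHTML. The helper name safeInnerHTML is hypothetical.

```javascript
// Returns the element's innerHTML, or null when the selector matches nothing,
// instead of throwing "Cannot read property 'innerHTML' of null".
async function safeInnerHTML(page, selector) {
  const handle = await page.$(selector); // ElementHandle or null
  if (handle === null) {
    return null;
  }
  try {
    // Element handles can be passed into page.evaluate as arguments.
    return await page.evaluate((el) => el.innerHTML, handle);
  } finally {
    await handle.dispose(); // release the browser-side reference
  }
}
```

The caller can then skip or log rows whose result is null rather than crashing the whole run.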
Firstly, great work on the tutorial!
I've noticed you may have misspelled your LENGHT_SELECTOR_CLASS variable. It should be LENGTH_SELECTOR_CLASS.
I have a list of PDF links that I need to download after scraping with Puppeteer, but page.pdf() doesn't seem to work.
Any suggestions?
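page.pdf() renders the currently open page into a new PDF; it does not download the files that links point to. Once the PDF URLs have been scraped, one option is to fetch them directly from Node. A sketch assuming Node 18+ (for the global fetch) and links that need no login cookies; downloadAll and fileNameFromUrl are hypothetical helpers.

```javascript
const fs = require('fs/promises');
const path = require('path');

// Derive a local file name from the last path segment of a URL.
function fileNameFromUrl(url) {
  const base = path.posix.basename(new URL(url).pathname);
  return base || 'download.pdf'; // fallback when the URL path is just "/"
}

// Download each URL sequentially into `dir` (created if missing).
// Assumes Node 18+ (global fetch) and publicly reachable links.
async function downloadAll(urls, dir) {
  await fs.mkdir(dir, { recursive: true });
  const saved = [];
  for (const url of urls) {
    const res = await fetch(url);
    if (!res.ok) {
      throw new Error(`HTTP ${res.status} while downloading ${url}`);
    }
    const target = path.join(dir, fileNameFromUrl(url));
    await fs.writeFile(target, Buffer.from(await res.arrayBuffer()));
    saved.push(target);
  }
  return saved;
}
```

Downloading one file at a time is slower than firing all requests at once, but it is gentler on the remote server. If the links sit behind a login, the session cookies from the Puppeteer page would also have to be sent along, which this sketch does not do.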
I believe two of my colleagues already left a comment on the Medium post with this information.
But you don't need JSDOM for text extraction. You can use the $ method instead, which should make this a lot simpler.
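The comment does not include code, so this is one reading of the suggestion: Puppeteer's $-family methods, such as page.$$eval, run a function in the page over every element matching a selector, so text can be read straight from the DOM without parsing HTML in Node with JSDOM. The helper name extractTexts is hypothetical.

```javascript
// Collect the trimmed text of every element matching `selector`.
// The callback runs inside the browser, so it receives real DOM elements.
async function extractTexts(page, selector) {
  return page.$$eval(selector, (elements) =>
    elements.map((el) => el.textContent.trim())
  );
}
```

Because the mapping happens in the browser, only the resulting array of strings is sent back to Node.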