lqqyt2423 / wechat_spider

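WeChat crawler: fetches article content, read counts, like counts, comments, and more, and collects links to all historical articles of an official account.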

License: MIT License

JavaScript 95.93% HTML 3.50% CSS 0.17% Dockerfile 0.40%

wechat_spider's People

Contributors

adobj, lqqyt2423

wechat_spider's Issues

will forward to local https server, followed by a network error

received https CONNECT request p54-keyvalueservice.icloud.com
==>will bypass the man-in-the-middle proxy

received https CONNECT request mp.weixin.qq.com
==>will forward to local https server
[internal https]proxy server for mp.weixin.qq.com established

received https CONNECT request mp.weixin.qq.com
==>will forward to local https server
[internal https]proxy server for mp.weixin.qq.com established

received https CONNECT request mp.weixin.qq.com
==>will forward to local https server

After that nothing happens, and WeChat shows a network error.

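Could blacklist and whitelist modes be added?

When the proxy is left running permanently, it also crawls official accounts that I am only browsing. I would like to see these added:

  • Blacklist: crawl every account except the IDs on the list.
  • Whitelist: crawl only the IDs on the list and nothing else.

Thanks.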
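
The requested behaviour could be sketched roughly as follows. This is only an illustration of the two modes: the function name `shouldCrawl`, the `mode` values, and the sample account IDs are hypothetical and are not part of wechat_spider.

```javascript
// Hypothetical sketch: decide whether an official account should be crawled.
// mode 'blacklist' -> crawl everything except the listed IDs
// mode 'whitelist' -> crawl only the listed IDs
// any other mode   -> crawl everything (the behaviour described in the issue)
function shouldCrawl(accountId, mode, ids) {
  const listed = new Set(ids).has(accountId);
  if (mode === 'blacklist') return !listed;
  if (mode === 'whitelist') return listed;
  return true;
}

console.log(shouldCrawl('gh_aaa', 'blacklist', ['gh_aaa'])); // false
console.log(shouldCrawl('gh_bbb', 'blacklist', ['gh_aaa'])); // true
console.log(shouldCrawl('gh_aaa', 'whitelist', ['gh_aaa'])); // true
console.log(shouldCrawl('gh_bbb', 'whitelist', ['gh_aaa'])); // false
```

A check like this would presumably run before anything is written to the database, so that accounts that are only being browsed are skipped early.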

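Crashes with an error... [screenshot at the bottom]

After running for a while, this error appears and the process crashes and exits. It happens with both Android and iPhone: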

/data/wechat_spider/node_modules/brotli/build/encode.js:3
1<process.argv.length?process.argv[1].replace(/\\/g,"/"):"unknown-program");b.arguments=process.argv.slice(2);"undefined"!==typeof module&&(module.exports=b);process.on("uncaughtException",function(a){if(!(a instanceof y))throw a;});b.inspect=function(){return"[Emscripten Module object]"}}else if(x)b.print||(b.print=print),"undefined"!=typeof printErr&&(b.printErr=printErr),b.read="undefined"!=typeof read?read:function(){throw"no read() available (jsc?)";},b.readBinary=function(a){if("function"===
                                                                                                                                                                                                                              ^

Error: read ECONNRESET
    at TCP.onread (net.js:660:25)
npm ERR! code ELIFECYCLE
npm ERR! errno 7
npm ERR! [email protected] start: `node index.js`
npm ERR! Exit status 7
npm ERR! 
npm ERR! Failed at the [email protected] start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2018-08-30T20_50_21_381Z-debug.log

The contents of 2018-08-30T20_50_21_381Z-debug.log are as follows:

0 info it worked if it ends with ok
1 verbose cli [ '/usr/bin/node', '/usr/bin/npm', 'start' ]
2 info using [email protected]
3 info using [email protected]
4 verbose run-script [ 'prestart', 'start', 'poststart' ]
5 info lifecycle [email protected]~prestart: [email protected]
6 info lifecycle [email protected]~start: [email protected]
7 verbose lifecycle [email protected]~start: unsafe-perm in lifecycle true
8 verbose lifecycle [email protected]~start: PATH: /usr/lib/node_modules/npm/node_modules/npm-lifecycle/node-gyp-bin:/data/wechat_spider/node_modules/.bin:/usr/local/jdk/jdk1.8.0_151/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
9 verbose lifecycle [email protected]~start: CWD: /data/wechat_spider
10 silly lifecycle [email protected]~start: Args: [ '-c', 'node index.js' ]
11 silly lifecycle [email protected]~start: Returned: code: 7  signal: null
12 info lifecycle [email protected]~start: Failed to exec start script
13 verbose stack Error: [email protected] start: `node index.js`
13 verbose stack Exit status 7
13 verbose stack     at EventEmitter.<anonymous> (/usr/lib/node_modules/npm/node_modules/npm-lifecycle/index.js:304:16)
13 verbose stack     at EventEmitter.emit (events.js:182:13)
13 verbose stack     at ChildProcess.<anonymous> (/usr/lib/node_modules/npm/node_modules/npm-lifecycle/lib/spawn.js:55:14)
13 verbose stack     at ChildProcess.emit (events.js:182:13)
13 verbose stack     at maybeClose (internal/child_process.js:961:16)
13 verbose stack     at Process.ChildProcess._handle.onexit (internal/child_process.js:250:5)
14 verbose pkgid [email protected]
15 verbose cwd /data/wechat_spider
16 verbose Linux 3.10.0-693.2.2.el7.x86_64
17 verbose argv "/usr/bin/node" "/usr/bin/npm" "start"
18 verbose node v10.9.0
19 verbose npm  v6.2.0
20 error code ELIFECYCLE
21 error errno 7
22 error [email protected] start: `node index.js`
22 error Exit status 7
23 error Failed at the [email protected] start script.
23 error This is probably not a problem with npm. There is likely additional logging output above.
24 verbose exit [ 7, true ]

Screenshot:
[image: wx_spider_error]
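
The trace above suggests that the `ECONNRESET` reached the process as an uncaught exception: the `process.on("uncaughtException", ...)` handler visible in the brotli line rethrows any error it does not recognise. In Node.js, an `'error'` event emitted on an emitter that has no `'error'` listener is thrown. The sketch below illustrates that mechanism with a plain `EventEmitter` standing in for a socket. Which socket inside the proxy lacks a listener cannot be determined from this report, so the helper names here are purely illustrative.

```javascript
const { EventEmitter } = require('events');

// Stand-in for a network socket; a real net.Socket is also an EventEmitter.
function simulateReset(socket) {
  const err = new Error('read ECONNRESET');
  err.code = 'ECONNRESET';
  socket.emit('error', err);
}

// Without an 'error' listener, emit() throws the error itself.
function crashesWithoutListener() {
  const socket = new EventEmitter();
  try {
    simulateReset(socket);
    return false;
  } catch (err) {
    return err.code === 'ECONNRESET';
  }
}

// With a listener attached, the error is handled and nothing is thrown.
function survivesWithListener() {
  const socket = new EventEmitter();
  const seen = [];
  socket.on('error', (err) => seen.push(err.code));
  simulateReset(socket);
  return seen;
}

console.log(crashesWithoutListener()); // true
console.log(survivesWithListener());   // [ 'ECONNRESET' ]
```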

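After installing the certificate on the phone and proxying through port 8104, official account articles won't open

WeChat version: 7.0.3 (a recent release)
Android: 9.0.4
After configuring the proxy, the official account's article history page won't open, and tapping an article also shows a blank white screen.
The AnyProxy log reports that the connection was reset:
Error: read ECONNRESET
Is this caused by the WeChat version being too new? The project is great, thanks for sharing.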

2019-04-26T03:28:30.126713875Z [AnyProxy Log][2019-04-26 03:28:30]: received request to: POST res.imtt.qq.com/qbprobe/netprobe.txt?t=1556249309170
2019-04-26T03:28:30.136214045Z [AnyProxy Log][2019-04-26 03:28:30]: received request to: POST res.imtt.qq.com/qbprobe/netprobe.txt?t=1556249309114
2019-04-26T03:28:32.643456972Z [AnyProxy Log][2019-04-26 03:28:32]: received https CONNECT request play.googleapis.com
2019-04-26T03:28:32.681006912Z [AnyProxy Log][2019-04-26 03:28:32]: will forward to local https server
2019-04-26T03:28:33.288088594Z [AnyProxy Log][2019-04-26 03:28:33]: [internal https]proxy server for play.googleapis.com established
2019-04-26T03:28:33.318730911Z [AnyProxy ERROR][2019-04-26 03:28:33]: Error: read ECONNRESET
2019-04-26T03:28:38.803598413Z [AnyProxy ERROR][2019-04-26 03:28:38]: Error: connect ETIMEDOUT 203.119.215.254:35667
2019-04-26T03:28:42.781410959Z [AnyProxy Log][2019-04-26 03:28:42]: received https CONNECT request mobilenetworkscoring-pa.googleapis.com
2019-04-26T03:28:42.809838445Z [AnyProxy Log][2019-04-26 03:28:42]: will forward to local https server
2019-04-26T03:28:42.809873751Z [AnyProxy Log][2019-04-26 03:28:42]: [internal https]proxy server for mobilenetworkscoring-pa.googleapis.com established
2019-04-26T03:28:42.831666686Z [AnyProxy ERROR][2019-04-26 03:28:42]: Error: read ECONNRESET
2019-04-26T03:28:59.606489881Z [AnyProxy Log][2019-04-26 03:28:59]: received request to: GET www.noisyfox.cn/generate_204
2019-04-26T03:28:59.606519663Z [AnyProxy Log][2019-04-26 03:28:59]: received request to: GET conn1.oppomobile.com/generate_204
2019-04-26T03:28:59.606526340Z [AnyProxy Log][2019-04-26 03:28:59]: received request to: GET developers.google.cn/generate_204
2019-04-26T03:29:00.160849413Z [AnyProxy Log][2019-04-26 03:29:00]: received request to: POST res.imtt.qq.com/qbprobe/netprobe.txt?t=1556249339212
2019-04-26T03:29:01.247247368Z [AnyProxy Log][2019-04-26 03:29:01]: received request to: POST res.imtt.qq.com/qbprobe/netprobe.txt?t=1556249340272
2019-04-26T03:29:01.256491263Z [AnyProxy Log][2019-04-26 03:29:01]: received request to: POST res.imtt.qq.com/qbprobe/netprobe.txt?t=1556249340267
2019-04-26T03:29:01.256522681Z [AnyProxy Log][2019-04-26 03:29:01]: received request to: POST res.imtt.qq.com/qbprobe/netprobe.txt?t=1556249340273
2019-04-26T03:29:06.828507316Z [AnyProxy Log][2019-04-26 03:29:06]: received https CONNECT request mobilenetworkscoring-pa.googleapis.com
2019-04-26T03:29:06.857067239Z [AnyProxy Log][2019-04-26 03:29:06]: will forward to local https server
2019-04-26T03:29:06.857099576Z [AnyProxy Log][2019-04-26 03:29:06]: [internal https]proxy server for mobilenetworkscoring-pa.googleapis.com established
2019-04-26T03:29:06.898038874Z [AnyProxy ERROR][2019-04-26 03:29:06]: Error: read ECONNRESET
2019-04-26T03:29:13.832568637Z [AnyProxy Log][2019-04-26 03:29:13]: received https CONNECT request mp.weixin.qq.com
2019-04-26T03:29:13.841724915Z [AnyProxy Log][2019-04-26 03:29:13]: will forward to local https server
2019-04-26T03:29:13.841748897Z [AnyProxy Log][2019-04-26 03:29:13]: [internal https]proxy server for mp.weixin.qq.com established
2019-04-26T03:29:14.032441032Z [AnyProxy Log][2019-04-26 03:29:14]: received request to: GET wechatfe.github.io/vconsole/lib/vconsole.min.js?v=3.0.0.0
2019-04-26T03:29:31.244230881Z [AnyProxy Log][2019-04-26 03:29:31]: received request to: POST res.imtt.qq.com/qbprobe/netprobe.txt?t=1556249370291
2019-04-26T03:29:31.351350905Z [AnyProxy Log][2019-04-26 03:29:31]: received request to: POST res.imtt.qq.com/qbprobe/netprobe.txt?t=1556249370400
2019-04-26T03:30:03.796142529Z [AnyProxy ERROR][2019-04-26 03:30:03]: Error: connect ETIMEDOUT 203.119.215.254:35667
2019-04-26T03:30:03.796183261Z [AnyProxy ERROR][2019-04-26 03:30:03]: Error: This socket is closed
2019-04-26T03:30:03.796189790Z Error: This socket is closed
2019-04-26T03:30:03.796194141Z at Socket._writeGeneric (net.js:733:18)
2019-04-26T03:30:03.796198075Z at Socket._write (net.js:787:8)
2019-04-26T03:30:03.796201864Z at doWrite (_stream_writable.js:396:12)
2019-04-26T03:30:03.796205651Z at writeOrBuffer (_stream_writable.js:382:5)
2019-04-26T03:30:03.796209437Z at Socket.Writable.write (_stream_writable.js:290:11)
2019-04-26T03:30:03.796213253Z at Socket.write (net.js:711:40)
2019-04-26T03:30:03.796216994Z at /app/node_modules/anyproxy/lib/requestHandler.js:686:19
2019-04-26T03:30:03.796220846Z at Generator.next ()
2019-04-26T03:30:03.796225191Z at onFulfilled (/app/node_modules/co/index.js:65:19)
2019-04-26T03:30:03.796229036Z at
2019-04-26T03:30:03.796233008Z at process._tickCallback (internal/process/next_tick.js:189:7)
2019-04-26T03:30:05.409512632Z [AnyProxy Log][2019-04-26 03:30:05]: received https CONNECT request mobilenetworkscoring-pa.googleapis.com
2019-04-26T03:30:05.425946557Z [AnyProxy Log][2019-04-26 03:30:05]: will forward to local https server
2019-04-26T03:30:05.425991461Z [AnyProxy Log][2019-04-26 03:30:05]: [internal https]proxy server for mobilenetworkscoring-pa.googleapis.com established
2019-04-26T03:30:05.465077395Z [AnyProxy ERROR][2019-04-26 03:30:05]: Error: read ECONNRESET
2019-04-26T03:30:06.422225622Z [AnyProxy Log][2019-04-26 03:30:06]: received request to: POST sqimg.qq.com/qq_product_operations/nettest/index2.html?r=34359&mType=netdetect
2019-04-26T03:30:06.422266199Z [AnyProxy Log][2019-04-26 03:30:06]: received request to: POST sqimg.qq.com/qq_product_operations/nettest/index.html?r=57236&mType=netdetect
2019-04-26T03:30:12.931027060Z [AnyProxy Log][2019-04-26 03:30:12]: received request to: POST oth.eve.mdt.qq.com:8080/analytics/upload?rid=108774cc25ee75f0&sid=0bb20be38c36f8938e1f782acc3f051d
2019-04-26T03:30:14.179052470Z [AnyProxy Log][2019-04-26 03:30:14]: received https CONNECT request telemetry-in.battle.net
2019-04-26T03:30:14.191936241Z [AnyProxy Log][2019-04-26 03:30:14]: will forward to local https server
2019-04-26T03:30:14.515154665Z [AnyProxy Log][2019-04-26 03:30:14]: [internal https]proxy server for telemetry-in.battle.net established
2019-04-26T03:30:14.533253313Z [AnyProxy ERROR][2019-04-26 03:30:14]: Error: read ECONNRESET
2019-04-26T03:30:21.454988512Z [AnyProxy Log][2019-04-26 03:30:21]: received request to: POST oth.eve.mdt.qq.com:8080/analytics/upload?rid=8e3aad721715f234&sid=93c882e5f255ac9ef2e4b063fbbc6167
2019-04-26T03:30:55.501331602Z [AnyProxy Log][2019-04-26 03:30:55]: received https CONNECT request mobilenetworkscoring-pa.googleapis.com
2019-04-26T03:30:55.510262166Z [AnyProxy Log][2019-04-26 03:30:55]: will forward to local https server
2019-04-26T03:30:55.510299666Z [AnyProxy Log][2019-04-26 03:30:55]: [internal https]proxy server for mobilenetworkscoring-pa.googleapis.com established
2019-04-26T03:30:55.520385873Z [AnyProxy ERROR][2019-04-26 03:30:55]: Error: read ECONNRESET
2019-04-26T03:30:55.822799045Z [AnyProxy Log][2019-04-26 03:30:55]: received request to: GET www.google.cn/generate_204
2019-04-26T03:30:55.822832311Z [AnyProxy Log][2019-04-26 03:30:55]: received request to: GET www.noisyfox.cn/generate_204
2019-04-26T03:30:55.830259307Z [AnyProxy Log][2019-04-26 03:30:55]: received request to: GET conn2.oppomobile.com/generate_204
2019-04-26T03:30:56.376093723Z [AnyProxy Log][2019-04-26 03:30:56]: received https CONNECT request mobilenetworkscoring-pa.googleapis.com
2019-04-26T03:30:56.384066650Z [AnyProxy Log][2019-04-26 03:30:56]: received https CONNECT request android.clients.google.com
2019-04-26T03:30:56.393705227Z [AnyProxy Log][2019-04-26 03:30:56]: will forward to local https server
2019-04-26T03:30:56.393738585Z [AnyProxy Log][2019-04-26 03:30:56]: [internal https]proxy server for mobilenetworkscoring-pa.googleapis.com established
2019-04-26T03:30:56.411514634Z [AnyProxy ERROR][2019-04-26 03:30:56]: Error: read ECONNRESET
2019-04-26T03:30:56.432187735Z [AnyProxy Log][2019-04-26 03:30:56]: will forward to local https server
2019-04-26T03:30:56.623831916Z [AnyProxy Log][2019-04-26 03:30:56]: [internal https]proxy server for android.clients.google.com established
2019-04-26T03:30:56.641300337Z [AnyProxy ERROR][2019-04-26 03:30:56]: Error: read ECONNRESET
2019-04-26T03:31:53.926218997Z [AnyProxy Log][2019-04-26 03:31:53]: received request to: POST mazu.3g.qq.com/
2019-04-26T03:31:56.899228291Z [AnyProxy Log][2019-04-26 03:31:56]: received request to: POST log.tbs.qq.com/ajax?c=pu&tk=4b64865480e241c1957255c83be0f7d0a009f5d6eff6982e319aeef9467dd4596e25f009b49e830ca4130a91e1adbea0
2019-04-26T03:32:08.938755493Z [AnyProxy Log][2019-04-26 03:32:08]: received https CONNECT request android.clients.google.com
2019-04-26T03:32:08.949924088Z [AnyProxy Log][2019-04-26 03:32:08]: will forward to local https server
2019-04-26T03:32:08.949957057Z [AnyProxy Log][2019-04-26 03:32:08]: [internal https]proxy server for android.clients.google.com established
2019-04-26T03:32:08.962632604Z [AnyProxy ERROR][2019-04-26 03:32:08]: Error: read ECONNRESET
2019-04-26T03:32:28.949789680Z [AnyProxy Log][2019-04-26 03:32:28]: received https CONNECT request mobilenetworkscoring-pa.googleapis.com
2019-04-26T03:32:28.964018981Z [AnyProxy Log][2019-04-26 03:32:28]: will forward to local https server
2019-04-26T03:32:28.975869201Z [AnyProxy Log][2019-04-26 03:32:28]: [internal https]proxy server for mobilenetworkscoring-pa.googleapis.com established
2019-04-26T03:32:33.996770666Z [AnyProxy Log][2019-04-26 03:32:33]: received https CONNECT request mobilenetworkscoring-pa.googleapis.com
2019-04-26T03:32:34.010786129Z [AnyProxy Log][2019-04-26 03:32:34]: will forward to local https server
2019-04-26T03:32:34.010818716Z [AnyProxy Log][2019-04-26 03:32:34]: [internal https]proxy server for mobilenetworkscoring-pa.googleapis.com established
2019-04-26T03:32:34.024616125Z [AnyProxy ERROR][2019-04-26 03:32:34]: Error: read ECONNRESET
2019-04-26T03:32:46.993967838Z [AnyProxy Log][2019-04-26 03:32:46]: received https CONNECT request android.googleapis.com
2019-04-26T03:32:47.015119922Z [AnyProxy Log][2019-04-26 03:32:47]: will forward to local https server
2019-04-26T03:32:47.280536925Z [AnyProxy Log][2019-04-26 03:32:47]: [internal https]proxy server for android.googleapis.com established
2019-04-26T03:32:51.128234734Z [AnyProxy Log][2019-04-26 03:32:51]: received request to: POST sqimg.qq.com/qq_product_operations/nettest/index2.html?r=64800&mType=netdetect
2019-04-26T03:32:51.128291808Z [AnyProxy Log][2019-04-26 03:32:51]: received request to: POST sqimg.qq.com/qq_product_operations/nettest/index.html?r=64800&mType=netdetect
2019-04-26T03:33:24.368909082Z [AnyProxy Log][2019-04-26 03:33:24]: received request to: GET developers.google.cn/generate_204
2019-04-26T03:33:24.368942632Z [AnyProxy Log][2019-04-26 03:33:24]: received request to: GET conn2.oppomobile.com/generate_204
2019-04-26T03:33:24.368949642Z [AnyProxy Log][2019-04-26 03:33:24]: received request to: GET www.noisyfox.cn/generate_204
2019-04-26T03:33:34.271459873Z [AnyProxy Log][2019-04-26 03:33:34]: received https CONNECT request mobilenetworkscoring-pa.googleapis.com
2019-04-26T03:33:34.282937695Z [AnyProxy Log][2019-04-26 03:33:34]: will forward to local https server
2019-04-26T03:33:34.282984689Z [AnyProxy Log][2019-04-26 03:33:34]: [internal https]proxy server for mobilenetworkscoring-pa.googleapis.com established
2019-04-26T03:33:58.326337120Z [AnyProxy Log][2019-04-26 03:33:58]: received https CONNECT request mobilenetworkscoring-pa.googleapis.com
2019-04-26T03:33:58.341699363Z [AnyProxy Log][2019-04-26 03:33:58]: will forward to local https server
2019-04-26T03:33:58.341736521Z [AnyProxy Log][2019-04-26 03:33:58]: [internal https]proxy server for mobilenetworkscoring-pa.googleapis.com established
2019-04-26T03:33:58.358426254Z [AnyProxy ERROR][2019-04-26 03:33:58]: Error: read ECONNRESET
2019-04-26T03:34:04.905472777Z [AnyProxy Log][2019-04-26 03:34:04]: received https CONNECT request android.googleapis.com
2019-04-26T03:34:04.932094469Z [AnyProxy Log][2019-04-26 03:34:04]: will forward to local https server
2019-04-26T03:34:04.932128075Z [AnyProxy Log][2019-04-26 03:34:04]: [internal https]proxy server for android.googleapis.com established
2019-04-26T03:34:04.943257364Z [AnyProxy ERROR][2019-04-26 03:34:04]: Error: read ECONNRESET

Encountering Error: read ECONNRESET

When opening a file in WeChat, AnyProxy receives the requests, but loading fails:

[AnyProxy Log][2020-07-07 05:37:29]: [internal https]proxy server for mmbiz.qlogo.cn established
[AnyProxy Log][2020-07-07 05:37:30]: received request to: GET mp.weixin.qq.com/mp/jsreport?key=10&content=https%3A%2F%2Fmmbiz.qlogo.cn%2Fmmbiz_gif%2Fw0v5qeKiawmLAIRabzWTzTYb592Zh9qLG3724NZHXK4FlynU9WllibGpsMO0LfHR7IokKDFa5lib9PHg4fw3hf1Vw%2F640%3Fwx_fmt%3Dgif%26tp%3Dwxpic%26wxfrom%3D5%26wx_lazy%3D1%26wx_co%3D1%26retryload%3D2[]&r=0.1209877349412547
[AnyProxy Log][2020-07-07 05:37:30]: received https CONNECT request badjs.weixinbridge.com
[AnyProxy Log][2020-07-07 05:37:30]: will forward to local https server
[AnyProxy Log][2020-07-07 05:37:30]: [internal https]proxy server for badjs.weixinbridge.com established
[AnyProxy Log][2020-07-07 05:37:30]: received https CONNECT request mp.weixin.qq.com
[AnyProxy Log][2020-07-07 05:37:30]: will forward to local https server
[AnyProxy Log][2020-07-07 05:37:30]: [internal https]proxy server for mp.weixin.qq.com established
[AnyProxy Log][2020-07-07 05:37:30]: received https CONNECT request mp.weixin.qq.com
[AnyProxy Log][2020-07-07 05:37:30]: will forward to local https server
[AnyProxy Log][2020-07-07 05:37:30]: [internal https]proxy server for mp.weixin.qq.com established
[AnyProxy Log][2020-07-07 05:37:31]: received request to: GET dnet.mb.qq.com/rsp204
[AnyProxy Log][2020-07-07 05:37:34]: received request to: GET connectivitycheck.platform.hicloud.com/generate_204_909c6b4f-2e11-4cf2-ba89-b9bc4e9efe8f
[AnyProxy Log][2020-07-07 05:37:34]: received https CONNECT request connectivitycheck.platform.hicloud.com
[AnyProxy Log][2020-07-07 05:37:34]: will forward to local https server
[AnyProxy Log][2020-07-07 05:37:34]: [internal https]proxy server for connectivitycheck.platform.hicloud.com established
[AnyProxy Log][2020-07-07 05:37:34]: received request to: GET connectivitycheck.cbg-app.huawei.com/generate_204
[AnyProxy Log][2020-07-07 05:37:34]: received https CONNECT request connectivitycheck.cbg-app.huawei.com
[AnyProxy Log][2020-07-07 05:37:34]: will forward to local https server
[AnyProxy Log][2020-07-07 05:37:34]: [internal https]proxy server for connectivitycheck.cbg-app.huawei.com established
[AnyProxy ERROR][2020-07-07 05:37:34]: Error: read ECONNRESET

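Bug when fetching article content

While fetching article content, the crawler computes the remaining number of articles to crawl.
If articles are opened several times, this number keeps accumulating even though no official accounts have been added, so it is presumably caused by duplicate counting.

Workaround:
Correct the value of the key in Redis.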
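
One way to avoid this kind of duplicate counting is to track pending articles as a set of IDs rather than as a running counter, so that adding the same article twice has no effect. The sketch below uses an in-memory `Set` as a stand-in for Redis. The project's actual Redis keys and data structures are not shown in this report, so the class and method names are hypothetical.

```javascript
// Hypothetical in-memory stand-in for the crawl queue kept in Redis.
class PendingArticles {
  constructor() {
    this.ids = new Set();
  }

  // Adding the same article twice leaves the set unchanged.
  add(articleId) {
    this.ids.add(articleId);
    return this.ids.size;
  }

  // Mark an article as crawled; returns how many remain.
  done(articleId) {
    this.ids.delete(articleId);
    return this.ids.size;
  }
}

const pending = new PendingArticles();
pending.add('a1');
pending.add('a2');
pending.add('a1'); // opened again: not counted a second time
console.log(pending.ids.size);   // 2
console.log(pending.done('a1')); // 1
```

Redis sets (`SADD`, `SCARD`) have the same idempotent-add property, so the same idea should carry over to a real Redis-backed queue.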

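This seems to have some problems running

Does this still work? My company recently assigned me to look into crawling WeChat official account articles. I would also like to learn how it is done myself, and I am willing to pay for help. QQ 460356696

Purpose of the client folder

Hello, I read your code and see that the client folder contains a set of front-end code. What is this code for? Is it the front-end pages? If so, how do I access them? Thanks.

Can't it crawl read counts, like counts, and comments?

First, thank you for the code; I tested it and it runs. Second, I am not familiar with Node.js. I want to crawl reads, likes, and comments, and from packet captures I know where that data is, but where should I start changing the code? Thanks.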

npm start error

13 verbose stack Error: [email protected] start: DEBUG=ws:* NODE_ENV=production node index.js
13 verbose stack Exit status 1
13 verbose stack at EventEmitter. (C:\Users\Anthony.sun\AppData\Roaming\npm\node_modules\npm\node_modules\npm-lifecycle\index.js:326:16)
13 verbose stack at EventEmitter.emit (events.js:203:13)
13 verbose stack at ChildProcess. (C:\Users\Anthony.sun\AppData\Roaming\npm\node_modules\npm\node_modules\npm-lifecycle\lib\spawn.js:55:14)
13 verbose stack at ChildProcess.emit (events.js:203:13)
13 verbose stack at maybeClose (internal/child_process.js:1021:16)
13 verbose stack at Process.ChildProcess._handle.onexit (internal/child_process.js:283:5)
14 verbose pkgid [email protected]
15 verbose cwd C:\Users\Anthony.sun\Documents\OneDrive - Dana Incorporated\Anthony\python\wechat_spider
16 verbose Windows_NT 10.0.16299
17 verbose argv "C:\Program Files\nodejs\node.exe" "C:\Users\Anthony.sun\AppData\Roaming\npm\node_modules\npm\bin\npm-cli.js" "start"
18 verbose node v12.7.0
19 verbose npm v6.10.2
20 error code ELIFECYCLE
21 error errno 1
22 error [email protected] start: DEBUG=ws:* NODE_ENV=production node index.js
22 error Exit status 1
23 error Failed at the [email protected] start script.
23 error This is probably not a problem with npm. There is likely additional logging output above.
24 verbose exit [ 1, true ]

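Debugging locally on Windows 10

The project starts up fine.
I installed the CA certificate on the phone and set the proxy to a LAN IP address, and now the phone cannot access the internet. Is something wrong with my settings?

Fetching article details repeatedly shows the same article ID

AnyProxy works normally, and on the phone the articles refresh and jump to the next one as expected.
On the server side, the article ID shown is a strange one; searching for it in the Post collection in MongoDB finds nothing.
The remaining article count is wrong and the same article ID is fetched repeatedly, so this is probably a Redis problem.

The log is as follows: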

Article id: 5acf2be0c0dc79a2ba3467eb
Reads: 1006 Likes: 7

Remaining articles to crawl: 176496

Article id: 5acf2be0c0dc79a2ba3467eb
Reads: 1007 Likes: 7

Remaining articles to crawl: 176495

Article id: 5acf2be0c0dc79a2ba3467eb
Reads: 7398 Likes: 35

Remaining articles to crawl: 176494

Article id: 5acf2be0c0dc79a2ba3467eb
Reads: 23172 Likes: 79

Remaining articles to crawl: 176493

Fetched 10 comments

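Front-end category page fails to load and keeps spinning

Everything else works fine. I have read the code for a long time, but the React part left me a bit dizzy and I cannot tell what the problem is.
This happens when using the page files already bundled with the project, so it shouldn't be something I did...

Requesting help! Many thanks~

Running MONGO_PATH=/data/mongo docker-compose up on Mac fails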

Pulling mongo (mongo:3.6)...
3.6: Pulling from library/mongo
0a01a72a686c: Pull complete
cc899a5544da: Pull complete
19197c550755: Pull complete
716d454e56b6: Pull complete
0793d4ab2500: Pull complete
df33e33466d0: Pull complete
3b2d76901480: Pull complete
df04584b8696: Pull complete
80dc6d311601: Pull complete
000637459d95: Pull complete
8e5d88b7543b: Pull complete
ee7d07a859de: Pull complete
570420d1587b: Pull complete
Digest: sha256:1e267a2a16093e70a9b21a75a912ab8898971f42ce00f2d9ae2660410604d65e
Status: Downloaded newer image for mongo:3.6
Building app
Step 1/9 : FROM node:10.16.3
10.16.3: Pulling from library/node
9a0b0ce99936: Pull complete
db3b6004c61a: Pull complete
f8f075920295: Pull complete
6ef14aff1139: Pull complete
0bbd8b48260f: Pull complete
524be717efb1: Pull complete
aad1e8812bc2: Pull complete
13fb072986db: Pull complete
627a2df19018: Pull complete
Digest: sha256:f4b6f471cdd4b66b27eef899e7f8423ecd9fbfc863b2cb7a59978a7f64c8e0c3
Status: Downloaded newer image for node:10.16.3
---> a68faf70e589
Step 2/9 : WORKDIR /app
---> Running in 5ece924b7018
Removing intermediate container 5ece924b7018
---> 9a306797bade
Step 3/9 : COPY package.json package-lock.json /app/
---> e6063c8ed833
Step 4/9 : RUN npm install
---> Running in e1060dbd7737

[email protected] postinstall /app/node_modules/core-js
node -e "try{require('./postinstall')}catch(e){}"

Thank you for using core-js ( https://github.com/zloirock/core-js ) for polyfilling JavaScript standard library!

The project needs your help! Please consider supporting of core-js on Open Collective or Patreon:

https://opencollective.com/core-js
https://www.patreon.com/zloirock

Also, the author of core-js ( https://github.com/zloirock ) is looking for a good job -)

[email protected] postinstall /app/node_modules/nodemon
node bin/postinstall || exit 0

Love nodemon? You can now support the project via the open collective:

https://opencollective.com/nodemon/donate

npm WARN optional SKIPPING OPTIONAL DEPENDENCY: [email protected] (node_modules/fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for [email protected]: wanted {"os":"darwin","arch":"any"} (current: {"os":"linux","arch":"x64"})

added 622 packages from 887 contributors and audited 3452 packages in 15.33s
found 2 low severity vulnerabilities
run npm audit fix to fix them, or npm audit for details
Removing intermediate container e1060dbd7737
---> 9a64e058a150
Step 5/9 : COPY . /app
---> 8988fefddd11
Step 6/9 : RUN cd ~ && mkdir .anyproxy && cd .anyproxy && mv /app/certificates ~/.anyproxy/ && cp ~/.anyproxy/certificates/rootCA.crt /usr/local/share/ca-certificates/ && update-ca-certificates
---> Running in c9b1fde36b4f
Updating certificates in /etc/ssl/certs...
1 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
Removing intermediate container c9b1fde36b4f
---> 28bd2fa539f6
Step 7/9 : RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
---> Running in 3922657deab8
Removing intermediate container 3922657deab8
---> c0d38fb8595f
Step 8/9 : EXPOSE 8101 8104
---> Running in 1526d210c2e2
Removing intermediate container 1526d210c2e2
---> ede80d43221a
Step 9/9 : CMD ["node", "index.js"]
---> Running in 73411a22b508
Removing intermediate container 73411a22b508
---> 30697f95b868
Successfully built 30697f95b868
Successfully tagged wechat-spider:latest
WARNING: Image for service app was built because it did not already exist. To rebuild this image you must use docker-compose build or docker-compose up --build.
wechat_spider_redis_1 is up-to-date
Creating wechat_spider_mongo_1 ... error

ERROR: for wechat_spider_mongo_1 Cannot start service mongo: Mounts denied:
The path /data/mongo
is not shared from OS X and is not known to Docker.
You can configure shared paths from Docker -> Preferences... -> File Sharing.
See https://docs.docker.com/docker-for-mac/osxfs/#namespaces for more info.
.

ERROR: for mongo Cannot start service mongo: Mounts denied:
The path /data/mongo
is not shared from OS X and is not known to Docker.
You can configure shared paths from Docker -> Preferences... -> File Sharing.
See https://docs.docker.com/docker-for-mac/osxfs/#namespaces for more info.
.
ERROR: Encountered errors while bringing up the project.

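Feels of little practical use

I have used this approach before. The amount of data you can crawl per day is small, topping out at about 200 official accounts; you still need a universal key to crawl at scale.

A few issues from trying it out

  1. The main page keeps spinning
    It turned out that some services on the system had not been started; after enabling them and restarting, it worked.
  2. WeChat cannot open official account content
    My own analysis is that it may be due to the Android version.
    Security was tightened from 7.0 onward, and I am on a Huawei running 8.0...
    Specifically, it is an AnyProxy issue: alibaba/anyproxy#243
  3. Saving parsed content locally
    I noticed that only links are saved.
    Can the content be stored for offline use?
    I noticed the export tool exportData.js, but I cannot figure out how to use it. I tried `npm start exportData.js`, `npm run exportData.js`, `nodejs exportData.js`, and so on, and none of them work.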

TypeError: Cannot read property 'certMgr' of undefined

[nodemon] starting node index.js
E:\PycharmProjects\github\wechat_spider-master\wechat_spider-master\index.js:9
if (!AnyProxy.utils.certMgr.ifRootCAFileExists()) {
^

TypeError: Cannot read property 'certMgr' of undefined
at Object. (E:\PycharmProjects\github\wechat_spider-master\wechat_spider-master\index.js:9:21)
at Module._compile (module.js:652:30)
at Object.Module._extensions..js (module.js:663:10)
at Module.load (module.js:565:32)
at tryModuleLoad (module.js:505:12)
at Function.Module._load (module.js:497:3)
at Function.Module.runMain (module.js:693:10)
at startup (bootstrap_node.js:188:16)
at bootstrap_node.js:609:3

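Works intermittently; the error shows connect ETIMEDOUT 74.125.204.113:80

It works intermittently: sometimes crawling works normally, and sometimes the official account page will not open. I enabled the onError event to print errors, and it always shows the following error: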

{ Error: connect ETIMEDOUT 74.125.204.113:80
at Object._errnoException (util.js:992:11)
at _exceptionWithHostPort (util.js:1014:20)
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1186:14)
code: 'ETIMEDOUT',
errno: 'ETIMEDOUT',
syscall: 'connect',
address: '74.125.204.113',
port: 80 }

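Where is the problem coming from?

Cannot open the history page

With plain AnyProxy the history page opens normally, but with this proxy, tapping the history page shows "网络出错,轻触屏幕重新加载" (network error, tap the screen to reload). Can anyone tell me how to fix this?

substr error on some articles

As a result the data cannot be displayed. The error is as follows: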
VM114 bundle.js:28 Uncaught (in promise) TypeError: Cannot read property 'substr' of null
at VM114 bundle.js:28
at Array.map ()
at t.value (VM114 bundle.js:28)
at c._renderValidatedComponentWithoutOwnerOrContext (VM114 bundle.js:23)
at c._renderValidatedComponent (VM114 bundle.js:23)
at c._updateRenderedComponent (VM114 bundle.js:23)
at c._performComponentUpdate (VM114 bundle.js:23)
at c.updateComponent (VM114 bundle.js:23)
at c.receiveComponent (VM114 bundle.js:23)
at Object.receiveComponent (VM114 bundle.js:6)

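Did WeChat change its rules? read_num can no longer be collected

As the title says: articles can still be crawled, but the comment and like data is gone. It looks like the data returned by the WeChat API has changed?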

Error: TypeError: Cannot destructure property `read_num` of 'undefined' or 'null'.
    at getReadAndLikeNum (/private/tmp/wechat_spider/rule/wechatRule.js:34:41)
    at handleFn (/private/tmp/wechat_spider/rule/index.js:105:14)
    at Object.beforeSendResponse (/private/tmp/wechat_spider/rule/index.js:112:12)
    at beforeSendResponse.next (<anonymous>)
    at onFulfilled (/private/tmp/wechat_spider/node_modules/co/index.js:65:19)
    at /private/tmp/wechat_spider/node_modules/co/index.js:54:5
    at new Promise (<anonymous>)
    at co (/private/tmp/wechat_spider/node_modules/co/index.js:50:10)
    at toPromise (/private/tmp/wechat_spider/node_modules/co/index.js:118:63)
    at next (/private/tmp/wechat_spider/node_modules/co/index.js:99:29)
    at onFulfilled (/private/tmp/wechat_spider/node_modules/co/index.js:69:7)
    at /private/tmp/wechat_spider/node_modules/co/index.js:54:5
    at new Promise (<anonymous>)
    at co (/private/tmp/wechat_spider/node_modules/co/index.js:50:10)
    at createPromise (/private/tmp/wechat_spider/node_modules/co/index.js:30:15)
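
The `Cannot destructure property` error means that the object being destructured was `undefined` or `null`, which fits the guess that the response no longer carries the expected stats. A defensive version of such an extraction could look like the sketch below. The helper name `extractReadAndLike` and the `appmsgstat` field name are assumptions made for illustration; the actual code in `rule/wechatRule.js` is not shown in this report.

```javascript
// Hypothetical helper: pull read/like counts out of a parsed response body
// without throwing when the stats object is missing.
// The `appmsgstat` field name is an assumption, not confirmed by the report.
function extractReadAndLike(body) {
  const stats = (body && body.appmsgstat) || null;
  if (!stats) return null; // the caller can log and skip instead of crashing
  const { read_num: readNum, like_num: likeNum } = stats;
  return { readNum, likeNum };
}

console.log(extractReadAndLike({ appmsgstat: { read_num: 1006, like_num: 7 } }));
// { readNum: 1006, likeNum: 7 }
console.log(extractReadAndLike({}));   // null
console.log(extractReadAndLike(null)); // null
```

Returning `null` keeps the proxy rule from throwing, so the article itself can still be saved while the missing counts are investigated.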

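Loading stalls during auto-paging crawl and needs a manual refresh

During auto-paging crawling, the article's loading progress bar is sometimes slow or the article fails to load, and the only option is to refresh manually. This is fatal when crawling large numbers of articles. Does the owner have a solution to share?

cnpm start fails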

$ cnpm start

[email protected] start E:\wechat_spider\wechat_spider
DEBUG=ws:* NODE_ENV=production node index.js

'DEBUG' is not recognized as an internal or external command,
operable program or batch file.
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] start: DEBUG=ws:* NODE_ENV=production node index.js
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR! C:\Users\admin\AppData\Roaming\npm-cache_logs\2019-08-01T08_18_10_350Z-debug.log

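Read counts are not saved

[image]
The UI does not show read counts
[image]

MongoDB does not contain the corresponding data either

[image]
The console does print the read counts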
