
Comments (15)

speed commented on August 26, 2024

1. Download these two archives:
https://github.com/speed/windows-64bit-jetty-jre/archive/master.zip (extract to windows-64bit-jetty-jre)
https://github.com/speed/newcrawler/archive/master.zip (extract to newcrawler)

2. Replace windows-64bit-jetty-jre/war with newcrawler/war.

3. Click start.bat to run it.

4. Wait a moment, then open http://127.0.0.1:8500/ in a browser.

5. You need to register an account at newcrawler.com.
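
If you would rather script step 2 than copy the folder by hand, here is a minimal Python sketch. It assumes both archives are already extracted into the directory names given above. The helper name `replace_war` is hypothetical, and reading "replace" as "delete the old war folder, then copy the new one" is an assumption.

```python
import shutil
from pathlib import Path


def replace_war(newcrawler_dir, jetty_dir):
    """Copy newcrawler/war over windows-64bit-jetty-jre/war.

    Assumption: "replace" means the old war directory is removed first,
    so no stale files from the previous version are left behind.
    """
    src = Path(newcrawler_dir) / "war"
    dst = Path(jetty_dir) / "war"
    if not src.is_dir():
        raise FileNotFoundError(f"missing source directory: {src}")
    if dst.exists():
        shutil.rmtree(dst)  # drop the old war directory
    shutil.copytree(src, dst)  # copy the new one into place
    return dst
```

For example, `replace_war("newcrawler", "windows-64bit-jetty-jre")` run from the folder that contains both extracted directories.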

from newcrawler.

whairg commented on August 26, 2024

HTTP ERROR: 503
Problem accessing /. Reason:

Service Unavailable

Powered by Jetty://
I get this error.

whairg commented on August 26, 2024

12761584941047_ pic
This is what is displayed at startup.

speed commented on August 26, 2024

Could you also take a screenshot of the exception in the upper part of the output?

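whairg commented on August 26, 2024

image
image
image
Hello, this is all the output from clicking start.bat. The server is currently running Windows Server 2012 R2.
image
This is the error reported when opening http://127.0.0.1:8500/.
image
This is the Java version.
image
Compiling with javac works fine, so the Java environment is not the problem.
image
image
These are the files; they have all been copied over.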

speed commented on August 26, 2024

The JRE bundled with NewCrawler is too old. Delete the following line from start.bat (I can see you have a JDK 1.8 environment):
set path="%~dp0jre\bin"
Then start it again.
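
Deleting the line in a text editor is enough. For anyone who wants to script the change, here is a rough Python sketch. The function name `drop_bundled_jre_path` is hypothetical, and so is the assumption that the line appears in start.bat exactly as quoted above, apart from letter case and surrounding whitespace.

```python
# The line quoted above, which puts the bundled JRE on the PATH.
TARGET = r'set path="%~dp0jre\bin"'


def drop_bundled_jre_path(bat_text):
    """Return the start.bat text without the bundled-JRE PATH line.

    Every other line is kept unchanged, including its line ending.
    """
    kept = []
    for line in bat_text.splitlines(keepends=True):
        # Compare case-insensitively and ignore extra whitespace.
        normalized = " ".join(line.split()).lower()
        if normalized == TARGET.lower():
            continue
        kept.append(line)
    return "".join(kept)
```

Read start.bat, pass its text through the function, and write the result back. Keep a backup copy of the original file first.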


whairg commented on August 26, 2024

Hello,

It opens now.
http://www.dianping.com/guangzhou/ch30/g141
This is the site I want to crawl, but when I enter it, this is what is shown.
image
image
I also cannot select the fields to extract the way the video shows.

speed commented on August 26, 2024

You are using the Chrome plugin support, so you need to download
https://github.com/speed/newcrawler-plugin-urlfetch-chrome/archive/master.zip
and edit the plugin's configuration. The paths to the two files chromedriver.exe and ModHeader.crx must be correct.
5849540899249

whairg commented on August 26, 2024

Hello,
image
image
Why can't the next page be retrieved when I test this?

whairg commented on August 26, 2024

"Set up the next-page link extraction rule":
how do I set up this next-page link extraction rule?

whairg commented on August 26, 2024

image
Is this where the next-page extraction rule should be entered? And what does http://${property3}?pageNo=${page(1,1,50)}&PARAM1=${3}, PARAM1=${3} mean?

whairg commented on August 26, 2024

image
One more question: how do I fix this garbled text?
Sorry, this is my first time using this, so I have a lot of questions. Sorry for the trouble.

speed commented on August 26, 2024

Use a custom CSS path for the next page:
div.page > a.next
200323215844
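
To see what this CSS path selects, here is a small Python sketch using only the standard library. It is not how NewCrawler evaluates the rule; it only illustrates the selector itself: an `a` element with class `next` whose direct parent is a `div` with class `page`. The sample HTML and the names `NextLinkFinder` and `find_next_links` are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

# Hypothetical markup, not taken from the real site.
SAMPLE_HTML = """
<div class="page">
  <a class="prev" href="/list?page=1">prev</a>
  <a class="next" href="/list?page=3">next</a>
</div>
<div class="other"><span><a class="next" href="/nested">x</a></span></div>
"""


class NextLinkFinder(HTMLParser):
    """Collect href values matching the selector `div.page > a.next`."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, classes) for every currently open element
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        parent = self.stack[-1] if self.stack else None
        # `>` is the child combinator: the div must be the direct parent.
        if (tag == "a" and "next" in classes and parent is not None
                and parent[0] == "div" and "page" in parent[1]):
            self.hrefs.append(attrs.get("href"))
        if tag not in VOID_TAGS:
            self.stack.append((tag, classes))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def find_next_links(html):
    finder = NextLinkFinder()
    finder.feed(html)
    return finder.hrefs
```

With the sample above, only the link directly inside `div.page` is selected. The link wrapped in a `span` is skipped because `>` requires a direct parent. If the real page wraps its next-page link in another element, or uses different class names, this path will match nothing.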


speed commented on August 26, 2024

Is the page itself free of garbled text?

whairg commented on August 26, 2024

Hello,

The page has no garbled text.

With the custom next-page CSS path
div.page > a.next, the test crawl still cannot collect the information from the next page.
