Comments (7)
I found DotnetSpider2.Core and DotnetSpider2.Extension on NuGet, which look like this project.
from dotnetspider.
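The DotnetSpider2.Core and DotnetSpider2.Extension packages on NuGet do belong to this project, but they are fairly old. If you want the latest version, you can configure my private feed: http://zlzforever.6655.la:40001/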
from dotnetspider.
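Thanks.
So far I haven't found any problems with the library from NuGet, apart from one misspelling of "Download".
Also, what I'm doing now is downloading the images on a web page and, once they are all downloaded, combining them with the page itself. With the current pipeline mechanism, how can I run the follow-up step only after all the images have finished downloading?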
from dotnetspider.
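HttpClientDownloader has image downloading built in, though I'm not sure the version on NuGet includes it. In your Processor you only need to parse out all the image links and add them with page.AddTargetRequests; they end up in the queue, and when the downloader sees that a request is a file, it downloads it to the corresponding relative directory. For example, www.a.com/b/1.png is downloaded to /b/1.png.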
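To make the URL-to-path mapping in the example concrete, here is a minimal, library-agnostic sketch in Python. It is not DotnetSpider code: the function name `relative_download_path` is hypothetical, and it only assumes the rule stated above, that the file lands at the path component of its URL.

```python
from urllib.parse import urlsplit


def relative_download_path(url: str) -> str:
    """Return the relative path a file URL would map to (hypothetical helper)."""
    # Scheme-less inputs such as "www.a.com/b/1.png" would otherwise be parsed
    # as a bare path, so add a placeholder scheme before splitting.
    if "://" not in url:
        url = "http://" + url
    # Keep only the path component; host, query string and fragment are dropped.
    return urlsplit(url).path or "/"


print(relative_download_path("www.a.com/b/1.png"))
```

Whether the real downloader also strips query strings or sanitizes file names is not stated in this thread, so treat that part as an assumption.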
from dotnetspider.
Is there a way to get notified once all the images on the same page have finished downloading?
from dotnetspider.
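No. Your requirement is not a common scenario, so it is hard to abstract. You can implement your own Processor that groups the image links by page URL and then download them in the Pipeline; that way you keep this state information.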
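The bookkeeping behind that suggestion could look roughly like the following library-agnostic Python sketch. Nothing here is DotnetSpider API: `PageImageTracker`, `register`, `mark_downloaded` and the callback are hypothetical names, and the sketch assumes a single-threaded pipeline.

```python
from typing import Callable, Dict, Iterable, Set


class PageImageTracker:
    """Tracks pending image downloads per page (hypothetical helper)."""

    def __init__(self, on_page_complete: Callable[[str], None]) -> None:
        self._pending: Dict[str, Set[str]] = {}
        self._on_page_complete = on_page_complete

    def register(self, page_url: str, image_urls: Iterable[str]) -> None:
        # Called from the processor: group the image links under their page URL.
        pending = set(image_urls)
        if not pending:
            # A page without images is complete immediately.
            self._on_page_complete(page_url)
            return
        self._pending[page_url] = pending

    def mark_downloaded(self, page_url: str, image_url: str) -> None:
        # Called from the pipeline after one image has been saved.
        pending = self._pending.get(page_url)
        if pending is None:
            return  # unknown page, or the page was already completed
        pending.discard(image_url)
        if not pending:
            del self._pending[page_url]
            self._on_page_complete(page_url)


finished = []
tracker = PageImageTracker(finished.append)
tracker.register("http://www.a.com/page", ["/b/1.png", "/b/2.png"])
tracker.mark_downloaded("http://www.a.com/page", "/b/1.png")
tracker.mark_downloaded("http://www.a.com/page", "/b/2.png")
print(finished)
```

The callback fires exactly once per page, when its last pending image is marked as downloaded. A multi-threaded pipeline would need a lock around the shared dictionary.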
from dotnetspider.
That's exactly what I'm doing now. Thanks!
from dotnetspider.