Comments (11)
import xCrawl from 'x-crawl'
// Pass --no-sandbox to the browser that puppeteer launches (needed when running as root)
const myXCrawl = xCrawl({ crawlPage: { puppeteerLaunch: { args: ['--no-sandbox'] } } })
from x-crawl.
@allmors Here is a small example that lets the AI quickly extract the content you want.
Result:
{
  elements: [
    {
      src: 'https://z1.muscache.cn/im/pictures/miso/Hosting-45937791/original/c67d32ed-21eb-4066-8cef-650dcd45bada.jpeg?aki_policy=large'
    },
    {
      src: 'https://z1.muscache.cn/im/pictures/df3493cf-39b2-46cc-9e85-7ef186980f25.jpg?aki_policy=large'
    },
    {
      src: 'https://z1.muscache.cn/im/pictures/52d375d3-5e54-444b-8186-15e61a592d9a.jpg?aki_policy=large'
    }
  ],
  type: 'multiple'
}
You can also pass the entire HTML to the AI and let it do the work for us, but that consumes more tokens.
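A result shaped like the one above can be flattened into a plain list of image URLs. The sketch below is only an illustration: the result literal copies the output shown in this comment, and collectSrcs is a hypothetical helper, not part of x-crawl.

```javascript
// Sample object copied from the result shown in this comment
const result = {
  elements: [
    {
      src: 'https://z1.muscache.cn/im/pictures/miso/Hosting-45937791/original/c67d32ed-21eb-4066-8cef-650dcd45bada.jpeg?aki_policy=large'
    },
    {
      src: 'https://z1.muscache.cn/im/pictures/df3493cf-39b2-46cc-9e85-7ef186980f25.jpg?aki_policy=large'
    },
    {
      src: 'https://z1.muscache.cn/im/pictures/52d375d3-5e54-444b-8186-15e61a592d9a.jpg?aki_policy=large'
    }
  ],
  type: 'multiple'
}

// Hypothetical helper (not an x-crawl API): collect every non-empty `src` string.
// A missing or malformed `elements` field yields an empty list instead of throwing.
function collectSrcs(res) {
  const elements = Array.isArray(res?.elements) ? res.elements : []
  return elements
    .map((el) => el?.src)
    .filter((src) => typeof src === 'string' && src.length > 0)
}

const srcs = collectSrcs(result)
console.log(srcs.length) // 3
```

The defensive checks are there because the shape of an AI-produced result is assumed here, not guaranteed.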
from x-crawl.
Welcome, and thank you for submitting your first issue to x-crawl.
from x-crawl.
Here, in const { browser, page } = res.data
do you actually get data?
from x-crawl.
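Here, in
const { browser, page } = res.data
do you actually get data?
Not on Linux; it throws an exception for me. I have a rough idea of where the problem is: x-crawl depends on puppeteer, and Chrome is not installed during installation, even though the puppeteer docs say it is installed automatically. As a workaround I also ran pnpm i puppeteer to have it install Chrome automatically, but x-crawl still does not work properly.
Now I plan to try installing chrome-linux64 manually.
Update (14:29):
(Already running as root.) I manually installed a series of dependencies, and now I get this error: Running as root without --no-sandbox is not supported.
Source of the problem: https://crbug.com/638180. With puppeteer (pptr) itself you would write const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
Does x-crawl support setting this? I did not see it mentioned in the API docs.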
from x-crawl.
Yes: https://github.com/coder-hxl/x-crawl/blob/main/docs/cn.md#xcrawlconfig
See the puppeteerLaunch option there.
from x-crawl.
import xCrawl from 'x-crawl'
const myXCrawl = xCrawl({ crawlPage: { puppeteerLaunch: { args: ['--no-sandbox'] } } })
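The error reported in this thread is specific to running as root, so the flag can be added only in that case. A minimal sketch, assuming Node.js: buildCrawlConfig is a hypothetical helper (not part of x-crawl), and the crawlPage.puppeteerLaunch.args shape is taken from the snippet above.

```javascript
// Hypothetical helper (not part of x-crawl): build the config object and
// add --no-sandbox only when the process runs as root (uid 0).
function buildCrawlConfig(uid, extraArgs = []) {
  const args = [...extraArgs]
  if (uid === 0 && !args.includes('--no-sandbox')) {
    args.push('--no-sandbox')
  }
  // Same nesting as the snippet in this thread
  return { crawlPage: { puppeteerLaunch: { args } } }
}

// process.getuid does not exist on Windows, so fall back to a non-root value
const uid = typeof process.getuid === 'function' ? process.getuid() : -1
const config = buildCrawlConfig(uid)
console.log(JSON.stringify(config))
```

The resulting object would then be passed to xCrawl(config) in place of the inline literal.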
from x-crawl.
@allmors OK, since it works now, I will close this issue.
from x-crawl.
Note that in the next version puppeteerLaunch will be renamed to puppeteerLaunchOptions. A lot will change once AI is added.
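One way to prepare for the rename is to keep the option name in a single place. This is a sketch under assumptions: launchConfig is a hypothetical helper, and it assumes the renamed option stays nested under crawlPage, which this thread does not confirm.

```javascript
// Hypothetical helper (not part of x-crawl): pick the option name in one place.
// ASSUMPTION: puppeteerLaunchOptions stays nested under crawlPage after the rename.
function launchConfig(args, useRenamedKey = false) {
  const key = useRenamedKey ? 'puppeteerLaunchOptions' : 'puppeteerLaunch'
  return { crawlPage: { [key]: { args: [...args] } } }
}

const currentCfg = launchConfig(['--no-sandbox'])
const nextCfg = launchConfig(['--no-sandbox'], true)
console.log(Object.keys(currentCfg.crawlPage)[0]) // puppeteerLaunch
console.log(Object.keys(nextCfg.crawlPage)[0]) // puppeteerLaunchOptions
```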
from x-crawl.
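Note that in the next version puppeteerLaunch will be renamed to puppeteerLaunchOptions. A lot will change once AI is added.
Will the AI being added be a paid model, or is it open, meaning we call the API of an AI platform ourselves?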
from x-crawl.
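It requires an OpenAI API key; under the hood it is a wrapper around OpenAI. There are also free channels for obtaining an OpenAI API key, and I will post them in the docs when the time comes.
These few methods are implemented so far, and more may be added later.
For details, see https://github.com/coder-hxl/x-crawl/tree/embracingAI/packages/ai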
from x-crawl.