Scrape.js

Scrape.js is an easy-to-use web scraping library for Node.js.

const data = await scrape("https://example.com");
// { url, html }

Features

Fast
Scrape nearly any website
Headless JavaScript scraping
Auto proxy rotation
...it just works
MIT License

Install

Install Scrape.js from NPM:

npm install @themaximalist/scrape.js

Config

Scrape.js uses ZenRows for proxy rotation. To use it, get a ZenRows API key and set the environment variable:

ZENROWS_API_KEY=abcxyz123

Scrape.js also works without proxies, but it is less effective.
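
Since proxy rotation depends on the API key being present, a caller might want to derive the proxy option from the environment. The following is a minimal sketch under that assumption; proxyOptionFromEnv is a hypothetical helper and not part of Scrape.js.

```javascript
// Hypothetical helper (not part of Scrape.js): enable the proxy option
// only when a non-empty ZenRows API key is present in the environment.
function proxyOptionFromEnv(env = process.env) {
    const key = env.ZENROWS_API_KEY;
    return { proxy: typeof key === "string" && key.trim() !== "" };
}

// Usage, assuming `scrape` has been required as shown under Usage:
// const data = await scrape("https://example.com", proxyOptionFromEnv());
```

Passing the environment in as an argument keeps the helper easy to test without touching process.env.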

Usage

Using Scrape.js is as simple as calling a function with a website URL.

const scrape = require("@themaximalist/scrape.js");
await scrape("http://example.com"); // { url, html }

You can specify additional options to scrape() for more control:

const data = await scrape("https://example.com", {
    headless: true,
    proxy: true
});
// { url, html }

API

The Scrape.js API is a single function that takes a URL and an optional config object.

await scrape(
    url, // URL to scrape
    {
        headless: true, // Use JavaScript headless scraping
        proxy: true, // Use proxy rotation
        method: "GET", // HTTP Request method
        timeout: 3000, // Scrape timeout in ms
        userAgent: "Mozilla/5.0...", // User Agent
    }
);

URL (required)

url <string>: URL to scrape

Options

headless <bool>: Use headless scraping so that page JavaScript runs. Default is true.
proxy <bool>: Use proxy rotation for the request. Default is true.
method <string>: HTTP request method, usually GET or POST. Default is GET.
timeout <int>: Max request time in ms. Default is 3500.
userAgent <string>: User agent for request. Default is Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36.
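
To make the documented defaults concrete, the sketch below shows how omitted options could be filled in. It mirrors the values listed above and is only an illustration: DOCUMENTED_DEFAULTS and resolveOptions are hypothetical names, and Scrape.js may resolve its options differently internally.

```javascript
// Defaults copied from the Options list in this README. This is an
// illustration, not necessarily how Scrape.js resolves options internally.
const DOCUMENTED_DEFAULTS = Object.freeze({
    headless: true,
    proxy: true,
    method: "GET",
    timeout: 3500,
    userAgent:
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 " +
        "(KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
});

// Hypothetical helper: keys present in `options` win, omitted keys
// fall back to the documented defaults.
function resolveOptions(options = {}) {
    return { ...DOCUMENTED_DEFAULTS, ...options };
}

// Example: only the timeout is overridden; everything else keeps its default.
const resolved = resolveOptions({ timeout: 10000 });
```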

Response

Scrape.js returns an object containing the final url and html content.

const { url, html } = await scrape("https://example.com");
console.log(url); // https://example.com/
console.log(html); // <html...

The Scrape.js API is a simple and reliable way to scrape the HTML from nearly any website.
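
As an example of working with the { url, html } response, the sketch below pulls the page title out of the returned HTML. extractTitle and fetchTitle are hypothetical helpers, not part of Scrape.js, and the scrape function is passed in as an argument so the code runs with either the real library or a stub.

```javascript
// Hypothetical helper: pull the <title> text out of a scrape() result.
// A regular expression is enough for a quick check; use a real HTML
// parser for anything more involved.
function extractTitle({ html }) {
    const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
    return match ? match[1].trim() : null;
}

// `scrape` is injected so the helper works with the real library or a stub.
async function fetchTitle(scrape, targetUrl) {
    const result = await scrape(targetUrl);
    return { url: result.url, title: extractTitle(result) };
}

// Usage, assuming `scrape` has been required as shown under Usage:
// fetchTitle(scrape, "https://example.com").then(console.log);
```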

Debug

Scrape.js uses the debug npm module with the scrape.js namespace.

View debug logs by setting the DEBUG environment variable.

> DEBUG=scrape.js* node src/get_website_html.js
# debug logs

Examples

See the tests for examples of how to use Scrape.js.

Projects

Scrape.js is currently used in the following projects:

News Score: score the news, rewrite the headlines

License

MIT

Author

Created by The Maximalist. See our open-source projects.
