webGrabber

webGrabber is a config-based web scraper and browser automation tool for extracting data from websites and automating repetitive browsing tasks. Its features include custom actions, memory interpolation, and the ability to run specific grabs. It is aimed at data analysts, researchers, and web developers who want to describe scraping and automation jobs in configuration files.

Installation

npm install

Chromium on Mac

If you have trouble with Chromium on Mac, try installing Chrome through Puppeteer:

npx puppeteer browsers install chrome

Or you can add the executable path to Chrome in the options passed to Puppeteer through Grabber using the options file:

export default {
  executablePath: '/path/to/Chrome'
}

Usage

Create a grab config file (.json, .yml, or .yaml) in the src/grabs directory of the project.

Hello World example: hello-world.json

{
  "name": "hello-world",
  "actions" : [
    {
      "name" : "log",
      "params" : {
        "text" : "Hello World!"
      }
    }
  ]
}

Hello World example: hello-world.yml

name: hello-world
actions:
  - name: log
    params:
      text: "Hello World!"

Running the Application

Local Mode

Run the app and all the grabs in the src/grabs directory will be executed:

npm run start

Run a specific grab:

npm run start hello-world

Server Mode

Run the app in server mode to start an HTTP server and receive grab configurations via API requests. In server mode, the application exposes an HTTP POST endpoint to accept JSON payloads for grab configurations.

npm run start:server

Endpoint Details

  • Endpoint: /grab
  • Method: POST
  • Payload: The endpoint expects a JSON payload containing the grab configuration.
  • Server Port: The server runs on the port specified in the PORT environment variable, with a default fallback to port 3000 if not set.

Send a POST request with a JSON payload to this endpoint to trigger the grab process.
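As a minimal sketch of such a request, the snippet below builds the URL and fetch options for posting the hello-world config to /grab. The helper name buildGrabRequest and the localhost host are assumptions made for illustration, and the shape of the server's response is not documented here, so the example only logs the HTTP status.

```javascript
// Hypothetical helper (not part of webGrabber): builds the URL and fetch
// options for a POST to /grab. Host "localhost" is an assumption; the port
// falls back to 3000, matching the documented default.
function buildGrabRequest(config, port = process.env.PORT || 3000) {
  return {
    url: `http://localhost:${port}/grab`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(config),
    },
  };
}

// The hello-world grab from the Usage section, as a JavaScript object.
const helloWorld = {
  name: 'hello-world',
  actions: [{ name: 'log', params: { text: 'Hello World!' } }],
};

const request = buildGrabRequest(helloWorld, 3000);

// With the server running (Node 18+ provides a global fetch):
// fetch(request.url, request.options).then((res) => console.log(res.status));
```

The network call is left commented out so the snippet can be run without a server; uncomment it after starting the app with npm run start:server.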

Actions

A full list of actions can be found in Actions

Custom Actions

An example of how to add custom actions is found in the custom file

Environment Variables

Environment variables can be set in a .env file in the root of the project.
All variables prefixed with GRABBER_ are loaded into memory and can be accessed in the config files.
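To illustrate the idea of prefix-based loading, here is a small sketch that picks GRABBER_ variables out of an environment object. It is not webGrabber's actual loader: the function name collectGrabberVars, the sample variable GRABBER_SITE_URL, and the choice to keep the prefix in the stored key are all assumptions.

```javascript
// Illustrative only: collect every GRABBER_-prefixed variable from an
// environment-like object into a Map standing in for the grab memory.
// Assumption: keys are stored with their prefix intact.
function collectGrabberVars(env) {
  const memory = new Map();
  for (const [key, value] of Object.entries(env)) {
    if (key.startsWith('GRABBER_')) {
      memory.set(key, value);
    }
  }
  return memory;
}

// A hand-written object is used here instead of process.env so the
// result is predictable.
const envMemory = collectGrabberVars({
  GRABBER_SITE_URL: 'https://example.com',
  PATH: '/usr/bin',
});

console.log([...envMemory.keys()]); // prints [ 'GRABBER_SITE_URL' ]
```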

Memory Interpolation

Values stored in memory can be accessed in the config files using the {{variable}} syntax.
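The sketch below shows one way {{variable}} placeholders could be resolved against a memory store. It is a conceptual illustration rather than webGrabber's own code: the interpolate function, the Map-based memory, and the GRABBER_NAME variable are assumptions, and leaving unknown placeholders untouched is a design choice of this sketch only.

```javascript
// Illustrative only: replace each {{name}} placeholder with the matching
// value from a Map. Unknown names are left as-is (an assumption).
function interpolate(template, memory) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name) =>
    memory.has(name) ? String(memory.get(name)) : match
  );
}

const sampleMemory = new Map([['GRABBER_NAME', 'World']]);

const greeting = interpolate('Hello {{GRABBER_NAME}}!', sampleMemory);
const untouched = interpolate('Hello {{MISSING}}!', sampleMemory);

console.log(greeting);  // prints Hello World!
console.log(untouched); // prints Hello {{MISSING}}!
```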

Return From Action

An action can return a value, which the next action can use through the INPUT keyword.
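As a rough sketch of this chaining, the snippet below runs a list of steps and stores each return value under INPUT for the following step. Everything here is hypothetical: the upper and exclaim actions are not webGrabber built-ins, runGrab is not its runner, and referencing the value as {{INPUT}} is an assumption based on the interpolation syntax described above.

```javascript
// Hypothetical actions for illustration; not webGrabber built-ins.
const demoActions = {
  upper: (params) => params.text.toUpperCase(),
  exclaim: (params) => `${params.text}!`,
};

// Resolve {{name}} placeholders from memory; unknown names are kept as-is.
function resolvePlaceholders(text, memory) {
  return text.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    memory.has(name) ? String(memory.get(name)) : match
  );
}

// Run each step in order, storing its return value under INPUT so the
// next step can reference it (assumed here to be written as {{INPUT}}).
function runGrab(steps) {
  const memory = new Map();
  for (const step of steps) {
    const params = { text: resolvePlaceholders(step.params.text, memory) };
    memory.set('INPUT', demoActions[step.name](params));
  }
  return memory.get('INPUT');
}

const chained = runGrab([
  { name: 'upper', params: { text: 'hello' } },
  { name: 'exclaim', params: { text: '{{INPUT}}' } },
]);

console.log(chained); // prints HELLO!
```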

Reserved Variable Names

The following variable names are reserved and should be used in the config files with caution:

  • INPUT
  • PARAMS
  • INDENTATION
  • CURRENT_DIR
  • BASE_DIR
  • PAYLOAD_ID

License

MIT
