scrapingai

Extract data from websites automatically with AI or build web scraping agents for bulk URL scraping.

Installation

Install it via npm:

npm i scrapingai

Highlights

  • Built-in residential proxies and captcha handling
  • Smart ad blocker and popup blocker for better performance
  • Automatic acceptance of cookie consent to close cookie banners
  • Compatible with Puppeteer and Playwright for browser automation and testing
  • Background jobs for bulk URL scraping with automatic retry and error handling
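The retry behaviour mentioned in the last highlight runs on the service side. When calling the API directly from your own script, a similar effect can be approximated on the client. The following is a minimal sketch, not part of the package: `withRetry`, `retries`, and `baseDelayMs` are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper (not part of scrapingai): retry an async task
// with exponential backoff between attempts.
async function withRetry(task, { retries = 3, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      // The task receives the zero-based attempt number.
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // All attempts failed: surface the most recent error.
  throw lastError;
}
```

Any of the calls shown in the Usage section could be passed in as the task, for example `withRetry(() => agenty.browser.extract(url))`.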

Usage

Get your API key from Agenty, then create a client with it and call one of the browser methods:

const agenty = new Agenty(API_KEY);
const data = await agenty.browser.extract("https://example.com");
console.log(data);
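The snippet above uses `await`, so it must run inside an ES module or an `async` function, and a failed request will reject. A minimal sketch of one way to handle both concerns follows. The helpers `readApiKey` and `extractOrNull` and the environment variable name `AGENTY_API_KEY` are assumptions made for illustration, not part of the package. Only `browser.extract(url)` is taken from the example above.

```javascript
// Hypothetical helper: read the API key from an environment variable
// instead of hard-coding it. The variable name is an assumption.
function readApiKey(env = process.env) {
  const key = env.AGENTY_API_KEY;
  if (!key) {
    throw new Error("AGENTY_API_KEY is not set");
  }
  return key;
}

// Hypothetical helper: call client.browser.extract(url) and return null
// on failure instead of letting the rejection crash the script.
// `client` is any object exposing browser.extract, such as `agenty` above.
async function extractOrNull(client, url) {
  try {
    return await client.browser.extract(url);
  } catch (err) {
    console.error(`Extraction failed for ${url}: ${err.message}`);
    return null;
  }
}
```

With these in place, a script could create the client with `new Agenty(readApiKey())` and call `await extractOrNull(agenty, "https://example.com")` from inside an `async` function.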

Extract

Auto-extracts products, job listings, SEO metadata, schema JSON, and similar structured data from a given URL.

const data = await agenty.browser.extract("https://example.com");
console.log(data);
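To run the same extraction over several URLs from a script, the calls can be issued together and their outcomes collected individually, so one failing page does not discard the rest. This is a sketch under stated assumptions: `extractAll` is a hypothetical helper, and `client` stands for any object exposing the `browser.extract(url)` call shown above.

```javascript
// Hypothetical helper: extract several URLs concurrently and report
// a per-URL outcome instead of failing on the first error.
async function extractAll(client, urls) {
  const settled = await Promise.allSettled(
    // The async arrow turns synchronous throws into rejections as well.
    urls.map(async (url) => client.browser.extract(url))
  );
  return settled.map((outcome, i) =>
    outcome.status === "fulfilled"
      ? { url: urls[i], ok: true, data: outcome.value }
      : { url: urls[i], ok: false, error: String(outcome.reason) }
  );
}
```

All requests start at once here, so this suits short lists. For large batches, the background jobs listed under Highlights are likely the better fit.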

Scrape

Extracts data from a page using a given CSS selector or a custom jQuery function.

const data = await agenty.browser.scrape("https://example.com");
console.log(data);

Screenshot

Captures a screenshot of a given URL.

const data = await agenty.browser.screenshot("https://example.com");
console.log(data);

PDF

Converts a web page into a PDF.

const data = await agenty.browser.pdf("https://example.com");
console.log(data);

Content

Returns the HTML content of a URL.

const data = await agenty.browser.content("https://example.com");
console.log(data);

License

scrapingai is a project by Agenty, released under the MIT License.
