
node-search-engine-parser's Introduction

Node.js - Search Engine Parser

This module lets you search Google by scraping its result pages. You can also add your own search engine.

Installation

npm install --save search-engine-parser

Built-in strategies

  • Google Images

Usage

This prints the first 20 Google Images results for the query kitten.

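var createParser = require('search-engine-parser');

var googleImagesParser = createParser('google-images');

googleImagesParser.search('kitten', function(err, results){
    if (err) {
        // Report the failure instead of reading undefined results
        return console.error(err);
    }

    // forEach, because the loop is used only for its side effect
    results.forEach(function(result){
        console.log(result);
    });
});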
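If you prefer promises to callbacks, the call above can be wrapped. The following is a minimal sketch, not part of the module: searchAsPromise and stubParser are hypothetical names, and the stub stands in for a real parser so the example runs without network access. It assumes only that a parser exposes search(query, callback) with a Node-style callback, as shown above.

```javascript
var util = require('util');

// Hypothetical helper (not part of search-engine-parser): wraps any object
// that exposes search(query, callback) so that it returns a Promise.
function searchAsPromise(parser, query) {
    var search = util.promisify(parser.search.bind(parser));
    return search(query);
}

// Stand-in parser so the sketch runs offline.
// A real parser would come from createParser('google-images').
var stubParser = {
    search: function (query, callback) {
        callback(null, [query + '-1.jpg', query + '-2.jpg']);
    }
};

searchAsPromise(stubParser, 'kitten')
    .then(function (results) {
        results.forEach(function (result) {
            console.log(result);
        });
    })
    .catch(function (err) {
        console.error('Search failed:', err.message);
    });
```

With the real module, passing googleImagesParser instead of stubParser should behave the same way, provided its search method follows the callback convention shown in the usage example.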

Own search engine

To add your own search engine, implement the search strategy interface:

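var cheerio = require('cheerio');

var SearchEngineStrategy = {
   // Builds the URL to request for a given query
   getSearchURL : function(query){
       // Assumes the query arrives unencoded; encoding keeps spaces and '&' intact
       return "http://your-search-engine.com?query=" + encodeURIComponent(query);
   },

   // Extracts results from the fetched HTML and passes them to callback(err, results)
   getResults : function(html, callback){
       var $ = cheerio.load(html);
       var imageLinks = $('.link');
       var imageHrefs = [];

       imageLinks.each(function(i, element){
           imageHrefs.push(element.attribs.href);
       });

       callback(null, imageHrefs);
   }
};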
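A quick way to sanity-check a strategy before wiring it in is to confirm that it has both methods and that it builds a valid URL. The sketch below is illustrative only: isSearchStrategy and exampleStrategy are hypothetical names, getResults is a dependency-free placeholder rather than a real parser, and the example assumes the query is passed to getSearchURL unencoded.

```javascript
// Hypothetical helper: checks that an object provides the two methods
// the strategy interface above relies on.
function isSearchStrategy(candidate) {
    return Boolean(candidate) &&
        typeof candidate.getSearchURL === 'function' &&
        typeof candidate.getResults === 'function';
}

// Minimal strategy with a placeholder getResults, for illustration only.
var exampleStrategy = {
    getSearchURL: function (query) {
        // encodeURIComponent keeps spaces and '&' from breaking the URL.
        return 'http://your-search-engine.com?query=' + encodeURIComponent(query);
    },
    getResults: function (html, callback) {
        callback(null, []);
    }
};

console.log(isSearchStrategy(exampleStrategy));          // true
console.log(isSearchStrategy({ getSearchURL: 'nope' })); // false
console.log(exampleStrategy.getSearchURL('grey kitten & cat'));
// http://your-search-engine.com?query=grey%20kitten%20%26%20cat
```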

Example Google Images strategy.

License

Licensed under MIT.

node-search-engine-parser's People

Contributors

mkholodnyak


Forkers

snehaldurge

node-search-engine-parser's Issues

Google Search Strategy

Hi,

I have noticed that the provided strategy returns an array of #'s. I poked around a little. I think the browser renders what Google provides as a bunch of a and img tags whose attributes match the regexes, but what is actually returned when the page is requested from Node.js is divs with JSON strings inside. So I modified the strategy, and it now returns results as expected.

var cheerio = require('cheerio');

var IMAGE_LINKS_SELECTOR = 'div.rg_meta.notranslate'; // JSON strings are in these div's
var GOOGLE_SEARCH_URL = 'http://images.google.com/search?tbm=isch&q=';

var GoogleImageStrategy = {
    getSearchURL : function(query){
        return GOOGLE_SEARCH_URL + query;
    },

    getResults : function(html, callback){
        var $ = cheerio.load(html);
        var imageLinks = $(IMAGE_LINKS_SELECTOR);
        var imageHrefs = [];

        imageLinks.each(function collectHref(i, element){
            imageHrefs.push($(element).text()); // Need to get JSON strings inside the div's
        });
        
        callback(null, imageHrefs);
    }
};

module.exports = GoogleImageStrategy;
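Since the modified strategy returns raw JSON strings rather than links, callers still have to parse them. Here is a minimal sketch of that step. extractImageURLs is a hypothetical helper, and the "ou" key used for the image URL is an assumption about the payload that is not confirmed by this issue, so inspect one of the returned strings and pass the right key if it differs.

```javascript
// Hypothetical helper: turns the raw JSON strings returned by the modified
// strategy into image URLs. The default "ou" key is an assumption.
function extractImageURLs(jsonStrings, urlKey) {
    var key = urlKey || 'ou';
    var urls = [];

    jsonStrings.forEach(function (text) {
        try {
            var meta = JSON.parse(text);
            if (meta && typeof meta[key] === 'string') {
                urls.push(meta[key]);
            }
        } catch (e) {
            // Skip entries that are not valid JSON.
        }
    });

    return urls;
}

// Made-up sample data, shaped like the strings described above.
var sample = [
    '{"ou":"http://example.com/kitten.jpg","pt":"A kitten"}',
    'not json',
    '{"pt":"no url here"}'
];

console.log(extractImageURLs(sample)); // [ 'http://example.com/kitten.jpg' ]
```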
