

Floodesh

Floodesh is a middleware-based web spider written in Node.js. The name "Floodesh" combines two words: flood and mesh.

Requirements

Gearman Server Installation

Make sure g++, make, libboost-all-dev, gperf, libevent-dev and uuid-dev are installed.

wget https://launchpad.net/gearmand/1.2/1.1.12/+download/gearmand-1.1.12.tar.gz
tar xzvf gearmand-1.1.12.tar.gz
cd gearmand-1.1.12
./configure
make
sudo make install

Install

$ npm install -g floodesh-cli

Usage

Generate a new app from templates with a single command.

$ mkdir floodesh_demo
$ cd floodesh_demo
$ floodesh-cli init    # generates all necessary files in the current directory
$ npm install

Context

A context instance is a kind of finite-state machine implemented with generators, an ECMAScript 6 feature. Through the context you can access almost all fields of the request and the response, for example:

worker.responsemw.use((ctx, next) => {
    ctx.content = ctx.body.toString(); // decode the raw Buffer body into a string
    return next();
});
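Each middleware registered with `use` receives `(ctx, next)` and calls `next()` to hand control to the following one. The code below is a minimal, hypothetical composer that shows how such a chain can be driven. It is not floodesh's internal implementation; the `compose` name and the plain `ctx` object are assumptions made for illustration.

```javascript
// Hypothetical, minimal Koa-style middleware composer (not floodesh's own code).
// Each middleware receives (ctx, next) and must call next() to continue the chain.
function compose(middlewares) {
  return function run(ctx) {
    let lastIndex = -1;
    function dispatch(i) {
      // Guard against a middleware calling next() more than once.
      if (i <= lastIndex) {
        return Promise.reject(new Error("next() called multiple times"));
      }
      lastIndex = i;
      const fn = middlewares[i];
      if (!fn) return Promise.resolve(); // end of the chain
      try {
        return Promise.resolve(fn(ctx, () => dispatch(i + 1)));
      } catch (err) {
        return Promise.reject(err);
      }
    }
    return dispatch(0);
  };
}

// Usage: mimic the response middleware above on a fake context object.
const chain = compose([
  (ctx, next) => {
    ctx.content = ctx.body.toString(); // Buffer -> string
    return next();
  },
  (ctx, next) => {
    ctx.upper = ctx.content.toUpperCase(); // a later middleware sees ctx.content
    return next();
  },
]);
```

Because the context object is shared, anything one middleware attaches (here `ctx.content`) is visible to every middleware after it.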

Request

ctx.querystring

  • String

Get the query string, without the leading "?".

ctx.idempotent

  • Boolean

Check if the request is idempotent.
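Idempotency depends only on the HTTP method. The following sketch assumes floodesh follows the same convention as Koa and treats a request as idempotent when its method is one RFC 7231 defines as idempotent: GET, HEAD, PUT, DELETE, OPTIONS and TRACE. The `isIdempotent` helper is hypothetical.

```javascript
// Methods RFC 7231 (section 4.2.2) defines as idempotent:
// the safe methods (GET, HEAD, OPTIONS, TRACE) plus PUT and DELETE.
const IDEMPOTENT_METHODS = new Set(["GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"]);

// Hypothetical helper mirroring what ctx.idempotent is expected to report.
function isIdempotent(method) {
  return IDEMPOTENT_METHODS.has(String(method).toUpperCase());
}
```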

ctx.search

  • String

Get the search string. Unlike querystring, it includes the leading "?".
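The difference between the two fields is just the leading "?". This sketch shows the relationship using Node's built-in WHATWG `URL` class; the helper name and sample URL are illustrative assumptions, not floodesh API.

```javascript
// Sketch: derive search and querystring from a URL with Node's WHATWG URL class.
function searchAndQuerystring(href) {
  const { search } = new URL(href);
  // search keeps the leading "?"; querystring drops it.
  const querystring = search.startsWith("?") ? search.slice(1) : search;
  return { search, querystring };
}
```

For a URL without a query, both values are empty strings.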

ctx.method

  • String

Get the request method.

ctx.query

  • Object

Get the query string parsed into an object.
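As a rough illustration, assuming ctx.query is produced the way Koa does it, by running the query string through Node's core `querystring.parse`, parsing looks like this. Values are always strings, and a repeated key becomes an array. `parseQuery` is a hypothetical name.

```javascript
const querystring = require("querystring");

// Sketch: turn a query string (no leading "?") into an object like ctx.query.
// All values are strings; repeated keys such as "tag=a&tag=b" become arrays.
function parseQuery(qs) {
  return querystring.parse(qs);
}
```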

ctx.path

  • String

Get the request pathname.

ctx.url

  • String

Return the request URL; same as ctx.href.

ctx.origin

  • String

Get the origin of the URL, for instance, "https://www.google.com".
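The origin is the scheme, host and (non-default) port of the URL, without path or query. Node's `URL` class computes it directly; the `originOf` helper below is only an illustration of what the field contains, not floodesh code.

```javascript
// Sketch: compute the origin of a URL with Node's WHATWG URL class.
// For http(s) URLs this equals `${protocol}//${host}`, with default ports dropped.
function originOf(href) {
  return new URL(href).origin;
}
```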

ctx.protocol

  • String

Return the protocol string "http:" or "https:".

ctx.host

  • String, hostname:port

Get the host (hostname:port) from the "Host" header field, supporting X-Forwarded-Host when a proxy is enabled.

ctx.hostname

  • String

Get the hostname (without the port) from the "Host" header field, supporting X-Forwarded-Host when a proxy is enabled.
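The proxy handling for host and hostname can be sketched as follows. This is a hypothetical reconstruction modelled on Koa's behaviour, not floodesh's code: `headers` is assumed to be a plain object with lower-cased names (as Node provides them), and `proxy` is assumed to be a boolean flag.

```javascript
// Hypothetical sketch of ctx.host resolution.
function resolveHost(headers, proxy) {
  let host = proxy && headers["x-forwarded-host"];
  if (!host) host = headers["host"];
  if (!host) return "";
  // X-Forwarded-Host may be a comma-separated list; the first entry is the original host.
  return host.split(",")[0].trim();
}

// Hypothetical sketch of ctx.hostname: the host without its ":port" suffix.
function resolveHostname(headers, proxy) {
  const host = resolveHost(headers, proxy);
  if (!host) return "";
  // Bracketed IPv6 literal such as "[::1]:8080": keep the brackets, drop the port,
  // matching how the WHATWG URL class reports IPv6 hostnames.
  if (host[0] === "[") return host.slice(0, host.indexOf("]") + 1);
  return host.split(":")[0];
}
```

Only trust X-Forwarded-Host when a proxy you control sets it; otherwise any client can spoof the header.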

ctx.secure

  • Boolean

Check whether the protocol is HTTPS.

Response

ctx.status

  • Number

Get the status code of the response.

ctx.message

  • String

Get the status message of the response.
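The status message is the reason phrase that accompanies the code, such as "Not Found" for 404. Whether floodesh reads it from the server's status line or from a lookup table is not stated here, so treat this as an illustration only: Node's built-in `http.STATUS_CODES` table gives the standard phrase for each code and can serve as a fallback. `reasonPhrase` is a hypothetical helper.

```javascript
const http = require("http");

// Sketch: look up the standard reason phrase for a status code.
// Returns an empty string for codes Node does not know.
function reasonPhrase(status) {
  return http.STATUS_CODES[status] || "";
}
```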

ctx.body

  • Buffer

Get the response body as a Buffer.

ctx.length

  • Number

Get the length of the response body.
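One plausible way to derive this value, assumed here by analogy with Koa rather than taken from floodesh, is to prefer the Content-Length header and fall back to the size of the Buffer body. Note that a Buffer's length counts bytes, so it can differ from the character count of `ctx.body.toString()`. The `bodyLength` helper is hypothetical.

```javascript
// Hypothetical sketch: body length in bytes, preferring the Content-Length header
// (headers use lower-cased names) and falling back to the Buffer body size.
function bodyLength(headers, body) {
  const declared = headers["content-length"];
  if (declared !== undefined && declared !== "") {
    const n = parseInt(declared, 10);
    if (!Number.isNaN(n)) return n;
  }
  return Buffer.isBuffer(body) ? body.length : undefined;
}
```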

ctx.type

  • String

Get the response MIME type, for instance, "text/html".
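A Content-Type header often carries parameters, as in "text/html; charset=utf-8", while ctx.type reports only the bare MIME type. The sketch below shows that reduction. The `mimeType` helper is hypothetical, and lower-casing the result is this sketch's own choice.

```javascript
// Sketch: reduce a Content-Type value to its bare MIME type,
// e.g. "text/html; charset=utf-8" -> "text/html".
function mimeType(contentType) {
  if (!contentType) return "";
  return contentType.split(";")[0].trim().toLowerCase();
}
```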

ctx.lastModifieds

  • Date

Get the Last-Modified date in Date form, if it exists.
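The Last-Modified header is an HTTP date string, for example "Wed, 21 Oct 2015 07:28:00 GMT". Turning it into a `Date` is straightforward in JavaScript. This hypothetical helper returns `undefined` when the header is missing or unparsable, which is an assumption about how the "if it exists" case is handled.

```javascript
// Sketch: parse a Last-Modified header (lower-cased name) into a Date.
// Returns undefined when the header is absent or not a valid date.
function parseLastModified(headers) {
  const raw = headers["last-modified"];
  if (!raw) return undefined;
  const date = new Date(raw);
  return Number.isNaN(date.getTime()) ? undefined : date;
}
```

A crawler can compare this date with a previously stored one to decide whether a page changed since the last visit.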

ctx.etag

  • String

Get the ETag of a response.

ctx.header

  • Object

Return the response header object.

ctx.contentType

  • String

ctx.get(key)

  • key String
  • Return: String

Get the value of the given response header field.
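HTTP header names are case-insensitive, and Node's `http` module stores incoming header names in lower case, so a lookup like ctx.get(key) can lower-case the key first. The sketch below is hypothetical. It assumes a missing header yields an empty string, and it joins repeated values (Node delivers `set-cookie` as an array) so that the result is always a string.

```javascript
// Hypothetical sketch of a case-insensitive header lookup like ctx.get(key).
// headers: plain object with lower-cased names, as Node's http module provides.
function getHeader(headers, key) {
  const value = headers[String(key).toLowerCase()];
  if (value === undefined) return "";
  // This sketch joins array values (e.g. set-cookie) to always return a String.
  return Array.isArray(value) ? value.join(", ") : String(value);
}
```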

ctx.is(types)

  • types String|Array
  • Return: String|false|null

Check whether the incoming response has a "Content-Type" header field matching any of the given MIME types. If there is no response body, null is returned. If there is no content type, false is returned. Otherwise, the first type that matches is returned.
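The three-way result of ctx.is(types) can be sketched as follows. This is a simplified, hypothetical version: it handles exact MIME types ("text/html") and wildcards ("text/*", "*/*"), returns the matching entry from `types` as given, and returns false when nothing matches. Extension shorthands such as "html" are not supported here.

```javascript
// Hypothetical, simplified sketch of ctx.is(types).
function isType(body, contentType, types) {
  if (!body || body.length === 0) return null; // no response body
  if (!contentType) return false;             // no Content-Type header
  const actual = contentType.split(";")[0].trim().toLowerCase();
  const [actualType, actualSub] = actual.split("/");
  const candidates = Array.isArray(types) ? types : [types];
  for (const candidate of candidates) {
    const [type, sub] = String(candidate).toLowerCase().split("/");
    if (sub === undefined) continue; // shorthands like "html" are not handled
    const typeOk = type === "*" || type === actualType;
    const subOk = sub === "*" || sub === actualSub;
    if (typeOk && subOk) return candidate; // first match wins
  }
  return false; // content type present but nothing matched
}
```

In a response middleware, a check like this lets the spider skip images or other binary content before trying to parse HTML.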

Other

tasks

  • Array

Array of pending crawling tasks. Each task is an object consisting of request Options (opt) and next, the name of the function in your spider to call for that task. Supported format:

[{
    opt: Options, // request options: https://github.com/request/request#requestoptions-callback
    next: String
}]
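A middleware typically fills this array after parsing a page. The sketch below assumes the array is exposed as `ctx.tasks` and the page text as `ctx.content` (set as in the Context example). The function name "parseDetail" and the regex-based link extraction are hypothetical stand-ins; a real spider would likely use an HTML parser. `uri` and `method` are standard options of the request library linked above.

```javascript
// Hypothetical sketch: queue a follow-up task for every http(s) link on the page.
function queueDetailPages(ctx, next) {
  const hrefPattern = /href="([^"]+)"/g;
  let match;
  while ((match = hrefPattern.exec(ctx.content)) !== null) {
    let absolute;
    try {
      absolute = new URL(match[1], ctx.url); // resolve relative links against the page URL
    } catch (err) {
      continue; // skip malformed links
    }
    if (absolute.protocol !== "http:" && absolute.protocol !== "https:") continue;
    ctx.tasks.push({
      opt: { uri: absolute.href, method: "GET" }, // request-style options
      next: "parseDetail",                        // hypothetical spider function name
    });
  }
  return next();
}
```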

dataSet

  • Map

dataSet is a Map that stores results; floodesh then parses and saves them.
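Storing results is a matter of calling `Map.prototype.set` in a middleware. The sketch assumes the map is exposed as `ctx.dataSet` and the decoded page as `ctx.content`. The keys "title" and "url" are illustrative only; how floodesh later serializes the map is not described here.

```javascript
// Hypothetical sketch: extract the page title and record it in ctx.dataSet.
function collectTitle(ctx, next) {
  const match = /<title>([^<]*)<\/title>/i.exec(ctx.content);
  if (match) {
    ctx.dataSet.set("title", match[1].trim());
  }
  ctx.dataSet.set("url", ctx.url);
  return next();
}
```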

Middlewares

Contributors

mike442144, darrenqc, zero0707
