This project forked from petewarden/parallelcurl

A PHP class providing an easy interface for running multiple concurrent CURL requests

Home Page: http://petewarden.typepad.com/

ParallelCurl


This module provides an easy-to-use interface for running multiple curl URL fetches in parallel in PHP.

Disclaimer

I've had reports of problems that appear to be related to changes in curl_multi's behavior. I'm no longer using PHP so I can't verify what's going wrong, but @marcushat has kindly provided a port with fixes: https://github.com/marcushat/rollingcurlx.

If you are hitting issues, please give it a try!
Pete Warden - Dec 16th 2014

Testing

To test it, go to the command line, cd to this folder and run

./test.php

This should run 100 searches through Google's API, printing the results. To see what sort of performance difference running parallel requests gets you, try altering the default of 10 requests running in parallel using the optional script argument, and timing how long each takes:

time ./test.php 1

time ./test.php 20

  • The first only allows one request to run at once, serializing the calls. I see this taking around 100 seconds.

  • The second run has 20 in flight at a time, and takes 11 seconds! Be warned though, it's possible to overwhelm your target if you fire too many requests at once. You may end up with your IP banned from accessing that server, or hit other API limits.

The class is designed to make it easy to run multiple curl requests in parallel, rather than waiting for each one to finish before starting the next. Under the hood it uses curl_multi_exec but since I find that interface painfully confusing, I wanted one that corresponded to the tasks that I wanted to run.
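For comparison, here is a minimal sketch of the kind of raw curl_multi loop the class is meant to hide. It is not the class's actual implementation; fetch_all is a hypothetical helper name, and the loop follows the common pattern of calling curl_multi_exec and curl_multi_select until nothing is still running.

```php
<?php
// Fetch several URLs concurrently with the raw curl_multi API.
// Returns an array mapping each URL to the content that was fetched.
function fetch_all(array $urls) {
    $multi = curl_multi_init();
    $handles = array();

    foreach ($urls as $url) {
        $ch = curl_init($url);
        // Return the body as a string instead of printing it.
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $still_running);
        if ($still_running) {
            // Wait for activity; -1 means select failed, so back off briefly.
            if (curl_multi_select($multi, 1.0) === -1) {
                usleep(1000);
            }
        }
    } while ($still_running && $status === CURLM_OK);

    $results = array();
    foreach ($handles as $url => $ch) {
        $results[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
        // On PHP versions before 8.0 you would also call curl_close($ch) here.
    }
    return $results;
}
```

Note that this sketch starts every request at once and has no per-request callback or concurrency limit, which are the parts ParallelCurl adds.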

Usage

To use it, first copy parallelcurl.php and include it, then create the ParallelCurl object:

$parallelcurl = new ParallelCurl(10);

The first argument to the constructor is the maximum number of outstanding fetches to allow before blocking to wait for one to finish. You can change this later using setMaxRequests().

The second, optional argument is an array of curl options in the format used by curl_setopt_array().
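As a rough illustration, an options array might look like the sketch below. The option values are hypothetical and should be adjusted for your own target; the class_exists() guard is only there so the snippet also runs without parallelcurl.php included.

```php
<?php
// Hypothetical curl options, keyed by CURLOPT_* constants
// exactly as curl_setopt_array() expects them.
$curl_options = array(
    CURLOPT_FOLLOWLOCATION => true,             // follow HTTP redirects
    CURLOPT_CONNECTTIMEOUT => 5,                // seconds to wait for a connection
    CURLOPT_TIMEOUT        => 30,               // seconds allowed per request
    CURLOPT_USERAGENT      => 'my-crawler/1.0', // placeholder user agent
);

// Assumes parallelcurl.php has already been included.
if (class_exists('ParallelCurl')) {
    $parallelcurl = new ParallelCurl(10, $curl_options);
    // Raise the limit later if the target can cope with more requests.
    $parallelcurl->setMaxRequests(20);
}
```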

Next, start a URL fetch:

$parallelcurl->startRequest('http://example.com', 'on_request_done', array('something'));

The first argument is the address that should be fetched. The second is the callback function that will be run once the request is done. The third is a 'cookie' that can contain arbitrary data to be passed to the callback.

This startRequest call will return immediately, as long as fewer than the maximum number of requests are outstanding. Once the request is done, the callback function will be called, e.g.:

on_request_done($content, 'http://example.com', $ch, array('something'));

The callback should take four arguments. The first is a string containing the content found at the URL. The second is the original URL requested, the third is the curl handle of the request, which can be queried to get the results, and the fourth is the arbitrary 'cookie' value that you associated with this request. This cookie contains user-defined data.
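A callback with that signature could be sketched as follows. Storing results in $GLOBALS['results'] is an assumption made for this example, not something the class requires; curl_getinfo() is the standard way to query a curl handle.

```php
<?php
// Hypothetical place to collect results, keyed by URL.
$GLOBALS['results'] = array();

// Callback matching the four arguments described above.
function on_request_done($content, $url, $ch, $user_data) {
    // Query the handle for the HTTP status code of this request.
    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    $GLOBALS['results'][$url] = array(
        'http_code' => $http_code,
        'length'    => strlen((string) $content),
        'user_data' => $user_data, // the 'cookie' passed to startRequest
    );
}
```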

There's an optional fourth parameter to startRequest. If you pass in an array at that position in the arguments, the POST method will be used instead, with the contents of the array controlling the contents of the POST parameters.
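A POST request might be started like this. The helper name submit_form, the field names, and the cookie contents are all hypothetical; only the four-argument form of startRequest comes from the description above.

```php
<?php
// Start a POST request on an existing ParallelCurl object.
// $parallelcurl is expected to come from new ParallelCurl(...).
function submit_form($parallelcurl, $url, array $post_fields) {
    $parallelcurl->startRequest(
        $url,
        'on_request_done',       // callback, as for a normal fetch
        array('kind' => 'form'), // cookie handed back to the callback
        $post_fields             // passing an array here switches to POST
    );
}

// Hypothetical form fields.
$post_fields = array('q' => 'parallel curl', 'page' => '1');
```

With a real object this would be called as submit_form($parallelcurl, 'http://example.com/search', $post_fields).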

Since you may have requests outstanding at the end of your script, you MUST call

$parallelcurl->finishAllRequests();

before you exit. If you don't, the final requests may be left unprocessed!
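Putting the steps together, the whole flow can be sketched as a small helper. fetch_urls is a hypothetical name, and the 'index' cookie is just one example of user-defined data.

```php
<?php
// Queue every URL on an existing ParallelCurl object, then drain the queue.
// $parallelcurl is expected to come from new ParallelCurl(...) after
// including parallelcurl.php; $callback is the name of your callback function.
function fetch_urls($parallelcurl, array $urls, $callback) {
    foreach ($urls as $index => $url) {
        // May block if the maximum number of requests is already outstanding.
        $parallelcurl->startRequest($url, $callback, array('index' => $index));
    }
    // Required: wait for whatever is still in flight before the script exits.
    $parallelcurl->finishAllRequests();
}
```

With the real class this would be called as fetch_urls(new ParallelCurl(10), $urls, 'on_request_done').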

Credits

By Pete Warden [email protected], freely reusable, see http://petewarden.typepad.com for more
