Simple example of a few ways to extract keywords from a website

Home Page: https://phpscraper.de/examples/extract-keywords.html

License: GNU General Public License v3.0

Keyword Scraping Example using PHPScraper

PHPScraper is a scraping library that aims to make web scraping easier. It reduces the amount of code you need to write for common scraping tasks.

This is an example of the library scraping keywords from the Wikipedia article "Online Advertising". The expected output can be found below.

PHPScraper uses the library RAKE PHP Plus for keyword extraction. RAKE stands for "Rapid Automatic Keyword Extraction".
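
To give a rough idea of how RAKE-style scoring works, here is a minimal sketch in plain PHP. It is not the RAKE PHP Plus implementation: the function name rakeSketch, the stop word list, and the sample sentence are made up for illustration. The text is split into candidate phrases at stop words and punctuation, each word gets the score degree / frequency, and a phrase scores the sum of its word scores.

```php
<?php

/**
 * Minimal RAKE-style scorer (illustrative only, not RAKE PHP Plus itself).
 *
 * @param string   $text      Text to analyse.
 * @param string[] $stopWords Lowercase stop words that split phrases.
 * @return array Phrase => score, highest score first.
 */
function rakeSketch(string $text, array $stopWords): array
{
    $stop = array_flip($stopWords);
    $phrases = [];

    // Punctuation always ends a candidate phrase.
    foreach (preg_split('/[.,;:!?()"\n]+/', strtolower($text)) as $fragment) {
        $current = [];
        foreach (preg_split('/\s+/', trim($fragment), -1, PREG_SPLIT_NO_EMPTY) as $word) {
            if (isset($stop[$word])) {
                // A stop word also ends the current phrase.
                if ($current !== []) {
                    $phrases[] = $current;
                }
                $current = [];
            } else {
                $current[] = $word;
            }
        }
        if ($current !== []) {
            $phrases[] = $current;
        }
    }

    // frequency: number of occurrences of a word.
    // degree: summed length of every phrase the word occurs in.
    $frequency = [];
    $degree = [];
    foreach ($phrases as $words) {
        foreach ($words as $word) {
            $frequency[$word] = ($frequency[$word] ?? 0) + 1;
            $degree[$word] = ($degree[$word] ?? 0) + count($words);
        }
    }

    // A phrase scores the sum of degree / frequency over its words.
    $scores = [];
    foreach ($phrases as $words) {
        $score = 0.0;
        foreach ($words as $word) {
            $score += $degree[$word] / $frequency[$word];
        }
        $scores[implode(' ', $words)] = $score;
    }

    arsort($scores);
    return $scores;
}

// Hypothetical input, chosen only to keep the arithmetic easy to follow.
$scores = rakeSketch(
    'Content marketing is part of online advertising, and online advertising is about content.',
    ['is', 'of', 'and', 'about']
);
print_r($scores);
```

Words that mostly occur inside longer phrases end up with a higher degree relative to their frequency, which is why multi-word phrases tend to rank above single words.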

Other examples show how to analyze the keyword length distribution of a web page and how PHPScraper performs compared to BeautifulSoup.

You might need to merge your keywords after scraping.
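
One possible way to merge keywords is sketched below. It assumes that "merging" means combining several keyword lists and collapsing duplicate phrases, keeping the highest score for each. The function name mergeKeywords is hypothetical, and the second input list is made up for illustration; only some of the phrases and scores are taken from the result section.

```php
<?php

/**
 * Merge lists of [phrase, score] pairs into phrase => score,
 * keeping the highest score for phrases that occur more than once.
 */
function mergeKeywords(array ...$lists): array
{
    $merged = [];
    foreach ($lists as $list) {
        foreach ($list as [$phrase, $score]) {
            // Normalise case and whitespace so near-identical phrases collapse.
            $key = preg_replace('/\s+/', ' ', strtolower(trim($phrase)));
            if (!isset($merged[$key]) || $score > $merged[$key]) {
                $merged[$key] = $score;
            }
        }
    }

    arsort($merged);
    return $merged;
}

$merged = mergeKeywords(
    [['revenue sharing revenue sharing', 100.7], ['Content Marketing', 44.0]],
    [['revenue sharing revenue sharing', 31.1], ['content  marketing', 44.0], ['ad content', 25.4]]
);
print_r($merged);
```

Keeping the maximum is only one choice. Summing or averaging the scores would work with the same structure if that fits your use case better.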

Installation

This example was built with PHP 7.2.24 running on an Ubuntu-based Linux distribution.

To run this example you will need to clone the repository and install the dependencies:

git clone git@github.com:spekulatius/phpscraper-keyword-scraping-example.git
cd phpscraper-keyword-scraping-example
composer install

If you would like to make changes, you will need to fork the repository.

Execution

$ php keyword-extractor.php

Result

This page contains around 1989 keywords/phrases. Below are some selected keyword extractions.

Selected keywords with years:

  • truste announces 2011 behavioral advertising survey results (65.0)
  • july 2014 facebook reported advertising revenue (56.1)
  • cisco 2013 annual security report (18.3)
  • january 1994 mark eberra started (13.0)
  • august 2017 wikipedia articles (8.8)
  • august 2017 category (5.8)
  • october 2013 category (5.0)
  • august 2014 yahoo' (4.1)
  • june 2014 quarter (3.5)

Selected keywords with "content":

  • call 'content marketing' (77.2)
  • content management system (53.6)
  • automated ad content optimisation (49.6)
  • 10 content marketing 2 (44.0)
  • content marketing (44.0)
  • publisher content server sends (35.9)
  • web page content (33.3)
  • online content (29.5)
  • ad content delivered (27.1)
  • /wp-content/uploads/2015/11/iab_display_mobile_creative_guidelines_html5_2015 (25.8)
  • ad content (25.4)
  • website content' (22.9)
  • access requested content (19.3)
  • content page [ (16.5)
  • editorial content (15.8)
  • dividing content (15.8)
  • content filters (15.8)
  • sexual content (15.8)
  • primary content (15.8)
  • publishing content (15.3)
  • presenting content (15.3)
  • content (13.8)
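
Selections like the two above can be produced by filtering the extracted phrases with a regular expression. The following is a minimal sketch, assuming the keywords are available as a phrase => score array. The helper name filterKeywords and the small sample array are illustrative and not taken from keyword-extractor.php.

```php
<?php

/** Keep the entries of a phrase => score array whose phrase matches $pattern. */
function filterKeywords(array $keywords, string $pattern): array
{
    return array_filter(
        $keywords,
        function ($phrase) use ($pattern) {
            return preg_match($pattern, (string) $phrase) === 1;
        },
        ARRAY_FILTER_USE_KEY
    );
}

// A few phrases and scores copied from the results in this README.
$keywords = [
    'content management system' => 53.6,
    'july 2014 facebook reported advertising revenue' => 56.1,
    'cisco 2013 annual security report' => 18.3,
    'online content' => 29.5,
    'red bull red bull' => 169.1,
];

// Phrases containing a four-digit year starting with 19 or 20.
$withYears = filterKeywords($keywords, '/\b(19|20)\d{2}\b/');

// Phrases containing the word "content".
$withContent = filterKeywords($keywords, '/\bcontent\b/');

print_r($withYears);
print_r($withContent);
```

Because the word boundary \b also matches next to "/" and "-", a path such as the /wp-content/ entry above would be picked up by the second pattern as well.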

Long Tail Keywords:

  • spanish euskara online publizitate (41,795.3)
  • platform customer relationship management (7,010.4)
  • flower delivery flower delivery (2,521.1)
  • search engine optimization search (1,645.3)
  • blocking search engine marketing (1,319.9)
  • adblock adblock advertising advertising (1,051.5)
  • factor annoyance factor horizontal (887.6)
  • web banners web banner (873.3)
  • engine optimisation search engine (809.6)
  • market segmentation strategy marketing (799.8)
  • search analytics search analytics (597.8)
  • management logistics management facebook (568.1)
  • enlarge display advertising display (567.3)
  • firms oracle oracle corporation (556.7)
  • digital distribution digital distribution (373.1)
  • underwriting spot underwriting spot (371.4)
  • interactive advertising bureau interactive (358.8)
  • mix promotional mix promotional (326.5)
  • marketing market research market (276.8)
  • marketing marketing marketing marketing (271.1)
  • product demonstration product demonstration (265.8)
  • placement product placement propaganda (198.8)
  • marketing activation brand licensing (186.5)
  • advertising mobile advertising mobile (169.8)
  • red bull red bull (169.1)
  • honor system honor system (167.9)
  • sears global network navigator (145.6)
  • arpanet arpanet nsfnet nsfnet (138.7)
  • banner blindness banner blindness (127.9)
  • marketing effectiveness ethics marketing (108.0)
  • revenue sharing revenue sharing (100.7)
  • modern search engines rank (100.6)
  • bull media house streaming (92.5)
  • pricing retail retail service (91.4)
  • live support software online (91.0)
  • malvertising malvertising cisco cisco (90.7)
  • advertising bureau predicts continued (88.7)
  • rich media rich media (86.3)
  • banner advertising display advertising (85.9)
  • advertising methods digital marketing (83.1)
  • federal trade commission federal (76.5)
  • explorer continues growth past (67.5)
  • online service prodigy displayed (65.3)
  • announces 2011 behavioral advertising (65.0)
  • corporate identity corporate identity (62.3)
  • search engines originally sold (60.0)
  • advertising age advertising age (58.7)
  • 2014 facebook reported advertising (56.1)
  • web bugs web bugs (50.3)
  • crime complaint center received (45.2)
  • states advertising industry organizations (43.7)
  • unit guidelines proposes standardized (43.2)
  • personal selling personal selling (41.0)
  • trade commission frequently supports (38.7)
  • ndl national diet library (33.8)
  • wikipedia current events find (32.3)
  • display advertising process overview (32.0)
  • revenue sharing revenue sharing (31.1)
  • file printable version printable (28.3)
  • owners sought additional revenue (28.3)
  • fixed cost compensation means (25.8)
  • news feed ads generate (24.9)
  • upload file upload files (16.7)
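
The long tail entries listed above all appear to consist of four words. A minimal sketch of such a selection, assuming "long tail" simply means phrases with at least a given number of words, could look like this. The function name longTailKeywords and the default threshold are assumptions, not taken from the example script.

```php
<?php

/**
 * Keep phrases with at least $minWords words from a phrase => score array,
 * sorted by score in descending order.
 */
function longTailKeywords(array $keywords, int $minWords = 4): array
{
    $longTail = array_filter(
        $keywords,
        function ($phrase) use ($minWords) {
            $words = preg_split('/\s+/', trim((string) $phrase), -1, PREG_SPLIT_NO_EMPTY);
            return count($words) >= $minWords;
        },
        ARRAY_FILTER_USE_KEY
    );

    arsort($longTail);
    return $longTail;
}

// A few phrases and scores copied from the results in this README.
$longTail = longTailKeywords([
    'content marketing' => 44.0,
    'modern search engines rank' => 100.6,
    'content management system' => 53.6,
    'sears global network navigator' => 145.6,
]);
print_r($longTail);
```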

Please note: These results might have changed by now.
