flacle / simplephpdomainparser

License: MIT License

simplePHPDomainParser

A very simple domain parser for PHP version 5.6.2+. It splits a URL into subdomain(s), registrable domain, and public suffix(es).

Why simple and why custom?

I am working on a big data processor and needed a domain parsing utility that is lightweight and fast. Although I haven't benchmarked the utility, I opted for basic string functions such as strpos instead of more intensive regex functions for string pattern matching. The utility relies on an externally maintained reference list, but it makes no external requests: the list is pre-processed into a PHP array that can be loaded once per runtime. Because it's a small utility, I also made it entirely procedural instead of object oriented.
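As a rough illustration of the idea described above (not the utility's actual code), the sketch below keeps a tiny suffix table in a static variable so it is built only once per run, and checks membership with isset() instead of a regex. The function names exampleSuffixTable and isKnownSuffix, and the three sample entries, are hypothetical.

```php
<?php
// Hypothetical stand-in for the pre-processed reference list.
// The static variable means the array is built once per run.
function exampleSuffixTable()
{
    static $table = null;
    if ($table === null) {
        // Assumption: a real setup would load a generated PHP array here.
        $table = array('com' => true, 'uk' => true, 'co.uk' => true);
    }
    return $table;
}

// Hypothetical helper: constant-time membership test, no regex involved.
function isKnownSuffix($candidate)
{
    $table = exampleSuffixTable();
    return isset($table[strtolower($candidate)]);
}
```

Storing the suffixes as array keys keeps each lookup to a single hash probe, which fits the goal of avoiding heavier pattern matching.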

As other parser developers have experienced, domain parsing is tricky business. For instance, think about the number of segments a host name can have (such as http://a.b.c.d.e). This makes it difficult to accurately split an input URL into subdomain(s), registrable domain, and public suffix(es). One fairly accurate way to do this is to compare the input URL against a maintained list of ICANN public suffixes, which is what this utility does.
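To make the list comparison concrete, here is a minimal sketch of a longest-suffix match. It is an assumption about how such a comparison could work, not the utility's implementation: the function name splitHost and the sample $suffixes array are hypothetical, and the wildcard and exception rules of the real public suffix list are ignored.

```php
<?php
// Hypothetical helper: split a bare host name into
// array(subdomain(s), registrable domain, public suffix).
function splitHost($host, array $suffixes)
{
    $labels = explode('.', strtolower($host));
    $count = count($labels);

    // Try the longest candidate suffix first, always leaving at least
    // one label in front of it for the registrable domain.
    for ($i = 1; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (isset($suffixes[$candidate])) {
            $registrable = $labels[$i - 1];
            $sub = implode('.', array_slice($labels, 0, $i - 1));
            return array($sub, $registrable, $candidate);
        }
    }

    return null; // no known suffix found
}

// Sample data only; a real list has thousands of entries.
$suffixes = array('com' => true, 'uk' => true, 'co.uk' => true);
$parts = splitHost('shop.retail.mystore.co.uk', $suffixes);
// $parts is array('shop.retail', 'mystore', 'co.uk')
```

Checking the longest candidate first matters: both uk and co.uk are in the sample list, and matching uk first would wrongly report co as the registrable domain.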

I have also encountered minor issues with PHP's own parse_url() function, so this utility does not use PHP's built-in URL parser, nor any regex functions for that matter. Please have a look at demo.php to see some tests with several URLs.
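For readers wondering what host extraction without parse_url() or regex might look like, the sketch below uses only strpos, strrpos, strcspn, and substr. It is an illustration under stated assumptions, not the code in parser.php: the function name extractHost is hypothetical, and IPv6 literals are not handled.

```php
<?php
// Hypothetical helper: pull the host out of a URL with plain string functions.
function extractHost($url)
{
    $rest = trim($url);

    // Strip a leading scheme such as "http://". The "://" only counts as a
    // scheme separator if it sits before any path, query, or fragment.
    $schemeEnd = strpos($rest, '://');
    if ($schemeEnd !== false && $schemeEnd + 1 === strcspn($rest, '/?#')) {
        $rest = substr($rest, $schemeEnd + 3);
    }

    // Cut at the first path, query, or fragment delimiter.
    $rest = substr($rest, 0, strcspn($rest, '/?#'));

    // Drop userinfo ("user:pass@") and a trailing port (":8080").
    $at = strrpos($rest, '@');
    if ($at !== false) {
        $rest = substr($rest, $at + 1);
    }
    $colon = strpos($rest, ':');
    if ($colon !== false) {
        $rest = substr($rest, 0, $colon);
    }

    return strtolower($rest);
}
```

For example, extractHost('https://user:pw@Example.com:8080/a/b?x=1#frag') returns example.com.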

Installation

This utility is procedural and does not require classes to be autoloaded. It uses the namespace simplePHPDomainParser for encapsulation, but that is all. To incorporate the utility into your own project, copy the folder into your project and include it by adding a statement such as require_once '../util/simplePHPDomainParser/index.php'; at the top of your script.

Usage

Parsing URLs for domains

The below snippet of code:

require_once './index.php';
$url = 'http://shop.retail.mystore.co.uk';
var_dump(\simplePHPDomainParser\getDomain($url));

Would output:

array(3) {
  [0]=>
  string(11) "shop.retail"
  [1]=>
  string(7) "mystore"
  [2]=>
  string(5) "co.uk"
}

By including index.php in your project you automatically include the file parser.php, which contains the utility's logic. The main function is getDomain($url). For convenience you can also ask for specific components: calling getSubDomain($url) would return just shop.retail. Have a look at demo.php, which contains an array of test URLs.

Maintaining the ICANN reference list

The ICANN public suffix list comes from https://github.com/publicsuffix/list (thanks Mozilla!). The list is updated from time to time, so if you use this utility you should also periodically update public_suffix_list.dat, which is stored in the folder /publicsuffixlists/. The utility parses only public ICANN domains, not private ones; feel free to fork and adapt the code as you see fit. Every time you update the .dat file, you should also run /src/serializeToPHP.php to update the PHP array as well.
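The following is a minimal sketch of what turning the .dat file into a PHP array could involve. It is not the contents of serializeToPHP.php: the function name icannSuffixesFromList is hypothetical, the use of var_export() is an assumption, and the input is a short inline sample rather than the real file. It relies on the BEGIN/END ICANN DOMAINS marker comments that the public suffix list uses to delimit its ICANN section, and it stores wildcard and exception rules verbatim.

```php
<?php
// Hypothetical helper: collect the ICANN rules from the text of a
// public suffix list and return them as a lookup array.
function icannSuffixesFromList($listText)
{
    $suffixes = array();
    $inIcannSection = false;

    foreach (explode("\n", $listText) as $line) {
        $line = trim($line);
        if (strpos($line, '===BEGIN ICANN DOMAINS===') !== false) {
            $inIcannSection = true;
            continue;
        }
        if (strpos($line, '===END ICANN DOMAINS===') !== false) {
            break; // private domains follow and are ignored
        }
        // Skip lines outside the ICANN section, blank lines, and comments.
        if (!$inIcannSection || $line === '' || strpos($line, '//') === 0) {
            continue;
        }
        $suffixes[$line] = true;
    }

    return $suffixes;
}

// Inline sample standing in for public_suffix_list.dat.
$sample = "// ===BEGIN ICANN DOMAINS===\n// uk\nuk\nco.uk\n\n"
    . "// ===END ICANN DOMAINS===\n"
    . "// ===BEGIN PRIVATE DOMAINS===\nblogspot.com\n";
$suffixes = icannSuffixesFromList($sample);

// var_export() yields PHP source that could be written to a file
// and later loaded with require.
$phpSource = "<?php\nreturn " . var_export($suffixes, true) . ";\n";
```

Generating a plain PHP file this way means the list is parsed only when the .dat file changes, not on every request.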
