thom4parisot / tld.js

JavaScript API to work easily with complex domain names, subdomains and well-known TLDs.

Home Page: https://npmjs.com/tldjs

License: MIT License

JavaScript 100.00%
subdomain javascript tld browser uri hostname nodejs public-suffix-list tldextract validation-library

tld.js's People

Contributors

chrmod, ghostwords, jdesboeufs, jhnns, kellycampbell, krinkle, olivoil, remusao, thom4parisot, xdamman, yehezkielbs

tld.js's Issues

CLI command

With usage like:

$ tld.js domain love.to.speak.blabla.com
-> blabla.com
$ tld.js validity blabla.com
-> valid
$ tld.js subdomain love.to.speak.blabla.com
-> love.to.speak

Future evolution

$ tld.js exists blabla.com ahahah.com ahahah.gd
-> blabla.com: valid
-> ahahah.com: valid
-> ahahah.gd: invalid
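
A rough sketch of how such a command dispatcher might look. `runCommand` is a hypothetical name, and the library object is passed in as a parameter rather than required, so nothing here is the project's actual CLI. The four methods it calls (`getDomain`, `getSubdomain`, `isValid`, `tldExists`) are the ones used elsewhere in these issues, and the output strings simply mirror the examples above.

```javascript
// Hypothetical dispatcher for the proposed CLI. `tld` is any object exposing
// getDomain, getSubdomain, isValid and tldExists.
function runCommand(tld, argv) {
  const [command, ...hosts] = argv;
  const handlers = {
    domain: (host) => String(tld.getDomain(host)),
    subdomain: (host) => String(tld.getSubdomain(host)),
    validity: (host) => (tld.isValid(host) ? 'valid' : 'invalid'),
    exists: (host) => `${host}: ${tld.tldExists(host) ? 'valid' : 'invalid'}`,
  };
  if (!Object.prototype.hasOwnProperty.call(handlers, command)) {
    throw new Error(`Unknown command: ${command}`);
  }
  // One output line per host, in the "-> result" style of the proposal.
  return hosts.map((host) => `-> ${handlers[command](host)}`);
}

// In a real bin script this could be wired up roughly as:
//   runCommand(require('tldjs'), process.argv.slice(2))
//     .forEach((line) => console.log(line));
```

Passing the library in keeps the dispatcher testable with a stub and leaves the choice of rule set to the caller.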

Offer to update rules at install time

  • this would most likely happen in a postinstall script
  • the script might just check for the presence of a specific npm key/value (like tldjs_update_rules=true)
  • this could be triggered by running npm install --tldjs-update-rules (as an npm config flag)
  • rewrite the update script to get rid of the following devDependencies:
    • request
    • async
  • on the other hand, the following dependencies should be added:
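
The postinstall check described in the first bullets could be sketched as follows. This assumes npm hands the flag to lifecycle scripts as the environment variable `npm_config_tldjs_update_rules`, which is how npm has conventionally exposed config values to scripts. `shouldUpdateRules` is a hypothetical helper, and the download step itself is left out.

```javascript
// Decide whether the postinstall step should refresh the rules.
// Assumption: `npm install --tldjs-update-rules` (or a matching .npmrc key)
// reaches the script as the environment variable read below.
function shouldUpdateRules(env) {
  const value = env.npm_config_tldjs_update_rules;
  return value === 'true' || value === '1';
}

if (shouldUpdateRules(process.env)) {
  console.log('tldjs: updating rules');
  // the actual download/build step would go here
} else {
  console.log('tldjs: keeping bundled rules');
}
```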

add support for subdomains on localhost

It would be great if subdomains on localhost could be supported. Is there a particular reason this isn't included, or is it a simple oversight because it's not very commonly needed?

example

tld.getSubdomain("vhost.localhost"); // should return vhost
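
One way to read this request is to let the caller supply extra hosts that should be treated as known domains. The sketch below illustrates that idea only. `getSubdomainFor` and its `knownDomains` parameter are hypothetical and not part of tld.js.

```javascript
// Hypothetical helper: returns the subdomain of `host` relative to the first
// entry of `knownDomains` that it equals or ends with, or null if none match.
function getSubdomainFor(host, knownDomains) {
  const match = knownDomains.find(
    (domain) => host === domain || host.endsWith('.' + domain)
  );
  if (match === undefined) return null;
  // Strip ".<domain>" from the end; an exact match has no subdomain.
  return host === match ? '' : host.slice(0, -(match.length + 1));
}

console.log(getSubdomainFor('vhost.localhost', ['localhost'])); // vhost
console.log(getSubdomainFor('localhost', ['localhost'])); // (empty string)
```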

"download" command

To decouple downloading from "update", which currently runs check-update, download and build, in that order.

Domain extraction bug

Two similar failing examples:

Observed in the Web version (http://wzrd.in/standalone/tldjs) and in tld.js from npm.

Compare with tldextract (Python module):

$ curl "http://tldextract.appspot.com/api/extract?url=http://cdn.jsdelivr.net/g/[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]"
{"domain": "jsdelivr", "subdomain": "cdn", "suffix": "net", "tld": "net"}

Allow to customize rules.json

It would be great if we could customize the rules.json used by tldjs.

In order to reduce my bundle size, I would like to use a lighter version of rules.json.
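
As an illustration of what a lighter rule set could mean, the sketch below trims a list in the public suffix list text format (one rule per line, `//` comments) down to a chosen set of TLDs. The internal layout of rules.json is not described in this issue, so this works on the text format instead. `filterRules` is a hypothetical helper.

```javascript
// Keep only the rules whose last label is one of `keepTlds`.
// Input is public-suffix-list style text: one rule per line, "//" comments.
function filterRules(listText, keepTlds) {
  const keep = new Set(keepTlds);
  return listText
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.startsWith('//'))
    .filter((rule) => keep.has(rule.split('.').pop()));
}

const sample = ['// sample rules', 'com', 'co.uk', '*.ck', '!www.ck', 'io', 'github.io'].join('\n');
console.log(filterRules(sample, ['com', 'io'])); // [ 'com', 'io', 'github.io' ]
```

A trimmed list changes results for every host outside the kept TLDs, so this only suits applications that know which suffixes they will see.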

Weird behaviour on Github.io getDomain

Hello,

I noticed weird behaviour when handling URLs from the domain github.io with the getDomain method.

const tld = require('tldjs');
console.log("tld", tld.getDomain("google.io")); // --> google
console.log("tld", tld.getDomain("github.io")); // --> null

I get back null for github.io.

You can check it live in here

UPDATE

The behaviour also extends to other functions such as getPublicSuffix:

console.log(tld.getPublicSuffix("google.io")); // --> io
console.log(tld.getPublicSuffix("github.io")); // --> github.io

Looking forward to your input.
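
A likely explanation is that github.io appears in the public suffix list as a suffix in its own right, so the bare host has no registrable domain left in front of it. The sketch below shows that reasoning with a three-rule set and a simplified longest-suffix match. It is not the tld.js implementation, and both function bodies are illustrative only.

```javascript
// Tiny stand-in rule set; the real list is far larger.
const rules = new Set(['com', 'io', 'github.io']);

// Longest matching suffix: try the whole host first, then drop leading labels.
function getPublicSuffix(host) {
  const labels = host.split('.');
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join('.');
    if (rules.has(candidate)) return candidate;
  }
  return null;
}

// The domain is the public suffix plus exactly one more label to its left.
function getDomain(host) {
  const suffix = getPublicSuffix(host);
  if (suffix === null || suffix === host) return null;
  const rest = host.slice(0, -(suffix.length + 1)).split('.');
  return rest[rest.length - 1] + '.' + suffix;
}

console.log(getDomain('github.io')); // null
console.log(getDomain('user.github.io')); // user.github.io
console.log(getDomain('google.io')); // google.io
```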

grunt update fails

Hi, I was trying to run this project to see how the parsing works. The project seems to fail when I run 'npm install'.
The error:
npm ERR! Tell the author that this fails on your system:
npm ERR! grunt update

Could you advise me on how to proceed?

Thanks

Breaks on some links

nesh> var tld = require('tldjs')
undefined
nesh> tld.getDomain('http://www.nytimes.com/glogin?URI=http://www.nytimes.com/2010/03/26/us/politics/26court.html&OQ=_rQ3D1Q26&OP=45263736Q2FKgi!KQ7Dr!K@@@Ko!fQ24KJg(Q3FQ5Cgg!Q60KQ60W.WKWQ22KQ60IKyQ3FKigQ24Q26!Q26(Q3FKQ60I(gyQ5C!Q2Ao!fQ24')
SyntaxError: Invalid regular expression: /.+\.wkwq22kq60ikyq3fkigq24q26!q26(q3fkq60i(gyq5c!q2ao!fq24$/: Unterminated group
    at new RegExp (<anonymous>)
    at /home/suor/projects/domain-scraper/node_modules/tldjs/lib/tld.js:107:10
    at _someFunction (/home/suor/projects/domain-scraper/node_modules/tldjs/lib/tld.js:41:25)
    at Function.getCandidateRule (/home/suor/projects/domain-scraper/node_modules/tldjs/lib/tld.js:91:3)
    at tld.getDomain (/home/suor/projects/domain-scraper/node_modules/tldjs/lib/tld.js:247:14)
    at repl:1:6
    at REPLServer.self.eval (repl.js:110:21)
    at REPLServer.repl.eval (/home/suor/.nvm/v0.10.28/lib/node_modules/nesh/lib/plugins/doc.js:147:16)
    at Interface.<anonymous> (repl.js:239:12)
    at Interface.EventEmitter.emit (events.js:117:20)

isValid() should follow RFC-952

The current implementation is too naive.

  The syntax of a legal Internet host name was specified in RFC-952
  [DNS:4].  One aspect of host name syntax is hereby changed: the
  restriction on the first character is relaxed to allow either a
  letter or a digit.  Host software MUST support this more liberal
  syntax.

  Host software MUST handle host names of up to 63 characters and
  SHOULD handle host names of up to 255 characters.

  Whenever a user inputs the identity of an Internet host, it SHOULD
  be possible to enter either (1) a host domain name or (2) an IP
  address in dotted-decimal ("#.#.#.#") form.  The host SHOULD check
  the string syntactically for a dotted-decimal number before
  looking it up in the Domain Name System.

  DISCUSSION:
       This last requirement is not intended to specify the complete
       syntactic form for entering a dotted-decimal host number;
       that is considered to be a user-interface issue.  For
       example, a dotted-decimal number must be enclosed within
       "[ ]" brackets for SMTP mail (see Section 5.2.17).  This
       notation could be made universal within a host system,
       simplifying the syntactic checking for a dotted-decimal
       number.

       If a dotted-decimal number can be entered without such
       identifying delimiters, then a full syntactic check must be
       made, because a segment of a host domain name is now allowed
       to begin with a digit and could legally be entirely numeric
       (see Section 6.1.2.4).  However, a valid host name can never
       have the dotted-decimal form #.#.#.#, since at least the
       highest-level component label will be alphabetic.

Cf https://www.ietf.org/rfc/rfc952.txt and https://www.ietf.org/rfc/rfc1123.txt.

Could/should possibly be made available as a separate module.

closes #58
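
A minimal sketch of a check along the lines of the quoted text: labels of letters, digits and hyphens, a letter or digit at both ends of each label, at most 63 characters per label and 255 overall, and a top-level label that is not purely numeric. `isValidHostname` is a hypothetical name, and this is one reading of the excerpt rather than a complete implementation of either RFC.

```javascript
// One label: starts and ends with a letter or digit, hyphens allowed inside,
// 1 to 63 characters in total.
const LABEL = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i;

function isValidHostname(host) {
  if (typeof host !== 'string' || host.length === 0 || host.length > 255) {
    return false;
  }
  const labels = host.split('.');
  if (!labels.every((label) => LABEL.test(label))) return false;
  // A fully numeric last label would make the name look like dotted-decimal.
  return !/^[0-9]+$/.test(labels[labels.length - 1]);
}

console.log(isValidHostname('3com.com')); // true
console.log(isValidHostname('1.2.3.4')); // false
console.log(isValidHostname('-bad.example.com')); // false
```

Note that this sketch rejects a trailing dot and any underscore. Whether that is desirable is a separate decision.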

getSubdomain/getDomain do not perform as expected for .er top level domains

Testing the following host "shop.google.er" will return an empty subdomain. For example,

tldjs.getSubdomain("shop.google.er"); // null
tldjs.getDomain("shop.google.er"); // shop.google.er
tldjs.isValid("shop.google.er"); // true
tldjs.tldExists("shop.google.er"); // true

The expected outcome would be a subdomain of shop and a domain of google.er.
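
The observed result is what a wildcard rule would produce. The sketch below assumes the rule set contains `*.er`, as the public suffix list has for this TLD. Under that rule every second-level name below .er is itself a public suffix. The matcher is a simplified illustration that ignores exception rules and the list's default rule, and it is not the tld.js code.

```javascript
// Simplified suffix lookup with wildcard support.
// `rules` is a Set of rule strings such as 'com' or '*.er'.
function publicSuffix(host, rules) {
  const labels = host.split('.');
  for (let i = 0; i < labels.length; i++) {
    const exact = labels.slice(i).join('.');
    if (rules.has(exact)) return exact;
    // '*.er' matches any single label followed by '.er'.
    const wildcard = ['*', ...labels.slice(i + 1)].join('.');
    if (i + 1 < labels.length && rules.has(wildcard)) return exact;
  }
  return null;
}

const rules = new Set(['com', '*.er']);
console.log(publicSuffix('shop.google.er', rules)); // google.er
console.log(publicSuffix('shop.google.com', rules)); // com
```

With google.er as the suffix, shop.google.er becomes the registrable domain and no subdomain remains, which matches the values reported above.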

npm package is 18 MB because of included bower_components directory with tal

The node module was published with a bower_components directory from tal, which in turn includes a bunch of binaries:

/tldjs/bower_components/tal/static/vendor/cache ❯ dir
total 10204
drwxr-xr-x 12 dylang staff     408 Oct 13 13:47 .
drwxr-xr-x  3 dylang staff     102 Jul 21  2014 ..
-rw-r--r--  1 dylang staff   25600 Jul 21  2014 childprocess-0.3.9.gem
-rw-r--r--  1 dylang staff  879616 Jul 21  2014 ffi-1.4.0.gem
-rw-r--r--  1 dylang staff  148992 Jul 21  2014 json-1.8.0.gem
-rw-r--r--  1 dylang staff   28672 Jul 21  2014 multi_json-1.7.7.gem
-rw-r--r--  1 dylang staff  119808 Jul 21  2014 rake-10.0.3.gem
-rw-r--r--  1 dylang staff  415232 Jul 21  2014 rubygems-update-2.2.1.gem
-rw-r--r--  1 dylang staff   31744 Jul 21  2014 rubyzip-0.9.9.gem
-rw-r--r--  1 dylang staff 2849280 Jul 21  2014 selenium-webdriver-2.35.1.gem
-rw-r--r--  1 dylang staff 5908480 Jul 21  2014 tal-test-runner-0.0.1.1390678128.gem
-rw-r--r--  1 dylang staff   22528 Jul 21  2014 websocket-1.0.7.gem

Adding bower_components to the .gitignore, bumping the version, and publishing again would fix this problem.

interactive "update" command

  1. create src/index.json containing build metadata
  2. create "check-update" checking remote updates (feed or HTTP header)
  3. display choice:
    • nothing to do
      1. pass (default)
      2. update anyway
    • update available
      1. update (default)
      2. skip update
      3. skip update but build anyway
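
The choice step above can be sketched as a small pure function. The action names and `resolveAction` are hypothetical. The only behaviour taken from the outline is which options exist in each case and that the first one is the default.

```javascript
// Options per state, in the order listed above; index 0 is the default.
const MENU = {
  upToDate: ['pass', 'update'],
  updateAvailable: ['update', 'skip', 'skip-and-build'],
};

// `answer` is the raw text typed by the user ('' means "accept the default").
// Anything that is not a listed option number falls back to the default.
function resolveAction(updateAvailable, answer) {
  const options = updateAvailable ? MENU.updateAvailable : MENU.upToDate;
  const index = Number(answer) - 1;
  return options[index] || options[0];
}

console.log(resolveAction(false, '')); // pass
console.log(resolveAction(true, '3')); // skip-and-build
```

Falling back to the default on unrecognised input is a design choice made for this sketch. Asking the question again would be equally reasonable.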

Prerequisites for running build?

Hi

I tried to update the public suffix list with the method given in README.md.

It failed with the following error:

$ npm run-script build

> [email protected] build ./node_modules/tldjs
> npm run build-rules && npm run build-browser && npm run build-compress

> [email protected] build-rules ./node_modules/tldjs
> grunt update

sh: 1: grunt: not found

npm ERR! [email protected] build-rules: `grunt update`
npm ERR! Exit status 127
npm ERR! 
npm ERR! Failed at the [email protected] build-rules script.
npm ERR! This is most likely a problem with the tldjs package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR!     grunt update
npm ERR! You can get their info via:
npm ERR!     npm owner ls tldjs
npm ERR! There is likely additional logging output above.
npm ERR! System Linux 3.8.0-31-generic

Not sure what grunt is, but I think that maybe a command is missing in the README.md documentation, before the build command, something that installs/configures something called "grunt"?

BR

Expose methods as functions (bound methods)

Currently, importing a single function from a module (in ES6) like so:

import { getDomain } from 'tldjs';
getDomain(url)

fails with:

TypeError: Cannot read property 'isValid' of undefined
    at getDomain (.....)

This happens because getDomain relies on this being the tldjs object.

To fix this, could you perhaps expose the methods as independent functions?
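
The failure mode and one possible fix can be reproduced without the library. In the sketch below `lib` is a hypothetical stand-in object whose `getDomain` relies on `this`, in the way the issue describes. Binding each method before exporting it is one option, and the naive method bodies are placeholders only.

```javascript
// Hypothetical stand-in for an object whose methods depend on `this`.
const lib = {
  isValid(host) {
    return typeof host === 'string' && host.includes('.');
  },
  getDomain(host) {
    if (!this.isValid(host)) return null; // breaks when `this` is lost
    return host.split('.').slice(-2).join('.'); // naive, for illustration
  },
};

// Detaching the method loses `this`, so the call throws a TypeError.
const detached = lib.getDomain;
let failed = false;
try {
  detached('www.example.com');
} catch (err) {
  failed = err instanceof TypeError;
}
console.log(failed); // true

// Binding every method up front makes destructuring safe.
const bound = {};
for (const name of Object.keys(lib)) {
  bound[name] = lib[name].bind(lib);
}
const { getDomain } = bound;
console.log(getDomain('www.example.com')); // example.com
```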

Dealing with invalid host names

Someone sniffing for open vulnerabilities sent our server a request with the URL: http://('4drsteve.com', [], ['54.213.246.177'])/xmlrpc.php

Our server handles traffic for multiple domains, so it passes the Host header to TLD.js to determine which domain the request belongs to. TLD.js extracts the tail portion of the value (177'])), which has no matching rule, and attempts to create a rule by turning this string into a regular expression without escaping it first.

So we get:

SyntaxError: Invalid regular expression: /.+\.177'])$/: Unmatched ')' 
File "/app/node_modules/tldjs/lib/tld.js", line 107, in <unknown>
        if ((new RegExp(pattern)).test(host)) {
  File "/app/node_modules/tldjs/lib/tld.js", line 41, in _someFunction
          if (i in t && fun.call(thisArg, t[i], i, t))
  File "/app/node_modules/tldjs/lib/tld.js", line 91, in Function.getCandidateRule
      _someFunction(rules, function (r) {
  File "/app/node_modules/tldjs/lib/tld.js", line 247, in tld.getDomain
      rule = tld.getCandidateRule(host, rules);

Not sure where to patch this. Probably the Rule constructor should always escape the string when creating a new rule?

Or should this even be patched? Should TLD.js throw an error on invalid host names, or just return null?
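
For the escaping option, a minimal sketch is shown below. `escapeRegExp` is the widely used escaping idiom, and `suffixPattern` is a hypothetical helper that rebuilds the ".+\.suffix$" shape visible in the error message. Neither is taken from the tld.js source.

```javascript
// Escape every character that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: "anything, a dot, then this literal suffix".
function suffixPattern(suffix) {
  return new RegExp('.+\\.' + escapeRegExp(suffix) + '$');
}

const host = "('4drsteve.com', [], ['54.213.246.177'])";
const tail = host.split('.').pop();
console.log(tail); // 177'])
console.log(suffixPattern(tail).test(host)); // true, and no SyntaxError
```

Escaping only stops the exception. Whether such a host should then yield null or an error is the separate question raised in this issue.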

a_b_c.domain.com — Neither domain, nor publicSuffix? (but valid)

The URL http://wsc4_1.webspectator.com/ returns null for both getDomain and getPublicSuffix. I can't find webspectator.com on the public suffix list, so I assume the correct result would be webspectator.com for the domain and com for the public suffix.

Demo:

var tld = require('tldjs');
tld.getDomain('http://wsc4_1.webspectator.com/'); // null
tld.getDomain('wsc4_1.webspectator.com'); // null
tld.getPublicSuffix('http://wsc4_1.webspectator.com/'); // null
tld.isValid('http://wsc4_1.webspectator.com/'); // true

but:

> tld.getDomain('wsc41.webspectator.com')
'webspectator.com'

So it seems it's all about the _ character.
See:

> tld.getDomain('a_b.google.com')
null
> tld.getDomain('a-b.google.com')
'google.com'

Is there a callback implementation?

Either a sync version or a version that allows for a callback would be welcome, so I don't have to create caches and poll them.

Warning for npm 1.4.28

I get this:

npm WARN engine [email protected]: wanted: {"node":">= 0.10","npm":">= 2.13.0"} (current: {"node":"0.10.40","npm":"1.4.28"})

Not sure this warning is useful; the npm version doesn't seem that important for this package?

URIs input really works?

> require('tldjs').getDomain('http://www.ok.com/domain/ko.com')
'com/domain/ko.com'

> require('tldjs').getDomain('http://www.ok.com/domain/')
'ok.com/domain/'

> require('tldjs').getDomain('http://www.ok.com/')
'ok.com/'

> require('tldjs').getDomain('http://www.ok.com')
'ok.com'

> require('tldjs').getDomain('http://w.ok.com')
'ok.com'

> require('tldjs').getDomain('http://ok.com')
'http://ok.com'

> require('tldjs').getDomain('http://ok.com?hello=oncletom')
null

I have not read your code yet. I think you could use the url module to extract the hostname as a first step:

> require('url').parse('http://www.ok.com.eu/ko.com?hello=oncletom').host
'www.ok.com.eu'

> require('tldjs').getDomain(require('url').parse('http://www.ok.eu.com/ko.com?hello=oncletom').host)
'ok.eu.com'
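
The same idea with the standard `URL` class, as a sketch: reduce the input to a hostname first, then hand only the hostname to the library. `extractHostname` is a hypothetical helper, and the fallback for scheme-less input is an assumption about what callers might pass.

```javascript
// Return the hostname of a URL, or of a bare host such as "www.ok.com".
// Returns null when the input cannot be parsed either way.
function extractHostname(input) {
  try {
    return new URL(input).hostname;
  } catch (err) {
    // No scheme: retry as if it were an http URL.
    try {
      return new URL('http://' + input).hostname;
    } catch (err2) {
      return null;
    }
  }
}

console.log(extractHostname('http://www.ok.com/domain/ko.com')); // www.ok.com
console.log(extractHostname('http://ok.com?hello=oncletom')); // ok.com
console.log(extractHostname('www.ok.com')); // www.ok.com
```

Bare input of the form host:port is not handled here, because the parser reads the part before the colon as a scheme.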

How to use tld.js in browser?

The example shows there's a minified version, but I couldn't find it. Can you please point me to one, or to instructions on how to build one myself?

Thanks, Sunil

Error during installation of tldjs

Hello, I hit this bug when trying to install tldjs. This is a duplicate of https://github.com/isaacs/npm/issues/2995:

Node version: 0.8.16
NPM version: 1.1.69

tsmtp@alpha:~/TurboPanel/TurboAPI$ npm install tldjs
npm WARN package.json [email protected] No README.md file found!
npm http GET https://registry.npmjs.org/tldjs
npm http 304 https://registry.npmjs.org/tldjs
npm ERR! TypeError: Object ~0.6.0,~0.8.0 has no method 'trim'
npm ERR!     at toComparators (/usr/local/lib/node_modules/npm/node_modules/semver/semver.js:92:27)
npm ERR!     at Object.satisfies (/usr/local/lib/node_modules/npm/node_modules/semver/semver.js:204:11)
npm ERR!     at checkEngine (/usr/local/lib/node_modules/npm/lib/install.js:719:36)
npm ERR!     at Array.0 (/usr/local/lib/node_modules/npm/node_modules/slide/lib/bind-actor.js:15:8)
npm ERR!     at LOOP (/usr/local/lib/node_modules/npm/node_modules/slide/lib/chain.js:15:14)
npm ERR!     at chain (/usr/local/lib/node_modules/npm/node_modules/slide/lib/chain.js:20:5)
npm ERR!     at installOne_ (/usr/local/lib/node_modules/npm/lib/install.js:698:3)
npm ERR!     at installOne (/usr/local/lib/node_modules/npm/lib/install.js:621:3)
npm ERR!     at /usr/local/lib/node_modules/npm/lib/install.js:508:9
npm ERR!     at /usr/local/lib/node_modules/npm/node_modules/slide/lib/async-map.js:54:35
npm ERR! If you need help, you may report this log at:
npm ERR!     <http://github.com/isaacs/npm/issues>
npm ERR! or email it to:
npm ERR!     <[email protected]>
