thom4parisot / tld.js
JavaScript API to work easily with complex domain names, subdomains and well-known TLDs.
Home Page: https://npmjs.com/tldjs
License: MIT License
With usage like:
$ tld.js domain love.to.speak.blabla.com
-> blabla.com
$ tld.js validity blabla.com
-> valid
$ tld.js subdomain love.to.speak.blabla.com
-> love.to.speak
Future evolution
$ tld.js exists blabla.com ahahah.com ahahah.gd
-> blabla.com: valid
-> ahahah.com: valid
-> ahahah.gd: invalid
postinstall script
tldjs_update_rules=true (npm install --tldjs-update-rules) (as of npm config)
devDependencies: request, async
dependencies should be added: punycode (because Node declares its bundled version deprecated)
It would be great if subdomains on localhost could be supported. Is there a particular reason they are not supported, or is it a simple oversight because this is not very commonly needed?
tld.getSubdomain("vhost.localhost"); // should return vhost
Hi,
I saw #37 and I'm wondering why the bower dependency isn't the browserified version?
In order to decouple it from "update" (which is check-update, download and build, in this order).
.gitignore in subdirectories is also interpreted as .npmignore
To analyze exceptions and errors in the browser from public webpage.
Because the Public Suffix List contains rules, domains and "reserved domains", it is once again hard to tell a real domain apart from a TLD, an SLD or something else.
Back to zero?
Two similar failing examples:
tldjs.getDomain("http://cdn.jsdelivr.net/g/[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]")
//"0.3"
tldjs.getDomain("http://cdn.jsdelivr.net/g/[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]")
//"2.4"
Observed in the Web version (http://wzrd.in/standalone/tldjs) and in tld.js from npm.
Compare with tldextract (Python module):
$ curl "http://tldextract.appspot.com/api/extract?url=http://cdn.jsdelivr.net/g/[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]"
{"domain": "jsdelivr", "subdomain": "cdn", "suffix": "net", "tld": "net"}
previous issue is here #69
It seems the pull request didn't fix it.
It would be great if we could customize the rules.json used by tldjs.
In order to reduce my bundle size, I would like to use a lighter version of rules.json.
I tried checking this function against the test data:
http://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/data/test_psl.txt?raw=1
For example:
tldjs.getPublicSuffix('test.kyoto.jp') -> kyoto.jp
but it should return:
checkPublicSuffix('test.kyoto.jp', 'test.kyoto.jp');
To monitor time and memory consumption over versions.
Parsing + RegEx is definitely too slow even though it is solid.
Available options:
Hello,
I noticed weird behaviour when handling URLs from the domain github.io using the getDomain method.
const tld = require('tldjs');
console.log("tld", tld.getDomain("google.io")); // --> google
console.log("tld", tld.getDomain("github.io")); // --> null
and I get back null on github.io.
You can check it live here.
UPDATE
The behaviour extends to other functions as well, like getPublicSuffix:
console.log(tld.getPublicSuffix("google.io")); // --> io
console.log(tld.getPublicSuffix("github.io")); // --> github.io
Looking forward to your input.
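A possible explanation, offered as an assumption rather than a confirmed diagnosis: github.io is itself listed as a suffix in the Public Suffix List, so a lookup that picks the longest matching rule treats it as a suffix with no registrable label in front of it. The sketch below uses a hypothetical three-entry rule set and hypothetical helper names; it is not the tldjs implementation.

```javascript
// Hypothetical, tiny rule set; the real Public Suffix List is far larger.
const RULES = new Set(['io', 'com', 'github.io']);

// Return the longest suffix of `hostname` found in RULES, or null.
function getPublicSuffix(hostname) {
  const labels = hostname.toLowerCase().split('.');
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join('.');
    if (RULES.has(candidate)) return candidate;
  }
  return null;
}

// The registrable domain is the public suffix plus one more label.
function getDomain(hostname) {
  const name = hostname.toLowerCase();
  const suffix = getPublicSuffix(name);
  // No rule matched, or the whole host name is itself a suffix.
  if (suffix === null || suffix === name) return null;
  const suffixLength = suffix.split('.').length;
  return name.split('.').slice(-(suffixLength + 1)).join('.');
}

console.log(getDomain('google.io'));      // google.io
console.log(getDomain('github.io'));      // null
console.log(getDomain('user.github.io')); // user.github.io
```

Under this reading, null for github.io is the expected result of the suffix rules rather than a parsing failure.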
It seems related to https://gist.github.com/2727303
No clue for the moment
I have been pointed here by the Brave Browser staff.
Could you please add keybase.pub to the list of TLDs?
Hi, I was trying to run this project to see how the parsing works. The project seems to fail when I run 'npm install'.
The error:
npm ERR! Tell the author that this fails on your system:
npm ERR! grunt update
Could you guide me on how to proceed?
Thanks
Because we all love CasperJS.
Tests bumped to 900ms with more yellow dots in mocha ;-(
It has been discontinued as of 31 July 2015 https://en.wikipedia.org/wiki/.an
Please see the attached plunkr snippet.
nesh> var tld = require('tldjs')
undefined
nesh> tld.getDomain('http://www.nytimes.com/glogin?URI=http://www.nytimes.com/2010/03/26/us/politics/26court.html&OQ=_rQ3D1Q26&OP=45263736Q2FKgi!KQ7Dr!K@@@Ko!fQ24KJg(Q3FQ5Cgg!Q60KQ60W.WKWQ22KQ60IKyQ3FKigQ24Q26!Q26(Q3FKQ60I(gyQ5C!Q2Ao!fQ24')
SyntaxError: Invalid regular expression: /.+\.wkwq22kq60ikyq3fkigq24q26!q26(q3fkq60i(gyq5c!q2ao!fq24$/: Unterminated group
at new RegExp (<anonymous>)
at /home/suor/projects/domain-scraper/node_modules/tldjs/lib/tld.js:107:10
at _someFunction (/home/suor/projects/domain-scraper/node_modules/tldjs/lib/tld.js:41:25)
at Function.getCandidateRule (/home/suor/projects/domain-scraper/node_modules/tldjs/lib/tld.js:91:3)
at tld.getDomain (/home/suor/projects/domain-scraper/node_modules/tldjs/lib/tld.js:247:14)
at repl:1:6
at REPLServer.self.eval (repl.js:110:21)
at REPLServer.repl.eval (/home/suor/.nvm/v0.10.28/lib/node_modules/nesh/lib/plugins/doc.js:147:16)
at Interface.<anonymous> (repl.js:239:12)
at Interface.EventEmitter.emit (events.js:117:20)
The current implementation is too naive.
The syntax of a legal Internet host name was specified in RFC-952
[DNS:4]. One aspect of host name syntax is hereby changed: the
restriction on the first character is relaxed to allow either a
letter or a digit. Host software MUST support this more liberal
syntax.
Host software MUST handle host names of up to 63 characters and
SHOULD handle host names of up to 255 characters.
Whenever a user inputs the identity of an Internet host, it SHOULD
be possible to enter either (1) a host domain name or (2) an IP
address in dotted-decimal ("#.#.#.#") form. The host SHOULD check
the string syntactically for a dotted-decimal number before
looking it up in the Domain Name System.
DISCUSSION:
This last requirement is not intended to specify the complete
syntactic form for entering a dotted-decimal host number;
that is considered to be a user-interface issue. For
example, a dotted-decimal number must be enclosed within
"[ ]" brackets for SMTP mail (see Section 5.2.17). This
notation could be made universal within a host system,
simplifying the syntactic checking for a dotted-decimal
number.
If a dotted-decimal number can be entered without such
identifying delimiters, then a full syntactic check must be
made, because a segment of a host domain name is now allowed
to begin with a digit and could legally be entirely numeric
(see Section 6.1.2.4). However, a valid host name can never
have the dotted-decimal form #.#.#.#, since at least the
highest-level component label will be alphabetic.
Cf https://www.ietf.org/rfc/rfc952.txt and https://www.ietf.org/rfc/rfc1123.txt.
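The rules quoted above can be sketched as a small standalone check. This is a minimal illustration under stated assumptions, not the library's validation code: the function name isValidHostname is hypothetical, the 255-character ceiling follows the quoted text, and the "alphabetic highest-level label" remark is approximated by rejecting a last label made only of digits.

```javascript
// Hypothetical helper: syntactic host name check in the spirit of RFC 952 / RFC 1123.
function isValidHostname(hostname) {
  if (typeof hostname !== 'string') return false;
  // Tolerate a single trailing dot (fully qualified form).
  const name = hostname.endsWith('.') ? hostname.slice(0, -1) : hostname;
  if (name.length === 0 || name.length > 255) return false;

  // Each label: 1 to 63 characters, letters, digits or hyphens,
  // starting and ending with a letter or a digit.
  const labelPattern = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i;
  const labels = name.split('.');
  if (!labels.every((label) => labelPattern.test(label))) return false;

  // Reject an all-numeric last label so that dotted-decimal
  // addresses are not accepted as host names.
  return !/^[0-9]+$/.test(labels[labels.length - 1]);
}

console.log(isValidHostname('a-b.google.com')); // true
console.log(isValidHostname('a_b.google.com')); // false
console.log(isValidHostname('192.168.0.1'));    // false
```

A check like this would run before any rule lookup, so malformed input is rejected early instead of reaching the matching code.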
Could/should possibly be made available as a separate module.
closes #58
Testing the following host "shop.google.er" will return an empty subdomain. For example,
tldjs.getSubdomain("shop.google.er"); // null
tldjs.getDomain("shop.google.er"); // shop.google.er
tldjs.isValid("shop.google.er"); // true
tldjs.tldExists("shop.google.er"); // true
Although the expected outcome would be to have a subdomain of shop and a domain of google.er.
The node module was published with a bower_components directory from tal, which in turn includes a bunch of binaries:
/tldjs/bower_components/tal/static/vendor/cache ❯ dir
total 10204
drwxr-xr-x 12 dylang staff 408 Oct 13 13:47 .
drwxr-xr-x 3 dylang staff 102 Jul 21 2014 ..
-rw-r--r-- 1 dylang staff 25600 Jul 21 2014 childprocess-0.3.9.gem
-rw-r--r-- 1 dylang staff 879616 Jul 21 2014 ffi-1.4.0.gem
-rw-r--r-- 1 dylang staff 148992 Jul 21 2014 json-1.8.0.gem
-rw-r--r-- 1 dylang staff 28672 Jul 21 2014 multi_json-1.7.7.gem
-rw-r--r-- 1 dylang staff 119808 Jul 21 2014 rake-10.0.3.gem
-rw-r--r-- 1 dylang staff 415232 Jul 21 2014 rubygems-update-2.2.1.gem
-rw-r--r-- 1 dylang staff 31744 Jul 21 2014 rubyzip-0.9.9.gem
-rw-r--r-- 1 dylang staff 2849280 Jul 21 2014 selenium-webdriver-2.35.1.gem
-rw-r--r-- 1 dylang staff 5908480 Jul 21 2014 tal-test-runner-0.0.1.1390678128.gem
-rw-r--r-- 1 dylang staff 22528 Jul 21 2014 websocket-1.0.7.gem
Adding bower_components to the .gitignore, bumping the version, and publishing would fix this problem.
src/index.json containing build metadata
Hi,
I tried to update the public suffix list with the method given in README.md.
It failed with the following error:
$ npm run-script build
> [email protected] build ./node_modules/tldjs
> npm run build-rules && npm run build-browser && npm run build-compress
> [email protected] build-rules ./node_modules/tldjs
> grunt update
sh: 1: grunt: not found
npm ERR! [email protected] build-rules: `grunt update`
npm ERR! Exit status 127
npm ERR!
npm ERR! Failed at the [email protected] build-rules script.
npm ERR! This is most likely a problem with the tldjs package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR! grunt update
npm ERR! You can get their info via:
npm ERR! npm owner ls tldjs
npm ERR! There is likely additional logging output above.
npm ERR! System Linux 3.8.0-31-generic
Not sure what grunt is, but I think that maybe a command is missing in the README.md documentation, before the build command, something that installs/configures something called "grunt"?
BR
It would enable people to use their own repos following the syntax of publicsuffix.org.
It seems that if there is an underscore "_" in the domain, it returns False instead of True when checking the existence of the domain TLD.
To keep the library as lightweight as possible, with no requirement on AMD modules or other overkill.
Some hints:
Currently, importing a single function from a module (in ES6) like so:
import { getDomain } from 'tldjs';
getDomain(url)
fails with:
TypeError: Cannot read property 'isValid' of undefined
at getDomain (.....)
This happens because getDomain relies on this being the tldjs object.
To fix this, could you perhaps expose the methods as independent functions?
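The failure described above can be reproduced with a stand-in object. Everything in this sketch is hypothetical (the lib object and its simplified methods are not tldjs code); it only illustrates how a method loses its receiver when destructured and how binding restores it.

```javascript
// Hypothetical stand-in for a library object whose methods rely on `this`.
const lib = {
  isValid(host) {
    return typeof host === 'string' && host.includes('.');
  },
  getDomain(host) {
    // Works only while `this` is `lib`.
    return this.isValid(host) ? host.split('.').slice(-2).join('.') : null;
  },
};

// Destructuring detaches the method from its object.
const { getDomain: detached } = lib;
let failure = null;
try {
  detached('www.example.com');
} catch (err) {
  failure = err; // `this` is no longer `lib`, so `this.isValid` cannot be called
}

// Binding restores the receiver, so the function can be passed around freely.
const getDomain = lib.getDomain.bind(lib);

console.log(failure instanceof TypeError); // true
console.log(getDomain('www.example.com')); // example.com
```

Exporting pre-bound functions, or writing the helpers as plain functions that never touch this, would both let a named import work as expected.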
Someone sniffing for open vulnerabilities sent our server a request with the URL: http://('4drsteve.com', [], ['54.213.246.177'])/xmlrpc.php
Our server handles traffic for multiple domains, so it passes the Host header to TLD.js to determine which domain the request belongs to. TLD.js extracts the tail portion of the value (177'])), which has no matching rule, and attempts to create a rule by turning this string into a regular expression without escaping it first.
So we get:
SyntaxError: Invalid regular expression: /.+\.177'])$/: Unmatched ')'
File "/app/node_modules/tldjs/lib/tld.js", line 107, in <unknown>
if ((new RegExp(pattern)).test(host)) {
File "/app/node_modules/tldjs/lib/tld.js", line 41, in _someFunction
if (i in t && fun.call(thisArg, t[i], i, t))
File "/app/node_modules/tldjs/lib/tld.js", line 91, in Function.getCandidateRule
_someFunction(rules, function (r) {
File "/app/node_modules/tldjs/lib/tld.js", line 247, in tld.getDomain
rule = tld.getCandidateRule(host, rules);
Not sure where to patch this. Probably in the Rule constructor: when creating a new rule, should it always attempt to escape the string?
Or should this even be patched? Should TLD.js throw an error on invalid host names, or just return null?
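One way to approach the escaping question is sketched below. The helper names escapeRegExp and matchesSuffix are hypothetical and this is not a patch against lib/tld.js; it only shows that escaping regular expression metacharacters before calling new RegExp avoids the SyntaxError for the input from the report.

```javascript
// Escape every character that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical matcher: does `host` end with ".<suffix>"?
function matchesSuffix(host, suffix) {
  const pattern = new RegExp('.+\\.' + escapeRegExp(suffix) + '$');
  return pattern.test(host);
}

// The tail from the report no longer breaks the RegExp constructor.
console.log(matchesSuffix("54.213.246.177'])", "177'])")); // true
console.log(matchesSuffix('www.example.com', 'com'));      // true
console.log(matchesSuffix('examplecom', 'com'));           // false
```

Escaping only prevents the crash; whether such a host should be rejected as invalid before any rule is built is a separate decision.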
Hi,
I found that if getDomain is called with a URL without any subdomain, the result of getDomain also includes 'http://'.
For example,
http://www.google.com will return google.com
but
http://google.com will return http://google.com
In the second case, I think google.com should be returned. Is this intended?
The URL http://wsc4_1.webspectator.com/ is returning null for both getDomain and getPublicSuffix. I can't even find webspectator.com on the public suffix list, so I assume the correct result would be webspectator.com for domain and com for public suffix.
Demo:
var tld = require('tldjs');
tld.getDomain('http://wsc4_1.webspectator.com/'); // null
tld.getDomain('wsc4_1.webspectator.com'); // null
tld.getPublicSuffix('http://wsc4_1.webspectator.com/'); // null
tld.isValid('http://wsc4_1.webspectator.com/'); // true
but:
> tld.getDomain('wsc41.webspectator.com')
'webspectator.com'
So it seems it's all about the _ character.
See:
> tld.getDomain('a_b.google.com')
null
> tld.getDomain('a-b.google.com')
'google.com'
Either a sync version or a version that allows for a callback would be welcome, so I don't have to run around creating caches and polling them.
.google is an actual TLD (sigh) so the README should be updated to reflect this.
https://github.com/oncletom/tld.js/blob/master/README.md#tldexists
Outdated example:
tld.tldExists('google.google'); // returns `false` (not an explicit registered TLD)
I get this:
npm WARN engine [email protected]: wanted: {"node":">= 0.10","npm":">= 2.13.0"} (current: {"node":"0.10.40","npm":"1.4.28"})
Not sure this warning is useful; the npm version doesn't seem that important for this package?
Checking TLDs in the browser would be a huge win.
> require('tldjs').getDomain('http://www.ok.com/domain/ko.com')
'com/domain/ko.com'
> require('tldjs').getDomain('http://www.ok.com/domain/')
'ok.com/domain/'
> require('tldjs').getDomain('http://www.ok.com/')
'ok.com/'
> require('tldjs').getDomain('http://www.ok.com')
'ok.com'
> require('tldjs').getDomain('http://w.ok.com')
'ok.com'
> require('tldjs').getDomain('http://ok.com')
'http://ok.com'
> require('tldjs').getDomain('http://ok.com?hello=oncletom')
null
I have not read your code yet. I think you could use the url module to extract the hostname as a first step...
require('url').parse('http://www.ok.com.eu/ko.com?hello=oncletom').host
'www.ok.com.eu'
require('tldjs').getDomain(require('url').parse('http://www.ok.eu.com/ko.com?hello=oncletom').host)
'ok.eu.com'
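The suggestion above can be sketched with the WHATWG URL class that ships with Node.js and browsers. The helper name extractHostname is hypothetical, and the fallback for scheme-less input is an assumption about how bare host names should be treated, not documented tldjs behaviour.

```javascript
// Hypothetical helper: reduce a URL (or a bare host with a path) to its host name.
function extractHostname(input) {
  try {
    return new URL(input).hostname;
  } catch (err) {
    // Not an absolute URL: assume a bare host name, possibly followed
    // by a path, query string or fragment, and cut those off.
    return String(input).split(/[/?#]/)[0];
  }
}

console.log(extractHostname('http://www.ok.com/domain/ko.com'));  // www.ok.com
console.log(extractHostname('http://ok.com?hello=oncletom'));     // ok.com
console.log(extractHostname('ok.com/domain/'));                   // ok.com
```

The extracted host name would then be handed to the domain lookup, so paths and query strings never reach the rule matching. Scheme-less input of the form host:port is not handled by this sketch.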
2) http://publicsuffix.org/list/test.txt More complex TLD:
Error: expected null to equal 'test.kyoto.jp'
The example shows there's a minified version, but I couldn't find it. Can you please point me to one, or to instructions on how to build one myself?
Thanks, Sunil
Hello, I am hitting this bug while trying to install tldjs now. This is a duplicate of https://github.com/isaacs/npm/issues/2995:
Node version: 0.8.16
NPM version: 1.1.69
tsmtp@alpha:~/TurboPanel/TurboAPI$ npm install tldjs
npm WARN package.json [email protected] No README.md file found!
npm http GET https://registry.npmjs.org/tldjs
npm http 304 https://registry.npmjs.org/tldjs
npm ERR! TypeError: Object ~0.6.0,~0.8.0 has no method 'trim'
npm ERR! at toComparators (/usr/local/lib/node_modules/npm/node_modules/semver/semver.js:92:27)
npm ERR! at Object.satisfies (/usr/local/lib/node_modules/npm/node_modules/semver/semver.js:204:11)
npm ERR! at checkEngine (/usr/local/lib/node_modules/npm/lib/install.js:719:36)
npm ERR! at Array.0 (/usr/local/lib/node_modules/npm/node_modules/slide/lib/bind-actor.js:15:8)
npm ERR! at LOOP (/usr/local/lib/node_modules/npm/node_modules/slide/lib/chain.js:15:14)
npm ERR! at chain (/usr/local/lib/node_modules/npm/node_modules/slide/lib/chain.js:20:5)
npm ERR! at installOne_ (/usr/local/lib/node_modules/npm/lib/install.js:698:3)
npm ERR! at installOne (/usr/local/lib/node_modules/npm/lib/install.js:621:3)
npm ERR! at /usr/local/lib/node_modules/npm/lib/install.js:508:9
npm ERR! at /usr/local/lib/node_modules/npm/node_modules/slide/lib/async-map.js:54:35
npm ERR! If you need help, you may report this log at:
npm ERR! <http://github.com/isaacs/npm/issues>
npm ERR! or email it to:
npm ERR! <[email protected]>