
A cruel mistress that uses the public suffix domain list to dominate URLs by canonicalizing, finding the public suffix, and breaking them into their domain parts.

Ruby 100.00%

domainatrix's Introduction

Domainatrix

http://github.com/pauldix/domainatrix

Summary

A cruel mistress that uses the public suffix domain list to dominate URLs by canonicalizing, finding public suffixes, and breaking them into their domain parts.

Description

This simple library can parse a URL into its canonical form. It uses the list of domains from http://publicsuffix.org to break the domain into its public suffix, domain, and subdomain.
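As a rough illustration of the split described above, here is a minimal sketch that is not the gem's actual code. The `split_host` name and the tiny hard-coded suffix list are assumptions made for the example; the real list at publicsuffix.org is far larger and also has wildcard and exception rules that this sketch ignores.

```ruby
# Hypothetical helper: splits a host into [subdomain, domain, public_suffix].
# The default suffix list is a hand-picked subset used only for illustration.
def split_host(host, suffixes = %w[com net uk co.uk])
  labels = host.downcase.split(".")
  # Try the longest candidate suffix first, always leaving one label for the domain.
  (1...labels.size).each do |i|
    candidate = labels[i..-1].join(".")
    next unless suffixes.include?(candidate)
    return [labels[0...i - 1].join("."), labels[i - 1], candidate]
  end
  nil # no known suffix matched
end

split_host("foo.bar.pauldix.co.uk") # => ["foo.bar", "pauldix", "co.uk"]
split_host("www.pauldix.net")       # => ["www", "pauldix", "net"]
```

Trying the longest candidate first is what makes "co.uk" win over "uk" when both are in the list.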

Installation

  gem install domainatrix

Use

require 'rubygems'
require 'domainatrix'

url = Domainatrix.parse("http://www.pauldix.net")
url.url       # => "http://www.pauldix.net" (the original url)
url.public_suffix       # => "net"
url.domain    # => "pauldix"
url.canonical # => "net.pauldix"

url = Domainatrix.parse("http://foo.bar.pauldix.co.uk/asdf.html?q=arg")
url.public_suffix       # => "co.uk"
url.domain    # => "pauldix"
url.subdomain # => "foo.bar"
url.path      # => "/asdf.html?q=arg"
url.canonical # => "uk.co.pauldix.bar.foo/asdf.html?q=arg"
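
Judging from the examples above, the canonical form looks like the host labels in reverse order followed by the path. The sketch below reproduces that shape under that assumption. `canonical_form` is a hypothetical helper, not a method of the gem, and it includes whatever subdomain it is given.

```ruby
# Hypothetical helper: builds a reversed-label canonical string from parts.
def canonical_form(public_suffix, domain, subdomain = "", path = "")
  parts = public_suffix.split(".").reverse   # "co.uk" -> ["uk", "co"]
  parts << domain
  parts.concat(subdomain.split(".").reverse) # "foo.bar" -> ["bar", "foo"]
  parts.join(".") + path
end

canonical_form("co.uk", "pauldix", "foo.bar", "/asdf.html?q=arg")
# => "uk.co.pauldix.bar.foo/asdf.html?q=arg"
```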

LICENSE

(The MIT License)

Copyright © 2009:

Paul Dix

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
‘Software’), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED ‘AS IS’, WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

domainatrix's People

Contributors

dj2, enricob, f1sherman, joelvh, leereilly, mtodd, pauldix, pcasaretto

domainatrix's Issues

Pass IP Address Causes Exception

It would be nice to be able to pass in IP addresses, as a website will often run under its IP address during testing, e.g. url = Domainatrix.parse(request.url)
where request.url may be 192.168.0.1 when testing on the local network.
At the moment it throws 'You have a nil object when you didn't expect it!'
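
Until the gem handles this, a caller could check for a literal IP host before calling Domainatrix.parse. The following is only a sketch using Ruby's standard uri and ipaddr libraries. `ip_host?` is a made-up name, and it assumes the URL carries a scheme so that URI can find the host.

```ruby
require "ipaddr"
require "uri"

# Hypothetical guard: true when the URL's host is a literal IPv4 or IPv6 address.
def ip_host?(url)
  host = URI.parse(url).hostname # hostname strips the brackets around IPv6 hosts
  return false if host.nil?
  IPAddr.new(host)               # raises for anything that is not an IP address
  true
rescue IPAddr::InvalidAddressError, URI::InvalidURIError
  false
end

ip_host?("http://192.168.0.1/")    # => true
ip_host?("http://www.pauldix.net") # => false
```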

Exception on IP address in host string

ruby-1.9.2-p0 > Domainatrix.parse('http://74.205.88.194/article/news/microsoft_ballmer_envious_ipads_success_insists_windows_tablets_are_priority')
NoMethodError: undefined method `has_key?' for nil:NilClass
    from /Users/igrigorik/.rvm/gems/ruby-1.9.2-p0/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:52:in `block in parse_domains_from_host'
    from /Users/igrigorik/.rvm/gems/ruby-1.9.2-p0/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:47:in `each_index'
    from /Users/igrigorik/.rvm/gems/ruby-1.9.2-p0/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:47:in `parse_domains_from_host'
    from /Users/igrigorik/.rvm/gems/ruby-1.9.2-p0/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:33:in `parse'
    from /Users/igrigorik/.rvm/gems/ruby-1.9.2-p0/gems/domainatrix-0.0.7/lib/domainatrix.rb:12:in `parse'
    from (irb):2
    from /Users/igrigorik/.rvm/rubies/ruby-1.9.2-p0/bin/irb:17:in `<main>'
ruby-1.9.2-p0 > 

Blows up on .dev (for Pow web server)

Obviously this isn't a legitimate TLD, but it's in use thanks to the Pow Rack server by 37signals. It might make sense to add support for it, or at least not throw an error when it is used with parse:

>> Domainatrix.parse('http://google.com/')
=> #<Domainatrix::Url:0x00000104a196b8 @scheme="http", @host="google.com", @url="http://google.com/", @public_suffix="com", @domain="google", @subdomain="", @path="/">
>> Domainatrix.parse('http://google.dev/')
NoMethodError: undefined method `has_key?' for nil:NilClass
  from [...]/whiny_nil.rb:48:in `method_missing'
  from [...]/domainatrix-0.0.10/lib/domainatrix/domain_parser.rb:59:in `block in parse_domains_from_host'
  from [...]/domain_parser.rb:54:in `each_index'
  from [...]/domain_parser.rb:54:in `parse_domains_from_host'
  from [...]/domain_parser.rb:40:in `parse'
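
One way to avoid the error for unlisted TLDs is to fall back to the last label, which is what the public suffix algorithm's implicit "*" rule prescribes when no listed rule matches. The sketch below illustrates that fallback. `suffix_with_fallback` and its short default list are assumptions for the example, not the gem's code.

```ruby
# Hypothetical helper: returns the public suffix for a host, falling back to
# the last label when nothing in the (illustrative) list matches.
def suffix_with_fallback(host, suffixes = %w[com net co.uk])
  labels = host.downcase.split(".")
  labels.each_index do |i|
    candidate = labels[i..-1].join(".") # longest candidate first
    return candidate if suffixes.include?(candidate)
  end
  labels.last
end

suffix_with_fallback("google.com") # => "com"
suffix_with_fallback("google.dev") # => "dev"
```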

Blows up when URL doesn't contain HTTP:// (Sinatra)

The parser blows up when the URL doesn't contain http://. It would be nice to make the http:// prefix optional.

Code was tested under Sinatra

Error:

  • undefined method `split' for nil:NilClass
    • file: domain_parser.rb
    • location: parse_domains_from_host
    • line: 49

Concatenating "http://" is a workaround, but it would be nice to have this within the gem itself.
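
The workaround could look roughly like this on the caller's side. `with_default_scheme` is a hypothetical helper, not part of the gem. It only helps for strings that start with a host: a bare path such as "/test?foo=bar" has no host to parse either way.

```ruby
# Hypothetical helper: prepends a scheme when the string does not start with
# one, and leaves URLs that already have a scheme untouched.
def with_default_scheme(url, scheme = "http")
  url =~ %r{\A[a-z][a-z0-9+.-]*://}i ? url : "#{scheme}://#{url}"
end

with_default_scheme("example.com/test?foo=bar")
# => "http://example.com/test?foo=bar"
with_default_scheme("http://www.example.com/test?foo=bar")
# => "http://www.example.com/test?foo=bar"
```

The result can then be passed to Domainatrix.parse as usual.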

No scheme gives error

p Domainatrix.parse("/test?foo=bar")
# => NoMethodError: undefined method `split' for nil:NilClass

p Domainatrix.parse("example.com/test?foo=bar")
# => NoMethodError: undefined method `split' for nil:NilClass

p Domainatrix.parse("www.example.com/test?foo=bar")
# => NoMethodError: undefined method `split' for nil:NilClass

p Domainatrix.parse("http://www.example.com/test?foo=bar")
#=> #<Domainatrix::Url:0x007fa4064d7810 @scheme="http", @host="www.example.com", @url="http://www.example.com/test?foo=bar", @public_suffix="com", @domain="example", @subdomain="www", @path="/test?foo=bar">

Blows up when domain has no suffix eg, 'http://www.foo/'

Hi,
It seems that the Domainatrix.parse() method fails when the domain has no suffix, e.g. 'http://www.foo/'

$ irb

require 'domainatrix'
Domainatrix.parse('http://www.foo/')

NoMethodError: undefined method `has_key?' for nil:NilClass
    from /Users/ami/.rvm/gems/ruby-1.9.2-head@rails3beta/gems/domainatrix-0.0.10/lib/domainatrix/domain_parser.rb:59:in `block in parse_domains_from_host'

Thanks,
Ami

url.full_domain ?

Suggestion:
At the moment, in order to get the full domain (minus subdomain) I have to:
url.domain + '.' + url.public_suffix

It would be nice to have one method that combines these :)
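
In the meantime, a small helper on the caller's side would do the same job. This is just a sketch: `full_domain` is the name suggested in this issue, not an existing method, and it works with any object that responds to domain and public_suffix. A Struct stands in for the parsed URL here.

```ruby
# Hypothetical helper: joins the domain and public suffix of a parsed URL.
def full_domain(url)
  [url.domain, url.public_suffix].join(".")
end

# Stand-in for a parsed URL; only the two readers used above are modeled.
parsed = Struct.new(:domain, :public_suffix).new("pauldix", "co.uk")
full_domain(parsed) # => "pauldix.co.uk"
```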

Canonical name

Issue

On the gem's home page, you write:

url = Domainatrix.parse("http://www.pauldix.net")
url.canonical # => "net.pauldix"

However, in IRB, I get the following behavior:

irb> url = Domainatrix.parse('http://www.pauldix.net')
=> #<Domainatrix::Url:0x007fd0409d5310 @scheme="http", @host="www.pauldix.net", @url="http://www.pauldix.net", @public_suffix="net", @domain="pauldix", @subdomain="www", @path="">
> url.canonical
=> "net.pauldix.www"

Is the www supposed to be a part of the canonical name?

Ruby

$ ruby -v
ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-darwin14.0.0]

Gem

$ gem list domainatrix -d

*** LOCAL GEMS ***

domainatrix (0.0.11)
    Authors: Paul Dix, Brian John
    Homepage: http://github.com/pauldix/domainatrix
    Installed at: /Users/craibuc/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0

    A cruel mistress that uses the public suffix domain list to dominate
    URLs by canonicalizing, finding the public suffix, and breaking them
    into their domain parts.

Failure to parse DAT file with Ruby 1.9.1

When using Domainatrix with Ruby 1.9.1 (p378 on OSX 10.6 i386) the following error occurs when calling Domainatrix.parse:
ArgumentError: invalid byte sequence in US-ASCII
    from /opt/lib/ruby/gems/1.9.1/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:14:in `strip'
    from /opt/lib/ruby/gems/1.9.1/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:14:in `block in read_dat_file'
    from /opt/lib/ruby/gems/1.9.1/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:13:in `each'
    from /opt/lib/ruby/gems/1.9.1/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:13:in `read_dat_file'
    from /opt/lib/ruby/gems/1.9.1/gems/domainatrix-0.0.7/lib/domainatrix/domain_parser.rb:9:in `initialize'
    from /opt/lib/ruby/gems/1.9.1/gems/domainatrix-0.0.7/lib/domainatrix.rb:11:in `new'
    from /opt/lib/ruby/gems/1.9.1/gems/domainatrix-0.0.7/lib/domainatrix.rb:11:in `parse'
    from (irb):3
    from /opt/bin/irb:12:in `<main>'

FIX:

Change domainatrix/domain_parser.rb:14 from:

line = line.strip

to:

line = line.force_encoding('utf-8').strip
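
An alternative to tagging each line is to declare the encoding when the file is opened, so that every line arrives as UTF-8 regardless of the process's default external encoding. The sketch below shows that approach with a hypothetical `read_suffix_lines` helper. It is not the gem's read_dat_file, and it assumes the list uses "//" for comment lines, as the public suffix list does.

```ruby
# Hypothetical helper: reads the non-empty, non-comment lines of a suffix
# list, decoding the file as UTF-8 no matter what the default encoding is.
def read_suffix_lines(path)
  File.foreach(path, encoding: "UTF-8").map(&:strip).reject do |line|
    line.empty? || line.start_with?("//")
  end
end
```

Called with the path to the gem's DAT file, this would return the rule lines as UTF-8 strings.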
