psl's People

Contributors

ko-zu, megabug, tomers

psl's Issues

Fails on ccTLDs

Calling privatesuffix("something.com.mx") returns "com.mx".

Sdist on PyPI

Thank you for writing publicsuffixlist!

For those of us using buildout or other non-wheel-aware installers (or at least for me) it would be convenient to have an sdist available on PyPI. Could I bother you to upload one?

`--help` fails on win

When running publicsuffixlist-download --help on Windows, I get the following error:

Error:

Traceback (most recent call last):
  File "C:\bld\publicsuffixlist_1675107463020\_test_env\Scripts\publicsuffixlist-download-script.py", line 9, in <module>
    sys.exit(updatePSL())
             ^^^^^^^^^^^
  File "C:\bld\publicsuffixlist_1675107463020\_test_env\Lib\site-packages\publicsuffixlist\update.py", line 41, in updatePSL
    os.rename(psl_file + ".swp", psl_file)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\bld\\publicsuffixlist_1675107463020\\_test_env\\Lib\\site-packages\\publicsuffixlist\\public_suffix_list.dat.swp' -> 'C:\\bld\\publicsuffixlist_1675107463020\\_test_env\\Lib\\site-packages\\publicsuffixlist\\public_suffix_list.dat'

Logs: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=650026&view=logs&j=3ff94dba-189a-527c-65e3-ce8503824159&t=35acf2bd-66a8-5b9f-4368-b52d351bfcc2
Context: conda-forge/staged-recipes#21906
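
The traceback points at os.rename, which on Windows refuses to overwrite an existing destination. One possible remedy, offered as a sketch and not as the maintainer's actual fix, is os.replace, which overwrites the destination on both POSIX and Windows. The helper name replace_file and the demo file contents are hypothetical.

```python
import os
import tempfile


def replace_file(path: str, data: bytes) -> None:
    """Write data to a temporary sibling file, then move it over path."""
    tmp_path = path + ".swp"
    with open(tmp_path, "wb") as f:
        f.write(data)
    # os.replace overwrites an existing destination on POSIX and Windows;
    # os.rename raises FileExistsError on Windows in that situation.
    os.replace(tmp_path, path)


# Usage: the second call succeeds even though the destination already exists.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "public_suffix_list.dat")
    replace_file(target, b"old list\n")
    replace_file(target, b"new list\n")
    with open(target, "rb") as f:
        result = f.read()

print(result)
```

This does not address why --help triggers a download in the first place; it only covers the rename failure shown in the traceback.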

Tag the source

Could you please tag the source? This allows distributions to get the complete source from GitHub if they want.

Thanks

issue/inconsistent behavior for all *. rules

If there is a rule like:
*.abc.com
I would expect that, given
substuf.def.abc.com, the public suffix should be def.abc.com.

from publicsuffixlist import PublicSuffixList

# RULES TESTED:
# *.awdev.ca
# *.advisor.ws
#
# *.compute.amazonaws.com
# *.compute-1.amazonaws.com
# *.compute.amazonaws.com.cn
#
# *.elb.amazonaws.com
# *.elb.amazonaws.com.cn

psl = PublicSuffixList()
input = [
    'test.awdev.ca',
    'test.advisor.ws',
    
    'test.compute.amazonaws.com',
    'test.compute-1.amazonaws.com',
    'test.compute.amazonaws.com.cn',
    
    'test.elb.amazonaws.com',
    'test.amazonaws.com.cn',
    
    # add another level and it gets weird
    'sub.test.awdev.ca',
    'sub.test.advisor.ws',
    
    'sub.test.compute.amazonaws.com',
    'sub.test.compute-1.amazonaws.com',
    'sub.test.compute.amazonaws.com.cn',

    'sub.test.elb.amazonaws.com',
    'sub.test.amazonaws.com.cn',
]


output = [(i, psl.privatesuffix(i)) for i in input]

for t in output:
    print(f'{t[0]} -> {t[1]}')

Output from the run:

test.awdev.ca -> None
test.advisor.ws -> None
test.compute.amazonaws.com -> None
test.compute-1.amazonaws.com -> None
test.compute.amazonaws.com.cn -> None
test.elb.amazonaws.com -> None
test.amazonaws.com.cn -> amazonaws.com.cn
sub.test.awdev.ca -> sub.test.awdev.ca
sub.test.advisor.ws -> sub.test.advisor.ws
sub.test.compute.amazonaws.com -> sub.test.compute.amazonaws.com
sub.test.compute-1.amazonaws.com -> sub.test.compute-1.amazonaws.com
sub.test.compute.amazonaws.com.cn -> sub.test.compute.amazonaws.com.cn
sub.test.elb.amazonaws.com -> sub.test.elb.amazonaws.com
sub.test.amazonaws.com.cn -> amazonaws.com.cn

I would have expected the first set to return the domains unchanged and the second set to return the domain minus the "sub." part.

In either case, the behavior is inconsistent for two reasons:

  1. test.amazonaws.com.cn -> amazonaws.com.cn: the return value was not None like all the others.
  2. Why do all the domains with "sub." come back unchanged? Again, sub.test.amazonaws.com.cn -> amazonaws.com.cn behaves differently.
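
For comparison, here is a toy version of the matching algorithm published with the Public Suffix List (the longest matching rule wins, and `*` matches exactly one label). It is a sketch under stated assumptions and not the publicsuffixlist implementation: it skips exception rules ("!" prefix), and the function names and the four-rule list are chosen only for illustration. Under that algorithm a name matching `*.awdev.ca` is itself a public suffix, so it has no private suffix, and the two amazonaws.com.cn inputs above lack the `elb.` or `compute.` label, so only `com.cn` applies to them.

```python
def public_suffix(domain, rules):
    """Return the public suffix of domain under the longest matching rule."""
    labels = domain.lower().split(".")
    best = 1  # default rule "*": the last label alone is a public suffix
    for rule in rules:
        rule_labels = rule.split(".")
        n = len(rule_labels)
        if n > len(labels):
            continue
        tail = labels[-n:]
        # "*" matches any single label; other labels must match exactly.
        if all(r == "*" or r == l for r, l in zip(rule_labels, tail)):
            best = max(best, n)
    return ".".join(labels[-best:])


def private_suffix(domain, rules):
    """Return the public suffix plus one label, or None if there is none."""
    labels = domain.lower().split(".")
    n = len(public_suffix(domain, rules).split("."))
    if len(labels) <= n:
        return None
    return ".".join(labels[-(n + 1):])


# Illustrative subset of rules, not the full list.
RULES = ["ca", "*.awdev.ca", "cn", "com.cn"]

for name in ["test.awdev.ca", "sub.test.awdev.ca",
             "test.amazonaws.com.cn", "sub.test.amazonaws.com.cn"]:
    print(f"{name} -> {private_suffix(name, RULES)}")
```

With this rule subset the sketch prints the same four results as the run above, which suggests the reported output may be what the algorithm specifies.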

publicsuffix of cloudfront.net

cloudfront.net is a public suffix and belongs to Amazon.
But before it was listed as a suffix, Amazon already held cloudfront as an ordinary domain under the .net TLD.
So it is confusing to determine the root domain of *.cloudfront.net names.

examples:

In [164]: ps.privatesuffix('d2os3n5ieuk9g5.cloudfront.net')
Out[164]: 'd2os3n5ieuk9g5.cloudfront.net'

In [165]: ps.privatesuffix('a286330aad4e096be6cdda229527774f4.profile.tlv50-c1.cloudfront.net')
Out[165]: 'tlv50-c1.cloudfront.net'

We know every root domain has an NS record, so let's check.

dig d2os3n5ieuk9g5.cloudfront.net NS

; <<>> DiG 9.10.6 <<>> d2os3n5ieuk9g5.cloudfront.net NS
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1735
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 3

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;d2os3n5ieuk9g5.cloudfront.net.	IN	NS

;; ANSWER SECTION:
d2os3n5ieuk9g5.cloudfront.net. 830 IN	NS	ns-1961.awsdns-53.co.uk.
d2os3n5ieuk9g5.cloudfront.net. 830 IN	NS	ns-1525.awsdns-62.org.
d2os3n5ieuk9g5.cloudfront.net. 830 IN	NS	ns-765.awsdns-31.net.
d2os3n5ieuk9g5.cloudfront.net. 830 IN	NS	ns-224.awsdns-28.com.

;; ADDITIONAL SECTION:
ns-1961.awsdns-53.co.uk. 2488	IN	A	205.251.199.169
ns-1525.awsdns-62.org.	8341	IN	A	205.251.197.245

;; Query time: 36 msec
;; SERVER: 10.95.44.53#53(10.95.44.53)
;; WHEN: Wed Sep 09 13:12:46 CST 2020
;; MSG SIZE  rcvd: 227

dig tlv50-c1.cloudfront.net NS

; <<>> DiG 9.10.6 <<>> tlv50-c1.cloudfront.net NS
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 868
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;tlv50-c1.cloudfront.net.	IN	NS

;; AUTHORITY SECTION:
cloudfront.net.		59	IN	SOA	ns-418.awsdns-52.com. hostmaster.cloudfront.net. 1377556270 16384 2048 1048576 60

;; Query time: 1018 msec
;; SERVER: 10.95.44.53#53(10.95.44.53)
;; WHEN: Wed Sep 09 13:13:56 CST 2020
;; MSG SIZE  rcvd: 119

nslookup a286330aad4e096be6cdda229527774f4.profile.tlv50-c1.cloudfront.net 8.8.8.8
Server:		8.8.8.8
Address:	8.8.8.8#53

Non-authoritative answer:
Name:	a286330aad4e096be6cdda229527774f4.profile.tlv50-c1.cloudfront.net
Address: 13.226.6.197
Name:	a286330aad4e096be6cdda229527774f4.profile.tlv50-c1.cloudfront.net
Address: 13.226.6.231
Name:	a286330aad4e096be6cdda229527774f4.profile.tlv50-c1.cloudfront.net
Address: 13.226.6.22
Name:	a286330aad4e096be6cdda229527774f4.profile.tlv50-c1.cloudfront.net
Address: 13.226.6.45

tlv50-c1.cloudfront.net has no NS record, but a286330aad4e096be6cdda229527774f4.profile.tlv50-c1.cloudfront.net has A records,
so the root domain of a286330aad4e096be6cdda229527774f4.profile.tlv50-c1.cloudfront.net is cloudfront.net.

Wrong timestamp parsed from last-modified header

I reside in UTC+03. When I use the update.py script, the Last-Modified timestamp is parsed with a 3-hour offset:

>>> import time
>>> from email.utils import parsedate
>>> lastmod = "Thu, 28 May 2020 16:40:36 GMT"
>>> parsedate(lastmod)
(2020, 5, 28, 16, 40, 36, 0, 1, -1)  # <-- ok
>>> time.mktime(parsedate(lastmod))
1590673236.0  # <-- not ok! 3 hours offset (caused by my TZ)

# 1590673236 is "Thursday, May 28, 2020 1:40:36 PM GMT" (notice the 3 hours difference, caused by my TZ)
# should be 1590684036

A resolution for this is to replace time.mktime() with calendar.timegm().
Reference: https://docs.python.org/3/library/time.html#index-4
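
A minimal sketch of the suggested replacement, using only the standard library. calendar.timegm interprets the time tuple as UTC, so the result does not depend on the local timezone. The function name http_date_to_epoch is hypothetical and is not part of update.py.

```python
import calendar
from email.utils import parsedate


def http_date_to_epoch(value: str) -> int:
    """Convert an HTTP date header (always GMT) to a Unix timestamp."""
    parsed = parsedate(value)
    if parsed is None:
        raise ValueError(f"unparseable date: {value!r}")
    # timegm treats the tuple as UTC, unlike time.mktime (local time).
    return calendar.timegm(parsed)


lastmod = "Thu, 28 May 2020 16:40:36 GMT"
print(http_date_to_epoch(lastmod))  # 1590684036
```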

is_public is broken for upper case input

psl.is_public() is broken for uppercase input with two or more labels.

psl.is_public("Jp") # => True
psl.is_public("Co.jp") # => False

A TLD-only domain returns the right value only by accident. Related to #20.
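
Until a fix lands, a caller-side workaround is to lowercase the input before the lookup. The sketch below uses a stand-in lookup (is_public_stub and KNOWN_SUFFIXES are hypothetical and are not the library's code) just to show the wrapper pattern.

```python
# Stand-in for a case-sensitive suffix lookup; NOT the library's implementation.
KNOWN_SUFFIXES = {"jp", "co.jp"}


def is_public_stub(domain: str) -> bool:
    return domain in KNOWN_SUFFIXES


def with_lowercased_input(lookup):
    """Wrap a lookup so it always receives a lowercased domain."""
    def wrapper(domain: str):
        return lookup(domain.lower())
    return wrapper


is_public = with_lowercased_input(is_public_stub)

print(is_public_stub("Co.jp"))  # False: the raw lookup is case-sensitive
print(is_public("Co.jp"))       # True: input is lowercased first
```

The same wrapper could in principle be applied to a bound method such as psl.is_public, assuming it takes the domain as its only required argument.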

Dashes cause a parsing error

Try this example, as I believe there is a bug with a dash. Note that all I did was change "compute-1" to "compute1" and then it works as expected.

>>> from publicsuffixlist import PublicSuffixList
>>> psl = PublicSuffixList()
>>> psl.privatesuffix('ec2-107-21-74-29.compute-1.amazonaws.com')
>>> psl.publicsuffix('ec2-107-21-74-29.compute-1.amazonaws.com')
'ec2-107-21-74-29.compute-1.amazonaws.com'
>>> psl.publicsuffix('ec2-107-21-74-29.compute1.amazonaws.com')
'com'
>>> psl.privatesuffix('ec2-107-21-74-29.compute1.amazonaws.com')
'amazonaws.com'
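
The dash may not be the cause. Another report in this tracker lists `*.compute-1.amazonaws.com` among the PSL rules, and a wildcard rule makes every name directly under compute-1.amazonaws.com a public suffix, which would explain the None from privatesuffix(). No such rule exists for "compute1". A small check of wildcard matching, as a sketch (matches_rule is a hypothetical helper, not a library function):

```python
def matches_rule(domain: str, rule: str) -> bool:
    """True if the rightmost labels of domain match rule ('*' = any one label)."""
    d = domain.lower().split(".")
    r = rule.split(".")
    if len(r) > len(d):
        return False
    return all(x == "*" or x == y for x, y in zip(r, d[-len(r):]))


rule = "*.compute-1.amazonaws.com"
print(matches_rule("ec2-107-21-74-29.compute-1.amazonaws.com", rule))  # True
print(matches_rule("ec2-107-21-74-29.compute1.amazonaws.com", rule))   # False
```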

Uppercase domain causes inconsistent result for TLD

publicsuffix() in 0.7.14 returns a non-lowercased suffix for TLD-only input.

psl = publicsuffixlist.PublicSuffixList()
psl.publicsuffix("example.COM") # => "com"
psl.publicsuffix("COM") # => "COM"

The shortcut code path for a TLD-only domain should return the lowercased form for consistency.

Public Suffix data incorrect?

Hello,

Thank you for this project. If I understand the purpose of these methods correctly, I believe the parser is pulling incorrect information. It's my understanding that eTLDs (effective top-level domains), under which an organization can register a private domain, such as "com" and "com.uk", are the "public suffixes". Domains that someone registers under them, such as "google.com" and "google.com.uk", would be "private suffixes". However, that isn't what the tool produces.

>>> PublicSuffixList().is_private("com.uk")   # <- should be False
True
>>> PublicSuffixList().is_private("com")
False

Furthermore, if I try to retrieve the public and private suffixes, I get incorrect data as well.

>>> PublicSuffixList().publicsuffix("google.com.uk")   # <- should be com.uk
'uk'
>>> PublicSuffixList().privatesuffix("google.com.uk")  # <- should be google.com.uk
'com.uk'
>>> PublicSuffixList().privatesuffix("google.com")     # <- should be google.com and is correct
'google.com'
