meeb / whoisit

A Python library to RDAP WHOIS-like services for internet resources such as ASNs, IPs, CIDRs and domains

License: BSD 3-Clause "New" or "Revised" License

Makefile 0.13% Python 99.87%
python whois rdap

whoisit's Introduction

whoisit

A Python client to RDAP WHOIS-like services for internet resources (IPs, ASNs, domains, etc.). whoisit is a simple library that makes requests to the "new" RDAP (Registration Data Access Protocol) query services for internet resource information. These services started to appear in 2017 and have become more widespread since 2020.

whoisit is designed to abstract over RDAP. RDAP is a basic HTTP and JSON based protocol, and a single query can be implemented in one line of Python with requests. However, the bootstrapping (working out which RDAP service to query for which item) and extracting useful information from the RDAP responses are extensive enough that a library like this is useful.
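To illustrate how small the protocol itself is, here is a minimal sketch of building an RDAP lookup URL by hand. The helper name rdap_url and the RDAP_PATHS mapping are made up for this example and are not part of whoisit; the path segments (autnum, domain, ip, entity) follow the standard RDAP query format. The base URL still has to come from somewhere, which is the bootstrapping problem whoisit solves.

```python
from urllib.parse import quote

# Lookup path segments used by the RDAP query format
RDAP_PATHS = {"asn": "autnum", "domain": "domain", "ip": "ip", "entity": "entity"}


def rdap_url(base_url, kind, value):
    """Build an RDAP lookup URL for an already known RDAP service base URL."""
    segment = RDAP_PATHS[kind]
    # Keep '/' and ':' unescaped so CIDRs and IPv6 addresses stay readable
    return f"{base_url.rstrip('/')}/{segment}/{quote(str(value), safe='/:')}"


url = rdap_url("https://rdap.apnic.net/", "ip", "1.1.1.0/24")
print(url)

# With requests installed, the "single line" query would then be roughly:
#   data = requests.get(url).json()
```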

Installation

whoisit is pure Python and only has a dependency on the requests and dateutil libraries. You can install whoisit via pip:

$ pip install whoisit

Any modern version of Python3 will be compatible.

Usage

whoisit supports the 4 main types of lookups supported by RDAP services. These are:

  • ASNs (autonomous system numbers) known as autnum objects
  • DNS registrations known as domain objects - only some TLDs are supported
  • IPv4 and IPv6 addresses and CIDRs / prefixes known as ip objects
  • Entities (People, organisations etc. by ENTITY-HANDLES) known as entity objects

whoisit returns parsed RDAP formatted JSON as (mostly) flat dictionaries by default.

Basic examples:

import whoisit

whoisit.bootstrap()

results = whoisit.asn(1234)
print(results['name'])

results = whoisit.domain('example.com')
print(results['nameservers'])

results = whoisit.ip('1.2.3.4')
print(results['name'])

results = whoisit.ip('1.2.3.0/24')
print(results['name'])

results = whoisit.ip('2404:1234:1234:1234:1234:1234:1234:1234')
print(results['name'])

results = whoisit.ip('2404:1234::/32')
print(results['name'])

results = whoisit.entity('ARIN-CHA-1')
print(results['last_changed_date'])

Raw response data

In each case results will be a dictionary containing the most useful information for each request type. If the data you want is not in the response you can request the raw, unparsed and large RDAP JSON data by adding the raw=True argument to the request, for example:

results = whoisit.domain('example.com', raw=True)
# 'results' is now the full, raw response from the RDAP service

If you accidentally query the wrong RDAP endpoint your query should still work. For example, if you query ARIN for information on the IP address 1.1.1.1 it will redirect you to APNIC (where 1.1.1.1 is allocated) automatically.

Some resources, most notably entity handles, do not redirect and do not have obvious namespaces linked to particular registries. For these queries whoisit will attempt to guess the RDAP service to query by examining the name for a prefix or postfix; for example, many RIPE entities are named RIPE-SOMETHING. If your entity does not have an obvious prefix or postfix like ARIN-* or *-AP you will need to tell whoisit which registry to make the request to by specifying the rir=name argument. The rir argument stands for "Regional Internet Registry". For example:

# This will work OK because the entity is prefixed with an obvious RIR name
results = whoisit.entity('RIPE-NCC-MNT')

# This will cause a QueryError to be raised because ARIN returns a 404 for RIPE-NCC-MNT
results = whoisit.entity('RIPE-NCC-MNT', rir='arin')

# This will cause an UnsupportedError to be raised because we have no way to detect
# which RDAP service to query as the entity has no RIR prefix or postfix
results = whoisit.entity('AS5089-MNT')

# This will work OK because the entity is registered at RIPE
results = whoisit.entity('AS5089-MNT', rir='ripe')
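The prefix and postfix guessing described above can be sketched roughly as follows. This is an illustration only and not whoisit's actual implementation: the guess_rir function and the RIR_TOKENS table are hypothetical, and apart from the ARIN, RIPE and AP markers mentioned in the text the tokens are assumptions.

```python
# Hypothetical marker tokens mapped to RIR short names (illustrative only)
RIR_TOKENS = {
    "ARIN": "arin",
    "RIPE": "ripe",
    "AP": "apnic",
    "APNIC": "apnic",
    "LACNIC": "lacnic",
    "AFRINIC": "afrinic",
}


def guess_rir(handle):
    """Return an RIR short name if the handle starts or ends with a known token."""
    parts = handle.upper().split("-")
    # Check the first and last dash-separated parts of the handle
    for token in (parts[0], parts[-1]):
        if token in RIR_TOKENS:
            return RIR_TOKENS[token]
    return None  # No obvious marker, the caller would need to pass rir=...


for handle in ("RIPE-NCC-MNT", "ZG39-ARIN", "AR302-AP", "AS5089-MNT"):
    print(handle, "->", guess_rir(handle))
```

A handle for which this kind of guess returns nothing corresponds to the UnsupportedError case shown above, where you have to supply the rir argument yourself.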

Weaken SSL ciphers

Some RDAP servers do not have particularly secure SSL implementations. As RDAP returns read-only, public information, it may be acceptable to downgrade the security of your whoisit requests in order to successfully return data.

You can use the allow_insecure_ssl=True argument to your queries to enable this.

For example (as of 2021-07-25):

# This will result in an SSL error
results = whoisit.domain('nic.work')
# ... SSLError(SSLError(1, '[SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:1129)')))

# This will work
results = whoisit.domain('nic.work', allow_insecure_ssl=True)

Note that with allow_insecure_ssl=True the upstream RDAP server certificate is still validated; the option only permits weaker SSL ciphers during the handshake. You should only use allow_insecure_ssl=True if your request first fails with an SSL cipher or handshake error.
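To show what "weaker ciphers, certificate still validated" can look like at the ssl module level, here is a minimal sketch using only the standard library. It is not whoisit's internal code: the function name weak_cipher_context is made up, and the cipher string "DEFAULT:@SECLEVEL=1" is an assumption that requires Python to be linked against an OpenSSL build that supports security levels.

```python
import ssl


def weak_cipher_context():
    """Create a TLS context that accepts weaker handshakes but still verifies certificates."""
    ctx = ssl.create_default_context()
    # Lower the OpenSSL security level (assumption: OpenSSL 1.1.0 or newer),
    # which may allow servers that offer small DH keys
    ctx.set_ciphers("DEFAULT:@SECLEVEL=1")
    return ctx


ctx = weak_cipher_context()
# Certificate and hostname verification are untouched by the cipher change
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

The design point is that only the cipher selection is relaxed, while verify_mode and check_hostname keep their secure defaults.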

Domain lookup subrequests

Many RDAP endpoints for domains supply a related RDAP server run by a registry which may contain more information about the domain. whoisit by default will attempt to make a subrequest to the related RDAP endpoint if available to obtain more detailed results. Occasionally, the related RDAP endpoints may fail or return data in an invalid format. You can disable related RDAP endpoint subrequests by passing the follow_related=False argument to whoisit.domain(...). For example (as of 2024-04-30):

results = whoisit.domain('example.com', follow_related=False)

If you encounter a parsing error when using related RDAP endpoint data you can also skip the parsing by using raw=True but continue to use related RDAP data. whoisit will attempt to handle the RDAP data returned but there will be occasions when RDAP results change beyond what whoisit can parse. When using raw data you will need to parse the data yourself.

You can also write a fallback:

try:
    results = whoisit.domain('example.com')
    # Assume an error parsing the related RDAP data occurs here
except Exception as e:
    print(f'Failed to look up domain, trying fallback: {e}')
    results = whoisit.domain('example.com', follow_related=False)
    # Likely to succeed if the related RDAP data was the issue

Bootstrapping

whoisit needs to know which RDAP service to query for a resource. This information is provided by the IANA as bootstrapping information. Bootstrapping data simply says things like "this CIDR is allocated to ARIN, this CIDR is allocated to RIPE" and so on for all resources. The bootstrap data means you should be directly querying the correct RDAP server for your request at all times. You should cache the bootstrap information locally if you plan to make more than a single request otherwise you'll make additional requests to the IANA every time you run a query. Example bootstrap information caching:

import whoisit

print(whoisit.is_bootstrapped())  # -> False
whoisit.bootstrap()               # Slow, makes several HTTP requests to the IANA
print(whoisit.is_bootstrapped())  # -> True

# bootstrap_info returned here is a string of JSON serialised bootstrap information
# You can store it in a memory cache or write it to disk for a few days
bootstrap_info = whoisit.save_bootstrap_data()

# Clear bootstrapping data
whoisit.clear_bootstrapping()

# Later, you can do
print(whoisit.is_bootstrapped())  # -> False
if not whoisit.is_bootstrapped():
    whoisit.load_bootstrap_data(bootstrap_info)  # Fast, no HTTP requests made
print(whoisit.is_bootstrapped())  # -> True

# For convenience internally whoisit stores a timestamp of when the bootstrap data was
# last updated and has a "is older than" helper method
if whoisit.bootstrap_is_older_than(days=3):
    # Bootstrap data was last updated over 3 days ago, refresh it
    whoisit.clear_bootstrapping()
    whoisit.bootstrap()
    bootstrap_info = whoisit.save_bootstrap_data()  # and save it to update your cache

A reasonable suggested way to handle bootstrapping data would be to use Memcached or Redis, for example:

import whoisit
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

bootstrap_info = r.get('whoisit_bootstrap_info')
if bootstrap_info:
    whoisit.load_bootstrap_data(bootstrap_info)
else:
    whoisit.bootstrap()
    bootstrap_info = whoisit.save_bootstrap_data()
    expire_in_3_days = 60 * 60 * 24 * 3
    r.set('whoisit_bootstrap_info', bootstrap_info, ex=expire_in_3_days)

# Send queries as normal once bootstrapped
whoisit.asn(12345)
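If you do not run Memcached or Redis, the same idea works with a plain file on disk. The following is a sketch under stated assumptions: load_or_refresh is a hypothetical helper (not part of whoisit), the file modification time stands in for the age of the bootstrap data, and the whoisit calls are shown only in comments so the block runs without network access.

```python
import os
import time


def load_or_refresh(path, max_age_days, refresh):
    """Return cached text from path, calling refresh() if the file is missing or stale."""
    max_age = max_age_days * 24 * 60 * 60
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age:
        with open(path, encoding="utf-8") as f:
            return f.read()
    # Cache miss or stale cache: fetch fresh data and store it
    data = refresh()
    with open(path, "w", encoding="utf-8") as f:
        f.write(data)
    return data


# Usage with whoisit (not executed here):
#   def refresh():
#       whoisit.bootstrap()                   # slow, contacts the IANA
#       return whoisit.save_bootstrap_data()  # JSON string
#
#   whoisit.load_bootstrap_data(load_or_refresh("bootstrap.json", 3, refresh))
```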

Some services, most notably TLDs, do have RDAP servers but these may not be set properly in the IANA bootstrap data. whoisit maintains a record of these and can patch the IANA data to allow more TLDs to be queried. You can enable this with the overrides=True parameter when loading bootstrap data:

whoisit.bootstrap(overrides=True)

or

whoisit.load_bootstrap_data(bootstrap_info, overrides=True)

Important: when using the overrides you may receive non-standard data (data that is not in the same format as officially listed IANA data) and you may not receive a copy of any required terms of service or terms of use. You will have to manually verify data returned by overridden endpoints.

Insecure (HTTP) RDAP endpoints

Some RDAP servers are only available over HTTP and not HTTPS. This is disabled by default. When you bootstrap whoisit a debug notice will be emitted for any RDAP endpoint that is not loaded because it is insecure. For example:

# Enable debug logging
import os
os.environ['DEBUG'] = 'true'
# Load and bootstrap whoisit
import whoisit
# > [datetime] bootstrap [DEBUG] Cleared bootstrap data
whoisit.bootstrap()
# > ... debug logs ...
# > [datetime] bootstrap [DEBUG] No valid RDAP service URLs could be parsed
#              from: ['http://cctld.uz:9000/'] (insecure scheme,
#              try whoisit.bootstrap(allow_insecure=True))
# > ... debug logs ...
# > [datetime] bootstrap [DEBUG] Bootstrapped

This line informs you that an RDAP endpoint has been skipped because it is only available over HTTP. You can opt-in to allow insecure endpoints by calling the bootstrap methods bootstrap() and load_bootstrap_data() with the optional allow_insecure=True argument. For example:

# Bootstrap with allowing insecure endpoints
whoisit.bootstrap(allow_insecure=True)

or

# Load saved bootstrap data with allowing insecure endpoints
whoisit.load_bootstrap_data(bootstrap_info, allow_insecure=True)
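The scheme check behind this behaviour can be pictured with a short sketch. This is an illustration of the idea, not whoisit's actual validation code; filter_rdap_urls is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def filter_rdap_urls(urls, allow_insecure=False):
    """Keep only URLs whose scheme is permitted (https, plus http if opted in)."""
    allowed = {"https", "http"} if allow_insecure else {"https"}
    kept = []
    for url in urls:
        parts = urlsplit(url)
        # Require an allowed scheme and a hostname; a port is fine either way
        if parts.scheme in allowed and parts.hostname:
            kept.append(url)
    return kept


urls = ["http://cctld.uz:9000/"]
print(filter_rdap_urls(urls))                       # HTTP-only endpoint is skipped
print(filter_rdap_urls(urls, allow_insecure=True))  # kept once insecure is allowed
```

In this sketch the endpoint from the debug log above is dropped because of its http scheme, not because of the port number in the URL.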

Response data

By default whoisit returns a parsed summary of the most useful information. This information is simplified, which means that some information is lost from the raw, original data. For example, whoisit doesn't return the date that nameservers were last updated. If you need more information than whoisit returns by default remember to add raw=True to your query and parse the RDAP response yourself.

Data from whoisit is returned, where possible, as rich data types such as datetime, IPv4Network and IPv6Network objects.

The following values are returned for every successful response:

response = {
    'handle': str,               # Entity handle for the object, always set
    'parent_handle': str,        # Parent entity handle for the object
    'name': str,                 # Name of the object
    'whois_server': str,         # WHOIS server hostname object data can be found on
    'type': str,                 # Object type, such as autnum or domain
    'terms_of_service_url': str, # URL to the terms of service for using the object data
    'copyright_notice': str,     # Copyright notice for the object data
    'description': list,         # List of text lines that describe the object
    'last_changed_date': datetime or None,  # Date and time the object was last updated
    'registration_date': datetime or None,  # Date and time the object was registered
    'expiration_date': datetime or None,    # Date and time the object expires
    'rir': str,                  # Short name of the RIR for the object, such as 'arin'
    'url': str,                  # URL to the RDAP query which was made for this request
    'entities': dict,            # A dict of entities linked to the object
}

The entities dictionary has the following format, note there may be multiple entities for each role:

response['entities']['some_role'][] = { # Role names are strings, like 'registrant'
    'email': str,          # Email address of the entity
    'handle': str,         # Handle of the entity
    'name': str,           # Name of the entity
    'rir': str,            # Short name of the RIR where the entity is registered
    'type': str,           # Type of the entity, usually 'entity'
    'url': str,            # URL to an RDAP service to query this entity 
    'whois_server': str,   # WHOIS server hostname entity data can be found on
}
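As a usage sketch for this structure, the following collects the email addresses listed under each role. The function name emails_by_role and the sample data (including the example.net addresses) are made up for illustration; only the shape of the entities dictionary is taken from the description above.

```python
def emails_by_role(response):
    """Map each role name to a sorted list of the distinct emails of its entities."""
    result = {}
    for role, entities in response.get("entities", {}).items():
        # Several entities can share a role, and an email may be empty or missing
        result[role] = sorted({e["email"] for e in entities if e.get("email")})
    return result


# Made-up sample in the documented shape
sample = {
    "entities": {
        "abuse": [{"handle": "H1", "email": "abuse@example.net"}],
        "technical": [
            {"handle": "H2", "email": "noc@example.net"},
            {"handle": "H3", "email": ""},
        ],
    }
}
print(emails_by_role(sample))
```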

In addition to the default data listed above, which is common to all responses, each request type has additional fields in its response. These are:

Additional ASN response data

# ASN response data includes all shared general response fields above and also:
response = {
    'asn_range': list,       # A list of the start and end range for an AS allocation
                             # For example, [123,134] or [123,123]
}

Additional domain response data

# Domain response data includes all shared general response fields above and also:
response = {
    'unicode_name': str,     # Domain name in unicode if available
    'nameservers': list,     # List of name servers for the domain as strings
    'status': list,          # List of the domain states as strings
}

Additional IP response data

# IP response data includes all shared general response fields above and also:
response = {
    'country': str,          # Two letter country code for the IP block
    'ip_version': int,       # 4 or 6 to denote the IP version
    'assignment_type': str,  # Assignment type, such as 'assigned portable'
    'network': IPvXNetwork,  # A IPv4Network or IPv6Network object for the prefix
}

Additional entity response data

# Entity response data includes all shared general response fields above and also:
response = {
    'email': str,            # If the entity has a root vcard, the email address
}

Full response example

A full example response for an IP query for the IPv4 address 1.1.1.1:

import whoisit
whoisit.bootstrap()
response = whoisit.ip('1.1.1.1')
print(response)
{
    'handle': '1.1.1.0 - 1.1.1.255',
    'parent_handle': '',
    'name': 'APNIC-LABS',
    'whois_server': 'whois.apnic.net',
    'type': 'ip network',
    'terms_of_service_url': 'http://www.apnic.net/db/dbcopyright.html',
    'copyright_notice': '',
    'description': [
        'APNIC and Cloudflare DNS Resolver project',
        'Routed globally by AS13335/Cloudflare',
        'Research prefix for APNIC Labs'
    ],
    'last_changed_date': datetime.datetime(2020, 7, 15, 13, 10, 57, tzinfo=tzutc()),
    'registration_date': None,
    'expiration_date': None,
    'url': 'https://rdap.apnic.net/ip/1.1.1.0/24',
    'rir': 'apnic',
    'entities': {
        'abuse': [
            {
                'handle': 'IRT-APNICRANDNET-AU',
                'url': 'https://rdap.apnic.net/entity/IRT-APNICRANDNET-AU',
                'type': 'entity',
                'name': 'IRT-APNICRANDNET-AU',
                'email': '[email protected]',
                'rir': 'apnic'
            }
        ],
        'administrative': [
            {
                'handle': 'AR302-AP',
                'url': 'https://rdap.apnic.net/entity/AR302-AP',
                'type': 'entity',
                'name': 'APNIC RESEARCH',
                'email': '[email protected]',
                'rir': 'apnic'
            }
        ],
        'technical': [
            {
                'handle': 'AR302-AP',
                'url': 'https://rdap.apnic.net/entity/AR302-AP',
                'type': 'entity',
                'name': 'APNIC RESEARCH',
                'email': '[email protected]',
                'rir': 'apnic'
            }
        ]
    },
    'country': 'AU',
    'ip_version': 4,
    'assignment_type': 'assigned portable',
    'network': IPv4Network('1.1.1.0/24')
}
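Because values such as network and last_changed_date come back as rich Python objects, they can be used directly without further parsing. The sketch below rebuilds two fields from the example response by hand (using timezone.utc in place of dateutil's tzutc) so that it runs without network access.

```python
from datetime import datetime, timezone
from ipaddress import IPv4Network, ip_address

# Two fields copied from the example response above
response = {
    "network": IPv4Network("1.1.1.0/24"),
    "last_changed_date": datetime(2020, 7, 15, 13, 10, 57, tzinfo=timezone.utc),
}

# Membership tests and sizes come straight from the ipaddress module
print(ip_address("1.1.1.1") in response["network"])
print(response["network"].num_addresses)

# Timezone-aware datetimes can be subtracted from the current UTC time
age = datetime.now(timezone.utc) - response["last_changed_date"]
print(f"Last changed {age.days} days ago")
```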

Full API synopsis

whoisit.is_bootstrapped() -> bool

Returns boolean True or False if your whoisit instance is bootstrapped or not.

whoisit.bootstrap(overrides=bool, allow_insecure=bool) -> bool

Bootstraps your whoisit instance with remote IANA bootstrap information. Returns True or raises a whoisit.errors.BootstrapError exception if it fails. This method makes HTTP requests to the IANA.

whoisit.clear_bootstrapping() -> bool

Clears any stored bootstrap information. Always returns boolean True.

whoisit.save_bootstrap_data() -> str

Returns a string of JSON serialised bootstrap information if any is loaded. If no bootstrap information is loaded a whoisit.errors.BootstrapError will be raised.

whoisit.load_bootstrap_data(data=str, overrides=bool, allow_insecure=bool) -> bool

Loads a string of JSON serialised bootstrap data as returned by save_bootstrap_data(). Returns True if the data is loaded or raises a whoisit.errors.BootstrapError if loading fails.

whoisit.bootstrap_is_older_than(days=int) -> bool

Tests if the loaded bootstrap data is older than the specified number of days as an integer. Returns True or False. If no bootstrap information is loaded a whoisit.errors.BootstrapError exception will be raised.

whoisit.asn(asn=int, rir=str, raw=bool, allow_insecure_ssl=bool) -> dict

Queries a remote RDAP server for information about the specified AS number. AS number must be an integer. Returns a dict of information. If raw=True is passed a large dict of the raw RDAP response will be returned. If the query fails a whoisit.errors.QueryError exception will be raised. If no bootstrap data is loaded a whoisit.errors.BootstrapError exception will be raised. If allow_insecure_ssl=True is passed the RDAP queries will allow weaker SSL handshakes. Examples:

whoisit.asn(12345)
whoisit.asn(12345, rir='arin')
whoisit.asn(12345, raw=True)
whoisit.asn(12345, rir='arin', raw=True)
whoisit.asn(12345, allow_insecure_ssl=True)

whoisit.domain(domain=str, raw=bool, allow_insecure_ssl=bool, follow_related=bool) -> dict

Queries a remote RDAP server for information about the specified domain name. The domain name must be a string and in a valid domain name "something.tld" style format. Returns a dict of information. If raw=True is passed a large dict of the raw RDAP response will be returned. If the query fails a whoisit.errors.QueryError exception will be raised. If no bootstrap data is loaded a whoisit.errors.BootstrapError exception will be raised. If the TLD is unsupported a whoisit.errors.UnsupportedError exception will be raised. If allow_insecure_ssl=True is passed the RDAP queries will allow weaker SSL handshakes. If follow_related=False is passed, subrequests to related RDAP endpoints are skipped. Note that not all TLDs are supported, only some have RDAP services! Examples:

whoisit.domain('example.com')
whoisit.domain('example.com', raw=True)
whoisit.domain('example.com', allow_insecure_ssl=True)

whoisit.ip(ip="1.1.1.1", rir=str, raw=bool, allow_insecure_ssl=bool) -> dict

Queries a remote RDAP server for information about the specified IP address or CIDR. The IP address or CIDR must be a string in the correct IP address or CIDR format, or any one of IPv4Address, IPv4Network, IPv6Address or IPv6Network objects. Returns a dict of information. If raw=True is passed a large dict of the raw RDAP response will be returned. If the query fails a whoisit.errors.QueryError exception will be raised. If no bootstrap data is loaded a whoisit.errors.BootstrapError exception will be raised. If allow_insecure_ssl=True is passed the RDAP queries will allow weaker SSL handshakes. Examples:

whoisit.ip('1.1.1.1')
whoisit.ip('1.1.1.1', rir='apnic')
whoisit.ip('1.1.1.1', raw=True, rir='apnic')
whoisit.ip('1.1.1.0/24')
whoisit.ip(IPv4Address('1.1.1.1'))
whoisit.ip(IPv4Network('1.1.1.0/24'))
whoisit.ip(IPv6Address('2001:4860:4860::8888'))
whoisit.ip(IPv6Network('2001:4860::/32'), rir='arin')
whoisit.ip('1.1.1.1', allow_insecure_ssl=True)

whoisit.entity(entity=str, rir=str, raw=bool, allow_insecure_ssl=bool) -> dict

Queries a remote RDAP server for information about the specified entity name. The entity name must be a string and in the correct entity format. Returns a dict of information. If raw=True is passed a large dict of the raw RDAP response will be returned. If the query fails a whoisit.errors.QueryError exception will be raised. If no bootstrap data is loaded a whoisit.errors.BootstrapError exception will be raised. If allow_insecure_ssl=True is passed the RDAP queries will allow weaker SSL handshakes. Examples:

whoisit.entity('ZG39-ARIN')
whoisit.entity('ZG39-ARIN', rir='arin')
whoisit.entity('ZG39-ARIN', rir='arin', raw=True)
whoisit.entity('ZG39-ARIN', allow_insecure_ssl=True)

Data usage

All data returned by RDAP servers is covered by the various policies embedded in the results. As such you should carefully review your usage of the data to make sure it complies with the policy of the RDAP server you are querying.

Excessive use

As an API client whoisit is entirely subject to the resource and request limits applied by the remote RDAP servers it queries. If you receive request errors for rate limiting you should slow down your requests. Different servers have different limits. The LACNIC RDAP server in particular only permits a low number of requests per minute.
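One common way to slow down is to retry with an increasing delay. The sketch below is a generic helper, not part of whoisit: call_with_backoff and its parameters are hypothetical, and it is an assumption that rate limiting from a given server surfaces as whoisit.errors.QueryError.

```python
import time


def call_with_backoff(func, retry_on, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on the given exception type with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts, let the caller see the error
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt
            sleep(base_delay * 2 ** attempt)


# Usage with whoisit (not executed here):
#   result = call_with_backoff(lambda: whoisit.ip('1.1.1.1'),
#                              whoisit.errors.QueryError)
```

The sleep parameter is injectable only so the helper can be exercised without real waiting; in normal use the default time.sleep applies.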

Tests

There is a test suite that you can run by cloning this repository, installing the required dependencies and executing:

$ make test

Debugging

whoisit will check for a DEBUG environment variable and if set, will output debug logs that detail the internals for the bootstrapping, requests and parsing operations. If you want to enable debug logging, set DEBUG=true (or 1 or y etc.). For example:

$ export DEBUG=true
$ python3 some-script-that-uses-whoisit.py

Contributing

All properly formatted and sensible pull requests, issues and comments are welcome.

whoisit's People

Contributors

ashdwilson, bpereto, case, case-fastly, droe, dumkydewilde, frntn, hughpyle, meeb, namper, zalaxx

whoisit's Issues

urllib3 2.x complains about DEFAULT_CIPHERS

It looks like urllib3 2.x no longer includes DEFAULT_CIPHERS for SSL functionality, and an SSL context is preferred.

http_request in utils.py references DEFAULT_CIPHERS and an error is thrown when attempting to use a version of requests equal to or greater than 2.30.0 and a version of urllib3 greater than 1.x.

https://github.com/meeb/whoisit/blob/main/whoisit/utils.py#L45C6-L64

Should the ssl module and ssl.create_default_context() be used to set the value?

For example, something like the following in utils.py. I haven't tested it.

import ssl
import requests

ssl_context = ssl.create_default_context()
# Optionally set ciphers
ssl_context.set_ciphers('...')
requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS = ssl_context.get_ciphers()

Thanks!
Wes

Mutable default parameters in request wrapper

Hi! Just a recommendation: in utils.py this method uses mutable default parameters, which are not good practice.

def http_request(session, url, method='GET', allow_insecure_ssl=False,
                 headers={}, data={}, *args, **kwargs):

Source/explanation

A simple fix should be

def http_request(session, url, method='GET', allow_insecure_ssl=False,
                 headers=None, data=None, *args, **kwargs):
    headers = headers or {}
    data = data or {}
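For readers unfamiliar with the pitfall, here is a small standalone demonstration. The two functions are made up for this example and have nothing to do with whoisit's code; they only show why a mutable default is shared between calls.

```python
def append_bad(item, bucket=[]):
    # The default list is created once, at function definition time
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # A fresh list is created on every call that omits the argument
    bucket = [] if bucket is None else bucket
    bucket.append(item)
    return bucket


append_bad(1)
shared = append_bad(2)   # Still holds the item from the first call
append_good(1)
fresh = append_good(2)   # Only holds the item from this call

print(shared)
print(fresh)
```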

IP address lookup returns only the first CIDR on the list

Steps to reproduce

Make a query for ip 13.104.0.0.

>>> import whoisit
>>> from pprint import pprint
>>> whoisit.bootstrap()
True
>>> results = whoisit.ip('13.104.0.0')
>>> print(results['network'])
13.64.0.0/11

Current behavior

The result contains only first CIDR, 13.64.0.0/11 which doesn't match the queried IP address.

Expected behavior

The result should contain all CIDRs: 13.64.0.0/11, 13.96.0.0/13, 13.104.0.0/14
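For reference, the three CIDRs above together form one contiguous range, and the standard library can derive them. This sketch assumes the registered range is 13.64.0.0 to 13.107.255.255, which is what the three listed prefixes add up to; it does not show how whoisit parses the response.

```python
from ipaddress import ip_address, summarize_address_range

# Assumed start and end of the registered range
first = ip_address("13.64.0.0")
last = ip_address("13.107.255.255")

# Split the range into the minimal list of CIDR prefixes
cidrs = list(summarize_address_range(first, last))
print([str(c) for c in cidrs])

# Pick the prefix that actually contains the queried address
query = ip_address("13.104.0.0")
match = next(c for c in cidrs if query in c)
print(match)
```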

Other comments
https://rdap.arin.net/registry/ip/13.104.0.0

https://search.arin.net/rdap/?query=13.104.0.0

Issue with urllib3 using requests 2.31.0

Thanks for all of your work on this project!

Since Requests v2.3.0, Requests has been vulnerable to potentially leaking Proxy-Authorization headers to destination servers, specifically during redirects to an HTTPS origin. This is a product of how rebuild_proxies is used to recompute and reattach the Proxy-Authorization header to requests when redirected. Note this behavior has only been observed to affect proxied requests when credentials are supplied in the URL user information component (e.g. https://username:password@proxy:8080).

Version 2.31.0 is now available to patch the issue.

As of requests 2.30.0, it appears that urllib3 2.x is required:

https://requests.readthedocs.io/en/latest/community/updates/

2.30.0 (2023-05-03)
Dependencies - ⚠️ Added support for urllib3 2.0. ⚠️

This may contain minor breaking changes so we advise careful testing and reviewing https://urllib3.readthedocs.io/en/latest/v2-migration-guide.html prior to upgrading.

Users who wish to stay on urllib3 1.x can pin to urllib3<2.

Using the current version that is specified as a dependency for whoisit causes a conflict.

Would it be possible to update the dependency?

Thanks!
Wes

Benchmarking whoisit against registrant name

Parsing WHOIS data is hard, especially because the format differs depending on the TLD. I have a specific issue with the registrant value, so I decided to test multiple Python WHOIS libraries (pythonwhoisalt, asyncwhois, whoisit, whoisdomain) against the registrant field using Google's domain dataset, and to check their speed too. Initially I was using whoisdomain, so here is the initial post on its GitHub: mboot-github/WhoisDomain#21

Here is the script I wrote: https://gist.github.com/baderdean/cc4643ecd95d3ccde31dee80ebdbea28

For whoisit, it's expected that only a few TLDs are supported because of RDAP support itself, yet only 3% seems too few to me. Did I miss something?

And here are the results:

{'asyncwhois': {'count': 49,
                'duration': 285.84409061399947,
                'percentage': 26},
 'whoisdomain': {'count': 44,
                 'duration': 195.54051797400007,
                 'percentage': 24},
 'whoisit': {'count': 6,
             'duration': 27.91238160300054,
             'percentage': 3},
 'pythonwhoisalt': {'count': 7,
                    'duration': 1055.0711162629996,
                    'percentage': '4%'}}

PS: I've created similar issues in other projects as well.

Failure when sld require different url than tld

In a number of countries, GL for one, in addition to the TLD 'gl', an SLD is used for domains. The effective 'tld' to be handled by whoisit then becomes something like 'co.gl' or 'net.gl' which it does not currently support.

I have set this up with custom overrides like this:

whoisit.overrides.iana_overrides['domain'] = {
    'gl': ['https://rdap.centralnic.com/gl/'],
    'co.gl': ['https://rdap.centralnic.com/co.gl/'],
    'net.gl': ['https://rdap.centralnic.com/net.gl/']
}

I need something like this, because CentralNIC requires different base urls for these slds. Due to how the lookup is handled in get_domain_endpoint however, it does not currently work.
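One way such a lookup could handle SLDs is a longest-suffix match against the configured endpoints. The sketch below is only an illustration of that idea, not the current behaviour of get_domain_endpoint; find_endpoints is a hypothetical function name.

```python
def find_endpoints(domain, endpoints):
    """Return the endpoint list for the longest configured suffix of the domain."""
    labels = domain.lower().rstrip(".").split(".")
    # Try 'co.gl' before 'gl' by walking from the longest suffix to the shortest
    for i in range(1, len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in endpoints:
            return endpoints[suffix]
    return None


endpoints = {
    "gl": ["https://rdap.centralnic.com/gl/"],
    "co.gl": ["https://rdap.centralnic.com/co.gl/"],
    "net.gl": ["https://rdap.centralnic.com/net.gl/"],
}
print(find_endpoints("example.co.gl", endpoints))
print(find_endpoints("example.gl", endpoints))
```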

Implement an async mode

As all remote interactions from whoisit are just basic HTTP requests, with the exception of the additional custom SSL handshake options, an interface should be added to expose an async mode. aiohttp would be a reasonably low overhead drop-in replacement for the current requests sessions. This would add an extra dependency on aiohttp, and because aiohttp has compiled extensions, async mode would need to be opt-in to maintain pure Python support.

No valid RDAP service URLs could be parsed

When starting my app, I get this message from whoisit:

Python310\site-packages\whoisit\bootstrap.py", line 206, in validate_rdap_urls
raise BootstrapError(f'No valid RDAP service URLs could be parsed '
whoisit.errors.BootstrapError: No valid RDAP service URLs could be parsed from: ['http://cctld.uz:9000/']

I'm running this under Python 3.10.5 under Windows, but I also see it in my Docker container.

I suspect it's related to the port in the URL, but I haven't done any testing to confirm this.

RDAP server connection error handling (for discussion)

@meeb Thank you for this fantastic library! It was quick and easy to get up and running.

I found an edge case that might be worth handling differently (or not!), and wanted to share here for the sake of discussion. e.g. this might warrant a small note in the Readme, rather than a code change. Here's the tl;dr:

  • https://rdap.nic.build/ is listed in the IANA bootstrap data, for the .build TLD.
  • This URL is deprecated (see below), and no longer has an active RDAP server responding there.
  • whoisit is currently returning the following error when attempting to connect to it:
raise QueryError(f'Failed to make a {method} request to {url}: {e}') from e

whoisit.errors.QueryError: Failed to make a GET request to https://rdap.nic.build/domain/tailwind.build:

HTTPSConnectionPool(host='rdap.nic.build', port=443):

Max retries exceeded with url: /domain/tailwind.build

(Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x10632c430>:

Failed to establish a new connection:

[Errno 8] nodename nor servname provided, or not known'))

Digging deeper:

.build migrated to a different Registry backend provider (CentralNic) back on 2020-11-24:

…and I'm assuming there's a bug on CentralNic's side that hasn't updated their IANA data yet. I let them know here: https://twitter.com/gavinbrown/status/1465049265978413062?s=21 (their CTO responded, fortunately).

Here's the actual RDAP server for .build: https://rdap.centralnic.com/build/domain/tailwind.build

I looked into catching this specific exception in utils.py, e.g. via except requests.exceptions.ConnectionError as e: and a new class ConnectionError(WhoisItError), but it looks like whoisit.errors.QueryError is already catching this.

I also checked to see how the httpx library (Requests' possible successor?) handles this, and it's similar:

<snip>
httpx.ConnectError: [Errno 8] nodename nor servname provided, or not known

This Requests issue seemed related as well, but I don't think the issue with rdap.nic.build is DNS-related.

Apologies that this is sort of long-winded and rambling; this might be minor and/or rare enough that it can be ignored entirely? Or maybe worth adding a bit more fine-grained exception handling, to make debugging easier?

I could also submit a PR with a note for the Readme, e.g. something like this in the Debugging section:

RDAP server connection issues: There can be transient issues connecting to RDAP servers, but it's also possible that the IANA bootstrap data can be incorrect. For example, a TLD can migrate from one registry backend provider to another, and there can be a delay in updating the authoritative RDAP URLs (if they change) with IANA / ICANN.

Can the route information be returned when using whois + IP?

hi
When using the whois command under Linux, there will be route-related information. This part is not included in the information returned by whoisit. Is it possible to obtain this information? Thanks

For example

whois 14.204.0.0/15

At the end there is the following information:

% Information related to '14.204.0.0/15AS4837'

route: 14.204.0.0/15
descr: China Unicom Yunnan Province Network
country: CN
origin: AS4837
mnt-by: MAINT-CNCGROUP-RR
last-modified: 2010-09-26T02:26:02Z
source: APNIC

% This query was served by the APNIC Whois Service version 1.88.25 (WHOIS-AU3)
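Whether an RDAP service exposes route objects is not established in this thread. As a workaround sketch only, the text printed by the whois command can be split into attribute dictionaries. `parse_whois_objects` is a hypothetical helper, not part of whoisit, and it assumes the `key: value` layout shown above, with blank lines and `%` comment lines separating objects.

```python
def parse_whois_objects(text):
    """Split whois text output into a list of attribute dicts."""
    objects, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            # Blank lines and "%" comment lines end the current object.
            if current:
                objects.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            # An attribute may repeat, so values are collected in lists.
            current.setdefault(key.strip(), []).append(value.strip())
    if current:
        objects.append(current)
    return objects


sample = """\
% Information related to '14.204.0.0/15AS4837'

route: 14.204.0.0/15
descr: China Unicom Yunnan Province Network
country: CN
origin: AS4837
mnt-by: MAINT-CNCGROUP-RR
last-modified: 2010-09-26T02:26:02Z
source: APNIC
"""

routes = [obj for obj in parse_whois_objects(sample) if "route" in obj]
print(routes[0]["origin"])
```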

domain entity: handle requirement

Hi,

I saw your library today and was amazed. Cool work, also for the possibility to override the bootstrap (nic.ch is also not yet submitted to IANA ;-) ).

I see a problem with the requirement of a handle in the domain parsing:

if not handle:

The RDAP response profile defines this for domains:
https://www.icann.org/en/system/files/files/rdap-response-profile-15feb19-en.pdf

Section 3.2 - for registries

Contacts (Admin, Technical) - The RDAP response SHOULD contain at least two
entities, with the administrative and technical roles respectively within the entity
with the registrar role. The entities with the administrative and technical roles
MUST contain valid fn, tel, email members, and MAY contain a handle and a
valid adr element

So entities with the administrative and technical roles MAY contain a handle (it's a MUST for the registrar, but not for these two).
Can we remove the enforcement of a handle in extract_entities?
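To illustrate the request, here is a minimal sketch of entity extraction that treats the handle as optional. It is not whoisit's extract_entities: `collect_entities` is a hypothetical function, and the input shape (`roles`, `handle`, `vcardArray`, nested `entities`) is assumed from the RDAP entity object format.

```python
def collect_entities(rdap_entities):
    """Group RDAP entities by role, keeping those without a handle."""
    by_role = {}
    for entity in rdap_entities:
        # The handle is optional here, so a missing one becomes "".
        summary = {"handle": entity.get("handle", "")}
        # jCard layout: ["vcard", [[name, params, type, value], ...]]
        vcard = entity.get("vcardArray", ["vcard", []])
        for prop in vcard[1]:
            if len(prop) >= 4 and prop[0] in ("fn", "email"):
                summary[prop[0]] = prop[3]
        for role in entity.get("roles", []):
            by_role.setdefault(role, []).append(summary)
        # Contacts can be nested inside another entity, e.g. the registrar.
        nested = collect_entities(entity.get("entities", []))
        for role, items in nested.items():
            by_role.setdefault(role, []).extend(items)
    return by_role


entities = [{
    "roles": ["registrar"],
    "handle": "REG-1",
    "entities": [{
        "roles": ["administrative"],
        "vcardArray": ["vcard", [
            ["version", {}, "text", "4.0"],
            ["fn", {}, "text", "Admin Contact"],
        ]],
    }],
}]

print(collect_entities(entities)["administrative"])
```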

setup.py itself depends on python-dateutil

pip install whoisit in a fresh virtualenv fails:

% pip install whoisit
Collecting whoisit
  Using cached whoisit-2.5.1.tar.gz (31 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [10 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/private/var/folders/6y/xrb0tm811kl8v3jt6_7cz0q80000gn/T/pip-install-3p_w5gii/whoisit_1978a29a75af4988a86dd13a17a220a1/setup.py", line 4, in <module>
          from whoisit.version import version
        File "/private/var/folders/6y/xrb0tm811kl8v3jt6_7cz0q80000gn/T/pip-install-3p_w5gii/whoisit_1978a29a75af4988a86dd13a17a220a1/whoisit/__init__.py", line 4, in <module>
          from .parser import parse
        File "/private/var/folders/6y/xrb0tm811kl8v3jt6_7cz0q80000gn/T/pip-install-3p_w5gii/whoisit_1978a29a75af4988a86dd13a17a220a1/whoisit/parser.py", line 3, in <module>
          from dateutil.parser import parse as dateutil_parse
      ModuleNotFoundError: No module named 'dateutil'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Because setup.py imports code that uses python-dateutil, the dependency should either be listed in setup_requires in addition to install_requires (see https://setuptools.pypa.io/en/latest/references/keywords.html ) or declared in a pyproject.toml as defined in PEP 518 (https://peps.python.org/pep-0518/ ).
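A third option, sketched here as an assumption rather than the project's actual fix, is for setup.py to read the version string from the file as text instead of importing the whoisit package, so no runtime dependency is needed at build time. `read_version` is a hypothetical helper, and it assumes the version file contains a plain assignment such as `version = '2.5.1'`.

```python
import os
import re
import tempfile
from pathlib import Path


def read_version(version_file):
    """Return the version string assigned in version_file without importing it."""
    text = Path(version_file).read_text(encoding="utf-8")
    match = re.search(r"^version\s*=\s*['\"]([^'\"]+)['\"]", text, re.MULTILINE)
    if match is None:
        raise RuntimeError(f"No version string found in {version_file}")
    return match.group(1)


# Demonstration against a temporary file standing in for whoisit/version.py.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "version.py")
    Path(path).write_text("version = '2.5.1'\n", encoding="utf-8")
    print(read_version(path))
```

In setup.py this would replace the `from whoisit.version import version` line with a call such as `read_version("whoisit/version.py")`.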

Feature Request: Add support for unicodeName

Domains with unicode characters have an additional field named unicodeName, which contains their name nicely formatted. It would be great to be able to parse it automatically, so that we can retrieve it easily if needed!

Example of such domain: château-de-la-branchoire.net (xn--chteau-de-la-branchoire-j6b.net)
Corresponding RDAP query: https://rdap.nameshield.net/domain/xn--chteau-de-la-branchoire-j6b.net
And here is the RFC regarding this field: https://www.rfc-editor.org/rfc/rfc9083.html#section-5.2-14.4

Some RDAP servers also seem to return both fields even if there are no Unicode characters (e.g. https://rdap.publicinterestregistry.org/rdap/domain/kernel.org).
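Until the field is parsed by the library, it can be read from a raw response. The sketch below assumes a raw RDAP domain object (a dict with the `ldhName` and optional `unicodeName` members described in the RFC); `display_name` is a hypothetical helper. When `unicodeName` is absent, it falls back to decoding the punycode form with Python's built-in `idna` codec, which implements the older IDNA 2003 rules.

```python
def display_name(rdap_domain):
    """Return the Unicode form of a domain from a raw RDAP domain object."""
    unicode_name = rdap_domain.get("unicodeName")
    if unicode_name:
        return unicode_name
    ldh_name = rdap_domain.get("ldhName", "")
    try:
        # The stdlib "idna" codec decodes each xn-- label to Unicode.
        return ldh_name.encode("ascii").decode("idna")
    except UnicodeError:
        # Leave the name untouched if it cannot be decoded.
        return ldh_name


raw = {
    "ldhName": "xn--chteau-de-la-branchoire-j6b.net",
    "unicodeName": "château-de-la-branchoire.net",
}
print(display_name(raw))
```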

Some rdap servers seem to exhibit weak security, which throws errors

Hi there, thanks for this library. It's super simple and works great so far.

I've bumped into some cases where the "whois server" that ends up being queried has pretty weak security, and an exception gets raised by one of the underlying libraries because of this.

Of course I can handle this exception in my code, but I was wondering if you think it might be reasonable to make whoisit avoid those errors as much as possible by enabling weaker security settings selectively when this exception is thrown.

However, this kind of downgrade behaviour is somewhat nasty in terms of security, so if it is something that can be implemented by whoisit, it would need some method to disable the automatic downgrade. Or maybe better: don't make it the default, but enable it explicitly with an option that needs to be set to True.

Here's how you can reproduce this:

>>> import whoisit
>>> whoisit.bootstrap()
>>> whoisit.domain("frame.work")
---------------------------------------------------------------------------
SSLError                                  Traceback (most recent call last)
~/dev/project/.venv/lib/python3.9/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout
, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    698             # Make the request on the httplib connection object.
--> 699             httplib_response = self._make_request(
    700                 conn,

~/dev/project/.venv/lib/python3.9/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    381         try:
--> 382             self._validate_conn(conn)
    383         except (SocketTimeout, BaseSSLError) as e:

~/dev/project/.venv/lib/python3.9/site-packages/urllib3/connectionpool.py in _validate_conn(self, conn)
   1009         if not getattr(conn, "sock", None):  # AppEngine might not have  `.sock`
-> 1010             conn.connect()
   1011

~/dev/project/.venv/lib/python3.9/site-packages/urllib3/connection.py in connect(self)
    410
--> 411         self.sock = ssl_wrap_socket(
    412             sock=conn,

~/dev/project/.venv/lib/python3.9/site-packages/urllib3/util/ssl_.py in ssl_wrap_socket(sock, keyfile, certfile, cert_reqs, ca_certs, server_hostname, ssl_version,
 ciphers, ssl_context, ca_cert_dir, key_password, ca_cert_data, tls_in_tls)
    448     if send_sni:
--> 449         ssl_sock = _ssl_wrap_socket_impl(
    450             sock, context, tls_in_tls, server_hostname=server_hostname

~/dev/project/.venv/lib/python3.9/site-packages/urllib3/util/ssl_.py in _ssl_wrap_socket_impl(sock, ssl_context, tls_in_tls, server_hostname)
    492     if server_hostname:
--> 493         return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
    494     else:

/usr/lib/python3.9/ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
    499         # ctx._wrap_socket()
--> 500         return self.sslsocket_class._create(
    501             sock=sock,

/usr/lib/python3.9/ssl.py in _create(cls, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, context, session)
   1039                         raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
-> 1040                     self.do_handshake()
   1041             except (OSError, ValueError):

/usr/lib/python3.9/ssl.py in do_handshake(self, block)
   1308                 self.settimeout(None)
-> 1309             self._sslobj.do_handshake()
   1310         finally:

SSLError: [SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:1123)

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
~/dev/project/.venv/lib/python3.9/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    438             if not chunked:
--> 439                 resp = conn.urlopen(
    440                     method=request.method,

~/dev/project/.venv/lib/python3.9/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout
, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    754
--> 755             retries = retries.increment(
    756                 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]

~/dev/project/.venv/lib/python3.9/site-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    573         if new_retry.is_exhausted():
--> 574             raise MaxRetryError(_pool, url, error or ResponseError(cause))
    575

MaxRetryError: HTTPSConnectionPool(host='rdap.nominet.uk', port=443): Max retries exceeded with url: /work/domain/frame.work (Caused by SSLError(SSLError(1, '[SSL: DH_KEY
_TOO_SMALL] dh key too small (_ssl.c:1123)')))

During handling of the above exception, another exception occurred:

SSLError                                  Traceback (most recent call last)
~/dev/project/.venv/lib/python3.9/site-packages/whoisit/utils.py in http_request(url, method, headers, data, *args, **kwargs)
     21     try:
---> 22         return requests.request(method, url, headers=headers, data=data, *args,
     23                                 **kwargs)

~/dev/project/.venv/lib/python3.9/site-packages/requests/api.py in request(method, url, **kwargs)
     60     with sessions.Session() as session:
---> 61         return session.request(method=method, url=url, **kwargs)
     62

~/dev/project/.venv/lib/python3.9/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redi
rects, proxies, hooks, stream, verify, cert, json)
    541         send_kwargs.update(settings)
--> 542         resp = self.send(prep, **send_kwargs)
    543

~/dev/project/.venv/lib/python3.9/site-packages/requests/sessions.py in send(self, request, **kwargs)
    654         # Send the request
--> 655         r = adapter.send(request, **kwargs)
    656

~/dev/project/.venv/lib/python3.9/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    513                 # This branch is for urllib3 v1.22 and later.
--> 514                 raise SSLError(e, request=request)
    515

SSLError: HTTPSConnectionPool(host='rdap.nominet.uk', port=443): Max retries exceeded with url: /work/domain/frame.work (Caused by SSLError(SSLError(1, '[SSL: DH_KEY_TOO_
SMALL] dh key too small (_ssl.c:1123)')))

The above exception was the direct cause of the following exception:

QueryError                                Traceback (most recent call last)
<ipython-input-23-0d70ed62a17b> in <module>
----> 1 whoisit.domain("frame.work")

~/dev/project/.venv/lib/python3.9/site-packages/whoisit/__init__.py in domain(domain_name, raw)
     39         query_type='domain', query_value=domain_name)
     40     q = Query(method, url)
---> 41     response = q.request()
     42     return response if raw else parse(_bootstrap, 'domain', response)
     43

~/dev/project/.venv/lib/python3.9/site-packages/whoisit/query.py in request(self, *args, **kwargs)
    178     def request(self, *args, **kwargs):
    179         # args and kwargs here are passed directly to requests.request(...)
--> 180         response = http_request(self.url, self.method, *args, **kwargs)
    181         if response.status_code == 404:
    182             raise ResourceDoesNotExist(f'RDAP {self.method} request to {self.url} '

~/dev/project/.venv/lib/python3.9/site-packages/whoisit/utils.py in http_request(url, method, headers, data, *args, **kwargs)
     23                                 **kwargs)
     24     except Exception as e:
---> 25         raise QueryError(f'Failed to make a {method} request to {url}: {e}') from e
     26
     27

QueryError: Failed to make a GET request to https://rdap.nominet.uk/work/domain/frame.work: HTTPSConnectionPool(host='rdap.nominet.uk', port=443): Max retries exceeded wi
th url: /work/domain/frame.work (Caused by SSLError(SSLError(1, '[SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:1123)')))
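The opt-in downgrade proposed in this issue could look roughly like the sketch below. It uses only the standard library `ssl` module; `make_ssl_context` and its `allow_weak_tls` flag are hypothetical names, not a whoisit option. The cipher string is OpenSSL-specific, and lowering the security level is only assumed to help with DH_KEY_TOO_SMALL on builds whose default level rejects the server's key size; it may not be enough for every server.

```python
import ssl


def make_ssl_context(allow_weak_tls=False):
    """Build a TLS context, optionally lowering OpenSSL's security level."""
    # Certificate and hostname verification stay enabled in both cases.
    ctx = ssl.create_default_context()
    if allow_weak_tls:
        # OpenSSL-specific: security level 1 accepts smaller DH keys
        # than the stricter levels some distributions set by default.
        ctx.set_ciphers("DEFAULT@SECLEVEL=1")
    return ctx


strict = make_ssl_context()
relaxed = make_ssl_context(allow_weak_tls=True)
print(strict.check_hostname, relaxed.check_hostname)
```

Keeping the flag off by default means a caller has to opt in per query, which matches the concern raised above about silent downgrades. Such a context can be passed to `urllib.request.urlopen(url, context=ctx)`; wiring it into requests would need a custom transport adapter.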

Parsing of address, telephone and individual name ("n") field

Hey there

First, thank you for this project. I set out to do this about 18 months ago, got sidetracked, and settled for partial (and not very elegant) parsing of the vCard data. I'm very appreciative of your work, as the data I'm dealing with is both very large (thousands of unique networks) and very dispersed, requiring queries to practically every major RIR as well as almost all of the regional RIRs (or at least the ones that have implemented RDAP).

Second, and the real reason for opening this issue: what interest do you have in parsing the following vCard values?

  • adr (address, 4 element array of strings)
  • tel (a little more complex but still reasonably simple, a dict with a list, and plain text)
  • n (4-element array of text, possibly a bit more if done fully)
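As a rough illustration of what parsing these three properties involves, here is a sketch that reads them from a jCard array. It is not the code from the fork mentioned in this issue: `parse_contact` is a hypothetical function, and it assumes the jCard layout in which every property is a `[name, parameters, type, value]` array and the `adr` and `n` values are lists of components.

```python
def parse_contact(vcard_array):
    """Extract n, adr and tel from a jCard array into a flat dict."""
    contact = {"name": [], "address": [], "tel": []}
    properties = vcard_array[1] if len(vcard_array) > 1 else []
    for prop in properties:
        if len(prop) < 4:
            continue  # skip malformed properties
        name, params, value = prop[0], prop[1], prop[3]
        if name == "n":
            parts = value if isinstance(value, list) else [value]
            contact["name"] = [p for p in parts if p]
        elif name == "adr":
            parts = value if isinstance(value, list) else [value]
            flat = []
            for part in parts:
                # A component may itself be a list, e.g. a multi-line street.
                flat.extend(part if isinstance(part, list) else [part])
            contact["address"] = [p for p in flat if p]
            if not contact["address"] and params.get("label"):
                # Some servers only fill the free-text "label" parameter.
                contact["address"] = params["label"].splitlines()
        elif name == "tel":
            kinds = params.get("type", [])
            kinds = [kinds] if isinstance(kinds, str) else list(kinds)
            number = str(value)
            if number.startswith("tel:"):
                number = number[4:]
            contact["tel"].append({"types": kinds, "number": number})
    return contact


sample = ["vcard", [
    ["version", {}, "text", "4.0"],
    ["n", {}, "text", ["Doe", "Jane", "", "", ""]],
    ["adr", {}, "text", ["", "", "1 Example Street", "Springfield", "", "12345", "US"]],
    ["tel", {"type": ["voice", "work"]}, "uri", "tel:+1-555-0100"],
]]

print(parse_contact(sample))
```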

More direct questions that are more easily answerable if you have the time:

  1. Is there a reason you didn't implement the parsing of these fields? I'm speculating you just didn't think the majority of people wanted or needed that level of data, as opposed to them being added complexity you wanted to avoid, but I've considered both as possibilities.
  2. Are you interested in a PR that adds parsing of these fields? I have this in a fork, it's not terribly invasive, and though it requires more testing before ever going into a stable branch, it's working well for me. I have a need for this data, hence adding it.
  3. Something I can look into myself, but have you tested this much against APNIC (and the regional RIRs within AsiaPac; China and Japan have a few) or any of the other non-North America RIRs (AFRINIC, LACNIC, RIPE)?
  4. To what degree are you interested in performing additional normalization or parsing of some of these fields where applicable? For example, passing tel or adr through a third-party package specializing in those respective types to normalize them? Either as an optional "core" part of the library, or as a Python/setuptools "extra"?

Any answers you can provide are appreciated but if you're too busy to deal with this it's not a problem. I'm happy to work off my own fork for the moment and check in later once it's been more rigorously tested

Thanks again

Follow related links

Many RDAP responses include a related link pointing to more information about the domain. For example, .com domains are handled by Verisign, but Verisign might return a related link to the registrar for more information: looking up amazon.com yields a link to the registrar (MarkMonitor).

I feel like this might be a nice addition for convenience, maybe by adding an optional parameter for following relations. I currently have a super simple implementation as follows, which definitely has room for improvement. If you are interested or have additional thoughts, I am happy to create a PR.

import requests
import whoisit

whoisit.bootstrap()

tld_domain = "amazon.com"
result = whoisit.domain(tld_domain, raw=True)

# Follow "related" / "registration" links returned by the registry, if any
for link in result.get("links", []):
    href = link.get("href")
    if href and link.get("rel") in ("related", "registration"):
        res = requests.get(href, timeout=10)
        result = res.json()  # overwrite result if a related link was followed

parsed_result = whoisit.parse(whoisit._bootstrap, "domain", tld_domain, result)

whoisit does not find the RDAP endpoint for UK domains

Hello,

It seems that there is an issue with .uk domains:

>>> whoisit.domain("citizensadvice.org.uk")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/fl/.local/lib/python3.9/site-packages/whoisit/__init__.py", line 46, in domain
    method, url, exact_match = build_query(
  File "/home/fl/.local/lib/python3.9/site-packages/whoisit/query.py", line 57, in build
    url, exact_match = fetcher(query_value)
  File "/home/fl/.local/lib/python3.9/site-packages/whoisit/query.py", line 95, in get_domain_endpoint
    endpoints, exact_match = self.bootstrap.get_dns_endpoints(tld)
  File "/home/fl/.local/lib/python3.9/site-packages/whoisit/bootstrap.py", line 319, in get_dns_endpoints
    raise UnsupportedError(f'TLD "{tld}" has no known RDAP endpoint '
whoisit.errors.UnsupportedError: TLD "uk" has no known RDAP endpoint and is unsupported

Yet the .uk TLD has an RDAP endpoint: https://rdap.nominet.uk/uk/domain/citizensadvice.org.uk
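When the IANA bootstrap data lacks an endpoint or lists a stale one, a lookup can consult a manual override table first. The sketch below shows the idea only and is not whoisit's bootstrap code: `OVERRIDES`, `rdap_domain_url` and the flat `iana_services` mapping are hypothetical, and the endpoint URLs are the ones quoted in these issues.

```python
# Manually maintained endpoints that take priority over bootstrap data.
OVERRIDES = {
    "uk": "https://rdap.nominet.uk/uk/",
    "build": "https://rdap.centralnic.com/build/",
}


def rdap_domain_url(domain, iana_services, overrides=OVERRIDES):
    """Build the RDAP query URL for a domain, preferring manual overrides."""
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    base = overrides.get(tld) or iana_services.get(tld)
    if base is None:
        raise LookupError(f'TLD "{tld}" has no known RDAP endpoint')
    return f"{base.rstrip('/')}/domain/{domain}"


# The bootstrap mapping here is illustrative: no "uk" entry, stale "build" entry.
iana = {"build": "https://rdap.nic.build/"}
print(rdap_domain_url("citizensadvice.org.uk", iana))
print(rdap_domain_url("tailwind.build", iana))
```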
