meeb / whoisit
A Python library to query RDAP WHOIS-like services for internet resources such as ASNs, IPs, CIDRs and domains
License: BSD 3-Clause "New" or "Revised" License
Thanks for all of your work on this project!
Since Requests v2.3.0, Requests has been vulnerable to potentially leaking Proxy-Authorization headers to destination servers, specifically during redirects to an HTTPS origin. This is a product of how rebuild_proxies is used to recompute and reattach the Proxy-Authorization header to requests when redirected. Note this behavior has only been observed to affect proxied requests when credentials are supplied in the URL user information component (e.g. https://username:password@proxy:8080).
Version 2.31.0 is now available to patch the issue.
As of requests 2.30.0, urllib3 2.x is now supported:
https://requests.readthedocs.io/en/latest/community/updates/
2.30.0 (2023-05-03)
Dependencies - ⚠️ Added support for urllib3 2.0. ⚠️
This may contain minor breaking changes so we advise careful testing and reviewing https://urllib3.readthedocs.io/en/latest/v2-migration-guide.html prior to upgrading.
Users who wish to stay on urllib3 1.x can pin to urllib3<2.
The version currently specified as a dependency for whoisit causes a conflict. Would it be possible to update the dependency?
Thanks!
Wes
@meeb Thank you for this fantastic library! It was quick and easy to get up and running.
I found an edge case that might be worth handling differently (or not!), and wanted to share here for the sake of discussion. e.g. this might warrant a small note in the Readme, rather than a code change. Here's the tl;dr:
https://rdap.nic.build/ is listed in the IANA bootstrap data for the .build TLD. whoisit is currently returning the following error when attempting to connect to it:

raise QueryError(f'Failed to make a {method} request to {url}: {e}') from e
whoisit.errors.QueryError: Failed to make a GET request to https://rdap.nic.build/domain/tailwind.build:
HTTPSConnectionPool(host='rdap.nic.build', port=443):
Max retries exceeded with url: /domain/tailwind.build
(Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x10632c430>:
Failed to establish a new connection:
[Errno 8] nodename nor servname provided, or not known'))
Digging deeper: .build migrated to a different registry backend provider (CentralNic) back on 2020-11-24:
…and I'm assuming there's a bug on CentralNic's side, in that they haven't updated their IANA data yet. I let them know here: https://twitter.com/gavinbrown/status/1465049265978413062?s=21 (their CTO responded, fortunately).
Here's the actual RDAP server for .build: https://rdap.centralnic.com/build/domain/tailwind.build
I looked into catching this specific exception in utils.py, e.g. via except requests.exceptions.ConnectionError as e: and a new class ConnectionError(WhoisItError), but it looks like whoisit.errors.QueryError is already catching this.
I also checked to see how the httpx library (Requests' possible successor?) handles this, and it's similar:
<snip>
httpx.ConnectError: [Errno 8] nodename nor servname provided, or not known
This Requests issue seemed related as well, but I don't think the issue with rdap.nic.build is DNS-related.
Apologies that this is sort of long-winded and rambling; this might be minor and/or rare enough that it can be ignored entirely? Or maybe worth adding a bit more fine-grained exception handling, to make debugging easier?
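For what it's worth, the chained exception is not lost: because the wrapping uses raise ... from e (as shown in the traceback above), the original connection error survives on __cause__ and callers can still inspect it. A minimal stdlib-only re-creation with stand-in functions (not whoisit's actual code):

```python
class QueryError(Exception):
    """Stand-in for whoisit.errors.QueryError."""

def http_request(url):
    # stand-in for the real HTTP call; pretend DNS resolution failed
    raise ConnectionError('[Errno 8] nodename nor servname provided, or not known')

def query(url):
    try:
        return http_request(url)
    except Exception as e:
        # same pattern as whoisit's utils.py: wrap but keep the cause
        raise QueryError(f'Failed to make a GET request to {url}: {e}') from e

try:
    query('https://rdap.nic.build/domain/tailwind.build')
except QueryError as e:
    cause = e.__cause__  # the original ConnectionError survives here
```

So fine-grained handling is already possible today by checking the type of `e.__cause__`, even without new exception classes.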
I could also submit a PR with a note for the Readme, e.g. something like this in the Debugging section:
RDAP server connection issues: There can be transient issues connecting to RDAP servers, but it's also possible that the IANA bootstrap data can be incorrect. For example, a TLD can migrate from one registry backend provider to another, and there can be a delay in updating the authoritative RDAP URLs (if they change) with IANA / ICANN.
Steps to reproduce
Make a query for ip 13.104.0.0.
>>> import whoisit
>>> from pprint import pprint
>>> whoisit.bootstrap()
True
>>> results = whoisit.ip('13.104.0.0')
>>> print(results['network'])
13.64.0.0/11
Current behavior
The result contains only the first CIDR, 13.64.0.0/11, which doesn't even contain the queried IP address.
Expected behavior
The result should contain all CIDRs: 13.64.0.0/11, 13.96.0.0/13, 13.104.0.0/14
Other comments
https://rdap.arin.net/registry/ip/13.104.0.0
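The mismatch is easy to confirm with the stdlib ipaddress module: of the three CIDRs in ARIN's response, only the last one actually contains the queried address.

```python
import ipaddress

ip = ipaddress.ip_address('13.104.0.0')
cidrs = ['13.64.0.0/11', '13.96.0.0/13', '13.104.0.0/14']

# keep only the networks that actually contain the queried IP
containing = [c for c in cidrs if ip in ipaddress.ip_network(c)]
```

So returning only the first CIDR means the reported network may not even cover the IP that was looked up.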
Many RDAP responses include a related link with more information about the domain. For example, the .com domain is handled by Verisign, but Verisign might return a related link to the registrar; looking up amazon.com will yield a link to the registrar (MarkMonitor) with more information.
I feel like this might be a nice addition for convenience, maybe by adding an optional parameter for following relations. I currently have a super simple implementation as follows, which definitely has room for improvement. If you are interested or have additional thoughts, I am happy to create a PR.
import requests
import whoisit

tld_domain = "amazon.com"
result = whoisit.domain(tld_domain, raw=True)
for link in result.get("links", []):
    if link["rel"] in ["related", "registration"]:
        res = requests.get(link.get("href", ""))
        result = res.json()  # overwrite result if a related link was followed
parsed_result = whoisit.parse(whoisit._bootstrap, "domain", tld_domain, result)
When starting my app, I get this message from whoisit:
Python310\site-packages\whoisit\bootstrap.py", line 206, in validate_rdap_urls
raise BootstrapError(f'No valid RDAP service URLs could be parsed '
whoisit.errors.BootstrapError: No valid RDAP service URLs could be parsed from: ['http://cctld.uz:9000/']
I'm running this under Python 3.10.5 under Windows, but I also see it in my Docker container.
I suspect it's related to the port in the URL, but I haven't done any testing to confirm this.
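To illustrate that suspicion (this is a hypothetical check, not whoisit's actual validation code): RDAP bootstrap endpoints are normally plain https URLs, so a validator requiring https with no explicit port would reject this entry even though the URL itself parses fine.

```python
from urllib.parse import urlsplit

def looks_like_standard_rdap_url(url):
    # hypothetical strictness: https scheme, no explicit port
    parts = urlsplit(url)
    return parts.scheme == 'https' and parts.port is None

ok = looks_like_standard_rdap_url('https://rdap.nominet.uk/uk/')
rejected = looks_like_standard_rdap_url('http://cctld.uz:9000/')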
Hi there, thanks for this library. It's super simple and works great so far.
I've bumped into some cases where the "whois server" that ends up being queried has pretty weak security, and an exception then gets raised by one of the underlying libraries because of this.
Of course I can handle this exception in my code, but I was wondering if you think it might be reasonable to make whoisit avoid those errors as much as possible by enabling weaker security settings selectively when this exception is thrown.
However, this kind of downgrade behaviour is somewhat nasty in terms of security, so if it is something that can be implemented by whoisit, it would need some method to disable this automatic downgrade. Or maybe better: not make it the default, but enable it explicitly with an option that needs to be set to True.
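For reference, the usual opt-in escape hatch for servers with weak DH keys is lowering OpenSSL's security level on a dedicated context. This is OpenSSL-specific cipher-string syntax and deliberately weakens security, so it should stay behind an explicit option rather than being the default:

```python
import ssl

# build a context that accepts weaker handshakes (OpenSSL-specific
# syntax; SECLEVEL=1 re-allows small DH keys, among other things)
insecure_ctx = ssl.create_default_context()
insecure_ctx.set_ciphers('DEFAULT@SECLEVEL=1')
```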
Here's how you can reproduce this:
>>> import whoisit
>>> whoisit.bootstrap()
>>> whoisit.domain("frame.work")
---------------------------------------------------------------------------
SSLError Traceback (most recent call last)
~/dev/project/.venv/lib/python3.9/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
698 # Make the request on the httplib connection object.
--> 699 httplib_response = self._make_request(
700 conn,
~/dev/project/.venv/lib/python3.9/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
381 try:
--> 382 self._validate_conn(conn)
383 except (SocketTimeout, BaseSSLError) as e:
~/dev/project/.venv/lib/python3.9/site-packages/urllib3/connectionpool.py in _validate_conn(self, conn)
1009 if not getattr(conn, "sock", None): # AppEngine might not have `.sock`
-> 1010 conn.connect()
1011
~/dev/project/.venv/lib/python3.9/site-packages/urllib3/connection.py in connect(self)
410
--> 411 self.sock = ssl_wrap_socket(
412 sock=conn,
~/dev/project/.venv/lib/python3.9/site-packages/urllib3/util/ssl_.py in ssl_wrap_socket(sock, keyfile, certfile, cert_reqs, ca_certs, server_hostname, ssl_version, ciphers, ssl_context, ca_cert_dir, key_password, ca_cert_data, tls_in_tls)
448 if send_sni:
--> 449 ssl_sock = _ssl_wrap_socket_impl(
450 sock, context, tls_in_tls, server_hostname=server_hostname
~/dev/project/.venv/lib/python3.9/site-packages/urllib3/util/ssl_.py in _ssl_wrap_socket_impl(sock, ssl_context, tls_in_tls, server_hostname)
492 if server_hostname:
--> 493 return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
494 else:
/usr/lib/python3.9/ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
499 # ctx._wrap_socket()
--> 500 return self.sslsocket_class._create(
501 sock=sock,
/usr/lib/python3.9/ssl.py in _create(cls, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, context, session)
1039 raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
-> 1040 self.do_handshake()
1041 except (OSError, ValueError):
/usr/lib/python3.9/ssl.py in do_handshake(self, block)
1308 self.settimeout(None)
-> 1309 self._sslobj.do_handshake()
1310 finally:
SSLError: [SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:1123)
During handling of the above exception, another exception occurred:
MaxRetryError Traceback (most recent call last)
~/dev/project/.venv/lib/python3.9/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
438 if not chunked:
--> 439 resp = conn.urlopen(
440 method=request.method,
~/dev/project/.venv/lib/python3.9/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
754
--> 755 retries = retries.increment(
756 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
~/dev/project/.venv/lib/python3.9/site-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
573 if new_retry.is_exhausted():
--> 574 raise MaxRetryError(_pool, url, error or ResponseError(cause))
575
MaxRetryError: HTTPSConnectionPool(host='rdap.nominet.uk', port=443): Max retries exceeded with url: /work/domain/frame.work (Caused by SSLError(SSLError(1, '[SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:1123)')))
During handling of the above exception, another exception occurred:
SSLError Traceback (most recent call last)
~/dev/project/.venv/lib/python3.9/site-packages/whoisit/utils.py in http_request(url, method, headers, data, *args, **kwargs)
21 try:
---> 22 return requests.request(method, url, headers=headers, data=data, *args,
23 **kwargs)
~/dev/project/.venv/lib/python3.9/site-packages/requests/api.py in request(method, url, **kwargs)
60 with sessions.Session() as session:
---> 61 return session.request(method=method, url=url, **kwargs)
62
~/dev/project/.venv/lib/python3.9/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
541 send_kwargs.update(settings)
--> 542 resp = self.send(prep, **send_kwargs)
543
~/dev/project/.venv/lib/python3.9/site-packages/requests/sessions.py in send(self, request, **kwargs)
654 # Send the request
--> 655 r = adapter.send(request, **kwargs)
656
~/dev/project/.venv/lib/python3.9/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
513 # This branch is for urllib3 v1.22 and later.
--> 514 raise SSLError(e, request=request)
515
SSLError: HTTPSConnectionPool(host='rdap.nominet.uk', port=443): Max retries exceeded with url: /work/domain/frame.work (Caused by SSLError(SSLError(1, '[SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:1123)')))
The above exception was the direct cause of the following exception:
QueryError Traceback (most recent call last)
<ipython-input-23-0d70ed62a17b> in <module>
----> 1 whoisit.domain("frame.work")
~/dev/project/.venv/lib/python3.9/site-packages/whoisit/__init__.py in domain(domain_name, raw)
39 query_type='domain', query_value=domain_name)
40 q = Query(method, url)
---> 41 response = q.request()
42 return response if raw else parse(_bootstrap, 'domain', response)
43
~/dev/project/.venv/lib/python3.9/site-packages/whoisit/query.py in request(self, *args, **kwargs)
178 def request(self, *args, **kwargs):
179 # args and kwargs here are passed directly to requests.request(...)
--> 180 response = http_request(self.url, self.method, *args, **kwargs)
181 if response.status_code == 404:
182 raise ResourceDoesNotExist(f'RDAP {self.method} request to {self.url} '
~/dev/project/.venv/lib/python3.9/site-packages/whoisit/utils.py in http_request(url, method, headers, data, *args, **kwargs)
23 **kwargs)
24 except Exception as e:
---> 25 raise QueryError(f'Failed to make a {method} request to {url}: {e}') from e
26
27
QueryError: Failed to make a GET request to https://rdap.nominet.uk/work/domain/frame.work: HTTPSConnectionPool(host='rdap.nominet.uk', port=443): Max retries exceeded with url: /work/domain/frame.work (Caused by SSLError(SSLError(1, '[SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:1123)')))
Hey there
First, thank you for this project. I set out to do this about 18 months ago and got sidetracked and settled for partial (and not very elegant) parsing of the vCard data. I'm very appreciative of your work as the data I'm dealing with is both very large (thousands of unique networks) as well as very dispersed- requiring queries to practically every major RIR as well as almost all of the regional RIRs (or at least the ones that have implemented RDAP)
Second, the real reason for opening this: what interest do you have in parsing the following vCard values?
- adr (address, 4-element array of strings)
- tel (a little more complex but still reasonably simple: a dict with a list, and plain text)
- n (4-element array of text, possibly a bit more if done fully)
More direct questions that are more easily answerable if you have the time: would you be open to running tel or adr through a third-party package specializing in those respective types to normalize them? Either as an optional "core" part of the library, or as a Python/setuptools "extra"?
Any answers you can provide are appreciated, but if you're too busy to deal with this it's not a problem. I'm happy to work off my own fork for the moment and check in later once it's been more rigorously tested.
Thanks again
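For background on the shapes involved: RDAP entities carry contacts as jCard (RFC 7095) arrays of [name, parameters, type, value]. A small stdlib sketch with invented sample data (not whoisit's parser):

```python
# invented jCard fragment in RFC 7095 shape: [name, params, type, value];
# adr's value is the 7-component structured address
vcard = ["vcard", [
    ["fn", {}, "text", "Example Registrant"],
    ["n", {}, "text", ["Registrant", "Example", "", ""]],
    ["adr", {}, "text", ["", "", "123 Example St", "Springfield", "", "12345", "US"]],
    ["tel", {"type": "voice"}, "uri", "tel:+1-555-0100"],
]]

def vcard_values(vcard, name):
    # collect the value component of every property with this name
    return [prop[3] for prop in vcard[1] if prop[0] == name]

locality = vcard_values(vcard, "adr")[0][3]
phone = vcard_values(vcard, "tel")[0]
```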
Domains with unicode characters have an additional field named unicodeName, which contains their name nicely formatted. It would be great to be able to parse it automatically, so that we can retrieve it easily if needed!
Example of such domain: château-de-la-branchoire.net (xn--chteau-de-la-branchoire-j6b.net)
Corresponding RDAP query: https://rdap.nameshield.net/domain/xn--chteau-de-la-branchoire-j6b.net
And here is the RFC regarding this field: https://www.rfc-editor.org/rfc/rfc9083.html#section-5.2-14.4
Some RDAP servers also seem to return both fields even if there are no unicode characters (e.g. https://rdap.publicinterestregistry.org/rdap/domain/kernel.org).
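In the meantime, the stdlib idna codec can recover the Unicode form from the A-label (punycode) name, e.g. for the example above (a stopgap sketch, not part of whoisit's API):

```python
# decode the punycode (A-label) form back to Unicode
ldh_name = 'xn--chteau-de-la-branchoire-j6b.net'
unicode_name = ldh_name.encode('ascii').decode('idna')
```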
Parsing WHOIS data is hard, especially because the format differs depending on the TLD. I have a specific issue with the registrant value, so I decided to test multiple Python WHOIS libraries (pythonwhoisalt, asyncwhois, whoisit, whoisdomain) against the registrant field using Google's domain dataset, and to check their speed too. Initially I was using whoisdomain, so here is the initial post on its GitHub: mboot-github/WhoisDomain#21
Here is the script I wrote: https://gist.github.com/baderdean/cc4643ecd95d3ccde31dee80ebdbea28
For whoisit, it's pretty normal that only a few TLDs are supported because of RDAP support itself, yet only 3% seems too few to me. Did I miss something?
And here the results:
{'asyncwhois':     {'count': 49, 'duration': 285.84409061399947, 'percentage': 26},
 'whoisdomain':    {'count': 44, 'duration': 195.54051797400007, 'percentage': 24},
 'whoisit':        {'count': 6,  'duration': 27.91238160300054,  'percentage': 3},
 'pythonwhoisalt': {'count': 7,  'duration': 1055.0711162629996, 'percentage': 4}}
PS: I've created similar issues in other projects as well.
In a number of countries, GL for one, in addition to the TLD 'gl', an SLD is used for domains. The effective 'tld' to be handled by whoisit then becomes something like 'co.gl' or 'net.gl' which it does not currently support.
I have set this up with custom overrides like this:
whoisit.overrides.iana_overrides['domain'] = {
'gl': ['https://rdap.centralnic.com/gl/'],
'co.gl': ['https://rdap.centralnic.com/co.gl/'],
'net.gl': ['https://rdap.centralnic.com/net.gl/']
}
I need something like this because CentralNic requires different base URLs for these SLDs. Due to how the lookup is handled in get_domain_endpoint, however, it does not currently work.
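One way the lookup could support multi-label overrides is longest-suffix matching, so 'co.gl' is tried before 'gl'. A hypothetical sketch (not whoisit's actual get_domain_endpoint):

```python
def get_dns_endpoints(domain, services):
    # try the longest matching suffix first: 'co.gl' before 'gl'
    labels = domain.lower().split('.')
    for i in range(1, len(labels)):
        suffix = '.'.join(labels[i:])
        if suffix in services:
            return services[suffix]
    raise KeyError(f'no RDAP endpoint known for {domain}')

services = {
    'gl': ['https://rdap.centralnic.com/gl/'],
    'co.gl': ['https://rdap.centralnic.com/co.gl/'],
}
endpoints = get_dns_endpoints('example.co.gl', services)
```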
Hello,
It seems that there is an issue with .uk domains:
>>> whoisit.domain("citizensadvice.org.uk")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/fl/.local/lib/python3.9/site-packages/whoisit/__init__.py", line 46, in domain
method, url, exact_match = build_query(
File "/home/fl/.local/lib/python3.9/site-packages/whoisit/query.py", line 57, in build
url, exact_match = fetcher(query_value)
File "/home/fl/.local/lib/python3.9/site-packages/whoisit/query.py", line 95, in get_domain_endpoint
endpoints, exact_match = self.bootstrap.get_dns_endpoints(tld)
File "/home/fl/.local/lib/python3.9/site-packages/whoisit/bootstrap.py", line 319, in get_dns_endpoints
raise UnsupportedError(f'TLD "{tld}" has no known RDAP endpoint '
whoisit.errors.UnsupportedError: TLD "uk" has no known RDAP endpoint and is unsupported
Yet, .uk tld has an RDAP endpoint: https://rdap.nominet.uk/uk/domain/citizensadvice.org.uk
As all remote interactions from whoisit are just basic HTTP requests, with the exception of the additional custom SSL handshake options, an interface should be added to expose an async mode. aiohttp would be a reasonably low overhead drop-in replacement for the current requests sessions. This would add an extra dependency on aiohttp, and as aiohttp has compiled extensions, async mode would need to be opt-in to maintain pure Python support.
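Until a native aiohttp backend exists, a dependency-free stopgap is running the existing synchronous helper on a worker thread. A sketch with a stand-in for whoisit's request function:

```python
import asyncio

def http_request(url):
    # stand-in for whoisit's synchronous HTTP helper
    return f'response from {url}'

async def async_query(url):
    # opt-in async shim: run the blocking call in a thread so the
    # event loop stays responsive (asyncio.to_thread needs Python 3.9+)
    return await asyncio.to_thread(http_request, url)

result = asyncio.run(async_query('https://rdap.org/domain/example.com'))
```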
pip install whoisit
in a fresh virtualenv fails:
% pip install whoisit
Collecting whoisit
Using cached whoisit-2.5.1.tar.gz (31 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [10 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/private/var/folders/6y/xrb0tm811kl8v3jt6_7cz0q80000gn/T/pip-install-3p_w5gii/whoisit_1978a29a75af4988a86dd13a17a220a1/setup.py", line 4, in <module>
from whoisit.version import version
File "/private/var/folders/6y/xrb0tm811kl8v3jt6_7cz0q80000gn/T/pip-install-3p_w5gii/whoisit_1978a29a75af4988a86dd13a17a220a1/whoisit/__init__.py", line 4, in <module>
from .parser import parse
File "/private/var/folders/6y/xrb0tm811kl8v3jt6_7cz0q80000gn/T/pip-install-3p_w5gii/whoisit_1978a29a75af4988a86dd13a17a220a1/whoisit/parser.py", line 3, in <module>
from dateutil.parser import parse as dateutil_parse
ModuleNotFoundError: No module named 'dateutil'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Because setup.py imports code that uses python-dateutil, it should either be specified in setup_requires in addition to install_requires (see https://setuptools.pypa.io/en/latest/references/keywords.html) or declared in a pyproject.toml as defined in PEP 518 (https://peps.python.org/pep-0518/).
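A minimal PEP 518 declaration could look like this (a sketch only; the exact build requirements are an assumption, since whoisit's setup.py imports the package and therefore needs python-dateutil at build time):

```toml
# pyproject.toml
[build-system]
requires = ["setuptools", "wheel", "python-dateutil"]
build-backend = "setuptools.build_meta"
```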
Hi @meeb,
I was wondering if you'd consider adding git tags for new releases here? Then GitHub can auto-generate changelogs from the all the commits that go into a new release.
https://docs.github.com/en/repositories/releasing-projects-on-github/about-releases
Releases are based on Git tags, which mark a specific point in your repository's history.
Thanks for considering!
The exception "AttributeError: 'list' object has no attribute 'get'" is raised in parser.py at line 192 for some domains like jeldwen.com when called with whoisit.domain(domain_name=domain, allow_insecure_ssl=True), since release 2.7.4.
Hi! Just a recommendation: in utils.py this method is using mutable default parameters, which are not a good practice.

def http_request(session, url, method='GET', allow_insecure_ssl=False,
                 headers={}, data={}, *args, **kwargs):

A simple fix would be:

def http_request(session, url, method='GET', allow_insecure_ssl=False,
                 headers=None, data=None, *args, **kwargs):
    headers = headers or {}
    data = data or {}
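The hazard is that the default object is created once at function definition time and then shared across every call. A minimal demonstration of the same bug pattern (not whoisit's code):

```python
def append_header(value, headers={}):  # BUG: shared mutable default
    headers['x'] = headers.get('x', 0) + value
    return headers

first = append_header(1)
second = append_header(1)  # mutates the same dict as the first call
```

Both calls return the very same dict object, which is why the None-plus-fallback idiom above is the standard fix.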
Hi,
Saw your library today and was amazed. Cool work, also for the possibility to override the bootstrap (nic.ch is also not yet submitted to IANA ;-) ).
I see a problem with the requirement of a handle in the domain parsing:
Line 167 in 80dc563
The rdap response profile defines this for domains:
https://www.icann.org/en/system/files/files/rdap-response-profile-15feb19-en.pdf
Section 3.2 - for registries
Contacts (Admin, Technical) - The RDAP response SHOULD contain at least two entities, with the administrative and technical roles respectively, within the entity with the registrar role. The entities with the administrative and technical roles MUST contain valid fn, tel, email members, and MAY contain a handle and a valid adr element
So entities with the administrative and technical roles MAY contain a handle (it's a MUST for the registrar, but not for these two).
Can we remove the enforcing of a handle in extract_entities?
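A lenient variant could simply treat handle as optional and fall back to None; a hypothetical sketch, not whoisit's actual extract_entities:

```python
def extract_entity(entity):
    # handle is optional for admin/technical roles per the RDAP
    # response profile, so don't require it
    return {
        'handle': entity.get('handle'),  # None when absent
        'roles': entity.get('roles', []),
    }

parsed = extract_entity({'roles': ['technical'], 'vcardArray': ['vcard', []]})
```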
It looks like urllib3 2.x no longer includes DEFAULT_CIPHERS for SSL functionality, and an SSL context is preferred.
http_request in utils.py references DEFAULT_CIPHERS, and an error is thrown when attempting to use a version of requests equal to or greater than 2.30.0 with a version of urllib3 greater than 1.x.
https://github.com/meeb/whoisit/blob/main/whoisit/utils.py#L45C6-L64
Should the ssl module and ssl.create_default_context() be used to set the value?
For example, something like the following in utils.py (I haven't tested it):
import ssl

ssl_context = ssl.create_default_context()
# optionally set ciphers:
ssl_context.set_ciphers('...')
requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS = ssl_context.get_ciphers()
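The stdlib half of that idea works standalone; whether patching urllib3 module globals remains viable under 2.x is the open question. A sketch of building a context and reading its negotiated cipher list:

```python
import ssl

# build a default context and inspect its cipher list without
# touching urllib3's removed DEFAULT_CIPHERS constant
ctx = ssl.create_default_context()
cipher_names = [c['name'] for c in ctx.get_ciphers()]
```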
Thanks!
Wes
Hi,
When using the whois command under Linux, the output includes route-related information. This part is not included in the information returned by whoisit. Is it possible to obtain this information? Thanks.
For example:
whois 14.204.0.0/15
At the end there is the following information:
% Information related to '14.204.0.0/15AS4837'
route:          14.204.0.0/15
descr:          China Unicom Yunnan Province Network
country:        CN
origin:         AS4837
mnt-by:         MAINT-CNCGROUP-RR
last-modified:  2010-09-26T02:26:02Z
source:         APNIC
% This query was served by the APNIC Whois Service version 1.88.25 (WHOIS-AU3)