Comments (8)

meeb commented on June 15, 2024

Hi,

Thanks for the issue. Absolutely very happy to accept pull requests that add sensible features. If it keeps within the style I've coded it in, which is somewhat verbose but easier to come back to after 18 months of not reading it and still understand what it's doing, all the better :)

Effectively we had a reasonably narrow use case and built the library to work with that use case. To expand to general use I really just randomly selected a bunch of samples from global RDAP servers of different kinds and tweaked the parser until they seemed to work as expected. So in answer to your specific questions:

  1. No specific reason. Likely either didn't encounter them or where I did encounter them they were duplicated.
  2. Yep, feel free to submit a PR. Where possible the general aim of the "condensed" or "parsed" results was to return singular checked information, so if you want to try and add an extractor for a single true address string or single true phone number string etc. that would be fine.
  3. The library is well tested against network RDAP results, less so against regional ccTLD RDAP servers. Coverage seems reasonably comprehensive, but likely has the odd parsing issue with some of the more esoteric RDAP servers on the internet.
  4. Depends on the additional parsing really. If it's relatively basic and pure Python, probably fine. I'd generally like to keep the library simple and not depend on anything too crazy though.

Having taken a look at your sample output, probably the only change I'd suggest is making the address a string where possible. The mix of comma-split single lines as a one-element list vs. multi-element lists is messy. Also stripping the errant \r and \n etc. would probably be neater.

Generally, if you just add address as a clean, parsed string and phone and so on, sounds good. These additional parsers would need tests as well.
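As a rough illustration of the "clean, parsed string" idea, here is a minimal sketch in pure Python. The function name clean_address is hypothetical and is not part of whoisit; it assumes the address arrives as a list of string fragments, as in the sample output in this thread.

```python
import re


def clean_address(parts):
    """Collapse a list of raw address fragments into one clean string.

    Hypothetical helper for illustration only, not part of whoisit.
    """
    cleaned = []
    for part in parts:
        # Collapse any run of whitespace (including \r and \n) into a
        # single space, then trim the ends.
        text = re.sub(r"\s+", " ", part).strip()
        # Skip fragments that were empty or whitespace-only.
        if text:
            cleaned.append(text)
    # A one-element list and a multi-element list both become one string.
    return ", ".join(cleaned)
```

With this approach a single comma-split line passes through unchanged, while a multi-element list such as the ARIN sample is joined with ", " after the stray \r is removed.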

In your example how does ind_name differ from the existing name field? What's the raw vcard data there?

from whoisit.

mzpqnxow commented on June 15, 2024

Closing this out for now. Thanks!

mzpqnxow commented on June 15, 2024

For a sample of the output (nothing surprising or exciting) here's what I'm emitting, first one with address (from APNIC) and then one that parsed n (into a field ind_name) from ARIN

For address, I'm taking the four string elements in adr and just filtering out the empty string elements. I generally avoid modifying structured data, but I think in this case it made sense. Curious about your thoughts.

{
  "handle": "43.229.211.0 - 43.229.211.255",
  "parent_handle": "",
  "name": "DFLL-BD",
  "whois_server": "whois.apnic.net",
  "type": "ip network",
  "terms_of_service_url": "http://www.apnic.net/db/dbcopyright.html",
  "copyright_notice": "",
  "description": [
    "Dhaka Fiber Link Ltd."
  ],
  "last_changed_date": "2021-01-20T06:55:09+00:00",
  "registration_date": "2016-06-06T07:59:47+00:00",
  "expiration_date": null,
  "url": "https://rdap.apnic.net/ip/43.229.211.0/24",
  "rir": "apnic",
  "entities": {
    "abuse": [
      {
        "handle": "IRT-DFLL-BD",
        "url": "https://rdap.apnic.net/entity/IRT-DFLL-BD",
        "type": "entity",
        "name": "IRT-DFLL-BD",
        "address": [
          "House-22, Road-13C, Block-E, Banani, Dhaka-1213"
        ],
        "email": "[email protected]",
        "rir": "apnic"
      }
    ],
    "technical": [
      {
        "handle": "DFLL1-AP",
        "url": "https://rdap.apnic.net/entity/DFLL1-AP",
        "type": "entity",
        "name": "Dhaka Fiber Link Ltd administrator",
        "address": [
          "House-22, Road-13C, Banani, Dhaka-1213"
        ],
        "email": "[email protected]",
        "rir": "apnic"
      }
    ],
    "administrative": [
      {
        "handle": "DFLL1-AP",
        "url": "https://rdap.apnic.net/entity/DFLL1-AP",
        "type": "entity",
        "name": "Dhaka Fiber Link Ltd administrator",
        "address": [
          "House-22, Road-13C, Banani, Dhaka-1213"
        ],
        "email": "[email protected]",
        "rir": "apnic"
      }
    ]
  },
  "country": "BD",
  "ip_version": 4,
  "assignment_type": "allocated non-portable",
  "network": "43.229.211.0/24"
}

A sample from ARIN with n parsed. Note that the name ind_name is a poor arbitrary one I chose for the interim. I'm not certain at this point if I'm handling n completely correctly, but it's "good enough" for my current use:

{
  "handle": "NET-108-160-97-184-2",
  "parent_handle": "NET-108-160-97-184-1",
  "name": "QITX-TWIST-WALMART-BLOCK02",
  "whois_server": "whois.arin.net",
  "type": "ip network",
  "terms_of_service_url": "https://www.arin.net/resources/registry/whois/tou/",
  "copyright_notice": "Copyright 1997-2022, American Registry for Internet Numbers, Ltd.",
  "description": [],
  "last_changed_date": "2012-11-06T15:46:03-05:00",
  "registration_date": "2012-11-06T15:46:03-05:00",
  "expiration_date": null,
  "url": "https://rdap.arin.net/registry/ip/108.160.97.184",
  "rir": "arin",
  "entities": {
    ...
    "abuse": [
      {
        "handle": "BLANG-ARIN",
        "url": "https://rdap.arin.net/registry/entity/BLANG-ARIN",
        "type": "entity",
        "whois_server": "whois.arin.net",
        "name": "Tomy BLANGIS",
        "address": [
          "550 Sherbrooke West\r",
          "West Tower,suite 250",
          "Montreal",
          "QC",
          "H3A 1B9",
          "Canada"
        ],
        "email": "[email protected]",
        "rir": "arin",
        "ind_name": [
          "BLANGIS",
          "Tomy"
        ]
      }
    ],
...

mzpqnxow commented on June 15, 2024

Thanks for the quick response

Absolutely very happy to accept pull requests that add sensible features. If it keeps within the style I've coded it in, which is somewhat verbose but easier to come back to after 18 months of not reading it and still understand what it's doing, all the better :)

Yep, no disagreement there

3. The library is well tested against network RDAP results, less so against regional ccTLD RDAP servers. Coverage seems reasonably comprehensive, but likely has the odd parsing issue with some of the more esoteric RDAP servers on the internet.

The good news for you is that it seems to do pretty well against my inputs. I haven't encountered any issues so far, and that includes regional RIRs in Canada, Japan and China.

Having taken a look at your sample output, probably the only change I'd suggest is making the address a string where possible. The mix of comma-split single lines as a one-element list vs. multi-element lists is messy

Good feedback. I think that may get a little tricky because of how unruly and inconsistent the address data can be, but I'll keep it in mind. I'll at least make it easy for you to choose which of 2 or 3 styles to keep when I send the PR in (format 1, format 2, raw, etc.)

Also stripping the errant \r and \n etc. would probably be neater.

Man, you've got good eyesight! I hadn't even noticed. I'll make sure it strips any other whitespace as well. As it is now, if an address element ended with a space (or newline, even) it would have ended up in there too. Oops!

In your example how does ind_name differ from the existing name field? What's the raw vcard data there?

I'll paste that in a separate comment in a moment

mzpqnxow commented on June 15, 2024

Unfortunately this is a pretty ugly one, and it's also somewhat exceptional as (if I read it correctly) it has a company/org in the "individual" field

Also, you mentioned not duplicating fields. I think I see what you mean in this example: the fn text is roughly equivalent to the n text, the only real difference being that the n text maintains structure. So there may be some choices here:

  1. Don't do it at all
  2. Do it, use the structured n field opportunistically, when fn is not present (not sure if that's common)
  3. Do it only when n and fn (after being normalized) are not roughly equivalent

(the above 3, with the caveat that the desire is to both keep the code relatively clean and simple, and the output reasonably simple/consistent)
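To make option 3 concrete, here is a minimal sketch of one way "roughly equivalent after being normalized" could be checked. The names normalize_name and names_roughly_equivalent are hypothetical, and the rule used (lowercase the text, ignore punctuation and word order) is an assumption for illustration rather than anything whoisit does.

```python
import re


def normalize_name(text):
    """Lowercase the text and return its words as a sorted tuple.

    Hypothetical helper: ignores punctuation, case and word order.
    """
    words = re.findall(r"\w+", text.lower())
    return tuple(sorted(words))


def names_roughly_equivalent(fn, n_parts):
    """Compare a vCard fn string with the structured n components."""
    # Empty n components contribute nothing once the parts are joined
    # and split into words again.
    return normalize_name(fn) == normalize_name(" ".join(n_parts))
```

Under this rule an fn of "Chris Hansell" and an n of ["Hansell", "Chris", "", "", ""] count as equivalent, so a separate structured field would only be emitted when the two genuinely differ.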

{
  "rdapConformance": [
    "nro_rdap_profile_0",
    "rdap_level_0",
    "cidr0",
    "arin_originas0"
  ],
  "notices": [
    {
      "title": "Terms of Service",
      "description": [
        "By using the ARIN RDAP/Whois service, you are agreeing to the RDAP/Whois Terms of Use"
      ],
      "links": [
        {
          "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
          "rel": "terms-of-service",
          "type": "text/html",
          "href": "https://www.arin.net/resources/registry/whois/tou/"
        }
      ]
    },
    {
      "title": "Whois Inaccuracy Reporting",
      "description": [
        "If you see inaccuracies in the results, please visit: "
      ],
      "links": [
        {
          "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
          "rel": "inaccuracy-report",
          "type": "text/html",
          "href": "https://www.arin.net/resources/registry/whois/inaccuracy_reporting/"
        }
      ]
    },
    {
      "title": "Copyright Notice",
      "description": [
        "Copyright 1997-2022, American Registry for Internet Numbers, Ltd."
      ]
    }
  ],
  "handle": "NET-104-130-80-36-1",
  "startAddress": "104.130.80.36",
  "endAddress": "104.130.80.39",
  "ipVersion": "v4",
  "name": "RACKS-8-1403032696587479",
  "type": "ASSIGNMENT",
  "parentHandle": "NET-104-130-0-0-1",
  "events": [
    {
      "eventAction": "last changed",
      "eventDate": "2014-06-17T15:19:03-04:00"
    },
    {
      "eventAction": "registration",
      "eventDate": "2014-06-17T15:19:03-04:00"
    }
  ],
  "links": [
    {
      "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
      "rel": "self",
      "type": "application/rdap+json",
      "href": "https://rdap.arin.net/registry/ip/104.130.80.36"
    },
    {
      "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
      "rel": "alternate",
      "type": "application/xml",
      "href": "https://whois.arin.net/rest/net/NET-104-130-80-36-1"
    },
    {
      "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
      "rel": "up",
      "type": "application/rdap+json",
      "href": "https://rdap.arin.net/registry/ip/104.130.0.0/16"
    }
  ],
  "entities": [
    {
      "handle": "C05103560",
      "vcardArray": [
        "vcard",
        [
          [
            "version",
            {},
            "text",
            "4.0"
          ],
          [
            "fn",
            {},
            "text",
            "Walmart International Real Estate Systems"
          ],
          [
            "adr",
            {
              "label": "Walmart\nInternational Division\n702 SW 8th Street\nBentonville\nAR\n72716\nUnited States"
            },
            "text",
            [
              "",
              "",
              "",
              "",
              "",
              "",
              ""
            ]
          ],
          [
            "kind",
            {},
            "text",
            "org"
          ]
        ]
      ],
      "roles": [
        "registrant"
      ],
      "links": [
        {
          "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
          "rel": "self",
          "type": "application/rdap+json",
          "href": "https://rdap.arin.net/registry/entity/C05103560"
        },
        {
          "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
          "rel": "alternate",
          "type": "application/xml",
          "href": "https://whois.arin.net/rest/org/C05103560"
        }
      ],
      "events": [
        {
          "eventAction": "last changed",
          "eventDate": "2014-06-17T15:19:02-04:00"
        },
        {
          "eventAction": "registration",
          "eventDate": "2014-06-17T15:19:02-04:00"
        }
      ],
      "entities": [
        {
          "handle": "RACKS-8",
          "vcardArray": [
            "vcard",
            [
              [
                "version",
                {},
                "text",
                "4.0"
              ],
              [
                "fn",
                {},
                "text",
                "Rackspace Hosting"
              ],
              [
                "adr",
                {
                  "label": "1 Fanatical Place\nWindcrest\nTX\n78218\nUnited States"
                },
                "text",
                [
                  "",
                  "",
                  "",
                  "",
                  "",
                  "",
                  ""
                ]
              ],
              [
                "kind",
                {},
                "text",
                "org"
              ]
            ]
          ],
          "roles": [
            "registrant"
          ],
          "links": [
            {
              "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
              "rel": "self",
              "type": "application/rdap+json",
              "href": "https://rdap.arin.net/registry/entity/RACKS-8"
            },
            {
              "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
              "rel": "alternate",
              "type": "application/xml",
              "href": "https://whois.arin.net/rest/org/RACKS-8"
            }
          ],
          "events": [
            {
              "eventAction": "last changed",
              "eventDate": "2017-09-12T08:57:09-04:00"
            },
            {
              "eventAction": "registration",
              "eventDate": "2010-03-29T13:32:31-04:00"
            }
          ],
          "entities": [
            {
              "handle": "ZR9-ARIN",
              "vcardArray": [
                "vcard",
                [
                  [
                    "version",
                    {},
                    "text",
                    "4.0"
                  ],
                  [
                    "adr",
                    {
                      "label": "5000 Walzem Rd\nSan Antonio\nTX\n78218\nUnited States"
                    },
                    "text",
                    [
                      "",
                      "",
                      "",
                      "",
                      "",
                      "",
                      ""
                    ]
                  ],
                  [
                    "fn",
                    {},
                    "text",
                    "com Rackspace"
                  ],
                  [
                    "n",
                    {},
                    "text",
                    [
                      "Rackspace",
                      "com",
                      "",
                      "",
                      ""
                    ]
                  ],
                  [
                    "kind",
                    {},
                    "text",
                    "individual"
                  ],
                  [
                    "email",
                    {},
                    "text",
                    "[email protected]"
                  ],
                  [
                    "tel",
                    {
                      "type": [
                        "work",
                        "voice"
                      ]
                    },
                    "text",
                    "+1-210-312-4000"
                  ]
                ]
              ],
              "roles": [
                "technical"
              ],
              "links": [
                {
                  "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
                  "rel": "self",
                  "type": "application/rdap+json",
                  "href": "https://rdap.arin.net/registry/entity/ZR9-ARIN"
                },
                {
                  "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
                  "rel": "alternate",
                  "type": "application/xml",
                  "href": "https://whois.arin.net/rest/poc/ZR9-ARIN"
                }
              ],
              "events": [
                {
                  "eventAction": "last changed",
                  "eventDate": "2021-10-29T08:28:45-04:00"
                },
                {
                  "eventAction": "registration",
                  "eventDate": "2000-02-08T16:10:38-05:00"
                }
              ],
              "status": [
                "validated"
              ],
              "port43": "whois.arin.net",
              "objectClassName": "entity"
            },
            {
              "handle": "IPADM17-ARIN",
              "vcardArray": [
                "vcard",
                [
                  [
                    "version",
                    {},
                    "text",
                    "4.0"
                  ],
                  [
                    "adr",
                    {
                      "label": "1 Fanatical Place\nSan Antonio\nTX\n78218\nUnited States"
                    },
                    "text",
                    [
                      "",
                      "",
                      "",
                      "",
                      "",
                      "",
                      ""
                    ]
                  ],
                  [
                    "fn",
                    {},
                    "text",
                    "IPADMIN"
                  ],
                  [
                    "org",
                    {},
                    "text",
                    "IPADMIN"
                  ],
                  [
                    "kind",
                    {},
                    "text",
                    "group"
                  ],
                  [
                    "email",
                    {},
                    "text",
                    "[email protected]"
                  ],
                  [
                    "tel",
                    {
                      "type": [
                        "work",
                        "voice"
                      ]
                    },
                    "text",
                    "+1-210-312-4000"
                  ]
                ]
              ],
              "roles": [
                "technical",
                "administrative"
              ],
              "links": [
                {
                  "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
                  "rel": "self",
                  "type": "application/rdap+json",
                  "href": "https://rdap.arin.net/registry/entity/IPADM17-ARIN"
                },
                {
                  "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
                  "rel": "alternate",
                  "type": "application/xml",
                  "href": "https://whois.arin.net/rest/poc/IPADM17-ARIN"
                }
              ],
              "events": [
                {
                  "eventAction": "last changed",
                  "eventDate": "2021-05-24T10:20:15-04:00"
                },
                {
                  "eventAction": "registration",
                  "eventDate": "2002-09-16T13:50:19-04:00"
                }
              ],
              "status": [
                "validated"
              ],
              "port43": "whois.arin.net",
              "objectClassName": "entity"
            },
            {
              "handle": "HANSE157-ARIN",
              "vcardArray": [
                "vcard",
                [
                  [
                    "version",
                    {},
                    "text",
                    "4.0"
                  ],
                  [
                    "adr",
                    {
                      "label": "5000 Walzem Rd\nSan Antonio\nTX\n78218\nUnited States"
                    },
                    "text",
                    [
                      "",
                      "",
                      "",
                      "",
                      "",
                      "",
                      ""
                    ]
                  ],
                  [
                    "fn",
                    {},
                    "text",
                    "Chris Hansell"
                  ],
                  [
                    "n",
                    {},
                    "text",
                    [
                      "Hansell",
                      "Chris",
                      "",
                      "",
                      ""
                    ]
                  ],
                  [
                    "kind",
                    {},
                    "text",
                    "individual"
                  ],
                  [
                    "email",
                    {},
                    "text",
                    "[email protected]"
                  ],
                  [
                    "tel",
                    {
                      "type": [
                        "work",
                        "voice"
                      ]
                    },
                    "text",
                    "+1-210-312-4000"
                  ]
                ]
              ],
              "roles": [
                "technical",
                "noc"
              ],
              "links": [
                {
                  "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
                  "rel": "self",
                  "type": "application/rdap+json",
                  "href": "https://rdap.arin.net/registry/entity/HANSE157-ARIN"
                },
                {
                  "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
                  "rel": "alternate",
                  "type": "application/xml",
                  "href": "https://whois.arin.net/rest/poc/HANSE157-ARIN"
                }
              ],
              "events": [
                {
                  "eventAction": "last changed",
                  "eventDate": "2021-07-16T10:11:37-04:00"
                },
                {
                  "eventAction": "registration",
                  "eventDate": "2015-09-11T14:42:42-04:00"
                }
              ],
              "status": [
                "validated"
              ],
              "port43": "whois.arin.net",
              "objectClassName": "entity"
            },
            {
              "handle": "ABUSE45-ARIN",
              "vcardArray": [
                "vcard",
                [
                  [
                    "version",
                    {},
                    "text",
                    "4.0"
                  ],
                  [
                    "adr",
                    {
                      "label": "5000 Walzem Rd\nSan Antonio\nTX\n78218\nUnited States"
                    },
                    "text",
                    [
                      "",
                      "",
                      "",
                      "",
                      "",
                      "",
                      ""
                    ]
                  ],
                  [
                    "fn",
                    {},
                    "text",
                    "Abuse Desk"
                  ],
                  [
                    "org",
                    {},
                    "text",
                    "Abuse Desk"
                  ],
                  [
                    "kind",
                    {},
                    "text",
                    "group"
                  ],
                  [
                    "email",
                    {},
                    "text",
                    "[email protected]"
                  ],
                  [
                    "tel",
                    {
                      "type": [
                        "work",
                        "voice"
                      ]
                    },
                    "text",
                    "+1-210-312-4000"
                  ]
                ]
              ],
              "roles": [
                "abuse"
              ],
              "links": [
                {
                  "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
                  "rel": "self",
                  "type": "application/rdap+json",
                  "href": "https://rdap.arin.net/registry/entity/ABUSE45-ARIN"
                },
                {
                  "value": "https://rdap.arin.net/registry/ip/104.130.80.36",
                  "rel": "alternate",
                  "type": "application/xml",
                  "href": "https://whois.arin.net/rest/poc/ABUSE45-ARIN"
                }
              ],
              "events": [
                {
                  "eventAction": "last changed",
                  "eventDate": "2021-09-30T09:36:15-04:00"
                },
                {
                  "eventAction": "registration",
                  "eventDate": "2002-09-25T15:10:56-04:00"
                }
              ],
              "status": [
                "validated"
              ],
              "port43": "whois.arin.net",
              "objectClassName": "entity"
            }
          ],
          "port43": "whois.arin.net",
          "objectClassName": "entity"
        }
      ],
      "port43": "whois.arin.net",
      "objectClassName": "entity"
    }
  ],
  "port43": "whois.arin.net",
  "status": [
    "active"
  ],
  "objectClassName": "ip network",
  "cidr0_cidrs": [
    {
      "v4prefix": "104.130.80.36",
      "length": 30
    }
  ],
  "arin_originas0_originautnums": []
}
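For readers unfamiliar with the layout of the raw data above, the following sketch walks the nested entities and pulls the properties out of each vcardArray. It is an illustration only, not whoisit's parser: the function names are hypothetical, and falling back to the adr "label" parameter when every adr component is empty is an assumption based on how the ARIN sample above is shaped.

```python
def iter_entities(obj):
    """Yield every entity in an RDAP response, including nested ones."""
    for entity in obj.get("entities", []):
        yield entity
        # Entities can contain further entities, as in the ARIN sample.
        yield from iter_entities(entity)


def vcard_properties(entity):
    """Return {property_name: value} for one entity's vcardArray.

    Hypothetical helper; keeps only the first occurrence of a property.
    """
    vcard = entity.get("vcardArray") or []
    # The array looks like ["vcard", [[name, params, type, value], ...]]
    if len(vcard) < 2:
        return {}
    props = {}
    for prop in vcard[1]:
        if len(prop) < 4:
            continue
        name, params, value = prop[0], prop[1], prop[3]
        # Assumption: when all adr components are empty strings, use the
        # free-text "label" parameter instead.
        if name == "adr" and isinstance(value, list) and not any(value):
            value = params.get("label", "")
        props.setdefault(name, value)
    return props
```

Calling vcard_properties on each entity from iter_entities(response) gives a flat view of fn, n, kind, adr and so on, which makes it easier to compare fn against n for every contact in a response.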

meeb commented on June 15, 2024

According to the RFC fn and n should be equivalent, just fn is formatted as a single string and n is multi-part, represented as a list in JSON:

https://www.w3.org/2002/12/cal/rfc2426.html#sec3.1.1

https://www.w3.org/2002/12/cal/rfc2426.html#sec3.1.2

I don't think there should be a situation where n is set but fn is not, but I'm fine with the idea of a fallback to ', '.join(n) if fn is not present in the data.
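A minimal sketch of that fallback, assuming fn is a string (or missing) and n is the list of name components from the vCard. The function name display_name is hypothetical. One small deviation from a bare ', '.join(n): empty components are dropped first, since joining them as-is would leave trailing ", , , " in the result.

```python
def display_name(fn=None, n=None):
    """Prefer fn; fall back to the joined n components if fn is absent.

    Hypothetical helper for illustration, not part of whoisit.
    """
    if fn and fn.strip():
        return fn.strip()
    # Keep only non-empty string components. Components that are
    # themselves lists are skipped in this simple sketch.
    parts = [p.strip() for p in (n or []) if isinstance(p, str) and p.strip()]
    return ", ".join(parts) if parts else None
```

When fn is present it is returned untouched apart from trimming, so the fallback only changes the output for records where fn is missing.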

The general aim of the library was to be logical for humans, so if the data is extracted in a way that reads as "this entity is an abuse contact, and their name is [person/company]", then it's working as intended.

I don't see anything we can really do for organisations or individuals in the wrong fields, they've literally put the individual down as "Rackspace Com" as first and last name in your example. In these sorts of situations the library should probably just return the data verbatim, it will be a bit incorrect, but the source data is incorrect.

mzpqnxow commented on June 15, 2024

According to the RFC fn and n should be equivalent, just fn is formatted as a single string and n is multi-part, represented as a list in JSON:

https://www.w3.org/2002/12/cal/rfc2426.html#sec3.1.1

https://www.w3.org/2002/12/cal/rfc2426.html#sec3.1.2

I don't think there should be a situation where n is set but fn is not, but I'm fine with the idea of a fallback to ', '.join(n) if fn is not present in the data.

Thank you for those references. For now, I'm going to add an assert on my local copy and see if I encounter a single case where this happens in practice (n is set, fn is not) - if I don't encounter anything in the next few months then I won't bother to clutter things

I don't see anything we can really do for organisations or individuals in the wrong fields, they've literally put the individual down as "Rackspace Com" as first and last name in your example. In these sorts of situations the library should probably just return the data verbatim, it will be a bit incorrect, but the source data is incorrect.

Yep, I agree 100% with this, especially now that I've been dealing with the wrangling of bad data almost daily for the past 2 years: you can either drive yourself nuts trying to accommodate bad data (and still end up with bad data) or just "let it be". It's arguably more correct in the original form anyway

Thanks again, I'll let you know if/when I send something in

meeb commented on June 15, 2024

Sounds good! Thanks for the discussion. As I said, feel free to submit a PR and I'll be happy to review it. You OK if I close this issue for now?
