
nager / nager.publicsuffix


.NET public suffix domain parser

License: MIT License

C# 100.00%
publicsuffix domain-checker domain-parser domain-name csharp domain-validation dotnet tld c-sharp domain-verifier

nager.publicsuffix's Introduction

Nager.PublicSuffix (PSL)

With so many different domain endings in use, it is hard to tell whether a domain name is valid. This project uses the Public Suffix List from publicsuffix.org, which tracks public suffixes ranging from common endings like .com or .org to multi-label ones like .co.uk. It checks domain names against this list and then splits each domain into three parts: the suffix (like .com), the main part (like google), and any subdomains (like www). The list itself is maintained on GitHub in the publicsuffix/list repository.

Changes to version 3

If you like the new version, give the project a ⭐ or become a sponsor.

  • Add support for .NET 8
  • Allow hot reload of rule data via the IRuleProvider
  • Optimize dependency injection support
  • Remove obsolete methods
  • Fix UriDomainNormalizer bug with http://
  • Clean up project structure
  • Improve code documentation
  • Add ILogger support for logging issues
  • Allow injecting an HttpClient into CachedHttpRuleProvider
  • Improve error handling for TLD domains
  • Simplify custom exception logic by using the default message of the exception class

Breaking changes

  • Some namespaces have changed
  • DomainInfo->Hostname renamed to DomainInfo->FullyQualifiedDomainName

Previous code for V2

The source code of version 2 can be found here

Use cases

  • Cookie restriction for browsers
  • Domain highlighting in the URL bar of browsers
  • DMARC E-Mail Security
  • Certificate requests (ACME)
  • Determining Valid Wildcard Certificates
  • Two-factor authentication (FIDO)

Parts of a Domain

Fully Qualified Domain Name (FQDN) | Top Level Domain (TLD) | Domain     | Subdomain
blog.google.com                    | com                    | google     | blog
22.cn                              | cn                     | 22         | -
www.volkswagen.de                  | de                     | volkswagen | www
www.amazon.co.uk                   | co.uk                  | amazon     | www
www.wikipedia.org                  | org                    | wikipedia  | www
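Once the public suffix of a name is known, the split shown in the table is mechanical. The label directly in front of the suffix is the domain, and anything further left is the subdomain. The following is a minimal C# sketch that assumes the suffix has already been looked up. `SplitDomain` is a hypothetical helper and is not part of the Nager.PublicSuffix API.

```csharp
using System;

// Hypothetical helper (not part of Nager.PublicSuffix): splits an FQDN into
// (Subdomain, Domain, Suffix) when the matching public suffix is already known.
(string? Subdomain, string Domain, string Suffix) SplitDomain(string fqdn, string suffix)
{
    var host = fqdn.ToLowerInvariant().TrimEnd('.');
    var tld = suffix.ToLowerInvariant().TrimStart('.');

    // The host must end with ".<suffix>", i.e. have at least one label in front of the suffix.
    if (!host.EndsWith("." + tld, StringComparison.Ordinal))
        throw new ArgumentException($"'{fqdn}' does not end with suffix '{suffix}'");

    var rest = host.Substring(0, host.Length - tld.Length - 1); // labels left of the suffix
    var lastDot = rest.LastIndexOf('.');

    // No further dot: the whole rest is the domain and there is no subdomain.
    if (lastDot < 0) return (null, rest, tld);
    return (rest.Substring(0, lastDot), rest.Substring(lastDot + 1), tld);
}

foreach (var (fqdn, suffix) in new[] { ("blog.google.com", "com"), ("22.cn", "cn"), ("www.amazon.co.uk", "co.uk") })
{
    var parts = SplitDomain(fqdn, suffix);
    Console.WriteLine($"{fqdn}: subdomain={parts.Subdomain ?? "-"}, domain={parts.Domain}, suffix={parts.Suffix}");
}
```

The hard part, finding the right suffix (for example co.uk rather than uk), is the job of the public suffix list lookup itself.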

NuGet

The package is available on NuGet

PM> install-package Nager.PublicSuffix

Features

  • High performance
  • CacheProvider
  • Async support

Code Examples

Analyze domain

var ruleProvider = new LocalFileRuleProvider("public_suffix_list.dat");
await ruleProvider.BuildAsync();

var domainParser = new DomainParser(ruleProvider);

var domainInfo = domainParser.Parse("sub.test.co.uk");
//domainInfo.Domain = "test";
//domainInfo.FullyQualifiedDomainName = "sub.test.co.uk";
//domainInfo.RegistrableDomain = "test.co.uk";
//domainInfo.Subdomain = "sub";
//domainInfo.TopLevelDomain = "co.uk";

Check if a domain is valid

var ruleProvider = new LocalFileRuleProvider("public_suffix_list.dat");
await ruleProvider.BuildAsync();

var domainParser = new DomainParser(ruleProvider);

var isValid = domainParser.IsValidDomain("sub.test.co.uk");

ASP.NET Integration

// after -> var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHttpClient(); //Required for CachedHttpRuleProvider
builder.Services.AddSingleton<ICacheProvider, LocalFileSystemCacheProvider>();
builder.Services.AddSingleton<IRuleProvider, CachedHttpRuleProvider>();
builder.Services.AddSingleton<IDomainParser, DomainParser>();

// after -> var app = builder.Build();
var ruleProvider = app.Services.GetService<IRuleProvider>();
if (ruleProvider != null)
{
    await ruleProvider.BuildAsync();
}

// minimal api
app.MapGet("/DomainInfo/{domain}", (string domain, IDomainParser domainParser) =>
{
    domain = HttpUtility.UrlEncode(domain);

    var domainInfo = domainParser.Parse(domain);
    return domainInfo;
})
.WithName("DomainInfo")
.WithOpenApi();

nager.publicsuffix's People

Contributors

fcallejon, louislouw, merijn040, nazgaul, phil-dobson-fh, phildobsonongithub, ronnykarlsson, tinohager


nager.publicsuffix's Issues

Dependency issue when upgrading from .NET 4.6.1 to 4.7.2+

Hi,

Love the utility you've created here.

Unfortunately when I'm attempting to upgrade my project from .NET 4.6.1 to a later release (e.g. v4.7.2 onwards), there seems to be a dependency issue with using this library.


I'm using v2.2.2 of Nager.PublicSuffix and keep getting the above error where Nager.PublicSuffix doesn't seem to be able to find my System.Net.Http library (which is v4.3.4) because it's looking for an older release.

Curious if this is a limitation you've come across or perhaps an implementation issue on my end.

I'm using the "Parse" method predominantly.

My build environment is:
Visual Studio 2019
Windows 10 Home Edition
.NET 4.6.1

Thanks in advance for your help.

Regards,
Sebastian

doesn't work with some http URLs

var url = "http://www.algida.hu/";
var domainParser = new DomainParser(new WebTldRuleProvider());
var parts = domainParser.Get(url);

parts is null

Strong assembly name required

Good day,
Great work on this project by the way. Thanks.
I have an issue running my code using a reference to the NuGet package. It complains that a strong assembly name is required. The error message says: Could not load file or assembly Nager.PublicSuffix. A strongly-named assembly is required. Is there a workaround for this?

.NET Standard support

Please add support for .NET Standard.

Warning NU1701 Package 'Nager.PublicSuffix 1.0.4' was restored using '.NETFramework,Version=v4.6.1' instead of the project target framework '.NETCoreApp,Version=v2.0'. This package may not be fully compatible with your project.

System.FormatException: 'Rule contains empty part'

Hi,

Using the default example provided (Loading data from web change cache config),
in line
WebTldRuleProvider.BuildAsync().GetAwaiter().GetResult();

I get the following exception

System.FormatException: 'Rule contains empty part'

This exception was originally thrown at this call stack:
Nager.PublicSuffix.TldRule.TldRule(string, Nager.PublicSuffix.TldRuleDivision)
Nager.PublicSuffix.TldRuleParser.ParseRules(System.Collections.Generic.IEnumerable)
Nager.PublicSuffix.TldRuleParser.ParseRules(string)
Nager.PublicSuffix.WebTldRuleProvider.BuildAsync()
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()

Thanks for any help,
Dimosthenis

Caching of WebTldRuleProvider

According to the https://publicsuffix.org/list/ page, the list should be downloaded no more than once a day:

If you wish to make your app download an updated list periodically, please use this URL and have your app download the list no more than once per day.

To honor this you really should implement some sort of caching strategy.
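Honoring that limit mostly comes down to checking the age of the cached copy before downloading again. The following is a minimal C# sketch. It assumes the cache is a local file whose last-write time marks the download time. The helper names and the cache file name are hypothetical and are not part of the library.

```csharp
using System;
using System.IO;

// Hypothetical helper: true when the cached copy is younger than maxAge, so no download is needed.
// publicsuffix.org asks for downloads "no more than once per day".
bool IsCacheFresh(DateTime lastDownloadUtc, DateTime nowUtc, TimeSpan maxAge) =>
    nowUtc - lastDownloadUtc < maxAge;

// Hypothetical file-based variant: treats the cache file's last-write time as the download time.
bool IsCacheFileFresh(string path, TimeSpan maxAge) =>
    File.Exists(path) && IsCacheFresh(File.GetLastWriteTimeUtc(path), DateTime.UtcNow, maxAge);

var cachePath = Path.Combine(Path.GetTempPath(), "public_suffix_list.cache"); // hypothetical location
if (IsCacheFileFresh(cachePath, TimeSpan.FromDays(1)))
    Console.WriteLine("Using cached list");
else
    Console.WriteLine("Cache missing or older than one day: download the list, then overwrite the cache file");
```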

UriNormalizer checks only for https

I see that in UriNormalizer there is a check:
if (!domain.Contains("https://")) { domain = string.Concat("https://", domain); }

This may lead to wrong behavior if the domain starts with "http://".

For instance, if I tried to get the details of "http://abc.com", then "https://http://abc.com" would be evaluated, which is not a valid URI and would cause it to return null.

Thanks for the library by the way.
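The fix this issue implies is to prepend a scheme only when the input has none, and then let `System.Uri` extract the host. The following minimal C# sketch illustrates that approach. It is not the library's actual `UriNormalizer` (or v3 `UriDomainNormalizer`) code, and `ExtractHost` is a hypothetical name.

```csharp
using System;

// Hypothetical normalizer: returns the host of a bare domain or a full URL, or null if unparsable.
string? ExtractHost(string input)
{
    var value = input.Trim();

    // Only add a scheme when the input does not already contain one ("http://", "https://", ...),
    // so "http://abc.com" is never turned into "https://http://abc.com".
    if (!value.Contains("://", StringComparison.Ordinal))
        value = "https://" + value;

    return Uri.TryCreate(value, UriKind.Absolute, out var uri) ? uri.Host : null;
}

Console.WriteLine(ExtractHost("http://abc.com"));            // abc.com
Console.WriteLine(ExtractHost("www.algida.hu"));             // www.algida.hu
Console.WriteLine(ExtractHost("https://sub.test.co.uk/a"));  // sub.test.co.uk
```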

co.az is parsing incorrectly

Hi. .az domains work well except co.az.
When you parse "xyz.co.az", the TLD is "az" and the registrable domain is "co.az", which is wrong.
Could you fix that?

Library is slow when it's used in bulk

I'm using this package on all domains on a DNS server. In my case there are about 3000 domains. It takes about 5 minutes to get the TLD from all of them. Is there any way to speed this up?

Trying to Get domain on ec2-34-206-8-177.compute-1.amazonaws.com throws exception

Trying to parse to get the domain from ec2-34-206-8-177.compute-1.amazonaws.com and getting an exception.
Using:
var domainParser = new DomainParser(new FileTldRuleProvider("public_suffix_list.dat"));
domainParser.Get("ec2-34-206-8-177.compute-1.amazonaws.com");

The exception is:
throws Exception of type 'Nager.PublicSuffix.ParseException' was thrown.
Unknown domain ec2-34-206-8-177.compute-1.amazonaws.com

"ec2-34-206-8-177.compute-1.amazonaws.com" was returned from a reverse DNS on the IP address. So it should be valid.

An Exception inside WebTldRuleProvider.LoadFromUrl() results in a corrupt cache file

When the HttpClient inside WebTldRuleProvider.LoadFromUrl() throws an exception, it is silently caught and the value "error" is returned.
The caller WebTldRuleProvider.BuildAsync() has no way to know this, and calls _cacheProvider.SetValueAsync() with "error" as the value.

The end result is a DomainParser that is initialized without any indication that something bad happened, but parses domains in the wrong way, as there are no rules in place (for example: amazon.co.uk -> co.uk)

In my opinion the exception inside WebTldRuleProvider.LoadFromUrl() should surface all the way up to the user.
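The following minimal C# sketch shows the behavior this issue asks for. Network errors propagate to the caller, and the downloaded text is validated before anything is written to the cache. It assumes the payload is the standard public_suffix_list.dat, which contains the `// ===BEGIN ICANN DOMAINS===` and `// ===END ICANN DOMAINS===` section markers. The helper names are hypothetical and are not taken from the library.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical check: throws instead of letting an invalid payload such as "error" reach the cache.
void EnsureLooksLikePublicSuffixList(string content)
{
    if (string.IsNullOrWhiteSpace(content) ||
        !content.Contains("// ===BEGIN ICANN DOMAINS===", StringComparison.Ordinal) ||
        !content.Contains("// ===END ICANN DOMAINS===", StringComparison.Ordinal))
    {
        throw new InvalidDataException("Downloaded data is not a valid public suffix list; cache not updated.");
    }
}

// Hypothetical download-then-cache flow: HttpClient exceptions are not swallowed,
// and the cache file is only written after validation succeeds.
async Task RefreshCacheAsync(HttpClient httpClient, string url, string cachePath)
{
    var content = await httpClient.GetStringAsync(url); // network errors surface to the caller
    EnsureLooksLikePublicSuffixList(content);
    await File.WriteAllTextAsync(cachePath, content);
}

try
{
    EnsureLooksLikePublicSuffixList("error");
}
catch (InvalidDataException ex)
{
    Console.WriteLine(ex.Message);
}
```

With this flow, a failed download leaves the previous cache file untouched instead of replacing it with an empty rule set.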

Breaking changes between 1.5.1 and 2.2.2?

According to SemVer, a new major version indicates breaking changes from the older version. I couldn't find any release notes. I'd like to know what I have to consider before upgrading the NuGet package in my project. Please provide release notes or a change log accessible from the project page.

Add information on breaking changes in v3

Thanks for an immensely helpful and awesome library!

Visual Studio notified me that this package had been updated to 3.0 from 2.4.0. Because of the major version jump I was particularly interested in making sure the changes didn't break anything.

The readme currently states:

The current stable development branch V2 can be found here
I am currently working on a new version. This includes some breaking changes.

I was unable to find a changelist or a detail of the breaking changes but from what I can gather, here are some that are obvious:

  • netstandard2.1, net6.0 or net8.0 is now required
  • Some namespaces have changed (affecting TldRuleParser and TldRule)
  • IEnumerables TldRules now use IRuleProvider
  • DomainInfo no longer has Hostname
  • Can no longer parse rules from a string/embedded resource

Not sure if this is a comprehensive list. The biggest breaking changes for me are the last two. So my questions are:

  1. Is there a replacement for Hostname or do we need to construct it ourselves? If so, which properties would make the equivalent?
  2. For parsing rules from a string/resource, I assume that an IRuleProvider needs to be created. Am I correct, and is there already one planned to be included?
  3. And lastly, would it be possible to get some information on upgrading from v2, or at least a list of the breaking changes since v3 has already been pushed as a release (vs pre-release) to Nuget?

Thanks again!

Public suffix list is misused, yielding incorrect results

I actually discovered this bug when using TldExtract and logged an issue in that repo. I'm posting the problem here as well because this library is (seemingly) actively maintained.

The problem is that the public suffix list is not a list of TLDs.

Repro code:

void Main()
{
    try
    {
        var parser = new DomainParser(new WebTldRuleProvider());
        Console.WriteLine(parser.Parse("https://s3.amazonaws.com"));
    }
    catch (Exception ex)
    {
        Console.WriteLine($"{ex.GetType().Name}: {ex.Message}");
    }
}

Console output:

ParseException: Domain is a TLD according publicsuffix

Access to the path '...\publicsuffixcache.dat' is denied in WebTldRuleProvider

When using the DomainParser class with WebTldRuleProvider and default settings, I am encountering the following error Access to the path 'C:\Windows\TEMP\publicsuffixcache.dat' is denied in the real server environment.

This problem can be overcome by the following two methods:

  1. Use a FileTldRuleProvider, so that the file can be saved and used under the root folder of the application.

  2. As the "Microsoft's Application Pool Identities Documentation" states, you can locate the publicsuffixcache.dat file and adjust its permissions by right-clicking the file and following Properties - Security - Edit - Add - Enter the object names to select (type Users and click Check Names) - OK. In other words, you have to grant permissions to Users.

I didn't like choosing the second option because I thought it would create security vulnerabilities (maybe I'm wrong) and it would be hard to manage with multiple servers. Likewise, I may not want to work with a local file, so I may not want to use FileTldRuleProvider.

In such a situation, I could not find how to overcome this problem. I will be glad if you help me. For now, I have basically solved the problem, but I am writing here to understand the source of the problem. Also, if someone is stuck like me, maybe it will be useful information.

Location of cached dat file

The cached file is downloaded to the current or active folder of the user. This can be problematic.
Can you put it in the users %Temp% folder? (Path.GetTempPath)

Why Some Domains Could Not be Parsed?

The domains in question:
"streaklinks.com"
"webflow.io"

I have a list of more than 100k domains, and they all work fine except the ones above! I'm not sure why these domains could not be parsed.


Unsure why certain domains are invalid

For example: in-berlin.de

Must be because of the hyphen, right? Are there rules defined somewhere that describe how this tool works? Unfortunately, I've come across a huge list of false negatives.

Rule provider not initialized breaks dependency injection principles

In order to make use of DomainParser the rule provider it consumes must be initialised by calling the Build() method or else an exception is thrown (DomainDataStructure is not available).

In order to call the Build() method in an ASP.NET context, some specific, non-standard code needs to be executed in the startup class, which, as a side effect, requires the dependencies to be registered as singletons. When rule providers are more advanced than the ones provided in this GitHub repo, for instance when databases are involved, this may lead to a far from optimal setup.

Why is the Build() method not called by the DomainParser class? That is, rather than throwing an exception when _ruleProvider.GetDomainDataStructure() returns null, why not call _ruleProvider.Build()?

I offer my help to discuss the design to see if it can be improved for more advance usage scenarios.
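The lazy-build idea raised in this issue can be sketched in plain C#. The rule-loading step is wrapped in a `Lazy<Task<T>>`, so the first lookup triggers the build and concurrent first callers share the same task. This is an illustration under those assumptions, not code from Nager.PublicSuffix. `LoadRulesAsync` and `IsPublicSuffixAsync` are hypothetical names, and the two-entry rule set is a stand-in.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var buildCount = 0;

// Hypothetical rule loader; a real provider might read a file, call a web API or query a database.
Task<HashSet<string>> LoadRulesAsync()
{
    Interlocked.Increment(ref buildCount);
    return Task.FromResult(new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "com", "co.uk" });
}

// The build runs once, on first use. ExecutionAndPublication ensures concurrent first
// callers get the same task instead of each starting their own build.
var rules = new Lazy<Task<HashSet<string>>>(LoadRulesAsync, LazyThreadSafetyMode.ExecutionAndPublication);

async Task<bool> IsPublicSuffixAsync(string name) => (await rules.Value).Contains(name);

Console.WriteLine(await IsPublicSuffixAsync("co.uk"));   // True
Console.WriteLine(await IsPublicSuffixAsync("com"));     // True
Console.WriteLine($"Rules built {buildCount} time(s)");  // Rules built 1 time(s)
```

One design caveat: `Lazy<T>` keeps a faulted task once the first build fails, so every later call sees the same failure. A production version would need an explicit retry or reset strategy.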

package .dll not copying to bin with msbuild despite dll.refresh file

Hi, while I know this probably isn't an issue with the package and more an issue with my setup, I was wondering whether somebody cleverer than me has any insight into what's happening, because I'm at a complete loss.

I recently got a new machine, and my CI build that previously built successfully is now failing. I've managed to reproduce the issue with a new project.

Steps to reproduce

  • install visual studio 2022 and visual studio 2022 build tools (literally everything ticked in both installations)
  • create new ASP.NET empty web site in c:\development\test
  • install Nager.PublicSuffix via NuGet and uninstall Microsoft.CodeDom.Providers.DotNetCompilerPlatform
  • add a new webform default.aspx and add some example code
using System;
using System.Web;
using Nager.PublicSuffix;
public partial class _Default : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        var host = HttpContext.Current.Request.Url.Host;
        var fileCache = new FileCacheProvider(cacheFileName: "testpublicsuffixcache.dat");
        var domainParser = new DomainParser(new WebTldRuleProvider(cacheProvider: fileCache));
        var domainInfo = domainParser.Parse(host);
        var regdom = domainInfo.RegistrableDomain;
    }
}
  • delete everything except nager.publicsuffix.dll.refresh in the /bin folder
  • add <assemblies><add assembly="netstandard, Version=2.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51" /></assemblies> to <system.web><compilation> in web.config
  • open cmd and run "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Current\Bin\MSBuild.exe" "C:\Development\test\test.sln" /verbosity:diagnostic

here's the msbuild log

thanks

.NET Framework 4.8.1 compatibility?

I just tried to update the package to 3.0, but VS is complaining about .NET 4.8.1 compatibility. 2.4 is working fine.

Could not install package 'Nager.PublicSuffix 3.0.0'. You are trying to install this package into a project that targets '.NETFramework,Version=v4.8.1', but the package does not contain any assembly references or content files that are compatible with that framework. For more information, contact the package author.

Example Load with custom cache time not correct

This seems to be incorrect: there is no cacheTimeToLive parameter on WebTldRuleProvider.

var webTldRuleProvider = new WebTldRuleProvider(cacheTimeToLive: new TimeSpan(10, 0, 0)); //cache data for 10 hours

Nager.PublicSuffix.ParseException: 'Domain is a TLD according publicsuffix'

I don't know what is wrong with this domain. I am getting the exception in the title; here is the code to reproduce the error:

var url = "https://s3.eu-central-1.amazonaws.com/radiobob.de/standalone-player-parabelritter/index.html";
var domainParser = new DomainParser(new WebTldRuleProvider());
var domainInfo = domainParser.Parse(url);
var RegistrableDomain = domainInfo.RegistrableDomain;

Wildcard TLDs not handled correctly?

I'm getting a ParseException for a domain like [something].compute-1.amazonaws.com when I call DomainParser.Get().

This seems to be caused by the TLD rule *.compute-1.amazonaws.com which contains 4 labels (including the wildcard character) and the domain also contains 4 parts. This results in the DomainParser thinking that the domain itself is a TLD due to this check:

//Domain is TLD
if (parts.Count == winningRule.LabelCount)

Or does the wildcard actually mean that the full string [something].compute-1.amazonaws.com is actually a TLD by itself?
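The matching algorithm published on publicsuffix.org answers this question. A wildcard rule such as `*.compute-1.amazonaws.com` matches any single label in the `*` position, so a four-label name like `ec2-34-206-8-177.compute-1.amazonaws.com` is itself a public suffix. Only names with at least one more label on the left have a registrable domain. The following C# sketch implements the core of that algorithm over a small hard-coded rule set. It handles normal, wildcard, and `!` exception rules. An exception rule prevails; otherwise the matching rule with the most labels wins, and the default rule is `*`. It is an illustration, not Nager.PublicSuffix's implementation.

```csharp
using System;
using System.Linq;

// Illustrative rule set in public_suffix_list.dat syntax; the real list has thousands of entries.
var rules = new[]
{
    "com", "amazonaws.com", "compute-1.amazonaws.com", "*.compute-1.amazonaws.com",
    "uk", "co.uk", "jp", "*.kawasaki.jp", "!city.kawasaki.jp",
};

// A rule matches when each of its labels equals the domain label at the same position
// counted from the right; "*" matches any single label.
bool Matches(string[] domainLabels, string[] ruleLabels)
{
    if (ruleLabels.Length > domainLabels.Length) return false;
    for (var i = 1; i <= ruleLabels.Length; i++)
    {
        if (ruleLabels[^i] != "*" && ruleLabels[^i] != domainLabels[^i]) return false;
    }
    return true;
}

string[] Labels(string host) => host.ToLowerInvariant().TrimEnd('.').Split('.');

// Returns the public suffix according to the prevailing rule.
string PublicSuffix(string host)
{
    var labels = Labels(host);
    var matching = rules.Where(r => Matches(labels, r.TrimStart('!').Split('.'))).ToList();

    // An exception rule ("!...") wins; otherwise the matching rule with the most labels;
    // if nothing matches, the implicit default rule "*" applies.
    var prevailing = matching.FirstOrDefault(r => r.StartsWith('!'))
        ?? matching.OrderByDescending(r => r.Split('.').Length).FirstOrDefault()
        ?? "*";

    // For an exception rule the public suffix is the rule minus its leftmost label.
    var suffixLength = prevailing.Split('.').Length - (prevailing.StartsWith('!') ? 1 : 0);
    return string.Join(".", labels.Skip(labels.Length - suffixLength));
}

// The registrable domain is the public suffix plus one more label, or null if there is none.
string? RegistrableDomain(string host)
{
    var labels = Labels(host);
    var suffixLength = PublicSuffix(host).Split('.').Length;
    return labels.Length > suffixLength
        ? string.Join(".", labels.Skip(labels.Length - suffixLength - 1))
        : null;
}

foreach (var name in new[] { "ec2-34-206-8-177.compute-1.amazonaws.com", "www.ec2-1.compute-1.amazonaws.com", "sub.test.co.uk", "www.city.kawasaki.jp" })
{
    Console.WriteLine($"{name}: suffix={PublicSuffix(name)}, registrable={RegistrableDomain(name) ?? "(none)"}");
}
```

Under these rules, `parts.Count == winningRule.LabelCount` for the first name is the expected outcome rather than a bug. The name has no registrable domain, and a parser has to report that explicitly, for example by returning null as `RegistrableDomain` does here.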

Returns null

Hi there! thx for your excellent work. :)

I am getting a null value for domainInfo on this situation:

var domainParser = new DomainParser(new WebTldRuleProvider());
var domainInfo = domainParser.Get("s3-us-west-2.amazonaws.com");

ParseException - Domain is a TLD according publicsuffix

Hi,

When try to parse the following URL:

string url = "http://instahuddle.en.aptoide.com";
var domainParser = new DomainParser(new WebTldRuleProvider());
var domainName = domainParser.Get(url);
var domain = domainName.RegistrableDomain;

The following exception in thrown:

((Nager.PublicSuffix.ParseException)ex).ErrorMessage
Domain is a TLD according publicsuffix

The url http://instahuddle.en.aptoide.com is fine and it should give for RegistrableDomain just aptoide.com

TLDs are now valid domains

Hi,
We use your public suffix library for domain parsing. In version 2, IsValidDomain returned false when given a TLD such as "uk" as input, but in V3 it returns true. Can we adjust a setting to get the old behaviour back, or is this just a bug?

I wrote a local (on my machine because I did not want to commit it) test to validate the behaviour:

public async Task ParserTest()
{
    var ruleProvider = new LocalFileRuleProvider("public_suffix_list.dat");
    await ruleProvider.BuildAsync();
    var parser = new DomainParser(ruleProvider);
    Assert.IsFalse(parser.IsValidDomain("uk"));
}

As a workaround, I now also parse the domain before calling IsValidDomain and return false if RegistrableDomain is null.

Incorrectly splits the *.blogspot.com

Any idea, why "sergueiko.blogspot.com" is wrongly split? E.g it gives me:

Domain: sergueiko
Hostname: sergueiko.blogspot.com
SubDomain: null
TLD: blogspot.com

Whereas I'd expect the Domain to be "blogspot".

Is it possible to not use the write permission on Windows when using the package?

When using the package, we got this error:

Title          | Description
Message        | "Access to the path 'C:\Windows\TEMP\publicsuffixcache.dat' is denied."
Exception type | System.UnauthorizedAccessException
Failed method  | Nager.PublicSuffix.FileCachePrivider + d_4.MoveNext

This happened because we don't allow our application to have write access to filesystems.

From what I investigated, it seems like the cache object is always being created when TldWeb object is passed to the constructor of DomainParser.

Is there any other way to use the DomainParser without creating a file for cache or not use cache at all?

Currently we are using this:

//cache data for 10 hours
var cacheProvider = new FileCacheProvider(cacheTimeToLive: new TimeSpan(10, 0, 0));
var webTldRuleProvider = new WebTldRuleProvider(cacheProvider: cacheProvider);

var domainParser = new DomainParser(webTldRuleProvider);

var isValid = webTldRuleProvider.CacheProvider.IsCacheValid();
if (!isValid)
{
    webTldRuleProvider.BuildAsync().GetAwaiter().GetResult(); //Reload data
}

var domainInfo = domainParser.Parse(post.Url);

Our goal is to get "youtube" from "www.youtube.com/somevalue", or "wikipedia" from "www.wikipedia.org/somevalue", for the urls in our objects.

Throws exception when entered a TLD

When I try to test a TLD (like "co.uk"), I get an exception. That exception doesn't show any signs of the cause, other than the message string "Domain is a TLD according publicsuffix". I'd like to distinguish between a TLD (or an otherwise too short domain name) being entered, and other technical trouble while downloading the file, parsing the rules etc. The first is a validation error, the second is a more general technical issue. These are presented to the user and logged differently.

How would I do that if not comparing the exception message text? Could you maybe add support to analyse TLDs directly and provide corresponding data in the DomainName class? Like the Domain property being empty.

BTW, the exception text would more correctly be: "Domain is a TLD according to publicsuffix."

eTLD support?

Please support eTLD+1, eTLD, and TLD.

Third-party cookie restrictions have started, so eTLD+1 is important.

I want to distinguish between eTLD and TLD.

Full Domain       | eTLD+1        | eTLD  | TLD
www.example.com   | example.com   | com   | com
www.example.ca.us | example.ca.us | ca.us | us
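Once the effective TLD (the public suffix) is known, the other two columns follow from it. eTLD+1 adds one label to the left, and the TLD is always just the rightmost label. The following is a minimal C# sketch that assumes the eTLD has already been determined from the public suffix list. `Describe` is a hypothetical helper name.

```csharp
using System;
using System.Linq;

// Hypothetical helper: derives eTLD+1 and TLD from a host name and its already-known eTLD (public suffix).
(string? ETldPlusOne, string ETld, string Tld) Describe(string host, string eTld)
{
    var labels = host.ToLowerInvariant().Split('.');
    var suffixLabels = eTld.ToLowerInvariant().Split('.').Length;

    // eTLD+1 needs one label in front of the public suffix; a bare suffix has none.
    string? eTldPlusOne = labels.Length > suffixLabels
        ? string.Join(".", labels.Skip(labels.Length - suffixLabels - 1))
        : null;

    // The TLD is always the rightmost label, however many labels the public suffix has.
    return (eTldPlusOne, eTld.ToLowerInvariant(), labels[^1]);
}

var row = Describe("www.example.ca.us", "ca.us");
Console.WriteLine($"{row.ETldPlusOne} | {row.ETld} | {row.Tld}"); // example.ca.us | ca.us | us
```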

Wrong behavior

This library has behavior that doesn't follow any specification. I thought about fixing it, but it's not a bug; it's bad design.
So I'll explain here how it should work; then it's up to you whether to follow the specifications or not.

  1. Public suffix != TLD
    The TLD is only the last label of a public suffix. The exceptions are some reserved special-use domains, which are not public:
    RFC 6761:
    .example: reserved for use in examples
    .invalid: reserved for use in obviously invalid domain names
    .localhost: reserved to avoid conflict with the traditional use of localhost as a hostname
    .test: reserved for use in tests
    RFC 6762:
    .local: for link-local host names that can be resolved via the Multicast DNS name resolution protocol
    RFC 7686:
    .onion: for the self-authenticating names of Tor hidden services

    But this library assumes that a public suffix and a TLD are equivalent, so domains like eu-west-2.elasticbeanstalk.com are treated as if they were TLDs.

  2. If you pass an internal intranet domain or an invalid domain, the library assumes that the last label is a TLD, so you can pass anything and get a nonsense TLD without any check.
    Example: if you pass foo.bar as the domain, you get .bar as the TLD and foo.bar as the registrable domain.

  3. ccSLDs are country-code second-level domains, according to https://en.wikipedia.org/wiki/Country_code_second-level_domain
    There is no way to detect a ccSLD since I can't find a public database, but they can be handled as public suffixes.

  4. https://publicsuffix.org/ is just an open-source database of TLDs, ccSLDs and other common public domains, used for example by cloud providers (Amazon, Google, Azure, ...).
    The database is very useful for checking registrable domains, but it's not a reliable source.
    In any case, it can't be used to identify TLDs.

  5. The database of TLDs is public and managed by ICANN through IANA, which provides a reliable source: https://data.iana.org/TLD/tlds-alpha-by-domain.txt
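The IANA file is plain text. It starts with a `#` comment line, followed by one upper-case TLD per line. The following minimal C# sketch assumes that format and uses a small inline excerpt instead of downloading the file. It checks whether the rightmost label of a name is a delegated TLD. The helper names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical parser for the tlds-alpha-by-domain.txt format:
// lines starting with '#' are comments, every other non-empty line is one TLD (upper case).
HashSet<string> ParseIanaTlds(string content) =>
    content.Split('\n')
        .Select(line => line.Trim())
        .Where(line => line.Length > 0 && !line.StartsWith('#'))
        .ToHashSet(StringComparer.OrdinalIgnoreCase);

// True when the rightmost label of the host is a TLD listed in the parsed file.
bool HasDelegatedTld(string host, HashSet<string> tlds) =>
    tlds.Contains(host.TrimEnd('.').Split('.')[^1]);

// Small inline excerpt standing in for the downloaded file; the real file lists every delegated TLD.
var sample = "# comment line\nCOM\nDE\nORG\nUK\n";
var tlds = ParseIanaTlds(sample);

Console.WriteLine(HasDelegatedTld("www.example.co.uk", tlds)); // True
Console.WriteLine(HasDelegatedTld("server.local", tlds));      // False
```

This check answers only "is the last label a real TLD?". Finding the registrable domain still needs the public suffix list.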

Maybe I'll create a new project that follows the specifications.
In any case, thanks for your hard work; I always appreciate anyone who works on open source.
