Louw.SitemapParser

.NET library for parsing sitemap files. See the official specification: https://www.sitemaps.org/protocol.html

Support for various sitemap types:

  • Parse Robots.txt to detect sitemaps
  • Index Sitemaps
  • Normal Sitemaps
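
To make the three inputs above concrete, here is a minimal, self-contained sketch of how they can be told apart according to the sitemaps.org protocol: robots.txt advertises sitemaps through `Sitemap:` lines, an index sitemap has a `<sitemapindex>` root element, and a normal sitemap has a `<urlset>` root element. The class and method names (`SitemapFormatSketch`, `FindSitemapsInRobotsTxt`, `ClassifyXml`) are hypothetical and exist only for this illustration; this is not how Louw.SitemapParser is implemented internally.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not part of Louw.SitemapParser.
public static class SitemapFormatSketch
{
    // Collects the absolute URLs found on "Sitemap:" lines of a robots.txt body.
    public static List<Uri> FindSitemapsInRobotsTxt(string robotsTxt)
    {
        const string prefix = "sitemap:";
        var result = new List<Uri>();
        foreach (var rawLine in robotsTxt.Split('\n'))
        {
            var line = rawLine.Trim(); // also removes a trailing '\r'
            if (!line.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                continue;

            var value = line.Substring(prefix.Length).Trim();
            if (Uri.TryCreate(value, UriKind.Absolute, out var uri))
                result.Add(uri);
        }
        return result;
    }

    // Returns "Index" for a <sitemapindex> root, "Items" for a <urlset> root,
    // and "Unknown" for anything else. Throws XmlException on malformed XML.
    public static string ClassifyXml(string xml)
    {
        var root = XDocument.Parse(xml).Root;
        switch (root?.Name.LocalName)
        {
            case "sitemapindex": return "Index";
            case "urlset": return "Items";
            default: return "Unknown";
        }
    }

    public static void Main()
    {
        var robots = "User-agent: *\nDisallow: /private\nSitemap: https://example.com/sitemap.xml\n";
        foreach (var uri in FindSitemapsInRobotsTxt(robots))
            Console.WriteLine(uri);

        var index =
            "<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">" +
            "<sitemap><loc>https://example.com/sitemap-1.xml</loc></sitemap>" +
            "</sitemapindex>";
        Console.WriteLine(ClassifyXml(index));
    }
}
```

The sketch compares only the local name of the root element, so it ignores the XML namespace; a stricter check could also compare against `http://www.sitemaps.org/schemas/sitemap/0.9`.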

FUTURE DEVELOPMENT ROADMAP:

#####NuGet

The package is available on NuGet: https://www.nuget.org/packages/Louw.SitemapParser

    Install-Package Louw.SitemapParser

#####Basic Example

    var sitemapLink = new Sitemap(new Uri("https://www.google.com/sitemap.xml"));
    var loadedSitemap = await sitemapLink.LoadAsync();

    if (loadedSitemap.SitemapType == SitemapType.Index)
        Debug.WriteLine($"Sitemap Index contains {loadedSitemap.Sitemaps.Count()} entries");
    else if (loadedSitemap.SitemapType == SitemapType.Items)
        Debug.WriteLine($"Sitemap contains {loadedSitemap.Items.Count()} content locations");

#####Load Sitemaps From Robots.txt Example

    var loader = new SitemapLoader();
    Sitemap robotSitemap = await loader.LoadFromRobotsTxtAsync(new Uri("https://www.google.com"));
    Assert.Equal(SitemapType.RobotsTxt, robotSitemap.SitemapType);
    Assert.NotEmpty(robotSitemap.Sitemaps); //We expect at least some Sitemaps to be in list
    Assert.Empty(robotSitemap.Items); //Robots.txt can only link to Sitemaps  (Not content items)

    Sitemap firstSitemap = robotSitemap.Sitemaps.First();
    Assert.False(firstSitemap.IsLoaded); //We only have sitemap location. Contents not yet loaded nor parsed

    var firstLoadedSitemap = await loader.LoadAsync(firstSitemap);
    Assert.True(firstLoadedSitemap.IsLoaded); //Now items are loaded!

    //We have to check type as we can either have links to other sitemaps (i.e. index sitemaps) 
    //-or- links to actual sitemap items (i.e. links to content)
    switch (firstLoadedSitemap.SitemapType)
    {
        case SitemapType.Index: Assert.NotEmpty(firstLoadedSitemap.Sitemaps); break;
        case SitemapType.Items: Assert.NotEmpty(firstLoadedSitemap.Items); break;
        default: throw new NotSupportedException($"SitemapType {firstLoadedSitemap.SitemapType} not expected here");
    }
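
Because a loaded sitemap can itself point to further sitemaps, the type check above extends naturally to a recursive walk. The following is a sketch only, built from the members used in the examples in this README (`SitemapLoader.LoadAsync`, `IsLoaded`, `SitemapType`, `Sitemaps`, `Items`). The `SitemapWalker` class and `CountItemsAsync` method are hypothetical names, the `Louw.SitemapParser` namespace is assumed from the package name, and the null check is a defensive assumption rather than documented behaviour.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Louw.SitemapParser; // assumed namespace, taken from the package name

// Hypothetical helper for illustration only; not part of the library.
public static class SitemapWalker
{
    // Counts the content items reachable from a sitemap,
    // following robots.txt and index sitemaps recursively.
    public static async Task<int> CountItemsAsync(SitemapLoader loader, Sitemap sitemap)
    {
        // Load the sitemap only if its contents have not been fetched yet.
        var loaded = sitemap.IsLoaded ? sitemap : await loader.LoadAsync(sitemap);
        if (loaded == null)
            return 0; // defensive assumption: treat a failed load as empty

        switch (loaded.SitemapType)
        {
            case SitemapType.Items:
                return loaded.Items.Count();

            case SitemapType.Index:
            case SitemapType.RobotsTxt:
                var total = 0;
                foreach (var child in loaded.Sitemaps)
                    total += await CountItemsAsync(loader, child);
                return total;

            default:
                return 0;
        }
    }
}
```

Usage would be along the lines of `await SitemapWalker.CountItemsAsync(loader, robotSitemap)`. The sketch has no depth limit or cycle detection, so a real crawler would probably want both, along with some throttling for large sites.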

#####More Examples

More examples can be found here: https://github.com/louislouw/Louw.SitemapParser/blob/master/test/Louw.SitemapParser.Examples/Examples.cs
