djiwandou / sitemapper-for-js

This project forked from comcast/sitemapper-for-js

Generate XML sitemaps for JS Websites (Supports Angular, React)

License: Apache License 2.0

JavaScript 100.00%

sitemapper-for-js's Introduction

SITEMAP GENERATOR FOR SPA (SINGLE PAGE APPLICATION)

About

Sitemaps are simple XML documents that list the links to all pages of a website. They give search engine crawlers additional information about each page, which helps the crawlers categorize the content and serve it to users based on their search keywords.

Most commonly used sitemap generators work well for multi-page websites built with PHP, ASP.NET, or other server-rendered technologies, because the browser reloads every time the user navigates to a different page. Websites that rely heavily on JavaScript (such as Angular, React, or Vue.js), on the other hand, do not reload during navigation; only the view changes. This makes it difficult for existing generators to discover the different routes and build a sitemap from them.

This sitemap generator, built with Puppeteer (the Node API for headless Chrome), works well with JavaScript-based websites when creating sitemaps.
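To make the target format concrete, here is a minimal sketch of turning a list of links into a sitemap document. The function names `escapeXml` and `buildSitemapXml` are hypothetical and are not taken from this repo; the sketch assumes the output follows the standard sitemaps.org `urlset` schema with one `loc` entry per URL.

```javascript
// Escape the characters that XML reserves, so URLs with query strings stay valid.
function escapeXml(value) {
    return value
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&apos;');
}

// Hypothetical helper: build a minimal sitemap with one <url><loc> entry per link.
function buildSitemapXml(urls) {
    const entries = urls
        .map((url) => `  <url>\n    <loc>${escapeXml(url)}</loc>\n  </url>`)
        .join('\n');
    return '<?xml version="1.0" encoding="UTF-8"?>\n' +
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
        entries + '\n' +
        '</urlset>\n';
}

console.log(buildSitemapXml([
    'https://www.xfinity.com/mobile',
    'https://www.xfinity.com/mobile/plan'
]));
```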

Setup & Configuration

npm install

To start generating

npm start

config.js

module.exports  = {
    base: 'https://www.xfinity.com', // website url
    urls: [ // list of pages you want to crawl
        'https://www.xfinity.com/mobile', 
        'https://www.xfinity.com/mobile/plan',
        'https://www.xfinity.com/mobile/byod',
        'https://www.xfinity.com/mobile/support',
        'https://www.xfinity.com/mobile/shop?category=device',
        'https://www.xfinity.com/mobile/shop?category=accessories'
    ],
    strictPresence: 'www.xfinity.com/mobile/', // url will be added to xml only if this exists
    ignoreStrings: [ // ignore any url that has these texts
        'img.xfinity',
        'styles.',
        'm.me'
    ],
    autoCrawl: false, // enable recursive crawling
    crawlLevel: 0, // recursively crawl pages up to 'x' levels deep
    pageLoad: { // page load configuration
        waitUntil: 'networkidle0',
        timeout: 3000000
    },
    disableHashRoutes: false, // ignore routes that contain a hash (#)
    sortBy: 'asc' // 'asc' | 'dsc' | 'none'
}
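The sortBy option is not described further in this README. As a rough sketch, assuming it orders the collected URLs alphabetically before they are written out, the behaviour could look like this. `sortUrls` is a hypothetical name, not a function from this repo.

```javascript
// Hypothetical helper: order URLs according to the sortBy setting.
// Assumption: 'asc' and 'dsc' mean plain string ordering; 'none' keeps discovery order.
function sortUrls(urls, sortBy) {
    const sorted = [...urls]; // copy so the caller's array is not mutated
    if (sortBy === 'asc') {
        sorted.sort();
    } else if (sortBy === 'dsc') {
        sorted.sort().reverse();
    }
    return sorted;
}

console.log(sortUrls([
    'https://www.xfinity.com/mobile/plan',
    'https://www.xfinity.com/mobile/byod'
], 'asc'));
```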

Configurations

base

base: 'https://www.xfinity.com'

The website you want to create a sitemap for.

urls (only for manual crawling)

Array of URLs that you want to crawl. Links found on these pages are not crawled recursively in this mode.

strictPresence

A URL is added to the XML only if it contains this string.

ignoreStrings

List of strings to ignore: any link containing one of them is left out of the sitemap.
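The two filters above can be sketched as a single predicate. `shouldInclude` is a hypothetical helper, not the repo's actual code, and it assumes both options are plain substring checks. The `styles.css` and `/support` URLs are made up for illustration.

```javascript
// Hypothetical helper: decide whether a discovered link belongs in the sitemap.
function shouldInclude(url, config) {
    // strictPresence: keep the link only if it contains this string (skipped when unset).
    if (config.strictPresence && !url.includes(config.strictPresence)) {
        return false;
    }
    // ignoreStrings: drop the link if it contains any of the listed strings.
    const ignored = config.ignoreStrings || [];
    return !ignored.some((text) => url.includes(text));
}

const sampleConfig = {
    strictPresence: 'www.xfinity.com/mobile/',
    ignoreStrings: ['img.xfinity', 'styles.', 'm.me']
};

console.log(shouldInclude('https://www.xfinity.com/mobile/plan', sampleConfig));       // true
console.log(shouldInclude('https://www.xfinity.com/mobile/styles.css', sampleConfig)); // false
console.log(shouldInclude('https://www.xfinity.com/support', sampleConfig));           // false
```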

autoCrawl

Enable or disable the auto-crawling feature. Auto-crawling takes more time than a manual crawl; how much more depends largely on the complexity of the website.

crawlLevel (only for auto-crawl)

The number of child route levels to crawl when auto-crawling is enabled.

E.g.: assume the base URL is https://abc.com/.

crawlLevel=1 would crawl URLs in the pattern https://abc.com/<any-path>
crawlLevel=2 would crawl URLs in the pattern https://abc.com/<any-path>/<any-path>
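One way to read the crawlLevel example is as a limit on the number of path segments below the base URL. The sketch below assumes the base sits at the site root and that a level includes all shallower levels. `pathDepth` and `withinCrawlLevel` are hypothetical names.

```javascript
// Count the non-empty path segments of a URL, e.g. https://abc.com/a/b has depth 2.
function pathDepth(url) {
    return new URL(url).pathname.split('/').filter(Boolean).length;
}

// Hypothetical check: a link is in range when its depth does not exceed crawlLevel.
function withinCrawlLevel(url, crawlLevel) {
    return pathDepth(url) <= crawlLevel;
}

console.log(withinCrawlLevel('https://abc.com/about', 1));      // true
console.log(withinCrawlLevel('https://abc.com/about/team', 1)); // false
console.log(withinCrawlLevel('https://abc.com/about/team', 2)); // true
```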

pageLoad

Page load settings, passed through as the options of Puppeteer's page.goto(). See:

https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagegotourl-options
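As a sketch of how pageLoad might be used, the helper below forwards it unchanged to `page.goto()` and then reads the anchors from the rendered page. `collectLinks` is a hypothetical name, and the sketch assumes `page` is a Puppeteer Page object supplied by the caller.

```javascript
// Hypothetical helper: load one URL with the pageLoad settings and return the links on it.
// `page` is expected to be a Puppeteer Page, e.g. the result of `await browser.newPage()`.
async function collectLinks(page, url, pageLoad) {
    // pageLoad (waitUntil, timeout) is passed straight through as the goto options.
    await page.goto(url, pageLoad);
    // Read the resolved href of every anchor once the app has rendered.
    return page.$$eval('a[href]', (anchors) => anchors.map((a) => a.href));
}
```

With Puppeteer installed, `page` would come from `puppeteer.launch()` followed by `browser.newPage()`. Taking the page as a parameter keeps the helper independent of how the browser is started.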

disableHashRoutes

Ignore any route that contains a #.

E.g., these URLs are skipped:

https://abc.com/#section2
https://abc.com/#/section2
https://abc.com/about#section4
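A minimal sketch of this filter, assuming a route counts as a hash route whenever its URL contains a # character. `isHashRoute` and `filterHashRoutes` are hypothetical names.

```javascript
// A URL is treated as a hash route if it contains '#' anywhere.
function isHashRoute(url) {
    return url.includes('#');
}

// Hypothetical helper: drop hash routes only when disableHashRoutes is true.
function filterHashRoutes(urls, disableHashRoutes) {
    return disableHashRoutes ? urls.filter((url) => !isHashRoute(url)) : urls;
}

console.log(filterHashRoutes([
    'https://abc.com/about',
    'https://abc.com/#section2',
    'https://abc.com/#/section2',
    'https://abc.com/about#section4'
], true)); // [ 'https://abc.com/about' ]
```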

License

This repo is licensed under Apache License 2.0.

Contributors

johnriv, suryalovesjs, dependabot[bot]
