
iprodev / php-xml-sitemap-generator


PHP Script that generates a sitemap by crawling a given URL.

Home Page: http://iprodev.github.io/PHP-XML-Sitemap-Generator/

License: MIT License

PHP 100.00%

php-xml-sitemap-generator's Introduction

PHP XML Sitemap Generator

This is a simple and small PHP script that I wrote quickly for myself to create an XML sitemap of my page for Google and other search engines. Maybe others can use the script too.

Sitemap format: http://www.sitemaps.org/protocol.html

Features

  • Actually crawls webpages like Google would
  • Generates a separate XML file which is updated every time the script is executed (runnable via cron)
  • Awesome for SEO
  • Crawls faster than online services
  • Adaptable

Usage

Usage is pretty straightforward:

  • Configure the crawler by modifying the sitemap-generator.php file
    • Select URL to crawl
    • Select the file to which the sitemap will be saved
    • Select accepted extensions ("/" is mandatory for proper functionality)
    • Select change frequency (always, hourly, daily, weekly, monthly, yearly, or never)
    • Choose priority (it is relative to your other pages, so it may as well be 1)
  • Generate sitemap
    • Either send a GET request to this script or simply open it in your browser
    • A sitemap will be generated and displayed
    • Submit sitemap.xml to Google
    • Set up a cron job that requests this script periodically; this keeps the sitemap.xml file up to date
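
To make the change frequency and priority settings concrete, here is a minimal sketch of how they end up in a sitemaps.org `<url>` entry. The function names `sitemapEntry` and `buildSitemap` are hypothetical and are not taken from sitemap-generator.php; the script's own output code may differ.

```php
<?php
// Hypothetical helper (not from sitemap-generator.php): build one <url> element.
function sitemapEntry(string $loc, string $changefreq, float $priority): string
{
    // Escape &, <, > and quotes so the URL is valid inside XML.
    $loc = htmlspecialchars($loc, ENT_QUOTES | ENT_XML1, 'UTF-8');
    return "  <url>\n"
        . "    <loc>{$loc}</loc>\n"
        . "    <changefreq>{$changefreq}</changefreq>\n"
        . "    <priority>" . number_format($priority, 1, '.', '') . "</priority>\n"
        . "  </url>\n";
}

// Hypothetical helper: wrap all entries in the <urlset> root element.
function buildSitemap(array $urls, string $changefreq = 'weekly', float $priority = 1.0): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($urls as $url) {
        $xml .= sitemapEntry($url, $changefreq, $priority);
    }
    return $xml . "</urlset>\n";
}

echo buildSitemap(['http://www.iprodev.com/', 'http://www.iprodev.com/?a=1&b=2']);
```

Every URL gets the same change frequency and priority here, which matches the single global setting described above.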

The script can be run as a CLI script or as a web page. CLI is the preferred way to run it.

CLI scripts are started from the command line, can be used with cron, and so on. You start them with the php program.

CLI command to create the XML file: php sitemap-generator.php

To run the script through your web server as a web page, change line 22 of the script from

   define ('CLI', true);

to

   define ('CLI', false);

sitemap.xml

Add the XML file to your /robots.txt.

Example line for the robots.txt:

Sitemap: http://www.iprodev.com/sitemap.xml
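
If you would rather add that line from a script, the following is a small sketch. The helper name `withSitemapLine` is hypothetical; it works on the robots.txt content as a string, so reading and writing the file is left to you.

```php
<?php
// Hypothetical helper: return robots.txt content with a Sitemap line appended,
// unless that exact line is already present.
function withSitemapLine(string $robots, string $sitemapUrl): string
{
    $line = 'Sitemap: ' . $sitemapUrl;
    foreach (preg_split('/\R/', $robots) as $existing) {
        if (trim($existing) === $line) {
            return $robots; // already listed, nothing to do
        }
    }
    $robots = rtrim($robots, "\r\n");
    return ($robots === '' ? '' : $robots . "\n") . $line . "\n";
}

echo withSitemapLine("User-agent: *\nDisallow:\n", 'http://www.iprodev.com/sitemap.xml');
```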

Credits

PHP XML Sitemap Generator was created by Hemn Chawroka from iProDev. Released under the MIT license.

Included scripts:

php-xml-sitemap-generator's Issues

From txt file?

First of all, thank you for this lovely generator!

I have been trying this week to modify your generator: remove the Scanner function and add links from a txt file instead.
The generator would then take the URLs from alllinks.txt and write them out as a sitemap.
But I could not make it work myself. Have you tried something like this yourself? If so, could you add it to this project?

Thanks again!
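
A minimal sketch of that idea, independent of the generator's own code: read one URL per line from a text file and write the entries out as a sitemap. The function names `readUrlList` and `sitemapFromUrls` are hypothetical, and the file names alllinks.txt and sitemap.xml are simply the ones mentioned in this thread.

```php
<?php
// Hypothetical helper: read one URL per line, dropping blank lines,
// invalid URLs and duplicates.
function readUrlList(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $urls = [];
    foreach ($lines as $line) {
        $url = trim($line);
        if ($url !== '' && filter_var($url, FILTER_VALIDATE_URL) !== false) {
            $urls[$url] = true; // array keys de-duplicate
        }
    }
    return array_keys($urls);
}

// Hypothetical helper: turn the URL list into sitemap XML (only <loc> is written).
function sitemapFromUrls(array $urls): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($urls as $url) {
        $loc = htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8');
        $xml .= "  <url><loc>{$loc}</loc></url>\n";
    }
    return $xml . "</urlset>\n";
}

// Usage: only runs when the list file is present next to the script.
if (is_readable('alllinks.txt')) {
    file_put_contents('sitemap.xml', sitemapFromUrls(readUrlList('alllinks.txt')));
}
```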

Script goes into loop

I see that the project is a bit old, but it is a good one! I hope someone can help me with this issue. We are trying to use it in a school classroom project and I am stumped.

The sitemap generator works, but it gets hung on a URL like https://example.com/image-test-2.html?page=cookies

Then the loop begins like:

https://example.com/image-test-2.html?page=cookies?page=cookies
https://example.com/image-test-2.html?page=cookies?page=cookies?page=cookies
https://example.com/image-test-2.html?page=cookies?page=cookies?page=cookies?page=cookies
https://example.com/image-test-2.html?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies
https://example.com/image-test-2.html?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies
https://example.com/image-test-2.html?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies
https://example.com/image-test-2.html?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies
https://example.com/image-test-2.html?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies
https://example.com/image-test-2.html?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies?page=cookies

I have tried adding entries to the skip array, like:

$skip = array (
	"https://example.com/?page=cookies/",
	"?page=cookies",
	"uploaded_images/",
	"?",
	 );

Nothing helps. The script will not even skip over "uploaded_images/".

How can I prevent the script from scanning ?page= URLs?

I had to update simple_html_dom.php to the latest version so that it works with PHP 7.4.
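
The growing URLs in this report look like what happens when a query-only link (href="?page=cookies") is appended to the current page URL instead of being resolved against it. That is an assumption from the pattern alone, since the script's link handling is not shown here. Under that assumption, a sketch of two helpers that could be called before a URL is queued; both names are hypothetical, and how the script really matches its $skip array is not known.

```php
<?php
// Hypothetical helper: resolve a query-only link such as "?page=cookies"
// against the page it was found on, replacing any existing query string.
function resolveQueryLink(string $pageUrl, string $href): string
{
    if ($href === '' || $href[0] !== '?') {
        return $href; // not a query-only link; other cases are left to the crawler
    }
    // Cut the page URL at the first "?" or "#" so queries never stack up.
    $base = substr($pageUrl, 0, strcspn($pageUrl, '?#'));
    return $base . $href;
}

// Hypothetical helper: true if the URL contains any of the skip fragments.
function shouldSkip(string $url, array $skip): bool
{
    foreach ($skip as $fragment) {
        if ($fragment !== '' && strpos($url, $fragment) !== false) {
            return true;
        }
    }
    return false;
}

$page = 'https://example.com/image-test-2.html?page=cookies';
echo resolveQueryLink($page, '?page=cookies'), "\n";
var_dump(shouldSkip('https://example.com/uploaded_images/a.png', ['uploaded_images/']));
```

With the link resolved this way the crawler sees the same URL again rather than a longer one, so an ordinary "already visited" check can end the loop.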

Option/Suggestion, Keywords and Descriptions

Would it be possible to extend this script to collect the meta keywords and description from each page and list them in an XML file?
I know this is a sitemap generator, but it would be handy for SEO analysis on a large site.
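
As a rough sketch of what such an extension might look like, the function below pulls the keywords and description meta tags out of a page's HTML with PHP's built-in DOM extension. The name `extractMeta` is hypothetical, and wiring it into the crawler (which uses simple_html_dom.php) is not shown.

```php
<?php
// Hypothetical helper: return the "keywords" and "description" meta values
// of an HTML page, or empty strings when a tag is missing.
function extractMeta(string $html): array
{
    $meta = ['keywords' => '', 'description' => ''];
    if (trim($html) === '') {
        return $meta; // nothing to parse
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $name = strtolower($tag->getAttribute('name'));
        if (array_key_exists($name, $meta)) {
            $meta[$name] = trim($tag->getAttribute('content'));
        }
    }
    return $meta;
}

$html = '<html><head><meta name="Keywords" content="php, sitemap">'
    . '<meta name="description" content=" A sitemap generator "></head><body></body></html>';
print_r(extractMeta($html));
```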

HTTPS site: need to modify the GetUrl function

If your site uses HTTPS, like my site https://www.zhoulujun.com, the script does not work as it is.
You should modify the GetUrl function like this:
` function GetUrl ($url, $CA = true) {
	$agent = "Mozilla/5.0 (compatible; iProDev PHP XML Sitemap Generator/" . VERSION . ", https://www.zhoulujun.com)";
	$cacert = getcwd() . '/cacert.pem'; // CA root certificate (only used by the commented-out branch below)
	$SSL = substr($url, 0, 8) === "https://"; // true for HTTPS URLs
	$ch = curl_init();
	curl_setopt ($ch, CURLOPT_AUTOREFERER, true);
	curl_setopt ($ch, CURLOPT_URL, $url);
	curl_setopt ($ch, CURLOPT_USERAGENT, $agent);
	curl_setopt ($ch, CURLOPT_VERBOSE, 1);
	curl_setopt ($ch, CURLOPT_HEADER, 1);
	curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
	curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
	curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 5);
	// Note: the next two lines turn certificate verification off entirely.
	curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, false); // trust any certificate
	curl_setopt ($ch, CURLOPT_SSL_VERIFYHOST, 0); // do not check the host name in the certificate
	// if ($SSL && $CA) {
	// 	curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, true); // only trust certificates issued by a CA
	// 	curl_setopt ($ch, CURLOPT_CAINFO, $cacert); // CA root certificate (used to verify that the site's certificate was issued by a CA)
	// 	curl_setopt ($ch, CURLOPT_SSL_VERIFYHOST, 0); // 0 skips the host name check; 2 would require the certificate to match the host
	// } else if ($SSL && !$CA) {
	// 	curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, false); // trust any certificate
	// 	curl_setopt ($ch, CURLOPT_SSL_VERIFYHOST, 0); // do not check the host name in the certificate
	// }
	$data = curl_exec($ch);

	curl_close($ch);

	return $data;
}`

That works, but my site has ten thousand pages,
so the script fails with this error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes)
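
The report does not say where the memory goes, so the following is only one possible mitigation, not a diagnosis: write each sitemap entry to disk as soon as its URL is known instead of building the whole document in memory. The class name `StreamingSitemap` is hypothetical and is not part of the generator.

```php
<?php
// Hypothetical class: append sitemap entries to a file one at a time,
// so memory use does not grow with the size of the XML output.
final class StreamingSitemap
{
    /** @var resource */
    private $handle;

    public function __construct(string $path)
    {
        $handle = fopen($path, 'w');
        if ($handle === false) {
            throw new RuntimeException("Cannot open $path for writing");
        }
        $this->handle = $handle;
        fwrite($this->handle, '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
            . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n");
    }

    public function add(string $url): void
    {
        $loc = htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8');
        fwrite($this->handle, "  <url><loc>{$loc}</loc></url>\n");
    }

    public function close(): void
    {
        fwrite($this->handle, "</urlset>\n");
        fclose($this->handle);
    }
}

// Usage with a temporary file; a real run would pass the sitemap path instead.
$path = tempnam(sys_get_temp_dir(), 'sitemap');
$sitemap = new StreamingSitemap($path);
$sitemap->add('https://www.zhoulujun.com/');
$sitemap->close();
echo file_get_contents($path);
unlink($path);
```

The crawler still has to remember which URLs it has visited, so this reduces memory use but does not make it constant. Raising memory_limit in php.ini is the blunt alternative.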

Double entries

Hello,

I always get double entries of a domain like this:

http://domain.de
http://domain.de/
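
Both URLs point at the same page, so one way to avoid the duplicate is to normalize every URL before it is stored. Below is a minimal sketch under that assumption; `normalizeUrl` and `uniqueUrls` are hypothetical names, and where the generator keeps its URL list is not shown here.

```php
<?php
// Hypothetical helper: give "http://domain.de" and "http://domain.de/" the
// same form. User info and fragments are dropped; the query string is kept.
function normalizeUrl(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // leave anything we cannot parse untouched
    }
    $path = $parts['path'] ?? '';
    if ($path === '') {
        $path = '/'; // an empty path and "/" name the same resource
    }
    $normalized = strtolower($parts['scheme']) . '://' . strtolower($parts['host']);
    if (isset($parts['port'])) {
        $normalized .= ':' . $parts['port'];
    }
    $normalized .= $path;
    if (isset($parts['query'])) {
        $normalized .= '?' . $parts['query'];
    }
    return $normalized;
}

// Hypothetical helper: de-duplicate a list of URLs after normalization.
function uniqueUrls(array $urls): array
{
    $seen = [];
    foreach ($urls as $url) {
        $seen[normalizeUrl($url)] = true;
    }
    return array_keys($seen);
}

print_r(uniqueUrls(['http://domain.de', 'http://domain.de/']));
```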
