Creating multi-page gzip-compressed sitemap with sitemap index for Google Webmaster Tools

Andrew Zan, 28 Nov 2012, CPOL
This article is intended for people who plan to build a sitemap for a site with a large number of pages.

Introduction

A sitemap is an important part of your site. A single sitemap file may list at most 50,000 URLs, so large sites with more pages than that require a multi-page sitemap tied together by a sitemap index. Creating such a sitemap can be a challenge, especially when the script has to run on GoDaddy hosting, which is quite limiting.

Background

This article is intended for any audience that finds it useful. SQL knowledge is required in order to write the two SQL queries.

Using the code

The code is quite simple. The main idea is to calculate the total number of sitemap pages and to create one page at a time using a database offset (limiting the number of rows for each SQL request). Two files are included: generateSitemap.php and dbc.php. generateSitemap.php holds the logic, and dbc.php handles the database work. Two queries are needed: the first gets the total number of products designated for the sitemap, and the second returns the sitemap links for each product. Both queries are hardcoded in dbc.php.
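The paging arithmetic described above can be sketched in isolation. This is only an illustration: the helper names sitemapPageCount and sitemapPageOffset and the product count of 70,000 are made up for the example and do not appear in the article's files.

```php
<?php
// Hypothetical helpers, not part of dbc.php or generateSitemap.php.

// Number of sitemap files needed to hold $totalRecords links.
function sitemapPageCount(int $totalRecords, int $recordsPerFile): int
{
    if ($recordsPerFile < 1) {
        throw new InvalidArgumentException('recordsPerFile must be at least 1');
    }
    // Round up so that a partially filled last file is still counted.
    return (int) ceil($totalRecords / $recordsPerFile);
}

// Row offset to pass to "LIMIT ? OFFSET ?" for a zero-based page number.
function sitemapPageOffset(int $page, int $recordsPerFile): int
{
    return $page * $recordsPerFile;
}

$recordsPerFile = 30000;
$totalRecords = 70000; // made-up product count for the example

$pages = sitemapPageCount($totalRecords, $recordsPerFile);
for ($page = 0; $page < $pages; $page++) {
    echo "page $page: LIMIT $recordsPerFile OFFSET "
        . sitemapPageOffset($page, $recordsPerFile) . "\n";
}
```

With these numbers the loop covers three pages, the last of which holds the remaining 10,000 links.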

dbc.php:

<?php
class dbc {

    public $dbserver = 'SERVER';
    public $dbusername = 'USERNAME';
    public $dbpassword = 'PASSWORD';
    public $dbname = 'DATABASE NAME';

    function openDb() {
        try {
            $db = new PDO('mysql:host=' . $this->dbserver . ';dbname=' .
              $this->dbname . ';charset=utf8', $this->dbusername, $this->dbpassword);
        } catch (PDOException $e) {
            die("error, please try again");
        }
        return $db;
    }

    function getTotalProductsInDatabase($recordsPerSiteMapFile) {
        $query = "SELECT count(*) as cnt FROM products";
        $dba = $this->openDb();
        $stmt = $dba->prepare($query);
        $stmt->execute();
        $row = $stmt->fetch();
        $dba = null;
        unset($dba);
        unset($stmt);
        //return total number of sitemap pages, rounding up so a partial last page is counted
        return (int) ceil($row['cnt'] / $recordsPerSiteMapFile);
    }

    function getProductsForSitemapFileNumber($recordsPerSiteMapFile, $offset) {
        // query that returns 1 column that contains n-amount
        // of sitemap links - we will loop over them to create sitemap files.
        // query must end with: "limit ? OFFSET ?" - since we deal
        // with large amount of records, we need to partition our records into chunks
        $query = "(select product_links as description from products limit ? OFFSET ?)";
        $dba = $this->openDb();
        $stmt = $dba->prepare($query);
        $stmt->bindValue(1, $recordsPerSiteMapFile, PDO::PARAM_INT);
        $stmt->bindValue(2, $offset * $recordsPerSiteMapFile, PDO::PARAM_INT);
        $stmt->execute();
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
        $dba = null;
        unset($dba);
        unset($stmt);
        return $rows;
    }
}
?>

The code below is the core logic. In brief: get the current page number and, if none is specified, start with 0. Fetch the links for the current page from the database (using the offset). Loop over the links and add them to the sitemap. Save the sitemap to an XML file using gzip compression. Then redirect to the next sitemap page or, on the last page, finish by creating the sitemap index file.
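As a rough illustration of the build-and-compress step on its own, the sketch below turns a short list of URLs into a gzip-compressed sitemap page. The function name buildCompressedSitemap and the example.com URLs are assumptions made for this example, and it uses createElementNS and createTextNode rather than the calls in the article's listing. It also assumes the zlib extension is available, as gzencode requires.

```php
<?php
// Hypothetical helper: builds one sitemap page and returns it gzip-compressed.
function buildCompressedSitemap(array $urls): string
{
    $ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';

    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $urlset = $doc->appendChild($doc->createElementNS($ns, 'urlset'));

    foreach ($urls as $url) {
        $entry = $urlset->appendChild($doc->createElementNS($ns, 'url'));
        $loc = $entry->appendChild($doc->createElementNS($ns, 'loc'));
        // createTextNode escapes characters such as & when the XML is saved.
        $loc->appendChild($doc->createTextNode($url));
    }

    // Level 9 is the strongest (and slowest) gzip compression level.
    return gzencode($doc->saveXML(), 9);
}

// Made-up URLs for the example.
$compressed = buildCompressedSitemap([
    'http://www.example.com/first-product',
    'http://www.example.com/item?a=1&b=2',
]);

// The result could be written with file_put_contents('sitemap_0.xml.gz', $compressed);
// here it is decompressed again only to show the XML that was produced.
echo gzdecode($compressed);
```

Decompressing the result is a cheap way to confirm that the file a crawler will download is well-formed XML.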

generateSitemap.php:

<?php
ini_set('display_errors', true);
error_reporting(E_ALL);
ini_set('memory_limit', '-1');
set_time_limit(0);
require 'dbc.php';

$db = new dbc();

//number of records per sitemap file (no more than 50,000; the file size should not exceed 10 MB)
$recordsPerSiteMapFile = 30000;
$SERVER_NAME = "http://www.sitename.com/";
$rootPath = "/home/content/sitename.com/sitemaps/";
//the subdirectory sitemap is not declared here, it is hardcoded
$currentPage = getanyValue('page'); //get the page number, from the browser address. If no page specified, assumed 0; new start
$amountOfPages = ($db->getTotalProductsInDatabase($recordsPerSiteMapFile)); //how many total sitemap pages are there
header("Content-type: text/html; charset=utf-8");

//Start making the XML file for current sitemap page
$xmlDoc = new DOMDocument();
$root = $xmlDoc->appendChild(
        $xmlDoc->createElement("urlset"));
$tutTag = $root->appendChild(
                $xmlDoc->createAttribute("xmlns"))->appendChild(
        $xmlDoc->createTextNode("http://www.sitemaps.org/schemas/sitemap/0.9"));

//get records from the database for current sitemap offset
//each row contains a single column, "description"; its value goes into the sitemap
$currentSitemapPageRows = ($db->getProductsForSitemapFileNumber($recordsPerSiteMapFile, $currentPage));

//loop over each link and add it to the sitemap file
foreach ($currentSitemapPageRows as $key => $row) {
    $final_url = $SERVER_NAME . fixSymbols(getUrlFriendlyString($row['description']));
    $tutTag = $root->appendChild(
            $xmlDoc->createElement("url"));
    $tutTag->appendChild(
            $xmlDoc->createElement("loc", htmlspecialchars($final_url, ENT_QUOTES | ENT_XML1, 'UTF-8')));
    $tutTag->appendChild(
            $xmlDoc->createElement("priority", "0.5"));
}
//sitemap file name
$fname = "sitemap_" . $currentPage . ".xml.gz";

$xmlDoc->formatOutput = true;
$theOutput = gzencode($xmlDoc->saveXML(), 9);

//create archive with the sitemap page
file_put_contents($rootPath . $fname, $theOutput);

unset($xmlDoc);
unset($currentSitemapPageRows);
unset($theOutput);
unset($tutTag);

//If the current page is the last page, create the sitemap index file.
//Otherwise, create the next sitemap file (redirect to this script with the next sitemap page number)
if ($amountOfPages == $currentPage) {
    createSiteMapIndexFile($amountOfPages, $SERVER_NAME, $rootPath);
} else {
    ?>
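The body of createSiteMapIndexFile is not shown in the listing. A minimal sketch of what an index builder might look like follows, assuming the sitemap files are named sitemap_N.xml.gz as in the listing, that pages are numbered from 0 to pageCount - 1, and that the files are served from a /sitemaps/ directory. The function name buildSitemapIndexXml and the base URL are hypothetical.

```php
<?php
// Hypothetical stand-in for the index step; the article's own helper may differ.
function buildSitemapIndexXml(int $pageCount, string $baseUrl): string
{
    $ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';

    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $index = $doc->appendChild($doc->createElementNS($ns, 'sitemapindex'));

    // One <sitemap><loc>...</loc></sitemap> entry per generated file.
    for ($page = 0; $page < $pageCount; $page++) {
        $entry = $index->appendChild($doc->createElementNS($ns, 'sitemap'));
        $loc = $entry->appendChild($doc->createElementNS($ns, 'loc'));
        $loc->appendChild(
            $doc->createTextNode($baseUrl . 'sitemap_' . $page . '.xml.gz')
        );
    }

    return $doc->saveXML();
}

// Assumed public URL of the sitemap directory.
echo buildSitemapIndexXml(3, 'http://www.sitename.com/sitemaps/');
```

The returned string could then be saved next to the sitemap files and submitted to Google Webmaster Tools as the single entry point.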

Points of Interest

The script is useful for memory- and resource-limited environments. Each sitemap page is generated in its own request, so the time-consuming work runs in batches instead of one long run.
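One way to picture the batching is as a chain of requests, each handing over to the next page number. The sketch below only computes the next URL. The helper name nextBatchUrl and the script URL are assumptions, and it presumes pages are numbered from 0 to pageCount - 1.

```php
<?php
// Hypothetical helper: URL of the next batch request, or null when all pages are done.
function nextBatchUrl(string $scriptUrl, int $currentPage, int $pageCount): ?string
{
    $next = $currentPage + 1;
    if ($next >= $pageCount) {
        return null; // last page finished; the sitemap index can be written now
    }
    return $scriptUrl . '?' . http_build_query(['page' => $next]);
}

// Assumed location of the generator script.
$url = nextBatchUrl('http://www.sitename.com/generateSitemap.php', 0, 3);
if ($url !== null) {
    // In a web request this could be sent as: header('Location: ' . $url);
    echo $url, "\n";
}
```

Because each request handles only one page, no single request has to hold more than one page of links in memory.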

History

11/28/2012 - First release.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Andrew Zan
United States
Programming for fun using C#, Java, JSP, Servlets and PHP.

Last Updated 28 Nov 2012
Article Copyright 2012 by Andrew Zan