How to generate large XML sitemaps from a database in C#
By FoxLearn 12/26/2024 7:00:01 AM
Many online tools, such as xml-sitemaps.com, can generate sitemaps, but they are typically crawler-based: the tool starts from a URL, follows the links it finds, and keeps traversing your site until it has built the sitemap.
The main issue with crawler-based tools is that they can put excessive strain on your web server. These tools might send many simultaneous requests to your server, overwhelming it and leading to performance degradation, or in worst-case scenarios, server crashes. Furthermore, the crawling process can take a long time, especially if your site is large, and may fail partway through, leaving the sitemap incomplete or corrupt.
Additionally, Google’s search bots already perform the task of crawling your site, and they do it more efficiently than any external crawler. If your website is pulling content from a database, it's highly likely that you have a better understanding of what pages should be included in the sitemap, making it unnecessary for an external tool to crawl your site.
I created a utility specifically designed to generate large XML sitemaps directly from a database query. The utility pulls data directly from the database, formats it as an XML sitemap, and saves it to files without making a single HTTP request.
It splits large sitemaps into multiple smaller files, with each file containing a manageable number of URLs. Google accepts up to 50,000 URLs per sitemap file, so the utility automatically creates a parent sitemap linking to child sitemaps.
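For reference, the parent file is a standard sitemap index as defined by sitemaps.org. With a base file name of SiteMap, the index the utility produces looks roughly like this (the locations shown are illustrative):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.foxlearn.com/SiteMap_1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.foxlearn.com/SiteMap_2.xml</loc>
  </sitemap>
</sitemapindex>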
The SiteMapWriter class allows you to specify the output directory, base file name, and maximum number of URLs per file. It generates the XML sitemap files from data retrieved from your database.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

class SiteMapWriter
{
    private const string XmlFormatString = @"<?xml version=""1.0"" encoding=""UTF-8""?>";
    private const string XmlStylesheetString = @"<?xml-stylesheet type=""text/xsl"" href=""sitemap.xsl""?>";
    private const string Xmlns = "http://www.sitemaps.org/schemas/sitemap/0.9";
    private const string XmlnsXsi = "http://www.w3.org/2001/XMLSchema-instance";
    private const string XsiSchemaLocation = "http://www.sitemaps.org/schemas/sitemap/0.9\nhttp://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd";

    private readonly int maxUrlsPerFile;          // Max URLs per child sitemap file
    private XmlTextWriter writer;                 // Writer for the current child sitemap
    private readonly XmlTextWriter parentWriter;  // Writer for the parent sitemap index
    private readonly HashSet<string> urls;        // Tracks URLs already written, to avoid duplicates
    private int numberOfFiles;
    private int numberOfUrls;
    private readonly string baseUri;
    private readonly string fileDirectory;
    private readonly string baseFileName;

    public SiteMapWriter(string fileDirectory, string baseFileName, string baseUri = "", int maxUrlsPerFile = 30000)
    {
        urls = new HashSet<string>();
        numberOfFiles = 1;
        numberOfUrls = 0;
        this.baseUri = baseUri;
        this.baseFileName = baseFileName;
        this.fileDirectory = fileDirectory;
        this.maxUrlsPerFile = maxUrlsPerFile;

        // Initialize the parent writer (the sitemap index file)
        string parentFilePath = Path.Combine(fileDirectory, $"{baseFileName}.xml");
        parentWriter = new XmlTextWriter(parentFilePath, Encoding.UTF8) { Formatting = Formatting.Indented };
        WriteXmlHeaders(parentWriter);
        parentWriter.WriteStartElement("sitemapindex");
        parentWriter.WriteAttributeString("xmlns", Xmlns);
        parentWriter.WriteAttributeString("xmlns:xsi", XmlnsXsi);
        parentWriter.WriteAttributeString("xsi:schemaLocation", XsiSchemaLocation);

        // Open the first child sitemap
        CreateUrlSet();
    }

    // Write the common XML headers (declaration and stylesheet)
    private void WriteXmlHeaders(XmlTextWriter writer)
    {
        writer.WriteRaw(XmlFormatString);
        writer.WriteRaw("\n");
        writer.WriteRaw(XmlStylesheetString);
    }

    public void AddUrl(string loc, double priority = 0.5, string changefreq = null)
    {
        // Skip URLs that have already been written
        if (urls.Contains(loc))
            return;

        writer.WriteStartElement("url");
        writer.WriteElementString("loc", loc);
        if (!string.IsNullOrEmpty(changefreq))
            writer.WriteElementString("changefreq", changefreq);
        writer.WriteElementString("priority", string.Format("{0:0.0000}", priority));
        writer.WriteEndElement();

        urls.Add(loc);
        numberOfUrls++;

        if (numberOfUrls % 2000 == 0)
            Console.WriteLine($"Urls Processed: {numberOfUrls}");

        // Roll over to a new child sitemap once the per-file limit is reached
        if (numberOfUrls >= maxUrlsPerFile)
            LimitIsMet();
    }

    private void LimitIsMet()
    {
        CloseWriter();
        numberOfFiles++;
        CreateUrlSet();
        numberOfUrls = 0;
    }

    private void CreateUrlSet()
    {
        // Open a new child sitemap file; it stays open until the limit is met or Finish() is called
        string filePath = Path.Combine(fileDirectory, $"{baseFileName}_{numberOfFiles}.xml");
        writer = new XmlTextWriter(filePath, Encoding.UTF8) { Formatting = Formatting.Indented };
        WriteXmlHeaders(writer);
        writer.WriteStartElement("urlset");
        writer.WriteAttributeString("xmlns", Xmlns);
        writer.WriteAttributeString("xmlns:xsi", XmlnsXsi);
        writer.WriteAttributeString("xsi:schemaLocation", XsiSchemaLocation);

        // Register the new file in the parent sitemap index
        AddSiteMapFile($"{baseFileName}_{numberOfFiles}.xml");
    }

    private void AddSiteMapFile(string filename)
    {
        parentWriter.WriteStartElement("sitemap");
        parentWriter.WriteElementString("loc", $"{baseUri}{filename}");
        parentWriter.WriteEndElement();
    }

    private void CloseWriter()
    {
        writer.WriteEndElement();
        writer.Flush();
        writer.Close();
    }

    public void Finish()
    {
        // Close the current child sitemap, then close out the parent index
        CloseWriter();
        parentWriter.WriteEndElement();
        parentWriter.Flush();
        parentWriter.Close();
    }
}
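Each call to AddUrl writes one url entry into the current child sitemap, so a child file produced by the class looks roughly like this (the URL shown is illustrative):

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="sitemap.xsl"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>http://www.foxlearn.com/how-to-generate-sitemaps/</loc>
    <changefreq>weekly</changefreq>
    <priority>0.5000</priority>
  </url>
</urlset>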
How to generate a sitemap in C#?
using System;
using System.Threading.Tasks;

class Program
{
    // SiteMapWriter is not thread-safe, so concurrent calls to AddUrl must be synchronized
    private static readonly object syncLock = new object();

    public static void Main(string[] args)
    {
        var writer = new SiteMapWriter(
            @"C:\SiteMapDirectory\", "SiteMap", "http://www.foxlearn.com/");

        // Pre-generate a list of GUIDs and use them to create unique URLs
        var urls = new string[300000];
        for (int i = 0; i < urls.Length; i++)
        {
            urls[i] = $"{Guid.NewGuid()}/index.html";
        }

        // Process URLs in parallel; the lock serializes the actual writes to the sitemap files
        Parallel.ForEach(urls, (url) =>
        {
            lock (syncLock)
            {
                writer.AddUrl(url);
            }
        });

        // Finalize the sitemap writing process
        writer.Finish();
    }
}
This code generates 300,000 URLs, each built from a random GUID. The utility automatically splits the output into multiple files (30,000 URLs per file by default), keeping each file well under the 50,000 URL limit imposed by Google.
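In a real application, you would replace the generated GUIDs with URLs built from your own data. Below is a minimal sketch of how AddUrl could be fed from a database query; the connection string, the Pages table, and the Slug column are placeholders for whatever your schema actually contains.

using System;
using System.Data.SqlClient;

class DatabaseSiteMapExample
{
    public static void GenerateFromDatabase()
    {
        var writer = new SiteMapWriter(
            @"C:\SiteMapDirectory\", "SiteMap", "http://www.foxlearn.com/");

        // Placeholder connection string and query; adjust both for your own database
        using (var connection = new SqlConnection("Server=.;Database=MyDatabase;Trusted_Connection=True;"))
        using (var command = new SqlCommand("SELECT Slug FROM Pages", connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Build each URL from the slug stored in the database
                    writer.AddUrl($"http://www.foxlearn.com/{reader.GetString(0)}", 0.5, "weekly");
                }
            }
        }

        // Close the child files and the parent sitemap index
        writer.Finish();
    }
}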
This custom utility offers an efficient way to generate large XML sitemaps directly from a database without the need for crawling. It significantly reduces server load and speeds up the sitemap generation process.