Manipulating XML Google Merchant Data with C# and LINQ
By FoxLearn 1/10/2025 8:21:06 AM
A Google Merchant product feed follows the RSS 2.0 standard, with some added properties in the xmlns:g="http://base.google.com/ns/1.0" namespace, which is used specifically for Google product data.
However, this feed can sometimes contain irrelevant, outdated, or inconsistent data, especially when generated by legacy systems or busy merchants. As a result, you often need to clean up and manipulate the data before importing it into your product database.
In this article, we’ll explore how to manipulate Google Merchant feed data with C# and LINQ to handle tasks like removing duplicates, filtering out-of-stock items, and even adding custom query parameters to product URLs.
Example Google Merchant Feed
A typical Google Merchant product feed might look like this:
    <?xml version="1.0" encoding="utf-8" ?>
    <rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
      <channel>
        <title>Google product feed</title>
        <link href="https://pentia.dk" rel="alternate" type="text/html"/>
        <description>Google product feed</description>
        <item>
          <g:id><![CDATA[1123432]]></g:id>
          <title><![CDATA[Some product]]></title>
          <link><![CDATA[https://pentia.dk]]></link>
          <g:description><![CDATA[description]]></g:description>
          <g:gtin><![CDATA[5712750043243446]]></g:gtin>
          <g:mpn><![CDATA[34432-00]]></g:mpn>
          <g:image_link><![CDATA[https://pentia.dk/someimage.jpg]]></g:image_link>
          <g:product_type><![CDATA[Home > Dresses > Maxi Dresses]]></g:product_type>
          <g:condition><![CDATA[new]]></g:condition>
          <g:availability><![CDATA[in stock]]></g:availability>
          <g:price><![CDATA[15.00 USD]]></g:price>
          <g:sale_price><![CDATA[10.00 USD]]></g:sale_price>
        </item>
      </channel>
    </rss>
This feed contains various product details, such as ID, title, description, and price, all encapsulated in an <item> element within the XML. Let’s look at how we can manipulate this data using C# and LINQ.
Load the XML Feed
The first step is to retrieve and load the XML feed into an XDocument object for manipulation. You can do this using an HTTP client to fetch the feed and then parse it into an XML document:
    using System;
    using System.Linq;
    using System.Net.Http;
    using System.Threading.Tasks;
    using System.Xml.Linq;

    // Wrapper class (the name is illustrative); C# methods must live inside a type
    public static class FeedProcessor
    {
        private static readonly HttpClient _httpClient = new HttpClient();

        public static async Task<string> GetFeed(string url)
        {
            using (var result = await _httpClient.GetAsync(url))
            {
                // Fail early instead of parsing an error page
                result.EnsureSuccessStatusCode();
                return await result.Content.ReadAsStringAsync();
            }
        }

        public static async Task Run()
        {
            // Get the RSS 2.0 XML data (await rather than block on .Result)
            string feedData = await GetFeed("https://url/thefeed.xml");

            // Convert the data into an XDocument
            var document = XDocument.Parse(feedData);

            // Specify the Google namespace
            XNamespace g = "http://base.google.com/ns/1.0";

            // Get all "item" nodes (a lazy query over the document)
            var items = document.Descendants("item");
        }
    }
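To make the role of the g namespace concrete, here is a minimal sketch that reads a few values from each item. It parses a small inline string (a trimmed-down copy of the sample feed, standing in for the downloaded data); the sampleFeed and products names are illustrative assumptions, not part of the original code.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Inline sample standing in for the downloaded feed (assumption for illustration).
string sampleFeed = @"<?xml version=""1.0"" encoding=""utf-8""?>
<rss version=""2.0"" xmlns:g=""http://base.google.com/ns/1.0"">
  <channel>
    <item>
      <g:id><![CDATA[1123432]]></g:id>
      <title><![CDATA[Some product]]></title>
      <g:price><![CDATA[15.00 USD]]></g:price>
    </item>
  </channel>
</rss>";

XDocument document = XDocument.Parse(sampleFeed);
XNamespace g = "http://base.google.com/ns/1.0";

// <title> has no namespace; <g:id> and <g:price> must be qualified with g.
var products = document.Descendants("item")
    .Select(item => new
    {
        Id = (string)item.Element(g + "id"),
        Title = (string)item.Element("title"),
        Price = (string)item.Element(g + "price")
    })
    .ToList();

foreach (var product in products)
{
    Console.WriteLine($"{product.Id}: {product.Title} ({product.Price})");
}
```

Casting an XElement to string returns null when the element is missing, which avoids the NullReferenceException that .Value would throw on an incomplete item.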
Manipulate the Feed Data
Once the XML feed is loaded into an XDocument, you can use LINQ to manipulate the feed’s data.
Here are some common operations you might need to perform:
Remove Duplicate Products
If the feed contains duplicate products with the same ID, you can group the items by product ID and remove every item except the first in each group:
    // Group items by ID, then remove every item after the first in each group
    items.GroupBy(node => node.Element(g + "id").Value)
        .SelectMany(group => group.Skip(1))
        .Remove();
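As a quick check of which items survive, the following sketch runs the same grouping on a tiny feed built in memory. The three items and their titles are made-up sample data.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace g = "http://base.google.com/ns/1.0";

// Build a tiny feed in memory with a duplicated ID (sample data, not a real feed).
var document = new XDocument(
    new XElement("rss",
        new XAttribute(XNamespace.Xmlns + "g", g.NamespaceName),
        new XElement("channel",
            new XElement("item", new XElement(g + "id", "1"), new XElement("title", "First")),
            new XElement("item", new XElement(g + "id", "1"), new XElement("title", "Duplicate")),
            new XElement("item", new XElement(g + "id", "2"), new XElement("title", "Second")))));

// Group by ID, skip the first item of each group, and remove the rest from the document.
document.Descendants("item")
    .GroupBy(item => (string)item.Element(g + "id"))
    .SelectMany(group => group.Skip(1))
    .Remove();

var remainingTitles = document.Descendants("item")
    .Select(item => (string)item.Element("title"))
    .ToList();

Console.WriteLine(string.Join(", ", remainingTitles)); // prints "First, Second"
```

The first occurrence in document order is the one that is kept, so if a later duplicate carries fresher data you would need to order each group before skipping.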
Remove Out-of-Stock Products
You can remove items that are marked as "out of stock" by filtering based on the <g:availability> element:
    // Remove every item whose availability is "out of stock"
    document.Descendants("item")
        .Where(item => (string)item.Element(g + "availability") == "out of stock")
        .Remove();
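Feeds from legacy systems do not always spell the value consistently, so an exact string comparison can miss items. Below is a minimal sketch of a more tolerant match; IsOutOfStock is a hypothetical helper and the three items are sample data.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace g = "http://base.google.com/ns/1.0";

// Hypothetical helper: true when the item's availability reads "out of stock",
// ignoring case and surrounding whitespace. Items without the element are kept.
bool IsOutOfStock(XElement item)
{
    string availability = (string)item.Element(g + "availability") ?? "";
    return string.Equals(availability.Trim(), "out of stock", StringComparison.OrdinalIgnoreCase);
}

// Sample data: one item in stock, one inconsistently formatted, one without availability.
var document = new XDocument(
    new XElement("rss",
        new XElement("channel",
            new XElement("item", new XElement(g + "id", "1"), new XElement(g + "availability", "in stock")),
            new XElement("item", new XElement(g + "id", "2"), new XElement(g + "availability", " Out Of Stock ")),
            new XElement("item", new XElement(g + "id", "3")))));

document.Descendants("item").Where(IsOutOfStock).Remove();

var remainingIds = document.Descendants("item")
    .Select(item => (string)item.Element(g + "id"))
    .ToList();

Console.WriteLine(string.Join(", ", remainingIds)); // prints "1, 3"
```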
Remove Products Not on Sale
To filter out products that don’t have a sale price, you can check whether the <g:sale_price> element is missing or empty:
    // The string cast yields null when <g:sale_price> is absent,
    // so missing and empty elements are both covered
    document.Descendants("item")
        .Where(item => string.IsNullOrWhiteSpace((string)item.Element(g + "sale_price")))
        .Remove();
Add Tracking Parameters to URLs
You may want to append tracking parameters like UTM tags to the product URLs. This can be done by iterating through each product and modifying the <link> element:
    foreach (var item in items)
    {
        var link = item.Element("link");
        if (link == null) continue;

        // Use "&" if the URL already has a query string, otherwise start one with "?"
        string separator = link.Value.Contains("?") ? "&" : "?";
        link.ReplaceNodes(new XCData(link.Value + separator + "utm_source=s&utm_medium=m&utm_campaign=c"));
    }
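Plain concatenation puts the tracking parameters after any #fragment in the URL, where they are no longer part of the query string. If your product URLs can contain fragments, a small helper along these lines may be safer. AppendQuery is a hypothetical name, and the URLs are examples only.

```csharp
using System;

// Hypothetical helper: appends extra query parameters to a URL while
// keeping an existing query string and any #fragment in the right place.
string AppendQuery(string url, string extraQuery)
{
    // Split off the fragment so the parameters are inserted before it.
    int hashIndex = url.IndexOf('#');
    string fragment = hashIndex >= 0 ? url.Substring(hashIndex) : "";
    string baseUrl = hashIndex >= 0 ? url.Substring(0, hashIndex) : url;

    // Use "&" if a query string already exists, otherwise start one with "?".
    string separator = baseUrl.Contains("?") ? "&" : "?";
    return baseUrl + separator + extraQuery + fragment;
}

Console.WriteLine(AppendQuery("https://pentia.dk/product", "utm_source=s"));
// prints "https://pentia.dk/product?utm_source=s"

Console.WriteLine(AppendQuery("https://pentia.dk/product?id=1#reviews", "utm_source=s"));
// prints "https://pentia.dk/product?id=1&utm_source=s#reviews"
```

Inside the loop you would then call link.ReplaceNodes(new XCData(AppendQuery(link.Value, "utm_source=s&utm_medium=m&utm_campaign=c"))).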
Modify Product Titles
If your feed contains used products, you might want to add the word “USED” to the title of each product:
    foreach (var item in items)
    {
        var title = item.Element("title");
        if (title == null) continue;

        // Keep the CDATA wrapper used by the rest of the feed
        title.ReplaceNodes(new XCData("USED " + title.Value));
    }
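If only some of the items are used, the prefix can be limited to those. The sketch below assumes the feed marks them with a <g:condition> value of "used", as in the sample feed's <g:condition> element; the two items are sample data.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace g = "http://base.google.com/ns/1.0";

// Sample data: one new item and one used item.
var document = new XDocument(
    new XElement("rss",
        new XElement("channel",
            new XElement("item", new XElement("title", "Lamp"), new XElement(g + "condition", "new")),
            new XElement("item", new XElement("title", "Chair"), new XElement(g + "condition", "used")))));

// Select only the items whose condition is "used"; ToList() takes a snapshot
// so the tree can be modified safely while looping.
var usedItems = document.Descendants("item")
    .Where(item => (string)item.Element(g + "condition") == "used")
    .ToList();

foreach (var item in usedItems)
{
    var title = item.Element("title");
    if (title == null) continue;
    title.ReplaceNodes(new XCData("USED " + title.Value));
}

var titles = document.Descendants("item")
    .Select(item => (string)item.Element("title"))
    .ToList();

Console.WriteLine(string.Join(", ", titles)); // prints "Lamp, USED Chair"
```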
Group Products by Type
If a particular product type has two or fewer products, you might want to move those products into an "Other" category:
    foreach (var group in items.GroupBy(node => node.Element(g + "product_type").Value))
    {
        // Product types with two or fewer items are merged into "Other"
        if (group.Count() <= 2)
        {
            foreach (var item in group)
            {
                item.Element(g + "product_type").ReplaceNodes(new XCData("Other"));
            }
        }
    }
Convert the Manipulated Data Back to XML
Once you've applied the necessary changes, you can convert the manipulated data back into an XML string. Note that XDocument.ToString() does not include the XML declaration (<?xml ... ?>) in its output:
    string convertedFeedData = document.ToString();
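If the system consuming the feed expects the XML declaration, one option is to prepend document.Declaration yourself. This is a minimal sketch using a tiny inline document; the variable names are illustrative.

```csharp
using System;
using System.Xml.Linq;

// Tiny inline document standing in for the manipulated feed (sample data).
var document = XDocument.Parse(
    @"<?xml version=""1.0"" encoding=""utf-8""?><rss version=""2.0""><channel /></rss>");

// ToString() serializes the tree only, without the declaration.
string withoutDeclaration = document.ToString();

// Prepend the declaration parsed from the source document.
// Declaration is null if the source had none, in which case nothing is prepended.
string withDeclaration = document.Declaration + Environment.NewLine + document.ToString();

Console.WriteLine(withDeclaration);
```

Writing to a file with document.Save(path) is another route, since Save emits the declaration for you.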
Manipulating Google Merchant XML data with C# and LINQ allows you to perform powerful data transformations in a minimal amount of code. Whether you need to remove duplicates, filter out-of-stock items, or add tracking parameters to URLs, C# and LINQ provide a flexible and efficient way to clean up and prepare your product feed for import into your database.