Manipulating XML Google Merchant Data with C# and LINQ

By FoxLearn 1/10/2025 8:21:06 AM
When working with data from the Google Merchant Center, you often receive a Google Product Feed, typically in XML format, containing product information.

This feed follows the RSS 2.0 standard but adds Google-specific elements, such as <g:id> and <g:price>, which live in the namespace declared by xmlns:g="http://base.google.com/ns/1.0".

However, this feed can sometimes contain irrelevant, outdated, or inconsistent data, especially when generated by legacy systems or busy merchants. As a result, you often need to clean up and manipulate the data before importing it into your product database.

In this article, we’ll explore how to manipulate Google Merchant feed data with C# and LINQ to handle tasks like removing duplicates, filtering out-of-stock items, and even adding custom query parameters to product URLs.

Example Google Merchant Feed

A typical Google Merchant product feed might look like this:

<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
    <channel>
        <title>Google product feed</title>
        <link href="https://pentia.dk" rel="alternate" type="text/html"/>
        <description>Google product feed</description>
        <item>
            <g:id><![CDATA[1123432]]></g:id>
            <title><![CDATA[Some product]]></title>
            <link><![CDATA[https://pentia.dk]]></link>
            <g:description><![CDATA[description]]></g:description>
            <g:gtin><![CDATA[5712750043243446]]></g:gtin>
            <g:mpn><![CDATA[34432-00]]></g:mpn>
            <g:image_link><![CDATA[https://pentia.dk/someimage.jpg]]></g:image_link>
            <g:product_type><![CDATA[Home > Dresses > Maxi Dresses]]></g:product_type>
            <g:condition><![CDATA[new]]></g:condition>
            <g:availability><![CDATA[in stock]]></g:availability>
            <g:price><![CDATA[15.00 USD]]></g:price>
            <g:sale_price><![CDATA[10.00 USD]]></g:sale_price>
        </item>
    </channel>
</rss>

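This feed contains various product details, such as ID, title, description, and price. Each product is wrapped in its own <item> element inside <channel>; the Google-specific fields carry the g: prefix, while <title> and <link> are plain RSS elements with no namespace. Let’s look at how we can manipulate this data using C# and LINQ.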
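Because the g: fields live in the Google namespace while <title> and <link> do not, the name you pass to Element() has to match exactly; otherwise LINQ to XML simply returns null. The following self-contained sketch, a .NET 6+ console program using top-level statements and a trimmed copy of the sample feed, reads one field of each kind:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// A trimmed copy of the sample feed above
const string feedXml = @"<?xml version=""1.0"" encoding=""utf-8"" ?>
<rss version=""2.0"" xmlns:g=""http://base.google.com/ns/1.0"">
  <channel>
    <item>
      <g:id><![CDATA[1123432]]></g:id>
      <title><![CDATA[Some product]]></title>
      <g:price><![CDATA[15.00 USD]]></g:price>
    </item>
  </channel>
</rss>";

XNamespace g = "http://base.google.com/ns/1.0";
XDocument document = XDocument.Parse(feedXml);
XElement item = document.Descendants("item").First();

// g:id lives in the Google namespace, so the name must be built as g + "id"
string? id = (string?)item.Element(g + "id");

// <title> has no prefix and the feed declares no default namespace, so a plain name works
string? title = (string?)item.Element("title");

// Asking for "id" without the namespace finds nothing
XElement? unqualified = item.Element("id");

Console.WriteLine($"{id}: {title} (unqualified match: {unqualified != null})");
// prints "1123432: Some product (unqualified match: False)"
```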

Load the XML Feed

The first step is to retrieve and load the XML feed into an XDocument object for manipulation. You can do this using an HTTP client to fetch the feed and then parse it into an XML document:

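using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class FeedProcessor
{
    // Share one HttpClient for the whole application instead of creating one per request
    private static readonly HttpClient _httpClient = new HttpClient();

    public static async Task<string> GetFeed(string url)
    {
        using (var response = await _httpClient.GetAsync(url))
        {
            // Throw on 4xx/5xx responses instead of trying to parse an error page as XML
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }

    public static async Task Run()
    {
        // Get the RSS 2.0 XML data (await it rather than blocking on .Result)
        string feedData = await GetFeed("https://url/thefeed.xml");

        // Convert the data into an XDocument
        var document = XDocument.Parse(feedData);

        // Specify the Google namespace used by the g: elements
        XNamespace g = "http://base.google.com/ns/1.0";

        // Get all <item> elements. The query is lazy, so each enumeration
        // reflects the document as it is at that moment.
        IEnumerable<XElement> items = document.Descendants("item");

        // ...the manipulation snippets below go here...
    }
}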
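Reading the whole response into a string works well for typical feeds. For very large feeds you may prefer to parse straight from the response stream; XDocument.LoadAsync(Stream, LoadOptions, CancellationToken), available on .NET Core 2.0 and later, makes that possible. LoadFeedAsync and DownloadFeedAsync below are hypothetical helper names, and the demo feeds a MemoryStream instead of a live HTTP response so it runs offline. Note that the XDocument still holds the entire feed in memory; this only avoids the intermediate string copy.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.Xml.Linq;

// Hypothetical helper: parses a feed directly from a stream, without an intermediate string
static Task<XDocument> LoadFeedAsync(Stream source, CancellationToken cancellationToken = default) =>
    XDocument.LoadAsync(source, LoadOptions.None, cancellationToken);

// Hypothetical helper: downloads the feed and hands the response body straight to the parser.
// Not called in the offline demo; pass in your shared HttpClient when using it.
static async Task<XDocument> DownloadFeedAsync(HttpClient client, string url, CancellationToken cancellationToken = default)
{
    // ResponseHeadersRead returns once the headers arrive instead of buffering the whole body
    using var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead, cancellationToken);
    response.EnsureSuccessStatusCode();
    await using Stream body = await response.Content.ReadAsStreamAsync();
    return await LoadFeedAsync(body, cancellationToken);
}

// Offline demo: a MemoryStream stands in for the HTTP response body
string sample = @"<rss version=""2.0"" xmlns:g=""http://base.google.com/ns/1.0""><channel>
<item><g:id>1</g:id></item>
<item><g:id>2</g:id></item>
</channel></rss>";

using var input = new MemoryStream(Encoding.UTF8.GetBytes(sample));
XDocument document = await LoadFeedAsync(input);
int itemCount = document.Descendants("item").Count();
Console.WriteLine($"Loaded {itemCount} items"); // prints "Loaded 2 items"
```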

Manipulate the Feed Data

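Once the XML feed is loaded into an XDocument, you can use LINQ to manipulate the feed’s data. The snippets below continue inside Run() and reuse the document, g, and items variables declared there.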

Here are some common operations you might need to perform:

Remove Duplicate Products

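If the feed contains duplicate products with the same ID, you can group the items by <g:id> and remove every copy after the first one:

// GroupBy keeps items in document order, so Skip(1) leaves the first occurrence in place.
// Assumes every <item> has a <g:id>; .Value throws on a missing element.
// Remove() copies the sequence to a list before detaching the nodes from the document.
items.GroupBy(node => node.Element(g + "id").Value)
    .SelectMany(group => group.Skip(1))
    .Remove();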
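To see which copy survives, here is the same query run on a small hypothetical feed built in memory (the Item helper is also hypothetical). GroupBy preserves the original order of elements within each group, so Skip(1) keeps the first <item> in document order and removes the later duplicates:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

XNamespace g = "http://base.google.com/ns/1.0";

// Hypothetical helper that builds a minimal <item>
static XElement Item(XNamespace ns, string id, string title) =>
    new XElement("item", new XElement(ns + "id", id), new XElement("title", title));

// Hypothetical sample: product 1 appears twice
var document = new XDocument(new XElement("rss", new XElement("channel",
    Item(g, "1", "First copy"),
    Item(g, "2", "Other product"),
    Item(g, "1", "Second copy"))));

IEnumerable<XElement> items = document.Descendants("item");

// Same query as above: keep the first <item> per g:id, detach the rest
items.GroupBy(node => node.Element(g + "id")!.Value)
    .SelectMany(group => group.Skip(1))
    .Remove();

// items is lazy, so this re-reads the document after the removal
List<string> remaining = items.Select(node => node.Element("title")!.Value).ToList();
Console.WriteLine(string.Join(", ", remaining));
// prints "First copy, Other product"
```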

Remove Out-of-Stock Products

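You can remove items that are marked as "out of stock" by filtering on the <g:availability> element. Store the matches in their own variable rather than reassigning items; otherwise items would afterwards refer to the removed elements, and the later loops would modify nodes that are no longer part of the document:

// Collect the matching <item> elements first, then detach them from the document
var outOfStock = document.Descendants("item")
    .Where(item => (string?)item.Element(g + "availability") == "out of stock")
    .ToList();
outOfStock.Remove();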
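An exact comparison with "out of stock" misses values that differ only in case, surrounding spaces, or an underscore spelling such as "out_of_stock", which inconsistent feeds can contain. Here is a sketch of a more forgiving check, assuming those variants should all count as out of stock; IsOutOfStock and the sample data are hypothetical, and items without the field are kept:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

XNamespace g = "http://base.google.com/ns/1.0";

// Hypothetical helper: ignores case, surrounding spaces, and "_" versus " "
static bool IsOutOfStock(XElement item, XNamespace ns)
{
    string? value = (string?)item.Element(ns + "availability");
    if (value == null) return false; // no availability field: keep the item
    return value.Trim().Replace('_', ' ').ToLowerInvariant() == "out of stock";
}

// Hypothetical sample with inconsistent spellings and one item lacking the field
var document = new XDocument(new XElement("rss", new XElement("channel",
    new XElement("item", new XElement(g + "id", "1"), new XElement(g + "availability", "in stock")),
    new XElement("item", new XElement(g + "id", "2"), new XElement(g + "availability", "Out of Stock")),
    new XElement("item", new XElement(g + "id", "3"), new XElement(g + "availability", " out_of_stock ")),
    new XElement("item", new XElement(g + "id", "4")))));

// Collect the matches in their own list, then detach them
List<XElement> outOfStock = document.Descendants("item").Where(item => IsOutOfStock(item, g)).ToList();
outOfStock.Remove();

var keptIds = document.Descendants("item").Select(item => (string?)item.Element(g + "id")).ToList();
Console.WriteLine(string.Join(", ", keptIds));
// prints "1, 4"
```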

Remove Products Not on Sale

To filter out products that don’t have a sale price, remove the items whose <g:sale_price> element is missing or empty. Casting a missing element to string yields null, so both cases are caught:

// Matches items with no <g:sale_price> element as well as items with an empty one
var notOnSale = document.Descendants("item")
    .Where(item => string.IsNullOrWhiteSpace((string?)item.Element(g + "sale_price")))
    .ToList();
notOnSale.Remove();
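An emptiness check still keeps items whose <g:sale_price> is present but not actually lower than <g:price>. The sketch below assumes prices use the "15.00 USD" format from the sample feed (an amount, a space, and a currency code) and keeps only items with a strictly lower sale price in the same currency. TryParsePrice, IsOnSale, and the sample data are hypothetical:

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

XNamespace g = "http://base.google.com/ns/1.0";

// Hypothetical helper: splits "15.00 USD" into an amount and a currency code
static bool TryParsePrice(string? text, out decimal amount, out string currency)
{
    amount = 0m;
    currency = "";
    if (string.IsNullOrWhiteSpace(text)) return false;

    string[] parts = text.Trim().Split(' ', StringSplitOptions.RemoveEmptyEntries);
    if (parts.Length != 2) return false;

    currency = parts[1];
    // InvariantCulture so "15.00" parses the same way regardless of the server's locale
    return decimal.TryParse(parts[0], NumberStyles.Number, CultureInfo.InvariantCulture, out amount);
}

// Hypothetical helper: on sale only when both prices parse, share a currency, and sale < regular
static bool IsOnSale(XElement item, XNamespace ns) =>
    TryParsePrice((string?)item.Element(ns + "price"), out decimal price, out string priceCurrency)
    && TryParsePrice((string?)item.Element(ns + "sale_price"), out decimal sale, out string saleCurrency)
    && priceCurrency == saleCurrency
    && sale < price;

// Hypothetical sample: a real discount, a "sale" at full price, and no sale price at all
var document = new XDocument(new XElement("rss", new XElement("channel",
    new XElement("item", new XElement(g + "id", "1"),
        new XElement(g + "price", "15.00 USD"), new XElement(g + "sale_price", "10.00 USD")),
    new XElement("item", new XElement(g + "id", "2"),
        new XElement(g + "price", "15.00 USD"), new XElement(g + "sale_price", "15.00 USD")),
    new XElement("item", new XElement(g + "id", "3"),
        new XElement(g + "price", "15.00 USD")))));

// Materialize the items that are not really on sale, then detach them
document.Descendants("item").Where(item => !IsOnSale(item, g)).ToList().Remove();

var keptIds = document.Descendants("item").Select(item => (string?)item.Element(g + "id")).ToList();
Console.WriteLine(string.Join(", ", keptIds));
// prints "1"
```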

Add Tracking Parameters to URLs

You may want to append tracking parameters like UTM tags to the product URLs. This can be done by iterating through each product and modifying the <link> element:

// Snapshot the remaining items before editing the tree
foreach (var item in document.Descendants("item").ToList())
{
    var link = item.Element("link");
    if (link == null)
        continue;

    // Extend an existing query string with "&", otherwise start one with "?"
    string separator = link.Value.Contains("?") ? "&" : "?";
    link.ReplaceNodes(new XCData(link.Value + separator + "utm_source=s&utm_medium=m&utm_campaign=c"));
}
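The Contains("?") check handles the common cases, but if a product URL ends in a #fragment, plain string concatenation puts the UTM parameters after the fragment, where browsers never send them to the server. Here is a sketch using System.UriBuilder instead, assuming every <link> holds an absolute http(s) URL; AppendQuery is a hypothetical helper name:

```csharp
using System;

// Hypothetical helper: adds query parameters before any #fragment
static string AppendQuery(string url, string extraQuery)
{
    var builder = new UriBuilder(url);
    // Query includes the leading "?" when the URL already has one; strip it before combining
    string existing = builder.Query.TrimStart('?');
    // Assign without a leading "?"; UriBuilder adds the separator itself
    builder.Query = existing.Length == 0 ? extraQuery : existing + "&" + extraQuery;
    // The Fragment is left untouched, so "#reviews" stays at the end
    return builder.Uri.AbsoluteUri;
}

const string utm = "utm_source=s&utm_medium=m&utm_campaign=c";
string plain = AppendQuery("https://pentia.dk/dress", utm);
string withQueryAndFragment = AppendQuery("https://pentia.dk/dress?color=red#reviews", utm);

Console.WriteLine(plain);
Console.WriteLine(withQueryAndFragment);
```

In the feed loop, the helper would replace the separator logic, e.g. link.ReplaceNodes(new XCData(AppendQuery(link.Value, utm))).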

Modify Product Titles

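If your feed contains used products, you might want to add the word “USED” to their titles. Checking <g:condition> limits the change to items actually marked as used:

foreach (var item in document.Descendants("item").ToList())
{
    // Only touch items whose g:condition is "used"
    if ((string?)item.Element(g + "condition") != "used")
        continue;

    var title = item.Element("title");
    if (title == null)
        continue;

    // Wrap the new text in XCData so the title stays in a CDATA section like the original
    title.ReplaceNodes(new XCData("USED " + title.Value));
}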
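If the same cleanup runs more than once over a feed (for example, on a schedule against an already processed file, which is an assumption about your pipeline), the title loop would turn "USED Jacket" into "USED USED Jacket". A hypothetical MarkUsed helper that checks for the prefix first keeps the step idempotent:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace g = "http://base.google.com/ns/1.0";

// Hypothetical helper: prefixes the title of a used item exactly once
static void MarkUsed(XElement product, XNamespace ns, string prefix = "USED ")
{
    if ((string?)product.Element(ns + "condition") != "used") return;

    XElement? title = product.Element("title");
    // Titles that already start with the prefix are left alone, so a second run changes nothing
    if (title == null || title.Value.StartsWith(prefix, StringComparison.Ordinal)) return;

    title.ReplaceNodes(new XCData(prefix + title.Value));
}

// Hypothetical sample: one used item, one new item
var channel = new XElement("channel",
    new XElement("item", new XElement("title", "Jacket"), new XElement(g + "condition", "used")),
    new XElement("item", new XElement("title", "Scarf"), new XElement(g + "condition", "new")));

// Run the pass twice to show that the guard keeps it idempotent
for (int pass = 0; pass < 2; pass++)
{
    foreach (XElement entry in channel.Elements("item").ToList())
        MarkUsed(entry, g);
}

var titles = channel.Elements("item").Select(element => element.Element("title")!.Value).ToList();
Console.WriteLine(string.Join(" | ", titles));
// prints "USED Jacket | Scarf"
```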

Group Products by Type

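If a product type is used by two or fewer products, you might want to relabel those products as "Other".

// Skip items without a <g:product_type>; GroupBy reads all items before the loop modifies anything
foreach (var group in document.Descendants("item")
             .Where(node => node.Element(g + "product_type") != null)
             .GroupBy(node => node.Element(g + "product_type").Value))
{
    if (group.Count() <= 2)
    {
        foreach (var product in group)
        {
            product.Element(g + "product_type").ReplaceNodes(new XCData("Other"));
        }
    }
}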
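The same step can be wrapped in a reusable method that takes the threshold as a parameter. CollapseRareProductTypes and the sample data are hypothetical; the sketch skips items without a <g:product_type> instead of throwing, and trims whitespace so that " Hats" and "Hats" count as one type:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace g = "http://base.google.com/ns/1.0";

// Hypothetical helper: relabels every product type used by maxCount or fewer items as "Other"
static void CollapseRareProductTypes(XDocument feed, XNamespace ns, int maxCount)
{
    var rareGroups = feed.Descendants("item")
        .Select(item => item.Element(ns + "product_type"))
        .OfType<XElement>()                  // drops items that have no <g:product_type>
        .GroupBy(type => type.Value.Trim())  // " Hats" and "Hats" count as the same type
        .Where(group => group.Count() <= maxCount)
        .ToList();                           // finish grouping before the tree is modified

    foreach (var rare in rareGroups)
        foreach (XElement typeElement in rare)
            typeElement.ReplaceNodes(new XCData("Other"));
}

// Hypothetical sample: three dresses, one hat, and one item without a product type
var document = new XDocument(new XElement("rss", new XElement("channel",
    new XElement("item", new XElement(g + "product_type", "Home > Dresses")),
    new XElement("item", new XElement(g + "product_type", "Home > Dresses")),
    new XElement("item", new XElement(g + "product_type", "Home > Dresses")),
    new XElement("item", new XElement(g + "product_type", "Home > Hats")),
    new XElement("item"))));

CollapseRareProductTypes(document, g, maxCount: 2);

var types = document.Descendants("item")
    .Select(item => (string?)item.Element(g + "product_type") ?? "(none)")
    .ToList();
Console.WriteLine(string.Join(", ", types));
// prints "Home > Dresses, Home > Dresses, Home > Dresses, Other, (none)"
```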

Convert the Manipulated Data Back to XML

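Once you've applied the necessary changes, you can convert the manipulated data back into an XML string. Note that XDocument.ToString() omits the <?xml ... ?> declaration; prepend document.Declaration if your importer expects it, or call document.Save(path) to write a file that includes it:

// ToString() returns the markup without the XML declaration, so add it back when present
string convertedFeedData = document.Declaration != null
    ? document.Declaration + Environment.NewLine + document.ToString()
    : document.ToString();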
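If you need the declaration and a specific encoding in the output string, one option is to save through an XmlWriter into a MemoryStream. The sketch below assumes UTF-8 without a byte-order mark and indented output are what your importer expects; ToXmlWithDeclaration is a hypothetical helper name:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: serializes with an XML declaration, as UTF-8 without a BOM, indented
static string ToXmlWithDeclaration(XDocument doc)
{
    var settings = new XmlWriterSettings
    {
        Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false),
        Indent = true
    };

    using var buffer = new MemoryStream();
    using (XmlWriter writer = XmlWriter.Create(buffer, settings))
    {
        doc.Save(writer);
    } // disposing the writer flushes everything into the buffer

    return Encoding.UTF8.GetString(buffer.ToArray());
}

XNamespace g = "http://base.google.com/ns/1.0";
var document = new XDocument(
    new XDeclaration("1.0", "utf-8", null),
    new XElement("rss",
        new XAttribute("version", "2.0"),
        new XAttribute(XNamespace.Xmlns + "g", g.NamespaceName),
        new XElement("channel",
            new XElement("item", new XElement(g + "id", new XCData("1123432"))))));

string viaToString = document.ToString();
string viaWriter = ToXmlWithDeclaration(document);

Console.WriteLine(viaToString.StartsWith("<?xml", StringComparison.Ordinal)); // prints False
Console.WriteLine(viaWriter);
```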

Manipulating Google Merchant XML data with C# and LINQ allows you to perform powerful data transformations in a minimal amount of code. Whether you need to remove duplicates, filter out-of-stock items, or add tracking parameters to URLs, C# and LINQ provide a flexible and efficient way to clean up and prepare your product feed for import into your database.