How to Parse RSS and ATOM Feeds in C#

By FoxLearn 1/17/2025 4:38:22 AM
In this article, we’ll learn how to parse RSS and ATOM feeds using C#.

Parsing feeds is a foundational step in building a content aggregator application, allowing us to keep track of updates from various websites.

Initially, we’ll focus on an RSS feed parser, enabling us to retrieve updates from our favorite websites. Let’s start by understanding what an RSS feed is and how it works.

What is an RSS Feed?

RSS (Really Simple Syndication) is a standardized XML format that lets users and applications receive updates from websites in machine-readable form. A feed typically carries headlines, summaries, and links to the full articles, so with RSS you can follow content from many websites in a single application.

A content aggregator fetches RSS feeds periodically (for example, once an hour or once a day) and organizes the content based on user preferences. This gives the user one central place to keep up with new content from many sources.

How Does RSS Work?

An RSS file is an XML document that the publisher updates whenever new content is published. An RSS feed reader periodically downloads the file, parses the raw XML, and displays each item's headline, summary, and link to the full article.
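To make this concrete, here is a minimal sketch of what a feed reader does with the raw XML: it reads each item's title and link and turns them into one readable line. The sample feed, the FeedReaderDemo class, and its Headlines method are hypothetical, and the XML is parsed from a string with XDocument.Parse rather than downloaded, so the example runs offline.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FeedReaderDemo
{
    // A tiny hypothetical RSS 2.0 document, as a publisher might serve it.
    public const string SampleRss = @"<?xml version=""1.0""?>
<rss version=""2.0"">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <description>Summary of the first post</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
      <description>Summary of the second post</description>
    </item>
  </channel>
</rss>";

    // Turns the raw XML into "title (link)" lines, one per <item>.
    public static List<string> Headlines(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("item")
                  .Select(item => $"{(string)item.Element("title")} ({(string)item.Element("link")})")
                  .ToList();
    }
}
```

Calling FeedReaderDemo.Headlines(FeedReaderDemo.SampleRss) returns one line per item, such as "First post (https://example.com/first)".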

Parsing RSS and ATOM Feeds in C#

Our application will be capable of parsing RSS, RDF, and ATOM feeds. Below, we outline how to parse these different feed formats using the XDocument class and LINQ to XML.
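One detail recurs in every parser in this article: elements are matched with i.Name.LocalName == "..." instead of a plain name such as Elements("entry"). The reason is XML namespaces. Atom feeds (and usually RDF feeds) declare a default namespace, so an element written as <entry> is really {http://www.w3.org/2005/Atom}entry, and a lookup by the bare name finds nothing. The sketch below, using a hypothetical one-entry Atom document and a hypothetical NamespaceDemo class, compares three ways of querying it.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Hypothetical minimal Atom document; xmlns sets a default namespace for every element.
    public const string SampleAtom =
        "<feed xmlns=\"http://www.w3.org/2005/Atom\"><entry><title>Hello</title></entry></feed>";

    // Bare name: looks for an "entry" element with no namespace, so nothing matches here.
    public static int CountByBareName(XDocument doc) =>
        doc.Root.Elements("entry").Count();

    // LocalName: ignores the namespace entirely, as the article's parsers do.
    public static int CountByLocalName(XDocument doc) =>
        doc.Root.Elements().Count(e => e.Name.LocalName == "entry");

    // Fully qualified name: XNamespace + local name gives an exact match.
    public static int CountByQualifiedName(XDocument doc)
    {
        XNamespace atom = "http://www.w3.org/2005/Atom";
        return doc.Root.Elements(atom + "entry").Count();
    }
}
```

LocalName matching keeps the parsers short and tolerant of namespace variations; the trade-off is that it also accepts same-named elements from unrelated namespaces.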

Loading the Feed

One way to parse a feed is to load it from a URL into an XDocument object:

XDocument doc = XDocument.Load(feedUrl);

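Since RSS feeds are XML documents, the XDocument class (in the System.Xml.Linq namespace) provides an easy way to work with them, and LINQ makes querying the document structure straightforward. The parsers in this article share a small model: a FeedType enum that identifies the format, an Item class that holds one parsed entry, and a FeedUrl class that validates the feed address before it is loaded: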

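using System;

public enum FeedType
{
    RSS,
    RDF,
    Atom
}

// One entry parsed from a feed, whatever its format
public class Item
{
    public string Link { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
    public DateTime PublishDate { get; set; }
    public FeedType FeedType { get; set; }

    public Item()
    {
        // Default values, overwritten by the parsers when the feed provides them
        PublishDate = DateTime.Today;
        FeedType = FeedType.RSS;
    }
}

// FeedUrlException is used but not defined in this article; this minimal version is an assumption.
public class FeedUrlException : Exception
{
    private FeedUrlException(string message) : base(message) { }

    public static FeedUrlException Create(string message) => new FeedUrlException(message);
}

// Wraps a feed address that has been checked to be a well-formed absolute URL
public class FeedUrl
{
    public string Url { get; }

    private FeedUrl(string url)
    {
        Url = url;
    }

    public static FeedUrl Create(string url)
    {
        if (string.IsNullOrWhiteSpace(url))
            throw FeedUrlException.Create("Url cannot be empty");

        // Same rule as new Uri(url): the string must be an absolute URI
        if (!Uri.TryCreate(url, UriKind.Absolute, out _))
            throw FeedUrlException.Create($"The format of the url {url} is incorrect");

        return new FeedUrl(url);
    }
}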
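Uri.TryCreate and the Uri constructor accept any absolute URI, including file: and ftp: URIs, and XDocument.Load reads a local file path as readily as a web address. If an aggregator should only fetch web feeds, one option is to check the scheme too. The FeedUrlRules.IsWebUrl helper below is hypothetical, not part of the article's code.

```csharp
using System;

public static class FeedUrlRules
{
    // Hypothetical helper: true only for absolute http:// or https:// URLs.
    public static bool IsWebUrl(string url)
    {
        return Uri.TryCreate(url, UriKind.Absolute, out Uri uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

A check like this could be added to the validation in FeedUrl.Create so that a non-web URL raises the same FeedUrlException as a malformed one.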

Parsing Different Feed Formats

Parsing RSS Feeds

RSS 2.0 feeds have the following structure:

<rss>
  <channel>
    <item>
      ...
    </item>
  </channel>
</rss>

You can query an RSS feed as follows:

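using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public IList<Item> Parse(FeedUrl feedUrl)
{
    XDocument doc = XDocument.Load(feedUrl.Url);
    // rss/channel/item; LocalName matching ignores any namespace
    XElement channel = doc.Root.Elements().First(i => i.Name.LocalName == "channel");
    var entries = from item in channel.Elements().Where(i => i.Name.LocalName == "item")
                  // every child of <item> is optional in RSS 2.0, so look them up with FirstOrDefault
                  let description = item.Elements().FirstOrDefault(i => i.Name.LocalName == "description")
                  let link = item.Elements().FirstOrDefault(i => i.Name.LocalName == "link")
                  let pubDate = item.Elements().FirstOrDefault(i => i.Name.LocalName == "pubDate")
                  let title = item.Elements().FirstOrDefault(i => i.Name.LocalName == "title")
                  select new Item
                  {
                      FeedType = FeedType.RSS,
                      Content = description?.Value,
                      Link = link?.Value,
                      // DateTimeParser.ParseDate is a helper that this article does not show
                      PublishDate = pubDate != null ? DateTimeParser.ParseDate(pubDate.Value) : DateTime.Today,
                      Title = title?.Value
                  };
    return entries.ToList();
}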
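The parsers call DateTimeParser.ParseDate, which the article does not show, and dates are a common stumbling block. RSS 2.0 pubDate uses the RFC 822 layout (for example "Sat, 07 Sep 2002 00:00:01 GMT"), while Atom and the Dublin Core dc:date element use ISO 8601 style timestamps (for example "2003-12-13T18:30:02Z"). Below is one possible implementation, a hypothetical FeedDateParser.ParseDate that could stand in for the unshown helper. It assumes the time zone is written as GMT, UT, Z, or a numeric offset, and it returns the time in UTC.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the article's DateTimeParser.ParseDate, which is not shown.
public static class FeedDateParser
{
    // RFC 822 layouts used by RSS 2.0 <pubDate>, once the zone is rewritten as +hh:mm.
    private static readonly string[] Rfc822Formats =
    {
        "ddd, d MMM yyyy HH:mm:ss zzz",
        "d MMM yyyy HH:mm:ss zzz",
        "ddd, d MMM yyyy HH:mm zzz",
        "d MMM yyyy HH:mm zzz"
    };

    // Parses an RSS, RDF or Atom date and returns it as UTC.
    public static DateTime ParseDate(string value)
    {
        string trimmed = value.Trim();

        // Rewrite "GMT", "UT" or "Z" and "+hhmm" offsets into the "+hh:mm" form that "zzz" reads.
        string normalized = Regex.Replace(trimmed, @"\s(GMT|UT|Z)$", " +00:00");
        normalized = Regex.Replace(normalized, @"\s([+-])(\d{2})(\d{2})$", " $1$2:$3");

        if (DateTimeOffset.TryParseExact(normalized, Rfc822Formats, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out DateTimeOffset rfc822))
            return rfc822.UtcDateTime;

        // Atom <published>/<updated> and <dc:date> use ISO 8601 timestamps such as 2003-12-13T18:30:02Z.
        if (DateTimeOffset.TryParse(trimmed, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out DateTimeOffset iso))
            return iso.UtcDateTime;

        throw new FormatException($"Unrecognized feed date: {value}");
    }
}
```

Named zones such as EST and two-digit years, which RFC 822 also permits, are not specifically handled by this sketch.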

Parsing RDF Feeds

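RDF feeds (RSS 1.0) have a different structure: the <item> elements sit directly under the root <rdf:RDF> element, as siblings of <channel>, and the publication date is usually carried by the Dublin Core <dc:date> element: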

<rdf:RDF>
  <item>
    ...
  </item>
</rdf:RDF>

Querying an RDF feed:

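using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public IList<Item> Parse(FeedUrl feedUrl)
{
    XDocument doc = XDocument.Load(feedUrl.Url);
    // <item> elements are direct children of the root <rdf:RDF> element
    var entries = from item in doc.Root.Elements().Where(i => i.Name.LocalName == "item")
                  // <description> and <dc:date> are optional, so look them up with FirstOrDefault
                  let description = item.Elements().FirstOrDefault(i => i.Name.LocalName == "description")
                  let date = item.Elements().FirstOrDefault(i => i.Name.LocalName == "date")
                  select new Item
                  {
                      FeedType = FeedType.RDF,
                      Content = description?.Value,
                      Link = item.Elements().First(i => i.Name.LocalName == "link").Value,
                      // "date" matches the Dublin Core <dc:date> element
                      PublishDate = date != null ? DateTimeParser.ParseDate(date.Value) : DateTime.Today,
                      Title = item.Elements().First(i => i.Name.LocalName == "title").Value
                  };
    return entries.ToList();
}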
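In a real RSS 1.0 document these elements are namespaced: <item>, <title> and <link> belong to http://purl.org/rss/1.0/, and <dc:date> belongs to the Dublin Core namespace http://purl.org/dc/elements/1.1/. Matching on LocalName hides this, but if you prefer exact matches you can query with XNamespace instead. The sample document and the RdfDemo.Read method below are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RdfDemo
{
    // Hypothetical RSS 1.0 (RDF) document with a single item.
    public const string SampleRdf = @"<rdf:RDF
    xmlns:rdf=""http://www.w3.org/1999/02/22-rdf-syntax-ns#""
    xmlns=""http://purl.org/rss/1.0/""
    xmlns:dc=""http://purl.org/dc/elements/1.1/"">
  <channel rdf:about=""https://example.com/"">
    <title>Example</title>
    <link>https://example.com/</link>
    <description>Sample RDF feed</description>
  </channel>
  <item rdf:about=""https://example.com/post"">
    <title>A post</title>
    <link>https://example.com/post</link>
    <dc:date>2025-01-17T09:00:00Z</dc:date>
  </item>
</rdf:RDF>";

    static readonly XNamespace Rss = "http://purl.org/rss/1.0/";
    static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";

    // Returns (title, link, raw date text) for every <item> directly under the root.
    public static List<(string Title, string Link, string Date)> Read(XDocument doc)
    {
        return doc.Root.Elements(Rss + "item")
            .Select(item => (
                (string)item.Element(Rss + "title"),
                (string)item.Element(Rss + "link"),
                (string)item.Element(Dc + "date")))
            .ToList();
    }
}
```

Because the query asks for Rss + "item", the <channel> element's own <title> and <link> are never picked up by mistake.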

Parsing ATOM Feeds

ATOM feeds place <entry> elements under a root <feed> element, which declares the http://www.w3.org/2005/Atom namespace:

<feed>
  <entry>
    ...
  </entry>
</feed>

To query an ATOM feed:

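using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public IList<Item> Parse(FeedUrl feedUrl)
{
    XDocument doc = XDocument.Load(feedUrl.Url);
    // feed/entry
    var entries = from item in doc.Root.Elements().Where(i => i.Name.LocalName == "entry")
                  // <content> is optional in Atom, so fall back to <summary>
                  let content = item.Elements().FirstOrDefault(i => i.Name.LocalName == "content")
                             ?? item.Elements().FirstOrDefault(i => i.Name.LocalName == "summary")
                  // the article URL is the <link> with rel="alternate" or with no rel attribute
                  let link = item.Elements().Where(i => i.Name.LocalName == "link")
                                 .FirstOrDefault(l => ((string)l.Attribute("rel") ?? "alternate") == "alternate")
                  // <published> is optional, while <updated> is required
                  let published = item.Elements().FirstOrDefault(i => i.Name.LocalName == "published")
                               ?? item.Elements().FirstOrDefault(i => i.Name.LocalName == "updated")
                  select new Item
                  {
                      FeedType = FeedType.Atom,
                      Content = content?.Value,
                      Link = (string)link?.Attribute("href"),
                      PublishDate = published != null ? DateTimeParser.ParseDate(published.Value) : DateTime.Today,
                      Title = item.Elements().First(i => i.Name.LocalName == "title").Value
                  };
    return entries.ToList();
}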
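With three parsers in place, the aggregator still has to decide which one to run for a given feed. A simple approach, sketched here under the assumption that the root element identifies the format, is to inspect the root's local name: rss for RSS 2.0, RDF for RSS 1.0, and feed for ATOM. FeedDetector.Detect is a hypothetical helper, and the FeedType enum is repeated from earlier so the snippet compiles on its own.

```csharp
using System;
using System.Xml.Linq;

// Repeated from the article's model so this snippet is self-contained.
public enum FeedType
{
    RSS,
    RDF,
    Atom
}

public static class FeedDetector
{
    // Chooses the feed format from the root element's local name.
    public static FeedType Detect(XDocument doc)
    {
        switch (doc.Root?.Name.LocalName)
        {
            case "rss": return FeedType.RSS;
            case "RDF": return FeedType.RDF;
            case "feed": return FeedType.Atom;
            default: throw new NotSupportedException($"Unknown feed root element: {doc.Root?.Name}");
        }
    }
}
```

Each detected type can then be mapped to the matching Parse method shown above.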

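In this article, we covered the basics of parsing RSS, RDF, and ATOM feeds in C#. The three formats share a similar XML shape but differ in where entries live and what their elements are called: <item> under <channel> for RSS, <item> directly under the root for RDF, and <entry> under <feed> for ATOM, with dates in pubDate, dc:date, and published respectively. With XDocument and LINQ to XML, each parser stays short, which makes them a practical foundation for a content aggregator.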