How to Parse RSS and ATOM Feeds in C#
By FoxLearn 1/17/2025 4:38:22 AM
Parsing feeds is a foundational step in building a content aggregator application, allowing us to keep track of updates from various websites.
Initially, we’ll focus on an RSS feed parser, enabling us to retrieve updates from our favorite websites. Let’s start by understanding what an RSS feed is and how it works.
What is an RSS Feed?
RSS (Really Simple Syndication) is a standardized format that allows users and applications to access updates from websites in a computer-readable format. These feeds keep users updated with headlines, summaries, and links to the full articles. With RSS, you can track content from multiple websites in a single application.
A content aggregator uses RSS feeds to fetch updates daily and organizes the content based on user preferences. In this way, the aggregator provides a centralized platform for staying informed about new content from various sources.
How Does RSS Work?
RSS files are XML documents that the publishing site regenerates whenever new content appears. An RSS feed reader fetches these files periodically and converts the raw XML into readable updates, displaying headlines, summaries, and links to the full articles.
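To make this concrete, here is a minimal sketch of what a feed reader does with the raw XML. The sample document, the class name RssPreview, and the method ReadHeadlines are made up for illustration; only XDocument and the LINQ operators come from .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RssPreview
{
    // A tiny hand-written RSS 2.0 document (illustrative, not a real feed).
    public const string SampleFeed = @"<?xml version=""1.0""?>
<rss version=""2.0"">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>";

    // Turns the raw XML into "headline -> link" lines, roughly as a feed reader would.
    public static List<string> ReadHeadlines(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("item")
                  .Select(i => $"{(string)i.Element("title")} -> {(string)i.Element("link")}")
                  .ToList();
    }
}
```

Calling RssPreview.ReadHeadlines(RssPreview.SampleFeed) yields one line per item in the sample, each pairing a headline with its link.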
Parsing RSS and ATOM Feeds in C#
Our application will be capable of parsing RSS, RDF, and ATOM feeds. Below, we outline how to parse these different feed formats using the XDocument class and LINQ to XML.
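Since the three formats use different root elements, one way to decide which parser to use is to look at the root element's local name. The sketch below is an assumption about how that dispatch could work, not something the article prescribes; FeedTypeDetector and Detect are hypothetical names, and the method returns a plain string to stay self-contained.

```csharp
using System;
using System.Xml.Linq;

public static class FeedTypeDetector
{
    // Returns "RSS", "RDF" or "Atom" based on the root element's local name.
    // Comparing local names sidesteps namespace prefixes such as rdf:RDF.
    public static string Detect(XDocument doc)
    {
        string root = doc.Root?.Name.LocalName ?? "";
        switch (root.ToLowerInvariant())
        {
            case "rss": return "RSS";
            case "rdf": return "RDF";
            case "feed": return "Atom";
            default:
                throw new NotSupportedException($"Unrecognized feed root element '{root}'");
        }
    }
}
```

In a real aggregator the return value would more likely be an enum, with each value mapped to its own parser.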
Loading the Feed
One way to parse a feed is to load it from a URL into an XDocument object:
XDocument doc = XDocument.Load(feedUrl);
Since RSS feeds are XML documents, the XDocument class provides an easy way to manipulate them. Additionally, LINQ makes querying the document structure straightforward.
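XDocument.Load(string) fetches and parses in one call, but it offers no direct way to set timeouts or request headers. As an alternative, XDocument.Load also accepts a Stream, so the download can be handled separately. The following is a sketch under that assumption; FeedLoader, LoadFromStream, and LoadAsync are hypothetical names, and the HttpClient is expected to be supplied and configured by the caller.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class FeedLoader
{
    // Parses a feed from any readable stream (file, memory, HTTP response body).
    public static XDocument LoadFromStream(Stream stream)
    {
        return XDocument.Load(stream);
    }

    // Downloads the feed with a caller-supplied HttpClient, then parses it.
    public static async Task<XDocument> LoadAsync(HttpClient client, string feedUrl)
    {
        using (Stream body = await client.GetStreamAsync(feedUrl))
        {
            return LoadFromStream(body);
        }
    }
}
```

Separating download from parsing also makes the parsing code easy to exercise with a MemoryStream or a file instead of a live URL.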
The parsers below rely on a few supporting types: a FeedType enum, an Item class that holds one parsed entry, and a FeedUrl wrapper that validates the address.

using System;

// The feed formats covered in this article.
public enum FeedType { RSS, RDF, Atom }

// One parsed feed entry.
public class Item
{
    public string Link { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
    public DateTime PublishDate { get; set; }
    public FeedType FeedType { get; set; }

    public Item()
    {
        PublishDate = DateTime.Today;
        FeedType = FeedType.RSS;
    }
}

// A feed address that has been checked to be a well-formed absolute URI.
public class FeedUrl
{
    private string _url;
    public string Url { get => _url; }

    private FeedUrl(string url) { _url = url; }

    public static FeedUrl Create(string url)
    {
        try
        {
            // The Uri constructor is used only to validate the format.
            Uri uri = new Uri(url);
            return new FeedUrl(url);
        }
        catch (ArgumentNullException)
        {
            throw FeedUrlException.Create("Url can not be empty");
        }
        catch (UriFormatException)
        {
            if (string.IsNullOrEmpty(url))
                throw FeedUrlException.Create("Url can not be empty");
            else
                throw FeedUrlException.Create($"The format of the url {url} is incorrect");
        }
    }
}
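The FeedUrl class above throws FeedUrlException, which the article does not define. A minimal sketch of what it might look like follows; the private constructor and the shape of the Create factory are assumptions inferred from the FeedUrlException.Create(...) calls.

```csharp
using System;

// Hypothetical exception type for invalid feed URLs.
public class FeedUrlException : Exception
{
    private FeedUrlException(string message) : base(message) { }

    // Factory method matching the FeedUrlException.Create(message) call sites.
    public static FeedUrlException Create(string message)
    {
        return new FeedUrlException(message);
    }
}
```

With this in place, a call such as FeedUrl.Create("not a url") would surface a FeedUrlException rather than the underlying UriFormatException.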
Parsing Different Feed Formats
Parsing RSS Feeds
RSS feeds follow the structure:
<rss> <channel> <item> ... </item> </channel> </rss>
You can query an RSS feed as follows:
// Requires: using System.Collections.Generic; using System.Linq; using System.Xml.Linq;
public IList<Item> Parse(FeedUrl feedUrl)
{
    XDocument doc = XDocument.Load(feedUrl.Url);

    // RSS/Channel/item
    // Matching on LocalName ignores XML namespaces; First() throws if an element is missing.
    var entries = from item in doc.Root.Descendants()
                      .First(i => i.Name.LocalName == "channel")
                      .Elements().Where(i => i.Name.LocalName == "item")
                  select new Item
                  {
                      FeedType = FeedType.RSS,
                      Content = item.Elements().First(i => i.Name.LocalName == "description").Value,
                      Link = item.Elements().First(i => i.Name.LocalName == "link").Value,
                      PublishDate = DateTimeParser.ParseDate(item.Elements().First(i => i.Name.LocalName == "pubDate").Value),
                      Title = item.Elements().First(i => i.Name.LocalName == "title").Value
                  };
    return entries.ToList();
}
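The Parse method above calls DateTimeParser.ParseDate, which the article does not show. Here is a minimal sketch of such a helper. It assumes that the built-in DateTimeOffset parser is good enough for common feed dates, and that falling back to DateTime.Today (the default used by Item) is acceptable when parsing fails. Both are assumptions, not the author's implementation.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; the article references it but does not define it.
public static class DateTimeParser
{
    // Parses common feed date strings and returns the result in UTC.
    public static DateTime ParseDate(string text)
    {
        // AssumeUniversal treats dates without an explicit offset as UTC.
        if (DateTimeOffset.TryParse(text, CultureInfo.InvariantCulture,
                                    DateTimeStyles.AssumeUniversal, out DateTimeOffset parsed))
        {
            return parsed.UtcDateTime;
        }

        // Assumed fallback: mirror Item's default PublishDate.
        return DateTime.Today;
    }
}
```

This handles RFC 1123 style dates ending in GMT and ISO 8601 timestamps. Feeds that use named time zones such as EST would likely need extra handling.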
Parsing RDF Feeds
RDF feeds have a different structure, with <item> elements directly under the root:
<rdf:RDF> <item> ... </item> </rdf:RDF>
Querying an RDF feed:
public IList<Item> Parse(FeedUrl feedUrl)
{
    XDocument doc = XDocument.Load(feedUrl.Url);

    // <item> is under the root
    var entries = from item in doc.Root.Descendants().Where(i => i.Name.LocalName == "item")
                  select new Item
                  {
                      FeedType = FeedType.RDF,
                      Content = item.Elements().First(i => i.Name.LocalName == "description").Value,
                      Link = item.Elements().First(i => i.Name.LocalName == "link").Value,
                      PublishDate = DateTimeParser.ParseDate(item.Elements().First(i => i.Name.LocalName == "date").Value),
                      Title = item.Elements().First(i => i.Name.LocalName == "title").Value
                  };
    return entries.ToList();
}
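The query above matches on LocalName because RDF (RSS 1.0) feeds place their elements in XML namespaces: items and titles in the RSS 1.0 namespace, and the date in the Dublin Core namespace. An alternative is to query with XNamespace directly. The sketch below illustrates that approach; RdfNamespaceExample and ReadItems are hypothetical names, and it reads only the title and the raw date text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RdfNamespaceExample
{
    // Namespace URIs used by RSS 1.0 and Dublin Core.
    static readonly XNamespace Rss = "http://purl.org/rss/1.0/";
    static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";

    // Returns (title, date text) pairs for each <item> directly under the root.
    public static List<KeyValuePair<string, string>> ReadItems(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root.Elements(Rss + "item")
                  .Select(i => new KeyValuePair<string, string>(
                      (string)i.Element(Rss + "title"),
                      (string)i.Element(Dc + "date")))
                  .ToList();
    }
}
```

Namespace-qualified names are stricter than LocalName matching: an element with the same local name in a different namespace is ignored, and a missing element yields null instead of an exception.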
Parsing ATOM Feeds
ATOM feeds follow this structure:
<feed> <entry> ... </entry> </feed>
To query an ATOM feed:
public IList<Item> Parse(FeedUrl feedUrl)
{
    XDocument doc = XDocument.Load(feedUrl.Url);

    // Feed/Entry
    var entries = from item in doc.Root.Elements().Where(i => i.Name.LocalName == "entry")
                  select new Item
                  {
                      FeedType = FeedType.Atom,
                      Content = item.Elements().First(i => i.Name.LocalName == "content").Value,
                      Link = item.Elements().First(i => i.Name.LocalName == "link").Attribute("href").Value,
                      PublishDate = DateTimeParser.ParseDate(item.Elements().First(i => i.Name.LocalName == "published").Value),
                      Title = item.Elements().First(i => i.Name.LocalName == "title").Value
                  };
    return entries.ToList();
}
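The query above uses First(), which throws when an element is absent. In ATOM, published and content are optional (updated is the required date), and an entry may carry several link elements. The following sketch shows one way to read entries more tolerantly; AtomEntryReader and its methods are hypothetical names, and the fallback choices (updated for the date, summary for the content) are assumptions.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AtomEntryReader
{
    static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";

    // Picks the first link whose rel is "alternate"; a missing rel also means "alternate".
    public static string GetLink(XElement entry)
    {
        XElement link = entry.Elements(Atom + "link")
            .FirstOrDefault(l => ((string)l.Attribute("rel") ?? "alternate") == "alternate");
        return (string)link?.Attribute("href");
    }

    // Uses <published> when present, otherwise falls back to <updated>.
    public static string GetDateText(XElement entry)
    {
        return (string)entry.Element(Atom + "published")
            ?? (string)entry.Element(Atom + "updated");
    }

    // Uses <content> when present, otherwise <summary>, otherwise an empty string.
    public static string GetContent(XElement entry)
    {
        return (string)entry.Element(Atom + "content")
            ?? (string)entry.Element(Atom + "summary")
            ?? "";
    }
}
```

Casting an XElement or XAttribute to string returns null when the node is missing, which is what lets the ?? fallbacks work without exceptions.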
In this article, we covered the basics of parsing RSS, RDF, and ATOM feeds in C#. These feed formats share similar XML structures but differ in their specific element organization. By leveraging XDocument and LINQ, you can efficiently extract the required data and build a robust content aggregator.