How to remove HTML tags from string in C#
By FoxLearn 1/9/2025 4:45:22 AM
Removing HTML tags from a string is a common task when processing text, especially when you need to extract plain content from HTML code or when you're working with user-generated HTML input.
A string in C# might contain HTML elements, and the goal is to strip them out. This is useful when you want to display HTML content as plain text, without formatting such as bold, italic, or hyperlinks.
There are various methods to achieve this, but here we will focus on two common approaches:
Remove HTML Tags using RegEx
A regular expression (RegEx) is one of the simplest and most concise ways to remove HTML tags from a string.
```csharp
using System.Text.RegularExpressions;

// 1. Use a regular expression to remove HTML tags
string finalData = "<p>Hello <b>world</b></p>"; // illustrative input
Regex regex = new Regex("<[^>]*>");
finalData = regex.Replace(finalData, "");
// finalData is now "Hello world"
```
This code uses a regular expression to match everything enclosed between < and >, which is how HTML tags are delimited, and replaces each match with an empty string.
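The pattern above removes tags but leaves HTML entities such as &amp; or &lt; untouched, so the result may still not read as plain text. A minimal sketch that also decodes entities, assuming the standard System.Net.WebUtility.HtmlDecode method is acceptable for your input; the HtmlText class and ToPlainText method names are hypothetical:

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Same tag pattern as above, created once and reused for every call.
    private static readonly Regex TagPattern = new Regex("<[^>]*>", RegexOptions.Compiled);

    // Hypothetical helper: strip tags first, then decode entities such as &amp; and &lt;.
    // Decoding after stripping means an encoded "&lt;b&gt;" survives as the visible text "<b>".
    public static string ToPlainText(string html)
    {
        string withoutTags = TagPattern.Replace(html, string.Empty);
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

The order matters: decoding before stripping would turn encoded text like &lt;b&gt; into a real tag and then delete it.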
Remove HTML Tags Without RegEx
If you prefer not to use RegEx, you can remove HTML tags manually by walking through the string one character at a time. This approach looks for the < and > characters and drops everything between them.
```csharp
using System.Text;

public class HtmlTagRemover // wrapper class name is illustrative
{
    public string RemoveHTMLTagsManually(string html)
    {
        var result = new StringBuilder();
        bool isInsideTag = false;

        foreach (char currentChar in html)
        {
            if (currentChar == '<')
            {
                // Start skipping content inside tag
                isInsideTag = true;
            }
            else if (currentChar == '>')
            {
                // Stop skipping content inside tag
                isInsideTag = false;
            }
            else if (!isInsideTag)
            {
                // Add the character to result if not inside a tag
                result.Append(currentChar);
            }
        }

        return result.ToString();
    }
}
```
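This method iterates through the input HTML string. When it encounters a <, it skips characters until it reaches a >, so the tags are dropped and only the text between them is kept. Note that a stray < or > in ordinary text (for example "3 > 2") is treated as a tag delimiter as well.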
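One weakness of the character-by-character method is that a > inside a quoted attribute value, as in <a title="a>b">link</a>, ends the tag early and leaks attribute text ("b\"") into the output. A minimal sketch of a variant that tracks quotes while inside a tag; the QuoteAwareStripper name is hypothetical, and it still assumes reasonably well-formed markup:

```csharp
using System.Text;

public static class QuoteAwareStripper
{
    // Hypothetical helper: like RemoveHTMLTagsManually, but a '>' inside a
    // quoted attribute value (e.g. title="a>b") does not end the tag.
    public static string StripTags(string html)
    {
        var result = new StringBuilder(html.Length);
        bool insideTag = false;
        char quote = '\0'; // '\0' means "not inside a quoted attribute value"

        foreach (char c in html)
        {
            if (!insideTag)
            {
                if (c == '<') insideTag = true; // entering a tag
                else result.Append(c);          // plain text: keep it
            }
            else if (quote != '\0')
            {
                if (c == quote) quote = '\0';   // closing quote of the attribute value
            }
            else if (c == '"' || c == '\'')
            {
                quote = c;                      // opening quote of an attribute value
            }
            else if (c == '>')
            {
                insideTag = false;              // real end of the tag
            }
        }

        return result.ToString();
    }
}
```

Quotes are only tracked inside a tag, so apostrophes in ordinary text are unaffected.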
Sometimes, you may want to remove specific HTML elements, such as lists (<ul>) or tables (<table>). Below is an example of how to remove <ul> elements from an HTML string using custom logic:
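```csharp
using System;

public static class HtmlListRemover // wrapper class name is illustrative
{
    // Example: remove every <ul>...</ul> block from an HTML string
    public static string RemoveUlElements(string htmlContent)
    {
        int start;
        // Case-insensitive search, and ">= 0" so a <ul> at the very start is found too
        while ((start = htmlContent.IndexOf("<ul", StringComparison.OrdinalIgnoreCase)) >= 0)
        {
            int end = htmlContent.IndexOf("</ul>", start, StringComparison.OrdinalIgnoreCase);
            if (end < 0)
            {
                // Unclosed <ul> (bad HTML): stop instead of looping forever
                break;
            }

            // Remove from "<ul" through the end of "</ul>"
            htmlContent = htmlContent.Remove(start, end + "</ul>".Length - start);
        }

        return htmlContent;
    }
}
```
This approach repeatedly finds the first <ul> (ignoring case) and removes everything up to and including the next </ul>. If a <ul> has no closing </ul>, the loop stops instead of running forever on malformed HTML. Nested lists are not handled: the first </ul> ends the match, so closing tags of an outer list can be left behind.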
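The same idea can be generalized to other elements such as <table>. A minimal sketch using a regular expression instead of a loop; the HtmlElementRemover class and RemoveElement method are hypothetical, and like the loop above it does not handle nested elements of the same name:

```csharp
using System.Text.RegularExpressions;

public static class HtmlElementRemover
{
    // Hypothetical helper: removes every <tagName ...>...</tagName> block, including its content.
    // \b stops "ul" from matching inside a longer tag name; the non-greedy .*? stops at the
    // first closing tag, which is why nested elements of the same name are not handled.
    public static string RemoveElement(string html, string tagName)
    {
        string name = Regex.Escape(tagName);
        string pattern = "<" + name + @"\b[^>]*>.*?</" + name + @"\s*>";
        return Regex.Replace(html, pattern, string.Empty,
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }
}
```

RegexOptions.Singleline lets . match line breaks, so elements spanning several lines are removed as well.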
C# Remove <div> Tags from HTML String
Here's a method that uses RegEx to remove <div> elements from an HTML string:
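```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlDivRemover // wrapper class name is illustrative
{
    // Removes each <div>...</div> element, including everything inside it
    public static string RemoveDivTags(string input)
    {
        return Regex.Replace(input, "<div.*?>.*?</div>", String.Empty, RegexOptions.Singleline);
    }
}
```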
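This method removes each <div> element together with its contents, using a non-greedy pattern that matches an opening <div> tag, everything after it, and the first closing </div>. RegexOptions.Singleline lets the match span multiple lines. Because the first </div> ends the match, nested <div> elements leave stray closing tags behind.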
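RemoveDivTags deletes the content of each <div> as well. If you only want to drop the <div> tags themselves and keep what is inside them, a different pattern is needed. A minimal sketch; the DivTagRemover class and RemoveDivTagsKeepContent method are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class DivTagRemover
{
    // Matches an opening <div ...> or a closing </div> tag, but not e.g. <divider>.
    private static readonly Regex DivTag = new Regex(@"</?div\b[^>]*>", RegexOptions.IgnoreCase);

    // Hypothetical helper: removes only the <div> tags and keeps their inner content.
    public static string RemoveDivTagsKeepContent(string input)
    {
        return DivTag.Replace(input, string.Empty);
    }
}
```

Because each tag is matched on its own, nesting is not a problem here: every opening and closing <div> tag is removed independently.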
How to remove all HTML tags in JavaScript
If you need to remove HTML tags in JavaScript, you can use the following regular expression:
```javascript
item = item.replace(/<[^>]*>/g, '');
```
This JavaScript code strips all HTML tags from the item string, leaving only the plain text content.
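By using one of these approaches, you can remove HTML tags from strings in both C# and JavaScript. They work well for simple markup; for deeply nested or malformed HTML, a dedicated HTML parser is more reliable.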