Remove non-alphanumeric characters from a string in C#
By FoxLearn 2/5/2025 9:00:29 AM
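One way to remove non-alphanumeric characters is to call Regex.Replace() with a negated character class:
if (string.IsNullOrEmpty(s))
    return s;
return Regex.Replace(s, "[^a-zA-Z0-9]", "");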
Note: Regex.Replace() throws an ArgumentNullException when the input is null, which is why the snippet checks for a null or empty string before calling it.
Using regex is a straightforward method for filtering characters by "category," such as retaining only alphanumeric characters. However, keep in mind that regex can be the slower option compared to alternatives, which is something to consider if performance is critical.
This example only retains ASCII alphanumeric characters. If you need to handle other alphabets or character sets, see the sections on char.IsLetterOrDigit() and on non-ASCII characters in regex later in this article.
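As a minimal sketch, the regex snippet can be wrapped in a helper and run against a few inputs. The class name RegexExample and the method name RemoveNonAlphanumeric are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexExample
{
    // Hypothetical helper name; the body is the regex snippet from the article.
    public static string RemoveNonAlphanumeric(string s)
    {
        if (string.IsNullOrEmpty(s))
            return s; // null and "" are returned unchanged, so Regex.Replace never sees null

        // Replace every character that is NOT an ASCII letter or digit with nothing.
        return Regex.Replace(s, "[^a-zA-Z0-9]", "");
    }

    public static void Main()
    {
        Console.WriteLine(RemoveNonAlphanumeric("Hello, World! 123")); // HelloWorld123
        Console.WriteLine(RemoveNonAlphanumeric("a_b-c.d"));           // abcd
        Console.WriteLine(RemoveNonAlphanumeric(null) == null);        // True
    }
}
```

Note that the underscore is removed as well, because the character class lists letters and digits explicitly instead of using \w.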
For Optimal Performance, Use a Loop
An alternative method that improves performance is looping through the string and checking each character individually. This method is considerably faster (up to 7.5 times) than regex.
if (string.IsNullOrEmpty(s))
    return s;
StringBuilder sb = new StringBuilder();
foreach (var c in s)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
        sb.Append(c);
}
return sb.ToString();
The speedup comes from skipping the regex engine entirely: each character needs only a few simple comparisons.
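The following sketch wraps the loop in a helper and checks that it returns the same result as the regex version on a few sample strings. The names LoopExample and RemoveNonAlphanumeric are hypothetical, and pre-sizing the StringBuilder with s.Length is an optional tweak that is not part of the original snippet.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class LoopExample
{
    // Hypothetical helper name; the loop is the one from the article.
    public static string RemoveNonAlphanumeric(string s)
    {
        if (string.IsNullOrEmpty(s))
            return s;

        // Assumption: pre-sizing avoids buffer growth; the result can never be longer than the input.
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string[] samples = { "Hello, World! 123", "a_b-c.d", "αβγ 2025", "" };
        foreach (string sample in samples)
        {
            string viaLoop = RemoveNonAlphanumeric(sample);
            string viaRegex = Regex.Replace(sample, "[^a-zA-Z0-9]", "");
            Console.WriteLine(viaLoop == viaRegex); // True for each sample
        }
    }
}
```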
Avoid Using Compiled Regex
Using a compiled regex is unlikely to improve performance much in this case (in the benchmark later in this article it was only about 11% faster), and in some scenarios it can even be slower than a non-compiled regex. A simpler approach is to use the static Regex.Replace() method, which avoids the need to manage a compiled regex object.
Example of using compiled regex:
private static readonly Regex regex = new Regex("[^a-zA-Z0-9]", RegexOptions.Compiled);
public static string RemoveNonAlphanumericChars(string s)
{
    if (string.IsNullOrEmpty(s))
        return s;
    return regex.Replace(s, "");
}
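To check on your own machine whether the compiled regex helps, a rough Stopwatch comparison along these lines can be used. This is only a sketch: the class name, the helper AverageNanoseconds, the sample input, and the iteration count are all assumptions, and a dedicated benchmarking library would give more reliable numbers.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class CompiledVsStatic
{
    private static readonly Regex Compiled = new Regex("[^a-zA-Z0-9]", RegexOptions.Compiled);

    public static string ViaCompiled(string s) => Compiled.Replace(s, "");
    public static string ViaStatic(string s) => Regex.Replace(s, "[^a-zA-Z0-9]", "");

    // Hypothetical helper: average time per call in nanoseconds over the given number of runs.
    public static double AverageNanoseconds(Func<string, string> f, string input, int iterations)
    {
        f(input); // warm-up call so one-time setup cost is not measured
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            f(input);
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }

    public static void Main()
    {
        // Made-up 100-character input: a 10-character chunk repeated 10 times.
        string input = string.Concat(Enumerable.Repeat("ab-12_CD! ", 10));

        Console.WriteLine($"static:   {AverageNanoseconds(ViaStatic, input, 100_000):F0} ns");
        Console.WriteLine($"compiled: {AverageNanoseconds(ViaCompiled, input, 100_000):F0} ns");
    }
}
```

The absolute numbers depend on the runtime version and hardware, so only the relative difference between the two lines is meaningful.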
Use char.IsLetterOrDigit() for Unicode Alphanumeric Characters
If you need to keep all Unicode alphanumeric characters, you can use char.IsLetterOrDigit(), which returns true for any character that Unicode classifies as a letter or a decimal digit, regardless of script.
For example, this would allow characters from many languages and scripts (like Greek or Arabic):
if (string.IsNullOrEmpty(s))
    return s;
StringBuilder sb = new StringBuilder();
foreach (var c in s)
{
    if (char.IsLetterOrDigit(c))
        sb.Append(c);
}
return sb.ToString();
Note: Using char.IsLetterOrDigit() can be inefficient in situations where you only need to keep a specific set of characters. In those cases, it’s best to specify exactly which characters you want.
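A short sketch of how the char.IsLetterOrDigit() loop behaves on mixed input is shown here. The names UnicodeExample and KeepLettersAndDigits are hypothetical. The second sample uses the Arabic-Indic digits U+0661 to U+0663, written as escapes, to show that non-ASCII decimal digits are kept too.

```csharp
using System;
using System.Text;

public static class UnicodeExample
{
    // Hypothetical helper name; the loop is the char.IsLetterOrDigit() snippet from the article.
    public static string KeepLettersAndDigits(string s)
    {
        if (string.IsNullOrEmpty(s))
            return s;

        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (char.IsLetterOrDigit(c))
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Greek and Latin letters survive; punctuation and spaces do not.
        Console.WriteLine(KeepLettersAndDigits("αβγ, abc! 42") == "αβγabc42"); // True

        // Arabic-Indic digits are decimal digits in Unicode, so they are kept.
        Console.WriteLine(KeepLettersAndDigits("x-\u0661\u0662\u0663") == "x\u0661\u0662\u0663"); // True
    }
}
```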
Performance Comparison of Methods
To compare performance, I benchmarked four methods for removing non-alphanumeric characters from a string. The test was done with a string of 100 characters.
- Regex: 5016 ns (compiled regex: 4457 ns)
- Linq: 1506 ns
- Loop: 663 ns
The loop-based method outperforms all other methods by a significant margin.
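The benchmark lists a LINQ variant, but the article does not show its code. One plausible form, offered here as an assumption and not as the exact code that was measured, filters the characters with Where() and builds a new string from the result. The names LinqExample, RemoveNonAlphanumeric, and IsAsciiAlphanumeric are hypothetical.

```csharp
using System;
using System.Linq;

public static class LinqExample
{
    // Assumed shape of the LINQ variant; the article reports its timing but not its code.
    public static string RemoveNonAlphanumeric(string s)
    {
        if (string.IsNullOrEmpty(s))
            return s;

        // A string is an IEnumerable<char>, so Where() can filter it character by character.
        return new string(s.Where(IsAsciiAlphanumeric).ToArray());
    }

    private static bool IsAsciiAlphanumeric(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');

    public static void Main()
    {
        Console.WriteLine(RemoveNonAlphanumeric("Hello, World! 123")); // HelloWorld123
    }
}
```

The extra iterator and the intermediate array are a likely reason this form lands between the regex and the plain loop in the timings.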
Handling Non-ASCII Characters in Regex
What if you need to keep non-ASCII characters, such as Greek characters (like "Ελληνικά")? One option is to specify the Unicode range of the Greek block in the character class:
Regex.Replace(s, "[^\u0370-\u03FF]", "");
Alternatively, you can use a Unicode named block to make the regex more readable:
Regex.Replace(s, @"[^\p{IsGreek}]", "");
Or, you can specify exactly which Unicode characters to allow, such as the lowercase Greek letters α through ω plus the individual characters ά and Ε:
Regex.Replace(s, "[^α-ωάΕ]", "");
This approach helps ensure you're working with the exact characters you need, making the code more readable and easier to maintain.
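The three patterns can be compared side by side with a sketch like the following. The class and method names are hypothetical, and the Greek sample text is written with \u escapes (it spells "Ελληνικά") so the example does not depend on the source file's encoding.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreekExample
{
    // Explicit code point range for the Greek block.
    public static string ByRange(string s) => Regex.Replace(s, "[^\u0370-\u03FF]", "");

    // Unicode named block, as supported by .NET regular expressions.
    public static string ByNamedBlock(string s) => Regex.Replace(s, @"[^\p{IsGreek}]", "");

    // Explicit character list: lowercase alpha through omega, plus alpha with tonos and capital epsilon.
    // Equivalent to the pattern "[^α-ωάΕ]" from the article, written with escapes.
    public static string ByExplicitList(string s) => Regex.Replace(s, "[^\u03B1-\u03C9\u03AC\u0395]", "");

    public static void Main()
    {
        string greek = "\u0395\u03BB\u03BB\u03B7\u03BD\u03B9\u03BA\u03AC"; // "Ελληνικά"
        string input = greek + " (Greek) 123";

        Console.WriteLine(ByRange(input) == greek);        // True
        Console.WriteLine(ByNamedBlock(input) == greek);   // True
        Console.WriteLine(ByExplicitList(input) == greek); // True
    }
}
```

All three keep the same text here, but the explicit list is the strictest: any Greek character outside the listed ones, such as another capital letter, would be removed.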