Using yield return to minimize memory usage in C#

By FoxLearn 2/10/2025 7:51:57 AM
Imagine you are tasked with processing a large dataset, such as reading through a massive log file to identify specific errors and return relevant context objects for further analysis.

A naive approach might involve reading the entire dataset into memory at once and then filtering out the errors. However, this can lead to significant memory overhead, especially if the dataset is very large, as you would be holding all of the objects in memory even though you may only need a small subset.

In such scenarios, yield return can be a game changer. With yield return, the method hands back one object at a time as the caller asks for it, so the full collection never has to be stored in memory. This reduces peak memory consumption and lets the caller start working on the first results before the rest of the dataset has been read.

Using yield return to Find Errors in a Log File

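// Requires: using System.Collections.Generic; using System.IO; using System.Linq;
public static IEnumerable<LogError> FindErrors(string logFilePath, HashSet<string> errorKeywords)
{
    // The reader stays open while the caller enumerates and is disposed
    // when enumeration finishes or the enumerator is disposed early.
    using (var reader = new StreamReader(logFilePath))
    {
        int lineNumber = 0;
        string line;

        // ReadLine returns null once the end of the file is reached.
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;

            // Case-sensitive substring match against each keyword.
            if (errorKeywords.Any(keyword => line.Contains(keyword)))
            {
                // Hand one result to the caller, then pause here until the next one is requested.
                yield return new LogError
                {
                    ErrorLine = line,
                    LineNumber = lineNumber
                };
            }
        }
    }
}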

In this example, the method reads a large log file line by line and checks whether each line contains any of the keywords. When a match is found, it returns a LogError object with the matching line and its line number. Because the method uses yield return, each LogError is handed to the caller as soon as it is found, and no collection of results is ever built in memory.
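
One consequence that is easy to miss is that an iterator method runs lazily: calling it reads nothing until the caller starts enumerating. The sketch below is a simplified variant of FindErrors, assuming a single keyword and a TextReader parameter so that it runs without a file on disk. The LogError class and the LinesRead counter are hypothetical, since the article does not show the original type.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical shape of the LogError type; the article does not show its definition.
public class LogError
{
    public string ErrorLine { get; set; }
    public int LineNumber { get; set; }
}

public static class DeferredExecutionDemo
{
    // Illustration only: counts how many lines have been pulled from the reader so far.
    public static int LinesRead;

    // Same pattern as FindErrors, but reading from any TextReader and matching one keyword.
    public static IEnumerable<LogError> FindErrors(TextReader reader, string keyword)
    {
        int lineNumber = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            LinesRead++;
            if (line.Contains(keyword))
            {
                // Execution pauses here until the caller asks for the next item.
                yield return new LogError { ErrorLine = line, LineNumber = lineNumber };
            }
        }
    }
}
```

Given a StringReader over four lines in which lines 2 and 4 contain the keyword, LinesRead is still 0 right after the call to FindErrors and becomes 2 after the first MoveNext(), because the iterator stops at the first match.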

Here is how you can call the FindErrors method and process each error:

var errorKeywords = new HashSet<string> { "ERROR", "Critical", "Exception" };

foreach (var error in FindErrors(@"C:\logs\app_log.txt", errorKeywords))
{
    Console.WriteLine($"Found error: {error.ErrorLine} at line {error.LineNumber}");
}

This code will print each error as it’s found, keeping memory usage low since only one LogError is held in memory at a time.
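
Because results are produced on demand, a caller that stops early also stops the underlying work. Here is a minimal sketch of that behavior, with a hypothetical Numbers iterator standing in for a large data source and a Produced counter added only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EarlyExitDemo
{
    // Illustration only: counts how many items the iterator has actually produced.
    public static int Produced;

    // Hypothetical stand-in for a large data source.
    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Take(3) stops pulling from the iterator once it has three items.
    public static List<int> FirstThree()
    {
        Produced = 0;
        return Numbers(1000000).Take(3).ToList();
    }
}
```

FirstThree() returns 0, 1, 2 and leaves Produced at 3, even though the source could have produced a million values. Conversely, calling ToList() on the full sequence would materialize every item and give up the memory benefit.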

Performance Comparison: Memory Usage of yield return vs. Creating a Full List

To understand the efficiency of yield return, let’s compare its memory usage against a version that creates a complete collection of results first, like a list.

Using yield return to Generate GUIDs

public static IEnumerable<string> GenerateGuids(int count)
{
    for (int i = 0; i < count; i++)
    {
        yield return Guid.NewGuid().ToString();
    }
}

// Save GUIDs to a file
System.IO.File.WriteAllLines(@"C:\temp\guids.txt", GenerateGuids(10000000));

In this case, the method generates GUIDs one at a time and File.WriteAllLines writes each one to the file as it is produced, keeping memory usage minimal. Under a memory profiler, the process peaked at 12 MB of memory.

Creating a Full List of GUIDs

Now, let’s see what happens when we create the entire list of GUIDs first:

public static List<string> GenerateGuidsList(int count)
{
    var list = new List<string>();
    for (int i = 0; i < count; i++)
    {
        list.Add(Guid.NewGuid().ToString());
    }
    return list;
}

// Save GUIDs to a file
System.IO.File.WriteAllLines(@"C:\temp\guids.txt", GenerateGuidsList(10000000));

Here, the method generates all GUIDs at once and then writes the entire list to a file. After profiling this approach, we find that the process uses a massive 1.5 GB of memory and spikes close to 2 GB.
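
Without a profiler, the gap between the two approaches can be roughly cross-checked in code. The sketch below is an approximation, not a substitute for profiling: it assumes GC.GetTotalMemory with a forced collection is a good enough proxy for retained memory, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class RetainedMemoryDemo
{
    // Approximate managed-heap bytes still reachable after building a full list of GUID strings.
    public static long RetainedBytesForList(int count)
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);

        var list = new List<string>();
        for (int i = 0; i < count; i++)
        {
            list.Add(Guid.NewGuid().ToString());
        }

        long after = GC.GetTotalMemory(forceFullCollection: true);
        GC.KeepAlive(list); // keep the list reachable until after the measurement
        return after - before;
    }

    // Same measurement when each string is used once and then dropped, as in the streaming case.
    public static long RetainedBytesForStream(int count)
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);

        long totalLength = 0;
        for (int i = 0; i < count; i++)
        {
            totalLength += Guid.NewGuid().ToString().Length; // use the string, keep nothing
        }

        long after = GC.GetTotalMemory(forceFullCollection: true);
        return after - before;
    }
}
```

With a count of 100,000, the list version should report several megabytes retained, while the streaming version should report close to zero, since every string is collectible as soon as it has been used.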

Memory Usage Comparison

Method                                   | Total Memory Allocated | Max Memory Usage at Any Given Time
Using yield return                       | 915 MB                 | 12 MB
Creating the Entire Collection at Once   | > 1 GB                 | > 1 GB

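As you can see, the difference is significant. Both versions allocate a similar total amount, because each one creates 10 million GUID strings. What differs is how long those strings stay alive. With yield return, each string becomes eligible for garbage collection as soon as it has been written to the file, so memory usage peaked at just 12 MB. The version that built the full list had to keep every string reachable until the write finished, and its usage climbed close to 2 GB. This demonstrates how yield return can reduce peak memory consumption when dealing with large datasets.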
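
The 915 MB total can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes a 64-bit .NET runtime, where a string occupies roughly 22 + 2 × length bytes (object header, length field, UTF-16 characters, and a terminating null), rounded up to a multiple of 8. That formula is an approximation of runtime internals, not a documented guarantee, and the helper names are hypothetical.

```csharp
using System;

public static class GuidMemoryEstimate
{
    // Assumption: 64-bit runtime, about 22 bytes of overhead plus 2 bytes per char, 8-byte aligned.
    public static long EstimateStringBytes(int length)
    {
        long raw = 22 + 2L * length;
        return (raw + 7) / 8 * 8; // round up to the next multiple of 8
    }

    // Total size in whole MiB for `count` strings of the given length (integer division truncates).
    public static long EstimateTotalMiB(int count, int length)
    {
        return EstimateStringBytes(length) * count / (1024 * 1024);
    }
}
```

Guid.NewGuid().ToString() produces a 36-character string, so EstimateStringBytes(36) gives 96 bytes and EstimateTotalMiB(10000000, 36) gives 915. That lines up with the table, assuming the profiler reports its total in MiB. The list version additionally pays for the list's backing array of references, which is consistent with its total exceeding 1 GB.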

When working with large datasets, using yield return can dramatically reduce memory usage by returning data one item at a time rather than holding everything in memory at once.