How to chunk data using LINQ in C#
By FoxLearn 12/30/2024 7:34:41 AM
Splitting a collection into fixed-size chunks can help control memory usage and improve performance, especially when dealing with large data sets or files.
The Chunk method is an extension method in the System.Linq namespace, available since .NET 6, and provides a simple way to break down a collection into fixed-size chunks.
public static System.Collections.Generic.IEnumerable<T[]> Chunk<T>(this System.Collections.Generic.IEnumerable<T> source, int size);
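The Chunk extension method in LINQ takes two parameters: the collection to be chunked and the size of each chunk. The first parameter is the data source (e.g., an array), and the second parameter defines the maximum number of elements in each chunk. It returns a sequence of arrays: every chunk holds exactly size elements except possibly the last, which holds whatever remains.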
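As a minimal sketch of the behavior described above, the snippet below chunks a seven-element array into groups of three and then tries an invalid size. The variable names (source, chunks, rejected) are illustrative only; the exception type follows the documented contract that size must be at least 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] source = { 1, 2, 3, 4, 5, 6, 7 };

// Each chunk is materialized as an int[]; only the last one may be shorter.
List<int[]> chunks = source.Chunk(3).ToList();
foreach (int[] chunk in chunks)
{
    Console.WriteLine($"[{string.Join(", ", chunk)}] (length {chunk.Length})");
}

// A size below 1 is rejected with ArgumentOutOfRangeException.
bool rejected = false;
try
{
    _ = source.Chunk(0).ToList();
}
catch (ArgumentOutOfRangeException)
{
    rejected = true;
}
Console.WriteLine($"Chunk(0) rejected: {rejected}");
```

Because each chunk is a plain array, chunk.Length is available without any further LINQ call.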
Chunking an Array of Integers
Consider this example where we divide an array of integers into chunks of four elements each:
int[] numbers = { 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 };
var chunks = numbers.Chunk(4);
int counter = 0;
foreach (var chunk in chunks)
{
    Console.WriteLine($"Chunk #{++counter}");
    Console.WriteLine(string.Join(", ", chunk));
}
The output shows the integers in each chunk with the chunk number indicated. The last chunk contains the remaining three integers.
Chunk #1
21, 22, 23, 24
Chunk #2
25, 26, 27, 28
Chunk #3
29, 30, 31, 32
Chunk #4
33, 34, 35
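The number of chunks and the length of the last one follow from simple integer arithmetic. The sketch below checks that arithmetic against the same 15 integers used in the example above. The names expectedChunks and expectedLast are illustrative, not part of any API.

```csharp
using System;
using System.Linq;

int[] numbers = Enumerable.Range(21, 15).ToArray(); // 21..35
int size = 4;

// Ceiling division gives the chunk count; the remainder gives the last length.
int expectedChunks = (numbers.Length + size - 1) / size;
int remainder = numbers.Length % size;
int expectedLast = remainder == 0 ? size : remainder;

int[][] actual = numbers.Chunk(size).ToArray();
Console.WriteLine($"Expected {expectedChunks} chunks, got {actual.Length}");
Console.WriteLine($"Expected last chunk length {expectedLast}, got {actual[^1].Length}");
```

For 15 elements and a size of 4, this gives 4 chunks with 3 elements in the last one.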
Chunking a List of Strings
The Chunk method can also be used with other collections, like a list of strings.
List<string> cities = new List<string> { "New York", "London", "Tokyo", "Paris", "Sydney", "Berlin", "Moscow", "Rome" };
var chunks = cities.Chunk(2);
int counter = 0;
foreach (var chunk in chunks)
{
    Console.WriteLine($"Chunk #{++counter}");
    Console.WriteLine(string.Join(", ", chunk));
}
The output shows the cities in each chunk, with each chunk containing two city names.
Chunk #1
New York, London
Chunk #2
Tokyo, Paris
Chunk #3
Sydney, Berlin
Chunk #4
Moscow, Rome
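If the mutable counter feels awkward, one possible alternative is the Select overload that passes an index alongside each element. This is only a sketch of a different style, not a replacement for the loop above. The labeled variable is a name chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var cities = new List<string> { "New York", "London", "Tokyo", "Paris", "Sydney", "Berlin", "Moscow", "Rome" };

// Select's (element, index) overload supplies a zero-based chunk index.
List<string> labeled = cities
    .Chunk(2)
    .Select((chunk, index) => $"Chunk #{index + 1}: {string.Join(", ", chunk)}")
    .ToList();

foreach (string line in labeled)
{
    Console.WriteLine(line);
}
```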
Chunking to Process Large Files
Chunking is particularly useful when working with large files. Rather than loading the entire file into memory, you can process the file in chunks.
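using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

int chunkSize = 50;
// File.ReadLines streams the file lazily instead of loading it all at once.
var lines = File.ReadLines(@"C:\path\to\largefile.csv");

foreach (var chunk in lines.Chunk(chunkSize))
{
    // Each chunk is a string[], so Length is available directly.
    Console.WriteLine($"Processing {chunk.Length} lines from the file:");
    ProcessChunk(chunk);
}

void ProcessChunk(IEnumerable<string> chunk)
{
    foreach (var line in chunk)
    {
        // Simulate processing the line (e.g., parsing CSV, performing computations)
        Console.WriteLine(line);
    }
}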
In this example:
- The File.ReadLines method reads the file line-by-line, which helps to avoid loading the entire file into memory.
- The Chunk(chunkSize) method splits the file's lines into chunks of 50 lines.
- The ProcessChunk method is called for each chunk, and each line of the chunk is processed (in this case, simply displayed).
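To try the file-processing pattern without a real CSV file, the sketch below writes 120 sample lines to a temporary file and records the length of each chunk. The temporary file, its "row N" contents, and the chunkLengths list are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical sample data: a temporary file with 120 numbered lines.
string path = Path.GetTempFileName();
File.WriteAllLines(path, Enumerable.Range(1, 120).Select(i => $"row {i}"));

var chunkLengths = new List<int>();
try
{
    // ReadLines streams the file; each chunk holds at most 50 lines.
    foreach (string[] chunk in File.ReadLines(path).Chunk(50))
    {
        chunkLengths.Add(chunk.Length);
    }
}
finally
{
    // Clean up the temporary file once enumeration has finished.
    File.Delete(path);
}

Console.WriteLine(string.Join(", ", chunkLengths)); // 50, 50, 20
```

With 120 lines and a chunk size of 50, the final chunk carries the remaining 20 lines.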
The Chunk method in LINQ is a convenient tool for handling large datasets. It breaks large collections or files into smaller, more manageable parts, and when the source is read lazily (as with File.ReadLines), only one chunk needs to be held in memory at a time.