How to chunk data using LINQ in C#

By FoxLearn 12/30/2024 7:34:41 AM
The Chunk method in LINQ splits a sequence into consecutive, fixed-size arrays so that a large dataset can be processed one batch at a time.

When the source is read lazily, only the current chunk has to be held in memory, which helps bound memory usage when dealing with large data sets or files.

The Chunk method is an extension method in the System.Linq namespace, available since .NET 6, and provides a simple way to break down a collection into fixed-size chunks.

public static System.Collections.Generic.IEnumerable<T[]> Chunk<T>(this System.Collections.Generic.IEnumerable<T> source, int size);

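The Chunk extension method in LINQ takes two parameters: the collection to be chunked and the size of each chunk. The first parameter is the data source (e.g., an array), and the second parameter defines the maximum number of elements in each chunk. It returns a sequence of arrays: every chunk holds exactly size elements except possibly the last, which holds whatever remains.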
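The edge cases implied by that signature can be sketched as follows. This is a minimal illustration assuming a .NET 6 or later console project; the variable names (source, oversized, emptyCount, rejected) are made up for the example.

```csharp
using System;
using System.Linq;

int[] source = { 1, 2, 3 };

// A size larger than the source yields a single, shorter chunk.
int[][] oversized = source.Chunk(10).ToArray();
Console.WriteLine(oversized.Length);      // 1
Console.WriteLine(oversized[0].Length);   // 3

// An empty source yields no chunks at all.
int emptyCount = Array.Empty<int>().Chunk(2).Count();
Console.WriteLine(emptyCount);            // 0

// A size below 1 is rejected with ArgumentOutOfRangeException.
bool rejected = false;
try
{
    _ = source.Chunk(0).ToArray();
}
catch (ArgumentOutOfRangeException)
{
    rejected = true;
}
Console.WriteLine(rejected);              // True
```

Validating the chunk size before calling Chunk is usually simpler than catching the exception, for example when the size comes from configuration.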

Chunking an Array of Integers

Consider this example where we divide an array of integers into chunks of four elements each:

int[] numbers = { 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 };
var chunks = numbers.Chunk(4);
int counter = 0;
foreach (var chunk in chunks)
{
    Console.WriteLine($"Chunk #{++counter}");
    Console.WriteLine(string.Join(", ", chunk));
}

The output shows the integers in each chunk with the chunk number indicated. The last chunk contains the remaining three integers.

Chunk #1
21, 22, 23, 24
Chunk #2
25, 26, 27, 28
Chunk #3
29, 30, 31, 32
Chunk #4
33, 34, 35
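The chunk count and the size of the final chunk follow from simple arithmetic: ceiling division for the count and the remainder for the last chunk. The sketch below checks this against Chunk for the same 15 values; expectedChunks and remainder are illustrative names, not part of any API.

```csharp
using System;
using System.Linq;

int[] numbers = Enumerable.Range(21, 15).ToArray(); // 21..35, the same values as above
int size = 4;

// Ceiling division gives the number of chunks.
int expectedChunks = (numbers.Length + size - 1) / size;
// The remainder gives the size of the final chunk (0 would mean every chunk is full).
int remainder = numbers.Length % size;

int[][] chunks = numbers.Chunk(size).ToArray();
Console.WriteLine($"{chunks.Length} chunks, last has {chunks[^1].Length} elements");
// prints "4 chunks, last has 3 elements"
```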

Chunking a List of Strings

The Chunk method can also be used with other collections, like a list of strings.

List<string> cities = new List<string> { "New York", "London", "Tokyo", "Paris", "Sydney", "Berlin", "Moscow", "Rome" };
var chunks = cities.Chunk(2);
int counter = 0;
foreach (var chunk in chunks)
{
    Console.WriteLine($"Chunk #{++counter}");
    Console.WriteLine(string.Join(", ", chunk));
}

The output shows the cities in each chunk, with each chunk containing two city names.

Chunk #1
New York, London
Chunk #2
Tokyo, Paris
Chunk #3
Sydney, Berlin
Chunk #4
Moscow, Rome
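As a variation on the example above, the manual counter can be replaced with the index overload of Select. This is only a sketch of one possible style; the labelled variable is a name chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var cities = new List<string> { "New York", "London", "Tokyo", "Paris", "Sydney", "Berlin", "Moscow", "Rome" };

// Each chunk is a string[]; Select's second lambda parameter is the zero-based chunk index.
List<string> labelled = cities
    .Chunk(2)
    .Select((chunk, index) => $"Chunk #{index + 1}: " + string.Join(", ", chunk))
    .ToList();

foreach (string line in labelled)
{
    Console.WriteLine(line);
}
```

Because the index comes from Select, the numbering stays correct even if the query is enumerated more than once.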

Chunking to Process Large Files

Chunking is particularly useful when working with large files. Rather than loading the entire file into memory, you can process the file in chunks.

int chunkSize = 50;
var lines = File.ReadLines(@"C:\path\to\largefile.csv");

foreach (var chunk in lines.Chunk(chunkSize))
{
    Console.WriteLine($"Processing {chunk.Length} lines from the file:");
    ProcessChunk(chunk);
}

void ProcessChunk(IEnumerable<string> chunk)
{
    foreach (var line in chunk)
    {
        // Simulate processing the line (e.g., parsing CSV, performing computations)
        Console.WriteLine(line);
    }
}

In this example:

  • The File.ReadLines method reads the file lazily, one line at a time, rather than loading the entire file into memory as File.ReadAllLines would.
  • The Chunk(chunkSize) method groups the file's lines into chunks of up to 50 lines; the last chunk may be shorter.
  • The ProcessChunk method is called for each chunk, and each line of the chunk is processed (in this case, simply displayed).
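To try the pattern without a real CSV file, the following self-contained sketch writes 120 sample lines to a temporary file and records the size of each chunk. The file contents and the chunkSizes list are assumptions made for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical setup: write 120 sample lines to a temporary file.
string path = Path.GetTempFileName();
File.WriteAllLines(path, Enumerable.Range(1, 120).Select(i => $"row{i},value{i}"));

var chunkSizes = new List<int>();
try
{
    // ReadLines streams the file; Chunk buffers at most 50 lines at a time.
    foreach (string[] chunk in File.ReadLines(path).Chunk(50))
    {
        chunkSizes.Add(chunk.Length);
    }
}
finally
{
    // Clean up the temporary file once the reader has been disposed.
    File.Delete(path);
}

Console.WriteLine(string.Join(", ", chunkSizes)); // prints "50, 50, 20"
```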

The Chunk method in LINQ is a concise way to process large datasets in batches. It breaks large collections or files into smaller parts of a fixed maximum size, which keeps memory use predictable when the source is streamed rather than loaded all at once.