Lesson 22 • Advanced Track

Parallel Programming with Parallel.For & PLINQ

By the end of this lesson you'll take a CPU-bound loop that crawls on one core and spread it across all your cores with Parallel.For, Parallel.ForEach, and PLINQ's .AsParallel(). You'll also do it safely, because you'll know exactly how to stop two threads corrupting the same variable. This is how you turn an 8-core machine into a real speed-up (approaching 8x for well-suited work) instead of a pile of subtle bugs.

What You'll Learn

Parallelise CPU-bound loops with Parallel.For, Parallel.ForEach and Parallel.Invoke
Control how many threads run at once with MaxDegreeOfParallelism
Turn any LINQ query parallel with .AsParallel() — and restore order with .AsOrdered()
Spot and fix data races with Interlocked, lock, and concurrent collections
Sum in parallel safely using thread-local state and a single atomic merge
Know when NOT to parallelise — and why parallel is the wrong tool for I/O

Before you start: finish Lesson 12: Async & Await first. That lesson is about async I/O — staying responsive while you wait on networks, files and databases (work where the CPU is mostly idle). This lesson is the opposite: parallelism for CPU-bound work — keeping every core busy crunching numbers. Use async/await when you're waiting; use Parallel/PLINQ when you're computing.

💡 Real-World Analogy

Picture a supermarket with one long queue of shoppers. Sequential code is a single checkout till: every shopper waits for the one in front, and the queue only moves as fast as that one cashier. Parallel code is opening more tills — eight cashiers serving the same queue at once, so the line clears roughly eight times faster. Parallel.For and PLINQ are the store manager shouting "open more tills!" — they automatically split the shoppers across however many cashiers (CPU cores) you have. But watch the shared till roll: if two cashiers reach for the same cash drawer at the same moment, the total goes wrong. That collision is a data race, and most of this lesson is about giving each cashier the right kind of drawer.

The mental model: parallel = many workers, async = one worker not waiting

These two ideas get confused constantly, so pin them down now. They solve different problems and you pick by asking one question: is my program busy, or is it waiting?

🧮 CPU-bound — the work keeps a core busy (maths, image filters, parsing, encryption). More cores finish it faster, so you want parallelism: Parallel.For / PLINQ.
⏳ I/O-bound — the work is mostly waiting on something external (HTTP, disk, database). More cores don't help; you want async so the thread isn't blocked while it waits.
🧵 Thread — a worker the OS schedules onto a CPU core. Parallel code uses the thread pool to run many bodies at once.
💥 Data race — two threads touching the same variable at the same time, where at least one writes. The result becomes unpredictable and usually wrong.

The golden rule: parallelise CPU work, await I/O work. Running Parallel.ForEach over a list of HTTP calls is a classic mistake — it ties up thread-pool threads doing nothing but waiting, which is exactly what async was built to avoid.

📊 Parallelism Toolbox (and async, for contrast)

Tool	Use it for	Example
Parallel.For	CPU loop over a number range	Parallel.For(0, n, i => ...)
Parallel.ForEach	CPU loop over a collection	Parallel.ForEach(items, x => ...)
Parallel.Invoke	Run a few different methods at once	Parallel.Invoke(A, B, C)
.AsParallel()	Make a LINQ query parallel (PLINQ)	q.AsParallel().Where(...)
.AsOrdered()	Keep original order in PLINQ output	.AsParallel().AsOrdered()
Interlocked	Atomic counter / sum, lock-free	Interlocked.Add(ref t, x)
lock	Guard a multi-step shared update	lock (gate) { ... }
ConcurrentDictionary	Thread-safe map, no locks needed	dict.AddOrUpdate(k, 1, ...)
async / await	I/O waiting (the other lesson)	await client.GetAsync(url)

Reach for the Parallel/PLINQ rows when the CPU is busy; reach for the last row when the program is just waiting.

1. Parallel.For, Parallel.ForEach & Parallel.Invoke

Parallel.For takes a start and an exclusive end index and runs the loop body across thread-pool threads; each iteration may land on a different core, in any order. Parallel.ForEach does the same over a collection, and Parallel.Invoke runs several different methods, potentially at the same time. All three block the calling thread until every piece of work has finished. Because iterations run simultaneously, you must never assume an order: in the worked example below every iteration writes to its own array slot (safe), and the printed lines come out scrambled. Read it, run it twice, and notice the order changes but the work always completes.
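Before the full worked example, here is a minimal sketch of the "one slot per iteration" idea and of Parallel.Invoke waiting for all of its actions. The class and method names (SlotSketch, SquareRoots, CountEvensAndOdds) are hypothetical, chosen only for illustration; the sketch assumes nothing beyond the base class library.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helpers for illustration; not part of the lesson's own code.
static class SlotSketch
{
    // Each iteration writes only results[i], so no two iterations share a
    // variable and there is no data race, whatever order they run in.
    public static double[] SquareRoots(int n)
    {
        var results = new double[n];
        Parallel.For(0, n, i => results[i] = Math.Sqrt(i)); // i runs 0..n-1 (end is exclusive)
        return results; // Parallel.For has already waited for every iteration
    }

    // Parallel.Invoke runs independent actions, possibly at the same time,
    // and returns only when all of them have finished. Each action writes
    // its own variable, so the two never collide.
    public static (int Evens, int Odds) CountEvensAndOdds(int[] data)
    {
        int evens = 0, odds = 0;
        Parallel.Invoke(
            () => { foreach (int x in data) if (x % 2 == 0) evens++; },
            () => { foreach (int x in data) if (x % 2 != 0) odds++; });
        return (evens, odds);
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(string.Join(", ", SlotSketch.SquareRoots(5)));
        var (evens, odds) = SlotSketch.CountEvensAndOdds(new[] { 1, 2, 3, 4, 5 });
        Console.WriteLine($"evens={evens}, odds={odds}"); // evens=2, odds=3
    }
}
```

Giving each parallel action its own output variable (or array slot) is the simplest way to avoid races: there is nothing to lock because nothing is shared.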

Worked example: Parallel.For / ForEach / Invoke

Compare a sequential loop to Parallel.For, then fan out work across cores. Run it twice — the order changes.

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    // A deliberately heavy, CPU-bound calculation so the cores have real work.
    static double HeavyComputation(int n)
    {
        double result = 0;
        for (int i = 0; i < 200_000; i++)
            result += Math.Sqrt(i * (n + 1));
        return result;
    }

    static void Main()
    {
        const int items = 100;

        // SEQUENTIAL: one core, one item at a time. Th
...

Your turn. The program below ships and invoices six orders in parallel. Fill in the three ___ blanks to call Parallel.ForEach and Parallel.For correctly. The printed order will be different every run — that's expected — so you check the count at the end, not the order.

🎯 Your turn: process items with Parallel.For / ForEach

Fill in ForEach and Length. The line order varies, but the count is always 6.

using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // 🎯 YOUR TURN — fill in the blanks marked with ___
        // Goal: process 6 orders in parallel. Order of printing is RANDOM,
        // so we print a COUNT at the end, not order-sensitive output.

        string[] orders = { "A1", "B2", "C3", "D4", "E5", "F6" };

        // 1) Use Parallel.ForEach to walk the array across threads.
        Parallel.___(orders, order =>    
...

2. PLINQ — Parallel LINQ

Already know LINQ? Then you already know PLINQ. Drop .AsParallel() into any LINQ-to-Objects query (one over an in-memory collection) and the engine partitions the data across cores and runs your .Where/.Select/.Sum on each chunk. Order-independent reductions like .Count() and .Sum() are the sweet spot: for integers the answer is identical to the sequential version, just faster. (A floating-point .Sum() can differ in the last few digits, because adding doubles in a different order can round differently.) The catch: by default the order of streamed results is arbitrary, so add .AsOrdered() when sequence matters.
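A small sketch of the difference .AsOrdered() makes, assuming an in-memory int[] as the source. PlinqSketch and its method names are hypothetical. The ordered query returns squares in source order; the unordered one returns the same elements in no guaranteed order; the integer sum is the same either way.

```csharp
using System;
using System.Linq;

// Hypothetical helpers for illustration only.
static class PlinqSketch
{
    // AsOrdered() makes PLINQ return results in source order (at some cost).
    public static int[] SquaresInOrder(int[] source) =>
        source.AsParallel().AsOrdered().Select(x => x * x).ToArray();

    // Same elements without AsOrdered(), but their order is not guaranteed.
    public static int[] SquaresAnyOrder(int[] source) =>
        source.AsParallel().Select(x => x * x).ToArray();

    // An integer Sum is order-independent, so it matches the sequential
    // result on every run. Widening to long avoids int overflow on big inputs.
    public static long SumOfSquares(int[] source) =>
        source.AsParallel().Select(x => (long)x * x).Sum();
}

class Program
{
    static void Main()
    {
        int[] data = Enumerable.Range(1, 10).ToArray();
        Console.WriteLine(string.Join(" ", PlinqSketch.SquaresInOrder(data))); // 1 4 9 16 25 36 49 64 81 100
        Console.WriteLine(PlinqSketch.SumOfSquares(data));                     // 385
    }
}
```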

Worked example: .AsParallel(), .AsOrdered(), .Sum()

Count primes in parallel, preserve order with AsOrdered, and sum squares with PLINQ.

using System;
using System.Diagnostics;
using System.Linq;

class Program
{
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i <= Math.Sqrt(n); i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        int range = 1_000_000;

        // Sequential LINQ — single-threaded.
        var sw = Stopwatch.StartNew();
        int seqCount = Enumerable.Range(2, range).Where(IsPrime).Count();
        sw.Stop(
...

Now you write a PLINQ pipeline. Take 1 to 100, keep the even numbers, square them, and sum the result — all in parallel. Because Sum doesn't care about order, the total is the same on every run. Fill in the four ___ blanks:

🎯 Your turn: PLINQ Where + Select + Sum

Make it parallel, filter evens, square, and sum — total should be 171700.

using System;
using System.Linq;

class Program
{
    static void Main()
    {
        // 🎯 YOUR TURN — fill in the blanks marked with ___
        // Goal: take 1..100, keep the EVEN numbers, square them, and SUM them
        //       — all in PARALLEL. Sum is order-independent, so PLINQ is ideal
        //       and the TOTAL is the same on every run.

        int[] numbers = Enumerable.Range(1, 100).ToArray();

        long total = numbers
            // 1) Turn the query parallel.
          
...

3. Thread Safety: Races, Interlocked, lock & Concurrent Collections

Here's where parallel code bites. The instant two threads write to the same variable, you have a data race. The classic example is count++: it's secretly three steps (read, add one, write back), so two threads can read the same value and both write back the same result, silently losing increments. The worked example below shows the broken version (the total almost always comes out too low, by a different amount each run), then three fixes: Interlocked for atomic counters, lock for multi-step updates, and ConcurrentDictionary for a thread-safe map.
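Here is a compact sketch of the three fixes side by side, packaged as testable methods. RaceFixes and its method names are hypothetical. The min/max example uses a lock because "compare, then write" is a multi-step update that no single Interlocked call performs; taking a lock on every iteration is deliberately simple here, not fast.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helpers for illustration only.
static class RaceFixes
{
    // Fix 1: Interlocked.Increment does read-add-write as one atomic step.
    public static int CountWithInterlocked(int n)
    {
        int count = 0;
        Parallel.For(0, n, _ => Interlocked.Increment(ref count));
        return count;
    }

    // Fix 2: a lock makes a multi-step update (compare, then write) run on
    // one thread at a time, so min and max are never overwritten mid-update.
    public static (int Min, int Max) MinMaxWithLock(int[] data)
    {
        int min = int.MaxValue, max = int.MinValue;
        object gate = new object();
        Parallel.ForEach(data, x =>
        {
            lock (gate)
            {
                if (x < min) min = x;
                if (x > max) max = x;
            }
        });
        return (min, max);
    }

    // Fix 3: ConcurrentDictionary handles concurrent updates to a shared map.
    // AddOrUpdate may call the update delegate more than once under
    // contention, so keep that delegate free of side effects.
    public static ConcurrentDictionary<string, int> WordCounts(string[] words)
    {
        var counts = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(words, w => counts.AddOrUpdate(w, 1, (_, old) => old + 1));
        return counts;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(RaceFixes.CountWithInterlocked(1_000_000)); // 1000000
        var (min, max) = RaceFixes.MinMaxWithLock(new[] { 5, -2, 9, 0 });
        Console.WriteLine($"min={min}, max={max}");                    // min=-2, max=9
        var counts = RaceFixes.WordCounts(new[] { "a", "b", "a", "c", "a" });
        Console.WriteLine(counts["a"]);                                // 3
    }
}
```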

Worked example: fix a data race three ways

See count++ lose increments, then fix it with Interlocked, lock, and ConcurrentDictionary.

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // ❌ WRONG — many threads do unsafeCount++ on the SAME variable.
        // ++ is read-modify-write: two threads can read the same value and
        // both write back the same +1, so increments are LOST (a data race).
        int unsafeCount = 0;
        Parallel.For(0, 1_000_000, _ => { unsafeCount++; });
        Conso
...

4. Fast Parallel Sums with Thread-Local State

Locking on every single iteration is correct but slow: the threads spend their time queuing for the lock instead of computing. The professional pattern uses the Parallel.For<TLocal> overload that takes three delegates (localInit, body and localFinally): each worker task keeps a private running subtotal (zero contention), and only when that task finishes does it merge its subtotal into the shared grand total, once, atomically. Notice the key property: the workers finish in random order, but the final total is exactly the same every run. Determinism of the result, non-determinism of the schedule.
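Because the worked example below is cut short in this copy, here is a self-contained sketch of the same thread-local pattern. ThreadLocalSum and Sum are hypothetical names; the overload used is Parallel.For<TLocal>(int, int, Func<TLocal>, Func<int, ParallelLoopState, TLocal, TLocal>, Action<TLocal>).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
static class ThreadLocalSum
{
    // Sums from..toInclusive with the three-delegate Parallel.For<TLocal> overload.
    public static long Sum(int from, int toInclusive)
    {
        long grandTotal = 0;
        Parallel.For<long>(
            from, toInclusive + 1,                               // end index is exclusive
            () => 0L,                                            // localInit: each worker task starts its own subtotal at 0
            (i, loopState, subtotal) => subtotal + i,            // body: update the private subtotal only
            subtotal => Interlocked.Add(ref grandTotal, subtotal)); // localFinally: one atomic merge per task
        return grandTotal;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(ThreadLocalSum.Sum(1, 1_000_000)); // 500000500000
    }
}
```

The body returns the new subtotal instead of writing to shared state, which is what keeps the hot loop contention-free; only localFinally touches grandTotal.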

Worked example: thread-local subtotal + atomic merge

Sum 1..1,000,000 with private subtotals and one Interlocked.Add per worker task.

using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // The 3-delegate Parallel.For<TLocal> overload (localInit, body,
        // localFinally) is the FAST way to total things. Each worker task keeps
        // a private 'subtotal' (no contention), then merges it into the shared
        // grand total ONCE, under Interlocked.
        long grandTotal = 0;

        Parallel.For(
            1, 1_000_001,                 // sum 1..1_000_000 (end is exclusive)
            () => 
...

5. Exceptions: AggregateException

When a parallel body throws, the failure can't just bubble up one stack, because several threads might fail at once. So the Parallel methods and PLINQ collect every error into a single AggregateException. You catch that one type, then loop over its .InnerExceptions to see each underlying problem. (After the first throw, Parallel.For and Parallel.ForEach stop starting new iterations, but iterations already in flight run to completion, so several failures can be collected, and exactly which ones can vary from run to run.)
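A short sketch of unpacking an AggregateException, using Parallel.Invoke rather than Parallel.For because Invoke waits for every action, including ones that throw, so the number of inner exceptions is predictable. ErrorSketch, CollectFailures and the exception messages are hypothetical.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
static class ErrorSketch
{
    // Returns "TypeName: message" for every failure, sorted for stable output.
    public static string[] CollectFailures()
    {
        try
        {
            Parallel.Invoke(
                () => throw new InvalidOperationException("resize failed"),
                () => throw new ArgumentException("bad filter"),
                () => { /* this action succeeds */ });
            return Array.Empty<string>();
        }
        catch (AggregateException ex)
        {
            // Each inner exception is an original exception, with its type intact.
            return ex.InnerExceptions
                     .Select(e => $"{e.GetType().Name}: {e.Message}")
                     .OrderBy(s => s, StringComparer.Ordinal) // failures can arrive in any order
                     .ToArray();
        }
    }
}

class Program
{
    static void Main()
    {
        foreach (string failure in ErrorSketch.CollectFailures())
            Console.WriteLine(failure);
        // ArgumentException: bad filter
        // InvalidOperationException: resize failed
    }
}
```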

Worked example: catch AggregateException

Two iterations are set to throw; one catch unpacks whichever of them ran (usually both on a multi-core machine) via InnerExceptions.

using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // When a parallel body throws, the failures from all threads are
        // bundled into ONE AggregateException. Catch it and inspect
        // .InnerExceptions to see each underlying error.
        try
        {
            Parallel.For(0, 10, i =>
            {
                if (i == 3 || i == 7)
                    throw new InvalidOperationException($"item {i} failed");
            });
     
...

🔎 Deep Dive: when NOT to parallelise

Parallelism is not free. Splitting the work, scheduling threads, and merging results all cost time and memory. If the per-item work is tiny or the collection is small, that overhead can make the parallel version slower than the simple loop. Don't guess — measure with a Stopwatch before and after.
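A minimal measuring sketch, assuming a deliberately cheap per-item operation (multiply by 3) so that parallel overhead is visible. MeasureSketch and Compare are hypothetical names. The two totals must always match; which version is faster depends on your machine and the input size, so no timings are predicted here.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper for illustration only.
static class MeasureSketch
{
    // Runs the same cheap workload sequentially and with PLINQ, timing each.
    // A single run includes JIT warm-up, so repeat it before drawing conclusions.
    public static (long SeqTotal, long ParTotal, TimeSpan SeqTime, TimeSpan ParTime) Compare(int count)
    {
        int[] data = Enumerable.Range(0, count).ToArray();

        var sw = Stopwatch.StartNew();
        long seq = data.Select(x => (long)x * 3).Sum();
        sw.Stop();
        TimeSpan seqTime = sw.Elapsed;

        sw.Restart();
        long par = data.AsParallel().Select(x => (long)x * 3).Sum();
        sw.Stop();
        TimeSpan parTime = sw.Elapsed;

        return (seq, par, seqTime, parTime);
    }
}

class Program
{
    static void Main()
    {
        foreach (int n in new[] { 1_000, 5_000_000 })
        {
            var r = MeasureSketch.Compare(n);
            Console.WriteLine($"n={n}: sequential {r.SeqTime.TotalMilliseconds:F2} ms, " +
                              $"parallel {r.ParTime.TotalMilliseconds:F2} ms, totals equal: {r.SeqTotal == r.ParTotal}");
        }
    }
}
```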

Skip parallelism when:

📉 The work is small: a few thousand cheap operations often finish sequentially in less time than it takes to split them up and schedule them across threads.
🌐 The work is I/O-bound — HTTP, disk, database. Use async/await + Task.WhenAll, not Parallel.ForEach; otherwise you block thread-pool threads doing nothing but waiting.
🔗 Iterations depend on each other — if step i needs the result of step i-1, the work is inherently sequential.
🧱 There's heavy shared state — if every iteration must take the same lock, the threads serialise anyway and you've added overhead for nothing.

// CPU-bound + independent + big   -> Parallel / PLINQ
Parallel.For(0, 1_000_000, i => Crunch(i));

// I/O-bound (waiting on the network) -> async, NOT Parallel
var pages = await Task.WhenAll(urls.Select(u => http.GetStringAsync(u)));

Putting It Together: a Batch Image Processor

Here's a realistic CPU-bound job: process 200 "images", each needing heavy per-item work. It uses Parallel.ForEach capped to Environment.ProcessorCount cores, collects results in a thread-safe ConcurrentBag, and reports order-independent stats. You understand every line now — the parallelism, the core cap, the thread-safe collection, and why the totals are deterministic.

Worked example: batch-process 200 images in parallel

Cap to core count, collect into a ConcurrentBag, report deterministic totals.

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    // Pretend each "image" needs heavy CPU work (resize/filter).
    static long ProcessImage(int id)
    {
        long checksum = 0;
        for (int i = 0; i < 50_000; i++)
            checksum += (long)Math.Sqrt(i * (id + 1));
        return checksum;
    }

    static void Main()
    {
        int[] imageIds = Enumerable.Range(1, 200).ToArray();

        // Collect per-image
...

The images finish in a random order, but Count, Sum and Max are order-independent — so the reported numbers are identical on every run.
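The same shape in miniature, since the worked example above is cut short in this copy: a core-capped Parallel.ForEach that collects into a ConcurrentBag and reports order-independent stats. BatchSketch, RunBatch and ProcessItem are hypothetical, and ProcessItem (squaring the id) stands in for real per-image work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helpers for illustration only.
static class BatchSketch
{
    // Stand-in for heavy per-item CPU work.
    static long ProcessItem(int id) => (long)id * id;

    // Assumes itemCount >= 1 (Max() throws on an empty bag).
    public static (int Count, long Sum, long Max) RunBatch(int itemCount)
    {
        var results = new ConcurrentBag<long>(); // safe for concurrent Add
        var options = new ParallelOptions
        {
            // Upper bound on concurrent iterations; the scheduler may use fewer.
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.ForEach(Enumerable.Range(1, itemCount), options,
            id => results.Add(ProcessItem(id)));

        // Count, Sum and Max ignore order, so these are the same on every run.
        return (results.Count, results.Sum(), results.Max());
    }
}

class Program
{
    static void Main()
    {
        var (count, sum, max) = BatchSketch.RunBatch(200);
        Console.WriteLine($"count={count}, sum={sum}, max={max}"); // count=200, sum=2686700, max=40000
    }
}
```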

Pro Tips

💡 Measure before you parallelise: wrap both versions in a Stopwatch. Parallel overhead means small or cheap workloads are often slower in parallel.
💡 Cap at the core count: set MaxDegreeOfParallelism = Environment.ProcessorCount for pure CPU work — more threads than cores just adds context-switching.
💡 Prefer Interlocked over lock for a single counter or sum: Interlocked.Add/Increment is lock-free and much faster than wrapping ++ in a lock.
💡 Use the thread-local overload for reductions: private subtotals merged once per thread beat locking on every iteration by a wide margin.
💡 Chunk cheap iterations: Partitioner.Create(0, count, chunkSize) hands each thread a whole index range at a time, cutting per-iteration scheduling overhead when each item is cheap.
💡 PLINQ's sweet spot is large data sets (tens of thousands of items) with CPU-heavy per-item work. For small collections, plain sequential LINQ wins.
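A sketch of the chunking tip: Partitioner.Create(fromInclusive, toExclusive, rangeSize) produces Tuple<int, int> ranges, and each Parallel.ForEach body runs an ordinary for loop over one range. ChunkSketch and SumInChunks are hypothetical names, and the chunk sizes in Main are arbitrary examples rather than recommendations.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
static class ChunkSketch
{
    // Sums 0..count-1, paying the delegate and scheduling cost once per chunk
    // instead of once per index. chunkSize must be greater than 0.
    public static long SumInChunks(int count, int chunkSize)
    {
        long total = 0;
        var ranges = Partitioner.Create(0, count, chunkSize); // yields Tuple<int, int> ranges [from, to)
        Parallel.ForEach(ranges, range =>
        {
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++)   // plain loop inside the chunk
                local += i;
            Interlocked.Add(ref total, local);                // one atomic merge per chunk
        });
        return total;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(ChunkSketch.SumInChunks(1_000_000, 10_000)); // 499999500000
    }
}
```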

Common Errors (and the fix)

Data race on shared state: two threads writing the same variable (total += x, adding to a plain List<T>) corrupts it. Fix with Interlocked, a lock, or a concurrent collection like ConcurrentBag/ConcurrentDictionary.
i++ / count++ across threads loses increments: ++ is read-modify-write, not atomic, so the total comes out too low. Use Interlocked.Increment(ref count).
Using Parallel for I/O-bound work: Parallel.ForEach over HTTP/disk calls blocks thread-pool threads while they just wait. Use async/await with Task.WhenAll instead.
Exceptions wrapped in AggregateException: a throw inside a parallel body surfaces as one AggregateException, not the original type. Catch AggregateException and inspect .InnerExceptions.
Assuming output order: parallel iterations run in a non-deterministic order, so printed lines come out scrambled. If sequence matters, add .AsOrdered() (PLINQ) or sort the collected results afterwards.
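Over-parallelising: MaxDegreeOfParallelism = 100 on an 8-core box lets the loop oversubscribe the CPU, so threads compete for cores and context-switch instead of computing. For CPU-bound work, match it to Environment.ProcessorCount.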
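For the "sort the collected results afterwards" fix, one approach is to tag each result with its input index, collect into a thread-safe ConcurrentBag (never a plain List<T>), and order by that index at the end. OrderSketch and UpperCaseInInputOrder are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
static class OrderSketch
{
    // Items finish in any order, so each result carries its original index;
    // sorting by that index afterwards restores the input order.
    public static string[] UpperCaseInInputOrder(string[] items)
    {
        var bag = new ConcurrentBag<(int Index, string Value)>();
        Parallel.For(0, items.Length, i => bag.Add((i, items[i].ToUpperInvariant())));
        return bag.OrderBy(r => r.Index).Select(r => r.Value).ToArray();
    }
}

class Program
{
    static void Main()
    {
        string[] result = OrderSketch.UpperCaseInInputOrder(new[] { "pear", "apple", "fig" });
        Console.WriteLine(string.Join(", ", result)); // PEAR, APPLE, FIG
    }
}
```

When the output size is known up front, writing into a pre-sized array at results[i] achieves the same thing without any sort.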

📋 Quick Reference

Task	Code	Notes
Parallel range loop	Parallel.For(0, n, i => ...)	end is exclusive
Parallel collection loop	Parallel.ForEach(items, x => ...)	order varies
Several methods at once	Parallel.Invoke(A, B, C)	fire-and-wait
Limit threads	new ParallelOptions { MaxDegreeOfParallelism = n }	cap to cores
Parallel LINQ	q.AsParallel().Where(...).Sum()	PLINQ
Keep order in PLINQ	.AsParallel().AsOrdered()	small cost
Atomic counter	Interlocked.Increment(ref n)	lock-free
Atomic add	Interlocked.Add(ref total, x)	lock-free
Guard a block	lock (gate) { ... }	one thread inside
Thread-safe map	dict.AddOrUpdate(k, 1, (_, v) => v + 1)	ConcurrentDictionary
Catch parallel errors	catch (AggregateException ex)	see InnerExceptions

Frequently Asked Questions

Q: What's the difference between parallel and async?

Parallel uses many threads to finish CPU work faster (more cashiers). Async stops a thread from blocking while it waits on I/O, so it can do other work in the meantime (one cashier who never stands idle). Parallelise computation; await network/disk/database. Mixing them up, e.g. Parallel.ForEach over HTTP calls, wastes threads.

Q: Why is my output in a different order every time I run it?

Because parallel iterations run on different threads with no guaranteed schedule. That's normal and expected. If order matters, use PLINQ's .AsOrdered(), or collect into a list and sort it afterwards. For totals and counts, order doesn't matter at all.

Q: My parallel sum gives a different (too-low) total each run — why?

That's a data race. A plain total += x across threads loses updates because += isn't atomic. Use Interlocked.Add(ref total, x), or the thread-local Parallel.For overload that sums private subtotals and merges once. The fixed version is deterministic — same total every run.

Q: Is parallel always faster?

No. Splitting work and scheduling threads has overhead, so for small or cheap workloads the simple loop wins. Parallelism pays off when the data is large and each item does real CPU work. Always measure both with a Stopwatch before committing.

Mini-Challenge: Thread-Safe Parallel Sum

No blanks this time — just a brief and an outline. Sum every number from 1 to 1,000,000 using Parallel.For, but do it safely: a plain total += i is a data race, so make the add atomic with Interlocked.Add (or use the thread-local subtotal overload for extra credit). The whole point: even though the threads run in a random order, the total is identical on every run. Run it and check against the expected line in the comments.

🎯 Mini-Challenge: deterministic parallel sum

Sum 1..1,000,000 in parallel with a thread-safe total — should always print 500000500000.

using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // 🎯 MINI-CHALLENGE: Parallel sum of a range, the THREAD-SAFE way
        // 1. You want the sum of all numbers from 1 to 1_000_000.
        // 2. Declare a shared 'long total = 0;'.
        // 3. Use Parallel.For(1, 1_000_001, i => { ... }) to add each i to
        //    'total'. A plain 'total += i;' is a DATA RACE — instead make the
        //    add atomic with Interlock
...

🎉 Lesson Complete

✅ Parallel.For / Parallel.ForEach / Parallel.Invoke spread CPU-bound work across all cores
✅ MaxDegreeOfParallelism caps how many threads run at once — match it to Environment.ProcessorCount
✅ PLINQ: .AsParallel() makes any query parallel; .AsOrdered() restores element order
✅ Output order is non-deterministic, but counts, sums and other reductions are deterministic
✅ Fix data races with Interlocked (lock-free), lock (multi-step), or concurrent collections
✅ Parallel errors arrive as one AggregateException — inspect .InnerExceptions
✅ Don't parallelise small, sequential, or I/O-bound work — async is the tool for I/O
✅ Next lesson: Memory Management & Garbage Collector Internals
