
    Lesson 22 • Advanced Track

    Parallel Programming with Parallel.For & PLINQ

    By the end of this lesson you'll take a CPU-bound loop that crawls on one core and spread it across all your cores with Parallel.For, Parallel.ForEach, and PLINQ's .AsParallel(). You'll do it safely, too, because you'll know exactly how to stop two threads corrupting the same variable. This is how you get close to an 8x speed-up out of an 8-core machine instead of a pile of subtle bugs.

    What You'll Learn

    • Parallelise CPU-bound loops with Parallel.For, Parallel.ForEach and Parallel.Invoke
    • Control how many threads run at once with MaxDegreeOfParallelism
    • Turn any LINQ query parallel with .AsParallel() — and restore order with .AsOrdered()
    • Spot and fix data races with Interlocked, lock, and concurrent collections
    • Sum in parallel safely using thread-local state and a single atomic merge
    • Know when NOT to parallelise — and why parallel is the wrong tool for I/O

    💡 Real-World Analogy

    Picture a supermarket with one long queue of shoppers. Sequential code is a single checkout till: every shopper waits for the one in front, and the queue only moves as fast as that one cashier. Parallel code is opening more tills — eight cashiers serving the same queue at once, so the line clears roughly eight times faster. Parallel.For and PLINQ are the store manager shouting "open more tills!" — they automatically split the shoppers across however many cashiers (CPU cores) you have. But watch the shared till roll: if two cashiers reach for the same cash drawer at the same moment, the total goes wrong. That collision is a data race, and most of this lesson is about giving each cashier the right kind of drawer.

    The mental model: parallel = many workers, async = one worker not waiting

    These two ideas get confused constantly, so pin them down now. They solve different problems and you pick by asking one question: is my program busy, or is it waiting?

    • 🧮 CPU-bound — the work keeps a core busy (maths, image filters, parsing, encryption). More cores finish it faster, so you want parallelism: Parallel.For / PLINQ.
    • I/O-bound — the work is mostly waiting on something external (HTTP, disk, database). More cores don't help; you want async so the thread isn't blocked while it waits.
    • 🧵 Thread — a worker the OS schedules onto a CPU core. Parallel code uses the thread pool to run many bodies at once.
    • 💥 Data race — two threads touching the same variable at the same time, where at least one writes. The result becomes unpredictable and usually wrong.

    The golden rule: parallelise CPU work, await I/O work. Running Parallel.ForEach over a list of HTTP calls is a classic mistake — it ties up thread-pool threads doing nothing but waiting, which is exactly what async was built to avoid.

    📊 Parallelism Toolbox (and async, for contrast)

    | Tool | Use it for | Example |
    |---|---|---|
    | Parallel.For | CPU loop over a number range | Parallel.For(0, n, i => ...) |
    | Parallel.ForEach | CPU loop over a collection | Parallel.ForEach(items, x => ...) |
    | Parallel.Invoke | Run a few different methods at once | Parallel.Invoke(A, B, C) |
    | .AsParallel() | Make a LINQ query parallel (PLINQ) | q.AsParallel().Where(...) |
    | .AsOrdered() | Keep original order in PLINQ output | .AsParallel().AsOrdered() |
    | Interlocked | Atomic counter / sum, lock-free | Interlocked.Add(ref t, x) |
    | lock | Guard a multi-step shared update | lock (gate) { ... } |
    | ConcurrentDictionary | Thread-safe map, no locks needed | dict.AddOrUpdate(k, 1, ...) |
    | async / await | I/O waiting (the other lesson) | await client.GetAsync(url) |

    Reach for the Parallel/PLINQ rows when the CPU is busy; reach for the last row when the program is just waiting.

    1. Parallel.For, Parallel.ForEach & Parallel.Invoke

    Parallel.For takes a start and an exclusive end index and runs the loop body across thread-pool threads — each iteration may land on a different core, in any order. Parallel.ForEach does the same over a collection, and Parallel.Invoke fires off several different methods at once. Because iterations run simultaneously, you must never assume an order: in the worked example below every iteration writes to its own array slot (safe), and the printed lines come out scrambled. Read it, run it twice, and notice the order changes but the work always completes.
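As a compact illustration of the three calls described above, here is a minimal sketch. The class name `ParallelLoopsSketch` and the helper `Work` are hypothetical names invented for this example; they are not part of the lesson's worked example.

```csharp
using System;
using System.Threading.Tasks;

static class ParallelLoopsSketch
{
    // Hypothetical CPU-bound helper: a scaled sum of square roots.
    public static double Work(int n)
    {
        double result = 0;
        for (int i = 0; i < 100_000; i++)
            result += Math.Sqrt(i) * (n + 1);
        return result;
    }

    // Parallel.For: every iteration writes only to its own array slot,
    // so no locking is needed.
    public static double[] ComputeAll(int count)
    {
        var results = new double[count];
        Parallel.For(0, count, i => { results[i] = Work(i); }); // end index is exclusive
        return results;
    }

    static void Main()
    {
        double[] results = ComputeAll(8);
        Console.WriteLine($"Computed {results.Length} results");

        // Parallel.ForEach: same idea over a collection.
        // The order of these lines can differ between runs.
        string[] names = { "alpha", "beta", "gamma" };
        Parallel.ForEach(names, name => Console.WriteLine($"Processed {name}"));

        // Parallel.Invoke: run a few unrelated actions and block until all finish.
        Parallel.Invoke(
            () => Console.WriteLine("Task A done"),
            () => Console.WriteLine("Task B done"));
    }
}
```

Because each iteration owns one slot of `results`, the array contents are the same on every run even though the schedule is not.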

    Worked example: Parallel.For / ForEach / Invoke

    Compare a sequential loop to Parallel.For, then fan out work across cores. Run it twice — the order changes.

    C#
    using System;
    using System.Diagnostics;
    using System.Threading;
    using System.Threading.Tasks;
    
    class Program
    {
        // A deliberately heavy, CPU-bound calculation so the cores have real work.
        static double HeavyComputation(int n)
        {
            double result = 0;
            for (int i = 0; i < 200_000; i++)
                result += Math.Sqrt(i * (n + 1));
            return result;
        }
    
        static void Main()
        {
            const int items = 100;
    
            // SEQUENTIAL: one core, one item at a time. Th
    ...

    Your turn. The program below ships and invoices six orders in parallel. Fill in the three ___ blanks to call Parallel.ForEach and Parallel.For correctly. The printed order can change from run to run, which is expected, so you check the count at the end, not the order.

    🎯 Your turn: process items with Parallel.For / ForEach

    Fill in ForEach and Length. The line order varies, but the count is always 6.

    C#
    using System;
    using System.Threading;
    using System.Threading.Tasks;
    
    class Program
    {
        static void Main()
        {
            // 🎯 YOUR TURN — fill in the blanks marked with ___
            // Goal: process 6 orders in parallel. Order of printing is RANDOM,
            // so we print a COUNT at the end, not order-sensitive output.
    
            string[] orders = { "A1", "B2", "C3", "D4", "E5", "F6" };
    
            // 1) Use Parallel.ForEach to walk the array across threads.
            Parallel.___(orders, order =>    
    ...

    2. PLINQ — Parallel LINQ

    Already know LINQ? Then you already know PLINQ. Drop .AsParallel() into a query and the engine partitions the data across cores and runs your .Where/.Select/.Sum on each chunk. Order-independent reductions like .Count() and .Sum() are the sweet spot: for integer data the answer is identical to the sequential version, just faster (a floating-point sum can differ in its last digits, because the additions happen in a different order). The catch: by default the order of streamed results is arbitrary, so add .AsOrdered() when sequence matters.
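A small sketch of the two behaviours just described: an order-independent reduction, and an ordered projection. The class and method names (`PlinqSketch`, `CountMultiplesOfThree`, `FirstSquares`) are made up for illustration.

```csharp
using System;
using System.Linq;

static class PlinqSketch
{
    // Order-independent reduction: count the multiples of 3 in 1..n.
    public static int CountMultiplesOfThree(int n) =>
        Enumerable.Range(1, n).AsParallel().Count(x => x % 3 == 0);

    // AsOrdered asks PLINQ to keep results in source order.
    public static int[] FirstSquares(int n) =>
        Enumerable.Range(1, n).AsParallel().AsOrdered().Select(x => x * x).ToArray();

    static void Main()
    {
        Console.WriteLine(CountMultiplesOfThree(1_000));        // 333
        Console.WriteLine(string.Join(", ", FirstSquares(5)));  // 1, 4, 9, 16, 25
    }
}
```

Without `.AsOrdered()` the second query would still return the same five squares, but their order in the array would not be guaranteed.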

    Worked example: .AsParallel(), .AsOrdered(), .Sum()

    Count primes in parallel, preserve order with AsOrdered, and sum squares with PLINQ.

    C#
    using System;
    using System.Diagnostics;
    using System.Linq;
    
    class Program
    {
        static bool IsPrime(int n)
        {
            if (n < 2) return false;
            for (int i = 2; i <= Math.Sqrt(n); i++)
                if (n % i == 0) return false;
            return true;
        }
    
        static void Main()
        {
            int range = 1_000_000;
    
            // Sequential LINQ — single-threaded.
            var sw = Stopwatch.StartNew();
            int seqCount = Enumerable.Range(2, range).Where(IsPrime).Count();
            sw.Stop(
    ...

    Now you write a PLINQ pipeline. Take 1 to 100, keep the even numbers, square them, and sum the result — all in parallel. Because Sum doesn't care about order, the total is the same on every run. Fill in the four ___ blanks:

    🎯 Your turn: PLINQ Where + Select + Sum

    Make it parallel, filter evens, square, and sum — total should be 171700.

    C#
    using System;
    using System.Linq;
    
    class Program
    {
        static void Main()
        {
            // 🎯 YOUR TURN — fill in the blanks marked with ___
            // Goal: take 1..100, keep the EVEN numbers, square them, and SUM them
            //       — all in PARALLEL. Sum is order-independent, so PLINQ is ideal
            //       and the TOTAL is the same on every run.
    
            int[] numbers = Enumerable.Range(1, 100).ToArray();
    
            long total = numbers
                // 1) Turn the query parallel.
              
    ...

    3. Thread Safety: Races, Interlocked, lock & Concurrent Collections

    Here's where parallel code bites. The instant two threads write to the same variable, you have a data race. The classic example is count++: it's secretly three steps — read, add one, write back — so two threads can read the same value and both write back the same result, silently losing increments. The worked example below shows the broken version (the total comes out too low), then three fixes: Interlocked for atomic counters, lock for multi-step updates, and ConcurrentDictionary for a thread-safe map.
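The three fixes named above can be sketched side by side. This is an illustrative sketch, not the lesson's worked example; `RaceFixSketch` and its method names are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class RaceFixSketch
{
    // Fix 1: Interlocked makes the single increment atomic.
    public static int CountWithInterlocked(int iterations)
    {
        int count = 0;
        Parallel.For(0, iterations, _ => { Interlocked.Increment(ref count); });
        return count;
    }

    // Fix 2: lock lets only one thread at a time into the guarded block.
    public static long SumWithLock(int n)
    {
        long total = 0;
        object gate = new object();
        Parallel.For(1, n + 1, i =>
        {
            lock (gate) { total += i; }
        });
        return total;
    }

    // Fix 3: ConcurrentDictionary handles the synchronisation internally.
    public static ConcurrentDictionary<int, int> CountByLastDigit(int n)
    {
        var counts = new ConcurrentDictionary<int, int>();
        Parallel.For(0, n, i =>
        {
            counts.AddOrUpdate(i % 10, 1, (_, old) => old + 1);
        });
        return counts;
    }

    static void Main()
    {
        Console.WriteLine(CountWithInterlocked(100_000)); // 100000
        Console.WriteLine(SumWithLock(1_000));            // 500500
        Console.WriteLine(CountByLastDigit(1_000)[7]);    // 100
    }
}
```

All three print the same values on every run, which is the practical sign that the race is gone.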

    Worked example: fix a data race three ways

    See count++ lose increments, then fix it with Interlocked, lock, and ConcurrentDictionary.

    C#
    using System;
    using System.Collections.Concurrent;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;
    
    class Program
    {
        static void Main()
        {
            // ❌ WRONG — many threads do unsafeCount++ on the SAME variable.
            // ++ is read-modify-write: two threads can read the same value and
            // both write back the same +1, so increments are LOST (a data race).
            int unsafeCount = 0;
            Parallel.For(0, 1_000_000, _ => { unsafeCount++; });
            Conso
    ...

    4. Fast Parallel Sums with Thread-Local State

    Locking on every single iteration is correct but slow: the threads spend their time queuing for the lock instead of computing. The professional pattern uses the thread-local Parallel.For overload, which takes three lambdas after the range (localInit, body, localFinally). Each worker keeps a private running subtotal (zero contention), and only when that worker finishes does it merge its subtotal into the shared grand total once, atomically. Notice the key property: the workers finish in an unpredictable order, but the final total is exactly the same every run. Determinism of the result, non-determinism of the schedule.
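To make the shape of that overload concrete, here is a minimal sketch that sums squares rather than the plain range used in the worked example. `ThreadLocalSumSketch` and `SumOfSquares` are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ThreadLocalSumSketch
{
    public static long SumOfSquares(int n)
    {
        long grandTotal = 0;

        Parallel.For(
            1, n + 1,                                             // 1..n (end is exclusive)
            () => 0L,                                             // localInit: private subtotal starts at 0
            (i, loopState, subtotal) => subtotal + (long)i * i,   // body: touches only the private subtotal
            subtotal => Interlocked.Add(ref grandTotal, subtotal) // localFinally: one atomic merge per worker
        );

        return grandTotal;
    }

    static void Main()
    {
        Console.WriteLine(SumOfSquares(1_000)); // 333833500
    }
}
```

The body never touches shared state, so the only synchronised operation is the single `Interlocked.Add` each worker performs when it finishes.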

    Worked example: thread-local subtotal + atomic merge

    Sum 1..1,000,000 with private subtotals and one Interlocked.Add per thread.

    C#
    using System;
    using System.Threading;
    using System.Threading.Tasks;
    
    class Program
    {
        static void Main()
        {
            // The thread-local Parallel.For overload is the FAST way to total things.
            // Each thread keeps a private 'subtotal' (no contention), then merges
            // its subtotal into the shared grand total ONCE, under Interlocked.
            long grandTotal = 0;
    
            Parallel.For(
                1, 1_000_001,                 // sum 1..1_000_000 (end is exclusive)
                () => 
    ...

    5. Exceptions: AggregateException

    When a parallel body throws, the failure can't just bubble up one stack — several threads might fail at once. So the Parallel methods and PLINQ collect every error into a single AggregateException. You catch that one type, then loop over its .InnerExceptions to see each underlying problem. (Remaining iterations may still run, so don't assume the loop stops dead at the first throw.)
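A minimal sketch of the catch-and-unpack pattern follows. Only one iteration throws here, so the outcome is predictable; `AggregateSketch` and `RunAndCollectErrors` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class AggregateSketch
{
    // Runs a parallel loop in which iteration 3 fails, and returns
    // the messages of every exception the loop collected.
    public static string[] RunAndCollectErrors()
    {
        try
        {
            Parallel.For(0, 10, i =>
            {
                if (i == 3)
                    throw new InvalidOperationException($"item {i} failed");
            });
            return Array.Empty<string>();
        }
        catch (AggregateException ex)
        {
            // The original exception is wrapped; unpack it from InnerExceptions.
            return ex.InnerExceptions.Select(e => e.Message).ToArray();
        }
    }

    static void Main()
    {
        foreach (string message in RunAndCollectErrors())
            Console.WriteLine(message); // item 3 failed
    }
}
```

Note that a `catch (InvalidOperationException)` around the loop would not fire, because the type that reaches the caller is AggregateException.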

    Worked example: catch AggregateException

    Iterations 3 and 7 throw if they get to run; one catch unpacks whatever was thrown via InnerExceptions.

    C#
    using System;
    using System.Threading.Tasks;
    
    class Program
    {
        static void Main()
        {
            // When a parallel body throws, the failures from all threads are
            // bundled into ONE AggregateException. Catch it and inspect
            // .InnerExceptions to see each underlying error.
            try
            {
                Parallel.For(0, 10, i =>
                {
                    if (i == 3 || i == 7)
                        throw new InvalidOperationException($"item {i} failed");
                });
         
    ...

    🔎 Deep Dive: when NOT to parallelise

    Parallelism is not free. Splitting the work, scheduling threads, and merging results all cost time and memory. If the per-item work is tiny or the collection is small, that overhead can make the parallel version slower than the simple loop. Don't guess — measure with a Stopwatch before and after.

    Skip parallelism when:

    • 📉 The work is small — a few thousand cheap operations finish before the threads even spin up.
    • 🌐 The work is I/O-bound — HTTP, disk, database. Use async/await + Task.WhenAll, not Parallel.ForEach; otherwise you block thread-pool threads doing nothing but waiting.
    • 🔗 Iterations depend on each other — if step i needs the result of step i-1, the work is inherently sequential.
    • 🧱 There's heavy shared state — if every iteration must take the same lock, the threads serialise anyway and you've added overhead for nothing.
    // CPU-bound + independent + big   -> Parallel / PLINQ
    Parallel.For(0, 1_000_000, i => Crunch(i));
    
    // I/O-bound (waiting on the network) -> async, NOT Parallel
    var pages = await Task.WhenAll(urls.Select(u => http.GetStringAsync(u)));
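One way to follow the "measure, don't guess" advice is sketched below. The workload `Crunch` and the class name `MeasureSketch` are hypothetical, and the timings depend entirely on your machine, so no expected numbers are shown.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class MeasureSketch
{
    // Hypothetical per-item workload; swap in your real computation.
    public static long Crunch(int n)
    {
        long acc = 0;
        for (int i = 1; i <= 2_000; i++)
            acc += (n + i) % 7;
        return acc;
    }

    public static long SumSequential(int count)
    {
        long total = 0;
        for (int i = 0; i < count; i++)
            total += Crunch(i);
        return total;
    }

    public static long SumParallel(int count) =>
        Enumerable.Range(0, count).AsParallel().Sum(i => Crunch(i));

    static void Main()
    {
        const int count = 50_000;

        var sw = Stopwatch.StartNew();
        long seq = SumSequential(count);
        sw.Stop();
        Console.WriteLine($"Sequential: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        long par = SumParallel(count);
        sw.Stop();
        Console.WriteLine($"Parallel:   {sw.ElapsedMilliseconds} ms");

        // Always confirm both versions agree before comparing their speed.
        Console.WriteLine($"Same answer: {seq == par}");
    }
}
```

Try shrinking `count` or the inner loop in `Crunch`: at some point the parallel version stops winning, which is the overhead this section describes.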

    Putting It Together: a Batch Image Processor

    Here's a realistic CPU-bound job: process 200 "images", each needing heavy per-item work. It uses Parallel.ForEach capped to Environment.ProcessorCount cores, collects results in a thread-safe ConcurrentBag, and reports order-independent stats. You understand every line now — the parallelism, the core cap, the thread-safe collection, and why the totals are deterministic.

    Worked example: batch-process 200 images in parallel

    Cap to core count, collect into a ConcurrentBag, report deterministic totals.

    C#
    using System;
    using System.Collections.Concurrent;
    using System.Linq;
    using System.Threading.Tasks;
    
    class Program
    {
        // Pretend each "image" needs heavy CPU work (resize/filter).
        static long ProcessImage(int id)
        {
            long checksum = 0;
            for (int i = 0; i < 50_000; i++)
                checksum += (long)Math.Sqrt(i * (id + 1));
            return checksum;
        }
    
        static void Main()
        {
            int[] imageIds = Enumerable.Range(1, 200).ToArray();
    
            // Collect per-image
    ...

    The images finish in a random order, but Count, Sum and Max are order-independent — so the reported numbers are identical on every run.
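The core cap and the thread-safe collection can be shown in isolation. In this sketch the per-item "work" is just squaring the id, purely so the expected totals are easy to verify by hand; `CappedBatchSketch` and `ProcessAll` are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

static class CappedBatchSketch
{
    public static long[] ProcessAll(int[] ids)
    {
        // Cap concurrency at the number of logical cores.
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        // ConcurrentBag accepts Add from many threads at once.
        var results = new ConcurrentBag<long>();

        Parallel.ForEach(ids, options, id =>
        {
            results.Add((long)id * id); // stand-in for real per-item work
        });

        return results.ToArray();
    }

    static void Main()
    {
        long[] results = ProcessAll(Enumerable.Range(1, 200).ToArray());

        // Count, Sum and Max do not depend on the order items were added.
        Console.WriteLine($"Count: {results.Length}"); // Count: 200
        Console.WriteLine($"Sum:   {results.Sum()}");  // Sum:   2686700
        Console.WriteLine($"Max:   {results.Max()}");  // Max:   40000
    }
}
```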

    Pro Tips

    • 💡 Measure before you parallelise: wrap both versions in a Stopwatch. Parallel overhead means small or cheap workloads are often slower in parallel.
    • 💡 Cap at the core count: set MaxDegreeOfParallelism = Environment.ProcessorCount for pure CPU work — more threads than cores just adds context-switching.
    • 💡 Prefer Interlocked over lock for a single counter or sum: Interlocked.Add/Increment is lock-free and much faster than wrapping ++ in a lock.
    • 💡 Use the thread-local overload for reductions: private subtotals merged once per thread beat locking on every iteration by a wide margin.
    • 💡 Partition ranges when each item is cheap: Partitioner.Create(0, count, chunkSize) hands each thread a contiguous chunk of indices, so you pay the delegate and scheduling overhead once per chunk instead of once per item.
    • 💡 PLINQ's sweet spot is large data sets (tens of thousands of items) with CPU-heavy per-item work. For small collections, plain sequential LINQ wins.
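The range-partitioning tip can be sketched as follows. The chunk size of 1,000 is an arbitrary value picked for illustration, and `RangePartitionSketch` and `SquareRoots` are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class RangePartitionSketch
{
    public static double[] SquareRoots(int count, int chunkSize)
    {
        var output = new double[count];

        // Each partition is a Tuple<int, int>: [Item1, Item2), end exclusive.
        var ranges = Partitioner.Create(0, count, chunkSize);

        Parallel.ForEach(ranges, range =>
        {
            // A plain sequential loop inside each chunk: the delegate is
            // invoked once per chunk rather than once per element.
            for (int i = range.Item1; i < range.Item2; i++)
                output[i] = Math.Sqrt(i);
        });

        return output;
    }

    static void Main()
    {
        double[] roots = SquareRoots(10_000, 1_000);
        Console.WriteLine(roots[9]);    // 3
        Console.WriteLine(roots[2500]); // 50
    }
}
```

Each index still belongs to exactly one chunk, so every slot of `output` is written by a single thread and no locking is needed.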

    Common Errors (and the fix)

    • Data race on shared state: two threads writing the same variable (total += x, adding to a plain List<T>) corrupts it. Fix with Interlocked, a lock, or a concurrent collection like ConcurrentBag/ConcurrentDictionary.
    • i++ / count++ across threads loses increments: ++ is read-modify-write, not atomic, so the total comes out too low. Use Interlocked.Increment(ref count).
    • Using Parallel for I/O-bound work: Parallel.ForEach over HTTP/disk calls blocks thread-pool threads while they just wait. Use async/await with Task.WhenAll instead.
    • Exceptions wrapped in AggregateException: a throw inside a parallel body surfaces as one AggregateException, not the original type. Catch AggregateException and inspect .InnerExceptions.
    • Assuming output order: parallel iterations run in a non-deterministic order, so printed lines come out scrambled. If sequence matters, add .AsOrdered() (PLINQ) or sort the collected results afterwards.
    • Over-parallelising: MaxDegreeOfParallelism = 100 on an 8-core box lets far more threads run than there are cores, which can cause thrashing on CPU-bound work. Match it to Environment.ProcessorCount.

    📋 Quick Reference

    | Task | Code | Notes |
    |---|---|---|
    | Parallel range loop | Parallel.For(0, n, i => ...) | end is exclusive |
    | Parallel collection loop | Parallel.ForEach(items, x => ...) | order varies |
    | Several methods at once | Parallel.Invoke(A, B, C) | fire-and-wait |
    | Limit threads | new ParallelOptions { MaxDegreeOfParallelism = n } | cap to cores |
    | Parallel LINQ | q.AsParallel().Where(...).Sum() | PLINQ |
    | Keep order in PLINQ | .AsParallel().AsOrdered() | small cost |
    | Atomic counter | Interlocked.Increment(ref n) | lock-free |
    | Atomic add | Interlocked.Add(ref total, x) | lock-free |
    | Guard a block | lock (gate) { ... } | one thread inside |
    | Thread-safe map | dict.AddOrUpdate(k, 1, (k,v)=>v+1) | ConcurrentDictionary |
    | Catch parallel errors | catch (AggregateException ex) | see InnerExceptions |

    Frequently Asked Questions

    Q: What's the difference between parallel and async?

    Parallel uses many threads to do CPU work faster (more cooks). Async frees a thread to do other things while an I/O operation is pending (one cook not standing idle). Parallelise computation; await network/disk/database. Mixing them up, e.g. Parallel.ForEach over HTTP calls, wastes threads.

    Q: Why is my output in a different order every time I run it?

    Because parallel iterations run on different threads with no guaranteed schedule. That's normal and expected. If order matters, use PLINQ's .AsOrdered(), or collect the results into a thread-safe collection and sort them afterwards. For totals and counts, order doesn't matter at all.

    Q: My parallel sum gives a different (too-low) total each run — why?

    That's a data race. A plain total += x across threads loses updates because += isn't atomic. Use Interlocked.Add(ref total, x), or the thread-local Parallel.For overload that sums private subtotals and merges once. The fixed version is deterministic — same total every run.

    Q: Is parallel always faster?

    No. Splitting work and scheduling threads has overhead, so for small or cheap workloads the simple loop wins. Parallelism pays off when the data is large and each item does real CPU work. Always measure both with a Stopwatch before committing.

    Mini-Challenge: Thread-Safe Parallel Sum

    No blanks this time — just a brief and an outline. Sum every number from 1 to 1,000,000 using Parallel.For, but do it safely: a plain total += i is a data race, so make the add atomic with Interlocked.Add (or use the thread-local subtotal overload for extra credit). The whole point: even though the threads run in a random order, the total is identical on every run. Run it and check against the expected line in the comments.

    🎯 Mini-Challenge: deterministic parallel sum

    Sum 1..1,000,000 in parallel with a thread-safe total — should always print 500000500000.

    C#
    using System;
    using System.Threading;
    using System.Threading.Tasks;
    
    class Program
    {
        static void Main()
        {
            // 🎯 MINI-CHALLENGE: Parallel sum of a range, the THREAD-SAFE way
            // 1. You want the sum of all numbers from 1 to 1_000_000.
            // 2. Declare a shared 'long total = 0;'.
            // 3. Use Parallel.For(1, 1_000_001, i => { ... }) to add each i to
            //    'total'. A plain 'total += i;' is a DATA RACE — instead make the
            //    add atomic with Interlock
    ...

    🎉 Lesson Complete

    • Parallel.For / Parallel.ForEach / Parallel.Invoke spread CPU-bound work across all cores
    • MaxDegreeOfParallelism caps how many threads run at once — match it to Environment.ProcessorCount
    • ✅ PLINQ: .AsParallel() makes any query parallel; .AsOrdered() restores element order
    • ✅ Output order is non-deterministic, but integer counts, sums and other order-independent reductions are deterministic
    • ✅ Fix data races with Interlocked (lock-free), lock (multi-step), or concurrent collections
    • ✅ Parallel errors arrive as one AggregateException — inspect .InnerExceptions
    • ✅ Don't parallelise small, sequential, or I/O-bound work — async is the tool for I/O
    • Next lesson: Memory Management & Garbage Collector Internals
