
    Advanced Track

    Performance Profiling & Benchmarking

    By the end of this lesson you'll be able to measure how fast your code runs instead of guessing — timing a block with Stopwatch, comparing two approaches fairly, writing rigorous benchmarks with BenchmarkDotNet, watching memory allocations, and knowing when to reach for a full profiler. The golden rule of all performance work: measure first, optimise second.

    What You'll Learn

    • Time a block of code with Stopwatch and read ElapsedMilliseconds
    • Compare two approaches fairly and let the numbers decide
    • Avoid Stopwatch pitfalls — JIT warmup, Debug builds, tiny loops
    • Write real benchmarks with BenchmarkDotNet: [Benchmark], BenchmarkRunner
    • Measure allocations and GC pressure with [MemoryDiagnoser]
    • Know when to reach for a profiler (dotTrace, PerfView) and spot common hotspots

    ⏱️ Real-World Analogy

    Think of two tools an athlete uses. A Stopwatch is the handheld stopwatch a coach clicks at the start and end of a sprint — instant, simple, and perfect for a quick "how long did that take?". BenchmarkDotNet is the full fitness tracker: it makes you do warmup laps, runs the test many times, throws out the flukes, reports the average and the variation, and even logs how much energy (memory) you burned. The stopwatch tells you a rough number in seconds; the fitness tracker gives you trustworthy, repeatable data you can stake a decision on. The rule both share: never judge fitness by feel — measure it. Optimising code you haven't measured is like training harder at the wrong thing.

    Three tools, three jobs

    Tool | What it's for | Precision | Reach for it when…
    Stopwatch | Quick, ad-hoc timing of a block | Rough (ms) | You want a fast "how long?" answer right now
    BenchmarkDotNet | Rigorous A/B benchmarks with stats + memory | High (ns, statistical) | You're comparing approaches and the result matters
    Profiler (dotTrace / PerfView) | Finding where time/allocations go in a whole app | Whole-program | Something's slow but you don't know which method

    A benchmark answers "is A faster than B?". A profiler answers "which of my 500 methods is the slow one?". You usually profile to find the hotspot, then benchmark to fix it.

    1. Measure, Don't Guess: Stopwatch

    The fastest way to answer "how long does this take?" is System.Diagnostics.Stopwatch. Stopwatch.StartNew() creates and starts a high-resolution timer in one line; you do your work, call .Stop(), then read .ElapsedMilliseconds. Crucially, do not use DateTime.Now for this — it can jump backwards (clock changes, NTP syncs) and has poor resolution. Stopwatch is monotonic and built for exactly this. Read this worked example, run it, then you'll time your own block.

    Worked example: time a block with Stopwatch

    Read every comment, run it, and check the Elapsed line appears.

    C#
    using System;
    using System.Diagnostics;
    
    class Program
    {
        static void Main()
        {
            // Stopwatch is a high-resolution timer — far more accurate than
            // DateTime.Now for measuring elapsed time. StartNew() creates it
            // AND starts it in one line.
            var sw = Stopwatch.StartNew();
    
            // === The block of work you want to time ===
            long total = 0;
            for (int i = 0; i < 10_000_000; i++)
                total += i;                 // some real work to measu
    ...

    Your turn. The program below times a counting loop — it just needs the three Stopwatch calls filled in. Replace each ___ using the hints, then run it.

    🎯 Your turn: time a loop with Stopwatch

    Fill in StartNew(), Stop(), and ElapsedMilliseconds, then run it.

    C#
    using System;
    using System.Diagnostics;
    
    class Program
    {
        static void Main()
        {
            // 🎯 YOUR TURN — fill in the blanks marked with ___, then run it.
    
            // 1) Create AND start a Stopwatch in one line.
            var sw = ___;               // 👉 Stopwatch.StartNew()
    
            // The work we want to time: count up to a big number.
            long count = 0;
            for (int i = 0; i < 5_000_000; i++)
                count++;
    
            // 2) Stop the clock.
            sw.___;                    
    ...

    2. Comparing Two Approaches

    A single timing in isolation rarely tells you much — what you really want is "is A faster than B?". The classic example: building a big string. Using += in a loop secretly creates a brand-new string on every pass and copies everything so far, so the cost grows with the square of the size. A StringBuilder keeps one growable buffer and appends into it. Time both side by side and the gap is dramatic — and now you can prove it rather than assert it.

    Worked example: += concat vs StringBuilder

    See the same job timed two ways — note how large the gap is.

    C#
    using System;
    using System.Diagnostics;
    using System.Text;
    
    class Program
    {
        static void Main()
        {
            const int N = 50_000;
    
            // --- Approach A: build a string with += in a loop ---
            // Each += makes a BRAND-NEW string and copies everything so far,
            // so the work grows with the square of N. This is the slow trap.
            var swA = Stopwatch.StartNew();
            string a = "";
            for (int i = 0; i < N; i++)
                a += "x";               // allocates a ne
    ...
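To see why the += cost grows with the square of N, it can help to count the characters copied instead of timing anything. The sketch below is a back-of-envelope model, not a measurement: it assumes each += copies the whole existing string plus the one new character. The names ConcatCostDemo and CharsCopiedByConcat are made up for this illustration.

```csharp
using System;

static class ConcatCostDemo
{
    // Model (an assumption, not a measurement): building an n-character
    // string one character at a time with +=. Pass i (0-based) copies the
    // i existing characters plus 1 new one.
    public static long CharsCopiedByConcat(int n)
    {
        long copied = 0;
        for (int i = 0; i < n; i++)
            copied += i + 1;
        return copied;               // equals n * (n + 1) / 2
    }

    static void Main()
    {
        foreach (int n in new[] { 1_000, 10_000, 50_000 })
        {
            long copied = CharsCopiedByConcat(n);
            Console.WriteLine($"N = {n,6}: += copies about {copied:N0} chars; " +
                              $"a StringBuilder appends roughly {n:N0}");
        }
    }
}
```

Under this model, multiplying N by 10 multiplies the copying work by roughly 100, which is the "square of N" behaviour described above. For N = 50,000 the model gives about 1.25 billion character copies.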

    Now you try. Time both approaches yourself and print them so you can compare. Fill in the three ___ blanks — two stopwatch calls and one result read:

    🎯 Your turn: time and compare two approaches

    Time the += loop and the StringBuilder, then print both timings.

    C#
    using System;
    using System.Diagnostics;
    using System.Text;
    
    class Program
    {
        static void Main()
        {
            const int N = 30_000;
    
            // 🎯 YOUR TURN — time BOTH approaches and compare them.
    
            // --- Approach A: += in a loop (the slow way) ---
            var swA = Stopwatch.StartNew();
            string a = "";
            for (int i = 0; i < N; i++)
                a += "*";
            // 1) Stop stopwatch A.
            swA.___;                    // 👉 Stop()
    
            // --- Approach B: StringBui
    ...

    🔎 Deep Dive: why your Stopwatch number can lie

    The JIT warmup trap. .NET compiles your method to machine code the first time it runs (Just-In-Time compilation). So the very first call includes the cost of compiling — it can be 10–100x slower than steady state. If you time only one run, you're often timing the compiler, not your code. The fix: do one or two throwaway runs first, then start the stopwatch.

    Debug vs Release. In a Debug build the JIT turns off optimisations so debugging is easier. Always measure performance in Release (dotnet run -c Release); Debug numbers are not representative.
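If you are ever unsure which configuration a timing came from, the program can check for itself. The sketch below reads the DebuggableAttribute that the compiler stamps on an assembly; its IsJITOptimizerDisabled flag is set in a default Debug build. The class and method names (BuildCheckDemo, IsJitOptimizerDisabled) are hypothetical helpers for this lesson, and the check reflects the default project settings only.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

static class BuildCheckDemo
{
    // True when the assembly was compiled with JIT optimisations disabled,
    // which is what a default Debug build does.
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        var attr = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attr != null && attr.IsJITOptimizerDisabled;
    }

    static void Main()
    {
        bool debugBuild = IsJitOptimizerDisabled(typeof(BuildCheckDemo).Assembly);
        Console.WriteLine(debugBuild
            ? "WARNING: optimisations are off (Debug build); timings are not representative."
            : "Optimisations are on (Release build).");
    }
}
```

Calling a check like this at the top of a timing program is a cheap guard against publishing a Debug number by accident.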

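    Tiny operations. ElapsedMilliseconds reports whole milliseconds only, and even the underlying timer (readable through Elapsed or ElapsedTicks) typically resolves to around 100 ns or finer per tick, depending on the platform (check Stopwatch.Frequency). Trying to time something that takes nanoseconds (a single method call) by running it once is hopeless: measurement noise and the cost of starting and stopping the timer swamp the signal. To time tiny things you must run them millions of times in a loop and divide, which is exactly the fiddly, error-prone work that BenchmarkDotNet does correctly for you.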

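    work();                       // 1) warmup: let the JIT compile it
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < 1000; i++) work();   // 2) many runs, not one
    sw.Stop();
    // 3) average per run; Elapsed.TotalMilliseconds keeps the fractional part,
    //    whereas ElapsedMilliseconds is truncated to whole milliseconds
    double perRunMs = sw.Elapsed.TotalMilliseconds / 1000;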
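To check what the timer on your own machine can actually resolve, you can inspect Stopwatch.Frequency and Stopwatch.IsHighResolution. The sketch below prints them, then times a deliberately tiny loop to show how ElapsedMilliseconds and Elapsed.TotalMilliseconds differ. TimerResolutionDemo and NanosecondsPerTick are illustrative names, and the printed values vary by platform, so no expected output is shown.

```csharp
using System;
using System.Diagnostics;

static class TimerResolutionDemo
{
    // Nanoseconds represented by one Stopwatch tick on this machine.
    public static double NanosecondsPerTick() => 1_000_000_000.0 / Stopwatch.Frequency;

    static void Main()
    {
        Console.WriteLine($"High resolution: {Stopwatch.IsHighResolution}");
        Console.WriteLine($"Frequency: {Stopwatch.Frequency} ticks/second");
        Console.WriteLine($"One tick = {NanosecondsPerTick()} ns");

        var sw = Stopwatch.StartNew();
        long total = 0;
        for (int i = 0; i < 1_000; i++) total += i;   // deliberately tiny work
        sw.Stop();

        // ElapsedMilliseconds is a whole number, so short work often reads 0.
        Console.WriteLine($"ElapsedMilliseconds:       {sw.ElapsedMilliseconds}");
        // Elapsed is a TimeSpan; TotalMilliseconds keeps the fractional part.
        Console.WriteLine($"Elapsed.TotalMilliseconds: {sw.Elapsed.TotalMilliseconds}");
        Console.WriteLine($"(total = {total})");      // use the result so the loop has an observable effect
    }
}
```

Even with a fine-grained tick, a single run of tiny work is dominated by noise, so the advice to repeat and divide still applies.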

    3. The Right Tool: BenchmarkDotNet

    BenchmarkDotNet is the de facto standard .NET benchmarking library, and it handles every pitfall above for you. You tag methods with [Benchmark], mark one as the baseline with [Benchmark(Baseline = true)], and call BenchmarkRunner.Run<YourClass>(). It warms up the JIT, runs each method many times, detects and removes outliers, computes the mean and standard deviation, and prints a clean comparison table. You read the results instead of writing the timing scaffolding yourself. (BenchmarkDotNet needs a real .NET project with the BenchmarkDotNet NuGet package installed, so study this worked example here, then run it in your own project with dotnet run -c Release.)

    Worked example: a real BenchmarkDotNet benchmark

    Study the [Benchmark] attributes, the runner, and the sample results table.

    C#
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;
    using System.Linq;             // needed for the .Select(...) and .ToArray() extension methods below
    using System.Text;
    
    // Run benchmarks with:  dotnet run -c Release
    // NEVER benchmark in Debug mode — the JIT skips optimisations there,
    // so the numbers are meaningless.
    
    [MemoryDiagnoser]              // also report allocations + GC, not just time
    public class StringBenchmarks
    {
        private readonly string[] _words = System.Linq.Enumerable
            .Range(0, 1000)
            .Select(i => $"word{i}")
            .ToArray();
    
        // Basel
    ...

    Notice what the table gives you that a bare Stopwatch can't: a Mean (in microseconds, far finer than ms), a Ratio against the baseline so the relative speed is obvious at a glance, and — because we added [MemoryDiagnoser] — the memory each method allocated. That last column is the one beginners forget.

    4. Allocations & GC Pressure

    Speed isn't the only cost. Every object you create on the heap is future work for the garbage collector (GC) — the part of .NET that reclaims memory you're no longer using. Allocate a lot and the GC runs more often, and a GC pause briefly freezes your program. Under load that's frequently what makes a "fast" method slow in production. Add [MemoryDiagnoser] to a benchmark and watch the Allocated and Gen0 columns: a method that's twice as fast but allocates ten times more memory may lose once real traffic hits it.

    Worked example: measuring allocations with [MemoryDiagnoser]

    Compare a LINQ sum with a plain loop — note the Allocated column.

    C#
    using BenchmarkDotNet.Attributes;
    using System.Linq;             // needed for the .ToArray() extension method below
    
    // The Allocated and Gen0 columns from [MemoryDiagnoser] often matter
    // MORE than Mean. Every allocation is future work for the garbage
    // collector — and under load, GC pauses are what slow real apps down.
    
    [MemoryDiagnoser]
    public class AllocationBenchmarks
    {
        private readonly int[] _data = System.Linq.Enumerable
            .Range(0, 1_000_000).ToArray();
    
        // Allocates an iterator + delegates behind the scenes.
        [Benchmark(Baseline = true)]
        public
    ...
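Outside BenchmarkDotNet you can still get a rough feel for allocations. The sketch below is a hand-rolled probe, not what [MemoryDiagnoser] does internally: it reads GC.GetAllocatedBytesForCurrentThread() before and after a block and reports the difference. It assumes a modern .NET runtime where that method is available, and AllocationProbeDemo and MeasureAllocatedBytes are hypothetical names.

```csharp
using System;
using System.Text;

static class AllocationProbeDemo
{
    // Bytes allocated on the calling thread while 'work' runs.
    public static long MeasureAllocatedBytes(Action work)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        work();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    static void Main()
    {
        const int N = 2_000;

        long concatBytes = MeasureAllocatedBytes(() =>
        {
            string s = "";
            for (int i = 0; i < N; i++) s += "x";   // a new string on every pass
        });

        long builderBytes = MeasureAllocatedBytes(() =>
        {
            var sb = new StringBuilder();
            for (int i = 0; i < N; i++) sb.Append('x');
            _ = sb.ToString();                      // one final string
        });

        Console.WriteLine($"+= loop:       {concatBytes:N0} bytes allocated");
        Console.WriteLine($"StringBuilder: {builderBytes:N0} bytes allocated");
        Console.WriteLine($"Gen0 collections so far: {GC.CollectionCount(0)}");
    }
}
```

The exact byte counts depend on the runtime, but the += loop should allocate far more than the StringBuilder version. For numbers you intend to report, the Allocated column from [MemoryDiagnoser] is the more trustworthy source.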

    🔎 Deep Dive: profilers — finding where the time goes

    A benchmark compares two known options. But when a whole app is slow and you don't know which method is to blame, you need a profiler — a tool that watches your program run and reports where the CPU time and the allocations actually went.

    • 🔬 dotTrace (JetBrains) and Visual Studio's Diagnostic Tools — friendly GUIs that show a call tree with each method's share of total time. Start here to find the "hot path".
    • 🔬 PerfView (Microsoft, free) — heavier and less pretty, but unbeatable for deep CPU and allocation investigations, including GC behaviour, in production-like conditions.
    • 🔬 dotnet-trace / dotnet-counters — free command-line tools that attach to a running process, great for diagnosing a live service without stopping it.

    The healthy workflow: profile to find the one method eating 80% of the time (it's almost never the one you'd guess), then benchmark two fixes for that method to prove which is actually better.

    Common hotspots to look for

    When you profile real C# code, the same culprits show up again and again:

    • 🔥 String building with += in a loop — use StringBuilder or string.Join.
    • 🔥 Wrong collection: List.Contains is O(n); a HashSet or Dictionary lookup is O(1) on average. For large collections the difference can reach 1,000x or more.
    • 🔥 LINQ on a hot path — elegant, but it allocates; a plain loop is often faster and allocation-free where it matters.
    • 🔥 Boxing — putting a value type (like int) into an object allocates; avoid it in tight loops.
    • 🔥 Repeated work — recomputing the same thing inside a loop instead of once outside it; cache it.
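As one illustration of the "wrong collection" hotspot from the list above, the sketch below runs the same membership checks against a List<int> and a HashSet<int> and times each with a Stopwatch after a short warmup. The sizes and the names LookupHotspotDemo and CountHits are arbitrary choices for this example, and the timings will differ from machine to machine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class LookupHotspotDemo
{
    // Counts how many of 'queries' are present in 'source'.
    public static int CountHits(ICollection<int> source, int[] queries)
    {
        int hits = 0;
        foreach (int q in queries)
            if (source.Contains(q)) hits++;
        return hits;
    }

    static void Main()
    {
        const int Size = 20_000;
        int[] values = Enumerable.Range(0, Size).ToArray();
        int[] queries = Enumerable.Range(Size / 2, Size).ToArray(); // half hit, half miss

        var list = new List<int>(values);     // Contains scans element by element
        var set = new HashSet<int>(values);   // Contains is a hash lookup

        int[] warmupQueries = queries.Take(10).ToArray();
        CountHits(list, warmupQueries);       // warmup so the JIT cost is not timed
        CountHits(set, warmupQueries);

        var swList = Stopwatch.StartNew();
        int listHits = CountHits(list, queries);
        swList.Stop();

        var swSet = Stopwatch.StartNew();
        int setHits = CountHits(set, queries);
        swSet.Stop();

        Console.WriteLine($"List.Contains:    {listHits} hits in {swList.Elapsed.TotalMilliseconds:F2} ms");
        Console.WriteLine($"HashSet.Contains: {setHits} hits in {swSet.Elapsed.TotalMilliseconds:F2} ms");
    }
}
```

Both approaches must report the same number of hits; only the time should differ. Treat the result as a rough indication and confirm with BenchmarkDotNet before relying on a specific ratio.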

    Putting It Together: a fair timing helper

    Here's a small, reusable helper that puts the whole lesson into practice: it does a warmup run (to dodge the JIT trap), runs the work many times, and reports both total and per-run time — so you can compare two approaches fairly. It's a poor man's BenchmarkDotNet, but it's honest about warmup and repeats, which is more than a naive single timing ever is.

    Worked example: a reusable warmup-and-repeat timer

    Compare a manual loop with LINQ Max() using fair warmup + repeats.

    C#
    using System;
    using System.Diagnostics;
    
    class Program
    {
        // A reusable helper: time any block of code, print the result.
        // Run the block ONCE to warm up (let the JIT compile it), then time it.
        static void Time(string label, int repeats, Action work)
        {
            work();                          // warmup run — JIT-compile the method
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < repeats; i++)
                work();                      // the timed runs
            sw.
    ...

    Notice the helper runs work() once before starting the clock — that warmup run gets the JIT compilation out of the way so it doesn't pollute the measurement. For anything you'll publish a number about, graduate to BenchmarkDotNet.
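One step beyond a single warmup-and-repeat timing is to take several samples and look at their spread. The sketch below is a simplified illustration of that idea, not BenchmarkDotNet's actual algorithm: it collects a handful of samples, then reports the mean and the sample standard deviation. The names SampledTimerDemo, Sample, Mean and StdDev, and the sample and repeat counts, are assumptions made for this example.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class SampledTimerDemo
{
    // Runs 'work' once as warmup, then collects 'samples' measurements,
    // each covering 'repeats' calls. Returns per-call times in microseconds.
    public static double[] Sample(Action work, int samples, int repeats)
    {
        work();                                   // warmup run
        var perCallMicros = new double[samples];
        for (int s = 0; s < samples; s++)
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < repeats; i++)
                work();
            sw.Stop();
            perCallMicros[s] = sw.Elapsed.TotalMilliseconds * 1000.0 / repeats;
        }
        return perCallMicros;
    }

    public static double Mean(double[] xs) => xs.Average();

    // Sample standard deviation (divides by n - 1).
    public static double StdDev(double[] xs)
    {
        if (xs.Length < 2) return 0.0;
        double mean = Mean(xs);
        double sumSq = xs.Sum(x => (x - mean) * (x - mean));
        return Math.Sqrt(sumSq / (xs.Length - 1));
    }

    static void Main()
    {
        int[] data = Enumerable.Range(0, 100_000).ToArray();
        long sink = 0;                            // keeps the result observable

        double[] times = Sample(
            () => { long t = 0; foreach (int x in data) t += x; sink += t; },
            samples: 10, repeats: 100);

        Console.WriteLine($"mean = {Mean(times):F2} us, stddev = {StdDev(times):F2} us (sink {sink})");
    }
}
```

A standard deviation that is large relative to the mean is a hint that the machine was busy or the work is too small to time this way, and that the comparison should be rerun or moved to BenchmarkDotNet.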

    Pro Tips

    • 💡 Measure before you optimise. Profile to find the real hotspot — it's rarely the code you suspect. Optimising a guess wastes effort and adds complexity for nothing.
    • 💡 Always benchmark in Release (dotnet run -c Release). Debug-mode numbers are meaningless because the JIT skips optimisations.
    • 💡 Add [MemoryDiagnoser] to every benchmark. Allocations and GC pressure often matter more than raw speed under real load.
    • 💡 Warm up before timing. Run the code once to let the JIT compile it, then start your Stopwatch.
    • 💡 Use Stopwatch, never DateTime.Now, for elapsed time — the wall clock can jump backwards and has poor resolution.
    • 💡 Optimise the hot path, not the warm one. Code that runs once at startup isn't worth micro-tuning; the loop that runs a million times is.

    Common Errors (and the fix)

    • Micro-benchmarking a tiny operation with Stopwatch run once: the result is mostly JIT warmup and timer noise, not your code. Run it millions of times and divide — or better, use BenchmarkDotNet, which does this correctly.
    • Benchmarking in Debug mode: the JIT disables optimisations, so the numbers are meaningless. Always run with -c Release.
    • Optimising without measuring: you "speed up" a method that was never the bottleneck and the app is no faster — you just added complexity. Profile first to find the real hotspot.
    • Ignoring allocations: you compare only the Mean column, pick the "faster" method, and it allocates 10x more memory — so under load it triggers GC pauses and loses. Read the Allocated column too.
    • Premature optimisation: you contort readable code for a speed-up the program never needed. Make it correct and clear first; only optimise the spots a profiler proves are hot.
    • Using DateTime.Now for timing: it isn't monotonic — a clock adjustment can make elapsed time negative. Use Stopwatch.

    📋 Quick Reference

    Task | Code | Notes
    Start a timer | var sw = Stopwatch.StartNew(); | Creates + starts in one line
    Stop it | sw.Stop(); | Freezes the elapsed time
    Read elapsed (ms) | sw.ElapsedMilliseconds | A long, whole ms
    Read elapsed (fine) | sw.Elapsed | A TimeSpan
    Mark a benchmark | [Benchmark] | On a method to time
    Set the baseline | [Benchmark(Baseline = true)] | Others compared to it
    Track memory | [MemoryDiagnoser] | Adds Allocated/Gen0 columns
    Run benchmarks | BenchmarkRunner.Run<T>(); | Use dotnet run -c Release

    Frequently Asked Questions

    Q: When is Stopwatch enough and when do I need BenchmarkDotNet?

    Stopwatch is great for a quick, rough "how long did that take?" on a chunk of work that takes milliseconds or more. The moment you're comparing two approaches and the answer matters — or the operation is tiny (microseconds) — use BenchmarkDotNet. It handles warmup, many iterations, statistics, and memory for you, so the number you report is trustworthy.

    Q: Why is my first timed run so much slower than the rest?

    That's JIT compilation. .NET compiles each method to machine code the first time it runs, and that one-off cost lands on your first measurement. Do a throwaway warmup run before you start the Stopwatch (BenchmarkDotNet does this automatically).

    Q: Should I worry about memory or just speed?

    Both. Memory allocations create work for the garbage collector, and GC pauses are a leading cause of latency spikes under load. A method that's faster but allocates far more can perform worse in production. Use [MemoryDiagnoser] and check the Allocated column, not just Mean.

    Q: What's the difference between a benchmark and a profiler?

    A benchmark answers "is approach A faster than B?" for code you already suspect. A profiler answers "which method in my whole app is the slow one?" — it watches a running program and shows where time and allocations actually go. Profile to find the hotspot, then benchmark to fix it.

    Q: Isn't optimising early just good engineering?

    No — "premature optimisation" makes code harder to read for a gain the program may never need. Write it correctly and clearly first, measure, and then optimise only the spots a profiler proves are hot. Most code isn't on a hot path at all.

    Mini-Challenge: Benchmark Two Ways to Sum an Array

    No blanks this time — just a brief and an outline to keep you on track. Build a large array, sum it two ways (a for loop and LINQ's .Sum()), time each with a Stopwatch after a warmup run, and report which is faster. Run it and check your output against the example in the comments — this is the core skill of the whole lesson: let the measurement decide.

    🎯 Mini-Challenge: which array sum is faster?

    Time a for loop vs LINQ .Sum() and print the winner.

    C#
    using System;
    using System.Diagnostics;
    
    class Program
    {
        static void Main()
        {
            // 🎯 MINI-CHALLENGE: which way of summing an array is faster?
            // 1. Build an int[] of 5,000,000 numbers (fill it in a loop).
            // 2. Approach A — sum it with a for loop. Time it with a Stopwatch.
            // 3. Approach B — sum it with LINQ's .Sum(). Time that too.
            //    (Add 'using System.Linq;' at the top, then call data.Sum().)
            // 4. Print BOTH timings AND which approach
    ...

    🎉 Lesson Complete

    • Measure, don't guess — always profile or benchmark before optimising
    • Stopwatch.StartNew() → work → Stop() → read ElapsedMilliseconds for quick timing
    • Beware Stopwatch pitfalls: JIT warmup, Debug builds, and timing tiny operations
    • BenchmarkDotNet ([Benchmark], BenchmarkRunner) gives rigorous, statistical results
    • [MemoryDiagnoser] reveals allocations and GC pressure — often the real cost
    • Use a profiler (dotTrace, PerfView) to find where time goes; watch for the common hotspots
    • Next lesson: the Final Project — put every C# skill together into one real application
