Tuesday, January 3, 2012

.Net's Sort Is Not Secure. Don't Use It. Here's a Better One.

.Net's Array.Sort (up to at least version 4.0) has serious weaknesses:

1. It is insecure and using it makes you vulnerable to a malicious attacker. .Net uses an ordinary quicksort with the pivot selected by the median-of-three method. It is easy to provoke quicksort's worst-case (quadratic) behavior and increase running times by multiple orders-of-magnitude. An attacker will be happy to exploit this as an effective denial-of-service attack.

2. It is inflexible. It does not allow you to provide a delegate for the swap function so sorting data structures where data synchronization is required to maintain consistency as items are moved is impossible. Also, you can only sort items on the .Net heap so sorting unmanaged memory is impossible.

3. It is slower than it should be even in the absence of an attacker.

Zimbry.Introsort addresses each of these problems.

1. It is secure. It is based on David Musser's Introsort algorithm. Introsort is essentially a quicksort that, should it fail, falls back to a secure heapsort.

2. It is flexible. Both the compare and swap operations are provided by the user. You can use it to sort anything.

3. It is faster. This wasn't an explicit objective but it's nice that we don't have to trade away performance to get a secure and flexible sort.

Click the links to see the benchmarks:

Let's look at the worst-case of dealing with an adversary.

It takes .Net over 26 minutes to sort one million integers when they are provided by an adversary. Zimbry.Introsort does it in half a second.

Those are the worst-case results. We can disable the adversary and benchmark it again:

Zimbry.Introsort is twice as fast in the average case and rarely less than 13% faster in any case.

(Each test was run only once so the timings for small arrays contain noticeable sampling noise. A more robust benchmark would filter multiple samples.)

I am releasing the source under the MIT license: Click here for the source

Some notes on the source:

You'll find many alternative sort algorithms in the Zimbry.Sort.OtherSorts project. I experimented with these along the way. You can enable them in the benchmark if you have a great deal of patience.

The class in QuicksortAdversary.cs was derived from Doug McIlroy's paper, A Killer Adversary for Quicksort. Be careful. It will beat up quicksort and steal its lunch money.

Zimbry.Introsort contains four sort algorithms layered together:

1. Quicksort with pivot selected by median-of-nine: For large partitions.

2. Quicksort with pivot selected by median-of-five: For small partitions.

3. Heapsort as a fall-back when quicksort recurses too deep: Heapsort is slower than quicksort in the best case but it has no quadratic behavior to exploit so it provides effective protection against an adversary.

4. Insertion sort: For tiny partitions where quicksort is inefficient.

Using these four algorithms lets us enjoy the performance advantage of quicksort for the typical case with protection against a malicious attacker in the worst case.

Both quicksorts use Bentley & McIlroy's "fat-pivot" partitioning method from their paper, Engineering a Sort Function, for better performance. This is a big part of why it performs better than .Net's quicksort in many tests.

While this is an improvement it is far from the last word in sorting. Some ideas to consider:

Better performance may be found with Vladimir Yaroslavskiy's dual-pivot quicksort.

It really needs special versions for handling known data types (avoiding the requirement for using compare and swap delegates in all cases). This would give a significant speed improvement.

There's more room for performance tuning. I tried to leave the code in a fairly readable state and some sacrifices could be made to buy a little more performance.

It would be nice to add support for stable sorting.


  1. Array.Sort has a weekness how about the linq sort? Is this one any safe?

    It should be reported to Microsoft using their public ticket where everyone can vote up. If the released code is under and acceptable license for Microsoft it could probably be event updated in future version of the .NET Framework.

  2. "The class in QuicksortAdversary.cs was derived from David Musser's paper, A Killer Adversary for Quicksort"

    Sorry no. That paper is by Doug McIlroy

  3. Another way to harden quicksort is to select the elements used in the pivot calculation randomly. This makes it nearly impossible for an attacker to cause worse case performance, and doesn't require you to also implement multiple sort algorithms.

  4. Well if it's that good you should definitely report it to Microsoft, maybe they will include it in future version.

  5. Jc, Good question. I don't know if LINQ uses the same underlying sort. It will be easy enough to find out, though. I'll run some tests and post what I learn.

  6. Stephen, Thanks for the catch! I have made the correction.

  7. Eric, That's worthy of a post in itself. If you look in Zimbry.Sort.OtherSorts you'll find two variations of quicksort that choose a pivot at random: QuicksortClassicR1 and QuicksortFatPivotR1. I'll include these in the benchmark, run it, and post what I find.

  8. Array sort might be unsecure, might be inflexible, but not agreed that its slower.

    I created a C# window application and create an array of random generated integers (from 1 to 1 million). What result i got is
    1. For 1 million records it takes 124 milliseconds
    2. For 10 million records it takes 1310 milliseconds
    3. For 100 million records it takes 13979 milliseconds

    So i am 100% disagree that "It takes .Net over 80 minutes to sort one million integers "

    My computer configuration is
    Window 7 64 bit Enterprise, i7, 8GB RAM

    Here is sample code:
    private void StartProcess_Click(object sender, EventArgs e)
    long len = long.Parse(txtLength.Text);

    long[] arrForArraySort;
    arrForArraySort = new long[len];

    Random r = new Random();

    DateTime dt1 = DateTime.Now;
    for (long i = 0; i < len; i++)
    arrForArraySort[i] = r.Next(1, 1000000);

    //Array sort
    DateTime dt2 = DateTime.Now;
    DateTime dt3 = DateTime.Now;

    lblFillArray.Text = dt2.Subtract(dt1).TotalMilliseconds.ToString();
    lblArraySort.Text = dt3.Subtract(dt2).TotalMilliseconds.ToString();

  9. Satya, please read it again. It says:

    > It takes .Net over 80 minutes to sort
    > one million integers when they are
    > provided by an adversary.

    The last part is very important. It takes a long time when the integers are supplied by an adversary, not when they are chosen at random.

    .Net's sort is fast when sorting random integers.

    Read this paper by Doug McIlroy:

    A Killer Adversary for Quicksort

  10. Strange usage of the term "insecure", because some user can invoke a quadratic time response? That's not what term insecure.
    Can't supply as delegate? You supply an object that implements IComparer or Comparision to do specific sorting.

  11. > Strange usage of the term "insecure",
    > because some user can invoke a quadratic
    > time response? That's not what term insecure.

    Yes, it is insecure. It is a resource starvation attack.

    > Can't supply as delegate? You supply an
    > object that implements IComparer or
    > Comparision to do specific sorting.

    Please read it again. You can supply the comparison delegate. It does not provide for supplying the swap delegate.

  12. "It is a resource starvation attack."
    It always possible to define some data to invoke the worst case. It's not insecure but potential inefficient, that's why sorting algorithm declare a worst case. It's even stated in the MSDN documentation that it's O(n^2), which I think for Quicksort is the data starts in reverse order.

    Insecure to me, means you can access sensitive information like storing password as plain text.

    If we use your definition of insecure. Your bank's website is insecure, just means it's slow.

    "... sorting data structures where data synchronization is required to maintain consistency as items are moved is impossible. ..."

    They why are you using array.sort() to do the sort? The data structure (class) itself should provide the sort (if it is required.). An array is just a simple collection of "simple" types.

    What would have be clever if the sort detected the type of the array be pass in, and if it was possible to do a radix / counting sort on it to do that.

  13. > Insecure to me, means you can access
    > sensitive information like storing
    > password as plain text.

    Insecure means it's vulnerable to an attack. You don't have to take my word for it. Here's Wikipedia:


    > They why are you using array.sort() to do
    > the sort? The data structure (class) itself
    > should provide the sort (if it is required.).
    > An array is just a simple collection of
    > "simple" types.

    Yes, you can create a unique sort function for each of your data structures.

    Or, you can use one common sort function on many different kinds of data.

    One sort or many. Which is better?

  14. I wonder if the .net framework uses quicksort vs introsort is that .net framework development had progressed too far for it modified so it could implement it. If they wait a little bit longer it could have generics from the start.

  15. Wow, blast from the past. Where to begin...

    As a general rule, arrays run like crap compared to the generic collections in .NET. The generic, linq enable collection classes only begin performing less than optimally when you start using the sorting and query methods in particular. Even so the interim collection performance makes up for it.

    Why do I bring this up? Because javanubs usually come in, bitter that they're having to switch and begin immediately espousing the glory of simple arrays when they see generics, assuming they're the same as java generics. The rule being "if you don't like the platform, the easiest comparisons to assault are its performance or its security." This smells like a permutation on that attack vector, and is an argument I haven't heard brought up since 2007 when it became largely irrelevant.

    So, the simple answer to your original problem: don't use arrays, save when it's absolutely mandated by a core class.

  16. David its nice artical keep it up

  17. Mallah Diction,

    > So, the simple answer to your original
    > problem: don't use arrays, save when
    > it's absolutely mandated by a core class.

    Have you tested this?

    If the Sort method on the generic List<> collection can also be provoked into quadratic behavior then it is also insecure and vulnerable to an attack.

    I'll run benchmarks overnight and report what I find tomorrow.

  18. Thanks, it's nice that you've provided this. I felt like it would be handy if there were also some "standard" Sort overloads, so that this implementation can be easily swapped with existing .NET Sort methods.

    I've added a couple of them, feel free to use them, modify them, no licence attached:

    public static void Sort<T>
        (IList<T> list, int start, int count, Comparison<T> comparison)
        Compare c = (i, j) => comparison(list[(int)i], list[(int)j]);

        Swap s = (i, j) =>
            T tmp = list[(int)i];
            list[(int)i] = list[(int)j];
            list[(int)j] = tmp;

        Sort(c, s, start, count);

    public static void Sort<T>
        (IList<T> list, int start, int count, IComparer<T> comparer)
        Sort(list, comparer.Compare);

    public static void Sort<T>(IList<T> list, IComparer<T> comparer)
        Sort<T>(list, 0, list.Count, comparer);

    public static void Sort<T>(IList<T> list)
        Sort(list, Comparer<T>.Default);

    public static void Sort<T>(IList<T> list, Comparison<T> comparison)
        Sort(list, 0, list.Count, comparison);


    With these in place, you can simply write:

    IList<T> list = SomeList();

    // using the default comparer for 'T'

    // using a delegate
    Introsort.Sort(list, (x, y) => (x - y));

    // using a custom implementation of IComparer<T>
    Introsort.Sort(list, new CustomComparer());

  19. @Mallah Diction: I don't see your point. Array.Sort<T> and List<T>.Sort both use the same QuickSort algorithm. And generic collections all use plain arrays under the hood, which are incredibly fast to allocate and manipulate.

    The only issue with arrays is that their design is poor and feels like ANSI C legacy compared to interface-based collections and syntactic sugar brought by LINQ extension methods. But in sorting context, there is nothing "closer to the metal" than arrays.

  20. Thanks, Groo. What I really need to do is build type-aware versions of the sort. Forcing it to always use delegates is expensive. It's not just due to the delegate call overhead but also because some algorithmic optimizations are not available to a pure comparison sort.

    1. I am not sure what you meant by "to always use delegates is expensive". I doubt anyone frequently has a need to sort a plain array of integers? Or did I misunderstand you? Most use cases for sorting will involve around comparison sorts using delegates (if you wanted to implement a radix sort, fine, but you would still use a delegate to project a certain property to an integer).

    2. Groo, your question deserves a thoughtful answer. It's going to be longer than I would prefer so please bear with me. There are three parts to the answer:

      Part 1: The way I use delegates isn't the same as the way .Net is using them. .Net lets you provide an IComparer for an object while I simply call a delegate with array indexes. There's a big difference between the two approaches. .Net can store items outside of the collection being sorted, leaving it briefly in an inconsistent state, while I must always work with items within the array itself.

      This has two consequences:

      1. For quicksort: .Net's quicksort can take the pivot item, store it in a local variable (it's just a pointer to an object), and compare it against items in the array. My sort has to first swap it to the beginning of the array, compare against that with an index and then swap it back when the partitioning is complete. Quicksort has to do this often and it's expensive. For heapsort: Floyd's optimization is not available to me which is about a 25% penalty.

      2. In my code every comparison and swap incurs two array reference penalties.

      Part 2: .Net does not provide a delegate to perform swaps which increases performance.

      Part 3: .Net provides special versions of its sort to handle native data types. You wondered how often this would be useful. I don't know but it's probably often enough to be worth handling. Worse, if you don't do it, anyone with a simple benchmark is likely to draw the wrong conclusions.

      Going with the pure delegate approach offers maximum flexibility but it comes at both a mechanical cost (the overhead of indirecting through an array index) and an algorithmic cost (choices that are made unavailable.)

      I can't say for certain what the final performance cost is without building the alternative but I suspect it's significant. There's a lot of performance left laying on the table with the code in its current state.

    3. Right, now I get what you wanted to say. I completely forgot that abstracting the swap operation prevents all access to actual elements. When you mentioned delegates, it confused me because calling IComparison<T>.Compare(A,B) or simply the delegate Comparison<T>(A,B) is more or less the same, performance wise, but you were referring to a completely different issue.

      That makes perfect sense, as I was also initially confused by the approach that you decided to take, given that the point of the article seemed to be to provide a replacement for QuickSort (with performance benchmarks).

      Not to mention that the "non-delegate" version would additionally improve benchmark results in your favor.

  21. Your test jig has a error in the ForceScrambledOrder method:
    array[i] = swap;
    should be:
    array[i] = array[swap];

  22. I compared this to a .Net implementation of TimSort from http://timsort4net.codeplex.com/releases/view/73969 and sadly, TimSort kicks this guy's butt (though there is an outlier for the All unique, partial ascending order test case -- which is why I discovered the error in the partial shuffling code). If you want the test jig changes, I'll zip it up and you can delve further. Admittedly we're not playing fair because of your extra swapper-features, but it IS a big difference TimSort takes 54% of Introsort on 10,000 entries.

  23. Marc,

    Thank you very much for the bug report! I'll make the correction and rerun the benchmarks.

    Please send me the TimSort code. I would like to give it a try.

  24. @Marc: It is worth noting that TimSort has a O(n) space complexity, compared to O(logN) for QuickSort and O(1) for HeapSort.

  25. I think Mallah's mistake was to assume that C# arrays were non-generic and required boxing/unboxing which would make them very inefficient. But .NET arrays are generic.

  26. This comment has been removed by the author.

  27. This comment has been removed by the author.

  28. Starting with .Net 4.5 Array.Sort IS using introsort itself. .Net 4.5. So your post is not correcy anymore.

  29. Hi, Great.. Tutorial is just awesome..It is really helpful for a newbie like me.. I am a regular follower of your blog. Really very informative post you shared here. Kindly keep blogging. If anyone wants to become a .Net developer learn from Dot Net Training in Chennai. or learn thru Dot Net Training in Chennai. Nowadays Dot Net has tons of job opportunities on various vertical industry.
    or Javascript Training in Chennai. Nowadays JavaScript has tons of job opportunities on various vertical industry.

  30. Now that the hardware 123.hp.com/setup 9025 of your OfficeJet printer is completed. Let’s move on to the software installation process;

    Navigate to 123.hp.com/setup 9025 web page and specify your printer model there to download the driver software
    Once downloaded, launch the file on your computer
    Follow the on-screen instructions to complete the software installation process

    Do not worry if you face any hindrances during the 123.hp.com/setup 9025 process, call us right away.