<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6519772823709605314</id><updated>2012-02-16T17:57:55.937-08:00</updated><title type='text'>Twiddling the Bits</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://zimbry.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://zimbry.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>David Stafford</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-h0OWawehKko/ToK41BayHEI/AAAAAAAAAC4/Mt6yZsXz6vI/s220/david_200.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>9</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6519772823709605314.post-1060536454273786201</id><published>2012-02-12T17:39:00.000-08:00</published><updated>2012-02-12T17:39:16.864-08:00</updated><title type='text'>Everything you always wanted to know about smart people but were afraid to ask</title><content type='html'>Here is a link to the best essay I've read all week. Don't worry about looking foolish for asking questions. Worry about being foolish for not asking questions.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://tmac721.tumblr.com/post/17500383225/what-ive-learned-about-smart-people"&gt;What I've Learned About Smart People&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6519772823709605314-1060536454273786201?l=zimbry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zimbry.blogspot.com/feeds/1060536454273786201/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zimbry.blogspot.com/2012/02/everything-you-always-wanted-to-know.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/1060536454273786201'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/1060536454273786201'/><link rel='alternate' type='text/html' href='http://zimbry.blogspot.com/2012/02/everything-you-always-wanted-to-know.html' title='Everything you always wanted to know about smart people but were afraid to ask'/><author><name>David Stafford</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-h0OWawehKko/ToK41BayHEI/AAAAAAAAAC4/Mt6yZsXz6vI/s220/david_200.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6519772823709605314.post-5555540827636799941</id><published>2012-01-10T22:04:00.000-08:00</published><updated>2012-01-10T22:04:14.047-08:00</updated><title type='text'>LINQ sorting is also vulnerable</title><content type='html'>This is a follow-up post to &lt;a href="http://zimbry.blogspot.com/2012/01/nets-sort-is-not-secure-dont-use-it.html"&gt;.Net's Sort Is Not Secure. Don't Use It. Here's a Better One.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;A reader asked if LINQ's sort has the same vulnerability. I added tests for sorting an array using LINQ's 'orderby' clause. Another reader wondered if the Sort() method for generic lists had the same problem so I added that test as well.&lt;br /&gt;&lt;br /&gt;Time required for sorting one million items provided by an adversary:&lt;br /&gt;&lt;br /&gt;&lt;table border="2" cellpadding="10" rules="all"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;&lt;b&gt;Algorithm&lt;/b&gt;&lt;/th&gt;&lt;th&gt;&lt;b&gt;Sorting Time&lt;/b&gt;&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Zimbry Introsort&lt;/td&gt;&lt;td&gt;&amp;lt; 2 seconds&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;.Net CLR (Array)&lt;/td&gt;&lt;td&gt;1 hour, 22 minutes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;.Net CLR (LINQ)&lt;/td&gt;&lt;td&gt;1 hour, 40 minutes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;.Net CLR (List)&lt;/td&gt;&lt;td&gt;1 hour, 22 minutes&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;Detailed benchmark results are here: &lt;a href="http://dl.dropbox.com/u/40731642/sorting_performance_tests/sort_performance_tests_linq.html"&gt;Sorting up to 1 million items (with an adversary)&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Notes on the test:&lt;br /&gt;&lt;br /&gt;All tests were run under .Net framework version 4.0&lt;br /&gt;&lt;br /&gt;The times reported in the first table are an average of the times taken for one ascending sort and one descending sort.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6519772823709605314-5555540827636799941?l=zimbry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zimbry.blogspot.com/feeds/5555540827636799941/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zimbry.blogspot.com/2012/01/linq-sorting-is-also-vulnerable.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/5555540827636799941'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/5555540827636799941'/><link rel='alternate' type='text/html' href='http://zimbry.blogspot.com/2012/01/linq-sorting-is-also-vulnerable.html' title='LINQ sorting is also vulnerable'/><author><name>David Stafford</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-h0OWawehKko/ToK41BayHEI/AAAAAAAAAC4/Mt6yZsXz6vI/s220/david_200.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6519772823709605314.post-5056996984981226913</id><published>2012-01-06T13:08:00.000-08:00</published><updated>2012-01-06T13:08:17.526-08:00</updated><title type='text'>Quicksort: Is picking a pivot at random secure?</title><content type='html'>&lt;a href="http://www.blogger.com/profile/12971600792040994189"&gt;Eric&lt;/a&gt; offered this suggestion:&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;&lt;span style="font-family: Georgia, 'Times New Roman', serif;"&gt;Another way to harden quicksort is to select the elements used in the pivot calculation randomly. This makes it nearly impossible for an attacker to cause worst case performance, and doesn't require you to also implement multiple sort algorithms.&lt;/span&gt;&lt;/blockquote&gt;This is well-worth considering. A single sort algorithm is simpler than combining multiple algorithms so there has to be a compelling benefit gained to justify the additional complexity.&lt;br /&gt;&lt;br /&gt;I've done some testing and benchmarks but before we jump to the conclusion take a few minutes to ponder this with me and let's see just far down this rabbit hole goes.&lt;br /&gt;&lt;br /&gt;Hardening quicksort is a thorny problem and for an interesting reason.&amp;nbsp;Quicksort's soft spot is in the method used to estimate the median. The risk that a sequence of partitionings will be unbalanced and lead to degenerate performance depends directly on the quality of the estimate.&amp;nbsp;So why bother with a questionable estimate when it's easy to pick the actual median and have a perfectly secure quicksort? Ah, because there's a catch. Finding the actual median requires almost as much effort as it would to perform the sort itself. So, the only way that quicksort is viable is by making estimates and, as such, most quicksort algorithms are defined by their estimating technique: first item, random item, median-of-three, median-of-N,&amp;nbsp;median-of-random-three,&amp;nbsp;median-of-medians-of-three... you name it and it's probably been tried.&lt;br /&gt;&lt;br /&gt;As long as we are estimating the median we have to accept that we are, in effect, rolling the dice and hoping a rare series of bad choices does not happen. And if an attacker is present they are only too happy to hand us loaded dice to ensure the worst possible outcome.&lt;br /&gt;&lt;br /&gt;But wait... For an attack to work the attacker must be able to anticipate our method of estimating a median. What happens if we choose it at random? Does this defeat the attacker?&lt;br /&gt;&lt;br /&gt;First, what exactly do we mean by 'random'?&amp;nbsp;The pseudo-random number generator&amp;nbsp;provided by most frameworks (the &lt;a href="http://msdn.microsoft.com/en-us/library/system.random.aspx"&gt;Random class in .Net&lt;/a&gt; is a good example) is highly predictable. It won't slow down&amp;nbsp;an attacker.&amp;nbsp;&lt;span style="background-color: white; font-family: inherit; line-height: 18px;"&gt;Alternatively,&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; font-family: inherit; line-height: 18px;"&gt;y&lt;/span&gt;&lt;span style="background-color: white; font-family: inherit; line-height: 18px;"&gt;ou could use a&amp;nbsp;&lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator" style="background-color: white; font-family: inherit; line-height: 18px;"&gt;cryptographically secure random number generator&lt;/a&gt;&lt;span style="background-color: white; font-family: inherit; line-height: 18px;"&gt;&amp;nbsp;but that comes at a performance penalty that almost certainly makes it unworkable. That cost will come down soon, though.&lt;/span&gt;&lt;span style="background-color: white; line-height: 18px;"&gt;&amp;nbsp;Intel has new CPUs coming this year that offer fast, high-quality &lt;a href="http://software.intel.com/en-us/blogs/2011/06/22/find-out-about-intels-new-rdrand-instruction/"&gt;random number generation in hardware&lt;/a&gt;.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let's assume we have that CPU today and it delivers fast and cryptogaphically secure random values. We are probably(1) not at risk from an attacker.&lt;br /&gt;&lt;br /&gt;Not so fast. There are still two problems we can't erase:&lt;br /&gt;&lt;br /&gt;1. Although highly unlikely, it doesn't eliminate the risk of a series of low-probability but catastrophic choices. I have seen quicksorts go quadratic on natural data, without the presence of an attacker, even when using the median-of-three method. Choosing at random should be more resilient. Whether this risk is acceptable or not depends on whether the sort is part of tax preparation software or whether it's running the auto-pilot of a jumbo jet.&lt;br /&gt;&lt;br /&gt;2. It makes low-quality estimates of the median. Partitions will be less balanced and performance will suffer for it.&lt;br /&gt;&lt;br /&gt;There is only one way I know of to reliably harden quicksort against both an attacker and an unlucky series of choices: Set an upper limit on how much effort quicksort is allowed to spend before abandoning it entirely.&lt;br /&gt;&lt;br /&gt;Introsort is secure because it limits how many bad partitions it can accept before dropping out of&amp;nbsp;quicksort and switching to&amp;nbsp;&lt;a href="http://en.wikipedia.org/wiki/Heapsort"&gt;heapsort&lt;/a&gt;&amp;nbsp;to finish the job.&lt;br /&gt;&lt;br /&gt;Of course, we could throw away all the complexity and just use heapsort. It has no degenerate cases and is secure against an attacker. But there's a catch. The reason we don't embrace heapsort so quickly is it tends to be significantly slower than quicksort in the average case.(2)&lt;br /&gt;&lt;br /&gt;Let's look at the benchmarks: &lt;a href="http://dl.dropbox.com/u/40731642/sorting_performance_tests/sort_performance_tests_random_pivot.html"&gt;Sorting up to 100 million items (with no adversary)&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The final algorithm,&amp;nbsp;Zimbry_PureQuicksortFatPivotR1, is a pure quicksort (does not fall back to an insertion sort on tiny partitions) that uses a pseudo-random number generator to choose the pivot. It's what Eric suggests: A single quicksort algorithm that chooses a pivot at random. Introsort is 19% faster on average and is not vulnerable to either an attacker or at risk of degenerating on a bad series of random pivot choices.&lt;br /&gt;&lt;br /&gt;So, here we are, at the bottom of the rabbit hole, and we find ourselves surrounded by trade-offs:&lt;br /&gt;&lt;br /&gt;1. We can have a pure quicksort if we're willing to accept slower performance and the highly unlikely chance of bad behavior.&lt;br /&gt;&lt;br /&gt;2. We can be invulnerable to both attackers and bad luck by choosing heapsort but that armor comes at a severe average performance cost compared to quicksort.&lt;br /&gt;&lt;br /&gt;3. We can "have our performance cake and eat it too" by building a quicksort that falls back to heapsort in an emergency. We get good performance on average and no bad behavior to worry about but it comes at the cost of additional engineering complexity.&lt;br /&gt;&lt;br /&gt;Engineering is all about trade-offs.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;(1) A motivated attacker won't give up so easily. If there is a flaw in Intel's random number generator it will be found and exploited. It's best to give the community (both the good guys and bad guys) time to stress-test any new security software or hardware before adopting it for your own use.&lt;br /&gt;&lt;br /&gt;(2) Ball-park comparison: A well-implemented heapsort is typically three to five times slower than a well-implemented quicksort.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6519772823709605314-5056996984981226913?l=zimbry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zimbry.blogspot.com/feeds/5056996984981226913/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zimbry.blogspot.com/2012/01/quicksort-is-picking-pivot-at-random.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/5056996984981226913'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/5056996984981226913'/><link rel='alternate' type='text/html' href='http://zimbry.blogspot.com/2012/01/quicksort-is-picking-pivot-at-random.html' title='Quicksort: Is picking a pivot at random secure?'/><author><name>David Stafford</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-h0OWawehKko/ToK41BayHEI/AAAAAAAAAC4/Mt6yZsXz6vI/s220/david_200.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6519772823709605314.post-905735322534257430</id><published>2012-01-05T15:52:00.000-08:00</published><updated>2012-01-05T15:54:52.878-08:00</updated><title type='text'>Vote for fixing .Net's sorting security vulnerability</title><content type='html'>&lt;br /&gt;Several people suggested reporting this to Microsoft. Good idea. If you feel it is important to fix this you can vote for it at the following link:&lt;br /&gt;&lt;br /&gt;&lt;a href="https://connect.microsoft.com/VisualStudio/feedback/details/716864/nets-sort-is-not-secure-and-is-vulnerable-to-an-attacker-who-can-use-it-to-create-a-dos-attack"&gt;Bug: .Net's sort is not secure and is vulnerable to an attacker who can use it to create a DOS attack&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6519772823709605314-905735322534257430?l=zimbry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zimbry.blogspot.com/feeds/905735322534257430/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zimbry.blogspot.com/2012/01/vote-for-fixing-net-security.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/905735322534257430'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/905735322534257430'/><link rel='alternate' type='text/html' href='http://zimbry.blogspot.com/2012/01/vote-for-fixing-net-security.html' title='Vote for fixing .Net&apos;s sorting security vulnerability'/><author><name>David Stafford</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-h0OWawehKko/ToK41BayHEI/AAAAAAAAAC4/Mt6yZsXz6vI/s220/david_200.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6519772823709605314.post-504002872569557624</id><published>2012-01-03T21:25:00.000-08:00</published><updated>2012-01-05T12:49:26.138-08:00</updated><title type='text'>.Net's Sort Is Not Secure. Don't Use It. Here's a Better One.</title><content type='html'>.Net's Array.Sort (up to at least version 4.0) has serious weaknesses:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;1. It is insecure and using it makes you vulnerable to a malicious attacker.&lt;/b&gt; .Net uses an ordinary quicksort with the pivot selected by the&amp;nbsp;median-of-three method. It is easy to provoke quicksort's worst-case&amp;nbsp;(quadratic) behavior and increase running times by multiple&amp;nbsp;orders-of-magnitude. An attacker will&amp;nbsp;be happy&amp;nbsp;to exploit this as an effective denial-of-service attack.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;b&gt;2. It is inflexible.&lt;/b&gt; It does not allow you to provide a delegate for the swap function so sorting data structures where data synchronization is&amp;nbsp;required to maintain consistency as items are moved is&amp;nbsp;impossible. Also, you can only sort items on the .Net heap so&amp;nbsp;sorting unmanaged memory is impossible.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;3. It is slower than it should be even in the absence of an attacker.&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://dl.dropbox.com/u/40731642/sorting_performance_tests/sorting_experiments.zip"&gt;Zimbry.Introsort&lt;/a&gt;&amp;nbsp;addresses each of these problems.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1. It is secure. It is based on David Musser's &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.5196"&gt;Introsort &lt;/a&gt;algorithm. Introsort is essentially a quicksort that, should it fail, falls back to a secure heapsort.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;2. It is flexible. Both the compare and swap operations are provided by the user. You can use it to sort anything.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;3. It is faster. This wasn't an explicit objective but it's nice that we don't have to trade away performance to get a secure and flexible sort.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Click the links to see the benchmarks:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://dl.dropbox.com/u/40731642/sorting_performance_tests/sort_performance_tests.html"&gt;Sorting up to one million items (with an adversary)&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's look at the worst-case of dealing with an adversary.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It takes .Net over 80 minutes to sort one million integers when they are provided by an&amp;nbsp;adversary.&amp;nbsp;Zimbry.Introsort does it in less than two seconds.&amp;nbsp;Sorting 10 million items would take .Net an estimated six days. (I didn't have the patience to try it.)&amp;nbsp;Zimbry.Introsort does it in 22 seconds.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Those are the worst-case results. We can disable the adversary and benchmark it again:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://dl.dropbox.com/u/40731642/sorting_performance_tests/sort_performance_tests_no_adversary.html"&gt;Sorting up to 500 million items (no adversary)&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Zimbry.Introsort is about 45% faster in the average case and rarely less than 13% faster in any case.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;(Each test was run only once so the timings for small arrays contain noticeable sampling noise. A more robust benchmark would filter multiple samples.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I am releasing the source under the MIT license:&amp;nbsp;&lt;a href="http://dl.dropbox.com/u/40731642/sorting_performance_tests/sorting_experiments.zip"&gt;Click here for the source&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Some notes on the source:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You'll find many alternative sort algorithms in the Zimbry.Sort.OtherSorts project. I experimented with these along the way. You can enable them in the benchmark if you have a great deal of patience.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The class in&amp;nbsp;QuicksortAdversary.cs was derived from Doug McIlroy's paper, &lt;a href="http://www.cs.dartmouth.edu/~doug/mdmspe.pdf"&gt;A Killer Adversary for Quicksort&lt;/a&gt;. Be careful. It will beat up quicksort and steal its lunch money.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Zimbry.Introsort contains four sort algorithms layered together:&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1. Quicksort with pivot selected by median-of-nine: For large partitions.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;2. Quicksort with pivot selected by median-of-five: For small partitions.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;3. Heapsort as a fall-back when quicksort recurses too deep: Heapsort is&amp;nbsp;slower than quicksort in the best case but it has no quadratic behavior&amp;nbsp;to exploit so it provides effective protection against an adversary.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;4. Insertion sort: For tiny partitions where quicksort is inefficient.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Using these four algorithms lets us enjoy the performance advantage of&amp;nbsp;quicksort for the typical case with protection against a malicious attacker&amp;nbsp;in the worst case.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Both quicksorts use Bentley &amp;amp; McIlroy's "fat-pivot" partitioning method from their paper, &lt;a href="http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.8162"&gt;Engineering a Sort Function&lt;/a&gt;,&amp;nbsp;for better performance. This is a big part of why it performs better than .Net's quicksort in many tests.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;While this is an improvement it is far from the last word in sorting. Some ideas to consider:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Better performance may be found with Vladimir Yaroslavskiy's &lt;a href="http://gdtoolbox.com/DualPivotQuicksort.pdf"&gt;dual-pivot quicksort&lt;/a&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It really needs special versions for handling known data types (avoiding the requirement for using compare and swap delegates in all cases). This would give a significant speed improvement.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There's more room for performance tuning. I tried to leave the code in a fairly readable state and some sacrifices could be made to buy a little more performance.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It would be nice to add support for stable sorting.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6519772823709605314-504002872569557624?l=zimbry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zimbry.blogspot.com/feeds/504002872569557624/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zimbry.blogspot.com/2012/01/nets-sort-is-not-secure-dont-use-it.html#comment-form' title='30 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/504002872569557624'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/504002872569557624'/><link rel='alternate' type='text/html' href='http://zimbry.blogspot.com/2012/01/nets-sort-is-not-secure-dont-use-it.html' title='.Net&apos;s Sort Is Not Secure. Don&apos;t Use It. Here&apos;s a Better One.'/><author><name>David Stafford</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-h0OWawehKko/ToK41BayHEI/AAAAAAAAAC4/Mt6yZsXz6vI/s220/david_200.jpg'/></author><thr:total>30</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6519772823709605314.post-2376575933671510943</id><published>2011-12-21T23:20:00.000-08:00</published><updated>2011-12-21T23:20:39.366-08:00</updated><title type='text'>SPRUCE - A Way of Thinking About Software</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-CL-chSFNoXo/TvKaYabRXII/AAAAAAAAAFk/ugIf9yPEj1M/s1600/spruce.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="80" src="http://3.bp.blogspot.com/-CL-chSFNoXo/TvKaYabRXII/AAAAAAAAAFk/ugIf9yPEj1M/s320/spruce.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Analyzing and comparing software can be a complex task and I needed a way to break it up into components to avoid being overwhelmed by the details. These six top-level categories help keep me organized: &lt;b&gt;Security&lt;/b&gt;, &lt;b&gt;Performance&lt;/b&gt;, &lt;b&gt;Reliability&lt;/b&gt;, &lt;b&gt;Usability&lt;/b&gt;, &lt;b&gt;Community&amp;nbsp;&lt;/b&gt;and &lt;b&gt;Economy&lt;/b&gt;. I call it Spruce to make it easy to remember. It works equally well when thinking about operating systems, languages, frameworks and individual applications.&lt;br /&gt;&lt;br /&gt;A brief summary of Spruce:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Security&lt;/b&gt; - Protection of sensitive data through passwords and encryption is the visible part. The invisible part that is hard to measure is how much exploitable surface area is exposed to an attacker. That may not be initially obvious and it generally takes experience to develop a sense for the size of the risk. There is overlap&amp;nbsp;with reliability with regards to attack-resistance.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Performance&lt;/b&gt; - We're concerned with the resources it requires relative to its alternatives. How well does it scale as the problem size increases and what trade-offs are unavoidable to achieve scale (ex: consistency vs. availability)? There can be overlap here with economy if it requires expensive hardware to achieve reasonable performance.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Reliability &lt;/b&gt;- This is about attack-resistance, fault-tolerance, error-correction and recovery. How gracefully does it deal with hardware/power failures, incorrect input and outright data corruption? There is overlap with security with regards to dealing with attacks. Can it keep running even under adverse conditions or does it go down every time the wind shifts direction? Has it been battle-tested or are you the brave pioneer? If redundancy is required there is overlap with economy.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Usability&lt;/b&gt; - This is considered from the point-of-view of the user or programmer as appropriate. I'm concerned with documentation, user-experience and API design. How well does it adapt to problems the original developer did not anticipate? Is it a pleasure to use or does it make you regret your career path?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Community&lt;/b&gt; - This is anyone who can provide you with help and enhance the usefulness of the product. It ranges from support from the original developer to a vibrant third-party community pushing the tech forward. Is it easy to get answers to questions and solve problems? How often is it mentioned on Stack Overflow and GitHub? Can you find developers who are eager to work with it or do they consistently forget to return your calls when you tell them the name of the underlying tech?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Economy&lt;/b&gt; - We're interested in the total cost of ownership relative to its alternatives. The visible parts are licensing fees, support contracts and hardware requirements. The invisible parts are the impact it has on other decisions. If it turns out you made the wrong decision how expensive is it to correct the mistake?&lt;br /&gt;&lt;br /&gt;Engineering is all about trade-offs so it's rare that any tool excels in all of these areas. Reliability may be emphasized over performance or economy. Community might trump everything else. The key thing is to simply be aware of what the trade-offs are and be conscientious about them.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6519772823709605314-2376575933671510943?l=zimbry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zimbry.blogspot.com/feeds/2376575933671510943/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zimbry.blogspot.com/2011/12/spruce-way-of-thinking-about-software.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/2376575933671510943'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/2376575933671510943'/><link rel='alternate' type='text/html' href='http://zimbry.blogspot.com/2011/12/spruce-way-of-thinking-about-software.html' title='SPRUCE - A Way of Thinking About Software'/><author><name>David Stafford</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-h0OWawehKko/ToK41BayHEI/AAAAAAAAAC4/Mt6yZsXz6vI/s220/david_200.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-CL-chSFNoXo/TvKaYabRXII/AAAAAAAAAFk/ugIf9yPEj1M/s72-c/spruce.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6519772823709605314.post-8652772141302884020</id><published>2011-12-11T11:25:00.001-08:00</published><updated>2011-12-11T11:46:56.076-08:00</updated><title type='text'>Quicksort with Hungarian Folk Dance</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;object class="BLOGGER-youtube-video" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" data-thumbnail-src="http://3.gvt0.com/vi/ywWBy6J5gz8/0.jpg" height="266" width="320"&gt;&lt;param name="movie" value="http://www.youtube.com/v/ywWBy6J5gz8&amp;fs=1&amp;source=uds" /&gt;&lt;param name="bgcolor" value="#FFFFFF" /&gt;&lt;embed width="400" height="300"  src="http://www.youtube.com/v/ywWBy6J5gz8&amp;fs=1&amp;source=uds" type="application/x-shockwave-flash"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;This is brilliant (and finally explains the music that plays from my computer every time I call a sort function!)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6519772823709605314-8652772141302884020?l=zimbry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zimbry.blogspot.com/feeds/8652772141302884020/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zimbry.blogspot.com/2011/12/quicksort-with-hungarian-folk-dance.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/8652772141302884020'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/8652772141302884020'/><link rel='alternate' type='text/html' href='http://zimbry.blogspot.com/2011/12/quicksort-with-hungarian-folk-dance.html' title='Quicksort with Hungarian Folk Dance'/><author><name>David Stafford</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-h0OWawehKko/ToK41BayHEI/AAAAAAAAAC4/Mt6yZsXz6vI/s220/david_200.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6519772823709605314.post-3387384404383557073</id><published>2011-09-28T21:29:00.000-07:00</published><updated>2011-09-28T23:39:56.263-07:00</updated><title type='text'>Better Bit Mixing - Improving on MurmurHash3's 64-bit Finalizer</title><content type='html'>&lt;p&gt;&lt;a href="http://tanjent.com/doku.php?id=start"&gt;Austin Appleby's&lt;/a&gt; superb &lt;a href="http://code.google.com/p/smhasher/"&gt;MurmurHash3&lt;/a&gt; is one of the best general purpose hash functions available today. Still, as good as it is, there are a couple of minor things about it that make me uneasy. I want to talk about one of those things today.&lt;/p&gt;&lt;p&gt;Here is the 64-bit finalizer from MurmurHash3:&lt;/p&gt;&lt;pre&gt;&lt;br /&gt;UInt64 MurmurHash3Mixer( UInt64 key )&lt;br /&gt;  {&lt;br /&gt;  key ^= (key &amp;gt;&amp;gt; 33);&lt;br /&gt;  key *= 0xff51afd7ed558ccd;&lt;br /&gt;  key ^= (key &amp;gt;&amp;gt; 33);&lt;br /&gt;  key *= 0xc4ceb9fe1a85ec53;&lt;br /&gt;  key ^= (key &amp;gt;&amp;gt; 33);&lt;br /&gt;&lt;br /&gt;  return key;&lt;br /&gt;  }&lt;br /&gt;&lt;/pre&gt;&lt;p&gt;The goal of a finalizer (sometimes called a "bit mixer") is to take an intermediate hash value that may not be thoroughly mixed and increase its entropy to obtain both better distribution and fewer collisions among hashes. Small differences in input values, as you might get when hashing nearly identical data, should result in large differences in output values after mixing.&lt;/p&gt;&lt;p&gt;Ideally, flipping a single bit in the input key results in all output bits changing with a probability of 0.5. This is called the &lt;a href="http://en.wikipedia.org/wiki/Avalanche_effect"&gt;Avalanche effect&lt;/a&gt;. Cryptographically-secure hashes are willing to spend an impressive number of computer cycles getting this probability close to 0.5. Hash functions that don't have to be cryptographically-secure trade away some avalanche accuracy in favor of performance. MurmurHash3 gets close to 0.5 with a remarkable economy of effort: Just two multiplies, three shifts and three exclusive-ors. Austin reports MurmurHash3 avalanches all bits to &lt;a href="http://code.google.com/p/smhasher/wiki/MurmurHash3"&gt;within 0.25% bias&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;The overall structure of the finalizer (shifts and multiplies) was designed by Austin. The constants were chosen by "a simple simulated-annealing algorithm" (his description.) The part that troubles me is the simulation was driven with random numbers. That doesn't seem right. After all, if the inputs were truly random what would be the point of mixing? It seems to me that a good mixer should be able to take a series of, say, counting numbers and produce output that is virtually indistinguishable from random.&lt;/p&gt;&lt;p&gt;First question: Could I build a better mixer by training the simulation on low-entropy inputs such as counting numbers and high-likelihood bit patterns? Here are the results for MurmurHash3 and the top 14 mixers my simulation found:&lt;/p&gt;&lt;table border="2" cellpadding="10" rules="all"&gt;&lt;caption&gt;&lt;h2&gt;4,853,184 Low-entropy keys&lt;/h2&gt;&lt;/caption&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;&lt;b&gt;Mixer&lt;/b&gt;&lt;/th&gt;&lt;th&gt;&lt;b&gt;Maximum error&lt;/b&gt;&lt;/th&gt;&lt;th&gt;&lt;b&gt;Mean error&lt;/b&gt;&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;MurmurHash3 mixer&lt;/td&gt;&lt;td&gt;0.005974634384355&lt;/td&gt;&lt;td&gt;0.000265202074121&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix01&lt;/td&gt;&lt;td&gt;0.000791851287732&lt;/td&gt;&lt;td&gt;0.000190873655461&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix02&lt;/td&gt;&lt;td&gt;0.000885192071844&lt;/td&gt;&lt;td&gt;0.000197317606821&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix03&lt;/td&gt;&lt;td&gt;0.000828116139837&lt;/td&gt;&lt;td&gt;0.000191444267889&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix04&lt;/td&gt;&lt;td&gt;0.000991926125199&lt;/td&gt;&lt;td&gt;0.000199204707592&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix05&lt;/td&gt;&lt;td&gt;0.000932171539344&lt;/td&gt;&lt;td&gt;0.000195036565653&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix06&lt;/td&gt;&lt;td&gt;0.000819874127995&lt;/td&gt;&lt;td&gt;0.000199337714668&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix07&lt;/td&gt;&lt;td&gt;0.000805656657567&lt;/td&gt;&lt;td&gt;0.000194049828213&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix08&lt;/td&gt;&lt;td&gt;0.000906415252337&lt;/td&gt;&lt;td&gt;0.000191828700595&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix09&lt;/td&gt;&lt;td&gt;0.000927020281943&lt;/td&gt;&lt;td&gt;0.000194149835046&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix10&lt;/td&gt;&lt;td&gt;0.000877774261186&lt;/td&gt;&lt;td&gt;0.000194585176663&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix11&lt;/td&gt;&lt;td&gt;0.000955867323390&lt;/td&gt;&lt;td&gt;0.000194238221367&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix12&lt;/td&gt;&lt;td&gt;0.000932377589640&lt;/td&gt;&lt;td&gt;0.000193450189656&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix13&lt;/td&gt;&lt;td&gt;0.000789996835068&lt;/td&gt;&lt;td&gt;0.000190778327016&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix14&lt;/td&gt;&lt;td&gt;0.000800917500758&lt;/td&gt;&lt;td&gt;0.000191264175101&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;p&gt;The maximum error tends to be about seven times lower. Mean error is lower too but only slightly. The answer, then, is yes, it does get better results with low-entropy keys. But that leads to the second question: Does training on low-entropy keys result in better or worse performance with high-entropy keys? To find out, I tested it with 100 million cryptographic-quality random numbers:&lt;/p&gt;&lt;table border="2" cellpadding="10" rules="all"&gt;&lt;caption&gt;&lt;h2&gt;100,000,000 Random keys&lt;/h2&gt;&lt;/caption&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;&lt;b&gt;Mixer&lt;/b&gt;&lt;/th&gt;&lt;th&gt;&lt;b&gt;Maximum error&lt;/b&gt;&lt;/th&gt;&lt;th&gt;&lt;b&gt;Mean error&lt;/b&gt;&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;MurmurHash3 mixer&lt;/td&gt;&lt;td&gt;0.000212380000000&lt;/td&gt;&lt;td&gt;0.000040445117187&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix01&lt;/td&gt;&lt;td&gt;0.000177410000000&lt;/td&gt;&lt;td&gt;0.000040054211426&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix02&lt;/td&gt;&lt;td&gt;0.000179150000000&lt;/td&gt;&lt;td&gt;0.000039797316895&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix03&lt;/td&gt;&lt;td&gt;0.000170070000000&lt;/td&gt;&lt;td&gt;0.000040068117676&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix04&lt;/td&gt;&lt;td&gt;0.000185470000000&lt;/td&gt;&lt;td&gt;0.000039775007324&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix05&lt;/td&gt;&lt;td&gt;0.000192510000000&lt;/td&gt;&lt;td&gt;0.000039626535645&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix06&lt;/td&gt;&lt;td&gt;0.000195660000000&lt;/td&gt;&lt;td&gt;0.000040216433105&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix07&lt;/td&gt;&lt;td&gt;0.000193810000000&lt;/td&gt;&lt;td&gt;0.000039834248047&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix08&lt;/td&gt;&lt;td&gt;0.000196590000000&lt;/td&gt;&lt;td&gt;0.000039063793945&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix09&lt;/td&gt;&lt;td&gt;0.000174280000000&lt;/td&gt;&lt;td&gt;0.000039541943359&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix10&lt;/td&gt;&lt;td&gt;0.000181790000000&lt;/td&gt;&lt;td&gt;0.000039569926758&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix11&lt;/td&gt;&lt;td&gt;0.000181140000000&lt;/td&gt;&lt;td&gt;0.000039501779785&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix12&lt;/td&gt;&lt;td&gt;0.000183920000000&lt;/td&gt;&lt;td&gt;0.000039622690430&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix13&lt;/td&gt;&lt;td&gt;0.000175610000000&lt;/td&gt;&lt;td&gt;0.000039437180176&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix14&lt;/td&gt;&lt;td&gt;0.000182000000000&lt;/td&gt;&lt;td&gt;0.000040092158203&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;p&gt;Yes, we can do slightly better than MurmurHash3 even on random test keys.&lt;/p&gt;&lt;p&gt;On average this is a small but measurable improvement that comes at no performance cost. For the worst-case, low-entropy keys that concern us the most, it provides a significant improvement.&lt;/p&gt;&lt;p&gt;Here's a handy tip: A great use for a mixer like this is to use it to "purify" untrustworthy input hash keys. If a calling function provides poor-quality hashes (even counting numbers!) as input to your code then purifying it with one of these mixers will insure it does no harm.&lt;/p&gt;&lt;p&gt;Here are the parameters for the mixers I tested:&lt;/p&gt;&lt;table border="2" cellpadding="10" rules="all"&gt;&lt;caption&gt;&lt;h2&gt;Mixer parameters&lt;/h2&gt;&lt;/caption&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;&lt;b&gt;Mixer&lt;/b&gt;&lt;/th&gt;&lt;th colspan="5"&gt;&lt;b&gt;Mixer operations&lt;/b&gt;&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;MurmurHash3 mixer&lt;/td&gt;&lt;td&gt;33&lt;/td&gt;&lt;td&gt;0xff51afd7ed558ccd&lt;/td&gt;&lt;td&gt;33&lt;/td&gt;&lt;td&gt;0xc4ceb9fe1a85ec53&lt;/td&gt;&lt;td&gt;33&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix01&lt;/td&gt;&lt;td&gt;31&lt;/td&gt;&lt;td&gt;0x7fb5d329728ea185&lt;/td&gt;&lt;td&gt;27&lt;/td&gt;&lt;td&gt;0x81dadef4bc2dd44d&lt;/td&gt;&lt;td&gt;33&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix02&lt;/td&gt;&lt;td&gt;33&lt;/td&gt;&lt;td&gt;0x64dd81482cbd31d7&lt;/td&gt;&lt;td&gt;31&lt;/td&gt;&lt;td&gt;0xe36aa5c613612997&lt;/td&gt;&lt;td&gt;31&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix03&lt;/td&gt;&lt;td&gt;31&lt;/td&gt;&lt;td&gt;0x99bcf6822b23ca35&lt;/td&gt;&lt;td&gt;30&lt;/td&gt;&lt;td&gt;0x14020a57acced8b7&lt;/td&gt;&lt;td&gt;33&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix04&lt;/td&gt;&lt;td&gt;33&lt;/td&gt;&lt;td&gt;0x62a9d9ed799705f5&lt;/td&gt;&lt;td&gt;28&lt;/td&gt;&lt;td&gt;0xcb24d0a5c88c35b3&lt;/td&gt;&lt;td&gt;32&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix05&lt;/td&gt;&lt;td&gt;31&lt;/td&gt;&lt;td&gt;0x79c135c1674b9add&lt;/td&gt;&lt;td&gt;29&lt;/td&gt;&lt;td&gt;0x54c77c86f6913e45&lt;/td&gt;&lt;td&gt;30&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix06&lt;/td&gt;&lt;td&gt;31&lt;/td&gt;&lt;td&gt;0x69b0bc90bd9a8c49&lt;/td&gt;&lt;td&gt;27&lt;/td&gt;&lt;td&gt;0x3d5e661a2a77868d&lt;/td&gt;&lt;td&gt;30&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix07&lt;/td&gt;&lt;td&gt;30&lt;/td&gt;&lt;td&gt;0x16a6ac37883af045&lt;/td&gt;&lt;td&gt;26&lt;/td&gt;&lt;td&gt;0xcc9c31a4274686a5&lt;/td&gt;&lt;td&gt;32&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix08&lt;/td&gt;&lt;td&gt;30&lt;/td&gt;&lt;td&gt;0x294aa62849912f0b&lt;/td&gt;&lt;td&gt;28&lt;/td&gt;&lt;td&gt;0x0a9ba9c8a5b15117&lt;/td&gt;&lt;td&gt;31&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix09&lt;/td&gt;&lt;td&gt;32&lt;/td&gt;&lt;td&gt;0x4cd6944c5cc20b6d&lt;/td&gt;&lt;td&gt;29&lt;/td&gt;&lt;td&gt;0xfc12c5b19d3259e9&lt;/td&gt;&lt;td&gt;32&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix10&lt;/td&gt;&lt;td&gt;30&lt;/td&gt;&lt;td&gt;0xe4c7e495f4c683f5&lt;/td&gt;&lt;td&gt;32&lt;/td&gt;&lt;td&gt;0xfda871baea35a293&lt;/td&gt;&lt;td&gt;33&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix11&lt;/td&gt;&lt;td&gt;27&lt;/td&gt;&lt;td&gt;0x97d461a8b11570d9&lt;/td&gt;&lt;td&gt;28&lt;/td&gt;&lt;td&gt;0x02271eb7c6c4cd6b&lt;/td&gt;&lt;td&gt;32&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix12&lt;/td&gt;&lt;td&gt;29&lt;/td&gt;&lt;td&gt;0x3cd0eb9d47532dfb&lt;/td&gt;&lt;td&gt;26&lt;/td&gt;&lt;td&gt;0x63660277528772bb&lt;/td&gt;&lt;td&gt;33&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix13&lt;/td&gt;&lt;td&gt;30&lt;/td&gt;&lt;td&gt;0xbf58476d1ce4e5b9&lt;/td&gt;&lt;td&gt;27&lt;/td&gt;&lt;td&gt;0x94d049bb133111eb&lt;/td&gt;&lt;td&gt;31&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Mix14&lt;/td&gt;&lt;td&gt;30&lt;/td&gt;&lt;td&gt;0x4be98134a5976fd3&lt;/td&gt;&lt;td&gt;29&lt;/td&gt;&lt;td&gt;0x3bc0993a5ad19a13&lt;/td&gt;&lt;td&gt;31&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6519772823709605314-3387384404383557073?l=zimbry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zimbry.blogspot.com/feeds/3387384404383557073/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zimbry.blogspot.com/2011/09/better-bit-mixing-improving-on.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/3387384404383557073'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/3387384404383557073'/><link rel='alternate' type='text/html' href='http://zimbry.blogspot.com/2011/09/better-bit-mixing-improving-on.html' title='Better Bit Mixing - Improving on MurmurHash3&apos;s 64-bit Finalizer'/><author><name>David Stafford</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-h0OWawehKko/ToK41BayHEI/AAAAAAAAAC4/Mt6yZsXz6vI/s220/david_200.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6519772823709605314.post-234268845328661134</id><published>2011-09-27T23:41:00.000-07:00</published><updated>2011-12-23T12:25:28.800-08:00</updated><title type='text'>About Me</title><content type='html'>I am a computer programmer and I love building things.&lt;br /&gt;&lt;br /&gt;I have co-founded four startups, assembled and managed teams of developers and delivered products that millions of people use every day.&lt;br /&gt;&lt;br /&gt;Some bits of trivia:&lt;br /&gt;&lt;br /&gt;Remember that pinball game that shipped with Windows from Windows95 through XP? That was made by me and my co-founders at my first startup, Cinematronics. We were later acquired by Maxis.&lt;br /&gt;&lt;br /&gt;My second startup, Eclipse Entertainment, created the first browser-based 3D graphics engine. It was acquired by WildTangent.&lt;br /&gt;&lt;br /&gt;I built a compressor called Quantum and for a few years in the 90s it was the best-performing lossless data compressor. It was the first compressor with a workable solution to the optimal parsing problem and typically outperformed PkZip by 20%-30%. Borland, Microsoft and Novell licensed it. Microsoft used it for their .CAB files (it knocked two diskettes off the size of Windows) and almost all of their products were compressed with Quantum.&lt;br /&gt;&lt;br /&gt;I created World Class Chess which was the first commercial chess program which offered a user-configurable opening book library. I made it configurable so my chess-playing friends would populate the opening book for me. That plan&amp;nbsp;back-fired in an amusing way: They tended to enter whatever obscure openings they were studying at the time. As a result the program liked&amp;nbsp;to play openings it couldn't understand and when it exhausted its book it proceeded to waste time rearranging its pieces to match its own ideas about pawn structure, mobility and king safety. (If you're a programmer you might enjoy thinking about how to represent a chess opening book library in just 8 bits per move. It used to be important to fit it in as little space as possible.)&lt;br /&gt;&lt;br /&gt;I&amp;nbsp;created&amp;nbsp;Launch, a Windows shell that replaced the dreaded Program Manager. It sold well for a few years and&amp;nbsp;had a community of enthusiastic users.&lt;br /&gt;&lt;br /&gt;I created Savant, the first Scrabble game that was fast enough to do a tree search in the end-game. Its middle-game was weaker than the best program at the time but it made up for it with near-perfect end-game play. Computers are faster now and the best Scrabble games today do Monte-Carlo simulations at all stages of the game.&lt;br /&gt;&lt;br /&gt;I built and shipped (in Encarta) a natural-language search engine at Microsoft. Its accuracy outperformed our best keyword-based search engine at the time.&lt;br /&gt;&lt;br /&gt;Optimizing for performance is one of my passions. You can read about a couple of my programs at the following links:&lt;br /&gt;&lt;br /&gt;There Ain't No Such Thing As The Fastest Code&lt;br /&gt;&lt;a href="http://downloads.gamedev.net/pdf/gpbb/gpbb16.pdf"&gt;http://downloads.gamedev.net/pdf/gpbb/gpbb16.pdf&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;It's a Wonderful Life&lt;br /&gt;&lt;a href="http://downloads.gamedev.net/pdf/gpbb/gpbb18.pdf"&gt;http://downloads.gamedev.net/pdf/gpbb/gpbb18.pdf&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I wrote much of Borland's Turbo C run-time library. I often think back to what a great experience it was to be a part of that team. The stars were aligned for Borland in those days. It was magical and, in many ways, you could say that I've spent much of my career attempting to recreate that magic. I've come close but have never been totally successful. Such things are rare. I was too young to appreciate it at the time.&lt;br /&gt;&lt;br /&gt;My interests are focused around machine learning, natural language processing, information theory, high-performance computing and scalability.&lt;br /&gt;&lt;br /&gt;I lived and worked in Japan for two years though I've pretty much forgotten what little Japanese I once knew.&lt;br /&gt;&lt;br /&gt;I enjoy games and puzzles: Go (Weiqi/Weichi/Baduk), Chess and Scrabble.&lt;br /&gt;&lt;br /&gt;In 2002 I took a year off and rode a motorcycle around the world.&lt;br /&gt;&lt;br /&gt;I was interrogated by the Russian secret police and told them everything I knew in great detail. They grew bored and let me go.&lt;br /&gt;&lt;br /&gt;Some things I believe: Working with good people is soul-enriching. Mentoring younger people is a way of repaying those who mentored me. Shipping is hard but real artists ship. Nothing can replace perseverance.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6519772823709605314-234268845328661134?l=zimbry.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zimbry.blogspot.com/feeds/234268845328661134/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zimbry.blogspot.com/2011/09/about-me.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/234268845328661134'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6519772823709605314/posts/default/234268845328661134'/><link rel='alternate' type='text/html' href='http://zimbry.blogspot.com/2011/09/about-me.html' title='About Me'/><author><name>David Stafford</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/-h0OWawehKko/ToK41BayHEI/AAAAAAAAAC4/Mt6yZsXz6vI/s220/david_200.jpg'/></author><thr:total>0</thr:total></entry></feed>
