JavaScript Array Sorting Performance Puzzler
I recently fixed a small performance problem in Quip. Since it ended up being yet another tidbit in my mile-long list of factoids that are necessary to keep in your head when doing web development, I thought I would write up the process.
To implement search-as-you-type for contacts, documents, etc., Quip maintains a sorted array of index terms. Occasionally, this index needs to be updated on the client (for example, when a new contact is added). I noticed that the index updates were taking longer than expected, to the point where they were impacting typing latency if they happened at the wrong time.
As initially implemented, the index update involved making the additions and modifications to the array and then re-sorting it. It was the sort operation that was taking a while: up to a few hundred milliseconds in the case of an index with 10,000 terms. Given that in the grand scheme of things that is not a very big array, and that the sort was pure computation (with no need to touch the DOM or otherwise do anything expensive), that was surprising.
To be a bit more specific, the array that was being re-sorted was of the form:
[ ["termA", ["id1", "id2", ...], ["termB", ["id7", "id9", ...], ... ]
I've created a jsPerf test case that simulates it (array1 in the setup code). On my machine that test runs at 4.23 runs/second, which works out to about 236ms per run, and which lines up well with what I was seeing within the Quip codebase.
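The setup code is roughly of this shape (the term names and ID lists here are invented for illustration; the actual jsPerf setup differs in its details):

// Build an index of 10,000 [term, ids] entries, similar in shape to
// array1 in the jsPerf setup code.
var array1 = [];
for (var i = 0; i < 10000; i++) {
    // Zero-pad so the generated terms come out already in sorted order.
    var term = "term" + ("00000" + i).slice(-5);
    array1.push([term, ["id" + i, "id" + (i + 1)]]);
}

// The slow path: re-sorting with no comparator.
array1.sort();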
My first guess was that the array being nearly in sorted order already was somehow triggering pathological behavior in V8's sort implementation. I tested this guess by shuffling the array (array2 in the jsPerf test case), but sorting time was not affected. From reading the source this makes sense: for large arrays V8 uses Quicksort with the pivot picked as the median of the first, middle, and last elements, so it should still have O(N log N) running time regardless of the input's ordering.
I then wondered if the fact that the array members were arrays themselves was a factor. I created a simplified test case (array3 in the jsPerf test case) where the term was used as the array member directly, instead of being the first element in a sub-array. This resulted in significantly faster runs (163 runs/second, or 6.1ms). That was a bit surprising, since the effective thing being compared should have been the same (the terms were all unique, thus the list of IDs should not be a factor in the sorting).
I then decided to read the documentation a bit more carefully, which said “If [a comparator] is not supplied, elements are sorted by converting them to strings and comparing strings in lexicographic order.”¹ Though that behavior still produced a correct sort order, it meant that each comparison stringified both sub-arrays, resulting in a lot of needless memory allocation and computation.
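To see why that's expensive, here's a small demonstration (not the jsPerf code, just an illustration of the default behavior) of what each comparison does to entries of this shape:

var a = ["termA", ["id1", "id2"]];
var b = ["termB", ["id7", "id9"]];

// With no comparator, sort() effectively compares String(a) with String(b):
String(a); // "termA,id1,id2"
String(b); // "termB,id7,id9"

// Every comparison during the sort builds these strings anew.
[b, a].sort(); // the "termA" entry ends up first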
Changing the sort to use a very simple comparator (function (a, b) { return a[0] < b[0] ? -1 : (a[0] > b[0] ? 1 : 0); }, see “array1 with comparator” on jsPerf) resulted in 302 runs/second, or 3.3ms. This was more in line with my expectations. Mystery solved.
Though as it turned out, in Quip's specific case, this optimization ended up not being needed, since I removed the need for the re-sorting altogether. It was instead more efficient to do an in-place modification or insertion at the right spot in the array. The correct index can be determined via binary search (i.e. O(log N)) and the insertion may involve O(N) operations to move the other array elements around, but that's still better than the O(N log N) of the re-sort.
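A sketch of that approach (the function name and the handling of existing terms here are illustrative, not Quip's actual code):

// Insert or update a [term, ids] entry while keeping the index sorted,
// avoiding a full re-sort.
function upsertTerm(index, term, ids) {
    // Binary search for the first entry whose term is >= the new term: O(log N).
    var low = 0, high = index.length;
    while (low < high) {
        var mid = (low + high) >> 1;
        if (index[mid][0] < term) {
            low = mid + 1;
        } else {
            high = mid;
        }
    }
    if (low < index.length && index[low][0] === term) {
        // Existing term: modify its ID list in place.
        index[low][1] = ids;
    } else {
        // New term: splice it into the position found by the search,
        // shifting at most O(N) elements.
        index.splice(low, 0, [term, ids]);
    }
}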
1. This default behavior also explains the seemingly non-intuitive result when sorting an array of numbers without a comparator.
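For instance, numbers get compared as strings too:

[10, 9, 8, 1].sort();                                  // [1, 10, 8, 9]
[10, 9, 8, 1].sort(function (a, b) { return a - b; }); // [1, 8, 9, 10]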