Array intersections - part 1
When a database performs a query, there is every likelihood that sets of matching records find their way into multiple arrays of record numbers, which then need filtering against each other to arrive at the final set of records. I once wrote a database myself that used this method almost exclusively for its indexes.
Intersecting common numbers in two or more arrays might sound a little trivial, but there are rather a lot of papers written on how to do this optimally.
As such this is going to be a series of articles exploring the subject. One of the things we will be covering along with speed is memory usage.
We will start off badly! Though we will still use -O for compiler optimisation. It's unfair not to give the compiler at least a chance to improve on whatever rubbish we have thrown at it.
Running this I get:
Total 1000000 Intersect 48836 time 1.252631 seconds
Is that good or bad? Well, at this point we technically don't know. However, if we do:
memusage ./a.out
We will see:
Total 1000000 Intersect 48836 time 0.459821 seconds
Memory usage summary: heap total: 52073728, heap peak: 52073728, stack peak: 2256
total calls total memory failed calls
malloc| 1000005 52073728 0
realloc| 0 0 0 (nomove:0, dec:0, free:0)
calloc| 0 0 0
free| 1000000 40000000
Histogram for block sizes:
32-47 1000000 99% ==================================================
1024-1039 1 <1%
large 4 <1%
Now that does look bad: we have made a million memory allocation calls. Why have we made so many? My first thought was that it is largely because set1 has no idea how large it will grow to be. We, however, do know, so it's pretty inexcusable for our code to be quite this awful. Let's make a small change.
We are now using a constructor that takes the array itself and hence can preallocate space, or so we might think.
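That small change, sketched here under the same assumptions as before, replaces the insert loop with std::set's iterator-range constructor:

```cpp
#include <set>
#include <vector>

// The insert loop replaced by std::set's iterator-range constructor,
// handing the whole array to the set in one go.
static int intersect_range(const std::vector<int>& a,
                           const std::vector<int>& b) {
    std::set<int> set1(a.begin(), a.end());
    int count = 0;
    for (int v : b)
        if (set1.count(v))
            count++;
    return count;
}
```

With already-sorted input the range constructor can insert each element with a hint at the end of the tree, making each insertion cheaper, which plausibly accounts for the speedup we are about to see.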
This time:
memusage ./a.out
Total 1000000 Intersect 48836 time 0.220658 seconds
Memory usage summary: heap total: 52073728, heap peak: 52073728, stack peak: 2272
total calls total memory failed calls
malloc| 1000005 52073728 0
realloc| 0 0 0 (nomove:0, dec:0, free:0)
calloc| 0 0 0
free| 1000000 40000000
Histogram for block sizes:
32-47 1000000 99% ==================================================
1024-1039 1 <1%
large 4 <1%
We seem to have halved the time, but we are still a memory monster! If we look closer at what memusage has told us, the histogram tells us the million allocations are for small blocks of memory, so plainly "set" has something complicated going on. (In fact std::set is typically implemented as a red-black tree, and every element gets its own separately allocated node.)
But seriously, why are we using "set"? It's pretty lazy programming. So let's start being sensible.
This time we bother to think for ourselves and step through the arrays, which we know are ordered.
Total 1000000 Intersect 48836 time 0.009072 seconds
Memory usage summary: heap total: 12001024, heap peak: 12001024, stack peak: 2160
total calls total memory failed calls
malloc| 4 12001024 0
realloc| 0 0 0 (nomove:0, dec:0, free:0)
calloc| 0 0 0
free| 0 0
Histogram for block sizes:
1024-1039 1 25% ================
large 3 75% ==================================================
Not surprisingly, there is no comparison. We use a fraction of the memory, hardly allocate at all, and are 25x faster.
So is this the end of the story? Not by a long way if you need things truly optimal, but that is where profiling comes in. Though even before profiling you might take time to consider your actual input arrays.
We have thus far worked with two long arrays of similar length. But what if you know that one array is very large and the other very small? Taking things to an extreme, if there is only one value in the second array then our execution time will be proportional, on average, to half the length of the first array: a clear linear relationship between length and time. For more than one value I would expect the time to be some function of the two lengths, and worse than linear.
Part 2 will address improving on this.