Self-Optimizing Data Structures
October 23, 2009
Why not?
Why should I have to spend my time thinking about how some data
structure will be used, just to select the best implementation? Why
can't the computer do that for me?
Let's say I want to have a list of stuff. In today's world, I have to
designate the implementation to be used:
List<Stuff> stuffs = new ArrayList<Stuff>();
But it may well be that a LinkedList would have been more
efficient. Or some other kind of list.
When I was building the Omniscient Debugger (See "Debugging
Backwards in Time" Dr Dobbs... 2005? Just Google "Omniscient"), I
found that I spent a massive amount of time creating lists and adding
things to them (almost never removing anything). I forget, but it was
90% of my CPU or more.
I studied the usage of lists and found a distinct pattern. 99.9% of
lists in situation 1 contained exactly 1 element, and 99% of lists in
situation 2 had 1 or 3 elements. I wrote two specialized list
implementations and my program ran 5x faster.
Why couldn't the computer have done that for me?
I can easily imagine a static optimizer. I'd write the program, run it
on a "representive" test case in "analysis" mode, and an analyzer
would post-process the results and cache suggestions for which list
implementations to use where. On the next run, my program would use
those suggestions and it would run faster.
[If we limit our choices to LinkedLists and ArrayLists, then the
decision is pretty simple. Allowing the machine to create new and
innovative implementions ala the Omniscient Debugger would be so much
more fun, however.]
But why do it statically?
Why not just let the program run and refine the data structures being
used dynamically?
The Sun Hot Spot dynamic complier does something pretty much like this
for compiling byte code to native instructions, including backing out
previously compiled code and recompiling, when required.
So it's definitely possible.
Would it be worth it?
-Bil