Parallel Data Structures
In my first blog for Dr. Dobb's, I thought I'd forgo the standard introductory essay. If you don't know who I am, I'm easy to find online. (I just did a Google search on my name; my writing over at the Intel Software Network site, my book, and a video clip I did way back in 2008 all came up in the top five hits. That's me, professionally, at least.) I do work for Intel, but this will not be the place for me to play corporate shill. That's my day job.
My plan for this blog is to investigate parallel coding. Some algorithms will be simple and direct, others not so simple. I might start with something that I included in The Art of Concurrency and re-examine it in light of some new programming model, or attempt to redesign it in a way I hadn't thought of at the time, or maybe investigate an approach that I didn't have space to include. I can guarantee right now that not all of the parallel implementations I present will be the best, but that might be the point I'm trying to make. I will also try to mix up the parallel libraries used. If you stick with me long enough, you may notice that Intel programming models show up in a majority of the code presented. That's only because these are the ones I'm most familiar with, not because I'm trying to push an agenda here.
For this first entry, I don't have any code to share. Instead, I figured I might try a thought experiment (which I shall be wont to do from time to time). So, today's point to ponder: Are there any truly parallel data structures?
I was brushing up my algorithmic complexity analysis skills by reading A Programmer's Companion to Algorithm Analysis by Ernst L. Leiss. While working through some tree and graph algorithm examples, I naturally thought about how I would make those algorithms parallel. In doing that, I also needed to consider how to make access to these kinds of shared data structures safe. Reading nodes in a graph or tree that has already been constructed is one thing, but adding nodes to a binary search tree or rebalancing an AVL tree while other threads are accessing nodes in the same structure can quickly become a complex programming task. Assuming you'd like to have some performance benefit from parallel execution, that is.
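To make the hazard concrete, here's a minimal sketch in C++ (my own illustration, not code from Leiss's book or mine). The insert routine is perfectly fine serially; the comments mark the exact window where two unsynchronized threads can step on each other.

```cpp
#include <cassert>

// A plain binary-search-tree insert, written with serial execution in mind.
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

void insert(Node*& root, int key) {
    if (root == nullptr) {     // Threads A and B can both observe null here...
        root = new Node(key);  // ...and both allocate; one insertion is silently lost.
        return;
    }
    if (key < root->key)
        insert(root->left, key);
    else if (key > root->key)
        insert(root->right, key);
    // duplicate keys ignored
}

bool contains(const Node* root, int key) {
    while (root != nullptr) {
        if (key == root->key) return true;
        root = (key < root->key) ? root->left : root->right;
    }
    return false;
}
```

And that's just insertion into an unbalanced tree; an AVL rebalance rotates pointers several levels up from the insertion point, widening the window in which concurrent readers can observe a half-rotated subtree.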
This got me thinking about how the design of these structures (and all other data structures I'm aware of) relies on serial execution. You can't blame the Computer Science pioneers who devised and developed data structures and the algorithms that we use to incorporate them into our applications. Sequential execution has been the standard model for computer algorithms and patterns since the beginning of modern programming (and even before that if you include the Turing Machine model).
For our parallel usage, we can take a "standard" data structure and add some synchronization primitives to make it thread safe. It's a necessary evil, but such synchronization often becomes a performance bottleneck. If our programming paradigm is shifting to concurrency and parallelism, shouldn't there be some data structure that is designed to be inherently parallel? I'm looking for something that can allow multiple threads to safely access the structure, read some of the contents, overwrite or insert others, handle conflicts, and do it all correctly without external synchronization.
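That "necessary evil" looks something like the following sketch (my own illustration, not any particular library's API): a perfectly ordinary container with a lock bolted on. Every operation funnels through the one mutex, so threads serialize at exactly the point we had hoped to parallelize.

```cpp
#include <mutex>
#include <stack>
#include <utility>

// A "standard" data structure made thread safe the obvious way: wrap each
// operation in a lock. Correct, but every caller contends for mtx_.
template <typename T>
class LockedStack {
public:
    void push(T value) {
        std::lock_guard<std::mutex> guard(mtx_);
        data_.push(std::move(value));
    }

    // Returns false if the stack was empty; otherwise pops the top into out.
    bool pop(T& out) {
        std::lock_guard<std::mutex> guard(mtx_);
        if (data_.empty()) return false;
        out = std::move(data_.top());
        data_.pop();
        return true;
    }

private:
    std::mutex mtx_;       // the bottleneck: one lock guards everything
    std::stack<T> data_;
};
```

With enough threads, the time spent queuing on that one mutex can swamp the time spent doing useful work inside it.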
Does the serial nature of our memory subsystems restrict how we approach the design of data structures? Maybe the best we can do is to have some form of lightweight synchronization built into the structure and then utilized by the algorithms governing access and modifications to that structure. It's not the pinnacle of what I would hope for, but it might be the best we can do.
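One existing flavor of that built-in, lightweight synchronization is a structure whose linkage itself is atomic, with updates retried via compare-and-swap rather than guarded by a lock. A minimal sketch (again my own illustration): a singly linked list whose head pointer is a `std::atomic`, supporting lock-free insertion at the front.

```cpp
#include <atomic>

// The synchronization lives inside the structure: the head pointer is
// atomic, and push_front loops on a compare-and-swap instead of locking.
struct LNode {
    int value;
    LNode* next;
};

struct LockFreeList {
    std::atomic<LNode*> head{nullptr};

    void push_front(int v) {
        LNode* node = new LNode{v, head.load(std::memory_order_relaxed)};
        // If another thread moved head between our load and the CAS,
        // compare_exchange_weak refreshes node->next and we try again.
        while (!head.compare_exchange_weak(node->next, node,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {
        }
    }

    bool contains(int v) const {
        for (LNode* p = head.load(std::memory_order_acquire); p; p = p->next)
            if (p->value == v) return true;
        return false;
    }
};
```

Note how little this sketch handles: insertion at the head only. Safe concurrent removal is famously harder (the ABA problem rears its head), which rather underlines the point that the synchronization still shapes, and limits, the structure's design.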
I wonder if our brains and thought processes will ever catch up to our software development tools. It might be that we're in a transition state, where everything we do is still going to start with a serial algorithm and then get translated to a parallel implementation down the road. We might not code the serial version directly, but the initial design could still have a basis in sequential execution or a linear ordering of steps. It's what we've been doing for over half a century and it's not going to change overnight. I think the same will be true of our data structures for the foreseeable future.
Will there ever be native parallel data structures, will we even need such beasts, or is this all just a pipe dream on my part? Who knows? I didn't have the grades to get accepted into Hogwarts Tech, so I don't have a crystal ball to foresee the resolution to technical questions. However, once a generation of programmers and computer scientists has been educated, and has practiced, exclusively in a world where "parallel programming" is simply "programming," parallel data structures may naturally emerge.