It's been five years since Dr. Dobb's published Herb Sutter's breakthrough article A Fundamental Turn Toward Concurrency in Software -- a call to action for programmers everywhere. To find out if we've made progress when it comes to parallelism, James Reinders recently talked with Herb Sutter about the impact of the now-famous "Free Lunch" article and where we're headed.
Reinders: I'm glad we could get together today and talk a little bit about parallelism. We've been seeing a lot of adoption of parallelism. If we think back five years or so, to when we started seeing multicore come on the scene, we've had about five years of multicore processors now, and we're still seeing new evolutions in tools and capabilities; we've seen a lot of activity. Start me off: what have you seen that fascinates you?
Sutter: How many days do we have? I think it's exciting that we're turning a really big ship right now -- the whole industry. Five years ago, when we did the article A Fundamental Turn Toward Concurrency in Software, we didn't realize it was an evolution like the ones we've gone through for GUIs and objects before, where we're going to retool the industry and it's going to take five to seven years to turn the ship. That train didn't start right away, but every major player, and new startups too, have been -- and still are -- bringing tools to market: new operating systems, languages, improvements, and libraries to help us program in a different way.
So it's interesting to see us partway through this turn, because we've got the supertanker that is still going strong, but it needs to go this way now and we're still partway through. I think it's interesting when people ask, "Is Threading Building Blocks (TBB) the answer?" or "Is Microsoft's PPL, the Parallel Patterns Library, the answer?" They're all good, and they're also Concurrency 1.0. We are just now beginning to bring that first set of real tools to market that mainstream programmers can reasonably adopt, to the point where people can learn them and use them in real-world code.
Reinders: I love your analogy of turning a ship, because when you wrote the "free lunch is over" article, a lot of the reaction at the time was, "Oh my goodness, we're switching to multicore? I didn't know that," or, "Oh my gosh, this is scary." Just getting people to know something was happening started turning the ship. What year was that? About five years ago.
Sutter: Yeah, that was around Christmas and New Year's 2005; that's five years now. Time flies when you're having fun.
Reinders: That article was extraordinarily popular very quickly because it was an issue that was obvious. It was important but it was also an issue that not a lot of people were thinking about. The ship continues to turn. You mentioned a number of things that are turning that ship a little bit and the ship is not done turning, as you said, so we're not done. Why don't we talk about a few of the things that are turning the ship? You mentioned TBB/PPL. These are built on Cilk-style scheduling or --
Sutter: Work stealing.
Reinders: Work stealing, task stealing.
Sutter: Stealing is good. It's good when it's legal.
Reinders: Part of what we're teaching people is that stealing is good, in this context. To me, though, the most fundamental thing is telling people to "program in tasks." Don't spend your time programming in threads. Don't spend your time trying to figure out, "Oh my gosh, I'm running on a quad-core. How do I divide my program into four parts?"
Sutter: Yeah, don't do that, please. Don't ask: How many cores do I have?
Reinders: How many parts should I divide a program into?
Sutter: As many as you can.
Reinders: That shocks people when you say that, right?
Sutter: Stop only when it hurts.
Reinders: Is there something we could show, or talk about, to explain why? You and I have the same faith that we should tell people, "Create as many tasks as you can." And people say, "But when is it too many?" The answer is, there is no such thing as too many if you do it right.
Sutter: Can I come back to that question?
Sutter: I think it's more important, first, to talk about what the landscape is, and then see that this is one particular part of it. It's so easy to say, "Okay, how can I get scalable parallelism?" which is obviously interesting. We obviously want to do this, and it's the biggest challenge that we've been addressing with the different products that we and other people have been putting out, but it's only one slice of the problem.
Reinders: That's good. I love to get right into the details but you're going to paint us a higher level picture first. I think that might be helpful.
Sutter: Anybody who has gone to concurrency talks lately, and especially mine, knows that I'll natter on about the Three Pillars of Concurrency, and other people have identified this too: David Callahan at Microsoft, myself, you, and others. We have three major things when we talk about concurrency and parallelism, and everything that we want to do in this space fits into them. Every tool you'll ever use and every program requirement you'll ever have in your application fits into these, and they all compose well, so all of them are going to surface in your application and in your tools. They just have to fit together.
The first is just doing work asynchronously. You want isolation and today we use threads. Tomorrow we'll use active objects/futures. So that is all about: How do I just do work independently? I want a background thread to get work off the GUI so I can do my print rendering and my background save without making the application freeze up and not process windowing messages anymore, which would be bad. Let's not talk about applications that still do that.
This is good on a single core. You always want to be able to do this kind of asynchronous, isolated, independent programming: just express pieces of work that don't depend on each other and that communicate through messages. So threads and messages today, with active objects and futures coming online now: those are the tools you use just to express independent work.
Then you talk about the free lunch. Now we're talking about scalability, and this is the free lunch right here, because you want to build applications that can use a hundred cores to get the answer faster -- hopefully something approaching a hundred times faster; that would be really nice.
Here we have used thread pools in the past -- and now we're getting closer to the question you were asking -- and here is where TBB, PPL, TPL, pick your favorite alphabet soup, fit in. How can I decompose my work into pieces, run them in parallel, and then join them again? Java's Fork/Join and all of these technologies play in this space.
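A minimal sketch of Pillar 1 in standard C++11 follows, assuming `std::async` as a stand-in for the futures Sutter mentions (it is not one of the products named in this interview). `save_document` and `start_background_save` are hypothetical names: the job works only on its own copy of the data and hands its result back through a future, so the calling thread stays free to keep processing messages.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a long-running, independent job such as a
// background save. It touches only its own copy of the data (isolation).
std::size_t save_document(std::vector<std::string> lines) {
    std::size_t bytes = 0;
    for (const auto& line : lines) bytes += line.size() + 1;  // +1 for '\n'
    return bytes;  // pretend this many bytes were written
}

// Run the save on another thread. The caller gets a future (a one-shot
// message carrying the result) and can keep handling GUI events meanwhile.
std::future<std::size_t> start_background_save(std::vector<std::string> lines) {
    return std::async(std::launch::async, save_document, std::move(lines));
}
```

A caller would typically check the future from its event loop, or call `get()` only once it actually needs the result.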
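A fork/join decomposition can be sketched in standard C++ as below. This is an illustration under stated assumptions, not TBB, PPL, or TPL code: `std::async` with `std::launch::async` starts a real thread per fork, which is exactly the heavyweight overhead a work-stealing scheduler is designed to avoid, and the `grain` parameter (default 10000) is an arbitrary illustrative cutoff.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Fork/join sum over v[lo, hi): fork the left half as a separate task,
// sum the right half on this thread, then join and combine the results.
// `grain` is an illustrative cutoff below which we just loop sequentially.
long long parallel_sum(const std::vector<int>& v, std::size_t lo, std::size_t hi,
                       std::size_t grain = 10000) {
    assert(grain > 0 && lo <= hi && hi <= v.size());
    if (hi - lo <= grain) {
        auto first = v.begin() + static_cast<std::ptrdiff_t>(lo);
        auto last = v.begin() + static_cast<std::ptrdiff_t>(hi);
        return std::accumulate(first, last, 0LL);
    }
    std::size_t mid = lo + (hi - lo) / 2;
    // Fork: the left half runs as its own task (here, on a new thread).
    auto left = std::async(std::launch::async,
                           [&v, lo, mid, grain] { return parallel_sum(v, lo, mid, grain); });
    // Meanwhile, this thread handles the right half.
    long long right = parallel_sum(v, mid, hi, grain);
    // Join: wait for the forked task and combine.
    return left.get() + right;
}
```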
Finally, once you've expressed these two different kinds -- we sometimes use the term "concurrency" for the first one and "parallelism" for the second one -- once you've done all this, how do you prevent yourself from falling over by corrupting shared state and having races and deadlocks? And that's where you have consistency. So you want to be race free, deadlock free, preferably by construction.
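As one small example of "deadlock free by construction", the sketch below uses C++17's `std::scoped_lock`, which acquires several mutexes at once using a deadlock-avoidance algorithm. The `Account` type and `transfer` function are hypothetical: the balances are only modified while both locks are held, and callers do not have to remember any lock-ordering convention.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical account: the mutex protects `balance`.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Moving money needs both locks. Taking them one at a time in argument
// order can deadlock if another thread transfers in the opposite direction;
// std::scoped_lock acquires both with deadlock avoidance, and the data is
// only touched while both are held, so there is no race either.
void transfer(Account& from, Account& to, long amount) {
    if (&from == &to) return;  // locking the same mutex twice is undefined
    std::scoped_lock lock(from.m, to.m);
    from.balance -= amount;
    to.balance += amount;
}
```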
Reinders: This third pillar applies regardless of which of the first two pillars you're using.
Sutter: Exactly. In fact, all of these compose. You may want to get work off the GUI thread and have a background worker, so you use Pillar 1 for that. You have some asynchronous work going on, and you send it messages when you have work that you want it to do independently. It may say: for this chunk of work, I can get it done using a hundred cores, by using Pillar 2 techniques on a work-stealing pool or a thread pool. And if any of these things share data, we had better know about locks and about lock-free programming, or whatever is appropriate. Synchronization is what this pillar is about.
So, having said that: we, the industry, have been doing a lot of work in Pillar 2 with Threading Building Blocks, PPL, and TPL. We've generally been doing less work in Pillars 1 and 3 because, frankly, we don't have much better than locks. We've been exploring transactional memory. We still hope that will pan out some day, and we want that to be true, because we have no better answer on the horizon right now than locks, with all their warts. We can still do better by associating data with locks and by having lock hierarchies. We're going to advance those idioms and deliver more tooling around them, but that's more incremental: very valuable, but incremental. So we still need to do that.
We still need to do more with active objects and futures and those kinds of abstractions for Pillar 1. We have shipped some things: the Agents Library that's in Visual Studio 2010 for native code fits into Pillar 1. It's about asynchronous, isolated, message-oriented work. But mostly we've been trying to build 1.0 tools for this middle pillar -- scalability -- because that is the one that was missing, where people couldn't really write code that could make use of a hundred cores to get the answer faster, which we really need. I mean, if you're going to have a hundred cores soon, you might as well be able to use them.
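The active-object idea can be sketched in a few dozen lines of standard C++. This is a hypothetical `ActiveObject` class for illustration, not the Visual Studio 2010 Agents Library API: a single private worker thread drains a queue of messages (closures), so any state those messages manipulate is only ever touched by that thread, and `ask` returns a future for a reply.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Minimal active object: one private worker thread runs queued messages
// (closures) in FIFO order. Callers only ever enqueue; they never run the
// work themselves.
class ActiveObject {
public:
    ActiveObject() : worker_([this] { run(); }) {}
    ~ActiveObject() {
        send([this] { done_ = true; });  // last message: stop the loop
        worker_.join();
    }
    ActiveObject(const ActiveObject&) = delete;
    ActiveObject& operator=(const ActiveObject&) = delete;

    // Fire-and-forget message.
    void send(std::function<void()> msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push(std::move(msg));
        }
        cv_.notify_one();
    }

    // Message that computes a value; the reply arrives through a future.
    template <class F>
    auto ask(F f) -> std::future<decltype(f())> {
        auto task = std::make_shared<std::packaged_task<decltype(f())()>>(std::move(f));
        auto reply = task->get_future();
        send([task] { (*task)(); });
        return reply;
    }

private:
    void run() {
        while (!done_) {
            std::function<void()> msg;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return !queue_.empty(); });
                msg = std::move(queue_.front());
                queue_.pop();
            }
            msg();  // always executes on the worker thread
        }
    }

    bool done_ = false;  // read and written only on the worker thread
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    std::thread worker_;  // declared last so it starts after the members above
};
```

Because the queue is FIFO, a reply obtained through `ask` reflects every message sent before it.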
Reinders: Yeah. As you've said, we've seen a lot of activity on Pillar 2 recently. Pillars 1 and 3 have been going on for some time before that. My first PC had DOS on it, and DOS didn't understand multiuser concurrency and so forth, so it wasn't dealing with any of these pillars. I think we've seen our operating systems and our software work on Pillars 1 and 2 -- Pillar 1 being just the ability to run multiple apps at a time, making apps play with each other, or virtual memory on the system so that address 10 wasn't the same in both programs.
Reinders: We take things like that for granted, but you're really talking about concurrency at a very broad level here. The first couple of pillars have been important in getting computers to where we know and love them, even before multicore. But once multicore processors show up, Pillar 2 has to get fleshed out, and I think that's why we're seeing a lot of activity in that area. There is a lot of work going on in all three pillars; I just think Pillar 2 has gotten a little more of the limelight.
Sutter: And that's exactly the way it should be, because this is the one that was missing. The whole point of "the free lunch is over" is that I can't just ship a binary anymore and rely on single-threaded execution getting faster and faster on new processors. Thank you very much. It still is getting faster, but not nearly at the rate it used to, so we're still seeing incremental improvements.
Thank you, by the way; I appreciate that very much. To get the free lunch back for the next decade, to ship an application that is going to get faster on tomorrow's hardware and faster still on next year's hardware, I have to be able to ship an app that's scalable, that has lots of latent concurrency -- juicy latent concurrency -- that I can spread across a machine with lots of cores. Otherwise I won't be able to ship my app today and still have people buy a new machine three years from now and have a wonderful experience where, "Look, it's faster again," which is what we want.
And it's not just to sell machines. There are a lot of compute-intensive problems. Even if we had never gone to multicore and manycore, we would still be interested in compute-intensive work and in software that today can't deliver answers fast enough and needs to. So even if we hadn't made the turn to multicore, we'd still be improving the performance of code through single-threaded means. That demand hasn't gone away. It's just that now, for practical reasons, we need to turn the ship and keep doing that in this other way. So we've changed the tools a little bit, and we're all going to have to go through the process of learning the new tools, using them well, and developing multiple major releases of them.
Reinders: Yes, we do. We have that ahead of us.
Sutter: What do you think is missing when you look at this? As an industry, not just Microsoft or Intel, what great tool X would you like us to deliver two to four years from now, that we don't have as well as we'd like today, that would really help people?
Reinders: The two areas I think about: one is feeling like you have a stable development environment. I feel like you want to make things as simple as possible, but no simpler, as the phrase goes. Simple to me, as a developer, is when I'm running my program and I'm in a debugger -- let's take something really easy -- and I stop the program at a certain point and the variables are not doing what I want. Obviously, the program is not working at this point. Now, what I really want to do is back up. I've always wanted to back up.
Sutter: I always wanted time machines.
Reinders: Right. But what we do is say, I'll stick a printf here, or I'll set a breakpoint earlier. And then what do I do? I run the program again. That has worked so well for me for decades. It doesn't work so well today on a multicore, because often what I'm debugging is a program that's nondeterministic. It's not running the same way each time. Because of this concurrency, so many things are in flight that if I rerun the program and just try to stop it a little earlier, I'm not debugging the same problem. I can't guarantee that if I ran it forward to the point I was at before, I'd see the same errors. So all my notes saying "when I get to this point the data structure is going to be corrupt, so I'll stop earlier" are useless: it's not there.
So the first thing I dream of is: how can we get some sort of rollback, a time machine, as you said? There are interesting innovations going on in this space, certainly. But I think the first thing that's very disturbing about doing parallel programs is that when you run into errors you can't control or get rid of, you have to debug them, and for that you need a predictable debug environment.
So I think we're going to see a lot of innovations in this space and a lot of different things out there. We're going to have to learn them for a while, and then I suspect at some point we'll all just wake up and say, "Okay, now, as an industry, we understand how to make this feel stable again." In the interim, it's a little scary.
The other thing is any tool that can help people imagine how to decompose their program in parallel -- and that's just a big topic. Notice, I didn't say "tools to automatically extract the parallelism." There are a lot of algorithmic transforms and a lot of insights that people will get, and I still believe developers are the key to this. Algorithmic developments always trump hardware or software developments: you look at the problem differently than anyone has before, and you can do it faster or better. I think there are enormous opportunities for developing algorithms that take advantage of parallelism and solve problems -- maybe problems we've solved before, or problems we've never solved -- in parallel and with surprising results. We're seeing some of this with the Internet.
Sutter: But the Internet is not deterministic.
Reinders: Yeah, exactly. Take the things our phones can do now: if you had asked me to design a device like that 10 years ago, I would have said you've got to have an enormous database and put it all on the phone. When you query restaurants, it's not like that. It solves the problem a different way: the database is somewhere else. If that makes any sense, parallelism is a different dimension to explore with our algorithms and our applications. I have a feeling there are tools to help in that space that I haven't seen yet, not in a big way, and I think there is a lot of opportunity for things that will help us explore with that imagination: innovative toolkits. They'll be building blocks, things that we build upon to construct this new world.
Sutter: It's interesting. The first part of your answer, in particular, focused on the third pillar, which is consistency.
Reinders: Yes. Nasty, nasty pillar.
Sutter: Yeah, so deterministic re-execution of your program so you can debug it. In the perfect world, we just wouldn't write races, just don't do that.
Reinders: Please don't.
Sutter: By the way, don't write deadlocks, too, while you're at it. The reality today is that we will make mistakes, so how do we get tools that let us detect race conditions and fix them, that give us the information we need to know what to do, and that let us do things like replay for debugging and re-execute things as we need to?
Having said that -- that's on the tool side -- we can still do a lot just by providing, in libraries, what people can already write by hand today: a way to associate which data is protected by which mutex, so that, even statically or dynamically, just in library code, you can enforce that. We can do that today. People can write it by hand today. We should actually standardize on something like that. It seems like the kind of useful thing you would expect to find in a library.
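What "associate the data with the mutex that protects it" might look like as a library type is sketched below. `Synchronized` and `with_lock` are hypothetical names chosen for illustration, not a standard C++ facility: the protected value is private, and the only way to reach it is through a callback that runs while the mutex is held.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical wrapper: the value lives next to its mutex and can only be
// reached through with_lock(), so it cannot be touched without the lock.
template <class T>
class Synchronized {
public:
    template <class... Args>
    explicit Synchronized(Args&&... args) : value_(std::forward<Args>(args)...) {}

    // Run f(value) while holding the lock. The result is returned by value
    // (plain auto), so a reference to the data cannot escape the lock.
    template <class F>
    auto with_lock(F&& f) {
        std::lock_guard<std::mutex> lock(m_);
        return std::forward<F>(f)(value_);
    }

private:
    std::mutex m_;
    T value_;
};
```

Returning by value keeps a reference to the protected data from leaking in the common case, though a determined caller could still smuggle out a pointer; this is a convention enforced by the type, not a proof.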
Reinders: I agree. You teach that in your concurrency classes. What do you call them? "Success with concurrency."
Sutter: Effective concurrency.
Reinders: Effective concurrency, same thing.
Sutter: Hopefully, it's success, yes.
Reinders: What fascinates me about that is you're not talking about changing a standard necessarily or this or that. It's not rocket science. It's a simple coding method that says, make the lock and the data associated and make it a coding standard that you can understand, you can test against and that will step you forward. It will greatly reduce the number of times you have this sort of problem. This problem is worth avoiding.
Sutter: If we think that all of our customers should be solving this problem, why don't we solve it for them, or at least provide an implementation? Instead of, "Mr. Customer, we know we've been telling you to write this all yourself," we decide we should actually write this for you, and you can use it if you like, or use your own if you like. But at least here is a standard thing that people ought to be able to expect.
I think it's a travesty when I look at our industry as a whole -- every language: Java, .NET, native code on Windows, Linux, other platforms -- all of them have concurrency libraries. Not a single major mainstream language or library that I know of, ignoring niche ones, provides, alongside its threads and locks and thread pools, anything resembling a way to associate data with mutexes, or a way to say, "Here is how you express a lock hierarchy." It's time. Let's do that. That will be part of the next wave.
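A lock hierarchy can likewise be enforced at run time with a small wrapper. The sketch below is a hypothetical `HierarchicalMutex`, an illustration of the general technique rather than a standard type: each mutex has a numeric level, a thread may only lock a mutex whose level is lower than the lowest it currently holds, and a violation throws immediately instead of deadlocking later. It assumes locks are released in reverse order of acquisition, and it needs C++17 for the inline static member.

```cpp
#include <cassert>
#include <climits>
#include <mutex>
#include <stdexcept>

// Hypothetical lock-hierarchy mutex. If every thread acquires locks in
// strictly decreasing level order, no cycle of waiting threads (and hence
// no deadlock) can form; out-of-order acquisition is reported at once.
class HierarchicalMutex {
public:
    explicit HierarchicalMutex(unsigned level) : level_(level) {}

    void lock() {
        if (level_ >= current_level_)
            throw std::logic_error("lock hierarchy violated");
        m_.lock();
        previous_level_ = current_level_;  // written only while m_ is held
        current_level_ = level_;
    }

    void unlock() {
        current_level_ = previous_level_;  // assumes LIFO unlock order
        m_.unlock();
    }

private:
    std::mutex m_;
    const unsigned level_;
    unsigned previous_level_ = 0;
    // Lowest level held by the calling thread; UINT_MAX means "none held".
    inline static thread_local unsigned current_level_ = UINT_MAX;
};
```

It works with `std::lock_guard`, since it provides `lock()` and `unlock()`: locking a level-100 mutex and then a level-50 one succeeds, while the reverse order throws `std::logic_error`.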
You were asking: What about the engineering and performance going on here? I'm going to actually use a picture that you drew in your Threading Building Blocks book, if I can.
Reinders: Certainly. I'm flattered.
Sutter: Say that on one axis you look at the overhead it costs us just to express concurrency or parallelism in our code -- I'll explain that in a minute -- and on the other you look at the chunk size, or granularity.
The idea is this: I have to decompose work. This is now Pillar 2. I want to be scalable, so I want to decompose a problem into a hundred pieces or more so I can run them on a hundred cores, get the answer faster, and then merge the results. It's fork, run in parallel, and join. The fork part is always interesting, because of the question of how many pieces I should make. For example, if you're doing a recursive algorithm like QuickSort, you'll say, okay, partition: put the smaller things on one side and the bigger things on the other, and then recurse, QuickSorting the left subrange and the right subrange. Those things you should be able to do in parallel.
The question is: Do you just blindly do them in parallel? What if the subrange is size two? If 'do it in parallel' means ship it over there on to the neighbor's thread pool and ship the answer back, I mean --
Reinders: Just to possibly flip --
Sutter: Flip two elements, possibly, because maybe they're already in order.
Sutter: For a single-element range, it will work, but you won't be happy with the performance. It's just like chunk size. If you take the chunk size down to the extreme where you have, say, an integer addition (int + int), you are not going to ship that over to a thread pool and get the answer back. It's just not going to make any sense.
Reinders: Right. You want to have a vector to add together, or a larger list to do a [inaudible] on.
Sutter: Yes. Clearly, as the chunk size gets smaller and smaller, the overheads start to get astronomical. If I did every plus as its own thread pool work item, clearly I'd be spending most of my time in overhead and not getting the answer faster, which is silly.
You might say, then, that for that reason larger chunks are good, except the problem is that you end up with a picture like this, because if I have, say, an eight-core machine and I make five big chunks, I'm clearly not going to scale to make use of the whole machine. So I need at least eight chunks. But what if I divide my work into 12 chunks on an eight-core machine? Even assuming that each takes exactly the same execution time, which we can never really assume, you'll run eight in parallel, and then half the machine will be idle while you finish the other four, assuming you have the whole machine. That's where you're in this part of the range and you start having ragged ends. Let's use colors. This is the "ragged tail": my work didn't divide evenly, so some processors are finished and others are still going, but there is no other work for the finished ones to do. This is the "not enough to scale" region.
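The ragged-tail arithmetic can be made concrete with an idealized model, assuming equal-sized chunks and no scheduling overhead. With `chunks` pieces on `cores` processors the work takes ceil(chunks / cores) rounds, so 12 chunks on 8 cores take 2 rounds and keep the machine only 12/16 = 75% busy, while 5 chunks use just 5/8 of it. The function name `equal_chunks` is hypothetical.

```cpp
#include <cassert>

// Idealized model of the "ragged tail": equal-sized chunks, no overhead.
struct Schedule {
    int rounds;          // chunk-durations until the last chunk finishes
    double utilization;  // fraction of core-time doing useful work
};

Schedule equal_chunks(int chunks, int cores) {
    assert(chunks > 0 && cores > 0);
    int rounds = (chunks + cores - 1) / cores;  // ceil(chunks / cores)
    double utilization = static_cast<double>(chunks) / (cores * rounds);
    return {rounds, utilization};
}
```

Raising the count to 100 chunks on the same 8 cores gives 13 rounds and 100/104, about 96% utilization: exposing more, smaller pieces smooths out the tail.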
Reinders: So you can add all the cores you want but we didn't divide it up into enough chunks for there to be work to do.
Sutter: What we often do is we say, Look, the nice thing is to pick your chunk size. There is this nice big sweet spot and you can pick anything in there. I think that's not the greatest answer. I think what the answer needs to be is you always want your chunks to be as small as possible. The reason is the more chunks you have the more juicy latent concurrency you're exposing in your application.
Let's say that recursive QuickSort went all the way down, without a cutoff, to single-element QuickSorts. You have now expressed all this juicy latent concurrency, which, if there were zero overhead to doing that, would let you scale to the largest machines. You could scale further into the future on newer machines than you could with a smaller number of chunks, and you would scale better, because you'd get less raggedness at the end; everything would be much smoother. The only reason we don't try to go to zero is overhead. If you look at where thread pools were -- maybe red is a good color for thread pools -- the thread pool curve is kind of like this.
Reinders: A little bit high on the overhead.
Sutter: Exactly, because you generally want to have about 10K instructions' worth of stuff to do, and try explaining that to somebody. How do I know if it's 10K instructions? You have to have at least this much work to make it worth running somewhere else.
Reinders: In instructions you're talking assembly language instructions --
Sutter: Right, which of course we all know what that is. But Intel TBB, PPL, Cilk -- all of these technologies -- have been driving this cost down. This is the cost of: if I didn't do a cutoff and I'm trying to do something that's too small, how small is too small? If I say, for instance, in my parallel QuickSort, "do the left subrange in parallel, do the right subrange in parallel," and I end up not actually doing it in parallel -- I end up doing it synchronously, because all of these fall back to synchronous execution if it's not worth doing in parallel -- how much did it cost me to say that I could? There is a data structure and other setup to say, "This is now ready to run in parallel, if you can; if you can, do the work stealing." So how much did it cost me to say it could run in parallel, if I end up not using that?
The answer today, in our current products at least -- say PPL and TPL -- is on the order of a few empty function calls. If you think of the overhead of just calling a function, times three or four, that is way, way lower than the cost of shipping something off to be guaranteed to run on another thread, which means at least a context switch, even on the same core, and then shipping the answer back.
Instead of tens of thousands of cycles, we're now talking about something on the order of a few empty function calls' worth. So make your chunks as small as possible, but not so small that the overheads dominate. That tends to be a rule of thumb that people can understand and grok better.
Reinders: I think what you're talking about is easier to teach. I really like this. I think the key thing here is that programmers are programming on top of a method that uses task stealing. Even though we've mentioned it a bunch of times, the application programmer doesn't need to know what the heck task stealing is all about. By using something like PPL, TBB, or Cilk and sitting on top of it, they let those technologies drive down the overheads, and then we're able to give the simple answer: create as many tasks as you can. Because the task stealing doesn't have to happen if there isn't available hardware concurrency, the overhead can be lower.
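The rule of thumb can be turned into back-of-the-envelope arithmetic. The cost model below is an assumption for illustration: each task pays a fixed `overhead` on top of its useful `work`, in the same units, so the share of time lost is overhead / (work + overhead). Requiring that share to be at most 10% with the 10,000-instruction thread-pool overhead Sutter cites means chunks of at least 90,000 instructions; with an assumed overhead of 30 instructions, standing in for "a few empty function calls", chunks of about 270 instructions already suffice. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Illustrative cost model (an assumption, not a measurement): each task pays
// a fixed `overhead` on top of its `work`, both in the same units.
double overhead_share(double work, double overhead) {
    return overhead / (work + overhead);
}

// Smallest chunk of work that keeps the overhead share at or below
// `max_share`: solve overhead / (w + overhead) <= max_share for w.
double min_chunk(double overhead, double max_share) {
    assert(overhead >= 0 && max_share > 0 && max_share < 1);
    return overhead * (1.0 - max_share) / max_share;
}
```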
I really like your rule of thumb: Make sure that the smallest amount of work you're asking to do in parallel takes about the same amount of overhead as a few function calls. I think that's more understandable than 10,000 assembly language instructions for most of us today.
Sutter: The nice thing about it is that if you do that, you get to take advantage of the fact that costs are going down over here, and future releases will all keep trying to engineer those costs down. We can asymptotically approach zero, though we'll never quite get there, but we can keep driving that cost down.
Besides all the other reasons you don't want to be over here -- the ragged tail, not enough scalability -- there is no engineering we can do over here, because these effects come from my chunks being too coarse. That's going to happen no matter what.
Reinders: Right. Your program would be frozen in time and would not keep up with the software or the hardware of the future. Programming using these methods means that software advances that lower the overhead will benefit your application without your changing it, and more hardware cores will benefit you too. Yeah, the advice is pretty simple: expose as much parallelism as possible. Create as many tasks as possible.
Sutter: The nice thing about this is this part of the curve is fundamental. It's just inherent in the problem. But this part of the curve is under our engineering control and if people just express the concurrency we can drive these costs down. Now, we may not be to the point where people can just ignore it and just go QuickSort down to nothing. But you don't do that for sequential code anyway, even in plain Jane QuickSort.
Today, the reality is that anybody who writes their own QuickSort generally forgets the cutoff. Even when the range is tiny, they QuickSort it anyway and go all the way down to QuickSorting a single element or two.
Reinders: It still works.
Sutter: It still works. Commercial implementations know better. They say, We'll QuickSort until you get down to a threshold and below that threshold we'll do bubble sort because bubble sort is evil and smelly and doesn't scale but it's faster than QuickSort.
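Sutter's description of commercial sorts can be sketched as a sequential hybrid: QuickSort until the range drops below a threshold, then hand off to bubble sort, as he describes. The threshold of 16 is an arbitrary illustrative value and the function names are hypothetical; real standard-library sorts tune their own cutoff and typically use insertion sort rather than bubble sort for the small ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative cutoff; real implementations tune this value.
constexpr std::ptrdiff_t kSmall = 16;

// Bubble sort on v[lo, hi): quadratic, but cheap for tiny ranges.
void bubble_sort(std::vector<int>& v, std::ptrdiff_t lo, std::ptrdiff_t hi) {
    for (std::ptrdiff_t end = hi; end - lo > 1; --end)
        for (std::ptrdiff_t i = lo + 1; i < end; ++i)
            if (v[i - 1] > v[i]) std::swap(v[i - 1], v[i]);
}

// Lomuto partition of v[lo, hi) around the last element; returns the
// pivot's final index. Smaller elements end up to its left.
std::ptrdiff_t lomuto_partition(std::vector<int>& v, std::ptrdiff_t lo, std::ptrdiff_t hi) {
    int pivot = v[hi - 1];
    std::ptrdiff_t store = lo;
    for (std::ptrdiff_t i = lo; i < hi - 1; ++i)
        if (v[i] < pivot) std::swap(v[i], v[store++]);
    std::swap(v[store], v[hi - 1]);
    return store;
}

// QuickSort with a cutoff: below kSmall elements, switch to bubble sort.
void hybrid_quicksort(std::vector<int>& v, std::ptrdiff_t lo, std::ptrdiff_t hi) {
    if (hi - lo <= kSmall) {
        bubble_sort(v, lo, hi);
        return;
    }
    std::ptrdiff_t p = lomuto_partition(v, lo, hi);
    hybrid_quicksort(v, lo, p);      // left subrange
    hybrid_quicksort(v, p + 1, hi);  // right subrange
}
```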
Reinders: Right, for smaller.
Sutter: For smaller data sizes. They know to do that, and it's the same thing here. If you think of, say, a recursive parallel QuickSort, you do the same thing: instead of always forking the left and right subranges, you say that if they're too small to be worth running in parallel, run them sequentially, and maybe switch to bubble sort at the same time. So it's roughly the same idea as we already use today in high-quality implementations, even of sequential algorithms.
Reinders: I think this is catching on -- this is turning the ship: engineering applications to create a lot of tasks. I think these tools -- TBB, PPL, Cilk -- are coming into play, and they all encourage this. So I see work going on there. What else is turning the ship?
Sutter: Let me ask you a question. What do you think about functional languages? Are they going to take over the world? I have an answer, but I want to hear yours.
Reinders: I don't think so, and what's more disturbing to me is that I couldn't tell you why. Maybe you can tell me: why aren't they going to take over? They've been around a long time. They solve a lot of these problems. They solve issues that we're going to be battling for ages with non-functional languages, yet they don't take over. Now, yes, there are lots of fans of Erlang, Haskell, and F#.
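The parallel version of the same idea might look like the sketch below: fork the subranges only while they are large enough to be worth it, and below an illustrative cutoff of 2048 elements run them sequentially. Assumptions: `std::async` stands in for a TBB, PPL, or Cilk task (it starts a real thread per fork, so it is far costlier than a work-stealing spawn), `std::sort` stands in for the sequential small-range sort, and the pivot is simply the middle element. `parallel_quicksort` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

using It = std::vector<int>::iterator;

// Illustrative cutoff: ranges smaller than this are not worth forking.
constexpr std::ptrdiff_t kParallelCutoff = 2048;

void parallel_quicksort(It first, It last) {
    if (last - first < kParallelCutoff) {
        std::sort(first, last);  // too small to fork: run sequentially
        return;
    }
    int pivot = *(first + (last - first) / 2);
    // Three-way split: [first, mid1) < pivot, [mid1, mid2) == pivot,
    // [mid2, last) > pivot. The equal block is non-empty, so both sides shrink.
    It mid1 = std::partition(first, last, [pivot](int x) { return x < pivot; });
    It mid2 = std::partition(mid1, last, [pivot](int x) { return !(pivot < x); });
    // Fork the left subrange, sort the right one on this thread, then join.
    auto left = std::async(std::launch::async, parallel_quicksort, first, mid1);
    parallel_quicksort(mid2, last);
    left.get();
}
```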
Sutter: And rightly so.
Reinders: Absolutely, and they definitely do some great things. What I can't explain is why they aren't more popular. I don't think they will be, because they've been around so long; whatever it is, unless somebody can explain to me what's wrong and why they aren't taking off more, and then we fix it, I don't see a fix. I don't see a change.
Sutter: I think my answer is the same as yours. I actually have two schizophrenic answers, because on the one hand I think if they were going to take over the world it would have happened already, and this switch to parallelism favors some of the advantages functional languages have: if they're ever going to take off, this is the time.
Reinders: Right. You would think so.
Sutter: Having said that, fundamentally, it turns out that most people who have studied computer science can think that way. However, when it comes to writing production code for mainstream programmers and large-scale systems, outside domains like telecommunications that naturally lend themselves to it -- use the right tool for the job -- most mainstream programmers don't get functional languages. They find it harder to think and reason about things that way, for whatever reason.
Reinders: I agree. That's the reality. I don't know why.
Sutter: Cookbook imperative is easier.
Reinders: So that's one of your answers.
Sutter: The schizophrenic answer is on the one hand I agree. I doubt that F# and Haskell will take over the world, as nice as they are. They will definitely take over segments of the world.
Sutter: But I don't think they're going to take over where C left off -- or where Java or C# or C++ left off. Having said that, they already have taken over the world. Functional languages really already have taken over the world: if you look at the best parts of functional languages, the ones that make them most suited to parallelism, they have already been silently adopted by the mainstream imperative languages.
Reinders: It's an old story. Invent a better mousetrap, and we'll take the features out of it and put them in what we've always been doing.
Sutter: It turns out that you can have that cheese without having it be in that mousetrap. You can put it in this mousetrap, as well. The advantages are not unique to functional languages.
Let me talk about two of them. One of them is immutable data. Everybody gets that this is good; look at functional languages. The pure ones have only immutable data. Any time you change an object, you need to make a copy of it.
The bad news is you aren't going to do that every time you modify the millionth element in a two-million-element vector. You are not going to copy the whole thing. There are some interesting ways around that, adding a bit of complexity under the covers to avoid those expensive copies.
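One of those ways around it is structural sharing. The C++ sketch below illustrates the idea under stated assumptions and is not something described in the interview: a hypothetical ChunkedImmutableVector stores its elements in fixed-size chunks held through shared_ptr-to-const, so producing a new version with one element changed copies a single 1,024-element chunk plus the table of chunk pointers, rather than all two million elements. Production persistent vectors (Clojure's, for example) typically use a shallow tree instead of a flat table, so that even the pointer table is not copied.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical persistent ("immutable") vector sketch. Elements live in
// fixed-size chunks that are shared between versions; updating one element
// copies only the chunk containing it.
class ChunkedImmutableVector {
public:
    static constexpr std::size_t kChunkSize = 1024;
    using Chunk = std::array<int, kChunkSize>;

    explicit ChunkedImmutableVector(std::size_t n, int value = 0) : size_(n) {
        auto chunk = std::make_shared<Chunk>();
        chunk->fill(value);
        // Initially every slot points at the same shared, never-mutated chunk.
        chunks_.assign((n + kChunkSize - 1) / kChunkSize, chunk);
    }

    std::size_t size() const { return size_; }

    int operator[](std::size_t i) const {
        return (*chunks_[i / kChunkSize])[i % kChunkSize];
    }

    // Returns a new version with element i replaced; *this is left unchanged.
    ChunkedImmutableVector with(std::size_t i, int value) const {
        ChunkedImmutableVector next = *this;  // copies chunk pointers, not elements
        auto copy = std::make_shared<Chunk>(*chunks_[i / kChunkSize]);  // copy one chunk
        (*copy)[i % kChunkSize] = value;
        next.chunks_[i / kChunkSize] = std::move(copy);
        return next;
    }

    // True if both versions share the chunk that holds element i.
    bool shares_chunk_with(const ChunkedImmutableVector& other, std::size_t i) const {
        return chunks_[i / kChunkSize] == other.chunks_[i / kChunkSize];
    }

private:
    std::size_t size_;
    std::vector<std::shared_ptr<const Chunk>> chunks_;
};
```

For example, with v constructed as ChunkedImmutableVector(2000000, 7), v.with(999999, 42) returns a new vector; v still reads 7 at that index, and the two versions share every chunk except one. Because no chunk is ever modified after it is shared, readers on other threads never see a half-updated version.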
But apart from that, there are two kinds of functional languages (this is the sound bite): there are the "pure functional" languages, where all data is immutable, and then there are the ones people actually use.
Reinders: The ones that have performance.
Sutter: Right, because you cannot live in the pure world for any real code. However, mainstream languages have had immutable data for a long time. Take some obvious examples: in Java and C#/.NET, strings are immutable. You can never change a string once you construct it; you always get a new one. And that same pattern has been played out by people writing concurrency-friendly libraries, because you never need to synchronize state that doesn't change. So we're already using that, not to the degree functional languages do, but to an effective degree.
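The same pattern carries over to C++. A minimal sketch, assuming a hypothetical ImmutableString alias for std::shared_ptr<const std::string>: because the string behind the pointer is const, several threads can read it at once with no mutex, and "changing" it means building a new string. The helper names make_immutable, append, and count_in_parallel are illustrative only.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical alias: a shared, read-only string.
using ImmutableString = std::shared_ptr<const std::string>;

ImmutableString make_immutable(std::string s) {
    // The non-const pointer is converted to pointer-to-const immediately,
    // so no writable alias escapes.
    return std::make_shared<std::string>(std::move(s));
}

// "Modifying" returns a brand-new string; the original is untouched.
ImmutableString append(const ImmutableString& s, const std::string& suffix) {
    return make_immutable(*s + suffix);
}

// Every thread reads the same shared string without any lock, because
// nobody ever writes to it. Each thread writes only its own result slot.
std::vector<std::size_t> count_in_parallel(const ImmutableString& s, char c, int threads) {
    std::vector<std::size_t> counts(static_cast<std::size_t>(threads), 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counts, s, c, t] {
            std::size_t n = 0;
            for (char ch : *s)
                if (ch == c) ++n;
            counts[t] = n;
        });
    }
    for (auto& w : workers) w.join();
    return counts;
}
```

The only shared mutable state here is the reference count inside shared_ptr, which the standard library already updates atomically; the string contents need no synchronization at all.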
Reinders: But enough to lower their advantage, at least on this one.
Sutter: Not to lower it but to borrow it, to use it as well.
Reinders: Right. But jumping to a functional language is worth less on this front, because the benefit is already available in what we have now.
Sutter: But the big one is lambda functions and closures.
Reinders: Lambda functions. You know I'm a fan of lambda functions.
Sutter: I mean, your compiler ships it. Our compiler, which just went into escrow, is going to ship it in the coming months; it's already out in the release candidates and betas. GCC is adding it. It's in the C++ standard. And that's just C++.
Reinders: I don't think we've seen any feature go faster than lambdas from "Oh, we want this" to "Put it in a compiler."
Sutter: But it's not just C++, in the standard and in the compilers. C# has been adding lambdas incrementally, so now they're full-blown in the current version of C#, and you can see them in other languages like Visual Basic as well. Of course, F# already knows about these things.
Reinders: And LISP had them for a while.
Sutter: Something like that in LISP. That might be true; I forget. That was so long ago. Of course, that's where a lot of this stuff came from. But even Java has had an on-again, off-again romance with adding lambdas for Java 7. Last I heard, playing along at home with the scoreboard, it seems to me that they will add them, which, frankly, I think is a good thing for Java. They really ought to, because it makes it so much easier to speak about a piece of code as a first-class object.
Now, funny, in C++ we could already do that. We had functors but you had to write them out of line. So it's interesting because lambda functions in C++ are only -- only, only -- syntactic sugar for writing a functor by hand.
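To make the "only syntactic sugar" point concrete, here is a small sketch (the GreaterThan class and the two counting functions are made up for illustration) that counts elements above a threshold twice: once with a hand-written functor defined out of line, and once with a lambda written at the call site. Both give the same result; the lambda's closure type is essentially the unnamed class the compiler writes for you.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hand-written functor: the "code" lives out of line, away from where it is used.
class GreaterThan {
public:
    explicit GreaterThan(int threshold) : threshold_(threshold) {}
    bool operator()(int x) const { return x > threshold_; }

private:
    int threshold_;
};

std::ptrdiff_t count_with_functor(const std::vector<int>& v, int threshold) {
    return std::count_if(v.begin(), v.end(), GreaterThan(threshold));
}

// Same logic as a lambda. The compiler generates an unnamed class that holds a
// copy of threshold and has a const operator(), much like GreaterThan above.
std::ptrdiff_t count_with_lambda(const std::vector<int>& v, int threshold) {
    return std::count_if(v.begin(), v.end(),
                         [threshold](int x) { return x > threshold; });
}
```

The behavior is identical; what changes is that the predicate can be read in place, right where count_if is called.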
Reinders: But they're such sweet sugar.
Sutter: And they're essential sugar because the difference between writing the code here versus writing it somewhere else makes all the difference.
Reinders: It sits on the same PowerPoint slide when I'm teaching.
Sutter: But the code locality, visual locality for programmers reading code, is very, very important.
Reinders: I joke about putting them on one slide but it's exactly what you're saying. You can look at the code.
Sutter: And know what it does.
Reinders: Yeah. With TBB we started with C++ before lambdas. Now I like teaching TBB a lot more because we can take advantage of lambdas. When you say "do this in parallel," the "this" isn't off somewhere else in a separate functor. As you said, it's just syntactic sugar, but it makes a lot of difference.
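A sketch of what that looks like in practice. To stay self-contained it does not use TBB: parallel_for below is a hypothetical, hand-rolled helper built on std::thread that splits an index range across threads and calls a body for each index. TBB's own tbb::parallel_for has a similar (first, last, body) overload, but its scheduling is far more sophisticated than this naive one-thread-per-chunk split.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a TBB-style parallel_for: runs body(i) for every
// i in [begin, end), splitting the range into contiguous chunks, one per thread.
template <typename Body>
void parallel_for(std::size_t begin, std::size_t end, Body body) {
    if (begin >= end) return;
    const std::size_t workers = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = (end - begin + workers - 1) / workers;
    std::vector<std::thread> threads;
    for (std::size_t lo = begin; lo < end; lo += chunk) {
        const std::size_t hi = std::min(lo + chunk, end);
        // Each thread gets its own copy of body and a disjoint index range.
        threads.emplace_back([body, lo, hi] {
            for (std::size_t i = lo; i < hi; ++i) body(i);
        });
    }
    for (auto& t : threads) t.join();
}

// With a lambda, the "this" in "do this in parallel" sits right at the call site.
std::vector<double> squares(std::size_t n) {
    std::vector<double> out(n);
    parallel_for(0, n, [&out](std::size_t i) {
        out[i] = static_cast<double>(i) * static_cast<double>(i);  // disjoint writes
    });
    return out;
}
```

Before lambdas, the body inside squares would have had to be a separate functor class with out as a member, defined somewhere away from the loop.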
Sutter: It's super fundamental to be able to talk about a piece of code as an object, because, especially for parallelism, you want to be able to say: take this, don't evaluate it here; evaluate it somewhere else and ship the results back. Take this before it's evaluated, and then run it over there. Java has had runnable objects, and C# has had delegates. However, once you start having closures, the anonymous delegates with all the nice lambda syntax in C#, what a difference it makes. So I hope the whole world comes to love lambdas as much as we do. They're already well on the way there.
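A minimal sketch of "take this and evaluate it somewhere else" in standard C++ (the helper names make_task and run_elsewhere are hypothetical): a lambda is stored in a std::function, which runs nothing until it is called; std::async with std::launch::async then runs each task as if on a new thread, and the results come back through std::future::get.

```cpp
#include <functional>
#include <future>
#include <vector>

// A piece of code held as a first-class object. Creating it runs nothing;
// the captured values a and b travel with it.
std::function<int()> make_task(int a, int b) {
    return [a, b] { return a * b + 1; };
}

// Launch every task asynchronously, then collect the results in order.
std::vector<int> run_elsewhere(const std::vector<std::function<int()>>& tasks) {
    std::vector<std::future<int>> futures;
    for (const auto& task : tasks)
        futures.push_back(std::async(std::launch::async, task));
    std::vector<int> results;
    for (auto& f : futures) results.push_back(f.get());  // blocks until each is done
    return results;
}
```

The caller decides what to compute (make_task) separately from where and when it runs (run_elsewhere), which is exactly the separation a first-class piece of code makes possible.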
Reinders: I think so.
Sutter: I could go on for a long time about lambdas. They make parallel code, and even plain non-parallel STL code, much easier. You end up using them way more often than you think you will. But the bottom line is this. On the one hand, I don't think functional languages are going to take over the world, in that nobody is going to drop what they're doing and write everything in Haskell, even if maybe they should. They're not going to. On the other hand, they already have taken over the world, because the key advantages are already in the imperative languages.
Reinders: And if they come up with more really cool things they'll probably just be adopted by all programming languages.
Sutter: "Adopted" is so much better than "stolen." Begged and borrowed.
Reinders: It feels like we talked about so many interesting things. It's been wonderful having you here. We need to get together again soon because there are so many fascinating topics here.
Sutter: Sounds good.