Array Building Blocks: A Flexible Parallel Programming Model for Multicore and Many-Core Architectures


Black-Scholes Implementation

Let us walk through the implementation of Black-Scholes, a well-known analytical model for European option pricing. We show how to migrate the hotspots in Black-Scholes from C/C++ to Array Building Blocks. Again, we use the vector style for the Array Building Blocks code, although the kernel style is also possible.


// Black-Scholes Sequential C code
float s[N], x[N], r[N], v[N], tm[N];
float result[N];
for (int i = 0; i < N; ++i)
{
   float d1 = s[i] / log(x[i]);
   d1 += (r[i] + v[i] * v[i] * 0.5f) * tm[i];
   d1 /= sqrt(tm[i]);
   float d2 = d1 * sqrt(tm[i]);
   result[i] = x[i] * exp(r[i] * tm[i]) * (1.0f - CND(d2)) + (-s[i]) * (1.0f - CND(d1));
} 

// Black-Scholes Parallel Array Building Blocks code
f32 s[N], x[N], r[N], v[N], tm[N];
f32 result[N];
dense<f32> S(s,N), X(x,N), R(r,N), V(v,N), TM(tm,N);
dense<f32> d1 = S / log(X);
d1 += (R + V * V * 0.5f) * TM;
d1 /= sqrt(TM);
dense<f32> d2 = d1 * sqrt(TM);
dense<f32> result = X * exp(R * TM) * (1.0f - CND(d2)) + (-S) * (1.0f - CND(d1));

The Array Building Blocks code is highly efficient: it is vectorized, threaded, and forward-scaling. Unlike a targeted implementation (such as one written directly against SSE), Array Building Blocks abstracts away the underlying hardware features, instructions, threads, and operating system.
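To make the vector style concrete without an Array Building Blocks installation, here is a minimal sketch in standard C++ that uses std::valarray as a stand-in for dense<f32>. It is not Array Building Blocks code and it does not thread or vectorize on its own; it only shows whole-array expressions replacing the explicit loop. The names Vec, cnd, and black_scholes_put are illustrative, CND is assumed to be the standard normal cumulative distribution, and the arithmetic follows the textbook closed form for a European put rather than transcribing the listings above line by line.

```cpp
#include <cmath>
#include <valarray>

// Assumption: CND in the listings denotes the standard normal
// cumulative distribution; here it is written with std::erfc.
static float cnd(float x) {
    return 0.5f * std::erfc(-x / std::sqrt(2.0f));
}

using Vec = std::valarray<float>;  // illustrative stand-in for dense<f32>

// Textbook European put, evaluated element-wise over whole arrays:
//   d1  = (ln(S/X) + (r + v^2/2) * T) / (v * sqrt(T))
//   d2  = d1 - v * sqrt(T)
//   put = X * exp(-r*T) * (1 - N(d2)) - S * (1 - N(d1))
Vec black_scholes_put(const Vec& S, const Vec& X, const Vec& R,
                      const Vec& V, const Vec& TM) {
    Vec sqrt_t = std::sqrt(TM);
    Vec d1 = (std::log(S / X) + (R + V * V * 0.5f) * TM) / (V * sqrt_t);
    Vec d2 = d1 - V * sqrt_t;
    Vec n1 = d1.apply(cnd);  // N(d1) for every option
    Vec n2 = d2.apply(cnd);  // N(d2) for every option
    Vec put = X * std::exp(-R * TM) * (1.0f - n2) - S * (1.0f - n1);
    return put;
}
```

Calling black_scholes_put with five equally sized arrays returns one price per option. As in the vector-style listing, the body contains no index variable, which is the property a data-parallel runtime can exploit.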

Performance Results

The implementations shown illustrate the benefit of the Array Building Blocks programming model in ensuring portability, and they also scale well in performance. Array Building Blocks demonstrates that you do not have to sacrifice productivity to obtain great performance. Hand-tuned performance frequently does not justify the coding time needed to obtain it. Array Building Blocks strives to provide a platform that enables developers to approach hand-tuned performance with significantly less coding time.

For example, we measured the performance of the Array Building Blocks and sequential C implementations of the Black-Scholes application on an Intel Xeon processor E5345 platform. On a single core, Array Building Blocks achieves a speedup as high as 10X for Black-Scholes, because the Array Building Blocks implementation is automatically vectorized and also invokes vectorized versions of functions such as sqrt. In the sequential C implementation, programmers typically fall back to a scalar loop and call these C functions on each individual element. When running on multiple cores, the Array Building Blocks implementation of Black-Scholes achieves an additional 7X gain in performance on 8 cores of execution vs. 1 core. Multiplying the two gains gives an effective speedup of about 70X for parallel Array Building Blocks execution vs. sequential C execution. (Performance results may vary, and of course the speedup depends on how well the baseline is already optimized. See www.intel.com/performance for more details.)
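The arithmetic behind the combined figure can be sketched as follows. The constants are the numbers quoted in the paragraph above, not new measurements, and the helper names are illustrative. The sketch assumes the vectorization gain and the threading gain are independent, so that they multiply.

```cpp
// Figures quoted in the text (not re-measured here).
constexpr double kSingleCoreSpeedup = 10.0;  // vectorized vs. scalar C, 1 core
constexpr double kThreadSpeedup     = 7.0;   // 8 cores vs. 1 core
constexpr int    kCores             = 8;

// Independent speedups compose multiplicatively.
constexpr double combined_speedup(double vector_gain, double thread_gain) {
    return vector_gain * thread_gain;
}

// Fraction of ideal linear scaling achieved by threading alone.
constexpr double parallel_efficiency(double thread_gain, int cores) {
    return thread_gain / cores;
}

static_assert(combined_speedup(kSingleCoreSpeedup, kThreadSpeedup) == 70.0,
              "10X per core times 7X across cores gives 70X");
static_assert(parallel_efficiency(kThreadSpeedup, kCores) == 0.875,
              "7X on 8 cores is 87.5% of linear scaling");
```

Under that assumption, a 7X gain on 8 cores corresponds to 87.5% of linear scaling.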

Conclusions

Today, there are several models available to parallelize and vectorize software applications, including compiler-supported paradigms such as OpenMP, libraries and runtimes such as Intel Threading Building Blocks (TBB), native threading APIs such as Win32 and Pthreads, and the streaming data-parallelism models usually employed by GPUs. Each of these programming models has its own trade-offs in user productivity, application portability, and performance.

Intel Array Building Blocks is a new model supporting advanced data-parallelism including both parallelization and vectorization. It also supports advanced generic programming that significantly extends the potential of C++ for high-performance applications.

We encourage you to try Intel Array Building Blocks and to sign up for the product beta at http://software.intel.com/en-us/data-parallel/. This implementation of Array Building Blocks has the following features:

  • Powerful C++ frontend API for flexible, forward-scaling data parallelism in C++
  • Many examples of key algorithms in various application domains
  • Works with Intel C/C++ Compiler, Microsoft Visual C++, and GCC
  • Debugger integration with Visual Studio and GDB
  • Upcoming full integration with Intel developer products
  • Available on IA-32 and Intel 64 processors for both Linux and Windows

We expect Array Building Blocks to be especially useful in the domains of medical imaging, bioinformatics, engineering design, oil and gas, financial analytics, visual computing, signal and image processing, and science and enterprise applications such as data mining. The provided examples include implementations of some key algorithms in many of these domains. We would be pleased to hear about your experiences using Array Building Blocks and look forward to working with the community to apply it to a variety of applications.

References

[1] Future-Proof Data Parallel Algorithms and Software on Intel Multicore Architecture, Anwar Ghuloum et al.

[2] Intel Array Building Blocks Specification version 1.09

[3] Programming Option Pricing Financial Models with Intel Array Building Blocks, Anwar Ghuloum, Gansha Wu, Xin Zhou, Peng Guo, Jesse Fang, et al.

[4] A Flexible Parallel Programming Model for Tera-scale Architectures, Anwar Ghuloum, Eric Sprangle, Jesse Fang, Gansha Wu, Xin Zhou et al.

[5] Intel Array Building Blocks: C/C++ for Throughput Computing

[6] Intel microarchitecture, codenamed Larrabee: Next-Generation Visual Computing Microarchitecture

[7] Intel Xeon processor, codenamed Nehalem-EX

[8] Intel AVX (256-bit Advanced Vector Extensions)

[9] Intel Array Building Blocks: C for Throughput Computing

Acknowledgements

We would like to acknowledge Nash Palaniswamy, Ofer Rosenberg, Timothy G. Mattson, and Kalyan Muthukumar at Intel for providing valuable feedback.
