Date of Original Version

3-2011

Type

Conference Proceeding

Rights Management

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-19861-8_13

Abstract or Description

Stencil computations are at the core of applications in many domains, such as computational electromagnetics, image processing, and partial differential equation solvers used in a variety of scientific and engineering applications. Short-vector SIMD instruction sets such as SSE and VMX provide a promising and widely available avenue for enhancing performance on modern processors. However, a fundamental memory stream alignment issue limits the performance that stencil computations achieve on modern short-vector SIMD architectures. In this paper, we propose a novel data layout transformation that avoids the stream alignment conflict, along with a static analysis technique for determining where this transformation is applicable. Significant performance increases are demonstrated for a variety of stencil codes on three modern SIMD-capable processors.
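To make the stream alignment issue concrete, the following is a minimal sketch in C of a one-dimensional 3-point stencil, assuming a vector width of four floats (as in a 128-bit SSE register). The names `VLEN`, `stencil3`, and `lane_of` are hypothetical and chosen only for illustration; the sketch shows the problem the abstract describes, not the data layout transformation proposed in the paper.

```c
#include <stddef.h>

/* Assumption: 4 floats per vector, as in a 128-bit SIMD register. */
#define VLEN 4

/* Scalar 3-point stencil: each output b[i] reads a[i-1], a[i], a[i+1].
 * Boundary elements b[0] and b[n-1] are left untouched. */
void stencil3(const float *a, float *b, size_t n)
{
    for (size_t i = 1; i + 1 < n; ++i)
        b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0f;
}

/* Lane that element i occupies inside a VLEN-wide vector, assuming the
 * array starts on a vector boundary. */
size_t lane_of(size_t i)
{
    return i % VLEN;
}
```

For an output vector starting at `i = 4`, the three input streams begin at elements 3, 4, and 5, which `lane_of` maps to lanes 3, 0, and 1. Only the centre stream starts on a vector boundary, so a vectorized version of the loop must combine the neighbouring streams through unaligned loads or register shuffles.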

DOI

10.1007/978-3-642-19861-8_13

Published In

Lecture Notes in Computer Science, 6601, 225-245.