Date of Original Version
Abstract or Description
We present the internal representation and optimizations used by the CASH compiler for improving the memory parallelism of pointer-based programs. CASH uses an SSA-based representation for memory, which compactly summarizes both control-flow- and dependence information. In CASH, memory optimization is a four-step process: (1) first an initial, relatively coarse representation of memory dependences is built; (2) next, unnecessary memory dependences are removed using dependence tests; (3) third, redundant memory operations are removed (4) finally, parallelism is increased by pipelining memory accesses in loops. While the first three steps above are very general, the loop pipelining transformations are particularly applicable for spatial computation, which is the primary target of CASH. The redundant memory removal optimizations presented are: load/store hoisting (subsuming partial redundancy elimination and common-subexpression elimination), load-after-store removal, store-before-store removal (dead store removal) and loop-invariant load motion. One of our loop pipelining transformations is a new form of loop parallelization, called loop decoupling. This transformation separates independent memory accesses within a loop body into several independent loops, which are allowed dynamically to slip with respect to each other A new computational primitive, a token generator is used to dynamically control the amount of slip, allowing maximum freedom, while guaranteeing that no memory dependences are violated.