Date of Original Version
All Rights Reserved
Abstract or Description
This paper aims to tackle two fundamental memory bottlenecks: limited off-chip bandwidth (bandwidth wall) and long access latency (memory wall). To achieve this goal, our approach exploits the inherent error resilience of a wide range of applications. We introduce an approximation technique, called Rollback-Free Value Prediction (RFVP). When certain safe-to-approximate load operations miss in the cache, RFVP predicts the requested values. However, RFVP never checks for or recovers from load value mispredictions, hence avoiding the high cost of pipeline flushes and re-executions. RFVP mitigates the memory wall by enabling the execution to continue without stalling for long-latency memory accesses. To mitigate the bandwidth wall, RFVP drops some fraction of load requests which miss in the cache after predicting their values. Dropping requests reduces memory bandwidth contention by removing them from the system. The drop rate then becomes a knob to control the tradeoff between performance/energy efficiency and output quality.
For a diverse set of applications from Rodinia, Mars, and NVIDIA SDK, employing RFVP with a 14KB predictor per streaming multiprocessor (SM) in a modern GPU delivers, on average, 40% speedup and 31% energy reduction, with average 8.8% quality loss. With 10% loss in quality, the benefits reach a maximum of 2.4× speedup and 2.0× energy reduction. As an extension, we also evaluate RFVP’s latency benefits for a single core CPU. For a subset of the SPEC CFP 2000/2006 benchmarks that are amenable to safe approximation, RFVP achieves, on average, 8% speedup and 6% energy reduction, with 0.9% average quality loss.