All Rights Reserved
Abstract or Description
Chip multiprocessors (CMPs) are quickly growing to dozens and potentially hundreds of cores, and the design of the interconnect for on-chip resources has therefore become an important field of study. Of the available topologies, mesh networks are an appealing choice for tiled CMPs because they are relatively simple and scale fairly well. Recent work in the area has focused on optimizing network-on-chip (NoC) routers for performance as well as power and area efficiency. One major cost of initial router designs has been their power and area consumption, and recent research into bufferless routing has attempted to counter this by removing the router buffers entirely, showing substantial decreases in NoC energy consumption.
However, this research has also shown that at high network loads the energy benefits of bufferless schemes are vastly outweighed by performance degradation. When evaluated with pessimistic traffic patterns, the proposed router designs substantially increase network delay for last-level cache traffic and can significantly lower system throughput.
We evaluate these router designs as one component of the overall memory hierarchy design, alongside simple cache mapping mechanisms designed to reduce the need for cross-chip network traffic and packet prioritization mechanisms proposed for high performance.
We conclude, based on our evaluations, that with intelligent, locality-aware mapping of data to on-chip cache slices, bufferless network performance can come very close to buffered network performance. Locality-aware data mapping also significantly increases the network power advantage of bufferless routers over buffered ones.
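
To illustrate why the mapping of data to cache slices affects network load, the following is a minimal sketch, not the mechanism evaluated in this work. It compares the average hop count in a small mesh under two mappings: a conventional address-interleaved mapping, and an idealized locality-aware mapping in which every block is homed in the requester's own tile. The function names, the 4x4 mesh size, the uniform access pattern, and the assumption that no data is shared between tiles are all hypothetical choices made for illustration.

```python
from itertools import product


def hops(src, dst, width):
    """Manhattan distance between two tile ids in a width x width mesh."""
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    return abs(sx - dx) + abs(sy - dy)


def interleaved_slice(block, requester, width):
    """Address-interleaved mapping: the home slice ignores the requester."""
    return block % (width * width)


def local_slice(block, requester, width):
    """Idealized locality-aware mapping (assumption): home slice is the requester's tile."""
    return requester


def average_hops(mapping, width, blocks_per_tile=64):
    """Average hops when every tile accesses blocks 0..blocks_per_tile-1 once."""
    tiles = width * width
    total = 0
    count = 0
    for requester, block in product(range(tiles), range(blocks_per_tile)):
        total += hops(requester, mapping(block, requester, width), width)
        count += 1
    return total / count


if __name__ == "__main__":
    for name, mapping in [("interleaved", interleaved_slice), ("local", local_slice)]:
        print(name, average_hops(mapping, width=4))
```

In this toy model the interleaved mapping sends each request an average of 2.5 hops across the 4x4 mesh, while the idealized local mapping generates no network traffic at all. Real workloads share data and have limited slice capacity, so a practical locality-aware scheme would fall between these two extremes; the sketch only shows the direction of the effect that makes bufferless routing more attractive at lower network load.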