All Rights Reserved
Abstract or Description
This thesis describes the design, implementation, and evaluation of a Resilient Overlay Network (RON), an architecture that allows end-to-end communication across the wide-area Internet to detect and recover from path outages and periods of degraded performance within several seconds. A RON is an application-layer overlay on top of the existing Internet routing substrate. The overlay nodes monitor the liveness and quality of the Internet paths among themselves, and they use this information to decide whether to route packets directly over the Internet or by way of other RON nodes, optimizing application-specific routing metrics.
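The routing decision described above (send directly, or relay through another RON node, whichever scores best on the chosen metric) can be illustrated with a minimal sketch. It assumes a hypothetical table of probed one-way latencies between node pairs, where a missing or `None` entry marks a path that probing found to be down. It also assumes that at most one intermediate RON node is used. The names `metrics`, `path_latency`, and `best_path` are illustrative only and are not taken from the RON implementation. Latency stands in for any application-specific metric; a loss- or throughput-based cost could be substituted.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical probe results: metrics[(a, b)] is the latency in ms from a to b.
# A missing key or a None value means the overlay considers that path dead.
Metrics = Dict[Tuple[str, str], Optional[float]]


def path_latency(metrics: Metrics, hops: Sequence[str]) -> Optional[float]:
    """Sum per-hop latencies; return None if any hop is down or unmeasured."""
    total = 0.0
    for a, b in zip(hops, hops[1:]):
        lat = metrics.get((a, b))
        if lat is None:
            return None
        total += lat
    return total


def best_path(metrics: Metrics, nodes: List[str], src: str, dst: str) -> Optional[Tuple[str, ...]]:
    """Pick the lowest-latency route: the direct Internet path, or one relay node."""
    candidates = [(src, dst)] + [
        (src, via, dst) for via in nodes if via not in (src, dst)
    ]
    best: Optional[Tuple[str, ...]] = None
    best_lat: Optional[float] = None
    for hops in candidates:
        lat = path_latency(metrics, hops)
        # Strict "<" keeps the direct path (checked first) when costs tie.
        if lat is not None and (best_lat is None or lat < best_lat):
            best, best_lat = hops, lat
    return best


if __name__ == "__main__":
    # Direct A->B is down; the overlay recovers by relaying through C.
    probes = {("A", "B"): None, ("A", "C"): 20.0, ("C", "B"): 30.0}
    print(best_path(probes, ["A", "B", "C"], "A", "B"))
```

Checking the direct path first and breaking ties in its favor avoids needless indirection when the overlay offers no improvement.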
We demonstrate the potential benefits of RON by deploying and measuring a working RON with nodes at thirteen sites scattered widely over the Internet. Over a 71-hour sampling period in March 2001, there were 32 significant outages, each lasting over thirty minutes, among the 156 communicating pairs of RON nodes. RON's routing mechanism detected all of them and routed around them, showing that in many cases the underlying Internet does, in fact, offer physical path redundancy. RONs can also improve the loss rate, latency, or throughput perceived by data transfers; for example, about 1% of the transfers doubled their TCP throughput, and 5% of our transfers saw their loss rate reduced by 5% in absolute terms. These improvements, particularly in fault detection and recovery, demonstrate the benefits of moving some of the control over routing into the hands of end systems.
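The count of 156 communicating pairs is consistent with treating each direction between two of the thirteen sites as a separate pair. This is an interpretation of the figure rather than something stated above:

$$
N(N-1) = 13 \times 12 = 156 \quad \text{ordered pairs, for } N = 13 \text{ nodes.}
$$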