Date of Original Version
Abstract or Description
Network operators in large-scale networks often face long lists of maintenance tasks and, without knowing each task's impact on the network's operation, find it difficult to judge their relative importance. As a result, operators may react slowly to critical tasks, increasing network downtime and maintenance costs. We present a system that quantifies the impact of maintenance tasks so that operators can prioritize their response according to the estimated impact (i.e., spend more time and effort avoiding the disruption caused by high-impact maintenance tasks). In particular, the proposed system estimates the amount of traffic lost due to maintenance operations on interdomain routing sessions, one of the most frequently modified aspects of network configurations. We implement the proposed system and apply it to 372 routing sessions in a nationwide ISP network. The system identifies sessions with widely varying degrees of impact: some sessions incur nearly zero data loss, while others can lose more than 1,000 GB of data if disrupted without any protection mechanism applied. We also show that predicting the amount of data loss is not straightforward, since this amount changes over time, often in unexpected ways (e.g., from 50 GB to 0 GB over a one-month period). Therefore, network operators need an impact analysis system such as the proposed one to periodically audit the impact of their routing sessions and to classify the sessions according to their projected data losses. Operators can then decide the level of protection for each session (e.g., apply more effective but costlier methods to protect critical sessions) and thus allocate maintenance costs more efficiently.
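To make the idea of a periodic audit concrete, the following is a minimal back-of-the-envelope sketch, not the estimation method of the system described above. It assumes, hypothetically, that each session's projected loss is simply its average traffic rate multiplied by an assumed unprotected outage duration, and it buckets sessions into tiers using illustrative thresholds. The names `Session`, `projected_loss_gb`, `classify`, and `audit`, as well as the threshold values, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical record for one interdomain routing session."""
    name: str
    avg_rate_gbps: float  # assumed average traffic rate carried (gigabits/s)


def projected_loss_gb(session: Session, outage_seconds: float) -> float:
    """Naive estimate: all traffic on the session is lost for the whole outage.

    Gigabits per second * seconds gives gigabits; divide by 8 for gigabytes.
    """
    return session.avg_rate_gbps * outage_seconds / 8.0


def classify(loss_gb: float, low: float = 1.0, high: float = 1000.0) -> str:
    """Bucket a projected loss into an illustrative protection tier."""
    if loss_gb < low:
        return "low"
    if loss_gb < high:
        return "medium"
    return "high"


def audit(sessions: list[Session], outage_seconds: float) -> list[tuple[str, float, str]]:
    """Rank sessions by projected loss (largest first) and attach a tier."""
    ranked = sorted(
        ((projected_loss_gb(s, outage_seconds), s.name) for s in sessions),
        reverse=True,
    )
    return [(name, loss, classify(loss)) for loss, name in ranked]


if __name__ == "__main__":
    sessions = [Session("a", 0.0), Session("b", 8.0), Session("c", 0.5)]
    for row in audit(sessions, outage_seconds=1000.0):
        print(row)
```

Because the abstract notes that losses can change sharply from month to month, a sketch like this would need to be rerun on fresh traffic measurements at each audit rather than computed once.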