Abstract or Table of Contents
Stealthy malware, such as botnets and spyware, is hard to detect because its activities are subtle and do not disrupt the network, in contrast to DoS attacks and aggressive worms. Stealthy malware, however, does communicate to exfiltrate data to the attacker, to receive the attacker’s commands, or to carry out those commands (e.g., send spam). Moreover, since malware rarely infiltrates only a single host in a large enterprise, these communications should emerge from multiple hosts within coarse temporal proximity to one another. In this paper, we describe a system called TĀMD (pronounced “tamed”) with which an enterprise can identify infected computers within its network by finding new communication “aggregates” involving multiple internal hosts, i.e., communication flows that share common characteristics. We describe characteristics for defining aggregates (including flows that communicate with the same external network, that share similar payload, and/or that involve internal hosts with similar software platforms) and justify their use in finding infected hosts. We also detail efficient algorithms employed by TĀMD for identifying such aggregates, and demonstrate a particular configuration of TĀMD that identifies new infections for multiple bot and spyware examples with very few false detections, within traces of traffic recorded at the edge of a university network. This is achieved even when the number of infected hosts comprises only about 0.0065% of all internal hosts in the network.
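To make the notion of a new communication aggregate concrete, the following is a minimal sketch of one of the characteristics named above: flows from multiple internal hosts to the same external network. It is an illustration under stated assumptions, not the algorithm used by TĀMD. The function name `destination_aggregates`, the /24 grouping granularity, the `min_hosts` threshold, and the `known_prefixes` set of previously observed destinations are all hypothetical choices made for this example.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network


def destination_aggregates(flows, internal_net, known_prefixes,
                           min_hosts=2, prefix_len=24):
    """Group outbound flows by external network prefix.

    flows          -- iterable of (source_ip, destination_ip) strings
    internal_net   -- CIDR string describing the enterprise network
    known_prefixes -- set of ip_network objects seen in past traffic
                      (hypothetical stand-in for a history of destinations)
    Returns a dict mapping each new external prefix to the sorted list of
    internal hosts that contacted it, keeping only prefixes contacted by
    at least min_hosts distinct internal hosts.
    """
    internal = ip_network(internal_net)
    hosts_by_prefix = defaultdict(set)

    for src, dst in flows:
        # Keep only flows that leave the internal network.
        if ip_address(src) in internal and ip_address(dst) not in internal:
            # Assumption: "same external network" is approximated by a
            # fixed-length prefix of the destination address.
            prefix = ip_network(f"{dst}/{prefix_len}", strict=False)
            hosts_by_prefix[prefix].add(src)

    return {
        str(prefix): sorted(hosts)
        for prefix, hosts in hosts_by_prefix.items()
        if len(hosts) >= min_hosts and prefix not in known_prefixes
    }


if __name__ == "__main__":
    sample_flows = [
        ("10.0.0.1", "203.0.113.5"),
        ("10.0.0.2", "203.0.113.9"),
        ("10.0.0.3", "198.51.100.7"),
    ]
    print(destination_aggregates(sample_flows, "10.0.0.0/8", set()))
```

A complete detector along the lines described in the abstract would combine such a destination aggregate with the other characteristics (similar payload, similar software platform) and restrict flows to a coarse time window; those steps are omitted here.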