Date of Original Version
Abstract or Description
Many experts believe that new malware is created at a rate faster than legitimate software. For example, in 2007 over one million new malware samples were collected by a major security solution vendor. However, it is often speculated, though to the best of our knowledge unproven, that new malware is produced by modifying existing malware, either through simple tweaks, code composition, or a variety of other techniques. Moreover, when buggy code is copied from one program to another program, both original and new programs have to be patched. However, code copying is typically not recorded. Such code reuse is a recurring problem in security.
In this paper we propose a fast, scalable algorithm for automatic code reuse detection in binary code, BitShred. BitShred can be used for identifying the amount of shared code based upon the ability to calculate the similarity among binary code. BitShred can be applied to many security problems, such as malware clustering and bug finding. We developed a prototype implementation to evaluate our algorithm. The experimental results show that BitShred is able to detect plagiarism among malware samples and cluster them efficiently.