Journal of Privacy and Confidentiality


A major obstacle that hinders medical and social research is the lack of reliable data due to people's reluctance to reveal private information to strangers. Fortunately, statistical inference always targets a well-defined population rather than a particular individual subject and, in many current applications, data can be collected using a web-based system or other mobile devices. These two characteristics enable us to develop a data collection method, called triple matrix-masking (TM$^2$), which offers strong privacy protection with an immediate matrix transformation so that even the researchers cannot see the data, and then further uses matrix transformations to guarantee that the data will still be analyzable by standard statistical methods. The entities involved in the proposed process are a masking service provider who receives the initially masked data and then applies another mask, and the data collectors who partially decrypt the now doubly masked data and then apply a third mask before releasing the data to the public. A critical feature of the method is that the keys to generate the matrices are held separately. This ensures that nobody sees the actual data, but because of the specially designed transformations, statistical inference on parameters of interest can be conducted with the same results as if the original data were used. Hence the TM$^2$ method hides sensitive data with no efficiency loss for statistical inference of binary and normal data, which improves over Warner's randomized response technique. In addition, we add several features to the proposed procedure: an error checking mechanism is built into the data collection process in order to make sure that the masked data used for analysis are an appropriate transformation of the original data; and a partial masking technique is introduced to grant data users access to non-sensitive personal information while sensitive information remains hidden.