Disclosure Risk vs. Data Utility through the R-U Confidentiality Map in Multivariate Settings

Date of Original Version



Working Paper

Abstract or Description

Information organizations, such as statistical agencies, must ensure that data access does not compromise the confidentiality afforded data providers, whether individuals or establishments. Recognizing that deidentification of data is generally inadequate to protect confidentiality against attack by a data snooper, information organizations (IOs)—such as statistical agencies, data archives, and trade associations—can implement a variety of disclosure limitation (DL) techniques—such as topcoding, noise addition and data swapping—in developing data products. Desirably, the resulting restricted data have both high data utility U to data users and low disclosure risk R from data snoopers. IOs lack a framework for examining tradeoffs between R and U under a specific DL procedure. They also lack systematic ways of comparing the performance of distinct DL procedures. To provide this framework and facilitate comparisons, the R-U confidentiality map is introduced to trace the joint impact on R and U to changes in the parameters of a DL procedure. Implementation of an R-U confidentiality map is illustrated in the case of multivariate noise addition. Analysis is provided for two important multivariate estimation problems: a data user seeks to estimate linear combinations of means and to estimate regression coefficients.