Towards Providing Automated Feedback on the Quality of Inferences from Synthetic Datasets

David R. McClure; Jerome P. Reiter

doi:10.29012/jpc.v4i1.616

Published: Jul 20, 2012

Keywords:

Confidentiality, Disclosure, Multiple imputation, Utility, Verification

David R. McClure

Department of Statistical Science, Duke University, Durham, NC

https://orcid.org/0000-0001-8470-7190

Jerome P. Reiter

Department of Statistical Science, Duke University, Durham, NC

https://orcid.org/0000-0002-8374-3832

Abstract

When releasing individual-level data to the public, statistical agencies typically alter data values to protect the confidentiality of individuals’ identities and sensitive attributes. When data undergo substantial perturbation, secondary data analysts’ inferences can be distorted in ways that they typically cannot determine from the released data alone. This is problematic, in that analysts have no idea if they should trust the results based on the altered data. To ameliorate this problem, agencies can establish verification servers, which are remote computers that analysts query for measures of the quality of inferences obtained from disclosure-protected data. The reported quality measures reflect the similarity between the analysis done with the altered data and the analysis done with the confidential data. However, quality measures can leak information about the confidential values, so that they too must be subject to disclosure protections. In this article, we discuss several approaches to releasing quality measures for verification servers when the public use data are generated via multiple imputation, also known as synthetic data. The methods can be modified for other stochastic perturbation methods.

How to Cite

McClure, David R., and Jerome P. Reiter. 2012. “Towards Providing Automated Feedback on the Quality of Inferences from Synthetic Datasets”. Journal of Privacy and Confidentiality 4 (1). https://doi.org/10.29012/jpc.v4i1.616.

Issue

Vol. 4 No. 1 (2012)

Section

Articles

Copyright is retained by the authors. By submitting to this journal, the author(s) license the article under the Creative Commons License – Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), unless choosing a more lenient license (for instance, public domain). For situations not allowed under CC BY-NC-ND, short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.

Authors of articles published by the journal grant the journal the right to store the articles in its databases for an unlimited period of time and to distribute and reproduce the articles electronically.

Funding data

National Science Foundation
Grant numbers SES-0751671
