<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
<channel>
<title>Journal of Privacy and Confidentiality</title>
<copyright>Copyright (c) 2013 Carnegie Mellon University All rights reserved.</copyright>
<link>http://repository.cmu.edu/jpc</link>
<description>Recent documents in Journal of Privacy and Confidentiality</description>
<language>en-us</language>
<lastBuildDate>Fri, 29 Mar 2013 06:50:55 PDT</lastBuildDate>
<ttl>3600</ttl>








<item>
<title>Consumer Data Privacy in a Networked World: A Framework for Protecting Privacy and Promoting Innovation in the Global Digital Economy</title>
<link>http://repository.cmu.edu/jpc/vol4/iss2/5</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol4/iss2/5</guid>
<pubDate>Fri, 01 Mar 2013 13:36:59 PST</pubDate>
<description>
	<![CDATA[
	
	]]>
</description>


</item>






<item>
<title>Estimation of Regression Parameters from Noise Multiplied Data</title>
<link>http://repository.cmu.edu/jpc/vol4/iss2/4</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol4/iss2/4</guid>
<pubDate>Fri, 01 Mar 2013 13:36:58 PST</pubDate>
<description>
	<![CDATA[
	<p>This paper considers the scenario in which all data entries in a confidentialised unit record file are masked by multiplicative noise, regardless of whether the unit records are sensitive and regardless of whether the masked variables are dependent or independent variables in the underlying regression analysis. We introduce a technique for estimating, from the masked data, the parameters of a regression model that was originally fitted to the unmasked data. Several simulation studies and a real-life data application are presented.</p>

	]]>
</description>

<author>Yan-Xia Lin et al.</author>


</item>






<item>
<title>Random Differential Privacy</title>
<link>http://repository.cmu.edu/jpc/vol4/iss2/3</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol4/iss2/3</guid>
<pubDate>Fri, 01 Mar 2013 13:36:56 PST</pubDate>
<description>
	<![CDATA[
	<p>We propose a relaxed privacy definition called <em>random differential privacy</em> (RDP). Differential privacy requires that adding any new observation to a database will have a small effect on the output of the data-release procedure. Random differential privacy requires that adding a <em>randomly drawn new observation</em> to a database will have a small effect on the output. We show an analog of the composition property of differentially private procedures that applies to our new definition. We show how to release an RDP histogram, and we show that RDP histograms are much more accurate than histograms obtained using ordinary differential privacy. Finally, we show an analog of the global sensitivity framework for the release of functions under our privacy definition.</p>

	]]>
</description>

<author>Robert Hall et al.</author>


</item>






<item>
<title>Silent Listeners: The Evolution of Privacy and Disclosure on Facebook</title>
<link>http://repository.cmu.edu/jpc/vol4/iss2/2</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol4/iss2/2</guid>
<pubDate>Fri, 01 Mar 2013 13:36:53 PST</pubDate>
<description>
	<![CDATA[
	<p>Over the past decade, social network sites have experienced dramatic growth in popularity, reaching most demographics and providing new opportunities for interaction and socialization. Through this growth, users have been challenged to manage novel privacy concerns and balance nuanced trade-offs between disclosing and withholding personal information. To date, however, no study has documented how privacy and disclosure evolved on social network sites over an extended period of time. In this manuscript we use profile data from a longitudinal panel of 5,076 Facebook users to understand how their privacy and disclosure behavior changed between 2005 (the early days of the network) and 2011. Our analysis highlights three contrasting trends. First, over time Facebook users in our dataset exhibited increasingly privacy-seeking behavior, progressively decreasing the amount of personal data shared publicly with unconnected profiles in the same network. Second, however, changes implemented by Facebook near the end of our observation period arrested or in some cases inverted that trend. Third, the amount and scope of personal information that Facebook users revealed privately to other connected profiles actually increased over time and, because of that, so did disclosures to "silent listeners" on the network: Facebook itself, third-party apps, and (indirectly) advertisers. These findings highlight the tension between privacy choices as expressions of individual subjective preferences, and the role of the environment in shaping those choices.</p>

	]]>
</description>

<author>Fred Stutzman et al.</author>


</item>






<item>
<title>Is the Privacy of Network Data an Oxymoron?</title>
<link>http://repository.cmu.edu/jpc/vol4/iss2/1</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol4/iss2/1</guid>
<pubDate>Fri, 01 Mar 2013 13:36:51 PST</pubDate>
<description>
	<![CDATA[
	
	]]>
</description>

<author>Stephen E. Fienberg</author>


</item>






<item>
<title>Privacy-Preserving Data Sharing in High Dimensional Regression and Classification Settings</title>
<link>http://repository.cmu.edu/jpc/vol4/iss1/10</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol4/iss1/10</guid>
<pubDate>Thu, 16 Aug 2012 10:04:54 PDT</pubDate>
<description>
	<![CDATA[
	<p>We focus on the problem of multi-party data sharing in high dimensional data settings where the number of measured features (or the dimension) p is frequently much larger than the number of subjects (or the sample size) n, the so-called p >> n scenario that has been the focus of much recent statistical research. Here, we consider data sharing for two interconnected problems in high dimensional data analysis, namely feature selection and classification. We characterize the notions of "cautious", "regular", and "generous" data sharing in terms of their privacy-preserving implications for the parties and their share of data, with a focus on "feature privacy" rather than "sample privacy", though the violation of the former may lead to the latter. We evaluate the data sharing methods using the <em>phase diagram</em> from the statistical literature on multiplicity and Higher Criticism thresholding. In the two-dimensional phase space calibrated by the signal sparsity and signal strength, a phase diagram is a partition of the phase space and contains three distinguished regions, where we have no (feature)-privacy violation, relatively rare privacy violations, and an overwhelming amount of privacy violation.</p>

	]]>
</description>

<author>Stephen E. Fienberg et al.</author>


</item>






<item>
<title>Achieving Both Valid and Secure Logistic Regression Analysis on Aggregated Data from Different Private Sources</title>
<link>http://repository.cmu.edu/jpc/vol4/iss1/9</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol4/iss1/9</guid>
<pubDate>Thu, 16 Aug 2012 10:04:52 PDT</pubDate>
<description>
	<![CDATA[
	<p>Preserving the privacy of individual databases when carrying out statistical calculations has a relatively long history in statistics and has been the focus of much recent attention in machine learning. In this paper, we present a protocol for fitting a logistic regression when the data are held by separate parties, without actually combining information sources, by exploiting results from the literature on multi-party secure computation. Our protocol provides only the final result of the calculation, in contrast to other methods that share intermediate values and thus present an opportunity for compromise of values in the individual databases. Our paper has two themes: (1) the development of a secure protocol for computing the logistic parameters, and a demonstration of its performance in practice, and (2) the presentation of an amended protocol that speeds up the computation of the logistic function. We illustrate the nature of the calculations and their accuracy using an extract of data from the Current Population Survey divided between two parties. Throughout, we build our protocol from existing cryptographic primitives, so the novelty lies in designing a concrete procedure for private computation of the logistic regression MLE rather than in proposing new cryptographic constructions.</p>

	]]>
</description>

<author>Yuval Nardi et al.</author>


</item>






<item>
<title>Towards Providing Automated Feedback on the Quality of Inferences from Synthetic Datasets</title>
<link>http://repository.cmu.edu/jpc/vol4/iss1/8</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol4/iss1/8</guid>
<pubDate>Thu, 16 Aug 2012 10:04:49 PDT</pubDate>
<description>
	<![CDATA[
	<p>When releasing individual-level data to the public, statistical agencies typically alter data values to protect the confidentiality of individuals’ identities and sensitive attributes. When data undergo substantial perturbation, secondary data analysts’ inferences can be distorted in ways that they typically cannot determine from the released data alone. This is problematic, in that analysts have no idea if they should trust the results based on the altered data. To ameliorate this problem, agencies can establish verification servers, which are remote computers that analysts query for measures of the quality of inferences obtained from disclosure-protected data. The reported quality measures reflect the similarity between the analysis done with the altered data and the analysis done with the confidential data. However, quality measures can leak information about the confidential values, so they too must be subject to disclosure protections. In this article, we discuss several approaches to releasing quality measures for verification servers when the public use data are generated via multiple imputation, also known as synthetic data. The methods can be modified for other stochastic perturbation methods.</p>

	]]>
</description>

<author>David R. McClure et al.</author>


</item>






<item>
<title>Privacy Protection from Sampling and Perturbation in Survey Microdata</title>
<link>http://repository.cmu.edu/jpc/vol4/iss1/7</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol4/iss1/7</guid>
<pubDate>Thu, 16 Aug 2012 10:04:47 PDT</pubDate>
<description>
	<![CDATA[
	<p>Statistical agencies release microdata from social surveys as public-use files after applying statistical disclosure limitation (SDL) techniques. Disclosure risk is typically assessed in terms of identification risk, where it is supposed that small counts on cross-classified identifying key variables, i.e. a key, could be used to make an identification and confidential information may be learnt. In this paper we explore the application of definitions of privacy from the computer science literature to the same problem, with a focus on sampling and a form of perturbation which can be represented as misclassification. We consider two privacy definitions: differential privacy and probabilistic differential privacy. Chaudhuri and Mishra (2006) have shown that sampling does not guarantee differential privacy, but that, under certain conditions, it may ensure probabilistic differential privacy. We discuss these definitions and conditions in the context of survey microdata. We then extend this discussion to the case of perturbation. We show that differential privacy can be ensured if and only if the perturbation employs a misclassification matrix with no zero entries. We also show that probabilistic differential privacy is a viable alternative to differential privacy when there are zeros in the misclassification matrix. We discuss some common examples of SDL methods where in some cases zeros may be prevalent in the misclassification matrix.</p>

	]]>
</description>

<author>Natalie Shlomo et al.</author>


</item>






<item>
<title>Confidentialising Survival Analysis Output in a Remote Data Access System</title>
<link>http://repository.cmu.edu/jpc/vol4/iss1/6</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol4/iss1/6</guid>
<pubDate>Thu, 16 Aug 2012 10:04:45 PDT</pubDate>
<description>
	<![CDATA[
	<p>A remote analysis system addresses the challenge of enabling the use of confidential or private data while maintaining standards of confidentiality and privacy. Traditional approaches typically involve reducing the risk of disclosure by modifying or <em>confidentialising</em> data before releasing it to users. In contrast, a remote analysis system enables users to submit statistical queries and receive output without direct access to the data. A remote analysis system may involve confidentialisation of the underlying data itself or the system outputs, or both.</p>
<p>In this paper we discuss the implementation of a remote analysis system enabling survival analysis. In this system the underlying data are not confidentialised, although for some analyses a random sample of the data is used, and the system outputs are modified to protect confidentiality and privacy. We describe confidentiality objectives for the system outputs, and describe measures for achieving them. To illustrate the effect of the methods, we provide a comprehensive example comparing confidentialised output with traditional output for a range of common survival analyses.</p>
<p>We believe that the confidentialised output of the remote analysis system for survival analysis as described in this paper is still useful for survival analysis in some situations, provided the user understands the confidentialisation process and its potential impact. If the remote analysis system user requires more detailed information such as outlier values, event times and/or standard errors, then they would need to apply for access to the underlying data.</p>

	]]>
</description>

<author>Christine M. O&apos;Keefe et al.</author>


</item>






<item>
<title>Differential Privacy for Protecting Multi-dimensional Contingency Table Data: Extensions and Applications</title>
<link>http://repository.cmu.edu/jpc/vol4/iss1/5</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol4/iss1/5</guid>
<pubDate>Thu, 16 Aug 2012 10:04:43 PDT</pubDate>
<description>
	<![CDATA[
	<p>The methodology of differential privacy has provided a strong definition of privacy which in some settings, using a mechanism of doubly-exponential noise addition, also allows for extraction of informative statistics from databases. In a recent paper, Barak et al. [1] extend this approach to the release of a specified set of margins from a multi-way contingency table. Privacy protection in such settings implicitly focuses on small cell counts that might allow for the identification of units that are unique in the database. We explore how well the mechanism works in the context of a series of examples, and the extent to which the proposed differential-privacy mechanism allows for sensible inferences from the released data. We conclude that the methodology, as it is currently formulated, is problematic in the context of the types of large sparse contingency tables encountered in statistical practice.</p>

	]]>
</description>

<author>Xiaolin Yang et al.</author>


</item>






<item>
<title>Learning in a Large Function Space: Privacy-Preserving Mechanisms for SVM Learning</title>
<link>http://repository.cmu.edu/jpc/vol4/iss1/4</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol4/iss1/4</guid>
<pubDate>Thu, 16 Aug 2012 10:04:41 PDT</pubDate>
<description>
	<![CDATA[
	<p>The ubiquitous need for analyzing privacy-sensitive information—including health records, personal communications, product ratings and social network data—is driving significant interest in privacy-preserving data analysis across several research communities. This paper explores the release of Support Vector Machine (SVM) classifiers while preserving the privacy of training data. The SVM is a popular machine learning method that maps data to a high-dimensional feature space before learning a linear decision boundary. We present efficient mechanisms for finite-dimensional feature mappings and for (potentially infinite-dimensional) mappings with translation-invariant kernels. In the latter case, our mechanism borrows a technique from large-scale learning to learn in a finite-dimensional feature space whose inner-product uniformly approximates the desired feature space inner-product (the desired kernel) with high probability. Differential privacy is established using algorithmic stability, a property used in learning theory to bound generalization error. Utility—when the private classifier is pointwise close to the non-private classifier with high probability—is proven using smoothness of regularized empirical risk minimization with respect to small perturbations to the feature mapping. Finally we conclude with lower bounds on the differential privacy of any mechanism approximating the SVM.</p>

	]]>
</description>

<author>Benjamin I. P. Rubinstein et al.</author>


</item>






<item>
<title>Minimaxity, Statistical Thinking and Differential Privacy</title>
<link>http://repository.cmu.edu/jpc/vol4/iss1/3</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol4/iss1/3</guid>
<pubDate>Thu, 16 Aug 2012 10:04:39 PDT</pubDate>
<description>
	<![CDATA[
	<p>We discuss the role of minimax statistical theory for privacy theory. Minimax theory gives a way to measure information loss for sanitized databases. We also discuss some differences between privacy theory from the statistical perspective versus the computer science perspective.</p>

	]]>
</description>

<author>Larry Wasserman</author>


</item>






<item>
<title>An Axiomatic View of Statistical Privacy and Utility</title>
<link>http://repository.cmu.edu/jpc/vol4/iss1/2</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol4/iss1/2</guid>
<pubDate>Thu, 16 Aug 2012 10:04:37 PDT</pubDate>
<description>
	<![CDATA[
	<p>"Privacy" and "utility" are words that frequently appear in the literature on statistical privacy. But what do these words really mean? In recent years, many problems with intuitive notions of privacy and utility have been uncovered. Thus more formal notions of privacy and utility, which are amenable to mathematical analysis, are needed. In this paper we present our initial work on an axiomatization of privacy and utility. We present two privacy axioms which describe how privacy is affected by post-processing data and by randomly selecting a privacy mechanism. We present three axioms for utility measures which also describe how measured utility is affected by post-processing. Our analysis of these axioms yields new insights into the construction of privacy definitions and utility measures. In particular, we characterize the class of relaxations of differential privacy that can be obtained by changing constraints on probabilities; we show that the resulting constraints must be formed from concave functions. We also present several classes of utility metrics satisfying our axioms and explicitly show that measures of utility borrowed from statistics can lead to utility paradoxes when applied to statistical privacy. Finally, we show that the outputs of differentially private algorithms are best interpreted in terms of graphs or likelihood functions rather than query answers or synthetic data.</p>

	]]>
</description>

<author>Daniel Kifer et al.</author>


</item>






<item>
<title>Special Issue on Statistical and Learning-Theoretic Challenges in Data Privacy</title>
<link>http://repository.cmu.edu/jpc/vol4/iss1/1</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol4/iss1/1</guid>
<pubDate>Thu, 16 Aug 2012 10:04:35 PDT</pubDate>
<description>
	<![CDATA[
	<p>This special issue presents papers based on talks from a workshop on "Statistical and Learning-Theoretic Challenges in Data Privacy" held at UCLA's Institute for Pure and Applied Mathematics (IPAM), February 22–26, 2010.</p>

	]]>
</description>

<author>Aleksandra B. Slavkovic et al.</author>


</item>






<item>
<title>Rejoinder</title>
<link>http://repository.cmu.edu/jpc/vol3/iss2/11</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol3/iss2/11</guid>
<pubDate>Wed, 02 Nov 2011 12:12:02 PDT</pubDate>
<description>
	<![CDATA[
	
	]]>
</description>

<author>Gerald W. Gates</author>


</item>






<item>
<title>Privacy and the Statistician: What Do We Need to Know to Certify Nondisclosure?</title>
<link>http://repository.cmu.edu/jpc/vol3/iss2/10</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol3/iss2/10</guid>
<pubDate>Wed, 02 Nov 2011 12:12:01 PDT</pubDate>
<description>
	<![CDATA[
	
	]]>
</description>

<author>Alan M. Zaslavsky</author>


</item>






<item>
<title>Trust but Pre-Verify?</title>
<link>http://repository.cmu.edu/jpc/vol3/iss2/9</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol3/iss2/9</guid>
<pubDate>Wed, 02 Nov 2011 12:12:00 PDT</pubDate>
<description>
	<![CDATA[
	
	]]>
</description>

<author>Fritz Scheuren</author>


</item>






<item>
<title>Comment on Article by Gates</title>
<link>http://repository.cmu.edu/jpc/vol3/iss2/8</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol3/iss2/8</guid>
<pubDate>Wed, 02 Nov 2011 12:11:59 PDT</pubDate>
<description>
	<![CDATA[
	
	]]>
</description>

<author>Jerome P. Reiter</author>


</item>






<item>
<title>Toward a Reconceptualization of Confidentiality Protection in the Context of Linkages with Administrative Records</title>
<link>http://repository.cmu.edu/jpc/vol3/iss2/7</link>
<guid isPermaLink="true">http://repository.cmu.edu/jpc/vol3/iss2/7</guid>
<pubDate>Wed, 02 Nov 2011 12:11:57 PDT</pubDate>
<description>
	<![CDATA[
	
	]]>
</description>

<author>Stephen E. Fienberg</author>


</item>





</channel>
</rss>
