Privacy definitions are often analyzed using a highly targeted approach: a *specific* attack strategy is evaluated to determine if a *specific* type of information can be inferred. If the attack works, one can conclude that the privacy definition is too weak. If it doesn't work, one often gains little information about its security (perhaps a slightly different attack would have worked?). Furthermore, these strategies will not identify cases where a privacy definition protects unnecessary pieces of information.

On the other hand, technical results concerning generalizable and systematic analyses of privacy are few in number, but such results have significantly advanced our understanding of the design of privacy definitions. We add to this literature with a novel methodology for analyzing the Bayesian properties of a privacy definition. Its goal is to identify precisely the type of information being protected, hence making it easier to identify (and later remove) unnecessary data protections.

Using privacy building blocks (which we refer to as axioms), we turn questions about semantics into mathematical problems -- the construction of a *consistent normal form* and the subsequent construction of the *row cone* (which is a geometric object that encapsulates Bayesian guarantees provided by a privacy definition).

We apply these ideas to study randomized response, FRAPP/PRAM, and several algorithms that add integer-valued noise to their inputs; we show that their privacy properties can be stated in terms of the protection of various notions of parity of a dataset. Randomized response, in particular, provides unnecessarily strong protections for parity, and so we also show how our methodology can be used to relax privacy definitions.
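To make the mechanism under study concrete, the following is a minimal sketch of randomized response over binary records, together with the parity function whose protection the analysis concerns. The parameter `p` and the function names are illustrative conventions of this sketch, not the paper's notation or construction:

```python
import random

def randomized_response(bit, p=0.75):
    """Report the true bit with probability p, and its flip otherwise."""
    return bit if random.random() < p else 1 - bit

def perturb_dataset(bits, p=0.75):
    """Apply randomized response independently to each record."""
    return [randomized_response(b, p) for b in bits]

def parity(bits):
    """Parity of a binary dataset -- one notion of protected information."""
    return sum(bits) % 2
```

At `p = 1/2` the released records are independent of the input, so nothing about the dataset's parity can be inferred; as `p` approaches 1, the released parity agrees with the true parity increasingly often.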

]]>
In this paper we investigate the applicability of regression-tree-based methods for constructing synthetic business data. We give a detailed example comparing exploratory data analysis and linear regression results under two variants of a regression-tree-based synthetic data approach, and we evaluate the analysis results against those obtained from the original data. We further investigate the impact of different stopping criteria on performance.

While it is certainly true that any method designed to protect confidentiality introduces error, and may indeed lead to misleading conclusions, our analysis of the results for synthesisers based on CART models provides some evidence that this error is not random but stems from particular characteristics of business data. We conclude that more careful analysis is needed when applying these methods, and end users certainly need to be aware of possible discrepancies.

]]>
In this paper we discuss the implementation of a remote analysis system enabling survival analysis. In this system the underlying data are not confidentialised, although for some analyses a random sample of the data is used, and the system outputs are modified to protect confidentiality and privacy. We describe confidentiality objectives for the system outputs and measures for achieving them. To illustrate the effect of the methods, we provide a comprehensive example comparing confidentialised output with traditional output for a range of common survival analyses.

We believe that the confidentialised output of the remote analysis system for survival analysis as described in this paper remains useful in some situations, provided the user understands the confidentialisation process and its potential impact. If the user requires more detailed information, such as outlier values, event times, and/or standard errors, they would need to apply for access to the underlying data.
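For concreteness, the kind of output such a system works with can be illustrated by a plain Kaplan-Meier estimator. The `coarsen` step below is a hypothetical stand-in for output modification (rounding released survival probabilities), not the actual confidentialisation method described in the paper:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates.

    times  -- observed follow-up times
    events -- 1 if the event occurred at that time, 0 if censored
    Returns a list of (event time, survival probability) pairs.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    surv = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = times[order[i]]
        deaths = removed = 0
        # Group all observations tied at time t.
        while i < len(order) and times[order[i]] == t:
            deaths += events[order[i]]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def coarsen(curve, digits=1):
    """Hypothetical output modification: round the released survival
    probabilities so exact values (and hence risk-set sizes) are withheld."""
    return [(t, round(s, digits)) for t, s in curve]
```

For example, `coarsen(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))` releases a rounded step function rather than the exact estimates.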

]]>