Copyright 2010 by the author(s)/owner(s)
Abstract
Most learning algorithms assume that a data set is given initially. We address the common situation where data is not available initially, but can be obtained, at a cost. We focus on learning Bayesian belief networks (BNs) over discrete variables. As such BNs are models of probability distributions, we consider the "generative" challenge of learning the parameters, for a fixed structure, that best match the true distribution. We focus on the budgeted learning setting, where there is a known fixed cost c_i for acquiring the value of the i-th feature for any specified instance, and a known total budget to spend acquiring all information. After formally defining this problem from a Bayesian perspective, we first consider non-sequential algorithms, which must decide, before seeing any results, which features of which instances to probe. We show this problem is NP-hard, even if all variables are independent, then prove that the greedy allocation algorithm iga is optimal when the costs are uniform and the features are independent, but can otherwise be sub-optimal. We then show that general (sequential) policies perform better, and explore the challenges of learning the parameters for general belief networks in this setting, describing conditions under which the obvious round-robin algorithm will, versus will not, work optimally. We also explore the effectiveness of this and various other heuristic algorithms.
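To make the budgeted setting concrete, the following is a minimal sketch of the round-robin probing policy mentioned in the abstract: given a known cost c_i per feature and a fixed total budget, cycle through the features, spending the budget one probe at a time. All names here (`round_robin_probes`, `costs`, `budget`) are illustrative assumptions, not identifiers from the paper, and the sketch omits the posterior update that a full sequential policy would perform after each probe.

```python
# Hedged sketch of a round-robin probing schedule under a fixed budget.
# It records which feature index is probed at each step; the stopping
# rule here is simply "halt at the first unaffordable probe in the cycle".

from itertools import cycle


def round_robin_probes(costs, budget):
    """Return the sequence of feature indices probed before the budget runs out."""
    probed = []
    for i in cycle(range(len(costs))):
        if costs[i] > budget:
            break  # next probe in the cycle is unaffordable; stop
        budget -= costs[i]
        probed.append(i)
    return probed


# With three features of uniform unit cost and a budget of 7, the schedule
# cycles through the features until the budget is exhausted.
print(round_robin_probes([1, 1, 1], 7))
```

With uniform costs, this schedule spreads probes evenly across features, which matches the regime where the abstract notes simple allocation strategies can be optimal; with non-uniform costs, a smarter policy would skip an unaffordable feature rather than halt.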
Proceedings of the 27th International Conference on Machine Learning.