Date of Original Version

December 2012



Abstract or Description

This work applies the distributed computing framework MapReduce to Bayesian network parameter learning from incomplete data. We formulate the classical Expectation Maximization (EM) algorithm within the MapReduce framework and analyze, both analytically and experimentally, the speed-up that MapReduce can provide. We present details of the MapReduce formulation of EM, report speed-ups relative to the sequential case, and compare several Hadoop cluster configurations in experiments with Bayesian networks of different sizes and structures.



Published In

Proc. of Big Learning: Algorithms, Systems and Tools.

