Date of Award

Winter 12-2016

Degree Name

Doctor of Philosophy (PhD)


Electrical and Computer Engineering


Hyong S. Kim


Management solutions for current and future Infrastructure-as-a-Service (IaaS) Data Centers (DCs) face complex challenges. First, DCs are now very large infrastructures holding hundreds of thousands, if not millions, of servers and applications. Second, DCs are highly heterogeneous. DC infrastructures consist of servers and network devices with different capabilities, from various vendors and of different generations. Cloud applications are owned by different tenants and have different characteristics and requirements. Third, most DC elements are highly dynamic. Applications change over time: during their lifetime, their logical architectures evolve according to workload and resource requirements. Failures and bursty resource demand can lead to unstable states that affect a large number of services. Global, centralized approaches limit scalability and are not suitable for large, dynamic DC environments in which multiple tenants have different application requirements. We propose a novel, fully distributed and dynamic management paradigm for highly diverse and volatile DC environments. We develop LAMA, a novel framework for managing large-scale cloud infrastructures based on a multi-agent system (MAS). Provider agents collaborate to advertise and manage available resources, while app agents provide integrated and customized application management. Distributing management tasks allows LAMA to scale naturally, and the integrated approach improves its efficiency. Their proximity to the application and their knowledge of the DC environment allow agents to react quickly to changes in performance and to pre-plan for potential failures. We implement and deploy LAMA in a testbed server cluster. We demonstrate how LAMA improves the scalability of management tasks such as provisioning and monitoring. We evaluate LAMA against state-of-the-art open source frameworks. LAMA enables customized dynamic management strategies for multi-tier applications. These strategies can be configured to respond to failures and workload changes within the limits of the desired SLA for each application.