Empirical Study of Straggler Problem in Parameter Server on Iterative Convergent Distributed Machine Learning
Benjamin Wong
Abstract
This study tests the effectiveness of current straggler mitigation techniques across several important iterative convergent machine learning (ML) algorithms, including Matrix Factorization (MF), Multinomial Logistic Regression (MLR), and Latent Dirichlet Allocation (LDA). The experiments were implemented using the Flex PS system, the latest system implementation that employs the parameter server architecture [1,2]. They used the Bulk Synchronous Parallel (BSP) computational model to examine the straggler problem in parameter servers for iterative convergent distributed machine learning. The study also analyzes the experimental setup of the parameter server approach to parallel learning problems by injecting universal straggler patterns and applying the latest mitigation techniques. The findings are significant in that they provide a platform for further research into the problem and allow researchers to compare different methods across applications. The outcome is therefore expected to facilitate the development of new techniques and new perspectives for addressing this problem.
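To make the straggler problem under BSP concrete, the following is a minimal illustrative sketch, not the Flex PS implementation or the study's actual injection method. It assumes a toy model in which each worker takes a fixed base time per iteration and is slowed by a constant factor with some probability. All names and parameters (`simulate`, `straggler_prob`, `slowdown`) are hypothetical. Because BSP places a barrier at the end of every iteration, the slowest worker determines the iteration time.

```python
import random


def bsp_iteration_time(worker_times):
    """Under BSP all workers wait at the barrier, so the slowest one sets the pace."""
    return max(worker_times)


def simulate(num_workers=4, num_iters=100, base=1.0,
             straggler_prob=0.0, slowdown=5.0, seed=0):
    """Total wall-clock time of a toy BSP job with randomly injected stragglers.

    Hypothetical model: in each iteration, every worker independently becomes
    a straggler with probability `straggler_prob` and then runs `slowdown`
    times slower than `base`.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_iters):
        times = [
            base * (slowdown if rng.random() < straggler_prob else 1.0)
            for _ in range(num_workers)
        ]
        total += bsp_iteration_time(times)
    return total


if __name__ == "__main__":
    for p in (0.0, 0.05, 0.2):
        print(f"straggler_prob={p}: total time = {simulate(straggler_prob=p):.1f}")
```

In this toy model, even a small per-worker straggler probability inflates the total time noticeably, because a single slow worker delays the whole iteration. Mitigation techniques aim to reduce this effect.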