Training a deep neural network over a large dataset is very time-consuming. A popular solution is to distribute the training process across multiple nodes using a parameter server framework that aggregates weight updates from the workers. Several algorithms exist for synchronous and asynchronous weight updates (e.g., Parallel SGD, ADMM). The parameter server framework supports both model parallelism and data parallelism. In model parallelism, a large model is partitioned and its components are assigned to workers. However, it is difficult to decouple a model because of dependencies between its components (e.g., layers) and the nature of the optimization method (e.g., stochastic gradient descent). In data parallelism, the training data is partitioned according to the number of workers, and one partition is assigned to each worker. You will focus on data parallelism, experimenting with and comparing several standard algorithms for updating model parameters, as illustrated in the sketch below. You will enable distributed deep learning on resource-constrained devices rather than a GPU cluster.
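
To make the data-parallel, parameter-server setup concrete, the following is a minimal single-process sketch of synchronous Parallel SGD: each "worker" holds one partition of the data, computes a gradient on the current global weights, and the "server" averages the gradients and applies one update. The linear model, the function names (`gradient`, `parallel_sgd`), and all hyperparameters are illustrative assumptions, not part of the actual distributed system you will build.

```python
import numpy as np

def gradient(w, X, y):
    # Gradient of mean squared error for a linear model (a stand-in for a DNN).
    return 2.0 * X.T @ (X @ w - y) / len(y)

def parallel_sgd(X, y, num_workers=4, lr=0.1, steps=100):
    # Data parallelism: split the training data into one shard per worker.
    shards = list(zip(np.array_split(X, num_workers),
                      np.array_split(y, num_workers)))
    w = np.zeros(X.shape[1])  # global weights held by the parameter server
    for _ in range(steps):
        # Each worker computes a gradient on its own partition.
        grads = [gradient(w, Xs, ys) for Xs, ys in shards]
        # Synchronous update: the server averages worker gradients and steps.
        w -= lr * np.mean(grads, axis=0)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    true_w = np.arange(1, 6, dtype=float)
    y = X @ true_w + 0.01 * rng.normal(size=1000)
    print(parallel_sgd(X, y))  # should be close to true_w
```

An asynchronous variant would let each worker push its gradient and pull updated weights without waiting for the others; the synchronous version above keeps the averaging step explicit for comparison.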