Distributed Training for Deep Learning (NES Group)

Training a deep neural network on a large dataset is very time-consuming. A popular solution is to distribute the training process across multiple nodes using a parameter server framework that aggregates weight updates from the workers. There are several algorithms for synchronous and asynchronous weight updates (e.g., Parallel SGD, ADMM). The parameter server framework supports both model parallelism and data parallelism. In model parallelism, a large model is partitioned and its components are assigned to workers. However, it is difficult to decouple a model due to dependencies between its components (e.g., layers) and the nature of the optimization method (e.g., stochastic gradient descent). In data parallelism, the training data is partitioned according to the number of workers and one partition is assigned to each worker. You will focus on data parallelism and experimentally compare several standard algorithms for updating model parameters. You will enable distributed deep learning on resource-constrained devices rather than a GPU cluster.
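To illustrate the data-parallel scheme described above, the following is a minimal sketch of synchronous parallel SGD with a simulated parameter server, using NumPy and a toy linear model instead of a deep network. All function names and hyperparameters here are illustrative choices, not part of the thesis specification; a real implementation would use a framework such as PyTorch with `torch.distributed`.

```python
import numpy as np

def worker_gradient(w, X_shard, y_shard):
    # Each worker computes the gradient of the mean squared loss
    # on its own partition of the training data.
    residual = X_shard @ w - y_shard
    return X_shard.T @ residual / len(y_shard)

def synchronous_parallel_sgd(X, y, num_workers=4, lr=0.1, steps=200):
    # Data parallelism: partition the training data among the workers.
    shards = list(zip(np.array_split(X, num_workers),
                      np.array_split(y, num_workers)))
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        # Simulated parameter server step: collect one gradient per
        # worker, average them, and apply a single synchronous update.
        grads = [worker_gradient(w, Xs, ys) for Xs, ys in shards]
        w -= lr * np.mean(grads, axis=0)
    return w

# Toy example: recover the weights of the linear model y = X @ [2, -3].
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = X @ np.array([2.0, -3.0])
w = synchronous_parallel_sgd(X, y)
```

In an asynchronous variant, the server would apply each worker's gradient as it arrives instead of waiting for all workers, trading consistency of the model parameters for higher throughput.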


Student Target Groups:

  • Students of ICE/Telematics
  • Students of Computer Science

Thesis Type:

  • Master Thesis

Goals and Tasks:

  • Thorough literature research on the topic
  • Implement a deep learning algorithm on one local machine
  • Implement a deep learning algorithm on multiple distributed devices
  • Summarize the results in a written report, present and demonstrate the prototype

Required Knowledge:

  • Creativity, interest in state-of-the-art deep learning methods
  • Programming skills in Python
  • Prior experience in deep learning frameworks is desirable (preferably PyTorch)


Start:

  • As soon as possible