Enabling N:M Sparsity Without Penalty

Due to the large size and high redundancy of modern neural networks, there is growing interest in techniques that reduce the number of weights and accelerate training and inference. An active area of research in this field is sparsity: encouraging zero values in parameters so that they can be discarded from storage and computation. The recently introduced NVIDIA Ampere accelerator architecture supports a 2:4 sparsity pattern, which requires that every group of four consecutive values contains at least two zeros and thus halves a model's parameter count. This yields twice the math throughput of dense matrix units. In general, we consider N:M sparsity (see the sketch below). In this work, you will explore how to efficiently build, zip, and update N:M-sparse networks. As a starting point, we encourage you to read the following papers: (1), (2), (3). You will be provided with an algorithm that is helpful in this setting. Your own ideas are welcome.
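To make the pattern concrete, here is a minimal sketch of how a 2:4 (more generally N:M) mask could be computed in PyTorch by keeping the N largest-magnitude weights within each group of M consecutive values along the input dimension. The helper name nm_sparsity_mask and the magnitude criterion are illustrative assumptions, not the algorithm provided for this project.

```python
# Minimal sketch: magnitude-based N:M masking of a weight matrix.
# Within every group of M consecutive weights along the input dimension,
# the N largest-magnitude values are kept and the rest are zeroed.
import torch


def nm_sparsity_mask(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Return a binary mask enforcing N:M sparsity along the last dimension."""
    out_features, in_features = weight.shape
    assert in_features % m == 0, "last dimension must be divisible by M"

    groups = weight.abs().reshape(out_features, in_features // m, m)
    # Indices of the N largest-magnitude entries in each group of M.
    topk = groups.topk(n, dim=-1).indices
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, topk, 1.0)
    return mask.reshape(out_features, in_features)


# Usage: prune a linear layer to the 2:4 pattern supported by Ampere.
layer = torch.nn.Linear(16, 8)
mask = nm_sparsity_mask(layer.weight, n=2, m=4)
with torch.no_grad():
    layer.weight.mul_(mask)
```

Exactly N out of every M weights survive, so for 2:4 half of the parameters are removed in a hardware-friendly, structured way.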

Student Target Groups:

  • Students in ICE
  • Students in Computer Science

Thesis Type:

  • Master Thesis / Master Project

Goal and Tasks:

  • Literature review on N:M sparsity, permutation invariance, network compression, multi-task zipping and distributed training
  • Implement N:M sparsity and compare to the vanilla algorithm published in the literature (a training sketch follows this list)
  • Evaluate the same idea in the context of multi-task zipping and distributed training
  • Summarize the results in a written report, and prepare an oral presentation
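
One way the from-scratch training setup is commonly approached in the literature is to recompute the N:M mask on every forward pass and use a straight-through estimator so that gradients still reach pruned weights and the pattern can change during training. The sketch below assumes this plain-STE variant; the class name NMSparseLinear is a placeholder and is not the algorithm that will be provided.

```python
# Hedged sketch: a linear layer trained with an on-the-fly 2:4 mask and a
# straight-through estimator (plain STE, without any extra regularization).
import torch
import torch.nn as nn


class NMSparseLinear(nn.Linear):
    def __init__(self, in_features, out_features, n=2, m=4, **kwargs):
        super().__init__(in_features, out_features, **kwargs)
        self.n, self.m = n, m

    def forward(self, x):
        w = self.weight
        # Recompute the N:M mask by magnitude within each group of M weights.
        groups = w.abs().reshape(w.shape[0], -1, self.m)
        topk = groups.topk(self.n, dim=-1).indices
        mask = torch.zeros_like(groups).scatter_(-1, topk, 1.0).reshape_as(w)
        # Straight-through estimator: the forward pass uses the masked weight,
        # while the backward pass sends gradients to the dense weight unchanged.
        w_sparse = w + (w * mask - w).detach()
        return nn.functional.linear(x, w_sparse, self.bias)


# Usage: drop-in replacement for nn.Linear in a model to be trained sparsely.
layer = NMSparseLinear(16, 8)
out = layer(torch.randn(3, 16))
```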

Recommended Prior Knowledge:

  • A good knowledge of neural networks and interest in exploring network sparsity
  • Programming skills in Python
  • Prior experience with a deep learning framework (preferably PyTorch) is desirable

Start:

  • a.s.a.p.

Contact: