Model folding on large vision models

As vision models such as Vision Transformers (ViTs), CLIP, and SAM grow larger, they demand substantial memory and computation, limiting their deployment in resource-constrained settings. Traditional compression methods such as pruning, quantization, and distillation reduce model size but often compromise accuracy or require retraining. Model Folding is a recent technique that merges clusters of similar neurons, reducing the parameter count while preserving the model's data statistics, and it does so without requiring fine-tuning, offering a new trade-off between size and performance. While effective on smaller models and tasks, its impact on large vision models remains largely unexplored.
This thesis aims to investigate model folding on large-scale vision architectures, evaluating its effectiveness both on its own and in combination with other compression techniques such as quantization.
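The core idea can be sketched in a few lines of NumPy: cluster the neurons of a layer, replace each cluster by its centroid, and merge the corresponding columns of the next layer so the network's function is approximately preserved. The sketch below is a simplified illustration under our own assumptions (the function name, the farthest-point initialization, and the toy ReLU network are illustrative choices, not the published implementation, which also corrects activation statistics):

```python
import numpy as np

def fold_layer(W1, b1, W2, k):
    """Toy sketch of model folding on one hidden layer.

    Cluster the rows of W1 (one row per hidden neuron) with k-means,
    replace each cluster by its centroid, and sum the matching columns
    of W2 so downstream pre-activations are approximately preserved.
    """
    feats = np.concatenate([W1, b1[:, None]], axis=1)  # weights + bias per neuron

    # Deterministic farthest-point initialization of k centroids.
    idx = [0]
    for _ in range(k - 1):
        d = ((feats[:, None] - feats[idx][None]) ** 2).sum(-1).min(axis=1)
        idx.append(int(d.argmax()))
    centroids = feats[idx].copy()

    # Plain k-means (Lloyd's algorithm) over the neuron features.
    for _ in range(20):
        d = ((feats[:, None] - centroids[None]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        for c in range(k):
            if (assign == c).any():
                centroids[c] = feats[assign == c].mean(axis=0)

    W1f, b1f = centroids[:, :-1], centroids[:, -1]
    # Merge the fan-out of each cluster into a single column of W2.
    W2f = np.stack([W2[:, assign == c].sum(axis=1) for c in range(k)], axis=1)
    return W1f, b1f, W2f
```

For exactly duplicated neurons this folding is lossless; for merely similar neurons it introduces an approximation error that the full method reduces by additionally matching activation statistics.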
Interested? Please contact us for more details!

Student Target Groups:

  • Students of ICE;
  • Students of Computer Science;
  • Students of Software Engineering.

Thesis Type:

  • Master Thesis / Master Project

Goals and Tasks:

  • Conduct a literature review on model compression for large vision models;
  • Select and analyze one or more large-scale vision models (e.g., ViT, CLIP, SAM);
  • Implement model folding on selected models;
  • Evaluate performance trade-offs in terms of accuracy, size, and compute on public datasets;
  • Present your findings in a final presentation and written report.

Requirements:

  • Solid knowledge of neural networks and model architectures;
  • Programming skills in Python and experience with PyTorch or TensorFlow;
  • (Optional) Familiarity with Vision Transformers, CLIP, or SAM models.

Used Tools & Equipment:

  • A computing cluster at TU Graz

Start:

  • a.s.a.p.

Contact:

  • Dong Wang (dong.wangnoSpam@tugraz.at)
  • Assoc. Prof. Dr. Olga Saukh (saukhnoSpam@tugraz.at)