WACV 2026 — Pages 7605–7615

Improvise, Adapt, Overcome —
Telescopic Adapters for Efficient Fine-tuning
of Vision Language Models in Medical Imaging

Ujjwal Mishra1 Vinita Shukla1 Praful Hambarde1 Amit Shukla1

1Centre for Artificial Intelligence and Robotics, Indian Institute of Technology Mandi, India

Highlights

613k trainable parameters: only 0.4% of CLIPSeg's 150M
244× fewer parameters than end-to-end fine-tuning
State-of-the-art: best PEFT performance across five medical benchmarks
Abstract
Adapting Vision Language Segmentation Models (VLSMs) to medical imaging domains incurs significant computational overhead with conventional fine-tuning. We introduce Telescopic Adapters, a novel PEFT framework that employs depth-aware scaling to progressively increase adapter capacity from shallow to deep transformer layers. Using only 613k trainable parameters (244× fewer than end-to-end fine-tuning), Telescopic Adapters achieve superior performance across five diverse medical datasets spanning polyp segmentation, skin lesion detection, and breast ultrasound imaging.
Method

Telescopic Adapter Framework

The core insight is a depth-aware telescopic scaling strategy: rather than applying uniform adapter widths, we assign smaller bottleneck dimensions to early layers and progressively increase capacity toward deeper layers.
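The depth-aware allocation can be sketched in a few lines. Below is a minimal NumPy illustration; the linear schedule, the bottleneck range (8 to 64), the hidden size of 768, and the zero-initialized up-projection are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

def telescopic_dims(num_layers, d_min, d_max):
    """Grow the adapter bottleneck width linearly from shallow to deep layers.
    (Illustrative schedule; the paper's exact allocation may differ.)"""
    return [int(round(d)) for d in np.linspace(d_min, d_max, num_layers)]

class Adapter:
    """Standard bottleneck adapter: down-project, ReLU, up-project, residual add."""
    def __init__(self, d_model, d_bottleneck, rng):
        self.W_down = rng.standard_normal((d_model, d_bottleneck)) * 0.02
        # Zero-initialized up-projection: the adapter starts as an identity map.
        self.W_up = np.zeros((d_bottleneck, d_model))

    def __call__(self, x):
        h = np.maximum(x @ self.W_down, 0.0)  # down-project + ReLU
        return x + h @ self.W_up              # up-project + residual connection

# One adapter per transformer layer, with telescopically increasing widths.
rng = np.random.default_rng(0)
dims = telescopic_dims(num_layers=12, d_min=8, d_max=64)
adapters = [Adapter(d_model=768, d_bottleneck=d, rng=rng) for d in dims]
```

Because the up-projection starts at zero, each adapter is an identity map at initialization, so inserting adapters leaves the pretrained VLSM's behavior unchanged until training begins.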

Figure 1. Overview of the proposed telescopic adaptation framework highlighting depth-aware dimension allocation across vision and text branches.
Citation

BibTeX

@InProceedings{Mishra_2026_WACV,
  author    = {Mishra, Ujjwal and Shukla, Vinita and
               Hambarde, Praful and Shukla, Amit},
  title     = {Improvise, Adapt, Overcome -- Telescopic Adapters
               for Efficient Fine-tuning of Vision Language
               Models in Medical Imaging},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference
               on Applications of Computer Vision (WACV)},
  month     = {March},
  year      = {2026},
  pages     = {7605--7615}
}