WACV 2026 — Pages 7605–7615

Improvise, Adapt, Overcome —
Telescopic Adapters for Efficient Fine-tuning
of Vision Language Models in Medical Imaging

Ujjwal Mishra1 Vinita Shukla1 Praful Hambarde1 Amit Shukla1

1Centre for Artificial Intelligence and Robotics, Indian Institute of Technology Mandi, India

Highlights

613k trainable parameters: only 0.4% of CLIPSeg's 150M
244× fewer parameters than end-to-end fine-tuning
State-of-the-art: best PEFT performance across five medical benchmarks
Abstract
Adapting Vision Language Segmentation Models (VLSMs) to medical imaging domains incurs significant computational overhead with conventional fine-tuning. We introduce Telescopic Adapters, a novel PEFT framework that employs depth-aware scaling to progressively increase adapter capacity from shallow to deep transformer layers. Using only 613k trainable parameters (244× fewer than end-to-end fine-tuning), Telescopic Adapters achieve superior performance across five diverse medical datasets spanning polyp segmentation, skin lesion detection, and breast ultrasound imaging.
Method

Telescopic Adapter Framework

The core insight is a depth-aware telescopic scaling strategy: rather than applying uniform adapter widths, we assign smaller bottleneck dimensions to early layers and progressively increase capacity toward deeper layers.
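The depth-aware allocation can be sketched in a few lines. Below is a minimal NumPy illustration; the linear schedule, the bottleneck range (8 to 64), the hidden size of 768, and the zero-initialized up-projection are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

def telescopic_dims(num_layers, d_min, d_max):
    """Grow the adapter bottleneck width linearly from shallow to deep layers.
    (Illustrative schedule; the paper's exact allocation may differ.)"""
    return [int(round(d)) for d in np.linspace(d_min, d_max, num_layers)]

class Adapter:
    """Standard bottleneck adapter: down-project, ReLU, up-project, residual add."""
    def __init__(self, d_model, d_bottleneck, rng):
        self.W_down = rng.standard_normal((d_model, d_bottleneck)) * 0.02
        # Zero-initialized up-projection: the adapter starts as an identity map.
        self.W_up = np.zeros((d_bottleneck, d_model))

    def __call__(self, x):
        h = np.maximum(x @ self.W_down, 0.0)  # down-project + ReLU
        return x + h @ self.W_up              # up-project + residual connection

# One adapter per transformer layer, with telescopically increasing widths.
rng = np.random.default_rng(0)
dims = telescopic_dims(num_layers=12, d_min=8, d_max=64)
adapters = [Adapter(d_model=768, d_bottleneck=d, rng=rng) for d in dims]
```

Because the up-projection starts at zero, each adapter is an identity map at initialization, so inserting adapters leaves the pretrained VLSM's behavior unchanged until training begins.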

Figure 1. Overview of the proposed telescopic adaptation framework highlighting depth-aware dimension allocation across vision and text branches.
Citation

BibTeX

@InProceedings{Mishra_2026_WACV,
  author    = {Mishra, Ujjwal and Shukla, Vinita and
               Hambarde, Praful and Shukla, Amit},
  title     = {Improvise, Adapt, Overcome -- Telescopic Adapters
               for Efficient Fine-tuning of Vision Language
               Models in Medical Imaging},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference
               on Applications of Computer Vision (WACV)},
  month     = {March},
  year      = {2026},
  pages     = {7605--7615}
}