Explainable Semantic Segmentation Of Organs In Laparoscopic Hysterectomy Using Transfer Learning, Ensembles, And Vision Transformers
DOI: https://doi.org/10.64252/n9dwkx40

Keywords: Laparoscopic Hysterectomy; Deep Learning; Semantic Segmentation; Transfer Learning; Ensemble Learning; Vision Transformer; Explainable Artificial Intelligence.

Abstract
Purpose: To develop and evaluate a clinically oriented semantic-segmentation framework that delineates the ureter, uterine artery, and pelvic nerves in laparoscopic hysterectomy.

Methods: We analyzed the publicly available, de-identified UD Ureter–Uterine Artery–Nerve dataset (586 RGB images) with expert multiclass masks. Images were resized to 128×128 and intensities normalized. We trained six transfer-learning U-Net backbones (VGG16, ResNet50, MobileNetV2, EfficientNet-B0, Inception-ResNetV2, MultiResUNet), a Vision Transformer (ViT), and a weighted soft-probability ensemble of OrganFocus U-Net, a baseline U-Net, and the ViT. Augmentation (flips, small rotations, mild intensity changes, elastic deformations) was applied to the training split only; all augmented variants of an image remained within the same split. An 80/10/10 image-level split (train/validation/test) with class balance was used. The primary metric was mean Intersection-over-Union (mIoU). Ninety-five percent confidence intervals (95% CI) were computed by non-parametric bootstrap over test images (≥5,000 resamples; percentile method).

Results: The best transfer-learning backbone (VGG16) achieved mIoU 76.85% (95% CI: 73.71–79.76%). The ViT achieved 84.00% (95% CI: 81.52–85.35%). The weight-optimized ensemble reached 86.57% (95% CI: 84.49–87.99%). Grad-CAM heatmaps showed anatomically coherent focus across models.

Conclusions: Combining complementary inductive biases from convolutional encoders and Transformers via a weighted ensemble yields high-accuracy segmentation on clinically acquired laparoscopic images, while Grad-CAM supports case-level interpretability. Future work will profile compute/throughput and validate on multi-institutional data and surgical videos.
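As an illustration of the weighted soft-probability ensemble described above, the sketch below averages the per-pixel class-probability maps of the three member models with non-negative weights and takes a per-pixel argmax. The function name, the example weight values, and the assumption that each model emits softmax maps of shape (N, H, W, C) are ours; the actual weight optimization on the validation split is not reproduced here.

```python
import numpy as np

def ensemble_predict(prob_maps, weights):
    """Weighted soft-probability ensemble of segmentation models.

    prob_maps: list of arrays, each of shape (N, H, W, C) holding one model's
               per-pixel softmax probabilities over the C classes.
    weights:   one non-negative weight per model (normalized internally).
    Returns hard per-pixel labels of shape (N, H, W).
    """
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()                                    # convex combination of probability maps
    blended = sum(wi * p for wi, p in zip(w, prob_maps))
    return np.argmax(blended, axis=-1)                 # per-pixel class decision

# Hypothetical usage with the three member models (OrganFocus U-Net, baseline U-Net, ViT):
# labels = ensemble_predict([p_organfocus, p_unet, p_vit], weights=[0.4, 0.2, 0.4])
```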

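The abstract reports mIoU with percentile-method bootstrap 95% CIs computed over test images (≥5,000 resamples). A minimal sketch of that evaluation, assuming integer label maps of shape (N, H, W) and four classes (background plus the three structures), could look like the following; the helper names and the random seed are illustrative.

```python
import numpy as np

def mean_iou(y_true, y_pred, num_classes=4):
    """Mean IoU over classes for integer label maps of shape (N, H, W);
    classes absent from both prediction and ground truth are skipped."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(y_true == c, y_pred == c).sum()
        union = np.logical_or(y_true == c, y_pred == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

def bootstrap_miou_ci(y_true, y_pred, num_classes=4, n_boot=5000, alpha=0.05, seed=0):
    """Percentile-method bootstrap CI for mIoU, resampling whole test images."""
    rng = np.random.default_rng(seed)
    n = y_true.shape[0]
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)               # resample images with replacement
        stats.append(mean_iou(y_true[idx], y_pred[idx], num_classes))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```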

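For the interpretability claim, one common way to produce Grad-CAM heatmaps for a Keras segmentation model is sketched below: the class "score" is taken as the sum of one class's probability map and differentiated with respect to a chosen convolutional layer. The layer name and the score aggregation are assumptions for illustration, not details taken from the paper.

```python
import tensorflow as tf

def grad_cam(model, image, class_index, conv_layer_name="last_encoder_conv"):
    """Grad-CAM heatmap for one class of a segmentation model (sketch).

    image: single input of shape (H, W, 3); conv_layer_name is a hypothetical
    placeholder for the last convolutional layer of the encoder.
    """
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])   # add batch dimension
        score = tf.reduce_sum(preds[..., class_index])   # aggregate the class probability map
    grads = tape.gradient(score, conv_out)               # d(score) / d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))         # global-average-pool the gradients
    cam = tf.reduce_sum(weights[:, None, None, :] * conv_out, axis=-1)[0]
    cam = tf.nn.relu(cam)                                 # keep only positive evidence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()    # normalize to [0, 1]
```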


