Comparison of deep learning models for real-time neural tissue segmentation in spinal endoscopy

Bibliographic Details
Published in: BMC medical imaging, Vol. 25, No. 1, Article 470 (14 pages)
Main Authors: Rhee, Wounsuk, Lee, Hyung Rae, Chang, Bong-Soon, Chang, Sam Yeol, Kim, Hyoungmin
Format: Journal Article
Language:English
Published: London: BioMed Central, 17.11.2025
ISSN: 1471-2342
Description
Summary: Background In biportal endoscopic spine surgery (BESS), accurately identifying neural structures, mainly the spinal nerve roots, dural sac, and cauda equina, is crucial for preventing dural tears and achieving optimal clinical outcomes. Despite the growing popularity of deep learning in biomedical image processing, its application to BESS has not yet been well established. We propose a two-stage framework for real-time neural tissue segmentation from spinal endoscopy video and compare several deep learning architectures. Methods A total of 6410 intraoperative images from 28 patients were collected and split at the patient level into 4661 training and 1749 test images; of these, 2307 and 635 images, respectively, contained neural tissue. First, a lightweight image classifier that determines the presence of neural tissue was developed. Then, six variants of the U-Net family and two SegFormer models were trained for neural tissue segmentation. Ground truth segmentation masks were generated by a spine specialist with more than four years of experience. AUROC and DSC on the test set were the primary outcome measures for the classification and segmentation models, respectively; computational burden was also measured, followed by qualitative assessment of the output predictions. Results ResNet-18 achieved the highest test AUROC of 0.92 (95% CI: 0.91–0.93) at a mean inference time of 1.2 ms per image, outperforming MobileNetV3-Large in all performance metrics (p < 0.001) and in computational efficiency. AR2U-Net exhibited the highest test DSC of 0.80 (95% CI: 0.78–0.81), IoU of 0.70 (95% CI: 0.68–0.72), and AUPRC of 0.88 (95% CI: 0.87–0.89). The test performance metrics of the other U-Net variants did not differ significantly, except for AUPRC (p < 0.001), while those of SegFormer-B0 and SegFormer-B1 were significantly inferior (p < 0.001). U-Net variants generally ran at a reasonable speed (25–40 FPS), with U-Net and AU-Net exhibiting inference times of less than 30 ms. Conclusions We developed and compared deep learning models for neural tissue segmentation in spinal endoscopy and explored their potential for clinical application to real-time surgical video streams. Clinical trial number Not applicable.
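The Dice similarity coefficient (DSC) and IoU reported in the abstract are standard overlap metrics between a predicted and a ground-truth binary mask. The sketch below shows one common way to compute them; it is a generic illustration, not the authors' evaluation code, and the function name and epsilon smoothing are assumptions.

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Compute DSC and IoU between two binary segmentation masks.

    DSC = 2|P ∩ T| / (|P| + |T|);  IoU = |P ∩ T| / |P ∪ T|.
    `eps` guards against division by zero when both masks are empty.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    dice = (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    iou = (intersection + eps) / (np.logical_or(pred, target).sum() + eps)
    return float(dice), float(iou)
```

The two metrics are monotonically related (IoU = DSC / (2 − DSC)), which is consistent with the reported values: a DSC of 0.80 corresponds to an IoU of about 0.67, close to the 0.70 reported for AR2U-Net when averaged per image.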
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
ISSN: 1471-2342
DOI:10.1186/s12880-025-01918-4