Validation of an automated, artificial-intelligence-based system for grading radiological features of degeneration on MRIs of the lumbar spine — The International Society for the Study of the Lumbar Spine

Validation of an automated, artificial-intelligence-based system for grading radiological features of degeneration on MRIs of the lumbar spine (#75)

Alexandra Grob 1 , Markus Loibl 1 , Amir Jamaludin 2 , Jeremy CT Fairbank 3 , Sebastian Winklhofer 4 , Tamas F Fekete 1 , Daniel Haschtmann 1 , Frank S Kleinstück 1 , Dezsö Jeszenszky 1 , Francois Porchet 1 , Anne F Mannion 1
  1. Schulthess Klinik, Zürich, Switzerland
  2. Department of Engineering Science, University of Oxford, Oxford, UK
  3. Nuffield Department of Rheumatology, Orthopaedics and Musculoskeletal Sciences, University of Oxford, Oxford, UK
  4. Department of Neuroradiology, Clinical Neuroscience Center, University Hospital Zurich, Zurich, Switzerland

INTRODUCTION Magnetic resonance imaging (MRI) is used to detect degenerative changes of the lumbar spine. SpineNet (SN), a computer vision-based system, performs an automated analysis of degenerative features in MRI scans aiming to provide high accuracy, consistency and objectivity (1). This study aimed to externally validate SN’s ratings against those of an expert radiologist.

METHODS MRIs of 882 patients (mean age, 72 ± 8.8 years) with degenerative spinal disorders, drawn from two previous trials carried out in our spine center between 2011 and 2019, were analyzed by an expert radiologist. Each lumbar segment (L1/2 – L5/S1) was graded for Pfirrmann grade (PG), spondylolisthesis (SL) and central canal stenosis (CCS). SN’s ratings for the same parameters were generated. Agreement between the radiologist and SN was assessed using kappa coefficients, Spearman correlation coefficients (Rho) and class average accuracy (CAA).
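The three agreement statistics named above can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: kappa is computed here as unweighted Cohen's kappa (the abstract does not say whether a weighted variant was used), CAA is taken to be the mean of the per-class accuracies with the radiologist's grade as the reference, and the grades in the example are invented for illustration rather than taken from the study.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length lists of labels (assumed variant)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement from the two raters' marginal label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

def class_average_accuracy(reference, predicted):
    """Mean of per-class accuracies, one class per reference grade (assumed definition)."""
    per_class = []
    for c in set(reference):
        idx = [i for i, r in enumerate(reference) if r == c]
        per_class.append(sum(predicted[i] == c for i in idx) / len(idx))
    return sum(per_class) / len(per_class)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(a, b):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5

# Invented Pfirrmann-style grades (1-5) for eight segments; not study data.
radiologist = [1, 2, 2, 3, 3, 4, 4, 5]
system = [1, 2, 3, 3, 3, 4, 5, 5]

print(round(cohen_kappa(radiologist, system), 3))             # 0.686
print(round(class_average_accuracy(radiologist, system), 3))  # 0.8
print(round(spearman_rho(radiologist, system), 3))
```

Because CAA averages over grades rather than over segments, a rarely assigned grade counts as much as a common one, which is one reason it can be much lower than overall accuracy when grades are unevenly distributed.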

RESULTS In total, 4410 lumbar segments were analyzed. Depending on the vertebral level, kappa statistics showed moderate to substantial agreement between the radiologist and SN for PG (kappa 0.63-0.77 [all vertebral levels together, 0.72]; CAA 45-68% [all levels together, 55%]; Rho 0.64-0.79 [all levels together, 0.72]); slight to substantial agreement for SL (kappa 0.07-0.60 [all levels together, 0.63]; CAA 47-57% [all levels together, 56%]; Rho 0.13-0.64 [all levels together, 0.36]); and slight to substantial agreement for CCS (kappa 0.17-0.57 [all levels together, 0.60]; CAA 35-41% [all levels together, 43%]; Rho 0.32-0.74 [all levels together, 0.57]). Considering all vertebral levels together, SN indicated more severe disc degeneration (PG), but less severe SL and CCS, than did the radiologist (p<0.01).
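The descriptive terms used for kappa ("slight", "moderate", "substantial") match the widely used Landis and Koch bands. The abstract does not name the scale it applied, so the mapping below is an assumption, and the helper name `agreement_label` is hypothetical.

```python
def agreement_label(kappa):
    """Landis and Koch descriptive label for a kappa value (assumed scale)."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")

# All-levels kappa values reported in the results.
for feature, kappa in [("PG", 0.72), ("SL", 0.63), ("CCS", 0.60)]:
    print(feature, kappa, agreement_label(kappa))
```

Under this assumed scale, the all-levels values for PG and SL fall in the "substantial" band and the value for CCS sits at the upper edge of "moderate".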

DISCUSSION SN appears to be a robust and reliable tool for grading degenerative features such as PG, SL and CCS on lumbar MRIs. It showed moderate to substantial agreement with the current gold standard, the radiologist, and its kappa values were comparable to those reported in the literature for inter-rater reliability between two radiologists. It represents an effective and efficient alternative for analyzing MRIs from large cohorts for diagnostic and research purposes.

(1) Jamaludin, A., et al., ISSLS PRIZE IN BIOENGINEERING SCIENCE 2017: Automation of reading of radiological features from magnetic resonance images (MRIs) of the lumbar spine without human intervention is comparable with an expert radiologist. Eur Spine J, 2017. 26(5): p. 1374-1383.

#ISSLS2022