Comparative analysis of natural language processing methodologies for classifying computed tomography enterography reports in Crohn's disease patients

PMID: 40442294
Source: NPJ Digit Med
Publication date: 2025-05-29
Year: 2025

Abstract

Imaging is crucial to assess disease extent, activity, and outcomes in inflammatory bowel disease (IBD). Artificial intelligence (AI) image interpretation requires automated exploitation of studies at scale as an initial step. Here we evaluate natural language processing to classify Crohn's disease (CD) on CTE. From our population representative IBD registry a sample of CD patients (male: 44.6%, median age: 50 IQR37-60) and controls (n = 981 each) CTE reports were extracted and split into training- (n = 1568), development- (n = 196), and testing (n = 198) datasets each with around 200 words and balanced numbers of labels, respectively. Predictive classification was evaluated with CNN, Bi-LSTM, BERT-110M, LLaMA-3.3-70B-Instruct and DeepSeek-R1-Distill-LLaMA-70B. While our custom IBDBERT finetuned on expert IBD knowledge (i.e. ACG, AGA, ECCO guidelines), outperformed rule- and rationale extraction-based classifiers (accuracy 88.6% with pre-tuning learning rate 0.00001, AUC 0.945) in predictive performance, LLaMA, but not DeepSeek achieved overall superior results (accuracy 91.2% vs. 88.9%, F1 0.907 vs. 0.874).