Original Article
Automated Clinical Questionnaire Processing in Spine Surgery Using LLM Vision API: Comparative Performance Evaluation of Claude and GPT Models
Sang-Min Park, Jiwon Park, Ho-Joong Kim, Jin S. Yeom
J Adv Spine Surg 2025;15(2):71-77. Published online December 31, 2025
DOI: https://doi.org/10.63858/jass.15.2.71
Purpose
This study evaluates the performance of Claude and GPT LLM Vision APIs for automated clinical questionnaire processing in spine surgery by comparing accuracy, efficiency, reproducibility, and cost-effectiveness.
Methods
Clinical questionnaires from 56 patients (336 total pages) were processed using a Python 3.12-based system that combined PDF preprocessing, image enhancement via OpenCV, and direct LLM Vision analysis. Both models were evaluated on 26 questionnaire items (1,456 data points) for accuracy, processing time, token utilization, and reproducibility, the last assessed with the intraclass correlation coefficient (ICC) across three independent iterations.
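As an illustration of the reproducibility measure, the following is a minimal sketch of an ICC computed over repeated runs. The abstract does not state which ICC form the authors used; the one-way random-effects ICC(1,1) is assumed here, and the function name `icc_one_way` and the sample ratings are hypothetical.

```python
from statistics import mean


def icc_one_way(ratings):
    """One-way random-effects ICC(1,1).

    `ratings` holds one row per measured target (e.g. one questionnaire
    item for one patient) and one column per repeated run.
    The ICC form is an assumption; the source does not specify it.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 targets and 2 runs, all rows equal length")

    grand_mean = mean(value for row in ratings for value in row)
    row_means = [mean(row) for row in ratings]

    # Between-target and within-target sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum(
        (value - m) ** 2 for row, m in zip(ratings, row_means) for value in row
    )

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_between - ms_within) / denominator


# Hypothetical extracted scores: three targets, three runs each.
identical_runs = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
one_disagreement = [[1, 1, 2], [2, 2, 2], [3, 3, 3]]

print(icc_one_way(identical_runs))
print(icc_one_way(one_disagreement))
```

Runs that agree exactly on every target give an ICC of 1, and any disagreement between runs lowers the value.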
Results
GPT achieved 98.83% accuracy (1,439/1,456) compared with Claude's 97.94% (1,426/1,456). Both models processed each questionnaire set in 27 seconds, a 68% time reduction versus manual entry (85 seconds). GPT showed a 59% cost advantage ($0.023 vs. $0.056 per questionnaire), while Claude showed superior reproducibility (ICC 0.98 vs. 0.96). GPT achieved 100% accuracy on 21 items versus 17 items for Claude. Error analysis attributed most errors to handwriting recognition (52%) and image quality issues (28%), and 89% of errors were successfully flagged for review.
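The reported percentages follow directly from the raw counts, times, and costs given above. A short sketch of that arithmetic is shown here; the variable names and the helper `pct` are illustrative and do not come from the article.

```python
# Figures taken from the abstract; names are illustrative only.
PATIENTS = 56
ITEMS_PER_QUESTIONNAIRE = 26
TOTAL_POINTS = PATIENTS * ITEMS_PER_QUESTIONNAIRE  # 1,456 data points


def pct(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


gpt_accuracy = pct(1439, TOTAL_POINTS)
claude_accuracy = pct(1426, TOTAL_POINTS)

# Time saved per questionnaire set relative to manual entry (85 s vs. 27 s).
time_reduction = pct(85 - 27, 85)

# Cost saved per questionnaire by GPT relative to Claude ($0.023 vs. $0.056).
cost_advantage = pct(0.056 - 0.023, 0.056)

print(f"GPT accuracy:    {gpt_accuracy:.2f}%")
print(f"Claude accuracy: {claude_accuracy:.2f}%")
print(f"Time reduction:  {time_reduction:.0f}%")
print(f"Cost advantage:  {cost_advantage:.0f}%")
```

The same counts imply 17 incorrect data points for GPT and 30 for Claude out of 1,456.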
Conclusions
Both models achieve clinical-grade performance exceeding 90% accuracy. GPT demonstrates superior accuracy and cost-effectiveness, while Claude provides better reproducibility. Model selection should be guided by institutional priorities regarding accuracy, reproducibility, and operational scale.