Background: After definitive radiation therapy (RT) for head and neck (HN) cancer, 18F-FDG PET/CT is typically used to detect residual disease. Oncology clinicians must rely on a combination of personal imaging review and interpretation of freeform nuclear medicine (NM) reports to determine whether residual disease is present and to manage the patient accordingly. Multiple studies have demonstrated significant variability both in the interpretation of imaging results and in how findings are communicated and interpreted within radiology reports. The aim of our study was to assess variability in the interpretation of freeform PET/CT radiology reports and to compare this with a standardized NM interpretation of post-RT PET/CT scans for patients treated with RT for HN cancer.
Hypothesis: We hypothesize that interrater reliability (IRR) among clinicians is low, and that agreement between clinicians and NM specialists is low, when interpreting post-RT PET/CT scans for HN cancer patients. We further hypothesize that a systematic scale for interpreting PET/CT imaging is better associated with disease outcomes than freeform radiology reports.
Methods: We identified 176 patients treated with RT for squamous cell carcinoma of the HN. The freeform PET/CT report was abstracted from the medical record. Four blinded radiation oncologists reviewed the PET/CT report and scored their interpretation using a validated scale (1: no residual disease, 2: indeterminate, 3: residual disease, 4: progressive disease). Scores 1-2 were considered negative; scores 3-4 were considered positive for disease. In cases of disagreement, a clinician consensus score was generated to compare with the true value, a score generated by faculty NM review of the PET/CT images. IRR was assessed using percent agreement and the kappa statistic, and tests of diagnostic accuracy were performed. Overall survival (OS), progression-free survival (PFS), and locoregional failure (LRF) were compared across clinician and NM scoring.
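The agreement and accuracy measures named in the Methods can be illustrated with a minimal Python sketch. It rests on several assumptions: the scores are invented toy data, not study data; the kappa shown is the two-rater Cohen's kappa (the abstract does not say which kappa variant was used for the four readers); and the function names are hypothetical. The choice of reference standard here is also only illustrative.

```python
from collections import Counter

def dichotomize(score):
    """Binary call from the 4-point scale: scores 1-2 negative, 3-4 positive."""
    return score >= 3

def percent_agreement(a, b):
    """Fraction of cases on which two raters give the same call."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Two-rater Cohen's kappa: observed agreement corrected for chance."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal category frequencies.
    p_e = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / n**2
    return (p_o - p_e) / (1 - p_e)

def diagnostic_accuracy(test, truth):
    """Sensitivity, specificity, PPV, and NPV of binary calls vs. a reference."""
    tp = sum(t and r for t, r in zip(test, truth))
    fp = sum(t and not r for t, r in zip(test, truth))
    fn = sum(not t and r for t, r in zip(test, truth))
    tn = sum(not t and not r for t, r in zip(test, truth))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Toy scores (assumed for illustration only, not study data).
reader_scores = [1, 2, 3, 4, 1, 3, 2, 1, 4, 2]
reference_scores = [1, 1, 3, 4, 2, 2, 3, 1, 4, 1]

reader = [dichotomize(s) for s in reader_scores]
reference = [dichotomize(s) for s in reference_scores]

agreement = percent_agreement(reader, reference)
kappa = cohen_kappa(reader, reference)
accuracy = diagnostic_accuracy(reader, reference)

print(f"agreement={agreement:.3f} kappa={kappa:.3f}")
print(accuracy)
```

With more than two raters, a multi-rater statistic such as Fleiss' kappa would be the usual substitute for the pairwise version sketched here.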
Results: Among clinicians, the percent agreement and kappa statistic were 65.3% and 0.682 (indicating substantial agreement), respectively. Sensitivity, specificity, PPV, and NPV of the consensus clinician interpretation were 84.1%, 57.9%, 19.8%, and 96.7%, respectively. PFS and LRF were associated with clinician consensus score (p<0.05), but OS was not. There was 63.7% agreement between clinician consensus and NM specialist interpretations (kappa = 0.365, indicating fair agreement). Diagnostic accuracy of the NM interpretation was as follows: sensitivity 85.0%, specificity 44.4%, PPV 15.9%, and NPV 96.0%. NM specialist score was strongly associated with OS, PFS, and LRF and provided better discrimination between negative (score 1-2) and positive (score 3-4) results. For NM scores 1, 2, 3, and 4, respectively, 3-year OS rates were 92%, 91%, 72%, and 67% (p=0.002); 3-year PFS rates were 85%, 88%, 58%, and 54% (p<0.001); and 3-year LRF rates were 9%, 6%, 30%, and 31% (p=0.002).
Conclusions: This study showed substantial IRR among clinicians in the interpretation of freeform post-RT PET/CT reports. However, clinician interpretations showed only fair agreement with the NM interpretation of the actual images. N. 48
Source of mentor’s funding or other support that funded this research: Funding for this research was provided by Kulynuch Family Funds for Medical Research in Honor of Timothy C. Pennell, MD.