Interobserver variability impairs radiologic grading of primary graft dysfunction after lung transplantation

Schwarz S, Muckenhuber M, Benazzo A, Beer L, Gittler F, Prosch H, Röhrich S, Milos R, Schweiger T, Jaksch P, Klepetko W, Hoetzenecker K. Medical University of Vienna, Vienna, Austria. Electronic address: konrad.hoetzenecker@meduniwien.ac.at.

J Thorac Cardiovasc Surg. 2019 Sep;158(3):955-962.e1. doi: 10.1016/j.jtcvs.2019.02.134. Epub 2019 May 11.

OBJECTIVES:
The current score for primary graft dysfunction after lung transplantation relies heavily on chest radiographs, and radiologic judgment can make the difference between the lowest (primary graft dysfunction 0) and the highest (primary graft dysfunction 3) grade. This study aimed to evaluate interobserver variability of the scoring of postoperative chest radiographs and its impact on primary graft dysfunction grades in a large single-center cohort.

METHODS:
We retrospectively analyzed 497 lung transplantations performed between January 2010 and July 2016 at the Medical University of Vienna. Five trained thoracic radiologists were asked to independently examine postoperative chest radiographs performed at 0 to 6 hours, 24 hours, 48 hours, and 72 hours after arrival at the intensive care unit. Interobserver variability was calculated using Fleiss’ kappa (κ) statistics.
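As an illustration of the agreement statistic named above, the following is a minimal sketch of Fleiss' kappa for a fixed number of raters per radiograph. It is not the authors' analysis code. The function name `fleiss_kappa`, the binary labels (1 = infiltrates reported, 0 = not reported), and the example readings are assumptions made for demonstration only.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of subjects, each rated by the same number of raters.

    ratings    -- list of per-subject lists of labels (hypothetical input layout)
    categories -- all labels a rater may assign
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    # counts[i][j] = number of raters who assigned category j to subject i
    counts = []
    for subject in ratings:
        tally = Counter(subject)
        counts.append([tally[c] for c in categories])

    # Observed agreement: mean over subjects of the share of agreeing rater pairs
    p_i = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall proportion of each category
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical readings: 4 radiographs, 5 radiologists each (not study data)
example = [
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
]
kappa = fleiss_kappa(example, categories=[0, 1])
```

On the commonly used Landis and Koch scale, values between 0.41 and 0.60 are read as moderate agreement.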

RESULTS:
A total of 1988 chest radiographs were evaluated. Consensus among all 5 radiologists was found in only 826 cases (41.5%). At 0 to 6 hours and 24 hours, only moderate agreement was found among the 5 radiologists (κ = 0.456 at both time points), and agreement was even lower at 48 and 72 hours (κ = 0.405 and κ = 0.409, respectively). To quantify the effect of this high interobserver variability, best-case and worst-case scenarios were calculated, yielding primary graft dysfunction 3 rates of 8.4% versus 28.4% at 0 to 6 hours, 1.8% versus 4.8% at 24 hours, 2.0% versus 5.3% at 48 hours, and 0.2% versus 3.1% at 72 hours. A high recipient body mass index and size-reduced transplants were associated with higher interobserver variability.
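The abstract does not state how the best-case and worst-case scenarios were derived. One plausible reading is sketched below under clearly labeled assumptions: in the best case a radiograph counts as positive only if all readers report infiltrates, and in the worst case it counts as positive if any reader does. The grading function is a simplified version of the consensus definition (grade 0 without infiltrates, otherwise graded by the PaO2/FiO2 ratio) and ignores special cases such as extracorporeal support. The names `pgd_grade` and `pgd3_rate_bounds` and the example cases are hypothetical.

```python
def pgd_grade(infiltrates, pf_ratio):
    """Simplified PGD grade from a radiographic call and the PaO2/FiO2 ratio."""
    if not infiltrates:
        return 0
    if pf_ratio > 300:
        return 1
    if pf_ratio >= 200:
        return 2
    return 3


def pgd3_rate_bounds(cases):
    """Return (best_case_rate, worst_case_rate) of PGD grade 3.

    cases -- list of (reader_calls, pf_ratio); reader_calls is a list of booleans,
             one per radiologist (assumed input layout, not from the paper)
    """
    n = len(cases)
    # Best case: infiltrates accepted only when every reader reports them
    best = sum(pgd_grade(all(calls), pf) == 3 for calls, pf in cases)
    # Worst case: infiltrates accepted as soon as a single reader reports them
    worst = sum(pgd_grade(any(calls), pf) == 3 for calls, pf in cases)
    return best / n, worst / n


# Hypothetical cases (not study data)
cases = [
    ([True, True, True, True, True], 150),      # unanimous, low P/F
    ([True, False, True, True, True], 120),     # disputed, low P/F
    ([False, False, False, False, False], 100), # no infiltrates reported
    ([True, True, True, True, True], 350),      # unanimous, preserved P/F
]
best_rate, worst_rate = pgd3_rate_bounds(cases)
```

Only disputed radiographs in patients with a PaO2/FiO2 ratio below 200 move between the two bounds, which is why a single radiologic call can separate grade 0 from grade 3.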

CONCLUSIONS:
The substantial interobserver variability found in this retrospective analysis underlines the difficulty of adequately grading post-transplant organ function. Future revisions of the primary graft dysfunction grading should take this problem into consideration.

Copyright © 2019 The American Association for Thoracic Surgery. Published by Elsevier Inc. All rights reserved.

KEYWORDS:
interobserver variability; lung transplantation; primary graft dysfunction

PMID: 31204131