AutoPET Challenge on Fully Automated Lesion Segmentation in Oncologic PET/CT Imaging, Part 2: Domain Generalization.

Dexl J., Gatidis S., Früh M., Jeblick K., Mittermeier A., Stüber AT., Schachtner B., Topalis J., Fabritius MP., Gu S., Murugesan GK., VanOss J., Ye J., He J., Alloula A., Papież BW., Mesbah Z., Modzelewski R., Hadlich M., Marinov Z., Stiefelhagen R., Isensee F., Maier-Hein KH., Galdran A., Nikolaou K., la Fougère C., Kim M., Kallenberg N., Kleesiek J., Herrmann K., Werner R., Ingrisch M., Cyran CC., Küstner T.

This article reports the results of the second iteration of the autoPET challenge on automated lesion segmentation in whole-body PET/CT, held in conjunction with the 26th International Conference on Medical Image Computing and Computer Assisted Intervention in 2023. In contrast to the first autoPET challenge, which served as a proof of concept, this study investigates whether machine learning-based segmentation models trained on data from a single source can maintain performance across clinically relevant variations in PET/CT data, reflecting the demands of real-world deployment.

Methods: A comprehensive biomedical segmentation challenge on PET/CT domain generalization was designed and conducted. Participants were tasked with training machine learning models on annotated whole-body 18F-FDG data (n = 1,014). These models were then evaluated on a test set of 200 samples from 5 clinically relevant domains, including variations in institutions, pathologies, and populations and a different tracer. Performance was measured in terms of average Dice similarity coefficient, average false-positive volume, and average false-negative volume. Awards were given to the best-performing teams in 3 categories. Furthermore, a detailed analysis was conducted after the challenge, examining results across domains and unique instances, along with a ranking analysis.

Results: Generalization from a single-source domain remains a significant challenge. Seventeen international teams successfully participated in the challenge. The best-performing team reached an average Dice similarity coefficient of 0.5038, a mean false-positive volume of 87.8388 mL, and a mean false-negative volume of 8.4154 mL on the test set. nnU-Net was the most commonly used framework, with most participants using a 3-dimensional U-Net. Despite competitive in-domain results, out-of-domain performance deteriorated substantially, particularly on pediatric and prostate-specific membrane antigen data. Detailed error analysis revealed frequent false positives due to physiologic uptake and decreased sensitivity in detecting small or low-uptake lesions. A majority-vote ensemble offered minimal performance gains, whereas an oracle ensemble indicated hypothetical gains. Ranking analysis showed that no single team consistently outperformed all others across ranking schemes.

Conclusion: The second autoPET challenge provides a comprehensive evaluation of the current state of automated PET/CT tumor segmentation, highlighting both progress and persistent challenges of single-source domain generalization and the need for diverse public datasets to enhance algorithm robustness.
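The three reported metrics can be illustrated with a minimal sketch. The abstract does not spell out how the false-positive and false-negative volumes are computed, so the code below assumes a connected-component definition: false-positive volume is the volume of predicted components that touch no ground-truth voxel, and false-negative volume is the volume of ground-truth components that touch no predicted voxel. The function names, the 6-connectivity choice, the set-of-voxels representation, and the convention that two empty masks give a Dice score of 1.0 are all assumptions made for illustration, not the challenge's evaluation code.

```python
from collections import deque

# 6-connectivity: voxels sharing a face are neighbors (an assumption).
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(pred: set, truth: set) -> float:
    """Dice similarity coefficient between two sets of (z, y, x) voxels."""
    if not pred and not truth:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def components(mask: set) -> list:
    """Split a voxel set into 6-connected components via breadth-first search."""
    remaining = set(mask)
    found = []
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBORS:
                neighbor = (z + dz, y + dy, x + dx)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        found.append(component)
    return found


def unmatched_volume_ml(mask: set, reference: set, voxel_ml: float) -> float:
    """Volume of components in `mask` that share no voxel with `reference`."""
    voxels = sum(len(c) for c in components(mask) if not (c & reference))
    return voxels * voxel_ml


def false_positive_volume_ml(pred: set, truth: set, voxel_ml: float) -> float:
    return unmatched_volume_ml(pred, truth, voxel_ml)


def false_negative_volume_ml(pred: set, truth: set, voxel_ml: float) -> float:
    return unmatched_volume_ml(truth, pred, voxel_ml)


# Toy case: one lesion partly found, one lesion missed, one spurious prediction.
truth = {(0, 0, 0), (0, 0, 1), (0, 5, 5)}
pred = {(0, 0, 1), (0, 0, 2), (3, 3, 3)}
voxel_ml = 0.5  # hypothetical voxel volume in mL

print(round(dice(pred, truth), 4))                      # 0.3333
print(false_positive_volume_ml(pred, truth, voxel_ml))  # 0.5
print(false_negative_volume_ml(pred, truth, voxel_ml))  # 0.5
```

Under this definition, a prediction that only partly covers a lesion lowers the Dice score but adds nothing to either volume, which is why the three metrics capture different failure modes. For full-size volumes, an array-based labeling routine such as `scipy.ndimage.label` would be a more practical choice than the pure-Python search used here.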

DOI

10.2967/jnumed.125.270260

Type

Journal article

Publication Date

2026-03-02

Pages

481 - 488

Keywords

PET/CT, biomedical image analysis challenge, deep learning, domain generalization, oncology, segmentation, Positron Emission Tomography Computed Tomography, Humans, Image Processing, Computer-Assisted, Automation, Neoplasms, Machine Learning, Fluorodeoxyglucose F18, Male
