BACKGROUND: Reliable classification of ischaemic stroke (IS) aetiological subtypes is required in research and clinical practice, but the predictive properties of these subtypes in population studies with incomplete investigations are poorly understood.

AIMS: To compare the prognosis of aetiologically classified IS subtypes and to use machine learning (ML) to classify incompletely investigated IS cases.

METHODS: In a 9-year follow-up of a prospective study of 512,726 Chinese adults, 22,216 incident IS cases, confirmed by clinical adjudication of medical records, were assigned subtypes using a modified Causative Classification System for Ischemic Stroke (CCS): large artery atherosclerosis (LAA), small artery occlusion (SAO), cardioaortic embolism (CE), or undetermined aetiology. Cases were also classified by CCS as "evident", "probable", or "possible" IS cases. For incompletely investigated IS cases where CCS yielded an undetermined aetiology, an ML model was developed to predict IS subtypes from baseline risk factors and screening for cardioaortic sources of embolism. The 5-year risks of subsequent stroke and all-cause mortality (measured using cumulative incidence functions and 1 minus Kaplan-Meier estimates, respectively) for the ML-predicted IS subtypes were compared with those for the aetiologically classified IS subtypes.

RESULTS: Among 7,443 IS cases with evident or probable aetiology, 66% had SAO, 32% had LAA and 2% had CE, but the relative proportions of SAO and LAA cases varied by region in China. CE had the highest rates of subsequent stroke and mortality (43.5% and 40.7%, respectively), followed by LAA (43.2% and 17.4%) and SAO (38.1% and 11.1%). ML provided classifications for cases with undetermined aetiology and incomplete clinical data (24% of all IS cases; n=5,276), with areas under the curve (AUC) of 0.99 (0.99-1.00) for CE, 0.67 (0.64-0.70) for LAA, and 0.70 (0.67-0.73) for SAO in unseen cases. ML-predicted IS subtypes yielded subsequent stroke and all-cause mortality rates comparable to those of the aetiologically classified IS subtypes.

CONCLUSIONS: This study highlighted substantial heterogeneity in the prognosis of IS subtypes and the utility of ML approaches for classifying IS cases with incomplete clinical investigations.
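
As an illustration of the "1 minus Kaplan-Meier" estimate named in the methods, the following Python sketch computes cumulative all-cause mortality at a fixed horizon. It is not the study's code: the function name `one_minus_km`, the toy follow-up data, and the tie-handling convention (deaths counted before censorings at the same time) are assumptions made for illustration only.

```python
from collections import Counter


def one_minus_km(times, events, horizon):
    """Return 1 - S(horizon), where S is the Kaplan-Meier survival estimate.

    times   -- follow-up time for each person (e.g. years since the index stroke)
    events  -- 1 if the person died at that time, 0 if censored
    horizon -- time point at which cumulative mortality is reported
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # everyone leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    for t in sorted(exits):
        if t > horizon:
            break
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: conditional probability of surviving time t.
            survival *= 1 - d / at_risk
        # Deaths and censorings at t both leave the risk set afterwards.
        at_risk -= exits[t]
    return 1 - survival


# Hypothetical toy data, not taken from the study.
toy_times = [1, 2, 2, 3, 4, 6, 6, 7]
toy_events = [1, 0, 1, 1, 0, 1, 0, 0]
risk_at_5_years = one_minus_km(toy_times, toy_events, horizon=5)
```

For the toy data, the product-limit calculation by hand is 1 - (7/8)(6/7)(4/5) = 0.4, so `risk_at_5_years` is approximately 0.4. Subsequent stroke would need a cumulative incidence function that treats death as a competing risk, which this sketch does not cover.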
Int J Stroke
Aetiology, China, Classification, Ischaemic stroke, Machine learning, Prevention