A risk model for lung cancer incidence.
Hoggart C., Brennan P., Tjonneland A., Vogel U., Overvad K., Østergaard JN., Kaaks R., Canzian F., Boeing H., Steffen A., Trichopoulou A., Bamia C., Trichopoulos D., Johansson M., Palli D., Krogh V., Tumino R., Sacerdote C., Panico S., Boshuizen H., Bueno-de-Mesquita HB., Peeters PHM., Lund E., Gram IT., Braaten T., Rodríguez L., Agudo A., Sánchez-Cantalejo E., Arriola L., Chirlaque M-D., Barricarte A., Rasmuson T., Khaw K-T., Wareham N., Allen NE., Riboli E., Vineis P.
Risk models for lung cancer incidence would be useful for prioritizing individuals for screening and for participation in clinical trials of chemoprevention. We present a risk model for lung cancer, built using prospective cohort data from a general population, that predicts individual incidence in a given time period. We built separate risk models for current and former smokers using 169,035 ever smokers from the multicenter European Prospective Investigation into Cancer and Nutrition (EPIC), and we also considered a model for never smokers. The data set was split into independent training and test sets. Lung cancer incidence was modeled using survival analysis, stratifying by age started smoking and, for former smokers, also by smoking duration. Other risk factors considered were smoking intensity, 10 occupational/environmental exposures previously implicated in lung cancer, and single-nucleotide polymorphisms at two loci identified by genome-wide association studies of lung cancer. Individual risk in the test set was measured by the predicted probability of lung cancer incidence in the year preceding last follow-up time, and predictive accuracy was measured by the area under the receiver operating characteristic curve (AUC). Using smoking information alone gave good predictive accuracy: the AUC (95% confidence interval) in ever smokers was 0.843 (0.810-0.875), whereas the Bach model applied to the same data gave an AUC of 0.775 (0.737-0.813). Other risk factors had a negligible effect on the AUC, including in never smokers, for whom prediction was poor. Our model is generalizable and straightforward to implement. Its accuracy can be attributed to its modeling of lifetime exposure to smoking.
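The abstract does not give the estimation details, so the following is only a minimal sketch of the two evaluation quantities it names, under stated assumptions. It assumes a stratified Cox proportional hazards model, in which each stratum (for example, a band of age started smoking) has its own step-function baseline cumulative hazard and covariates act through a linear predictor. Under that assumption, the probability of incidence in the year before last follow-up time, given event-free survival to the start of that year, is 1 - exp(-[H0(t) - H0(t - 1)] exp(lp)). The AUC is computed as the Mann-Whitney probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The function names, the baseline-hazard representation and the time scale in years are hypothetical and not taken from the paper.

```python
import math
from bisect import bisect_right


def cumulative_hazard(times, increments, t):
    """Hypothetical step-function baseline cumulative hazard H0(t) for one stratum.

    `times` are sorted event times (years) and `increments` the hazard jumps at
    those times; H0(t) is the sum of jumps at times <= t.
    """
    idx = bisect_right(times, t)
    return sum(increments[:idx])


def one_year_risk(times, increments, linear_predictor, t_last):
    """P(event in (t_last - 1, t_last] | event-free at t_last - 1).

    Assumes proportional hazards within the stratum whose baseline is given.
    """
    d_h = (cumulative_hazard(times, increments, t_last)
           - cumulative_hazard(times, increments, t_last - 1.0))
    return 1.0 - math.exp(-d_h * math.exp(linear_predictor))


def auc(scores, labels):
    """AUC as the Mann-Whitney probability that a case outscores a non-case.

    Ties between a case and a non-case count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy baseline for a single hypothetical stratum (illustrative numbers only).
    times, increments = [1.0, 2.0, 3.0], [0.01, 0.02, 0.03]
    print(one_year_risk(times, increments, linear_predictor=0.0, t_last=3.0))
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise AUC loop is quadratic in sample size and is written for clarity. For a test set of this size, a rank-based formula or a library routine would be the practical choice.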