Better tools for better estimates: Improving approaches to handling missing data in Swiss cancer registries

Has survival after lung cancer increased or decreased in Switzerland in the last decade? Does cancer prevalence vary between Swiss language regions? What factors determine when patients are diagnosed with colorectal cancer? Cancer registries all over the world collect information about cancer occurrence, treatment and outcomes to answer questions like these. Cancer registries, like other data collection efforts, work best when they have complete data about patient outcomes. Until recently, Swiss cancer registries were implemented at the cantonal level. Despite their efforts to ensure completeness, up to 24% of the data are incomplete, including data on vital status (whether a person has died, and if so, when), a key measure for assessing survival. Without a way to accurately fill in this information, analyses of Swiss cancer registry data yield biased, and therefore inaccurate, survival estimates for cancer patients in Switzerland, even when the incomplete cases are not recent. This makes comparisons and inferences about Swiss cancer outcomes difficult. The literature offers a number of *ad hoc* methodological solutions, but it remains unclear whether any of them provides an accurate picture of cancer survival in Switzerland. It is also unclear which methods are most appropriate for the various analysis tasks with which a registry is charged. This proposal examines the accuracy, ease of use and generalizability of a set of statistical approaches. In a first step, we will artificially remove vital status and shorten the follow-up time in a set of complete cancer registries, and then apply each approach so that they can be compared. In a second step, we will use the corrected data to perform analysis tasks typical of cancer registries and determine how sensitive the results are to the choice of correction approach.
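The first step can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the exponential survival times, the 24% missingness applied completely at random, and the two naive correction rules (`censor_at_last_contact` and `presume_alive`) are assumptions chosen purely for illustration, and all function names are hypothetical.

```python
import random


def kaplan_meier(records, horizon):
    """Kaplan-Meier survival probability at `horizon`.

    `records` is a list of (time, died) pairs; died=False means censored.
    """
    event_times = sorted({t for t, died in records if died and t <= horizon})
    survival = 1.0
    for t in event_times:
        at_risk = sum(1 for time, _ in records if time >= t)
        deaths = sum(1 for time, died in records if died and time == t)
        survival *= 1.0 - deaths / at_risk
    return survival


def simulate_complete(n, yearly_death_rate, follow_up, rng):
    """Hypothetical 'complete' registry: exponential survival times (an
    assumption); patients still alive are censored at the closing date."""
    records = []
    for _ in range(n):
        t = rng.expovariate(yearly_death_rate)
        records.append((t, True) if t <= follow_up else (follow_up, False))
    return records


def degrade(records, missing_fraction, rng):
    """Remove vital status for a random share of patients and shorten their
    follow-up to a random earlier date of last contact (assumed mechanism)."""
    degraded = []
    for t, died in records:
        if rng.random() < missing_fraction:
            degraded.append((t * rng.random(), None))  # None = status unknown
        else:
            degraded.append((t, died))
    return degraded


def censor_at_last_contact(degraded):
    """Naive rule 1: treat unknown status as censored at last contact."""
    return [(t, bool(died)) for t, died in degraded]


def presume_alive(degraded, follow_up):
    """Naive rule 2: treat unknown status as alive at the closing date."""
    return [(follow_up, False) if died is None else (t, died)
            for t, died in degraded]


rng = random.Random(42)
complete = simulate_complete(500, 0.3, 5.0, rng)
degraded = degrade(complete, 0.24, rng)

print("complete data:         ", round(kaplan_meier(complete, 5.0), 3))
print("censor at last contact:", round(
    kaplan_meier(censor_at_last_contact(degraded), 5.0), 3))
print("presume alive:         ", round(
    kaplan_meier(presume_alive(degraded, 5.0), 5.0), 3))
```

Comparing each estimate with the one from the complete data shows how far a given rule drifts from the truth. In this toy setup, presuming patients alive can never lower the estimate relative to the complete data, because every hidden death is counted as a survivor.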
Our project is the first to systematically compare approaches to correcting for missing death information in cancer registries. Although the methods we propose have been used in cancer registries before, no one has compared them head-to-head to determine which are the best and easiest to use for handling missing vital status follow-up. The results of this project will give researchers clear guidance on which statistical tools to use when vital status data are missing. Using these approaches should reduce bias in the cancer survival estimates reported by cancer registries. The results will also be useful for registries of all kinds around the world that face similar challenges with missing data.