The Behavioral Risk Factor Surveillance System (BRFSS) is an annual survey program run by the Centers for Disease Control and Prevention (CDC) and state health departments to survey the population of the United States. Standardized questions are asked by telephone (cellular and landline) about health-related risk behaviors, chronic health conditions, and the use of preventive services. The goal of this project is to predict the probability that a survey participant has diabetes and to identify participants who are at probable risk of diabetes. Ideally, the constructed model would be used during the survey to notify respondents who are at high risk of diabetes and offer them an incentive to reduce their chances of a diagnosis.
Methodology
Exploratory data analysis was used to identify variables in the dataset that would contribute to a viable logistic regression model for predicting the probability of a diabetic health outcome. The probability estimates from the model were then used to sort survey participants who do not have diabetes into risk classes; the highest risk class (those with a predicted probability of 45%-49.99%) would be flagged for an incentive program.
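The classification step described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function names, the medium-risk cutoff of 40%, and the "Unspecified" label are assumptions made for the example. Only the 45%-49.99% high-risk band comes from the methodology.

```python
import math

# From the methodology: the highest risk class covers 45%-49.99%.
HIGH_RISK_LOW = 0.45
HIGH_RISK_HIGH = 0.50  # exclusive upper edge, i.e. up to 49.99%

# ASSUMPTION: the report does not state the medium-risk cutoff;
# 0.40 is a hypothetical placeholder for illustration only.
MEDIUM_RISK_LOW = 0.40


def predicted_probability(intercept, coefficients, features):
    """Logistic regression: map a linear score to a probability in (0, 1)."""
    score = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-score))


def risk_class(probability, has_diabetes):
    """Assign a risk label; only non-diabetic respondents are risk-classified."""
    if has_diabetes:
        return "Has Diabetes"
    if HIGH_RISK_LOW <= probability < HIGH_RISK_HIGH:
        return "High Risk"
    if MEDIUM_RISK_LOW <= probability < HIGH_RISK_LOW:
        return "Medium Risk"
    if probability < MEDIUM_RISK_LOW:
        return "Low Risk"
    # The report does not say how non-diabetics at 50% or above are handled.
    return "Unspecified"


# Hypothetical respondent: made-up intercept, coefficients, and feature values.
p = predicted_probability(-2.0, [0.9, 0.6, 0.4], [1.0, 1.0, 1.0])
label = risk_class(p, has_diabetes=False)
```

In a survey-time setting, a respondent whose label is "High Risk" would be the one flagged for the incentive program.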
Model Findings
Following exploratory data analysis, two models were constructed: alpha, a simple three-dimensional model that can be used for visualization and explanatory purposes, and bravo, a complex seven-dimensional model that would be operationalized for use at survey time. See the appendix for visualizations, model outputs, and model risk categorizations. The complex model performed better and is proposed as the production model (see the appendix for performance). A key takeaway from the analysis is that the model yields an estimated proportion of respondents in each risk class. Using the proportions below, we can claim that if a similar survey process were followed for 1000 people, about 39 of them would be estimated to have diabetes. Of the 961 people who would not have diabetes, the model's risk classification estimates that about 18 (17.9 before rounding) are at high risk of diabetes.
Risk            Count    Proportion
Low Risk        233312   0.919743
Medium Risk       5915   0.023318
High Risk         4542   0.017905
Has Diabetes      9902   0.039035
39 out of 1000 people have diabetes
Out of the 961 without diabetes, 17 of them are at high risk
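The per-1000 figures above follow directly from the counts in the table. The sketch below shows one way to reproduce them; the variable names are illustrative, and taking the high-risk share among non-diabetics only is an assumption about how the original figure was derived.

```python
# Counts taken from the risk table in this report.
counts = {
    "Low Risk": 233312,
    "Medium Risk": 5915,
    "High Risk": 4542,
    "Has Diabetes": 9902,
}

total = sum(counts.values())
proportions = {label: n / total for label, n in counts.items()}

# Expected number of people with diabetes per 1000 surveyed.
diabetes_per_1000 = proportions["Has Diabetes"] * 1000

# Whole people without diabetes in a group of 1000.
non_diabetic = 1000 - int(diabetes_per_1000)

# ASSUMPTION: the high-risk estimate uses the share of high-risk
# respondents among non-diabetics only.
high_risk_share = counts["High Risk"] / (total - counts["Has Diabetes"])
high_risk_estimate = high_risk_share * non_diabetic

print(f"{int(diabetes_per_1000)} out of 1000 people have diabetes")
print(f"Out of the {non_diabetic} without diabetes, "
      f"{high_risk_estimate:.1f} are estimated to be at high risk")
```

The high-risk estimate comes out just under 18, so truncating it gives 17 and rounding it gives 18.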