BRFSS 2014 Report

Healthcare Analytics

Author

Elmer Camargo

Published

6/26/24

Overview

The Behavioral Risk Factor Surveillance System (BRFSS) is an annual program run by the Centers for Disease Control and Prevention (CDC) and state health departments to survey the population of the United States. Standardized questions about health-related risk behaviors, chronic health conditions, and the use of preventive services are asked by telephone (cellular and landline). The goal of this project is to predict the probability that a survey participant has diabetes and to identify participants who are likely at risk of developing it. Ideally, the constructed model would be used during the survey to notify respondents who are at high risk of diabetes and to offer them an incentive to reduce their chances of a diagnosis.

Methodology

The analysis used exploratory data analysis to identify variables in the dataset that would contribute to a viable logistic regression model for predicting the probability of a diabetic health outcome. The model's probability estimates would then be used to sort survey participants who do not have diabetes into risk classes; the highest risk class (those with an estimated probability of 45% to 49.99%) would be flagged for an incentive program.
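As a minimal sketch of the categorization step described above, the function below maps an estimated probability to a risk label using the thresholds stated in this report. The names `risk_class` and `flag_for_incentive` are hypothetical and are not taken from the project's code. The handling of probabilities below 35% is an assumption, since the report does not say how those are treated.

```python
def risk_class(probability: float) -> str:
    """Map an estimated probability of diabetes to one of the report's risk labels."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= 0.50:
        return "Has Diabetes"
    if probability >= 0.45:
        return "High Risk"
    if probability >= 0.40:
        return "Medium Risk"
    # Assumption: the report lists Low Risk as 35% to 39.99%. Probabilities
    # under 35% are grouped with Low Risk here because the source does not
    # describe a separate class for them.
    return "Low Risk"


def flag_for_incentive(probability: float) -> bool:
    """Return True only for the highest risk class among non-diabetics."""
    return risk_class(probability) == "High Risk"


if __name__ == "__main__":
    for p in (0.37, 0.42, 0.47, 0.63):
        print(f"{p:.2f} -> {risk_class(p)} (incentive: {flag_for_incentive(p)})")
```

Checking the thresholds from highest to lowest keeps each band's lower bound explicit and avoids overlapping ranges.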

Model Findings

Following exploratory data analysis, two models were constructed: alpha, a simple three-dimensional model that can be used for visualization and explanatory purposes, and bravo, a complex seven-dimensional model that would be operationalized for use at survey time. See the appendix for visualizations, model outputs, and model risk categorizations. The complex model performed better and is proposed as the production model (see the appendix for performance). A takeaway from the analysis is that the model provides an estimated proportion of respondents in each risk category. Using the proportions below, we can claim that if we were to follow a similar survey process for 1,000 people, about 39 of them would be estimated to have diabetes. Of the remaining 961 people without diabetes, the model's risk classification estimates that about 18 are at high risk of diabetes.

Risk Category   Estimated Probability of Diabetes   Estimated Proportion Per Category
Low Risk        35% to 39.99%                       92.0%
Medium Risk     40% to 44.99%                       2.3%
High Risk       45% to 49.99%                       1.8%
Has Diabetes    50% or Higher                       3.9%
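The per-1,000 figures can be checked against the category counts printed by `model_risk_categorizations` in the appendix. The sketch below is illustrative only: the helper name `per_thousand` is hypothetical, and the counts are copied from that output.

```python
import math

# Category counts copied from the model_risk_categorizations output in the appendix.
counts = {
    "Low Risk": 233312,
    "Medium Risk": 5915,
    "High Risk": 4542,
    "Has Diabetes": 9902,
}


def per_thousand(category_counts):
    """Scale each category count to an expected count per 1,000 respondents."""
    total = sum(category_counts.values())
    return {label: 1000 * n / total for label, n in category_counts.items()}


rates = per_thousand(counts)
for label, rate in rates.items():
    print(f"{label}: {rate:.1f} per 1,000 "
          f"(rounded {round(rate)}, truncated {math.floor(rate)})")
```

High Risk comes to about 17.9 per 1,000, so rounding gives 18 and truncating gives 17, the value printed in the appendix.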

Appendix

brfss_2014.train_simple_model()
Train Score: 0.8604294780826238
Test Score: 0.8575764742983286
Coefs: [[-2.34938268  0.09087554 -0.00279418  0.09087554  0.22390481 -0.37309528]]
Intercept: [-2.34938268]
Train Confusion Matrix
        0     1
0  173306  1503
1   26822  1313
Train Classification Report
              precision    recall  f1-score   support

       False       0.87      0.99      0.92    174809
        True       0.47      0.05      0.08     28135

    accuracy                           0.86    202944
   macro avg       0.67      0.52      0.50    202944
weighted avg       0.81      0.86      0.81    202944

Test Confusion Matrix
       0    1
0  43183  342
1   6884  327
Test Classification Report
              precision    recall  f1-score   support

       False       0.86      0.99      0.92     43525
        True       0.49      0.05      0.08      7211

    accuracy                           0.86     50736
   macro avg       0.68      0.52      0.50     50736
weighted avg       0.81      0.86      0.80     50736

Train AUC Score is: 0.5190349479738806
Test AUC Score is: 0.5187449164038826
Final AUC Score is: 0.6681904304065328
Final Classification Report
              precision    recall  f1-score   support

       False       0.87      0.99      0.92    218334
        True       0.47      0.05      0.09     35346

    accuracy                           0.86    253680
   macro avg       0.67      0.52      0.51    253680
weighted avg       0.81      0.86      0.81    253680

        0     1
0  216447  1887
1   33666  1680
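The Train and Test AUC values above sit close to 0.5, while the Final AUC is noticeably higher. One possible reading, offered as an observation and not as a confirmed detail of the project's code, is that the train and test scores were computed from hard 0/1 predictions. In that case ROC AUC reduces to the mean of recall and specificity. The sketch below recomputes that quantity from the simple model's confusion matrices; `balanced_accuracy` is a hypothetical helper name.

```python
def balanced_accuracy(tn, fp, fn, tp):
    """Mean of recall (true positive rate) and specificity (true negative rate)."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (recall + specificity) / 2


# Simple-model confusion matrices, read as rows = actual, columns = predicted.
# That orientation is inferred from the row sums matching the support counts
# in the classification reports.
train_simple = balanced_accuracy(tn=173306, fp=1503, fn=26822, tp=1313)
test_simple = balanced_accuracy(tn=43183, fp=342, fn=6884, tp=327)

print(f"train: {train_simple:.6f}")
print(f"test:  {test_simple:.6f}")
```

Both values agree with the reported Train and Test AUC lines to six decimal places. The higher Final AUC may therefore have been computed from predicted probabilities, but the report does not confirm this.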
pe_plot, no_pe_plot = brfss_2014.simple_model_boundary_plot()
pe_plot.fig.show()
no_pe_plot.fig.show()
brfss_2014.train_complex_model()
Train Score: 0.865406220435194
Test Score: 0.8643172500788395
Coefs: [[-1.53321984  0.06270447 -0.00197162 -0.5620955  -0.66259759 -0.52697951
  -0.24941241 -0.11986201 -0.02150605  0.0589947   0.06743298  0.16628645
   0.25445011  0.2231488   0.04356307 -0.20464288 -1.382092   -0.74929809
  -0.1123636   0.27980021  0.43073364 -0.78075042 -0.5888835  -0.42261334
  -0.24084965 -0.09638229  0.03568032  0.23935079  0.32122827  0.06270447
   0.11631694  0.05509008  0.09354213  0.71607553  0.55348007 -0.21230396]]
Intercept: [-1.55214103]
Train Confusion Matrix
        0     1
0  171393  3416
1   23899  4236
Train Classification Report
              precision    recall  f1-score   support

       False       0.88      0.98      0.93    174809
        True       0.55      0.15      0.24     28135

    accuracy                           0.87    202944
   macro avg       0.72      0.57      0.58    202944
weighted avg       0.83      0.87      0.83    202944

Test Confusion Matrix
       0     1
0  42722   803
1   6081  1130
Test Classification Report
              precision    recall  f1-score   support

       False       0.88      0.98      0.93     43525
        True       0.58      0.16      0.25      7211

    accuracy                           0.86     50736
   macro avg       0.73      0.57      0.59     50736
weighted avg       0.83      0.86      0.83     50736

Train AUC Score is: 0.5655092364979978
Test AUC Score is: 0.5691279334152757
Final AUC Score is: 0.7175045122483217
Final Classification Report
              precision    recall  f1-score   support

       False       0.88      0.98      0.93    218334
        True       0.56      0.16      0.24     35346

    accuracy                           0.87    253680
   macro avg       0.72      0.57      0.58    253680
weighted avg       0.83      0.87      0.83    253680

        0     1
0  213951  4383
1   29827  5519
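To make the comparison between the two models concrete, the sketch below recomputes precision, recall, and F1 for the diabetic class from the two final confusion matrices. The `ConfusionMatrix` class is a hypothetical helper written for this illustration, and the matrices are read as rows = actual, columns = predicted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tn: int
    fp: int
    fn: int
    tp: int

    @property
    def precision(self) -> float:
        # Share of predicted diabetics who are actually diabetic.
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        # Share of actual diabetics the model identifies.
        return self.tp / (self.tp + self.fn)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Final confusion matrices copied from the appendix output.
simple = ConfusionMatrix(tn=216447, fp=1887, fn=33666, tp=1680)
complex_model = ConfusionMatrix(tn=213951, fp=4383, fn=29827, tp=5519)

for name, cm in (("simple", simple), ("complex", complex_model)):
    print(f"{name}: precision={cm.precision:.2f} "
          f"recall={cm.recall:.2f} f1={cm.f1:.2f}")
```

On these counts the complex model identifies roughly three times as many diabetic respondents as the simple model (recall of about 0.16 versus 0.05), which matches the final classification reports.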
brfss_2014.model_risk_categorizations(model_name="complex")
           Risk   Count  Percentage
0      Low Risk  233312    0.919743
1   Medium Risk    5915    0.023318
2     High Risk    4542    0.017905
3  Has Diabetes    9902    0.039035
39 out of 1000 people have diabetes
Out of the 961 without diabetes, 17 of them are at high risk