Analysis of
2014 Behavioral Risk Factor Surveillance System

Healthcare Analytics

Table of Contents

  1. Executive Summary
  2. BRFSS Overview + Scope of Analysis
  3. Exploratory Data Analysis
  4. Modeling
  5. Recommendations + Follow Up Research

Executive Summary

  • BRFSS is an annual survey of Americans aged 18 and older
  • Survey data is tricky to analyze because of its sampling and survey design
  • Our predictive model flags 39 out of every 1,000 respondents as having diabetes
  • Of the 961 per 1,000 not flagged, about 18 are at high risk of diabetes and could be targeted with incentives

BRFSS Overview + Scope of Analysis

  • The Behavioral Risk Factor Surveillance System (BRFSS) is an annual survey coordinated by state health departments and the Centers for Disease Control and Prevention (CDC)
  • Phone surveys are conducted in all states, with questions on demographics, risk behaviors, chronic diseases and conditions, etc.
  • The aim of this study is to predict the probability that a survey participant has diabetes using common health survey variables, and to identify those at probable risk of diabetes

Exploratory Data Analysis

Modeling

  • Simple model: diabetes ~ ["distance_bmi"^2, "age", "physical_activity"]
    • Accuracy: 0.86
    • AUC: 0.66
    • Count Predicted True: 3567
    • Count High Risk (predicted probability between 45% and 49.9%): 2845
  • Complex model: diabetes ~ ["distance_bmi"^2, one-hot("age", "gen_health", "income_level"), "difficulty_walk", "high_bp", "high_chol"]
    • Accuracy: 0.87
    • AUC: 0.72
    • Count Predicted True: 9902
    • Count High Risk (predicted probability between 45% and 49.9%): 4542
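
The complex model's inputs combine a squared numeric term, one-hot encoded categories, and 0/1 flags. A minimal sketch of that encoding step follows. The category levels and the example record are hypothetical, since the slides do not list the actual survey codes, and the function names are illustrative only.

```python
# Hypothetical category levels; the slides do not list the actual survey codes.
CATEGORIES = {
    "age": ["18-44", "45-64", "65+"],
    "gen_health": ["excellent", "good", "fair", "poor"],
    "income_level": ["low", "middle", "high"],
}
BINARY = ["difficulty_walk", "high_bp", "high_chol"]


def one_hot(value, levels):
    """Return a 0/1 indicator list with a single 1 at the position of `value`."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1.0 if value == level else 0.0 for level in levels]


def encode(record):
    """Build the feature vector for one survey record (illustrative only)."""
    features = [record["distance_bmi"] ** 2]       # squared BMI distance term
    for name, levels in CATEGORIES.items():        # one-hot encoded columns
        features.extend(one_hot(record[name], levels))
    features.extend(float(record[name]) for name in BINARY)  # 0/1 flags
    return features


# Hypothetical respondent, for illustration.
example = {
    "distance_bmi": 3.0, "age": "45-64", "gen_health": "fair",
    "income_level": "low", "difficulty_walk": 1, "high_bp": 1, "high_chol": 0,
}
vector = encode(example)
```

Each categorical variable expands into one column per level, so the vector length grows with the number of levels rather than the number of variables.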

Simple

Recommendations + Follow Up Research

  • Investigate how to correct for the survey and sampling design
  • Although the model's false positives are considerable, the model does provide guidance on a high-risk population that could be given additional resources
  • Follow-up independent research should handle additional survey years, provide a means of continuous training, and address organizational integration
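
One common way to account for a survey design is to weight each respondent when estimating prevalence. The sketch below shows only that weighting step and is not the method used in this analysis. The `weight` values and the sample records are hypothetical, and a full correction would also need to handle stratification when estimating variance.

```python
def weighted_prevalence(records):
    """Weighted share of respondents with the condition.

    `records` is a list of (has_condition, weight) pairs, where `weight`
    is a hypothetical per-respondent survey weight.
    """
    total_weight = sum(weight for _, weight in records)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    positive_weight = sum(weight for flag, weight in records if flag)
    return positive_weight / total_weight


# Hypothetical sample: two positives and two negatives with unequal weights.
sample = [(True, 150.0), (False, 900.0), (False, 450.0), (True, 500.0)]
unweighted = sum(1 for flag, _ in sample if flag) / len(sample)
weighted = weighted_prevalence(sample)
```

When weights differ across respondents, the weighted and unweighted estimates can diverge noticeably, which is why the design matters for any headline rate.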

Extra (Confusion Matrix)

Complex model (rows: actual, columns: predicted)
          False    True
False    213951    4383
True      29827    5519


              precision    recall  f1-score   support

       False       0.88      0.98      0.93    218334
        True       0.56      0.16      0.24     35346

    accuracy                           0.87    253680
   macro avg       0.72      0.57      0.58    253680
weighted avg       0.83      0.87      0.83    253680
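
The figures for the True class in the report above follow directly from the four confusion-matrix cells. A minimal sketch of that arithmetic, using the counts from these slides (the function name is illustrative):

```python
def binary_report(tn, fp, fn, tp):
    """Precision, recall, F1 for the positive class, plus overall accuracy."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tn + tp) / total
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Cell counts taken from the confusion matrices in these slides.
complex_report = binary_report(tn=213951, fp=4383, fn=29827, tp=5519)
simple_report = binary_report(tn=216447, fp=1887, fn=33666, tp=1680)
```

Rounding each value to two decimals reproduces the True-class row and the accuracy line of the corresponding report.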

Risk             Count  Proportion
Low Risk        233312    0.919743
Medium Risk       5915    0.023318
High Risk         4542    0.017905
Has Diabetes      9902    0.039035
The model flags 39 out of 1,000 people as having diabetes
Of the 961 per 1,000 not flagged, about 18 are at high risk
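
The risk tiers in the table above can be produced by bucketing each predicted probability. The sketch below assumes a 0.50 classification cutoff and uses the 45%-49.9% high-risk band from the Modeling slide. The medium-risk cutoff is hypothetical, since the slides do not state it, and the sample probabilities are made up.

```python
from collections import Counter

POSITIVE_CUTOFF = 0.50     # assumed default classification threshold
HIGH_RISK_CUTOFF = 0.45    # from the slides: high risk is 45%-49.9%
MEDIUM_RISK_CUTOFF = 0.40  # hypothetical; not stated in the slides


def risk_tier(prob):
    """Map one predicted probability of diabetes to a risk tier label."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability out of range: {prob}")
    if prob >= POSITIVE_CUTOFF:
        return "Has Diabetes"
    if prob >= HIGH_RISK_CUTOFF:
        return "High Risk"
    if prob >= MEDIUM_RISK_CUTOFF:
        return "Medium Risk"
    return "Low Risk"


def tier_counts(probs):
    """Count how many predicted probabilities fall in each tier."""
    return Counter(risk_tier(p) for p in probs)


# Made-up probabilities, for illustration only.
counts = tier_counts([0.05, 0.42, 0.47, 0.49, 0.50, 0.81])
```

Keeping the cutoffs as named constants makes it easy to see how sensitive the tier sizes are to the chosen bands.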

Extra (Confusion Matrix)

Simple model (rows: actual, columns: predicted)
          False    True
False    216447    1887
True      33666    1680


              precision    recall  f1-score   support

       False       0.87      0.99      0.92    218334
        True       0.47      0.05      0.09     35346

    accuracy                           0.86    253680
   macro avg       0.67      0.52      0.51    253680
weighted avg       0.81      0.86      0.81    253680

Risk             Count  Proportion
Low Risk        243813    0.961105
Medium Risk       3455    0.013620
High Risk         2845    0.011215
Has Diabetes      3567    0.014061
The model flags 14 out of 1,000 people as having diabetes
Of the 986 per 1,000 not flagged, about 11 are at high risk
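
The per-1,000 summaries can be derived from the tier counts in the risk tables. The sketch below takes one reading of those summaries: the flagged rate per 1,000, and the high-risk share among everyone not flagged, scaled to the remaining people per 1,000. The function name is illustrative.

```python
def per_thousand(counts):
    """Return (flagged per 1,000, not flagged per 1,000, high risk among those)."""
    total = sum(counts.values())
    flagged = counts["Has Diabetes"]
    flagged_per_1000 = round(1000 * flagged / total)
    remaining_per_1000 = 1000 - flagged_per_1000
    # Share of high-risk respondents among everyone the model did not flag.
    high_risk_share = counts["High Risk"] / (total - flagged)
    high_risk_per_remaining = round(remaining_per_1000 * high_risk_share)
    return flagged_per_1000, remaining_per_1000, high_risk_per_remaining


# Tier counts copied from the two risk tables in these slides.
complex_counts = {"Low Risk": 233312, "Medium Risk": 5915,
                  "High Risk": 4542, "Has Diabetes": 9902}
simple_counts = {"Low Risk": 243813, "Medium Risk": 3455,
                 "High Risk": 2845, "Has Diabetes": 3567}

complex_summary = per_thousand(complex_counts)
simple_summary = per_thousand(simple_counts)
```

Running the same function on both tables keeps the two summaries comparable and avoids mixing up which proportion feeds which sentence.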