Alcohol Mortality Model

A logistic regression model that assesses global alcohol-attributable mortality rates. Final project for STAT 303-2 at Northwestern University.

Repository

Presentation

Overview

The goal of this project was to build a model that identifies whether a country is at high risk of an excessive number of alcohol-attributable deaths (as a proportion of total deaths). We sought to maximize the model’s classification accuracy while minimizing its false negative rate, and we evaluated both metrics with 10-fold cross-validation. The final model was developed using forward selection and includes interaction terms between some predictors. See the GitHub repository for a more in-depth report of the project.
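As a rough illustration of the evaluation described above, the sketch below fits a logistic regression and estimates accuracy and false negative rate with 10-fold cross-validation. It is not the project's actual code: the data is synthetic, the predictors and the interaction term are made up, and the helper names (`fit_logistic`, `predict_proba`, `cross_validate`) are hypothetical. It uses only NumPy and a plain Newton-Raphson fit; the project itself may have relied on a statistics library instead.

```python
import numpy as np


def _sigmoid(z):
    # Clip the linear predictor to avoid overflow in exp().
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))


def fit_logistic(X, y, n_iter=25, ridge=1e-4):
    """Fit logistic regression by Newton-Raphson; returns [intercept, coefs...]."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = _sigmoid(Xb @ beta)
        w = p * (1.0 - p)
        grad = Xb.T @ (y - p)
        # Small ridge term keeps the Hessian invertible.
        hess = Xb.T @ (Xb * w[:, None]) + ridge * np.eye(Xb.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def predict_proba(beta, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return _sigmoid(Xb @ beta)


def cross_validate(X, y, k=10, threshold=0.5, seed=0):
    """Return (accuracy, false negative rate) pooled over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    tp = tn = fp = fn = 0
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = fit_logistic(X[train_idx], y[train_idx])
        pred = (predict_proba(beta, X[test_idx]) >= threshold).astype(int)
        truth = y[test_idx]
        tp += int(np.sum((pred == 1) & (truth == 1)))
        tn += int(np.sum((pred == 0) & (truth == 0)))
        fp += int(np.sum((pred == 1) & (truth == 0)))
        fn += int(np.sum((pred == 0) & (truth == 1)))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    fnr = fn / (fn + tp) if (fn + tp) > 0 else float("nan")
    return accuracy, fnr


# Synthetic stand-in for the country-level data (assumption: 3 numeric
# predictors and a binary "high risk" label; not the project's dataset).
rng = np.random.default_rng(42)
n = 180
X = rng.normal(size=(n, 3))
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.8 * X[:, 0] * X[:, 2]
y = (rng.random(n) < _sigmoid(logits)).astype(int)

# Add one hypothetical interaction term as an extra column.
X_design = np.column_stack([X, X[:, 0] * X[:, 2]])

accuracy, fnr = cross_validate(X_design, y, k=10)
print(f"10-fold CV accuracy: {accuracy:.3f}, false negative rate: {fnr:.3f}")
```

Lowering `threshold` below 0.5 flags more countries as high risk, which can only reduce or maintain the false negative rate on the same folds, usually at some cost in accuracy. That trade-off is one way to balance the two metrics named above.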

Technologies

This project uses the following technologies:

Python: An easy-to-learn language for data science, with many libraries for data manipulation, cleaning, and analysis.
Pandas: A widely used Python library built for data manipulation and analysis.
NumPy: A Python library for large, multidimensional arrays, with many mathematical functions that operate on those arrays.
Matplotlib and Seaborn: Python data visualization libraries.