
STEM Salary Model

An ensemble of models that predicts employee salaries in STEM fields from a variety of predictors.


Overview

This project develops a model to predict salaries in STEM fields, optimized for Mean Absolute Error (MAE). We used the following models and ensembling techniques: Ridge, Lasso, Random Forest, AdaBoost, Gradient Boosting, and XGBoost. With this model, stakeholders such as students and employers can predict salaries more accurately, value work correctly, and avoid overcompensating employees.
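As a rough illustration of the optimization metric, the sketch below computes Mean Absolute Error for two sets of predictions and for their simple average. The salary figures, the names `model_a` and `model_b`, and the averaging scheme are all hypothetical; the project's actual ensembling method may differ. In practice, scikit-learn's `sklearn.metrics.mean_absolute_error` would likely be used instead of the hand-written function shown here.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def average_predictions(*model_preds):
    """Simple averaging ensemble: mean prediction per sample across models."""
    return [fmean(sample) for sample in zip(*model_preds)]


# Hypothetical salaries in USD, for illustration only (not project data).
actual = [100_000, 150_000, 200_000]
model_a = [110_000, 140_000, 190_000]  # assumed predictions from one model
model_b = [95_000, 160_000, 215_000]   # assumed predictions from another

ensemble = average_predictions(model_a, model_b)

mae_a = mean_absolute_error(actual, model_a)
mae_b = mean_absolute_error(actual, model_b)
mae_ensemble = mean_absolute_error(actual, ensemble)

print(f"MAE model_a:  {mae_a:,.2f}")
print(f"MAE model_b:  {mae_b:,.2f}")
print(f"MAE ensemble: {mae_ensemble:,.2f}")
```

In this toy data the averaged predictions score a lower MAE than either model alone because the two models' errors partly cancel. That outcome is not guaranteed in general, which is why each candidate should be compared on held-out data.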

Technologies

This project uses the following technologies:

  • Python: An easy-to-learn language widely used for data science, with many libraries for data manipulation, cleaning, and analysis.
  • scikit-learn: A comprehensive machine learning library for Python. It provides a wide range of classification and regression algorithms, several of which we used.
  • Pandas: A widely used Python library for data manipulation and analysis.
  • NumPy: A Python library for large, multidimensional arrays, with many mathematical functions that operate on them.
  • Matplotlib and Seaborn: Python data visualization libraries.