Python Scripting: Train, Test, and Predict with ML Models
Created Python scripts for exploratory data analysis (EDA), data cleaning, ML model training, and prediction.
Explored the data, applied feature engineering techniques, and trained three classifiers: Logistic Regression, Random Forest, and XGBoost. Each model has its own training script, which outputs a model file, a ROC curve, and a feature importance graph. A separate prediction script loads one or more of the saved models and predicts classification probabilities for new, unlabeled data.
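The train-then-predict flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: it uses scikit-learn's Random Forest only, synthetic data from make_classification in place of the project data, and a hypothetical model file name. It computes the ROC curve points and feature importances rather than plotting them.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labeled project data (assumption).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# --- Training step: fit, evaluate, save ---
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Probability of the positive class on the held-out split.
test_proba = model.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, test_proba)   # points of the ROC curve
auc = roc_auc_score(y_test, test_proba)       # area under that curve
importances = model.feature_importances_      # one value per feature

# Hypothetical output file name, written to a temporary directory.
model_path = Path(tempfile.mkdtemp()) / "random_forest.joblib"
joblib.dump(model, model_path)

# --- Prediction step: load the saved model, score unlabeled rows ---
loaded = joblib.load(model_path)
X_new = X_test[:5]  # stand-in for new unlabeled data
new_proba = loaded.predict_proba(X_new)[:, 1]

print(f"AUC = {auc:.3f}")
print("Predicted probabilities:", np.round(new_proba, 3))
```

Splitting training and prediction around a saved model file means the prediction step needs only the file and the new data, not the training set.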
Preprocessed the data by joining the datasets on the appropriate key column. Explored and cleaned the data by handling null and missing values. Applied PCA to reduce the feature dimension. Trained the three models with appropriate hyperparameters. Applied the inverse PCA transformation to the prediction data before passing it to the model(s).
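As a rough sketch of these preprocessing steps, the example below joins two tables on a key column, fills missing values, and fits PCA. The table contents, the key name "id", median imputation, standard scaling, and the choice of two components are all assumptions made for illustration. The last lines show what scikit-learn's transform and inverse_transform calls do; they do not reproduce how the repository applies the inverse transformation to prediction data.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Two hypothetical tables sharing a key column named "id" (assumption).
rng = np.random.default_rng(0)
left = pd.DataFrame(
    {"id": range(100), "f1": rng.normal(size=100), "f2": rng.normal(size=100)}
)
right = pd.DataFrame(
    {"id": range(100), "f3": rng.normal(size=100), "f4": rng.normal(size=100)}
)
left.loc[[3, 10], "f1"] = np.nan  # inject a few missing values

# Join on the key column.
df = left.merge(right, on="id", how="inner")

# Handle missing values; median imputation is one option among several.
features = df.drop(columns="id")
features = features.fillna(features.median())

# Scale, then reduce four features to two principal components.
scaler = StandardScaler()
scaled = scaler.fit_transform(features)
pca = PCA(n_components=2, random_state=0)
reduced = pca.fit_transform(scaled)

# New rows pass through the already-fitted scaler and PCA (no refitting).
new_rows = features.iloc[:5]
new_reduced = pca.transform(scaler.transform(new_rows))

# inverse_transform maps component scores back to an approximation
# of the scaled feature space (4 columns again).
restored = pca.inverse_transform(new_reduced)

print(reduced.shape, new_reduced.shape, restored.shape)
```

Fitting the scaler and PCA once and reusing the fitted objects keeps training and prediction data in the same feature space.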
Currently working on a single file that supplies the data, customizable input arguments, and any other requirements or resources needed to run each action. Also planning to build a pipeline in the cloud.
https://github.com/keshavs0305/scripting-training-and-prediction-tasks