End-to-End Machine Learning for Health Care

Posted by Martin sertin on May 11, 2025

🧠 End-to-End Machine Learning Project Report

📌 Project Overview

This project involved building and evaluating machine learning models to predict

The full ML lifecycle was covered, including data preprocessing, modeling, evaluation, and explainability.


🛠️ Workflow Summary

✅ Data Preprocessing

  • Handled missing values using IterativeImputer.
  • Clipped outliers with a custom OutlierClipper.
  • Scaled numerical features using StandardScaler.
  • Selected top features via SelectKBest with f_regression.
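The four steps above could be chained roughly as follows. This is a minimal sketch, not the project's actual code: the OutlierClipper shown here is a hypothetical stand-in that clips each column to quantile bounds (the original custom class is not shown in this report), and the quantile limits, k=3, and the synthetic data are all assumptions for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class OutlierClipper(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in: clip each column to quantile bounds learned in fit."""

    def __init__(self, lower=0.01, upper=0.99):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Bounds are learned from the data passed to fit only.
        self.lower_ = np.quantile(X, self.lower, axis=0)
        self.upper_ = np.quantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.lower_, self.upper_)


def make_preprocessor(k=3):
    """Impute -> clip -> scale -> select, in the order listed above."""
    return Pipeline([
        ("impute", IterativeImputer(random_state=0)),
        ("clip", OutlierClipper()),
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_regression, k=k)),
    ])


# Synthetic data with about 5% missing values (illustration only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 8))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=120)
X_demo[rng.random(X_demo.shape) < 0.05] = np.nan

X_ready = make_preprocessor(k=3).fit_transform(X_demo, y_demo)
print(X_ready.shape)
```

Imputation runs first so that the later steps never see missing values; SelectKBest needs the target, which is why y is passed to fit_transform.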

✅ Data Splitting

  • Split into train (68%), validation (17%), and holdout (15%) sets using train_test_split.
  • Ensured no data leakage by fitting preprocessors only on training data within the pipeline.
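One way to arrive at these proportions is a two-stage split: hold out 15% first, then take 20% of the remaining 85% as validation (0.85 × 0.20 = 0.17, leaving 0.68 for training). The sketch below assumes that approach; the random seed and the synthetic arrays are placeholders, not the project's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = rng.normal(size=1000)

# Stage 1: set aside the 15% holdout set.
X_dev, X_holdout, y_dev, y_holdout = train_test_split(
    X, y, test_size=0.15, random_state=42
)

# Stage 2: 20% of the remaining 85% becomes validation (17% overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_dev, y_dev, test_size=0.20, random_state=42
)

for name, part in [("train", X_train), ("validation", X_val), ("holdout", X_holdout)]:
    print(f"{name}: {len(part) / len(X):.0%}")
```

Any preprocessing is then fitted on X_train only and applied unchanged to the validation and holdout sets, which is what wrapping the steps in a pipeline achieves.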

🤖 Model Training & Evaluation

The following models were trained using Repeated K-Fold Cross-Validation:

  • Linear Models: LinearRegression, Ridge, Lasso, ElasticNet
  • Tree-Based: RandomForest, ExtraTrees, GradientBoosting, XGBoost
  • Others: KNN, MLPRegressor
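A comparison loop of this kind might look like the sketch below. It covers only three of the listed models to stay short, and the fold count, repeat count, hyperparameters, and synthetic dataset are assumptions rather than the project's settings. Each model is wrapped in a pipeline with its scaler so that scaling is refitted inside every fold.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import RepeatedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem (illustration only).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# 5 folds repeated 2 times gives 10 train/test evaluations per model.
cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=0)

models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# scikit-learn reports error metrics negated so that higher is always better.
scoring = {
    "r2": "r2",
    "mae": "neg_mean_absolute_error",
    "rmse": "neg_root_mean_squared_error",
}

results = {}
for name, model in models.items():
    pipeline = make_pipeline(StandardScaler(), model)
    scores = cross_validate(pipeline, X, y, cv=cv, scoring=scoring)
    results[name] = {
        "r2": scores["test_r2"].mean(),
        "mae": -scores["test_mae"].mean(),
        "rmse": -scores["test_rmse"].mean(),
        "n_evals": len(scores["test_r2"]),
    }

for name, row in results.items():
    print(f"{name}: R2={row['r2']:.3f}  MAE={row['mae']:.2f}  RMSE={row['rmse']:.2f}")
```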

📊 Metrics Evaluated

  • R² Score
  • MAE
  • RMSE
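For reference, the three metrics can be computed with scikit-learn as sketched below. The y_true and y_pred values are made up for illustration and are not project results.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up targets and predictions (illustration only).
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

r2 = r2_score(y_true, y_pred)                       # 1 - SS_res / SS_tot
mae = mean_absolute_error(y_true, y_pred)           # mean of |error|
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # sqrt of mean squared error

print(f"R2={r2:.4f}  MAE={mae:.4f}  RMSE={rmse:.4f}")
```

MAE and RMSE are in the units of the target, and RMSE penalizes large errors more heavily; R² is unitless, with 1.0 indicating a perfect fit.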

All results were compiled and visualized to compare model performance.