Evaluation of Machine Learning Models to Predict Student Academic Performance Using Structured Educational Data
- 1 Faculty of Computer Science and Applications, Smt. Chandaben Mohanbhai Patel Institute of Computer Applications, Charotar University of Science and Technology (CHARUSAT), Changa, Gujarat, India
Abstract
This study analyses the use of machine learning to predict students' academic performance from institutional academic records combined with socio-economic information from external sources. Data were collected through structured questionnaires and by extraction from the Student Information System (SIS). To increase the reliability of the resulting models, a rigorous preprocessing pipeline was applied, comprising exploratory data analysis, feature selection, missing-value imputation, and class balancing. Several machine learning models, namely Linear Regression, Logistic Regression, Support Vector Machine (SVM), Naïve Bayes, Decision Tree Regressor, Gradient Boosting, and XGBoost, were evaluated with standard performance metrics: R² score, Mean Squared Error (MSE), precision, recall, F1-score, and accuracy. The findings show that model performance improved considerably after successive data preparation steps and hyperparameter tuning, and that the presented framework is stable across both regression and classification tasks. Among the regression models, the SVM achieved the highest R² score (0.9125) and the lowest MSE (0.0097), indicating good predictive power, followed by Gradient Boosting, XGBoost, and the Decision Tree Regressor. Among the classification models, Logistic Regression (Balanced) showed good overall performance, with an accuracy of 89% and high precision and recall, outperforming Naïve Bayes by 6%. Taken together, these results indicate that the proposed modeling approach is robust, generalizes well, and is well suited to educational data prediction problems.
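As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch in plain Python of how R², MSE, precision, recall, F1-score, and accuracy can be computed. It is not the authors' code: the function names and the toy inputs are assumptions introduced here, and the classification helper assumes a binary task with a single positive label.

```python
from typing import Dict, Hashable, Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of the squared residuals (lower is better)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def binary_classification_metrics(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable = 1,
) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy for one positive label (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": correct / len(pairs),
    }


if __name__ == "__main__":
    # Toy values for illustration only; they are not taken from the study.
    print(mean_squared_error([1, 2, 3, 4], [1, 2, 3, 5]))
    print(r2_score([1, 2, 3, 4], [1, 2, 3, 5]))
    print(binary_classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Read this way, a reported R² of 0.9125 means the model's squared residuals amount to about 8.75% of the variance of the target around its mean.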
DOI: https://doi.org/10.3844/jcssp.2026.1721.1742
Copyright: © 2026 Hardik Ishwarbhai Patel and Dharmendra Patel. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Machine Learning
- Exploratory Data Analysis
- SVR
- Naïve Bayes
- Logistic Regression
- XGBoost
- Gradient Boosting
- Cross Validation