Using Decision Tree Algorithm to Predict Student Performance

Objective: Everyone has the right to education. For higher educational institutions, students are the best asset. Predicting students' academic success is therefore vital: it benefits both students and professors, enabling the latter to take proactive measures and find ways to help students learn, ultimately improving their academic performance. Methods: This study applied a data mining technique; specifically, the J48 algorithm was used to create a Decision Tree Model for predicting student performance in Data Structures and Algorithms. For model accuracy, K-fold cross-validation and the Receiver Operating Characteristic (ROC) curve were used. The dataset was collected from the grades of 2nd year BSIT students enrolled during the school year 2015-2016. Findings: The generated Decision Tree Model showed that Finals had the highest instance in predicting student performance in the Data Structures and Algorithms subject. It also showed that Finals is the strongest factor in receiving one of the following remarks: Pass, Failed or Conditional. The model achieved 85.31% accuracy for the remark Pass, 79.41% for Conditional and 91.67% for Failed. Further, the Decision Tree Model revealed that to pass the Data Structures and Algorithms subject, a student should have a grade higher than 66.12% in Midterms and higher than 72.30% in Finals. Application/Improvements: Institutions can use such a data-driven system to track student performance. Data analysis is a key component in strengthening their policies and carrying out intervention programs where they are most needed. Further, additional data mining techniques can be applied to improve on this study.


Introduction
Everyone has the right to education 1. In the Philippines, parents prioritize education; it is indispensable, a national legacy that should be instilled in every generation 2. The Philippine educational system has undergone various developments and changes to equip its graduates with the skills needed to compete with graduates from other countries. In fact, the Commission on Higher Education (CHED) issued CMO 46, series [...] universities collect an enormous amount of student data, this remains unutilized and does not help in any decision or policy making to improve the performance of students 3.
Early identification of the factors that contribute to low student performance is important. Students are the asset of a university. Students' performance (academic achievement) plays an important role in producing high-quality graduates who will become leaders and manpower for the country, and who are thus responsible for its economic and social development 4. Performance is an observable or measurable behavior of a person in a particular situation 5,6. Academic performance, or academic achievement, represents the performance outcomes that indicate how far a person has accomplished the specific goals that were the focus of the activities in the learning process 7. Typically, student academic performance is measured by the grades earned by completing the requirements set by professors. Meanwhile, one study found that student performance is linked with the student's profile: attitude towards class attendance, time allocated to studies, parents' level of income, mother's age and mother's education 8.
Data mining is the process of sorting through large data sets 9. Its purpose is to identify patterns and establish relationships in order to solve problems through data analysis, ultimately allowing the prediction of future trends. The main function of data mining is to apply various methods and algorithms to discover and extract patterns in stored data 10. Its significance to decision making makes it an essential component in various organizations. Research interest in predicting student academic performance has been increasing. One study applied the Decision Tree (ID3) method to 1,547 student records to predict students' final grades 11. Predictors such as Midterm Marks, Lab Test Grade, Seminar Performance, Assignment, Measure of Student Participation, Attendance, Homework and Final Grade Marks were used. ID3 classified 292 students as "Excellent," 536 as "Very Good," 477 as "Good," 188 as "Acceptable" and 54 as "Fail". Another study applied Educational Data Mining (EDM) to the records of 60 students in the MCA course at Pimpri Chinchwad College of Engineering, Pune University 12. Attributes such as students' graduation percentage, assignment work, attendance and unit test performance were used to determine how these affect the students' university result. The study found that, to perform well, a student should do well in attendance, assignments and unit tests.
On the other hand, a study to determine the success of students in higher educational institutions used the J48 algorithm 13. The researchers conducted a 60-question survey covering the following fields: social activity, relationships, health and academic performance. Results show that age, work, gender, stage and status had less effect on students' success, whereas students' GPA, credits, list of important notes, father's work and fresh food had the most significant effect. A study was also conducted on 158 students of the Information Technology Department of King Saud University, Saudi Arabia, using three classifiers: C4.5 decision tree, Naïve Bayes, and JRip 14. The study focused on the performance of students enrolled in the Data Structures subject because of its high failure rate. The attributes used were student ID, student name, grades in quiz 1, quiz 2, quiz 3, midterm 1, midterm 2, project, tutorial and final exam, and total points obtained; of these, midterm 1 was the strongest indicator of students' performance in the subject.
From this literature, it can be said that predicting student academic performance is crucial in helping educators plan and strategize their lesson delivery. In conventional teaching environments, educators obtain feedback on student learning experiences through face-to-face interactions with their students, enabling continual evaluation of their teaching programs 15. With the integration of technology in learning environments, however, educators must find other ways to obtain this information. The results of a predictive model will help educators take measures to improve students' performance. This study uses the J48 algorithm, a data mining technique, to predict the academic performance of students in their Data Structures and Algorithms subject.

Methodology
This study makes use of the Knowledge Discovery in Databases (KDD) process. KDD revolves around the investigation and creation of knowledge, processes, algorithms and mechanisms for retrieving potential knowledge from data collections 16.

Data Collection
A total of 108 student grade records were collected from the BSIT 2nd year students enrolled in Data Structures and Algorithms during the school year 2015-2016.

Lab Exercises/Project (LEP)
Lab Exercises are given to students after a topic is finished. These exercises are designed to challenge students' critical thinking skills. The Project is given after the Midterms Exam and serves as a completion requirement for the subject.

Quizzes (Q)
Quizzes may be announced or given as pop quizzes. They are used to gauge students' understanding and comprehension of the lesson. Grades are computed as the raw score divided by the total number of items, multiplied by 35, plus 60.
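The quiz grade computation can be illustrated with a minimal sketch. It assumes the formula is read as (raw score / total items) × 35 + 60, which is the usual reading of such a transmutation but is not spelled out further in the study; the function name `quiz_grade` and the 20-item quiz are hypothetical.

```python
def quiz_grade(raw_score: float, total_items: int) -> float:
    """Transmute a raw quiz score, assuming grade = (raw / total) * 35 + 60."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    if not 0 <= raw_score <= total_items:
        raise ValueError("raw_score must lie between 0 and total_items")
    return raw_score / total_items * 35 + 60


# Hypothetical 20-item quiz: the grade ranges from 60 (no correct answers)
# to 95 (all answers correct) under this reading of the formula.
for raw in (0, 10, 20):
    print(raw, quiz_grade(raw, 20))
```

Under this reading, a quiz grade can never fall below 60 or exceed 95, whatever the number of items.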

Midterms (M)
This is given during the middle of the semester. It helps the professor determine how well the students have learned and understood the lessons.

Finals (F)
This is given before the end of the semester.

WEKA Software
The Waikato Environment for Knowledge Analysis (WEKA) software was used in the study. WEKA is open source software released under the GNU General Public License. It is a collection of machine learning algorithms for data mining tasks and contains tools for data preparation, classification, regression, clustering, association rule mining, and visualization 17.

Data Mining Process
Students' grades were stored in MS Excel and later converted into a comma-separated values (.csv) file. Notepad++ was used to open the .csv file, and at this point data cleaning was performed by eliminating unwanted symbols (e.g. spaces, commas and colons). As required by WEKA, the declarations @Relation, @Attribute and @Data were added. Still using Notepad++, the file was then saved in the Attribute-Relation File Format (ARFF). This file format was developed for use with WEKA; it is an ASCII text file that describes a list of instances sharing a set of attributes. The file was then loaded into WEKA, which converted the pre-processed raw data into a more understandable format.
Next, the data modelling stage consists of five phases: training, pattern, testing, result evaluation and knowledge representation. This is also where WEKA is used to predict student performance in the Data Structures and Algorithms course. Cross-validation was then applied. Cross-validation is a model evaluation method in which not all of the data is used to train the learner 18. Its simplest form is the holdout method, in which the data is divided into two parts: the training set and the test set. The training set is used to train the model, while the test set is used to evaluate it.
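The CSV-to-ARFF step, done by hand in Notepad++ in this study, could also be scripted. The following is a minimal sketch rather than the authors' procedure: the column names (LEP, Q, M, F, Remarks), the relation name and the three sample rows are hypothetical placeholders, not the study's data.

```python
import csv
import io

# Hypothetical sample in the shape described above: four numeric grade
# components and a nominal remark. These values are NOT the study's data.
SAMPLE_CSV = """LEP,Q,M,F,Remarks
85,80,78,82,Pass
75,72,68,71,Conditional
70,65,60,68,Failed
"""


def csv_to_arff(csv_text, relation, class_values):
    """Build ARFF text from CSV text whose last column is the nominal class."""
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    header, records = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    # Every column except the last is declared as a numeric attribute.
    for name in header[:-1]:
        lines.append("@attribute " + name + " numeric")
    # The class attribute lists its allowed values inside braces.
    nominal = "{" + ",".join(class_values) + "}"
    lines.append("@attribute " + header[-1] + " " + nominal)
    lines += ["", "@data"]
    for record in records:
        # Strip stray spaces, mirroring the manual cleaning step.
        lines.append(",".join(field.strip() for field in record))
    return "\n".join(lines) + "\n"


arff_text = csv_to_arff(
    SAMPLE_CSV, "student_performance", ["Pass", "Conditional", "Failed"]
)
print(arff_text)
```

The resulting text has the three sections that WEKA expects: the relation name, one attribute declaration per column, and the data rows.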
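The holdout split, repeated over k folds as in the 10-fold cross-validation this study reports, can be sketched as follows. This is an illustrative, unstratified splitter written in plain Python; it is not WEKA's implementation, and the function name `k_fold_indices` and the fixed seed are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # All other folds together form the training set for this round.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 108 records and 10 folds, matching the sizes reported in the study.
splits = list(k_fold_indices(108, 10))
for round_no, (train, test) in enumerate(splits, start=1):
    print(round_no, len(train), len(test))
```

Each record appears in exactly one test fold, so every record is used for evaluation once and for training in the remaining rounds.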
The J48 algorithm was used in the training stage to build the model. The J48 classifier is a simple C4.5 decision tree for classification that creates a binary tree 19. The testing stage, on the other hand, is where K-fold cross-validation is performed; this study used 10-fold cross-validation. K-fold cross-validation improves on the holdout method: the data set is divided into k subsets, and the holdout method is repeated k times. For each repetition, one of the k subsets is used as the test set and the remaining k-1 subsets together form the training set. For model accuracy, the Receiver Operating Characteristic (ROC) curve and the area under the ROC curve (AUC) were used. The ROC curve and its AUC are a universal biostatistical tool for describing the accuracy of a model in predicting a phenomenon 20.

Figure 1 shows the pruned decision tree for Student Performance in Data Structures and Algorithms. Finals had the highest instance and became the basis for the first split, between Finals <= 72.30 and Finals > 72.30, in predicting student performance in the Data Structures and Algorithms subject. Additionally, Figure 2 shows the student performance decision rule, in which Finals is the strongest factor in receiving one of the following remarks: Pass, Failed or Conditional.
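To make the AUC measure used for model accuracy concrete, the sketch below computes it for one class against the rest as the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counting as half. The function name `auc` and the toy labels and scores are illustrative assumptions only; they are not taken from the study's data or from WEKA's output.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the class of interest, 0 for every other class.
    scores: the model's predicted probability for the class of interest.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example (made-up values): three "Pass" students and two others.
toy_labels = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.2]
print(auc(toy_labels, toy_scores))
```

For a three-class problem such as Pass, Conditional and Failed, this calculation would be repeated once per class, treating that class as positive and the other two as negative.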

The Model
The confusion matrix in Table 1 reflects the correctly classified instances and the misclassifications of the students' performance. A total of 108 classifications were made. The confusion matrix shows the following results:
• The decision tree classified eighty-six (86) instances as PASS and six (6) as FAILED, leading to six (6) misclassifications;
• The decision tree classified five (5) instances as CONDITIONAL, leading to zero (0) misclassifications; and
• The decision tree classified two (2) instances as PASS, one (1) instance as CONDITIONAL and eight (8) instances as FAILED, leading to three (3) misclassifications.
Table 2 shows the Cross-Validation Summary, wherein 91.67% of instances were correctly classified and 8.33% were incorrectly classified. The results in Table 2 are supported by those in Table 3, which gives the detailed accuracy by class; the weighted average Precision for student performance in Data Structures and Algorithms is 91.2%. The study also used the Receiver Operating Characteristic (ROC) curve and the Area under the ROC Curve (AUC) for model accuracy. Figure 3 shows the ROC curve.
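The overall accuracy can be checked directly from the confusion matrix counts listed above. In this sketch each row follows one of the three groups as described; the zero cells are inferred from the stated counts and the total of 108 instances, so the exact layout of Table 1 is an assumption.

```python
CLASSES = ["Pass", "Conditional", "Failed"]

# One row per group listed above; columns follow the same class order.
# Zero cells are inferred from the text, not read from Table 1 itself.
CONFUSION = [
    [86, 0, 6],   # 86 as Pass, 6 as Failed
    [0, 5, 0],    # 5 as Conditional, no misclassifications
    [2, 1, 8],    # 2 as Pass, 1 as Conditional, 8 as Failed
]

total = sum(sum(row) for row in CONFUSION)
correct = sum(CONFUSION[i][i] for i in range(len(CLASSES)))
accuracy = correct / total
error_rate = 1 - accuracy

print(f"{correct}/{total} correct = {accuracy:.2%}")   # 99/108 correct = 91.67%
print(f"misclassified = {error_rate:.2%}")             # misclassified = 8.33%
```

The diagonal sums to 99 of 108 instances, which reproduces the 91.67% reported in the Cross-Validation Summary. Per-class precision and recall depend on which axis of Table 1 holds the actual classes, so the sketch stops at overall accuracy.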

Student Performance
For higher education institutions whose goal is to contribute to improving the quality of higher education, success in creating human capital is the subject of continuous analysis 21. A study at Cordoba University, using 438 datasets in 7 Moodle courses, showed that Quizzes was the main determiner of students' final marks 22. Although Quizzes was the main determiner of good performance, the researchers also noted that the result could help teachers decide whether to promote some activities in order to obtain higher marks, or to eliminate others because they are related to low marks. Predicting students' academic success is therefore vital, for it benefits not only the students but their professors as well. Professors, for their part, will be able to take proactive measures and find ways to help students learn, ultimately improving their academic performance.

The Decision Tree Model achieved 85.31% accuracy for Pass, 79.41% for Conditional and 91.67% for Failed, based on the ROC curve shown in Figure 3. The Decision Tree Model likewise revealed that to pass the Data Structures and Algorithms subject, a student should have a grade higher than 66.12% in Midterms and higher than 72.30% in Finals. The Finals attribute serves as the strongest indicator of student performance.

Data Structures and Algorithms is essential to the BS Information Technology course. Data structures refer to the way information is organized, while algorithms refer to the step-by-step procedures used to solve a problem. To be a good programmer, a student should master both.
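The pass condition reported by the model (Midterms above 66.12 and Finals above 72.30) can be written as a simple check. This is a minimal sketch covering only that stated condition; the branches of the tree that lead to Conditional or Failed are not given in the text and are deliberately not reproduced. The function name `meets_pass_rule` and the example grades are hypothetical.

```python
# Cut-offs reported for the decision tree in this study.
MIDTERMS_CUTOFF = 66.12
FINALS_CUTOFF = 72.30


def meets_pass_rule(midterms: float, finals: float) -> bool:
    """Return True when both grades exceed the reported cut-offs."""
    return midterms > MIDTERMS_CUTOFF and finals > FINALS_CUTOFF


# Hypothetical students, used only to exercise the rule.
print(meets_pass_rule(70.0, 80.0))   # both above the cut-offs
print(meets_pass_rule(70.0, 72.30))  # Finals not strictly above 72.30
```

A check of this kind could flag students after the Midterms whose grades fall at or below the first cut-off, which is where early intervention would be most useful.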

Conclusion
Research interest in predicting student academic performance has been increasing. Knowing beforehand the attributes that significantly affect student performance greatly helps professors take proactive measures for the students' benefit. This study focused on the attributes of the Data Structures and Algorithms course that affect students' performance. The J48 algorithm was used to create the decision tree model, which identified the Finals attribute as the strongest indicator of whether students pass the subject. Most importantly, a model was established for determining student performance in Data Structures and Algorithms, as shown in the Decision Tree, Confusion Matrix, ROC curve, and the Area under the ROC Curve. Further, institutions can use such a data-driven system to track student performance. Data analysis is a key component in strengthening their policies and carrying out intervention programs where they are most needed.