Data mining techniques for classifying and predicting Teachers’ performance based on their evaluation reports

Background/Objectives: Teachers’ performance is key to achieving pedagogical and educational objectives. However, the evaluation of teachers’ performance has traditionally been a manual and subjective task for school principals. This traditional setting limits teachers’ engagement in developing their performance, as well as the principal’s ability to predict the associated strengths and weaknesses. Hence, school principals need innovative methods to evaluate teachers’ performance. In this study, a comparative approach was developed to evaluate teachers’ performance, aiming to avoid potentially biased and subjective human behavior in the teacher evaluation process. Methods: The approach involves different Data Mining (DM) techniques to identify the key patterns that drive the teachers’ performance evaluation process. The proposed approach extracts several potentially influential indicators mined from paper-based teachers’ performance reports at the Directorate of Education/Southern Ghawrs, along with some demographic variables. Several DM algorithms, namely NB Tree, Naïve Bayes, and Conjunctive Rule, are used to analyze teachers’ performance reports and predict their performance. Findings: The experimental results show that NB Tree improves prediction accuracy by 33% compared to Conjunctive Rule and by 12% compared to Naïve Bayes.


Introduction
Data Mining (DM) is the process of analysing hidden patterns within a data collection and classifying them into useful information (1)(2)(3). The basic motivation behind DM is that large data sources contain information that is of value to the data owners, but this information is concealed within the mass of uninteresting data and remains to be uncovered (4,5).
DM techniques help to retrieve important and relevant information (i.e., classification) in what is called Educational Data Mining (EDM). However, EDM analytics need to be improved with Machine Learning (ML) algorithms, such as Naïve Bayes, Conjunctive Rule, NB Tree, and many others, which train the computer to make sense of data and then to make predictions about new data sets. EDM techniques can support students' behavior, assist teachers, improve teaching, evaluate the performance of teachers and the learning system, improve curricula, and provide many other benefits (3,6).
The teachers' evaluation process involves personal and academic data to perform a periodic performance evaluation. It produces the information required to reach a reasonable decision on their performance based on superiors' evaluation reports. However, the evaluation process needs to be objective to ensure the desired learning outcomes (7,8). Hence, it is worthwhile to extract the hidden but useful knowledge from data through DM tools. DM techniques (DMTs) can be used to build a performance prediction system that concentrates on teachers' continuous assessment based on evaluation reports. Based on these DMTs and Machine Learning (ML) classifiers, a set of generated rules is used to predict teachers' classes based on their classification data (1,9,10).
In the Jordanian educational context, teachers work within a system of integrated functional chains supported by traditional methods of evaluation and motivation (11). For instance, the evaluation process might lead to an unbalanced evaluation when identifying strong or weak performing teachers, since it depends on superiors' abilities and predictions. Furthermore, these methods consume considerable human time and effort to filter and collect suitable data for the evaluation process, which makes the evaluation process prone to inaccuracy. Besides, the traditional evaluation system generally involves bias and personal considerations between the teacher and the evaluator. Therefore, this study is an attempt to optimize the evaluation process by combining DMTs and ML techniques to support the right decision when superiors examine a teacher's performance.

Related Work
Educational data mining is the process of applying different data mining techniques to analyze educational data (9). Predicting people's performance is a significant issue in many organizations, such as educational institutions. Hence, data mining techniques have been used to address this challenge and to support predicting the performance of students and teachers. Shahiri et al. reviewed studies that predicted students' performance in their schools (10). The study showed that the methods most frequently used for students' performance prediction were Neural Network and Decision Tree.
Chalaris et al. (2014) focused on how the use of data mining techniques on educational data can prove a useful strategy for the administration of Higher Education Institutes (HEIs) and for addressing the crucial challenges and shortcomings in improving the quality of educational processes. In addition, their study aimed to support decision-making based on knowledge that was previously unknown and hidden inside the institutional resources (12).
Al-Barrak and Al-Razgan (2016) applied a data mining technique in educational data as a case study to improve the students' performance and detect their Grade Point Average (GPA). They used decision trees as classification techniques within WEKA software in a different course in the study plan to extract useful knowledge from GPA. This study showed a significant advantage to identify the important course in the study plan based on the classification of student grades (13) .
Pal & Pal (2013) proposed a DMT-based model to evaluate teachers' performance using different factors. They collected sample data from postgraduate engineering students over three years. Their proposed model considers various performance measures that have a profound influence on teachers' performance, such as Students' Feedback (voice modulation, speed of delivery, content arrangement, presentation, communication, overall impression, content delivery, explanation power, overall teaching, and regularity), Results, and Students' attendance. The results showed that the Naïve Bayes classifier achieved the highest accuracy (80.35%), followed by the LAD tree (75.00%) and then CART (14).
Ola & Pallaniappan adopted an intelligent technique for the evaluation of instructors' performance in higher institutions of learning. They proposed an optimal algorithm and designed a system framework that is suitable for predicting instructors' performance. The technique overcomes the limitations of the existing techniques and improves the reliability and efficiency of instructors' performance evaluation system. Also, it provides the basis for performance improvement that optimizes students' academic outcomes and improves the standards of education. Consequently, it contributes to the achievement of the goals, it also helps to produce efficient plans to improve the learning process (15) .
Ahmadi and Ahmad (16) analyzed the final teacher evaluation by using association rules and the J48 tree in the teacher evaluation process. Their study adopted a popular data mining methodology called the Cross-Industry Standard Process for Data Mining (CRISP-DM), which is a six-step process: problem description, understanding the data, preparing the data, creating the models, evaluating the models, and using the model. Ajay and Saurabh (2013) discussed teachers' performance evaluation by applying different data mining techniques to university teachers' data. Their model considers various performance measures that have a deep influence on teachers' performance in university, such as Students' Feedback (voice modulation, speed of delivery, content arrangement, presentation, communication, overall impression, content delivery, explanation power, overall teaching, and regularity), Results, and Students' attendance. Their proposed model combines the knowledge and expertise of human experts with reasoning capabilities to support the decision-making process in educational institutions. Overall, the accuracy was 80.35%, 65.17%, 75%, and 75% for Naïve Bayes, ID3, CART, and LAD Tree respectively (16).
Mardikyan and Badur (17) conducted a study to understand the key factors affecting the teaching performance of instructors through regression and Decision Tree algorithms. The data were collected anonymously from students' evaluation records to identify the factors associated with the teaching performance of instructors, along with variables related to instructor and course characteristics. In another study, Sok-Foon et al. (2012) used a questionnaire instrument to identify the most influential factors on lecturer performance among undergraduates in private universities in Malaysia. They used a total of 223 respondents who were recruited using multistage sampling. The results showed that the lecturer and tutor characteristics, subject characteristics, the studentship, and learning resources and facilities were positively correlated with overall lecturer performance at a significant level (p<=0.5) (17).
Agaoglu (18) randomly collected data from several departments at Marmara University, Istanbul, Turkey. A total of 2850 evaluation scores were obtained. He used 70% of the data for training the classifier models and the remaining 30% for testing. Seven classification models were used: two used decision tree algorithms (C5.0 and CART), one used Support Vector Machines (SVM), three used Artificial Neural Networks (ANNs), and one used Discriminant Analysis (DA). The performance of these models was evaluated on the test data in terms of accuracy, precision, recall, and specificity, and all the applied classifiers were compared using these evaluation measures (18).
Alom & Courtney (19) argued the role of student gender on successive rates of educational completion in Australia. Implications for future lines of inquiry are discussed (19) . Their study describes the application of data mining, machine learning, and statistics on data generated from educational settings. Chaware & Lanjewar, (2018) showed that institutions give major focus on infrastructure, qualified faculty, marketing of institutions, Value-added programs, etc. They argued that EDM techniques should be implemented for better decision-making by management; and by doing so, we can understand student's trends in a better way so that it can be applied to upcoming batches. Furthermore, it provides a systematic review of EDM for higher education sustainability (20) .
In summary, based on the literature reviewed above, different DMTs have been applied to teachers' performance prediction. These techniques incorporate the perceptions of teachers' evaluation, performance prediction, and the traditional methods that have been used during the evaluation process. The findings also showed that there is an opportunity to develop a method that can effectively predict the current status of the teachers' evaluation process and the perception of their performance. Moreover, evidence from the literature review indicates that teacher evaluation affects the whole education system in Jordan. Therefore, the distinguishing feature of this study is its aim to propose an inclusive DMT-based model that incorporates all influential factors. The proposed model fills the literature gap by providing a simple implementation that includes all important features needed to build a reliable performance prediction model. The proposed model simulates the process of teachers' evaluation to help educational institutes address their challenges and problems.
Furthermore, DM techniques provide work patterns that help in the earlier identification of well-performing teachers (9,21). This study enables superiors to refocus on the criteria related to teachers' capabilities and thereby enhance their performance. In addition, this study investigates several samples that include teachers from different schools in the South Ghawrs Directorate and considers several teacher attributes to be correlated with the proposed method. The supervised classification employed several DM techniques, including Naïve Bayes (NB), Naïve Bayes Tree (NBT), and Conjunctive Rule (CR), which were implemented using the WEKA 3.6.13 DM software tool. These classifiers were chosen because they are among the most widely used in DM for such studies.

Methodology
The study aims at developing a prediction model for teachers' performance. This model was developed using the supervised classification methods of DM techniques. The supervised classification employed several DM techniques including Naïve Bayes (NB), Naïve Bayes Tree (NBT), and conjunctive Rule (CR), which are implemented using the WEKA 3.6.13 DM software tool. Figure 1 shows the architecture of the proposed prediction model based on the study methodology.
The architecture of the proposed prediction model in Figure 1 can be explained as follows:

Data processing
This phase aims at acquiring data from the teachers' evaluation reports and then eliminating irrelevant attributes, such as marital status and date of hiring, as well as empty rows of data. The real dataset was collected from teachers' evaluation reports at the Ministry of Education, Jordan, in the South Ghawrs directorate as a case study. The data include evaluation reports from previous years for 1100 teachers. The collected data were filtered using Microsoft Excel to remove single records and potentially misidentified attributes before analysis, in order to increase the accuracy of the data mining results. The final evaluation decision was then assigned using the weighted measurement categories (score S out of 100 points): Excellent: S ≥ 84; Very Good: 76 ≤ S < 84; Good: 65 ≤ S < 76; Accepted: 60 ≤ S < 65; and Weak: S < 60. Table 1 shows a sample of teachers' evaluation attributes. The filtering resulted in 1000 valid cases.
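The score bands above can be illustrated with a short Python sketch. It is not part of the original WEKA workflow; the function name `evaluation_category` is hypothetical, and treating each lower bound as inclusive (for example, a score of exactly 76 counts as Very Good) is an assumption about how the boundaries were meant to be read.

```python
def evaluation_category(score: float) -> str:
    """Map a final score (out of 100) to an evaluation category.

    Assumption: each band includes its lower bound, e.g. 76 <= S < 84
    is "Very Good". The thresholds themselves come from the text.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 84:
        return "Excellent"
    if score >= 76:
        return "Very Good"
    if score >= 65:
        return "Good"
    if score >= 60:
        return "Accepted"
    return "Weak"


if __name__ == "__main__":
    # Illustrative scores only; these are not values from the dataset.
    for s in (91, 80, 70, 62, 45):
        print(s, evaluation_category(s))
```

A rule of this kind makes the class label a deterministic function of the score, so the classifiers are in effect learning which report attributes explain the band a teacher falls into.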

Data selection and transformation
This phase covers the parameters that were selected for data mining. In this study, two experiments were carried out with different numbers of selected attributes to examine the evaluation performance of teachers. The first dataset (1st selected attributes) involved all attributes, while the second dataset (2nd selected attributes) involved 19 attributes, as shown in Table 2. These attributes were selected based on their availability in the evaluation reports. Attributes 1, 2, and 5 were removed since they are demographic variables and cannot be used for evaluation, in order to avoid biased decisions.
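As a rough illustration of this selection step, the sketch below drops attributes by their 1-based position in a record. The positions 1, 2, and 5 come from the text; the helper name `drop_attributes` and the sample record are hypothetical and do not reproduce the actual columns of Table 2.

```python
# 1-based positions of the demographic attributes named in the text.
DEMOGRAPHIC_POSITIONS = frozenset({1, 2, 5})


def drop_attributes(record, positions_to_drop=DEMOGRAPHIC_POSITIONS):
    """Return a copy of `record` without the attributes at the given
    1-based positions; the order of the remaining attributes is kept."""
    return [
        value
        for position, value in enumerate(record, start=1)
        if position not in positions_to_drop
    ]


if __name__ == "__main__":
    # Made-up record: positions 1, 2 and 5 stand in for demographic fields.
    record = ["demo_a", "demo_b", 18, 17, "demo_c", 15, "Very Good"]
    print(drop_attributes(record))
```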

Data splitting
In this phase, the data set is divided into two partitions: training and testing samples. A common split is 80% for training and 20% for testing. Table 3 describes each dataset partition. Out of the total 1000 cases included in this study, 800 (80%) were used as the training set and 200 (20%) as the test set; in addition, 50% of the cases from the test data were used as the validation set.
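The partition sizes described here can be reproduced with a minimal Python sketch. The study performed the split inside WEKA; the function name `split_cases`, the random shuffling, and the fixed seed are assumptions made only for illustration.

```python
import random


def split_cases(cases, train_fraction=0.8, validation_fraction_of_test=0.5, seed=42):
    """Split cases into training and test partitions, then take part of
    the test partition as a validation set.

    Assumptions: cases are shuffled before splitting and the seed is
    arbitrary; the fractions (80/20, then 50% of test) follow the text.
    """
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    train, test = shuffled[:n_train], shuffled[n_train:]
    n_validation = round(len(test) * validation_fraction_of_test)
    validation = test[:n_validation]  # drawn from the test partition
    return train, test, validation


if __name__ == "__main__":
    train, test, validation = split_cases(range(1000))
    print(len(train), len(test), len(validation))
```

With 1000 cases this yields 800 training cases, 200 test cases, and 100 validation cases taken from the test partition.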

Prediction model building
This study adopts three commonly used classifiers namely: NB, NBT, and CR to build the proposed prediction model for evaluating the teachers' performance based on data extracted from the evaluation reports.
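To give a feel for the simplest of the three classifiers, the following is a minimal sketch of a Naïve Bayes classifier for categorical attributes. It is not WEKA's implementation and not the model built in this study: the class name `CategoricalNaiveBayes`, the Laplace smoothing constant `alpha`, and the toy records are all assumptions. NB Tree and Conjunctive Rule are not sketched here.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical attributes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed, not from the paper)

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # value_counts[(attribute index, class label)][value] -> count
        self.value_counts = defaultdict(Counter)
        self.attribute_values = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.value_counts[(j, label)][value] += 1
                self.attribute_values[j].add(value)
        return self

    def _log_posterior(self, row, label):
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)  # log prior
        for j, value in enumerate(row):
            count = self.value_counts[(j, label)][value]
            denominator = (
                self.class_counts[label]
                + self.alpha * len(self.attribute_values[j])
            )
            score += math.log((count + self.alpha) / denominator)
        return score

    def predict(self, row):
        # Choose the class with the highest (unnormalised) log posterior.
        return max(
            self.class_counts,
            key=lambda label: self._log_posterior(row, label),
        )


if __name__ == "__main__":
    # Toy records with two made-up attributes; not data from the study.
    rows = [
        ("high", "high"), ("high", "medium"), ("medium", "high"),
        ("low", "low"), ("low", "medium"), ("medium", "low"),
    ]
    labels = ["Excellent"] * 3 + ["Weak"] * 3
    model = CategoricalNaiveBayes().fit(rows, labels)
    print(model.predict(("high", "high")))
```

Naïve Bayes assumes the attributes are conditionally independent given the class, which is why the per-attribute log probabilities can simply be summed.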

Model evaluation
This task aims to investigate the performance of the classification techniques used. In the validation process, the class labels are hidden from the model, the model predicts a label for each sample, and the predicted label is compared with the original class label to calculate the degree of agreement between them. There are two possible outcomes for each prediction: when the actual and predicted labels are the same, the prediction is counted as a success; otherwise, it is an error. The weighted averages of the models were evaluated using different performance measures based on the following evaluation parameters (22):
• True Positive (TP) and True Negative (TN) are the correct classifications of samples that do and do not belong to a given class, respectively.
• False Positive (FP) is when a sample of another class is incorrectly predicted as belonging to the given class.
• False Negative (FN) is when a sample of the given class is incorrectly predicted as belonging to another class.
Then, the performance of the adopted classifiers was compared using different performance measures including Precision(P), Recall, F1 measure, Accuracy(A), and Error Rate (ER) as shown in equations 1 to 5 respectively.
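The five measures can be computed from the actual and predicted labels as in the sketch below. It uses the standard textbook definitions, P = TP/(TP+FP), Recall = TP/(TP+FN), F1 = 2·P·Recall/(P+Recall), A = correct/total, and ER = 1 − A, which the equations referred to above presumably follow. Weighting each class by its number of actual samples mirrors the weighted averages mentioned in the text, but the function name `weighted_report` and the toy labels are assumptions.

```python
def weighted_report(actual, predicted):
    """Weighted-average precision, recall and F1, plus accuracy and
    error rate, computed one class at a time (one-vs-rest)."""
    pairs = list(zip(actual, predicted))
    n = len(pairs)
    report = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label in sorted(set(actual) | set(predicted)):
        tp = sum(a == label and p == label for a, p in pairs)
        fp = sum(a != label and p == label for a, p in pairs)
        fn = sum(a == label and p != label for a, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        weight = (tp + fn) / n  # share of actual samples in this class
        report["precision"] += weight * precision
        report["recall"] += weight * recall
        report["f1"] += weight * f1
    accuracy = sum(a == p for a, p in pairs) / n
    report["accuracy"] = accuracy
    report["error_rate"] = 1 - accuracy
    return report


if __name__ == "__main__":
    # Toy labels only; not results from the study.
    actual = ["Good", "Good", "Weak", "Weak"]
    predicted = ["Good", "Weak", "Weak", "Weak"]
    print(weighted_report(actual, predicted))
```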

Experiments setup and results
To address the issue of teachers' evaluation, a prediction model was developed using several data mining techniques, namely NB, NBT, and CR. The data were analyzed and the models implemented in WEKA (Waikato Environment for Knowledge Analysis), a common open-source software package for DM and ML. After loading the dataset into the WEKA pre-process panel, the next step is model construction, in which the different classifiers are run with 10-fold cross-validation and with an 80% percentage split. The percentage split means that 80% of the 1st and 2nd selected attributes datasets was used for training, and the remainder was used for testing and prediction. The classification accuracy results for both scenarios (1st and 2nd selected attributes) are presented in Table 4. The proposed approach has been evaluated using 1000 teachers. The findings are obtained from the output of the three algorithms. First, the classification results of the different algorithms are analyzed. The performance of the algorithms is evaluated based on precision, recall, and F-measure. Table 5 presents the results of NB Tree, Conjunctive Rule, and Naïve Bayes. As shown in Table 4, NB Tree yields a significant overall improvement in the effectiveness of the evaluation system compared with the other methods used in teachers' evaluation.
Table 5 shows the other classification results, such as mean absolute error, root mean squared error, and relative absolute error. Table 4 confirms the improvement achieved by the proposed approach when using NB Tree in terms of the total number of correctly classified instances: NB Tree correctly classified 186 instances, followed by Naïve Bayes and Conjunctive Rule with 158 and 100 instances respectively. As for the other error measurements, the tree method was again the best, as is clear from the values shown in Table 6. As can be seen in Tables 6 and 7, the proposed method achieved a slight improvement on both selected training sets, whether using the 60 points for academic features only or together with demographic characteristics. Hence, the demographic characteristics did not affect the overall performance of the system. From this, we can observe that the system was able to prevent bias in the evaluation better than human methods. Table 8 shows the distribution of the performance values (Excellent, Very Good, Good, Accepted, and Weak) for the two selected attribute sets. The experiments were designed to evaluate the performance of the classifiers for computer science, Information System. In this study, six experiments were carried out to train and test each model using the three classifiers Naïve Bayes, NB Tree, and Conjunctive Rule, as shown in Tables 6 and 7. The first three experiments used 13 selected attributes, while the other three used 19 selected attributes. As can be seen in Tables 6 and 7, the highest results were achieved by NB Tree, with 93% and 92% when using the 60 points of academic features only and the 60 points with demographic characteristics, within 1.04 and 3.07 seconds respectively. The medium results were achieved by the Naïve Bayes algorithm on the two selected attribute sets, with 79% and 80.5% within 0 and 0.01 seconds respectively.
Likewise, the lowest results were obtained with the Conjunctive Rule on the two selected attribute sets, with 50% and 58.5% within 0.02 and 0.03 seconds respectively.
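The reported accuracies can be turned into instance counts and percentage-point gains with a little arithmetic, sketched below. The accuracy values and the test-set size of 200 come from the text; the names `reported_accuracy`, `correct_instances`, and `gain_over` are hypothetical, and the counts for the second attribute set are derived here rather than quoted from the tables.

```python
TEST_SIZE = 200  # number of test cases stated in the text

# Accuracy (%) on the first and second selected attribute sets, as reported.
reported_accuracy = {
    "NB Tree": (93.0, 92.0),
    "Naive Bayes": (79.0, 80.5),
    "Conjunctive Rule": (50.0, 58.5),
}


def correct_instances(accuracy_percent, n=TEST_SIZE):
    """Number of correctly classified test cases implied by an accuracy."""
    return round(accuracy_percent / 100 * n)


def gain_over(baseline, best="NB Tree", table=reported_accuracy):
    """Percentage-point gain of `best` over `baseline` per attribute set."""
    return tuple(b - o for b, o in zip(table[best], table[baseline]))


if __name__ == "__main__":
    for name, accuracies in reported_accuracy.items():
        print(name, [correct_instances(a) for a in accuracies])
    print("gain over Conjunctive Rule:", gain_over("Conjunctive Rule"))
    print("gain over Naive Bayes:", gain_over("Naive Bayes"))
```

On the second attribute set the gains are 33.5 and 11.5 percentage points, which is close to the 33% and 12% improvements quoted in the abstract; on the first set the gains are larger (43 and 14 points).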

Results and discussion
The results [3][4][5][6][7] showed that NB Tree is superior to the other methods when using the two sets of selected attributes for training and testing. Thus, NB Tree may be utilized in handling the problem of teachers' classification and evaluation. Based on the accuracy of the three classifiers in Table 5, NB Tree achieved more correctly classified instances than the other classification techniques.
It classified 186 and 184 out of the 200 test cases, with an accuracy of 93% and 92% for the two sets of selected attributes respectively. Figure 2 illustrates the correctly and incorrectly classified instances for the three algorithms used. Table 9 and Figure 3 compare the results of the three classifiers: NB Tree, Naïve Bayes, and Conjunctive Rule. Overall, the three classifiers were tested on the two selected attribute sets with all instances. In this table, two evaluation parameters, the True Positive (TP) and False Positive (FP) rates, are used to investigate the performance of the proposed model with the different classifiers. Overall, the NB Tree classifier showed a significant improvement in the distribution of TP and FP rates over the other two classifiers. In addition, Figure 3 shows the results obtained by the three classifiers. The results indicate that NB Tree trained on evaluation reports has a certain level of capability to classify teachers that are distantly related by sequence. NBT can find the common factors in a diverse training data set and use them to find the optimal classification. Thus, the proposed method may be used as a complementary method to sequence alignment methods in performance prediction.
Table 10 shows the total number of teachers whose performance was predicted based on the matrix mentioned in Table 1. For NB Tree, the correctly predicted counts out of 200 are (24, 57, 68, 37) and (23, 66, 62, 33) for the Excellent, Very Good, Good, and Accepted/Weak classes with the two selected attribute sets respectively; these sum to the 186 and 184 correctly classified instances reported above. Across NB Tree, Naïve Bayes, and Conjunctive Rule, the proposed system successfully predicted (57, 47, 30) and (66, 56, 55) out of 200 for Very Good with the two selected attribute sets respectively, followed by (68, 48, 70) and (62, 49, 62) out of 200 for Good, and then (37, 39, 2) and (33, 32, 0) out of 200 for Accepted and Weak. As a result, the findings confirmed that the proposed method was well suited to classifying teachers and predicting teachers' performance using NB Tree, as described in Tables 10 and 11 and Figure 3.
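As a consistency check on these counts, the sketch below sums per-class correct predictions and converts them to an overall accuracy. Reading the two four-number groups quoted above as NB Tree's correct predictions for the Excellent, Very Good, Good, and Accepted/Weak classes is an assumption; it is supported by the fact that the groups sum to 186 and 184, the totals reported for NB Tree. The names `nb_tree_correct` and `overall_accuracy` are hypothetical.

```python
# Assumed reading of the quoted counts: NB Tree's correct predictions
# per class for each selected attribute set (out of 200 test cases).
nb_tree_correct = {
    "first attribute set": {
        "Excellent": 24, "Very Good": 57, "Good": 68, "Accepted/Weak": 37,
    },
    "second attribute set": {
        "Excellent": 23, "Very Good": 66, "Good": 62, "Accepted/Weak": 33,
    },
}


def overall_accuracy(per_class_correct, n_test=200):
    """Overall accuracy implied by per-class correct prediction counts."""
    return sum(per_class_correct.values()) / n_test


if __name__ == "__main__":
    for name, counts in nb_tree_correct.items():
        total = sum(counts.values())
        print(f"{name}: {total} correct, accuracy {overall_accuracy(counts):.2%}")
```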

Conclusion and future work
The results obtained after implementing the classification methods show an improvement in teachers' evaluation by allowing principals to actively predict teachers' performance. This predictive model for teachers' performance utilizes DM and ML techniques. The classification is carried out using the Naïve Bayes, NB Tree, and Conjunctive Rule algorithms. Two models were built for each of these algorithms using different selected attributes from the training dataset, and the best overall classifier model among them was identified. Based on the results of the developed model with the three classifiers, NB Tree achieved a higher accuracy level than the other classifiers used for the model evaluation. Moreover, the results show that there is no significant difference between using only academic characteristics and using them together with demographic characteristics. This means that demographic characteristics have no effect on the final evaluation of the teachers. Therefore, the proposed model has succeeded in eliminating the bias of human methods, which are usually exposed to factors such as gender, qualification, and category. Consequently, teachers' evaluation becomes more realistic and logical using this model compared to the traditional methods. In future work, the prediction of teachers' performance can be improved by using more attributes that may affect the evaluation of teachers' performance. For example, an important attribute that was excluded from this research is years of experience, which was left out because of its large number of missing values. New data mining techniques or algorithms that may give more accurate results can also be applied. Data from other sources may be used as well, such as teachers' characteristics found in the civil registry of the Ministry of the Interior.