Mining educational data in predicting the influence of Mathematics on the programming performance of University students

Objectives: The aim of this study was to investigate the influence of mathematics on the programming performance of Information Technology students and to identify differences in their programming performance between genders. Methods/Statistical analysis: The study used educational data mining with the J48 classification algorithm and a descriptive-correlational design. The data were gathered from the Electronic Database of the University. Students with failure ratings were removed as significant outliers, leaving 73 records. Pearson r and point-biserial correlations were used, at a 0.05 level of significance, to test the correlation of the continuous and categorical independent variables, respectively, with the continuous dependent variable. Further, descriptive statistics were used to describe the level of performance in mathematics and programming. Findings: The results show that students demonstrated a high performance in the Mathematics in the Modern World course, with a mean rating of 2.16 (SD = 0.27), and a low performance in Mathematics Enhancement 1, with a mean rating of 2.81 (SD = 0.38); lower ratings indicate better performance. A similarly low performance was observed in their programming course, with a mean rating of 2.64 (SD = 0.39). The mathematics performance of the students is significantly correlated with their performance in programming: low performance in Mathematics Enhancement 1 corresponds to low performance in programming. Moreover, female students performed slightly better than males in their programming course. Applications: The results could help teachers improve the quality of instruction, particularly in mathematics and programming, and thereby improve the performance of students in both subjects. University administrators should conduct frequent assessments and curriculum reviews to identify possible areas of improvement beneficial to the students.


Introduction
Previous studies have reported that mathematics contributes to students' success in their programming courses; mathematics is one of the cognitive prerequisites for learning programming (1). Programming courses are among the most important components of the curriculum, particularly in Information Technology. However, studies (2)(3)(4) show that programming subjects are considered difficult because they require complex analytical skills to solve programming problems. In the curriculum structure of the Information Technology program, mathematics is included in the mandatory field of expertise required by the Commission on Higher Education (CHED) for universities in the Philippines. Mathematics is integrated into the curriculum because it plays an important role in mastering information technology (2,5,6). It helps students improve their problem-solving abilities, and studies have confirmed that mathematics is one of the contributory factors in the success of students taking the information technology program (7,8). Therefore, students must have a strong background in mathematics to achieve greater success in programming courses (9)(10)(11).
Previous studies suggest that mathematics is significantly related to computer programming, since mathematics is an avenue for learning essential skills that are also necessary for learning how to program (7,11). The researchers examined several works in the literature on the relationship between mathematics and programming to establish the relevance of the present study. A recent study (7) found a correlation between the mathematical ability of students and their programming performance; it was conducted with 19 computer science students using a correlational design. These findings confirm the study of (12), which reported a relationship between a student's mathematical ability and their performance in learning programming. Another study of the influence of mathematics on programming subjects also showed that programming is correlated with mathematics subjects (3). That research was based on the results of all mathematics and programming subjects taken by 99 computer science graduates, and it concluded that students who obtain better results in mathematics subjects also obtain better results in programming subjects. A similar study (2) also found a relationship between students' performance in mathematics and in programming, using achievement data in basic mathematics and basic programming courses collected from 132 students of a computer science education study program.
Some researchers have also discussed the gender gap between males and females in the field of computing, particularly information technology (13)(14)(15)(16). A limited number of studies have shown that male students have a higher ability in programming classes than female students (17,18), a finding also supported by a recent study in Georgia in which male students performed better than females (15). However, other studies have reported that male and female students show a similar level of performance when learning programming (19)(20)(21). To examine this gender gap and contribute up-to-date knowledge about gender differences in mathematical and programming performance, this study also examines the performance of information technology students in mathematics and computer programming by gender.
Studies on the predictors of computer programming achievement have revealed that cognitive and academic variables, computer exposure, and demographics strongly predict class performance. Other studies have considered variables such as students' learning styles (22), comfort level (23,24), self-efficacy (25,26), personality type (27,28), and mental ability (29). However, the influence of gender and mathematical ability on the programming performance of information technology students remains unexplored. Previous researchers studied the relationship between mathematics and programming in the students' last year of the program, whereas studies suggest that student performance should be identified at an early stage of the academic journey. It is therefore important to determine the performance of students in mathematics and programming at an early stage.
Furthermore, educational data mining (EDM) was used in the present study. EDM has become established in education research (30)(31)(32)(33). It is used to extract useful information and patterns from educational databases in order to better understand and improve students' performance and learning processes (34). It enables teachers, students, and policy makers to take appropriate actions and decisions that improve learning achievement and contribute to academic success (35), such as providing online feedback, creating educational models and patterns, and predicting learning difficulties. Various data mining techniques are used in educational data mining, such as classification, association rules, sentiment analysis, and clustering. This study applied a classification technique, a decision tree model built with the J48 algorithm, to predict the influence of mathematics courses on the programming performance of Information Technology students.
The results of the study may help teachers improve the quality of instruction in mathematics and programming courses and thereby improve students' programming skills and outcomes. They may also guide the revision of the curriculum for the Bachelor of Science in Information Technology program in a Philippine State University, helping students acquire the skills necessary for their programming courses. EDM using the J48 classification algorithm and a descriptive-correlational design were used in the present study to examine the influence of mathematics performance on the programming ability of information technology students in a State University. The study also examined the performance of students in mathematics and programming by gender.

Objectives
This study aimed to determine whether the performance of the BS Information Technology students in mathematics is significantly correlated with their programming performance. It also examined the relationship between students' gender and their programming performance. Specifically, the study had the following objectives:
1. To identify the level of performance of the students in mathematics and programming;
2. To determine the relationship between gender and the programming performance of the students;
3. To measure the significance of the relationship between mathematical performance and programming performance among the respondents; and
4. To present patterns, based on the decision tree model, that are useful for predicting the influence of mathematics on the programming performance of the students.
In Figure 1, the variables employed by the researchers are presented and classified as independent and dependent variables. The independent variables in the study include the students' gender and their performance in two mathematics courses, Mathematics in the Modern World and Mathematics Enhancement 1. Gender was measured at the nominal level and the mathematics ratings at the continuous level. Because mathematics performance is itself affected by other factors, it is not strictly independent in nature. The dependent variable, on the other hand, is the students' performance in a programming course, measured on a continuous scale. Correlation analyses were performed between the variables under study to investigate the influence of gender and mathematics performance on the programming performance of the students, and descriptive measures were calculated to describe the variables. Moreover, the J48 classification algorithm was used to generate patterns on the performance of the students.

Research design
To achieve the objectives of the study, a descriptive-correlational design was employed (36)(37)(38), which involves collecting data and describing the status of the subject under investigation. The design is correlational in nature because it measures the extent of the relationships between variables; in this study, the researchers measured the extent of the influence of gender and mathematics performance on programming performance. An educational data mining technique was also used, applying the J48 classification algorithm to create a decision tree model that further determines the influence of mathematics on the programming performance of the students.

Datasets
The datasets used in the study were extracted from the database of the Electronic Management System of the university. They consist of the final ratings of undergraduate information technology students in three courses, namely Mathematics in the Modern World, Mathematics Enhancement 1, and Computer Programming 1, who were enrolled during the first semester of school year 2019-2020. All 87 records were used for the decision tree model built with the J48 algorithm, but only 73 were selected for the descriptive-correlational part of the study. These are the students who passed the programming course; they were selected in order to examine the influence of the independent variables on passing the programming course. Students with failure ratings in their mathematics courses were eliminated because such ratings were considered significant outliers. Of the 73 students, 60.3 percent are female and 39.7 percent are male. Most of the students are between 18 and 21 years old (83.6%), while only 2.7 percent are 29 years old or older.
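The record selection described above can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: the `StudentRecord` fields, the sample ratings, and the failure threshold are hypothetical. In particular, treating any rating above 3.0 as a failure is an assumption based on the common Philippine grading scale (1.0 best, 3.0 lowest passing grade).

```python
from dataclasses import dataclass

# Assumption: ratings above 3.0 count as failures (common Philippine scale,
# where 1.0 is the best grade and 3.0 the lowest passing grade).
PASSING_LIMIT = 3.0


@dataclass
class StudentRecord:
    """One hypothetical row of the extract: gender plus three final ratings."""
    gender: str      # "F" or "M"
    math_mmw: float  # Mathematics in the Modern World
    math_en1: float  # Mathematics Enhancement 1
    prog1: float     # Computer Programming 1


def passed_all(record: StudentRecord) -> bool:
    """True if no rating in the record is a failure rating."""
    return all(
        rating <= PASSING_LIMIT
        for rating in (record.math_mmw, record.math_en1, record.prog1)
    )


def select_for_correlation(records: list[StudentRecord]) -> list[StudentRecord]:
    """Keep only students who passed all three courses."""
    return [r for r in records if passed_all(r)]


def gender_percentages(records: list[StudentRecord]) -> dict[str, float]:
    """Share of each gender in percent, rounded to one decimal place."""
    counts: dict[str, int] = {}
    for r in records:
        counts[r.gender] = counts.get(r.gender, 0) + 1
    return {g: round(100 * c / len(records), 1) for g, c in counts.items()}


if __name__ == "__main__":
    sample = [  # made-up ratings for illustration only
        StudentRecord("F", 2.0, 2.75, 2.5),
        StudentRecord("M", 2.25, 3.0, 2.75),
        StudentRecord("F", 2.5, 5.0, 3.0),  # failed a mathematics course
        StudentRecord("M", 2.0, 2.5, 5.0),  # failed Programming 1
    ]
    kept = select_for_correlation(sample)
    print(len(kept), gender_percentages(kept))  # 2 {'F': 50.0, 'M': 50.0}
```

Applied to the 73 retained records, the reported 60.3 and 39.7 percent correspond to 44 female and 29 male students.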

Ethical consideration
In compliance with the Data Privacy Act of the Philippines, the researchers ensured that the protocols for conducting the study and collecting the data were followed. All data gathered were kept strictly confidential and used solely for the purposes of the study. After the analysis, the data were deleted from the researchers' computers.

Data gathering procedure
The data used in the study were collected from the database of the Electronic Management System of the University, through the campus Registrar's Office, based on the grade sheets submitted by the faculty for the students involved in the study. A total of 87 records with four variables (Gender, Mathematics Enhancement 1, Mathematics in the Modern World, and Programming 1) were used for generating patterns from the decision tree model.

Treatment of data
The researchers hypothesized (null hypothesis) that the students' gender and their performance in two mathematics courses have no significant bearing on their performance in the programming course. The study employed correlation analyses using Pearson r correlation and point-biserial correlation. These methods were used to estimate the correlation between the continuous independent variables and the continuous dependent variable, and between the categorical independent variable and the continuous dependent variable, respectively (39).
The tests were done using IBM SPSS software (37). Further, descriptive statistics (40) such as the mean (M), percentage, standard deviation (SD), and coefficient of variation (CV) were computed to describe the level of mathematics and programming performance. The level of significance for rejecting the null hypothesis was set at alpha = 0.05.
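Since a point-biserial correlation is simply a Pearson r computed with the dichotomous variable coded 0/1, both analyses can be sketched with one function. The dependency-free Python below is a minimal illustration, not the SPSS procedure. It also computes the t statistic, r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom, on which the significance test of r rests; obtaining the p-value needs the t distribution, for example via `scipy.stats.pearsonr` or `scipy.stats.pointbiserialr`, which return r together with a two-sided p-value. The sample data and the choice of "F" as the group coded 1 are assumptions.

```python
import math
from statistics import mean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two paired samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two paired samples with n >= 3")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a sample with zero variance has no defined r")
    return sxy / math.sqrt(sxx * syy)


def point_biserial_r(groups: list[str], y: list[float], coded_one: str) -> float:
    """Point-biserial r: Pearson r with the dichotomy coded 1 (coded_one) or 0."""
    x = [1.0 if g == coded_one else 0.0 for g in groups]
    return pearson_r(x, y)


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)); compare with t(n - 2) to test H0: rho = 0."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # Hypothetical ratings for six students.
    gender = ["F", "F", "F", "M", "M", "M"]
    math_en1 = [2.5, 2.75, 3.0, 2.5, 2.75, 3.0]
    prog1 = [2.25, 2.5, 2.75, 2.5, 2.75, 3.0]
    r = pearson_r(math_en1, prog1)
    r_pb = point_biserial_r(gender, prog1, coded_one="F")
    print(f"r = {r:.3f}, t = {t_statistic(r, len(prog1)):.3f}, r_pb = {r_pb:.3f}")
```

Because both mathematics and programming ratings use the same lower-is-better scale, a positive r between them still indicates a direct relationship: students who do better in one course tend to do better in the other.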
Furthermore, the data were encoded in Microsoft Excel, saved in CSV format, and transformed into nominal data for algorithmic analysis. The Waikato Environment for Knowledge Analysis (WEKA) software was used to generate patterns and the decision tree model based on the J48 algorithm.
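The exact discretization is not specified here, so the following is only a sketch of one way to turn the CSV export into nominal values before loading it into WEKA, whose CSV loader normally treats non-numeric columns as nominal attributes. The 2.0 to 2.4 ("Very_Good") and 2.5 to 2.9 ("Good") bands follow the ratings reported in the results and the "P_" prefix mirrors the P_Very_Good class label reported for the tree; the column names, the remaining band labels, and their cut-offs are hypothetical.

```python
import csv
import io


def to_band(rating: float) -> str:
    """Map a numeric rating to a category label (lower rating = better)."""
    if rating < 2.0:
        return "Excellent"   # assumed label
    if rating < 2.5:
        return "Very_Good"   # 2.0 to 2.4, as reported
    if rating < 3.0:
        return "Good"        # 2.5 to 2.9, as reported
    if rating <= 3.0:
        return "Passed"      # assumed: exactly 3.0
    return "Failed"          # assumed: anything above 3.0


def nominalize(csv_text: str, rating_columns: dict[str, str]) -> list[dict[str, str]]:
    """Replace each rating column's value with prefix + band; keep other columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, prefix in rating_columns.items():
            row[column] = prefix + to_band(float(row[column]))
        rows.append(row)
    return rows


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize the nominal rows back to CSV text for WEKA."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    raw = "Sex,Math_MMW,Math_En,Prog1\nF,2.0,2.75,2.25\nM,2.5,3.0,2.75\n"
    columns = {"Math_MMW": "", "Math_En": "", "Prog1": "P_"}
    print(to_csv(nominalize(raw, columns)), end="")
    # Sex,Math_MMW,Math_En,Prog1
    # F,Very_Good,Good,P_Very_Good
    # M,Good,Passed,P_Good
```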

J48 Algorithm
J48 is the WEKA project team's open-source Java implementation of the C4.5 algorithm, which Quinlan developed as an improvement on ID3 (Iterative Dichotomiser 3). Unlike ID3, it handles both discrete and continuous attributes and missing values, and it prunes the tree after construction. The classifier built by J48 is a decision tree constructed from the root to the leaves. Attribute selection is based on information gain: let node N hold the tuples of partition D; the attribute with the highest information gain is chosen as the splitting attribute for N (at the top level, the root node). The expected information needed to classify a tuple in D is given by

$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i),$$

where $m$ is the number of classes and $p_i$ is the probability that a tuple in D belongs to class $C_i$, estimated as $|C_{i,D}|/|D|$. If D is partitioned on attribute A into $v$ subsets $D_1, \dots, D_v$, the information still required to classify a tuple is

$$\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \mathrm{Info}(D_j).$$

The information gain is the difference between the original information requirement and the new requirement, that is,

$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D).$$

Gain(A) tells how much would be gained by branching on A, so the attribute with the highest Gain(A) is chosen as the splitting attribute (or the root node) because it yields the best classification. Strictly, C4.5 and therefore J48 normalize the gain by the split information of A and select the attribute with the highest gain ratio, which reduces ID3's bias toward attributes with many distinct values.
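To make the formulas concrete, the sketch below computes Info(D), Info_A(D), Gain(A), and the C4.5-style gain ratio on a four-record toy dataset and picks the root attribute. The records are invented for illustration (the attribute and class names merely echo labels such as Math_En, Sex, and P_Very_Good); this is not WEKA's implementation, which also handles numeric thresholds, missing values, and pruning.

```python
import math
from collections import Counter


def info(labels: list[str]) -> float:
    """Info(D) = -sum_i p_i * log2(p_i), the entropy of the class labels in D."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_a(values: list[str], labels: list[str]) -> float:
    """Info_A(D) = sum_j |D_j|/|D| * Info(D_j) after partitioning D on A's values."""
    partitions: dict[str, list[str]] = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    return sum(len(p) / len(labels) * info(p) for p in partitions.values())


def gain(values: list[str], labels: list[str]) -> float:
    """Gain(A) = Info(D) - Info_A(D)."""
    return info(labels) - info_a(values, labels)


def gain_ratio(values: list[str], labels: list[str]) -> float:
    """C4.5-style criterion: Gain(A) / SplitInfo_A(D)."""
    split_info = info(values)  # entropy of A's own value distribution
    return gain(values, labels) / split_info if split_info > 0 else 0.0


def best_attribute(attributes: dict[str, list[str]], labels: list[str]) -> str:
    """Name of the attribute with the highest gain ratio (the split for this node)."""
    return max(attributes, key=lambda name: gain_ratio(attributes[name], labels))


if __name__ == "__main__":
    # Invented four-record toy data, not the study's records.
    prog = ["P_Very_Good", "P_Very_Good", "P_Good", "P_Good"]
    attributes = {
        "Math_En": ["Very_Good", "Very_Good", "Good", "Good"],
        "Sex": ["F", "M", "F", "M"],
    }
    for name, values in attributes.items():
        print(name, round(gain(values, prog), 3), round(gain_ratio(values, prog), 3))
    print("root:", best_attribute(attributes, prog))  # root: Math_En
```

Here Math_En separates the two programming classes perfectly while Sex carries no information about them, so Math_En becomes the root. In this toy case gain and gain ratio agree; they diverge when attributes have different numbers of distinct values.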

Results
This study employed an educational data mining method, using the J48 classification algorithm, together with descriptive and correlation analyses to achieve its goal. In particular, we used point-biserial and Pearson r correlations to test the hypothesis. The results are presented starting with the performance of the students in Mathematics Enhancement 1 and Mathematics in the Modern World, followed by the correlation analysis results and the patterns from the generated tree model. In these ratings, a lower value indicates better performance. Table 1 shows that most of the students (71.2%) obtained only a passing grade in Mathematics Enhancement 1. The average performance of the students was 2.81 (SD = 0.38), interpreted as "passed". The coefficient of variation of the students' ratings was 13.52%, which indicates a somewhat homogeneous performance in the course. Table 2 shows that the largest share of students (52.1%) achieved a rating of 2.0 to 2.4, interpreted as "very good", in Mathematics in the Modern World. The overall performance of the students in this course was 2.16 (SD = 0.27), interpreted as "very good", with a coefficient of variation of 12.5%.
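The coefficients of variation quoted for the mathematics and programming ratings are the standard deviation divided by the mean, expressed as a percentage. The short Python check below recomputes them from the reported means and standard deviations; because those inputs are already rounded, small differences from values computed on the raw ratings are possible.

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """CV = SD / mean, as a percentage of the mean."""
    return 100 * sd / mean


# Means and standard deviations as reported in the study.
REPORTED = {
    "Mathematics Enhancement 1": (2.81, 0.38),
    "Mathematics in the Modern World": (2.16, 0.27),
    "Programming 1": (2.64, 0.39),
}

if __name__ == "__main__":
    for course, (m, sd) in REPORTED.items():
        print(f"{course}: CV = {coefficient_of_variation(sd, m):.2f}%")
    # Mathematics Enhancement 1: CV = 13.52%
    # Mathematics in the Modern World: CV = 12.50%
    # Programming 1: CV = 14.77%
```

These agree with the reported 13.52%, 12.5%, and 14.8% at the precision stated.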

Mathematics and programming performance
In terms of programming performance, as shown in Table 3, 43.8% of the 73 students obtained a rating of 2.5 to 2.9, interpreted as "good". The coefficient of variation of the ratings was 14.8%, indicating the dispersion of the students' performance in the course relative to the mean. The overall performance of the students was 2.64 (SD = 0.39), interpreted as "good". Table 4 illustrates that female students had an average rating of 2.63 (SD = 0.38), interpreted as "good", whereas male students obtained an average rating of 2.66 (SD = 0.38), also interpreted as "good". As a whole, female students performed slightly better than male students in the Programming 1 course, as reflected in their lower mean rating and the computed variation of ratings from the mean.

Correlation of mathematics performance and gender on their programming course
The results in Table 5 indicate that, of the three independent variables, only Mathematics Enhancement 1 was significantly correlated with performance in the programming course (r = 0.316, p = 0.001). The relationship was a medium correlation in degree and direct in direction (39): low academic performance in Mathematics Enhancement 1 corresponds to low academic performance in the programming course (41,42). Table 6 presents the patterns generated by the J48 algorithm using 10-fold cross-validation. These patterns also show that the mathematics courses greatly influence the programming performance of the students: Mathematics Enhancement 1 (Math_En) is the root node of the tree, making it the most informative attribute for classifying programming performance. The model also supports the previous analysis that female information technology students performed well in programming (Sex = F: P_Very_Good). Figure 2 shows the decision tree model.
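In 10-fold cross-validation, the records are divided into ten folds; the tree is trained on nine folds and tested on the remaining one, ten times over, so that every record is used for testing exactly once and the evaluation statistics are accumulated over the ten test folds. The sketch below shows only this fold bookkeeping for the 87 records used for the tree. It uses plain random k-fold splitting with an arbitrary, assumed seed, whereas WEKA's own cross-validation also stratifies the folds by class when the class attribute is nominal.

```python
import random


def k_fold_indices(n: int, k: int = 10, seed: int = 1) -> list[list[int]]:
    """Shuffle record indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    return folds


def train_test_splits(n: int, k: int = 10, seed: int = 1):
    """Yield (train, test) index lists; each record lands in exactly one test fold."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    folds = k_fold_indices(87)
    print([len(f) for f in folds])  # [9, 9, 9, 9, 9, 9, 9, 8, 8, 8]
```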

Correlation of constructs to programming performance
The results of the data analyses in this study show a direct relationship between mathematics performance and programming performance among university students. The extent of the relationships among constructs can be observed in the students' performance (grade ratings) in the three subjects studied (43). In the present study, Mathematics in the Modern World was not significantly correlated with the programming performance of the information technology students.
Mathematics Enhancement 1 showed a medium correlation with programming performance, which may reflect the computational thinking skills learned in College Algebra and Trigonometry (44). Computational thinking allows university students to formulate problems logically and to integrate computers in creating solutions; a recent study (45) identified five dimensions of computational thinking: creativity (46), algorithmic thinking skills (47), collaboration (48), critical thinking (26), and problem solving (42,43).
At Eastern Visayas State University, Programming 1 is offered at an early stage of the information technology curriculum and focuses on the C programming language. C is a relatively low-level language that serves as a foundation for learning well-known higher-level languages such as Java, PHP, and Python. The course covers basic programming principles of problem solving, algorithms, flowcharting, writing source code, and program simulation (42), which prepare students for structured and, later, object-oriented programming. The course requires a deep understanding of problems through abstract reasoning and logical thinking, as well as performing arithmetic to solve them. Writing and developing computer programs further requires higher-order thinking skills about program structure and the semantic rules of programming languages in order to realize solutions to problems (49).
Several studies support the claim that students' critical thinking skills (computational and rigorous thinking skills) affect programming performance among university students (7,50,51,52). Further, students with better mathematics experiences gain a better programming experience at university (53).
On the other hand, the study found no significant relationship between gender and the programming performance of the students. This may be because students' learning interest and self-efficacy in learning programming, rather than gender, are critical factors in determining programming performance, as supported by existing literature (54,55). This finding on the respondents' mathematics and programming performance contrasts with existing literature reporting differences between male and female students (53).

Curriculum enhancement to improve the performance in programming
A university curriculum aims to provide quality education to students by constantly evaluating its learning outcomes (44). Students' academic satisfaction has been associated with the quality of education offered by the university in terms of different indicators related to instruction (56). Setting expectations, metacognitive reflection, simulation, and modeling are among the ten tools that researchers identified for effective learning in teaching programming and mathematics (57). Researchers have used the ScratchMath program to develop students' computational thinking skills and positive attitudes towards mathematics through programming activities. Various forms of technology integration have been adopted in universities with the aim of improving students' academic performance, which is important for educational institutions because it supports better strategic planning to improve the quality of students (58). Applying technology in education provides new ways of teaching and learning, and it brings both opportunities and challenges in instruction to educators and students (59). Moreover, mobile applications for learning mathematics and programming have been developed as class intervention materials for students' learning difficulties (60). Such applications provide mathematics support for problems presented on the screen and mobile access to content (61).
The University currently uses blended learning. Students' access to mobile learning technology, a learning management system, and video content on YouTube Education offers a self-paced learning approach that enhances students' learning experience in the university and thus their academic performance (62,63). The development of various learner's modules, teacher guides, worktexts, and information sheets in the university has contributed greatly to students' learning engagement and programming performance (49,64). These learning materials can be used in a pair programming approach to enhance the performance of slower-paced students (48).
Consistent with the results of previous studies, curriculum writers for the information technology program should give due importance to mathematics, as it contributes to success in programming, particularly for beginning programmers. Improvements that help students acquire a deeper understanding of mathematics could increase their performance not only in programming courses but also in other general mathematics subjects. If further studies confirm that performance in mathematics is significantly correlated with performance in programming, mathematics should become a requirement for taking the programming course at the University.

Conclusion and Recommendations
It is concluded that the low performance of the students in the programming course was affected by their poor performance in mathematics. This emphasizes the importance of prior knowledge, which allows students to link new information to what they already understand. As a prerequisite and a core skill for programming, mathematical knowledge plays an integral part in achieving effective learning outcomes in programming. Programming performance was directly and positively associated with the level of mathematical skills acquired by the students. Students' gender, on the other hand, had no significant bearing on programming performance.
In light of the findings and conclusions of the study, the researchers recommend that the faculty members of the university's information technology department continue improving the quality of their instruction, particularly in motivating students towards mathematics and programming courses, in order to improve performance and outcomes. Frequent general assessments of the students are also recommended to further investigate the factors that lead to difficulties in the concerned learning areas. The department should also develop mobile learning environments, learning management systems, YouTube video content, and other self-paced learning strategies to further enhance the programming performance of the students. However, the question of which other factors affect students' performance in mathematics courses, and thereby indirectly their performance in the Programming 1 course, remains to be investigated. Further research is also highly recommended to determine other factors, not considered in this study, that may affect students' programming performance.