Forecasting Philippine Household Final Consumption Expenditure on Education Using Discrete Wavelet Transformation on Hybrid ARIMA-ANN Model

Objective: This study forecasts the household final consumption expenditure on education of the Philippines from the 1st quarter of 2019 to the 4th quarter of 2023. Methods/Analysis: The household final consumption expenditure on education data were obtained from the Philippine Statistics Authority as part of its national report for the 1st quarter of 1998 up to the 4th quarter of 2018. The data were forecasted using ARIMA, ANN, Hybrid ARIMA-ANN, and the proposed Discrete Wavelet Transformation (DWT) using a Daubechies filter on the Hybrid ARIMA-ANN. Findings: The forecasting accuracy of the proposed model for 1-year and 5-year forecasts was compared with that of ARIMA, ANN, and Hybrid ARIMA-ANN through their Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). The proposed DWT using a Daubechies filter on Hybrid ARIMA-ANN has an MSE of 0.0009, RMSE of 0.0304, and MAPE of 0.1750 for the 1-year forecast, and an MSE of 0.0004, RMSE of 0.0194, and MAPE of 0.1167 for the 5-year forecast. The proposed model has the best forecasting performance compared with ARIMA, ANN, and Hybrid ARIMA-ANN. Novelty/Improvement: For the Department of Education of the Philippines, preparations and plans can be developed to cope with the forecasted expenditure on education. For citizens, the result of this research gives awareness of the movement of expenses on education and lets them prepare for it. Other forecasting models and DWT filters can be used in future work, which may improve the results of this study.


Introduction
Education can be thought of as one of the most important investments and gifts that a parent or guardian can pass on to their children. Quality education depends not only on the school but also on the parents' effort and support 1. Many parents strive to send their children to private schools, which they believe offer a higher quality of education. In addition, most parents spend money on new school supplies, as well as on computers and other gadgets that may help their children learn.
Several parents also send their children to tutorial centres or avail of home tutoring services to raise their children's knowledge and competency. According to the Philippine Statistics Authority (PSA), household expenditure on education contributes 18.7 percent to the growth of the Philippines' Household Final Consumption Expenditure (HFCE) 2. It was also revealed that Filipinos' expenses on education have grown continuously.
Economists, analysts, and many others show great interest in how goods or resources are consumed or used. They are also interested in determining future consumption based on historical data. For instance, education expenditure can be represented as data that record consumption per year, or per any other interval, in chronological order. This type of data is called Time Series (TS) data, which can be analyzed, modeled, and forecasted using different methodologies 3. TS forecasting is regarded as one of the most important areas of forecasting since it captures the underlying relationship between time and the variable, and it reveals patterns that may be useful for the economy.
The Autoregressive Integrated Moving Average (ARIMA) model is one of the most popular and most widely used models for forecasting time series data because of its extensive forecasting capabilities and the Box-Jenkins methodology for model building 4. However, ARIMA assumes that the data are linear, which is not true for most economic time series 5. On the other hand, Artificial Neural Networks (ANN) are best suited to data that are nonlinear. With the development of different ANN algorithms, ANN has come into use for forecasting TS data. One reason is that it requires fewer assumptions than several traditional approaches to forecasting time series data 6.
According to Montgomery 7, TS data may not be purely linear or purely nonlinear but may be a combination of both. According to Zhang 8, the residuals from an ARIMA model are nonlinear and can be used in an ANN model to produce a nonlinear forecast, since the forecast of a time series is claimed to be the sum of a linear and a nonlinear forecast. That is,
$$y_t = L_t + N_t$$
Moreover, Zhang showed that the Hybrid ARIMA-ANN performs better than ARIMA and ANN individually.
Khashei and Bijari 5, like Zhang, assume that TS data have both linear and nonlinear characteristics. In their model, ARIMA generates a linear forecast for the TS data. The result of the ARIMA model is then used as input to an ANN model, together with the past values of the data, to obtain the final forecast. The model of Khashei and Bijari showed higher performance than Zhang's model for a single time step. However, it has lower accuracy in multi-step forecasting because past predictions, rather than past original values, are used as inputs.
The authors in 9 explain that the linear and nonlinear components of a TS can be combined in two ways, additive and multiplicative. The additive model makes the same assumption as Zhang, while the multiplicative model states that the product of the linear and nonlinear components of a TS is the actual data itself. Thus, it can be modeled as
$$y_t = L_t \times N_t$$
According to this model, the residuals are the quotient of the actual data divided by the data predicted by ARIMA.
The author in 10 used a hybrid ARIMA-ANN model similar to that of Khashei and Bijari to predict the value of the EGX30 stock market index. In that research, the past values of the data, the ARIMA forecast of the data, and the current and past residuals of ARIMA were used as inputs to the ANN. The model showed better accuracy in multi-step forecasting.
These studies show that hybrid models are effective in forecasting TS data. Yet, it was claimed that the results may be improved through wavelet filtering 11.
According to 12, a wavelet-based approach improves trend queries on TS data. Moreover, 13 showed that wavelet-filtered TS data yield more accurate results than models that do not undergo wavelet filtering.
Boto-Giralda et al. 14 proposed wavelet-based denoising for traffic volume forecasting using an Artificial Neural Network. The results showed an improvement in forecasting compared with the use of the original data.
The author in 15 suggested hybrid ARIMA-ANN models based on discrete wavelet transformed data. In that study, the researcher took the average forecast over three wavelet transforms: Haar, 2-tap Daubechies, and 4-tap Daubechies. The result showed higher accuracy than the ARIMA, ANN, and Hybrid ARIMA-ANN models.
In this research, the researcher assessed the ability of a Hybrid ARIMA-ANN model to forecast HFCE on Education. In particular, this research aimed to model HFCE on Education as TS data and to forecast it with a proposed Hybrid ARIMA-ANN applied to TS data transformed by the Discrete Wavelet Transformation (DWT) with a Daubechies Filter (DF).

Statement of the Problem
This research aimed to answer the general problem: "What is the forecasted Household Final Consumption Expenditure on Education of the Philippines for the 1st quarter of 2019 to the last quarter of 2023?" Specifically, the study sought to answer the following questions:

Scope and Limitations
This study focused on determining the best hybrid model based on ARIMA and ANN for HFCE on Education and on forecasting several time steps ahead. The data used in this research were generated from the open data of the PSA and are quarterly, covering 1998 to 2018. No methods other than ARIMA and ANN were used in this study. Moreover, existing models proposed by previous researchers were used in modelling HFCE on Education.

Method
Four models were used to handle the data. The first is the classical linear ARIMA model, followed by the nonlinear ANN model. The researcher also utilized the Hybrid ARIMA-ANN model and the proposed model, in which the TS data are transformed by the Discrete Wavelet Transformation using DF and then applied to a Hybrid ARIMA-ANN model.

Data Gathering and Preparation Procedure
The dataset used in this research was obtained from the PSA. It is part of the reports for the National Accounts of the Philippines 2. The researcher generated the HFCE by Purpose data from these reports. MATLAB is a programming platform produced by MathWorks that is primarily used in computational mathematics. Aside from this function, MATLAB can also be used to analyze data, develop algorithms, and create models and applications 16.

Autoregressive Integrated Moving Average
ARIMA models are among the most common forecasting models for time series that can be made stationary by differencing. The acronym refers to three key aspects of the model. AR, denoted by p, stands for Autoregression and relates to the dependence between an observation and a number of lagged observations. I, denoted by d, stands for Integrated and is the number of times the raw observations are differenced to make the data stationary. MA, denoted by q, stands for Moving Average and describes the dependence between an observation and the residual errors of a moving average model applied to lagged observations 17. In terms of the differenced series y, the general forecasting equation for ARIMA is given by:
$$\hat{y}_t = \mu + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} - \theta_1 e_{t-1} - \dots - \theta_q e_{t-q}$$
Where: θ = moving average parameters, ϕ = autoregressive parameters, e = residuals, and μ = constant term.
The Box-Jenkins method is an iterative approach to stochastic modeling used to build ARIMA models. It has three steps: Identification, Estimation, and Diagnostic Checking. In identification, the data and all related information are used to select the sub-class of models that may best describe the data. In estimation, the data are used to train the parameters of the model. In diagnostic checking, the fitted model is evaluated and checked for parts that need improvement 18.

Artificial Neural Network
Artificial Neural Networks are computing systems inspired by biological neural networks and are a main tool in machine learning. An ANN is composed of an input layer, an output layer, and one or more hidden layers that transform the input data into a form that is useful to the output layer 19. Figure 2 illustrates an Artificial Neural Network that has two hidden layers.
An ANN has the following output:
$$y_t = \phi_0 + \sum_{j=1}^{m} \phi_j \, g\left(\theta_{0j} + \sum_{i=1}^{n} \theta_{ij} \, y_{t-i}\right) + \varepsilon_t$$
In this, $\phi_j$ and $\theta_{ij}$ are called the weights, $\phi_0$ and $\theta_{0j}$ are the bias terms, $g$ is the activation function of the hidden layer with $m$ hidden nodes and $n$ lagged inputs, and $\varepsilon_t$ is the white noise 6.
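As an illustration of how the weights and bias terms described above combine, the following is a minimal sketch of the forward pass of a single-hidden-layer network. It assumes the usual form $y_t = \phi_0 + \sum_j \phi_j \, g(\theta_{0j} + \sum_i \theta_{ij} y_{t-i})$ with a logistic activation $g$. The function name `ann_output`, the choice of activation, and all numeric values are assumptions made for this example and are not taken from the study, which used MATLAB.

```python
import numpy as np

def ann_output(lags, theta, theta0, phi, phi0):
    """One-step output of a single-hidden-layer feedforward network.

    lags   : shape (n,)   past values y_{t-1}, ..., y_{t-n}
    theta  : shape (n, m) input-to-hidden weights theta_ij
    theta0 : shape (m,)   hidden-node biases theta_0j
    phi    : shape (m,)   hidden-to-output weights phi_j
    phi0   : float        output bias phi_0
    """
    # Logistic activation g applied to each hidden node (an assumption).
    hidden = 1.0 / (1.0 + np.exp(-(theta0 + lags @ theta)))
    return phi0 + hidden @ phi

# Hypothetical weights, chosen only to exercise the function.
lags = np.array([0.5, -0.2])
theta = np.zeros((2, 3))
theta0 = np.zeros(3)
phi = np.array([1.0, 2.0, 3.0])
y_hat = ann_output(lags, theta, theta0, phi, phi0=0.1)
```

With all input weights set to zero, every hidden node outputs 0.5, so the result reduces to the output bias plus half the sum of the output weights. In practice the weights are estimated from training data rather than set by hand.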

Hybrid ARIMA-ANN
ARIMA and ANN are each successful in forecasting time series data within their own domains. ARIMA is widely known for its accuracy on data that are linear in nature, while ANN is popular for its performance on nonlinear data. However, not all data are purely linear or purely nonlinear, and most time series data combine both characteristics 8. Such data can be written as
$$y_t = L_t + N_t$$
where $L_t$ is the linear component of the time series $y_t$ and $N_t$ is the nonlinear component. Following the procedure of Zhang 8 for the Hybrid ARIMA-ANN, the time series $y_t$ is first fitted with an ARIMA model to obtain its linear prediction $\hat{L}_t$. The error series is the difference between the actual data and its linear prediction, and it is taken to contain the nonlinear component of the time series. That is,
$$e_t = y_t - \hat{L}_t$$
The error series is then modeled with an ANN, which is claimed to handle nonlinear data accurately. This gives the nonlinear prediction $\hat{N}_t$. The linear prediction $\hat{L}_t$ and the nonlinear prediction $\hat{N}_t$ are then added to obtain the forecast:
$$\hat{y}_t = \hat{L}_t + \hat{N}_t$$
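The additive procedure described above (fit a linear model, model its errors with a nonlinear model, then add the two predictions) can be sketched in a few lines. This is only an illustration under loud assumptions: a least-squares autoregression stands in for the ARIMA step, a small tanh network with random hidden weights and least-squares output weights stands in for the ANN step, and the series is synthetic. None of these choices come from the study, and the helper names `lag_matrix`, `fit_linear`, and `fit_nonlinear` are hypothetical.

```python
import numpy as np

def lag_matrix(x, p):
    """Rows [x[t-1], ..., x[t-p]] for t = p..len(x)-1, plus the targets x[p:]."""
    X = np.column_stack([x[p - k: len(x) - k] for k in range(1, p + 1)])
    return X, x[p:]

def fit_linear(y, p):
    """Least-squares AR(p) with intercept: a stand-in for the ARIMA step."""
    X, target = lag_matrix(y, p)
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return A @ coef, target            # linear prediction and aligned actual values

def fit_nonlinear(e, p, hidden=8, seed=0):
    """Random-feature tanh network on lagged errors: a stand-in for the ANN step."""
    X, target = lag_matrix(e, p)
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(p, hidden))
    b = rng.normal(size=hidden)
    H = np.column_stack([np.ones(len(X)), np.tanh(X @ W + b)])
    beta, *_ = np.linalg.lstsq(H, target, rcond=None)
    return H @ beta                    # nonlinear prediction of the error series

# Synthetic series (not the HFCE data): a linear part plus a nonlinear term.
rng = np.random.default_rng(42)
y = np.zeros(200)
for t in range(2, 200):
    y[t] = 0.6 * y[t - 1] - 0.2 * y[t - 2] + 0.3 * np.sin(3 * y[t - 1]) + 0.05 * rng.normal()

p = 2
L_hat, actual = fit_linear(y, p)       # linear prediction
e = actual - L_hat                     # error series: actual minus linear prediction
N_hat = fit_nonlinear(e, p)            # nonlinear prediction from lagged errors
y_hat = L_hat[p:] + N_hat              # hybrid prediction: linear plus nonlinear

mse_linear = np.mean((actual[p:] - L_hat[p:]) ** 2)
mse_hybrid = np.mean((actual[p:] - y_hat) ** 2)
```

Because the output weights of the second stage are fitted by least squares, the in-sample `mse_hybrid` cannot exceed `mse_linear` on the same observations. That is an in-sample property of this sketch only and says nothing about out-of-sample forecasts, which is what the study evaluates.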

Discrete Wavelet Transformation
Discrete wavelet transformation is often used in signal and image processing in different fields 20. In general, wavelets are functions generated from a single function ψ through dilations and translations. That is,
$$\psi_{a,b}(x) = |a|^{-1/2} \, \psi\left(\frac{x - b}{a}\right)$$
where $a \neq 0$ is the dilation parameter and $b$ is the translation parameter. The parent wavelet ψ must satisfy the condition $\int \psi(x)\,dx = 0$ 21. Since wavelets are dilates of one function, low frequency wavelets have a > 1, which corresponds to a wider wave, while a < 1 corresponds to high frequency wavelets, or narrower waves. Using DF in the Discrete Wavelet Transformation allows a straightforward implementation of DWT, since it is an improved version of existing wavelet transformation filters 22. Moreover, wavelet transformed data were found to produce more accurate time series forecasts than the original data 23.
In the proposed model, the TS data were decomposed into low and high frequency components using the Daubechies filter in DWT. The obtained components were then reconstructed using the inverse Discrete Wavelet Transformation 6. The resulting TS data were then used following the procedure of Zhang's Hybrid Model.
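To make the decomposition and reconstruction steps concrete, the following minimal sketch implements one level of the DWT with the 2-tap Daubechies filter, which coincides with the Haar filter, together with its inverse. The function names `haar_dwt` and `haar_idwt` and the sample values are assumptions for illustration, the sign convention of the detail coefficients varies between libraries, and the text does not state whether the components are modified before reconstruction, so the `smoothed` line is only one possible variant.

```python
import numpy as np

def haar_dwt(x):
    """Single-level DWT with the 2-tap Daubechies (Haar) filter; len(x) must be even."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low frequency component
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high frequency component
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed even and odd samples."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

# Made-up values, not the HFCE data.
series = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
approx, detail = haar_dwt(series)

# Reconstructing from both components returns the original series.
reconstructed = haar_idwt(approx, detail)

# Variant (an assumption): dropping the detail coefficients yields a smoothed series.
smoothed = haar_idwt(approx, np.zeros_like(detail))
```

Reconstruction from unmodified components is exact, so any change in the data passed to the hybrid model has to come from how the components are treated between the two steps.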

Results and Discussion
Four different time series forecasting methods were applied to the HFCE on Education data. To determine the accuracy of the models, the researcher utilized the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE) of the 1-year and 5-year forecasts of the different models with respect to the original data. Table 1 shows the descriptive measures of the HFCE on Education data. It reveals that the mean expenditure of Filipino households on education is PHP 60,892.18 million, while the median is PHP 54,855.00 million. The standard deviation of the expenditure on education from the 1st quarter of 1998 to the last quarter of 2018 is PHP 35,230.72 million. Figure 3 illustrates the graph of HFCE on Education. It clearly shows that the expenses increased from 1998 to 2018. Figures 4 and 5 suggest that an ARIMA$(16,1,12)\times(0,1,0)_4$ model is suited to the logarithm of the HFCE on Education data.
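In the seasonal ARIMA notation used above, the middle entry of each triple is an order of differencing: one regular difference, and one seasonal difference at a period of 4 quarters, both applied to the logarithm of the series. The sketch below shows these transformations and how they are inverted to return a result to the original scale. The helper names `difference` and `undifference` and the quarterly values are made up for illustration.

```python
import numpy as np

def difference(x, lag=1):
    """Return x[t] - x[t-lag] for t = lag..len(x)-1."""
    return x[lag:] - x[:-lag]

def undifference(dx, head, lag=1):
    """Invert `difference`, given the first `lag` values of the original series."""
    out = np.concatenate([head, np.zeros(len(dx))])
    for i in range(len(dx)):
        out[lag + i] = out[i] + dx[i]
    return out

# Made-up quarterly values with a seasonal pattern (not the HFCE data).
values = np.array([100., 140., 90., 120., 110., 155., 98., 131., 123., 170., 108., 144.])

z = np.log(values)             # logarithm of the series
d1 = difference(z, lag=1)      # regular difference (d = 1)
w = difference(d1, lag=4)      # seasonal difference at 4 quarters (D = 1)

# Undo the transformations in reverse order.
d1_back = undifference(w, d1[:4], lag=4)
z_back = undifference(d1_back, z[:1], lag=1)
values_back = np.exp(z_back)
```

Each difference shortens the series by its lag, so 12 observations leave 7 values for the AR and MA terms to model. The same inversion is what turns a forecast of the differenced log series back into a forecast in million PHP.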

ARIMA Model
To determine the goodness of fit of the model, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were used. Table 2 shows low values of AIC and BIC, which indicates a good fit for the model. Figure 6 demonstrates how the ARIMA model fits the actual data. It also shows the graph of the residuals from the ARIMA model.
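For reference, the two criteria are commonly defined as follows. The symbols are introduced here and do not appear in the source: $k$ is the number of estimated parameters, $n$ is the number of observations, and $\hat{\mathcal{L}}$ is the maximized value of the likelihood function.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{\mathcal{L}}, \qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{\mathcal{L}}
```

Both criteria reward a higher likelihood and penalize additional parameters, so lower values are preferred when comparing candidate models fitted to the same data. BIC penalizes each parameter more heavily than AIC whenever $\ln n > 2$, that is, for samples of eight or more observations.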

ANN Model
The researcher used the Deep Learning Toolbox of MATLAB to model the TS data with an ANN. In this model, the input y(t) is the logarithm of the HFCE on Education, which is the same data that was modeled using ARIMA. This ANN model has 4 hidden layers with 2 feedback delays and 1 output, as depicted in Figure 7.

Hybrid ARIMA-ANN
The researcher used the ARIMA$(16,1,12)\times(0,1,0)_4$ model that was initially fitted in this study. A forecast was made using the ARIMA model and the error series was taken. The error series was then modeled with the ANN model of 4 hidden layers with 2 feedback delays. Figure 9 shows the response output of the residual data in the ANN model. It clearly shows that most of the training and test outputs meet the target. The outcome of the model was then added to the results of the ARIMA model.

Daubechies Wavelet Hybrid ARIMA-ANN
The same procedures used for the Hybrid ARIMA-ANN model were applied to the proposed hybrid ARIMA-ANN. The difference is that the logarithm of the HFCE on Education data was first decomposed into high and low frequency components using the 2-tap Daubechies filter in DWT. It was then reconstructed using the inverse DWT to obtain a new set of data. This data was then used in the Hybrid ARIMA-ANN model.

Performance Assessment
To assess the performance of the DWT using DF on the Hybrid ARIMA-ANN model, the 1st quarter of 1998 up to the last quarter of 2013, or the first 64 quarters of HFCE on Education, was used as the training set. The results were tested using the 1st quarter of 2014 up to the last quarter of 2018, or the last 20 quarters of the data. The models were implemented using the Econometrics Toolbox and the Deep Learning Toolbox of MATLAB.
Table 3 shows the forecasting results of ARIMA, ANN, Hybrid ARIMA-ANN, and DWT using DF on Hybrid ARIMA-ANN for the 1-year and 5-year forecasts, in million PHP. The proposed model outperformed the three other models in both the 1-year and 5-year forecasts, with the lowest MSE, RMSE, and MAPE. The ANN model with 4 hidden nodes has the lowest performance, reflected by its high values of MSE, RMSE, and MAPE. Moreover, the ARIMA$(16,1,12)\times(0,1,0)_4$ model performs almost as well as the proposed hybrid model on the 1-year forecast and outperforms ANN and Hybrid ARIMA-ANN.
Figures 10 and 11 depict the forecasting accuracy of the ARIMA model in forecasting HFCE on Education for 1-year and 5-year horizons. The red line represents the forecast and the blue line represents the actual data. The results of the ARIMA model are close to the actual data. Figures 12 and 13 show the forecasting accuracy of the ANN model for the 1-year and 5-year forecasts. The graphs show that the model does not fit the actual data well. Figures 14 and 15 illustrate the 1-year and 5-year forecasts using the Hybrid ARIMA-ANN model. The red line, which depicts the forecast, is close to the actual data and follows the shape of the graph of HFCE on Education.
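The three accuracy measures used in this comparison can be computed as in the minimal sketch below. The function names and the sample values are made up for illustration, and MAPE is expressed here in percent, which is an assumption since the text does not state the scale of the reported values.

```python
import numpy as np

def mse(actual, forecast):
    """Mean Squared Error."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean((actual - forecast) ** 2)

def rmse(actual, forecast):
    """Root Mean Squared Error: the square root of the MSE."""
    return np.sqrt(mse(actual, forecast))

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent; actual values must be nonzero."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# Made-up values, not the study's data.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

example_mse = mse(actual, forecast)     # (10^2 + 10^2 + 0^2) / 3
example_rmse = rmse(actual, forecast)
example_mape = mape(actual, forecast)   # 100 * (0.10 + 0.05 + 0) / 3
```

Since RMSE is the square root of MSE, the two always rank models in the same order, while MAPE can rank them differently because it weights each error relative to the size of the actual value.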

Conclusion
Many studies have used ARIMA and ANN to model and forecast TS data since both have been shown to be effective and accurate. Yet, some aspects of TS data are not suitable for ARIMA or ANN alone, which is why hybrid models such as ARIMA-ANN were developed to improve results and produce better forecasts.
Several studies have found ARIMA-ANN to be accurate for TS forecasting. However, this research took the view that it may still be improved using DWT techniques. This research proposed a hybrid ARIMA-ANN model in which the TS data are first decomposed using the Discrete Wavelet Transformation, specifically with a Daubechies filter, and then reconstructed using the inverse Discrete Wavelet Transformation to obtain a new set of data. The results show that the proposed model has better performance in forecasting HFCE on Education than the ARIMA model, the ANN model, and the Hybrid ARIMA-ANN model. The DWT using DF on the Hybrid ARIMA-ANN model has the lowest MSE, RMSE, and MAPE for both the 1-year and 5-year forecasts.

Recommendations
TS forecasting is an important aspect of economics and plays a great role in the development and planning of an economy. Innovations in forecasting TS data, such as the use of the Discrete Wavelet Transformation with a Daubechies filter on the Hybrid ARIMA-ANN, may help many people who have an interest in TS forecasting. With these results, the researcher recommends that future work apply different hybrid models and Discrete Wavelet Transformation filters, particularly to economic TS data.
With the results of the proposed model, the researcher recommends that the education sector prepare and develop plans that may help families cope with the forecasted expenses.