An Efficient Time Series Analysis for Pharmaceutical Sector Stock Prediction by Applying Hybridization of Data Mining and Neural Network Technique

Objectives: The nonlinearity of the stock market is widely accepted, and the most effective techniques for revealing it have proved to be those built on either data mining or neural networks. The pharmaceutical sector is growing rapidly in the Bangladeshi stock market. The objective of this paper is to investigate whether a hybridization of data mining and neural network techniques can be applied to predict stock prices for the pharmaceutical sector of the Dhaka Stock Exchange (DSE). Methods/Analysis: This study uses daily trade data for the pharmaceutical sector of DSE and analyses the behaviour of the daily average price. Six top listed pharmaceutical companies were selected for the analysis, and the time frame of the research is 15 years (2000-2015). The analysis is performed in two stages: the first stage applies k-means clustering, a data mining method, to discover the stock with the most useful pattern, and the second stage applies a Nonlinear Autoregressive with Exogenous Input (NARX) neural network to predict the closing price of the selected stock. Findings: The prediction performance of the hybridized technique was evaluated, and a positive improvement in prediction performance was observed, which is very encouraging for investors. The research also indicates that the hybridization of data mining and neural network techniques can support stock investment decisions for the pharmaceutical sector of DSE, although many other kinds of information have a greater influence on the stock price. Novelty/Improvement: We intend to apply data mining and an optimized neural network to stock market prediction. We would like to work with the parameters and learning of the neural network to achieve better results. We will further investigate the effect of various factors, viz. dollar price, gold price, FDI, bank interest rate, etc., on stock price and index movement.

Das Debashish1,2, Sadiq Ali Safa1 and A. Noraziah1
1Faculty of Computer Systems and Software Engineering, University Malaysia Pahang, Kuantan, Pahang Darul Makmur, Malaysia; debashish.das@apu.edu.my, alisafasadiq@ump.edu.my, noraziah@ump.edu.my.
2Faculty of Computing, Engineering and Technology, Asia Pacific University of Technology and Innovation, TPM, Bukit Jalil, Kuala Lumpur 57000, Malaysia.


Introduction
The stock market plays a vital role in the economy, as it is one of the most important sources for companies to raise money. A business can be publicly traded through the stock market, and extra capital for expansion can be raised by selling shares of the company in the public market. The market allows investors to buy and sell securities easily and quickly. Hence, it is an attractive option for investors compared to other, less liquid investments. Share price is an important factor in the dynamics of the economy, as it can influence the social mood.
The Bangladesh stock market consists of two exchanges, namely the Dhaka Stock Exchange (DSE) and the Chittagong Stock Exchange (CSE). DSE is the more mature and larger market of the two. DSE was first incorporated as the East Pakistan Stock Exchange Association Limited on April 28, 1954, and was renamed Dhaka Stock Exchange (DSE) Limited on May 14, 1964. DSE trading activities resumed in 1976, after liberation, and at that time only 9 companies were listed. Currently, 260 companies are listed on DSE [Source: www.dsebd.org] across 18 sectors, i.e. Bank, Ceramics, Cement, Engineering, Financial Institutions, Food and Allied, Fuel and Power, Insurance, IT, Jute, Miscellaneous, Paper and Printing, Pharmaceuticals and Chemicals, Services and Real Estates, Tannery Industries, Telecommunication, Textile, Travel and Leisure.
The pharmaceutical industry has developed in Bangladesh over a short period of time, and 25 companies from this sector are listed on DSE [Source: www.dsebd.org]. Usually, pharmaceutical companies invest huge amounts of money in research and development before going into production, but in Bangladesh very few companies have invested in research, and most are small firms that operate not by innovating drugs but by copying drugs from others. Some companies in the pharmaceutical sector produce medicine of international standard and export significant amounts of medicine worldwide. Hence, instead of considering all companies in the pharmaceutical sector, in this research we have selected 6 reputed companies, namely ACI, BXPHARMA, GLAXOSMITH, RECKITBEN, RENATA, and SQURPHARMA.
Predicting the stock market is a very complicated task, as the market does not follow any specific process. Many factors are involved in stock market fluctuation: economic conditions, the political situation, traders' expectations, catastrophes and other unexpected events are some of the major causes. The number of factors involved makes stock market prediction even more complicated. Data mining and neural network techniques have been introduced to overcome this complexity and have been applied to ease prediction [1].
Data mining is a method that can extract unknown predictive information from large databases, and it is a well-known technology with great potential to help organizations focus on the most important information in their data repositories [2][3][4][5]. Clustering classifies a large data set into smaller groups whose members share the same characteristics. It is also a basic unit of classification of initially unclassified data based on common properties [6].
An artificial neural network is a parallel computing device consisting of many interconnected simple processors. Neural networks are a significant method for stock prediction, as they can deal with fuzzy, uncertain and insufficient data that may fluctuate swiftly over a very short span [7]. Researchers have carried out numerous studies on the applications of neural networks and their effectiveness over traditional methods. An investigation of the most common fields of application of neural networks [8] concludes that 53.5% of applications are in production/operations and 25.4% in finance. Among the different financial applications, stock and performance prediction is the most common. This paper analyses the application of k-means clustering and the Nonlinear Autoregressive with Exogenous Input (NARX) technique to predicting stock prices for companies of the pharmaceutical sector of the Dhaka Stock Exchange (DSE). It also addresses some of the advantages and limitations of the methods and of stock prediction, after applying the analysis and methodology to those companies, with respect to problem domains, the data model and results criteria.
The dynamics and non-linearity of the stock market make prediction challenging for researchers. Moreover, the influence of various internal and external factors on stock price movement makes prediction even more challenging. Investors can obtain recommendations from a stock prediction system, which matters because choosing the right stock is the most crucial part of stock investment. Hence, stock prediction is an important research dimension that can steer investors away from choosing the wrong stock and help them avoid huge losses. The body of literature discussed here covers the areas relevant to stock market prediction.
Stock market forecasting has received widespread attention from researchers for about two decades. However, most researchers have considered macro-economic variables and stock market index returns to compute the future trend. The outcome of that research was non-linear data. Our research considers predicting future stock values by applying a hybridization of data mining and neural network techniques. It aims to feed historical data to the data mining algorithm to categorize the organizations and then pass the data to the neural network, where the data and a mathematical formula train the network for better prediction.
Extracting required data from a large database is known as data mining, which is also called knowledge discovery. Data mining combines statistics and artificial intelligence (AI) with a database management system (DBMS) to process huge collections of data. A neural network processes information through a mathematical model and is very popular among researchers for its accuracy in prediction. Numerous studies have been carried out in these two fields owing to the efficiency of these techniques.
Various methods have been applied to predict the stock market. S. P. Deshpande et al. have described various applications of data mining in decision making: diagnosis in medical science, bankruptcy prediction, player selection in sports, product recommendation, prediction for stock holders and investors, etc. [9]. It has been demonstrated that the nonlinearity of financial data can be revealed by neural networks [10]. It has been concluded that technical analysis, fundamental analysis and linear regression are not consistent in making predictions with few errors [11]. Neural networks have been found to be significant for stock prediction, as they can deal with data that is fuzzy, uncertain, insufficient or rapidly fluctuating [4]. An attempt has been made to predict the stock price of ACI pharmaceutical by applying a multilayer feed-forward network trained with back propagation [12]. The authors achieved good prediction accuracy for the pharmaceutical sector of the Bangladesh stock market using an Artificial Neural Network (ANN). However, they recommended using more input data to achieve more accurate results.

Data Mining with K-means Clustering
K-means clustering is a simple clustering technique used in medical, biometric, financial prediction and other related fields. It clusters observations into groups of related observations without any prior knowledge of those relationships. A large database can be processed quickly and an optimal cluster can be produced faster through k-means clustering. Function (1) expresses the k-means clustering process:

$$P(Y, Q) = \sum_{l=1}^{k} \sum_{i=1}^{n} y_{il} \, d(X_i, Q_l) \tag{1}$$

In the above function, $n$ is the number of objects in a dataset $X$, $X_i$ is the $i$-th object, $Q_l$ is the mean of cluster $l$, $y_{il}$ is an element of a partition matrix $Y_{n \times k}$, and $d$ is a similarity measure (squared Euclidean distance).

In the k-means algorithm, k is provided as input and the observations are clustered into k groups. The algorithm assigns each observation to a cluster based on the observation's proximity to the mean of the cluster. The mean of each cluster is then recomputed and the process is repeated. The steps of the algorithm are as follows:
• K points are selected arbitrarily as the initial cluster centers ("means").
• Each point in the dataset is assigned to the closest cluster, as per the squared Euclidean distance between the point and each cluster center.
• Each cluster center is recomputed as the average of the points in that cluster.
• Steps 2 and 3 are repeated until the clusters converge, where convergence takes place when no observations change clusters when steps 2 and 3 are repeated.
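The four steps of the algorithm can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the implementation used in this study; the function names `k_means` and `squared_euclidean`, the fixed random seed and the iteration cap are assumptions made for the example.

```python
import random


def squared_euclidean(p, q):
    """Squared Euclidean distance d between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of tuples) into k groups.

    Returns (assignment, centers), where assignment[i] is the cluster
    index of points[i]. `seed` and `max_iter` are illustrative choices.
    """
    rng = random.Random(seed)
    # Step 1: pick k points arbitrarily as the initial cluster centers.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each point to the closest center.
        new_assignment = [
            min(range(k), key=lambda l: squared_euclidean(p, centers[l]))
            for p in points
        ]
        # Step 4: stop when no observation changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recompute each center as the mean of its members.
        for l in range(k):
            members = [p for p, a in zip(points, assignment) if a == l]
            if members:  # an empty cluster keeps its previous center
                centers[l] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centers
```

For example, `k_means([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` separates the three points near the origin from the three points near (10, 10).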

Nonlinear Auto Regressive with Exogenous Input (NARX) Neural Network
Nonlinear Autoregressive with Exogenous Input (NARX) neural networks can be used to predict a time series from a given past series through appropriate training. A NARX network can predict a series y(t) given d past values of y(t) and of another series x(t) [13]. Here, a neural network is a collection of interconnected computing units known as neurons that can send signals to and receive signals from each other.
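To make the role of the d past values concrete, the sketch below builds the input vectors a NARX-style model would be trained on: for each time step t it collects the d previous values of x and of y and pairs them with the target y(t). The helper name `narx_regressors` is hypothetical, and the paper does not state which series plays the role of x(t); any exogenous series of the same length could be used.

```python
def narx_regressors(x, y, d):
    """Build (inputs, targets) for a NARX-style predictor.

    inputs[i]  = [x(t-1), ..., x(t-d), y(t-1), ..., y(t-d)]
    targets[i] = y(t), for t = d, ..., len(y) - 1
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    inputs, targets = [], []
    for t in range(d, len(y)):
        past_x = [x[t - j] for j in range(1, d + 1)]
        past_y = [y[t - j] for j in range(1, d + 1)]
        inputs.append(past_x + past_y)
        targets.append(y[t])
    return inputs, targets
```

With d = 2, the first usable target is y(2), because two earlier values of each series are needed to form its input vector.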

Training a Neural Network
Training a neural network is the most important task, as the network's performance depends heavily on training. The data used for training needs to be representative of the task to be performed. A lot of experimentation is needed to reach an acceptable result, as training is not a process that can be fully planned in advance. The following need to be considered when designing a neural network solution:
• Identify an appropriate network model
• Choose a network topology
• Decide the learning parameters
Data needs to be preprocessed before it is used to design a network. Preprocessing can be performed by various complex statistical processes or simply by scaling the data between 0 and 1. In fact, it can be performed better with adequate domain knowledge relevant to the task and good neural network engineering concepts.
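Scaling data between 0 and 1 is commonly done with min-max normalization. The following is a minimal sketch of that idea; the function name `min_max_scale` is an assumption, and the paper does not specify which preprocessing was actually applied.

```python
def min_max_scale(values):
    """Linearly rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For instance, `min_max_scale([10, 15, 20])` maps the smallest price to 0, the largest to 1 and the middle value to 0.5.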
Data is divided into two subsets, namely training and testing, where the actual performance of the network is determined by testing. Testing can also help to identify poor performance resulting from training. Careful management of testing helps the network to generalize, and good generalization is demonstrated on data that is similar to, but different from, the training data [14].
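For time series data, one simple way to obtain the two subsets is a chronological split, so that the test data always comes after the training data. The sketch below assumes an 80/20 split; that fraction and the name `chronological_split` are illustrative choices, not values reported in this paper.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered list into (training, testing) without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]
```

Keeping the order intact avoids evaluating the network on observations that precede the ones it was trained on.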

Prediction through NARX
Prediction is a form of dynamic filtering in which future values are predicted from a past series. Dynamic predictive models play a role in the analysis, simulation, monitoring and control of various systems, such as financial, manufacturing, chemical, robotics and aerospace systems. NARX can be used for stock prediction through proper training on a given past series.

System Architecture
The nonlinearity of a stock market may be revealed by a neural network [15]. The literature review has identified and reviewed stock prediction using data mining and neural network techniques. A rising stock market like that of Bangladesh, and especially its pharmaceutical sector, one of the most important sectors of the market, has not been addressed much by previous research applying data mining and neural network techniques.
We have collected pharmaceutical sector stock data from DSE covering about 15 years. The k-means clustering algorithm is then applied to identify the growth category of each company's shares based on price, volume and turnover. First, we divide the data into two clusters, namely highly growing and average growing, by applying the k-means clustering algorithm to the trade volume and the fluctuation between the high and low prices of the shares. Then, we use a Nonlinear Autoregressive with Exogenous Input (NARX) neural network to predict [16] the closing share prices of the organization selected through clustering. At this stage, investors can make their investment decision regarding the organization. It has been observed that the combination of k-means and NAR can generate more useful item-sets from large market data [17]. Figure 1 depicts the proposed architecture of the hybridized prediction system.
The pharmaceutical data collected from DSE cannot be used directly, so it is preprocessed for k-means clustering; Table 1 shows sample data.

Results and Analysis
The research is performed in two stages: in the first stage, the stock with a useful pattern is selected through k-means clustering, and in the second stage, the closing price of that stock is predicted with Nonlinear Autoregressive with Exogenous Input (NARX). The result of each stage is analyzed below.

Result through k-means Clustering
Through the k-means clustering algorithm, the stocks with a very useful pattern are placed in one cluster and the stocks with an average useful pattern in another. The stocks are clustered on their daily trading features, i.e. high price, low price, average price, closing price, trade volume and turnover. Table 2 demonstrates the clustering process. It has been observed that ACI has a consistently useful pattern overall, and hence it has been selected for stock prediction through NARX, as the investor has a higher possibility of gain through this stock.

Result through NARX Neural Network Model
The intraday closing price of ACI for 15 years has been selected, and NARX is then applied to predict the closing price. We have selected a time series method for stock prediction because time series methods are a better option for cash forecasting. Table 3 demonstrates the classification of the data selected for prediction through NARX.

Architecture of the Neural Network Model
The architecture of the model is demonstrated through Figure 2.

Regression
Figure 3 plots four regressions that describe the nonlinear relationships in the experimental data; they show the network output with respect to the actual data (target) for the training, validation, test and complete data sets. For a perfect fit, the data should fall along a 45 degree line, where the network outputs are equal to the targets. Here, most data fall along the 45 degree line and the R value of every plot is above 0.99, very close to 1. Hence, the regression fit is reasonably good for all data sets.
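The R value of such a regression plot can be computed as follows, assuming that R denotes the Pearson correlation coefficient between the network outputs and the targets (the paper does not define R explicitly). The function name `r_value` is hypothetical.

```python
import math


def r_value(targets, outputs):
    """Pearson correlation coefficient between targets and network outputs.

    Assumes both sequences have the same length and non-zero variance.
    """
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    return cov / math.sqrt(var_t * var_o)
```

A value close to 1 means the outputs follow the targets almost linearly, which is what points lying along the 45 degree line indicate.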

Forecasting Results
The final predicted closing prices of ACI generated through the NARX neural network are listed in Table 4, which compares the actual price with the predicted price.

Performance Evaluation of the Predicted Result
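The performance of our proposed model is measured through Mean Absolute Percentage Error (MAPE), Mean Absolute Deviation (MAD) and Root Mean Squared Error (RMSE). MAPE, MAD and RMSE are calculated through equations (3), (4) and (5):

$$\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{a_i - p_i}{a_i} \right| \tag{3}$$

$$\mathrm{MAD} = \frac{1}{n} \sum_{i=1}^{n} \left| a_i - p_i \right| \tag{4}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (a_i - p_i)^2} \tag{5}$$

where $(a_1, a_2, a_3, \ldots, a_n)$ are the actual values and $(p_1, p_2, p_3, \ldots, p_n)$ are the predicted values. The performance evaluation is listed in Table 5.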
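The three error measures can be computed directly from the lists of actual and predicted values. The sketch below uses the standard definitions of MAPE (as a percentage), MAD and RMSE; the function names are assumptions, and the numbers in the usage note are made up for illustration and are not results from this study.

```python
import math


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent. Actual values must be non-zero."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def mad(actual, predicted):
    """Mean Absolute Deviation between actual and predicted values."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def rmse(actual, predicted):
    """Root Mean Squared Error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
```

With illustrative actual prices `[100, 200]` and predictions `[110, 190]`, MAD and RMSE are both 10, while MAPE is about 7.5 percent because the same absolute error weighs less against the larger price.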
Through this research, we have attempted to forecast the intraday closing price of ACI on the Dhaka Stock Exchange. We have used 15 years of data for prediction through the neural network. The performance evaluation indicates a positive performance improvement through the created network, which is very encouraging for this research work and can guide investors when investing in a particular stock.

Conclusion
The pharmaceutical sector is a very important sector of DSE, with many well-reputed companies listed. We plan to investigate the possibility of creating a generalized prediction system applicable not only to the pharmaceutical sector but also to other important sectors and to major global stock markets. We also plan to apply the same technique to predict not only stock prices but also various stock indices, so that individual and institutional investors can obtain helpful guidance for stock investment.
In future, our goal is to apply data mining and an optimized neural network to stock market prediction. We plan to work with the parameters and learning of the neural network to achieve better results. We also plan to investigate the effect of various factors, viz. dollar price, gold price, FDI, bank interest rate, etc., on stock price and index movement.

Acknowledgement
This paper was made possible through the help and support of many people. In particular, we would like to express our gratitude to the most significant contributors. We would like to thank UMP and APU for their support and encouragement. We would also like to thank the DSE library for providing the data. Finally, we sincerely thank our parents, family and friends, who provided advice and support. This paper would not have been possible without all of them.

Figures and Tables

Figure 2. Architecture of the neural network Model.

The number of hidden neurons in this architecture is 10 and the number of delays (d) is 2. Equation (2) defines the NARX problem for stock prediction:

$$y(t) = f\big(x(t-1), \ldots, x(t-d),\; y(t-1), \ldots, y(t-d)\big) \tag{2}$$

The architecture has the following steps:
a) Number of Hidden Neurons
b) Number of Delays


Figure 3.

Figure 1. Architecture of the proposed hybridized Model.

Table 3. Classification of Data Sets

Table 1. Typical data used for k-means clustering.

Vol 9 (21) | June 2016 | www.indjst.org

The prime goal of stock research is to help investors obtain a good return on stock investment. The hybridization of data mining and neural network techniques has proved to have potential for making stock predictions for the pharmaceutical sector. However, prediction for a stock market like DSE is a very complicated task, as the stock trend does not always follow a rule; many external factors, i.e. bank interest rate, foreign direct investment (FDI), gold rate, dollar rate, political turbulence and gambling, play their role as well.

Table 5. Performance Evaluation of hybridization technique

Table 4. Actual Value vs. Predicted Value of ACI