Coronary artery disease prediction using hidden Markov model based support vector machine

Background: Medical data classification has become an active research domain in data mining, but it still faces the challenge of improving classification accuracy. Methods/Statistical Analysis: A novel Hidden Markov Model based Support Vector Machine (HMM-SVM) is proposed to classify and predict Coronary Artery Disease (CAD). Features are extracted using the HMM, normalized, and then classified using the SVM; feature extraction helps the classification algorithm obtain better results. HMM-SVM performs classification by extracting the features of the Z-AlizadehSani dataset and finally selects the appropriate features for classification. Findings: The Z-AlizadehSani dataset holds 303 records with four different groups of features: demographic, symptom and examination, ECG, and laboratory and echo. No common algorithm exists for extracting these features and finding the hidden information. In HMM-SVM, the HMM is applied to extract features by finding the hidden and previous state values, and the SVM is applied to classify the extracted features. Benchmark performance metrics are used to analyze the performance of HMM-SVM. The discriminative performance in internal validation is high for the binary classification task (sensitivity 98.2%; specificity 97.96%). The false positive rate of HMM-SVM is very low (1.87%) compared with previous algorithms. HMM-SVM achieves a classification accuracy of 98.02%, which is the better and expected result for the prediction of CAD. Novelty: Detailed analysis indicates that HMM-SVM performs better in classifying and predicting CAD. Furthermore, care must be taken to adhere to ethical principles while using automated models. Future studies should make use of bio-inspired concepts to obtain even better results.


Introduction
Coronary artery disease (CAD) is one of the many heart diseases that can lead to sudden death without any symptoms, and its incidence is increasing in South Asian countries such as India. The first and most common sign of CAD is a heart attack, and its root cause is plaque that builds up in the artery walls. Knowledge discovery is an efficient process to analyze and understand the enormous amount of available data; it involves identifying valid, novel, and potentially useful patterns in the data. Data mining is the application of machine learning to datasets to extract hidden, unknown, and useful patterns, and it is seen as a key step in discovering knowledge in vast data.
Angiography is the most reliable method to detect CAD, but it involves considerable cost and expertise. Because of limitations like this, researchers have started applying data mining techniques in the medical field to support decisions. Predictive data mining is a better computing method to build a model using the features in a dataset. Such models range from simple to complex, and the persuasive ones involve the Hidden Markov Model (HMM). The main objective of this paper is to make use of the HMM for identifying CAD and to compare its classification accuracy with other classification techniques.
In order to enhance the prediction of heart disease from Parkinson's disease, an optimized crow search algorithm (1) was proposed that attempted to predict heart disease more accurately in order to provide on-time treatment. The results showed that the algorithm is not fit for heart disease datasets, where its classification accuracy becomes very low. A two-class classification framework (2) was proposed using machine learning with an artificial neural network classifier. The classifier works by selecting sub-band spectral features. The results show that the classifier could not perform well when the noise in the data exceeds a notable range, where the false positives increase. A robust algorithm (3) was proposed to localize and classify heart beats using variables and threshold values. During the classification process, dataset labels were used to find variations in heart beats. Localization errors in the results show that the algorithm is not suitable for large datasets. A multistage classification (4) was proposed to classify and diagnose patients with congestive heart failure. It performs the analysis based on heart rate variation and computes heart-rate-related features in the time and frequency domains. The result was ineffective regarding true negatives, and so cannot be used to provide treatments or medications to patients.
An automated classifier based on the support vector machine (5) was proposed to classify electrocardiograms for predicting heart disease. It depends on the timing of the electrocardiograms to train the support vector machine for feature selection. The results showed that the classification accuracy went down due to the feature selection concept, where the classifier omitted features important for classification. A modified version of ant colony optimization (6) was proposed to increase the classification accuracy of coronary artery disease prediction using a least-squares regression model. Correlation coefficients were calculated to check the fitness level between the selected features, but the result came with low classification accuracy. A deep learning strategy (7) was proposed for the classification of ECG towards heart disease prediction, with the target of classifying automatically. The results were not efficient compared with existing algorithms in terms of sensitivity. A treadmill-based advanced prediction method (8) was proposed to predict coronary artery disease. Instead of the traditional statistical approach, a data mining strategy was used, in which decision tree, k-nearest neighbor, and k-sorting and searching algorithms were ensembled. Feature selection was also performed to increase the accuracy, but the F-measure value decreased.
An imbalanced data classifier (9) was proposed to predict heart disease even when there are missing values in the dataset. The Synthetic Minority Over-sampling Technique was applied to increase the performance of the classifier, but it proved unsuitable for all datasets, giving a poor F-measure. A hybrid classifier (10), an ensemble of a neural network and a genetic algorithm, was proposed for the classification of coronary artery disease. The neural network ran first and then the genetic algorithm was applied. The result showed that this hybrid classifier was not fit for the prediction of coronary artery disease, as the results came with very low classification accuracy. A mobile health service platform (11) was proposed to analyze and classify heart sounds in order to predict heart disease. It aims to monitor patients from remote locations using wireless technology, and is built by integrating the Hidden Markov Model and Mel Frequency Cepstral Coefficients. The platform, however, produced inaccurate results that varied across different mobile devices.
A disease gene classification method (12) was proposed using a metagraph representation that integrates the terms describing proteins. The result showed that the method is not suitable to predict heart disease by genes alone, as it came with increased false positives and false negatives. A deep learning based convolutional neural network (13) was proposed to classify heartbeats and predict the severity of heart disease. It used a batch-based weighted loss function to measure the loss and overcome the imbalance problem between classes, but the accuracy became low due to the dynamic change of classes and batches. A computer-aided diagnosis system (14) was proposed to predict valvular heart disease using impedance cardiography signals. It selects features using the support vector machine and k-nearest neighbors algorithms. The result came with very low true positives, which affected the accuracy. An Extreme Gradient Boosting based classifier (15) was proposed to detect heart disease by analyzing electrocardiogram signals. It extracts features from six broad categories and finds the best features using recursive feature elimination. During the feature extraction phase, however, important features were discarded, leading to misclassification.

Proposed Methodology
With the increasing development of the medical field, classification has become a major means to handle and organize medical data, and the medical classification task can be formulated as follows. Consider a training set S of labeled records, where s_j denotes each record contained in S, and each s_j has a label b_j that belongs to B = {b_1, . . . , b_n}. The main intention of classification is to develop a learning algorithm that takes the training set S as input and generates a classifier g : E → B that can classify records from S accurately. Although many adequate classification methods have been proposed, the support vector machine (SVM), which belongs to the binary classification family, is identified as the better classifier. This paper combines the HMM with the SVM to classify medical data for the prediction of CAD. First, the HMM is applied to extract features and generate outputs that carry discriminative information. Second, the outputs of the HMM are normalized into new feature vectors. Finally, the new feature vectors are fed as input to the SVM for classification.

HMM based Feature Extraction
This part discusses the selection of features and of a better technique for classification. In medical data classification, the term values involved are used for classifying the features, but there is no guarantee of better classification accuracy when all values and features are involved. Hence, there is a need to eliminate information that is not useful for classification.
Consider P = (P_1, . . . , P_s) as a sequence of medical data from the dataset, where each P_s denotes the token corresponding to the medical data. Precisely, every token acts as a token for the extracted vectors. Linguistic tags Z_j are added to selected tokens P_s, and feature extraction maps P_1, . . . , P_s to a unique sequence of tags. An HMM λ = (π, C, D) consists of a finite number of states {T_1, . . . , T_m}. The initial probability of state T_j is π_j = Q(f_1 = T_j), and the transition probability from T_j to T_k is c_jk = Q(f_{s+1} = T_k | f_s = T_j). Each state in the HMM is characterized over the observations by the probability distribution d_j(P_s) = Q(P_s | f_s = T_j). P = P_1, . . . , P_s is considered an observation sequence. According to Bayes' theorem, for every observation P_s it is necessary to return the tag Z_j that maximizes the probability Q(τ_s = Z_j | P). This means identifying the states f_1, . . . , f_s that maximize Q(f_s = T_j | P, λ), which returns the tag Z_j corresponding to the state T_j for the token P_s.
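For concreteness, the components of such an HMM λ = (π, C, D) can be sketched as numpy arrays. The state count, symbol count, and all probability values below are illustrative only, not taken from the paper:

```python
import numpy as np

# A toy HMM lambda = (pi, C, D) with m = 2 hidden states.
pi = np.array([0.6, 0.4])            # pi_j = Q(f_1 = T_j): initial state probabilities
C = np.array([[0.7, 0.3],            # c_jk = Q(f_{s+1} = T_k | f_s = T_j): transitions
              [0.4, 0.6]])
D = np.array([[0.5, 0.4, 0.1],       # d_j(P_s) = Q(P_s | f_s = T_j): emission
              [0.1, 0.3, 0.6]])      # probabilities over 3 observation symbols

# Each distribution must sum to 1 to be a valid probability distribution.
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(C.sum(axis=1), 1.0)
assert np.allclose(D.sum(axis=1), 1.0)
```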

HMM's forward-backward method
Forward-Backward (FoBa) is an inference method for HMMs that calculates the posterior values of all hidden-state variables. α_s(j) = Q(f_s = T_j, P_1, . . . , P_s | λ) is the forward variable, which quantifies the probability of reaching state T_j at time s having observed the sequence P_1, . . . , P_s. β_s(j) = Q(P_{s+1}, . . . , P_S | f_s = T_j, λ) is the backward variable, which quantifies the probability of observing the rest of the sequence P_{s+1}, . . . , P_S from state T_j at time s. This inference task is called smoothing. FoBa uses the dynamic programming principle to calculate the posterior distributions in two passes: the first pass proceeds forward in time and the second pass proceeds backward. FoBa is also used to denote any algorithm in this class that operates in such a two-pass manner. Calculating α_s(j) and β_s(j) is necessary to express the probability of being in state T_j at time s.
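The two FoBa passes can be sketched as follows; the toy parameters and observation sequence are illustrative only, not from the paper:

```python
import numpy as np

def forward_backward(pi, C, D, obs):
    """Compute the forward variables alpha_s(j) and backward variables
    beta_s(j) for an observation sequence obs (a list of symbol indices)."""
    S, m = len(obs), len(pi)
    alpha = np.zeros((S, m))
    beta = np.zeros((S, m))
    # Forward pass: alpha_s(j) = Q(f_s = T_j, P_1..P_s | lambda)
    alpha[0] = pi * D[:, obs[0]]
    for s in range(1, S):
        alpha[s] = (alpha[s - 1] @ C) * D[:, obs[s]]
    # Backward pass: beta_s(j) = Q(P_{s+1}..P_S | f_s = T_j, lambda)
    beta[-1] = 1.0
    for s in range(S - 2, -1, -1):
        beta[s] = C @ (D[:, obs[s + 1]] * beta[s + 1])
    return alpha, beta

pi = np.array([0.6, 0.4])
C = np.array([[0.7, 0.3], [0.4, 0.6]])
D = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
alpha, beta = forward_backward(pi, C, D, [0, 2, 1])
# The sequence likelihood Q(P | lambda) is recoverable from either pass.
print(alpha[-1].sum())
```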
Finally, the features are extracted using HMM as below: Generate the sequence P = (P 1 , . . . P s ), where P s denotes the vector that contains the detection V s .
When the feature extraction process gets over, the outputs are normalized to form a new feature vector, and then SVM classifier is applied for classification.
The FoBa algorithm starts with some initial estimate of the HMM parameters λ = (A, B) and then iteratively runs two steps. Like other instances of the EM (expectation-maximization) algorithm, the forward-backward algorithm has two steps: the expectation step, or E-step, and the maximization step, or M-step.
The Expectation-Maximization (EM) algorithm is a method to discover maximum-likelihood estimates of model parameters when the dataset is incomplete, has missing data points, or involves unobserved latent variables. It is an iterative method that approximates the maximum-likelihood solution, and it is used precisely when direct maximum-likelihood estimation cannot find the "best fit" model because part of the data is unobserved.
In the E-step, we calculate the expected state occupancy count γ and the expected state transition count ξ from the current A and B probabilities. In the M-step, we use γ and ξ to recalculate new A and B probabilities.
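One full E-step/M-step iteration on a toy model can be sketched as follows; all parameter values and the observation sequence are illustrative, not taken from the paper:

```python
import numpy as np

# Toy parameters (illustrative values only).
pi = np.array([0.6, 0.4])
C = np.array([[0.7, 0.3], [0.4, 0.6]])
D = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
obs = np.array([0, 2, 1])
S, m = len(obs), len(pi)

# Forward/backward variables, as computed by FoBa.
alpha = np.zeros((S, m)); beta = np.zeros((S, m))
alpha[0] = pi * D[:, obs[0]]
for s in range(1, S):
    alpha[s] = (alpha[s - 1] @ C) * D[:, obs[s]]
beta[-1] = 1.0
for s in range(S - 2, -1, -1):
    beta[s] = C @ (D[:, obs[s + 1]] * beta[s + 1])
like = alpha[-1].sum()

# E-step: expected state occupancy gamma_s(j) and expected transitions xi_s(j,k).
gamma = alpha * beta / like
xi = np.array([(alpha[s][:, None] * C * D[:, obs[s + 1]] * beta[s + 1]) / like
               for s in range(S - 1)])

# M-step: re-estimate pi, C and D from the expected counts.
pi_new = gamma[0]
C_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
D_new = np.vstack([gamma[obs == k].sum(axis=0) for k in range(D.shape[1])]).T
D_new = D_new / gamma.sum(axis=0)[:, None]
```

The re-estimated parameters remain valid probability distributions, which is a quick sanity check on the update.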
function FoBa(observations of len T, output vocabulary V, hidden state set Q) returns HMM = (A, B)
    initialize A and B
    iterate until convergence
        E-step: compute γ and ξ for all t, i and j
        M-step: re-estimate A and B from γ and ξ
    return A, B

SVM Classifier
SVM is one of the best classification algorithms in the supervised learning category. Its working mechanism is based on the risk minimization concept derived from computational learning theory. The strong generalization capability of the SVM makes it well suited for datasets with a high number of features, and multiple studies (16,17) have shown that the SVM can outperform other classification algorithms. Consider a collection of samples {(z_1, x_1), (z_2, x_2), . . . , (z_l, x_l)} where z_j ∈ R^m and x_j ∈ {−1, +1}. Let the decision function be sgn((W · χ) + b), where (W · χ) denotes the inner product of W and χ. Hence, the decision function g_{W,b} has the properties shown in Eq. (1).

In most cases, a separating hyperplane does not exist. To allow violations of Eq. (1), slack variables ξ_j ≥ 0, j = 1, . . . , l, must be introduced. The SVM problem can then be stated as the constrained quadratic programming problem in Eq. (2):

minimize φ(W, ξ) = (1/2)||W||^2 + B Σ_j ξ_j subject to x_j((W · z_j) + b) ≥ 1 − ξ_j, ξ_j ≥ 0 (2)

The minimization problem in Eq. (2) can be formulated as the convex constrained quadratic programming problem shown in Eq. (3):

maximize Σ_j α_j − (1/2) Σ_{j,k} α_j α_k x_j x_k (z_j · z_k) subject to 0 ≤ α_j ≤ B, Σ_j α_j x_j = 0 (3)

where the α_j are Lagrange multipliers and the parameter B assigns a penalty for misclassification. Solving Eq. (3) yields the decision function shown in Eq. (4):

g(z) = sgn(Σ_j α_j x_j (z_j · z) + d) (4)

where d is the bias term. Only a tiny fraction of the coefficients α_j is nonzero; the corresponding pairs of entries are treated as support vectors, and they define the decision function. Eq. (4) generalizes to the nonlinear case by mapping the problem data into a higher-dimensional feature space H, which is achieved by transforming (z_j · z_k) into φ(z_j) · φ(z_k). The mapping function is completely defined by a kernel function K(z_j, z_k) = φ(z_j) · φ(z_k). Hence, the decision function can be redrafted as

g(z) = sgn(Σ_j α_j x_j K(z_j, z) + d) (5)
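The decision function sgn((W · z) + b) can be illustrated with a hand-picked weight vector and bias; in practice W and b come from solving the quadratic program, not from values chosen by hand as here:

```python
import numpy as np

# Hypothetical 2-D weight vector and bias (illustrative only; a trained
# SVM would produce these by solving the optimization problem).
W = np.array([0.8, -0.5])
b = -0.2

def decide(z):
    """SVM decision function g_{W,b}(z) = sgn((W . z) + b) in {-1, +1}."""
    return 1 if np.dot(W, z) + b >= 0 else -1

print(decide(np.array([1.0, 0.5])))   # W.z + b = 0.35, so the output is +1
print(decide(np.array([-1.0, 1.0])))  # W.z + b = -1.5, so the output is -1
```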

Combined HMM with SVM
This section discusses the proposed new approach for classifying medical data, which ensembles the HMM and the SVM.
The newly obtained feature vectors are normalized using ||f||_2, which is essential for SVM classification. So far, HMM feature extraction has been discussed; the remainder of this section discusses the learning of the HMM parameters λ used in generating the set P = {P_1, P_2, . . . , P_L}. To resolve the issues in generating P = {P_1, P_2, . . . , P_L}, the Baum-Welch concept is utilized; the parameter estimation is shown in Eq. (6) and Eq. (7), where the joint event is denoted ε_s^l(j, k) and the state variable is denoted γ_s^l(j). ε_s^l(j, k) and γ_s^l(j) are associated with the l-th of the L observation sequences.
Consider a pair of new feature vectors received from the output of the HMM. The SVM task is to (i) classify in a binary manner and (ii) build a classifier B_jk for every distinct pair of classes j and k. Positive labels are allocated to the j-th class and negative labels to the k-th class. The classification decision function is given by Eq. (8), where Λ represents the total count of the j-th and k-th classes in the training data. When an unknown sample is given as input, if the decision function predicts the sample as class j, the classifier B_jk provides one vote for class j; otherwise the vote goes to class k. After all the votes from the classifiers are received, the unknown sample is assigned to the class with the most votes.
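The overall normalize-then-classify stage can be sketched as follows, assuming scikit-learn is available. The random vectors below are stand-ins for the HMM-derived features, not data from the Z-AlizadehSani dataset:

```python
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.svm import SVC

# Stand-in feature vectors: in the proposed method these would be the
# HMM outputs for each record, not the synthetic values used here.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1, (20, 5)),   # hypothetical "Normal" records
               rng.normal(+3, 1, (20, 5))])  # hypothetical "CAD" records
y = np.array([0] * 20 + [1] * 20)

X_norm = normalize(X, norm="l2")   # ||f||_2 normalization of each feature vector
clf = SVC(kernel="rbf").fit(X_norm, y)
print(clf.score(X_norm, y))        # training accuracy on the toy data
```

For a two-class problem like CAD vs. Normal, only one classifier B_jk exists, so the pairwise voting scheme reduces to a single decision.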

Z-Alizadehsani Dataset
The Z-Alizadehsani dataset holds the records of 303 patients with 54 features. Each feature in the dataset is considered an indicator of a CAD symptom based on medical history; some particular features are not used in diagnosing CAD by data mining. The features are categorized into 4 groups: (a) demographic, (b) symptom and examination, (c) ECG, and (d) laboratory and echo. Every individual patient falls into one of two categories, Normal or CAD: if the diameter narrowing is greater than or equal to 50%, the patient is treated as a CAD patient; otherwise the patient is treated as normal. A few features are utilized to confirm the history of (a) hypertension, (b) diabetes mellitus, (c) current smoking, (d) former cigarette smoking, and (e) heart disease in the patient's first-degree relatives.
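The labeling rule above can be expressed directly; the function and parameter names are ours, not from the dataset documentation:

```python
def label_patient(diameter_narrowing_pct):
    """Z-Alizadehsani labeling rule: a patient whose vessel diameter
    narrowing is >= 50% is labeled CAD, otherwise Normal."""
    return "CAD" if diameter_narrowing_pct >= 50 else "Normal"

print(label_patient(70))   # CAD
print(label_patient(30))   # Normal
```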

About MATLAB
Matlab R2013a is used to evaluate the proposed work against the baseline schemes. Matlab acts as a major tool in business analytics, text mining, medical image mining, and most machine learning based algorithms. Because of its ease of use and user friendliness, it is utilized in many engineering applications, in education and research, and in the development of new tools. Matlab is built with a huge set of inbuilt mathematical functions, which aim to resolve scientific problems, and it is mostly used to design, explore, and solve many iteration-based problems. The inbuilt applications and tools available in Matlab make the design of predictive models accurate and rapid.

Performance Measures
Specificity, Sensitivity, and Accuracy are the default performance measures used in data mining for validation; the true and false positive rates are especially significant in the medical field. Precision, Recall, and F-Measure are also considered to check the performance of classifiers. The confusion matrix is a special type of table that visualizes an algorithm's performance. Considering a two-class problem (Class 1 and Class 2), the matrix has two rows and two columns, which count the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). These measures (7) are defined as follows: TP - Class 1 samples accurately classified as Class 1. TN - Class 2 samples accurately classified as Class 2. FN - Class 1 samples inaccurately classified as Class 2. FP - Class 2 samples inaccurately classified as Class 1. Using the above-mentioned measures, this research work calculates the result values using the benchmark performance metrics defined below:
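These metrics can be computed directly from the four confusion-matrix counts; the counts in the example call are illustrative only, not the paper's results:

```python
def classification_metrics(tp, tn, fp, fn):
    """Benchmark performance metrics computed from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)           # recall / true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    fpr = fp / (fp + tn)                   # false positive rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return dict(sensitivity=sensitivity, specificity=specificity, fpr=fpr,
                accuracy=accuracy, precision=precision, f_measure=f_measure)

# Illustrative counts only (not the paper's confusion matrix).
m = classification_metrics(tp=90, tn=95, fp=5, fn=10)
print(round(m["accuracy"], 3))   # 0.925
```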

Results and Discussion
This section discusses the results obtained by HMM-SVM against the previous methods, namely SMO (16), ACO-SVM (17) and CPCA-SVM (18). In Figures 1, 2, 3, 4, 5 and 6, the X-axis is plotted with the corresponding metrics and the Y-axis with the result values in percentage. Figure 1 compares the performance of HMM-SVM in predicting TP and TN against SMO (16), ACO-SVM (17) and CPCA-SVM (18). It is clear that the performance of HMM-SVM is better than the other methods: HMM-SVM performs the classification based on selected features, while the other methods perform classification by considering the records of the dataset sequentially. The corresponding result values of Figure 1 are shown in Table 1. Figure 2 demonstrates that HMM-SVM performs better than the other considered methods. SMO (16) clearly has the worst performance, providing the most FP and FN. HMM-SVM also produces FP and FN, but far fewer than SMO (16), ACO-SVM (17) and CPCA-SVM (18); because it considers the hidden state values, HMM-SVM has reduced FP and FN. Figure 3 compares the Sensitivity and Specificity results of HMM-SVM against SMO (16), ACO-SVM (17) and CPCA-SVM (18). HMM-SVM clearly gives enhanced results: the FoBa process in HMM-SVM considers the previous state values when producing the results, hence the results are better than those of SMO (16), ACO-SVM (17) and CPCA-SVM (18).

Sensitivity and specificity analysis
Figure 4 illustrates the Positive Rate comparison of HMM-SVM against SMO (16), ACO-SVM (17) and CPCA-SVM (18). It is evident that HMM-SVM achieves better results than the other methods: utilizing the newly derived features assists HMM-SVM in providing enhanced results, while the other methods simply consider all features. Figure 5 compares the Precision and Recall results of HMM-SVM, SMO (16), ACO-SVM (17) and CPCA-SVM (18). Expectation-maximization in HMM-SVM provides a way to give better results than the other methods; by not giving priority to the expected results while processing the classification, SMO (16), ACO-SVM (17) and CPCA-SVM (18) obtain poor precision and recall. Figure 6 shows that HMM-SVM outperforms SMO (16), ACO-SVM (17) and CPCA-SVM (18), which indicates that HMM-SVM performs better in predicting CAD. This is because HMM-SVM gives importance to the hidden state values and considers the features effectively, while the other methods, which do not consider the hidden state values, fall short in accuracy and F-measure.

Conclusion
A novel classification approach ensembling the HMM and SVM methods for medical data classification has been proposed; its core contribution is to combine the HMM and the SVM to solve the issues that arise during classification. The proposed classifier is evaluated on the Z-AlizadehSani dataset for classification accuracy in predicting coronary artery disease, and the results show that it outperforms the baseline schemes. Future enhancement of this research work can focus on bio-inspired optimization based classification, which can achieve further improved classification accuracy.