Multi-Tier Hybrid Feature Selection by Combining Filter and Wrapper for Subset Feature Selection in Cancer Classification

Objectives: We implement a bio-inspired particle-swarm-based method of miRNA subset selection to identify irrelevant and redundant miRNAs for proper assessment in cancer diagnosis. Methods/Statistical Analysis: In this study we develop a multi-tier framework for subset selection to improve the accuracy of cancer classification. In the first tier we use different filter methods to rank miRNAs according to their relation to the class; then, using a union operator, we create a combinational model (second tier) which consists of the top-ranked features of the individual filter methods. Here the miRNAs are identified according to their ranking against a defined threshold value. In the third tier (feature pre-selection model) an improvised competitive swarm optimization (ICSO) algorithm is used to generate a feasible optimal subset from the weighted miRNAs of the second tier, to detect biomarker genes for cancer detection. To minimize the gap between exploration and exploitation we use a Mamdani fuzzy inference system. All genes selected in the fourth tier (feature reselection) are classified with a classifier such as KNN. Findings: The objective has been successfully achieved by implementing the improvised competitive swarm optimization technique. Experimental results demonstrate that the proposed ICSO-KNN performs better than other methods such as PSO, PCA and PSO-KNN; ICSO-KNN yields lower error and a larger number of new solutions. Application/Improvements: The four-tier framework is an efficient feature selection algorithm that outperforms the compared methods. This approach may help in using any other metaheuristic feature selection to solve the multimodal subset problem.
Keywords: Filter, Mamdani, Wrapper, ICSO, KNN


Introduction
MiRNAs have an important impact on the growth of cancer in the human body. MiRNAs are small RNAs, present in the human body in numbers exceeding a million, but not all of them are responsible for the growth of such a dangerous disease.
So identification of the irrelevant ones is a tough task due to the high dimension of the data. To improve classification accuracy we propose a new method for better classification of cancer, because identification of the related subset of cancer genes, called biomarker genes, matters for the better treatment of the patient. For that reason we consider subset feature selection as a preprocessing technique for the classification of cancer. If irrelevant and redundant miRNA features are present, the applied classification algorithm incurs greater time and space complexity on such high-dimensional data. To reduce this, we should consciously apply different preprocessing techniques to remove such features, or minimize their number, in the dataset. When choosing preprocessing techniques, even if we remove only a small number of miRNAs, the informative genes should not be lost. After preprocessing, the data are available with a smaller number of miRNAs for better extraction of biomarkers. A feature subset algorithm is the best way to remove irrelevant and redundant miRNAs 1 . Different subset selection algorithms have been proposed; exhaustive search is one, which evaluates each and every subset and finds the best feature set. The disadvantage of exhaustive search is its exponential time complexity [2][3] . So we can assume that this searching technique is suitable for small and medium-sized datasets but not for high-dimensional datasets. For high-dimensional data we can consider approximate feature selection techniques such as filter, wrapper and embedded methods. The filter method can be used in the preprocessing phase to select high-rank genes independently of a learning algorithm.
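A filter method of the kind described above can be sketched in a few lines. The snippet below is an illustrative, self-contained example (not the paper's implementation): it ranks discrete features by their mutual information with the class label, so that an informative feature outranks an irrelevant one. The toy feature values and labels are invented for the example.

```python
from collections import Counter
from math import log2

def mutual_information(feature, labels):
    """I(X;Y) = sum over (x,y) of p(x,y) * log2( p(x,y) / (p(x) p(y)) )."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

# toy data: feature 0 matches the class perfectly; feature 1 is pure noise
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # perfectly informative
    [0, 1, 0, 1, 0, 1, 0, 1],  # irrelevant
]
scores = [mutual_information(f, labels) for f in features]
ranked = sorted(range(len(features)), key=lambda i: -scores[i])
```

A real filter would apply this ranking to thousands of miRNAs and keep only the top-ranked ones; no classifier is trained at any point, which is what makes the filter step cheap.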
In wrapping, the learning algorithm is wrapped with a search algorithm to find a high-performance feature subset. There are two basic types of wrapper algorithms: sequential selection algorithms and metaheuristic algorithms 4 . A sequential selection algorithm adds or removes features until it reaches the maximum value of the objective function. Sequential forward selection and sequential backward selection are the two different sequential methods 5 . A metaheuristic algorithm evaluates different subsets based on optimization of the objective function [6][7][8] . Different algorithms are metaheuristic in nature, such as Particle Swarm Optimization (PSO) 9 , Genetic Algorithms (GA) 10 , Ant Colony Optimization (ACO) 11 , and Competitive Swarm Optimization (CSO) [12][13] . For continuous-space problems PSO is the first option. Two types of PSO, continuous PSO and binary PSO, have been proposed for various feature selection problems, and it has been shown that CPSO performs better [14][15][16] . Selecting feature subsets with a filter algorithm is computationally efficient, but it suffers from the feature interaction problem: the value of a selected feature depends on its interaction with other features. The interactions between features in a miRNA dataset may be of various orders, such as two-way, three-way and multi-way.
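Sequential forward selection, mentioned above, can be sketched as a greedy loop. The example below is a minimal, self-contained illustration under an assumed toy scoring function (a stand-in for a wrapper's classifier accuracy); the feature names and score are invented for the example.

```python
def sequential_forward_selection(features, score):
    """Greedily add the feature that most improves `score`; stop when no gain."""
    selected = []
    best = score(selected)
    improved = True
    while improved:
        improved = False
        best_f = None
        for f in (f for f in features if f not in selected):
            s = score(selected + [f])
            if s > best:
                best, best_f, improved = s, f, True
        if improved:
            selected.append(best_f)
    return selected, best

# toy score: reward picking 'f1' and 'f2', with a small per-feature penalty
score = lambda S: len(set(S) & {'f1', 'f2'}) - 0.1 * len(S)
selected, best = sequential_forward_selection(['f1', 'f2', 'f3'], score)
```

Sequential backward selection is the mirror image: start from the full set and greedily drop the feature whose removal hurts the score least.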
It is noteworthy that the importance of a miRNA varies when the feature is combined with another.
In 1989 researchers started solving feature subset selection with metaheuristic algorithms, but due to the large dimension of the data this approach was not practical until around 2007. Most researchers used GA as the metaheuristic algorithm for selecting near-optimal feature subsets. In 17 , a genetic algorithm with multiple populations was proposed for feature subset selection, where neighbors share their knowledge (solutions). A hybrid GA was developed in 18 in combination with a local search algorithm. A hybrid genetic local search algorithm was proposed in 19 , combining KNN with feature weighting. To improve classification accuracy, GASVM, a hybridized model, was designed in 20 . Similar research on subset feature selection with GA was described in [21][22] . For feature subset selection different metaheuristic algorithms are used, but particle swarm optimization (binary PSO and continuous PSO) has been used by many researchers. In continuous PSO, a feature whose value exceeds the threshold λ is selected; otherwise the feature is not selected. To improve classification accuracy, a hybrid PSO-SVM model was proposed in 23 for feature selection. To solve feature selection problems, 24-26 used a combination of chaotic maps with BPSO. To obtain the highest-ranked genes and to avoid premature convergence, [27][28][29] proposed advanced BPSO algorithms by adjusting the local and global optima. Similar research on subset feature selection with PSO has also been described elsewhere. A hybrid of Ant Colony Optimization (ACO) and Artificial Bee Colony (ABC) was proposed in 30 to identify the best ant of the colony by exploiting the bees, where each bee considers the food source found by the ants. A hybrid of ABC and DE (Differential Evolution) was proposed in 31 for best feature selection. A novel graph-representation ACO was given in 32 , in which features are represented as nodes in a graph.
Of the two nodes for a feature, one represents selecting the feature and the other represents removing it. An ACO algorithm was proposed in 33 where the fitness of an individual ant is defined by its classification accuracy.
From the above literature survey we understand that many researchers treat this as a multi-objective optimization problem, aiming to reduce both the classification error rate and the number of features. In the last four years, PSO with multi-objective feature subsets has been used by many researchers. A multi-objective ACO algorithm was proposed in 34 , where ACO is used to reduce both the classification error rate and the number of features as described above. Multi-objective feature selection with classification accuracy has also been optimized using DE [35][36] . From the literature study we find that none of the methods provides optimal feature subsets on high-dimensional datasets, so the above approaches are advisable for low- and medium-dimensional data. We also conclude that most of the approaches are two-tier.
In the first tier, different filter algorithms are used to find the best features and a ranking approach is applied; in the second tier, a wrapper approach is applied to the selected high-ranked features. In 37-38 , two-tier approaches are used for generating optimal feature subsets from different datasets.
Vol 12 (3) | January 2019 | www.indjst.org

Fundamental Concept of Applied Optimization Technique
In this section, we explain the concept of competitive swarm optimization.

CSO Algorithm
Many researchers have extended PSO, but these approaches are unable to solve large-scale optimization problems. The performance enhancement of most existing PSO methods is based on complex algorithmic machinery, which considerably increases the computational complexity. Existing PSO techniques try to alter the gbest and pbest values, which constrains performance enhancement on large-scale optimization. In 9 , the author introduced the CSO method as an effective solution to the large-scale optimization problem, in which particles learn from randomly chosen competitors without considering gbest and pbest. In each step the swarm is randomly partitioned into two groups, and a pairwise competition is performed between the groups. After every competition the winner passes directly to the next iteration, while the loser updates its velocity and position by learning from the winner:

v_los(itc + 1) = R1(itc) ⊗ v_los(itc) + R2(itc) ⊗ (x_win(itc) − x_los(itc)) + φ R3(itc) ⊗ (x̄(itc) − x_los(itc))
x_los(itc + 1) = x_los(itc) + v_los(itc + 1)

where itc is the iteration counter; R1(itc), R2(itc), R3(itc) are three randomly generated vectors within [0, 1]^n; x_win(itc) and x_los(itc) denote the winner particle and the loser particle, respectively; x̄(itc) denotes the mean position of the current swarm in iteration itc; and φ controls the influence of the mean position.
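One CSO iteration as described above can be sketched directly from the update rule. The following is a minimal pure-Python illustration, not the authors' implementation; the sphere function and all parameter values are chosen only for the demo.

```python
import random

def cso_step(positions, velocities, fitness, phi=0.1):
    """One CSO iteration: pair particles at random; each loser learns from its winner."""
    n, dim = len(positions), len(positions[0])
    # mean position of the current swarm (x-bar in the update rule)
    mean_pos = [sum(p[d] for p in positions) / n for d in range(dim)]
    idx = list(range(n))
    random.shuffle(idx)
    for a, b in zip(idx[::2], idx[1::2]):
        win, los = (a, b) if fitness(positions[a]) <= fitness(positions[b]) else (b, a)
        for d in range(dim):
            r1, r2, r3 = random.random(), random.random(), random.random()
            velocities[los][d] = (r1 * velocities[los][d]
                                  + r2 * (positions[win][d] - positions[los][d])
                                  + phi * r3 * (mean_pos[d] - positions[los][d]))
            positions[los][d] += velocities[los][d]

# toy usage: minimize the sphere function (illustrative only)
random.seed(1)
swarm = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(20)]
vels = [[0.0] * 3 for _ in range(20)]
sphere = lambda x: sum(v * v for v in x)
best_before = min(sphere(p) for p in swarm)
for _ in range(50):
    cso_step(swarm, vels, sphere)
best_after = min(sphere(p) for p in swarm)
```

Because winners keep their position unchanged, the best fitness in the swarm can never worsen from one iteration to the next, which is the property the original CSO paper relies on.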

CSO Algorithm Pseudocode
1. itc = 0
2. for all particles p_i(itc) in swarm P(itc) do
3.     initialize the position x_i(itc) and velocity v_i(itc)
4. end for
5. while the termination criterion is not met do
6.     for all particles p_i(itc) do
7.         calculate fitness f(x_i(itc))
8.     end for
9.     while P(itc) ≠ Ø do
10.        randomly choose two particles p_r1(itc) and p_r2(itc) from P(itc)
       ...
       end while
       itc = itc + 1
   end while

In the two-tier approach the ranked feature pool (obtained by ranking) does not consider the importance of the wrapper. In tier 1 of the proposed model (Preliminary Screening) we apply different filter approaches for subset feature selection and keep the best-ranked features of each. In tier 2 (Combinational Model), from the classification point of view, we combine all high-ranked features of the individual selection methods applied in tier 1. In tier 3 (Feature Pre-selection Model) we apply the improvised competitive swarm optimization algorithm on the current swarm to achieve the best optimized result. To our knowledge, no such method has been applied by any researcher using feature ranks obtained from an ensemble filter together with a competitive swarm optimization algorithm. In this study we use a multi-tier subset selection technique combining multiple filter and multiple wrapper techniques to carry out a comparative study of cancer classification accuracy. Experimental analysis indicates that the proposed multi-tier technique provides better performance in terms of minimization of miRNAs, error rate, and classification accuracy. The overall concept of this paper is summarized below.
A multi-tier hybrid feature selection is proposed. In tier 1 of the proposed model, different filters such as Mutual Information Feature Selection (MIFS) 39 , Joint Mutual Information (JMI) 40 , Max-Relevance Min-Redundancy (MRMR) 41 , Interaction Capping (ICAP) 42 , Conditional Infomax Feature Extraction (CIFE) [43][44] , and Double Input Symmetrical Relevance (DISR) are used to generate the miRNA pool by detecting the top 10% ranked features of each and combining them with a union operator. In tier 2 we calculate the weight of each miRNA based on its rank; the main purpose of tier 2 is a filter algorithm that calculates the weight vector. In tier 3 of the proposed model we implement a wrapper algorithm to find the optimal features that provide better information about the disease. As we know, top-ranked features may provide a better opportunity to identify biomarker genes from large miRNA datasets. As per the literature survey, only a few studies combine an ensemble filter with a wrapper using a weighted-rank approach. The layout of this research article is as follows: the proposed model and the experimental analysis of the proposed method are detailed in sections 2 and 3, respectively; finally, sections 4 and 5 present the experimental analysis and the conclusions drawn from this research work.
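The tier-1 pooling step, taking the top-ranked features of each filter and merging them with a union operator, is a set union. The sketch below is illustrative only; the miRNA names and the top-2 cutoff are invented stand-ins for the paper's top-10% threshold.

```python
def top_k(ranking, k):
    """Top-k entries of one filter's ranked list, as a set."""
    return set(ranking[:k])

# hypothetical ranked lists from three filters (names are illustrative)
mifs = ['miR-21', 'miR-155', 'miR-10b']
jmi  = ['miR-155', 'miR-200c', 'miR-21']
mrmr = ['miR-10b', 'miR-21', 'miR-141']

# union operator over the top-ranked features of each filter
pool = top_k(mifs, 2) | top_k(jmi, 2) | top_k(mrmr, 2)
```

The union (rather than intersection) keeps any miRNA that at least one filter rates highly, so the downstream wrapper still sees features that a single filter would have missed.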

The Proposed Approach
Algorithm 1 gives the overall description of the proposed model. The individual stages are described below.

Tier 1: Preliminary Screening
1. Using different filter approaches, find the best-featured miRNAs. 2. The best features are identified using a ranking approach.

Tier 2: Combinational Model
1. Using the union operator, combine the features selected by the different filter methods. 2. Apply ranker algorithms to predict the weight of each miRNA (threshold vector).

Tier 4: Feature Reselection
We have considered K-Nearest Neighbors (K-NN) and SVM as classification algorithms. K-NN is instance-based machine learning, where k stands for the number of nearest training instances considered (Figure 1).

Tier-1: Preliminary Screening
From the literature survey, different researchers have applied filter approaches with a ranker for best feature subset selection, but a given filter may perform poorly on another dataset. Due to lack of knowledge about the dataset, it is difficult to choose the best ranking-based filter for a certain dataset. In this situation one could use a single filter-based ranker for feature subset selection, with trial-and-error runs to find a better filter algorithm 45 . As feature subset selection is a computationally expensive problem, this may suffer from high resource consumption.
To prevent the above disadvantage and to reduce the variability of miRNA rankings, we propose a more effective and robust filter algorithm for miRNA selection which integrates several well-known existing algorithms as an ensemble 46 .
Here we apply an ensemble that chooses the top-ranked miRNAs of each filter.

Combinational Model
Tier 2 of the proposed model is used to find the weight vector of the different features of the considered dataset. The ranked features of the different rankers are combined by taking the mean: the score of feature d is the mean of its ranking scores over the selected miRNA ranking lists. Using the min-max method 47 , we normalize the miRNA ranking values to the range 0.1 to 0.2.
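The min-max normalization step can be written in one small function. This is a generic sketch of min-max scaling to the 0.1-0.2 range mentioned above, not the paper's code; the input scores are invented.

```python
def min_max_scale(values, lo=0.1, hi=0.2):
    """Linearly rescale values so that min(values) -> lo and max(values) -> hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # degenerate case: all scores equal, map everything to the lower bound
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

# toy mean-rank scores for three miRNAs
scaled = min_max_scale([2.0, 4.0, 6.0])
```

Mapping into a narrow positive band such as [0.1, 0.2] keeps every weight strictly positive, so no miRNA in the pool is zeroed out before the wrapper stage.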

Feature Pre Selection Model
In the pre-selection model we use the competitive swarm optimization algorithm; during execution of the proposed algorithm, the values of all particles are kept within [0, 1], and the proposed swarm technique performs its search in the continuous space.
For this algorithm we consider a threshold parameter for mapping a solution in the continuous space to a binary miRNA selection. Mamdani fuzzy inference is used to manage the balance between exploration and exploitation by controlling the value of r 1 . Here we propose a Mamdani inference system with 2 inputs and 1 output [48][49][50][51][52] . The normalized current iteration (Itc) and the normalized diversity of the swarm in the decision space are the two inputs of the fuzzy system. The iteration input lets the search explore the space at the beginning and converge as iterations elapse. Itc for the fuzzy balancer is evaluated as the ratio of the number of elapsed iterations to the maximum number of iterations, so Itc ranges over (0, 1). The diversity is calculated with respect to the variance of the miRNAs.
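A variance-based swarm diversity of the kind described above can be computed directly. The exact formula was lost in extraction, so the sketch below uses one common definition, the mean per-dimension variance of particle positions, as an assumed stand-in.

```python
def swarm_diversity(positions):
    """Mean per-dimension variance of particle positions; 0 when the swarm has collapsed."""
    n, dim = len(positions), len(positions[0])
    total = 0.0
    for d in range(dim):
        mean = sum(p[d] for p in positions) / n
        total += sum((p[d] - mean) ** 2 for p in positions) / n
    return total / dim
```

With particle values bounded in [0, 1] this quantity stays bounded as well, so it can be normalized and fed to the fuzzy balancer alongside Itc.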

Tier 3: Feature Pre Selection Model
Here n is the number of miRNAs, N is the number of particles, and the variance is taken over the particles in each dimension dim. We consider the upper bound to be 1 and the lower bound to be 0; the diversity approaches the lower bound when the particles converge to a single solution. Like the iteration input, the diversity value also ranges over (0, 1). The following rule base is proposed to calculate r 1 for the fuzzy inference system, with output values set as VH = 1, High = 0.5, Med = 0.2, Low = 0.1, and Very Low (VL) = 0.01. The cost evaluation function plays a major role in generating the optimal solution of a given problem. In our research we consider K-Nearest Neighbors (K-NN) and SVM as classification algorithms; K-NN is instance-based machine learning, where k stands for the number of nearest training instances considered.
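A fuzzy balancer of the kind described above can be sketched with triangular memberships and weighted-average defuzzification (a common Sugeno-style simplification of Mamdani defuzzification). The output singletons are the rule values given in the text; the rule table itself is an assumption for illustration (explore early, exploit late), since the paper's exact rule base is not recoverable here.

```python
def tri(x, a, b, c):
    """Triangular membership function with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def grades(x):
    """Membership grades of a normalized input in Low / Med / High."""
    return {'L': tri(x, -0.5, 0.0, 0.5),
            'M': tri(x, 0.0, 0.5, 1.0),
            'H': tri(x, 0.5, 1.0, 1.5)}

# output singletons from the text: VH=1, High=0.5, Med=0.2, Low=0.1, VL=0.01
OUT = {'VH': 1.0, 'H': 0.5, 'M': 0.2, 'L': 0.1, 'VL': 0.01}

# assumed rule base (iteration level, diversity level) -> r1 level
RULES = {('L', 'L'): 'VH', ('L', 'M'): 'H', ('L', 'H'): 'H',
         ('M', 'L'): 'H',  ('M', 'M'): 'M', ('M', 'H'): 'M',
         ('H', 'L'): 'L',  ('H', 'M'): 'L', ('H', 'H'): 'VL'}

def fuzzy_r1(it_norm, div_norm):
    """Fire all rules (AND = min) and defuzzify by weighted average."""
    gi, gd = grades(it_norm), grades(div_norm)
    num = den = 0.0
    for (li, ld), out in RULES.items():
        w = min(gi[li], gd[ld])
        num += w * OUT[out]
        den += w
    return num / den if den > 0 else OUT['M']
```

Under this rule base, early iterations yield a large r 1 (exploration) and late iterations a small one (exploitation), which matches the balancing behavior the text describes.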

The maximum number of fitness function evaluations is set to 20000. We also set c1 and c2 to 1.49 and w to 0.7928. As per the requirements, we consider the following parameter sets for program execution: the threshold parameter λ is set to 0.6 for Xue's algorithm, and to 0.5 for canonical PSO and original PSO, respectively. The parameters for the other experiments are taken from the original works of the different authors. To obtain statistical results we run each algorithm 40 times independently (Table 1).

Result Study
The accuracy of a classifier depends on various criteria, but the main goal is to maximize generalization capability (high accuracy and low error rate). To reduce the overfitting problem we consider the average performance of all algorithms. For this we use the Wilcoxon rank-sum test, where the symbols "+", "-", "≈" represent inferior to, superior to, or equal to among the algorithms presented in Table 2.
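The rank-sum comparison used above can be sketched without external libraries. The snippet below is a minimal illustration of the Wilcoxon rank-sum (Mann-Whitney) test with the normal approximation, average ranks for ties, and no tie or continuity correction, so it is a teaching sketch rather than a full statistical implementation.

```python
from math import sqrt, erf

def rank_sum_test(a, b):
    """Wilcoxon rank-sum test via the normal approximation; returns (z, two-sided p)."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 'a') for v in a] + [(v, 'b') for v in b])
    # assign average ranks to tied values
    rank_of = [0.0] * (n1 + n2)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg = (i + 1 + j) / 2  # average of 1-based ranks i+1 .. j
        for t in range(i, j):
            rank_of[t] = avg
        i = j
    w = sum(rank_of[t] for t in range(len(pooled)) if pooled[t][1] == 'a')
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided p-value
    return z, p
```

In practice each algorithm's 40 independent accuracy values would be fed in as the two samples, and the sign of z decides which of "+", "-", "≈" to report.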
It is noteworthy that our proposed algorithm (ICSO-KNN) performs well in terms of statistical misclassification elimination (low) compared with the others. Low-ranked features (miRNAs) are eliminated by the searching process, and only the high-ranked ones are identified. Table 3 presents the average positive predictive values of the various algorithms. Table 4 presents the statistical number of identified miRNAs compared with different miRNA subset selection algorithms. The proposed algorithm performs better because the searching methodology reduces irrelevant miRNAs. From the table analysis it is clearly understood that the performance of PSO depends on the number of miRNAs initialized during the first generation, but the proposed methodology is not concerned with the initialization and uses a near-optimal miRNA feature subset. Table 5 presents the frequently selected miRNAs. As we mainly focus on the ensemble-based filter technique, we also compare each individual filter approach with KNN against the proposed method; the results are presented in Table 6.

Tier 4: Feature Reselection
For classification we consider two well-known classification techniques, such as K-NN. To evaluate the threshold cost for K-NN with 10-fold cross validation, we follow the algorithm below.

Input Parameters
Input: vector x and threshold vector
Cost = Null
For d = 1 to n do
    If (distance of the particle > threshold value) then
        Cost = Cost + distance
    End if
End for
Evaluate the cost for K-NN with 10-fold cross validation.
Output: the cost vector.
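The K-NN 10-fold cross-validation evaluation used in the cost function can be sketched in pure Python. This is an illustrative stand-in, not the paper's Matlab code: a small k-NN classifier with interleaved folds, returning the misclassification rate; the toy two-cluster dataset is invented.

```python
def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (squared Euclidean)."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(t, x)), y)
                   for t, y in zip(train_X, train_y))
    votes = [y for _, y in dists[:k]]
    return max(set(votes), key=votes.count)

def cv_error(X, y, k=3, folds=10):
    """Misclassification rate of k-NN under simple interleaved k-fold CV."""
    n = len(X)
    errors = 0
    for i in range(folds):
        test_idx = set(range(i, n, folds))
        tr_X = [X[j] for j in range(n) if j not in test_idx]
        tr_y = [y[j] for j in range(n) if j not in test_idx]
        for j in test_idx:
            if knn_predict(tr_X, tr_y, X[j], k) != y[j]:
                errors += 1
    return errors / n

# toy data: two well-separated clusters, so CV error should be zero
X, y = [], []
for i in range(10):
    X.append((i * 0.01, 0.0)); y.append(0)
    X.append((10 + i * 0.01, 10.0)); y.append(1)
err = cv_error(X, y)
```

In the wrapper, 1 minus this error (or the error itself) would be the fitness term that the threshold-cost algorithm above accumulates for each candidate subset.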

Experimental Analysis
In this section we evaluate the accuracy and effectiveness of the proposed algorithm in reducing the classification error rate and minimizing the number of miRNA features. We present the numerical results of the proposed algorithm in comparison with different miRNA subset selection algorithms.

Dataset and Experimental Execution Settings
For the experimental study we consider two cancer datasets, lung and melanoma, from the Gene Expression Omnibus (GEO), which are publicly available and easily downloadable 53 . We have not performed any preprocessing on these datasets. We split each dataset into two parts, a training set and a test set, with a 70-30% ratio.
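The 70-30 split can be done with a few lines of stdlib Python. This is a generic sketch (not the authors' code); the fixed seed is an assumption added so the split is reproducible across runs.

```python
import random

def train_test_split(X, y, test_ratio=0.3, seed=42):
    """Shuffle indices with a fixed seed, then cut off the last test_ratio share."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(X) * (1 - test_ratio))
    tr, te = idx[:cut], idx[cut:]
    return ([X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in te], [y[i] for i in te])

# toy usage with 10 samples: 7 go to training, 3 to test
tr_X, tr_y, te_X, te_y = train_test_split(list(range(10)), list(range(10)))
```

For small class-imbalanced cancer datasets, a stratified split (preserving the class ratio in both parts) would be the safer variant of this function.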

Dataset Details
Using Matlab 2015b we implemented and tested some algorithms with a K-NN classifier with k = 10, and some were implemented in Java SE8 using the Weka data mining tool. We evaluate the performance of the proposed algorithm against 2S-GA 54 , 2S-HGA 39 , 2S-PSO 38 , Xue1-PSO, Xue2-PSO, Xue3-PSO, and Xue4-PSO 35 , GA, and the original CSO algorithm 25 .

Conclusion
The main focus of this study is to identify a small feature subset from a high-dimensional feature dataset. The objective has been successfully achieved by implementing the improvised competitive swarm optimization technique. An important feature of the proposed algorithm is that it does not rely on the initialization for selecting features. The four-tier framework is an efficient feature selection algorithm that outperforms the compared methods. The next aim is to use other metaheuristic feature selection methods to solve the multimodal subset problem.