Recognition of Disease in Leaves Using Genetic Algorithm and Neural Network Based Feature Selection

Objectives: To suggest a suitable image recognition approach for the early recognition of leaf diseases using hybrid features with a genetic algorithm and neural network (GANN) feature selection technique to maximize accuracy. Methods: Various image processing techniques are used to recognize disease in the leaf. In the pre-processing phase, CNN-based de-noising removes noise from the image. Next, the diseased part of the leaf is segmented by a pixel-wise classification approach with an optimization technique. Then, colour and texture features are extracted from the segmented area of the image with GLCM and SIFT techniques. The appropriate features are then selected by genetic algorithm and neural network based feature selection. The selected features are classified using an ensemble classifier to recognize disease in the leaf. For this experiment, images are taken from the Plant Village data set. In this work, the diseases caused by the Alternaria solani and Phytophthora infestans pathogens are considered. 1500 leaves are used. From each image, 1654 features are extracted. The data are split into 70% training and 30% test sets. Classification accuracy, precision and recall are used to evaluate the performance of the proposed work. Findings: The proposed work gives 97.7% classification accuracy, 97.3% precision and 97.5% recall with various visual feature descriptors and GANN feature selection. Novelty: The in-depth investigation compares the proposed descriptor, under the GANN_SVM and GANN_ES detection and classification techniques, to local and global SVMs. The suggested descriptor outperforms current approaches for diagnosing leaves infected by Alternaria solani and Phytophthora infestans, and the method can be applied to the infected leaves of all plants.


Introduction
"An image is worth a thousand words". Technological advancements have created many opportunities for digital image processing to recognize the subject of visual content. Image processing performs operations on digital images to produce an enhanced version of an image or to extract significant information from it. It is applied in various fields such as medicine, astronomy and agriculture. Agriculture is the leading sector of the world's economy. The quality of agricultural products can be affected by various diseases present in parts of the plant such as the stem and leaf, so recognition of plant diseases plays an important role. In the age of technology, automation is often the most effective solution. Manual detection of plant diseases in remote locations is a wearisome job, so it is necessary to develop an automatic system for detection and recognition of plant leaf diseases. Image processing is crucial both for enhancing the appearance of the image and for obtaining more relevant information from it. The first step in image processing is acquiring the image data, which is captured by scanning the surface with various electronic or optical devices. The captured data then undergoes various image processing operations to obtain the relevant information.
Pre-processing is the first step in processing the acquired image. An image can be degraded for a variety of reasons. Image quality is improved by pre-processing techniques such as image enhancement and noise removal. Enhancement improves the quality of image content for subjective evaluation by human perception. It helps to bring out the clarity, sharpness and detail needed for extracting and analyzing information. Noise removal, or denoising, removes the unwanted distortions present in the image while preserving its significant details.
Early detection and classification of apple leaf disease has been performed with the IFPA-GA and SVM-SVI technique, which achieves better accuracy and speed. This accuracy may be improved by using different features selected from the image. The authors used only a limited number of plant disease categories for their study, which might not be representative of the entire range of plant diseases that exist (1). Image segmentation based on PSO has been considered for recognizing sunflower leaf diseases with a minimum distance classifier, in which the accuracy of the system is further improved with a hybrid image segmentation technique, because proper segmentation increases recognition accuracy (2). An accuracy of 87.3% was achieved for detection and classification of fungal diseases in soybean leaves with incremental K-means, color and texture features, and an SVM classifier; the accuracy depends on the database sample size (3). A methodology based on histogram intensity indices was used to segment leaf blight disease, focusing on detecting only the affected area (4). An intelligent image-based system was developed with an adaptive clustering technique to find the segmentation region only (5). A genetic algorithm called GARS (6) was proposed for feature selection in high-dimensional datasets. That algorithm has not been compared with other state-of-the-art feature selection algorithms, and no analysis of the stability of the selected features is provided. A framework for segmentation of plant leaf images using improved fast fuzzy c-means clustering (IFFCMC) and an adaptive Otsu threshold algorithm was proposed with a 2D Adaptive Anisotropic Diffusion Filter (2DAADF) and Adaptive Mean Adjustment (AMA). Segmentation performance is measured using the Jaccard and Dice Similarity Coefficients (JSC and DSC); a JSC of 0.749 and a DSC of 0.7754 are reported, which outperform the traditional methods (7). A rapid detection and recognition method was used for disease identification with an HSI and LAB-based color segmentation algorithm and a CNN for disease classification. This approach showed a 75.59% average detection rate for diseased parts under complex background conditions, and its accuracy can be improved further (8). A supervised learning algorithm was used to build a model for classifying and identifying maize leaf diseases; the model cannot be applied to all types of samples, and the suggested random forest classification technique is superior to the other techniques and achieves 79.23% accuracy (9). Minimum cross entropy (MCE) based multi-level thresholding with the bacterial foraging optimization (BFO) algorithm was used to separate the cropped image from the complex background, giving better quality segmentation results than the ABC algorithm (10). Image clustering based on the Non-dominated Sorting Genetic Algorithm (NSGA-II) was proposed to detect the diseased part, with feature selection carried out by PCA and a multi-class SVM classifier developed to identify the disease in tea leaves. This system achieves 83% accuracy, which can be improved further (11). Detection of disease in banana trees from thermal camera images was carried out in (12) using a multilevel thresholding technique; this method yields values above 80% for all measures, namely a recall of 85.4%, a precision of 89.35%, an F-measure of 87.3% and an accuracy of 92.8%. Identification and detection of plant diseases have also been carried out by machine learning and image processing techniques with hybrid methodologies (13).
Cutting-edge Artificial Intelligence (AI) technology supports high production, effectiveness and benefits in various industries. With image processing technology, texture features and Bayesian optimization with a convolutional neural network have been considered for disease recognition. One of the main limitations of that work is that the results are not compared with those obtained in previous works. The dataset used was also created from images captured under varying lighting and context conditions for each class, which led to classification bias. The authors advise performing segmentation and removing the background before beginning classification. The dataset also has an unequal distribution of classes because the image count varies greatly between classes (14). Machine learning and computer vision algorithms have been used to detect diseases in plants; their accuracy may be improved with an appropriate feature extraction technique (15). Disease detection in tomato leaves has been considered using an ensemble CNN for classification (16). The suggested models (15) (16) were tested on a limited dataset of tomato leaves with six disorders, so it is unclear how well they would perform on other types of plants or diseases. Additionally, the proposed models require high-quality leaf images, publicly available datasets, and accurate segmentation, which may be difficult to obtain in practice. A CNN with optimal feature selection has been suggested for recognizing disease in apple leaves with improved accuracy and computational time (17). That framework's performance was evaluated only on the augmented Plant Village dataset, which may not be representative of all possible scenarios.
https://www.indjst.org/
A hybrid feature selection method that combines wrapper and filter-based techniques was developed for microarray datasets and tested on eleven high-dimensional datasets. The method is to be improved by applying filter methods as a fitness function (18), and it may not be applicable to all types of datasets with imbalanced classes. Recognition of leaf diseases in fruit plants is considered in (19), where transfer learning is proposed to obtain effective results in the classification stage. One limitation is that the dataset used in that study is unbalanced, which limits the accuracy gain of the proposed model. The study (20) applied the Inception V3 model to classify and identify diseases in basil and mint plants. It did not analyze false positives, which could be critical in detecting potential misclassification of healthy leaves as infected. Deep learning models have been used to identify disease in olive leaves, with a Genetic Algorithm (GA) used to optimize the parameters of the deep learning model (21). That study focused specifically on the diagnosis of olive leaf diseases and may not be directly applicable to other plant diseases or agricultural contexts.
Even though most researchers have addressed this issue in different aspects, there are some limitations in segmenting the desired part from the image. These limitations include extracting relevant features from the image, and the techniques used in each phase of the recognition process.
This work uses pre-processing, image segmentation, feature extraction, feature selection, and classification phases to recognize the disease present in the leaf. In the pre-processing phase, CNN de-noising is applied. The leaf image's desired diseased part is segmented using pixel-wise classification approach with the Duck Search Optimization technique. This segmentation approach gives the optimum clustering result for segmenting the diseased part of the leaf. Next, Global Color (265).

Methodology
The main goal of this work is to recognize the disease present in plants from an image of the leaf. Visual examination of plant disease is a time-consuming and inconvenient task. The methodology used in the proposed work is depicted in Figure 1.
Emerging technologies such as image processing, computer vision, machine learning are employed to recognize the disease present in the leaf of the plant very efficiently. The leaf image has to undergo various tasks to recognize the disease present in the image. First, the image must be pre-processed to get a more enhanced image and then infected, healthy and background parts are separated using image segmentation operations. Each disease has different types of symptoms. Next, information present in the infected part is extracted using various feature extraction methods and then appropriate features are selected and the selected features are given as input to an ensemble classifier for recognizing the disease.

Image Pre-processing
The main goal of image pre-processing is to improve the quality of the image data by reducing unwanted distortions and enhancing features that are useful for further, higher-level processing. Image denoising is the process of removing unwanted distortions to restore the original image. Building the CNN model takes two steps: designing a network architecture and training the network on training data. This work uses a 3-layered architecture. The first layer consists of Convolution and a Rectified Linear Unit (ReLU). In the second layer, Batch Normalization is included between Convolution and ReLU to speed up training and enhance performance. The last layer is Convolution. The network estimates the residual image, and the clean image is obtained by subtracting the residual image from the noisy image. This CNN model is used to handle image denoising at an unknown noise level (15).
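The residual formulation (clean image = noisy image minus estimated residual) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `estimate_residual` is a hypothetical stand-in for the trained CNN (here it is simply the difference from a 3x3 mean filter, which is not the network used in this work), and the image is a small list of lists rather than a real leaf image.

```python
def mean_filter_3x3(img):
    """Average each pixel with its 3x3 neighbours; neighbours outside the image are ignored."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def estimate_residual(noisy):
    """Hypothetical placeholder for the trained CNN: returns an estimate of the noise."""
    smooth = mean_filter_3x3(noisy)
    return [[n - s for n, s in zip(n_row, s_row)] for n_row, s_row in zip(noisy, smooth)]


def denoise(noisy):
    """Residual learning step: subtract the estimated residual from the noisy image."""
    residual = estimate_residual(noisy)
    return [[n - r for n, r in zip(n_row, r_row)] for n_row, r_row in zip(noisy, residual)]


# Toy input: a flat patch with one noisy spike in the centre.
noisy = [[10, 10, 10],
         [10, 19, 10],
         [10, 10, 10]]
clean = denoise(noisy)
```

In a real pipeline only `estimate_residual` would change: it would run the trained network on the noisy image and return its predicted residual.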

Image Segmentation
Image segmentation is the process of subdividing an image into multiple parts. Normally, this is achieved by similarity methods, discontinuity analysis or pixel-driven methods. In this work, a pixel-based method is used to segment the image. Pixel-based methods use pixel-level features to classify the pixels into different groups. This work uses Fuzzy C Means (FCM) and Support Vector Machine (SVM) techniques to group pixels into diseased, healthy and background parts. In addition, the Duck Search Optimization (DSO) technique is used to optimize the cluster centers in FCM. DSO's main objective is to maximize the distance between duck groups and minimize the distance to search for prey. Here, the cluster centers for grouping the pixels are optimized to separate the background, diseased and healthy parts of the image.
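To make the clustering step concrete, the following is a minimal sketch of plain fuzzy C-means on scalar pixel intensities with three clusters. It is not the segmentation method of this work: the DSO-based center optimization and the SVM stage are omitted, the initial centers are chosen by hand, and the fuzzifier `m = 2` and the toy intensities are assumptions made for illustration.

```python
def fcm_1d(values, centers, m=2.0, iterations=50):
    """Plain fuzzy C-means on scalar values.

    Returns the final centers and the membership rows computed in the last iteration.
    """
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        memberships = []
        for v in values:
            dists = [abs(v - c) for c in centers]
            if 0.0 in dists:
                # A value lying exactly on a center belongs fully to that cluster.
                hit = dists.index(0.0)
                row = [1.0 if j == hit else 0.0 for j in range(len(centers))]
            else:
                # Standard FCM membership: u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
                row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]
            memberships.append(row)
        # Center update: mean of the values weighted by membership^m.
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            if total > 0:
                centers[j] = sum(w * v for w, v in zip(weights, values)) / total
    return centers, memberships


# Toy intensities: dark (background), mid (healthy) and bright (diseased) pixels.
# Which intensity range maps to which region is an assumption of this example.
pixels = [10, 12, 11, 120, 125, 118, 240, 245, 250]
centers, memberships = fcm_1d(pixels, centers=[0.0, 128.0, 255.0])
labels = [row.index(max(row)) for row in memberships]
```

An optimizer such as DSO would replace the hand-picked initial centers with optimized ones; the membership and center update rules would stay the same.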

Feature Extraction
The features extracted from the image play a major role in recognizing the disease present in the leaf image. Generally, the features present in images are color, texture and shape. These features can be extracted locally or globally. In local feature extraction, images are divided into multiple patches, and the required features are derived from the patches. In global feature extraction, the required features are extracted from the whole image. This work extracts features from the image using both local and global processing. Color features are extracted from the image's R, G, and B components based on color moments. Next, features are extracted from the histogram of the gray scale version of the image. Texture features of the image are extracted using first-order statistics and second-order statistics from the Gray Level Co-occurrence Matrix (GLCM). Texture features are then extracted using SIFT, which has two steps: the SIFT detector and the SIFT descriptor. The detector finds the interest points. From the patch around each interest point, SIFT computes a histogram of gradient magnitude and orientation to form the feature descriptor. In addition, this work computes statistical texture features from the patch. Bag of Visual Words generates a fixed-length feature vector for every image in the dataset irrespective of its size. Every feature extracts a different set of values from the image. All the above extracted features are joined together to form the proposed visual descriptor.
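As an illustration of second-order texture statistics, the sketch below builds a normalized GLCM for one pixel offset and derives three common measures from it. The choice of offset (one pixel to the right), the non-symmetric counting, and the three measures shown (contrast, energy, homogeneity) are assumptions for this example; the paper does not list which GLCM statistics make up its texture features.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray level co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                counts[image[y][x]][image[yy][xx]] += 1
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def glcm_features(p):
    """Contrast, energy and homogeneity of a normalized GLCM."""
    contrast = energy = homogeneity = 0.0
    for i, row in enumerate(p):
        for j, value in enumerate(row):
            contrast += (i - j) ** 2 * value
            energy += value ** 2
            homogeneity += value / (1 + (i - j) ** 2)
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}


# Toy 4x4 patch quantized to 4 gray levels.
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
p = glcm(patch, levels=4)
texture = glcm_features(p)
```

For this patch there are 12 horizontal pixel pairs, and the contrast works out to 7/12 and the energy to 1/6.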

Procedure to describe the proposed Visual Descriptor from the Image
Input: Segmented ROI
Output: A feature vector describing the ROI
1. Extract color moments (Mean, Standard deviation and Skewness) from the R, G and B components of the RGB image. Store them as CF.
2. Extract 256 features from the gray level image using histogram analysis. Store them as HF.
3. Extract 10 texture features using first-order and GLCM statistical measures. Store them as TF.
4. Find interest points in the image using the SIFT detector.
5. From the patch around each interest point, extract features using the magnitude and orientation histogram, first-order and GLCM statistics.
6. Cluster the features into 10 clusters using K-Means clustering to find the bag of visual words and obtain a fixed-length feature vector. Store it as SF.
7. Combine all the features extracted in the above steps and store them in a one-dimensional vector, VD = [CF, HF, TF, SF].
8. Repeat the procedure for all images in the image dataset.
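The assembly of VD = [CF, HF, TF, SF] can be sketched as follows. This is a simplified illustration, not the implementation used in this work: pixels are passed as flat lists, skewness is taken as the standardized third central moment (other definitions exist), the texture features TF and the local descriptors are supplied by the caller, and the visual-word centers are given directly instead of being learned by K-Means. All function names are hypothetical.

```python
import math


def color_moments(channel):
    """Mean, standard deviation and skewness of one colour channel."""
    n = len(channel)
    mean = sum(channel) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in channel) / n)
    if std == 0:
        return [mean, 0.0, 0.0]
    skew = sum((v - mean) ** 3 for v in channel) / n / std ** 3
    return [mean, std, skew]


def gray_histogram(gray, bins=256):
    """Count of pixels at each gray level (HF)."""
    hist = [0] * bins
    for v in gray:
        hist[v] += 1
    return hist


def bovw_histogram(descriptors, centers):
    """Bag of visual words: count local descriptors by nearest center (SF)."""
    hist = [0] * len(centers)
    for d in descriptors:
        dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in centers]
        hist[dists.index(min(dists))] += 1
    return hist


def build_descriptor(rgb, gray, tf, local_descriptors, centers):
    """Concatenate CF, HF, TF and SF into one flat vector VD."""
    cf = []
    for ch in range(3):
        cf += color_moments([px[ch] for px in rgb])
    hf = gray_histogram(gray)
    sf = bovw_histogram(local_descriptors, centers)
    return cf + hf + list(tf) + sf


# Toy inputs: 3 pixels, 10 placeholder texture values, 3 local descriptors, 2 visual words.
rgb = [(10, 20, 30), (20, 20, 30), (30, 20, 60)]
gray = [0, 0, 255]
tf = [0.0] * 10
local_descriptors = [[0, 0], [1, 1], [9, 9]]
centers = [[0, 0], [10, 10]]
vd = build_descriptor(rgb, gray, tf, local_descriptors, centers)
```

Because the bag-of-visual-words histogram has one bin per center, the length of VD is the same for every image regardless of how many interest points the detector finds.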

Feature Selection using GANN
Feature selection is the most important part of the disease recognition process. It eliminates the unwanted and redundant features extracted from the image to obtain an accurate classification. This paper implements a wrapper feature selection method to eliminate the inappropriate features. A Genetic Algorithm (GA) is used as the search procedure and a Neural Network (NN) is used as the classifier that measures the goodness of the selected subset of features. An initial population of chromosomes is generated as candidate solutions, and then a new, more effective population is generated via selection and recombination operators to find the optimal value among them (6). The fitness function minimizes the loss of the neural network, measured as its mean squared error.

Genetic Algorithm
GA is a search approach based on Darwinian natural selection and the genetics of biological systems. It is one of the most effective computational models based on evolutionary principles. In various fields of image analysis, genetic algorithms are becoming increasingly popular for addressing challenging optimization problems. These algorithms encode a candidate solution to a problem in a simple chromosome-like data structure and apply recombination operators to it while preserving essential information. In feature selection, genetic algorithms use natural selection to reject "weak" alternatives and eventually provide the most suitable option.
GA is based on selection. In the selection step, individuals with strong genes are chosen from the current population to contribute to the next generation. In the crossover step, the selected parents are recombined to create children; this step is called reproduction. After that, mutation applies random changes to individuals to form new children.

Neural Network
Deep learning techniques rely on neural networks, a class of Machine Learning models commonly known as Artificial Neural Networks (ANNs). Their name and overall design are derived from the brain, taking cues from how biological neurons communicate. One variety of ANN is the feed-forward network, named for the unidirectional nature of its processing. It consists of a large number of artificial neurons, called units, arranged in a series of layers. The layout of the network is defined by the number of hidden layers and the number of hidden units in every layer. The main components of an NN are the artificial neuron (the computing element), the architecture of the NN and the learning algorithm used for network training. Figure 2 shows the flow of GANN feature selection.
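A feed-forward pass and the mean squared error used to score it can be sketched as below. The network size (two inputs, one hidden layer of two units, one output), the sigmoid activation and the all-zero weights are assumptions chosen to keep the example small; they are not the configuration used in this work, and no training step is shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One unidirectional pass: input -> hidden layer -> single output unit."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(hidden_weights, hidden_biases)]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


def mean_squared_error(samples, targets, params):
    """Average squared difference between network outputs and targets."""
    errors = [(forward(x, *params) - t) ** 2 for x, t in zip(samples, targets)]
    return sum(errors) / len(errors)


# Untrained toy network: every weight and bias is zero, so each output is sigmoid(0) = 0.5.
params = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [0.0, 0.0], 0.0)
samples = [[0.2, 0.7], [0.9, 0.1]]
targets = [0.0, 1.0]
loss = mean_squared_error(samples, targets, params)
```

With these zero weights the loss is 0.25; in a wrapper feature selection scheme, a lower loss on a given feature subset would mark that subset as fitter.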

Procedure used in GANN feature selection
1. Create the individuals (chromosomes) of the population and set their initial values.
2. Estimate the fitness value of each individual in the population using the NN.
3. Perform the selection operation to choose the individuals that recombine for the next generation, using roulette selection based on their fitness levels.
4. Apply the crossover operator to recombine the selected individuals and generate a new population.
5. Apply mutation operators to address low diversity in the new generation.
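The steps above can be sketched as a small genetic algorithm over binary feature masks. Several parts are assumptions made for illustration: `toy_loss` is a hypothetical stand-in for the neural network's mean squared error, the fitness is taken as 1 / (1 + loss) so that roulette selection has positive weights, one elite individual is copied unchanged into each new generation, and the population size, mutation rate and single-point crossover are arbitrary choices.

```python
import random


def roulette_pick(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    threshold = rng.uniform(0, sum(fitnesses))
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= threshold:
            return individual
    return population[-1]


def ga_select(n_features, loss_fn, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Search for a binary feature mask with low loss; needs n_features >= 2."""
    rng = random.Random(seed)
    # Step 1: random initial chromosomes (1 = keep the feature, 0 = drop it).
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best, best_loss, history = None, float("inf"), []
    for _ in range(generations):
        # Step 2: evaluate every individual (the paper uses the NN loss here).
        losses = [loss_fn(ind) for ind in population]
        for ind, loss in zip(population, losses):
            if loss < best_loss:
                best, best_loss = ind[:], loss
        history.append(best_loss)
        fitnesses = [1.0 / (1.0 + loss) for loss in losses]
        children = [best[:]]  # elitism: an assumption, not stated in the procedure
        while len(children) < pop_size:
            # Step 3: roulette selection of two parents.
            a = roulette_pick(population, fitnesses, rng)
            b = roulette_pick(population, fitnesses, rng)
            # Step 4: single-point crossover.
            cut = rng.randint(1, n_features - 1)
            child = a[:cut] + b[cut:]
            # Step 5: bit-flip mutation to keep diversity.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return best, best_loss, history


# Hypothetical stand-in for the NN loss: distance from a known "informative" mask.
INFORMATIVE = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_loss(mask):
    return sum(m != t for m, t in zip(mask, INFORMATIVE))


best_mask, best_loss, history = ga_select(len(INFORMATIVE), toy_loss)
```

To use this as a wrapper method, `toy_loss` would be replaced by a function that trains and evaluates the classifier on the features selected by the mask and returns its error.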

Classification
Classification is a crucial part of recognizing the disease. It classifies the features present in the image in order to recognize the disease. The features selected in the previous step are taken as input to the classifier. An ensemble classifier with a boosting technique recognizes the disease from the selected features. To evaluate performance, experiments have been performed using the chosen features with SVM and an ensemble classifier. The performance of these classifiers on the analyzed features is measured with accuracy, precision and recall.
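The three measures can be computed from true and predicted labels as in the sketch below. Macro-averaging of precision and recall across classes is an assumption, since the averaging scheme is not stated, and the class names and labels are made up for the example.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, macro-averaged precision and macro-averaged recall."""
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return accuracy, sum(precisions) / len(labels), sum(recalls) / len(labels)


# Hypothetical labels for six test leaves.
y_true = ["early", "early", "late", "late", "late", "healthy"]
y_pred = ["early", "late", "late", "late", "early", "healthy"]
accuracy, precision, recall = evaluate(y_true, y_pred)
```

Here four of six predictions are correct, so the accuracy is 4/6, and the macro-averaged precision and recall are both 13/18.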

Results and Discussion
For experimentation, leaves affected by the Alternaria solani and Phytophthora infestans pathogens are collected from the Plant Village potato and tomato leaf images (22). SVM and ensemble classifiers are compared to choose the classifier with the better accuracy. 1150 leaves were collected for this purpose. The suggested system is built in MATLAB R2020a on an Intel Core i7 processor with 16 GB of RAM and an 8 GB GPU. The performance of the proposed technique is assessed in terms of classification accuracy, precision, and recall. A feature set of 1654 features is constructed for each image before feature selection. The appropriate features are selected from this set using GANN feature selection. The proposed feature set is evaluated against a global descriptor (GD), which is described by color and texture attributes computed globally, and a local descriptor (LD), which is based on Scale-Invariant Feature Transform (SIFT) methods.

Performance Measures without GANN Feature Selection
The proposed descriptor (PD) extracted from the segmented image is compared with the GD and LD descriptors. These features are classified using a Support Vector Machine (SVM) and an Ensemble Classifier (ES) for recognizing the disease. Table 1 gives the performance measures of the feature set constructed for disease recognition before feature selection. In Table 1, the proposed descriptor (PD_SVM) combines color and texture features extracted using color moments, histogram, first and second order statistics, and SIFT methods. The relevant features are selected from the above descriptor using the GANN wrapper feature selection technique. Figure 3 compares the performance of the proposed GANN feature selection with different feature descriptors and with the SVM and Ensemble (ES) classifiers.

Performance Measures with Previous Work
The proposed approach is compared with the methodologies described in Mukhopadhyay (11), Nanehkaran (8) and Panigrahi (9), as shown in Table 3.
Table 3. Performance Measures with Previous Work
Methods                        Accuracy in %
Mukhopadhyay (11)              81
Nanehkaran (8)                 75.59
Panigrahi (9)                  79.23
Proposed ES_GANN_Selection     97.7
According to these findings, the proposed method yields the highest accuracy among the compared methods for recognizing disease in the leaf.

Conclusion
Image processing plays a vital role in fields such as medicine, agriculture and remote sensing. The efficiency of a plant disease detection method depends on the quality of the input image, the efficiency of the algorithm that detects the diseased part of the image, the selection of efficient features to describe it and, finally, the choice of classifier for recognizing the disease. The proposed image recognition approach uses pixel-wise classification with an optimization technique that improves cluster center selection and gives better segmentation results. Color and texture features taken from the segmented part, together with the suggested feature extraction and feature selection technique, also enhance classification accuracy compared with other standard features selected from the image. The performance of this work is compared with some existing approaches in the literature. The proposed work gives 97.7% classification accuracy. Speed is not measured in this work, and only leaves affected by the Alternaria solani and Phytophthora infestans pathogens are considered. In the future, further features will be retrieved from the image using different feature extraction techniques, and the relevant features will be selected using other optimization-based feature selection methods. Different pre-processing and image segmentation approaches will also be employed to improve classification accuracy.