Potato Plant Leaf Diseases Identification Using Transfer Learning

Background/Objectives: Agriculture is a major food source for the Ethiopian population. Plant diseases cause substantial production losses, which can be reduced through continuous monitoring. Early plant disease identification using computer vision and Artificial Intelligence (AI) helps farmers take preventive action and improve production quality, whereas manual identification is strenuous and error-prone. Methods: In this study, we present a convolutional neural network based on the inception-v3 architecture to detect potato leaf diseases using deep learning-based transfer learning. We use separable convolution in the inception block, which reduces the number of parameters by a large margin and uses resources efficiently. Because it uses fewer parameters, the inception-v3 model reaches a higher training accuracy and needs less training time than a conventional CNN architecture. Findings: The approach reduces the effect of noise in the sample images, which otherwise leads to misidentification of diseases. In our experiment, we used an RGB color channel image dataset to train the model, which yields a training accuracy of 98.7% and a validation accuracy of 97.3%. Novelty: To identify potato leaf diseases, we applied transfer learning for high-performance classification, with pixel-wise operations to increase the number of leaf images. The inception-v3 transfer learning model presented in this study identifies diseases in potato leaf images and thus provides an effective computer-aided recognition model for potato disease classification in the absence of large data.


Introduction
Potato is an important food source across the world, and because it can be stored to provide food security, it is both a commercial and a subsistence crop (1). In developing countries like Ethiopia, plant diseases are detected manually by trained experts who scout cultivation fields and inspect potato foliage (2) (3). This task is monotonous, and in some cases it is impractical because professionals are unavailable in remote regions (2). However, advances in image processing and deep learning for recognizing diseases from leaf images can make the process far more effective and timely. The most common diseases of the potato plant are early blight and late blight. Such plant diseases are caused by pathogens such as bacteria, fungi, parasites, and viruses, and by unfavorable environmental conditions. Leaf diseases impair photosynthesis and can thereby lead to plant death (4).
Identifying diseases at an early stage plays a vital role in the agriculture industry because plant diseases are often unavoidable. When a potato plant is infected, it must be treated as soon as possible to limit the degradation of plant quality and production quantity, which would otherwise lead to a loss. Simple as it may sound, identifying potato plant diseases is not an easy task. Detecting potato leaf diseases requires expert knowledge that farmers generally do not have. As a result, farmers need to consult professionals trained in potato disease identification, which can be expensive and tedious, and the results are sometimes inaccurate. Deep learning has recently become a significant field of interest for researchers and plays an essential part in the farming industry, with applications ranging from fruitlet grading to weed recognition.
In this study, we design a model for potato plant leaf disease identification based on images taken from the Kaggle PlantVillage dataset, using the inception-v3 neural network architecture. In image identification, inception-v3 is powerful at learning patterns from large amounts of data; it increases network depth to achieve higher performance while keeping the computation cost, the number of trainable parameters, and resource utilization reasonable (5) (6). Transfer learning is highly effective when only a small dataset is available: it starts from a pre-trained model that has already been trained on an extremely large dataset (7).
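The idea of reusing a pre-trained model on a small dataset can be illustrated with a deliberately tiny sketch. It is not the inception-v3 pipeline used in this study: `FROZEN_BASE` is a hypothetical stand-in for pre-trained weights, the four toy samples are invented for illustration, and the new head is a simple logistic unit. The point is only that the base weights stay fixed while the small head is trained.

```python
import math

# Hypothetical "pre-trained" weights. In practice these would come from a
# network trained on a large dataset such as ImageNet.
FROZEN_BASE = [[1.0, -1.0], [0.5, 0.5]]

def base_features(x):
    """Frozen feature extractor: fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in FROZEN_BASE]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(samples, labels, epochs=300, lr=0.2):
    """Train only the new head (weights and bias); the base is never updated."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            feats = base_features(x)
            pred = sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)
            error = pred - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias

def predict_probability(x, weights, bias):
    feats = base_features(x)
    return sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)

# Toy data (invented): label 1 when the first input exceeds the second.
samples = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
labels = [1, 1, 0, 0]
weights, bias = train_head(samples, labels)
```

Because only the head's three parameters are learned, a handful of samples is enough here; the same reasoning motivates freezing a large pre-trained network when labeled leaf images are scarce.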

Literature review
In the past few decades, machine learning approaches have been used in several fields such as face recognition, image processing, and video processing, but feature engineering remains a difficult issue. The emergence of image processing and deep learning, however, has produced substantial results on various benchmarks for plant pathologists without strenuous feature engineering. Plant disease identification has become an important research field in which convolutional neural networks and image processing methods are widely used for accurate disease identification. Here we review some research works on plant leaf disease detection and classification using different advanced approaches. Usama Mokhtar (8) provides an efficient method that identifies a tomato leaf as healthy or diseased. The input image is first preprocessed by removing unwanted background, and a gray level co-occurrence matrix is used for texture feature extraction. The method achieved an accuracy of 99.83% using a support vector machine classifier with a linear kernel function. Although the reported accuracy is high, the method is still not adequate for identifying healthy and diseased tomato leaves. David Hughes (9) presents image-based plant disease detection using deep learning, in which the authors used the AlexNet (10) and GoogleNet (11) models to train on 54,306 images from the PlantVillage dataset; GoogleNet achieves a training accuracy of 99.35%. However, the accuracy drops to 31.4% when the model is tested on images taken under conditions different from those of the training images. Three training/testing split distributions, 75/25, 60/40, and 70/30, were used in that paper.
In (12), an artificial intelligence-based banana disease identification system is presented using a deep learning approach, in which three convolutional neural network architectures, i.e. inception V2, ResNet50, and MobileNetV1, were trained with transfer learning to recognize banana leaf diseases and pests. The authors used 18,000 banana leaf images taken from different areas and labeled into 18 categories. The experimental output shows that the model achieved 90% training accuracy on this dataset. In (13), neural network-based tea leaf disease recognition is presented. First, data enlargement and segmentation are used to process the tea leaf images, which are then input to the network for training. Second, to achieve higher identification accuracy, the number of iterations and the learning rate were adjusted, and dropout was applied to counter overfitting. The experimental result indicates a training accuracy of 93.75%. In this study, by contrast, high-performance classification is achieved by adopting pre-trained inception-v3 weights obtained by training on the ImageNet dataset; sharing these trained parameters with the new inception-v3 model via transfer learning accelerates learning and improves detection performance.

Materials and methods
A generalized overview of the classification of potato plant leaf diseases is presented in Figure 1. To implement our transfer learning model, a dataset is taken from a public database. The images are labeled according to their class category, and pre-processing is then conducted, including resizing and filtering the images and applying data augmentation techniques such as image rotation, flipping, and shifting to increase the size of the dataset. The training and validation images are fed into the pre-trained inception-v3 model and features are extracted. Deep learning is a part of machine learning and artificial intelligence in which the layers are closely connected (14): the output of one layer is used as the input to the next layer. In this work, we design an inception-v3 based transfer learning model for potato leaf disease detection that achieves high detection performance on small data by using a network pre-trained on large datasets. For plant disease identification, the convolutional neural network is an appropriate deep learning technique because it can accurately recognize plant diseases (15). The main steps in this work are image acquisition, image preprocessing, segmentation, feature extraction, and identification of potato diseases, as depicted in Figure 1.
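The augmentation operations named above (rotation, flipping, and shifting) can be sketched on a tiny grayscale image stored as a list of rows. This is an illustration only: the function names are hypothetical, the rotation is restricted to 90 degrees for simplicity, and the study's actual augmentation parameters are not stated in the text.

```python
# Illustrative augmentation helpers on a tiny grayscale "image" (list of rows).
# Function names are hypothetical; a real pipeline would use an image library.

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in img]

def rotate_90_clockwise(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def shift_right(img, pixels, fill=0):
    """Shift every row to the right, padding the left edge with `fill`."""
    if pixels <= 0:
        return [list(row) for row in img]
    width = len(img[0])
    pixels = min(pixels, width)
    return [[fill] * pixels + list(row[:width - pixels]) for row in img]

def augment(img):
    """Return the original image plus three transformed copies."""
    return [img, flip_horizontal(img), rotate_90_clockwise(img), shift_right(img, 1)]

sample = [[1, 2, 3],
          [4, 5, 6]]
augmented = augment(sample)  # one input image becomes four training images
```

Each transformed copy keeps the class label of its source image, which is how augmentation enlarges a small labeled dataset.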

Data acquisition
In our work we analyzed 2152 potato leaf images taken from the PlantVillage dataset, which fall into three categories. We split the data into two sections: a training portion used to train the proposed model and a testing portion used for validation. The data is divided 80/20 for training and testing, respectively. We resized the images to 256 x 256 x 3 pixels in order to train the inception-v3 model and to make training computationally feasible.
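As a rough sketch of the 80/20 division, the snippet below shuffles 2152 placeholder file names and cuts the list at 80%. The file names, the random seed, and the use of a simple (non-stratified) shuffle are assumptions; the text does not say how the split was drawn.

```python
import random

def split_dataset(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of `items` and split it into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names standing in for the 2152 potato leaf images.
image_names = [f"leaf_{i:04d}.jpg" for i in range(2152)]
train_set, test_set = split_dataset(image_names)
print(len(train_set), len(test_set))  # 1721 431
```

With 2152 images, an 80/20 split gives 1721 training images and 431 testing images.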

Image pre-processing
Contaminated plant leaves produce noise in an image; here the noise may be sand, dust, or other material on the leaf. To obtain high training accuracy, it is important to remove this noise from the plant images, so image pre-processing methods are used to clean the leaf images. Many pre-processing methods are available, such as image clipping, in which the leaf image is cropped to obtain the area of interest. Another technique is the smoothing filter, which is applied to smooth the image. ZCA whitening, standardized rotation, and translation were used for data augmentation.
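A smoothing filter of the kind mentioned above can be sketched as a 3 x 3 mean filter. The text does not specify which smoothing filter or kernel size was used, so the kernel size and the border handling here are assumptions.

```python
def mean_filter_3x3(img):
    """Smooth a grayscale image with a 3x3 mean filter.

    Border pixels are averaged over the neighbours that exist,
    which is one of several possible edge-handling choices.
    """
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            total, count = 0.0, 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        total += img[rr][cc]
                        count += 1
            out[r][c] = total / count
    return out

# A single bright "dust" pixel on an otherwise uniform patch.
noisy = [[10, 10, 10],
         [10, 100, 10],
         [10, 10, 10]]
smoothed = mean_filter_3x3(noisy)  # the centre value drops from 100 to 20.0
```

The isolated outlier is pulled toward its neighbours, at the cost of slightly blurring the surrounding pixels.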

Segmentation
Image segmentation is a technique for classifying each pixel in an image as belonging to a specific class (16). Because potato plant leaves vary in size, it is essential to locate and segment the leaf in the image; reducing the background interference yields the region of interest from which the inception-v3 model can conveniently extract features, and this improves disease identification performance. Image segmentation is based on intensity discontinuities and similarities among the pixels (17). It partitions the image into parts that have the same features or a rough resemblance, which can be used to identify similarities in the gray levels between the pixels of an image region. In this work, we achieved segmentation by converting the RGB color images to the HSI (hue, saturation, intensity) model.
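The text does not give the conversion formulas, so the following sketch uses the commonly cited textbook definition of the RGB to HSI transform, with inputs scaled to [0, 1] and hue expressed in degrees. The function name and the convention of returning hue 0 for grey pixels are assumptions.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert RGB values in [0, 1] to (hue in degrees, saturation, intensity)."""
    intensity = (r + g + b) / 3.0
    if intensity == 0:
        return 0.0, 0.0, 0.0  # black pixel: hue and saturation set to 0
    saturation = 1.0 - min(r, g, b) / intensity
    numerator = 0.5 * ((r - g) + (r - b))
    # Mathematically non-negative; max() guards against rounding error.
    denominator = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if denominator == 0:
        hue = 0.0  # grey pixel: hue is undefined, 0 by convention
    else:
        ratio = max(-1.0, min(1.0, numerator / denominator))
        hue = math.degrees(math.acos(ratio))
        if b > g:
            hue = 360.0 - hue
    return hue, saturation, intensity

green_hue, green_sat, green_int = rgb_to_hsi(0.0, 1.0, 0.0)  # hue close to 120
```

Separating hue and saturation from intensity makes it easier to threshold leaf tissue against the background under varying illumination, which is one plausible reason for choosing this color model.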

Feature extraction
Feature extraction plays an important role in digital image analysis (18). Different image pre-processing methods such as standardization, thresholding, and binarization are applied to the sample digital images before features are extracted. A feature extraction technique is then applied to acquire patterns that are useful in image identification. Once the significant patterns of the image have been extracted, similar features are combined into a feature vector that is used to recognize and categorize an object. In this work, feature extraction is performed using the inception-v3 model.
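How a convolutional network turns an image into a feature vector can be shown at toy scale: fixed kernels are slid over the image, negative responses are clipped, and each response map is averaged to one number. The two hand-written kernels below are hypothetical stand-ins for learned filters; inception-v3 itself has many layers of learned filters and is far larger.

```python
def convolve_valid(img, kernel):
    """2-D cross-correlation without padding ('valid' mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(fmap):
    """Clip negative responses to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def global_average_pool(fmap):
    """Collapse a feature map to a single number."""
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)

# Hypothetical fixed ("frozen") kernels standing in for pre-trained filters.
FROZEN_KERNELS = [
    [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],   # responds to vertical edges
    [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],   # responds to horizontal edges
]

def extract_features(img):
    """Return one pooled response per kernel as the feature vector."""
    return [global_average_pool(relu(convolve_valid(img, k)))
            for k in FROZEN_KERNELS]

# A 4x4 image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
features = extract_features(image)  # [3.0, 0.0]
```

The vertical-edge kernel responds strongly and the horizontal-edge kernel does not, so the two-element vector already encodes something about the image content.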

Classification
The great variation in size, shape, color, texture, background, layout, and imaging illumination of plant diseases and pests in real environments makes the detection task difficult. Because of their strong feature extraction capability, convolutional neural network-based identification and classification networks have become the most commonly applied approach in plant leaf disease and pest detection. The feature extraction part of such a network consists of cascaded convolution and pooling layers, which are followed by a fully connected layer and a softmax classification layer. The softmax classifier assigns each input to an output class.
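The softmax step can be sketched as follows. The three class names are assumed labels for the three potato categories, and the input scores are invented; only the softmax computation itself is standard.

```python
import math

# Assumed label names for the three potato leaf categories.
CLASS_NAMES = ["early_blight", "late_blight", "healthy"]

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return the most probable class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]

# Invented scores from the last fully connected layer for one image.
label, confidence = predict([2.0, 0.5, -1.0])
```

The outputs are non-negative and sum to one, so they can be read as class probabilities, and the predicted class is the one with the largest value.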

Experiments and results
As the plot in Figure 2 shows, in the first epoch the training accuracy is around 65% and the validation accuracy is around 75%. Both the training and validation accuracy then increase steadily, while the validation loss decreases with some oscillation. There is considerable variation between the training and validation accuracy, which indicates an overfitting problem when the model is trained on the dataset. To address this issue, we used dropout and augmentation methods.
As Figure 2 shows, the training accuracy is higher than the validation accuracy for most of the curve. However, the gap between the training accuracy and validation accuracy is smaller than in the training phase. As clearly depicted in Figure 3, the training loss is far lower than the validation loss throughout the curve. This is because the validation set is a random sample, so the samples used at each evaluation step are unrelated. The validation loss is very high initially and gradually decreases, and overfitting occurs in the validation phase as well, although to a lesser degree than in the training phase. The training accuracy increases steadily and reaches 95%, while the validation accuracy oscillates and reaches 96% at some points.
As Figure 4 shows, our model achieves 98.7% training accuracy and 97.3% validation accuracy. This accuracy is obtained when the model is trained on the segmented region of interest of the potato image, which increases the validation accuracy. Segmenting the image and training on the segmented region of interest therefore has an important impact on model performance. We trained the model with augmentation, segmentation, and dropout to increase the training and validation accuracy.
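Dropout, one of the remedies for overfitting mentioned above, can be sketched in its common "inverted" form. The dropout rate used in the study is not stated; the value 0.5 below is an assumption chosen only for illustration.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` during training
    and rescale the survivors so the expected activation is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 10
dropped = dropout(hidden, rate=0.5, rng=rng)              # training: some units zeroed
unchanged = dropout(hidden, rate=0.5, rng=rng, training=False)  # inference: no change
```

Randomly silencing units during training discourages the network from relying on any single feature, which tends to narrow the gap between training and validation accuracy.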

Fig 5. Training and validation loss
In the plot in Figure 5, the training loss is much lower than the validation loss across the curve. The training loss curve is more stable in the final epochs than the validation loss curve. The validation loss nevertheless decreases as training proceeds. It oscillates but remains relatively small throughout the plot, except at some points such as epoch 25. Overall, the model performs well in both the training and validation phases.
In (19), a multi-level deep learning model for potato disease categorization is employed, using the YOLOv5 image segmentation technique. The dataset was collected in the Central Punjab region of Pakistan and therefore represents a specific region; because plant diseases depend strongly on environmental factors, the model may not perform as well on datasets collected in other regions. In (20), plant disease identification using transfer learning and CNNs is conducted, in which the authors replace standard convolution with separable convolution to reduce the number of parameters; the models are trained on 14 plant species and 38 disease classes. The implemented models achieve 99.56%, 98.42%, 99.11%, and 97.02% accuracy using EfficientNetB0, inception-v3, inceptionResNetV2, and MobileNetV2, respectively. They used three representations, namely color, grayscale, and segmented image datasets taken from PlantVillage. In (9), the GoogleNet and AlexNet CNN architectures are applied to 54,306 images taken from the public Kaggle PlantVillage database, collected in a controlled environment; GoogleNet gives the more reliable result with a training accuracy of 99.35%, compared with 85.53% for AlexNet. However, the accuracy decreases to 31.4% when the model is tested on images taken under conditions different from those of the training images. In (21), a residual network CNN architecture is used for plant phenotyping and infection classification on imbalanced data. In this study, by contrast, we build a high-performance potato leaf disease detector using inception-v3 transfer learning weights pre-trained on the ImageNet dataset, which accelerates the learning of the new inception-v3 model and yields higher accuracy. In our experiment the data is taken from the public dataset, and we apply an 80/20 splitting ratio for the training and testing phases.
In our experiment, the overall accuracy on the test dataset is computed to evaluate the performance of the proposed model against other models on unseen data, in terms of test accuracy. Various parameters affect the performance of pre-trained weights in the deep learning approach. Metrics such as pre-trained weight size, number of parameters, and depth of layers have a significant impact on the accuracy of CNN architectures. The accuracy of CNN models tends to improve as the number of layers increases.
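The effect of separable convolution on the number of parameters, mentioned earlier in connection with (20) and with the inception block, can be checked with simple arithmetic. The sketch assumes the depthwise separable form (one k x k filter per input channel followed by a 1 x 1 pointwise convolution) and ignores bias terms; the layer sizes are illustrative and not taken from the paper.

```python
def standard_conv_params(kernel, in_ch, out_ch):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return kernel * kernel * in_ch * out_ch

def separable_conv_params(kernel, in_ch, out_ch):
    """Depthwise k x k filter per input channel plus a 1 x 1 pointwise mix."""
    return kernel * kernel * in_ch + in_ch * out_ch

# Illustrative layer sizes, not taken from the paper.
k, c_in, c_out = 3, 64, 128
standard = standard_conv_params(k, c_in, c_out)     # 73728
separable = separable_conv_params(k, c_in, c_out)   # 8768
reduction = standard / separable                    # roughly 8.4 times fewer weights
```

For this layer the separable form needs about one eighth of the weights, and the ratio separable/standard equals 1/c_out + 1/k^2 in general, so the saving grows with the kernel size and the number of output channels.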

Conclusion
Identifying plant diseases at an early stage plays a vital role in the agriculture industry. In this study, we designed an inception-v3 transfer learning model for potato plant leaf disease identification. The model is fine-tuned and trained to distinguish healthy from diseased potato leaf images. The results indicate that the proposed model outperforms the AlexNet and GoogleNet architectures. In our experiments, the potato leaf images from the PlantVillage dataset have three classes, including healthy leaves. The dataset is a three-color-channel image dataset to which a segmentation method is applied. In the first experiment, the model achieves a training accuracy of 96.8%. After augmenting the dataset and applying segmentation to the images, the training accuracy improves to 98.3%. In future work, potato leaf disease identification will be investigated further with larger datasets. We will also conduct further research using ensemble learning to analyze disease severity and to achieve higher performance.