Enhanced segmentation network with deep learning for Biomedical waste classification

Objective: To maximize the accuracy of classifying medical wastage, an Enhanced Segmentation Network (EnSegNet) with Deep Neural Network Trash Classification (EnSegNet-DNN-TC) is proposed in this article. Methods: Initially, a core trainable segmentation network called the SegNet framework is proposed, which uses the Encoder-Decoder Network (EDN) and a pixel-wise classification layer for image segmentation. The decoder upsamples its low-resolution input feature maps using the max-pooling indices of the corresponding encoder. Also, SegNet uses fewer parameters for training. The uncertainty inherent to the EDN is modeled by Bayesian functions to segment the input images. However, this SegNet can sample only a limited number of pixels in the images. Hence, an EnSegNet is proposed that uses Content-Sensitive Sampling (CSS) to sample more pixels in the data-sparse regions and fewer pixels in data-dense regions. Once the segmentation is completed, the DNN is applied for classifying the wastage using the segmented images. Findings: The experimental results show that the EnSegNet-DNN-TC framework achieves 88% accuracy, outperforming DNN-TC, on 100 images of different categories of biomedical wastes from the trash image dataset.


Introduction
Biomedical wastage is normally generated from human and animal healthcare, medical training and research, biological laboratories and other facilities. Part of the wastage stream is contagious or possibly harmful and should be carefully handled to protect health and sanitation workers. Typically, biomedical wastage is regulated and controlled based on different standards and protocols in various nations. In healthcare applications, inappropriate management of the wastage has a direct health impact on the public, the environment and the healthcare personnel. Biomedical wastage is a dangerous health hazard to the community, hospitals, healthcare units, and the flora and fauna of the region. It should be stored in a secure environment at all times and must not be mixed with chemical, radioactive or other laboratory trash (1)(2)(3).
https://www.indjst.org/
Nearly 75-90% of the biomedical wastage is non-dangerous and as harmless as any other wastage. The remaining 10-25% is dangerous and may be harmful to humans, animals or the environment. The Government of India states that biomedical wastage management is a part of hospital hygiene and maintenance activities. The World Health Organization (WHO) has classified biomedical wastage into different types such as common wastage, contagious or hazardous wastage, radioactive, chemical, pathological, pressurized containers and drugs. Also, the WHO has developed a series of training modules on better practices in biomedical wastage management, covering all aspects of wastage management from the detection and classification of wastage to its secure disposal using both non-incineration and incineration policies.
In recent decades, the classification of biomedical wastage has attracted interest as a promising application of computer vision. One of the easiest ways to classify wastage or trash automatically is to use deep learning algorithms that detect and classify the wastage from images (4). Many Convolutional Neural Network (CNN) frameworks such as ResNext, ImageNet, VGG, ResNet, MobileNet, DenseNet and RecycleNet (5) are available for the biomedical wastage classification process using images. Among those algorithms, ResNext was the best framework for Transfer Learning (TL) to categorize the trash.
This ResNext framework has been used by Vo et al. (6) to design a DNN-TC framework that automatically classifies the trash in smart wastage sorter machines. At first, a trash image dataset was collected in Vietnam, comprising many images belonging to various classes: organic, inorganic and medical wastage. Then, a DNN was applied which was an enhancement of ResNext for increasing the classification accuracy. The standard ResNext-101 was modified by adding two Fully Connected (FC) layers for reducing the redundancy. In the data preprocessing step, the brightness of the input images was normalized. After that, horizontal flip and random crop methods were applied to the input images for generating more images for training and testing. During the training process, the weights pre-trained on the ImageNet dataset were loaded from the original ResNext-101. Then, fine-tuning was performed for learning the features of wastage from the trash dataset, and the model with the best accuracy on the testing dataset was chosen for classifying each input image. Here, the confidence for each class was computed by the log softmax function in the last layer. Though it achieves the best accuracy, a segmentation technique is required for preprocessing the input images and further improving the efficiency of trash or wastage classification.
Therefore, in this article, an EnSegNet-DNN-TC framework is proposed for increasing the performance of the wastage classification. Initially, a core trainable segmentation network called the SegNet framework is proposed for preprocessing the input images. It has the EDN, which is topologically equal to the ResNext-101 architecture, and a pixel-wise classification layer. The decoder mainly upsamples its input feature maps using the max-pooling indices of the corresponding encoder. Also, it uses a reduced number of parameters for training. Moreover, the uncertainty inherent to the EDN is modeled via Bayesian functions for segmenting the input images.
But, it can sample only a limited number of pixels in the images. As a result, an EnSegNet is proposed that uses CSS to sample more pixels in the data-sparse regions and fewer pixels in data-dense regions. Thus, this EnSegNet learns to use the sampled pixels for segmenting the image into data-sensitive super-pixels. Then, the segmented image is fed to the DNN for efficiently classifying the trash.
The rest of the article is organized as follows: Section 2 reviews the research related to wastage classification. Section 3 describes the functioning of EnSegNet-DNN-TC and Section 4 presents its performance. Section 5 summarizes this research work and suggests future scope.

Literature Survey
Kennedy (7) proposed OscarNet, which uses TL for classifying disposable wastage. In this model, a large CNN pre-trained on the ImageNet task was used. Also, the FC layers were removed and a single hidden dense layer was added for classifying the images of disposable wastage into different types. However, it was not suitable for training features of multiple large CNNs simultaneously. Also, the decoding time was high due to the high dimensionality of the feature maps.
Chu et al. (8) proposed a Multilayer Hybrid deep-learning System (MHS) for automatically sorting the wastage disposed of by individuals in urban regions. First, the wastage images were acquired and fed to a CNN for extracting the image features. Also, a Multi-Layer Perceptron (MLP) was used to consolidate the image features and other features for classifying the wastage as recyclable or others. But, its efficiency was poor when wastage items lacked distinctive image features.
Aral et al. (9) analyzed different deep learning models such as DenseNet, InceptionResNet, MobileNet and Xception structures for classifying the Trashnet dataset. Here, Adam and Adadelta were applied as the optimizers in these network structures. But, the accuracy was inadequate for real-time systems because of the comparatively small amount of data and the white background of the images.
Adedeji & Wang (10) proposed an intelligent wastage classification by ResNet. Here, a Support Vector Machine (SVM) was used rather than the FC layer and optimized by the radial basis kernel for classification. But, the accuracy was not effective.
Sousa et al. (11) suggested a hierarchical Faster Region-based CNN (FR-CNN) for identifying and classifying the wastage in food trays. Also, a novel dataset called labeled wastage in the wild was collected and annotated for classification. However, the mean average precision was low and the complexity was high.
Xue et al. (12) proposed a CNN for realizing the fast analysis of fertilizer by evaluating images of different fertilizing phases. Here, images of various fertilizing ingredients were gathered for constructing the dataset, which was classified by the CNN. But, the training became complex as the number of network layers and parameters increased.
Mazloumian et al. (13) recommended a DNN for classifying food wastage using preprocessing and classification. The preprocessing was used for enhancing the images via scaling, background subtraction and Region-Of-Interest (ROI) cropping. Then, a deep CNN was employed to classify the wastage. But, the accuracy was low.
Toğaçar et al. (14) designed an auto-encoder with integrated feature selection in CNN for categorizing the wastage. First, the dataset used for the classification of wastage was reconstructed with the auto-encoder network. Then, the feature sets were extracted and fused using CNN. Also, the ridge regression was applied on the fused feature set to reduce the number of features and SVM was used for classification. But, it was not suitable for multi-class datasets.
Nowakowski & Pamula (15) proposed a new method for classifying and identifying e-wastage. In this method, a CNN was applied for classifying the types of e-wastage whereas FR-CNN was used for identifying the type and size of the wastage equipment in the images. Once the size and types of wastage were automatically classified and identified from the images, a collection plan was prepared by the e-wastage collection organizations by allocating an adequate number of vehicles and payload capacity for a specific e-wastage project. However, the complexity was high when using large-scale datasets.
A multi-level approach (16) was introduced for segmenting the waste objects. First, the scene-level segmentation was applied to capture the long-range spatial contexts and create a primary coarse segmentation. Then, a few possible object areas were chosen from the coarse segmentation and an object-level segmentation was performed. Afterwards, the scene-level and object-level outcomes were combined in a pixel-level FC conditional random field for generating the coherent final localization. But, its robustness was low when applied to multiple datasets with large variations in object appearance.

Proposed Methodology
In this section, the EnSegNet-DNN-TC framework is explained in detail. Generally, the SegNet framework is inspired by the unsupervised feature learning structure. The core training unit is the EDN. The encoder encompasses convolution with filters, pixel-wise tanh non-linearity, max-pooling and sub-sampling for obtaining the feature maps. The locations of the highest feature values found during pooling are stored and transferred to the decoder, which upsamples the feature maps using these stored locations. Then, the actual image is restored by convolving the upsampled maps.

Design of SegNet framework
Typically, SegNet comprises the EDN and the pixel-wise classifier. Its major parts are shown in Figure 1. It contains only convolution (conv) layers since no FC layers exist. The decoder upsamples its input using the max-pooling indices for generating sparse feature maps. Afterwards, conv with the filters is performed for densifying the feature maps. Finally, the resultant decoder feature maps are given to the softmax for segmenting the images in a pixel-wise manner.
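As an illustration of the pixel-wise classification step, the following minimal Python sketch applies a softmax over the class axis of a decoder output and labels each pixel with its most likely class. The function names, the (classes, height, width) array layout and the toy scores are assumptions made for this example and are not taken from the SegNet implementation.

```python
import numpy as np


def pixelwise_softmax(scores):
    """Softmax over the class axis of a (classes, height, width) score map."""
    # Subtract the per-pixel maximum for numerical stability.
    shifted = scores - scores.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)


def segment(scores):
    """Label every pixel with its most likely class."""
    return pixelwise_softmax(scores).argmax(axis=0)


# Hypothetical decoder output: 3 classes on a 2 x 2 image.
scores = np.array([
    [[2.0, 0.1], [0.3, 0.0]],
    [[0.5, 1.5], [0.2, 0.1]],
    [[0.1, 0.2], [3.0, 0.4]],
])
labels = segment(scores)
print(labels)
```

Each pixel is treated independently, so the class likelihoods of one pixel sum to one regardless of its neighbors.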
The encoder involves 13 conv layers corresponding to those of VGG16 (17), so the training process can be initialized from weights learned for classifying a huge number of images. For retaining high-resolution feature maps and minimizing the number of training parameters, the FC layers are removed.
Each encoder layer has a corresponding decoder layer, so there are 13 layers in the decoder. The final decoder output is given to the multi-class softmax classifier to create separate class likelihoods for all pixels. In the encoder, a group of feature maps is generated by conv with the filters.
After that, these are batch normalized and an element-wise Rectified Linear Unit (ReLU), max(0, x), is applied. Next, max-pooling with a non-overlapping window is employed to sub-sample the feature maps. Before sub-sampling, the boundary details must be captured and stored to reduce the loss of spatial resolution.
If storage is not restricted, all feature maps of the encoder can be stored after sub-sampling. But, this is not applicable in real-time uses. So, an efficient method is used that stores only the location of the highest feature value in every pooling window. The corresponding decoder upsamples its input feature maps using the locations of the highest feature values obtained in the respective encoder feature maps.
The decoding method of SegNet is shown in Figure 2, wherein a, b, c and d are the values in the feature map. Typically, it utilizes the max-pooling indices for upsampling the feature maps and convolves them with decoder filters. In this method, sparse feature maps are generated and convolved using the decoder filters for generating dense feature maps. After that, batch normalization is applied to every map. Here, the decoder corresponding to the first encoder generates a multi-channel feature map, whereas the remaining decoders generate feature maps with the same size and number of channels as the inputs of their encoders. The output of the final decoder is given to the softmax, which classifies all pixels separately according to their likelihoods. But, it can sample only a limited number of pixels in the images. As a result, an EnSegNet is proposed that uses CSS to sample more pixels in the data-sparse regions and fewer pixels in data-dense regions.
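The index-based upsampling of the SegNet decoder can be illustrated with a minimal NumPy sketch. It assumes a single-channel feature map, a 2 x 2 non-overlapping window and hypothetical function names; the trainable decoder filters that densify the sparse map are omitted.

```python
import numpy as np


def max_pool_with_indices(fmap, k=2):
    """Non-overlapping k x k max-pooling that also returns argmax positions."""
    h, w = fmap.shape
    # Group the pixels into (h // k, w // k) windows of k * k values each.
    windows = fmap.reshape(h // k, k, w // k, k).transpose(0, 2, 1, 3)
    windows = windows.reshape(h // k, w // k, k * k)
    return windows.max(axis=2), windows.argmax(axis=2)


def max_unpool(pooled, indices, k=2):
    """Place each pooled value back at its stored position; zeros elsewhere."""
    ph, pw = pooled.shape
    out = np.zeros((ph * k, pw * k), dtype=pooled.dtype)
    rows = np.arange(ph)[:, None] * k + indices // k
    cols = np.arange(pw)[None, :] * k + indices % k
    out[rows, cols] = pooled
    return out


# Hypothetical 4 x 4 encoder feature map.
fmap = np.array([
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 5.0],
    [0.0, 6.0, 7.0, 0.0],
    [0.0, 0.0, 0.0, 8.0],
])
pooled, indices = max_pool_with_indices(fmap)
sparse = max_unpool(pooled, indices)
print(pooled)
print(sparse)
```

Only one index per pooling window is stored, which is far cheaper than keeping the full encoder feature map, and the resulting sparse map is what the decoder filters would then densify.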

EnSegNet framework using CSS
A measurement of content-sensitiveness (ConSen) is introduced for producing the content-sensitive super-pixels. It requires that the size of a super-pixel be responsive to the variation of the data within the super-pixel. So, the ConSen of a super-pixel is measured by the ratio of the color deviation in the super-pixel (S) to its size.
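Read literally, this ratio can be written as the sum of the per-pixel color deviations in $S$ divided by the number of pixels in $S$. The following form is a reconstruction under that reading, with $K$ the number of pixels in $S$ and $M_{p_i}$ the color deviation of the $i$-th pixel $p_i$; it is an assumption and is not quoted from the original Eq. (1):

$$\mathrm{ConSen}(S) = \frac{1}{K} \sum_{i=1}^{K} M_{p_i}$$

Under this form, a super-pixel covering a uniform region has a ConSen close to zero, while one covering strongly varying colors has a large ConSen.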
In Eq. (1), $K$ stands for the number of pixels in $S$, $p_i$ denotes the $i$-th pixel in $S$ and $M_{p_i}$ denotes the color deviation of $p_i$, which is determined in horizontal and vertical orders. Here, $S$ is a set of grouped homogeneous pixels in an image. For both orders, 2 positive and 2 negative elements are considered. Consider the pixel $P(x_0, y_0)$ whose color is $c(x_0, y_0)$; the window dimension around $P$ is $(2s + 1) \times (2s + 1)$. Assume $H^{-}_{1}$ and $H^{-}_{2}$ are the negative elements in the horizontal order of $P$, and $H^{+}_{1}$ and $H^{+}_{2}$ are the positive elements in the horizontal order of $P$.
Here, $c(x, y)$ is the color of a pixel at position $(x, y)$. Likewise, the color deviation in the vertical order is obtained by defining $V^{-}_{1}$, $V^{+}_{1}$, $V^{-}_{2}$ and $V^{+}_{2}$, accordingly. After that, the color deviation of $P$ is defined from these horizontal and vertical elements.
To use the density matching property of SegNet-DNN-TC, the CSS is proposed for generating the content-sensitive super-pixels. So, it produces large clusters with many components when the number of training images increases and smaller clusters with few components when fewer images are used. Thus, the major aim of CSS is that more pixels have to be sampled in data-sparse areas and fewer pixels in data-dense areas. So, a likelihood $L(p)$ of sampling a pixel $p$ is defined in Eq. (11), where $\mathrm{Max}(I)$ is the highest color deviation among the pixels and $M(p)$ is the color deviation of $p$. A larger $L(p)$ signifies that $p$ lies in a data-sparse area and needs to be sampled. So, the input image from the training dataset is segmented and fed to the DNN for classifying the biomedical wastage efficiently.
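Sampling pixels according to a likelihood map can be sketched as follows. This minimal NumPy example assumes that the per-pixel likelihood $L(p)$ has already been computed and takes it as an input array; it does not reproduce Eq. (11), and the function name, the seed and the toy likelihood map are hypothetical.

```python
import numpy as np


def sample_pixels(likelihood, n_samples, seed=0):
    """Draw distinct pixel coordinates with probability proportional to likelihood."""
    rng = np.random.default_rng(seed)
    weights = likelihood.ravel().astype(float)
    probs = weights / weights.sum()
    # Sampling without replacement so that no pixel is picked twice.
    flat = rng.choice(weights.size, size=n_samples, replace=False, p=probs)
    rows, cols = np.unravel_index(flat, likelihood.shape)
    return np.stack([rows, cols], axis=1)


# Hypothetical likelihood map: the left half is never sampled,
# the right half is sampled uniformly.
likelihood = np.zeros((4, 4))
likelihood[:, 2:] = 1.0
samples = sample_pixels(likelihood, n_samples=5)
print(samples)
```

Pixels with a larger likelihood are drawn more often, so regions with high $L(p)$ receive more samples than regions with low $L(p)$.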

Experimental Results
In this section, the effectiveness of EnSegNet-DNN-TC is analyzed and compared with the DNN-TC framework by using MATLAB 2017b. In this experiment, a trash image dataset is collected which consists of 200 images of different categories of biomedical wastage: infectious waste, chemical waste, sharp waste, pharmaceutical waste and pathological waste. Infectious wastes include blood-soaked bandages, discarded surgical gloves and masks, cultures, stocks or swabs.
The chemical wastes are various types of chemicals used in the production of biologicals, cleansing, etc. Sharp wastes are needles, syringes, scalpels, blades, glasses and so on, which may cause punctures and cuts; they are treated by autoclaving or microwaving and by mutilation or shredding. Similarly, pharmaceutical wastes can come from spill sites, half-used bottles and IV equipment with residual medicine on it. The pathological wastes include the materials eliminated from the body in surgery and the fluids as well as solids removed in autopsies, except teeth. From this dataset, 100 images are taken for training and the remaining 100 are used for testing. The comparison is carried out based on precision, recall, f-measure, accuracy, error rate and Root Mean Squared Error (RMSE). Figure 4 portrays samples of the considered trash image dataset.

Precision
It measures the proportion of correctly classified biomedical wastage and is computed from the True Positive (TP) and False Positive (FP) counts.

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall
It measures the proportion of biomedical wastes that are correctly classified and is computed from the TP and False Negative (FN) counts.

$$\text{Recall} = \frac{TP}{TP + FN}$$
In Figure 7, the recall of the EnSegNet-DNN-TC and DNN-TC frameworks for a varying number of images is depicted. This analysis observes that the recall of EnSegNet-DNN-TC for 100 images is 5.01% higher than that of DNN-TC. So, it is concluded that the recall of EnSegNet-DNN-TC increases as the number of input images increases.

F-measure
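It is computed as the harmonic mean of precision and recall:
$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$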

Accuracy
It is the fraction of accurate classifications of medical wastage over the total number of trials performed:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
TP is an outcome where EnSegNet-DNN-TC correctly classifies biomedical waste images as biomedical wastes. True Negative (TN) is an outcome where EnSegNet-DNN-TC correctly classifies non-biomedical waste images as non-biomedical wastes. FP is an outcome where EnSegNet-DNN-TC incorrectly classifies non-biomedical waste images as biomedical wastes. FN is an outcome where EnSegNet-DNN-TC incorrectly classifies biomedical waste images as non-biomedical wastes. In Figure 9, the accuracy (%) of the EnSegNet-DNN-TC and DNN-TC frameworks for a varying number of images is portrayed. This analysis observes that the accuracy of EnSegNet-DNN-TC for 100 images is 4.76% higher than that of DNN-TC. So, it is concluded that EnSegNet-DNN-TC can maximize the accuracy of biomedical waste classification with an increasing number of images.
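The precision, recall, f-measure and accuracy measures can be computed from the four outcome counts with a short Python sketch. The function name and the counts in the example are hypothetical and are not results from this experiment; the sketch also assumes that no denominator is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute standard measures from the four outcome counts."""
    # Assumes every denominator below is non-zero.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts for 100 test images.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, precision exceeds recall because false negatives outnumber false positives, and the f-measure lies between the two.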

Error rate
It is the rate at which the biomedical wastes are misclassified. In Figure 10, the error rate of the EnSegNet-DNN-TC and DNN-TC frameworks for a varying number of images is shown. This analysis indicates that the error rate of EnSegNet-DNN-TC for 100 images is 24.22% lower than that of DNN-TC. Thus, it is observed that EnSegNet-DNN-TC can minimize the error rate of classifying the biomedical wastes as the number of images increases.

RMSE
It is also a measure of the accuracy of segmentation. It is computed by taking the square root of the MSE value, as given in Eq. (17), where N is the total number of images, S is the segmented image, A is the actual image and i, j index the pixels in the images. Figure 11 depicts the RMSE of the EnSegNet-DNN-TC and DNN-TC frameworks for different numbers of images. This analysis observes that the RMSE of EnSegNet-DNN-TC for 100 images is 7.43% lower than that of DNN-TC. So, it is concluded that EnSegNet-DNN-TC can reduce the RMSE of classifying the biomedical wastage when the number of images is high.
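A minimal NumPy sketch of this measure is given below. It assumes that the squared differences between the segmented and actual images are averaged over all pixels of all images before the square root is taken; the exact normalization of Eq. (17) is not reproduced here, and the function name and toy masks are hypothetical.

```python
import numpy as np


def rmse(segmented, actual):
    """Root mean squared error between segmented and actual images."""
    s = np.asarray(segmented, dtype=float)
    a = np.asarray(actual, dtype=float)
    # Mean of the squared pixel differences over every image and pixel.
    return float(np.sqrt(np.mean((s - a) ** 2)))


# Hypothetical stack of N = 1 binary masks of size 2 x 2.
segmented = [[[0, 1], [1, 1]]]
actual = [[[0, 1], [0, 1]]]
print(rmse(segmented, actual))
```

One of the four pixels differs in this example, so the mean squared error is 0.25 and the RMSE is 0.5.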

Conclusion
In this article, an EnSegNet-DNN-TC framework is proposed to increase the accuracy of wastage classification. At first, a SegNet is designed in which the EDN uses max-pooling indices to upsample the input feature maps. Also, its uncertainty in segmenting the images is measured via Bayesian operators. But, it samples only a limited number of pixels. So, an EnSegNet is developed which applies CSS to sample more pixels in the data-sparse regions and fewer pixels in data-dense regions. Once the segmentation is completed, the DNN is applied for classifying the wastage. To conclude, the experimental outcomes show that EnSegNet-DNN-TC achieves 83.76% mean accuracy and a 0.138 mean error rate, outperforming DNN-TC. Though it extracts features sufficiently, there are subtle variances between different images and misjudgments due to its high complexity. Hence, future work includes the fusion of deep features and texture features to prevent the misjudgments of EnSegNet-DNN-TC on images with complex backgrounds.