Bi-Level algorithm for the segmentation and counting of Leukocytes and Erythrocytes

Background/Objectives: To present an accurate, quantitative two-phase algorithm that counts both leukocytes and erythrocytes in order to identify the severity of leukaemia in the human body. Methods/Statistical analysis: The algorithm has two phases. The first phase recognizes and counts the leukocytes using a thresholding-based segmentation technique that relies on the intensity values of the pixels of greyscale blood smear images; the second phase recognizes the erythrocytes by their circular shape using the Circular Hough Transform (CHT). The system is evaluated on 26 stained blood smear images from the ALL-IDB1 benchmark dataset. Findings: The first phase of the algorithm achieves 99.41 per cent overall accuracy in leukocyte detection, and the second phase attains 99.76 per cent overall accuracy in erythrocyte detection. Novelty/Applications: This proposal applies the Circular Hough Transform to detect erythrocytes by adjusting the radius of the circle according to the magnification rate of the sample image.


Introduction
Many research proposals show that counting leukocytes helps in detecting leukaemia. In addition, it is essential to count the erythrocytes in order to achieve the highest possible accuracy.
In (1) the preprocessing is done by standardizing the color space, and the segmentation is then performed by color component subtraction. Color component subtraction is tested on the RGB, CMYK, and HSV color spaces. Subtracting the G component of the RGB color space and the S component of the CMYK color space gives 97.79 per cent segmentation accuracy. In (2) a segmentation algorithm is developed using filtering, thresholding, and the watershed transform. The algorithm uses shape- and color-based features to segment leukocytes at an accuracy rate of 99.86 per cent and erythrocytes at a rate of 93.4 per cent. That method does not achieve the highest accuracy in detecting erythrocytes, because the applied watershed algorithm performs poorly at detecting the ridgelines of overlapping and crowded erythrocytes. It is apparent that the shape, size, and texture of these two blood components differ to a great extent (3,4).
In (5) an algorithm is developed that finds a distinct threshold value for each blood smear image based on the high- and low-frequency values of the blood components. Along with this threshold value, the Sobel filter and the watershed transform are applied to detect the edges of blood cells in the frequency domain. The algorithm is tested with 30 blood smear images and yields 93 per cent accuracy. In (1,6) the different types of WBCs are segmented from the RBCs using Otsu's thresholding along with mathematical morphology and the watershed transform; geometrical features that define the shapes of the WBCs are extracted, and the cells are then classified by applying SVM in hierarchical stages. In that work, the erythrocytes are distinguished using histogram equalization. The system achieves recall values of 64 per cent to 95.27 per cent in separating RBCs with 307 test images. In (7) the RBCs are partitioned using Otsu's thresholding. The radius-based CHT (Circular Hough Transform) method is used to identify RBC components by drawing circles on the segmented image. This is used to find the red blood cells of maximum and minimum radius and to obtain their count. This algorithm is not 100 per cent accurate because it fails to recognize some RBCs whose shapes are not exact circles and which fall outside the calculated radius range. The algorithm was tested with 10 real-time images. In (8) the Euclidean distance between the centroid of the WBC nuclei and the remaining pixels is computed. Fuzzy rules are then applied to segment the erythrocytes at an accuracy rate of 97.31 per cent and the leukocytes at an accuracy rate of 95.39 per cent. This fuzzy inference system classifies the image components as leukocytes, leukocyte nuclei, leukocyte cytoplasm, erythrocytes, and plasma. Here erythrocytes are classified at an accuracy rate of 95.39 per cent with 530 test images.
In (9) an algorithm with two subparts is developed. In the first part, the sample image is preprocessed using a linear filter, after which the WBCs are segmented with Otsu's thresholding and morphological operators at an accuracy rate of 94.25 per cent. In the second part, the RBCs are segmented and counted in two ways: applying the watershed algorithm gives 91.07 per cent accuracy, and applying CHT gives 92.67 per cent accuracy. This algorithm was tested on the ALL-IDB1 dataset.
In (10) the RBCs are segmented by applying a set of image processing methods: pre-processing with a Wiener filter, edge detection with a LoG filter, and post-processing with morphological operators. That system achieves 97 per cent segmentation accuracy. In (11) a system is introduced for segmenting the leukocytes, erythrocytes, and plasma of blood smear images. This is done in several steps: smoothing with the non-linear Kuwahara filter, color normalization in the HSI color space, pre-segmentation of the nucleus with thresholding, pre-segmentation of plasma, WBCs, and RBCs with a naïve Bayes classifier, localization of erythrocytes using a template matching approach, and segmentation of leukocytes using level-set approaches.
In RBC segmentation, all of these methods concentrate on the segmentation phase. In contrast, the proposed system explores a set of pre-processing steps that highlight the cells for efficient segmentation. It is clear that the same procedure cannot be used for segmenting both the leukocytes and the erythrocytes. Hence this paper proposes a two-phase algorithm for the separate recognition of these two blood components. Leukocytes can be readily segmented by applying thresholding and the watershed transform, whereas erythrocytes are segmented with the Circular Hough Transform (CHT). Like the other blood components, erythrocytes are roughly circular objects; their diameters fall within a particular range, and they are numerous. The CHT performs the segmentation by considering the roundness and the radius of these target objects. This, in turn, separates the erythrocytes from the larger leukocytes. To improve the segmentation accuracy of the algorithm, contrast adjustment is carried out in the pre-processing stage. This helps to detect the edge pixels of the erythrocytes.

Image acquisition and Pre-processing
Obtain the true-color blood smear image $I$ and convert it into the grayscale image $I_{GRAYSCALE}$.
The outcome of the pre-processing stage is shown in [ Figure 1 ] and the workflow of the proposed system is shown in [ Figure 2 ].
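The grayscale conversion can be illustrated with a minimal sketch. The source does not state which conversion formula was used; the luminance weights below are those of MATLAB's `rgb2gray`, and their use here is an assumption. The function name `rgb_to_gray` and the nested-list image layout are purely illustrative.

```python
# Assumed luminance weights (those used by MATLAB's rgb2gray); the paper
# does not state which grayscale conversion was actually applied.
WEIGHTS = (0.2989, 0.5870, 0.1140)


def rgb_to_gray(image):
    """Convert a true-color image (rows of (R, G, B) tuples) to grayscale.

    Each output pixel is the weighted sum of its three channels, rounded
    to the nearest integer in the range 0..255.
    """
    return [
        [int(round(WEIGHTS[0] * r + WEIGHTS[1] * g + WEIGHTS[2] * b))
         for (r, g, b) in row]
        for row in image
    ]


if __name__ == "__main__":
    tiny = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 255, 0)]]
    print(rgb_to_gray(tiny))
```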

Thresholding
Since the majority of leukocytes in the sample images are lymphocytes, which have large nucleus regions, the leukocytes can be recognized by segmenting their nuclei. The nucleus segmentation exploits the intensity difference between the WBC nucleus regions and the remaining components. The grayscale image is therefore thresholded by setting the intensity to 255 (white) for every pixel whose intensity is less than 100. This thresholding isolates the nucleus regions of the WBCs and displays them as white pixels. The threshold is estimated from the image $I_{GRAYSCALE}$ as follows. Since the pixel intensities range from 0 to 255, an initial threshold value $T_1$ is chosen arbitrarily within this range; it partitions the image components into two groups, foreground and background. Here the foreground comprises the components that represent WBCs, and the background comprises the other blood components such as erythrocytes and plasma. The threshold value is then tuned by increasing or decreasing it. This process is repeated on a selected set of image samples, and the mean threshold value is computed from them.
The mean threshold value serves as the consistent value for segmenting the WBCs of the test images.
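A minimal sketch of this thresholding step, under stated assumptions: pixels darker than the threshold become 255, and all other pixels are set to 0 (the source only specifies the first half of that rule). The function names and the sample threshold values are hypothetical.

```python
def threshold_nuclei(gray, t=100):
    """Mark candidate nucleus pixels in a grayscale image.

    Pixels with intensity below t become 255 (white), as described in the
    text. Setting the remaining pixels to 0 is an assumption.
    """
    return [[255 if p < t else 0 for p in row] for row in gray]


def mean_threshold(per_image_thresholds):
    """Average the thresholds tuned by hand on a set of sample images."""
    return sum(per_image_thresholds) / len(per_image_thresholds)


if __name__ == "__main__":
    # Hypothetical thresholds tuned on three sample images.
    t = mean_threshold([90, 100, 110])
    print(threshold_nuclei([[30, 100, 200], [99, 101, 0]], t))
```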
The thresholded image is then binarised for further processing by the watershed algorithm.

Watershed transform
The watershed transform is applied to separate touching cells, which supports accurate counting. Watershed is a type of region-based segmentation that treats regions as catchment basins and their boundaries as watershed ridgelines. Partitioning each catchment basin from the others results in finely segmented regions (12,13). The steps are:
1. Complement the grayscale image $I_{GRAYSCALE}$.
2. Perform the distance transform on the complemented image.
3. Negate the distance transform. This turns the leukocyte regions into catchment basins, so that each leukocyte has its own catchment basin.
4. Apply the watershed transform. This yields a label matrix in which the catchment basins carry values greater than zero.
5. The zero-valued pixels form the boundaries of the catchment basins and thus separate the leukocytes.
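Steps 2 and 3 above can be sketched as follows. This is only an illustration of the distance transform and its negation, not the authors' implementation: it uses the city-block metric (an assumption; the source does not name the metric) and omits the watershed flooding itself. The function names are hypothetical.

```python
from collections import deque


def distance_to_background(mask):
    """City-block distance from every pixel to the nearest background pixel.

    mask is a binary image (1 = object, 0 = background). A multi-source
    breadth-first search is started from all background pixels at once.
    """
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 0:
                dist[y][x] = 0
                queue.append((y, x))
    if not queue:
        raise ValueError("mask must contain at least one background pixel")
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def negated_distance(mask):
    """Negate the distance map so that object centers become basin minima."""
    return [[-d for d in row] for row in distance_to_background(mask)]
```

After negation, the deepest values sit near the center of each cell, which is what lets a watershed flood assign one basin per leukocyte.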

Mathematical morphology
The resulting image is passed to morphological operators to smooth the irregular boundaries of the leukocytes.
Erosion: In general, this morphological operation shrinks objects by discarding irrelevant details. Here it is used to separate touching leukocytes from each other (14).
Here $I_{WBC}$ is the test image and $B$ represents the structuring element. A two-dimensional disk-shaped structuring element is used for the erosion. The variable $z$ denotes the component pixel.
Closing: This is dilation followed by erosion. It smooths and fills the contours of the leukocytes and removes tiny holes in them.
Filling: The filling operation fills the holes inside the leukocyte regions by converting background pixels into foreground pixels until all background pixels inside the leukocyte boundary have been converted.
$I_{CLOSED}^{c}$ denotes the complemented image, and $X$ denotes the component pixels of the image.
Border Clearing: It removes the leukocytes touching the boundary of the image by suppressing the object pixels that lie along the image boundary. The pixels on the boundary of the binary image are checked; if any pixel value is 1, all the pixels of the object to which it belongs are converted to 0.
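Two of the operations above, erosion with a disk-shaped structuring element and border clearing, can be sketched as follows. The sketch makes several assumptions that the source does not state: pixels outside the image are treated as background during erosion, objects are traced with 8-connectivity, and the helper names `disk`, `erode`, and `clear_border` are illustrative only.

```python
def disk(radius):
    """Offsets (dy, dx) of a disk-shaped structuring element."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def erode(mask, offsets):
    """Binary erosion: keep a pixel only if the whole element fits the object.

    Positions outside the image count as background (an assumption).
    """
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w
                     and mask[y + dy][x + dx] == 1
                     for dy, dx in offsets))
             for x in range(w)]
            for y in range(h)]


def clear_border(mask):
    """Remove every object that touches the image boundary.

    Each boundary pixel equal to 1 seeds a flood fill (8-connectivity
    assumed) that sets all pixels of its object to 0.
    """
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    stack = [(y, x) for y in range(h) for x in range(w)
             if (y in (0, h - 1) or x in (0, w - 1)) and out[y][x] == 1]
    while stack:
        y, x = stack.pop()
        if 0 <= y < h and 0 <= x < w and out[y][x] == 1:
            out[y][x] = 0
            stack.extend((y + dy, x + dx)
                         for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return out
```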

Feature extraction
The required geometric and shape features of the segmented image, namely area, solidity, and roundness, are extracted (15).
Area: It gives the area of the Region of Interest (ROI) by counting the number of pixels that fall inside the region.

Area = Number of pixels in ROI (9)

Solidity: It represents the coefficient of the difference of the distance of each pixel along the boundary to the center of the leukocyte object.
Roundness: It describes how closely the leukocytes resemble geometric circles.
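As a hedged illustration of two of these features, the sketch below computes the area by pixel counting, as in equation (9), and a roundness value. The roundness formula used, $4\pi \cdot \text{Area} / \text{Perimeter}^2$, is a common textbook definition and is an assumption here, since the source's own equation is not available. It equals 1 for a perfect circle and decreases for less circular shapes.

```python
import math


def area(mask):
    """Area of a binary region: the number of pixels equal to 1."""
    return sum(sum(row) for row in mask)


def roundness(area_px, perimeter_px):
    """Roundness as 4*pi*Area / Perimeter**2 (assumed definition).

    Returns 1.0 for an ideal circle and smaller values otherwise.
    """
    return 4.0 * math.pi * area_px / (perimeter_px ** 2)


if __name__ == "__main__":
    # A 10 x 10 square region: area 100, perimeter 40.
    print(roundness(100, 40))
```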

Feature thresholding
The features are thresholded to separate the leukocytes from other background blood elements. After sampling a random set of images, the thresholds for solidity, area, and roundness are set to 0.80, 800, and 0.40 respectively (16). Finally, the leukocytes are counted using the index value.
[ Figure 3 ] shows the steps followed in segmenting the leukocytes.
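The feature thresholding can be sketched as a simple filter over candidate regions. Two points are assumptions rather than statements from the source: the value 0.40 is treated as a roundness threshold, and a region is kept when every feature is greater than or equal to its threshold. The dictionary layout and function names are hypothetical.

```python
# Thresholds reported in the text; treating 0.40 as a roundness threshold
# and using ">=" as the comparison are assumptions.
THRESHOLDS = {"solidity": 0.80, "area": 800, "roundness": 0.40}


def is_leukocyte(region):
    """Return True if every feature of the region meets its threshold."""
    return all(region[name] >= limit for name, limit in THRESHOLDS.items())


def count_leukocytes(regions):
    """Count the candidate regions that pass the feature thresholds."""
    return sum(1 for region in regions if is_leukocyte(region))


if __name__ == "__main__":
    # Hypothetical feature values for three segmented regions.
    candidates = [
        {"solidity": 0.95, "area": 1500, "roundness": 0.85},
        {"solidity": 0.60, "area": 1200, "roundness": 0.70},
        {"solidity": 0.90, "area": 300, "roundness": 0.90},
    ]
    print(count_leukocytes(candidates))
```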

Exploring Erythrocyte blobs using morphological operators
The erythrocyte blobs of the grayscale image are opened: the image is first eroded with a disk-shaped structuring element with a radius of 50 pixels, and the eroded image is then dilated. This separates the background, that is, blood components such as platelets, leukocytes, and plasma.
The separated background is subtracted from the original image, which suppresses the background relative to the foreground red blood cells.
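The opening and background subtraction described above can be sketched on a small grayscale image. This is an illustrative sketch, not the authors' code: grayscale erosion and dilation are implemented as minimum and maximum filters over a disk, neighbors outside the image are ignored (an assumption), and a small radius is used in the example instead of the 50 pixels reported in the text.

```python
def disk(radius):
    """Offsets (dy, dx) of a disk-shaped structuring element."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def _rank_filter(image, offsets, pick):
    """Apply pick (min or max) over the in-bounds neighbors of each pixel."""
    h, w = len(image), len(image[0])
    return [[pick(image[y + dy][x + dx] for dy, dx in offsets
                  if 0 <= y + dy < h and 0 <= x + dx < w)
             for x in range(w)]
            for y in range(h)]


def opening(image, radius):
    """Grayscale opening: erosion (min filter) followed by dilation (max)."""
    se = disk(radius)
    return _rank_filter(_rank_filter(image, se, min), se, max)


def subtract_background(image, radius):
    """Estimate the background by opening and subtract it from the image."""
    background = opening(image, radius)
    return [[p - b for p, b in zip(row, brow)]
            for row, brow in zip(image, background)]
```

Structures smaller than the structuring element disappear from the opened image, so they survive the subtraction while the slowly varying background is removed.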

Contrast adjustment
The intensities of all pixels in the background-cleared image need to be adjusted to improve the visibility of the erythrocytes. The existing pixel intensities are mapped to new intensity values, which increases the contrast of the sample image so that the region of interest can be found easily. After the contrast adjustment, the erythrocytes appear darker than the background; the darker regions can therefore be targeted by the subsequent segmentation method with less effort.
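One simple way to map intensities to a wider range is a linear min-max stretch, sketched below. The source does not specify the mapping that was used, so this choice, the function name `stretch_contrast`, and the output range of 0 to 255 are all assumptions.

```python
def stretch_contrast(image, out_min=0, out_max=255):
    """Linearly map the image's intensity range onto [out_min, out_max].

    The darkest pixel maps to out_min and the brightest to out_max. A flat
    image (all pixels equal) is returned unchanged.
    """
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[int(round(out_min + (p - lo) * scale)) for p in row]
            for row in image]


if __name__ == "__main__":
    print(stretch_contrast([[50, 75, 150]]))
```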

Circular Hough transform
CHT is another segmentation technique, used for recognizing circular objects or other geometric structures even when they are imperfect in shape. It can do so without any prior information about the circles present in the image. The basic equation of a circle is:

$$(x - a)^2 + (y - b)^2 = r^2$$

Here, $a$ and $b$ define the center pixel of the circle, $x$ and $y$ represent any pixel on the circumference of the circle, and $r$ is the radius (17,18). The radius is not given; therefore the algorithm generates circles with radius = 1, 2, 3, …. The upper limit of the radius does not exceed the length in pixels of the diagonal of the test image. The values of $a$ and $b$ are calculated as:

$$a = x - r\cos\Theta, \qquad b = y - r\sin\Theta$$

Here, $\Theta$ sweeps from 0 degrees to 360 degrees. According to these equations, the CHT has to find three unknown values: the center, the radius, and the points lying on the circle. Using these values, it finds how many such sets of features occur in the image. Initially, it identifies the parameters of a curve that closely matches a set of edge points. These edges are commonly identified by an edge detection algorithm such as Canny or Sobel. The detected edges may be noisy, or multiple edge fragments may be found for a single set of features (19). Wherever the specified set of features is found, a circle is displayed on the image. The peak values of the accumulated features correspond to the essential shapes in the input image. [ Figure 4 ] shows the stages of erythrocyte segmentation before applying CHT.

In this paper, the CHT is used to detect the circular objects in the blood smear image, namely the erythrocytes, whose radius is approximately equal to the values passed as arguments. Almost all erythrocytes fall within a radius range of 30 to 50 pixels; these are efficiently detected by the algorithm.
The radius range is tuned, and the algorithm is tested with several sample images. Thresholding the radius is needed to give a precise count of the cells. The radius range is found to vary with the magnification rate of the images in the dataset. With CHT, the image without contrast adjustment yields poorer results than the contrast-adjusted image.
[ Figure 5 ] shows the image after applying CHT.
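The voting idea behind the CHT can be sketched with a small accumulator over candidate centers and radii. This is a simplified illustration under stated assumptions, not the implementation used in the paper: edge points are supplied directly instead of coming from an edge detector, the angle is sampled in whole degrees, each edge point votes at most once per accumulator cell, and only the single strongest circle is returned. All function names are hypothetical.

```python
import math
from collections import Counter


def _round(v):
    """Round half up, so that rounding is consistent across the sketch."""
    return int(math.floor(v + 0.5))


def hough_circles(edge_points, radii):
    """Accumulate votes in (a, b, r) space.

    For every edge point (x, y), every radius r, and every whole-degree
    angle, the candidate center is a = x - r*cos(theta), b = y - r*sin(theta).
    """
    acc = Counter()
    for x, y in edge_points:
        cells = set()
        for r in radii:
            for deg in range(360):
                t = math.radians(deg)
                cells.add((_round(x - r * math.cos(t)),
                           _round(y - r * math.sin(t)), r))
        acc.update(cells)  # one vote per cell per edge point
    return acc


def strongest_circle(edge_points, radii):
    """Return the (a, b, r) cell that collected the most votes."""
    acc = hough_circles(edge_points, radii)
    return max(acc, key=acc.get)


def circle_edge(a, b, r):
    """Synthetic edge points of a circle, used only to exercise the sketch."""
    return {(_round(a + r * math.cos(math.radians(d))),
             _round(b + r * math.sin(math.radians(d))))
            for d in range(360)}


if __name__ == "__main__":
    print(strongest_circle(circle_edge(20, 20, 8), range(6, 11)))
```

Restricting `radii` to a narrow range, as the text does with 30 to 50 pixels, keeps the accumulator small and prevents larger objects from collecting votes.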

Experiments and Results
The dataset chosen for this experiment is ALL-IDB1, constructed by Donida Labati. It is an open-source benchmark dataset, issued on request, that contains 108 peripheral blood smear images in JPG format. The dataset contains both cancerous and non-cancerous images, indicated by the numbers one (1) and zero (0). Part of the image set was captured by a hematology laboratory microscope fitted with a Canon PowerShot G5 camera. The size of each of these RGB images lies between 1.6MB and 1.9MB; the images measure 2592×1944 pixels at a resolution of 180 dpi. The remaining images were taken with an Olympus digital camera at 1712×1368 pixels and a resolution of 144 dpi. These two sets of images differ in lighting, magnification, and resolution. The size of each image in the second set is less than 600KB (20). The system is implemented using the Matlab 2015a Image Processing Toolbox on an HP PC with a 64-bit Windows 7 Professional operating system, 4GB of memory, and an Intel(R) Core(TM) i5 processor. The first phase of the algorithm segments and counts the leukocytes. It is evaluated on 26 randomly selected test images and achieves 99.41 per cent accuracy, while the second phase detects and counts erythrocytes with an accuracy rate of 99.76 per cent. To determine the accuracy, the results are compared with the ground truth values of each image provided by a field expert. The ground truth values are also shown for the corresponding images.
[ Table 1 ] shows the leukocyte and erythrocyte counts produced by the algorithm and their individual accuracy for each image. The over-segmentation results of the algorithm are shown in [ Tables 2 and 3 ] for leukocytes and erythrocytes respectively. [ Table 4 ] shows the features extracted from the sample image. The significant geometric shape features chosen for making decisions about the blood components are area, solidity, and roundness. The table lists the features of the leukocytes after feature-based thresholding.
[ Table 5 ] compares the results of the proposed system with some existing research on segmenting leukocytes and erythrocytes. The results for Images 5, 9, 10, 12, 15, 20, and 26 show the over-segmentation of erythrocytes by the algorithm. The algorithm over-segments leukocytes in four images, namely Image1, Image3, Image9, and Image10. The overall accuracy of the algorithm in segmenting and counting leukocytes and erythrocytes for the 26 images is calculated separately as:

$$\text{Accuracy} = \frac{\text{actual result}}{\text{expected result}} \times 100$$

Here, the actual result is the total number of white cells or red cells counted by the algorithm across the sample images, and the expected result is the ground truth value obtained with the help of a trained hematologist. According to the ground truth, the 26 sample images contain 341 leukocytes, whereas the proposed algorithm identifies only 339. Likewise, the 26 sample images contain 8403 erythrocytes, of which the algorithm finds 8383.
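The reported overall accuracies can be checked from the totals given above, assuming that accuracy is the ratio of the algorithm's total count to the ground-truth total, expressed as a percentage. The function name `overall_accuracy` is illustrative.

```python
def overall_accuracy(actual, expected):
    """Overall accuracy in per cent: algorithm count over ground-truth count."""
    return 100.0 * actual / expected


if __name__ == "__main__":
    # Totals over the 26 test images, as reported in the text.
    print(round(overall_accuracy(339, 341), 2))    # leukocytes
    print(round(overall_accuracy(8383, 8403), 2))  # erythrocytes
```

With these totals the sketch gives 99.41 per cent for leukocytes and 99.76 per cent for erythrocytes, which matches the figures reported for the two phases.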

Conclusion
The medical domain continues to welcome research that develops methodologies for unmet needs. This proposal addresses the fundamental need to count blood components, which makes it possible to detect a diverse range of diseases. This task is accomplished by developing a well-defined two-phase system that attains high accuracy. In the first phase, the segmentation accuracy for leukocytes is 99.41 per cent, and the second phase registers 99.76 per cent accuracy in segmenting erythrocytes. On certain images, the second phase of the algorithm over-segments because the dataset contains images at two different contrast levels, as they were taken by two different cameras. Over-segmentation falsely identifies other blood components whose radius and diameter overlap with those of erythrocytes. In some locations, many circles coincide. This is because numerous nearby peaks in the Hough space have similar feature values. In the future, these issues can be avoided by enhancing the texture features of the erythrocytes relative to their background and by locating the edge pixels more accurately.

Limitations
The limited availability of open-source blood smear images online restricts the scope for testing the proposed algorithm.