Facial Emotion Identification Based on Local Binary Pattern Feature Detector

Objectives: Emotion detection is an important field in human-computer interaction, and this study addresses the identification of facial expressions from images. Identifying a single emotion requires handling wide variability in human appearance, such as pose, color, texture, expression, posture and orientation. In this study, we implement Local Binary Pattern (LBP) based filters for identifying dynamic face textures. The approach is also simple and extensible. Methods/Statistical Analysis: We used the FER2013 dataset, which consists of seven classes (Surprise, Fear, Angry, Neutral, Sad, Disgust, Happy). The dataset is divided into three parts: testing, validation and training (15% and 70%). A Convolutional Neural Network (CNN) is trained with the LBP feature descriptor. Findings: The experimental results demonstrate that local LBP representations are effective for spatial dynamic feature extraction, as they encode image texture configuration while providing local structure patterns. The advantages of our approach include local processing, robustness to monotonic grayscale changes and simple computation. The results show that the LBP-based CNN model performs better than a conventional CNN. This study can further help work in image classification and image processing. Application/Improvements: LBP is recommended for finding local regions or patterns in an image. Its computation and local processing are simple, and it is robust to monotonic grayscale changes.


Introduction
Facial expression is one of the most important channels of nonverbal communication, and it conveys internal affective states and intentions. Through its varied expressions, the face is one of the predominant means by which people communicate and infer each other's states. Over the last decade, automated expression detection has gained importance in human-computer interaction (e.g. emotion analysis, speaking, chatting and image processing) [1]. Interest in facial expression recognition (FER) systems is growing steadily in artificial intelligence, where the scientific community regards it as an important open problem. FER becomes even harder when expressions must be detected from videos and from distinct dynamic textures in images. A basic automatic facial expression system is divided into three phases [2]: face acquisition/preprocessing, facial feature detection/extraction, and classification of the facial expression. Face acquisition is the basic processing stage, in which the important region is located in the input image. Most practical face detectors use cascade classifiers with Haar-wavelet features [3]. Based on the eye positions and the detected face, the resulting face textures are fairly well balanced. After the face region has been located, the second step is to extract features from the input image to represent the facial expression. This can be done with two families of techniques: a) appearance-based methods, and b) geometric-based methods [2]. The geometric approach describes the locations and shapes of facial components such as the nose, mouth, lips and brows. However, geometric features require accurate and reliable facial feature detection, which is hard to achieve in real-time applications.
More significantly, geometric features usually cannot encode skin texture changes, such as furrows and wrinkles, that are critical for conveying facial expressions. In contrast, the appearance-based approach captures changes in the skin texture of the face, including bulges, wrinkles and furrows. Image filters are applied to the whole face or to important regions of the input image to capture changes in facial appearance, using techniques such as Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA) and Gabor Wavelet Analysis (GWA). However, convolving the face with Gabor filters to extract multi-scale and multi-orientation coefficients is expensive; Gabor filters are inefficient in both time and memory [4-7]. The Local Binary Pattern (LBP) was originally proposed for texture analysis; it is a nonparametric technique for extracting the local features and structure of an input image [4]. Still, no ideal descriptor is available for capturing the appearance of local regions while handling both low and high intra-class variance. The main assets of LBP features are their tolerance to brightness changes and their computational simplicity. In facial expression recognition, LBP has been used successfully for extracting local regions and features from the image [5-7]. The final stage of an automatic emotion recognition system classifies the emotion-based expressions from the extracted features. Different classification techniques, such as Support Vector Machine (SVM) [8], Simple Neural Network (SNN) [9], Hidden Markov Models (HMM) [10], K-Nearest Neighbor (KNN) [11] and rule-based classifiers [12], have been implemented for emotion detection and expression recognition.

LBP
The LBP image filter operator was developed for texture description and feature extraction. The operator assigns a label to each pixel of the image by thresholding its 3 × 3 neighborhood against the center pixel value and reading the result as a binary number (a sequence of 0s and 1s). The histogram of these labels can then be used as a texture descriptor [13].
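The basic 3 × 3 operator can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this study: the function names `lbp_3x3` and `lbp_histogram`, the clockwise neighbor order starting at the top-left pixel, and the strict "greater than" threshold are all assumptions made for the example (many implementations use "greater than or equal").

```python
from collections import Counter

# Neighbor offsets (row, col) in clockwise order starting at the top-left
# pixel. The starting point and bit order are assumptions for this sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_3x3(image):
    """Return LBP codes for the interior pixels of a 2D list of gray values."""
    rows, cols = len(image), len(image[0])
    codes = []
    for y in range(1, rows - 1):
        row_codes = []
        for x in range(1, cols - 1):
            center = image[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                # A neighbor strictly brighter than the center contributes a 1 bit.
                if image[y + dy][x + dx] > center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(codes):
    """Count how often each LBP label occurs; this is the texture descriptor."""
    return Counter(code for row in codes for code in row)


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
codes = lbp_3x3(patch)
print(codes)                 # [[240]]
print(lbp_histogram(codes))  # Counter({240: 1})
```

For the single interior pixel of `patch`, only the four neighbors 7, 8, 9 and 7 exceed the center value 6, so bits 4 to 7 are set and the label is 16 + 32 + 64 + 128 = 240.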
To deal with textures at various scales, the LBP operator is extended to neighborhoods of different sizes [7]. The local neighborhood is defined as a set of sampling points evenly spaced on a circle centered on the pixel to be labeled. Parameterized by the number of sampling points and the radius, the LBP operator serves as a grayscale and rotation invariant texture filter. Bilinear interpolation is used whenever a sampling point does not fall at the center of a pixel. The notation (P, R) denotes the pixel neighborhood, where P is the number of sampling points and R is the radius. Another extension of the original operator is the uniform pattern: an LBP is called uniform if its binary pattern contains at most two bitwise transitions from 0 to 1 or vice versa when the bits are considered in circular form [14]. For example, the patterns 00000000 (0 transitions) and 01110000 (2 transitions) are uniform, whereas 11001001 (4 transitions) and 01010011 (6 transitions) are not. Figure 1 shows the simple LBP feature descriptor operator. With m selected neighbors at a fixed radius, the LBP code for the center pixel t_c is given by Eq. (1). Eq. (1) sets up the LBP thresholding condition, whose values are 0 and 1: U(·) = 0 if t_r ≤ t_s, and U(·) = 1 if t_r > t_s. The output of the LBP filter is a binary label, an m-bit binary number with 2^m distinct values, which reflects the smoothness and texture alignment of the local region. The local binary pattern is computed in the clockwise direction; for example, the bit sequence 11001010, read with the least significant bit first, gives the number 83. The center t_s is assumed to lie at coordinate (0, 0), and the i-th neighbor t_r at (r sin(2πi/m), r cos(2πi/m)), for the hyperparameter set (m, r). Neighbors that do not fall at the center of a pixel are estimated by bilinear interpolation.
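The uniformity test and the worked example with the number 83 can be checked with a short Python sketch. The helper names `bits_to_code`, `circular_transitions` and `is_uniform` are hypothetical, and the sketch assumes that the first character of each bit string is the least significant bit, which is the reading under which 11001010 corresponds to 83.

```python
def bits_to_code(bits):
    """Convert a bit string to an integer, first character = least significant bit."""
    return sum(int(b) << i for i, b in enumerate(bits))


def circular_transitions(bits):
    """Count 0/1 changes between adjacent bits, treating the string as a circle."""
    n = len(bits)
    return sum(bits[i] != bits[(i + 1) % n] for i in range(n))


def is_uniform(bits):
    """A pattern is uniform when it has at most two circular transitions."""
    return circular_transitions(bits) <= 2


print(bits_to_code("11001010"))  # 83
for pattern in ["00000000", "01110000", "11001001", "01010011"]:
    print(pattern, circular_transitions(pattern), is_uniform(pattern))
```

Running the loop gives 0, 2, 4 and 6 transitions for the four example patterns, so only the first two are classified as uniform.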
After the LBP operator code is obtained, its histogram is computed [15]. Figure 2 shows the LBP results for a dynamic texture, with histograms for edge, flat and corner regions. Figure 3 presents the system architecture of the LBP-based Convolutional Neural Network (CNN). Unlike conventional deep learning methods, the CNN combines two types of layers, called the C-layer (convolutional layer) and the S-layer (sub-sampling layer). In addition, a CNN takes a two-dimensional (2D) image as input, which gives it unique advantages in image recognition. As shown in Figure 2, the image data are fed as input to the classic CNN. The input is convolved with various kernels to generate feature maps, which form layer C1. The C1 feature maps are then sub-sampled into layer S1, which reduces their size. The pooling size is usually 2 × 2, and the process is repeated for further layers such as C2 and S2. Once the important features are extracted, the 2D feature maps are flattened into 1D form and passed to a classic neural network. In practical applications, softmax is most often used as the final multi-class classifier. The model takes 64 × 64 grayscale images as input and outputs a single class out of seven, representing happy or one of the other labels. Four convolutional layers of various sizes, C1 to C4, are combined with three MaxPooling layers, P1 to P3. In addition, dropout layers are used for regularization, and the layers are connected to the output through fully connected layers, as shown in Figure 4. The convolutional layer C1 filters the 64 × 64 input image with 32 learnable kernels of size 3 × 3, producing 32 matrices of dimension 62 × 62.
The output of C1 is passed to the MaxPooling layer P1, which applies a 2 × 2 window to each of the 32 matrices. The output of P1 is 32 matrices of dimension 31 × 31. This output is passed to the second convolutional layer C2, which has 32 learnable kernels of size 3 × 3 and produces 32 matrices of size 29 × 29. Then the MaxPooling layer P2 applies a 2 × 2 window to these results, giving 32 matrices of size 14 × 14. Next, layers C3 and C4 apply learnable kernels of size 3 × 3 (32 per layer). The output of C4 is 64 matrices of size 10 × 10, which is passed to P3 with a 2 × 2 window, giving 64 matrices of size 5 × 5. The results are passed through a flatten layer, yielding 1600 values. To process the data further, two fully connected layers are used, the first with 1024 hidden units and the second with 512, and the final output layer generates results for two classes, "happy" or other classes, from the FER2013 dataset.
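The feature-map sizes quoted above can be reproduced with simple arithmetic. The sketch below assumes unpadded ("valid") convolutions with stride 1 and non-overlapping 2 × 2 pooling with floor rounding; these settings are not stated explicitly in the text but are the ones that yield the reported sizes. The helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Side length after an unpadded convolution with stride 1 (assumed)."""
    return size - kernel + 1


def pool_out(size, window=2):
    """Side length after non-overlapping max pooling (floor division, assumed)."""
    return size // window


def trace_shapes(input_size=64):
    """Follow the side length of the feature maps through C1..C4 and P1..P3."""
    layers = [
        ("C1", conv_out), ("P1", pool_out),
        ("C2", conv_out), ("P2", pool_out),
        ("C3", conv_out), ("C4", conv_out),
        ("P3", pool_out),
    ]
    size = input_size
    shapes = {}
    for name, layer_fn in layers:
        size = layer_fn(size)
        shapes[name] = size
    return shapes


shapes = trace_shapes()
print(shapes)
# {'C1': 62, 'P1': 31, 'C2': 29, 'P2': 14, 'C3': 12, 'C4': 10, 'P3': 5}

# 64 feature maps of size 5 x 5 are flattened into a single vector.
flattened = 64 * shapes["P3"] ** 2
print(flattened)  # 1600
```

The flattened length of 1600 only matches if the last convolutional stage outputs 64 feature maps, since 64 × 5 × 5 = 1600.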

Results and Discussion
In this section, the effectiveness of the proposed approach is evaluated on the publicly available FER2013 facial expression dataset. The results indicate that modern deep learning models have the potential to improve the performance of facial expression recognition. Our proposed LBP-based CNN model performed well compared with other feature extraction operators combined with a CNN. By combining the LBP feature extraction operator with a CNN, we achieved 74% testing accuracy, which is better than HOGCNN [16]. In addition, the LBP-based architecture obtains good results when additional training samples or features are used. We expect that comprehensive data augmentation and FER-specific face registration techniques would further improve the results. Table 1 summarizes the results of the proposed LBPCNN model on the FER2013 database. Figure 4 shows the training and testing accuracy, 98% and 74% respectively; similarly, Figure 5 shows the training and testing losses, 0.05 and 2.01 respectively.
Table 2 summarizes the results of the proposed FER-LBPCNN system with various hyperparameters. Results were obtained for combinations of epochs (15 to 50) and batch sizes (8 to 1024). With a batch size of 32 and 25 epochs, our model reaches state-of-the-art performance. Figure 6 depicts the confusion matrix with the predefined labels. This evaluation method analyzes the performance of the model in more detail in terms of true and false positives and true and false negatives. The matrix shows the results label by label; the label "Happy" has 541 correct predictions, which is better than the other six labels. Figure 7 shows the classification report, in which precision and recall are computed per category; the report summarizes an accuracy of 74%, a macro average of 73% and a weighted average of 74%.
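To make the relationship between the confusion matrix and the classification report concrete, the following Python sketch derives per-class precision and recall, overall accuracy, and the macro and weighted averages from a confusion matrix. The 3 × 3 matrix is a made-up toy example, not data from this study, and the function name `classification_summary` is hypothetical.

```python
def classification_summary(matrix):
    """Summarize a confusion matrix where matrix[i][j] counts samples with
    true label i that were predicted as label j."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    precision, recall, support = [], [], []
    for k in range(n):
        true_positive = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision.append(true_positive / predicted_k if predicted_k else 0.0)
        recall.append(true_positive / actual_k if actual_k else 0.0)
        support.append(actual_k)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": sum(matrix[k][k] for k in range(n)) / total,
        # Macro average: every class counts equally.
        "macro_precision": sum(precision) / n,
        # Weighted average: classes count in proportion to their support.
        "weighted_precision": sum(p * s for p, s in zip(precision, support)) / total,
    }


# Hypothetical three-class confusion matrix, for illustration only.
toy_matrix = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 2, 8],
]
summary = classification_summary(toy_matrix)
print(round(summary["accuracy"], 4))
print([round(r, 2) for r in summary["recall"]])
```

The macro and weighted averages differ only when the classes have unequal support, which is why both are reported for an imbalanced dataset.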

Conclusion
This work proposed a novel, automated and effective Facial Expression Recognition (FER) system. The FER system divides a facial image into small regions and uses the local binary pattern to compute the features and description of each region. In this study, we conducted experiments on the FER2013 dataset, which is publicly available and is the most common image database for CNN-based FER. The dataset is divided into three parts, training, testing and validation, with 35089, 3589 and 3589 samples respectively. Evaluation metrics, namely accuracy, the confusion matrix and the classification report, are used to analyze the system's performance in different ways. The FER-LBPCNN training and testing accuracies were 98% and 73%, respectively. Similarly, the losses were 0.04 and 2.00. The accuracy of FER-LBPCNN was found to be better than that of Histogram of Oriented Gradients (HOG) based and traditional CNN models. We have identified the existing issues and bottlenecks. Our FER system is computationally simple and robust to rotation variations and grayscale changes, which makes it very promising for real applications.

Future Work
In future work, more advanced feature descriptor filters will be implemented for finding the weights of facial images divided into local regions, and temporal information will be explored.