Attention Balanced Multi-Dimension Multi-Task Deep Learning for Alopecia Recognition

Objective: To increase the accuracy of Alopecia Areata (AA) classification by learning local and global features across AA images and scalp hair images. Methods: An Attention-based Balanced Multi-Task Deep (AB-MTDeep) learning system is proposed. In this system, the MTDeep model incorporates both Multi-Task Learning (MTL) and Cross-Residual Learning (CRL) to train simultaneously on hair and scalp images for recognizing AA conditions. For MTL, a new shared encoder is added to the MTDeep model, whereas for CRL, cross-residual layers are added to improve the model's efficiency. Under this scheme, local and global features are learned at multiple scales and concatenated to obtain the cross-feature representation. These features are then classified by a softmax classifier to recognize AA conditions. Findings: Test outcomes demonstrate that the AB-MTDeep system achieves an accuracy of 95.11% on the hair and scalp image databases, outperforming all other classical systems. Novelty: This model considerably increases the accuracy of classifying AA conditions and thus represents a promising classifier for AA classification.


Introduction
Many researchers have determined the variations in scalp conditions, yet in a few scenarios, they are difficult to find (1). The deep learning-based intelligent scalp recognition system known as ScalpEye (2) elucidates how to automatically interpret scalp-hair microscopic images to evaluate the condition of the patient's scalp. The developed system is used for recognizing and diagnosing common scalp issues like psoriasis, cellulitis, baldness, and oily hair. This system includes a mobile application, a web-enabled deep learning engine, a web-enabled management system, and a compact scalp microscope. The deep learning engine used to detect various kinds of scalp issues is the FRCNN with the Inception_ResNet_v2_Atrous structure. AA is one of many scalp conditions, with a lifetime prevalence of about 2% (3) and a distinctive feature of rapid onset of non-scarring alopecia in typically unaffected regions (4). AA is a common cause of hair loss, and numerous scalp and dermoscopic pictures have been used to identify and diagnose it. Trichoscopy and biopsies are often required (5), but the question of how many tests are necessary for a reliable diagnosis is one of the drawbacks. To solve this issue, deep learning algorithms (6)(7)(8) are needed to accurately detect both AA and scalp conditions in dermatology and trichology.
Therefore, an AB-MTDeep learning system based on both MTL and CRL is developed in this study for diagnosing AA and scalp problems in a large number of people with various types of baldness. The Multi-Task Deep (MTDeep) model, which includes the LSTM and the Faster Residual Convolutional Neural Network (FRCNN), is used to extract global and local features at various scales from the scalp and AA hair images. The final feature vector representation is then obtained by merging these features. Further, an expansion of residual learning is adopted for the MTL, which allows intuitive training across various associated tasks such as AA recognition with the help of cross-links, namely cross-residuals. Such cross-residuals allow for stronger network generalization and are considered a type of in-network normalization. In residual learning, identity mapping using bypass links is introduced to learn the actual mapping. Finally, the Fully Connected (FC) layer, followed by the softmax classifier, is trained to classify the AA conditions. To summarize, the major contributions are (i) the adaptation of a new expansion of residual learning by cross-links for pairing many associated tasks, a setting named cross-residual learning; (ii) the design of an MTDeep model with a fan-out structure using cross-residual layers; and (iii) an analysis of the AB-MTDeep network on AA recognition challenges, providing a unified system with higher precision.
Thus, the MTDeep system can be extended by cross-residual learning, and the data from different associated tasks can be merged to get the cross-task representations.

Recent Works
A Deep Neural Network (DNN) (9) was presented for determining the Severity of Alopecia Tool (SALT) score and identifying the affected scalp areas in AA patients. A computational technique based on texture examination was presented to obtain AA lesions by evaluating pre-processed scalp photos. However, the photos were captured at a particular tertiary institution, which leads to limited generalizability. Also, the training was time-consuming and sensitive to artifacts induced by improper exposure.
A pre-trained classification of scalp diseases (10) was developed using image processing techniques. The scalp photos were initially gathered and prepared. The Region-Of-Interest (ROI) was then identified from all of the photos using various attributes like shape, color, and texture. During the categorization, the pre-trained characteristics served as a reference, and the scalp conditions were categorized using the SVM. However, it was time-consuming and suitable only for a limited dataset.
Based on a lightweight CNN model, the Hair Diagnosis MobileNet (HDM-Net) method (11) was developed, which was quick and easy to use for determining the degree of hair damage. The HDM-Net was used to find and choose the features, which were then fed to an SVM to classify photos of hair damage. Although it decreased the number of parameters, its precision was insufficient.
The trichoscopic characteristics of Female Pattern Hair Loss (FPHL) in Chinese Han patients (12) were investigated, and the variance between male and female patients with FPHL was analyzed. First, trichoscopy photos were collected from various scalp regions. Then, hair density, hair shaft diameter, vellus hair percentage, and hair follicle unit percentage were computed and examined manually. However, with insufficient data, the findings may not generalize to healthy people.
A methodology (13) was created for classifying healthy hair and AA. First, pre-processed hair photos of healthy and AA conditions were gathered and segmented. Then, different features were retrieved from each segment, including texture, shape, and color. Additionally, those features were classified as healthy or AA using SVM and KNN classifiers. However, the accuracy of these classifiers was not effective for a large number of samples.

A deep learning model (14) was developed for automated trichoscopy scan evaluation, along with a quantitative framework to categorize male androgenetic alopecia. First, trichoscopy scans were obtained, and a deep learner was constructed based on a CNN. Then, the relationships between fundamental and detailed categorization were examined, and a quantitative framework was applied to predict fundamental and detailed categorization through multiple ordinal logistic regressions. However, the dataset was limited, and most of the participants were Chinese; images from distinct areas were essential to enhance the framework.
With a DL model, the effectiveness of scalp density estimation was examined (15). First, patients with male alopecia provided RGB images of the scalp. The position data for the hair follicles and data on how to sort hair follicles based on the number of hairs were then obtained, along with the accompanying labeled image. The groups of hair follicles in those photos were also identified by the detection algorithms EfficientDet, YOLOv4, and DetectoRS. However, because the photos of Group 3 contain traits similar to those in the other groups, the effectiveness of all these detectors for the hair follicles in Group 3 was decreased. Furthermore, class imbalance existed between Group 3 and the other groups.
https://www.indjst.org/
A deep learning-based intelligent scalp diagnosis and classification system called AI-ScalpGrader (16) was developed using EfficientNet to diagnose and categorize scalp conditions. However, it achieved accuracy values of only 87.3% to 91.3%. A 2D CNN framework (17) was developed to predict different kinds of hair loss and scalp-related diseases. However, the drawbacks of this framework were the unavailability of a proper dataset and the lack of variety among the images distributed over the internet.

Methodology
In this section, the proposed AB-MTDeep learning system is explained briefly. A pipeline of the presented study is shown schematically in Figure 1. First, healthy and AA hair and scalp photos are obtained from freely accessible websites. The MTDeep model is then trained using those photos based on the MTL and CRL. Moreover, the learned model is utilized to classify the test samples into healthy and various AA states.

Dataset Description
In this system, the following two openly accessible datasets are considered for analysis:
• Figaro1k dataset: a public dataset containing 1050 photos of hair, evenly allocated into distinct types such as straight, wavy, and curly. It is available at http://projects.i-ctm.eu/it/progetto/figaro-1k. Of these, 350 photos of normal hair are considered for this study.
• Dermnet dataset: a public dataset accessible on Dermnet, containing 23 types of dermatological illnesses including AA. Overall, 1050 photos (350 from each AA type) are obtained for 3 distinct AA types: mild, moderate, and severe. It is available at http://www.dermnet.com/dermatology-pictures-skin-disease-pictures.
The hair and scalp photos in these datasets are fed to the AB-MTDeep system for AA classification. Figure 2 gives a few examples of scalp hair images.

Cross Residual Learning for Multitask Learning
For an input image x and output vector y of a residual learning layer, with the mapping F(x, {W_i}) to fit, residual learning is defined as:

y = F(x, {W_i}) + x (1)

In Equation (1), the identity bypass x lets the weight layers fit only the residual of the desired mapping. This is extended across associated training tasks, called CRL. For a task t and N − 1 other associated tasks, the task output of the cross-residual block is defined as:

y^(t) = F^(t)(x, {W_i^(t)}) + x + Σ_{j≠t} W_s^(j) x^(j) (2)

In Equation (2), the superscript (t) denotes the target task, and a regularization factor is neglected for simplicity since it can be combined with the bypass weights W_s^(j). As depicted in Figure 3(b), the other tasks additively contribute to the present target task t through Σ_{j≠t} W_s^(j) x^(j).
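The two definitions above, Equation (1) and Equation (2), can be sketched numerically. This is a minimal sketch: the single ReLU weight layer standing in for F, and the names D, n_tasks, W_res, and W_cross, are illustrative assumptions rather than the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D, n_tasks = 8, 3                      # feature size and number of associated tasks

def relu(z):
    return np.maximum(z, 0.0)

def residual(x, W):
    # Plain residual layer, Equation (1): y = F(x, {W_i}) + x
    return relu(W @ x) + x

def cross_residual(t, xs, W_res, W_cross):
    # Cross-residual output for task t, Equation (2):
    # y^(t) = F^(t)(x, {W_i^(t)}) + x + sum over j != t of W_s^(j) x^(j)
    y = relu(W_res[t] @ xs[t]) + xs[t]          # task t's own residual path
    for j in range(len(xs)):
        if j != t:
            y = y + W_cross[j] @ xs[j]          # additive cross-task contribution
    return y

xs = [rng.standard_normal(D) for _ in range(n_tasks)]          # one input per task
W_res = [0.1 * rng.standard_normal((D, D)) for _ in range(n_tasks)]
W_cross = [0.1 * rng.standard_normal((D, D)) for _ in range(n_tasks)]
y0 = cross_residual(0, xs, W_res, W_cross)
```

Setting every cross weight W_s^(j) to zero reduces the block to the ordinary residual layer of Equation (1), which is the sense in which CRL extends residual learning.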

Early Normalization Interpretation
In optimization, while minimizing a loss L(f(x), y), a normalization term R(f(x)) is added to limit the falseness of the solution, factor in hypotheses about the model, and avoid overfitting. For instance, in training FRCNN-LSTM networks, the squared 2-norm is a standard option to penalize large variable ranges and smooth the network mappings. Cross-residual elements are considered a method of normalizing the result of a particular task (AA recognition) by other associated tasks (AA type classification), i.e., the trained mapping F need not be too far from a weighted mixture of task-specialized conversions of the input, Σ_j W_s^(j) x. Typically, such normalization occurs in the loss unit of the FRCNN-LSTM; the cross-residual layers instead introduce it earlier in the network, and those layers can be stacked for extra data combinations. A cross-residual layer adopts normalization via biasing at the layer level, i.e., by relating to a considered task's residual instead of the final loss, in contrast to the usual normalization. Thus, cross-residual layers act as an in-network normalization technique with less stochasticity.
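The normalization reading above can be made concrete with a small sketch: a data loss plus a penalty that keeps the task mapping F(x) close to the weighted mixture Σ_j W_s^(j) x. The squared-error data term, the penalty weight lam, and the function name regularized_loss are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def regularized_loss(F_x, y, mixes, lam=0.1):
    """Loss L(f(x), y) plus a penalty R(f(x)) tying F(x) to sum_j W_s^(j) x."""
    data_term = np.sum((F_x - y) ** 2)        # L(f(x), y), squared error here
    mixture = sum(mixes)                      # weighted mixture of task transforms
    penalty = np.sum((F_x - mixture) ** 2)    # distance from the task mixture
    return data_term + lam * penalty
```

When F(x) already matches the mixture, the penalty vanishes and only the data term remains, which is the "in-network normalization" effect described above realized at the loss level for comparison.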

Link to Residual Networks & Multitask Learning
The residual networks are considered bridge networks without change or carry gates. In bridge networks, an output bridge layer is described by:

y = H(x, W_H) · T(x, W_T) + x · C(x, W_C) (3)

In Equation (3), T and C are the change and carry gates, correspondingly. If both gates are fully open, this is exactly a residual layer. By extension, a cross-residual layer is considered an ungated bridge layer with many bridges integrating onto an equal data link, where the cross-residual weighting units carry gates that direct the amount of cross-task pollination. Likewise, it is argued that residual layers can also be considered as MTDeep training units without gates. In the case of the long short-term memory network,

g_t = σ(W_g x_t + U_g h_{t−1}), g ∈ {i, f, o} (4)
C_t = f_t ⊙ C_{t−1} + i_t ⊙ H(W_C x_t + U_C h_{t−1}) (5)
h_t = o_t ⊙ s(C_t) (6)

In Equations (4), (5), and (6), t is the time step; i, f, and o are the input, forget, and output gates; C and h are the cell and output states, correspondingly; s is the output activation; and peephole links and bias terms are neglected for simplicity. By discarding the recurrent links h_{t−1} for the feed-forward case, making the long short-term memory fully ungated, i.e., i = f = o = 1, taking s as the identity, and initializing the cell state to the input C_{t−1} = x, exactly a residual layer is obtained. By extension, cross-residual layers are then considered feed-forward, ungated networks with additively coupled cell states, and cross-residual weight layers are comparable to the forget gates. Because the bridge layers are regarded as feed-forward long short-term memory with only forget gates, this is very similar to the carry gate of bridge networks. A key variation is that cross-residual layers merge the converted input H with many and normally distinct prior cell states C^(k)_{t−1} or data bridges a^(k).
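The reduction described above can be checked directly: with the gates fully open, the recurrence dropped, the output activation taken as the identity, and the cell state initialized to the input, the ungated cell is exactly a residual layer. The tanh weight layer standing in for the cell update H is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 6
W = 0.5 * rng.standard_normal((D, D))
x = rng.standard_normal(D)

def H(v):
    return np.tanh(W @ v)        # candidate cell update standing in for H

# Feed-forward LSTM cell with peephole links and biases dropped:
i = f = o = np.ones(D)           # gates fully open, i = f = o = 1
C_prev = x                       # cell state initialized to the input, C_{t-1} = x
C = f * C_prev + i * H(x)        # cell update
h = o * C                        # output state, with identity output activation
```

The output h equals H(x) + x, the residual layer form, confirming the ungated-LSTM reading of residual learning.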

Multi-Task Cross Residual Learning-based AA Recognition model
In this multi-task cross-residual network, additional cross-task integration is allowed through the cross-residual weights, which improves late-layer representational efficiency without demanding large input feature spaces. Also, a few task specializations, such as an attention strategy, are adopted in the cross-residual layers to create a moderately distinct regularization for all task branches and increase classification efficiency.

Attention Strategy
The attention strategy is added to the MTDeep model by altering the normalization step in the activation function before the softmax output. Two kinds of attention are utilized: channel attention and spatial attention. Channel attention A_1 executes l_2-normalization within each channel over all spatial coordinates to eliminate spatial information. Spatial attention A_2 executes normalization within the feature map across all channels and then applies a sigmoid to obtain spatial information:

A_1(x)_{i,c} = x_{i,c} / sqrt(Σ_i x_{i,c}²) (7)
A_2(x)_i = sigmoid((1/C) Σ_c (x_{i,c} − μ_c) / σ_c) (8)

In Equation (7) and Equation (8), i ranges over the spatial coordinates, c ranges over the channels, μ_c is the mean of the feature map from the c-th channel, σ_c is the standard deviation of the feature map from the c-th channel, and x_i is the feature vector at the i-th spatial coordinate. The attention-based cross-residual learning can refine the feature maps and suppress noise from the scalp hair features while preserving significant details. Thus, the AB-MTDeep model is designed for AA recognition using the CRL and MTL strategies, as shown in Figure 4. Given hair and scalp images, the encoders with residual links are trained independently to get global and local feature representations. Such features are merged to obtain a final feature representation, which is further passed to the convolution unit, which serves as a multi-task cross-residual network. The given feature vector is learned based on multi-task cross-residual learning to create the trained AB-MTDeep system. The trained AB-MTDeep network is then used to test the unlabeled hair and scalp samples, resulting in the identification of AA conditions.
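A minimal sketch of the two normalizations on a feature map of shape (spatial locations, channels): averaging over channels inside A_2 and the small eps stabilizers are assumptions, since the exact pooling is not spelled out above.

```python
import numpy as np

def channel_attention(x, eps=1e-8):
    # A_1: l2-normalize each channel over all spatial coordinates
    return x / (np.sqrt((x ** 2).sum(axis=0, keepdims=True)) + eps)

def spatial_attention(x, eps=1e-8):
    # A_2: normalize with the per-channel mean and standard deviation,
    # combine channels per location, then apply a sigmoid
    mu = x.mean(axis=0, keepdims=True)       # mu_c
    sigma = x.std(axis=0, keepdims=True)     # sigma_c
    z = (x - mu) / (sigma + eps)
    s = z.mean(axis=1)                       # one score per spatial coordinate
    return 1.0 / (1.0 + np.exp(-s))

x = np.random.default_rng(2).standard_normal((16, 4))   # 16 locations, 4 channels
a1 = channel_attention(x)
a2 = spatial_attention(x)
```

After A_1 each channel has unit l_2 norm (spatial magnitude information removed), while A_2 yields one value in (0, 1) per spatial location, usable as a spatial attention map.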

Results and Discussion
This section examines the efficacy of the AB-MTDeep system by implementing it in MATLAB 2017b using the Figaro1k and DermNet databases (discussed in Section 3.1). In this analysis, a total of 1400 photos (350 from the Figaro1k and 1050 from the DermNet databases) are used. Of these, 1120 photos (280 from Figaro1k (i.e., the normal hair class) and 840 from DermNet (i.e., 280 mild, 280 moderate, and 280 severe AA)) are applied for training. Similarly, 280 photos (70 from Figaro1k (i.e., normal hair) and 210 from DermNet (70 mild, 70 moderate, and 70 severe AA)) are applied for testing. The network is trained with stochastic gradient descent using a batch size of 25, a momentum of 0.9, a weight decay of 0.0001, and a learning rate of 0.001. The AB-MTDeep system performance is compared with the techniques discussed in the literature, including ScalpEye (2), DNN (9), SVM (10), KNN (13), YOLOv4 (15), and AI-ScalpGrader (16), regarding the following metrics:

• Accuracy: the percentage of accurate recognitions among all tested photos:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (9)

In Equation (9), TP is the quantity of normal samples exactly classified as normal, TN is the quantity of AA samples exactly classified as AA, FP is the quantity of AA samples inexactly classified as normal, and FN is the quantity of normal samples inexactly classified as AA.

• Precision: calculated by Equation (10):

Precision = TP / (TP + FP) (10)

• Recall: computed by Equation (11):

Recall = TP / (TP + FN) (11)

• F-measure: determined as Equation (12):

F-measure = (2 × Precision × Recall) / (Precision + Recall) (12)

Table 1 presents the confusion matrix, which shows the outcomes for each class: normal, mild AA, moderate AA, and severe AA. Figure 5 depicts the efficiency of various AA classification systems in terms of precision, recall, and f-measure.
It indicates that the precision of the AB-MTDeep system is 32.3% larger than the KNN, 21.9% larger than the SVM, 18.3% larger than the DNN, 13.3% larger than the YOLOv4, 5.5% larger than the ScalpEye, and 3.9% larger than the AI-ScalpGrader systems. The recall of the AB-MTDeep system is 32.5% larger than the KNN, 22.1% larger than the SVM, 18.8% larger than the DNN, 13.9% larger than the YOLOv4, 6.5% larger than the ScalpEye, and 5.6% larger than the AI-ScalpGrader systems. Similarly, the f-measure of the AB-MTDeep system is 32.4% higher than the KNN, 22% higher than the SVM, 18.6% higher than the DNN, 13.6% higher than the YOLOv4, 6% higher than the ScalpEye, and 4.8% higher than the AI-ScalpGrader systems. Figure 6 portrays the accuracy (in %) achieved by various AA classification systems. It is observed that the accuracy of the AB-MTDeep system is 32% higher than the KNN, 21.4% higher than the SVM, 17.9% higher than the DNN, 12.8% higher than the YOLOv4, 6.5% higher than the ScalpEye, and 4.7% higher than the AI-ScalpGrader systems. This is due to the adaptation of both MTL and CRL for the deep learning framework, which learns both local and global characteristics from the AA hair and scalp images at multiple scales. Table 2 presents the performance analysis of the original FRCNN-LSTM and the AB-MTDeep learning. It is noticed that the proposed CRL and MTL for the MTDeep learning system outperform the actual network for AA classification.
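The four metrics of Equations (9)-(12) can be computed directly from the binary confusion counts; the counts below are illustrative only, not the paper's confusion matrix.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F-measure from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)                  # Equation (9)
    precision = tp / (tp + fp)                                  # Equation (10)
    recall = tp / (tp + fn)                                     # Equation (11)
    f_measure = 2 * precision * recall / (precision + recall)   # Equation (12)
    return accuracy, precision, recall, f_measure

# Hypothetical counts for a normal-vs-AA split of 280 test photos:
acc, p, r, f = metrics(tp=60, tn=200, fp=10, fn=10)
```

For the four-class setting (normal, mild, moderate, severe AA), the same formulas apply per class in a one-vs-rest fashion over the confusion matrix of Table 1.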

Conclusion
An AB-MTDeep learning system was developed in this study that uses multi-task cross-residual learning to improve AA recognition. This system achieved promising results with an accuracy of 95.11%. This system could also be utilized in other kinds of disease recognition problems. The multi-task cross-residual learning enhanced the performance of the MTDeep system when using large-scale datasets. Though it recognizes the different AA conditions, its generalization relies on the number of training images, so an adequate number of images is necessary for training. Future work will therefore focus on adopting adversarial networks to generate more training images of scalp and AA hair conditions.