Detecting Multi-label emotions from code-mixed Facebook Status Updates

Objectives: With the growth of social media and the increasing use of English-Hindi code-mixing (Hinglish) in linguistically diverse countries such as India, analysing Hinglish content on social media platforms like Facebook is becoming increasingly important. Prior sentiment and emotion analyses have focused only on single-label classification, ignoring the possibility of coexisting emotions within one instance. By analysing code-mixed Facebook status updates, this study investigates multiple emotions. Method: 15,995 English-Hindi code-mixed Facebook status updates are annotated with the emotions joy, sadness, anger, fear, trust, disgust, surprise, anticipation, and love. Different pre-processing techniques are used to normalize the noisy data and produce more accurate results. We apply five different multi-label classification algorithms with word-level and character n-gram approaches to find the best classification results. Findings: The results of the experiments indicate that a status update can evoke multiple emotions rather than just one. Precision, recall, F1 score, and accuracy with both micro and macro averaging are used to evaluate the performance of the different classifiers. Compared to the other classification algorithms, the Classifier Chains algorithm with the 2-6-gram approach has the highest accuracy of 86% with a precision of 0.98. Classifier Chains offered better results than the other classifiers because of its ability to consider the correlations between class labels. Applications: The article focuses on the multi-label emotion classification task, which examines whether a Facebook status update shows none, one, or more of the nine emotions outlined by Plutchik's wheel of emotions. Knowing the emotions of a text can support decision-making processes in various ways.


Introduction
The initiative of the Indian government towards digitalization and the spread of the Internet in recent years has resulted in massive growth in the digital population of India, which is predicted to cross the figure of 650 million Internet users by the year 2023 (1). Nowadays, social media is the buzzword amongst youngsters. According to a survey conducted by Pew Research Centre (2), about 95% of youngsters have smartphones and 45% of them are found to be constantly online. The availability of smartphones and Internet connectivity at a cheaper cost has led to the estimate that more than 3 billion people will be using social media across the world in the near future. The widespread use of social media has resulted in the generation of an enormous amount of social media text. The text written by people of different strata belonging to varied cultural backgrounds has given birth to a very informal way of textual communication with diverse linguistic distinctions. In a country like India, where people belong to multilingual societies, informal communication is mainly a mix of transliterated Hindi (Roman Hindi) with English, or of English with regional languages, called code-mixing. In India, most social media texts are written using code-mixing, most commonly a mix of the English and Hindi languages.
More than 2.6 billion people (3) use Facebook each month, making it the most popular social network in the world. It still appeals to young people from low-income families. Statistics published by Statista (4) indicate that India has more than 290 million Facebook users, followed by the United States with 180 million. Several things can be done on Facebook, including creating profiles, uploading photos and videos, sending messages, and keeping in touch with friends, family, and colleagues. People use Facebook to communicate with one another in an informal and fast manner. According to Facebook surveys, approximately 83% of women and 75% of men among Internet users are Facebook users. Facebook users are also estimated to have 155 friends on average. According to Facebook usage statistics by age group, 62% of people aged 65+ and 72% of people aged 50-64 are active Facebook users. A whopping 88% of Internet users aged 18-29 use Facebook, along with 84% of users aged 30-49. Facebook is used by 82% of college graduates. Having reviewed these statistics, we identified Facebook as a platform on which a wide variety of emotions is expressed in status updates.
People express their views and opinions in the form of emotions. Facebook has revolutionized the way people communicate with each other. It channelizes the opinions, facts, and emotions expressed by people on different topics in the form of text messages. Emotion analysis focuses on detecting the polarity (positive or negative) and the emotional state such as joy, anger, sadness, etc. (5). Earlier work on sentiment analysis and emotion analysis has mainly classified text into positive, negative, or neutral sentiments (6), or identified a single emotion from the text data. In multi-label emotion classification, on the other hand, a piece of text may be associated with all the emotions, a subset of emotions, or no emotions (7)(8)(9). Hence, in this article, we focus on the multi-label emotion classification task, which aims to identify the presence of all, some, or no emotions. In this study, we use eight primary emotions (joy, sadness, anger, fear, trust, disgust, surprise, and anticipation) and one secondary emotion (love) as described in Plutchik's wheel of emotions (10). [Figure 1]
There have been several early approaches to the analysis of code-mixed data, mostly focusing on pre-processing (11)(12)(13)(14), language identification (15)(16)(17)(18), lexicon building (19), sentiment classification (20)(21)(22), and subjectivity analysis (23). Traditional algorithms are built to work on single-label classification problems. Hence, in many prior approaches, the multi-label problem is divided into multiple single-label problems so that existing single-label algorithms can be applied. The emphasis of such classification is on obtaining the single-label classifiers' predictions individually, which are then transformed into multi-label predictions. The most natural approaches, called one-vs-rest and binary relevance, transform a multi-label problem into multiple individual binary classification problems, one problem for each label. Then, an independent binary classifier is trained to predict the relevance of each label (24,25). These are the simplest approaches to producing multi-label output. However, neither approach considers the possible correlations between class labels. This creates a risk of overfitting to label combinations, because the association between previously observed combinations of labels is not taken into account.
In this article, other classification models, namely Classifier Chains, Label Powerset, and a k-Nearest Neighbor (kNN) adaptive algorithm using a lazy learning approach named Multi-Label k-Nearest Neighbor (ML-kNN), are presented. All of these algorithms consider the correlation between the underlying labels. Classification of Facebook status updates with the given features shows that Classifier Chains achieved better performance than the other experimented models.
The contributions of this work are summarised as follows.
• Application of the multi-label classification paradigm in the emotion analysis domain.
• A real dataset composed of Facebook status updates labelled with nine emotions.
• Creation of a Hindi-English emotion lexicon of 67,000 words labelled with nine emotions.
• Creation of a lexicon of 3,000 emojis labelled with nine emotions.
To recognize the coexistence of different emotions, the proposed architecture uses a dataset of 15,995 code-mixed Facebook status updates, and different algorithms are used to detect the coexisting emotions. Based on the results of the experiments, we found that our system surpasses the state-of-the-art systems. The rest of the article is arranged as follows. In Section 2, we overview the related work on multi-label problem transformation methods and emotion analysis of Facebook status updates. Section 3 describes how the exploratory analysis was conducted. Section 4 provides a detailed explanation of the methodology. In Section 5, we report the experimental results and the evaluation of the algorithms, followed by the conclusion.

Exploring the Dataset
A total of 32,693 English-Hindi code-mixed Facebook status updates of 80 people were collected. The collected status updates are in JSON format, which contains all the information such as timestamps, posts, titles, tags, and attachments. We used the Python programming language for implementing the system. The collected status updates are stored together in a CSV file. Extensive semi-automated processing was carried out to remove all the noisy status updates. In the annotation phase, we further removed all those status updates which were not expressing any emotions. The final corpus contains 15,996 Facebook status updates with 99,066 words. Among the 15,996 Facebook status updates, 321 status updates were in Devanagari script, 14,645 in Roman script, and 1,002 in mixed script. Dataset annotation was carried out to segregate the status updates depending upon the language used in them: each status update was assigned a label for its source language. In the first phase, the database was divided using two kinds of labels, namely 'eng' and 'hin'. S1: "happy birthday dear wishing you a magical birthday". The 'eng' label was assigned to status updates written in English-language vocabulary, such as "magical" and "happy" in S1.
S2: " " The 'hin' label was assigned to status updates written in Devanagari Hindi vocabulary, such as " " and " " in S2.
S3: "happy birthday dubey ji kamyab our khush rahiye" The 'mix' label was assigned to status updates written in Roman (transliterated) Hindi, such as "khush" and "kamyab" in S3. Emotion annotation of the status updates was done with eight primary emotions, namely Joy, Trust, Anticipation, Sadness, Anger, Fear, Disgust, and Surprise, and one secondary emotion, Love, separately for all three kinds of labels. The initial exploratory data analysis of the total number of posts in each emotion category and of posts with multiple emotions reveals the facts about the corpus depicted in [Figures 2 and 3].
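As a sketch of this exploratory step, the per-emotion counts and label-cardinality counts behind such figures can be computed with simple counters; the annotations below are toy data, not the actual corpus:

```python
from collections import Counter

# Toy annotated sample; each status update carries a set of emotion labels.
annotations = [
    {"joy", "love"}, {"sadness"}, {"joy"}, {"anger", "disgust"}, {"joy", "trust"},
]

# Posts per emotion category (as in a per-label count plot).
per_emotion = Counter(label for labels in annotations for label in labels)
# Posts per number of labels (as in a multi-label count plot).
per_count = Counter(len(labels) for labels in annotations)

print(per_emotion["joy"])   # posts tagged joy
print(per_count[2])         # posts carrying exactly two labels
```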

System Architecture
After annotating the corpus, we try to detect emotions in all three kinds of noisy status updates. Thus, a data preparation task was employed to clean the data, followed by feature identification and extraction, and finally the classification of emotions as joy, trust, anticipation, surprise, disgust, sadness, anger, fear, and love. The steps are described in sequential order below.

Pre-processing of the noisy Status Updates
Noisy status updates: The unstructured status updates containing special characters, URLs, blank lines, multiple spaces, repeated characters, punctuation, and numbers were all cleaned up in a pre-processing step.
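A minimal sketch of this cleaning step, assuming simple regular-expression rules (the rules below are illustrative; the actual pipeline may differ):

```python
import re

def clean_status(text):
    """Remove URLs, repeated characters, punctuation, numbers, and extra whitespace."""
    text = re.sub(r"https?://\S+", " ", text)   # strip URLs
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # collapse repeats: "sooooo" -> "soo"
    text = re.sub(r"[^\w\s]", " ", text)        # drop punctuation/special characters
    text = re.sub(r"\d+", " ", text)            # drop numbers
    text = re.sub(r"\s+", " ", text).strip()    # collapse blank lines/multiple spaces
    return text.lower()

print(clean_status("Sooooo happy!!! visit http://example.com 123"))
```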
Conversion of Emoticons and Emojis to Words: On social media, people prefer to converse or express their feelings in the form of emoticons and emojis alongside text. In emotion analysis these carry valuable information, so removing them might mean losing important information related to the underlying emotions. Hence, they are converted to their text equivalents.
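This conversion can be sketched with a small substitution table; the map below is illustrative only (the study's emoji lexicon contains about 3,000 entries):

```python
# Illustrative emoji/emoticon-to-word map (stand-in for the full emoji lexicon).
EMOJI_MAP = {
    "😀": "joy", "😢": "sadness", "😡": "anger", ":)": "joy", ":(": "sadness",
}

def replace_emojis(text):
    """Replace each known emoji/emoticon with its text equivalent."""
    for symbol, word in EMOJI_MAP.items():
        text = text.replace(symbol, " " + word + " ")
    return " ".join(text.split())   # normalise whitespace afterwards

print(replace_emojis("party tonight 😀 :)"))
```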
Stop words: The stop-word corpus obtained from NLTK was used to eliminate the most unproductive words, which provide little information about individual status updates. While doing so, negation words like not and never are preserved.
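A sketch of stop-word removal that preserves negation words; the stop-word set here is a small illustrative subset of the NLTK list:

```python
# Small illustrative stop-word subset; the study uses the full NLTK list.
STOP_WORDS = {"the", "is", "a", "an", "of", "to", "not", "never", "no"}
NEGATIONS = {"not", "never", "no"}   # kept because they flip emotional polarity

def remove_stopwords(tokens):
    """Drop stop words but keep negation words."""
    return [t for t in tokens if t not in STOP_WORDS or t in NEGATIONS]

print(remove_stopwords("this is not a happy day".split()))
```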
Slangs and Abbreviations: Status updates are often written using popular slang and abbreviations. As these are used to convey emotions, they are converted to their respective full forms.
Spelling correction: Typos are common in text data available on social media and thus spelling mistakes are corrected before the analysis.
Translation: We used Google Translate to translate all transliterated Hindi (Roman Hindi) script to Devanagari Hindi script. Some words were not translated correctly, so we created a list of 67,000 transliterated words with their corresponding Hindi and English meanings. The Devanagari Hindi script is then translated to English using Google Translate.
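The fallback lookup can be sketched as below; the lexicon entries and the `translate` hook are hypothetical stand-ins for the actual 67,000-word list and the external translation call:

```python
# Hypothetical fragment of the transliteration lexicon (entries are illustrative).
TRANSLIT_LEXICON = {"khush": "happy", "kamyab": "successful", "dost": "friend"}

def translate_tokens(tokens, translate=lambda w: None):
    """Translate each token; fall back to the custom lexicon when the
    external translator (the `translate` hook) returns nothing."""
    out = []
    for tok in tokens:
        result = translate(tok)                    # e.g. a translation-API call
        if result is None:                         # translator failed on this word
            result = TRANSLIT_LEXICON.get(tok, tok)
        out.append(result)
    return out

print(translate_tokens(["stay", "khush", "aur", "kamyab"]))
```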

Feature Engineering
The process of creating features by extracting information from the data was carried out using the techniques described below. The resulting feature vectors are then used to train our machine learning models.
Word N-Grams: Bag-of-words is the simplest method of extracting features from text; however, it treats each word independently and only counts its frequency. We therefore improve on plain bag-of-words by using word n-grams, with n ranging from 2 to 6, because they capture the context around each word.
Character N-Grams: Character n-grams are language independent and more reliable, even though they increase the dimensionality of the problem, and they give information about both content and context. We use character n-grams with n ranging from 2 to 6.
TF-IDF Vectorization: The TF-IDF weight is a statistical measure of how important a word is within a document. The weight increases proportionally with the word's count in the document but is offset by the frequency of the word in the corpus.
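The three feature extraction steps above can be sketched with scikit-learn's `TfidfVectorizer`, assuming toy documents in place of the real corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["happy birthday dear", "wishing you a happy day", "sad and angry day"]

# Word 2-6-grams and character 2-6-grams with TF-IDF weighting, as in the experiments.
word_vec = TfidfVectorizer(analyzer="word", ngram_range=(2, 6))
char_vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 6))

Xw = word_vec.fit_transform(docs)   # one row per document, one column per word n-gram
Xc = char_vec.fit_transform(docs)   # character n-grams: far higher dimensionality

print(Xw.shape[0], Xc.shape[0])
```

Note how the character n-gram matrix has many more columns than the word n-gram matrix, reflecting the dimensionality increase mentioned above.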
NRC lexicons: The NRC Lexicon (26), containing 14,182 English unigrams, was used for this study. An emotion is associated with a word of the lexicon if it has an association score of 1; otherwise, an association score of 0 is assigned. With weights associated with each lexicon entry, it was possible to preserve all the emotions represented by a single entry. If the training and testing sets belong to the same domain, the emotion lexicon improves classification accuracy significantly. Additionally, the NRC lexicon has been augmented with 3,000 emoji-based entries to help represent emotions.
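A sketch of lexicon-based feature extraction, assuming the NRC format of 0/1 word-emotion association scores; the entries and the three-emotion subset below are illustrative, not taken from the real lexicon:

```python
# Illustrative fragment in the NRC format: word -> {emotion: 0/1 association score}.
nrc = {
    "birthday": {"joy": 1, "anticipation": 1, "sadness": 0},
    "magical":  {"joy": 1, "anticipation": 0, "sadness": 0},
    "cry":      {"joy": 0, "anticipation": 0, "sadness": 1},
}
EMOTIONS = ["joy", "anticipation", "sadness"]

def lexicon_features(tokens):
    """Sum per-emotion association scores over the tokens of a status update."""
    return [sum(nrc.get(t, {}).get(e, 0) for t in tokens) for e in EMOTIONS]

print(lexicon_features(["happy", "birthday", "magical"]))
```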
Emoji: Social media posts are largely dominated by emojis, and emotions are nowadays often expressed using emojis rather than text. Emojis were assigned labels for the nine emotions to improve the classification of emotion and sentiment.

Classification Models
The pre-processed corpus is used to develop feature vectors with the traditional Bag-of-Words (BoW) approach combined with the word n-gram and character n-gram techniques. The TF-IDF vectorizer is used to create an index for each word in the vocabulary. Before applying the baseline machine learning model, we balanced the dataset so that the train and test datasets have equally distributed class labels. We split the data into 70% for the training dataset and 30% for the test dataset. In single-label classification the class labels are mutually exclusive, whereas in multi-label classification they are not, so special machine learning algorithms are required for predicting the output.
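The label binarization and 70/30 split can be sketched as follows, with toy status updates in place of the real corpus:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer

statuses = ["happy birthday", "so sad today", "angry and scared", "what a surprise"]
labels = [["joy", "love"], ["sadness"], ["anger", "fear"], ["surprise"]]

# Turn the label sets into a binary indicator matrix: one column per emotion.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# 70/30 train/test split, as used in the experiments.
X_train, X_test, Y_train, Y_test = train_test_split(
    statuses, Y, test_size=0.3, random_state=42)

print(Y.shape, len(X_train), len(X_test))
```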
One-Vs-Rest: The algorithm uses a heuristic method to divide the multi-label problem into individual binary classification problems, one label per class. For an unseen instance, predictions are made based on the class with maximum confidence. The algorithm assumes that the classes are mutually exclusive, without considering any underlying dependencies between them.
Binary Relevance: In this case, an ensemble of single-label binary classifiers is trained, one for each class. Each classifier forecasts whether an instance belongs to a particular class or not, and the multi-label output is the union of all predicted classes. However, the classifier does not take into consideration the correlations between class labels (27).
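A minimal sketch of this transformation with scikit-learn, using a toy three-label indicator matrix; `OneVsRestClassifier` fits one independent binary logistic-regression model per label:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X_text = ["happy birthday dear", "so sad and angry", "what a happy surprise", "sad day"]
# Binary indicator matrix; columns = [joy, sadness, anger].
Y = np.array([[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 1, 0]])

X = TfidfVectorizer().fit_transform(X_text)
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)  # one binary model per label
pred = clf.predict(X)   # the union of per-label predictions is the multi-label output
print(pred.shape)
```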
Classifier Chains: A chain of binary classifiers is constructed to decompose the multi-label classification problem into many binary problems. The labels of unseen instances are predicted sequentially, using the outputs of all preceding classifiers as feature input for the subsequent classifiers (28).
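A sketch with scikit-learn's `ClassifierChain`, which feeds each link's prediction to the next link as an extra feature (toy data, logistic-regression base model):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

X_text = ["happy birthday dear", "so sad and angry", "what a happy surprise", "sad day"]
Y = np.array([[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 1, 0]])  # [joy, sadness, anger]

X = TfidfVectorizer().fit_transform(X_text)
# Each link in the chain sees the predictions of the previous links as extra
# features, which is how label correlations are captured.
chain = ClassifierChain(LogisticRegression(), order=[0, 1, 2], random_state=0)
pred = chain.fit(X, Y).predict(X)
print(pred.shape)
```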
Label Powerset: The algorithm considers the correlation between different class labels. It transforms the multi-label problem into a single multi-class problem by treating every distinct label combination observed in the training dataset as one class (29).
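The powerset transformation itself is a small mapping from label combinations to class ids, sketched here on a toy indicator matrix:

```python
import numpy as np

Y = np.array([[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 1, 1]])  # [joy, sadness, anger]

# Map each distinct label combination to a single class id (in order of appearance),
# turning the multi-label problem into a plain multi-class problem.
combos = {combo: i for i, combo in enumerate(dict.fromkeys(map(tuple, Y)))}
y_single = np.array([combos[tuple(row)] for row in Y])
print(y_single)
```

Any standard multi-class classifier can then be trained on `y_single`, and its predictions mapped back to label sets through the inverse of `combos`.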
ML-kNN: The algorithm first identifies the k nearest neighbors of each instance of the training set. The maximum a posteriori principle is then used to determine the label sets of unseen instances, based on the label sets of their neighboring instances (30).
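A simplified neighbour-voting sketch of the idea behind ML-kNN; the real algorithm replaces the simple majority vote below with a maximum-a-posteriori estimate over neighbour label counts:

```python
import numpy as np

def knn_multilabel(X_train, Y_train, x, k=3):
    """Predict a label when at least half of the k nearest training
    instances carry it (simplified stand-in for ML-kNN's MAP rule)."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every instance
    nearest = np.argsort(dists)[:k]              # indices of the k nearest neighbours
    votes = Y_train[nearest].sum(axis=0)         # per-label neighbour counts
    return (votes >= k / 2).astype(int)

X_train = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]])
Y_train = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
print(knn_multilabel(X_train, Y_train, np.array([0.05, 0.95]), k=3))
```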

Results and Discussion
We trained different classification algorithms on the dataset of 15,995 Facebook status updates with nine labels and evaluated their performance. The classifiers used the BoW approach augmented with the word n-gram and character n-gram feature extraction techniques and TF-IDF vectorization. Given the data and the extent of pre-processing, the results are quite satisfactory. As our baseline, we trained a Binary Relevance classifier for each of the nine labels. We report the performance of all the models using four evaluation metrics, namely accuracy, precision, recall, and F1 score, with both micro and macro averaging. With micro-averaging, the true positives, true negatives, false positives, and false negatives are summed over all classes before the metric is computed. With macro-averaging, precision and recall are calculated separately for each class and then averaged. In multi-label classification, evaluation must account for predictions that are partially correct, since an instance may carry several labels at once rather than a single label or none.
Precision measures the positive class predictions that are actually correct against all positive class predictions; that is, the number of true positives divided by the total number of predicted positives.

Precision = TP/(TP + FP) (1)

Recall represents how successfully the classifier identifies the relevant instances; it is calculated by dividing the number of true positives by the sum of true positives and false negatives.

Recall = TP/(TP + FN) (2)

The F1 score is the weighted harmonic mean of precision and recall; for multi-label classification it is obtained by averaging the F1 scores of the individual classes.

F1 = 2 × (Precision × Recall)/(Precision + Recall) (3)

The accuracy metric measures the ratio of correct predictions over the total number of instances evaluated.
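These metrics with micro and macro averaging can be computed with scikit-learn, sketched here on toy predictions (note that `accuracy_score` on multi-label indicator data is the strict exact-match ratio over whole label sets):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

# Toy ground-truth and predicted indicator matrices (4 instances, 3 labels).
Y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
Y_pred = np.array([[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]])

p_micro = precision_score(Y_true, Y_pred, average="micro")  # pooled TP/FP counts
r_macro = recall_score(Y_true, Y_pred, average="macro")     # per-label, then averaged
f_micro = f1_score(Y_true, Y_pred, average="micro")
subset_acc = accuracy_score(Y_true, Y_pred)                 # exact-match ratio

print(round(p_micro, 3), round(r_macro, 3), round(subset_acc, 3))
```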
Accuracy = (TP + TN)/(TP + TN + FP + FN) (4)

The results of the experiments conducted using the different classifiers are compared in [Table 1]. Overall, we obtain good accuracy values for every model. We observed that character n-gram TF-IDF vectors surpass the word n-grams. We chose the Classifier Chains model over the remaining models because it has the highest test-set accuracy, which is close to its training-set accuracy.

Fig 2. Count of status updates under each label

Fig 3. Count of status updates with multiple labels