Performance Evaluation of Sentiment Analysis on Balanced and Imbalanced Dataset Using Ensemble Approach

Background: Class imbalance is a well-known challenge in sentiment analysis. In imbalanced classification, the few minority-class instances do not provide sufficient information, so learning directly from an unbalanced dataset can produce unsatisfactory results. This work aims to address the problem of class imbalance. Methods: The study first balances the dataset using the Synthetic Minority Oversampling Technique (SMOTE) and then proposes an ensemble model, named Ensemble Bagging Support Vector Machine (EBSVM), for opinion mining. To measure the performance of the approach, experiments are conducted on both the imbalanced and the balanced dataset. The work then compares the effectiveness of the suggested model with three base classifiers: Naive Bayes (NB), Decision Tree (DT), and Support Vector Machine (SVM). Customer reviews of restaurants were chosen as the dataset for this work. Accuracy, precision, recall and F-measure are used as evaluation metrics. Findings: According to the results, the suggested EBSVM model outperforms all the individual classifiers on both the imbalanced and the SMOTE-balanced dataset. The balanced EBSVM classifier improves on the imbalanced EBSVM classifier in terms of accuracy. Precision, recall and F-measure of the minority class are higher for the balanced classifiers than for the imbalanced ones. Novelty: The performance of opinion mining classifiers on imbalanced and balanced datasets is evaluated in this paper. The work examines not only general opinions, but also specific aspects such as food, service, ambiance, quality, and price. Compared with existing classification algorithms in the literature, the suggested model was found to perform better.


Introduction
In machine learning, dealing with class imbalance in datasets is a challenging task. Data imbalance occurs when the sample size of one class is significantly smaller or larger than that of another class (1). The problem arises when the dominant class makes up the majority of the dataset, while the minority class has only a small representation. If the degree of class imbalance is high, a model may achieve a high accuracy rate simply by predicting the majority class; such a classifier, however, cannot be regarded as an accurate method for classification. Most traditional machine learning techniques struggle with the unbalanced nature of most real-world datasets.
The strategies that can be used to rectify the problem of class imbalance operate at two levels: the data level and the algorithm level (2). Over-sampling and under-sampling are two common data-sampling approaches for dealing with unbalanced datasets. In under-sampling, samples are randomly selected from the majority class and the remaining ones are removed (3). In over-sampling, minority samples are added via replication so that the class distribution becomes equal and balanced (4). This work balances the dataset using the over-sampling technique SMOTE (Synthetic Minority Over-sampling Technique).
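The two data-level strategies described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the procedure used in this study; the function names, the `seed` parameter, and the toy review identifiers are assumptions introduced for the example.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Collect the samples of each class label in a dictionary."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    return groups


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of every class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = min(len(items) for items in groups.values())
    balanced = []
    for label, items in groups.items():
        balanced.extend((item, label) for item in rng.sample(items, target))
    return balanced


def random_oversample(samples, labels, seed=0):
    """Replicate randomly chosen samples until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = max(len(items) for items in groups.values())
    balanced = []
    for label, items in groups.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((item, label) for item in items + extra)
    return balanced


# Hypothetical toy data: 8 positive and 2 negative review identifiers.
reviews = ["r%d" % i for i in range(10)]
polarity = ["pos"] * 8 + ["neg"] * 2

under = random_undersample(reviews, polarity)
over = random_oversample(reviews, polarity)
print(Counter(label for _, label in under))
print(Counter(label for _, label in over))
```

Under-sampling discards majority-class information, while replication-based over-sampling only repeats existing minority samples; SMOTE differs from the latter in that it synthesizes new minority samples instead of copying them.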
The dataset chosen for this analysis consists of customer reviews of restaurants. The major aspects isolated for this analysis are Food, Service, Staff, Ambience, and Price, which were obtained from NMF (Non-Negative Matrix Factorization) and from the literature study. The dataset is first balanced using SMOTE, and the performance of the ensemble technique for opinion classification is then compared at the next level. The ensemble learning approach combines numerous classifiers to produce a single model (5). This strategy is aimed particularly at improving classification accuracy. Bagging and boosting are the popular methods for combining several classifiers.
Many studies have relied on single classifiers, while many recent ones have focused on ensemble classifiers to increase classification accuracy. This section introduces research contributions from multiple authors on strategies for addressing imbalanced datasets, as well as on ensemble strategies for sentiment classification. To identify toxic comments on social media networks, Rupapara et al. (6) present an ensemble approach called Regression Vector Voting Classifier (RVVC). The impact of imbalanced and balanced datasets is analysed using random under-sampling and SMOTE over-sampling. Their work compares several machine learning classifiers with the proposed approach and finds that models perform poorly on an imbalanced dataset, while more precise results are obtained on a balanced dataset.
To address the imbalance issue, Abeer S. Desuky and Sadiq Hussain (7) propose a modified hybrid method. Simulated annealing is applied to select the most suitable subset of majority-class records. KNN, DA, SVM, and DT classifiers are then used to evaluate the efficacy of the method. The method is applied to unbalanced datasets to reduce misclassification and is validated against 51 real-world datasets. The evaluation metrics used are G-mean and F-score.
M. Govindarajan (8) proposes an effective ensemble technique for developing accurate classifiers for the Usenet2 dataset, which is a collection of 20 newsgroups. The suggested method uses NB, SVM, and Genetic Algorithm (GA) as base classifiers. Both heterogeneous and homogeneous models are built in this work. The proposed bagged approaches improve classification accuracy considerably over the base classifiers, and the hybrid NB-SVM-GA classifier outperforms the base classifiers in terms of classification accuracy.
Le Wang et al. (9) study the classification of imbalanced datasets. The study analyses classification methods for unbalanced datasets from various perspectives, such as data sampling, the algorithm level, the feature level, and deep learning methods. The merits and demerits of these methods are discussed in detail. The data sampling methods classify unbalanced datasets using SMOTE, SVM, and k-nearest neighbour technologies. The evaluation criteria for classifiers on imbalanced datasets are then presented.
Salim Sazzed and Sampath Jayarathna (10) discuss the Lexical Rule-based Sentiment Analyser (LRSentiA), a lexicon-based tool that determines a review's semantic orientation and its confidence rating. They introduce a hybrid approach called SSentiA by combining LRSentiA with a machine learning classifier. The authors compare the performance of LRSentiA and SSentiA with existing unsupervised, lexicon-based, and self-supervised algorithms. SSentiA considerably improves sentiment classification performance.
A study by Mishra et al. (11) extracts and analyses tweets about COVID-19 from tourism sectors such as healthcare and hospitality from all over the world, and performs sentiment analysis with the VADER package. The LDA (Latent Dirichlet Allocation) topic modeling technique was used to identify hidden patterns and the inter-cluster similarity between terms. Based on the deep learning analysis of social media, the study provides a practical strategy to maintain access to the internet during the COVID-19 pandemic. Furthermore, the LSTM RNN model enables the government to track social media sites such as Twitter as a means of monitoring citizen sentiment. This enables it to make better decisions while acting in the long-term interests of the country and its citizens.
Basha and Rajput (12) present a framework for supervised sentiment analysis called SSM (supervised topic level sentiment model), which is capable of solving overall sentiment analysis problems. This work used belief maximization for the SSM model and the Dirichlet distribution for aspect estimation. The researchers tested these models on reviews of different products and found that the SSM model outperformed the existing algorithms in terms of aspect recognition and overall sentiment prediction.

Methodology
The methodology adopted in this analysis proposes a framework for addressing the issue of class imbalance, as well as an ensemble approach for improving aspect-based sentiment classification accuracy. By combining redundant and complementary classifiers, the ensemble model improves the reliability, accuracy, and quality of the results. The model uses the Synthetic Minority Oversampling Technique (SMOTE) to balance the dataset, and ensemble-based bagging with SVM for opinion mining. The effectiveness of the suggested method is evaluated against the base classifiers (NB, DT, SVM) on both balanced and imbalanced datasets. A comparison of its performance against the recent literature is also made in this section. Figure 1 summarizes the methodology of the study.

Input the Dataset
The dataset used for this work is a restaurant review dataset. The restaurant reviews were collected from TripAdvisor.com using web crawlers. A total of 10,089 reviews were collected, from which 26,059 sentences were obtained.

Aspect extraction
This phase extracts the aspects from the review dataset with the help of the topic modeling techniques LDA (Latent Dirichlet Allocation) and NMF (Non-Negative Matrix Factorization), and from the literature study. Food, Service, Staff, Ambience and Price form the aspect set created for the restaurant reviews. The reviews for each of the five aspects are saved in separate CSV files.

Lexicon Based Classification
Using a lexicon-based approach, each aspect-based review sentence is classified as positive or negative. This work uses the VADER package, a rule-based sentiment analysis method, for lexicon-based classification. This stage determines the polarity of each review sentence and stores the results in separate CSV files for each aspect.
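The idea behind rule-based, lexicon-driven polarity assignment can be illustrated with a toy sketch. This is not the VADER package or its lexicon: the word scores, the negation list, and the rule that a zero score counts as positive are all assumptions made for illustration only.

```python
# Hypothetical mini-lexicon: word -> valence score (illustrative values, not VADER's).
TOY_LEXICON = {
    "good": 1.9, "great": 3.1, "delicious": 2.7,
    "bad": -2.5, "slow": -1.2, "rude": -2.0,
}
NEGATIONS = {"not", "never", "no"}


def lexicon_polarity(sentence):
    """Sum word valences, flipping the sign of the sentiment word that follows a negation."""
    score = 0.0
    negate = False
    for token in sentence.lower().split():
        word = token.strip(".,!?;:")
        if word in NEGATIONS:
            negate = True
            continue
        valence = TOY_LEXICON.get(word, 0.0)
        if valence != 0.0:
            score += -valence if negate else valence
            negate = False
    # Assumption: sentences with a zero score are treated as positive.
    return "positive" if score >= 0 else "negative"


for text in ["Great service", "The food was not good", "The staff was rude and slow"]:
    print(text, "->", lexicon_polarity(text))
```

A full rule-based analyser such as VADER additionally accounts for cues like punctuation, capitalization and intensifiers, which this sketch deliberately leaves out.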

Balancing the Data set With SMOTE
In this research, an over-sampling approach is employed to balance the dataset, and SMOTE is used for this purpose. In this approach, the minority class is over-sampled by adding synthetic samples based on feature-space similarities between existing minority examples (13) (14). To generate a synthetic data point, the difference vector between the current data point and one of its k nearest neighbors is taken. This vector is then multiplied by a random number between 0 and 1 and added to the current data point, which yields a new synthetic data point.
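The interpolation step described above can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not the implementation used in this study: base points are drawn at random rather than visited in turn, distances are Euclidean, and the function name `smote`, its parameters, and the toy vectors are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority-class vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (the base itself is excluded).
        others = [point for point in minority if point is not base]
        neighbours = sorted(others, key=lambda point: math.dist(base, point))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random number in [0, 1)
        # New point = base + gap * (neighbour - base), computed per feature.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class feature vectors (corners of the unit square).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
print(new_points)
```

Because every synthetic point lies on the line segment between two existing minority points, the new samples stay inside the region already occupied by the minority class.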

Feature Transformation using Count Vectorizer
Count Vectorizer converts a text into a vector based on the frequency of each word that appears in the text (15). It transforms a set of text documents into a matrix of token counts. A document vector is generated from each text document after feature selection. In this work, the Count Vectorizer provided by the scikit-learn library in Python is used.
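What a matrix of token counts looks like can be shown with a small standard-library stand-in. This is not scikit-learn's Count Vectorizer; it is a sketch that assumes lowercasing and tokens of two or more word characters, and the function name `count_vectorize` and the example sentences are hypothetical.

```python
import re
from collections import Counter


def count_vectorize(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts vocabulary[j] in documents[i]."""
    tokenised = [re.findall(r"\b\w\w+\b", doc.lower()) for doc in documents]
    vocabulary = sorted({token for doc in tokenised for token in doc})
    column = {token: j for j, token in enumerate(vocabulary)}
    matrix = []
    for doc in tokenised:
        row = [0] * len(vocabulary)
        for token, count in Counter(doc).items():
            row[column[token]] = count
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical review sentences.
docs = ["The food was good", "Good food, good price"]
vocabulary, matrix = count_vectorize(docs)
print(vocabulary)  # ['food', 'good', 'price', 'the', 'was']
print(matrix)      # [[1, 1, 0, 1, 1], [1, 2, 1, 0, 0]]
```

Each row is the document vector of one review sentence, and each column corresponds to one vocabulary term, so the classifiers in the following stages operate on these numeric rows rather than on raw text.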

Single Base Classifiers (Naive Bayes, Decision Tree, and Support Vector Machine)
Using the polarity labels produced by the VADER lexicon, a set of base classifiers (NB, DT, and SVM) is developed to predict the classification scores. The Naive Bayes algorithm classifies data based on probability: it derives the posterior probability of a class from the distribution of the terms throughout the text (16). The Decision Tree and the Support Vector Machine can be used for both regression and classification (16).

Ensemble Classification Using Bagging SVM (EBSVM Classifier)
In this method, bootstrapping and aggregation are combined into a single ensemble classifier. Multiple SVMs are created, one on each sub-sample of the training data (10 sub-samples, each containing 100 samples). Each SVM is trained on its sub-sampled training data to boost the classifier's performance, and the results of the individual SVMs are then combined to obtain the final prediction. In the bagging approach, the number of SVMs used affects the accuracy of the model's prediction.
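The bootstrap-and-aggregate procedure can be sketched as below. To keep the example self-contained, the base learner is a simple nearest-centroid classifier used purely as a stand-in; it is not an SVM, and in practice an SVM from a machine learning library would take its place. The class and function names, the majority-vote aggregation, and the toy data are assumptions; only the defaults of 10 sub-samples with 100 samples each come from the description above.

```python
import random
from collections import Counter


class NearestCentroid:
    """Stand-in base learner (NOT an SVM): predicts the class with the closest mean vector."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_one(self, row):
        def squared_distance(label):
            return sum((a - b) ** 2 for a, b in zip(row, self.centroids[label]))
        return min(self.centroids, key=squared_distance)


def fit_bagging(X, y, n_estimators=10, sample_size=100, seed=0):
    """Train one base learner per bootstrap sub-sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(sample_size)]
        models.append(NearestCentroid().fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(model.predict_one(row) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training data with two well-separated classes.
X = [[1.0 + 0.1 * i, 1.0] for i in range(5)] + [[-1.0 - 0.1 * i, -1.0] for i in range(5)]
y = ["pos"] * 5 + ["neg"] * 5
models = fit_bagging(X, y, n_estimators=10, sample_size=20)
print(predict_bagging(models, [0.9, 0.8]), predict_bagging(models, [-1.2, -0.7]))
```

Training each learner on a different bootstrap sub-sample makes the individual models vary, and the vote averages out part of that variance, which is the usual motivation for bagging.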

Performance Metrics
The confusion matrix (17) shown in Table 1 is used to assess the model's performance. It is built from the actual and predicted values produced by the classification techniques. The performance evaluation metrics used in this work are 1) Accuracy 2) Recall 3) Precision 4) F-measure. Accuracy is the proportion of all predictions that the classifier gets right. Recall (sensitivity) is the proportion of observations from the positive class that are correctly identified as positive (18). Precision is the proportion of observations predicted as positive that actually belong to the positive class (18). The F-measure is the harmonic mean of recall and precision.
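The four metrics follow directly from the cells of a binary confusion matrix, as the sketch below shows. The function name and the counts are hypothetical and are not results from this study; returning 0.0 when a denominator is zero is a convention assumed here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F-measure from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for a reasonably balanced test set.
balanced = classification_metrics(tp=80, fp=20, fn=10, tn=90)

# Hypothetical imbalanced case: the minority class is "positive" (5 of 100 samples)
# and the classifier always predicts the majority class.
majority_only = classification_metrics(tp=0, fp=0, fn=5, tn=95)

print(balanced)
print(majority_only)
```

In the second case accuracy is 0.95 while recall for the minority class is 0, which is why accuracy alone is not a sufficient criterion on an imbalanced dataset.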

Dataset
The choice of a restaurant often depends heavily on its online reputation. TripAdvisor.com is a leading travel website, offering information and reviews on restaurants, hotels and attractions, as well as a range of travel options and planning tools. The reviews posted on the TripAdvisor website may provide details about users' experiences and suggestions for other users, which help consumers make a decision. The customer reviews used as the dataset in this work were collected from the TripAdvisor website using web crawlers, for two restaurants in each of four metropolitan cities. A total of 10,089 reviews were collected, from which 26,059 sentences were obtained; this set is unbalanced. The proposed EBSVM ensemble classifier, as well as the selected machine learning classifiers, are evaluated through several experiments. The proposed model is also compared with existing models in the literature. The experiments are divided into two categories: comparative testing of the models on the imbalanced dataset and comparative testing of the models on the balanced dataset.

Comparative Testing of the Models on Imbalanced Dataset
This work compares the proposed model against the base classifiers and against existing classification models in the literature on the imbalanced dataset. In most cases the proposed model outperformed the other models. Table 2 presents the overall accuracy percentages of the machine learning classifiers on the imbalanced dataset. It compares the accuracy of the imbalanced base classifiers with that of the proposed EBSVM classifier for the different aspects. Here, the proposed EBSVM shows higher accuracy than the base classifiers. The performance evaluation in terms of precision, recall and F-measure is shown in Table 3. For most aspects, the precision, recall and F-measure of the proposed EBSVM classifier are higher than those of the base classifiers. The results show that the models overfit the majority class because it contributes more data than the minority class. Consequently, the models make more wrong predictions on the minority class than on the majority class. Table 4 compares six recent studies from the literature against the proposed model. EBSVM provides the best accuracy of 97.6% for the aspect 'Staff', precision of 94% for the aspect 'Service', recall of 79% for the aspect 'Price', and F1 score of 82% for the aspect 'Service'.

Comparative Testing of the Models on Balanced Dataset
This section compares the proposed model against the base classifiers and against existing classification models in the literature on the balanced dataset. In most cases the proposed model performed better than the others.

Proposed Model against Base Classifiers
The overall accuracy percentages of the machine learning models on the balanced dataset are shown in Table 5. It compares the accuracy of the balanced base classifiers and the proposed EBSVM classifier for the different aspects. Here, the proposed EBSVM is more accurate than the base classifiers. Table 6 provides an evaluation of performance using precision, recall, and F-measure. SMOTE is used to balance the dataset, and it yields a significant improvement in performance. Table 7 presents a comparison of three recent studies from the literature against the proposed model. SMOTE-EBSVM provides the highest accuracy of 98.3%, as well as precision, recall and F1 score of 99%, for the aspect 'Staff'. Here the proposed model performs better than the others.

Conclusion
This work employs an over-sampling technique in conjunction with an ensemble strategy to deal with class imbalance and to improve classification performance. The technique balances the dataset using the Synthetic Minority Oversampling Technique (SMOTE), and then applies ensemble-based bagging with SVM (EBSVM) for opinion mining. The evaluation criteria for the performance of the classifiers on imbalanced and balanced datasets were accuracy, precision, recall, and F-measure. This work compared the proposed model with the base classifiers (NB, DT, and SVM), as well as with current classification models in the literature, on both balanced and imbalanced datasets. The balanced SMOTE-EBSVM classifier performs better than the imbalanced classifier. Compared to existing classification algorithms in the literature, the proposed model performed better. Since the proposed model proved more precise and accurate than the existing models, this study should be extended to a large collection of real-world datasets, and algorithm-level techniques will also be investigated in the future. This work also aims to develop realistic opinion summaries for each aspect separately.