Extra-tree learning based Socio-economic factor analysis and multi-class adaptive boosting meta-estimator for prediction of agricultural productivity

Background/Objectives: In socio-economic factor analysis, the Logit and Probit models depend on the observed data following an assumed random distribution; when that distribution does not adequately represent the random components associated with the various factors, these models predict poorly. The objective of this work is to develop a machine learning based model for socio-economic factor analysis and an ensemble learning based model for efficient prediction of agricultural productivity. Methods: In this work, socio-economic factor selection based on the extra-tree classifier has been used and found capable of evaluating the socio-economic factors that contain information relevant to the target variable, agricultural productivity. In addition, a multi-class adaptive boosting ensemble learning approach is used to predict the agricultural productivity of respondents (farmers) from their socio-economic profiles. The proposed approach has been evaluated on a test case that analyzes the socio-economic factors of farmers affecting agricultural productivity in Sambalpur District, Odisha State, India. The farmers' socio-economic data were collected through structured interviews with questionnaires that are in line with standard Participatory Rural Appraisal. Findings: The proposed approach to socio-economic factor identification is found to be efficient for computing the relationships between socio-economic factors and agricultural productivity. Novelty: In this application domain, the proposed method employs the extra-tree classifier and boosting ensemble learning for socio-economic factor analysis towards agricultural productivity, and it is found to be more efficient than existing approaches such as Logit, Probit, Linear Regression, Linear Discriminant Analysis, Naïve Bayes, and other counterparts.


Introduction
Undoubtedly, agriculture is the most important gift of environmental services, including water, forests, pastures, and soil nutrients. However, socio-economic factors of farmers, such as marital status, household size, total annual income, educational level, farm size, membership of a farmers' cooperative society, years of residence, available amenities (such as electricity, pipe-borne water, and tarred roads), farming experience, quantity and type of fertilizer used, and access to government schemes, also play important roles in sustainable agricultural productivity. Compared with many other countries such as the USA, Brazil, and China, India is becoming the largest hub for supplying agricultural products such as banana, guava, papaya, sugar cane, and mango to other countries. In the 20th century, India evolved into one of the leading farming countries for several agricultural products. However, some national and international reports (1) indicate that the country needs to produce more of the major agricultural products, such as rice and wheat, for its growing population. Many researchers have identified poorly maintained infrastructure, improper irrigation systems, and inconsistent government policies as reasons for the shortfall and low growth in agriculture. Moreover, other major environmental, social, technological, and policy-related factors need close attention. Technology can help in evaluating and reducing crop losses, upgrading infrastructure, and restoring traditional methods of cultivation, all of which contribute to the larger goal of higher productivity. Therefore, the dissemination of research and modernization has the potential to unlock enormous benefits in the Indian agricultural sector, with which a major part of the country's population is directly or indirectly associated.
On the technological side, the limited availability of advanced technological and financial facilities, such as banks, cellular phones with app access, and radio signal coverage, together with limited awareness about the quantity of fertilizers and pesticides to be used, are some of the major reasons for the decline in agricultural production. Researchers have discussed several solutions to assess and improve productivity. Fusi et al. (2) carried out an environmental assessment of rice fertilized with urban sewage sludge and of possible mitigation strategies. Their results suggest that the main contributors to the environmental impact of rice are nitrogen emissions related to the application of fertilizers, the diesel used for fieldwork, and methane emissions associated with the degradation of organic matter during the flooding period. For rice fertilized with urban sewage sludge, two options were found effective in decreasing the environmental burden: replacing the urban sewage sludge with organic fertilizer, which improves the toxicity-related impact categories, and applying an additional aeration period during cultivation, which is beneficial for the climate change category. Pingali and Roger (3) discussed the impact of pesticides on farmer health and the rice environment and observed that concern is rising in Asia about the unfavorable effects of pesticides on human health and the environment.
In Asia, pesticide usage is low, and the chemicals degrade very quickly in tropical flooded conditions. Nevertheless, these chemicals are very dangerous to humans and affect their health, and many of the pesticides used in Asia are highly hazardous to farmers. Their primary finding is that pesticides have a substantial adverse impact on human health and also damage the environment and the paddy ecosystem. Tuong and Bouman (4) focused on rice production in water-scarce environments. Water-saving irrigation, such as alternate wetting and drying and saturated soil culture, can reduce non-productive water outflows and raise water productivity. With the adoption of water-saving irrigation technologies, rice fields will shift from being mostly anaerobic to being completely aerobic. These shifts will have strong and largely unknown effects on the lowland rice ecosystem. Wassmann et al. (5) assessed the regional vulnerability of Asian rice production to climate change impacts and the scope for adaptation. Any decline in rice production through climate change would seriously harm food security in Asia. The rice economies of the deltaic regions are particularly affected by climate change because of sea level rise. Substantial improvements to rice production systems, i.e., greater tolerance to salinity and flooding, are critical for maintaining or raising yields in these fertile deltaic regions. The Indo-Gangetic plains, affected by the melting of Himalayan glaciers, also face a high climate change threat. Masutomi et al. (6) assessed the impact of climate change on rice production in Asia with comprehensive consideration of the process/parameter uncertainty in general circulation models, evaluating the effects of climate change on rice production from a number of climate projections. Three scenarios from the Special Report on Emissions Scenarios (SRES) were considered.
The initial condition was not taken into account because of the non-availability of data. Other research in the field of agricultural study has analyzed factors affecting the choice of crop (7), the adoption of seed and fertilizer (8), changes in farmland prices (9), and the loyalty of members in marketing co-operatives (10). Further, a study on input use in agriculture through multi-criteria analysis (11) has been reported and found successful for sustainable agriculture. Hamade et al. (12) analyzed qualitative and quantitative approaches to rural development by identifying the impacts of technological innovation used in farming on rural farmers' households.
https://www.indjst.org/
From the literature study in this domain of socio-economic factor analysis for agricultural productivity, it is found that a few statistical and other mathematical modeling based approaches have been developed for identifying the socio-economic factors affecting agricultural productivity. Yugada et al. (13) conducted a study on socio-economic factors and constraints influencing productivity among cassava farmers. Cassava is one of the important food crops grown in Africa. It is drought-resistant and high-yielding under enhanced pest management practices. Their study concluded that many socio-economic characteristics of farmers in the study area, such as farming experience, education, and farm size, affect the production of cassava, while constraints such as low accessibility of labor, insufficient funds, and adverse prices were the main problems faced by the farmers. The socio-economic information recorded for cassava farmers includes age, marital status, gender, occupation, and farming experience. Their study reveals that more males than females were engaged in cassava production and that the majority of cassava farmers are married. More than half of the farmers had farming experience. Finally, the authors' study showed that the majority of cassava farmers have knowledge of cassava farming and are engaged in small-scale production. Nigeria is one of the developing countries facing scarcity of cereal crops like maize. In this context, Ajah and Nmadu (14) carried out research in Abuja on the socio-economic factors influencing the output of small-scale maize farmers. A multistage sampling technique and a semi-structured questionnaire were used for data collection. Their results showed that land rent, the land area cultivated, years of farming experience, and the quantity of fertilizer applied were the important socio-economic factors that significantly influenced maize output.
This supported the hypothesis that socio-economic factors impact maize output. Based on this finding, their paper recommended that farmers in the study area be informed, through extension services, of the socio-economic factors that impact maize output, so that they can consider them in their decision-making process. Vegetables are valuable for their contribution to the share of agriculture in the Swaziland economy. At present, local production of vegetables is lower than local demand, and the gap is filled by imports from South Africa. The study by Xaba and Masuku (15) aimed to identify the factors affecting the productivity and profitability of vegetable production. Their results showed that the factors that significantly affected the productivity of vegetable farmers were access to credit, gender of the farmer, fertilizer quantity, selling price, and distance to market; these were significant and positively related to the yield of the vegetable farmers, whereas the distance to market was negatively related to productivity. Sorghum is the third most important cereal crop grown in the world. It is a coarse, upright-growing grass used for livestock feed, fencing of houses, and food. Sorghum is used in various food items, such as cake, malted beverages, and bread, and for ethanol, in major parts of the world. Zakuwai (16) conducted a study on the socio-economic factors that affect sorghum production in Adamawa State, Nigeria. Socio-economic factors like age, education, and marital status are the major factors affecting the level of productivity in Nigeria. Their results therefore help policy makers in the country to make more informed decisions for improving the livelihood and production of the farmers. Data were collected from 240 farmers with the help of a structured questionnaire, using purposive and random sampling.
Their results reveal that the venture is dominated by male farmers, who are mostly married and have small family and farm sizes. The coefficients of gender, education, credit, and age were found to be negative and statistically significant. Usman and Dodo (17) performed a study on socio-economic factors influencing agricultural insurance in rice production in Kano State, Nigeria. Agricultural insurance is well established in developed countries, and its benefits are appreciated throughout the world. The main objective of their study was to identify the socio-economic factors influencing farmers' willingness to continue insuring their rice production, and the authors tested this hypothesis. Their primary data were collected through a field survey using a questionnaire administered to 120 rice farmers enrolled in the agricultural insurance scheme. Their results concluded that farm size and formal schooling are the socio-economic factors that influence farmers' willingness to continue taking rice insurance. Agricultural productivity refers to the output produced by a given input in the farming sector. It can be described as the ratio of output to input in farm production. Sustainable agriculture means cultivating in sustainable ways, based on an understanding of the ecosystem and of the association between organisms and their environment. Egwu and William (18) conducted a study on factors affecting sustainable agricultural productivity in Ebonyi State, Nigeria. Their results showed that males are the majority of respondents, and they further revealed that the constraints restraining sustainable farming productivity were the environment, the land ownership system, and funds.
Food production in Nigeria no longer keeps pace with population growth. To examine the recognized problem of apparently declining food production in Nigeria, Anibogu et al. (19) performed a study of the socio-economic factors influencing agricultural production among cooperative farmers in Anambra State, Nigeria. Their results are robust, with varying implications. Gender has a significant and inverse relationship with farming production, which indicates that an increase in the number of males relative to females in farming activities leads to a decrease in farmers' output. Their study explained that marital status has a positive relationship with farmers' output levels, and that educational qualification, farming experience, type of technology employed, crop type, seedlings obtained, and fertilizer acquired have a positive and significant association with farmers' output. Women cultivate a large share of the food eaten by entire families, but they still have little or no access to technology, land, credit, and knowledge compared with men. The main objectives of Jiriko's study (20) are to identify the socio-economic characteristics of women farmers and to determine the association between food production and these socio-economic characteristics. The author's results showed that women have a low level of education yet remain active; consequently, they cannot be engaged in the formal sector. In the study, six villages were selected, eighty percent of the women in these villages were randomly chosen, and a structured questionnaire was administered to two hundred women. The author found that the respondents' farm size is small and their socio-economic status is low; as a result, the income produced is poor. The regression analysis revealed that income, training, farm size, wealth, and inputs are the socio-economic characteristics that contributed significantly to food production.
Socio-economic factor analysis for agricultural productivity has attracted many researchers in this field and in allied fields of science and engineering because of the social impact of such studies. It is evident from the literature survey that various statistical models and a few machine learning models have been applied in the field, such as descriptive statistics (13,16), multiple regression analysis and descriptive statistics (14), descriptive and inferential statistics (15,18), the logit model (regression model) (17,19), and Probit (21). This study offers an advanced machine learning based model for socio-economic factor analysis and an efficient ensemble learning based model for prediction of agricultural productivity. The objective of this study is to mine these socio-economic factors and to design an automated system identification model for: i) quantification of the socio-economic factors of farmers in the study area with respect to agricultural productivity; ii) extraction of other unidentified socio-economic factors through data acquisition methods, and evaluation of the degree of influence of these factors on agricultural productivity through feature selection and evaluation techniques; iii) design of a system identification model for a data-driven automated operational system for the evaluation of socio-economic factors affecting sustainable agricultural productivity; and iv) identification of the issues behind low productivity and suggestion of ways out.
The main contribution of this research can be summarized in two parts: (i). Extra-tree learning based identification of the socio-economic factors affecting agricultural productivity. The major steps implemented for this approach are as follows: a) Drawing a predefined number of samples of socio-economic profiles based on the chosen unique set of socio-economic factors; b) Designing a pool of Decision trees from the derived samples; and c) Finding the socio-economic factors from the aggregate of the results of the multiple Decision trees.
(ii). Designing a multi-class adaptive boosting ensemble learning-based model for prediction of agricultural productivity from the socio-economic factors selected by the Extra-tree learning model. This consists of five major steps: i) Initialization of the weight vector for each socio-economic profile, ii) Sequential addition of Decision trees by splitting along the features, iii) Prediction of AO by using each Decision tree, iv) Obtaining the vector of weighted AO prediction error and the weight parameter, and v) Updating the weight vector and repeating until the error reaches a threshold.
The paper is organized as follows: Section 2 describes some important previously developed methods and their approach to solving the problem; Section 3 covers Data Collection and Preprocessing; Section 4 presents the Proposed Model for Socio-economic Factor Analysis and the Proposed Model for Prediction of Agricultural Productivity; Simulation Results and Analysis are presented in Section 5, followed by the Conclusion in Section 6.

Data collection and preprocessing
This study has been planned to evaluate the socio-economic factors affecting agricultural productivity based on intelligent machine learning approaches, and the proposed methods have been applied as a case study to the Sambalpur District, Odisha State, India. According to a 2008 survey by the Dept. of Agriculture and Farmers' Empowerment, Govt. of Odisha (22), out of a total area of 15.582 million hectares, the State has a cultivated area of 61.80 lakh hectares (39.7% of total land). Further, this cultivated area consists of three types of land, namely high land, medium land, and low land, whose shares are 48% (29.14 lakh hectares), 28% (17.55 lakh hectares), and 24% (15.11 lakh hectares) respectively. According to the Census of India (23), farming is the main livelihood of the people of Odisha, where 61.8% of the working population is engaged in agricultural activities. Sambalpur district comprises 9 blocks (Bamra, Jamankira, Jujomora, Kuchinda, Maneswar, Naktideul, Rairakhol, Rengali, and Sambalpur) and 3 Sub-Divisions (Kuchinda, Rairakhol, and Sambalpur).
The socio-economic factors of farmers considered in this study include Marital Status, Household Size, Total Annual Income, Educational Level, Farm Size, Membership of farmers' cooperative society, Years of residence, Available amenities (such as Electricity, Pipe-borne water, Tarred roads, Television service, Radio signals, GSM networks, Banks, and Markets), Farming experience, Quality of seeds used, Quantity and Type of fertilizer used, Sources of labour, Sources of seeds, Pesticides Usage, and Access to Government Schemes, all of which play important roles in sustainable agricultural productivity. As per the Census of India 2011 (23), the district has a population of 10.4 lakh, of which 70 thousand are cultivators. Rice, Groundnut, Gram, Mustard, Arhar, Castor, Linseed, and Sugarcane are the most cultivated crops in Sambalpur. The Sambalpur sub-division has 5381 cultivators, of which 4896 are male and 485 are female. Out of the total population, we have collected the sample of farmers by using Eq.1.
In Eq.3, $EQH_{NS}$ represents the number of households with Education Qualification of Household (EQH) having no schooling (NS).
Similarly, $EQH_{PS}$, $EQH_{UPS}$, $EQH_{HS}$, and $EQH_{C}$ stand for Primary Schooling (PS), Upper Primary Schooling (UPS), Higher Schooling (HS), and College Level (C). The distribution of the socio-economic data collected from respondents is presented in the form of a boxplot in Figure 1. In the data preprocessing phase, the data collected against the identified socio-economic factors affecting agricultural productivity are converted to numerical values by using label encoding. For example, a respondent may provide the value of the socio-economic attribute Agricultural Outcome (AO) as 'Very Good', 'Good', 'Average', or 'Poor', which indicates the status of their agricultural productivity. With the label encoder, the values 'Very Good', 'Good', 'Average', and 'Poor' are converted into '1', '2', '3', and '4' respectively. Similarly, the 'Yes' and 'No' values obtained against AvailGS are replaced with '1' and '0' respectively. During the socio-economic study related to agricultural productivity in line with PRA (24,30), we have identified 44 socio-economic factors (Table 1) (14,21).
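The encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary names, the helper `encode_profile`, and the three respondents are hypothetical and are not taken from the study's code or data. An explicit mapping is used because scikit-learn's `LabelEncoder` assigns integer codes in sorted label order, which would not reproduce the 'Very Good' = 1, ..., 'Poor' = 4 assignment stated in the text.

```python
# Hypothetical mappings that follow the encoding described in the text.
AO_CODES = {"Very Good": 1, "Good": 2, "Average": 3, "Poor": 4}
YES_NO_CODES = {"Yes": 1, "No": 0}


def encode_profile(profile):
    """Return a copy of one respondent's record with AO and AvailGS encoded."""
    encoded = dict(profile)
    # Strip stray whitespace so that ' Average' and 'Average' map alike.
    encoded["AO"] = AO_CODES[profile["AO"].strip()]
    encoded["AvailGS"] = YES_NO_CODES[profile["AvailGS"].strip()]
    return encoded


# Made-up respondents, for illustration only.
respondents = [
    {"AO": "Very Good", "AvailGS": "Yes"},
    {"AO": " Average", "AvailGS": "No"},
    {"AO": "Poor", "AvailGS": "Yes"},
]
encoded_rows = [encode_profile(r) for r in respondents]
print(encoded_rows)
```

Keeping the mapping explicit also documents the direction of the ordinal scale, which matters when class labels are later interpreted in the results.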

Proposed model
This section includes the proposed methods for (i) Socio-economic factors identification affecting the agricultural productivity (Sect. 4.1), and (ii) Designing multi-class adaptive boosting ensemble learning-based model for prediction of agricultural productivity from optimal social-economic factors (Sect. 4.2).

Proposed model for socio-economic factor analysis based on extra-tree classifier
This section presents the proposed Extra-tree learning based model for socio-economic factor analysis. The Logit model is limited in representing random variation, and it is unable to handle unobserved factors that are correlated over time. The Probit model can handle these issues of temporally correlated errors. However, the limitation of the Probit model is that it requires all the data to follow normal distributions. In many real-life events, normal distributions provide an inappropriate representation of the random components and may lead to poor prediction. Therefore, in this work, machine learning based socio-economic factor selection has been used for more effective results. Machine learning models are capable of finding the variables that contain information relevant to the target variable. In addition, they are able to prune out the variables that would only add noise to the predictions. The Logit and Probit models are designed for inference about the relationships between independent and dependent variables, whereas machine learning models are efficient in terms of target prediction. The proposed model employs the Extra trees classifier (31) for the selection of optimal socio-economic factors. The proposed method of socio-economic factor selection using the Extra trees classifier is presented in Algorithm 1 and Figure 3.
3. Create a dataset sample $X_k \subseteq X$ of $k$ random socio-economic factors from the factor-set, where $k \le N$.
4. Design a Decision Tree $DT_k$ on the sampled data $X_k$ by selecting suitable factors for splitting (Fig.2) based on information gain (Eq.4) (32), using the Gini Index (Eq.5) (33) for the best splitting of the data.

Here, in Eq.4 and Eq.5, $IG$ is the information gain obtained after splitting the socio-economic factor set $F_{X_k}$ along the selected factor $F^{S}_{X_k}$, $InfoM_{gini}$ is the Gini information measure on the dataset $X_k$ with the selected factor $F^{S}_{X_k}$, and $P(ao_i \mid X_k)$ is the conditional probability of $ao_i$ given the data distribution $X_k$.
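To make the two quantities concrete, the following sketch computes them in their standard textbook form, which is an assumption here because the original equation bodies are not reproduced in this text: the Gini measure is taken as one minus the sum of squared class probabilities, and the gain as the parent's Gini measure minus the size-weighted Gini measure of the groups produced by the split. The function names and the toy AO, AvailGS, and marital-status values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini measure: 1 - sum_i P(ao_i)^2 over the AO classes in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(labels, factor_values):
    """Decrease in the Gini measure after splitting on a categorical factor."""
    n = len(labels)
    groups = {}
    for value, label in zip(factor_values, labels):
        groups.setdefault(value, []).append(label)
    # Size-weighted Gini measure of the child groups.
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted


# Toy data: four respondents with two AO classes.
ao = [1, 1, 2, 2]
avail_gs = [1, 1, 0, 0]   # separates the two AO classes perfectly
marital = [0, 1, 0, 1]    # carries no information about AO
print(gini(ao), gini_gain(ao, avail_gs), gini_gain(ao, marital))
```

A factor with a larger gain separates the AO classes better, which is the property the tree-building step relies on when choosing a split.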

5. Select an optimal list of socio-economic factors from the aggregates of the results of multiple Decision trees and prediction performance.
6. Sort the socio-economic factors in descending order according to the Gini Importance.
7. Select and return the top s (pre-defined) number of socio-economic factors.
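The ranking and selection steps can be illustrated with scikit-learn's `ExtraTreesClassifier`, whose `feature_importances_` attribute holds the impurity-based (Gini) importance aggregated over the trees. This is a minimal sketch on synthetic data, not the authors' implementation: the factor names `F0` to `F7`, the sample size, the number of trees, and the choice `s = 2` are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for the survey data: 8 hypothetical factors,
# of which only F0 and F3 drive the (binary) outcome.
rng = np.random.default_rng(0)
n_profiles, n_factors = 500, 8
X = rng.normal(size=(n_profiles, n_factors))
y = (X[:, 0] + X[:, 3] > 0).astype(int)
factor_names = [f"F{i}" for i in range(n_factors)]

# Pool of randomized trees built with the Gini criterion.
forest = ExtraTreesClassifier(n_estimators=200, criterion="gini", random_state=0)
forest.fit(X, y)

# Sort factors by aggregated Gini importance (descending) and keep the top s.
s = 2
order = np.argsort(forest.feature_importances_)[::-1]
selected = [factor_names[i] for i in order[:s]]
print(selected)
```

On real survey data the value of `s` would be chosen by the analyst, for example by checking prediction performance for several candidate values.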

Proposed multi-class adaptive boosting ensemble learning-based model for prediction of agricultural productivity
This section presents the proposed meta-estimator based model for the prediction of agricultural productivity from the socio-economic factors (AH, EQHHH, HS, HPF, HPQ, EQHQ, NOD, ALU(ACRE), FRT, LS, LT(I), LT(NI), AvailGS, LL, and UF) (Figure 6) selected by using Algorithm 1. Let $X = \{(X_i, ao_i)\}_{i=1}^{n}$ be the recorded socio-economic profiles of $n$ respondents, with instances of the various agricultural productivity (AO) labels, collected from the field study through structured interviews with questionnaires. Here $X_i$ (Eq. 6) denotes the $i$-th instance of the recorded socio-economic profiles and $ao_i$ represents the corresponding AO type. $X_i$ has $k$ selected optimal socio-economic factors out of the 44 socio-economic factors considered (Table 1 in the Appendix Section). The target variable AO has four classes, 'Very Good', 'Good', 'Average', and 'Poor', which represent the level of agricultural productivity of the respondents.
In Eq.6, $k$ is the number of socio-economic factors of a socio-economic profile in the dataset and $ao_i \in ao$ (Eq.7) is any one of the AO types.
In Eq.8, $ao_j^{X_i}$ represents the AO type predicted by the $j$-th classifier, $X_i$ denotes the $i$-th instance of a past socio-economic profile without AO type information, and $\Psi_{C_j}(X_i)$ is the prediction of the $j$-th classifier on $X_i$.
In this work, we have used Multi-class Adaptive Boosting as the model for prediction of the AO type. The proposed Multi-class Adaptive Boosting (35) model uses the Decision tree (DT) as the base classifier for the prediction of the AO type $ao_i$. Here, multiclass AdaBoost is used to boost the performance of DTs on multiclass classification problems. The proposed model is composed of five major steps: i) Initialization of the weight vector for each socio-economic profile $X_i$, ii) Sequential addition of a DT, $DT_t(X)$, by splitting along the features, iii) Prediction of AO by using each $DT_t(X)$, iv) Obtaining the vector of weighted prediction error and the weight parameter, and v) Updating the weight vector and repeating until the error reaches a threshold. The step-by-step computation is detailed in Algorithm 2. The proposed model predicts the AO type from $N$ DTs constructed from weighted instances (socio-economic profiles) of the training data. The DTs are added sequentially and trained on the weighted instances. The prediction error is obtained in this process, which continues until a stopping criterion is met. Two stopping criteria are considered: i) no substantial improvement in prediction performance, or ii) the required/predefined number of DTs (i.e., $N$) has been created. The weighted aggregate of the predictions of the resulting pool of DTs gives the final AO prediction.

Algorithm 2: Multi-class Adaptive Boosting Ensemble Learning based Model AO Prediction
1. Initialize the weights (Eq.9) of each $X_i \in X$.
2. For t = 0 to N
i) Add a DT sequentially, $DT_t(X)$, by splitting along features using the information gain computation (Eq.4) with the Gini index (Eq.5).
ii) Predict AO by using $DT_t(X)$ (Eq.10): In Eq.10, $ao'$ is the vector of AO predictions and $DT_t(X)$ is the $t$-th Decision Tree applied on $X$.
iii) Select the model $DT_t(X)$ with the smallest weighted prediction error (Eq.11): In Eq.11, $e_t$ is the vector representing the weighted AO prediction error and $W_t$ is the $t$-th weight vector.
iv) Calculate the weight parameter (Eq.12) of the $t$-th model: In Eq.12, $\delta_t$ is the $t$-th model's weight parameter.
v) Apply re-weighting and update the weight of each socio-economic profile $X_i$ (Eq.13): In Eq.13, $W_{t+1}^{X_i}$ is the $(t+1)$-th weight of $X_i$ and $\theta$ is the normalization factor such that $\sum_{i=1}^{n} W_t^{i} = 1$.
vi) If ($e_t - e_{t+1} < \lambda$, where $\lambda$ is the threshold) then Break; Else Continue;
End_For
3. Return the final prediction (Eq.14): In Eq.14, $\Psi_{AdaBoost}(X)$ is the final prediction on $X$.
End_Algorithm
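The boosting loop can be sketched as follows. Because the bodies of Eq.9 to Eq.14 are not reproduced in this text, the sketch assumes the widely used SAMME form of multi-class AdaBoost: uniform initial weights, tree weight log((1 - e)/e) + log(K - 1) for K classes, exponential up-weighting of misclassified profiles followed by normalization, and a weighted vote for the final prediction. The function names, the synthetic four-class data, and the simplified stopping rule (fixed number of rounds, or an error that is zero or no better than chance) are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def samme_fit(X, y, n_rounds=25, max_depth=2):
    """Fit a SAMME-style boosted pool of decision trees (illustrative sketch)."""
    n = len(y)
    classes = np.unique(y)
    K = len(classes)
    w = np.full(n, 1.0 / n)                  # initial weights, uniform
    trees, alphas = [], []
    for _ in range(n_rounds):
        tree = DecisionTreeClassifier(max_depth=max_depth, criterion="gini",
                                      random_state=0)
        tree.fit(X, y, sample_weight=w)      # add a tree on the weighted data
        miss = tree.predict(X) != y          # predict AO and flag the errors
        err = float(np.sum(w[miss]))         # weighted error (w sums to 1)
        if err >= 1.0 - 1.0 / K:             # no better than chance: stop
            break
        alpha = np.log((1.0 - err) / max(err, 1e-12)) + np.log(K - 1.0)
        trees.append(tree)
        alphas.append(alpha)
        if err == 0.0:                       # perfect tree: nothing to re-weight
            break
        w = w * np.exp(alpha * miss)         # up-weight misclassified profiles
        w = w / w.sum()                      # normalize so the weights sum to 1
    return classes, trees, alphas


def samme_predict(model, X):
    """Weighted vote of the pooled trees."""
    classes, trees, alphas = model
    votes = np.zeros((len(X), len(classes)))
    for tree, alpha in zip(trees, alphas):
        pred = tree.predict(X)
        votes += alpha * (pred[:, None] == classes[None, :])
    return classes[np.argmax(votes, axis=1)]


# Synthetic four-class data standing in for encoded socio-economic profiles.
rng = np.random.default_rng(0)
centers = np.array([[-2.0, -2.0], [2.0, -2.0], [-2.0, 2.0], [2.0, 2.0]])
y = rng.integers(1, 5, size=400)             # AO classes 1..4
X = centers[y - 1] + rng.normal(size=(400, 2))

model = samme_fit(X, y)
train_accuracy = float(np.mean(samme_predict(model, X) == y))
print(len(model[1]), round(train_accuracy, 3))
```

In practice, scikit-learn's `AdaBoostClassifier` with a decision tree as the base estimator provides a maintained implementation of the same idea; the hand-written loop is shown only to expose the weight update steps.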

Simulation environment, system and parameter setup
The experiments have been conducted on a system with Windows 10 Pro 64-bit OS, an Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (8 CPUs), ~3.4GHz processor, and 4GB RAM. The proposed model has been simulated and tested in the Python programming environment. This programming environment and setup includes pandas and NumPy (for data analysis); Matplotlib and Mlxtend (for data visualization); sklearn (for pre-processing of data and classification models); classification-metrics (for performance measurement and analysis); Seaborn (for a high-level interface with informative statistical graphs for correlation analysis); statsmodels.api (for the experiments on the logit and probit models); and scipy and itertools (for scientific computing and efficient looping respectively). All the machine learning model parameters are set as per the baseline model and tested with a 70%-30% training and testing split.
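A 70%-30% split of this kind can be produced with scikit-learn's `train_test_split`. The sketch below uses synthetic data and assumes stratification on the AO label, in line with the stratified sampling mentioned in the results; the array shapes and the `random_state` value are arbitrary choices for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 200 profiles, 5 factors, 50 profiles per AO class.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.repeat([1, 2, 3, 4], 50)

# 70% training, 30% testing, preserving the class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)
print(X_train.shape, X_test.shape)
print(np.bincount(y_test)[1:])   # number of test profiles per AO class
```

Stratifying keeps each AO class represented in the same proportion in the training and test parts, which helps when some productivity levels are rare.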

Results and analysis
In this section, the results obtained from the proposed model for optimal socio-economic factor selection (Algorithm 1) and for prediction of AO type from the socio-economic profile (Algorithm 2) are presented. The results of the Logit and Probit models for socio-economic factor analysis are summarized in Tables 2 and 3, respectively. The Logit model result may be interpreted as follows: a unit increase in Rabi Crop farming (FD(RC)) results in a 33.23% increase in agricultural production. In the Probit model, however, the corresponding increase is likely to be 20.76% (Table 3 and Figure 4 (b)). The socio-economic factors selected by the Probit, Logit, and proposed Extra-tree learning models are listed in Table 4 in the Appendix. The data distribution of the factors selected by Algorithm 1 (Table 4) is presented in Figure 5, and Figure 6 presents the correlation matrix of the selected socio-economic factors.
Table 4. Selected socio-economic factors using Logit, Probit, and the proposed Extra-tree learning (columns: Technique Used, Selected Socio-economic Factors)
The AO prediction performance of the proposed prediction model (Algorithm 2) has been evaluated and compared with standard machine learning based models: DT, K-Nearest Neighbor (KNN) (36), Naïve Bayes (NB) (37), Random Forest (RF) (38), Multi-Layer Perceptron (MLP) (39), Linear Discriminant Analysis (LDA) (40), Linear Regression (LR) (41), Quadratic Discriminant Analysis (QDA) (42), and Stochastic Gradient Descent (SGD) (43). Performance metrics such as precision, recall, F1-score, and ROC-AUC are used to compare the models. The predictions of agricultural productivity by DT, KNN, MLP, RF, NB, LDA, LR, QDA, SGD, and the proposed ensemble based model are shown in Figure 7. Figure 8 presents the ROC curves for agricultural productivity labels 1 to 4, where class 1 indicates the label 'poor' and class 4 indicates the label 'very good'. The micro-average and macro-average ROC curves cover areas of 0.95 and 0.94, respectively, which are higher than those of the compared models. The class-wise areas under the ROC curve are 0.93 for class 1, 0.95 for class 2, 0.95 for class 3, and 0.93 for class 4. Hence, the proposed method performs better than the other models.
A detailed comparison of RF, KNN, DT, MLP, LR, and the proposed ensemble model in terms of precision, recall, and F1-score (both class-wise and overall) and accuracy is given for all classes in Table 5. Similarly, Table 6 presents the comparison for the other considered models: SGD, NB, LDA, and QDA. Tables 5 and 6 show that the proposed model outperforms the other models in prediction. The proposed socio-economic factor selection has also been compared with the Logit (17,19) (Table 2) and Probit (21) (Table 3) based factor selections, and the comparison is presented in Figure 6. Table 7 summarizes three major comparisons: i) performance of machine learning models with the optimal list of socio-economic factors obtained through the Logit model (Table 4); ii) performance of machine learning models with the optimal list of socio-economic factors obtained through the Probit model (Table 4); and iii) performance of machine learning models with the optimal list of socio-economic factors obtained through the proposed Extra-tree classifier based model (Table 4). Further, the proposed ensemble learning model based prediction of AO is found to be better than its counterparts. The accuracy of the proposed method is 88%, which is marginally better than that of the other models.
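The class-wise, micro-average, and macro-average ROC areas discussed above can be computed in a one-vs-rest fashion. The sketch below is illustrative only: it uses synthetic data with labels 1 to 4, sklearn's `AdaBoostClassifier` with default base learners, and an arbitrary seed, all of which are assumptions rather than the configuration used in this study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic stand-in data (assumption); labels shifted to 1..4.
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
y = y + 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # columns are ordered as clf.classes_

# One-vs-rest indicator matrix: one column per class.
y_bin = label_binarize(y_test, classes=clf.classes_)

# Class-wise area under the ROC curve.
per_class_auc = {
    int(c): roc_auc_score(y_bin[:, i], proba[:, i])
    for i, c in enumerate(clf.classes_)
}
# Micro-average pools all (sample, class) decisions; macro-average is the
# unweighted mean of the class-wise areas.
micro_auc = roc_auc_score(y_bin, proba, average="micro")
macro_auc = roc_auc_score(y_bin, proba, average="macro")

print(per_class_auc, micro_auc, macro_auc)
```

The micro-average weights every individual decision equally, so it is dominated by the larger classes, whereas the macro-average treats the four productivity classes equally.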
Table 8 compares the performance of the proposed model (Algorithm 2) for prediction of AO type, using Extra-tree model based socio-economic factor selection (Algorithm 1), with other machine learning models. Figure 9 shows the performance of the proposed prediction model as a function of the number of estimators. The overall comparative analysis is presented in Figure 10. Here, the data have been split into 70% and 30% using the stratified sampling method, and the performance is shown for DT, QDA, MLP, SGD, NB, LR, LDA, RF, KNN, and the proposed method. It is worth noting that, across all the result analyses, the proposed model outperforms all the other models.
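The two-stage pipeline and the dependence of performance on the number of estimators can be sketched as follows. This is a hedged illustration, not the authors' Algorithm 1 or Algorithm 2: the use of `SelectFromModel` with its default mean-importance threshold, the estimator counts, and the synthetic data are all assumptions introduced for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption).
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Stage 1: rank factors by Extra-trees importance and keep those at or above
# the mean importance (the SelectFromModel default for tree ensembles).
selector = SelectFromModel(ExtraTreesClassifier(n_estimators=100, random_state=0))
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Stage 2: multi-class adaptive boosting on the selected factors.
booster = AdaBoostClassifier(n_estimators=100, random_state=0)
booster.fit(X_train_sel, y_train)

# Test accuracy after each boosting round, i.e. accuracy versus the
# number of estimators.
staged_acc = list(booster.staged_score(X_test_sel, y_test))
print(X_train_sel.shape[1], "factors kept")
print("accuracy with all fitted estimators:", staged_acc[-1])
```

Fitting the selector on the training part only avoids leaking information from the test part into the factor selection.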
It is observed that the proposed Extra-tree learning and multi-class adaptive boosting meta-estimator based socio-economic factor analysis model performs better for analyzing and predicting agricultural productivity. However, it requires larger and more complex computation than the Logit and Probit models. On the other hand, the Logit and Probit models are inherently better for identifying the correlation between socio-economic factors. The Logit model, however, is limited in representing random variation for the unobserved factors, and the Probit model has issues with temporally correlated errors. The simulation results show that the proposed approach performs better in the prediction of agricultural productivity.

Conclusion
Although the Probit and Logit models for factor analysis have been found suitable for inferring the relationships between socio-economic factors (independent variables) and agricultural productivity (dependent variable), they are found to be poor at predicting agricultural productivity (target variable). The machine learning based socio-economic factor selection using the Extra-trees classifier is found capable of identifying the socio-economic factors that contain relevant information about the target variable, agricultural productivity. However, this approach to socio-economic factor selection requires heavier and more complex computation than the Probit and Logit based models. In socio-economic studies, the data collected from respondents are usually highly unstructured and random. Hence, relying on a single model's prediction is not sufficient to make a decision. In this study, an ensemble meta-learner has been used, which is a form of meta-learning that constructs a higher-level prediction model over the predictions of the considered base classifiers. This ensemble learning based approach is found to be better in terms of agricultural productivity prediction.
This work may serve as a framework for further study of socio-economic factors and as a supplement to the existing knowledge base for agricultural research in India and abroad, particularly in the area of agricultural productivity analysis. Further, it can be used as a system identification model for identifying the various socio-economic factors of respondents (farmers) and evaluating these factors towards sustainable agricultural productivity. The expected outcome of this project may be a data-driven operational system for evaluating the socio-economic factors that influence sustainable agricultural productivity in India, which can be extended to further studies abroad.