Artificial Neural Networks Based Integrated Crop Recommendation System Using Soil and Climatic Parameters

Objective: To develop a crop recommendation system based on location-specific soil and climatic conditions. Method: The study introduces a novel recommendation system that uses Artificial Neural Networks (ANN) to recommend a suitable crop. Crops are recommended based on (a) soil properties, (b) crop characteristics, and (c) climate parameters. Four crops, namely maize, finger millet, rice and sugarcane, are considered for the study. Depending on the degree of relationship and the limitations of the factors considered, the following suitability classes are established: (a) highly suitable: S1, (b) moderately suitable: S2, (c) marginally suitable: S3, and (d) not suitable. The system uses climate data from the Meteorological Survey of India and soil data of Hadonahalli and Durgenahalli of Doddaballapur (dist.), Karnataka, India. The user interface takes location-specific soil properties as real-time input and recommends a suitable crop considering this input and the climate parameters. Findings: To measure accuracy, the model was tested with both an ANN and a decision tree. The overall accuracy of the ANN is 96%, whereas the accuracy of the decision tree is 91.5%. Hence the results obtained from the ANN can be considered more accurate. Novelty: The number of models developed for crop recommendation is limited, and the proposed model is a promising aid in crop planning.
Keywords: Crop recommendation; ANN; Soil characters; Climate; MongoDB


Introduction
India is one of the major producers of agricultural products in the world. The agricultural sector employs 58% of the Indian population and contributes 17% of GDP (1). Crop yield depends on a variety of attributes such as soil conditions, rainfall, available sunshine, irrigation, fertilizer application, pests, and land preparation. A common difficulty is that Indian farmers do not choose crops according to the soil and climatic conditions (2). Considering that climate and soil properties have a direct influence on crop yield, there is a need to devise crop management practices based on soil and site suitability to maximize production (3). Weather and agriculture are strongly correlated, and it is necessary to adapt productively to changes in climate patterns (4). Climate-smart agriculture strategies are important for improving both the yield and its quality. Previous research (5) summarizes the effect of extreme weather conditions on crops.
The advent of precision agriculture has brought major changes to the field of agriculture, with a focus on irrigation methods, fertilizing, crop monitoring and yield prediction (6). Choosing a suitable crop for the location-specific soil parameters and climatic conditions is important for increasing productivity. Hence farmers must be empowered with tools that enable them to choose the right crop for the location-specific climatic and soil properties. Integrating machine learning into agricultural planning is a promising approach in developing countries and has led to applications such as crop yield forecasting, crop disease identification, and fertilizer management (2,7). However, the number of studies on recommendation systems for choosing the right crop is limited. A crop recommendation framework that considers location-specific parameters will benefit farmers. The work presented in this article aims at developing a recommendation model that suggests the most suitable crop depending on location-specific soil characteristics and climatic conditions. To the best of our knowledge, no similar recommendation model using ANN has been reported.
The paper is organized as follows. Section 2 presents a literature survey of the currently available web-based and mobile-based agricultural applications, Section 3 presents the proposed framework, Section 4 describes the implementation details, and Section 5 discusses the experimental results.

Related work
A substantial amount of work has been carried out on the impact of climate and soil on crop yield. Machine learning has considerable potential for using historical climate datasets to demonstrate the linkages between climate and crop performance. It is a new scientific dimension that uses data-intensive approaches to drive agricultural productivity. While Big Data handles the massive amount of data generated from farms by leveraging technologies such as cloud computing and the Internet of Things, machine learning techniques analyze the data and support decision making in smart farming (6). Significant work has been carried out on yield forecasting using machine learning techniques. Studies indicate that considering the effects of soil and climate parameters when calculating the final yield gives satisfactory results. The work in (8) suggests that novel applications of machine learning can improve agricultural operations, as algorithms can help in classifying, clustering, detecting, and predicting the environmental conditions that affect agricultural operations and in interpreting climate and weather-related risk in agriculture. The work in (9) proposes that combining machine learning with domain knowledge improves the conclusions drawn about climate impact on agriculture. Artificial neural networks (ANN) and multiple linear regression (MLR) have been used to predict the biomass yield of winter wheat from input features such as soil, precipitation, topographic and management factors, the amount of fertilizer (nitrogen, phosphorus, and potash) consumed, and the efficiency of water usage. The model has a coefficient of determination of 90% for the tillage method (10). Since soil and climate directly affect the yield, work on recommending crops based on climate and soil has also been explored. The authors of (11) propose a crop model that suggests suitable crops considering temperature, rainfall and soil pH by applying decision trees and logistic regression.
A data mining based crop recommendation system that considers only site-specific soil parameters uses an ensemble model of random tree, CHAID, K-NN and naïve Bayes algorithms to generate rules for recommendation (12). Decision tree, K-NN and random forest classifiers have been used to demonstrate a recommendation model that considers soil type, precipitation and temperature (13). Soil and climate information has also been incorporated to improve nitrogen recommendation for corn (14): the performance of eight different machine learning algorithms was observed on a dataset containing soil and weather variables, and the algorithms were assessed on their prediction of the nitrogen fertilizer recommendation. Along with the adaptability of crops to the environment, each crop species requires specific soil and site conditions for optimum growth (15). Based on the literature survey, the shortcoming we observed in these notable publications is that the authors considered fewer experimental parameters when developing the recommendation model. In our work we consider the suitability of soil properties and climatic properties together with evapotranspiration calculated using the Thornthwaite method. Our recommendation model is developed with Artificial Neural Networks and takes real-time input to suggest a suitable location-specific crop.

Materials and methods
The proposed framework builds a recommendation system that suggests a suitable crop by considering the physical properties of soil, climatic properties and crop characteristics. Choosing the right crop for the location-specific conditions contributes to an increase in crop yield (12). This recommendation system empowers farmers to decide on a suitable crop for plantation. It also helps government agencies to devise effective land management practices to increase productivity and maintain soil fertility.
The proposed framework consists of four main stages, as shown in Figure 1, which together correlate soil and climate properties with crop requirements. The data required for the study include climate parameters, physical properties of soil and crop characteristics. Climate parameters are obtained from the Meteorological Survey of India. The climate data cover a time span of 10 years (2007-2017). The crops maize, finger millet, rice and sugarcane are considered for the study; they are the important economic crops grown in the area. A document data store such as MongoDB is well suited to storing semi-structured data. MongoDB is an open-source document database that provides high availability, high performance and automatic scaling.

Data set used
In this study we focus on acquiring datasets for the locations Hadonahalli and Durgenahalli of Doddaballapur (dist.), Karnataka, India. The district lies at 13º20' north latitude and 77º31' east longitude. The dataset consists of distinctive soil and climate characteristics together with the crop requirements of maize, finger millet, sugarcane and rice. In addition to the standard weather parameters such as rainfall and temperature, other environmental aspects such as precipitation, humidity, wind speed, sunshine hours and potential evapotranspiration are considered. Daily meteorological data for the location were collected from the Agrometeorology Section, University of Agricultural Sciences, Bengaluru, for the period 2007 to 2017. The land and soil characteristics for this location were obtained from the National Bureau of Soil Survey and Land Use Planning (NBSS & LUP), Bengaluru. The soil dataset consists of 12 measures of soil physical properties. Among these measures, six attributes that are common and significant for crops have been considered (16): texture, soil pH, gravel code, erosion code, and the water-retention properties slope and depth. These soil parameters are indispensable for crop growth. Even though nutrient levels vary, soil efficiency directly affects crop growth (17). The crop requirement data consist of mean temperature, soil drainage, texture, depth, slope and length of growing period for every crop under study (3).

Potential Evapotranspiration (PET)
Potential Evapotranspiration (PET) is derived from the submitted data using the Thornthwaite method (18,19). The method estimates PET from T, the monthly temperature in degrees Celsius, and i, the monthly heat index.
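The Thornthwaite procedure used in this subsection can be sketched in Python as follows. The equations themselves are not reproduced in the text, so the sketch assumes the standard published formulation: a monthly heat index i = (T/5)^1.514, an annual index I equal to the sum of the twelve monthly values, an uncorrected PET of 16(10T/I)^a millimetres per month, and a correction by N/12 and d/30, where N is the theoretical sunshine hours and d the number of days in the month. The function names and the sample values are hypothetical and are not taken from the study data.

```python
def monthly_heat_index(temp_c):
    """Monthly heat index i = (T / 5) ** 1.514, taken as zero for T <= 0."""
    return (temp_c / 5.0) ** 1.514 if temp_c > 0 else 0.0


def thornthwaite_pet(monthly_temps_c, sunshine_hours, days_in_month):
    """Return corrected monthly PET (mm/month) for twelve months.

    Assumes the standard Thornthwaite constants; the paper's own
    equations are not available in the text.
    """
    # Annual heat index I: sum of the twelve monthly heat indices.
    annual_index = sum(monthly_heat_index(t) for t in monthly_temps_c)
    # Empirical exponent a, a cubic polynomial in I.
    a = (6.75e-7 * annual_index ** 3
         - 7.71e-5 * annual_index ** 2
         + 1.792e-2 * annual_index
         + 0.49239)
    pet = []
    for t, n, d in zip(monthly_temps_c, sunshine_hours, days_in_month):
        if t <= 0:
            pet.append(0.0)  # no evapotranspiration estimated at or below 0 C
            continue
        unadjusted = 16.0 * (10.0 * t / annual_index) ** a
        # Correct for the real month length (d) and sunshine hours (N).
        pet.append(unadjusted * (n / 12.0) * (d / 30.0))
    return pet


# Illustrative values only (not measurements from the study area).
temps = [21.5, 23.8, 26.4, 28.1, 27.3, 24.9, 23.9, 23.7, 23.8, 23.4, 22.0, 20.9]
hours = [11.4, 11.7, 12.0, 12.4, 12.7, 12.8, 12.7, 12.5, 12.1, 11.8, 11.5, 11.3]
days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
for month, value in enumerate(thornthwaite_pet(temps, hours, days), start=1):
    print(f"month {month:2d}: PET = {value:.1f} mm")
```

With N = 12 and d = 30 the correction factor is one, so the corrected and uncorrected values coincide; this is a convenient sanity check when adapting the sketch.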
The monthly heat index is calculated from the monthly mean temperature. The values obtained are then corrected according to the real length of the month and the theoretical sunshine hours, where N is the number of theoretical sunshine hours for each month and d is the number of days in each month.

Data storage
The raw data, provided in comma-separated values (.csv) format, are first loaded onto MongoDB clusters, or the given data are appended to the existing data. Each object stored in MongoDB consists of location-specific soil data and climate data. An example document, here for rice in suitability class S1, has the following form:
    Var Rice S1 {
        _id: Object_id ("7098203fd5g6758ab3k98734")
        Drainage: {"Imperfectly drained"}
        Texture: {"Clay", "silty", "clay", "clay loam", "silty clay loam"}
        …..
    }
The _id field holds the Object_id, a unique value generated for every document. Object_id values are 12 bytes long and comprise a 4-byte timestamp, a 5-byte random value and a 3-byte incrementing counter (20). MongoDB has dynamic schemas, which provide the flexibility to integrate data faster and more easily. MongoDB is designed as a data store that provides high performance, high availability and automatic scaling, and it is simple to install and deploy (21,22).
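As an illustration of the loading step, the sketch below turns CSV rows into nested documents with one soil and one climate sub-document per location. It uses only the Python standard library; the column names, the sample values and the helper names are hypothetical and are not taken from the study dataset.

```python
import csv
import io

# Hypothetical sample in the .csv layout assumed for this sketch.
SAMPLE_CSV = """location,texture,soil_ph,gravel_code,erosion_code,slope,depth,mean_temp_c,rainfall_mm
Hadonahalli,clay loam,6.8,1,2,3,4,24.1,820.5
Durgenahalli,sandy clay,7.2,2,1,2,3,24.3,805.0
"""

SOIL_FIELDS = ["texture", "soil_ph", "gravel_code", "erosion_code", "slope", "depth"]
CLIMATE_FIELDS = ["mean_temp_c", "rainfall_mm"]


def to_number(text):
    """Convert numeric strings to float and leave categorical values as text."""
    try:
        return float(text)
    except ValueError:
        return text


def rows_to_documents(csv_text):
    """Build one document per CSV row with soil and climate sub-documents."""
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        documents.append({
            "location": row["location"],
            "soil": {name: to_number(row[name]) for name in SOIL_FIELDS},
            "climate": {name: to_number(row[name]) for name in CLIMATE_FIELDS},
        })
    return documents


documents = rows_to_documents(SAMPLE_CSV)
# With a running MongoDB server and the pymongo driver, a list of
# dictionaries like this one could be stored with Collection.insert_many;
# that step is left out so the sketch runs without a database.
for document in documents:
    print(document)
```

Keeping soil and climate values in separate sub-documents is one possible design; it lets either group gain new fields later without changing the other, which is the kind of flexibility a dynamic schema allows.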

Artificial Neural Network
Artificial neural networks are nonlinear mathematical learning models designed by simulating biological neural networks. ANNs are able to process nonlinear datasets and map them to the output. The multilayer perceptron (MLP) is the most widely used ANN for nonlinear datasets. The network of an ANN model has three main layers: the input layer, the hidden layer and the output layer. Figure 3 shows the generalized structure of the multilayer perceptron model. Finding a suitable network structure is one of the major problems faced by researchers (23,24). There is no systematic approach in the literature for finding the structure of a neural network. One basic rule of thumb for choosing the number of hidden neurons is that it should lie between the size of the input layer and the size of the output layer (25). Researchers have adopted a trial-and-error method, choosing the hidden units and arranging them into hidden layers until the error reaches its minimum value.
Each neuron in the input layer receives an input from the user, and each input signal received is broadcast to the neurons of the hidden layer. Each unit in the hidden layer computes its output by summing the weighted input signals and applying the activation function using equation (4), where $W_H$ is the weight of the connection from the input unit to the hidden unit, $B_H$ is the bias, and $f$ is the activation function. Each output unit likewise sums its weighted input signals and applies its activation function to compute its output signal using equation (5).
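The computation performed by one layer of units, a weighted sum of the inputs plus a bias passed through an activation function, can be sketched as follows. This is a minimal illustration under assumed names (layer_output, weights, biases); the weight values are arbitrary and the layout weights[j][i], meaning the weight from input i to unit j, is a choice made for this sketch.

```python
def layer_output(inputs, weights, biases, activation):
    """Compute the outputs of one fully connected layer.

    weights[j][i] is the weight from input i to unit j and biases[j]
    is the bias of unit j (naming assumed for this sketch).
    """
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        # Weighted sum of the input signals plus the bias of the unit.
        weighted_sum = bias + sum(w * x for w, x in zip(unit_weights, inputs))
        outputs.append(activation(weighted_sum))
    return outputs


def identity(value):
    return value


def relu(value):
    return max(0.0, value)


inputs = [1.0, 2.0]
weights = [[0.5, -1.0], [1.0, 1.0]]  # two units, two inputs each
biases = [0.5, 0.0]

print(layer_output(inputs, weights, biases, identity))  # raw weighted sums
print(layer_output(inputs, weights, biases, relu))      # after the activation
```

The same function serves for hidden and output layers; only the activation passed in differs.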
The activation functions used in a neural network are mathematical equations that decide whether a neuron should produce an output. An activation function is attached to every neuron in the network, so it can be considered a gate between the input to the current neuron and the output of that neuron. Nonlinear activation functions are required to model complex data and predict the output (26). The Rectified Linear Unit (RELU) is the most widely used activation function (27). RELU outputs zero for inputs less than zero and returns the input unchanged for inputs greater than zero.
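For concreteness, the rectified linear unit is conventionally defined as max(0, x): negative inputs map to zero and positive inputs pass through unchanged, while its derivative, which backpropagation needs, is one for positive inputs and zero otherwise. The short sketch below shows both; the function names are chosen for this example.

```python
def relu(x):
    """Rectified linear unit: zero for negative inputs, x itself otherwise."""
    return x if x > 0 else 0.0


def relu_derivative(x):
    """Slope of ReLU: one for positive inputs, zero otherwise."""
    return 1.0 if x > 0 else 0.0


for value in [-2.0, -0.5, 0.0, 0.5, 3.5]:
    print(value, relu(value), relu_derivative(value))
```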
The softmax function is a suitable choice for multiclass classification (28). It is useful because it converts scores into a normalized probability distribution: the output of the neural network passes through the softmax activation function, which converts the scores into probability values that sum to one. For a score vector $z$ over $K$ classes, the softmax output function can be expressed as:
$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$
Backpropagation is the most widely used algorithm for training multilayer perceptron neural networks. It is used in feed-forward networks, where input signals are sent forward and errors are propagated backwards to adjust the weights so as to minimize the output error. During training this process is repeated, and each repetition is termed an epoch (29). In each repetition the output error is calculated, and each weight $w_i$ is updated by adding a correction factor $\Delta w_i$, which is computed from the learning rate, the input $x_i$, and $\delta_j$, the difference in the output for $x_i$ as the input. ANNs are increasingly adopted in the agriculture sector because of their advantages over traditional decision models (13,30,31).
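The softmax function and the weight update described above can be sketched as follows. The update shown is the common delta-rule form, in which the correction for each weight is the product of the learning rate, the error term and the corresponding input; this form is assumed here because the update equations are not reproduced in the text. Subtracting the maximum score before exponentiating is a standard numerical-stability step and does not change the result.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to one."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shifted for stability
    total = sum(exps)
    return [e / total for e in exps]


def update_weights(weights, inputs, delta, learning_rate):
    """Delta-rule update (assumed form): w_i + learning_rate * delta * x_i."""
    return [w + learning_rate * delta * x for w, x in zip(weights, inputs)]


probabilities = softmax([1.0, 2.0, 3.0])
print(probabilities, sum(probabilities))

new_weights = update_weights([0.1, 0.2], [1.0, 2.0], delta=0.5, learning_rate=0.1)
print(new_weights)
```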

Implementation
The system is implemented as a Python script running on Flask, a lightweight web framework, on an EC2 instance. The experimental dataset consists of four sets of crop data, for the crops rice, maize, sugarcane and finger millet. Neural networks recognize numerical patterns, so all real-world data must be converted into numerical form. The available dataset is categorical; it is converted into numerical form using one-hot encoding, which enables easy implementation of machine learning algorithms (32,33). The dataset is split into training data and test data in the ratio 60:40. A four-layered ANN architecture is proposed, with 1 input layer with 6 neurons, 2 hidden layers with 5 neurons, and 1 multiclass output layer with 4 neurons, as shown in Figure 4. The model was trained with the backpropagation algorithm. Mean square error (MSE) was the decisive parameter in training the model. MSE is calculated as shown in Eqn. 9:
$$\mathrm{MSE} = \frac{1}{n}\sum_{k=1}^{n}\left(T_{o,k} - T_{r,k}\right)^2$$
where $T_o$ is the original value, $T_r$ is the recommended value, and $n$ is the number of samples.
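The preprocessing and error measure described in this paragraph can be sketched with the standard library alone. The helper names, the sample categories and the fixed random seed are assumptions made for illustration; the study itself may have used library routines instead.

```python
import random


def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector with a single 1."""
    categories = sorted(set(values))
    position = {category: k for k, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1
        encoded.append(row)
    return categories, encoded


def split_dataset(rows, train_fraction=0.6, seed=0):
    """Shuffle and split rows into training and test parts (60:40 by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def mean_squared_error(original, recommended):
    """Average of squared differences between original and recommended values."""
    return sum((o - r) ** 2 for o, r in zip(original, recommended)) / len(original)


categories, encoded = one_hot_encode(["clay", "loam", "clay"])  # hypothetical textures
print(categories, encoded)

train, test = split_dataset(range(10))
print(len(train), len(test))

print(mean_squared_error([1.0, 0.0, 0.0], [0.5, 0.0, 0.0]))
```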
The model was trained with the backpropagation algorithm. The Rectified Linear Unit (RELU) activation function was used in the input and hidden layers. The output of the system is a measure of how probable it is that a certain crop will grow under the given soil and weather conditions. The model compares the probabilities of maize, finger millet, rice and sugarcane and ranks the crops from the best choice downwards. Hence the softmax activation function is used in the output layer. Training of the ANN model was stopped after 89 epochs, when the minimal MSE of 0.0005 was obtained.
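To make the ranking step concrete, the sketch below builds a network with layer sizes 6, 5, 5 and 4, applies ReLU in the hidden layers and softmax at the output, and sorts the four crops by probability. It assumes 5 neurons in each hidden layer and uses randomly initialised, untrained weights, so the printed ranking has no agronomic meaning; the feature vector and all names are hypothetical.

```python
import math
import random

CROPS = ["maize", "finger millet", "rice", "sugarcane"]
LAYER_SIZES = [6, 5, 5, 4]  # input, two hidden layers (assumed 5 each), output


def relu(x):
    return max(0.0, x)


def softmax(scores):
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_network(layer_sizes, seed=0):
    """Create random (untrained) weights and zero biases for each layer."""
    rng = random.Random(seed)
    network = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        network.append((weights, [0.0] * n_out))
    return network


def forward(network, inputs):
    """Propagate inputs through the network and return class probabilities."""
    activations = list(inputs)
    for index, (weights, biases) in enumerate(network):
        sums = [b + sum(w * a for w, a in zip(row, activations))
                for row, b in zip(weights, biases)]
        is_output_layer = index == len(network) - 1
        activations = softmax(sums) if is_output_layer else [relu(s) for s in sums]
    return activations


def rank_crops(network, features):
    """Return (crop, probability) pairs sorted from most to least probable."""
    probabilities = forward(network, features)
    return sorted(zip(CROPS, probabilities), key=lambda pair: pair[1], reverse=True)


network = init_network(LAYER_SIZES)
example_features = [0.2, 0.7, 0.1, 0.4, 0.9, 0.5]  # six hypothetical encoded inputs
ranking = rank_crops(network, example_features)
for crop, probability in ranking:
    print(f"{crop}: {probability:.3f}")
```

In a trained system the weights would come from backpropagation rather than from a random generator; the forward pass and the ranking logic stay the same.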
A decision tree classifier is trained using the ID3 algorithm over the same dataset. Many published works have used a decision tree classifier for the recommendation model (11,13), so the decision tree is used as a baseline for comparison with the results obtained from the ANN.
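ID3 grows the tree by repeatedly splitting on the attribute with the highest information gain, that is, the largest reduction in the entropy of the class labels. The sketch below shows only this selection step, not a full tree builder; the attribute names, rows and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Attribute ID3 would split on first: the one with the highest gain."""
    return max(attributes, key=lambda name: information_gain(rows, labels, name))


# Hypothetical training rows with suitability classes as labels.
rows = [
    {"texture": "clay", "drainage": "good"},
    {"texture": "clay", "drainage": "poor"},
    {"texture": "sand", "drainage": "good"},
    {"texture": "sand", "drainage": "poor"},
]
labels = ["S1", "S1", "S3", "S3"]

print(information_gain(rows, labels, "texture"))
print(information_gain(rows, labels, "drainage"))
print(best_attribute(rows, labels, ["texture", "drainage"]))
```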

Results and discussions
The results obtained from the model are analyzed for different crops over the same test data. Using the same test data provides an unbiased evaluation of the model fit on the training dataset. A decision tree classifier is applied to the same dataset for a comparative analysis of the results obtained from the ANN.
Suitability is recommended in terms of class. The performance of the decision tree and ANN models in recommending a suitable crop is measured and tabulated in Table 6. From the accuracy results, it can be concluded that the ANN performs better than the decision tree model for all the crops, so the ANN can be used for effective crop recommendation with location-specific soil and climatic parameters. Figure 5 a-d shows plots of the suitability classes against the location-specific data. The plotted suitability classes are the original values and the recommended classes obtained from the ANN. From the plots we can observe that the accuracy obtained by the ANN is satisfactory.
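Accuracy in this comparison is the fraction of test samples whose recommended suitability class matches the original class. A minimal sketch of computing it overall and per crop is given below; the records are invented for illustration, and the label "N" for the not-suitable class is an assumption, since the text gives no code for that class.

```python
from collections import defaultdict


def accuracy(actual, predicted):
    """Fraction of samples whose predicted class equals the actual class."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


def accuracy_by_crop(records):
    """Group (crop, actual, predicted) records and score each crop separately."""
    grouped = defaultdict(lambda: ([], []))
    for crop, actual, predicted in records:
        grouped[crop][0].append(actual)
        grouped[crop][1].append(predicted)
    return {crop: accuracy(a, p) for crop, (a, p) in grouped.items()}


# Hypothetical test records, not results from the study.
records = [
    ("rice", "S1", "S1"),
    ("rice", "S3", "S2"),
    ("maize", "S2", "S2"),
    ("maize", "N", "N"),
]

print(accuracy([r[1] for r in records], [r[2] for r in records]))
print(accuracy_by_crop(records))
```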

Conclusion
The presented recommendation model is an effective method for solving the problem farmers face in choosing the right crop for the cropping season. The model is calibrated and tested with data from a single region in India, so it cannot be generalized to different soil types as it stands. Hence, soil samples from different regions are needed to generalize the model. Additionally, a decision tree classifier is trained to validate the performance of the present model developed using ANN. The measured accuracy values show that the recommendation model developed with ANN performs better, with 97% accuracy compared to the 92% accuracy obtained from the decision tree classifier. Furthermore, ANN performs better for larger datasets.
In this work a well-trained artificial neural network is used to recommend a suitable crop based on location-specific soil data and historical weather data. In the future, the proposed system can be extended to take into account market demand, availability of market infrastructure, expected profit, post-harvest storage, and processing technologies. This would provide a comprehensive crop recommendation based on geographical, environmental, and economic aspects, leading to a successful agricultural system.