Analysing Soil Data using Data Mining Classification Techniques



Introduction
Data Mining (DM) has become popular in agriculture for soil classification, wasteland management, and crop and pest management. The authors of [1] assessed a variety of association techniques in DM, applied them to a soil science database to find meaningful relationships, and provided association rules for different soil types in agriculture. Similarly, agricultural prediction, disease detection and pesticide optimization have been analyzed with various data mining techniques [2]. In [3], the J48 classification algorithm was analyzed and found to predict the soil fertility rate with high accuracy. The authors of [4] investigated the uses of various DM techniques for knowledge discovery in the agriculture sector and presented knowledge discovery in the form of association rules, clustering, classification and correlation. In [5], soil fertility classes were predicted using the Naïve Bayes, J48 and K-Nearest Neighbor classification algorithms.
In [6], data mining techniques were adopted to estimate crop yield. Multiple Linear Regression (MLR) was used to find the linear relationship between dependent and independent variables, and K-Means clustering was used to form four clusters with rainfall as the key parameter. The authors of [7] analyzed the vegetative factors of landslides in the Shimen reservoir watershed in northern Taiwan. Decision tree and Bayesian Network data mining techniques and non-linear approaches were implemented, and the optimization-based Bayesian Network approach was considered better than the non-linear approaches. The authors of [8] analyzed the significance of soil fertility and crop management factors in predicting maize yields and in determining the yield variability and the gap between farmers. Classification and regression tree analysis was used to predict the result. In [9], two comprehensive methods were investigated to calculate the production-related yield gap and a soil-fertility-related nutrient balance. The methodology allows knowledge to be carried from the microscale to higher-scale levels and determines land quality. In [10], soil attributes were predicted and soil data were analyzed using classification techniques. Soil properties such as pH value, Electrical Conductivity (EC), Potassium, Iron and Copper were classified using the Naïve Bayes, J48 and JRip classification algorithms. Among these algorithms, J48 was considered a simple classifier and produced the best result.

Agricultural Data Mining
Data mining is essential for discovering agriculture-related knowledge such as soil fertility, yield prediction and soil erosion. Soil prediction helps with soil remediation and crop management. Classification algorithms find rules that partition the data into disjoint groups. Such a classification process generates a set of classification rules, which can be used to classify future data [11].
The following sections explain the classification algorithms used in this work: the Naive Bayes classifier, the J48 decision tree classifier and the JRip classifier.

Naive Bayes
The Naive Bayes classifier belongs to a family of simple probabilistic classification techniques in machine learning. It is based on Bayes' theorem together with the assumption that the features are independent of one another given the class. For a given instance, the probability of each class label is estimated and the most probable label is chosen. It needs only a small amount of training data to predict the class label [12].
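The idea can be illustrated with a minimal sketch. It assumes numeric attributes modeled with a Gaussian per class, which is one common choice and not necessarily the configuration used in this work. The function names and the (pH, EC) readings are hypothetical and were invented for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate a prior and per-feature (mean, variance) for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # The small constant keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        # Independence assumption: per-feature log likelihoods simply add up.
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical (pH, EC) readings, not taken from the paper's dataset.
train_rows = [(8.1, 0.30), (8.4, 0.25), (7.9, 0.35),
              (6.8, 0.10), (7.2, 0.15), (6.5, 0.12)]
train_labels = ["Black", "Black", "Black", "Red", "Red", "Red"]
nb_model = fit_gaussian_nb(train_rows, train_labels)
```

Calling `predict_gaussian_nb(nb_model, (8.2, 0.30))` then scores both classes and returns the more probable one.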

J48 (C4.5)
J48 is a decision tree classification algorithm; it is Weka's slightly modified implementation of C4.5, an algorithm proposed by Ross Quinlan. C4.5 is also referred to as a statistical classifier. J48 predicts the dependent variable from the available data. It builds a tree from the attribute values of the training data, selecting at each node the test that gives the best information gain, so that instances are classified by the features that are most informative. Tolerance to errors in the training data is achieved through pruning [13,14].
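How a split is chosen by information gain can be sketched as follows. This is an illustrative simplification: it evaluates binary threshold tests on a single numeric attribute using plain information gain, whereas C4.5 normalizes the gain into a gain ratio. The function names and the pH readings are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on `value <= threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

def best_threshold(values, labels):
    """Pick the observed value whose split yields the highest gain."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split the data
    return max(candidates, key=lambda t: information_gain(values, labels, t))

# Hypothetical pH readings, invented for illustration only.
ph_values = [6.5, 6.8, 7.2, 7.9, 8.1, 8.4]
soil_labels = ["Red", "Red", "Red", "Black", "Black", "Black"]
split_point = best_threshold(ph_values, soil_labels)
```

On this toy data the split that separates the two classes perfectly removes all of the entropy, so it is the one selected.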

JRip
The optimized version of IREP is Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by William W. Cohen. This algorithm is a propositional rule learner, and the JRip classifier is its implementation in Weka. It builds a set of classification rules and relies on reduced-error pruning to keep them from overfitting. In this algorithm, the training data is split into two sets, a growing set and a pruning set. Rules are formed on the growing set, and pruning operators are then applied to reduce the error of each rule on the pruning set.
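The growing and pruning steps can be sketched roughly as below. This is a simplified illustration, not Weka's JRip code: the two-thirds/one-third split and the pruning score (p - n) / (p + n) follow the usual description of RIPPER and are stated here as assumptions, and the rule representation, function names and data are hypothetical.

```python
import random

def split_grow_prune(rows, seed=0):
    """Shuffle and split into a growing set (two thirds) and a pruning set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]

def covers(rule, features):
    """A rule is a list of (attribute, operator, limit) conditions, all required."""
    return all(features[name] > limit if op == ">" else features[name] <= limit
               for name, op, limit in rule)

def prune_value(rule, prune_set, target):
    """Score (p - n) / (p + n) over the pruning examples the rule covers."""
    p = sum(1 for f, label in prune_set if covers(rule, f) and label == target)
    n = sum(1 for f, label in prune_set if covers(rule, f) and label != target)
    return (p - n) / (p + n) if p + n else -1.0

def prune_rule(rule, prune_set, target):
    """Drop trailing conditions; keep the best-scoring prefix (ties keep the longer rule)."""
    prefixes = [rule[:k] for k in range(len(rule), 0, -1)]
    return max(prefixes, key=lambda r: prune_value(r, prune_set, target))

# Hypothetical overfitted rule and pruning set, invented for illustration.
grown_rule = [("pH", ">", 7.7), ("EC", "<=", 0.2)]
pruning_examples = [
    ({"pH": 8.0, "EC": 0.30}, "Black"),
    ({"pH": 8.2, "EC": 0.60}, "Black"),
    ({"pH": 8.1, "EC": 0.15}, "Black"),
    ({"pH": 7.8, "EC": 0.10}, "Red"),
]
pruned_rule = prune_rule(grown_rule, pruning_examples, "Black")
```

Here the second condition lowers the score on the pruning set, so it is removed and only the pH test remains.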

Results and Discussion
In this work, we collected an agricultural soil dataset from the soil testing laboratory in Virudhunagar District. The dataset has 110 records with the attributes Village Name, Soil Type or Color, Soil Texture, pH, EC (Electrical Conductivity), Lime Status and Phosphorus. The system predicts the soil type, Red or Black, from the pH and EC values. The pH of Black soil was found to be greater than 7.7 and that of Red soil less than 7.7. We applied three classification algorithms, JRip, J48 and Naive Bayes, to predict the soil type. JRip considers all of the attributes, whereas J48 considers only the pH and EC values and builds its tree from these two attributes. JRip generates rules efficiently and performs well on this soil dataset. Of the three algorithms, JRip gave the highest accuracy. Here, the full dataset was used as the training set.
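The pH finding above can be written as a one-line rule and scored on the same records it was derived from, mirroring the use of the full dataset as the training set. This is only a sketch: the function names and sample records are hypothetical, and assigning a pH of exactly 7.7 to Red is an assumption, since the text does not say how that case is handled.

```python
def predict_soil_type(ph, threshold=7.7):
    """Black above the threshold, Red otherwise (tie handling is an assumption)."""
    return "Black" if ph > threshold else "Red"

def training_accuracy(records):
    """Fraction of (pH, label) records the rule classifies correctly."""
    correct = sum(1 for ph, label in records if predict_soil_type(ph) == label)
    return correct / len(records)

# Hypothetical records, not taken from the paper's dataset.
sample_records = [(8.1, "Black"), (7.9, "Black"), (7.2, "Red"),
                  (6.8, "Red"), (7.8, "Red")]
rule_accuracy = training_accuracy(sample_records)
```

Because the rule is evaluated on the data used to find it, the resulting figure is a training-set accuracy and may overstate performance on unseen samples.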
On the training dataset, the weighted average True Positive (TP) rate of the JRip classifier is 0.982. For J48 and Naïve Bayes the TP rates are 0.97 and 0.86 respectively, which are lower, so JRip classified the dataset most accurately. Soil properties differed between sites with Red textured soils and sites with Black textured soils. Soil with a pH below 7.0 is acidic and soil with a pH above 7.0 is alkaline. The spectral analysis was sufficiently sensitive to capture the variation in soil fertility between the different soil natures. The soil dataset, which contains attributes such as soil type and pH value, is shown in Figure 1. The dataset was organized in an Excel sheet and saved in CSV format. The number of incorrectly classified instances and the error rate of JRip are given in Figure 2. The comparative analysis of the classifiers is given in Table 1. JRip classified better than the other algorithms, and its Kappa statistic is the closest to 1.00. The high prediction accuracy of JRip is shown in Figure 4. Naive Bayes is less accurate than J48 and JRip.
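The two summary measures used in this comparison can be computed from a confusion matrix, as in the sketch below. The matrix is hypothetical and does not reproduce the paper's results; the function names are also illustrative. Rows are actual classes and columns are predicted classes.

```python
def weighted_tp_rate(matrix):
    """Per-class TP rate averaged with weights equal to each class's share."""
    total = sum(sum(row) for row in matrix)
    rate = 0.0
    for i, row in enumerate(matrix):
        support = sum(row)
        if support:
            rate += (support / total) * (row[i] / support)
    return rate

def kappa_statistic(matrix):
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical two-class confusion matrix (e.g. Red, Black), for illustration only.
example_matrix = [[48, 2],
                  [3, 47]]
example_tp_rate = weighted_tp_rate(example_matrix)
example_kappa = kappa_statistic(example_matrix)
```

Note that the weighted average TP rate equals the overall fraction of correctly classified instances, while kappa additionally discounts the agreement expected by chance, which is why a value close to 1.00 indicates a strong classifier.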

Conclusion and Future Work
In this paper, a comparative analysis of three algorithms, Naïve Bayes, JRip and J48, is presented. The JRip classification algorithm gives the best result on this dataset, correctly classifying more instances than the other two. JRip can therefore be recommended for predicting soil types.