GIS-Based Hotspot and Cold-spot Analysis for Primary Education in India
Imane Ali Saleh and Perumal Balakrishnan

Objectives: This study explores the status of primary education across the states and union territories (UTs) of India. The main objectives are to identify hotspots and cold-spots and to delineate clusters of primary education levels in India by applying Getis-Ord Gi* statistics. Methods/analysis: Spatial clustering among the Indian states was determined using ArcGIS cluster-outlier analysis. An area is considered statistically significant if its p-value is below 0.05. To define India's spatial clusters of elementary EDI scores, hotspot and cold-spot analysis was carried out using Getis-Ord Gi* statistics with a fixed distance band in ArcGIS. Findings: The enrolment rate of 6–10-year-olds (grades 1–5) in primary education rose from 42.6% in 1950–1951 to 100.8% in 2014–2015. The resulting education development index (EDI) of primary education shows a clustering of hotspots and cold-spots. Spatial outliers are also identified. The EDI demonstrates a geographic trend in the growth of primary education over 10 years, in the form of high or low primary education clusters, and helps measure the government's efforts. Improvements: This study could be extended to district-level primary education in India, which would also show how GIS analysis works for many more units over time. Such an analysis may be interesting, as the GIS and visual approach could add to our knowledge.
Keywords: GIS, Hotspot Analysis, Cluster, Primary Education, Arc GIS, India


Introduction
According to estimates by the Unique Identification Authority of India, the population of India in 2018 was 1.34 billion. 1 This is 17.74% of the world population. Education is of the utmost importance for human progress. 2 According to the Department of School Education and Literacy of the Indian Ministry of Human Resource Development (MHRD), a 7-year-old child is considered literate if he or she can read, write, and understand a language. An adult (an individual aged 15 to 24 years) is considered literate if he or she can read, write, and understand commonly used statements in a language. Based on this classification, the ministry's latest statistical reports show an improvement in the percentage of literate adult females between 2001 and 2011 (from 47.8% to 59.3%); however, these numbers are still low, since they indicate that 40.7% of future mothers were unable to read (Table 1). Furthermore, the percentage of illiterate females was highest (59.8%) among the Scheduled Castes (SC), the group of Indians living in hard conditions outside villages. Table 2 shows that children's education also improved between 2001 and 2011, although a great deal of work is still required to fight illiteracy and therefore poverty. 3 According to the UNESCO Institute for Statistics, the worldwide literacy rate was 86.3% in 2015, while the literacy rate in India was 72.1%, with an 18.1% difference between males and females. This shows that India still has a long way to go in improving education in general and in closing the gap between male and female education in particular. 4
The primary education level is a critical stage of a student's life. Given the mental, physical, and emotional changes that students undergo within this age group (6 to 14 years old), this school stage is a critical time during which students should develop a sense of belonging, a sense of responsibility, and self-confidence in their own abilities. At this stage, classroom activities, hand-in-hand with the social environment in school, affect students' development and their desire to change their situations to achieve a better life. 5 Recently, variation in the elementary education curriculum and in school facilities has become an important aspect of elementary education outcomes. Therefore, the authorities have planned for the Universalization of Elementary Education (UEE). UEE includes three important areas: first, universalization of provision, which means that all children aged between 6 and 14 years have access to a school; second, universalization of enrollment, which ensures that all children in this age group are enrolled in school; and finally, universalization of retention, which ensures that students who join elementary school continue until they graduate from the upper elementary level. 6,7
In order to evaluate primary and upper elementary education throughout India, the government (MHRD), in coordination with the National University of Educational Planning and Administration (NUEPA), has developed a computerized Educational Development Index (EDI) that helps in assessing four main areas: school access, infrastructure, teachers, and education outcomes. These areas were carefully chosen based on the experiences of countries with successful education systems and on the economic conditions in the country. In India, the GDP per capita is $1709 per year, compared to $57,600 per year in the USA. 8 This figure indicates poverty; therefore, if a school is not easily accessible, parents cannot afford long-distance transportation. Children spend most of their waking time in school, which makes school infrastructure, such as the availability of washrooms and drinking water, of major importance. 9 Teachers and teacher-classroom interaction are the main factors in the educational journey, not only in curriculum delivery but also in the development of students' social and behavioral skills. 10 A study conducted in the USA showed that a high percentage of the variance in the Social, Emotional, and Behavioral (SEB) scores of fourth- and fifth-grade students was teacher-specific and/or influenced by classroom characteristics. 11
The EDI consists of 23 indicators, categorized into positive indicators, such as the gross enrolment ratio, and negative indicators, such as the percentage of habitations not served (Table 3). For a positive indicator, the highest value is considered the best value and the lowest the worst. Conversely, for a negative indicator, the lowest value is the best and the highest is the worst.
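The best/worst convention for positive and negative indicators can be illustrated with a small sketch. It assumes a simple min-max rescaling to the range 0 to 1; the source does not state the exact normalization formula used for the EDI, so the function name `normalize_indicator` and the sample values are hypothetical.

```python
def normalize_indicator(values, positive=True):
    """Rescale raw indicator values so that 1.0 is the best and 0.0 the worst.

    Assumption: min-max rescaling. For a positive indicator the highest raw
    value is best; for a negative indicator the lowest raw value is best.
    """
    best = max(values) if positive else min(values)
    worst = min(values) if positive else max(values)
    if best == worst:
        # All states share the same value, so none is better than another.
        return [1.0 for _ in values]
    return [(v - worst) / (best - worst) for v in values]


# Hypothetical gross enrolment ratios (positive indicator) for three states.
print(normalize_indicator([80, 100, 90], positive=True))
# Hypothetical percentages of habitations not served (negative indicator).
print(normalize_indicator([5, 0, 10], positive=False))
```

With this convention, both kinds of indicator point in the same direction after rescaling, so they can be combined into a single score.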

Applications of Getis-Ord Gi* Statistic
In this research study, two types of geostatistical methods are used: one finds similarity patterns, and the other identifies anomalous values alongside similarity patterns. Hotspot analysis (Getis-Ord Gi*) is used to find similarity patterns, and cluster and outlier analysis (Anselin Local Moran's I) is used to identify anomalous values. For the hotspot analysis, z-scores and p-values are calculated for each polygon based on the Getis-Ord Gi* statistic, 12 and statistically significant hotspots and cold-spots are identified. The Getis-Ord Gi* method is used to determine the trend (clustering) in the attributes of spatial data (points or polygons) in a particular location. In this method, a statistic is computed for each point or polygon in the study area, and the pattern of local spatial autocorrelation over the study area is derived. This makes it possible to determine the extent to which each polygon is surrounded by polygons with high or low values within a specific geographical area. This applied geographic approach has been used extensively to identify clustering in a range of research topics, such as species populations, 13 disease, 14 crime incidence, 15 medical care availability, 16 food retailers, 17 the choice of transportation mode, land cover change, terrain analysis, and climate studies, based on spatial closeness as measured by the proximity of polygons. 18 Recently, this method was used to assess the possibility of lead pollution in post-industrial landscapes in Oakland, California. 19 The Getis-Ord Gi* analysis found that lead pollution in this region can be correlated with land use on both macro and micro scales. The Moran's I and Getis-Ord Gi* statistics used to analyze heavy metal clusters emphasized the importance of accurate spatial location in hotspot analysis. 20 However, only a few studies related to school education development have applied hotspot analysis.
Based on the relevant literature, hotspot analysis (Getis-Ord Gi*) and cluster and outlier analysis (Anselin Local Moran's I) were chosen to analyze primary education development in India. These geostatistical methods can be applied together, although they are fundamentally different in approach, as set out in previous research. 21,22 Hotspot analysis aims to identify groupings within a region. Such groupings can represent either high or low values of a given parameter, corresponding to hotspots and cold-spots respectively. A hotspot analysis (Getis-Ord Gi*), which can be implemented in ArcGIS, was performed to classify these spots. 23 By contrast, cluster and outlier analysis identifies groupings or unusual values based on the proximity criterion. This analysis identifies five geographic class types. On the one hand, it recognizes spots that match their surroundings, with either high or low values. On the other hand, it also finds areas where the parameter under study is much higher or lower than nearby, i.e., a spot with a very different value from its surroundings. There are also cases where no association can be made. The thematic maps produced from the Gi* statistic show the spatial clusters in the study area. Positive Gi* values indicate spatial dependence among high values, and negative Gi* values indicate spatial dependence among low values. Statistically significant clustering is derived from the confidence level and the z-scores obtained. This determines whether an area is a hotspot, a cold-spot, or an outlier (a higher value surrounded by lower values, or vice versa). This research should be helpful for policymakers, practitioners, and researchers, and could contribute to the body of knowledge on the geographic patterns and spatial statistics of primary education in India.
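The five class types produced by cluster and outlier analysis can be sketched as follows. This is a minimal illustration, not the ArcGIS implementation: it compares each feature and its neighbours with the global mean to pick a quadrant (HH, LL, HL, LH) and takes the p-values as given inputs. The function names, the sample scores, the chain-shaped neighbour weights, and the p-values are all hypothetical.

```python
def spatial_lag_of_deviations(values, weights, i):
    """Weighted sum of the neighbours' deviations from the global mean."""
    mean = sum(values) / len(values)
    return sum(
        w * (x - mean)
        for j, (w, x) in enumerate(zip(weights[i], values))
        if j != i  # a feature is not its own neighbour in this analysis
    )


def classify_feature(values, weights, p_values, i, alpha=0.05):
    """Return one of 'HH', 'LL', 'HL', 'LH' or 'Not significant'."""
    if p_values[i] >= alpha:
        return "Not significant"
    mean = sum(values) / len(values)
    own_high = values[i] > mean
    neighbours_high = spatial_lag_of_deviations(values, weights, i) > 0
    if own_high:
        return "HH" if neighbours_high else "HL"
    return "LH" if neighbours_high else "LL"


# Hypothetical scores for six features arranged in a chain, where each
# feature neighbours only the features directly before and after it.
scores = [10, 9, 8, 1, 2, 1]
n = len(scores)
chain = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
assumed_p = [0.01] * n  # assumed p-values, supplied by the analyst

print([classify_feature(scores, chain, assumed_p, i) for i in range(n)])
```

In practice the p-values come from the tool's own significance test, so only the quadrant logic above should be read as illustrative of the class types.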

Materials and Methods
Education Development Index (EDI) scores of primary education for the 10 years between 2005 and 2015 were obtained from the DISE Flash Statistics 24 published by the National University of Educational Planning and Administration (NUEPA) and the Government of India, Department of School Education and Literacy (Table 4). For the academic year 2014-2015, normalized scores were obtained for 36 states and union territories (UTs) across the country, and the state of Telangana (the 29th state) was also included in the study (Figure 1). Data on India's population were obtained from the Government of India database, the states' areas were calculated using ArcGIS software, and the population density of each state was calculated by dividing its total population by its area in km². To calculate the states' areas, the data were converted from a geographic coordinate system (GCS) to the UTM coordinate system. The conversion can be done in ArcGIS by opening Data Management Tools in ArcToolbox and choosing "Projections and Transformations" and then "Project"; the file is selected as the input, and the UTM coordinate system is chosen with the appropriate zone for India. The purpose of the population density calculation is to associate the level of primary education with the population affected in each state. 1
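The reason for projecting to UTM before measuring area is that UTM coordinates are in metres, so a planar area formula gives square metres directly. The sketch below shows the area and density arithmetic for a single polygon ring using the shoelace formula. It is an illustration only, not the ArcGIS calculation; the rectangular outline and the population figure are hypothetical.

```python
def polygon_area_km2(ring_m):
    """Planar (shoelace) area of a ring of (easting, northing) pairs in metres."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to close the ring.
    for (x1, y1), (x2, y2) in zip(ring_m, ring_m[1:] + ring_m[:1]):
        total += x1 * y2 - x2 * y1
    area_m2 = abs(total) / 2.0
    return area_m2 / 1_000_000  # 1 km^2 = 1,000,000 m^2


def population_density(population, area_km2):
    """Persons per square kilometre."""
    return population / area_km2


# Hypothetical state outline: a 40 km x 25 km rectangle in projected metres.
outline = [(0, 0), (40_000, 0), (40_000, 25_000), (0, 25_000)]
area = polygon_area_km2(outline)
print(area, population_density(2_500_000, area))
```

Applying the same formula to unprojected longitude and latitude values would give a result in squared degrees, which is why the coordinate conversion comes first.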

Cluster-Outlier Analysis
Spatial clustering among the Indian states was determined using ArcGIS cluster-outlier analysis. An area is considered statistically significant if its p-value is below 0.05. The analysis shows clusters of high values (HH) and clusters of low values (LL). It also shows HL outliers, which are high values surrounded by lower ones, and LH outliers, which are low values surrounded by higher ones. Z-scores and p-values were also analyzed. A high positive z-score indicates a statistically significant cluster (of high or low values), and a low negative z-score indicates a statistically significant spatial outlier; the critical values of ±1.96 and ±2.58 correspond to the 0.05 and 0.01 significance levels, respectively. 25
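The link between critical z-scores and significance levels can be checked with a short calculation. The sketch below assumes the z-scores follow a standard normal distribution and uses a two-sided test; the function names are hypothetical. Under that assumption, |z| = 1.96 gives a p-value of about 0.05 and |z| = 2.58 gives about 0.01.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def is_significant(z, alpha=0.05):
    """True when the z-score is significant at the chosen level."""
    return two_sided_p_value(z) < alpha


for z in (1.96, 2.58):
    print(z, round(two_sided_p_value(z), 4))
```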

Hotspot Analysis
To define India's spatial clusters of elementary EDI scores, hotspot and cold-spot analysis was conducted based on Getis-Ord Gi* statistics, using a fixed distance band in ArcGIS software. 26,27 Figure 2 shows that high positive z-scores indicate significant clustering of high EDI values (hotspot areas), while low negative z-scores indicate significant clustering of low EDI values (cold-spot areas). Z-scores near zero indicate no significant spatial clustering. The test works by looking at each feature within the context of its neighboring features. 28,29 The Getis-Ord local statistic is given as
$$G_i^* = \frac{\sum_{j=1}^{n} w_{i,j} x_j - \bar{X} \sum_{j=1}^{n} w_{i,j}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{i,j}^2 - \left(\sum_{j=1}^{n} w_{i,j}\right)^2}{n-1}}}$$
where $x_j$ is the attribute value for feature $j$, $w_{i,j}$ is the spatial weight between features $i$ and $j$, $n$ is the total number of features, and
$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n}, \qquad S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{X}^2}$$
The $G_i^*$ statistic is a z-score, so no further calculations are required.
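The Getis-Ord Gi* calculation can be sketched in a few lines of plain Python. This is an illustrative implementation of the standard formula with binary fixed-distance-band weights, not the ArcGIS tool itself. The function names, the point locations, the band width, and the sample scores are hypothetical. Each feature counts as its own neighbour, which is what distinguishes Gi* from Gi.

```python
import math


def fixed_distance_weights(points, band):
    """Binary weights: 1 when two features lie within the band, else 0.

    Each feature is within distance 0 of itself, so it is its own neighbour.
    """
    return [
        [1.0 if math.hypot(xi - xj, yi - yj) <= band else 0.0 for xj, yj in points]
        for xi, yi in points
    ]


def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score of every feature."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for w in weights:
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        numerator = sum(wij * xj for wij, xj in zip(w, values)) - mean * sum_w
        # The denominator is zero if a feature neighbours every feature,
        # so the band should be narrower than the whole study area.
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical features spaced 1 unit apart on a line; low scores on the
# left, high scores on the right.
locations = [(float(k), 0.0) for k in range(6)]
edi = [1, 2, 1, 8, 9, 10]
z = getis_ord_gi_star(edi, fixed_distance_weights(locations, band=1.0))
print([round(v, 2) for v in z])
```

In this toy layout the right-hand end yields positive z-scores (towards a hotspot) and the left-hand end negative ones (towards a cold-spot), matching the interpretation given for Figure 2.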

Results and Discussion
The five highest and five lowest state-level EDI scores over the 10 years are given in Table 5. Overall, the mean EDI score improved from 0.535 to 0.584 between 2005 and 2015. Figures 3 and 4 show a concentration of high EDI scores (blue) in the southern part of India, while low EDI scores are concentrated in the Eastern part. Bihar maintained the lowest score throughout the 10 years and Delhi maintained the highest; the scores of other states fluctuated up and down from year to year, although scores had generally improved by the end of the 10 years.
On the other hand, Figure 5 shows that Delhi has the highest population density; Bihar and West Bengal also have high population density, while the Eastern states with low EDI scores have moderate to low population density.
It is worth noting that the higher population density of Delhi compared to Bihar does not mean that more students are benefitting from good-quality primary education, as Delhi is a very small state compared to Bihar. Delhi's overall population (18,343,784) is about 15.4% of Bihar's population (119,461,014), which indicates the severity of the problem in large states with a low educational level.
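The comparison between the two populations is a simple ratio, which can be verified from the figures quoted above. The variable names are illustrative; the ratio comes to roughly 15.4% when rounded to one decimal place.

```python
delhi_population = 18_343_784
bihar_population = 119_461_014

# Delhi's population expressed as a percentage of Bihar's population.
share_percent = delhi_population / bihar_population * 100
print(round(share_percent, 1))
```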

Cluster-Outlier Analysis
In the local cluster analysis, aggregations of states with lower or higher values are easily detected. Small significant high-high clusters of the educational development index were spread across the country; however, the results also showed areas with either no significant spatial autocorrelation or a low-value (low-low) clustering pattern during the study period, as shown in Figure 6a-j. Other clusters were found, but all of them had p-values higher than 0.05 and are therefore not shown in the results for 2013 in Table 6. In this study, only clusters with extreme p-values (less than 0.01) are identified as statistically significant. Overall, the Eastern part of India (Bihar, Jharkhand, West Bengal, Assam, Meghalaya, and Arunachal Pradesh) showed a low-low cluster and maintained it throughout the 10 years, which indicates significantly lower primary education EDI scores than in the rest of India. On the other hand, Figure 6 shows a high-high cluster in the south (Kerala and Tamil Nadu). These clusters suggest that government support is not uniformly distributed throughout the country and that some highly populated areas, such as Bihar and West Bengal (Figure 6), are suffering from poverty and low elementary education levels.
It is interesting to see that Mizoram, in the east, was marked as a significant HL outlier in 2008-2009, which indicates an improvement in its scores compared to the neighboring states, but this improvement did not continue. Kerala was not part of the HH cluster every year, and in the last data collection its scores were not significantly higher than those of the neighboring states, showing that the southern EDI scores are generally higher. Finally, Jammu and Kashmir was indicated as an LH outlier in the last data collection (2014-2015), which indicates a drop in its level of primary education compared to the neighboring states.

Hotspot Analysis
Figure 7a-j shows maps based on the z-scores (standard deviation values); high positive z-scores indicate significant hotspots (red), while low negative z-scores indicate significant cold-spots (blue). Areas with z-scores close to zero have p-values > 0.05 and are therefore not significant. The results of the analysis showed strong spatial trends of high educational growth in South India, namely Tamil Nadu, Kerala, Karnataka, and Andhra Pradesh, which showed small to very large hotspots (red and orange) throughout the 10 years of the study. Most of the cold-spots, however, are located in India's northeast region. There were also some cold-spots in the northern part of India. These observations indicate the maintenance of an acceptable overall primary education level in north India. On the other hand, cold-spots cover large, highly populated areas of India, including all of the eastern parts except Sikkim. The population of the areas that were continuously cold-spots during the 10 years of the study (Figure 8) is 342,214,000, which is 27% of the Indian population. Children aged 6-10 make up 11.32% of the population living in this region.

Conclusions
The tools used in this analysis are very valuable, and they helped us to determine the effect of various factors on primary education in India. EDI scores take into consideration the conditions that affect the quality of education (access, infrastructure, teachers, and outcomes), and the model gives a final score on which we have based our comparisons. Our results show the states at risk and also the patterns of variation in primary education level over the years.
States permanently in cold-spots require urgent governmental attention. The fluctuations in some states require further analysis, and these patterns should be linked to the political and socio-economic conditions during the years of the study.
Table 6. High and low clustering for different years using Getis-Ord Gi* statistics
The spatial statistics obtained can represent the first step in building a