Prediction of COVID-19 trend in India using time series forecasting

Objectives: The COVID-19 pandemic is one of the most serious challenges mankind has ever faced, and considerable uncertainty prevails over its future course. In this situation, machine learning algorithms can be useful for real-time analysis and prediction of infection trends. The objective of this research study is to analyze the COVID-19 trend in India and forecast the trend of the outbreak in the near future. The model can provisionally guide the government and healthcare organizations in preparing for the situation arising from COVID-19 transmission. Methods: COVID-19 data from 30 December 2019 to 27 July 2020 were used to predict the COVID-19 trend over the next 30 days, i.e. from 28 July to 26 August 2020. Time series forecasting with the ARIMA model and PROPHET was used. The performance of these models was evaluated using the validation metrics ME, MAE and RMSE, whose low values indicated a good fit. Findings: The prediction results indicate an increasing trend of COVID-19 positive, active and deceased cases in India for the next 30 days, i.e. up to 26 August 2020. Novelty: The COVID-19 pandemic is a new problem. The novelty and originality of this research lie in the use of time series forecasting for real-time analysis and prediction of the COVID-19 pandemic.


Introduction
Coronavirus disease (COVID-19) emerged as a mysterious viral respiratory disease in Wuhan, Hubei, China in December 2019 (1). The cause of the disease was identified as a novel coronavirus, formally named SARS-CoV-2 by the International Committee on Taxonomy of Viruses because it showed similarity with the SARS coronavirus of 2003. The virus can progress through the respiratory tract into a person's lungs, causing inflammation; the air sacs, or alveoli, can then fill with fluid and pus. This condition limits a person's ability to take in oxygen, and in severe cases patients cannot take in enough oxygen or breathe out carbon dioxide, finally leading to life-threatening conditions such as heart failure.
The coronavirus outbreak came into the limelight on 31 December 2019, when China informed WHO about an outbreak of pneumonia in Wuhan, Hubei province. COVID-19 quickly emerged as a public health emergency and was declared a global pandemic by WHO on 11 March 2020 (2). As of 27 July 2020, the world has witnessed a total of 16.2 million COVID-19 cases, with 0.648 million deaths and 940000 recoveries (3).
Most COVID-19 cases reported symptoms such as fever, dry cough and tiredness, with breathing difficulty in severe cases. COVID-19 infection can lead to complications such as pneumonia, cytokine storms and multi-organ failure, ultimately resulting in the death of the patient (4)(5)(6). Some cases are asymptomatic, without noticeable illness in the infected person, but asymptomatic patients are also contagious virus carriers, although their infectivity may be weak (7,8).
SARS-CoV-2 is transmitted through close direct or indirect contact with infected persons via infected respiratory secretions or saliva. When an infected person coughs, sneezes, sings or talks within 1 meter of a susceptible person, the virus can reach that person's mouth, eyes or nose, resulting in infection (9). Contact with surrounding objects or surfaces contaminated with the virus can result in infection through an indirect mode called fomite transmission (10). In indoor settings with poor ventilation, the virus can spread through the air by airborne transmission. The incubation period of the virus is up to 14 days; in the majority of cases symptoms start to appear 4-5 days after exposure, while in some cases they may appear as late as 11 days after exposure (11). Scientific evidence suggests that an infected person is most likely to transmit the virus 1-3 days before symptoms develop (12,13). Even though people of all age groups are at risk of infection, most infections were reported in the middle age group, and the most vulnerable categories of the population include senior citizens, pregnant women and persons with co-morbidities (14).
The symptoms that a COVID-19 patient can show are described in (13,14).
The first case of COVID-19 in India was reported on 30 January 2020 in Kerala, when a student travelled back home from Wuhan (15). To contain community transmission, a number of countries imposed lockdown restrictions to confine the general public to their homes. When the number of reported COVID-19 cases came close to 500, the Indian government ordered a nationwide lockdown for 21 days on 24 March 2020, restricting the movement of a population of 1.32 billion; the lockdown was later extended as a preventive measure against the spread of infection. As of 27 July 2020, India has reported 1.43 million COVID-19 positive cases, 0.486 million active cases, 32,771 deaths and 0.917 million recoveries. On 26 July 2020 India recorded its highest daily number of positive cases, about 50,000. Moreover, for the last five days more than 45,000 positive cases have been reported each day. The recovery rate is almost 64% in India and the case fatality rate (CFR) is 2.3%. There are about 1300 testing laboratories in India and the number keeps increasing. More than 0.5 million tests are conducted daily, and the country has 1.1 million isolation beds and more than 11000 COVID care facilities. The CFR is progressively decreasing, which is a good sign. The lockdown ordered on 24 March 2020 is continuing, and 27 July 2020 is its 125th day.
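The rates quoted above follow directly from the reported counts. The short Python sketch below reproduces them as a consistency check; it assumes that every confirmed case is either active, deceased or recovered, and the helper name `rate_percent` is ours, not part of the study.

```python
# Figures quoted in the text for India as of 27 July 2020.
confirmed = 1_430_000   # 1.43 million positive cases
active = 486_000        # 0.486 million active cases
deaths = 32_771


def rate_percent(part: float, whole: float) -> float:
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


# Case fatality rate: deaths as a share of confirmed cases.
cfr = rate_percent(deaths, confirmed)

# Assumption: confirmed = active + deceased + recovered,
# so the recovered count is implied by the other three figures.
implied_recovered = confirmed - active - deaths
recovery_rate = rate_percent(implied_recovered, confirmed)

print(f"CFR: {cfr:.1f}%")
print(f"Recovery rate: {recovery_rate:.0f}%")
```

Under this assumption the implied recovered count is roughly 0.91 million, in line with the recovery rate of almost 64% reported above.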
Among Indian states and Union territories, Maharashtra reported the highest number of cases, 375,799, with 13,756 deaths, followed by Tamil Nadu with 213,723 cases and 3,494 deaths; Delhi is the third worst affected, with 130,606 cases and 3,827 fatalities (16,17).
As of this date, India stands third in the number of reported coronavirus cases; the U.S. leads with 4,148,011 cases and 148,012 deaths, followed by Brazil with 2,419,091 cases and 87,004 deaths. The national average CFR (case fatality rate) in India is 2.28%, which is 1.98 percentage points below the global average CFR and lower than that of the United States (3.88%) and Brazil (3.81%) as of 20 July 2020. It varies across individual states, with the highest CFRs being 4.09% in Gujarat, 3.55% in Maharashtra and 2.57% in West Bengal (18,19).
As of now, there is no treatment for COVID-19, but symptoms may be treated with certain combinations of drugs, or with plasma transfusion for severe patients, depending on the patient's clinical condition (20,21). A number of countries, such as Russia, India and the U.S., have started human trials of vaccines against the disease, but no success has been reported to date (22)(23)(24).
Time series forecasting can be used to predict the number of COVID-19 cases, deaths and recoveries in the near future. We use time series forecasting with the ARIMA model and PROPHET to predict COVID-19 cases in India for the next 30 days. Because the coronavirus is novel and the data available at the initial stage were very limited, the forecasted trend of infections carries some uncertainty or inaccuracy (25) (26). ARIMA models have already been used to predict infectious diseases and natural calamities and are suitable for short-term predictions based on historical data (27)(28)(29)(30)(31)(32)(33). A number of studies have been conducted to predict future trends of COVID-19 using various statistical models, but they have limitations such as lack of proper data, unreported cases, lack of testing, overfitting, use of an improper model, and dynamically changing situations leading to unpredictable increases or decreases in the number of cases (34)(35)(36)(37)(38)(39).
The increasing trend of COVID-19 and the subsequent rise of fear among people motivated us to carry out this research to forecast cases in the coming days, so that the necessary preparations can be made (40). The paper is organized into five sections. Section 1 gives an overall introduction to the work. Section 2 states the objectives. Section 3 describes the methodology adopted, along with the detailed flow of procedures. Section 4 discusses the results and findings, and Section 5 summarizes the work done.

Objectives
For India, over the period from 28 July to 26 August 2020:
1. To predict the number of COVID-19 cases.
2. To predict the number of recovered cases.
3. To predict the number of deceased cases.

Methodology
The flowchart of the methodology adopted in this study is shown in Figure 1.
The ARIMA model is denoted ARIMA(n, f, p), where n is the order of the autoregressive terms, f is the degree of differencing used to form a stationary time series, and p is the order of the moving average. The value of v at time t for an ARIMA model is estimated by equation (1), in which the moving average parameter is denoted by ∅. At f = 2, the second difference is the first difference of the first difference, not the difference from two periods ago; it measures the local acceleration of the series rather than its local trend. The general forecasting equation is expressed in terms of v. The time series is checked for stationarity using the Augmented Dickey-Fuller (ADF) test and may require log transformation or differencing to make it stationary and to stabilize the series (41).
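For reference, the textbook form of a non-seasonal ARIMA(n, f, p) model is sketched below. This is the standard formulation, not a recovery of the paper's own equation (1). The symbols V_t (observed series), μ (constant), a_i (autoregressive coefficients) and e_t (past forecast errors) are assumptions introduced here; the moving average coefficients are written ϕ_j to follow the text, with the negative-sign convention that is common for these terms.

```latex
% Differenced series v_t for the three usual degrees of differencing f
\[
v_t =
\begin{cases}
V_t, & f = 0 \\
V_t - V_{t-1}, & f = 1 \\
(V_t - V_{t-1}) - (V_{t-1} - V_{t-2}), & f = 2
\end{cases}
\]

% General forecasting equation in terms of v
\[
\hat{v}_t = \mu + a_1 v_{t-1} + \dots + a_n v_{t-n}
          - \phi_1 e_{t-1} - \dots - \phi_p e_{t-p}
\]
```

The f = 2 case expands to V_t - 2V_{t-1} + V_{t-2}, which shows why it acts as a local acceleration rather than a difference from two periods ago.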

PROPHET
PROPHET is an open-source library from Facebook that can be used to decompose time series and forecast future trends easily and accurately. The model is flexible and deals well with missing values. The time series model is additive and fits linear data. It has three main components:
• Trend
• Seasonality
• Holidays
The general equation for the PROPHET model is given as
$$y(t) = g(t) + s(t) + h(t) + e(t)$$
where g(t) is the logistic growth curve for modeling non-periodic changes in the time series, s(t) represents periodic changes, h(t) represents the effects of holidays, and e(t) is the error term accounting for unusual changes not accommodated by the model.
Figure 2 displays the COVID-19 trend for India in daily confirmed, total confirmed, daily recovered, total recovered, daily deceased and total deceased cases from 30 January 2020 to 20 July 2020. The data were plotted to visualize the counts of confirmed, deceased and recovered cases in India. The graph indicates a continuously ascending trend in all components: the number of confirmed COVID-19 cases rose from 1 on 30 January 2020 to 1.1 million on 20 July 2020. The X-axis of the plot denotes dates and the Y-axis the number of cases. Figure 3 displays the state-wise COVID-19 trend in India for active, confirmed, recovered and deceased cases. Total cases are increasing with time, but certain states, such as Maharashtra, Tamil Nadu, Delhi and Gujarat, are more affected, as indicated in the figure.
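Returning to the PROPHET model described above, its additive structure can be illustrated with a small pure-Python sketch. This does not use the PROPHET library and is not the fitted model of this study: the function names, the single sine term used for seasonality, and all parameter values (capacity, rate, midpoint, amplitude, the holiday offset) are hypothetical choices made only to show how trend, seasonality and holiday terms add up.

```python
import math


def logistic_growth(t: float, capacity: float, rate: float, midpoint: float) -> float:
    """g(t): saturating trend, capacity / (1 + exp(-rate * (t - midpoint)))."""
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


def weekly_seasonality(t: float, amplitude: float) -> float:
    """s(t): one sine term with a 7-day period (toy stand-in for a Fourier series)."""
    return amplitude * math.sin(2.0 * math.pi * t / 7.0)


def holiday_effect(t: int, holidays: dict) -> float:
    """h(t): a fixed offset on the listed days, zero otherwise."""
    return holidays.get(t, 0.0)


def additive_model(t: int, holidays: dict) -> float:
    """y(t) = g(t) + s(t) + h(t); the error term e(t) is left out of this sketch."""
    return (
        logistic_growth(t, capacity=100_000.0, rate=0.05, midpoint=120.0)
        + weekly_seasonality(t, amplitude=500.0)
        + holiday_effect(t, holidays)
    )


# Hypothetical holiday table: reported counts dip by 300 on day 10.
holidays = {10: -300.0}
series = [additive_model(t, holidays) for t in range(180)]
```

Because the components are simply summed, each one can be inspected or replaced on its own, which is the property that makes the decomposition easy to interpret.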

Results and Findings
In this research study, time series analysis and forecasting using the ARIMA model and PROPHET were applied to illustrate the trend of infections from 30 January to 27 July and to forecast future cases from 28 July to 26 August.

Forecast using ARIMA model
The visualization displays residual plots for confirmed, deceased, recovered and active COVID-19 cases in the country from 30 January 2020 to 27 July 2020. The time series is non-stationary and non-seasonal. The increasing trend indicates a continuous surge in the number of COVID-19 cases. Daily COVID-19 case data serve as the variable of the time series model, indexed by time. We use Auto ARIMA to find the values of the order parameters. Fig 6 (E) and (F) show the ACF and PACF plots for total recovered cases and total deceased cases, respectively, used for determining the order parameters of the ARIMA model. The plots show that the ACF and PACF values of the time series decline and come close to zero.
Figure 7 shows residuals vs. fitted values. The dotted line at y = 0 is the fit line: any point on it has zero residual, points above it have positive residuals, and points below it have negative residuals. The red line is a smoothed high-order polynomial curve that gives an idea of the pattern of residual movement. In our case the residuals show slightly curved patterns, a slight deviation that indicates the presence of outliers and some errors in the data, but the assumption of normal distribution holds.
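To make the role of the ACF in choosing the differencing order more concrete, the sketch below computes a sample autocorrelation function in plain Python and compares a trending cumulative series with its second difference. It is only an illustration: the data are synthetic (a seeded random series, not the study's data), the helper names `difference` and `acf` are ours, and it covers neither the PACF nor the Auto ARIMA search used in the study.

```python
import random
from itertools import accumulate


def difference(series):
    """First difference: x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / denom
        for lag in range(max_lag + 1)
    ]


# Synthetic daily counts with a rising trend plus noise (hypothetical data).
random.seed(0)
daily = [100.0 + 5.0 * t + random.gauss(0.0, 20.0) for t in range(180)]
total = list(accumulate(daily))  # cumulative counts

acf_total = acf(total, max_lag=5)                            # decays very slowly
acf_diff2 = acf(difference(difference(total)), max_lag=5)    # trend removed

print([round(r, 2) for r in acf_total])
print([round(r, 2) for r in acf_diff2])
```

A slowly decaying ACF, as for the cumulative series here, is the usual sign of non-stationarity; after differencing, the autocorrelations at higher lags shrink toward zero.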
The figure also displays a Normal Q-Q plot, used to check whether the residuals follow a normal distribution, and a scale-location plot, which shows the spread of points across the range of predicted values to check the assumption of homoscedasticity in regression. It also shows Cook's distance, used to find influential outliers in the set of predictor values, and a histogram of the residuals.
Tables 1, 2 and 3 show the forecasted daily confirmed, recovered and deceased cases, respectively, from 28 July to 26 August 2020 with 95% confidence intervals. Tables 4, 5 and 6 show the forecasted total confirmed, recovered and deceased cases, respectively, for the same period with 95% confidence intervals. The ARIMA model predicts that the daily number of positive cases will increase in the coming days and can reach 93756 in the worst-case scenario and 59826 in the best-case scenario. The average forecasted count of daily cases on 26 August is 76791 with a 95% confidence interval (CI). The count of daily recoveries will also increase and, on 26 August, can reach a lower limit of 50025, an average of 64875 and an upper limit of 79724. The count of daily deceased cases will increase and, on 26 August, can reach a lower limit of 490, an average of 836 and an upper limit of 1182.
The ARIMA model predicts that the total number of positive cases will increase in the coming days and can reach 3101334 in the worst-case scenario and 2618719 in the best-case scenario. The average forecasted total count on 26 August is 2860026 with a 95% confidence interval (CI).
It is observed that the numbers of deaths and recoveries both increase over time. The recovery rate in India is increasing day by day and is higher than the CFR. The count of total recoveries will also increase and, on 26 August, can reach a lower limit of 1773257, an average of 1959374 and an upper limit of 2145491.
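The forecast intervals quoted above for 26 August are symmetric about the average forecast, which is what a normal-theory interval of the form mean ± z × standard error produces. Assuming the intervals were built that way (the text does not state the construction), the implied standard error can be backed out as in the sketch below; `implied_standard_error` is a helper name of ours.

```python
from statistics import NormalDist


def implied_standard_error(lower: float, upper: float, level: float = 0.95) -> float:
    """Back out the standard error from a symmetric normal-theory interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return (upper - lower) / (2.0 * z)


# Daily confirmed cases forecast for 26 August, as quoted in the text.
lower, mean, upper = 59826, 76791, 93756

midpoint = (lower + upper) / 2
se = implied_standard_error(lower, upper)

print(midpoint == mean)   # the interval is centred on the average forecast
print(round(se))
```

The same check applies to the other intervals: for example, the midpoint of 490 and 1182 is the average daily deceased forecast of 836.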
The data were divided into a training data set from 30 January to 27 July and a test data set for verification from 28 July to 26 August. The actual cases were then plotted against the forecasted cases from 7 July to 27 July to visualize the precision of the forecasted values. Table 7 indicates the goodness of fit of the models used for forecasting. In this study we used three performance measures: mean error (ME), mean absolute error (MAE) and root mean squared error (RMSE). Low values of ME, MAE and RMSE indicate a good fit of the models. The MAE is smaller than the RMSE, which indicates low variance among the individual errors.
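The three measures can be computed as in the minimal Python sketch below. The sign convention for the error (actual minus predicted) is an assumption, and the four data points are made-up numbers chosen for easy arithmetic, not values from Table 7.

```python
import math


def mean_error(actual, predicted):
    """ME: average signed error; positive means the model under-predicts."""
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """MAE: average magnitude of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """RMSE: square root of the average squared error; penalizes large errors more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical values for illustration only.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [98.0, 113.0, 119.0, 134.0]

me = mean_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(me, mae, round(rmse, 3))
```

MAE can never exceed RMSE; the two are equal only when all errors have the same magnitude, so the gap between them grows as the errors become more uneven.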

Forecast using PROPHET
Time series forecasting using PROPHET is used to plot the observed trend of infections. Figures 15 and 16 illustrate the trend of COVID-19 infections in India over 180 days, i.e. from 30 January to 27 July. Figures 17, 18 and 19 show the PROPHET forecasts for daily confirmed, recovered and deceased cases in India, respectively, from 28 July to 26 August. Figures 20, 21 and 22 show the PROPHET forecasts for total confirmed, recovered and deceased cases in India, respectively, over the same period.
PROPHET time series forecasting predicts that the average count of daily positive cases can reach 71727 on 26 August, with 45779 daily recovered cases and 927 daily deceased cases. The forecast shows that, in the average scenario, total confirmed COVID-19 cases will rise to 2007084, total recovered cases will reach 1316682, and total deceased cases in the country will reach 45679 by 26 August 2020. Figure 23 shows actual vs. predicted values for PROPHET to indicate the accuracy of forecasting.

Conclusion
Although information about the novel COVID-19 virus is evolving dynamically and not much is known about its behavior, mathematical models and machine learning algorithms can be used to predict the trend of active, positive, recovered and deceased cases. This research study helps in the real-time analysis and forecasting of the COVID-19 trend in India. Time series forecasting using the ARIMA model and PROPHET was used to predict daily and total positive, recovered and deceased COVID-19 cases in India from 28 July to 26 August, and the results indicate an increasing trend over the 30-day forecast period. This suggests that India has entered the stage of community transmission of the SARS-CoV-2 virus. The number of daily deceased cases is much lower than the number of recovered cases, which indicates a low likelihood of casualties from the infection. India will have to ensure social distancing and other safety precautions to contain the spread of the virus. We validated the precision and accuracy of the models using RMSE, ME and MAE, and the validation results showed a good regression fit and prediction accuracy, indicating good forecasting performance. As the situation with regard to the present COVID-19 pandemic keeps changing, exact prediction remains difficult.