Time Series Analysis of Disease Outbreaks in Eastern Samar Province, Philippines

Objective: To analyze the 5-year occurrence of different diseases based on hospital records from Eastern Samar Provincial Hospital, and to present a time series analysis model for the occurrence of disease in Eastern Samar. Methods: The study followed a data mining process and used Orange as the tool to present the results of the analysis. Five common diseases were chosen: Acute Gastroenteritis, Urinary Tract Infection (UTI), Pneumonia, Hypertensive Cardiovascular Disease and Pulmonary Tuberculosis, covering the years 2007-2011. The gathered data were imported through Orange widgets and visualized as graphs. The number of patients was summed by month, by quarter and by year to analyze the occurrence of each disease at different time scales. Findings: Over 2007-2011, Pneumonia had the highest number of patients, followed by Acute Gastroenteritis, UTI, Hypertensive Cardiovascular Disease and Pulmonary Tuberculosis, respectively. Borongan, Eastern Samar had the highest number of occurrences of all five diseases. Application: The results can be used to consider common diseases that are affected by distinct weather conditions (rainy season and dry season).


Introduction
Disease outbreaks nationwide, especially in remote areas of the Philippines, have been an inevitable and major concern for the government. The concerned agencies have been trying their best and have attended trainings and seminars in and outside the country to address these outbreaks. The main focus of this study was the time series analysis of disease outbreaks using the Orange data mining tool, and its use for classification in the field of medical bioinformatics. The use of time series for monitoring disease helps public health authorities to be forewarned of an impending outbreak and gives adequate time for mitigation efforts to be implemented [1]. Forecasting the future trajectory of cases during an infectious disease outbreak can make an important contribution to public health and intervention planning [2]. Planning by forecasting future trends is one way to apply statistical knowledge to past data that are related to current events [3]. The study can help the Eastern Samar Provincial Hospital, Philippines, prepare in advance the medicines that are most important or appropriate for a given month. Using hospital records from Eastern Samar Provincial Hospital, the study conducted a time series analysis of disease outbreaks in Eastern Samar by comparing data from 2007-2011 and analyzing the occurrence of disease in each month.
Orange is a component-based visual programming software package for data visualization, machine learning, data mining and data analysis.
Orange can be used at several different levels, especially in data analysis or time series analysis. It contains modules for data classification and for assessing accuracy that can be used to analyze diseases, and it has been used in bioinformatics for the diagnosis and analysis of diseases (Orange, edited 2018).
The goal of the study was to analyze a time series of disease outbreaks in Eastern Samar, showing the occurrence of disease by month and by year over five years. It provides Eastern Samar Provincial Hospital with additional information on which diseases are likely to occur in each month and year.

Objectives
This study aimed to:
• Analyze the 5-year occurrence of different diseases based on hospital records in Eastern Samar Provincial Hospital.
• Present a time series analysis model for the occurrence of disease in Eastern Samar.

Research Design
This study used an investigative type of research, in which facts or information already available are analyzed to make a meaningful assessment of the data. It involves in-depth study and assessment of the available information in an attempt to explain a complex occurrence. Figure 1 shows an example of the hospital records that were used as the dataset for analysis and saved into an Excel file. The dataset was grouped into five columns: month, age, gender, address and diagnosis.
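As a minimal sketch of the dataset layout described above, the following Python snippet models one hospital record with the five fields and reads rows from CSV text. The class name `PatientRecord`, the column headers and the sample rows are hypothetical placeholders chosen for illustration; they are not taken from the hospital records, and the study itself stored its data in an Excel file.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    """One row of the dataset: four categorical fields and one numeric field."""
    month: str
    age: int
    gender: str
    address: str
    diagnosis: str


# Hypothetical sample rows, invented for illustration only.
SAMPLE_CSV = """month,age,gender,address,diagnosis
January,4,F,Borongan,Pneumonia
January,37,M,Can-avid,Urinary Tract Infection
February,61,M,Borongan,Hypertensive Cardiovascular Disease
"""


def load_records(text: str) -> list[PatientRecord]:
    """Parse CSV text into records, converting age to an integer."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        PatientRecord(
            month=row["month"],
            age=int(row["age"]),
            gender=row["gender"],
            address=row["address"],
            diagnosis=row["diagnosis"],
        )
        for row in reader
    ]


records = load_records(SAMPLE_CSV)
```

Keeping age as the only numeric field mirrors the variable types reported for the File widget, where the other four fields are categorical.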

Data Mining Tool
To carry out the experiments and implementation, Orange was used as the time series analysis tool; it is an open-source data visualization and analysis tool. Orange consists of a canvas interface onto which the user places widgets to create a data analysis workflow. Orange also features different visualizations, such as scatter plots, bar charts, trees, dendrograms, networks and heat maps. A data analytics framework is designed by combining the various widgets. Figure 2 represents the data analytics framework of the study, in which the researchers used five widgets: File, Data Table, Tree, Tree Viewer and Distribution.

Research Locale
The study was conducted at the Records Office of the Eastern Samar Provincial Hospital, Philippines. The study is limited to patients within the Province of Eastern Samar from 2007-2011.

Data Analysis
The data gathered from Eastern Samar Provincial Hospital were encoded manually into Excel, and five common diseases were chosen: Acute Gastroenteritis, UTI, Pneumonia, Hypertensive Cardiovascular Disease and Pulmonary Tuberculosis.
The study used Orange as the data mining tool to analyze the gathered data and show the possibility of occurrence of each disease. Through different charts and tables, the analysis shows the distribution of diagnoses for every month and year, the percentage of diseases for every year, and the distribution of addresses by diagnosis. It also produces a structured time series result for every month, quarter and year of 2007-2011.
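The monthly, quarterly and yearly sums described above can be sketched in plain Python. This is only an illustration under stated assumptions: the study performed these steps in Orange and Excel, the helper `quarter_of` and the `ROWS` sample are hypothetical, and quarters are assumed to be calendar quarters (Q1 = January to March).

```python
from collections import Counter

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def quarter_of(month: str) -> str:
    """Map a month name to its calendar quarter label (assumed Q1 = Jan-Mar)."""
    return f"Q{MONTHS.index(month) // 3 + 1}"


# Hypothetical (year, month, diagnosis) rows, invented for illustration only.
ROWS = [
    (2007, "January", "Pneumonia"),
    (2007, "January", "Pneumonia"),
    (2007, "February", "Acute Gastroenteritis"),
    (2007, "July", "Urinary Tract Infection"),
    (2008, "January", "Pneumonia"),
]

# Number of patients per disease at three time scales.
monthly = Counter((year, month, dx) for year, month, dx in ROWS)
quarterly = Counter((year, quarter_of(month), dx) for year, month, dx in ROWS)
yearly = Counter((year, dx) for year, _month, dx in ROWS)
```

Each counter is keyed by a tuple, so a lookup such as `quarterly[(2007, "Q1", "Pneumonia")]` returns the number of matching patients, and a missing key returns zero.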

Results and Discussion
The File widget was used to input the data. Each variable appears with its type, which can be numeric or categorical; the target variable, the diagnosis, is then chosen and the settings are applied to reveal the result. Figure 3 represents the File widget from Figure 2, which lists the variables with their respective data types: categorical for month, gender, address and diagnosis, and numeric for age. The main target of the study is the Diagnosis, or specific disease. The figure above represents the month of January 2007. The file contains 218 instances, which represent the total number of patients recorded for that month.

Figure 4 represents the second widget that was used, the Tree Viewer, which displays the tree of the analyzed data from Figure 3. The figure is representative of the Tree Viewer widgets for the different months in each year.

Figure 5 represents the dataset presented in a table. The variables are placed according to their role: the first column, Diagnosis, is filled with gray and represents the target class, and the remaining columns are the features. The figure is representative of the Data Table widgets for the different months in each year.

Figure 6 represents the distribution of each variable grouped by the variable Diagnosis. The distribution is shown as a graph in which each color corresponds to a specific diagnosis/disease. The figure below represents the distribution of month grouped by diagnosis. The figure is representative of the corresponding widgets for the different months in each year.

The 5-year occurrence of different diseases is based on hospital records in Eastern Samar Provincial Hospital. Figure 7 shows the percentage of diseases occurring in each month of 2007; the disease with the highest percentage was Pneumonia, represented in green.
The second most frequent disease was Urinary Tract Infection, represented in yellow; the third was Acute Gastroenteritis, in blue; the fourth was Hypertensive Cardiovascular Disease, in red; and the least frequent was Pulmonary Tuberculosis, in orange. Table 1 shows the distribution of the diseases by percentage for 2007, that is, the percentage of occurrence of the five specific diseases in each month from the above figure. The five diseases are placed in five rows and the months in 12 columns. According to the table, Pneumonia was the most frequent of the five diseases from January to May and from October to December. Figure 8 shows that Can-avid has the highest percentage of Pneumonia patients. The graphs, represented by Diagnosis, are grouped according to address.

Figure 9 represents the distribution of the diseases in each month of 2008. The graph shows that, from highest to lowest frequency, the diseases were Pneumonia, Urinary Tract Infection, Acute Gastroenteritis, Hypertensive Cardiovascular Disease and Pulmonary Tuberculosis. Each disease is represented by a different color. Table 2 shows the distribution of the diseases by percentage for 2008, that is, the percentage of occurrence of the five specific diseases in each month. The five diseases are placed in five rows and the months in 12 columns. Figure 10 shows that Borongan has the highest percentage of Pneumonia patients. The graphs, represented by Diagnosis, are grouped according to address.
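The monthly percentage tables described above can be sketched as follows. The sketch assumes that each percentage is a disease's share of the patients with the five diseases within one month, which the text does not state explicitly; the function name `monthly_percentages` and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def monthly_percentages(rows):
    """Return {month: {diagnosis: percent}}; each month's values sum to 100."""
    counts = defaultdict(Counter)
    for month, diagnosis in rows:
        counts[month][diagnosis] += 1

    table = {}
    for month, per_diagnosis in counts.items():
        total = sum(per_diagnosis.values())
        table[month] = {
            diagnosis: 100.0 * n / total
            for diagnosis, n in per_diagnosis.items()
        }
    return table


# Hypothetical (month, diagnosis) rows, invented for illustration only.
ROWS = [
    ("January", "Pneumonia"),
    ("January", "Pneumonia"),
    ("January", "Pneumonia"),
    ("January", "Acute Gastroenteritis"),
    ("February", "Urinary Tract Infection"),
    ("February", "Pneumonia"),
]

table = monthly_percentages(ROWS)
```

Transposing the result, with diseases as rows and months as columns, gives the five-by-twelve layout used in the tables.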
Figure 11 represents the monthly distribution of each of the five (5) diseases that occurred. The graph shows the highest to lowest percentage of diseases in every month of 2009. Pneumonia has the highest percentage of occurrence, followed by Urinary Tract Infection, Acute Gastroenteritis, Pulmonary Tuberculosis and Hypertensive Cardiovascular Disease, respectively. Table 3 shows the distribution of the diseases by percentage for 2009, that is, the percentage of occurrence of the five specific diseases in each month. The five diseases are placed in five rows and the months in 12 columns. The most frequent of the five diseases was Pneumonia. Figure 12 shows that Borongan has the highest percentage of Pneumonia patients. The graph, represented by Diagnosis, is grouped according to address.

Figure 13 represents the monthly distribution of each of the five (5) diseases that occurred. The graph shows the highest to lowest percentage of occurrence of these diseases in every month of 2010. Acute Gastroenteritis was the most frequently diagnosed disease within this year, followed by Pneumonia, Urinary Tract Infection, Hypertensive Cardiovascular Disease and Pulmonary Tuberculosis, respectively. Table 4 shows the distribution of the diseases by percentage for 2010, that is, the percentage of occurrence of the five specific diseases in each month. The five diseases are placed in five rows and the months in 12 columns. Figure 14 shows that Borongan has the highest percentage of Pneumonia patients. The graphs, represented by Diagnosis, are grouped according to address.

Figure 15 represents the distribution of the diseases in each month. The graph shows the diseases with the highest percentage of occurrence in every month of 2011.
Pneumonia was the most frequent of the five diseases, followed by Acute Gastroenteritis, Urinary Tract Infection, Hypertensive Cardiovascular Disease and Pulmonary Tuberculosis. Table 5 shows the distribution of the diseases by percentage for 2011, that is, the percentage of occurrence of the five specific diseases in each month. The five diseases are placed in five rows and the months in 12 columns. Figure 16 shows that Borongan has the highest percentage of Pneumonia patients. The graphs, represented by Diagnosis, are grouped according to address.

Figure 17 shows the occurrence of the 5 most common diseases for the month of January from 2007-2011. The time series analysis for January shows a different number of patients every year.
The sum of all patients in January shows that Pneumonia has the highest number of patients followed by Acute Gastroenteritis, UTI, Hypertensive Cardiovascular and Pulmonary Tuberculosis respectively. Figure 18 shows the occurrence of the 5 most common diseases for the month of February from 2007-2011. The time series analysis for February shows a different number of patients every year. The sum of all patients in February shows that Pneumonia has the highest number of patients followed by Acute Gastroenteritis, UTI, Hypertensive Cardiovascular and Pulmonary Tuberculosis respectively. Figure 19 shows the occurrence of the 5 most common diseases for the month of March from 2007-2011. The time series analysis for March shows a different number of patients every year. The sum of all patients in March shows that Acute Gastroenteritis has the highest number of patients followed by Pneumonia, UTI, Hypertensive Cardiovascular and Pulmonary Tuberculosis respectively. Figure 20 shows the occurrence of the 5 most common diseases for the month of April from 2007-2011. The time series analysis for April shows a different number of patients every year. The sum of all patients in April shows that Pneumonia has the highest number of patients followed by Acute Gastroenteritis, UTI, Pulmonary Tuberculosis and Hypertensive Cardiovascular respectively. Figure 21 shows the occurrence of the 5 most common diseases for the month of May from 2007-2011.
The time series analysis for May shows a different number of patients every year. The sum of all patients in May shows that Acute Gastroenteritis has the highest number of patients followed by Pneumonia, UTI, Hypertensive Cardiovascular and Pulmonary Tuberculosis respectively. Figure 22 shows the occurrence of the 5 most common diseases for the month of June from 2007-2011. The time series analysis for June shows a different number of patients every year. The sum of all patients in June shows that Pneumonia has the highest number of patients followed by UTI, Acute Gastroenteritis, Hypertensive Cardiovascular and Pulmonary Tuberculosis respectively. Figure 23 shows the occurrence of the 5 most common diseases for the month of July from 2007-2011. The time series analysis for July shows a different number of patients every year. The sum of all patients in July shows that Pneumonia has the highest number of patients followed by UTI, Acute Gastroenteritis, Hypertensive Cardiovascular and Pulmonary Tuberculosis respectively. Figure 24 shows the occurrence of the 5 most common diseases for the month of August from 2007-2011. The time series analysis for August shows a different number of patients every year.
The sum of all patients in August shows that Pneumonia has the highest number of patients followed by UTI, Acute Gastroenteritis, Hypertensive Cardiovascular and Pulmonary Tuberculosis respectively. Figure 25 shows the occurrence of the 5 most common diseases for the month of September from 2007-2011. The time series analysis for September shows a different number of patients every year. The sum of all patients in September shows that Pneumonia has the highest number of patients followed by UTI, Acute Gastroenteritis, Hypertensive Cardiovascular and Pulmonary Tuberculosis respectively. Figure 26 shows the occurrence of the 5 most common diseases for the month of October from 2007-2011. The time series analysis for October shows a different number of patients every year.
The sum of all patients in October shows that Pneumonia has the highest number of patients followed by Acute Gastroenteritis, UTI, Hypertensive Cardiovascular and Pulmonary Tuberculosis respectively. Figure 27 shows the occurrence of the 5 most common diseases for the month of November from 2007-2011. The time series analysis for November shows a different number of patients every year. The sum of all patients in November shows that Pneumonia has the highest number of patients followed by UTI, Acute Gastroenteritis, Pulmonary Tuberculosis and Hypertensive Cardiovascular respectively. Figure 28 shows the occurrence of the 5 most common diseases for the month of December from 2007-2011. The time series analysis for December shows a different number of patients every year. The sum of all patients in December shows that Pneumonia has the highest number of patients followed by Acute Gastroenteritis, UTI, Pulmonary Tuberculosis and Hypertensive Cardiovascular respectively.

Figure 29 shows the occurrence of the 5 most common diseases for the 12 months of 2007. The horizontal axis represents months and the vertical axis represents the number of patients. The graph presents both the increase and decrease in the number of patients for the five most common diseases per month. Different colors denote the different diseases: blue for Acute Gastroenteritis, red for Pneumonia, green for Urinary Tract Infection, violet for Hypertensive Cardiovascular Disease and blue for Pulmonary Tuberculosis.

Figure 30 shows the occurrence of the 5 most common diseases for the 12 months of 2008. The horizontal axis represents months and the vertical axis represents the number of patients. The graph presents both the increase and decrease in the number of patients for the five most common diseases per month. Different colors denote the different diseases.
The colors are blue for Acute Gastroenteritis, red for Pneumonia, green for Urinary Tract Infection, violet for Hypertensive Cardiovascular Disease and blue for Pulmonary Tuberculosis.

Figure 31 shows the occurrence of the 5 most common diseases for the 12 months of 2009. The horizontal axis represents months and the vertical axis represents the number of patients. The graph presents both the increase and decrease in the number of patients for the five most common diseases per month. Different colors denote the different diseases: blue for Acute Gastroenteritis, red for Pneumonia, green for Urinary Tract Infection, violet for Hypertensive Cardiovascular Disease and blue for Pulmonary Tuberculosis.

Figure 32 shows the occurrence of the 5 most common diseases for the 12 months of 2010. The horizontal axis represents months and the vertical axis represents the number of patients. The graph presents both the increase and decrease in the number of patients for the five most common diseases per month. Different colors denote the different diseases: blue for Acute Gastroenteritis, red for Pneumonia, green for Urinary Tract Infection, violet for Hypertensive Cardiovascular Disease and blue for Pulmonary Tuberculosis.

Figure 33 shows the occurrence of the 5 most common diseases for the 12 months of 2011. The horizontal axis represents months and the vertical axis represents the number of patients. The graph presents both the increase and decrease in the number of patients for the five most common diseases per month. Different colors denote the different diseases: blue for Acute Gastroenteritis, red for Pneumonia, green for Urinary Tract Infection, violet for Hypertensive Cardiovascular Disease and blue for Pulmonary Tuberculosis.

Figure 34 shows the quarterly number of patients in 2007.
In Q1, Q2 and Q4, Pneumonia has the highest number of patients, followed by Acute Gastroenteritis (which is highest in Q3), UTI, Hypertensive Cardiovascular and Pulmonary Tuberculosis, respectively. Figure 35 shows the quarterly number of patients in 2008. Pneumonia has the highest number of patients in each quarter, followed by UTI, Acute Gastroenteritis, Hypertensive Cardiovascular and Pulmonary Tuberculosis, respectively. Figure 36 shows the quarterly number of patients in 2009. In Q1, Q2, Q3 and Q4, Pneumonia has the highest number of patients, followed by UTI, Acute Gastroenteritis, Pulmonary Tuberculosis and Hypertensive Cardiovascular, respectively. Figure 37 shows the quarterly number of patients in 2010. In Q1 and Q3, Acute Gastroenteritis has the highest number of patients, followed by Pneumonia (which is highest in Q2 and Q4), UTI, Pulmonary Tuberculosis and Hypertensive Cardiovascular, respectively. Figure 38 shows the quarterly number of patients in 2011. In Q1 and Q4, Pneumonia has the highest number of patients, followed by Acute Gastroenteritis (which is highest in Q2 and Q3), UTI, Hypertensive Cardiovascular and Pulmonary Tuberculosis, respectively.

Figure 39 shows the occurrence of the 5 most common diseases for 2007-2011. The horizontal axis represents the year and the vertical axis represents the number of patients. The graph presents both the increase and decrease in the number of patients for the five most common diseases per year. Different colors denote the different diseases: blue for Acute Gastroenteritis, red for Pneumonia, green for Urinary Tract Infection, violet for Hypertensive Cardiovascular Disease and blue for Pulmonary Tuberculosis. The highest number of patients is for Pneumonia, followed by Acute Gastroenteritis, UTI, Hypertensive Cardiovascular and Pulmonary Tuberculosis, respectively, and these diseases mostly occurred in Borongan, Eastern Samar.

Figure 40 represents the overall result of each disease grouped by address. A different color represents each place where the disease occurred.

Vol 12 (10) | March 2019 | www.indjst.org
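The per-month rankings reported in this section, in which the patients for one calendar month are summed over the five years and the diseases are ordered by that sum, can be sketched as follows. The function name `rank_diseases` and all counts are hypothetical and invented for illustration; they are not the study's figures.

```python
from collections import Counter


def rank_diseases(counts_by_year):
    """Sum per-disease counts over the years and rank from highest to lowest."""
    totals = Counter()
    for year_counts in counts_by_year.values():
        totals.update(year_counts)  # adds counts rather than replacing them
    return totals.most_common()


# Hypothetical January counts for two years, invented for illustration only.
JANUARY = {
    2007: {"Pneumonia": 30, "Acute Gastroenteritis": 20, "UTI": 10},
    2008: {"Pneumonia": 25, "Acute Gastroenteritis": 15, "UTI": 18},
}

ranking = rank_diseases(JANUARY)
```

The same function applies unchanged to quarterly or yearly counts, since it only needs a mapping from each year to per-disease counts.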

Conclusion
The study revealed, through the use of the Orange data mining tool, which of the diseases Urinary Tract Infection, Acute Gastroenteritis, Pneumonia, Hypertensive Cardiovascular Disease and Pulmonary Tuberculosis occurred most often in each month from 2007-2011. The study successfully produced a 5-year time series analysis model of the data. It revealed that the number of patients changes from year to year, depending on the disease with which they are diagnosed.
The number of patients diagnosed with the five diseases in 2007, 2009 and 2011 was lower than the number of patients in 2008 and 2010.