Refining Search Performance through a Semantic-based CBR Model and QoS Ranking Methodology

Objectives: To refine search performance using the semantic web with an improved algorithm that retrieves information efficiently. Methods: To establish the SCBR model and improve web search performance, this paper adopts Natural Language Processing (NLP) techniques and a Quality of Service (QoS) ranking method, and endeavors to develop a relevant, reliable and efficient search engine. Findings: Mean average precision tests, which measure the quickness and precision of search results, achieve values from 82.98% to 99.53%, while recall ranges from 50.49% to 90.50%. The experimental results show that the NLP technique improves the performance of the SCBR model, achieving higher average precision and recall values. It is also observed that precision rises sharply as the threshold value increases. Novelty: This research develops a reliable and efficient search engine that retrieves accurate results for a user's complex query. It tolerates human typing errors, suggesting the word the user likely intended to search for, and it retrieves the same results for synonymous words, which prevents the appearance of irrelevant search results.


Introduction
The use of polysemous words in traditional search engines decreases the quality of their output: keywords represent only fragmented meanings of the content, and content retrieved through keywords does not always meet the query requisites (1). The World Wide Web (WWW) works on the principle of keyword matching, yielding low precision and recall. The Semantic Web, an extension of the WWW, improves the information retrieval process. Query expansion is of utmost importance in information retrieval for retrieving relevant results. Semantic search uses information extraction, and there are many studies in this field (2)(3)(4); the main dissimilarities among these studies arise from the structure of the sources, the details of the extracted information, and the computational memory resources. NLP-based approaches are domain independent, but they use parse trees of sentences, POS taggers, chunk parsing, anaphora resolution, etc., to extract information, and they require heavy computational processing (5,6). There are alternative information extraction methods, such as pattern/rule-based extractors, that avoid these heavy computational costs. These methods are classified according to how their patterns and rules are created: automatically or manually. Automatic methods (7)(8)(9)(10)(11)(12) are superior to manual ones in terms of the effort spent on the domain; on the other hand, they suffer from low precision and recall rates. The literature reveals that current studies on traditional search methods are not mature enough: either they do not scale to large knowledge bases or they cannot capture all the semantics in the query (13)(14)(15)(16). To overcome the weaknesses of the current web system and to exploit the strengths of query expansion, a novel framework based on ontology and NLP is proposed for information retrieval (17)(18)(19)(20)(21)(22)(23). In the proposed framework, domain-specific knowledge is utilized for ontology construction. 
In the framework, an ontology tool is used to construct the ontology knowledge base. Based on the constructed ontology, the most semantically related words in a query are identified and the query is expanded. A ranking metric function is defined for the reputation of services within the search module. Based on the proposed framework, queries are expanded (semantic query expansion) and evaluated on three popular information retrieval models, namely the Extended Case Based Reasoning (ECBR) model, the Vector Space Model (VSM) and the Latent Semantic Indexing (LSI) model (24)(25)(26). The main aim of this paper is to build a mechanism that improves performance without sacrificing search efficiency. Our main contribution is to fill this gap by implementing a semantic search system: a system that performs at least as well as traditional approaches while improving the performance and usability of semantic querying. We tested our system in the education domain to gauge the effectiveness of semantic searching over traditional approaches and observed a remarkable increase in recall and precision. In addition, we show that our approach can respond to difficult semantic queries, typing errors and imprecise data, which is not possible with traditional methods.

Methodology
This paper develops a reliable and efficient search engine that retrieves accurate results for a user's complex query. The overall architecture of the semantic search system is shown in Figure 1. The system consists of an ontology knowledge base, a reputation database, a search module and a ranking module. The ontology knowledge base stores domain-specific service ontologies and metadata. The search module retrieves semantically relevant metadata from the ontology knowledge base and sends it to the ranking module. The ranking module obtains quality-of-search information for the retrieved metadata from the reputation database, and both the metadata and the quality-of-search information are sent back to the user. The user ranks the metadata based on various QoS (Quality of Service) parameters grounded in the theory of CCCI metrics. The following sections discuss each component in detail.

Ontology Knowledge Base
The ontology knowledge base consists of two main components, service ontologies and metadata, in which semantically related ontological concepts and metadata are linked by referencing each other's URLs. Two rules govern this semantic relationship: first, a concept may semantically relate to arbitrary metadata; second, a metadata record may semantically relate to arbitrary concepts.
First, we describe the ontology. A service ontology denotes the conceptualization of a service, identified by a service name, a service description and linked metadata. It is the amalgamation of the ontology name and a tuple whose elements can themselves be complex. The service name uniquely identifies a service, and the service description refers to the definitional description of a service; the normal form of a service description is a set of words such as nouns, adjectives or adverbs. A concept may have many service descriptions. The advantage of the service description property is that it allows semantic similarity values to be computed between concepts and a query. Linked metadata refers to the URLs of metadata semantically related to a concept. The root of the service concept hierarchy is the definition of the service concept; all other concepts in this hierarchy, as leaf concepts, automatically inherit its properties.
Second, we describe metadata. The purpose of metadata is to provide meaningful information about the real environment. A metadata record is defined by linked concepts, a service name, a service address, contact details and metadata descriptions. The linked concepts refer to the URLs of concepts semantically related to the metadata. The service name refers to the name of the college or institution. The service address refers to the address where it can be located. The service contact details refer to information such as phone number, fax number, website and so on. The service description refers to the detailed text description of the content of a service; this is used for matching with concepts.
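The two linked record types described above can be sketched as plain data structures. This is only an illustrative sketch: the field names and the example values are our own, since the paper does not publish its schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ServiceConcept:
    service_name: str                 # uniquely identifies the service concept
    service_descriptions: List[str]   # nouns/adjectives/adverbs describing it
    linked_metadata: List[str] = field(default_factory=list)  # URLs of related metadata

@dataclass
class Metadata:
    service_name: str        # e.g. name of a college or institution
    service_address: str
    contact_details: str
    description: str         # detailed text, matched against concepts
    linked_concepts: List[str] = field(default_factory=list)  # URLs of related concepts

# A concept may link to arbitrary metadata, and vice versa (the two rules above):
concept = ServiceConcept("EngineeringCollege", ["engineering", "college", "technical"])
record = Metadata("Anna University", "Chennai, India", "www.annuniv.edu",
                  "technical university offering engineering programmes")
concept.linked_metadata.append("http://example.org/meta/annauniv")
record.linked_concepts.append("http://example.org/onto/EngineeringCollege")
```

The bidirectional URL references mirror the two linking rules: neither side owns the relationship, so a concept can accumulate metadata links independently of the metadata accumulating concept links.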

Search Module
This section describes the search module. The search module has two main parts: Natural Language Processing (NLP) techniques and the SCBR model. The NLP techniques are used for spell-checking and for finding synonyms. The SCBR model is employed to increase search efficiency. Figure 2 shows the search system architecture; the workflow of the search module is as follows.
1. An education learner enters a set of key terms into the search engine interface.
2. The spelling of each word is checked and synonyms are found via the WordNet API.
3. The query is matched against concepts in the ontology knowledge base using the SCBR algorithm, which is designed for users who have no domain knowledge of their service query.
4. The user chooses a concept; this step is designed for human-computer interaction.
5. The chosen concept refers to its metadata.
6. The metadata information is retrieved from the ontology knowledge base, and synonym information is provided to the education learner.
The search interface sends each query term to the spell-check module; if the user enters an incorrect or misspelled word, the spell-check method corrects the mistake and passes it on for synonym lookup. A Java program (synonymer-spell checker) provides suggestions and passes each word to the WordNet API. If a query term can be retrieved from the API, the API returns its synonyms; otherwise, the query term is filtered out. After this process is complete, the search interface sends the query terms and their synonyms to the query-concept matching model. The query-concept matching algorithm computes the similarity values between the service ontology concepts stored in the ontology knowledge base and the query terms, and then provides the relevant information to the user. Once the user selects a result, all its semantically relevant metadata are retrieved from the service knowledge base.
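The spell-check and synonym-expansion steps above can be sketched as follows. The paper's system uses a Java spell checker and the WordNet API; this sketch substitutes the Python standard library's `difflib` for fuzzy spelling correction and a toy synonym table for WordNet, so the vocabulary and synonyms here are purely illustrative.

```python
import difflib

# Toy vocabulary and synonym table standing in for the WordNet API.
VOCABULARY = {"college", "university", "course", "teacher", "student"}
SYNONYMS = {"college": ["school", "institute"], "course": ["class", "programme"]}

def spell_check(term: str) -> str:
    """Return the term itself if known, else the closest vocabulary word."""
    if term in VOCABULARY:
        return term
    suggestions = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.6)
    return suggestions[0] if suggestions else term

def expand_query(terms):
    """Correct each term, then attach its synonyms; unknown terms are filtered out."""
    expanded = {}
    for term in terms:
        corrected = spell_check(term)
        if corrected in VOCABULARY:
            expanded[corrected] = SYNONYMS.get(corrected, [])
    return expanded

# "collage" is corrected to "college" before synonym lookup:
print(expand_query(["collage", "course"]))
```

Terms that neither match the vocabulary nor correct to a known word are dropped, mirroring the paper's filtering of query terms not retrievable from the API.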

SCBR Model
This paper proposes SCBR, a Semantic Case Based Reasoning algorithm for the query-concept matching model, which is an enhanced version of the Extended Case Based Reasoning (ECBR) algorithm (22). The SCBR algorithm is expected to give more efficient search results than ECBR. The principle of the SCBR model is to seek the maximum similarity value between a query and the ontology knowledge base. If a query key term is contained in a metadata description, a value of 1 is awarded; if a meaning (synonym) of a query key term is contained in it, a value of 0.5 is awarded; otherwise 0 is awarded. The threshold value is varied from 0 to 1, and a threshold is needed to filter out irrelevant data; we then obtain the retrieved concepts for each variation of the threshold value. The query is compared with the metadata description property of each concept from the ontology. The highest value between the query and any metadata description property of a concept is taken as the similarity value between the query and that concept. In addition, this paper adds a spell-check stage: the query is passed, word by word, to the spell-check module, where each word is corrected; the corrected word is passed on for synonym lookup, and the word with its synonyms is passed to the matching module, which checks for the presence of the word or its synonyms in the ontology and, if present, retrieves the concept along with its metadata. In the mathematical formulation of the SCBR model, q is a processed query, d is a result datum (here d denotes a concept c), md_i is a metadata description property of d, k_ih is a key term involved in md_i, ∑md_i is the sum associated with md_i, q_kt is the query key term involved in md_i, s_t is a semantic term, w_t is a function that returns the weight associated with s_t, m is a meaning of the query, and sc_ih checks the spelling of the query provided by the user.
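The SCBR scoring rule just described (1 for an exact query-term match, 0.5 for a synonym match, 0 otherwise, with the concept score taken as the maximum over its metadata descriptions) can be illustrated as below. This is a minimal sketch: normalising the summed scores by query length is our own choice, and the names and example data are not from the paper.

```python
SYNONYMS = {"college": {"school", "institute"}}

def term_score(term: str, description_words: set) -> float:
    """1.0 for an exact match, 0.5 for a synonym match, 0.0 otherwise."""
    if term in description_words:
        return 1.0
    if SYNONYMS.get(term, set()) & description_words:
        return 0.5
    return 0.0

def scbr_similarity(query_terms, metadata_descriptions) -> float:
    """Max over descriptions of the summed per-term scores, normalised by query length."""
    best = 0.0
    for desc in metadata_descriptions:
        words = set(desc.lower().split())
        score = sum(term_score(t, words) for t in query_terms) / len(query_terms)
        best = max(best, score)
    return best

def search(query_terms, concepts, threshold=0.5):
    """Keep only concepts whose similarity clears the threshold."""
    scored = [(c, scbr_similarity(query_terms, descs)) for c, descs in concepts.items()]
    return [(c, s) for c, s in scored if s >= threshold]

concepts = {
    "EngineeringCollege": ["technical college offering engineering courses"],
    "PrimarySchool": ["primary school for young children"],
}
# "PrimarySchool" scores only 0.25 (one synonym hit) and is filtered out:
print(search(["engineering", "college"], concepts, threshold=0.5))  # → [('EngineeringCollege', 1.0)]
```

Raising the threshold trims concepts that match only via synonyms, which is the mechanism behind the precision/threshold trade-off examined in the experiments.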

Ranking Module and Reputation Database
The ranking module is based on the QoS service methodology. The core of the methodology is the CCCI metrics (Correlation of Interaction, Correlation of Criterion, Clarity of Criterion, and Importance of Criterion), a group of metrics developed by Hussain et al. (25) for measuring the trustworthiness and reputation of services. Additionally, Hai (26) defined two further metrics (the extended CCCI metrics), reputation and ActualBehaviour criterion, to rank services based on user-defined criteria and trustworthiness values. Here we propose a ranking metric for reputation-based ranking and domain-specific criteria-based ranking. The metrics allow a service requester to evaluate the trustworthiness of a service provider after the requester completes the service transaction with the provider. The evaluation is made by assigning values to the commitment (ranging from 0 to 6), clarity (ranging from 0 to 1) and importance (ranging from 1 to 3) of the QoS criteria of a service. In our proposed work, the QoS criteria of a service are determined by an educational domain ontology concept, which represents an education service sub-domain. If a service metadata record that represents the service has an association with a service concept, the metadata can be evaluated by the QoS criteria relevant to that concept. After a service requester assigns values to the QoS criteria of a service metadata record, the requester's perception of the trustworthiness of the service metadata is estimated by the extended CCCI metrics. Afterwards, the reputation value (ranging from 0 to 6) of the service provider in this service can be calculated by aggregating all past evaluation scores. We define the ranking metric as follows. Definition 1 (Ranking): the ranking value of a service provider in a given context is the average of all involved requesters' reputation values for this service provider in the same service context.
Ranking has seven levels:
1. Cannot determine ranking
2. Extremely bad ranking
3. Bad ranking
4. Minimally good ranking
5. Partially good ranking
6. Good ranking
7. Extremely good ranking
Here r is the number of reputation values for this service provider in the service interaction, m is the number of service requesters who evaluated a service provided by the provider, n is the number of QoS evaluation criteria under a service concept associated with the service, and MA_commitment (MA_commitment = 6) is the mutually agreed commitment value for each criterion. This metric enables the ranking of a set of service providers according to their reputation value, and the reputation values computed with these formulae can be used for context-based service ranking. Finally, the reputation database stores the evaluation criteria table, the reputation table and the user evaluation table.
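The aggregation step of Definition 1 can be sketched as follows. Only the averaging of requester reputation values and the mapping onto the seven levels is shown; the underlying CCCI trustworthiness computation is in Hussain et al. and is not reproduced here, and the rounding used to bucket a continuous [0, 6] value into seven levels is our own assumption.

```python
LEVELS = ["cannot determine", "extremely bad", "bad", "minimally good",
          "partially good", "good", "extremely good"]

def ranking_value(reputation_values):
    """Average reputation (each in [0, 6]) across all requesters for one provider."""
    if not reputation_values:
        return 0.0
    return sum(reputation_values) / len(reputation_values)

def ranking_level(value: float) -> str:
    """Map a [0, 6] ranking value onto the seven discrete levels (rounding assumed)."""
    return LEVELS[min(6, int(round(value)))]

scores = [5.5, 6.0, 4.8]   # past requester evaluations of one provider
value = ranking_value(scores)
print(ranking_level(value))  # → good
```

Averaging all past evaluations, rather than using only the latest, is what makes the metric a reputation measure: a single bad transaction is diluted by an otherwise good history.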

Experimental Results and Discussion
To evaluate the proposed methods, we collected an education dataset from the Protégé ontology library (http://protegewiki.stanford.edu/wiki/Protege_Ontology_Library), http://www.danmccreary.com/presentations/semweb/, www.annuniv.edu and the QALLME datasets. This domain includes information about colleges, universities, K-12 students, teachers, schools, districts, enrollments, assessments, food and nutrition programs and on-line courses. To compare the performance of the four algorithms, we adopted four performance indicators from information retrieval, namely precision, mean average precision, recall and F-measure (6); in addition, proper threshold values need to be decided to filter out irrelevant concepts for metadata. The four performance indicators are discussed below. Precision is used to measure the preciseness of a search system. In this experiment, precision P is defined as the number of retrieved relevant data among the retrieved data.

Precision P = (number of retrieved relevant data) / (number of retrieved data)
Before introducing the definition of mean average precision, average precision should be defined. Average precision is the average of the precision values taken at each retrieved relevant datum for a query, given that the data are ranked according to their computed similarity values. This indicator measures how quickly and precisely a search engine works.
Average precision (Q) = (sum of precision values at each retrieved relevant datum) / (number of retrieved relevant data)

Mean average precision refers to the average of the average precision values over a set of queries and can be represented as below.

Mean average precision = (sum of average precision values over all queries) / (number of queries)
Recall is used to measure the effectiveness of a search system. In this experiment, recall R is defined as the ratio of the number of retrieved relevant data to the total number of relevant data in the knowledge base.

Recall R = (number of retrieved relevant data) / (total number of relevant data in the knowledge base)
F-measure combines precision and recall and is used in this paper as an aggregated performance measure for searchers; users can specify a preference for recall or precision by configuring different weights. When the F-measure reaches its highest value, the combined value of precision and recall is also at its highest.
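The four indicators can be computed directly from their definitions above. In this sketch the weighted F-measure uses a weight w in [0, 1] on precision, which is our own parameterisation (the paper only states that users may weight the two); w = 0.5 gives the standard F1.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved data that are relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of all relevant data that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)

def f_measure(p, r, w=0.5):
    """Weighted harmonic mean of precision and recall."""
    if p > 0 and r > 0:
        return 1.0 / (w / p + (1 - w) / r)
    return 0.0

def average_precision(ranked_results, relevant):
    """Mean of precision values taken at each relevant hit in the ranking."""
    hits, precisions = 0, []
    for i, doc in enumerate(ranked_results, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(runs):
    """Average of the per-query average precisions over a query set."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

ranked = ["d1", "d3", "d2", "d5"]   # ranked by similarity, highest first
relevant = {"d1", "d2"}
print(precision(ranked, relevant), recall(ranked, relevant))
print(average_precision(ranked, relevant))
```

Average precision rewards rankings that place relevant data early, which is why the paper uses it to capture "quickness" as well as precision.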

System Evaluation Results
To evaluate the performance of the SCBR model from an information retrieval perspective, it is compared with three models: Extended Case Based Reasoning (ECBR), the Vector Space Model (VSM) and Latent Semantic Indexing (LSI). The mechanisms and algorithms of these models are taken from (20,22). Different queries are made to compare the performance of the systems, and all parameter results are averaged over 100 queries. These queries cover most of the general user requirements in the educational domain.
Threshold values need to be configured to select the most similar concepts by filtering out the concepts with lower similarity values.
In addition, two major tasks are involved in the experiment. The first task is to find an optimal threshold for each IR model. In the search process, after the similarity values between a query and the concepts are computed, a threshold needs to be determined for filtering out the relatively dissimilar concepts to obtain the optimal performance for each model; owing to the differences between the models, the optimal threshold can differ. To choose the optimal threshold, we use the F-measure as the primary scale, with the threshold configured between 0 and 1 in increments of 0.1. The second task is to evaluate the four information retrieval algorithms with their chosen optimal thresholds on the overall performance of the search process, based on the same set of queries. For SCBR with WordNet on, the F-measure peaks at 75.47% at a threshold value of 0.5. With WordNet off, the precision, mean average precision, recall and F-measure values are relatively low. Therefore, the WordNet API increases the performance of the SCBR model and yields a high mean average precision value.
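The first experimental task, sweeping the threshold from 0 to 1 in steps of 0.1 and keeping the value that maximises F-measure, can be sketched as follows. The scored documents here are a toy stand-in for the similarity values produced by any of the four models.

```python
def f1(p, r):
    """Standard harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

def evaluate_at(threshold, scored, relevant):
    """Retrieve everything scoring >= threshold, then measure P and R."""
    retrieved = {doc for doc, s in scored.items() if s >= threshold}
    p = len(retrieved & relevant) / len(retrieved) if retrieved else 0.0
    r = len(retrieved & relevant) / len(relevant) if relevant else 0.0
    return p, r

def optimal_threshold(scored, relevant):
    """Sweep 0.0, 0.1, ..., 1.0 and return (best threshold, best F-measure)."""
    best_t, best_f = 0.0, -1.0
    for step in range(11):
        t = step / 10
        p, r = evaluate_at(t, scored, relevant)
        f = f1(p, r)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f

scored = {"d1": 0.9, "d2": 0.6, "d3": 0.4, "d4": 0.2}
relevant = {"d1", "d2"}
print(optimal_threshold(scored, relevant))  # → (0.5, 1.0)
```

Because raising the threshold trades recall for precision, maximising F-measure picks the point where neither is sacrificed disproportionately, which is why each model ends up with its own optimum.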

Results Discussion
Testing results of the ECBR model: the precision, mean average precision, recall and F-measure values are lower than those of the SCBR algorithm, and the WordNet API decreases the performance of ECBR. Testing results of the VSM model with WordNet on and off: with WordNet on, VSM precision and mean average precision show a generally consistent rise, the only exception occurring at a threshold of 0.7, while recall falls almost linearly from 67.30% to 8.20%; the highest F-measure is obtained at a threshold of 0.4. With WordNet off, performance is reduced on precision at threshold values 0 and 0.1; on mean average precision at 0.7 and 0.8; on recall from 0.1 to 0.4 and from 0.6 to 0.9; and on F-measure at 0, 0.1, 0.3 and from 0.6 to 0.9. Testing results of the LSI model with WordNet on and off: with WordNet on, precision ranges from 4.10% to 76.89%, mean average precision from 59.35% to 87.22%, recall from 81.74% down to 23.95% and F-measure from 7.80% to 36.54%. With WordNet off, performance is reduced on precision at 0.1, 0.2 and 0.3; on recall from 0.3 to 0.9; and on F-measure at 0, 0.1, 0.2, 0.3, 0.4, 0.8 and 0.9. We then compare the performance of the four models, SCBR, ECBR, VSM and LSI, along with other approaches, on the educational domain using the four performance parameters precision, mean average precision, recall and F-measure, finding the optimal threshold value for each model with the same set of queries. Figure 3 shows the comparison of the four models on precision with the WordNet API, which enhances the performance of SCBR. In Figure 4, there is no difference between the ECBR and SCBR models on precision without the WordNet API. Figures 5 and 6 depict the comparison of the four models on mean average precision; mean average precision tests the quickness and precision of the search, and SCBR achieves the best results among the models both with and without the WordNet API. 
Figures 7, 8, 9 and 10 show the comparison of SCBR, ECBR, VSM and LSI on recall and F-measure. It can be seen that the WordNet API increases the performance of SCBR; without the WordNet API, the SCBR and ECBR algorithms have the same performance on recall and F-measure, the difference being that SCBR gives results more efficiently than ECBR and the other two models. It can be concluded from Figure 11 that the SCBR model has the highest overall performance scores, and the WordNet API further enhances the performance of SCBR. From these measurements, our approach (SCBR) provides more accurate search results than the other three models (26) and other approaches (27).

Conclusion and Future Work
This work implemented a reliable and efficient system that suggests to the user all the relevant details about an educational domain. It is reliable because, even when given synonymous or misspelled words as input, it retrieves the same results and does not return irrelevant results. In this paper, we designed a more efficient SCBR algorithm and a QoS-based ranking metric. SCBR is an enhanced version of the ECBR algorithm, restructured to compute similarity values between a query and concepts, and it addresses the low recall rate of the ECBR model. The QoS-based ranking metric is an extension of the extended CCCI metrics and provides quality ranking of services. We compared the performance of the SCBR model with three information retrieval models and other approaches, and the modified algorithm obtains better performance.
In future work, the system can be further refined with more words in the search interface relevant to cognitive systems, which could increase the effectiveness of web-based therapy programs for enhancing occupational performance in the intellectual disability domain. The system can also be extended with more performance indicators that better model user requirements.