Development and Visualization of Domain Specific Ontology using Protege

Background/Objectives: The research aims to explore the differences among various ontology development tools and their languages, and finally develops and visualizes an ontology for a specific domain. Methods/Analysis: A Railway Enquiry System (RES) ontology is developed with the help of the Protege tool and visualized using the TGViz tab. It involves the creation of various classes and their instances so that a person can find answers to a query. Findings: The manuscript introduces readers to the concept of the Semantic Web, because the search performed by today's search engines is based on a keyword extraction technique, which leads to irrelevant and incomplete results marked with low precision and high recall. The developed ontology depicts the real-world scenario of a railway reservation system. With this ontology, a person can check seat availability, train fare details, PNR status and more. Improvements/Applications: The given ontology can be extended to develop a web-based railway tracking application using the Web Ontology Language (OWL) and the Semantic Web Rule Language (SWRL).

Usha Yadav (1), Gagandeep Singh Narula (2)*, Neelam Duhan (3), Vishal Jain (4) and B. K. Murthy (1)
(1) Centre for Development of Advanced Computing (CDAC), B-30, Noida 201307, Uttar Pradesh, India; ushayadav@cdac.in, bkm@cdac.in
(2) Computer Science and Engineering, Centre for Development of Advanced Computing (CDAC), B-30, Noida 201307, Uttar Pradesh, India; gagan.narula87@gmail.com
(3) Department of Computer Engineering, YMCA University of Science and Technology, Faridabad 121006, Haryana, India; neelam_duhan@rediffmail.com
(4) Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi 110063, India; vishaljain83@ymail.com


Introduction
The World Wide Web (WWW) is a distributed repository of millions of documents covering a wide range of multidisciplinary information, and extracting or retrieving particular information from these documents is a cumbersome job. Two terms are often confused in this context. Information Retrieval means retrieving information from millions of documents irrespective of whether the documents are relevant, while Information Extraction means extracting information from relevant documents. The WWW is the largest information construct and has advanced from Web 1.0 to Web 4.0. Web 1.0 is the first generation of the web, which is read-only and static [1]. Web 2.0 is the second generation, known as the Social or Read/Write web [2]. Web 3.0 is the third generation, known as the Semantic Web (SW) [3]. Up to this generation, machines are not intelligent: they perform tasks only on the basis of user input. Web 4.0 is the fourth generation, known as the Symbiotic Web. It is expected to make machines behave intelligently by reading the contents of the web and producing the information that loads a website faster [4].
To increase the degree of relevance of search results, there is a need to move towards the Semantic Web (Web 3.0) and ontology. In broad terms, the Semantic Web is known as a Global Information Mesh that consists of annotated documents represented in a language friendly to both humans and machines. It narrows the gap between humans and machines. An ontology represents the relationships among classes, properties and instances in a hierarchical fashion. Table 1 illustrates the differences among the various generations of the web. The paper is organized as follows: Section 2 presents brief information about the Semantic Web and its layout. Section 3 defines ontology, from its components to its development tools and languages, and includes a comparative study of these tools and languages. Section 4 presents a case study on the Railway Enquiry System (RES), whose ontology is developed with the help of the Protege tool.

Semantic Web (SW)
The idea of the SW was proposed in 1996 by Tim Berners-Lee, the inventor of the WWW, and aims to convert present information into a machine-friendly language [5]. In simple terms, it is a repository of information together with the languages used to present that information.

Architecture
Its layout consists of the following components:
• Unicode and URI: Unicode represents each character uniquely, while a URI (Uniform Resource Identifier) identifies a resource in a standard syntactic format.
• XML: It stands for Extensible Markup Language and provides namespaces and schemas to define the structure of data on the web.
• Resource Description Framework (RDF): It is used for describing information in the form of data models, which in turn consist of triples, viz. Subject, Predicate and Object. An example of RDF is given in Figure 1.
• RDFS: It stands for RDF Schema and acts as a vocabulary language to represent RDF data models and draw inferences from them.
• Ontology: It is defined as a set of terms used to describe a given domain and derive inferences from it.
• Logic and Proof: In this layer, agents can make inferences about the requirements of given resources with the help of inference systems [6].
• Trust: It signifies the assurance of, and degree of confidence in, the information [7].
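The RDF layer described above can be illustrated with a minimal sketch in plain Python. It models a graph as a set of (subject, predicate, object) triples and answers a simple pattern query. The namespace, resource names and the `match` helper are assumptions made for illustration only; they are not taken from the RES ontology or from any RDF library.

```python
from typing import List, NamedTuple, Optional, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # the RDF "object"; renamed to avoid shadowing the Python builtin


# Hypothetical namespace and resources, for illustration only.
EX = "http://example.org/res#"

graph: Set[Triple] = {
    Triple(EX + "TrainA", EX + "departsFrom", EX + "StationX"),
    Triple(EX + "TrainA", EX + "arrivesAt", EX + "StationY"),
    Triple(EX + "TrainB", EX + "departsFrom", EX + "StationY"),
}


def match(
    triples: Set[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if subject in (None, t.subject)
        and predicate in (None, t.predicate)
        and obj in (None, t.obj)
    )


# All departure statements in the graph, in sorted order.
departures = match(graph, predicate=EX + "departsFrom")
for t in departures:
    print(t.subject, "departs from", t.obj)
```

A real application would more likely use an RDF library and a serialization such as RDF/XML or Turtle; the tuple representation here is only meant to show the triple structure.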

Ontology
The word ontology is derived from two Greek words: onto, which means "being", and logia, which means "written or spoken discourse". Ontology has a wide range of definitions, from philosophy to artificial intelligence. A common definition is abbreviated as FESC: a formal, explicit specification of a shared conceptualization [8]. An ontology consists of the following components:

• A set of concepts
These can be the nodes in the representation of ontologies.

• A set of properties
A node (that is, a concept or class) may or may not have properties related to it; properties can also be viewed as the values of a concept.

• A set of relational properties
It denotes a relationship between two or more concepts or nodes, which generally yields a hierarchical path from one concept to another.

• Hierarchy of concepts
Sub-concept/super-concept relationships.

• Hierarchy of properties
Sub-property/super-property relationship.

• A subset of symmetric properties
It defines the set of properties in a concept that have the same values and the same functionality.

• Transitive property relation
A relation is transitive if, whenever property A is related to property B and property B is related to property C, property A is necessarily related to property C.

• Symmetry and inverse symmetry relations among properties

• Domain values related to properties
It defines the class at the level of the properties; concepts that share the same property values have the same domains.

• Range values related to properties
Range is a characteristic of the concepts, which can be an interval, a list of elements or simply a character.

• Minimum and maximum cardinality for each concept-property pair
In set theory, cardinality is the number of elements in a set. Here, cardinality is a non-negative number associated with each concept that shows how many properties are associated with that concept. The minimum and maximum cardinality bound this number for the properties associated with any concept.
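The property characteristics listed above (transitivity, symmetry, domain, range and cardinality) can be made concrete with a short sketch. It treats each property as a set of (subject, value) pairs, which is one possible reading of the definitions above rather than the representation used by Protege. All names and data are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def transitive_closure(pairs: Iterable[Pair]) -> Set[Pair]:
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing changes."""
    closure = set(pairs)
    while True:
        new_pairs = {
            (a, d) for a, b in closure for c, d in closure if b == c
        } - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


def symmetric_closure(pairs: Iterable[Pair]) -> Set[Pair]:
    """Add (b, a) for every (a, b)."""
    pairs = set(pairs)
    return pairs | {(b, a) for a, b in pairs}


def respects_domain_and_range(
    pair: Pair, types: Dict[str, str], domain: str, range_: str
) -> bool:
    """Check that the subject belongs to the domain class and the value to the range class."""
    subject, value = pair
    return types.get(subject) == domain and types.get(value) == range_


def cardinality_violations(
    subjects: Iterable[str], pairs: Iterable[Pair], min_card: int, max_card: int
) -> List[str]:
    """Return the subjects whose number of values lies outside [min_card, max_card]."""
    pairs = list(pairs)
    violations = []
    for subject in subjects:
        count = sum(1 for s, _ in pairs if s == subject)
        if not min_card <= count <= max_card:
            violations.append(subject)
    return violations


# Hypothetical example data.
located_in = {("PlatformP", "StationX"), ("StationX", "CityC")}  # transitive
connected_to = {("StationX", "StationY")}                        # symmetric
departs_from = {("TrainA", "StationX")}                          # Train -> Station
types = {"TrainA": "Train", "StationX": "Station", "StationY": "Station"}

print(transitive_closure(located_in))
print(symmetric_closure(connected_to))
print(cardinality_violations(["TrainA", "TrainB"], departs_from, 1, 1))
```

In this sketch, `TrainB` violates a cardinality of exactly one for `departs_from` because it has no departure station recorded.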

Basic Steps for Building Ontologies
• Determine scope: It includes defining the structure and values associated with the ontology.
• Consider re-using: Existing ontologies can be re-used for defining the schema of a new ontology.
• Enumerate terms: Clearly specify, in a structured list, all the terms that define the domain and range of the ontology.
• Define taxonomy: After specifying the terms, it is necessary to organize them in a hierarchical fashion. If A is a subclass of B, then every instance of A must be an instance of B.
• Define properties: It is the most important step, organizing the properties that link the classes while these classes are organized in a hierarchy.
• Define facets: The ontology will only require the expressivity provided by RDF Schema and does not use any of the additional primitives in OWL.
• Define instances: Ontologies are used to organize sets of instances [9].
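The taxonomy rule in the steps above (if A is a subclass of B, every instance of A is also an instance of B) can be sketched as follows. The class and instance names are hypothetical and are not claimed to be the classes of the RES ontology.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy: each class maps to its direct superclass (None for a root).
subclass_of: Dict[str, Optional[str]] = {
    "Train": None,
    "ExpressTrain": "Train",
    "PassengerTrain": "Train",
    "SuperfastTrain": "ExpressTrain",
}

# Hypothetical instances, each asserted under its most specific class.
instance_of: Dict[str, str] = {
    "trainA": "SuperfastTrain",
    "trainB": "PassengerTrain",
}


def superclasses(cls: str) -> List[str]:
    """Return all ancestors of a class, nearest first."""
    chain = []
    parent = subclass_of.get(cls)
    while parent is not None:
        chain.append(parent)
        parent = subclass_of.get(parent)
    return chain


def is_instance_of(individual: str, cls: str) -> bool:
    """An individual belongs to its asserted class and to every superclass of it."""
    direct = instance_of.get(individual)
    if direct is None:
        return False
    return cls == direct or cls in superclasses(direct)


print(superclasses("SuperfastTrain"))
print(is_instance_of("trainA", "Train"))
print(is_instance_of("trainA", "PassengerTrain"))
```

The sketch assumes single inheritance for brevity; ontology languages such as OWL also allow a class to have several superclasses.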

How to use Ontology
The usage of an ontology depends on the level at which it is applied.
Level 1: As a vocabulary language for interaction among multiple agents in a distributed scenario.
Level 2: As a database schema that holds information about classes, properties and instances. Data can be retrieved easily from the database by accessing its schema.

Ontology Development Languages
The following are types of ontology languages used in the Semantic Web.
• LOOM [10]: It is a knowledge representation language based on description logics and rules that builds concepts automatically.
• SHOE [11]: It is used to extract relevant information from web documents. It also combines knowledge representation data and ontological features.

Ontology Development Tools
In general, ontology development includes specification, design and formalization phases, which are treated like the phases of the Software Development Life Cycle (SDLC) [16]. Table 4 lists the differences among various ontology editors [17].

Case Study
The paper presents a Railway Enquiry System (RES) ontology that describes the terms involved in a railway reservation system. A person can look up a train, check seat availability or check the fare, but cannot book a ticket.
The developed ontology is partial (it only shows the terms used in the ontology) and describes a real-world phenomenon, the Railway Enquiry System (RES).
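The enquiry-only behaviour described above can be sketched as a small read-only model. This is an illustration under stated assumptions, not the RES ontology itself: the class names (`Train`, `Ticket`, `RailwayEnquiry`), the attribute names and all data values are hypothetical. The sketch offers look-ups for seat availability, fare and PNR status and deliberately provides no booking operation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Train:
    number: str
    seats_available: Dict[str, int]  # travel class -> free seats
    fare: Dict[str, float]           # travel class -> fare


@dataclass
class Ticket:
    pnr: str
    train_number: str
    status: str


class RailwayEnquiry:
    """Read-only enquiries; there is intentionally no method that books a ticket."""

    def __init__(self, trains: Iterable[Train], tickets: Iterable[Ticket]) -> None:
        self._trains = {t.number: t for t in trains}
        self._tickets = {t.pnr: t for t in tickets}

    def seat_availability(self, number: str, travel_class: str) -> Optional[int]:
        train = self._trains.get(number)
        return None if train is None else train.seats_available.get(travel_class)

    def fare(self, number: str, travel_class: str) -> Optional[float]:
        train = self._trains.get(number)
        return None if train is None else train.fare.get(travel_class)

    def pnr_status(self, pnr: str) -> Optional[str]:
        ticket = self._tickets.get(pnr)
        return None if ticket is None else ticket.status


# Hypothetical instances, for illustration only.
enquiry = RailwayEnquiry(
    trains=[Train("T001", {"Sleeper": 12, "AC": 0}, {"Sleeper": 350.0, "AC": 900.0})],
    tickets=[Ticket("PNR0001", "T001", "Confirmed")],
)

print(enquiry.seat_availability("T001", "Sleeper"))
print(enquiry.fare("T001", "AC"))
print(enquiry.pnr_status("PNR0001"))
```

Unknown trains, classes or PNR numbers return `None` rather than raising an error, which keeps the enquiry interface simple for a caller that only displays results.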

Screen shots
Tool used: Protege 3.4 beta. It was created at Stanford University [19] and is an open-source knowledge acquisition system written in Java [20].

Conclusion and Future Scope
Ontology is treated as the main constituent of the Semantic Web: it allows an explicit, well-defined understanding of concepts among agents and supports the analysis of domain knowledge. The paper first describes the evolution of the WWW from Web 1.0 to Web 4.0 and then the concepts of the Semantic Web and ontology. In addition, the differences among various ontology development tools and languages are listed. Lastly, the paper presents a case study on the Railway Enquiry System (RES) and defines its classes, properties and instances by developing the ontology in Protege 3.4 beta and visualizing it using the TGViz tab.
As future work, knowledge can be extracted from the developed ontology by importing it into an IDE such as Eclipse, NetBeans or IntelliJ with the help of an open-source framework such as Jena or Sesame. A user GUI can also be designed that helps in document classification [21] and promotes e-learning with the help of Semantic Web technologies [22].