Cloud and Grid Part I: Difference and Convergence

The first part of this paper is concerned with the generalized models of grid and cloud architectures and their mutual convergence. We study examples of contemporary grids and clouds and next-generation mechanisms for unifying them into a global confederation of clouds. The types and properties of virtual resources, as well as approaches to their appropriate and efficient management in virtual machines, are considered in the second part of the paper.


Introduction and Overview of the Literature
The idea of interaction between computers dates back to the 1960s, when the first commercially available modem was released. Further technical and algorithmic advances allowed one not only to exchange messages, but also to access databases, file storage and computational resources via standard protocols that are widely used nowadays. Initially, emerging technologies defined user needs. However, user demands have raced against available capabilities ever since, and they are now on the point of outgrowing them. Meeting these needs has become a global challenge for modern applied technology and science. The most promising solutions lie in the area of distributed computing. Current distributed computing infrastructures originate from parallel clusters and are divided into the following classes: grids and clouds. A cluster is usually considered to be a group of computers deployed in a single location and tightly interconnected via a high-bandwidth network 51 . The works 12,50 define clusters as follows: "A cluster is a type of parallel and distributed system, which consists of a collection of inter-connected standalone computers working together as a single integrated computing resource." The nodes constituting the cluster are homogeneous in software and hardware specifications. The cluster operates mostly in shared memory mode and provides an interface that mimics a single physical machine.
There is no strict, standard and widely accepted definition of grids 24,26,10 ; thus, their attributes sometimes overlap with those of clouds, and the differences between the two are somewhat fuzzy. Nevertheless, various classifications are presented in 24,29,53,63,36 .
A grid is conceived as a collection of loosely interconnected heterogeneous computers at different locations, with varying operating systems and hardware, under multiple ownership and decentralized management 29,36 . A grid node can be either a single machine or a whole cluster. Buyya 13 gave one of the popular definitions of a grid: "A Grid is a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed 'autonomous' resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements." The grid definition given in 10 is "a large-scale geographically distributed hardware and software infrastructure composed of heterogeneous networked resources owned and shared by multiple administrative organizations which are coordinated to provide transparent, dependable, pervasive and consistent computing support to a wide range of applications. These applications can perform either distributed computing, high throughput computing, on-demand computing, data-intensive computing, collaborative computing or multimedia computing." According to 32 , a grid is designed for scheduled, computationally intensive operations on a few large allocation requests. It is commonly agreed that a grid should provide sharing of computational resources, storage elements, specific applications and equipment that are not subject to centralized control, via standard, open, general-purpose protocols and interfaces 24,53 . Implementing these features adds a layer of abstraction over the cluster. However, this level of abstraction is still insufficient for ordinary users to handle general-purpose tasks effectively.
A cloud is built upon a grid and is conceived as its generalization, aimed at resolving the complexities of the former. As stated in 29 , cloud computing is: "A large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically-scalable, managed computing power, storage, platforms, and services are delivered on demand to external customers over the Internet". According to Buyya, "A Cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resource(s) based on service-level agreements established through negotiation between the service provider and consumers." In contrast to grid hardware resource sharing, the cloud provides its resources as service models at high layers of abstraction, namely: infrastructure as a service (IaaS), platform as a service (PaaS) and software as a service (SaaS). These terms are formally defined by NIST (2011). These service models are part of the broader principle of everything as a service (XaaS). Examples of this concept are the intermediate layers "cluster as a service" 20,66 and "grid as a service" 1 between infrastructure and platform services. Unified interfaces allow one to form a federation of clouds. Since no theoretical limits on scalability are known, the extreme case of such a federation would be a global cloud of all existing computational devices, making up the next generation of the Internet 32 .
The first part of this paper gives an overview of the generalized models of grid and cloud architectures and of their mutual convergence via virtualization, along with the obstacles that arise, such as performance degradation of various kinds. In the second part of the study, we consider the types and properties of virtual resources along with their appropriate management, and propose a novel ballooning approach to memory balancing on nested virtual machines.

Resource Virtualization on Physical Machine
Modern hardware performance is sufficient to emulate multiple computer systems on a single physical machine. Such an emulated computer system is called a virtual machine (VM), and it is capable of running the majority of applications in a manner indistinguishable from that of a real counterpart. This term should not be confused with the Java or Inferno virtual machines, which only provide a runtime environment for the corresponding program code. The software, firmware, or hardware that runs virtual machines is called a hypervisor, virtualization engine, virtualization module, or virtualization system. A physical machine running a hypervisor is called a host. Software running in an emulated environment is called a guest. A hypervisor maps available physical resources to virtual ones and distributes them among sibling virtual machines. The following virtual resources are distributed: CPU, memory, storage, and network. CPU distribution relies on thread scheduling. To overcome single-thread bottlenecks and achieve high performance, the scheduler should utilize hyper-threading and multi-core technologies, as well as emerging chip-level multiprocessing 9 .
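As an illustration of how a hypervisor might divide physical CPU capacity among sibling VMs, the following Python sketch computes a weighted proportional share in which no VM receives more cores than it has virtual CPUs, and any surplus is redistributed among the remaining VMs (water-filling). The function name `share_cpu`, the weights and the vCPU caps are hypothetical; real hypervisor schedulers operate on time slices of threads rather than on a static split, so this is only a model of the intended allocation.

```python
from typing import Dict

def share_cpu(capacity: float, weights: Dict[str, float],
              vcpus: Dict[str, int]) -> Dict[str, float]:
    """Split `capacity` physical cores among VMs in proportion to `weights`,
    capping each VM at its vCPU count and redistributing the surplus."""
    alloc = {vm: 0.0 for vm in weights}
    active = set(weights)
    remaining = capacity
    while remaining > 1e-9 and active:
        total_w = sum(weights[vm] for vm in active)
        # VMs whose proportional share would exceed their vCPU count
        capped = {vm for vm in active
                  if remaining * weights[vm] / total_w >= vcpus[vm] - alloc[vm]}
        if not capped:
            # Nobody hits a cap: hand out the rest proportionally and stop.
            for vm in active:
                alloc[vm] += remaining * weights[vm] / total_w
            break
        # Give capped VMs exactly their cap and retry with the others.
        for vm in capped:
            remaining -= vcpus[vm] - alloc[vm]
            alloc[vm] = float(vcpus[vm])
        active -= capped
    return alloc


# Hypothetical host with 8 cores and three VMs.
alloc = share_cpu(8.0, weights={"a": 1.0, "b": 1.0, "c": 2.0},
                  vcpus={"a": 1, "b": 4, "c": 8})
print(alloc)  # "a" gets 1.0 (its cap); "b" and "c" split the remaining 7 cores 1:2
```

Capping at the vCPU count reflects that a VM cannot use more parallel cores than it has virtual CPUs, so handing it more would waste capacity that other guests could use.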
Storage distribution is based on the concept of a "virtual hard disk", a file format representing an image of a storage device within the corresponding VM. Virtual storage introduces additional throughput and latency bottlenecks due to abstraction from the underlying physical devices and shared access to them 37 . One approach to this issue is to refashion the traditional disk scheduling algorithms of the guest OS into a workload-oriented form while restricting their hypervisor counterpart to minimal activity 11 . The paper 40 shows that the host-guest combination of nested file systems also affects disk I/O performance depending on the prevailing workload pattern, and therefore has to be selected carefully by experiment.
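Why request ordering matters can be illustrated with a toy single-disk model, assuming that guest block addresses map roughly linearly to physical track positions (for image files this is often not the case). The sketch compares the total head movement of FIFO order, which a minimal pass-through hypervisor scheduler would preserve, with a LOOK (elevator) order computed on the guest side. The helpers `seek_distance` and `look_order` and the track numbers are hypothetical; this is not the algorithm of 11 .

```python
from typing import List

def seek_distance(start: int, order: List[int]) -> int:
    """Total head movement (in tracks) when serving `order` from `start`."""
    pos, total = start, 0
    for track in order:
        total += abs(track - pos)
        pos = track
    return total

def look_order(start: int, requests: List[int]) -> List[int]:
    """LOOK (elevator) ordering: sweep upward from `start`, then back down."""
    up = sorted(t for t in requests if t >= start)
    down = sorted((t for t in requests if t < start), reverse=True)
    return up + down


requests = [95, 10, 60, 20, 80]  # hypothetical pending requests, in arrival order
print(seek_distance(50, requests))                  # FIFO: 280
print(seek_distance(50, look_order(50, requests)))  # LOOK: 130
```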
Network resources are distributed via virtual networking, which simulates various network infrastructures on top of existing hardware components. Virtual networking is also subject to performance degradation arising from traffic interference between multiple guests on a single host 47 . Non-optimal management of proportional sharing by conventional I/O schedulers leads to excessive triggering of congestion avoidance and results in additional delays 38 . One of the proposed solutions introduces the concept of Differential Virtual Time (DVT) and implements a latency-smoothing host I/O scheduler, preserving fair proportionality and improving performance isolation across VMs.
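Proportional sharing of I/O among VMs is commonly expressed through virtual time. The following Python sketch is a generic virtual-time fair-queuing dispatcher, not the DVT scheduler itself: each VM's virtual clock advances by request size divided by its weight, and the backlogged VM with the smallest virtual time is served next. The function name, request sizes and weights are assumptions for illustration, and the sketch assumes all queues are backlogged from the start, so it omits the virtual-time adjustment real schedulers apply to VMs that become active later.

```python
import heapq
from collections import deque
from typing import Dict, List, Tuple

def fair_dispatch(queues: Dict[str, List[int]],
                  weights: Dict[str, float]) -> List[Tuple[str, int]]:
    """Order per-VM I/O requests (sizes in KiB) by virtual time so that each
    backlogged VM receives service in proportion to its weight."""
    vtime = {vm: 0.0 for vm in queues}
    pending = {vm: deque(reqs) for vm, reqs in queues.items()}
    heap = [(0.0, vm) for vm in queues if pending[vm]]
    heapq.heapify(heap)
    order: List[Tuple[str, int]] = []
    while heap:
        _, vm = heapq.heappop(heap)          # VM with the smallest virtual time
        size = pending[vm].popleft()
        order.append((vm, size))
        vtime[vm] += size / weights[vm]      # larger weight => slower virtual clock
        if pending[vm]:
            heapq.heappush(heap, (vtime[vm], vm))
    return order


# Hypothetical: two VMs with equal 4 KiB requests, VM "a" weighted twice as heavily.
order = fair_dispatch({"a": [4] * 4, "b": [4] * 4}, {"a": 2.0, "b": 1.0})
print([vm for vm, _ in order])  # ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'b']
```

While both VMs are backlogged, "a" is served twice as often as "b"; once "a" drains, "b" receives the whole device.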
Memory virtualization is the most challenging of these resource mappings. The mapping has three levels of abstraction 62 :
• Host physical memory. It is used by the hypervisor and treated as the memory available on the system.
• Guest physical memory. It is maintained by the hypervisor as a contiguous addressable memory space and used by the guest OS running on the VM.
• Guest virtual memory. It is provided by the guest OS to applications and used by them.
Some researchers consider the total host physical memory to be the sum of fast volatile RAM and virtual pages on lower-bandwidth media such as magnetic or flash disks. However, disk swapping on a host results in a significant performance drop and indicates resource exhaustion, which should be avoided. The host operating system cannot manage the memory of virtual machines through the hypervisor 61 , since it cannot take pages from a guest transparently: the guest would be unaware of the mapping change and would continue to work with memory that no longer belongs to it. Such a situation could lead to unpredictable damage. Therefore, the hypervisor has to take care of memory scheduling: it obtains the resource from the host OS and redistributes it among virtual machines. The most effective policy is to provide the resource in accordance with each guest's demand. However, estimating this demand is in turn a complex problem, known as the problem of physical memory size estimation.
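The three levels of memory abstraction can be illustrated by a two-stage address translation. In the Python sketch below, one dictionary stands for the guest OS page table (guest virtual to guest physical pages) and another for the hypervisor's mapping (guest physical to host physical pages); the 4 KiB page size, the names and the page numbers are assumptions. Real systems perform this walk with hardware-assisted nested paging or shadow page tables, but the model shows that the second mapping is invisible to the guest, which only ever sees its own page table.

```python
from typing import Dict

PAGE_SIZE = 4096  # assumed page size in bytes

def translate(gva: int, guest_page_table: Dict[int, int],
              hypervisor_map: Dict[int, int]) -> int:
    """Map a guest virtual address to a host physical address in two stages."""
    vpn, offset = divmod(gva, PAGE_SIZE)
    gppn = guest_page_table[vpn]   # stage 1 (guest OS): guest virtual -> guest physical page
    hppn = hypervisor_map[gppn]    # stage 2 (hypervisor): guest physical -> host physical page
    return hppn * PAGE_SIZE + offset


guest_pt = {0: 5, 1: 7}          # hypothetical guest page table
host_map = {5: 100, 7: 42}       # hypothetical hypervisor mapping
print(translate(4100, guest_pt, host_map))  # 172036 (host page 42, offset 4)

# The hypervisor moves guest page 7 to host page 13; the guest's table is unchanged.
host_map[7] = 13
print(translate(4100, guest_pt, host_map))  # 53252
```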
The following approaches to physical memory size optimization exist: content-based sharing, ballooning, memory compression, and page replacement. These algorithms are described in detail in 44,64 . Ballooning, considered in 45 , is widely regarded as the most promising method.
The idea behind ballooning is to provide the guest OS with an auxiliary driver that, at the hypervisor's request, reclaims guest physical memory for the hypervisor by inflating or deflating a balloon within guest memory like any ordinary application. The balloon size at which the guest OS starts page swapping approximates the number of guest physical pages not used by processes other than the balloon itself. A detailed description of this technique is provided in 45 . We consider ballooning and other resource balancing techniques in the second part of the study.
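The estimation principle described above can be sketched as a simple inflation loop. In this hypothetical Python model, the hypervisor grows the balloon in fixed steps and stops just before the guest would begin swapping. The probe `guest_swaps` stands for whatever swap-activity signal the guest exposes and is an assumption, as are the page counts in the example; this is not the ballooning approach proposed in the second part of the paper.

```python
from typing import Callable

def estimate_unused_pages(guest_swaps: Callable[[int], bool],
                          step: int = 64,
                          limit: int = 1 << 20) -> int:
    """Inflate a (simulated) balloon in `step`-page increments and return the
    largest balloon size that did not make the guest swap.

    `guest_swaps(size)` is a hypothetical probe: True if a balloon of `size`
    pages would push the guest into page swapping. `limit` bounds the search.
    """
    balloon = 0
    while balloon + step <= limit and not guest_swaps(balloon + step):
        balloon += step  # inflate: the guest hands `step` more pages to the balloon
    return balloon  # approximates guest pages unused by other processes


# Hypothetical guest: 1024 physical pages, 700 of them in use by applications.
TOTAL_PAGES, USED_PAGES = 1024, 700
print(estimate_unused_pages(lambda size: USED_PAGES + size > TOTAL_PAGES))  # 320
```

The true number of free pages here is 324; the estimate undershoots by less than one step, so the step size trades estimation accuracy against the number of inflation rounds.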

The Generalized Grid Model
Classical grids are described and discussed in multiple publications. The properties of a grid are briefly tabulated in 15 . The grid abstraction layer hides variations in the underlying basic technologies (e.g. computer clusters, storage managers, application services, etc.) and is provided by the middleware, which implements a set of services and protocols to aggregate resources in a grid. Middleware services perform information discovery and monitoring, resource management, security policies, grid scheduling, load balancing, and data management 2 .
The architecture of these services is defined by the following objective-specific grid categories 39 :
• Computational grids, delivering application performance via supercomputing and high throughput;
• Data grids, improving data access;
• Service grids, providing enhanced on-demand, collaborative, and multimedia services.
The computational grid category consists of distributed systems that grant single applications a high aggregate computational capacity far exceeding that of any single employed machine. Supercomputing mode executes a single application instance in parallel to reduce the overall completion time. High throughput mode increases the completion rate of a stream of job tasks.
The data grid category is devised to provide a specialized infrastructure for intensive information processing, such as synthesis of new data from digital libraries in a wide area network. The main data grid activity comprises unified infrastructure-based data services across repositories, while computational grids rely on application-based implementations of storage management and data access schemes. Typical data grid tasks include large-scale data mining to correlate information from multiple different data sources. The main developers and administrators of large-scale data organization, catalog, management, and access technologies are the European Grid Infrastructure (http://www.egi.eu/) and the Globus Alliance (https://www.globus.org/).
The service grid category denotes systems that provide distributed yet collective services unavailable by means of any single machine. A collaborative grid manages collaborative workgroups, allowing users and applications to interact in real time within a virtual workspace. An on-demand grid dynamically allocates various resources to provide new services or scale up existing ones. A multimedia grid provides an infrastructure for real-time multimedia applications. This involves mandatory support of distributed QoS (Quality of Service), in contrast to a single dedicated machine, where such functionality is optional.
Large-scale grids, especially data grids, generally employ data availability mechanisms, initially intended to decrease data access latency and network bandwidth consumption via replication. Studies 52 indicate that different replica strategies (e.g. Best Client, Cascading Replication, Plain Caching, Caching plus Cascading Replica and Fast Spread) are best suited to different user access patterns (e.g. random access, small temporal locality, and small geographical and temporal locality). Economic auction-based models for long-term optimal replica decisions 8 as well as network-proximity-based dynamic replication HBR 49 have also been developed. Since contemporary technologies have dramatically reduced data access latency, modern replication methods mainly address grid reliability. The authors of 41 address the system-wide data availability problem, present two new data availability metrics (the System File Missing Rate and the System Bytes Missing Rate) and propose a novel heuristic algorithm that minimizes the Data Missing Rate (MinDmr) under limited replica storage.
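To make one of the listed strategies concrete, the Python sketch below models Fast Spread in a tree-shaped data grid: a request climbs from the client towards the root until it reaches a node holding the file, and a replica is then placed on every node along the path back to the client. The node names, the function name and the unlimited storage at each node are assumptions; practical implementations also need a replacement policy when replica storage is limited.

```python
from typing import Dict, Optional, Set

def fast_spread_request(client: str, file_id: str,
                        parent: Dict[str, Optional[str]],
                        store: Dict[str, Set[str]]) -> int:
    """Serve `file_id` to `client` and replicate it along the return path.

    Returns the number of hops the request travelled up the hierarchy.
    """
    path = []
    node: Optional[str] = client
    while node is not None and file_id not in store[node]:
        path.append(node)
        node = parent[node]
    if node is None:
        raise KeyError(f"{file_id} is not stored anywhere in the hierarchy")
    for hop in path:
        store[hop].add(file_id)  # Fast Spread: replicate at every node on the way back
    return len(path)


# Hypothetical hierarchy: root -> region -> two clients; only root holds "x".
parent = {"root": None, "region": "root", "c1": "region", "c2": "region"}
store = {"root": {"x"}, "region": set(), "c1": set(), "c2": set()}
print(fast_spread_request("c1", "x", parent, store))  # 2: c1 -> region -> root
print(fast_spread_request("c2", "x", parent, store))  # 1: served by the replica at region
```

The second request is shorter because the first one left a replica at the shared intermediate node, which is the reason Fast Spread favours access patterns with geographical locality.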
One of the current challenges is the development of general-purpose grid systems that possess the capabilities of all the above-mentioned objective-specific grid categories. A classical grid can be straightforwardly upgraded to the generalized version by substituting physical machines with their virtual counterparts 23,57 . Such a replacement creates a new, higher layer of abstraction, allowing dynamic instantiation and migration of virtual machines (VM). This generalized grid model regards entire computing environments as three independent logical entities (computation, state, and user data), which are handled as traditional OS processes and files and can be mapped onto servers of the corresponding types. The VM hosts run dynamic VM guests (VM images) and represent the computational resource. The data servers handle user data and represent the storage resource. The image servers compress and archive static VM states and represent the memory management resource. Various benchmark results indicate that the total overhead of a deployed virtual grid can stay at an acceptable level of 4.2 percent 23 .
The In-VIGO system 3 is another example of decoupling the hardware architecture and the operational behavior of software resources from their physically implemented instances by means of virtualization technologies. The In-VIGO approach to virtualization is depicted in Figure 1. In-VIGO augments the traditional grid computing model with three additional layers of virtualization. The first layer aggregates the elementary components of a virtual computing grid into pools of virtual resources such as virtual machines, virtual data, virtual applications and virtual networks. This layer maps jobs to virtual resources (VMs) that are managed across domains and physical environments (e.g. physical machines with various OS at different locations). The second layer instantiates grid applications as services connected on demand to create virtual information grids. This layer supports multiple grid-computing mechanisms (e.g. Globus, Condor-G, .NET and JXTA) to run applications and employs encapsulation to compose them as services (e.g. via OGSI, OGSI.NET and Jini) and hide implementation details 27 . The third layer manages virtualization of the interfaces (e.g. XML and UIML) of aggregated services, in order to customize their display on various devices (e.g. as HTML for a laptop, WHML for a palmtop and WAP WML for a cell phone). In other words, the first layer decouples resource allocation for applications from job management, the second layer decouples service composition and usage from the execution management of the underlying applications, and the third layer decouples the generation of service interfaces from their device-specific rendering.
At present, only the first layer has been successfully implemented, with the exception of virtual networks. Users are allowed to develop applications and are provided with interactive and batch-oriented interfaces for grid-enabled tools. The proposed second and third layers de facto aim to implement the functional paradigm of clouds.

The Generalized Cloud Model
Classical clouds present the most recognized solutions for reliable processing and storage of large amounts of general-purpose data. For example, the vast impact of emerging cloud technologies is stated in 6 : "The first similarity between cloud computing and traditional utility models, electricity or telephony, for example, is that they all have characteristics of a disruptive general-purpose technology which make a surge of associated innovation possible." In contrast to classic grids, the cloud provides users not with machine resources granted on schedule, but with services on demand. The cloud properties can be described by the following list 29,30,63 :
• The business model is customer-oriented. Payments are usually defined a posteriori by the consumption level (pay-as-you-go model).
Cloud computing infrastructures are divided into several categories: commercial clouds (e.g. Amazon EC2), scientific clouds (e.g. Nimbus) and open-source technologies (e.g. Eucalyptus, Globus VWS). According to the NIST definitions of deployment models 32,34,46 , a cloud can also be classified as private (internal), like Eucalyptus, or public (external), like Amazon EC2. Since these categories and classifications are not always mutually exclusive, many hybrid clouds exist, such as CLEVER, GoGrid, VOC, OpenNebula, and Globus Nimbus 2 .
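The a posteriori, pay-as-you-go charging mentioned above amounts to metering consumption first and pricing it afterwards. The minimal Python sketch below sums metered amounts multiplied by per-unit rates; the resource names and prices are hypothetical and do not reflect any provider's actual tariff. `Decimal` is used so that monetary sums are exact.

```python
from decimal import Decimal
from typing import Dict, Iterable, Tuple

def pay_as_you_go_bill(usage: Iterable[Tuple[str, Decimal]],
                       rates: Dict[str, Decimal]) -> Decimal:
    """Charge a posteriori: sum metered consumption times the unit rate per resource."""
    total = Decimal("0")
    for resource, amount in usage:
        total += amount * rates[resource]
    return total


# Hypothetical unit prices and one month of metered usage.
rates = {"vm_hour": Decimal("0.05"), "storage_gb_month": Decimal("0.02")}
usage = [("vm_hour", Decimal("720")), ("storage_gb_month", Decimal("100"))]
print(pay_as_you_go_bill(usage, rates))  # 38.00
```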
Clouds usually provide the following basic service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) 19 . The first model provides computer infrastructure in the form of hardware, such as storage, servers and data center space, or network components. The second model offers access to an OS environment to develop and run applications, while hiding all infrastructure aspects from the user. The third model implements the highest abstraction level, exposing to the user only typical, commonly used software applications.
Much like their grid counterparts, clouds follow the trend of generalization. For example, the software solution of Ravello Systems takes advantage of nested virtualization (Figures 2-3), which provides a mechanism to construct a unified intercloud storage. In particular, this advanced virtualization made it possible to acquire on demand 4000 identical VMs from AWS and 1000 from Google Cloud Platform to organize training for network engineers, with each workspace consisting of five VMs 22 .

Model Convergence
Grids and clouds share similar features and purposes in many respects. For example, the computational grid corresponds to cloud computing, the data grid resembles cloud storage, and the interaction grid matches cloud collaboration. Both kinds of distributed systems consist of similar typical elements and processes playing the same roles: data, metadata and client nodes, as well as replication, monitoring and load-balancing procedures.
However, conventional grids and clouds differ in operation schedules and user interaction models. Cloud schedulers are designed to maintain system and data integrity via regular scrubbing (checksum validation). Scrubbing is vital for detecting latent bit errors on hard drives 18,33,42,48,55 . Grid schedulers behave similarly, but their main load comes from queued user jobs. Moreover, on-demand user services are provided by clouds, but not by classical grids.
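Scrubbing by checksum validation can be sketched in a few lines: a checksum is recorded when a block is written, and a periodic pass re-reads every block, recomputes the checksum and reports mismatches as latent errors. In the Python sketch below, the in-memory block dictionary, the block identifiers and the choice of SHA-256 are assumptions; production storage systems use their own block formats and often cheaper checksums.

```python
import hashlib
from typing import Dict, List

def checksum(data: bytes) -> str:
    """Content checksum recorded at write time (SHA-256 chosen for illustration)."""
    return hashlib.sha256(data).hexdigest()

def scrub(blocks: Dict[str, bytes], recorded: Dict[str, str]) -> List[str]:
    """Re-read every block, recompute its checksum and report mismatches."""
    return [block_id for block_id, data in blocks.items()
            if checksum(data) != recorded[block_id]]


blocks = {"b1": b"hello", "b2": b"world"}
recorded = {bid: checksum(data) for bid, data in blocks.items()}  # taken at write time
blocks["b2"] = bytes([blocks["b2"][0] ^ 0x01]) + blocks["b2"][1:]  # simulate a latent bit flip
print(scrub(blocks, recorded))  # ['b2']
```

A block flagged this way would then be restored from a replica, which is why scrubbing and replication usually work together.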
Recent publications show intensifying research on the unification of Grid and Cloud computing 4,14,56 . The popular approaches to integration, namely "Grid on Cloud" and "Cloud on Grid", are being consolidated into a new Grid-Cloud integration paradigm that accommodates advances in architectures, communications and user demand patterns. The publication 2 presents a taxonomy for the classification of grid-cloud integration: disjointed, partial and full grid-cloud integration. This paper also considers the software tools used for on-demand grid deployment over cloud infrastructure and provides comprehensive references to the corresponding studies. For example, 7 considered two scenarios for deploying the scientific phylogeny application MetaPIGA on a combined Grid/Cloud architecture. The challenge was to use the joint advantages of Grid and Cloud infrastructures to build a high-performance, reliable and open platform. In the first scenario, the Cloud infrastructures provided no direct access to the clients, and all interactions necessarily passed through the Grid. In the second scenario, the Cloud accepted tasks submitted directly by the clients. Verification was carried out with the MetaPIGA system deployed on the Amazon, Azure and VenusC Cloud infrastructures. Another example, presented in 16 , obtains computational resources through Grid middleware (DIET) using an existing Cloud infrastructure (EUCALYPTUS). The research of Di Costanzo et al. shows that the InterGrid system can be used to build scalable virtualized computational environments working on cloud infrastructures such as EUCALYPTUS and Amazon EC2.
The emerging unified grid-cloud systems expose such high abstraction layers on the client side that users receive mutually separated, purely uniform resources. Computational capacity and storage space are provided explicitly, while network resources are delivered implicitly. The complete separation of the virtualized forms of CPU, storage and network resources from their physical counterparts forms an NFV (Network Function Virtualization) infrastructure, which allows one to construct a distributed network (grid or cloud) from the typical virtual elements mentioned above. This model is being implemented via Cloud Conductor with SLA management, developed by ARCCN 60 .
The study 54 reveals that, despite recent advances, the capabilities of current hardware, such as I/O performance, still impede full virtualization for low-latency, high-throughput data processing. However, Intel roadmap papers 5,9 suggest that future microprocessors will possess several levels of virtualization, concealing hardware details from system software and acting as unified yet partitionable virtual machines with global interfaces.

Discussion
The reviewed studies consider virtualization one of the key factors in achieving grid-cloud convergence into a globally distributed architecture with standardized interfaces at the highest abstraction levels. However, virtualization is rivaled by alternative technologies, namely OS containers and application containers 59 . These technologies essentially provide operating-system-level virtualization, discarding the hardware emulation level and sharing the kernel of the host OS. The main benefit of omitting the hardware abstraction layer is a much lower performance overhead 21 , which allows a physical host to run more containers than virtual machines. On the other hand, containers provide much weaker isolation and security 65 . Still, containers are well suited to running within a guest OS and providing auxiliary functionality that does not justify a dedicated VM.

Conclusion
In this study, we showed the differences between grids and clouds, as well as both the premises of and the obstacles to the mutual convergence of these distributed architectures. The analysis of related works confirms that virtualization plays a significant role in system scalability, interoperability across various systems, the formation of interclouds, unification into a confederation of clouds and the eventual emergence of the next generation of the Internet. This paper also shows that the remaining obstacles to complete virtualization are hardware bottlenecks, including CPU, storage, and network I/O congestion. Ongoing high-priority research considers means to overcome these performance issues and reduce latencies via both novel algorithms and advanced hardware solutions. In the second part of the paper, we provide a detailed study of the types and properties of virtual resources along with their appropriate management, and propose a novel ballooning approach to memory balancing on nested virtual machines.

Acknowledgments
The article is published within applied scientific research performed with financial support of the Ministry of Education and Science of the Russian Federation. Subsidy provision agreement 14.579.21.0010. Universal identifier of the agreement is RFMEFI57914X0010.