Investigation of Improving Load Balancing and Data Fragmentation in Cloud Computing Performance

Objective: This study investigates the parameters that affect the performance of cloud computing (CC). Security and load balancing remain among the most important concerns in CC, owing to the sensitive and confidential information that users store in the cloud. Methods: The study focuses primarily on the security and load balancing aspects of CC. We review research on cloud computing, on improving load balancing and data fragmentation in cloud computing, and on related matters that bear on its performance. Findings: The paper surveys the aspects investigated in recent years and identifies directions for future research in this field. Application: The final part of the paper presents future themes: load balancing with novel load balancing algorithms for better efficiency, and security issues related to data fragmentation and encryption in CC, which remain open for research.


Introduction
In recent years, Cloud Computing (CC) has become very popular in industry around the world. CC has been defined as a computational model and an embedded model for managing and providing services over the internet. It is a model for on-demand network access to a shared pool of configurable computing resources, for instance networks and services, that can be quickly provisioned to customers with minimal management effort or interaction with the provider. Computing is evolving to the point where, after utilities such as electricity 1 , water, gas and telephone, it is expected to become another critical utility. Users then obtain the service according to their needs, without regard to where the service is located. Software continues to evolve so that it is available to everyone as a service instead of running on separate computers. Earlier computing systems were designed to allow users to allocate their systems to general computing tasks at idle times. In this way, a large number of small and large computing resources joined together in a grid, known as volunteer computing, and created huge processing power. Older and newer computing systems include cluster computing, optical computing and, more recently, cloud computing.
The concept of cloud computing is associated with the 21st century, but its foundations date back to the early 1950s, when server rooms housed gigantic, very powerful computers. These supercomputers were shared among multiple users through connections made from shared terminals, while most of the processing was carried out on the supercomputer itself 2 .
CC is like a repository of resources and services that is accessible through the internet. CC services are delivered through data centers around the world, and they provide users with a virtual platform for services and resource utilization 3 .
Generally, resources are located on the server instead of on the user's side, which reduces costs. Services are provided through CC according to the user's request, and the user pays based on the type of resources used 4 , as with other public utilities that are billed according to the level of usage.
The widespread presence of large companies such as Microsoft, Google and Amazon in the competitive landscape of CC indicates the rapid development and dominance of this kind of computing in the world of information technology.
One important benefit that CC offers its developers is the ability to build cloud data centers in parts of the globe that are favorable in terms of cost and energy efficiency. Geographic distribution also provides protection against natural disasters that may occur in a particular area, and helps moderate traffic and system load.
Cloud computing models are usually classified in two ways: by location and by type of service.
The cloud computing types based on service type comprise three basic services:
• Infrastructure as a Service (IaaS).
• Platform as a Service (PaaS).
• Software as a Service (SaaS).
The CC types based on location are as follows:
• Private
• Public
• Hybrid
As shown in Figure 1, public and private clouds differ in their restrictions.
Four main advantages of clouds can be identified, including:
• Cost reduction.
The main challenges of using clouds include:
• Equipment downtime.
The remainder of the paper is organized as follows: part two reviews load balancing, data security and data fragmentation in cloud computing; part three presents the analysis; part four presents the conclusion and future work.

Review of Security, Load Balancing and Data Fragmentation in Cloud Computing
In this section, we review previous reports on security, load balancing and data fragmentation.

Security
The cloud is a next-generation model that offers dynamic resource pools, virtualization and high availability. Today, scalable, distributed computing environments can be used within the confines of the Internet, a practice known as cloud computing 6 . CC has consequently become a well-known buzzword. Many companies, such as Google and Facebook, are accelerating the development of CC systems and enhancing their services to serve a larger number of users. However, security and privacy issues present a strong barrier to users adopting CC. The authors in 7 find that the concerns commonly raised are not adequate and that more should be added in terms of five aspects of security (including availability, confidentiality, data integrity and control). Moreover, they claim that CC will prosper only after those security and privacy issues have been resolved 7 .
There is a critical need to securely store, manage, share and analyze massive amounts of complex (e.g., semi-structured and unstructured) data in order to determine patterns and trends that can improve the quality of healthcare, better safeguard the nation and support research into alternative energy. Because of the critical nature of these applications, it is important that clouds be secure. The major security problem with clouds is that the owner of the data may not have control over where the data is placed. This is because anyone who wants to exploit the benefits of CC must also use the resource allocation and scheduling provided by the cloud. Therefore, the data must be safeguarded in the midst of untrusted processes 8 .
CC, a rapidly developing information technology, has attracted attention around the world. CC is Internet-based computing in which shared resources, software and information are provided to computers and other devices on demand, much like the electricity grid. CC is also the product of the fusion of traditional computing technology and network technology, such as grid computing, distributed computing and parallel computing. It aims to build a complete system with powerful computing capability out of a large number of relatively low-cost computing entities, using advanced business models such as SaaS (Software as a Service) 9 . With the development of parallel computing, distributed computing and grid computing, a new computing model appeared. The notion of cloud computing derives from grid computing, utility computing and SaaS.
It is a novel approach that shares a basic framework. The basic principle of CC is to distribute computation over a great number of distributed computers, rather than running it on a local computer or a remote server. The enterprise data center then operates much like the Internet, which allows the enterprise to direct resources to the applications that need them and to access computing and storage systems according to requirements. The authors of study 10 describe the background and principle of CC, along with its characteristics, styles and current state. That paper also introduces the application fields and merits of CC. CC provides a secure and dependable data storage center, so users need not deal with burdensome tasks such as storing data and removing viruses; these tasks can be done by professionals. It also enables data sharing across different devices. The paper analyzes some open questions and hidden problems, puts forward some solutions, and discusses the future of CC. CC is a computing style that provides IT-related capability as a service. Users can enjoy the service even if they know nothing about the technology of cloud computing, have no professional knowledge in this field, and have no power to control it 10 .

Load Balancing
One important matter in this area is dynamic load balancing, or task scheduling. Load balancing techniques are used to ensure that no node sits idle while other nodes are heavily utilized.
Load balancing algorithms have been studied extensively in various environments. In CC, the main concern is to assign tasks to the cloud nodes so that request processing is done as efficiently as possible, while tolerating constraints such as heterogeneity and high communication delays [11][12][13][14] .
One of the technologies used in CC is virtualization, which provides the runtime environment on which cloud computing technology is based. Along with its benefits, virtualization requires load management by load balancing algorithms to maintain stability and efficiency in CC. An algorithm that can improve the use of virtual machines is therefore essential. The units considered here are tasks and virtual machines.

Load balancing algorithms:
Load balancing algorithms fall into two general categories: static and dynamic.
Static load balancing algorithm: These algorithms cannot adapt to changes in the environment. Their principal goal is to reduce overall execution time.

Dynamic load balancing algorithm:
These kinds of algorithms have higher accuracy and can produce better load balancing results. An important advantage of dynamic load balancing algorithms is decision-making based on the current state of the system, which improves system performance.
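The difference between the two categories can be illustrated with a minimal sketch. It assumes that each task has a known length and that a virtual machine's "load" is simply the sum of the lengths assigned to it; the function names `static_dispatch` and `dynamic_dispatch` are hypothetical and are not taken from the cited studies.

```python
def static_dispatch(task_lengths, n_vms):
    """Static policy: a fixed rotation that ignores the current load."""
    loads = [0] * n_vms
    for k, length in enumerate(task_lengths):
        loads[k % n_vms] += length
    return loads


def dynamic_dispatch(task_lengths, n_vms):
    """Dynamic policy: each task goes to the VM with the smallest load so far."""
    loads = [0] * n_vms
    for length in task_lengths:
        # Decision based on the current system state (assumed fully observable).
        target = min(range(n_vms), key=loads.__getitem__)
        loads[target] += length
    return loads


if __name__ == "__main__":
    tasks = [8, 1, 1, 1]  # one long task followed by three short ones
    print("static :", static_dispatch(tasks, 2))
    print("dynamic:", dynamic_dispatch(tasks, 2))
```

In this toy case the dynamic policy keeps the short tasks away from the machine that received the long one, which is the kind of state-based decision described above. Real systems must also pay for collecting the load information, which the sketch ignores.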
The study in 15 aims to understand the current challenges in CC, primarily in cloud load balancing using static algorithms, and to find gaps to bridge for more efficient static cloud load balancing in the future.
The related algorithms are as follows:
• Round Robin Algorithm (RR): In this algorithm, requests are assigned to a list of virtual machines in rotation. The first request is assigned to a randomly selected virtual machine, and each subsequent request is assigned to the next virtual machine in circular order.
• Weighted Round Robin Algorithm: This revised version of round robin assigns a weight to each virtual machine based on its characteristics. If a virtual machine is able to process two requests, it is given a weight of two, so each time its turn arrives it receives two requests. Otherwise, assignment proceeds as in round robin.
• Dynamic Round Robin Algorithm: This algorithm randomly assigns a selected node to an available virtual machine. It is very simple and cannot distinguish between heavily and lightly loaded virtual machines 16,17 .
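The first two algorithms in the list can be sketched as follows. This is a simplified illustration rather than the implementation used in the cited works: the rotation starts at the first virtual machine instead of a randomly selected one, and the names `round_robin` and `weighted_round_robin` as well as the integer weights are assumptions made for the example.

```python
from itertools import cycle


def round_robin(requests, vms):
    """Assign each request to the next VM in circular order."""
    turn = cycle(vms)
    return [(req, next(turn)) for req in requests]


def weighted_round_robin(requests, weights):
    """weights maps a VM name to the number of requests it takes per turn."""
    # Expand the weights into one rotation, e.g. {"a": 2, "b": 1} -> a, a, b.
    rotation = [vm for vm, w in weights.items() for _ in range(w)]
    if not rotation:
        raise ValueError("at least one VM needs a positive weight")
    turn = cycle(rotation)
    return [(req, next(turn)) for req in requests]


if __name__ == "__main__":
    reqs = ["r1", "r2", "r3", "r4", "r5", "r6"]
    print(round_robin(reqs, ["vm1", "vm2"]))
    print(weighted_round_robin(reqs, {"vm1": 2, "vm2": 1}))
```

With a weight of two, `vm1` receives two consecutive requests each time its turn arrives, as described in the weighted variant above.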

Data Fragmentation
This part reviews related work on data fragmentation in cloud computing, in which fragmentation is used to address security and performance jointly.
The central question is: "How can the confidentiality of information be ensured when untrusted cloud computing resources are used?" The following paragraphs describe this question and related concepts in CC. The authors in 18 proposed a fragmentation scheme based on column-wise partitioning, in which only encrypted partitions reside on the server side.
Each fragment uses a unique identifier to support querying. Queries are processed in two passes. First, the whole fragment is transmitted to the client based on the item identifier. Once the client receives the fragment, it decrypts it and re-runs the query over the decrypted fragment. Finally, the result of this second query is returned. While this method can be appropriate for a small data set, it incurs significant overhead on a large one. In the worst case the entire database must be transmitted to the client side, which calls into question the benefit of using CC. In addition, processing each query twice reduces system performance. Hence, this approach is not a suitable solution to the problem: it eliminates the main advantage of cloud computing in terms of storage delivery, and it adds considerable overhead to query execution. Other studies have taken this approach [19][20][21] . They mainly focus on query optimization as a means of addressing the performance limitation of 18 .
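The two-pass processing described above can be made concrete with a small sketch. It is only an illustration under strong assumptions: the column name serves as the fragment identifier, fragments are serialized as JSON, and `toy_cipher` is a placeholder XOR routine that is NOT secure and merely stands in for the real encryption algorithm, which the source does not specify. All function names are hypothetical.

```python
import json


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """Placeholder XOR 'cipher' (insecure); the same call encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def fragment_by_column(rows, key):
    """Build one encrypted fragment per column, keyed by the column name."""
    fragments = {}
    for column in rows[0]:
        plain = json.dumps([row[column] for row in rows]).encode("utf-8")
        fragments[column] = toy_cipher(plain, key)
    return fragments


def two_pass_query(server, fragment_id, predicate, key):
    """Pass 1: fetch the whole fragment. Pass 2: decrypt and filter on the client."""
    blob = server[fragment_id]  # the entire fragment crosses the network
    values = json.loads(toy_cipher(blob, key).decode("utf-8"))
    matches = [pos for pos, value in enumerate(values) if predicate(value)]
    return matches, len(blob)


if __name__ == "__main__":
    rows = [{"id": 1, "salary": 50}, {"id": 2, "salary": 90}, {"id": 3, "salary": 70}]
    server = fragment_by_column(rows, b"k3y")
    hits, transferred = two_pass_query(server, "salary", lambda v: v > 60, b"k3y")
    print("matching row positions:", hits, "bytes transferred:", transferred)
```

The number of bytes transferred equals the size of the whole fragment regardless of how selective the predicate is, which is the overhead that the criticism above refers to.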

Fragmentation method:
The fragmentation technique involves two components: a primary (main) cloud and a number of general clouds. In the initial configuration, the entire database is encrypted using a highly secure encryption algorithm and stored in the main cloud, without disclosing the encryption key to the main cloud provider.
The main cloud is the core of the scheme, so that the whole relation is maintained in one place. Furthermore, the encryption algorithm applied on the main cloud is a block cipher, which should be sufficient to protect the sensitive data 22-24 .

Analysis
Consider a set of m virtual machines, $VM = \{VM_1, VM_2, \ldots, VM_m\}$, that must process n tasks, $T = \{T_1, T_2, \ldots, T_n\}$. All virtual machines run in parallel and are unrelated; each virtual machine runs on its own resources, and there is no sharing of resources with other virtual machines. Non-preemptive dependent tasks are scheduled on these virtual machines. The assignment of the n tasks to the m virtual machines is formulated as the LP model in formulas (1) and (2), where j denotes the virtual machine and i the task 25 .
The processing time of task i on virtual machine j is denoted $PT_{ij}$. The decision variable $x_{ij}$ equals 1 if task i is assigned to virtual machine j, and 0 otherwise. The Linear Programming (LP) model (LP can be used to achieve the best result, for example the highest profit or the lowest cost, under specific constraints) is then presented as follows:

$$\text{Minimize } Z = \sum_{j=1}^{m} \sum_{i=1}^{n} PT_{ij}\, x_{ij}$$
$$\text{Subject to } \sum_{j=1}^{m} x_{ij} = 1, \quad i = 1, 2, \ldots, n, \qquad x_{ij} \in \{0, 1\}$$
The makespan is the overall completion time of the work, and the aim is to minimize it.
The capacity of a virtual machine is
$$C_{VM} = Pe_{num} \times Pe_{mips}$$
where $C_{VM}$ is the virtual machine capacity, $Pe_{num}$ is the number of processing elements in the virtual machine, and $Pe_{mips}$ is the speed of a processing element in millions of instructions per second. The capacity of all virtual machines is
$$C = \sum_{j=1}^{m} C_{VM_j}$$
where C is the total capacity of all virtual machines, that is, the capacity allocated to the application or environment. The task length is
$$TL = T_{mips} \times T_{pe}$$
and the length of a job is the sum of the lengths of its tasks, $\sum_{k=1}^{p} TL_k$, where p is the number of tasks that belong to the job. The task load ratio is given in formulas (9) and (10).
The two schedulers commonly used in a non-preemptive system are the Round Robin (RR) and Weighted Round Robin (WRR) algorithms. The proposed algorithm is Improved Weighted Round Robin (IWRR). The IWRR algorithm is reported to be the most effective of the three: it allocates jobs to the most suitable virtual machines based on virtual machine information such as processing capacity and current load, together with the length and priority of the arriving tasks. For example, in the study 26 the proposed WRR service broker policy is a modified version of the service proximity policy and considers the processing capacities of data centers (DCs) in terms of two important parameters, namely the number of processors and the processor speed. In the proposed WRR policy, an initialization algorithm maintains a weighted list of all DCs in a single region.
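The capacity and length definitions, together with the LP objective, can be evaluated for a small instance as sketched below. The numbers and the function names are invented for illustration, and the processing-time matrix `pt` is given directly rather than derived, since the source does not state how $PT_{ij}$ is obtained.

```python
def vm_capacity(pe_num, pe_mips):
    """C_VM = Pe_num x Pe_mips."""
    return pe_num * pe_mips


def total_capacity(vms):
    """C = sum of C_VM over all virtual machines; vms is a list of (pe_num, pe_mips)."""
    return sum(vm_capacity(pe_num, pe_mips) for pe_num, pe_mips in vms)


def task_length(t_mips, t_pe):
    """TL = T_mips x T_pe."""
    return t_mips * t_pe


def job_length(tasks):
    """Sum of TL over the p tasks of a job; tasks is a list of (t_mips, t_pe)."""
    return sum(task_length(t_mips, t_pe) for t_mips, t_pe in tasks)


def objective(pt, x):
    """Z = sum of PT_ij * x_ij, with rows indexed by task i and columns by VM j."""
    for i, row in enumerate(x):
        if sum(row) != 1 or any(v not in (0, 1) for v in row):
            raise ValueError(f"task {i} must be assigned to exactly one VM")
    return sum(pt[i][j] * x[i][j] for i in range(len(x)) for j in range(len(x[i])))


if __name__ == "__main__":
    vms = [(1, 1000), (2, 1500)]    # hypothetical (Pe_num, Pe_mips) pairs
    tasks = [(2000, 1), (3000, 2)]  # hypothetical (T_mips, T_pe) pairs
    print(total_capacity(vms), job_length(tasks))
    print(objective([[4, 2], [6, 3]], [[0, 1], [1, 0]]))
```

Note that the objective as written sums processing times over all assignments; it does not by itself capture the makespan, which depends on how the tasks queue up on each virtual machine.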
Load balancing in the proposed algorithm: The load balancer first collects the pending execution time of each created virtual machine and arranges the virtual machines in ascending order of this value; it then determines the number of tasks in each virtual machine and arranges these in ascending order as well.
Following the mathematical model and the definitions above, we turn to the analysis of the results. The performance of the IWRR algorithm is analyzed on the basis of simulation results in the cloud. Response time is analyzed for the Round Robin (RR), Weighted Round Robin (WRR) and Improved Weighted Round Robin (IWRR) algorithms under combinations of homogeneous and heterogeneous job lengths with heterogeneous resource conditions.

Homogeneous tasks on heterogeneous resources (virtual machines):
First, we compare the total execution time of the three algorithms above. The data centers are divided into homogeneous and heterogeneous groups: in the homogeneous case the processors are equal in processing power, while in the heterogeneous case they are not. The following analysis identifies the best and worst performing algorithms for homogeneous jobs in a heterogeneous environment. Figures 2 and 3 show that, for homogeneous job lengths on heterogeneous resources (virtual machines), IWRR achieves a faster completion time than the other load balancing algorithms (WRR and RR). The IWRR scheduling algorithm considers the job length together with the processing capacity of the heterogeneous virtual machines when making an assignment. Thus, for homogeneous jobs in a heterogeneous environment, a larger number of tasks is assigned to the high-capacity virtual machines, which helps to complete the work in a shorter time.
The load balancer in IWRR runs at the end of each task's completion. If it finds a virtual machine that has completed its assigned tasks, it identifies the heavily loaded virtual machine in the group and calculates the probable completion time of the current jobs on the heavily loaded virtual machine and on the lightly loaded or idle one. If the lightly loaded virtual machine can finish a job of the heavily loaded virtual machine in a shorter time, that job is moved to the lightly loaded virtual machine.
The WRR algorithm considers the ratio of each virtual machine's capacity to the total capacity of the virtual machines, and allocates a proportional number of the arriving jobs to that virtual machine. It therefore ranks second in performance.
However, if long tasks are assigned to low-capacity virtual machines as a result of this calculation, the completion time is delayed.
The RR algorithm does not take into account the differences in virtual machine capacity or in job length. It simply places jobs on the list of virtual machines one by one in regular rotation. Its execution time is therefore higher than that of the other two algorithms.
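The ranking described above can be reproduced on a toy instance. The sketch below is not the simulation used in the source and does not implement IWRR itself: `capacity_aware` is a simplified greedy stand-in that places each task on the virtual machine with the earliest estimated finish time, and the estimate "length divided by capacity" is an assumption. Capacities are taken as positive integers so that WRR weights can be derived from them.

```python
from functools import reduce
from math import gcd


def makespan(assignment, task_lengths, capacities):
    """Completion time of the busiest VM, with run time = length / capacity."""
    finish = [0.0] * len(capacities)
    for task, vm in enumerate(assignment):
        finish[vm] += task_lengths[task] / capacities[vm]
    return max(finish)


def rr(task_lengths, capacities):
    """Round robin: plain rotation, ignoring capacity and length."""
    return [k % len(capacities) for k in range(len(task_lengths))]


def wrr(task_lengths, capacities):
    """Weighted round robin: turns per rotation proportional to capacity."""
    unit = reduce(gcd, capacities)
    rotation = [vm for vm, c in enumerate(capacities) for _ in range(c // unit)]
    return [rotation[k % len(rotation)] for k in range(len(task_lengths))]


def capacity_aware(task_lengths, capacities):
    """Greedy stand-in: pick the VM with the earliest estimated finish time."""
    finish = [0.0] * len(capacities)
    assignment = []
    for length in task_lengths:
        vm = min(range(len(capacities)),
                 key=lambda j: finish[j] + length / capacities[j])
        finish[vm] += length / capacities[vm]
        assignment.append(vm)
    return assignment


if __name__ == "__main__":
    capacities = [1000, 3000]  # heterogeneous VMs (hypothetical values)
    tasks = [6000] * 6         # homogeneous task lengths
    for policy in (rr, wrr, capacity_aware):
        print(policy.__name__, makespan(policy(tasks, capacities), tasks, capacities))
```

On this instance the plain rotation overloads the slow machine, the weighted rotation improves on it, and the policy that also looks at the current queue does slightly better again, which is consistent with the ordering reported for RR, WRR and IWRR. A single toy instance is of course no substitute for the simulation results in Figures 2 and 3.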
Load balancing and data fragmentation are related in that both affect the efficiency of the system, and the results above can be extended to data fragmentation.

Conclusion and Future Work
Cloud computing is a relatively new subject, and it must overcome load balancing and data fragmentation issues in order to become a more viable technology in the future. We suggest the following themes for future work:
• Load balancing with novel load balancing algorithms for better efficiency.
• Security issues concerning data fragmentation and encryption in CC, which remain open for research.