Confidentiality Technique to Enhance Security of Data in Public Cloud Storage using Data Obfuscation

Security of data stored in the cloud is most important challenge in public cloud environment. Users are outsourced their data to cloud for efficient, reliable and flexible storage. Due the security issues, data are disclosed by Cloud Service Providers (CSP) and others users of cloud. To protect the data from unauthorized disclosure, this paper proposes a confidentiality technique as a Security Service Algorithm (SSA), named MONcryptto secure the data in cloud storage. This proposed confidentiality technique is based on data obfuscation technique. The MONcrypt SSA is availed from Security as a Service (SEaaS). Users could use this security service from SEaaS to secure their data at any time. Simulations were conducted in cloud environment (Amazon EC2) for analysis the security of proposed MONcrypt SSA. A security analysis tool is used for measure the security of proposed and existing obfuscation techniques. MONcrypt compares with existing obfuscation techniques like Base32, Base64 and Hexadecimal Encoding. The proposed technique provides better performance and good security when compared with existing obfuscation techniques. Unlike the existing technique, MONcrypt reduces the size of data being uploaded to the cloud storage. Confidentiality Technique to Enhance Security of Data in Public Cloud Storage using Data Obfuscation S. Monikandan1* and L. Arockiam2 1Department of MCA, Christhu Raj College, Panjappur, Tiruchirappalli – 620012, Tamil Nadu, India; moni.tamil@gmail.com 2Departmentof Computer Science, St. Joseph’s College (Autonomous), Tiruchirappalli – 620102, Tamil Nadu, India; larockiam@yahoo.co.in


Introduction
Cloud computing is an internet based computing. It is evolved from grid computing, utility computing, parallel computing, distributed computing and virtualization 1 . It has more powerful computing infrastructure with a pool of thousands of computers and servers 2 . It provides computational resources like server, storage, software, memory, network etc., as on-demand services 3 . It helps to reduce the computational infrastructure investment and maintenance cost of IT requisite for Small and Medium scale Enterprises (SME) 4 . It provides Everything (X) as a Service (XaaS) where 'X' denotes software, OS, server, hardware, storage, etc 5 . Cloud services are scaling up and down based on the users' demand 6 . Cloud has multiple datacentres placed in different geographical locations in the world to provide reliable services to the users 7 . It provides unlimited service provisioning without any human intervention. Cloud automates the service provisioning by way of running a number of Application Programming Interface (API) in the cloud storage environment.
Data protection is a crucial security issue for most of the enterprises 8 . The main issue focused in cloud computing is data security. However, users are more concerned about the security ofthe data in the cloud. Obfuscation is a process of converting original data into unintelligible data. It is similar to encryption but it uses mathematical calculations or programming logics 9 . Encryption is the procedure of transforming the readable data into unreadable data using an algorithm and a key 10 . The major difference between encryption and obfuscation is that encrypted data cannot be processed until theyare decrypted, whereas obfuscated data can be processed without de-obfuscation. This technique has recently become popular for data storage security in cloud storage 11 . Data storage and monitoring are the most important tasks for all enterprises. Small and Medium scale Enterprises (SME) are not having enough infrastructure to do so. Cloud computing solves this problem by providing enormous virtual storage 12 . SMEs outsource the data to cloud storage. The basic definition of outsourcing is to move the in-house activities to a third party (cloud provider) who has the required competence to perform data maintenance activities in an effective and efficient way. Other reasons for outsourcing data are to gain access to the knowledge and to minimize the risks associated. This phenomenon has helped SMEs concentrate more on its core competencies and thereby to gain competitive advantage in the markets.
Data outsourcing creates many security issues on the data stored in the cloud. Researchers start using obfuscation technique to provide security to the data in cloud. To maintaining security for the outsourced data in cloud is an imperative task. To enhance the security of data,the MONcrypt SSA is proposed for securing numerical data. As a part this research, there is an SSA called AROcrypt 13 which is used for Non-numerical data. Users can use these two SSA based on their sensitive data. MONcrypt is used for Numerical data and AROcrypt is used for Non-numerical. This paper only discusses about the MONcrypt SSA.

Related Work
There are many more obfuscation techniques available in the literature. The approach for each of the obfuscation techniques can range from very simple to highly mathematical. Recently, researchers have started to use obfuscation techniques for security protection in cloud environment.
Siani Pearson et al. 14 described a privacy manager for cloud computing, which reduces the risk to the cloud computing user of their private data being stolen or misused. As a first line of defense, the privacy manager uses a feature called obfuscation. The idea is that instead of being present unencrypted data in the cloud, the users' sensitive data are sent to the cloud in an obfuscated form. The obfuscation method uses a key which is chosen by the users and known by the privacy manager, but which is not communicated to the CSPs. Thus the CSP is not able to de-obfuscate the users' data and the data are not present on the CSPs' machines. It reduces the risks of theft of the data from the cloud and unauthorized uses of the data. Moreover, the obfuscated data are not personally identifiable information, and so the CSPs are not subject to the legal restrictions that apply to the processing of the de-obfuscated data. However, it is not practical for all cloud applications to work with obfuscated data.
Maheshwari et al. 15 introduced a scheme which allows a user to store data in a cloud and perform database style query on the stored data without using standard cryptography schemes while maintaining data confidentiality. Data obfuscation is used for security which makes it hard for the cloud to reconstruct the plaintext. This scheme represents each character by its glyph image. Instead of storing the normal glyph in a given font, which can be readily understood by a machine, add noise and split the glyph image into small portions. Different portions belonging to a glyph image are stored in servers belonging to different cloud providers. To reconstruct the data, the glyph portions from different servers need to be gathered. The overall concept of this scheme is to convert the users' data into an image and each part of the image is sent to different independent clouds.
The proposed MONcrypt SSA in SEaaS is an obfuscation technique and it is compared with existing obfuscation techniques such as Base64, Base32 and Hexadecimal Encoding with respect to security, time.

Existing Data Obfuscation Techniques
Data obfuscation is a method of data hiding where data is purposely scrambled to prevent from unauthorized access 14 . The result of obfuscation is an unintelligible or confusing data. Data obfuscation techniques are used to prevent the intrusion of sensitive data stored in the cloud. However, issues have stemmed from an inability to vigorously prevent security attacks 16 . Several data obfuscation techniques are discussed in the following subsections 17 .

Base64 Encoding
Base64 encoding schemes are commonly used when there is a need to encode binary data that need to be stored and transferred over media that are designed to deal with textual data. This is to ensure that the data remain intact without modification during transport. The Base64 encodesa word "Man" with "TWFu". The Base64 encodes the word "Man", by determining the related ASCII 8-bit values for each letter. Since base64 breaks binary into 6-bit groups, the easiest way is to connect them linearly and then redistribute into 6-bit groups. 010011010110000101101110 is thus represented as: 010011 010110 000101 101110 Each of these values is then recalculated in decimal as, 010011 = (0 x 2^5) + (1 x 2^4) + (0 x 2^3) + (0 x 2^2) + (1 x 2^1) + (1 x 2^0) = 19, then the alphabet corresponding to 19 is taken from the base64 index 18 tableto produce T. In the same way remaining decimals are calculated to produce WFu.

Base32 Encoding
The Base32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but that need not to be human readable. A 33-character subset of US-ASCII is used, enabling 5-bits to be represented per printable character.
The encoding process represents 40-bit groups of input bits as output strings of 8 encoded characters. Proceeding from left to right, a 40-bit input group is formed by concatenating five 8-bit input groups. These 40-bits are then treated as 8 concatenated 5-bit groups, each of which is translated into a single character in the Base32 alphabet. When a bit stream is encoded via the Base32 encoding, the bit stream must be presumed to be ordered with the most-significant-bit first. That is, the first bit in the stream is the high-order bit in the first 8-bit; the eighth bit will be the low-order bit in the first 8-bit, and so on.
Each 5-bit group is used as an index into an array of 32 printable characters. The character referenced by the index is placed in the output string. These characters, identified from the Base32 index table in 19 are selected from US-ASCII digits and uppercase letters.

ASCII Encoding
ASCII is a character encoding scheme originally based on the English alphabet. ASCII encoding is a method of representing characters with Base2 (binary) strings. These strings can then be further converted to other formats as needed like Hexadecimal, Base64, etc.

Hexadecimal Encoding
Hexadecimal is a positional notation system with a base of 16. It uses sixteen distinct symbols. Most often the symbols 0-9 represent the values zero to nine, and A, B, C, D, E, F represent values ten to fifteen. For example, the number 2AF3 is equal, in decimal, to (2 x 16^3) + (10 x 16^2) + (15 x 16^1) + (3 x 16^0), or 10995. Each hexadecimal digit represents four binary digits (bits), and the primary use of hexadecimal notation is a human friendly representation of binary coded values in computing.
Recently, data obfuscation is focused for the security of data in the cloud storage. Normally, obfuscation technique is not using keys to obfuscate the data unlike encryption. Simple obfuscation technique without using key is easy to break by brute force attack. The proposed MONcrypt SSA is using a key to obfuscate the data in cloud. The key is kept by the user to de-obfuscate the data.

Motivation
Cloud storage is widely popular and it is used by millions of people in the world. The users are eager to adopt the cloud storage by outsourcing their IT requisites. Cloud computing has numerous advantages such as easy to use and maintain, low power consumption for operation and reductions in the overhead for storing the data. Despite several advantages, cloud also suffers from different security threats and risks. Protecting from security threats and attacks is the primary concern for the enhancement of a more secured cloud infrastructure. Motivated by this fact, this paper aims at enhancing the security of the outsourced data by achieving the following goals.
To ensure that the data stored in the cloud are accessed • only by the data owners.
To propose security service algorithms suitable for • cloud environment using data obfuscation technique.
To reduce the size of data being stored in the cloud • storage.

Problem Definition
Data security is a critical area in cloud computing environment. Cloud has no limits, and the data can be physically placed at any data centers which are geographically distributed. The users are forced to use the platform and infrastructure provided by the same CSP. Hence the CSP knows where the data are located and have full access to the data. This scenario of cloud raises several issues regarding confidentiality of data. Users' data sent to the cloud are controlled and monitored by CSPs. CSPs as privileged administrators have the rights to look into the users' data. So, there is a possibility that data are hacked from CSPs. Users do not have any control over the data in cloud storage. Moreover, cloud is a public environment. Hence, data may have the chance to be mingled with data of other users.
Users do not know whether the data are encrypted in the cloud storage or not. Maintaining keys for each user is more difficult for CSPs, and the same key is used for all users' data 20 . Users' data have to be in a fixed format specified by the CSP, and hence the CSP knows all the information required for understanding users' data. Here are the issue raised up for data protection.

Objective
The objective of this paper is as follows: • To propose a security service algorithm to enhancing the security of numerical data in cloud storage. • To reduce the size of users' data are uploaded to cloud.

Methodology
Cloud storage has huge amount of virtual storage space. It is used by millions of users every day. Outsourced data in cloud storage needs to be protected from unauthorized access. MONcrypt is one of the SSAs provided by SEaaS. Figure 1 shows the methodology diagram of MONcrypt SSA. There are three cloud services used for Security, Key Generation and Management and Cloud Storage. These services are provided by three different independent CSPs. The data which are to be outsourced need to be obfuscated before they are uploaded to the cloud storage. It is executed when users choose this SSA for data security. It uses mathematical functions to obfuscate the data such as pow(), reverse(), module() and ascii(). Key denotes the number of rotation during obfuscation process. Key used for MONcrypt is generated from Key Generation and Management as a Service (KGMaaS). Users use the keys and MONcrypt SSA to obfuscate the data. The keys used for obfuscation are kept by the users. It is not communicated to the CSP. MONcrypt SSA is used to obfuscate numerical data only. If users' data contains numerical and non-numerical values, then MONcrypt analyses data for numerical values. Non-numerical values in the users' data are not considered for obfuscation by the MONcrypt SSA. It not only obfuscates the data but also reduces the size of data being uploaded to the cloud storage. There are no existing techniques to reduce the size of the data after the obfuscation process.

Proposed Technique
This section provides the detailed description of proposed obfuscation technique called MONcrypt SSA.Data obfuscation is recently popular in the field of security in the cloud. The proposed technique is used for the data protection without loss of information content. Steps involved in the proposed MONcrypt SSA are as follows: • Users submit the PlainText (PT) and key (K) to MONcrypt SSA.  The data which are to be outsourced need to be obfuscated before they are uploaded to the cloud storage. It is executed when users choose this SSA for data security. It uses mathematical functions to obfuscate the data such as pow(), reverse(), module() and ascii(). Key denotes the number of rotation during obfuscation process. Key used for MONcrypt is generated from Key Generation and Management as a Service (KGMaaS). Users use the keys and MONcrypt SSA to obfuscate the data. The keys used for obfuscation are kept by the users. It is not communicated to the CSP. MONcrypt SSA is used to obfuscate numerical data only. If users' data contains numerical and non-numerical values, then MONcrypt analyses data for numerical values. Non-numerical values in the users' data are not considered for obfuscation by the MONcrypt SSA. It not only obfuscates the data but also reduces the size of data being uploaded to the cloud storage. There are no existing techniques to reduce the size of the data after the obfuscation process.

Proposed Technique
This section provides the detailed description of proposed obfuscation technique called MONcrypt SSA.Data obfuscation is recently popular in the field of security in the cloud. The proposed technique is used for the data protection without loss of information content. Steps involved in the proposed MONcrypt SSA are as follows:

Users submit the PlainText (PT) and key (K) to
MONcrypt SSA.  The key K is incremented by one for each consecutive digit in the ST. RTvalue is divided by 256 to find the count value and also to get the reminder value in MT. The count represents the number of times the RT is divided by 256. Finally, the obfuscated text is attainedby converting the MT into ASCII character code.
The key Kis kept by the users. This value is used for de-obfuscation to get the original PT. In the proposed MONcrypt SSA, each PT value is obfuscated as an 8-bit character value. MONcrypt SSA compresses the size of plaintext.

Obfuscation Procedure of MONcrypt SSA
The detailed description of procedure involved in obfuscation process of the MONcrypt SSA to secure the data is given below. MONcrypt SSA producing different ciphertext for same PT value which is occurred more than oncein the plaintext. In the above plaintext, value '66' appears two times at 2 nd and 4 th place. Result of ciphertext for '66' at 2 nd and 4 th place is 'y' and '\v' respectively. This confuses the cryptanalysis to hack the data using dictionary or brute force attack.
MONcrypt also reduces the size of the ciphertext produced from obfuscation. The size of the plaintext is 14 bytes and after the MONcrypt obfuscation, the size of the cipher text is 10 bytes. MONcrypt SSA converts the plaintexts into ASCII character codes. It reduces the size of data uploaded to the cloud. Existing obfuscation techniques like Base64, Base32 and Hexadecimal Encoding are not reducing the size of data after the obfuscation.

Simulation Results
Simulation is performed in the cloud environment. The cloud users' machine connected to the cloud server has the configuration of Windows Operating System with core i3 Intel processor and 4GB RAM. The users' data are obfuscated before they are uploaded and are de-obfuscated when retrieved from the cloud. Thus, the obfuscation is done in the users' machine connected to the cloud server. Time taken for the execution of obfuscation is calculated in the users' machine.
Cloud server is used for cloud storage. Key generation and SSAs are developed as web service and hosted in the cloud server. These services are used for securing the data in cloud storage. The cloud server, Amazon micro instance has the following configuration as Microsoft Windows Server 2008 Base 32-bit operating system, 2.5 GHz Intel Xeon processor, 613 MB RAM, 30GB of Elastic Block Storage (EBS). The users upload the data via user interface. The data submit to SSA in SEaaS are obfuscated and then they are uploaded to cloud server. Security levels of the proposed SSA and existing techniques are measured in cloud server using a Security analysis tool.
Performance of the proposed MONcrypt SSA is calculated by the time taken for obfuscation and de-obfuscation in the users' machine.
Table 1and Figure 2 present the performance comparison of obfuscation with existing obfuscation techniques such as Base 64, Base 32 and Hexadecimal Encoding. The time taken by the existing and proposed obfuscation technique is calculated for different sizes of plaintext. The results show that the proposed technique has taken lower time duration than existing technique for obfuscating different sizes of plaintext. Table 2 and Figure 3 present the performance comparison of de-obfuscation with existing techniques based on the time. The time taken by the existing and proposed  de-obfuscation technique is calculated for different sizes of plaintext. The result shows that the proposed technique has taken lower time duration than existing technique for de-obfuscation. Table 3 denotes the size of the plaintext and obfuscated text. Plain_text_size denotes the different size of plaintext before obfuscation. Obfuscated_text_size denotes the different size of the output data generated after the obfuscation.
The executed results show the data size after and before obfuscation. From the results, it is noted that the size of data are reduced after obfuscation. MONcrypt saves lot storage space in the cloud storage. Table 4 shows the comparison of the input and output data size of existing and proposed obfuscation techniques.
In Table 4, no existing technique reduces the size of data after processing. MONcrypt reduces the size of data once the obfuscation is completed.

Measure of Security Level of Proposed SSA
Security level is analyzed by using a security analysis tool called All Block Ciphers (ABC) Hackman. The ABC Universal Hackman tool reads the obfuscated data from cloud server and uses dictionary and brute force attack for measuring the security level of each SSA in SEaaS. The Hackman tool analyses the security level of proposed SSA with corresponding existing techniques. This tool is installed in cloud server. Hackman tool hacks the obfuscated data in Amazon server. It attacks the data in the cloud storage in different ways like dictionary and brute force attack to retrieve the original data from obfuscated data. Once the retrieval is done, Hackman compares the retrieved data with original data to find number of original data retrieved from obfuscated data. Based on the retrieval of original data, the percentage of security level is measured for the proposed SSAs. The procedure for measuring the security level of proposed SSAs is as follows,    Let N be total number of cipher text stored in the cloud storage, X denotes the number of original text retrieved from cipher text by ABC Hackman.
To measure the security level of SSA, Where, K denotes data not matched with original data. Therefore, the percentage of security level is measured by, Where, Z denotes percentage of security level of proposed SSA in SEaaS, and the percentage of insecurity level is measured by, Where, Y denotes the percentage of insecurity level of the proposed SSA, each SSA produces different security level based on its working procedure. Table 5and Figure 4 represent the comparison of security level with existing techniques. The results show that the security level of MONcrypt is 90% and security level of existing obfuscation techniques is 64%, 57% and 55% for Base64, Base32 and Hexadecimal Encoding respectively.
MONcrypt shows maximum level of security when compared with existing obfuscation techniques. It shows the enhancement of security for numerical data stored in the public cloud environment.

Result Analysis
The efficiency of MONcrypt SSA is analyzed by considering the space and time complexity. MONcrypt SSA is used for numerical data. It converts the numerical data into ASCII character code. It compresses the total size of data after obfuscation. Table 4 shows the input and output data size of existing and proposed obfuscation techniques. The data presented in the Table 4 shows that there is a linear regression is occurred for space complexity.
The result of MONcrypt SSA shows that it reduces size of input data linearly from 1 to 15 MB. Hence the linear equation for the Space Complexity (SC) is defined as, y = Kx (4) Where x represents plaintext size and y represents ciphertext size and the scalar factor K is represented in Equation (5).
Space complexity of MONcrypt SSA is,  Table 4 MONcrypt SSA  requires only 16% of memory for storing the data in cloud. It reduces the memory into 84% than existing obfuscation techniques. Space complexity of existing obfuscation techniques are K= 1. Existing techniques are not reducing the size of data after obfuscation. They did not concentrate on memory usage of the data stored in the cloud. Time Complexity (TC) is also calculated from the linear equation. Table 1 shows the obfuscation time taken by MONcrypt SSA.
Time Complexity of MONcrypt SSA is defined as y = Kx, Where, X denotes size of data, Y denotes time taken for the obfuscation technique, K denotes the time complexity of obfuscation techniques. Time Complexity of MONcrypt SSA is 0.03. MONcrypt SSA minimizes the space complexity and it also reduces the time complexity. Time complexity of existing techniques are Base64 is 0.06, Base32 is 0.04 and Hexadecimal encoding is 0.07.Hence, the proposed MONcrypt SSA is more efficient than the existing obfuscation techniques. It is more suitable for public cloud environment.

Advantages of MONcrypt SSA
• The main advantage of MONcrypt is to reduce the size of data which are uploaded to the cloud storage. • It converts the numerical data onto ASCII character code. The existing obfuscation techniques in the literature are not reducing the size of the data. • Hackers are not able recover the original data.
• MONcrypt takes low time to process the data for obfuscation and de-obfuscation. • Latency in data upload and download is reduced.

Conclusion
Cloud storage is more efficient to store the user's data. It provides cost effective usage to the SMEs. SMEs do not possess full infrastructure to maintain their data with their premises. Data outsourcing helps the SMEs to effectively maintain their data with cloud storage. Data outsourcing has data security issues with cloud storage. The SSA is proposed, namely, MONcrypt. MONcrypt SSAis based on obfuscation technique. According to the proposed technique, the data are obfuscated before they are forwarded to the cloud storage. This technique obfuscates numerical values. MONcrypt SSA obfuscates the plaintext into ASCII character code, so the size of obfuscated text is reduced. The proposed technique not only ensures the confidentiality of the data, but also reduces the size of the plaintext. Existing techniques are not reducing the size of data after obfuscation. The simulation is conducted with different sizes of data and performance is calculated based on the time taken for obfuscation. From the results obtained, it is evident that the proposed MONcrypt provides maximum security to outsourced data in minimum time.